XZ compressed kernel broken?

2018-07-08 Thread Felix von Leitner
I could not get kernel 4.17 or up to boot on my laptop. I tried various
things, started a new config from scratch on a freshly downloaded (not
up-patched) source tree, to no avail.

Today I tried switching kernel compression from XZ to GZIP and now it
boots fine.

Just as a PSA, in case your kernel insta-resets right after being loaded
from GRUB.

Might be some kind of freak interdependency, didn't analyze further.

Felix


Re: getting mysterious (to me) EINVAL from inotify_rm_watch

2016-05-11 Thread Felix von Leitner
Thus spake Peter Meerwald-Stadler (pme...@pmeerw.net):
> > I am trying to add inotify support to my tail implementation (for -F).
> > This is what happens:
> > 
> >   inotify_init()  = 4
> >   inotify_add_watch(4, "/tmp/foo", IN_MODIFY) = 1
> >   inotify_rm_watch(4, 1)  = -1 EINVAL (Invalid argument)
> >   inotify_add_watch(4, "/tmp/foo", IN_MODIFY) = 2
> > 
> > (There is also some polling, some reading and some statting going on here,
> > but those are on other descriptors than 4, so they should not matter.)
> > 
> > Can somebody explain the EINVAL I'm getting from inotify_rm_watch to me?
> > This is a stock kernel 4.5.0.

> #include <stdio.h>
> #include <sys/inotify.h>
> int main() {
>   int fd, i, j;
>   printf("init %d\n", fd=inotify_init()); // 3
>   printf("add %d\n", i=inotify_add_watch(fd, "/tmp/foo", IN_MODIFY)); // 1
>   printf("rm %d\n", inotify_rm_watch(fd, i)); // 0
>   printf("add %d\n", j=inotify_add_watch(fd, "/tmp/foo", IN_MODIFY)); // 2
>   return 0;
> }

> Ubuntu kernel x86_64 4.4.0-21, seems to work here
> so we have to guess what's going on between _add and _rm?

Wait!
It just occurred to me that this does not make any sense at all.
You use the name of the file with inotify_add_watch, not the descriptor
to the file. Why would closing the file matter?

My "load generator" test program is:

  #include <assert.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main() {
    int fd=open("/tmp/foo",O_WRONLY|O_CREAT|O_TRUNC,0600);
    assert(fd>-1);
    sleep(1);
    write(fd,"1\n",2);
    sleep(1);
    write(fd,"2\n",2);
    int fd2=open("/tmp/bar",O_WRONLY|O_CREAT|O_TRUNC,0600);
    assert(fd2>-1);
    write(fd2,"3\n",2);
    rename("/tmp/bar","/tmp/foo");
    close(fd);
    sleep(1);
    write(fd2,"4\n",2);
    close(fd2);
  }

I touch /tmp/foo first, then I run my inotify tail -F on it, and I expect the
output to be

  1\n2\n3\n4\n

It is. Then I press Ctrl-C.

Here is the strace of the tail:

  execve("./bin-x86_64/tail", ["./bin-x86_64/tail", "-F", "/tmp/foo"], [/* 57 vars */]) = 0
  arch_prctl(ARCH_SET_FS, 0x7fff1b1e2920) = 0
  rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_NODEFER, 0x4018d0}, {SIG_DFL, [], 0}, 8) = 0
  open("/tmp/foo", O_RDONLY)  = 3
  fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f251cf9f000
  read(3, "", 32768)  = 0
  inotify_init()  = 4
  inotify_add_watch(4, "/tmp/foo", IN_MODIFY) = 1
  fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  stat("/tmp/foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  poll([{fd=4, events=POLLIN}], 1, 1000)  = 0 (Timeout)
  fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  stat("/tmp/foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  poll([{fd=4, events=POLLIN}], 1, 1000)  = 1 ([{fd=4, revents=POLLIN}])
  read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 2048) = 16
  fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  stat("/tmp/foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  poll([{fd=4, events=POLLIN}], 1, 1000)  = 1 ([{fd=4, revents=POLLIN}])
  read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 2048) = 16
  fstat(3, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
  read(3, "1\n", 8192)= 2
  write(1, "1\n", 2)  = 2
  read(3, "", 8192)   = 0
  poll([{fd=4, events=POLLIN}], 1, 1000)  = 1 ([{fd=4, revents=POLLIN}])
  read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 2048) = 16
  fstat(3, {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
  read(3, "2\n", 8192)= 2
  write(1, "2\n", 2)  = 2
  read(3, "", 8192)   = 0
  poll([{fd=4, events=POLLIN}], 1, 1000)  = 0 (Timeout)
  fstat(3, {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
  stat("/tmp/foo", {st_mode=S_IFREG|0600, st_size=4, ...}) = 0
  close(3)= 0
  inotify_rm_watch(4, 1)  = -1 EINVAL (Invalid argument)
  open("/tmp/foo", O_RDONLY)  = 3
  inotify_add_watch(4, "/tmp/foo", IN_MODIFY) = 2
  open("/tmp", O_RDONLY)  = 5
  read(3, "3\n4\n", 8192) = 4
  write(1, "3\n4\n", 4)   = 4
  read(3, "", 8192)   = 0
  poll([{fd=4, events=POLLIN}], 1, 1000)  = 1 ([{fd=4, revents=POLLIN}])
  read(4, "\1\0\0\0\0\200\0\0\0\0\0\0\0\0\0\0", 2048) = 16
  fstat(3, {st_mode=S_IFREG|0600, st_size=4, ...}) = 0
  stat("/tmp/foo", {st_mode=S_IFREG|0600, st_size=4, ...}) = 0
  poll([{fd=4, events=POLLIN}], 1, 1000)  = 0 (Timeout)
  fstat(3, {st_mode=S_IFREG|0600, st_size=4, ...}) = 0
  stat("/tmp/foo", {st_mode=S_IFREG|0600, st_size=4, ...}) = 0
  poll([{fd=4, events=POLLIN}], 1, 1000)  = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
  --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
  +++ killed by SIGINT +++

As you can see, I do close(3) and then inotify_rm_watch, and it returns EINVAL.
If I do the inotify_rm_watch first and then 


Re: getting mysterious (to me) EINVAL from inotify_rm_watch

2016-05-11 Thread Felix von Leitner
Thus spake Peter Meerwald-Stadler (pme...@pmeerw.net):
> > I am trying to add inotify support to my tail implementation (for -F).
> > This is what happens:
> > 
> >   inotify_init()  = 4
> >   inotify_add_watch(4, "/tmp/foo", IN_MODIFY) = 1
> >   inotify_rm_watch(4, 1)  = -1 EINVAL (Invalid argument)
> >   inotify_add_watch(4, "/tmp/foo", IN_MODIFY) = 2
> > 
> > (There is also some polling, some reading and some statting going on here,
> > but those are on other descriptors than 4, so they should not matter.)
> > 
> > Can somebody explain the EINVAL I'm getting from inotify_rm_watch to me?
> > This is a stock kernel 4.5.0.

> #include <stdio.h>
> #include <sys/inotify.h>
> int main() {
>   int fd, i, j;
>   printf("init %d\n", fd=inotify_init()); // 3
>   printf("add %d\n", i=inotify_add_watch(fd, "/tmp/foo", IN_MODIFY)); // 1
>   printf("rm %d\n", inotify_rm_watch(fd, i)); // 0
>   printf("add %d\n", j=inotify_add_watch(fd, "/tmp/foo", IN_MODIFY)); // 2
>   return 0;
> }

> Ubuntu kernel x86_64 4.4.0-21, seems to work here
> so we have to guess what's going on between _add and _rm?

Oh, it turns out to be my fault.
I called close() on the file first, then did inotify_rm_watch.

It was not clear to me from the documentation that that automatically
removes the inotify watch.

Sorry for the noise,

Felix


getting mysterious (to me) EINVAL from inotify_rm_watch

2016-05-11 Thread Felix von Leitner
Hi,

I am trying to add inotify support to my tail implementation (for -F).
This is what happens:

  inotify_init()  = 4
  inotify_add_watch(4, "/tmp/foo", IN_MODIFY) = 1
  inotify_rm_watch(4, 1)  = -1 EINVAL (Invalid argument)
  inotify_add_watch(4, "/tmp/foo", IN_MODIFY) = 2

(There is also some polling, some reading and some statting going on here,
but those are on other descriptors than 4, so they should not matter.)

Can somebody explain the EINVAL I'm getting from inotify_rm_watch to me?

This is a stock kernel 4.5.0.

Thanks,

Felix


Re: fork on processes with lots of memory

2016-01-26 Thread Felix von Leitner
> Dear Linux kernel devs,

> I talked to someone who uses large Linux based hardware to run a
> process with huge memory requirements (think 4 GB), and he told me that
> if they do a fork() syscall on that process, the whole system comes to
> standstill. And not just for a second or two. He said they measured a 45
> minute (!) delay before the system became responsive again.

I'm sorry, I meant 4 TB, not 4 GB.
I'm not used to working with memory sizes of that kind.

> Their working theory is that all the pages need to be marked copy-on-write
> in both processes, and if you touch one page, a copy needs to be made,
> and that just takes a while if you have a billion pages.

> I was wondering if there is any advice for such situations from the
> memory management people on this list.

> In this case the fork was for an execve afterwards, but I was going to
> recommend fork to them for something else that can not be tricked around
> with vfork.

> Can anyone comment on whether the 45 minute number sounds like it could
> be real? When I heard it, I was flabbergasted. But the other person
> swore it was real. Can a fork cause this much of a delay? Is there a way
> to work around it?

> I was going to recommend the fork to create a boundary between the
> processes, so that you can recover from memory corruption in one
> process. In fact, after the fork I would want to munmap almost all of
> the shared pages anyway, but there is no way to tell fork that.

> Thanks,

> Felix

> PS: Please put me on Cc if you reply, I'm not subscribed to this mailing
> list.


fork on processes with lots of memory

2016-01-26 Thread Felix von Leitner
Dear Linux kernel devs,

I talked to someone who uses large Linux based hardware to run a
process with huge memory requirements (think 4 GB), and he told me that
if they do a fork() syscall on that process, the whole system comes to
standstill. And not just for a second or two. He said they measured a 45
minute (!) delay before the system became responsive again.

Their working theory is that all the pages need to be marked copy-on-write
in both processes, and if you touch one page, a copy needs to be made,
and that just takes a while if you have a billion pages.

I was wondering if there is any advice for such situations from the
memory management people on this list.

In this case the fork was for an execve afterwards, but I was going to
recommend fork to them for something else that can not be tricked around
with vfork.

Can anyone comment on whether the 45 minute number sounds like it could
be real? When I heard it, I was flabbergasted. But the other person
swore it was real. Can a fork cause this much of a delay? Is there a way
to work around it?

I was going to recommend the fork to create a boundary between the
processes, so that you can recover from memory corruption in one
process. In fact, after the fork I would want to munmap almost all of
the shared pages anyway, but there is no way to tell fork that.

Thanks,

Felix

PS: Please put me on Cc if you reply, I'm not subscribed to this mailing
list.


Re: security problem with seccomp-filter

2015-04-12 Thread Felix von Leitner
> What you're describing should work correctly (it's part of the
> regression test suite we use). So, given that, I'd love to get to the
> bottom of what you're seeing. Do you have a URL to your code? What
> architecture are you running on?

Well, I must be doing something wrong then.
I extracted a test case from my program.
I put it on http://ptrace.fefe.de/seccompfail.c

It installs three seccomp filters, the last one containing this:

DISALLOW_SYSCALL(prctl),

with

#define DISALLOW_SYSCALL(name) \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_##name, 0, 1), \
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)

It is my understanding that that should then kill the process if the
prctl syscall is called again.

I test this by attempting to install the very same seccomp filter again,
which calls prctl, but the process is not killed.

What am I doing wrong?

Thanks,
Felix
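#include <stddef.h>
#include <features.h>
#include <inttypes.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip_icmp.h>
#include <arpa/inet.h>
#include <sys/poll.h>
#include <unistd.h>
#include <time.h>
#include <netdb.h>
#include <alloca.h>
#include <signal.h>
#include <errno.h>

#include <sys/prctl.h>
#include <linux/unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>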

#ifndef SECCOMP_MODE_FILTER
# define SECCOMP_MODE_FILTER	2 /* uses user-supplied filter. */
# define SECCOMP_RET_KILL	0x00000000U /* kill the task immediately */
# define SECCOMP_RET_TRAP	0x00030000U /* disallow and force a SIGSYS */
# define SECCOMP_RET_ALLOW	0x7fff0000U /* allow */
struct seccomp_data {
int nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
};
#endif
#ifndef SYS_SECCOMP
# define SYS_SECCOMP 1
#endif

#define syscall_nr (offsetof(struct seccomp_data, nr))

#if defined(__i386__)
# define REG_SYSCALL	REG_EAX
# define ARCH_NR	AUDIT_ARCH_I386
#elif defined(__x86_64__)
# define REG_SYSCALL	REG_RAX
# define ARCH_NR	AUDIT_ARCH_X86_64
#else
# error "Platform does not support seccomp filter yet"
#endif

#define ALLOW_SYSCALL(name) \
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_##name, 0, 1), \
	BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)

static int install_syscall_filter(void) {
  /* Linux allows a process to restrict itself (and potential children)
   * in what syscalls can be issued.  The mechanism is called
   * seccomp-filter or "seccomp mode 2".  It works by reusing the
   * Berkeley Packet Filter, which is meant for PCAP-style packet
   * filtering expressions like "only TCP packets, please".  But it is
   * really a bytecode that has to be passed inside an array, and each
   * instruction is constructed using scary looking macros.  The basics
   * are not so bad, however.  We have two registers, one accumulator
   * and one index register (which is not used in this part of the
   * code), and instead of a network packet we are operating on a
   * certain struct with the syscall info, which is called seccomp_data
   * (reproduced above). */
  struct sock_filter filter[] = {
    /* validate architecture to avoid x32-on-x86_64 syscall aliasing shenanigans */

    /* BPF_LD = load, BPF_W = word, BPF_ABS = absolute offset */
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* BPF_JMP+BPF_JEQ+BPF_K = compare accumulator to constant (in our
     * case, ARCH_NR), and skip the next instruction if equal */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ARCH_NR, 1, 0),
    /* "return SECCOMP_RET_KILL", tell seccomp to kill the process */
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),

    /* load the syscall number */
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)),

    /* and now a list of allowed syscalls */
    ALLOW_SYSCALL(rt_sigreturn),
#ifdef __NR_sigreturn
    ALLOW_SYSCALL(sigreturn),
#endif
    ALLOW_SYSCALL(exit_group),
    ALLOW_SYSCALL(exit),

#ifdef __NR_socketcall
    ALLOW_SYSCALL(socketcall),
#else
    ALLOW_SYSCALL(socket),
    ALLOW_SYSCALL(sendto),
    ALLOW_SYSCALL(recvfrom),
#endif

    ALLOW_SYSCALL(poll),

    /* so we can further restrict allowed syscalls */
    ALLOW_SYSCALL(prctl),

    /* so gethostbyname can open /etc/resolv.conf */
    ALLOW_SYSCALL(open),
    ALLOW_SYSCALL(read),
    ALLOW_SYSCALL(mmap),
    ALLOW_SYSCALL(mmap2),
    ALLOW_SYSCALL(munmap),
    ALLOW_SYSCALL(lseek),
    ALLOW_SYSCALL(_llseek),
    ALLOW_SYSCALL(close),

    /* for our time keeping */
    ALLOW_SYSCALL(gettimeofday),	// x86_64 uses a vsyscall for this, so this filter will never trigger

    /* for when buffer writes the output; since we only write to stdout, filter for fd==1 */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 0, 4),
    /* it's write(2).  Load first argument into accumulator */
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, args[0])),
    /* if it's 1 (stdout), skip 1 instruction */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 1, 1, 0),
    /* "return SECCOMP_RET_KILL", tell seccomp to kill the process */
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
    /* "return SECCOMP_RET_ALLOW", tell seccomp to allow the syscall */
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),

    /* if none of these syscalls matched, kill the process */

security problem with seccomp-filter

2015-03-27 Thread Felix von Leitner
Hi,

I have had some great success with seccomp-filter a while ago, so I
decided to use it to add some defense in depth to a ping program I wrote.

The premise is, like for all ping programs I assume, that it starts
setuid root, gets a raw socket, drops privileges, parses the command
line, potentially does a DNS lookup, and then it sends and receives
packets, using gettimeofday and poll.

So I added a seccomp filter that allows this. But where do you put it?
Ideally you'd want the filter installed right away after dropping
privileges, so the command line parsing and the DNS routines are
secured, too. But then you'd allow unnecessary attack surface (why allow
open after the DNS routines are done parsing /etc/resolv.conf, for
example?).

The documentation says you can add more than one seccomp filter, just
call prctl multiple times and allow prctl initially.

So that's what I did.

But when I added the secondary filters (which would blacklist open and
setsockopt), and for double checking tried installing the last one twice
(after the last one was supposed to blacklist prctl), to my surprise
my attempt did not lead to process termination but to a success return
value.

I think this is a serious security breach. Maybe I am the first one to
attempt to install multiple seccomp filters in the same process?
The observed behavior is consistent with only the first filter being
consulted.

I'm using stock kernel 3.19 for what it's worth.

Thanks,

Felix
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: bizarre network timing problem

2007-11-02 Thread Felix von Leitner
Thus spake Rick Jones ([EMAIL PROTECTED]):
> Past performance is no guarantee of current correctness :)  And over an 
> Ethernet, there will be a very different set of both timings and TCP 
> segment sizes compared to loopback.

> My guess is that you will find setting the lo mtu to 1500 a very 
> interesting experiment.

Setting the MTU on lo to 1500 eliminates the problem and gives me double
digit MB/sec throughput.

Felix


Re: bizarre network timing problem

2007-11-02 Thread Felix von Leitner
Thus spake Rick Jones ([EMAIL PROTECTED]):
> >Oh I'm pretty sure it's not my application, because my application performs
> >well over ethernet, which is after all its purpose.  Also I see the
> >write, the TCP uncork, then a pause, and then the packet leaving.
> Well, a wise old engineer tried to teach me that the proper spelling is 
> ass-u-me :) so just for grins, you might try the TCP_RR test anyway :)  And 
> even if your application is correct (although I wonder why the receiver 
> isn't sucking data-out very quickly...) if you can reproduce the problem 
> with netperf it will be easier for others to do so.

My application is only the server, the receiver is smbget from Samba, so
I don't feel responsible for it :-)

Still, when run over Ethernet, it works fine without waiting for
timeouts to expire.

To reproduce this:

  - smbget is from samba, you probably already have this
  - gatling (my server) can be gotten from
cvs -d :pserver:[EMAIL PROTECTED]:/cvs -z9 co dietlibc libowfat gatling

dietlibc is not strictly needed, but it's my environment.
First build dietlibc, then libowfat, then gatling.

Felix


Re: TCP_DEFER_ACCEPT issues

2007-11-02 Thread Felix von Leitner
Thus spake Eric Dumazet ([EMAIL PROTECTED]):
> 1) Setting a timeout in a millisecond range (< 1000) is not very good 
> because some clients may need much more time to send your server the data 
> (very long distance). So a second granularity is OK.

I want millisecond accuracy for consistency.  select and poll have it,
we have a 1000 Hz timer, we should also expose that accuracy.  I don't
want to have sub second timeouts, in case you were wondering.

> 2) After timeout is elapsed, the server tcp stack has no socket associated 
> to your client attempt. So closing the server listening socket wont be able 
> to send RST. I agree a RST *should* be sent by the server once the timeout 
> is triggered.

I don't see any evidence for a timeout happening at all.
I passed 1 as argument to the setsockopt, so I'd expect a timeout to
happen pretty quickly.  There was no connection reset until I Ctrl-C'd
the server 15 minutes (!) later.

> A typical tcpdump of what is happening for a tcp_defer_accept timeout of 20 
> seconds is :

> [1]08:52:47.480291 IP client.60930 > server.http: S 
> 2498995442:2498995442(0) win 5840 <mss 1460,sackOK,timestamp 2685904595 
> 0,nop,wscale 2>
> [2]08:52:47.480302 IP server.http > client.60930: S 
> 1173302644:1173302644(0) ack 2498995443 win 5840 <mss 1460>
> [3]08:52:47.481669 IP client.60930 > server.http: . ack 1 win 5840

> [4]08:52:50.757543 IP server.http > client.60930: S 
> 1173302644:1173302644(0) ack 2498995443 win 5840 <mss 1460>
> [5]08:52:50.758953 IP client.60930 > server.http: . ack 1 win 5840

> [6]08:52:56.760611 IP server.http > client.60930: S 
> 1173302644:1173302644(0) ack 2498995443 win 5840 <mss 1460>
> [7]08:52:56.761886 IP client.60930 > server.http: . ack 1 win 5840

> [8]08:53:08.771254 IP server.http > client.60930: S 
> 1173302644:1173302644(0) ack 2498995443 win 5840 <mss 1460>
> [9]08:53:08.772514 IP client.60930 > server.http: . ack 1 win 5840

> [10]08:53:32.782488 IP server.http > client.60930: S 
> 1173302644:1173302644(0) ack 2498995443 win 5840 <mss 1460>
> [11]08:53:32.783754 IP client.60930 > server.http: . ack 1 win 5840

> <a very long time, then client finally sends 2 bytes>

> [12]08:59:30.509097 IP client.60930 > server.http: P 1:3(2) ack 1 win 5840
> [13]08:59:30.509125 IP server.http > client.60930: R 
> 1173302645:1173302645(0) win 0

I see this, too.  If I connect and do not send anything, I expect the
kernel to drop the connection when the timeout is reached.  Nothing like
that happens.

> So TCP_DEFER_ACCEPT might send way more packets than needed.

Only in the face of attackers, and after the handshake.  I could live
with that.  If the timeout happened.

> We only should wait for the data coming from the client to be able to pass 
> the new socket to the listening application.

Yes.  And we should send a RST if no data is coming in within the
timeout, which is not happening for me (2.6.23).

Felix


Re: bizarre network timing problem

2007-11-02 Thread Felix von Leitner
Thus spake Rick Jones ([EMAIL PROTECTED]):
> >How could I test this theory?
> Can you take another trace that isn't so "cooked?"  One that just sticks 
> with TCP-level and below stuff?

Sorry for taking so long.  Here is a tcpdump.  The side on port 445 is
the SMB server using TCP_CORK.

23:03:20.283772 IP 127.0.0.1.33230 > 127.0.0.1.445: S 1503927325:1503927325(0)
win 32792 <mss 16396,sackOK,timestamp 9451736 0,nop,wscale 7>
23:03:20.283774 IP 127.0.0.1.445 > 127.0.0.1.33230: S 1513925692:1513925692(0)
ack 1503927326 win 32768 <mss 16396,sackOK,timestamp 9451737 9451736,nop,wscale 7>
23:03:20.283797 IP 127.0.0.1.33230 > 127.0.0.1.445: . ack 1 win 257
<nop,nop,timestamp 9451737 9451737>
23:03:20.295851 IP 127.0.0.1.33230 > 127.0.0.1.445: P 1:195(194) ack 1 win 257
<nop,nop,timestamp 9451740 9451737>
23:03:20.295881 IP 127.0.0.1.445 > 127.0.0.1.33230: . ack 195 win 265
<nop,nop,timestamp 9451740 9451740>
23:03:20.295959 IP 127.0.0.1.445 > 127.0.0.1.33230: P 1:87(86) ack 195 win 265
<nop,nop,timestamp 9451740 9451740>
23:03:20.295998 IP 127.0.0.1.33230 > 127.0.0.1.445: . ack 87 win 256
<nop,nop,timestamp 9451740 9451740>
23:03:20.296063 IP 127.0.0.1.33230 > 127.0.0.1.445: P 195:287(92) ack 87 win
256 <nop,nop,timestamp 9451740 9451740>
23:03:20.296096 IP 127.0.0.1.445 > 127.0.0.1.33230: P 87:181(94) ack 287 win
265 <nop,nop,timestamp 9451740 9451740>
23:03:20.296135 IP 127.0.0.1.33230 > 127.0.0.1.445: P 287:373(86) ack 181 win
255 <nop,nop,timestamp 9451740 9451740>
23:03:20.296163 IP 127.0.0.1.445 > 127.0.0.1.33230: P 181:239(58) ack 373 win
265 <nop,nop,timestamp 9451740 9451740>
23:03:20.296201 IP 127.0.0.1.33230 > 127.0.0.1.445: P 373:459(86) ack 239 win
255 <nop,nop,timestamp 9451740 9451740>
23:03:20.296245 IP 127.0.0.1.445 > 127.0.0.1.33230: P 239:309(70) ack 459 win
265 <nop,nop,timestamp 9451740 9451740>
23:03:20.296286 IP 127.0.0.1.33230 > 127.0.0.1.445: P 459:535(76) ack 309 win
254 <nop,nop,timestamp 9451740 9451740>
23:03:20.296314 IP 127.0.0.1.445 > 127.0.0.1.33230: P 309:461(152) ack 535 win
265 <nop,nop,timestamp 9451740 9451740>
23:03:20.296361 IP 127.0.0.1.33230 > 127.0.0.1.445: P 535:594(59) ack 461 win
253 <nop,nop,timestamp 9451740 9451740>
23:03:20.296400 IP 127.0.0.1.445 > 127.0.0.1.33230: . 461:16845(16384) ack 594
win 265 <nop,nop,timestamp 9451740 9451740>
23:03:20.335748 IP 127.0.0.1.33230 > 127.0.0.1.445: . ack 16845 win 125
<nop,nop,timestamp 9451750 9451740>

[note the .2 sec pause]

23:03:20.547763 IP 127.0.0.1.445 > 127.0.0.1.33230: P 16845:32845(16000) ack
594 win 265 <nop,nop,timestamp 9451803 9451750>
23:03:20.547797 IP 127.0.0.1.33230 > 127.0.0.1.445: . ack 32845 win 0
<nop,nop,timestamp 9451803 9451803>
23:03:20.547855 IP 127.0.0.1.33230 > 127.0.0.1.445: . ack 32845 win 96
<nop,nop,timestamp 9451803 9451803>
23:03:20.547863 IP 127.0.0.1.445 > 127.0.0.1.33230: P 32845:33229(384) ack 594
win 265 <nop,nop,timestamp 9451803 9451803>
23:03:20.547890 IP 127.0.0.1.33230 > 127.0.0.1.445: . ack 33229 win 96
<nop,nop,timestamp 9451803 9451803>

[note the .2 sec pause]

23:03:20.755775 IP 127.0.0.1.445 > 127.0.0.1.33230: P 33229:45517(12288) ack
594 win 265 <nop,nop,timestamp 9451855 9451803>
23:03:20.755855 IP 127.0.0.1.33230 > 127.0.0.1.445: . ack 45517 win 96
<nop,nop,timestamp 9451855 9451855>
23:03:20.755868 IP 127.0.0.1.445 > 127.0.0.1.33230: P 45517:49613(4096) ack 594
win 265 <nop,nop,timestamp 9451855 9451855>
23:03:20.755898 IP 127.0.0.1.33230 > 127.0.0.1.445: . ack 49613 win 96
<nop,nop,timestamp 9451855 9451855>

[another one]

23:03:20.963789 IP 127.0.0.1.445 > 127.0.0.1.33230: P 49613:61901(12288) ack
594 win 265 <nop,nop,timestamp 9451907 9451855>
23:03:20.963871 IP 127.0.0.1.33230 > 127.0.0.1.445: . ack 61901 win 96
<nop,nop,timestamp 9451907 9451907>
23:03:20.963885 IP 127.0.0.1.445 > 127.0.0.1.33230: P 61901:64525(2624) ack 594
win 265 <nop,nop,timestamp 9451907 9451907>
23:03:20.963909 IP 127.0.0.1.33230 > 127.0.0.1.445: . ack 64525 win 96
<nop,nop,timestamp 9451907 9451907>
23:03:20.964101 IP 127.0.0.1.33230 > 127.0.0.1.445: P 594:653(59) ack 64525 win
96 <nop,nop,timestamp 9451907 9451907>
23:03:21.003790 IP 127.0.0.1.445 > 127.0.0.1.33230: . ack 653 win 265
<nop,nop,timestamp 9451917 9451907>
23:03:21.171811 IP 127.0.0.1.445 > 127.0.0.1.33230: P 64525:76813(12288) ack
653 win 265 <nop,nop,timestamp 9451959 9451907>

You get the idea.

Anyway, now THIS is the interesting case, because we have two packets in
the answer, and you see the first half of the answer leaving immediately
(when I wanted the whole answer to be sent) but the second only leaving
after the .2 sec delay.
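
The size of the pause can be read straight off the timestamps. Here is a small sketch that computes the gap between two trace lines. ts_to_us and gap_us are hypothetical helpers, and the sketch assumes both timestamps fall on the same day and carry six fractional digits.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a tcpdump-style "HH:MM:SS.uuuuuu" timestamp to microseconds
 * since midnight. Returns -1 if the string does not parse. */
long long ts_to_us(const char *ts)
{
    int h, m, s;
    long us;

    assert(ts != NULL);
    if (sscanf(ts, "%d:%d:%d.%6ld", &h, &m, &s, &us) != 4)
        return -1;
    return ((h * 60LL + m) * 60 + s) * 1000000 + us;
}

/* Microseconds elapsed between two timestamps on the same day. */
long long gap_us(const char *earlier, const char *later)
{
    return ts_to_us(later) - ts_to_us(earlier);
}
```

gap_us("23:03:20.335748", "23:03:20.547763") returns 212015. That is about 0.21 s between the client's ACK and the next data segment from the server.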

> If SMB is a one-request-at-a-time protocol (I can never remember),

It is.

> you 
> could simulate it with a netperf TCP_RR test by passing suitable values to 
> the test-specific -r option:

> netperf -H <remote> -t TCP_RR -- -r <req>,<rsp>

> If that shows similar behaviour then you can ass-u-me it isn't your 
> application.

Oh I'm pretty sure it's not my application, because my application performs
well over ethernet, which is after all its purpose.  Also I see the
write, the TCP uncork, then a pause, and then the packet leaving.

Felix


TCP_DEFER_ACCEPT issues

2007-11-01 Thread Felix von Leitner
I am trying to use TCP_DEFER_ACCEPT in my web server.

There are some operational problems.  First of all: timeout handling.  I
would like to be able to set a timeout in seconds (or better:
milliseconds) for how long the socket is allowed to sit there without
data coming in.  For high load situations, I have been enforcing
timeouts in the range of 15 seconds, otherwise someone can DoS the
server by opening a lot of connections and tying up data structures.
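
A minimal sketch of such an application-level idle timeout, using poll with its millisecond argument. wait_readable and drop_if_idle are hypothetical helper names, not functions from the server.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if poll reports an event (data, hangup or error on the fd),
 * 0 on timeout, -1 if poll itself failed. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    assert(timeout_ms >= 0);
    rc = poll(&p, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

/* Close fd unless something arrives within timeout_ms.
 * Returns 1 if the connection was dropped, 0 if it was kept. */
int drop_if_idle(int fd, int timeout_ms)
{
    if (wait_readable(fd, timeout_ms) == 1)
        return 0;
    close(fd);   /* timeout or poll error: give up on this client */
    return 1;
}
```

For the 15 second limit mentioned above the call would be drop_if_idle(fd, 15000). A real server would fold this into its event loop rather than block per connection.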

It is still possible, of course, to tie up kernel memory this way, by
not reacting to the FIN or RST packets and running into a timeout there,
too, but that is partially tunable via sysctl.

According to tcp(7) the int argument to TCP_DEFER_ACCEPT is in seconds.
In the kernel code, it's converted to TCP timeout units.  When I ran my
server, and connected without sending any data, nothing happened.  No
timeout.  Minutes later, the connection was still there.  Even worse:
when I killed (!) the server process (thus closing the server socket),
the client did not get a reset.  Only when I typed something into the
telnet session did I get a reset.  This appears to be very broken.
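
For context, this is roughly how the option is set. It is a minimal sketch, with set_defer_accept a hypothetical helper name, taking the value in seconds as tcp(7) describes.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask the kernel to hold back accept() on listen_fd until data arrives.
 * seconds is the value documented in tcp(7).
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_defer_accept(int listen_fd, int seconds)
{
    assert(seconds >= 0);
    return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                      &seconds, sizeof seconds);
}
```

The call goes on the listening socket, for example set_defer_accept(listen_fd, 15).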

My suggestion:

  1. make the argument to the setsockopt be in seconds, or milliseconds.
  2. if the server socket is closed, reset all pending connections.

Comments?

Felix


Re: bizarre network timing problem

2007-10-18 Thread Felix von Leitner
> the packet trace was a bit too cooked perhaps, but there were indications 
> that at times the TCP window was going to zero - perhaps something with 
> window updates or persist timers?

Does TCP use different window sizes on loopback?  Why is this not
happening on ethernet?

How could I test this theory?

My initial idea was that it has something to do with the different MTU on
loopback.  My initial block size was 16k, but the problem stayed when I
changed it to 64k.

Felix


Re: bizarre network timing problem

2007-10-17 Thread Felix von Leitner
Thus spake Chuck Ebbert ([EMAIL PROTECTED]):
> > Any ideas what could cause this?
> (cc: netdev)

Maybe I should mention this, too:

accept(5, {sa_family=AF_INET6, sin6_port=htons(59821), inet_pton(AF_INET6, 
"::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 
[18446744069414584348]) = 8
setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0

And if it would be the Nagle algorithm, it should also impact the
ethernet case.

Felix


bizarre network timing problem

2007-10-17 Thread Felix von Leitner
I wrote a small read-only SMB server, and wanted to see how fast it was.
So I used smbget to download a moderately large file from it via localhost.
smbget only got ~70 KB/sec.

This is what the view from strace -tt on the server is:

22:44:58.812467 read(8, 
"\0\0\0007\377SMB.\0\0\0\0\10\1\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\232\3"..., 
8192) = 59
22:44:58.812619 mmap(NULL, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b46b8e5e000
22:44:58.812729 fcntl(9, F_GETFL)   = 0x8000 (flags O_RDONLY|O_LARGEFILE)
22:44:58.812847 epoll_ctl(7, EPOLL_CTL_DEL, 8, {0, {u32=8, 
u64=13323248792850399240}}) = 0
22:44:58.812946 epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLOUT, {u32=8, 
u64=18251433459580936}}) = 0
22:44:58.813039 epoll_wait(7, {{EPOLLOUT, {u32=8, u64=18251433459580936}}}, 
100, 442) = 1
22:44:58.813132 setsockopt(8, SOL_TCP, TCP_CORK, [1], 4) = 0
22:44:58.813215 write(8, 
"\0\0\372<\377SMB.\0\0\0\0\200A\300\0\0\0\0\0\0\0\0\0\0\0\0\0\0\232\3"..., 64) 
= 64
22:44:58.813323 sendfile(8, 9, [128000], 64000) = 64000
22:44:58.813430 setsockopt(8, SOL_TCP, TCP_CORK, [0], 4) = 0
22:44:58.813511 munmap(0x2b46b8e5e000, 8192) = 0
22:44:58.813600 epoll_wait(7, {{EPOLLOUT, {u32=8, u64=18251433459580936}}}, 
100, 442) = 1
22:44:58.813693 epoll_ctl(7, EPOLL_CTL_DEL, 8, {0, {u32=8, u64=8}}) = 0
22:44:58.813778 epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=8, 
u64=18252000395264008}}) = 0
22:44:58.813869 epoll_wait(7, {}, 100, 441) = 0
22:44:59.255789 epoll_wait(7, {{EPOLLIN, {u32=8, u64=18252000395264008}}}, 100, 
999) = 1
22:44:59.688519 read(8, 
"\0\0\0007\377SMB.\0\0\0\0\10\1\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\232\3"..., 
8192) = 59


As you can see, the time difference between reading the query and writing the
result is very small, but there is a big delay before receiving the next 
request.

This is the view from a sniffer on the lo interface:

1192653899.688385 127.0.0.1 -> 127.0.0.1 SMB Read AndX Request, FID: 
0x0001, 64000 bytes at offset 192000
1192653899.688399 127.0.0.1 -> 127.0.0.1 TCP 445 > 42990 [ACK] Seq=192660 
Ack=779 Win=33920 Len=0 TSV=359208 TSER=359208
1192653899.895725 127.0.0.1 -> 127.0.0.1 SMB [TCP Window Full] Read AndX 
Response, FID: 0x0001, 64000 bytes
1192653899.895793 127.0.0.1 -> 127.0.0.1 TCP 42990 > 445 [ACK] Seq=779 
Ack=204948 Win=12288 Len=0 TSV=359260 TSER=359260
1192653899.895805 127.0.0.1 -> 127.0.0.1 NBSS NBSS Continuation Message
1192653899.935725 127.0.0.1 -> 127.0.0.1 TCP 42990 > 445 [ACK] Seq=779 
Ack=209044 Win=12288 Len=0 TSV=359270 TSER=359260
1192653900.147739 127.0.0.1 -> 127.0.0.1 NBSS [TCP Window Full] NBSS 
Continuation Message
1192653900.147767 127.0.0.1 -> 127.0.0.1 TCP [TCP ZeroWindow] 42990 > 445 
[ACK] Seq=779 Ack=221332 Win=0 Len=0 TSV=359323 TSER=359323
1192653900.147807 127.0.0.1 -> 127.0.0.1 TCP [TCP Window Update] 42990 > 
445 [ACK] Seq=779 Ack=221332 Win=12288 Len=0 TSV=359323 TSER=359323
1192653900.147815 127.0.0.1 -> 127.0.0.1 NBSS NBSS Continuation Message
1192653900.147837 127.0.0.1 -> 127.0.0.1 TCP 42990 > 445 [ACK] Seq=779 
Ack=225428 Win=12288 Len=0 TSV=359323 TSER=359323
1192653900.355754 127.0.0.1 -> 127.0.0.1 NBSS [TCP Window Full] NBSS 
Continuation Message
1192653900.355782 127.0.0.1 -> 127.0.0.1 TCP [TCP ZeroWindow] 42990 > 445 
[ACK] Seq=779 Ack=237716 Win=0 Len=0 TSV=359375 TSER=359375
1192653900.355820 127.0.0.1 -> 127.0.0.1 TCP [TCP Window Update] 42990 > 
445 [ACK] Seq=779 Ack=237716 Win=12288 Len=0 TSV=359375 TSER=359375
1192653900.355829 127.0.0.1 -> 127.0.0.1 NBSS NBSS Continuation Message
1192653900.355849 127.0.0.1 -> 127.0.0.1 TCP 42990 > 445 [ACK] Seq=779 
Ack=241812 Win=12288 Len=0 TSV=359375 TSER=359375
1192653900.563766 127.0.0.1 -> 127.0.0.1 NBSS [TCP Window Full] NBSS 
Continuation Message
1192653900.563794 127.0.0.1 -> 127.0.0.1 TCP [TCP ZeroWindow] 42990 > 445 
[ACK] Seq=779 Ack=254100 Win=0 Len=0 TSV=359427 TSER=359427
1192653900.563831 127.0.0.1 -> 127.0.0.1 TCP [TCP Window Update] 42990 > 
445 [ACK] Seq=779 Ack=254100 Win=12288 Len=0 TSV=359427 TSER=359427
1192653900.563839 127.0.0.1 -> 127.0.0.1 NBSS NBSS Continuation Message
1192653900.563858 127.0.0.1 -> 127.0.0.1 TCP 42990 > 445 [ACK] Seq=779 
Ack=256724 Win=12288 Len=0 TSV=359427 TSER=359427
1192653900.56 127.0.0.1 -> 127.0.0.1 SMB Read AndX Request, FID: 
0x0001, 64000 bytes at offset 256000


Note the delay between sending the response and getting the reply.
Also note that there is almost no delay between getting the reply and sending
the next request.

My understanding of TCP_CORK from the tcp(7) man page is that clearing the
option should flush out the queued data immediately, but the network trace
seems to suggest that there is a 200 ms delay between the request and the
outgoing data.  tcp(7) says there is a 200 ms ceiling on how long output stays
corked, so uncorking does not appear to work.
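
For reference, the cork / write / sendfile / uncork sequence described here
can be sketched in C roughly as follows.  This is a minimal sketch, not the
server's actual code: set_cork() and send_reply_corked() are hypothetical
helper names, a blocking socket is assumed, and EINTR handling is left out.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set (on != 0) or clear (on == 0) TCP_CORK; returns 0 or -1. */
int set_cork(int sock, int on)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
}

/* Hypothetical helper: send an in-memory header followed by `count`
 * bytes of file_fd starting at `offset`, corked so that both parts are
 * coalesced.  Returns 0 on success, -1 on error. */
int send_reply_corked(int sock, const void *hdr, size_t hdr_len,
                      int file_fd, off_t offset, size_t count)
{
    const char *p = hdr;
    int rc = 0;

    if (set_cork(sock, 1) < 0)
        return -1;

    while (rc == 0 && hdr_len > 0) {
        ssize_t n = write(sock, p, hdr_len);
        if (n < 0) {
            rc = -1;
        } else {
            p += n;
            hdr_len -= (size_t)n;
        }
    }
    while (rc == 0 && count > 0) {
        ssize_t n = sendfile(sock, file_fd, &offset, count);
        if (n <= 0)
            rc = -1;            /* error, or file shorter than expected */
        else
            count -= (size_t)n;
    }

    /* Uncork even on failure; this is the step that is expected to push
     * out any partial frame still sitting in the send queue. */
    if (set_cork(sock, 0) < 0)
        rc = -1;
    return rc;
}
```

Reading the option back with getsockopt() after each toggle is a cheap way
to confirm that the uncork call really reached the socket.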


Now for the strange part: the same code works without a 200 ms delay

Re: bizarre network timing problem

2007-10-17 Thread Felix von Leitner
Thus spake Chuck Ebbert ([EMAIL PROTECTED]):
> > Any ideas what could cause this?
> (cc: netdev)

Maybe I should mention this, too:

accept(5, {sa_family=AF_INET6, sin6_port=htons(59821), inet_pton(AF_INET6,
"::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0},
[18446744069414584348]) = 8
setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0

And if it were the Nagle algorithm, it should also affect the
Ethernet case.
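
For readers following along, the setsockopt() call in that strace line
corresponds to something like the sketch below.  set_nodelay() and
get_nodelay() are hypothetical helper names, not taken from the server; the
read-back helper is only there to check that the option actually stuck.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket; returns 0 or -1. */
int set_nodelay(int sock)
{
    int one = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}

/* Returns 1 if TCP_NODELAY is set, 0 if it is not, -1 on error. */
int get_nodelay(int sock)
{
    int value = 0;
    socklen_t len = sizeof value;

    if (getsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

Calling get_nodelay() on the accepted descriptor right before the first
write would rule out the option having been lost somewhere along the way.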

Felix
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


nforce 4 audio has no s/pdif out

2005-03-24 Thread Felix von Leitner
My shiny new nforce 4 main board has sound that is detected OK by ALSA:

  intel8x0_measure_ac97_clock: measured 49970 usecs
  intel8x0: clocking to 46877
  ALSA device list:
    #0: NVidia CK804 with ALC850 at 0xd2003000, irq 185

but I can't get my stereo to play.  It is connected via optical S/PDIF.
Works fine under Windoze, so the hardware is ok.

Any idea what I could do?

Felix


Re: 2.6.11: USB broken on nforce4, ipv6 still broken, centrino speedstep even more broken than in 2.6.10

2005-03-24 Thread Felix von Leitner
Thus spake Jeremy Fitzhardinge ([EMAIL PROTECTED]):
> Unfortunately, the Dothans *REQUIRE* some degree of ACPI support; the
> speedfreq-centrino needs to extract a table from ACPI to know what are
> valid operating (voltage/frequency) points to use for the CPU.  The
> patch you're using is definitely wrong in principle, though if it works
> for you in practice then by all means use it.

I enabled these:

  CONFIG_CPU_FREQ=y
  CONFIG_CPU_FREQ_STAT=y
  CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
  CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
  CONFIG_CPU_FREQ_GOV_POWERSAVE=y
  CONFIG_CPU_FREQ_GOV_USERSPACE=y
  CONFIG_CPU_FREQ_GOV_ONDEMAND=y
  CONFIG_CPU_FREQ_TABLE=y
  CONFIG_X86_SPEEDSTEP_CENTRINO=y
  CONFIG_X86_SPEEDSTEP_CENTRINO_ACPI=y
  CONFIG_X86_SPEEDSTEP_CENTRINO_TABLE=y

It should have worked, shouldn't it?

Well, it did not.  You can look at the kernel messages at
http://dl.fefe.de/dmesg.gz if that helps.

No cpufreq, and as far as I can see, no speedstep.
The fan is running, that's all I can tell.
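
A quick way to tell whether a cpufreq driver bound at all is to look for the
sysfs governor file.  The sketch below assumes the usual 2.6 sysfs layout
(/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor); read_first_line()
and cpufreq_present() are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of `path` into buf, without the trailing newline.
 * Returns 0 on success, -1 if the file is missing or empty. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    char *ok;

    if (!f)
        return -1;
    ok = fgets(buf, (int)len, f);
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns 1 and fills `governor` if cpu0 exposes a cpufreq governor,
 * 0 if the sysfs file is absent (no cpufreq driver bound to cpu0). */
int cpufreq_present(char *governor, size_t len)
{
    return read_first_line(
        "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
        governor, len) == 0;
}
```

If cpufreq_present() returns 0, no driver registered for cpu0, which would
match the "no cpufreq" observation above.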

Felix


Re: 2.6.11: USB broken on nforce4, ipv6 still broken, centrino speedstep even more broken than in 2.6.10

2005-03-22 Thread Felix von Leitner
Thus spake Adam Belay ([EMAIL PROTECTED]):
> > > Why not use ACPI for CPU scaling?
> > Felix, did you try this?
> ACPI is the preferred (and only standardized) method of controlling cpu
> throttling on x86 systems.

  1. I don't trust ACPI
  2. my battery runs out quicker with ACPI compared to cpufreq

I _really_ _really_ don't want ACPI.  No, really not.  This is no idle
decision.  My current notebook is the only hardware I have ever seen
where enabling ACPI does not completely break Linux.  That is out of all
my 10+ machines, including the other 3 that are actually in use.

Which ACPI way do you mean, by the way?  Just enabling ACPI with thermal
and CPU or the cpufreq ACPI driver?  I think I tried that driver and did
not get the /sys interface to switch frequencies and governors.  If I
must, I can try again with 2.6.11, but I really really really do not
want to use ACPI, unless someone with a big shotgun is standing behind
me.

> Also, as I said earlier, I wanted to see an lspci for the usb issues.

0000:00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
0000:00:01.0 ISA bridge: nVidia Corporation: Unknown device 0050 (rev a3)
0000:00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
0000:00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
0000:00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
0000:00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
0000:00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
0000:00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
0000:00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
0000:00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
0000:00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
0000:00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
0000:00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
0000:00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
0000:00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
0000:00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
0000:00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
0000:00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
0000:00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
0000:01:00.0 VGA compatible controller: nVidia Corporation: Unknown device 0141 (rev a2)
0000:05:0b.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
0000:05:0c.0 Ethernet controller: Marvell Technology Group Ltd. Gigabit Ethernet Controller (rev 13)

This kernel is stock 2.6.11 with CONFIG_USB_DEBUG=y.

When I put in my USB hub with my USB webcam, I get this:

Mar 22 23:25:40 demilich kernel: hub 1-0:1.0: state 5 ports 10 chg 0000 evt 0400
Mar 22 23:25:40 demilich kernel: ehci_hcd 0000:00:02.1: GetStatus port 10 status 001803 POWER sig=j  CSC CONNECT
Mar 22 23:25:40 demilich kernel: hub 1-0:1.0: port 10, status 0501, change 0001, 480 Mb/s
Mar 22 23:25:40 demilich kernel: hub 1-0:1.0: debounce: port 10: total 100ms stable 100ms status 0x501
Mar 22 23:25:40 demilich kernel: ehci_hcd 0000:00:02.1: port 10 high speed
Mar 22 23:25:40 demilich kernel: ehci_hcd 0000:00:02.1: GetStatus port 10 status 001005 POWER sig=se0  PE CONNECT
Mar 22 23:25:40 demilich kernel: usb 1-10: new high speed USB device using ehci_hcd and address 3
Mar 22 23:25:40 demilich kernel: ehci_hcd 0000:00:02.1: port 10 reset error -110
Mar 22 23:25:40 demilich kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Mar 22 23:25:40 demilich kernel: hub 1-0:1.0: port 10 not enabled, trying reset again...
Mar 22 23:25:40 demilich kernel: ehci_hcd 0000:00:02.1: port 10 reset error -110
Mar 22 23:25:40 demilich kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Mar 22 23:25:40 demilich kernel: hub 1-0:1.0: port 10 not enabled, trying reset again...
Mar 22 23:25:40 demilich kernel: ehci_hcd 0000:00:02.1: port 10 reset error -110
Mar 22 23:25:40 demilich kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Mar 22 23:25:40 demilich kernel: hub 1-0:1.0: port 10 not enabled, trying reset again...
Mar 22 23:25:41 demilich kernel: ehci_hcd 0000:00:02.1: port 10 high speed
Mar 22 23:25:41 demilich kernel: ehci_hcd 0000:00:02.1: GetStatus port 10 status 001005 POWER sig=se0  PE CONNECT
Mar 22 23:25:41 demilich kernel: usb 1-10: new device strings: Mfr=0, Product=0, SerialNumber=0
Mar 22 23:25:41 demilich kernel: usb 1-10: hotplug
Mar 22 23:25:41 demilich kernel: usb 1-10: adding 1-10:1.0 (config #1, interface 0)
Mar 22 23:25:41 demilich kernel: usb 1-10:1.0: hotplug

(the line with Mfr=0 looks wrong to me).

Now pulling the device and putting it on through my USB hub (same hardware port
on the 

Re: 2.6.11: USB broken on nforce4, ipv6 still broken, centrino speedstep even more broken than in 2.6.10

2005-03-13 Thread Felix von Leitner
Thus spake Andrew Morton ([EMAIL PROTECTED]):
> > Finally Centrino SpeedStep.
> > I have a "Intel(R) Pentium(R) M processor 1.80GHz" in my notebook.
> > Linux does not support it.  This architecture has been out there for
> > months now, and there even was a patch to support it posted here a in
> > October last year or so.  Linux still does not include it.  Until
> > 2.6.11-rc4-bk8 or so, the old patched file from back then still worked.
> > Now it doesn't.  Because some interface changed.  Now what?  Using a
> > Centrino notebook without CPU throttling is completely out of the
> > question.  Linux might as well not boot on it at all.
> Could you please dig out the old patch, send it?

I didn't keep the patch, but I kept the patched C file.
I'll attach it.

Felix


centrino-speedstep.tar.gz
Description: Binary data


Re: 2.6.11: USB broken on nforce4, ipv6 still broken, centrino speedstep even more broken than in 2.6.10

2005-03-13 Thread Felix von Leitner
Thus spake Andrew Morton ([EMAIL PROTECTED]):
> > My new nForce 4 mainboard has 10 or so USB 2.0 outlets.  In Windows,
> > they all work.  In Linux, two of them work.  Putting my USB stick or
> > anything else in one of the others produces nothing in Linux.
> > Apparently no IRQ getting through or something?

> Did it work correctly on any earlier kernel?  If so, which one(s)?

It turns out the ports do work with 2.6.11; I was running rc4 when I
last observed it break.

Sorry for the bad bug report.

Felix


2.6.11: USB broken on nforce4, ipv6 still broken, centrino speedstep even more broken than in 2.6.10

2005-03-11 Thread Felix von Leitner
Linux is getting less and less usable for me. :-(


My new nForce 4 mainboard has 10 or so USB 2.0 outlets.  In Windows,
they all work.  In Linux, two of them work.  Putting my USB stick or
anything else in one of the others produces nothing in Linux.
Apparently no IRQ getting through or something?

This is what /proc/interrupts has to say:

  177:    9503618   IO-APIC-level  ohci_hcd, eth0

These are the USB boot messages:

  usbcore: registered new driver usbfs
  usbcore: registered new driver hub
  ehci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 1
  ehci_hcd 0000:00:02.1: USB 2.0 initialized, EHCI 1.00, driver 26 Oct 2004
  hub 1-0:1.0: USB hub found
  ohci_hcd: 2004 Nov 08 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
  ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
  hub 2-0:1.0: USB hub found
  usbcore: registered new driver usblp
  drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
  Initializing USB Mass Storage driver...
  usb 2-4: new low speed USB device using ohci_hcd and address 2
  usbcore: registered new driver usb-storage
  USB Mass Storage support registered.
  input: USB HID v1.10 Mouse [B16_b_02 USB-PS/2 Optical Mouse] on usb-0000:00:02.0-4
  usbcore: registered new driver usbhid
  drivers/usb/input/hid-core.c: v2.0:USB HID core driver
  HUB0 XVR0 XVR1 XVR2 XVR3 USB0 USB2 MMAC MMCI UAR1

As you can see, it appears to work in principle.



Now about IPv6: npush and npoll are two applications I wrote.  npush
sends multicast announcements and opens a TCP socket.  npoll receives
the multicast announcement and connects to the source IP/port/scope_id
of the announcement.  If both are run on the same machine, npoll sees
the link local address of eth0 as source IP, and the interface number of
eth0 as scope_id.  So far so good.  Trying to connect() however hangs.
This has been broken in different ways for as long as I can
remember in Linux, and I keep complaining about it every half a year or
so.  Can't someone fix this once and for all?  IPv4 checks whether we
are connecting to our own address and reroutes through loopback, why
can't IPv6?
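
To make the scenario concrete, here is a rough sketch of how a receiver
like npoll might build the destination address for a link-local peer.  This
is not the actual npoll source: linklocal_sockaddr() and connect_linklocal()
are hypothetical names, and the interface is looked up by name purely for
illustration (npoll takes the scope_id from the announcement itself).

```c
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill `sa` for a link-local peer from address text, TCP port and
 * interface name.  Returns 0 on success, -1 on a bad address or an
 * unknown interface. */
int linklocal_sockaddr(const char *addr, unsigned short port,
                       const char *ifname, struct sockaddr_in6 *sa)
{
    memset(sa, 0, sizeof *sa);
    sa->sin6_family = AF_INET6;
    sa->sin6_port = htons(port);
    if (inet_pton(AF_INET6, addr, &sa->sin6_addr) != 1)
        return -1;
    /* Link-local addresses are ambiguous without the interface index. */
    sa->sin6_scope_id = if_nametoindex(ifname);   /* 0: no such interface */
    return sa->sin6_scope_id ? 0 : -1;
}

/* Blocking TCP connect to that peer; returns the socket or -1. */
int connect_linklocal(const char *addr, unsigned short port,
                      const char *ifname)
{
    struct sockaddr_in6 sa;
    int fd;

    if (linklocal_sockaddr(addr, port, ifname, &sa) < 0)
        return -1;
    fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The connect() in connect_linklocal() is the call that hangs in this report
when the address is the machine's own link-local address on eth0.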



Finally Centrino SpeedStep.
I have a "Intel(R) Pentium(R) M processor 1.80GHz" in my notebook.
Linux does not support it.  This architecture has been out there for
months now, and there even was a patch to support it posted here in
October last year or so.  Linux still does not include it.  Until
2.6.11-rc4-bk8 or so, the old patched file from back then still worked.
Now it doesn't.  Because some interface changed.  Now what?  Using a
Centrino notebook without CPU throttling is completely out of the
question.  Linux might as well not boot on it at all.



Did I mention that I'm really tired of you putting stones into ATI's
way?  You might believe you have a right to piss everyone off, after all
people get what they paid for.  Or maybe you think you are on a crusade
to promote open source software.  But if you keep alienating me (I'm a
software developer) like this, I spend more time working around this
bullshit and less time writing free software.  In the end, everyone
loses.  I sincerely hope some day you people are done pissing in the
pool and can create at least some semblance of semi-stable APIs.  This
house is never going to be safe for living until you stop digging around
the foundation.

You know, people are actually spending time (and money!) to learn how to
write Linux kernel modules.  And all this API shifting makes sure their
knowledge is completely obsolete a few months down the road.  That's not
how you create a community of people working on a shared goal.


Enough ranting for today.  Sigh.

Felix


diff for ipv6 RFC compatibility

2001-06-08 Thread Felix von Leitner

I have been told that I should send a diff rather than complain and
expect others to make a diff.  Oops ,)

So attached is a diff.

Oh boy oh boy will I now become part of the Linux Changelog? ;)

Felix


--- linux/include/linux/in6.h   Sat May 19 02:45:08 2001
+++ linux.fefe/include/linux/in6.h  Fri Jun  8 20:37:13 2001
@@ -53,7 +53,7 @@
     struct in6_addr ipv6mr_multiaddr;
 
     /* local IPv6 address of interface */
-    int ipv6mr_ifindex;
+    int ipv6mr_interface;
 };
 
 struct in6_flowlabel_req
--- linux/net/ipv6/ipv6_sockglue.c  Mon Mar 26 04:14:25 2001
+++ linux.fefe/net/ipv6/ipv6_sockglue.c Fri Jun  8 20:37:01 2001
@@ -346,9 +346,9 @@
     break;
 
     if (optname == IPV6_ADD_MEMBERSHIP)
-        retv = ipv6_sock_mc_join(sk, mreq.ipv6mr_ifindex, &mreq.ipv6mr_multiaddr);
+        retv = ipv6_sock_mc_join(sk, mreq.ipv6mr_interface,
+                                 &mreq.ipv6mr_multiaddr);
     else
-        retv = ipv6_sock_mc_drop(sk, mreq.ipv6mr_ifindex, &mreq.ipv6mr_multiaddr);
+        retv = ipv6_sock_mc_drop(sk, mreq.ipv6mr_interface,
+                                 &mreq.ipv6mr_multiaddr);
     break;
     }
     case IPV6_ROUTER_ALERT:



Linux kernel headers violate RFC2553

2001-06-08 Thread Felix von Leitner

glibc works around this, but the diet libc uses the kernel headers and
thus exports the wrong API to user land.

Here is what RFC2553 mandates:

  struct ipv6_mreq {
      struct in6_addr ipv6mr_multiaddr; /* IPv6 multicast addr */
      unsigned int    ipv6mr_interface; /* interface index */
  };

...and here is what include/linux/in6.h declares:

  struct ipv6_mreq {
      /* IPv6 multicast address of group */
      struct in6_addr ipv6mr_multiaddr;

      /* local IPv6 address of interface */
      int ipv6mr_ifindex;
  };

Note the ipv6mr_ifindex instead of the correct ipv6mr_interface.

This wrong name is only used twice in net/ipv6/ipv6_sockglue.c, so it should be
trivial to fix.
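
For illustration, this is roughly what portable user-land code written
against the RFC 2553 names looks like; it compiles only if the header
exposes ipv6mr_interface.  A minimal sketch: fill_mreq() and join_group()
are hypothetical helper names, and IPV6_JOIN_GROUP is the RFC's name for
the option.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill an RFC 2553 struct ipv6_mreq from a textual group address and an
 * interface index.  Returns 0 on success, -1 if `group` is not a valid
 * IPv6 multicast address. */
int fill_mreq(const char *group, unsigned int ifindex,
              struct ipv6_mreq *mreq)
{
    memset(mreq, 0, sizeof *mreq);
    if (inet_pton(AF_INET6, group, &mreq->ipv6mr_multiaddr) != 1)
        return -1;
    if (!IN6_IS_ADDR_MULTICAST(&mreq->ipv6mr_multiaddr))
        return -1;
    mreq->ipv6mr_interface = ifindex;   /* the member name the RFC uses */
    return 0;
}

/* Join `group` on the given interface; returns 0 or -1. */
int join_group(int sock, const char *group, unsigned int ifindex)
{
    struct ipv6_mreq mreq;

    if (fill_mreq(group, ifindex, &mreq) < 0)
        return -1;
    return setsockopt(sock, IPPROTO_IPV6, IPV6_JOIN_GROUP,
                      &mreq, sizeof mreq);
}
```

With a header that only offers ipv6mr_ifindex, the assignment in
fill_mreq() fails to compile, which is the breakage described above for
the diet libc.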

Felix



ipv6: can't connect to myself?!

2001-06-07 Thread Felix von Leitner

I can't connect() to my own link-local address.
connect just hangs.

Before some wise guy now tells me I should be connecting to ::1 instead:
"oh, really!" ;)  The application is npush/npoll from my ncp program
suite, which can be found at http://www.fefe.de/ncp/.

Basically, the sender sends UDP announcements and the receiver connects
to the IP of the announcement on the interface of the announcement.

strace of the receiver reveals that it hangs in the connect() call.

Any takers?  Why does this not work?

Felix



include/asm-sparc/ptrace.h is broken

2001-05-31 Thread Felix von Leitner

On line 76, it includes <asm/asm_offsets.h>, which does not exist.

This is critical because this include file does not work when used from
a libc.  ptrace.h is from 1997 on my 2.4.5 kernel, so this is not
something that broke recently.

My suggestion is to remove the offending line altogether or at least
protect it with #ifdef __KERNEL__.
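
The second suggestion could look roughly like the fragment below.  This
is a sketch of the idea, not the actual ptrace.h; the macro
PTRACE_HAS_ASM_OFFSETS is invented here only to make the effect of the
guard observable.

```c
#include <assert.h>

/* Only kernel builds define __KERNEL__, so only they try to pull in the
 * header.  A libc or application that includes this file skips it.
 * PTRACE_HAS_ASM_OFFSETS is a made-up marker, not a real kernel macro. */
#ifdef __KERNEL__
#include <asm/asm_offsets.h>
#define PTRACE_HAS_ASM_OFFSETS 1
#else
#define PTRACE_HAS_ASM_OFFSETS 0
#endif
```

Compiled in user space, the include is never seen by the preprocessor, so
the missing file no longer breaks the build.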

Felix



problem: reading from (rivafb) framebuffer is really slow

2001-05-17 Thread Felix von Leitner

When benchmarking DirectFB, I found that a typical software alpha
blending rectangle fill is completely dominated (I'm talking 90% of the
CPU cycles here) by the time it takes to read pixels from the
framebuffer.

The pixels are read linearly in chunks of aligned 32-bit words.  It's a
Geforce 2 GTS in 1024x768 with 32-bit color depth using rivafb.  This
looks quite crass to me.  Any ideas?  Maybe rivafb does not initialize
AGP and the card is in PCI mode or something?
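
To make the access pattern concrete: software alpha blending is a
read-modify-write on every destination pixel, so a slow read path hurts
on each iteration, and reads from video memory are typically much slower
than writes.  The sketch below is not DirectFB's code; the function
names, the XRGB8888 layout and the rounding are assumptions chosen for
the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel: (src*a + dst*(255-a)) / 255, rounded. */
static uint32_t blend_channel(uint32_t src, uint32_t dst, uint32_t a)
{
    return (src * a + dst * (255u - a) + 127u) / 255u;
}

/* Blend an XRGB8888 source colour over one destination pixel. */
uint32_t blend_pixel(uint32_t src, uint32_t dst, uint8_t alpha)
{
    uint32_t r = blend_channel((src >> 16) & 0xff, (dst >> 16) & 0xff, alpha);
    uint32_t g = blend_channel((src >> 8) & 0xff, (dst >> 8) & 0xff, alpha);
    uint32_t b = blend_channel(src & 0xff, dst & 0xff, alpha);

    return (r << 16) | (g << 8) | b;
}

/* Fill a w x h rectangle; pitch is counted in pixels.  Every iteration
 * has to read the destination pixel before it can write the result. */
void fill_rect_alpha(uint32_t *fb, size_t pitch, size_t w, size_t h,
                     uint32_t color, uint8_t alpha)
{
    for (size_t y = 0; y < h; y++) {
        uint32_t *row = fb + y * pitch;

        for (size_t x = 0; x < w; x++)
            row[x] = blend_pixel(color, row[x], alpha);  /* read, then write */
    }
}
```

Running the same loop against a buffer in system memory and against the
mapped framebuffer would show how much of the time is the read itself.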

Felix



vfat large file support

2001-05-16 Thread Felix von Leitner

I can't copy a file larger than 2 gigs to my vfat partition.
What gives?  2.4.4-ac5 kernel.  My cp copies 2 gigs and then aborts.

  $ echo foo >> file_on_vfat_partition

causes the shell to become unresponsive and consume lots of CPU time.
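
For reference, stopping at exactly 2 gigs is the pattern one would expect
when a file offset is kept in a signed 32-bit integer, whose maximum is
2^31 - 1 bytes.  Whether the limit here sits in cp, in the libc or in the
vfat driver is not established by this report.  The helper below is a
hypothetical illustration of the arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: would writing len bytes at offset pos grow the
 * file past the largest size a signed 32-bit offset can describe,
 * which is 2 GiB - 1? */
int crosses_2gib_limit(uint64_t pos, uint64_t len)
{
    return pos + len > (uint64_t)INT32_MAX;
}
```

Appending the four bytes of "foo\n" to a file that is already within
three bytes of the limit is such a crossing.  Independently of offset
width, FAT stores the file size in a 32-bit field, so a single file
cannot exceed 4 GiB - 1 bytes.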

Felix



chown bug

2001-03-05 Thread Felix von Leitner

The man page says:

   If the owner or group is specified as -1, then that ID is not
   changed.

If user !root says chown("/usr",-1,-1), he gets EPERM.  Why?
He explicitly told the kernel that he does not actually want to change
anything.  Why would the kernel say EPERM?
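
A small probe for this behaviour, as a sketch: it issues the no-op
request and hands back errno on failure.  The function names are made up
for this example, and nothing is claimed here about what a particular
kernel returns for a path such as /usr.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask for "no change" to owner and group.  Returns 0 on success or the
 * errno value set by chown() on failure. */
int chown_nothing(const char *path)
{
    if (chown(path, (uid_t)-1, (gid_t)-1) == 0)
        return 0;
    return errno;
}

/* Same request on a scratch file that the caller has just created and
 * therefore owns. */
int chown_nothing_on_own_file(void)
{
    char path[] = "/tmp/chown-demo-XXXXXX";
    int fd = mkstemp(path);
    int rc;

    if (fd < 0)
        return errno;
    rc = chown_nothing(path);
    close(fd);
    unlink(path);
    return rc;
}
```

Calling chown_nothing("/usr") as an unprivileged user and comparing the
result with EPERM reproduces the case from the report.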

Felix



USB and 2.4.2: "uhci: host system error, PCI problems?"

2001-02-23 Thread Felix von Leitner

This is the log.

Feb 23 14:35:53 hellhound kernel: usb.c: registered new driver usb_mouse
Feb 23 14:35:53 hellhound kernel: PCI: Found IRQ 12 for device 00:07.2
Feb 23 14:35:53 hellhound kernel: PCI: The same IRQ used for device 00:07.3
Feb 23 14:35:53 hellhound kernel: PCI: The same IRQ used for device 00:0b.0
Feb 23 14:35:53 hellhound kernel: uhci.c: USB UHCI at I/O 0xa400, IRQ 12
Feb 23 14:35:53 hellhound kernel: uhci.c: detected 2 ports
Feb 23 14:35:53 hellhound kernel: usb.c: new USB bus registered, assigned bus number 1
Feb 23 14:35:53 hellhound kernel: hub.c: USB hub found
Feb 23 14:35:53 hellhound kernel: hub.c: 2 ports detected
Feb 23 14:35:53 hellhound kernel: PCI: Found IRQ 12 for device 00:07.3
Feb 23 14:35:53 hellhound kernel: PCI: The same IRQ used for device 00:07.2
Feb 23 14:35:53 hellhound kernel: PCI: The same IRQ used for device 00:0b.0
Feb 23 14:35:53 hellhound kernel: uhci.c: USB UHCI at I/O 0xa800, IRQ 12
Feb 23 14:35:53 hellhound kernel: uhci.c: detected 2 ports
Feb 23 14:35:53 hellhound kernel: usb.c: new USB bus registered, assigned bus number 2
Feb 23 14:35:53 hellhound kernel: hub.c: USB hub found
Feb 23 14:35:53 hellhound kernel: hub.c: 2 ports detected
Feb 23 14:35:53 hellhound usbmgr[2819]: start 0.4.4
Feb 23 14:35:53 hellhound kernel: usb.c: registered new driver hid
Feb 23 14:35:53 hellhound kernel: mice: PS/2 mouse device common for all mice
Feb 23 14:35:53 hellhound insmod: Note: /etc/modules.conf is more recent than /lib/modules/2.4.2-fefe1/modules.dep
Feb 23 14:35:53 hellhound usbmgr[2821]: "hid" was loaded
Feb 23 14:35:53 hellhound usbmgr[2821]: "mousedev" was loaded
Feb 23 14:35:53 hellhound usbmgr[2821]: open error "host"
Feb 23 14:35:53 hellhound usbmgr[2824]: mount /proc/bus/usb
Feb 23 14:35:53 hellhound kernel: uhci.c: root-hub INT complete: port1: 5ab port2: 58a data: 6
Feb 23 14:35:53 hellhound kernel: hub.c: USB new device connect on bus1/1, assigned device number 22
Feb 23 14:35:53 hellhound kernel: uhci.c: root-hub INT complete: port1: 58a port2: 58a data: 6
Feb 23 14:35:53 hellhound kernel: mouse0: PS/2 mouse device for input0
Feb 23 14:35:53 hellhound kernel: input0: Logitech USB Mouse on usb1:22.0
Feb 23 14:35:53 hellhound kernel: uhci.c: root-hub INT complete: port1: 5a5 port2: 588 data: 4
Feb 23 14:35:54 hellhound kernel: uhci.c: root-hub INT complete: port1: 58a port2: 58a data: 6
Feb 23 14:35:54 hellhound usbmgr[2821]: class:0x9 subclass:0x0 protocol:0x0
Feb 23 14:35:54 hellhound kernel: uhci.c: root-hub INT complete: port1: 588 port2: 588 data: 6
Feb 23 14:35:54 hellhound usbmgr[2821]: USB device is matched the configuration
Feb 23 14:35:54 hellhound usbmgr[2821]: "none" isn't loaded
Feb 23 14:35:54 hellhound usbmgr[2821]: vendor:0x46d product:0xc00c
Feb 23 14:35:54 hellhound usbmgr[2821]: class:0x3 subclass:0x1 protocol:0x2
Feb 23 14:35:54 hellhound usbmgr[2821]: USB device is matched the configuration
Feb 23 14:35:54 hellhound kernel: uhci: host system error, PCI problems?
Feb 23 14:35:54 hellhound kernel: uhci: host controller halted. very bad


Any ideas?  It's a VIA based Athlon board.  Worked fine with 2.4.0 and
2.4.1.  The only change was that I added rivafb, which finally adds
Geforce support in 2.4.2.  /proc/interrupts does not show any interrupts
assigned to rivafb, maybe there is a conflict?


           CPU0
  0:     457839          XT-PIC  timer
  1:      24705          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  5:     156420          XT-PIC  eth0
  8:          0          XT-PIC  rtc
 11:         26          XT-PIC  ncr53c8xx
 12:       5232          XT-PIC  usb-uhci, usb-uhci
 14:      17610          XT-PIC  ide0
 15:       2441          XT-PIC  ide1
NMI:          0
ERR:          0


Felix



Re: [rfc] Near-constant time directory index for Ext2

2001-02-22 Thread Felix von Leitner

Thus spake Alan Cox ([EMAIL PROTECTED]):
> > > There will be a lot fewer metadata index
> > > blocks in your directory file, for one thing.
> > Oh yes, another thing: a B-tree directory structure does not need
> > metadata index blocks.
> Before people get excited about complex tree directory indexes, remember to 
> solve the other 95% before implementation - recovering from lost blocks,
> corruption and the like

And don't forget the trouble with NFS handles after the tree was rebalanced.

Trees are nice only theoretically.  In practice, the benefits are
outweighed by the nastiness in form of fsck and NFS and bigger code
(normally: more complex -> less reliable).

Felix



sendfile64?

2001-02-19 Thread Felix von Leitner

Why isn't there a sendfile64?

Felix



Re: Linux stifles innovation...

2001-02-17 Thread Felix von Leitner

Thus spake Dennis ([EMAIL PROTECTED]):
> You are confusing "progress" with "innovation". If there is only 1 choice, 
> thats not innovation. Expanding on a bad idea, or even a good one, is not 
> innovation.

This is bizarre.

Please name one innovation in the history of mankind that could not be
seen as expanding on a different idea or even cloning an idea from
someone else (for example, nature).

Dennis, do you have a single argument or are you going to post bizarre
statements like this forever?  Please just say so, so people can
killfile you now.

Felix



Re: [LONG RANT] Re: Linux stifles innovation...

2001-02-17 Thread Felix von Leitner

Thus spake Henning P . Schmiedehausen ([EMAIL PROTECTED]):
> "If a company does not write a driver which works on all hardware
>  platforms in all cases and gives us the source, then it is better,
>  that the company writes no drivers at all."

> "If I can't force a company to write a driver for everyone, then I
>  don't want to write them any driver at all."

> IMHO you're like a spoiled kid: "If I can't have it, noone should have it".

Henning, what is the matter with you?

I bought the hardware.  Why should I pay for the driver?
Not even on Windows do you pay extra for a driver!

Please state your intentions.  Why would you want to split the Linux
user base into people who pay companies to screw them (I get a driver
for hardware I already paid for, but the driver will work with exactly
one kernel version on one hardware) and people who think they deserve
support when they buy hardware?

Why do we even have to discuss drivers?
A company that actively hinders developing a good driver with patents,
NDAs or other legal crap does not deserve my money.  If you throw your
money at such people, you deserve everything you get.

Felix



Re: [reiserfs-list] ReiserFS Oops (2.4.1, deterministic, symlink related)

2001-02-03 Thread Felix von Leitner

Thus spake J . A . Magallon ([EMAIL PROTECTED]):
> > How about a simple patch to the top level makefile that checks the gcc
> > version then prints a distinct message ..'this compiler hasn't been approved
> > for compiling the kernel', sleeping for one second, then continuing on.  This
> > solution doesn't stop compiling and makes a visible indicator without forcing
> > anything.
> Or a config option like CONFIG_TRUSTED_COMPILER, and everyone that wants
> can bracket his code in 'if [ $TRUSTED = "y" ] ... fi', so if some driver-fs
> fails with untrusted compilers it is just not selectable.

What kind of crap is this?
It is not the kernel's job to work around RedHat bugs.
If RedHat ships a broken compiler, it is their responsibility to tell
their customers and provide a working one.

This kind of compatibility crap has caused commercial Unices to
suffocate in their own bloat.  We don't need this.  And we don't want
this.

Felix



Choosing Linux NICs (was: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN))

2001-01-28 Thread Felix von Leitner

Thus spake Felix von Leitner ([EMAIL PROTECTED]):
> What is missing here is a good authoritative web resource that tells
> people which NIC to buy.

I started one now.

It's at http://www.fefe.de/linuxeth/, but there is not much content yet.
Please contribute!

Felix



Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)

2001-01-28 Thread Felix von Leitner

Thus spake Andrew Morton ([EMAIL PROTECTED]):
> Conclusions:

>   For a NIC which cannot do scatter/gather/checksums, the zerocopy
>   patch makes no change in throughput in all case.

>   For a NIC which can do scatter/gather/checksums, sendfile()
>   efficiency is improved by 40% and send() efficiency is decreased by
>   10%.  The increase and decrease caused by the zerocopy patch will in
>   fact be significantly larger than these two figures, because the
>   measurements here include a constant base load caused by the device
>   driver.

What is missing here is a good authoritative web resource that tells
people which NIC to buy.

I have a tulip NIC because a few years ago that apparently was the NIC
of choice.  It has good multicast (which is important to me), but AFAIK
it has neither scatter-gather nor hardware checksumming.

Is there such a web page already?
If not, I volunteer to create and maintain one.

Felix



Off-Topic: how do I trace a PID over double-forks?

2001-01-18 Thread Felix von Leitner

This is more a Unix API question than a Linux question.

I hope the issue is interesting enough for some of you.

Basically, I am writing an init which features process watching
capabilities.  My init has a management channel with which you can tell
it "the PID of the ssh process is really 123 instead of 12".

When init forks a getty and that getty exits, it is restarted.  So far
so good.  But I want my init to be able to restart uncooperative
processes like sendmail that fork in the background.  sendmail may be a
bad example because the sources are available, but please imagine you
didn't have the sources to sendmail or didn't want to touch them.

Now, the back channel for my init has a function that allows setting the
PID of a process.  The idea is that the init does not start sendmail but
a wrapper.  The wrapper forks, runs sendmail, does some magic trickery
to find the real PID of the daemonized sendmail and tells init this PID
so init will know it has to restart sendmail when it exits and won't
restart the wrapper when that exits.

Follow me this far?  Great!  The real problem at hand is: what kind of
trickery can I employ in that wrapper.  I was hoping for something that
is not Linux specific, but I haven't found anything yet.  I was also
hoping that I could find a method that does not rely on /proc being
there or on any filesystem being mounted read-write (yes, my back
channel works if the filesystem is mounted read-only).  So, using
/proc and relying on something like /var/run/sendmail.pid are out.

Someone suggested using fcntl to create a lock and then use fcntl again
to see who holds the lock.  That sounded good at first, but fork() does
not seem to inherit locks.  Does anyone have another idea?
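
The fcntl behaviour can be observed directly.  POSIX record locks belong
to the process that took them and are not inherited across fork(), so a
later F_GETLK names the locking process rather than any descendant.  The
sketch below demonstrates that; the function name and the scratch file
path are assumptions made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lock a scratch file in this process, fork, and let the child ask via
 * F_GETLK who holds the lock.  Returns the PID reported to the child,
 * or -1 on error. */
pid_t lock_holder_seen_by_child(void)
{
    char path[] = "/tmp/lock-demo-XXXXXX";
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    int pipefd[2];
    pid_t seen = -1;
    pid_t child;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                   /* the open descriptor is enough */
    if (fcntl(fd, F_SETLK, &fl) != 0 || pipe(pipefd) != 0) {
        close(fd);
        return -1;
    }

    child = fork();
    if (child == 0) {
        /* Child: it inherited fd but not the lock, so the parent's lock
         * conflicts with this query and is reported back. */
        struct flock q = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        pid_t holder = -1;

        if (fcntl(fd, F_GETLK, &q) == 0 && q.l_type != F_UNLCK)
            holder = q.l_pid;
        if (write(pipefd[1], &holder, sizeof holder) != (ssize_t)sizeof holder)
            _exit(1);
        _exit(0);
    }

    close(pipefd[1]);
    if (child > 0) {
        if (read(pipefd[0], &seen, sizeof seen) != (ssize_t)sizeof seen)
            seen = -1;
        waitpid(child, NULL, 0);
    }
    close(pipefd[0]);
    close(fd);
    return seen;
}
```

For the wrapper this means a lock taken before starting sendmail keeps
pointing at the wrapper's own PID, which matches the observation above.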

In case I made you wonder: http://www.fefe.de/minit/

Felix



Re: Documenting stat(2)

2001-01-18 Thread Felix von Leitner

Thus spake Eric S. Raymond ([EMAIL PROTECTED]):
> Here is what I think I know about stat(2) that isn't in the
> Linux man pages:

> * For a symlink (S_IFLNK) it reports the size of the link file, not the
> size of the file the link points to.

I think you confuse stat and lstat here.
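
The difference is easy to check: stat() follows a symlink and describes
the file it points to, while lstat() describes the link itself.  A
minimal sketch follows; the function name, the scratch directory and the
100-byte file are arbitrary choices for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* In a scratch directory, create "target" holding 100 bytes and a
 * symlink "link" pointing at it, then report st_size as seen through
 * stat() and lstat() on the link.  Returns 0 on success, -1 on error. */
int symlink_sizes(off_t *via_stat, off_t *via_lstat)
{
    char dir[] = "/tmp/stat-demo-XXXXXX";
    char target[64], linkpath[64];
    struct stat st;
    int rc = -1;
    FILE *f;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(target, sizeof target, "%s/target", dir);
    snprintf(linkpath, sizeof linkpath, "%s/link", dir);

    f = fopen(target, "w");
    if (f != NULL) {
        for (int i = 0; i < 100; i++)
            fputc('x', f);
        if (fclose(f) == 0 && symlink("target", linkpath) == 0 &&
            stat(linkpath, &st) == 0) {
            *via_stat = st.st_size;         /* follows the link */
            if (lstat(linkpath, &st) == 0) {
                *via_lstat = st.st_size;    /* the link itself */
                rc = 0;
            }
        }
    }
    unlink(linkpath);
    unlink(target);
    rmdir(dir);
    return rc;
}
```

Through stat() the size is that of the target file; through lstat() it is
the length of the path stored in the link, here the six characters of
"target".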

Felix



Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]

2001-01-16 Thread Felix von Leitner

Thus spake Ingo Molnar ([EMAIL PROTECTED]):
> if you read my (radical) proposal, the identification is based on a kernel
> pointer and a 256-bit random integer. So non-negative integers are not
> needed. (file-IO system-calls would be modified to detect if 'Unix file
> descriptors' or pointers to 'native file descriptors' are passed to them,
> so this is truly radical.)

Yuck, don't pass kernel-space pointers to user space!
NT does it, and look at the kernel call argument verification havoc it
wrought on them!

Felix



Re: 'native files', 'object fingerprints' [was: sendpath()]

2001-01-16 Thread Felix von Leitner

Thus spake Ingo Molnar ([EMAIL PROTECTED]):
> But even user-space code could use 'native files', via the following, safe
> mechanizm:

[something reminiscent of a token from a capability system]

> (this 'fingerprint' mechanizm can be used for any object, not only files.)

One good thing about tokens is that file handles can be implemented on
top of them in user space.

On the other hand, there already are mechanisms to pass file descriptors
around and so on, so you don't gain anything tangible from your effort.

I would advise reading some textbooks about capability systems; there
is a lot to be learned here.  But retrofitting something like this on an
existing kernel is probably not a very good idea.  Experience shows that
you can't "un-bloat" a piece of software by introducing a few elegant
concepts.  The compatibility stuff eats most of the benefits.

Felix



Re: Is sendfile all that sexy?

2001-01-16 Thread Felix von Leitner

Thus spake Jamie Lokier ([EMAIL PROTECTED]):
> You would need to use a new open() flag: O_ANYFD.
> The requirement comes from things like this:

>   close (0);
>   close (1);
>   close (2);
>   open ("/dev/console", O_RDWR);
>   dup ();
>   dup ();

So it's not actually part of POSIX, it's just to get around fixing
legacy code? ;-)
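
For reference, the quoted idiom depends on the rule that open() and
dup() return the lowest-numbered unused descriptor.  A minimal sketch of
that rule follows; the function name is made up for this example, and it
assumes a single-threaded process so nobody else grabs descriptors in
between.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if dup() hands back the descriptor number that was just
   closed, 0 if not, -1 if the setup failed. */
int dup_reuses_lowest_fd(void) {
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    int c = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0 || c < 0) return -1;

    close(b);              /* b is now the lowest unused number */
    int d = dup(a);        /* should come back as b */
    int ok = (d == b);

    close(a);
    close(c);
    if (d >= 0) close(d);
    return ok;
}
```

Code that closes 0, 1 and 2 and then opens the console relies on exactly
this ordering, which is why a flag like the proposed O_ANYFD would have
to be opt-in.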

Felix



Re: Is sendfile all that sexy?

2001-01-16 Thread Felix von Leitner

Thus spake Ingo Molnar ([EMAIL PROTECTED]):
> > I don't know how Linux does it, but returning the first free file
> > descriptor can be implemented as O(1) operation.
> to put it more accurately: the requirement is to be able to open(), use
> and close() an unlimited number of file descriptors with O(1) overhead,
> under any allocation pattern, with only RAM limiting the number of files.
> Both of my proposals attempt to provide this. It's possible to open() O(1)
> but do a O(log(N)) close(), but that is of no practical value IMO.

I cheated.  I was only talking about open().
close() is of course more expensive then.

Other than that: where does the requirement come from?
Can't we just use a free list where we prepend closed fds and always use
the first one on open()?  That would even increase spatial locality and
be good for the CPU caches.
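
A rough sketch of the free-list idea, not taken from any kernel: all
names here (fd_table, fd_alloc, fd_free, MAX_FDS) are invented for the
illustration.  Closed descriptors are pushed onto the front of a list
and the next open pops the front, so both are O(1).  Note that this
returns the most recently closed descriptor, not the lowest one, and it
does no double-free checking.

```c
#define MAX_FDS 1024

struct fd_table {
    int next_free[MAX_FDS]; /* link to the next free slot, -1 ends the list */
    int head;               /* first free slot, -1 if the table is full */
};

/* Start with every slot free, linked in ascending order. */
void fd_table_init(struct fd_table *t) {
    for (int i = 0; i < MAX_FDS; i++)
        t->next_free[i] = (i + 1 < MAX_FDS) ? i + 1 : -1;
    t->head = 0;
}

/* O(1): pop the head of the free list; returns -1 if nothing is free. */
int fd_alloc(struct fd_table *t) {
    int fd = t->head;
    if (fd >= 0)
        t->head = t->next_free[fd];
    return fd;
}

/* O(1): push the closed descriptor onto the front of the free list. */
void fd_free(struct fd_table *t, int fd) {
    t->next_free[fd] = t->head;
    t->head = fd;
}
```

Because the last slot closed is the first one reused, a busy server
keeps touching the same few table entries, which is the locality
argument made above.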

Felix



Re: Is sendfile all that sexy?

2001-01-16 Thread Felix von Leitner

Thus spake Albert D. Cahalan ([EMAIL PROTECTED]):
> Rather than combining open() with sendfile(), it could be combined
> with stat(). Since the syscall would be new anyway, it could skip
> the normal requirement about returning the next free file descriptor
> in favor of returning whatever can be most quickly found.

I don't know how Linux does it, but returning the first free file
descriptor can be implemented as an O(1) operation.

Felix



Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

2000-12-25 Thread Felix von Leitner

Thus spake Felix von Leitner ([EMAIL PROTECTED]):
> Here is the result of my test program on the stripe set:
>   # rb < /dev/md/0
>   30.3 meg/sec
>   #

One more detail: top says the CPU is 50% system when reading from either
a single disk or the RAID device.  That seems awfully high considering
that the Promise controller claims to do UDMA.

Any comments?

Felix



Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

2000-12-25 Thread Felix von Leitner

Hi,

I bought 4 ATA-100 Maxtor drives and put them on a Promise Ultra100
controller to build a single striped RAID set from them and increase
throughput.

I wrote a small test program that simply reads stdin linearly and
displays the throughput.  The block size is 100k.  This is the result:

  # cat /etc/raidtab
  raiddev /dev/md/0
  raid-level 0
  nr-raid-disks 4
  persistent-superblock 1
  chunk-size 32

  device /dev/ide/host2/bus0/target0/lun0/part1
  raid-disk 0
  device /dev/ide/host2/bus0/target1/lun0/part1
  raid-disk 2

  device /dev/ide/host2/bus1/target0/lun0/part1
  raid-disk 1
  device /dev/ide/host2/bus1/target1/lun0/part1
  raid-disk 3
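
The source of the test program is not included in this message.  A
reader of the kind described (sequential reads in fixed-size blocks,
timed) could look roughly like the sketch below; drain_fd and its
parameters are assumptions for illustration, not the actual rb code.

```c
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* Reads fd to end-of-file in block_size chunks.  Returns the total
   number of bytes read, or -1 on error, and stores the elapsed
   wall-clock time in *seconds. */
long long drain_fd(int fd, size_t block_size, double *seconds) {
    char *buf = malloc(block_size);
    if (!buf) return -1;

    struct timeval start, end;
    long long total = 0;
    ssize_t n;

    gettimeofday(&start, NULL);
    while ((n = read(fd, buf, block_size)) > 0)
        total += n;
    gettimeofday(&end, NULL);
    free(buf);

    if (n < 0) return -1;
    *seconds = (end.tv_sec - start.tv_sec)
             + (end.tv_usec - start.tv_usec) / 1e6;
    return total;
}
```

Calling drain_fd(0, 100 * 1024, &secs) from main() reads stdin in 100k
blocks; dividing the byte count by secs and by 1048576 gives a meg/sec
figure.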

Here are the results of my test program on the disk devices:
  # rb < /dev/ide/host2/bus0/target0/lun0/part1
  27.8 meg/sec
  # rb < /dev/ide/host2/bus0/target0/lun0/part1
  26.8 meg/sec

the other two disks have approximately the same numbers.

Here is the result of my test program on the stripe set:
  # rb < /dev/md/0
  30.3 meg/sec
  #

While this is faster than linear mode, I would have expected much better
performance.  These are the boot messages of the Promise adapter:

  PDC20267: IDE controller on PCI bus 00 dev 60
  PDC20267: chipset revision 2
  PDC20267: not 100% native mode: will probe irqs later
  PDC20267: (U)DMA Burst Bit ENABLED Primary PCI Mode Secondary PCI Mode.
  ide2: BM-DMA at 0xec00-0xec07, BIOS settings: hde:pio, hdf:pio
  ide3: BM-DMA at 0xec08-0xec0f, BIOS settings: hdg:pio, hdh:pio
  ide2 at 0xdc00-0xdc07,0xe002 on irq 10
  ide3 at 0xe400-0xe407,0xe802 on irq 10
  hde: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
  hdf: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
  hdg: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
  hdh: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)

I tuned the devices with hdparm -c 1 -a 32 -m 16 -p -u 1, for what it's worth
(did not increase throughput but appeared to lessen the CPU usage).

To verify that this is not an issue of the Promise controller, I started
two instances of my test tool at the same time, one working on hde, the
other on hdg (the two channels).  Both yielded approximately 25 meg/sec,
so it does not appear to be a hardware or driver issue.  Is the RAID
code really this slow?  Any ideas what I can do?

I am using the user space tools from raidtools-19990421-0.90.tar.bz2,
but that should not have any influence, right?

I heard that there is a new, faster RAID code somewhere, but it only
claimed to be faster on RAID level 5, not on striping.

Any tuning advice?

By the way, I noticed another thing: one of the Maxtor hard disks was
broken.  It caused the whole box to freeze solid (no numlock, no console
switches, no sysrq).  That to me severely limits the usefulness of IDE
RAID.  While SCSI problems cause trouble, too, I have never seen one
cause a complete freeze.  How am I supposed to hot-swap the disks?
I am using VESA framebuffer, so maybe there was a panic and it simply
did not appear on my screen (or in the logs).

Hope to hear from you soon (the RAID is needed on Dec 27).
Should I use LVM instead of the MD code?

Felix



question about pread

2000-11-30 Thread Felix von Leitner

I am trying to implement pread for my diet libc.
This is my test program:

  #include <unistd.h>
  main() {
    char buf[1024];
    int fd=open("/etc/passwd",0);
    pread(fd,buf,30,32);
    close(fd);
    write(1,buf,32);
  }

I compiled it against diet libc and glibc and ran it on a powerpc box.
t is the test program linked against diet libc, t1 is the test program
linked against glibc.
Here is the result:

  $ strace ./t1
  execve("./t1", ["./t1"], [/* 19 vars */]) = 0
  brk(0)  = 0x100106a8
  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x30014000
  open("/etc/ld.so.preload", O_RDONLY)= -1 ENOENT (No such file or directory)
  open("/usr/local/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
  stat("/usr/local/lib", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
  open("/usr/X11R6/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
  stat("/usr/X11R6/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
  open("/etc/ld.so.cache", O_RDONLY)  = 3
  fstat(3, {st_mode=S_IFREG|0644, st_size=9729, ...}) = 0
  mmap(NULL, 9729, PROT_READ, MAP_PRIVATE, 3, 0) = 0x30015000
  close(3)= 0
  open("/lib/libc.so.6", O_RDONLY)= 3
  fstat(3, {st_mode=S_IFREG|0755, st_size=992080, ...}) = 0
  read(3, "\177ELF\1\2\1\0\0\0\0\0\0\0\0\0\0\3\0\24\0\0\0\1\0\2(\340"..., 4096) = 4096
  mmap(0xfeea000, 1072860, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xfeea000
  mprotect(0xffcb000, 151260, PROT_NONE)  = 0
  mmap(0xffda000, 69632, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0xe) = 0xffda000
  mmap(0xffeb000, 20188, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xffeb000
  close(3)= 0
  munmap(0x30015000, 9729)= 0
  getpid()= 11304
  open("/etc/passwd", O_RDONLY)   = 3
  pread(3, "daemon:x:1:1:daemon:/usr/sbin:", 30, 137438953472) = 30
  close(3)= 0
  write(1, "daemon:x:1:1:daemon:/usr/sbin:j ", 32daemon:x:1:1:daemon:/usr/sbin:j ) = 32
  exit(32)= ?
  $ strace ./t
  execve("./t", ["./t"], [/* 19 vars */]) = 0
  open("/etc/passwd", O_RDONLY)   = 3
  pread(3, "", 30, 137438953472)  = 0
  close(3)= 0
  write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32) = 32
  exit(32)= ?
  $

How can this be?  Both open the same file and call pread with the same arguments,
yet pread returns 30 for the glibc program and 0 for the diet libc one?!

Can anyone shed some light on this?  What exactly is the calling convention
for pread?  The diet libc pread code appears to work on x86 and sparc
but not on mips and ppc.
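
One thing that can be read off the traces: the offset strace prints,
137438953472, is exactly 32 shifted left by 32 bits, that is, the
requested offset 32 sitting in the upper half of a 64-bit value.  That
would be consistent with a 64-bit offset being passed as two 32-bit
halves that do not land where the decoder expects them.  Since both
traces show the same number, it may only be how strace decodes the
call, so treat this as an observation rather than a diagnosis.  The
helper below is hypothetical and just shows the arithmetic.

```c
#include <stdint.h>

/* Combine two 32-bit register values into one 64-bit file offset. */
uint64_t join_offset(uint32_t high, uint32_t low) {
    return ((uint64_t)high << 32) | low;
}
```

join_offset(0, 32) gives the intended offset 32, while join_offset(32, 0)
gives the 137438953472 seen in the strace output.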

I used kernel 2.4.0-test10 on x86 and 2.2.17 on sparc and ppc, for what
it's worth.

stumped,

Felix



Re: Linux's implementation of poll() not scalable?

2000-10-24 Thread Felix von Leitner

Thus spake Linus Torvalds ([EMAIL PROTECTED]):
> I disagree.

> Let's just face it, poll() is a bad interface scalability-wise.

Is that a reason to implement it badly?

Felix



Re: Proposal: driver initialization pipelining

2000-10-19 Thread Felix von Leitner

Thus spake Andre Hedrick ([EMAIL PROTECTED]):
> > Some of the initialization can definitely be done in parallel, but there
> > are all sorts of special cases, like devices which turn off interrupts
> > during init (IDE), and other fun tricks...  Some of the delays during
> > init are timing sensitive, where you don't want to have to wait for the
> > tasklet to be called for completion.
> I will be happy to break the IRQ code for a demo for Felix.
> But do backup your data first, because it will not be there when you boot
> again!

I don't get it.
If you say that IDE disables interrupts during init, does that mean that
it disables _all_ interrupts or just that you mask the IDE IRQs?

Actually, I was thinking more along the lines of SCSI bus scan, because
the Linux IDE reset is already barely noticeable.

Does "timing sensitive" mean "don't come again too early" or "be 100%
punctual"?

There ought to be _some_ initializations that don't require interrupts?
Registering the file systems and network protocols, stuff like that?

Felix



Re: bind() allowed to non-local addresses

2000-10-19 Thread Felix von Leitner

Thus spake David S. Miller ([EMAIL PROTECTED]):
> I'll say it again, if you have to make changes to apps/servers the
> feature does not make any sense.  It must operate transparently or
> not at all.

There once was a socket file system which solved exactly this problem in
a nice and obvious way.  If you wanted to allow user joe to bind to port
80, you just ran "chown joe /socks/80".

Whatever happened to that neat idea?
If it was under /proc, I would be happy.

Felix



Proposal: driver initialization pipelining

2000-10-19 Thread Felix von Leitner

Linux already boots fairly quickly, but there seems to be one
straightforward way to speed it up a little more: pipelining.

The idea is to split the initialization of drivers into two routines.
This is only useful for drivers that reset hardware and then wait a
while before continuing.  My thought is: during that time, other drivers
could work.

If we split the initialization into one "trigger the reset" routine and
one "do the rest" routine, we could interleave initializations by first
calling all the reset routines, then doing some static initializations,
and then calling all the second halves of the initialization.  In particular,
SCSI and IDE scans need noticeable time and could possibly be done in
parallel with the USB init, right?
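
As a sketch of the ordering only, with invented names throughout
(struct driver, init_pipelined and the stub drivers are not kernel
interfaces): every driver gets a "start reset" hook and a "finish" hook,
and the stubs just record the order in which they are called.

```c
#include <stddef.h>

/* Hypothetical two-phase driver description. */
struct driver {
    void (*start_reset)(void);  /* trigger the reset and return at once */
    void (*finish_init)(void);  /* the rest, once the hardware is ready */
};

/* Stub drivers that only record the call order in a small trace. */
static char trace[16];
static size_t trace_len;

static void note(char c) {
    if (trace_len + 1 < sizeof trace) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}

static void scsi_reset(void)  { note('S'); }
static void scsi_finish(void) { note('s'); }
static void ide_reset(void)   { note('I'); }
static void ide_finish(void)  { note('i'); }
static void static_init(void) { note('.'); }

/* Phase 1 for all drivers, then unrelated work, then phase 2 for all. */
void init_pipelined(const struct driver *drv, size_t n,
                    void (*other_work)(void)) {
    for (size_t i = 0; i < n; i++)
        if (drv[i].start_reset) drv[i].start_reset();
    if (other_work) other_work();
    for (size_t i = 0; i < n; i++)
        if (drv[i].finish_init) drv[i].finish_init();
}

/* Runs the two stub drivers and returns the recorded order. */
const char *demo_order(void) {
    static const struct driver drivers[] = {
        { scsi_reset, scsi_finish },
        { ide_reset,  ide_finish  },
    };
    trace_len = 0;
    trace[0] = '\0';
    init_pipelined(drivers, 2, static_init);
    return trace;
}
```

A real second half would still have to wait if its device is not ready
yet; the gain is that the waits overlap instead of adding up.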

This is just a quick idea.
If the whole concept is broken, please just say so.  No need to start a
monster thread about this.

Felix



Need help with SPARC fork()

2000-10-16 Thread Felix von Leitner

I need help with fork() on SPARC Linux.  I am trying to port my diet
libc to SPARC Linux but can't get fork() to work.  Even when I copy the
fork() code from glibc verbatim, the tasks have a corrupted stack frame.

I tried to strip the init code and it looks like I broke fork in the
process.  Does anyone have pointers on fork() constraints that I
might have failed to notice?

Felix

PS: In case anyone is interested: the Intel-only version of diet libc is
at http://www.fefe.de/dietlibc/


