Re: [PATCH 0 of 4] Generic AIO by scheduling stacks

2007-02-14 Thread James Antill
On Sat, 10 Feb 2007 18:49:56 -0800, Linus Torvalds wrote:

> And I actually talked about that in one of the emails already. There is no 
> way you can beat an event-based thing for things that _are_ event-based. 
> That means mainly networking.
> 
> For things that aren't event-based, but based on real IO (ie filesystems 
> etc), event models *suck*. They suck because the code isn't amenable to it 
> in the first place (ie anybody who thinks that a filesystem is like a 
> network stack and can be done as a state machine with packets is just 
> crazy!).
> 
> So you would be crazy to make a web server that uses this to handle _all_ 
> outstanding IO. Network connections are often slow, and you can have tens 
> of thousands outstanding (and some may be outstanding for hours until they 
> time out, if ever). But that's the whole point: you can easily mix the 
> two, as given in several examples already (ie you can easily make the main 
> loop itself basically do just

 I don't see any replies to this, so here's my 2¢. The simple model of
what a webserver does when sending static data is (a rough C sketch
follows the list):

1. local_disk_fd = open()
2. fstat(local_disk_fd)
3. TCP_CORK on
4. send_headers();
5. LOOP
5a. sendfile(network_con_fd, local_disk_fd)
5b. epoll(network_con_fd)
6. TCP_CORK off
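
 A rough sketch of that in C (untested; wait_for_writable() and
send_headers() are stand-ins for the epoll call and the header write):

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sendfile.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

extern void send_headers(int fd);       /* step 4: writes the headers */
extern void wait_for_writable(int fd);  /* step 5b: epoll()s the socket */

static void send_static_file(int network_con_fd, const char *path)
{
    int on = 1, off = 0;
    struct stat st;
    off_t done = 0;
    int local_disk_fd = open(path, O_RDONLY);             /* step 1 */

    fstat(local_disk_fd, &st);                            /* step 2 */
    setsockopt(network_con_fd, IPPROTO_TCP, TCP_CORK,
               &on, sizeof(on));                          /* step 3 */
    send_headers(network_con_fd);                         /* step 4 */

    while (done < st.st_size) {                           /* step 5 */
        ssize_t n = sendfile(network_con_fd, local_disk_fd,
                             &done, st.st_size - done);   /* step 5a */
        if (n == -1 && errno == EAGAIN)
            wait_for_writable(network_con_fd);            /* step 5b */
        else if (n == -1)
            break;
    }
    setsockopt(network_con_fd, IPPROTO_TCP, TCP_CORK,
               &off, sizeof(off));                        /* step 6 */
    close(local_disk_fd);
}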

...and here's my personal plan (again, somewhat simplified), which I
think will be "better":

7. helper_proc_pipe_fd = DO open() + fstat()
8. read_stat_event_data(helper_proc_pipe_fd)
9. TCP_CORK on network_con_fd
10. send_headers(network_con_fd);
11. LOOP
11a. splice(helper_proc_pipe_fd, network_con_fd)
11b. epoll(network_con_fd && helper_proc_pipe_fd)
12. TCP_CORK off network_con_fd

...where the "helper proc" is doing the splice() from disk to the pipe,
on the other end. This, at least in theory, gives you an async webserver
and zero-copy disk-to-network[1] (a sketch of both sides follows). My
assumption is that Evgeniy's aio_sendfile() could fit into that model
pretty easily, and would be faster.
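
 Something like this for the two splice() sides (untested sketch; fd
names as in the steps above, error handling omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Helper-process side: move file pages disk -> pipe. */
static void helper_pump(int local_disk_fd, int pipe_write_fd, loff_t len)
{
    loff_t off = 0;
    while (off < len) {
        ssize_t n = splice(local_disk_fd, &off, pipe_write_fd, NULL,
                           len - off, SPLICE_F_MOVE);
        if (n <= 0)
            break;              /* error, or the reader went away */
    }
}

/* Event-loop side, step 11a: one non-blocking pipe -> socket move;
 * on EAGAIN the caller epoll()s both fds (step 11b). */
static ssize_t splice_to_socket(int helper_proc_pipe_fd, int network_con_fd)
{
    return splice(helper_proc_pipe_fd, NULL, network_con_fd, NULL,
                  65536, SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
}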

 However, from what you've said above, you're only trying to help steps
#1 and #2 (which are likely to be cached in the application anyway), and
applications that want to sendfile() to the network either do horrible
hacks like lighttpd's "AIO"[2], do a read+write copy loop with AIO, or
don't use AIO at all.


[1] And it allows things like IO limiting, which aio_sendfile() won't.

[2] http://illiterat.livejournal.com/2989.html

-- 
James Antill -- [EMAIL PROTECTED]
http://www.and.org/and-httpd/ -- $2,000 security guarantee
http://www.and.org/vstr/



Re: PROBLEM: select() on TCP socket sleeps for 1 tick even if data available

2001-04-11 Thread James Antill

"Stephen D. Williams" <[EMAIL PROTECTED]> writes:

> James Antill wrote:
> ...
> > >The
> > > time went from 3.7 to 4.4 seconds per 100000.
> > 
> >  Ok here's a quick test that I've done. This passes data between 2
> > processes. Obviously you can't compare this to your code or Michael's,
> > however...
> 
> I've attached my version of his code with your suggested change. 
> Possibly I didn't do it correctly.

 It's not a code thing, but I think you are measuring the wrong thing
(at least in relation to the original question). Given the below
diff...

--- sdw-sockperf.c-orig Wed Apr 11 18:30:28 2001
+++ sdw-sockperf.c  Wed Apr 11 18:33:09 2001
@@ -17,6 +17,7 @@
 #include <unistd.h>
 #include <netinet/tcp.h>
 
+#define USE_DOUBLE_SELECT 0
 
 #ifndef INADDR_NONE
 #define INADDR_NONE ~0
@@ -147,7 +148,8 @@
     int pings = 0;
     struct timeval zerotime;
     int ret;
-
+    unsigned int misses = 0;
+
     FD_ZERO(&fds);
     FD_SET(r, &fds);
     gettimeofday(&then, 0);
@@ -163,8 +165,10 @@
 //  if (!(ret = select( ... , &zerotime)))
 //      ret = select( ... , NULL);
 //  while ((readfds=fds, ret = select(r+1, &readfds, 0, 0, 0)) ) {
-    while ((ret = select(r+1, &readfds, 0, 0, &zerotime)) ||
-           (readfds=fds, ret = select(r+1, &readfds, 0, 0, 0)) ) {
+    while ((USE_DOUBLE_SELECT &&
+            (ret = select(r+1, &readfds, 0, 0, &zerotime))) ||
+           (++misses &&
+            (readfds=fds, ret = select(r+1, &readfds, 0, 0, 0)) )) {
       if (FD_ISSET(r, &readfds)) {
         char buf[1];
         int n = read(r, buf, sizeof(buf));
@@ -186,6 +190,8 @@
         readfds = fds;
     }
     gettimeofday(&now, 0);
+    fprintf(stderr, "USE_DOUBLE_SELECT=%d\n", USE_DOUBLE_SELECT);
+    fprintf(stderr, "misses=%u\n", misses);
     fprintf(stderr, "elapsed time for 100000 pingpongs is %g\n", now.tv_sec -
             then.tv_sec + (now.tv_usec - then.tv_usec) / 1000000.0);
     fprintf(stderr, "closing %d\n", r);

...I get consistently better results for "localhost 45644 45644 a"
with USE_DOUBLE_SELECT=1; worth noting is that misses == 0 was always
true. However, if I have 2 programs, one doing "localhost 45642 45643"
and one doing "localhost 45643 45642 a", then I get better results with
USE_DOUBLE_SELECT=0[1], and misses is 80-90 thousand (i.e. it has to do
2 select calls 80-90% of the time).

 Please note that the original question was that select/poll does a
small schedule if you specify a timeout, and that's bad. However, in
the two-process case you _need_ the schedule, because there isn't any
data there yet.
 So again, given the original assumption that data is available on one
of the fds, doing the double select is better; but if it isn't, then
you're wasting time no matter what you do.

 As to why my test code got good results even though it uses 2
processes: I used PF_LOCAL/AF_LOCAL sockets, not PF_INET/AF_INET, and
those are fast enough at transferring the data that you don't need the
schedule (misses == 0, if you add similar code to the above).

[1] This is on a real computer; on a 486 with 8Meg of RAM I still get
better results with USE_DOUBLE_SELECT=1, and there are still 80%
misses (no idea why).

-- 
# James Antill -- [EMAIL PROTECTED]
:0:
* ^From: .*james@and\.org
/dev/null



Re: PROBLEM: select() on TCP socket sleeps for 1 tick even if data available

2001-04-10 Thread James Antill

"Stephen D. Williams" <[EMAIL PROTECTED]> writes:

> James Antill wrote:
> > 
> >  I seemed to miss the original post, so I can't really comment on the
> > tests. However...
> 
> It was a thread in January, but I just ran across it looking for
> something else.  See below for results.

 Ahh, ok.

> > > Michael Lindner wrote:
> ...
> > > > <0.000021>
> > > >  0.000173 select(8, [3 4 6 7], NULL, NULL, NULL) = 1 (in [6])
> > > > <0.000047>
> > 
> >  The strace here shows select() with an infinite timeout; your
> > numbers will be much better if you do (pseudo code)...

[snip ... ]

> > ...basically you completely miss the function call for __pollwait()
> > inside poll_wait (include/linux/poll.h in the linux sources, with
> > __pollwait being in fs/select.c).
> 
> Apparently the extra system call overhead outweighs any benefit.

 There shouldn't be any "extra" system calls in the fast path. If data
is waiting then you do one call to poll() either way; if not, then you
are wasting time blocking, so it doesn't matter what you do.

>   In any
> case, what you suggest would be better done in the kernel anyway.

 Possibly; however, when this has come up before, the kernel people
have said it's hard to do in kernel space.

>The
> time went from 3.7 to 4.4 seconds per 100000.

 Ok here's a quick test that I've done. This passes data between 2
processes. Obviously you can't compare this to your code or Michael's,
however...

 The results with USE_DOUBLE_POLL on are...

% time ./pingpong
./pingpong  0.15s user 0.89s system 48% cpu 2.147 total
% time ./pingpong
./pingpong  0.19s user 0.91s system 45% cpu 2.422 total
% time ./pingpong
./pingpong  0.10s user 1.02s system 49% cpu 2.282 total

 The results with USE_DOUBLE_POLL off are...

% time ./pingpong
./pingpong  0.24s user 1.07s system 50% cpu 2.614 total
% time ./pingpong
./pingpong  0.21s user 1.00s system 44% cpu 2.695 total
% time ./pingpong
./pingpong  0.21s user 1.13s system 50% cpu 2.667 total

 Don't forget that the poll here is done with _1_ fd. Most real
programs have more, and so benefit more.

 I also did the TRY_NO_POLL run, as I was pretty sure what the results
would be; that gives...

% time ./pingpong
./pingpong  0.03s user 0.41s system 50% cpu 0.874 total
% time ./pingpong
./pingpong  0.06s user 0.44s system 58% cpu 0.855 total
% time ./pingpong
./pingpong  0.07s user 0.35s system 51% cpu 0.820 total


[attachment: pingpong.c]



-- 
# James Antill -- [EMAIL PROTECTED]
:0:
* ^From: .*james@and\.org
/dev/null



Re: PROBLEM: select() on TCP socket sleeps for 1 tick even if data available

2001-04-09 Thread James Antill

"Stephen D. Williams" <[EMAIL PROTECTED]> writes:

> An old thread, but important to get these fundamental performance
> numbers up there:
> 
> 2.4.2 on an 800MHz PIII Sceptre laptop w/ 512MB RAM:
> 
> elapsed time for 100000 pingpongs is
> 3.81327
> 100000/3.81256
> ~26229.09541095746689888159
> 1/.0000379912
> ~26321.88506812103855629724
> 
> 26300 compares to 8000/sec. quite well ;-)  You didn't give specs for
> your test machine unfortunately.
> 
> Since this tests both 'sides' of an application communication, it
> indicates a 'null transaction' rate of twice that.
> 
> This was typical cpu usage on a triple run of 10000:
> CPU states:  7.2% user, 92.7% system,  0.0% nice,  0.0% idle  

 I seem to have missed the original post, so I can't really comment on
the tests. However...

> Michael Lindner wrote:
> > 
> > OK, 2.4.0 kernel installed, and a new set of numbers:
> > 
> > test            kernel      ping-pongs/s. @ total CPU util   w/SOL_NDELAY
> > sample (2 skts) 2.2.18      100 @ 0.1%                       800 @ 1%
> > sample (1 skt)  2.2.18      8000 @ 100%                      8000 @ 50%
> > real app        2.2.18      100 @ 0.1%                       800 @ 1%
> > 
> > sample (2 skts) 2.4.0       8000 @ 50%                       8000 @ 50%
> > sample (1 skt)  2.4.0       10000 @ 50%                      10000 @ 50%
> > real app        2.4.0       1200 @ 50%                       1200 @ 50%
> > 
> > real app        Windows 2K  4000 @ 100%
> > 
> > The two points that still seem strange to me are:
> > 
> > 1. The 1 socket case is still 25% faster than the 2 socket case in 2.4.0
> > (in 2.2.18 the 1 socket case was 10x faster).
> > 
> > 2. Linux never devotes more than 50% of the CPU (average over a long
> > run) to the two processes (25% to each process, with the rest of the
> > time idle).
> > 
> > I'd really love to show that Linux is a viable platform for our SW, and
> > I think it would be doable if I could figure out how to get the other
> > 50% of my CPU involved. An "strace -rT" of the real app on 2.4.0 looks
> > like this for each ping/pong.
> > 
> >  0.052371 send(7, "\0\0\0
> > \177\0\0\1\3243\0\0\0\2\4\236\216\341\0\0\v\277"..., 32, 0) = 32
> > <0.000529>
> >  0.000882 rt_sigprocmask(SIG_BLOCK, ~[], [RT_0], 8) = 0 <0.000021>
> >  0.000242 rt_sigprocmask(SIG_SETMASK, [RT_0], NULL, 8) = 0
> > <0.000021>
> >  0.000173 select(8, [3 4 6 7], NULL, NULL, NULL) = 1 (in [6])
> > <0.000047>
> >  0.000328 read(6, "\0\0\0 ", 4) = 4 <0.000031>
> >  0.000179 read(6,
> > "\177\0\0\1\3242\0\0\0\2\4\236\216\341\0\0\7\327\177\0\0"..., 28) = 28
> > <0.000075>

 The strace here shows select() with an infinite timeout; your
numbers will be much better if you do (pseudo code)...

  struct timeval zerotime;

  zerotime.tv_sec = 0;
  zerotime.tv_usec = 0;

  if (!(ret = select( ... , &zerotime)))
      ret = select( ... , NULL);

...basically you completely miss the function call for __pollwait()
inside poll_wait (include/linux/poll.h in the linux sources, with
__pollwait being in fs/select.c).
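
 The same idea as a complete function (untested sketch, my names):

/* Try a zero-timeout select() first -- no schedule if data is already
 * waiting -- and only fall back to a blocking select() on a miss. */
#include <sys/select.h>

static int wait_readable(int fd)
{
    fd_set readfds;
    struct timeval zerotime;
    int ret;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    zerotime.tv_sec = 0;
    zerotime.tv_usec = 0;

    ret = select(fd + 1, &readfds, NULL, NULL, &zerotime);
    if (ret != 0)
        return ret;                 /* ready (>0), or error (<0) */

    /* Miss: select() cleared the set, so rebuild it and block. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return select(fd + 1, &readfds, NULL, NULL, NULL);
}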

-- 
# James Antill -- [EMAIL PROTECTED]
:0:
* ^From: .*james@and\.org
/dev/null



Re: DNS goofups galore...

2001-02-20 Thread James Antill

"Henning P. Schmiedehausen" <[EMAIL PROTECTED]> writes:

> [EMAIL PROTECTED] (James Antill) writes:
> 
> >"Henning P. Schmiedehausen" <[EMAIL PROTECTED]> writes:
> 
> >> % telnet mail.bar.org smtp
> >> 220 mail.foo.org ESMTP ready
> >> 
> >> 
> >> This kills loop detection. Yes, it is done this way =%-) and it breaks
> >> if done wrong.
> 
> > This is humour, yeh ?
> 
> No.

 This was a comment on the "loop detection" claim.

[snip ... domain example]

> No. This is a misconfiguration. Yes, RFC821 is a bit rusty but as far
> as I know, nothing has superseded it yet. And Section 3.7 states
> clearly:
> 
>   Whenever domain names are used in SMTP only the official names are
>   used, the use of nicknames or aliases is not allowed.

 _In_ SMTP, that doesn't say anything about MX records to me, and even
if it does, it's very old and needs to change.

> And the 220 Message is defined as
> 
> 220 <domain>

 So... you should have the reverse for the IP address after the
220. Which most people do (but not all, mainly due to there not being
enough IPs).

[snip CNAME lesson]

 The question was: why can't you use CNAMEs? You said 'because of loop
detection'. I said 'But that doesn't work anyway, because you can have
two names pointing at one machine without a CNAME record ... and that
needs to, and currently does, work'.

> Dipl.-Inf. (Univ.) Henning P. Schmiedehausen   -- Geschaeftsfuehrer
> INTERMETA - Gesellschaft fuer Mehrwertdienste mbH [EMAIL PROTECTED]

 Let me put it this way...

tanstaafl.de.   IN MX   50 mail.hometree.net.
tanstaafl.de.   IN MX   10 mail.intermeta.de.
intermeta.de.   IN MX   50 mail.hometree.net.
intermeta.de.   IN MX   10 mail.intermeta.de.

mail.hometree.net.  IN A194.231.17.49
mail.intermeta.de.  IN A212.34.181.3

49.17.231.194.in-addr.arpa.  IN PTR  limes.hometree.net.
3.181.34.212.in-addr.arpa.  IN PTR  babsi.intermeta.de.

-- 
# James Antill -- [EMAIL PROTECTED]
:0:
* ^From: .*james@and\.org
/dev/null



Re: DNS goofups galore...

2001-02-12 Thread James Antill

"Henning P. Schmiedehausen" <[EMAIL PROTECTED]> writes:

> [EMAIL PROTECTED] (H. Peter Anvin) writes:
> 
> >> In other words, you do a lookup, you start with a primary lookup
> >> and then possibly a second lookup to resolve an MX or CNAME.  It's only
> >> the MX that points to a CNAME that results in yet another lookup.  An
> >> MX pointing to a CNAME is almost (almost, but not quite) as bad as a
> >> CNAME pointing to a CNAME.
> >> 
> 
> >There is no reducibility problem for MX -> CNAME, unlike the CNAME ->
> >CNAME case.
> 
> >Please explain how there is any different between an CNAME or MX pointing
> >to an A record in a different SOA versus an MX pointing to a CNAME
> >pointing to an A record where at least one pair is local (same SOA).
> 
> CNAME is the "canonical name" of a host. Not an alias. There are good
> descriptions of the problem with this in the bat book. Basically it
> breaks if your mailer expects one host on the other side (mail.foo.org)
> and suddenly the host reports as mail.bar.org. The sender is
> allowed to assume that the name reported after the "220" greeting
> matches the name in the MX. This is impossible with a CNAME:
> 
> mail.foo.org.   IN A 1.2.3.4
> mail.bar.org.   IN CNAME mail.foo.org.
> bar.org.        IN MX 10 mail.bar.org.
> 
> % telnet mail.bar.org smtp
> 220 mail.foo.org ESMTP ready
> 
> 
> This kills loop detection. Yes, it is done this way =%-) and it breaks
> if done wrong.

 This is humour, yeh ?

 I would be surprised if even sendmail assumed braindamage like the
above.
 For instance, something that is pretty common is...

foo.example.com. IN A 4.4.4.4
foo.example.com. IN MX 10 mail.example.com.
foo.example.com. IN MX 20 backup-mx1.example.com.

; This is really mail.example.org.
backup-mx1.example.com.  IN A 1.2.3.4

...another is to have "farms" of mail servers (the A record for the MX
has multiple entries).
 If it "broke" as you said, then a lot of mail wouldn't be getting routed.

-- 
# James Antill -- [EMAIL PROTECTED]
:0:
* ^From: .*james@and\.org
/dev/null



Re: Traceroute without s bit

2000-12-06 Thread James Antill

Olaf Kirch <[EMAIL PROTECTED]> writes:

>  3.   There seems to be a bug somewhere in the handling of poll().
>   If you observe the traceroute process with strace, you'll
>   notice that it starts spinning madly after receiving the
>   first bunch of packets (those with ttl 1).
> 
>   13:43:02 poll([{fd=4, events=POLLERR}], 1, 5) = 0
>   13:43:02 poll([{fd=4, events=POLLERR}], 1, 5) = 0
>   13:43:02 poll([{fd=4, events=POLLERR}], 1, 5) = 0
>   13:43:02 poll([{fd=4, events=POLLERR}], 1, 5) = 0
>   ...
> 
>   I.e. the poll call returns as if it had timed out, but it
>   hasn't.

 I've just looked at it, but I'm pretty sure this is a bug in your
code. This should fix it...

--- traceroute.c.orig   Wed Dec  6 10:33:48 2000
+++ traceroute.cWed Dec  6 10:34:06 2000
@@ -193,7 +193,7 @@
timeout = hop->nextsend;
}
 
-   poll(pfd, m, timeout - now);
+   poll(pfd, m, (timeout - now) * 1000);
 
/* Receive any pending ICMP errors */
        for (n = 0; n < m; n++) {
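
 For what it's worth: poll()'s timeout argument is in milliseconds,
and timeout/now above look like they're in seconds, so the old code was
asking for a 0ms timeout on every sub-second wait, and poll() returned
at once -- hence the spinning. Ie. (untested sketch, my names):

#include <poll.h>

/* Wait out the remainder of a deadline; deadline/now are in seconds,
 * so convert to milliseconds before handing the value to poll(). */
static int wait_until(struct pollfd *pfd, nfds_t nfds,
                      long deadline_sec, long now_sec)
{
    long ms = (deadline_sec - now_sec) * 1000;   /* seconds -> ms */
    return poll(pfd, nfds, ms > 0 ? (int)ms : 0);
}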


-- 
# James Antill -- [EMAIL PROTECTED]
:0:
* ^From: .*[EMAIL PROTECTED]
/dev/null



Re: sigtimedwait with a zero timeout

2000-10-02 Thread James Antill

Henrik Nordstrom <[EMAIL PROTECTED]> writes:

> You are not late. In fact you are the first who have responded to my
> linux-kernel messages at all.
> 
> Yes, I am well aware of sigwaitinfo.
> 
> sigwaitinfo blocks infinitely if there is no queued signals and is the
> opposite of sigtimedwait with a zero timeout.

 Yes, sorry, that's what I thought you wanted to do (i.e. you knew some
data was there and wanted to get it quickly).

> sigwaitinfo is implemented as sigtimedwait with a NULL timeout which is
> read as a timeout of MAX_SCHEDULE_TIMEOUT.

 Ahh I didn't know that.

> sigtimedwait with a zero timeout are meant to be used by applications
> needing to poll signal queues while doing other processing. Having
> sigtimedwait always block for at least 10 ms can have a quite negative
> impact on such applications.

 If you want to return immediately (and there might not be data), the
answer usually given is...

 sigqueue( ... );
 sigwaitinfo( ... );

 If the above will still schedule, then Linus might be more likely to
take a patch (I'd guess that he'd expect sigtimedwait() to behave like
sleep() in most other cases, though).
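
 Ie. something like this (untested sketch; the names and the choice of
SIGRTMIN as the sentinel are mine), assuming the watched signals are
already blocked in the caller:

/* Poll a signal queue without sigtimedwait()'s minimum tick of sleep:
 * queue a sentinel signal to ourselves so sigwaitinfo() always has
 * something to return immediately.  If the sentinel is all that comes
 * back, the queue was effectively empty. */
#include <signal.h>
#include <unistd.h>

static int poll_signal_queue(const sigset_t *watched, siginfo_t *info)
{
    union sigval val = { 0 };
    sigset_t all = *watched;

    sigaddset(&all, SIGRTMIN);            /* watch the sentinel too */
    sigqueue(getpid(), SIGRTMIN, val);    /* guarantee a queued signal */

    if (sigwaitinfo(&all, info) < 0)
        return -1;
    if (info->si_signo == SIGRTMIN)
        return 0;                         /* only our sentinel: empty */
    return 1;                             /* a real signal was pending */
}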

-- 
James Antill -- [EMAIL PROTECTED]
"If we can't keep this sort of thing out of the kernel, we might as well
pack it up and go run Solaris." -- Larry McVoy.


