Re: [PATCH 0 of 4] Generic AIO by scheduling stacks
On Sat, 10 Feb 2007 18:49:56 -0800, Linus Torvalds wrote:

> And I actually talked about that in one of the emails already. There is no
> way you can beat an event-based thing for things that _are_ event-based.
> That means mainly networking.
>
> For things that aren't event-based, but based on real IO (ie filesystems
> etc), event models *suck*. They suck because the code isn't amenable to it
> in the first place (ie anybody who thinks that a filesystem is like a
> network stack and can be done as a state machine with packets is just
> crazy!).
>
> So you would be crazy to make a web server that uses this to handle _all_
> outstanding IO. Network connections are often slow, and you can have tens
> of thousands outstanding (and some may be outstanding for hours until they
> time out, if ever). But that's the whole point: you can easily mix the
> two, as given in several examples already (ie you can easily make the main
> loop itself basically do just

I don't see any replies to this, so here's my 2¢.

The simple model of what a webserver does when sending static data is:

 1. local_disk_fd = open()
 2. fstat(local_disk_fd)
 3. TCP_CORK on
 4. send_headers();
 5. LOOP
 5a.   sendfile(network_con_fd, local_disk_fd)
 5b.   epoll(network_con_fd)
 6. TCP_CORK off

...and here's my personal plan (again, somewhat simplified), which I
think will be "better":

 7. helper_proc_pipe_fd = DO open() + fstat()
 8. read_stat_event_data(helper_proc_pipe_fd)
 9. TCP_CORK on network_con_fd
 10. send_headers(network_con_fd);
 11. LOOP
 11a.   splice(helper_proc_pipe_fd, network_con_fd)
 11b.   epoll(network_con_fd && helper_proc_pipe_fd)
 12. TCP_CORK off network_con_fd

...where the "helper proc" is doing splice() from disk to the pipe, on
the other end. This, at least in theory, gives you an async webserver
and zero copy disk to network[1].

My assumption is that Evgeniy's aio_sendfile() could fit into that
model pretty easily, and would be faster.
However, from what you've said above you're only trying to help #1 and
#2 (which are likely to be cached in the app anyway), and apps that
want to sendfile() to the network either do horrible hacks like
lighttpd's "AIO"[2], do a read+write copy loop with AIO, or don't use
AIO at all.

[1] And allows things like IO limiting, which aio_sendfile() won't.
[2] http://illiterat.livejournal.com/2989.html

-- 
James Antill -- [EMAIL PROTECTED]
http://www.and.org/and-httpd/ -- $2,000 security guarantee
http://www.and.org/vstr/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: PROBLEM: select() on TCP socket sleeps for 1 tick even if data available
"Stephen D. Williams" <[EMAIL PROTECTED]> writes:

> James Antill wrote:
> ...
> > > The time went from 3.7 to 4.4 seconds per 100000.
> >
> > Ok here's a quick test that I've done. This passes data between 2
> > processes. Obviously you can't compare this to your code or Michael's,
> > however...
>
> I've attached my version of his code with your suggested change.
> Possibly I didn't do it correctly.

It's not a code thing, but I think you are measuring the wrong thing
(at least in relation to the original question). Given the below
diff...

--- sdw-sockperf.c-orig	Wed Apr 11 18:30:28 2001
+++ sdw-sockperf.c	Wed Apr 11 18:33:09 2001
@@ -17,6 +17,7 @@
 #include <unistd.h>
 #include <netinet/tcp.h>
 
+#define USE_DOUBLE_SELECT 0
 
 #ifndef INADDR_NONE
 #define INADDR_NONE ~0
@@ -147,7 +148,8 @@
     int pings = 0;
     struct timeval zerotime;
     int ret;
-
+    unsigned int misses = 0;
+
     FD_ZERO(&fds);
     FD_SET(r, &fds);
     gettimeofday(&then, 0);
@@ -163,8 +165,10 @@
     // if (!(ret = select( ... , &zerotime)))
     //   ret = select( ... , NULL);
     //while ((readfds=fds, ret = select(r+1, &readfds, 0, 0, 0)) ) {
-    while ((ret = select(r+1, &readfds, 0, 0, &zerotime)) ||
-           (readfds=fds, ret = select(r+1, &readfds, 0, 0, 0)) ) {
+    while ((USE_DOUBLE_SELECT &&
+            (ret = select(r+1, &readfds, 0, 0, &zerotime))) ||
+           (++misses &&
+            (readfds=fds, ret = select(r+1, &readfds, 0, 0, 0)) )) {
       if (FD_ISSET(r, &readfds)) {
         char buf[1];
         int n = read(r, buf, sizeof(buf));
@@ -186,6 +190,8 @@
       readfds = fds;
     }
     gettimeofday(&now, 0);
+    fprintf(stderr, "USE_DOUBLE_SELECT=%d\n", USE_DOUBLE_SELECT);
+    fprintf(stderr, "misses=%u\n", misses);
     fprintf(stderr, "elapsed time for 100000 pingpongs is %g\n",
             now.tv_sec - then.tv_sec + (now.tv_usec - then.tv_usec) / 1000000.0);
     fprintf(stderr, "closing %d\n", r);

...I get consistently better results for "localhost 45644 45644 a"
with USE_DOUBLE_SELECT=1; worth noting is that misses == 0 was always
true.

However if I have 2 programs, one doing "localhost 45642 45643" and
one doing "localhost 45643 45642 a" then I get better results for
USE_DOUBLE_SELECT=0[1] and misses is 80-90 thousand (Ie. it has to do
2 select calls 80-90% of the time).

Please note that the original question was: select/poll does a small
schedule if you specify a timeout, and that's bad. However in the two
process case you _need_ the schedule, because there isn't any data
there yet. So again, given the original assumption that data is
available on one of the fd's, doing the double select is better, but
if it isn't then you're wasting time no matter what you do.

As to why my test code got good results even though it uses 2
processes: I used PF_LOCAL/AF_LOCAL sockets, not PF_INET/AF_INET, and
those are fast enough at transferring the data that you don't need the
schedule (misses == 0, if you add similar code to above).

[1] This is on a real computer; on a 486 with 8Meg of RAM I still get
better results with USE_DOUBLE_SELECT=1, and there are still 80%
misses (no idea why).

-- 
# James Antill -- [EMAIL PROTECTED]
  :0:
  * ^From: .*james@and\.org
  /dev/null
Re: PROBLEM: select() on TCP socket sleeps for 1 tick even if data available
"Stephen D. Williams" <[EMAIL PROTECTED]> writes:

> James Antill wrote:
> >
> > I seemed to miss the original post, so I can't really comment on the
> > tests. However...
>
> It was a thread in January, but just ran across it looking for
> something else. See below for results.

Ahh, ok.

> > Michael Lindner wrote:
> ...
> > > > <0.000021>
> > > > 0.000173 select(8, [3 4 6 7], NULL, NULL, NULL) = 1 (in [6]) <0.000047>
> >
> > The strace here shows select() with an infinite timeout, your
> > numbers will be much better if you do (pseudo code)...

[snip ... ]

> > ...basically you completely miss the function call for __pollwait()
> > inside poll_wait (include/linux/poll.h in the linux sources, with
> > __pollwait being in fs/select.c).
>
> Apparently the extra system call overhead outweighs any benefit.

There shouldn't be any "extra" system calls in the fast path. If data
is waiting then you do one call to poll() either way, if not then you
are wasting time blocking so it doesn't matter what you do.

> In any case, what you suggest would be better done in the kernel anyway.

Possibly, however when this has come up before the kernel people have
said it's hard to do in kernel space.

> The time went from 3.7 to 4.4 seconds per 100000.

Ok here's a quick test that I've done. This passes data between 2
processes. Obviously you can't compare this to your code or Michael's,
however...

The results with USE_DOUBLE_POLL on are...

% time ./pingpong
./pingpong  0.15s user 0.89s system 48% cpu 2.147 total
% time ./pingpong
./pingpong  0.19s user 0.91s system 45% cpu 2.422 total
% time ./pingpong
./pingpong  0.10s user 1.02s system 49% cpu 2.282 total

The results with USE_DOUBLE_POLL off are...

% time ./pingpong
./pingpong  0.24s user 1.07s system 50% cpu 2.614 total
% time ./pingpong
./pingpong  0.21s user 1.00s system 44% cpu 2.695 total
% time ./pingpong
./pingpong  0.21s user 1.13s system 50% cpu 2.667 total

Don't forget that the poll here is done with _1_ fd. Most real
programs have more, and so benefit more.

I also did the TRY_NO_POLL, as I was pretty sure what the results
would be, that gives...

% time ./pingpong
./pingpong  0.03s user 0.41s system 50% cpu 0.874 total
% time ./pingpong
./pingpong  0.06s user 0.44s system 58% cpu 0.855 total
% time ./pingpong
./pingpong  0.07s user 0.35s system 51% cpu 0.820 total

pingpong.c

-- 
# James Antill -- [EMAIL PROTECTED]
  :0:
  * ^From: .*james@and\.org
  /dev/null
Re: PROBLEM: select() on TCP socket sleeps for 1 tick even if data available
"Stephen D. Williams" <[EMAIL PROTECTED]> writes:

> An old thread, but important to get these fundamental performance
> numbers up there:
>
> 2.4.2 on an 800MHz PIII Sceptre laptop w/ 512MB RAM:
>
> elapsed time for 100000 pingpongs is 3.81327
> 100000/3.81256
> ~26229.09541095746689888159
> 1/.0000379912
> ~26321.88506812103855629724
>
> 26300 compares to 8000/sec. quite well ;-) You didn't give specs for
> your test machine unfortunately.
>
> Since this tests both 'sides' of an application communication, it
> indicates a 'null transaction' rate of twice that.
>
> This was typical cpu usage on a triple run of 100000:
> CPU states: 7.2% user, 92.7% system, 0.0% nice, 0.0% idle

I seemed to miss the original post, so I can't really comment on the
tests. However...

> Michael Lindner wrote:
> >
> > OK, 2.4.0 kernel installed, and a new set of numbers:
> >
> > test             kernel      ping-pongs/s. @ total CPU util  w/SOL_NDELAY
> > sample (2 skts)  2.2.18      100 @ 0.1%                      800 @ 1%
> > sample (1 skt)   2.2.18      8000 @ 100%                     8000 @ 50%
> > real app         2.2.18      100 @ 0.1%                      800 @ 1%
> >
> > sample (2 skts)  2.4.0       8000 @ 50%                      8000 @ 50%
> > sample (1 skt)   2.4.0       10000 @ 50%                     10000 @ 50%
> > real app         2.4.0       1200 @ 50%                      1200 @ 50%
> >
> > real app         Windows 2K  4000 @ 100%
> >
> > The two points that still seem strange to me are:
> >
> > 1. The 1 socket case is still 25% faster than the 2 socket case in 2.4.0
> > (in 2.2.18 the 1 socket case was 10x faster).
> >
> > 2. Linux never devotes more than 50% of the CPU (average over a long
> > run) to the two processes (25% to each process, with the rest of the
> > time idle).
> >
> > I'd really love to show that Linux is a viable platform for our SW, and
> > I think it would be doable if I could figure out how to get the other
> > 50% of my CPU involved. An "strace -rT" of the real app on 2.4.0 looks
> > like this for each ping/pong.
> >
> > 0.052371 send(7, "\0\0\0 \177\0\0\1\3243\0\0\0\2\4\236\216\341\0\0\v\277"..., 32, 0) = 32 <0.000529>
> > 0.000882 rt_sigprocmask(SIG_BLOCK, ~[], [RT_0], 8) = 0 <0.000021>
> > 0.000242 rt_sigprocmask(SIG_SETMASK, [RT_0], NULL, 8) = 0 <0.000021>
> > 0.000173 select(8, [3 4 6 7], NULL, NULL, NULL) = 1 (in [6]) <0.000047>
> > 0.000328 read(6, "\0\0\0 ", 4) = 4 <0.000031>
> > 0.000179 read(6, "\177\0\0\1\3242\0\0\0\2\4\236\216\341\0\0\7\327\177\0\0"..., 28) = 28 <0.000075>

The strace here shows select() with an infinite timeout, your numbers
will be much better if you do (pseudo code)...

struct timeval zerotime;

zerotime.tv_sec  = 0;
zerotime.tv_usec = 0;

if (!(ret = select( ... , &zerotime)))
    ret = select( ... , NULL);

...basically you completely miss the function call for __pollwait()
inside poll_wait (include/linux/poll.h in the linux sources, with
__pollwait being in fs/select.c).

-- 
# James Antill -- [EMAIL PROTECTED]
  :0:
  * ^From: .*james@and\.org
  /dev/null
Re: DNS goofups galore...
"Henning P. Schmiedehausen" <[EMAIL PROTECTED]> writes:

> [EMAIL PROTECTED] (James Antill) writes:
>
> > "Henning P. Schmiedehausen" <[EMAIL PROTECTED]> writes:
>
> >> % telnet mail.bar.org smtp
> >> 220 mail.foo.org ESMTP ready
> >>
> >>
> >> This kills loop detection. Yes, it is done this way =%-) and it breaks
> >> if done wrong.
>
> > This is humour, yeh ?
>
> No. This was a comment on the "loop detection" claim.

[snip ... domain example]

> No. This is a misconfiguration. Yes, RFC821 is a bit rusty but as far
> as I know, nothing has superseded it yet. And Section 3.7 states
> clearly:
>
>    Whenever domain names are used in SMTP only the official names are
>    used, the use of nicknames or aliases is not allowed.

_In_ SMTP, that doesn't say anything about MX records to me, and even
if it does it's very old and needs to change.

> And the 220 Message is defined as
>
>    220 <domain>

So... you should have the reverse for the ip address after the 220.
Which most people do (but not all, mainly due to there not being
enough ips).

[snip CNAME lesson]

The question was, why can't you use CNAMEs. You said 'because of loop
detection'. I said 'But that doesn't work anyway, because you can have
two names pointing at one machine without a CNAME record ... and that
needs to, and currently does, work'.

> Dipl.-Inf. (Univ.) Henning P. Schmiedehausen -- Geschaeftsfuehrer
> INTERMETA - Gesellschaft fuer Mehrwertdienste mbH [EMAIL PROTECTED]

Let me put it this way...

tanstaafl.de.               IN MX 50 mail.hometree.net.
tanstaafl.de.               IN MX 10 mail.intermeta.de.
intermeta.de.               IN MX 50 mail.hometree.net.
intermeta.de.               IN MX 10 mail.intermeta.de.
mail.hometree.net.          IN A     194.231.17.49
mail.intermeta.de.          IN A     212.34.181.3
49.17.231.194.in-addr.arpa. IN PTR   limes.hometree.net.
3.181.34.212.in-addr.arpa.  IN PTR   babsi.intermeta.de.
-- 
# James Antill -- [EMAIL PROTECTED]
  :0:
  * ^From: .*james@and\.org
  /dev/null
Re: DNS goofups galore...
"Henning P. Schmiedehausen" <[EMAIL PROTECTED]> writes:

> [EMAIL PROTECTED] (H. Peter Anvin) writes:
>
> >> In other words, you do a lookup, you start with a primary lookup
> >> and then possibly a second lookup to resolve an MX or CNAME. It's only
> >> the MX that points to a CNAME that results in yet another lookup. An
> >> MX pointing to a CNAME is almost (almost, but not quite) as bad as a
> >> CNAME pointing to a CNAME.
>
> > There is no reducibility problem for MX -> CNAME, unlike the CNAME ->
> > CNAME case.
>
> > Please explain how there is any difference between a CNAME or MX pointing
> > to an A record in a different SOA versus an MX pointing to a CNAME
> > pointing to an A record where at least one pair is local (same SOA).
>
> CNAME is the "canonical name" of a host. Not an alias. There are good
> descriptions of the problem with this in the bat book. Basically it
> breaks if your mailer expects one host on the other side (mail.foo.org)
> and suddenly the host reports as mail.bar.org. The sender is
> allowed to assume that the name reported after the "220" greeting
> matches the name in the MX. This is impossible with a CNAME:
>
> mail.foo.org. IN A     1.2.3.4
> mail.bar.org. IN CNAME mail.foo.org.
> bar.org.      IN MX 10 mail.bar.org.
>
> % telnet mail.bar.org smtp
> 220 mail.foo.org ESMTP ready
>
>
> This kills loop detection. Yes, it is done this way =%-) and it breaks
> if done wrong.

This is humour, yeh ? I would be surprised if even sendmail assumed
braindamage like the above. For instance something that is pretty
common is...

foo.example.com.        IN A     4.4.4.4
foo.example.com.        IN MX 10 mail.example.com.
foo.example.com.        IN MX 20 backup-mx1.example.com.

; This is really mail.example.org.
backup-mx1.example.com. IN A     1.2.3.4

...another is to have "farms" of mail servers (the A record for the MX
has multiple entries). If it "broke" as you said, then a lot of mail
wouldn't be being routed.
-- 
# James Antill -- [EMAIL PROTECTED]
  :0:
  * ^From: .*james@and\.org
  /dev/null
Re: Traceroute without s bit
Olaf Kirch <[EMAIL PROTECTED]> writes:

> 3. There seems to be a bug somewhere in the handling of poll().
>    If you observe the traceroute process with strace, you'll
>    notice that it starts spinning madly after receiving the
>    first bunch of packets (those with ttl 1).
>
>    13:43:02 poll([{fd=4, events=POLLERR}], 1, 5) = 0
>    13:43:02 poll([{fd=4, events=POLLERR}], 1, 5) = 0
>    13:43:02 poll([{fd=4, events=POLLERR}], 1, 5) = 0
>    13:43:02 poll([{fd=4, events=POLLERR}], 1, 5) = 0
>    ...
>
>    I.e. the poll call returns as if it had timed out, but it
>    hasn't.

I've just looked at it, but I'm pretty sure this is a bug in your
code. This should fix it...

--- traceroute.c.orig	Wed Dec  6 10:33:48 2000
+++ traceroute.c	Wed Dec  6 10:34:06 2000
@@ -193,7 +193,7 @@
 			timeout = hop->nextsend;
 	}
 
-	poll(pfd, m, timeout - now);
+	poll(pfd, m, (timeout - now) * 1000);
 
 	/* Receive any pending ICMP errors */
 	for (n = 0; n < m; n++) {

-- 
# James Antill -- [EMAIL PROTECTED]
  :0:
  * ^From: .*[EMAIL PROTECTED]
  /dev/null
Re: sigtimedwait with a zero timeout
Henrik Nordstrom <[EMAIL PROTECTED]> writes:

> You are not late. In fact you are the first who has responded to my
> linux-kernel messages at all.
>
> Yes, I am well aware of sigwaitinfo.
>
> sigwaitinfo blocks infinitely if there are no queued signals and is the
> opposite of sigtimedwait with a zero timeout.

Yes, sorry, that's what I thought you wanted to do (Ie. you knew some
data was there and wanted to get it quickly).

> sigwaitinfo is implemented as sigtimedwait with a NULL timeout which is
> read as a timeout of MAX_SCHEDULE_TIMEOUT.

Ahh, I didn't know that.

> sigtimedwait with a zero timeout is meant to be used by applications
> needing to poll signal queues while doing other processing. Having
> sigtimedwait always block for at least 10 ms can have a quite negative
> impact on such applications.

If you want to return immediately (and there might not be data) the
answer given is usually...

sigqueue( ... );
sigwaitinfo( ... );

If the above will still schedule, then Linus might be more likely to
take a patch (I'd guess that he'd look at sigtimedwait() to be like
sleep() in most other cases though).

-- 
James Antill -- [EMAIL PROTECTED]
"If we can't keep this sort of thing out of the kernel, we might as
 well pack it up and go run Solaris." -- Larry McVoy.