Re: possible bug in OpenNTPD code?
this analysis is as correct as thye provided diff, which has been committed. * Anirban Sinha [2008-12-05 03:07]: > Hi: > > I am sort of digging my way through the OpenNTPD codebase for my work. I > think I find a bug in the code. Please help me to understand the reason > if this is not a bug. > > In function ntp_main() (ntp.c), we poll() to check if there are any > events of interest. We do this: > > 1. Check internal fds (PIPE_MAIN) > 2. Then check PIPE_DNS fds > 3. Then check PIPE_HOTPLUG fds > > Next, for the server, we check all the fds we are listening on. And then > finally, for nfs clients, we check the fds for the remote servers. Now, > there's the issue in this line; > > for (j = 1; nfds > 0 && j < idx_peers; j++) { > ... > } > > Shouldn't the index start with 3? That is, shouldn't we do this: > > for (j = 3; nfds > 0 && j < idx_peers; j++) > > since, indices 0,1 and 2 correspond to the three checks I have written > above which are already done. > > In other words, can we apply the following patch to fix the issue? > > Index: ntpd/ntp.c > === > --- ntpd.orig/ntp.c > +++ ntpd/ntp.c > @@ -344,7 +344,7 @@ ntp_main(int pipe_prnt[2], struct ntpd_c > sensor_hotplugevent(hotplugfd); > } > > - for (j = 1; nfds > 0 && j < idx_peers; j++) > + for (j = PFD_MAX; nfds > 0 && j < idx_peers; j++) > if (pfd[j].revents & (POLLIN|POLLERR)) { > nfds--; > if (server_dispatch(pfd[j].fd, conf) == > -1) > > > > Thanks, > > Ani > -- Henning Brauer, h...@bsws.de, henn...@openbsd.org BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
Re: possible bug in OpenNTPD code?
>I'd take a look at your other diff later. Thanks for patching the upstream source in cvs with my patch. Ani > > -Otto
Re: possible bug in OpenNTPD code?
> From: Tom Van Looy [mailto:[EMAIL PROTECTED] > Sent: Wednesday, December 10, 2008 10:50 AM > To: Anirban Sinha > Cc: misc@openbsd.org > Subject: Re: possible bug in OpenNTPD code? > > >>Why would you assume that? That seems a bit hostile. Perhaps the > >>developers are a bit busy at the moment. > > > >True. I generally post on the Linux lists and I believe I am spoiled by > getting quick responses from my postings. In future, I will remember to keep > more patience. > > I would like to respond to this with a small anecdote: I opened a call at > vendor x at 17/sep/2008. The call resulted in me having to open a design > change request for their OS. So, 65 days later (coincidently today), I > received a "tracking number" and probably have to wait an other (max) 90 days > for the developers to decide if they even will make a fix. Oh, did I mention > how ridiculously much we pay each month for their support? This varies GREATLY from vendor to vendor. We had a vendor that put additional features in the product, because we asked for them. Of course, we were paying for between a third and half of a full time engineer in support fees. -- Ed Ahlsen-Girard
Re: possible bug in OpenNTPD code?
>I would like to respond to this with a small anecdote: I opened a call >at vendor x at 17/sep/2008. The call resulted in me having to open a >design change request for their OS. So, 65 days later (coincidently >today), I received a "tracking number" and probably have to wait an >other (max) 90 days for the developers to decide if they even will make >a fix. Oh, did I mention how ridiculously much we pay each month for >their support? Yes. This bites in the ass. But this is not something unheard of. Due to this very same reasons, we have moved our entire OS distro development in house so that we can timely address all issues ourselves as opposed to waiting for a third party vendor to address our issues. In this respect, the open source philosophy really works well. If you post it in an appropriate mailing list, someone somewhere will definitely respond to your message, albeit may be with slight delay. Long live the open source community. Ani
Re: possible bug in OpenNTPD code?
>>Why would you assume that? That seems a bit hostile. Perhaps the >>developers are a bit busy at the moment. > >True. I generally post on the Linux lists and I believe I am spoiled by getting quick responses from my postings. In future, I will remember to keep more patience. I would like to respond to this with a small anecdote: I opened a call at vendor x at 17/sep/2008. The call resulted in me having to open a design change request for their OS. So, 65 days later (coincidently today), I received a "tracking number" and probably have to wait an other (max) 90 days for the developers to decide if they even will make a fix. Oh, did I mention how ridiculously much we pay each month for their support?
Re: possible bug in OpenNTPD code?
Hi! On Tue, Dec 09, 2008 at 10:36:46PM -0800, Anirban Sinha wrote: >>How is it blocking indefinitely? Is wait() not returning -1 with >>errno == ECHILD when there are no children to reap? What led you to >>the conclusion that this code was blocking? (What platform are you >>running this on?) >Hmm, agreed. Looks like I was wrong with my analysis. In any case, I am >running the portable version of the ntpd on Linux. I am definitely >observing the parent still alive and blocked (sleeping) even when the >child is dead. I need to do some more digging on this. IIRC, you can show the wait channel on Linux too, using something like ps alxww|grep ntpd Then you perhaps can see what the parent ntpd process is really waiting for. Kind regards, Hannah.
Re: possible bug in OpenNTPD code?
On Thu, Dec 04, 2008 at 06:03:32PM -0800, Anirban Sinha wrote: > Hi: > > I am sort of digging my way through the OpenNTPD codebase for my work. I > think I find a bug in the code. Please help me to understand the reason > if this is not a bug. > > In function ntp_main() (ntp.c), we poll() to check if there are any > events of interest. We do this: > > 1. Check internal fds (PIPE_MAIN) > 2. Then check PIPE_DNS fds > 3. Then check PIPE_HOTPLUG fds > > Next, for the server, we check all the fds we are listening on. And then > finally, for nfs clients, we check the fds for the remote servers. Now, > there's the issue in this line; > > for (j = 1; nfds > 0 && j < idx_peers; j++) { > ... > } > > Shouldn't the index start with 3? That is, shouldn't we do this: > > for (j = 3; nfds > 0 && j < idx_peers; j++) > > since, indices 0,1 and 2 correspond to the three checks I have written > above which are already done. > > In other words, can we apply the following patch to fix the issue? > Yes, that seems like this bit was forgotten when PFD_MAX was "introduced" in 1.68 of ntp.c. Your fix is correct and commited. thanks for the report -- :wq Claudio
Re: possible bug in OpenNTPD code?
On Tue, Dec 09, 2008 at 09:57:25PM -0800, Philip Guenther wrote: > On Tue, Dec 9, 2008 at 8:10 PM, Anirban Sinha <[EMAIL PROTECTED]> wrote: > > There is yet another bug in Openntpd. This is direct copy-paste from > > openntpd code (ntpd.c:main()): > > > > do { > >if ((pid = wait(NULL)) == -1 && > >errno != EINTR && errno != ECHILD) > >fatal("wait"); > >} while (pid != -1 || (pid == -1 && errno == EINTR)); > > > > What this code intends to do is to reap all children and move on when > > there are no more. Instead, it ends up blocking indefinitely even when > > there are no children to reap! > > How is it blocking indefinitely? Is wait() not returning -1 with > errno == ECHILD when there are no children to reap? What led you to > the conclusion that this code was blocking? (What platform are you > running this on?) > > > > The way I fixed the bug is by doing this: > > + if (chld_pid && (pid = waitpid(chld_pid, &status, 0)) == -1 && > > + errno != EINTR && errno != ECHILD) > > + fatal("wait"); > > + if (pid !=-1) > > + log_info("child %d exited with return code %d", > > pid,WEXITSTATUS(status)); > > This code fails to retry the waitpid() if it returns with EINTR. > > > Philip Guenther Philip is right. The code is ok as is. Besides, in current ntpd has multiple children, so you one child argument is lost as well. I'd take a look at your other diff later. -Otto
Re: possible bug in OpenNTPD code?
On Wed, Dec 10, 2008 at 12:05:05AM -0600, Todd Alan Smith wrote: > On Tue, Dec 9, 2008 at 10:10 PM, Anirban Sinha <[EMAIL PROTECTED]> wrote: > > I hope I will get some response to this. If not, I will assume that > > there is really no interest in fixing bugs in openntpd and in that case, > > Why would you assume that? That seems a bit hostile. Perhaps the > developers are a bit busy at the moment. Indeed it sounded hostile to me. Especially since the op is sending in a buggy diff to code that is ok, afaiks. -Otto
Re: possible bug in OpenNTPD code?
>Why would you assume that? That seems a bit hostile. Perhaps the >developers are a bit busy at the moment. True. I generally post on the Linux lists and I believe I am spoiled by getting quick responses from my postings. In future, I will remember to keep more patience. Ani
Re: possible bug in OpenNTPD code?
>How is it blocking indefinitely? Is wait() not returning -1 with >errno == ECHILD when there are no children to reap? What led you to >the conclusion that this code was blocking? (What platform are you >running this on?) Hmm, agreed. Looks like I was wrong with my analysis. In any case, I am running the portable version of the ntpd on Linux. I am definitely observing the parent still alive and blocked (sleeping) even when the child is dead. I need to do some more digging on this. Apologies. Ani
Re: possible bug in OpenNTPD code?
On Tue, Dec 9, 2008 at 10:10 PM, Anirban Sinha <[EMAIL PROTECTED]> wrote: > I hope I will get some response to this. If not, I will assume that > there is really no interest in fixing bugs in openntpd and in that case, Why would you assume that? That seems a bit hostile. Perhaps the developers are a bit busy at the moment.
Re: possible bug in OpenNTPD code?
On Tue, Dec 9, 2008 at 8:10 PM, Anirban Sinha <[EMAIL PROTECTED]> wrote: > There is yet another bug in Openntpd. This is direct copy-paste from > openntpd code (ntpd.c:main()): > > do { >if ((pid = wait(NULL)) == -1 && >errno != EINTR && errno != ECHILD) >fatal("wait"); >} while (pid != -1 || (pid == -1 && errno == EINTR)); > > What this code intends to do is to reap all children and move on when > there are no more. Instead, it ends up blocking indefinitely even when > there are no children to reap! How is it blocking indefinitely? Is wait() not returning -1 with errno == ECHILD when there are no children to reap? What led you to the conclusion that this code was blocking? (What platform are you running this on?) > The way I fixed the bug is by doing this: > + if (chld_pid && (pid = waitpid(chld_pid, &status, 0)) == -1 && > + errno != EINTR && errno != ECHILD) > + fatal("wait"); > + if (pid !=-1) > + log_info("child %d exited with return code %d", > pid,WEXITSTATUS(status)); This code fails to retry the waitpid() if it returns with EINTR. Philip Guenther
Re: possible bug in OpenNTPD code?
On Tue, Dec 09, 2008 at 08:10:20PM -0800, Anirban Sinha wrote: > I hope I will get some response to this. If not, I will assume that > there is really no interest in fixing bugs in openntpd and in that case, > I will patch only our local copy of the ntpd codebase (as opposed to > reporting to the community). misc@ is probably not the most suited place to report bugs. Better use [EMAIL PROTECTED] Anyway, I'll take a look at your diffs. -Otto
Re: possible bug in OpenNTPD code?
There is yet another bug in Openntpd. This is direct copy-paste from openntpd code (ntpd.c:main()): do { if ((pid = wait(NULL)) == -1 && errno != EINTR && errno != ECHILD) fatal("wait"); } while (pid != -1 || (pid == -1 && errno == EINTR)); What this code intends to do is to reap all children and move on when there are no more. Instead, it ends up blocking indefinitely even when there are no children to reap! The way I fixed the bug is by doing this: Index: openntpd-src/ntpd.c === --- openntpd-src.orig/ntpd.c +++ openntpd-src/ntpd.c @@ -90,10 +90,11 @@ main(int argc, char *argv[]) { struct ntpd_conf lconf; struct pollfdpfd[POLL_MAX]; - pid_tchld_pid = 0, pid; + pid_tchld_pid = 0, pid=0; const char *conffile; int ch, nfds, timeout = INFTIM; int pipe_chld[2]; + int status; extern char *__progname; __progname = _compat_get_progname(argv[0]); @@ -233,11 +234,11 @@ main(int argc, char *argv[]) if (chld_pid) kill(chld_pid, SIGTERM); - do { - if ((pid = wait(NULL)) == -1 && - errno != EINTR && errno != ECHILD) - fatal("wait"); - } while (pid != -1 || (pid == -1 && errno == EINTR)); + if (chld_pid && (pid = waitpid(chld_pid, &status, 0)) == -1 && + errno != EINTR && errno != ECHILD) + fatal("wait"); + if (pid !=-1) + log_info("child %d exited with return code %d", pid, WEXITSTATUS(status)); msgbuf_clear(&ibuf->w); free(ibuf); It forks one child anyway. So it suffices to reap that one child. I hope I will get some response to this. If not, I will assume that there is really no interest in fixing bugs in openntpd and in that case, I will patch only our local copy of the ntpd codebase (as opposed to reporting to the community). Thanks, Ani >-Original Message----- >From: Anirban Sinha >Sent: Thursday, December 04, 2008 6:04 PM >To: 'misc@openbsd.org' >Subject: possible bug in OpenNTPD code? > >Hi: > >I am sort of digging my way through the OpenNTPD codebase for my work. I >think I find a bug in the code. Please help me to understand the reason >if this is not a bug. > >In function ntp_main() (ntp.c), we poll() to check if there are any >events of interest. We do this: > >1. Check internal fds (PIPE_MAIN) >2. Then check PIPE_DNS fds >3. Then check PIPE_HOTPLUG fds > >Next, for the server, we check all the fds we are listening on. And then >finally, for nfs clients, we check the fds for the remote servers. Now, >there's the issue in this line; > >for (j = 1; nfds > 0 && j < idx_peers; j++) { >... >} > >Shouldn't the index start with 3? That is, shouldn't we do this: > >for (j = 3; nfds > 0 && j < idx_peers; j++) > >since, indices 0,1 and 2 correspond to the three checks I have written >above which are already done. > >In other words, can we apply the following patch to fix the issue? > >Index: ntpd/ntp.c >=== >--- ntpd.orig/ntp.c >+++ ntpd/ntp.c >@@ -344,7 +344,7 @@ ntp_main(int pipe_prnt[2], struct ntpd_c > sensor_hotplugevent(hotplugfd); > } > >- for (j = 1; nfds > 0 && j < idx_peers; j++) >+ for (j = PFD_MAX; nfds > 0 && j < idx_peers; j++) > if (pfd[j].revents & (POLLIN|POLLERR)) { > nfds--; > if (server_dispatch(pfd[j].fd, conf) == -1) > > > >Thanks, > >Ani
possible bug in OpenNTPD code?
Hi: I am sort of digging my way through the OpenNTPD codebase for my work. I think I find a bug in the code. Please help me to understand the reason if this is not a bug. In function ntp_main() (ntp.c), we poll() to check if there are any events of interest. We do this: 1. Check internal fds (PIPE_MAIN) 2. Then check PIPE_DNS fds 3. Then check PIPE_HOTPLUG fds Next, for the server, we check all the fds we are listening on. And then finally, for nfs clients, we check the fds for the remote servers. Now, there's the issue in this line; for (j = 1; nfds > 0 && j < idx_peers; j++) { ... } Shouldn't the index start with 3? That is, shouldn't we do this: for (j = 3; nfds > 0 && j < idx_peers; j++) since, indices 0,1 and 2 correspond to the three checks I have written above which are already done. In other words, can we apply the following patch to fix the issue? Index: ntpd/ntp.c === --- ntpd.orig/ntp.c +++ ntpd/ntp.c @@ -344,7 +344,7 @@ ntp_main(int pipe_prnt[2], struct ntpd_c sensor_hotplugevent(hotplugfd); } - for (j = 1; nfds > 0 && j < idx_peers; j++) + for (j = PFD_MAX; nfds > 0 && j < idx_peers; j++) if (pfd[j].revents & (POLLIN|POLLERR)) { nfds--; if (server_dispatch(pfd[j].fd, conf) == -1) Thanks, Ani