On 14/09/2007, Alex Kiernan <[EMAIL PROTECTED]> wrote:
> On 14/09/2007, Alex Kiernan <[EMAIL PROTECTED]> wrote:
> > On 13/09/2007, bert hubert <[EMAIL PROTECTED]> wrote:
> > > On Thu, Sep 13, 2007 at 03:16:12PM +0100, Alex Kiernan wrote:
> > > > I've run another 8M queries through it in a test environment, and put
> > > > it onto a live box where its been up for 5 hours, answered over a
> > > > million queries and I've not seen a problem, so I'm hoping that this
> > > > is the right fix. Certainly I've not managed to see it running live
> > > > for anything like this long previously.
> > >
> > > Alex,
> > >
> > > I've committed a slightly different patch which appears to work on our
> > > T2000, but can you verify?
> > >
> >
> > I'll give it a go - from a quick look it looks like it differs only in
> > the handling of ETIME, which given we're only retrieving a single
> > event, I don't think the case of (ret==-1, errno==ETIME, numevents==1)
> > can occur, but I'll be honest its not a hole I'd like to leave.
>
> Gave up almost immediately... :(
>
> I have to admit to being a bit surprised - I'll dig some more.
>
Prepare to be surprised... I added instrumentation so the code looked
like this:

  int ret= port_getn(d_portfd, d_pevents.get(), min(PORT_MAX_LIST, s_maxevents), &numevents, &timeout);
  int e = errno;
  if (ret !=0) {
    L<<Logger::Error<<"1:ret="<<ret<<",errno="<<e<<",numevents="<<numevents<<endl;
  }
  errno = e;

  gettimeofday(now,0);
  e = errno;
  if (ret !=0) {
    L<<Logger::Error<<"2:ret="<<ret<<",errno="<<e<<",numevents="<<numevents<<endl;
  }
  errno = e;

  if(ret < 0) {

And set my tests running - initially I see a few messages like:

  Sep 14 12:21:15 1:ret=-1,errno=62,numevents=0
  Sep 14 12:21:15 2:ret=-1,errno=62,numevents=0

then it all goes silent as I get the workload pushed up - then, when
it all goes wrong, I see:

  Sep 14 12:35:34 1:ret=-1,errno=62,numevents=2
  Sep 14 12:35:34 2:ret=-1,errno=62,numevents=2

i.e. the timer expired (errno 62, ETIME) with two events to process,
but the currently committed code doesn't handle those two events
because it got an ETIME (certainly this is a bizarre API - I can't
think of another UNIX API where ret == -1, errno == E... means partial
success).

Looks like that's what the man page means by "desired" - you can still
get up to max delivered (and a timeout at the same time!).

--
Alex Kiernan
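
To make the consequence of those log lines concrete, here is a minimal sketch of the decision the caller has to make after port_getn() returns. It is not the committed PowerDNS code: eventsToProcess() is a hypothetical helper, and it takes the three observed values (ret, the saved errno, numevents) as plain arguments so it can be compiled and checked without a Solaris event port. It also assumes that any failure other than ETIME should be treated as a real error.

```cpp
#include <cerrno>

// Hypothetical helper (not part of PowerDNS): given the values observed
// after a port_getn() call, return how many entries of the event array
// should be processed, or -1 if the call genuinely failed.
int eventsToProcess(int ret, int savedErrno, unsigned int numevents)
{
  if (ret == 0) {
    // Plain success: numevents entries were delivered.
    return static_cast<int>(numevents);
  }
  if (savedErrno == ETIME) {
    // Timeout, but as the logs show, numevents may still be non-zero.
    // Those events were delivered and must not be dropped.
    return static_cast<int>(numevents);
  }
  // Assumption in this sketch: everything else is a real error.
  return -1;
}
```

With the values from the logs, (ret=-1, errno=ETIME, numevents=0) yields nothing to do, while (ret=-1, errno=ETIME, numevents=2) yields two events that still need handling, which is the case the committed code misses.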