This is very interesting... it could be the source of my problem as well. I
have a section of code that does something like this:

         close(sockfd_);
         connected_ = false;

and elsewhere in another thread a select is done on all connected sockets 
(checking each instance of connected_). Is it possible that a race 
condition exists where the socket is closed, but connected_ is not set to 
false before the select is performed?
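
For what it's worth, here is a minimal sketch of one defensive ordering I could use. The conn_t struct and the names conn_close() and build_read_set() are made up for illustration and only stand in for my class with sockfd_ and connected_. The idea is to clear the flag and invalidate the descriptor before calling close(), so the select side can never find an entry that is still marked connected but whose fd is already closed. Whether the window can actually be hit depends on whether anything between the two statements can yield to another thread, which I have not verified.

```c
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical connection record; stands in for the class
 * that holds sockfd_ and connected_. */
typedef struct {
    int  sockfd;
    bool connected;
} conn_t;

/* Mark the connection as dead first, then close the descriptor. */
static void conn_close(conn_t *c)
{
    int fd = c->sockfd;

    c->connected = false;
    c->sockfd = -1;
    if (fd >= 0)
        close(fd);
}

/* Fill rset with the descriptors of live connections only.
 * Returns the nfds argument for select() (highest fd + 1, or 0). */
static int build_read_set(const conn_t *conns, int n, fd_set *rset)
{
    int maxfd = -1;

    FD_ZERO(rset);
    for (int i = 0; i < n; i++) {
        int fd = conns[i].sockfd;

        if (conns[i].connected && fd >= 0 && fd < FD_SETSIZE) {
            FD_SET(fd, rset);
            if (fd > maxfd)
                maxfd = fd;
        }
    }
    return maxfd + 1;
}
```

The select thread would call build_read_set() right before each select, so a connection closed in the meantime simply drops out of the set.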

In general, does close() block the whole process? If so, then I'd guess
this race condition can't happen. The Pth man page seems to say "yes",
but I thought I should ask the experts...
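
Whatever the answer is, the select side could also guard itself against a stale descriptor. A rough sketch, assuming plain POSIX fcntl() and two helper names I invented (fd_is_open() and prune_closed_fds()): fcntl(fd, F_GETFD) fails with EBADF when fd is not an open descriptor, so closed entries can be dropped from the set before select is called. This only narrows the window rather than removing it, since the descriptor number could be closed or reused right after the check.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/select.h>

/* Hypothetical helper: true if fd currently refers to an open descriptor. */
static bool fd_is_open(int fd)
{
    if (fd < 0)
        return false;
    /* F_GETFD only reads the descriptor flags; it fails with EBADF
     * when fd is not open. */
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Hypothetical helper: clear every closed descriptor below nfds from set.
 * Returns the number of descriptors that were removed. */
static int prune_closed_fds(fd_set *set, int nfds)
{
    int removed = 0;

    for (int fd = 0; fd < nfds && fd < FD_SETSIZE; fd++) {
        if (FD_ISSET(fd, set) && !fd_is_open(fd)) {
            FD_CLR(fd, set);
            removed++;
        }
    }
    return removed;
}
```

A non-zero return from prune_closed_fds() would also be a cheap way to log that the bookkeeping and the real descriptor state have drifted apart.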

thanks

Brent

At 05:27 PM 7/24/2001 -0700, Shawn Wagner wrote:
>On Tue, Jul 24, 2001 at 09:13:18PM +0200, Ralf S. Engelschall wrote:
> > On Mon, Jul 23, 2001, Shawn Wagner wrote:
> >
> > >
> > > With some more debugging, it looks like using hard syscall wrappers is
> > > the problem. Turning that off makes everything work for me, at least
> > > (Linux 2.2). With them, pth_sched_eventmanager() isn't always detecting
> > > fd i/o with a timeout correctly and was exiting early instead of waiting
> > > for an event to happen. This is annoying, since one of the threads does
> > > gethostbyaddr()'s, which I didn't want possibly blocking the whole
> > > server. But at least it's not aborting now.
> >
> > Hhmm... so you think it is not a semantical bug in the scheduler, but
> > instead a side-effect of some syscalls. What happens if you use the soft
> > syscall wrappers? Do you have a _SMALL_ test application at hand which I
> > can use to deterministically reproduce the problem?
> >
>
>I've narrowed it down further. The problem seemed to be happening after
>trying to connect to a port that's not listening (An auth/ident lookup, in
>this case). I found a bug in my code returning the wrong value when
>this happens, so a later select was done on a closed fd. After fixing
>that, everything's working well... and I have no idea why this wasn't
>happening all the time instead of just with hard syscalls.
>
>The error in my code revealed that the select done in pth_sched_eventmanager()
>doesn't deal well with errors besides EINTR (EBADF in my case). The test
>program below aborts no matter the syscall setting.
>
>--
>Shawn Wagner
>[EMAIL PROTECTED]
>
>#include <pth.h>
>#include <stdio.h>   /* perror(), printf() */
>#include <stdlib.h>
>#include <string.h>
>#include <unistd.h>
>
>void *thread_select(void *x) {
>   fd_set r;
>   struct timeval t;
>
>   FD_ZERO(&r);
>
>   /* Nice high number that shouldn't be open */
>   FD_SET(50, &r);
>
>   t.tv_sec = 5;
>   t.tv_usec = 0;
>
>   if (pth_select(51, &r, NULL, NULL, &t) < 0)
>     perror("pth_select");
>   printf("Never reached.\n");
>   return NULL;
>
>}
>
>
>int main(void) {
>   pth_t child;
>   pth_attr_t s_attr;
>
>   pth_init();
>
>   s_attr = pth_attr_new();
>   pth_attr_set(s_attr, PTH_ATTR_NAME, "select");
>   child = pth_spawn(s_attr, thread_select, NULL);
>   pth_join(child, NULL);
>
>   return EXIT_SUCCESS;
>}
>
>______________________________________________________________________
>GNU Portable Threads (Pth)            http://www.gnu.org/software/pth/
>User Support Mailing List                            [EMAIL PROTECTED]
>Automated List Manager (Majordomo)           [EMAIL PROTECTED]


