I see. So poll() should be used instead of select() for srun to scale.
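
For reference, a minimal sketch of what a poll()-based readability check could look like. The helper names wait_readable() and wait_readable_demo() are hypothetical and this is not the actual code in src/common/fd.c. Since poll() takes an array of struct pollfd rather than a fixed-size bit array, the descriptor value is not bounded by FD_SETSIZE.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper, not the real code from src/common/fd.c.
 * Returns 1 if a read on fd would not block (data, EOF or error pending),
 * 0 if the timeout expired, -1 on failure with errno set. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    /* Restart if a signal interrupts the wait. For brevity the full
     * timeout is reused on every retry. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0)
        return rc;          /* 0: timed out, -1: poll() itself failed */

    if (pfd.revents & POLLNVAL) {
        errno = EBADF;      /* descriptor is not open */
        return -1;
    }
    return 1;               /* POLLIN, POLLHUP or POLLERR is set */
}

/* Small self-check on a pipe: the read end is not readable before a
 * write and is readable afterwards. Returns 0 on success, -1 otherwise. */
int wait_readable_demo(void)
{
    int p[2];
    int before, after;

    if (pipe(p) < 0)
        return -1;

    before = wait_readable(p[0], 0);
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    after = wait_readable(p[0], 0);

    close(p[0]);
    close(p[1]);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

The same call works unchanged for descriptors numbered 1024 and above, provided the process's open-file limit allows them.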

On Tue, 2013-03-12 at 06:44 -0600, David Bigagli wrote:
> This is the way select() works regardless of the version of Red Hat or
> any other distribution.
> 
> The fd_set is a bit array, defined in <sys/select.h>, of __FD_SETSIZE
> bits. __FD_SETSIZE is defined as 1024 in <bits/typesizes.h>, so only
> descriptors 0 through 1023 fit in the set.
> 
> 
> /David
> 
> 
> 
> On Tue, Mar 12, 2013 at 11:30 AM, Hongjia Cao <[email protected]>
> wrote:
>         When launching tasks on about 1000 nodes, I sometimes get the
>         following error message:
>         
>              srun: error: io_init_msg_read timed out
>              srun: error: failed reading io init message
>         
>         I found the problem in src/common/fd.c, where "select()" is used
>         to check whether a file descriptor is readable. Running the
>         attached program tsel.c shows that on RHEL 6.2 a file descriptor
>         passed to "select()" cannot exceed 1023, or "FD_ISSET()" will not
>         function correctly:
>         
>              [root@ln0 select]# cat /etc/issue
>              Red Hat Enterprise Linux Server release 6.2 (Santiago)
>              Kernel \r on an \m
>         
>              [root@ln0 select]# uname -a
>              Linux ln0 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
>         
>              [root@ln0 select]# ./tsel 1023
>              dup2 returned 1023
>              file descriptor 1023 in readable set
>              file descriptor 1023 in exception set
>         
>              select returned 1
>              file descriptor 1023 readable
>         
>              [root@ln0 select]# ./tsel 1024
>              dup2 returned 1024
>              file descriptor 1024 in readable set
>              file descriptor 1024 in exception set
>         
>              select returned 1
>         
>              [root@ln0 select]# ./tsel 1027
>              dup2 returned 1027
>              file descriptor 1027 in readable set
>              file descriptor 1027 in exception set
>              select returned -1
>              failed to select:: Bad file descriptor
>         
>         I changed "select()" to "poll()" to fix this problem.
>         
> 
> 
> 
