This is how select() works regardless of the version of Red Hat or any
other distribution.
An fd_set, defined in <sys/select.h>, is a bit array of __FD_SETSIZE bits,
and __FD_SETSIZE is defined as 1024 in <bits/typesizes.h>, so it can only
hold file descriptors 0 through 1023.

/David


On Tue, Mar 12, 2013 at 11:30 AM, Hongjia Cao <[email protected]> wrote:

> When launching tasks on about 1000 nodes, I get the following error
> message sometimes:
>
>      srun: error: io_init_msg_read timed out
>      srun: error: failed reading io init message
>
> I found the problem in src/common/fd.c, where "select()" is used to check
> whether a file descriptor is readable. Running the attached program
> tsel.c shows that on RHEL 6.2 a file descriptor passed to "select()"
> cannot exceed 1023, or "FD_ISSET()" will not function correctly:
>
>      [root@ln0 select]# cat /etc/issue
>      Red Hat Enterprise Linux Server release 6.2 (Santiago)
>      Kernel \r on an \m
>
>      [root@ln0 select]# uname -a
>      Linux ln0 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
>
>      [root@ln0 select]# ./tsel 1023
>      dup2 returned 1023
>      file descriptor 1023 in readable set
>      file descriptor 1023 in exception set
>
>      select returned 1
>      file descriptor 1023 readable
>
>      [root@ln0 select]# ./tsel 1024
>      dup2 returned 1024
>      file descriptor 1024 in readable set
>      file descriptor 1024 in exception set
>
>      select returned 1
>
>      [root@ln0 select]# ./tsel 1027
>      dup2 returned 1027
>      file descriptor 1027 in readable set
>      file descriptor 1027 in exception set
>      select returned -1
>      failed to select:: Bad file descriptor
>
> I changed "select()" to "poll()" to fix this problem.
>
>
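
For reference, a poll()-based readability check could look roughly like the
sketch below. This is not the actual change made to src/common/fd.c: the
names fd_wait_readable() and read_with_timeout() are hypothetical, and
the return conventions are assumptions made for this example. Because
poll() takes the descriptor as a plain int in struct pollfd, there is no
FD_SETSIZE limit on its value.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if a read() would not block (data, EOF or error pending),
 * 0 on timeout, -1 on failure with errno set.
 * Note: after EINTR the wait restarts with the full timeout.
 */
int fd_wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);    /* poll() silently ignores negative fds */

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0)
        return rc;                  /* 0 = timeout, -1 = error */
    if (pfd.revents & POLLNVAL) {   /* fd is not an open descriptor */
        errno = EBADF;
        return -1;
    }
    return 1;
}

/*
 * Hypothetical helper: read up to len bytes once fd is readable.
 * Returns the read() result, or -1 with errno = ETIMEDOUT on timeout.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    int rc = fd_wait_readable(fd, timeout_ms);

    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    return read(fd, buf, len);
}
```

With a check like this, descriptor 1024 or 1027 from the tsel runs above
is handled the same way as descriptor 1023.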
