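This is how select() works, regardless of the version of Red Hat or any other distribution. An fd_set is a fixed-size bit array, declared in <sys/select.h>, that holds __FD_SETSIZE bits, and glibc defines __FD_SETSIZE as 1024 in <bits/typesizes.h>. Descriptors 0 through 1023 are therefore the only ones FD_SET() and FD_ISSET() can represent; passing a larger descriptor is undefined behavior, not a RHEL 6.2 bug.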
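
For what it's worth, here is a minimal sketch of the kind of poll()-based readability check you describe. It is not the actual code in src/common/fd.c: the names fd_fits_in_fd_set(), fd_is_readable(), and fd_demo() are hypothetical and only there for illustration. poll() takes an array of struct pollfd instead of a bit array, so the descriptor value is not bounded by FD_SETSIZE.

```c
#include <assert.h>
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: nonzero if fd may legally be used with
 * FD_SET()/FD_ISSET(), i.e. it fits in the fixed-size fd_set. */
int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: returns 1 if fd is readable (or at EOF) within
 * timeout_ms, 0 on timeout, -1 on error. Works for any valid fd,
 * including those >= FD_SETSIZE. The caller handles EINTR. */
int fd_is_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;      /* 0 = timed out, -1 = poll() failed (errno set) */

    /* POLLNVAL or POLLERR without readable data is reported as an error. */
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}

/* Small self-check: an empty pipe is not readable, a pipe with one
 * pending byte is. */
void fd_demo(void)
{
    int p[2];
    int rc;
    ssize_t n;

    rc = pipe(p);
    assert(rc == 0);

    rc = fd_is_readable(p[0], 0);
    assert(rc == 0);            /* nothing written yet */

    n = write(p[1], "x", 1);
    assert(n == 1);

    rc = fd_is_readable(p[0], 0);
    assert(rc == 1);            /* one byte pending */

    close(p[0]);
    close(p[1]);
}
```

If some code path has to keep using select(), a check like fd_fits_in_fd_set() before FD_SET() at least turns the silent misbehavior into an explicit error.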

*/David*

On Tue, Mar 12, 2013 at 11:30 AM, Hongjia Cao <[email protected]> wrote:

> When launching tasks on about 1000 nodes, I get the following error
> message sometimes:
>
> srun: error: io_init_msg_read timed out
> srun: error: failed reading io init message
>
> I find the problem in src/common/fd.c, where "select()" is used to check
> whether a file descriptor is readable. Running the attached program
> tsel.c shows that in RHEL 6.2 file descriptor passed to "select()" can
> not exceed 1023, or "FD_ISSET()" will not function correctly:
>
> [root@ln0 select]# cat /etc/issue
> Red Hat Enterprise Linux Server release 6.2 (Santiago)
> Kernel \r on an \m
>
> [root@ln0 select]# uname -a
> Linux ln0 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011
> x86_64 x86_64 x86_64 GNU/Linux
>
> [root@ln0 select]# ./tsel 1023
> dup2 returned 1023
> file descriptor 1023 in readable set
> file descriptor 1023 in exception set
>
> select returned 1
> file descriptor 1023 readable
>
> [root@ln0 select]# ./tsel 1024
> dup2 returned 1024
> file descriptor 1024 in readable set
> file descriptor 1024 in exception set
>
> select returned 1
>
> [root@ln0 select]# ./tsel 1027
> dup2 returned 1027
> file descriptor 1027 in readable set
> file descriptor 1027 in exception set
> select returned -1
> failed to select:: Bad file descriptor
>
> I changed "select()" to "poll()" to fix this problem.
