Hello. I wasn't sure where to send this, because the latest version of c3 isn't on the c3 web site; we found it over in the oscar project instead. I've included the C3 contacts and the oscar-devel list on this email.
c3-5.0.1-1.src.rpm from oscar's page has a problem when there is a lot of output. Basically, a pipe write blocks because the process that does the reading is sitting in waitpid and not actively reading from the pipe. The writer can't finish until the pipe is drained, and the reader won't drain it until the writer finishes, so this creates a deadlock.

In the attached patch, I moved the waitpid loop after the pipe reads in the area that caused the problem.

To duplicate the problem, run the script below with a command similar to the following (same problem with more than one node; I just kept it to one node for simplicity):

cexec -p rack_1:15-15 /tmp/runme 2500

The script:

#! /bin/bash
cnt=1
while [ $cnt -le $1 ]; do
    echo "output line $cnt"
    #echo $cnt
    let cnt++
done

-- 
Erik Jacobson - Linux System Software - SGI - Eagan, Minnesota
diff -Narup c3-5.0.1.sgi-ORIG/src/cexec c3-5.0.1.sgi/src/cexec
--- c3-5.0.1.sgi-ORIG/src/cexec	2010-05-10 13:27:54.678174733 -0500
+++ c3-5.0.1.sgi/src/cexec	2010-05-12 15:10:59.988627176 -0500
@@ -488,10 +488,6 @@ try:
 				if( code != 0 ): returncode = code
 			os._exit(returncode)
 		pid_list_outer.append( pid )
-	for pid in pid_list_outer: # wait for all processes spawned to finish
-		pid, code = os.waitpid(pid,0)
-		code = code>>8
-		if( code != 0 ): returncode = code
 	if not dryrun:
 		output = ""
@@ -515,9 +511,13 @@ try:
 					continue
 			except IndexError:
 				print "No computer returned any output."
-
-
-
+
+	for pid in pid_list_outer: # wait for all processes spawned to finish
+		pid, code = os.waitpid(pid,0)
+		code = code>>8
+		if( code != 0 ): returncode = code
+
+
 except KeyboardInterrupt:
 	print "Keyboard interrupt\n"
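
For anyone who wants to see the ordering issue outside of cexec, here is a minimal standalone sketch in Python 3 of the order the patch establishes: read the pipe to end-of-file first, then call waitpid. It is not code from c3. The function name run_and_collect and the line count are made up for illustration, and the child here simply writes lines itself instead of running a remote command.

```python
import os


def run_and_collect(line_count):
    """Fork a child that writes line_count lines into a pipe.

    Hypothetical helper for illustration only; returns (output, exit_code).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()

    if pid == 0:
        # Child: write all the output, then leave via os._exit so no
        # parent-side cleanup or buffered data is replayed in the child.
        os.close(read_fd)
        status = 0
        try:
            with os.fdopen(write_fd, "w") as writer:
                for i in range(1, line_count + 1):
                    writer.write("output line %d\n" % i)
        except Exception:
            status = 1
        finally:
            os._exit(status)

    # Parent: close our copy of the write end so read() can see EOF.
    os.close(write_fd)

    # Drain the pipe BEFORE waiting. If waitpid came first, the child could
    # block in write() once the pipe filled up, and neither side would
    # ever make progress.
    with os.fdopen(read_fd) as reader:
        output = reader.read()

    # Only now reap the child and pick up its exit code.
    _, status = os.waitpid(pid, 0)
    return output, os.WEXITSTATUS(status)


if __name__ == "__main__":
    text, code = run_and_collect(20000)
    print("lines read:", len(text.splitlines()), "exit code:", code)
```

Swapping the read and the waitpid call in this sketch should reproduce the hang once the child's output is larger than the pipe can hold, which is the same situation the reproducer script triggers through cexec.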