Hi-- we're using Sun Grid Engine for our Illumina jobs, and are having a bear of a time getting things to finish without blowing up. On almost every job submission, we end up seeing errors like the one below after several hours. I can find nothing else in the SGE logs to tell me what's going on.
We have a cluster of Dell R610s with a dedicated qmaster node. Connections to shared data are all via 10-gig Isilon. Spool directories (classic) are local to each node.

00:02:39] [cairo] [6cyc_5pm_NoIndex_L006_R1_008_eland_extended.txt.oa] error: commlib error: got read error (closing "cairo/shepherd_ijs/1")

How can I go from this kind of message (commlib error) to something that's more meaningful? Thanks for ANY insight with this!

--Kent
