Hi, we're using Sun Grid Engine for our Illumina jobs and are having a bear
of a time getting things to finish without blowing up. With almost every job
submission, we end up seeing errors like the one below after several hours. I
really can find nothing else in the SGE logs to tell me what's going on.

We have a cluster of Dell R610s with a dedicated qmaster node. Connections
to shared data are all via 10-gig Isilon. Spool directories (classic) are
local to each node.

00:02:39]   [cairo] [6cyc_5pm_NoIndex_L006_R1_008_eland_extended.txt.oa]    
error: commlib error: got read error (closing "cairo/shepherd_ijs/1")

How can I go from this kind of message (commlib error) to something that's more 
meaningful?

Thanks for ANY insight with this!  --Kent
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
