That's right, it's the jobs that are crashing. I have a Python script that generates the jobs and submits them to the queue with qsub.
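To make the setup concrete, here is a minimal sketch of what such a generator loop might look like. It is not the actual script: the function names, the job naming scheme, and the `delay` and `runner` parameters are all assumptions made for illustration. Only `qsub` itself and its `-N`, `-cwd` and `-P` options are standard Grid Engine.

```python
import subprocess
import time


def build_qsub_command(script, job_name, project=None):
    """Return the qsub argument list for one job script (hypothetical helper)."""
    cmd = ["qsub", "-N", job_name, "-cwd"]
    if project is not None:
        # -P assigns the job to a Grid Engine project.
        cmd += ["-P", project]
    cmd.append(str(script))
    return cmd


def submit_jobs(scripts, project=None, delay=0.0, runner=subprocess.run):
    """Submit every script with qsub, optionally pausing between submissions.

    `delay` (seconds) and `runner` are assumptions of this sketch: the first
    spaces the submissions out, the second lets the loop be dry-run without
    a real queue.
    """
    scripts = list(scripts)
    commands = []
    for index, script in enumerate(scripts):
        cmd = build_qsub_command(script, f"job_{index:03d}", project)
        runner(cmd, check=True)  # raises if qsub exits with a non-zero status
        commands.append(cmd)
        if delay > 0 and index < len(scripts) - 1:
            time.sleep(delay)
    return commands
```

Calling `submit_jobs(scripts, project="myproject", delay=2.0)` would space the submissions two seconds apart, which is one way to check whether the timing of the submissions matters.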
I have been testing and digging for a while now, and it seems that when I launch multiple jobs under the same project, they all read from the same _mesh folder, and the first 4-5 jobs out of 20 tend to crash. Maybe it's my NFS system that can't handle it.

./thomas

From: Reuti <[email protected]>
To: [email protected]
Cc: [email protected]
Date: 27.01.2014 14:59
Subject: Re: [gridengine users] Random Crash when launching to an empty queue

Hi,

On 27.01.2014 at 10:56, [email protected] wrote:

> I have a strange problem that occurs randomly at times when I try to launch many jobs from the same project into an empty queue system.
>
> The first 4-5 jobs always crash with random errors, mostly about not being able to read a file.

It's not the submission that crashes but the jobs, right? As they are using a unique copy of the job script distributed by SGE to the nodes (unless submitted as a binary), this shouldn't be a locking problem at all.

You are using a job generator to create/submit the jobs in a loop?

-- Reuti

> If I launch jobs manually, everything goes fine; if I launch them one by one into the queue, it works just fine.
>
> Has anyone else experienced such a problem?
>
> It sounds like there is some kind of locking mechanism that prevents other jobs from reading the file at the exact same time.
>
> ./thomas

This e-mail may contain confidential information, or otherwise be protected against unauthorised use. Any disclosure, distribution or other use of the information by anyone but the intended recipient is strictly prohibited. If you have received this e-mail in error, please advise the sender by immediate reply and destroy the received documents and any copies hereof.

Before printing, think about the environment
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
