On 27.01.2014 at 15:03, [email protected] wrote:

> That's right, it's the jobs that are crashing.
>
> I have a Python script that generates the jobs and sends them into the queue with
> qsub.

Is this Python script submitting one job after the other, and is it never executed twice at the same time? Otherwise the generated (temporary) job scripts need to have a PID or the like in their name, to avoid one instance submitting the job script of another instance.

-- Reuti

> I have been testing and digging for a while now, and it seems that when I
> launch multiple jobs under the same project, they read from the same _mesh
> folder, and the first 4-5 jobs of 20 tend to crash.
>
> Maybe it's my NFS system that can't handle it.
>
> From: Reuti <[email protected]>
> To: [email protected],
> Cc: [email protected]
> Date: 27.01.2014 14:59
> Subject: Re: [gridengine users] Random Crash when launching to an empty queue
>
> Hi,
>
> On 27.01.2014 at 10:56, [email protected] wrote:
>
> > I have a strange problem that occurs randomly at times, when I try to
> > launch many jobs into an empty queue system from the same project.
> >
> > The first 4-5 jobs always crash with random errors, most of them about
> > not being able to read a file.
>
> It's not the submission that crashes but the jobs, right? As they are using a unique
> copy of the job script distributed by SGE to the nodes (unless submitted as a
> binary), this shouldn't be a locking problem at all.
>
> Are you using a job generator to create/submit the jobs in a loop?
>
> -- Reuti
>
> > If I launch jobs manually everything goes fine; if I launch them one by one
> > into the queue it works just fine.
> >
> > Has anyone else experienced such a problem?
> >
> > It sounds like there is some kind of locking mechanism that prevents other
> > jobs from reading the file at the exact same time.
> >
> > ./thomas
> >
> > This e-mail may contain confidential information, or otherwise
> > be protected against unauthorised use. Any disclosure, distribution or
> > other use of the information by anyone but the intended recipient is
> > strictly prohibited.
> > If you have received this e-mail in error, please advise the sender by
> > immediate reply and destroy the received documents and any copies hereof.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
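
Regarding the point about generated (temporary) job scripts needing a PID or the like in their name: the following is only a minimal sketch of how a generator could do that, not the script used in this thread. The helper names `write_job_script` and `submit` and the `job_<pid>_` prefix are made up for illustration. It assumes Python 3.7 or newer and that `qsub` is on the PATH.

```python
import os
import subprocess
import tempfile


def write_job_script(command, directory=None):
    """Write a one-command job script under a collision-free name.

    mkstemp() creates the file atomically with a random suffix, so two
    generator instances cannot end up with the same file. The PID in the
    prefix only makes it easy to see which instance wrote which script.
    """
    fd, path = tempfile.mkstemp(
        prefix="job_%d_" % os.getpid(), suffix=".sh", dir=directory, text=True
    )
    with os.fdopen(fd, "w") as handle:
        handle.write("#!/bin/sh\n")
        handle.write(command + "\n")
    return path


def submit(path, qsub="qsub"):
    """Hand the script to qsub and return whatever qsub printed."""
    result = subprocess.run(
        [qsub, path], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

A generator loop would then call `submit(write_job_script(cmd))` once per job. This only rules out name clashes between generator instances; it does nothing about several jobs reading the same _mesh folder over NFS at the same moment.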
