On 16.06.2011 at 15:03, baf035 wrote:

> We are using SoGE rel. 3910 for tests.
> Submitted jobs are correctly dispatched, but no information is stored in the
> spool directory <SPOOL_DIR>/qmaster/jobs.

You are using classic spooling?


> The qmaster messages file contains entries about a missing file/folder at the
> time the job ends:
> ----------------
> 6/16/2011 10:06:30|schedu|sged2|E|can't find parallel task 50993.1 task 
> past_usage for update in function pe_task_update_master_list_usage
> 06/16/2011 10:06:30|schedu|sged2|E|callback function for event "3941466. 
> EVENT JOB 50993.1 task past_usage USAGE" failed
> 06/16/2011 10:07:10|worker|sged2|E|unlink(jobs/00/0005/0993/common) failed: 
> No such file or directory
> 06/16/2011 10:07:10|worker|sged2|E|can not remove file job spool file: 
> jobs/00/0005/0993/common

The "common" is strange here. What I have seen in the past was just a plain
file, named e.g. 0993, containing the binary information of the job.
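
For reference, the path in the log above (jobs/00/0005/0993 for job 50993) looks like the job number zero-padded to ten digits and split into groups of 2, 4 and 4 digits. A minimal sketch of that mapping, inferred only from these log lines; the function name is hypothetical and not part of any SGE tool:

```python
def job_spool_path(job_id: int) -> str:
    """Return the relative spool path for a job number.

    Assumption (inferred from the log lines in this thread): the job
    number is zero-padded to 10 digits and split into 2/4/4 digit groups.
    """
    digits = f"{job_id:010d}"
    return f"jobs/{digits[:2]}/{digits[2:6]}/{digits[6:]}"


print(job_spool_path(50993))  # jobs/00/0005/0993
```

This can help to check by hand whether the expected entry for a given job exists below <SPOOL_DIR>/qmaster.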


> 06/16/2011 10:07:10|worker|sged2|E|can not remove file job spool directory: 
> jobs/00/0005/0993
> ---------------
> qacct -j 50993 | grep end_time | uniq
> end_time     Thu Jun 16 10:05:52 2011
> --------------
> 
> 
> A migration of the qmasterd leads to a total loss of job information. No
> jobs are shown in qstat after the migration.
>  
> We have also encountered a case where files in <SPOOL_DIR>/qmaster/jobs were
> correctly created but disappeared during
> the migration without a log entry in the messages file.

And it's in a shared space?

-- Reuti


> Please validate this behavior and thanks for a fix.
> 
> baf035
> _______________________________________________
> SGE-discuss mailing list
> [email protected]
> https://arc.liv.ac.uk/mailman/listinfo/sge-discuss


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
