On 31.10.2011 at 15:58, Florian La Roche wrote:

> With current gridengine sources from github.com (Univa?) as
> well as the Son of a GridEngine fork, classical spooling method
> is completely broken. If you have more than one slot configured
> or several hosts, this should result in error messages showing
> up in the logs, even though jobs may run ok.

Looks like I completely misunderstand the above issue: why should it be an error to have more than one slot or more than one host?

-- Reuti

> The patch changes from calls to sge_rmdir() that also delete
> recursively all subdirs to standard rmdir() calls to keep the
> spool dir clean, but not delete other jobs' data.
>
> This bug was introduced by the following commit:
> https://github.com/gridengine/gridengine/commit/8c6b462a4d85e1b0713b445fe91347eec60188ff
>
> I've put new rpm packages for RHEL5 and RHEL6 with the below patch
> and based on current SoGE v8.0.0c to http://jur-linux.org/download/el-updates/
> if someone wants to try out real workloads.
>
> big thanks to all the gridengine developers,
> best regards,
>
> Florian La Roche
>
>
> --- a/source/libs/spool/classic/read_write_job.c
> +++ b/source/libs/spool/classic/read_write_job.c
> @@ -688,14 +688,12 @@ int job_remove_spool_file(u_long32 jobid, u_long32 ja_taskid,
>           }
>
>           /*
> -          * Following sge_rmdir call may fail. We can ignore this error.
> +          * Following rmdir call may fail. We can ignore this error.
>            * This is only an indicator that another task is running which has
>            * been spooled in the directory.
>            */
>           DPRINTF(("try to remove "SFN"\n", task_spool_dir));
> -         if (sge_rmdir(task_spool_dir, &error_msg)) {
> -            ERROR((SGE_EVENT, MSG_JOB_CANNOT_REMOVE_SS, MSG_JOB_TASK_SPOOL_FILE, error_msg_buffer));
> -         }
> +         rmdir(task_spool_dir);
>
>           /*
>            * a task spool directory has been removed: reinit
> @@ -735,16 +733,15 @@ int job_remove_spool_file(u_long32 jobid, u_long32 ja_taskid,
>           try_to_remove_sub_dirs = 1;
>        }
>        /*
> -       * Following sge_rmdir calls may fail. We can ignore these errors.
> +       * Following rmdir calls may fail. We can ignore these errors.
>         * This is only an indicator that another job is running which has been
>         * spooled in the same directory.
>         */
>        if (try_to_remove_sub_dirs) {
>           DPRINTF(("try to remove "SFN"\n", spool_dir_third));
> -
> -         if (!sge_rmdir(spool_dir_third, NULL)) {
> +         if (!rmdir(spool_dir_third)) {
>              DPRINTF(("try to remove "SFN"\n", spool_dir_second));
> -            sge_rmdir(spool_dir_second, NULL);
> +            rmdir(spool_dir_second);
>           }
>        }
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
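
For readers following the thread: the patch relies on POSIX rmdir() refusing to remove a directory that still has entries, so a spool directory shared with another job or task is left alone. The sketch below only illustrates that behaviour. It is not Grid Engine code; the helper names remove_dir_if_empty() and demo_shared_spool_dir(), the /tmp location and the file names job_a/job_b are all made up for the example.

```c
#define _XOPEN_SOURCE 700   /* for mkdtemp() under strict compiler modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not part of Grid Engine: remove a directory only
 * if it is empty. Returns 0 if removed, 1 if it still has entries
 * (POSIX allows ENOTEMPTY or EEXIST here), -1 on any other error. */
int remove_dir_if_empty(const char *path)
{
    assert(path != NULL);
    if (rmdir(path) == 0) {
        return 0;
    }
    return (errno == ENOTEMPTY || errno == EEXIST) ? 1 : -1;
}

/* Create an empty file; returns 0 on success, -1 on failure. */
static int touch_file(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) {
        return -1;
    }
    return close(fd);
}

/* Simulates two jobs spooled into one directory (made-up layout).
 * Returns 0 if the directory survives while one job file remains and
 * is removed once the last file is gone; non-zero otherwise. */
int demo_shared_spool_dir(void)
{
    char dir[] = "/tmp/spool_demo_XXXXXX";
    char job_a[64], job_b[64];

    if (mkdtemp(dir) == NULL) {
        return 10;
    }
    snprintf(job_a, sizeof(job_a), "%s/job_a", dir);
    snprintf(job_b, sizeof(job_b), "%s/job_b", dir);
    if (touch_file(job_a) != 0 || touch_file(job_b) != 0) {
        return 11;
    }

    /* First job finishes: its file goes away, the directory must stay. */
    if (unlink(job_a) != 0) {
        return 12;
    }
    if (remove_dir_if_empty(dir) != 1) {
        return 13;
    }
    if (access(job_b, F_OK) != 0) {
        return 14;          /* the other job's data was lost */
    }

    /* Second job finishes: now the directory is empty and can go. */
    if (unlink(job_b) != 0) {
        return 15;
    }
    return remove_dir_if_empty(dir) == 0 ? 0 : 16;
}
```

Calling demo_shared_spool_dir() from a small main() should return 0 on a POSIX system with a writable /tmp. A recursive delete in place of rmdir() would instead remove job_b along with the directory, which is the data loss the patch is meant to avoid.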
