On 31.10.2011 at 15:58, Florian La Roche wrote:

> With the current gridengine sources from github.com (Univa?) as
> well as the Son of Grid Engine fork, the classical spooling method
> is completely broken. If you have more than one slot configured
> or several hosts, this should result in error messages showing
> up in the logs, even though jobs may run OK.

It looks like I completely misunderstand the issue above:

Why should it be an error to have more than one slot or more than one host?

-- Reuti


> The patch replaces the calls to sge_rmdir(), which also recursively
> deletes all subdirectories, with standard rmdir() calls. This keeps
> the spool dir clean without deleting other jobs' data.
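To illustrate why plain rmdir() is safe here: POSIX rmdir() only removes an empty directory and fails with ENOTEMPTY (or EEXIST) otherwise, so a directory that still holds another task's spool file is left alone. The C sketch below is only an illustration under assumptions: the temporary directory layout and the helper name rmdir_keeps_other_tasks_data() are made up for the demo and are not the real gridengine spool tree or API.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo: create a throwaway "task" directory that still
 * contains another task's spool file, then try to remove it with plain
 * rmdir(). Returns 1 if rmdir() refused (ENOTEMPTY or EEXIST) and the
 * other task's file survived, 0 otherwise, -1 on setup failure.
 * The paths are illustrative, not the gridengine spool layout.
 */
int rmdir_keeps_other_tasks_data(void)
{
    char base[] = "/tmp/spool_demoXXXXXX";
    char task_dir[64];
    char other_file[96];
    int refused, survived;
    FILE *f;

    if (mkdtemp(base) == NULL)
        return -1;
    snprintf(task_dir, sizeof task_dir, "%s/task", base);
    snprintf(other_file, sizeof other_file, "%s/other_task", task_dir);

    if (mkdir(task_dir, 0755) != 0)
        return -1;
    if ((f = fopen(other_file, "w")) == NULL)
        return -1;
    fputs("spooled data of another task\n", f);
    fclose(f);

    /* Like the patched code: try rmdir() and treat failure as "still in use". */
    refused = rmdir(task_dir) != 0 && (errno == ENOTEMPTY || errno == EEXIST);
    survived = access(other_file, F_OK) == 0;

    /* Remove the demo tree again. */
    unlink(other_file);
    rmdir(task_dir);
    rmdir(base);

    return refused && survived;
}
```

A recursive helper such as sge_rmdir() would instead delete other_task together with the directory, which is the data loss the patch avoids.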
> 
> This bug was introduced by the following commit:
> https://github.com/gridengine/gridengine/commit/8c6b462a4d85e1b0713b445fe91347eec60188ff
> 
> I've put new RPM packages for RHEL5 and RHEL6, based on the current
> SoGE v8.0.0c with the patch below, at http://jur-linux.org/download/el-updates/
> in case someone wants to try them out with real workloads.
> 
> Big thanks to all the gridengine developers,
> Best regards,
> 
> Florian La Roche
> 
> 
> --- a/source/libs/spool/classic/read_write_job.c
> +++ b/source/libs/spool/classic/read_write_job.c
> @@ -688,14 +688,12 @@ int job_remove_spool_file(u_long32 jobid, u_long32 ja_taskid,
>          }
> 
>          /*
> -          * Following sge_rmdir call may fail. We can ignore this error.
> +          * Following rmdir call may fail. We can ignore this error.
>           * This is only an indicator that another task is running which has 
>           * been spooled in the directory.
>           */  
>          DPRINTF(("try to remove "SFN"\n", task_spool_dir));
> -         if (sge_rmdir(task_spool_dir, &error_msg)) {
> -            ERROR((SGE_EVENT, MSG_JOB_CANNOT_REMOVE_SS, MSG_JOB_TASK_SPOOL_FILE, error_msg_buffer));
> -         } 
> +      rmdir(task_spool_dir);
> 
>          /* 
>           * a task spool directory has been removed: reinit 
> @@ -735,16 +733,15 @@ int job_remove_spool_file(u_long32 jobid, u_long32 ja_taskid,
>       try_to_remove_sub_dirs = 1;
>    }
>    /*
> -    * Following sge_rmdir calls may fail. We can ignore these errors.
> +    * Following rmdir calls may fail. We can ignore these errors.
>     * This is only an indicator that another job is running which has been
>     * spooled in the same directory.
>     */
>    if (try_to_remove_sub_dirs) {
>       DPRINTF(("try to remove "SFN"\n", spool_dir_third));
> -
> -      if (!sge_rmdir(spool_dir_third, NULL)) {
> +      if (!rmdir(spool_dir_third)) {
>          DPRINTF(("try to remove "SFN"\n", spool_dir_second));
> -         sge_rmdir(spool_dir_second, NULL); 
> +         rmdir(spool_dir_second);
>       }
>    }
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
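The cleanup order in the second hunk (remove spool_dir_third, and only if that succeeds try spool_dir_second) can be sketched in C as below. This is a minimal sketch, not the real job_remove_spool_file(): cleanup_spool_dirs() and demo_cleanup() are hypothetical names, and the demo assumes spool_dir_third is nested inside spool_dir_second, with a sibling directory standing in for another job spooled under the same parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Mirrors the patched logic: rmdir() the third-level spool dir and,
 * only if that worked, rmdir() the second-level one. rmdir() never
 * deletes contents, so a dir still holding another job is kept.
 * Failures are ignored on purpose. Returns the number of dirs removed.
 */
int cleanup_spool_dirs(const char *spool_dir_third, const char *spool_dir_second)
{
    int removed = 0;

    if (rmdir(spool_dir_third) == 0) {
        removed++;
        if (rmdir(spool_dir_second) == 0)
            removed++;
    }
    return removed;
}

/*
 * Hypothetical layout: base/second/third, plus base/second/other when
 * with_other_job is set. Returns cleanup_spool_dirs()'s result, or -1
 * on setup failure.
 */
int demo_cleanup(int with_other_job)
{
    char base[] = "/tmp/spool_cascadeXXXXXX";
    char second[64], third[80], other[80];
    int removed;

    if (mkdtemp(base) == NULL)
        return -1;
    snprintf(second, sizeof second, "%s/second", base);
    snprintf(third, sizeof third, "%s/third", second);
    snprintf(other, sizeof other, "%s/other", second);

    if (mkdir(second, 0755) != 0 || mkdir(third, 0755) != 0)
        return -1;
    if (with_other_job && mkdir(other, 0755) != 0)
        return -1;

    removed = cleanup_spool_dirs(third, second);

    /* Tidy up whatever the cleanup left behind (errors ignored). */
    rmdir(other);
    rmdir(third);
    rmdir(second);
    rmdir(base);
    return removed;
}
```

With no other job present, demo_cleanup(0) should remove both directories; with the sibling directory in place, demo_cleanup(1) should remove only the third-level one and leave the shared parent intact.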

