On Tue, Jan 29, 2013 at 9:26 AM, William Hay <[email protected]> wrote:
>
> On 25 January 2013 17:21, Stefano Bridi <[email protected]> wrote:
>>
>> Hi all, is there a way to use the scratch area (local disk) on the
>> compute node in a transparent way from the submitted script point of
>> view?
>> What I want to do is to copy to and from the compute node scratch area
>> the job data using the prolog/epilog but I need also to start the
>> submitted script in the scratch area instead of the cwd.
>> Is there a way?
>
> Since you are mucking around with prolog and epilog I assume you have
> administrative control of the cluster.
> One solution would be to use a starter method to cd to $TMPDIR before
> execing the real job. starter_method
> is a bit of a swiss army chainsaw though (a flexible but dangerous tool).
>
> William
Yes, I'm the admin. The problem I want to solve this way is to lower
the load on the central file server by using the local disk of the
job's "master" node as the scratch area.
What I mean is that if the job is running in SMP or serial, it is the
local disk (/scratch) and, if the job is using multiple nodes (mpi),
it will be the local disk "/scratch" of the first node exported via
NFS and mounted on the fly via autofs "/net/n0000/scratch/" on the
other nodes.
By doing this, traffic to the central file server ("/home") happens
only at the start and at the end of the job, with the possibility of
applying a filter to throw away the useless, redundant, huge files
generated by the software.
Please don't laugh... Right now I'm doing this by copying the files to
the scratch area of the first node in the "start pe" phase and copying
them back in the "stop pe" phase. It was my first try, and I discovered
too late the existence of the prolog/epilog, which I now think should
be "the way" of doing this.
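A minimal sketch of what the copy steps described above might look like
as helpers shared by a prolog and an epilog. The function names, the
argument order and the "*.tmp" pattern in the usage note are
hypothetical, and it assumes plain cp and find are enough for the data
involved.

```shell
#!/bin/sh
# Hypothetical stage-in/stage-out helpers for a prolog/epilog pair.
# Nothing here is taken from an existing Grid Engine script.

# stage_in SRC DST: copy the contents of SRC (e.g. the submit
# directory) into DST (e.g. the per-job scratch directory).
stage_in() {
    mkdir -p "$2" && cp -Rp "$1"/. "$2"/
}

# stage_out SRC DST PATTERN: delete files under SRC whose name matches
# PATTERN (the "filter"), then copy what is left back to DST.
stage_out() {
    find "$1" -type f -name "$3" -exec rm -f {} +
    cp -Rp "$1"/. "$2"/
}
```

A prolog could then call something like
stage_in "$SGE_O_WORKDIR" "/scratch/${USER}.${JOB_ID}", and the epilog
stage_out "/scratch/${USER}.${JOB_ID}" "$SGE_O_WORKDIR" '*.tmp',
assuming those variables are set in the prolog/epilog environment on
your installation.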
Anyway, currently the users need to do a
cd /net/`hostname -s`/scratch/${USER}.${JOB_ID}
in the job script they submit in order to keep the mechanism working.
Now I have a new "user" which is in fact an automated system that I
would prefer not to tweak, so I am thinking of adapting GE to that
automated system instead.
What I'm trying to achieve is to have a system configured in this way
but "hardcoded" and transparent to the end user.
I suppose that the prolog/epilog is the right place for the first and
last steps (copying the data around) and that the starter_method is the
right way to do the other step (starting the script in the scratch
area). Now I need to figure out how to do it and what side effects
could emerge: any ideas on the side effects?
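For reference, a rough sketch of the kind of starter_method wrapper
being discussed. It assumes the scratch directory layout from the "cd"
line above, that Grid Engine passes the job script and its arguments to
the starter as "$@", and that SGE_STARTER_SHELL_PATH names the shell
the job would otherwise have been started with. The function names and
the /bin/sh fallback are hypothetical, and the sketch ignores the
configured shell start mode and login-shell setting.

```shell
#!/bin/sh
# Hypothetical starter_method wrapper: cd into the per-job scratch
# directory, then hand over to the real job.

# job_scratch_dir HOST USER JOBID: print the scratch path, using the
# same layout as the manual "cd" in the job scripts.
job_scratch_dir() {
    printf '/net/%s/scratch/%s.%s\n' "$1" "$2" "$3"
}

# start_job SCRIPT [ARGS...]: change directory if the scratch area
# exists (otherwise stay in the cwd), then replace this process with
# the job's shell running the job script.
start_job() {
    dir=$(job_scratch_dir "$(hostname -s)" "$USER" "$JOB_ID")
    if [ -d "$dir" ]; then
        cd "$dir" || exit 1
    fi
    exec "${SGE_STARTER_SHELL_PATH:-/bin/sh}" "$@"
}

# Only start a job when arguments were passed, so the file can also be
# sourced for testing.
if [ "$#" -gt 0 ]; then
    start_job "$@"
fi
```

Falling back to the cwd when the scratch directory is missing keeps
jobs running if the stage-in step failed, at the cost of silently
hitting the file server again. Whether that is the right trade-off
depends on the site.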
Thanks
Stefano
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users