On 29.01.2013 at 19:40, Reuti wrote:

> On 29.01.2013 at 12:50, Stefano Bridi wrote:
>
>> On Tue, Jan 29, 2013 at 9:26 AM, William Hay <[email protected]> wrote:
>>>
>>> On 25 January 2013 17:21, Stefano Bridi <[email protected]> wrote:
>>>>
>>>> Hi all, is there a way to use the scratch area (local disk) on the
>>>> compute node in a transparent way from the submitted script's point of
>>>> view?
>>>> What I want to do is to copy the job data to and from the compute
>>>> node's scratch area using the prolog/epilog, but I also need to start
>>>> the submitted script in the scratch area instead of the cwd.
>>>> Is there a way?
>>>
>>> Since you are mucking around with prolog and epilog I assume you have
>>> administrative control of the cluster.
>>> One solution would be to use a starter method to cd to $TMPDIR before
>>> execing the real job. starter_method is a bit of a swiss army chainsaw
>>> though (a flexible but dangerous tool).
>>>
>>> William
>>
>> Yes, I'm the admin: the problem I want to solve this way is lowering the
>> load on the central file server by using the local scratch area on the
>> "master" node as a scratch area.
>> What I mean is that if the job is serial or SMP, it is the local disk
>> (/scratch), and if the job uses multiple nodes (MPI), it is the local
>> disk "/scratch" of the first node, exported via NFS and mounted on the
>> fly via autofs ("/net/n0000/scratch/") on the other nodes.
>> This way, traffic to the central file server ("/home") happens only at
>> the start and at the end of the job, with the possibility of applying a
>> filter to throw away the useless, redundant huge files generated by the
>> software.
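William's starter_method idea could look like the sketch below. This is a hypothetical wrapper, assuming SGE is configured to invoke it with the real job command as its arguments and that $TMPDIR is the per-job node-local scratch directory SGE creates:

```shell
#!/bin/sh
# Hypothetical starter_method sketch: SGE runs this wrapper instead of the
# job script directly and passes the real job command as "$@".
start_in_scratch() {
    # $TMPDIR is the per-job node-local scratch directory managed by SGE.
    cd "${TMPDIR:?no per-job scratch directory}" || exit 1
    # exec replaces the wrapper so the job's exit status and signals
    # reach sge_execd unchanged.
    exec "$@"
}

# The real wrapper would end with:
#   start_in_scratch "$@"
```

It would be set as the starter_method of the relevant queue (via `qconf -mq`); since it then applies to every batch job in that queue, it should be tested carefully before rollout.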
>> Please don't laugh... Currently I'm doing this by copying the files to
>> the scratch area of the first node in the "start PE phase" and copying
>> them back in the "stop PE phase". It was my first try, and I discovered
>> too late the existence of the prolog/epilog way, which I now think
>> should be "the way" of doing this.
>
> Whether you do it in the PE script or prolog/epilog is personal taste IMO.
> If it's only necessary in case of a parallel run, the PE scripts might
> even be the more appropriate place.
>
>> Anyway, currently the users need to do a
>>
>> cd /net/`hostname -s`/scratch/${USER}.${JOB_ID}
>
> Do you create these directories on your own instead of using the built-in
> $TMPDIR?
>
> So this is done also on the machine where the jobscript runs, even though
> it would be accessible in /scratch?
>
> I'm still not sure about the workflow in detail, but I got 2 ideas and
> maybe you can make use of them:
>
> a) Submit the job with a hold to modify the -wd:
>
> reuti@pc15370:~> qsub -h -l h=pc15370 test.sh
> Your job 5532 ("test.sh") has been submitted
> reuti@pc15370:~> qalter -wd /tmp/5532.1.all.q 5532
> modified working directory of job 5532
> reuti@pc15370:~> qrls 5532
> modified hold of job 5532
>
> You need to submit with a hold, as you don't know the job number
> beforehand. So, no `cd` by hand is necessary, but a wrapper around `qsub`
> to do these steps for you.
>
> b) Use path aliasing in SGE. In the file:
>
> /usr/sge/default/common/sge_aliases
>
> you can put a line for each exechost:
>
> /dummy/ * pc15370 /tmp/
>
> reuti@pc15370:~> qsub -h -l h=pc15370 -wd /foobar test.sh
> Your job 5533 ("test.sh") has been submitted
> reuti@pc15370:~> qalter -wd /dummy/5533.1.all.q 5533
> modified working directory of job 5533
> reuti@pc15370:~> qrls 5533
> modified hold of job 5533
>
> You can submit with a plain /scratch/ there, and it will be replaced
> before execution to /tmp/ (man sge_aliases).
Correction: You can submit with a plain /dummy/ there, and it will be
replaced before execution to /tmp/ (man sge_aliases). Maybe it can be used
to map /scratch/ to /net/n0000/scratch/ or alike for each exechost.

> NB: It looks like a bug that the flag to enable path aliasing isn't set by
> `qalter`; hence already at submission time it's necessary to use -cwd or
> -wd /foobar to set it with an arbitrary path.
>
> -- Reuti
>
>> in the job script they submit in order to keep the mechanism working.
>> Now I have a new "user", which in fact is an automated system that I
>> prefer not to tweak, so I'm thinking of adapting GE to that automated
>> system instead.
>> What I'm trying to achieve is a system configured this way, but
>> "hardcoded" and transparent to the end user.
>> I suppose that the prolog/epilog is the right place to do the first/last
>> step (copying data around) and the starter_method is the right way to do
>> the other step; now I need to figure out how to do it and what side
>> effects could emerge: any idea on the second question?
>>
>> Thanks
>> Stefano
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
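Idea (a) above can be packaged as the small wrapper around `qsub` that Reuti suggests. A minimal sketch, assuming qsub's -terse option (which prints only the job id) and the site-specific "<jobid>.1.all.q" scratch naming from the example session:

```shell
#!/bin/sh
# Hypothetical qsub wrapper for idea (a): submit held, retarget the working
# directory to the job's node-local scratch path, then release the hold.

# Compose the per-job scratch working directory; the ".1.all.q" suffix
# mirrors the example session above and is site-specific.
scratch_wd() {
    printf '/tmp/%s.1.all.q' "$1"
}

if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub -h -terse "$@") || exit 1  # -terse prints only the job id
    qalter -wd "$(scratch_wd "$jobid")" "$jobid"
    qrls "$jobid"
fi
```

Users would call the wrapper instead of `qsub` directly, so no manual `cd` in the job script is needed.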
