Re: [Oscar-users] Jobs not executed in all hosts

Michael Edwards Tue, 21 Nov 2006 19:23:08 -0800

Comments inline:

On 11/21/06, Saravana Kumar <[EMAIL PROTECTED]> wrote:
> On Monday 20 November 2006 20:01, Michael Edwards wrote:
> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
> 66.hulk.gai.net      user     workq    TestJob      6571     3   1    --  10000 R 00:02
>    node2+node1+node3


That output only says that you have successfully reserved 3 nodes;
that is, the job has the resources of nodes 1, 2, and 3 available to
it.  Torque does not know whether they are actually being used.  That
is something you would have to check with Ganglia.  Open a web browser
and go to http://localhost/ganglia and you will get a lot of
performance information about the cluster.  Run the job and watch the
CPU graphs of nodes 1, 2, and 3.  I strongly suspect you will only see
one of them do anything significant.
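
To make the "reserved" versus "used" distinction concrete, here is a
minimal Python sketch.  It assumes the usual Torque behaviour of
listing the job's hosts in the file named by the PBS_NODEFILE
environment variable and running the job script itself on only one of
them; the function name reserved_nodes is my own invention, not
something Torque provides.

```python
import os
import socket


def reserved_nodes(nodefile_path):
    """Return the unique hosts listed in a Torque node file, in order."""
    with open(nodefile_path) as f:
        hosts = [line.strip() for line in f if line.strip()]
    # A host can appear once per reserved processor, so drop repeats.
    return list(dict.fromkeys(hosts))


if __name__ == "__main__":
    # PBS_NODEFILE is only set inside a running job; outside a job
    # this sketch just says so.
    path = os.environ.get("PBS_NODEFILE")
    if path is None:
        print("not running inside a Torque job")
    else:
        print("reserved:", "+".join(reserved_nodes(path)))
        print("this script is running on:", socket.gethostname())
```

Run from inside a job, this should list all the reserved nodes but
report only a single hostname for the script itself, which is the
point: nothing runs on the other nodes unless your job starts it there.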

> I don't know why the queue(hp) that i created doesn't execute the job in all
> hosts.
> > Do  you want one identical copy of the sas script to run on each node?
> I don't understand your question. I want the sas job to run in one node(to
> keep file io in one node) but use all the free nodes' cpu cycles which i
> think will improve the performance and decreases the code running time.

Unless SAS is much smarter than I think it is, this is unlikely to
happen without a lot of work on your part.  Clusters of any sort (this
is not an OSCAR problem) do not divide up work automatically or
smartly the way you (and lots of other people, don't feel bad) want
them to.  Programs differ too much in how they work for this to be
done in any intelligent way that I have heard of.  There are some
cluster systems (openMosix, Kerrighed) which will try to do this to a
very limited degree, but they still require very specially designed
programs to gain any significant advantage.  Kerrighed has an OSCAR
project called SSI-OSCAR which, I believe, is working on a 5.0-based
version, if you want to try it.

Anyway, the take-home point is that spreading work across nodes in the
way you are describing requires specially written software, which is
very rare.  There are basically two ways clusters are used.  The first
is (usually custom-written) code which uses MPI or some other
communication protocol to split up the work among the nodes in a smart
(custom) way.  This is called parallel programming.  The other way is
to have many non-parallel jobs running on the cluster at the same
time.  You could run your SAS script on all 3 nodes, for example, with
each copy walking a different part of a parameter space.  This is
called serial programming, and it still usually requires some clever
scripting to deal with the peculiarities of cluster computing.
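
As a rough sketch of the serial approach, the Python below divides a
parameter space among the three nodes round-robin.  The parameter
values and the function name split_round_robin are hypothetical, made
up for illustration; only the node names come from your qstat output.

```python
def split_round_robin(params, nodes):
    """Assign each parameter value to a node, cycling through the nodes."""
    plan = {node: [] for node in nodes}
    for i, value in enumerate(params):
        plan[nodes[i % len(nodes)]].append(value)
    return plan


# Hypothetical parameter space; node names are from the qstat output.
nodes = ["node1", "node2", "node3"]
params = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

for node, values in split_round_robin(params, nodes).items():
    # Each node would run its own ordinary, non-parallel copy of the
    # script once per value; how the value is passed in is up to you.
    print(node, values)
```

In practice you would more likely submit each run as its own one-node
job and let Torque decide where it goes, but the idea is the same:
every copy is an independent serial job working on its own share.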

I am neglecting the type of automatic parallelization which you are
describing because I see it as a special case of parallel programming
which, as I mentioned, basically doesn't exist.

Others may disagree with me on this point, so we'll see if they chime in.

> My sas jobs have lot of io. Is it better to run the sas job in one node and
> use other node's cpu cycles or can the sasjob itself be split to multiple
> disks and cpus with the script(not clear on this).
>
> The whole point of installing this cluster suite was to minimize the time
> taken by sas jobs.

I'll poke around and see if SAS has a parallel programming module.  I
know MATLAB does, but either one would still require extra coding to use.

The cluster can still help improve your overall processing time, but
it will probably be in a serial mode rather than a parallel mode.
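
A back-of-the-envelope sketch of what serial mode buys you, in Python.
The job count and run time are hypothetical numbers, and the model
assumes the jobs are independent and do not slow each other down,
which may not hold for I/O-heavy jobs sharing one disk.

```python
import math


def batch_wall_time(n_jobs, n_nodes, minutes_per_job):
    """Wall time when independent jobs run one per node, in waves."""
    waves = math.ceil(n_jobs / n_nodes)
    return waves * minutes_per_job


# Hypothetical numbers: 9 independent SAS jobs of 20 minutes each.
one_node = batch_wall_time(9, 1, 20)     # 180 minutes
three_nodes = batch_wall_time(9, 3, 20)  # 60 minutes
print(one_node, three_nodes)
```

No single job finishes any faster, but the whole batch does, as long
as the jobs are not all waiting on the same disk.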
