Re: [galaxy-dev] PBS_Python Unable to submit jobs

2013-04-05 Thread Steve.Mcmahon
Hi Carrie,

I've had the same problem.  I wanted to get Galaxy to submit to a cluster which 
was running Torque 4.x.  Torque clients need to be 4.x to work with that 
version of the server.  I spent a bit of time looking into this and determined 
that the pbs_python used by Galaxy is not compatible with Torque 4.x.  A new 
version would need to be built.

At that stage I investigated using the DRMAA runner to talk to the Torque 4.x 
server.  That did work if I built the Torque clients with the server name 
hard-coded via --with-default-server.

What the DRMAA runner didn't do was data staging the way the PBS runner does.  
So I started working on some code for that.

I'm looking at giving up on the data staging by moving the Galaxy instance to 
the cluster.

Sorry I couldn't be more help.  I would be interested in comments from Galaxy 
developers about whether the PBS runner will be supported in the future and, 
hence, whether Torque 4.x will be supported.  I'm also interested in whether the 
DRMAA runner will support data staging or whether Galaxy instances really need 
to share file systems with a cluster.

Regards,

Steve McMahon
Solutions architect & senior systems administrator
ASC Cluster Services
Information Management & Technology (IMT)
CSIRO
Phone: +61-2-62142968  |  Mobile:  +61-4-00779318
steve.mcma...@csiro.au  |  www.csiro.au
PO Box 225, DICKSON  ACT  2602
1 Wilf Crane Crescent, Yarralumla  ACT  2600


Re: [galaxy-dev] PBS_Python Unable to submit jobs

2013-04-05 Thread Ganote, Carrie L
Hi Steve,

Apologies, I didn't check the Galaxy list before sending you an email.

I came to mostly the same conclusion. I installed the Torque 4.x client on the 
submit node and I can submit jobs that way through the command line without 
issue.

I can't get pbs_submit to work from pbs_python, however.  It seems the problem 
is either in the way SWIG wraps the C library, or the Python library is somehow 
not talking to trqauthd over localhost:15005, or some other mysterious error.
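
A quick way to separate those two suspects is to probe each one on its own.  
This is just a debugging sketch, assuming trqauthd listens on localhost:15005 
as above, with server2 standing in for the real pbs_server host:

# 1. Is trqauthd accepting connections at all?
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("localhost", 15005))  # raises if nothing is listening
s.close()
print("trqauthd is listening on 15005")

# 2. Can pbs_python open a server connection (the step before pbs_submit)?
import pbs
conn = pbs.pbs_connect("server2")
if conn < 0:
    print("pbs_connect failed:", pbs.error())  # pbs.error() -> (errno, message)
else:
    print("connected, id", conn)
    pbs.pbs_disconnect(conn)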

DRMAA was my first choice, but our server was configured without 
--enable-drmaa, so I haven't been able to submit to it that way either.  We've 
used PBS before, so I thought it was a pretty safe backup plan!

Luckily, I don't have to do staging; we mounted our shared filesystem onto the 
VM running Galaxy - you might look into Lustre if you have any ability to 
control that.  I highly recommend bribing your sysadmins with beer.

I do hope there will be continued work to address this issue - not because I 
have anything against DRMAA, but because I suspect the error stems from false 
assumptions somewhere in the code that would do well to be fixed.

Thanks for your help!

Carrie Ganote


[galaxy-dev] PBS_Python Unable to submit jobs

2013-04-04 Thread Ganote, Carrie L
Hi Galaxy dev,

My setup is a bit non-standard, but I'm getting the following error:
galaxy.jobs.runners.pbs WARNING 2013-04-04 13:24:00,590 (75) pbs_submit failed 
(try 1/5), PBS error 15044: Resources temporarily unavailable

Here is my setup:
Torque3 is installed in /usr/local/bin, and I can use it to connect to the 
(default) server1.
Torque4 is installed in /N/soft/, and I can use it to connect to server2.

I'm running trqauthd, so Torque4 should work.
I can submit jobs to both servers from the command line.  For server2, I 
specify the path to qsub and the server name (-q batch@server2).
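As a sanity check, the same submission works from Python too.  A rough sketch - 
the qsub path is the Torque 4 one above, and test_job.sh is just a placeholder 
script name:

import subprocess
# Pin the Torque 4 qsub and the target server explicitly, mirroring the
# working command line.
cmd = ["/N/soft/qsub", "-q", "batch@server2", "test_job.sh"]
job_id = subprocess.check_output(cmd).decode().strip()  # qsub prints the job id
print("submitted:", job_id)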

In Galaxy, I used torquelib_dir=/N/soft to scramble pbs_python.
My PATH points at /N/soft first, so 'which qsub' returns the Torque 4 qsub.
If I just use pbs:///, it will submit a job to server1 (which shouldn't work, 
because /N/soft/qsub doesn't work from the command line, since the default 
server1 is running Torque3).
If I use pbs://-l vmem=100mb,walltime=00:30:00/, it won't work (the server 
string in pbs.py becomes -l vmem=100mb,walltime=00:30:00 instead of server1).
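That last part at least is just how URLs parse: in pbs://X/, whatever sits 
between the slashes lands in the host slot.  An illustration, not Galaxy's 
actual parsing code:

from urllib.parse import urlparse  # the Python 2 urlparse module behaves the same
for url in ["pbs:///",
            "pbs://-l vmem=100mb,walltime=00:30:00/",
            "pbs://server2/"]:
    print(repr(urlparse(url).netloc))
# prints '', then '-l vmem=100mb,walltime=00:30:00', then 'server2'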
If I use pbs://server2/, I get the "Resources temporarily unavailable" error 
above.  The server string is server2, and I put the following in pbs.py:
whichq = os.popen("which qsub").read()
stats = os.popen("qstat @server2").read()
These return the correct values for server2 using the correct Torque version 4.
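One more check that might localize it: those shell probes go through the Torque 
4 command-line tools, but pbs_submit goes through whatever libtorque the 
compiled extension was linked against.  A sketch, assuming _pbs is the 
SWIG-generated extension that pbs.py wraps (ldd is Linux-specific):

import subprocess
import _pbs  # SWIG-generated C extension inside pbs_python

# If these point at the Torque 3 libtorque under /usr/local instead of
# /N/soft, the version mismatch would explain the failure.
print(_pbs.__file__)
print(subprocess.check_output(["ldd", _pbs.__file__]).decode())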

I'm stumped as to why this is not making the connection.  It's probably 
something about the Python implementation that I'm overlooking.

Thanks for any advice,

Carrie Ganote
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/