[galaxy-dev] Job scheduling: FIFO, or fairer to multiple users?

2012-05-01 Thread Peter Cock
Hello all,

I'm curious if there is any way to manipulate the Galaxy job
queuing in order to be 'fairer' to multiple simultaneous users.
My impression is that Galaxy uses a simple FIFO queue itself,
with cluster jobs offloaded to the cluster queue immediately.

In our case, I'm looking at large BLAST jobs (e.g. 20k queries
against NR), which by their nature are easily subdivided
between nodes (by dividing the query file up). We run these
as one job per node (giving multiple cores for threading).
That works nicely - the question I am currently pondering
is tuning the split strategy, and multiple users.
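
As an illustration of dividing the query file up, here is a minimal sketch of splitting a multi-record FASTA file into a fixed number of parts while keeping each record whole. It is not Galaxy's own splitter; the function name split_fasta and the round-robin dealing are assumptions made for this example, and dealing by record count does not balance total sequence length.

```python
def split_fasta(text, n_parts):
    """Split FASTA text into at most n_parts chunks, keeping records whole.

    Hypothetical helper for illustration; not part of Galaxy.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    # Group lines into records; each record starts at a '>' header line.
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])
        elif records:
            records[-1].append(line)
    # Deal records out round-robin so chunk sizes differ by at most one record.
    chunks = [[] for _ in range(min(n_parts, len(records)))]
    for i, record in enumerate(records):
        chunks[i % len(chunks)].extend(record)
    return ["\n".join(chunk) + "\n" for chunk in chunks]
```

Choosing n_parts below the number of nodes in the queue would leave room for other jobs, at the cost of a longer run for the big job.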

Specifically, we get queue blocking if any one large BLAST
job is divided into as many or more sub-jobs than we have
cluster nodes in the BLAST queue. One user's big BLAST job
can then block other users' small BLAST jobs from even
starting.

I appreciate that whether this is a problem will depend on the
typical jobs run on each Galaxy instance, and the number
and size of nodes in the local cluster, which makes a
one-size-fits-all strategy hard.

I know that in order to be back-end agnostic, Galaxy takes
limited advantage of different cluster backends - but perhaps
the new 'run jobs as user' functionality might be helpful to
allow the cluster to balance jobs between users? Is anyone
doing that already?

Another idea would be for Galaxy to manage its job queue
on a user basis. Currently Galaxy submits all its jobs directly
to the cluster, which can build up a backlog of pending jobs
(whose scheduling is now out of Galaxy's control - probably
simple FIFO depending on the cluster). Rather than giving
the queued jobs to the cluster immediately, Galaxy could
cache the jobs, and submit them gradually (monitor the
cluster queue to see when it needs topping up). This
would then enable Galaxy to interleave jobs from different
users, or apply any other queuing strategy. Too complicated?
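
To make the idea concrete, here is a minimal sketch of a per-user holding queue that releases jobs round-robin. Everything in it (the FairQueue class, put, next_batch, and the free_slots argument) is hypothetical and not existing Galaxy code; it assumes something else polls the cluster to work out how many slots need topping up.

```python
from collections import OrderedDict, deque


class FairQueue(object):
    """Hold pending jobs per user and release them round-robin.

    Hypothetical sketch; Galaxy has no class of this name.
    """

    def __init__(self):
        self._pending = OrderedDict()  # user -> deque of that user's jobs

    def put(self, user, job):
        # Jobs from one user stay in FIFO order relative to each other.
        self._pending.setdefault(user, deque()).append(job)

    def next_batch(self, free_slots):
        """Return up to free_slots (user, job) pairs, one per user in turn."""
        batch = []
        while len(batch) < free_slots and self._pending:
            user, jobs = self._pending.popitem(last=False)
            batch.append((user, jobs.popleft()))
            if jobs:
                self._pending[user] = jobs  # rejoin at the back of the rotation
        return batch
```

With four parts of one user's BLAST job queued ahead of another user's single small job, next_batch(3) releases two parts of the big job and the small job, rather than three parts of the big job.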

I think this is only a problem when the number of cluster
nodes (in any given queue) is similar to or smaller than
the number of parts a job might be broken up into. My
guess is the public Galaxy doesn't do much job splitting
(this code is quite new and not many of the wrappers
exploit it), and has a large cluster.

Is anyone else running into this kind of issue? Perhaps
when Galaxy users are in competition with other cluster
users?

Thanks,

Peter
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Remove libraries using Galaxy code or API

2012-05-01 Thread Nate Coraor
On May 1, 2012, at 4:32 AM, liram_va...@agilent.com wrote:

 Hi Nate,
 
 Great! Thanks.
 
 Any chance that this change will also be included in galaxy-dist soon?

Our next dist was supposed to be out already but it was stalled to fix a few 
bugs.  It'll most likely be in the next one, whenever that is.

--nate



Re: [galaxy-dev] Remove libraries using Galaxy code or API

2012-05-01 Thread liram_vardi
Hi Nate,

Great! Thanks.

Any chance that this change will also be included in galaxy-dist soon?

Thanks,
Liram

-Original Message-
From: Nate Coraor [mailto:n...@bx.psu.edu] 
Sent: Monday, April 30, 2012 11:29 PM
To: VARDI,LIRAM (A-Labs,ex1)
Cc: galaxy-...@bx.psu.edu; BEN-DOR,AMIR (A-Labs,ex1)
Subject: Re: [galaxy-dev] Remove libraries using Galaxy code or API

On Apr 23, 2012, at 4:55 AM, liram_va...@agilent.com 
liram_va...@agilent.com wrote:

 Hello,
  
 I am using the Galaxy API for some actions, and I must say that it is 
 a really great and powerful feature.
 However, I am trying to write a Python script, one of whose goals is 
 to remove some data libraries, but so far I have been unable to find a way to 
 remove a data library or some of its datasets using the API or by a direct call 
 to Galaxy's code.
 I found an old post that claims that this feature is not yet implemented.
 My questions:
 1)  Has this changed since? I mean, is there now a way to clean or 
 completely remove a data library?
 2)  Is there a way to use Galaxy code to remove a library, such as a 
 function that I can call in my script?

Hi Liram,

I've just implemented library deletion in changeset 1640cbaafd09.
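
For script authors, a rough sketch of what a deletion call might look like from Python follows. The thread does not state the route, so this assumes the deletion is exposed as an HTTP DELETE on /api/libraries/<library id> with the API key passed as a key parameter, in line with other Galaxy API resources; please check the changeset before relying on it. The helper name build_delete_request is made up for this example, and the code is written for Python 3's urllib.

```python
import urllib.request


def build_delete_request(base_url, library_id, api_key):
    """Build (but do not send) a DELETE request for one data library.

    Assumption: the route is <base_url>/api/libraries/<library_id>?key=<api_key>.
    """
    url = "%s/api/libraries/%s?key=%s" % (base_url.rstrip("/"), library_id, api_key)
    return urllib.request.Request(url, method="DELETE")
```

Passing the returned object to urllib.request.urlopen() would send the request to the server.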

--nate

  
 Thanks in advance!
 Liram
  
  


[galaxy-dev] Galaxy not killing split cluster jobs

2012-05-01 Thread Peter Cock
Hi all,

We're running our Galaxy with an SGE cluster, using the DRMAA
support in Galaxy, and job splitting. I've noticed that if the user cancels
a job (that was running or queued on the cluster), the job is
shown as deleted in Galaxy, but looking at the queue on the cluster
with qstat shows it persists.

I've not seen anything similar reported except for this PBS issue:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-October/003633.html

When I don't use job splitting, cancelling jobs seems to work:

galaxy.jobs.handler DEBUG 2012-05-01 14:46:47,755 stopping job 57 in
drmaa runner
galaxy.jobs.runners.drmaa DEBUG 2012-05-01 14:46:47,756 (57/26504)
Being killed...
galaxy.jobs.runners.drmaa DEBUG 2012-05-01 14:46:47,757 (57/26504)
Removed from DRM queue at user's request
galaxy.jobs.runners.drmaa DEBUG 2012-05-01 14:46:48,441 (57/26504)
state change: job finished, but failed
galaxy.jobs.runners.drmaa DEBUG 2012-05-01 14:46:48,441 Job output not
returned from cluster

When I am using job splitting, cancelling jobs fails:

galaxy.jobs.handler DEBUG 2012-05-01 14:28:30,364 stopping job 56 in
tasks runner
galaxy.jobs.runners.tasks WARNING 2012-05-01 14:28:30,386 stop_job():
56: no PID in database for job, unable to stop

That warning comes from lib/galaxy/jobs/runners/tasks.py which starts:

    def stop_job( self, job ):
        # DBTODO Call stop on all of the tasks.
        # if our local job has JobExternalOutputMetadata associated,
        # then our primary job has to have already finished
        if job.external_output_metadata:
            # every JobExternalOutputMetadata has a pid set, we just need
            # to take from one of them
            pid = job.external_output_metadata[0].job_runner_external_pid
        else:
            pid = job.job_runner_external_id
        if pid in [ None, '' ]:
            log.warning( "stop_job(): %s: no PID in database for job, unable to stop" % job.id )
            return
        pid = int( pid )
        ...

I'm a little confused about tasks.py vs drmaa.py but that TODO
comment looks pertinent. Is that the problem here?
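
To show what I imagine "call stop on all of the tasks" could mean, here is a minimal sketch. It is not Galaxy code: stop_split_job, the (task_id, external_id) pairs, and the kill callable are all hypothetical stand-ins for whatever the task records and the real runner (drmaa in our case) provide.

```python
import logging

log = logging.getLogger(__name__)


def stop_split_job(tasks, kill):
    """Ask the underlying runner to stop every task of a split job.

    tasks -- iterable of (task_id, external_id) pairs (hypothetical shape)
    kill  -- callable that cancels one cluster job given its external id
    Returns the external ids that were passed to kill.
    """
    stopped = []
    for task_id, external_id in tasks:
        if external_id in (None, ""):
            # Task was never submitted to the cluster, so nothing to cancel.
            log.warning("stop_split_job(): task %s has no external id, skipping", task_id)
            continue
        kill(external_id)
        stopped.append(external_id)
    return stopped
```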

Regards,

Peter


Re: [galaxy-dev] Galaxy not killing split cluster jobs

2012-05-01 Thread Dannon Baker
I'll take care of it.  Thanks for reminding me about the TODO!



On May 1, 2012, at 10:03 AM, Dannon Baker dannonba...@me.com wrote:

 On May 1, 2012, at 9:51 AM, Peter Cock wrote:
 
 I'm a little confused about tasks.py vs drmaa.py but that TODO
 comment looks pertinent. Is that the problem here?
 
 The runner in tasks.py is what executes the primary job, splitting and 
 creating the tasks.  The tasks themselves are actually injected back into the 
 regular job queue and run as normal jobs with the usual runners (in your case 
 drmaa).
 
 And, yes, it should be fairly straightforward to add, but this just hasn't 
 been implemented yet.
 
 -Dannon


[galaxy-dev] May 2012 Galaxy Update

2012-05-01 Thread Dave Clements
Hello all,

The May 2012 Galaxy Update is now available
(http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05). *Galaxy Update*
(http://wiki.g2.bx.psu.edu/GalaxyUpdates) is a (mostly) monthly summary of
what is going on in the Galaxy community. *Galaxy Updates* complement the
*Galaxy Development News Briefs* (http://wiki.g2.bx.psu.edu/DevNewsBriefs),
which accompany new Galaxy releases and focus on Galaxy code updates.

*Highlights:*

   - GCC2012: Just 3 Months Away!
     http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#GCC2012:_Just_3_Months_Away.21

      - Training Day needs your input!
        http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#Training_Day:_We_Need_Your_Help.21
        Please tell us what you want to be covered:
        https://docs.google.com/spreadsheet/viewform?formkey=dHBIRVB6cEhpTWpGN1pXSjhGdGR0aVE6MQ#gid=0

   - Galaxy Tour de France 2012: This Month!
     http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#Galaxy_Tour_de_France_2012

   - A new public server: Nebula for ChIP-Seq
     http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#New_Public_Server:_Nebula

   - 31 New Papers
     http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#New_Papers

   - Open Positions at six different institutions
     http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#Who.27s_Hiring

   - Upcoming Events and Deadlines
     http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#Upcoming_Events_and_Deadlines

   - Tool Shed Contributions
     http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#Tool_Shed_Contributions

As always, if you have anything you would like to see in the June *Galaxy
Update* (http://wiki.g2.bx.psu.edu/GalaxyUpdates), please let me know.
Thanks,

Dave Clements

-- 
http://galaxyproject.org/GCC2012 http://galaxyproject.org/wiki/GCC2012
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/

[galaxy-dev] Full path through API display.py

2012-05-01 Thread Carlos Borroto
Hi,

The Full Path display was recently added as an option. I was wondering
if this information could also be made available when accessing a
dataset's information through the API.

Thanks,
Carlos


Re: [galaxy-dev] Full path through API display.py

2012-05-01 Thread Dannon Baker
Sure, good idea.  I'll tie it in.

-Dannon



[galaxy-dev] Error message with terminated jobs

2012-05-01 Thread Christophe Caron

Hi,

We run Galaxy in a Sun Grid Engine cluster environment (DRMAA API).

When I start a job (e.g. blast), the job runs on the cluster and 
produces its output files. But in the history web interface, the job 
status is shown as error, with this message:


malloc: using debugging hooks
/bin/sh: module: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `module'
malloc: using debugging hooks

Any clue ?

Thanks in advance

--

Christophe Caron

Station Biologique / Service Informatique et Bio-informatique
Place Georges Teissier  29680 Roscoff

Analysis and Bioinformatics for Marine Science
   http://abims.sb-roscoff.fr/

christophe.ca...@sb-roscoff.fr

tél: +33 (0)2 98 29 25 43 / +33 (0)6 07 83 54 77








Re: [galaxy-dev] how to get macs not macs14 for Galaxy

2012-05-01 Thread Sergei Manakov
hi Dan,

Thanks for your reply. I have now installed MACS 1.3.7.1, so the script
macs_wrapper.py is not trying to call the correct executable.

When I try to run MACS from Galaxy, however, I get an error:

Messages from MACS:

INFO  @ Tue, 01 May 2012 14:47:42:
# ARGUMENTS LIST:
# name = MACS_in_Galaxy
# format = SAM
# ChIP-seq file = /usr/local/galaxy/galaxy-dist/database/files/002/dataset_2345.dat
# control file = None
# effective genome size = 2.70e+09
# tag size = 25
# band width = 300
# model fold = 32
# pvalue cutoff = 1.00e-05
# Ranges for calculating regional lambda are : peak_region,1000,5000,1
INFO  @ Tue, 01 May 2012 14:47:42: #1 read tag files...
INFO  @ Tue, 01 May 2012 14:47:42: #1 read treatment tags...
Traceback (most recent call last):
  File "/usr/local/bin/macs", line 273, in <module>
    main()
  File "/usr/local/bin/macs", line 57, in main
    (treat, control) = load_tag_files_options (options)
  File "/usr/local/bin/macs", line 252, in load_tag_files_options
    treat = options.build(open2(options.tfile, gzip_flag=options.gzip_flag))
  File "/usr/local/lib/python2.6/dist-packages/MACS/IO/__init__.py", line 1480, in build_fwtrack
    (chromosome,fpos,strand) = self.__fw_parse_line(thisline)
  File "/usr/local/lib/python2.6/dist-packages/MACS/IO/__init__.py", line 1524, in __fw_parse_line
    thisstart = int(thisfields[3]) - 1
ValueError: invalid literal for int() with base 10: '*'

How should I go about investigating this?
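
One way to start, sketched below, is to look for the input lines that trigger the failure. The traceback shows int(thisfields[3]) failing on '*', so this hypothetical helper (find_bad_positions is a name made up here, not part of MACS or Galaxy) reports non-header lines whose fourth tab-separated field is not an integer. It only locates such lines; it does not say why they are there.

```python
def find_bad_positions(lines, limit=5):
    """Return (line_number, line) pairs whose 4th tab-separated field is not an integer.

    Hypothetical diagnostic helper; lines starting with '@' are treated
    as SAM header lines and skipped.
    """
    bad = []
    for number, line in enumerate(lines, 1):
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or not fields[3].lstrip("-").isdigit():
            bad.append((number, line.rstrip("\n")))
            if len(bad) >= limit:
                break  # a handful of examples is enough to inspect
    return bad
```

It can be run on the dataset named in the log by opening that file and passing the file handle as lines.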


thanks,
Sergei




On 9 April 2012 18:13, Daniel Blankenberg d...@bx.psu.edu wrote:

 Hi Sergei,

 The current MACS tool that comes with Galaxy uses MACS 1.3.7.1 from
 http://liulab.dfci.harvard.edu/MACS/Download.html.


 Thanks for using Galaxy,

 Dan


 On Mar 28, 2012, at 6:38 PM, Sergei Manakov wrote:

  Hello,
 
  I am trying to set up the MACS tool on a local Galaxy. Galaxy comes with
  its macs-wrapper.py and macs-wrapper.xml, but it wants to use the macs
  executable, not macs14.
 
  I tried editing macs-wrapper.py to make it use macs14 instead,
  but some options are not the same between the two, and the tool
  crashes.
 
  I would appreciate it if someone could advise me on where I can
  get a macs executable for 64-bit Linux.
 
  thanks,
  Sergei
 
  --
  Sergei (Siarhei Manakou) Manakov
 
  California Institute of Technology
 
  +1 626 395 3593




-- 
Sergei (Siarhei Manakou) Manakov

California Institute of Technology

+1 626 395 3593

Re: [galaxy-dev] how to get macs not macs14 for Galaxy

2012-05-01 Thread Sergei Manakov
sorry, I meant is NOW trying to call the correct executable, but I still
get the error.

thanks,
S.

-- 
Sergei (Siarhei Manakou) Manakov

California Institute of Technology

+1 626 395 3593

Re: [galaxy-dev] Configuring Galaxy for FTP upload

2012-05-01 Thread Ciara Ledero
Thanks for the reply, Nate!

Cheers,

CL

Re: [galaxy-dev] Unable to set BAM Metadata

2012-05-01 Thread Ciara Ledero
Thanks for the reply! I'll try that one. I'll come back if the problem
still persists.

Cheers,

CL