[galaxy-dev] Job scheduling: FIFO, or fairer to multiple users?
Hello all,

I'm curious if there is any way to manipulate the Galaxy job queuing in order to be 'fairer' to multiple simultaneous users. My impression is that Galaxy uses a simple FIFO queue itself, with cluster jobs offloaded to the cluster queue immediately.

In our case, I'm looking at large BLAST jobs (e.g. 20k queries against NR), which by their nature are easily subdivided between nodes (by dividing the query file up). We run these as one job per node (giving multiple cores for threading). That works nicely - the question I am currently pondering is tuning the split strategy, and multiple users.

Specifically we get queue blocking if any one large BLAST job is divided into as many or more sub-jobs than we have cluster nodes in the BLAST queue. You can have one user's big BLAST job blocking multiple other users' small BLAST jobs from even starting. I appreciate that whether this is a problem will depend on the typical jobs run on each Galaxy instance, and the number and size of nodes in the local cluster - which makes a one-size-fits-all strategy hard.

I know that in order to be back-end agnostic, Galaxy takes limited advantage of different cluster backends - but perhaps the new 'run jobs as user' functionality might be helpful to allow the cluster to balance jobs between users? Is anyone doing that already?

Another idea would be for Galaxy to manage its job queue on a user basis. Currently Galaxy submits all its jobs directly to the cluster, which can build up a backlog of pending jobs (whose scheduling is now out of Galaxy's control - probably simple FIFO depending on the cluster). Rather than giving the queued jobs to the cluster immediately, Galaxy could cache the jobs, and submit them gradually (monitoring the cluster queue to see when it needs topping up). This would then enable Galaxy to interleave jobs from different users, or to use any other queuing strategy. Too complicated?
I think this is only a problem when the number of cluster nodes (in any given queue) is similar to or smaller than the number of parts a job might be broken up into. My guess is the public Galaxy doesn't do much job splitting (this code is quite new and not many of the wrappers exploit it), and has a large cluster. Is anyone else running into this kind of issue? Perhaps when Galaxy users are in competition with other cluster users?

Thanks, Peter

___
Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
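The "cache the jobs and submit them gradually" idea above could be sketched roughly as follows. This is only an illustration of round-robin interleaving between users; the class `FairShareQueue`, its methods, and the `free_slots` parameter are hypothetical names invented for this sketch and are not part of Galaxy.

```python
from collections import OrderedDict, deque


class FairShareQueue:
    """Hold pending jobs per user and release them round-robin.

    Hypothetical helper for illustration only; not part of Galaxy.
    """

    def __init__(self):
        # user -> deque of that user's pending jobs, in submission order
        self._pending = OrderedDict()

    def add(self, user, job):
        self._pending.setdefault(user, deque()).append(job)

    def next_batch(self, free_slots):
        """Pick up to free_slots jobs, taking one job per user per pass."""
        batch = []
        while len(batch) < free_slots and self._pending:
            for user in list(self._pending):
                if len(batch) >= free_slots:
                    break
                jobs = self._pending[user]
                batch.append((user, jobs.popleft()))
                if jobs:
                    # Served users go to the back, so the next call starts elsewhere.
                    self._pending.move_to_end(user)
                else:
                    del self._pending[user]
        return batch


queue = FairShareQueue()
for part in range(4):
    queue.add("alice", "blast_part_%d" % part)  # one big split job
queue.add("bob", "small_blast")                 # a later, small job
first = queue.next_batch(free_slots=2)          # bob's job is not stuck behind alice's
```

Here `free_slots` stands for however many jobs the cluster queue can usefully accept right now; how Galaxy would learn that number from the cluster is left open.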
Re: [galaxy-dev] Remove libraries using Galaxy code or API
On May 1, 2012, at 4:32 AM, liram_va...@agilent.com wrote:

> Hi Nate, Great! Thanks. Any chance that this change will be also included on galaxy-dist soon?

Our next dist was supposed to be out already but it was stalled to fix a few bugs. It'll most likely be in the next one, whenever that is.

--nate

> Thanks, Liram
>
> -----Original Message-----
> From: Nate Coraor [mailto:n...@bx.psu.edu]
> Sent: Monday, April 30, 2012 11:29 PM
> To: VARDI,LIRAM (A-Labs,ex1)
> Cc: galaxy-...@bx.psu.edu; BEN-DOR,AMIR (A-Labs,ex1)
> Subject: Re: [galaxy-dev] Remove libraries using Galaxy code or API
>
> On Apr 23, 2012, at 4:55 AM, liram_va...@agilent.com wrote:
>
>> Hello, I am using the Galaxy API for some actions and I must say that this is indeed a really great feature with great power. Anyway, I am trying to write a Python script, one of whose goals is to remove some data libraries, but until now I was unable to find a way to remove a data library or some of its datasets using the API or by a direct call to Galaxy's code. I found an old post that claims that this feature is not yet implemented. My questions:
>> 1) Has this changed since? I mean, is there a way now to clean or completely remove a data library?
>> 2) Is there a way to use Galaxy code to remove a library? Such as a function that can be used in my script to remove this library?
>
> Hi Liram, I've just implemented library deletion in changeset 1640cbaafd09.
>
> --nate
>
>> Thanks in advance! Liram
Re: [galaxy-dev] Remove libraries using Galaxy code or API
Hi Nate,

Great! Thanks. Any chance that this change will also be included in galaxy-dist soon?

Thanks, Liram

-----Original Message-----
From: Nate Coraor [mailto:n...@bx.psu.edu]
Sent: Monday, April 30, 2012 11:29 PM
To: VARDI,LIRAM (A-Labs,ex1)
Cc: galaxy-...@bx.psu.edu; BEN-DOR,AMIR (A-Labs,ex1)
Subject: Re: [galaxy-dev] Remove libraries using Galaxy code or API

On Apr 23, 2012, at 4:55 AM, liram_va...@agilent.com wrote:

> Hello, I am using the Galaxy API for some actions and I must say that this is indeed a really great feature with great power. Anyway, I am trying to write a Python script, one of whose goals is to remove some data libraries, but until now I was unable to find a way to remove a data library or some of its datasets using the API or by a direct call to Galaxy's code. I found an old post that claims that this feature is not yet implemented. My questions:
> 1) Has this changed since? I mean, is there a way now to clean or completely remove a data library?
> 2) Is there a way to use Galaxy code to remove a library? Such as a function that can be used in my script to remove this library?

Hi Liram, I've just implemented library deletion in changeset 1640cbaafd09.

--nate

> Thanks in advance! Liram
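For readers wanting to script this once library deletion is available in their Galaxy, a minimal sketch of building such a request from Python is given here. The thread does not say what the call looks like, so the route `/api/libraries/<id>`, the `key` query parameter, and the helper name `build_library_delete_request` are all assumptions made for illustration; please check them against your own Galaxy version before relying on them.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_library_delete_request(base_url, library_id, api_key):
    """Build (but do not send) an HTTP DELETE request for one data library.

    ASSUMPTION: the route /api/libraries/<id> and the 'key' query
    parameter are guesses for illustration; verify against your Galaxy.
    """
    url = "%s/api/libraries/%s?%s" % (
        base_url.rstrip("/"),
        quote(library_id, safe=""),
        urlencode({"key": api_key}),
    )
    return Request(url, method="DELETE")


# Placeholder values; substitute a real server, library id and API key.
request = build_library_delete_request("http://localhost:8080/", "LIBRARY_ID", "API_KEY")
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to print and inspect the URL before anything is deleted.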
[galaxy-dev] Galaxy not killing split cluster jobs
Hi all,

We're running our Galaxy with an SGE cluster, using the DRMAA support in Galaxy, and job splitting. I've noticed that if the user cancels a job (that was running or queued on the cluster), then while the job is shown as deleted in Galaxy, looking at the queue on the cluster with qstat shows it persists.

I've not seen anything similar reported except for this PBS issue: http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-October/003633.html

When I don't use job splitting, cancelling jobs seems to work:

galaxy.jobs.handler DEBUG 2012-05-01 14:46:47,755 stopping job 57 in drmaa runner
galaxy.jobs.runners.drmaa DEBUG 2012-05-01 14:46:47,756 (57/26504) Being killed...
galaxy.jobs.runners.drmaa DEBUG 2012-05-01 14:46:47,757 (57/26504) Removed from DRM queue at user's request
galaxy.jobs.runners.drmaa DEBUG 2012-05-01 14:46:48,441 (57/26504) state change: job finished, but failed
galaxy.jobs.runners.drmaa DEBUG 2012-05-01 14:46:48,441 Job output not returned from cluster

When I am using job splitting, cancelling jobs fails:

galaxy.jobs.handler DEBUG 2012-05-01 14:28:30,364 stopping job 56 in tasks runner
galaxy.jobs.runners.tasks WARNING 2012-05-01 14:28:30,386 stop_job(): 56: no PID in database for job, unable to stop

That warning comes from lib/galaxy/jobs/runners/tasks.py which starts:

    def stop_job( self, job ):
        # DBTODO Call stop on all of the tasks.
        #if our local job has JobExternalOutputMetadata associated, then our primary job has to have already finished
        if job.external_output_metadata:
            pid = job.external_output_metadata[0].job_runner_external_pid #every JobExternalOutputMetadata has a pid set, we just need to take from one of them
        else:
            pid = job.job_runner_external_id
        if pid in [ None, '' ]:
            log.warning( "stop_job(): %s: no PID in database for job, unable to stop" % job.id )
            return
        pid = int( pid )
        ...

I'm a little confused about tasks.py vs drmaa.py but that TODO comment looks pertinent. Is that the problem here?

Regards, Peter
Re: [galaxy-dev] Galaxy not killing split cluster jobs
I'll take care of it. Thanks for reminding me about the TODO!

On May 1, 2012, at 10:03 AM, Dannon Baker dannonba...@me.com wrote:

> On May 1, 2012, at 9:51 AM, Peter Cock wrote:
>
>> I'm a little confused about tasks.py vs drmaa.py but that TODO comment looks pertinent. Is that the problem here?
>
> The runner in tasks.py is what executes the primary job, splitting and creating the tasks. The tasks themselves are actually injected back into the regular job queue and run as normal jobs with the usual runners (in your case drmaa). And, yes, it should be fairly straightforward to add, but this just hasn't been implemented yet.
>
> -Dannon
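As a rough picture of what "call stop on all of the tasks" might mean, given that the tasks run as normal jobs through the usual runner, here is a small standalone sketch. It is not Galaxy code: `FakeRunner`, `SplitJob`, `stop_split_job` and the task ids are hypothetical stand-ins chosen for illustration, and the eventual fix in tasks.py may look quite different.

```python
class FakeRunner:
    """Stand-in for a cluster runner; records which task ids were stopped."""

    def __init__(self):
        self.stopped = []

    def stop_job(self, task_id):
        self.stopped.append(task_id)


class SplitJob:
    """Hypothetical parent job that remembers the tasks it was split into."""

    def __init__(self, job_id, task_ids):
        self.job_id = job_id
        self.task_ids = list(task_ids)


def stop_split_job(job, runner):
    """Stop every task of a split job; return the ids that were stopped."""
    stopped = []
    for task_id in job.task_ids:
        runner.stop_job(task_id)  # each task is cancelled like a normal job
        stopped.append(task_id)
    return stopped


runner = FakeRunner()
stopped = stop_split_job(SplitJob(56, ["56.0", "56.1", "56.2"]), runner)
```

The point of the sketch is only that cancelling the parent has to fan out to each task, since the parent itself has no cluster PID to kill.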
[galaxy-dev] May 2012 Galaxy Update
Hello all,

The May 2012 Galaxy Update is now available (http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05).

*Galaxy Update* (http://wiki.g2.bx.psu.edu/GalaxyUpdates) is a (mostly) monthly summary of what is going on in the Galaxy community. *Galaxy Updates* complement the *Galaxy Development News Briefs* (http://wiki.g2.bx.psu.edu/DevNewsBriefs) which accompany new Galaxy releases and focus on Galaxy code updates.

*Highlights:*

- GCC2012: Just 3 Months Away! http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#GCC2012:_Just_3_Months_Away.21
- Training Day needs your input! http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#Training_Day:_We_Need_Your_Help.21 Please tell us what you want to be covered: https://docs.google.com/spreadsheet/viewform?formkey=dHBIRVB6cEhpTWpGN1pXSjhGdGR0aVE6MQ#gid=0
- Galaxy Tour de France 2012: This Month! http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#Galaxy_Tour_de_France_2012
- A new public server: Nebula for ChIP-Seq http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#New_Public_Server:_Nebula
- 31 New Papers http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#New_Papers
- Open Positions at six different institutions http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#Who.27s_Hiring
- Upcoming Events and Deadlines http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#Upcoming_Events_and_Deadlines
- Tool Shed Contributions http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_05#Tool_Shed_Contributions

As always, if you have anything you would like to see in the June *Galaxy Update* (http://wiki.g2.bx.psu.edu/GalaxyUpdates), please let me know.

Thanks, Dave Clements

--
http://galaxyproject.org/GCC2012
http://galaxyproject.org/wiki/GCC2012
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/
[galaxy-dev] Full path through API display.py
Hi,

Recently Full Path display was added as an option. I was wondering if this information could also be made available when accessing a dataset's information through the API.

Thanks, Carlos
Re: [galaxy-dev] Full path through API display.py
Sure, good idea. I'll tie it in.

-Dannon

On May 1, 2012, at 3:03 PM, Carlos Borroto wrote:

> Hi, Recently Full Path display was added as an option. I was wondering if this information could also be available when accessing a dataset information through the API.
>
> Thanks, Carlos
[galaxy-dev] Error message with terminated jobs
Hi,

We run Galaxy with a Sun Grid Engine cluster environment (DRMAA API). When I start a job (e.g. blast), the job runs on the cluster and produces its output files. But in the history web interface, the job status is shown as an error, with this message:

malloc: using debugging hooks
/bin/sh: module: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `module'
malloc: using debugging hooks

Any clue?

Thanks in advance

--
Christophe Caron
Station Biologique / Service Informatique et Bio-informatique
Place Georges Teissier 29680 Roscoff
Analysis and Bioinformatics for Marine Science http://abims.sb-roscoff.fr/
christophe.ca...@sb-roscoff.fr
tél: +33 (0)2 98 29 25 43 / +33 (0)6 07 83 54 77
Re: [galaxy-dev] how to get macs not macs14 for Galaxy
hi Dan,

Thanks for your reply. I have now installed MACS 1.3.7.1, so the script macs_wrapper.py is not trying to call the correct executable. When I try to run MACS from Galaxy, I do however get an error:

Messages from MACS:
INFO @ Tue, 01 May 2012 14:47:42:
# ARGUMENTS LIST:
# name = MACS_in_Galaxy
# format = SAM
# ChIP-seq file = /usr/local/galaxy/galaxy-dist/database/files/002/dataset_2345.dat
# control file = None
# effective genome size = 2.70e+09
# tag size = 25
# band width = 300
# model fold = 32
# pvalue cutoff = 1.00e-05
# Ranges for calculating regional lambda are : peak_region,1000,5000,1
INFO @ Tue, 01 May 2012 14:47:42: #1 read tag files...
INFO @ Tue, 01 May 2012 14:47:42: #1 read treatment tags...
Traceback (most recent call last):
  File "/usr/local/bin/macs", line 273, in <module>
    main()
  File "/usr/local/bin/macs", line 57, in main
    (treat, control) = load_tag_files_options (options)
  File "/usr/local/bin/macs", line 252, in load_tag_files_options
    treat = options.build(open2(options.tfile, gzip_flag=options.gzip_flag))
  File "/usr/local/lib/python2.6/dist-packages/MACS/IO/__init__.py", line 1480, in build_fwtrack
    (chromosome,fpos,strand) = self.__fw_parse_line(thisline)
  File "/usr/local/lib/python2.6/dist-packages/MACS/IO/__init__.py", line 1524, in __fw_parse_line
    thisstart = int(thisfields[3]) - 1
ValueError: invalid literal for int() with base 10: '*'

How should I look at it?

thanks, Sergei

On 9 April 2012 18:13, Daniel Blankenberg d...@bx.psu.edu wrote:

> Hi Sergei,
>
> The current MACS tool that comes with Galaxy uses MACS 1.3.7.1 from http://liulab.dfci.harvard.edu/MACS/Download.html.
>
> Thanks for using Galaxy, Dan
>
> On Mar 28, 2012, at 6:38 PM, Sergei Manakov wrote:
>
>> Hello, I am trying to set up the MACS tool on a local Galaxy. Galaxy comes with its macs-wrapper.py and macs-wrapper.xml, but it wants to use the macs executable, not macs14. I tried editing macs-wrapper.py to make it use macs14 instead, but some options are not the same between the two, and the tool crashes. I would appreciate it if someone could give me advice on where I can get a macs executable for 64-bit Linux.
>>
>> thanks, Sergei

--
Sergei (Siarhei Manakou) Manakov
California Institute of Technology
+1 626 395 3593
Re: [galaxy-dev] how to get macs not macs14 for Galaxy
Sorry, I meant it is NOW trying to call the correct executable, but I still get the error.

thanks, S.

On 1 May 2012 14:51, Sergei Manakov siarheimana...@gmail.com wrote:

> hi Dan, Thanks for your reply. I have now installed MACS 1.3.7.1, so the script macs_wrapper.py is not trying to call the correct executable. When I try to run MACS from Galaxy, I do however get an error:
> [MACS messages and traceback as in the previous message, ending with]
> ValueError: invalid literal for int() with base 10: '*'

--
Sergei (Siarhei Manakou) Manakov
California Institute of Technology
+1 626 395 3593
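The traceback in this thread fails at int(thisfields[3]) with the value '*', so one way to look at it is to find which lines of the input file have something other than an integer in the fourth field. The following is a minimal diagnostic sketch, not part of MACS or Galaxy: the function name `find_bad_position_lines` is hypothetical, it assumes tab-separated fields and '@' header lines as in SAM, and the sample lines are made up. It does not say why such lines are present.

```python
def find_bad_position_lines(lines, limit=5):
    """Return (line_number, fields) for lines whose 4th field is not an integer.

    Header lines starting with '@' and blank lines are skipped. The field
    index (3) is taken from the traceback; tab separation is an assumption.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        try:
            int(fields[3])
        except (IndexError, ValueError):
            bad.append((number, fields))
            if len(bad) >= limit:
                break
    return bad


# Made-up sample input; the third line has '*' where an integer is expected.
sample = [
    "@HD\tVN:1.0\n",
    "read1\t0\tchr1\t100\t255\t25M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t4\t*\t*\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
bad = find_bad_position_lines(sample)
```

To run it on a real dataset, pass an open file handle instead of the sample list, for example the ChIP-seq file path printed in the MACS arguments list.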
Re: [galaxy-dev] Configuring Galaxy for FTP upload
Thanks for the reply, Nate! Cheers, CL
Re: [galaxy-dev] Unable to set BAM Metadata
Thanks for the reply! I'll try that one. I'll come back if the problem still persists. Cheers, CL