[galaxy-dev] Dynamic job runner configuration followup

2012-06-10 Thread John Chilton
Following up on some recent threads that have referenced my dynamic
job runner configuration work: Nate and I have discussed these issues,
I have created a new pull request based on those discussions, and I
am confident these changes will be accepted soon.

Things are basically as I outlined them in my previous description:
http://www.mail-archive.com/galaxy-dev@lists.bx.psu.edu/msg03010.html
except that where to place the rules has changed. Instead of
placing them in lib/galaxy/jobs/rules.py, you will now need to create
a file (or multiple files) in the lib/galaxy/jobs/rules directory.

My previous e-mail was a technical description of how it worked;
maybe that is why it didn't generate the excitement I had hoped
:). Describing some concrete use cases might work better, so here
are six cool things you can do with dynamic job runners.

1) Change maximum walltime based on job parameters or file sizes.
2) Implement wild card like configuration of job runners instead of
configuring one tool at a time.
3) Create queues with different priorities, and then give higher
priorities to people giving demos (or directors or testers etc...).
4) Utilize environment variables to determine job runner configurations.
5) Limit a particular tool's use to only white-listed users.
6) Tie into Galaxy's job history tables to throttle those problem
users clogging up your Galaxy instance.

To do any of these you will need to pull in the changes from bitbucket,
add dynamic to the start_job_runners configuration option in
universe_wsgi.ini, and create a file such as
lib/galaxy/jobs/rules/200_runners.py for your rules.
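As a rough sketch, the relevant universe_wsgi.ini pieces might look like
the following (option names are taken from the description above; the
exact values are untested assumptions for your setup):

```ini
# Hypothetical excerpt from the [app:main] section of universe_wsgi.ini.
[app:main]
# Enable the dynamic runner alongside your existing runners.
start_job_runners = pbs,dynamic
# Route an individual tool through a dynamic rule defined in
# lib/galaxy/jobs/rules/200_runners.py.
ncbi_blastn_wrapper = dynamic:///python
```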

Below I describe how to do these, though I haven't actually tested the
code snippets, so they should be considered just an outline of the
idea; your mileage may vary.

1) Change maximum walltime based on job parameters or file sizes.

Let's say you want to change the max walltime of the BlastN wrapper
based on the size of the input query. First you would add the line
ncbi_blastn_wrapper=dynamic:///python to universe_wsgi.ini. Next, in
200_runners.py you would add a function such as the following:

import os

def ncbi_blastn_wrapper(job):
    # Map input dataset names to datasets (history and library inputs).
    inp_data = dict( [ ( da.name, da.dataset ) for da in job.input_datasets ] )
    inp_data.update( [ ( da.name, da.dataset ) for da in job.input_library_datasets ] )
    query_file = inp_data[ 'query' ].file_name
    query_size = os.path.getsize( query_file )
    # Give queries over 1 MB a longer walltime.
    if query_size > 1024 * 1024:
        return 'pbs:-l walltime=24:00:00/'
    else:
        return 'pbs:-l walltime=12:00:00/'

2) Implement wild card like configuration of job runners instead of
configuring one tool at a time.

Let's say you have a coworker called J. Johnson ummm wait no Jim J. and
he maintains a tool suite for a fictitious metagenomics application
called fathur. Assume also that this fathur suite has dozens of tools
clogging up your configuration file because they all need to use
pbs:-l procs=8/ instead of the default pbs:/. To configure all
the fathur tools at once, in the [app:main] section of universe_wsgi.ini
you would change default_cluster_job_runner from pbs:/ to
dynamic:///python/default_runner and then add the following function
to 200_runners.py:

def default_runner(tool_id):
    # All fathur suite tools share a common tool id prefix.
    if tool_id.startswith('fathur_'):
        return 'pbs:-l procs=8/'
    else:
        return 'pbs:/'

3) Create queues with different priorities, and then give higher
priorities to people giving demos.

Let's say the users defined by the admin_users configuration property
in universe_wsgi.ini are the ones that give demos and do testing, so
you want to increase their priority for all jobs. Let's also say that
to do this you have created queues gx_normal and gx_important in your
queue manager with differing priorities. You could then take the
default_runner concept from the previous example and do something like
this:

def default_runner(app, user_email):
    # Read the comma-separated admin_users list from universe_wsgi.ini.
    admin_users = app.config.get( 'admin_users', '' ).split( ',' )
    if user_email in admin_users:
        return 'pbs:///gx_important//'
    else:
        return 'pbs:///gx_normal//'

You could define the list of users right in this file instead of
pulling it in from admin_users, and then apply this concept to give
higher priority to directors, testers, or paying users. Alternatively,
you could give lower priority to external users, people you just don't
like, etc.
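For instance, a minimal untested sketch of that hardcoded-list variant
(the e-mail addresses are made up, and gx_important/gx_normal are the
queues assumed in the example above) might look like:

```python
# Hypothetical sketch: hardcode the privileged users in 200_runners.py
# instead of reading admin_users from universe_wsgi.ini.
HIGH_PRIORITY_USERS = set( [
    'director@example.org',
    'tester@example.org',
] )

def default_runner( user_email ):
    # Route privileged users to the higher-priority queue.
    if user_email in HIGH_PRIORITY_USERS:
        return 'pbs:///gx_important//'
    else:
        return 'pbs:///gx_normal//'
```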

4) Utilize environment variables to determine job runner configurations.

Let's say you want cufflinks to always use as many cores as are
available, but in your testing environment you only have 4 cores
available whereas in production you have 16. Let's also say you have
the environment variable MAX_CORES set, and this will be different on
each machine. You would then update universe_wsgi.ini to have
cufflinks use the dynamic job config (cufflinks=dynamic:///python) and
then add the following to 200_runners.py:

import os

def cufflinks():
    # Request as many cores as the machine's MAX_CORES variable allows.
    return 'pbs:-l procs=%s/' % os.environ['MAX_CORES']
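If MAX_CORES might not be set on every machine, an untested variant of
the same idea could fall back to a single core (the default of 1 is my
assumption, not anything Galaxy requires):

```python
import os

def cufflinks():
    # Fall back to one core when MAX_CORES is not set in the environment.
    max_cores = os.environ.get( 'MAX_CORES', '1' )
    return 'pbs:-l procs=%s/' % max_cores
```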

(Warning: you would need to update the cufflinks

[galaxy-dev] GCC2012 Early Registration ENDS THIS MONDAY JUNE 11

2012-06-10 Thread Dave Clements
Hello all,

Just a *final* reminder that early registration for the 2012 Galaxy
Community Conference (GCC2012, http://galaxyproject.org/wiki/Events/GCC2012)
*closes on Monday June 11* (which is probably *today* when you read this).
Registering early saves 36 to 42% on registration costs, and allows you to
sign up for the GCC2012 Training Day
(http://galaxyproject.org/wiki/Events/GCC2012/TrainingDay) and book
discounted conference lodging
(http://wiki.g2.bx.psu.edu/Events/GCC2012/Logistics#Lodging)
*before they fill up*.

*Register today: http://wiki.g2.bx.psu.edu/Events/GCC2012/Register*

GCC2012 (http://galaxyproject.org/wiki/Events/GCC2012) will be held July
25-27 in Chicago, Illinois, United States. This year GCC2012 features a
full day of tutorial sessions
(http://galaxyproject.org/wiki/Events/GCC2012/TrainingDay) with
3 parallel tracks, each featuring four 90-minute workshops and
covering 10 different topics, including the newly added Variant and SNP
Analysis, RNA-Seq Analysis, and Galaxy Code Architecture sessions
(http://wiki.g2.bx.psu.edu/Events/GCC2012/TrainingDay).


The two-day main meeting
(http://wiki.g2.bx.psu.edu/Events/GCC2012/Program#Day_1:_July_26.2C_Thursday)
includes over 25 talks by Galaxy community members and Galaxy developers
addressing the challenges of integrating, analyzing, and sharing the
diverse and very large datasets that are now typical in biomedical
research.

GCC2012 is an opportunity to share best practices with, and learn from, a
large community of researchers and support staff who are facing the
challenges of data-intensive biology. Galaxy
(http://gmod.org/wiki/Galaxy) is an open web-based platform for
data-intensive biomedical research (http://galaxyproject.org) that is
widely used and deployed at research organizations of all sizes
around the world.

See you in Chicago!

Dave Clements, on behalf of the GCC2012 Organizing Committee
(http://galaxyproject.org/wiki/Events/GCC2012/Organizing%20Committee)
Links:
http://galaxyproject.org/GCC2012 http://galaxyproject.org/wiki/GCC2012
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Configuring Galaxy for FTP upload

2012-06-10 Thread Ciara Ledero
Hi all,

So I was not able to configure Galaxy immediately after this post of
mine, due to other tasks that I first needed to do. Looking back, I
forgot to ask this question:

Do I need to perform the steps stated in
http://wiki.g2.bx.psu.edu/Admin/Config/Upload%20via%20FTP? I have just
set ftp_upload_dir to a created folder and ftp_upload_site to something
other than None.

Thanks in advance for the help!

Cheers.

CL

[galaxy-dev] About FTP Upload

2012-06-10 Thread Ciara Ledero
Hi all,

I checked basics.py after reading a similar post and saw this:

self.ftp_upload_dir = kwargs.get( 'ftp_upload_dir', None )
self.ftp_upload_site = kwargs.get( 'ftp_upload_site', None )

Can you guys tell me what the None part means?

Cheers,

CL