Re: [galaxy-dev] running tools within tool
The BLAST+ binaries support multi-threaded operation, which the wrappers control via the $GALAXY_SLOTS environment variable. Galaxy should set this automatically from your job runner settings, which lets you (for example) allocate four cores to each BLAST job.

In addition, the BLAST+ wrappers support high-level parallelism by task splitting if use_tasked_jobs = True is set in your universe_wsgi.ini configuration file. Essentially, the FASTA input query files are broken up into batches of 1000 sequences, a separate BLAST child job is run for each chunk, and then the BLAST output files are merged (in order). This is transparent to the end user. Each tool enables this via its XML file, e.g.

<parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" merge_outputs="output1"></parallelism>

This requires splitting support in the FASTA input datatypes, and merging support in the selected output datatype (e.g. BLAST XML, tabular, etc.). Both are implemented as methods in the Python datatype classes.

It would be interesting to see if any of John's work on collections of files of the same type might fit nicely with this approach (and thus avoid the disk IO overhead of the merge step?).

Peter

On Sun, Feb 9, 2014 at 7:50 PM (ke...@mcs.anl.gov) and again on Mon, Feb 10, 2014 at 1:56 AM (ketancmaheshw...@gmail.com), Ketan Maheshwari wrote:

Thanks Dannon for the reference. I checked out the tool and installed it from the Tool Shed on my local Galaxy instance. I also checked out the related paper, which says that the BLAST executables run in parallel by partitioning the input files into fragments and running batches in parallel. That sounds cool. I browsed the code but could not find the exact mechanism. Is the parallelism at the workflow level (aka branch parallelism), or is it at the tool level, that is, does the tool invoke parallel code?

Thanks, Ketan

On Thu, Feb 6, 2014 at 9:42 AM, Dannon Baker dannon.ba...@gmail.com wrote:

Ketan, Have you taken a look at Galaxy's built-in parallelism framework? For a great current example of a tool using this, look at Peter's NCBI BLAST+ wrappers. https://github.com/peterjc/galaxy_blast

-Dannon

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
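The split, run and merge scheme Peter describes in the message above can be illustrated with a minimal Python sketch. This is not Galaxy's actual datatype code: the function names (read_fasta, split_to_size, run_in_batches, threads_from_env) and the run_batch callable are hypothetical, and the batches are processed serially here, whereas Galaxy would dispatch each one as a separate child job.

```python
import os
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_to_size(records, split_size=1000):
    """Group records into consecutive batches of at most split_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, split_size))
        if not batch:
            return
        yield batch


def run_in_batches(records, run_batch, split_size=1000):
    """Run a (hypothetical) per-batch job and concatenate outputs in batch order."""
    outputs = [run_batch(batch) for batch in split_to_size(records, split_size)]
    return "".join(outputs)


def threads_from_env(default=1):
    """Read the allocated core count from GALAXY_SLOTS, with a fallback."""
    return int(os.environ.get("GALAXY_SLOTS", default))
```

Plain concatenation is only a reasonable merge for line-oriented output such as tabular; a format like BLAST XML would need a merge that understands the file structure, which is why the merge lives in the output datatype.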
Re: [galaxy-dev] running tools within tool
Thanks Dannon for the reference. I checked out the tool and installed it from the Tool Shed on my local Galaxy instance. I also checked out the related paper, which says that the BLAST executables run in parallel by partitioning the input files into fragments and running batches in parallel. That sounds cool. I browsed the code but could not find the exact mechanism. Is the parallelism at the workflow level (aka branch parallelism), or is it at the tool level, that is, does the tool invoke parallel code?

Thanks, Ketan

On Thu, Feb 6, 2014 at 10:32 AM, Ketan Maheshwari ketancmaheshw...@gmail.com wrote:

Hi John, Alex, All,

To elaborate on the motivation behind my question about running tools within a tool:

First, running a tool in parallel at large scale. For example, if I need to find a pattern in 1000 files via the Galaxy Select tool from the Text and Filter tool group, I am limited to providing one file at a time to the tool, which will take a long time to finish. Please correct me if there is a more sophisticated way to approach this problem.

Second, a related concern is running a tool in parallel on one or more HPC resources. We want to write a generic Galaxy wrapper tool, powered by the Swift parallel framework, that can run any arbitrary Galaxy tool in parallel on HPC resources. We have already developed this capability, but only for external executables, which, as I understand from a previous conversation, is not the most secure way of using Galaxy. Having such a wrapper tool in a standard form is desirable so that it preserves the tool contract and binding within the Galaxy environment, that is, it maintains Galaxy's history and metadata conventions.

Thanks, Ketan

On Wed, Feb 5, 2014 at 3:53 PM, John Chilton chil...@msi.umn.edu wrote:

Galaxy has an API that is capable of running tools - certainly this is one path forward on something like this. I am not sure it is the best path forward though. Probably the best way to enhance Galaxy's execution capabilities is to extend the Galaxy core framework itself - this has its own downsides though. If you can offer more details about how you would like to enhance Galaxy - what it cannot do that you would like it to do - I or others may be able to provide more specific ideas. Otherwise, sorry I have not been of more help.

-John

On Tue, Feb 4, 2014 at 2:51 PM, Ketan Maheshwari ke...@mcs.anl.gov wrote:

Hi,

This is a question I posted to the galaxy-user mailing list a while back and was redirected to dev for possible answers: Is it possible in Galaxy to design a tool whose sole purpose is to run other tools? This is motivated by our desire to enhance the execution capabilities of existing tools via a generic tool which acts as a wrapper.

Thanks, Ketan
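Ketan's first use case in the thread above (selecting matching lines from many files at once rather than one file at a time) is an ordinary data-parallel pattern. The sketch below only illustrates that pattern with the Python standard library; it is neither Galaxy's Select tool nor the Swift framework, and the names select_lines and select_many are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def select_lines(path, pattern):
    """Return the lines of one file that match the regular expression."""
    regex = re.compile(pattern)
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if regex.search(line)]


def select_many(paths, pattern, max_workers=4):
    """Apply select_lines to every file concurrently.

    Returns a dict mapping each path to its matching lines, in input order.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda p: select_lines(p, pattern), paths)
        return dict(zip(paths, results))
```

Inside Galaxy the same fan-out would have to go through the job system so that each result is recorded in the history with its metadata, which is the part of the tool contract Ketan wants a wrapper to preserve.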
Re: [galaxy-dev] Error introduced with Fastq Groomer
Hi Philippe,

I’m unable to suggest a reason why this happened, other than some sort of corruption while the job was running, but I would point out two things to you.

1. I don’t think you need to run the fastq groomer on your data anyway, as it’s in Illumina 1.8+ format, which should already be in fastqsanger format.
2. It appears that the fastq groomer hasn’t worked, as the quality scores haven’t changed format.

(A general question to anyone here – will fastq groomer change the quality format of reads that are already in fastqsanger format?)

Cheers, Graham

Dr. Graham Etherington
Bioinformatics Support Officer, The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7UH. UK
Tel: +44 (0)1603 450601

From: Philippe Moncuquet philippe.m...@gmail.com
Date: Monday, 10 February 2014 03:50
To: Galaxy Dev galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] Error introduced with Fastq Groomer

Hi,

Some unexpected symbols were introduced while grooming my fastq file.

Before:

@DJTPB5M1:327:C3PC4ACXX:6:1104:9355:84986 1:N:0:GTCCGC
GAGCCTTGCTAGGAGAGGGAAGGTGGAAGATCATCATTTCCAGGAGAGCACTGCTAGCAGGAAGCCACGTCTGCATTACACGCTTCATTAGGGACTTCCC
+
@@@FFFHHHE@=FDEGCCG2A7CDFHEF:B?BDEGGHGICHC9B@FGEHEGG;F=GHI==CE:;BBCC@CC;8=?=CA;ACC

After:

@DJTPB5M1:327:C3PC4ACXX:6:1104:9355:84986 1:N:0:GTCCGC
GAGCCTTGCTAGGAGAGGGAAGGTGGAAGATCATCATTTCCAGGAGAGCACTGCTAGCAGGAAGCCACG+1�CATTACACGCTTCATTAGGGACTTCCC
+
@@@FFFHHHE@=FDEGCCG2A7CDFHEF:B?BDEGGHGICHC9B@FGEHEGG;F=GHI==CE:;BBCC@CC;8=?=CA;ACC

I relaunched this step but was not able to reproduce the bug. Any ideas about this problem? Have you come across the same problem before?

Regards, Philip
Re: [galaxy-dev] Error introduced with Fastq Groomer
The groomer was recently migrated to the tool shed - this has not been released as part of a galaxy-dist though so I assume you are still running a version of the fastq groomer bundled with Galaxy? If yes, what version of Galaxy are you running (i.e. can you attach the output of hg summary)?

-John
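Corruption like the one reported in this thread can be caught with a quick scan of the groomed file. The following is a minimal sketch, not part of the fastq groomer itself: the function names are hypothetical, it assumes simple four-line FASTQ records, and it treats only A, C, G, T and N as expected bases. The encoding check relies on the fact that quality characters below ';' (ASCII 59) cannot occur in the Phred+64 or Solexa+64 encodings, so their presence implies Phred+33 (fastqsanger).

```python
VALID_BASES = set("ACGTN")


def parse_fastq(lines):
    """Yield (title, sequence, quality) from four-line FASTQ records.

    Trailing lines that do not form a complete record are ignored.
    """
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        title, seq, plus, qual = stripped[i:i + 4]
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record starting at line %d" % (i + 1))
        yield title[1:], seq, qual


def unexpected_symbols(seq):
    """Return (position, character) pairs for every non-ACGTN character."""
    return [(pos, char) for pos, char in enumerate(seq)
            if char.upper() not in VALID_BASES]


def must_be_phred33(qual):
    """True if the quality string contains a character only valid in Phred+33."""
    return any(ord(char) < 59 for char in qual)
```

Applied to the records quoted earlier in the thread, unexpected_symbols would flag the "+1�" run in the groomed read, and must_be_phred33 would return True for the quality line because it contains characters such as '2' and '7', which is consistent with Graham's remark that the data were already in fastqsanger format.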
[galaxy-dev] Shed_tools couldn't be installed due to lack of proxy support. Change in lib/tool_shed/util/common_util.py solves the problem.
Dear Galaxians,

I am behind a proxy with authentication, and I plan to contribute a wiki page on how to install and configure Galaxy in that situation, to save other people's time. In this mail, however, I would like to propose a change to lib/tool_shed/util/common_util.py.

The problem I had was that not all tool wrappers could be installed through the web interface: I always got a "host not found" error generated by the urllib2.py call. From that error message (found in paster.log) it was clear that the problem was due to my being behind a firewall with authentication, and that the http_proxy variable in run.sh was not sufficient.

I then modified lib/tool_shed/util/common_util.py following the instructions I found at http://stackoverflow.com/questions/34079/how-to-specify-an-authenticated-proxy-for-a-python-http-connection and now it finally works.

Below is what I modified. It is certainly NOT the appropriate way of doing it from a proper IT point of view (since it is not good practice to hard-code passwords in source code). Furthermore, my code changes will disappear with the next Mercurial update. I therefore really hope that one of you smart people could modify the codebase accordingly, perhaps reusing the http_proxy ... proxy_user and proxy_password system variables :-?

--

    import urllib2

    def tool_shed_get( app, tool_shed_url, uri ):
        """Make contact with the tool shed via the uri provided."""
        registry = app.tool_shed_registry
        # CHANGES FOR PROXY AUTH: send requests through an authenticated proxy.
        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        # The backslash in DOMAIN\USER is escaped so it is not read as an escape sequence.
        password_mgr.add_password( None, 'http://myproxy.domain.com:8080', 'MYDOMAIN\\MYUSER', 'MYPASS' )
        proxy_handler = urllib2.ProxyHandler( { 'http': 'http://myproxy.domain.com:8080' } )
        proxy_auth_handler = urllib2.ProxyBasicAuthHandler( password_mgr )
        urlopener = urllib2.build_opener( proxy_handler, proxy_auth_handler )
        # PREVIOUS CODE
        # urlopener = urllib2.build_opener()
        # password_mgr = registry.password_manager_for_url( tool_shed_url )
        # if password_mgr is not None:
        #     auth_handler = urllib2.HTTPBasicAuthHandler( password_mgr )
        #     urlopener.add_handler( auth_handler )
        response = urlopener.open( uri )
        content = response.read()
        response.close()
        return content
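The poster's suggestion of taking the proxy credentials from system variables instead of hard-coding them could look roughly like the sketch below. It is written for Python 3, where urllib.request is the counterpart of urllib2, and it is not Galaxy code: the function name build_proxy_opener and the variable names PROXY_USER and PROXY_PASSWORD are assumptions made for illustration, and only http_proxy comes from the message above.

```python
import os
import urllib.request


def build_proxy_opener(environ=None):
    """Build a URL opener that authenticates against the proxy in http_proxy.

    PROXY_USER and PROXY_PASSWORD are hypothetical variable names.
    """
    environ = os.environ if environ is None else environ
    proxy_url = environ.get("http_proxy")
    if not proxy_url:
        # No proxy configured: keep the default opener behaviour.
        return urllib.request.build_opener()
    handlers = [urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})]
    user = environ.get("PROXY_USER")
    password = environ.get("PROXY_PASSWORD")
    if user and password:
        # Credentials stay out of the source tree and survive code updates.
        password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, proxy_url, user, password)
        handlers.append(urllib.request.ProxyBasicAuthHandler(password_mgr))
    return urllib.request.build_opener(*handlers)
```

A caller would then use build_proxy_opener().open(uri) where the modified function calls urlopener.open( uri ). Note that ProxyBasicAuthHandler only covers HTTP Basic authentication; a proxy that insists on another scheme would need a different handler.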
[galaxy-dev] Feb 10, 2014 Galaxy Distribution News Brief
Feb 10, 2014 Galaxy Distribution News Brief
https://wiki.galaxyproject.org/News/2014_02_10_Galaxy_Distribution

Complete News Brief: https://wiki.galaxyproject.org/DevNewsBriefs/2014_02_10

Highlights:

* Visualization upgrades, including Trackster CSS styling
* Multiple Tools migrated to the Tool Shed for a leaner distribution
* Redesign of UI rendering: new icons, new font, history pane updates
* API functionality upgrades featuring a new master admin API key
* Tool Shed updates with a focus on repository metadata, displays, installs, and tests
* Over 35 new community contributions added

http://getgalaxy.org
http://bitbucket.org/galaxy/galaxy-dist
http://galaxy-dist.readthedocs.org

new:
$ hg clone https://bitbucket.org/galaxy/galaxy-dist#stable

upgrade:
$ hg pull
$ hg update release_2014.02.10

Thanks for using Galaxy!

The Galaxy Team
https://wiki.galaxyproject.org/Galaxy%20Team