[galaxy-dev] select with a value preselected
Hello,

In my tool UI, I have a select parameter as follows:

<param name="site" type="select" multiple="true" label="Execution Location" help="Multi-select list - hold the appropriate key while clicking to select multiple items">
    <option value="localhost">Localhost</option>
    <option value="midway">Midway</option>
    <option value="uc3">UC3</option>
    <option value="stampede">Stampede</option>
    <option value="tukey">Tukey</option>
</param>

How do I tell it to preselect the value localhost by default? Currently, the user has to explicitly select a value, and if she forgets to do so the tool breaks because nothing is selected by default.

Thanks,
--
Ketan
___
Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Re: [galaxy-dev] select with a value preselected
Great! Thanks.

On Wed, Feb 19, 2014 at 2:18 PM, Saket Choudhary sake...@gmail.com wrote:

Hi Ketan,

You can specify selected="true" on the option you want preselected [1].

[1] https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax#A.3Coption.3E_tag_set

On 19 February 2014 20:14, Ketan Maheshwari ketancmaheshw...@gmail.com wrote:

Hello,

In my tool UI, I have a select parameter as follows:

<param name="site" type="select" multiple="true" label="Execution Location" help="Multi-select list - hold the appropriate key while clicking to select multiple items">
    <option value="localhost">Localhost</option>
    <option value="midway">Midway</option>
    <option value="uc3">UC3</option>
    <option value="stampede">Stampede</option>
    <option value="tukey">Tukey</option>
</param>

How do I tell it to preselect the value localhost by default? Currently, the user has to explicitly select a value, and if she forgets to do so the tool breaks because nothing is selected by default.

Thanks,
--
Ketan
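As a quick check of the suggestion above, here is a minimal sketch that parses the param definition with Python's standard library and reports which options are marked selected="true". The XML is the param from the question with the attribute added to the localhost option; the helper name preselected_values is made up for this illustration and is not part of Galaxy, which does its own parsing.

```python
import xml.etree.ElementTree as ET

# The <param> from the question, with selected="true" added to one <option>.
PARAM_XML = """
<param name="site" type="select" multiple="true" label="Execution Location">
    <option value="localhost" selected="true">Localhost</option>
    <option value="midway">Midway</option>
    <option value="uc3">UC3</option>
    <option value="stampede">Stampede</option>
    <option value="tukey">Tukey</option>
</param>
"""


def preselected_values(param_xml):
    """Return the values of all <option> elements marked selected="true".

    Hypothetical helper for illustration only; it is not a Galaxy API.
    """
    param = ET.fromstring(param_xml.strip())
    return [
        option.get("value")
        for option in param.findall("option")
        if option.get("selected", "false").lower() == "true"
    ]


if __name__ == "__main__":
    print(preselected_values(PARAM_XML))
```

Since the param is declared with multiple="true", more than one option could presumably carry the attribute if several defaults are wanted.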
Re: [galaxy-dev] space in text tool results in two arguments
Hi Ross,

I did try double quotes and curly braces, "${outloc}", but that does not seem to address the issue.

Thanks,
Ketan

On Mon, Feb 17, 2014 at 10:37 PM, Ross ross.laza...@gmail.com wrote:

Hi Ketan.

Please try quotation marks to enclose any parameter containing spaces in the tool command template, e.g. something like:

python myscript.py "$text_with_spaces" $param2 $param3

Please confirm whether this solves the problem.

On Tue, Feb 18, 2014 at 12:47 PM, Ketan Maheshwari ketancmaheshw...@gmail.com wrote:

Hi,

My tool in Galaxy accepts a text argument which can contain zero or more spaces, depending on user requirements. When the user enters one word it is parsed fine, but when the user enters several words separated by spaces, the value is split into multiple separate arguments, which breaks the way my script handles its command-line arguments.

Is it possible to tell Galaxy to treat the text box as a single argument irrespective of spaces in the value provided?

Thanks for any suggestions.

Best,
--
Ketan
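To illustrate the word splitting discussed in this thread, here is a small sketch using Python's shlex module, which splits a command line the way a POSIX shell would. The script name and the sample value are placeholders. It only demonstrates why quoting matters in general; it does not diagnose why quoting did not help in Ketan's case.

```python
import shlex

# A text value containing spaces, as a user might type it into the text box.
user_text = "my output dir"

# Unquoted substitution: the shell sees three separate words.
unquoted = "python myscript.py " + user_text

# Quoted substitution: the value stays one argument.
quoted = 'python myscript.py "' + user_text + '"'

# shlex.quote escapes the value so other shell metacharacters are safe too.
safe = "python myscript.py " + shlex.quote(user_text)


def argv_of(command_line):
    """Split a command line into arguments using POSIX shell rules."""
    return shlex.split(command_line)


if __name__ == "__main__":
    for line in (unquoted, quoted, safe):
        print(argv_of(line))
```

If the quoted form still arrives as several arguments, the splitting may be happening later, for example inside the wrapper script when it passes the value on unquoted.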
[galaxy-dev] setting up Galaxy for torque pbs
Hi,

I am trying to set up Galaxy to interface with a Cray system which runs Torque/PBS. After reading this Galaxy wiki page:

https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#PBS

I was able to scramble the Torque egg with the following command:

LIBTORQUE_DIR=/opt/torque/2.4.11/lib/libtorque.so.2 python scripts/scramble.py -e pbs_python

I do not fully understand the parameters and configuration part of the PBS section of the wiki page. Where exactly should the runner XML snippet with the plugin and destinations be placed? Also, is the snippet required for every tool that I want to run on compute nodes, or is it a global setting? Can I tell existing tools to use this setup?

Another question: can I configure Galaxy to submit jobs to the compute cluster as a mortal user rather than as an admin of the system?

Thanks,
--
Ketan
Re: [galaxy-dev] running tools within tool
Thanks Dannon for the reference. I checked out the tool and installed it from the toolshed on my local Galaxy instance.

I also checked out the related paper, which says that the BLAST executables run in parallel by partitioning the input files into fragments and running batches in parallel. That sounds cool. I browsed the code but could not find the exact mechanism. Is the parallelism at the workflow level (aka branch parallelism), or is it at the tool level, that is, the tool itself invokes parallel code?

Thanks,
Ketan

On Thu, Feb 6, 2014 at 9:42 AM, Dannon Baker dannon.ba...@gmail.com wrote:

Ketan,

Have you taken a look at Galaxy's built-in parallelism framework? For a great current example of a tool using this, look at Peter's NCBI BLAST+ wrappers.

https://github.com/peterjc/galaxy_blast

-Dannon

On Thu, Feb 6, 2014 at 10:32 AM, Ketan Maheshwari ketancmaheshw...@gmail.com wrote:

Hi John, Alex, All,

Elaborating on the motivation behind my question about running tools within a tool.

First, running a tool in parallel at large scale. For example, if I need to find a pattern in 1000 files via the Galaxy Select tool from the Text and Filter tool group, I am limited to providing one file at a time to the tool, which will take a long time to finish. Please correct me if there is a more sophisticated way to approach this problem.

Second, a related concern is running a tool in parallel on one or more HPC resources.

We want to write a generic wrapper Galaxy tool, powered by the Swift parallel framework, such that it can run any arbitrary Galaxy tool in parallel on HPC resources. Currently, we have developed this capability, but only for external executables, which is not the most secure way of using Galaxy as I understand from the previous conversation.

Having such a wrapper tool in a standard way is desirable so that it preserves the tool contract and binding within the Galaxy environment, that is, it maintains the history and metadata conventions of Galaxy.

Thanks,
Ketan

On Wed, Feb 5, 2014 at 3:53 PM, John Chilton chil...@msi.umn.edu wrote:

Galaxy has an API that is capable of running tools - certainly this is one path forward on something like this. I am not sure it is the best path forward though. Probably the best way to enhance Galaxy's execution capabilities is to extend the Galaxy core framework itself - this has its own downsides though.

If you can offer more details about how you would like to enhance Galaxy - what it cannot do that you would like it to do - I or others may be able to provide more specific ideas. Otherwise, sorry I have not been of more help.

-John

On Tue, Feb 4, 2014 at 2:51 PM, Ketan Maheshwari ke...@mcs.anl.gov wrote:

Hi,

This is a question I posted to the galaxy-user mailing list a while back and was redirected to dev for possible answers:

Is it possible in Galaxy to design a tool whose sole purpose is to run other tools?

This is motivated by our desire to enhance the execution capabilities of existing tools via a generic tool which acts as a wrapper.

Thanks,
Ketan
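The "partition the input into fragments and run batches in parallel" idea mentioned in this thread can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Galaxy's parallelism framework, the BLAST+ wrappers, or Swift: the function names are invented, the per-fragment "tool" is a simple substring filter standing in for the Select tool, and threads stand in for cluster jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_fragments(lines, fragment_size):
    """Partition a list of lines into consecutive fragments."""
    return [lines[i:i + fragment_size] for i in range(0, len(lines), fragment_size)]


def select_matching(fragment, pattern):
    """Stand-in for one per-fragment tool run: keep lines containing pattern."""
    return [line for line in fragment if pattern in line]


def parallel_select(lines, pattern, fragment_size=2, workers=4):
    """Split the input, process fragments concurrently, merge results in order."""
    fragments = split_into_fragments(lines, fragment_size)
    merged = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input fragments.
        for part in pool.map(lambda frag: select_matching(frag, pattern), fragments):
            merged.extend(part)
    return merged


if __name__ == "__main__":
    data = ["gene a", "x", "gene b", "y", "gene c"]
    print(parallel_select(data, "gene"))
```

In this picture the parallelism sits at the tool level: one logical tool run is fanned out over fragments and the outputs are merged back into a single result.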
Re: [galaxy-dev] running tools within tool
Hi John, Alex, All,

Elaborating on the motivation behind my question about running tools within a tool.

First, running a tool in parallel at large scale. For example, if I need to find a pattern in 1000 files via the Galaxy Select tool from the Text and Filter tool group, I am limited to providing one file at a time to the tool, which will take a long time to finish. Please correct me if there is a more sophisticated way to approach this problem.

Second, a related concern is running a tool in parallel on one or more HPC resources.

We want to write a generic wrapper Galaxy tool, powered by the Swift parallel framework, such that it can run any arbitrary Galaxy tool in parallel on HPC resources. Currently, we have developed this capability, but only for external executables, which is not the most secure way of using Galaxy as I understand from the previous conversation.

Having such a wrapper tool in a standard way is desirable so that it preserves the tool contract and binding within the Galaxy environment, that is, it maintains the history and metadata conventions of Galaxy.

Thanks,
Ketan

On Wed, Feb 5, 2014 at 3:53 PM, John Chilton chil...@msi.umn.edu wrote:

Galaxy has an API that is capable of running tools - certainly this is one path forward on something like this. I am not sure it is the best path forward though. Probably the best way to enhance Galaxy's execution capabilities is to extend the Galaxy core framework itself - this has its own downsides though.

If you can offer more details about how you would like to enhance Galaxy - what it cannot do that you would like it to do - I or others may be able to provide more specific ideas. Otherwise, sorry I have not been of more help.

-John

On Tue, Feb 4, 2014 at 2:51 PM, Ketan Maheshwari ke...@mcs.anl.gov wrote:

Hi,

This is a question I posted to the galaxy-user mailing list a while back and was redirected to dev for possible answers:

Is it possible in Galaxy to design a tool whose sole purpose is to run other tools?
This is motivated by our desire to enhance the execution capabilities of existing tools via a generic tool which acts as a wrapper.

Thanks,
Ketan
[galaxy-dev] Cheetah code issue
Hi,

While developing a tool, I am facing a Cheetah code issue. I looked up the mailing list archive and found many similar issues reported earlier, but could not debug this one. Any tips on debugging are appreciated.

File "cheetah_DynamicallyCompiledCheetahTemplate_1391530084_29_71368.py", line 97, in respond
NotFound: cannot find 'rngstart'

Attached is the XML tool definition file.

Best,
Ketan

<tool id="swiftforeach" name="foreach">
  <description>A generic tool to run executable via Swift foreach parallel construct</description>
  <command interpreter="bash">
    #if $rangeorlist.rl == "rng"
      swiftforeachrange.sh $site $interpret $exec $rngstart $rend $stepsize $outloc $logfile $outlist
      #for $a in $arg
        ${a.argname}
      #end for
    #else
      swiftforeachlist.sh $site $interpret $exec $listfile $outloc $logfile $outlist
      #for $a in $arg
        ${a.argname}
      #end for
    #end if
  </command>
  <inputs>
    <param name="site" type="select" label="Execution Location">
      <option value="localhost">Localhost</option>
      <option value="midway">Midway</option>
      <option value="uc3">UC3</option>
      <option value="stampede">Stampede</option>
    </param>
    <param name="interpret" type="select" label="Execution interpreter">
      <option value="sh">sh</option>
      <option value="python">python</option>
      <option value="java">java</option>
      <option value="R">R</option>
      <option value="matlab">matlab</option>
    </param>
    <param format="sh,binexec" name="exec" type="data" label="Executable"/>
    <conditional name="rangeorlist">
      <param name="rl" type="select" label="Select range or list">
        <option value="rng">numeric range</option>
        <option value="lst">items list</option>
      </param>
      <when value="rng">
        <param name="rngstart" size="2" type="integer" value="0" label="start"/>
        <param name="rend" size="2" type="integer" value="9" label="end"/>
        <param name="stepsize" size="2" type="integer" value="1" label="stepsize"/>
      </when>
      <when value="lst">
        <param format="data" name="listfile" type="data" label="List file"/>
      </when>
    </conditional>
    <repeat name="arg" title="arg">
      <param name="argname" type="text" label="arg" />
    </repeat>
    <param name="outloc" size="50" type="text" value="$HOME/swift-sandbox" label="location for output files" help="Swift will write output files here on disc">
      <sanitizer sanitize="False" />
    </param>
    <conditional name="configuration">
      <param name="mode" type="select" label="Swift configuration">
        <option value="default">default configuration</option>
        <option value="advanced">advanced configuration</option>
      </param>
      <when value="default">
      </when>
      <when value="advanced">
        <param name="remoteurl" size="50" type="text" value="midway.swift.rcc.uchicago.edu" label="remote url" help="remote resource to run jobs on"></param>
        <param name="throttle" size="5" type="float" value="0.07" label="job throttle" help="number of parallel jobs to run"></param>
        <param name="project" type="text" value="TG-STA110005S" label="Project allocation" help="name/code of project"></param>
        <param name="slots" size="5" type="integer" value="1" label="slots" help="number of scheduler jobs"></param>
        <param name="queue" size="5" type="text" value="normal" label="queue" help="queue to run jobs on"></param>
        <param name="nodes" size="5" type="integer" value="1" label="nodes" help="number of nodes requested"></param>
        <param name="nodegranularity" size="5" type="integer" value="1" label="node granularity" help="node granularity"></param>
        <param name="jobspernode" size="5" type="integer" value="8" label="jobs per node" help="number of jobs per node requested"></param>
        <param name="ppn" size="5" type="integer" value="8" label="ppn" help="processes per node requested"></param>
        <param name="walltime" size="5" type="text" value="00:10:00" label="Job walltime" help="time in hh:mm:ss to request to scheduler for this job"></param>
        <param name="maxtime" size="5" type="integer" value="700" label="application maxtime" help="Application maxtime in seconds for this job"></param>
      </when>
    </conditional>
  </inputs>
  <outputs>
    <data format="txt" name="logfile" type="data" label="Swift output" />
    <data format="txt" name="outlist" type="data" label="Output list" />
  </outputs>
  <!-- <code file="postprocess.py"> <hook postprocess="writeoutlist" /> </code> -->
  <help>
.. class:: warningmark

**TIP**. Add args to provide additional arguments to your executable.

-

**What it does**

This is a generic Swift tool that runs an executable over a range of numbers with arbitrary stepsize. Execution location allows user to declare where to run the tool. Executable can be any arbitrary executable of type binexec uploaded by user. Start, end and stepsize are integer values. Note that the stepsize cannot be less than 1. Optionally,
Re: [galaxy-dev] Cheetah code issue
That did the trick! Thanks!

On Tue, Feb 4, 2014 at 10:26 AM, bjoern.gruen...@googlemail.com bjoern.gruen...@gmail.com wrote:

Hi,

To access variables defined inside a conditional, you need to prefix them with the name of the conditional, like this:

$rangeorlist.rngstart $rangeorlist.rend $rangeorlist.stepsize

Cheers,
Bjoern

2014-02-04 Ketan Maheshwari ketancmaheshw...@gmail.com:

Hi,

While developing a tool, I am facing a Cheetah code issue. I looked up the mailing list archive and found many similar issues reported earlier, but could not debug this one. Any tips on debugging are appreciated.

File "cheetah_DynamicallyCompiledCheetahTemplate_1391530084_29_71368.py", line 97, in respond
NotFound: cannot find 'rngstart'

Attached is the XML tool definition file.

Best,
Ketan
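The scoping rule behind Bjoern's fix can be modeled with plain Python objects. This is only an analogy, not Cheetah or Galaxy code: the params object and the lookup helper are invented for the illustration, and the values are the defaults from the tool definition in the original message.

```python
from types import SimpleNamespace

# Toy model of the tool's parameters: values defined inside
# <conditional name="rangeorlist"> live under that name, not at top level.
params = SimpleNamespace(
    site="localhost",
    rangeorlist=SimpleNamespace(rl="rng", rngstart=0, rend=9, stepsize=1),
)


def lookup(namespace, dotted_name):
    """Resolve a dotted name such as 'rangeorlist.rngstart' step by step.

    Hypothetical helper; raises AttributeError when a part is missing,
    which plays the role of Cheetah's NotFound error in this analogy.
    """
    value = namespace
    for part in dotted_name.split("."):
        value = getattr(value, part)
    return value


if __name__ == "__main__":
    print(lookup(params, "rangeorlist.rngstart"))
    try:
        lookup(params, "rngstart")
    except AttributeError as error:
        print("not found:", error)
```

The bare name fails for the same reason $rngstart did in the command template: nothing called rngstart exists at the top level.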
[galaxy-dev] Running tools from within a tool
Hi,

This is a question I posted to the galaxy-user mailing list a while back and was redirected to dev for possible answers:

Is it possible in Galaxy to design a tool whose sole purpose is to run other tools?

This is motivated by our desire to enhance the execution capabilities of existing tools via a generic tool which acts as a wrapper.

Thanks,
Ketan
[galaxy-dev] running tools within tool
Hi,

This is a question I posted to the galaxy-user mailing list a while back and was redirected to dev for possible answers:

Is it possible in Galaxy to design a tool whose sole purpose is to run other tools?

This is motivated by our desire to enhance the execution capabilities of existing tools via a generic tool which acts as a wrapper.

Thanks,
Ketan
Re: [galaxy-dev] datatype for executables
Thanks for your answer.

Yes, the idea is to have runnable executables, such as executable binaries and shell scripts, run via the parallelizing tool Swift. As for security, the current plan is to run Swift-enabled Galaxy tools in controlled cloud instances, which we are already doing.

With this datatype, we are planning to design a suite of generic tools such that any arbitrary executable can be parallelized within Galaxy and run on clouds and remote clusters.

Is it possible to write a type file bin_or_exe which can detect the executable bit of data files before they become part of Galaxy's indexed data?

Thanks,
Ketan

On Tue, Jan 28, 2014 at 2:42 AM, Peter Cock p.j.a.c...@googlemail.com wrote:

On Tuesday, January 28, 2014, Ketan Maheshwari ketancmaheshw...@gmail.com wrote:

Is there a data type in Galaxy that identifies executables uniquely, e.g. from the executable bit in the file perms or some other way?

Thanks,

Galaxy's data types are for data files - runnable tools/executables are handled via XML tool wrappers which define their options etc.

Are you really asking about creating a datatype for a binary executable file? Or letting users run arbitrary tools? Even the idea of letting users run an arbitrary R script is dangerous enough from a security point of view.

Peter
Re: [galaxy-dev] datatype for executables
Hi Peter,

Thanks for the advice. I was trying to say that the potential users in this case will run the tool on:

1. cloud instances that they own
2. PBS/Torque/SLURM interfaced HPC resources to which they have authenticated access

This means that in the worst case, if someone chooses to run a forkbomb, it will only kill her own resource. In my opinion this is no less secure than, say, wrapping a forkbomb in a Torque script and submitting it to my department cluster. I am accountable and traceable for any harm I do this way.

The benefit to users, on the other hand, is that they can easily test their arbitrary applications at a larger scale via the task parallelism provided by Swift. Once a user is satisfied with the behavior of her task on a compute node via Galaxy, she can follow our recipe, which will concretize her implementation as a tool to be used in practice.

Were there any scenarios you had in mind that would lead to security issues?

Thanks,
Ketan

On Tue, Jan 28, 2014 at 4:17 PM, Peter Cock p.j.a.c...@googlemail.com wrote:

On Tue, Jan 28, 2014 at 8:26 PM, Ketan Maheshwari ke...@mcs.anl.gov wrote:

Is it possible to write a type file bin_or_exe which can detect the executable bit of data before they are part of Galaxy's indexed data. Thanks, Ketan

You haven't convinced me this is a good idea, but I would try this by defining a new datatype class in Python with a sniffer method which just checks for the executable bit (probably defined as a subclass of the binary datatype, see [1]) and then add this and its sniffer to the datatype XML file.

Peter

[1] https://bitbucket.org/galaxy/galaxy-central/src/default/lib/galaxy/datatypes/binary.py

--
Ketan
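Peter's suggestion of a sniffer that checks the executable bit could look roughly like the sketch below. It is a standalone approximation under stated assumptions: the class name BinOrExe and the helper has_executable_bit are invented here, and the class deliberately does not import Galaxy. In a real datatype it would subclass the Binary datatype from the binary.py module linked above and be registered in the datatype XML file, and whether the executable bit survives an upload into Galaxy would need to be verified.

```python
import os
import stat


def has_executable_bit(filename):
    """Return True if filename is a regular file with any execute bit set."""
    mode = os.stat(filename).st_mode
    any_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return stat.S_ISREG(mode) and bool(mode & any_exec)


class BinOrExe:
    """Hypothetical datatype sketch, not a real Galaxy class.

    In Galaxy this would subclass the Binary datatype; it is kept
    standalone here so the example runs without Galaxy installed.
    """

    file_ext = "bin_or_exe"

    def sniff(self, filename):
        """Claim the file for this datatype if its executable bit is set."""
        return has_executable_bit(filename)


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        print(path, BinOrExe().sniff(path))
```

The check relies on POSIX permission bits, so it says nothing about whether the file is actually a valid program for the machine that will run it.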
[galaxy-dev] datatype for executables
Is there a data type in Galaxy that identifies executables uniquely, e.g. from the executable bit in the file permissions or some other way?

Thanks,
--
Ketan
Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?
Hi Kyle,

Swift is indeed a complete framework for distributed computing. Distributing files out to cluster nodes, starting processes, and bringing result files back to the submit host are done out of the box (the stagein-exec-stageout cycle).

We can discuss offline if you are interested in giving it a shot.

Best,
Ketan

On Mon, Oct 28, 2013 at 4:14 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:

You are probably a good person to get an opinion from. My plan isn't to write new frameworks, but rather to use existing libraries that can communicate with Mesos to set up their parallel environments. But for Swift, you would probably want to write a new framework.

Just looking at Swift, I imagine one of the harder parts is getting the system set up on a cluster (i.e. distributing files out to remote nodes, making sure that you have a way to start processes on those nodes and have them know where to find the master). It seems like Swift could benefit from having a Mesos-based framework. Do you think it would enable you to have a 'zero-config' startup of a distributed Swift application?

Kyle

On Mon, Oct 28, 2013 at 1:51 PM, Ketan Maheshwari ketancmaheshw...@gmail.com wrote:

Hi Kyle,

We have a similar ongoing development wherein we are working on integrating our Swift framework ( swift-lang.org ) with Galaxy. The goal is to enable Galaxy-based applications to run on a variety of distributed resources via various integration schemes, as suitable to the application and the underlying execution environment.

Here is an abstract of a paper (co-authored with Ravi, who responded on this thread) we will be presenting in a workshop at the upcoming SC 13 conference:

The Galaxy platform is a web-based science portal for scientific computing supporting Life Sciences users community. While user-friendly and intuitive for doing small to medium scale computations, it currently has a limited support for large-scale, parallel and distributed computing. The Swift parallel scripting framework is capable of composing ordinary applications into parallel scripts that can be run on multi-scale distributed and performance computing platforms. In complex distributed environments, often the user end of application lifecycle slows down because of the technical complexities brought in by the scale, access methods and resource management nuances. Galaxy offers a simple way of designing, composing, executing, reusing, and reproducing application runs. An integration between Swift and Galaxy systems can accelerate science as well as bring the respective user communities together in an interactive, user-friendly, parallel and distributed data analysis environment enabled on a broad range of computational infrastructures.

Kindly let us know if you need a hands-on session with the various tools we have already developed.

Best,
Ketan

On Mon, Oct 28, 2013 at 3:07 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:

I don't think implementation will be very difficult. The bigger question is whether this is a technology people are open to. The nearest competitor is YARN ( http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html ). Mesos seems a bit more geared toward general-purpose usage (with several existing frameworks), while YARN seems more specific to Hadoop. But I'd be glad to hear some other thoughts.

Kyle

On Mon, Oct 28, 2013 at 12:55 PM, Ravi K Madduri madd...@mcs.anl.gov wrote:

Kyle

This is something I am very interested in. The three parts below make sense to me. I would be very happy to discuss further and provide any help to move this forward.

Regards

On Oct 26, 2013, at 2:43 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:

I think one of the aspects where Galaxy is a bit soft is the ability to do distributed tasks. The current system of split/replicate/merge tasks based on file type is a bit limited and hard for tool developers to expand upon. Distributed computing is a non-trivial thing to implement, and I think it would be a better use of our time to use an already existing framework. It would also mean one less API for tool writers to have to develop for.

I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ). You can see an overview of the Mesos architecture at https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md

The important thing about Mesos is that it provides an API for C/C++, Java/Scala and Python to write distributed frameworks. There are already implementations of frameworks for common parallel programming systems such as:

- Hadoop ( https://github.com/mesos/hadoop )
- MPI ( https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md )
- Spark ( http://spark-project.org )

And you can find an example Python framework at https://github.com/apache/mesos/tree/master/src/examples/python

Integration with Galaxy would have three parts:

1) Add a system config variable to Galaxy called 'MESOS_URL