[galaxy-dev] Tool wrapper XSD
Hi everyone,

Is there a Galaxy XML tool wrapper XSD?

Thanks,
Pierre

--
Pierre Pericard
IE CDD - Projet Peptisan
Service Informatique et Bio-informatique (SIB)
Station Biologique de Roscoff CNRS - UPMC
Place Georges Teissier CS 90074
29688 ROSCOFF CEDEX FRANCE
Tel : (+33) 2 98 29 56 46
http://abims.sb-roscoff.fr/
___
Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
Re: [galaxy-dev] Best way to work with one directory and many files as 1 input
Hi Christos,

Yes, I would be very interested in your modified file. I'm forwarding this to the mailing list so that the bug can be recorded.

Thanks,
Pierre

On 01/02/2013 00:34, chriskan...@gmail.com wrote:
> Hi,
>
> I'm working on the same thing: a composite dataset with many files in many sub-folders.
>
> While testing the tool that uses this composite datatype, I found that Galaxy does not handle sub-directories correctly. It works well with files, but if it finds a sub-directory the copy goes wrong. More specifically, it uses shutil.copy without checking whether the path is a file or a directory; when copying directories it has to use shutil.copytree.
>
> I managed to make our local Galaxy installation handle them reasonably well with some modifications to the method DiskObjectStore.update_from_file(...) located in lib/galaxy/objectstore/__init__.py
>
> I can send you a copy of the file I've modified.
>
> Regards,
> Christos
>
> --
> Christos Kannas
> Researcher Ph.D Student
> e-Health Laboratory
> Dept. Computer Science, University of Cyprus
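The file-versus-directory check Christos describes can be sketched as follows. This is only an illustration of the idea, not the code in lib/galaxy/objectstore/__init__.py and not Christos's patch: `copy_path` is a hypothetical helper name, and the sketch assumes the destination does not exist yet when a directory is copied.

```python
import os
import shutil


def copy_path(src, dst):
    """Copy src to dst, whether src is a regular file or a directory tree.

    Hypothetical helper for illustration only; it is not part of Galaxy.
    """
    if os.path.isdir(src):
        # shutil.copytree copies the whole tree recursively.
        # It expects dst not to exist yet.
        shutil.copytree(src, dst)
    else:
        # shutil.copy handles a single file (dst may be a file or a directory).
        shutil.copy(src, dst)
    return dst
```

Calling `copy_path("extra_files/classA", "store/classA")` would copy the whole sub-directory, while the same call on a single XML file falls back to a plain file copy.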
Re: [galaxy-dev] Best way to work with one directory and many files as 1 input
In that case, could anyone point me to an example of a Composite Datatype that accepts as input an unknown number of files in an unknown number of directories? I can't work out from the wiki how that would be done.

But maybe we are anticipating an upcoming Galaxy feature. There was talk of changing the way Galaxy handles zip files; is that still on the table?

Thanks in advance for any help,
Pierre

On 29/01/2013 18:04, Peter Cock wrote:
> On Tue, Jan 29, 2013 at 4:58 PM, Pierre Pericard pierre.peric...@sb-roscoff.fr wrote:
>> If I'm not mistaken, Composite Datatypes allow for only one directory, whereas we need to keep a constant directory structure with 2 or more sub-directories containing our input files.
>
> I'm not sure that is true - the example of HTML output with images comes to mind as a common use case where subfolder(s) would be expected. I've only had limited first-hand experience with Galaxy's composite datatypes myself though.
>
>> We have no way to change these tools' behavior (obviously not Galaxy-friendly ;-) ) and therefore need to maintain this structure in the job working directory.
>
> Perhaps a tool wrapper could create a dummy folder using symlinks (faster and less wasted disk than copying files), but that isn't ideal.
>
> Peter
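Peter's symlink idea could be sketched roughly as below. This is a minimal illustration under several assumptions: `mirror_with_symlinks` is a hypothetical helper name, the wrapper is written in Python, and the file system supports symbolic links (typically a POSIX system). Directories are recreated for real and only the files are linked, so the sub-directory names survive in the job working directory.

```python
import os


def mirror_with_symlinks(src_root, dst_root):
    """Recreate the directory tree of src_root under dst_root.

    Directories are created for real; every file becomes a symlink to the
    original. Hypothetical helper for illustration only.
    """
    src_root = os.path.abspath(src_root)
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Path of the current directory relative to the source root ("." at the top).
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # Link to the absolute path so the link stays valid from any directory.
            os.symlink(os.path.join(dirpath, name), os.path.join(target_dir, name))
    return dst_root
```

No file contents are copied, which is where the saving in time and disk space that Peter mentions would come from.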
Re: [galaxy-dev] Best way to work with one directory and many files as 1 input
Ok, thanks a lot. I'll try this and get back to the mailing list if other problems occur.

Pierre.

On 30/01/2013 11:45, Ross wrote:
> I'd suggest:
>
> 1) Make your new datatype a subclass of Html - it's a subclass of composite that contains an HTML document as the object's native display - so it can inform users what's there.
> 2) When constructing these new things, pass the file_path of the Html (composite) dataset subclass to your wrapper on the command line.
> 3) Your wrapper code can construct any arbitrary structure as long as it's rooted in that directory - Galaxy stores it without any fuss. The wrapper should also populate the Html file itself with nicely laid out annotation for the user to check out.
> 4) The key is that all tools that take this new datatype as input must know how to decode this structure - they must be passed the $input.extra_files_path, which gives them that same path root.
> 5) Yes, it's odd and annoying that it's extra_files_path for files_path. Go figure.
> 6) grep extra_files tools/*.xml to find some examples - I think the velvetg one uses a complex subdirectory structure - but it doesn't really matter - as long as your tools know how to deal with it, it's just a directory to Galaxy!
>
> I hope all this helps...
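To make Ross's point 4 concrete, here is a minimal sketch of what a downstream wrapper might do once it has been handed the value of $input.extra_files_path on its command line. Everything in it is an assumption for illustration: `files_by_class` is a hypothetical function name, and the layout (one top-level sub-directory per class, files possibly nested deeper) is taken from the use case described elsewhere in this thread, not from Galaxy itself.

```python
import os


def files_by_class(extra_files_path):
    """Map each top-level sub-directory name to the files found beneath it.

    Files sitting directly in extra_files_path (for example an HTML index)
    are ignored. Hypothetical helper for illustration only.
    """
    classes = {}
    for entry in sorted(os.listdir(extra_files_path)):
        sub = os.path.join(extra_files_path, entry)
        if not os.path.isdir(sub):
            continue
        found = []
        # Search recursively, since files may be nested at any depth.
        for dirpath, _dirnames, filenames in os.walk(sub):
            for name in filenames:
                found.append(os.path.join(dirpath, name))
        classes[entry] = sorted(found)
    return classes
```

As far as Galaxy is concerned the dataset is still just a directory; only the tools need to agree on what the sub-directory names mean.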
[galaxy-dev] Best way to work with one directory and many files as 1 input
Hi all,

We've just added some new tools based on R scripts to our local Galaxy instance. Most of these tools need to work at the root of the directory containing the input files (up to hundreds of XML files) spread among two or more sub-directories. The directory structure needs to be kept, since the R tools search for files recursively and use the sub-directory names as classes.

To solve this problem we added a dummy datatype to our instance so that we can upload the input directory as a zip file without Galaxy decompressing it:

    <datatype extension="dummy_zip" type="galaxy.datatypes.data:Data" mimetype="application/zip" display_in_upload="true" subclass="true"/>

However, since our tools can be run as a workflow and most of them need this input directory, we have to unzip it with R in the job working directory for each tool (about 5 times for the entire workflow). Furthermore, this solution doesn't seem very clean if we want to share our tools via the ToolShed.

Is there a smart way to handle this kind of input directory that can be achieved with Galaxy's default datatypes and/or that doesn't require unzipping a file each time we use a tool?

Any update on a behavior change for zip files (http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-June/005631.html)?

Thanks in advance for any input,
Pierre
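For reference, the per-tool unpacking step described above looks roughly like this when written in Python rather than R. It is only a sketch of the workaround, not the scripts used on our instance: `unpack_input` is a hypothetical name, and the archive is assumed to come from a trusted upload, since `extractall` does not vet member paths.

```python
import os
import zipfile


def unpack_input(zip_path, work_dir):
    """Extract the uploaded archive into work_dir, keeping its sub-directories.

    Returns the extracted file paths relative to work_dir.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(work_dir)
    # List what is now in the working directory, relative to its root.
    return sorted(
        os.path.relpath(os.path.join(dirpath, name), work_dir)
        for dirpath, _dirnames, names in os.walk(work_dir)
        for name in names
    )
```

Every tool in the workflow has to repeat this step, which is the cost the question is trying to avoid.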
Re: [galaxy-dev] Best way to work with one directory and many files as 1 input
If I'm not mistaken, Composite Datatypes allow for only one directory, whereas we need to keep a constant directory structure with 2 or more sub-directories containing our input files.

We have no way to change these tools' behavior (obviously not Galaxy-friendly ;-) ) and therefore need to maintain this structure in the job working directory.

Pierre.

On 29/01/2013 17:47, Peter Cock wrote:
> On Tue, Jan 29, 2013 at 4:41 PM, Pierre Pericard pierre.peric...@sb-roscoff.fr wrote:
>> Hi all,
>>
>> We've just added some new tools based on R scripts to our local Galaxy instance. Most of these tools need to work at the root of the directory containing the input files (up to hundreds of XML files) spread among two or more sub-directories. The directory structure needs to be kept, since the R tools search for files recursively and use the sub-directory names as classes.
>>
>> To solve this problem we added a dummy datatype to our instance so that we can upload the input directory as a zip file without Galaxy decompressing it.
>
> Have you looked at a composite datatype instead, where the files are stored on disk decompressed?
>
> http://wiki.galaxyproject.org/Admin/Datatypes/Composite%20Datatypes
>
> Peter
[galaxy-dev] Variable output number in workflows
Dear all,

We've been actively developing XML wrappers for new programs in our Galaxy instance, and we have run into some problems with a variable number of outputs when using these tools in a workflow.

1/ In the first case, the input can be a single file or two paired files, and the program's output file name differs depending on the case. The input problem is solved using a conditional, and following http://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files we implemented the outputs as follows:

    <outputs>
      <data name="nameSorted.single.bam" format="bam" from_work_dir="alignReads/alignReads.nameSorted.bam" label="nameSorted single">
        <filter>inputs['paired_or_single'] == 'single'</filter>
      </data>
      <data name="nameSorted.paired.bam" format="bam" from_work_dir="alignReads/alignReads.nameSorted.PropMapPairsForRSEM.bam" label="nameSorted paired">
        <filter>inputs['paired_or_single'] == 'paired'</filter>
      </data>
    </outputs>

However, when using this tool in a workflow, the toolbox always presents both outputs, whether we choose 1 or 2 inputs. Is there any way to have only one output in the toolbox, taken from a different file depending on the input?

2/ The second case is very similar to the one described in http://dev.list.galaxyproject.org/outputting-different-numbers-of-files-based-on-variables-td4141375.html. When we input a single file the output is also a single file, and when we input two paired files the program outputs two files. Thanks again to the conditional tag, the input problem is taken care of, but we would like the workflow toolbox to present 1 or 2 outputs (ideally with all 3 names different) depending on how many files are in the input. Is there any way to do so other than making 2 different versions of the XML or displaying all 3 outputs in the workflow toolbox?

Thanks in advance,
Pierre
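To spell out what the two filters in case 1 are meant to do, here is a toy model in Python. It assumes that each filter's text is treated as a Python expression evaluated against the tool's parameter values; this is a simplification for illustration and not Galaxy's actual implementation. `selected_outputs`, `FILTERS` and the shape of the `params` dictionary are all hypothetical.

```python
# Filter expressions copied from the <outputs> block in the message above.
FILTERS = {
    "nameSorted.single.bam": "inputs['paired_or_single'] == 'single'",
    "nameSorted.paired.bam": "inputs['paired_or_single'] == 'paired'",
}


def selected_outputs(filters, params):
    """Return the names of the outputs whose filter is true for params.

    Toy model only: eval is used because the expressions come from the
    tool author, never from end users.
    """
    kept = []
    for name, expression in filters.items():
        # Evaluate with no builtins and the parameters as local names.
        if eval(expression, {"__builtins__": {}}, dict(params)):
            kept.append(name)
    return kept
```

With `params = {"inputs": {"paired_or_single": "single"}}` only `nameSorted.single.bam` is kept, and with `"paired"` only `nameSorted.paired.bam`, so once the parameter value is known exactly one of the two outputs applies.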