[galaxy-dev] Tool wrapper XSD

2013-02-19 Thread Pierre Pericard

Hi everyone,

Is there an XSD (XML Schema) for Galaxy tool wrapper XML files?

Thanks,
Pierre


--
Pierre Pericard
IE CDD - Projet Peptisan
Service Informatique et Bio-informatique (SIB)

Station Biologique de Roscoff
CNRS - UPMC
Place Georges Teissier
CS 90074
29688 ROSCOFF CEDEX
FRANCE
Tel : (+33) 2 98 29 56 46
http://abims.sb-roscoff.fr/

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/


Re: [galaxy-dev] Best way to work with one directory and many files as 1 input

2013-01-31 Thread Pierre Pericard

Hi Christos,

Yes, I would be very interested in your modified file.

I'm forwarding this to the mailing list so the bug can be recorded.

Thanks,

Pierre




On 01/02/2013 00:34, chriskan...@gmail.com wrote:

Hi,

I'm working on the same thing: a composite dataset with many files in
many sub-folders.

During the development and testing of the tool that uses this composite
datatype, I found that Galaxy does not handle sub-directories correctly. It
works well with files, but if it finds a sub-directory the copy goes wrong.

To be more specific, it uses shutil.copy without checking whether the source
is a file or a directory; when copying directories it has to use
shutil.copytree.

I managed to get our local Galaxy installation to handle them reasonably well
with some modifications to the method DiskObjectStore.update_from_file(...),
located in lib/galaxy/objectstore/__init__.py

I can send you a copy of the file I've modified.
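
The file-versus-directory check described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual modification made to Galaxy's object store; the helper name copy_path is hypothetical.

```python
import os
import shutil


def copy_path(src, dst):
    """Copy src to dst, choosing the copy routine by source type.

    Hypothetical helper: it only illustrates the isdir check, it is not
    Galaxy code.
    """
    if os.path.isdir(src):
        # copytree creates dst itself and, by default, raises an error
        # if dst already exists.
        shutil.copytree(src, dst)
    else:
        # Make sure the parent directory exists before copying a file.
        parent = os.path.dirname(dst)
        if parent and not os.path.isdir(parent):
            os.makedirs(parent)
        shutil.copy(src, dst)
    return dst
```

Calling copy_path on a directory reproduces its sub-directories at the destination, whereas a bare shutil.copy on the same path raises an error.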

Regards,
Christos

--

Christos Kannas
Researcher
Ph.D Student

e-Health Laboratory
Dept. Computer Science,
University of Cyprus


Re: [galaxy-dev] Best way to work with one directory and many files as 1 input

2013-01-30 Thread Pierre Pericard
In that case, could anyone point me to an example of a Composite 
Datatype which could accept as input an unknown number of files in an 
unknown number of directories. I can't seem to understand how that would 
work based on the wiki.


But maybe we are anticipating an upcoming feature of Galaxy. There was
talk about changing the way Galaxy handles zip files; is that still on the
table?


Thanks in advance for any help,

Pierre



On 29/01/2013 18:04, Peter Cock wrote:

On Tue, Jan 29, 2013 at 4:58 PM, Pierre Pericard
pierre.peric...@sb-roscoff.fr wrote:

If I'm not mistaken, Composite Datatypes allow for only one directory,
whereas we need to keep a constant directory structure with 2 or more
sub-directories containing our input files.

I'm not sure if that is true - the example of HTML output with images
comes to mind as a common use-case where subfolder(s) would be
expected. I've only had limited first hand experience with Galaxy's
composite datatypes myself though.


We have no way to change these tools' behavior (obviously not Galaxy-friendly
;-) ) and therefore need to maintain this structure in the job working
directory.

Perhaps a tool wrapper could create a dummy folder using symlinks
(faster and less wasted disk than copying files), but that isn't ideal.
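
The symlink idea above might look something like the sketch below. It is a minimal illustration under assumptions: the function name mirror_with_symlinks and both directory arguments are hypothetical, and how a real wrapper would obtain the source directory from Galaxy is not covered here.

```python
import os


def mirror_with_symlinks(src_root, dst_root):
    """Recreate the directory tree of src_root under dst_root.

    Directories are created for real; every file becomes a symlink that
    points back to the original, so no file content is copied.
    """
    linked = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            link = os.path.join(target_dir, name)
            # os.symlink raises an error if the link name already exists.
            os.symlink(os.path.abspath(os.path.join(dirpath, name)), link)
            linked.append(link)
    return sorted(linked)
```

A tool that walks the mirrored directory recursively sees the same sub-directory names as in the source, which is what the R scripts in this thread rely on.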

Peter




Re: [galaxy-dev] Best way to work with one directory and many files as 1 input

2013-01-30 Thread Pierre Pericard
OK, thanks a lot. I'll try this and get back to the mailing list if other
problems occur.


Pierre.



On 30/01/2013 11:45, Ross wrote:

I'd suggest:
1) Make your new datatype a subclass of Html - it's a subclass of 
composite that contains an HTML document as the object's native 
display - so it can inform users what's there.


2) When constructing these new things, pass the file_path of the Html 
(composite) dataset subclass to your wrapper on the command line


3) Your wrapper code can construct any arbitrary structure as long as
it's rooted in that directory - Galaxy stores it without any fuss. The
wrapper should also populate the Html file itself with nicely laid out
annotation for the user to check out.

4) The key is that all tools that take this new datatype as input must 
know how to decode this structure - they must be passed the 
$input.extra_files_path which gives them that same path root.


5) Yes, it's odd and annoying that it's extra_files_path for 
files_path. Go figure.


6) grep extra_files tools/*.xml to find some examples - I think the 
velvetg one uses a complex subdirectory structure - but it doesn't 
really matter - as long as your tools know how to deal with it, it's 
just a directory to Galaxy!

I hope all this helps...
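
Steps 2 and 3 above could be sketched along these lines. This is a hedged illustration, not code from any existing Galaxy tool: the function name build_composite and its arguments are hypothetical, and it assumes the wrapper receives the path of the Html dataset file and the path of its extra files directory on the command line.

```python
import html
import os


def build_composite(html_path, files_dir, tree):
    """Write files under files_dir and an HTML index at html_path.

    tree maps relative paths (sub-directories allowed) to text content.
    Both paths are assumed to be supplied by the tool's command line.
    """
    links = []
    for rel_path, content in sorted(tree.items()):
        target = os.path.join(files_dir, rel_path)
        # Create whatever sub-directory structure the relative path needs.
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        with open(target, "w") as handle:
            handle.write(content)
        links.append('<li><a href="%s">%s</a></li>' % (
            html.escape(rel_path, quote=True), html.escape(rel_path)))
    # The index page tells the user what the dataset contains.
    with open(html_path, "w") as handle:
        handle.write("<html><body><ul>\n%s\n</ul></body></html>\n"
                     % "\n".join(links))
    return links
```

Downstream tools would then be given the same directory root and walk it themselves, as described in step 4.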



[galaxy-dev] Best way to work with one directory and many files as 1 input

2013-01-29 Thread Pierre Pericard

Hi all,

We've just added some new tools based on R scripts to our local Galaxy 
instance.


Most of these tools need to work at the root of the directory containing
the input files (up to hundreds of XML files) spread among two or more
sub-directories. The directory structure needs to be kept, since the R
tools recursively search for files and use the sub-directory names as
classes.


To solve this problem we added a dummy datatype to our instance so we 
can upload the input directory as a zip file without Galaxy 
decompressing it.


<datatype extension="dummy_zip" type="galaxy.datatypes.data:Data"
          mimetype="application/zip" display_in_upload="true" subclass="true" />


However, since our tools can be run as a workflow and most of them need
this input directory, we have to unzip it with R in the job working
directory for each tool (about 5 times for the entire workflow).
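
For illustration, the unzip step that each tool repeats could look like the sketch below. The tools in question do this in R; Python is used here only to show the step, and the function name extract_input_archive and its arguments are hypothetical.

```python
import os
import zipfile


def extract_input_archive(zip_path, work_dir):
    """Extract zip_path under work_dir, keeping its directory structure.

    Returns the names of the sub-directories found at the top level of
    work_dir after extraction (the "classes" the tools look for).
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(work_dir)
    return sorted(
        name for name in os.listdir(work_dir)
        if os.path.isdir(os.path.join(work_dir, name))
    )
```

Because every tool in the workflow needs the extracted tree, this extraction is paid once per tool, which is the overhead described above.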


Furthermore, this solution doesn't seem very clean if we want to share 
our tools via the ToolShed.


Is there a smart way to handle this kind of input directory that can be
achieved with Galaxy's default datatypes and/or that doesn't require
unzipping a file each time we use a tool?


Is there any update on a change in how zip files are handled
(http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-June/005631.html)?


Thanks in advance for any input,

Pierre




Re: [galaxy-dev] Best way to work with one directory and many files as 1 input

2013-01-29 Thread Pierre Pericard
If I'm not mistaken, Composite Datatypes allow for only one directory,
whereas we need to keep a constant directory structure with 2 or more
sub-directories containing our input files.


We have no way to change these tools' behavior (obviously not
Galaxy-friendly ;-) ) and therefore need to maintain this structure in
the job working directory.


Pierre.



On 29/01/2013 17:47, Peter Cock wrote:

On Tue, Jan 29, 2013 at 4:41 PM, Pierre Pericard
pierre.peric...@sb-roscoff.fr wrote:

Hi all,

We've just added some new tools based on R scripts to our local Galaxy
instance.

Most of these tools need to work at the root of the directory containing the
input files (up to hundreds of XML files) spread among two or more
sub-directories. The directory structure needs to be kept, since the R tools
recursively search for files and use the sub-directory names as classes.

To solve this problem we added a dummy datatype to our instance so we can
upload the input directory as a zip file without Galaxy decompressing it.

Have you looked at a composite datatype instead, where the files are
stored on disk decompressed?

http://wiki.galaxyproject.org/Admin/Datatypes/Composite%20Datatypes

Peter




[galaxy-dev] Variable output number in workflows

2012-11-30 Thread Pierre Pericard

Dear all,

We've been actively developing XML wrappers for new programs in our
instance of Galaxy, and we ran into some problems with tools that produce a
variable number of outputs when they are used in a workflow.


1/ In the first case, the input can be either 1 single file or 2 paired files,
and the program's output file name differs between the two cases. The input
side is handled with a conditional, and following
http://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files we
implemented the outputs as follows:


<outputs>
    <data name="nameSorted.single.bam" format="bam"
          from_work_dir="alignReads/alignReads.nameSorted.bam"
          label="nameSorted single">
        <filter>inputs['paired_or_single'] == 'single'</filter>
    </data>
    <data name="nameSorted.paired.bam" format="bam"
          from_work_dir="alignReads/alignReads.nameSorted.PropMapPairsForRSEM.bam"
          label="nameSorted paired">
        <filter>inputs['paired_or_single'] == 'paired'</filter>
    </data>
</outputs>

However, when using this tool in a workflow, the toolbox always presents
both outputs, whether we choose 1 or 2 inputs. Is there any way to have
only one output in the toolbox, taken from a different file depending on
the input?


2/ The second case is very similar to the one described in
http://dev.list.galaxyproject.org/outputting-different-numbers-of-files-based-on-variables-td4141375.html.
When we input a single file the output is also a single file, and when
we input two paired files the program outputs two files.
Thanks again to the conditional tag, the input problem was taken care
of, but we would like the workflow toolbox to present 1 or 2 outputs
(ideally with all 3 names different) depending on how many files are in
the input.
Is there any way to do so other than making 2 different versions of the
XML or displaying all 3 outputs in the workflow toolbox?


Thanks in advance,


Pierre


--
Pierre Pericard
IE CDD - Projet Peptisan
Service Informatique et Bio-informatique (SIB)
Station Biologique - CNRS-UPMC
Place Georges Teissier, 29680 Roscoff
FRANCE
http://abims.sb-roscoff.fr/
