Re: [galaxy-dev] uploading multiple files into one dataset

2012-06-01 Thread Jorrit Boekel

Dear list,

Our lab has been outputting data in multiple files that we currently 
merge (in galaxy) by tarring them. This works fine with the parallel 
processing that Galaxy offers.


The problem (see also below) was to find a user-friendly way to avoid 
creating 50-200 datasets in Galaxy and instead create a single dataset 
containing all the merged files. I do not know if this functionality is 
something that people want to use, or if it goes against Galaxy design 
principles, but I have implemented it for our lab.


I have enabled an input type=file multiple field (which may not work in 
IE) in a separate FileField subclass. The list of files that is 
subsequently uploaded is persisted and passed to the upload tool, where 
the files are merged using the merge method of the datatype specified in 
the upload tool. File type detection is done by sniffers, as the file 
type is set to auto.
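
As an illustration only (this is not the code from the implementation described above), the merge step could be sketched as follows. It assumes each datatype exposes a merge method that takes the uploaded paths and an output path. The names PlainText, TarArchive, DATATYPES and merge_uploads are hypothetical, and sniffing is reduced to a lookup by datatype name.

```python
import shutil
import tarfile
from pathlib import Path


class PlainText:
    """Hypothetical datatype: merge by concatenating file contents."""

    @staticmethod
    def merge(paths, output_path):
        with open(output_path, "wb") as out:
            for path in paths:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)


class TarArchive:
    """Hypothetical datatype: merge by packing the files into one tar."""

    @staticmethod
    def merge(paths, output_path):
        with tarfile.open(output_path, "w") as tar:
            for path in paths:
                # Store only the file name, not the upload directory.
                tar.add(path, arcname=Path(path).name)


# Assumed registry mapping a datatype name to its class.
DATATYPES = {"txt": PlainText, "tar": TarArchive}


def merge_uploads(paths, datatype, output_path):
    """Merge all uploaded files into a single dataset file."""
    DATATYPES[datatype].merge(paths, output_path)
    return output_path
```

Calling merge_uploads(paths, "tar", "merged.tar") would produce one archive holding every uploaded file, while "txt" would produce one concatenated file.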


I don't have a very good view of the demand for this sort of function, 
but if anyone else would like to use/modify it, I can fork and issue a 
pull request.


cheers,
jorrit



On 03/06/2012 06:56 PM, Nate Coraor wrote:
> On Feb 29, 2012, at 11:34 AM, Jorrit Boekel wrote:
>
>> Dear list,
>>
>> Our lab's proteomics data is frequently output into 50 files
>> containing different fractions of proteins. The files are stored
>> locally and are not present on the Galaxy server. We plan to merge
>> these files somehow (inside Galaxy) and split them into tasks so they
>> can be run on the cluster. We would either merge/split the files by
>> concatenation, or untar/tar files at every job, depending on filetype
>> and tool. No problems so far.
>>
>> However, I have been looking for a way to upload 50 files
>> simultaneously to Galaxy and convert them to one dataset, and this
>> does not seem to be supported. Before starting on a hack to make this
>> work, which doesn't seem trivial to me, I'd like to know if I should
>> use libraries instead. From what I've seen, libraries are not treated
>> as datasets in Galaxy but rather contain datasets. If there were a
>> "tar all sets in library and import to history" option I'd be using
>> that, but I've only encountered "tar/zip sets and download locally",
>> which would be a bit of a workaround.
>
> Hi Jorrit,
>
> It's not possible to do this all in one step, but you can definitely
> upload them all simultaneously and then concatenate them using the
> concatenate tool (or write a simple tool to tar them).
>
> --nate
>
>> I haven't found much on this subject in the mailing list. Has this
>> functionality been requested before?
>>
>> cheers,
>> jorrit boekel
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/



Re: [galaxy-dev] uploading multiple files into one dataset

2012-03-06 Thread Nate Coraor

On Feb 29, 2012, at 11:34 AM, Jorrit Boekel wrote:

> Dear list,
>
> Our lab's proteomics data is frequently output into 50 files
> containing different fractions of proteins. The files are stored
> locally and are not present on the Galaxy server. We plan to merge
> these files somehow (inside Galaxy) and split them into tasks so they
> can be run on the cluster. We would either merge/split the files by
> concatenation, or untar/tar files at every job, depending on filetype
> and tool. No problems so far.
>
> However, I have been looking for a way to upload 50 files
> simultaneously to Galaxy and convert them to one dataset, and this
> does not seem to be supported. Before starting on a hack to make this
> work, which doesn't seem trivial to me, I'd like to know if I should
> use libraries instead. From what I've seen, libraries are not treated
> as datasets in Galaxy but rather contain datasets. If there were a
> "tar all sets in library and import to history" option I'd be using
> that, but I've only encountered "tar/zip sets and download locally",
> which would be a bit of a workaround.

Hi Jorrit,

It's not possible to do this all in one step, but you can definitely upload 
them all simultaneously and then concatenate them using the concatenate tool 
(or write a simple tool to tar them).
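
A "simple tool to tar them" could be little more than a small script wrapped as a Galaxy tool. The following is only a sketch under assumptions: the function names, the --output option, and the index prefix on member names are all hypothetical, and the tool wrapper that would call it is not shown.

```python
import argparse
import tarfile
from pathlib import Path


def tar_datasets(inputs, output):
    """Pack the given dataset files into one uncompressed tar archive."""
    with tarfile.open(output, "w") as tar:
        for index, path in enumerate(inputs):
            # Prefix with the position so files sharing a base name do not clash.
            tar.add(path, arcname=f"{index:03d}_{Path(path).name}")
    return output


def main(argv=None):
    """Parse the command line (hypothetical options) and build the archive."""
    parser = argparse.ArgumentParser(description="Tar several datasets into one.")
    parser.add_argument("--output", required=True, help="path of the tar to write")
    parser.add_argument("inputs", nargs="+", help="dataset files to pack")
    args = parser.parse_args(argv)
    tar_datasets(args.inputs, args.output)

# A real script would call main() under an `if __name__ == "__main__":` guard.
```

For example, main(["--output", "merged.tar", "a.txt", "b.txt"]) would write merged.tar containing 000_a.txt and 001_b.txt.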

--nate

 
> I haven't found much on this subject in the mailing list. Has this
> functionality been requested before?
>
> cheers,
> jorrit boekel


[galaxy-dev] uploading multiple files into one dataset

2012-02-29 Thread Jorrit Boekel

Dear list,

Our lab's proteomics data is frequently output into 50 files 
containing different fractions of proteins. The files are stored locally 
and are not present on the Galaxy server. We plan to merge these files 
somehow (inside Galaxy) and split them into tasks so they can be run 
on the cluster. We would either merge/split the files by concatenation, 
or untar/tar files at every job, depending on filetype and tool. No 
problems so far.
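
To illustrate the untar/tar variant, here is a rough sketch of splitting a tarred dataset into one directory per task and packing the per-task files back into a single archive. The function names and the task_N directory layout are assumptions made for the example, not part of Galaxy.

```python
import shutil
import tarfile
from pathlib import Path


def split_tar(archive, task_root):
    """Extract each file in `archive` into its own task directory."""
    task_dirs = []
    with tarfile.open(archive) as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
        for index, member in enumerate(members):
            task_dir = Path(task_root) / f"task_{index}"
            task_dir.mkdir(parents=True)
            # Copy the member's bytes out under its base name only.
            target = task_dir / Path(member.name).name
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            task_dirs.append(task_dir)
    return task_dirs


def merge_tasks(task_dirs, output):
    """Pack every file found in the task directories into one tar archive."""
    with tarfile.open(output, "w") as tar:
        for task_dir in task_dirs:
            for path in sorted(Path(task_dir).iterdir()):
                tar.add(path, arcname=path.name)
    return output
```

Each task directory can then be handed to a separate cluster job, and merge_tasks collects the results once all jobs have finished.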


However, I have been looking for a way to upload 50 files 
simultaneously to Galaxy and convert them to one dataset, and this does 
not seem to be supported. Before starting on a hack to make this work, 
which doesn't seem trivial to me, I'd like to know if I should use 
libraries instead. From what I've seen, libraries are not treated as 
datasets in Galaxy but rather contain datasets. If there were a "tar all 
sets in library and import to history" option I'd be using that, but 
I've only encountered "tar/zip sets and download locally", which would 
be a bit of a workaround.


I haven't found much on this subject in the mailing list. Has this 
functionality been requested before?


cheers,
jorrit boekel