Re: [galaxy-dev] EST download from any source?

2011-09-27 Thread Nate Coraor
Timothy Wu wrote:
> On Thu, Sep 15, 2011 at 5:32 PM, Timothy Wu <2hug...@gmail.com> wrote:
> 
> > On Thu, Sep 15, 2011 at 4:58 PM, Peter Cock wrote:
> >
> >> On Thu, Sep 15, 2011 at 9:32 AM, Timothy Wu <2hug...@gmail.com> wrote:
> >> >
> >> > I think I need some kind of "data source" implementation that allow user
> >> to
> >> > obtain the data themselves. However with the current tool XML
> >> definition, I
> >> > don't know how to have a FTP download tool to download EST data from
> >> NCBI to
> >> > Galaxy directly.
> >>
> >> Perhaps I have misunderstood you, but I'd just use the provided
> >> "Upload Data" tool, and paste in the FTP URL for the file, e.g.
> >> an NCBI FTP URL.
> >>
> >
> > I wasn't aware that the Upload data tool could take a FTP URL, so thanks
> > for letting me know.
> >
> > Unfortunately that doesn't take a wild card.
> >
> > I need to have the path specification like this
> > "ftp://ftp.ncbi.nih.gov/genbank/gbest*.seq.gz" at the minimum.
> >
> > Actually my tool is more versatile (though I don't need it for this
> > particular application).
> >
> > I could specify
> >
> > ftp://ftp.ncbi.nih.gov/genomes/*/*/NC_*.fna
> >
> > and grab all the fasta files for all chromosome of all species under the
> > genomes directory. I thought it would be a nice tool to have in my galaxy
> > arsenal.
> >
> >
> It looks to me like a good idea to check out how the upload tool is
> implemented. But it seems a bit complex. I don't understand why it does not
> have the  tag; it also has this action tag
> <action module="galaxy.tools.actions.upload" class="UploadToolAction"/>
> which is not explained in the "Tool Config Syntax".
> 
> Any documentations or tutorials out there that would help me understand how
> to implement this?
> 
> Timothy

Hi Timothy,

The default action taken when executing a tool is to call the execute()
method in lib/galaxy/tools/actions/__init__.py in DefaultToolAction.
This method prepares the tool and creates a job to run the tool.

The upload tool is unlike other tools and can't use this default method,
so it instead uses an action in upload.py in the same directory.

This would probably be easiest as a new tool, however, since it's a very
specialized case.
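
As a rough illustration of what such a standalone tool script might do,
here is a minimal sketch that handles a wildcard in the file name only (as
in the gbest*.seq.gz case). It assumes Python 3 and its standard library;
the function names split_ftp_pattern, matching_names and download_matching
are made up for this example and are not part of Galaxy.

```python
import fnmatch
import os
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_pattern(url):
    """Split an FTP URL into (host, directory, file name pattern)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError("not an FTP URL: %r" % url)
    directory, pattern = parts.path.rsplit("/", 1)
    return parts.hostname, directory or "/", pattern


def matching_names(names, pattern):
    """Return the sorted base names that match a shell-style pattern."""
    # nlst() may return bare names or full paths depending on the server,
    # so reduce every entry to its base name before matching.
    base_names = {name.rsplit("/", 1)[-1] for name in names}
    return fnmatch.filter(sorted(base_names), pattern)


def download_matching(url, out_dir):
    """Download every file in one remote directory that matches the pattern."""
    host, directory, pattern = split_ftp_pattern(url)
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(directory)
        for name in matching_names(ftp.nlst(), pattern):
            target = os.path.join(out_dir, name)
            with open(target, "wb") as handle:
                ftp.retrbinary("RETR " + name, handle.write)
            saved.append(target)
    return saved
```

Something like download_matching("ftp://ftp.ncbi.nih.gov/genbank/gbest*.seq.gz",
"est_downloads") would then be the call a tool wrapper runs; how the saved
files are registered as Galaxy outputs is a separate question that this
sketch does not address.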

--nate

> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> 
>   http://lists.bx.psu.edu/



Re: [galaxy-dev] EST download from any source?

2011-09-15 Thread Timothy Wu
On Thu, Sep 15, 2011 at 5:32 PM, Timothy Wu <2hug...@gmail.com> wrote:

> On Thu, Sep 15, 2011 at 4:58 PM, Peter Cock wrote:
>
>> On Thu, Sep 15, 2011 at 9:32 AM, Timothy Wu <2hug...@gmail.com> wrote:
>> >
>> > I think I need some kind of "data source" implementation that allow user
>> to
>> > obtain the data themselves. However with the current tool XML
>> definition, I
>> > don't know how to have a FTP download tool to download EST data from
>> NCBI to
>> > Galaxy directly.
>>
>> Perhaps I have misunderstood you, but I'd just use the provided
>> "Upload Data" tool, and paste in the FTP URL for the file, e.g.
>> an NCBI FTP URL.
>>
>
> I wasn't aware that the Upload data tool could take a FTP URL, so thanks
> for letting me know.
>
> Unfortunately that doesn't take a wild card.
>
> I need to have the path specification like this
> "ftp://ftp.ncbi.nih.gov/genbank/gbest*.seq.gz" at the minimum.
>
> Actually my tool is more versatile (though I don't need it for this
> particular application).
>
> I could specify
>
> ftp://ftp.ncbi.nih.gov/genomes/*/*/NC_*.fna
>
> and grab all the fasta files for all chromosome of all species under the
> genomes directory. I thought it would be a nice tool to have in my galaxy
> arsenal.
>
>
It looks to me like a good idea to check out how the upload tool is
implemented. But it seems a bit complex. I don't understand why it does not
have the  tag; it also has this action tag
<action module="galaxy.tools.actions.upload" class="UploadToolAction"/>
which is not explained in the "Tool Config Syntax".

Any documentations or tutorials out there that would help me understand how
to implement this?

Timothy

Re: [galaxy-dev] EST download from any source?

2011-09-15 Thread Peter Cock
On Thu, Sep 15, 2011 at 10:32 AM, Timothy Wu <2hug...@gmail.com> wrote:
> On Thu, Sep 15, 2011 at 4:58 PM, Peter Cock wrote:
>>
>> Perhaps I have misunderstood you, but I'd just use the provided
>> "Upload Data" tool, and paste in the FTP URL for the file, e.g.
>> an NCBI FTP URL.
>
> I wasn't aware that the Upload data tool could take a FTP URL, so thanks for
> letting me know.
>
> Unfortunately that doesn't take a wild card.
>
> I need to have the path specification like this
> "ftp://ftp.ncbi.nih.gov/genbank/gbest*.seq.gz" at the minimum.
>
> Actually my tool is more versatile (though I don't need it for this
> particular application).
>
> I could specify
>
> ftp://ftp.ncbi.nih.gov/genomes/*/*/NC_*.fna
>
> and grab all the fasta files for all chromosome of all species under the
> genomes directory. I thought it would be a nice tool to have in my galaxy
> arsenal.
>
> Timothy

That volume of data shouldn't really be uploaded into individual
Galaxy users' histories (not unless you have a Galaxy setup
with an unusually high disk quota per user - lucky you).

This seems ideal for the Galaxy data library functionality, where
the Galaxy admin loads the big data sets and makes them
available to all the Galaxy users (or a subset using access
controls). For the user's history the files are just linked to - so
there is only one copy on disk.
http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Libraries

However, we'd also like easy access to (some of) the files on
ftp://ftp.ncbi.nih.gov/genomes/ so a new "NCBI Genomes
FTP-site Data Source Tool" as part of Galaxy would be nice
(like the existing UCSC data source etc).

Peter


Re: [galaxy-dev] EST download from any source?

2011-09-15 Thread Timothy Wu
On Thu, Sep 15, 2011 at 4:58 PM, Peter Cock wrote:

> On Thu, Sep 15, 2011 at 9:32 AM, Timothy Wu <2hug...@gmail.com> wrote:
> >
> > I think I need some kind of "data source" implementation that allow user
> to
> > obtain the data themselves. However with the current tool XML definition,
> I
> > don't know how to have a FTP download tool to download EST data from NCBI
> to
> > Galaxy directly.
>
> Perhaps I have misunderstood you, but I'd just use the provided
> "Upload Data" tool, and paste in the FTP URL for the file, e.g.
> an NCBI FTP URL.
>

I wasn't aware that the Upload Data tool could take an FTP URL, so thanks for
letting me know.

Unfortunately that doesn't take a wild card.

I need to have the path specification like this
"ftp://ftp.ncbi.nih.gov/genbank/gbest*.seq.gz" at the minimum.

Actually my tool is more versatile (though I don't need it for this
particular application).

I could specify

ftp://ftp.ncbi.nih.gov/genomes/*/*/NC_*.fna

and grab all the FASTA files for all chromosomes of all species under the
genomes directory. I thought it would be a nice tool to have in my Galaxy
arsenal.
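
To make the multi-level wildcard idea concrete, here is a minimal sketch of
how a pattern such as /genomes/*/*/NC_*.fna could be expanded one directory
level at a time. It assumes Python 3; expand_pattern and its list_dir
callback are names invented for this example and are not my actual tool.
The directory listing is passed in as a function so the matching logic
stays independent of the FTP library.

```python
import fnmatch


def expand_pattern(list_dir, pattern):
    """Expand a slash-separated wildcard path, one directory level at a time.

    list_dir(path) must return the entry names inside `path`, and an empty
    list for anything that cannot be listed as a directory.
    """
    segments = [s for s in pattern.split("/") if s]
    paths = [""]
    for segment in segments:
        next_paths = []
        for parent in paths:
            if any(ch in segment for ch in "*?["):
                # Wildcard segment: list the parent and keep the matches.
                names = fnmatch.filter(sorted(list_dir(parent or "/")), segment)
            else:
                # Literal segment: descend without listing (existence is
                # not checked here).
                names = [segment]
            next_paths.extend(parent + "/" + name for name in names)
        paths = next_paths
    return paths
```

With ftplib, list_dir could be a small wrapper around FTP.nlst() that
strips any leading directory from the returned names and returns an empty
list when the server refuses to list a path. How servers respond to listing
a plain file varies, so that wrapper would need testing against the NCBI
server.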

Timothy

Re: [galaxy-dev] EST download from any source?

2011-09-15 Thread Peter Cock
On Thu, Sep 15, 2011 at 9:32 AM, Timothy Wu <2hug...@gmail.com> wrote:
>
> I think I need some kind of "data source" implementation that allow user to
> obtain the data themselves. However with the current tool XML definition, I
> don't know how to have a FTP download tool to download EST data from NCBI to
> Galaxy directly.

Perhaps I have misunderstood you, but I'd just use the provided
"Upload Data" tool, and paste in the FTP URL for the file, e.g.
an NCBI FTP URL.

Peter


Re: [galaxy-dev] EST download from any source?

2011-09-15 Thread Timothy Wu
On Wed, Sep 14, 2011 at 5:21 PM, Hans-Rudolf Hotz wrote:

>
>
> On 09/14/2011 10:39 AM, Timothy Wu wrote:
>
> //
>
>
>> Alternatively, I can just ask user to download from NCBI ftp themselves,
>> decompress them, and upload it to galaxy.
>>
>> What's the best approach here?
>>
>
> How about: you download the data once, and then offer it as a 'data
> library' to your users. This way you avoid data duplication.
>
>
I do not know how to prepare a "data library". However, I think this is less
than optimal as the data itself may be updated. And I don't think data
duplication is really a problem if the users install their own version of
Galaxy.

I think I need some kind of "data source" implementation that allows users to
obtain the data themselves. However, with the current tool XML definition, I
don't know how to have an FTP download tool download EST data from NCBI into
Galaxy directly.

Oh well, I guess I'll resort to users uploading zipped EST GenBank files
themselves via FTP if all else fails. Or I'll just have the FTP tool also
parse the downloaded GenBank files and merge all the data into a single
file. But this really limits the flexibility of the FTP tool, which could
be more generic.
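
For the merge step on its own, a minimal sketch (assuming Python 3; the
name merge_gzipped is invented for this example) could decompress each
downloaded .gz file in turn and append it to one output file. It does no
GenBank parsing at all. GenBank flat-file records end with a "//" line, so
plain concatenation keeps the records separate, but any per-file header
lines are carried over unchanged.

```python
import gzip
import shutil


def merge_gzipped(gz_paths, merged_path):
    """Decompress each .gz file in order and append it to one output file."""
    with open(merged_path, "wb") as merged:
        for gz_path in gz_paths:
            with gzip.open(gz_path, "rb") as compressed:
                # Stream in chunks so large EST files never sit in memory.
                shutil.copyfileobj(compressed, merged)
    return merged_path
```

Producing a single merged file also avoids the question of how many output
files the tool has, at the cost of making the tool specific to files that
can be concatenated safely.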

Timothy

Re: [galaxy-dev] EST download from any source?

2011-09-14 Thread Hans-Rudolf Hotz



On 09/14/2011 10:39 AM, Timothy Wu wrote:

//

> Alternatively, I can just ask user to download from NCBI ftp themselves,
> decompress them, and upload it to galaxy.
>
> What's the best approach here?

How about: you download the data once, and then offer it as a 'data
library' to your users. This way you avoid data duplication.

> And I noticed that file types does not include genbank types nor gzip
> types. Is there some generic type I could use? Just Data class?

We treat GenBank files as "txt". This works fine with the EMBOSS tools.

Regards, Hans

> Timothy




[galaxy-dev] EST download from any source?

2011-09-14 Thread Timothy Wu
Hi,

I'm trying to wrap my own tool in Galaxy. The input to my tool includes
a set of ESTs (such as the entire human collection). I tried using the UCSC
Genome Browser, but it doesn't seem to let me download the whole human
collection due to the size of the data.

I tried to implement my own FTP client and wrap it up in Galaxy. I
intend to have the FTP client download data from NCBI's FTP server directly,
and have the downloaded files as output files to feed back into Galaxy. I
intend to make the FTP client somewhat generic, so as not to restrict the
type of files. In my case, though, I would be downloading gzipped GenBank
files.

But Galaxy's support for multiple output files tripped me up. I do
not know exactly what to do, since it looks as if Galaxy requires a strict
naming convention for the outputs, according to
http://gmod.827538.n3.nabble.com/Multiple-output-not-known-until-tool-run-td1734071.html
(in my case, the number of files obviously would not be known until run
time).

I guess it doesn't really matter if I take those files, whatever the
naming convention is, and feed them to a gzip decompressor (for which I am
planning a simple wrapper, just enough to handle my files). Then it should
all work out fine.

Alternatively, I can just ask users to download the files from the NCBI FTP
site themselves, decompress them, and upload them to Galaxy.

What's the best approach here?

And I noticed that the file types do not include GenBank or gzip types.
Is there some generic type I could use? Just the Data class?

Timothy