Re: [galaxy-dev] EST download from any source?
Timothy Wu wrote:
> It looks to me like a good idea to check out how the upload tool is
> implemented. But it seems a bit complex. I don't understand why it does not
> have the tag, it also has this action tab
> <action module="galaxy.tools.actions.upload" class="UploadToolAction"/>
> which is not explained in the "Tool Config Syntax".
>
> Any documentation or tutorials out there that would help me understand how
> to implement this?
>
> Timothy

Hi Timothy,

The default action taken when executing a tool is to call the execute() method of DefaultToolAction in lib/galaxy/tools/actions/__init__.py. This method prepares the tool and creates a job to run the tool. The upload tool is unlike other tools and can't use this default method, so it instead uses an action in upload.py in the same directory.

This would probably be easiest as a new tool, however, since it's a very specialized case.

--nate

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/
Re: [galaxy-dev] EST download from any source?
On Thu, Sep 15, 2011 at 5:32 PM, Timothy Wu <2hug...@gmail.com> wrote:
> I wasn't aware that the Upload data tool could take an FTP URL, so thanks
> for letting me know.
>
> Unfortunately that doesn't take a wild card.
>
> I need to have the path specification like this
> "ftp://ftp.ncbi.nih.gov/genbank/gbest*.seq.gz" at the minimum.
>
> Actually my tool is more versatile (though I don't need it for this
> particular application).
>
> I could specify
>
> ftp://ftp.ncbi.nih.gov/genomes/*/*/NC_*.fna
>
> and grab all the fasta files for all chromosomes of all species under the
> genomes directory. I thought it would be a nice tool to have in my galaxy
> arsenal.

It looks to me like a good idea to check out how the upload tool is implemented. But it seems a bit complex. I don't understand why it does not have the tag, it also has this action tab <action module="galaxy.tools.actions.upload" class="UploadToolAction"/> which is not explained in the "Tool Config Syntax".

Any documentation or tutorials out there that would help me understand how to implement this?

Timothy
Re: [galaxy-dev] EST download from any source?
On Thu, Sep 15, 2011 at 10:32 AM, Timothy Wu <2hug...@gmail.com> wrote:
> On Thu, Sep 15, 2011 at 4:58 PM, Peter Cock wrote:
>>
>> Perhaps I have misunderstood you, but I'd just use the provided
>> "Upload Data" tool, and paste in the FTP URL for the file, e.g.
>> an NCBI FTP URL.
>
> I wasn't aware that the Upload data tool could take an FTP URL, so thanks for
> letting me know.
>
> Unfortunately that doesn't take a wild card.
>
> I need to have the path specification like this
> "ftp://ftp.ncbi.nih.gov/genbank/gbest*.seq.gz" at the minimum.
>
> Actually my tool is more versatile (though I don't need it for this
> particular application).
>
> I could specify
>
> ftp://ftp.ncbi.nih.gov/genomes/*/*/NC_*.fna
>
> and grab all the fasta files for all chromosomes of all species under the
> genomes directory. I thought it would be a nice tool to have in my galaxy
> arsenal.
>
> Timothy

That volume of data shouldn't really be uploaded into individual Galaxy users' histories (not unless you have a Galaxy setup with an unusually high disk quota per user - lucky you).

This seems ideal for the Galaxy data library functionality, where the Galaxy admin loads the big data sets and makes them available to all the Galaxy users (or a subset using access controls). For the user's history the files are just linked to - so there is only one copy on disk.

http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Libraries

However, we'd also like easy access to (some of) the files on ftp://ftp.ncbi.nih.gov/genomes/ so a new "NCBI Genomes FTP-site Data Source Tool" as part of Galaxy would be nice (like the existing UCSC data source etc).

Peter
Re: [galaxy-dev] EST download from any source?
On Thu, Sep 15, 2011 at 4:58 PM, Peter Cock wrote:
> On Thu, Sep 15, 2011 at 9:32 AM, Timothy Wu <2hug...@gmail.com> wrote:
> >
> > I think I need some kind of "data source" implementation that allows users
> > to obtain the data themselves. However with the current tool XML
> > definition, I don't know how to have an FTP download tool to download EST
> > data from NCBI to Galaxy directly.
>
> Perhaps I have misunderstood you, but I'd just use the provided
> "Upload Data" tool, and paste in the FTP URL for the file, e.g.
> an NCBI FTP URL.

I wasn't aware that the Upload data tool could take an FTP URL, so thanks for letting me know.

Unfortunately that doesn't take a wild card.

I need to have the path specification like this "ftp://ftp.ncbi.nih.gov/genbank/gbest*.seq.gz" at the minimum.

Actually my tool is more versatile (though I don't need it for this particular application).

I could specify

ftp://ftp.ncbi.nih.gov/genomes/*/*/NC_*.fna

and grab all the fasta files for all chromosomes of all species under the genomes directory. I thought it would be a nice tool to have in my galaxy arsenal.

Timothy
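For illustration only, the wildcard expansion described in this message could be sketched in Python with the standard ftplib and fnmatch modules. This is not code from the thread or from Galaxy: the names expand_glob, ftp_lister and download_matches are made up for the example, an anonymous login is assumed, and literal path segments are passed through without checking that they exist.

```python
import fnmatch
import os
import posixpath
from ftplib import FTP, error_perm
from urllib.parse import urlparse


def expand_glob(list_dir, pattern):
    """Expand a '/'-separated pattern such as '/genomes/*/*/NC_*.fna'.

    list_dir(path) must return the entry names directly under `path`.
    Each path segment is matched on its own, so '*' never crosses a '/'.
    """
    paths = [""]
    for segment in pattern.strip("/").split("/"):
        next_paths = []
        for base in paths:
            if not any(ch in segment for ch in "*?["):
                # Literal segment: keep it as-is, no listing needed.
                next_paths.append(f"{base}/{segment}")
                continue
            for name in sorted(list_dir(base or "/")):
                if fnmatch.fnmatchcase(name, segment):
                    next_paths.append(f"{base}/{name}")
        paths = next_paths
    return paths


def ftp_lister(ftp):
    """Adapt an open ftplib.FTP connection to the list_dir interface."""
    def list_dir(path):
        try:
            names = ftp.nlst(path)
        except error_perm:
            return []  # not a directory, empty, or not listable
        # Some servers return full paths, others bare names.
        return [posixpath.basename(n.rstrip("/")) for n in names]
    return list_dir


def download_matches(url, dest_dir):
    """Download every file matching a wildcard FTP URL into dest_dir."""
    parts = urlparse(url)
    with FTP(parts.hostname) as ftp:
        ftp.login()  # assumption: the server allows anonymous access
        for remote in expand_glob(ftp_lister(ftp), parts.path):
            local = os.path.join(dest_dir, posixpath.basename(remote))
            with open(local, "wb") as out:
                ftp.retrbinary(f"RETR {remote}", out.write)
            yield local
```

Keeping the listing behind a plain function makes the matching logic testable without a network connection. Something like list(download_matches("ftp://ftp.ncbi.nih.gov/genbank/gbest*.seq.gz", "downloads")) would then fetch the matching files, provided the directory "downloads" already exists.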
Re: [galaxy-dev] EST download from any source?
On Thu, Sep 15, 2011 at 9:32 AM, Timothy Wu <2hug...@gmail.com> wrote:
>
> I think I need some kind of "data source" implementation that allows users to
> obtain the data themselves. However with the current tool XML definition, I
> don't know how to have an FTP download tool to download EST data from NCBI to
> Galaxy directly.

Perhaps I have misunderstood you, but I'd just use the provided "Upload Data" tool, and paste in the FTP URL for the file, e.g. an NCBI FTP URL.

Peter
Re: [galaxy-dev] EST download from any source?
On Wed, Sep 14, 2011 at 5:21 PM, Hans-Rudolf Hotz wrote:
> On 09/14/2011 10:39 AM, Timothy Wu wrote:
>
> //
>
>> Alternatively, I can just ask user to download from NCBI ftp themselves,
>> decompress them, and upload it to galaxy.
>>
>> What's the best approach here?
>
> How about: you download the data once, and then offer it as a 'data
> library' to your users. This way you avoid data duplication.

I do not know how to prepare a "data library". However, I think this is less than optimal as the data itself may be updated. And I don't think data duplication is really a problem if the users install their own version of Galaxy.

I think I need some kind of "data source" implementation that allows users to obtain the data themselves. However with the current tool XML definition, I don't know how to have an FTP download tool to download EST data from NCBI to Galaxy directly.

Oh well, I guess I'll resort to users uploading zipped EST GenBank files themselves via FTP to Galaxy if all else fails. Or I'll just have the FTP tool also parse the downloaded GenBank files and merge all the data into a single file. But this really limits the flexibility of the FTP tool, which could be more generic.

Timothy
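As a minimal sketch of the "merge everything into a single file" option mentioned in this message, the downloaded .gz files could be decompressed and concatenated with Python's standard gzip and shutil modules. The function name merge_gzipped is hypothetical, and the sketch only concatenates bytes: it does not parse GenBank records, and any header text inside the individual files is copied through unchanged.

```python
import gzip
import shutil


def merge_gzipped(gz_paths, merged_path):
    """Decompress each .gz file in order and append it to merged_path.

    Producing one output file means the number of downloaded inputs no
    longer has to be known before the tool runs.
    """
    with open(merged_path, "wb") as merged:
        for gz_path in gz_paths:
            with gzip.open(gz_path, "rb") as part:
                # Stream in chunks so large files are never held in memory.
                shutil.copyfileobj(part, merged)
    return merged_path
```

The trade-off is the one raised in the message: a tool that always merges its downloads is simpler to wire up as a single output, but less generic than one that hands back each file separately.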
Re: [galaxy-dev] EST download from any source?
On 09/14/2011 10:39 AM, Timothy Wu wrote:

//

> Alternatively, I can just ask user to download from NCBI ftp themselves,
> decompress them, and upload it to galaxy.
>
> What's the best approach here?

How about: you download the data once, and then offer it as a 'data library' to your users. This way you avoid data duplication.

> And I noticed that file types does not include genbank types nor gzip
> types. Is there some generic type I could use? Just Data class?

We treat GenBank files as "txt". This works fine with the EMBOSS tools.

Regards, Hans

> Timothy
[galaxy-dev] EST download from any source?
Hi,

I'm trying to wrap my own tool in Galaxy. The input to my tool includes a set of ESTs (such as the entire human collection). I tried using the UCSC genome browser, but it doesn't seem to let me download the whole human collection due to the size of the data.

I tried to implement my own FTP client and wrap that in Galaxy. I intend to have the FTP client download data from NCBI's FTP server directly, and have the downloaded files as output files to feed back into Galaxy. I intend to make the FTP client somewhat generic, so as not to enforce the type of files, though in my case I would be downloading gzipped GenBank files.

But Galaxy's support for multiple output files kind of tripped me up. I do not know exactly what to do, since it looks as if Galaxy requires a strict naming convention for the outputs, according to http://gmod.827538.n3.nabble.com/Multiple-output-not-known-until-tool-run-td1734071.html (in my case the number of files obviously would not be known until run time). I guess it doesn't really matter if I send those files, whatever the naming convention is, and feed them to a gzip decompressor (for which I am planning to write a simple wrapper, just to be able to handle my stuff). Then it should all work out fine.

Alternatively, I can just ask users to download the files from the NCBI FTP site themselves, decompress them, and upload them to Galaxy.

What's the best approach here?

And I noticed that the file types do not include GenBank or gzip types. Is there some generic type I could use? Just the Data class?

Timothy