Re: [galaxy-dev] Concept for a Galaxy Versioned Fasta Data Retrieval Tool

2014-09-02 Thread Michael R. Crusoe
I would ask that such a tool be designed to work well in a stand-alone mode
outside of Galaxy.

Cheers,


On Sat, Aug 23, 2014 at 4:24 AM, Dooley, Damion 
wrote:

> We are about to implement a FASTA database (file) versioning system as a
> Galaxy tool. Before we roll ahead with the prototype implementation, I
> wanted to get feedback from interested people. The versioning system aims
> to:
>
> * Enable reproducible research: to recreate a search result from a certain
> point in time, we need versioning so that search and mapping tools can look
> at sequence reference databases as they were on a particular past date.
> This recall can also explain the difference between what was known in the
> past and what is known now.
>
> * Reduce hard drive space. Some databases are too big to keep N copies
> around; e.g., 5 years of 16S, updated monthly, is, say, 670 MB + 668 MB +
> 665 MB + ... Yet occasionally we want to access past archives fairly
> quickly.
>
> * Integrate database versioning into Galaxy without adding a lot of
> complexity.
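As a rough illustration of the space argument, the arithmetic can be written out in Python. The figures are assumptions: 60 monthly snapshots, each taken as no larger than the latest 670 MB release, and a master file of about 1.1x the latest version, as estimated later in the proposal.

```python
# Rough storage comparison; every figure here is an illustrative assumption.
LATEST_MB = 670          # size of the most recent monthly 16S snapshot
MONTHS = 5 * 12          # five years of monthly updates
MASTER_OVERHEAD = 1.1    # master file size relative to the latest version

# Upper bound for keeping every snapshot as a separate full copy.
full_copies_mb = MONTHS * LATEST_MB
# One master file holding all versions as insert/delete transactions.
master_file_mb = MASTER_OVERHEAD * LATEST_MB

print(f"{MONTHS} full copies: about {full_copies_mb / 1000:.1f} GB")
print(f"master file: about {master_file_mb:.0f} MB")
print(f"space saved: about {full_copies_mb / master_file_mb:.0f}x")
```

Under these assumptions the full copies come to roughly 40 GB, against well under 1 GB for the master file.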
>
> A bonus would be to enable the efficient sharing of versioned databases
> between computers/servers.
>
> The solution we think would work centres around a "Versioned Data
> Retrieval" tool (draft image attached) that would work as follows:
>
> 1) The user selects a database from the list provided under "Shared Data > Data
> Libraries > Versioned Data".
>   - Each database has a master file that keeps its various versions as a
> list of time-stamped insert/delete transactions of key (FASTA ID) / value
> (description and sequence) pairs.
>   - Each master file is managed outside of Galaxy by a process triggered
> on regular FASTA file imports from data sources such as NCBI or other niche
> sources.
>   - Given the nature of archived FASTA sequence updates, we expect the
> master file to be only about 1.1x the size of the latest version
> (uncompressed).
> 2) The user enters the date or version ID to retrieve (the input is validated).
> 3) If a cached version of that database exists, it is linked into the user's
> history.
> 4) Otherwise, that version is created, placed in the cache, and linked
> into the history.
>   - The cached version itself then shows up as linked data under a Data
> Library > Versioned Data subfolder.
> 5) The user can select preconfigured workflow(s) to execute on the retrieved
> FASTA file to regenerate any database products they need.
>   - Workflow output data would also be cached in the same way as the FASTA
> data: by linking the Galaxy Data Library to it.
>   - Workflow execution will be skipped if the end data already exists in the
> cache.
>   - Workflows can range from simple makeblastdb or bowtie-build commands to
> more specific ones that include dustmasker, etc.
>
> Does this sound attractive?
>
> We hope this design can handle FASTA databases from 12 MB up to, e.g.,
> 200 GB (that scale probably requires running makeblastdb in parallel).
>
> Preliminary work suggests this project is doable via the Galaxy API
> without customizing Galaxy. Does that sound right?!
>
> Feedback really appreciated!
>
> Regards,
>
> Damion Dooley
>
> Hsiao lab, BC Public Health Microbiology & Reference Laboratory, BC Centre
> for Disease Control
> 655 West 12th Avenue, Vancouver, British Columbia, V5Z 4R4 Canada
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>   http://lists.bx.psu.edu/
>
> To search Galaxy mailing lists use the unified search at:
>   http://galaxyproject.org/search/mailinglists/
>



-- 
Michael R. Crusoe:  Programmer & Bioinformatician   mcru...@msu.edu
 @ the Genomics, Evolution, and Development lab; Michigan State U
http://ged.msu.edu/ http://orcid.org/-0002-2961-9670 @biocrusoe
<http://twitter.com/biocrusoe>

Re: [galaxy-dev] Some plugin for digital normalization

2013-12-26 Thread Michael R. Crusoe
> From: Cristian Alejandro Rojas 
> Date: December 26, 2013 2:37 AM
> To: galaxy-dev@lists.bx.psu.edu
> Subject: [galaxy-dev] Some plugin for digital normalization
> Hello all,
>
> I've been searching the Tool Sheds (main and test) for a plugin to do
> digital normalization of reads. For example, Trinity has a script for this
> (./util/normalize_by_kmer_coverage.pl), and diginorm is another
> alternative (http://ged.msu.edu/papers/2012-diginorm/).
>
> Do you know if there is a Galaxy tool ready for this, or should I
> develop my own?

Hello Cristian,

I am actively working on a wrapper for normalize-by-median.py and the other
tools in the khmer suite.

https://github.com/ged-lab/khmer/blob/galaxy-integration/scripts/normalize-by-median.xml

It requires some changes on our end which you can track here:

https://github.com/ged-lab/khmer/pull/237

You can install this branch of khmer via pip:

pip install -e git+...@github.com:ged-lab/khmer.git@output_naming1#egg=khmer

or if you don't have a github account:

pip install -e git+https://github.com/ged-lab/khmer.git@output_naming1#egg=khmer

(Instructions for installing into a virtualenv or on OS X are at
https://khmer.readthedocs.org/en/latest/install.html; just substitute the
pertinent pip line from above.)

I haven't tested it much; doing so is at the top of my to-do list for the
rest of this week. If you or anyone else is feeling adventurous, I would be
happy to have the feedback.

Cheers!

-- 
Michael R. Crusoe: Software Engineer and Bioinformatician  mcru...@msu.edu
 @ the Genomics, Evolution, and Development lab; Michigan State University
http://ged.msu.edu/ http://orcid.org/-0002-2961-9670 @biocrusoe

Re: [galaxy-dev] Some plugin for digital normalization

2013-12-29 Thread Michael R. Crusoe
You are doing nothing wrong, Cristian; it was my error.

I've updated the tool definition file; it is now at
https://github.com/ged-lab/khmer/blob/output_naming1/scripts/normalize-by-median.xml

And I've updated the code base. You'll need to run the following to update
your installation:

pip install -e git+...@github.com:ged-lab/khmer.git@output_naming1#egg=khmer

or if you don't have a github account:

pip install -e git+https://github.com/ged-lab/khmer.git@output_naming1#egg=khmer

So that diginorm can be used inside Galaxy workflows, we only support
digital normalization into a single file at this time.

There is a chaining method where multiple files can be processed
sequentially, but until Galaxy supports proper collections, a cleaner
integration will have to wait.
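To picture the single-file limitation and the chaining method, here is a toy version of the digital normalization rule from the diginorm paper linked earlier in the thread: a read is kept only if the median count of its k-mers, among reads already kept, is below a coverage cutoff. This is an illustrative sketch, not khmer's code. It swaps khmer's fixed-memory probabilistic counting table for an exact `collections.Counter`, and the small `k` and `cutoff` values are chosen only for readability, not taken from khmer's defaults. It also assumes that chaining means carrying the k-mer counts from one file over to the next, which is modeled by passing the same counter to successive calls.

```python
from collections import Counter
from statistics import median
from typing import Iterable, Iterator, List

def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of a read (empty if the read is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def normalize_by_median(reads: Iterable[str], counts: Counter,
                        k: int = 4, cutoff: int = 2) -> Iterator[str]:
    """Yield reads whose median k-mer count, over reads kept so far, is below cutoff.

    Only kept reads add to `counts`, so passing one Counter to successive calls
    processes several files in sequence against shared coverage estimates.
    """
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue                       # this sketch drops reads shorter than k
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)      # count k-mers from kept reads only
            yield read

if __name__ == "__main__":
    shared: Counter = Counter()
    file1 = ["ACGTACGT", "ACGTACGT", "ACGTACGT"]
    file2 = ["ACGTACGT", "TTTTGGGG"]
    print(list(normalize_by_median(file1, shared)))  # ['ACGTACGT', 'ACGTACGT']
    print(list(normalize_by_median(file2, shared)))  # ['TTTTGGGG']
```

In the example, the third copy of the repeated read in the first file and the repeated read in the second file are both dropped because the counts carry over between calls, while the novel read in the second file is kept.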

Interested parties can follow along at
https://github.com/ged-lab/khmer/pull/237#issuecomment-31307144



On Thu, Dec 26, 2013 at 10:53 PM, Cristian Alejandro Rojas <
alejandro.0...@gmail.com> wrote:

> Thank you, Michael.
>
> I downloaded and installed khmer on my machine, then included your
> Galaxy plugin in my local instance, but when I try to run it from the web
> interface I get this error:
>
> Traceback (most recent call last):
>   File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/runners/__init__.py", line 123, in prepare_job
>     job_wrapper.prepare()
>   File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 702, in prepare
>     self.command_line = self.tool.build_command_line( param_dict )
>   File "/home/galaxy/galaxy-dist/lib/galaxy/tools/__init__.py", line 2639, in build_command_line
>     command_line = fill_template( self.command, context=param_dict )
>   File "/home/galaxy/galaxy-dist/lib/galaxy/util/template.py", line 9, in fill_template
>     return str( Template( source=template_text, searchList=[context] ) )
>   File "/home/galaxy/galaxy-dist/eggs/Cheetah-2.2.2-py2.7-linux-i686-ucs4.egg/Cheetah/Template.py", line 1004, in __str__
>     return getattr(self, mainMethName)()
>   File "cheetah_DynamicallyCompiledCheetahTemplate_1388113247_46_38074.py", line 111, in respond
> NotFound: cannot find 'hashsize'
>
> I have tried changing parameters, using both the default and the advanced
> parameters, but it didn't work. Do you know what I am doing wrong?
>
> Cheers!
>
>
>
> --
> *Cristian Alejandro Rojas Quintero*
> *Systems Engineering Student*
>  *Universidad Distrital Francisco José de Caldas*
> Bogotá - Colombia
>
>
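The last line of the traceback above is Cheetah's `NotFound` error: while filling the tool's `<command>` template, it met a `hashsize` placeholder that had no matching entry in the parameter dictionary Galaxy passed in (`searchList=[context]`). A mismatch of this kind can often be spotted before running a job by comparing the template's placeholders with the names the tool XML declares. The helper below, `undefined_placeholders`, is a hypothetical sketch, not part of Galaxy or khmer, and the example XML is invented rather than the actual khmer wrapper. It is deliberately crude: it only recognizes `$name` and `${name}` references, treats any `name` attribute on `param`, `conditional`, `repeat`, or `data` elements (plus `#set` variables) as defined, and would misreport things like shell variables or escaped dollar signs.

```python
import re
import xml.etree.ElementTree as ET
from typing import Set

# $name or ${name}; only the first identifier of a dotted path ($input.name) is kept.
PLACEHOLDER = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)")
# Local Cheetah variables created with "#set $name = ..." count as defined.
SET_DIRECTIVE = re.compile(r"#set\s+\$?([A-Za-z_][A-Za-z0-9_]*)")
DEFINING_TAGS = {"param", "conditional", "repeat", "data"}

def undefined_placeholders(tool_xml: str) -> Set[str]:
    """Names used in <command> that no input, output, or #set declares."""
    root = ET.fromstring(tool_xml)
    command = root.findtext("command") or ""
    declared = {el.get("name") for el in root.iter()
                if el.tag in DEFINING_TAGS and el.get("name")}
    declared |= set(SET_DIRECTIVE.findall(command))
    used = set(PLACEHOLDER.findall(command))
    # Names starting with "__" are treated here as Galaxy-supplied built-ins.
    return {name for name in used - declared if not name.startswith("__")}

# Invented tool definition with a deliberately undeclared $hashsize.
EXAMPLE = """<tool id="example_tool" name="Hypothetical example">
  <command>example-normalize --size $hashsize --k $ksize $input &gt; $output</command>
  <inputs>
    <param name="input" type="data" format="fasta"/>
    <param name="ksize" type="integer" value="20"/>
  </inputs>
  <outputs>
    <data name="output" format="fasta"/>
  </outputs>
</tool>"""

if __name__ == "__main__":
    print(undefined_placeholders(EXAMPLE))   # {'hashsize'}
```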


-- 
Michael R. Crusoe: Software Engineer and Bioinformatician  mcru...@msu.edu
 @ the Genomics, Evolution, and Development lab; Michigan State University
http://ged.msu.edu/ http://orcid.org/-0002-2961-9670
@biocrusoe <http://twitter.com/biocrusoe>

Re: [galaxy-dev] Some plugin for digital normalization

2014-01-08 Thread Michael R. Crusoe
That is wonderful to hear. I'm adding more khmer Galaxy wrappers, and the
next release of khmer will include the necessary modifications.

Please let me know if there is a specific feature you'd like to see.
On Jan 8, 2014 11:09 PM, "Cristian Alejandro Rojas" <
alejandro.0...@gmail.com> wrote:

> Sorry for the late answer.
>
> Thank you so much Michael.
>
> I have tested the plugin and everything now works fine!
>
> Cheers
>