Re: [galaxy-user] Metagenomics

2012-03-12 Thread John Major
A small warning regarding the current cloud BLAST+ configuration.

If you use the BLAST+ Galaxy tool, make sure to export in BLAST XML format
so that the metagenomics tools work properly; you will then need a script to
parse out the read ID and the Hit_def (as the hit ID). It appears that the
Hit_def field contains the correct key to the taxonomy database: it has the
format #_#, where the first # is the GI number. The tabular output (both
standard and extended) does not contain this information.
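A minimal Python sketch of such a parsing script, using only the standard library. It assumes the usual NCBI BLAST XML element names (`Iteration`, `Iteration_query-def`, `Hit`, `Hit_def`) and, as noted above, that each Hit_def has the form `<gi>_<n>`; the function name `read_gi_pairs` is hypothetical. It prints one tab-separated read ID / GI line per hit.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def read_gi_pairs(root: ET.Element) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, gi) pairs from a parsed BLAST+ XML report.

    Assumption: each Hit_def looks like '<gi>_<n>', so the GI number is
    the text before the first underscore.
    """
    for iteration in root.iter("Iteration"):
        # The query (read) definition line; keep only its first token as the read ID.
        query_def = iteration.findtext("Iteration_query-def", default="").split()
        read_id = query_def[0] if query_def else ""
        for hit in iteration.iter("Hit"):
            hit_def = hit.findtext("Hit_def", default="")
            # partition() returns the whole string if there is no underscore.
            gi = hit_def.partition("_")[0]
            yield read_id, gi


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python blast_xml_to_gi.py blast_output.xml > read_gi.tsv
    report = ET.parse(sys.argv[1]).getroot()
    for read_id, gi in read_gi_pairs(report):
        print(f"{read_id}\t{gi}")
```

The script name in the usage comment is illustrative. Taking the text before the first underscore follows the #_# observation above; if a database formats Hit_def differently, that is the line to adjust.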

I noticed this after trying the tabular data with a trimmed col[1]
(supposedly the hit sequence ID): my results always came back as a ranked
list of the most-sequenced genomes in nt, essentially keyed in at random.

j

On Wed, Mar 7, 2012 at 4:16 PM, Jennifer Jackson j...@bx.psu.edu wrote:

 Hi Vincent, Scott,

 Filtering raw hits is an important part of a metagenomics analysis
 pipeline. Please see the methods described in the published metagenomics
 analysis paper associated with this tool set:

 Kosakovsky Pond S, Wadhawan S, Chiaromonte F, Ananda G, Chung W, Taylor J,
 and Nekrutenko A. Windshield splatter analysis with the Galaxy metagenomic
 pipeline. Genome Research. 2009 Nov; 19(11):2144-53.

 http://www.ncbi.nlm.nih.gov/pubmed/19819906

 Live supplemental data that can be imported and experimented with is
 available on the public instance, including raw data, working histories,
 and a tutorial that demonstrates, step by step, the exact methods used in
 the publication:
 http://main.g2.bx.psu.edu/u/aun1/p/windshield-splatter
 http://main.g2.bx.psu.edu/library - see Windshield splatter

 Not all tools are available on the public main server, but a local or
 cloud instance could be used with wrapped tools from the Distribution or
 Tool Shed, as necessary. For example, BLAST is not available on the public
 instance, but is included in the distribution for use in local or cloud
 instances. http://getgalaxy.org

 Hopefully you will both find this helpful,

 Jen
 Galaxy project




 On 2/29/12 5:32 PM, Montoya, Vincent wrote:

 Hello
 I am a relatively new user on Galaxy and I had a question regarding
 Fetching Taxonomic Information. It is great that I can retrieve all of
 the hits for each sequence, but I cannot seem to find an option to also
 report how good a match each hit is to the given taxon, for instance a
 percentage match. I can access this information in the original file and
 retrieve it programmatically, but it would be nice if it came in one
 package so that I can avoid false hits that have a low percentage
 match. Can you please provide me with instructions on how best to
 retrieve this information (hopefully in a single file)?
 Thank you
 Vincent
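One way to keep only well-supported hits before fetching taxonomy is to filter the BLAST+ tabular output on its percent-identity column. A minimal sketch, assuming the standard 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function name `filter_by_identity` and the 97% / 50 bp thresholds are illustrative assumptions, not Galaxy defaults.

```python
import csv
import sys
from typing import Iterable, Iterator, List


def filter_by_identity(lines: Iterable[str], min_pident: float = 97.0,
                       min_length: int = 50) -> Iterator[List[str]]:
    """Yield tabular BLAST rows whose percent identity and alignment length
    meet the thresholds.

    Assumes the standard 12-column layout, where column 3 (index 2) is
    pident and column 4 (index 3) is the alignment length.
    """
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[2]) >= min_pident and int(row[3]) >= min_length:
            yield row


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_hits.py blast_hits.tabular > filtered.tabular
    with open(sys.argv[1], newline="") as handle:
        writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
        writer.writerows(filter_by_identity(handle))
```

Because the output keeps every original column, the percentage match stays in the same file as the hit IDs.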
 ___
 The Galaxy User list should be used for the discussion of
 Galaxy analysis and other features on the public server
 at usegalaxy.org.  Please keep all replies on the list by
 using reply all in your mail client.  For discussion of
 local Galaxy instances and the Galaxy source code, please
 use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

 To manage your subscriptions to this and other Galaxy lists,
 please use the interface at:

   http://lists.bx.psu.edu/


 --
 Jennifer Jackson
 http://usegalaxy.org
 http://galaxyproject.org/wiki/Support



Re: [galaxy-user] Metagenomics

2012-03-12 Thread Peter Cock
On Mon, Mar 12, 2012 at 6:28 PM, John Major john.e.major...@gmail.com wrote:
 A small warning regarding the current cloud BLAST+ configuration.

 If you use the BLAST+ Galaxy tool, make sure to export in BLAST XML format
 so that the metagenomics tools work properly; you will then need a script
 to parse out the read ID and the Hit_def (as the hit ID). It appears that
 the Hit_def field contains the correct key to the taxonomy database: it has
 the format #_#, where the first # is the GI number. The tabular output
 (both standard and extended) does not contain this information.

 I noticed this after trying the tabular data with a trimmed col[1]
 (supposedly the hit sequence ID): my results always came back as a ranked
 list of the most-sequenced genomes in nt, essentially keyed in at random.

 j

Hi John,

Can you expand on that with a specific example (ideally on the galaxy-dev
list, CC'd, since BLAST+ isn't even available on the public Galaxy)?

Also, which version of BLAST+ are you using? I recall some changes
to the tabular output IDs prior to 2.2.25 (which is what the wrappers were
tested on; I've not tried 2.2.26 yet).

Thanks,

Peter



Re: [galaxy-user] Metagenomics

2012-03-12 Thread Scott Tighe

Dear GALAXY and Jennifer

Although the windshield analysis papers were good starters, they do not
address purging conserved sequences or how to get at this information. If
anyone has an automated approach, I'd be interested. [Discard sequences
from BLAST that have more than 4 hits at 99%.]
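A minimal sketch of one such automated approach, assuming BLAST+ standard 12-column tabular output (qseqid, sseqid, pident first). For each read it counts the distinct subject sequences hit at or above an identity cutoff and reports reads exceeding a hit limit as likely conserved; the function name `conserved_reads` is hypothetical, and the 99% / 4-hit defaults simply mirror the request above.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def conserved_reads(lines: Iterable[str], min_pident: float = 99.0,
                    max_hits: int = 4) -> Set[str]:
    """Return read IDs with more than `max_hits` distinct subjects at
    >= `min_pident` percent identity (likely conserved, uninformative reads).

    Assumes standard BLAST+ tabular rows: qseqid, sseqid, pident, ...
    """
    subjects: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        qseqid, sseqid, pident = row[0], row[1], float(row[2])
        if pident >= min_pident:
            subjects[qseqid].add(sseqid)
    # Keep only reads whose number of distinct high-identity subjects exceeds the limit.
    return {read for read, hits in subjects.items() if len(hits) > max_hits}
```

Counting distinct subjects rather than raw rows avoids counting several HSPs against the same sequence more than once. Rows for reads in the returned set can then be dropped before running the taxonomy tools.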


Scott

Scott Tighe
Advanced Genome Technology Lab
Vermont Cancer Center at the University of Vermont
149 Beaumont Avenue
Health Science Research Bd RM 305
Burlington Vermont USA 05405
lab  802-656-AGTC (2482)
cell 802-999-



Re: [galaxy-user] Metagenomics

2012-03-07 Thread Jennifer Jackson

Hi Vincent, Scott,

Filtering raw hits is an important part of a metagenomics analysis 
pipeline. Please see the methods described in the published metagenomics 
analysis paper associated with this tool set:


Kosakovsky Pond S, Wadhawan S, Chiaromonte F, Ananda G, Chung W, Taylor 
J, and Nekrutenko A. Windshield splatter analysis with the Galaxy 
metagenomic pipeline. Genome Research. 2009 Nov; 19(11):2144-53.


http://www.ncbi.nlm.nih.gov/pubmed/19819906

Live supplemental data that can be imported and experimented with is 
available on the public instance, including raw data, working histories, 
and a tutorial that demonstrates, step by step, the exact methods used in 
the publication:

http://main.g2.bx.psu.edu/u/aun1/p/windshield-splatter
http://main.g2.bx.psu.edu/library - see Windshield splatter

Not all tools are available on the public main server, but a local or 
cloud instance could be used with wrapped tools from the Distribution or 
Tool Shed, as necessary. For example, BLAST is not available on the 
public instance, but is included in the distribution for use in local or 
cloud instances. http://getgalaxy.org


Hopefully you will both find this helpful,

Jen
Galaxy project



On 2/29/12 5:32 PM, Montoya, Vincent wrote:

Hello
I am a relatively new user on Galaxy and I had a question regarding Fetching 
Taxonomic Information. It is great that I can retrieve all of the hits for each 
sequence, but I cannot seem to find an option to also report how good a match each hit 
is to the given taxon, for instance a percentage match. I can access this information 
in the original file and retrieve it programmatically, but it would be nice if it came in 
one package so that I can avoid false hits that have a low percentage match. Can 
you please provide me with instructions on how best to retrieve this information 
(hopefully in a single file)?
Thank you
Vincent


--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support


Re: [galaxy-user] Metagenomics

2012-03-01 Thread Scott Tighe

Vincent

Great question! A follow-up for me is how to purge the conserved 
sequences. The current data set I have from Fetch is likely to be 99% 
composed of incorrect taxa, just because of conserved sequence. So, how 
do you select just the unique sequences (i.e. those that do not have more 
than, say, 5 hits above 99%)? Any advice would be nice.
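Once the IDs of the conserved reads are known (for example, reads with more than 5 hits above 99% identity in the BLAST results), selecting the remaining, more unique sequences is a simple FASTA filter. A minimal standard-library sketch; the function name `drop_reads` is hypothetical, and it assumes the read ID is the first whitespace-delimited token of each FASTA header, matching the BLAST query ID.

```python
from typing import Iterable, Iterator, Set


def drop_reads(fasta_lines: Iterable[str], exclude: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose ID is in `exclude`.

    The record ID is taken as the first whitespace-delimited token after '>'.
    """
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            record_id = tokens[0] if tokens else ""
            # Decide once per record; sequence lines inherit this decision.
            keep = record_id not in exclude
        if keep:
            yield line
```

Lines are passed through unchanged, so the output can be written straight back to a FASTA file and used as input to a fresh BLAST and taxonomy run.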


Our bioinformatics person said there was a way to do it through blast X.

Scott


Scott Tighe
Advanced Genome Technology Lab
Vermont Cancer Center at the University of Vermont
149 Beaumont Avenue
Health Science Research Bd RM 305
Burlington Vermont USA 05405
lab  802-656-AGTC (2482)
cell 802-999-


On 2/29/2012 8:32 PM, Montoya, Vincent wrote:

Hello
I am a relatively new user on Galaxy and I had a question regarding Fetching 
Taxonomic Information. It is great that I can retrieve all of the hits for each 
sequence, but I cannot seem to find an option to also report how good a match each hit 
is to the given taxon, for instance a percentage match. I can access this information 
in the original file and retrieve it programmatically, but it would be nice if it came in 
one package so that I can avoid false hits that have a low percentage match. Can 
you please provide me with instructions on how best to retrieve this information 
(hopefully in a single file)?
Thank you
Vincent