Re: [galaxy-user] Metagenomics
A small warning regarding the current cloud BLAST+ config. To use the metagenomic tools properly with the BLAST+ Galaxy tool, make sure to export in BLAST XML. You'll then need a script to parse out the read ID and the Hit_def (as the hit ID). It appears that the 'Hit_def' field contains the correct key to the taxonomy database. Specifically, the Hit_def field is in the format #_#, where the first # is the 'gi' ID. The tabular data (normal and extended) does not contain this information. I noticed this after trying the tabular data with a trimmed col[1] (supposedly the hit seqID). My results always came back as a ranked list of the most-sequenced genomes in nt, essentially keyed at random. j
On Wed, Mar 7, 2012 at 4:16 PM, Jennifer Jackson j...@bx.psu.edu wrote: Hi Vincent, Scott, Filtering raw hits is an important part of a metagenomics analysis pipeline.
___ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
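A minimal Python sketch of the kind of parsing script described in the warning above. It assumes the standard BLAST XML element names (Iteration, Iteration_query-def, Hit, Hit_def) and takes the first whitespace-delimited token of the query definition as the read ID. Following the description in the message, the gi is assumed to be the leading number of a Hit_def in the #_# form. Hit_def values that do not match that pattern get no gi rather than a guessed one. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Tuple


def gi_from_hit_def(hit_def: str) -> Optional[str]:
    """Return the leading number of a Hit_def in the '#_#' form, else None."""
    tokens = hit_def.split()
    if not tokens:
        return None
    head = tokens[0].split("_", 1)[0]
    return head if head.isdigit() else None


def read_hit_pairs(source: IO) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (read_id, hit_def, gi) for every hit in a BLAST XML report."""
    for _event, elem in ET.iterparse(source):
        if elem.tag != "Iteration":
            continue
        # Assumption: the read ID is the first token of the query definition.
        query_tokens = elem.findtext("Iteration_query-def", default="").split()
        read_id = query_tokens[0] if query_tokens else ""
        for hit in elem.iter("Hit"):
            hit_def = hit.findtext("Hit_def", default="")
            yield read_id, hit_def, gi_from_hit_def(hit_def)
        elem.clear()  # drop the finished iteration to keep memory bounded


def write_read_gi_table(source: IO, out: IO) -> None:
    """Write tab-separated read_id/gi rows, skipping hits with no usable gi."""
    for read_id, _hit_def, gi in read_hit_pairs(source):
        if gi is not None:
            out.write(f"{read_id}\t{gi}\n")
```

The two-column read/gi table written by write_read_gi_table is the shape the taxonomy step keys on. Using iterparse with elem.clear() avoids loading a whole large XML report into memory at once.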
Re: [galaxy-user] Metagenomics
On Mon, Mar 12, 2012 at 6:28 PM, John Major john.e.major...@gmail.com wrote: A small warning regarding the current cloud BLAST+ config.
Hi John, Can you expand on that with a specific example (ideally on the galaxy-dev list, CC'd, since BLAST+ isn't even available on the public Galaxy)? Also, which version of BLAST+ are you using? I recall some changes to the tabular output IDs prior to 2.2.25, which is what the wrappers were tested on (I've not tried 2.2.26 yet). Thanks, Peter
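One way to check what the tabular output actually holds is to look at the subject ID column. This is column 2 (sseqid) in the default 12-column BLAST+ tabular layout, which is an assumption about how the report was generated. The sketch below extracts a gi only from IDs of the form gi|<number>|... and uses hypothetical function names. Depending on the BLAST+ version and how the database was built, sseqid can instead be an internal ordinal such as gnl|BL_ORD_ID|<n>. Trimming that to a bare number gives a database row index rather than a gi. That is one possible, unconfirmed explanation for the random-looking results reported above.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Matches subject IDs of the form "gi|<number>|...".
_GI_ID = re.compile(r"gi\|(\d+)\|")


def gi_from_sseqid(sseqid: str) -> Optional[str]:
    """Return the gi number from a 'gi|<number>|...' subject ID, else None.

    Internal IDs such as 'gnl|BL_ORD_ID|<n>' hold a database ordinal, not a gi,
    so they return None instead of being trimmed to a bare number.
    """
    match = _GI_ID.match(sseqid)
    return match.group(1) if match else None


def read_gi_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (qseqid, gi or None) from tab-separated BLAST+ tabular lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        yield fields[0], gi_from_sseqid(fields[1])
```

If most rows come back with None, the tabular IDs are not gi numbers for that database and BLAST+ version, and the XML route above is the safer source of keys.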
Re: [galaxy-user] Metagenomics
Dear Galaxy and Jennifer, Although the windshield analysis papers were good starting points, they do not address purging conserved sequences or how to get at this information. If anyone has an automated approach, I'd be interested. [Discard sequences from BLAST that have more than 4 hits at 99%] Scott
Scott Tighe Advanced Genome Technology Lab Vermont Cancer Center at the University of Vermont 149 Beaumont Avenue Health Science Research Bd RM 305 Burlington Vermont USA 05405 lab 802-656-AGTC (2482) cell 802-999-
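A rough sketch of the automated filter asked for here. It assumes standard BLAST+ tabular rows (read ID, subject ID, percent identity, ...). Any read with more than max_hits distinct subjects at or above min_identity percent identity is treated as conserved, and all of its hits are discarded. The defaults (4 hits, 99%) mirror the bracketed rule above, and the function names are hypothetical. This is a simple counting filter, not the method from the windshield splatter paper.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def read_tabular(path: str) -> List[List[str]]:
    """Load tab-separated BLAST+ rows, skipping blank and comment lines."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")
                if row and not row[0].startswith("#")]


def conserved_reads(rows: Iterable[List[str]], min_identity: float = 99.0,
                    max_hits: int = 4) -> Set[str]:
    """Return read IDs with more than max_hits distinct subjects at >= min_identity."""
    subjects: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        read_id, subject_id, pident = row[0], row[1], float(row[2])
        if pident >= min_identity:
            subjects[read_id].add(subject_id)  # count each subject once, not each HSP
    return {read_id for read_id, hits in subjects.items() if len(hits) > max_hits}


def purge_conserved(rows: Iterable[List[str]], min_identity: float = 99.0,
                    max_hits: int = 4) -> List[List[str]]:
    """Drop every hit belonging to a read flagged by conserved_reads()."""
    all_rows = list(rows)  # iterated twice below
    drop = conserved_reads(all_rows, min_identity, max_hits)
    return [row for row in all_rows if row[0] not in drop]
```

For the "more than 5 hits" variant raised elsewhere in the thread, pass max_hits=5. Counting distinct subjects rather than rows keeps several HSPs against one genome from inflating the count.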
Re: [galaxy-user] Metagenomics
Hi Vincent, Scott, Filtering raw hits is an important part of a metagenomics analysis pipeline. Please see the methods described in the published metagenomics analysis paper associated with this tool set: Kosakovsky Pond S, Wadhawan S, Chiaromonte F, Ananda G, Chung W, Taylor J, and Nekrutenko A. Windshield splatter analysis with the Galaxy metagenomic pipeline. Genome Research. 2009 Nov; 19(11):2144-53. http://www.ncbi.nlm.nih.gov/pubmed/19819906
Live supplemental data that can be imported and experimented with is available on the public instance, including raw data, working histories, and a tutorial that demonstrates step by step the exact methods used in the publication: http://main.g2.bx.psu.edu/u/aun1/p/windshield-splatter and http://main.g2.bx.psu.edu/library (see Windshield splatter).
Not all tools are available on the public main server, but a local or cloud instance could be used with wrapped tools from the Distribution or Tool Shed, as necessary. For example, BLAST is not available on the public instance, but is included in the distribution for use in local or cloud instances: http://getgalaxy.org
Hopefully you will both find this helpful, Jen Galaxy project
-- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org/wiki/Support
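To illustrate filtering raw hits before the taxonomy step, here is a minimal Python sketch that keeps only hits meeting both a percent-identity cutoff and an alignment-length cutoff. The column positions assume the default 12-column BLAST+ tabular output. The thresholds (90% identity, 50 bp) and the function name are placeholders, not values taken from the paper cited above.

```python
from typing import Iterable, Iterator, List

# Default BLAST+ tabular columns (0-based): 2 = percent identity, 3 = alignment length.
PIDENT_COL = 2
LENGTH_COL = 3


def filter_hits(rows: Iterable[List[str]], min_identity: float = 90.0,
                min_length: int = 50) -> Iterator[List[str]]:
    """Yield only hits whose identity and alignment length meet both cutoffs.

    The default cutoffs are placeholders, not values from the publication.
    """
    for row in rows:
        if float(row[PIDENT_COL]) >= min_identity and int(row[LENGTH_COL]) >= min_length:
            yield row
```

Applying a filter like this before fetching taxonomy means low-identity hits never reach the taxonomy table in the first place.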
Re: [galaxy-user] Metagenomics
Vincent, Great question!!! And a follow-up for me is how to purge the conserved sequences. The data set I currently have from Fetch is likely to be 99% composed of incorrect taxa just because of conserved sequences. So, how do you select just unique sequences (i.e. those that do not have more than, say, 5 hits above 99%)? Any advice would be nice. Our bioinformatics person said there was a way to do it through BLASTX. Scott
Scott Tighe Advanced Genome Technology Lab Vermont Cancer Center at the University of Vermont 149 Beaumont Avenue Health Science Research Bd RM 305 Burlington Vermont USA 05405 lab 802-656-AGTC (2482) cell 802-999-
On 2/29/2012 8:32 PM, Montoya, Vincent wrote: Hello, I am a relatively new user on Galaxy and I had a question regarding Fetching Taxonomic Information. It is great that I can retrieve all of the hits for each sequence, but I cannot seem to find an option to also report how accurate a match each hit is to the given taxon, for instance a percentage match. I can access this information in the original file and retrieve it programmatically, but it would be nice if it came in one package so that I can avoid false hits that have a low percentage match. Can you please provide me with instructions on how best to retrieve this information (hopefully in a single file)? Thank you, Vincent
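For the single-file request above, one approach is to carry each hit's percent identity across to the taxonomy rows. This sketch assumes the hits have already been reduced to (read ID, gi, percent identity) triples. It also assumes each taxonomy row has the read ID in its first column and the gi in its second. That column layout is hypothetical and should be checked against the actual output of the taxonomy tool. When a read hits the same gi more than once, the best identity is kept. Rows with no matching hit get an empty field.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Hit = Tuple[str, str, float]  # (read_id, gi, percent identity)


def best_identity(hits: Iterable[Hit]) -> Dict[Tuple[str, str], float]:
    """Map each (read_id, gi) pair to its highest percent identity."""
    best: Dict[Tuple[str, str], float] = {}
    for read_id, gi, pident in hits:
        key = (read_id, gi)
        if key not in best or pident > best[key]:
            best[key] = pident
    return best


def annotate_taxonomy(taxonomy_rows: Iterable[List[str]], hits: Iterable[Hit],
                      read_col: int = 0, gi_col: int = 1) -> Iterator[List[str]]:
    """Append the best percent identity to each taxonomy row ('' if no hit).

    read_col and gi_col describe a hypothetical taxonomy table layout.
    """
    best = best_identity(hits)
    for row in taxonomy_rows:
        pident = best.get((row[read_col], row[gi_col]))
        yield row + ["" if pident is None else f"{pident:.2f}"]
```

Writing each yielded row with "\t".join(row) gives one file with the taxonomy and the match percentage side by side. That file can then be filtered on the last column.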