Re: [galaxy-user] Metagenomic filtering
Gerald,

16S is basically useless for identification to genus. Since I started sequencing 16S in 1992, I have come to realize that without sequencing the full 1540 bases it is generally misleading, and even then it is not accurate enough to nail the genus in more than half the cases.

However, what is your feeling on ITS and gyrase? They seem to be far more discriminating, but those databases were decommissioned some time ago.

The desirable thing would be for Galaxy or NCBI to add a "filter conserved genes" option [i.e., any hit with a second choice greater than 3% distance], or something similar. If you (or others) are aware of such a thing, I'd love to hear about it.

Sincerely,
Scott

Scott Tighe
Senior Core Laboratory Research Staff
Advanced Genome Technologies Core
University of Vermont
Vermont Cancer Center
149 Beaumont Ave
Health Science Research Facility 303/305
Burlington Vermont 05405
802-656-2557

On 9/18/2013 2:05 PM, Gerald Bothe wrote:

Removing model organisms may not be enough; you may have the same problem with, say, a Clostridium cluster IV anaerobe. I think a solution would be:

First: compare to a collection of genes, e.g. get all the hits for 16S rRNA genes, RNA polymerases (conserved to quite conserved), and e.g. ion channels and cell surface proteins.

Then: once a read or contig is identified as belonging to a gene family, gene, or protein domain, check within that group for species identities. That way you compare apples to apples in terms of gene conservation level.

Does anybody know a program that would do this efficiently from metagenomic data?

Gerald Bothe

From: Scott W. Tighe scott.ti...@uvm.edu
To: galaxy-user@lists.bx.psu.edu
Sent: Wednesday, September 18, 2013 10:03 AM
Subject: Re: [galaxy-user] Metagenomic filtering

Dear Galaxy,

When running a HiSeq shotgun metagenomics sample from the environment through megablast and taxonomic representation, how do I filter out all the 16S and other conserved sequences?
The problem is that when blasting a single organism that has a fraction of conserved sequence, the results will align with E. coli 10,000 times more than with the possible target organism. This data would be wrong and misleading. For example, a 100 mg sample that was negative for E. coli using the MUG test gives thousands of hits with Galaxy.

1) Is there a "filter conserved sequences" setting?
2) Is there a "remove model organisms" setting?

Scott Tighe

--
Core Laboratory Research Staff
Advanced Genome Technologies Core
Deep Sequencing (MPS) Facility
Vermont Cancer Center
149 Beaumont Ave
University of Vermont HSRF 303
Burlington Vermont USA 05045
802-656-AGTC
802-999- (cell)

Quoting Jennifer Jackson j...@bx.psu.edu:

Hello Elwood,

Are you still having connection issues today? Or is this resolved?

Best,
Jen
Galaxy team

On 9/13/13 11:36 AM, Elwood Linney wrote:

A message I sent earlier this week indicated that I could not connect to Galaxy via Fetch to download data. A reply indicated that a glitch had been fixed. I could then connect with Fetch, and I tried to transfer 4 x 16 GB files, but the connection dropped about 4 times. Now, once again, I cannot connect to Galaxy online to transfer data.

Is this a problem that can be solved, either at my end or at Galaxy?

Elwood Linney

___
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org (http://usegalaxy.org/). Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

--
Jennifer Hillman-Jackson
http://galaxyproject.org
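Scott's suggested filter ("any hit with a second choice greater than 3% distance") could be sketched outside Galaxy roughly as follows. This is only one reading of that sentence, not an existing Galaxy or NCBI option: a read is kept when its best hit beats the best hit to any other taxon by at least 3 percentage points of identity. The function name, the (read_id, taxon, percent_identity) tuple layout, and the min_gap parameter are all hypothetical; real input would have to be derived from tabular BLAST output plus a taxonomy lookup.

```python
from collections import defaultdict


def discriminating_reads(hits, min_gap=3.0):
    """Return {read_id: best_taxon} for reads with an unambiguous best hit.

    hits: iterable of (read_id, taxon, percent_identity) tuples
          (hypothetical layout, assumed to be prepared beforehand).
    min_gap: minimum lead, in percent identity, that the best taxon must
             have over the runner-up taxon (assumed reading of "3% distance").
    """
    # Best identity seen for each (read, taxon) pair.
    by_read = defaultdict(dict)
    for read_id, taxon, pct_identity in hits:
        best = by_read[read_id].get(taxon)
        if best is None or pct_identity > best:
            by_read[read_id][taxon] = pct_identity

    kept = {}
    for read_id, per_taxon in by_read.items():
        ranked = sorted(per_taxon.items(), key=lambda kv: kv[1], reverse=True)
        # Keep reads that hit a single taxon, or whose top taxon leads the
        # second taxon by at least min_gap.
        if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= min_gap:
            kept[read_id] = ranked[0][0]
    return kept


example_hits = [
    ("read1", "Escherichia coli", 99.0),
    ("read1", "Shigella flexneri", 98.5),   # too close: read1 is dropped
    ("read2", "Clostridium leptum", 97.0),
    ("read2", "Escherichia coli", 90.0),    # clear gap: read2 is kept
    ("read3", "Bacillus subtilis", 95.0),   # only one taxon: kept
]
print(discriminating_reads(example_hits))
```

Reads from highly conserved genes tend to hit many taxa at nearly the same identity, so they fall out of the result, which is the effect Scott was asking for. Gerald's two-step idea would go further by grouping reads by gene family before comparing species within each group.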
Re: [galaxy-user] Identifying Tags - Galaxy Question
Hi Dominique,

Glad that helped. And yes, you can merge many text-based file types with the tool 'Text Manipulation - Concatenate datasets'. Sometimes you will need to convert to tabular format first, and then back to the desired format (fasta, gtf, etc.) afterwards.

Take care,
Jen
Galaxy team

On 9/19/13 5:51 AM, D. A. Cowart wrote:

Thank you Jennifer, this helped tremendously. I completely missed the barcode splitter too. One other question: do you know if it is possible to merge different fasta files on Galaxy? Say, I wanted to merge those tagged files back into one complete fasta file.

Best,
Dominique Cowart

On Thu, Sep 19, 2013 at 12:19 AM, Jennifer Jackson j...@bx.psu.edu wrote:

Hello Dominique,

Yes, this can be done. Here is the process:

Start by splitting up the data with the 'NGS: QC and manipulation - Barcode Splitter' tool. The result files will be available as links. These can be copied and added, in batch, to the text box of the 'Get Data - Upload File' tool, and each will be loaded as a dataset. Copying them into a simple text file and then pasting them into the Upload tool all at once is a quick way to do this, or you can do them one by one.

Once you have the individual files as datasets, you will probably want to rename them to better keep track of which barcode/tag they represent. Click on the pencil icon in the upper right corner of each dataset to do this on the Edit Attributes form.

Next, the idea is to convert the fasta dataset to tabular, add a column with the _Tag1 information, merge the original identifier column with the new tag column, cut the columns to rearrange them (you want just the new merged identifier and the original fasta sequence, leaving behind the two columns with the original identifier and the tag), and then convert back from tabular to fasta format. Use the tools in 'Text Manipulation' and 'FASTA manipulation' to do these operations.
I would normally suggest creating/using a workflow at this point, but as the tags will all be different, and the Add column step is in the middle of the processing, this is probably not worth it.

Hopefully this helps!

Jen
Galaxy team

On 9/18/13 7:36 AM, D. A. Cowart wrote:

Hello,

I need to perform an action (or series of actions) on a 454 dataset using Galaxy, and have not been able to figure out the necessary steps, even after looking through the toolbar expressions and using custom search. My file is a fasta and has the standard format:

>GNJQDEZ01A940A
CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA
>GNJQDEZ01BJYQZ
CTGAGTCAGGTCAACAATCATAAGACATCGGCTCTCTATATTTAATATTGGT

Each of the 100,000 sequences within this file contains a specific tag, which is the first 8 nucleotides. There are 19 tags in total. I would like to identify these tags and add an identifier of the tag to the sequence name. Therefore, if I am looking for the first tag (CTGAGTCA), the output would look like:

>GNJQDEZ01A940A_*Tag1*
*CTGAGTCA*GGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA

Is it possible to achieve this using Galaxy? If so, could you kindly suggest tools to use?

Thank you in advance,
Dominique Cowart
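For readers who want to check what the Galaxy steps above should produce, the tagging Dominique describes can be sketched in a few lines of Python. This is an offline illustration under stated assumptions, not a Galaxy tool: the function names read_fasta and tag_records are made up for this example, the tag table holds only the one tag given in the thread (the other 18 are not listed), and the input is assumed to be standard FASTA with ">" header lines.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tag_records(records, tags, tag_len=8):
    """Append '_<tag name>' to the header when the sequence starts with a
    known tag; records with an unknown tag pass through unchanged."""
    for header, seq in records:
        name = tags.get(seq[:tag_len])
        yield (f"{header}_{name}" if name else header), seq


# Only Tag1 is given in the thread; further entries would be added here.
tags = {"CTGAGTCA": "Tag1"}

example = [
    ">GNJQDEZ01A940A",
    "CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC",
    "ATGTTA",
    ">GNJQDEZ01BJYQZ",
    "CTGAGTCAGGTCAACAATCATAAGACATCGGCTCTCTATATTTAATATTGGT",
]

tagged = list(tag_records(read_fasta(example), tags))
for header, seq in tagged:
    print(f">{header}\n{seq}")
```

Because every record is handled in one pass, there is no need to split the file per tag and concatenate the pieces again, which is the main difference from the Barcode Splitter route.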
[galaxy-user] 3' adapter trimming using FASTX-toolkit clipper
Hi all,

I am analyzing miRNA sequencing data. My data is 51 bp, single-ended, with ~5 M reads. I want to remove the adapter sequences from the reads before mapping to the genomes/known miRNA database. My 3' adapter sequence is: 5'-AGATCGGAAGAGCACACGTCT-3'. I found that many reads contain only part of the 3' adapter sequence. I am using the FASTX-toolkit to clip it off.

How many bases should I put in the "Enter custom clipping sequence" box? I ask because, in the output files, I end up with more reads when entering the whole 3' adapter sequence than when entering only the first 8 nt.

Also, miRNA is about 17-25 nt long, so I guess that the rest of each read (51-21=30 bp) must contain part or all of the 5' adapter sequence, or by-products of mRNA/tRNA degradation. So I think that I have to trim the 5' adapter as well.

Any suggestions will be highly appreciated.

Thanh
Re: [galaxy-user] 3' adapter trimming using FASTX-toolkit clipper
Hi Thanh,

Just enter the whole adapter sequence. The tool will match what is found in the input sequence and clip it. The help graphic on the Clip form itself illustrates this: only one adapter is entered (can be entered), but a variable length is clipped from the input to produce the output.

Thanks for posting this new question to the mailing list. This greatly helps us to track and provide the speediest replies.

Best,
Jen
Galaxy team
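The behaviour Jen describes (one full adapter entered, a variable length clipped) can be illustrated with a small Python sketch. This is a simplified stand-in, not the FASTX-toolkit algorithm: it uses exact matching only, the function name clip_3prime_adapter and the min_overlap parameter are assumptions made for this example, and the 22 nt insert is just a placeholder sequence.

```python
def clip_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter that is present in full or only as a prefix
    at the end of the read (exact matches only in this sketch)."""
    # Full adapter anywhere in the read: cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Otherwise look for the longest adapter prefix that ends the read.
    longest = min(len(adapter), len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    # No adapter found: return the read unchanged.
    return read


adapter = "AGATCGGAAGAGCACACGTCT"
insert = "TGAGGTAGTAGGTTGTATAGTT"  # placeholder 22 nt insert

print(clip_3prime_adapter(insert + adapter, adapter))      # full adapter
print(clip_3prime_adapter(insert + adapter[:8], adapter))  # first 8 nt only
print(clip_3prime_adapter(insert, adapter))                # no adapter
```

All three calls return the same 22 nt insert, which shows why entering the whole adapter does not prevent partially sequenced adapters from being clipped. The min_overlap guard is a design choice of this sketch to avoid trimming reads that merely end in one or two adapter-like bases.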
Re: [galaxy-user] 3' adapter trimming using FASTX-toolkit clipper
Thanh,

To hopefully be clearer: the part that matches is clipped (whole or partial, and there is even some tolerance for low-frequency mismatches).

I would suggest taking a few sequences out and running the tool on them to try it out. You could test both the length and the mismatch constraints this way (perhaps even using constructed sequences that are modified to have specific adapter lengths and/or mismatch counts). This is a great way to get a feel for new tools in general.

If you need more details about exactly how the algorithm works, you can read the original documentation, and then, if you still need help, try contacting the tool author (links at the bottom of the tool form). But this is a very popular, commonly used tool, and what I have shared is how it behaves, to my knowledge and experience. There may not be much more to it.

Best,
Jen
Galaxy Team

On Sep 19, 2013, at 5:57 PM, Hoang, Thanh hoan...@miamioh.edu wrote:

Hi Jenny,

Thank you. When you put the whole 3' adapter sequence into the Clipper, what happens to the reads that contain only part of the adapter? Are they considered as not containing the adapter, and therefore left as non-clipped reads?

Thanh
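Jen's suggestion of testing with constructed sequences could be set up with a short script like the one below, which builds reads carrying a chosen number of adapter bases and a chosen number of mismatches and formats them as FASTA for upload. It is a sketch under assumptions: the helper names (mutate, build_test_reads, to_fasta) and the read naming scheme are invented here, the insert is a placeholder, and mismatches are placed deterministically at the start of the adapter part purely to keep the test set reproducible.

```python
def mutate(seq, n_mismatches):
    """Replace the first n bases with a different base
    (A->C, C->G, G->T, T->A) so each changed position is a mismatch."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    n = min(n_mismatches, len(seq))
    return "".join(swap[base] for base in seq[:n]) + seq[n:]


def build_test_reads(insert, adapter, adapter_lengths, mismatch_counts):
    """Yield (name, sequence) for every length/mismatch combination."""
    for k in adapter_lengths:
        for m in mismatch_counts:
            yield f"len{k}_mm{m}", insert + mutate(adapter[:k], m)


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


adapter = "AGATCGGAAGAGCACACGTCT"
insert = "TGAGGTAGTAGGTTGTATAGTT"  # placeholder 22 nt insert

reads = list(build_test_reads(insert, adapter, [4, 8, 21], [0, 1, 2]))
print(to_fasta(reads))
```

Running the Clip tool on this small file and comparing output lengths against the read names shows directly which adapter lengths and mismatch counts are still clipped.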