Re: [galaxy-user] Metagenomic filtering

2013-09-19 Thread Scott Tighe

Gerald

16S is basically useless for identification to genus. Since I started
sequencing 16S in 1992, I have come to realize that without sequencing
the full 1540 bases, it is generally misleading, and even then, it is
not accurate enough to nail the genus in more than half the cases.
However, what is your feeling on ITS and gyrase? They seem to be far more
discriminating, but those databases were decommissioned some time ago.


Ideally, Galaxy or NCBI would add a filter for conserved genes
[i.e., any hit with a second choice greater than 3% distance], or
something along those lines.
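
One possible reading of the filter suggested above, as a minimal Python sketch: keep a read only if its best hit beats the best hit to any other subject by more than 3 percentage points of identity. The function name, the (read, subject, percent identity) tuple layout, and the use of an identity difference as the "distance" are assumptions made for illustration; this is not an existing Galaxy or NCBI feature.

```python
from collections import defaultdict


def discriminating_reads(hits, min_gap=3.0):
    """Return {read_id: best_subject} for reads with a clear best hit.

    hits: iterable of (read_id, subject_id, percent_identity) tuples.
    min_gap: assumed cut-off; the best hit must exceed the best hit to
             any other subject by more than this many percentage points.
    """
    by_read = defaultdict(list)
    for read_id, subject_id, pident in hits:
        by_read[read_id].append((pident, subject_id))

    kept = {}
    for read_id, scored in by_read.items():
        scored.sort(reverse=True)  # highest identity first
        best_pident, best_subject = scored[0]
        # Best identity among hits to a different subject, if any.
        runner_up = next(
            (p for p, s in scored[1:] if s != best_subject), None
        )
        if runner_up is None or best_pident - runner_up > min_gap:
            kept[read_id] = best_subject
    return kept


example_hits = [
    ("r1", "E_coli", 99.0),       # conserved region: two near-equal hits
    ("r1", "Salmonella", 98.5),
    ("r2", "Clostridium", 97.0),  # clear winner
    ("r2", "E_coli", 90.0),
    ("r3", "Bacillus", 95.0),     # only one hit
]
print(discriminating_reads(example_hits))
```

With BLAST tabular output, the three values would correspond to the query id, subject id, and percent identity columns. Reads like r1, which match two subjects almost equally well, are dropped rather than credited to either one.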


If you (or others) are aware of such a thing, I'd love to hear about it.

Sincerely
Scott


Scott Tighe
Senior Core Laboratory Research Staff
Advanced Genome Technologies Core
University of Vermont
Vermont Cancer Center
149 Beaumont ave
Health Science Research Facility 303/305
Burlington Vermont 05405
802-656-2557

On 9/18/2013 2:05 PM, Gerald Bothe wrote:
Removing model organisms may not be enough; you may have the same
problem with, say, a Clostridium cluster IV anaerobe. I think a
solution would be to
first: compare to a collection of genes, e.g. get all the hits for 16S
rRNA genes, RNA polymerases (conserved to quite conserved), and for
e.g. ion channels and cell surface proteins.
then: once a read or contig is identified as belonging to a gene
family, gene, or protein domain, check within that group for species
identities. Then you compare apples to apples in terms of gene
conservation level.
Does anybody know a program that would do this efficiently from
metagenomic data?
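
The two-step idea above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the family names, the per-family identity cut-offs, and the (read, family, taxon, percent identity) input layout are all hypothetical, and the hits are assumed to come from some earlier search step.

```python
# Hypothetical per-family identity cut-offs (percent). Highly conserved
# families get a stricter cut-off before a species call is accepted.
FAMILY_MIN_IDENTITY = {
    "16S_rRNA": 99.0,
    "RNA_polymerase": 95.0,
    "ion_channel": 85.0,
}


def classify(hits, thresholds=FAMILY_MIN_IDENTITY, default_min=90.0):
    """Return {read_id: (family, taxon_or_None)}.

    hits: iterable of (read_id, family, taxon, percent_identity) tuples.
    Step 1: assign each read to the gene family of its best hit.
    Step 2: accept the taxon only if the identity clears the cut-off
            for that family, so conservation levels are compared
            like with like.
    """
    best = {}
    for read_id, family, taxon, pident in hits:
        if read_id not in best or pident > best[read_id][2]:
            best[read_id] = (family, taxon, pident)

    calls = {}
    for read_id, (family, taxon, pident) in best.items():
        accepted = pident >= thresholds.get(family, default_min)
        calls[read_id] = (family, taxon if accepted else None)
    return calls


example_hits = [
    ("r1", "16S_rRNA", "E. coli", 97.5),            # too weak for 16S
    ("r2", "ion_channel", "Clostridium sp.", 88.0),  # fine for this family
]
print(classify(example_hits))
```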

Gerald Bothe

*From:* Scott W. Tighe scott.ti...@uvm.edu
*To:* galaxy-user@lists.bx.psu.edu
*Sent:* Wednesday, September 18, 2013 10:03 AM
*Subject:* Re: [galaxy-user] Metagenomic filtering

Dear Galaxy

When running a HiSeq shotgun metagenomics sample from the environment
through megablast and taxonomic representation, how do I
filter/remove all the 16S and other conserved sequences?

The problem is that when blasting a single organism that has a
fraction of conserved sequence, the results will align with E. coli
10,000 times more than with the possible target organism. These data
would be wrong and misleading. For example, a 100 mg sample that was
negative for E. coli using the MUG test gives thousands of hits with
Galaxy.

1) Is there a "filter conserved sequences" setting?

2) Is there a "remove model organisms" setting?


Scott Tighe
--Core Laboratory Research Staff
Advanced Genome Technologies Core
Deep Sequencing (MPS) Facility
Vermont Cancer Center
149 Beaumont Ave
University of Vermont HSRF 303
Burlington Vermont  USA 05045
802-656-AGTC
802-999- (cell)



Quoting Jennifer Jackson j...@bx.psu.edu:

 Hello Elwood,

 Are you still having connection issues today? Or is this resolved?

 Best,

 Jen
 Galaxy team

 On 9/13/13 11:36 AM, Elwood Linney wrote:
 A message sent earlier this week by me indicated that I could
not connect to Galaxy via Fetch to download data.

 A reply indicated a glitch was fixed.

 I then could connect with Fetch and I tried to transfer 4 x
16gb files and the connection disconnected about 4 times.

 Now, once again, I cannot connect with Galaxy online to
transfer data.

 Is this a problem that can be solved, either at my end or at Galaxy?

 Elwood Linney


 ___
 The Galaxy User list should be used for the discussion of
 Galaxy analysis and other features on the public server
 at usegalaxy.org http://usegalaxy.org/. Please keep all
replies on the list by
 using reply all in your mail client.  For discussion of
 local Galaxy instances and the Galaxy source code, please
 use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

 To manage your subscriptions to this and other Galaxy lists,
 please use the interface at:

  http://lists.bx.psu.edu/

 To search Galaxy mailing lists use the unified search at:

 http://galaxyproject.org/search/mailinglists/

 --Jennifer Hillman-Jackson
 http://galaxyproject.org




Re: [galaxy-user] Identifying Tags - Galaxy Question

2013-09-19 Thread Jennifer Jackson

Hi Dominique,

Glad that helped. And yes, you can merge many file types that are
text-based with the tool 'Text Manipulation - Concatenate datasets'.
Sometimes you will need to convert to tabular format first, and then
back to the desired format (fasta, gtf, etc.) afterwards.
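
Outside Galaxy, the same merge can be done with a few lines of Python. This is a minimal sketch, not what the Galaxy tool does internally; the function name and file names are hypothetical. The only care taken is to make sure each input ends with a newline, so that the last sequence of one file does not run into the first header of the next.

```python
def concatenate_fasta(input_paths, output_path):
    """Write the contents of several text/FASTA files into one file."""
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as handle:
                text = handle.read()
            out.write(text)
            # Guard against inputs that lack a trailing newline.
            if text and not text.endswith("\n"):
                out.write("\n")


# Example call with hypothetical file names:
# concatenate_fasta(["tag1.fasta", "tag2.fasta"], "merged.fasta")
```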


Take care,

Jen
Galaxy team

On 9/19/13 5:51 AM, D. A. Cowart wrote:
Thank you Jennifer, this helped tremendously. I completely missed the 
barcode splitter too.
One other question: do you know if it is possible to merge different 
fasta files on Galaxy? Say, I wanted to merge those tagged files back 
to one complete fasta file.


Best,
Dominique Cowart


On Thu, Sep 19, 2013 at 12:19 AM, Jennifer Jackson j...@bx.psu.edu wrote:


Hello Dominique,

Yes, this can be done. Here is the process -

Start by splitting up the data using the 'NGS: QC and
manipulation - Barcode Splitter' tool. The result files will be
available as links. These can be copied and added to the 'Get Data
- Upload File' tool in the text box, in batch, and each will be
loaded as a dataset. Copying them into a simple text file, then
pasting into the Upload tool all at once is a quick way to do
this, or you can do them one by one.

Once you have the individual files as datasets, you probably will
want to rename them to better keep track of which barcode/tag they
represent. Click on the pencil icon in the upper right corner of
each dataset to do this on the Edit Attributes form.

Next, the idea is to convert the fasta dataset to tabular, add in
a column with the _Tag1 information, merge the original
identifier column with the new tag column, cut the columns to
rearrange (you want just the new merged identifier and the
original fasta sequence, leaving behind the two columns with the
original identifier + tag), then convert back from tabular to fasta
format. Use the tools in 'Text Manipulation' and 'FASTA
manipulation' to do these operations. I would normally suggest
creating/using a workflow at this point, but as the tags will all
be different, and the Add column step is in the middle of the
processing, this is probably not worth it.
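
For comparison, the same renaming can be sketched outside Galaxy in a few lines of Python. This is an illustration only, not what the Galaxy tools do internally: the function names and the tag table are hypothetical, header lines are assumed to start with '>' as in standard FASTA, and the tag is assumed to be the first 8 nucleotides of each sequence.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def tag_records(records, tags, tag_length=8):
    """Append the tag label to the name of each record that has one.

    tags: dict mapping a tag sequence to its label, e.g. {"CTGAGTCA": "Tag1"}.
    Records whose first tag_length bases match no tag keep their name.
    """
    for name, seq in records:
        label = tags.get(seq[:tag_length])
        yield (f"{name}_{label}" if label else name), seq


example_lines = [
    ">GNJQDEZ01A940A",
    "CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC",
    "ATGTTA",
]
for name, seq in tag_records(read_fasta(example_lines), {"CTGAGTCA": "Tag1"}):
    print(f">{name}\n{seq}")
```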

Hopefully this helps!

Jen
Galaxy team


On 9/18/13 7:36 AM, D. A. Cowart wrote:

Hello,

I need to perform an action (or series of actions) on a 454
dataset using Galaxy, and have not been able to figure out the
necessary steps, even after looking through the toolbar
expressions and using custom search.
My file is a fasta and has the standard format:

>GNJQDEZ01A940A
CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA
>GNJQDEZ01BJYQZ
CTGAGTCAGGTCAACAATCATAAGACATCGGCTCTCTATATTTAATATTGGT

Each of the 100,000 sequences within this file contains a
specific tag, which is the first 8 nucleotides.
There are 19 tags in total. I would like to identify these tags and
add an identifier of the tag to the sequence name.
Therefore, if I am looking for the first tag (CTGAGTCA), the
output would look like:

>GNJQDEZ01A940A_*Tag1*
*CTGAGTCA*GGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA

Is it possible to achieve this using Galaxy? If possible, could
you kindly suggest tools to use.

Thank you in advance,
Dominique Cowart




-- 
Jennifer Hillman-Jackson

http://galaxyproject.org




--
Jennifer Hillman-Jackson
http://galaxyproject.org


[galaxy-user] 3' adapter trimming using FASTX-toolkit clipper

2013-09-19 Thread Hoang, Thanh
Hi all,
I am analyzing miRNA sequencing data. My data is 51 bp, single-ended, and
~5 M reads. I want to remove the adapter sequences from the reads before
mapping to the genomes/known miRNA database.
My 3' adapter sequence is: 5'-AGATCGGAAGAGCACACGTCT-3'. I found that many
reads only contain part of the 3' adapter sequence. I am using
FASTX-toolkit to clip it off. How many bases should I put in the "Enter
custom clipping sequence" field? I ask because in the output files, I end
up with more reads when putting in the whole 3' adapter sequence than when
putting in only the first 8 nt.
Also, miRNA is about 17-25 nt long, so I guess that the rest of the read
(51-21=30 bp) must contain part or all of the 5' adapter sequence or the
by-product of mRNA/tRNA degradation. So I think that I have to trim the 5'
adapter as well.
Any suggestion will be highly appreciated.
Thanh

Re: [galaxy-user] 3' adapter trimming using FASTX-toolkit clipper

2013-09-19 Thread Jennifer Jackson

Hi Thanh,

Just enter the whole adapter sequence. The tool will match what is found 
in the input sequence and clip. The help graphic on the Clip form itself 
illustrates this - only one adapter is entered (can be entered) but a 
variable length is clipped from the input to produce the output.


Thanks for posting this new question to the mailing list. This greatly
helps us to track and provide the speediest replies.


Best,

Jen
Galaxy team


--
Jennifer Hillman-Jackson
http://galaxyproject.org


Re: [galaxy-user] 3' adapter trimming using FASTX-toolkit clipper

2013-09-19 Thread Jennifer Jackson
Thanh,

To hopefully be clearer, the part matched is clipped (whole or partial, and 
there is even some tolerance for low-frequency mismatches). 

I would suggest taking a few sequences out and running the tool on them to try 
it out. You could test for both length and mismatch constraints this way. 
(Perhaps even using constructed sequences that are modified to have specific 
adapter lengths and/or mismatch counts). This is a great way to get a feel for 
new tools in general.
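
To make the whole-versus-partial idea concrete, here is a minimal Python sketch of 3' adapter clipping that can also serve to build test cases like those described above. It is not the FASTX-toolkit algorithm: the function name and the min_overlap parameter are assumptions, and it only handles exact matches (no mismatch tolerance). It first looks for the full adapter anywhere in the read, then for the longest adapter prefix that sits at the very end of the read.

```python
def clip_3prime(read, adapter, min_overlap=3):
    """Remove a full or partial 3' adapter from a read (exact match only).

    min_overlap: assumed minimum number of adapter bases (>= 1) that
    must be present at the read end for a partial match to be clipped.
    """
    # Case 1: the whole adapter is present; clip it and everything after.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adapter; try the
    # longest adapter prefix first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # Case 3: no adapter found; return the read unchanged.
    return read


adapter = "AGATCGGAAGAGCACACGTCT"
insert = "TGAGGTAGTAGGTTGTATAGTT"  # constructed 22-nt test insert

print(clip_3prime(insert + adapter + "AAAAAAAA", adapter))  # full adapter
print(clip_3prime(insert + adapter[:8], adapter))           # first 8 nt only
print(clip_3prime(insert, adapter))                         # no adapter
```

All three constructed reads come back as the 22-nt insert, which is the behavior described above: one full adapter is entered, and a variable length is clipped.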

If you need more details about exactly how the algorithm works, you can read 
the original documentation and then, if you still need help, try contacting the 
tool author (links at the bottom of the tool form). But this is a very popular, 
commonly used tool, and what I have shared is how it behaves to my knowledge 
and experience. There may not be much more to it.

Best,

Jen
Galaxy Team


On Sep 19, 2013, at 5:57 PM, Hoang, Thanh hoan...@miamioh.edu wrote:

 Hi Jenny,
 Thank you.
 When you put the whole 3' adapter sequence into the Clipper, what will happen 
 to the reads that only contain part of the adapter? Are they considered as 
 not containing the adapter and therefore left unclipped?
 Thanh
 
 