Re: [galaxy-user] 3' adapter trimming using FASTX-toolkit clipper

2013-09-20 Thread Jennifer Jackson

Hi Thanh,

Questions are fine; that is what this mailing list is for. But please do 
try to cc the mailing list and start a new thread for new topics when 
possible.


To generate a length distribution (among other stats), the tool "NGS: QC 
and manipulation -> FastQC" is a quick method.
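
For anyone who wants the same numbers outside of Galaxy, here is a minimal sketch of a read-length tally. It assumes plain 4-line-per-record FASTQ, and the function name and example reads are made up for illustration. It is not how FastQC computes its report.

```python
from collections import Counter
from typing import Iterable


def read_length_distribution(fastq_lines: Iterable[str]) -> Counter:
    """Count reads per sequence length in 4-line-per-record FASTQ text."""
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        # In 4-line FASTQ, the sequence is the second line of each record.
        if i % 4 == 1:
            lengths[len(line.strip())] += 1
    return lengths


# Made-up example: three clipped reads (two of 21 nt, one of 22 nt).
example = [
    "@read1", "TGAGGTAGTAGGTTGTATAGT", "+", "I" * 21,
    "@read2", "TGAGGTAGTAGGTTGTATAGTT", "+", "I" * 22,
    "@read3", "ACTGGACTTGGAGTCAGAAGG", "+", "I" * 21,
]
dist = read_length_distribution(example)
for length in sorted(dist):
    print(length, dist[length])
```

For a real file, passing an open file handle instead of the example list should work the same way, since the function only iterates over lines.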


Take care,

Jen
Galaxy team

On 9/20/13 9:11 AM, Hoang, Thanh wrote:

Hi,
Thank you very much Jenny.
You are right. The FASTX-Toolkit does allow some tolerance.
Now I have the output file after clipping the full-length 3' adapter 
sequence. Is there any tool in Galaxy where I can draw some kind of 
graph or statistics showing the distribution of read length versus 
number of reads in the output file?

Sorry for asking many basic questions
Thanh



On Thu, Sep 19, 2013 at 9:31 PM, Jennifer Jackson wrote:


Thanh,

To hopefully be clearer, the part matched is clipped (whole or
partial, and there is even some tolerance for low-frequency
mismatches).

I would suggest taking a few sequences out and running the tool on
them to try it out. You could test for both length and mismatch
constraints this way. (Perhaps even using constructed sequences
that are modified to have specific adapter lengths and/or mismatch
counts). This is a great way to get a feel for new tools in general.
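
One way to build such constructed sequences is sketched below. The adapter is the one from the original post; the insert sequence, the helper names, and the choice of test cases are assumptions made for illustration only. The FASTA text it prints could be uploaded to Galaxy and run through the Clip tool to see which reads are clipped.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCT"  # 3' adapter from the original post
INSERT = "TGAGGTAGTAGGTTGTATAGTT"  # made-up 22-nt insert (assumption)


def mutate(seq: str, position: int) -> str:
    """Return seq with the base at `position` swapped for a different base."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return seq[:position] + swap[seq[position]] + seq[position + 1:]


def make_test_reads(insert: str, adapter: str, read_length: int = 51) -> dict:
    """Build named reads with full, partial, and mismatched adapters."""
    reads = {
        "full_adapter": insert + adapter,
        "adapter_8nt": insert + adapter[:8],
        "adapter_4nt": insert + adapter[:4],
        "one_mismatch": insert + mutate(adapter, 5),
        "no_adapter": insert,
    }
    # Truncate to the sequencer's read length, as a real read would be.
    return {name: seq[:read_length] for name, seq in reads.items()}


def to_fasta(reads: dict) -> str:
    """Format the reads as FASTA text."""
    return "\n".join(f">{name}\n{seq}" for name, seq in reads.items())


reads = make_test_reads(INSERT, ADAPTER)
print(to_fasta(reads))
```

Comparing the clipped output against the known insert shows directly how short a partial adapter can be, and how many mismatches it can carry, before the tool stops clipping it.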

If you need more details about exactly how the algorithm works,
you can read the original documentation and then if you still need
help, try contacting the tool author (links at bottom of tool
form). But this is a very popular, commonly used tool and what I
have shared is how it behaves to my knowledge & experience.
There may not be much more to it.

Best,

Jen
Galaxy Team


On Sep 19, 2013, at 5:57 PM, "Hoang, Thanh" <hoan...@miamioh.edu> wrote:


Hi Jenny,
Thank you.
When you put the whole 3' adapter sequence into the Clipper, what
will happen to the reads that contain only part of the adapter?
Are they considered as not containing the adapter and
subsequently treated as non-clipped reads?
Thanh


On Thu, Sep 19, 2013 at 8:46 PM, Jennifer Jackson <j...@bx.psu.edu> wrote:

Hi Thanh,

Just enter the whole adapter sequence. The tool will match
what is found in the input sequence and clip. The help
graphic on the Clip form itself illustrates this - only one
adapter is entered (can be entered) but a variable length is
clipped from the input to produce the output.
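
The idea that one full adapter is entered but a variable length is clipped can be illustrated with a small sketch. This is not the FASTX-Toolkit algorithm: it uses exact matching only (no mismatch tolerance), and the function name and the `min_overlap` parameter are assumptions made for this example.

```python
def clip_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Clip a full adapter, or a partial adapter at the read's 3' end."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Case 1: the whole adapter occurs in the read; drop it and what follows.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adapter. Try the longest
    # adapter prefix first, so the largest possible overlap is removed.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: return the read unchanged.
    return read


adapter = "AGATCGGAAGAGCACACGTCT"   # 3' adapter from the original post
insert = "TGAGGTAGTAGGTTGTATAGTT"    # made-up insert (assumption)

clipped_full = clip_3prime_adapter(insert + adapter, adapter)
clipped_partial = clip_3prime_adapter(insert + adapter[:8], adapter)
clipped_none = clip_3prime_adapter(insert, adapter)
print(clipped_full == insert, clipped_partial == insert, clipped_none == insert)
```

In all three cases the same full adapter is supplied, yet 21, 8, or 0 bases are removed depending on what the read actually contains.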

Thanks for posting this new question to the mailing list.
This greatly helps us to track & provide the speediest replies.

Best,

Jen
Galaxy team


On 9/19/13 4:15 PM, Hoang, Thanh wrote:

Hi all,
I am analyzing miRNA sequencing data. My data is 51 bp, single-end,
and ~5 M reads. I want to remove the adapter sequences from the
reads before mapping to the genomes/known miRNA database.
My 3' adapter sequence is: 5'-AGATCGGAAGAGCACACGTCT-3'. I
found that many reads contain only part of the 3' adapter
sequence. I am using the FASTX-Toolkit to clip it off. How many
bases should I put in the "Enter custom clipping sequence" field?
I ask because, in the output files, I end up with more reads when
entering the whole 3' adapter sequence than when entering only the
first 8 nt.
Also, miRNAs are about 17-25 nt long, so I guess that the rest of
each read (51-21=30 bp) must contain part or all of the 5'
adapter sequence or by-products of mRNA/tRNA degradation.
So I think that I have to trim the 5' adapter as well.
Any suggestions will be highly appreciated.
Thanh



___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

   http://galaxyproject.org/search/mailinglists/



--
Jennifer Hillman-Jackson
http://galaxyproject.org


Re: [galaxy-user] 3' adapter trimming using FASTX-toolkit clipper

2013-09-19 Thread Jennifer Jackson
Thanh,

To hopefully be clearer, the part matched is clipped (whole or partial, and 
there is even some tolerance for low-frequency mismatches). 

I would suggest taking a few sequences out and running the tool on them to try 
it out. You could test for both length and mismatch constraints this way. 
(Perhaps even using constructed sequences that are modified to have specific 
adapter lengths and/or mismatch counts). This is a great way to get a feel for 
new tools in general.

If you need more details about exactly how the algorithm works, you can read 
the original documentation and then if you still need help, try contacting the 
tool author (links at bottom of tool form). But this is a very popular, 
commonly used tool and what I have shared is how it behaves to my knowledge 
& experience. There may not be much more to it.

Best,

Jen
Galaxy Team



Re: [galaxy-user] 3' adapter trimming using FASTX-toolkit clipper

2013-09-19 Thread Jennifer Jackson

Hi Thanh,

Just enter the whole adapter sequence. The tool will match what is found 
in the input sequence and clip. The help graphic on the Clip form itself 
illustrates this - only one adapter is entered (can be entered) but a 
variable length is clipped from the input to produce the output.


Thanks for posting this new question to the mailing list. This greatly 
helps us to track & provide the speediest replies.


Best,

Jen
Galaxy team


--
Jennifer Hillman-Jackson
http://galaxyproject.org


[galaxy-user] 3' adapter trimming using FASTX-toolkit clipper

2013-09-19 Thread Hoang, Thanh
Hi all,
I am analyzing miRNA sequencing data. My data is 51 bp, single-end, and ~5
M reads. I want to remove the adapter sequences from the reads before
mapping to the genomes/known miRNA database.
My 3' adapter sequence is: 5'-AGATCGGAAGAGCACACGTCT-3'. I found that many
reads contain only part of the 3' adapter sequence. I am using the
FASTX-Toolkit to clip it off. How many bases should I put in the "Enter
custom clipping sequence" field? I ask because, in the output files, I end
up with more reads when entering the whole 3' adapter sequence than when
entering only the first 8 nt.
Also, miRNAs are about 17-25 nt long, so I guess that the rest of each read
(51-21=30 bp) must contain part or all of the 5' adapter sequence or
by-products of mRNA/tRNA degradation. So I think that I have to trim the 5'
adapter as well.
Any suggestions will be highly appreciated.
Thanh