Re: [galaxy-user] Identifying Tags - Galaxy Question

2013-09-19 Thread Jennifer Jackson

Hi Dominique,

Glad that helped. And yes, you can merge many text-based file types 
with the tool 'Text Manipulation - Concatenate datasets'. Sometimes 
you will need to convert to tabular format first, and then back to the 
desired format (fasta, gtf, etc.) afterward.
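
For readers working outside Galaxy, the same merge can be sketched in a few lines of Python. This is only an illustration of plain concatenation, not what the Concatenate datasets tool runs internally; the function name `concatenate_fasta` is made up for this example.

```python
from pathlib import Path


def concatenate_fasta(inputs, output):
    """Append each input FASTA file to `output`, in order.

    Returns the number of records written (lines starting with '>').
    """
    records = 0
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep the next file's header on its own line
            records += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text)
    return records
```

Calling it with a list of the tagged files and an output path writes one combined fasta file. The only subtlety is the trailing newline: without it, the last sequence of one file and the first header of the next would end up on the same line.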


Take care,

Jen
Galaxy team

On 9/19/13 5:51 AM, D. A. Cowart wrote:
Thank you Jennifer, this helped tremendously. I completely missed the 
barcode splitter too.
One other question: do you know if it is possible to merge different 
fasta files on Galaxy? Say, I wanted to merge those tagged files back 
to one complete fasta file.


Best,
Dominique Cowart


On Thu, Sep 19, 2013 at 12:19 AM, Jennifer Jackson j...@bx.psu.edu wrote:


Hello Dominique,

Yes, this can be done. Here is the process -

Start by splitting up the data using the 'NGS: QC and
manipulation - Barcode Splitter' tool. The result files will be
available as links. These can be copied and pasted, in batch, into
the text box of the 'Get Data - Upload File' tool, and each will be
loaded as a dataset. Copying them into a simple text file, then
pasting into the Upload tool all at once is a quick way to do
this, or you can add them one by one.

Once you have the individual files as datasets, you probably will
want to rename them to better keep track of which barcode/tag they
represent. Click on the pencil icon in the upper right corner of
each dataset to do this on the Edit Attributes form.

Next, the idea is to convert the fasta dataset to tabular, add in
a column with the _Tag1 information, merge the original
identifier column with the new tag column, cut the columns to
rearrange (you want just the new merged identifier and the
original fasta sequence, leaving behind the two columns with the
original identifier + tag), then convert back from tabular to fasta
format. Use the tools in 'Text Manipulation' and 'FASTA
manipulation' to do these operations. I would normally suggest
creating/using a workflow at this point, but as the tags will all
be different, and the Add column step is in the middle of the
processing, this is probably not worth it.

Hopefully this helps!

Jen
Galaxy team


On 9/18/13 7:36 AM, D. A. Cowart wrote:

Hello,

I need to perform an action (or series of actions) on a 454
dataset using Galaxy, and have not been able to figure out the
necessary steps, even after looking through the toolbar
expressions and using custom search.
My file is a fasta and has the standard format:

>GNJQDEZ01A940A
CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA
>GNJQDEZ01BJYQZ
CTGAGTCAGGTCAACAATCATAAGACATCGGCTCTCTATATTTAATATTGGT

Each of the 100,000 sequences within this file contains a
specific tag, which is the first 8 nucleotides.
There are 19 tags total. I would like to identify these tags and
add an identifier of the tag to the sequence name.
Therefore, if I am looking for the first tag (CTGAGTCA), the
output would look like:

>GNJQDEZ01A940A_*Tag1*
*CTGAGTCA*GGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA

Is it possible to achieve this using Galaxy? If possible, could
you kindly suggest tools to use.

Thank you in advance,
Dominique Cowart


___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

   http://galaxyproject.org/search/mailinglists/


-- 
Jennifer Hillman-Jackson

http://galaxyproject.org




--
Jennifer Hillman-Jackson
http://galaxyproject.org


Re: [galaxy-user] Identifying Tags - Galaxy Question

2013-09-18 Thread Jennifer Jackson

Hello Dominique,

Yes, this can be done. Here is the process -

Start by splitting up the data using the 'NGS: QC and manipulation - 
Barcode Splitter' tool. The result files will be available as links. 
These can be copied and pasted, in batch, into the text box of the 
'Get Data - Upload File' tool, and each will be loaded as a dataset. 
Copying them into a simple text file, then pasting into the Upload tool 
all at once is a quick way to do this, or you can add them one by one.
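
To make the splitting step concrete, here is a rough Python sketch of the idea: bin each read by the barcode at the start of its sequence. It is not the Barcode Splitter tool's code and only does exact prefix matching; the names `read_fasta`, `split_by_barcode`, and the "unmatched" bin are assumptions made for the example, and fasta headers are assumed to start with '>'.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (identifier, sequence) pairs; wrapped sequence lines are joined."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_barcode(records, barcodes, unmatched="unmatched"):
    """Group records by the first barcode that prefixes the sequence.

    `barcodes` maps a label (e.g. "Tag1") to its nucleotide string.
    Records matching no barcode are collected under `unmatched`.
    """
    bins = defaultdict(list)
    for name, seq in records:
        for label, barcode in barcodes.items():
            if seq.startswith(barcode):
                bins[label].append((name, seq))
                break
        else:
            bins[unmatched].append((name, seq))
    return dict(bins)
```

With `{"Tag1": "CTGAGTCA"}` as the barcode table, every read whose sequence begins with CTGAGTCA lands in the "Tag1" bin, which corresponds to one of the per-barcode result files described above.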


Once you have the individual files as datasets, you probably will want 
to rename them to better keep track of which barcode/tag they represent. 
Click on the pencil icon in the upper right corner of each dataset to do 
this on the Edit Attributes form.


Next, the idea is to convert the fasta dataset to tabular, add in a 
column with the _Tag1 information, merge the original identifier 
column with the new tag column, cut the columns to rearrange (you want 
just the new merged identifier and the original fasta sequence, leaving 
behind the two columns with the original identifier + tag), then convert 
back from tabular to fasta format. Use the tools in 'Text Manipulation' 
and 'FASTA manipulation' to do these operations. I would normally 
suggest creating/using a workflow at this point, but as the tags will 
all be different, and the Add column step is in the middle of the 
processing, this is probably not worth it.
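
The column operations described above can be mimicked in Python to see what each step does to the data. This is a sketch only: the helper names below are invented for illustration and are not the Galaxy tools themselves, and the example record assumes the fasta header starts with '>'.

```python
def fasta_to_tabular(lines):
    """Return [identifier, sequence] rows, joining wrapped sequence lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            rows.append([line[1:], ""])
        elif line and rows:
            rows[-1][1] += line
    return rows


def add_column(rows, value):
    """Append the same value to every row (the tag column)."""
    return [row + [value] for row in rows]


def merge_columns(rows, first, second):
    """Append a new column holding column `first` joined to column `second`."""
    return [row + [row[first] + row[second]] for row in rows]


def cut_columns(rows, keep):
    """Keep only the listed columns, in the order given."""
    return [[row[i] for i in keep] for row in rows]


def tabular_to_fasta(rows):
    """Turn [identifier, sequence] rows back into fasta lines."""
    return [line for name, seq in rows for line in (">" + name, seq)]


fasta = [
    ">GNJQDEZ01A940A",
    "CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC",
    "ATGTTA",
]
rows = fasta_to_tabular(fasta)     # columns: id, sequence
rows = add_column(rows, "_Tag1")   # columns: id, sequence, tag
rows = merge_columns(rows, 0, 2)   # columns: id, sequence, tag, id+tag
rows = cut_columns(rows, [3, 1])   # columns: id+tag, sequence
tagged = tabular_to_fasta(rows)
```

After these steps `tagged` holds the header `>GNJQDEZ01A940A_Tag1` followed by the unwrapped sequence. Because the tag value passed to `add_column` differs for every barcode file, the sequence would be run once per file with a different tag each time.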


Hopefully this helps!

Jen
Galaxy team

On 9/18/13 7:36 AM, D. A. Cowart wrote:

Hello,

I need to perform an action (or series of actions) on a 454 dataset 
using Galaxy, and have not been able to figure out the necessary 
steps, even after looking through the toolbar expressions and using 
custom search.

My file is a fasta and has the standard format:

>GNJQDEZ01A940A
CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA
>GNJQDEZ01BJYQZ
CTGAGTCAGGTCAACAATCATAAGACATCGGCTCTCTATATTTAATATTGGT

Each of the 100,000 sequences within this file contains a specific 
tag, which is the first 8 nucleotides.
There are 19 tags total. I would like to identify these tags and add 
an identifier of the tag to the sequence name.
Therefore, if I am looking for the first tag (CTGAGTCA), the output 
would look like:


>GNJQDEZ01A940A_*Tag1*
*CTGAGTCA*GGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA

Is it possible to achieve this using Galaxy? If possible, could you 
kindly suggest tools to use.


Thank you in advance,
Dominique Cowart




--
Jennifer Hillman-Jackson
http://galaxyproject.org
