[galaxy-user] Identifying Tags - Galaxy Question
Hello,

I need to perform an action (or a series of actions) on a 454 dataset using Galaxy, and I have not been able to work out the necessary steps, even after looking through the toolbar expressions and using custom search. My file is a fasta file in the standard format:

GNJQDEZ01A940A
CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA
GNJQDEZ01BJYQZ
CTGAGTCAGGTCAACAATCATAAGACATCGGCTCTCTATATTTAATATTGGT

Each of the 100,000 sequences in this file begins with a specific tag, which is the first 8 nucleotides. There are 19 tags in total. I would like to identify these tags and append a tag identifier to the sequence name. For example, if I am looking for the first tag (CTGAGTCA), the output would look like:

GNJQDEZ01A940A_*Tag1*
*CTGAGTCA*GGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA

Is it possible to achieve this using Galaxy? If so, could you kindly suggest which tools to use?

Thank you in advance,
Dominique Cowart

___
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
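The tagging step described in this post can be sketched in plain Python, outside Galaxy. This is only an illustration under stated assumptions: the fasta headers are assumed to start with ">" (the marker does not appear in the archived example), only the first tag (CTGAGTCA, Tag1) comes from the post, and the names read_fasta, tag_reads, TAGS and the "NoTag" label are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; wrapped sequence lines are joined."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):  # assumption: standard ">" header marker
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def tag_reads(records: Iterable[Tuple[str, str]],
              tags: Dict[str, str],
              tag_len: int = 8) -> Iterator[Tuple[str, str]]:
    """Append the label of the leading tag to each read name."""
    for name, seq in records:
        # Reads whose first tag_len bases match no known tag get "NoTag".
        label = tags.get(seq[:tag_len], "NoTag")
        yield f"{name}_{label}", seq


# Only the first of the 19 tags is given in the post; the rest would be added here.
TAGS = {"CTGAGTCA": "Tag1"}

example = """>GNJQDEZ01A940A
CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA
>GNJQDEZ01BJYQZ
CTGAGTCAGGTCAACAATCATAAGACATCGGCTCTCTATATTTAATATTGGT
"""

tagged = list(tag_reads(read_fasta(example.splitlines()), TAGS))
for name, seq in tagged:
    print(f">{name}\n{seq}")
```

For a real file, the same functions could be fed an open file handle instead of example.splitlines(), since read_fasta only iterates over lines.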
[galaxy-user] Galaxy Question
Hello,

I would like to use Galaxy to divide a very large Illumina fasta file (~3GB) into separate fasta files. Is this possible in Galaxy? Here is an example of the reads:

HWI-ST156:535:C10GLACXX:8:1101:1195:1080 1:N:0:CGGTTGT
AAATAGAATATCACATTTCACAAGCAGGACAGTGTGTGTGAAATCGTGAATTCAACGTTTATCAATTAGAACGCCTACGTGTAG
HWI-ST156:535:C10GLACXX:8:1101:1210:1102 1:N:0:CGGTTGT
ATTTATCATAACAACTTAAATCAGTCAGTGGATTTCTGTCGGTCCGGTTAGCTCGGTTGGTAAAGGCGTTTGTTCGATCGTCTGTAGCAATCGGGC

I have tried the Filter and Sort option to select sequences by their beginning sequence (ATGC, for example) and place them in a specific file, but I have been unsuccessful.

Thank you,
Dominique
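As a rough illustration of the split described in this post, the following Python sketch streams through fasta records and writes each one to a file chosen by the sequence's leading bases. It is not a Galaxy tool. It assumes headers begin with ">" (not visible in the archived example) and one output file per prefix; the function names, the prefix list and the "unmatched" bucket are hypothetical.

```python
import os
import tempfile
from typing import Dict, Iterable, Iterator, Sequence, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs one at a time, so large files are streamed."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):  # assumption: standard ">" header marker
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_prefix(records: Iterable[Tuple[str, str]],
                    prefixes: Sequence[str],
                    out_dir: str) -> Dict[str, int]:
    """Write each record to <out_dir>/<prefix>.fasta; return per-file counts."""
    os.makedirs(out_dir, exist_ok=True)
    handles, counts = {}, {}
    try:
        for name, seq in records:
            # First matching prefix wins; everything else goes to "unmatched".
            key = next((p for p in prefixes if seq.startswith(p)), "unmatched")
            if key not in handles:
                handles[key] = open(os.path.join(out_dir, f"{key}.fasta"), "w")
                counts[key] = 0
            handles[key].write(f">{name}\n{seq}\n")
            counts[key] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts


example = """>HWI-ST156:535:C10GLACXX:8:1101:1195:1080 1:N:0:CGGTTGT
AAATAGAATATCACATTTCACAAGCAGGACAGTGTGTGTGAAATCGTGAATTCAACGTTTATCAATTAGAACGCCTACGTGTAG
>HWI-ST156:535:C10GLACXX:8:1101:1210:1102 1:N:0:CGGTTGT
ATTTATCATAACAACTTAAATCAGTCAGTGGATTTCTGTCGGTCCGGTTAGCTCGGTTGGTAAAGGCGTTTGTTCGATCGTCTGTAGCAATCGGGC
"""

# Demo in a temporary directory; the prefixes here are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    counts = split_by_prefix(read_fasta(example.splitlines()), ["AAAT", "ATGC"], tmp)
    written = sorted(os.listdir(tmp))

print(counts, written)
```

Because records are read and written one at a time, memory use stays small even for a multi-gigabyte input; only one open file handle per prefix is held.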
[galaxy-user] Help with Summary Statistics
Hello,

I am attempting to use Galaxy to calculate the mean sequence read length and identify the range of read lengths for my 454 data. The data has already been organized and sorted by species. The format of the data is as follows:

HD4AU5D01BHBCQCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTC
HD4AU5D01A093MCTCTGTCGCTCTGTCTCTCTTCTCTCTCTCTCTCTCT
etc...for each species

I have attempted to use the Summary Statistics button; however, it appears to work only on numerical data and not on sequence data. Is this tool/task available via Galaxy?

Thank you,
Dominique Cowart
User name: dac330
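The statistics asked about in this post (mean read length and the range of lengths) reduce to a single pass over the sequences. The Python sketch below shows that calculation outside Galaxy. It assumes fasta input with ">" headers, and the two short reads in the example are made up for illustration; they are not the poster's data. The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; wrapped sequence lines are joined."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):  # assumption: standard ">" header marker
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_stats(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return count, mean, min and max read length in one streaming pass."""
    count = total = 0
    shortest = longest = None
    for _name, seq in records:
        n = len(seq)
        count += 1
        total += n
        shortest = n if shortest is None else min(shortest, n)
        longest = n if longest is None else max(longest, n)
    if count == 0:
        raise ValueError("no sequences found")
    return {"count": count, "mean": total / count, "min": shortest, "max": longest}


# Made-up reads for illustration only.
example = """>read1
ACGT
>read2
ACGTAC
"""

stats = length_stats(read_fasta(example.splitlines()))
print(stats)
```

Running this once per species file would give the per-species mean and range the post asks for.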