Re: [galaxy-user] help for trim sequences
You might also use (or add to Main) the CutAdapt tool, which is available in the main Tool Shed. It takes multiple adapters, allows 3', 5', or both-side adapters, and is fast.

http://toolshed.g2.bx.psu.edu/repository/view_repository?sort=User.username&operation=view_or_manage_repository&id=f19bc86bac946438

Best,
Geert

On 11/23/2013 03:19 PM, Peter Cock wrote:
> On Fri, Nov 22, 2013 at 8:48 PM, Jennifer Jackson j...@bx.psu.edu wrote:
>> Hi Seung Hee,
>>
>> I know we discussed this on the other list, but I didn't point you to the open development ticket to (potentially) extend the functions of the Cut tool. This is not being actively worked on right now, but you can follow it for updates if you want.
>> https://trello.com/c/CbFSHrU5
>>
>> Others are still welcome to comment about what types of solutions they might have to offer. There is no specific tool to do this on Main right now (or in the Tool Shed, from my checks).
>> http://usegalaxy.org/toolshed
>
> This tool of mine might do what Seung Hee wanted, but I have not tried it on very large Illumina datasets:
> http://toolshed.g2.bx.psu.edu/view/peterjc/seq_primer_clip
>
> Regards,
> Peter

--
Geert Vandeweyer, Ph.D.
Department of Medical Genetics
University of Antwerp
Prins Boudewijnlaan 43
2650 Edegem
Belgium
Tel: +32 (0)3 275 97 56
E-mail: geert.vandewe...@ua.ac.be
http://ua.ac.be/cognitivegenetics
http://www.linkedin.com/in/geertvandeweyer

___
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
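
As a rough illustration of the adapter options mentioned in this message, a CutAdapt command line for a local install could be assembled along the following lines. This is only a sketch under stated assumptions: `build_cutadapt_command` is a hypothetical helper, the file names and adapter sequences are placeholders, and it assumes a `cutadapt` executable on the PATH with its `-a` (3' adapter), `-g` (5' adapter), `-b` (either end) and `-o` (output file) options. Inside Galaxy, the same choices are made through the tool form rather than a command line.

```python
import shlex


def build_cutadapt_command(reads, trimmed, three_prime=(), five_prime=(), either_end=()):
    """Return the argument list for a cutadapt run (hypothetical helper).

    Each adapter option may be repeated, which is how several adapters
    are passed in a single run.
    """
    cmd = ["cutadapt"]
    for seq in three_prime:
        cmd += ["-a", seq]  # adapter expected at the 3' end of the read
    for seq in five_prime:
        cmd += ["-g", seq]  # adapter expected at the 5' end of the read
    for seq in either_end:
        cmd += ["-b", seq]  # adapter that may sit at either end
    cmd += ["-o", trimmed, reads]
    return cmd


if __name__ == "__main__":
    # Placeholder file names and adapter sequences, for illustration only.
    cmd = build_cutadapt_command(
        "reads.fastq",
        "trimmed.fastq",
        three_prime=["AGATCGGAAGAGC"],
        five_prime=["ACGTACGT"],
    )
    # Print the command instead of running it; with cutadapt installed it
    # could be executed with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to inspect the exact command before committing a large dataset to it.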
Re: [galaxy-user] help for trim sequences
Thanks Geert,

This tool was in the back of my mind, but I couldn't find it last week for some reason!

Seung Hee - this is a very good choice for use in a local or cloud Galaxy.
http://getgalaxy.org
http://usegalaxy.org/cloud

I think I will close out the open ticket (https://trello.com/c/CbFSHrU5) and point it to CutAdapt as a solution. A ticket asking for this tool to be added to Main is a distinct subject/issue - if someone wants to submit that request, the community can vote, the team can prioritize, etc.
http://wiki.galaxyproject.org/Issues

The tool Peter mentions (http://toolshed.g2.bx.psu.edu/view/peterjc/seq_primer_clip) can also be examined. One may fit your needs better than the other.

Thanks!!

Jen
Galaxy team

--
Jennifer Hillman-Jackson
http://galaxyproject.org
Re: [galaxy-user] help for trim sequences
Thanks Peter for another option!

Jen
Galaxy team

--
Jennifer Hillman-Jackson
http://galaxyproject.org
[galaxy-user] Identifying Genes
I am very new to Galaxy. We have performed a comparative analysis between the transcriptomes of different samples, using tools in Galaxy (TopHat, Cuffdiff, etc.). My PI has compiled a list of all the genes differentially expressed between each set, each in a separate Excel sheet. So what I have is an Excel spreadsheet with a list (usually around 300 entries) of test id, gene id, and locus (ChrX:1-222).

Initially, we have been identifying each gene individually, one at a time, by pasting the locus into the UCSC browser. This works, but is incredibly tedious. There has to be a better way in Galaxy. I have tried making BED files out of the loci, but so far I have been unable to identify genes using Galaxy.

Can someone please explain how I can take my long list of loci and get gene names, IDs, functions, and possibly some downstream comparative ontologies to begin analyzing? Like I said, I am very new to Galaxy and genomics.

Thanks very much
[galaxy-user] How to know callable loci after variant calling in GATK ?
Hi all,

Would someone know how to find out which areas of the genome are considered callable after a call with the UnifiedGenotyper from GATK (as a .bed or pileup file)?

Thanks for your help/advice,
Fabrice

--
Fabrice Besnard
Institute of Biology of the Ecole Normale Supérieure (IBENS)
46 rue d'Ulm, 75230 Paris cedex 05, France
8th floor. Office: Room 802. Lab: Room 817.
mail: fbesn...@biologie.ens.fr
Tel: +33-1-44-32-39-31
Re: [galaxy-user] Identifying Genes
Hi Jacob,

Using the tool "Get Data - UCSC Main table browser", data can be retrieved directly using either gene symbols or locus positions. A good track to query is UCSC Genes, if available for your genome. RefSeq Genes is another good choice. But really any track in the group "Gene and Gene Prediction Tracks" is worth a look to see if it fits what you are interested in, as the content can vary between genomes and even builds. The specifics can be reviewed at UCSC by clicking into the "describe table schema" area (the button next to the table selection; start with the default table).

To search multiple gene symbols, enter the list in the form under "identifiers". To search multiple loci, enter the list under "region" ("define regions"). Both accept a text file, so you can cut the relevant column out of the original file in Galaxy, format it as the UCSC form states, and download it as text (tabular). Or, export as text from the Excel spreadsheet. 300 should be fine at once; I believe the limits are around 1000 per query for each of these.

At this point in the query, the extract would just pull basic data from the single primary table. To also pull out related information, change the output file type to "selected fields from primary and related tables" and then click on "get output". The next form is where you can link in additional tables of data. The general idea is to add the table, then select the specific fields that you want to include. Again, any of these can be reviewed before the final query is made, using the first main form and then the "describe table schema" button or, once in that describe view, by clicking on related tables to navigate. When doing the query this way, the Table Browser takes care of the relational joins for you, just as an SQL query would.
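
To make the "relational join" idea concrete, here is a small self-contained sketch using Python's built-in `sqlite3` module. The two tables are simplified, invented stand-ins loosely modeled on UCSC's `knownGene` and `kgXref` tables (the rows, coordinates, symbols, and descriptions are made up for illustration), and `build_demo_db` and `genes_overlapping` are hypothetical helpers. The Table Browser does the equivalent behind its forms, so none of this is needed to use it.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding two tiny, invented tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE known_gene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
        CREATE TABLE kg_xref (kgID TEXT, geneSymbol TEXT, description TEXT);
        INSERT INTO known_gene VALUES
            ('tx1', 'chrX', 100, 500),
            ('tx2', 'chrX', 9000, 9500),
            ('tx3', 'chr1', 150, 400);
        INSERT INTO kg_xref VALUES
            ('tx1', 'GENE_A', 'made-up description A'),
            ('tx2', 'GENE_B', 'made-up description B'),
            ('tx3', 'GENE_C', 'made-up description C');
        """
    )
    return conn


def genes_overlapping(conn, chrom, start, end):
    """Return (transcript, symbol, description) rows overlapping a region.

    Coordinates are treated as 0-based, half-open: two intervals overlap
    when each one starts before the other ends.
    """
    query = """
        SELECT g.name, x.geneSymbol, x.description
        FROM known_gene AS g
        JOIN kg_xref AS x ON x.kgID = g.name
        WHERE g.chrom = ? AND g.txStart < ? AND g.txEnd > ?
        ORDER BY g.txStart
    """
    return conn.execute(query, (chrom, end, start)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # A 1-based locus such as chrX:1-222 becomes start=0, end=222 here.
    for row in genes_overlapping(conn, "chrX", 0, 222):
        print(row)
```

The `JOIN ... ON x.kgID = g.name` line is the part the "selected fields from primary and related tables" output handles for you: the position lives in one table and the symbol and description in another, linked by a shared identifier.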
For more help about using the UCSC Table Browser, the links below are good places to start, and for detailed questions about a specific piece of data that you cannot locate, the support team for the browser can almost certainly help. The Table Browser is not your only option (flat text files and a MySQL database are also available), but it is a web-based access point to the information, easily imported into Galaxy or downloaded for further analysis. There are also other types of queries possible, at UCSC and in Galaxy; this is just the most direct I know of for your question and original data:
https://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
https://genome.ucsc.edu/FAQ/FAQmaillist.html

One note: you have the locus position with a chromosome identifier in the format ChrX in your email. I am not sure if this was intentional or not - but you will need to format the identifiers to match those in the target reference genome, just as they were in the original analysis. In general, this would mean the format would be chrX instead (case matters). So, check and adjust the case/format to avoid problems; these really do have to be an exact match. The same is true for gene names/symbols - you can always search in the browser to see what the format is if something is missing, and adjust.

Also make sure that Excel does not output any hidden characters (line wraps) - stick with plain text cells for best results if you plan to output/use the data with external tools. You probably know most of this, but just in case, I wanted to point out where the gotchas could be. Even if using gene names for this, you may want to use the position later on, and identifiers in the correct format from the start are a good idea.

Hopefully this gets you started!

Jen
Galaxy team
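
The identifier-format gotchas raised in this reply (chromosome case, stray whitespace from spreadsheet exports) can be handled in a few lines before a list of loci is pasted into the Table Browser or uploaded to Galaxy. The sketch below is one possible approach, not an official Galaxy or UCSC utility: `parse_locus` and `to_bed_line` are hypothetical helpers, and it assumes loci look like `ChrX:1-222` with 1-based inclusive coordinates and that the reference genome uses UCSC-style `chr` names (check both against your own data and genome build). BED uses 0-based, half-open coordinates, so the start is shifted down by one.

```python
import re

# Optional 'chr' prefix in any case, then the chromosome name, then
# start-end; thousands separators such as 1,000 are tolerated.
LOCUS_RE = re.compile(r"^\s*(?:chr)?([^:\s]+):([\d,]+)-([\d,]+)\s*$", re.IGNORECASE)


def parse_locus(text):
    """Parse a locus such as 'ChrX:1-222' into ('chrX', 1, 222)."""
    match = LOCUS_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized locus: {text!r}")
    name, start, end = match.groups()
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"bad coordinates in locus: {text!r}")
    # Normalize only the prefix; the rest of the name is kept as written.
    return "chr" + name, start, end


def to_bed_line(chrom, start, end):
    """Convert 1-based inclusive coordinates to a 3-column BED line."""
    return f"{chrom}\t{start - 1}\t{end}"


if __name__ == "__main__":
    for raw in ["ChrX:1-222", " CHR2:1,000-2,000 "]:
        chrom, start, end = parse_locus(raw)
        print(f"{chrom}:{start}-{end}")         # form for the region box
        print(to_bed_line(chrom, start, end))  # form for a BED upload
```

Only the `chr` prefix is rewritten here; if the rest of a name differs in case from the reference (for example `chrx` versus `chrX`), that still has to be fixed by checking against the genome's own chromosome list.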
Re: [galaxy-user] help for trim sequences
Hi Seung Hee,

You can request that this tool be added to the public Main server at usegalaxy.org through Trello, and the team will consider it. For right now, the options are local or cloud (as in my other reply). Or, you can look around the other public servers hosted by our community - each is run by a distinct group with its own contact/help/public-use criteria:
http://wiki.galaxyproject.org/PublicGalaxyServers

It may be simplest to see if a local install will do the job, then upload the results to the public server for downstream analysis. Just do the very basics of a production server install and then add the tool to test it out. This will take some command-line work to set up, but shouldn't be too much of an investment. The links are:
http://getgalaxy.org
http://usegalaxy.org/toolshed
http://wiki.galaxyproject.org/Tool%20Shed#Installing.2C_maintaining_and_uninstalling_tool_shed_repositories_within_a_Galaxy_instance

Local install help/discussion: galaxy-...@bx.psu.edu
Subscribe or search prior Q/A: http://wiki.galaxyproject.org/MailingLists

Take care,
Jen
Galaxy team

On 11/25/13 11:29 AM, Seung Hee Cho wrote:
> Thank you so much for your great help! I am trying to use this tool, but I am wondering if I can use the CutAdapt tool on the public server. I have been working on the public server, so if not, I will need to download it for use. I truly appreciate your help!
>
> Best,
> Seung Hee Cho
> Contreras Research Group, CPE 5.416
> The University of Texas at Austin
> Department of Chemical Engineering
> 200 E Dean Keeton St. Stop C0400
> Austin, TX 78712-1589

--
Jennifer Hillman-Jackson
http://galaxyproject.org
[galaxy-user] Transcriptome Hypericum perforatum
To whom it may concern,

I would like to kindly ask whether anyone with experience in de novo transcriptomic analysis (no reference genome available) might give us some advice. Our main question is how to create the best set of cDNA contigs, onto which we can map our RNA-seq reads for the analysis of differential expression.

Currently, 4 larger sets of RNA-seq reads are available from different genotypes, as well as a draft genome assembly for one of the genotypes. We worry about SNPs in the different genotypes affecting the assembly if we combine all the RNA-seq datasets and use assemblers such as Trinity, Oases, or Velvet. Might it be better to use the draft genome assembly to obtain cDNA contigs using TopHat/Cufflinks, with either all available RNA-seq data or only the RNA-seq data from the same genotype as the genome draft?

Thank you in advance.

Best wishes,
Miro Sotak
Re: [galaxy-user] Transcriptome Hypericum perforatum
Hello Miro,

for this kind of general question I would recommend asking in the bioinformatics forum at http://www.biostars.org/ as it is somewhat unrelated to Galaxy. Nevertheless, some of the tools you mentioned are installed and available on the main instance (usegalaxy.org), and some you can install on your own Galaxy via the Tool Shed (http://toolshed.g2.bx.psu.edu/).

best
Martin, Galaxy Team
Re: [galaxy-user] Transcriptome Hypericum perforatum
Hello,

Interesting genome. I see that SRA has some public RNA-seq data, but there isn't much else going on. And your goal is to characterize the expression for observed phenotypes (linked to known genotypes)? If you use the Tuxedo suite after assembly (Trinity or other), differential expression of alternative splicing is one of the discovery outputs.

From my experience (and others are welcome to add comments), most SNP differences (_single_ base polymorphisms) do not in general impact the global assembly of whole-genome data. Larger insertions/deletions are where you will observe differences. But that is DNA. For transcript assembly, including RNA-seq, novel isoforms per sample, and in particular rare events like SNPs, can become diluted when multiple samples are directly combined and assembled together straight de novo.

Still, obtaining full-length cDNAs is certainly possible. It has been done in just about the same way, with various types of RNA data, for a very long time (most of RefSeq started out that way). The downside is that the most common variant can overwhelm the others, but with a plant you might have that issue anyway, depending on ploidy. So, test for yourself. Genomes can vary and the tools are so interesting - "the same way" is a gross generalization on my part; in the specifics the tools are very sophisticated.

And, most importantly, as you do have a reference genome to use as a guide (and that is really an invaluable tool not to be ignored), be sure to incorporate it unless it is from a sample that is known to be significantly, unacceptably different from the wildtype. It sounds like the quality has been assessed as unacceptable for direct use as a reference genome for some reason (correct? Or you just want to build up the cDNA set - great project!). But the genome can still be utilized. Specifically, using it as an early-stage assembly guide will give you a huge advantage, in my opinion (some assemblers cluster the data first by mapping - you want this if possible).

But again, you could try it both ways and check out a few genes to see how the transcript profile worked out (versus any knowns - comparative is OK, I always used these when I did this type of work), plus use what are, to me, the truth metrics of transcript assembly: how many singletons did you end up with (and what do they map to? can they really be ignored?), and how many over-clustered genes did you get (interesting, sparser genes gobbled up by abundant housekeeping genes)? Under-clustered genes/transcripts or incomplete transcripts are other factors, but depending on how you set the parameters in Cufflinks, this may be less important, if it isn't a pathological problem.

Many people will have advice about this, so ask, but also test. Looking at the results will tell you if the path is right.

I hope this helps a little bit!

Jen
Galaxy team

--
Jennifer Hillman-Jackson
http://galaxyproject.org