Re: [galaxy-user] Trackster Error
Hi Suzanne,

Can you share your history with me and I’ll take a look?

Thanks,
J.

--
Jeremy Goecks
Assistant Professor of Computational Biology
George Washington University

On Mar 24, 2014, at 5:08 PM, Suzanne Gomes suzanneluziago...@gmail.com wrote:

> Hello,
>
> I am trying to look at an output from Tophat using Trackster, but I keep getting the following error:
>
> Could not load chroms for this dbkey: dp4
>
> This is not a custom dbkey - I just selected it from the list of available ones on Trackster. I have database/build set to: D. pseudoobscura (dp4) (dp4) for my Tophat results. Any ideas why this is happening and what I can do to fix it?
>
> Thanks
> Suzanne

___
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client.

For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Re: [galaxy-user] Trackster Error
Hi Suzanne,

Thanks for sharing your history. This is a file format issue on our side. We’ll get it taken care of and let you know when it’s fixed.

Thanks,
J.

--
Jeremy Goecks
Assistant Professor of Computational Biology
George Washington University
Re: [galaxy-user] Questions regarding Circster visualization
> Indeed, there's another feature I don't fully understand: I have a bigWig file that contains reads of only one chromosome. I expected that Circster would display this one chromosome as one circle, but apparently Circster always draws a circle where all possible chromosomes of a genome are displayed. I think the usability would greatly increase if Circster only displayed those chromosomes that are actually represented in the coverage file. (Of course, I could zoom in, but if you're working with a chromosome that's very small in comparison, e.g. the Y chromosome, the circular representation is not really visible anymore, as the region covered by the Y chromosome is so tiny compared to the autosomes.)

Circster is really for genome-wide visualization, and the assumption is that you'll have data for many if not all chromosomes. If you have data for only a single chromosome, using Trackster (Galaxy's track browser) makes more sense; Trackster is also more developed and has more display options right now.

Let's say, then, that what you're proposing is a very advanced feature that could be implemented down the road.

Best,
J.
Re: [galaxy-user] Questions regarding Circster visualization
> 1. I tested it using a bigWig and a BED file. Both were loaded nicely in Circos, but I was surprised to see that the visualization of both files looked exactly the same, i.e. both file types seemed to be interpreted as histograms/coverage data. From the Circos plots I've seen in publications, I assumed that BED files should be visualized as straight lines, indicating genome regions (rather than a coverage). Am I doing anything wrong? Or, rather, how should I modify the BED file so that its content is simply interpreted as genomic regions?

This is a limitation of the visualization, and it should be addressed. I've created a Trello card for this enhancement that you can follow here: https://trello.com/c/YIdx6QvV

> 2. In the Galaxy publication (www.biomedcentral.com/1471-2164/14/397), line data is mentioned for displaying connecting lines in the center of the circle - could you give me an example line of how this kind of data needs to be formatted?

The format is a 7-column tabular file with tab-separated values:

--
chrom1 start1 end1 chrom2 start2 end2 score
--

Score isn't used right now, but it still needs to be there. Once you have this format, you'll need to convert the datatype from 'tabular' to 'chrint' in order to visualize it (click on the pencil icon, then Datatype). Also, I have a workflow up to convert Tophat fusion output data to chrint format here: https://usegalaxy.org/u/jeremy/w/tophat-fusion-post-output-to-chrint

Sorry for the cryptic nature of everything right now. We'll get this info and more up on a wiki page eventually (you're welcome to start one in the meantime).

Let us know if you have more questions.

Best,
J.
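Editor's note: the 7-column layout described in the reply above can be produced with a few lines of Python. This is only a sketch of the column order given in the thread; the link coordinates, the `to_chrint` helper name, and the placeholder score of 0 are assumptions made for illustration, not anything Galaxy prescribes.

```python
import csv
import io

# Hypothetical links between two genomic regions; coordinates are made up.
links = [
    ("chr1", 1000, 2000, "chr5", 50000, 51000),
    ("chrX", 300, 900, "chr2", 7000, 7600),
]


def to_chrint(rows, placeholder_score=0):
    """Return tab-separated text with the 7 columns
    chrom1 start1 end1 chrom2 start2 end2 score."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for chrom1, start1, end1, chrom2, start2, end2 in rows:
        # The score column is reported as unused, but it must be present.
        writer.writerow(
            [chrom1, start1, end1, chrom2, start2, end2, placeholder_score]
        )
    return out.getvalue()


chrint_text = to_chrint(links)
print(chrint_text, end="")
```

After uploading a file like this, the datatype would still need to be changed from 'tabular' to 'chrint' as described in the reply.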
Re: [galaxy-user] Creating a Trackster visualisation from a reference in your history
> Is it possible to create a custom build and use it to view a SAM file without adding the .len and .2bit files into the Galaxy file system as an administrator?

Yes, it definitely is.

> If so, what am I doing wrong?

This is a Galaxy bug which has been fixed in this commit: https://bitbucket.org/galaxy/galaxy-central/commits/117fef56513fc563dd231516196cfd601c1635e2

We have a release coming up, so this fix will be included in the release and will make it to our public server soon. In the meantime, note that you can use the genome fasta file rather than the len file to create a custom build and everything should work.

Thanks,
J.
Re: [galaxy-user] Trackster Error chrom/?.len , No such file or directory
You'll need to set your dataset's database/dbkey to your custom reference genome before you can visualize it. We have enhancements planned so that this error doesn't happen in the future.

Best,
J.

On Dec 5, 2013, at 7:56 AM, Jasper Jan Koehorst jasper.koeho...@wur.nl wrote:

> I have my own genome fasta file containing 1 chromosome with a modified header so that it looks like:
>
> chr1
> ATGCATGC
>
> I did a FASTQ mapping on it via the Galaxy interface and now I end up with a bam file:
>
> 9 Bowtie2 on data 6, data 8, and data 7: aligned reads
> 1.2 GB
> format bam
> database ?
>
> I use the visualize button to start the visualization of the dataset. I chose Trackster, and view it in a new visualization. I use my fasta file as a reference genome:
>
> Name Key Number of chroms/contigs
> STPmg315 STPmg315_v1 1
>
> But then I get the error:
>
> Couldn't open /home/galaxy/galaxy-dist/tool-data/shared/ucsc/chrom/?.len , No such file or directory
>
> I looked into the /chrom/ folder and of course ? does not exist. I am currently running
>
> python ./cron/build_chrom_db.py ./tool-data/shared/ucsc/chrom/
>
> But this of course downloads only known genomes and their chromosome information. As I have my own genome I was curious how to continue with this.
>
> I manually created a file in the /chrom/ folder so that it looks like this:
>
> head STPmg315.len
> chr1 1900521
>
> but no luck so far. What else do I have to do to make it work?
>
> Thanks,
> Jasper
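Editor's note: the two-column layout Jasper shows for his .len file (sequence name, then length) can be derived from a FASTA file instead of being typed by hand. The sketch below assumes standard FASTA headers starting with `>`; the function name `fasta_lengths` and the tiny in-memory example are hypothetical. As the reply says, the dataset's database/dbkey still has to be set to the custom genome for Trackster to find it.

```python
import io


def fasta_lengths(handle):
    """Map each sequence name to its total length in bases."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Tiny stand-in for a real genome file (assumption for illustration).
example_fasta = io.StringIO(">chr1\nATGCATGC\nATGC\n")
chrom_lengths = fasta_lengths(example_fasta)

# One "name<TAB>length" line per sequence, as in the .len file above.
len_lines = [f"{name}\t{length}" for name, length in chrom_lengths.items()]
print("\n".join(len_lines))
```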
Re: [galaxy-user] Contents of a SAM file
All reads are in the SAM file; you can filter to remove unmapped reads as needed.

J.

On Nov 7, 2013, at 5:36 AM, Benjamin Osei-agyeman benjy_o...@yahoo.co.uk wrote:

> Hi
>
> What are the contents of a SAM file after Bowtie has been run? Does it contain all reads or only those reads mapped to the genome?
>
> Thanks
> Benjy
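Editor's note: filtering out unmapped reads, as suggested in the reply, comes down to checking one bit of the FLAG column. In the SAM format, FLAG is the second tab-separated field and bit 0x4 marks an unmapped read. The sketch below works on plain SAM text; the function name `mapped_only` and the three example records are assumptions for illustration, and dedicated tools are the usual choice for real data.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def mapped_only(sam_lines):
    """Yield header lines and alignment lines whose FLAG lacks the 0x4 bit."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if int(fields[1]) & UNMAPPED:
            continue  # unmapped read, drop it
        yield line


# Minimal made-up records: one header, one mapped read, one unmapped read.
example = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "read1\t0\tchr1\t100\t255\t8M\t*\t0\t0\tATGCATGC\tIIIIIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII\n",
]
kept = list(mapped_only(example))
print("".join(kept), end="")
```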
Re: [galaxy-user] galaxy-user Digest, Vol 89, Issue 4
> Thanks for the info. However, my problem is that the Tool Version field is completely empty in my history items (e.g. Tophat2, Cuffdiff). I suppose I can check the dependencies list you described, but it would be important to know precisely which version was run on any given query.

If you ran Cuffdiff in the last couple of months, you used version 2.1.1; before that it was 1.3.x. The version information was added in the last couple of weeks, which is why you don't see it. Any runs going forward should include the version.

J.

> Best regards,
> Cory
>
> Message: 3
> Date: Tue, 5 Nov 2013 09:45:19 +
> From: graham etherington (TSL) graham.ethering...@sainsbury-laboratory.ac.uk
> To: Cory Dunn cd...@ku.edu.tr, galaxy-u...@bx.psu.edu galaxy-u...@bx.psu.edu
> Subject: Re: [galaxy-user] Cuffdiff version not apparent
> Message-ID: ce9e6d0a.21768%graham.ethering...@sainsbury-laboratory.ac.uk
> Content-Type: text/plain; charset=Windows-1252
>
> Hi Cory,
>
> A list of Galaxy dependencies can be found on the wiki at: http://wiki.galaxyproject.org/Admin/Tools/Tool%20Dependencies ...although many tools allow a range of tool versions.
>
> You can also identify the information about the specific tool versions by clicking on the View Details 'i' icon of a history item created by that tool and looking at the Tool Version field. If you're using the Galaxy public server (https://usegalaxy.org/) then clicking on the 'i' icon of a cuffdiff output file will show:
>
> Tool Version: cuffdiff v2.1.1 (4046M)
>
> Hope this helps.
>
> Cheers,
> Graham
Re: [galaxy-user] Trackster Error: needLargeMem: trying to allocate 0 bytes (limit: 100000000000)
It turns out that your artificial test is a bit too artificial. In order to display a coverage plot, Trackster converts reads in a BAM to BigWig using a two-step process:

(1) BAM to bedgraph;
(2) bedgraph to bigwig

Your super simple example generates an empty file in step 1 because your single read does not map to Araly1, and the tool used in step 2 (bedGraphToBigWig) fails with the error that you're seeing. This is a corner-case bug, and I've created a card for this so you can track its resolution: https://trello.com/c/kMFUNawL

Best,
J.

On Oct 30, 2013, at 5:44 PM, Guest, Simon simon.gu...@agresearch.co.nz wrote:

> I'm having problems getting Trackster working on my own Galaxy instance, so I thought I would check on the usegalaxy public server. However, I'm getting the same
>
> Trackster Error: needLargeMem: trying to allocate 0 bytes (limit: 1000)
>
> that was reported on this list in July, but there was no followup: http://user.list.galaxyproject.org/Trackster-Error-td4655737.html
>
> My history is at https://usegalaxy.org/u/simon-guest/h/trackster-error
>
> This is just an artificial test I made using a fragment of a reference genome, but I thought it should work OK. Any clues?
>
> cheers,
> Simon
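Editor's note: the corner case described in the reply (an empty bedgraph from step 1 reaching bedGraphToBigWig in step 2) can be guarded against by checking for data lines before building the second command. This is a sketch of that idea only, not how Galaxy resolved the bug; the helper names and file names are hypothetical, and the command is assembled but not run. The argument order follows the usual `bedGraphToBigWig in.bedGraph chrom.sizes out.bw` usage.

```python
import os
import tempfile


def has_data_lines(bedgraph_path):
    """True if the bedgraph holds at least one interval line."""
    with open(bedgraph_path) as handle:
        for line in handle:
            stripped = line.strip()
            # Ignore blank lines, comments, and track/browser declarations.
            if stripped and not stripped.startswith(("#", "track", "browser")):
                return True
    return False


def bigwig_command(bedgraph_path, chrom_sizes_path, bigwig_path):
    """Return the step-2 command as a list, or None for an empty bedgraph."""
    if not has_data_lines(bedgraph_path):
        return None
    return ["bedGraphToBigWig", bedgraph_path, chrom_sizes_path, bigwig_path]


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty.bedgraph")
    open(empty, "w").close()  # mimics step 1 producing no coverage
    full = os.path.join(tmp, "reads.bedgraph")
    with open(full, "w") as handle:
        handle.write("chr1\t0\t100\t3\n")
    empty_cmd = bigwig_command(empty, "chrom.sizes", "empty.bw")
    full_cmd = bigwig_command(full, "chrom.sizes", "reads.bw")

print(empty_cmd)
print(full_cmd)
```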
Re: [galaxy-user] [galaxy-dev] Bam File
Hello,

First, I've moved this question from the Galaxy development mailing list to the Galaxy user mailing list; in the future, please send questions about using Galaxy to the galaxy-user list.

To answer your question, files larger than 2GB must be uploaded via FTP to Galaxy. This is necessary due to Web browser limitations. Help with using FTP is on this wiki. The screencasts both show the two-step process. The first step is to FTP the data to the server; the second is to move the data from the Get Data - Upload Data tool form into your history.

http://wiki.galaxyproject.org/FTPUpload

Best,
J.

On Oct 28, 2013, at 11:55 AM, Arshad Rafiq arshadrafi...@gmail.com wrote:

> I am trying to upload a bam file for my data analysis (size is about 9GB). I am trying the URL method to upload and am getting an error message; can you please help me to sort out this problem? I am seeing the following message:
>
> An error occurred setting the metadata for this dataset. You may be able to set it manually or retry auto-detection
>
> Thanks
> Arshad
>
> --
> Muhammad Arshad Rafiq, PhD
> Research Associate
> Laboratory of Dr. R. Hamilton
> Physiology and Experimental Medicine
> Research Institute
> The Hospital for Sick Children
> McMaster Building, Room 7005
> 88 Elm St.
> Toronto, ON M5G 1X8 Canada
> 647-237-4915
> arshadrafi...@gmail.com
Re: [galaxy-user] problems in transitioning from Tophat to Cuffdiff
Are you reporting a bug for each failed Cuffdiff run? That's the easiest way for the Galaxy team to help you out.

One thing to keep in mind is that, for now, spaces are not allowed in condition names. We'll address this problem soon.

Best,
J.

On Oct 22, 2013, at 5:42 AM, Elwood Linney ellin...@gmail.com wrote:

> After successfully using RNAseq software in Galaxy online for about 10 different datasets to just get gene expression differences between replicates from control versus exposed zebrafish embryos, I am having no luck getting cuffdiff to work with the moved Galaxy.
>
> I had this problem with histories developed before the move and histories developed after the move.
>
> I have had this problem using an older cuffmerge gtf file that worked in the past in Cuffdiff, with a new cuffmerge file developed from cufflinks of the files, and by just using a ref file gtf from UCSC.
>
> I don't know if this is just some interface problem with a different version of the software that was included with the move, or a reference genome that does not interface with Cuffdiff.
>
> It has happened with about 5 different histories.
>
> Is anyone else having this problem? And found a solution?
>
> Elwood Linney
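Editor's note: since the reply says spaces are not allowed in condition names for now, one simple workaround is to normalise the names before entering them in the Cuffdiff form. The sketch below replaces runs of whitespace with underscores; the helper name and the example labels are hypothetical, and underscores are just one reasonable substitute.

```python
import re


def safe_condition_name(name):
    """Trim the name and replace each run of whitespace with one underscore."""
    return re.sub(r"\s+", "_", name.strip())


# Hypothetical condition labels for illustration.
labels = ["control embryos", "exposed  24h", "wildtype"]
safe_labels = [safe_condition_name(label) for label in labels]
print(safe_labels)
```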
Re: [galaxy-user] CloudMap
This sounds like an issue with importing workflows. Additional details will help others provide help:

(1) What version of Galaxy are you using?
(2) What workflow are you trying to import?
(3) What steps have you taken that produce the error that you're seeing?

Thanks,
J.

On Sep 13, 2013, at 3:32 PM, Isaac Knoflicek wrote:

> Has anyone out there gotten CloudMap to work on a local Galaxy instance?
>
> https://main.g2.bx.psu.edu/cloudmap
>
> I believe I have all the prerequisites installed but when I try to import any of the published workflows I get a “TypeError: expected string or buffer”.
>
> Any advice would be greatly appreciated.
>
> Thanks,
> Isaac Knoflicek
> IT Manager – Laboratory of Genetics
> University of Wisconsin - Madison
Re: [galaxy-user] Cuffdiff changes
> Where can I see which versions are being used?

You can see both the Galaxy tool version and the Cuffdiff tool version (when available) by clicking on the 'view details' icon (the 'i' at the bottom of an expanded dataset). Right now the Cuffdiff version is not displayed, but that will change when our server is updated.

> What does Cuffdiff (version 0.0.5) mean then?

That is the version of the Galaxy wrapper; the wrapper provides the interface between Cuffdiff and Galaxy.

> What version was it before?

I think the Cuffdiff version was 1.3.1 previously.

> I look forward to the update, will that mean another version of Cuffdiff again?

The wrapper will be updated but not Cuffdiff itself.

J.
Re: [galaxy-user] Cuffdiff-cummerbund with biological replicates problem
In the past, others have had success using Cummerbund with Galaxy, and there's even a Cummerbund wrapper in the tool shed: http://toolshed.g2.bx.psu.edu/view/jjohnson/cummerbund

That said, it appears that replicate information is largely contained in the read group tracking files, which are not currently included in Galaxy's Cuffdiff outputs. I don't know if these files are required by Cummerbund to do replicate analysis. This would be a good question for the Cummerbund developers, as well as what the p and q values mean when doing replicate analysis.

If you find that Galaxy's lacking something for Cummerbund to function correctly, that would be very useful information to share with the list.

Best,
J.

On Jul 26, 2013, at 8:50 PM, Mike Shamblott wrote:

> I'm trying to run Cuffdiff on a set of 10 human samples with biological replication, then download the results for further analyses in Cummerbund (v2.1.1). It seems like a standard workflow but I cannot get cummerbund to acknowledge replicates. I download and rename the 11 cuffdiff output files to the names expected by cummerbund. Cummerbund builds a CuffSet with no warnings and most analyses work as expected.
>
> The problem comes any time I try to see the results of replication. For example, in cummerbund, replicates() returns an empty set and any type of plot returns an error when replicates=T is included as an argument. There is no evidence of replication data in any of the 11 cuffdiff output files. The data is presented with the group name only. From this, I conclude that the problem is with cuffdiff, since there is no replicate data for cummerbund to build into its db.
>
> I see that there are several read group files that are produced by cuffdiff but cannot be downloaded in Galaxy. Is this the problem, and if so, how can Galaxy be used to generate data with (essential) replication? Are the p and q significance values reported in the output files a result of replicate analysis?
>
> I have tried to ask this question in several different forums without success. The responses I've gotten suggest it's a Galaxy issue rather than either cuffdiff or cummerbund. I'm hoping someone here can help answer my questions.
>
> Hopeful,
> Mike
Re: [galaxy-user] How to define the cutoff value of RPKM for expressed genes?
The confidence intervals provided by Cufflinks/Cuffdiff are a good place to start; any confidence interval that includes 0 should be looked on skeptically.

Good luck,
J.

On Jul 3, 2013, at 2:52 PM, Hoang, Thanh wrote:

> Hi all,
>
> I have been working on RNA-seq data analysis using TopHat and Cuffdiff. One of the problems I have is defining the cutoff RPKM value to tell whether a gene is expressed above the background noise. Could anybody give me a suggestion?
>
> Thank you
> Thanh
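Editor's note: the rule of thumb in the reply (be skeptical of any confidence interval that includes 0) can be applied mechanically by keeping only genes whose lower confidence bound is above zero. The sketch below assumes a tab-separated tracking table with columns named `tracking_id`, `FPKM`, `FPKM_conf_lo`, and `FPKM_conf_hi`; those column names and the three example rows are assumptions for illustration, so check them against your own Cufflinks or Cuffdiff output.

```python
import csv
import io

# Made-up example rows; real tracking files have more columns.
SAMPLE = (
    "tracking_id\tFPKM\tFPKM_conf_lo\tFPKM_conf_hi\n"
    "geneA\t12.5\t8.1\t16.9\n"
    "geneB\t0.4\t0\t1.3\n"
    "geneC\t3.2\t0.0\t7.0\n"
)


def confidently_expressed(handle, id_column="tracking_id",
                          lo_column="FPKM_conf_lo"):
    """Return IDs whose lower confidence bound is strictly above zero."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[id_column] for row in reader if float(row[lo_column]) > 0.0]


expressed = confidently_expressed(io.StringIO(SAMPLE))
print(expressed)
```

This treats the interval check as a first-pass filter only; it is not a substitute for choosing a cutoff suited to the experiment.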
Re: [galaxy-user] View details of Tophat alignment
Nothing is wrong with your job; this is a bug in our code that has been corrected. You'll start seeing the correct parameter values again when we update our server early next week.

Best,
J.

On May 29, 2013, at 11:05 PM, Du, Jianguang wrote:

> Hi All,
>
> After I finished Tophat alignment for RNA-seq, I took a look at the details of parameters by clicking the icon View details, and I got the information as shown below:
>
> Input Parameter | Value | Note for rerun
> RNA-Seq FASTQ file | 73: Filtered Groomed data1_rep2 |
> Use a built in reference genome or own from your history | indexed |
> Select a reference genome | /galaxy/data/mm9/bowtie_index/mm9 |
> Is this library mate-paired? | single |
> TopHat settings to use | full |
> Library Type | FR Unstranded |
> Anchor length (at least 3) | None |
> Maximum number of mismatches that can appear in the anchor region of spliced alignment | None |
> The minimum intron length | None |
> The maximum intron length | None |
> Allow indel search | No |
> Maximum number of alignments to be allowed | None |
> Minimum intron length that may be found during split-segment (default) search | None |
> Maximum intron length that may be found during split-segment (default) search | None |
> Number of mismatches allowed in the initial read mapping | None |
> Number of mismatches allowed in each segment alignment for reads mapped independently | None |
> Minimum length of read segments | None |
> Use Own Junctions | Yes |
> Use Gene Annotation Model | Yes |
> Gene Model Annotations | 1: mm9 genes.gtf |
> Use Raw Junctions | No |
> Only look for supplied junctions | No |
> Use Closure Search | No |
> Use Coverage Search | Yes |
> Minimum intron length that may be found during coverage search | None |
> Maximum intron length that may be found during coverage search | None |
> Use Microexon Search | No |
>
> I am totally confused by so many Nones. Then I checked the workflow I set and used for the TopHat alignment; the details are the same as above. However, the brief description just under the title of alignment output (. accepted hits) is as below:
>
> format: bam, database: mm9
> Tophat for Illumina on data 1 and data 73: accepted_hits, TopHat v1.4.0
> tophat -p 8 -a 8 -m 0 -i 70 -I 50 -g 20 -G /galaxy/main_pool/pool1/files/004/425/dataset_4425972.dat --library-type fr-unstranded --no-novel-indels --coverage-search --min-cove
>
> Could you please tell me whether there is anything wrong (because of so many Nones in the detail parameters)?
>
> Thanks.
> Jianguang DU
Re: [galaxy-user] Are reads of 36nt in length long enough to accutatly map on splicing junctions?
> 1) My reads are 36nt long. What should I set for the Minimum length of read segments to get the most reliable output with the highest mapping of splicing junctions? In my previous run of TopHat, I set it to 18. Can I reduce it further to get better mapping on splicing junctions?
You'll need to define for yourself what you mean by better/best mapping and experiment to find the parameters that give you the best results.
> 2) I do not understand exactly how TopHat handles the Anchor length, although I have read the TopHat manual. Suppose I set the Anchor length to 8 and the Maximum number of mismatches that can appear in the anchor region of spliced alignment to 0 when I run TopHat. Does it mean that, for a read that maps on two adjacent exons, TopHat will report this alignment in the outputs .accepted hits and .splicing junctions if either end of the read has 8 or more nucleotides mapping on one exon?
I think that's correct.
> 3) Is there a disadvantage/negative effect if I set the Anchor length to the lowest value, for example 3? My understanding is that, under the 0 mismatch condition, if 3 nucleotides of one end of a read map on one exon, the other part of the read will map on the adjacent exon (in my case, the other part would be 33 nucleotides). So my understanding is that setting the Anchor length to 3 does not increase the inaccuracy of the alignment. Am I correct?
Setting the anchor length especially small reduces the constraints on mapping, so more reads will map but there are likely to be more false positives as well. Good luck, J.
Re: [galaxy-user] Are reads of 36nt in length long enough to accutatly map on splicing junctions?
> I have one more question about the Anchor length. For an RNA-seq read mapped on the splicing junction under the 0 mismatch condition, if 5 nucleotides of one end map on one exon, does it mean the rest of the read must map on the adjacent exon? What I want to understand is this: although reducing the Anchor length may reduce the reliability of mapping on one end/exon, the increased number of mapped nucleotides on the adjacent exon may increase the reliability of mapping. Does it mean that overall the reliability of mapping is not changed?
No, in general the probability of mapping 5 bases + (N-5) remaining bases incorrectly is higher than mapping 8 bases + (N-8) bases incorrectly because (a) there are more matching 5-mers than 8-mers in a genome and (b) there can be mismatches when mapping the remainder. J.
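Point (a) above can be made concrete with a back-of-the-envelope calculation. The sketch below is an assumption-laden illustration, not TopHat's actual model: it treats the genome as a uniformly random sequence of A/C/G/T and considers one strand only, so a given k-mer is expected to occur about G / 4^k times by chance. The genome size is an illustrative round number and the function name is made up.

```python
def expected_chance_matches(anchor_len, genome_size):
    """Expected chance occurrences of one specific k-mer on one strand.

    Simplifying assumption: a uniformly random genome, so each position
    matches a given k-mer with probability (1/4) ** anchor_len.
    """
    return genome_size / 4 ** anchor_len


GENOME_SIZE = 2_700_000_000  # illustrative size only, not taken from the thread

for anchor in (3, 5, 8):
    hits = expected_chance_matches(anchor, GENOME_SIZE)
    print(f"anchor length {anchor}: ~{hits:,.0f} chance matches")
```

Under these assumptions a 5-base anchor has 4^3 = 64 times as many chance matches as an 8-base anchor, which is the intuition behind the extra false positives. Real genomes are repetitive rather than random, so the true numbers will differ.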
Re: [galaxy-user] Are reads of 36nt in length long enough to accutatly map on splicing junctions?
36bp reads will map across splice junctions but at a relatively low rate; you can try changing segment length to get better mapping, but you'll want to evaluate the results carefully to ensure that you're getting good results. Good luck, J. On Apr 8, 2013, at 5:45 PM, Du, Jianguang wrote: Hi All, I have a very basic question. I have RNA-seq datasets of several cell types and want to compare the alternative splicing events between cell types. The reads are 36nt in length. Are these reads long enough to map on the splicing junctions accurately when I run Tophat with stringent parameters (no mismatch)? Thanks. Best, Jianguang Du
Re: [galaxy-user] Are reads of 36nt in length long enough to accutatly map on splicing junctions?
> In addition to reducing the Minimum length of read segments, do I also need to reduce the Anchor length to get more mapping on splicing junctions?
Definitely worth a try.
> It looks like the setting for Anchor length only affects the number of mapped splicing junctions reported in the .splicing junctions output. Is my understanding correct?
No, it will affect mapped reads as well.
> Does "regions" mean the number of mapped splicing junctions?
Yes. Best, J.
Re: [galaxy-user] (no subject)
Cuffmerge does some additional steps that Cuffcompare does not; specifically, Cuffmerge attempts to remove assembly artifacts: http://cufflinks.cbcb.umd.edu/manual.html#cuffmerge It's likely that the (presumed) artifacts removed by Cuffmerge account for the differences that you're seeing. Best, J. On Apr 5, 2013, at 8:33 AM, Davide Degli Esposti wrote: Dear Galaxy team, I have a question about RNA analysis with the cufflinks package. I have some bam files to analyze from a SOLiD platform. Some previous tests show that these bam/sam files are different from those coming from Tophat and cufflinks cannot assemble them using a reference annotation (XS attribute lacking in spliced alignments). (see https://main.g2.bx.psu.edu/u/davide-degliesposti/h/rna-seqtest-datasetscufflinks). An apparent solution is to include the reference annotation in the cuffmerge (see https://main.g2.bx.psu.edu/u/davide-degliesposti/h/rna-seqtest-datasetsapril-20132) or cuffcompare (see https://main.g2.bx.psu.edu/u/davide-degliesposti/h/rna-seqtest-datasetsjan-2013-1) steps. Doing this allowed me to run cuffdiff on my datasets without apparent technical errors. However, when I compare the lists of differentially expressed transcripts (DETs), the results are extremely different: using cuffcompare I got 390 DETs and using cuffmerge I got 770 DETs, but just 60 genes are shared between the two lists. The parameters used in cuffdiff (FDR, Min Alignment counts, etc.) are the same for the two analyses. Do you have any explanation for that? I expected that cuffcompare and cuffmerge would not lead to quantitatively different outputs. Where may the source of this difference be? I thank you for your cooperation Davide --- Davide Degli Esposti, PhD Epigenetic (EGE) Group International Agency for Research on Cancer Tel. +33 4 72738036 Fax. +33 4 72738322 150, cours Albert Thomas 69372 Lyon Cedex 08 France
Re: [galaxy-user] Cuffdiff statistical calculations are inconsistent?
> The header of the Cuffdiff tool page says it is version 0.0.5
This version is the Galaxy tool wrapper version, not the tool version. (Yes, this is a usability issue.) You can find the tool version in the dataset's information panel by clicking on the 'i' icon.
> Is there a way, or setting, on Cuffdiff 2.0 to revert the parameters to be more similar to Cuffdiff 1.3?
This isn't a parameter issue. The Cuffdiff algorithm has changed substantially, and it's not clear to me if/how (or whether it's a good idea at all) to modify parameters to obtain 1.3-esque results. Best, J.
Re: [galaxy-user] Cuffdiff statistical calculations are inconsistent?
This is likely due to the upgrade from Cufflinks 1.3.x to Cufflinks 2.0.x; Cufflinks 2.0 introduced a new algorithm for Cuffdiff in particular. You can read about these changes on the website: http://cufflinks.cbcb.umd.edu/ (and there's a manuscript describing the changes as well). You might consider writing to the tool authors directly for more details: tophat.cuffli...@gmail.com Of course, please consider sharing anything you learn with members of this list as well. Best, J.
On Mar 13, 2013, at 12:06 PM, Mohammad Heydarian wrote: We are having the exact same issue, on the main server and our (recent) cloud instances. Were some of the hidden Cuffdiff parameters modified since fall 2012? Cheers, Mo Heydarian
On Mar 13, 2013 11:02 AM, Jenna Smith jes...@case.edu wrote: Hi, I'll preface my concern by saying that I'm a novice to Cufflinks. Back in September, I performed a Cuffdiff analysis comparing a wild-type and mutant condition. The analysis returned ~800 transcripts differentially regulated between the two with statistical significance. Recently, I've rerun the Cuffdiff analysis - using exactly the same files stored in Galaxy for all inputs, and with all the same parameters - and only get a few dozen statistically significant hits. However, all of the data besides the p and q values are essentially identical between these two runs, so I am really unclear as to what is causing the difference. Here is just one clear example:
From run 1: YFR026C FPKM 1 = 17.2434 FPKM 2 = 196.735 log2(fold change) = 3.51214 p = 1.64E-8 q = 7.33E-6 significant = yes
From run 2: YFR026C FPKM 1 = 14.4489 FPKM 2 = 144.939 log2(fold change) = 3.32641 p = 0.000170034 q = 0.0719964 significant = no
The second Cuffdiff analysis shows there is still a ~10-fold difference between conditions, but this is not statistically significant. Has the version of Cuffdiff on Galaxy been updated such that some parameters have changed, which could explain this difference? Or, is there some setting I am missing that would cause very large changes to fail statistical significance testing? Any help or input would be appreciated; I am really at a loss as to why executing what should be exactly the same task is giving vastly different results. I could just be overlooking something very fundamental that is obvious to someone with more experience with this program. Thanks. -Jenna Smith
Re: [galaxy-user] How should I include biological replicates in cufflink/cuffdiff?
> I am dealing with a bacterium which has about 4000 genes. When I tried Cuffmerge to merge everything with the reference annotation, I got a merged file of only 50 lines. If I left out the reference annotation file, Cuffmerge returned a merged file of 4000 lines (which is more reasonable). However, this difference didn't happen when I used Cuffcompare to merge all the files. With or without the reference annotation, the merged files both have 4000 lines. If I continue to Cuffdiff with this Cuffcompare file, I get over 1000 significantly changed genes. Could you give me some suggestions on this? Should I just trust the Cuffcompare file?
Cuffmerge attempts to remove incomplete or spurious transcripts. My best guess is that bacterial transcripts, with few/no introns, are being filtered out because they appear to be incomplete to Cuffmerge. So, in your case, Cuffcompare could be the superior option. You might want to verify my guess by discussing the issue with the cufflinks developers directly: tophat.cuffli...@gmail.com ; please feel free to post anything you learn to this list. Best, J.
Re: [galaxy-user] [galaxy-dev] not enough memory space on my galaxy session
Hello, Apologies for the slow reply. I've moved this thread to the galaxy-user mailing list because it centers on using Galaxy rather than developing it.
> 2. delete the first files, like the first fastq files, but I'm afraid of getting error messages
Deleting your fastq files after you have mapped your reads is fine and will not cause any errors. Make sure to both delete and purge your datasets to clear them from your account: http://wiki.galaxyproject.org/Learn/Managing%20Datasets#Delete_vs_Delete_Permanently
> 3. obtain more memory, just for the time of the study.
Your best bet to obtain more memory quickly is to use a cloud instance: http://wiki.galaxyproject.org/CloudMan Good luck, J.
Re: [galaxy-user] How should I include biological replicates in cufflink/cuffdiff?
> My question is, if I need to compare between 5 time points, should I do the comparisons pairwise?
No, do them all at once with Cuffdiff: (a) set 'Perform Replicate Analysis' to 'Yes'; (b) create 5 replicate conditions, one for each time point; (c) add your replicates for each time point. There's a Cuffdiff flag to do time series analysis, but it isn't implemented yet in Galaxy, so you'll get pairwise comparisons for all conditions. You can use the filtering tool to reduce Cuffdiff outputs to only the timepoint comparisons of interest.
> I will use cuffmerge to merge 0hour-1, 0hour-2, 0hour-3, 1hour-1, 1hour-2, 1hour-3 to generate one cuffmerge file.
Correct.
> Then I will run cuffdiff using the merged file, with two groups: group 1 is 0 hour (add 0hour 1-3 in group 1) and group 2 is 1hour (add 1hour 1-3 in group 2).
Use the process I described above to do all pairwise comparisons in one run. Good luck, J.
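To see what "pairwise comparisons for all conditions" means for 5 time points, and what a later filtering step might keep, here is a small sketch. It is illustrative only: the time-point labels beyond 0hour and 1hour are hypothetical, the sample_1/sample_2 keys assume the usual Cuffdiff differential-expression columns, and the helper names are made up.

```python
from itertools import combinations

# Hypothetical condition labels; only 0hour and 1hour appear in the thread.
TIMEPOINTS = ["0hour", "1hour", "2hour", "3hour", "4hour"]


def all_pairs(conditions):
    """Every unordered pair of conditions, as reported in an all-vs-all run."""
    return list(combinations(conditions, 2))


def consecutive_pairs(conditions):
    """Only adjacent time points, as a time-series-style subset."""
    return list(zip(conditions, conditions[1:]))


def keep_rows(rows, wanted_pairs):
    """Keep result rows whose (sample_1, sample_2) pair is in wanted_pairs."""
    wanted = set(wanted_pairs)
    return [r for r in rows if (r["sample_1"], r["sample_2"]) in wanted]


print(len(all_pairs(TIMEPOINTS)))     # 5 conditions give 10 comparisons
print(consecutive_pairs(TIMEPOINTS))  # the 4 adjacent comparisons
```

In Galaxy the equivalent of keep_rows would be a Filter step on the sample columns of the Cuffdiff output.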
Re: [galaxy-user] replicates in cuffdiff output
Read group info isn't included in the Cuffdiff output right now. I've created a Trello card to fix this oversight: https://trello.com/c/FdUYdbIn Best, J. On Feb 18, 2013, at 12:32 PM, Johanna Sandgren wrote: Hi, I am running cufflinks and cuffdiff using Galaxy. I am however wondering if it is not supposed to be output files from cuffdiff regarding each replicate. Anyone know why those (read.group-files) are not there, or when they will be if it is because of the version used in Galaxy. I find it very valuable to have those to be able to see intra/inter-group features in downstream analysis. Thanks, Johanna .. Johanna Sandgren, PhD Department of Oncology-Pathology CCK, Karolinska Institutet SE-171 76 Stockholm, Sweden +46-8-517 721 35 (office), +46-8- 321047(fax), +46-708 388476 (mobile)
Re: [galaxy-user] cufflinks output for cummeRbund
Cummerbund is available in the Galaxy toolshed for use in local or cloud Galaxies: http://toolshed.g2.bx.psu.edu/view/jjohnson/cummerbund We haven't put it on our public server yet because there are testing and compatibility challenges that need to be addressed. Best, J.
On Feb 11, 2013, at 7:49 PM, Mike Shamblott wrote: I have been using cufflinks on Galaxy Main. I have downloaded the files generated but they do not correspond to the file names expected by cummeRbund. For example: cummeRbund expects 4 tracking files (e.g. isoforms.fpkm_tracking) and 4 .diff files (e.g. isoform_exp.diff). Here is a trimmed version of the output I downloaded, grouped by what I'm guessing are the tracking, diff and usage files:
TRACKING
...Galaxy109_transcript_FPKM_tracking.tabular
...Galaxy107_gene_FPKM_tracking.tabular
...Galaxy105_TSS_groups_FPKM_tracking.tabular
...Galaxy103_CDS_FPKM_tracking.tabular
.DIFF
...Galaxy108_transcript_differential_expression_testing.tabular
...Galaxy106_gene_differential_expression_testing.tabular
...Galaxy102_CDS_FPKM_differential_expression_testing.tabular
USAGE
...Galaxy99_splicing_differential_expression_testing.tabular
...Galaxy100_promoters_differential_expression_testing.tabular
...Galaxy101_CDS_overloading_differential_expression_testing.tabular
Given that cummeRbund is a common next step in the workflow, is there an option to save the output in the expected format, perhaps with a galaxy history number prepended? I'm not sure which files are to be renamed and to what name, and it seems that one file is missing. If there were a cummeRbund implementation on Main it probably wouldn't matter as much but until that happens, I (and I'm guessing other newcomers) would appreciate the help! Thanks, mike
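For readers with the same renaming question, the sketch below maps the Galaxy download names listed above to the file names Cuffdiff normally writes. It is a best-effort illustration rather than an official mapping: the reply above does not confirm it, the pairing of Galaxy labels to Cuffdiff names is an assumption that should be checked against the Cufflinks manual, and the TSS groups differential expression entry is a guess at the file that seems to be missing from the list. The function name is made up.

```python
import re

# Assumed pairing of Galaxy output labels to standard Cuffdiff file names.
CUFFDIFF_NAMES = {
    "transcript_FPKM_tracking": "isoforms.fpkm_tracking",
    "gene_FPKM_tracking": "genes.fpkm_tracking",
    "TSS_groups_FPKM_tracking": "tss_groups.fpkm_tracking",
    "CDS_FPKM_tracking": "cds.fpkm_tracking",
    "transcript_differential_expression_testing": "isoform_exp.diff",
    "gene_differential_expression_testing": "gene_exp.diff",
    # Guessed label for the file missing from the list in the message.
    "TSS_groups_differential_expression_testing": "tss_group_exp.diff",
    "CDS_FPKM_differential_expression_testing": "cds_exp.diff",
    "splicing_differential_expression_testing": "splicing.diff",
    "promoters_differential_expression_testing": "promoters.diff",
    "CDS_overloading_differential_expression_testing": "cds.diff",
}

# Matches the "Galaxy<number>_<label>.tabular" tail of a download name,
# whatever prefix comes before it.
_GALAXY_NAME = re.compile(r"Galaxy\d+_(?P<label>.+)\.tabular$")


def cuffdiff_name(galaxy_filename):
    """Return the assumed Cuffdiff file name, or None if unrecognized."""
    match = _GALAXY_NAME.search(galaxy_filename)
    if match is None:
        return None
    return CUFFDIFF_NAMES.get(match.group("label"))


print(cuffdiff_name("Galaxy109_transcript_FPKM_tracking.tabular"))
```

Copying each download to the name returned by cuffdiff_name, all into one directory, would give a folder laid out the way cummeRbund expects to read it.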
Re: [galaxy-user] Moving history datasets to libraries
In the Add Datasets/Upload Files libraries form, set the option 'Upload option' to 'Import datasets from your current history' and you'll be able to add datasets from your history to a library. Best, J. On Jan 27, 2013, at 3:36 AM, Ted Goldstein wrote: I must be mistaken, but I don't see any way to move a dataset that I create in a history to a library except to download it and upload it again. Can this be correct? It seems like this is essential functionality. Please tell me I am wrong. Thanks, Ted
Re: [galaxy-user] Trackster custom builds wrong
Hello, Can you share with me (a) the fasta dataset and (b) the form values (e.g. name, dbkey, etc) you used when you encountered this error? Thanks, J. On Jan 7, 2013, at 10:17 AM, Jennifer Hillman-Jackson wrote: Repost to Galaxy-user --- When using Trackster on Galaxy (https://main.g2.bx.psu.edu/root), as the Galaxy Wiki page on Trackster (http://wiki.galaxyproject.org/Learn/Visualization) recommends, I need to create a custom build for soybean because it isn't installed for all users. But after I enter all the requested information on the webpage (new build name, key and definition) and submit, I get a server error message; it says: An error occurred. See the error logs for more information. (Turn debug on to display exception reports here) I found that somebody else on the mailing list has also encountered this problem, but I didn't find any useful solution. So, I want to know why it happened and how to fix it. Is there something wrong in my manipulation? Any reply will be appreciated. Thank you very much! Yours sincerely, Yanting Shen
Re: [galaxy-user] Add new page server error
This bug has been fixed in our code base; our public server will be fixed when we update it early next week. Best, J. On Jan 4, 2013, at 5:54 PM, Aaron Stonestrom wrote: When logged into main trying to create a new shared page under Add new page in Saved Pages, entering any page title and selecting Submit gives me: Server Error An error occurred. See the error logs for more information. (Turn debug on to display exception reports here) Thanks for any help,
Re: [galaxy-user] cuffdiff
Use the replicates option (yes, a bit of a misnomer) and put each Tophat run in its own group. This will produce a tabular file with FPKM for each group/run. Best, J. On Nov 12, 2012, at 10:05 AM, Vevis, Christis wrote: Hi, I got confused while trying to perform Cuffdiff for my RNA sequencing analysis. I have five different samples which were sequenced. I used tophat to create the bam files and cufflinks to create the assembled transcripts. Then I used Cuffmerge to merge them into one file and then I wanted to do Cuffdiff with that merged file. What shall I choose for the "SAM or BAM file of aligned RNA-Seq" option? I have the 5 options from the 5 tophat actions on my 5 samples. All I want in the end is an excel table showing the number of hits from each sample (and not necessarily a comparison of them). Regards Kristis Vevis, PhD Student Cell Biology UCL Institute of Ophthalmology 11-43 Bath Street London EC1V 9EL, UK 020 7608 4067
Re: [galaxy-user] Identification of replicate outlier
c) if you can create an appropriate input matrix (read counts by exon or other contig for each sample eg), the Principal Component Analysis tool might be helpful (library size normalization is one devil that lies in the detail and it's not quite the same as MDS - see below) I like starting with this approach because it can be done easily in Galaxy. You can take the expression datasets produced by Cufflinks for each replicate and join them on gene name to get a big table of replicate-expression values and either eyeball it or use PCA. Note that since Cufflinks produces FPKM, library size is already accounted for. Another idea/approach: Cuffdiff already has an advanced model for dealing with replicates: http://cufflinks.cbcb.umd.edu/howitworks.html#reps You may want to investigate how this model works and whether you can tune it with parameter settings before giving up on using all your replicates. One challenge with this approach is that the Galaxy Cuffdiff wrapper does not yet include all parameters, so you might try enhancing the Cuffdiff wrapper with additional, relevant parameters and using those as well as the existing ones. If you do this, please consider submitting your enhancements back to me and I can integrate them into our code base. Best, J.
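The join-on-gene-name step described above can also be sketched outside Galaxy. This is only an illustration under assumptions: the column names `gene_id` and `FPKM`, the replicate names, and the tiny in-memory tables are placeholders standing in for real per-replicate Cufflinks expression datasets.

```python
import csv
import io


def read_fpkm(handle, gene_col="gene_id", fpkm_col="FPKM"):
    """Read one tab-separated expression table into {gene: FPKM}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[gene_col]: float(row[fpkm_col]) for row in reader}


def join_replicates(tables):
    """Join {replicate_name: {gene: FPKM}} on gene name.

    Only genes present in every replicate are kept, so each row is complete.
    Returns (replicate_names, rows) where each row is [gene, fpkm_1, ...].
    """
    names = list(tables)
    shared = set.intersection(*(set(t) for t in tables.values()))
    rows = [[gene] + [tables[n][gene] for n in names] for gene in sorted(shared)]
    return names, rows


# Hypothetical miniature inputs standing in for two replicate datasets.
rep1 = io.StringIO("gene_id\tFPKM\ngeneA\t10.0\ngeneB\t0.5\n")
rep2 = io.StringIO("gene_id\tFPKM\ngeneA\t12.5\ngeneC\t3.0\n")

names, rows = join_replicates({"rep1": read_fpkm(rep1), "rep2": read_fpkm(rep2)})
for row in rows:
    print("\t".join(str(value) for value in row))
```

The resulting table (one row per gene, one column per replicate) is the kind of matrix that can be eyeballed or passed to a PCA tool. Keeping only the shared genes is one possible design choice; filling missing genes with 0 would be another.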
Re: [galaxy-user] (no subject)
Kristis, This data is available further downstream in an RNA-seq analysis pipeline, specifically, as output from the Cuffdiff tool. Take a look at the page for more details: https://main.g2.bx.psu.edu/rna-seq Best, J. On Nov 9, 2012, at 3:42 AM, Vevis, Christis wrote: Hi, I am performing online tophat on 5 different samples which I want to compare for gene expression. Is there any simple way, after the end of tophat for all of them, with which I can have an excel table with the 5 samples and their hits? Something similar to this Vevis1 Vevis2 Vevis3 Vevis4 Vevis5 uc010kuo.1 128.8503 136.60553 146.7073 91.23218 120.325 AK311687 TRA2A Homo sapiens transformer 2 alpha homolog (Drosophila) (TRA2A), mRNA. Regards Kristis Vevis, PhD Student Cell Biology UCL Institute of Ophthalmology 11-43 Bath Street London EC1V 9EL, UK 020 7608 4067
Re: [galaxy-user] cufflinks visualization
I'm able to visualize Cufflinks assembled transcripts in Trackster. Can you please be more specific about (a) which datasets you're having trouble using and (b) what errors you're seeing? Thanks, J. On Oct 31, 2012, at 1:10 PM, i b wrote: Hi all, can anyone explain to me how I can visualize Cufflinks outputs in Trackster? Galaxy keeps sending me errors. Thanks, ib
Re: [galaxy-user] Export to file
When you say large history, is there a size limit that I should be aware of, or will it handle anything that my quota can accept? It will handle anything your quota can accept. Best, J.
Re: [galaxy-user] Export to file
I've reworked the code to handle large history export files in -central changeset afc8e9345268, and this should solve your issue. This change should make it out to our public server this coming week. Best, J. On Oct 18, 2012, at 12:36 PM, Dave Corney wrote: Hi Jeremy, Thanks for your offer of help. By the time I got your email I had already added many new jobs to the history that are either running now or waiting to run. Since I read somewhere that there are problems exporting a history while it is running, I shared a clone of the history with you. The clone should be identical to the history that I was having problems with yesterday. I can share with you the original history once the jobs have finished running (but it might take a while). Thanks, Dave On Wed, Oct 17, 2012 at 10:35 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote: Dave, There's likely something problematic about your history that's causing problems. Can you share with me the history that's generating the error? To do so, from the history options menu -> Share/Publish -> Share with a User -> my email address Thanks, J. On Oct 17, 2012, at 6:58 PM, Jennifer Jackson wrote: Hi Dave, Yes, if your Galaxy instance is on the internet, for entire history transfer, you can skip the curl download and just enter the URL from the public Main Galaxy server into your Galaxy directly. To load large data over 2G that is local (datasets, not history archives), you can use the data library option. The idea is to load into a library, then move datasets from libraries into histories as needed. Help is in our wiki here: http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Libraries http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Uploading%20Library%20Files Take care, Jen Galaxy team On 10/17/12 3:21 PM, Dave Corney wrote: Hi Jen, Thanks for your response and suggestion. Just so that it is clear, for your second method, where I export to file and then use curl, I will download to my computer as an intermediate stage?
Is there a simple way to take the history and datasets from PSU galaxy to our Princeton galaxy directly (without downloading to my computer first)? Unfortunately, we don't have FTP on our own galaxy, which is why I was looking for alternatives (each file is 2GB, so uploading through the browser won't work either). It seems that to import from file, the file needs to have a URL and I'm not sure how to go about that if the file is stored locally on my computer. Thanks, Dave On Wed, Oct 17, 2012 at 6:12 PM, Jennifer Jackson j...@bx.psu.edu wrote: Hi Dave, To export larger files, you can use a different method. Open up a terminal window on your computer and type in at the prompt ($): $ curl -0 'file_link' > name_the_output Where file_link can be obtained by right-clicking on the disc icon for the dataset and selecting Copy link location. If you are going to import into a local Galaxy, exporting entire histories, or a history comprised of datasets that you have copied/grouped together, may be a quick alternative. From the history panel, use Options (gear icon) -> Export to File to generate a link, then use curl again to perform the download. The Import from File function (in the same menu) can be used in your local Galaxy to incorporate the history and the datasets it contains. Hopefully this helps, but please let us know if you have more questions, Jen Galaxy team On 10/17/12 2:37 PM, Dave Corney wrote: Hi list, Is there currently a known problem with the export to file function? I'm trying to migrate some data from the public galaxy to a private one; the export function worked well with a small (~100mb) dataset, but it has not been working with larger datasets (2GB) and I get the error: Server Error. An error occurred. See the error logs for more information. (Turn debug on to display exception reports here). Is there a limit on the file size of the export? If so, what is it?
Thanks in advance, Dave
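For readers who prefer a script to the curl command in this thread, the same download step might look like the sketch below. It is an illustration only: `download` is a hypothetical helper, and the demonstration uses a local file:// URL as a stand-in for the link copied from Galaxy so that it runs without network access.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download(url, destination, chunk_size=1024 * 1024):
    """Stream url to destination without holding the whole file in memory."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return Path(destination).stat().st_size


# Self-contained demonstration: a local file:// URL stands in for the
# (hypothetical) dataset or history-archive link copied from Galaxy.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.dat"
    source.write_bytes(b"example payload")
    target = Path(tmp) / "name_the_output"
    size = download(source.as_uri(), target)
    copied = target.read_bytes()
```

Copying in fixed-size chunks matters for the multi-gigabyte files discussed here, since reading the whole response into memory at once could exhaust RAM.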
Re: [galaxy-user] Data table named 'bowtie2_indexes' is required by tool but not configured
You'll need to update the tool_data_table_conf.xml file in your Galaxy home directory. If you haven't made changes to the file, you can copy tool_data_table_conf.xml.sample to tool_data_table_conf.xml. If you have made changes, add these entries to the file:
--
<table name="bowtie2_indexes" comment_char="#">
    <columns>value, dbkey, name, path</columns>
    <file path="tool-data/bowtie2_indices.loc" />
</table>
<table name="tophat2_indexes" comment_char="#">
    <columns>value, dbkey, name, path</columns>
    <file path="tool-data/bowtie2_indices.loc" />
</table>
--
Finally, please direct questions about local Galaxy installations to the galaxy-dev mailing list: galaxy-...@bx.psu.edu Best, J. On Oct 19, 2012, at 2:58 AM, Sachit Adhikari wrote: I am getting this error in Bowtie2 and Tophat2: Data table named 'bowtie2_indexes' is required by tool but not configured Data table named 'tophat2_indexes' is required by tool but not configured. How can I solve it? Thanks
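A quick way to see which data tables a configuration file is missing is sketched below. This is not part of Galaxy: `missing_tables` is a hypothetical helper, and the `<tables>` root element and the sample contents are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Assumed miniature config: only one of the two required tables is defined.
SAMPLE_CONF = """<tables>
    <table name="bowtie2_indexes" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/bowtie2_indices.loc" />
    </table>
</tables>"""


def missing_tables(conf_xml, required):
    """Return the required table names that the config does not define."""
    root = ET.fromstring(conf_xml)
    defined = {table.get("name") for table in root.iter("table")}
    return [name for name in required if name not in defined]


missing = missing_tables(SAMPLE_CONF, ["bowtie2_indexes", "tophat2_indexes"])
print(missing)
```

To check a real installation, the contents of tool_data_table_conf.xml could be read from disk and passed in place of `SAMPLE_CONF`.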
Re: [galaxy-user] Export to file
Dave, There's likely something problematic about your history that's causing problems. Can you share with me the history that's generating the error? To do so, from the history options menu -> Share/Publish -> Share with a User -> my email address Thanks, J. On Oct 17, 2012, at 6:58 PM, Jennifer Jackson wrote: Hi Dave, Yes, if your Galaxy instance is on the internet, for entire history transfer, you can skip the curl download and just enter the URL from the public Main Galaxy server into your Galaxy directly. To load large data over 2G that is local (datasets, not history archives), you can use the data library option. The idea is to load into a library, then move datasets from libraries into histories as needed. Help is in our wiki here: http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Libraries http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Uploading%20Library%20Files Take care, Jen Galaxy team On 10/17/12 3:21 PM, Dave Corney wrote: Hi Jen, Thanks for your response and suggestion. Just so that it is clear, for your second method, where I export to file and then use curl, I will download to my computer as an intermediate stage? Is there a simple way to take the history and datasets from PSU galaxy to our Princeton galaxy directly (without downloading to my computer first)? Unfortunately, we don't have FTP on our own galaxy, which is why I was looking for alternatives (each file is 2GB, so uploading through the browser won't work either). It seems that to import from file, the file needs to have a URL and I'm not sure how to go about that if the file is stored locally on my computer. Thanks, Dave On Wed, Oct 17, 2012 at 6:12 PM, Jennifer Jackson j...@bx.psu.edu wrote: Hi Dave, To export larger files, you can use a different method.
Open up a terminal window on your computer and type in at the prompt ($): $ curl -0 'file_link' > name_the_output Where file_link can be obtained by right-clicking on the disc icon for the dataset and selecting Copy link location. If you are going to import into a local Galaxy, exporting entire histories, or a history comprised of datasets that you have copied/grouped together, may be a quick alternative. From the history panel, use Options (gear icon) -> Export to File to generate a link, then use curl again to perform the download. The Import from File function (in the same menu) can be used in your local Galaxy to incorporate the history and the datasets it contains. Hopefully this helps, but please let us know if you have more questions, Jen Galaxy team On 10/17/12 2:37 PM, Dave Corney wrote: Hi list, Is there currently a known problem with the export to file function? I'm trying to migrate some data from the public galaxy to a private one; the export function worked well with a small (~100mb) dataset, but it has not been working with larger datasets (2GB) and I get the error: Server Error. An error occurred. See the error logs for more information. (Turn debug on to display exception reports here). Is there a limit on the file size of the export? If so, what is it? Thanks in advance, Dave -- Jennifer Jackson http://galaxyproject.org
Re: [galaxy-user] a question about cuffdiff values
Hi El, 1) what do these numbers represent? FPKM values for sample 1 and 2. Cufflinks documentation is the place to get definitions for all columns: http://cufflinks.cbcb.umd.edu/manual.html#gene_exp_diff 2) Does a value of 10 or less in the column where I expect a higher number mean anything, or should one select for values higher than these single-digit numbers? 3) And in the column of genes that might be repressed, is there really a difference between a value of 0.1 versus something like 0.01, since that can change my log ratios significantly? This, of course, goes back to my first question. These questions get at the challenge of interpreting FPKM values. One thing to look at is the confidence intervals (CI) produced by Cufflinks/Cuffdiff. CIs that overlap 0 are, in my experience, unreliable no matter how large the FPKM. Most likely genes with FPKM values near 0 have CIs overlapping 0, which means there's likely no difference between them. However, genes with low FPKM values (e.g. 10 or less) but tight CIs that exclude 0 should probably be included for further analysis. Another thing to look at is whether a few highly expressed genes are depressing the FPKM values of other genes. If so, using the upper-quartile normalization option can help you get better resolution for genes expressed at low levels. Good luck, J.
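The two points in this reply (filter on confidence intervals, and be wary of log ratios between near-zero values) can be illustrated with a small sketch. The record fields `gene`, `conf_lo` and `conf_hi` and all the numbers are made up for the example; they are not taken from any real Cuffdiff output.

```python
import math


def reliable_genes(records):
    """Return names of genes whose FPKM confidence interval excludes 0."""
    return [r["gene"] for r in records if r["conf_lo"] > 0.0]


def log2_ratio(fpkm_a, fpkm_b):
    """log2 fold change of sample A over sample B (both must be positive)."""
    return math.log2(fpkm_a / fpkm_b)


# Hypothetical records: a large FPKM with a CI touching 0, a small FPKM
# with a tight CI, and a near-zero FPKM with a CI touching 0.
records = [
    {"gene": "geneA", "fpkm": 250.0, "conf_lo": 0.0, "conf_hi": 600.0},
    {"gene": "geneB", "fpkm": 8.0, "conf_lo": 6.5, "conf_hi": 9.5},
    {"gene": "geneC", "fpkm": 0.05, "conf_lo": 0.0, "conf_hi": 0.2},
]
kept = reliable_genes(records)

# Changing a denominator from 0.1 to 0.01 shifts the log2 ratio by log2(10),
# a bit over 3 units, even though both values are effectively "not expressed".
shift = log2_ratio(10.0, 0.01) - log2_ratio(10.0, 0.1)
```

Only the gene with the tight interval survives the filter, which mirrors the advice above: the width and position of the interval, not the raw FPKM, decides whether a value is worth interpreting.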
Re: [galaxy-user] Datasets permanently deleted
Sarah, I can't reproduce this behavior on a local instance or on our public server. This raises a couple questions: Are you using the most recent version of Galaxy? Can you reproduce this behavior on our public server (usegalaxy.org)? Thanks, J. On Jul 31, 2012, at 8:13 AM, Sarah Maman wrote: Dear all, In the menu User -> Saved Datasets, all datasets are listed even if some of them have been permanently deleted by deleting their history. So it's possible to copy a deleted dataset into the current history, and that is confusing for users. Do you have any solution to drop these datasets from Saved Datasets? Thanks in advance, Sarah Maman
Re: [galaxy-user] Cuffconfusion
There is an excellent article on how to do differential gene/transcript expression with Tophat and Cufflinks here: http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html This article will answer the questions you've posed below and provides numerous figures that will help you create a workflow to meet your needs. Best, J. On Jul 20, 2012, at 6:39 PM, i b wrote: OK, I'm really confused now about Cufflinks and its tools. All I wanted was to look for differentially expressed genes between two samples: treated (2 replicates) and control (one replicate). Can anyone give me a workflow for a similar analysis with the various options chosen? I have read a lot of different posts where for Cuffdiff they have used Cufflinks, Cuffcompare, Cuffmerge or any GTF file as input together with the BAM file. There must be a difference in using all these different files, right? Also: what is the advantage of using Cuffcompare, and how do we compare them: do we give it all the Cufflinks outputs, or do we separate control from treated? Why do we need Cuffmerge? Isn't it also combining the Cufflinks outputs? When we use Cuffcompare or Cuffmerge, do we mix all Cufflinks outputs no matter whether they are control or treated ones? Please don't send me back to the Cufflinks page (http://cufflinks.cbcb.umd.edu/index.html)... I need simpler words! Thanks, ib
Re: [galaxy-user] Trackster error: indexing
Hi Nancy, I'm ccing the galaxy-user mailing list as this discussion may be helpful to others. The problem is that your BAM dataset isn't sorted. Galaxy requires that uploaded BAMs be sorted to be useful for most tools and for visualization. You can fix this in two ways: (a) using samtools from the command line and then uploading the sorted file to Galaxy: samtools sort in.bam out_prefix (b) from Galaxy, use tools to convert the BAM to SAM and back again to BAM; the output of the SAM to BAM tool will be sorted. Taking either of these steps will enable visualization in Trackster. Best, J. On Jul 13, 2012, at 5:34 PM, Nancy Au Yeung wrote: On Fri, Jul 13, 2012 at 2:29 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote: Hi Nancy, Can you share the history with the problematic dataset(s) with me and I can take a look? Please share the history with me using my email address: jeremy.goe...@emory.edu Best, J. On Jul 12, 2012, at 9:07 PM, Nancy Au Yeung wrote: Hi, I saw another post regarding trackster error and it seems like this is different. I have tried copying the dataset from the History option, but this same error occurs. See error script below. Thanks! 
Trackster Error *** glibc detected *** python: double free or corruption (top): 0x01c09370 *** === Backtrace: = /lib/x86_64-linux-gnu/libc.so.6(+0x75ab6)[0x7f93fe421ab6] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7f93fe4267ec] /lib/x86_64-linux-gnu/libc.so.6(fclose+0x14d)[0x7f93fe412a0d] /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(bam_index_load_literal+0x37)[0x7f93fdac2ad7] /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(+0x6e7f8)[0x7f93fdaae7f8] python(PyObject_Call+0x36)[0x4824c6] python(PyEval_CallObjectWithKeywords+0x36)[0x486086] /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(+0x52a04)[0x7f93fda92a04] python[0x482068] python(PyObject_Call+0x36)[0x4824c6] python(PyEval_EvalFrameEx+0x91a)[0x4c5e8a] python(PyEval_EvalCodeEx+0x136)[0x4ccee6] python(PyEval_EvalFrameEx+0x838)[0x4c5da8] python(PyEval_EvalCodeEx+0x136)[0x4ccee6] python(PyRun_FileExFlags+0xe1)[0x577901] python(PyRun_SimpleFileExFlags+0x177)[0x577b37] python(Py_Main+0x6f7)[0x550497] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f93fe3caead] python[0x41dea1] === Memory map: 0040-00672000 r-xp fe:00 893421 /usr/bin/python2.7 00871000-00872000 r--p 00271000 fe:00 893421 /usr/bin/python2.7 00872000-008db000 rw-p 00272000 fe:00 893421 /usr/bin/python2.7 008db000-008ed000 rw-p 00:00 0 0166d000-01c29000 rw-p 00:00 0 [heap] 7f93f800-7f93f8021000 rw-p 00:00 0 7f93f8021000-7f93fc00 ---p 00:00 0 7f93fd6de000-7f93fd70e000 r-xp 00:18 458639 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/ctabix.so 7f93fd70e000-7f93fd80d000 ---p 0003 00:18 458639 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/ctabix.so 7f93fd80d000-7f93fd812000 rw-p 0002f000 00:18 458639 
/galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/ctabix.so 7f93fd812000-7f93fd814000 rw-p 00:00 0 7f93fd814000-7f93fd83b000 r-xp fe:00 949810 /usr/lib/python2.7/lib-dynload/_ctypes.so 7f93fd83b000-7f93fda3a000 ---p 00027000 fe:00 949810 /usr/lib/python2.7/lib-dynload/_ctypes.so 7f93fda3a000-7f93fda3b000 r--p 00026000 fe:00 949810 /usr/lib/python2.7/lib-dynload/_ctypes.so 7f93fda3b000-7f93fda3f000 rw-p 00027000 fe:00 949810 /usr/lib/python2.7/lib-dynload/_ctypes.so 7f93fda3f000-7f93fda4 rw-p 00:00 0 7f93fda4-7f93fdb1 r-xp 00:18 458638 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so 7f93fdb1-7f93fdc1 ---p 000d 00:18 458638 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so 7f93fdc1-7f93fdc2 rw-p 000d 00:18 458638 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so 7f93fdc2-7f93fdc25000 rw-p 00:00 0 7f93fdc25000-7f93fdc44000 r-xp fe:00 949804 /usr/lib/python2.7/lib-dynload/_io.so 7f93fdc44000-7f93fde44000 ---p 0001f000 fe:00 949804 /usr/lib
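Before uploading a BAM for use in Trackster, it can help to check whether the file declares itself as coordinate-sorted, as discussed in the reply at the top of this thread. The sketch below only inspects header text; the header string is a made-up example of what `samtools view -H in.bam` might print, and the `SO` tag can be missing or inaccurate, so treat it as a hint rather than proof.

```python
def declared_sort_order(sam_header_text):
    """Return the SO value from the @HD header line, or None if absent."""
    for line in sam_header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Hypothetical header text, e.g. as printed by `samtools view -H in.bam`.
header = "@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000\n"
order = declared_sort_order(header)
needs_sorting = order != "coordinate"
print(order, needs_sorting)
```

If `needs_sorting` is true, either of the two fixes described in the reply (sorting with samtools before upload, or the BAM to SAM to BAM round trip inside Galaxy) would apply.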
Re: [galaxy-user] Trackster error: indexing
Hi Nancy, Can you share the history with the problematic dataset(s) with me and I can take a look? Please share the history with me using my email address: jeremy.goe...@emory.edu Best, J. On Jul 12, 2012, at 9:07 PM, Nancy Au Yeung wrote: Hi, I saw another post regarding trackster error and it seems like this is different. I have tried copying the dataset from the History option, but this same error occurs. See error script below. Thanks! Trackster Error *** glibc detected *** python: double free or corruption (top): 0x01c09370 *** === Backtrace: = /lib/x86_64-linux-gnu/libc.so.6(+0x75ab6)[0x7f93fe421ab6] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7f93fe4267ec] /lib/x86_64-linux-gnu/libc.so.6(fclose+0x14d)[0x7f93fe412a0d] /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(bam_index_load_literal+0x37)[0x7f93fdac2ad7] /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(+0x6e7f8)[0x7f93fdaae7f8] python(PyObject_Call+0x36)[0x4824c6] python(PyEval_CallObjectWithKeywords+0x36)[0x486086] /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(+0x52a04)[0x7f93fda92a04] python[0x482068] python(PyObject_Call+0x36)[0x4824c6] python(PyEval_EvalFrameEx+0x91a)[0x4c5e8a] python(PyEval_EvalCodeEx+0x136)[0x4ccee6] python(PyEval_EvalFrameEx+0x838)[0x4c5da8] python(PyEval_EvalCodeEx+0x136)[0x4ccee6] python(PyRun_FileExFlags+0xe1)[0x577901] python(PyRun_SimpleFileExFlags+0x177)[0x577b37] python(Py_Main+0x6f7)[0x550497] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f93fe3caead] python[0x41dea1] === Memory map: 0040-00672000 r-xp fe:00 893421 /usr/bin/python2.7 00871000-00872000 r--p 00271000 fe:00 893421 /usr/bin/python2.7 00872000-008db000 rw-p 00272000 fe:00 893421 /usr/bin/python2.7 008db000-008ed000 rw-p 00:00 0 0166d000-01c29000 rw-p 00:00 0 [heap] 7f93f800-7f93f8021000 rw-p 00:00 0 
7f93f8021000-7f93fc00 ---p 00:00 0 7f93fd6de000-7f93fd70e000 r-xp 00:18 458639 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/ctabix.so 7f93fd70e000-7f93fd80d000 ---p 0003 00:18 458639 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/ctabix.so 7f93fd80d000-7f93fd812000 rw-p 0002f000 00:18 458639 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/ctabix.so 7f93fd812000-7f93fd814000 rw-p 00:00 0 7f93fd814000-7f93fd83b000 r-xp fe:00 949810 /usr/lib/python2.7/lib-dynload/_ctypes.so 7f93fd83b000-7f93fda3a000 ---p 00027000 fe:00 949810 /usr/lib/python2.7/lib-dynload/_ctypes.so 7f93fda3a000-7f93fda3b000 r--p 00026000 fe:00 949810 /usr/lib/python2.7/lib-dynload/_ctypes.so 7f93fda3b000-7f93fda3f000 rw-p 00027000 fe:00 949810 /usr/lib/python2.7/lib-dynload/_ctypes.so 7f93fda3f000-7f93fda4 rw-p 00:00 0 7f93fda4-7f93fdb1 r-xp 00:18 458638 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so 7f93fdb1-7f93fdc1 ---p 000d 00:18 458638 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so 7f93fdc1-7f93fdc2 rw-p 000d 00:18 458638 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so 7f93fdc2-7f93fdc25000 rw-p 00:00 0 7f93fdc25000-7f93fdc44000 r-xp fe:00 949804 /usr/lib/python2.7/lib-dynload/_io.so 7f93fdc44000-7f93fde44000 ---p 0001f000 fe:00 949804 /usr/lib/python2.7/lib-dynload/_io.so 7f93fde44000-7f93fde45000 r--p 0001f000 fe:00 949804 /usr/lib/python2.7/lib-dynload/_io.so 7f93fde45000-7f93fde4e000 rw-p 0002 fe:00 949804 /usr/lib/python2.7/lib-dynload/_io.so 7f93fde4e000-7f93fdf0f000 rw-p 00:00 0 7f93fdf0f000-7f93fdf12000 r-xp fe:00 949801 /usr/lib/python2.7/lib-dynload/_heapq.so 7f93fdf12000-7f93fe111000 ---p 3000 fe:00 949801 /usr/lib/python2.7/lib-dynload/_heapq.so 
7f93fe111000-7f93fe112000 r--p 2000 fe:00 949801 /usr/lib/python2.7/lib-dynload/_heapq.so 7f93fe112000-7f93fe114000 rw-p 3000 fe:00 949801 /usr/lib/python2.7/lib-dynload/_heapq.so
Re: [galaxy-user] [galaxy-dev] Create and Transfer Galaxy Page
Todd, There's not an ideal solution for your situation. My suggestion: (a) set up a cloud instance with your tools + a Page and use the share-an-instance feature so that others can access your data, tools, histories, and page in a single place ( http://wiki.g2.bx.psu.edu/Admin/Cloud ); (b) put your tools into the tool shed for easy access ( http://toolshed.g2.bx.psu.edu/ ); (c) replicate the page + as many of the histories as possible on our public server, with a note about how to get going with either (i) tools from the tool shed or (ii) on the cloud. We're working to make public server-cloud access more easy, so there may be something on the horizon that could smooth (c)(ii) out. Best, J. On Apr 18, 2012, at 5:02 PM, Todd Oakley wrote: Jeremy - Thanks so much for your helpful responses. One problem that i didn't mention with implementing your suggestions is that the histories I want to post contain mainly new tools that my lab developed for phylogenetics using transcriptome data. Therefore, the public instance does not have most of the tools in the history (we will put these on the tool shed as soon as we can). In addition, the analyses are VERY computationally intensive, including assembly and Maximum Likelihood analyses, and therefore probably are not suitable for re-running on the public Galaxy instance. (This is also a reason why I cannot make my local galaxy instance public - it exposes too many tools that could bog down the host computer). Additional suggestions most welcome… Todd On Apr 17, 2012, at 6:33 PM, Jeremy Goecks wrote: Hi Todd, [Not sure if this is better suited to galaxy-dev or -user, so I'm sending to both]. galaxy-user is most appropriate for this question because it related to usage of Galaxy; galaxy-dev is for local installation and tool development questions. My question is - can I create a Galaxy 'Published Page' from my local Galaxy instance/histories, and then transfer that page to the main Galaxy instance? 
Not currently, though this is in our long-term plan. The reason is that I cannot make my local Galaxy instance public, as I am using a campus resource to host our galaxy. If this is possible, how can I do that? If not, any other ideas? It is possible to move datasets and workflows relatively easily between instances, so I'd recommend that: (a) you move your data and workflows to our public instance; (b) rerun your analyses on the public instance to create the required; (c) create and host the Page on our public instance. You can be assured that we will maintain our public server over the coming years and your Page will remain available and have a stable URL. Also, are there any tutorials/pages on how to create Published Pages in Galaxy in the first place? Not yet, though the idea is for the Page editor to be self-explanatory. Here's how to get started with Pages: (a) from User menu, go to Saved Pages; (b) create a Page; (c) edit the Page using the Web-based editor; there are menus for inserting embedded datasets, workflows, histories, and visualizations as well as performing standard word-processing operations. Let us know if you have problems/questions and we'll start a guide for creating Pages. Best, J.
Re: [galaxy-user] Tophat mapping
I am wondering if these non-coding reads will be included when cufflinks calculates transcript/gene expression. Reads will only be included if they map to assembled/known transcripts. And another question is: how to know the number of reads mapped to a certain exon? This isn't possible because a single read may map to multiple exons and/or transcripts. Cufflinks assigns reads probabilistically when their mapping cannot be uniquely determined. See http://cufflinks.cbcb.umd.edu/faq.html#count and http://cufflinks.cbcb.umd.edu/howitworks.html for details. Best, J.
Re: [galaxy-user] Tophat mapping
Jeremy, do you have a workflow to estimate what percent of the reads are mapping to unknown expressed regions? Here's a simple approach assuming mapped reads are in BAM format: (a) BAM --> SAM; (b) SAM --> Interval; (c) intersect reads as intervals with known annotation, not allowing for any overlap. Best, J.
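The last step of the approach above (comparing read intervals against the known annotation and keeping the reads with no overlap) can be sketched outside Galaxy as follows. This is only an illustration under stated assumptions: intervals are (chrom, start, end) tuples with half-open coordinates, and the names build_index, overlaps and fraction_unannotated are made up for this example; they are not Galaxy tools or APIs.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(annotation):
    """Merge annotated intervals per chromosome into sorted, non-overlapping spans."""
    by_chrom = defaultdict(list)
    for chrom, start, end in annotation:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching span: extend the current one.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """True if the half-open interval [start, end) overlaps any annotated span."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged span that begins before the read ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def fraction_unannotated(reads, annotation):
    """Fraction of read intervals that overlap no annotated interval at all."""
    if not reads:
        return 0.0
    index = build_index(annotation)
    outside = sum(1 for chrom, s, e in reads if not overlaps(index, chrom, s, e))
    return outside / len(reads)


if __name__ == "__main__":
    annotation = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
    reads = [("chr1", 90, 110), ("chr1", 300, 350), ("chr1", 10, 50), ("chr3", 0, 10)]
    print(fraction_unannotated(reads, annotation))
```

Multiplying the returned fraction by 100 gives the percentage asked about; spliced reads would need to be split into their aligned blocks first, which this sketch does not do.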
Re: [galaxy-user] [galaxy-dev] Create and Transfer Galaxy Page
Hi Todd, [Not sure if this is better suited to galaxy-dev or -user, so I'm sending to both]. galaxy-user is most appropriate for this question because it relates to usage of Galaxy; galaxy-dev is for local installation and tool development questions. My question is - can I create a Galaxy 'Published Page' from my local Galaxy instance/histories, and then transfer that page to the main Galaxy instance? Not currently, though this is in our long-term plan. The reason is that I cannot make my local Galaxy instance public, as I am using a campus resource to host our galaxy. If this is possible, how can I do that? If not, any other ideas? It is possible to move datasets and workflows relatively easily between instances, so I'd recommend that: (a) you move your data and workflows to our public instance; (b) rerun your analyses on the public instance to create the required; (c) create and host the Page on our public instance. You can be assured that we will maintain our public server over the coming years and your Page will remain available and have a stable URL. Also, are there any tutorials/pages on how to create Published Pages in Galaxy in the first place? Not yet, though the idea is for the Page editor to be self-explanatory. Here's how to get started with Pages: (a) from User menu, go to Saved Pages; (b) create a Page; (c) edit the Page using the Web-based editor; there are menus for inserting embedded datasets, workflows, histories, and visualizations as well as performing standard word-processing operations. Let us know if you have problems/questions and we'll start a guide for creating Pages. Best, J.
Re: [galaxy-user] Workshop in Chicago
Scott, Your information is incorrect. The Galaxy Community Conference ( http://wiki.g2.bx.psu.edu/Events/GCC2012 ) will have something for everyone who is working with Galaxy, from sys admins to tool developers to core staff to end users/biologists. Our program is still in flux, and we welcome input about what you'd like to see at the conference at outre...@galaxyproject.org Best, J. From: Scott W. Tighe scott.ti...@uvm.edu Date: April 11, 2012 9:42:08 AM EDT To: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] Workshop in Chicago Dear Galaxy and Admin Staff: I was informed by a few people that the Galaxy workshop in Chicago is really geared to bioinformatics people who know how to write code, not necessarily to general core lab staff who have data analysis needs from time to time. Can anyone shed some light on the subject, please? Scott Tighe -- Core Laboratory Research Staff DNA and Microarray Core Facility 149 Beaumont Ave University of Vermont HSRF 305 Burlington Vermont USA 05045 802-656-2557
Re: [galaxy-user] problems with color settings in Visualizations
Mackenzie, We've fixed this issue in our code base and it should be fixed on our server in the next day or two. Best, J. On Mar 22, 2012, at 3:35 PM, Mackenzie Gavery wrote: Hi, I am working with some saved visualizations, and finding that the color settings are not working today. Specifically, every time I change the color (in Settings) the result is the feature ends up black regardless of the color selection. Could you help me with this issue? Thanks, Mackenzie
Re: [galaxy-user] questions on directionality
Nick, I apologize if this is covered in documentation or help threads. I searched and it seemed it was not. I have several Illumina RNA-seq data sets that should be directional. It seems the directionality is very good, based on the visualization. First question is: in the visualization window, are the reads color coded by direction, i.e. are orange one direction and blue the other? Different colors in read data do indicate strandedness; hover over the track and click on the 'Edit Settings' icon (gear) to see/change the sense/anti-sense colors used. Similar question, is there a way to quantify directionality of the data set? You can use the Filter SAM tool to filter for mapped strand. Good luck, J.
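As a rough illustration of what filtering by mapped strand amounts to, the sketch below tallies forward- and reverse-strand alignments from SAM text. It relies only on the SAM FLAG bits 0x4 (read unmapped) and 0x10 (read mapped to the reverse strand); the function name strand_counts is made up for this example and it is not the Filter SAM tool itself. For paired-end stranded libraries the mate number would also have to be taken into account, which is not done here.

```python
def strand_counts(sam_lines):
    """Count mapped alignments on each strand from an iterable of SAM lines."""
    counts = {"forward": 0, "reverse": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x4:
            continue  # unmapped read, no strand to report
        counts["reverse" if flag & 0x10 else "forward"] += 1
    return counts


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0\tSO:coordinate",
        "r1\t0\tchr1\t100\t255\t36M\t*\t0\t0\t*\t*",
        "r2\t16\tchr1\t200\t255\t36M\t*\t0\t0\t*\t*",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    print(strand_counts(example))
```

Counting strands per annotated gene, and comparing against the gene's own strand, would give a more direct measure of directionality than a genome-wide tally.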
Re: [galaxy-user] Using galaxy for Bacterial RNA-seq
Bomba, I'm not familiar enough with bacterial/prokaryotic transcriptomes to suggest a possible workflow. You might try the standard Tophat-Cufflinks-Cuffcompare/merge-Cuffdiff workflow and see whether you get meaningful results; Tophat runs Bowtie internally, so there's no reason to run Bowtie separately unless there are Bowtie-specific parameters that you need to modify. I've had very little experience with PALMapper and can't speak to its efficacy, either for eukaryotic or prokaryotic transcriptome analyses. Finally, I've cc'd the galaxy-user mailing list. Using this list is the best way to reach the Galaxy user community and get in touch with someone who has used Galaxy to analyze bacterial transcriptomes. Good luck, J. On Feb 16, 2012, at 9:17 AM, Bomba Dam wrote: Dear Dr. Goecks, I am working as a post-doctoral fellow in MPI Marburg, Germany. We are trying to understand the differential expression of genes in a methanotrophic bacterium under different growth conditions. We are sequencing the transcriptome using Illumina Hiseq. As I don't have expertise in programming languages, I found the Galaxy interface very user-friendly for doing such transcriptome analysis. However, I could not find a stepwise protocol/workflow for mapping bacterial RNA-seq against the reference genome (we have the completely sequenced genome of our model organism). I have found a detailed step by step workflow for RNA-seq analysis from the University of Alabama web-site (uab.edu). However, it refers to the eukaryotic system. Most examples provided and used for analysis are from eukaryotic systems. I am a bit confused whether the same workflow will also work well for bacterial systems, as there are no splicing events, or whether I should make some modifications. Can you kindly suggest which workflow I should follow for mapping the bacterial reads (Bowtie, Tophat or PALMapper) and for the subsequent quantification steps? I want some guidance in this regard. With kind regards, Bomba Dam -- Dr. BOMBA DAM Alexander von Humboldt Postdoctoral Research Fellow Max-Planck-Institut für terrestrische Mikrobiologie Karl-von-Frisch-Straße 10 D-35043 Marburg, Germany E mail: bomba@mpi-marburg.mpg.de PHONE: +49 176 321 321 75 (Mobile); +49 6421 178 721 (LAB); +49 6421 2828516 (ROOM) Assistant Professor of Microbiology Department of Botany, Institute of Science Visva-Bharati (A Central University) Santiniketan, West Bengal 731235, India. E mail: bumba_mi...@visva-bhatari.ac.in, bumba_mi...@rediffmail.com;
Re: [galaxy-user] Solution for: Error running cuffdiff. Error: cannot open reference GTF file CONDITION, CONTROL for reading
The problem ended up being the use of Perform Bias Correction (-b) and a GTF file with no Database/Build associated. Looking at the cuffdiff wrapper I found that, if a FASTA reference is not selected from the history, the FASTA reference of the build associated with the GTF file is used. If there is no build association, your cuffdiff run will fail with this not so helpful error. My feeling is, cuffdiff should check for a non-dashed string after '-b' and complain if it is absent, but this doesn't happen currently. Agreed. I implemented the spirit of this functionality via argument checking in galaxy-central changeset 71031bf3105c Best, J.
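The check proposed above (a non-dashed string must follow '-b') could look roughly like the sketch below. This is not the code from the galaxy-central changeset mentioned in the reply, which is not shown in this thread; the function name check_bias_correction_arg and the example file names are hypothetical.

```python
def check_bias_correction_arg(argv):
    """Return the reference path given after -b, or None if -b is not used.

    Raises ValueError when -b is present but is not followed by a value that
    looks like a file path (i.e. the value is missing or starts with a dash).
    """
    if "-b" not in argv:
        return None
    i = argv.index("-b")
    if i + 1 >= len(argv) or argv[i + 1].startswith("-"):
        raise ValueError(
            "Bias correction (-b) requires a reference FASTA file; "
            "select one from the history or set the dataset's Database/Build."
        )
    return argv[i + 1]


if __name__ == "__main__":
    # Hypothetical command lines, for illustration only.
    print(check_bias_correction_arg(["cuffdiff", "-b", "ref.fa", "genes.gtf"]))
    try:
        check_bias_correction_arg(["cuffdiff", "-b", "-L", "CONDITION,CONTROL"])
    except ValueError as err:
        print("error:", err)
```

Failing early with a message like this would replace the "cannot open reference GTF file" error, which is what results when the next option is mistaken for the reference path.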
Re: [galaxy-user] Clustering with cuffcompare or cuffdiff results
1. It seems that it is better to run everything up to cuffdiff, but does cuffdiff allow multiple sample comparison? I read somewhere that even for multiple samples it still compares them pairwise. Cuffdiff supports replicate analysis. In a sense, because I want to do clustering which needs some quantitative data source to do the merging, will cuffdiff provide me some quantitative measures rather than the test score and p-value which is too qualitative to include? Take a look at the Cuffdiff documentation for outputs: http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff_output 2. If I really need to get count data from the FPKM values, how do I obtain the mentioned effective length? Would it be better if I treat each assembled transcript as an object in clustering, rather than genes? What does it mean that you'd be throwing away Cufflinks' uncertainty even with using isoforms as objects? How should I include the uncertainty into my clustering? These FAQs from http://cufflinks.cbcb.umd.edu/faq.html address your questions: -- I want to find differentially expressed genes. Can I use Cufflinks in conjunction with count-based differential expression packages? It's possible, but we strongly advise against this. Current count-based differential expression tools are poorly suited to differential expression analysis in genomes with alternatively spliced genes. The main reason for this is that when a gene has multiple isoforms, a change in the total number of reads or fragments from that gene doesn't always correspond to a change in expression for that gene. Conversely, a gene's expression may change, but the total number of fragments generated by its isoforms may be very similar. In order to detect changes accurately, it's necessary to estimate how many fragments came from each individual splice variant in each sample. Current count-based tools don't do this (to our knowledge - please send us email if you know of one!). Even if they did, fragments that come from parts of genes that are shared by more than one splice variant can't generally be assigned to a single isoform, so the fragment counts for each isoform are only estimates, and there is some uncertainty in the counts. Isoforms that are very similar will have a great deal of uncertainty surrounding their fragment counts. This uncertainty needs to be accounted for when testing for differential expression. So while you could use Cufflinks to estimate isoform-level counts, you'd be throwing away Cufflinks' uncertainty, and thus have more confidence in the differences you see than you really should. This will probably lead to many false positives in your analysis. Furthermore, we do not normalize simply by the length to calculate FPKM but an effective length, as explained in our publications. Calculating counts from FPKM by multiplying by the length will give incorrect results. We strongly encourage you to consider using Cuffdiff to find differentially expressed genes and transcripts. Will you please report how many fragments come from each transcript in a future release? For the foreseeable future, we will not be reporting the number of fragments we think originated from each transcript. People who have asked for this almost always want to use Cufflinks in conjunction with count-based differential expression packages, which is not a good idea. We're trying to keep our output formats as simple as possible. -- Best, J.
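The FAQ's point that a gene's expression can change while its total fragment count stays about the same can be shown with a small made-up example. The isoform names, lengths and counts below are invented for illustration, and fragments divided by effective length is used only as a stand-in for relative abundance; it is not how Cufflinks computes or reports expression.

```python
def gene_total_fragments(fragments):
    """Total fragments for the gene: what a count-based tool would see."""
    return sum(fragments.values())


def gene_relative_abundance(fragments, effective_length):
    """Sum over isoforms of fragments per base, a proxy for transcript abundance."""
    return sum(fragments[t] / effective_length[t] for t in fragments)


# Hypothetical gene with a short and a long isoform.
effective_length = {"short_isoform": 1000, "long_isoform": 3000}
condition_1 = {"short_isoform": 300, "long_isoform": 0}
condition_2 = {"short_isoform": 0, "long_isoform": 300}

if __name__ == "__main__":
    for name, cond in (("condition 1", condition_1), ("condition 2", condition_2)):
        print(name,
              gene_total_fragments(cond),
              gene_relative_abundance(cond, effective_length))
```

Both conditions yield the same gene-level count, yet the abundance proxy differs threefold because the fragments come from isoforms of different length. This is the kind of change a purely count-based comparison at the gene level would miss.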
Re: [galaxy-user] How to get reads counts from cufflins?
Victor, I got the normalized values (FPKM) from cufflinks. And I want to get relative reads counts. How can I do that? It's not clear to me what you're looking for. FPKM is a normalized read count metric where the F stands for fragment, which is a single read (or half of a paired read). Another question: how does cufflinks handle isoform genes while calculating the reads counts? Or what papers can help me understand this? Expectation maximization is used to probabilistically assign reads to isoforms. See the Cufflinks documentation for details and paper links: http://cufflinks.cbcb.umd.edu/ Best, J.
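To give a feel for what probabilistic assignment by expectation maximization means, here is a toy sketch. It is not Cufflinks' implementation: it assumes each fragment comes with a list of isoforms it is compatible with, that fragments are equally likely to start anywhere along an isoform's effective length, and it ignores bias and fragment-length modelling entirely. All names (em_assign, compatibility, effective_length) are made up for this example.

```python
def em_assign(compatibility, effective_length, n_iter=200):
    """Estimate expected fragment counts per isoform with a simple EM loop.

    compatibility: list with one tuple of compatible isoform names per fragment.
    effective_length: dict mapping isoform name to its effective length.
    """
    isoforms = sorted(effective_length)
    n = len(compatibility)
    # theta[t] is the estimated share of all fragments that come from isoform t.
    theta = {t: 1.0 / len(isoforms) for t in isoforms}
    expected = dict.fromkeys(isoforms, 0.0)
    for _ in range(n_iter):
        expected = dict.fromkeys(isoforms, 0.0)
        # E-step: split each fragment among its compatible isoforms.
        for candidates in compatibility:
            weights = {t: theta[t] / effective_length[t] for t in candidates}
            total = sum(weights.values())
            for t, w in weights.items():
                expected[t] += w / total
        # M-step: update the shares from the expected counts.
        theta = {t: expected[t] / n for t in isoforms}
    return expected


if __name__ == "__main__":
    # 30 fragments unique to A, 10 unique to B, 20 compatible with both.
    fragments = [("A",)] * 30 + [("B",)] * 10 + [("A", "B")] * 20
    print(em_assign(fragments, {"A": 1000, "B": 1000}))
```

The ambiguous fragments end up divided in proportion to the evidence from the unambiguous ones, which is why the resulting per-isoform counts are fractional estimates rather than raw counts.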
Re: [galaxy-user] How to get reads counts from cufflins?
Reads are probabilistically assigned, so raw read counts are not available from Cufflinks. Recovering raw fragment counts could be done by reverse-engineering the FPKM value, but Cufflinks doesn't do this for you. If you choose to do this, keep in mind that Cufflinks uses an effective transcript length. Best, J. On Feb 8, 2012, at 11:06 PM, Li, Jilong (MU-Student) wrote: Dear Jeremy, Sorry, I didn't express my question clearly. I got the FPKM normalized values for each gene from cufflinks. And I want to get the original reads counts that were not normalized from cufflinks. Could you please tell me how to get those? Thank you very much! Victor From: Jeremy Goecks [jeremy.goe...@emory.edu] Sent: Thursday, February 09, 2012 4:00 AM To: Li, Jilong (MU-Student) Cc: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] How to get reads counts from cufflins? Victor, I got the normalized values (FPKM) from cufflinks. And I want to get relative reads counts. How can I do that? It's not clear to me what you're looking for. FPKM is a normalized read count metric where the F stands for fragment, which is a single read (or half of a paired read). Another question: how does cufflinks handle isoform genes while calculating the reads counts? Or what papers can help me understand this? Expectation maximization is used to probabilistically assign reads to isoforms. See the Cufflinks documentation for details and paper links: http://cufflinks.cbcb.umd.edu/ Best, J.
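The reverse-engineering mentioned above follows from the usual definition of FPKM as fragments per kilobase of transcript per million mapped fragments. The sketch below assumes that definition, with the effective length in bases and the total number of mapped fragments supplied by the user; neither value is taken from Cufflinks here, the function names are made up, and the result is an estimate rather than an observed count.

```python
def fpkm_from_fragments(fragments, effective_length, total_mapped_fragments):
    """FPKM = 1e9 * fragments / (effective length in bases * total mapped fragments)."""
    return 1e9 * fragments / (effective_length * total_mapped_fragments)


def fragments_from_fpkm(fpkm, effective_length, total_mapped_fragments):
    """Invert the formula above to get an estimated fragment count."""
    return fpkm * effective_length * total_mapped_fragments / 1e9


if __name__ == "__main__":
    # Hypothetical numbers: 500 fragments, effective length 2000 bases,
    # 10 million mapped fragments in the sample.
    fpkm = fpkm_from_fragments(500, 2000, 10_000_000)
    print(fpkm, fragments_from_fpkm(fpkm, 2000, 10_000_000))
```

Using the annotated transcript length in place of the effective length would give a different, incorrect count, which is the caveat raised in the reply.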
Re: [galaxy-user] Walltime exceeded
Peera, Turning off bias correction can significantly shorten Cufflinks runtime. If you still encounter this error, you'll want to use a local or cloud instance of Galaxy: https://bitbucket.org/galaxy/galaxy-central/wiki/GetGalaxy and http://wiki.g2.bx.psu.edu/Admin/Cloud Good luck, J. On Jan 30, 2012, at 3:40 AM, Hemarajata, Peera wrote: Dear all, My Cufflinks jobs keep getting killed due to the walltime limit. Is there a way to fix this or is there anything I can do to reduce the size of my BAM datasets so the analysis can get done? Thank you! Peera Hemarajata
Re: [galaxy-user] Cufflinks merging more than one transcript on bacterial genomes
Noa, This is one thing I would like help with- is it worth simply reducing to nothing the max intron size? What is accepted consensus when using tophat on bacterial genomes? I'm not sure that folks on this list have much experience with bacterial transcriptome analysis. You might try seqanswers.com or try emailing the Tophat/Cufflinks authors directly: tophat.cuffli...@gmail.com If you find something interesting in another place, please feel free to share with the Galaxy community. When I look at the second tophat file, of accepted hits, all hits align nicely with known genes. However, when I run cufflinks I run into the following issues: when I use a reference genome, I get in addition to the known transcripts, a bunch of very long transcripts spanning very large genomic regions. Also, I will have two genes that are very near each other but run in opposite directions (which you can see beautifully in the tophat accepted hits alignments - different colors for each strand) but they merge into a single CUFF identifier. Is there any way I can address this- is it something I am missing with respect to parameters I have to change because I am working on a bacterial genome? Reference genome or reference gene annotation? Using a genome to correct for bias should not change the assembled transcripts, only their expression levels. You can use a reference gene annotation either as ground truth or as a guide; using the reference as ground truth ensures that Cufflinks will only assemble transcripts defined in the annotation. Good luck, J.
Re: [galaxy-user] Trackster errors
Erin, This was due to a temporary issue that has been fixed. However, you'll need to copy the problematic datasets and use the new copy for visualization. To copy datasets, use History Options -- Copy Datasets; you can select the source history as your target history to copy datasets within a history. Thanks, J. On Jan 20, 2012, at 9:57 AM, Erin Shanle wrote: Hello I would like to visualize the tracks of my tophat accepted hits bam file. I ran my first sample that was ~10,000,000 reads and it could be visualized in both Trackster and the UCSC genome browser. When I tried to visualize my other samples (which ranged from 15,000,000 to 35,000,000 reads) it won't show up in trackster because of an indexing error. When I ran Picard statistics on the tophat accepted hits bam output, I see that there was successful alignment of ~90% of the reads. Since I have one sample that works, I am not sure how to address the issue. Here's the error I get from Trackster when I try to visualize the samples: *** glibc detected *** python: double free or corruption (!prev): 0x00ff02b0 *** === Backtrace: = /lib/libc.so.6(+0x71ad6)[0x7ff6561f7ad6] /lib/libc.so.6(cfree+0x6c)[0x7ff6561fc84c] /lib/libc.so.6(fclose+0x14d)[0x7ff6561e8a1d] /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so(bam_index_load_literal+0x37)[0x7ff655cdbb17] /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so(+0x6e838)[0x7ff655cc7838] python(PyObject_Call+0x47)[0x41ef47] python(PyEval_CallObjectWithKeywords+0x43)[0x4a1a53] /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so(+0x52a44)[0x7ff655caba44] python[0x46f0a3] python(PyObject_Call+0x47)[0x41ef47] python(PyEval_EvalFrameEx+0x4878)[0x4a72b8] python(PyEval_EvalCodeEx+0x911)[0x4a95c1] python(PyEval_EvalFrameEx+0x4d12)[0x4a7752] python(PyEval_EvalCodeEx+0x911)[0x4a95c1] 
python(PyEval_EvalCode+0x32)[0x4a9692] python(PyRun_FileExFlags+0x13e)[0x4c98be] python(PyRun_SimpleFileExFlags+0xd4)[0x4c9ad4] python(Py_Main+0x9ed)[0x41a6bd] /lib/libc.so.6(__libc_start_main+0xfd)[0x7ff6561a4c4d] python[0x4198d9] === Memory map: 0040-0061d000 r-xp fe:00 715090 /usr/bin/python2.6 0081d000-0087f000 rw-p 0021d000 fe:00 715090 /usr/bin/python2.6 0087f000-0088e000 rw-p 00:00 0 00d1a000-0109a000 rw-p 00:00 0 [heap] 7ff65000-7ff650021000 rw-p 00:00 0 7ff650021000-7ff65400 ---p 00:00 0 7ff6556ed000-7ff655702000 r-xp fe:00 1512749 /lib/libgcc_s.so.1 7ff655702000-7ff655902000 ---p 00015000 fe:00 1512749 /lib/libgcc_s.so.1 7ff655902000-7ff655903000 rw-p 00015000 fe:00 1512749 /lib/libgcc_s.so.1 7ff655903000-7ff655933000 r-xp 00:13 1154848 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/ctabix.so 7ff655933000-7ff655a32000 ---p 0003 00:13 1154848 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/ctabix.so 7ff655a32000-7ff655a37000 rw-p 0002f000 00:13 1154848 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/ctabix.so 7ff655a37000-7ff655a39000 rw-p 00:00 0 7ff655a39000-7ff655a55000 r-xp fe:00 738300 /usr/lib/python2.6/lib-dynload/_ctypes.so 7ff655a55000-7ff655c55000 ---p 0001c000 fe:00 738300 /usr/lib/python2.6/lib-dynload/_ctypes.so 7ff655c55000-7ff655c59000 rw-p 0001c000 fe:00 738300 /usr/lib/python2.6/lib-dynload/_ctypes.so 7ff655c59000-7ff655d29000 r-xp 00:13 1154847 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so 7ff655d29000-7ff655e29000 ---p 000d 00:13 1154847 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so 7ff655e29000-7ff655e39000 rw-p 000d 00:13 1154847 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so 7ff655e39000-7ff655ec rw-p 
00:00 0 7ff655fc-7ff656185000 rw-p 00:00 0 7ff656186000-7ff6562de000 r-xp fe:00 1514734 /lib/libc-2.11.2.so 7ff6562de000-7ff6564dd000 ---p 00158000 fe:00 1514734 /lib/libc-2.11.2.so 7ff6564dd000-7ff6564e1000 r--p 00157000 fe:00 1514734 /lib/libc-2.11.2.so 7ff6564e1000-7ff6564e2000 rw-p 0015b000 fe:00 1514734
Re: [galaxy-user] How to find out SNPs and point mutations in RNA-Seq data using Galaxy?
Wei, The pileup tool will help you find SNPs in your data; you'll want to read the documentation to understand how best to use it for your needs. You can also try the Unified Genotyper on our test server ( http://test.g2.bx.psu.edu/ ), but it's in alpha/beta status and we aren't providing any support for it yet. Good luck, J. On Jan 9, 2012, at 1:28 AM, ericliao...@gmail.com ericliao...@gmail.com wrote: Hi, I am new to RNA-seq, and the only resource available to me for analysis is the Galaxy server. I want to find SNPs and point mutations in RNA-Seq data using Galaxy (I do not know if anyone uses RNA-seq data to find point mutations, because there is whole-genome sequencing for reporting mutations and SNPs). I have been searching the forum for a step-by-step protocol for doing it, but could not find one. I have one normal sample and two cancer samples, with a TopHat-produced accepted Hits.bam file for each one. I want to find SNPs and point mutations in the cancer samples, so how do I go from here? Can anyone show me how to do it on the Galaxy main server? Thanks! Wei
Re: [galaxy-user] Make Galaxy continue running when i close the browser
Efthymios, You'll want to run Galaxy as a daemon process. Run % sh run.sh --help to get more information on running Galaxy as a daemon. Also, please direct questions about running/configuring a local Galaxy instance to galaxy-dev (cc'd) rather than galaxy-user, which is for tool and analysis questions. Best, J.
On Jan 5, 2012, at 8:06 AM, Makis Ladoukakis wrote: Dear Galaxy users, I have installed a local Galaxy instance on a server and I use it to run certain genomic assembly workflows. With larger datasets, however, completion may take up to one day. How can I make Galaxy continue the operation even when I close the browser? Is that possible on a local instance or on the main server? Thank you, Efthymios Ladoukakis
Re: [galaxy-user] Running cufflinks on a genome without a bowtie index
Noa, Using your FASTA in Tophat and Cufflinks is the correct approach. You don't need to provide an annotation file in Cufflinks, and you can also avoid using your FASTA in Cufflinks by not using bias correction. If you're still having problems, the issue is likely your parameter choices in Tophat and/or Cufflinks. You'll want to read the documentation carefully to choose parameters appropriately for your data. Good luck, J.
On Dec 22, 2011, at 5:09 AM, Noa Sher wrote: Hi, I am trying to run Cufflinks on a genome without a Bowtie index. How do I make my own index? I have a FASTA file of the genome, but if I run Tophat using just that and then Cufflinks using a GTF file of the transcriptome, I get zero for all FPKM values. Thanks, Noa
Re: [galaxy-user] suggestions for de novo assembly plant transcriptome without reference
Baohua and Jane, As David noted, there is a Trinity wrapper for Galaxy, it works, and Trinity is great. However, Trinity is not enabled/installed on our public server (main.g2) or on Galaxy cloud instances (Amazon) right now. You'll need a little programming expertise to set up Trinity locally or on the cloud. Also, the Galaxy team's support for Trinity is very minimal right now as we haven't done much testing with it yet. Good luck, J.
On Dec 21, 2011, at 4:01 PM, David Matthews wrote: Hi Jane, I have used Trinity on a local installation here at Bristol University. The main reason it's not on Galaxy main is that it's very memory intensive (we run it on nodes with 256GB RAM), so you really need access to a big machine to run it. Having said all that, the output is astoundingly good, so it's worth the time and effort to get a run going if you can. Cheers, David
On 21 Dec 2011, at 13:36, Jane Song wrote: Dear Galaxy Expert, I would like to use Galaxy for de novo assembly of single-end Illumina reads (140bp) for plant transcriptomes (without a reference). I remember earlier emails mentioning Trinity in Galaxy, but I could not find it on the Galaxy website http://main.g2.bx.psu.edu/root . Maybe it is installed on Amazon EC2? Are there other suggestions for de novo assembly of plant transcriptomes without a reference? Many thanks and look forward to hearing back from you, Jane
Re: [galaxy-user] Extract genomic DNA
Rebecca, You should be able to use a custom genome with this tool by selecting History from the Source for Genomic Data parameter. The bug you're describing has, to the best of my knowledge, been fixed in Galaxy and should not be present anymore. On which Galaxy instance are you seeing this issue? If this is not the main Galaxy server ( http://main.g2.bx.psu.edu/ ), you'll want to contact the maintainers of the instance that you're using and ask them to update the instance. Best, J.
On Dec 14, 2011, at 1:31 PM, Rebecca C Mueller wrote: Hi all, Does anyone know if you can use the Extract Genomic DNA command with a genome not in the database? I am working with an algal genome (C. merolae) that isn't currently in the pulldown Database/Build menu. I keep getting the Unspecified genome build error, and am assuming that's the problem, as my other files appear to be formatted correctly (tab delimited without spaces for the intervals, same names for chromosomes in interval and FASTA file, etc). Thanks! Rebecca
Re: [galaxy-user] Changing Bowtie parameters in TopHat
> Thanks for your help. I'm mapping reads from one organism to a related but different organism, so some of the parameters I'd like to adjust are to relax mapping stringency, specifically:
>   -n 3 (allow 3 mismatches in seed)
>   -e 250 (allow cumulative phred score of 250 [or some other value depending on read length] for mismatches in remainder of read)
> I'd also like to only report alignments that are unambiguously mapped to a single location, so:
>   -m 1 --best on --strata on
> It sounds like I need to read the documentation again, but it didn't look at first glance like I could specify these things.
Yes, reading the documentation is highly recommended. You can definitely specify -m, but you may have to think creatively about how to modify Tophat's available parameters to meet your needs. You might also contact the Tophat authors directly and see if they have any suggestions: tophat.cuffli...@gmail.com Good luck, J.
Re: [galaxy-user] Changing Bowtie parameters in TopHat
> Jeremy, My apologies if this has been covered before but I am using Galaxy Main and wonder if, when running TopHat, you can modify the mapping parameters used by Bowtie?
Not all Bowtie parameters can be modified when running Tophat. Which parameters are you looking to modify and why?
> It seems that the full parameter list for TopHat pertains only to the reads that aren't mapped by Bowtie (the reads spanning splice junctions).
This should not be the case. For instance, documentation for the max-multihits discusses multiple hits when mapping reads junctions/segments: http://tophat.cbcb.umd.edu/manual.html If you're seeing different results, it may be a bug that could be discussed with the Tophat authors: tophat.cuffli...@gmail.com
> Is there a way to access the full parameter list of Bowtie through TopHat?
Not currently.
> Or perhaps run Bowtie directly, then feed this into a TopHat run?
I don't think this is possible b/c Tophat uses the reads that map initially to build the coverage islands and then uses these islands to generate an index of potential splice junctions. Best, J.
Re: [galaxy-user] generic filenames with Export to File
> There is currently no way to do this but it would definitely be a useful option to have. I've opened a ticket that you can follow and/or comment on if you're interested: https://bitbucket.org/galaxy/galaxy-central/issue/680/preserve-dataset-names-when-exporting
I forgot to mention that you can inspect the datasets_attrs.txt to see the mapping between datasets and files. datasets_attrs.txt contains a JSON dict, so it would be possible to write a little script that renames datasets based on the values in the dict. J.
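A minimal sketch of the kind of renaming script described above, in Python. It assumes each record in datasets_attrs.txt carries a `file_name` key (the exported file) and a `name` key (the dataset name shown in the history); those key names, and the function name `plan_renames`, are assumptions for illustration only, so inspect your own export before relying on them.

```python
import json
import os
import re


def plan_renames(attrs_text):
    """Return (exported_file, readable_name) pairs from datasets_attrs.txt text.

    ASSUMPTION: every record has 'file_name' and 'name' keys. This is a
    guess at the layout, not a documented Galaxy guarantee.
    """
    data = json.loads(attrs_text)
    # Accept either a dict of records or a plain list of records.
    records = data.values() if isinstance(data, dict) else data
    plan = []
    for record in records:
        exported = os.path.basename(record["file_name"])
        ext = os.path.splitext(exported)[1]
        # Replace characters that are awkward in file names.
        safe = re.sub(r"[^A-Za-z0-9._-]+", "_", record["name"]).strip("_")
        plan.append((exported, safe + ext))
    return plan
```

The function only builds a rename plan; applying it would be a loop of `os.rename(old, new)` calls inside the directory of the extracted archive, after checking that no two datasets map to the same new name.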
Re: [galaxy-user] Permissions and private roles
> Also, another question about permissions. If I create a Galaxy page and share that with limited users, then it appears that the datasets are all public via a URL; is that correct?
Yes, all datasets are public via URL by default in Galaxy, and a Galaxy Page makes it easy to find this URL. Without knowing the dataset hash id and/or the instance's secret key, it's very difficult to guess a URL that leads to a valid dataset. To change a dataset's permissions, click on the pencil (Edit attributes) and scroll to the bottom of the attributes page. Thanks, J.
Re: [galaxy-user] Problem with Cuffcompare
Chandu, I've deleted my copy of your history to save space as your history was quite large. Please rerun Cuffcompare with the modifications I suggested below. Thanks, J.
On Oct 13, 2011, at 4:26 PM, Chandu Galaxy wrote: Thank you very much Jeremy. Can I have a look at the rerun datasets?
On Thu, Oct 13, 2011 at 12:12 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote: Chandu, There are two problems: (1) you mapped your reads to AgamP3, but the dbkey for all of your Cufflinks datasets is anoGam1. This should not have happened automatically with Galaxy, but I'm looking into the issue now. Did you do this yourself? (2) Galaxy does not have sequence data for anoGam1, and this directly led to the problem that you're seeing. I corrected the problem by manually assigning build AgamP3 to your Cufflinks datasets and then rerunning Cuffcompare. In the future, I expect that we'll add anoGam1 data to our public server, but it's not clear when this will occur. Thanks, J.
On Oct 12, 2011, at 5:35 PM, Chandu Galaxy wrote: Thank you Jeremy. I've shared my History named 'Mosquito Work: RNA-Seq analysis 2' with you just now. Please see the datasets from 1-58 (also see deleted datasets). Thanks. -- Chandu
On Wed, Oct 12, 2011 at 2:02 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote: Chandu, Are you running your analysis on our public server ( main.g2.bx.psu.edu )? If so, can you share your history with me please (Options--Share/Publish--Share with a User--my email address). Thanks, J.
On Oct 11, 2011, at 4:26 PM, Chandu Galaxy wrote: Thank you for the response. I can't check my reference genome dataset because I'm using the reference provided by Galaxy (Mosquito (Anopheles gambiae): AgamP3). Is there any solution? Thank you. -- Chandu
On Mon, Oct 10, 2011 at 7:15 AM, Jeremy Goecks jeremy.goe...@emory.edu wrote:
Tool execution generated the following error message: Error running cuffcompare. Warning: Your version of Cufflinks is not up-to-date. It is recommended that you upgrade to Cufflinks v1.1.0 to benefit from the most recent features and bug fixes (http://cufflinks.cbcb.umd.edu). No fasta index found for ./input1. Rebuilding, please wait.. Error: sequence lines in a FASTA record must have the same length!
Chandu, Cufflinks/compare/diff requires that your reference genome dataset have the following format:
    >my_chrom
    AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGT
    AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGT
    AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGT
    ...
Note that all lines of sequence data have the same length. The problem you're seeing is because there are lines in your sequence data that are not the same length, e.g.
    >my_chrom
    AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGT
    AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTA
    AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCG
    ...
The FASTA Width tool in Galaxy can help you format your dataset correctly. Good luck, J.
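For a FASTA file handled outside Galaxy, the same fix can be sketched in Python. This is an illustration of the idea only, not the FASTA Width tool's actual implementation; the function name `rewrap_fasta` and the default width of 60 are assumptions. It joins the sequence lines of each record and re-splits them so that every line has the same length, with only the final line of a record allowed to be shorter.

```python
def rewrap_fasta(lines, width=60):
    """Yield FASTA lines with each record's sequence re-wrapped to `width`.

    Every sequence line comes out exactly `width` characters long, except
    possibly the last line of a record, which holds the remainder.
    """
    chunks = []  # sequence pieces collected for the current record

    def flush():
        seq = "".join(chunks)
        for i in range(0, len(seq), width):
            yield seq[i:i + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header: emit the previous record's sequence first.
            yield from flush()
            chunks.clear()
            yield line
        else:
            chunks.append(line)
    yield from flush()
```

Typical use would be to read the original file, pass its lines through `rewrap_fasta`, and write the result to a new file that is then uploaded as the reference.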
Re: [galaxy-user] de novo assembly
Cecilia, Are you trying to use the Public Galaxy or a local install? There are several assemblers with Galaxy Wrappers on the Galaxy ToolShed (e.g. Roche Newbler, and MIRA 3) which you could add to your own local Galaxy if you have one. There are wrappers for ABySS as well. These assemblers are generally for genome data. For transcriptome data, galaxy-central provides a wrapper for the Trinity assembler. However, de novo genome assembly can be very computationally demanding, so not many Galaxy Instances will want to offer it. If you don't want to/can't set up a local instance for assembly, consider using a cloud instance: http://wiki.g2.bx.psu.edu/Admin/Cloud Good luck, J.
Re: [galaxy-user] rna-seq mutation detection
Rich, You can convert base quality scores using the FASTQ Groomer tool. Note that Galaxy tools typically work with Sanger (Phred+33) quality scores. Good luck, J.
On Aug 29, 2011, at 10:48 PM, Richard Mark White wrote: Hi, Thanks very much. I've tried this, but one thing I have noticed is that if I do the initial mapping with BWA vs. Bowtie, the number of variants I get is much larger with BWA. I have seen mention on the web that you need to change the quality score annotation for BWA before running SAMtools, but I'm not sure precisely how to do this. Any thoughts? Rich
From: Jeremy Goecks jeremy.goe...@emory.edu
To: Richard Mark White whit...@yahoo.com
Cc: galaxy-user@lists.bx.psu.edu
Sent: Sunday, August 28, 2011 2:18 PM
Subject: Re: [galaxy-user] rna-seq mutation detection
Rich, Given that you're analyzing your RNA-seq data using Galaxy, I'd guess that you're using Tophat to map your reads onto a reference genome. If this is the case, then you can use the BAM files produced by Tophat to generate variation data for each sample. The variation tools that you'll want to look at are
[NGS: SAM Tools--]Generate Pileup
[NGS: GATK Tools--]Unified Genotyper (only available on our test server and still in beta)
The outputs for each tool produce a consensus base for each potential variation site, and you can compare the consensus base for each sample to look for differences. If you're doing de novo assembly of your RNA-seq data to look for variation, you'll need to use tools that are not currently available in Galaxy. Good luck, J.
On Aug 18, 2011, at 12:22 PM, Richard Mark White wrote: Hi, I am trying to look at differences between two RNA-seq samples to see if there are mutations in one of them relative to the other (i.e. not in comparison to a reference genome). Does anyone know of a way to do this within Galaxy? Any help is appreciated! rich
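To make the quality-score conversion concrete, here is a small Python sketch of what re-encoding to Sanger (Phred+33) involves. It assumes the input reads use the Phred+64 encoding of older Illumina pipelines; the thread does not say which encoding Rich's reads actually use, so that is an assumption, and FASTQ Groomer handles more input variants than this. The function name is illustrative.

```python
def phred64_to_phred33(quality):
    """Convert a Phred+64 quality string to Sanger (Phred+33) encoding.

    ASSUMPTION: the input really is Phred+64. Characters below '@'
    (score < 0) are rejected rather than silently converted.
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # decode the Phred score
        if score < 0:
            raise ValueError(f"{ch!r} is not a valid Phred+64 character")
        converted.append(chr(score + 33))  # re-encode with offset 33
    return "".join(converted)
```

Only the quality line of each FASTQ record changes; the read identifiers and base calls stay as they are.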
Re: [galaxy-user] rna-seq mutation detection
Rich, Given that you're analyzing your RNA-seq data using Galaxy, I'd guess that you're using Tophat to map your reads onto a reference genome. If this is the case, then you can use the BAM files produced by Tophat to generate variation data for each sample. The variation tools that you'll want to look at are
[NGS: SAM Tools--]Generate Pileup
[NGS: GATK Tools--]Unified Genotyper (only available on our test server and still in beta)
The outputs for each tool produce a consensus base for each potential variation site, and you can compare the consensus base for each sample to look for differences. If you're doing de novo assembly of your RNA-seq data to look for variation, you'll need to use tools that are not currently available in Galaxy. Good luck, J.
On Aug 18, 2011, at 12:22 PM, Richard Mark White wrote: Hi, I am trying to look at differences between two RNA-seq samples to see if there are mutations in one of them relative to the other (i.e. not in comparison to a reference genome). Does anyone know of a way to do this within Galaxy? Any help is appreciated! rich
Re: [galaxy-user] Cufflinks quartile normalization
David, Quartile normalization is explained in the Cufflinks manual: http://cufflinks.cbcb.umd.edu/manual.html -- With this option, Cufflinks normalizes by the upper quartile of the number of fragments mapping to individual loci instead of the total number of sequenced fragments. This can improve robustness of differential expression calls for less abundant genes and transcripts. My reading of this is that the M in FPKM is taken from the upper quartile rather than the total; if the FPKM numbers for highly expressed isoforms change substantially, that suggests many of your reads are mapping to minimally expressed isoforms. Without knowing more about your experiment, it's not possible to say whether you should be doing quartile normalization. However, given that it's designed for DE calls for less abundant isoforms, you'll want to see whether this holds true for your dataset(s) and whether Cuffdiff DE tests make sense in the context of your research questions. Good luck, J.
On Aug 25, 2011, at 1:49 PM, David Joly wrote: Can someone help me understand the quartile normalization in Cufflinks? I read different threads in which people reported that the FPKM values were inflated after normalization (-N), but most didn't report their values, so I don't know how big the inflation should be... In my case, the difference is huge! The FPKM values for the first four genes without normalization are in the range of [61 - 184], while after normalization they are in the range of [2.4e+6 - 7.4e+6]. Even though this inflation does not seem to affect the calculation of the gene expression changes [ log (FPKM2/FPKM1) ], I'm wondering if something is wrong with my dataset. Is it what I should expect? Is it always better to use the normalization? Thanks, David
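The reading given in the reply (the "M" in FPKM comes from the upper quartile of per-locus fragment counts instead of the total) can be sketched in Python. This illustrates that reading only; it is not Cufflinks' actual implementation, and the nearest-rank percentile, the function names, and the toy counts are all assumptions. Under this reading every FPKM value is multiplied by the same factor, total / upper quartile, which is consistent with the reported ranges: 2.4e6 / 61 and 7.4e6 / 184 are both roughly 4e4, and a constant factor cancels in log(FPKM2/FPKM1).

```python
def fpkm(fragments, length_bp, norm_count):
    """Fragments per kilobase of transcript per million normalizing fragments."""
    return fragments * 1e9 / (length_bp * norm_count)


def upper_quartile(counts):
    """75th percentile of per-locus fragment counts (nearest-rank method).

    ASSUMPTION: nearest-rank is an illustrative choice of percentile rule.
    """
    ordered = sorted(counts)
    rank = max(1, -(-3 * len(ordered) // 4))  # ceil(0.75 * n)
    return ordered[rank - 1]


# Toy per-locus fragment counts (made up for illustration).
locus_counts = [10, 20, 40, 80, 160, 320, 640, 1280]
total = sum(locus_counts)
uq = upper_quartile(locus_counts)

plain = fpkm(160, 2000, total)  # normalized by total fragments
quart = fpkm(160, 2000, uq)     # normalized by the upper quartile
inflation = quart / plain       # same factor for every gene: total / uq
```

With real data the total is in the millions while the upper-quartile count is often in the hundreds or less, so a factor in the tens of thousands is plausible under this reading.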
Re: [galaxy-user] Cufflinks with reference annotation and without reference annotation
Crystal, If you provide a gene annotation to Cufflinks, the transcripts produced will match those in the annotation exactly. If you assemble without a gene annotation, the transcripts produced will match the reference in some cases, but, in others, will not match the reference due to small and/or large errors. Because '=' denotes an exact match between an assembled transcript and a reference transcript, more '=' are to be expected when Cufflinks has a gene annotation. Finally, a couple of procedural issues: *please send questions about analyses and tool usage to the galaxy-user mailing list, not galaxy-dev or individual developers; *please do not send duplicate emails as it can confuse our tracking system and slow down our response rather than speed it up. Good luck, J.
On Aug 17, 2011, at 9:14 AM, Crystal Goh wrote: Hi, I am Crystal. I have a problem with Cuffdiff output and hope to get some advice. Thanks. After aligning RNA-seq reads with Tophat, I used the Tophat output for Cufflinks. For Cufflinks, I tried two approaches and compared the results:
1st approach: use the zebrafish Ensembl GTF as reference annotation
2nd approach: without reference annotation.
From the output of the above 2 approaches, I continued with Cuffcompare (with reference annotation) and Cuffdiff. The attached Word document has the workflow and parameters I set for these 2 approaches. When I compared the output of Cuffdiff between these 2 approaches, a total of 48584 tracking ids with class code '=' were observed in the transcript FPKM tracking file from Approach 1, whereas there are only 1248 tracking ids with class code '=' from Approach 2 (I attached the transcript FPKM tracking files from approaches 1 and 2). In my opinion, I should observe 48584 tracking ids with class code '=' and additional tracking ids with other class codes in the transcript FPKM tracking file from Approach 2. Can I get advice on this? Thank you. Best regards, Crystal
Attachments: Workflow and parameter for 2 approaches.zip; Approach 1 Transcript FPKM tracking (Cufflinks with reference annotation).zip; Approach 2 Transcript FPKM tracking (Cufflinks without reference annotation).zip
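The comparison Crystal describes (how many tracking ids carry each class code) can be reproduced outside Galaxy with a few lines of Python. This sketch assumes the transcript FPKM tracking file is tab-delimited with a header row that includes a `class_code` column; check the header of your own file, since that column name is an assumption here, as is the function name.

```python
import csv
from collections import Counter


def count_class_codes(tracking_lines):
    """Count transcripts per class code in an FPKM tracking file.

    ASSUMPTION: tab-delimited input whose header row has a 'class_code'
    column. `tracking_lines` is any iterable of text lines (e.g. a file).
    """
    reader = csv.DictReader(tracking_lines, delimiter="\t")
    return Counter(row["class_code"] for row in reader)
```

Running it on the tracking file from each approach and comparing the two counters shows how many '=' transcripts each run produced.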
Re: [galaxy-user] Any thing wrong with my cufflink process in galaxy?
Yao, It's difficult to tell what's wrong without seeing your analysis. However, you may want to use the reference annotation during the Cufflinks phase to either estimate isoform expression or guide assembly (this option will appear on our public server soon). Read the Cufflinks documentation to understand these options and what they do for your assembly and FPKM values: http://cufflinks.cbcb.umd.edu/manual.html#cufflinks De novo assembly from mapped reads is often somewhat imprecise and incomplete, especially for low-coverage data. It's not surprising that a de novo assembly doesn't match especially well with the reference. If you're still not seeing any differential expression after using the reference GTF in Cufflinks, Cuffcompare, and Cuffdiff, you may want to email the Cufflinks/compare/diff authors and ask for some pointers: tophat.cuffli...@gmail.com Good luck, J.
On Aug 10, 2011, at 5:07 AM, yao chen wrote: Dear all: Recently, I ran Cufflinks in Galaxy on the internet. I want to compare two samples. However, I found that no transcript or gene passed the significance level, even though many of them have a large FPKM in one sample and 0 FPKM in the other. Any thoughts? Below is my Cufflinks process: I have four samples belonging to two groups. The test group has three samples, and the control has one sample. First, using accepted_hits.bam from Tophat, I ran Cufflinks without annotation on each sample. Then, for the four GTF files from the four samples, I ran Cuffcompare to combine these transcripts and compare them to the annotated genome. However, at this step, I found the transcript accuracy is very low. See one example:
Missed exons: 10673/11776 (90.6%)
Wrong exons: 1254/2007 (62.5%)
Missed introns: 8529/8637 (98.7%)
Wrong introns: 2/5 (40.0%)
Missed loci: 0/504 (0.0%)
Wrong loci: 1248/2002 (62.3%)
Finally, I ran Cuffdiff between the two sample groups. Thank you.
Re: [galaxy-user] cuffmerge question
Carol, My question is, if I use the public Galaxy server interface to TopHat and Cufflinks, is there any access to cuffmerge? No, Cuffmerge is not available in Galaxy. Also, I'm trying to understand the difference between using cuffmerge and then using cuffcompare (without a reference genome) to assemble gtf transcript files produced by Cufflinks for each group of 3 Illumina paired-end reads corresponding to biological replicates, in order to use the resulting combined gtf file for comparing the TopHat alignments of two such groups using cuffdiff. Is there any difference in the output between cuffdiff and cuffcompare, using in this fashion? For example, do they form the union of transcripts by the same rules, and do their outputs contain (or lack) the same columns (strand, perhaps??) I've read things on seq-answers indicating that I should be using cuffmerge, but I can't find it on the public server and apparently haven't installed it properly on my own computer so far. From the Cufflinks/compare/merge/diff documentation ( http://cufflinks.cbcb.umd.edu/manual.html#cuffmerge ): *Cuffmerge calls Cuffcompare and does some filtering of transfrags as well as merging of novel and known isoforms; *The main purpose of this script is to make it easier to make an assembly GTF file suitable for use with Cuffdiff. Hence, it appears that Cuffmerge and Cuffcompare are relatively similar and use the same basic union algorithm--whatever Cuffcompare uses. If you have more detailed questioned, you might ask the Cufflinks' authors: tophat.cuffli...@gmail.com Good luck, J.___ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client. 
Re: [galaxy-user] visualization
Jiannong,

Hans is on the right track. You can indeed visualize your data using Trackster, Galaxy's genome browser; Trackster is available via the Visualization tab. Here are the steps needed to visualize your dataset:

(1) Use the [FASTA Manipulation -- Compute Sequence length] tool to compute the lengths of the contigs in your build;
(2) if there are spaces in your contig names, you'll need to use only the characters before the first space as contig names because this is what the mapping tools do; use [Text Manipulation -- Convert delimiters to TAB] and then [Text Manipulation -- Cut] to cut the first and last column from the dataset; now you should have a file in the form contig_name<TAB>contig_length
(3) Create a custom build: (a) User tab -- Custom Builds; (b) scroll to the bottom, enter a name and key for your build and copy in the contig names and lengths you created in steps (1) and (2);
(4) Set the dbkey for the dataset(s) that you want to visualize by clicking on the pencil icon for each dataset and selecting your custom dbkey.
(5) Use the Trackster icon next to a dataset--see attached screenshot--and insert the dataset into a new browser.

Let us know if you have any problems. And, yes, we're working to make this process much easier.

Best,
J.

inline: Screen shot 2011-08-04 at 11.09.34 AM.png

On Aug 4, 2011, at 3:13 AM, Hans-Rudolf Hotz wrote:

Hi

Assuming you know the length of your contigs, you can add them as a custom build. Click on: 'Visualization' - 'New Track Browser' - 'Add a Custom Build'

Hope this helps.

Regards, Hans

On 08/04/2011 01:51 AM, vasu punj wrote:

IGV should allow you to do this but not sure about trackbrowser in Galaxy.

Vasu

--- On Wed, 8/3/11, Jiannong Xu j...@nmsu.edu wrote:

From: Jiannong Xu j...@nmsu.edu
Subject: [galaxy-user] visualization
To: galaxy-user@lists.bx.psu.edu
Date: Wednesday, August 3, 2011, 6:03 PM

Hi Jen,

I mapped Illumina reads to draft genomic contigs, and try to visualize the mapping.
Is there any way I can use my own reference contigs for visualization?

Thanks
John Xu
NMSU
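The contig_name<TAB>contig_length file described in the steps of this thread can also be prepared outside Galaxy. The following is only a minimal sketch under stated assumptions: the input is an ordinary FASTA file, and the function names (contig_lengths, to_len_file) are made up for illustration; they are not Galaxy tools. It keeps only the characters before the first space in each header, as recommended in the reply.

```python
import io


def contig_lengths(fasta_lines):
    """Yield (name, length) for each record in a FASTA stream.

    The name is cut at the first whitespace character, matching the
    advice that mapping tools use only that part of the header.
    """
    name, length = None, 0
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, length
            header = line[1:].split()
            name = header[0] if header else ""
            length = 0
        elif name is not None:
            length += len(line.strip())
    if name is not None:
        yield name, length


def to_len_file(fasta_lines):
    """Return the text to paste into the custom build form."""
    return "".join(f"{n}\t{l}\n" for n, l in contig_lengths(fasta_lines))


# Tiny made-up example; a real run would pass an open FASTA file instead.
example = io.StringIO(">contig1 draft assembly\nACGT\nAC\n>contig2\nGGG\n")
len_text = to_len_file(example)
print(len_text, end="")
```

The output can be pasted into the length box of the custom build form; whether the form accepts exactly this layout on a given Galaxy release should be checked against that release.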
Re: [galaxy-user] Questions on CuffDiff Output and Browser Visualization
Kurinji,

> 1. when I look at my differentially expressed transcripts file (generated using ensembl hg19 as a reference with chr added on to obtain results with ensembl gene names) and search for specific genes that I am interested in I can not find them in my cuffdiff output file - even though I can visualize these genes on IGV and they look obviously differentially regulated. Also, given that the cuffdiff output for differentially expressed transcripts does list all transcripts, including the ones that have not significantly changed, wouldn't transcripts for these genes be listed anyway, even if my visual ballparking on differential regulation is not statistically significant? I would really like to know why I am missing genes from my cuffdiff output.

It's not possible to answer this question in general because it's specific to your analysis; in particular, your use of a reference annotation file is going to influence Cuffdiff's outputs. You might try using positional information rather than gene names when searching through Cuffdiff files, as the gene short name/ID is only used for known transcripts/genes. More detailed questions are probably best directed to the Cufflinks authors: tophat.cuffli...@gmail.com

> 2. do you all get a good correlation between the top differentially expressed transcripts/genes generated from cuffdiff and how the data looks when visualized on IGV - i.e. do your upregulated transcripts really look upregulated when visualizing? I found that while some validate visually, some do not which is confusing

Cufflinks uses multiple statistical techniques to estimate FPKM and differential expression; in some cases, it may not be possible to visually observe differential expression amongst transcripts. Alternatively, setting additional parameters (e.g. normalization) may lead to results that match what you're looking for (visually or otherwise).

> 3. when visualizing on a browser, and if different transcripts for one gene are regulated differently - i.e.
> some are up in your treated sample but some are down for the same gene - how can you tell which transcript ID from cuffdiff corresponds with what you are seeing?

This information can be found in Cuffdiff's transcript FPKM tracking and differential expression testing files.

Good luck,
J.
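The suggestion in this thread to search Cuffdiff files by position rather than by gene name can be sketched as follows. This is a hedged illustration, not a Galaxy or Cufflinks tool: it assumes a tab-delimited file with a header row and a column named locus holding values such as chr1:90301-90706; the helper names, the other column names and the example rows are invented.

```python
import csv
import io


def parse_locus(locus):
    """Split 'chr1:90301-90706' into ('chr1', 90301, 90706)."""
    chrom, _, span = locus.rpartition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def rows_overlapping(tsv_lines, chrom, start, end, locus_column="locus"):
    """Return the rows whose locus overlaps chrom:start-end."""
    hits = []
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        c, s, e = parse_locus(row[locus_column])
        if c == chrom and s <= end and e >= start:
            hits.append(row)
    return hits


# Invented rows; only the locus column layout is taken from the thread.
example = io.StringIO(
    "test_id\tgene\tlocus\tstatus\n"
    "T1\t-\tchr1:90301-90706\tOK\n"
    "T2\t-\tchr1:135255-135896\tOK\n"
    "T3\t-\tchr2:90301-90706\tOK\n"
)
hits = rows_overlapping(example, "chr1", 90000, 100000)
print([row["test_id"] for row in hits])
```

Searching by coordinates sidesteps the case where a transcript has no gene short name because it was not matched to the reference annotation.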
Re: [galaxy-user] Cuffdiff Question
Hello Kurinji,

> I was at your USC Galaxy seminar last week, which I found very helpful - thank you!

Glad to hear that you found the workshop helpful. As a reminder, please email questions about using Galaxy and its tools to the galaxy-user mailing list (which I've cc'd). You may get quicker and different responses from community members, and everyone will benefit from the discussion.

> I used my recently generated RNAseq data in Galaxy (which was pre-aligned using tophat and already had cufflinks run on it) - I ran cuffcompare with all the gtf files and then cuffdiff for the three pairs (there is 1 control and 3 different drug treatments - no replicates). I got several output files, as expected, but decided just to look at the gene differential expression as a start. Some questions I have are -
>
> 1. (very basic question!) which is sample 1 (and corresponding value 1) and sample 2 (and corresponding value 2) in my output file. This is what my output file is called -
> 90: Cuffdiff on data 37, data 38, and data 60: gene differential expression testing 33,969 lines
> Is 37 sample one or sample two? Given the data - I would expect sample 37 to correspond to value 2 - but I could be wrong. Please let me know!

The best way to figure out which dataset corresponds with Cuffdiff's labels is to click the rerun button in the dataset: sample names correspond directly to the reads datasets (i.e. BAM files) provided as input to Cuffdiff.

> 2. How do I find the UCSC gene names corresponding with start/end sites - I did input the hg18 UCSC gtf file as a reference

You'll need to use a reference annotation (GTF file) that has the gene_name attribute as input for Cufflinks/compare/diff. Typically Ensembl annotations have this attribute; however, you'll need to prepend 'chr' to each line--really, to each chromosome name--in order to bring Ensembl notation in line with UCSC/Galaxy notation.

> Actually, I noticed that value 1 in this particular output file is all 0 - no idea why.
> It is not this way in the other files, making me wonder if there is an error somewhere. I am sure the bam file is okay as I viewed it on IGV and saw the patterns I would expect for some candidate genes I looked at.

It's difficult for me to comment without seeing your analysis. Some output files depend on particular attributes being set correctly in the annotation file. You may want to search through our mailing list archives and see if your question has already been answered: http://gmod.827538.n3.nabble.com/Galaxy-Users-f815892.html

Good luck,
J.
Re: [galaxy-user] Cuffdiff Question
> Thanks for the reply. I tried to use the script provided on a previous galaxy thread for adding the chr on to the gtf file on the mac terminal but I keep getting this error -
> awk: can't open file ensembl.gtf source line number 1
> I am very new to using the terminal so please let me know if there is something basic that I am not doing right,

Try this Galaxy workflow: http://main.g2.bx.psu.edu/u/jeremy/w/make-ensembl-gtf-compatible-with-cufflinks

It simply prepends 'chr' to the chromosome name, which is needed if you're using an Ensembl reference annotation and want to use it with Cufflinks/compare/diff in Galaxy.

Best,
J.
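For anyone who prefers to do the same edit locally, here is a minimal Python sketch of the "prepend 'chr'" step. It is not the workflow linked above and not the awk script from the earlier thread; the function name prepend_chr and the file names in the comment are placeholders. Note that an awk message of the form "can't open file ensembl.gtf" usually just means that no file of that name exists in the directory the command was run from.

```python
def prepend_chr(gtf_lines, prefix="chr"):
    """Yield GTF lines with `prefix` added to the chromosome name.

    Comment lines, blank lines and lines that already start with the
    prefix are passed through unchanged.
    """
    for line in gtf_lines:
        if line.startswith("#") or not line.strip() or line.startswith(prefix):
            yield line
        else:
            yield prefix + line


# Small in-memory example; the records are invented.
example = [
    "#!genome-build example\n",
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "ABC";\n',
    'chrX\tensembl\texon\t5\t50\t.\t-\t.\tgene_id "G2";\n',
]
converted = list(prepend_chr(example))
print("".join(converted), end="")

# With real files (placeholder names), something like:
# with open("ensembl.gtf") as src, open("ensembl.chr.gtf", "w") as dst:
#     dst.writelines(prepend_chr(src))
```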
Re: [galaxy-user] question on cufflinks output
Hello Wen,

It's not necessary to send multiple emails to the mailing list; we track incoming emails to ensure that we respond to all of them.

Your FPKM values do look high, but keep in mind that coverage is only part of the FPKM calculation; it's also dependent on transcript length and the total number of reads in your sample. Your transcript lengths look very short, so that may be skewing your FPKM values. For the record, Cufflinks is using scientific/E notation, so e denotes powers of 10 in the FPKM output.

A good place to ask followup questions about cufflinks output is the cufflinks help email address: tophat.cuffli...@gmail.com

Good luck,
J.

On Jun 24, 2011, at 10:35 AM, Wen Huang wrote:

Dear Galaxy team and users,

I have a question on the output by cufflinks on Galaxy. I started with about 28M paired-end reads and mapped them to the reference genome using Tophat on Galaxy. The aligned fragments were assembled by cufflinks, again on Galaxy and I got an output with the first few lines on the bottom of this email.

I was wondering how could cufflinks possibly estimate FPKM on the order of e+07 when the coverage is between 8-50 fragments per base and the total mapped fragments smaller than 28M. Assuming that 20M fragments were mapped, the FPKM should be something around coverage/28. Was the e in the output the Euler's number or 10?

I appreciate your help.
Thanks,
Wen Huang

tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage status FPKM FPKM_conf_lo FPKM_conf_hi
CUFF.2.1 - - CUFF.2 - - chr1:90301-90706 405 21.1837 OK 1.84527e+07 1.10716e+07 2.58338e+07
CUFF.1.1 - - CUFF.1 - - chr1:65419-65692 273 30.9833 OK 2.31848e+07 8.52143e+06 3.78481e+07
CUFF.3.1 - - CUFF.3 - - chr1:135255-135896 641 8.61389 OK 6.31907e+06 3.41968e+06 9.21846e+06
CUFF.4.1 - - CUFF.4 - - chr1:155808-156529 721 7.26147 OK 5.32695e+06 2.88278e+06 7.77112e+06
CUFF.5.1 - - CUFF.5 - - chr1:160421-160729 308 17.6004 OK 1.77483e+07 7.50132e+06 2.79953e+07
CUFF.6.1 - - CUFF.6 - - chr1:170695-171212 517 9.16414 OK 8.41605e+06 4.44869e+06 1.23834e+07
CUFF.7.1 - - CUFF.7 - - chr1:180885-181188 303 30.5702 OK 2.6515e+07 1.36533e+07 3.93767e+07
CUFF.8.1 - - CUFF.8 - - chr1:184397-184702 305 26.712 OK 2.13696e+07 9.94707e+06 3.27921e+07
CUFF.10.1 - - CUFF.10 - - chr1:233237-234095 858 3.71208 OK 3.31435e+06 1.60283e+06 5.02588e+06
CUFF.9.1 - - CUFF.9 - - chr1:203688-204070 382 41.6301 OK 5.36082e+07 4.02061e+07 6.70102e+07
CUFF.11.1 - - CUFF.11 - - chr1:239126-239664 538 19.5995 OK 2.0562e+07 1.45634e+07 2.65605e+07
CUFF.12.1 - - CUFF.12 - - chr1:243903-244327 424 10.3509 OK 1.07542e+07 5.37709e+06 1.61313e+07
CUFF.15.1 - - CUFF.15 - - chr1:240487-240995 508 15.8596 OK 1.83065e+07 1.23671e+07 2.42459e+07
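Two points from the reply in this thread can be illustrated with a few lines of Python: E notation means powers of 10, and FPKM depends on transcript length and on the total number of mapped fragments as well as on the fragment count. The sketch uses only the textbook definition (fragments per kilobase of transcript per million mapped fragments); it does not reproduce Cufflinks' statistical model, and the counts passed to fpkm are invented for illustration.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# E notation: 1.84527e+07 is 1.84527 x 10^7, not a power of Euler's number.
reported = float("1.84527e+07")
print(reported)

# Same invented fragment count and library size, different transcript lengths:
short_transcript = fpkm(100, 500, 20_000_000)
long_transcript = fpkm(100, 5000, 20_000_000)
print(short_transcript, long_transcript)
```

With everything else fixed, the shorter transcript gets the larger FPKM, which is the length effect mentioned in the reply. Whether that alone accounts for values on the order of 1e+07 is a question for the Cufflinks authors.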
Re: [galaxy-user] Extract Genomic DNA Problem
Stephen,

This is a formatting issue with your input file; it needs to be tab-delimited, but currently it is not. You'll need to: (a) convert spaces to tabs using the Convert delimiters to TAB tool; (b) click on the pencil icon and set the data type to BED.

Best,
J.

On Jun 21, 2011, at 8:45 AM, Stephen Taylor wrote:

Hi,

I was trying to extract FASTA sequences using the following tab separated data for Chicken on the Galaxy Main server:

chr5 47258168 47259240
chr18 1938527 1939965
chr2 101973625 101974007
chr4 75653898 75674045
chr19 4258837 4263299
chr4 39330049 39372715
chr4 9606881 9610083
chr15 7264937 7265599
chr21 6659189 6667015
chr2 351239 352821

I got the following galaxy output:

7: Extract Genomic DNA on data 6
empty
format: fasta, database: galGal3
Info: 10 warnings, 1st is: Unable to fetch the sequence from '47258168' to '1072' for build 'galGal3'. Skipped 10 invalid lines, 1st is #1, chr5 47258168 47259240

Any ideas what I am doing wrong?

Thanks,
Steve
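The space-to-tab conversion recommended in this thread can also be done before upload. A minimal sketch, assuming plain three-column intervals (chromosome, start, end) separated by any mix of spaces and tabs; the function name to_bed3 is illustrative and is not a Galaxy tool.

```python
def to_bed3(lines):
    """Normalise whitespace-separated intervals to tab-delimited BED3 lines."""
    out = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3:
            raise ValueError(f"line {number}: expected chrom, start, end")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        out.append(f"{chrom}\t{start}\t{end}")
    return out


# Rows taken from the message above, with mixed separators for illustration.
example = ["chr5 47258168 47259240\n", "chr18\t1938527  1939965\n", "\n"]
bed_lines = to_bed3(example)
print("\n".join(bed_lines))
```

After uploading the result, the datatype still needs to be set to BED as described in step (b).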
Re: [galaxy-user] Question about output CuffDiff SplicingDiff
Felix,

You seem to be providing the correct inputs to Cuffdiff and it appears to be producing valid output. More information about setting parameter values and interpreting Cuffdiff output can be found in the manual: http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff

Good luck,
J.

On Jun 16, 2011, at 8:13 AM, Felix Mayr wrote:

Hi there,

Looking at the output of the SplicingDiff files of CuffDiff, my colleagues and I are perplexed about the output of the p_values and q_values. We've tried different inputs of different samples to compare but never seem to manage to get p_values smaller than 0.50 and we keep getting higher than 1 q_values (also smaller which we expect) which we think is strange too.

The input files we use for the CuffDiff are the CuffCompare of a combined CuffCompare of a dataset, or the CuffCompare of just the two samples we want to analyse. For the samples input files we use the TopHat files respectively.

Could you please help us get meaningful results for the SplicingDiff files or help us understand the data?

The top 5 rows of our typical SplicingDiff file:

       test_id   gene_id      gene  locus                     sample_1  sample_2
1583   TSS11905  XLOC_028193  -     chr5:134910259-134914719  q1        q2
2385   TSS12870  XLOC_030892  -     chr7:29976178-30008608    q1        q2
8005   TSS6887   XLOC_016656  -     chr18:47803031-47807892   q1        q2
10214  TSS9761   XLOC_022527  -     chr20:43128822-43138649   q1        q2
2818   TSS13383  XLOC_032450  -     chr8:100899717-100905900  q1        q2

       status  value_1  value_2  sqrt.JS.     test_stat  p_value   q_value
1583   OK      0        0        0.000771867  0.797878   0.501645  164.5400
2385   OK      0        0        0.001548470  0.797809   0.505482  82.8991
8005   OK      0        0        0.003288510  0.797717   0.508184  55.5615
10214  OK      0        0        0.001414180  0.797620   0.510277  41.8427
2818   OK      0        0        0.007112780  0.797416   0.513678  33.6973

Thanks in advance for your most appreciated help,
Felix Mayr
Re: [galaxy-user] Cufflinks error in galaxy
John,

My best guess is that you are using bias correction but do not have the needed reference genome(s) for the builds that you want to use. See this page for instructions about setting up HTS tools; in particular, you'll need to set up the sam_fa_indices.loc file: https://bitbucket.org/galaxy/galaxy-central/wiki/NGSLocalSetup

Best,
J.

On Jun 9, 2011, at 4:44 AM, 吳正華 wrote:

Hi galaxy dev team:

I just installed galaxy on my ubuntu box and tried to do RNA-seq analysis according to Jeremy's excellent tutorial http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise

However, I encountered the following error messages when I was trying to execute Cufflinks in galaxy:

Dataset generation errors
Dataset 121: Cufflinks on data 92: gene expression
Tool execution generated the following error message:
Error running cufflinks. [Errno 2] No such file or directory: 'transcripts.gtf'
The tool produced the following additional output:
cufflinks v1.0.3
cufflinks -q --no-update-check -I 30 -F 0.05 -j 0.05 -p 4 -b

How should I solve this problem? Thanks in advance.

Best Regards,
John Wu
Re: [galaxy-user] Galaxy Help: Extract sequences from [gtf file] + [genome FASTA file]
Edge,

Please send questions like this to the galaxy-user mailing list, where many people see your email and can help you and/or benefit from it. I've cc'd the list for this reply.

The thread you linked to is out of date. To get sequences for the features in a GTF file, you can use the 'Extract Genomic DNA' tool and set the option 'Interpret features when possible' to Yes. To get sequences for Cufflinks transcripts, use the transcripts.gtf as input to the tool.

Best,
J.

On May 12, 2011, at 3:08 AM, Edge Edge wrote:

I just read through the post at the following link, http://lists.bx.psu.edu/pipermail/galaxy-user/2011-February/001934.html

I'm facing the same problem as well. I would like to extract the transcripts assembled by Cufflinks. How can I link my output files from Tophat and Cufflinks with Galaxy? I have the following output files right now:

junctions.bed
insertions.bed
deletions.bed
accepted_hits.bam
human_reference_genome.fasta
transcripts.gtf
isoforms.fpkm_tracking
genes.fpkm_tracking

Sorry, I am a bit confused by the explanation that you gave to Karen: in order to get the sequence data for transcripts in a Cuff* GTF file, you'll want to select for only exons (use Galaxy's 'Extract Features' tool) and then use the resultant dataset as input to Extract.

Thanks a lot for your advice.

best regards
edge
Master Student
UTAR Malaysia
Re: [galaxy-user] question about Filtering Cufflink files
Jagat,

First, a couple of housekeeping issues: (a) the questions you're asking are better suited to the galaxy-user list (questions about using Galaxy and performing analyses) rather than galaxy-dev (questions about installing Galaxy locally and tool development), so I've moved this thread to galaxy-user; (b) please start new threads when appropriate rather than replying to older threads, as this makes threads shorter and more focused.

Onto your questions:

> I have another question when I filter gene list. In the filtered list there are multiple rows per gene. I should have one gene per row? I have attached the snapshot of output, but not sure if galaxy server will take it or not. I did see the discussion on other forum: http://seqanswers.com/forums/showthread.php?t=8830

GTF files have multiple lines per feature, so your output is reasonable.

> which suggest that possible complications in getting one gene per row. My next question is in that scenario what should be the best way of representing one gene per FPKM value? should we take average of FPKM per gene? I think in the gene it is still giving the transcript FPKM value but these values are different from previous file filtered with transcript id.

As Vasu noted, this is an ongoing area of research. For some experiments, it may be reasonable to group alternatively-spliced isoforms of the same gene and jointly estimate FPKM, and for others it may not. Fortunately, if you do want to group transcripts to get gene FPKM values, Cuffdiff does this for you: see its gene FPKM expression file.

Best,
J.
[galaxy-user] Filter Tool
(Starting new thread on galaxy-user.)

Jagat,

It depends what filter tool you're using and what dataset you're filtering. There is a generic filter tool that can be used to filter Cuffdiff tabular files for either FPKM values or differential expression tests. There is also a tool for filtering GTF files based on a Cuffdiff expr dataset. It sounds like you may be confusing either the tools or the inputs.

If after double-checking you're still having problems with filtering, please put together a short list of your analysis steps and share your history with me, and I can take a look.

Thanks,
J.

Further to my question, it appears that there is some problem with the filter option: when I use the isoform/gene exp file as such it works fine, but when I filter these files with either parameter, such as status if test was successful or on p value, it returns me an empty file. The way I am saving the file is - expr file, filter, save as txt file and upload back in Galaxy. Any suggestion?

Jagat

On Tue, May 3, 2011 at 3:08 AM, shamsher jagat kanwar...@gmail.com wrote:

Jeremy,

I have been trying to follow the steps in filtering Cufflinks output files you have described in one of the previous messages (http://gmod.827538.n3.nabble.com/Re-downstream-analysis-of-cuffdiff-out-put-td2836457.html):

I have shared history with you, but in summary:

File 35: Filter GTF data by attribute values list on data 11 (combined GTF) and data 33 (which is the gene expr file). Shouldn't this have one gene per row? It does not.

File 39: Filter GTF file by attribute value list on data 11 and data 38 (Cuffdiff splicing expr): it failed. I would assume that it should filter on the basis of TSS id.
The error message is

Traceback (most recent call last):
  File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 67, in <module>
    filter( gff_file, attribute_name, ids_file, output_file )
  File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 57, in filter
    if attributes[ attribute_name ] in ids_dict:
KeyError: 'tss_id'

File 40: Filter GTF data by attribute list on data 11 and 34 (tss group exp) failed and the error message is:

Traceback (most recent call last):
  File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 67, in <module>
    filter( gff_file, attribute_name, ids_file, output_file )
  File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 57, in filter
    if attributes[ attribute_name ] in ids_dict:
KeyError: 'tss_id'

I would consider that if one gene has different Id then there is splicing. However, in contrast, the isoform file with transcript Id is working fine (File 20).

On a different note, can I convert a GTF file to a txt tab-delimited file? I tried to convert file 11 to txt (following Edit attributes) but the file is not properly formatted, especially col-pid and TSS id. Am I doing something wrong?

Thanks.
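The KeyError: 'tss_id' in the tracebacks above means that the lookup attributes[ attribute_name ] hit at least one GTF line whose attribute column has no tss_id entry. The sketch below is not the Galaxy script named in the traceback; it is a hypothetical stand-alone version (the names parse_attributes and filter_gtf are invented) showing how such lines can be counted and skipped instead of aborting the run. It only parses quoted attribute values.

```python
import re

# Matches pairs such as: gene_id "XLOC_000001";
_ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(column9):
    """Return the quoted key/value pairs of a GTF attribute column as a dict."""
    return dict(_ATTRIBUTE.findall(column9))


def filter_gtf(gtf_lines, attribute_name, wanted_values):
    """Keep lines whose attribute value is wanted; count lines lacking it."""
    kept, missing = [], 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        value = parse_attributes(fields[8]).get(attribute_name)
        if value is None:
            missing += 1  # this is the case that raises KeyError above
        elif value in wanted_values:
            kept.append(line)
    return kept, missing


# Invented records: the second one has no tss_id attribute.
example = [
    'chr1\tCufflinks\texon\t1\t100\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "T1"; tss_id "TSS1";\n',
    'chr1\tCufflinks\texon\t200\t300\t.\t+\t.\tgene_id "XLOC_2"; transcript_id "T2";\n',
    'chr2\tCufflinks\texon\t5\t50\t.\t-\t.\tgene_id "XLOC_3"; transcript_id "T3"; tss_id "TSS9";\n',
]
kept, missing = filter_gtf(example, "tss_id", {"TSS1"})
print(len(kept), missing)
```

A non-zero count of missing lines suggests the GTF was produced without the attribute in question, which would be worth checking before filtering on it.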
Re: [galaxy-user] RNA seq analysis
Sumathy,

It sounds like you're on the right track. To visualize data for a custom build in Trackster, you need to create a custom build and use that in Trackster:

(1) using the top tabs in Galaxy, go to User -- Custom Builds;
(2) add a new build with the length info as follows: contig_name length
Important note: you'll need to make sure that your contig name matches the one used in your fasta file. This is my best guess about what's causing problems for you.
(3) Create a Trackster visualization using the custom build and add your dataset.

Let us know if you have more questions/problems.

Thanks,
J.

On May 6, 2011, at 10:43 PM, puvan...@umn.edu wrote:

Hi

I may be doing it in a wrong way. I clicked trackster and I added the custom build genome. Since it is a very small genome (~2kb), I considered this as a single contig. Then I clicked add tracks and added my data file. But I got a message: no data for this contig. Whenever I used built in genomes I did not have any problem. I guess I am doing something wrong here.

Sumathy

On May 6 2011, Jeremy Goecks wrote:

Sumathy,

What kind of problems are you having with Trackster?

J.

On May 6, 2011, at 8:30 PM, puvan...@umn.edu wrote:

Hello

I was able to run RNA seq data against a custom build genome. How can I visualize the results? I tried via trackster and unfortunately I couldn't. Can you help me?

Thanks
Sumathy
--
Sumathy Puvanendiran
Graduate student
Re: [galaxy-user] Normalization and plotting of RPKM/FPKM after cufflink
Vasu,

Here are the steps to create this visualization; this is relatively new functionality, and you'll want to use our test server ( http://test.g2.bx.psu.edu/ ) for now.

(1) Create a new visualization from the main menu: Visualization -> New Track Browser and choose your genome build.
(2) Add your Cufflinks GTF files to the visualization using the 'Add Tracks' button in the upper right of the visualization. (Adding the Tophat reads and/or annotation tracks might prove useful as well.)
(3) Zoom in (use the button at the top, double click on a point, or drag to select an area on the genome coordinates at the top) until the track's menu has the option 'Show filters' and choose this option to show filters.
(4) Once filters are visible, you should be able to drag the slider to dynamically filter transcripts.

Here's an example visualization of some mapped Tophat reads and Cufflinks transcripts that you can try out: http://test.g2.bx.psu.edu/u/jeremy.goecks/v/assembly-of-h1-hesc-rna-seq-data

We're continuing to refine and extend this functionality and the Galaxy Track Browser in general; questions/comments/suggestions are most welcome.

Best,
J.

On Apr 19, 2011, at 9:28 AM, vasu punj wrote:

Thanks Jeremy, this appears to be a useful function. Could you please list the steps in the workflow to achieve the above visualization, or alternatively point me to the URL where it is summarized? I believe it will take the Tophat output BAM file and the FPKM tracking file. I tried, but I don't see the track browser unless I convert to GTF file format. Further, if you can point me to how to get the slider window function as shown in the snapshot, that will be great. Good work Jeremy!

Thanks.
Vasu

--- On Sun, 4/17/11, Jeremy Goecks jeremy.goe...@emory.edu wrote:

From: Jeremy Goecks jeremy.goe...@emory.edu
Subject: Re: Normalization and plotting of RPKM/FPKM after cufflink
To: vasu punj pu...@yahoo.com
Cc: galaxy-u...@bx.psu.edu
Date: Sunday, April 17, 2011, 3:45 PM

Vasu,

I want to include the following discussion in my message regarding use Bam files of Tophat to visualize reads either in IGV or Galaxy or other tools. I want to find out if I can plot RPKM/FPKM normalized values after running differential analysis in Cufflinks.

Galaxy has a number of tools for analyzing numerical data; look under the menu items Statistics and Graph/Display Data for useful tools. If you're looking to plot FPKM values in addition to mapped reads from Tophat and Cufflinks transcripts, the Galaxy Track Browser might prove useful as it has filtering functionality so that you can move a slider to show/hide data based on FPKM values; it's often useful to use the sliders for FPKM measures to get a sense of your data. See the attached screenshot for an example.

Best,
J.
Re: [galaxy-user] get wig file after tophat
Hi Ying, You're in luck because I've been working with genome browsers lately, so I think I can help you address your problem. What you're looking for is a visualization of a coverage histogram for the BAM reads produced by Tophat, yes? It turns out that some genome browsers provide this automatically as part of their solution for visualizing BAM files b/c BAM files tend to be very large and hence visualizing aggregated data is often the best solution. Both IGV and the Galaxy Trackster Browser support this functionality. I think you'll have to do some simple file conversions to get the display you want in IGV; you can check out the IGV documentation or perhaps Jim can help. I'm not sure if IGB supports this visualization mode for BAM; Ann can chime in with additional information. The Galaxy Track Browser supports coverage histograms when viewing large regions. When zoomed in, the reads are typically displayed individually, although there is a (very beta) option to create a histogram for the visible set of reads; this option may not work well (yet!) as Tophat reads often have large gaps. The top track in this visualization shows a coverage histogram for a set of Tophat reads: http://test.g2.bx.psu.edu/u/jeremy.goecks/v/assembly-of-h1-hesc-rna-seq-data Please see my previous email to Vasu for details about setting up a visualization in the Galaxy Track Browser. Best, J. On 4/20/11 5:16 PM, Ying Zhang ying.zhang.yz...@yale.edu wrote: Dear Ann and Jeremy: We have this discussion long time ago, and I am sorry that I brought it up here again. I am just thinking that as Ann said, can we add this tool which convert bam into wig file into galaxy? Or make a workflow to generate a wig file from a bam file generate by tophat? In this way we can just easily get a wig file from galaxy and will be able to see it in IGB. 
I know this may seem unnecessary for the purpose of statistical analysis, but if we can see the coverage with IGB, sometimes it is helpful to pick up interesting points quickly for specific genes. This may seem an old-fashioned way, but my boss is a big fan of using IGB to view expression files (wig or sgr files) and do some analysis.

Thanks a lot!

Best,
Ying

Quoting Jeremy Goecks jeremy.goe...@emory.edu:

Hi all,

Ann is correct - Tophat does not produce .wig files when run anymore. However, it's fairly easy to use Galaxy to make a wiggle-like coverage file from a BAM file:

(a) run the pileup tool on your BAM to create a pileup file;
(b) cut columns 1 and 4 to get your coverage file.

A final note: it's often difficult to visualize coverage files because they're so large. You might be better off visualizing the BAM file and using the coverage file for statistics.

Best,
J.

Hello,

I think I know the answer (sort of) to this question. This may be because newer versions of tophat stopped running the wiggles program, which is still part of the tophat distribution and is the program that makes the coverage.wig file. A later version of tophat might bring this back, however - there's a note to this effect in the tophat python code. So if you can run wiggles, you can make the coverage.wig file on your own.

A student here at UNC Charlotte (Adam Baxter) made a few changes to the wiggles source code that would allow you to use it with samtools to make a coverage.wig file from the accepted_hits.bam file that TopHat creates. If you (or anyone else) would like a copy, please email Adam, who is cc'ed on this email. We would be happy to help add it to Galaxy if this would be of interest to you or other Galaxy users. If there is any way we can be of assistance, please let us know!
Very best wishes,
Ann Loraine

On 2/21/11 3:39 PM, Ying Zhang ying.zhang.yz...@yale.edu wrote:

Hi: I am using tophat in galaxy to analyze my paired-end RNA-seq data, and I found that after the tophat analysis we can no longer get the wig file, which we used to be able to get. Do you have any idea how to still get the wig file after tophat analysis? Thanks a lot!

Best,
Ying Zhang, M.D., Ph.D.
Postdoctoral Associate
Department of Genetics, Yale University School of Medicine
300 Cedar Street, S320
New Haven, CT 06519
Tel: (203)737-2616
Fax: (203)737-2286

--
Ann Loraine
Associate Professor
Dept. of Bioinformatics and Genomics, UNCC
North Carolina Research Campus
600 Laureate Way
Kannapolis, NC 28081
704-250-5750
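The "cut columns 1 and 4" step described in this thread can be sketched outside Galaxy as follows. This is an illustration only: `cut_columns` is a hypothetical helper, and it assumes a tab-separated six-column pileup in which column 1 is the contig and column 4 is the read depth. Keeping column 2 (the position) as well is an extra assumption that may be convenient when a per-base coverage track is wanted.

```python
def cut_columns(lines, columns=(1, 4), sep="\t"):
    """Keep the given 1-based columns from each delimited line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split(sep)
        yield sep.join(fields[c - 1] for c in columns)


if __name__ == "__main__":
    # Assumed pileup layout: contig, position, reference base, depth, bases, qualities.
    pileup = ["chr1\t100\tA\t3\t...\tIII", "chr1\t101\tC\t5\t.....\tIIIII"]
    for row in cut_columns(pileup):
        print(row)
    for row in cut_columns(pileup, columns=(1, 2, 4)):
        print(row)
```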
Re: [galaxy-user] downstream analysis of cuffdiff out put
Vasu,

Please reply to the mailing list as emails to individual Galaxy developers often get lost, and there are others on the list that might be able to help you or benefit from this discussion.

Now, to your question: you're using the wrong GFF filtering tool, which is an easy mistake to make as there are many of them. You want to use Filter and Sort -> GFF -> Filter GTF data by attribute values list. Using this tool, I was able to filter dataset 11--a GTF file produced by Cuffcompare--using a Cuffdiff isoform expression file (dataset 10) on transcript_id. I've shared the modified history with you.

Best,
J.

On Apr 18, 2011, at 3:29 PM, vasu punj wrote:

Hi Jeremy,

I have been trying to use the tool mentioned in this message. I have a two-sample comparison (6 and 5) and have run Cufflinks/Cuffcompare/Cuffdiff. I filtered the files on c12, i.e. for significance, and that file is dataset 10, uploaded as B_A Cuffdiff isoform expr filtered.txt. I uploaded the second file, dataset 11, B_A_Homo_sapiens.GRCh37.60.clean.combined, which is a combined GTF file generated by Cuffcompare.

When I tried to run the filter on the combined transcript file, using the combined GTF as the Cufflinks assembled transcripts (11) and the Cuffdiff isoform exp filtered file as the Cuffcompare tracking file, with sample number 2, it returned an empty file (12).

Then, thinking that perhaps it is the tracking file that I have to use instead of the combined GTF, I used the B-A combined tracking file in place of the combined GTF file, but it will pop up only under Cuffcompare tracking file. It may not be right, but I used file 13 as the tracking file with the combined GTF as the assembled transcripts; it still returned empty output.

I have also shared the history with you. Could you point out what is going on here?

Thanks.
Vasu --- On Mon, 4/11/11, Jeremy Goecks jeremy.goe...@emory.edu wrote: From: Jeremy Goecks jeremy.goe...@emory.edu Subject: Re: [galaxy-user] downstream analysis of cuffdiff out put To: shamsher jagat kanwar...@gmail.com Cc: galaxy-user galaxy-user@lists.bx.psu.edu Date: Monday, April 11, 2011, 9:04 AM On Thu, Mar 10, 2011 at 7:55 AM, Jeremy Goecks jeremy.goe...@emory.edu wrote: Jagat, Just like any mRNA-seq experiment to achieve following objectives: 1. Reconstruct all transcripts of a particular gene and corresponding Cuffdiff significantly expressed transcripts as called by cuffdiff. 2. What are different isoforms 3. Location of splicing From various output files which unique ID can be matched from one file say Cuffdiff.expr (transcript/ isoform/Splicing) to other file - transcript.gtf corresponding to each sample or combined GTF file. I've got a script that does this for the cuffdiff isoform expression testing file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple weeks. It would probably be useful to have similar scripts for the other expression testing files as well. Also, it would be nice to be able to take the FPKM values generated by Cuffdiff and attach them to their respective transcripts as attributes. Hello all, I've added a tool called 'Filter GTF file by attribute values list' to the galaxy-central code repository. This tool is available on our test server ( http://test.g2.bx.psu.edu/ ) at Filter and Sort -- GFF -- Filter GTF data by attribute values list and will be available on our main server in the next few weeks. As expected, this tool filters a GTF file based on a list of attribute values--or filters using a tabular file where attribute values are first column, as is the case for Cuffdiff output files. Potential attributes that can be filtered on include transcript_id, gene_id, tss_id, and p_id; conveniently, these are the IDs that Cuffdiff uses in its output files. 
Here's an example workflow:

(1) Run Cufflinks/compare/diff.
(2) Filter the Cuffdiff isoform differential expression file for transcripts that are differentially expressed; in other words, filter for c12=='yes'.
(3) Use 'Filter GTF data by attribute values list' to filter the Cuffcompare combined transcripts using the filtered file from step (2) as the attribute values list and, voila, you have a GTF file of the differentially expressed transcripts that you can view in your favorite genome browser.

Hope this helps; feedback is always welcome.

Best,
J.
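The filter on c12=='yes' in the workflow above can be sketched in a few lines. This is not the Galaxy Filter tool, just a hedged illustration: `significant_ids` is a hypothetical name, and the sketch assumes a tab-separated Cuffdiff file whose 12th column holds 'yes' or 'no' and whose first column holds the ID, as described in the thread.

```python
def significant_ids(cuffdiff_lines, flag_column=12, id_column=1):
    """Return the IDs of rows whose flag column (1-based) equals 'yes'."""
    ids = []
    for line in cuffdiff_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < max(flag_column, id_column):
            continue  # skip short or malformed rows
        # The header row is skipped naturally: its 12th field is not 'yes'.
        if fields[flag_column - 1] == "yes":
            ids.append(fields[id_column - 1])
    return ids


if __name__ == "__main__":
    rows = [
        "test_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2"
        "\tln(fold_change)\ttest_stat\tp_value\tsignificant",
        "TCONS_1\t-\tchr1:1-10\tq1\tq2\tOK\t1\t2\t0.69\t-3.1\t0.001\tyes",
        "TCONS_2\t-\tchr1:20-30\tq1\tq2\tOK\t1\t1\t0\t0\t1\tno",
    ]
    print(significant_ids(rows))
```

The returned list plays the role of the attribute values list used in the final step of the workflow.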
Re: [galaxy-user] Nucleotide analysis - GC percentage
Now why does a tool search on the public Galaxy instance for GC not suggest this tool?

Name: geecee
Description: Calculates fractional GC content of nucleic acid sequences

Does this mean the description isn't searched? It would seem like a sensible idea to me to include that... Searching for geecee works, but unless you're familiar with this EMBOSS tool no-one will think of that.

Peter,

The tool search doesn't start until you type in three characters, so typing 'GC' does not initiate a search. Typing 'gc ' (gc followed by a space) or 'gc content' works. Perhaps a tooltip or help text is needed.

J.
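For readers unsure what "fractional GC content" means in the tool description quoted above, here is a minimal sketch of the calculation. It is not the EMBOSS geecee implementation; `gc_fraction` is a hypothetical name, and ignoring characters other than A, C, G and T is an assumption made for this example.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C among the A/C/G/T bases of a sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


if __name__ == "__main__":
    print(gc_fraction("ATGC"))
```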
Re: [galaxy-user] downstream analysis of cuffdiff out put
On Thu, Mar 10, 2011 at 7:55 AM, Jeremy Goecks jeremy.goe...@emory.edu wrote: Jagat, Just like any mRNA-seq experiment to achieve following objectives: 1. Reconstruct all transcripts of a particular gene and corresponding Cuffdiff significantly expressed transcripts as called by cuffdiff. 2. What are different isoforms 3. Location of splicing From various output files which unique ID can be matched from one file say Cuffdiff.expr (transcript/ isoform/Splicing) to other file - transcript.gtf corresponding to each sample or combined GTF file. I've got a script that does this for the cuffdiff isoform expression testing file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple weeks. It would probably be useful to have similar scripts for the other expression testing files as well. Also, it would be nice to be able to take the FPKM values generated by Cuffdiff and attach them to their respective transcripts as attributes. Hello all, I've added a tool called 'Filter GTF file by attribute values list' to the galaxy-central code repository. This tool is available on our test server ( http://test.g2.bx.psu.edu/ ) at Filter and Sort -- GFF -- Filter GTF data by attribute values list and will be available on our main server in the next few weeks. As expected, this tool filters a GTF file based on a list of attribute values--or filters using a tabular file where attribute values are first column, as is the case for Cuffdiff output files. Potential attributes that can be filtered on include transcript_id, gene_id, tss_id, and p_id; conveniently, these are the IDs that Cuffdiff uses in its output files. 
Here's an example workflow:

(1) Run Cufflinks/compare/diff.
(2) Filter the Cuffdiff isoform differential expression file for transcripts that are differentially expressed; in other words, filter for c12=='yes'.
(3) Use 'Filter GTF data by attribute values list' to filter the Cuffcompare combined transcripts using the filtered file from step (2) as the attribute values list and, voila, you have a GTF file of the differentially expressed transcripts that you can view in your favorite genome browser.

Hope this helps; feedback is always welcome.

Best,
J.
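To make the idea behind 'Filter GTF data by attribute values list' concrete, the sketch below keeps only the GTF lines whose chosen attribute is in a list of values. It is an assumption-laden illustration and not the code of the Galaxy tool: the function names are hypothetical, and attributes are assumed to be written as `name "value";` in the ninth tab-separated column.

```python
import re


def attribute_value(attributes, name):
    """Return the value of attribute `name` from a GTF column-9 string, or None."""
    pattern = r'(?:^|;)\s*' + re.escape(name) + r'\s+"([^"]*)"'
    match = re.search(pattern, attributes)
    return match.group(1) if match else None


def filter_gtf(gtf_lines, wanted_values, attribute="transcript_id"):
    """Yield GTF lines whose `attribute` value is in `wanted_values`."""
    wanted = set(wanted_values)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        if attribute_value(fields[8], attribute) in wanted:
            yield line.rstrip("\n")


if __name__ == "__main__":
    gtf = [
        'chr1\tCufflinks\texon\t1\t100\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_1";',
        'chr1\tCufflinks\texon\t200\t300\t.\t+\t.\tgene_id "XLOC_2"; transcript_id "TCONS_2";',
    ]
    for kept in filter_gtf(gtf, ["TCONS_2"]):
        print(kept)
```

Passing `attribute="gene_id"` (or tss_id, p_id) would filter on the other IDs mentioned in the message, provided the values list comes from the matching Cuffdiff output file.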
Re: [galaxy-user] RNA seq analysis and GTF files
David,

Can you please share your history with me and I'll take a look (History Options -> Share/Publish -> Share with User -> my email)?

Thanks,
J.

On Apr 7, 2011, at 3:23 PM, David K Crossman wrote:

Hello! I would like to ask a question related to this thread below. I ran into the same issues as below and was unaware of having to swap some columns around in the GTF file. So, after swapping the gene name from the complete table (name2 value, column 12) into the GTF file's gene_id value (which by default is the same as transcript_id), I uploaded this patched file (mm9) into Galaxy and ran Cufflinks, CuffCompare and CuffDiff using this patched GTF file as the reference annotation. For both Cufflinks and CuffCompare, the gene_id was present in their respective columns. The problem I have encountered now is that in all of the output files in CuffDiff, the gene column is blank (contains a -, as shown below). This example is from the CuffDiff gene expression output file:

test_id  gene  locus                 sample_1  sample_2  status  value_1  value_2  ln(fold_change)  test_stat  p_value   significant
XLOC_01  -     chr1:4797973-4836816  q1        q2        OK      73.1908  82.1567  0.115559         -0.71896   0.472168  no
XLOC_02  -     chr1:4847774-4887990  q1        q2        OK      81.7264  53.1165  -0.43089         2.44474    0.014496  no
XLOC_03  -     chr1:5073253-5152630  q1        q2        OK      408.289  333.749  -0.20159         2.73173    0.0063    no
XLOC_04  -     chr1:5578573-5596214  q1        q2        NOTEST  2.34764  4.79772  0.71473          -0.89735   0.369532  no

What am I doing wrong? I am interested in the differentially expressed genes in this RNA-Seq dataset (as well as calling variants, which is my next step, but want to get this answered first before moving on). Any info, suggestions or help would be greatly appreciated.
Thanks, David -Original Message- From: galaxy-user-boun...@lists.bx.psu.edu [mailto:galaxy-user-boun...@lists.bx.psu.edu ] On Behalf Of Jeremy Goecks Sent: Friday, April 01, 2011 8:47 AM To: ssa...@ccib.mgh.harvard.edu Cc: galaxy-user Subject: Re: [galaxy-user] RNA seq analysis and GTF files On Mar 31, 2011, at 12:30 PM, ssa...@ccib.mgh.harvard.edu ssa...@ccib.mgh.harvard.edu wrote: Hi Jeremy, I used your exercise to perform an RNA-seq analysis. First I encountered a problem where the gene IDs were missing from the results. Jen from the Galaxy team suggested this: Yes, the team has taken a look and there are a few things going on. The first is that when running the Cuffcompare program, a reference annotation file in GTF format should be used in order to obtain the same results as in Jeremy's exercise. This seemed to be missing from your runs, which resulted in badly formatted output that later resulted in a poor result when Cuffdiff was used. The second has to do with the reference GTF file itself. For the best results, the GTF file must have the gene_id attribute defined in the 9th column of the file and the chromosome names must be in the same format as the genome native to Galaxy. Depending on the source of the reference GTF, one of these may need to be adjusted. Chromosome names can be adjusted using Galaxy's Text Manipulation tools. The gene_id attribute would need to be adjusted prior to loading into Galaxy. For mm9, using the Get Data - UCSC Main table browser tool can help you to obtain all of the raw data necessary to create a complete GTF file with a gene_id identifier. Extract data from the track RefSeq Genes and output the primary data table refGene twice - first in GTF format, then again as the complete table in tabular format (not BED). Then, using your own tools, swap in the gene name from the complete table (name2 value, column 12) into the GTF file's gene_id value (which by default is the same as transcript_id). 
Upload and the tools will function as intended. The team is aware of the issues associated with GTF source files and is discussing solutions. Any changes to native data content will be reported to the mailing list in a News Brief or other communications. Our apologies for the inconvenience! Thanks for using Galaxy and please let us know if we can help again, Best, Jen Galaxy team I followed the directions (or at least I think I did) and things seemed to work better but there is one more issue for example in file: Galaxy287- [Cuffdiff_on_data_197,_data_197,_and_data_274__isoform_FPKM_ tracking].tabular.txt The column gene_short_name does not have any names in it. nearest_ref_id does have the gene ID info so I can still interpret the data, but I was wondering if there remains another problem that I'm not aware of with the GTF file. Slim, Please send questions to the galaxy-user mailing list (cc'd) rather than individual Galaxy team members; there are many people on the list that may be able to address your question, and discussions are archived for future use as well. Without seeing your
Re: [galaxy-user] RNA seq analysis and GTF files
On Mar 31, 2011, at 12:30 PM, ssa...@ccib.mgh.harvard.edu ssa...@ccib.mgh.harvard.edu wrote: Hi Jeremy, I used your exercise to perform an RNA-seq analysis. First I encountered a problem where the gene IDs were missing from the results. Jen from the Galaxy team suggested this: Yes, the team has taken a look and there are a few things going on. The first is that when running the Cuffcompare program, a reference annotation file in GTF format should be used in order to obtain the same results as in Jeremy's exercise. This seemed to be missing from your runs, which resulted in badly formatted output that later resulted in a poor result when Cuffdiff was used. The second has to do with the reference GTF file itself. For the best results, the GTF file must have the gene_id attribute defined in the 9th column of the file and the chromosome names must be in the same format as the genome native to Galaxy. Depending on the source of the reference GTF, one of these may need to be adjusted. Chromosome names can be adjusted using Galaxy's Text Manipulation tools. The gene_id attribute would need to be adjusted prior to loading into Galaxy. For mm9, using the Get Data - UCSC Main table browser tool can help you to obtain all of the raw data necessary to create a complete GTF file with a gene_id identifier. Extract data from the track RefSeq Genes and output the primary data table refGene twice - first in GTF format, then again as the complete table in tabular format (not BED). Then, using your own tools, swap in the gene name from the complete table (name2 value, column 12) into the GTF file's gene_id value (which by default is the same as transcript_id). Upload and the tools will function as intended. The team is aware of the issues associated with GTF source files and is discussing solutions. Any changes to native data content will be reported to the mailing list in a News Brief or other communications. Our apologies for the inconvenience! 
Thanks for using Galaxy and please let us know if we can help again,

Best,
Jen
Galaxy team

I followed the directions (or at least I think I did) and things seemed to work better, but there is one more issue. For example, in the file:

Galaxy287-[Cuffdiff_on_data_197,_data_197,_and_data_274__isoform_FPKM_tracking].tabular.txt

the column gene_short_name does not have any names in it. nearest_ref_id does have the gene ID info so I can still interpret the data, but I was wondering if there remains another problem that I'm not aware of with the GTF file.

Slim,

Please send questions to the galaxy-user mailing list (cc'd) rather than individual Galaxy team members; there are many people on the list that may be able to address your question, and discussions are archived for future use as well.

Without seeing your analysis, I'd suggest trying two things:

(1) Provide a gene annotation reference file to Cufflinks as well as Cuffcompare and Cuffdiff; in other words, you'll want to do guided assembly.
(2) Try using an Ensembl GTF, which has the gene name in the attributes.

I think (2) is more likely to generate the results you want, but there are many known problems in using Ensembl GTFs with Cufflinks/compare/diff.

Good luck,
J.
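The "swap in the gene name" step that Jen describes is left to the reader's own tools, so here is one possible sketch. It is not an official Galaxy or UCSC script and the function names are hypothetical. To avoid depending on a fixed column position, it assumes the tabular refGene export starts with a header row containing the field names `name` and `name2`, and that the GTF attributes are written as `gene_id "..."; transcript_id "...";`.

```python
import re


def load_gene_names(table_lines):
    """Map transcript name -> gene name (name2), locating columns via the header row."""
    names = {}
    name_idx = name2_idx = None
    for line in table_lines:
        fields = line.rstrip("\n").split("\t")
        if name_idx is None:
            # Assumption: the first line is a header such as '#bin<TAB>name<TAB>...'.
            header = [f.lstrip("#") for f in fields]
            name_idx, name2_idx = header.index("name"), header.index("name2")
            continue
        if len(fields) > max(name_idx, name2_idx):
            names[fields[name_idx]] = fields[name2_idx]
    return names


def patch_gene_id(gtf_line, names):
    """Replace gene_id with the gene name looked up by the line's transcript_id."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return gtf_line.rstrip("\n")
    match = re.search(r'transcript_id "([^"]*)"', fields[8])
    if match and match.group(1) in names:
        gene = names[match.group(1)]
        fields[8] = re.sub(r'gene_id "[^"]*"',
                           lambda m: 'gene_id "%s"' % gene,
                           fields[8], count=1)
    return "\t".join(fields)


if __name__ == "__main__":
    table = ["#bin\tname\tchrom\tname2", "0\tNM_001\tchr1\tXkr4"]
    gtf_line = 'chr1\tmm9_refGene\texon\t1\t10\t0.0\t-\t.\tgene_id "NM_001"; transcript_id "NM_001";'
    print(patch_gene_id(gtf_line, load_gene_names(table)))
```

Lines whose transcript_id is not found in the table are returned unchanged, so the patched file has the same number of records as the input.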
Re: [galaxy-user] Trouble with RNAseq analysis
Cristian,

Please share your history with me (History Options -> Share/Publish -> Share with User -> my email) and I'll take a look.

Thanks,
J.

On Mar 30, 2011, at 10:48 AM, Cristian Rojas wrote:

Hi everybody,

I am trying to analyze the differential expression between two RNA-seq samples, but I ran into many problems aligning my reads. I will describe what I did. First I groomed the FastQ files (2). Then I uploaded the Sorghum genome and aligned the reads to it with Tophat. After that, I tried to use Cufflinks with the BAM file from Tophat, using an uploaded GTF file as the annotation file and the Sorghum genome, but I received an error message in the three outputs of Cufflinks. I tried to align against the brand new Maize genome (now at Galaxy), and got the same messages. I also converted the BAM file to SAM, with the same result. Any advice? What was wrong?

Thanks in advance.
Cristian
Re: [galaxy-user] Trouble with RNAseq analysis
Cristian,

The > is a formatting character; what needs to match is the string after the > in the genome file and the entries in the contig column of your GTF. Your GTF is quite different than your genome file; your genome file has 10 contigs labeled by number, but your GTF has many, many contig names labelled by numbers and names. For Cufflinks to work, you can either (a) turn off bias correction or (b) restrict entries in your GTF to those that match your reference genome.

Finally, please reply all to emails so that all emails remain on list for archival and community purposes.

Thanks,
J.

On Mar 30, 2011, at 12:02 PM, Cristian Rojas wrote:

Thanks Jeremy. But in genome fasta files, very often each chromosome is a sequence introduced by a > line. So it is not possible to match contig names in the GTF with names in the genome fasta. What must I do?

Cristian

- Original message
From: Jeremy Goecks jeremy.goe...@emory.edu
To: Cristian Rojas cristianroja...@yahoo.com.ar
CC: galaxy-user@lists.bx.psu.edu
Sent: Wednesday, March 30, 2011 12:53:50
Subject: Re: [galaxy-user] Trouble with RNAseq analysis

Cristian,

The contig names in your GTF file don't match those in your reference (fasta) file. In order for Cufflinks to use a reference GTF, its contig names must match those in your reference genome.

Best,
J.

On Mar 30, 2011, at 11:31 AM, Cristian Rojas wrote:

Thanks Jeremy. I did it.

Cristian

- Original message
From: Jeremy Goecks jeremy.goe...@emory.edu
To: Cristian Rojas cristianroja...@yahoo.com.ar
CC: galaxy-user@lists.bx.psu.edu
Sent: Wednesday, March 30, 2011 12:02:47
Subject: Re: [galaxy-user] Trouble with RNAseq analysis

Cristian,

Please share your history with me (History Options -> Share/Publish -> Share with User -> my email) and I'll take a look.

Thanks,
J.

On Mar 30, 2011, at 10:48 AM, Cristian Rojas wrote:

Hi everybody,

I am trying to analyze the differential expression between two RNA-seq samples, but I ran into many problems aligning my reads. I will describe what I did. First I groomed the FastQ files (2). Then I uploaded the Sorghum genome and aligned the reads to it with Tophat. After that, I tried to use Cufflinks with the BAM file from Tophat, using an uploaded GTF file as the annotation file and the Sorghum genome, but I received an error message in the three outputs of Cufflinks. I tried to align against the brand new Maize genome (now at Galaxy), and got the same messages. I also converted the BAM file to SAM, with the same result. Any advice? What was wrong?

Thanks in advance.
Cristian
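Option (b) from the reply above, restricting the GTF to contigs that exist in the reference genome, can be sketched as follows. This is a hedged illustration with hypothetical function names, not a Galaxy tool; it assumes the contig name is the first whitespace-delimited token after the ">" in each fasta header and the first tab-separated column of each GTF line.

```python
def fasta_contig_names(fasta_lines):
    """Return the set of contig names found on FASTA header lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def restrict_gtf(gtf_lines, contigs):
    """Split GTF records into kept lines and the set of contig names that were dropped."""
    kept, dropped = [], set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        contig = line.split("\t", 1)[0]
        if contig in contigs:
            kept.append(line.rstrip("\n"))
        else:
            dropped.add(contig)
    return kept, dropped


if __name__ == "__main__":
    fasta = [">1 dna:chromosome", "ACGT", ">2", "GGCC"]
    gtf = [
        '1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
        'scaffold_9\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
    ]
    kept, dropped = restrict_gtf(gtf, fasta_contig_names(fasta))
    print(len(kept), sorted(dropped))
```

Inspecting the dropped names first is a quick way to see whether the mismatch is a naming convention (for example a prefix) or genuinely extra contigs.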
Re: [galaxy-user] Trouble with RNAseq analysis
I tried again and got the same problem. I turned off the bias correction but kept the GTF file. Maybe this is the problem? I didn't find your history. Thanks

Look for the history I've shared in History Options -> Histories Shared with Me. As requested, if you're still having problems, please report the problematic dataset by clicking on the bug icon.

Thanks,
J.
Re: [galaxy-user] GTF-to-GFF3
Karen,

Sorry for the slow reply. There are no immediate plans to add either BED-to-GFF3 or GTF-to-GFF3 converters to Galaxy main or the Galaxy codebase. However, if you're working with your own Galaxy, you might encourage the Rätsch lab to contribute their tools to the Galaxy Tool Shed (http://community.g2.bx.psu.edu/); you could then download them from there and install them in your own Galaxy. Alternatively, we welcome community contributions to the Galaxy codebase, and we'd be happy to incorporate these tools if they came with functional tests and test data.

Best,
J.

On Mar 7, 2011, at 11:34 AM, Karen Tang wrote:

Hi Galaxy developers,

Any plans on adding a GTF-to-GFF3 format conversion? This converter is at the Rätsch lab's instance of Galaxy (http://galaxy.tuebingen.mpg.de/). Could it be made more widely available?

Karen :)
Dept of Plant Biology
University of Minnesota
Re: [galaxy-user] downstream analysis of cuffdiff out put
Jagat, Please send queries such as these to the galaxy-user mailing list (cc'd); there are many users on the list who can contribute to this discussion, and many additional users who will benefit from it.
I was wondering if you can point me to documentation or a URL that explains how to perform the downstream analysis once we have cuffdiff output.
In general, I agree that tools are needed to further process cufflinks/compare/diff outputs, but I'm not aware of any that are publicly available. Let's open this issue up for discussion and see if we can reach a consensus about which tools might be useful. Everyone, please feel free to contribute ideas/tools; note that the Galaxy Tool Shed is a nice place for sharing tools you've built for Galaxy: http://community.g2.bx.psu.edu/
Just like any mRNA-seq experiment, to achieve the following objectives:
1. Reconstruct all transcripts of a particular gene and the corresponding significantly expressed transcripts as called by cuffdiff.
2. What are the different isoforms
3. Location of splicing
From the various output files, which unique ID can be matched from one file, say Cuffdiff.expr (transcript/isoform/splicing), to another file: the transcript.gtf corresponding to each sample, or the combined GTF file?
I've got a script that does this for the cuffdiff isoform expression testing file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple of weeks. It would probably be useful to have similar scripts for the other expression testing files as well. Also, it would be nice to be able to take the FPKM values generated by Cuffdiff and attach them to their respective transcripts as attributes. Best, J.
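For anyone who wants to experiment with the ID matching discussed above, here is a minimal Python sketch. It is not the script mentioned in the reply. It assumes the expression testing file is tab-separated with a header row containing `test_id`, `value_1` and `value_2` columns, and that each `test_id` equals a `transcript_id` attribute in the GTF file; check both assumptions against your own cuffdiff output. The attribute names `FPKM_1` and `FPKM_2` and both function names are made up for illustration.

```python
import csv
import re

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def read_fpkm(expr_lines):
    """Map test_id -> (value_1, value_2) from a tab-separated table.

    Assumption: the header row names the columns test_id, value_1, value_2.
    """
    reader = csv.DictReader(expr_lines, delimiter="\t")
    return {row["test_id"]: (row["value_1"], row["value_2"]) for row in reader}


def annotate_gtf(gtf_lines, fpkm):
    """Yield GTF lines, appending FPKM attributes where transcript_id matches.

    FPKM_1 and FPKM_2 are hypothetical attribute names, not a standard.
    """
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            yield line  # pass comments and blank lines through unchanged
            continue
        match = TRANSCRIPT_ID.search(line)
        if match and match.group(1) in fpkm:
            value_1, value_2 = fpkm[match.group(1)]
            line = line.rstrip()
            if not line.endswith(";"):
                line += ";"
            line += f' FPKM_1 "{value_1}"; FPKM_2 "{value_2}";'
        yield line


if __name__ == "__main__":
    # Tiny made-up inputs, only to show the call pattern.
    expr = ["test_id\tvalue_1\tvalue_2\n", "TCONS_00000001\t10.5\t20.1\n"]
    gtf = [
        'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t'
        'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n'
    ]
    for out_line in annotate_gtf(gtf, read_fpkm(expr)):
        print(out_line)
```

Transcripts without a matching row are left unchanged, so the output stays a valid GTF file even when the two files only partly overlap.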