Re: [galaxy-user] Trackster Error

2014-03-27 Thread Jeremy Goecks
Hi Suzanne,

Can you share your history with me and I’ll take a look?

Thanks,
J.

--
Jeremy Goecks
Assistant Professor of Computational Biology
George Washington University



On Mar 24, 2014, at 5:08 PM, Suzanne Gomes suzanneluziago...@gmail.com wrote:

 Hello,
 
 I am trying to look at an output from Tophat using Trackster, but I keep 
 getting the following error: 
 
 Could not load chroms for this dbkey: dp4
 
 This is not a custom dbkey - I just selected it from the list of available 
 ones on Trackster. 
 
 I have database/build set to: D. pseudoobscura (dp4) (dp4) for my Tophat 
 results.
 
 Any ideas why this is happening and what I can do to fix it?
 
 Thanks
 
 Suzanne
 ___
 The Galaxy User list should be used for the discussion of
 Galaxy analysis and other features on the public server
 at usegalaxy.org.  Please keep all replies on the list by
 using reply all in your mail client.  For discussion of
 local Galaxy instances and the Galaxy source code, please
 use the Galaxy Development list:
 
  http://lists.bx.psu.edu/listinfo/galaxy-dev
 
 To manage your subscriptions to this and other Galaxy lists,
 please use the interface at:
 
  http://lists.bx.psu.edu/
 
 To search Galaxy mailing lists use the unified search at:
 
  http://galaxyproject.org/search/mailinglists/



Re: [galaxy-user] Trackster Error

2014-03-27 Thread Jeremy Goecks
Hi Suzanne,

Thanks for sharing your history. This is a file format issue on our side. We’ll 
get it taken care of and let you know when it’s fixed.

Thanks,
J.

--
Jeremy Goecks
Assistant Professor of Computational Biology
George Washington University

 
 


Re: [galaxy-user] Questions regarding Circster visualization

2014-02-12 Thread Jeremy Goecks
 Indeed, there's another feature I don't fully understand: I have a bigWig 
 file that contains reads of only one chromosome. I expected that Circster 
 would display this one chromosome as one circle, but apparently Circster 
 always draws a circle where all possible chromosomes of a genome are 
 displayed. I think the usability would greatly increase if Circster only 
 displayed those chromosomes that are actually represented in the coverage 
 file. (Of course, I could zoom in, but if you're working with a chromosome 
 that's very small in comparison (e.g. the Y chromosome) the circular 
 representation is not really seen anymore as the region covered by the Y 
 chromosome is so tiny compared to the autosomes).

Circster is really for genome-wide visualization, and the assumption is that 
you'll have data for many if not all chromosomes. If you have data for only a 
single chromosome, using Trackster (Galaxy's track browser) makes more sense; 
Trackster is also more developed and has more display options right now. 

Let's say, then, that what you're proposing is a very advanced feature that 
could be implemented down the road.

Best,
J.

Re: [galaxy-user] Questions regarding Circster visualization

2014-02-11 Thread Jeremy Goecks
 1. I tested it using a bigWig and a BED file. Both were loaded nicely in 
 Circos, but I was surprised to see that the visualization of both files 
 looked exactly the same, i.e. both file types seemed to be interpreted as 
 histograms/coverage data. From the Circos plots I've seen in publications, I 
 assumed that BED files should be visualized as straight lines, indicating 
 genome regions (rather than a coverage). Am I doing anything wrong? Or, 
 rather, how should I modify the BED file so that its content is simply 
 interpreted as genomic regions?

This is a limitation of the visualization, and it should be addressed. I've 
created a Trello card for this enhancement that you can follow here: 
https://trello.com/c/YIdx6QvV

 2. In the Galaxy publication (www.biomedcentral.com/1471-2164/14/397), line 
 data is mentioned for displaying connecting lines in the center of the 
 circle - could you give me an example line of how this kind of data needs to 
 be formatted?

The format is a 7-column tabular file with tab-separated values:

--
chrom1 start1 end1 chrom2 start2 end2 score
--

Score isn't used right now, but it still needs to be there. Once you have this 
format, you'll need to convert the datatype from 'tabular' to 'chrint' in order 
to visualize it (click on the pencil icon, then Datatype). Also, I have a workflow 
up to convert Tophat fusion output data to chrint format here:

 https://usegalaxy.org/u/jeremy/w/tophat-fusion-post-output-to-chrint 
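
As a minimal sketch of the 7-column layout described above, the following Python writes a tab-separated file in that column order. The coordinates, the file name links.tabular, the function name, and the placeholder score of 0 are all made up for illustration; only the column order comes from the description.

```python
import csv

# Hypothetical interactions: (chrom1, start1, end1, chrom2, start2, end2).
links = [
    ("chr1", 1000, 2000, "chr5", 30000, 31000),
    ("chr2", 500, 900, "chrX", 12000, 12500),
]


def write_links(path, rows, score=0):
    """Write rows as chrom1 start1 end1 chrom2 start2 end2 score, tab-separated."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for row in rows:
            # Score is not used for display, but the column must be present.
            writer.writerow(list(row) + [score])


write_links("links.tabular", links)
```

The resulting file would still need to be uploaded and have its datatype changed from 'tabular' to 'chrint' as described above.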

Sorry for the cryptic nature of everything right now. We'll get this info and 
more up on a wiki page eventually (you're welcome to start one in the 
meantime). Let us know if you have more questions.

Best,
J.



Re: [galaxy-user] Creating a Trackster visualisation from a reference in your history

2014-01-23 Thread Jeremy Goecks
 Is it possible to create a custom build and use it to view a SAM file without 
 adding the .len and .2bit files in to the Galaxy file system as an 
 administrator?

Yes, it definitely is.

 If so, what am I doing wrong?


This is a Galaxy bug which has been fixed in this commit:

https://bitbucket.org/galaxy/galaxy-central/commits/117fef56513fc563dd231516196cfd601c1635e2

We have a release coming up, so this fix will be included in the release and 
will make it to our public server soon. In the meantime, note that you can use 
the genome fasta file rather than the len file to create a custom build and 
everything should work.
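
For context, a len file is understood here to list one chromosome or contig name and its length per line, separated by a tab. The sketch below derives those pairs from FASTA text. It assumes plain, uncompressed FASTA and takes the first word of each header as the name; the function names and the example records are illustrative only.

```python
def fasta_lengths(lines):
    """Return [(name, length)] for FASTA records given an iterable of lines."""
    lengths = []
    name, size = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                lengths.append((name, size))
            # Use the first whitespace-delimited word of the header as the name.
            parts = line[1:].split()
            name, size = (parts[0] if parts else ""), 0
        elif name is not None:
            size += len(line)
    if name is not None:
        lengths.append((name, size))
    return lengths


def format_len(lengths):
    """Format pairs as tab-separated 'name<TAB>length' lines."""
    return "".join(f"{name}\t{size}\n" for name, size in lengths)


example = [">chr1 test", "ATGCATGC", "ATG", ">chr2", "GG"]
print(format_len(fasta_lengths(example)), end="")
```

This is only meant to show how the two files relate; when creating the custom build, the fasta file can be supplied directly, as noted above.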

Thanks,
J.


Re: [galaxy-user] Trackster Error chrom/?.len , No such file or directory

2013-12-05 Thread Jeremy Goecks
You'll need to set your dataset's database/dbkey to your custom reference 
genome before you can visualize it. We have enhancements planned so that this 
error doesn't happen in the future.

Best,
J.


On Dec 5, 2013, at 7:56 AM, Jasper Jan Koehorst jasper.koeho...@wur.nl wrote:

 I have my own genome fasta file containing 1 chromosome with a modified 
 header so that it looks like:
 
 >chr1
 ATGCATGC
 
 I did a FASTQ mapping on it via the galaxy interface and now I end up with a 
 bam file:
 
 9 Bowtie2 on data 6, data 8, and data 7: aligned reads
 1.2 GB
 format bam  database ?
 
 I use the visualize button to start the visualization of the dataset. I chose 
 Trackster and view it in a new visualization. I use my fasta file as a 
 reference genome:
 
 Name  Key  Number of chroms/contigs   
 STPmg315  STPmg315_v1 1
 
 But then I get the error:
 Couldn't open /home/galaxy/galaxy-dist/tool-data/shared/ucsc/chrom/?.len , No 
 such file or directory
 I looked into the /chrom/ folder and of course ? does not exist. I am 
 currently running 
 python ./cron/build_chrom_db.py ./tool-data/shared/ucsc/chrom/
 But this of course downloads only known genomes and their chromosome information. As 
 I have my own genome, I was curious how to continue with this.
 
 I manually created a file in the /chrom/ folder so that it looks like this:
 head STPmg315.len 
 chr1  1900521
 
 but no luck so far. What else do I have to do to make it work?
 
 
 
 Thanks,
 
 
 
 Jasper
 

Re: [galaxy-user] Contents of a SAM file

2013-11-07 Thread Jeremy Goecks
All reads are in the SAM file; you can filter to remove unmapped reads as 
needed.
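
A minimal sketch of that filtering step, assuming uncompressed SAM text: the second column of each alignment line is the FLAG field, and bit 0x4 marks a read as unmapped. The function name and the three example lines are made up for illustration.

```python
UNMAPPED = 0x4  # SAM FLAG bit: read is unmapped


def mapped_only(sam_lines):
    """Yield header lines and alignment lines whose FLAG lacks the unmapped bit."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines as they are
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if not int(fields[1]) & UNMAPPED:
            yield line


example = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "read1\t0\tchr1\t100\t255\t8M\t*\t0\t0\tATGCATGC\tIIIIIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII\n",
]
kept = list(mapped_only(example))
```

On the command line, `samtools view -F 4` applies the same flag test.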

J.

On Nov 7, 2013, at 5:36 AM, Benjamin Osei-agyeman benjy_o...@yahoo.co.uk 
wrote:

 
 Hi
 
 What are the contents of a SAM file after Bowtie has been run? Does it 
 contain all reads or only those reads mapped to the genome?
 
 Thanks 
 
 Benjy

Re: [galaxy-user] galaxy-user Digest, Vol 89, Issue 4

2013-11-05 Thread Jeremy Goecks
 Thanks for the info.  However, my problem is that the Tool Version field is 
 completely empty in my history items (eg. Tophat2, Cuffdiff).  I suppose I 
 can check the dependencies list you described, but it would be important to 
 know precisely which version was run on any given query.  

If you ran Cuffdiff in the last couple months, you used version 2.1.1; before 
that it was 1.3.x. The version information was added in the last couple weeks, 
which is why you don't see it. Any runs going forward should include the 
version.

J.

 
 Best regards,
 Cory
 
 Message: 3
 Date: Tue, 5 Nov 2013 09:45:19 +
 From: graham etherington (TSL)
 graham.ethering...@sainsbury-laboratory.ac.uk
 To: Cory Dunn cd...@ku.edu.tr, galaxy-u...@bx.psu.edu
 galaxy-u...@bx.psu.edu
 Subject: Re: [galaxy-user] Cuffdiff version not apparent
 Message-ID:
 ce9e6d0a.21768%graham.ethering...@sainsbury-laboratory.ac.uk
 Content-Type: text/plain; charset=Windows-1252
 
 Hi Cory,
 A list of Galaxy dependencies can be found on the wiki at:
 http://wiki.galaxyproject.org/Admin/Tools/Tool%20Dependencies
 ...although many tools allow a range of tool versions.
 
 You can also identify the information about the specific tool versions by
 clicking on the View Details 'i' icon of a history item created by that
 tool and looking at the Tool Version field.
 If you're using the Galaxy public server (https://usegalaxy.org/) then
 clicking on the 'i' icon of a cuffdiff output file will show:
 
 Tool Version:cuffdiff v2.1.1 (4046M)
 Hope this helps.
 
 Cheers,
 Graham
 
  

Re: [galaxy-user] Trackster Error: needLargeMem: trying to allocate 0 bytes (limit: 100000000000)

2013-10-30 Thread Jeremy Goecks
It turns out that your artificial test is a bit too artificial. In order to 
display a coverage plot, Trackster converts reads in a BAM to BigWig using a 
two-step process:

(1) BAM to bedgraph;
(2) bedgraph to bigwig

Your super simple example generates an empty file in step 1 because your single 
read does not map to Araly1, and the tool used in step 2 (bedGraphToBigWig) 
fails with the error that you're seeing. This is a corner-case bug, and I've 
created a card for this so you can track its resolution: 
https://trello.com/c/kMFUNawL
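
To illustrate the corner case, here is a rough sketch of a guard that checks whether the step 1 output contains any data before step 2 is attempted. It is not the fix tracked on the card above; the function names and the status strings are hypothetical.

```python
def has_bedgraph_data(lines):
    """Return True if any line looks like a bedGraph data line."""
    for line in lines:
        stripped = line.strip()
        # Skip blanks, comments, and track/browser declaration lines.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        return True
    return False


def plan_conversion(bedgraph_lines):
    """Decide whether step 2 should run; returns a short status string."""
    if has_bedgraph_data(bedgraph_lines):
        return "convert"
    return "skip: no coverage to convert"


print(plan_conversion([]))
print(plan_conversion(["track type=bedGraph\n", "chr1\t0\t10\t3\n"]))
```

With a check like this, an empty step 1 file would lead to an empty coverage track rather than a failed conversion.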

Best,
J.


On Oct 30, 2013, at 5:44 PM, Guest, Simon simon.gu...@agresearch.co.nz 
wrote:

 I'm having problems getting Trackster working on my own Galaxy
 instance, so I thought I would check on the usegalaxy public server.
 
 However, I'm getting the same Trackster Error: needLargeMem: trying to
 allocate 0 bytes (limit: 100000000000) that was reported on this list
 in July, but there was no followup:
 http://user.list.galaxyproject.org/Trackster-Error-td4655737.html
 
 My history is at https://usegalaxy.org/u/simon-guest/h/trackster-error
 
 This is just an artificial test I made using a fragment of a reference
 genome, but I thought it should work OK.
 
 Any clues?
 
 cheers,
 Simon

Re: [galaxy-user] [galaxy-dev] Bam File

2013-10-29 Thread Jeremy Goecks
Hello,

First, I've moved this question from the Galaxy development mailing list to the 
Galaxy user mailing list; in the future, please send questions about using 
Galaxy to the galaxy-user list.

To answer your question, files larger than 2GB must be uploaded via FTP to 
Galaxy. This is necessary due to Web browser limitations.

Help on using FTP is on this wiki page. The screencasts both show the two-step 
process. The first is to FTP the data to the server, the second is to move the 
data from the Get Data -> Upload Data tool form into your history.
http://wiki.galaxyproject.org/FTPUpload
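
As a rough sketch of the first step, the Python below checks a file size against the 2GB limit and defines an upload helper using the standard ftplib module. The host, user name, and password are placeholders to be filled in from the wiki page, reading "2GB" as 2 GiB is an assumption, and the function names are made up for illustration. The second step still happens in the Galaxy interface.

```python
import os
from ftplib import FTP

BROWSER_LIMIT_BYTES = 2 * 1024**3  # assumption: "2GB" read as 2 GiB


def needs_ftp(size_bytes, limit=BROWSER_LIMIT_BYTES):
    """True when a file is too large for a browser upload."""
    return size_bytes > limit


def ftp_upload(host, user, password, local_path):
    """Step 1: send the file to the FTP server under its own base name."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        with open(local_path, "rb") as handle:
            ftp.storbinary(f"STOR {os.path.basename(local_path)}", handle)


# Example decision for a 9 GiB file; ftp_upload is defined but not called here.
print(needs_ftp(9 * 1024**3))
```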

Best,
J.

On Oct 28, 2013, at 11:55 AM, Arshad Rafiq arshadrafi...@gmail.com wrote:

 I am trying to upload a BAM file for my data analysis (size is about 9GB). I 
 am trying the URL method to upload it and am getting an error message. Can you please 
 help me sort out this problem? I am seeing the following message:
 An error occurred setting the metadata for this dataset. You may be able to 
 set it manually or retry auto-detection
 
 Thanks
 ***
 Arshad
 
 -- 
 Muhammad Arshad Rafiq, PhD
 Research Associate
 Laboratory of Dr. R. Hamilton
 Physiology and Experimental Medicine
 Research Institute
 The Hospital for Sick Children
 McMaster Building, Room 7005
 88 Elm St. 
 Toronto, ON
 M5G 1X8
 Canada
 647-237-4915
 arshadrafi...@gmail.com
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/
 
 To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


Re: [galaxy-user] problems in transitioning from Tophat to Cuffdiff

2013-10-22 Thread Jeremy Goecks
Are you reporting a bug for each failed Cuffdiff run? That's the easiest way 
for the Galaxy team to help you out. One thing to keep in mind is that, for 
now, spaces are not allowed in condition names. We'll address this problem soon.
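
Until that is addressed, one workaround is to replace spaces before entering the names. A small sketch, with made-up condition names and a hypothetical helper:

```python
import re


def safe_condition_name(name):
    """Replace runs of whitespace with a single underscore."""
    return re.sub(r"\s+", "_", name.strip())


names = ["control embryos", "exposed  embryos", "wildtype"]
print([safe_condition_name(n) for n in names])
```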

Best,
J.



On Oct 22, 2013, at 5:42 AM, Elwood Linney ellin...@gmail.com wrote:

 
 After successfully using RNAseq software in  Galaxy online for about 10 
 different datasets to just get gene expression differences between replicates 
 from control versus exposed zebrafish embryos,  I am having no luck getting 
 cuffdiff to work with the moved Galaxy.
 
 I had this problem with histories developed before the move and histories 
 developed after the move.
 
 I have had this problem using an older cuffmerge GTF file that worked in the 
 past in Cuffdiff, with a new cuffmerge file developed from cufflinks of the 
 files, and by just using a reference GTF file from UCSC.
 
 I don't know if this is just some interface problem with a different version 
 of the software that was included with the move, or a reference genome that 
 does not interface with Cuffdiff.  It has happened with about 5 different 
 histories.
 
 Is anyone else having this problem? And found a solution?
 
 Elwood Linney

Re: [galaxy-user] CloudMap

2013-09-14 Thread Jeremy Goecks
This sounds like an issue with importing workflows. Additional details will help 
others provide help:

(1) What version of Galaxy are you using?
(2) What workflow are you trying to import?
(3) What steps have you taken that produce the error that you're seeing?

Thanks,
J.

On Sep 13, 2013, at 3:32 PM, Isaac Knoflicek wrote:

 Has anyone out there gotten CloudMap to work on a local Galaxy instance? 
 https://main.g2.bx.psu.edu/cloudmap
  
 I believe I have all the prerequisites installed but when I try to import any 
 of the published workflows I get a “TypeError: expected string or buffer”.
  
 Any advice would be greatly appreciated.
  
 Thanks,
 Isaac Knoflicek
 IT Manager – Laboratory of Genetics
 University of Wisconsin - Madison
  

Re: [galaxy-user] Cuffdiff changes

2013-08-23 Thread Jeremy Goecks
 Where can I see which versions are being used?

You can see both the Galaxy tool version and the Cuffdiff tool version (when 
available) by clicking on the 'view details' icon (the 'i' at the bottom of an 
expanded dataset). Right now the Cuffdiff version is not displayed, but that 
will change when our server is updated.

 What does Cuffdiff(version 0.0.5) mean then?

That is the version of the Galaxy wrapper; the wrapper provides the interface 
between Cuffdiff and Galaxy.

 What version was it before?

I think Cuffdiff version was 1.3.1 previously.
 
 I look forward to the update, will that mean another version of Cuffdiff 
 again?

The wrapper will be updated but not Cuffdiff itself.

J.


Re: [galaxy-user] Cuffdiff-cummerbund with biological replicates problem

2013-07-31 Thread Jeremy Goecks
In the past, others have had success using Cummerbund with Galaxy, and there's 
even a Cummerbund wrapper in the tool shed: 

http://toolshed.g2.bx.psu.edu/view/jjohnson/cummerbund

That said, it appears that replicate information is largely contained in the 
read group tracking files, which are not currently included in Galaxy's 
Cuffdiff outputs. I don't know if these files are required by Cummerbund to do 
replicate analysis. This would be a good question for the Cummerbund 
developers, as well as what the p and q values mean when doing replicate 
analysis.

If you find that Galaxy's lacking something for Cummerbund to function 
correctly, that would be very useful information to share with the list.

Best,
J.


On Jul 26, 2013, at 8:50 PM, Mike Shamblott wrote:

 I'm trying to run Cuffdiff on a set of 10 human samples with biological 
 replication then download the results for further analyses in 
 Cummerbund(v2.1.1).  It seems like a standard workflow but I cannot get 
 cummerbund to acknowledge replicates.  I download and rename the 11 cuffdiff 
 output files to the names expected by cummerbund.  Cummerbund builds a 
 CuffSet with no warnings and most analyses work as expected.  The problem 
 comes any time I try to see the results of replication.  For example, in 
 cummerbund, replicates() returns an empty set and any type of plot returns 
 an error when replicates=T is included as an argument.
 
 There is no evidence of replication data in any of the 11 cuffdiff output 
 files.  The data is presented with the group name only.  From this, I 
 conclude that the problem is with cuffdiff, since there is no replicate data 
 for cummerbund to build into its db.  I see that there are several read group 
 files that are produced by cuffdiff but cannot be downloaded in Galaxy.  Is 
 this the problem, and if so, how can Galaxy be used to generate data with 
 (essential) replication?  Are the p and q significance values reported in 
 the output files a result of replicate analysis?  
 
 I have tried to ask this question in several different forums without 
 success.  The responses I've gotten suggest it's a Galaxy issue rather than 
 either cuffdiff or cummerbund.   I'm hoping someone here can help answer my 
 questions.
 
 Hopeful,
 
 Mike
 
 

Re: [galaxy-user] How to define the cutoff value of RPKM for expressed genes?

2013-07-03 Thread Jeremy Goecks
The confidence intervals provided by Cufflinks/Cuffdiff are a good place to 
start; any confidence interval that includes 0 should be looked on skeptically.
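
As a minimal sketch of that rule of thumb, the Python below keeps only genes whose lower confidence bound is above zero. The records and the field names (fpkm, conf_lo, conf_hi) are made up for illustration and would need to be mapped to the columns in your own output files.

```python
# Hypothetical records; field names are illustrative, not a fixed file format.
genes = [
    {"gene": "geneA", "fpkm": 12.4, "conf_lo": 8.1, "conf_hi": 16.7},
    {"gene": "geneB", "fpkm": 0.6, "conf_lo": 0.0, "conf_hi": 1.9},
    {"gene": "geneC", "fpkm": 3.2, "conf_lo": 0.4, "conf_hi": 6.0},
]


def interval_excludes_zero(record):
    """True when the lower confidence bound is strictly above zero."""
    return record["conf_lo"] > 0


confident = [g["gene"] for g in genes if interval_excludes_zero(g)]
print(confident)
```

This treats the interval as a first-pass screen rather than a fixed expression cutoff.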

Good luck,
J.

On Jul 3, 2013, at 2:52 PM, Hoang, Thanh wrote:

 Hi all,
 I have been working on RNA-seq data analysis using TopHat and Cuffdiff. One 
 of the problems I have is defining the cutoff RPKM value that distinguishes an 
 expressed gene from the background noise.
 Could anybody give me a suggestion?
 Thank you
 Thanh


Re: [galaxy-user] View details of Tophat alignment

2013-05-30 Thread Jeremy Goecks
Nothing is wrong with your job, this is a bug in our code that has been 
corrected. You'll start seeing the correct parameter values again when we 
update our server early next week.
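
In the meantime, the command line shown under the dataset name reflects what was actually run. Here is a small sketch for reading the short options out of such a command line. The flag-to-name table follows the TopHat manual as I understand it, the helper name is made up, and the command string is only the untruncated start of the one quoted below.

```python
import shlex

# Short TopHat options and the long names they correspond to
# (taken from the TopHat manual; treat as an assumption for other versions).
SHORT_FLAGS = {
    "-p": "num-threads",
    "-a": "min-anchor-length",
    "-m": "splice-mismatches",
    "-i": "min-intron-length",
    "-I": "max-intron-length",
    "-g": "max-multihits",
}


def parse_tophat_flags(command):
    """Map each known short flag in a tophat command line to its value."""
    tokens = shlex.split(command)
    values = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag in SHORT_FLAGS:
            values[SHORT_FLAGS[flag]] = value
    return values


command = "tophat -p 8 -a 8 -m 0 -i 70 -I 50 -g 20"
for name, value in parse_tophat_flags(command).items():
    print(name, "=", value)
```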

Best,
J.

On May 29, 2013, at 11:05 PM, Du, Jianguang wrote:

 Hi All,
 After I finished the Tophat alignment for RNA-seq, I took a look at the details of 
 the parameters by clicking the View details icon, and I got the information 
 shown below:
  
 Input Parameter | Value | Note for rerun
 RNA-Seq FASTQ file | 73: Filtered Groomed data1_rep2 |
 Use a built in reference genome or own from your history | indexed |
 Select a reference genome | /galaxy/data/mm9/bowtie_index/mm9 |
 Is this library mate-paired? | single |
 TopHat settings to use | full |
 Library Type | FR Unstranded |
 Anchor length (at least 3) | None |
 Maximum number of mismatches that can appear in the anchor region of spliced alignment | None |
 The minimum intron length | None |
 The maximum intron length | None |
 Allow indel search | No |
 Maximum number of alignments to be allowed | None |
 Minimum intron length that may be found during split-segment (default) search | None |
 Maximum intron length that may be found during split-segment (default) search | None |
 Number of mismatches allowed in the initial read mapping | None |
 Number of mismatches allowed in each segment alignment for reads mapped independently | None |
 Minimum length of read segments | None |
 Use Own Junctions | Yes |
 Use Gene Annotation Model | Yes |
 Gene Model Annotations | 1: mm9 genes.gtf |
 Use Raw Junctions | No |
 Only look for supplied junctions | No |
 Use Closure Search | No |
 Use Coverage Search | Yes |
 Minimum intron length that may be found during coverage search | None |
 Maximum intron length that may be found during coverage search | None |
 Use Microexon Search | No |
  
 I am totally confused by so many Nones.
 Then I checked the workflow I set and used for the TopHat alignment, the 
 details are the same as above.
  
 However, the brief description just under the title of alignment output (. 
 accepted hits) is as below:
  
 format: bam, database: mm9
 Tophat for Illumina on data 1 and data 73: accepted_hits, TopHat v1.4.0 
 tophat -p 8 -a 8 -m 0 -i 70 -I 50 -g 20 -G 
 /galaxy/main_pool/pool1/files/004/425/dataset_4425972.dat --library-type 
 fr-unstranded --no-novel-indels --coverage-search --min-cove
  
 Could you please tell me whether anything is wrong (because there are so many 
 None values in the detailed parameters)?
  
 Thanks.
 Jianguang DU

Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-10 Thread Jeremy Goecks
 1) My reads are 36nt long. What should I set the Minimum length of 
 read segments to in order to get the most reliable output with the highest 
 mapping of splicing junctions? In my previous run of TopHat, I set it to 18. 
 Can I reduce it further to get better mapping on splicing junctions?

You'll need to define for yourself what you mean by better/best mapping and 
experiment to find the parameters that give you the best results.

 2) I do not understand exactly how TopHat uses the Anchor length, 
 although I have read the TopHat manual. 
 Suppose I set the Anchor length to 8 and the Maximum number of mismatches 
 that can appear in the anchor region of spliced alignment to 0 when I run 
 Tophat. Does it mean that, for a read that maps on two adjacent exons, TopHat 
 will report this alignment in the .accepted hits and .splicing 
 junctions outputs if either end of the read has 8 or more nucleotides mapping 
 on one exon?

I think that's correct.
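
For what it's worth, my reading of the TopHat manual is that a junction needs at least "anchor length" bases on each side of it. The sketch below encodes just that rule and counts how many split positions a read of a given length allows. The function names are made up, and mismatch handling is ignored.

```python
def junction_supported(left_bases, right_bases, anchor_length):
    """True if both sides of the junction have at least anchor_length bases."""
    return min(left_bases, right_bases) >= anchor_length


def supported_split_positions(read_length, anchor_length):
    """Number of ways to split a read so both sides meet the anchor length."""
    return max(0, read_length - 2 * anchor_length + 1)


print(supported_split_positions(36, 8))  # splits allowed with anchor length 8
print(supported_split_positions(36, 3))  # splits allowed with anchor length 3
```

For a 36nt read this gives 21 allowed split positions with an anchor length of 8 and 31 with an anchor length of 3, which is why a smaller anchor lets more reads map across junctions.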

 3) Is there a disadvantage/negative effect if I set the Anchor 
 length to the lowest value, for example 3? My understanding is that, under the 0 
 mismatch condition, if 3 nucleotides at one end of a read map on one exon, 
 the other part of the read will map on the adjacent exon (in my case, the 
 other part would be 33 nucleotides). So my understanding is that setting the 
 Anchor length to 3 does not increase the inaccuracy of the alignment. Am I 
 correct?

Setting the anchor length especially small reduces the constraints on mapping, 
so more reads will map but there are likely to be more false positives as well.

Good luck,
J.

Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-10 Thread Jeremy Goecks

 I have one more question about the Anchor length. For an RNA-seq read mapped 
 on a splicing junction under the 0 mismatch condition, if 5 nucleotides of 
 one end map on one exon, does it mean the rest of the read must map on 
 the adjacent exon? What I want to understand is this: reducing the 
 Anchor length may reduce the reliability of mapping on one end/exon, but 
 the increased number of mapped nucleotides on the adjacent exon may increase 
 the reliability of mapping. Does it mean that overall the reliability of 
 mapping is not changed?

No, in general the probability of mapping 5 bases + (N-5) remaining bases 
incorrectly is higher than that of mapping 8 bases + (N-8) bases incorrectly 
because (a) there are more matching 5-mers than 8-mers in a genome and (b) there 
can be mismatches when mapping the remainder.
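
To put a rough number on point (a): under the simplifying assumption of a uniform random sequence (real genomes are not), a given k-mer matches at any one position with probability 1/4^k. The sketch below uses that assumption; the function name and the window size are made up for the example.

```python
def expected_chance_matches(k, positions):
    """Expected number of chance matches of one k-mer over `positions`
    candidate start sites, assuming uniform random sequence."""
    return positions / 4 ** k


window = 100000  # hypothetical number of candidate positions
five = expected_chance_matches(5, window)
eight = expected_chance_matches(8, window)
print(five, eight, five / eight)
```

Since 4^8 / 4^5 = 64, a 5-base anchor is expected to find a chance match 64 times as often as an 8-base anchor over the same window.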

J.


Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-09 Thread Jeremy Goecks
36bp reads will map across splice junctions but at a relatively low rate; you 
can try changing segment length to get better mapping, but you'll want to 
evaluate the results carefully to ensure that you're getting good results.

Good luck,
J.

On Apr 8, 2013, at 5:45 PM, Du, Jianguang wrote:

 Hi All,
 I have a very basic question. I have RNA-seq datasets of several cell types 
 and want to compare the alternative splicing events between cell types. The 
 reads are 36nt in length. Are these reads long enough to map on the splicing 
 junctions accurately when I run Tophat with stringent parameters (no 
 mismatch)?
 Thanks.
 Best,
 Jianguang Du
  

Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-09 Thread Jeremy Goecks
 In addition to reducing the Minimum length of read segments, do I also 
 need to reduce the Anchor length to get more mapping on splicing junctions?

Definitely worth a try.

 Looks like the setting for Anchor length only affects the number of mapped 
 splicing junctions reported in the .splicing junctions output. Is my 
 understanding correct?

No, it will affect mapped reads as well.

 Does the regions mean the number of mapped splicing junctions?

Yes.

Best,
J.

Re: [galaxy-user] (no subject)

2013-04-05 Thread Jeremy Goecks
Cuffmerge does some additional steps that Cuffcompare does not; specifically, 
Cuffmerge attempts to remove assembly artifacts: 
http://cufflinks.cbcb.umd.edu/manual.html#cuffmerge
It's likely that the (presumed) artifacts removed by Cuffmerge account for the 
differences that you're seeing.
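
If it helps to quantify how far apart the two result lists are, here is a minimal sketch that compares two lists of IDs. The function name and the toy IDs are made up; it assumes both lists use the same kind of identifier.

```python
def overlap_summary(ids_a, ids_b):
    """Summarize the overlap between two collections of IDs."""
    a, b = set(ids_a), set(ids_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared IDs as a fraction of all distinct IDs.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy IDs, for illustration only.
print(overlap_summary(["g1", "g2", "g3"], ["g3", "g4"]))
```

With the counts quoted below (390 and 770 with 60 shared), the Jaccard index would be 60 / (390 + 770 - 60) = 60 / 1100, or about 0.05.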

Best,
J.

On Apr 5, 2013, at 8:33 AM, Davide Degli Esposti wrote:

 Dear Galaxy team,
 
 I have a question about RNA analysis with the cufflinks package.
 
 I have some bam files to analyze from a SOLiD platform. Some previous tests 
 show that these bam/sam files are different from those coming from Tophat and 
 cufflinks cannot assemble them using a reference annotation (XS attribute 
 lacking in spliced alignments). (see 
 https://main.g2.bx.psu.edu/u/davide-degliesposti/h/rna-seqtest-datasetscufflinks).
  An apparent solution is to include the reference annotation in the cuffmerge 
 (see  
 https://main.g2.bx.psu.edu/u/davide-degliesposti/h/rna-seqtest-datasetsapril-20132)
  or cuffcompare (see 
 https://main.g2.bx.psu.edu/u/davide-degliesposti/h/rna-seqtest-datasetsjan-2013-1)
 steps. Doing this allowed me to run cuffdiff on my datasets without 
 apparent technical errors. However, when I compare the lists of differentially 
 expressed transcripts (DETs), the results are extremely different: using 
 cuffcompare, I got 390 DETs and using cuffmerge I got 770 DETs, but just 60 
 genes are shared between the two lists. The parameters used in cuffdiff (FDR, 
 Min Alignment Count, etc.) are the same for the two analyses.
 
 Do you have any explanation about that? I expected that cuffcompare and 
 cuffmerge did not lead to outputs quantitatively different. Where may the 
 source of this difference be?
 
 I thank you for your cooperation
 
 Davide
 
 ---
 Davide Degli Esposti, PhD
 Epigenetic (EGE) Group
 International Agency for Research on Cancer
 Tel. +33 4 72738036
 Fax. +33 4 72738322
 150, cours Albert Thomas
 69372 Lyon Cedex 08
 France 
 
 
 
 
 
 
 
 
 
 

Re: [galaxy-user] Cuffdiff statistical calculations are inconsistent?

2013-03-15 Thread Jeremy Goecks
 The header of the Cuffdiff tool page says it is version 0.0.5

This version is the Galaxy tool wrapper version, not the tool version. (Yes, 
this is a usability issue.) You can find the tool version in the dataset's 
information panel by clicking on the 'i' icon.

 Is there a way, or setting, on Cuffdiff 2.0 to revert the parameters to be 
 more similar to Cuffdiff 1.3?

This isn't a parameter issue. The Cuffdiff algorithm has changed substantially, 
and it's not clear to me if/how (or whether it's a good idea at all) to modify 
parameters to obtain 1.3-esque results. 

Best,
J.


Re: [galaxy-user] Cuffdiff statistical calculations are inconsistent?

2013-03-13 Thread Jeremy Goecks
This is likely due to the upgrade from Cufflinks 1.3.x to Cufflinks 2.0.x; 
Cufflinks 2.0 introduced a new algorithm for Cuffdiff in particular. You can 
read about these changes on the website:
http://cufflinks.cbcb.umd.edu/ (and there's a manuscript describing the changes 
as well).
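
As a quick sanity check on the YFR026C numbers quoted below, this sketch recomputes the log2 fold change from the two FPKM values and applies a q-value cutoff. The 0.05 threshold is my assumption of the FDR setting in use, and the function names are made up.

```python
import math


def log2_fold_change(fpkm_1, fpkm_2):
    """log2 of the ratio of the second condition to the first."""
    return math.log2(fpkm_2 / fpkm_1)


def is_significant(q_value, fdr=0.05):
    """A test is called significant when its q-value is within the FDR."""
    return q_value <= fdr


# Values copied from the two runs quoted in this thread.
run1 = {"fpkm_1": 17.2434, "fpkm_2": 196.735, "q": 7.33e-6}
run2 = {"fpkm_1": 14.4489, "fpkm_2": 144.939, "q": 0.0719964}

for run in (run1, run2):
    print(round(log2_fold_change(run["fpkm_1"], run["fpkm_2"]), 3),
          is_significant(run["q"]))
```

The fold changes are close in both runs; only the q-value moves across the threshold, which is consistent with a change in how the test statistic is computed rather than in the expression estimates.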

You might consider writing to the tool authors directly for more details: 
tophat.cuffli...@gmail.com. Of course, please consider sharing anything you 
learn with members of this list as well.

Best,
J.



On Mar 13, 2013, at 12:06 PM, Mohammad Heydarian wrote:

 We are having the exact same issue, on the main server and our (recent) cloud 
 instances.
 
 Were some of the hidden Cuffdiff parameters modified since fall 2012? 
 
 Cheers,
 Mo Heydarian
 
 On Mar 13, 2013 11:02 AM, Jenna Smith jes...@case.edu wrote:
 Hi,
 
 I'll preface my concern by saying that I'm a novice to Cufflinks.  Back in 
 September, I performed a Cuffdiff analysis comparing a wild-type and mutant 
 condition.  The analysis returned ~800 transcripts differentially regulated 
 between the two with statistical significance.  Recently, I've rerun the 
 Cuffdiff analysis - using exactly the same files stored in Galaxy for all 
 inputs, and with all the same parameters - and only get a few dozen 
 statistically significant hits.  However, all of the data besides the p and q 
 values are essentially identical between these two runs, so I am really 
 unclear as to what is causing the difference.  Here is just one clear example:
 
 From run 1:
 YFR026C
 FPKM 1 = 17.2434
 FPKM 2 = 196.735
 log2(fold change) = 3.51214
 p = 1.64E-8
 q = 7.33E-6
 significant = yes
 
 From run 2:
 YFR026C
 FPKM 1 = 14.4489
 FPKM 2 = 144.939
 log2(fold change) = 3.32641
 p = 0.000170034
 q = 0.0719964
 significant = no
 
 The second Cuffdiff analysis shows there is still a ~10-fold difference 
 between conditions, but this is not statistically significant.  Has the 
 version of Cuffdiff on Galaxy been updated such that some parameters have 
 changed, that could explain this difference?  Or, is there some setting I am 
 missing that would cause very large changes to fail statistical significance 
 testing?  Any help or input would be appreciated, I am really at a loss for 
 why executing what should be exactly the same task is giving vastly different 
 results.  I could just be overlooking something very fundamental that is 
 obvious to someone with more experience with this program.  Thanks.
 
 -Jenna Smith
 
 

Re: [galaxy-user] How should I include biological replicates in cufflink/cuffdiff?

2013-03-08 Thread Jeremy Goecks
 
 I am dealing with a bacterium which has about 4000 genes. When I tried 
 Cuffmerge to merge everything with reference annotation, I got a merged file 
 of only 50 lines. If I left out the reference annotation file, Cuffmerge 
 returned me a merged file of 4000 lines (which is more reasonable).
 
 However this difference didn't happen if I use Cuffcompare to merge all the 
 files. With or Without reference annotation, the merged file are both of 4000 
 lines. If I continue to Cuffdiff with this Cuffcompare file, I got over 1000 
 significantly changed genes.
 
 Could you give me some suggestion on this? Should I just trust the 
 Cuffcompare file? 

Cuffmerge attempts to remove incomplete or spurious transcripts. My best guess 
is bacterial transcripts, with few/no introns, are being filtered out because 
they appear to be incomplete to Cuffmerge. So, in your case, Cuffcompare could 
be the superior option. 
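
One way to check what each merge step kept is to count distinct transcripts rather than lines. Below is a minimal sketch, assuming standard GTF with a transcript_id attribute in the ninth column; the function name and the sample records are made up.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def count_transcripts(gtf_text):
    """Count distinct transcript_id values in GTF-formatted text."""
    ids = set()
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return len(ids)


# Made-up records, for illustration only.
example_gtf = (
    'chr\tCufflinks\texon\t1\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr\tCufflinks\texon\t1200\t2100\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n'
    'chr\tCufflinks\texon\t2500\t2600\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n'
)
print(count_transcripts(example_gtf))
```

Running this on the Cuffmerge and Cuffcompare outputs would show whether the 50-line file really contains far fewer transcripts or just fewer records per transcript.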

You might want to verify my guess by discussing the issue with the cufflinks 
developers directly: tophat.cuffli...@gmail.com ; please feel free to post 
anything you learn to this list.

Best,
J.




Re: [galaxy-user] [galaxy-dev] not enough memory space on my galaxy session

2013-03-08 Thread Jeremy Goecks
Hello,

Apologies for the slow reply. I've moved this thread to the galaxy-user mailing 
list because it centers on using Galaxy rather than developing it.

 2. delete the first files, like the first fastq files, but I'm afraid of 
 getting error messages

Deleting your fastq files after you have mapped your reads is fine and will not 
cause any errors. Make sure to both delete and purge your datasets to clear 
them from your account:

http://wiki.galaxyproject.org/Learn/Managing%20Datasets#Delete_vs_Delete_Permanently

 3. to obtain more memory, just for the time of the study.

Your best bet to obtain more memory quickly is to use a cloud instance:

http://wiki.galaxyproject.org/CloudMan


Good luck,
J.

Re: [galaxy-user] How should I include biological replicates in cufflink/cuffdiff?

2013-03-03 Thread Jeremy Goecks
 My question is, if I need to compare between 5 time points, should I do 
 comparison pairwise?

No, do them all at once with Cuffdiff: 

(a) set 'Perform Replicate Analysis' to 'Yes';
(b) create 5 replicate conditions, one for each time point;
(c) add your replicates for each time point. 

There's a Cuffdiff flag to do time series analysis, but it isn't implemented 
yet in Galaxy, so you'll get pairwise comparisons for all conditions. You can 
use the filtering tool to reduce Cuffdiff outputs to only the timepoint 
comparisons.
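
To make the bookkeeping concrete: 5 conditions give 10 pairwise comparisons, of which 4 are between consecutive time points. The sketch below lists both sets and filters rows the way the filtering tool would. It assumes the Cuffdiff output has sample_1 and sample_2 columns; the condition labels and function names are made up.

```python
from itertools import combinations


def all_pairs(conditions):
    """Every pairwise comparison between conditions, in order."""
    return list(combinations(conditions, 2))


def consecutive_pairs(conditions):
    """Only comparisons between adjacent time points."""
    return list(zip(conditions, conditions[1:]))


def keep_consecutive(rows, conditions):
    """Keep rows whose (sample_1, sample_2) is a consecutive comparison."""
    wanted = set(consecutive_pairs(conditions))
    return [row for row in rows
            if (row["sample_1"], row["sample_2"]) in wanted]


timepoints = ["0h", "1h", "2h", "4h", "8h"]  # hypothetical labels
print(len(all_pairs(timepoints)), len(consecutive_pairs(timepoints)))
```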


 I will use cuffmerge to merge 0hour-1, 0hour-2, 0hour-3, 
 1hour-1, 1hour-2, 1hour-3 to generate one cuffmerge file.

Correct.

 Then I will run cuffdiff using the merged file, include two groups, group 1 
 is 0 hour (add 0hour 1-3 in group 1) and group 2 is 1hour (add 1hour1-3 in 
 group 2).

Use the process I described above to do all pairwise comparisons in one run.

Good luck,
J.





Re: [galaxy-user] replicates in cuffdiff output

2013-02-18 Thread Jeremy Goecks
Read group info isn't included in the Cuffdiff output right now. I've created a 
Trello card to fix this oversight: https://trello.com/c/FdUYdbIn

Best,
J.

On Feb 18, 2013, at 12:32 PM, Johanna Sandgren wrote:

 Hi,
  
 I am running cufflinks and cuffdiff using Galaxy. However, I am wondering 
 whether cuffdiff is supposed to produce output files for each replicate. 
 Does anyone know why those (read.group-files) are not there, or when they will 
 be, if it is because of the version used in Galaxy? I find it very valuable to 
 have those to be able to see intra/inter-group features in downstream 
 analysis.
  
 Thanks,
 Johanna
  
 ..
 Johanna Sandgren, PhD
 Department of Oncology-Pathology
 CCK, Karolinska Institutet
 SE-171 76 Stockholm, Sweden
 +46-8-517 721 35 (office),
 +46-8- 321047(fax), +46-708 388476 (mobile)
  

Re: [galaxy-user] cufflinks output for cummeRbund

2013-02-12 Thread Jeremy Goecks
Cummerbund is available in the Galaxy toolshed for use in local or cloud 
Galaxies:

http://toolshed.g2.bx.psu.edu/view/jjohnson/cummerbund

We haven't put it on our public server yet because there are testing and 
compatibility challenges that need to be addressed.
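
Until then, renaming the downloaded files by hand is one workaround. The sketch below maps the Galaxy output labels quoted below to the file names Cuffdiff normally writes. The mapping is my best guess from Cuffdiff's usual output names, not something Galaxy provides, and the helper name is made up.

```python
import re

# Galaxy output label -> usual Cuffdiff file name (assumed mapping).
GALAXY_TO_CUFFDIFF = {
    "transcript_FPKM_tracking": "isoforms.fpkm_tracking",
    "gene_FPKM_tracking": "genes.fpkm_tracking",
    "TSS_groups_FPKM_tracking": "tss_groups.fpkm_tracking",
    "CDS_FPKM_tracking": "cds.fpkm_tracking",
    "transcript_differential_expression_testing": "isoform_exp.diff",
    "gene_differential_expression_testing": "gene_exp.diff",
    "CDS_FPKM_differential_expression_testing": "cds_exp.diff",
    "splicing_differential_expression_testing": "splicing.diff",
    "promoters_differential_expression_testing": "promoters.diff",
    "CDS_overloading_differential_expression_testing": "cds.diff",
}

LABEL = re.compile(r"Galaxy\d+_(.+)\.tabular$")


def cuffdiff_name(galaxy_filename):
    """Return the assumed Cuffdiff file name, or None if unrecognized."""
    match = LABEL.search(galaxy_filename)
    if match is None:
        return None
    return GALAXY_TO_CUFFDIFF.get(match.group(1))


print(cuffdiff_name("...Galaxy109_transcript_FPKM_tracking.tabular"))
```

The TSS groups differential expression output is not in the list quoted below; if you download it, its usual Cuffdiff name is tss_group_exp.diff.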

Best,
J.

On Feb 11, 2013, at 7:49 PM, Mike Shamblott wrote:

 I have been using cufflinks on Galaxy Main.  I have downloaded the files 
 generated but they do not correspond to the file names expected by 
 cummeRbund.  
 
 For example:
 cummeRbund expects 4 tracking files (e.g. isoforms.fpkm_tracking) and 4 .diff 
 files (e.g. isoform_exp.diff).  
 
 Here is a trimmed version of the output I downloaded, grouped by what I'm 
 guessing are the tracking, diff and usage files:
 
 
 TRACKING
 ...Galaxy109_transcript_FPKM_tracking.tabular
 ...Galaxy107_gene_FPKM_tracking.tabular
 ...Galaxy105_TSS_groups_FPKM_tracking.tabular
 ...Galaxy103_CDS_FPKM_tracking.tabular
 
 .DIFF
 
 ...Galaxy108_transcript_differential_expression_testing.tabular
 ...Galaxy106_gene_differential_expression_testing.tabular
 ...Galaxy102_CDS_FPKM_differential_expression_testing.tabular
 
 USAGE
 ...Galaxy99_splicing_differential_expression_testing.tabular
 ...Galaxy100_promoters_differential_expression_testing.tabular
 ...Galaxy101_CDS_overloading_differential_expression_testing.tabular
 
 
 Given that cummeRbund is a common next step in the workflow, is there an 
 option to save the output in the expected format, perhaps with a galaxy 
 history number prepend?  I'm not sure which files are to be renamed and to 
 what name and it seems that one file is missing.
 
 If there were a cummeRbund implementation on Main it probably wouldn't matter 
 as much, but until that happens, I (and I'm guessing other newcomers) would 
 appreciate the help!
 
 Thanks,
 
 mike
 
 

Re: [galaxy-user] Moving history datasets to libraries

2013-01-27 Thread Jeremy Goecks
In the Add Datasets/Upload Files libraries form, set the option 'Upload option' 
to 'Import datasets from your current history' and you'll be able to add 
datasets from your history to a library.

Best,
J.

On Jan 27, 2013, at 3:36 AM, Ted Goldstein wrote:

 I must be mistaken, but I don't see any way to move a dataset  that I create 
 in a history to a library except to download it and upload it again.  
 Can this be correct?  It seems like this is essential functionality.
 
 Please tell me I am wrong.
 Thanks,
 Ted
 
 


Re: [galaxy-user] Trackster custom builds wrong

2013-01-08 Thread Jeremy Goecks
Hello,

Can you share with me (a) the fasta dataset and (b) the form values (e.g. name, 
dbkey, etc) you used when you encountered this error?

Thanks,
J.

On Jan 7, 2013, at 10:17 AM, Jennifer Hillman-Jackson wrote:

 Repost to Galaxy-user
 ---
 
 When using Trackster on Galaxy (https://main.g2.bx.psu.edu/root), as the 
 Galaxy wiki page for Trackster (http://wiki.galaxyproject.org/Learn/Visualization) 
 recommends, I need to build a track browser for soybean because it 
 isn't installed for all users. But after I enter all the requested 
 information on the web page (new build name, key and definition) and submit, 
 I get a server error: "An error occurred. See the 
 error logs for more information. (Turn debug on to display exception reports 
 here)"
 
 I found that somebody else encountered this problem on the mailing 
 list, but I didn't find any useful solution.
 So, I want to know why it happened and how to fix it. Is there something wrong 
 with my manipulation?
 Any reply will be appreciated. Thank you very much!
 
 Yours sincerely,
 Yanting Shen
 
 


Re: [galaxy-user] Add new page server error

2013-01-04 Thread Jeremy Goecks
This bug has been fixed in our code base; our public server will be fixed when 
we update it early next week.

Best,
J.

On Jan 4, 2013, at 5:54 PM, Aaron Stonestrom wrote:

 When logged into main trying to create a new shared page under Add new page 
 in Saved Pages, entering any page title and selecting Submit gives me:
 Server Error
 An error occurred. See the error logs for more information. (Turn debug on to 
 display exception reports here)
 
 Thanks for any help,
 

Re: [galaxy-user] cuffdiff

2012-11-12 Thread Jeremy Goecks
Use the replicates option (yes, a bit of a misnomer) and put each Tophat run in 
its own group. This will produce a tabular file with FPKM for each group/run.

Best,
J.  

On Nov 12, 2012, at 10:05 AM, Vevis, Christis wrote:

 Hi,
  
 I got confused while trying to perform Cuffdiff for my RNA sequencing 
 analysis. So I have five different samples which were sequenced. I used 
 Tophat to create the BAM files and Cufflinks to create the assembled 
 transcripts. Then I used Cuffmerge to merge them into one file and then I wanted 
 to run Cuffdiff with that merged file. What shall I choose for the "SAM or 
 BAM file of aligned RNA-Seq" option? I have the 5 options from the 5 Tophat 
 runs on my 5 samples. All I want in the end is an Excel table showing the 
 number of hits from each sample (and not necessarily a comparison of them).
  
 Regards  
  
 Kristis Vevis, PhD Student
 Cell Biology
 UCL Institute of Ophthalmology
 11-43 Bath Street
 London
 EC1V 9EL, UK
 020 7608 4067
  

Re: [galaxy-user] Identification of replicate outlier

2012-11-11 Thread Jeremy Goecks

 c) if you can create an appropriate input matrix (read counts by exon
 or other contig for each sample eg), the Principal Component Analysis
 tool might be helpful (library size normalization is one devil that
 lies in the detail and it's not quite the same as MDS - see below)

I like starting with this approach because it can be done easily in Galaxy. You 
can take the expression datasets produced by Cufflinks for each replicate and 
join them on gene name to get a big table of replicate-expression values and 
either eyeball it or use PCA. Note that since Cufflinks produces FPKM, library 
size is already accounted for.
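
Outside Galaxy, the join step described above can be sketched in a few lines of Python. This is only an illustration: the function name `join_fpkm`, the replicate names, and the FPKM values are all hypothetical, and real values would come from the gene expression datasets that Cufflinks produces for each replicate.

```python
def join_fpkm(replicates):
    """Join per-replicate {gene: FPKM} mappings into one table.

    Only genes present in every replicate are kept (an inner join).
    """
    names = sorted(replicates)
    shared = set.intersection(*(set(replicates[name]) for name in names))
    header = ["gene"] + names
    rows = [
        [gene] + [replicates[name][gene] for name in names]
        for gene in sorted(shared)
    ]
    return header, rows


# Hypothetical FPKM values for three replicates (not real data).
replicates = {
    "rep1": {"geneA": 10.0, "geneB": 0.5, "geneC": 120.0},
    "rep2": {"geneA": 11.2, "geneB": 0.4, "geneC": 118.5},
    "rep3": {"geneA": 30.1, "geneB": 0.6},
}

header, rows = join_fpkm(replicates)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

In a table like this, a replicate whose values consistently sit far from the others (rep3 for geneA in the made-up data) is a candidate outlier worth a closer look before running PCA.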

Another idea/approach: Cuffdiff already has an advanced model for dealing with 
replicates: 

http://cufflinks.cbcb.umd.edu/howitworks.html#reps

You may want to investigate how this model works and whether you can tune it 
with parameter settings before giving up on using all your replicates. 

One challenge with this approach is that the Galaxy Cuffdiff wrapper does not 
yet include all parameters, so you might try enhancing the Cuffdiff wrapper 
with additional, relevant parameters and using those as well as the existing 
ones. If you do this, please consider submitting your enhancements back to me 
and I can integrate them into our code base.

Best,
J.

Re: [galaxy-user] (no subject)

2012-11-09 Thread Jeremy Goecks
Kristis,

This data is available further downstream in an RNA-seq analysis pipeline, 
specifically, as output from the Cuffdiff tool. Take a look at the page for 
more details:

https://main.g2.bx.psu.edu/rna-seq

Best,
J.

On Nov 9, 2012, at 3:42 AM, Vevis, Christis wrote:

 Hi,
  
 I am performing online tophat on 5  different samples which I want to compare 
 for gene expression. Is there any simple way, after the end of tophat for all 
 of them, with which I can have an excel table with the 5 samples and their 
 hits?
  
 Something similar to this:
 
             Vevis1    Vevis2     Vevis3    Vevis4    Vevis5
 uc010kuo.1  128.8503  136.60553  146.7073  91.23218  120.325  AK311687  TRA2A  Homo sapiens transformer 2 alpha homolog (Drosophila) (TRA2A), mRNA.
 
 Regards
  
 Kristis Vevis, PhD Student
 Cell Biology
 UCL Institute of Ophthalmology
 11-43 Bath Street
 London
 EC1V 9EL, UK
 020 7608 4067
  

Re: [galaxy-user] cufflinks visualization

2012-11-01 Thread Jeremy Goecks
I'm able to visualize Cufflinks assembled transcripts in Trackster. Can you 
please be more specific about (a) which datasets you're having trouble using 
and (b) what errors you're seeing?

Thanks,
J.

On Oct 31, 2012, at 1:10 PM, i b wrote:

 Hi all,
 can anyone explain how I can visualize Cufflinks outputs in Trackster?
 Galaxy keeps sending me errors
 
 thanks,
 ib


Re: [galaxy-user] Export to file

2012-10-22 Thread Jeremy Goecks
 
 When you say large history, is there a size limit that I should be aware of, 
 or will it handle anything that my quota can accept?

It will handle anything your quota can accept.

Best,
J.


Re: [galaxy-user] Export to file

2012-10-20 Thread Jeremy Goecks
I've reworked the code to handle large history export files in -central 
changeset afc8e9345268, and this should solve your issue. This change should 
make it out to our public server this coming week.

Best,
J.

On Oct 18, 2012, at 12:36 PM, Dave Corney wrote:

 Hi Jeremy,
 
 Thanks for your offer of help. By the time I got your email I had already 
 added many new jobs to the history that are either running now or waiting to 
 run. Since I read somewhere that if the history is running then there are 
 problems exporting I shared a clone of the history with you. The clone should 
 be identical to the history that I was having problems with yesterday. I can 
 share with you the original history once the jobs have finished running (but 
 it might take a while).
 
 Thanks,
 Dave
 
 
 On Wed, Oct 17, 2012 at 10:35 PM, Jeremy Goecks jeremy.goe...@emory.edu 
 wrote:
 Dave,
 
 There's likely something problematic about your history that's causing 
 problems. Can you share with me the history that's generating the error? To 
 do so, from the history options menu -> Share/Publish -> Share with a User 
 -> my email address
 
 Thanks,
 J.
 
 
 On Oct 17, 2012, at 6:58 PM, Jennifer Jackson wrote:
 
  Hi Dave,
 
  Yes, if your Galaxy instance is on the internet, for entire history 
  transfer, you can skip the curl download and just enter the URL from the 
  public Main Galaxy server into your Galaxy directly.
 
  To load large data over 2G that is local (datasets, not history archives), 
  you can use the data library option. The idea is to load into a library, 
  then move datasets from libraries into histories as needed. Help is in our 
  wiki here:
  http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Libraries
  http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Uploading%20Library%20Files
 
  Take care,
 
  Jen
  Galaxy team
 
  On 10/17/12 3:21 PM, Dave Corney wrote:
  Hi Jen,
 
  Thanks for your response and suggestion. Just so that it is clear, for
  your second method, where I export to file and then use curl, I will
  download to my computer as an intermediate stage? Is there a simple way
  to take the history and datasets from PSU galaxy to our Princeton galaxy
  directly (without downloading to my computer first)? Unfortunately, we
  don't have FTP on our own galaxy, which is why I was looking for
  alternatives (each file is 2GB, so uploading through the browser won't
  work either). It seems that to import from file, the file needs to have
  a URL and I'm not sure how to go about that if the file is stored locally
  on my computer.
 
  Thanks,
  Dave
 
 
 
  On Wed, Oct 17, 2012 at 6:12 PM, Jennifer Jackson j...@bx.psu.edu wrote:
 
 Hi Dave,
 
 To export larger files, you can use a different method. Open up a
 terminal window on your computer and type in at the prompt ($):
 
 $ curl -0 'file_link' > name_the_output
 
 Where file_link can be obtained by right-clicking on the disc icon
 for the dataset and selecting Copy link location.
 
 If you are going to import into a local Galaxy, exporting entire
 histories, or a history comprised of datasets that you have
 copied/grouped together, may be a quick alternative. From the
 history panel, use Options (gear icon) -> Export to File to
 generate a link, then use curl again to perform the download. The
 Import from File function (in the same menu) can be used in your
 local Galaxy to incorporate the history and the datasets it contains.
 
 Hopefully this helps, but please let us know if you have more questions,
 
 Jen
 Galaxy team
 
 
 On 10/17/12 2:37 PM, Dave Corney wrote:
 
 Hi list,
 
 Is there currently a known problem with the export to file
 function?
 I'm trying to migrate some data from the public galaxy to a
 private one;
 the export function worked well with a small (~100mb) dataset,
 but it
 has not been working with larger datasets (2GB) and I get the
 error:
 Server Error. An error occurred. See the error logs for more
 information. (Turn debug on to display exception reports here).
 Is there
 a limit on the file size of the export? If so, what is it?
 
 Thanks in advance,
 Dave
 
 

Re: [galaxy-user] Data table named 'bowtie2_indexes' is required by tool but not configured

2012-10-19 Thread Jeremy Goecks
You'll need to update the tool_data_table_conf.xml file in your Galaxy home 
directory.

If you haven't made changes to the file, you can copy 
tool_data_table_conf.xml.sample to tool_data_table_conf.xml. If you have made 
changes, add these entries to the file:

--
<table name="bowtie2_indexes" comment_char="#">
    <columns>value, dbkey, name, path</columns>
    <file path="tool-data/bowtie2_indices.loc" />
</table>

<table name="tophat2_indexes" comment_char="#">
    <columns>value, dbkey, name, path</columns>
    <file path="tool-data/bowtie2_indices.loc" />
</table>
--
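
After editing the file, a quick check that both tables are actually declared can save a restart cycle. The following is a minimal sketch, not part of Galaxy itself: the helper name `missing_tables` is hypothetical, and it assumes the table entries sit inside a single `<tables>` root element, as in the sample file.

```python
import xml.etree.ElementTree as ET

# Data table names reported in the error messages.
REQUIRED = ("bowtie2_indexes", "tophat2_indexes")


def missing_tables(xml_text, required=REQUIRED):
    """Return the required data table names that the XML does not declare."""
    root = ET.fromstring(xml_text)
    configured = {table.get("name") for table in root.iter("table")}
    return [name for name in required if name not in configured]


# Example configuration that declares only one of the two tables.
sample = """<tables>
    <table name="bowtie2_indexes" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/bowtie2_indices.loc" />
    </table>
</tables>"""

print(missing_tables(sample))  # ['tophat2_indexes']
```

To check a real installation, read the contents of tool_data_table_conf.xml into a string and pass it to the function in place of `sample`.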

Finally, please direct questions about local Galaxy installations to the 
galaxy-dev mailing list: galaxy-...@bx.psu.edu

Best,
J.

On Oct 19, 2012, at 2:58 AM, Sachit Adhikari wrote:

 I am getting this error in Bowtie2 and Tophat2:
 Data table named 'bowtie2_indexes' is required by tool but not configured
 Data table named 'tophat2_indexes' is required by tool but not configured.
 
 How can I solve it? Thanks

Re: [galaxy-user] Export to file

2012-10-17 Thread Jeremy Goecks
Dave,

There's likely something problematic about your history that's causing problems. 
Can you share with me the history that's generating the error? To do so, from 
the history options menu -> Share/Publish -> Share with a User -> my email 
address

Thanks,
J.


On Oct 17, 2012, at 6:58 PM, Jennifer Jackson wrote:

 Hi Dave,
 
 Yes, if your Galaxy instance is on the internet, for entire history transfer, 
 you can skip the curl download and just enter the URL from the public Main 
 Galaxy server into your Galaxy directly.
 
 To load large data over 2G that is local (datasets, not history archives), 
 you can use the data library option. The idea is to load into a library, then 
 move datasets from libraries into histories as needed. Help is in our wiki 
 here:
 http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Libraries
 http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Uploading%20Library%20Files
 
 Take care,
 
 Jen
 Galaxy team
 
 On 10/17/12 3:21 PM, Dave Corney wrote:
 Hi Jen,
 
 Thanks for your response and suggestion. Just so that it is clear, for
 your second method, where I export to file and then use curl, I will
 download to my computer as an intermediate stage? Is there a simple way
 to take the history and datasets from PSU galaxy to our Princeton galaxy
 directly (without downloading to my computer first)? Unfortunately, we
 don't have FTP on our own galaxy, which is why I was looking for
 alternatives (each file is 2GB, so uploading through the browser won't
 work either). It seems that to import from file, the file needs to have
 a URL and I'm not sure how to go about that if the file is stored locally
 on my computer.
 
 Thanks,
 Dave
 
 
 
 On Wed, Oct 17, 2012 at 6:12 PM, Jennifer Jackson j...@bx.psu.edu wrote:
 
Hi Dave,
 
To export larger files, you can use a different method. Open up a
terminal window on your computer and type in at the prompt ($):
 
$ curl -0 'file_link' > name_the_output
 
Where file_link can be obtained by right-clicking on the disc icon
for the dataset and selecting Copy link location.
 
If you are going to import into a local Galaxy, exporting entire
histories, or a history comprised of datasets that you have
copied/grouped together, may be a quick alternative. From the
history panel, use Options (gear icon) -> Export to File to
generate a link, then use curl again to perform the download. The
Import from File function (in the same menu) can be used in your
local Galaxy to incorporate the history and the datasets it contains.
 
Hopefully this helps, but please let us know if you have more questions,
 
Jen
Galaxy team
 
 
On 10/17/12 2:37 PM, Dave Corney wrote:
 
Hi list,
 
Is there currently a known problem with the export to file
function?
I'm trying to migrate some data from the public galaxy to a
private one;
the export function worked well with a small (~100mb) dataset,
but it
has not been working with larger datasets (2GB) and I get the
error:
Server Error. An error occurred. See the error logs for more
information. (Turn debug on to display exception reports here).
Is there
a limit on the file size of the export? If so, what is it?
 
Thanks in advance,
Dave
 
 
 
 
--
Jennifer Jackson
http://galaxyproject.org
 
 
 
 -- 
 Jennifer Jackson
 http://galaxyproject.org

Re: [galaxy-user] a question about cuffdiff values

2012-08-06 Thread Jeremy Goecks
Hi El,

 1) what do these numbers represent?

FPKM values for sample 1 and 2. Cufflinks documentation is the place to get 
definitions for all columns: 
http://cufflinks.cbcb.umd.edu/manual.html#gene_exp_diff

 2) If the value column where I expect a higher number has a value of 
 10 or less, does that mean anything, or should one be selecting for values 
 higher than these single-digit numbers? 
 3) And in the column of genes that might be repressed, is there really a 
 difference between a value of 0.1 versus something like 0.01, since that 
 can change my log ratios significantly? This, of course, goes back to my 
 first question.

These questions get at the challenge of interpreting FPKM values. One thing to 
look at is the confidence intervals (CI) produced by Cufflinks/diff. CIs that 
overlap 0 are, in my experience, unreliable no matter how large the FPKM. 

Most likely genes with FPKM values near 0 have CIs overlapping 0, which means 
there's likely no difference between them. However, genes with low FPKM values 
(e.g. < 10) but tight CIs that lie above 0 should probably be included for 
further analysis.

Another thing to look at is whether a couple highly-expressed genes are 
reducing FPKM values. If so, using the upper-quartile normalization option can 
help you get better resolution for genes expressed at low levels.
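
The confidence-interval check described above can be sketched as a small filter. This is an illustration under stated assumptions: it expects a tab-separated table with `gene_id`, `FPKM`, `FPKM_conf_lo` and `FPKM_conf_hi` columns (the names used in Cufflinks FPKM tracking output; adjust them if your file differs), the function name is hypothetical, and the sample rows are made up.

```python
import csv
import io


def genes_with_ci_above_zero(tsv_text):
    """Return (gene_id, FPKM) pairs whose lower confidence bound is above 0."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["gene_id"], float(row["FPKM"]))
        for row in reader
        if float(row["FPKM_conf_lo"]) > 0
    ]


# Hypothetical rows: geneA is low but tight, geneB is near 0, and geneC has a
# large FPKM with an interval that still reaches 0.
sample = (
    "gene_id\tFPKM\tFPKM_conf_lo\tFPKM_conf_hi\n"
    "geneA\t8.2\t6.9\t9.5\n"
    "geneB\t0.05\t0\t0.4\n"
    "geneC\t250.0\t0\t900.0\n"
)

print(genes_with_ci_above_zero(sample))  # [('geneA', 8.2)]
```

Note that the filter keeps the low-FPKM gene with a tight interval and drops the high-FPKM gene whose interval reaches 0, which matches the rule of thumb in the reply.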

Good luck,
J.

Re: [galaxy-user] Datasets permanently deleted

2012-07-31 Thread Jeremy Goecks
Sarah,

I can't reproduce this behavior on a local instance or on our public server. 
This raises a couple questions:

Are you using the most recent version of Galaxy? 
Can you reproduce this behavior on our public server (usegalaxy.org)?

Thanks,
J.


On Jul 31, 2012, at 8:13 AM, Sarah Maman wrote:

 Dear all,
 
 In the menu User -> Saved Datasets, all datasets are listed even if some 
 of them have been permanently deleted by deleting their history.
 So it's possible to copy a deleted dataset into the current history, and that 
 is confusing for users.
 
 Do you have any solution for dropping these datasets from Saved Datasets?
 
 Thanks in advance,
 Sarah Maman


Re: [galaxy-user] Cuffconfusion

2012-07-20 Thread Jeremy Goecks
There is an excellent article on how to do differential gene/transcript 
expression with Tophat and Cufflinks here:

http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html

This article will answer the questions you've posed below and provides numerous 
figures that will help you create a workflow to meet your needs.

Best,
J.

On Jul 20, 2012, at 6:39 PM, i b wrote:

 ok, im really confused now about cufflinks and its tools.
 
 All I wanted was to look for differentially expressed genes between
 two samples: treated (2 replicates) and control (one replicate).
 
 can anyone give me a workflow for a similar analysis with the various
 options chosen?
 
 I have read a lot of different posts where for Cuffdiff they have
 used Cufflinks, Cuffcompare, Cuffmerge or any GTF file as input
 together with the BAM file.
 There must be a difference in using all these different files, right?
 
 Also:
 what is the advantage of using Cuffcompare and how do we compare them: do we
 give it all Cufflinks outputs or do we separate control from treated?
 Why do we need Cuffmerge? Isn't it combining the Cufflinks outputs as well?
 When we use Cuffcompare or Cuffmerge, do we mix all Cufflinks outputs no matter
 whether they are control or treated ones?
 
 
 Please don't send me back to the cufflink page
 (http://cufflinks.cbcb.umd.edu/index.html)...I need more simpler
 words!
 
 Thanks,
 ib

Re: [galaxy-user] Trackster error: indexing

2012-07-14 Thread Jeremy Goecks
Hi Nancy,

I'm ccing the galaxy-user mailing list as this discussion may be helpful to 
others.

The problem is that your BAM dataset isn't sorted. Galaxy requires that 
uploaded BAMs be sorted to be useful for most tools and for visualization.

You can fix this in two ways:

(a) using samtools from the command line and then uploading the sorted file to 
Galaxy:
samtools sort in.bam out_prefix

(b) from Galaxy, use tools to convert the BAM to SAM and back again to BAM; the 
output of the SAM to BAM tool will be sorted.

Taking either of these steps will enable visualization in Trackster.
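
Before uploading, it can help to check what sort order a BAM file declares. The sketch below reads the BAM header with only the Python standard library and reports the `SO:` value from the `@HD` line. The function name `bam_sort_order` is hypothetical, and the check only reflects what the program that wrote the file declared: a missing or inaccurate tag is possible, so treat the result as a hint rather than proof.

```python
import gzip
import struct


def bam_sort_order(path):
    """Return the SO value from a BAM file's @HD header line, or None."""
    # BAM files are BGZF-compressed, which the gzip module can read.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file: %s" % path)
        # Length of the plain-text SAM header, little-endian 32-bit integer.
        (header_length,) = struct.unpack("<i", handle.read(4))
        text = handle.read(header_length).decode("ascii", errors="replace")
    for line in text.rstrip("\x00").splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Usage (the file name is a placeholder):
#   print(bam_sort_order("in.bam"))
```

A result of `coordinate` suggests the file is already sorted by position; `unsorted`, `queryname`, or `None` suggests sorting it with one of the two methods above first.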

Best,
J.


On Jul 13, 2012, at 5:34 PM, Nancy Au Yeung wrote:

 
 On Fri, Jul 13, 2012 at 2:29 PM, Jeremy Goecks jeremy.goe...@emory.edu 
 wrote:
 Hi Nancy,
 
 Can you share the history with the problematic dataset(s) with me and I can 
 take a look? Please share the history with me using my email address: 
 jeremy.goe...@emory.edu
 
 Best,
 J.
 
 On Jul 12, 2012, at 9:07 PM, Nancy Au Yeung wrote:
 
 Hi,
 
 I saw another post regarding trackster error and it seems like this is 
 different.  I have tried copying the dataset from the History option, but 
 this same error occurs.  See error script below.
 
 Thanks!
 
 Trackster Error
 
  *** glibc detected *** python: double free or corruption (top): 
 0x01c09370 ***
 === Backtrace: =
 /lib/x86_64-linux-gnu/libc.so.6(+0x75ab6)[0x7f93fe421ab6]
 /lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7f93fe4267ec]
 /lib/x86_64-linux-gnu/libc.so.6(fclose+0x14d)[0x7f93fe412a0d]
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(bam_index_load_literal+0x37)[0x7f93fdac2ad7]
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(+0x6e7f8)[0x7f93fdaae7f8]
 python(PyObject_Call+0x36)[0x4824c6]
 python(PyEval_CallObjectWithKeywords+0x36)[0x486086]
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(+0x52a04)[0x7f93fda92a04]
 python[0x482068]
 python(PyObject_Call+0x36)[0x4824c6]
 python(PyEval_EvalFrameEx+0x91a)[0x4c5e8a]
 python(PyEval_EvalCodeEx+0x136)[0x4ccee6]
 python(PyEval_EvalFrameEx+0x838)[0x4c5da8]
 python(PyEval_EvalCodeEx+0x136)[0x4ccee6]
 python(PyRun_FileExFlags+0xe1)[0x577901]
 python(PyRun_SimpleFileExFlags+0x177)[0x577b37]
 python(Py_Main+0x6f7)[0x550497]
 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f93fe3caead]
 python[0x41dea1]

Re: [galaxy-user] Trackster error: indexing

2012-07-13 Thread Jeremy Goecks
Hi Nancy,

Can you share the history with the problematic dataset(s) with me and I can 
take a look? Please share the history with me using my email address: 
jeremy.goe...@emory.edu

Best,
J.

On Jul 12, 2012, at 9:07 PM, Nancy Au Yeung wrote:

 Hi,
 
 I saw another post regarding trackster error and it seems like this is 
 different.  I have tried copying the dataset from the History option, but 
 this same error occurs.  See error script below.
 
 Thanks!
 
 Trackster Error
 
  *** glibc detected *** python: double free or corruption (top): 
 0x01c09370 ***
 === Backtrace: =
 /lib/x86_64-linux-gnu/libc.so.6(+0x75ab6)[0x7f93fe421ab6]
 /lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7f93fe4267ec]
 /lib/x86_64-linux-gnu/libc.so.6(fclose+0x14d)[0x7f93fe412a0d]
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(bam_index_load_literal+0x37)[0x7f93fdac2ad7]
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(+0x6e7f8)[0x7f93fdaae7f8]
 python(PyObject_Call+0x36)[0x4824c6]
 python(PyEval_CallObjectWithKeywords+0x36)[0x486086]
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so(+0x52a04)[0x7f93fda92a04]
 python[0x482068]
 python(PyObject_Call+0x36)[0x4824c6]
 python(PyEval_EvalFrameEx+0x91a)[0x4c5e8a]
 python(PyEval_EvalCodeEx+0x136)[0x4ccee6]
 python(PyEval_EvalFrameEx+0x838)[0x4c5da8]
 python(PyEval_EvalCodeEx+0x136)[0x4ccee6]
 python(PyRun_FileExFlags+0xe1)[0x577901]
 python(PyRun_SimpleFileExFlags+0x177)[0x577b37]
 python(Py_Main+0x6f7)[0x550497]
 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f93fe3caead]
 python[0x41dea1]
 === Memory map: 
 0040-00672000 r-xp  fe:00 893421 
 /usr/bin/python2.7
 00871000-00872000 r--p 00271000 fe:00 893421 
 /usr/bin/python2.7
 00872000-008db000 rw-p 00272000 fe:00 893421 
 /usr/bin/python2.7
 008db000-008ed000 rw-p  00:00 0 
 0166d000-01c29000 rw-p  00:00 0  
 [heap]
 7f93f800-7f93f8021000 rw-p  00:00 0 
 7f93f8021000-7f93fc00 ---p  00:00 0 
 7f93fd6de000-7f93fd70e000 r-xp  00:18 458639 
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/ctabix.so
 7f93fd70e000-7f93fd80d000 ---p 0003 00:18 458639 
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/ctabix.so
 7f93fd80d000-7f93fd812000 rw-p 0002f000 00:18 458639 
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/ctabix.so
 7f93fd812000-7f93fd814000 rw-p  00:00 0 
 7f93fd814000-7f93fd83b000 r-xp  fe:00 949810 
 /usr/lib/python2.7/lib-dynload/_ctypes.so
 7f93fd83b000-7f93fda3a000 ---p 00027000 fe:00 949810 
 /usr/lib/python2.7/lib-dynload/_ctypes.so
 7f93fda3a000-7f93fda3b000 r--p 00026000 fe:00 949810 
 /usr/lib/python2.7/lib-dynload/_ctypes.so
 7f93fda3b000-7f93fda3f000 rw-p 00027000 fe:00 949810 
 /usr/lib/python2.7/lib-dynload/_ctypes.so
 7f93fda3f000-7f93fda4 rw-p  00:00 0 
 7f93fda4-7f93fdb1 r-xp  00:18 458638 
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so
 7f93fdb1-7f93fdc1 ---p 000d 00:18 458638 
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so
 7f93fdc1-7f93fdc2 rw-p 000d 00:18 458638 
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.7-linux-x86_64-ucs4.egg/csamtools.so
 7f93fdc2-7f93fdc25000 rw-p  00:00 0 
 7f93fdc25000-7f93fdc44000 r-xp  fe:00 949804 
 /usr/lib/python2.7/lib-dynload/_io.so
 7f93fdc44000-7f93fde44000 ---p 0001f000 fe:00 949804 
 /usr/lib/python2.7/lib-dynload/_io.so
 7f93fde44000-7f93fde45000 r--p 0001f000 fe:00 949804 
 /usr/lib/python2.7/lib-dynload/_io.so
 7f93fde45000-7f93fde4e000 rw-p 0002 fe:00 949804 
 /usr/lib/python2.7/lib-dynload/_io.so
 7f93fde4e000-7f93fdf0f000 rw-p  00:00 0 
 7f93fdf0f000-7f93fdf12000 r-xp  fe:00 949801 
 /usr/lib/python2.7/lib-dynload/_heapq.so
 7f93fdf12000-7f93fe111000 ---p 3000 fe:00 949801 
 /usr/lib/python2.7/lib-dynload/_heapq.so
 7f93fe111000-7f93fe112000 r--p 2000 fe:00 949801 
 /usr/lib/python2.7/lib-dynload/_heapq.so
 7f93fe112000-7f93fe114000 rw-p 3000 fe:00 949801 
 /usr/lib/python2.7/lib-dynload/_heapq.so
 

Re: [galaxy-user] [galaxy-dev] Create and Transfer Galaxy Page

2012-04-19 Thread Jeremy Goecks
Todd,

There's not an ideal solution for your situation. My suggestion:

(a) set up a cloud instance with your tools + a Page and use the 
share-an-instance feature so that others can access your data, tools, 
histories, and page in a single place ( http://wiki.g2.bx.psu.edu/Admin/Cloud );
(b) put your tools into the tool shed for easy access ( 
http://toolshed.g2.bx.psu.edu/ ); 
(c) replicate the page + as many of the histories as possible on our public 
server, with a note about how to get going with either (i) tools from the tool 
shed or (ii) on the cloud.

We're working to make access between the public server and the cloud easier, 
so there may be something on the horizon that could smooth out (c)(ii).

Best,
J.

On Apr 18, 2012, at 5:02 PM, Todd Oakley wrote:

 Jeremy -
   Thanks so much for your helpful responses.
 
   One problem that I didn't mention with implementing your suggestions is 
 that the histories I want to post contain mainly new tools that my lab 
 developed for phylogenetics using transcriptome data. Therefore, the public 
 instance does not have most of the tools in the history (we will put these on 
 the tool shed as soon as we can).  In addition, the analyses are VERY 
 computationally intensive, including assembly and Maximum Likelihood 
 analyses, and therefore probably are not suitable for re-running on the 
 public Galaxy instance.  (This is also a reason why I cannot make my local 
 galaxy instance public - it exposes too many tools that could bog down the 
 host computer).
 
   Additional suggestions most welcome…
 
 
 Todd
 
 
 On Apr 17, 2012, at 6:33 PM, Jeremy Goecks wrote:
 
 Hi Todd,
 
   [Not sure if this is better suited to galaxy-dev or -user, so I'm sending 
 to both].
 
 galaxy-user is most appropriate for this question because it relates to 
 usage of Galaxy; galaxy-dev is for local installation and tool development 
 questions.
 
 My question is - can I create a Galaxy 'Published Page' from my local 
 Galaxy instance/histories, and then transfer that page to the main Galaxy 
 instance?
 
 Not currently, though this is in our long-term plan.
 
 The reason is that I cannot make my local Galaxy instance public, as I am 
 using a campus resource to host our galaxy.  If this is possible, how can I 
 do that?  If not, any other ideas?
 
 It is possible to move datasets and workflows relatively easily between 
 instances, so I'd recommend that:
 
 (a) you move your data and workflows to our public instance;
 (b) rerun your analyses on the public instance to create the required;
 (c) create and host the Page on our public instance.
 
 You can be assured that we will maintain our public server over the coming 
 years and your Page will remain available and have a stable URL.
 
 Also, are there any tutorials/pages on how to create Published Pages in 
 Galaxy in the first place?
 
 Not yet, though the idea is for the Page editor to be self-explanatory. 
 Here's how to get started with Pages:
 
 (a) from User menu, go to Saved Pages;
 (b) create a Page;
 (c) edit the Page using the Web-based editor; there are menus for inserting 
 embedded datasets, workflows, histories, and visualizations as well as 
 performing standard word-processing operations.
 
 Let us know if you have problems/questions and we'll start a guide for 
 creating Pages.
 
 Best,
 J.
 
 

Re: [galaxy-user] Tophat mapping

2012-04-18 Thread Jeremy Goecks
 I am wondering if these non-coding reads will be included when cufflinks 
 calculates transcript/gene expression. 

Reads will only be included if they map to assembled/known transcripts.

 And another question is:  how to know the number of reads mapped to a certain 
 exon? 

This isn't possible because a single read may map to multiple exons and/or 
transcripts. Cufflinks assigns reads probabilistically when their mapping 
cannot be uniquely determined.

See

http://cufflinks.cbcb.umd.edu/faq.html#count
http://cufflinks.cbcb.umd.edu/howitworks.html

for details.

Best,
J.

Re: [galaxy-user] Tophat mapping

2012-04-18 Thread Jeremy Goecks
 Jeremy, do you have a workflow to estimate what percent of the reads
 are mapping to unknown expressed regions?


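Here's a simple approach, assuming the mapped reads are in BAM format:

(a) convert BAM to SAM;
(b) convert SAM to interval;
(c) intersect the reads (as intervals) with the known annotation, keeping only 
reads that have no overlap with it.

The number of reads left after (c), divided by the total number of mapped 
reads, is the fraction mapping to unannotated regions.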
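
The last step can be sketched in Python. This is only an illustration, not the 
Galaxy Intersect tool: it assumes the reads and the annotation are already 
available as (chrom, start, end) tuples with half-open coordinates, and the 
function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(annotation):
    """Merge annotated intervals per chromosome into sorted, disjoint spans."""
    by_chrom = defaultdict(list)
    for chrom, start, end in annotation:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching: extend the current span.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """True if half-open [start, end) overlaps any annotated span."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # First span that ends after the read starts.
    i = bisect_right(ends, start)
    return i < len(starts) and starts[i] < end


def fraction_unannotated(reads, annotation):
    """Fraction of read intervals with no overlap with the annotation."""
    if not reads:
        return 0.0
    index = build_index(annotation)
    missing = sum(1 for chrom, s, e in reads if not overlaps(index, chrom, s, e))
    return missing / len(reads)
```

Merging the annotation first keeps each lookup to a single binary search, 
which matters when there are millions of reads.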

Best,
J.


Re: [galaxy-user] [galaxy-dev] Create and Transfer Galaxy Page

2012-04-17 Thread Jeremy Goecks
Hi Todd,

 [Not sure if this is better suited to galaxy-dev or -user, so I'm sending 
 to both].

galaxy-user is most appropriate for this question because it relates to usage 
of Galaxy; galaxy-dev is for local installation and tool development questions.

 My question is - can I create a Galaxy 'Published Page' from my local Galaxy 
 instance/histories, and then transfer that page to the main Galaxy instance?

Not currently, though this is in our long-term plan.

 The reason is that I cannot make my local Galaxy instance public, as I am 
 using a campus resource to host our galaxy.  If this is possible, how can I 
 do that?  If not, any other ideas?

It is possible to move datasets and workflows relatively easily between 
instances, so I'd recommend that:

(a) you move your data and workflows to our public instance;
(b) rerun your analyses on the public instance to create the required;
(c) create and host the Page on our public instance.

You can be assured that we will maintain our public server over the coming 
years and your Page will remain available and have a stable URL.

 Also, are there any tutorials/pages on how to create Published Pages in 
 Galaxy in the first place?

Not yet, though the idea is for the Page editor to be self-explanatory. Here's 
how to get started with Pages:

(a) from User menu, go to Saved Pages;
(b) create a Page;
(c) edit the Page using the Web-based editor; there are menus for inserting 
embedded datasets, workflows, histories, and visualizations as well as 
performing standard word-processing operations.

Let us know if you have problems/questions and we'll start a guide for creating 
Pages.

Best,
J.




Re: [galaxy-user] Workshop in Chicago

2012-04-11 Thread Jeremy Goecks
Scott,

Your information is incorrect. The Galaxy Community Conference ( 
http://wiki.g2.bx.psu.edu/Events/GCC2012 ) will have something for everyone who 
is working with Galaxy, from sys admins to tool developers to core staff to end 
users/biologists. 

Our program is still in flux, and we welcome input about what you'd like to see 
at the conference at outre...@galaxyproject.org

Best,
J.


 From: Scott W. Tighe scott.ti...@uvm.edu
 Date: April 11, 2012 9:42:08 AM EDT
 To: galaxy-user@lists.bx.psu.edu
 Subject: Re: [galaxy-user] Workshop in Chicago
 
 Dear Galaxy and Admin Staff:
 
 I was informed by a few people that the Galaxy workshop in Chicago is really 
 geared to bioinformatics people who know how to write code, and not 
 necessarily to general core lab staff who have data analysis needs from time 
 to time.
 
 Can anyone shed some light on the subject, please?
 
 Scott Tighe
 
 
 -- 
 Core Laboratory Research Staff
 DNA and Microarray Core Facility
 149 Beaumont Ave
 University of Vermont HSRF 305
 Burlington Vermont  USA 05045
 802-656-2557


Re: [galaxy-user] problems with color settings in Visualizations

2012-03-22 Thread Jeremy Goecks
Mackenzie,

We've fixed this issue in our code base and it should be fixed on our server in 
the next day or two.

Best,
J.

On Mar 22, 2012, at 3:35 PM, Mackenzie Gavery wrote:

 Hi,
 
 I am working with some saved visualizations, and finding that the color 
 settings are not working today. Specifically, every time I change the color 
 (in Settings) the result is the feature ends up black regardless of the color 
 selection.  Could you help me with this issue?
 
 Thanks,
 
 Mackenzie


Re: [galaxy-user] questions on directionality

2012-02-27 Thread Jeremy Goecks
Nick,

 I apologize if this is covered in documentation or help threads. I searched 
 and it seemed it was not. I have several Illumina RNA-seq data sets that 
 should be directional. It seems the directionality is very good, based on 
 the visualization. First question: in the visualization window, are the 
 reads color-coded by direction, i.e., are orange one direction and blue the 
 other?

Different colors in read data do indicate strandedness; hover over the track 
and click on the 'Edit Settings' icon (gear) to see/change the sense/anti-sense 
colors used.

 Similar question, is there a way to quantify directionality of the data set?

You can use the Filter SAM tool to filter for mapped strand.
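
Outside Galaxy, the same idea can be sketched in Python by counting mapped 
reads per strand straight from SAM text. This is a rough illustration with a 
hypothetical function name; it relies only on the standard SAM FLAG bits 
(0x4 = read unmapped, 0x10 = read on the reverse strand) and, as a 
simplifying assumption, treats every record as single-end.

```python
def strand_counts(sam_lines):
    """Return (forward, reverse) counts of mapped reads in SAM text lines."""
    forward = reverse = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x4:
            continue  # unmapped read: no strand to count
        if flag & 0x10:
            reverse += 1
        else:
            forward += 1
    return forward, reverse


def forward_fraction(sam_lines):
    """Fraction of mapped reads on the forward strand (0.0 if none mapped)."""
    forward, reverse = strand_counts(sam_lines)
    total = forward + reverse
    return forward / total if total else 0.0
```

To quantify directionality, you would restrict the input to reads inside 
genes of known orientation and compare the fraction with the expected strand; 
for paired-end data the second mate maps to the opposite strand and would 
need separate handling.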

Good luck,
J.




Re: [galaxy-user] Using galaxy for Bacterial RNA-seq

2012-02-16 Thread Jeremy Goecks
Bomba,

I'm not familiar enough with bacterial/prokaryotic transcriptomes to suggest a 
possible workflow. You might try the standard 
Tophat-Cufflinks-Cuffcompare/merge-Cuffdiff workflow and see whether you get 
meaningful results; Tophat runs Bowtie internally, so there's no reason to run 
Bowtie separately unless there are Bowtie-specific parameters that you need to 
modify. I've had very little experience with PALMapper and can't speak to its 
efficacy, either for eukaryotic or prokaryotic transcriptome analyses.

Finally, I've cc'd the galaxy-user mailing list. Using this list is the best 
way to reach the Galaxy user community and get in touch with someone that has 
used Galaxy to analyze bacterial transcriptomes.

Good luck,
J.


On Feb 16, 2012, at 9:17 AM, Bomba Dam wrote:

 Dear Dr. Goecks,
 
 I am working as a postdoctoral fellow at MPI Marburg, Germany. We are trying 
 to understand the differential expression of genes in a methanotrophic 
 bacterium under different growth conditions. We are sequencing the 
 transcriptome using Illumina HiSeq. As I don't have expertise in programming 
 languages, I found the Galaxy interface very user-friendly for doing such 
 transcriptome analysis. However, I could not find a stepwise 
 protocol/workflow for mapping bacterial RNA-seq reads against the reference 
 genome (we have the completely sequenced genome of our model organism). I 
 have found a detailed step-by-step workflow for RNA-seq analysis on the 
 University of Alabama website (uab.edu). However, it refers to eukaryotic 
 systems, as do most of the examples provided and used for analysis. I am a 
 bit confused about whether the same workflow will also work well for 
 bacterial systems, as there are no splicing events, or whether I should make 
 some modifications. Could you kindly suggest which workflow I should follow 
 for mapping the bacterial reads (Bowtie, Tophat or PALMapper) and for the 
 subsequent quantification steps? I would appreciate some guidance in this 
 regard.
 
 With kind regards,
 
 Bomba Dam
 -- 
 Dr. BOMBA DAM
 Alexander von Humboldt Postdoctoral Research Fellow
 Max-Planck-Institut für terrestrische Mikrobiologie
 Karl-von-Frisch-Straße 10
 D-35043 Marburg, Germany
 E mail: bomba@mpi-marburg.mpg.de
 PHONE: +49 176 321 321 75 (Mobile); +49 6421 178 721 (LAB); +49 6421 2828516 
 (ROOM)
 
 Assistant Professor of Microbiology
 Department of Botany, Institute of Science
 Visva-Bharati (A Central University)
 Santiniketan, West Bengal 731235, India.
 E mail: bumba_mi...@visva-bhatari.ac.in, bumba_mi...@rediffmail.com;
 
 
 
 
 




Re: [galaxy-user] Solution for: Error running cuffdiff. Error: cannot open reference GTF file CONDITION, CONTROL for reading

2012-02-14 Thread Jeremy Goecks
 The problem ended up being the use of Perform Bias Correction (-b) with a
 GTF file that has no Database/Build associated. Looking at the cuffdiff
 wrapper, I found that if a FASTA reference is not selected from the
 history, the FASTA reference of the build associated with the GTF file is
 used. If there is no build association, your cuffdiff run will fail with
 this not-so-helpful error.
 
 My feeling is that cuffdiff should check for a non-dashed string after
 '-b' and complain if it is absent, but this doesn't happen currently.

Agreed. I implemented the spirit of this functionality via argument checking in 
galaxy-central changeset 71031bf3105c
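
For illustration only, the kind of check described above could look like the 
Python sketch below. It is not the code from that changeset: the function 
name and the error text are hypothetical, and it only tests that '-b' is 
followed by a value that is not itself an option.

```python
def check_bias_correction_args(argv):
    """Return an error message if '-b' lacks a reference FASTA, else None."""
    for i, arg in enumerate(argv):
        if arg != "-b":
            continue
        value = argv[i + 1] if i + 1 < len(argv) else None
        # A missing value, or one that looks like another option, means no
        # reference FASTA was supplied for bias correction.
        if value is None or value.startswith("-"):
            return (
                "Bias correction (-b) requires a reference FASTA; select one "
                "from the history or assign a Database/Build to the GTF file."
            )
    return None
```

Failing early with a message like this points the user at the missing build 
association instead of at the GTF file.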

Best,
J.


Re: [galaxy-user] Clustering with cuffcompare or cuffdiff results

2012-02-14 Thread Jeremy Goecks
 1. It seems that it is better to run everything up to cuffdiff, but does 
 cuffdiff allow multiple-sample comparison? I read somewhere that even 
 for multiple samples it still compares them pairwise.

Cuffdiff supports replicate analysis.

 In a sense, because I want to do clustering which needs some quantitative 
 data source to do the merging, will cuffdiff provide me some quantitative 
 measures rather than the test score and p-value which is too qualitative to 
 include? 

Take a look at the Cuffdiff documentation for outputs: 
http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff_output

 2. If I really need to get count data from the FPKM values, how do I obtain 
 the mentioned effective length? Would it be better if I treat each 
 assembled transcript as an object in clustering, rather than genes. What does 
 it mean you'd be throwing away Cufflinks' uncertainty even with using 
 isoforms as objects? How should I include the uncertainty into my clustering?

These FAQs from http://cufflinks.cbcb.umd.edu/faq.html address your questions:

--
I want to find differentially expressed genes. Can I use Cufflinks in 
conjunction with count-based differential expression packages?

It's possible, but we strongly advise against this. Current count-based 
differential expression tools are poorly suited to differential expression 
analysis in genomes with alternatively spliced genes. The main reason for this 
is that when a gene has multiple isoforms, a change in the total number of 
reads or fragments from that gene doesn't always correspond to a change in 
expression for that gene. Conversely, a gene's expression may change, but the 
total number of fragments generated by its isoforms may be very similar. In 
order to detect changes accurately, it's necessary to estimate how many 
fragments came from each individual splice variant in each sample. Current 
count-based tools don't do this (to our knowledge - please send us email if you 
know of one!). Even if they did, fragments that come from parts of genes that 
are shared by more than one splice variant can't generally be assigned to a single 
isoform, so the fragment counts for each isoform are only estimates, and there 
is some uncertainty in the counts. Isoforms that are very similar will have a 
great deal of uncertainty surrounding their fragment counts. This uncertainty 
needs to be accounted for when testing for differential expression. So while 
you could use Cufflinks to estimate isoform-level counts, you'd be throwing 
away Cufflinks' uncertainty, and thus have more confidence in the differences 
you see than you really should. This will probably lead to many false positives 
in your analysis. Furthermore, we do not normalize simply by the length to 
calculate FPKM but an effective length, as explained in our publications. 
Calculating counts from FPKM by multiplying by the length will give incorrect 
results. We strongly encourage you to consider using Cuffdiff to find 
differentially expressed genes and transcripts.

Will you please report how many fragments come from each transcript in a future 
release?

For the foreseeable future, we will not be reporting the number of fragments we 
think originated from each transcript. People who have asked for this almost 
always want to use Cufflinks in conjunction with count-based differential 
expression packages, which is not a good idea. We're trying to keep our output 
formats as simple as possible.
--
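
A toy calculation makes the first point concrete. It is a sketch under a 
deliberately simplified assumption (expected fragments proportional to 
transcript abundance times transcript length); the isoform names, lengths and 
abundances are made up and this is not how Cufflinks models the data.

```python
# Hypothetical gene with a short and a long isoform (lengths in kb).
LENGTH_KB = {"short": 1, "long": 4}

# Made-up transcript abundances (arbitrary units) in two conditions.
CONDITION_A = {"short": 8, "long": 0}
CONDITION_B = {"short": 0, "long": 2}


def gene_fragments(abundance, fragments_per_unit_kb=100):
    """Expected fragments for the gene: sum of abundance * length per isoform."""
    return sum(
        abundance[name] * LENGTH_KB[name] * fragments_per_unit_kb
        for name in LENGTH_KB
    )


def gene_abundance(abundance):
    """Gene-level expression: total abundance over its isoforms."""
    return sum(abundance.values())


if __name__ == "__main__":
    for label, cond in (("A", CONDITION_A), ("B", CONDITION_B)):
        print(label, gene_fragments(cond), gene_abundance(cond))
```

Both conditions yield the same number of fragments for the gene even though 
its total expression differs fourfold, which is why gene-level counts alone 
can miss (or invent) a change.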

Best,
J.



Re: [galaxy-user] How to get reads counts from cufflins?

2012-02-08 Thread Jeremy Goecks
Victor,

 I got the normalized values (FPKM) from cufflinks. And I want to get relative 
 reads counts. How can I do that?

It's not clear to me what you're looking for. FPKM (fragments per kilobase of 
transcript per million mapped fragments) is a normalized read count metric; the 
F stands for fragment, which is a single read for single-end data or a read 
pair for paired-end data.

 Another question: how does cufflinks handle isoform genes while calculating 
 the reads counts?  Or what papers can help me understand this?

Expectation maximization is used to probabilistically assign reads to isoforms. 
See the Cufflinks documentation for details and paper links:

http://cufflinks.cbcb.umd.edu/
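
To give a feel for the idea, here is a toy expectation-maximization loop in 
Python. It is a simplified sketch, not Cufflinks' actual algorithm: it 
assumes each read comes with the set of isoforms it is compatible with, that 
a read from an isoform is equally likely to start anywhere along its 
effective length, and the function and variable names are hypothetical.

```python
def em_isoform_abundance(reads, lengths, n_iter=200):
    """Estimate the fraction of reads from each isoform with a toy EM.

    reads:   list of sets, each holding the isoforms a read is compatible with
    lengths: dict mapping isoform name to its effective length
    Returns (theta, expected): read fractions and expected read counts.
    """
    names = sorted(lengths)
    theta = {name: 1.0 / len(names) for name in names}
    expected = {name: 0.0 for name in names}
    for _ in range(n_iter):
        # E-step: split each read among its compatible isoforms in
        # proportion to theta / length.
        expected = {name: 0.0 for name in names}
        for compatible in reads:
            weights = {n: theta[n] / lengths[n] for n in compatible}
            total = sum(weights.values())
            if total == 0.0:
                continue
            for n, w in weights.items():
                expected[n] += w / total
        # M-step: re-estimate fractions from the expected counts.
        assigned = sum(expected.values())
        if assigned == 0.0:
            break
        theta = {name: expected[name] / assigned for name in names}
    return theta, expected
```

Reads that match only one isoform anchor the estimate, and ambiguous reads 
are then shared out in proportion to it, which is why per-isoform counts are 
estimates rather than integers.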

Best,
J.

Re: [galaxy-user] How to get reads counts from cufflins?

2012-02-08 Thread Jeremy Goecks
Reads are probabilistically assigned, so raw read counts are not available from 
Cufflinks. 

Recovering raw fragment counts could be done by reverse-engineering the FPKM 
value, but Cufflinks doesn't do this for you. If you choose to do this, keep in 
mind that Cufflinks uses an effective transcript length.
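
As a rough sketch of that arithmetic in Python: the standard FPKM definition 
is fragments per kilobase of transcript per million mapped fragments, so it 
can be inverted if you know the length and the total number of mapped 
fragments. The function names are hypothetical, and the effective length and 
the total are values you would have to supply yourself, so the result is only 
an approximation of what Cufflinks used internally.

```python
def fpkm(fragments, effective_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (length in bp * total mapped fragments)."""
    return fragments * 1e9 / (effective_length_bp * total_mapped_fragments)


def approx_fragments(fpkm_value, effective_length_bp, total_mapped_fragments):
    """Invert the FPKM formula to get an approximate fragment count."""
    return fpkm_value * effective_length_bp * total_mapped_fragments / 1e9


if __name__ == "__main__":
    # Made-up numbers: 2 kb effective length, 10 million mapped fragments.
    value = fpkm(500, 2000, 10_000_000)
    print(value, approx_fragments(value, 2000, 10_000_000))
```

Using the annotated transcript length instead of the effective length would 
bias the recovered counts, most noticeably for short transcripts.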

Best,
J.

On Feb 8, 2012, at 11:06 PM, Li, Jilong (MU-Student) wrote:

 Dear Jeremy,
  
 Sorry, I didn't express my question clearly. I got the FPKM normalized 
 values for each gene from Cufflinks, and I want to get the original read 
 counts, before normalization. Could you please tell me how to get those?
  
 Thank you very much!
  
 Victor
  
  
  
  
  
 From: Jeremy Goecks [jeremy.goe...@emory.edu]
 Sent: Thursday, February 09, 2012 4:00 AM
 To: Li, Jilong (MU-Student)
 Cc: galaxy-user@lists.bx.psu.edu
 Subject: Re: [galaxy-user] How to get reads counts from cufflins?
 
 Victor,
 
 I got the normalized values (FPKM) from cufflinks. And I want to get 
 relative reads counts. How can I do that?
 
 It's not clear to me what you're looking for. FPKM (fragments per kilobase 
 of transcript per million mapped fragments) is a normalized read count 
 metric; the F stands for fragment, which is a single read for single-end 
 data or a read pair for paired-end data.
 
 Another question: how does cufflinks handle isoform genes while calculating 
 the reads counts?  Or what papers can help me understand this?
 
 Expectation maximization is used to probabilistically assign reads to 
 isoforms. See the Cufflinks documentation for details and paper links:
 
 http://cufflinks.cbcb.umd.edu/
 
 Best,
 J.


Re: [galaxy-user] Walltime exceeded

2012-01-30 Thread Jeremy Goecks
Peera,

Turning off bias correction can significantly shorten Cufflinks runtime.

If you still encounter this error, you'll want to use a local or cloud instance 
of Galaxy:

https://bitbucket.org/galaxy/galaxy-central/wiki/GetGalaxy
http://wiki.g2.bx.psu.edu/Admin/Cloud

Good luck,
J.

On Jan 30, 2012, at 3:40 AM, Hemarajata, Peera wrote:

 Dear all,
 
 My Cufflinks jobs keep getting killed due to the walltime limit. Is there a 
 way to fix this or is there anything I can do to reduce the size of my BAM 
 datasets so the analysis can get done?
 
 Thank you!
 
 Peera Hemarajata

Re: [galaxy-user] Cufflinks merging more than one transcript on bacterial genomes

2012-01-25 Thread Jeremy Goecks
Noa,

 This is one thing I would like help with- is it worth simply reducing to 
 nothing the max intron size? What is accepted consensus when using tophat on 
 bacterial genomes?

I'm not sure that folks on this list have much experience with bacterial 
transcriptome analysis. You might try seqanswers.com or try emailing the 
Tophat/Cufflinks authors directly: tophat.cuffli...@gmail.com. If you find 
something interesting in another place, please feel free to share with the 
Galaxy community.

 When I look at the second tophat file, of accepted hits, all hits align 
 nicely with known genes.  However, when I run cufflinks I run into the 
 following issues: when I use a reference genome, I get in addition to the 
 known transcripts, a bunch of very long transcripts spanning very large 
 genomic regions. Also, I will have two genes that are very near each other 
 but run in opposite directions (which you can see beautifully in the tophat 
 accepted hits alignments - different colors for each strand) but they merge 
 into a single CUFF identifier.  Is there any way I can address this- is it 
 something I am missing with respect to parameters I have to change because I 
 am working on a bacterial genome?


Reference genome or reference gene annotation? Using a genome to correct for 
bias should not change the assembled transcripts, only their expression levels. 
You can use a reference gene annotation either as ground truth or as a guide; 
using the reference as ground truth ensures that Cufflinks will only assemble 
transcripts defined in the annotation.

Good luck,
J.

Re: [galaxy-user] Trackster errors

2012-01-20 Thread Jeremy Goecks
Erin,

This was due to a temporary issue that has been fixed. However, you'll need to 
copy the problematic datasets and use the new copy for visualization. To copy 
datasets, use History Options -> Copy Datasets; you can select the source 
history as your target history to copy datasets within a history.

Thanks,
J.

On Jan 20, 2012, at 9:57 AM, Erin Shanle wrote:

 Hello
 I would like to visualize the tracks of my tophat accepted hits bam file.  I 
 ran my first sample that was ~10,000,000 reads and it could be visualized in 
 both Trackster and the UCSC genome browser.  When I tried to visualize my 
 other samples (which ranged from 15,000,000 to 35,000,000 reads) it won't 
 show up in trackster because of an indexing error.  When I ran Picard 
 statistics on the tophat accepted hits bam output, I see that there was 
 successful alignment of ~90% of the reads.  Since I have one sample that 
 works, I am not sure how to address the issue.  
 
 Here's the error I get from Trackster when I try to visualize the samples:
 *** glibc detected *** python: double free or corruption (!prev): 
 0x00ff02b0 ***
 === Backtrace: =
 /lib/libc.so.6(+0x71ad6)[0x7ff6561f7ad6]
 /lib/libc.so.6(cfree+0x6c)[0x7ff6561fc84c]
 /lib/libc.so.6(fclose+0x14d)[0x7ff6561e8a1d]
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so(bam_index_load_literal+0x37)[0x7ff655cdbb17]
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so(+0x6e838)[0x7ff655cc7838]
 python(PyObject_Call+0x47)[0x41ef47]
 python(PyEval_CallObjectWithKeywords+0x43)[0x4a1a53]
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so(+0x52a44)[0x7ff655caba44]
 python[0x46f0a3]
 python(PyObject_Call+0x47)[0x41ef47]
 python(PyEval_EvalFrameEx+0x4878)[0x4a72b8]
 python(PyEval_EvalCodeEx+0x911)[0x4a95c1]
 python(PyEval_EvalFrameEx+0x4d12)[0x4a7752]
 python(PyEval_EvalCodeEx+0x911)[0x4a95c1]
 python(PyEval_EvalCode+0x32)[0x4a9692]
 python(PyRun_FileExFlags+0x13e)[0x4c98be]
 python(PyRun_SimpleFileExFlags+0xd4)[0x4c9ad4]
 python(Py_Main+0x9ed)[0x41a6bd]
 /lib/libc.so.6(__libc_start_main+0xfd)[0x7ff6561a4c4d]
 python[0x4198d9]
 === Memory map: 
 0040-0061d000 r-xp  fe:00 715090 
 /usr/bin/python2.6
 0081d000-0087f000 rw-p 0021d000 fe:00 715090 
 /usr/bin/python2.6
 0087f000-0088e000 rw-p  00:00 0 
 00d1a000-0109a000 rw-p  00:00 0  
 [heap]
 7ff65000-7ff650021000 rw-p  00:00 0 
 7ff650021000-7ff65400 ---p  00:00 0 
 7ff6556ed000-7ff655702000 r-xp  fe:00 1512749
 /lib/libgcc_s.so.1
 7ff655702000-7ff655902000 ---p 00015000 fe:00 1512749
 /lib/libgcc_s.so.1
 7ff655902000-7ff655903000 rw-p 00015000 fe:00 1512749
 /lib/libgcc_s.so.1
 7ff655903000-7ff655933000 r-xp  00:13 1154848
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/ctabix.so
 7ff655933000-7ff655a32000 ---p 0003 00:13 1154848
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/ctabix.so
 7ff655a32000-7ff655a37000 rw-p 0002f000 00:13 1154848
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/ctabix.so
 7ff655a37000-7ff655a39000 rw-p  00:00 0 
 7ff655a39000-7ff655a55000 r-xp  fe:00 738300 
 /usr/lib/python2.6/lib-dynload/_ctypes.so
 7ff655a55000-7ff655c55000 ---p 0001c000 fe:00 738300 
 /usr/lib/python2.6/lib-dynload/_ctypes.so
 7ff655c55000-7ff655c59000 rw-p 0001c000 fe:00 738300 
 /usr/lib/python2.6/lib-dynload/_ctypes.so
 7ff655c59000-7ff655d29000 r-xp  00:13 1154847
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so
 7ff655d29000-7ff655e29000 ---p 000d 00:13 1154847
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so
 7ff655e29000-7ff655e39000 rw-p 000d 00:13 1154847
 /galaxy/home/g2main/galaxy_main/eggs/pysam-0.4.2_kanwei_b10f6e722e9a-py2.6-linux-x86_64-ucs4.egg/csamtools.so
 7ff655e39000-7ff655ec rw-p  00:00 0 
 7ff655fc-7ff656185000 rw-p  00:00 0 
 7ff656186000-7ff6562de000 r-xp  fe:00 1514734
 /lib/libc-2.11.2.so
 7ff6562de000-7ff6564dd000 ---p 00158000 fe:00 1514734
 /lib/libc-2.11.2.so
 7ff6564dd000-7ff6564e1000 r--p 00157000 fe:00 1514734
 /lib/libc-2.11.2.so
 7ff6564e1000-7ff6564e2000 rw-p 0015b000 fe:00 1514734
 

Re: [galaxy-user] How to find out SNPs and point mutations in RNA-Seq data using Galaxy?

2012-01-09 Thread Jeremy Goecks
Wei,

The pileup tool will help you find SNPs in your data; you'll want to read the 
documentation to understand how best to use it for your needs. You can also try 
the Unified Genotyper on our test server ( http://test.g2.bx.psu.edu/ ), but 
it's in alpha/beta status and we aren't providing any support for it yet.

Good luck,
J.


On Jan 9, 2012, at 1:28 AM, ericliao...@gmail.com ericliao...@gmail.com 
wrote:

 HI, 
 I am new to RNA-seq, and the only resource available to me for analysis 
 is the Galaxy server. I want to find SNPs and point mutations in RNA-Seq data 
 using Galaxy (I do not know whether anyone uses RNA-seq data to find point 
 mutations, since whole-genome sequencing exists for reporting mutations 
 and SNPs). I have been searching the forum for a step-by-step protocol 
 for doing this, but could not find one. 
 I have one normal sample and two cancer samples, with a TopHat-produced 
 accepted_hits.bam file for each one. 
 I want to find SNPs and point mutations in the cancer samples, so how do I 
 go from here? Can anyone show me how to do it on the Galaxy main server? 
 Thanks!
  
 Wei

Re: [galaxy-user] Make Galaxy continue running when i close the browser

2012-01-05 Thread Jeremy Goecks
Efthymios,

You'll want to run Galaxy as a daemon process. Run

% sh run.sh --help

to get more information on running Galaxy as a daemon.

Also, please direct questions about running/configuring a local Galaxy instance 
to galaxy-dev (cc'd) rather than galaxy-user, which is for tool and analysis 
questions.

Best,
J.

On Jan 5, 2012, at 8:06 AM, Makis Ladoukakis wrote:

 Dear Galaxy users,
 
 I have installed a local Galaxy instance on a server and I use it to run 
 certain genomic assembly workflows. However, with larger datasets, 
 completion may take up to one day. How can I make Galaxy continue the 
 operation even when I close the browser? Is that possible on a local instance 
 or on the main server?
 
 Thank you,
 Efthymios Ladoukakis

Re: [galaxy-user] Running cufflinks on a genome without a bowtie index

2011-12-22 Thread Jeremy Goecks
Noa,

Using your FASTA in Tophat and Cufflinks is the correct approach. You don't 
need to provide an annotation file in Cufflinks, and you can also avoid using 
your FASTA in Cufflinks by not using bias correction.

If you're still having problems, the issue is likely your parameter choices in 
Tophat and/or Cufflinks. You'll want to read the documentation carefully to 
choose parameters appropriately for your data.

Good luck,
J.

On Dec 22, 2011, at 5:09 AM, Noa Sher wrote:

 Hi
 I am trying to run Cufflinks on a genome without a bowtie index.
 How do I make my own index? I have a FASTA file of the genome, but if I run 
 tophat using just that and then cufflinks using a gtf file of the 
 transcriptome, I get zero in all FPKM values
 Thanks
 Noa

Re: [galaxy-user] suggestions for de novo assembly plant transcriptome without reference

2011-12-21 Thread Jeremy Goecks
Baohua and Jane,

As David noted, there is a Trinity wrapper for Galaxy, it works, and Trinity is 
great. 

However, Trinity is not enabled/installed on our public server (main.g2) or on 
Galaxy cloud instances (Amazon) right now. You'll need a little programming 
expertise to set up Trinity locally or on the cloud. Also, the Galaxy team's 
support for Trinity is very minimal right now as we haven't done much testing 
with it yet.

Good luck,
J.

On Dec 21, 2011, at 4:01 PM, David Matthews wrote:

 Hi Jane,
 
 I have used Trinity on a local installation here at Bristol University. The 
 main reason it's not on Galaxy main is that it's very memory intensive 
 (we run it on nodes with 256GB RAM). So you really need access to a big 
 machine to run it. Having said all that the output is astoundingly good so 
 it's worth the time and effort to get a run going if you can.
 
 Cheers
 David
 
 
 
 On 21 Dec 2011, at 13:36, Jane Song wrote:
 
 Dear Galaxy Expert,
 
 I would like to use Galaxy for de novo assembly of single-end Illumina reads 
 (140bp) for plant transcriptomes (without a reference).  I remember earlier 
 emails mentioning Trinity in Galaxy, but I could not find it on the Galaxy 
 site: http://main.g2.bx.psu.edu/root
 Maybe it is installed on Amazon EC2? Are there other suggestions for de novo 
 assembly of plant transcriptomes without a reference?
 
 Many thanks and look forward to hearing back from you,
 Jane
 

Re: [galaxy-user] Extract genomic DNA

2011-12-14 Thread Jeremy Goecks
Rebecca,

You should be able to use a custom genome with this tool by selecting History 
from the Source for Genomic Data parameter.

The bug you're describing has, to the best of my knowledge, been fixed in 
Galaxy and should not be present anymore. On which Galaxy instance are you 
seeing this issue? If this is not the main Galaxy server ( 
http://main.g2.bx.psu.edu/ ), you'll want to contact the maintainers of the 
instance that you're using and ask them to update the instance.

Best,
J.

On Dec 14, 2011, at 1:31 PM, Rebecca C Mueller wrote:

 Hi all,
 Does anyone know if you can use the Extract Genomic DNA command with a 
 genome not in the database? I am working with an algal genome (C. merolae) 
 that isn't currently in the pulldown Database/Build menu. I keep getting the 
 Unspecified genome build error, and am assuming that's the problem, as my 
 other files appear to be formatted correctly (tab delimited without spaces 
 for the intervals, same names for chromosomes in interval and fasta file, 
 etc).
 Thanks!
 Rebecca

Re: [galaxy-user] Changing Bowtie parameters in TopHat

2011-11-17 Thread Jeremy Goecks
 Thanks for your help. I'm mapping reads from one organism to a related but 
 different organism, so some of the parameters I'd like to adjust are to relax 
 mapping stringency -specifically:
 
 -n 3 (allow 3 mismatches in seed)
 -e 250 (allow cumulative phred score of 250 [or some other value depending 
 on read length] for mismatches in remainder of read)
 
 I'd also like to only report alignments that are unambiguously mapped to a 
 single location, so:
 
 -m 1
 --best on
 --strata on
 
 It sounds like I need to read the documentation again, but it didn't look at 
 first glance like I could specify these things.

Yes, reading the documentation is highly recommended. 

You can definitely specify -m, but you may have to think creatively about how 
to modify Tophat's available parameters to meet your needs. You might also 
contact the Tophat authors directly and see if they have any suggestions: 
tophat.cuffli...@gmail.com

Good luck,
J.




Re: [galaxy-user] Changing Bowtie parameters in TopHat

2011-11-16 Thread Jeremy Goecks
Jeremy,

 My apologies if this has been covered before but I am using Galaxy Main and 
 wonder if, when running TopHat, you can modify the mapping parameters used by 
 Bowtie?

Not all Bowtie parameters can be modified when running Tophat. Which parameters 
are you looking to modify and why?

 It seems that the full parameter list for TopHat pertains only to the reads 
 that aren't mapped by Bowtie (the reads spanning splice junctions).

This should not be the case. For instance, documentation for the max-multihits 
option discusses multiple hits when mapping reads to junctions/segments:

http://tophat.cbcb.umd.edu/manual.html

If you're seeing different results, it may be a bug that could be discussed 
with the Tophat authors: tophat.cuffli...@gmail.com

 Is there a way to access the full parameter list of Bowtie through TopHat?

Not currently.

 Or perhaps run Bowtie directly, then feed this into a TopHat run?

I don't think this is possible b/c Tophat uses the reads that map initially to 
build the coverage islands and then uses these islands to generate an index of 
potential splice junctions.

Best,
J.





Re: [galaxy-user] generic filenames with Export to File

2011-11-14 Thread Jeremy Goecks
 
 There is currently no way to do this but it would definitely be a useful 
 option to have. I've opened a ticket that you can follow and/or comment on if 
 you're interested:
 
 https://bitbucket.org/galaxy/galaxy-central/issue/680/preserve-dataset-names-when-exporting

I forgot to mention that you can inspect the datasets_attrs.txt to see the 
mapping between datasets and files. datasets_attrs.txt contains a JSON dict, so 
it would be possible to write a little script that renames datasets based on 
the values in the dict.
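
A rough sketch of such a script follows. It only plans the renames. The key names ("hid", "name", "extension", "file_name") and the idea that the file parses to one attribute dict per dataset are assumptions on my part, so inspect your own datasets_attrs.txt and adjust before using it.

```python
import json
import posixpath
import re


def plan_renames(attrs_json):
    """Map each exported file path to a path based on the dataset name.

    Assumes every dataset record has "hid", "name", "extension" and
    "file_name" keys (hypothetical; check your own export).
    """
    data = json.loads(attrs_json)
    # Accept either a list of records or a dict whose values are records.
    records = data.values() if isinstance(data, dict) else data
    renames = {}
    for attrs in records:
        old_path = attrs["file_name"]
        # Replace anything that is awkward in a filename with underscores.
        safe = re.sub(r"[^A-Za-z0-9._-]+", "_", attrs["name"]).strip("_")
        new_name = "{}_{}.{}".format(attrs["hid"], safe, attrs["extension"])
        renames[old_path] = posixpath.join(posixpath.dirname(old_path), new_name)
    return renames
```

Once the mapping looks right, it can be applied from inside the unpacked archive with os.rename(old, new) for each pair.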

J.

Re: [galaxy-user] Permissions and private roles

2011-10-24 Thread Jeremy Goecks

 Also another question about permissions. If I create a Galaxy page and share 
 that with limited users then it appears that the datasets are all public via 
 a URL is that correct?

Yes, all datasets are public via URL by default in Galaxy, and a Galaxy Page 
makes it easy to find this URL. Without knowing the dataset hash id and/or the 
instance's secret key, it's very difficult to guess a URL that leads to a valid 
dataset.

To change a dataset's permissions, click on the pencil (Edit attributes) and 
scroll to the bottom of the attributes page.

Thanks,
J. 


Re: [galaxy-user] Problem with Cuffcompare

2011-10-13 Thread Jeremy Goecks
Chandu,

I've deleted my copy of your history to save space as your history was quite 
large. Please rerun Cuffcompare with the modifications I suggested below.

Thanks,
J.

On Oct 13, 2011, at 4:26 PM, Chandu Galaxy wrote:

 Thank you very much Jeremy.
 
 Can I have a look at the re-ran datasets?
 
 
 On Thu, Oct 13, 2011 at 12:12 PM, Jeremy Goecks jeremy.goe...@emory.edu 
 wrote:
 Chandu,
 
 There are two problems:
 
 (1) you mapped your reads to AgamP3, but the dbkey for all of your Cufflinks 
 datasets is anoGam1. This should not have happened automatically with Galaxy, 
 but I'm looking into the issue now. Did you do this yourself?
 
 (2) Galaxy does not have sequence data for anoGam1, and this directly led to 
 the problem that you're seeing.
 
 I corrected the problem by manually assigning build AgamP3 to your Cufflinks 
 datasets and then rerunning Cuffcompare. In the future, I expect that we'll 
 add anoGam1 data to our public server, but it's not clear when this will 
 occur.
 
 Thanks,
 J.
 
 
 
 On Oct 12, 2011, at 5:35 PM, Chandu Galaxy wrote:
 
 Thank you Jeremy. I've shared my History named 'Mosquito Work: RNA-Seq 
 analysis 2' with you just now. Please see the datasets from 1-58 (also see 
 deleted datasets). Thanks.
 
 --
 Chandu
 
 On Wed, Oct 12, 2011 at 2:02 PM, Jeremy Goecks jeremy.goe...@emory.edu 
 wrote:
 Chandu,
 
 Are you running your analysis on our public server ( main.g2.bx.psu.edu )? 
 If so, can you share your history with me, please (Options -> Share/Publish -> 
 Share with a User -> my email address)?
 
 Thanks,
 J.
 
 On Oct 11, 2011, at 4:26 PM, Chandu Galaxy wrote:
 
 Thank you for the response.
 I can't check my reference genome dataset because I'm using reference 
 provided by Galaxy (Mosquito (Anopheles gambiae): AgamP3). Is there any 
 solution? Thank you.
 
 --
 Chandu
 
 
 
 On Mon, Oct 10, 2011 at 7:15 AM, Jeremy Goecks jeremy.goe...@emory.edu 
 wrote:
 
 Tool execution generated the following error message:
 Error running cuffcompare. Warning: Your version of Cufflinks is not 
 up-to-date. It is recommended that you upgrade to Cufflinks v1.1.0 to 
 benefit from the most recent features and bug fixes 
 (http://cufflinks.cbcb.umd.edu).
 No fasta index found for ./input1. Rebuilding, please wait..
 Error: sequence lines in a FASTA record must have the same length!
 Chandu,
 
 Cufflinks/compare/diff requires that your reference genome dataset have the 
 following format:
 
 >my_chrom
 AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGT
 AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGT
 AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGT
 ...
 
 Note that all lines of sequence data have the same length. 
 
 The problem you're seeing is because there are lines in your sequence data 
 that are not the same length, e.g.
 
 >my_chrom
 AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGT
 AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTA
 AGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCGGTAGTTACCG
 ...
 
 The FASTA Width tool in Galaxy can help you format your dataset correctly.
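
Outside Galaxy, the same reformatting can be sketched in a few lines of Python. This is an illustration of the idea rather than what the FASTA Width tool does internally; the function name and the default width of 60 are my own choices, and it holds one full record in memory at a time.

```python
def rewrap_fasta(lines, width=60):
    """Yield FASTA lines where every sequence line of a record has the
    same length (only the last line of a record may be shorter)."""
    seq = []

    def flush():
        # Join the buffered sequence and cut it into fixed-width pieces.
        joined = "".join(seq)
        seq.clear()
        for i in range(0, len(joined), width):
            yield joined[i:i + width]

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            yield from flush()  # finish the previous record first
            yield line
        else:
            seq.append(line)
    yield from flush()
```

Typical use would be to pass an open file handle as `lines` and write each yielded line, followed by a newline, to a new file.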
 
 Good luck,
 J.
 
 
 
 
 


Re: [galaxy-user] de novo assembly

2011-09-29 Thread Jeremy Goecks

Cecilia,


Are you trying to use the Public Galaxy or a local install? There
are several assemblers with Galaxy Wrappers on the Galaxy
ToolShed (e.g. Roche Newbler, and MIRA 3) which you could
add to your own local Galaxy if you have one.


There are wrappers for ABySS as well. These assemblers are generally  
for genome data.


For transcriptome data, galaxy-central provides a wrapper for the  
Trinity assembler.



However, de novo genome assembly can be very computationally
demanding, so not many Galaxy Instances will want to offer it.



If you don't want to/can't set up a local instance for assembly,  
consider using a cloud instance:


http://wiki.g2.bx.psu.edu/Admin/Cloud

Good luck,
J.

Re: [galaxy-user] rna-seq mutation detection

2011-08-30 Thread Jeremy Goecks
Rich,

You can convert base quality scores using the FASTQ Groomer tool. Note that 
Galaxy tools typically work with Sanger (Phred+33) quality scores.
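
To show what the conversion amounts to, here is a minimal sketch that shifts a quality string from Phred+64 to Phred+33. It assumes your reads use the Illumina 1.3+ style Phred+64 encoding, which you should confirm first; older Solexa-scaled scores need a different formula, which is one reason to prefer FASTQ Groomer in practice. The function name is made up.

```python
def phred64_to_phred33(quality):
    """Re-encode a Phred+64 quality string as Sanger Phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # decode the Phred score
        if score < 0:
            raise ValueError("character %r is not valid Phred+64" % ch)
        converted.append(chr(score + 33))  # re-encode with the Sanger offset
    return "".join(converted)
```

For instance, "h" (Q40 in Phred+64) becomes "I", and "@" (Q0) becomes "!".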

Good luck,
J.

On Aug 29, 2011, at 10:48 PM, Richard Mark White wrote:

 Hi,
   Thanks very much.  I've tried this, but one thing I have noticed is that if 
 I do the initial mapping with BWA vs. Bowtie the # of variants I get is much 
 larger with BWA.  I have seen mention on the web that you need to change the 
 quality score annotation for BWA before running SAMtools, but not sure 
 precisely how to do this.  any thoughts?
 
 Rich
 
 
 From: Jeremy Goecks jeremy.goe...@emory.edu
 To: Richard Mark White whit...@yahoo.com
 Cc: galaxy-user@lists.bx.psu.edu galaxy-user@lists.bx.psu.edu
 Sent: Sunday, August 28, 2011 2:18 PM
 Subject: Re: [galaxy-user] rna-seq mutation detection
 
 Rich,
 
 Given that you're analyzing your RNA-seq data using Galaxy, I'd guess that 
 you're using Tophat to map your reads onto a reference genome. If this is 
 the case, then you can use the BAM files produced by Tophat to generate 
 variation data for each sample. The variation tools that you'll want to look 
 at are 
 
 NGS: SAM Tools -> Generate Pileup
 NGS: GATK Tools -> Unified Genotyper (only available on our test server and 
 still in beta)
 
 The outputs for each tool produce a consensus base for each potential 
 variation site, and you can compare the consensus base for each sample to 
 look for differences.
 
 If you're doing de novo assembly of your RNA-seq data to look for variation, 
 you'll need to use tools that are not currently available in Galaxy.
 
 Good luck,
 J.
 
 
 On Aug 18, 2011, at 12:22 PM, Richard Mark White wrote:
 
 Hi,
   I am trying to look at differences between two RNA-seq samples to see if 
 there are mutations in one of them relative to the other (i.e. not in 
 comparison to a reference genome).  Does anyone know of a way to do this 
 within galaxy?  Any help is appreciated!
 
 rich
 

Re: [galaxy-user] rna-seq mutation detection

2011-08-28 Thread Jeremy Goecks

Rich,

Given that you're analyzing your RNA-seq data using Galaxy, I'd guess that 
you're using Tophat to map your reads onto a reference genome. If this is the 
case, then you can use the BAM files produced by Tophat to generate variation 
data for each sample. The variation tools that you'll want to look at are 

[NGS: SAM Tools--]Generate Pileup
[NGS: GATK Tools--]Unified Genotyper (only available on our test server and 
still in beta)

The outputs for each tool produce a consensus base for each potential variation 
site, and you can compare the consensus base for each sample to look for 
differences.
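
As a rough illustration of the comparison described above, the following Python sketch loads the consensus base per position from two pileup-style tables and reports the positions where the two samples disagree. It assumes tab-separated lines whose first four columns are chromosome, position, reference base, and consensus base; please check the actual column layout of your Generate Pileup output before relying on it. The function names are made up for this example and are not part of Galaxy or SAMtools.

```python
def load_consensus(lines):
    """Map (chrom, position) -> consensus base from pileup-style lines.

    Assumed layout (an assumption, not confirmed by the thread):
    chrom <TAB> position <TAB> reference base <TAB> consensus base ...
    """
    calls = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, pos, _ref, consensus = fields[:4]
        calls[(chrom, int(pos))] = consensus.upper()
    return calls


def differing_sites(sample_a, sample_b):
    """Return sorted sites covered in both samples whose consensus bases differ."""
    shared = sample_a.keys() & sample_b.keys()
    return sorted(
        (chrom, pos, sample_a[(chrom, pos)], sample_b[(chrom, pos)])
        for chrom, pos in shared
        if sample_a[(chrom, pos)] != sample_b[(chrom, pos)]
    )


if __name__ == "__main__":
    # Toy data for illustration only.
    a = load_consensus(["chr1\t100\tA\tA", "chr1\t200\tC\tT"])
    b = load_consensus(["chr1\t100\tA\tA", "chr1\t200\tC\tC"])
    for site in differing_sites(a, b):
        print(*site, sep="\t")
```

Only positions present in both tables are compared, so a site that lacks coverage in one sample is silently ignored rather than reported as a difference.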

If you're doing de novo assembly of your RNA-seq data to look for variation, 
you'll need to use tools that are not currently available in Galaxy.

Good luck,
J.


On Aug 18, 2011, at 12:22 PM, Richard Mark White wrote:

 Hi,
   I am trying to look at differences between two RNA-seq samples to see if 
 there are mutations in one of them relative to the other (i.e. not in 
 comparison to a reference genome).  Does anyone know of a way to do this 
 within galaxy?  Any help is appreciated!
 
 rich
 

Re: [galaxy-user] Cufflinks quartile normalization

2011-08-27 Thread Jeremy Goecks

David,

Quartile normalization is explained in the Cufflinks manual: 
http://cufflinks.cbcb.umd.edu/manual.html --

With this option, Cufflinks normalizes by the upper quartile of the number of 
fragments mapping to individual loci instead of the total number of sequenced 
fragments. This can improve robustness of differential expression calls for 
less abundant genes and transcripts.

My reading of this is that the M in FPKM is taken from the upper quartile 
rather than the total; if the FPKM numbers for highly expressed isoforms change 
substantially, that suggests many of your reads are mapping to minimally 
expressed isoforms. Without knowing more about your experiment, it's not 
possible to say whether you should be doing quartile normalization. However, 
given that it's designed for DE calls for less abundant isoforms, you'll want 
to see whether this holds true for your dataset(s) and whether Cuffdiff DE 
tests make sense in the context of your research questions.
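
To make that reading concrete, here is a small Python sketch of the FPKM arithmetic with the two normalizers. It assumes the usual definition FPKM = 10^9 * C / (N * L), where C is the number of fragments on the transcript, L is its length in bases, and N is either the total number of fragments or, under my reading of the -N option, the upper quartile of the per-locus fragment counts. The counts below are invented, and the real Cufflinks implementation may differ in its details.

```python
import statistics


def fpkm(fragments, length_bp, norm_fragments):
    """Fragments per kilobase of transcript per million normalizing fragments."""
    return fragments * 1e9 / (length_bp * norm_fragments)


def upper_quartile(counts):
    """75th percentile of the per-locus fragment counts."""
    return statistics.quantiles(counts, n=4)[2]


if __name__ == "__main__":
    # Hypothetical per-locus fragment counts; not taken from any real dataset.
    locus_counts = [5, 8, 12, 20, 40, 90, 400, 25000]
    total = sum(locus_counts)
    uq = upper_quartile(locus_counts)
    print("total-normalized FPKM:   ", fpkm(400, 1500, total))
    print("quartile-normalized FPKM:", fpkm(400, 1500, uq))
```

Because the upper quartile of the per-locus counts is far smaller than the total fragment count, every FPKM is scaled up by the same factor within a sample, which is consistent with the large inflation described in the question.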

Good luck,
J.


On Aug 25, 2011, at 1:49 PM, David Joly wrote:

 Can someone help me understanding the quartile normalization in Cufflinks? I 
 read different threads in which they reported that the FPKM values were 
 inflated after normalization (-N) but most people didn't report their values 
 so I don't know how big the inflation should be...
  
 In my case, the difference is huge! The FPKM values for the four first genes 
 without normalization are in the range of [61 - 184] while after 
 normalization, they are in the range of [2.4e+6 - 7.4e+6]. Even though this 
 inflation does not seem to affect the calculation of the gene expression 
 changes [ log (FPKM2/FPKM1) ], I'm wondering if something is wrong with my 
 dataset.
  
 Is it was I should expect? Is it always better to use the normalization?
  
 Thanks,
  
 David


Re: [galaxy-user] Cufflinks with reference annotation and without reference annotation

2011-08-18 Thread Jeremy Goecks

Crystal,

If you provide a gene annotation to Cufflinks, the transcripts  
produced will match those in the annotation exactly. If you assemble  
without a gene annotation, the transcripts produced will match the  
reference in some cases, but, in others, will not match the reference  
due to small and/or large errors. Because '=' denotes an exact match  
between an assembled transcript and a reference transcript, more '='  
are to be expected when Cufflinks has a gene annotation.
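
If you want to tally the class codes yourself, a minimal Python sketch along these lines could help. It assumes the tracking file is tab-separated with a header row that includes a class_code column, as in the Cufflinks tracking output; the function name is hypothetical.

```python
import csv
from collections import Counter


def count_class_codes(lines):
    """Tally the class_code column of a tab-separated tracking table."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["class_code"] for row in reader)


if __name__ == "__main__":
    # Toy rows for illustration; a real file has many more columns.
    table = [
        "tracking_id\tclass_code\tnearest_ref_id",
        "TCONS_1\t=\tENSDART_A",
        "TCONS_2\tj\tENSDART_B",
        "TCONS_3\t=\tENSDART_C",
    ]
    for code, n in count_class_codes(table).most_common():
        print(code, n, sep="\t")
```

To run it on a downloaded file, pass an open file handle instead of the list, for example `count_class_codes(open("tracking.tsv"))`, where the file name is a placeholder.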


Finally, a couple of procedural issues:

*please send questions about analyses and tool usage to the galaxy-user  
mailing list, not galaxy-dev or individual developers;
*please do not send duplicate emails as it can confuse our tracking  
system and slow down our response rather than speed it up.


Good luck,
J.

On Aug 17, 2011, at 9:14 AM, Crystal Goh wrote:

Hi, I am Crystal. I have a problem with the Cuffdiff output and hope to  
get some advice. Thanks.



After aligning RNA-seq reads with Tophat, I used the Tophat output  
for Cufflinks.


For Cufflinks, I tried two approaches and compared the results:
1st approach: Put zebrafish Ensembl GTF as reference annotation
2nd approach: without reference annotation.


From the output of above 2 approaches, I continued with Cuffcompare  
(with reference annotation) and Cuffdiff,
Attached word document is the workflow and parameters I set for  
these 2 approaches.



When I compared the output of Cuffdiff between these 2 approaches, a  
total of 48584 tracking ids with class code '=' were observed in the  
transcript FPKM tracking file from Approach 1, whereas there are only  
1248 tracking ids with class code '=' from Approach 2 (I attached the  
transcript FPKM tracking files from approaches 1 and 2).


In my opinion, I should observe 48584 tracking id with class code  
'=' and additional tracking id with other class codes in transcript  
FPKM tracking file from Approach 2.


Can I get advice on this?


Thank you.

Best regards,
Crystal
Workflow and parameter for 2 approaches.zip
Approach 1 Transcript FPKM tracking (Cufflinks with reference annotation).zip
Approach 2 Transcript FPKM tracking (Cufflinks without reference annotation).zip



Re: [galaxy-user] Any thing wrong with my cufflink process in galaxy?

2011-08-14 Thread Jeremy Goecks
Yao,

It's difficult to tell what's wrong without seeing your analysis. However, you 
may want to use the reference annotation during the Cufflinks phase to either 
estimate isoform expression or guide assembly (this option will appear on our 
public server soon). Read the Cufflinks documentation to understand these 
options and what they do for your assembly and FPKM values:

http://cufflinks.cbcb.umd.edu/manual.html#cufflinks

De novo assembly from mapped reads is often somewhat imprecise and incomplete, 
especially for low-coverage data. It's not surprising that a de novo assembly 
doesn't match especially well with the reference.

If you're still not seeing any differential expression after using the 
reference GTF in Cufflinks, Cuffcompare, and Cuffdiff, you may want to email 
the Cufflinks/compare/diff authors and ask for some pointers: 
tophat.cuffli...@gmail.com

Good luck,
J.



On Aug 10, 2011, at 5:07 AM, yao chen wrote:

 Dear all:
 
 Recently, I ran Cufflinks in Galaxy on the internet. I want to compare two 
 samples; however, no transcript or gene passed the significance level, even 
 though many of them have a large FPKM in one sample and 0 FPKM in the other.
 
 Any thoughts?
 
 Below is my cufflink process:
 
 I have four samples belonging to two groups: the test group has three 
 samples, and the control group has one sample.
 
 First, using accepted_hits.bam from Tophat, I ran Cufflinks without annotation 
 on each sample.
 
 Then, for the four GTF files from the four samples, I ran Cuffcompare to 
 combine the transcripts and compare them to the genome annotation. However, 
 at this step, I found that the transcript accuracy is very low. 
 See one example:
   Missed exons:   10673/11776 ( 90.6%)
    Wrong exons:    1254/2007  ( 62.5%)
 Missed introns:    8529/8637  ( 98.7%)
  Wrong introns:       2/5     ( 40.0%)
    Missed loci:       0/504   (  0.0%)
     Wrong loci:    1248/2002  ( 62.3%)
 
 Finally, I ran Cuffdiff between the two groups of samples. 
 
 Thank you.

Re: [galaxy-user] cuffmerge question

2011-08-14 Thread Jeremy Goecks
Carol,

 My question is, if I use the public Galaxy server interface to TopHat and 
 Cufflinks, is there any access to cuffmerge?

No, Cuffmerge is not available in Galaxy.

 Also, I'm trying to understand the difference between using cuffmerge and 
 then using cuffcompare (without a reference genome) to assemble gtf 
 transcript files produced by Cufflinks for each group of 3 Illumina 
 paired-end reads corresponding to biological replicates, in order to use the 
 resulting combined gtf file for comparing the TopHat alignments of two such 
 groups using cuffdiff.   
 
 Is there any difference in the output between cuffdiff and cuffcompare, using 
 in this fashion?  For example, do they form the union of transcripts by the 
 same rules, and do their outputs contain (or lack) the same columns (strand, 
 perhaps??)  I've read things on seq-answers indicating that I should be using 
 cuffmerge, but I can't find it on the public server and apparently haven't 
 installed it properly on my own computer so far.


From the Cufflinks/compare/merge/diff documentation ( 
http://cufflinks.cbcb.umd.edu/manual.html#cuffmerge ): 

*Cuffmerge calls Cuffcompare and does some filtering of transfrags as well as 
merging of novel and known isoforms;
*The main purpose of this script is to make it easier to make an assembly GTF 
file suitable for use with Cuffdiff.

Hence, it appears that Cuffmerge and Cuffcompare are relatively similar and use 
the same basic union algorithm--whatever Cuffcompare uses. If you have more 
detailed questions, you might ask the Cufflinks authors: 
tophat.cuffli...@gmail.com

Good luck,
J.

Re: [galaxy-user] visualization

2011-08-04 Thread Jeremy Goecks
Jiannong,

Hans is on the right track. You can indeed visualize your data using Trackster, 
Galaxy's genome browser; Trackster is available via the Visualization tab.

Here are the steps needed to visualize your dataset:

(1) Use the [FASTA Manipulation -- Compute Sequence length] tool to compute 
the lengths of the contigs in your build;

(2) if there are spaces in your contig names, you'll need to use only the 
characters before the first space as contig names because this is what the 
mapping tools do; use [Text Manipulation -- Convert delimiters to TAB] and 
then [Text Manipulation -- Cut ] to cut the first and last column from the 
dataset; now you should have a file in the form

contig_name<TAB>contig_length

(3) Create a custom build:
(a) User tab -- Custom Builds;
(b) scroll to the bottom, enter a name and key for your build and copy 
in the contig names and lengths you created in steps (1) and (2);

(4) Set the dbkey for the dataset(s) that you want to visualize by clicking on 
the pencil icon for each dataset and selecting your custom dbkey.

(5) Use the Trackster icon next to a dataset--see attached screenshot--and 
insert the dataset into a new browser.
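
If you would rather prepare the contig name and length table outside Galaxy, the first two steps can be sketched in Python roughly as follows. This is only an illustration, not what the Galaxy tools run internally: it assumes a plain FASTA file, keeps only the characters before the first whitespace in each header, and writes one name, a tab, and the length per line. The function name is made up for this example.

```python
def contig_lengths(fasta_lines):
    """Yield (name, length) per FASTA record; the name stops at the first whitespace."""
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length  # emit the record that just ended
            header = line[1:].split()
            name = header[0] if header else ""
            length = 0
        elif name is not None:
            length += len(line)  # sequence may span several lines
    if name is not None:
        yield name, length


if __name__ == "__main__":
    # Toy FASTA for illustration only.
    fasta = [">contig_1 draft assembly", "ACGTAC", "GT", ">contig_2", "AAA"]
    for contig, size in contig_lengths(fasta):
        print(f"{contig}\t{size}")
```

The printed lines can then be pasted into the custom build form in the same name and length layout described above.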

Let us know if you have any problems. And, yes, we're working to make this 
process much easier.

Best,
J.

inline: Screen shot 2011-08-04 at 11.09.34 AM.png



On Aug 4, 2011, at 3:13 AM, Hans-Rudolf Hotz wrote:

 Hi
 
 Assuming you know the length of your contigs, you can add them as a custom 
 build.
 
 Click on: 'Visualization' - 'New Track Browser' - 'Add a Custom Build'
 
 
 Hope this helps.
 
 Regards, Hans
 
 
 On 08/04/2011 01:51 AM, vasu punj wrote:
 IGV should allow you to do this  but not sure about trackbrowser in Galaxy.
 Vasu
 
 --- On Wed, 8/3/11, Jiannong Xu j...@nmsu.edu wrote:
 
 
 From: Jiannong Xu j...@nmsu.edu
 Subject: [galaxy-user] visualization
 To: galaxy-user@lists.bx.psu.edu
 Date: Wednesday, August 3, 2011, 6:03 PM
 
 
 Hi Jen,
 
 I mapped Illumina reads to draft genomic contigs, and am trying to visualize the 
 mapping. Is there any way I can use my own reference contigs for 
 visualization?
 
 Thanks
 
 John Xu
 NMSU
 
 

Re: [galaxy-user] Questions on CuffDiff Output and Browser Visualization

2011-07-06 Thread Jeremy Goecks
Kurinji,

 1. when I look at my differentially expressed transcripts file (generated 
 using ensembl hg19 as a reference with chr added on to obtain results with 
 ensembl gene names) and search for specific genes that I am interested in I 
 can not find them in my cuffdiff output file - even though I can visualize 
 these genes on IGV and they look obviously differentially regulated. Also, 
 given that the cuffdiff output for differentially expressed transcripts does 
 list all transcripts, including the ones that have not significantly changed, 
 wouldn't transcripts for these genes be listed anyway, even if my visual 
 ballparking on differential regulation is not statistically significant? I 
 would really like to know why I am missing genes from my cuffdiff output.

It's not possible to answer this question in general because it's specific to 
your analysis; in particular, your use of a reference annotation file is going 
to influence Cuffdiff's outputs. You might try using positional information 
rather than gene names when searching through Cuffdiff files as the gene short 
name/ID is only used for known transcripts/genes.
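
As a sketch of what a positional search could look like, the Python below keeps the rows of a Cuffdiff-style table whose locus overlaps a region of interest. It assumes each row is a dict (for example from csv.DictReader) with a locus column formatted like chr1:90301-90706; the function names and the example coordinates are only for illustration.

```python
def parse_locus(locus):
    """Split 'chr1:90301-90706' into ('chr1', 90301, 90706)."""
    chrom, _, span = locus.rpartition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def rows_overlapping(rows, chrom, start, end, locus_column="locus"):
    """Keep rows whose locus overlaps the query interval (inclusive coordinates)."""
    hits = []
    for row in rows:
        row_chrom, row_start, row_end = parse_locus(row[locus_column])
        if row_chrom == chrom and row_start <= end and row_end >= start:
            hits.append(row)
    return hits


if __name__ == "__main__":
    # Toy rows for illustration only.
    rows = [
        {"test_id": "XLOC_000001", "locus": "chr1:90301-90706"},
        {"test_id": "XLOC_000002", "locus": "chr1:135255-135896"},
    ]
    for row in rows_overlapping(rows, "chr1", 90000, 91000):
        print(row["test_id"], row["locus"], sep="\t")
```

Looking up a gene by its coordinates in this way works whether or not the row carries a gene short name.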

More detailed questions are probably best directed to the Cufflinks authors: 
tophat.cuffli...@gmail.com

 2. do you all get a good correlation between the top differentially expressed 
 transcripts/genes generated from cuffdiff and how the data looks when 
 visualized on IGV - ie. do your upregulated transcripts really look 
 upregulated when visualizing? I found that while some validate visually, some 
 do not which is confusing

Cufflinks uses multiple statistical techniques to estimate FPKM and 
differential expression; in some cases, it may not be possible to visually 
observe differential expression amongst transcripts. Alternatively, setting 
additional parameters (e.g. normalization) may lead to results that match what 
you're looking for (visually or otherwise).

 3. when visualizing on a browser, and if different transcripts for one gene 
 are regulated differently - ie. some are up in your treated sample but some 
 are down for the same gene - how can you tell which transcriptID from 
 cuffdiff corresponds with what you are seeing?

This information can be found in Cuffdiff's transcript FPKM tracking and 
differential expression testing files.

Good luck,
J.


Re: [galaxy-user] Cuffdiff Question

2011-06-28 Thread Jeremy Goecks
Hello Kurinji,

 I was at your USC Galaxy seminar last week, which I found very helpful - 
 thank you!

Glad to hear that you found the workshop helpful. As a reminder, please email 
questions about using Galaxy and its tools to the galaxy-user mailing list 
(which I've cc'd). You may get quicker and different responses from community 
members, and everyone will benefit from the discussion.

 I used my recently generated RNAseq data in Galaxy (which was pre-aligned 
 using tophat and already had cufflinks run on it) - I ran cuffcompare with 
 all the gtf files and then cuffdiff for the three pairs (there is 1 control 
 and 3 different drug treatments - no replicates). I got several output files, 
 as expected, but decided just to look at the gene differential expression as 
 a start. Some questions I have are - 
 
 1. (very basic question!) which is sample 1 (and corresponding value 1) and 
 sample 2 (and corresponding value 2)in my output file. This is what my output 
 file is called - 
 
 90: Cuffdiff on data 37, data 38, and data 60: gene differential expression 
 testing 33,969 lines
 
 Is 37 sample one or sample two? Given the data - I would expect sample 37 to 
 correspond to value 2 - but I could be wrong. Please let me know!

The best way to figure out which dataset corresponds with Cuffdiff's labels is 
to click the rerun button in the dataset: sample names correspond directly to 
the reads datasets (i.e. BAM files) provided as input to Cuffdiff.

 2. How do I find the UCSC gene names corresponding with start/end sites - I 
 did input the hg18 UCSC gtf file as a reference


You'll need to use a reference annotation (GTF file) that has the gene_name 
attribute as input for Cufflinks/compare/diff. Typically Ensembl annotations 
have this attribute; however, you'll need to prepend 'chr' to each 
line--really, to each chromosome name--in order to bring Ensembl notation in 
line with UCSC/Galaxy notation.
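
For anyone who prefers to do this outside Galaxy, here is a minimal Python sketch of the prepend step. It assumes a standard tab-separated GTF whose first column is the chromosome name, and it leaves comment lines, blank lines, and lines that already start with 'chr' untouched. The function name is made up for this example, and names that differ by more than the prefix (for instance Ensembl's MT versus UCSC's chrM) would still need separate handling.

```python
def add_chr_prefix(gtf_lines):
    """Yield GTF lines with 'chr' prepended to the chromosome (first) column."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            yield line  # leave comments and blank lines alone
        elif line.startswith("chr"):
            yield line  # already in UCSC-style notation
        else:
            yield "chr" + line


if __name__ == "__main__":
    # Toy GTF line for illustration only.
    example = ['1\tprotein_coding\texon\t11869\t12227\t.\t+\t.\tgene_name "X";\n']
    for out in add_chr_prefix(example):
        print(out, end="")
```

Because the chromosome is the first field on each line, prepending to the line and prepending to the chromosome name amount to the same thing.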

 Actually, I noticed that value 1 in this particular output file is all 0 - no 
 idea why. It is not this way in the other files, making me wonder if there is 
 an error somewhere. I am sure the bam file is okay as I viewed it on IGV and 
 saw the patterns I would expect for some candidate genes I looked at.

It's difficult for me to comment without seeing your analysis. Some output 
files depend on particular attributes being set correctly in the annotation 
file. You may want to search through our mailing list archives and see if your 
question has already been answered: 
http://gmod.827538.n3.nabble.com/Galaxy-Users-f815892.html

Good luck,
J.

Re: [galaxy-user] Cuffdiff Question

2011-06-28 Thread Jeremy Goecks

 Thanks for the reply. I tried to use the script provided on a previous galaxy 
 thread for adding the chr on to the gtf file on the mac terminal but I keep 
 getting this error - 
 
 awk: can't open file ensembl.gtf
  source line number 1
 
 I am very new to using the terminal so please let me know if there is 
 something basic that I am not doing right,

Try this Galaxy workflow:

http://main.g2.bx.psu.edu/u/jeremy/w/make-ensembl-gtf-compatible-with-cufflinks

It simply prepends 'chr' to the chromosome name, which is needed if you're 
using an Ensembl reference annotation and want to use it with 
Cufflinks/compare/diff in Galaxy.

Best,
J.


Re: [galaxy-user] question on cufflinks output

2011-06-25 Thread Jeremy Goecks
Hello Wen,

It's not necessary to send multiple emails to the mailing list; we track 
incoming emails to ensure that we respond to all of them.

Your FPKM values do look high, but keep in mind that coverage is only part of 
the FPKM calculation; it's also dependent on transcript length and the total 
number of reads in your sample. Your transcript lengths look very short, so 
that may be skewing your FPKM values. For the record, Cufflinks is using 
scientific/E notation, so e denotes powers of 10 in the FPKM output. 
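
A quick way to convince yourself about the notation is to parse one of the values from your output in Python, which reads E notation the same way (the exponent is a power of 10, not Euler's number). The helper name is made up for this example.

```python
import math


def parse_fpkm(text):
    """Convert an E-notation string such as '1.84527e+07' to a float."""
    return float(text)


if __name__ == "__main__":
    value = parse_fpkm("1.84527e+07")
    print(f"{value:,.0f}")  # the same number written out in full
    # The exponent means "times ten to the 7th power".
    print(math.isclose(value, 1.84527 * 10**7))
```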

A good place to ask followup questions about cufflinks output is the cufflinks 
help email address: tophat.cuffli...@gmail.com

Good luck,
J.

On Jun 24, 2011, at 10:35 AM, Wen Huang wrote:

 Dear Galaxy team and users,
 
 I have a question on the output by cufflinks on Galaxy.
 
 I started with about 28M paired-end reads and mapped them to the reference 
 genome using Tophat on Galaxy. The aligned fragments were assembled by 
 cufflinks, again on Galaxy and I got an output with the first few lines on 
 the bottom of this email.
 
 I was wondering how could cufflinks possibly estimate FPKM on the order of 
 e+07 when the coverage is between 8-50 fragments per base and the total 
 mapped fragments smaller than 28M. Assuming that 20M fragments were mapped, 
 the FPKM should be something around coverage/28. Was the e in the output the 
 Euler's number or 10?
 
 I appreciate your help.
 
 Thanks,
 Wen Huang
 tracking_id  class_code  nearest_ref_id  gene_id  gene_short_name  tss_id  locus               length  coverage  status  FPKM         FPKM_conf_lo  FPKM_conf_hi
 CUFF.2.1     -           -               CUFF.2   -                -       chr1:90301-90706    405     21.1837   OK      1.84527e+07  1.10716e+07   2.58338e+07
 CUFF.1.1     -           -               CUFF.1   -                -       chr1:65419-65692    273     30.9833   OK      2.31848e+07  8.52143e+06   3.78481e+07
 CUFF.3.1     -           -               CUFF.3   -                -       chr1:135255-135896  641     8.61389   OK      6.31907e+06  3.41968e+06   9.21846e+06
 CUFF.4.1     -           -               CUFF.4   -                -       chr1:155808-156529  721     7.26147   OK      5.32695e+06  2.88278e+06   7.77112e+06
 CUFF.5.1     -           -               CUFF.5   -                -       chr1:160421-160729  308     17.6004   OK      1.77483e+07  7.50132e+06   2.79953e+07
 CUFF.6.1     -           -               CUFF.6   -                -       chr1:170695-171212  517     9.16414   OK      8.41605e+06  4.44869e+06   1.23834e+07
 CUFF.7.1     -           -               CUFF.7   -                -       chr1:180885-181188  303     30.5702   OK      2.6515e+07   1.36533e+07   3.93767e+07
 CUFF.8.1     -           -               CUFF.8   -                -       chr1:184397-184702  305     26.712    OK      2.13696e+07  9.94707e+06   3.27921e+07
 CUFF.10.1    -           -               CUFF.10  -                -       chr1:233237-234095  858     3.71208   OK      3.31435e+06  1.60283e+06   5.02588e+06
 CUFF.9.1     -           -               CUFF.9   -                -       chr1:203688-204070  382     41.6301   OK      5.36082e+07  4.02061e+07   6.70102e+07
 CUFF.11.1    -           -               CUFF.11  -                -       chr1:239126-239664  538     19.5995   OK      2.0562e+07   1.45634e+07   2.65605e+07
 CUFF.12.1    -           -               CUFF.12  -                -       chr1:243903-244327  424     10.3509   OK      1.07542e+07  5.37709e+06   1.61313e+07
 CUFF.15.1    -           -               CUFF.15  -                -       chr1:240487-240995  508     15.8596   OK      1.83065e+07  1.23671e+07   2.42459e+07
 
 

Re: [galaxy-user] Extract Genomic DNA Problem

2011-06-21 Thread Jeremy Goecks
Stephen,

This is a formatting issue with your input file; it needs to be tab-delimited 
but it's not currently. You'll need to:

(a) convert spaces to tabs using the Convert delimiters to Tabs tool;
(b) click on the pencil icon and set the data type to BED.
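
If you want to fix the file before uploading it, step (a) can be approximated with a few lines of Python. This sketch simply collapses every run of spaces or tabs between fields into a single tab, which is an assumption about what your file needs rather than a description of the Galaxy tool; the function name is made up for this example.

```python
def to_tab_delimited(lines):
    """Collapse each run of spaces or tabs between fields into a single tab."""
    for line in lines:
        fields = line.split()
        if fields:  # drop blank lines
            yield "\t".join(fields)


if __name__ == "__main__":
    # First interval from the question, separated by spaces.
    for row in to_tab_delimited(["chr5   47258168 47259240"]):
        print(row)
```

This is only safe for files like yours where no field contains a space of its own.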

Best,
J.

On Jun 21, 2011, at 8:45 AM, Stephen Taylor wrote:

 Hi,
 
 I was trying to extract FASTA sequences using the following tab separated 
 data for Chicken on the Galaxy Main server:
 
 chr5   47258168 47259240
 chr18  1938527 1939965
 chr2   101973625   101974007
 chr4   75653898 75674045
 chr19  4258837 4263299
 chr4   39330049 39372715
 chr4   9606881 9610083
 chr15  7264937 7265599
 chr21  6659189 6667015
 chr2   351239  352821
 
 
 I got the following galaxy output:
 
 
 
 7: Extract Genomic DNA on data 6
 empty
 format: fasta, database: galGal3
 Info: 10 warnings, 1st is: Unable to fetch the sequence from '47258168' to 
 '1072' for build 'galGal3'.
 Skipped 10 invalid lines, 1st is #1, chr5 47258168 47259240
 
 Any ideas what I am doing wrong?
 
 Thanks,
 
 Steve


Re: [galaxy-user] Question about output CuffDiff SplicingDiff

2011-06-16 Thread Jeremy Goecks
Felix,

You seem to be providing the correct inputs to Cuffdiff and it appears to be 
producing valid output. More information about setting parameter values and 
interpreting Cuffdiff output can be found in the manual: 
http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff

Good luck,
J.

On Jun 16, 2011, at 8:13 AM, Felix Mayr wrote:

 Hi there,
 
 Looking at the SplicingDiff files produced by CuffDiff, my colleagues and I 
 are perplexed by the p_values and q_values. We've tried comparing different 
 samples but never manage to get p_values smaller than 0.50, and we keep 
 getting q_values greater than 1 (as well as smaller ones, which we expect), 
 which also seems strange. 
 
 The input files we use for the CuffDiff are the CuffCompare of a combined 
 CuffCompare of a dataset, or the CuffCompare of just the two samples we want 
 to analyse. For the samples input files we use the TopHat files respectively. 
 
 Could you please help us get meaningful results for the SplicingDiff files or 
 help us understand the data? 
 
 The top 5 rows of our typical SplicingDiff file: 
        test_id   gene_id      gene  locus                     sample_1  sample_2
 1583   TSS11905  XLOC_028193  -     chr5:134910259-134914719  q1        q2
 2385   TSS12870  XLOC_030892  -     chr7:29976178-30008608    q1        q2
 8005   TSS6887   XLOC_016656  -     chr18:47803031-47807892   q1        q2
 10214  TSS9761   XLOC_022527  -     chr20:43128822-43138649   q1        q2
 2818   TSS13383  XLOC_032450  -     chr8:100899717-100905900  q1        q2

        status  value_1  value_2  sqrt.JS.     test_stat  p_value   q_value
 1583   OK      0        0        0.000771867  0.797878   0.501645  164.5400
 2385   OK      0        0        0.001548470  0.797809   0.505482   82.8991
 8005   OK      0        0        0.003288510  0.797717   0.508184   55.5615
 10214  OK      0        0        0.001414180  0.797620   0.510277   41.8427
 2818   OK      0        0        0.007112780  0.797416   0.513678   33.6973
 
 Thanks in advance for your most appreciated help,
 
 Felix Mayr
 

Re: [galaxy-user] Cufflinks error in galaxy

2011-06-09 Thread Jeremy Goecks
John,

My best guess is that you are using bias correction but do not have the needed 
reference genome(s) for the builds that you want to use. See this page for 
instructions about setting up HTS tools; in particular, you'll need to set up 
the sam_fa_indices.loc file:

https://bitbucket.org/galaxy/galaxy-central/wiki/NGSLocalSetup
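
A quick way to check whether a build is listed in sam_fa_indices.loc is sketched below. It assumes the tab-separated layout `index<TAB>build<TAB>path` described in the sample .loc file distributed with Galaxy; the helper name `builds_in_loc` and the paths are hypothetical, so please check them against your own installation.

```python
import io


def builds_in_loc(handle):
    """Return {build: fasta_path} for every 'index' line in a .loc file."""
    builds = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        # Assumed layout: the literal word 'index', the build, the FASTA path.
        if len(fields) >= 3 and fields[0] == "index":
            builds[fields[1]] = fields[2]
    return builds


# Hypothetical file contents; the paths are placeholders.
example = io.StringIO(
    "# index<TAB>build<TAB>path to FASTA\n"
    "index\thg19\t/data/genomes/hg19/sam_index/hg19.fa\n"
    "index\tmm9\t/data/genomes/mm9/sam_index/mm9.fa\n"
)
available = builds_in_loc(example)
print("hg19" in available, "dm3" in available)
```

If the build used for the Cufflinks run is missing from the file, that would fit the guess above about bias correction.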

Best,
J. 

On Jun 9, 2011, at 4:44 AM, 吳正華 wrote:

 Hi galaxy dev team:
  
 I just installed galaxy on my ubuntu box and tried to do RNA-seq analysis 
 according to Jeremy's excellent tutorial
  
 http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise
  
 However, I encountered the following error messages when I tried to run 
 Cufflinks in Galaxy:
  
 Dataset generation errors
 Dataset 121: Cufflinks on data 92: gene expression
 Tool execution generated the following error message:
 Error running cufflinks. [Errno 2] No such file or directory: 
 'transcripts.gtf'
 The tool produced the following additional output:
 cufflinks v1.0.3
 cufflinks -q --no-update-check -I 30 -F 0.05 -j 0.05 -p 4 -b
  
 how should I solve this problem?
  
 Thanks in advance.
  
  
 Best Regards,
  
 John Wu
  

Re: [galaxy-user] Galaxy Help: Extract sequences from [gtf file] + [genome FASTA file]

2011-05-12 Thread Jeremy Goecks
Edge,

Please send questions like this to the galaxy-user mailing list, where many 
people see your email and can help you and/or benefit from it. I've cc'd the 
list for this reply.

The thread you linked to is out of date. To get sequences for the features in a 
GTF file, you can use the 'Extract Genomic DNA' tool and set the option 
'Interpret features when possible' to Yes. To get sequences for Cufflinks 
transcripts, use the transcripts.gtf as input to the tool.
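
For readers who want to see what "sequence for a GTF feature" means outside Galaxy, here is a minimal sketch. It is not the 'Extract Genomic DNA' implementation: it handles one GTF line at a time, and the function names and toy data are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def feature_sequence(gtf_line, genome):
    """Return the sequence for one GTF line.

    GTF coordinates are 1-based and inclusive, so start - 1 is the
    0-based slice start and end is the exclusive slice end.
    """
    fields = gtf_line.rstrip("\n").split("\t")
    chrom, strand = fields[0], fields[6]
    start, end = int(fields[3]), int(fields[4])
    seq = genome[chrom][start - 1:end]
    # Features on the minus strand are reported as the reverse complement.
    return reverse_complement(seq) if strand == "-" else seq


# Toy data, made up for illustration.
genome = {"chr1": "AAACCCGGGTTT"}
plus = 'chr1\tCufflinks\texon\t4\t6\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
minus = 'chr1\tCufflinks\texon\t8\t11\t.\t-\t.\tgene_id "G2"; transcript_id "T2";'
plus_seq = feature_sequence(plus, genome)
minus_seq = feature_sequence(minus, genome)
print(plus_seq, minus_seq)
```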

Best,
J.


On May 12, 2011, at 3:08 AM, Edge Edge wrote:

 
 I just read through the post at the following link, 
 http://lists.bx.psu.edu/pipermail/galaxy-user/2011-February/001934.html
 I'm facing the same problem as well.
 I would like to extract the transcripts assembled by Cufflinks.
 How do I use my output files from Tophat and Cufflinks in 
 Galaxy?
 These are the output files I have right now:
 junctions.bed
 insertions.bed
 deletions.bed
 accepted_hits.bam
 human_reference_genome.fasta
 transcripts.gtf
 isoforms.fpkm_tracking
 genes.fpkm_tracking
 
 Sorry, I am a bit confused by the explanation that you gave to 
 Karen: in order to get the sequence data for transcripts in a Cuff* GTF 
 file, you'll want to select for only exons (use Galaxy's 'Extract Features' 
 tool) and then use the resultant dataset as input to Extract.
 
 Thanks a lot for your advice.
 
 best regards
 edge
 Master Student
 UTAR Malaysia


Re: [galaxy-user] question about Filtering Cufflink files

2011-05-09 Thread Jeremy Goecks
Jagat,

First, a couple housekeeping issues:

(a) the questions you're asking are better suited to the galaxy-user list 
(questions about using Galaxy and performing analyses) rather than galaxy-dev 
(questions about installing Galaxy locally and tool development), so I've moved 
this thread to galaxy-user;

(b) please start new threads when appropriate rather than replying to older 
threads as this makes threads shorter and more focused.

Onto your questions:

 I have another question. When I filter the gene list, the filtered list has 
 multiple rows per gene. Shouldn't I have one gene per row? I have attached 
 a snapshot of the output, but I am not sure whether the Galaxy server will 
 accept it. I did see the discussion on another forum:
 http://seqanswers.com/forums/showthread.php?t=8830

GTF files have multiple lines per feature, so your output is reasonable.
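
To make that concrete, the sketch below counts GTF lines per gene_id. The lines are toy data and the attribute regular expression is a simplification that assumes quoted values, so treat it as an illustration only.

```python
import re
from collections import Counter

ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(column9):
    """Turn 'gene_id "G1"; transcript_id "T1";' into a dict."""
    return dict(ATTRIBUTE.findall(column9))


# Toy GTF lines, made up for illustration: one gene, two transcripts.
gtf_lines = [
    'chr1\tCufflinks\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t600\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]

# Every transcript and exon line carries the same gene_id.
lines_per_gene = Counter(
    parse_attributes(line.split("\t")[8])["gene_id"] for line in gtf_lines
)
print(lines_per_gene)
```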


 which suggests possible complications in getting one gene per row. My 
 next question is: in that scenario, what is the best way of representing 
 one FPKM value per gene? Should we take the average FPKM per gene? I think 
 the gene file is still giving transcript FPKM values, but these values are 
 different from the previous file filtered by transcript id.

As Vasu noted, this is an ongoing area of research. For some experiments, it 
may be reasonable to group alternatively-spliced isoforms of the same gene and 
jointly estimate FPKM, and for others it may not. Fortunately, if you do want 
to group transcripts to get gene FPKM values, Cuffdiff does this for you: see 
its gene FPKM expression file.

Best,
J.

[galaxy-user] Filter Tool

2011-05-09 Thread Jeremy Goecks
(Starting new thread on galaxy-user.)

Jagat,

It depends on which filter tool you're using and which dataset you're filtering. 
There is a generic filter tool that can be used to filter Cuffdiff tabular 
files on either FPKM values or differential expression tests. There is also a 
tool for filtering GTF files based on a Cuffdiff expr dataset. It sounds like 
you may be confusing either the tools or the inputs.
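
As a rough picture of what filtering a tabular file on status and p value amounts to, here is a sketch outside Galaxy. The table is toy data with only a few of the Cuffdiff columns, and the 0.05 cutoff is a hypothetical choice, not a recommendation.

```python
import csv
import io

# Toy Cuffdiff-style table; all values are made up for illustration.
table = io.StringIO(
    "test_id\tstatus\tp_value\tsignificant\n"
    "XLOC_A\tOK\t0.010\tyes\n"
    "XLOC_B\tOK\t0.470\tno\n"
    "XLOC_C\tNOTEST\t0.001\tno\n"
)


def keep_row(row, max_p_value=0.05):
    """Keep rows whose test ran and whose p_value is below the cutoff."""
    # Compare p_value as a number; comparing the raw text would be wrong.
    return row["status"] == "OK" and float(row["p_value"]) < max_p_value


reader = csv.DictReader(table, delimiter="\t")
kept = [row["test_id"] for row in reader if keep_row(row)]
print(kept)
```

Note that the header row is consumed by the reader here; a filter applied to a file with a header line needs to account for it in some way.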

If after double-checking you're still having problems with filtering, please 
put together a short list of your analysis steps and share your history with 
me, and I can take a look.

Thanks,
J.

 Further to my question, it appears that there is some problem with the filter 
 option:
 When I use the isoform/gene expr file as is, it works fine, but when I filter 
 these files on a parameter such as status (whether the test was successful) or 
 p value, it returns an empty file. The way I am saving the file is: filter the 
 expr file, save it as a txt file, and upload it back into Galaxy.
 Any suggestions?
  
 Jagat
 
 
 On Tue, May 3, 2011 at 3:08 AM, shamsher jagat kanwar...@gmail.com wrote:
 Jeremy,
  
 I have been trying to follow the steps for filtering Cufflinks output files 
 that you described in one of the previous messages 
 (http://gmod.827538.n3.nabble.com/Re-downstream-analysis-of-cuffdiff-out-put-td2836457.html):
  
 I have shared my history with you, but in summary:
  
 File 35: Filter GTF data by attribute values list on data 11 (combined 
 GTF) and data 33 (the gene expr file). Shouldn't this have one 
 gene per row? It does not.
 
 File 39: Filter GTF data by attribute values list on data 11 and data 38 
 (Cuffdiff splicing expr) failed. I would assume that it should filter on 
 the basis of TSS id. The error message is:
 
 Traceback (most recent call last):
   File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 67, in <module>
     filter( gff_file, attribute_name, ids_file, output_file )
   File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 57, in filter
     if attributes[ attribute_name ] in ids_dict:
 KeyError: 'tss_id'
 
 40 : Filter GTF data by attribute list on data 11 and 34 (tss group exp) 
 failed and error message is:
 
 Traceback (most recent call last):
   File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 67, in <module>
     filter( gff_file, attribute_name, ids_file, output_file )
   File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 57, in filter
     if attributes[ attribute_name ] in ids_dict:
 KeyError: 'tss_id'
  
 I would consider that if one gene has different ids then there is splicing.
 
 In contrast, the isoform file with transcript id is working fine (File 20).
 
 On a different note, can I convert a GTF file to a tab-delimited txt file? I 
 tried to convert file 11 to txt (via Edit attributes) but the file is 
 not properly formatted, especially the p_id and TSS id columns. Am I doing 
 something wrong?
 
 Thanks.
 
  
 

Re: [galaxy-user] RNA seq analysis

2011-05-07 Thread Jeremy Goecks
Sumathy,

It sounds like you're on the right track. To visualize data for a custom build 
in Trackster, you need to create a custom build and use that in Trackster:

(1) using the top tabs in Galaxy, go to User -> Custom Builds;
(2) add a new build with the length info as follows:
contig_name length

Important note: you'll need to make sure that your contig name matches the one 
used in your fasta file. This is my best guess about what's causing problems 
for you.

(3) Create a Trackster visualization using the custom build and add your 
dataset.
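
If it helps with step (2), the sketch below derives 'contig_name length' pairs from a FASTA file so that the names are taken from the same file you mapped against. It assumes the contig name is the text between '>' and the first whitespace in the header; whether Galaxy uses exactly that convention is an assumption, and the sequence is toy data.

```python
import io


def contig_lengths(fasta_handle):
    """Yield (contig_name, length) for each record in a FASTA file.

    Assumption: the name is the text between '>' and the first whitespace.
    """
    name, length = None, 0
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


# Toy FASTA with a single small contig, made up for illustration.
fasta = io.StringIO(">virus1 small genome\nACGTACGT\nACGT\n")
lengths = list(contig_lengths(fasta))
for contig_name, contig_length in lengths:
    # One 'contig_name length' pair per line, as in step (2).
    print(contig_name, contig_length)
```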

Let us know if you have more questions/problems.

Thanks,
J.

On May 6, 2011, at 10:43 PM, puvan...@umn.edu wrote:

 
 
 Hi
 
 I may be doing this the wrong way. I clicked Trackster and added the custom 
 build genome. Since it is a very small genome (~2kb), I treated it as a 
 single contig. Then I clicked 'add tracks' and added my data file, but I got 
 the message 'no data for this contig'. Whenever I have used built-in genomes I 
 have not had any problem. I guess I am doing something wrong here.
 
 
 Sumathy
 
 
 
 
 
 
 
 
 
 
 On May 6 2011, Jeremy Goecks wrote:
 
 Sumathy,
 
 What kind of problems are you having with Trackster?
 
 J.
 
 On May 6, 2011, at 8:30 PM, puvan...@umn.edu wrote:
 
 Hello
  I was able to run RNA seq data against a custom build genome. How can I 
 visualize the results. I tried via trackster and unfortunately I couldn't. 
 Can you help me?
 Thanks
 Sumathy
 
 
 
 -- 
 Sumathy Puvanendiran
 Graduate student
 
 




Re: [galaxy-user] Normalization and plotting of RPKM/FPKM after cufflink

2011-04-20 Thread Jeremy Goecks
Vasu,

Here are the steps to create this visualization; this is relatively new 
functionality, and you'll want to use our test server ( 
http://test.g2.bx.psu.edu/ ) for now.

(1) Create a new visualization from the main menu: Visualization -> New Track 
Browser and choose your genome build.
(2) Add your Cufflinks GTF files to the visualization using the 'Add Tracks' 
button in the upper right of the visualization. (Adding the Tophat reads and/or 
annotation tracks might prove useful as well.)
(3) Zoom in (use the button at the top, double click on a point, or drag to 
select an area on the genome coordinates at the top) until the track's menu has 
the option 'Show filters' and choose this option to show filters.
(4) Once filters are visible, you should be able to drag the slider to 
dynamically filter transcripts.

Here's an example visualization of some mapped Tophat reads and Cufflinks 
transcripts that you can try out:

http://test.g2.bx.psu.edu/u/jeremy.goecks/v/assembly-of-h1-hesc-rna-seq-data

We're continuing to refine and extend this functionality and the Galaxy Track 
Browser in general; questions/comments/suggestions are most welcome.

Best,
J.


On Apr 19, 2011, at 9:28 AM, vasu punj wrote:

 
 Thanks Jeremy,
  
 This appears to be a useful function. Could you please list the steps in the 
 workflow to achieve the above visualization, or alternatively point me to the 
 URL where it is summarized? I believe it will take the Tophat output BAM 
 file and the fpkm tracking file. I tried, but I don't see the track browser 
 unless I convert to GTF file format. Further, if you can point me to how to get 
 the slider window function shown in the snapshot, that would be great. Good 
 work Jeremy!
  
 Thanks.
  
 Vasu
  
  
  
 --- On Sun, 4/17/11, Jeremy Goecks jeremy.goe...@emory.edu wrote:
 
 From: Jeremy Goecks jeremy.goe...@emory.edu
 Subject: Re: Normalization and plotting of RPKM/FPKM after cufflink
 To: vasu punj pu...@yahoo.com
 Cc: galaxy-u...@bx.psu.edu
 Date: Sunday, April 17, 2011, 3:45 PM
 
 Vasu,
 
  I want to include the following discussion in my message regarding the use of 
  BAM files from Tophat to visualize reads in IGV, Galaxy, or other tools.
  I want to find out if I can plot RPKM/FPKM normalized values
  after running differential analysis in Cufflinks.
 
 Galaxy has a number of tools for analyzing numerical data; look under the 
 menu items Statistics and Graph/Display Data for useful tools. If you're 
 looking to plot FPKM values in addition to mapped reads from Tophat and 
 Cufflinks transcripts, the Galaxy Tracks Browser might prove useful as it has 
 filtering functionality so that you can move a slider to show/hide data based 
 on FPKM values; it's often useful to use the sliders for FPKM measures to get 
 a sense of your data. See the attached screenshot for an example.
 
 
 Best,
 J.


Re: [galaxy-user] get wig file after tophat

2011-04-20 Thread Jeremy Goecks
Hi Ying,

You're in luck because I've been working with genome browsers lately, so I 
think I can help you address your problem. What you're looking for is a 
visualization of a coverage histogram for the BAM reads produced by Tophat, 
yes? 

It turns out that some genome browsers provide this automatically as part of 
their solution for visualizing BAM files b/c BAM files tend to be very large 
and hence visualizing aggregated data is often the best solution. Both IGV and 
the Galaxy Trackster Browser support this functionality. I think you'll have to 
do some simple file conversions to get the display you want in IGV; you can 
check out the IGV documentation or perhaps Jim can help. I'm not sure if IGB 
supports this visualization mode for BAM; Ann can chime in with additional 
information.

The Galaxy Track Browser supports coverage histograms when viewing large 
regions. When zoomed in, the reads are typically displayed individually, 
although there is a (very beta) option to create a histogram for the visible 
set of reads; this option may not work well (yet!) as Tophat reads often have 
large gaps.

The top track in this visualization shows a coverage histogram for a set of 
Tophat reads:

http://test.g2.bx.psu.edu/u/jeremy.goecks/v/assembly-of-h1-hesc-rna-seq-data 

Please see my previous email to Vasu for details about setting up a 
visualization in the Galaxy Track Browser.

Best,
J.


 On 4/20/11 5:16 PM, Ying Zhang ying.zhang.yz...@yale.edu wrote:
 
 Dear Ann and Jeremy:
 
 We had this discussion a long time ago, and I am sorry to bring it up here
 again. As Ann said, can we add this tool, which converts BAM into a wig file,
 to Galaxy? Or make a workflow to generate a wig file from a BAM file generated
 by Tophat? That way we could easily get a wig file from Galaxy and view it in
 IGB. I know this may seem unnecessary for the purpose of statistical analysis,
 but if we can see the coverage with IGB, it is sometimes helpful for picking up
 interesting points quickly for specific genes. This may seem an old-fashioned
 way, but my boss is a big fan of using IGB to view expression files (wig or sgr
 files) and do some analysis. Thanks a lot!
 
 Best
 
 Ying
 
 Quoting Jeremy Goecks jeremy.goe...@emory.edu:
 
 Hi all,
 
 Ann is correct - Tophat does not produce .wig files when run anymore.
 However, it's fairly easy to use Galaxy to make a wiggle-like
 coverage file from a BAM file:
 
 (a) run the pileup tool on your BAM to create a pileup file;
 (b) cut columns 1 and 4 to get your coverage file.
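
For anyone doing step (b) outside Galaxy, a minimal sketch follows. It assumes the six-column pileup layout (reference name, position, reference base, coverage, read bases, qualities), in which column 4 is the coverage; if your pileup has a different layout, the column numbers will differ. The function name and the toy lines are made up for illustration.

```python
import io


def coverage_from_pileup(pileup_handle):
    """Yield (reference_name, coverage) from columns 1 and 4 of a pileup."""
    for line in pileup_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        # Columns are 1-based in the text above, 0-based here.
        yield fields[0], int(fields[3])


# Toy pileup lines, made up for illustration.
pileup = io.StringIO(
    "chr1\t100\tA\t3\t...\tIII\n"
    "chr1\t101\tC\t5\t.,.,.\tIIIII\n"
)
coverage = list(coverage_from_pileup(pileup))
print(coverage)
```

Keeping column 2 as well would retain the position of each coverage value, which most downstream plotting needs.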
 
 A final note: it's often difficult to visualize coverage files
 because they're so large. You might be better off visualizing the BAM
 file and using the coverage file for statistics.
 
 Best,
 J.
 
 Hello,
 
 I think I know the answer (sort of) to this question.
 
 This may be because newer versions of tophat stopped running the wiggles
 program, which is still part of the tophat distribution and is the program
 that makes the coverage.wig file.
 
 A later version of tophat might bring this back, however - there's a note to
 this effect in the tophat python code.
 
 So if you can run wiggles, you can make the coverage.wig file on your own.
 
 A student here at UNC Charlotte (Adam Baxter) made a few changes to the
 wiggles source code that would allow you to use it with samtools to make a
 coverage.wig file from the accepted_hits.bam file that TopHat creates.
 
 If you (or anyone else) would like a copy, please email Adam, who is cc'ed
 on this email.
 
 We would be happy to help add it to Galaxy if this would be of interest to
 you or other Galaxy users.
 
 If there is any way we can be of assistance, please let us know!
 
 Very best wishes,
 
 Ann Loraine
 
 
 On 2/21/11 3:39 PM, Ying Zhang ying.zhang.yz...@yale.edu wrote:
 
 Hi:
 
 I am using tophat in galaxy to analyze my paired-end RNA-seq data and find
 that after the tophat analysis we can no longer get the wig file, which we
 used to be able to get. Do you have any idea how to still get the wig
 file after tophat analysis? Thanks a lot!
 
 Best
 
 Ying Zhang, M.D., Ph.D.
 Postdoctoral Associate
 Department of Genetics,
 Yale University School of Medicine
 300 Cedar Street,S320
 New Haven, CT 06519
 Tel: (203)737-2616
 Fax: (203)737-2286
 
 --
 Ann Loraine
 Associate Professor
 Dept. of Bioinformatics and Genomics, UNCC
 North Carolina Research Campus
 600 Laureate Way
 Kannapolis, NC 28081
 704-250-5750

Re: [galaxy-user] downstream analysis of cuffdiff out put

2011-04-18 Thread Jeremy Goecks

Vasu,

Please reply to the mailing list as emails to individual Galaxy  
developers often get lost, and there are others on the list that might  
be able to help you or benefit from this discussion.


Now, to your question: you're using the wrong GFF filtering tool,
which is an easy mistake to make as there are many of them. You want
to use Filter and Sort -> GFF -> Filter GTF data by attribute values
list. Using this tool, I was able to filter dataset 11 (a GTF file
produced by Cuffcompare) using a Cuffdiff isoform expression file
(dataset 10) on transcript_id. I've shared the modified history with
you.


Best,
J.

On Apr 18, 2011, at 3:29 PM, vasu punj wrote:


Hi Jeremy,

I have been trying to use the tool mentioned in this message.
I have a two-sample comparison (6 and 5) and have run Cufflinks/
Cuffcompare/Cuffdiff. I have filtered the files on c12, i.e. for
significance, and the result is file 10, uploaded as B_A Cuffdiff isoform
expr filtered.txt
I uploaded the second file, 11,
B_A_Homo_sapiens.GRCh37.60.clean.combined, which is a combined GTF
file generated by Cuffcompare.


When I tried to run the filter on the combined transcript file using
the combined GTF (11) as Cufflinks assembled transcripts and the filtered
Cuffdiff isoform expr file as Cuffcompare tracking file, with sample number
2, it returned an empty file (12).
Then, thinking that perhaps it is the tracking file that I have
to use instead of the combined GTF,
I tried the B-A combined tracking file in place of the combined GTF file, but
it only shows up under Cuffcompare tracking file. It may not be right,
but I used File 13 as
the tracking file with the combined GTF as assembled transcripts; it still
returned empty output.

I have also shared history with you.

Would you like to point me what is going on here?
Thanks.


Vasu
--- On Mon, 4/11/11, Jeremy Goecks jeremy.goe...@emory.edu wrote:

From: Jeremy Goecks jeremy.goe...@emory.edu
Subject: Re: [galaxy-user] downstream analysis of cuffdiff out put
To: shamsher jagat kanwar...@gmail.com
Cc: galaxy-user galaxy-user@lists.bx.psu.edu
Date: Monday, April 11, 2011, 9:04 AM

On Thu, Mar 10, 2011 at 7:55 AM, Jeremy Goecks jeremy.goe...@emory.edu 
 wrote:

Jagat,
Just like any mRNA-seq experiment to achieve following objectives:
1.   Reconstruct  all transcripts of a particular gene and  
corresponding Cuffdiff  significantly expressed transcripts as  
called by cuffdiff.

2.   What are different isoforms
3.   Location of splicing

From various output files which unique ID can be matched  from one  
file say Cuffdiff.expr (transcript/ isoform/Splicing)  to  other  
file - transcript.gtf  corresponding to each sample or combined  
GTF file.
I've got a script that does this for the cuffdiff isoform  
expression testing file and a GTF file; I'll wrap it up and add it  
to Galaxy in the next couple weeks. It would probably be useful to  
have similar scripts for the other expression testing files as  
well. Also, it would be nice to be able to take the FPKM values  
generated by Cuffdiff and attach them to their respective  
transcripts as attributes.


Hello all,

I've added a tool called 'Filter GTF file by attribute values list'
to the galaxy-central code repository. This tool is available on our
test server ( http://test.g2.bx.psu.edu/ ) at Filter and Sort ->
GFF -> Filter GTF data by attribute values list and will be
available on our main server in the next few weeks.


As expected, this tool filters a GTF file based on a list of  
attribute values--or filters using a tabular file where attribute  
values are first column, as is the case for Cuffdiff output files.  
Potential attributes that can be filtered on include transcript_id,  
gene_id, tss_id, and p_id; conveniently, these are the IDs that  
Cuffdiff uses in its output files.


Here's an example workflow:

(1) Run Cufflinks/compare/diff
(2) Filter Cufflinks isoform differential expression file for  
transcripts that are differentially expressed; in other words,  
filter for c12=='yes'
(3) Use 'Filter GTF data by attribute values list' to filter  
Cuffcompare combined transcripts using the filtered file from step  
(2) as the attribute values list and, voila, you have a GTF file of  
the differentially expressed transcripts that you can view in your  
favorite genome browser.


Hope this helps; feedback is always welcome.

Best,
J.


Re: [galaxy-user] Nucleotide analysis - GC percentage

2011-04-14 Thread Jeremy Goecks
 Now why does a tool search on the public Galaxy instance for GC
 not suggest this tool?
 
 Name: geecee
 Description: Calculates fractional GC content of nucleic acid sequences
 
 Does this mean the description isn't searched? It would seem like
 a sensible idea to me to include that...
 
 Searching for geecee works, but unless you're familiar with this
 EMBOSS tool no-one will think of that.


Peter,

The tool search doesn't start until you type in three characters, so typing 
'GC' does not initiate a search. Typing 'gc' followed by a space, or 
'gc content', works. Perhaps a tooltip or help text is needed.
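
The behaviour described above boils down to a length check on the typed text. The sketch below is only an illustration of that rule, not Galaxy's search code; it assumes a trailing space counts as a character, which is consistent with 'gc' plus a space triggering a search.

```python
MIN_QUERY_LENGTH = 3  # threshold described above


def should_search(query):
    """Return True once the typed text is long enough to trigger a search."""
    # Assumption: whitespace counts toward the length.
    return len(query) >= MIN_QUERY_LENGTH


for typed in ("GC", "gc ", "gc content", "geecee"):
    print(repr(typed), should_search(typed))
```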

J.


Re: [galaxy-user] downstream analysis of cuffdiff out put

2011-04-11 Thread Jeremy Goecks
 On Thu, Mar 10, 2011 at 7:55 AM, Jeremy Goecks jeremy.goe...@emory.edu 
 wrote:
 Jagat,
 Just like any mRNA-seq experiment to achieve following objectives:
 1.   Reconstruct  all transcripts of a particular gene and corresponding 
 Cuffdiff  significantly expressed transcripts as called by cuffdiff.
 2.   What are different isoforms
 3.   Location of splicing
 
 From various output files which unique ID can be matched  from one file say 
 Cuffdiff.expr (transcript/ isoform/Splicing)  to  other file - 
 transcript.gtf  corresponding to each sample or combined GTF file.
 
 I've got a script that does this for the cuffdiff isoform expression testing 
 file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple 
 weeks. It would probably be useful to have similar scripts for the other 
 expression testing files as well. Also, it would be nice to be able to take 
 the FPKM values generated by Cuffdiff and attach them to their respective 
 transcripts as attributes.

Hello all,

I've added a tool called 'Filter GTF file by attribute values list' to the 
galaxy-central code repository. This tool is available on our test server ( 
http://test.g2.bx.psu.edu/ ) at Filter and Sort -> GFF -> Filter GTF data by 
attribute values list and will be available on our main server in the next few 
weeks.

As expected, this tool filters a GTF file based on a list of attribute 
values--or filters using a tabular file where attribute values are first 
column, as is the case for Cuffdiff output files. Potential attributes that can 
be filtered on include transcript_id, gene_id, tss_id, and p_id; conveniently, 
these are the IDs that Cuffdiff uses in its output files. 

Here's an example workflow:

(1) Run Cufflinks/compare/diff
(2) Filter Cufflinks isoform differential expression file for transcripts that 
are differentially expressed; in other words, filter for c12=='yes'
(3) Use 'Filter GTF data by attribute values list' to filter Cuffcompare 
combined transcripts using the filtered file from step (2) as the attribute 
values list and, voila, you have a GTF file of the differentially expressed 
transcripts that you can view in your favorite genome browser.
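
The last two steps of the workflow above can be sketched outside Galaxy as follows. This is an illustration under stated assumptions, not the tool's code: the data are toy rows, the column positions follow the c12=='yes' convention above (column 12 is the significance flag, column 1 the ID), and lines that lack the chosen attribute are skipped rather than treated as an error.

```python
import re

ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def significant_ids(cuffdiff_rows):
    """Return the first-column IDs of rows whose 12th column is 'yes'."""
    ids = []
    for row in cuffdiff_rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) >= 12 and fields[11] == "yes":
            ids.append(fields[0])
    return ids


def filter_gtf(gtf_lines, attribute_name, wanted_values):
    """Keep GTF lines whose attribute_name value is in wanted_values."""
    wanted = set(wanted_values)
    kept = []
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        # .get() returns None when the attribute is absent on this line.
        if attributes.get(attribute_name) in wanted:
            kept.append(line)
    return kept


# Toy data, made up for illustration.
cuffdiff_rows = [
    "T1\t-\tchr1:100-900\tq1\tq2\tOK\t10\t40\t1.386\t-3.2\t0.001\tyes",
    "T2\t-\tchr1:100-900\tq1\tq2\tOK\t10\t11\t0.095\t-0.3\t0.76\tno",
]
gtf_lines = [
    'chr1\tCufflinks\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t600\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\tCufflinks\texon\t950\t990\t.\t+\t.\tgene_id "G2";',
]

ids = significant_ids(cuffdiff_rows)
differential = filter_gtf(gtf_lines, "transcript_id", ids)
print(ids, len(differential))
```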

Hope this helps; feedback is always welcome.

Best,
J.

Re: [galaxy-user] RNA seq analysis and GTF files

2011-04-07 Thread Jeremy Goecks
David, can you please share your history with me and I'll take a look  
(History Options -> Share/Publish -> Share with User -> my email)?


Thanks,
J.

On Apr 7, 2011, at 3:23 PM, David K Crossman wrote:


Hello!

I would like to ask a question related to the thread below.  I ran
into the same issues as below and was unaware of having to swap some
columns around in the GTF file.  So, after swapping the gene name
from the complete table (name2 value, column 12) into the GTF file's
gene_id value (which by default is the same as transcript_id), I
uploaded this patched file (mm9) into Galaxy and ran Cufflinks,
CuffCompare and CuffDiff using this patched GTF file as the reference
annotation.  For both Cufflinks and CuffCompare, the gene_id was
present in their respective columns.  The problem I have encountered
now is that in all of the output files from CuffDiff, the gene_id
column is blank (contains a "-", as shown below).  This example is
from the CuffDiff gene expression output file:


test_id  gene  locus                 sample_1  sample_2  status  value_1  value_2  ln(fold_change)  test_stat  p_value   significant
XLOC_01  -     chr1:4797973-4836816  q1        q2        OK      73.1908  82.1567  0.115559         -0.71896   0.472168  no
XLOC_02  -     chr1:4847774-4887990  q1        q2        OK      81.7264  53.1165  -0.43089         2.44474    0.014496  no
XLOC_03  -     chr1:5073253-5152630  q1        q2        OK      408.289  333.749  -0.20159         2.73173    0.0063    no
XLOC_04  -     chr1:5578573-5596214  q1        q2        NOTEST  2.34764  4.79772  0.71473          -0.89735   0.369532  no

What am I doing wrong?  I am interested in the  
differentially expressed genes in this RNA-Seq dataset (as well as  
calling variants, which is my next step, but want to get this  
answered first before moving on).  Any info, suggestions or help  
would be greatly appreciated.


Thanks,
David


-Original Message-
From: galaxy-user-boun...@lists.bx.psu.edu [mailto:galaxy-user-boun...@lists.bx.psu.edu] On Behalf Of Jeremy Goecks

Sent: Friday, April 01, 2011 8:47 AM
To: ssa...@ccib.mgh.harvard.edu
Cc: galaxy-user
Subject: Re: [galaxy-user] RNA seq analysis and GTF files



On Mar 31, 2011, at 12:30 PM, ssa...@ccib.mgh.harvard.edu ssa...@ccib.mgh.harvard.edu wrote:


 Hi Jeremy,
 I used your exercise to perform an RNA-seq analysis. First I  
encountered a problem where the gene IDs were missing from the  
results. Jen from the Galaxy team suggested this:


 Yes, the team has taken a look and there are a few things going on.

 The first is that when running the Cuffcompare program, a  
reference annotation file in GTF format should be used in order to  
obtain the same results as in Jeremy's exercise. This seemed to be  
missing from your runs, which resulted in badly formatted output  
that later resulted in a poor result when Cuffdiff was used.


 The second has to do with the reference GTF file itself. For the  
best results, the GTF file must have the gene_id attribute defined  
in the 9th column of the file and the chromosome names must be in  
the same format as the genome native to Galaxy. Depending on the  
source of the reference GTF, one of these may need to be adjusted.  
Chromosome names can be adjusted using Galaxy's Text Manipulation  
tools. The gene_id attribute would need to be adjusted prior to  
loading into Galaxy.


 For mm9, using the Get Data - UCSC Main table browser tool can  
help you to obtain all of the raw data necessary to create a  
complete GTF file with a gene_id identifier. Extract data from the  
track RefSeq Genes and output the primary data table refGene  
twice - first in GTF format, then again as the complete table in  
tabular format (not BED). Then, using your own tools, swap in the  
gene name from the complete table (name2 value, column 12) into the  
GTF file's gene_id value (which by default is the same as  
transcript_id). Upload and the tools will function as intended.


 The team is aware of the issues associated with GTF source files  
and is discussing solutions. Any changes to native data content will  
be reported to the mailing list in a News Brief or other  
communications.


 Our apologies for the inconvenience! Thanks for using Galaxy and
 please let us know if we can help again,

 Best,

 Jen
 Galaxy team


 I followed the directions (or at least I think I did) and things seemed to 
 work better but there is one more issue for example in file:
 Galaxy287-[Cuffdiff_on_data_197,_data_197,_and_data_274__isoform_FPKM_tracking].tabular.txt
 The column gene_short_name does not have any names in it. nearest_ref_id does 
 have the gene ID info so I can still interpret the data, but I was wondering 
 if there remains another problem that I'm not aware of with the GTF file.


Slim,

Please send questions to the galaxy-user mailing list (cc'd) rather  
than individual Galaxy team members; there are many people on the  
list that may be able to address your question, and discussions are  
archived for future use as well. Without seeing your

Re: [galaxy-user] RNA seq analysis and GTF files

2011-04-01 Thread Jeremy Goecks


On Mar 31, 2011, at 12:30 PM, ssa...@ccib.mgh.harvard.edu 
ssa...@ccib.mgh.harvard.edu wrote:

 Hi Jeremy, 
 I used your exercise to perform an RNA-seq analysis. First I encountered a 
 problem where the gene IDs were missing from the results. Jen from the Galaxy 
 team suggested this:  
 
 Yes, the team has taken a look and there are a few things going on.
 
 The first is that when running the Cuffcompare program, a reference 
 annotation file in GTF format should be used in order to obtain the same 
 results as in Jeremy's exercise. This seemed to be missing from your runs, 
 which resulted in badly formatted output that later resulted in a poor result 
 when Cuffdiff was used.
 
 The second has to do with the reference GTF file itself. For the best 
 results, the GTF file must have the gene_id attribute defined in the 9th 
 column of the file and the chromosome names must be in the same format as the 
 genome native to Galaxy. Depending on the source of the reference GTF, one of 
 these may need to be adjusted. Chromosome names can be adjusted using 
 Galaxy's Text Manipulation tools. The gene_id attribute would need to be 
 adjusted prior to loading into Galaxy.
 
 For mm9, using the Get Data - UCSC Main table browser tool can help you to 
 obtain all of the raw data necessary to create a complete GTF file with a 
 gene_id identifier. Extract data from the track RefSeq Genes and output the 
 primary data table refGene twice - first in GTF format, then again as the 
 complete table in tabular format (not BED). Then, using your own tools, swap 
 in the gene name from the complete table (name2 value, column 12) into the 
 GTF file's gene_id value (which by default is the same as transcript_id). 
 Upload and the tools will function as intended.
 
 The team is aware of the issues associated with GTF source files and is 
 discussing solutions. Any changes to native data content will be reported to 
 the mailing list in a News Brief or other communications.
 
 Our apologies for the inconvenience! Thanks for using Galaxy and please let 
 us know if we can help again,
 
 Best,
 
 Jen
 Galaxy team
 
 
 I followed the directions (or at least I think I did) and things seemed to 
 work better but there is one more issue for example in file:
 Galaxy287-[Cuffdiff_on_data_197,_data_197,_and_data_274__isoform_FPKM_tracking].tabular.txt
 The column gene_short_name does not have any names in it. nearest_ref_id does 
 have the gene ID info so I can still interpret the data, but I was wondering 
 if there remains another problem that I'm not aware of with the GTF file.

Slim,

Please send questions to the galaxy-user mailing list (cc'd) rather than 
individual Galaxy team members; there are many people on the list that may be 
able to address your question, and discussions are archived for future use as 
well. Without seeing your analysis, I'd suggest trying two things:

(1) Provide a gene annotation reference file to Cufflinks as well as Cuffcompare 
and Cuffdiff; in other words, you'll want to do guided assembly.
(2) Try using an Ensembl GTF, which has the gene name in the attributes.

I think (2) is more likely to generate the results you want, but there are 
many known problems in using Ensembl GTFs with Cufflinks/compare/diff.
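
If you stay with the UCSC refGene GTF instead, the gene_id swap described in the quoted message can be sketched in Python along these lines. This is only an illustration under stated assumptions: the complete table is tab-separated with the gene name (name2) in column 12 as the message says, the transcript name is assumed to be in column 1, and the function names are invented here. Check the column positions against the header of the table you actually download.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]*)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def load_gene_names(table_lines, name_col=0, name2_col=11):
    """Map transcript name -> gene name from a tab-separated refGene table.

    Column positions are 0-based; name2_col=11 is "column 12" from the
    message, and name_col=0 is an assumption to verify against your file.
    """
    names = {}
    for line in table_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > max(name_col, name2_col):
            names[fields[name_col]] = fields[name2_col]
    return names


def patch_gene_id(gtf_line, gene_names):
    """Replace gene_id with the gene name looked up via transcript_id.

    Lines that are not 9-column GTF records, or whose transcript is not in
    the table, are returned unchanged.
    """
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return gtf_line
    match = TRANSCRIPT_ID.search(fields[8])
    if not match or match.group(1) not in gene_names:
        return gtf_line
    gene = gene_names[match.group(1)]
    fields[8] = GENE_ID.sub(lambda m: 'gene_id "%s"' % gene, fields[8], count=1)
    return "\t".join(fields) + "\n"
```

The patched lines can then be written to a new file and uploaded to Galaxy as the reference annotation.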

Good luck,
J.


Re: [galaxy-user] Trouble with RNAseq analysis

2011-03-30 Thread Jeremy Goecks
Cristian,

Please share your history with me (History Options -- Share/Publish -- Share 
with User -- my email) and I'll take a look.

Thanks,
J.

On Mar 30, 2011, at 10:48 AM, Cristian Rojas wrote:

 Hi everybody, I am trying to analyze the differential expression between two 
 RNAseq samples, but I ran into a lot of trouble aligning my reads. I will 
 describe what I did. First I groomed the FastQ files (2). Then I uploaded the 
 Sorghum genome and aligned the reads to it with Tophat. After that, I tried to 
 use Cufflinks with the BAM file from Tophat, using an uploaded GTF file as the 
 annotation file and the Sorghum genome, but I received an error message in the 
 three outputs of Cufflinks. I tried to align against the brand new Maize genome 
 (now in Galaxy), and got the same messages. I also converted the BAM file to 
 SAM, but got the same result. Any advice? What was wrong?
 Thanks in advance.
 Cristian
 
 




Re: [galaxy-user] Trouble with RNAseq analysis

2011-03-30 Thread Jeremy Goecks
Cristian,

The '>' is a formatting character; what needs to match is the string after the 
'>' in the genome file and the entries in the contig column of your GTF. Your 
GTF is quite different than your genome file; your genome file has 10 contigs 
labeled by number, but your GTF has many, many contig names labeled by numbers 
and names.

For Cufflinks to work, you can either (a) turn off bias correction or (b) 
restrict entries in your GTF to those that match your reference genome. 
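
As a rough illustration of option (b), here is a small Python sketch that compares contig names and keeps only matching GTF entries. It assumes the contig name is the first whitespace-delimited token after the '>' on each FASTA header line and the first tab-separated column of the GTF; the function names are hypothetical and this is not part of Galaxy or Cufflinks.

```python
def fasta_contigs(fasta_lines):
    """Collect contig names: the first token after '>' on each header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gtf_contigs(gtf_lines):
    """Collect the contig names used in the first column of a GTF."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def restrict_gtf(gtf_lines, allowed):
    """Keep only GTF records whose contig name is in `allowed`."""
    return [
        line
        for line in gtf_lines
        if not line.startswith("#") and line.split("\t", 1)[0] in allowed
    ]
```

Computing gtf_contigs(...) - fasta_contigs(...) first shows which GTF contig names have no counterpart in the genome, which is a quick way to see how large the mismatch is before filtering.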

Finally, please reply all to emails so that all emails remain on list for 
archival and community purposes.

Thanks,
J. 

On Mar 30, 2011, at 12:02 PM, Cristian Rojas wrote:

 Thanks Jeremy. But in genome fasta files, very often each chromosome is 
 represented by a sequence whose name follows a '>'. So it is not possible to 
 match the contig names in the GTF with the '>' names in the genome fasta. 
 What must I do?
 Cristian
 
 
 
 - Original Message 
 From: Jeremy Goecks jeremy.goe...@emory.edu
 To: Cristian Rojas cristianroja...@yahoo.com.ar
 CC: galaxy-user@lists.bx.psu.edu
 Sent: Wednesday, March 30, 2011 12:53:50
 Subject: Re: [galaxy-user] Trouble with RNAseq analysis
 
 Cristian,
 
 The contig names in your GTF file don't match those in your reference (fasta) 
 file. In order for Cufflinks to use a reference GTF, its contig names must 
 match those in your reference genome.
 
 Best,
 J.
 
 On Mar 30, 2011, at 11:31 AM, Cristian Rojas wrote:
 
 Thanks Jeremy. I did it.
 Cristian
 
 
 
 - Original Message 
 From: Jeremy Goecks jeremy.goe...@emory.edu
 To: Cristian Rojas cristianroja...@yahoo.com.ar
 CC: galaxy-user@lists.bx.psu.edu
 Sent: Wednesday, March 30, 2011 12:02:47
 Subject: Re: [galaxy-user] Trouble with RNAseq analysis
 
 Cristian,
 
 Please share your history with me (History Options -- Share/Publish -- Share 
 with User -- my email) and I'll take a look.
 
 Thanks,
 J.
 
 On Mar 30, 2011, at 10:48 AM, Cristian Rojas wrote:
 
 Hi everybody, I am trying to analyze the differential expression between two 
 RNAseq samples, but I ran into a lot of trouble aligning my reads. I will 
 describe what I did. First I groomed the FastQ files (2). Then I uploaded the 
 Sorghum genome and aligned the reads to it with Tophat. After that, I tried to 
 use Cufflinks with the BAM file from Tophat, using an uploaded GTF file as the 
 annotation file and the Sorghum genome, but I received an error message in the 
 three outputs of Cufflinks. I tried to align against the brand new Maize genome 
 (now in Galaxy), and got the same messages. I also converted the BAM file to 
 SAM, but got the same result. Any advice? What was wrong?
 Thanks in advance.
 Cristian
 
 




Re: [galaxy-user] Trouble with RNAseq analysis

2011-03-30 Thread Jeremy Goecks


I tried again and got the same problem. I turned off the bias correction  
but kept the GTF file. Maybe this is the problem? I didn't find  
your history.

Thanks



Look for the history I've shared in History Options -- Histories  
Shared with Me. As requested, if you're still having problems, please  
report the problematic dataset by clicking on the bug icon.


Thanks,
J.


Re: [galaxy-user] GTF-to-GFF3

2011-03-22 Thread Jeremy Goecks
Karen,

Sorry for the slow reply. There are no immediate plans to add either 
BED-to-GFF3 or GTF-to-GFF3 converters to Galaxy main or the Galaxy codebase.

However, if you're working with your own Galaxy, you might encourage the Rätsch 
lab to contribute their tools to the Galaxy Tool Shed 
(http://community.g2.bx.psu.edu/); you could then download them from there and 
install them in your own Galaxy. Alternatively, we welcome community 
contributions to the Galaxy codebase, and we'd be happy to incorporate these 
tools if they came with functional tests and test data.

Best,
J. 

On Mar 7, 2011, at 11:34 AM, Karen Tang wrote:

 Hi Galaxy developers,
 
 Any plans on adding a GTF-to-GFF3 format conversion?
 
 This converter is at the Rätsch lab's instance of Galaxy 
 (http://galaxy.tuebingen.mpg.de/). Could it be made more
 widely available?
 
 Karen :)
 Dept of Plant Biology
 University of 
 Minnesota


Re: [galaxy-user] downstream analysis of cuffdiff out put

2011-03-10 Thread Jeremy Goecks
Jagat,

Please send queries such as these to the galaxy-user mailing list (cc'd); there 
are many users on the list who can contribute to this discussion, and there are 
many additional users that will benefit from this discussion.

 I was wondering if you can point me to documentation or a URL that explains 
 how to perform the downstream analysis once we have Cuffdiff output.

In general, I agree that tools are needed to further process 
cufflinks/compare/diff outputs, but I'm not aware of any that are publicly 
available. Let's open this issue up for discussion and see if we can reach a 
consensus about which tools might be useful. Everyone, please feel free to contribute 
ideas/tools; note that the Galaxy Tool Shed is a nice place for sharing tools 
you've built for Galaxy:

http://community.g2.bx.psu.edu/

 Just like any mRNA-seq experiment, I want to achieve the following objectives:
 
 1. Reconstruct all transcripts of a particular gene and the corresponding 
    significantly expressed transcripts as called by Cuffdiff.
 2. Determine the different isoforms.
 3. Determine the location of splicing.
 
 Across the various output files, which unique ID can be used to match one 
 file, say Cuffdiff.expr (transcript/isoform/splicing), to another file: the 
 transcript.gtf corresponding to each sample, or the combined GTF file?
 
 
I've got a script that does this for the cuffdiff isoform expression testing 
file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple 
weeks. It would probably be useful to have similar scripts for the other 
expression testing files as well. Also, it would be nice to be able to take the 
FPKM values generated by Cuffdiff and attach them to their respective 
transcripts as attributes.
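
To make the FPKM idea concrete, here is a minimal Python sketch of what such a script might do; it is not the script mentioned above. It assumes a tab-separated tracking file with a header row, and the column names "tracking_id" and "q1_FPKM" as well as the "FPKM" attribute name are placeholders to adjust to your own files.

```python
import csv
import io
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def load_fpkm(tracking_text, id_column="tracking_id", fpkm_column="q1_FPKM"):
    """Map transcript ID -> FPKM string from a tab-separated tracking file.

    The column names are assumptions; check them against your file's header.
    """
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    return {row[id_column]: row[fpkm_column] for row in reader}


def attach_fpkm(gtf_line, fpkm_by_id, attribute="FPKM"):
    """Append an FPKM attribute to a GTF record, keyed by its transcript_id.

    Records without a matching transcript are returned unchanged.
    """
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return gtf_line
    match = TRANSCRIPT_ID.search(fields[8])
    if not match or match.group(1) not in fpkm_by_id:
        return gtf_line
    attrs = fields[8].rstrip()
    if not attrs.endswith(";"):
        attrs += ";"
    fields[8] = '%s %s "%s";' % (attrs, attribute, fpkm_by_id[match.group(1)])
    return "\t".join(fields) + "\n"
```

With more than one sample, the same function could be called once per FPKM column with a different attribute name for each.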

Best,
J. 

