Re: [galaxy-dev] Tophat problem

2013-05-20 Thread Zain A Alvi
Hi Jen,

Thank you for the information about FASTQ files. It was really helpful.

Lately, I have been getting the following error: Error getting history update 
from this server - Bad Gateway. This occurred after I tried to re-upload some 
pre-aligned and indexed BAM files from NCBI GEO, because I was hoping to 
generate and retrieve FPKM/RPKM values from them.

Unfortunately, my old files are still not available on Galaxy, and I get an 
Internal Server Error when trying to retrieve them, although I can still access 
the workflows for them.

The last weird error is that when I use Cuffdiff, I get an FPKM of 0 with p/q 
values of 1 every time. This should not be the case, as the BAM files are from 
two different organs. It happens for every single gene, which indicates that 
something is wrong. I retrieved the GTF file from the main UCSC site with the 
following settings:

Clade: Insect
Genome: D. pseudoobscura
Group: Genes and Gene Prediction Tracks
Track: FlyBase
Table: FlybaseGene
Output format: GTF

I was wondering whether these settings are fine or whether I should change the 
Group to mRNA or some other setting. Also, the version available on UCSC is the 
old dp3 assembly from 2004, while the latest GFF on FlyBase is version 3.1. Is 
there any way to convert it to a GTF file?

Sorry for so many questions. Thank you again for the great help.

Sincerely,

Zain



From: Jennifer Jackson [j...@bx.psu.edu]
Sent: Tuesday, May 07, 2013 3:21 PM
To: Zain A Alvi
Cc: galaxy-...@bx.psu.edu
Subject: Re: [galaxy-dev] Tophat problem

Hi Zain,

I believe we already worked out the .fastqsanger/grooming part of this question 
in another thread. But for others reading this post, here is a help link:
See FASTQ
http://wiki.galaxyproject.org/Support#Dataset_special_cases
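
As a rough illustration of what grooming addresses: Galaxy's TopHat tool only lists datasets whose datatype is fastqsanger (Phred+33 qualities), so a dataset labelled plain "fastq" does not appear in its input menu. The sketch below is a hypothetical Python helper, not part of Galaxy, that guesses the quality offset of four-line FASTQ records and re-encodes Phred+64 qualities as Phred+33. The offset test is a simple heuristic that ignores old Solexa-scaled data; within Galaxy, the FASTQ Groomer tool (or setting the datatype to fastqsanger when the data is already Phred+33) remains the supported route.

```python
from typing import Iterable, List


def quality_lines(lines: Iterable[str]) -> List[str]:
    """Return the quality string (4th line) of each 4-line FASTQ record."""
    stripped = [line.rstrip("\n") for line in lines]
    return [stripped[i] for i in range(3, len(stripped), 4)]


def guess_offset(quals: Iterable[str]) -> int:
    """Heuristically guess the Phred ASCII offset (33 or 64).

    Phred+33 (Sanger, Illumina 1.8+) starts at '!' (ASCII 33), while
    Phred+64 (Illumina 1.3-1.7) never goes below '@' (ASCII 64). A file whose
    lowest character is >= 64 is assumed to be Phred+64, so a Sanger file with
    uniformly high qualities would be misclassified. Solexa scaling is ignored.
    """
    codes = [ord(c) for q in quals for c in q]
    if not codes:
        raise ValueError("no quality characters to inspect")
    return 33 if min(codes) < 64 else 64


def to_sanger(qual: str, offset: int) -> str:
    """Re-encode a Phred-scaled quality string from `offset` to Phred+33."""
    return "".join(chr(ord(c) - offset + 33) for c in qual)


# Hypothetical record with Phred+64 qualities ('h' = Q40, 'B' = Q2).
record = ["@read1", "ACGT", "+", "hhhB"]
quals = quality_lines(record)
offset = guess_offset(quals)
print(offset, to_sanger(quals[0], offset))  # 64 III#
```

Looking only at the lowest quality character keeps the check cheap; scanning the first few thousand records of a real file is usually enough to see it.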

Our RNA-seq exercise covers an example workflow:
https://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise

Best,

Jen
Galaxy team

On 5/3/13 8:59 PM, Zain A Alvi wrote:
Dear Sir or Madam,

I hope this reaches you well. Lately, I have been trying to use TopHat and then 
Bowtie on the Galaxy Project to create an aligned BAM file. The original data 
came from an SRA file that I acquired from the DNA Data Bank of Japan. This SRA 
file was then converted to FASTQ using the tools available on the Galaxy 
Project. Now, when I open TopHat in Galaxy, I am unable to select the converted 
RNA-Seq FASTQ file. Is there a specific format the file needs to be in? 
Currently it is just a *.fastq file, and I am confused as to why I cannot 
select it.

Also, if there is a guide on how to use the Galaxy Project to create an aligned 
BAM file and then check expression with the Cufflinks package, I would really 
appreciate it.

Sincerely,

Zain



___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org

Re: [galaxy-dev] Tophat problem

2013-05-20 Thread Jennifer Jackson

Hi Zain,

On 5/19/13 1:35 PM, Zain A Alvi wrote:

Hi Jen,

Thank you for the information about FASTQ files. It was 
really helpful.


Lately, I have been getting the following error: Error getting 
history update from this server - Bad Gateway. This occurred after I 
tried to re-upload some pre-aligned and indexed BAM files from NCBI 
GEO, because I was hoping to generate and retrieve FPKM/RPKM values 
from them.

This has now been resolved; very sorry for the confusion it caused.


Unfortunately, my old files are still not available on Galaxy, and 
I get an Internal Server Error when trying to retrieve them, although 
I can still access the workflows for them.

Same, resolved now.


The last weird error is that when I use Cuffdiff, I get an FPKM of 0 
with p/q values of 1 every time. This should not be the case, as the 
BAM files are from two different organs. It happens for every single 
gene, which indicates that something is wrong. I retrieved the GTF 
file from the main UCSC site with the following settings:


Clade: Insect
Genome: D. pseudoobscura
Group: Genes and Gene Prediction Tracks
Track: FlyBase
Table: FlybaseGene
Output format: GTF

I was wondering whether these settings are fine or whether I should 
change the Group to mRNA or some other setting. Also, the version 
available on UCSC is the old dp3 assembly from 2004, while the latest 
GFF on FlyBase is version 3.1. Is there any way to convert it to a GTF file?

I can't recommend a conversion tool, but there are a few on the web that 
could be tested if you decide to go that route. I do know that certain 
GFF3 files taken directly from FlyBase have been problematic with the 
RNA-seq tools because of duplicated ID attributes. I don't know whether 
this affects all versions or just the dm3 version. That said, the issue 
has been isolated to a few records (a gene mapping to 1 location), and 
there is no reason not to test the /D. pseudoobscura/ version and then 
adjust it if needed.
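
The duplicated-ID problem mentioned above is easy to screen for before converting or uploading a GFF3 file. The following is a minimal Python sketch, assuming standard nine-column, tab-separated GFF3 with ';'-separated key=value attributes; the feature lines and identifiers (gene1, tx1) in the example are invented. Keep in mind that GFF3 legitimately lets one discontinuous feature (a multi-line CDS, for instance) reuse a single ID, so each hit is something to inspect rather than automatically an error.

```python
from collections import Counter
from typing import Dict, Iterable, List


def parse_gff3_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 9th column (key=value pairs separated by ';')."""
    attrs: Dict[str, str] = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def duplicated_ids(lines: Iterable[str]) -> List[str]:
    """Return ID attribute values that occur on more than one feature line."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ##directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. an embedded ##FASTA section)
        feature_id = parse_gff3_attributes(cols[8]).get("ID")
        if feature_id is not None:
            counts[feature_id] += 1
    return sorted(i for i, n in counts.items() if n > 1)


# Invented example: one gene ID placed at two locations.
example = [
    "##gff-version 3",
    "chr2\tFlyBase\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=abc",
    "chr3\tFlyBase\tgene\t500\t1300\t.\t-\t.\tID=gene1;Name=abc",
    "chr2\tFlyBase\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
]
print(duplicated_ids(example))  # ['gene1']
```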


The GTF file from the UCSC Table Browser is correct, but Cuffdiff is 
looking for attributes that this version of the file does not have. If 
you look at the 9th field of the file, which holds these attributes, and 
compare it to the Cuffdiff input documentation, you can see how they 
differ. The gene_id and transcript_id attributes have the same value, 
and other attributes, such as tss_id and p_id, are not present. There is 
nothing wrong with the file, but without these attributes populated in a 
particular way, certain calculations will not be done.

http://cufflinks.cbcb.umd.edu/manual.html
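
The attribute comparison described above can also be done with a short script. This is a hypothetical Python sketch, not Cuffdiff's own logic: it assumes GTF attributes written as key "value"; pairs (quoted values containing ';' are not handled), merges attributes per transcript_id, and counts transcripts whose gene_id equals their transcript_id and those lacking tss_id or p_id. Both example inputs are invented for illustration.

```python
from typing import Dict, Iterable


def parse_gtf_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF 9th column such as: gene_id "g1"; transcript_id "t1";"""
    attrs: Dict[str, str] = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def summarize_gtf(lines: Iterable[str]) -> Dict[str, int]:
    """Count transcripts and the attribute patterns discussed above."""
    transcripts: Dict[str, Dict[str, str]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_gtf_attributes(cols[8])
        tid = attrs.get("transcript_id")
        if tid is not None:
            # Exon lines of the same transcript share attributes; merge them.
            transcripts.setdefault(tid, {}).update(attrs)
    values = list(transcripts.values())
    return {
        "transcripts": len(values),
        "gene_id_equals_transcript_id": sum(
            1 for a in values if a.get("gene_id") == a.get("transcript_id")
        ),
        "missing_tss_id": sum(1 for a in values if "tss_id" not in a),
        "missing_p_id": sum(1 for a in values if "p_id" not in a),
    }


# Invented example lines: a UCSC-style record and a Cufflinks-style record.
ucsc_style = [
    'chr2\tflyBaseGene\texon\t100\t200\t0.000000\t+\t.\tgene_id "tx1"; transcript_id "tx1";',
    'chr2\tflyBaseGene\texon\t300\t400\t0.000000\t+\t.\tgene_id "tx1"; transcript_id "tx1";',
]
cufflinks_style = [
    'chr2\tCufflinks\texon\t100\t200\t.\t+\t.\t'
    'gene_id "XLOC_1"; transcript_id "TCONS_1"; tss_id "TSS1"; p_id "P1";',
]
print(summarize_gtf(ucsc_style))
print(summarize_gtf(cufflinks_style))
```

For the UCSC-style input every count is 1; for the Cufflinks-style input only "transcripts" is 1 and the other counts are 0.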

These variations arise because different projects follow slightly 
different file specifications. Some are content variations, some are 
format variations. This is common with this file type family (GFF, GTF, 
GFF3), which is why iGenomes creates files specifically for certain 
genomes for use with this tool set.


When you do obtain a file that has the format and content you want to 
use, double check that the chromosome names are *exactly* the same 
between the reference genome, Tophat output, and GTF or GFF3 file. 
Mismatches can also lead to calculations being missed.

http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server
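
One way to run the chromosome-name check described above is to collect the sequence names from each file and compare the sets. The Python sketch below assumes you have the reference FASTA header lines, the BAM header as text (for example from `samtools view -H`), and the GTF/GFF3 lines; it lists annotation sequence names that the reference does not contain. A common mismatch is a UCSC-style "chr2" versus an unprefixed "2". The example data is invented.

```python
from typing import Iterable, List, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Sequence names from FASTA header lines ('>name optional description')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def sam_header_names(lines: Iterable[str]) -> Set[str]:
    """Reference names from @SQ lines of a SAM/BAM header (SN: tag)."""
    names = set()
    for line in lines:
        if line.startswith("@SQ"):
            for tag in line.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SN:"):
                    names.add(tag[3:])
    return names


def annotation_names(lines: Iterable[str]) -> Set[str]:
    """Sequence names (column 1) from GTF/GFF3 feature lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            names.add(cols[0])
    return names


def missing_from(names: Set[str], reference: Set[str]) -> List[str]:
    """Names in `names` that do not occur in `reference`, sorted."""
    return sorted(names - reference)


reference = fasta_names([">2 example chromosome", "ACGT", ">3", "ACGT"])
bam = sam_header_names(["@HD\tVN:1.0", "@SQ\tSN:2\tLN:4", "@SQ\tSN:3\tLN:4"])
gtf = annotation_names(['chr2\tflyBaseGene\texon\t1\t4\t.\t+\t.\tgene_id "a"; transcript_id "a";'])
print(missing_from(gtf, reference))  # ['chr2']
print(missing_from(gtf, bam))        # ['chr2']
print(missing_from(bam, reference))  # []
```

Any name reported here belongs to features that cannot receive reads, which is one way to end up with FPKM values of 0.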

iGenomes did not produce a file for fruit fly, but you could request one 
from them. This is where they publish the data for other genomes, and 
there is a link to the project at the top of the page:

http://cufflinks.cbcb.umd.edu/igenomes.html

Good luck with your project,

Jen
Galaxy team


Sorry for so many questions. Thank you again for the great help.

Sincerely,

Zain



--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org

Re: [galaxy-dev] Tophat problem

2013-05-07 Thread Jennifer Jackson

Hi Zain,

I believe we already worked out the .fastqsanger/grooming part of this 
question in another thread. But for others reading this post, this is a 
help link:

See FASTQ
http://wiki.galaxyproject.org/Support#Dataset_special_cases

Our RNA-seq exercise covers an example workflow:
https://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise

Best,

Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org

[galaxy-dev] Tophat problem

2013-05-06 Thread Zain A Alvi
Dear Sir or Madam,

I hope this reaches you well. Lately, I have been trying to use TopHat and then 
Bowtie on the Galaxy Project to create an aligned BAM file. The original data 
came from an SRA file that I acquired from the DNA Data Bank of Japan. This SRA 
file was then converted to FASTQ using the tools available on the Galaxy 
Project. Now, when I open TopHat in Galaxy, I am unable to select the converted 
RNA-Seq FASTQ file. Is there a specific format the file needs to be in? 
Currently it is just a *.fastq file, and I am confused as to why I cannot 
select it.

Also, if there is a guide on how to use the Galaxy Project to create an aligned 
BAM file and then check expression with the Cufflinks package, I would really 
appreciate it.

Sincerely,

Zain