[galaxy-user] help for alternative splicing with RNA-seq analysis

2012-08-09 Thread Du, Jianguang
I have RNA-seq datasets of several cell types. I want to compare alternative splicing events between different cell types. Can anyone show me the protocol/workflow or direct me to a tutorial? Thanks. Jianguang

[galaxy-user] how to split paired-end dataset of FASTQ format

2012-08-09 Thread Du, Jianguang
I downloaded an RNA-seq dataset in FASTQ format from the NCBI SRA and uploaded it to Galaxy. The dataset is paired-end. I want to split it into two datasets (one for each end) with FASTQ splitter, but the name of the dataset does not appear under FASTQ reads. What should I do to solve this

[galaxy-user] FASTQ splitter produced empty dataset, please help

2012-08-10 Thread Du, Jianguang
I have a problem splitting a paired-end FASTQ dataset into two separate datasets. In order to explain the problem clearly, I list the details of what I did with my dataset: Step 1) My aim is to compare datasets for differential alternative splicing. I downloaded paired-end datasets in FASTQ
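For readers following this thread, here is a minimal sketch of what splitting a joined paired-end read involves. It assumes each FASTQ record holds the forward and reverse mates concatenated into a single read of even length, which may not be how a given SRA download is laid out. The function names `parse_fastq` and `split_joined` are made up for illustration and are not part of Galaxy or its FASTQ splitter tool.

```python
from typing import Iterable, Iterator, Tuple

# (identifier, sequence, quality); a hypothetical record layout for this sketch
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (identifier, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def split_joined(record: Record) -> Tuple[Record, Record]:
    """Split one joined record into two mates of equal length.

    Assumption: both mates have the same length and are simply concatenated.
    """
    name, seq, qual = record
    if len(seq) % 2 != 0 or len(seq) != len(qual):
        raise ValueError(f"cannot split {name!r} into two equal mates")
    half = len(seq) // 2
    forward = (name + "/1", seq[:half], qual[:half])
    reverse = (name + "/2", seq[half:], qual[half:])
    return forward, reverse
```

If the record lengths are odd, or the sequence and quality lengths disagree, the data is probably not in joined form and the sketch raises an error rather than guessing.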

[galaxy-user] (no subject)

2012-08-10 Thread Du, Jianguang
dataset into two datasets, how should I choose the settings when I run Manipulate FASTQ? Thanks. Jianguang / On 8/10/12 7:21 AM, Du, Jianguang wrote: I have a problem splitting a paired-end FASTQ dataset into two separate datasets. In order to explain the problem clearly

[galaxy-user] need help to split paired-end dataset

2012-08-10 Thread Du, Jianguang
dataset into two datasets, how should I choose the settings when I run Manipulate FASTQ? Thanks. Jianguang / On 8/10/12 7:21 AM, Du, Jianguang wrote: I have a problem splitting a paired-end FASTQ dataset into two separate datasets. In order to explain the problem clearly

[galaxy-user] which reference genome should I select

2012-08-14 Thread Du, Jianguang
Dear All, I am going to run Tophat with mouse RNA-seq datasets. When I uploaded the datasets with the URL method, I chose Mouse July 2007 (NCBI37/mm9) (mm9) under Genome, so database: mm9 appears in the brief description of each dataset in the history. My question is: when I run Tophat, under Will

[galaxy-user] which setting should be used to upload mouse reference genome?

2012-08-14 Thread Du, Jianguang
Dear All, I am going to search for alternative splicing events between datasets. I am not sure about the settings for the mouse reference genome (mm9) when I upload it from UCSC Main. Would you please tell me the settings for 1) group: 2) Track: 3) Table: 4) Output format: Thanks.

[galaxy-user] How to decide Mean Inner Distance between Mate Pairs?

2012-08-15 Thread Du, Jianguang
Dear All, I am analyzing the downloaded RNA-seq datasets. However, I am not sure what the Mean Inner Distance between Mate Pairs is for these paired-end datasets. Taking a paired-end RNA-seq dataset as an example, there is a description for this dataset in the SRA database of NCBI: Layout: PAIRED,
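As a rough illustration of the arithmetic behind this setting: the inner distance is usually taken as the mean fragment length minus the two read lengths. The sketch below assumes the fragment length excludes adapters and that both mates have the same length. The function name and the numbers in the usage note are hypothetical and do not come from the dataset in this thread.

```python
def mean_inner_distance(mean_fragment_length: float, read_length: int) -> float:
    """Inner distance = fragment length minus the bases covered by the two mates.

    Assumptions: the fragment length excludes adapters and both mates
    have the same length.
    """
    return mean_fragment_length - 2 * read_length
```

For example, a hypothetical library with 200 bp fragments sequenced as 2 x 36 bp gives `mean_inner_distance(200, 36)`, which evaluates to 128.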

[galaxy-user] Do I need to allow indel search?

2012-08-15 Thread Du, Jianguang
Dear All, I want to compare the pre-mRNA alternative splicing events between RNA-seq datasets. Do I need to allow indel search when I run Tophat? What is the indel search for? I could not find detailed information about indel search in the Tophat documentation. Thanks. Jianguang Du

[galaxy-user] Use Own Junctions or not

2012-08-15 Thread Du, Jianguang
Dear All, I want to compare the pre-mRNA alternative splicing events between RNA-seq datasets. Should I use own junctions when I run Tophat? What does Own Junctions mean? Thanks. Jianguang Du

[galaxy-user] Minimum length of read segments

2012-08-16 Thread Du, Jianguang
Dear All, I am going to run Tophat with RNA-seq dataset to observe alternative splicing events. There is a parameter for Tophat: Minimum length of read segment. According to implemented Tophat options, the description for Minimum length of read segment is Each read is cut up into segments,

[galaxy-user] run Bowtie to estimate Mean Inner Distance between Mate Pairs

2012-08-16 Thread Du, Jianguang
Dear All, In order to figure out the Mean Inner Distance between Mate Pairs of my paired-end RNA-seq datasets, I ran Bowtie (Map with Bowtie for Illumina) with both forward and reverse datasets and mouse mm9 as reference genome. Below I list the Bowtie output for only one pair of reads (I put

[galaxy-user] How to find the alternatively spliced segment of genes in Cuffdiff output

2012-08-21 Thread Du, Jianguang
Dear All, I have run the programs from Tophat to Cuffdiff in Galaxy to look for differences in alternative splicing events between cell types. However, I do not know how to find the detailed information (such as the sequence and the genomic coordinates) of the alternatively spliced part of a

Re: [galaxy-user] run Bowtie to estimate Mean Inner Distance between Mate Pairs

2012-08-21 Thread Du, Jianguang
Hi All, Thank you for your help. I understand how to do it now. Jianguang From: rshar...@bx.psu.edu [rshar...@bx.psu.edu] Sent: Tuesday, August 21, 2012 11:15 AM To: galaxy-user@lists.bx.psu.edu Cc: Du, Jianguang Subject: Re: [galaxy-user] run Bowtie

[galaxy-user] How much can I trim my reads

2012-08-23 Thread Du, Jianguang
Dear All, I am analysing RNA-seq datasets for the differential splicing events between cell types. My reads are 36bp long. In order to increase the quality of reads, I need to trim some nucleotides from ends. How many nucleotides can I trim? I am afraid that if I trim too much, the reliability

[galaxy-user] What minimum Quality should I set for Filter FASTQ?

2012-08-23 Thread Du, Jianguang
Dear All, I am analysing RNA-seq datasets for differential splicing events between cell types. Some of my reads contain bad nucleotides. Should I run Filter FASTQ to remove these lower-quality reads? If I do need to, what Minimum Quality should I set for the Filter? Thanks. Jianguang

[galaxy-user] Should I use iGenomes version of a reference GTF for Tophat?

2012-08-23 Thread Du, Jianguang
Dear All, I am analysing RNA-seq datasets for differential splicing events between cell types. These are mouse cells. Jen suggested that I use the iGenomes version of the reference GTF to take full advantage of the options in CuffDiff. My question is: should I use this iGenomes version reference GTF

Re: [galaxy-user] Should I use iGenomes version of a reference GTF for Tophat?

2012-08-23 Thread Du, Jianguang
From: Jennifer Jackson [j...@bx.psu.edu] Sent: Thursday, August 23, 2012 11:46 AM To: Du, Jianguang Cc: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] Should I use iGenomes version of a reference GTF for Tophat? Hello Jianguang, When in the analysis

Re: [galaxy-user] Should I use iGenomes version of a reference GTF for Tophat?

2012-08-23 Thread Du, Jianguang
Hi Jen, Thank you very much for your help. Jianguang From: Jennifer Jackson [j...@bx.psu.edu] Sent: Thursday, August 23, 2012 3:53 PM To: Du, Jianguang Cc: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] Should I use iGenomes version of a reference

Re: [galaxy-user] Should I use iGenomes version of a reference GTF for Tophat?

2012-08-23 Thread Du, Jianguang
Use a built-in index. How can I solve this problem? Thanks in advance. Jianguang From: galaxy-user-boun...@lists.bx.psu.edu [galaxy-user-boun...@lists.bx.psu.edu] on behalf of Du, Jianguang [jia...@iupui.edu] Sent: Thursday, August 23, 2012 4:01 PM

[galaxy-user] Is tss_id unique for each transcript?

2012-08-24 Thread Du, Jianguang
Dear All, I am reading and comparing the outputs of Cuffdiff. I found there is a tss_id column in the transcript FPKM tracking, gene FPKM tracking, TSS groups FPKM tracking and CDS FPKM tracking files. In this column, each line has a unique label such as TSS1001. This kind of label also appears

[galaxy-user] Is there web-based CummeRbund?

2012-08-24 Thread Du, Jianguang
Dear All, I am going to visualize Cuffdiff outputs. I understand that CummeRbund can be used to visualize Cuffdiff outputs. However, I am not good with Linux and find the CummeRbund manual difficult to understand. Is there a web-based CummeRbund program (like Tophat and Cufflinks) available for

[galaxy-user] Please help me check the quality of the Tophat mapping to reference genome

2012-08-27 Thread Du, Jianguang
Dear All, I ran Flagstat under NGS: SAM Tools to check the quality of the Tophat output (the file of accepted hits). I got the diagnostic results as follows: 9471730 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 9471730 + 0 mapped (100.00%:-nan%) 0 + 0 paired in sequencing 0

Re: [galaxy-user] Please help me check the quality of the Tophat mapping to reference genome

2012-08-27 Thread Du, Jianguang
datasets, and single-end setting for the single-end datasets. And then ran Cufflinks, Cuffmerge, and Cuffdiff. Jianguang From: Jennifer Jackson [j...@bx.psu.edu] Sent: Monday, August 27, 2012 12:36 PM To: Du, Jianguang Cc: galaxy-user@lists.bx.psu.edu

[galaxy-user] Please help with the settings for Cufflinks, Cuffmerge and Cuffdiff

2012-08-27 Thread Du, Jianguang
Dear All, I am looking for the differential splicing events between cell types. Although I got a lot of help from Jen and from protocols found online, I am still not sure about some settings for Cufflinks, Cuffmerge and Cuffdiff. 1) For Cufflinks: There is a setting for Bias Correction. I made

[galaxy-user] How to decide if the difference is significant

2012-08-27 Thread Du, Jianguang
Dear All, I am looking for the differential splicing events between cell types. I have run Cuffdiff and I am going through the output file splicing differential expression testing. I have read the documentation and protocols about how Cuffdiff tests for differential expression and

[galaxy-user] Should I use raw junction and Only look for supplied junctions

2012-08-28 Thread Du, Jianguang
Dear All, I have two more questions about settings for Tophat. My aim is to look for the differential splicing events between cell types. After I checked Use Own Junctions, three more options came out: 1) Use Gene Annotation Model 2) Use raw Junctions 3) Only look for supplied junctions

[galaxy-user] Please help to understand the square root of Jensen-Shannon divergence

2012-09-04 Thread Du, Jianguang
Dear All, I am looking for the differential splicing events between cell types. However, Cuffdiff reports the difference using the square root of the Jensen-Shannon divergence. Although I tried my best to understand the definition of the square root of the Jensen-Shannon
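To make the quantity in this question concrete, the following is a small sketch of the square root of the Jensen-Shannon divergence between two sets of relative isoform abundances. It uses the textbook definition JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2 with M = (P + Q) / 2. The base-2 logarithm is an assumption made here so the result falls between 0 and 1; the sketch makes no claim about how Cuffdiff computes or tests the value internally.

```python
import math
from typing import List, Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def normalize(values: Sequence[float]) -> List[float]:
    """Scale non-negative abundances so they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return [v / total for v in values]


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the Jensen-Shannon divergence between two distributions."""
    if len(p) != len(q):
        raise ValueError("both conditions need the same number of isoforms")
    p, q = normalize(p), normalize(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]  # the mixture distribution
    jsd = shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors
```

With this definition, two conditions that express completely different isoforms of a gene give a value of 1, and identical isoform proportions give 0.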

[galaxy-user] Tophat settings

2012-09-06 Thread Du, Jianguang
Dear All, I am not so sure about two Tophat settings. Please help. 1) Number of mismatches allowed in the initial read mapping Based on the documentation, my understanding is: the reads are re-aligned to the transcriptome/genome if the number of mismatches in the initial alignment is more than the set

Re: [galaxy-user] Please help to understand the square root of Jensen-Shannon divergence

2012-09-06 Thread Du, Jianguang
, and then compare between conditions. Thanks in advance, Jianguang From: Jennifer Jackson [j...@bx.psu.edu] Sent: Thursday, September 06, 2012 12:38 PM To: Du, Jianguang Cc: galaxy-user@lists.bx.psu.edu; closetic...@galaxyproject.org Subject: Re: [galaxy

[galaxy-user] Number of mismatches allowed in the initial read mapping

2012-09-06 Thread Du, Jianguang
Dear All, I tested how to set the Number of mismatches allowed in the initial read mapping as follows. At first, I ran FASTQ Groomer on a dataset to get the number of total reads. The total number of reads is 17510227. Then I ran Tophat after setting Number of mismatches allowed in the

[galaxy-user] Does Tophat output *.accepted hits file contain headers?

2012-09-13 Thread Du, Jianguang
Dear All, I want to use the Tophat output files with .accepted hits to do analysis outside Galaxy. However, the program I am using requires the Tophat output to be indexed, sorted BAM files that contain headers. Do the Tophat outputs with .accepted hits produced at Galaxy contain headers? Will

[galaxy-user] How much FPKM can be taken into consideration when comparing gene expression

2012-09-19 Thread Du, Jianguang
Dear All, I am comparing the gene expression between two cell types by examining the Cufflinks output file -- gene differential expression testing. The file lists the FPKM of genes in two cell types and log2 of fold. I want to look for genes that have more than 2-fold of
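A short sketch of the arithmetic behind the log2 fold column mentioned here: a 2-fold change in either direction corresponds to an absolute log2 value of 1. The function names and FPKM values are hypothetical. The sketch deliberately rejects zero FPKM instead of picking a minimum expression cutoff, since no cutoff is given in this thread.

```python
import math


def log2_fold_change(fpkm_a: float, fpkm_b: float) -> float:
    """log2(FPKM in cell type B / FPKM in cell type A)."""
    if fpkm_a <= 0 or fpkm_b <= 0:
        raise ValueError("fold change is undefined when either FPKM is zero")
    return math.log2(fpkm_b / fpkm_a)


def exceeds_fold(fpkm_a: float, fpkm_b: float, fold: float = 2.0) -> bool:
    """True when expression differs by more than `fold` in either direction."""
    return abs(log2_fold_change(fpkm_a, fpkm_b)) > math.log2(fold)
```

For instance, `log2_fold_change(10, 40)` is 2.0, so `exceeds_fold(10, 40)` is true, while a change from 10 to 15 FPKM is not.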

[galaxy-user] please restore my account

2012-10-08 Thread Du, Jianguang
Dear Sir or Madam, I had opened multiple accounts at Galaxy Main; I did not know that this is against policy. I noticed this policy when I found that all the accounts were blocked. Would you please restore the account with email address jia...@iupui.edu? If you are not

[galaxy-user] Do I need to specify the file format when I upload datasets using FTP method?

2013-03-21 Thread Du, Jianguang
Hi Everyone, When I upload my datasets onto my history via FTP method (using FileZilla), do I need to specify the file format under File Format of Upload File from your computer? I noticed that the screencast of how to upload datasets via FTP just leaves the File Format as Auto-detect.

[galaxy-user] is there a size limit of dataset for running Tophat?

2013-03-27 Thread Du, Jianguang
Hi All, Is there a size limit of dataset for running Tophat at Galaxy? If there is, how many reads is the limit? Thanks. Jianguang

[galaxy-user] Parameters for merging BAM files

2013-04-05 Thread Du, Jianguang
Hi All, I want to merge the Tophat output (Accepted Hits) of several datasets. I want the merged BAM file to have exactly the same format as the individual input BAM files. Should I check Merge all component bam file headers into the merged bam file? Thanks. Have a nice weekend. Jianguang

[galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-08 Thread Du, Jianguang
Hi All, I have a very basic question. I have RNA-seq datasets of several cell types and want to compare the alternative splicing events between cell types. The reads are 36nt in length. Are these reads long enough to map on the splicing junctions accurately when I run Tophat with stringent

Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-09 Thread Du, Jianguang
in the .splicing junctions output. Is my understanding correct? Does the regions mean the number of mapped splicing junctions? Thanks. Best, Jianguang From: Jeremy Goecks [jeremy.goe...@emory.edu] Sent: Tuesday, April 09, 2013 9:03 AM To: Du, Jianguang Cc

Re: [galaxy-user] Parameters for merging BAM files

2013-04-10 Thread Du, Jianguang
Hi Jen, Thanks for the information. I used this setting and the merged BAM files (.accepted hits) worked quite well for the downstream analysis. Best, Jianguang From: Jennifer Jackson [j...@bx.psu.edu] Sent: Tuesday, April 09, 2013 4:10 PM To: Du, Jianguang Cc

Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-10 Thread Du, Jianguang
be 33 nucleotides). So my understanding is that setting the Anchor length at 3 does not increase the inaccuracy of the alignment. Am I correct? Best, Jianguang From: Jeremy Goecks [jeremy.goe...@emory.edu] Sent: Tuesday, April 09, 2013 1:57 PM To: Du, Jianguang

Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-11 Thread Du, Jianguang
? Best, Jianguang From: Jeremy Goecks [jeremy.goe...@emory.edu] Sent: Wednesday, April 10, 2013 3:16 PM To: Du, Jianguang Cc: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions

[galaxy-user] Which Library Type should I use for single-end reads

2013-04-15 Thread Du, Jianguang
Hi All, I have a very basic question about parameters for running TopHat. I have datasets of single-end reads. These datasets were generated with Illumina Genome Analyzer IIx. Which Library Type should I choose to run Tophat? Thanks. Best, Jianguang

[galaxy-user] View details of Tophat alignment

2013-05-30 Thread Du, Jianguang
Hi All, After I finished the Tophat alignment for RNA-seq, I took a look at the details of the parameters by clicking the icon View details, and I got the information as shown below: Input Parameter Value Note for rerun RNA-Seq FASTQ file 73: Filtered Groomed data1_rep2 Use a built in reference

[galaxy-user] Which Input FASTQ quality scores type should I choose when running FASTQ Groomer?

2013-08-30 Thread Du, Jianguang
Hi All, I downloaded some RNA-seq datasets from NCBI. The datasets were generated by Illumina HiSeq 2000. I am not sure which Input FASTQ quality scores type I should choose when running FASTQ Groomer. Below are the scores of 2 reads of a dataset; I renamed them as read 1 and read 2. 1)
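One common way to approach this question is to look at the range of ASCII codes in the quality lines. The sketch below is a heuristic only, with a hypothetical function name. It assumes that characters below code 59 can only occur with a Phred+33 offset, and that characters above code 74 (`J`) point to a +64 offset because Phred+33 qualities above 41 are uncommon. It should be run on many reads rather than two, and it does not replace checking how the data was deposited.

```python
from typing import Iterable


def guess_quality_offset(quality_strings: Iterable[str]) -> str:
    """Guess the FASTQ quality offset from the observed character range.

    Heuristic assumptions: codes below 59 only occur with Phred+33;
    codes above 74 ('J') suggest a +64 offset.
    """
    codes = [ord(ch) for line in quality_strings for ch in line]
    if not codes:
        raise ValueError("no quality characters supplied")
    lowest, highest = min(codes), max(codes)
    if lowest < 33:
        raise ValueError("character below '!' is not a valid quality score")
    if lowest < 59:
        return "phred33"
    if highest > 74:
        return "phred64"
    return "ambiguous"  # the range fits both offsets; inspect more reads
```

A quality line containing `#` or digits, for example, is classified as `phred33`, while a line made up only of lowercase letters is classified as `phred64`.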