Re: [Bioc-sig-seq] Reads in 3'utr
Hi,

I have now summed the counts to the gene level for the 3'UTRs. I want to assess the relative usage of each 3'UTR end, i.e. what percentage of reads comes from each 3'UTR isoform. I also want to identify the different 3'UTR ends for each gene, to detect alternative 3'UTR usage (disease vs control). Do you have any idea how to proceed?

Thanks,
Rohan

--- On Saturday, 24 September, 2011, 4:24 AM, Valerie Obenchain wrote:
> Match the desired transcript names to the df, pull out the gene IDs.
> You then have the gene ID's for your utr regions and can split or
> group your counts by gene.
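One way to read the question above: for every gene, divide each transcript's 3'UTR count by that gene's total. The base R sketch below does this on a toy table. The column names (geneID, txID, counts) are assumptions, and the last row (transcript 1122) is made up for illustration; the other rows reuse values posted elsewhere in this thread. Keep in mind that countOverlaps counts a read once for every 3'UTR it overlaps, so transcripts of one gene that share 3'UTR sequence also share reads, and these percentages are at best a rough proxy for isoform usage.

```r
# Toy transcript-level 3'UTR counts. The last row is invented for illustration.
utr_counts <- data.frame(
  geneID = c("84069", "84069", "9636", "148398", "148398"),
  txID   = c("1116", "1119", "1126", "1121", "1122"),
  counts = c(0L, 11L, 11L, 2L, 6L),
  stringsAsFactors = FALSE
)

# Total reads per gene, repeated for each of that gene's transcripts.
gene_total <- ave(utr_counts$counts, utr_counts$geneID, FUN = sum)

# Percentage of the gene's 3'UTR reads assigned to each transcript.
# Genes with zero reads get NA instead of a division by zero.
utr_counts$pct <- ifelse(gene_total > 0,
                         100 * utr_counts$counts / gene_total,
                         NA_real_)
print(utr_counts)
```

Computing the same column separately for disease and control samples gives two percentages per transcript that can be compared, although a formal test needs replicates and a proper count model.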
Re: [Bioc-sig-seq] Reads in 3'utr
DESeq and edgeR vignettes.

Valerie

On 09/28/11 19:46, rohan bareja wrote:
> Hi,
> I have summed the counts now to the gene level for 3'UTR. I want to
> assess the relative amount of each 3'UTR end usage such as what
> percentage of reads comes from each 3'UTR isoform? I want to identify
> the different 3'UTR ends for each gene to get alternative 3'UTR
> usage (disease vs control)? Do you have any idea about how to proceed?
> Thanks,
> Rohan

___
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
Re: [Bioc-sig-seq] Reads in 3'utr
Hi Valerie,

Thanks a lot. It finally worked. I now have a data frame of gene IDs, transcript IDs and 3'UTR counts, shown below:

         GENE     TX     countsUTRctl
    [1,] "148398" "1121" "2"
    [2,] "339451" "1118" "0"
    [3,] "84069"  "1116" "0"
    [4,] "84069"  "1119" "11"
    [5,] "9636"   "1126" "11"
    [6,] "375790" "1127" "0"

Now I want to do differential expression of genes using DESeq. Do I have to merge the rows that belong to the same gene and sum their counts, e.g. gene ID 84069 above, or can I proceed with the data frame as it is? If I have to merge them, how do I do that?

Thanks,
Rohan
Re: [Bioc-sig-seq] Reads in 3'utr
On 09/26/11 09:16, rohan bareja wrote:
> Now I want to do differential expression of genes using DESeq, so do I
> have to merge the two same genes and its counts such as geneID 84069
> (from above) or i can proceed with the above dataframe? If I have to
> merge them how do I do that?
> Thanks,
> Rohan

Yes, you need to merge the counts for each gene. See ?split. Split the counts on the gene IDs, then sum with lapply. Something like,

    lapply(split(df$counts, df$geneID), sum)

Valerie
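The split-and-sum suggestion can be sketched in base R on the six rows posted in this thread. The table as posted is a character matrix (the counts are quoted strings), so this sketch converts the counts to integers first. The names m, df and gene_counts are chosen here for illustration only.

```r
# The six rows posted in the thread, as a character matrix.
m <- cbind(
  GENE = c("148398", "339451", "84069", "84069", "9636", "375790"),
  TX = c("1121", "1118", "1116", "1119", "1126", "1127"),
  countsUTRctl = c("2", "0", "0", "11", "11", "0")
)

# Convert to a data frame with integer counts; sum() fails on character data.
df <- data.frame(
  geneID = m[, "GENE"],
  txID   = m[, "TX"],
  counts = as.integer(m[, "countsUTRctl"]),
  stringsAsFactors = FALSE
)

# Split the counts on gene ID and sum each group. vapply returns a
# named integer vector instead of the list that lapply would give.
gene_counts <- vapply(split(df$counts, df$geneID), sum, integer(1))
print(gene_counts)
```

Repeating this for every sample and binding the resulting vectors column-wise (after checking that the gene order matches) gives a genes-by-samples integer matrix, which is the kind of input the DESeq and edgeR vignettes work from.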
Re: [Bioc-sig-seq] Reads in 3'utr
On 09/23/2011 02:57 PM, rohan bareja wrote:
> Hi,
>
> utr=threeUTRsByTranscript(txdb,use.names=FALSE)
> So,utr is GRangesList of length 33381
> Then as u said,I did the following:
>
>    txBygene <- transcriptsBy(txdb, "gene")
>    geneID <- rep(names(txBygene), elementLengths(txBygene))
>    df <- data.frame(geneID=geneID,
>                     txID=values(unlist(txBygene))[["tx_id"]])
>
> This gives me a dataframe with 40,780 rows with gene ID and txID from
> txBygene object.
>       geneID  txID
> 40775   9994 11731
> 40776   9994 11730
> 40777   9997 38491
> 40778   9997 38489
> 40779   9997 38496
> 40780   9997 38497
>
> Since my utr object is of length 33,381 ,my counts length is same i.e
> 33,381
> So I am not able to map the counts to the above data frame which has
> transcript and gene IDs.

Yes, these lengths are different. In this example we have utr regions from 58 transcripts.

    > length(utr)
    [1] 58

Those 58 transcripts can be matched to their gene IDs by looking at the txBygene object. All of the transcripts fall into one (or more) of 51 genes,

    > length(txBygene)
    [1] 51

There are multiple transcripts per gene, so we expand the gene IDs to map to the transcripts.

    > dim(df)
    [1] 79  2

This data.frame has all transcripts from the txdb mapped to the gene IDs. Your utr data may contain only a subset of these transcripts. That is something you need to check. Match the desired transcript names to the df and pull out the gene IDs. You then have the gene IDs for your utr regions and can split or group your counts by gene.

Valerie
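The matching step described above can be sketched in base R with match(). This assumes the count vector carries the names of utr, and that with use.names=FALSE those names are the same internal transcript IDs as the tx_id column of df. The gene and transcript IDs reuse values posted in the thread; the counts and the ID "99999" are hypothetical.

```r
# Lookup table of transcripts and genes, standing in for the `df`
# built from transcriptsBy(txdb, "gene").
df <- data.frame(
  geneID = c("9994", "9994", "9997", "9997", "9997"),
  txID   = c(11731L, 11730L, 38491L, 38489L, 38496L),
  stringsAsFactors = FALSE
)

# Hypothetical counts named by transcript ID, standing in for
# countOverlaps(utr, reads). Only some transcripts in df have a 3'UTR
# here, and "99999" is deliberately absent from df.
countsUTR <- c("11730" = 4L, "38491" = 0L, "38496" = 7L, "99999" = 1L)

# For each counted transcript, find its row in df (NA if absent).
idx <- match(names(countsUTR), as.character(df$txID))

res <- data.frame(
  geneID = df$geneID[idx],
  txID   = names(countsUTR),
  counts = unname(countsUTR),
  stringsAsFactors = FALSE
)

# Transcripts without a gene ID are worth inspecting before summing.
print(res[is.na(res$geneID), ])
print(res)
```

Because match() returns one index per counted transcript, res has the same length as the count vector, so the difference between 33,381 UTR entries and 40,780 rows in df stops being a problem.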
Re: [Bioc-sig-seq] Reads in 3'utr
Hi,

    utr=threeUTRsByTranscript(txdb,use.names=FALSE)

So utr is a GRangesList of length 33,381. Then, as you said, I did the following:

    txBygene <- transcriptsBy(txdb, "gene")
    geneID <- rep(names(txBygene), elementLengths(txBygene))
    df <- data.frame(geneID=geneID, txID=values(unlist(txBygene))[["tx_id"]])

This gives me a data frame with 40,780 rows, holding the gene ID and txID from the txBygene object:

          geneID  txID
    40775   9994 11731
    40776   9994 11730
    40777   9997 38491
    40778   9997 38489
    40779   9997 38496
    40780   9997 38497

Since my utr object is of length 33,381, my counts vector has the same length, i.e. 33,381. So I am not able to map the counts to the above data frame, which has transcript and gene IDs.

--- On Friday, 23 September, 2011, 10:50 PM, Valerie Obenchain wrote:
> Now you know what gene ID each tx count belongs to. You can split
> your counts by gene ID ...
Re: [Bioc-sig-seq] Reads in 3'utr
Hi Rohan,

You can relate the counts for 3'UTR regions to gene IDs through the transcript IDs.

    txdb_file <- system.file("extdata", "UCSC_knownGene_sample.sqlite",
                             package="GenomicFeatures")
    txdb <- loadFeatures(txdb_file)
    utr=threeUTRsByTranscript(txdb,use.names=FALSE)

The transcript names can be matched to the gene IDs through,

    txBygene <- transcriptsBy(txdb, "gene")
    geneID <- rep(names(txBygene), elementLengths(txBygene))
    df <- data.frame(geneID=geneID,
                     txID=values(unlist(txBygene))[["tx_id"]])

Now you know what gene ID each tx count belongs to. You can split your counts by gene ID ...

Valerie

On 09/20/2011 12:13 PM, rohan bareja wrote:
> I have got the transcript level counts from this.How can I get the gene
> level counts??
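The rep() step above may be easier to follow on a plain list. The sketch below is a base R analogue only, with an ordinary named list standing in for the GRangesList and base lengths() standing in for elementLengths(). The gene and transcript IDs are the ones posted in this thread. Readers on a current Bioconductor release should note that elementLengths() was later renamed elementNROWS() and loadFeatures() was superseded by AnnotationDbi's loadDb(), so the 2011 code may need those substitutions.

```r
# Gene -> transcript mapping as a plain list, a stand-in for the
# GRangesList returned by transcriptsBy(txdb, "gene").
txBygene <- list(
  "9994" = c(11731L, 11730L),
  "9997" = c(38491L, 38489L, 38496L, 38497L)
)

# Repeat each gene ID once per transcript it contains.
geneID <- rep(names(txBygene), lengths(txBygene))

# Flatten the transcripts in the same order so the two columns line up.
df <- data.frame(
  geneID = geneID,
  txID   = unlist(txBygene, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(df)
```

The result has one row per transcript, which is why df can be longer than both the number of genes and the number of transcripts that have a 3'UTR.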
[Bioc-sig-seq] Reads in 3'utr
Hi everyone,

I am doing NGS analysis using BAM files. I have counted reads in the 3'UTR regions using

    utr=threeUTRsByTranscript(txdb,use.names=FALSE)
    countsUTR <- countOverlaps(utr,reads)

This gives me transcript-level counts. How can I get gene-level counts? It might sound silly, but does anybody have an idea of what type of analyses we can do with countsUTR?

Thanks,
Rohan