Re: [Bioc-devel] ExperimentHub::GSE62944 outdated
Hi,

Yes, please share your scripts with Sonali since (I think) she will be
putting the data together.

Valerie

On 06/04/2016 05:56 AM, Marini, Federico wrote:
> Hi Valerie,
>
> This is true. I also did the same thing for the normal samples, also
> already as a SummarizedExperiment.
>
> I can check in my scripts once I am back in the office, if you want to use
> them as a starter.
>
> Cheers,
> Federico
>
> From: Obenchain, Valerie
> Sent: Friday, June 3, 2016 5:28 PM
> To: Ludwig Geistlinger; bioc-devel@r-project.org
> Cc: Marini, Federico; Sonali Arora
> Subject: Re: [Bioc-devel] ExperimentHub::GSE62944 outdated
>
> Hi Ludwig and Federico,
>
> Yes, we plan to update these data in the next couple of weeks.
>
> Sonali mentioned that the current data only include the tumor samples
> and she'd like to add the normals. The new data will likely be added as
> SummarizedExperiment objects instead of ExpressionSets.
>
> Valerie
>
> On 06/03/2016 04:57 AM, Ludwig Geistlinger wrote:
>> FYI
>>
>> That works for me, but maybe this is also of interest for others, so I
>> wonder if somebody from the Bioc annotation/experiment team (Sonali,
>> Valerie, Martin?) could update this accordingly for ExperimentHub?
>>
>> Best,
>> Ludwig

This email message may contain legally privileged and/or confidential
information. If you are not the intended recipient(s), or the employee or
agent responsible for the delivery of this message to the intended
recipient(s), you are hereby notified that any disclosure, copying,
distribution, or use of this email message is prohibited. If you have
received this message in error, please notify the sender immediately by
e-mail and delete this email message from your computer. Thank you.

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] ExperimentHub::GSE62944 outdated
Hi Valerie,

This is true. I also did the same thing for the normal samples, also
already as a SummarizedExperiment.

I can check in my scripts once I am back in the office, if you want to use
them as a starter.

Cheers,
Federico

From: Obenchain, Valerie
Sent: Friday, June 3, 2016 5:28 PM
To: Ludwig Geistlinger; bioc-devel@r-project.org
Cc: Marini, Federico; Sonali Arora
Subject: Re: [Bioc-devel] ExperimentHub::GSE62944 outdated

Hi Ludwig and Federico,

Yes, we plan to update these data in the next couple of weeks.

Sonali mentioned that the current data only include the tumor samples
and she'd like to add the normals. The new data will likely be added as
SummarizedExperiment objects instead of ExpressionSets.

Valerie

On 06/03/2016 04:57 AM, Ludwig Geistlinger wrote:
> FYI
>
> That works for me, but maybe this is also of interest for others, so I
> wonder if somebody from the Bioc annotation/experiment team (Sonali,
> Valerie, Martin?) could update this accordingly for ExperimentHub?
>
> Best,
> Ludwig
Re: [Bioc-devel] ExperimentHub::GSE62944 outdated
Hi Ludwig and Federico,

Yes, we plan to update these data in the next couple of weeks.

Sonali mentioned that the current data only include the tumor samples
and she'd like to add the normals. The new data will likely be added as
SummarizedExperiment objects instead of ExpressionSets.

Valerie

On 06/03/2016 04:57 AM, Ludwig Geistlinger wrote:
> FYI
>
> That works for me, but maybe this is also of interest for others, so I
> wonder if somebody from the Bioc annotation/experiment team (Sonali,
> Valerie, Martin?) could update this accordingly for ExperimentHub?
>
> Best,
> Ludwig
Re: [Bioc-devel] ExperimentHub::GSE62944 outdated
FYI

That works for me, but maybe this is also of interest for others, so I
wonder if somebody from the Bioc annotation/experiment team (Sonali,
Valerie, Martin?) could update this accordingly for ExperimentHub?

Best,
Ludwig

--
Dr. Ludwig Geistlinger
Lehr- und Forschungseinheit für Bioinformatik
Institut für Informatik
Ludwig-Maximilians-Universität München
Amalienstrasse 17, 2. Stock, Büro A201
80333 München

Tel.: 089-2180-4067
eMail: ludwig.geistlin...@bio.ifi.lmu.de

> Hi Ludwig,
>
> In November I sent the updated recipe to Martin, but I think it was not
> updated yet.
>
> Anyway, you can do it yourself with the code here below:
>
> library("GEOquery")
> library("Biobase")
>
> # download the supplementary files into ./GSE62944
> suppl <- GEOquery::getGEOSuppFiles("GSE62944")
>
> setwd("GSE62944")
>
> clinvar <-
>   read.delim("GSE62944_06_01_15_TCGA_24_548_Clinical_Variables_9264_Samples.txt.gz")
> clinvar2 <- t(clinvar)
>
> # add variable names
> colnames(clinvar2) <- clinvar2[1,]
> # and remove the 2nd abbreviation, with the CDE_ID too
> clinvar3 <- clinvar2[-c(1:3),]
>
> # substitute dots with dashes in the ids, to be consistent with
> # previous object
> clinvar4 <- clinvar3
> rownames(clinvar4) <- gsub("\\.","-",rownames(clinvar3))
> clinvar4 <- as.data.frame(clinvar4)
>
> CancerType <-
>   read.delim("GSE62944_06_01_15_TCGA_24_CancerType_Samples.txt.gz",
>              header=FALSE, colClasses=c("character", "factor"),
>              col.names=c("sample", "type"))
> idx <- match(rownames(clinvar4), CancerType$sample)
> # these are already nicely sorted
> clinvar4$CancerType <- CancerType$type[idx]
>
> countFile <-
>   "GSM1536837_06_01_15_TCGA_24.tumor_Rsubread_FeatureCounts.txt.gz"
> untar("GSE62944_RAW.tar", countFile)
>
> # read the count table as one character vector, reshape it, and convert
> # it to an integer matrix with genes in rows and samples in columns
> counts <- local({
>   data <- scan(countFile, what=character(), sep="\t", quote="")
>   m <- matrix(data, 9265)
>   dimnames(m) <- list(m[,1], m[1,])
>   m <- t(m[-1, -1])
>   mode(m) <- "integer"
>   m
> })
>
> # just to be sure
> # they are all there, but not correctly sorted
> gplots::venn(list(colnames(counts),rownames(clinvar4)))
> head(colnames(counts))
> head(rownames(clinvar4))
>
> # re-sorting according to the counts object
> cl5 <-
>   clinvar4[rownames(clinvar4)[match(colnames(counts),rownames(clinvar4))],]
> head(rownames(cl5),20)
> head(colnames(counts),20)
>
> # as in your example
> eset_new <- Biobase::ExpressionSet(counts, AnnotatedDataFrame(cl5))
>
> # or as SummarizedExperiment
> library("SummarizedExperiment")
> se <- SummarizedExperiment(assays=list(counts))
> colData(se) <- S4Vectors::DataFrame(cl5)
>
> # data exploration to see how samples are related to each other
> library("DESeq2")
> ddsTCGA <- DESeqDataSet(se,design=~CancerType)
>
> ddsTCGA <- estimateSizeFactors(ddsTCGA)
> # the rlog transform takes very long time, so just a quick and dirty check
> log2tcga <- log2(1+counts(ddsTCGA,normalized=TRUE))
> se_log2tcga <- SummarizedExperiment(assays=list(log2tcga))
> colData(se_log2tcga) <- colData(ddsTCGA)
>
> # customized principal components
> pca_d4 <- function (x, intgroup = "condition", ntop = 500,
>                     returnData = FALSE, title = NULL,
>                     pcX = 1, pcY = 2, text_labels = TRUE, point_size = 3)
> {
>   library("DESeq2")
>   library("genefilter")
>   library("ggplot2")
>   # keep the ntop most variable rows
>   rv <- rowVars(assay(x))
>   select <- order(rv, decreasing = TRUE)[seq_len(min(ntop,length(rv)))]
>   pca <- prcomp(t(assay(x)[select, ]))
>   percentVar <- pca$sdev^2/sum(pca$sdev^2)
>
>   intgroup.df <- as.data.frame(colData(x)[, intgroup, drop = FALSE])
>   group <- factor(apply(intgroup.df, 1, paste, collapse = " : "))
>   d <- data.frame(PC1 = pca$x[, pcX], PC2 = pca$x[, pcY], group = group,
>                   intgroup.df, names = colnames(x))
>   colnames(d)[1] <- paste0("PC",pcX)
>   colnames(d)[2] <- paste0("PC",pcY)
>   if (returnData) {
>     attr(d, "percentVar") <- percentVar[c(pcX, pcY)]
>     return(d)
>   }
>   # clever way of positioning the labels
>   d$hjust <- ifelse((sign(d[,paste0("PC",pcX)])==1),0.9,0.1)
>   g <- ggplot(data = d, aes_string(x = paste0("PC",pcX),
>                                    y = paste0("PC",pcY), color = "group")) +
>     geom_point(size = point_size) +
>     xlab(paste0("PC",pcX,": ", round(percentVar[pcX] * 100,digits = 2),
>                 "% variance")) +
>     ylab(paste0("PC",pcY,": ", round(percentVar[pcY] * 100,digits = 2),
>                 "% variance"))
>   if(text_labels) g <- g + geom_text(mapping =
>     aes(label=names,hjust=hjust, vjust=-0.5), show.legend = FALSE)
>   if(!is.null(title)) g <- g + ggtitle(title)
>   g
> }
>
> pdf("allTCGA_diy.pdf",height=30,width=30)
> pca_d4(se_log2tcga,intgroup="Cance
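
The re-sorting step in the quoted recipe relies on match() to put the
clinical rows in the same order as the count columns. A minimal sketch with
made-up sample IDs and values illustrates the idea and adds an explicit
check that the two objects line up before they are combined. The objects
counts_toy and clin_toy are hypothetical and are not part of GSE62944.

```r
# Toy count matrix: genes in rows, samples in columns (values are made up).
counts_toy <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("S3", "S1", "S2"))
)

# Toy clinical table: one row per sample, deliberately in a different order.
clin_toy <- data.frame(
  CancerType = c("BRCA", "LUAD", "BRCA"),
  row.names = c("S1", "S2", "S3"),
  stringsAsFactors = FALSE
)

# match() gives, for each count column, the position of its clinical row.
idx <- match(colnames(counts_toy), rownames(clin_toy))

# Fail early if any sample in the counts has no clinical record.
stopifnot(!anyNA(idx))

# Reorder the clinical rows to follow the count columns.
clin_sorted <- clin_toy[idx, , drop = FALSE]

# After re-sorting, the sample order must agree exactly.
stopifnot(identical(rownames(clin_sorted), colnames(counts_toy)))
```

Checking with identical() rather than by eye (as with head() in the recipe)
catches a mismatch anywhere in the 9264 samples, not just in the first few.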
[Bioc-devel] ExperimentHub::GSE62944 outdated
Hi,

I would like to do some analysis on the TCGA data as provided in
ExperimentHub's GSE62944 ExpressionSet.

The Description of the dataset reads:
"TCGA re-processed RNA-Seq data from 9264 Tumor Samples and 741 normal
samples across 24 cancer types"

However, when loading the dataset via

> eh <- ExperimentHub()
> query(eh , "GSE62944")
> tcga_data <- eh[["EH1"]]

and counting the samples

> dim(tcga_data)
Features  Samples
   23368     7706

as well as the cancer types

> length(table(pData(tcga_data)[,"CancerType"]))

the results disagree with the description above, which indicates that this
is an outdated version of the dataset.

Is it possible to
(1) update it accordingly, and
(2) include a varLabel, i.e. a pData column indicating whether this is a
tumor or an adjacent normal sample for the respective cancer type?

That would be great!

Thx & Best,
Ludwig

--
Dr. Ludwig Geistlinger
Lehr- und Forschungseinheit für Bioinformatik
Institut für Informatik
Ludwig-Maximilians-Universität München
Amalienstrasse 17, 2. Stock, Büro A201
80333 München

Tel.: 089-2180-4067
eMail: ludwig.geistlin...@bio.ifi.lmu.de
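
One possible way to obtain the requested tumor/normal column is to derive it
from the sample identifiers. The sketch below assumes that the sample names
are standard TCGA barcodes, in which characters 14 and 15 hold the sample
type code (01 to 09 for tumor, 10 to 19 for normal). Whether the GSE62944
identifiers follow this layout is an assumption that should be checked
against the data. The function name sample_type_from_barcode and the example
barcodes are hypothetical.

```r
# Hypothetical helper: classify samples as tumor or normal from the
# two-digit sample type code at positions 14-15 of a TCGA barcode.
sample_type_from_barcode <- function(barcodes) {
  # Non-numeric or missing codes become NA instead of raising a warning.
  code <- suppressWarnings(as.integer(substr(barcodes, 14, 15)))
  type <- rep(NA_character_, length(barcodes))
  type[!is.na(code) & code >= 1 & code <= 9] <- "tumor"
  type[!is.na(code) & code >= 10 & code <= 19] <- "normal"
  factor(type, levels = c("tumor", "normal"))
}

# Made-up barcodes in the standard TCGA layout (not real samples).
barcodes <- c("TCGA-AA-0001-01A-01R-0000-07",
              "TCGA-AA-0001-11A-01R-0000-07")
sample_types <- sample_type_from_barcode(barcodes)
```

If the assumption holds, the result could be attached to the ExpressionSet
with something like
pData(tcga_data)$SampleType <- sample_type_from_barcode(sampleNames(tcga_data)),
and any NA values would flag identifiers that do not match the expected
layout.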