Re: [R] lmer coefficient distributions and p values
Hi Daniel, You might want to review further advances by Doug Bates with lme4 since the post you show in your email. http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3565.html In this thread Doug Bates discusses fitting using maximum likelihood for testing purposes. There is now an anova() method for lmer() and lmer2() fits performed using method=ML. You can compare different models and get p-values for p-value obsessed journals using this approach. I believe I read a thread discussing the use of maximum likelihood fitting for use in ANOVA tests, and then REML fits on final models for better parameter estimates - but I can't find that thread. Hopefully Doug Bates or anyone involved in that level of discussion can chime in. See also for example this wiki http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests and the new listserv at https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models Also check the latest documentation for lme4 and the lmer() and lmer2() functions at http://cran.r-project.org/ in the Packages ... lme4 pages. Hope this helps Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Daniel Lakeland Sent: Wed 8/15/2007 9:34 AM To: r-help@stat.math.ethz.ch Subject: [R] lmer coefficient distributions and p values I am helping my wife do some statistical analysis. She is a biologist, and she has performed some measurements on various genotypes of mice. My background is in applied mathematics and engineering, and I have a fairly good statistics background, but I am by no means a PhD level expert in statistical methods. We have used the lmer package to fit various models for the various experiments that she has done (random effects from multiple measurements for each animal or each trial, and fixed effects from developmental stage, and genotype etc). The results are fairly clear cut to me, and I suggested that she publish the results as coefficient estimates for the relevant contrasts, and their standard error estimates. However, she has read the statistical guidelines for the journal and they insist on p values. I personally think that p values, and sharp-null hypothesis tests are misguided and should be banned from publications, but it doesn't much matter what I think compared to what the editors want. Based on searching the archives, and finding this message: https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html I am aware of the theoretical difficulties with p values from lmer results. I am also aware of the mcmcsamp function which performs some kind of bayesian sampling from the posterior distribution of the coefficients based on some kind of prior (I will need to do some more reading to more fully understand this). Is this the primary way in which we can estimate the distribution of the model coefficients and calculate a p value or a confidence interval? What can I do with the t statistic provided by lmer? If as the above message suggests, we are uncertain of the correct F and by extension t distributions to use, what help are the t statistics? I suppose I could test them against a very low degree of freedom t distribution (say 3) and publish those p values? Again, I'm content to ignore p values and stick to estimates, but the journal isn't. BTW: thanks to all on this list, I've benefitted greatly from R and from the archives of help topics. -- Daniel Lakeland [EMAIL PROTECTED] http://www.street-artists.org/~dlakelan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Systematically biased count data regression model
Hi Matthew, You may be experiencing the classic 'regression towards the mean' phenomenon, in which case shrinkage estimation may help with prediction (extremely low and high values need to be shrunk back towards the mean) Here's a reference that discusses the issue in a manner somewhat related to your situation, and it has plenty of good references Application of Shrinkage Techniques in Logistic Regression Analysis: A Case Study E. W. Steyerberg Statistica Neerlandica, 2001, vol. 55, issue 1, pages 76-88 Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Matthew and Kim Bowser Sent: Thu 8/9/2007 8:43 AM To: r-help@stat.math.ethz.ch Subject: [R] Systematically biased count data regression model Dear all, I am attempting to explain patterns of arthropod family richness (count data) using a regression model. It seems to be able to do a pretty good job as an explanatory model (i.e. demonstrating relationships between dependent and independent variables), but it has systematic problems as a predictive model: It is biased high at low observed values of family richness and biased low at high observed values of family richness (see attached pdf). I have tried diverse kinds of reasonable regression models mostly as in Zeileis, et al. (2007), as well as transforming my variables, both with only small improvements. Do you have suggestions for making a model that would perform better as a predictive model? Thank you for your time. Sincerely, Matthew Bowser STEP student USFWS Kenai National Wildlife Refuge Soldotna, Alaska, USA M.Sc. student University of Alaska Fairbanks Fairbankse, Alaska, USA Reference Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for count data in R. Technical Report 53, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf. Code `data` - structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4, 9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15, 12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13, 1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12, 5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8, 5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8, 7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8, 10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16, 3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12, 16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6, 4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3, 6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3, 0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15, 2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159, 159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161, 175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161, 161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176, 165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165, 165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167, 175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164, 167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173, 178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162, 173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166, 170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164, 170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172, 162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174, 166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174, 172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170, 171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172, 171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160, 160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176, 176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166, 168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166, 166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253, 0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26, 0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194, 0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354, 0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312, 0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122, 0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28, 0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34, 0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221, 0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232, 0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318
Re: [R] FW: Selecting undefined column of a data frame (was[BioC]read.phenoData vs read.AnnotatedDataFrame)
Resolution: To avoid bugs in code due to typos of data frame column names that can occur when using the '$' extractor, foo - data.frame(Filename = c(a, b)) foo$FileName NULL a past alternative was to use foo[, FileName] instead of foo$FileName. However, this too now silently returns NULL. foo[, FileName] NULL A modest and simple modification is to use TRUE for the row index argument. foo[T, FileName] Error in `[.data.frame`(foo, T, FileName) : undefined columns selected An error is issued, and the misspelled column name can more easily be found in debugging the issue. all.equal(foo$Filename, foo[T, Filename]) [1] TRUE The two accessor methods yield the same result when column names are spelled correctly. all.equal(iris$Species, iris[T, Species]) [1] TRUE Other solutions no doubt exist. Currently a single argument to [.data.frame will throw an error if the argument does not match a column name. foo[FileName] Error in `[.data.frame`(foo, FileName) : undefined columns selected sessionInfo() R version 2.5.1 (2007-06-27) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods [7] base Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Steven McKinney Sent: Fri 8/3/2007 11:10 AM To: r-help@stat.math.ethz.ch Subject: Re: [R] FW: Selecting undefined column of a data frame (was[BioC]read.phenoData vs read.AnnotatedDataFrame) I see now that for my example foo - data.frame(Filename = c(a, b)) foo[, FileName] NULL the issue is in this clause of the [.data.frame extractor. The lines if (drop length(y) == 1L) return(.subset2(y, 1L)) return the NULL result just before the error check cols - names(y) if (any(is.na(cols))) stop(undefined columns selected) is performed. Is this intended behaviour, or has a logical bug crept into the [.data.frame extractor? if (missing(i)) { if (missing(j) drop length(x) == 1L) return(.subset2(x, 1L)) y - if (missing(j)) x else .subset(x, j) if (drop length(y) == 1L) return(.subset2(y, 1L)) ## This returns a result before undefined columns check is done. Is this intended? cols - names(y) if (any(is.na(cols))) stop(undefined columns selected) if (any(duplicated(cols))) names(y) - make.unique(cols) nrow - .row_names_info(x, 2L) if (drop !mdrop nrow == 1L) return(structure(y, class = NULL, row.names = NULL)) else return(structure(y, class = oldClass(x), row.names = .row_names_info(x, 0L))) } sessionInfo() R version 2.5.1 (2007-06-27) powerpc-apple-darwin8.9.1 locale: en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: plotrix lme4 Matrix lattice 2.2-3 0.99875-4 0.999375-0 0.16-2 Should this discussion move to R-devel? Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Steven McKinney Sent: Fri 8/3/2007 10:37 AM To: r-help@stat.math.ethz.ch Subject: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame) Hi all, What are current methods people use in R to identify mis-spelled column names when selecting columns from a data frame? Alice Johnson recently tackled this issue (see [BioC] posting below). Due to a mis-spelled column name (FileName instead of Filename) which produced no warning, Alice spent a fair amount of time tracking down this bug. With my fumbling fingers I'll be tracking down such a bug soon too. Is there any options() setting, or debug technique that will flag data frame column extractions that reference a non-existent column? It seems to me that the [.data.frame extractor used to throw an error if given a mis-spelled variable name, and I still see lines of code in [.data.frame such as if (any(is.na(cols))) stop(undefined columns selected) In R 2.5.1 a NULL is silently returned. foo - data.frame(Filename = c(a, b)) foo
[R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)
Hi all, What are current methods people use in R to identify mis-spelled column names when selecting columns from a data frame? Alice Johnson recently tackled this issue (see [BioC] posting below). Due to a mis-spelled column name (FileName instead of Filename) which produced no warning, Alice spent a fair amount of time tracking down this bug. With my fumbling fingers I'll be tracking down such a bug soon too. Is there any options() setting, or debug technique that will flag data frame column extractions that reference a non-existent column? It seems to me that the [.data.frame extractor used to throw an error if given a mis-spelled variable name, and I still see lines of code in [.data.frame such as if (any(is.na(cols))) stop(undefined columns selected) In R 2.5.1 a NULL is silently returned. foo - data.frame(Filename = c(a, b)) foo[, FileName] NULL Has something changed so that the code lines if (any(is.na(cols))) stop(undefined columns selected) in [.data.frame no longer work properly (if I am understanding the intention properly)? If not, could [.data.frame check an options() variable setting (say warn.undefined.colnames) and throw a warning if a non-existent column name is referenced? sessionInfo() R version 2.5.1 (2007-06-27) powerpc-apple-darwin8.9.1 locale: en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: plotrix lme4 Matrix lattice 2.2-3 0.99875-4 0.999375-0 0.16-2 Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Johnstone, Alice Sent: Wed 8/1/2007 7:20 PM To: [EMAIL PROTECTED] Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame For interest sake, I have found out why I wasn't getting my expected results when using read.AnnotatedDataFrame Turns out the error was made in the ReadAffy command, where I specified the filenames to be read from my AnnotatedDataFrame object. There was a typo error with a capital N ($FileName) rather than lowercase n ($Filename) as in my target file..whoops. However this meant the filename argument was ignored without the error message(!) and instead of using the information in the AnnotatedDataFrame object (which included filenames, but not alphabetically) it read the .cel files in alphabetical order from the working directory - hence the wrong file was given the wrong label (given by the order of Annotated object) and my comparisons were confused without being obvious as to why or where. Our solution: specify that filename is as.character so assignment of file to target is correct(after correcting $Filename) now that using read.AnnotatedDataFrame rather than readphenoData. Data-ReadAffy(filenames=as.character(pData(pd)$Filename),phenoData=pd) Hurrah! It may be beneficial to others, that if the filename argument isn't specified, that filenames are read from the phenoData object if included here. Thanks! -Original Message- From: Martin Morgan [mailto:[EMAIL PROTECTED] Sent: Thursday, 26 July 2007 11:49 a.m. To: Johnstone, Alice Cc: [EMAIL PROTECTED] Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame Hi Alice -- Johnstone, Alice [EMAIL PROTECTED] writes: Using R2.5.0 and Bioconductor I have been following code to analysis Affymetrix expression data: 2 treatments vs control. The original code was run last year and used the read.phenoData command, however with the newer version I get the error message Warning messages: read.phenoData is deprecated, use read.AnnotatedDataFrame instead The phenoData class is deprecated, use AnnotatedDataFrame (with ExpressionSet) instead I use the read.AnnotatedDataFrame command, but when it comes to the end of the analysis the comparison of the treatment to the controls gets mixed up compared to what you get using the original read.phenoData ie it looks like the 3 groups get labelled wrong and so the comparisons are different (but they can still be matched up). My questions are, 1) do you need to set up your target file differently when using read.AnnotatedDataFrame - what is the standard format? I can't quite tell where things are going wrong for you, so it would help if you can narrow down where the problem occurs. I think read.AnnotatedDataFrame should be comparable to read.phenoData. Does pData(pd) look right? What about pData(Data) and pData(eset.rma) ? It's not important but pData(pd)$Target is the same as pd$Target. Since the analysis is on eset.rma, it probably makes sense to use the pData from there to construct your design matrix targs-factor
Re: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame)
I see now that for my example foo - data.frame(Filename = c(a, b)) foo[, FileName] NULL the issue is in this clause of the [.data.frame extractor. The lines if (drop length(y) == 1L) return(.subset2(y, 1L)) return the NULL result just before the error check cols - names(y) if (any(is.na(cols))) stop(undefined columns selected) is performed. Is this intended behaviour, or has a logical bug crept into the [.data.frame extractor? if (missing(i)) { if (missing(j) drop length(x) == 1L) return(.subset2(x, 1L)) y - if (missing(j)) x else .subset(x, j) if (drop length(y) == 1L) return(.subset2(y, 1L)) ## This returns a result before undefined columns check is done. Is this intended? cols - names(y) if (any(is.na(cols))) stop(undefined columns selected) if (any(duplicated(cols))) names(y) - make.unique(cols) nrow - .row_names_info(x, 2L) if (drop !mdrop nrow == 1L) return(structure(y, class = NULL, row.names = NULL)) else return(structure(y, class = oldClass(x), row.names = .row_names_info(x, 0L))) } sessionInfo() R version 2.5.1 (2007-06-27) powerpc-apple-darwin8.9.1 locale: en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: plotrix lme4 Matrix lattice 2.2-3 0.99875-4 0.999375-0 0.16-2 Should this discussion move to R-devel? Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Steven McKinney Sent: Fri 8/3/2007 10:37 AM To: r-help@stat.math.ethz.ch Subject: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame) Hi all, What are current methods people use in R to identify mis-spelled column names when selecting columns from a data frame? Alice Johnson recently tackled this issue (see [BioC] posting below). Due to a mis-spelled column name (FileName instead of Filename) which produced no warning, Alice spent a fair amount of time tracking down this bug. With my fumbling fingers I'll be tracking down such a bug soon too. Is there any options() setting, or debug technique that will flag data frame column extractions that reference a non-existent column? It seems to me that the [.data.frame extractor used to throw an error if given a mis-spelled variable name, and I still see lines of code in [.data.frame such as if (any(is.na(cols))) stop(undefined columns selected) In R 2.5.1 a NULL is silently returned. foo - data.frame(Filename = c(a, b)) foo[, FileName] NULL Has something changed so that the code lines if (any(is.na(cols))) stop(undefined columns selected) in [.data.frame no longer work properly (if I am understanding the intention properly)? If not, could [.data.frame check an options() variable setting (say warn.undefined.colnames) and throw a warning if a non-existent column name is referenced? sessionInfo() R version 2.5.1 (2007-06-27) powerpc-apple-darwin8.9.1 locale: en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: plotrix lme4 Matrix lattice 2.2-3 0.99875-4 0.999375-0 0.16-2 Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Johnstone, Alice Sent: Wed 8/1/2007 7:20 PM To: [EMAIL PROTECTED] Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame For interest sake, I have found out why I wasn't getting my expected results when using read.AnnotatedDataFrame Turns out the error was made in the ReadAffy command, where I specified the filenames to be read from my AnnotatedDataFrame object. There was a typo error with a capital N ($FileName) rather than lowercase n ($Filename) as in my target file..whoops. However this meant the filename argument was ignored without the error message(!) and instead of using the information in the AnnotatedDataFrame object (which included filenames, but not alphabetically) it read the .cel files in alphabetical order from the working directory - hence the wrong file was given the wrong label (given by the order
Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)
Thanks Prof Ripley, I used double indexing (if I understand the doc correctly) so my call was foo[, FileName] I traced through each line of `[.data.frame` following the sequence of commands executed for my call. In the code section if (missing(i)) { if (missing(j) drop length(x) == 1L) return(.subset2(x, 1L)) y - if (missing(j)) x else .subset(x, j) if (drop length(y) == 1L) return(.subset2(y, 1L)) ## This returns a result before undefined columns check is done. Is this intended? cols - names(y) if (any(is.na(cols))) stop(undefined columns selected) if (any(duplicated(cols))) names(y) - make.unique(cols) nrow - .row_names_info(x, 2L) if (drop !mdrop nrow == 1L) return(structure(y, class = NULL, row.names = NULL)) else return(structure(y, class = oldClass(x), row.names = .row_names_info(x, 0L))) } the return happened after execution of if (drop length(y) == 1L) return(.subset2(y, 1L)) before the check on column names. Shouldn't the check on column names cols - names(y) if (any(is.na(cols))) stop(undefined columns selected) occur before if (drop length(y) == 1L) return(.subset2(y, 1L)) rather than after? -Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Fri 8/3/2007 12:25 PM To: Steven McKinney Cc: r-help@stat.math.ethz.ch Subject: Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame) You are reading the wrong part of the code for your argument list: foo[FileName] Error in `[.data.frame`(foo, FileName) : undefined columns selected [.data.frame is one of the most complex functions in R, and does many different things depending on which arguments are supplied. On Fri, 3 Aug 2007, Steven McKinney wrote: Hi all, What are current methods people use in R to identify mis-spelled column names when selecting columns from a data frame? Alice Johnson recently tackled this issue (see [BioC] posting below). Due to a mis-spelled column name (FileName instead of Filename) which produced no warning, Alice spent a fair amount of time tracking down this bug. With my fumbling fingers I'll be tracking down such a bug soon too. Is there any options() setting, or debug technique that will flag data frame column extractions that reference a non-existent column? It seems to me that the [.data.frame extractor used to throw an error if given a mis-spelled variable name, and I still see lines of code in [.data.frame such as if (any(is.na(cols))) stop(undefined columns selected) In R 2.5.1 a NULL is silently returned. foo - data.frame(Filename = c(a, b)) foo[, FileName] NULL Has something changed so that the code lines if (any(is.na(cols))) stop(undefined columns selected) in [.data.frame no longer work properly (if I am understanding the intention properly)? If not, could [.data.frame check an options() variable setting (say warn.undefined.colnames) and throw a warning if a non-existent column name is referenced? sessionInfo() R version 2.5.1 (2007-06-27) powerpc-apple-darwin8.9.1 locale: en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: plotrix lme4 Matrix lattice 2.2-3 0.99875-4 0.999375-0 0.16-2 Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Johnstone, Alice Sent: Wed 8/1/2007 7:20 PM To: [EMAIL PROTECTED] Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame For interest sake, I have found out why I wasn't getting my expected results when using read.AnnotatedDataFrame Turns out the error was made in the ReadAffy command, where I specified the filenames to be read from my AnnotatedDataFrame object. There was a typo error with a capital N ($FileName) rather than lowercase n ($Filename) as in my target file..whoops. However this meant the filename argument was ignored without the error message(!) and instead of using the information in the AnnotatedDataFrame object (which included filenames, but not alphabetically) it read the .cel files in alphabetical order from the working directory - hence the wrong file was given the wrong label (given by the order of Annotated object) and my comparisons were confused without being obvious as to why or where. Our solution: specify that filename is as.character so assignment
Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)
-Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Fri 8/3/2007 1:05 PM To: Steven McKinney Cc: r-help@stat.math.ethz.ch Subject: Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame) I've since seen your followup a more detailed explanation may help. The path through the code for your argument list does not go where you quoted, and there is a reason for it. Using a copy of [.data.frame with browser() I have traced the flow of execution. (My copy with the browser command is at the end of this email) foo[, FileName] Called from: `[.data.frame`(foo, , FileName) Browse[1] n debug: mdrop - missing(drop) Browse[1] n debug: Narg - nargs() - (!mdrop) Browse[1] n debug: if (Narg 3) { if (!mdrop) warning(drop argument will be ignored) if (missing(i)) return(x) if (is.matrix(i)) return(as.matrix(x)[i]) y - NextMethod([) cols - names(y) if (!is.null(cols) any(is.na(cols))) stop(undefined columns selected) if (any(duplicated(cols))) names(y) - make.unique(cols) return(structure(y, class = oldClass(x), row.names = .row_names_info(x, 0L))) } Browse[1] n debug: if (missing(i)) { if (missing(j) drop length(x) == 1L) return(.subset2(x, 1L)) y - if (missing(j)) x else .subset(x, j) if (drop length(y) == 1L) return(.subset2(y, 1L)) cols - names(y) if (any(is.na(cols))) stop(undefined columns selected) if (any(duplicated(cols))) names(y) - make.unique(cols) nrow - .row_names_info(x, 2L) if (drop !mdrop nrow == 1L) return(structure(y, class = NULL, row.names = NULL)) else return(structure(y, class = oldClass(x), row.names = .row_names_info(x, 0L))) } Browse[1] n debug: if (missing(j) drop length(x) == 1L) return(.subset2(x, 1L)) Browse[1] n debug: y - if (missing(j)) x else .subset(x, j) Browse[1] n debug: if (drop length(y) == 1L) return(.subset2(y, 1L)) Browse[1] n NULL So `[.data.frame` is exiting after executing + if (drop length(y) == 1L) + return(.subset2(y, 1L)) ## This returns a result before undefined columns check is done. Is this intended? Couldn't the error check + cols - names(y) + if (any(is.na(cols))) + stop(undefined columns selected) be done before the above return()? What would break if the error check on column names was done before returning a NULL result due to incorrect column name spelling? Why should foo[, FileName] NULL differ from foo[seq(nrow(foo)), FileName] Error in `[.data.frame`(foo, seq(nrow(foo)), FileName) : undefined columns selected Thank you for your explanations. Generally when you extract in R and ask for an non-existent index you get NA or NULL as the result (and no warning), e.g. y - list(x=1, y=2) y[[z]] NULL Because data frames 'must' have (column) names, they are a partial exception and when the result is a data frame you get an error if it would contain undefined columns. But in the case of foo[, FileName], the result is a single column and so will not have a name: there seems no reason to be different from foo[[FileName]] NULL foo$FileName NULL which similarly select a single column. At one time they were different in R, for no documented reason. On Fri, 3 Aug 2007, Prof Brian Ripley wrote: You are reading the wrong part of the code for your argument list: foo[FileName] Error in `[.data.frame`(foo, FileName) : undefined columns selected [.data.frame is one of the most complex functions in R, and does many different things depending on which arguments are supplied. On Fri, 3 Aug 2007, Steven McKinney wrote: Hi all, What are current methods people use in R to identify mis-spelled column names when selecting columns from a data frame? Alice Johnson recently tackled this issue (see [BioC] posting below). Due to a mis-spelled column name (FileName instead of Filename) which produced no warning, Alice spent a fair amount of time tracking down this bug. With my fumbling fingers I'll be tracking down such a bug soon too. Is there any options() setting, or debug technique that will flag data frame column extractions that reference a non-existent column? It seems to me that the [.data.frame extractor used to throw an error if given a mis-spelled variable name, and I still see lines of code in [.data.frame such as if (any(is.na(cols))) stop(undefined columns selected) In R 2.5.1 a NULL is silently returned. foo - data.frame(Filename = c(a, b)) foo[, FileName] NULL Has something changed so that the code lines if (any(is.na(cols
Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)
What would break is that three methods for doing the same thing would give different answers. Please do have the courtesy to actually read the detailed explanation you are given. Sorry Prof. Ripley, I am attempting to read carefully, as this issue has deeper coding/debugging implications, and as you point out, [.data.frame is one of the most complex functions in R so please bear with me. This change in behaviour has taken away a side-effect debugging tool, discussed below. On Fri, 3 Aug 2007, Steven McKinney wrote: -Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Fri 8/3/2007 1:05 PM To: Steven McKinney Cc: r-help@stat.math.ethz.ch Subject: Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame) I've since seen your followup a more detailed explanation may help. The path through the code for your argument list does not go where you quoted, and there is a reason for it. Generally when you extract in R and ask for an non-existent index you get NA or NULL as the result (and no warning), e.g. y - list(x=1, y=2) y[[z]] NULL Because data frames 'must' have (column) names, they are a partial exception and when the result is a data frame you get an error if it would contain undefined columns. But in the case of foo[, FileName], the result is a single column and so will not have a name: there seems no reason to be different from foo[[FileName]] NULL foo$FileName NULL which similarly select a single column. At one time they were different in R, for no documented reason. This difference provided a side-effect debugging tool, in that where bar - foo[, FileName] used to throw an error, alerting as to a typo, it now does not. Having been burned by NULL results due to typos in code lines using the $ extractor such as bar - foo$FileName I learned to use bar - foo[, FileName] to help cut down on typo bugs. With the ubiquity of camelCase object names, this is a constant typing bug hazard. I am wondering what to do now to double check spelling when accessing columns of a dataframe. If [.data.frame stays as is, can a debug mechanism be implemented in R that forces strict adherence to existing list names in debug mode? This would also help debug typos in camelCase names when using the $ and [[ extractors and accessors. Are there other debugging tools already in R that can help point out such camelCase list element name typos? On Fri, 3 Aug 2007, Prof Brian Ripley wrote: You are reading the wrong part of the code for your argument list: foo[FileName] Error in `[.data.frame`(foo, FileName) : undefined columns selected [.data.frame is one of the most complex functions in R, and does many different things depending on which arguments are supplied. On Fri, 3 Aug 2007, Steven McKinney wrote: Hi all, What are current methods people use in R to identify mis-spelled column names when selecting columns from a data frame? Alice Johnson recently tackled this issue (see [BioC] posting below). Due to a mis-spelled column name (FileName instead of Filename) which produced no warning, Alice spent a fair amount of time tracking down this bug. With my fumbling fingers I'll be tracking down such a bug soon too. Is there any options() setting, or debug technique that will flag data frame column extractions that reference a non-existent column? It seems to me that the [.data.frame extractor used to throw an error if given a mis-spelled variable name, and I still see lines of code in [.data.frame such as if (any(is.na(cols))) stop(undefined columns selected) In R 2.5.1 a NULL is silently returned. foo - data.frame(Filename = c(a, b)) foo[, FileName] NULL Has something changed so that the code lines if (any(is.na(cols))) stop(undefined columns selected) in [.data.frame no longer work properly (if I am understanding the intention properly)? If not, could [.data.frame check an options() variable setting (say warn.undefined.colnames) and throw a warning if a non-existent column name is referenced? sessionInfo() R version 2.5.1 (2007-06-27) powerpc-apple-darwin8.9.1 locale: en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: plotrix lme4 Matrix lattice 2.2-3 0.99875-4 0.999375-0 0.16-2 Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -- Brian D. Ripley
Re: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame)
Hi Bert, -Original Message- From: Bert Gunter [mailto:[EMAIL PROTECTED] Sent: Fri 8/3/2007 3:19 PM To: Steven McKinney; r-help@stat.math.ethz.ch Subject: RE: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame) I suspect you'll get some creative answers, but if all you're worried about is whether a column exists before you do something with it, what's wrong with: nm - ... ## a character vector of names if(!all(nm %in% names(yourdata))) ## complain else ## do something I think this is called defensive programming. This is a good example of good defensive programming. I do indeed check variable/object names whenever obtaining them from an external source (user input, file input, a list in code). I was able to practice a defensive programming style in the past by using bar - foo[, FileName] instead of bar - foo$FileName but this has changed recently, so I need to figure out some other mechanisms. R is such a productive language, but this change will lead many of us to chase elusive typos that used to get revealed. I'm hoping that some kind of explicit data frame variable checking mechanism might be introduced since we've lost this one. It would also be great to have such a mechanism to help catch list access and extraction errors. Why should foo$FileName always quietly return NULL? I'm not sure why the following incongruity is okay. foo - matrix(1:4, nrow = 2) dimnames(foo) - list(NULL, c(a, b)) bar - foo[, A] Error: subscript out of bounds foo.df - as.data.frame(foo) foo.df a b 1 1 3 2 2 4 bar - foo.df[, A] bar NULL It is a lot of extra typing to wrap every command in extra code, but more of that will need to happen going forward. Steve McKinney Bert Gunter Genentech -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steven McKinney Sent: Friday, August 03, 2007 10:38 AM To: r-help@stat.math.ethz.ch Subject: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame) Hi all, What are current methods people use in R to identify mis-spelled column names when selecting columns from a data frame? Alice Johnson recently tackled this issue (see [BioC] posting below). Due to a mis-spelled column name (FileName instead of Filename) which produced no warning, Alice spent a fair amount of time tracking down this bug. With my fumbling fingers I'll be tracking down such a bug soon too. Is there any options() setting, or debug technique that will flag data frame column extractions that reference a non-existent column? It seems to me that the [.data.frame extractor used to throw an error if given a mis-spelled variable name, and I still see lines of code in [.data.frame such as if (any(is.na(cols))) stop(undefined columns selected) In R 2.5.1 a NULL is silently returned. foo - data.frame(Filename = c(a, b)) foo[, FileName] NULL Has something changed so that the code lines if (any(is.na(cols))) stop(undefined columns selected) in [.data.frame no longer work properly (if I am understanding the intention properly)? If not, could [.data.frame check an options() variable setting (say warn.undefined.colnames) and throw a warning if a non-existent column name is referenced? sessionInfo() R version 2.5.1 (2007-06-27) powerpc-apple-darwin8.9.1 locale: en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: plotrix lme4 Matrix lattice 2.2-3 0.99875-4 0.999375-0 0.16-2 Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Johnstone, Alice Sent: Wed 8/1/2007 7:20 PM To: [EMAIL PROTECTED] Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame For interest sake, I have found out why I wasn't getting my expected results when using read.AnnotatedDataFrame Turns out the error was made in the ReadAffy command, where I specified the filenames to be read from my AnnotatedDataFrame object. There was a typo error with a capital N ($FileName) rather than lowercase n ($Filename) as in my target file..whoops. However this meant the filename argument was ignored without the error message(!) and instead of using the information in the AnnotatedDataFrame object (which included filenames, but not alphabetically) it read the .cel files in alphabetical order from the working directory - hence the wrong file was given the wrong label
Re: [R] Extracting a website text content using R
-Original Message- From: [EMAIL PROTECTED] on behalf of Am Stat Sent: Wed 8/1/2007 2:19 PM To: r-help@stat.math.ethz.ch Subject: [R] Extracting a website text content using R Dear useR, Just wandering whether it is possible that there is any function in R could let me get the text contents for a certain website. Thanks a lot! Best, Leon Is this what you had in mind? foo - scan(url(http://cran.r-project.org/;), what = character) Read 69 items paste(unlist(foo), collapse = ) [1] !DOCTYPE HTML PUBLIC -//IETF//DTD HTML//EN html head titleThe Comprehensive R Archive Network/title link rel=\icon\ href=\favicon.ico\ type=\image/x-icon\ link rel=\shortcut icon\ href=\favicon.ico\ type=\image/x-icon\ link rel=\stylesheet\ type=\text/css\ href=\R.css\ /head FRAMESET cols=\1*, 4*\ border=0 FRAMESET rows=\120, 1*\ FRAME src=\logo.html\ name=\logo\ frameborder=0 FRAME src=\navbar.html\ name=\contents\ frameborder=0 /FRAMESET FRAME src=\banner.shtml\ name=\banner\ frameborder=0 noframes h1The Comprehensive R Archive Network/h1 Your browser seems not to support frames, here is the A href=\navbar.html\contents page/A of CRAN. /noframes /FRAMESET Try the search phrase cran scan url in Google for more hits on info about R functions that can deal with URLs. In R try apropos(URL) [1] contourLines URLdecode URLencode browseURL contrib.urlmain.help.url url.show [8] loadURLread.table.url scan.url source.url url SteveM __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Design matrix question
Hi useRs, Perhaps I am having a senior moment? I have a nested variable situation to model, toy example: df - data.frame(A = factor(c(a, a, x, x), levels = c(x, a)), + B = factor(c(b, x, x, x), levels = c(x, b))) df A B 1 a b 2 a x 3 x x 4 x x So of course the full design matrix is singular model.matrix(~ A * B, df) (Intercept) Aa Bb Aa:Bb 1 1 1 1 1 2 1 1 0 0 3 1 0 0 0 4 1 0 0 0 attr(,assign) [1] 0 1 2 3 attr(,contrasts) attr(,contrasts)$A [1] contr.treatment attr(,contrasts)$B [1] contr.treatment I'd like not to have the B term main effect in the model model.matrix(~ A * B - B, df) (Intercept) Aa Ax:Bb Aa:Bb 1 1 1 0 1 2 1 1 0 0 3 1 0 0 0 4 1 0 0 0 attr(,assign) [1] 0 1 2 2 attr(,contrasts) attr(,contrasts)$A [1] contr.treatment attr(,contrasts)$B [1] contr.treatment How do I get model.matrix to not add that column of zeroes? Why does model.matrix add that column of zeroes? Is this a bug, or a senior moment? Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reodering factor
One way to reorder a factor is to define a new factor and specify the order of levels using the levels argument of the factor() function. The first category specified for the levels argument will be the reference category in model fits such as with lm(). mydata - data.frame(y = c(runif(10), runif(10) + 10), grp = c(rep(A, 10), rep(B, 10))) mydata y grp 1 0.0808684 A 2 0.2930649 A 3 0.4671063 A 4 0.7815386 A 5 0.5360262 A 6 0.8092338 A 7 0.9965648 A 8 0.3549031 A 9 0.3426956 A 10 0.2988377 A 11 10.6528479 B 12 10.7118101 B 13 10.4484731 B 14 10.9638309 B 15 10.7650812 B 16 10.6355089 B 17 10.7003755 B 18 10.2147930 B 19 10.8901356 B 20 10.6319798 B lm(y ~ grp, data = mydata) Call: lm(formula = y ~ grp, data = mydata) Coefficients: (Intercept) grpB 0.4961 10.1654 mydata$grp2 - factor(mydata$grp, levels = c(B, A)) mydata y grp grp2 1 0.0808684 AA 2 0.2930649 AA 3 0.4671063 AA 4 0.7815386 AA 5 0.5360262 AA 6 0.8092338 AA 7 0.9965648 AA 8 0.3549031 AA 9 0.3426956 AA 10 0.2988377 AA 11 10.6528479 BB 12 10.7118101 BB 13 10.4484731 BB 14 10.9638309 BB 15 10.7650812 BB 16 10.6355089 BB 17 10.7003755 BB 18 10.2147930 BB 19 10.8901356 BB 20 10.6319798 BB lm(y ~ grp2, data = mydata) Call: lm(formula = y ~ grp2, data = mydata) Coefficients: (Intercept)grp2A 10.66 -10.17 Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of John Sorkin Sent: Thu 5/3/2007 7:10 PM To: r-help@stat.math.ethz.ch Subject: [R] reodering factor R 2.4.1 Windows XP How does one reorder a factor? I have the following data: factor(data$Group) [1] ZZ ZT ZT ZZ ZZ ZT ZZ ZZ ZT ZT ZT ZT ZZ ZT ZT ZZ ZT ZZ ZT ZZ ZT ZT ZZ ZZ ZT ZZ ZT ZZ ZT ZZ ZZ ZT ZZ ZT Levels: ZT ZZ In my regression (i.e. lm(y~data$Group) ZT is taken as the reference category and I get an estimate for ZZ. I would like ZZ to be the reference category and obtain an estimate for ZT. Thank, John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Confidentiality Statement: This email message, including any attachments, is for the so...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Computing an ordering on subsets of a data frame
Hi Lukas, Using by() or its cousins tapply() etc. is tricky, as you need to properly merge results back into X. You can do that by adding a key ID variable to X, and carrying along that key ID variable in calls to by() etc., though I haven't tested out a method. You can also create a new column in X to hold the results, and then sort the subsections of X in a for() loop. X - data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3)) X A B 1 1 2 2 1 3 3 1 4 4 2 3 5 2 1 6 2 1 7 3 2 8 3 1 9 3 3 X$C - rep(as.numeric(NA), nrow(X)) sortLevels - unique(X$A) for(i in seq(along = sortLevels)) { + sortIdxp - X$A == sortLevels[i] + X$C[sortIdxp] - rank(X$B[sortIdxp], ties.method = random) + } X A B C 1 1 2 1 2 1 3 2 3 1 4 3 4 2 3 3 5 2 1 1 6 2 1 2 7 3 2 2 8 3 1 1 9 3 3 3 Merging results back in after using tapply() or by() is harder if your data frame is in random order, but the for() loop approach with indexing still works fine. set.seed(123) Y - X[sample(9), ] Y A B C 3 1 4 3 7 3 2 2 9 3 3 3 6 2 1 2 5 2 1 1 1 1 2 1 2 1 3 2 8 3 1 1 4 2 3 3 Y$C - rep(as.numeric(NA), nrow(Y)) sortLevels - unique(Y$A) ## You can also use levels() instead of unique() if Y$A is a factor. for(i in seq(along = sortLevels)) { + sortIdxp - Y$A == sortLevels[i] + Y$C[sortIdxp] - rank(Y$B[sortIdxp], ties.method = random) + } Y A B C 3 1 4 3 7 3 2 2 9 3 3 3 6 2 1 2 5 2 1 1 1 1 2 1 2 1 3 2 8 3 1 1 4 2 3 3 oY - order(Y$A) Y[oY,] A B C 3 1 4 3 1 1 2 1 2 1 3 2 6 2 1 2 5 2 1 1 4 2 3 3 7 3 2 2 9 3 3 3 8 3 1 1 HTH Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] [mailto:r-help- [EMAIL PROTECTED] On Behalf Of Lukas Biewald Sent: Wednesday, April 18, 2007 2:49 PM To: r-help@stat.math.ethz.ch Subject: [R] Computing an ordering on subsets of a data frame If I have a data frame X that looks like this: A B - - 1 2 1 3 1 4 2 3 2 1 2 1 3 2 3 1 3 3 and I want to make another column which has the rank of B computed separately for each value of A. I.e. something like: A B C - - - 1 2 1 1 3 2 1 4 3 2 3 3 2 1 1 2 1 2 3 2 2 3 1 1 3 3 3 by(X, X[,1], function(x) { rank(x[,1], ties.method=random) } ) almost seems to work, but the data is not in a frame, and I can't figure out how to merge it back into X properly. Thanks, Lukas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extracting intercept from ppr fit
Hi Vadim, Estimates of the alpha_0 terms in MASS are the $yb component of the object returned by ppr(). As I understand it, the original PPR algorithm assumes the response variable(s) are centered, so the 'alpha_0' term in MASS is just the mean of the response if the user does not center the responses. Your actual 'a' term will not appear in the output of ppr. n - 1000 data - data.frame(x= rnorm (n), y= rnorm (n)) a - 10 data$z - evalq(a + atan (x + y) + rnorm (n), data) data.ppr - ppr(z ~ x + y, data=data, nterms =1) ## how to extract a = 10 from data.ppr? data.ppr$yb [1] 9.973964 a - 210 data$z - evalq(a + atan (x + y) + rnorm (n), data) data.ppr - ppr(z ~ x + y, data=data, nterms =1) data.ppr$yb [1] 209.9773 HTH Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Vadim Ogranovich Sent: Tue 4/17/2007 1:06 PM To: r-help@stat.math.ethz.ch Subject: Re: [R] extracting intercept from ppr fit Sorry for triple-posting : I seem to have a problem w/ my mail client. Hi, Is there a way, documented or not, to extract the intercept term (the alpha_0 the MASS book) from a ppr() (Projection Persuit Regression) fit? Thanks, Vadim ## Example: n - 1000 data - data.frame(x= rnorm (n), y= rnorm (n)) a - 10 data$z - evalq(a + atan (x + y) + rnorm (n), data) data.ppr - ppr(z ~ x + y, data=data, nterms =1) ## how to extract a = 10 from data.ppr? [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extracting approximate Wald test (Chisq) fromcoxph(..frailty)
$assign2 print2 - NULL for (i in 1:nterms) { kk - alist[[i]] if (print.map[i] 0) { j - print.map[i] if (pterms[i] == 2) temp - (object$printfun[[j]])(object$frail, object$fvar, , object$df[i], object$history[[j]]) else temp - (object$printfun[[j]])(coef[kk], object$var[kk, kk], object$var2[kk, kk], object$df[i], object$history[[j]]) print1 - rbind(print1, temp$coef) if (is.matrix(temp$coef)) { xx - dimnames(temp$coef)[[1]] if (is.null(xx)) xx - rep(names(pterms)[i], nrow(temp$coef)) else xx - paste(names(pterms)[i], xx, sep = , ) pname1 - c(pname1, xx) } else pname1 - c(pname1, names(pterms)[i]) print2 - c(print2, temp$history) } else if (terms length(kk) 1) { pname1 - c(pname1, names(pterms)[i]) temp - coxph.wtest(object$var[kk, kk], coef[kk])$test print1 - rbind(print1, c(NA, NA, NA, temp, object$df[i], 1 - pchisq(temp, 1))) } else { pname1 - c(pname1, names(coef)[kk]) tempe - (diag(object$var))[kk] temp - coef[kk]^2/tempe print1 - rbind(print1, cbind(coef[kk], sqrt(tempe), sqrt((diag(object$var2))[kk]), temp, 1, 1 - pchisq(temp, 1))) } } temp - cbind(format(print1[, 1]), format(print1[, 2]), format(print1[, 3]), format(round(print1[, 4], 2)), format(round(print1[, 5], 2)), format(signif(print1[, 6], 2))) temp - ifelse(is.na(print1), , temp) dimnames(temp) - list(substring(pname1, 1, maxlabel), c(coef, se(coef), se2, Chisq, DF, p)) prmatrix(temp, quote = FALSE) if (conf.int length(coef) 0) { z - qnorm((1 + conf.int)/2, 0, 1) coef - coef * scale se - se * scale tmp - cbind(exp(coef), exp(-coef), exp(coef - z * se), exp(coef + z * se)) dimnames(tmp) - list(substring(names(coef), 1, maxlabel), c(exp(coef), exp(-coef), paste(lower ., round(100 * conf.int, 2), sep = ), paste(upper ., round(100 * conf.int, 2), sep = ))) cat(\n) prmatrix(tmp) } logtest - -2 * (object$loglik[1] - object$loglik[2]) sctest - object$score cat(\nIterations:, object$iter[1], outer,, object$iter[2], Newton-Raphson\n) if (length(print2)) { for (i in 1:length(print2)) cat(, print2[i], \n) } if (is.null(object$df)) df - sum(!is.na(coef)) else df - round(sum(object$df), 2) cat(Degrees of freedom for terms=, format(round(object$df, 1)), \n) cat(Rsquare=, format(round(1 - exp(-logtest/object$n), 3)), (max possible=, format(round(1 - exp(2 * object$loglik[1]/object$n), 3)), )\n) cat(Likelihood ratio test= , format(round(logtest, 2)), on , df, df,,p=, format(1 - pchisq(logtest, df)), \n, sep = ) if (!is.null(object$wald.test)) cat(Wald test= , format(round(object$wald.test, 2)), on , df, df, p=, format(1 - pchisq(object$wald.test, df)), sep = ) if (!is.null(object$score)) cat(\nScore (logrank) test = , format(round(sctest, 2)), on , df, df,,p=, format(1 - pchisq(sctest, df)), sep = ) if (is.null(object$rscore)) cat(\n) else cat(, Robust = , format(round(object$rscore, 2)), p=, format(1 - pchisq(object$rscore, df)), \n, sep = ) invisible(return(list(temp = temp, tmp = tmp))) } Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Mohammad Ehsanul Karim Sent: Tue 4/17/2007 2:55 PM To: r-help@stat.math.ethz.ch Subject: Re: [R] Extracting approximate Wald test (Chisq) fromcoxph(..frailty) Dear list, I need to extract the approximate Wald test (Chisq) so that I can put it in a loop. str seemed like a great idea, but I cannot seem to find the approximate Wald test for frailty (in the example data below: 17.89 and its p-value 0.12000) there. I cannot seem to find it in capture.output either as numeric form. Do I need to modify some given values? If yes, please give me a clue for the example: library(survival) kfitm1-coxph(formula = Surv(time, status) ~ age + sex +disease + frailty(id, dist = gauss), data = kidney) str(kfitm1) capture.output( print(kfitm1) ) Mohammad Ehsanul Karim (R - 2.3.1 on windows) wildscop at yahoo dot com Institute of Statistical Research and Training University of Dhaka
Re: [R] problem with RCurl install on Unix
I get the same problem, and haven't figured it out yet. Is it a 32bit/64bit clash? (Similarly, I don't have RMySQL up and running cleanly.) library(RCurl) Error in dyn.load(x, as.logical(local), as.logical(now)) : unable to load shared library '/share/apps/R/R-2.4.1/library/RCurl/libs/RCurl.so': libcurl.so.4: cannot open shared object file: No such file or directory Error: package/namespace load failed for 'RCurl' sessionInfo() R version 2.4.1 (2006-12-18) x86_64-unknown-linux-gnu locale: LC_CTYPE=en_US.iso885915;LC_NUMERIC=C;LC_TIME=en_US.iso885915;LC_COLLATE=en_US.iso885915;LC_MONETARY=en_US.iso885915;LC_MESSAGES=en_US.iso885915;LC_PAPER=en_US.iso885915;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.iso885915;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods [7] base other attached packages: DBI 0.1-12 Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Mark W Kimpel Sent: Wed 3/21/2007 3:09 PM To: r-help@stat.math.ethz.ch Subject: [R] problem with RCurl install on Unix I am having trouble getting an install of RCurl to work properly on a Unix server. The steps I have taken are: 1. installed cUrl from source without difficulty 2. installed RCurl from source using the command ~/R_HOME/R-devel/bin/R CMD INSTALL -l ~/R_HOME/R-devel/library ~/RCurl_0.8-0.tar.gz I received no errors during this install 3. when I go back to R and require(RCurl), I get require(RCurl) Loading required package: RCurl Error in dyn.load(x, as.logical(local), as.logical(now)) : unable to load shared library '/N/u/mkimpel/BigRed/R_HOME/R-devel/library/RCurl/libs/RCurl.so': libcurl.so.4: cannot open shared object file: No such file or directory °1§ FALSE Outside of R I get mkimpelàBigRed:¨/R_HOME/R-devel/library/RCurl/libs ls RCurl.so I can cat into RCurl and I have even done chmod a+x RCurl.so in case there was a problem with permission for R to open the file. Below is my sessionInfo. Thanks, Mark sessionInfo() R version 2.5.0 Under development (unstable) (2007-03-11 r40824) powerpc64-unknown-linux-gnu locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C attached base packages: °1§ stats graphics grDevices datasets utils tools °7§ methods base other attached packages: limma affyaffyio Biobase 2.9.13 1.13.14 1.3.3 1.13.39 -- Mark W. Kimpel MD Neuroinformatics Department of Psychiatry Indiana University School of Medicine __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] replacing all NA's in a dataframe with zeros...
Since you can index a matrix or dataframe with a matrix of logicals, you can use is.na() to index all the NA locations and replace them all with 0 in one command. mydata.df - as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 30, replace = TRUE), nrow = 6)) mydata.df V1 V2 V3 V4 V5 1 1 NA 1 1 1 2 1 NA NA NA 1 3 NA NA 1 NA NA 4 NA NA NA NA 1 5 NA 1 NA NA 1 6 1 NA NA 1 1 is.na(mydata.df) V1V2V3V4V5 1 FALSE TRUE FALSE FALSE FALSE 2 FALSE TRUE TRUE TRUE FALSE 3 TRUE TRUE FALSE TRUE TRUE 4 TRUE TRUE TRUE TRUE FALSE 5 TRUE FALSE TRUE TRUE FALSE 6 FALSE TRUE TRUE FALSE FALSE mydata.df[is.na(mydata.df)] - 0 mydata.df V1 V2 V3 V4 V5 1 1 0 1 1 1 2 1 0 0 0 1 3 0 0 1 0 0 4 0 0 0 0 1 5 0 1 0 0 1 6 1 0 0 1 1 Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of David L. Van Brunt, Ph.D. Sent: Wed 3/14/2007 5:22 PM To: R-Help List Subject: [R] replacing all NA's in a dataframe with zeros... I've seen how to replace the NA's in a single column with a data frame * mydata$ncigs[is.na(mydata$ncigs)]-0 *But this is just one column... I have thousands of columns (!) that I need to do this, and I can't figure out a way, outside of the dreaded loop, do replace all NA's in an entire data frame (all vars) without naming each var separately. Yikes. I'm racking my brain on this, seems like I must be staring at the obvious, but it eludes me. Searches have come up CLOSE, but not quite what I need.. Any pointers? -- --- David L. Van Brunt, Ph.D. mailto:[EMAIL PROTECTED] If Tyranny and Oppression come to this land, it will be in the guise of fighting a foreign enemy. --James Madison [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dendrogram - got it , just need to label :)
Here is one example of labeling nodes, borrowing code from the help page for the dendrapply() function. local({ edgeLab - function(n) { if(!is.leaf(n)) { a - attributes(n) i - i+1 attr(n, edgetext) - format(i) } n } i - 0 }) dL - dendrapply(as.dendrogram(hclust(dist(iris[, 1:4]), method = single)), edgeLab) plot(dL) This labels the edges above the nodes. Martin Maechler and Robert Gentleman are developing the dendrogram objects suite of functions. As I have had to label nodes in S-PLUS, I'd like to put in a request for a few more control parameters for edge/internal node labeling control: - Allow the label without the polygon surrounding it. The polygon can obliterate too much of the dendrogram for larger sample sizes. Perhaps an edgePar polygon plot logical p.plot taking values TRUE (default) and FALSE to omit the polygon. - Allow the label to appear near the node at the base of the edge. Perhaps an edgePar text location parameter t.pos taking values in (0.0, 1.0) where 0.5 is in the middle of the edge (the default) and 1.0 is at the base of the edge. Since clusters are not always identified by 'cutting' the dendrogram (e.g. in the iris single linkage dendrogram plot we want to identify internal nodes that are large runts or whose vertical edge lengths are considerably longer than average) it is useful to be able to identify nodes deeper in the tree. This is aided by having access to internal node labels/names and being able to extract internal nodes by those labels/names. Best Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of bunny , lautloscrew.com Sent: Fri 3/9/2007 2:02 PM To: R-help@stat.math.ethz.ch Subject: [R] dendrogram - got it , just need to label :) Hi all, Hi Gavin, thx for your help i finally found out what i want to do and how to fix it. just needed to get some more level my cut level was too small... two question remain... a) can i somehow scale the twigs after cutting ? b) how can i label the nodes and how to label which one... thx !! -m. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Double-banger function names: preferences and suggestions
The underscore versus left-arrow conundrum has its roots in the evolution of ASCII during the middle of the last century. Some Teletype machines in the 1970s (when the S language was being developed) still had a left arrow, and its ASCII code was used in S as a one keystroke convenience for the assignment operator. The left arrow symbol was then removed from most keyboards/printers/fontsets and replaced by the underscore. Thus the underscore remained as a one keystroke assignment operator. See e.g. http://www.wps.com/projects/codes/index.html#GRPH LEFT-ARROW, ? UNDERSCORE, _ One of the graphical codes, left-arrow mutated to the underscore of ASCII-1967. It may have had earlier, or other, meanings, but for some early programming languages it was assignment, eg. c ? b + a C is assigned the sum of B and A. Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Marc Schwartz Sent: Sun 2/25/2007 8:28 AM To: Alberto Vieira Ferreira Monteiro Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Double-banger function names: preferences and suggestions On Sun, 2007-02-25 at 15:56 +, Alberto Vieira Ferreira Monteiro wrote: hadley wickham wrote: What do you prefer/recommend for double-banger function names: 1 scale.colour 2 scale_colour 3 scaleColour 1 is more R-like, but conflicts with S3. 2 is a modern version of number 1, but not many packages use it. Number 3 is more java-like. (I like number 2 best) Any suggestions? I always prefer 2, but this would make it non-portable to S-Plus. S-Plus has a bug, where _ is the equivalent to - (why would they do this? I prefer to think it's stupidity and not villainy) That's not a bug. If you search the archives of both the S-PLUS list and the R lists, you will see highly energized discussion on the use of the underscore operator. In R, the use of '_' was allowed for assignment up until version 1.8.0 when: DEPRECATED DEFUNCT o The assignment operator `_' has been removed. and subsequently allowed in names in version 1.9.0 when: o Underscore '_' is now allowed in syntactically valid names, and make.names() no longer changes underscores. Very old code that makes use of underscore for assignment may now give confusing error messages. Not to further contribute to the dialog on 'style', but to further contribute ;-), for those who have coded in the Windows environment (ie. C, VBA, etc.) the extension of sorts to number 3 is of course Hungarian Notation, named after Charles Simonyi, originally at Xerox PARC and later senior developer/architect at MS. The extension was the inclusion of the data type prefix, such as fnScaleColour to indicate that this was a function, with the name using caps to make words more distinct. And no, I'm not advocating that use...I have been guilty myself of using variants of 1 and 3, perhaps driven by my circulating caffeine levels as much as anything else. HTH, Marc Schwartz Off to go remove 12 inches of snow from the driveway and sidewalk...oy __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Google Custom Search Engine for R
Whereas R is very generic, CRAN is much less so. I've had very good luck adding CRAN to my search terms, e.g. try to Google cran 3d scatterplot This produces all R-related hits on the first Google page. Hope this helps Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Matthew Keller Sent: Fri 2/23/2007 7:51 AM To: Sérgio Nunes Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Google Custom Search Engine for R Hi Sergio, There was a discussion on this board recently about the difficulty of searching for R related material on the web. I think the custom google search engine is a good idea. It would be helpful if we could have access to the full list of websites it is indexing so that we could make suggestions about other sites that are missing. As it is, it only tells us that there are 35 websites, and shows us the first several. Also, you might check out Sasha Goodman's Rseek: http://www.rseek.org/ Have you tried to compare the success of yours with Rseek? All the Best, Matt On 2/23/07, Sérgio Nunes [EMAIL PROTECTED] wrote: Hi, Since R is a (very) generic name, I've been having some trouble searching the web for this topic. Due to this, I've just created a Google Custom Search Engine that includes several of the most relevant sites that have information on R. See it in action at: http://google.com/coop/cse?cx=018133866098353049407%3Aozv9awtetwy This is really a preliminary test. Feel free to add yourself to the project and contribute with suggestions. Sérgio Nunes __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Matthew C Keller Postdoctoral Fellow Virginia Institute for Psychiatric and Behavioral Genetics __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to plot two graphs on one single plot?
Broadly speaking, there are a few classes of plotting functions. 1) New graph functions. The plot() function starts a new graph. 2) Add to a graph functions The points(), lines(), text() etc. functions add something to the current graph. 3) Control graph functions par() controls various aspects of the graph. R graphics experts might have some better classification and terminology. So you want your second plotting function to be points() rather than plot(), to add to the existing graph. Try x1=rnorm(25, mean=0, sd=1) y1=dnorm(x1, mean=0, sd=1) x2=rnorm(25, mean=0, sd=1) y2=dnorm(x2, mean=0, sd=1) plot(x1, y1, type='p', xlim=range(x1,x2), ylim=range(y1, y2), xlab='x', ylab='y') points(x2, y2, type='p', col=red, xlab='x', ylab='y') Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Yun Zhang Sent: Fri 2/23/2007 7:34 AM To: Henrique Dallazuanna Cc: r-help@stat.math.ethz.ch Subject: Re: [R] How to plot two graphs on one single plot? Thanks. Now R plots two graphs on one plot. Yet they are still on two graphs, vertically parallelized with each other. But what I want to do is actually plotting two distribution on one single graph, using the same x and y axis. Like: | | | (dist2) | (dist 1) | --- Is it possible to do that? Thanks, Yun Henrique Dallazuanna wrote: par(mfrow=c(2,1)) #your plot #after plot par(mfrow=c(1,1)) On 23/02/07, *Yun Zhang* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Hi, I am trying to plot two distribution graph on one plot. But I dont know how. I set them to the same x, y limit, even same x, y labels. Code: x1=rnorm(25, mean=0, sd=1) y1=dnorm(x1, mean=0, sd=1) x2=rnorm(25, mean=0, sd=1) y2=dnorm(x2, mean=0, sd=1) plot(x1, y1, type='p', xlim=range(x1,x2), ylim=range(y1, y2), xlab='x', ylab='y') plot(x2, y2, type='p', col=red, xlab='x', ylab='y') They just dont show up in one plot. Any hint will be very helpful. Thanks, Yun __ R-help@stat.math.ethz.ch mailto:R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna Curitiba-Paraná Brasil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Bug/problem reporting: Possible to modify posting guide FAQ?
If users post a bug or problem issue to an R-based news group (R-devel, R-help, BioC - though BioC is far more forgiving) they get yelled at for not reading the posting guide and FAQ. Please *_do_* read the FAQ, the posting guide, ... the yellers do say. So I read the BioC FAQ and it says... http://www.bioconductor.org/docs/faq/ Bug reports on packages should perhaps be sent to the package maintainer rather than to r-bugs. So I send email to a maintainer, who I believe rightly points out best to send this kind of questions to the bioc mailing list, rather than to myself privately, because other people might (a) also have answers or (b) benefit from the questions answers. Could the FAQ possibly be revised to some sensible combination that generates less finger pointing, such as Bug reports on packages should be sent to the Bioconductor mailing list, and sent or copied to the package maintainer, rather than to r-bugs. or Bug reports on packages should be sent to the package maintainer, and copied to the Bioconductor mailing list, rather than to r-bugs. Could the posting guides to R-help and R-devel do something similar? Sign me Tired of all the finger pointing http://www.r-project.org/posting-guide.html If the question relates to a contributed package , e.g., one downloaded from CRAN, try contacting the package maintainer first. You can also use find(functionname) and packageDescription(packagename) to find this information. Only send such questions to R-help or R-devel if you get no reply or need further assistance. This applies to both requests for help and to bug reports. How about If the question relates to a contributed package , e.g., one downloaded from CRAN, email the list and be sure to additionally send to or copy to the package maintainer as well. You can use find(functionname) and packageDescription(packagename) to find this information. Only send such questions to one of R-help or R-devel. This applies to both requests for help and to bug reports. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How do I modify an exported function in a locked environment?
Running R.app on Mac OS X 10.4 version _ platform powerpc-apple-darwin8.6.0 arch powerpc os darwin8.6.0 system powerpc, darwin8.6.0 status major 2 minor 3.1 year 2006 month 06 day01 svn rev38247 language R version.string Version 2.3.1 (2006-06-01) I am trying to learn how to modify functions in a locked environment. For an example, suppose I'm using the package zoo. zoo contains function rollmean.default rollmean.default function (x, k, na.pad = FALSE, align = c(center, left, right), ...) { x - unclass(x) n - length(x) y - x[k:n] - x[c(1, 1:(n - k))] y[1] - sum(x[1:k]) rval - cumsum(y)/k if (na.pad) { rval - switch(match.arg(align), left = { c(rval, rep(NA, k - 1)) }, center = { c(rep(NA, floor((k - 1)/2)), rval, rep(NA, ceiling((k - 1)/2))) }, right = { c(rep(NA, k - 1), rval) }) } return(rval) } environment: namespace:zoo Suppose for whatever reason I want output to be in percent, so I'd like to modify the result to be rval - 100 * cumsum(y)/k I cannot just copy the function and change it, as the namespace mechanism ensures the rollmean.default in 'zoo' continues to be used. If I use fixInNamespace(rollmean.default, ns = zoo) I can edit the rval - cumsum(y)/k line to read rval - 100 * cumsum(y)/k save the file and exit the R.app GUI editor. But this does not update the exported copy of the function (the documentation for fixInNamespace says this is the case) - how do I accomplish this last step? If I list the function after editing, I see the original copy. But if I reinvoke the editor via fixInNamespace(), I do see my modification. Where is my copy residing? How do I push it out to replace the exported copy? Is this the proper way to modify a package function? Are there other ways? I've searched webpages, R news, help files and have been unable to find out how to get this process fully completed. Any guidance appreciated. Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: [EMAIL PROTECTED] tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] scripts to process array CGH data files from NimbleGen
Before I reinvent the wheel, has anyone set up scripts to read in array CGH data files from NimbleGen, for use in the aCGH package or a similar package? I have normalized NimbleGen files ([array_id]_[identifier]_normalized.txt in the NimbleGen naming scheme) several to a directory, to read in and use to create an aCGH object. Any info appreciated. Best Steve McKinney Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html