> What would break is that three methods for doing the same thing would
> give different answers.
>
> Please do have the courtesy to actually read the detailed explanation you
> are given.

Sorry Prof. Ripley, I am attempting to read carefully, as this issue has
deeper coding/debugging implications, and as you point out,
"[.data.frame is one of the most complex functions in R",
so please bear with me.  This change in behaviour has taken away
a side-effect debugging tool, discussed below.

>
> On Fri, 3 Aug 2007, Steven McKinney wrote:
>
> >
> >> -----Original Message-----
> >> From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
> >> Sent: Fri 8/3/2007 1:05 PM
> >> To: Steven McKinney
> >> Cc: r-help@stat.math.ethz.ch
> >> Subject: Re: [R] FW: Selecting undefined column of a data frame (was
> >> [BioC] read.phenoData vs read.AnnotatedDataFrame)
> >>
> >> I've since seen your followup a more detailed explanation may help.
> >> The path through the code for your argument list does not go where you
> >> quoted, and there is a reason for it.
> >
> >
> >> Generally when you extract in R and ask for a non-existent index you get
> >> NA or NULL as the result (and no warning), e.g.
> >>
> >>> y <- list(x=1, y=2)
> >>> y[["z"]]
> >> NULL
> >>
> >> Because data frames 'must' have (column) names, they are a partial
> >> exception and when the result is a data frame you get an error if it would
> >> contain undefined columns.
> >>
> >> But in the case of foo[, "FileName"], the result is a single column and so
> >> will not have a name: there seems no reason to be different from
> >>
> >>> foo[["FileName"]]
> >> NULL
> >>> foo$FileName
> >> NULL
> >>
> >> which similarly select a single column.  At one time they were different
> >> in R, for no documented reason.

This difference provided a side-effect debugging tool, in that where

> bar <- foo[, "FileName"]

used to throw an error, alerting me to a typo, it now does not.

Having been burned by NULL results due to typos in code lines
using the $ extractor, such as

> bar <- foo$FileName

I learned to use

> bar <- foo[, "FileName"]

to help cut down on typo bugs.
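
For illustration, the check I had been relying on can also be made explicit.
The following is only a minimal sketch under stated assumptions: strictCol()
is a hypothetical helper name of my own, not a function in R, and it assumes
a single column is wanted by its exact name.

```r
# Hypothetical helper (not part of R): extract one column by exact name,
# and stop with an error instead of returning NULL when the name is absent.
strictCol <- function(df, name) {
    stopifnot(is.data.frame(df), is.character(name), length(name) == 1L)
    if (!(name %in% names(df))) {
        stop("undefined column selected: '", name,
             "'; available columns: ", paste(names(df), collapse = ", "))
    }
    df[[name]]
}

foo <- data.frame(Filename = c("a", "b"))

# Correct spelling: the column is returned.
ok <- strictCol(foo, "Filename")

# Typo (capital N): the helper signals an error; here it is caught
# so that the message can be inspected.
res <- tryCatch(strictCol(foo, "FileName"),
                error = function(e) conditionMessage(e))
print(res)
```

Such a wrapper only helps where it is actually called; it does nothing for
existing code that uses the $ and [[ extractors directly.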

With the ubiquity of camelCase object names, this is a constant
typing bug hazard.

I am wondering what to do now to double check spelling when
accessing columns of a dataframe.

If "[.data.frame" stays as is, can a debug mechanism be implemented
in R that forces strict adherence to existing list names in debug mode?
This would also help debug typos in camelCase names when using the
$ and [[ extractors and accessors.

Are there other debugging tools already in R that can help point out
such camelCase list element name typos?

> >>
> >>
> >> On Fri, 3 Aug 2007, Prof Brian Ripley wrote:
> >>
> >>> You are reading the wrong part of the code for your argument list:
> >>>
> >>>> foo["FileName"]
> >>> Error in `[.data.frame`(foo, "FileName") : undefined columns selected
> >>>
> >>> [.data.frame is one of the most complex functions in R, and does many
> >>> different things depending on which arguments are supplied.
> >>>
> >>>
> >>> On Fri, 3 Aug 2007, Steven McKinney wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> What are current methods people use in R to identify
> >>>> mis-spelled column names when selecting columns
> >>>> from a data frame?
> >>>>
> >>>> Alice Johnson recently tackled this issue
> >>>> (see [BioC] posting below).
> >>>>
> >>>> Due to a mis-spelled column name ("FileName"
> >>>> instead of "Filename") which produced no warning,
> >>>> Alice spent a fair amount of time tracking down
> >>>> this bug.  With my fumbling fingers I'll be tracking
> >>>> down such a bug soon too.
> >>>>
> >>>> Is there any options() setting, or debug technique
> >>>> that will flag data frame column extractions that
> >>>> reference a non-existent column?  It seems to me
> >>>> that the "[.data.frame" extractor used to throw an
> >>>> error if given a mis-spelled variable name, and I
> >>>> still see lines of code in "[.data.frame" such as
> >>>>
> >>>> if (any(is.na(cols)))
> >>>>     stop("undefined columns selected")
> >>>>
> >>>>
> >>>> In R 2.5.1 a NULL is silently returned.
> >>>>
> >>>>> foo <- data.frame(Filename = c("a", "b"))
> >>>>> foo[, "FileName"]
> >>>> NULL
> >>>>
> >>>> Has something changed so that the code lines
> >>>> if (any(is.na(cols)))
> >>>>     stop("undefined columns selected")
> >>>> in "[.data.frame" no longer work properly (if
> >>>> I am understanding the intention properly)?
> >>>>
> >>>> If not, could "[.data.frame" check an
> >>>> options() variable setting (say
> >>>> warn.undefined.colnames) and throw a warning
> >>>> if a non-existent column name is referenced?
> >>>>
> >>>>
> >>>>> sessionInfo()
> >>>> R version 2.5.1 (2007-06-27)
> >>>> powerpc-apple-darwin8.9.1
> >>>>
> >>>> locale:
> >>>> en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
> >>>>
> >>>> attached base packages:
> >>>> [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
> >>>> "base"
> >>>>
> >>>> other attached packages:
> >>>> plotrix        lme4       Matrix  lattice
> >>>> "2.2-3" "0.99875-4" "0.999375-0" "0.16-2"
> >>>>>
> >>>>
> >>>>
> >>>> Steven McKinney
> >>>>
> >>>> Statistician
> >>>> Molecular Oncology and Breast Cancer Program
> >>>> British Columbia Cancer Research Centre
> >>>>
> >>>> email: smckinney +at+ bccrc +dot+ ca
> >>>>
> >>>> tel: 604-675-8000 x7561
> >>>>
> >>>> BCCRC
> >>>> Molecular Oncology
> >>>> 675 West 10th Ave, Floor 4
> >>>> Vancouver B.C.
> >>>> V5Z 1L3
> >>>> Canada
> >>>>
> >>
> >>
> >> --
> >> Brian D. Ripley, [EMAIL PROTECTED]
> >> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> >> University of Oxford, Tel: +44 1865 272861 (self)
> >> 1 South Parks Road, +44 1865 272866 (PA)
> >> Oxford OX1 3TG, UK Fax: +44 1865 272595
> >>
> >
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.