Is there a reason it's a factor and not a string? A string seems more appropriate to me, given that we know it's a number that R can't represent exactly.
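To make the concern concrete, here is a minimal sketch of how the two representations behave when downstream code expects a number. It is only an illustration and not code from the R sources: the variable names are made up, and the factor is built by hand with factor() rather than by type.convert(), so that it runs the same way on any R version.

```r
# A decimal string with more digits than a double can carry exactly.
x_chr <- "0.33333333333333331"

as_string <- x_chr          # keep the digits as a character vector
as_factor <- factor(x_chr)  # hand-built stand-in for the factor result

# Coercing the string parses the digits, with the usual rounding.
num_from_string <- as.numeric(as_string)

# Coercing the factor returns the integer level code, not the value.
num_from_factor <- as.numeric(as_factor)

# Recovering the value from the factor needs a detour via character.
num_via_levels <- as.numeric(as.character(as_factor))

print(num_from_string)
print(num_from_factor)
print(num_via_levels)
```

With a string, a caller who does not care about the lost digits can still call as.numeric() directly; with a factor, the same call silently yields the level code instead.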
Hadley

On Saturday, April 26, 2014, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

> >>>>> Simon Urbanek <simon.urba...@r-project.org>
> >>>>>     on Sat, 19 Apr 2014 13:06:15 -0400 writes:
>
> > On Apr 19, 2014, at 9:00 AM, Martin Maechler <maech...@stat.math.ethz.ch> wrote:
> >>>>>>> McGehee, Robert <robert.mcge...@geodecapital.com>
> >>>>>>>     on Thu, 17 Apr 2014 19:15:47 -0400 writes:
> >>
> >>>> This is all application specific and
> >>>> sort of beyond the scope of type.convert(), which now behaves as it
> >>>> has been documented to behave.
> >>
> >>> That's only a true statement because the documentation was changed
> >>> to reflect the new behavior! The new feature in type.convert
> >>> certainly does not behave according to the documentation as of
> >>> R 3.0.3. Here's a snippit:
> >>
> >>> The first type that can accept all the
> >>> non-missing values is chosen (numeric and complex return values
> >>> will represented approximately, of course).
> >>
> >>> The key phrase is in parentheses, which reminds the user to expect
> >>> a possible loss of precision. That important parenthetical was
> >>> removed from the documentation in R 3.1.0 (among other changes).
> >>
> >>> Putting aside the fact that this introduces a large amount of
> >>> unnecessary work rewriting SQL / data import code, SQL packages, my
> >>> biggest conceptual problem is that I can no longer rely on a
> >>> particular function call returning a particular class. In my example
> >>> querying stock prices, about 5% of prices came back as factors and
> >>> the remaining 95% as numeric, so we had random errors popping in
> >>> throughout the morning.
> >>
> >>> Here's a short example showing us how the new behavior can be
> >>> unreliable. I pass a character representation of a uniformly
> >>> distributed random variable to type.convert. 90% of the time it is
> >>> converted to "numeric" and 10% it is a "factor" (in R 3.1.0).
> >>> In the 10% of cases in which type.convert converts to a factor the
> >>> leading non-zero digit is always a 9. So if you were expecting a
> >>> numeric value, then 1 in 10 times you may have a bug in your code
> >>> that didn't exist before.
> >>
> >>>> options(digits=16)
> >>>> cl <- NULL; for (i in 1:10000) cl[i] <- class(type.convert(format(runif(1))))
> >>>> table(cl)
> >>> cl
> >>>  factor numeric
> >>>     990    9010
> >>
> >> Yes.
> >>
> >> Murray's point is valid, too.
> >>
> >> But in my view, with the reasoning we have seen here,
> >> *and* with the well known software design principle of
> >> "least surprise" in mind,
> >> I also do think that the default for type.convert() should be what
> >> it has been for > 10 years now.
> >>
> >
> > I think there should be two separate discussions:
> >
> > a) have an option (argument to type.convert and possibly read.table)
> > to enable/disable this behavior. I'm strongly in favor of this.
>
> In my (not committed) version of R-devel, I now have
>
>   > str(type.convert(format(1/3, digits=17), exact=TRUE))
>    Factor w/ 1 level "0.33333333333333331": 1
>   > str(type.convert(format(1/3, digits=17), exact=FALSE))
>    num 0.333
>
> where the 'exact' argument name has been ``imported'' from the
> underlying C code.
>
> [ As we CRAN package writers know by now, arguments nowadays can
>   hardly be abbreviated anymore, and so I am not open to longer
>   alternative argument names, as someone liking blind typing, I'm
>   not fond of camel case or other keyboard gymnastics (;-) but if
>   someone has a great idea for a better argument name.... ]
>
> Instead of only TRUE/FALSE, we could consider NA with
> semantics "FALSE + warning" or also "TRUE + warning".
>
> > b) decide what the default for a) will be. I have no strong opinion,
> > I can see arguments in both directions
>
> I think many have seen the good arguments in both directions.
> I'm still strongly advocating that we value long term stability
> higher here, and revert to more compatibility with the many
> years of previous versions.
>
> If we'd use a default of 'exact=NA', I'd like it to mean
> FALSE + warning, but would not oppose much to TRUE + warning.
>
> I agree that for the TRUE case, it may make more sense to return
> string-like object of a new (simple) class such as "bignum"
> that was mentioned in this thread.
>
> OTOH, this functionality should make it into an R 3.1.1 in the
> not so distant future, and thinking through consequences and
> implementing the new class approach may just take a tad too much
> time...
>
> Martin
>
> > But most importantly I think a) is better than the status quo - even
> > if the discussion about b) drags out.
> >
> > Cheers,
> > Simon
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
http://had.co.nz/