Agreed. Perhaps even a global option would make sense. We already have an option with a similar spirit: 'options("stringsAsFactors"=T/F)'. Perhaps 'options("exactNumericAsString"=T/F)' [or something else] would be desirable, with the option being the default value of the type.convert argument.
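As a rough illustration of how an option-backed default could look (a sketch only, nothing like this exists in base R): the function name convert_with_option and the option name exactNumericAsString are hypothetical, and the "more than 15 significant digits" test is a crude, conservative stand-in for whatever exactness check type.convert itself would use.

```r
# Hypothetical helper, NOT part of base R. The option name
# "exactNumericAsString" is only the one floated in this thread.
convert_with_option <- function(x,
                                exact = getOption("exactNumericAsString", FALSE)) {
  x <- as.character(x)
  num <- suppressWarnings(as.numeric(x))

  # Simplification: anything that is not entirely numeric stays character
  # (type.convert would go on to try logical, factor, etc.).
  if (anyNA(num[!is.na(x)])) return(x)

  if (exact) {
    # Assumed heuristic: count the digits of the mantissa, ignoring sign,
    # decimal point, exponent and leading zeros. More than 15 digits may
    # not survive a round trip through a double, so keep the strings.
    mantissa <- sub("[eE].*$", "", x)
    digits <- sub("^0+", "", gsub("[^0-9]", "", mantissa))
    if (any(nchar(digits) > 15, na.rm = TRUE)) return(x)
  }
  num
}

ids <- c("72057594037927936", "72057594037927937")
length(unique(convert_with_option(ids)))                # default: both collapse to one double
length(unique(convert_with_option(ids, exact = TRUE)))  # strings kept, two distinct values
```

Setting options(exactNumericAsString = TRUE) once would then change the default for every call that does not pass exact explicitly, which is the same pattern stringsAsFactors follows.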
I also like Gabor's idea of a "distinguishing class". R doesn't natively support arbitrary precision numbers (AFAIK), but I think that's what Murray wants. I could imagine some kind of new class emerging here that initially looks just like a character/factor, but may evolve over time to accept arithmetic methods and act more like a number (e.g. knowing that "0.1", ".10" and "1e-1" are the same number, or that "-9"<"-0.2"). A class "bignum" perhaps?

Cheers, Robert

On 4/20/14, 3:24 AM, "Murray Stokely" <mur...@stokely.org> wrote:

>Yes, I'm also strongly in favor of having an option for this. If
>there was an option in base R for controlling this we would just use
>that and get rid of the separate RProtoBuf.int64AsString option we use
>in the RProtoBuf package on CRAN to control whether 64-bit int types
>from C++ are returned to R as numerics or character vectors.
>
>I agree that reasonable people can disagree about the default, but I
>found my original bug report about this, so I will counter Robert's
>example with my favorite example of what was wrong with the previous
>behavior :
>
>tmp<-data.frame(n=c("72057594037927936", "72057594037927937"),
>name=c("foo", "bar"))
>length(unique(tmp$n))
># 2
>write.csv(tmp, "/tmp/foo.csv", quote=FALSE, row.names=FALSE)
>data <- read.csv("/tmp/foo.csv")
>length(unique(data$n))
># 1
>
> - Murray
>
>
>On Sat, Apr 19, 2014 at 10:06 AM, Simon Urbanek
><simon.urba...@r-project.org> wrote:
>> On Apr 19, 2014, at 9:00 AM, Martin Maechler
>><maech...@stat.math.ethz.ch> wrote:
>>
>>>>>>>> McGehee, Robert <robert.mcge...@geodecapital.com>
>>>>>>>> on Thu, 17 Apr 2014 19:15:47 -0400 writes:
>>>
>>>>> This is all application specific and
>>>>> sort of beyond the scope of type.convert(), which now behaves as it
>>>>> has been documented to behave.
>>>
>>>> That's only a true statement because the documentation was changed to
>>>>reflect the new behavior!
>>>>The new feature in type.convert certainly
>>>>does not behave according to the documentation as of R 3.0.3. Here's a
>>>>snippet:
>>>
>>>> The first type that can accept all the
>>>> non-missing values is chosen (numeric and complex return values
>>>> will represented approximately, of course).
>>>
>>>> The key phrase is in parentheses, which reminds the user to expect a
>>>>possible loss of precision. That important parenthetical was removed
>>>>from the documentation in R 3.1.0 (among other changes).
>>>
>>>> Putting aside the fact that this introduces a large amount of
>>>>unnecessary work rewriting SQL / data import code, SQL packages, my
>>>>biggest conceptual problem is that I can no longer rely on a
>>>>particular function call returning a particular class. In my example
>>>>querying stock prices, about 5% of prices came back as factors and the
>>>>remaining 95% as numeric, so we had random errors popping in
>>>>throughout the morning.
>>>
>>>> Here's a short example showing us how the new behavior can be
>>>>unreliable. I pass a character representation of a uniformly
>>>>distributed random variable to type.convert. 90% of the time it is
>>>>converted to "numeric" and 10% it is a "factor" (in R 3.1.0). In the
>>>>10% of cases in which type.convert converts to a factor the leading
>>>>non-zero digit is always a 9. So if you were expecting a numeric
>>>>value, then 1 in 10 times you may have a bug in your code that didn't
>>>>exist before.
>>>
>>>>> options(digits=16)
>>>>> cl <- NULL; for (i in 1:10000) cl[i] <-
>>>>>class(type.convert(format(runif(1))))
>>>>> table(cl)
>>>> cl
>>>>  factor numeric
>>>>     990    9010
>>>
>>> Yes.
>>>
>>> Murray's point is valid, too.
>>>
>>> But in my view, with the reasoning we have seen here,
>>> *and* with the well known software design principle of
>>> "least surprise" in mind,
>>> I also do think that the default for type.convert() should be what
>>> it has been for > 10 years now.
>>>
>>
>> I think there should be two separate discussions:
>>
>> a) have an option (argument to type.convert and possibly read.table) to
>>enable/disable this behavior. I'm strongly in favor of this.
>>
>> b) decide what the default for a) will be. I have no strong opinion, I
>>can see arguments in both directions
>>
>> But most importantly I think a) is better than the status quo - even if
>>the discussion about b) drags out.
>>
>> Cheers,
>> Simon
>>
>>
>>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel