On Thu, Apr 17, 2014 at 6:42 AM, McGehee, Robert
<robert.mcge...@geodecapital.com> wrote:
> Here's my use case: I have a function that pulls arbitrary financial data
> from a web service call, such as a stock's industry, price, volume, etc.,
> by reading the web output as a text table. The data may be either
> character (industry, stock name, etc.) or numeric (price, volume, etc.),
> and the function generally doesn't know the class in advance. The problem
> is that we frequently get numeric values represented with more precision
> than actually exists, for instance a price of "2.6999999999999999" rather
> than "2.70". The numeric representation is exactly one digit too many for
> type.convert, which (in R 3.1.0) converts it to character instead of
> numeric (not what I want). This caused a bunch of "non-numeric argument
> to binary operator" errors to appear today, as numeric data was now being
> represented as characters.
>
> I have no doubt that this will cause some unwanted RODBC side effects for
> us as well. IMO, getting the class right is more important than infinite
> precision. What use is a character representation of a number anyway if
> you can't perform arithmetic on it? I would favor at least making the new
> behavior optional, but I think many packages (like RODBC) may need to be
> patched to work around the new feature if it's left in.
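To make the change concrete, here is a minimal sketch of the behavior being
described (illustrative only; the exact output depends on your R version):

    x <- "2.6999999999999999"  # more digits than the nearest double preserves

    ## R < 3.1.0: silently rounds to the nearest representable double
    type.convert(x, as.is = TRUE)
    ## [1] 2.7

    ## R >= 3.1.0: keeps the string, since converting would lose precision
    type.convert(x, as.is = TRUE)
    ## [1] "2.6999999999999999"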
The uses of a character representation of a number are many: unique
identifiers/user IDs, hash codes, timestamps, or other values where rounding
to the nearest value representable as a numeric type would completely change
the results of any data analysis performed on that data.

Database joins are certainly an area where R's previous behavior of silently
dropping precision in type.convert can get you into trouble. Join or group-by
operations performed in R code will produce erroneous results if you are
joining/grouping by a key that lacks the full precision of your underlying
data: records get joined up incorrectly or aggregated into the wrong groups.

If you later want to do arithmetic on such values, you can choose to lose
precision with as.numeric() or use one of the large-number packages on CRAN
(gmp, int64, bit64, etc.). But once you've dropped the precision with
as.numeric you can never get it back, which is why the previous behavior was
clearly dangerous.

I think I had some additional examples in the original bug/patch I filed
about this issue a few years ago, but I'm unable to find it on
bugs.r-project.org and it's not referenced in the commit descriptions or the
NEWS file.

- Murray
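P.S. A quick sketch of the join hazard, with made-up key values (any integers
at or beyond 2^53 behave this way):

    ## Two distinct IDs that collide once coerced to double:
    a <- "9007199254740993"  # 2^53 + 1
    b <- "9007199254740992"  # 2^53
    as.numeric(a) == as.numeric(b)
    ## [1] TRUE

    ## A merge(x, y, by = "id") on as.numeric() keys can therefore pair
    ## up the wrong records; keeping the keys as character (or using
    ## bit64) preserves the distinction.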