>>>>> peter dalgaard <pda...@gmail.com> >>>>> on Tue, 29 Apr 2014 09:32:21 +0200 writes:
> On 28 Apr 2014, at 19:17 , Martin Maechler <maech...@stat.math.ethz.ch> wrote:

>> [...snip...]

>>>> I think there should be two separate discussions:
>>
>>>> a) have an option (argument to type.convert and possibly
>>>> read.table) to enable/disable this behavior. I'm strongly
>>>> in favor of this.
>>
>>> In my (not committed) version of R-devel, I now have
>>
>>> > str(type.convert(format(1/3, digits=17), exact=TRUE))
>>> Factor w/ 1 level "0.33333333333333331": 1
>>> > str(type.convert(format(1/3, digits=17), exact=FALSE))
>>> num 0.333
>>
>>> where the 'exact' argument name has been ``imported'' from
>>> the underlying C code.
>>
>>> [ As we CRAN package writers know by now, arguments
>>> nowadays can hardly be abbreviated anymore, and so I am
>>> not open to longer alternative argument names; as someone
>>> who likes touch typing, I'm not fond of camel case or other
>>> keyboard gymnastics (;-), but if someone has a great idea
>>> for a better argument name... ]
>>
>>> Instead of only TRUE/FALSE, we could consider NA with
>>> semantics "FALSE + warning" or also "TRUE + warning".
>>
>>>> b) decide what the default for a) will be. I have no
>>>> strong opinion, I can see arguments in both directions.
>>
>>> I think many have seen the good arguments in both
>>> directions. I'm still strongly advocating that we value
>>> long-term stability higher here, and revert to more
>>> compatibility with the many years of previous versions.
>>
>>> If we'd use a default of 'exact=NA', I'd like it to mean
>>> FALSE + warning, but would not strongly oppose TRUE +
>>> warning.
>>
>> I have now committed svn rev 65507 --- to R-devel only for now ---
>> the above: exact = NA is the default
>> and it means "warning + FALSE".
>>
>> Interestingly, I currently get 5 identical warnings for one
>> simple call, so there is clearly room for optimization, and
>> that is one main reason for this change not yet being migrated
>> to 'R 3.1.0 patched'.
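The three-way `exact` switch described above (TRUE keeps the string, FALSE converts silently, NA converts with a warning) can be illustrated outside R. What follows is a minimal Python sketch, not R's implementation: `convert` and `loses_digits` are hypothetical names, Python's `None` stands in for R's `NA`, a plain string stands in for R's factor, and the digit-loss rule (an exact round-trip test for integer literals, "more than 15 significant digits" for everything else) is an assumption, not necessarily the rule applied in R's C code.

```python
import re
import warnings
from typing import Optional, Union

DBL_DIG = 15  # decimal digits an IEEE double is guaranteed to preserve

def loses_digits(s: str) -> bool:
    """Hypothetical digit-loss test for a plain decimal literal (no hex, NA or Inf)."""
    t = s.strip()
    if re.fullmatch(r"[+-]?[0-9]+", t):
        # Integer literal: Python ints are exact, so compare with the double round trip.
        return int(float(t)) != int(t)
    # Non-integer (assumed rule): count significant digits in the mantissa,
    # ignoring leading and trailing zeros.
    mantissa = t.lstrip("+-").lower().split("e")[0]
    return len(mantissa.replace(".", "").strip("0")) > DBL_DIG

def convert(s: str, exact: Optional[bool] = None) -> Union[float, str]:
    """Sketch of the tri-state switch:
    exact=True  -> keep a lossy string as-is (R would return a factor)
    exact=False -> convert to double silently
    exact=None  -> stand-in for exact = NA: convert, but warn
    """
    value = float(s)
    if not loses_digits(s):
        return value
    if exact is True:
        return s
    if exact is None:
        warnings.warn(f"{s!r} may lose digits when converted to double")
    return value

if __name__ == "__main__":
    for mode in (True, False, None):
        print(mode, repr(convert("0.33333333333333331", exact=mode)))
```

A strict "is the decimal value exactly representable" test is deliberately avoided for non-integers: even `0.1` has no exact binary representation, so such a test would reject nearly every decimal input. For integer literals, by contrast, the round trip is an exact check, e.g. `9007199254740992` (2^53) survives while `9007199254740993` does not.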
> I actually think that the default should be the old behaviour. No warning, just potentially lose digits. If this gets a user in trouble, _then_ turn on the check for lost digits.

> After all, I think we had about one single use case where lost digits caused trouble (I cannot even dig up what the case was; someone had, like, 20-digit ID labels, I reckon). In contrast, we have seen umpteen cases where people have exported floating point data to slightly beyond machine precision, "just in case", and relied on read.table() to do the sensible thing.

> It's also an open question whether we really want to apply the same logic to doubles and integer inputs.

That's a really good point. From my cursory reading of the code, it is not obvious where to make the distinction without quite a bit more coding, but I may just have overlooked a good idea.

> The whole change went in as (r62327)
> "force type.convert to read e.g. 64-bit integers as strings/factors"
> I, for one, did not expect that "e.g." would include 0.12345678901234567. My eyes were on the upcoming 3.0.0 release at that point, so I might not have noticed it anyway, but apparently no one lifted an eyebrow. It seems that this was deliberately postponed for 3.1.0, but for more than a year, no one actually exercised the code.

> -pd

> BTW, "exact" is a horrible name for an option; how about digitloss=c("allow", "warn", "forbid")?

I've also briefly thought about switching to an "enumeration type" with string options. If we were to distinguish integer and non-integer input (and hexadecimal vs. decimal, which are already different code branches), we would need more than three options anyway, and when I start thinking about the possibilities, I see too many "desirable" ones, e.g.,

  digitloss = "allow for non-integers, don't warn"
  digitloss = "allow for non-integers, do warn"
  digitloss = "forbid, don't warn"
  digitloss = "forbid, do warn"
  etc.
This would argue for a different approach, maybe with yet another argument dealing only with "long integers". OTOH, I don't feel like spending considerably more time on this right now, unless others are willing to help as well (coding + testing).

Martin

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel