Is there a reason it's a factor and not a string? A string would seem to be
more appropriate to me (given that we know it's a number that can't be
represented exactly by R)
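
For what it's worth, a rough illustration of what I mean: the existing
as.is = TRUE switch already yields plain character strings. The commented
output is what I'd expect under the 3.1.0 behaviour (not re-verified here):

    ## rough illustration only; comments show expected 3.1.0 behaviour
    x <- "0.33333333333333331"            # a decimal string with no exact double
    class(type.convert(x))                # "factor"    (one level: the original string)
    class(type.convert(x, as.is = TRUE))  # "character" -- the "string" I have in mind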

Hadley

On Saturday, April 26, 2014, Martin Maechler <maech...@stat.math.ethz.ch>
wrote:

> >>>>> Simon Urbanek <simon.urba...@r-project.org>
> >>>>>     on Sat, 19 Apr 2014 13:06:15 -0400 writes:
>
>     > On Apr 19, 2014, at 9:00 AM, Martin Maechler <
> maech...@stat.math.ethz.ch> wrote:
>     >>>>>>> McGehee, Robert <robert.mcge...@geodecapital.com>
>     >>>>>>> on Thu, 17 Apr 2014 19:15:47 -0400 writes:
>     >>
>     >>>> This is all application specific and
>     >>>> sort of beyond the scope of type.convert(), which now behaves as
> it
>     >>>> has been documented to behave.
>     >>
>     >>> That's only a true statement because the documentation was changed
> to reflect the new behavior! The new feature in type.convert certainly does
> not behave according to the documentation as of R 3.0.3. Here's a snippet:
>     >>
>     >>> The first type that can accept all the
>     >>> non-missing values is chosen (numeric and complex return values
>     >>> will be represented approximately, of course).
>     >>
>     >>> The key phrase is in parentheses, which reminds the user to expect
> a possible loss of precision. That important parenthetical was removed from
> the documentation in R 3.1.0 (among other changes).
>     >>
>     >>> Putting aside the fact that this introduces a large amount of
> unnecessary work rewriting SQL / data-import code and packages, my biggest
> conceptual problem is that I can no longer rely on a particular function
> call returning a particular class. In my example querying stock prices,
> about 5% of prices came back as factors and the remaining 95% as numeric,
> so we had random errors popping in throughout the morning.
>     >>
>     >>> Here's a short example showing how the new behavior can be
> unreliable. I pass a character representation of a uniformly distributed
> random variable to type.convert. 90% of the time it is converted to
> "numeric" and 10% it is a "factor" (in R 3.1.0). In the 10% of cases in
> which type.convert converts to a factor the leading non-zero digit is
> always a 9. So if you were expecting a numeric value, then 1 in 10 times
> you may have a bug in your code that didn't exist before.
>     >>
>     >>>> options(digits=16)
>     >>>> cl <- NULL; for (i in 1:10000) cl[i] <- class(type.convert(format(runif(1))))
>     >>>> table(cl)
>     >>> cl
>     >>>  factor numeric
>     >>>     990    9010
>     >>
>     >> Yes.
>     >>
>     >> Murray's point is valid, too.
>     >>
>     >> But in my view, with the reasoning we have seen here,
>     >> *and* with the well-known software design principle of
>     >> "least surprise" in mind,
>     >> I, too, think that the default for type.convert() should remain
>     >> what it has been for more than 10 years now.
>     >>
>
>     > I think there should be two separate discussions:
>
>     > a) have an option (argument to type.convert and possibly read.table)
> to enable/disable this behavior. I'm strongly in favor of this.
>
> In my (not committed) version of R-devel, I now have
>
>  > str(type.convert(format(1/3, digits=17), exact=TRUE))
>   Factor w/ 1 level "0.33333333333333331": 1
>  > str(type.convert(format(1/3, digits=17), exact=FALSE))
>   num 0.333
>
> where the 'exact' argument name has been ``imported'' from the
> underlying C code.
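>
> With exact=FALSE, Robert's runif() example from above should then be
> back to the old behaviour, i.e. "numeric" for all 10000 cases; roughly:
>
>  > cl <- replicate(10000, class(type.convert(format(runif(1)), exact = FALSE)))
>  > table(cl)   ## expect just:  numeric 10000, no more occasional "factor"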
>
> [ As we CRAN package writers know by now, arguments nowadays can
>   hardly be abbreviated anymore, so I am not keen on longer
>   alternative argument names; as someone who likes touch typing, I'm
>   also not fond of camel case or other keyboard gymnastics (;-).
>   But I'm open to suggestions if someone has a great idea for
>   a better argument name.... ]
>
> Instead of only  TRUE/FALSE, we could consider NA with
> semantics "FALSE + warning" or also "TRUE + warning".
>
>
>     > b) decide what the default for a) will be. I have no strong opinion;
> I can see arguments in both directions.
>
> I think many have seen the good arguments in both directions.
> I'm still strongly advocating that we value long-term stability
> more highly here, and revert to compatibility with the many
> years of previous versions.
>
> If we used a default of 'exact = NA', I'd like it to mean
> FALSE + warning, but would not much oppose TRUE + warning.
>
> I agree that for the TRUE case, it may make more sense to return
> a string-like object of a new (simple) class such as the "bignum"
> that was mentioned in this thread.
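>
> Just to illustrate the direction (nothing of this exists yet; the
> name and methods are purely hypothetical), such a class could be
> little more than a classed character vector:
>
>   bignum <- function(x) structure(as.character(x), class = "bignum")
>   print.bignum <- function(x, ...) { cat("<bignum>", unclass(x), "\n"); invisible(x) }
>   as.double.bignum <- function(x, ...) as.double(unclass(x))  # explicit, possibly lossy
>
>   b <- bignum("0.33333333333333331")
>   b              ## <bignum> 0.33333333333333331   (no silent precision loss)
>   as.double(b)   ## user opts in to the (approximate) numeric value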
>
> OTOH, this functionality should make it into R 3.1.1 in the
> not-so-distant future, and thinking through the consequences and
> implementing the new class approach may just take a tad too much
> time...
>
> Martin
>
>     > But most importantly I think a) is better than the status quo - even
> if the discussion about b) drags out.
>
>     > Cheers,
>     > Simon
>
>


-- 
http://had.co.nz/


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
