Re: [R] Best practice: to factor or not to factor for float variables

2014-07-05 Thread Sebastian Schubert
Hi Hadley,

Actually, I started with floating point numbers and ensured that the respective numbers are equal in R, but I still got strange behaviour with dplyr's group_by: https://github.com/hadley/dplyr/issues/482

If I had to guess, I would suppose the source of this error is somewhere in the C++

Re: [R] Best practice: to factor or not to factor for float variables

2014-07-05 Thread MacQueen, Don
However,

format((0.1+0.2)) == format(0.3)
[1] TRUE

Which suggests that if you want to treat measured variables as categories, one way to do it is to format them first. Of course, one may have to control the format more carefully than above (if necessary, see for example ?formatC).

merge() on
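A minimal sketch of the formatting idea above, in base R. The variable names (x, y, key_x, key_y) and the choice of three decimal places are illustrative assumptions, not something taken from the original message.

```r
# Two values that should be "the same" height but differ in the last bits
x <- 0.1 + 0.2
y <- 0.3

exact_equal <- x == y                       # FALSE: binary representations differ
formatted_equal <- format(x) == format(y)   # TRUE: default of 7 significant digits

# Tighter control over the key with formatC (3 decimals is an assumed precision)
key_x <- formatC(x, format = "f", digits = 3)
key_y <- formatC(y, format = "f", digits = 3)

print(c(exact = exact_equal, formatted = formatted_equal))
print(c(key_x, key_y))
```

The resulting character keys could then serve as a grouping or merge column, at the cost of the column no longer being numeric.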

[R] Best practice: to factor or not to factor for float variables

2014-07-04 Thread Sebastian Schubert
Hi, I would like to ask for best practice advice on the design of data structure and the connected analysis techniques. In my particular case, I have measurements of several variables at several, sometimes equal, heights. Following the tidy data approach of Hadley Wickham, I want to put all data

Re: [R] Best practice: to factor or not to factor for float variables

2014-07-04 Thread PIKAL Petr
Hi,

I would keep height as numeric and create height.f as a factor, maybe ordered.

hh <- runif(50)
hh
[1] 0.116060220 0.447546370 0.433749570 0.006548963 0.425710667 0.328972894
[7] 0.091274539 0.271797166 0.007669982 0.208922146 0.168174196 0.227466231
...
hh.f <- cut(hh, seq(0,1,.1))
hh.f
[1]

Re: [R] Best practice: to factor or not to factor for float variables

2014-07-04 Thread Hadley Wickham
Why not just round the floating point numbers to ensure they're equal with zapsmall, round or signif?

Hadley

On Fri, Jul 4, 2014 at 4:04 AM, Sebastian Schubert schubert@gmail.com wrote:
Hi, I would like to ask for best practice advice on the design of data structure and the connected
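The rounding suggestion can be sketched as follows in base R. The data (heights, temp) and the precision of six decimal places are made up for illustration; the right number of digits depends on the resolution of the actual measurements.

```r
# Heights that are nominally equal in pairs but differ by floating point error
heights <- c(0.1 + 0.2, 0.3, 1.1 + 2.2, 3.3)
temp    <- c(10, 12, 20, 22)   # hypothetical measurements at those heights

n_raw <- length(unique(heights))          # every value counts as distinct

# Round to an assumed precision so nominally equal heights collapse
heights_r <- round(heights, 6)
n_rounded <- length(unique(heights_r))

# Group on the rounded, still numeric, variable
means <- tapply(temp, heights_r, mean)

print(c(raw = n_raw, rounded = n_rounded))
print(means)
```

Grouping on the rounded variable keeps height numeric, so it can still be used for arithmetic or plotting on a continuous axis.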

Re: [R] Best practice: to factor or not to factor for float variables

2014-07-04 Thread David Winsemius
Keep as numeric and group with cut(), Hmisc::cut2, or findInterval. The beauty of the functional language design is that you do not need to create a new factor variable.

--
David

Sent from my iPhone

On Jul 4, 2014, at 8:33 AM, Hadley Wickham h.wick...@gmail.com wrote:
Why not just round
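A small sketch of grouping on the fly with cut() and findInterval(), using only base R. The data and the break points are illustrative assumptions; Hmisc::cut2 is not shown, to avoid depending on a package.

```r
# Hypothetical data: heights stay numeric, no factor column is stored
height <- c(0.05, 0.07, 0.15, 0.95)
temp   <- c(10, 12, 20, 30)

breaks <- seq(0, 1, by = 0.1)   # assumed bin edges

# findInterval returns the integer index of the bin each height falls into
bins <- findInterval(height, breaks)

# Group directly inside the call instead of creating a new variable
means <- tapply(temp, findInterval(height, breaks), mean)

# cut() gives the same grouping with readable interval labels
height_f <- cut(height, breaks)

print(bins)
print(means)
print(as.character(height_f))
```

Values that lie exactly on a bin edge are sensitive to the same floating point issue, so the edges are best placed between the expected measurement heights.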