Thank you also for the explanations of rounding errors in floating-point arithmetic. I did not expect them. This small error was a real problem for me, as I was trying to find a way to recode numeric values into intervals. Because I wanted to retain numeric values as a result, I tried not to use cut or cut2. Hence, to convert a range of temperatures into 0.2 degree intervals, I had written:
(let's first make a fake temperature variable k for testing)
k <- seq(-5,5,0.1)
k1 <- ifelse(k<0, -0.2*(abs(k) %/% 0.2) - 0.1, 0.2*(k %/% 0.2) + 0.1)
Note that this works well to quickly recode a numeric variable that only takes integer values. But it produces the problem that prompted my call for help when there are decimals: some values end up in a different class than what you'd expect.
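One way to see the misclassification is to count how many test values land in each class. With exact arithmetic every class would hold two of the values in k, apart from -5.1, -0.1 and 5.1, which would hold one. Any other count points to a value that slipped into a neighbouring class. This is only a diagnostic sketch; the name counts is introduced here for illustration.

```r
# Fake temperatures and the original recoding, as in the message
k  <- seq(-5, 5, 0.1)
k1 <- ifelse(k < 0, -0.2 * (abs(k) %/% 0.2) - 0.1, 0.2 * (k %/% 0.2) + 0.1)

# Number of test values per class; rounding the labels to 1 decimal
# keeps classes that differ only by floating-point noise together
counts <- table(round(k1, 1))

# Classes holding an unexpected number of values (neither 1 nor 2)
counts[!(counts %in% c(1, 2))]
```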
Considering your answers, I found 3 solutions:
k2 <- ifelse(k<0,-0.2*(abs(round(10*k)) %/% 2) - 0.1, 0.2 *(round(10*k) %/% 2) + 0.1)
k3 <- (-0.1+min(k)) + 0.2 * as.numeric(cut(k, seq(min(k),max(k)+0.2,0.2), right=F, labels=F))
library(Hmisc)  # cut2 comes from the Hmisc package
k4 <- cut2(k, seq(min(k), max(k)+0.2, 0.2), levels.mean=T)
k5 <- as.numeric(levels(k4))[k4]
I could "round" to 1 decimal to be even more exact but this is good enough. If it can be more elegant, please let me know!
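A possibly simpler variant, offered only as a sketch and not as a tested replacement: do all the arithmetic on whole tenths of a degree and use floor, which removes the need for ifelse. The names tenths and mid are introduced here for illustration. Note one assumption that differs from the ifelse version: every class is treated as left-closed, [a, a + 0.2), for negative values too, so -5.0 is assigned to -4.9 rather than to -5.1.

```r
k <- seq(-5, 5, 0.1)

# Whole tenths of a degree: -50, -49, ..., 50 (exact integers after rounding)
tenths <- round(10 * k)

# Midpoint of the 0.2-wide class [a, a + 0.2) containing each value.
# floor(tenths / 2) indexes the class; 2 * index + 1 is its midpoint in tenths.
mid <- (2 * floor(tenths / 2) + 1) / 10

head(data.frame(k = k, mid = mid))
```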
Denis
Subject: [R] 2 small problems: integer division and the nature of NA
Hi,
I'm wondering why
48 %/% 2 gives 24 but 4.8 %/% 0.2 gives 23... I'm not trying to round up here, but to find out how many times something fits into something else, and the answer should have been the same for both examples, no?
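The two cases differ because 48 and 2 are stored exactly, while 4.8 and 0.2 have no exact binary representation, so the division is carried out on slightly different numbers. A small sketch of how to inspect this, and of the workaround of scaling to whole numbers before dividing (q_int is a name introduced here for illustration):

```r
48 %/% 2                        # 24
4.8 %/% 0.2                     # reported as 23 in the message above

# Print the stored values with more digits than R shows by default
sprintf("%.20f", c(4.8, 0.2))

# Scale to whole tenths first, then divide: both operands are exact integers
q_int <- round(10 * 4.8) %/% round(10 * 0.2)
q_int                           # 24
```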
On a different topic, I like the behavior of NAs better in R than in SAS (at least they are not considered the smallest value for a variable), but at the same time I am surprised that the sum of NAs is 0 instead of NA.
The sum of a vector having at least one NA but also valid data gives NA if we do not specify na.rm=T. But with na.rm=T, we are telling sum to give the sum of valid data, ignoring NAs that do not tell us anything about the value of a variable. I found this out while computing sums over small subsets of my data (such as when subsetting by several variables): sometimes a "cell" contained only NAs for my response variable. I would have expected the sum to be NA in such cases, as I do not have a single data point telling me the value of my response here. But R tells me the sum is zero in that cell! Was this behavior considered "desirable" when sum was built? If not, any hope it will be fixed?
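The behavior described above can be reproduced in a few lines. With na.rm=TRUE, an all-NA vector is reduced to an empty one, and the sum of no values is 0. The wrapper sum_or_na below is a hypothetical helper, not part of base R, sketched under the assumption that a cell with no valid data should yield NA.

```r
x <- c(NA, NA, NA)

sum(x, na.rm = TRUE)              # 0: nothing is left to add
sum(c(1, NA, 3), na.rm = TRUE)    # 4

# Hypothetical helper: NA when no valid value is present, otherwise
# the sum of the valid values (an empty vector also gives NA)
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

sum_or_na(x)                      # NA
sum_or_na(c(1, NA, 3))            # 4
```

Such a helper could be passed to tapply or aggregate in place of sum when summarising by several variables.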
Sincerely,
Denis Chabot