Thanks to the many R users who convinced me that the sum of a vector containing only NAs should be zero when na.rm=T is used, and who gave me a workaround for the cases where I do not want it to be zero.
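
For anyone who runs into the same thing, here is a minimal sketch of such a workaround. The helper name sum_or_na is hypothetical (it is not a base R function), and it assumes that "no valid data at all" should yield NA, including for an empty vector:

```r
# Hypothetical helper, not part of base R: sum the valid data,
# but return NA when there is not a single non-missing value.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

all_missing <- c(NA, NA)
partly_missing <- c(1, NA, 2)

print(sum(all_missing, na.rm = TRUE))  # base behaviour: sum of "nothing"
print(sum_or_na(all_missing))          # NA instead of zero
print(sum_or_na(partly_missing))       # the NA is ignored, valid data are summed
```

Base R returns zero in the first case because, once the NAs are dropped, sum() is applied to an empty vector, and the sum of no terms is defined as 0.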

Thank you also for the explanations of rounding errors in floating-point arithmetic. I did not expect them. This small error was a real problem for me, as I was trying to find a way to recode numeric values into intervals. Because I wanted to retain numeric values as a result, I tried not to use cut or cut2. Hence, to convert a range of temperatures into 0.2-degree intervals, I had written:

(Let's first make a fake temperature variable k for testing.)
k <- seq(-5,5,0.1)
k1 <- ifelse(k<0,-0.2*(abs(k) %/% 0.2) - 0.1, 0.2 *(k %/% 0.2) + 0.1)

Note that this works well to quickly recode a numeric variable that only takes integer values. But it produces the problem that prompted my call for help when there are decimals: some values end up in a different class than what you'd expect.
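
A small sketch to make the problem visible: it prints the quotient 4.8 / 0.2 with 17 significant digits, then lists the values of k (if any) for which the original recoding k1 disagrees with the recoding done on rounded tenths (k2 below). The tolerance 1e-8 is an arbitrary choice for this illustration, and which rows show up may depend on the R version and platform:

```r
# Neither 4.8 nor 0.2 has an exact binary representation, so the
# quotient falls just short of 24 and integer division drops to 23.
print(sprintf("%.17g", 4.8 / 0.2))
print(4.8 %/% 0.2)

# Working on integer tenths avoids the problem.
print(round(10 * 4.8) %/% 2)

k <- seq(-5, 5, 0.1)
k1 <- ifelse(k < 0, -0.2 * (abs(k) %/% 0.2) - 0.1, 0.2 * (k %/% 0.2) + 0.1)
k2 <- ifelse(k < 0, -0.2 * (abs(round(10 * k)) %/% 2) - 0.1,
             0.2 * (round(10 * k) %/% 2) + 0.1)

# Values that end up in a different class under the two recodings.
bad <- abs(k1 - k2) > 1e-8
print(data.frame(k = k[bad], k1 = k1[bad], k2 = k2[bad]))
```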

Considering your answers, I found 3 solutions:

k2 <- ifelse(k<0,-0.2*(abs(round(10*k)) %/% 2) - 0.1, 0.2 *(round(10*k) %/% 2) + 0.1)

k3 <- (-0.1+min(k)) + 0.2 * as.numeric(cut(k, seq(min(k),max(k)+0.2,0.2), right=FALSE, labels=FALSE))

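library(Hmisc)  # cut2 comes from the Hmisc package
k4 <- cut2(k, seq(min(k), max(k)+0.2, 0.2), levels.mean=TRUE)
k5 <- as.numeric(levels(k4))[k4]  # convert the factor of interval means back to numeric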

I could round the results to 1 decimal to be even more exact, but this is good enough. If this can be done more elegantly, please let me know!
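
One possibly more compact variant, offered only as a sketch: convert to integer tenths once, then use floor(). The name k_mid is made up for this example. Note that this assumes left-closed intervals [a, a+0.2) everywhere, as cut(..., right=FALSE) does, so a negative value lying exactly on a boundary (such as -0.2) lands in a different class than with k2, which is symmetric around zero:

```r
k <- seq(-5, 5, 0.1)

# Integer tenths are exactly representable, so the representation
# error in k disappears before any division happens.
tenths <- round(10 * k)

# Midpoint of the left-closed 0.2-wide interval containing each value.
k_mid <- 0.2 * floor(tenths / 2) + 0.1

# k2 from above, for comparison.
k2 <- ifelse(k < 0, -0.2 * (abs(tenths) %/% 2) - 0.1,
             0.2 * (tenths %/% 2) + 0.1)

# The two recodings agree except on negative boundary values.
differs <- abs(k_mid - k2) > 1e-8
print(data.frame(k = k[differs], k_mid = k_mid[differs], k2 = k2[differs]))
```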

Denis

Subject: [R] 2 small problems: integer division and the nature of NA


Hi,

I'm wondering why

48 %/% 2 gives 24
but
4.8 %/% 0.2 gives 23...
I'm not trying to round up here, but to find out how many times
something fits into something else, and the answer should have been the
same for both examples, no?

On a different topic, I like the behavior of NAs better in R than in
SAS (at least they are not considered the smallest value for a
variable), but at the same time I am surprised that the sum of NAs is 0
instead of NA.

The sum of a vector having at least one NA but also valid data gives NA
if we do not specify na.rm=T. But with na.rm=T, we are telling sum to
give the sum of valid data, ignoring NAs that do not tell us anything
about the value of a variable. I found out while getting the sum of
small subsets of my data (such as when subsetting by several
variables), sometimes a "cell" only contained NAs for my response
variable. I would have expected the sum to be NA in such cases, as I do
not have a single data point telling me the value of my response here.
But R tells me the sum was zero in that cell! Was this behavior
considered "desirable" when sum was built? If not, any hope it will be
fixed?

Sincerely,

Denis Chabot


______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
