On 12/16/2006 12:25 PM, John Zedlewski wrote: > Duncan-- > Ah, good point, clearly setting the tolerance to 0 is bad in that > case. Also, my code has another problem when the max is negative -- it > will set a negative tolerance! One good fix for both problems is the > following: set the initial value of "large" to the first value in the > row instead of 0.0, then keep the "fmax2(a,large)" approach from my > patch, but at the end, take the absolute value of large. That will > always follow the current docs -- using the largest value, not the > largest absolute value in the row, for comparison. > > You proposal of changing the docs and just fixing the infinite > problem sounds like a reasonable approach too, especially if people > are already depending on this behavior (??), although I still think > it's s a little weird that comparing "(-1e10, 2, 3)" will say that 2 > or 3 could be the max.
I think the explanation for this is in the man page: "In this case, the determination of "tie" assumes that the entries are probabilities..." If you transform that vector linearly to force it into the [0,1] range of probabilities, the behaviour is clear: > x <- c(-1e10, 2, 3) > (x - min(x))/diff(range(x)) [1] 0 1 1 Duncan Murdoch > Thanks, > --JRZ > >> I think I'd prefer to be clear about what "largest" means in the docs >> rather than dropping the absolute value, because if all entries are >> negative, your version sets the tolerance to 0. > > > On 12/16/06, Duncan Murdoch <[EMAIL PROTECTED]> wrote: >> On 12/15/2006 7:09 PM, John Zedlewski wrote: >>> I've noticed that the max.col function with the default "random" >>> option often gives unexpected results. For instance, in this test, it >>> seems clear what the answer should be: >>> >>>> # second col should always be max >>>> x1 = cbind(1:10, 2:11, -Inf) >>>> >>>> # this works fine >>>> max.col(x1, "first") >>> [1] 2 2 2 2 2 2 2 2 2 2 >>>> # this gives random answers >>>> max.col(x1) >>>> [1] 3 1 1 2 3 3 1 3 1 1 >>> Ouch! max.col is randomizing across all values. >>> Even without infinite values, something similar can happen: >>> >>>> # test 2 --- tolerance problems >>>> >>>> # clearly column 3 is the max >>>> x1 = cbind(-1e9 * 1:10, 1:10, 2:11) >>>> >>>> # again, first method works: >>>> max.col(x1, "first") >>> [1] 3 3 3 3 3 3 3 3 3 3 >>>> # but random doesn't >>>> max.col(x1) >>> [1] 2 3 2 3 3 2 2 2 3 2 >>> >>> The max.col docs say " there is a relative tolerance of 1e-5, relative >>> to the largest entry in the row", but it's really using the maximum >>> absolute value entry in the row (appl/maxcol.c, line 35 in R 2.4.0). >>> Is this necessary for some sort of S-plus compatibility? If so, I >>> think it would be good to make this absolute value property very clear >>> in the docs, since it can cause subtle bugs (and did for me). >>> >>> Personally, I think the behavior is much nicer with the following patch: >>> >>> --- rplain/R-2.4.0/src/appl/maxcol.c 2006-04-09 18:19:58.000000000 -0400 >>> +++ R-2.4.0/src/appl/maxcol.c 2006-12-14 15:30:56.000000000 -0500 >>> @@ -26,13 +26,14 @@ >>> do_rand = *ties_meth == 1; >>> >>> for (r = 0; r < n_r; r++) { >>> - /* first check row for any NAs and find the largest abs(entry) */ >>> + /* first check row for any NAs and find the largest entry */ >>> large = 0.0; >>> isna = FALSE; >>> for (c = 0; c < *nc; c++) { >>> a = matrix[r + c * n_r]; >>> if (ISNAN(a)) { isna = TRUE; break; } >>> - if (do_rand) large = fmax2(large, fabs(a)); >>> + if (!R_FINITE(a)) continue; >>> + if (do_rand) large = fmax2(large, a); >>> } >>> if (isna) { maxes[r] = NA_INTEGER; continue; } >>> >>> ---------------- END ---------------------- >>> >>> This gives the expected behavior in the two examples above. >> I think I'd prefer to be clear about what "largest" means in the docs >> rather than dropping the absolute value, because if all entries are >> negative, your version sets the tolerance to 0. >> >> But ignoring infinite values when calculating the largest absolute value >> might be a good idea: I can't see why someone would want an infinite >> tolerance. >> >> So I'd propose to add your R_FINITE check, and make this change to the docs: >> >> " there is a relative tolerance of 1e-5, relative >> to the largest finite absolute value in the row" >> >> The copyright notice in the file says this function is from MASS/MASS.c >> by W. N. Venables and B. D. Ripley, so I'd like to hear from at least >> one of them before making a change. >> >> Duncan Murdoch >> >>> (Sorry to crosspost to both this list and R-help, but it was suggested >>> that R-devel would be a more appropriate forum for this.) >>> >>> ______________________________________________ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >> > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel