I've been asked to close this thread "in public" >>>>> "TH" == Turner, Heather <[EMAIL PROTECTED]> >>>>> on Wed, 22 Aug 2007 10:20:21 +0100 writes:
TH> Petr Savicky kindly brought this thread to my attention TH> as I'm afraid it had passed me by. As one of the TH> contributors to the earlier discussion on adding TH> warnings to sweep I would like to give my support to TH> Petr's proposed patch. Petr's proposed patch has been added to R-devel (to become R-2.6.0 alpha in two days) last week. Actually it was found subsequently that there was a small problem in the patch [basically because n %% n is zero only for n != 0 and the case n==0 is valid]. If you are interested in using the new sweep() version, and because we are entering alpha testing of R 2.6.0, I'd recommend you to start using "R-devel". Martin Maechler TH> For the record I should say that Petr was right to point TH> out that the use of MARGIN in my examples did not make TH> sense TH> https://stat.ethz.ch/pipermail/r-devel/2007-July/046487.html TH> so I have no quibble with that. TH> I think it is sensible too, to use the dim attribute of TH> STATS as the basis of the test, when the dim attribute TH> is present. This provides a way to control the strength TH> of the test in the case of sweeping out a vector, as TH> Petr describes in his message below. I think that the TH> proposed patch successfully brings together the TH> different views on what should be tested, which was the TH> stumbling block last time around TH> https://stat.ethz.ch/pipermail/r-help/2005-June/074037.html TH> Even if people set check.margin = FALSE for reasons of TH> speed, this in itself should be a useful check, since TH> they will need to be confident that the test is TH> unnecessary. TH> Heather TH> -----Original Message----- From: TH> [EMAIL PROTECTED] TH> [mailto:[EMAIL PROTECTED] On Behalf Of Petr TH> Savicky Sent: 08 August 2007 07:54 To: TH> r-devel@r-project.org Subject: Re: [Rd] sweep sanity TH> checking? TH> Thanks to Martin Maechler for his comments, advice and TH> for pointing out the speed problem. Thanks also to Ben TH> Bolker for tests of speed, which confirm that for small TH> arrays, a slow down by a factor of about 1.2 - 1.5 may TH> occur. Now, I would like to present a new version of TH> sweep, which is simpler and has an option to avoid the TH> test. This is expected to be used in scripts, where the TH> programmer is quite sure that the usage is correct and TH> speed is required. The new version differs from the TH> previous one in the following: TH> 1. The option check.margin has a different meaning. It TH> defaults to TRUE and it determines whether the test is TH> performed or not. TH> 2. Since check.margin has the meaning above, it cannot TH> be used to select, which test should be performed. This TH> depends on the type of STATS. The suggested sweep TH> function contains two tests: - a vector test by Heather TH> Turner, which is used, if STATS has no dim attribute TH> and, hence, is a vector (STATS should not be anything TH> else than a vector or an array) - an array test used if TH> STATS has dim attribute. The vector test allows some TH> kinds of recycling, while the array test does TH> not. Hence, in the most common case, where x is a matrix TH> and STATS is a vector, if the user likes to be warned if TH> the length of the vector is not exactly the right one, TH> the following call is suggested: TH> sweep(x,MARGIN,as.array(STATS)). Otherwise, a warning TH> will be generated only if length(STATS) does not divide TH> the specified dimension of x, which is nrow(x) TH> (MARGIN=1) or ncol(x) (MARGIN=2). TH> 3. If STATS is an array, then the test is more TH> restrictive than in the previous version. It is now TH> required that after deleting dimensions with one level, TH> the remaining dimensions coincide. The previous version TH> allowed additionally the cases, when dim(STATS) is a TH> prefix of dim(x)[MARGIN], for example, if dim(STATS) = TH> k1 and dim(x)[MARGIN] = c(k1,k2). TH> The code of the tests in the suggested sweep is based on TH> the previous suggestions TH> https://stat.ethz.ch/pipermail/r-help/2005-June/073989.html TH> by Robin Hankin TH> https://stat.ethz.ch/pipermail/r-help/2005-June/074001.html TH> by Heather Turner TH> https://stat.ethz.ch/pipermail/r-devel/2007-June/046217.html TH> by Ben Bolker with some further modifications. TH> The modification of sweep.Rd was prepared by Ben Bolker TH> and me. TH> I would like to encourage everybody who likes to express TH> his opinion on the patch to do it now. In my opinion, TH> the suggestion of the new code stabilized in the sense TH> that I will not modify it unless there is a negative TH> feedback. TH> A patch against R-devel_2007-08-06 is attached. It TH> contains tabs. If they are corrupted by email transfer, TH> use the link TH> http://www.cs.cas.cz/~savicky/R-devel/patch-sweep which TH> is an identical copy. TH> Petr Savicky. TH> ======================================================================== TH> ======================== --- TH> R-devel_2007-08-06/src/library/base/R/sweep.R 2007-07-27 TH> 17:51:13.000000000 +0200 +++ TH> R-devel_2007-08-06-sweep/src/library/base/R/sweep.R TH> 2007-08-07 10:30:12.383672960 +0200 @@ -14,10 +14,29 @@ TH> # A copy of the GNU General Public License is available TH> at # http://www.r-project.org/Licenses/ TH> -sweep <- function(x, MARGIN, STATS, FUN = "-", ...) TH> +sweep <- function(x, MARGIN, STATS, FUN = "-", TH> check.margin=TRUE, ...) { FUN <- match.fun(FUN) dims <- TH> dim(x) + if (check.margin) { + dimmargin <- dims[MARGIN] TH> + dimstats <- dim(STATS) + lstats <- length(STATS) + if TH> (lstats > prod(dimmargin)) { + warning("length of STATS TH> greater than the extent of dim(x)[MARGIN]") + } else if TH> (is.null(dimstats)) { # STATS is a vector + cumDim <- TH> c(1, cumprod(dimmargin)) + upper <- min(cumDim[cumDim >= TH> lstats]) + lower <- max(cumDim[cumDim <= lstats]) + if TH> (upper %% lstats != 0 || lstats %% lower != 0) + TH> warning("STATS does not recycle exactly across MARGIN") TH> + } else { + dimmargin <- dimmargin[dimmargin > 1] + TH> dimstats <- dimstats[dimstats > 1] + if TH> (length(dimstats) != length(dimmargin) || any(dimstats TH> != dimmargin)) + warning("length(STATS) or dim(STATS) do TH> not match dim(x)[MARGIN]") + } + } perm <- c(MARGIN, TH> (1:length(dims))[ - MARGIN]) FUN(x, aperm(array(STATS, TH> dims[perm]), order(perm)), ...) } --- TH> R-devel_2007-08-06/src/library/base/man/sweep.Rd TH> 2007-07-27 17:51:35.000000000 +0200 +++ TH> R-devel_2007-08-06-sweep/src/library/base/man/sweep.Rd TH> 2007-08-07 10:29:45.517757200 +0200 @@ -11,7 +11,7 @@ TH> statistic. } \usage{ -sweep(x, MARGIN, STATS, FUN="-", TH> \dots) +sweep(x, MARGIN, STATS, FUN="-", TH> check.margin=TRUE, \dots) } \arguments{ \item{x}{an TH> array.} @@ -22,8 +22,18 @@ case of binary operators TH> such as \code{"/"} etc., the function name must TH> backquoted or quoted. (\code{FUN} is found by a call to TH> \code{\link{match.fun}}.)} + TH> \item{check.margin}{logical. If \code{TRUE} (the TH> default), warn if the + length or dimensions of TH> \code{STATS} do + not match the specified dimensions of TH> \code{x}.} \item{\dots}{optional arguments to TH> \code{FUN}.} } +\details{ + The consistency check among TH> \code{STATS}, \code{MARGIN} and \code{x} + is stricter TH> if \code{STATS} is an array than if it is a vector. + TH> In the vector case, some kinds of recycling are allowed TH> without a + warning. Use TH> \code{sweep(x,MARGIN,as.array(STATS))} if \code{STATS} + TH> is a vector and you want to be warned if any recycling TH> occurs. +} \value{ An array with the same shape as TH> \code{x}, but with the summary statistics swept out. TH> ______________________________________________ TH> R-devel@r-project.org mailing list TH> https://stat.ethz.ch/mailman/listinfo/r-devel TH> ______________________________________________ TH> R-devel@r-project.org mailing list TH> https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel