The error I am referring to is in unique.c in Base R, it cannot accomodate greater than 2^29 values, even though it appears the overflow protection should be 2^30. The only relevance of the arules package is I was using it while I discovered this issue.
Thanks, Patrick 2011/10/6 Uwe Ligges <lig...@statistik.tu-dortmund.de>: > > > On 05.10.2011 22:15, Patrick McCann wrote: >> >> Hi, >> >> I am trying to read in a rather large list of transactions using the >> arules library. > > You mean the arules package? > > >> It seems in the coerce method into the dgCmatrix, it >> somewhere calls unique. Unique.c throws an error when n> 536870912; >> however, when 4*n was modified to 2*n in 2004, the overflow protection >> should have changed from 2^29 to 2^30, right? If so, how would I >> change it in my copy? Do I have to recompile everything? > > Yes. > > There is also the way to ask the maintainer to improve it, but it won't work > without reinstallation of the changed package sources. > > Uwe Ligges > > >> >> Thanks, >> Patrick McCann >> >> >> Here is a simple to reproduce example: >>> >>> runif(2^29+5)->a >>> sum(unique(a))->b >> >> Error in unique.default(a) : length 536870917 is too large for hashing >>> >>> traceback() >> >> 3: unique.default(a) >> 2: unique(a) >> 1: unique(a) >>> >>> unique.default >> >> function (x, incomparables = FALSE, fromLast = FALSE, ...) >> { >> z<- .Internal(unique(x, incomparables, fromLast)) >> if (is.factor(x)) >> factor(z, levels = seq_len(nlevels(x)), labels = levels(x), >> ordered = is.ordered(x)) >> else if (inherits(x, "POSIXct")) >> structure(z, class = class(x), tzone = attr(x, "tzone")) >> else if (inherits(x, "Date")) >> structure(z, class = class(x)) >> else z >> } >> <environment: namespace:base> >> >>> From http://svn.r-project.org/R/trunk/src/main/unique.c I see: >> >> >> /* >> Choose M to be the smallest power of 2 >> not less than 2*n and set K = log2(M). >> Need K>= 1 and hence M>= 2, and 2^M<= 2^31 -1, hence n<= 2^29. >> >> Dec 2004: modified from 4*n to 2*n, since in the worst case we have >> a 50% full table, and that is still rather efficient -- see >> R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606. >> */ >> static void MKsetup(int n, HashData *d) >> { >> int n4 = 2 * n; >> >> if(n< 0 || n> 536870912) /* protect against overflow to -ve */ >> error(_("length %d is too large for hashing"), n); >> d->M = 2; >> d->K = 1; >> while (d->M< n4) { >> d->M *= 2; >> d->K += 1; >> } >> } >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.