[R] arules killed

2012-01-17 Thread Patrick McCann
Hi, I recently got a bizarre message when running arules: it just printed
"Killed" and quit. Does anyone know why this might have happened? I am
running R on an AWS quad XL Ubuntu instance.

Here is some information, including dataset size and the parameters:

parameter specification:
   confidence minval smax arem  aval originalSupport      support minlen maxlen
 0.0003581251    0.1    1 none FALSE            TRUE 3.581251e-05      2      4
 target   ext
  rules FALSE

algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09)(c) 1996-2004   Christian Borgelt
set item appearances ...[1712 item(s)] done [0.00s].
set transactions ...[1712 item(s), 837696 transaction(s)] done [3.99s].
sorting and recoding items ... [1561 item(s)] done [1.83s].
creating transaction tree ... done [1.65s].
checking subsets of size 1 2 3Killed

Thanks,
Patrick McCann


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] pmml for random forest rules

2011-10-10 Thread Patrick McCann
Hi,

I am having some trouble using R 2.13.1 to generate a PMML object
from an object of class c('randomForest.formula', 'randomForest').

I see that these methods are available:
> methods(pmml)
 [1] pmml.coxph*    pmml.hclust*   pmml.itemsets* pmml.kmeans*
 [5] pmml.ksvm*     pmml.lm*       pmml.multinom* pmml.nnet*
 [9] pmml.rpart*    pmml.rsf*      pmml.rules*    pmml.survreg*


However, The R Journal 1/1, p. 64, says there should be a method available
(http://journal.r-project.org/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf):

Random Forest (and randomSurvivalForest)
— randomForest (Breiman and Cutler. R
port by A. Liaw and M. Wiener, 2009) and randomSurvivalForest
(Ishwaran and Kogalur, 2009): PMML export of a randomSurvivalForest rsf
object. This function gives the user the ability to export PMML
containing the geometry of a forest.

However, if I run these lines of code:

library(randomForest)
library(pmml)
(iris.rf <- randomForest(Species ~ ., data=iris))
pmml(iris.rf)

I get this error:

Error in UseMethod("pmml") :
  no applicable method for 'pmml' applied to an object of class
"c('randomForest.formula', 'randomForest')"



Also, if I run these lines of code:

library(arules)
data(Adult)
## Mine association rules.
rules <- apriori(Adult,
                 parameter = list(supp = 0.5, conf = 0.9,
                                  target = "rules"))
pmml(rules)


I get this error:
> pmml(rules)
Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function "size", for
signature "itemMatrix"


With this traceback:

> traceback()
5: stop("unable to find an inherited method for function \"", fdef@generic,
       "\", for signature ", cnames)
4: function (classes, fdef, mtable)
   {
       methods <- .findInheritedMethods(classes, fdef, mtable)
       if (length(methods) == 1L)
           return(methods[[1L]])
       else if (length(methods) == 0L) {
           cnames <- paste("\"", sapply(classes, as.character),
               "\"", sep = "", collapse = ", ")
           stop("unable to find an inherited method for function \"",
               fdef@generic, "\", for signature ", cnames)
       }
       else stop("Internal error in finding inherited methods; didn't return a unique method")
   }(list("itemMatrix"), function (object)
   standardGeneric("size"), <environment>)
3: size(is.unique)
2: pmml.rules(rules)
1: pmml(rules)

Thanks,
Patrick McCann



Re: [R] unique possible bug

2011-10-06 Thread Patrick McCann
The error I am referring to is in unique.c in base R: it cannot
accommodate more than 2^29 values, even though it appears the
overflow protection should be 2^30. The arules package is relevant
only because I was using it when I discovered this issue.
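
The arithmetic behind the 2^29 versus 2^30 question can be checked
directly. The sketch below is not taken from unique.c; the function name
max_len_for_multiplier is made up for illustration, and it assumes the
product is held in a 32-bit signed int, as the "int n4 = 2 * n;"
declaration in MKsetup suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Largest n for which multiplier * n still fits in a 32-bit signed int.
   Hypothetical helper, not part of R's sources. */
int32_t max_len_for_multiplier(int32_t multiplier)
{
    assert(multiplier > 0);
    return INT32_MAX / multiplier;
}
```

With a multiplier of 4 the bound is 2^29 - 1, and with a multiplier of 2
it is 2^30 - 1. This covers only the product 2*n, not the table size M
that the loop in MKsetup then derives from it.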

Thanks,
Patrick



2011/10/6 Uwe Ligges <lig...@statistik.tu-dortmund.de>:
>
>
> On 05.10.2011 22:15, Patrick McCann wrote:
>>
>> Hi,
>>
>> I am trying to read in a rather large list of transactions using the
>> arules library.
>
> You mean the arules package?
>
>
>> It seems in the coerce method into the dgCmatrix, it
>> somewhere calls unique. Unique.c throws an error when n > 536870912;
>> however, when 4*n was modified to 2*n in 2004, the overflow protection
>> should have changed from 2^29 to 2^30, right? If so, how would I
>> change it in my copy? Do I have to recompile everything?
>
> Yes.
>
> There is also the way to ask the maintainer to improve it, but it won't work
> without reinstallation of the changed package sources.
>
> Uwe Ligges
>
>
>
>> Thanks,
>> Patrick McCann

>> Here is a simple to reproduce example:
>>
>> > runif(2^29+5)->a
>> > sum(unique(a))->b
>>
>> Error in unique.default(a) : length 536870917 is too large for hashing
>>
>> > traceback()
>>
>> 3: unique.default(a)
>> 2: unique(a)
>> 1: unique(a)
>>
>> > unique.default
>>
>> function (x, incomparables = FALSE, fromLast = FALSE, ...)
>> {
>>     z <- .Internal(unique(x, incomparables, fromLast))
>>     if (is.factor(x))
>>         factor(z, levels = seq_len(nlevels(x)), labels = levels(x),
>>             ordered = is.ordered(x))
>>     else if (inherits(x, "POSIXct"))
>>         structure(z, class = class(x), tzone = attr(x, "tzone"))
>>     else if (inherits(x, "Date"))
>>         structure(z, class = class(x))
>>     else z
>> }
>> <environment: namespace:base>
>>
>> From http://svn.r-project.org/R/trunk/src/main/unique.c I see:
>>
>>
>> /*
>>  Choose M to be the smallest power of 2
>>  not less than 2*n and set K = log2(M).
>>  Need K >= 1 and hence M >= 2, and 2^M <= 2^31 -1, hence n <= 2^29.
>>
>>  Dec 2004: modified from 4*n to 2*n, since in the worst case we have
>>  a 50% full table, and that is still rather efficient -- see
>>  R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606.
>> */
>> static void MKsetup(int n, HashData *d)
>> {
>>    int n4 = 2 * n;
>>
>>    if(n < 0 || n > 536870912) /* protect against overflow to -ve */
>>        error(_("length %d is too large for hashing"), n);
>>    d->M = 2;
>>    d->K = 1;
>>    while (d->M < n4) {
>>        d->M *= 2;
>>        d->K += 1;
>>    }
>> }



[R] unique possible bug

2011-10-05 Thread Patrick McCann
Hi,

I am trying to read in a rather large list of transactions using the
arules library. It seems in the coerce method into the dgCmatrix, it
somewhere calls unique. Unique.c throws an error when n > 536870912;
however, when 4*n was modified to 2*n in 2004, the overflow protection
should have changed from 2^29 to 2^30, right? If so, how would I
change it in my copy? Do I have to recompile everything?

Thanks,
Patrick McCann


Here is a simple to reproduce example:
> runif(2^29+5)->a
> sum(unique(a))->b
Error in unique.default(a) : length 536870917 is too large for hashing
> traceback()
3: unique.default(a)
2: unique(a)
1: unique(a)
> unique.default
function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    z <- .Internal(unique(x, incomparables, fromLast))
    if (is.factor(x))
        factor(z, levels = seq_len(nlevels(x)), labels = levels(x),
            ordered = is.ordered(x))
    else if (inherits(x, "POSIXct"))
        structure(z, class = class(x), tzone = attr(x, "tzone"))
    else if (inherits(x, "Date"))
        structure(z, class = class(x))
    else z
}
<environment: namespace:base>

From http://svn.r-project.org/R/trunk/src/main/unique.c I see:


/*
 Choose M to be the smallest power of 2
 not less than 2*n and set K = log2(M).
 Need K >= 1 and hence M >= 2, and 2^M <= 2^31 -1, hence n <= 2^29.

 Dec 2004: modified from 4*n to 2*n, since in the worst case we have
 a 50% full table, and that is still rather efficient -- see
 R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606.
*/
static void MKsetup(int n, HashData *d)
{
    int n4 = 2 * n;

    if(n < 0 || n > 536870912) /* protect against overflow to -ve */
        error(_("length %d is too large for hashing"), n);
    d->M = 2;
    d->K = 1;
    while (d->M < n4) {
        d->M *= 2;
        d->K += 1;
    }
}
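
One further point may matter for the proposed change: the loop rounds 2*n
up to a power of two before storing it in M. Below is a minimal sketch of
that step, assuming d->M is a 32-bit signed int (the HashData definition
is not shown here, so this is an assumption). The helper names
table_size_for and table_size_fits_int32 are hypothetical and do not
appear in unique.c.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two that is >= 2*n (and at least 2), computed in
   64 bits so that the doubling itself cannot overflow. Follows the
   same doubling loop as MKsetup. */
int64_t table_size_for(int64_t n)
{
    assert(n >= 0 && n <= INT32_MAX);
    int64_t target = 2 * n;
    int64_t m = 2;
    while (m < target)
        m *= 2;
    return m;
}

/* Nonzero if that table size is representable as a 32-bit signed int
   (the assumed type of d->M). */
int table_size_fits_int32(int64_t n)
{
    return table_size_for(n) <= INT32_MAX;
}
```

For n = 2^29 the table size is 2^30, which fits. For n = 2^29 + 5, the
length in the example above, it would be 2^31, which does not fit in a
signed 32-bit int. Under that assumption, raising the guard to 2^30 would
also need a wider or unsigned type for M.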
