Re: [R] Unable to install lme4

2009-10-01 Thread jamesmcc

I am baffled by this as well; I'm having the same issue, using SUSE Linux
with 64-bit R 2.8.1.
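
For what it's worth, here is a quick check of whether a repository's index
lists lme4 at all (a small sketch, using the repos URL Andrew mentions below):

 ## Does the configured repository's package index contain lme4?
 options(repos = c(CRAN = "http://lib.stat.cmu.edu/R/CRAN"))
 "lme4" %in% rownames(available.packages())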

Thanks, 

james



Zege, Andrew wrote:
 
 I am unable to install the lme4 package, after several attempts to do so
 using various repository URLs.
 Just to make sure everything works fine with the proxy, connection, etc., I
 installed ggplot2 and it worked fine.
 
 I am using the command
 
 install.packages("lme4", lib = "/myRlibs")
 
 optionally using the contriburl argument with different URLs.
 
 The error message that I get is
 
 Warning message:
 In install.packages("lme4", lib = "/myRlibs") :
   package 'lme4' is not available
 
 
 Some other details, not sure how relevant they are:
 
 getOption("repos") returns "http://lib.stat.cmu.edu/R/CRAN"
 
 I tried setting contriburl to various other URLs, such as
 "http://cran.mtu.edu/src/contrib" or the Berkeley URL, but with no success.
 In fact, when I ran available.packages() against these repositories, I didn't
 see lme4 in the package indices.
 My machine runs x86_64 Red Hat Linux.
 
 Would appreciate any tips or directions,
 
 Thanks
 Andre
 




Re: [R] Unable to install lme4

2009-10-01 Thread jamesmcc

This is the first time I've encountered R having difficulty with package and
R version compatibility. I can't believe this hasn't been fixed in general, so
that your version of R can get the latest package version appropriate to that
version. How nice would that be? :)

Anyway, I figured it out for my version (2.8.1). I needed to install the
Matrix package first, which was also outdated:

 R CMD INSTALL -l lib Matrix_0.999375-22.tar.gz
 R CMD INSTALL -l lib lme4_0.999375-28.tar.gz

It loads now within R; I haven't used it much yet.
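
For reference, a minimal sketch (untested here) of the same fix from within R,
assuming both source tarballs sit in the current working directory:

 ## Matrix is listed first since lme4 depends on it; with repos = NULL
 ## the files are installed in the order given.
 install.packages(c("Matrix_0.999375-22.tar.gz",
                    "lme4_0.999375-28.tar.gz"),
                  repos = NULL, lib = "lib")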



jamesmcc wrote:
 
 I am baffled by this as well; I'm having the same issue, using SUSE Linux
 with 64-bit R 2.8.1.
 
 Thanks, 
 
 james
 
 
 




Re: [R] comparing random forests and classification trees

2009-09-17 Thread jamesmcc

Greetings, tree and forest coders:

I'm interested in comparing random forest and regression tree / bagged tree
models. I'd like to propose a basis for doing this, get feedback, and document
it here. I kept it in this thread since that makes sense.

In this case I think it's appropriate to compare the R^2 values as one basic
measure. I'm actually going to compare mean error (ME), mean absolute error
(MAE), and root mean squared error (RMSE) as well. This means that I need
estimates from each approach so that I can form residuals. **As I see it, the
important details are in how to set up the models so that I have comparable
estimates, particularly in how the trees/forests are trained and evaluated.**

For regression/bagging trees, the typical approach for my application is 100
runs of 10-fold CV. In each run all the values are estimated in an out-of-bag
sense: each fold is estimated while it is withheld from fitting, so the fit is
not inflated. The estimates are then averaged over the 100 runs at each point
to get an average simulation, and this is used to calculate residuals and the
measures mentioned above. Somewhat more specifically, the steps are: I fit a
model, prune it via inspection, loop 100 times on
xpred.rpart(model, xval=10, cp=cp at the bottom of the cptable from the pruned
fit) to generate the 100 runs (bagging is thus performed while holding the cp
criterion fixed?), average these pointwise, and calculate the desired
stats/quantities for comparison to other models; a sketch follows below.
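
Minimal sketch of that procedure (d and y are placeholder names; cp.min is the
cp value at the bottom of the pruned fit's cptable):

 library(rpart)

 fit    <- rpart(y ~ ., data = d)                # then pruned via inspection
 cp.min <- fit$cptable[nrow(fit$cptable), "CP"]  # cp at the bottom of cptable
 ## 100 runs of 10-fold CV; each column is one run's out-of-fold estimates
 runs   <- replicate(100, xpred.rpart(fit, xval = 10, cp = cp.min)[, 1])
 est    <- rowMeans(runs)                        # pointwise average simulation
 res    <- d$y - est
 c(ME = mean(res), MAE = mean(abs(res)), RMSE = sqrt(mean(res^2)))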

For randomForests, I would want to fit the model in a similar way, i.e., 100
runs of 10-fold CV. I think the 10-fold part is clear; the 100 runs, maybe
less so. To get 10-fold OOB estimates, I set replace=FALSE,
sampsize=.9*nrow(x). Then I get a randomForest with $predicted being the
average OOB estimate over all trees for which each point was OOB. I would
assume that each tree is constructed with a different 10-fold partitioning of
the data set, so the number of runs is really more like the number of trees
constructed. If I wanted to be really thorough, I could fit 100 random
forests, get the $predicted for each, and then average these pointwise. But
that seems like overkill; isn't the lesson of plot.randomForest that as the
number of trees goes up the error converges to some limit (from what I've
seen)? A corresponding sketch follows below.
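
Hedged sketch of the random forest side under this scheme (again d and y are
placeholders; sampling 90% without replacement is meant to mimic the 10-fold
OOB setup):

 library(randomForest)

 rf <- randomForest(y ~ ., data = d, ntree = 1000,
                    replace = FALSE, sampsize = floor(0.9 * nrow(d)))
 ## $predicted averages, for each point, the predictions of the trees
 ## for which that point was out of bag
 res.rf <- d$y - rf$predicted
 c(ME = mean(res.rf), MAE = mean(abs(res.rf)), RMSE = sqrt(mean(res.rf^2)))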

Thus, my primary concern is the amount of data used for training and
cross-validating the model in an out-of-bag sense: can I meaningfully compare
10-fold OOB estimates from xpred.rpart to a random forest fit using 90% of the
data as sampsize?

Of secondary concern is the number of bagging trees versus the number of trees
in the random forest. As long as the average estimate error is nearing some
limit with the number of bagging trees I'm using, I think this is all that
matters. So this is more of a methodological difference to be retained,
similar to the differences in pruning under bagging and random forests, though
I should probably specify the node sizes to be similar for each (see the
sketch below).
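
For instance, something like this would keep minimum terminal-node sizes
comparable (the value 5 is an illustrative assumption, not a recommendation):

 fit.rp <- rpart(y ~ ., data = d, control = rpart.control(minbucket = 5))
 rf     <- randomForest(y ~ ., data = d, nodesize = 5)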

Am I overlooking anything of grave consequence?

Any and all thoughts are welcome. If you are aware of any comparisons of rpart
and randomForests in the literature, for any field (for regression), of which
I am ignorant, I would appreciate the tip. I have read over "Newer
Classification and Regression Tree Techniques: Bagging and Random Forests for
Ecological Prediction" by Prasad, Iverson, and Liaw. I may have missed it, but
I did not see any discussion of maintaining consistency in the way the models
were trained, though it is a very nice paper overall and contains many
interesting approaches and points.

Thanks in advance, 

James




[R] ipred bagging segfault on 64 bit linux build

2009-09-11 Thread jamesmcc

I wanted to report this issue here so that others may not find themselves
alone, and because the author is apparently active on the list.

I haven't done an exhaustive test by any means, because I don't have time, but
here's a small example. Apparently the ns argument is the one that is killing
it. I've gotten several different segfault messages; the only other one I
remember said "out of memory". This one is probably the most common of the
roughly 10 segfaults I've had:

 *** caught segfault ***
address(nil), cause 'unknown'


I'm working on a 64-bit build of R 2.8.1 on a Linux machine. If you want more
details, I can surely get them.

It happens on the last line of the following, for all different values of ns:

library(rpart)
library(ipred)

data("Forbes2000", package = "HSAUR")
Forbes2000 <- subset(Forbes2000, !is.na(profits))
datasize <- length(Forbes2000$profits)
f <- rpart(profits ~ assets + marketvalue + sales, data = Forbes2000)

fb <- bagging(profits ~ assets + marketvalue + sales, data = Forbes2000)
fb <- bagging(profits ~ assets + marketvalue + sales, data = Forbes2000,
              nbagg = 100, coob = TRUE)
fb <- bagging(profits ~ assets + marketvalue + sales, data = Forbes2000,
              nbagg = 100, coob = TRUE, ns = round(0.9 * datasize))




Re: [R] rpart - the xval argument in rpart.control and in xpred.rpart

2009-09-11 Thread jamesmcc

I have this *exact* same confusion.

Adding to this is the fact that Everitt and Hothorn, in their book HSAUR, say
that setting xval=100 gives 100 runs of 10-fold cross-validation (1st ed.,
page 136).

Is this actually 1 run of 100-fold cross-validation?

For large xval, doing multiple cross-validations is not super important. But I
would want to perform multiple cross-validations with different partitions of
the data when xval is moderate or small relative to the size of the data set.
In that case, do we need to do as Paolo suggests (see the sketch below)?
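
To make that concrete, a small sketch of Paolo's suggestion (assuming fit is a
pruned rpart model and cp.min a chosen complexity value):

 ## Five independent runs of 10-fold CV; each call to xpred.rpart
 ## repartitions the data, so the runs use different fold assignments.
 runs <- replicate(5, xpred.rpart(fit, xval = 10, cp = cp.min)[, 1])
 est  <- rowMeans(runs)   # average the out-of-fold estimates pointwise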




Paolo Radaelli wrote:
 
 Usually 10-fold cross-validation is performed more than once to get an
 estimate of the misclassification rate, thus I thought the number of
 cross-validations was different from the number of cross-validation groups.
 So, if I want to perform 10-fold cross-validation more than once (say 5
 times) in order to estimate the misclassification rate, do I have to run
 xpred.rpart 5 times?
 Thanks
 Paolo
 
 
 I have some problems in understanding the meaning of the xval argument in
 the two functions rpart.control and xpred.rpart. In the former it is
 defined as the number of cross-validations, while in the latter it is
 defined as the number of cross-validation groups.
 
 It is the same thing. If xval=10 then the data is divided into 10 disjoint
 groups. A model is fit with group 1 left out and that model is used to
 predict the observations in group 1; then a model is fit with group 2 left
 out; then group 3, ...
 So 10 groups = 10 fits of the model.
 
   Terry Therneau

 




Re: [R] goodness of prediction using a model (lm, glm, gam, brt, regression tree .... )

2009-09-11 Thread jamesmcc

I think it's important to say why you're unhappy with your current measures.
Are they not capturing aspects of the data that you understand?

I typically use several residual measures in conjunction; each has its
benefits and drawbacks. I just throw them all in a table, as sketched below.
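
For example, assuming vectors obs (observed values) and pred (model
estimates):

 res <- obs - pred
 data.frame(ME   = mean(res),
            MAE  = mean(abs(res)),
            RMSE = sqrt(mean(res^2)),
            R2   = cor(obs, pred)^2)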


