[R] failure with merge

2016-07-14 Thread Max Kuhn
I am merging two data frames: tuneAcc <- structure(list(select = c(FALSE, TRUE), method = structure(c(1L, 1L), .Label = "GCV.Cp", class = "factor"), RMSE = c(29.2102056093962, 28.9743318817886), Rsquared = c(0.0322612161559773, 0.0281713457306074), RMSESD = c(0.981573768028697,

Re: [R] Installing Caret

2016-06-16 Thread Max Kuhn
The problem is not with `caret`. Your output says: > installation of package ‘minqa’ had non-zero exit status `caret` has a dependency that has a dependency on `minqa`. The same is true for `RcppEigen` and the others. What code did you use to do the install? What OS and version of R etc? On

Re: [R] Problem while predicting in regression trees

2016-05-09 Thread Max Kuhn
> > > Kind Regards > > > > -- > Muhammad Bilal > Research Fellow and Doctoral Researcher, > Bristol Enterprise, Research, and Innovation Centre (BERIC), > University of the West of England (UWE), > Frenchay Campus, > Bristol, > BS16 1QY > > *muhammad2.bi...@

Re: [R] Problem while predicting in regression trees

2016-05-09 Thread Max Kuhn
It is extremely difficult to tell what the issue might be without a reproducible example. The only thing that I can suggest is to use the non-formula interface to `train` so that you can avoid creating dummy variables. On Mon, May 9, 2016 at 11:23 AM, Muhammad Bilal <

Re: [R] Mixture Discriminant Analysis and Penalized LDA

2016-01-25 Thread Max Kuhn
There is a function called `smda` in the sparseLDA package that implements the model described in Clemmensen, L., Hastie, T., Witten, D. and Ersbøll, B. Sparse discriminant analysis, Technometrics, 53(4): 406-413, 2011 Max On Sun, Jan 24, 2016 at 10:45 PM, TJUN KIAT TEO

Re: [R] Caret - Recursive Feature Elimination Error

2015-12-23 Thread Max Kuhn
Providing a reproducible example and the results of `sessionInfo` will help get your question answered. Also, what is the point of using glmnet with RFE? It already does feature selection. On Wed, Dec 23, 2015 at 1:48 AM, Manish MAHESHWARI wrote: > Hi, > > I am trying to use

Re: [R] Error in 'Contrasts<-' while using GBM.

2015-11-29 Thread Max Kuhn
Providing a reproducible example and the results of `sessionInfo` will help get your question answered. My only guess is that one or more of your predictors are factors and that the in-sample data (used to build the model during resampling) have different levels than the holdout samples. Max On

Re: [R] Ensure distribution of classes is the same as prior distribution in Cross Validation

2015-11-24 Thread Max Kuhn
Right now, using `method = "cv"` or `method = "repeatedcv"` does stratified sampling. Depending on what you mean by "ensure" and the nature of your outcome (categorical?), it probably already does. On Mon, Nov 23, 2015 at 7:04 PM, TJUN KIAT TEO wrote: > In the caret train

Re: [R] Caret Internal Data Representation

2015-11-06 Thread Max Kuhn
Providing a reproducible example and the results of `sessionInfo` will help get your question answered. For example, did you use the formula or non-formula interface to `train` and so on On Thu, Nov 5, 2015 at 1:10 PM, Bert Gunter wrote: > I am not familiar with

Re: [R] Imbalanced random forest

2015-07-29 Thread Max Kuhn
This might help: http://bit.ly/1MUP0Lj On Wed, Jul 29, 2015 at 11:00 AM, jpara3 j.para.fernan...@hotmail.com wrote: ¿How can i set up a study with random forest where the response is highly imbalanced? - Guided Tours Basque Country Guided tours in the three capitals of the Basque

Re: [R] what constitutes a 'complete sentence'?

2015-07-07 Thread Max Kuhn
On Tue, Jul 7, 2015 at 8:19 AM, John Fox j...@mcmaster.ca wrote: Dear Peter, You're correct that these examples aren't verb phrases (though the second one contains a verb phrase). I don't want to make the discussion even more pedantic (moving it in this direction was my fault), but Paragraph

Re: [R] Caret and custom summary function

2015-05-11 Thread Max Kuhn
The version of caret just put on CRAN has a function called mnLogLoss that does this. Max On Mon, May 11, 2015 at 11:17 AM, Lorenzo Isella lorenzo.ise...@gmail.com wrote: Dear All, I am trying to implement my own metric (a log loss metric) for a binary classification problem in Caret. I

Re: [R] Repeated failures to install caret package (of Max Kuhn)

2015-04-04 Thread Max Kuhn
-0500 To: r-help@r-project.org Subject: [R] Repeated failures to install caret package (of Max Kuhn) For an edx course, MIT's The Analtics Edge, I need to install the caret package that was originated and is maintained by Dr. Max Kuhn of Pfizer. So far, every effort I've made to try

Re: [R] #library(CHAID) - Cross validation for chaid

2015-01-05 Thread Max Kuhn
You can create your own: http://topepo.github.io/caret/custom_models.html I put a prototype together. Source this file: https://github.com/topepo/caret/blob/master/models/files/chaid.R then try this: library(CHAID) ### fit tree to subsample set.seed(290875) USvoteS <-

Re: [R] Help with caret, please

2014-10-11 Thread Max Kuhn
What you are asking is a bad idea on multiple levels. You will grossly over-estimate the area under the ROC curve. Consider the 1-NN model: you will have perfect predictions every time. To do this, you will need to run train again and modify the index and indexOut objects: library(caret)

Re: [R] Training a model using glm

2014-09-17 Thread Max Kuhn
You have not shown all of your code and it is difficult to diagnose the issue. I assume that you are using the data from: library(AppliedPredictiveModeling) data(AlzheimerDisease) If so, there is example code to analyze these data in that package. See ?scriptLocation. We have no idea how

Re: [R] Use of library(X) in the code of library X.

2014-06-06 Thread Max Kuhn
That is legacy code but there was a good reason back then. caret is written to use parallel processing via the foreach package. There were some cases where the worker processes did not load the required packages (even when I used foreach's .packages argument) so I would do it explicitly. I don't

Re: [R] cforest sampling methods

2014-03-19 Thread Max Kuhn
You might look at the 'bag' function in the caret package. It will not do the subsampling of variables at each split but you can bag a tree and down-sample the data at each iteration. The help page has an example bagging ctree (although you might want to play with the tree depth a little). Max

Re: [R] how is the model resample performance calculated by caret?

2014-02-28 Thread Max Kuhn
On Fri, Feb 28, 2014 at 1:13 AM, zhenjiang zech xu zhenjiang...@gmail.com wrote: Dear all, I did a 5-repeat of 10-fold cross validation using partial least square regression model provided by caret package. Can anyone tell me how are the values in plsTune$resample calculated? Is that

Re: [R] boxcox alternative

2014-02-24 Thread Max Kuhn
Michael, On Mon, Feb 24, 2014 at 5:51 AM, Michael Haenlein haenl...@escpeurope.eu wrote: Dear all, I am working with a set of variables that are very non-normally distributed. To improve the performance of my model, I'm currently applying a boxcox transformation to them. While this improves

Re: [R] Predictor Importance in Random Forests and bootstrap

2014-01-28 Thread Max Kuhn
I think that the fundamental problem is that you are using the default value of ntree (500). You should always use at least 1500 and more if n or p are large. Also, this link will give you more up-to-date information on that package and feature selection:

Re: [R] R crashes with memory errors on a 256GB machine (and system shoes only 60GB usage)

2014-01-02 Thread Max Kuhn
Describing the problem would help a lot more. For example, if you were using some of the parallel processing options in R, this can make extra copies of objects and drive memory usage up very quickly. Max On Thu, Jan 2, 2014 at 3:35 PM, Ben Bolker bbol...@gmail.com wrote: Xebar Saram zeltakc

Re: [R] Variable importance - ANN

2013-12-04 Thread Max Kuhn
If you are using the nnet package, the caret package has a variable importance method based on Gevrey, M., Dimopoulos, I., Lek, S. (2003). Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecological Modelling, 160(3), 249-264. It is

Re: [R] Inconsistent results between caret+kernlab versions

2013-11-17 Thread Max Kuhn
Andrew, What I still don't quite understand is which accuracy values from train() I should trust: those using classProbs=T or classProbs=F? It depends on whether you need the class probabilities and class predictions to match (which they would if classProbs = TRUE). Another option is to use

Re: [R] Inconsistent results between caret+kernlab versions

2013-11-15 Thread Max Kuhn
into account the costs but the class probability predictions do not. I alerted both package maintainers to the issue some time ago.) HTH, Max On Fri, Nov 15, 2013 at 1:56 PM, Max Kuhn mxk...@gmail.com wrote: I've looked into this a bit and the issue seems to be with caret. I've been looking

Re: [R] C50 Node Assignment

2013-11-09 Thread Max Kuhn
There is a sub-object called 'rules' that has the output of C5.0 for this model: library(C50) mod <- C5.0(Species ~ ., data = iris, rules = TRUE) cat(mod$rules) id=See5/C5.0 2.07 GPL Edition 2013-11-09 entries=1 rules=4 default=setosa conds=1 cover=50 ok=50 lift=2.94231 class=setosa type=2

Re: [R] Cross validation in R

2013-07-02 Thread Max Kuhn
How do i make a loop so that the process could be repeated several time, producing randomly ROC curve and under ROC values? Using the caret package http://caret.r-forge.r-project.org/ -- Max

Re: [R] Error running caret's gbm train function with new version of caret

2013-05-06 Thread Max Kuhn
Katrina, I made some changes to accommodate gbm's new feature for 3+ categories, then had to harmonize how gbm and caret work together. I have a new version of caret that is not released yet (maybe within a month), but you should get it from: install.packages("caret",

Re: [R] C50 package in R

2013-04-26 Thread Max Kuhn
There isn't much out there. Quinlan didn't open source the code until about a year ago. I've been through the code line by line and we have a fairly descriptive summary of the model in our book (that's almost out): http://appliedpredictivemodeling.com/ I will say that the pruning is mostly

Re: [R] odfWeave: Some questions about potential formatting options

2013-04-17 Thread Max Kuhn
Paul, #1: I've never tried but you might be able to escape the required tags in your text (e.g. in html you could write out the b in your text). #3: Which output? Is this in text? #2: It may be possible and maybe easy to implement. So if you want to dig into it, have at it. For me, I'm

Re: [R] Parallelizing GBM

2013-03-24 Thread Max Kuhn
See this: https://code.google.com/p/gradientboostedmodels/issues/detail?id=3 and this: https://code.google.com/p/gradientboostedmodels/source/browse/?name=parallel Max On Sun, Mar 24, 2013 at 7:31 AM, Lorenzo Isella lorenzo.ise...@gmail.comwrote: Dear All, I am far from being a guru

Re: [R] CARET and NNET fail to train a model when the input is high dimensional

2013-03-06 Thread Max Kuhn
James, I did a fresh install from CRAN to get caret_5.15-61 and ran your code with method.name = "nnet" and grid.len = 3. I don't get an error, although there were issues: In nominalTrainWorkflow(dat = trainData, info = trainInfo, ... : There were missing values in resampled performance

Re: [R] caret pls model statistics

2013-03-03 Thread Max Kuhn
but the prior equation seems different to me. Could you explain if this is the same concept? Charles On Sun, Mar 3, 2013 at 12:46 PM, Max Kuhn mxk...@gmail.com wrote: Is there some literature that you make that statement? No, but there isn't literature on changing a lightbulb with a duck

Re: [R] caret pls model statistics

2013-03-02 Thread Max Kuhn
Charles, You should not be treating the classes as numeric (is virginica really three times setosa?). Q^2 and/or R^2 are not appropriate for classification. Max On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr deter...@umn.eduwrote: I have discovered on of my errors. The timematrix was

Re: [R] odfWeave: Trouble Getting the Package to Work

2013-02-18 Thread Max Kuhn
That's not a reproducible example. There is no sessionInfo() and you omitted code (where did 'fp' come from?). It works fine for me (see sessionInfo below) using the code in ?odfWeave. As for the file paths: you can point to different paths for the files (although don't change the working

Re: [R] CARET: Any way to access other tuning parameters?

2013-02-13 Thread Max Kuhn
James, You really need to read the documentation. Almost every question that you have has been addressed in the existing material. For this one, there is a section on custom models here: http://caret.r-forge.r-project.org/training.html Max On Wed, Feb 13, 2013 at 9:58 AM, James Jong

Re: [R] CARET: Any way to access other tuning parameters?

2013-02-13 Thread Max Kuhn
from each package other than those listed in the CARET documentation (e.g. I would like to specify sampsize and nodesize for randomForest, and not just mtry). Yes. A custom method is how you do that. Thanks, James On Wed, Feb 13, 2013 at 1:07 PM, Max Kuhn mxk...@gmail.com wrote

Re: [R] pROC and ROCR give different values for AUC

2012-12-19 Thread Max Kuhn
A reproducible example sent to the package maintainer(s) might yield results. Max On Wed, Dec 19, 2012 at 7:47 AM, Ivana Cace i.c...@ati-a.nl wrote: Packages pROC and ROCR both calculate/approximate the Area Under (Receiver Operator) Curve. However the results are different. I am computing

Re: [R] Help with this error kernlab class probability calculations failed; returning NAs

2012-11-29 Thread Max Kuhn
You didn't provide the results of sessionInfo(). Upgrade to the version just released on CRAN and see if you still have the issue. Max On Thu, Nov 29, 2012 at 6:55 PM, Brian Feeny bfe...@mac.com wrote: I have never been able to get class probabilities to work and I am relatively new to

Re: [R] Help with this error kernlab class probability calculations failed; returning NAs

2012-11-29 Thread Max Kuhn
(and not attached): [1] codetools_0.2-8 compiler_2.15.2 grid_2.15.2 iterators_1.0.6 tools_2.15.2 Is there an example that shows a classProbs example, I could try to run it to replicate and see if it works on my system. Brian On Nov 29, 2012, at 10:10 PM, Max Kuhn mxk...@gmail.com wrote: You

Re: [R] caret train and trainControl

2012-11-23 Thread Max Kuhn
Brian, This is all outlined in the package documentation. The final model is fit automatically. For example, using 'verboseIter' provides details. From ?train: knnFit1 <- train(TrainData, TrainClasses, + method = "knn", + preProcess = c("center", "scale"), +

Re: [R] Decision Tree: Am I Missing Anything?

2012-09-22 Thread Max Kuhn
Vik, On Fri, Sep 21, 2012 at 12:42 PM, Vik Rubenfeld v...@mindspring.com wrote: Max, I installed C50. I have a question about the syntax. Per the C50 manual: ## Default S3 method: C5.0(x, y, trials = 1, rules= FALSE, weights = NULL, control = C5.0Control(), costs = NULL, ...) ## S3

Re: [R] Caret: Use timingSamps leads to error

2012-07-12 Thread Max Kuhn
I can reproduce the errors. I'll take a look. Thanks, Max On Thu, Jul 12, 2012 at 5:24 AM, Dominik Bruhn domi...@dbruhn.de wrote: I want to use the caret package and found out about the timingSamps option to obtain the time which is needed to predict results. But, as soon as I set a value

Re: [R] caret() train based on cross validation - split dataset to keep sites together?

2012-05-30 Thread Max Kuhn
Tyrell, If you want to have the folds contain data from only one site at a time, you can develop a set of row indices and pass these to the index argument in trainControl. For example index = list(site1 = c(1, 6, 8, 12), site2 = c(120, 152, 176, 178), site3 = c(754, 789, 981)) The first fold
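A minimal sketch of building such an index list from a site label rather than typing the row numbers by hand. The `site` vector and its labels are hypothetical and not from the thread. In caret, each element of `index` lists the rows used to fit the model for that resample, and the rows not listed are held out. The `trainControl()` call is guarded so the base R part runs even where caret is not installed:

```r
# Hypothetical site labels, one per row of the training data
site <- c("a", "b", "a", "c", "b", "c", "a", "b")

# One list element per site, each holding the row numbers for that site
site_rows <- split(seq_along(site), site)
names(site_rows) <- paste0("site_", names(site_rows))

# Pass the list to trainControl(); each element defines the rows used for
# fitting in one resample, and the remaining rows are predicted
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- caret::trainControl(method = "cv", index = site_rows)
}
```

Whether each element should hold one site or all sites but one depends on which rows are meant to be used for fitting; the list above follows the per-site layout of the example in the message.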

Re: [R] caret: Error when using rpart and CV != LOOCV

2012-05-17 Thread Max Kuhn
)) names(out) <- c("RMSE", "Rsquared") return(out) } --- [1]: http://en.wikipedia.org/wiki/Coefficient_of_determination#Definitions Thanks! Dominik On 17/05/12 04:10, Max Kuhn wrote: Dominik, See this line: Min. 1st Qu. Median Mean 3rd Qu. Max. 30.37 30.37 30.37

Re: [R] caret: Error when using rpart and CV != LOOCV

2012-05-16 Thread Max Kuhn
failure mode would result in a divide by zero. Try using your own summary function (see ?trainControl) and put a print(summary(data$pred)) in there to verify my claim. Max On Wed, May 16, 2012 at 11:30 AM, Max Kuhn mxk...@gmail.com wrote: More information is needed to be sure, but it is most likely

Re: [R] caret: Error when using rpart and CV != LOOCV

2012-05-16 Thread Max Kuhn
assumption. Thanks anyway, Dominik On 16/05/12 17:30, Max Kuhn wrote: More information is needed to be sure, but it is most likely that some of the resampled rpart models produce the same prediction for the hold-out samples (likely the result of no viable split being found). Almost every

Re: [R] caret package: custom summary function in trainControl doesn't work with oob?

2012-04-13 Thread Max Kuhn
Matt, I've been using a custom summary function to optimise regression model methods using the caret package. This has worked smoothly. I've been using the default bootstrapping resampling method. For bagging models (specifically randomForest in this case) caret can, in theory, uses the

[R] nonparametric densities for bounded distributions

2012-03-09 Thread Max Kuhn
Can anyone recommend a good nonparametric density approach for data bounded (say between 0 and 1)? For example, using the basic Gaussian density approach doesn't generate a very realistic shape (nor should it): set.seed(1) dat <- rbeta(100, 1, 2) plot(density(dat)) (note the area outside of
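One commonly used option for data on (0, 1), offered here as a sketch and not as the answer the thread arrived at, is to estimate the density on the logit scale (where the data are unbounded) and transform back with the change-of-variables factor. Only base R is used; the variable names are illustrative:

```r
set.seed(1)
dat <- rbeta(100, 1, 2)

# Estimate the density of z = logit(dat), which lives on the whole real line
z <- qlogis(dat)
dz <- density(z)

# Map the grid back to (0, 1) and apply the Jacobian:
# f_X(x) = f_Z(logit(x)) / (x * (1 - x))
x <- plogis(dz$x)
fx <- dz$y / (x * (1 - x))

# The estimate puts no mass outside the unit interval
plot(x, fx, type = "l", xlim = c(0, 1), xlab = "dat", ylab = "Density")
```

The transform can exaggerate the estimate very close to 0 or 1, so the bandwidth may need some tuning for data that pile up at a boundary.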

Re: [R] Custom caret metric based on prob-predictions/rankings

2012-02-10 Thread Max Kuhn
I think you need to read the man pages and the four vignettes. A lot of your questions have answers there. If you don't specify the resampling indices, the ones generated for you are saved in the train object: data(iris) TrainData <- iris[,1:4] TrainClasses <- iris[,5] knnFit1 <-

Re: [R] Choosing glmnet lambda values via caret

2012-02-09 Thread Max Kuhn
You can adjust the candidate set of tuning parameters via the tuneGrid argument in train() and the process by which the optimal choice is made (via the 'selectionFunction' argument in trainControl()). Check out the package vignettes. The latest version also has an update.train() function that

[R] lattice key in blank panel

2011-12-15 Thread Max Kuhn
Somewhere I've seen an example of an xyplot() where the key was placed in a location of a missing panel. For example, if there were 3 conditioning levels, the panel grid would look like: 3 4 (top row), 1 2 (bottom row). In this (possibly imaginary) example, there were scatter plots in locations 1:3 and location 4 had no

[R] palettes for the color-blind

2011-11-02 Thread Max Kuhn
Everyone, I'm working with scatter plots with different colored symbols (via lattice). I'm currently using these colors for points and lines: col1 <- c(rgb(1, 0, 0), rgb(0, 0, 1), rgb(0, 1, 0), rgb(0.55482458, 0.40350876, 0.0416), rgb(0, 0, 0)) plot(seq(along =

Re: [R] palettes for the color-blind

2011-11-02 Thread Max Kuhn
Yes, I was aware of the different types and their respective prevalences. The dichromat package helped me find what I needed. Thanks, Max On Wed, Nov 2, 2011 at 6:38 PM, Thomas Lumley tlum...@uw.edu wrote: On Thu, Nov 3, 2011 at 11:04 AM, Carl Witthoft c...@witthoft.com wrote: Before you

Re: [R] help with parallel processing code

2011-10-31 Thread Max Kuhn
I'm not sure what you mean by full code or the iteration. This uses foreach to parallelize the loops over different tuning parameters and resampled data sets. The only way I could see to split up the parallelism is if you are fitting different models to the same data. In that case, you could

Re: [R] Contrasts with an interaction. How does one specify the dummy variables for the interaction

2011-10-31 Thread Max Kuhn
This is failing because it is a saturated model and the contrast package tries to do a t-test (instead of a z test). I can add code to do this, but it will take a few days. Max On Fri, Oct 28, 2011 at 2:16 PM, John Sorkin jsor...@grecc.umaryland.edu wrote: Forgive my resending this post. To

Re: [R] help with parallel processing code

2011-10-27 Thread Max Kuhn
I have had issues with some parallel backends not finding functions within a namespace for packages listed in the .packages argument or explicitly loaded in the body of the foreach loop. This has occurred with MPI but not with multicore. I can get around this to some extent by calling the

Re: [R] difference between createPartition and createfold functions

2011-10-03 Thread Max Kuhn
, 2011 at 11:10 AM, bby2...@columbia.edu wrote: Hi Max, Thanks for the note. In your last paragraph, did you mean in createDataPartition? I'm a little vague about what returnTrain option does. Bonnie Quoting Max Kuhn mxk...@gmail.com: Basically, createDataPartition is used when you need

Re: [R] difference between createPartition and createfold functions

2011-10-02 Thread Max Kuhn
Basically, createDataPartition is used when you need to make one or more simple two-way splits of your data. For example, if you want to make a training and test set and keep your classes balanced, this is what you could use. It can also make multiple splits of this kind (or leave-group-out CV aka

Re: [R] odfWeave: Combining multiple output statements in a function

2011-09-16 Thread Max Kuhn
. Could you perhaps tell me which example I should have a look at? Regards, Jan On 09/15/2011 04:47 PM, Max Kuhn wrote: There are examples in the package directory that explain this. On Thu, Sep 15, 2011 at 8:16 AM, Jan van der Laanrh...@eoos.dds.nl  wrote: What is the correct way

Re: [R] odfWeave: Combining multiple output statements in a function

2011-09-15 Thread Max Kuhn
There are examples in the package directory that explain this. On Thu, Sep 15, 2011 at 8:16 AM, Jan van der Laan rh...@eoos.dds.nl wrote: What is the correct way to combine multiple calls to odfCat, odfItemize, odfTable etc. inside a function? As an example lets say I have a function that

Re: [R] Trying to extract probabilities in CARET (caret) package with a glmStepAIC model

2011-08-28 Thread Max Kuhn
Can you provide a reproducible example and the results of sessionInfo()? What are the levels of your classes? On Sat, Aug 27, 2011 at 10:43 PM, Jon Toledo tintin...@hotmail.com wrote: Dear developers, I have jutst started working with caret and all the nice features it offers. But I just

Re: [R] aucRoc in caret package [SEC=UNCLASSIFIED]

2011-06-01 Thread Max Kuhn
David, The ROC curve should really be computed with some sort of numeric data (as opposed to classes). It varies the cutoff to get a continuum of sensitivity and specificity values.  Using the classes as 1's and 2's implies that the second class is twice the value of the first, which doesn't
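To illustrate the point about varying the cutoff, here is a small base R sketch, assuming simulated data: a numeric score (for example a class probability) and a two-level truth. The names and the simulated values are hypothetical and not from the thread:

```r
set.seed(1)

# Hypothetical data: true classes and a numeric score for the "event" class
truth <- factor(rep(c("event", "nonevent"), each = 50))
score <- c(rnorm(50, mean = 1), rnorm(50, mean = 0))

ev <- score[truth == "event"]
ne <- score[truth == "nonevent"]

# Sensitivity and specificity at each candidate cutoff
cutoffs <- sort(unique(score))
sens <- sapply(cutoffs, function(k) mean(ev >= k))
spec <- sapply(cutoffs, function(k) mean(ne < k))

# AUC as the proportion of (event, nonevent) pairs where the event has the
# higher score, counting ties as one half
auc <- mean(outer(ev, ne, ">") + 0.5 * outer(ev, ne, "=="))

plot(1 - spec, sens, type = "s",
     xlab = "1 - Specificity", ylab = "Sensitivity")
```

With hard class predictions coded as 1 and 2 there are only two distinct values to threshold, so the curve collapses to a single interior point.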

Re: [R] issue with odfWeave running on Windows XP; question about installing packages under Linux

2011-05-18 Thread Max Kuhn
, but similar. I sent the info to Max Kuhn privately, but did not get a response after two tries.) My odfWeave reporting system worked fine prior to R2.12 and then the same code that ran fine under R2.11.1 stopped working. Using the very same machine and running the very same code under R2.11.1 it still

Re: [R] Can ROC be used as a metric for optimal model selection for randomForest?

2011-05-13 Thread Max Kuhn
XiaoLiu, I can't see the options in bootControl you used here. Your error is consistent with leaving classProbs and summaryFunction unspecified. Please double check that you set them with classProbs = TRUE and summaryFunction = twoClassSummary before you ran. Max On Thu, May 12, 2011 at 7:04
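A sketch of the two settings named above, assuming a caret version in which `trainControl()` is the control function (the thread itself mentions `bootControl`). The `train()` call is left as a comment because `x` and `y` are hypothetical stand-ins for the poster's data:

```r
library(caret)

# Class probabilities must be requested so that twoClassSummary can
# compute the area under the ROC curve for each resample
ctrl <- trainControl(method = "boot",
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

# Hypothetical call: `x` is a predictor data frame and `y` a two-level factor.
# metric = "ROC" asks train() to pick the tuning parameters by resampled AUC.
# fit <- train(x, y, method = "rf", metric = "ROC", trControl = ctrl)
```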

Re: [R] Can ROC be used as a metric for optimal model selection for randomForest?

2011-05-13 Thread Max Kuhn
Frank, It depends on how you define optimal. While I'm not a big fan of using the area under the ROC to characterize performance, there are a lot of times when likelihood measures are clearly sub-optimal in performance. Using resampled accuracy (or Kappa) instead of deviance (out-of-bag or not)

Re: [R] Bigining with a Program of SVR

2011-05-07 Thread Max Kuhn
As far as caret goes, you should read http://cran.r-project.org/web/packages/caret/vignettes/caretVarImp.pdf and look at rfe() and sbf(). On Fri, May 6, 2011 at 2:53 PM, ypriverol yprive...@gmail.com wrote: Thanks Max. I'm using now the library caret with my data. But the models showed a

Re: [R] Bigining with a Program of SVR

2011-05-04 Thread Max Kuhn
train() uses vectors, matrices and data frames as input. I really think you need to read materials on basic R before proceeding. Go to the R web page. There are introductory materials there. On Tue, May 3, 2011 at 11:19 AM, ypriverol yprive...@gmail.com wrote: I saw the format of the caret data

Re: [R] Bigining with a Program of SVR

2011-05-03 Thread Max Kuhn
See the examples at the end of: http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf for a QSAR data set for modeling the log blood-brain barrier concentration. SVMs are not used there but, if you use train(), the syntax is very similar. On Tue, May 3, 2011 at 9:38 AM,

Re: [R] caret - prevent resampling when no parameters to find

2011-05-02 Thread Max Kuhn
Yeah, that didn't work. Use fitControl <- trainControl(index = list(seq(along = mdrrClass))) See ?trainControl to understand what this does in detail. Max

Re: [R] caret - prevent resampling when no parameters to find

2011-05-01 Thread Max Kuhn
It isn't building the same model since each fit is created from different data sets. The resampling is sort of the point of the function, but if you really want to avoid it, supply your own index in trainControl that has every index (eg, index = seq(along = mdrrClass)). In this case, the

Re: [R] Bigining with a Program of SVR

2011-05-01 Thread Max Kuhn
When you say variable do you mean predictors or responses? In either case, they do. You can generally tell by reading the help files and looking at the examples. Max On Fri, Apr 29, 2011 at 3:47 PM, ypriverol yprive...@gmail.com wrote: Hi:  I'm starting a research of Support Vector

Re: [R] caret - prevent resampling when no parameters to find

2011-05-01 Thread Max Kuhn
No, the sampling is done on rows. The definition of a bootstrap (re)sample is one which is the same size as the original data but taken with replacement. The Accuracy SD and Kappa SD columns give you a sense of how the model performance varied across these bootstrap data sets (i.e. they are not
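The definition of a bootstrap resample given above can be sketched in base R as follows. The built-in `iris` data are used purely as a stand-in, and this is not a description of caret's internal code:

```r
set.seed(1)
n <- nrow(iris)

# Draw n row numbers with replacement: same size as the original data,
# with some rows repeated and others left out
in_bag <- sample(n, size = n, replace = TRUE)

# Rows never drawn form the hold-out set for this resample
out_of_bag <- setdiff(seq_len(n), in_bag)

boot_data <- iris[in_bag, ]
holdout_data <- iris[out_of_bag, ]
```

Repeating this many times and computing the performance on each hold-out set gives the spread that the SD columns summarize.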

Re: [R] caret - prevent resampling when no parameters to find

2011-05-01 Thread Max Kuhn
Not all modeling functions have both the formula and matrix interface. For example, glm() and rpart() only have formula method, enet() has only the matrix interface and ksvm() and others have both. This was one reason I created the package (so we don't have to remember all this). train() lets you

Re: [R] odfWeave Error unzipping file in Win 7

2011-03-21 Thread Max Kuhn
I don't think that this is the issue, but test it on a file without spaces. On Mon, Mar 21, 2011 at 2:25 PM, rmail...@justemail.net wrote: I have a very similar error that cropped up when I upgraded to R 2.12 and persists at R 2.12.1. I am running R on Windows XP and OO is at version 3.2.

Re: [R] Specify feature weights in model prediction (CARET)

2011-03-16 Thread Max Kuhn
Using the 'CARET' package, is it possible to specify weights for features used in model prediction? For what model? And for the 'knn' implementation, is there a way to choose a distance metric (i.e. Mahalanobis distance)? No, sorry. Max

Re: [R] use caret to rank predictors by random forest model

2011-03-07 Thread Max Kuhn
It would help if you provided the code that you used for the caret functions. The most likely issue is not using importance = TRUE in the call to train(). I believe that I've only implemented code for plotting the varImp objects resulting from train() (eg. there is plot.varImp.train but not

[R] Course: R for Predictive Modeling: A Hands-On Introduction

2011-03-04 Thread Max Kuhn
R for Predictive Modeling: A Hands-On Introduction Predictive Analytics World in San Francisco Sunday March 13, 9am to 4:30pm This one-day session provides a hands-on introduction to R, the well-known open-source platform for data analysis. Real examples are employed in order to methodically

Re: [R] ROC from R-SVM?

2011-02-22 Thread Max Kuhn
The objective functions for kernel methods are unrelated to the area under the ROC curve. However, you can try to choose the cost and kernel parameters to maximize the ROC AUC. See the caret package, specifically the train function. Max On Mon, Feb 21, 2011 at 5:34 PM, Angel Russo

Re: [R] Random Forest Cross Validation

2011-02-20 Thread Max Kuhn
I am using randomForest package to do some prediction job on GWAS data. I firstly split the data into training and testing set (70% vs 30%), then using training set to grow the trees (ntree=10). It looks that the OOB error in training set is good (10%). However, it is not very good for the

Re: [R] caret::train() and ctree()

2011-02-16 Thread Max Kuhn
Andrew, ctree only tunes over mincriterion and ctree2 tunes over maxdepth (while fixing mincriterion = 0). Seeing both listed as the function is being executed is a bug. I'll setup checks to make sure that the columns specified in tuneGrid are actually the tuning parameters that are used. Max

Re: [R] Train error:: subscript out of bonds

2011-01-26 Thread Max Kuhn
Sort of. It lets you define a grid of candidate values to test and to define the rule to choose the best. For some models, it is easy to come up with default values that work well (e.g. RBF SVM's, PLS, KNN) while others are more data dependent. In the latter case, the defaults may not work well.

Re: [R] Train error:: subscript out of bonds

2011-01-26 Thread Max Kuhn
No. Any valid seed should work. In this case, train() should only be using it to determine which training set samples are in the CV or bootstrap data sets. Max On Wed, Jan 26, 2011 at 9:56 AM, Neeti nikkiha...@gmail.com wrote: Thank you so much for your reply. In my case it is giving error in

Re: [R] Train error:: subscript out of bonds

2011-01-25 Thread Max Kuhn
What version of caret and R? We'll also need a reproducible example. On Mon, Jan 24, 2011 at 12:44 PM, Neeti nikkiha...@gmail.com wrote: Hi, I am trying to construct a svmpoly model using the caret package (please see code below). Using the same data, without changing any setting, I am just

Re: [R] circular reference lines in splom

2011-01-20 Thread Max Kuhn
, Jan 20, 2011 at 11:13 AM, Peter Ehlers ehl...@ucalgary.ca wrote: On 2011-01-19 20:15, Max Kuhn wrote: Hello everyone, I'm stumped. I'd like to create a scatterplot matrix with circular reference lines. Here is an example in 2d: library(ellipse) set.seed(1) dat <- matrix(rnorm(300), ncol = 3

[R] circular reference lines in splom

2011-01-19 Thread Max Kuhn
Hello everyone, I'm stumped. I'd like to create a scatterplot matrix with circular reference lines. Here is an example in 2d: library(ellipse) set.seed(1) dat <- matrix(rnorm(300), ncol = 3) colnames(dat) <- c("X1", "X2", "X3") dat <- as.data.frame(dat) grps <- factor(rep(letters[1:4], 25)) panel.circ <-

[R] less than full rank contrast methods

2010-12-06 Thread Max Kuhn
I'd like to make a less than full rank design using dummy variables for factors. Here is some example data: when <- data.frame(time = c("afternoon", "night", "afternoon", "morning", "morning", "morning", "morning", "afternoon", "afternoon"),

Re: [R] Sporadic errors when training models using CARET

2010-11-23 Thread Max Kuhn
Kendric, I've seen these too and traceback() usually goes back to ksvm(). This doesn't mean that the error is there, but the results of traceback() from you would be helpful. Thanks, Max On Mon, Nov 22, 2010 at 6:18 PM, Kendric Wang kendr...@interchange.ubc.ca wrote: Hi. I am trying to

Re: [R] cross validation using e1071:SVM

2010-11-23 Thread Max Kuhn
Neeti, I'm pretty sure that the error is related to the confusionMatrix call, which is in the caret package, not e1071. The error message is pretty clear: you need to pass in two factor objects that have the same levels. You can check by running the commands: str(pred_true1)
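The level mismatch described above can be reproduced and repaired in base R. This is a minimal sketch: `truth`, `pred_chr`, `pred_bad` and `pred` are hypothetical objects, not the poster's data, and `table()` stands in for caret's `confusionMatrix()` so the example needs no extra packages.

```r
# Hypothetical data: the reference has two classes, but the classifier
# happened to predict only one of them.
truth    <- factor(c("yes", "no", "yes", "yes", "no"), levels = c("no", "yes"))
pred_chr <- c("yes", "yes", "yes", "yes", "yes")

pred_bad <- factor(pred_chr)                    # levels are just "yes"
identical(levels(pred_bad), levels(truth))      # the mismatch behind the error

# Rebuild the predictions with the reference levels so both factors agree.
pred <- factor(pred_chr, levels = levels(truth))
str(pred)
str(truth)

# Cross-tabulation now has a row and a column for every class.
tab <- table(Prediction = pred, Reference = truth)
tab
```

Once both factors share the same levels, they can be handed to `confusionMatrix(pred, truth)` from caret.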

Re: [R] odfWeave - Format error discovered in the file in sub-document content.xml at 2, 4047 (row, col)

2010-11-16 Thread Max Kuhn
Can you try it with version 7.16 on R-Forge? Use install.packages("odfWeave", repos = "http://R-Forge.R-project.org") to get it. Thanks, Max On Tue, Nov 16, 2010 at 8:26 AM, Søren Højsgaard soren.hojsga...@agrsci.dk wrote: Dear Mike, Good point - thanks. The lines that caused the error

Re: [R] to determine the variable importance in svm

2010-10-26 Thread Max Kuhn
The caret package has answers to all your questions. 1) How to obtain a variable (attribute) importance using e1071:SVM (or other svm methods)? I haven't implemented a model-specific method for variable importance for SVM models. I know of one package (svmpath) that will return the

Re: [R] Random Forest AUC

2010-10-22 Thread Max Kuhn
Ravishankar, I used Random Forest with a couple of data sets I had to predict for binary response. In all the cases, the AUC of the training set is coming to be 1. Is this always the case with random forests? Can someone please clarify this? This is pretty typical for this model. I have

Re: [R] Understanding linear contrasts in Anova using R

2010-09-30 Thread Max Kuhn
These two resources might also help: http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf http://cran.r-project.org/web/packages/contrast/vignettes/contrast.pdf Max On Thu, Sep 30, 2010 at 1:33 PM, Ista Zahn iz...@psych.rochester.edu wrote: Hi Professor Howell, I think the issue here

Re: [R] Creating publication-quality plots for use in Microsoft Word

2010-09-15 Thread Max Kuhn
You might want to check out the Reproducible Research task view: http://cran.r-project.org/web/views/ReproducibleResearch.html There is a section on Microsoft formats, as well as other formats that can be converted. Max On Wed, Sep 15, 2010 at 11:49 AM, Thomas Lumley

Re: [R] Reproducible research

2010-09-09 Thread Max Kuhn
A Reproducible Research CRAN task view was recently created: http://cran.r-project.org/web/views/ReproducibleResearch.html I will be updating it with some of the information in this thread. thanks, Max On Thu, Sep 9, 2010 at 11:41 AM, Matt Shotwell shotw...@musc.edu wrote: Well, the

Re: [R] createDataPartition

2010-09-09 Thread Max Kuhn
Trafim, You'll get more answers if you adhere to the posting guide and tell us your version information and other necessary details. For example, this function is in the caret package (but nobody but me probably knows that =]). The first argument should be a vector of outcome values (not the
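A minimal sketch of the call described above, assuming the caret package is installed. The built-in `iris` data and the 80% split are choices made for this illustration only; they do not come from the thread.

```r
library(caret)

set.seed(1)

# The first argument is the outcome vector itself.
y <- iris$Species

# list = FALSE returns a one-column matrix of row indices for the training set;
# sampling is done within each class of y.
in_train <- createDataPartition(y, p = 0.8, list = FALSE)

training <- iris[in_train, ]
testing  <- iris[-in_train, ]

table(training$Species)   # class balance is preserved in the training set
```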

Re: [R] several odfWeave questions

2010-08-25 Thread Max Kuhn
Ben,  1a. am I right in believing that odfWeave does not respect the 'keep.source' option?  Am I missing something obvious? I believe it does, since this gets passed directly to Sweave.  1b. is there a way to set global options analogous to \SweaveOpts{} directives in Sweave? (I looked at

Re: [R] odfWeave Issue.

2010-08-11 Thread Max Kuhn
What does this mean? It's impossible to tell. Read the posting guide and figure out all the details that you left out. If we don't have more information, you should have low expectations about the quality of any replies you might get. -- Max __

Re: [R] UseR! 2010 - my impressions

2010-07-27 Thread Max Kuhn
Not to beat a dead horse... I've found that I like the useR conferences more than most statistics conferences. This isn't due to the difference in content, but the difference in the audience and the environment. For example, everyone is at useR because of their appreciation of R. At most other

Re: [R] Random Forest - Strata

2010-07-27 Thread Max Kuhn
The index indicates which samples should go into the training set. However, you are using out of bag sampling, so it would use the whole training set and return the OOB error (instead of the error estimates that would be produced by resampling via the index). Which do you want? OOB estimates or
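The two options contrasted above can be set up as follows. This is a sketch assuming the caret package is installed; the `iris` outcome and the five-fold split are illustrative assumptions, not the poster's setup.

```r
library(caret)

set.seed(1)
y <- iris$Species

# Option 1: out-of-bag estimates. No resampling index is involved; the
# model is fit on the whole training set and the OOB error is reported.
oob_ctrl <- trainControl(method = "oob")

# Option 2: resampling driven by an explicit index. Each list element
# holds the row numbers used for training in that iteration.
train_rows <- createFolds(y, k = 5, returnTrain = TRUE)
cv_ctrl    <- trainControl(method = "cv", index = train_rows)

# Either control object would then be passed to train(); these calls are
# left commented out because they also need the randomForest package.
# fit_oob <- train(iris[, 1:4], y, method = "rf", trControl = oob_ctrl)
# fit_cv  <- train(iris[, 1:4], y, method = "rf", trControl = cv_ctrl)
```

Only the second control object makes use of the index, which matches the behaviour described in the reply.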
