Thank you so much for your help. I hope it will work. However, why doesn't the same error arise when I use rf? They both have the same parameters and the same default values.
Best regards

On Friday, July 1, 2022, Rui Barradas <ruipbarra...@sapo.pt> wrote:

> Hello,
>
> The error is caused by the ranger parameter mtry becoming greater than the
> number of variables (columns).
> mtry can be set manually in the tuneGrid argument of caret::train. But for
> ranger random forests the grid must also set the split rule and the minimum
> node size.
>
>
> library(caret)
> library(farff)
>
> boot <- trainControl(method = "cv", number = 10)
>
> # set the maximum mtry manually to ncol(tr)
> # this creates a sequence of mtry values
> mtry <- var_seq(ncol(tr), len = 3)  # 3 is the default value
> mtry
> # [1]  2 13 24
>
> splitrule <- c("variance", "extratrees")
> min.node.size <- 1:10
> mtrygrid <- expand.grid(mtry, splitrule, min.node.size)
> names(mtrygrid) <- c("mtry", "splitrule", "min.node.size")
>
> c1 <- train(act_effort ~ ., data = tr,
>             method = "ranger",
>             tuneLength = 5,
>             metric = "MAE",
>             preProc = c("center", "scale", "nzv"),
>             tuneGrid = mtrygrid,
>             trControl = boot)
> c1
> # Random Forest
> #
> # 30 samples
> # 23 predictors
> #
> # Pre-processing: centered (48), scaled (48), remove (58)
> # Resampling: Cross-Validated (10 fold)
> # Summary of sample sizes: 28, 27, 27, 28, 27, 27, ...
> # Resampling results across tuning parameters:
> #
> #   mtry  splitrule   min.node.size  RMSE      Rsquared   MAE
> #    2    variance     1             256.6391  0.8103759  186.3609
> #    2    variance     2             249.7120  0.8628109  183.6696
> #    2    variance     3             258.8240  0.8284449  189.0712
> #
> # [...omit...]
> #
> #   13    extratrees  10             254.9569  0.8918014  191.2524
> #   24    variance     1             177.7188  0.9458652  112.2800
> #   24    variance     2             172.6826  0.9204287  108.5943
> #   24    variance     3             172.9954  0.9271006  109.2554
> #   24    variance     4             172.2467  0.9523067  110.0776
> #   24    variance     5             175.2485  0.9283317  112.8798
> #   24    variance     6             177.9285  0.9369881  115.8970
> #   24    variance     7             180.5959  0.9485035  117.5816
> #   24    variance     8             178.8037  0.9358033  117.8725
> #   24    variance     9             176.5849  0.9210959  117.0055
> #   24    variance    10             178.6439  0.9257969  119.8035
> #   24    extratrees   1             219.1368  0.8801770  141.0720
> #   24    extratrees   2             216.1900  0.8550002  140.9263
> #   24    extratrees   3             212.4138  0.8979379  141.4282
> #   24    extratrees   4             218.2631  0.9121471  146.2908
> #   24    extratrees   5             212.5679  0.9279598  144.2715
> #   24    extratrees   6             218.9856  0.9141754  152.2099
> #   24    extratrees   7             222.8540  0.9412682  152.4614
> #   24    extratrees   8             228.1156  0.9423414  161.8456
> #   24    extratrees   9             226.6182  0.9408306  160.5264
> #   24    extratrees  10             226.9280  0.9429413  165.6878
> #
> # MAE was used to select the optimal model using the smallest value.
> # The final values used for the model were mtry = 24, splitrule = variance
> # and min.node.size = 2.
> plot(c1)
>
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> At 23:03 on 30/06/2022, Neha gupta wrote:
>
>> Ok, the data is pasted below.
>>
>> But on the same data (everything the same) and with other models like RF,
>> SVM, etc., it works fine.
>>
>> > dput(head(tr, 30))
>> structure(list(recordnumber = c(0, 0.02, 0.04, 0.06, 0.07, 0.08,
>> 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.16, 0.17, 0.18, 0.23, 0.24,
>> 0.25, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.35, 0.36, 0.37, 0.38,
>> 0.4, 0.41), projectname = structure(c(1L, 1L, 1L, 1L, 2L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 5L, 6L), levels = c("de", "erb", "gal",
>> "X", "hst", "slp", "spl", "Y"), class = "factor"), cat2 = structure(c(3L,
>> 3L, 3L, 3L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L,
>> 9L, 11L, 5L, 4L, 6L, 8L, 3L, 9L, 9L, 9L, 9L, 6L, 7L), levels =
>> c("Avionics",
>> "application_ground", "avionicsmonitoring", "batchdataprocessing",
>> "communications", "datacapture", "launchprocessing", "missionplanning",
>> "monitor_control", "operatingsystem", "realdataprocessing", "science",
>> "simulation", "utility"), class = "factor"), forg = structure(c(2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("f",
>> "g"), class = "factor"), center = structure(c(2L, 2L, 2L, 2L,
>> 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 6L), levels = c("1", "2",
>> "3", "4", "5", "6"), class = "factor"), year = c(0.5, 0.5, 0.5,
>> 0.5, 0.6875, 0.5625, 0.5625, 0.8125, 0.5625, 0.875, 0.5625, 0.75,
>> 0.5625, 0.8125, 0.75, 0.9375, 0.9375, 0.9375, 0.6875, 0.6875,
>> 0.6875, 0.6875, 0.875, 1, 0.9375, 0.9375, 0.9375, 0.9375, 0.5625,
>> 0.25), mode = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L), levels = c("embedded", "organic", "semidetached"
>> ), class = "factor"), rely = structure(c(4L, 4L, 4L, 4L, 4L,
>> 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 3L, 3L, 3L, 3L,
>> 3L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 4L), levels = c("vl", "l", "n",
>> "h", "vh", "xh"), class = "factor"),
>> data = structure(c(2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
>> 5L, 5L, 5L, 5L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 2L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), cplx = structure(c(4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L,
>> 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), time = structure(c(3L,
>> 3L, 3L, 3L, 3L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L,
>> 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), stor = structure(c(3L,
>> 3L, 3L, 3L, 3L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 3L, 3L, 3L, 3L,
>> 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), virt = structure(c(2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 3L, 3L,
>> 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), turn = structure(c(2L,
>> 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
>> 3L, 4L, 4L, 4L, 4L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 2L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), acap = structure(c(3L,
>> 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L,
>> 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), aexp = structure(c(3L,
>> 3L, 3L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 4L, 5L, 5L, 4L, 5L, 4L, 4L,
>> 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), pcap = structure(c(3L,
>> 3L, 3L, 3L, 3L, 4L, 5L, 4L, 5L, 3L, 4L, 4L, 5L, 4L, 4L, 4L, 4L,
>> 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 4L, 4L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), vexp = structure(c(3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L,
>> 3L, 3L,
>> 3L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), lexp = structure(c(4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 1L, 4L, 4L, 4L, 4L, 3L, 3L,
>> 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 3L, 4L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), modp = structure(c(4L,
>> 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 5L, 5L, 5L, 5L, 4L, 4L, 3L, 3L, 4L, 3L, 4L, 4L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), tool = structure(c(3L,
>> 3L, 3L, 3L, 3L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 3L, 3L, 1L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), sced = structure(c(2L,
>> 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), equivphyskloc = c(0.025534,
>> 0.006945, 0.008988, 0.002655, 0.067102, 0.006741, 0.019508, 0.005209,
>> 0.101215, 0.010622, 0.101215, 0.019508, 0.152283, 0.031253, 0.014401,
>> 0.014401, 0.037892, 0.009294, 0.015729, 0.012154, 0.032377, 0.035339,
>> 0.004698, 0.009703, 0.00572, 0.012358, 0.091002, 0.007252, 0.180778,
>> 0.307527), act_effort = c(117.6, 31.2, 25.2, 10.8, 352.8, 72,
>> 72, 24, 360, 36, 215, 48, 324, 60, 48, 90, 210, 48, 82, 62, 170,
>> 192, 18, 50, 42, 60, 444, 42, 1248, 2400)), row.names = c(1L,
>> 3L, 5L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 17L, 18L, 19L,
>> 24L, 25L, 26L, 29L, 30L, 31L, 32L, 33L, 34L, 36L, 37L, 38L, 39L,
>> 41L, 42L), class = "data.frame")
>>
>>
>>
>> On Thu, Jun 30, 2022 at 11:28 PM Rui Barradas <ruipbarra...@sapo.pt> wrote:
>>
>> Hello,
>>
>> Please post data in dput format; without it, it's difficult to tell.
>> If I substitute
>>
>> mpg for act_effort
>> mtcars for tr
>>
>> keeping everything else, I don't get any errors.
>> And the error message says clearly that the error is in tr (data).
>>
>> Can you post the output of dput(head(tr, 30))?
>>
>> Rui Barradas
>>
>>
>> At 19:32 on 30/06/2022, Neha gupta wrote:
>> > I posted it for the second time as I didn't get any response from group
>> > members. I am not sure if there is some problem with the question.
>> >
>> >
>> >
>> > I cannot run the "ranger" model with caret. I am only using the farff and
>> > caret libraries and the following code:
>> >
>> > boot <- trainControl(method = "cv", number = 10)
>> >
>> > c1 <- train(act_effort ~ ., data = tr,
>> >             method = "ranger",
>> >             tuneLength = 5,
>> >             metric = "MAE",
>> >             preProc = c("center", "scale", "nzv"),
>> >             trControl = boot)
>> >
>> > The error I get is the following message, repeated until I interrupt
>> > it.
>> >
>> > Error: mtry can not be larger than number of variables in data. Ranger will
>> > EXIT now.
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.