Re: [R] time mathematics
Well, this is not an elegant (or robust) solution, but it works for the example you give, at least:

starttime <- as.POSIXct("2018-11-20 23:01:18")  # Just pick a random date
format(starttime + 0:4, format = "%T")

There are probably better ways. :)

-- Regards, Bjørn-Helge Mevik

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
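For the archives, a minimal base-R check of the idea above. The date is arbitrary (as in the post); the time zone is pinned to UTC here only so the printed times are reproducible:

```r
# Adding an integer to a POSIXct shifts it by that many seconds.
starttime <- as.POSIXct("2018-11-20 23:01:18", tz = "UTC")
(out <- format(starttime + 0:4, format = "%T"))
# "23:01:18" "23:01:19" "23:01:20" "23:01:21" "23:01:22"
```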
Re: [R] security using R at work
The section I'm working in runs a facility for sensitive research data (https://www.uio.no/english/services/it/research/sensitive-data/). Our users use R (along with other analysis software). We don't consider R safe or unsafe, but have designed the services so that it should not be possible (or at least very difficult) for sensitive information to leak out of the network. I would say that your best bet is to expect all analysis software to have security holes or be compromised, and design your setup/network around that assumption.

-- Regards, Bjørn-Helge Mevik
Re: [R] PLS in R
Margarida Soares <margaridapmsoa...@gmail.com> writes:

> Thanks for your reply on pls!
> I have tried to do a correlation plot but I get the following group of
> graphs. Any way of having only 1 plot?
> This is my script:
>
> corrplot(plsrcue1, comp = 1:4, radii = c(sqrt(1/2), 1), identify = FALSE, type = "p")

"Correlation loadings" are the correlations between each variable and the selected components, so I don't see how you can have more than two sets of correlations (i.e., more than two components) in a single scatter plot. You could have three sets in a 3D plot, of course, but that you would have to implement yourself. :)

-- Regards, Bjørn-Helge Mevik
Re: [R] PLS in R
Margarida Soares <margaridapmsoa...@gmail.com> writes:

> library(pls)
> plsrcue <- plsr(cue ~ fb + cn + n + ph + fung + bact + resp, data = cue, ncomp = 7,
>                 na.action = NULL, method = "kernelpls", scale = FALSE, validation = "LOO",
>                 model = TRUE, x = FALSE, y = FALSE)
> summary(plsrcue)
>
> and I got this output, where I think I can choose the number of components
> based on RMSEP, but how do I choose it?

There are no "hard" rules for how to choose the number of components, but one rule of thumb is to stop when the RMSEP starts to flatten out, or to increase. In your case, I would say 4 components. An easier way to look at the RMSEP values is with plot(RMSEP(plsrcue)).

(There are some algorithms that can suggest the number of components for you. Two of those are implemented in the development version of the pls package (hopefully released during Christmas). You can check it out here if you wish: https://github.com/bhmevik/pls . Disclaimer: I am the maintainer of the package. :) )

> - and also, how to proceed from here?

That depends on what you want to do/learn about the system you are modelling. Many researchers in fields like spectroscopy or chemometrics (where PLSR originated) plot loadings and scores and infer things graphically.

> - and how to make a correlation plot?

corrplot(plsrcue) - at least if you mean a correlation loadings plot. See ?corrplot for details.

> - what to do with the values, coefficients that I get in the Environment (pls values)

Again, that depends on what you want with your model.

-- Regards, Bjørn-Helge Mevik
Re: [R] pls package - validation
Bert Gunter <bgunter.4...@gmail.com> writes:

> However, if I understand correctly, using pls or anything else to try
> to fit (some combination of) 501 variables to 16 data points -- and
> then crossvalidate with 6 data points -- is utter nonsense. You just
> have a fancy random number generator!

That is incorrect. PLSR and other dimension-reducing regression methods can handle more prediction variables than samples perfectly fine -- many of them were created for that purpose. As for the original question: typically this happens when there is no (or very little) correlation between the response and the prediction variables. (Or as they tend to say in chemometrics: You don't have a model.)

> As I said, I think it better to follow up or complain about me on
> stackexchange rather than here.

Sorry, I read this too late. :)

-- Regards, Bjørn-Helge Mevik
[R] [R-pkgs] pls 2.6-0 released
Version 2.6-0 of the pls package has been released and will be available at your local CRAN mirror shortly. The pls package implements Partial Least Squares Regression, Principal Component Regression and Canonical Powered PLS. The major changes in 2.6-0 are:

- It now has a function selectNcomp() for automatically suggesting the optimal number of components for the model. The function implements two different algorithms, and will optionally plot the RMSEP values and number of components.

- A description of selectNcomp() has been added to the vignette.

-- Regards, Bjørn-Helge Mevik

___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] Version 3.2.3: package not available error with https
Loris Bennett <loris.benn...@fu-berlin.de> writes:

> It seems that R needs libcurl 7.28.0, but my platform (Scientific Linux
> 6.7) only provides version 7.19.7.

We got "bit" by this when upgrading to 3.2.2. If you cannot upgrade libcurl on your machine(s), you can put

local({ options(useHTTPS = FALSE) })

in the Rprofile.site file, or your ~/.Rprofile. You still get a warning, but you do get the list of http repositories.

Come to think about it: would it be an idea if R defaulted to useHTTPS = FALSE if capabilities("libcurl") is FALSE?

-- Regards, Bjørn-Helge Mevik
Re: [R] Problems with data structure when using plsr() from package pls
S Ellison <s.elli...@lgcgroup.com> writes:

> Reading ?plsr examples and inspecting the data they use, you need to arrange
> frame1 so that it has the data from n96 included as columns with names of the
> form "n96.xxx" where xxx can be numbers, names etc.

No, you do not. :) plsr() is happy with a data frame where n96 is a single variable consisting of a matrix. And this is the recommended way for matrices with a lot of columns. Which is what you get with

frame1 <- data.frame(gushVM, n96 = I(n96))

if n96 is a matrix, or

frame1 <- data.frame(gushVM, n96 = I(as.matrix(n96)))

if it is a data.frame.

> If n96 is a data frame, try something like
> names(n96) <- paste("n96", 1:96)
> frame1 <- cbind(gushVM, n96)
>
> pls1 <- plsr(gushVM ~ n96, data = frame1)

Have you actually tried this? It doesn't work. For instance:

> gushVM <- 1:5
> n96 <- data.frame(a = 1:5, b = 2:6)
> names(n96) <- paste("n96", 1:2)
> n96
  n96 1 n96 2
1     1     2
2     2     3
3     3     4
4     4     5
5     5     6
> frame1 <- cbind(gushVM, n96)
> frame1
  gushVM n96 1 n96 2
1      1     1     2
2      2     2     3
3      3     3     4
4      4     4     5
5      5     5     6
> dim(frame1)
[1] 5 3
> pls1 <- plsr(gushVM ~ n96, data = frame1)
Error in model.frame.default(formula = gushVM ~ n96, data = frame1) :
  invalid type (list) for variable 'n96'

The reason is that frame1 does _not_ contain a variable called 'n96', so plsr() (or actually model.frame.default()) searches in the global work space, where it finds a _data.frame_ n96. A data.frame is a list. Hence the error message.

> If n96 is a matrix,
>
> frame1 <- data.frame(gushVM, n96 = n96)
>
> should also give you a data frame with names of the right format.

It does not:

> n96 <- as.matrix(n96)
> frame1 <- data.frame(gushVM, n96 = n96)
> frame1
  gushVM n96.n96.1 n96.n96.2
1      1         1         2
2      2         2         3
3      3         3         4
4      4         4         5
5      5         5         6
> dim(frame1)
[1] 5 3
> names(frame1)
[1] "gushVM"    "n96.n96.1" "n96.n96.2"

So the data frame still does not have any variable named 'n96'.
The only reason

> pls1 <- plsr(gushVM ~ n96, data = frame1)

seems to work, is that the 'n96' variable it now finds in the global environment happens to be a matrix:

> class(n96)
[1] "matrix"

If that wasn't there, you would get an error:

> rm(n96)
> pls1 <- plsr(gushVM ~ n96, data = frame1)
Error in eval(expr, envir, enclos) : object 'n96' not found

> I() wrapped round a matrix or data frame does nothing like what is needed if
> you include it in a data frame construction, so either things have changed
> since the tutorial was written, or the authors were not handling a matrix or
> data frame with I().

Yes it does. :) Nothing (substantial) has changed, and we did/do handle matrices with I():

> n96 <- matrix(1:10, ncol = 2)
> n96
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> frame1 <- data.frame(gushVM, I(n96))
> frame1
  gushVM n96.1 n96.2
1      1     1     6
2      2     2     7
3      3     3     8
4      4     4     9
5      5     5    10
> dim(frame1)
[1] 5 2
> names(frame1)
[1] "gushVM" "n96"
> rm(n96)
> pls1 <- plsr(gushVM ~ n96, data = frame1)
> pls1
Partial least squares regression, fitted with the kernel algorithm.
Call:
plsr(formula = gushVM ~ n96, data = frame1)

-- Regards, Bjørn-Helge Mevik
Re: [R] Problems with data structure when using plsr() from package pls
Jeff Newmiller <jdnew...@dcn.davis.ca.us> writes:

> Using I() in the data.frame seems ill-advised to me. You complain about 96
> variables but from reading your explanation that seems to be what your data
> are.

In PLSR, it is common to regress a variable against matrices with very many columns, often several thousands. Using a data frame with one predictor variable for each column is going to make the formula handling very slow. And if you have several such predictor matrices, it is very practical to keep them as single variables in the data frame, so you can easily select/deselect which groups of variables you want in the model.

-- Regards, Bjørn-Helge Mevik
Re: [R] Problems with data structure when using plsr() from package pls
CG Pettersson <cg.petters...@lantmannen.com> writes:

>> frame1 <- data.frame(gushVM, I(n96))
[...]
>> pls1 <- plsr(gushVM ~ n96, data = frame1)
> Error in model.frame.default(formula = gushVM ~ n96, data = frame1) :
>   invalid type (list) for variable 'n96'

As far as I can remember, you get this error if the n96 object was a data.frame instead of a matrix. Can you check with, e.g.,

> class(n96)

If it says "data.frame", try using I(as.matrix(n96)).

-- Regards, Bjørn-Helge Mevik
[R] Installing R 3.2.2 on machine with old libcurl
We have to install R 3.2.2 on machines whose libcurl is too old to support https when installing packages, etc. When a user tries to use install.packages() (with the default value of the "repos" option), she is presented with a list of https repos, which is not very useful. She also gets an error message:

Error in download.file(url, destfile = f, quiet = TRUE) :
  unsupported URL scheme

We have put

local({ options(useHTTPS = FALSE) })

into the Rprofile.site file, and after that, the user gets a list of http repos, so she will be able to install packages. But the error message is still displayed, which can be confusing. Is there a way around this problem?

Also, perhaps the useHTTPS option should default to FALSE if the libcurl capability is FALSE?

-- Regards, Bjørn-Helge Mevik
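A hedged sketch of the fallback idea: Rprofile.site could test the libcurl capability itself and only disable HTTPS when it is missing. capabilities() is base R; whether the useHTTPS option has any effect depends on the R version (it was used by R 3.2.x and is ignored by later versions):

```r
# Sketch for Rprofile.site: fall back to http repos only when
# this build of R lacks libcurl support.
local({
  if (!isTRUE(capabilities("libcurl")[[1]]))
    options(useHTTPS = FALSE)  # honoured by R 3.2.x; a harmless no-op later
})
```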
[R] [R-pkgs] pls 2.5-0 released
Version 2.5-0 of the pls package has been released. The pls package implements Partial Least Squares Regression, Principal Component Regression and Canonical Powered PLS. The major changes are:

- Cross-validation can now make sure that replicates are kept in the same segment, by the use of a new argument `nrep'. See ?cvsegments for details.

- It now has a vignette.

- It now has a NEWS file that can be accessed by news().

-- Regards, Bjørn-Helge Mevik
Re: [R] A strange problem using pls package
PO SU <rhelpmaill...@163.com> writes:

> suppose data has 20 columns
> traindata <- data[1:10, 1:10]
> testdata <- data[11:15, 1:10]
> pls.fit <- plsr(y ~ x, ncomp = 5, data = traindata, method = "simpls", scale = FALSE, model = TRUE, validation = "CV")
> ok, i get some result, the strange thing happens when i redo the plsr, i mean, i use
> traindata <- data[1:10, 1:20]
> testdata <- data[11:15, 1:20]
> pls.fit <- plsr(y ~ x, ncomp = 5, data = traindata, method = "simpls", scale = FALSE, model = TRUE, validation = "CV")
> I get the same result as the first one!!!

The reason is probably that you ask plsr() to use the column of traindata called x as the predictor. Then it will only use that column, no matter how many columns traindata contains. The usual way of using plsr() is to have a data.frame with a _matrix_ as the predictor column, for instance like this:

mydata <- data.frame(y = some_vector, X = I(some_matrix))
mymodel <- plsr(y ~ X, ..., data = mydata)

If you want to have the predictors as separate vectors, you must name all of them in the formula (y ~ x1 + x2 + x3 + ...), or you can use the following shortcut to regress y on all the remaining columns:

plsr(y ~ ., ..., data = mydata)

-- Regards, Bjørn-Helge Mevik
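The "matrix inside a data frame" idiom is plain base R, so it can be demonstrated without pls itself (here with lm(), which accepts the same kind of formula; the data are made up):

```r
set.seed(1)
X <- matrix(rnorm(50), ncol = 5)       # 10 samples, 5 predictors
y <- rowSums(X) + rnorm(10, sd = 0.1)
mydata <- data.frame(y = y, X = I(X))  # I() keeps X as one matrix variable
length(mydata)                         # 2 variables: y and the matrix X
fit <- lm(y ~ X, data = mydata)        # one coefficient per matrix column
length(coef(fit))                      # 6 = intercept + 5 columns
```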
Re: [R] - PLS-Package - PLSR loadings
Wolfgang Obermeier <wolfgang.oberme...@geo.uni-marburg.de> writes:

> how is it possible that the loadings of the second or even third component
> of a PLS-Analysis show higher values than the first component? Somebody got
> an idea??

The loadings of a PLS regression are simply the coefficients that are multiplied with the X variables to transform X to the latent vectors used in the regression (this is slightly over-simplified). There is no reason why the coefficients of the first component should be larger than the coefficients of other components. (In fact, it is often the case that when one fits too many components (i.e., one starts to model noise), the coefficients of the last components get higher and higher.)

-- Regards, Bjørn-Helge Mevik
Re: [R] help plsr function
annie Zhang <annie.zhang2...@gmail.com> writes:

> ## the predicted scores from the model
> (pred <- predict(data.cpls, n.comp = 1:2, newdata = x.new, type = "scores"))
> ## the predicted scores using x %*% projection
> cbind(x.new.centered %*% data.cpls$projection[, 1], x.new.centered %*% data.cpls$projection[, 2])
>
> Can someone please tell me why the two predicted scores don't match?

If you look at the code that does the prediction:

> pls:::predict.mvr
function (object, newdata, ncomp = 1:object$ncomp, comps,
    type = c("response", "scores"), na.action = na.pass, ...)
{
[...]
    TT <- (newX - rep(object$Xmeans, each = nobs)) %*% object$projection[, comps]

you will see that it subtracts the _old X_ column means from the new X matrix, not the _new X_ column means. So

sweep(x.new, 2, data.cpls$Xmeans, "-") %*% data.cpls$projection[, 1:2]

will reproduce the values from predict().

-- Regards, Bjørn-Helge Mevik
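The centering step itself is ordinary base R. With made-up data, sweep()ing out a stored vector of column means (playing the role of the training means in object$Xmeans) is the same as scale() with those means:

```r
x.new <- matrix(1:12, ncol = 3)   # stand-in for new observations
Xmeans <- c(2, 6, 10)             # stand-in for the stored training means
centered <- sweep(x.new, 2, Xmeans, "-")
# identical (up to attributes) to centering with scale():
ok <- all.equal(centered,
                scale(x.new, center = Xmeans, scale = FALSE),
                check.attributes = FALSE)
```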
Re: [R] Question about R2 in pls package
Euna Jeong <eaje...@gmail.com> writes:

> I have questions about R2 used in pls (or multivariate analysis). Is R2 the
> same as the square of the PCC (Pearson Correlation Coefficient)?

If you read the manual for R2 in the pls package, it will tell you how R2 is calculated there, and that for _training_ data it is indeed PCC^2, but _not_ for cross-validation or test data. IMHO, R^2 only has a meaningful interpretation for training data. For test data or cross-validation, I prefer MSEP or RMSEP.

-- Regards, Bjørn-Helge Mevik
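The training-data identity is easy to check with base R's lm() on toy data: for a least-squares fit with an intercept, the reported R2 equals the squared Pearson correlation between observed and fitted values:

```r
set.seed(42)
x <- rnorm(30)
y <- 2 * x + rnorm(30)
fit <- lm(y ~ x)
# R2 reported by the fit vs. squared correlation of observed and fitted:
r2_same <- all.equal(summary(fit)$r.squared, cor(y, fitted(fit))^2)
```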
Re: [R] Question about the prediction plot in pls package
Euna Jeong <eaje...@gmail.com> writes:

> plot(gas1, ncomp = 2, asp = 1, line = TRUE)

This shows only the cross-validated predictions. If you add the argument which = c("train", "validation") (see ?predplot.mvr), you will get both. However, you will get them in separate panels in the plot. If you wish to have them in the same panel, you will have to add the points yourself. This should work:

plot(gas1, ncomp = 2, asp = 1, line = TRUE)
points(predict(gas1, ncomp = 2) ~ gasoline$octane, col = "red")

-- Regards, Bjørn-Helge Mevik
[R] [R-pkgs] pls 2.4-3 released
Version 2.4-3 of the pls package has been released. Windows and OS X binaries should appear shortly. The pls package implements Partial Least Squares Regression, Principal Component Regression and Canonical Powered PLS Regression. The major changes are:

- Can now perform cross-validation in parallel, using the facilities of the 'parallel' package. See ?pls.options and the examples in ?mvr for details. (Note: in order to use MPI, packages 'snow' and 'Rmpi' must be installed, because 'parallel' relies on them for MPI parallelisation.)

Other user-visible changes:

- In order to comply with current CRAN submission policies, pls.options() no longer stores the modified option list in the global environment. This has the effect that the options will have to be set every time R is started, even if the work space was saved and loaded.

-- Regards, Bjørn-Helge Mevik
Re: [R] data structure for plsr
Emma Jones <evjo...@ualberta.ca> writes:

> My current data structure consists of a .csv file read into R containing 15
> columns (a charcoal dilution series going from 100% to 0%) and 1050 rows of
> absorbance data from 400 nm to 2500 nm at 2 nm intervals. I think I need to
> transpose the data such that the specific wavelengths become my columns and
> dilutions are defined in rows,

Yes, you need to transpose the data so a column corresponds to a variable (response or predictor).

> Should I (and how do I) make my absorbance data into individual matrices
> that read into a data frame with only two columns

It is best to put all predictors (wavelengths) together in one matrix, yes. The same for the responses, if you have more than one response column. This is untested, so there might be errors. Assuming that your spectroscopic data is read into a data frame called origspec:

## This should create a matrix with the wavelengths as columns:
spec <- t(as.matrix(origspec))

I don't know what your response is, so I'm just assuming it is in a vector called resp.

## This would create a data frame suitable for plsr():
mydata <- data.frame(resp = resp, spec = I(spec))

Then you can analyse like this:

plsr(resp ~ spec, data = mydata, ...)

-- Bjørn-Helge Mevik
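A toy version of the reshaping above (hypothetical numbers; as in the poster's file, origspec starts with one column per sample and one row per wavelength):

```r
origspec <- data.frame(matrix(1:20, nrow = 5))  # 5 "wavelengths" x 4 "samples"
spec <- t(as.matrix(origspec))                  # now 4 samples x 5 wavelengths
resp <- c(100, 75, 50, 25)                      # made-up dilution levels
mydata <- data.frame(resp = resp, spec = I(spec))
dim(mydata$spec)                                # 4 rows, 5 columns
```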
[R] --enable-R-shlib and external BLAS/LAPACK libraries
A couple of years ago I noted that using the configure switch --enable-R-shlib when building R made configure ignore any specified external LAPACK library (I cannot recall if the BLAS specification was also ignored) and use the internal one instead. I asked why, and was told it was intentional.

Now, with R 2.15.1, I see that it at least appears that this is no longer the case. I've run configure like this:

fast="-ip -O3 -opt-mem-layout-trans=3 -xHost -mavx"
export CC=icc
export CFLAGS="$fast -wd188 -fp-model precise"
export F77=ifort
export FFLAGS="$fast -fp-model precise"
export CXX=icpc
export CXXFLAGS="$fast -fp-model precise"
export FC=ifort
export FCFLAGS="$fast -fp-model precise"
./configure --with-blas='-mkl=parallel' --with-lapack --enable-R-shlib

(in addition, paths to the Intel compilers and libraries are set up). The output from configure says:

  Interfaces supported:      X11, tcltk
  External libraries:        readline, BLAS(generic), LAPACK(in blas)
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, R profiling, Java

After make install, we get a libR.so linked to MKL libraries (see below for details). Am I correct in assuming that this R will use the Intel MKL libraries for BLAS and LAPACK routines? (That would be very nice, because we want to use the fast libraries, but some of our users need to have libR.so, so up to now, we've had to build two versions of R.)
# ldd libR.so
        linux-vdso.so.1 => (0x7ff52bcf8000)
        libifport.so.5 => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libifport.so.5 (0x7ff52b47d000)
        libifcore.so.5 => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libifcore.so.5 (0x7ff52b238000)
        libimf.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libimf.so (0x7ff52ae6d000)
        libsvml.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libsvml.so (0x7ff52a6f3000)
        libm.so.6 => /lib64/libm.so.6 (0x7ff52a45a000)
        libirc.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libirc.so (0x7ff52a30b000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x7ff52a0ee000)
        libdl.so.2 => /lib64/libdl.so.2 (0x7ff529ee9000)
        libreadline.so.6 => /lib64/libreadline.so.6 (0x7ff529ca6000)
        librt.so.1 => /lib64/librt.so.1 (0x7ff529a9e000)
        libmkl_intel_lp64.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/mkl/lib/intel64/libmkl_intel_lp64.so (0x7ff5292b7000)
        libmkl_intel_thread.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/mkl/lib/intel64/libmkl_intel_thread.so (0x7ff528238000)
        libmkl_core.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/mkl/lib/intel64/libmkl_core.so (0x7ff5271c2000)
        libiomp5.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libiomp5.so (0x7ff526ecf000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7ff526cb9000)
        libintlc.so.5 => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libintlc.so.5 (0x7ff526b6a000)
        libc.so.6 => /lib64/libc.so.6 (0x7ff5267d7000)
        /lib64/ld-linux-x86-64.so.2 (0x00344520)
        libtinfo.so.5 => /lib64/libtinfo.so.5 (0x7ff5265b6000)

-- Regards, Bjørn-Helge Mevik
Re: [R] PLSR AND PCR ISSUES
You give us far too little information about what you do, what you want and what happens. Given that, the only help one can give is: Read the documentation. :)

-- Regards, Bjørn-Helge Mevik, dr. scient, Research Computing Services, University of Oslo
Re: [R] Discrepancies in the estimates of Partial least square (PLS) in SAS and R
rakeshnb <rakeshn...@gmail.com> writes:

> I am using pls package but how is scaling done in R?

That is documented in the help pages:

library(pls)
?plsr

[snip]

   scale: numeric vector, or logical. If numeric vector, X is scaled by
          dividing each variable with the corresponding element of
          'scale'. If 'scale' is 'TRUE', X is scaled by dividing each
          variable by its sample standard deviation. If cross-validation
          is selected, scaling by the standard deviation is done for
          every segment.

When in doubt, read the documentation. :)

-- Regards, Bjørn-Helge Mevik
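The dividing step described in that excerpt can be reproduced in base R on made-up data (note this shows only the scaling; by default pls also mean-centers X, which is a separate step):

```r
set.seed(7)
X <- matrix(rnorm(20), ncol = 4)
sds <- apply(X, 2, sd)        # sample standard deviations, column-wise
Xs <- sweep(X, 2, sds, "/")   # each column divided by its sd
ok <- all.equal(Xs,
                scale(X, center = FALSE, scale = sds),
                check.attributes = FALSE)
```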
Re: [R] Discrepancies in the estimates of Partial least square (PLS) in SAS and R
rakeshnb <rakeshn...@gmail.com> writes:

> I have been using R and SAS for the past 6 months and I found an interesting
> thing while doing PLS in R and SAS: when we use the NO SCALE option in SAS
> and scale=FALSE in R, the estimates match, but if we use the scaling option
> in SAS and R the estimates differ to a greater extent. You can try with any
> data set; you will get very different estimates while using the scaling
> option. Can anyone help me with this issue?

My guess is that they use different scalings, which of course will give different results. However, since you don't say anything about which R package you use for PLSR (and since I don't have access to SAS), I can only guess. :)

-- Regards, Bjørn-Helge Mevik
Re: [R] PLS Error message
Thomas Möckel <thomas.moc...@nateko.lu.se> writes:

> I work with hyperspectral remote sensing data and I try to build a pls model
> with this data. I already built the model but if I try to calculate the RMSEP
> and R2 with a test data set I get the following error message:
> Error: variable 'subX' was fitted with type "nmatrix.501" but type "nmatrix.73" was supplied

Since you don't show what commands you used, this is guesswork, but my guess is that you used

yourmodel <- plsr(yourresponse ~ subX, data = yourdata)
R2(yourmodel, newdata = yournewdata)

and that yourdata$subX contains 501 columns, but yournewdata$subX only contains 73 columns. You must supply a newdata with the same number of columns as in the modelling data.

-- Regards, Bjørn-Helge Mevik
Re: [R] PLS predict
Thomas Möckel <thomas.moc...@nateko.lu.se> writes:

> I have a question about understanding PLS. If I use the predict function of
> R then it seems to me the function only uses the last latent variable to
> model new Y values. But should the function not use all latent variables to
> model new Ys?

It should, and it definitely does. The _effect_ of each latent variable can vary a lot, though, but even then, the first ones usually have the greatest effect. Again, since you don't show what you did, it is hard to be more specific.

-- B/H
Re: [R] Dataframes in PLS package
R. Michael Weylandt <michael.weyla...@gmail.com> writes:

> Without that though, I'm not sure you need the I(as.matrix(dep)) and
> I(as.matrix(ind)), I would imagine (untested) that
> eqn <- data.frame(depy = dep, indx = ind)
> would work (probably better as I() changes things just a little).

The I() must be there to prevent data.frame() from separating the columns of the matrices into individual variables in the data frame. Without I() there will be no variables depy and indx in the data frame. Try this:

> A <- matrix(1:4, ncol = 2)
> B <- matrix(2:5, ncol = 2)
> A
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> B
     [,1] [,2]
[1,]    2    4
[2,]    3    5
> ## With I():
> d1 <- data.frame(A = I(A), B = I(B))
> d1
  A.1 A.2 B.1 B.2
1   1   3   2   4
2   2   4   3   5
> names(d1)
[1] "A" "B"
> d1$A
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> ## Without I():
> d2 <- data.frame(A = A, B = B)
> d2
  A.1 A.2 B.1 B.2
1   1   3   2   4
2   2   4   3   5
> names(d2)
[1] "A.1" "A.2" "B.1" "B.2"
> d2$A
NULL
> d2$A.1
[1] 1 2

-- Regards, Bjørn-Helge Mevik
Re: [R] Dataframes in PLS package
westland <westl...@uic.edu> writes:

> Here is the dput(eqn) and showData for the file 'eqn':
[...]
> showData(eqn)
>     depy.w depy.h depy.d depy.s indx.a indx.i indx.r indx.x
>  63     55      1      0     44  37200      4      0
> 145     52      1      1     33  69300      4      1
> 104     32      0      1     68  56900      3      1
> 109     69      1      1     94  44300      6      1
> 221     61      0      1     72  79800      6      0
> 110     40      1      1     48  17600      5      1
> 194     41      0      0     85  58100      4      0
> 120     76      1      1     19  76700      3      0
> 210     61      0      0     41  37600      1      0
... etc.

Okay, let me guess: you took the data in the file pls, created a data frame eqn with two matrices in it, then used write.table() to write eqn to a file, and then read it back with read.table(). If that is so, the problem you have is that write.table() will separate the columns of the matrices into separate columns in the file (it really has no other choice), and then read.table() will of course read those in as separate columns again. You have two solutions:

1) Repeat the commands to recreate the eqn data frame as a data frame with matrices, after reading it in from file:

eqn <- data.frame(depy = I(as.matrix(eqn[, 1:4])), indx = I(as.matrix(eqn[, 5:8])))

2) Save the data frame in an .RData file with save() instead of as a text file with write.table(). That will keep the structure of the variable.

> Initially, I had input a file 'pls' with the script:
> dep <- pls[, 1:4]
> ind <- pls[, 5:8]
> eqn <- data.frame(depy = dep, indx = ind)
> apls <- plsr(depy ~ indx, data = eqn)
> and this gives me
> [7] ERROR: object 'depy' not found

because you are missing the I(as.matrix()).

-- Regards, Bjørn-Helge Mevik
Re: [R] Dataframes in PLS package
westland westl...@uic.edu writes:

> R still doesn't seem to recognize the data.frame ... I get a
> [6] ERROR: object 'depy.w' not found
> from the following code:
> dep <- pls[, 1:4]
> ind <- pls[, 5:8]
> eqn <- data.frame(depy = dep, indx = ind)
> apls <- plsr(depy.w + depy.h + depy.d + depy.s ~
>              indx.a + indx.i + indx.r + indx.x, data = eqn)
> BUT I DID try to cbind() these after add-concatenating them (not sure
> exactly what I am doing) like so ...
> apls <- plsr(cbind(depy.w, depy.h, depy.d, depy.s) ~
>              cbind(indx.a, indx.i, indx.r, indx.x), data = eqn)

For creating multi-column responses on the fly, using cbind() like this
works. However, you don't need that for the predictors; there you can
get by with just using '+'.

If you only have a few predictors/responses, this will work okay, but
if you have many, it will take a lot of typing, and make the formula
handling part of plsr() take _ages_. Then using matrices is easier and
faster.

-- 
Bjørn-Helge Mevik
Re: [R] Dataframes in PLS package
westland westl...@uic.edu writes:

> Here is what I have done: I read in a 1 x 8 table of data, and assign
> the first four columns to matrix A and the second four to matrix B
> pls <- read.table("C:/Users/Chris/Desktop/SEM Book/SEM Stat Example/Simple Header Data for SEM.csv",
>                   header = TRUE, sep = ",", na.strings = "NA",
>                   dec = ".", strip.white = TRUE)

The problem is here:

> A <- c(pls[1], pls[2], pls[3], pls[4])
> B <- c(pls[5], pls[6], pls[7], pls[8])

This creates lists A and B, not data frames. Either use cbind() instead
of c(), or simply say

A <- pls[, 1:4]
B <- pls[, 5:8]

Then the rest should work.

Btw. it is probably a good idea to avoid single-character names for
variables, especially 'c' and 'C', because they are names of functions
in R.

-- 
Regards,
Bjørn-Helge Mevik
[R] [R-pkgs] pls 2.3.0 released
Version 2.3.0 of the pls package has been released. The pls package
implements Partial Least Squares Regression and Principal Component
Regression. The major changes are:

- New analysis method Canonical Powered PLS (CPPLS) implemented. See
  ?cppls.fit.
- coefplot() can now plot whiskers at +/- 1 SE (since 2.2.0). See
  ?coefplot.
- The package now has a name space (since 2.2.0).

-- 
Regards,
Bjørn-Helge Mevik

___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] question about plsr() results
Vytautas Rakevičius vytautas1...@yahoo.com writes:

> But still I have a question about results interpretation. In the end
> I want to construct a prediction function of the form:
> Y = a1*x1 + a2*x2

The predict() function does the prediction for you. If you want to
construct the prediction _equation_, you can extract the coefficients
from the model with

coef(yourmodel, ncomp = thenumberofcomponents, intercept = TRUE)

See ?coef.mvr for details.

> The documentation does not describe this.

The pls package is designed to work as much as possible like the lm()
function and its methods and helpers. So read any introduction to
linear models in R, and you will come a long way. There is also a paper
in JSS about the pls package: http://www.jstatsoft.org/v18/i02/

-- 
Cheers,
Bjørn-Helge Mevik
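As a concrete sketch (using the yarn data shipped with pls; the model
and component count here are arbitrary illustrations, not part of the
original question):

```r
library(pls)

data(yarn)
m <- plsr(density ~ NIR, ncomp = 3, data = yarn)

# Coefficients of the 3-component model, including the intercept.
b <- coef(m, ncomp = 3, intercept = TRUE)

a0 <- b[1]    # intercept
a  <- b[-1]   # one slope per predictor: Y = a0 + a1*x1 + a2*x2 + ...

# Sanity check: the explicit equation reproduces predict().
manual <- a0 + as.vector(yarn$NIR %*% a)
all.equal(manual, drop(predict(m, ncomp = 3)), check.attributes = FALSE)
```

The same pattern works for any mvr model; only the data set and ncomp
change.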
Re: [R] plsr how to return my formula
Try reading the pls package article, available here:
http://www.jstatsoft.org/v18/i02/

-- 
Cheers,
Bjørn-Helge Mevik
Re: [R] use of segments in PLS
arunkumar akpbond...@gmail.com writes:

> How to use the segments in the PLS?
> fit1 <- mvr(formula = Y ~ X1 + X2 + X3 + X4 + X5 + ... + X27,
>             data = Dataset, comp = 5, segment = 7)
> here when i use segments, the error was like this
> Error in mvrCv(X, Y, ncomp, method = method, scale = sdscale, ...) :
>   argument 7 matches multiple formal arguments

This cannot be true. mvr() does not call mvrCv() unless you give it the
argument validation = "CV" or validation = "LOO".

Anyway, the argument is 'segments', not 'segment', which -- as the
error message says -- matches multiple formal arguments, in this case
also 'segment.type'.

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] R square and F - stats in PLS
arunkumar akpbond...@gmail.com writes:

> In the lm function, summary(lmobject) gives us the adjusted R square
> and F statistics. Do we have something similar in the pls package,
> and how do we get it?

No. Both of these require theory about the model that doesn't exist for
PLSR. (I should note that a couple of generalisations of the degrees of
freedom to general regression models have been published, and these
could be used to calculate an adjusted R^2. However, they have not been
implemented in the pls package.)

It seems you would like to use PLSR the way you use OLS, with classical
hypothesis tests and performance statistics. This is not how PLSR is
usually applied, and there are few such tools. The traditional/typical
focus amongst PLSR practitioners is much more on prediction performance
(RMSEP) and interpretation by plotting scores and loadings.

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] getting p-value and standard error in PLS
arunkumar akpbond...@gmail.com writes:

> How to get the p-value and the standard error in PLS?

There is (to my knowledge) no theory able to calculate p-values for the
regression coefficients in PLS regression. Most practitioners use
cross-validation to estimate the Root Mean Squared Error of Prediction
(RMSEP) and use that as a measure of the quality of the fit. PLS
regression is typically used when you have many (hundreds, thousands,
tens of thousands) of predictors, where individual p-values are not
very useful.

The pls package does implement the jackknife to estimate the
variance/standard error of the regression coefficients. There is even a
function to calculate p-values from that, but please _do_ read the
warning in the documentation: the distribution of the t values used in
the test is _unknown_. See the example in ?jack.test for how to use the
jackknife.

> I have used the following function to calculate PLS
> fit1 <- mvr(formula = Y ~ X1 + X2 + X3 + X4, data = Dataset, comp = 4)

From a previous message on this list, I see that each of these
predictor terms (X1, ...) is a vector. Thus you have only 4 predictor
variables, so it would probably be better to use Ordinary Least Squares
(OLS) regression (the lm() function in R). There you get p-values
automatically. Furthermore, a PLS regression with the same number of
components as predictor variables is equivalent to OLS, so there seems
to be no reason to use PLS at all in your case.

-- 
Cheers,
Bjørn-Helge Mevik
Re: [R] Help with plotting plsr loadings
Amit Patel amitrh...@yahoo.co.uk writes:

> plot(BHPLS1, "loadings", comps = 1:2, legendpos = "topleft",
>      labels = "numbers", xlab = "nm")
> Error in loadingplot.default(x, ...) :
>   Could not convert variable names to numbers.
> str(BHPLS1_Loadings)
>  loadings [1:8892, 1:60] -0.00717 0.00414 0.02611 0.00468 -0.00676 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:8892] "PCIList1" "PCIList2" "PCIList3" "PCIList4" ...
>   ..$ : chr [1:60] "Comp 1" "Comp 2" "Comp 3" "Comp 4" ...
>  - attr(*, "explvar")= Named num [1:60] 2.67 4.14 4.41 3.55 2.59 ...
>   ..- attr(*, "names")= chr [1:60] "Comp 1" "Comp 2" "Comp 3" "Comp 4" ...
> Can anyone see the problem??

By using labels = "numbers", you are asking the plot function to
convert the names "PCIList1", "PCIList2", "PCIList3", "PCIList4", ...
to numbers. It doesn't know how to do that. (See ?loadingplot for the
details.)

Your options are using labels = "names", providing your own labels, not
using the 'labels' argument, or converting the names manually.

-- 
Bjørn-Helge Mevik
Re: [R] Help with PLSR with jack knife
Amit Patel amitrh...@yahoo.co.uk writes:

> BHPLS1 <- plsr(GroupingList ~ PCIList, ncomp = 10, data = PLSdata,
>                validation = "LOO")
> and
> BHPLS1 <- plsr(GroupingList ~ PCIList, ncomp = 10, data = PLSdata,
>                validation = "CV")
> [...]
> Now I am unsure of how to utilise these to identify the significant
> variables.

You can use the jackknife built into plsr() to get an indication of
significant variables, by adding the argument jackknife = TRUE to the
plsr() call. Use jack.test(BHPLS1) to do the test. But _PLEASE_ do read
the Warning section in ?jack.test!

-- 
Regards,
Bjørn-Helge Mevik
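A minimal sketch of the workflow, using the yarn data from pls in place
of the poster's data (data set and component count are illustrative
assumptions only):

```r
library(pls)

data(yarn)
# jackknife = TRUE only has an effect when cross-validation is used.
m <- plsr(density ~ NIR, ncomp = 4, data = yarn,
          validation = "LOO", jackknife = TRUE)

jt <- jack.test(m, ncomp = 4)
# jt holds per-coefficient jackknife standard errors, t-values and
# p-values -- remember the Warning in ?jack.test: the distribution of
# the t-values is unknown, so treat these only as indications.
head(jt$pvalues)
```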
Re: [R] help with PLSR Loadings
Amit Patel amitrh...@yahoo.co.uk writes:

> x <- loadings(BHPLS1)
> my loadings contain variable names rather than numbers.

No, they don't.

> str(x)
>  loadings [1:94727, 1:10] -0.00113 -0.03001 -0.00059 -0.00734 -0.02969 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:94727] "PCIList1" "PCIList2" "PCIList3" "PCIList4" ...
>   ..$ : chr [1:10] "Comp 1" "Comp 2" "Comp 3" "Comp 4" ...
>  - attr(*, "explvar")= Named num [1:10] 14.57 6.62 7.59 5.91 3.26 ...
>   ..- attr(*, "names")= chr [1:10] "Comp 1" "Comp 2" "Comp 3" "Comp 4" ...

Look at the first line of output. These are the values, and they are
numeric (it is a matrix). The other lines are attributes of the matrix.

> plot(BHPLS1, "loadings", comps = 1:2, legendpos = "topleft",
>      labels = "numbers", xlab = "nm")
> Error in loadingplot.default(x, ...) :
>   Could not convert variable names to numbers.

This says that loadingplot.default could not convert variable _names_
to numbers. That is not surprising, since the variable names are
"PCIList1", "PCIList2", etc., and the documentation for loadingplot
says:

  with "numbers", the variable names are converted to numbers, if
  possible. Variable names of the forms "number" or "number text"
  (where the space is optional), are handled.

So don't ask the plot function to use numbers as labels. Use e.g.
names instead: labels = "names".

Tip: It is always a good idea to read the output and error messages
very carefully.

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] Fw: Help with PLSR
Amit Patel amitrh...@yahoo.co.uk writes:

> str(FullDataListTrans)
>  num [1:40, 1:94727] 42 40.9 65 56 61.7 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:40] "X" "X.1" "X.12" "X.13" ...
>   ..$ : NULL
> I have also created a vector GroupingList which gives the group names
> for each respective sample (row).
> GroupingList
>  [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4
> [39] 4 4
> str(GroupingList)
>  int [1:40] 1 1 1 1 1 1 1 1 1 1 ...
> I am now stuck while conducting the plsr. I have tried various methods
> of creating structured lists etc. and have got nowhere. I have also
> tried many incarnations of
> BHPLS1 <- plsr(GroupingList ~ PCIList, ncomp = FeaturePresenceExpected[1],
>                data = FullDataListTrans, validation = "LOO")
> Where am I going wrong?

You are not telling us what happens (or how you tried to make
structured lists), but from your description of the data,
FullDataListTrans is a matrix with only the predictor variables, and
GroupingList is a vector with the response. The data argument of plsr()
(as of most modelling functions in R) expects a data.frame with both
response and predictor variables. Try this:

mydata <- data.frame(GroupingList = GroupingList,
                     PCIList = I(FullDataListTrans))

(The I() is to prevent R from making the columns in FullDataListTrans
separate variables in the data frame.)

BHPLS1 <- plsr(GroupingList ~ PCIList, ncomp = FeaturePresenceExpected[1],
               data = mydata, validation = "LOO")

-- 
Regards,
Bjørn-Helge Mevik
[R] R at Supercomputing 10
SC10 Disruptive Technology Preview: The First Cloud Portal to "R" and Beyond
http://www.hpcinthecloud.com/features/SC10-Disruptive-Technology-Preview--The-First-Cloud-Portal-to-R-and-Beyond-105776458.html?viewAll=y

(My apologies if this has been posted already.)

-- 
Bjørn-Helge Mevik
[R] Problems using external BLAS
I have problems building R 2.11.1 with an external BLAS. I've tried
several libraries:

# ACML:
export LD_LIBRARY_PATH=/site/VERSIONS/acml-3.6.0/gfortran64_int64/lib
BLAS='--with-blas=-L/site/VERSIONS/acml-3.6.0/gfortran64_int64/lib -lacml'
LAPACK='--with-lapack'

# MKL 11:
BLAS='--with-blas=-L/site/VERSIONS/intel-11.1/mkl/lib/em64t -lmkl_gf_lp64 -lmkl_sequential -lmkl_lapack -lmkl_core'
LAPACK='--with-lapack'

# MKL 8.1, trad. way:
BLAS='--with-blas=-L/site/intel/cmkl/8.1/lib/em64t -lmkl -lvml -lguide -lpthread'
LAPACK='--with-lapack'

I configure R like this:

export CFLAGS='-O3 -mtune=opteron'
export FFLAGS='-O3 -mtune=opteron'
export CXXFLAGS='-O3 -mtune=opteron'
export FCFLAGS='-O3 -mtune=opteron'
./configure --prefix=/site/VERSIONS/R-2.11.1 \
    $BLAS $LAPACK \
    --enable-R-shlib

In all cases, I get

configure:29120: checking whether double complex BLAS can be used
configure:29206: result: no

The conftestf.f and conftest.c seem to compile fine, but the exit
status from conftest in line 29181 of configure is nonzero.

This is on a Quad-Core AMD Opteron node running CentOS 5.2, with gcc
and gfortran version 4.1.2 20071124 (Red Hat 4.1.2-42).

(I have also tried without the *FLAGS variables, and without
--with-lapack. The result is the same.)

(We have successfully built older versions of R with MKL 8.1 earlier,
but with Intel compilers v. 10.1, using

./configure --prefix=/site/VERSIONS/R-$version \
    --with-blas='-L/site/intel/cmkl/8.1/lib/em64t -lmkl -lvml -lguide -lpthread' \
    --with-lapack='-L/site/intel/cmkl/8.1/lib/em64t -lmkl_lapack64 -lmkl' \
    --enable-R-shlib

but we wanted to switch to gcc because not all R packages compile with
icc.)

Does anyone have any idea about what could be wrong?

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] R2 function from PLS to use a model on test data
Addi Wei addi...@gmail.com writes:

> Hello, I am having some trouble using a model I created from plsr (on
> the training data) to analyze each individual R^2 of the 10
> components against the test data. For example:
> mice1 <- plsr(response ~ factors, ncomp = 10, data = MiceTrain)
> R2(mice1)  ## this provides the correct R2 for the Train data for 10 components
> ## Now my next objective is to calculate my model's R2 for each
> component on the Test data. (In other words -- test how good the
> model is on test data.) All I need is the MiceTest response, to
> compare with predict(mice1, ncomp = 1, newdata = MiceTest), and I
> should be able to calculate R2, but I can't figure out the correct
> command to do this. I tried the command below, which does provide a
> different R2 response; however, I'm not sure it is correct, as I get
> a different R^2 value from another software, MOE (Molecular Operating
> Environment).
> R2(mice1, estimate = "test", MiceTest)
> Is the above the correct code to achieve what I'm doing? If so, then
> MOE probably uses a different function to calculate the model
> component's R^2 for Test data.

That is the way to get test set R^2 for PLSR/PCR models, yes. If you
read the documentation of R2, you will find:

  The R^2 values returned by 'R2' are calculated as 1 - SSE/SST, where
  SST is the (corrected) total sum of squares of the response, and SSE
  is the sum of squared errors for either the fitted values (i.e., the
  residual sum of squares), test set predictions or cross-validated
  predictions (i.e., the PRESS).

This is, AFAIK, the most common way to define R^2. For training data,
this is equivalent to cor(y, yhat)^2, but not for test data or
cross-validation. From your second email, I would guess that MOE uses
cor(y, yhat)^2 instead of 1 - SSE/SST.

-- 
Bjørn-Helge Mevik
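The gap between the two definitions is easy to demonstrate on a
held-out set. A sketch using the train/test split built into the yarn
data from pls (data set and component count are illustrative
assumptions, not the poster's setup):

```r
library(pls)

data(yarn)
train <- yarn[yarn$train, ]
test  <- yarn[!yarn$train, ]

m    <- plsr(density ~ NIR, ncomp = 4, data = train)
yhat <- drop(predict(m, ncomp = 4, newdata = test))
y    <- test$density

# R2()-style definition: 1 - SSE/SST.
1 - sum((y - yhat)^2) / sum((y - mean(y))^2)

# Squared correlation -- generally a different number on test data.
cor(y, yhat)^2
```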
Re: [R] package(pls) - extracting explained Y-variance
Christian Jebsen jeb...@rz.uni-leipzig.de writes:

> Dear R-help users, I'd like to use the R package pls and want to
> extract the explained Y-variance to identify the important (PLS)
> principal components in my model, related to the y-data. For
> explained X-variance there is a function: explvar(). If I understand
> it right, the summary() function gives an overview where the
> Y-variance is shown, but I can't extract it for plotting.

If you look at the summary function (summary.mvr), you will see that it
uses the R2 function for this:

yve <- 100 * drop(R2(object, estimate = "train", intercept = FALSE)$val)

(For cross-validated or test set validated models, it uses RMSEP.)

-- 
Bjørn-Helge Mevik
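So the same quantity can be extracted directly from any fitted model
and plotted; a small sketch on the yarn data (the model itself is an
arbitrary illustration):

```r
library(pls)

data(yarn)
m <- plsr(density ~ NIR, ncomp = 4, data = yarn)

# Cumulative % of Y-variance explained per number of components --
# the same figures that summary(m) prints.
yve <- 100 * drop(R2(m, estimate = "train", intercept = FALSE)$val)
plot(yve, type = "b",
     xlab = "Number of components", ylab = "% Y-variance explained")
```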
Re: [R] Gasoline Data in pls package
Ravi Ramaswamy raram...@gmail.com writes:

> I am using the pls package for some PCR computations. There is a data
> set called gasoline. Would someone be able to tell me what command(s)
> could be used to produce this graph in R?

I presume you are talking about Figure 1 in the pls article in R News
2006/3 [1]. The plot was produced with the following commands:

data(gasoline)
par(mar = c(2, 4, 1, 0) + 0.1)
matplot(t(gasoline$NIR), type = "l", ylab = "log(1/R)", xaxt = "n")
ind <- pretty(seq(from = 900, to = 1700, by = 2))
ind <- ind[ind >= 900 & ind <= 1700]
ind <- (ind - 898) / 2
axis(1, ind, colnames(gasoline$NIR)[ind])

> I am not sure where the log(1/R) -- the Y-axis label -- is coming from

The measurements in the NIR matrix are log(1/reflectance), hence the
label "log(1/R)". This is how the data was published by Kalivas [2],
and is a standard way of representing Near Infrared Reflectance
measurements.

[1] http://cran.r-project.org/doc/Rnews/Rnews_2006-3.pdf
[2] J. H. Kalivas. Two data sets of near infrared spectra. Chemometrics
and Intelligent Laboratory Systems, 37:255-259, 1997.

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] cross-validation in plsr package
Peter Tillmann peter.tillm...@t-online.de writes:

> Can anyone give an example of how to use cross-validation in the plsr
> package?

There are examples in the references cited on
http://mevik.net/work/software/pls.html

> I fail to find the number of factors proposed by cross-validation as
> the optimum.

The cross-validation in the pls package does not propose a number of
factors as the optimum; you have to select this yourself. (The reason
for this is that there is, AFAIK, no theoretically founded and widely
accepted way of doing this automatically. I'd be happy to learn
otherwise.)

-- 
Regards,
Bjørn-Helge Mevik
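In practice one usually inspects the cross-validated RMSEP curve and
picks the number of components where it levels off; a sketch using the
yarn data from pls (data set and settings chosen purely for
illustration):

```r
library(pls)

data(yarn)
m <- plsr(density ~ NIR, ncomp = 8, data = yarn, validation = "CV")

# Cross-validated RMSEP per number of components; look for the point
# where adding further components stops paying off.
RMSEP(m)
plot(RMSEP(m), legendpos = "topright")
```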
Re: [R] Pls package
Payam Minoofar payam.minoo...@meissner.com writes:

> I have managed to format my data into a single data frame consisting
> of two AsIs response and predictor data frames, in order to supply
> the plsr command of the pls package for principal components
> analysis. When I execute the command, however, I get this error:
> fiber1 <- plsr(respmat ~ predmat, ncomp = 1, data = inputmat,
>                validation = "LOO")
> Error in model.frame.default(formula = respmat ~ predmat, data = inputmat) :
>   invalid type (list) for variable 'respmat'
> I happen to have a lot of NAs in some of the columns. Is that the
> problem?

The underlying PLSR/PCR functions do not handle NAs, but that is
probably not the problem here. My guess is that you have done something
like

inputmat <- data.frame(respmat = I(foo), predmat = I(bar))

where foo (and perhaps bar) is a _data.frame_ (that is at least
consistent with the error message). If

sapply(inputmat, class)

produces something like

     respmat      predmat
[1,] "AsIs"       "AsIs"
[2,] "data.frame" "data.frame"

then this is certainly the case. That will not work. They should be
matrices instead of data frames, for instance by converting them like
this:

inputmat <- data.frame(respmat = I(as.matrix(foo)),
                       predmat = I(as.matrix(bar)))

As for missing values: the default behaviour of plsr is to omit cases
with missing values. This is controlled by the 'na.action' argument.
See ?na.action for details.

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] NotePad++ Syntax file
[Ricardo Rodriguez] Your XEN ICT Team webmas...@xen.net writes:

> John Kane wrote:
>> No, but have you had a look at Tinn-R? http://www.sciviews.org/Tinn-R/
> Any similar option for Mac OS X?

I guess you can use Emacs on Mac OS X.

-- 
Bjørn-Helge Mevik
Re: [R] CRAN + geography = Cranography
Barry Rowlingson b.rowling...@lancaster.ac.uk writes:

> http://www.maths.lancs.ac.uk/~rowlings/R/Cranography/

Absolutely beautiful!

> Note this is just for fun. No warranties. Maybe I should use a little
> 'R' as a marker.

That would be cool.

> Maybe I should get a life.

:-)

-- 
Bjørn-Helge Mevik
Re: [R] PLS regression on near infrared (NIR) spectra data
Paulo Ricardo Gherardi Hein phein1...@gmail.com writes:

> I am new here (since Jan 2009) and up to now I have not seen anyone
> commenting on principal component analysis and PLS regression for
> analyzing spectral information in R. Sorry, I am an R starter...
> Does anybody have any package or trick to suggest?

There is the package 'pls', with Principal Component Regression (PCR)
and Partial Least Squares Regression (PLSR). It also contains a couple
of plots that are useful for princomp() or prcomp() analyses (PCA).

-- 
Bjørn-Helge Mevik
Re: [R] PCA functions
glenn g1enn.robe...@btinternet.com writes:

> Is there a function (before I try and write it!) that allows the
> input of a covariance or correlation matrix to calculate PCA, rather
> than the actual data as in princomp()?

Yes, there is: princomp(). :-)

-- 
Bjørn-Helge Mevik
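The joke being that princomp() itself accepts a covariance (or
correlation) matrix through its 'covmat' argument; a quick sketch
(USArrests is just a stand-in source for a covariance matrix):

```r
# PCA from a covariance matrix alone -- no raw data needed.
cm <- cov(USArrests)
pc <- princomp(covmat = cm)

summary(pc)   # standard deviations and proportions of variance
loadings(pc)  # note: no scores, since no observations were supplied
```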
Re: [R] package pls
tsn4867 [EMAIL PROTECTED] writes:

> For the package pls, I need to understand the algorithm for
> simpls.fit for Partial Least Squares. I'm not sure which of the two
> simpls.fit tries to maximize when finding the weight vectors
> (loadings): Cov(Xw, y) or Cov^2(Xw, y)? Are these objective functions
> equivalent? (In some texts they use the first, and in other texts
> they use the second objective function.) I think the algorithm for
> simpls.fit is using Cov(Xw, y). Also, can you give me some references
> where they state the equivalency of the two objective functions?

The implementation in simpls.fit follows the algorithm in

  de Jong, S. (1993) SIMPLS: an alternative approach to partial least
  squares regression. _Chemometrics and Intelligent Laboratory
  Systems_, *18*, 251-263.

(up to simplifications and performance changes). I don't recall whether
the criterion was cov or cov^2, but I believe they should be identical
(up to sign).

-- 
Bjørn-Helge Mevik
Re: [R] R command line
Raphael Saldanha [EMAIL PROTECTED] writes:

> Is there a GUI for R with improvements in the command line? I'm not
> looking for buttons, menus, etc., but (more) colored syntax,
> auto-completion of commands, etc.

ESS in Emacs, perhaps?

-- 
Bjørn-Helge Mevik
Re: [R] Calculate SPE in PLS package
Stella Sim [EMAIL PROTECTED] writes:

> I want to calculate SPE (squared prediction error) in x-space. Can
> someone help? Here are my codes:
> fit.pls <- plsr(Y ~ X, data = DAT, ncomp = 3, scale = T,
>                 method = "oscorespls", validation = "CV", x = T)
> actual <- fit.pls$model$X

(The x = TRUE is not needed as long as model = TRUE (the default).
x = TRUE returns the predictors as fit.pls$x, and is included for
compatibility with lm().)

> pred <- fit.pls$scores %*% t(fit.pls$loadings)
> SPE.x <- rowSums((actual - pred)^2)
> Am I missing something here?

You are missing the mean X spectrum. See

matplot(t(pred), type = "l", lty = 1)

vs.

matplot(t(actual), type = "l", lty = 1)

The Xmeans component of fit.pls contains this, so

pred <- sweep(fit.pls$scores %*% t(fit.pls$loadings), 2,
              fit.pls$Xmeans, "+")

would give you what you want.

Note, however, that this will calculate the _fitted_ SPE, not the
cross-validated SPE. The cross-validation implemented in the pls
package does not save the cross-validated scores/loadings -- that
would consume too much memory. (Calculation of SPE within the
cross-validation routines could have been implemented, but was not.)

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] Append to a vector?
Why not simply

a <- c(a, 5)

or

a <- c(a, b)

if b is another vector?

-- 
Bjørn-Helge Mevik
Re: [R] Help in using PCR
Gavin Simpson [EMAIL PROTECTED] writes:

> Ok, let's sort this out. [Not tested as I don't have your data]
> df <- data.frame(resp = cancerv1[, 408],
>                  VARS = as.matrix(cancerv1[, 2:407]))

Actually, you _do_ need an I() here:

df <- data.frame(resp = cancerv1[, 408],
                 VARS = I(as.matrix(cancerv1[, 2:407])))

otherwise data.frame() will split the matrix into single-column
variables.

-- 
Bjørn-Helge Mevik
Re: [R] Help in using PCR
Gavin Simpson [EMAIL PROTECTED] writes:

> df <- data.frame(resp = dat[, 1], VARS = I(as.matrix(dat[, 2:101])))
> class(df$VARS)
> [1] "AsIs"
> The class is "AsIs" for $VARS. But if I look at your yarn data set
> for example, the NIR component is of class "matrix":
> class(yarn$NIR)
> [1] "matrix"
> How did you achieve this?

I don't remember exactly what I did, but this will work, at least:

yarn <- data.frame(density = ..., train = ...)
yarn$NIR <- as.matrix(...)

For practical purposes, I haven't found any difference between having
the matrices with class "AsIs" and "matrix".

-- 
Bjørn-Helge Mevik
Re: [R] Help in using PCR
Gavin Simpson [EMAIL PROTECTED] writes:

> You can do this another way though, which I feel is more natural. So
> let's assume that your data frame contains columns that are named,
> and that one of these is the response variable; the remaining columns
> are the predictors. Further assume that this response is called
> 'myresp'. Then you can proceed as follows:
> cancerv1.pcr <- pcr(myresp ~ ., ncomp = 6, data = cancerv1,
>                     validation = "CV")

This works fine as long as the number of (predictor) variables is not
too large. With many variables (> 1000), R will spend a very long time
dealing with the formula.

-- 
Bjørn-Helge Mevik
Re: [R] significant variables in GPLS ?
There is little theory about significance and testing for PLSR (and, I
would guess, GPLSR). Many practitioners use jackknife variance
estimates as a basis for significance tests. Note, however, that these
variance estimates are known to be biased (in general), and their
distribution is (to my knowledge) not known. Any significance deduced
from them should therefore be regarded as merely indicative.

-- 
Bjørn-Helge Mevik
Re: [R] Different results in calculating SD of 2 numbers
Ron Michael [EMAIL PROTECTED] writes:

> Can anyone tell me why I am getting different results in calculating
> the SD of 2 numbers?
> (1.25-0.95)/2
> [1] 0.15

Because this is not the SD? Try

(1.25 - 0.95) / sqrt(2)

:-)

> sd(c(1.25, 0.95))
> [1] 0.2121320  # why is it different from 0.15?

-- 
Bjørn-Helge Mevik
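For two numbers the sample SD reduces to |x1 - x2| / sqrt(2), because
each value sits 0.15 from the mean and the divisor is n - 1 = 1; a
quick check:

```r
x <- c(1.25, 0.95)

# Sample SD: sqrt(sum((x - mean(x))^2) / (n - 1)), here with n = 2.
sd(x)                    # 0.2121320

# For n = 2 this equals |x1 - x2| / sqrt(2), not |x1 - x2| / 2.
abs(diff(x)) / sqrt(2)   # 0.2121320
```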
Re: [R] mvr error in PLS package
Gavin Simpson wrote:

> On Mon, 2007-11-26 at 09:25 -0800, Bricklemyer, Ross S wrote:
>> libs.IC.cal <- mvr(libs.IC.fmla, data = libsdata.cond.cal, ncomp = 20,
>>                    validation = "LOO", method = "oscorespls")
>> Error in colMeans(x, n, prod(dn), na.rm) : 'x' must be numeric
>> There are many 0 for this soil property. Could this cause the error?
>
> Without having the data (or a small example thereof) it is impossible
> to tell.

It would also be nice to know which version of the package you are using. :-)

> To start, try str(libsdata.cond.cal) and check that the variables
> referenced in your formula object (which is what I presume libs.IC.fmla
> is?) are all numeric and haven't been coded as factors or characters or
> something strange.

Actually, as of version 2.0-0, mvr() et al. should cope with factors without problems. They will be coded just as in lm().

Another thing to try is to call traceback() just after receiving the error message. That might tell you more about _where_ the error occurred.

-- Bjørn-Helge Mevik
[R] [R-pkgs] pls version 2.1-0
Version 2.1-0 of the pls package is now available on CRAN. The pls package implements partial least squares regression (PLSR) and principal component regression (PCR).

Features of the package include

- Several PLSR algorithms: orthogonal scores, kernel PLS, wide kernel PLS, and SIMPLS
- Flexible cross-validation
- A formula interface, with traditional methods like predict, coef, plot and summary
- Functions for extraction of scores and loadings, and calculation of (R)MSEP and R^2
- Functions for plotting predictions, validation statistics, coefficients, scores, loadings, and correlation loadings

The main changes since 2.0-0 are

- Jackknife variance estimation of regression coefficients has been added.
- The `wide kernel' PLS algorithm has been implemented. It is faster than the other algorithms for very wide data.
- The definition of R^2 has been changed to 1 - SSE/SST for all estimators, so R2() will give different results for test sets and cross-validation compared to pls 2.0-0. Also, the internal calculations have been reorganised.
- The plot functions for coefficients, predictions and validation results (R2, (R)MSEP) have gained an argument `main' to set the main title of the plot.
- Plots that go over several pages now only set `par(ask = TRUE)' if the plot device is interactive (suggested by Kevin Wright).
- mvr() and mvrCv() now check for near-zero standard deviation when autoscaling (`scale = TRUE').

See the file CHANGES in the sources for all changes.

-- Bjørn-Helge Mevik and Ron Wehrens

___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] Compute R2 and Q2 in PLS with pls.pcr package
Ana Conesa wrote:

> I am using the mvr function of the package pls.pcr to compute PLS
> regression

You should consider switching to the package 'pls'. It supersedes 'pls.pcr', which is no longer maintained (the last version came in 2005). In pls, you would do the following to get R^2 and cross-validated R^2 (a.k.a. Q^2):

mypls <- plsr(Ytrain ~ Xtrain, ncomp = 1, validation = "LOO")
## R^2:
R2(mypls, estimate = "train")
## Cross-validated R^2:
R2(mypls)
## Both:
R2(mypls, estimate = "all")

-- Bjørn-Helge Mevik
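Since Ytrain and Xtrain above are the poster's own objects, here is a self-contained version using the yarn data shipped with pls (assuming the package is installed); estimate = "CV" requests the cross-validated R^2, i.e. Q^2:

```r
library(pls)
data(yarn)

mypls <- plsr(density ~ NIR, ncomp = 1, data = yarn, validation = "LOO")

R2(mypls, estimate = "train")  # fitted R^2
R2(mypls, estimate = "CV")     # cross-validated R^2 (Q^2)
R2(mypls, estimate = "all")    # both at once
```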
Re: [R] Who uses R?
(Ted Harding) wrote:

> Pat Altham (now retired) developed extensive teaching (and other)
> materials in R at the Cambridge University Statistical Laboratory.
> From her personal web page:
>
> "Some of the computer languages I have had to try to learn since
> graduating in 1964: Cambridge autocode, algol, phoenix, machine-code,
> Fortran, BBC-Basic, GLIM, GENSTAT, Linux, S-Plus and finally (probably
> the best so far!) R."

Well, calling Linux a computer language will probably not add too much credibility to the quote(r). :-)

-- Bjørn-Helge Mevik
Re: [R] What is RDA file and how to open it in R program?
Jittima Piriyapongsa wrote:

> I have a set of gene expression data in a .RDA file. I have downloaded
> Bioconductor and R for analyzing these data. However, I am not sure how
> to open this RDA file in R (what is the command?) in order to look at
> the data.

load("filename.RDA")

(.RDA (or .rda) is short for .RData (or .rdata :-). It is the usual file format for saving R objects to file (with save() or save.image()).)

> And which package should I use for analyzing it, e.g. plot the
> expression image?

That depends entirely on what is inside the file. The best idea is probably to ask the one(s) who created the file.

-- Bjørn-Helge Mevik
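A minimal round trip showing how save() and load() interact; note that load() restores the objects under their original names and (invisibly) returns those names, which is handy when you do not know what a file contains:

```r
f <- tempfile(fileext = ".RDA")

x <- rnorm(5)
save(x, file = f)   # write the object to an .RDA file
rm(x)

loaded <- load(f)   # restores 'x' into the workspace
loaded              # the names of the restored objects: "x"
str(x)              # the object itself, back under its original name

unlink(f)           # clean up the temporary file
```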