Re: [R] time mathematics
Well, this is not an elegant (or robust) solution, but it works for the example you give, at least:

starttime <- as.POSIXct("2018-11-20 23:01:18")  # Just pick a random date
format(starttime + 0:4, format = "%T")

There are probably better ways. :)

-- Regards, Bjørn-Helge Mevik

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
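For the archives, a minimal base-R check of the idea above. The date is arbitrary (as in the post); the time zone is pinned to UTC here only so the printed times are reproducible:

```r
# Adding an integer to a POSIXct shifts it by that many seconds.
starttime <- as.POSIXct("2018-11-20 23:01:18", tz = "UTC")
(out <- format(starttime + 0:4, format = "%T"))
# "23:01:18" "23:01:19" "23:01:20" "23:01:21" "23:01:22"
```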
Re: [R] security using R at work
The section I'm working in runs a facility for sensitive research data (https://www.uio.no/english/services/it/research/sensitive-data/). Our users use R (along with other analysis software). We don't consider R safe or unsafe, but have designed the services so that it should not be possible (or at least very difficult) for sensitive information to leak out of the network. I would say that your best bet is to expect all analysis software to have security holes or be compromised, and design your setup/network around that assumption.

-- Regards, Bjørn-Helge Mevik
Re: [R] PLS in R
Margarida Soares <margaridapmsoa...@gmail.com> writes:

> Thanks for your reply on pls!
> I have tried to do a correlation plot but I get the following group of
> graphs. Any way of having only 1 plot?
> This is my script:
>
> corrplot(plsrcue1, comp = 1:4, radii = c(sqrt(1/2), 1), identify = FALSE, type = "p")

"Correlation loadings" are the correlations between each variable and the selected components, so I don't see how you can have more than two sets of correlations (i.e., more than two components) in a single scatter plot. You could have three sets in a 3D plot, of course, but that you would have to implement yourself. :)

-- Regards, Bjørn-Helge Mevik
Re: [R] PLS in R
Margarida Soares <margaridapmsoa...@gmail.com> writes:

> library(pls)
> plsrcue <- plsr(cue ~ fb + cn + n + ph + fung + bact + resp, data = cue, ncomp = 7,
>                 na.action = NULL, method = "kernelpls", scale = FALSE, validation = "LOO",
>                 model = TRUE, x = FALSE, y = FALSE)
> summary(plsrcue)
>
> and I got this output, where I think I can choose the number of components
> based on RMSEP, but how do I choose it?

There are no "hard" rules for how to choose the number of components, but one rule of thumb is to stop when the RMSEP starts to flatten out, or to increase. In your case, I would say 4 components. An easier way to look at the RMSEP values is with plot(RMSEP(plsrcue)).

(There are some algorithms that can suggest the number of components for you. Two of those are implemented in the development version of the pls package (hopefully released during Christmas). You can check it out here if you wish: https://github.com/bhmevik/pls . Disclaimer: I am the maintainer of the package. :) )

> - and also, how to proceed from here?

That depends on what you want to do/learn about the system you are modelling. Many researchers in fields like spectroscopy or chemometrics (where PLSR originated) plot loadings and scores and infer things graphically.

> - and how to make a correlation plot?

corrplot(plsrcue) - at least if you mean a correlation loadings plot. See ?corrplot for details.

> - what to do with the values, coefficients that I get in the Environment (pls values)

Again, that depends on what you want with your model.

-- Regards, Bjørn-Helge Mevik
Re: [R] pls package - validation
Bert Gunter <bgunter.4...@gmail.com> writes:

> However, if I understand correctly, using pls or anything else to try
> to fit (some combination of) 501 variables to 16 data points -- and
> then crossvalidate with 6 data points -- is utter nonsense. You just
> have a fancy random number generator!

That is incorrect. PLSR and other dimension-reducing regression methods can handle more prediction variables than samples perfectly fine -- many of them were created for that purpose. As for the original question: typically this happens when there is no (or very little) correlation between the response and the prediction variables. (Or as they tend to say in chemometrics: You don't have a model.)

> As I said, I think it better to follow up or complain about me on
> stackexchange rather than here.

Sorry, I read this too late. :)

-- Regards, Bjørn-Helge Mevik
[R] [R-pkgs] pls 2.6-0 released
Version 2.6-0 of the pls package has been released and will be available at your local CRAN mirror shortly. The pls package implements Partial Least Squares Regression, Principal Component Regression and Canonical Powered PLS. The major changes in 2.6-0 are:

- It now has a function selectNcomp() for automatically suggesting the optimal number of components for the model. The function implements two different algorithms, and will optionally plot the RMSEP values and number of components.

- A description of selectNcomp() has been added to the vignette.

-- Regards, Bjørn-Helge Mevik

___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] Version 3.2.3: package not available error with https
Loris Bennett <loris.benn...@fu-berlin.de> writes:

> It seems that R needs libcurl 7.28.0, but my platform (Scientific Linux
> 6.7) only provides version 7.19.7.

We got "bit" by this when upgrading to 3.2.2. If you cannot upgrade libcurl on your machine(s), you can put

local({ options(useHTTPS = FALSE) })

in the Rprofile.site file, or your ~/.Rprofile. You still get a warning, but you do get the list of http repositories.

Come to think about it: would it be an idea if R defaulted to useHTTPS = FALSE if capabilities("libcurl") is FALSE?

-- Regards, Bjørn-Helge Mevik
Re: [R] Problems with data structure when using plsr() from package pls
S Ellison <s.elli...@lgcgroup.com> writes:

> Reading ?plsr examples and inspecting the data they use, you need to arrange
> frame1 so that it has the data from n96 included as columns with names of the
> form "n96.xxx" where xxx can be numbers, names etc.

No, you do not. :) plsr() is happy with a data frame where n96 is a single variable consisting of a matrix. And this is the recommended way for matrices with a lot of columns. Which is what you get with

frame1 <- data.frame(gushVM, n96 = I(n96))

if n96 is a matrix, or

frame1 <- data.frame(gushVM, n96 = I(as.matrix(n96)))

if it is a data.frame.

> If n96 is a data frame, try something like
> names(n96) <- paste("n96", 1:96)
> frame1 <- cbind(gushVM, n96)
>
> pls1 <- plsr(gushVM ~ n96, data = frame1)

Have you actually tried this? It doesn't work. For instance:

> gushVM <- 1:5
> n96 <- data.frame(a = 1:5, b = 2:6)
> names(n96) <- paste("n96", 1:2)
> n96
  n96 1 n96 2
1     1     2
2     2     3
3     3     4
4     4     5
5     5     6
> frame1 <- cbind(gushVM, n96)
> frame1
  gushVM n96 1 n96 2
1      1     1     2
2      2     2     3
3      3     3     4
4      4     4     5
5      5     5     6
> dim(frame1)
[1] 5 3
> pls1 <- plsr(gushVM ~ n96, data = frame1)
Error in model.frame.default(formula = gushVM ~ n96, data = frame1) :
  invalid type (list) for variable 'n96'

The reason is that frame1 does _not_ contain a variable called 'n96', so plsr() (or actually model.frame.default()) searches in the global work space, where it finds a _data.frame_ n96. A data.frame is a list. Hence the error message.

> If n96 is a matrix,
>
> frame1 <- data.frame(gushVM, n96 = n96)
>
> should also give you a data frame with names of the right format.

It does not:

> n96 <- as.matrix(n96)
> frame1 <- data.frame(gushVM, n96 = n96)
> frame1
  gushVM n96.n96.1 n96.n96.2
1      1         1         2
2      2         2         3
3      3         3         4
4      4         4         5
5      5         5         6
> dim(frame1)
[1] 5 3
> names(frame1)
[1] "gushVM"    "n96.n96.1" "n96.n96.2"

So the data frame still does not have any variable named 'n96'.
The only reason

> pls1 <- plsr(gushVM ~ n96, data = frame1)

seems to work, is that the 'n96' variable it now finds in the global environment happens to be a matrix:

> class(n96)
[1] "matrix"

If that wasn't there, you would get an error:

> rm(n96)
> pls1 <- plsr(gushVM ~ n96, data = frame1)
Error in eval(expr, envir, enclos) : object 'n96' not found

> I() wrapped round a matrix or data frame does nothing like what is needed if
> you include it in a data frame construction, so either things have changed
> since the tutorial was written, or the authors were not handling a matrix or
> data frame with I().

Yes it does. :) Nothing (substantial) has changed, and we did/do handle matrices with I():

> n96 <- matrix(1:10, ncol = 2)
> n96
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> frame1 <- data.frame(gushVM, I(n96))
> frame1
  gushVM n96.1 n96.2
1      1     1     6
2      2     2     7
3      3     3     8
4      4     4     9
5      5     5    10
> dim(frame1)
[1] 5 2
> names(frame1)
[1] "gushVM" "n96"
> rm(n96)
> pls1 <- plsr(gushVM ~ n96, data = frame1)
> pls1
Partial least squares regression, fitted with the kernel algorithm.
Call:
plsr(formula = gushVM ~ n96, data = frame1)

-- Regards, Bjørn-Helge Mevik
Re: [R] Problems with data structure when using plsr() from package pls
Jeff Newmiller <jdnew...@dcn.davis.ca.us> writes:

> Using I() in the data.frame seems ill-advised to me. You complain about 96
> variables but from reading your explanation that seems to be what your data
> are.

In PLSR, it is common to regress a variable against matrices with very many columns, often several thousands. Using a data frame with one predictor variable for each column is going to make the formula handling very slow. And if you have several such predictor matrices, it is very practical to keep them as single variables in the data frame, so you can easily select/deselect which groups of variables you want in the model.

-- Regards, Bjørn-Helge Mevik
Re: [R] Problems with data structure when using plsr() from package pls
CG Pettersson <cg.petters...@lantmannen.com> writes:

>> frame1 <- data.frame(gushVM, I(n96))
[...]
>> pls1 <- plsr(gushVM ~ n96, data = frame1)
> Error in model.frame.default(formula = gushVM ~ n96, data = frame1) :
>   invalid type (list) for variable 'n96'

As far as I can remember, you get this error if the n96 object was a data.frame instead of a matrix. Can you check with, e.g.,

> class(n96)

If it says "data.frame", try using I(as.matrix(n96)).

-- Regards, Bjørn-Helge Mevik
[R] Installing R 3.2.2 on machine with old libcurl
We have to install R 3.2.2 on machines whose libcurl is too old to support https when installing packages, etc. When a user tries to use install.packages() (with the default value of the "repos" option), she is presented with a list of https repos, which is not very useful. She also gets an error message:

Error in download.file(url, destfile = f, quiet = TRUE) :
  unsupported URL scheme

We have put

local({ options(useHTTPS = FALSE) })

into the Rprofile.site file, and after that, the user gets a list of http repos, so she will be able to install packages. But the error message is still displayed, which can be confusing. Is there a way around this problem?

Also, perhaps the useHTTPS option should default to FALSE if the libcurl capability is FALSE?

-- Regards, Bjørn-Helge Mevik
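A hedged sketch of the fallback idea: Rprofile.site could test the libcurl capability itself and only disable HTTPS when it is missing. capabilities() is base R; whether the useHTTPS option has any effect depends on the R version (it was used by R 3.2.x and is ignored by later versions):

```r
# Sketch for Rprofile.site: fall back to http repos only when
# this build of R lacks libcurl support.
local({
  if (!isTRUE(capabilities("libcurl")[[1]]))
    options(useHTTPS = FALSE)  # honoured by R 3.2.x; a harmless no-op later
})
```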
[R] [R-pkgs] pls 2.5-0 released
Version 2.5-0 of the pls package has been released. The pls package implements Partial Least Squares Regression, Principal Component Regression and Canonical Powered PLS. The major changes are:

- Cross-validation can now make sure that replicates are kept in the same segment, by the use of a new argument `nrep'. See ?cvsegments for details.

- It now has a vignette.

- It now has a NEWS file that can be accessed by news().

-- Regards, Bjørn-Helge Mevik
Re: [R] A strange problem using pls package
PO SU <rhelpmaill...@163.com> writes:

> suppose data has 20 columns
> traindata <- data[1:10, 1:10]
> testdata <- data[11:15, 1:10]
> pls.fit <- plsr(y ~ x, ncomp = 5, data = traindata, method = "simpls", scale = FALSE, model = TRUE, validation = "CV")
> ok, i get some result, the strange thing happens when i redo the plsr, i mean, i use
> traindata <- data[1:10, 1:20]
> testdata <- data[11:15, 1:20]
> pls.fit <- plsr(y ~ x, ncomp = 5, data = traindata, method = "simpls", scale = FALSE, model = TRUE, validation = "CV")
> I get the same result as the first one!!!

The reason is probably that you ask plsr() to use the column of traindata called x as the predictor. Then it will only use that column, no matter how many columns traindata contains. The usual way of using plsr() is to have a data.frame with a _matrix_ as the predictor column, for instance like this:

mydata <- data.frame(y = some_vector, X = I(some_matrix))
mymodel <- plsr(y ~ X, ..., data = mydata)

If you want to have the predictors as separate vectors, you must name all of them in the formula (y ~ x1 + x2 + x3 + ...), or you can use the following shortcut to regress y on all the remaining columns:

plsr(y ~ ., ..., data = mydata)

-- Regards, Bjørn-Helge Mevik
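The "matrix inside a data frame" idiom is plain base R, so it can be demonstrated without pls itself (here with lm(), which accepts the same kind of formula; the data are made up):

```r
set.seed(1)
X <- matrix(rnorm(50), ncol = 5)       # 10 samples, 5 predictors
y <- rowSums(X) + rnorm(10, sd = 0.1)
mydata <- data.frame(y = y, X = I(X))  # I() keeps X as one matrix variable
length(mydata)                         # 2 variables: y and the matrix X
fit <- lm(y ~ X, data = mydata)        # one coefficient per matrix column
length(coef(fit))                      # 6 = intercept + 5 columns
```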
Re: [R] - PLS-Package - PLSR loadings
Wolfgang Obermeier <wolfgang.oberme...@geo.uni-marburg.de> writes:

> how is it possible that the loadings of the second or even third component
> of a PLS-Analysis show higher values than the first component? Somebody got
> an idea??

The loadings of a PLS regression are simply the coefficients that are multiplied with the X variables to transform X to the latent vectors used in the regression (this is slightly over-simplified). There is no reason why the coefficients of the first component should be larger than the coefficients of other components. (In fact, it is often the case that when one fits too many components (i.e., one starts to model noise), the coefficients of the last components get higher and higher.)

-- Regards, Bjørn-Helge Mevik
Re: [R] help plsr function
annie Zhang <annie.zhang2...@gmail.com> writes:

> ## the predicted scores from the model
> (pred <- predict(data.cpls, n.comp = 1:2, newdata = x.new, type = "scores"))
> ## the predicted scores using x %*% projection
> cbind(x.new.centered %*% data.cpls$projection[, 1], x.new.centered %*% data.cpls$projection[, 2])
>
> Can someone please tell me why the two predicted scores don't match?

If you look at the code that does the prediction:

> pls:::predict.mvr
function (object, newdata, ncomp = 1:object$ncomp, comps,
    type = c("response", "scores"), na.action = na.pass, ...)
{
[...]
    TT <- (newX - rep(object$Xmeans, each = nobs)) %*% object$projection[, comps]

you will see that it subtracts the _old X_ column means from the new X matrix, not the _new X_ column means. So

sweep(x.new, 2, data.cpls$Xmeans, "-") %*% data.cpls$projection[, 1:2]

will reproduce the values from predict().

-- Regards, Bjørn-Helge Mevik
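The centering step itself is ordinary base R. With made-up data, sweep()ing out a stored vector of column means (playing the role of the training means in object$Xmeans) is the same as scale() with those means:

```r
x.new <- matrix(1:12, ncol = 3)   # stand-in for new observations
Xmeans <- c(2, 6, 10)             # stand-in for the stored training means
centered <- sweep(x.new, 2, Xmeans, "-")
# identical (up to attributes) to centering with scale():
ok <- all.equal(centered,
                scale(x.new, center = Xmeans, scale = FALSE),
                check.attributes = FALSE)
```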
Re: [R] Question about R2 in pls package
Euna Jeong <eaje...@gmail.com> writes:

> I have questions about R2 used in pls (or multivariate analysis). Is R2 the
> same as the square of the PCC (Pearson Correlation Coefficient)?

If you read the manual for R2 in the pls package, it will tell you how R2 is calculated there, and that for _training_ data it is indeed PCC^2, but _not_ for cross-validation or test data. IMHO, R^2 only has a meaningful interpretation for training data. For test data or cross-validation, I prefer MSEP or RMSEP.

-- Regards, Bjørn-Helge Mevik
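The training-data identity is easy to check with base R's lm() on toy data: for a least-squares fit with an intercept, the reported R2 equals the squared Pearson correlation between observed and fitted values:

```r
set.seed(42)
x <- rnorm(30)
y <- 2 * x + rnorm(30)
fit <- lm(y ~ x)
# R2 reported by the fit vs. squared correlation of observed and fitted:
r2_same <- all.equal(summary(fit)$r.squared, cor(y, fitted(fit))^2)
```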
Re: [R] Question about the prediction plot in pls package
Euna Jeong <eaje...@gmail.com> writes:

> plot(gas1, ncomp = 2, asp = 1, line = TRUE)

This shows only the cross-validated predictions. If you add the argument which = c("train", "validation") (see ?predplot.mvr), you will get both. However, you will get them in separate panels in the plot. If you wish to have them in the same panel, you will have to add the points yourself. This should work:

plot(gas1, ncomp = 2, asp = 1, line = TRUE)
points(predict(gas1, ncomp = 2) ~ gasoline$octane, col = "red")

-- Regards, Bjørn-Helge Mevik
[R] [R-pkgs] pls 2.4-3 released
Version 2.4-3 of the pls package has been released. Windows and OS X binaries should appear shortly. The pls package implements Partial Least Squares Regression, Principal Component Regression and Canonical Powered PLS Regression. The major changes are:

- Can now perform cross-validation in parallel, using the facilities of the 'parallel' package. See ?pls.options and the examples in ?mvr for details. (Note: in order to use MPI, packages 'snow' and 'Rmpi' must be installed, because 'parallel' relies on them for MPI parallelisation.)

Other user-visible changes:

- In order to comply with current CRAN submission policies, pls.options() no longer stores the modified option list in the global environment. This has the effect that the options will have to be set every time R is started, even if the work space was saved and loaded.

-- Regards, Bjørn-Helge Mevik
Re: [R] data structure for plsr
Emma Jones <evjo...@ualberta.ca> writes:

> My current data structure consists of a .csv file read into R containing 15
> columns (a charcoal dilution series going from 100% to 0%) and 1050 rows of
> absorbance data from 400 nm to 2500 nm at 2 nm intervals. I think I need to
> transpose the data such that the specific wavelengths become my columns and
> dilutions are defined in rows,

Yes, you need to transpose the data so a column corresponds to a variable (response or predictor).

> Should I (and how do I) make my absorbance data into individual matrices
> that read into a data frame with only two columns

It is best to put all predictors (wavelengths) together in one matrix, yes. The same for the responses, if you have more than one response column. This is untested, so there might be errors. Assuming that your spectroscopic data is read into a data frame called origspec:

## This should create a matrix with the wavelengths as columns:
spec <- t(as.matrix(origspec))

I don't know what your response is, so I'm just assuming it is in a vector called resp.

## This would create a data frame suitable for plsr():
mydata <- data.frame(resp = resp, spec = I(spec))

Then you can analyse like this:

plsr(resp ~ spec, data = mydata, ...)

-- Bjørn-Helge Mevik
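A toy version of the reshaping above (hypothetical numbers; as in the poster's file, origspec starts with one column per sample and one row per wavelength):

```r
origspec <- data.frame(matrix(1:20, nrow = 5))  # 5 "wavelengths" x 4 "samples"
spec <- t(as.matrix(origspec))                  # now 4 samples x 5 wavelengths
resp <- c(100, 75, 50, 25)                      # made-up dilution levels
mydata <- data.frame(resp = resp, spec = I(spec))
dim(mydata$spec)                                # 4 rows, 5 columns
```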
[R] --enable-R-shlib and external BLAS/LAPACK libraries
A couple of years ago I noted that using the configure switch --enable-R-shlib when building R made configure ignore any specified external LAPACK library (I cannot recall if the BLAS specification was also ignored) and use the internal one instead. I asked why, and was told it was intentional.

Now, with R 2.15.1, I see that it at least appears that this is no longer the case. I've run configure like this:

fast="-ip -O3 -opt-mem-layout-trans=3 -xHost -mavx"
export CC=icc
export CFLAGS="$fast -wd188 -fp-model precise"
export F77=ifort
export FFLAGS="$fast -fp-model precise"
export CXX=icpc
export CXXFLAGS="$fast -fp-model precise"
export FC=ifort
export FCFLAGS="$fast -fp-model precise"
./configure --with-blas='-mkl=parallel' --with-lapack --enable-R-shlib

(in addition, paths to the Intel compilers and libraries are set up). The output from configure says:

  Interfaces supported:      X11, tcltk
  External libraries:        readline, BLAS(generic), LAPACK(in blas)
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, R profiling, Java

After make install, we get a libR.so linked to MKL libraries (see below for details). Am I correct in assuming that this R will use the Intel MKL libraries for BLAS and LAPACK routines? (That would be very nice, because we want to use the fast libraries, but some of our users need to have libR.so, so up to now, we've had to build two versions of R.)
# ldd libR.so
        linux-vdso.so.1 => (0x7ff52bcf8000)
        libifport.so.5 => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libifport.so.5 (0x7ff52b47d000)
        libifcore.so.5 => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libifcore.so.5 (0x7ff52b238000)
        libimf.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libimf.so (0x7ff52ae6d000)
        libsvml.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libsvml.so (0x7ff52a6f3000)
        libm.so.6 => /lib64/libm.so.6 (0x7ff52a45a000)
        libirc.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libirc.so (0x7ff52a30b000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x7ff52a0ee000)
        libdl.so.2 => /lib64/libdl.so.2 (0x7ff529ee9000)
        libreadline.so.6 => /lib64/libreadline.so.6 (0x7ff529ca6000)
        librt.so.1 => /lib64/librt.so.1 (0x7ff529a9e000)
        libmkl_intel_lp64.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/mkl/lib/intel64/libmkl_intel_lp64.so (0x7ff5292b7000)
        libmkl_intel_thread.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/mkl/lib/intel64/libmkl_intel_thread.so (0x7ff528238000)
        libmkl_core.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/mkl/lib/intel64/libmkl_core.so (0x7ff5271c2000)
        libiomp5.so => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libiomp5.so (0x7ff526ecf000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7ff526cb9000)
        libintlc.so.5 => /cluster/software/VERSIONS/intel-2011.10/composer_xe_2011_sp1/lib/intel64/libintlc.so.5 (0x7ff526b6a000)
        libc.so.6 => /lib64/libc.so.6 (0x7ff5267d7000)
        /lib64/ld-linux-x86-64.so.2 (0x00344520)
        libtinfo.so.5 => /lib64/libtinfo.so.5 (0x7ff5265b6000)

-- Regards, Bjørn-Helge Mevik
Re: [R] PLSR AND PCR ISSUES
You give us far too little information about what you do, what you want and what happens. Given that, the only help one can give is: Read the documentation. :)

-- Regards, Bjørn-Helge Mevik, dr. scient, Research Computing Services, University of Oslo
Re: [R] Discrepancies in the estimates of Partial least square (PLS) in SAS and R
rakeshnb <rakeshn...@gmail.com> writes:

> I am using pls package but how is scaling done in R?

That is documented in the help pages:

library(pls)
?plsr

[snip]

   scale: numeric vector, or logical. If numeric vector, X is scaled by
          dividing each variable with the corresponding element of
          'scale'. If 'scale' is 'TRUE', X is scaled by dividing each
          variable by its sample standard deviation. If cross-validation
          is selected, scaling by the standard deviation is done for
          every segment.

When in doubt, read the documentation. :)

-- Regards, Bjørn-Helge Mevik
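The dividing step described in that excerpt can be reproduced in base R on made-up data (note this shows only the scaling; by default pls also mean-centers X, which is a separate step):

```r
set.seed(7)
X <- matrix(rnorm(20), ncol = 4)
sds <- apply(X, 2, sd)        # sample standard deviations, column-wise
Xs <- sweep(X, 2, sds, "/")   # each column divided by its sd
ok <- all.equal(Xs,
                scale(X, center = FALSE, scale = sds),
                check.attributes = FALSE)
```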
Re: [R] Discrepancies in the estimates of Partial least square (PLS) in SAS and R
rakeshnb <rakeshn...@gmail.com> writes:

> I have been using R and SAS for the past 6 months and I found an interesting
> thing while doing PLS in R and SAS: when we use the NO SCALE option in SAS
> and scale=FALSE in R, the estimates match, but if we use the scaling option
> in SAS and R the estimates differ to a greater extent. You can try with any
> data set; you will get very different estimates while using the scaling
> option. Can anyone help me with this issue?

My guess is that they use different scalings, which of course will give different results. However, since you don't say anything about which R package you use for PLSR (and since I don't have access to SAS), I can only guess. :)

-- Regards, Bjørn-Helge Mevik
Re: [R] PLS Error message
Thomas Möckel <thomas.moc...@nateko.lu.se> writes:

> I work with hyperspectral remote sensing data and I try to build a pls model
> with this data. I already built the model but if I try to calculate the RMSEP
> and R2 with a test data set I get the following error message:
> Error: variable 'subX' was fitted with type "nmatrix.501" but type "nmatrix.73" was supplied

Since you don't show what commands you used, this is guesswork, but my guess is that you used

yourmodel <- plsr(yourresponse ~ subX, data = yourdata)
R2(yourmodel, newdata = yournewdata)

and that yourdata$subX contains 501 columns, but yournewdata$subX only contains 73 columns. You must supply a newdata with the same number of columns as in the modelling data.

-- Regards, Bjørn-Helge Mevik
Re: [R] PLS predict
Thomas Möckel <thomas.moc...@nateko.lu.se> writes:

> I have a question about understanding PLS. If I use the predict function of
> R then it seems to me the function only uses the last latent variable to
> model new Y values. But should the function not use all latent variables to
> model new Ys?

It should, and it definitely does. The _effect_ of each latent variable can vary a lot, though, but even then, the first ones usually have the greatest effect. Again, since you don't show what you did, it is hard to be more specific.

-- B/H
Re: [R] Dataframes in PLS package
R. Michael Weylandt <michael.weyla...@gmail.com> writes:

> Without that though, I'm not sure you need the I(as.matrix(dep)) and
> I(as.matrix(ind)), I would imagine (untested) that
> eqn <- data.frame(depy = dep, indx = ind)
> would work (probably better as I() changes things just a little).

The I() must be there to prevent data.frame() from separating the columns of the matrices into individual variables in the data frame. Without I() there will be no variables depy and indx in the data frame. Try this:

> A <- matrix(1:4, ncol = 2)
> B <- matrix(2:5, ncol = 2)
> A
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> B
     [,1] [,2]
[1,]    2    4
[2,]    3    5
> ## With I():
> d1 <- data.frame(A = I(A), B = I(B))
> d1
  A.1 A.2 B.1 B.2
1   1   3   2   4
2   2   4   3   5
> names(d1)
[1] "A" "B"
> d1$A
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> ## Without I():
> d2 <- data.frame(A = A, B = B)
> d2
  A.1 A.2 B.1 B.2
1   1   3   2   4
2   2   4   3   5
> names(d2)
[1] "A.1" "A.2" "B.1" "B.2"
> d2$A
NULL
> d2$A.1
[1] 1 2

-- Regards, Bjørn-Helge Mevik
Re: [R] Dataframes in PLS package
westland <westl...@uic.edu> writes:

> Here is the dput(eqn) and showData for the file 'eqn':
[...]
> showData(eqn)
>     depy.w depy.h depy.d depy.s indx.a indx.i indx.r indx.x
>  63     55      1      0     44  37200      4      0
> 145     52      1      1     33  69300      4      1
> 104     32      0      1     68  56900      3      1
> 109     69      1      1     94  44300      6      1
> 221     61      0      1     72  79800      6      0
> 110     40      1      1     48  17600      5      1
> 194     41      0      0     85  58100      4      0
> 120     76      1      1     19  76700      3      0
> 210     61      0      0     41  37600      1      0
... etc.

Okay, let me guess: you took the data in the file pls, created a data frame eqn with two matrices in it, then used write.table() to write eqn to a file, and then read it back with read.table(). If that is so, the problem you have is that write.table() will separate the columns of the matrices into separate columns in the file (it really has no other choice), and then read.table() will of course read those in as separate columns again. You have two solutions:

1) Repeat the commands to recreate the eqn data frame as a data frame with matrices, after reading it in from file:

eqn <- data.frame(depy = I(as.matrix(eqn[, 1:4])), indx = I(as.matrix(eqn[, 5:8])))

2) Save the data frame in an .RData file with save() instead of as a text file with write.table(). That will keep the structure of the variable.

> Initially, I had input a file 'pls' with the script:
> dep <- pls[, 1:4]
> ind <- pls[, 5:8]
> eqn <- data.frame(depy = dep, indx = ind)
> apls <- plsr(depy ~ indx, data = eqn)
> and this gives me
> [7] ERROR: object 'depy' not found

because you are missing the I(as.matrix()).

-- Regards, Bjørn-Helge Mevik
Re: [R] Dataframes in PLS package
westland westl...@uic.edu writes:

> R still doesn't seem to recognize the data.frame ... I get a
> [6] ERROR: object 'depy.w' not found
> from the following code:
> dep <- pls[, 1:4]
> ind <- pls[, 5:8]
> eqn <- data.frame(depy = dep, indx = ind)
> apls <- plsr(depy.w + depy.h + depy.d + depy.s ~
>              indx.a + indx.i + indx.r + indx.x, data = eqn)
> BUT I DID try to cbind() these after add-concatenating them (not sure
> exactly what I am doing) like so ...
> apls <- plsr(cbind(depy.w, depy.h, depy.d, depy.s) ~
>              cbind(indx.a, indx.i, indx.r, indx.x), data = eqn)

For creating multi-column responses on the fly, using cbind() like this
works. However, you don't need that for the predictors; there you can
get by with just using '+'.

If you only have a few predictors/responses, this will work okay, but
if you have many, it will take a lot of typing, and make the formula
handling part of plsr() take _ages_. Then using matrices is easier and
faster.

-- 
Bjørn-Helge Mevik
Re: [R] Dataframes in PLS package
westland westl...@uic.edu writes:

> Here is what I have done: I read in a 1 x 8 table of data, and assign
> the first four columns to matrix A and the second four to matrix B
> pls <- read.table("C:/Users/Chris/Desktop/SEM Book/SEM Stat Example/Simple Header Data for SEM.csv",
>                   header = TRUE, sep = ",", na.strings = "NA",
>                   dec = ".", strip.white = TRUE)

The problem is here:

> A <- c(pls[1], pls[2], pls[3], pls[4])
> B <- c(pls[5], pls[6], pls[7], pls[8])

This creates lists A and B, not data frames. Either use cbind() instead
of c(), or simply say

A <- pls[, 1:4]
B <- pls[, 5:8]

Then the rest should work.

Btw. it is probably a good idea to avoid single-character names for
variables, especially 'c' and 'C', because they are names of functions
in R.

-- 
Regards,
Bjørn-Helge Mevik
[R] [R-pkgs] pls 2.3.0 released
Version 2.3.0 of the pls package has been released. The pls package
implements Partial Least Squares Regression and Principal Component
Regression. The major changes are:

- New analysis method Canonical Powered PLS (CPPLS) implemented. See
  ?cppls.fit.
- coefplot() can now plot whiskers at +/- 1 SE (since 2.2.0). See
  ?coefplot.
- The package now has a name space (since 2.2.0).

-- 
Regards,
Bjørn-Helge Mevik

___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] question about plsr() results
Vytautas Rakevičius vytautas1...@yahoo.com writes:

> But still I have a question about results interpretation. In the end
> I want to construct a prediction function of the form:
> Y = a1*x1 + a2*x2

The predict() function does the prediction for you. If you want to
construct the prediction _equation_, you can extract the coefficients
from the model with

coef(yourmodel, ncomp = thenumberofcomponents, intercept = TRUE)

See ?coef.mvr for details.

> The documentation does not describe this.

The pls package is designed to work as much as possible like the lm()
function and its methods and helpers. So read any introduction to
linear models in R, and you will come a long way. There is also a paper
in JSS about the pls package: http://www.jstatsoft.org/v18/i02/

-- 
Cheers,
Bjørn-Helge Mevik
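As a concrete sketch (using the yarn data shipped with pls; the model
and component count here are arbitrary illustrations, not part of the
original question):

```r
library(pls)

data(yarn)
m <- plsr(density ~ NIR, ncomp = 3, data = yarn)

# Coefficients of the 3-component model, including the intercept.
b <- coef(m, ncomp = 3, intercept = TRUE)

a0 <- b[1]    # intercept
a  <- b[-1]   # one slope per predictor: Y = a0 + a1*x1 + a2*x2 + ...

# Sanity check: the explicit equation reproduces predict().
manual <- a0 + as.vector(yarn$NIR %*% a)
all.equal(manual, drop(predict(m, ncomp = 3)), check.attributes = FALSE)
```

The same pattern works for any mvr model; only the data set and ncomp
change.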
Re: [R] plsr how to return my formula
Try reading the pls package article, available here:
http://www.jstatsoft.org/v18/i02/

-- 
Cheers,
Bjørn-Helge Mevik
Re: [R] use of segments in PLS
arunkumar akpbond...@gmail.com writes:

> How to use the segments in the PLS?
> fit1 <- mvr(formula = Y ~ X1 + X2 + X3 + X4 + X5 + ... + X27,
>             data = Dataset, comp = 5, segment = 7)
> here when i use segments, the error was like this
> Error in mvrCv(X, Y, ncomp, method = method, scale = sdscale, ...) :
>   argument 7 matches multiple formal arguments

This cannot be true. mvr() does not call mvrCv() unless you give it the
argument validation = "CV" or validation = "LOO".

Anyway, the argument is 'segments', not 'segment', which -- as the
error message says -- matches multiple formal arguments, in this case
also 'segment.type'.

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] R square and F - stats in PLS
arunkumar akpbond...@gmail.com writes:

> In the lm function, summary(lmobject) gives us the adjusted R square
> and F statistics. Do we have something similar in the pls package,
> and how do we get it?

No. Both of these require theory about the model that doesn't exist for
PLSR. (I should note that a couple of generalisations of the degrees of
freedom to general regression models have been published, and these
could be used to calculate an adjusted R^2. However, they have not been
implemented in the pls package.)

It seems you would like to use PLSR the way you use OLS, with classical
hypothesis tests and performance statistics. This is not how PLSR is
usually applied, and there are few such tools. The traditional/typical
focus amongst PLSR practitioners is much more on prediction performance
(RMSEP) and interpretation by plotting scores and loadings.

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] getting p-value and standard error in PLS
arunkumar akpbond...@gmail.com writes:

> How to get the p-value and the standard error in PLS?

There is (to my knowledge) no theory able to calculate p-values for the
regression coefficients in PLS regression. Most practitioners use
cross-validation to estimate the Root Mean Squared Error of Prediction
(RMSEP) and use that as a measure of the quality of the fit. PLS
regression is typically used when you have many (hundreds, thousands,
tens of thousands) of predictors, where individual p-values are not
very useful.

The pls package does implement the jackknife to estimate the
variance/standard error of the regression coefficients. There is even a
function to calculate p-values from that, but please _do_ read the
warning in the documentation: the distribution of the t values used in
the test is _unknown_. See the example in ?jack.test for how to use the
jackknife.

> I have used the following function to calculate PLS
> fit1 <- mvr(formula = Y ~ X1 + X2 + X3 + X4, data = Dataset, comp = 4)

From a previous message on this list, I see that each of these
predictor terms (X1, ...) is a vector. Thus you have only 4 predictor
variables, so it would probably be better to use Ordinary Least Squares
(OLS) regression (the lm() function in R). There you get p-values
automatically. Furthermore, a PLS regression with the same number of
components as predictor variables is equivalent to OLS, so there seems
to be no reason to use PLS at all in your case.

-- 
Cheers,
Bjørn-Helge Mevik
Re: [R] Help with plotting plsr loadings
Amit Patel amitrh...@yahoo.co.uk writes:

> plot(BHPLS1, "loadings", comps = 1:2, legendpos = "topleft",
>      labels = "numbers", xlab = "nm")
> Error in loadingplot.default(x, ...) :
>   Could not convert variable names to numbers.
> str(BHPLS1_Loadings)
>  loadings [1:8892, 1:60] -0.00717 0.00414 0.02611 0.00468 -0.00676 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:8892] "PCIList1" "PCIList2" "PCIList3" "PCIList4" ...
>   ..$ : chr [1:60] "Comp 1" "Comp 2" "Comp 3" "Comp 4" ...
>  - attr(*, "explvar")= Named num [1:60] 2.67 4.14 4.41 3.55 2.59 ...
>   ..- attr(*, "names")= chr [1:60] "Comp 1" "Comp 2" "Comp 3" "Comp 4" ...
> Can anyone see the problem??

By using labels = "numbers", you are asking the plot function to
convert the names "PCIList1", "PCIList2", "PCIList3", "PCIList4", ...
to numbers. It doesn't know how to do that. (See ?loadingplot for the
details.)

Your options are using labels = "names", providing your own labels, not
using the 'labels' argument, or converting the names manually.

-- 
Bjørn-Helge Mevik
Re: [R] Help with PLSR with jack knife
Amit Patel amitrh...@yahoo.co.uk writes:

> BHPLS1 <- plsr(GroupingList ~ PCIList, ncomp = 10, data = PLSdata,
>                validation = "LOO")
> and
> BHPLS1 <- plsr(GroupingList ~ PCIList, ncomp = 10, data = PLSdata,
>                validation = "CV")
> [...]
> Now I am unsure of how to utilise these to identify the significant
> variables.

You can use the jackknife built into plsr() to get an indication of
significant variables, by adding the argument jackknife = TRUE to the
plsr() call. Use jack.test(BHPLS1) to do the test. But _PLEASE_ do read
the Warning section in ?jack.test!

-- 
Regards,
Bjørn-Helge Mevik
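A minimal sketch of the workflow, using the yarn data from pls in place
of the poster's data (data set and component count are illustrative
assumptions only):

```r
library(pls)

data(yarn)
# jackknife = TRUE only has an effect when cross-validation is used.
m <- plsr(density ~ NIR, ncomp = 4, data = yarn,
          validation = "LOO", jackknife = TRUE)

jt <- jack.test(m, ncomp = 4)
# jt holds per-coefficient jackknife standard errors, t-values and
# p-values -- remember the Warning in ?jack.test: the distribution of
# the t-values is unknown, so treat these only as indications.
head(jt$pvalues)
```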
Re: [R] help with PLSR Loadings
Amit Patel amitrh...@yahoo.co.uk writes:

> x <- loadings(BHPLS1)
> my loadings contain variable names rather than numbers.

No, they don't.

> str(x)
>  loadings [1:94727, 1:10] -0.00113 -0.03001 -0.00059 -0.00734 -0.02969 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:94727] "PCIList1" "PCIList2" "PCIList3" "PCIList4" ...
>   ..$ : chr [1:10] "Comp 1" "Comp 2" "Comp 3" "Comp 4" ...
>  - attr(*, "explvar")= Named num [1:10] 14.57 6.62 7.59 5.91 3.26 ...
>   ..- attr(*, "names")= chr [1:10] "Comp 1" "Comp 2" "Comp 3" "Comp 4" ...

Look at the first line of output. These are the values, and they are
numeric (it is a matrix). The other lines are attributes of the matrix.

> plot(BHPLS1, "loadings", comps = 1:2, legendpos = "topleft",
>      labels = "numbers", xlab = "nm")
> Error in loadingplot.default(x, ...) :
>   Could not convert variable names to numbers.

This says that loadingplot.default could not convert variable _names_
to numbers. That is not surprising, since the variable names are
"PCIList1", "PCIList2", etc., and the documentation for loadingplot
says:

  with "numbers", the variable names are converted to numbers, if
  possible. Variable names of the forms "number" or "number text"
  (where the space is optional), are handled.

So don't ask the plot function to use numbers as labels. Use e.g.
names instead: labels = "names".

Tip: It is always a good idea to read the output and error messages
very carefully.

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] Fw: Help with PLSR
Amit Patel amitrh...@yahoo.co.uk writes:

> str(FullDataListTrans)
>  num [1:40, 1:94727] 42 40.9 65 56 61.7 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:40] "X" "X.1" "X.12" "X.13" ...
>   ..$ : NULL
> I have also created a vector GroupingList which gives the group names
> for each respective sample (row).
> GroupingList
>  [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4
> [39] 4 4
> str(GroupingList)
>  int [1:40] 1 1 1 1 1 1 1 1 1 1 ...
> I am now stuck while conducting the plsr. I have tried various methods
> of creating structured lists etc. and have got nowhere. I have also
> tried many incarnations of
> BHPLS1 <- plsr(GroupingList ~ PCIList, ncomp = FeaturePresenceExpected[1],
>                data = FullDataListTrans, validation = "LOO")
> Where am I going wrong?

You are not telling us what happens (or how you tried to make
structured lists), but from your description of the data,
FullDataListTrans is a matrix with only the predictor variables, and
GroupingList is a vector with the response. The data argument of plsr()
(as of most modelling functions in R) expects a data.frame with both
response and predictor variables. Try this:

mydata <- data.frame(GroupingList = GroupingList,
                     PCIList = I(FullDataListTrans))

(The I() is to prevent R from making the columns in FullDataListTrans
separate variables in the data frame.)

BHPLS1 <- plsr(GroupingList ~ PCIList, ncomp = FeaturePresenceExpected[1],
               data = mydata, validation = "LOO")

-- 
Regards,
Bjørn-Helge Mevik
[R] R at Supercomputing 10
SC10 Disruptive Technology Preview: The First Cloud Portal to "R" and Beyond
http://www.hpcinthecloud.com/features/SC10-Disruptive-Technology-Preview--The-First-Cloud-Portal-to-R-and-Beyond-105776458.html?viewAll=y

(My apologies if this has been posted already.)

-- 
Bjørn-Helge Mevik
[R] Problems using external BLAS
I have problems building R 2.11.1 with an external BLAS. I've tried
several libraries:

# ACML:
export LD_LIBRARY_PATH=/site/VERSIONS/acml-3.6.0/gfortran64_int64/lib
BLAS='--with-blas=-L/site/VERSIONS/acml-3.6.0/gfortran64_int64/lib -lacml'
LAPACK='--with-lapack'

# MKL 11:
BLAS='--with-blas=-L/site/VERSIONS/intel-11.1/mkl/lib/em64t -lmkl_gf_lp64 -lmkl_sequential -lmkl_lapack -lmkl_core'
LAPACK='--with-lapack'

# MKL 8.1, trad. way:
BLAS='--with-blas=-L/site/intel/cmkl/8.1/lib/em64t -lmkl -lvml -lguide -lpthread'
LAPACK='--with-lapack'

I configure R like this:

export CFLAGS='-O3 -mtune=opteron'
export FFLAGS='-O3 -mtune=opteron'
export CXXFLAGS='-O3 -mtune=opteron'
export FCFLAGS='-O3 -mtune=opteron'
./configure --prefix=/site/VERSIONS/R-2.11.1 \
    $BLAS $LAPACK \
    --enable-R-shlib

In all cases, I get

configure:29120: checking whether double complex BLAS can be used
configure:29206: result: no

The conftestf.f and conftest.c seem to compile fine, but the exit
status from conftest in line 29181 of configure is nonzero.

This is on a Quad-Core AMD Opteron node running CentOS 5.2, with gcc
and gfortran version 4.1.2 20071124 (Red Hat 4.1.2-42).

(I have also tried without the *FLAGS variables, and without
--with-lapack. The result is the same.)

(We have successfully built older versions of R with MKL 8.1 earlier,
but with Intel compilers v. 10.1, using

./configure --prefix=/site/VERSIONS/R-$version \
    --with-blas='-L/site/intel/cmkl/8.1/lib/em64t -lmkl -lvml -lguide -lpthread' \
    --with-lapack='-L/site/intel/cmkl/8.1/lib/em64t -lmkl_lapack64 -lmkl' \
    --enable-R-shlib

but we wanted to switch to gcc because not all R packages compile with
icc.)

Does anyone have any idea about what could be wrong?

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] R2 function from PLS to use a model on test data
Addi Wei addi...@gmail.com writes:

> Hello, I am having some trouble using a model I created from plsr (on
> the training data) to analyze each individual R^2 of the 10
> components against the test data. For example:
> mice1 <- plsr(response ~ factors, ncomp = 10, data = MiceTrain)
> R2(mice1)  ## this provides the correct R2 for the Train data for 10 components
> ## Now my next objective is to calculate my model's R2 for each
> component on the Test data. (In other words -- test how good the
> model is on test data.) All I need is the MiceTest response, to
> compare with predict(mice1, ncomp = 1, newdata = MiceTest), and I
> should be able to calculate R2, but I can't figure out the correct
> command to do this. I tried the command below, which does provide a
> different R2 response; however, I'm not sure it is correct, as I get
> a different R^2 value from another software, MOE (Molecular Operating
> Environment).
> R2(mice1, estimate = "test", MiceTest)
> Is the above the correct code to achieve what I'm doing? If so, then
> MOE probably uses a different function to calculate the model
> component's R^2 for Test data.

That is the way to get test set R^2 for PLSR/PCR models, yes. If you
read the documentation of R2, you will find:

  The R^2 values returned by 'R2' are calculated as 1 - SSE/SST, where
  SST is the (corrected) total sum of squares of the response, and SSE
  is the sum of squared errors for either the fitted values (i.e., the
  residual sum of squares), test set predictions or cross-validated
  predictions (i.e., the PRESS).

This is, AFAIK, the most common way to define R^2. For training data,
this is equivalent to cor(y, yhat)^2, but not for test data or
cross-validation. From your second email, I would guess that MOE uses
cor(y, yhat)^2 instead of 1 - SSE/SST.

-- 
Bjørn-Helge Mevik
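The gap between the two definitions is easy to demonstrate on a
held-out set. A sketch using the train/test split built into the yarn
data from pls (data set and component count are illustrative
assumptions, not the poster's setup):

```r
library(pls)

data(yarn)
train <- yarn[yarn$train, ]
test  <- yarn[!yarn$train, ]

m    <- plsr(density ~ NIR, ncomp = 4, data = train)
yhat <- drop(predict(m, ncomp = 4, newdata = test))
y    <- test$density

# R2()-style definition: 1 - SSE/SST.
1 - sum((y - yhat)^2) / sum((y - mean(y))^2)

# Squared correlation -- generally a different number on test data.
cor(y, yhat)^2
```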
Re: [R] package(pls) - extracting explained Y-variance
Christian Jebsen jeb...@rz.uni-leipzig.de writes:

> Dear R-help users, I'd like to use the R package pls and want to
> extract the explained Y-variance to identify the important (PLS)
> principal components in my model, related to the y-data. For
> explained X-variance there is a function: explvar(). If I understand
> it right, the summary() function gives an overview where the
> Y-variance is shown, but I can't extract it for plotting.

If you look at the summary function (summary.mvr), you will see that it
uses the R2 function for this:

yve <- 100 * drop(R2(object, estimate = "train", intercept = FALSE)$val)

(For cross-validated or test set validated models, it uses RMSEP.)

-- 
Bjørn-Helge Mevik
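So the same quantity can be extracted directly from any fitted model
and plotted; a small sketch on the yarn data (the model itself is an
arbitrary illustration):

```r
library(pls)

data(yarn)
m <- plsr(density ~ NIR, ncomp = 4, data = yarn)

# Cumulative % of Y-variance explained per number of components --
# the same figures that summary(m) prints.
yve <- 100 * drop(R2(m, estimate = "train", intercept = FALSE)$val)
plot(yve, type = "b",
     xlab = "Number of components", ylab = "% Y-variance explained")
```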
Re: [R] Gasoline Data in pls package
Ravi Ramaswamy raram...@gmail.com writes:

> I am using the pls package for some PCR computations. There is a data
> set called gasoline. Would someone be able to tell me what command(s)
> could be used to produce this graph in R?

I presume you are talking about Figure 1 in the pls article in R News
2006/3 [1]. The plot was produced with the following commands:

data(gasoline)
par(mar = c(2, 4, 1, 0) + 0.1)
matplot(t(gasoline$NIR), type = "l", ylab = "log(1/R)", xaxt = "n")
ind <- pretty(seq(from = 900, to = 1700, by = 2))
ind <- ind[ind >= 900 & ind <= 1700]
ind <- (ind - 898) / 2
axis(1, ind, colnames(gasoline$NIR)[ind])

> I am not sure where the log(1/R) -- the Y-axis label -- is coming from

The measurements in the NIR matrix are log(1/reflectance), hence the
label "log(1/R)". This is how the data was published by Kalivas [2],
and is a standard way of representing Near Infrared Reflectance
measurements.

[1] http://cran.r-project.org/doc/Rnews/Rnews_2006-3.pdf
[2] J. H. Kalivas. Two data sets of near infrared spectra. Chemometrics
and Intelligent Laboratory Systems, 37:255-259, 1997.

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] cross-validation in plsr package
Peter Tillmann peter.tillm...@t-online.de writes:

> Can anyone give an example of how to use cross-validation in the plsr
> package?

There are examples in the references cited on
http://mevik.net/work/software/pls.html

> I fail to find the number of factors proposed by cross-validation as
> the optimum.

The cross-validation in the pls package does not propose a number of
factors as the optimum; you have to select this yourself. (The reason
for this is that there is, AFAIK, no theoretically founded and widely
accepted way of doing this automatically. I'd be happy to learn
otherwise.)

-- 
Regards,
Bjørn-Helge Mevik
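In practice one usually inspects the cross-validated RMSEP curve and
picks the number of components where it levels off; a sketch using the
yarn data from pls (data set and settings chosen purely for
illustration):

```r
library(pls)

data(yarn)
m <- plsr(density ~ NIR, ncomp = 8, data = yarn, validation = "CV")

# Cross-validated RMSEP per number of components; look for the point
# where adding further components stops paying off.
RMSEP(m)
plot(RMSEP(m), legendpos = "topright")
```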
Re: [R] Pls package
Payam Minoofar payam.minoo...@meissner.com writes:

> I have managed to format my data into a single data frame consisting
> of two AsIs response and predictor data frames, in order to supply
> the plsr command of the pls package for principal components
> analysis. When I execute the command, however, I get this error:
> fiber1 <- plsr(respmat ~ predmat, ncomp = 1, data = inputmat,
>                validation = "LOO")
> Error in model.frame.default(formula = respmat ~ predmat, data = inputmat) :
>   invalid type (list) for variable 'respmat'
> I happen to have a lot of NAs in some of the columns. Is that the
> problem?

The underlying PLSR/PCR functions do not handle NAs, but that is
probably not the problem here. My guess is that you have done something
like

inputmat <- data.frame(respmat = I(foo), predmat = I(bar))

where foo (and perhaps bar) is a _data.frame_ (that is at least
consistent with the error message). If

sapply(inputmat, class)

produces something like

     respmat      predmat
[1,] "AsIs"       "AsIs"
[2,] "data.frame" "data.frame"

then this is certainly the case. That will not work. They should be
matrices instead of data frames, for instance by converting them like
this:

inputmat <- data.frame(respmat = I(as.matrix(foo)),
                       predmat = I(as.matrix(bar)))

As for missing values: the default behaviour of plsr is to omit cases
with missing values. This is controlled by the 'na.action' argument.
See ?na.action for details.

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] NotePad++ Syntax file
[Ricardo Rodriguez] Your XEN ICT Team webmas...@xen.net writes:

> John Kane wrote:
>> No, but have you had a look at Tinn-R? http://www.sciviews.org/Tinn-R/
> Any similar option for Mac OS X?

I guess you can use Emacs on Mac OS X.

-- 
Bjørn-Helge Mevik
Re: [R] CRAN + geography = Cranography
Barry Rowlingson b.rowling...@lancaster.ac.uk writes:

> http://www.maths.lancs.ac.uk/~rowlings/R/Cranography/

Absolutely beautiful!

> Note this is just for fun. No warranties. Maybe I should use a little
> 'R' as a marker.

That would be cool.

> Maybe I should get a life.

:-)

-- 
Bjørn-Helge Mevik
Re: [R] PLS regression on near infrared (NIR) spectra data
Paulo Ricardo Gherardi Hein phein1...@gmail.com writes:

> I am new here (since Jan 2009) and up to now I have not seen anyone
> commenting on principal component analysis and PLS regression for
> analyzing spectral information in R. Sorry, I am an R starter...
> Does anybody have any package or trick to suggest?

There is the package 'pls', with Principal Component Regression (PCR)
and Partial Least Squares Regression (PLSR). It also contains a couple
of plots that are useful for princomp() or prcomp() analyses (PCA).

-- 
Bjørn-Helge Mevik
Re: [R] PCA functions
glenn g1enn.robe...@btinternet.com writes:

> Is there a function (before I try and write it!) that allows the
> input of a covariance or correlation matrix to calculate PCA, rather
> than the actual data as in princomp()?

Yes, there is: princomp(). :-)

-- 
Bjørn-Helge Mevik
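The joke being that princomp() itself accepts a covariance (or
correlation) matrix through its 'covmat' argument; a quick sketch
(USArrests is just a stand-in source for a covariance matrix):

```r
# PCA from a covariance matrix alone -- no raw data needed.
cm <- cov(USArrests)
pc <- princomp(covmat = cm)

summary(pc)   # standard deviations and proportions of variance
loadings(pc)  # note: no scores, since no observations were supplied
```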
Re: [R] package pls
tsn4867 [EMAIL PROTECTED] writes:

> For the package pls, I need to understand the algorithm for
> simpls.fit for Partial Least Squares. I'm not sure which of the two
> simpls.fit tries to maximize when finding the weight vectors
> (loadings): Cov(Xw, y) or Cov^2(Xw, y)? Are these objective functions
> equivalent? (In some texts they use the first, and in other texts
> they use the second objective function.) I think the algorithm for
> simpls.fit is using Cov(Xw, y). Also, can you give me some references
> where they state the equivalency of the two objective functions?

The implementation in simpls.fit follows the algorithm in

  de Jong, S. (1993) SIMPLS: an alternative approach to partial least
  squares regression. _Chemometrics and Intelligent Laboratory
  Systems_, *18*, 251-263.

(up to simplifications and performance changes). I don't recall whether
the criterion was cov or cov^2, but I believe they should be identical
(up to sign).

-- 
Bjørn-Helge Mevik
Re: [R] R command line
Raphael Saldanha [EMAIL PROTECTED] writes:

> Is there a GUI for R with improvements in the command line? I'm not
> looking for buttons, menus, etc., but (more) colored syntax,
> auto-completion of commands, etc.

ESS in Emacs, perhaps?

-- 
Bjørn-Helge Mevik
Re: [R] Calculate SPE in PLS package
Stella Sim [EMAIL PROTECTED] writes:

> I want to calculate SPE (squared prediction error) in x-space. Can
> someone help? Here are my codes:
> fit.pls <- plsr(Y ~ X, data = DAT, ncomp = 3, scale = T,
>                 method = "oscorespls", validation = "CV", x = T)
> actual <- fit.pls$model$X

(The x = TRUE is not needed as long as model = TRUE (the default).
x = TRUE returns the predictors as fit.pls$x, and is included for
compatibility with lm().)

> pred <- fit.pls$scores %*% t(fit.pls$loadings)
> SPE.x <- rowSums((actual - pred)^2)
> Am I missing something here?

You are missing the mean X spectrum. See

matplot(t(pred), type = "l", lty = 1)

vs.

matplot(t(actual), type = "l", lty = 1)

The Xmeans component of fit.pls contains this, so

pred <- sweep(fit.pls$scores %*% t(fit.pls$loadings), 2,
              fit.pls$Xmeans, "+")

would give you what you want.

Note, however, that this will calculate the _fitted_ SPE, not the
cross-validated SPE. The cross-validation implemented in the pls
package does not save the cross-validated scores/loadings -- that
would consume too much memory. (Calculation of SPE within the
cross-validation routines could have been implemented, but was not.)

-- 
Regards,
Bjørn-Helge Mevik
Re: [R] Append to a vector?
Why not simply

a <- c(a, 5)

or

a <- c(a, b)

if b is another vector?

-- 
Bjørn-Helge Mevik
Re: [R] Help in using PCR
Gavin Simpson [EMAIL PROTECTED] writes:

> Ok, let's sort this out. [Not tested as I don't have your data]
> df <- data.frame(resp = cancerv1[, 408],
>                  VARS = as.matrix(cancerv1[, 2:407]))

Actually, you _do_ need an I() here:

df <- data.frame(resp = cancerv1[, 408],
                 VARS = I(as.matrix(cancerv1[, 2:407])))

otherwise data.frame() will split the matrix into single-column
variables.

-- 
Bjørn-Helge Mevik
Re: [R] Help in using PCR
Gavin Simpson [EMAIL PROTECTED] writes:

> df <- data.frame(resp = dat[, 1], VARS = I(as.matrix(dat[, 2:101])))
> class(df$VARS)
> [1] "AsIs"
> The class is "AsIs" for $VARS. But if I look at your yarn data set
> for example, the NIR component is of class "matrix":
> class(yarn$NIR)
> [1] "matrix"
> How did you achieve this?

I don't remember exactly what I did, but this will work, at least:

yarn <- data.frame(density = ..., train = ...)
yarn$NIR <- as.matrix(...)

For practical purposes, I haven't found any difference between having
the matrices with class "AsIs" and "matrix".

-- 
Bjørn-Helge Mevik
Re: [R] Help in using PCR
Gavin Simpson [EMAIL PROTECTED] writes:

> You can do this another way though, which I feel is more natural. So
> let's assume that your data frame contains columns that are named,
> and that one of these is the response variable; the remaining columns
> are the predictors. Further assume that this response is called
> 'myresp'. Then you can proceed as follows:
> cancerv1.pcr <- pcr(myresp ~ ., ncomp = 6, data = cancerv1,
>                     validation = "CV")

This works fine as long as the number of (predictor) variables is not
too large. With many variables (> 1000), R will spend a very long time
dealing with the formula.

-- 
Bjørn-Helge Mevik
Re: [R] significant variables in GPLS ?
There is little theory about significance and testing for PLSR (and, I
would guess, GPLSR). Many practitioners use jackknife variance
estimates as a basis for significance tests. Note, however, that these
variance estimates are known to be biased (in general), and their
distribution is (to my knowledge) not known. Any significance deduced
from them should therefore be regarded as merely indicative.

-- 
Bjørn-Helge Mevik
Re: [R] Different results in calculating SD of 2 numbers
Ron Michael [EMAIL PROTECTED] writes:

> Can anyone tell me why I am getting different results in calculating
> the SD of 2 numbers?
> (1.25-0.95)/2
> [1] 0.15

Because this is not the SD? Try

(1.25 - 0.95) / sqrt(2)

:-)

> sd(c(1.25, 0.95))
> [1] 0.2121320  # why is it different from 0.15?

-- 
Bjørn-Helge Mevik
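For two numbers the sample SD reduces to |x1 - x2| / sqrt(2), because
each value sits 0.15 from the mean and the divisor is n - 1 = 1; a
quick check:

```r
x <- c(1.25, 0.95)

# Sample SD: sqrt(sum((x - mean(x))^2) / (n - 1)), here with n = 2.
sd(x)                    # 0.2121320

# For n = 2 this equals |x1 - x2| / sqrt(2), not |x1 - x2| / 2.
abs(diff(x)) / sqrt(2)   # 0.2121320
```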
Re: [R] mvr error in PLS package
Gavin Simpson wrote:

> On Mon, 2007-11-26 at 09:25 -0800, Bricklemyer, Ross S wrote:
>> libs.IC.cal <- mvr(libs.IC.fmla, data = libsdata.cond.cal, ncomp = 20,
>>                    validation = "LOO", method = "oscorespls")
>> Error in colMeans(x, n, prod(dn), na.rm) : 'x' must be numeric
>> There are many 0 for this soil property. Could this cause the error?
>
> Without having the data (or a small example thereof) it is impossible
> to tell.

It would also be nice to know which version of the package you are using. :-)

> To start, try str(libsdata.cond.cal) and check that the variables
> referenced in your formula object (which is what I presume libs.IC.fmla
> is?) are all numeric and haven't been coded as factors or characters or
> something strange.

Actually, as of version 2.0-0, mvr() et al. should cope with factors without problems. They will be coded just as in lm().

Another thing to try is to call traceback() just after receiving the error message. That might tell you more about _where_ the error occurred.

-- Bjørn-Helge Mevik
[R] [R-pkgs] pls version 2.1-0
Version 2.1-0 of the pls package is now available on CRAN. The pls package implements partial least squares regression (PLSR) and principal component regression (PCR).

Features of the package include

- Several PLSR algorithms: orthogonal scores, kernel PLS, wide kernel PLS, and SIMPLS
- Flexible cross-validation
- A formula interface, with traditional methods like predict, coef, plot and summary
- Functions for extraction of scores and loadings, and calculation of (R)MSEP and R^2
- Functions for plotting predictions, validation statistics, coefficients, scores, loadings, and correlation loadings

The main changes since 2.0-0 are

- Jackknife variance estimation of regression coefficients has been added.
- The `wide kernel' PLS algorithm has been implemented. It is faster than the other algorithms for very wide data.
- The definition of R^2 has been changed to 1 - SSE/SST for all estimators, so R2() will give different results for test sets and cross-validation compared to pls 2.0-0. Also, the internal calculations have been reorganised.
- The plot functions for coefficients, predictions and validation results (R2, (R)MSEP) have gained an argument `main' to set the main title of the plot.
- Plots that go over several pages now only set `par(ask = TRUE)' if the plot device is interactive (suggested by Kevin Wright).
- mvr() and mvrCv() now check for near-zero standard deviation when autoscaling (`scale = TRUE').

See the file CHANGES in the sources for all changes.

-- Bjørn-Helge Mevik and Ron Wehrens

___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] Compute R2 and Q2 in PLS with pls.pcr package
Ana Conesa wrote:

> I am using the mvr function of the package pls.pcr to compute PLS
> regression

You should consider switching to the package 'pls'. It supersedes 'pls.pcr', which is no longer maintained (the last version came in 2005). In pls, you would do the following to get R^2 and cross-validated R^2 (a.k.a. Q^2):

mypls <- plsr(Ytrain ~ Xtrain, ncomp = 1, validation = "LOO")
## R^2:
R2(mypls, estimate = "train")
## Cross-validated R^2:
R2(mypls)
## Both:
R2(mypls, estimate = "all")

-- Bjørn-Helge Mevik
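Since Ytrain and Xtrain above are the poster's own objects, here is a self-contained version using the yarn data shipped with pls (assuming the package is installed); estimate = "CV" requests the cross-validated R^2, i.e. Q^2:

```r
library(pls)
data(yarn)

mypls <- plsr(density ~ NIR, ncomp = 1, data = yarn, validation = "LOO")

R2(mypls, estimate = "train")  # fitted R^2
R2(mypls, estimate = "CV")     # cross-validated R^2 (Q^2)
R2(mypls, estimate = "all")    # both at once
```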
Re: [R] Who uses R?
(Ted Harding) wrote:

> Pat Altham (now retired) developed extensive teaching (and other)
> materials in R at the Cambridge University Statistical Laboratory.
> From her personal web page:
>
> "Some of the computer languages I have had to try to learn since
> graduating in 1964: Cambridge autocode, algol, phoenix, machine-code,
> Fortran, BBC-Basic, GLIM, GENSTAT, Linux, S-Plus and finally (probably
> the best so far!) R."

Well, calling Linux a computer language will probably not add too much credibility to the quote(r). :-)

-- Bjørn-Helge Mevik
Re: [R] What is RDA file and how to open it in R program?
Jittima Piriyapongsa wrote:

> I have a set of gene expression data in a .RDA file. I have downloaded
> Bioconductor and R for analyzing these data. However, I am not sure how
> to open this RDA file in R (what is the command?) in order to look at
> the data.

load("filename.RDA")

(.RDA (or .rda) is short for .RData (or .rdata :-). It is the usual file format for saving R objects to file (with save() or save.image()).)

> And which package should I use for analyzing it, e.g. plot the
> expression image?

That depends entirely on what is inside the file. The best idea is probably to ask the one(s) who created the file.

-- Bjørn-Helge Mevik
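A minimal round trip showing how save() and load() interact; note that load() restores the objects under their original names and (invisibly) returns those names, which is handy when you do not know what a file contains:

```r
f <- tempfile(fileext = ".RDA")

x <- rnorm(5)
save(x, file = f)   # write the object to an .RDA file
rm(x)

loaded <- load(f)   # restores 'x' into the workspace
loaded              # the names of the restored objects: "x"
str(x)              # the object itself, back under its original name

unlink(f)           # clean up the temporary file
```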