[R] MOOC on Statistical Learning with R

2013-11-30 Thread Trevor Hastie
Rob Tibshirani and I are offering a MOOC in January on Statistical Learning.
This “massive open online course” is free, and is based entirely on our new book
“An Introduction to Statistical Learning, with Applications in R”
(James, Witten, Hastie and Tibshirani, 2013, Springer).
http://www-bcf.usc.edu/~gareth/ISL/
The pdf of the book will also be free.

The course, hosted on Open edX, consists of video lecture segments, quizzes, 
video R sessions, interviews with famous statisticians,
lecture notes, and more. The course starts on January 22 and runs for 10 weeks.

Please consult the course webpage http://statlearning.class.stanford.edu/ to
enroll and for further details.
 

  Trevor Hastie   has...@stanford.edu  
  Professor, Department of Statistics, Stanford University
  Phone: (650) 725-2231 Fax: (650) 725-8977  
  URL: http://www.stanford.edu/~hastie  
   address: room 104, Department of Statistics, Sequoia Hall
   390 Serra Mall, Stanford University, CA 94305-4065  
 


[R] [R-pkgs] Some improvements in gam package

2013-08-11 Thread Trevor Hastie
I have posted a new version of the gam package, gam_1.09, to CRAN.
This update improves the step.gam function considerably,
and gives it a parallel option.
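
Here is a hedged sketch of the new option on toy data. I am assuming that
parallel is a logical argument and that a foreach backend (e.g. via the
doParallel package) should be registered first; check the step.gam
documentation for the details.

library(gam)
library(doParallel)
registerDoParallel(cores = 2)   # assumed backend for parallel = TRUE
set.seed(1)
d <- data.frame(x = runif(100), z = runif(100))
d$y <- sin(3 * d$x) + rnorm(100, sd = 0.2)
fit <- gam(y ~ x + z, data = d)
# scope lists candidate forms for each term, from linear to smooth
sfit <- step.gam(fit,
                 scope = list("x" = ~1 + x + s(x, 4),
                              "z" = ~1 + z + s(z, 4)),
                 parallel = TRUE)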

I am posting this update announcement along with the original package
announcement below, which may be of interest to those new to the list.

Trevor Hastie

Begin forwarded message:

 From: Trevor Hastie has...@stanford.edu
 Subject: gam --- a new contributed package
 Date: August 6, 2004 10:35:36 AM PDT
 To: r-packa...@stat.math.ethz.ch
 
 I have contributed a gam library to CRAN,
 which implements Generalized Additive Models.
 
 This implementation follows closely the description in
 the GAM chapter (chapter 7) of the white book Statistical Models in S
 (Chambers & Hastie (eds), 1992, Wadsworth), as well as the philosophy
 in Generalized Additive Models (Hastie & Tibshirani, 1990, Chapman and
 Hall). Hence it behaves pretty much like the Splus version of GAM.
 
 Note: this gam library and functions therein are different from the
 gam function in package mgcv, and both libraries should not be used
 simultaneously.
 
 The gam library allows both local regression (loess) and smoothing
 spline smoothers, and uses backfitting and local scoring to fit gams.
 It also allows users to supply their own smoothing methods which can
 then be included in gam fits.
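 
 A minimal sketch of the two built-in smoother types on toy data (my own
 illustration, not part of the original announcement):
 
 library(gam)
 set.seed(1)
 n  <- 200
 x1 <- runif(n); x2 <- runif(n)
 y  <- sin(2 * pi * x1) + x2^2 + rnorm(n, sd = 0.3)
 fit <- gam(y ~ s(x1, df = 4) + lo(x2, span = 0.5))  # spline term + loess term
 summary(fit)
 plot(fit, se = TRUE)   # partial plots with standard-error bands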
 
 The gam function in mgcv uses only smoothing spline smoothers, with a
 focus on automatic parameter selection via gcv. 
 
 Some of the features of the gam library:
 
 * full compatibility with the R functions glm and lm - a fitted gam
  inherits from class glm and lm
 
 * print, summary, anova, predict and plot methods are provided, as
  well as the usual extractor methods like coefficients, residuals etc
 
 * the method step.gam provides a flexible and customizable approach to
  model selection. 
 
 Some differences with the Splus version of gam:
 
 * predictions with new data are improved, without need for the
  safe.predict.gam function. This was partly facilitated by
  the improved prediction strategy used in R for GLMs and LMs
 
 * Currently the only backfitting algorithm is all.wam. In earlier
  versions of gam, dedicated Fortran routines fit models that had only
  smoothing-spline terms (s.wam) or only local-regression terms
  (lo.wam), which in fact made calls back to Splus to update the
  working response and weights. These were designed for efficiency. It
  seems now, with much faster computers, that this efficiency is no
  longer needed, and all.wam is modular and visible.
 
 
 



[R] [R-pkgs] glmnet webinar Friday May 3 at 10am PDT

2013-04-25 Thread Trevor Hastie
I will be giving a webinar on glmnet on Friday May 3, 2013, at 10am PDT
(Pacific Daylight Time).
The one-hour webinar will consist of:

- Intro to lasso and elastic net regularization, and coefficient paths 
- Why glmnet is so efficient and flexible 
- New features of the latest version of glmnet 
- Live glmnet demonstration 
- Question and Answer period 

To sign up for the webinar, please go to
https://www3.gotomeeting.com/register/77950

The webinar is hosted by the Orange County R User Group, and will be
moderated by its president, Ray DiGiacomo.


 

 


[R] [R-pkgs] softImpute_1.0 uploaded to CRAN

2013-04-02 Thread Trevor Hastie
softImpute is a new package for matrix completion, i.e. for imputing
missing values in matrices. It was written by Rahul Mazumder and myself.
softImpute uses squared-error loss with nuclear-norm regularization (one
can think of it as the lasso for matrix approximation) to find a low-rank
approximation to the observed entries in the matrix. This low-rank
approximation is then used to impute the missing entries.

softImpute works in an EM-like fashion. Given a current guess, it fills in
the missing entries. It then computes a soft-thresholded SVD of this
completed matrix, which yields the next guess. These steps are iterated
until convergence to the solution of the convex optimization problem.

The algorithm can work with large matrices, such as the Netflix matrix
(400K x 20K), by making heavy use of the sparse-matrix methods in the
Matrix package. It creates new S4 classes, such as Incomplete for storing
the large data matrix and SparseplusLowRank for representing the completed
matrix. SVD computations are done using a specially built
block-alternating algorithm, svd.als, that exploits these structures and
uses warm starts.


Some of the methods used are described in
Rahul Mazumder, Trevor Hastie and Rob Tibshirani:
"Spectral Regularization Algorithms for Learning Large Incomplete
Matrices", JMLR 11 (2010), 2287-2322.

Other, newer and more efficient methods, which interweave the alternating
block algorithm steps with imputation steps, will be described in a
forthcoming article.

Some of the features of softImpute:

* works with large matrices using sparse-matrix methods, or with smaller
  matrices using standard SVD methods.
* one can control the maximum rank of the solution, to avoid overly
  expensive operations.
* warm starts can be used to move from one solution to a new solution with
  a different value of the nuclear-norm regularization parameter lambda
  (and/or a different rank).
* with lambda=0 and a specified rank, one automatically gets an
  implementation of hardImpute - iterative SVD imputation.
* softImpute has an option "type", which can be "svd" or "als" (alternating
  least squares), specifying which of the two approaches above should be
  used.
* included in the package is svd.als, an efficient rank-restricted SVD
  algorithm that can exploit sparsity and other special structure, and
  accepts warm starts.
* a function biScale is provided for centering and scaling both the rows
  and columns of a matrix to have mean zero and variance one. The centering
  and scaling constants are stored on the object. For sparse matrices with
  centering, the centered object is stored in SparseplusLowRank form to
  preserve its special structure.
* prediction functions impute and complete are provided (see the sketch
  below).
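
A minimal usage sketch on a toy matrix (my example; the argument values
are illustrative only):

library(softImpute)
set.seed(1)
x <- matrix(rnorm(30 * 20), 30, 20)
x[sample(length(x), 150)] <- NA            # make 25% of the entries missing
fit <- softImpute(x, rank.max = 5, lambda = 1, type = "als")
xhat <- complete(x, fit)                   # impute the missing entries
sum(is.na(xhat))                           # 0: all entries filled in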


Trevor Hastie
 



[R] [R-pkgs] glmnet 1.9-3 uploaded to CRAN (with intercept option)

2013-03-01 Thread Trevor Hastie
This update adds an intercept option (by popular request): one can now fit
a model without an intercept.
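
A quick sketch on toy data (my illustration):

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)
fit0 <- glmnet(x, y, intercept = FALSE)  # new: suppress the intercept
coef(fit0, s = 0.1)[1, ]                 # the intercept entry is exactly zero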

Glmnet is a package that fits the regularization path for a number of
generalized linear models, with elastic net regularization (a tunable
mixture of L1 and L2 penalties). Glmnet uses pathwise coordinate descent,
and is very fast.

The current list of models covered is:

least squares linear regression
binary logistic regression
multinomial logistic regression (grouped and ungrouped)
Poisson regression
multi-response linear regression (grouped)
Cox proportional-hazards model


Some of the features of glmnet:

* By default it computes the path at 100 uniformly spaced (on the log
  scale) values of the regularization parameter lambda. Alternatively,
  users can provide their own values of lambda.
* Recognizes and exploits sparse input matrices (a la the Matrix package;
  this feature is not yet implemented for the Cox family).
* Coefficient matrices are output in sparse-matrix representation.
* Penalty is (1-a)*||beta||_2^2 + a*||beta||_1, where a is between 0 and 1;
  a=1 is the lasso penalty, a=0 is the ridge penalty. For many correlated
  predictors, a=0.95 or thereabouts improves the performance of the lasso.
* Convenient predict, plot, print, and coef methods.
* Variable-wise penalty modulation allows each variable to be penalized by
  a scalable amount; if zero, that variable always enters.
* Some variables can be excluded (a convenience option).
* Glmnet uses a symmetric parametrization for multinomial, with constraints
  enforced by the penalization. When the "grouped" option is used, it
  selects in or out all the class coefficients for a variable together.
* A comprehensive set of cross-validation routines is provided for all
  models and several error measures; these include deviance, mean absolute
  error, misclassification error and AUC for logistic or multinomial
  models.
* Offsets and weights can be provided for all models.
* Upper and lower bounds can be imposed on each of the coefficients.
* An intercept option allows models to be fit with or without intercepts.
* A standardize option allows for variable standardization.
* A number of control parameters can be set in the calling function. In
  addition, a function glmnet.control allows users to set some internal
  control variables for the entire session.
* Uses strong rules for speeding up convergence (by temporarily limiting
  the active set).

Examples of glmnet speed trials:
Newsgroup data: N=11,000, p=0.75 million, two-class logistic, 100 values
along the lasso path. Time = 2 mins.
14-class cancer data: N=144, p=16K, 14-class multinomial, 100 values along
the lasso path. Time = 30 secs.

Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani and  Noah Simon

References:
 Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization
 Paths for Generalized Linear Models via Coordinate Descent,
 Journal of Statistical Software, Vol. 33(1), 1-22.
 http://www.jstatsoft.org/v33/i01/
 http://www.stanford.edu/~hastie/Papers/glmnet.pdf

 Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011)
 Regularization Paths for Cox's Proportional Hazards Model via
 Coordinate Descent, Journal of Statistical Software, Vol. 39(5),
 1-13. http://www.jstatsoft.org/v39/i05/

 Tibshirani, Robert, Bien, J., Friedman, J., Hastie, T., Simon, N.,
 Taylor, J. and Tibshirani, Ryan (2010) Strong Rules for Discarding
 Predictors in Lasso-type Problems.
 http://www-stat.stanford.edu/~tibs/ftp/strong.pdf

 



[R] [R-pkgs] glmnet_1.9-1 submitted to CRAN

2013-02-10 Thread Trevor Hastie
This new version of glmnet has some bug fixes and some new features.

* New arguments lower.limits=-Inf and upper.limits=Inf (defaults shown)
for all the coefficients in glmnet. Users can provide limits on the
coefficients; see the documentation for glmnet. Typical usage:

glmnet(x, y, lower = 0)

Here the argument name is abbreviated, and by giving a single value, the
same limit is used for all coefficients. This fits a "positive lasso".
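
A self-contained sketch on toy data (my illustration):

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- rowSums(x[, 1:3]) + rnorm(100)
fit <- glmnet(x, y, lower.limits = 0)    # all coefficients constrained >= 0
min(as.matrix(coef(fit))[-1, ])          # no negative coefficients on the path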

* A new function glmnet.control() allows one to set internal parameters in
glmnet that were previously not under user control. These are for
knowledgeable users. Once changed, the settings persist for the session.
glmnet.control has a useful factory=TRUE argument, which resets the
factory defaults.
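
For example (fdev, the minimum fractional change in deviance used to stop
the path early, is one such internal parameter, if I read the
documentation correctly):

library(glmnet)
glmnet.control(fdev = 0)        # e.g., make the path run to its full length
glmnet.control(factory = TRUE)  # restore all factory defaults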

* A memory bug in coxnet has been fixed.

Trevor Hastie
 



[R] [R-pkgs] glmnet_1.8-4 on CRAN

2012-12-27 Thread Trevor Hastie
This version has some minor bug fixes, plus some new features.

* The exact=TRUE option in the predict and coef methods now works.

In earlier versions of glmnet, if you supplied a value of s different from
the sequence of lambdas used to compute the fit, predict used
interpolation. This is exact for the lasso (alpha=1) with
family="gaussian", and an approximation otherwise. Outside the range it
used the closest member of the range. The most frequently requested value
was typically s=0, and that was (a) never in the range, and (b) always a
little off.

Now predict.glmnet returns the exact values.
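
For instance (a sketch on toy data; note that newer releases of glmnet
additionally require the original x and y to be re-supplied for the exact
refit, as shown):

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- rnorm(100)
fit <- glmnet(x, y)
# s = 0 is rarely in the computed lambda sequence; exact = TRUE refits
# at s = 0 instead of interpolating
coef(fit, s = 0, exact = TRUE, x = x, y = y)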

In case you missed earlier announcements, glmnet now has additional
families.

* "mgaussian" is a multi-response Gaussian model that uses a group-lasso
penalty on the set of coefficients for each predictor.

* For the multinomial family, there is an additional argument
type.multinomial=c("ungrouped","grouped"). In the grouped case, a
group-lasso penalty is again used on the set of class coefficients for a
predictor. Both are sketched below.
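
Hedged sketches of both families on toy data (my own illustrations):

library(glmnet)
set.seed(1)
x  <- matrix(rnorm(150 * 20), 150, 20)
ym <- cbind(rnorm(150), rnorm(150))          # a matrix of two responses
fitm <- glmnet(x, ym, family = "mgaussian")  # group lasso across responses

g <- factor(sample(1:3, 150, replace = TRUE))
fitg <- glmnet(x, g, family = "multinomial",
               type.multinomial = "grouped") # class coefficients in/out together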


Trevor Hastie

 



[R] Glmnet_1.8 uploaded to CRAN

2012-07-02 Thread Trevor Hastie
This is a major revision, with two additional models included.

1) Multiresponse regression: family="mgaussian".
Here we have a matrix of M responses, and we fit a series of linear models
in parallel. We use a group-lasso penalty on the set of M coefficients for
each variable, which means they are all in or out together.

2) family="multinomial" with type.multinomial="grouped".
The same story for multinomial regression: the group-lasso penalty ensures
that all the coefficients for a variable are in or out for every class at
the same time. We have left the default as type.multinomial="ungrouped",
because the grouped version is currently about 10 times slower. We will be
looking to improve this.

Thanks to Noah Simon for his work on developing the algorithms for
both these options. A report is in the works.


 



[R] glmnet_1.7.4

2012-04-26 Thread Trevor Hastie
A new version of glmnet has been uploaded to CRAN. This should take care
of the problem on PCs that caused it to fail there. Many thanks to
B. Narasimhan for his stoic efforts in debugging this problem, which was a
real nasty idiosyncrasy in the gfortran compiler that exists on Windows
but not on Linux or MacOS platforms.


 



[R] glmnet_1.7.3 on windows

2012-04-23 Thread Trevor Hastie
We are aware that glmnet_1.7.3 does not pass checks on Windows, and we
are looking into the problem. It has something to do with the gcc
compiler being slightly different on Windows versus Linux/Mac platforms.
As soon as we have resolved the issue, we will post a new version to CRAN.

Trevor Hastie
 



[R] [R-pkgs] sparsenet: a new package for sparse model selection

2012-03-06 Thread Trevor Hastie
We have put a new package, sparsenet, on CRAN.

Sparsenet fits regularization paths for sparse model selection via
coordinate descent, using a penalized least-squares framework and a
non-convex penalty.

The package is based on our JASA paper:
Rahul Mazumder, Jerome Friedman and Trevor Hastie: SparseNet: Coordinate
Descent with Non-Convex Penalties. JASA 2011.
http://www.stanford.edu/~hastie/Papers/Sparsenet/jasa_MFH_final.pdf

We use Zhang's MC+ penalty to impose sparsity in model selection. This
penalty parametrizes a family ranging between L1 and L0 regularization.
One nice feature of this family is that the single-coordinate optimization
problems are convex, making it ideal for coordinate descent.

The package fits the regularization surface for each parameter: a surface
over the two-dimensional space of tuning parameters. The concavity
parameter gamma indexes the member of the family, and lambda is the usual
Lagrange penalty parameter, which determines the strength of the penalty.

Sparsenet is extremely fast. For example, with 10K variables and 1K
samples, the entire surface with 10 values of gamma and 50 values of
lambda takes under a second on a MacBook Pro.

The package includes functions for fitting, plotting and cross-validating
the models, as well as methods for prediction. A minimal sketch follows.
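
(Toy data; my illustration:)

library(sparsenet)
set.seed(1)
x <- matrix(rnorm(100 * 50), 100, 50)
y <- x[, 1] - x[, 2] + rnorm(100)
fit <- sparsenet(x, y)        # a lambda path for each value of gamma
plot(fit)                     # one coefficient-path panel per gamma
cvfit <- cv.sparsenet(x, y)   # cross-validate over the (gamma, lambda) surface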

Trevor Hastie, with Jerome Friedman and Rahul Mazumder

 


Re: [R] differences between 1.7 and 1.7.1 glmnet versions

2011-12-29 Thread Trevor Hastie
I have just started using changelogs, and am clearly not disciplined
enough at it. The big change that occurred was the convergence criterion,
which would account for the difference. At some point I will put up
details of this.
Trevor Hastie
On Dec 26, 2011, at 11:55 PM, Damjan Krstajic wrote:

 
 Dear All,
  
 I have found differences between glmnet versions 1.7 and 1.7.1 which, in
 my opinion, are not cosmetic and do not appear in the ChangeLog. If I am
 not mistaken, glmnet appears to return a different number of selected
 input variables, i.e. nonzeroCoef(fit$beta[[1]]) differs between
 versions. The code below is the same for 1.7.1 and 1.7, but you can see
 that the outputs differ. I would automatically use the latest version,
 but looking at the ChangeLog I wonder if this is a bug or expected
 behaviour, as this change is not documented.
 
 Thanks in advance.
 DK
 
 # glmnet 1.7.1
 library(glmnet)
 Loading required package: Matrix
 Loading required package: lattice
 Loaded glmnet 1.7.1
 set.seed(1)
 x=matrix(rnorm(40*500),40,500)
 g4=sample(1:7,40,replace=TRUE)
 fit=glmnet(x,g4,family="multinomial",alpha=0.1)
 dgcBeta <- fit$beta[[1]]
 which=nonzeroCoef(dgcBeta)
 which
 [1]   1  12  15  17  19  20  34  39  42  58  60  62  63  65  71  72 
 73  77
 [19]  80  82  85  86  95  97  98  99 106 110 113 114 119 120 123 124 
 128 130
 [37] 136 138 139 143 148 149 151 160 161 162 173 174 175 176 177 183 
 186 187
 [55] 188 190 193 194 195 198 199 204 206 218 224 238 239 240 241 245 
 247 250
 [73] 252 255 256 258 265 266 270 277 278 281 287 293 294 296 297 300 
 306 308
 [91] 311 316 317 321 326 329 336 337 341 349 354 356 363 365 368 374 
 376 377
 [109] 379 384 385 389 397 398 400 403 404 407 415 417 418 423 424 430 
 432 437
 [127] 440 442 446 450 451 454 456 459 463 467 470 472 474 478 481 488 
 496 497
 [145] 498 500
 # just to check that inputs to glmnet are the same
 g4
 [1] 5 4 5 3 2 6 1 6 6 1 3 6 1 2 6 3 7 2 6 7 6 7 5 1 3 2 2 3 2 3 3 1 5 
 6 7 4 6 3
 [39] 2 7
 x[,1]
 [1] -0.62645381  0.18364332 -0.83562861  1.59528080  0.32950777 
 -0.82046838
 [7]  0.48742905  0.73832471  0.57578135 -0.30538839  1.51178117 
 0.38984324
 [13] -0.62124058 -2.21469989  1.12493092 -0.04493361 -0.01619026  0.94383621
 [19]  0.82122120  0.59390132  0.91897737  0.78213630  0.07456498 -1.98935170
 [25]  0.61982575 -0.05612874 -0.15579551 -1.47075238 -0.47815006  0.41794156
 [31]  1.35867955 -0.10278773  0.38767161 -0.05380504 -1.37705956 -0.41499456
 [37] -0.39428995 -0.05931340  1.10002537  0.76317575
 
 
 # glmnet 1.7
 library(glmnet)
 Loading required package: Matrix
 Loading required package: lattice
 Loaded glmnet 1.7
 set.seed(1)
 x=matrix(rnorm(40*500),40,500)
 g4=sample(1:7,40,replace=TRUE)
 fit=glmnet(x,g4,family="multinomial",alpha=0.1)
 dgcBeta <- fit$beta[[1]]
 which=nonzeroCoef(dgcBeta)
 which
 [1]   1   2   3   4   6   7   8   9  10  11  12  13  14  15  16  17 
 18  19
 [19]  20  21  22  23  24  25  26  27  28  30  31  32  33  34  35  36 
 37  38
 [37]  39  41  42  43  44  45  46  47  48  50  51  52  53  54  55  56 
 57  58
 [55]  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74 
 75  76
 [73]  77  78  79  80  81  82  83  84  85  86  87  88  89  91  93  94 
 95  97
 [91]  98  99 100 101 102 104 105 106 107 109 110 111 112 113 114 115 
 116 119
 [109] 120 121 122 123 124 126 127 128 130 131 132 133 134 135 136 137 
 138 139
 [127] 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 156 
 157 159
 [145] 160 161 162 163 164 165 167 168 170 171 172 173 174 175 176 177 
 178 179
 [163] 180 181 182 183 184 185 186 187 188 189 190 191 193 194 195 196 
 197 198
 [181] 199 200 203 204 205 206 207 208 209 211 212 213 215 216 217 218 
 219 220
 [199] 221 222 223 224 225 226 227 228 229 231 232 233 234 235 236 237 
 238 239
 [217] 240 241 242 243 244 245 246 247 248 249 250 251 252 253 255 256 
 257 258
 [235] 259 261 262 263 264 265 266 268 269 270 271 272 273 274 275 276 
 277 278
 [253] 279 280 281 282 283 285 286 287 288 289 290 291 292 293 294 295 
 296 297
 [271] 298 299 300 301 302 304 305 306 307 308 309 310 311 312 313 314 
 315 316
 [289] 317 318 319 321 323 324 325 326 327 328 329 330 331 332 333 334 
 336 337
 [307] 338 339 341 342 343 344 345 346 347 348 349 350 351 352 353 354 
 355 356
 [325] 357 358 361 362 363 364 365 366 367 368 369 370 371 372 373 374 
 375 376
 [343] 377 378 379 380 381 382 384 385 386 388 389 390 393 394 395 396 
 397 398
 [361] 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 
 415 417
 [379] 418 420 421 422 423 424 425 426 427 428 429 430 432 433 434 436 
 437 438
 [397] 439 440 441 442 443 444 445 446 448 450 451 452 453 454 455 456 
 457 458
 [415] 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 
 475 476
 [433] 477 478 479 480 481 482 483 484 486 488 489 490 491 493 494 495 
 496 497
 [451] 498 499 500
 # just to check that inputs to glmnet are the same
 g4
 [1] 5 4 5 3 2 6 1 6 6 1 3 6 1 2 6 3 7 2 6 7 6

[R] [R-pkgs] svmpath_0.95 uploaded to CRAN

2011-06-08 Thread Trevor Hastie
This new version includes a plot method for plotting
a particular instance along the path.

 



[R] [R-pkgs] glmnet_1.6 uploaded to CRAN

2011-04-19 Thread Trevor Hastie
We have submitted glmnet_1.6 to CRAN

This version has an improved convergence criterion, and it also uses a
variable-screening algorithm that dramatically reduces the time to
convergence (while still producing the exact solutions). The speedups in
some cases are by factors of 20 to 50, depending on the particular problem
and loss function.

See our paper "Strong Rules for Discarding Predictors in Lasso-type
Problems" (http://www-stat.stanford.edu/~tibs/ftp/strong.pdf) for details
of this screening method.



[R] glmnet_1.5.1 uploaded to CRAN

2010-11-18 Thread Trevor Hastie
In glmnet_1.5 a poor default was set for the argument "type", which caused
the program to be very slow, or even to crash, when nvar (p) is very
large.

The argument "type" (now called "type.gaussian") has two options,
"covariance" and "naive", and is used for the default family="gaussian"
model (squared-error loss).

When type.gaussian="covariance", all inner products between variables in
the active set and all other variables are cached, which can give a
considerable speedup when nobs is large. However, when nvar is large
(>500) the matrix to be stored gets large, and this strategy becomes
counterproductive. In addition, when nvar is very large, glmnet tries to
allocate storage for this matrix that can exceed the machine's memory.

When type.gaussian="naive", nothing is cached, and inner products (a loop
over nobs) are computed whenever needed.

In this minor upgrade, the default is "covariance" if nvar<500, and
"naive" otherwise. We established this rule after conducting extensive
simulations.

In addition, the argument was renamed so as not to collide with the
argument "type" to cv.glmnet, which is now renamed to "type.measure". In
both cases, abbreviations work. A short example follows.
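
(Toy wide data; either value can also be set explicitly:)

library(glmnet)
set.seed(1)
x <- matrix(rnorm(50 * 1000), 50, 1000)
y <- rnorm(50)
# nvar > 500 here, so "naive" is now the default; shown explicitly:
fit <- glmnet(x, y, type.gaussian = "naive")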



[R] glmnet_1.5 uploaded to CRAN

2010-11-04 Thread Trevor Hastie
This is a new version of glmnet that incorporates some bug fixes and
speedups.

* a new convergence criterion, which offers 10x or more speedups for
  saturated fits (mainly affecting logistic, Poisson and Cox models)
* one can now predict directly from a cv object - see the sketch below and
  the help files for cv.glmnet and predict.cv.glmnet
* other new methods are deviance() for glmnet and coef() for cv.glmnet
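
(Toy data; my illustration:)

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
cvfit <- cv.glmnet(x, y)
predict(cvfit, newx = x[1:5, ], s = "lambda.min")  # predict from the cv object
coef(cvfit)                                        # default is s = "lambda.1se"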

Here is the description of the package.

glmnet is a package that fits the regularization path for linear, two-
and multi-class logistic regression models, Poisson regression and the
Cox model, with elastic net regularization (a tunable mixture of L1 and
L2 penalties). glmnet uses pathwise coordinate descent, and is very fast.

Some of the features of glmnet:

* by default it computes the path at 100 uniformly spaced (on the log
  scale) values of the regularization parameter
* glmnet is very fast, even for large data sets
* recognizes and exploits sparse input matrices (a la the Matrix package);
  coefficient matrices are output in sparse-matrix representation
* penalty is (1-a)*||beta||_2^2 + a*||beta||_1, where a is between 0 and 1;
  a=1 is the lasso penalty, a=0 is the ridge penalty. For many correlated
  predictors, a=0.95 or thereabouts improves the performance of the lasso
* convenient predict, plot, print, and coef methods
* variable-wise penalty modulation allows each variable to be penalized by
  a scalable amount; if zero, that variable always enters
* glmnet uses a symmetric parametrization for multinomial, with constraints
  enforced by the penalization
* a comprehensive set of cross-validation routines is provided for all
  models and several error measures
* offsets and weights can be provided for all models


Examples of glmnet speed trials:
Newsgroup data: N=11,000, p=0.75 million, two-class logistic, 100 values
along the lasso path. Time = 2 mins.
14-class cancer data: N=144, p=16K, 14-class multinomial, 100 values along
the lasso path. Time = 30 secs.

Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani.

See our paper http://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf for
implementation details and comparisons with other related software.





[R] Statistical Learning and Datamining Course October 2010 Washington DC

2010-07-12 Thread Trevor Hastie
Short course: Statistical Learning and Data Mining III:
  Ten Hot Ideas for Learning from Data

 Trevor Hastie and Robert Tibshirani, Stanford University


 Georgetown University Conference Center
 Washington DC,
 October 11-12, 2010.

 This two-day course gives a detailed overview of statistical models for
 data mining, inference and prediction. With the rapid developments in
 internet technology, genomics, financial risk modeling, and other
 high-tech industries, we rely increasingly more on data analysis and
 statistical models to exploit the vast amounts of data at our
 fingertips.

 In this course we emphasize the tools useful for tackling modern-day
 data analysis problems. From the vast array of tools available, we have
 selected what we consider the most relevant and exciting. Our
 top-ten list of topics is:

  * Regression and Logistic Regression (two golden oldies),
  * Lasso and Related Methods,
  * Support Vector and Kernel Methodology,
  * Principal Components (SVD) and Variations: sparse SVD, supervised
    PCA, Nonnegative Matrix Factorization,
  * Boosting, Random Forests and Ensemble Methods,
  * Rule based methods (PRIM),
  * Graphical Models,
  * Cross-Validation,
  * Bootstrap,
  * Feature Selection, False Discovery Rates and Permutation Tests.

 Our earlier courses are not a prerequisite for this new course. Although
 there is some overlap with past courses, our new course contains many
 topics not covered by us before.

 The material is based on recent papers by the authors and other
 researchers, as well as the new second edition of our best-selling book:

 The Elements of Statistical Learning: data mining, inference and prediction

 Hastie, Tibshirani & Friedman, Springer-Verlag, 2009

 http://www-stat.stanford.edu/ElemStatLearn/

 A copy of this book will be given to all attendees.

 The lectures will consist of video-projected presentations and
 discussion.
 Go to the site

 http://www-stat.stanford.edu/~hastie/sldm.html

 for more information and online registration.



[R] help needed with help

2010-04-30 Thread Trevor Hastie
I installed
R version 2.11.0 (2010-04-22)
on my MacBook (Snow Leopard),
and I run R from within Emacs.

Now when I try to get help with

 ?lm

I get (in the new help window):


Error in help(lm, htmlhelp = FALSE) : 
  unused argument(s) (htmlhelp = FALSE)



Help!

p.s. I am running:
This is GNU Emacs 22.2.50.1 (i386-apple-darwin9.4.0, Carbon Version 1.6.0)
 of 2008-07-17 on seijiz.local



[R] [R-pkgs] New package for ICA uploaded to CRAN

2010-04-28 Thread Trevor Hastie
I have uploaded a new package to CRAN called ProDenICA.

It fits ICA models directly via product-density estimation of the source
densities. This package was promised on page 567 of the 2nd edition of
our book 'Elements of Statistical Learning' (Hastie, Tibshirani and
Friedman, 2009, Springer). Apologies that it is so late.

The method fits each source density by a tilted Gaussian density, where
the log of the tilting function is modeled by a smoothing spline. This
function is then used as a contrast function for computing the negentropy
measure for this source component. The estimation is achieved by fitting
a Poisson GAM for each component, with the log-Gaussian as an offset. A
usage sketch follows.
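
(A hedged sketch on a toy two-source mixture; my example, and I am
assuming the default contrast GPois implements the tilted-Gaussian fit
described above - check the package documentation:)

library(ProDenICA)
set.seed(1)
s <- cbind(runif(500) - 0.5, rexp(500) - 1)   # two non-Gaussian sources
a <- matrix(c(1, 1, -1, 2), 2, 2)             # mixing matrix
x <- s %*% a
fit <- ProDenICA(x, k = 2, whiten = TRUE, Gfunc = GPois)  # assumed defaults
fit$W                                         # estimated unmixing matrix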

The method was first described in:
Hastie, T. and Tibshirani, R. (2003). Independent component analysis
through product density estimation, in S. Becker, S. Thrun and
K. Obermayer (eds), Advances in Neural Information Processing Systems 15,
MIT Press, Cambridge, MA, pp. 649-656.





[R] [R-pkgs] Major glmnet upgrade on CRAN

2010-04-04 Thread Trevor Hastie
glmnet_1.2 has been uploaded to CRAN. 

This is a major upgrade, with the following additional features:

* Poisson family, with dense or sparse x
* Cox proportional-hazards family, for dense x
* a wide range of cross-validation features: all models have several
  criteria for cross-validation, including deviance, mean absolute error,
  misclassification error and AUC for logistic or multinomial models.
  Observation weights are incorporated. (See the sketch below.)
* an offset is allowed in fitting the model
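
(Toy data; my illustration, cross-validating a logistic model on the AUC
scale. I use the current argument name type.measure; as described in the
glmnet_1.5.1 announcement above, this argument was called "type" when 1.2
was released and renamed later:)

library(glmnet)
set.seed(1)
x <- matrix(rnorm(200 * 30), 200, 30)
y <- rbinom(200, 1, plogis(x[, 1]))
cvfit <- cv.glmnet(x, y, family = "binomial", type.measure = "auc")
plot(cvfit)   # cross-validated AUC along the lambda path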

Here is the description of the package.

glmnet is a package that fits the regularization path for linear, two-
and multi-class logistic regression models, Poisson regression and the
Cox model, with elastic net regularization (a tunable mixture of L1 and
L2 penalties). glmnet uses pathwise coordinate descent, and is very fast.

Some of the features of glmnet:

* by default it computes the path at 100 uniformly spaced (on the log
  scale) values of the regularization parameter
* glmnet appears to be faster than any of the packages that are freely
  available, in some cases by two orders of magnitude
* recognizes and exploits sparse input matrices (a la the Matrix package);
  coefficient matrices are output in sparse-matrix representation
* penalty is (1-a)*||beta||_2^2 + a*||beta||_1, where a is between 0 and 1;
  a=1 is the lasso penalty, a=0 is the ridge penalty. For many correlated
  predictors, a=0.95 or thereabouts improves the performance of the lasso
* convenient predict, plot, print, and coef methods
* variable-wise penalty modulation allows each variable to be penalized by
  a scalable amount; if zero, that variable always enters
* glmnet uses a symmetric parametrization for multinomial, with constraints
  enforced by the penalization
* a comprehensive set of cross-validation routines is provided for all
  models and several error measures
* offsets and weights can be provided for all models

 
Examples of glmnet speed trials:
Newsgroup data: N=11,000, p=0.75 million, two-class logistic, 100 values
along the lasso path. Time = 2 mins.
14-class cancer data: N=144, p=16K, 14-class multinomial, 100 values along
the lasso path. Time = 30 secs.

Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani.

See our paper http://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf for
implementation details and comparisons with other related software.





[R] new version of glmnet

2009-12-19 Thread Trevor Hastie

glmnet_1.1-4 is on CRAN now.

This version includes cross-validation functions to assist in picking a
good value for lambda. These functions are preliminary, in that they can
only handle Gaussian models, or logistic models for binary data.

The complete range will appear in the future.
For those unfamiliar with glmnet, here is the original blurb:

glmnet fits lasso and elastic-net regularization paths for squared-error,
binomial and multinomial models via coordinate descent. It is extremely
fast and can work on large-scale problems. See the paper "Regularization
Paths for Generalized Linear Models via Coordinate Descent" by Friedman,
Hastie and Tibshirani, on my website, for details.

glmnet can accommodate sparse data matrices efficiently, and can thereby
handle even larger problems. For example, for a two-class logistic model
with 11K obs and 750K variables (with >99% zeros in the X matrix), glmnet
takes less than two minutes to fit the entire regularization path on a
grid of 100 values of the regularization parameter lambda. For a 14-class
gene expression dataset (144 obs, 16K vars, not sparse), it takes 15
seconds to fit the path at 100 values of lambda.



Trevor Hastie



[R] New version of package mda

2009-12-19 Thread Trevor Hastie

mda 0.1-4 is on CRAN

Many thanks to Friedrich Leisch, Kurt Hornik and Brian Ripley for  
their early work in porting the mda package
into R, and to Kurt for maintaining the package. I have taken back  
mda and will maintain it from now on.


The package fits flexible, penalized and mixture discriminant models.
For a brief introduction, see Sections 12.4-12.7 of Elements of
Statistical Learning (first or second edition).

This new version has documentation for the plot method, and has improved
functionality for the regression method gen.ridge. The "laplacian"
penalty works and is documented, for implementing penalized discriminant
analysis via the function fda(). The mars function has not changed, but
users are encouraged to use the earth package of Stephen Milborrow, which
fits MARS models; in particular, earth works as a regression method for
fda() and mda(). A small sketch follows.
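
(A sketch of the two main fitting functions on the iris data; my
illustration:)

library(mda)
ffit <- fda(Species ~ ., data = iris)   # flexible discriminant analysis
mfit <- mda(Species ~ ., data = iris)   # mixture discriminant analysis
confusion(ffit, iris)                   # training confusion matrix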



Trevor Hastie



[R] Austria, September, 2009: Statistical Learning and Data Mining Course

2009-08-05 Thread Trevor Hastie

Short course: Statistical Learning and Data Mining III:
Ten Hot Ideas for Learning from Data

Trevor Hastie and Robert Tibshirani, Stanford University

Danube University
Krems, Austria
25-26 September 2009

This two-day course gives a detailed overview of statistical models for
data mining, inference and prediction. With the rapid developments in
internet technology, genomics, financial risk modeling, and other
high-tech industries, we rely increasingly more on data analysis and
statistical models to exploit the vast amounts of data at our
fingertips.

In this course we emphasize the tools useful for tackling modern-day
data analysis problems. From the vast array of tools available, we have
selected what we consider the most relevant and exciting. Our
top-ten list of topics is:

* Regression and Logistic Regression (two golden oldies),
* Lasso and Related Methods,
* Support Vector and Kernel Methodology,
* Principal Components (SVD) and Variations: sparse SVD, supervised PCA,
  Multidimensional Scaling and Isomap, Nonnegative Matrix Factorization,
  and Local Linear Embedding,
* Boosting, Random Forests and Ensemble Methods,
* Rule based methods (PRIM),
* Graphical Models,
* Cross-Validation,
* Bootstrap,
* Feature Selection, False Discovery Rates and Permutation Tests.

The material is based on recent papers by ourselves and other
researchers, as well as the new second edition of our book:

Elements of Statistical Learning: data mining, inference and prediction

Hastie, Tibshirani & Friedman, Springer-Verlag, 2009 (second edition)

http://www-stat.stanford.edu/ElemStatLearn/

A copy of this book will be given to all attendees.
The lectures will consist of video-projected presentations and
discussion.

Visit
http://www-stat.stanford.edu/~hastie/SLDM/Austria.htm
for more information and registration instructions.





[R] Austria, September, 2009: Statistical Learning and Data Mining Course

2009-06-10 Thread Trevor Hastie

Short course: Statistical Learning and Data Mining III:
 Ten Hot Ideas for Learning from Data

Trevor Hastie and Robert Tibshirani, Stanford University

Danube University
Krems, Austria
25-26 September 2009

This two-day course gives a detailed overview of statistical models for
data mining, inference and prediction. With the rapid developments in
internet technology, genomics, financial risk modeling, and other
high-tech industries, we rely increasingly more on data analysis and
statistical models to exploit the vast amounts of data at our
fingertips.

In this course we emphasize the tools useful for tackling modern-day
data analysis problems. From the vast array of tools available, we have
selected what we consider the most relevant and exciting. Our
top-ten list of topics is:

 * Regression and Logistic Regression (two golden oldies),
 * Lasso and Related Methods,
 * Support Vector and Kernel Methodology,
 * Principal Components (SVD) and Variations: sparse SVD, supervised PCA,
   Multidimensional Scaling and Isomap, Nonnegative Matrix Factorization,
   and Local Linear Embedding,
 * Boosting, Random Forests and Ensemble Methods,
 * Rule based methods (PRIM),
 * Graphical Models,
 * Cross-Validation,
 * Bootstrap,
 * Feature Selection, False Discovery Rates and Permutation Tests.

The material is based on recent papers by ourselves and other
researchers, as well as the new second edition of our book:

Elements of Statistical Learning: data mining, inference and prediction

Hastie, Tibshirani & Friedman, Springer-Verlag, 2009 (second edition)

http://www-stat.stanford.edu/ElemStatLearn/

A copy of this book will be given to all attendees.
 The lectures will consist of video-projected presentations and
discussion.

This European edition of our course is organized by Prof. Michael G.
Schimek, who has been teaching in this field for about 10 years at
various universities in Europe.


Visit
http://www-stat.stanford.edu/~hastie/SLDM/Austria.htm
for more information and registration instructions.





[R] new version of glmnet

2009-01-24 Thread Trevor Hastie

glmnet _1.1-3 is on CRAN now.

glmnet fits lasso and elastic-net regularization paths for squared-error,
binomial and multinomial models via coordinate descent. It is extremely
fast and can work on large-scale problems. See the paper "Regularization
Paths for Generalized Linear Models via Coordinate Descent" by Friedman,
Hastie and Tibshirani, on my website, for details.

glmnet can accommodate sparse data matrices efficiently, and can thereby
handle even larger problems. For example, for a two-class logistic model
with 11K obs and 750K variables (with >99% zeros in the X matrix), glmnet
takes less than two minutes to fit the entire regularization path on a
grid of 100 values of the regularization parameter lambda. For a 14-class
gene expression dataset (144 obs, 16K vars, not sparse), it takes 15
seconds to fit the path at 100 values of lambda.

This version contains several minor fixes, as well as two more serious
fixes:

1) predict(..., type="class") was returning flipped labels for a
two-class logistic model.
2) If a weight argument was supplied to a binomial/multinomial model with
some zero weight entries, the program bombed with an unhelpful message.
Now it works as expected.

Thanks to many users, especially Tim Hesterberg, for notifying us of the
errors.


Trevor Hastie



[R] New Statistical Learning and Data Mining Course

2009-01-15 Thread Trevor Hastie

Short course: Statistical Learning and Data Mining III:
  Ten Hot Ideas for Learning from Data

 Trevor Hastie and Robert Tibshirani, Stanford University

 Sheraton Hotel
 Palo Alto, CA
 March 16-17, 2009

 This two-day course gives a detailed overview of statistical models for
 data mining, inference and prediction. With the rapid developments in
 internet technology, genomics, financial risk modeling, and other
 high-tech industries, we rely increasingly more on data analysis and
 statistical models to exploit the vast amounts of data at our
 fingertips.

 In this course we emphasize the tools useful for tackling modern-day
 data analysis problems. From the vast array of tools available, we have
 selected what we consider the most relevant and exciting. Our
 top-ten list of topics is:

  * Regression and Logistic Regression (two golden oldies),
  * Lasso and Related Methods,
  * Support Vector and Kernel Methodology,
  * Principal Components (SVD) and Variations: sparse SVD, supervised PCA,
    Multidimensional Scaling and Isomap, Nonnegative Matrix Factorization,
    and Local Linear Embedding,
  * Boosting, Random Forests and Ensemble Methods,
  * Rule based methods (PRIM),
  * Graphical Models,
  * Cross-Validation,
  * Bootstrap,
  * Feature Selection, False Discovery Rates and Permutation Tests.

 Our earlier courses are not a prerequisite for this new course. Although
 there is some overlap with past courses, our new course contains many
 topics not covered by us before.

 The material is based on recent papers by the authors and other
 researchers, as well as the new second edition of our best-selling book:

 The Elements of Statistical Learning: data mining, inference and prediction

 Hastie, Tibshirani & Friedman, Springer-Verlag, 2008

 http://www-stat.stanford.edu/ElemStatLearn/

 A copy of this book will be given to all attendees.

 The lectures will consist of video-projected presentations and
 discussion.
 Go to the site
 http://www-stat.stanford.edu/~hastie/sldm.html
 for more information and online registration.



[R] [R-pkgs] New glmnet package on CRAN

2008-06-02 Thread Trevor Hastie
glmnet is a package that fits the regularization path for linear, two-
and multi-class logistic regression models with elastic net
regularization (a tunable mixture of L1 and L2 penalties).

glmnet uses pathwise coordinate descent, and is very fast.

Some of the features of glmnet:

* by default it computes the path at 100 uniformly spaced (on the log
  scale) values of the regularization parameter
* glmnet appears to be faster than any of the packages that are freely
  available, in some cases by two orders of magnitude
* recognizes and exploits sparse input matrices (a la the Matrix package);
  coefficient matrices are output in sparse-matrix representation
* penalty is (1-a)*||beta||_2^2 + a*||beta||_1, where a is between 0 and 1;
  a=1 is the lasso penalty, a=0 is the ridge penalty. For many correlated
  predictors, a=0.95 or thereabouts improves the performance of the lasso
* convenient predict, plot, print, and coef methods
* variable-wise penalty modulation allows each variable to be penalized by
  a scalable amount; if zero, that variable always enters
* glmnet uses a symmetric parametrization for multinomial, with constraints
  enforced by the penalization


Other families, such as Poisson, might appear in later versions of glmnet.
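
A first-use sketch on toy Gaussian data (my illustration):

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)
fit <- glmnet(x, y)      # gaussian model; 100 lambda values by default
plot(fit)                # coefficient paths
coef(fit, s = 0.1)       # coefficients at a particular lambda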

Examples of glmnet speed trials:

Newsgroup data: N=11,000, p=4 million, two-class logistic, 100 values
along the lasso path. Time = 2 mins.
14-class cancer data: N=144, p=16K, 14-class multinomial, 100 values
along the lasso path. Time = 30 secs.


Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani.

See our paper http://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf for
implementation details and comparisons with other related software.



[R] Short Course: Statistical Learning and Data Mining

2008-02-08 Thread Trevor Hastie
Short course: Statistical Learning and Data Mining II:
tools for tall and wide data

Trevor Hastie and Robert Tibshirani, Stanford University

Sheraton Hotel,
Palo Alto, California,
April 3-4, 2006.

This two-day course gives a detailed overview of statistical models for
data mining, inference and prediction.  With the rapid developments
in internet technology, genomics, financial risk modeling, and other
high-tech industries, we rely increasingly more on data analysis and
statistical models to exploit the vast amounts of data at our fingertips.

This course is the third in a series, and follows our popular past
offerings "Modern Regression and Classification" and "Statistical
Learning and Data Mining".

The two earlier courses are not a prerequisite for this new course.

In this course we emphasize the tools useful for tackling modern-day
data analysis problems. We focus on both tall data (N>p, where N=#cases,
p=#features) and wide data (p>N). The tools include gradient boosting,
SVMs and kernel methods, random forests, the lasso and LARS, ridge
regression and GAMs, supervised principal components, and
cross-validation. We also present some interesting case studies in a
variety of application areas. All our examples are developed using the S
language, and most of the procedures we discuss are implemented in
publicly available R packages.

Please visit the site
http://www-stat.stanford.edu/~hastie/sldm.html
for more information and registration details.




[R] Correction: Short Course: Statistical Learning and Data Mining

2008-02-08 Thread Trevor Hastie
Apologies, my last email announcing this course
had the wrong dates. Here is the corrected header:

 Short course: Statistical Learning and Data Mining II:
tools for tall and wide data

Trevor Hastie and Robert Tibshirani, Stanford University

Sheraton Hotel,
Palo Alto, California,
March 6-7, 2008

