Re: [R] Reduced Error Logistic Regression, and R?

2007-04-26 Thread paulandpen
Further to Simon's points,

Here is what confuses me; I quote the relevant section of the claims below:

The key assumption concerns symmetrical error constraints. These symmetrical 
error constraints force a solution where the probabilities of positive and 
negative error are symmetrical across all cross product sums that are the basis 
of maximum likelihood logistic regression. As the number of independent 
variables increases, it becomes more and more likely that this symmetrical 
assumption is accurate. Because this error component can be reliably estimated 
and subtracted out with a large enough number of variables, the resulting model 
parameters are strikingly error-free and do not overfit the data. 

Maybe this is a bit old school of me, but isn't the point of model development 
to generate the most parsimonious model, with the greatest explanatory power 
from the fewest variables?  I can just imagine standing in a 'bored' (grin) 
room for a presentation and saying, hey client, here are the 200 variables that 
are driving choice behaviour.  I use latent class and Bayes-based approaches 
because they recover heterogeneity in utility allocation across the sample; to 
me, that is a big battle in choice-based analytics.  

I believe that beyond a certain point, a heap of predictors becomes 
meaningless.  I can see some of my colleagues adopting this because it is in 
SAS and makes up for poor design.  
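A quick sketch in R of the point about noise predictors (simulated data, not 
the RELR method itself): fit a plain logistic regression to an outcome that is 
pure noise, and the in-sample fit still looks flattering.

```r
# Outcome generated independently of all 50 predictors, yet the
# apparent (in-sample) accuracy of the logistic fit looks good.
set.seed(1)
n <- 100; p <- 50
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, 0.5)                        # pure-noise outcome
fit <- suppressWarnings(glm(y ~ x, family = binomial))
acc <- mean((fitted(fit) > 0.5) == y)         # apparent accuracy
acc                                           # well above the 0.5 it deserves
```

On fresh data the same model would do no better than a coin flip, which is why 
a claim that hundreds of predictors "do not overfit" needs careful 
out-of-sample evidence.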

Anyway, from a technical point of view, I would have to read a little more 
about the error they are referring to.  Good on them for developing a new 
technique; like any algorithm, it will have its strengths and weaknesses and, 
depending on factors such as usability, will gain some level of acceptance. 

Paul


 Simon Blomberg [EMAIL PROTECTED] wrote:
 
 [...]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reduced Error Logistic Regression, and R?

2007-04-25 Thread Tim Churches
This news item in a data mining newsletter makes various claims for a technique 
called Reduced Error Logistic Regression: 
http://www.kdnuggets.com/news/2007/n08/12i.html

In brief, are these (ambitious) claims justified and if so, has this technique 
been implemented in R (or does anyone have any plans to do so)? 

Tim C

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reduced Error Logistic Regression, and R?

2007-04-25 Thread Roy Mendelssohn
I don't know about the claims, but I do know about this:

 Recent News: January 31, 2007. St. Louis, MO - Rice Analytics  
 applied for a U.S. patent this week on a generalized form of  
 Reduced Error Logistic Regression.  This generalized form allows  
 repeated measures, multilevel, and survival designs that include  
 individual level estimates.  None of these capabilities were  
 possible with the previously disclosed formulation which also had  
 limited application because it could only be applied to models  
 where all variables had no missing observations

This is a very bad trend in science and statistics, IMHO.

-Roy M.

On Apr 25, 2007, at 7:29 PM, Tim Churches wrote:

 This news item in a data mining newsletter makes various claims for  
 a technique called Reduced Error Logistic Regression: 
 http://www.kdnuggets.com/news/2007/n08/12i.html

 In brief, are these (ambitious) claims justified and if so, has  
 this technique been implemented in R (or does anyone have any plans  
 to do so)?

 Tim C


**
The contents of this message do not reflect any position of the U.S.  
Government or NOAA.
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division 
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: [EMAIL PROTECTED] (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

Old age and treachery will overcome youth and skill.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reduced Error Logistic Regression, and R?

2007-04-25 Thread Simon Blomberg
From what I've read (which isn't much), the idea is to estimate a utility 
(preference) function for discrete categories using logistic regression, under 
the assumption that the residuals of the linear predictor of the utilities 
follow a Type I Gumbel (extreme value) distribution. This implies the 
"independence of irrelevant alternatives" of economic jargon, i.e. the 
relative utility of choice a versus choice b is unaffected by the introduction 
of a third choice c. It also implies homoscedasticity of the errors. The model 
can be generalized in various ways. If you are willing to introduce extra 
parameters, such as the parameters of the Gumbel distribution, you may get 
more precision in the estimates of the utility function. An alternative 
(without the independence of irrelevant alternatives assumption) is to model 
the errors as multivariate normal (i.e. use probit regression), which is 
computationally much more difficult.
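As a concrete sketch of the standard machinery described above (using iris 
purely as stand-in data, not a real choice experiment, and the long-standing 
nnet and glm tools rather than anything RELR-specific):

```r
library(nnet)   # recommended package shipped with R

# Multinomial logit: Gumbel-distributed latent utilities give
# softmax choice probabilities over the categories.
fit <- multinom(Species ~ Sepal.Length + Sepal.Width,
                data = iris, trace = FALSE)
head(fitted(fit))   # predicted choice probabilities; rows sum to 1

# A binary probit (normal rather than Gumbel errors) is just a
# different link function in glm():
pfit <- glm(I(Species == "setosa") ~ Sepal.Length + Sepal.Width,
            data = iris, family = binomial(link = "probit"))
```

For genuinely multivariate-normal errors across several alternatives (i.e. 
dropping IIA) one needs a multinomial probit, which is where the computation 
becomes much harder.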

Whether it makes substantive sense to use these models outside of
discrete choice experiments is another question.

Patenting these methods is worrying. A lot of people have worked on discrete 
choice experiments over the years; it is hard to believe that a single company 
could have ownership of an idea that is the result of such a collaborative 
effort.

Cheers,

Simon.

 On Thu, 2007-04-26 at 12:29 +1000, Tim Churches wrote:
 This news item in a data mining newsletter makes various claims for a 
 technique called Reduced Error Logistic Regression: 
 http://www.kdnuggets.com/news/2007/n08/12i.html
 
 In brief, are these (ambitious) claims justified and if so, has this 
 technique been implemented in R (or does anyone have any plans to do so)? 
 
 Tim C
 
-- 
Simon Blomberg, BSc (Hons), PhD, MAppStat. 
Lecturer and Consultant Statistician 
Faculty of Biological and Chemical Sciences 
The University of Queensland 
St. Lucia Queensland 4072 
Australia

Room 320, Goddard Building (8)
T: +61 7 3365 2506 
email: S.Blomberg1_at_uq.edu.au 

The combination of some data and an aching desire for 
an answer does not ensure that a reasonable answer can 
be extracted from a given body of data. - John Tukey.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.