Re: [R] Constrained Regression
Hello,

Your problem is y = Xb + epsilon. Standard (unconstrained) regression minimizes the sum of squared residuals, i.e. (y - Xb)'(y - Xb). In your case you need to minimize that same quantity subject to your constraints on b = (b1, ..., b5). solve.QP should do just that for you.

HTH, Samuel

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jackie Chen
Sent: 20 January 2011 16:31
To: R-help@r-project.org
Subject: [R] Constrained Regression

Hi everyone, I'm trying to perform a linear regression y = b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x5 while constraining the coefficients such that -3 <= bi <= 3 and sum(bi) = 1. I've searched R-help and have found solutions for constrained regression using quadratic programming (solve.QP) where the coefficients are between 0 and 1 and sum to 1, but unfortunately I do not understand them well enough to adapt them to my problem. Is there a way to do this using the lm function, or do I absolutely need to use solve.QP? And if I need to use solve.QP, how would I modify the Boston data example to my problem? Thanks so much. Jackie

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
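[Editor's sketch of the recipe Samuel describes, applied to Jackie's exact constraints (sum(bi) = 1 and -3 <= bi <= 3). The data are simulated purely for illustration; variable names are made up.]

```r
## Minimize ||y - X b||^2 subject to sum(b) = 1 and -3 <= b_i <= 3.
## solve.QP minimizes -d'b + (1/2) b'Db subject to A'b >= b0,
## with the first meq constraints treated as equalities.
library(quadprog)
set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)          # simulated predictors
y <- X %*% rep(0.2, p) + rnorm(n)        # simulated response
Dmat <- crossprod(X)                     # X'X
dvec <- crossprod(X, y)                  # X'y
Amat <- cbind(rep(1, p),                 # sum(b) = 1  (equality, meq = 1)
              diag(p),                   # b_i >= -3
              -diag(p))                  # -b_i >= -3, i.e. b_i <= 3
bvec <- c(1, rep(-3, p), rep(-3, p))
res <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
res$solution                             # constrained estimates; sum to 1
```

The same Amat/bvec layout carries over to real data by replacing X and y.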
Re: [R] Constrained Regression
Hello everyone, I have 3 variables Y, X1 and X2. Each variable lies between 0 and 1. I want to do a constrained regression such that a > 0 and (1-a) > 0 for the model: Y = a*X1 + (1-a)*X2. I tried the help on constrained regression in R but I concede that it was not helpful. Any help is greatly appreciated. -- Thanks, Jim.
Re: [R] Constrained Regression
What is the stochastic mechanism that generates the data? In other words, what distribution is Y, conditioned on X1 and X2, supposed to have? You can also get `a' without specifying the probability mechanism, by just doing a least-squares fit, but then making inferences can be a bit tricky. Ravi.

Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu

----- Original Message ----- From: Jim Silverton jim.silver...@gmail.com Date: Sunday, October 31, 2010 2:45 am Subject: Re: [R] Constrained Regression To: r-help@r-project.org
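[Editor's note: for Jim's one-parameter model, the least-squares fit Ravi mentions reduces to a bounded one-dimensional optimisation. A sketch on simulated data; names and the true value of a are illustrative only.]

```r
## Least-squares estimate of 'a' in Y = a*X1 + (1-a)*X2, with a kept in [0, 1]
set.seed(42)
X1 <- runif(50); X2 <- runif(50)                  # simulated "probabilities"
Y  <- 0.7 * X1 + 0.3 * X2 + rnorm(50, sd = 0.05)  # simulated with a = 0.7
sse <- function(a) sum((Y - a * X1 - (1 - a) * X2)^2)
a_hat <- optimize(sse, interval = c(0, 1))$minimum
a_hat                                             # least-squares estimate of a
```

As Ravi says, the point estimate is the easy part; making inferences about a under the constraint is what gets tricky.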
Re: [R] Constrained Regression
On Oct 31, 2010, at 2:44 AM, Jim Silverton wrote:

 Hello everyone, I have 3 variables Y, X1 and X2. Each variable lies between 0 and 1. I want to do a constrained regression such that a > 0 and (1-a) > 0 for the model: Y = a*X1 + (1-a)*X2

It would not accomplish the constraint that a > 0, but you could accomplish the other constraint within an lm fit:

X3 <- X1 - X2
lm(Y ~ X3 + offset(X2))

since beta1 is then the coefficient in the model

Y ~ intercept + beta1*(X1 - X2) + 1*X2
Y ~ intercept + beta1*X1 + (1 - beta1)*X2

... so beta1 is a. In the case beta1 < 0, I suppose a would be assigned 0. This might be accomplished within an iterative calculation framework by a large penalization for negative values.

In a reply (1) to a question by Carlos Alzola in 2008 on R-help, Berwin Turlach offered a solution to a similar problem (sum(coef) == 1 AND coef non-negative). Modifying his code to incorporate the above strategy (and choosing two variables for which parameter values might be inside the constraint boundaries) we get:

library(MASS) ## to access the Boston data
designmat <- model.matrix(medv ~ I(age - lstat) + offset(lstat), data = Boston)
Dmat <- crossprod(designmat, designmat)
dvec <- crossprod(designmat, Boston$medv)
Amat <- cbind(1, diag(NROW(Dmat)))
bvec <- c(1, rep(0, NROW(Dmat)))
meq <- 1
library(quadprog)
res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
zapsmall(res$solution)
[1] 0.686547 0.313453

Turlach specifically advised against any interpretation of this particular result, which was only constructed to demonstrate the mathematical mechanics.

 I tried the help on the constrained regression in R but I concede that it was not helpful.

I must not have that package installed, because I got nothing that appeared to be useful with ??'constrained regression'.
David Winsemius, MD
West Hartford, CT

(1) http://finzi.psych.upenn.edu/Rhelp10/2008-March/155990.html
Re: [R] Constrained Regression
Have you tried the 'sos' package?

install.packages('sos') # if not already installed
library(sos)
cr <- ???'constrained regression' # found 149 matches
summary(cr) # in 69 packages
cr # opens a table in a browser listing all 149 matches with links to the help pages

However, I agree with Ravi Varadhan: I'd want to understand the physical mechanism generating the data. If each variable is, for example, a proportion, then I'd want to use logistic regression, possibly after some approximate logistic transformation of X1 and X2 that prevents logit(X) from going to +/-Inf. This is a different model, but it meets the need to avoid predictions of Y going outside the range (0, 1). Spencer

-- Spencer Graves, PE, PhD President and Chief Operating Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567
Re: [R] Constrained Regression
On Oct 31, 2010, at 12:54 PM, Spencer Graves wrote:

 Have you tried the 'sos' package?

I have, and I am taking this opportunity to load it with my .Rprofile to make it more accessible. It works very well. Very clean display. I have also constructed a variant of RSiteSearch that I find more useful than the current defaults:

rhelpSearch <- function(string,
                        restrict = c("Rhelp10", "Rhelp08", "Rhelp02", "functions"),
                        matchesPerPage = 100, ...) {
    RSiteSearch(string = string, restrict = restrict,
                matchesPerPage = matchesPerPage, ...)
}

 However, I agree with Ravi Varadhan: I'd want to understand the physical mechanism generating the data. If each variable is, for example, a proportion, then I'd want to use logistic regression, possibly after some approximate logistic transformation of X1 and X2 that prevents logit(X) from going to +/-Inf. This is a different model, but it meets the need to avoid predictions of Y going outside the range (0, 1).

No argument. I defer to both of your greater experience in such problems and your interest in educating us less knowledgeable users. I also need to amend my suggested strategy in situations where a linear model _might_ be appropriate, since I think the inclusion of the surrogate variable in the solve.QP setup is very probably creating confusion. After reconsideration I think one should keep the two approaches separate.

These are two approaches to the non-intercept versions of the model that yield the same estimate (but only because the constraints do not get invoked):

lm(medv ~ I(age - lstat) + offset(lstat) - 1, data = Boston)

Call:
lm(formula = medv ~ I(age - lstat) + offset(lstat) - 1, data = Boston)

Coefficients:
I(age - lstat)
        0.1163

library(MASS) ## to access the Boston data
designmat <- model.matrix(medv ~ age + lstat - 1, data = Boston)
Dmat <- crossprod(designmat, designmat)
dvec <- crossprod(designmat, Boston$medv)
Amat <- cbind(1, diag(NROW(Dmat)))
bvec <- c(1, rep(0, NROW(Dmat)))
meq <- 1
library(quadprog)
res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
zapsmall(res$solution) # zapsmall not really needed in this instance
[1] 0.1163065 0.8836935

-- David.

David Winsemius, MD
West Hartford, CT
Re: [R] Constrained Regression
I thought I would 'add' some meat to the problem I sent. This is all I know:

(1) f = a*X1 + (1-a)*X2.
(2) I know n values of f and of X1, which happen to be probabilities.
(3) I know nothing about X2 except that it also lies in (0,1).
(4) X1 is the probability under the null (Fisher's exact test) and X2 is the alternative. But I have no idea what the alternative to Fisher's exact test can be.
(5) I need to estimate a (which is supposed to be a proportion).
(6) I was thinking about imputing values from (0,1), using a beta distribution, as the values of X2.

Any help is greatly appreciated. Jim...
Re: [R] Constrained Regression
In line. On 10/31/2010 6:26 PM, Jim Silverton wrote:

 I thought I would 'add' some meat to the problem I sent. This is all I know: (1) f = a*X1 + (1-a)*X2

How do you know f = a*X1 + (1-a)*X2? Why does this relationship make more sense than, e.g., log(f/(1-f)) = a*X1 + (1-a)*X2?

 (2) I know n values of f and X1 which happen to be probabilities

What do you mean that f and X1 are probabilities? Where do they come from? Are they the results of binomial experiments, like tosses of a biased coin?

 (3) I know nothing about X2 except that it also lies in (0,1) (4) X1 is the probability under the null (Fisher's exact test) and X2 is the alternative. But I have no idea what the alternative to Fisher's exact test can be.

How do you compute X1? Do you compute it from data? If yes, then it's more like an estimate of a probability than a probability itself. Ditto for X2.

The preferred method for estimating almost anything, whenever feasible, is to write a likelihood function, i.e., the probability of the data as a function of unknown parameters, then estimate those parameters to maximize the likelihood. Even most nonparametric procedures can be expressed this way for an appropriately chosen probability model. In most situations, I prefer to eliminate constraints by transformations: replace f by phi = log(f/(1-f)), X1 by xi1 = log(X1/(1-X1)), X2 by xi2 = log(X2/(1-X2)), and a by alpha = log(a/(1-a)), then write a relationship between these unconstrained variables and try to estimate alpha by maximum likelihood.

 (5) I need to estimate a (which is supposed to be a proportion). (6) I was thinking about imputing values from (0,1) using a beta distribution as the values of X2.

Hope this helps. Spencer
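[Editor's sketch of Spencer's transformation idea: reparameterize a = plogis(alpha) so alpha is unconstrained while a stays in (0, 1) automatically. Simulated data; a least-squares criterion is used here for simplicity rather than the full likelihood Spencer recommends.]

```r
## Estimate a in Y = a*X1 + (1-a)*X2 via an unconstrained parameter alpha
set.seed(1)
X1 <- runif(50); X2 <- runif(50)
Y  <- 0.6 * X1 + 0.4 * X2 + rnorm(50, sd = 0.05)  # simulated with a = 0.6
fit <- optim(0, function(alpha) {
  a <- plogis(alpha)            # inverse logit maps the real line onto (0, 1)
  sum((Y - a * X1 - (1 - a) * X2)^2)
}, method = "BFGS")
plogis(fit$par)                 # estimate of a, guaranteed inside (0, 1)
```

The same trick works inside a likelihood: optimise over alpha freely and report plogis(alpha).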
Re: [R] Constrained regression
Dear Carlos,

One approach is to use structural equation modeling (SEM). Some SEM packages, such as LISREL, Mplus and Mx, allow inequality and nonlinear constraints. Phantom variables (Rindskopf, 1984) may be used to impose inequality constraints. Your model is basically:

y = b0 + b1*b1*x1 + b2*b2*x2 + ... + bp*bp*xp + e
1 = b1*b1 + b2*b2 + ... + bp*bp

Writing each coefficient as a square makes it automatically non-negative. Alternatively, you can set bounds on the parameter estimates; then you only have to impose the second constraint.

Rindskopf, D. (1984). Using phantom and imaginary latent variables to parameterize constraints in linear structural models. Psychometrika, 49, 37-47.

Regards, Mike

-- Mike W.L. Cheung Phone: (65) 6516-3702 Department of Psychology Fax: (65) 6773-1843 National University of Singapore http://courses.nus.edu.sg/course/psycwlm/internet/

On Mon, Mar 3, 2008 at 11:52 AM, Carlos Alzola [EMAIL PROTECTED] wrote: Dear list members, I am trying to get information on how to fit a linear regression with constrained parameters. Specifically, I have 8 predictors; their coefficients should all be non-negative and add up to 1. I understand it is a quadratic programming problem, but I have no experience in the subject. I searched the archives but the results were inconclusive. Could someone provide suggestions and references to the literature, please? Thank you very much. Carlos

Carlos Alzola [EMAIL PROTECTED] (703) 242-6747
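[Editor's sketch: the squared-coefficient trick Mike describes can be mimicked in plain R without an SEM package by optimising over unconstrained g and setting b = g^2 / sum(g^2), which enforces both non-negativity and sum(b) = 1 by construction. Simulated data; four predictors for brevity.]

```r
## Non-negative coefficients summing to 1 via a squared reparameterization
set.seed(1)
X <- matrix(runif(200), 50, 4)
y <- X %*% c(0.4, 0.3, 0.2, 0.1) + rnorm(50, sd = 0.05)
fit <- optim(rep(1, 4), function(g) {
  b <- g^2 / sum(g^2)          # each b_i >= 0 and sum(b) = 1 by construction
  sum((y - X %*% b)^2)
})
fit$par^2 / sum(fit$par^2)     # constrained coefficient estimates
```

Note the parameterization is not identified (g and -g give the same b), which is harmless for point estimation but matters for standard errors.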
Re: [R] Constrained regression
G'day Carlos,

On Mon, Mar 3, 2008 at 11:52 AM Carlos Alzola [EMAIL PROTECTED] wrote: I am trying to get information on how to fit a linear regression with constrained parameters. Specifically, I have 8 predictors; their coefficients should all be non-negative and add up to 1.

A suggestion:

library(MASS) ## to access the Boston data
designmat <- model.matrix(medv ~ ., data = Boston)
Dmat <- crossprod(designmat, designmat)
dvec <- crossprod(designmat, Boston$medv)
Amat <- cbind(1, diag(NROW(Dmat)))
bvec <- c(1, rep(0, NROW(Dmat)))
meq <- 1
library(quadprog)
res <- solve.QP(Dmat, dvec, Amat, bvec, meq)

The solution seems to contain values that are, for all practical purposes, actually zero:

res$solution
 [1]  4.535581e-16  2.661931e-18  1.016929e-01 -1.850699e-17
 [5]  1.458219e-16 -3.892418e-15  8.544939e-01  0.000000e+00
 [9]  2.410742e-16  2.905722e-17 -5.700600e-20 -4.227261e-17
[13]  4.381328e-02 -3.723065e-18

So perhaps better:

zapsmall(res$solution)
 [1] 0.0000000 0.0000000 0.1016929 0.0000000 0.0000000 0.0000000
 [7] 0.8544939 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[13] 0.0438133 0.0000000

So the estimates seem to follow the constraints. And the unconstrained solution is:

res$unconstrained.solution
 [1]  3.645949e+01 -1.080114e-01  4.642046e-02  2.055863e-02
 [5]  2.686734e+00 -1.776661e+01  3.809865e+00  6.922246e-04
 [9] -1.475567e+00  3.060495e-01 -1.233459e-02 -9.527472e-01
[13]  9.311683e-03 -5.247584e-01

which coincides with what lm() thinks it should be:

coef(lm(medv ~ ., Boston))
  (Intercept)          crim            zn         indus          chas
 3.645949e+01 -1.080114e-01  4.642046e-02  2.055863e-02  2.686734e+00
          nox            rm           age           dis           rad
-1.776661e+01  3.809865e+00  6.922246e-04 -1.475567e+00  3.060495e-01
          tax       ptratio         black         lstat
-1.233459e-02 -9.527472e-01  9.311683e-03 -5.247584e-01

So there seem to be no numeric problems.
Otherwise we could have done something else (e.g. calculate the QR factorization of the design matrix, say X, and give the R factor to solve.QP instead of calculating X'X and giving that to solve.QP).

If the intercept is not supposed to be included in the set of constrained estimates, then something like the following can be done:

Amat[1,] <- 0
res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
zapsmall(res$solution)
 [1] 6.073972 0.000000 0.109124 0.000000 0.000000 0.000000 0.863421
 [8] 0.000000 0.000000 0.000000 0.000000 0.000000 0.027455 0.000000

Of course, since after the first command in that last block the second column of Amat contains only zeros:

Amat[,2]
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0

we might as well have removed it (and the corresponding entry in bvec):

Amat <- Amat[, -2]
bvec <- bvec[-2]

before calling solve.QP(). Note, the Boston data set was only used to illustrate how to fit such models; I do not want to imply that these models are sensible for these data. :-)

Hope this helps. Cheers, Berwin

=========================== Full address =============================
Berwin A Turlach                 Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability    +65 6516 6650 (self)
Faculty of Science               FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7    e-mail: [EMAIL PROTECTED]
Singapore 117546                 http://www.stat.nus.edu.sg/~statba
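[Editor's sketch of Berwin's QR aside. quadprog's solve.QP takes factorized = TRUE, in which case Dmat is expected to hold R^(-1) where D = R'R; the R factor of the design matrix provides such an R since X'X = R'R. This is a sketch only; check the semantics against ?solve.QP before relying on it.]

```r
## Same constrained fit, passing a factor of X'X rather than X'X itself
library(MASS); library(quadprog)
designmat <- model.matrix(medv ~ ., data = Boston)
dvec <- crossprod(designmat, Boston$medv)
Amat <- cbind(1, diag(ncol(designmat)))
bvec <- c(1, rep(0, ncol(designmat)))
R    <- qr.R(qr(designmat))              # X'X = R'R
Rinv <- backsolve(R, diag(ncol(R)))      # R^(-1), upper triangular
res  <- solve.QP(Rinv, dvec, Amat, bvec, meq = 1, factorized = TRUE)
zapsmall(res$solution)
```

Working from R rather than X'X avoids squaring the condition number of the design matrix, which is the numerical motivation behind Berwin's remark.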