Re: [R] Optimization problem
Try this:

1. Following Ben, remove the Randalstown point and reset the levels of the Location factor.
2. Then replace solve with ginv so that the generalized inverse is used to calculate the Hessian:

alan2 <- subset(alan, subset = Location != "Randalstown")
alan2$Location <- factor(as.character(alan2$Location))
library(MASS)
solve <- ginv
zinb.zc <- zicounts(resp = Scars ~ .,
                    x = ~ Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z = ~ Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data = alan2)
rm(solve)

On 8/21/07, Ben Bolker [EMAIL PROTECTED] wrote:
[...]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
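Why the masking trick above works: zicounts obtains standard errors by calling solve() on the Hessian, and solve() fails on a rank-deficient matrix, while MASS::ginv() returns the Moore-Penrose generalized inverse instead. A minimal sketch of the idea with a deliberately singular matrix (not using zicounts itself):

```r
## A singular "Hessian": solve() refuses it, but the Moore-Penrose
## generalized inverse from MASS::ginv() is still defined.
library(MASS)

H <- matrix(c(2, 4,
              1, 2), nrow = 2, byrow = TRUE)   # rank 1, exactly singular

res <- try(solve(H), silent = TRUE)
inherits(res, "try-error")                     # TRUE: base solve() errors out

## Masking solve with ginv makes any internal call to solve()
## fall through to the generalized inverse:
solve <- ginv
Hinv <- solve(H)                               # pseudoinverse instead of an error
rm(solve)                                      # unmask base::solve afterwards

## the pseudoinverse satisfies H %*% Hinv %*% H == H (up to rounding)
max(abs(H %*% Hinv %*% H - H)) < 1e-8          # TRUE
```

Note the rm(solve) at the end, as in the workaround above: the mask is global for the session, so it should be removed as soon as the fit is done.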
[R] Optimization problem
Hello Folks,

Very new to R so bear with me, running 5.2 on XP. Trying to do a zero-inflated negative binomial regression on placental scar data as dependent. Lactation, location, number of tick larvae present and mass of mouse are independents. Dataframe and attributes below:

   Location     Lac Scars Lar  Mass Lacfac
 1 Tullychurry  0 0 15 13.87 0
 2 Somerset     0 0  0 15.60 0
 3 Tollymore    0 0  3 16.43 0
 4 Tollymore    0 0  0 16.55 0
 5 Caledon      0 0  0 17.47 0
 6 Hillsborough 1 5  0 18.18 1
 7 Caledon      0 0  1 19.06 0
 8 Portglenone  0 4  0 19.10 0
 9 Portglenone  0 5  0 19.13 0
10 Tollymore    0 5  3 19.50 0
11 Hillsborough 1 5  0 19.58 1
12 Portglenone  0 4  0 19.76 0
13 Caledon      0 8  0 19.97 0
14 Hillsborough 1 4  0 20.02 1
15 Tullychurry  0 3  3 20.13 0
16 Hillsborough 1 5  0 20.18 1
17 LoughNavar   1 5  0 20.20 1
18 Tollymore    0 0  1 20.24 0
19 Hillsborough 1 5  0 20.48 1
20 Caledon      0 4  1 20.56 0
21 Caledon      0 3  2 20.58 0
22 Tollymore    0 4  3 20.58 0
23 Tollymore    0 0  2 20.88 0
24 Hillsborough 1 0  0 21.01 1
25 Portglenone  0 5  0 21.08 0
26 Tullychurry  0 2  5 21.28 0
27 Ballysallagh 1 4  0 21.59 1
28 Caledon      0 0  1 21.68 0
29 Hillsborough 1 5  0 22.09 1
30 Tullychurry  0 5  5 22.28 0
31 Tullychurry  1 6 75 22.43 1
32 Ballysallagh 1 5  0 22.57 1
33 Ballysallagh 1 4  0 22.67 1
34 LoughNavar   1 5  3 22.71 1
35 Hillsborough 1 4  0 23.01 1
36 Caledon      0 0  3 23.08 0
37 LoughNavar   1 5  0 23.53 1
38 Ballysallagh 1 4  0 23.55 1
39 Portglenone  1 6  0 23.61 1
40 Mt.Stewart   0 3  0 23.70 0
41 Somerset     0 5  0 23.83 0
42 Ballysallagh 1 5  0 23.93 1
43 Ballysallagh 1 5  0 24.01 1
44 Caledon      0 0  3 24.14 0
45 LoughNavar   0 6  0 24.30 0
46 LoughNavar   1 5  0 24.34 1
47 Hillsborough 1 4  0 24.45 1
48 Caledon      0 3  2 24.55 0
49 Tullychurry  0 5 44 24.83 0
50 Hillsborough 1 5  0 24.86 1
51 Ballysallagh 1 5  0 25.02 1
52 Tullychurry  0 0  9 25.27 0
53 Mt.Stewart   0 5  0 25.31 0
54 LoughNavar   1 4  8 25.43 1
55 Somerset     1 0  0 25.58 1
56 Hillsborough 1 5  0 25.82 1
57 Portglenone  1 2  0 26.02 1
58 Ballysallagh 1 5  0 26.19 1
59 Mt.Stewart   1 0  0 26.66 1
60 Randalstown  1 0  1 26.70 1
61 Somerset     0 4  0 27.01 0
62 Mt.Stewart   0 4  0 27.05 0
63 Somerset     0 3  0 27.10 0
64 Somerset     0 6  0 27.34 0
65 Somerset     0 0  0 27.87 0
66 LoughNavar   1 5  1 28.01 1
67 Tullychurry  1 6 42 28.55 1
68 Hillsborough 1 5  0 28.84 1
69 Portglenone  1 4  0 29.00 1
70 Somerset     1 4  0 31.87 1
71 Ballysallagh 1 5  0 33.06 1
72 LoughNavar   1 4  0 33.24 1
73 Somerset     1 4  0 33.36 1

alan:
'data.frame': 73 obs. of 6 variables:
 $ Location: Factor w/ 10 levels "Ballysallagh",..: 10 8 9 9 2 3 2 6 6 9 ...
 $ Lac     : int 0 0 0 0 0 1 0 0 0 0 ...
 $ Scars   : int 0 0 0 0 0 5 0 4 5 5 ...
 $ Lar     : int 15 0 3 0 0 0 1 0 0 3 ...
 $ Mass    : num 13.9 15.6 16.4 16.6 17.5 ...
 $ Lacfac  : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...

The syntax I used to create the model is:

zinb.zc <- zicounts(resp = Scars ~ .,
                    x = ~ Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z = ~ Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data = alan)

The error given is:

Error in optim(par = parm, fn = neg.like, gr = neg.grad, hessian = TRUE, :
        non-finite value supplied by optim
In addition: Warning message:
fitted probabilities numerically 0 or 1 occurred in:
        glm.fit(zz, 1 - pmin(y, 1), family = binomial())

I understand this is a problem with the model I specified -- could anyone help out?

Many thanks

Alan Harrison
Quercus
Queen's University Belfast
MBC, 97 Lisburn Road
Belfast BT9 7BL
T: 02890 972219
M: 07798615682
Re: [R] Optimization problem
Lac and Lacfac are the same.

On 8/21/07, Alan Harrison [EMAIL PROTECTED] wrote:
[...]
Re: [R] Optimization problem
(Hope this gets threaded properly. Sorry if it doesn't.)

Gabor: Lac and Lacfac being the same is irrelevant -- it wouldn't produce NAs (it would produce something like a singular Hessian and maybe other problems) -- but they're not even specified in this model.

The bottom line is that you have a location with a single observation, so the GLM that zicounts runs to get the initial parameter values has an unestimable Location:Mass interaction for one location, so it gives an NA, so optim complains. In gruesome detail:

## set up data
scardat <- read.table("scars.dat", header = TRUE)
library(zicounts)

## try to run model
zinb.zc <- zicounts(resp = Scars ~ .,
                    x = ~ Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z = ~ Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data = scardat)

## tried to debug this by dumping zicounts to a file, modifying
## it to put a trace argument in that would print out the parameters
## and log-likelihood for every call to the log-likelihood function
dump("zicounts", file = "zicounts.R")
source("zicounts.R")
zinb.zc <- zicounts(resp = Scars ~ .,
                    x = ~ Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z = ~ Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data = scardat, trace = TRUE)
## this actually didn't do any good because the negative log-likelihood
## function never gets called -- as it turns out optim() barfs when it
## gets its initial values, before it ever gets to evaluating the log-likelihood

## check the glm -- this is the equivalent of what zicounts does to
## get the initial values of the x parameters
p1 <- glm(Scars ~ Location + Lar + Mass + Lar:Mass + Location:Mass,
          data = scardat, family = poisson)
which(is.na(coef(p1)))

## find out what the deal is
table(scardat$Location)
scar2 <- subset(scardat, Location != "Randalstown")
## first step to removing the bad point from the data set -- but ...
table(scar2$Location)
## it leaves the Location factor with the same levels, so
## now we have ZERO counts for one location:
## redefine the factor to drop unused levels
scar2$Location <- factor(scar2$Location)
## OK, looks fine now
table(scar2$Location)
zinb.zc <- zicounts(resp = Scars ~ .,
                    x = ~ Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z = ~ Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data = scar2)
## now we get another error (system is computationally singular when
## trying to compute the Hessian -- overparameterized?) Not in any
## trivial way that I can see. It would be nice to get into the guts
## of zicounts and stop it from trying to invert the Hessian, which is
## I think where this happens.

In the meanwhile, I have some other ideas about this analysis (sorry, but you started it ...)

Looking at the data in a few different ways:

library(lattice)
xyplot(Scars ~ Mass, groups = Location, data = scar2, jitter = TRUE,
       auto.key = list(columns = 3))
xyplot(Scars ~ Mass | Location, data = scar2, jitter = TRUE)
xyplot(Scars ~ Lar, groups = Location, data = scar2,
       auto.key = list(columns = 3))
xyplot(Scars ~ Mass | Lar, data = scar2)
xyplot(Scars ~ Lar | Location, data = scar2)

Some thoughts: (1) I'm not at all sure that zero-inflation is necessary (see Warton 2005, Environmetrics). This is a fairly small, noisy data set without huge numbers of zeros -- a plain old negative binomial might be fine. I don't actually see a lot of signal here, period (although there may be some) ... there's not a huge range in Lar (whatever it is -- the rest of the covariates I think I can interpret). It would be tempting to try to fit location as a random effect, because fitting all those extra degrees of freedom is going to kill you. On the other hand, GLMMs are a bit hairy.

cheers
Ben
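The empty-level pitfall Ben points out is easy to reproduce with a toy factor: subsetting keeps the unused level around (so the model matrix still carries a column for it), and re-applying factor() drops it. A small illustration:

```r
## subsetting a factor keeps the unused level around
Location <- factor(c("A", "A", "B", "C"))
sub <- Location[Location != "C"]
nlevels(sub)        # still 3 -- table(sub) shows C with a zero count

## re-coding the factor drops the level that no longer occurs
sub2 <- factor(sub)
nlevels(sub2)       # 2
```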
[R] Optimization problem: selecting independent rows to maximize the mean
Does R have packages for such multi-objective optimization problems?

The rgenoud (R-GENetic Optimization Using Derivatives) package allows for multi-objective optimization problems. See the "lexical" option, which searches for the Pareto front. The package is written for NP-hard problems (but they are... well... difficult). See CRAN or: http://sekhon.berkeley.edu/rgenoud/

Cheers,
Jas.

===
Jasjeet S. Sekhon
Associate Professor
Survey Research Center
UC Berkeley
http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===

nojhan wrote:
[...]
Re: [R] Optimization problem: selecting independent rows to maximize the mean
Regarding multi-objective optimization, I just got 0 hits from RSiteSearch("multi-objective optimization") and RSiteSearch("multiobjective optimization"). However, it shouldn't be too difficult to write a wrapper function to blend other functions however you would like, then use optim or nlminb or one of the other optimizers in R.

I don't feel qualified to even comment on 'difficult (i.e. NP-hard) problems'.

hope this helps,
spencer graves

nojhan wrote:
[...]
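Spencer's wrapper idea can be sketched in a few lines: blend two objectives into one scalar criterion with a weight, then hand the blend to optim(). The objectives f1 and f2 and the weight below are invented for illustration, not from the thread:

```r
## two (made-up) objectives to minimize, pulling in opposite directions
f1 <- function(x) (x - 1)^2
f2 <- function(x) (x + 1)^2

## weighted blend: w = 1 recovers f1 alone, w = 0 recovers f2 alone
blend <- function(x, w = 0.5) w * f1(x) + (1 - w) * f2(x)

## optim() passes extra arguments (here w) through to the objective
fit <- optim(par = 0, fn = blend, w = 0.5, method = "BFGS")
fit$par   # about 0: the compromise between the minima at +1 and -1
```

Varying w and re-optimizing traces out different trade-off points; that is the "choice 1" (weighting) approach from nojhan's message, as opposed to computing a Pareto front.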
Re: [R] Optimization problem: selecting independent rows to maximize the mean
Le Wed, 01 Mar 2006 13:07:07 -0800, Berton Gunter a écrit :

2) That the mean and sd can be simultaneously optimized as you describe -- what if the subset with maximum mean also has bigger than minimal sd?

Then you have two choices:
1) balance the two objectives with weights, according to the importance you give to each one
2) get a list of non-dominated solutions (a Pareto front)

Does R have packages for such multi-objective optimization problems? Moreover, does it have a package for difficult (i.e. NP-hard) problems?

--
nojhan
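For choice 2), the non-dominated filter itself is short to write in base R. A sketch with made-up objective vectors (maximizing the mean and minimizing the sd, the latter encoded as maximizing -sd):

```r
## Pareto filter for two objectives, both to be maximized:
## a point is dominated if some other point is >= on both
## objectives and strictly > on at least one
pareto_front <- function(m1, m2) {
  n <- length(m1)
  keep <- logical(n)
  for (i in seq_len(n)) {
    dominated <- any(m1 >= m1[i] & m2 >= m2[i] &
                     (m1 > m1[i] | m2 > m2[i]))
    keep[i] <- !dominated
  }
  keep
}

mean_  <- c(3, 2, 1, 2.5)   # objective 1: subset mean
negsd  <- c(1, 2, 3, 2.5)   # objective 2: -sd (so bigger is better)
pareto_front(mean_, negsd)  # TRUE FALSE TRUE TRUE: point 2 is dominated by point 4
```

Everything on the front is a defensible answer; picking one point from it is where the weighting question comes back in.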
[R] Optimization problem: selecting independent rows to maximize the mean
Dear R community,

I have a dataframe with 500,000 rows and 102 columns. The rows represent spatial polygons, some of which overlap others (i.e., not all rows are independent of each other). Given a particular row, the first column contains a unique RowID. The second column contains the Variable of interest. The remaining 100 columns (Overlap1 ... Overlap100) each contain a row ID that overlaps this row (but if this row overlaps fewer than 100 other rows then the remainder of the columns OL1...OL100 contain NA).

Here's the problem: I need to select the subset of 500 independent rows that maximizes the mean and minimizes the stdev of Variable. Clearly this requires iterative selection and comparison of rows, because each newly-selected row must be compared to rows already selected to ensure it does not overlap them. At each step, a row already selected might be removed from the subset if it can be replaced with another that increases the mean and/or reduces the stdev.

The above description is a simplification of my problem, but it's a start. As I am new to R (and programming in general) I'm not sure how to start thinking about this, or even where to look. I'd appreciate any ideas that might help.

Thank you,
Mark
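As a starting point for the iterative selection Mark describes, a greedy pass is easy to sketch: take rows in decreasing order of Variable and skip any row that overlaps one already chosen. The toy data and the neighbour-only overlap structure below are invented for illustration; a real run would build the overlap list from the Overlap1 ... Overlap100 columns:

```r
## greedy selection sketch on invented data
set.seed(1)
n <- 20
dat <- data.frame(RowID = 1:n, Variable = rnorm(n))

## toy overlap structure: each row overlaps only its immediate neighbours
overlaps <- lapply(1:n, function(i) intersect(c(i - 1, i + 1), 1:n))

chosen <- integer(0)
for (i in order(dat$Variable, decreasing = TRUE)) {
  ## add row i only if none of its overlapping rows is already selected
  if (!any(overlaps[[i]] %in% chosen)) chosen <- c(chosen, i)
}

## invariant: no two chosen rows overlap
any(unlist(overlaps[chosen]) %in% chosen)   # FALSE
```

Greedy maximizes the mean reasonably well but ignores the sd half of the criterion, and (as Bert notes downthread) the exact problem is much harder; this is only a first cut to compare other approaches against.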
Re: [R] Optimization problem: selecting independent rows to maximize the mean
This sounds either easy via a greedy algorithm or NP-hard. Moreover, it is not clear to me that:

1) A subset of 500 independent rows exists, where I presume "independent" means pairwise nonoverlapping;
2) The mean and sd can be simultaneously optimized as you describe -- what if the subset with maximum mean also has bigger than minimal sd?

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Mark
Sent: Wednesday, March 01, 2006 12:40 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Optimization problem: selecting independent rows to maximize the mean
[...]
Re: [R] Optimization problem: selecting independent rows to maximize the mean
Package lpSolve might help.

On 3/1/06, Mark [EMAIL PROTECTED] wrote:
[...]
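One way lpSolve could apply, sketched on a toy instance: treat each row as a binary variable, maximize the total Variable of the selected rows, and add a constraint x[i] + x[j] <= 1 for every overlapping pair plus a cardinality constraint. Maximizing the total of a fixed-size subset is the same as maximizing its mean; the sd half of Mark's criterion does not fit a linear program directly. All data below are made up:

```r
## integer-program sketch of the row-selection problem with lpSolve
library(lpSolve)

Variable <- c(5, 4, 3, 2)          # value of each of 4 toy rows
pairs <- rbind(c(1, 2), c(2, 3))   # rows 1-2 and 2-3 overlap
k <- 2                             # select exactly k rows

## one constraint row per overlapping pair (x_i + x_j <= 1),
## plus one cardinality row (sum of x == k)
const <- rbind(
  t(apply(pairs, 1, function(p) replace(numeric(4), p, 1))),
  rep(1, 4)
)
sol <- lp(direction = "max", objective.in = Variable,
          const.mat = const,
          const.dir = c(rep("<=", nrow(pairs)), "=="),
          const.rhs = c(rep(1, nrow(pairs)), k),
          all.bin = TRUE)
sol$solution   # c(1, 0, 1, 0): rows 1 and 3, total 8, no overlap
```

At Mark's real scale (500,000 binary variables) this exact formulation would likely be impractical without decomposition, but it makes the structure of the problem explicit.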
Re: [R] optimization problem in R ... can this be done?
Gregory Gentlemen wrote:

Spencer: Thank you for the helpful suggestions. I have another question following some code I wrote. The function below gives a crude approximation for the x of interest (that value of x such that g(x,n) is less than 0 for all n).

## btilda optimizes g(n,x) over n for each fixed x on a grid, and then
## approximately finds the x such that abs(g(n*,x)) = 0
btilda <- function(range, len) {
  ## range: interval over which to look for x
  bb <- seq(range[1], range[2], length = len)
  OBJ <- sapply(bb, function(x) {
    fixed <- c(x, 100, 692, 50, 1600, v1, 217227)
    return(optimize(g, c(1, 1000), maximum = TRUE, tol = 0.001, x = fixed)$objective)
  })
  tt <- data.frame(b = bb, obj = OBJ)
  tt$absobj <- abs(tt$obj)
  d <- tt[order(tt$absobj), ][1:3, ]
  return(as.vector(d))
}

For instance a run of btilda(c(20.55806,20.55816),1) returns:

            b           obj       absobj
5834 20.55812 -0.0004942848 0.0004942848
5833 20.55812  0.0011715433 0.0011715433
5835 20.55812 -0.0021601140 0.0021601140

My question is how to improve the precision of b (which is x) here. It seems that seq(20.55806,20.55816,length=1 ) is only precise to 5 or so digits, and thus, is equivalent for numerous successive

Why do you think so? It is much more accurate! See ?.Machine

Uwe Ligges

values. How can I get around this?

Spencer Graves [EMAIL PROTECTED] wrote:
[...]

Gregory Gentlemen wrote:

I'm trying to ascertain whether or not the facilities of R are sufficient for solving an optimization problem I've come across. Because of my limited experience with R, I would greatly appreciate some feedback from more frequent users. The problem can be delineated as such:

A utility function, which we shall call g, is a function of x, n ... g(x,n). g has the properties: n > 0; x lies on the real line. g may take values along the real line. g is such that g(x,n) = g(-x,n). g is a decreasing function of x for any n; for fixed x, g(x,n) is smooth and initially decreases upon reaching an inflection point, thereafter increasing until it reaches a maximum and then declines (neither concave nor convex).

My optimization problem is to find the largest positive x such that g(x,n) is less than zero for all n. In fact, because of the symmetry of g around x, we need only consider x > 0. Such an x does exist in this problem, and of course g obtains a maximum value of 0 at some n for this value of x. My issue is writing some code to systematically obtain this value. Is R capable of handling such a problem? (i.e. through some sort of optimization function, or some sort of grid search with the relevant constraints) Any suggestions would be appreciated.

Gregory Gentlemen
[EMAIL PROTECTED]
Re: [R] optimization problem in R ... can this be done?
The precision is not a problem, only the display, as Uwe indicated. Consider the following:

(seq(25.5,25.6,length=20)-25.5)[c(1, 2, 19, 20)]
[1] 0.00e+00 5.25e-07 9.50e-02 1.00e-01
?options
options(digits=20)
seq(25.5,25.6,length=20)[c(1, 2, 19, 20)]
[1] 25.5 25.50525 25.59475 25.6

spencer graves

Gregory Gentlemen wrote:

Okay let me attempt to be clear: if I construct the following sequence in R: seq(25.5,25.6,length=20). For instance, the last 10 elements of the sequence are all 26, and the preceding 20 are all 25.5. Presumably some rounding up is being done. How do I adjust the precision here such that each element is distinct?

Thanks in advance guys,
Gregory
[EMAIL PROTECTED]

Uwe Ligges [EMAIL PROTECTED] wrote:
[...]
Re: [R] optimization problem in R ... can this be done?
Part of the R culture is a statement by Simon Blomberg, immortalized in library(fortunes) as, "This is R. There is no if. Only how."

I can't see now how I would automate a complete solution to your problem in general. However, given a specific g(x, n), I would start by writing a function that uses expand.grid and contour to make a contour plot of g(x, n) over specified ranges x = seq(0, x.max, length = npts) and n = seq(0, n.max, length = npts) for a specified number of points npts. Then I'd play with x.max, n.max, and npts until I got what I wanted. With the right choices of x.max, n.max, and npts, the solution will be obvious from the plot. In some cases, nothing more will be required.

If I wanted more than that, I would need to exploit further some specifics of the problem. For that, permit me to restate some of what I think I understood of your specific problem: (1) For fixed n, g(x, n) is monotonically decreasing in x > 0. (2) For fixed x, g(x, n) has only two local maxima, one at n = 0 (or n = eps > 0, eps arbitrarily small) and the other at n2(x), say, with a local minimum in between at n1(x), say. With this, I would write functions to find n1(x) and n2(x) given x. I might not even need n1(x) if I could figure out how to obtain n2(x) without it. Then I'd make a plot with two lines (using plot and lines) of g(x, 0) and g(x, n2(x)) vs. x. By the time I'd done all that, if I still needed more, I'd probably have ideas about what else to do.

Hope this helps.
spencer graves

Gregory Gentlemen wrote:

I'm trying to ascertain whether or not the facilities of R are sufficient for solving an optimization problem I've come across. [...]

--
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA
[EMAIL PROTECTED]
www.pdf.com
Tel: 408-938-4420
Fax: 408-280-7915

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
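Spencer's expand.grid/contour recipe can be sketched as follows. The g() here is a hypothetical toy function standing in for the real utility function, and the ranges are arbitrary; substitute your own.

```r
## Sketch of the contour-plot recipe above; g() is a hypothetical
## stand-in -- replace it with the real utility function.
g <- function(x, n) n * exp(-n) - x^2 / 10

x.max <- 5; n.max <- 10; npts <- 101
xs <- seq(0, x.max, length = npts)
ns <- seq(0, n.max, length = npts)

## evaluate g on the full grid; expand.grid varies x fastest, so the
## matrix has one row per x value and one column per n value
gr <- expand.grid(x = xs, n = ns)
z  <- matrix(g(gr$x, gr$n), nrow = npts)

## the g = 0 contour shows where g changes sign in the (x, n) plane
contour(xs, ns, z, levels = 0, xlab = "x", ylab = "n")
```

Widening or narrowing x.max, n.max, and npts, as Spencer suggests, then makes the boundary region obvious before any formal optimization is attempted.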
Re: [R] optimization problem in R ... can this be done?
Spencer: Thank you for the helpful suggestions. I have another question following some code I wrote. The function below gives a crude approximation to the x of interest (the value of x such that g(x, n) is less than 0 for all n).

## btilda optimizes g(n, x) over n for each fixed x, then approximately
## finds the x such that abs(g(n*, x)) = 0
btilda <- function(range, len) {
  ## range: interval over which to look for x
  bb <- seq(range[1], range[2], length = len)
  OBJ <- sapply(bb, function(x) {
    fixed <- c(x, 100, 692, 50, 1600, v1, 217227)
    return(optimize(g, c(1, 1000), maximum = TRUE, tol = 0.001, x = fixed)$objective)
  })
  tt <- data.frame(b = bb, obj = OBJ)
  tt$absobj <- abs(tt$obj)
  d <- tt[order(tt$absobj), ][1:3, ]
  return(as.vector(d))
}

For instance, a run of btilda(c(20.55806, 20.55816), 1) returns:

            b           obj       absobj
5834 20.55812 -0.0004942848 0.0004942848
5833 20.55812  0.0011715433 0.0011715433
5835 20.55812 -0.0021601140 0.0021601140

My question is how to improve the precision of b (which is x) here. It seems that seq(20.55806, 20.55816, length=1 ) is only precise to 5 or so digits, and thus is equivalent for numerous successive values. How can I get around this?

Spencer Graves [EMAIL PROTECTED] wrote: [...]
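One way around the finite grid resolution over x is to treat the inner maximization as defining a single function h(x) = max over n of g(x, n) and hand it to uniroot(), which refines the root to near machine precision rather than to the grid spacing. A sketch, with a hypothetical smooth g() standing in for the real one (chosen here so the root is exactly x = 20.55):

```r
## Sketch: solve h(x) = max_n g(n, x) = 0 with uniroot() instead of an
## ever-finer seq() grid over x.  g() is a hypothetical stand-in whose
## maximum over n is 1 - 10*(x - 20.45), so the true root is x = 20.55.
g <- function(n, x) 1 - (n - 5)^2 / 1e4 - 10 * (x - 20.45)

## inner problem: maximize over n for fixed x
h <- function(x)
  optimize(g, c(1, 1000), maximum = TRUE, tol = 1e-8, x = x)$objective

## outer problem: find the x at which the maximized objective crosses 0
xstar <- uniroot(h, interval = c(20.5, 20.6), tol = 1e-10)$root
xstar   # ~20.55, limited by uniroot's tol rather than by grid spacing
```

The bracketing interval c(20.5, 20.6) plays the role of the range argument to btilda; the grid search is still useful for locating a sign change to bracket.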
[R] optimization problem in R ... can this be done?
I'm trying to ascertain whether or not the facilities of R are sufficient for solving an optimization problem I've come across. Because of my limited experience with R, I would greatly appreciate some feedback from more frequent users.

The problem can be delineated as such: a utility function, which we shall call g, is a function of x and n: g(x, n). g has the properties: n > 0; x lies on the real line; g may take values along the real line; g is such that g(x, n) = g(-x, n). g is a decreasing function of x for any n; for fixed x, g(x, n) is smooth and initially decreases, reaching an inflection point, thereafter increasing until it reaches a maximum and then declines (so it is neither concave nor convex).

My optimization problem is to find the largest positive x such that g(x, n) is less than zero for all n. In fact, because of the symmetry of g around x = 0, we need only consider x > 0. Such an x does exist in this problem, and of course g obtains a maximum value of 0 at some n for this value of x. My issue is writing some code to systematically obtain this value. Is R capable of handling such a problem (i.e. through some sort of optimization function, or some sort of grid search with the relevant constraints)? Any suggestions would be appreciated.

Gregory Gentlemen [EMAIL PROTECTED]

The following is a sketch of an optimization problem I need to solve.