Re: [R] Optimization problem

2007-08-22 Thread Gabor Grothendieck
Try this.

1. Following Ben, remove the Randalstown point and reset the levels of the
Location factor.

2. Then replace solve with ginv, so that zicounts uses the generalized inverse
to compute the Hessian:

alan2 <- subset(alan, subset = Location != "Randalstown")
alan2$Location <- factor(as.character(alan2$Location))  # drop the unused level

library(MASS)
solve <- ginv  # mask base::solve so zicounts inverts the Hessian via ginv

zinb.zc <- zicounts(resp = Scars ~ ., x = ~Location + Lar + Mass + Lar:Mass
+ Location:Mass, z = ~Location + Lar + Mass + Lar:Mass + Location:Mass,
data = alan2)

rm(solve)  # remove the mask to restore base::solve
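
An editorial aside, not part of the original thread: the same class of
zero-inflated negative binomial model can also be fit with zeroinfl() from the
pscl package, which avoids zicounts' Hessian inversion. A minimal sketch,
assuming the cleaned alan2 data frame above; the reduced predictor set is an
illustrative choice, not the model from the thread:

library(pscl)
## count model on the left of |, zero-inflation model on the right
zinb.p <- zeroinfl(Scars ~ Location + Mass | Mass,
                   data = alan2, dist = "negbin")
summary(zinb.p)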

On 8/21/07, Ben Bolker [EMAIL PROTECTED] wrote:

  (Hope this gets threaded properly.  Sorry if it doesn't.)

   Gabor: Lac and Lacfac being the same is irrelevant, wouldn't
 produce NAs (but would produce something like a singular Hessian
 and maybe other problems) -- but they're not even specified in this
 model.

  The bottom line is that you have a location with a single
 observation, so the GLM that zicounts runs to get the initial
 parameter values has an unestimable location:mass interaction
 for one location, so it gives an NA, so optim complains.

  In gruesome detail:

 ## set up  data
 scardat = read.table("scars.dat", header=TRUE)
 library(zicounts)
 ## try to run model
 zinb.zc <- zicounts(resp=Scars~.,
x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
data=scardat)
 ## tried to debug this by dumping zicounts.R to a file, modifying
 ## it to put a trace argument in that would print out the parameters
 ## and log-likelihood for every call to the log-likelihood function.
 dump("zicounts", file="zicounts.R")
 source("zicounts.R")
 zinb.zc <- zicounts(resp=Scars~.,
x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
data=scardat,trace=TRUE)
 ## this actually didn't do any good because the negative log-likelihood
 ## function never gets called -- as it turns out optim() barfs when it
 ## gets its initial values, before it ever gets to evaluating the
 ## log-likelihood

 ## check the glm -- this is the equivalent of what zicounts does to
 ## get the initial values of the x parameters
 p1 <- glm(Scars~Location + Lar + Mass + Lar:Mass + Location:Mass,
  data=scardat,family=poisson)
 which(is.na(coef(p1)))

 ## find out what the deal is
 table(scardat$Location)

 scar2 = subset(scardat, Location != "Randalstown")
 ## first step to removing the bad point from the data set -- but ...
 table(scar2$Location)
 ## it leaves the Location factor with the same levels, so
 ##  now we have ZERO counts for one location:
 ## redefine the factor to drop unused levels
 scar2$Location <- factor(scar2$Location)
 ## OK, looks fine now
 table(scar2$Location)

 zinb.zc <- zicounts(resp=Scars~.,
x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
data=scar2)
 ## now we get another error (system is computationally singular when
 ## trying to compute Hessian -- overparameterized?)   Not in any
 ## trivial way that I can see.  It would be nice to get into the guts
 ## of zicounts and stop it from trying to invert the Hessian, which is
 ## I think where this happens.

  In the meanwhile, I have some other  ideas about this analysis (sorry,
 but you started it ...)

  Looking at the data in a few different ways:

 library(lattice)
 xyplot(Scars~Mass,groups=Location,data=scar2,jitter=TRUE,
   auto.key=list(columns=3))
 xyplot(Scars~Mass|Location,data=scar2,jitter=TRUE)

 xyplot(Scars~Lar,groups=Location,data=scar2,
   auto.key=list(columns=3))
 xyplot(Scars~Mass|Lar,data=scar2)
 xyplot(Scars~Lar|Location,data=scar2)

   Some thoughts: (1) I'm not at all sure that
 zero-inflation is necessary (see Warton 2005, Environmetrics).
 This is a fairly small, noisy data set without huge numbers
 of zeros -- a plain old negative binomial might be fine.

   I don't actually see a lot of signal here, period (although there may
 be some) ... there's not a huge range in Lar (whatever it is -- the rest
 of the covariates I think I can interpret).  It would be tempting to try
 to fit location as a random effect, because fitting all those extra
 degrees of freedom is going to kill you.  On the other hand, GLMMs are
 a bit hairy.

   cheers
   Ben



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Optimization problem

2007-08-21 Thread Alan Harrison
Hello Folks,

Very new to R so bear with me; I'm running 5.2 on XP.  I'm trying to fit a 
zero-inflated negative binomial regression with placental scar data as the 
dependent variable.  Lactation, location, number of tick larvae present and 
mass of mouse are the independents.  Dataframe and attributes below:


 Location Lac Scars Lar Mass Lacfac
1   Tullychurry   0 0  15 13.87  0
2  Somerset   0 0   0 15.60  0
3 Tollymore   0 0   3 16.43  0
4 Tollymore   0 0   0 16.55  0
5   Caledon   0 0   0 17.47  0
6  Hillsborough   1 5   0 18.18  1
7   Caledon   0 0   1 19.06  0
8   Portglenone   0 4   0 19.10  0
9   Portglenone   0 5   0 19.13  0
10Tollymore   0 5   3 19.50  0
11 Hillsborough   1 5   0 19.58  1
12  Portglenone   0 4   0 19.76  0
13  Caledon   0 8   0 19.97  0
14 Hillsborough   1 4   0 20.02  1
15  Tullychurry   0 3   3 20.13  0
16 Hillsborough   1 5   0 20.18  1
17   LoughNavar   1 5   0 20.20  1
18Tollymore   0 0   1 20.24  0
19 Hillsborough   1 5   0 20.48  1
20  Caledon   0 4   1 20.56  0
21  Caledon   0 3   2 20.58  0
22Tollymore   0 4   3 20.58  0
23Tollymore   0 0   2 20.88  0
24 Hillsborough   1 0   0 21.01  1
25  Portglenone   0 5   0 21.08  0
26  Tullychurry   0 2   5 21.28  0
27 Ballysallagh   1 4   0 21.59  1
28  Caledon   0 0   1 21.68  0
29 Hillsborough   1 5   0 22.09  1
30  Tullychurry   0 5   5 22.28  0
31  Tullychurry   1 6  75 22.43  1
32 Ballysallagh   1 5   0 22.57  1
33 Ballysallagh   1 4   0 22.67  1
34   LoughNavar   1 5   3 22.71  1
35 Hillsborough   1 4   0 23.01  1
36  Caledon   0 0   3 23.08  0
37   LoughNavar   1 5   0 23.53  1
38 Ballysallagh   1 4   0 23.55  1
39  Portglenone   1 6   0 23.61  1
40   Mt.Stewart   0 3   0 23.70  0
41 Somerset   0 5   0 23.83  0
42 Ballysallagh   1 5   0 23.93  1
43 Ballysallagh   1 5   0 24.01  1
44  Caledon   0 0   3 24.14  0
45   LoughNavar   0 6   0 24.30  0
46   LoughNavar   1 5   0 24.34  1
47 Hillsborough   1 4   0 24.45  1
48  Caledon   0 3   2 24.55  0
49  Tullychurry   0 5  44 24.83  0
50 Hillsborough   1 5   0 24.86  1
51 Ballysallagh   1 5   0 25.02  1
52  Tullychurry   0 0   9 25.27  0
53   Mt.Stewart   0 5   0 25.31  0
54   LoughNavar   1 4   8 25.43  1
55 Somerset   1 0   0 25.58  1
56 Hillsborough   1 5   0 25.82  1
57  Portglenone   1 2   0 26.02  1
58 Ballysallagh   1 5   0 26.19  1
59   Mt.Stewart   1 0   0 26.66  1
60  Randalstown   1 0   1 26.70  1
61 Somerset   0 4   0 27.01  0
62   Mt.Stewart   0 4   0 27.05  0
63 Somerset   0 3   0 27.10  0
64 Somerset   0 6   0 27.34  0
65 Somerset   0 0   0 27.87  0
66   LoughNavar   1 5   1 28.01  1
67  Tullychurry   1 6  42 28.55  1
68 Hillsborough   1 5   0 28.84  1
69  Portglenone   1 4   0 29.00  1
70 Somerset   1 4   0 31.87  1
71 Ballysallagh   1 5   0 33.06  1
72   LoughNavar   1 4   0 33.24  1
73 Somerset   1 4   0 33.36  1

alan : 'data.frame':   73 obs. of  6 variables:
 $ Location: Factor w/ 10 levels "Ballysallagh",..: 10 8 9 9 2 3 2 6 6 9 ...
 $ Lac     : int  0 0 0 0 0 1 0 0 0 0 ...
 $ Scars   : int  0 0 0 0 0 5 0 4 5 5 ...
 $ Lar     : int  15 0 3 0 0 0 1 0 0 3 ...
 $ Mass    : num  13.9 15.6 16.4 16.6 17.5 ...
 $ Lacfac  : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...

The syntax I used to create the model is:

zinb.zc <- zicounts(resp=Scars~., x =~Location + Lar + Mass + Lar:Mass + 
Location:Mass, z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=alan)

The error given is:

Error in optim(par = parm, fn = neg.like, gr = neg.grad, hessian = TRUE,  : 
non-finite value supplied by optim
In addition: Warning message:
fitted probabilities numerically 0 or 1 occurred in: glm.fit(zz, 1 - pmin(y, 
1), family = binomial())

I understand this is a problem with the model I specified; could anyone help 
out?

Many thanks

Alan Harrison

Quercus
Queen's University Belfast
MBC, 97 Lisburn Road
Belfast

BT9 7BL

T: 02890 972219
M: 07798615682



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Optimization problem

2007-08-21 Thread Gabor Grothendieck
Lac and Lacfac are the same.

On 8/21/07, Alan Harrison [EMAIL PROTECTED] wrote:
 Hello Folks,

 Very new to R so bear with me; I'm running 5.2 on XP.  I'm trying to fit a 
 zero-inflated negative binomial regression with placental scar data as the 
 dependent variable.  Lactation, location, number of tick larvae present and 
 mass of mouse are the independents.  Dataframe and attributes below:


  Location Lac Scars Lar Mass Lacfac
 1   Tullychurry   0 0  15 13.87  0
 2  Somerset   0 0   0 15.60  0
 3 Tollymore   0 0   3 16.43  0
 4 Tollymore   0 0   0 16.55  0
 5   Caledon   0 0   0 17.47  0
 6  Hillsborough   1 5   0 18.18  1
 7   Caledon   0 0   1 19.06  0
 8   Portglenone   0 4   0 19.10  0
 9   Portglenone   0 5   0 19.13  0
 10Tollymore   0 5   3 19.50  0
 11 Hillsborough   1 5   0 19.58  1
 12  Portglenone   0 4   0 19.76  0
 13  Caledon   0 8   0 19.97  0
 14 Hillsborough   1 4   0 20.02  1
 15  Tullychurry   0 3   3 20.13  0
 16 Hillsborough   1 5   0 20.18  1
 17   LoughNavar   1 5   0 20.20  1
 18Tollymore   0 0   1 20.24  0
 19 Hillsborough   1 5   0 20.48  1
 20  Caledon   0 4   1 20.56  0
 21  Caledon   0 3   2 20.58  0
 22Tollymore   0 4   3 20.58  0
 23Tollymore   0 0   2 20.88  0
 24 Hillsborough   1 0   0 21.01  1
 25  Portglenone   0 5   0 21.08  0
 26  Tullychurry   0 2   5 21.28  0
 27 Ballysallagh   1 4   0 21.59  1
 28  Caledon   0 0   1 21.68  0
 29 Hillsborough   1 5   0 22.09  1
 30  Tullychurry   0 5   5 22.28  0
 31  Tullychurry   1 6  75 22.43  1
 32 Ballysallagh   1 5   0 22.57  1
 33 Ballysallagh   1 4   0 22.67  1
 34   LoughNavar   1 5   3 22.71  1
 35 Hillsborough   1 4   0 23.01  1
 36  Caledon   0 0   3 23.08  0
 37   LoughNavar   1 5   0 23.53  1
 38 Ballysallagh   1 4   0 23.55  1
 39  Portglenone   1 6   0 23.61  1
 40   Mt.Stewart   0 3   0 23.70  0
 41 Somerset   0 5   0 23.83  0
 42 Ballysallagh   1 5   0 23.93  1
 43 Ballysallagh   1 5   0 24.01  1
 44  Caledon   0 0   3 24.14  0
 45   LoughNavar   0 6   0 24.30  0
 46   LoughNavar   1 5   0 24.34  1
 47 Hillsborough   1 4   0 24.45  1
 48  Caledon   0 3   2 24.55  0
 49  Tullychurry   0 5  44 24.83  0
 50 Hillsborough   1 5   0 24.86  1
 51 Ballysallagh   1 5   0 25.02  1
 52  Tullychurry   0 0   9 25.27  0
 53   Mt.Stewart   0 5   0 25.31  0
 54   LoughNavar   1 4   8 25.43  1
 55 Somerset   1 0   0 25.58  1
 56 Hillsborough   1 5   0 25.82  1
 57  Portglenone   1 2   0 26.02  1
 58 Ballysallagh   1 5   0 26.19  1
 59   Mt.Stewart   1 0   0 26.66  1
 60  Randalstown   1 0   1 26.70  1
 61 Somerset   0 4   0 27.01  0
 62   Mt.Stewart   0 4   0 27.05  0
 63 Somerset   0 3   0 27.10  0
 64 Somerset   0 6   0 27.34  0
 65 Somerset   0 0   0 27.87  0
 66   LoughNavar   1 5   1 28.01  1
 67  Tullychurry   1 6  42 28.55  1
 68 Hillsborough   1 5   0 28.84  1
 69  Portglenone   1 4   0 29.00  1
 70 Somerset   1 4   0 31.87  1
 71 Ballysallagh   1 5   0 33.06  1
 72   LoughNavar   1 4   0 33.24  1
 73 Somerset   1 4   0 33.36  1

 alan : 'data.frame':   73 obs. of  6 variables:
  $ Location: Factor w/ 10 levels "Ballysallagh",..: 10 8 9 9 2 3 2 6 6 9 ...
  $ Lac     : int  0 0 0 0 0 1 0 0 0 0 ...
  $ Scars   : int  0 0 0 0 0 5 0 4 5 5 ...
  $ Lar     : int  15 0 3 0 0 0 1 0 0 3 ...
  $ Mass    : num  13.9 15.6 16.4 16.6 17.5 ...
  $ Lacfac  : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...

 The syntax I used to create the model is:

 zinb.zc <- zicounts(resp=Scars~., x =~Location + Lar + Mass + Lar:Mass + 
 Location:Mass, z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=alan)

 The error given is:

 Error in optim(par = parm, fn = neg.like, gr = neg.grad, hessian = TRUE,  :
non-finite value supplied by optim
 In addition: Warning message:
 fitted probabilities numerically 0 or 1 occurred in: glm.fit(zz, 1 - pmin(y, 
 1), family = binomial())

 I understand this is a problem with the model I specified; could anyone help 
 out?

 Many thanks

 Alan Harrison

 Quercus
 Queen's University Belfast
 MBC, 97 Lisburn Road
 Belfast

 BT9 7BL

 T: 02890 972219
 M: 07798615682



 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

Re: [R] Optimization problem

2007-08-21 Thread Ben Bolker

  (Hope this gets threaded properly.  Sorry if it doesn't.)

   Gabor: Lac and Lacfac being the same is irrelevant, wouldn't
produce NAs (but would produce something like a singular Hessian
and maybe other problems) -- but they're not even specified in this
model.

  The bottom line is that you have a location with a single
observation, so the GLM that zicounts runs to get the initial
parameter values has an unestimable location:mass interaction
for one location, so it gives an NA, so optim complains.

  In gruesome detail:

## set up  data
scardat = read.table("scars.dat", header=TRUE)
library(zicounts)
## try to run model
zinb.zc <- zicounts(resp=Scars~.,
x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
data=scardat)
## tried to debug this by dumping zicounts.R to a file, modifying
## it to put a trace argument in that would print out the parameters
## and log-likelihood for every call to the log-likelihood function.
dump("zicounts", file="zicounts.R")
source("zicounts.R")
zinb.zc <- zicounts(resp=Scars~.,
x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
data=scardat,trace=TRUE)
## this actually didn't do any good because the negative log-likelihood
## function never gets called -- as it turns out optim() barfs when it
## gets its initial values, before it ever gets to evaluating the
## log-likelihood

## check the glm -- this is the equivalent of what zicounts does to
## get the initial values of the x parameters
p1 <- glm(Scars~Location + Lar + Mass + Lar:Mass + Location:Mass,
  data=scardat,family=poisson)
which(is.na(coef(p1)))

## find out what the deal is
table(scardat$Location)

scar2 = subset(scardat, Location != "Randalstown")
## first step to removing the bad point from the data set -- but ...
table(scar2$Location)
## it leaves the Location factor with the same levels, so
##  now we have ZERO counts for one location:
## redefine the factor to drop unused levels
scar2$Location <- factor(scar2$Location)
## OK, looks fine now
table(scar2$Location)
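
## An aside not in Ben's original message: in later versions of R the
## same level-dropping can be written with droplevels(), e.g.
scar2$Location <- droplevels(scar2$Location)  # same effect as factor(scar2$Location)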

zinb.zc <- zicounts(resp=Scars~.,
x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
data=scar2)
## now we get another error (system is computationally singular when
## trying to compute Hessian -- overparameterized?)   Not in any
## trivial way that I can see.  It would be nice to get into the guts
## of zicounts and stop it from trying to invert the Hessian, which is
## I think where this happens.

  In the meanwhile, I have some other  ideas about this analysis (sorry,
but you started it ...)

  Looking at the data in a few different ways:

library(lattice)
xyplot(Scars~Mass,groups=Location,data=scar2,jitter=TRUE,
   auto.key=list(columns=3))
xyplot(Scars~Mass|Location,data=scar2,jitter=TRUE)

xyplot(Scars~Lar,groups=Location,data=scar2,
   auto.key=list(columns=3))
xyplot(Scars~Mass|Lar,data=scar2)
xyplot(Scars~Lar|Location,data=scar2)

   Some thoughts: (1) I'm not at all sure that
zero-inflation is necessary (see Warton 2005, Environmetrics).
This is a fairly small, noisy data set without huge numbers
of zeros -- a plain old negative binomial might be fine.
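
A sketch of that alternative (my addition, not Ben's code): a plain negative
binomial via glm.nb() from MASS, with a deliberately reduced right-hand side
since the full interaction model is what caused the trouble; the formula is an
illustrative choice:

library(MASS)
nb1 <- glm.nb(Scars ~ Location + Mass, data = scar2)
summary(nb1)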
 
   I don't actually see a lot of signal here, period (although there may 
be some) ... there's not a huge range in Lar (whatever it is -- the rest
of the covariates I think I can interpret).  It would be tempting to try
to fit location as a random effect, because fitting all those extra
degrees of freedom is going to kill you.  On the other hand, GLMMs are a
bit hairy.
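
A sketch of the random-effects idea (my addition, using lme4's current API,
which postdates this thread): Location as a random intercept in a Poisson
GLMM; the fixed-effect part is an illustrative choice.

library(lme4)
gm1 <- glmer(Scars ~ Mass + Lar + (1 | Location),
             data = scar2, family = poisson)
summary(gm1)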

   cheers
   Ben

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Optimization problem: selecting independent rows to maximize the mean

2006-03-06 Thread Jasjeet S. Sekhon

 Does R have packages for such multi-objective optimization problems?

The rgenoud (R-GENetic Optimization Using Derivatives) package
allows for multiple-objective optimization problems.  See the lexical
option, which searches for the Pareto front.  The package is written
for NP-hard problems (but they are...well...difficult).
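
A minimal sketch of lexical optimization (my addition, not Jas's code), under
the assumption that genoud's lexical argument takes the number of fit criteria;
the two-criterion fitness function and its domain are made-up illustrations:

library(rgenoud)
## with lexical = 2, genoud optimizes the first criterion and uses the
## second only to break ties among solutions
fitness <- function(par) c(mean(par),   # criterion 1: maximize the mean
                           -sd(par))    # criterion 2: then minimize the sd
out <- genoud(fitness, nvars = 5, max = TRUE, lexical = 2,
              Domains = cbind(rep(0, 5), rep(1, 5)),
              pop.size = 200, print.level = 0)
out$value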

See CRAN or:

http://sekhon.berkeley.edu/rgenoud/

Cheers,
Jas.

===
Jasjeet S. Sekhon 
  
Associate Professor 
Survey Research Center  
UC Berkeley 

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
===


nojhan wrote:
 On Wed, 01 Mar 2006 13:07:07 -0800, Berton Gunter wrote:
 
2) That the mean and sd can be simultaneously optimized as you
describe--
what if the subset with maximum mean also has bigger than minimal sd?
 
 
 Then you have two choices:
 1) balance the two objectives with weights, according to the importance
 you give to each one
 2) get a list of non-dominated solutions (a Pareto front)
 
 Does R have packages for such multi-objectives optimization problems ?
 
 Moreover, does it have a package for difficult (i.e. NP-hard)
problems ?


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Optimization problem: selecting independent rows to maximize the mean

2006-03-04 Thread Spencer Graves
  Regarding multi-objective optimization, I just got 0 hits from
RSiteSearch("multi-objective optimization") and
RSiteSearch("multiobjective optimization").  However, it shouldn't be
too difficult to write a wrapper function that blends the objectives
however you would like, then use optim or nlminb or one of the other
optimizers in R.
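
A minimal sketch of that wrapper idea (my addition, not Spencer's code); the
two objectives f1 and f2 and the weight are made-up illustrations:

f1 <- function(x) sum((x - 1)^2)    # first objective (minimize)
f2 <- function(x) sum(abs(x + 1))   # second objective (minimize)
blend <- function(x, w) w * f1(x) + (1 - w) * f2(x)

## optim passes w through ... to blend; vary w in (0,1) to trace out
## different compromises between the two objectives
fit <- optim(par = c(0, 0), fn = blend, w = 0.7)
fit$par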

  I don't feel qualified to even comment on 'difficult (i.e. NP-hard) 
problems'.  

  hope this helps,
  spencer graves

nojhan wrote:
 On Wed, 01 Mar 2006 13:07:07 -0800, Berton Gunter wrote:
 
2) That the mean and sd can be simultaneously optimized as you describe--
what if the subset with maximum mean also has bigger than minimal sd?
 
 
 Then you have two choices:
 1) balance the two objectives with weights, according to the importance
 you give to each one
 2) get a list of non-dominated solutions (a Pareto front)
 
 Does R have packages for such multi-objectives optimization problems ?
 
 Moreover, does it have a package for difficult (i.e. NP-hard) problems ?


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Optimization problem: selecting independent rows to maximize the mean

2006-03-02 Thread nojhan
On Wed, 01 Mar 2006 13:07:07 -0800, Berton Gunter wrote:
 2) That the mean and sd can be simultaneously optimized as you describe--
 what if the subset with maximum mean also has bigger than minimal sd?

Then you have two choices:
1) balance the two objectives with weights, according to the importance
you give to each one
2) get a list of non-dominated solutions (a Pareto front) -- see the
small filtering sketch below
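
A small sketch of choice 2 (my addition, not nojhan's code): keep only the
non-dominated (mean, sd) pairs from a set of candidate solutions; the
candidate data here are simulated.

set.seed(1)
cand <- data.frame(mn = runif(50), sd = runif(50))  # want high mn, low sd
dominated <- sapply(seq_len(nrow(cand)), function(i) {
  any(cand$mn >= cand$mn[i] & cand$sd <= cand$sd[i] &
      (cand$mn > cand$mn[i] | cand$sd < cand$sd[i]))
})
pareto <- cand[!dominated, ]     # the Pareto front
pareto[order(pareto$mn), ]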

Does R have packages for such multi-objective optimization problems?

Moreover, does it have a package for difficult (i.e. NP-hard) problems?

-- 
nojhan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Optimization problem: selecting independent rows to maximize the mean

2006-03-01 Thread Mark
Dear R community,

I have a dataframe with 500,000 rows and 102 columns. The rows
represent spatial polygons, some of which overlap others (i.e., not
all rows are independent of each other).

Given a particular row, the first column contains a unique RowID.
The second column contains the Variable of interest. The remaining
100 columns (Overlap1 ... Overlap100) each contain a row ID that
overlaps this row (but if this row overlaps fewer than 100 other rows
then the remainder of the columns OL1...OL100 contain NA).

Here's the problem: I need to select the subset of 500 independent
rows that maximizes the mean and minimizes the stdev of Variable.

Clearly this requires iterative selection and comparison of rows,
because each newly-selected row must be compared to rows already
selected to ensure it does not overlap them. At each step, a row
already selected might be removed from the subset if it can be
replaced with another that increases the mean and/or reduces the
stdev.

The above description is a simplification of my problem, but it's a start.

As I am new to R (and programming in general) I'm not sure how to
start thinking about this, or even where to look. I'd appreciate any
ideas that might help.

Thank you, Mark

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Optimization problem: selecting independent rows to maximize the mean

2006-03-01 Thread Berton Gunter
This sounds either easy via a greedy algorithm or NP-hard. Moreover, it is
not clear to me that

1) A subset of 500 independent rows exists, where I presume independent
means pairwise nonoverlapping;

2) That the mean and sd can be simultaneously optimized as you describe--
what if the subset with maximum mean also has bigger than minimal sd?


-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Mark
 Sent: Wednesday, March 01, 2006 12:40 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Optimization problem: selecting independent rows 
 to maximize the mean
 
 Dear R community,
 
 I have a dataframe with 500,000 rows and 102 columns. The rows
 represent spatial polygons, some of which overlap others (i.e., not
 all rows are independent of each other).
 
 Given a particular row, the first column contains a unique RowID.
 The second column contains the Variable of interest. The remaining
 100 columns (Overlap1 ... Overlap100) each contain a row ID that
 overlaps this row (but if this row overlaps fewer than 100 other rows
 then the remainder of the columns OL1...OL100 contain NA).
 
 Here's the problem: I need to select the subset of 500 independent
 rows that maximizes the mean and minimizes the stdev of Variable.
 
 Clearly this requires iterative selection and comparison of rows,
 because each newly-selected row must be compared to rows already
 selected to ensure it does not overlap them. At each step, a row
 already selected might be removed from the subset if it can be
 replaced with another that increases the mean and/or reduces the
 stdev.
 
 The above description is a simplification of my problem, but 
 it's a start.
 
 As I am new to R (and programming in general) I'm not sure how to
 start thinking about this, or even where to look. I'd appreciate any
 ideas that might help.
 
 Thank you, Mark
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Optimization problem: selecting independent rows to maximize the mean

2006-03-01 Thread Gabor Grothendieck
Package lpSolve might help.
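
A sketch of how (my addition, not Gabor's code): with a fixed subset size,
maximizing the mean is the same as maximizing the sum, so the selection can be
posed as a 0/1 program with one "pick at most one of each overlapping pair"
constraint per overlap. Toy sizes here -- 3 rows out of 10, with made-up
overlaps -- rather than 500 out of 500,000:

library(lpSolve)
set.seed(1)
n     <- 10
val   <- runif(n)                          # the Variable column
pairs <- rbind(c(1, 2), c(2, 3), c(4, 7))  # made-up overlapping row pairs

cons <- rbind(rep(1, n),                   # row 1: pick exactly 3 rows
              t(apply(pairs, 1, function(p) replace(numeric(n), p, 1))))
sol <- lp("max", objective.in = val, const.mat = cons,
          const.dir = c("=", rep("<=", nrow(pairs))),
          const.rhs = c(3, rep(1, nrow(pairs))), all.bin = TRUE)
which(sol$solution == 1)                   # indices of the chosen rows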

On 3/1/06, Mark [EMAIL PROTECTED] wrote:
 Dear R community,

 I have a dataframe with 500,000 rows and 102 columns. The rows
 represent spatial polygons, some of which overlap others (i.e., not
 all rows are independent of each other).

 Given a particular row, the first column contains a unique RowID.
 The second column contains the Variable of interest. The remaining
 100 columns (Overlap1 ... Overlap100) each contain a row ID that
 overlaps this row (but if this row overlaps fewer than 100 other rows
 then the remainder of the columns OL1...OL100 contain NA).

 Here's the problem: I need to select the subset of 500 independent
 rows that maximizes the mean and minimizes the stdev of Variable.

 Clearly this requires iterative selection and comparison of rows,
 because each newly-selected row must be compared to rows already
 selected to ensure it does not overlap them. At each step, a row
 already selected might be removed from the subset if it can be
 replaced with another that increases the mean and/or reduces the
 stdev.

 The above description is a simplification of my problem, but it's a start.

 As I am new to R (and programming in general) I'm not sure how to
 start thinking about this, or even where to look. I'd appreciate any
 ideas that might help.

 Thank you, Mark

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] optimization problem in R ... can this be done?

2005-06-26 Thread Uwe Ligges
Gregory Gentlemen wrote:

 Spencer: Thank you for the helpful suggestions. I have another
 question following some code I wrote. The function below gives a
 crude approximation for the x of interest (that value of x such that
 g(x,n) is less than 0 for all n).
 
 # // btilda optimize g(n,x) for some fixed x, and then approximately
 finds that g(n,x) #such that abs(g(n*,x)=0 // btilda -
 function(range,len) { # range: over which to look for x bb -
 seq(range[1],range[2],length=len) OBJ - sapply(bb,function(x) {fixed
 - c(x,100,692,50,1600,v1,217227);
 return(optimize(g,c(1,1000),maximum=TRUE,tol=0.001,x=fixed)$objective)})
  tt - data.frame(b=bb,obj=OBJ) tt$absobj - abs(tt$obj) d -
 tt[order(tt$absobj),][1:3,] return(as.vector(d)) }
 
 For instance a run of
 
 btilda(c(20.55806,20.55816),1)  returns:
 
             b            obj        absobj
 5834  20.55812  -0.0004942848  0.0004942848
 5833  20.55812   0.0011715433  0.0011715433
 5835  20.55812  -0.0021601140  0.0021601140
 
 My question is how to improve the precision of b (which is x) here.
 It seems that seq(20.55806,20.55816,length=1 ) is only precise to
 5 or so digits, and thus, is equivalent for numerous succesive

Why do you think so? It is much more accurate! See ?.Machine

Uwe Ligges



 values. How can I get around this?
 
 
 Spencer Graves [EMAIL PROTECTED] wrote: Part of the R culture
 is a statement by Simon Blomberg immortalized in library(fortunes)
 as, "This is R. There is no if. Only how."
 
 I can't see now how I would automate a complete solution to your 
 problem in general. However, given a specific g(x, n), I would start
 by writing a function to use expand.grid and contour to make a
 contour plot of g(x, n) over specified ranges for x = seq(0, x.max,
 length=npts) and n = seq(0, n.max, npts) for a specified number of
 points npts. Then I'd play with x.max, n.max, and npts until I got
 what I wanted. With the right choices for x.max, n.max, and npts, the
 solution will be obvious from the plot. In some cases, nothing more
 will be required.
 
 If I wanted more than that, I would need to exploit further some 
 specifics of the problem. For that, permit me to restate some of what
 I think I understood of your specific problem:
 
 (1) For fixed n, g(x, n) is monotonically decreasing in x>0.
 
 (2) For fixed x, g(x, n) has only two local maxima, one at n=0 (or
 n=eps>0, eps arbitrarily small) and the other at n2(x), say, with a
 local minimum in between at n1(x), say.
 
 With this, I would write functions to find n1(x) and n2(x) given x. I
 might not even need n1(x) if I could figure out how to obtain n2(x) 
 without it. Then I'd make a plot with two lines (using plot and 
 lines) of g(x, 0) and g(x, n2(x)) vs. x.
 
 By the time I'd done all that, if I still needed more, I'd probably 
 have ideas about what else to do.
 
 hope this helps. spencer graves
 
 
 Gregory Gentlemen wrote:
 
 
 I'm trying to ascertain whether or not the facilities of R are
 sufficient for solving an optimization problem I've come across.
 Because of my limited experience with R, I would greatly appreciate
 some feedback from more frequent users. The problem can be
 delineated as such:
 
 A utility function, we shall call g is a function of x, n ...
 g(x,n). g has the properties: n  0, x lies on the real line. g may
 take values along the real line. g is such that g(x,n)=g(-x,n). g
 is a decreasing function of x for any n; for fixed x, g(x,n) is
 smooth and intially decreases upon reaching an inflection point,
 thereafter increasing until it reaches a maxima and then declinces
 (neither concave nor convex).
 
 My optimization problem is to find the largest positive x such that
 g(x,n) is less than zero for all n. In fact, because of the
 symmetry of g around x, we need only consider x > 0. Such an x does
 exist in this problem, and of course g obtains a maximum value of
 0 at some n for this value of x. My issue is writing some code to
 systematically obtain this value.
 
 Is R capable of handling such a problem? (i.e. through some sort of
 optimization function, or some sort of grid search with the
 relevant constraints)
 
 Any suggestions would be appreciated.
 
 Gregory Gentlemen [EMAIL PROTECTED]
 
 
 
 The following is a sketch of an optimization problem I need to
 solve.
 
 __
 
 
 
 
 __ 
 R-help@stat.math.ethz.ch mailing list 
 https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
 posting guide! http://www.R-project.org/posting-guide.html
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] optimization problem in R ... can this be done?

2005-06-26 Thread Spencer Graves
  The precision is not a problem, only the display, as Uwe indicated. 
Consider the following:

  (seq(25.5,25.6,length=20)-25.5)[c(1, 2, 19, 20)]
[1] 0.00e+00 5.25e-07 9.50e-02 1.00e-01
  ?options
  options(digits=20)
  seq(25.5,25.6,length=20)[c(1, 2, 19, 20)]
[1] 25.5 25.50525 25.59475 25.6
 
  spencer graves

Gregory Gentlemen wrote:

 Okay, let me attempt to be clear:
 if I construct the following sequence in R:
  
 seq(25.5,25.6,length=20)
  
 For instance, the last 10 elements of the sequence are all 26, and the 
 preceding 20 are all 25.5. Presumably some rounding up is being
 done. How do I adjust the precision here such that each element is distinct?
  
 Thanks in advance guys,
 Gregory
 [EMAIL PROTECTED]
 
 Uwe Ligges [EMAIL PROTECTED] wrote:
 
 Gregory Gentlemen wrote:
 
   Spencer: Thank you for the helpful suggestions. I have another
   question following some code I wrote. The function below gives a
   crude approximation for the x of interest (that value of x such that
   g(x,n) is less than 0 for all n).
  
   # // btilda optimizes g(n,x) over n for some fixed x, and then
   # approximately finds the x such that abs(g(n*,x)) = 0 //
   btilda <- function(range, len) {
     # range: over which to look for x
     bb <- seq(range[1], range[2], length = len)
     OBJ <- sapply(bb, function(x) {
       fixed <- c(x, 100, 692, 50, 1600, v1, 217227)
       return(optimize(g, c(1, 1000), maximum = TRUE, tol = 0.001,
                       x = fixed)$objective)
     })
     tt <- data.frame(b = bb, obj = OBJ)
     tt$absobj <- abs(tt$obj)
     d <- tt[order(tt$absobj), ][1:3, ]
     return(as.vector(d))
   }
  
   For instance a run of
  
   btilda(c(20.55806,20.55816),1)  returns:
  
               b            obj        absobj
   5834  20.55812  -0.0004942848  0.0004942848
   5833  20.55812   0.0011715433  0.0011715433
   5835  20.55812  -0.0021601140  0.0021601140
  
   My question is how to improve the precision of b (which is x) here.
   It seems that seq(20.55806,20.55816,length=1 ) is only precise to
   5 or so digits, and thus, is equivalent for numerous successive
 
 Why do you think so? It is much more accurate! See ?.Machine
 
 Uwe Ligges
 
 
 
   values. How can I get around this?
  
  
   Spencer Graves wrote: Part of the R culture
   is a statement by Simon Blomberg immortalized in library(fortunes)
   as, "This is R. There is no if. Only how."
  
   I can't see now how I would automate a complete solution to your
   problem in general. However, given a specific g(x, n), I would start
   by writing a function to use expand.grid and contour to make a
   contour plot of g(x, n) over specified ranges for x = seq(0, x.max,
   length=npts) and n = seq(0, n.max, npts) for a specified number of
   points npts. Then I'd play with x.max, n.max, and npts until I got
   what I wanted. With the right choices for x.max, n.max, and npts, the
   solution will be obvious from the plot. In some cases, nothing more
   will be required.
  
   If I wanted more than that, I would need to exploit further some
   specifics of the problem. For that, permit me to restate some of what
   I think I understood of your specific problem:
  
   (1) For fixed n, g(x, n) is monotonically decreasing in x>0.
  
   (2) For fixed x, g(x, n) has only two local maxima, one at n=0 (or
   n=eps>0, eps arbitrarily small) and the other at n2(x), say, with a
   local minimum in between at n1(x), say.
  
   With this, I would write functions to find n1(x) and n2(x) given x. I
   might not even need n1(x) if I could figure out how to obtain n2(x)
   without it. Then I'd make a plot with two lines (using plot and
   lines) of g(x, 0) and g(x, n2(x)) vs. x.
  
   By the time I'd done all that, if I still needed more, I'd probably
   have ideas about what else to do.
  
   hope this helps. spencer graves
  
  
   Gregory Gentlemen wrote:
  
  
   I'm trying to ascertain whether or not the facilities of R are
   sufficient for solving an optimization problem I've come across.
   Because of my limited experience with R, I would greatly appreciate
   some feedback from more frequent users. The problem can be
   delineated as such:
  
   A utility function, we shall call g is a function of x, n ...
   g(x,n). g has the properties: n > 0, x lies on the real line. g may
   take values along the real line. g is such that g(x,n)=g(-x,n). g
   is a decreasing function of x for any n; for fixed x, g(x,n) is
   smooth and initially decreases upon reaching an inflection point,
   thereafter increasing until it reaches a maximum and then declines
   (neither concave nor convex).
  
   My 

Re: [R] optimization problem in R ... can this be done?

2005-06-25 Thread Spencer Graves
  Part of the R culture is a statement by Simon Blomberg immortalized 
in library(fortunes) as, "This is R. There is no if. Only how."

  I can't see now how I would automate a complete solution to your 
problem in general.  However, given a specific g(x, n), I would start by 
writing a function to use expand.grid and contour to make a contour 
plot of g(x, n) over specified ranges for x = seq(0, x.max, length=npts) 
and n = seq(0, n.max, npts) for a specified number of points npts.  Then 
I'd play with x.max, n.max, and npts until I got what I wanted.  With 
the right choices for x.max, n.max, and npts, the solution will be 
obvious from the plot.  In some cases, nothing more will be required.
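
A minimal sketch of that grid-and-contour step (my addition); the g used here
is a made-up stand-in with roughly the qualitative properties described, not
the poster's function:

g <- function(x, n) 4 * n * exp(-n) - x^2 - 1   # stand-in utility function
x.max <- 2; n.max <- 10; npts <- 101
xs <- seq(0, x.max, length = npts)
ns <- seq(0, n.max, length = npts)
grid <- expand.grid(x = xs, n = ns)
z <- matrix(g(grid$x, grid$n), npts, npts)      # x varies fastest: rows = x
## the zero contour separates the g < 0 region from the g > 0 region
contour(xs, ns, z, levels = 0, xlab = "x", ylab = "n")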

  If I wanted more than that, I would need to exploit further some 
specifics of the problem.  For that, permit me to restate some of what I 
think I understood of your specific problem:

  (1) For fixed n, g(x, n) is monotonically decreasing in x>0.

  (2) For fixed x, g(x, n) has only two local maxima, one at n=0 (or 
n=eps>0, eps arbitrarily small) and the other at n2(x), say, with a 
local minimum in between at n1(x), say.

  With this, I would write functions to find n1(x) and n2(x) given x. 
I might not even need n1(x) if I could figure out how to obtain n2(x) 
without it.  Then I'd make a plot with two lines (using plot and 
lines) of g(x, 0) and g(x, n2(x)) vs. x.
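
And a sketch of the n2(x)/profile idea (my addition, with the same stand-in g
as in the sketch above): optimize() finds the interior maximum n2(x), and
uniroot() then finds the x at which the profiled maximum g(x, n2(x)) crosses
zero:

g    <- function(x, n) 4 * n * exp(-n) - x^2 - 1
n2   <- function(x) optimize(function(n) g(x, n), c(0, 10),
                             maximum = TRUE)$maximum
gmax <- function(x) g(x, n2(x))      # largest g over n, for a given x
xstar <- uniroot(gmax, c(0.01, 2))$root
xstar                                # about 0.687 for this stand-in g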

  By the time I'd done all that, if I still needed more, I'd probably 
have ideas about what else to do.

  hope this helps.  
  spencer graves


Gregory Gentlemen wrote:

 I'm trying to ascertain whether or not the facilities of R are sufficient for 
 solving an optimization problem I've come across. Because of my limited 
 experience with R, I would greatly appreciate some feedback from more 
 frequent users.
 The problem can be delineated as such:
  
 A utility function, we shall call g is a function of x, n ... g(x,n). g has 
 the properties: n > 0, x lies on the real line. g may take values along the 
 real line. g is such that g(x,n)=g(-x,n). g is a decreasing function of x for 
 any n; for fixed x, g(x,n) is smooth and initially decreases upon reaching an 
 inflection point, thereafter increasing until it reaches a maximum and then 
 declines (neither concave nor convex).
  
 My optimization problem is to find the largest positive x such that g(x,n) is 
 less than zero for all n. In fact, because of the symmetry of g around x, we 
 need only consider x > 0. Such an x does exist in this problem, and of 
 course g obtains a maximum value of 0 at some n for this value of x. My issue 
 is writing some code to systematically obtain this value. 
  
 Is R capable of handling such a problem? (i.e. through some sort of 
 optimization function, or some sort of grid search with the relevant 
 constraints)
  
 Any suggestions would be appreciated.
  
 Gregory Gentlemen
 [EMAIL PROTECTED]
 
  
  
 The following is a sketch of an optimization problem I need to solve.
 
 __
 
 
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

[EMAIL PROTECTED]
www.pdf.com
Tel:  408-938-4420
Fax: 408-280-7915

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] optimization problem in R ... can this be done?

2005-06-25 Thread Gregory Gentlemen
Spencer: Thank you for the helpful suggestions.
I have another question following some code I wrote. The function below
gives a crude approximation for the x of interest (that value of x such that 
g(x,n) is less than 0 for all n).
 
# // btilda optimizes g(n,x) over n for some fixed x, and then approximately
# finds the x such that abs(g(n*,x)) = 0 //
btilda <- function(range, len) {
  # range: over which to look for x
  bb <- seq(range[1], range[2], length = len)
  OBJ <- sapply(bb, function(x) {
    fixed <- c(x, 100, 692, 50, 1600, v1, 217227)
    return(optimize(g, c(1, 1000), maximum = TRUE, tol = 0.001,
                    x = fixed)$objective)
  })
  tt <- data.frame(b = bb, obj = OBJ)
  tt$absobj <- abs(tt$obj)
  d <- tt[order(tt$absobj), ][1:3, ]
  return(as.vector(d))
}
 
For instance a run of
 btilda(c(20.55806,20.55816),1)  returns:
            b            obj        absobj
5834  20.55812  -0.0004942848  0.0004942848
5833  20.55812   0.0011715433  0.0011715433
5835  20.55812  -0.0021601140  0.0021601140

My question is how to improve the precision of b (which is x) here. It seems 
that seq(20.55806,20.55816,length=1 ) is only precise to 5 or so digits, 
and thus, is equivalent for numerous successive values. How can I get around 
this?


Spencer Graves [EMAIL PROTECTED] wrote:
Part of the R culture is a statement by Simon Blomberg immortalized 
in library(fortunes) as, "This is R. There is no if. Only how."

I can't see now how I would automate a complete solution to your 
problem in general. However, given a specific g(x, n), I would start by 
writing a function to use expand.grid and contour to make a contour 
plot of g(x, n) over specified ranges for x = seq(0, x.max, length=npts) 
and n = seq(0, n.max, npts) for a specified number of points npts. Then 
I'd play with x.max, n.max, and npts until I got what I wanted. With 
the right choices for x.max, n.max, and npts, the solution will be 
obvious from the plot. In some cases, nothing more will be required.

If I wanted more than that, I would need to exploit further some 
specifics of the problem. For that, permit me to restate some of what I 
think I understood of your specific problem:

(1) For fixed n, g(x, n) is monotonically decreasing in x>0.

(2) For fixed x, g(x, n) has only two local maxima, one at n=0 (or 
n=eps>0, eps arbitrarily small) and the other at n2(x), say, with a 
local minimum in between at n1(x), say.

With this, I would write functions to find n1(x) and n2(x) given x. 
I might not even need n1(x) if I could figure out how to obtain n2(x) 
without it. Then I'd make a plot with two lines (using plot and 
lines) of g(x, 0) and g(x, n2(x)) vs. x.

By the time I'd done all that, if I still needed more, I'd probably 
have ideas about what else to do.

hope this helps. 
spencer graves


Gregory Gentlemen wrote:

 I'm trying to ascertain whether or not the facilities of R are sufficient for 
 solving an optimization problem I've come across. Because of my limited 
 experience with R, I would greatly appreciate some feedback from more 
 frequent users.
 The problem can be delineated as such:
 
 A utility function, we shall call g is a function of x, n ... g(x,n). g has 
 the properties: n > 0, x lies on the real line. g may take values along the 
 real line. g is such that g(x,n)=g(-x,n). g is a decreasing function of x for 
 any n; for fixed x, g(x,n) is smooth and initially decreases upon reaching an 
 inflection point, thereafter increasing until it reaches a maximum and then 
 declines (neither concave nor convex).
 
 My optimization problem is to find the largest positive x such that g(x,n) is 
 less than zero for all n. In fact, because of the symmetry of g around x, we 
 need only consider x > 0. Such an x does exist in this problem, and of 
 course g obtains a maximum value of 0 at some n for this value of x. My issue 
 is writing some code to systematically obtain this value. 
 
 Is R capable of handling such a problem? (i.e. through some sort of 
 optimization function, or some sort of grid search with the relevant 
 constraints)
 
 Any suggestions would be appreciated.
 
 Gregory Gentlemen
 [EMAIL PROTECTED]
 
 
 
 The following is a sketch of an optimization problem I need to solve.
 
 __
 
 
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

[EMAIL PROTECTED]
www.pdf.com 
Tel: 408-938-4420
Fax: 408-280-7915

__




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] optimization problem in R ... can this be done?

2005-06-24 Thread Gregory Gentlemen
I'm trying to ascertain whether or not the facilities of R are sufficient for 
solving an optimization problem I've come across. Because of my limited 
experience with R, I would greatly appreciate some feedback from more frequent 
users.
The problem can be delineated as such:
 
A utility function, we shall call g is a function of x, n ... g(x,n). g has the 
properties: n > 0, x lies on the real line. g may take values along the real 
line. g is such that g(x,n)=g(-x,n). g is a decreasing function of x for any n; 
for fixed x, g(x,n) is smooth and initially decreases upon reaching an 
inflection point, thereafter increasing until it reaches a maximum and then 
declines (neither concave nor convex).
 
My optimization problem is to find the largest positive x such that g(x,n) is 
less than zero for all n. In fact, because of the symmetry of g around x, we 
need only consider x > 0. Such an x does exist in this problem, and of course 
g obtains a maximum value of 0 at some n for this value of x. My issue is 
writing some code to systematically obtain this value. 
 
Is R capable of handling such a problem? (i.e. through some sort of 
optimization function, or some sort of grid search with the relevant 
constraints)
 
Any suggestions would be appreciated.
 
Gregory Gentlemen
[EMAIL PROTECTED]

 
 
The following is a sketch of an optimization problem I need to solve.

__




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html