I am looking for expertise and advice, and possibly pointers to R
packages or statistics literature.
My data: measurements of economic variables for each county of
California over 37 years. 
My dependent variable is square feet of office floor space permitted to
be added in a county.  
Independent variables include for example change in number of office
jobs in same county same year (and lagged years).  
Smaller (less populous) counties have many years in which no permits
were taken out; the largest counties had at least some permitted
square footage every year.  Among the years and places where permits
were taken out, smaller counties tend to have more permitted square
footage per capita.  I imagine the relationships are as follows:
y* = desired change in floor space = X b + e, where X holds the
independent variables, b their coefficients, and e is heteroskedastic
(by county) and possibly autocorrelated noise.  e is the sort of noise
you’d expect in a time series of cross sections with no sampling (one
observation for every county for every year studied).  I must include
county fixed effects (county dummies) in X because I will need them for
county-specific forecasts later.
y = permits taken out (observed)
y = y* when y* > 0
y = 0 when y* <= 0
y has never been observed to equal 0 when population > pop*
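
To make the setup above concrete, here is a minimal sketch of the
log-likelihood that a plain (type-1) tobit would maximize, written in
Python purely for illustration.  It reads the censoring rule as y = 0
whenever y* <= 0, and it assumes normal, homoskedastic, serially
independent errors, which is exactly the assumption I am unsure about.
The function names and the fitted index xb are hypothetical, and the
pop* threshold is ignored.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(y, xb, sigma):
    """Log-likelihood of a type-1 tobit censored at zero.

    y     : observed permitted square feet (0 when no permits)
    xb    : fitted index X b per observation (hypothetical values)
    sigma : error standard deviation (ASSUMED common and normal)
    """
    total = 0.0
    for yi, mi in zip(y, xb):
        if yi > 0:
            # Uncensored: normal density of the residual, scaled by sigma.
            total += math.log(norm_pdf((yi - mi) / sigma) / sigma)
        else:
            # Censored: P(y* <= 0) = Phi(-xb / sigma).
            total += math.log(norm_cdf(-mi / sigma))
    return total
```

Maximizing this over b and sigma gives the tobit estimates; county-level
heteroskedasticity would require sigma to vary by county.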
How do folks recommend that I estimate this regression?  I could first
estimate the probability of having any permits given level of population
and change in office jobs.  That could include a deterministic
component: if population > pop* then y = y*.  I could try a tobit-2
model or its ML estimator, which I see has just been implemented in an
R package called “sampleSelection.”  But in that case I would guess, in
my ignorance, that my estimates would be badly biased, because it
assumes a specific error distribution and homoskedasticity.  (I could
transform the data in advance to get homoskedasticity if necessary.)  I
could make
an instrument for probability(y*>0) and multiply that by each
observation of permits, avoiding distributional assumptions in e (but
not in prob(y*>0)), but would that give me really high variance?  (And
is there an easier way to find the variance than working out
algebraically how the Newey-West estimator for 2-stage method of
moments procedures applies here?)  I know nothing of the non- and
semi-parametric options here, but does someone have an article or book
chapter explaining whether that is the right approach, and how to do
it?  It would be most convenient for me to use R, but I also have
access to Stata and SAS.
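
For reference, the two-step version of the tobit-2 idea can be sketched
as follows (Python, illustration only).  Under joint normality of the
selection and outcome errors, with the selection error variance
normalized to one, the mean of y among observations with permits is
X b plus rho*sigma times the inverse Mills ratio of the first-stage
probit index.  The names z (probit index) and rho_sigma are
hypothetical, and the normality assumption is the one I worry about.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(z):
    """lambda(z) = phi(z) / Phi(z), the selection-correction term."""
    return norm_pdf(z) / norm_cdf(z)


def corrected_mean(xb, z, rho_sigma):
    """E[y | permits observed] = xb + rho*sigma * lambda(z).

    xb        : outcome index X b (hypothetical value)
    z         : first-stage probit index for P(permits > 0)
    rho_sigma : error correlation times outcome error s.d. (ASSUMED normal)
    """
    return xb + rho_sigma * inverse_mills(z)
```

In the second stage, inverse_mills(z) would enter the regression on the
nonzero observations as an extra regressor whose coefficient estimates
rho*sigma.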
Now another question for the statistically minded:  After running this
regression I will forecast, for each region, how many square feet of
office space will have permits taken out each year, given expected
trends in office jobs and such.  With a pooled office-job coefficient,
this does not allow individual counties to respond differently to
office jobs.  Please feel free to comment on the options I am
considering to allow more variation:
County-specific coefficients don’t work well; I tried separate
(admittedly OLS) regressions for each county and found that with only 35
or so observations per county, my variances were too large and
coefficients were insignificant and often of unintuitive signs.
Random coefficients won’t give me county specific info, which I’ll need
for the forecasts.
So is this idea good?  After I have coefficients from the pooled
regression above, I take each coefficient b and its standard error.  I
use that as a stochastic restriction or Bayesian prior, for individual
county regressions.  That is, each county regression estimates its own b
value, subject to the stochastic restriction or Bayesian prior that b
equals the pooled b, with the prior variance set to the variance of b
estimated in the pooled regression.  (I’m thinking of what has been
called Bayesian/Mixed Estimation here, but if I’m out of the loop on
newer, better techniques, do tell.)  I’d
think this county-specific estimation would be a simple non-tobit
regression for  large counties that never lack additions in any year. 
For small counties, I might need to do a tobit-style or instrumental
variable regression again (or whatever you folks recommend).  It might
be harder to estimate probability of nonzero permits on the smaller
sample size so I might have to keep the old estimate.
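
In the simplest case of one slope coefficient, the stochastic
restriction idea amounts to a precision-weighted average of the
county's own least-squares information and the pooled estimate.  Here
is a minimal sketch in Python, assuming the county's error variance
sigma2 is known, that x and y are already demeaned, and that there is
no censoring; all names are hypothetical.

```python
def mixed_estimate(x, y, sigma2, b_prior, v_prior):
    """Mixed (stochastic-restriction) estimate of one slope coefficient.

    x, y    : one county's regressor and response (ASSUMED demeaned)
    sigma2  : that county's error variance (ASSUMED known)
    b_prior : pooled coefficient, used as the prior mean
    v_prior : squared standard error of the pooled coefficient
    """
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    data_precision = sxx / sigma2      # information in the county data
    prior_precision = 1.0 / v_prior    # information in the pooled estimate
    total_precision = data_precision + prior_precision
    b_post = (sxy / sigma2 + prior_precision * b_prior) / total_precision
    v_post = 1.0 / total_precision
    return b_post, v_post
```

A county with few informative years is pulled toward the pooled b,
while a county with a lot of variation in x mostly keeps its own
estimate.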

All thoughts are welcome and appreciated.  Thanks very very much.
 


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
