[R] The math underlying the `betareg' package?

2007-01-18 Thread Ajay Narottam Shah
Folks,

The betareg package appears to be polished and works well. But I would
like to look at the exact formulas for the underlying model being
estimated, the likelihood function, etc. E.g. if one has to compute
\frac{\partial E(y)}{\partial x_i}, this requires careful calculations
through these formulas. I read "Regression analysis of variates
observed on (0,1): percentages, proportions and fractions" by
Kieschnick & McCullough, Statistical Modelling 2003, 3:193-213. They
say that the beta regression that they show is a proposal of theirs -
is this the same as what betareg does, or is this the standard
formulation?

What else should I be reading about beta regressions? :-)
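For what it's worth, the parameterisation in Ferrari and Cribari-Neto
(2004), which I believe is the one betareg follows (worth confirming
against the package documentation), is

```latex
% Beta regression density in the mean/precision parameterisation
f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}
                  \, y^{\mu\phi - 1} \, (1-y)^{(1-\mu)\phi - 1},
\qquad 0 < y < 1,
```

with E(y) = \mu, Var(y) = \mu(1-\mu)/(1+\phi), and a link
g(\mu_i) = x_i^\top\beta (logit by default). Under the logit link this
gives \frac{\partial E(y_i)}{\partial x_{ij}} = \beta_j \mu_i (1-\mu_i).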

-- 
Ajay Shah  http://www.mayin.org/ajayshah  
[EMAIL PROTECTED] http://ajayshahblog.blogspot.com
*(:-? - wizard who doesn't know the answer.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Ooops, small mistake fixed (pretty printing multiple models)

2006-08-31 Thread Ajay Narottam Shah
The R code I just mailed out had a small error in it. This one
works. Now what one needs is a way to get decimal alignment in LaTeX
tabular objects.
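On the decimal alignment: a sketch of the usual dcolumn approach
(assuming the array and dcolumn LaTeX packages), which the generated
tabular could be wrapped in:

```latex
\usepackage{array,dcolumn}
% a column type 'd' that aligns on the decimal point,
% with #1 digits reserved after the point
\newcolumntype{d}[1]{D{.}{.}{#1}}
% ...
\begin{tabular}{l d{4} d{4} d{4}}
% ... rows ...
\end{tabular}
```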

x1 <- runif(100); x2 <- runif(100); y <- 2 + 3*x1 - 4*x2 + rnorm(100)
m1 <- summary(lm(y ~ x1))
m2 <- summary(lm(y ~ x2))
m3 <- summary(lm(y ~ x1 + x2))

# What I want is this table:
# 
# ---
# M1 M2  M3
# ---
# Intercept 0.0816 3.6292 2.2272
#  (0.5533)   (0.2316)***(0.2385)***
# 
# x12.81512.7606
#  (0.5533)***   (0.3193)***
# 
# x2  -4.2899-4.2580
# (0.401)*** (0.3031)***
# 
# $\sigma_e$1.538  1.175  0.8873
# $R^2$ 0.2089 0.5385 0.7393
# ---

mmp <- function(regressors, bottom.matter, models.names, allmodels) {
  numbers <- matrix(NA, nrow=(2*length(regressors))+length(bottom.matter),
                    ncol=length(models.names))
  colnames(numbers) <- models.names
  rownames(numbers) <- rep("t", nrow(numbers))

  baserow <- 1
  for (i in 1:length(regressors)) {
    if (regressors[i] == "Intercept") {
      regex <- "^\\(Intercept\\)$"
    } else {
      regex <- paste("^", regressors[i], "$", sep="")
    }
    rownames(numbers)[baserow] <- regressors[i]
    for (j in 1:length(allmodels)) {
      m <- allmodels[[j]]
      if (any(locations <- grep(regex, rownames(m$coefficients)))) {
        if (length(locations) > 1) {
          cat("Regex ", regex, " has multiple matches at model ", j, "\n")
          return(NULL)
        }
        numbers[baserow,j] <- as.numeric(sprintf("%.4f",
                                                 m$coefficients[locations,1]))
        numbers[baserow+1,j] <- as.numeric(sprintf("%.4f",
                                                   m$coefficients[locations,3]))
      }
    }
    baserow <- baserow + 2
  }

  # Now process the bottom.matter
  for (i in 1:length(bottom.matter)) {
    if (bottom.matter[i] == "sigma") {
      for (j in 1:length(allmodels)) {
        m <- allmodels[[j]]
        numbers[baserow,j] <- as.numeric(sprintf("%.4f", m$sigma))
      }
      rownames(numbers)[baserow] <- "Residual std. dev."
      baserow <- baserow + 1
    }

    if (bottom.matter[i] == "r.squared") {
      for (j in 1:length(allmodels)) {
        m <- allmodels[[j]]
        numbers[baserow,j] <- as.numeric(sprintf("%.4f", m$r.squared))
      }
      rownames(numbers)[baserow] <- "$R^2$"
      baserow <- baserow + 1
    }

    if (bottom.matter[i] == "adj.r.squared") {
      for (j in 1:length(allmodels)) {
        m <- allmodels[[j]]
        numbers[baserow,j] <- as.numeric(sprintf("%.4f", m$adj.r.squared))
      }
      rownames(numbers)[baserow] <- "Adjusted $R^2$"
      baserow <- baserow + 1
    }
  }
  numbers
}

# Given a 't' statistic, give me a string with
# '*' or '**' or '***' based on the 90%, 95% and 99% cutoffs on N(0,1)
stars <- function(t) {
  t <- abs(t)
  n <- -1 + as.numeric(cut(t, breaks=c(-0.01, -qnorm(c(0.05, 0.025, 0.005)), Inf)))
  if (n == 0) {
    return("")
  } else {
    return(paste("$^\\textrm{",
                 paste(rep("*", n), sep="", collapse=""),
                 "}$", sep=""))
  }
}

specialised.latex.generation <- function(numbers) {
  cat("\\hline\n")
  for (j in 1:ncol(numbers)) {
    cat("  ", colnames(numbers)[j])
  }
  cat("\\\\\n\\hline\n")
  for (i in 1:nrow(numbers)) {
    if (rownames(numbers)[i] == "t") {
      for (j in 1:ncol(numbers)) {
        if (is.na(numbers[i,j])) {
          cat("  ")
        } else {
          cat("  ", sprintf("(%s)%s", numbers[i,j], stars(numbers[i,j])))
        }
      }
      cat("\\\\[1mm]\n")
    } else {
      cat(rownames(numbers)[i])
      for (j in 1:ncol(numbers)) {
        if (is.na(numbers[i,j])) {
          cat("  ")
        } else {
          cat("  ", numbers[i, j])
        }
      }
      cat("\\\\\n")
    }
  }
  cat("\\hline")
}

numbers <- mmp(regressors=c("Intercept", "x1", "x2"),
               bottom.matter=c("sigma", "r.squared", "adj.r.squared"),
               models.names=c("M1", "M2", "M3"),
               allmodels=list(m1, m2, m3))
numbers
specialised.latex.generation(numbers)



[R] Pretty-printing multiple regression models

2006-08-31 Thread Ajay Narottam Shah
A few days ago, I had asked this question. Consider this situation:
 x1 <- runif(100); x2 <- runif(100); y <- 2 + 3*x1 - 4*x2 + rnorm(100)
 m1 <- summary(lm(y ~ x1))
 m2 <- summary(lm(y ~ x2))
 m3 <- summary(lm(y ~ x1 + x2))

You have estimated 3 different competing models, and suppose you
want to present the set of models in one table. xtable(m1) is cool,
but that would give us 3 different tables.

What I want is this table:

---
M1 M2  M3
---
Intercept 0.0816 3.6292 2.2272
 (0.5533)   (0.2316)***(0.2385)***

x12.81512.7606
 (0.5533)***   (0.3193)***

x2  -4.2899-4.2580
(0.401)*** (0.3031)***

$\sigma_e$1.538  1.175  0.8873
$R^2$ 0.2089 0.5385 0.7393
---

I was given feedback from this mailing list that this is a specialised
display and requires custom code. So I wrote some code. I will be very
happy if you could look at this code, and give me ideas on how to do
it better, and how to generalise it. I am most unhappy with the fact
that right now, I'm tied to the fact that summary.lm() gives you
something which has a $coefficients. Ideally this kind of display
should be general for all kinds of models.

My code produces two results:

 numbers
   M1  M2   M3
Intercept  0.0691  3.1110   1.7831
t  0.2471 12.1860   6.
x1 2.6595  NA   2.7772
t  5.3837  NA   8.0206
x2 NA -3.2788  -3.3683
t  NA -7.6922 -10.1331
Residual std. dev. 1.3567  1.2195   0.9505
$R^2$  0.2283  0.3765   0.6251
Adjusted $R^2$ 0.2204  0.3701   0.6174

and:

 specialised.latex.generation(numbers)
\hline
   M1   M2   M3\\
\hline
Intercept   0.0691   3.111   1.7831\\
   (0.2471)   (12.186)$^\mbox{***}$   (6.)$^\mbox{***}$\\[1mm]
x1   2.6595 2.7772\\
   (5.3837)$^\mbox{***}$ (8.0206)$^\mbox{***}$\\[1mm]
x2 -3.2788   -3.3683\\
 (-7.6922)$^\mbox{***}$   (-10.1331)$^\mbox{***}$\\[1mm]
Residual std. dev.   1.3567   1.2195   0.9505\\
$R^2$   0.2283   0.3765   0.6251\\
Adjusted $R^2$   0.2204   0.3701   0.6174\\
\hline

mmp <- function(regressors, bottom.matter, models.names, allmodels) {
  numbers <- matrix(NA, nrow=(2*length(regressors))+length(bottom.matter),
                    ncol=length(models.names))
  colnames(numbers) <- models.names
  rownames(numbers) <- rep("t", nrow(numbers))

  baserow <- 1
  for (i in 1:length(regressors)) {
    if (regressors[i] == "Intercept") {
      regex <- "^\\(Intercept\\)$"
    } else {
      regex <- paste("^", regressors[i], "$", sep="")
    }
    rownames(numbers)[baserow] <- regressors[i]
    for (j in 1:length(allmodels)) {
      m <- allmodels[[j]]
      if (any(locations <- grep(regex, rownames(m$coefficients)))) {
        if (length(locations) > 1) {
          cat("Regex ", regex, " has multiple matches at model ", j, "\n")
          return(NULL)
        }
        numbers[baserow,j] <- as.numeric(sprintf("%.4f",
                                                 m$coefficients[locations,1]))
        numbers[baserow+1,j] <- as.numeric(sprintf("%.4f",
                                                   m$coefficients[locations,3]))
      }
    }
    baserow <- baserow + 2
  }

  # Now process the bottom.matter
  for (i in 1:length(bottom.matter)) {
    if (bottom.matter[i] == "sigma") {
      for (j in 1:length(allmodels)) {
        m <- allmodels[[j]]
        numbers[baserow,j] <- as.numeric(sprintf("%.4f", m$sigma))
      }
      rownames(numbers)[baserow] <- "Residual std. dev."
      baserow <- baserow + 1
    }

    if (bottom.matter[i] == "r.squared") {
      for (j in 1:length(allmodels)) {
        m <- allmodels[[j]]
        numbers[baserow,j] <- as.numeric(sprintf("%.4f", m$r.squared))
      }
      rownames(numbers)[baserow] <- "$R^2$"
      baserow <- baserow + 1
    }

    if (bottom.matter[i] == "adj.r.squared") {
      for (j in 1:length(allmodels)) {
        m <- allmodels[[j]]
        numbers[baserow,j] <- as.numeric(sprintf("%.4f", m$adj.r.squared))
      }
      rownames(numbers)[baserow] <- "Adjusted $R^2$"
      baserow <- baserow + 1
    }
  }
  numbers
}

# Given a 't' statistic, give me a string with
# '*' or '**' or '***' based on the 90%, 95% and 99% cutoffs on N(0,1)
stars <- function(t) {
  t <- abs(t)
  n <- -1 + as.numeric(cut(t, breaks=c(-0.01, -qnorm(c(0.05, 0.025, 0.005)), Inf)))
  if (n == 0) {
    return("")
  } else {
    return(paste("$^\\mbox{",
                 paste(rep("*", n), sep="", collapse=""),
                 "}$", sep=""))
  }
}

[R] Presentation of multiple models in one table using xtable

2006-08-15 Thread Ajay Narottam Shah
Consider this situation:
 x1 <- runif(100); x2 <- runif(100); y <- 2 + 3*x1 - 4*x2 + rnorm(100)
 m1 <- summary(lm(y ~ x1))
 m2 <- summary(lm(y ~ x2))
 m3 <- summary(lm(y ~ x1 + x2))

Now you have estimated 3 different competing models, and suppose you
want to present the set of models in one table. xtable(m1) is cool,
but doing that thrice would give us 3 different tables.

What I want is this one table:

---
M1 M2  M3
---
Intercept 0.0816 3.6292 2.2272
 (0.5533)   (0.2316)***(0.2385)***

x12.81512.7606
 (0.5533)***   (0.3193)***

x2  -4.2899-4.2580
(0.401)*** (0.3031)***

$\sigma_e$1.538  1.175  0.8873
$R^2$ 0.2089 0.5385 0.7393
---

How would one set about doing this? I am hoping that it's possible to
write a function xtable.multi.lm where one would say
xtable.multi.lm(m1,m2,m3) and get the above table.

My sense is there are three challenges:

1. How to write a general R function which eats an unpredictable
   number of summary(lm()) objects, and fills out a matrix with
   results such as the above.

2. How to get a good xtable(), with decimal alignment and with the ***
   stuff (actually, $^{***}$). Will there be any catch in dropping
   into mathmode for $R^2$? After each pair of lines, I'd like to have
   a \\[2mm] so as to get a nice spacing in the table.

3. This style of presentation seems relevant to a whole host of models
   - whether ARCH models or survival models - not just OLS
   regressions. It would be very nice if one supported all manner of
   model objects and not just what comes out of lm().

I'm happy to take a crack at writing this function, which should
ideally go back into the xtable library. But I don't have an idea of
how to go about these questions. If you will guide me, I am happy to
work on it. :-)
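On challenge 1, here is a minimal sketch of the variadic capture. The
function name xtable.multi.lm and its harvesting logic are
hypothetical, not an existing xtable API:

```r
# Capture any number of summary(lm()) objects via "..." and collect
# their coefficient estimates into one regressors x models matrix.
xtable.multi.lm <- function(..., models.names = NULL) {
  models <- list(...)
  if (is.null(models.names)) {
    models.names <- paste("M", seq_along(models), sep = "")
  }
  # each summary.lm carries a $coefficients matrix we can harvest
  coefs <- lapply(models, function(m) m$coefficients[, 1])
  regressors <- unique(unlist(lapply(coefs, names)))
  # indexing a named vector by missing names yields NA, which is
  # exactly what we want for regressors absent from a model
  numbers <- sapply(coefs, function(cf) cf[regressors])
  dimnames(numbers) <- list(regressors, models.names)
  numbers
}

x1 <- runif(100); x2 <- runif(100); y <- 2 + 3*x1 - 4*x2 + rnorm(100)
out <- xtable.multi.lm(summary(lm(y ~ x1)), summary(lm(y ~ x1 + x2)))
out   # 3 regressors x 2 models; NA where a model omits a regressor
```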



[R] SUMMARY: making contour plots using (x,y,z) data

2006-07-01 Thread Ajay Narottam Shah
Folks,

A few days ago, I had asked a question on this mailing list about
making a contour plot where a function z(x,y) is evaluated on a grid
of (x,y) points, and the data structure at hand is a simple table of
(x,y,z) points. As usual, R has wonderful resources (and subtle
complexity) for doing this, and the gurus of the list showed me the
way. Here's a complete working example. One might stumble on
contour(), but lattice::contourplot() fits this task better:
contour() requires an unusual data representation (a matrix of z
values over the x and y grids), while contourplot() accepts the
natural long-format (x,y,z) data.

-ans.

# Setup an interesting data matrix of (x,y,z) points:
points <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 
0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 
0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.15, 0.15, 0.15, 0.15, 
0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 
0.15, 0.15, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 
0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 
0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.3, 0.3, 
0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 
0.3, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 
0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 
0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.45, 0.45, 0.45, 
0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45,
0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.55, 0.55, 0.55, 
0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 
0.55, 0.55, 0.55, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 
0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 
0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.7, 
0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 
0.7, 0.7, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 
0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 
0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.85, 0.85, 
0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 
0.85, 0.85, 0.85, 0.85, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 
0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.95, 0.95, 0.95,
0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95,
0.95, 0.95, 0.95, 0.95, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 
180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 
160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 
140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 
120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 
100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 
80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 
60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 
40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 
20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 
190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 
170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100,
110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 
80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 
60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 
40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 
20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 
190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 
170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 
150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 
130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 
110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 
90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 
70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 1, 1, 1, 1, 
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, !
 1, 0.998, 0.124, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0.998
, 0.71, 0.068, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0.998, 0.898, 
0.396, 0.058, 0.002, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0.998, 0.97, 0.726, 
0.268, 0.056, 

[R] Puzzled with contour()

2006-06-26 Thread Ajay Narottam Shah
Folks,

The contour() function wants x and y to be in increasing order. I have
a situation where I have a grid in x and y, and associated z values,
which looks like this:

  x   y z
  [1,] 0.00  20 1.000
  [2,] 0.00  30 1.000
  [3,] 0.00  40 1.000
  [4,] 0.00  50 1.000
  [5,] 0.00  60 1.000
  [6,] 0.00  70 1.000
  [7,] 0.00  80 0.000
  [8,] 0.00  90 0.000
  [9,] 0.00 100 0.000
 [10,] 0.00 110 0.000
 [11,] 0.00 120 0.000
 [12,] 0.00 130 0.000
 [13,] 0.00 140 0.000
 [14,] 0.00 150 0.000
 [15,] 0.00 160 0.000
 [16,] 0.00 170 0.000
 [17,] 0.00 180 0.000
 [18,] 0.00 190 0.000
 [19,] 0.00 200 0.000
 [20,] 0.05  20 1.000
 [21,] 0.05  30 1.000
 [22,] 0.05  40 1.000
 [23,] 0.05  50 1.000
 [24,] 0.05  60 0.998
 [25,] 0.05  70 0.124
 [26,] 0.05  80 0.000
 [27,] 0.05  90 0.000
 [28,] 0.05 100 0.000
 [29,] 0.05 110 0.000
 [30,] 0.05 120 0.000
 [31,] 0.05 130 0.000
 [32,] 0.05 140 0.000
 [33,] 0.05 150 0.000
 [34,] 0.05 160 0.000
 [35,] 0.05 170 0.000
 [36,] 0.05 180 0.000
 [37,] 0.05 190 0.000
 [38,] 0.05 200 0.000
 [39,] 0.10  20 1.000
 [40,] 0.10  30 1.000

This looks like a nice case where both x and y are in increasing
order. But contour() gets unhappy, saying that it wants x and y in
increasing order.

Gnuplot generates pretty 3d pictures from such data, where you are
standing above a surface and looking down at it. How does one do that
in R?
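
For what it's worth, a sketch with lattice (toy data standing in for
the real surface, since the long format is accepted directly):

```r
library(lattice)
# toy grid shaped like the data above: x in [0,1], y in [20,200]
d <- expand.grid(x = seq(0, 1, by = 0.05), y = seq(20, 200, by = 10))
d$z <- as.numeric(d$y < 60 + 100 * d$x)   # a 0/1 surface
contourplot(z ~ x * y, data = d)   # contour lines from long-format data
levelplot(z ~ x * y, data = d)     # coloured "looking down" view
```

levelplot() gives the standing-above-a-surface view that gnuplot
produces.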

Any help will be most appreciated. A dput() of my data object is :

structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 
0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.1, 0.1, 
0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 
0.1, 0.1, 0.1, 0.1, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 
0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 
0.15, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 
0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.25, 0.25, 0.25, 0.25, 
0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 
0.25, 0.25, 0.25, 0.25, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 
0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.35, 
0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 
0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.4, 0.4, 0.4, 0.4, 
0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 
0.4, 0.4, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 
0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.45, 0.5, 
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 
0.5, 0.5, 0.5, 0.5, 0.5, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 
0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 
0.55, 0.55, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 
0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.65, 0.65, 0.65, 
0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65, 
0.65, 0.65, 0.65, 0.65, 0.65, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 
0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.75, 
0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 
0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.8, 0.8, 0.8, 0.8, 
0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 
0.8, 0.8, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 
0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.9, 
0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 
0.9, 0.9, 0.9, 0.9, 0.9, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 
0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 
0.95, 0.95, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 
150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 
100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 
40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 
180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 
130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 
80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 
20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 
160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 
110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 
50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 
190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 
140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 
90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 
30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 
170, 180, 190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 
120, 130, 140, 150, 160, 170, 180, 190, 200, 20, 30, 40, 50, 
60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 
190, 200, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 
140, 150, 160, 170, 180, 190, 200, 20, 

[R] Ordered probit (Puzzled at MASS:polr())

2006-03-30 Thread Ajay Narottam Shah
This part, a vanilla probit, works perfectly --
  # Simulate from probit model --
  x1 <- 2*runif(5000)
  x2 <- 5*runif(5000)
  ystar <- 7 + 3*x1 - 4*x2 + rnorm(5000)
  y <- as.numeric(ystar > 0)
  table(y)

  # Estimation using micEcon::probit()
  library(micEcon)
  summary(probit(y ~ x1 + x2))

  # Estimation using glm() --
  summary(glm(y ~ x1 + x2, family=binomial("probit")))

But this part, where I move on to ordered probit, gives me wonky values --

  # Simulate from ordered probit model
  y <- cut(ystar, breaks=c(-100, -5, 0, 5, 100))
  table(y)
  # Estimate ordered probit model --
  library(MASS)
  summary(polr(y ~ x1 + x2, method="probit"))

I get coefficients which are quite far from c(7,3,-4) which were used
in simulation above and I get breaks which are quite far from
c(-5,0,5) which were used in simulation above. I wonder what I'm doing
wrong.



[R] Solved ordered probit question

2006-03-30 Thread Ajay Narottam Shah
A few minutes ago I had asked why this didn't seem to work:

# Simulate from probit model --
x1 <- 2*runif(5000)
x2 <- 5*runif(5000)
ystar <- 7 + 3*x1 - 4*x2 + rnorm(5000)
y <- cut(ystar, breaks=c(-100, -5, 0, 5, 100))
table(y)
library(MASS)
summary(polr(y ~ x1 + x2, method="probit"))

A little thinking revealed why. In the ordered probit model, the
intercept 7 is not identified. So the estimation recovers -7 +
c(-5,0,5) = c(-12,-7,-2) as the cutpoints. For the rest, it works
fine: the slopes c(3,-4) are recovered correctly.

:-)
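
A quick numerical check of this identification argument (a sketch;
assumes MASS is available):

```r
library(MASS)
set.seed(1)
x1 <- 2*runif(5000)
x2 <- 5*runif(5000)
ystar <- 7 + 3*x1 - 4*x2 + rnorm(5000)
y <- cut(ystar, breaks=c(-100, -5, 0, 5, 100))
fit <- polr(y ~ x1 + x2, method="probit")
coef(fit)   # slopes, close to c(3, -4)
fit$zeta    # cutpoints, close to -7 + c(-5, 0, 5) = c(-12, -7, -2)
```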



[R] Interleaving elements of two vectors?

2006-03-05 Thread Ajay Narottam Shah
Suppose one has

x <- c(1,  2,  7,  9,  14)
y <- c(71, 72, 77)

How would one write an R function which alternates between elements of
one vector and the next? In other words, one wants

z <- c(x[1], y[1], x[2], y[2], x[3], y[3], x[4], y[4], x[5], y[5])

I couldn't think of a clever and general way to write this. I am aware
of gdata::interleave() but it deals with interleaving rows of a data
frame, not elements of vectors.
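
One general idiom I can offer (a sketch; it also copes with vectors of
unequal length by appending the leftovers): give x whole-number slots
and y half-step slots, then sort.

```r
interleave <- function(x, y) {
  # x gets positions 1, 2, 3, ...; y gets positions 1.5, 2.5, 3.5, ...
  idx <- order(c(seq_along(x), seq_along(y) + 0.5))
  c(x, y)[idx]
}

x <- c(1,  2,  7,  9,  14)
y <- c(71, 72, 77)
interleave(x, y)   # 1 71 2 72 7 77 9 14
```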



[R] Prediction when using orthogonal polynomials in regression

2006-01-27 Thread Ajay Narottam Shah
Folks,

I'm doing fine with using orthogonal polynomials in a regression context:

  # We will deal with noisy data from the d.g.p. y = sin(x) + e
  x <- seq(0, 3.141592654, length.out=20)
  y <- sin(x) + 0.1*rnorm(20)
  m <- poly(x, 4)
  d <- lm(y ~ m)
  plot(x, y, type="l"); lines(x, d$fitted.values, col="blue") # Fits great!
  all.equal(as.numeric(d$coefficients[1] + m %*% d$coefficients[2:5]),
            as.numeric(d$fitted.values))

What I would like to do now is to apply the estimated model to do
prediction for a new set of x points e.g.
  xnew <- seq(0,5,.5)

We know that the predicted values should be roughly sin(xnew). What I
don't know is: how do I use the object `d' to make predictions for
xnew?
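
One approach that seems to work: keep poly() inside the lm() formula,
in which case predict() knows to rebuild the same orthogonal basis at
the new x values (a sketch; note that x beyond pi is extrapolation, so
the polynomial will fly away from sin there):

```r
x <- seq(0, pi, length.out = 20)
y <- sin(x) + 0.1 * rnorm(20)
d <- lm(y ~ poly(x, 4))          # poly() stays inside the formula

xnew <- seq(0, 5, 0.5)
p <- predict(d, newdata = data.frame(x = xnew))
```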



[R] Making a markov transition matrix - more progress

2006-01-23 Thread Ajay Narottam Shah
I solved the problem in one more (and more elegant) way. So here's the
program again.

Where does R stand on the Anderson-Goodman test of 1957? I hunted
around and nobody seems to be doing this in R. Is it that there has
been much progress after 1957 and nobody uses it anymore?

# Problem statement:
#
# You are holding a dataset where firms are observed for a fixed
# (and small) set of years. The data is in long format - one
# record for one firm for one point in time. A state variable is
# observed (a factor).
# You wish to make a markov transition matrix about the time-series
# evolution of that state variable.

set.seed(1001)

# Raw data in long format --
raw <- data.frame(name=c("f1", "f1", "f1", "f1", "f2", "f2", "f2", "f2"),
                  year=c(83,   84,  85,  86,  83,  84,  85,  86),
                  state=sample(1:3, 8, replace=TRUE)
                  )
# Shift to wide format --
fixedup <- reshape(raw, timevar="year", idvar="name", v.names="state",
                   direction="wide")
# Now tediously build up records for an intermediate data structure
tmp <- rbind(
         data.frame(prev=fixedup$state.83, new=fixedup$state.84),
         data.frame(prev=fixedup$state.84, new=fixedup$state.85),
         data.frame(prev=fixedup$state.85, new=fixedup$state.86)
         )
# This is a bad method because it is hardcoded to the specific values
# of year.
markov <- table(tmp$prev, tmp$new)
markov

# Gabor's method --
transition.probabilities <- function(D, timevar="year",
                                     idvar="name", statevar="state") {
  stage1 <- merge(D, cbind(nextt=D[,timevar] + 1, D),
                  by.x=timevar, by.y="nextt")
  v1 <- paste(idvar, ".x", sep="")
  v2 <- paste(idvar, ".y", sep="")
  stage2 <- subset(stage1, stage1[,v1]==stage1[,v2])
  v1 <- paste(statevar, ".x", sep="")
  v2 <- paste(statevar, ".y", sep="")
  t(table(stage2[,v1], stage2[,v2]))
}

transition.probabilities(raw, timevar="year", idvar="name", statevar="state")

# The new and improved way --
library(msm)
statetable.msm(state, name, data=raw)



[R] Making a markov transition matrix

2006-01-21 Thread Ajay Narottam Shah
Folks,

I am holding a dataset where firms are observed for a fixed (and
small) set of years. The data is in long format - one record for one
firm for one point in time. A state variable is observed (a factor).

I wish to make a markov transition matrix about the time-series
evolution of that state variable. The code below does this. But it's
hardcoded to the specific years that I observe. How might one
generalise this and make a general function which does this? :-)

   -ans.



set.seed(1001)

# Raw data in long format --
raw <- data.frame(name=c("f1", "f1", "f1", "f1", "f2", "f2", "f2", "f2"),
                  year=c(83,   84,  85,  86,  83,  84,  85,  86),
                  state=sample(1:3, 8, replace=TRUE)
                  )
# Shift to wide format --
fixedup <- reshape(raw, timevar="year", idvar="name", v.names="state",
                   direction="wide")
# Now tediously build up records for an intermediate data structure
try <- rbind(
         data.frame(prev=fixedup$state.83, new=fixedup$state.84),
         data.frame(prev=fixedup$state.84, new=fixedup$state.85),
         data.frame(prev=fixedup$state.85, new=fixedup$state.86)
         )
# This is a bad method because it is hardcoded to the specific values
# of year.
markov <- table(try$prev, try$new)



Re: [R] Making a markov transition matrix

2006-01-21 Thread Ajay Narottam Shah
On Sun, Jan 22, 2006 at 01:47:00PM +1100, [EMAIL PROTECTED] wrote:
 If this is a real problem, here is a slightly tidier version of the
 function I gave on R-help:
 
 transitionM <- function(name, year, state) {
   raw <- data.frame(name = name, state = state)[order(name, year), ]
   raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]),
                   name == name.1)
   with(raw01, table(state, state.1))
 }
 
 Notice that this does assume there are 'no gaps' in the time series
 within firms, but it does not require that each firm have responses for
 the same set of years.
 
 Estimating the transition probability matrix when there are gaps within
 firms is a more interesting problem, both statistically and, when you
 figure that out, computationally.

With help from Gabor, here's my best effort. It should work even if
there are gaps in the timeseries within firms, and it allows different
firms to have responses in different years. It is wrapped up as a
function which eats a data frame. Somebody should put this function
into Hmisc or gtools or something of the sort.

# Problem statement:
#
# You are holding a dataset where firms are observed for a fixed
# (and small) set of years. The data is in long format - one
# record for one firm for one point in time. A state variable is
# observed (a factor).
# You wish to make a markov transition matrix about the time-series
# evolution of that state variable.

set.seed(1001)

# Raw data in long format --
raw <- data.frame(name=c("f1", "f1", "f1", "f1", "f2", "f2", "f2", "f2"),
                  year=c(83,   84,  85,  86,  83,  84,  85,  86),
                  state=sample(1:3, 8, replace=TRUE)
                  )

transition.probabilities <- function(D, timevar="year",
                                     idvar="name", statevar="state") {
  merged <- merge(D, cbind(nextt=D[,timevar] + 1, D),
                  by.x = c(timevar, idvar), by.y = c("nextt", idvar))
  t(table(merged[, grep(statevar, names(merged), value = TRUE)]))
}

transition.probabilities(raw, timevar="year", idvar="name", statevar="state")



[R] Tobit estimation?

2006-01-19 Thread Ajay Narottam Shah
Folks,

Based on
  http://www.biostat.wustl.edu/archives/html/s-news/1999-06/msg00125.html

I thought I should experiment with using survreg() to estimate tobit
models.

I start by simulating a data frame with 100 observations from a tobit model

 x1 <- runif(100)
 x2 <- runif(100)*3
 ystar <- 2 + 3*x1 - 4*x2 + rnorm(100)*2
 y <- ystar
 censored <- ystar <= 0
 y[censored] <- 0
 D <- data.frame(y, x1, x2)
 head(D)
  y x1x2
1 0.000 0.86848630 2.6275703
2 0.000 0.88675832 1.7199261
3 2.7559349 0.38341782 0.6247869
4 0.000 0.02679007 2.4617981
5 2.2634588 0.96974450 0.4345950
6 0.6563741 0.92623096 2.4983289

 # Estimate it
 library(survival)
 tfit <- survreg(Surv(y, y > 0, type='left') ~ x1 + x2,
                 data=D, dist='gaussian', link='identity')

It says:

  Error in survreg.control(...) : unused argument(s) (link ...)
  Execution halted

My competence on library(survival) is zero. Is it still the case that
it's possible to be clever and estimate the tobit model using
library(survival)?

I also saw the two-equation setup in the micEcon library. I haven't
yet understood when I would use that and when I would use a straight
estimation of a censored regression by MLE. Can someone shed light on
that? My situation is: Foreign investment on the Indian stock
market. Lots of firms have zero foreign investment. But many do have
foreign investment. I thought this is a natural tobit situation.



[R] Rpart -- using predict() when missing data is present?

2005-10-08 Thread Ajay Narottam Shah
I am doing

> library(rpart)
> m <- rpart(y ~ x, D[insample,])
> D[outsample,]
            y           x
8  0.78391922 0.579025591
9  0.06629211          NA
10         NA 0.001593063
> p <- predict(m, newdata=D[9,])
Error in model.frame(formula, rownames, variables, varnames, extras, 
extranames,  : 
invalid result from na.action

How do I persuade it to give me NA, since x is NA?

I looked at ?predict.rpart but didn't find any mention about NAs.

(In this problem, I can easily do it manually, but this is a part of
something bigger where I want it to be able to gracefully handle
prediction requests involving NA).

-- 
Ajay Shah   Consultant
[EMAIL PROTECTED]  Department of Economic Affairs
http://www.mayin.org/ajayshah   Ministry of Finance, New Delhi



[R] Placing axes label strings closer to the graph?

2005-10-01 Thread Ajay Narottam Shah
Folks,

I have placed an example of a self-contained R program later in this
mail. It generates a file inflation.pdf. When I stare at the picture,
I see the X label string and Y label string sitting lonely and far
away from the axes. How can these distances be adjusted? I read ?par
and didn't find this directly.

I want to hang on to 2.8 x 2.8 inches as the overall size of graph,
but it will be nice if the inner square frame was bigger and thus the
axes were closer to the strings. I will be happy with (say)
par(mai=c(.6,.6,.2,.2)). But if I just do that, the X label string
and Y label string are lost.

Thanks,

-ans.

D <- structure(list(dates = c(1983, 1984, 1985, 1986, 1987, 1988,
                              1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996,
                              1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
                              2005), inflation = c(7.6, 12.1, 6.3, 6.8, 8.7,
                              8.8, 9.4, 6.1, 11.6, 13.5, 9.6, 7.5, 10.1, 10.2,
                              9.3, 7, 13.1, 3.4, 3.7, 4.3, 4.1, 3.7, 4)),
               .Names = c("dates", "inflation"), row.names =
               c("1", "2", "3", "4", "5", "6", "7", "8", "9",
                 "10", "11", "12", "13", "14", "15", "16", "17",
                 "18", "19", "20", "21", "22", "23"), class =
               "data.frame")

pdf(file="inflation.pdf", width=2.8, height=2.8, bg="cadetblue1")
par(mai=c(0.8,0.8,0.2,0.2))
plot(D$dates, D$inflation, xlab="X label string", ylab="Y label string",
     type="l", lwd=2, col="cadetblue4", cex.axis=0.6, cex.lab=0.6)
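One answer the poster may be after is par(mgp=...): its three elements give the margin line for the axis titles, the axis labels, and the axis line (defaults c(3, 1, 0)), so shrinking the first two pulls the label strings toward the plot. A self-contained sketch with dummy data, not the poster's code:

```r
tf <- tempfile(fileext = ".pdf")   # throwaway output file
pdf(file = tf, width = 2.8, height = 2.8)
par(mai = c(.6, .6, .2, .2),       # the tighter margins the poster wanted
    mgp = c(1.8, 0.6, 0))          # axis titles on line 1.8, labels on line 0.6
plot(1:10, (1:10)^2, xlab = "X label string", ylab = "Y label string",
     type = "l", cex.axis = 0.6, cex.lab = 0.6)
dev.off()
```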



Re: [R] Question on lm(): When does R-squared come out as NA?

2005-09-28 Thread Ajay Narottam Shah
On Wed, Sep 28, 2005 at 08:23:59AM +0100, Prof Brian Ripley wrote:
 I've not seen a reply to this, nor ever seen it.
 Please make a reproducible example available (do see the posting guide).

It was a mistake on my part. Just in case others are able to
recognise the situation, what was going on was that all the objects
being used in the lm() call were zoo objects.

It is a mystery to me as to why everything should work correctly but
the R2 should break, but that happened. I found that when I switched
to coredata(z) all was well.

Gabor reminded me that I should really be using his dyn package so as
to avoid such situations. Sorry for the false alarm,

   -ans.

 lm(formula = rj ~ rM + rM.1 + rM.2 + rM.3 + rM.4)
 
 Residuals:
 1990-06-04 1994-11-14 1998-08-21 2002-03-13 2005-09-15
   -5.64672   -0.59596   -0.04143    0.55412    8.18229
 
 Coefficients:
              Estimate Std. Error t value Pr(>|t|)
 (Intercept) -0.003297   0.017603  -0.187    0.851
 rM           0.845169   0.010522  80.322   <2e-16 ***
 rM.1         0.116330   0.010692  10.880   <2e-16 ***
 rM.2         0.002044   0.010686   0.191    0.848
 rM.3         0.013181   0.010692   1.233    0.218
 rM.4         0.009587   0.010525   0.911    0.362
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 Residual standard error: 1.044 on 3532 degrees of freedom
 Multiple R-Squared: NA,   Adjusted R-squared: NA
 F-statistic: NA on 5 and 3532 DF,  p-value: NA
 
 
 rM.1, rM.2, etc. are lagged values of rM. The OLS seems fine in every
 respect, except that there is an NA as the multiple R-squared. I will
 be happy to give sample data to someone curious about what is going
 on. I wondered if this was a well-known pathology. The way I know it,
 if the data allows computation of (X'X)^{-1}, one can compute the R2.



[R] Question on lm(): When does R-squared come out as NA?

2005-09-25 Thread Ajay Narottam Shah
I have a situation with a large dataset (3000+ observations), where
I'm doing lags as regressors, where I get:

Call:
lm(formula = rj ~ rM + rM.1 + rM.2 + rM.3 + rM.4)

Residuals:
1990-06-04 1994-11-14 1998-08-21 2002-03-13 2005-09-15 
  -5.64672   -0.59596   -0.04143    0.55412    8.18229 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.003297   0.017603  -0.187    0.851
rM           0.845169   0.010522  80.322   <2e-16 ***
rM.1         0.116330   0.010692  10.880   <2e-16 ***
rM.2         0.002044   0.010686   0.191    0.848
rM.3         0.013181   0.010692   1.233    0.218
rM.4         0.009587   0.010525   0.911    0.362
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 1.044 on 3532 degrees of freedom
Multiple R-Squared: NA,  Adjusted R-squared: NA 
F-statistic: NA on 5 and 3532 DF,  p-value: NA 


rM.1, rM.2, etc. are lagged values of rM. The OLS seems fine in every
respect, except that there is an NA as the multiple R-squared. I will
be happy to give sample data to someone curious about what is going
on. I wondered if this was a well-known pathology. The way I know it,
if the data allows computation of (X'X)^{-1}, one can compute the R2.



[R] Problem with get.hist.quote() in tseries

2005-08-18 Thread Ajay Narottam Shah
When using get.hist.quote(), I find the dates are broken. This is with
R 2.1.1 on Mac OS X `panther'.

> library(tseries)
Loading required package: quadprog

'tseries' version: 0.9-27

'tseries' is a package for time series analysis and computational
finance.

See 'library(help=tseries)' for details.

> x <- get.hist.quote("^VIX")
trying URL 
'http://chart.yahoo.com/table.csv?s=^VIX&a=0&b=02&c=1991&d=7&e=18&f=2005&g=d&q=q&y=0&z=^VIX&x=.csv'
Content type 'text/csv' length unknown
opened URL
.. .. .. .. ..
.. .. .. .. ..
.. .. .. .. ..

downloaded 150Kb

> head(x)
[1] 26.62 27.93 27.19    NA    NA 28.95
> x[1:10,]
       Open  High   Low Close
 [1,] 26.62 26.62 26.62 26.62
 [2,] 27.93 27.93 27.93 27.93
 [3,] 27.19 27.19 27.19 27.19
 [4,]    NA    NA    NA    NA
 [5,]    NA    NA    NA    NA
 [6,] 28.95 28.95 28.95 28.95
 [7,] 30.38 30.38 30.38 30.38
 [8,] 33.30 33.30 33.30 33.30
 [9,] 31.33 31.33 31.33 31.33
[10,] 32.63 32.63 32.63 32.63
> plot(x)
  (dates don't show).
> str(x)
 mts [1:5343, 1:4] 26.6 27.9 27.2   NA   NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "Open" "High" "Low" "Close"
 - attr(*, "tsp")= num [1:3] 33240 38582 1
 - attr(*, "class")= chr [1:2] "mts" "ts"

I wonder what I'm doing wrong.



[R] Extracting some rows from a data frame - lapses into a vector

2005-08-16 Thread Ajay Narottam Shah
I have a data frame with one column x:

> str(data)
`data.frame':   20 obs. of  1 variable:
 $ x: num  0.0495 0.0986 0.9662 0.7501 0.8621 ...

Normally, I know that the notation dataframe[indexes,] gives you a new
data frame which is the specified set of rows. But I find:

> str(data[1:10,])
 num [1:10] 0.0495 0.0986 0.9662 0.7501 0.8621 ...

Here, it looks like the operation
  data[1:10,]
has converted it from type data frame into a numeric vector. Why does
this happen, and what can I do about it?
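For the record, this is R's default dropping of single-column results to a vector; the standard remedy is the drop=FALSE flag in `[` (base-R behaviour, not from the thread):

```r
dat <- data.frame(x = runif(20))
v  <- dat[1:10, ]                  # drops to a plain numeric vector
df <- dat[1:10, , drop = FALSE]    # note the extra comma: stays a data frame
c(is.data.frame(v), is.data.frame(df))
```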



[R] Panel data handling (lags, growth rates)

2005-08-14 Thread Ajay Narottam Shah
I have written two functions which do useful things with panel data
a.k.a. longitudinal data, where one unit of observation (a firm or a
person or an animal) is observed on a uniform time grid:
   - The first function makes lagged values of variables of your choice.
   - The second function makes growth rates w.r.t. q observations ago,
  for variables of your choice.

These strike me as bread-and-butter tasks in dealing with panel
data. I couldn't find these functions in the standard R
libraries. They are presented in this email for two reasons. First,
it'll be great if R gurus can look at the code and propose
improvements. Second, it'll be great if some package-owner can adopt
these orphans :-) and make them available to the R community.

The two functions follow:

library(Hmisc)  # Am using Lag() in this.

# Task: For a supplied list of variables (the list `lagvars'),
#   make new columns in a dataset denoting lagged values.
#   You must supply `unitvar' which identifies the unit that's
#   repeatedly observed.
#   You must supply the name of the time variable `timevar'
#   and you must tell a list of the lags that interest you (`lags')
# Example:
#  paneldata.lags(A, "person", "year", c("v1","v2"), lags=1:4)
paneldata.lags <- function(X, unitvar, timevar, lagvars, lags=1) {
  stopifnot(length(lagvars) >= 1)
  X <- X[order(X[,timevar]),]   # just in case it's not sorted.

  innertask <- function(Y, lagvars, lags) {
    E <- labels <- NULL
    for (v in lagvars) {
      for (i in lags) {
        E <- cbind(E, Lag(Y[,v], i))
      }
      labels <- c(labels, paste(v, ".l", lags, sep=""))
    }
    colnames(E) <- labels
    cbind(Y, E)
  }

  do.call(rbind, by(X, X[,unitvar], innertask, lagvars, lags))
}

# Task: For a supplied list of variables (the list `gvars'),
#   make new columns in a dataset denoting growth rates.
#   You must supply `unitvar' which identifies the unit that's
#   repeatedly observed.
#   You must supply the name of the time variable `timevar'
#   and you must tell the time periods Q (vector is ok) over which
#   the growth rates are computed.
paneldata.growthrates <- function(X, unitvar, timevar, gvars, Q=1) {
  stopifnot(length(gvars) >= 1)
  X <- X[order(X[,timevar]),]

  makegrowths <- function(x, q) {
    new <- rep(NA, length(x))
    for (t in (1+q):length(x)) {
      new[t] <- 100*((x[t]/x[t-q])-1)
    }
    return(new)
  }

  innertask <- function(Y, gvars, Q) {
    E <- labels <- NULL
    for (v in gvars) {
      for (q in Q) {
        E <- cbind(E, makegrowths(Y[,v], q))
      }
      labels <- c(labels, paste(v, ".g", Q, sep=""))
    }
    colnames(E) <- labels
    cbind(Y, E)
  }

  do.call(rbind, by(X, X[,unitvar], innertask, gvars, Q))
}

Here's a demo of using them:

# A simple panel dataset
A <- data.frame(year=rep(1980:1982,4),
                person=factor(sort(rep(1:4,3))),
                v1=round(rnorm(12),digits=2), v2=round(rnorm(12),digits=2))

# Demo of creating lags for both variables v1 and v2 --
paneldata.lags(A, "person", "year", c("v1","v2"), lags=1:2)
# Demo of creating growth rates for v2 w.r.t. 1 & 2 years ago --
paneldata.growthrates(A, "person", "year", "v2", Q=1:2)




Finally, I have a question. In a previous posting on this subject,
Gabor showed me that my code:

# Blast this function for all the values that A$person takes --
new <- NULL
for (f in levels(A$person)) {
  new <- rbind(new,
               make.additional.variables(subset(A, A$person==f),
                                         nlags=2, Q=1))
}
A <- new; rm(new)

can be replaced by one do.call() (as used above). It's awesome, but I
don't understand it! :-) Could someone please explain how and why this
works? I know by() and I see that when I do by(A,A$x), it gives me a
list containing as many entries as are levels of A$x. I couldn't think
of a way to force all this into one data frame; the best I could do
was to do for (f in levels (A$person)) {} as shown here. The two
functions above are using do.call() as Gabor used them, and they're
awesome, but I don't understand why they work! The man page ?do.call
was a bit too cryptic and I couldn't comprehend it. Where can I learn
this stuff?
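A minimal illustration of the idiom asked about (my sketch): by() returns a list with one piece per level, and do.call(rbind, pieces) calls rbind with each piece of that list as a separate argument, i.e. rbind(pieces[[1]], pieces[[2]], ...):

```r
A <- data.frame(person = factor(c("a", "a", "b", "b")), v = 1:4)

# One data frame per person, each with an extra within-person column
pieces <- by(A, A$person, function(d) transform(d, cumv = cumsum(d$v)))

# Equivalent to rbind(pieces[["a"]], pieces[["b"]])
res <- do.call(rbind, pieces)
res$cumv   # cumulative sums restart for each person
```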



[R] Puzzled at rpart prediction

2005-08-03 Thread Ajay Narottam Shah
I'm in a situation where I say:

> predict(m.rpart, newdata=D[N1+t,])
      0   1
173 0.8 0.2

which I interpret as meaning: an 80% chance of 0 and a 20% chance of
1. Okay. This is consistent with:

> predict(m.rpart, newdata=D[N1+t,], type="class")
[1] 0
Levels: 0 1

But I'm puzzled at the following. If I say:

> predict(m.rpart, newdata=D[N1+t,], type="vector")
173 
  1 

What gives?

I will be happy to pack up a runnable demonstration for any of you, but
I wondered if it was just my lack of knowledge about "type" in
predict.rpart; I wondered if there was a simple and logical explanation.



Re: [R] Misbehaviour of DSE

2005-07-13 Thread Ajay Narottam Shah
On Mon, Jul 11, 2005 at 08:27:40AM -0700, Rob J Goedman wrote:
 Ajay,
 
 After installing both setRNG (2004.4-1, source or binary) and dse  
 (2005.6-1, source only), it works fine.

Thanks! :-) Now dse1 works, but I get:

> library(dse2)
Warning message:
replacing previous import: acf in: namespaceImportFrom(self, asNamespace(ns)) 

Should I worry?



[R] Puzzled at ifelse()

2005-07-12 Thread Ajay Narottam Shah
I have a situation where this is fine:

   if (length(x) > 15) {
     clever <- rr.ATM(x, maxtrim=7)
   } else {
     clever <- rr.ATM(x)
   }
   clever
  $ATM
  [1] 1848.929

  $sigma
  [1] 1.613415

  $trim
  [1] 0

  $lo
  [1] 1845.714

  $hi
  [1] 1852.143

But this variant, using ifelse(), breaks:

   clever <- ifelse(length(x) > 15, rr.ATM(x, maxtrim=7), rr.ATM(x))
   clever
  [[1]]
  [1] 1848.929

What am I doing wrong?
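A likely explanation (standard ifelse() semantics, not from the thread): ifelse() is vectorised and returns a result shaped like its test argument, so a length-one test yields only the first element of the chosen branch, which destroys a list return value. A scalar choice between whole objects wants plain if/else:

```r
ifelse(TRUE, c(10, 20, 30), 0)   # returns just 10, shaped like the length-1 test

x <- 1:20
# if/else returns the chosen branch intact, whatever its type
res <- if (length(x) > 15) list(tag = "long") else list(tag = "short")
res$tag
```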



[R] Puzzled in utilising summary.lm() to obtain Var(x)

2005-06-14 Thread Ajay Narottam Shah
I have a program which is doing a few thousand runs of lm(). Suppose
it is a simple model
   y = a + bx1 + cx2 + e

I have the R object d where
   d <- summary(lm(y ~ x1 + x2))

I would like to obtain Var(x2) out of d. How might I do it?

I can, of course, always do sd(x2). But it would be much more
convenient if I could snoop around the contents of summary.lm and
extract Var() out of it. I couldn't readily see how. Would you know
what would click?



Re: [R] Puzzled in utilising summary.lm() to obtain Var(x)

2005-06-14 Thread Ajay Narottam Shah
  I have a program which is doing a few thousand runs of lm(). Suppose
  it is a simple model
y = a + bx1 + cx2 + e
  
  I have the R object d where
    d <- summary(lm(y ~ x1 + x2))
  
  I would like to obtain Var(x2) out of d. How might I do it?
  
  I can, of course, always do sd(x2). But it would be much more
  convenient if I could snoop around the contents of summary.lm and
  extract Var() out of it. I couldn't readily see how. Would you know
  what would click?
 
 Is the question how to get the variance of a column of the
 model matrix for a model that is the sum of terms given only
 summary output and the column name but not the name of the
 data frame?  If that is it then try this:
 
 d <- summary(lm(Sepal.Length ~ Sepal.Width, iris)) # test data
 var(model.matrix(eval(d$call))[,"Sepal.Width"])

Yes, this is indeed exactly what I was looking for :-) Thanks,

Does the eval() pay the full cost of re-running d$call?

 -ans.



[R] optim() does SANN, why not genetic algorithm (genoud)

2005-06-09 Thread Ajay Narottam Shah
It is a very nice touch that optim() offers SANN (simulated annealing)
as a random search algorithm.

The R community already has genoud - an implementation of a genetic
algorithm for search.

Wouldn't it be neat if optim() additionally offered method="GA",
where it internally uses code from genoud?

I glanced at the code and I find the work is all being done in

   res <- .Internal(optim(par, fn1, gr1, method, con, lower, upper))

Should we do it as:

   if (method == "GA") {
     # insert genoud calls here.
   } else {
     res <- .Internal(optim(par, fn1, gr1, method, con, lower, upper))
   }

Who wrote optim, and who maintains it?



[R] R and MLE

2005-06-07 Thread Ajay Narottam Shah
I learned R & MLE in the last few days. It is great! I wrote up my
explorations as

  http://www.mayin.org/ajayshah/KB/R/mle/mle.html

I will be most happy if R gurus will look at this and comment on how
it can be improved.



I have a few specific questions:

* Should one use optim() or should one use stats4::mle()?

  I felt that mle() wasn't adding much value compared with optim, and
  in addition, I wasn't able to marry my likelihood functions to it.

* One very nice feature of mle() is that you can specify a few
  parameters which should be fixed in the estimation. How can one
  persuade optim() to behave like that?

* Can one use deriv() and friends to get analytical derivatives of
  these likelihood functions? I found I wasn't able to make headway
  when I was using vector/matrix notation. I think the greatness of R
  lies in a lovely vector/matrix notation, and it seems like a shame
  to have to not use that when trying to do deriv().

* For iid problems, the computation of the likelihood function and
  it's gradient vector are inherently parallelisable. How would one go
  about doing this within R?
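On the second bullet, fixing some parameters in optim(): a common workaround (my sketch, not an optim feature) is a closure that splices the fixed values back into the full parameter vector, so optim searches only over the free ones:

```r
full.obj <- function(theta) sum((theta - c(1, 2, 3))^2)  # toy objective, minimum at (1, 2, 3)

# Wrap f so that parameter `fixed.idx` is pinned at `fixed.val`
make.partial <- function(f, fixed.idx, fixed.val, npar) {
  function(free) {
    theta <- numeric(npar)
    theta[fixed.idx]  <- fixed.val
    theta[-fixed.idx] <- free
    f(theta)
  }
}

# Hold parameter 2 at 5; optimise over parameters 1 and 3 only
fit <- optim(c(0, 0), make.partial(full.obj, 2, 5, 3))
fit$par   # converges near (1, 3)
```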



[R] A performance anomaly

2005-06-06 Thread Ajay Narottam Shah
I wrote a simple log likelihood (for the ordinary least squares (OLS)
model), in two ways. The first works out the likelihood. The second
merely calls the first, but after transforming the variance parameter,
so as to allow an unconstrained maximisation. So the second suffers a
slight cost for one exp() and then it pays the cost of calling the first.

I did performance measurement. One would expect the second version to
be slightly (very slightly) slower than the first. But I am finding
this is not the case! The second version is slightly faster. How can
this be?

On my machine (ibook @ 1.2 GHz, OS X panther, R 2.1), I get:

 measurement(ols.lf1)
[1] 7.45
 measurement(ols.lf1.inlogs)
[1] 6.9

Here is the self-contained bug-demonstration code:

ols.lf1 <- function(theta, y, X) {
  beta <- theta[-1]
  sigma2 <- theta[1]
  if (sigma2 <= 0) return(NA)
  n <- nrow(X)
  e <- y - X%*%beta
  logl <- ((-n/2)*log(2*pi)) - ((n/2)*log(sigma2)) - ((t(e)%*%e)/(2*sigma2))
  return(-logl) # since optim() does minimisation by default.
}

# A variant: Shift to logs for sigma2 so I can do unconstrained maximisation --
ols.lf1.inlogs <- function(truetheta, y, X) {
  ols.lf1(c(exp(truetheta[1]), truetheta[-1]), y, X)
}

# We embark on one numerical experiment
set.seed(101)
X <- cbind(1, runif(2))
theta.true <- c(2,4,6) # error variance = 2, intercept = 4, slope = 6.
y <- X %*% theta.true[-1] + sqrt(theta.true[1]) * rnorm(2)

measurement <- function(f) {
  N <- 200
  cost <- system.time(
      for (i in 1:N) {
        j <- f(c(2,3,4), y, X)
      }
      )
  return(cost[1]*1000/N)    # cost per lf in milliseconds
}

measurement(ols.lf1)
measurement(ols.lf1.inlogs)



[R] The economist's term fixed effects model - plain lm() should work

2005-06-06 Thread Ajay Narottam Shah
 CAN YOU TELL ME HOW TO FIT FIXED-EFFECTS MODEL WITH R? THANK YOU!

Ordinary lm() might suffice. 

In the code below, I try to simulate a dataset from a standard
earnings regression, where log earnings is quadratic in experience,
but the intercept floats by education category - you have 4 intercepts
for 4 education categories.

I think this works as a simple implementation of the fixed effects
model in the sense of the term that is used in economics. I will be
happy to hear from R gurus about how this can be done better using
nlme or lme4.

> education <- factor(sample(1:4, 1000, replace=TRUE),
                      labels=c("none", "school", "college", "beyond"))
> experience <- 30*runif(1000)    # experience from 0 to 30 years
> intercept <- c(0.5,1,1.5,2)[education]
> log.earnings <- intercept + 2*experience -
                  0.05*experience*experience + rnorm(1000)
> 
> summary(lm(log.earnings ~ 
             -1 + education + experience + I(experience*experience)))

Call:
lm(formula = log.earnings ~ -1 + education + experience + I(experience * 
experience))

Residuals:
Min  1Q  Median  3Q Max 
-3.1118 -0.6525 -0.0134  0.6790  4.1763 

Coefficients:
                             Estimate Std. Error  t value Pr(>|t|)
educationnone               0.5888110  0.1101963    5.343 1.13e-07 ***
educationschool             0.9062839  0.1106103    8.193 7.76e-16 ***
educationcollege            1.3662172  0.1141488   11.969   <2e-16 ***
educationbeyond             1.9739789  0.1147356   17.205   <2e-16 ***
experience                  2.0026482  0.0148110  135.214   <2e-16 ***
I(experience * experience) -0.0499795  0.0004753 -105.152   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 1.015 on 994 degrees of freedom
Multiple R-Squared: 0.9966, Adjusted R-squared: 0.9966 
F-statistic: 4.818e+04 on 6 and 994 DF,  p-value: < 2.2e-16 


As you see, it pretty much recovers the true parameter vector -- it
gets   c(.588, .906, 1.366, 1.974, 2.003, -0.0499, 1.015)
compared with  c(.5,   1.,   1.5,   2, 2, -0.05,   1)

I think the standard errors and tests should also be quite fine.

Please do post an informative summary of your exploration on the
economist's notation about panel data (fixed effects and random
effects models) on the mailing list, when you are finished learning
this question. :-) We will all benefit. Hope this helps,



[R] Solved: linear regression example using MLE using optim()

2005-05-31 Thread Ajay Narottam Shah
Thanks to Gabor for setting me right. My code is as follows. I found
it useful for learning optim(), and you might find it similarly
useful. I will be most grateful if you can guide me on how to do this
better. Should one be using optim() or stats4::mle?

set.seed(101)   # For replicability

# Setup problem
X <- cbind(1, runif(100))
theta.true <- c(2,3,1)
y <- X %*% theta.true[1:2] + rnorm(100)

# OLS --
d <- summary(lm(y ~ X[,2]))
theta.ols <- c(d$coefficients[,1], d$sigma)

# Switch to log sigma as the free parameter
theta.true[3] <- log(theta.true[3])
theta.ols[3]  <- log(theta.ols[3])

# OLS likelihood function --
ols.lf <- function(theta, K, y, X) {
  beta <- theta[1:K]
  sigma <- exp(theta[K+1])
  e <- (y - X%*%beta)/sigma
  logl <- sum(log(dnorm(e)/sigma))
  return(logl)
}

# Experiment with the LF --
cat("Evaluating LogL at stupid theta : ", ols.lf(c(1,2,1), 2, y, X), "\n")
cat("Evaluating LogL at true params  : ", ols.lf(theta.true, 2, y, X), "\n")
cat("Evaluating LogL at OLS estimates: ", ols.lf(theta.ols, 2, y, X), "\n")

optim(c(1,2,3),  # Starting values
      ols.lf,    # Likelihood function
      control=list(trace=1, fnscale=-1), # See ?optim for all controls
      K=2, y=y, X=X  # ... stuff into ols.lf()
     )
# It will use numerical derivatives by default.
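One possible extension of the above (standard ML practice, not part of the original post): hessian=TRUE makes optim() also return the Hessian at the optimum, and since fnscale=-1 means we maximised, approximate standard errors are the square roots of the diagonal of the inverse of the negated Hessian:

```r
set.seed(101)
X <- cbind(1, runif(100))
y <- X %*% c(2, 3) + rnorm(100)
ols.lf <- function(theta, K, y, X) {
  beta <- theta[1:K]; sigma <- exp(theta[K+1])
  sum(log(dnorm((y - X %*% beta)/sigma)/sigma))
}
p <- optim(c(1, 2, 0), ols.lf, control = list(fnscale = -1),
           K = 2, y = y, X = X, hessian = TRUE)
se <- sqrt(diag(solve(-p$hessian)))   # asymptotic standard errors
cbind(estimate = p$par, se = se)
```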



[R] R commandline editor question

2005-05-27 Thread Ajay Narottam Shah
I am using R 2.1 on Apple OS X.

When I get the > prompt, I find it works well with emacs commandline
editing. Keys like M-f C-k etc. work fine.

The one thing that I really yearn for, which is missing, is bracket
matching. When I am typing something which ends in ")))", it is really
useful to have emacs- or vi-style bracket matching, so as to be able
to visually keep track of whether I have the correct matching
brackets, whether ( or { or [.

I'm sure this is possible. I will be most grateful if someone will
show the way :-) Thanks,



Re: [R] R commandline editor question

2005-05-27 Thread Ajay Narottam Shah
 well ESS has such a facility.
 
 However, I think Mathematica has a super scheme: unbalanced brackets 
 show up
 in red, making them obvious.
 
 This is particularly good for spotting wrongly interleaved brackets, as 
 in
 
 ([  blah di blah  )]
 
 note bracket closure is out of order
 
 in which case both opening braces are highlighted in red: and the 
 system won't
 accept a newline until the closures are all correctly matched.
 
 Would anyone else find such a thing useful?
 
 Could the ESS team make something like this happen?

ess is great, but I was asking about the R commandline. I tend to
write a lot of stuff on the fly at the R commandline.

Yes, colours are a great way to deal with this, and this feature
should ideally be in ESS.



[R] Catching an error with lm()

2005-05-24 Thread Ajay Narottam Shah
Folks,

I'm in a situation where I do a few thousand regressions, and some of
them are bad data. How do I get back an error value (return code such
as NULL) from lm(), instead of an error _message_?

Here's an example:

> x <- c(NA, 3, 4)
> y <- c(2, NA, NA)
> d <- lm(y ~ x)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
0 (non-NA) cases
> str(d)
Error in str(d) : Object d not found

My question is: How do I force lm() to quietly send back an error code
like NULL? I am happy to then look at is.null(d) and handle it
accordingly. I am stuck because when things go wrong, there is no
object d to analyse!

(My production situation is a bit more complex. It is costly for me to
first verify that the data is sound. I'd like to toss it into lm() and
get an error code for null data).
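The standard tool for this is tryCatch() (or the older try()); an error handler can turn the failure into a NULL return value, exactly as the poster asks. A base-R sketch, reusing the toy data above:

```r
x <- c(NA, 3, 4)
y <- c(2, NA, NA)

# On error, return NULL instead of aborting
d <- tryCatch(lm(y ~ x), error = function(e) NULL)

if (is.null(d)) {
  message("regression skipped: bad data")   # this branch runs here
} else {
  print(summary(d))
}
```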



[R] Summary: My question about factor levels versus factor labels.

2005-05-09 Thread Ajay Narottam Shah
Yesterday, I had asked for help on the list. Brian Ripley and Bruno
Falissard had most kindly responded to me. Here is the solution.

   factorlabels <- c("School", "College", "Beyond")
   #                  1         2          3

   education.man  <- c(1,2,1,2,1,2,1,2)  # PROBLEM: Level 3 doesn't occur.
   education.wife <- c(1,2,3,1,2,3,1,2)

   education.wife <- factor(education.wife, labels=factorlabels)  # Is fine.

   # But this breaks --
   # education.man <- factor(education.man, labels=factorlabels)

   # Solution --
   education.man <- factor(education.man, levels = c(1,2,3),
                           labels=factorlabels)

   # So now we can do --
   a <- rbind(table(education.wife), table(education.man))
   rownames(a) <- c("Wife", "Man")
   print(a)
        School College Beyond
   Wife      3       3      2
   Man       4       4      0

which was the table that I had wanted.



[R] Need a factor level even though there are no observations

2005-05-08 Thread Ajay Narottam Shah
I'm in this situation:

 factorlabels <- c("School", "College", "Beyond")

with data for 8 families:

 education.man  <- c(1,2,1,2,1,2,1,2)   # Note : no 3 values
 education.wife <- c(1,2,3,1,2,3,1,2)   # 1,2,3 are all present.

My goal is to create this table:

            School  College  Beyond
   Husband       4        4       0
   Wife          3        3       2


How do I do this?

I can readily do:
 education.wife <- factor(education.wife, labels=factorlabels)

But this breaks:
 education.man <- factor(education.man,   labels=factorlabels)

because none of the families have a husband who went beyond college.

I get around this problem in a limited way by:
 cautiously <- function(x, labels) {
   factor(x, labels = labels[as.numeric(levels(factor(x)))])
 }
 education.man <- cautiously(education.man, labels = factorlabels)

Now I get:

 > table(education.man)
  School College 
       4       4 
 > table(education.wife)
  School College  Beyond 
       3       3       2 

This is a pain because now the two tables are not conformable. How do
I get to my end goal, which is the table:

            School  College  Beyond
   Husband       4        4       0
   Wife          3        3       2

In other words, how do I force education.man to be a factor with three
levels ("School", "College", "Beyond") even though there are no
observations in "Beyond"?
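For reference, a minimal sketch of the fix (the same levels= idea used in the follow-up summary earlier in this digest): declare all three levels explicitly so empty levels are kept.

```r
factorlabels <- c("School", "College", "Beyond")

## levels = 1:3 forces "Beyond" to exist even with zero observations,
## so both tables have the same three columns and rbind conforms.
education.man  <- factor(c(1,2,1,2,1,2,1,2), levels = 1:3, labels = factorlabels)
education.wife <- factor(c(1,2,3,1,2,3,1,2), levels = 1:3, labels = factorlabels)

a <- rbind(Husband = table(education.man), Wife = table(education.wife))
print(a)
##         School College Beyond
## Husband      4       4      0
## Wife         3       3      2
```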



[R] R on Mac OS X: odd errors when doing install.packages()

2005-05-03 Thread Ajay Narottam Shah
Should I be worried? The installation seems to go through fine and
apparently nothing is broken. The errors I repeatedly get are like this:

g++ -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/include -I/usr/local/include -DUNIX -DOPTIM -DNONR -fno-common -g -O2 -c unif.cpp -o unif.o
g++ -bundle -flat_namespace -undefined suppress -L/usr/local/lib -o rgenoud.so change_order.o eval.o evaluate.o frange_ran.o genoud.o gradient.o math.o multiply.o numerics.o operators.o print_format.o rgenoud.o unif.o -lcc_dynamic -framework R
ld: warning multiple definitions of symbol _xerbla_
/Library/Frameworks/R.framework/R(print.lo) definition of _xerbla_
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib(single module) definition of _xerbla_
ld: warning multiple definitions of symbol _BC
/Library/Frameworks/R.framework/Versions/2.1.0/Resources/lib/libreadline.5.0.dylib(terminal.so) definition of _BC
/usr/lib/libncurses.5.dylib(lib_termcap.o) definition of _BC
ld: warning multiple definitions of symbol _UP
/Library/Frameworks/R.framework/Versions/2.1.0/Resources/lib/libreadline.5.0.dylib(terminal.so) definition of _UP
/usr/lib/libncurses.5.dylib(lib_termcap.o) definition of _UP
ld: warning multiple definitions of symbol _PC
/Library/Frameworks/R.framework/Versions/2.1.0/Resources/lib/libreadline.5.0.dylib(terminal.so) definition of _PC
/usr/lib/libncurses.5.dylib(lib_tputs.o) definition of _PC
** R
** help
  Building/Updating help pages for package 'rgenoud'
 Formats: text html latex example 
  genoud    text    html    latex    example
** building package indices ...
* DONE (rgenoud)
