Hi there,
I have data on earnings of 12000 individuals at two points in time. I intend to
construct a transition matrix, where the typical element, p_ij, gives the
probability that an individual ends at the j-th decile of the earnings
distribution given that he was initially at the i-th decile. Thus, this is
a bi-stochastic matrix. The problem is that the income data is nearly discrete
in the sense that many individuals hold the same income level at each point.
For instance, there are 1400 individuals who earned the minimum positive
income (say, $100) in period one. Therefore, the first decile will contain more
than 10% of the individuals. This happens for both periods, and for a few
income levels. As a result, the transition matrix won't have both rows and
columns summing to one.
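
To make the problem concrete, here is a minimal sketch on made-up numbers (the toy vector x and its sizes are assumptions for illustration, not my real data). It bins a variable with a mass of ties at the minimum and shows that the first group ends up holding more than 10% of the observations:

```r
set.seed(1)
# Hypothetical toy data (not the real sample): 1000 earners,
# 150 of them tied at a minimum income of 100.
x <- c(rep(100, 150), runif(850, 101, 5000))

# Decile breaks; the lowest break is pushed down to 0 so that the
# minimum falls inside the first interval (mirrors f.x[1] <- 0 further down).
breaks <- quantile(x, seq(0, 1, 0.1), names = FALSE)
breaks[1] <- 0

counts <- table(cut(x, breaks, right = TRUE))
share <- as.vector(counts) / length(x)
round(share, 2)   # the first group holds 15% of the sample, the second only 5%
```

Because the groups are unequal in size, normalising a cross-table by its row sums cannot make the column sums equal to one as well.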
The solution I've found for this problem is to generate a uniform random
vector, with entries ranging from, say, -.0001 to .0001, add it to both
earnings vectors, and compute the transition matrix. I would then repeat the
procedure 1000 times and take the mean of the resulting matrices. The thing is I'm totally new
to simulations. Here's part of what I'm trying to do:
# X is a two-column data frame. Column 1 holds the individuals' period-one
# earnings and column 2 their period-two earnings.
n <- nrow(X) #12000
sim <- runif(n, -.0001, .0001)
X <- X + sim
q <- 10 # number of quantile groups: 10 gives deciles; it could also be
        # 5 (quintiles), 4 (quartiles), whatever
p <- seq(0,1,1/q)
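f.x <- quantile(X[,1], p, names=FALSE)
f.y <- quantile(X[,2], p, names=FALSE)
# lower the first break so the minimum lands in the first interval
# (this assumes all earnings are positive)
f.x[1] <- 0; f.y[1] <- 0
a <- cut(X[,1], f.x, right=TRUE)
b <- cut(X[,2], f.y, right=TRUE)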
P <- table(a,b)
P <- P/rowSums(P)
P
The point is that I don't know how to store the matrix P efficiently so that it
can be averaged with the remaining 999.
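
One way this might be done, sketched under assumptions: keep a running sum of the matrices and divide by the number of replications at the end, so that only one q-by-q matrix is ever stored. The helper one_matrix(), the toy data frame, and the number of replications R are hypothetical names and values introduced for illustration. The sketch also departs from the code above in two labelled ways: it draws independent noise for each column, and it uses include.lowest = TRUE instead of resetting the first break to 0.

```r
set.seed(42)
# Hypothetical stand-in for the real data: two earnings columns with many ties
n <- 2000
X <- data.frame(y1 = 100 * sample(1:30, n, replace = TRUE),
                y2 = 100 * sample(1:30, n, replace = TRUE))

q <- 10
p <- seq(0, 1, 1 / q)

# One replication: jitter each column, bin into q quantile groups,
# cross-tabulate and normalise each row to sum to one.
one_matrix <- function(X, p, eps = 1e-4) {
  q <- length(p) - 1
  x <- X[[1]] + runif(nrow(X), -eps, eps)   # assumption: independent noise per column
  y <- X[[2]] + runif(nrow(X), -eps, eps)
  # labels = FALSE returns integer group codes, so every replication
  # yields a table with the same q x q layout
  a <- cut(x, quantile(x, p, names = FALSE), labels = FALSE, include.lowest = TRUE)
  b <- cut(y, quantile(y, p, names = FALSE), labels = FALSE, include.lowest = TRUE)
  P <- table(factor(a, levels = seq_len(q)), factor(b, levels = seq_len(q)))
  unclass(P / rowSums(P))
}

R <- 200                      # 1000 in the procedure described above
P_sum <- matrix(0, q, q)      # running sum of the replicated matrices
for (r in seq_len(R)) {
  P_sum <- P_sum + one_matrix(X, p)
}
P_mean <- P_sum / R
round(P_mean, 3)
```

An equivalent one-liner would be Reduce("+", replicate(R, one_matrix(X, p), simplify = FALSE)) / R, which builds a list of all R matrices first and therefore uses more memory than the running sum.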
Also, suggestions on how to solve the problem of discretized data are welcome.
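
One possibility that may be worth considering, offered only as a sketch: break ties at random through ranks rather than by adding noise to the values. rank() with ties.method = "random" returns a permutation of 1..n, so cutting the ranks into q blocks gives groups of exactly n/q people whenever n is a multiple of q. The helper group() and the toy vectors y1 and y2 are hypothetical, not my real data.

```r
set.seed(7)
n <- 2000
q <- 10
# Hypothetical heavily tied earnings for two periods
y1 <- 100 * sample(1:30, n, replace = TRUE)
y2 <- 100 * sample(1:30, n, replace = TRUE)

# Map each observation to a group 1..q using randomly tie-broken ranks
group <- function(v, q) {
  ceiling(q * rank(v, ties.method = "random") / length(v))
}

g1 <- group(y1, q)
g2 <- group(y2, q)

# Every group has n/q members, so dividing the counts by n/q
# makes rows and columns sum to one in this single draw
P <- table(g1, g2) / (n / q)
```

A single draw still depends on how the ties happened to be broken, so it could be averaged over many replications in the same way as the jittered version.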
Thanks a lot,
Dimitri