Hi, R users

  I am trying to use EM algorithmto estimate a latent class model of discrete 
choice. The basic model is a logit model which has two variables X=(X1,X2) and 
the utility is defined as v = Beta*X. There are 3 classes of individuals each 
has its own Beta values, so Beta is a 3*2 matrix. For each individual, (there 
are 1000), he make a choice between two randomly generated choice alternatives 
and has to choose one. The proportion of classes is set to (0.2,0.5,0.3). based 
on this seting, the proportion parameters alpha always end up with one of them 
goes to 0 in EM estimation. I checked the code and can not find what's the 
problem. Maybe this is a bit off topic in R help, but I code it in R and hence 
in hope that some one here can figure it out. 

Plus, I have a normal mixture model with success EM result. 

Thanks in advance.

Following is the code:

#=========================

X = array(,dim=c(1000,2))#first alternative
b= c(2.3,0.7,0.3,0.7,0.1,0.8)
alpha = c(0.3,0.5,0.2)#proportion

X[,1] = runif(1000,min=5,max=20)
X[,2] = runif(1000,min=5,max=20)
##bi = 0.3
Y = array(,dim=c(1000,2))#second alternative

Y[,1] = runif(1000,min=5,max=20)
Y[,2] = runif(1000,min=5,max=20)

V11 = X[1:300,]%*%b[1:2] 
V12 = Y[1:300,]%*%b[1:2] 
V21 = X[301:800,]%*%b[3:4]
V22 = Y[301:800,]%*%b[3:4]
V31 = X[801:1000,]%*%b[5:6]
V32 = Y[801:1000,]%*%b[5:6]

V1 = rbind(V11,V21,V31)+rnorm(1000)
V2 = rbind(V12,V22,V32)+rnorm(1000)

oo = exp(V1)+exp(V2)

P1 = exp(V1)/oo
P2 = exp(V2)/oo
D = ifelse(V1>V2,0,1)

#second part of Q function
Q2 = function(Beta,H){
 probs = logProbInd(Beta)
 li = sum(H*probs)
 return(li)
}


logProbInd=function(Beta){#X, Y, D values take as is from environment
 dim(Beta) = c(2,3)
 Beta = t(Beta)
 probs = matrix(,nrow = 1000, ncol = 3)
 for(i in 1:3){
  v1 = X%*%Beta[(i-1)*2+1:2]
  v2 = Y%*%Beta[(i-1)*2+1:2]
  p1 = exp(v1)/(exp(v1)+exp(v2))
  p2 = exp(v2)/(exp(v1)+exp(v2))
  probs[,i] = ifelse (D==0,log(p1),log(p2))
  
 }
 
 return (probs)
}

#H [individuals][class]
E_step = function(alpha,Beta){#calc posterior of H
 
 tmpH = matrix(,nrow = 1000,ncol =3)
 lprobs = logProbInd(Beta)
 for(i in 1:3){#classes
  tmpH[,i] = alpha[i]*exp(lprobs[,i])
 }
 H = tmpH /apply(tmpH,1,sum)
 return( H)
}

M_step = function(H,Beta){
 #first part use direct estimation
 aita = apply(H,2,sum)/1000
 
 opt.c = optim(Beta,Q2,H=H,method="BFGS",control = list(fnscale = -1))
 
 lik = opt.c$value
 return(c(aita,opt.c$par,lik))
}


#EM loops
alf = c(0.33,0.33,0.33)
Bt = seq(0.1,0.6,by=0.1)

sc = rep(-8000,5)
i=1

while(T){
 H= E_step(alpha=alf,Beta=Bt)
 theta = M_step(H=H,Beta=Bt)
 print(theta)
 alf = theta[1:3]
 Bt = theta[4:9]
 
 #check convergence
 sc[(i%%5) +1] = theta[10]
 i=i+1
 #if((theta[10] - mean(sc) ) < 0.0005) break
}


        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to