>The classification problem that I am working on has 37 input
> variables, 15 of them are categorical and the rest of them 
> are continuous.

Is the structure you are working with a Naive Bayes classifier

   C
 / |  \
X1 X2 X3

or a form of logistic regression

X1 X2 X3
\  |  /
 \ | / 
   C

where all arcs point down in both figures? (C is a discrete class label,
and the Xi are continuous or discrete features.)

The first model requires learning the prior distribution P(C) and the
conditional distributions P(Xi|C), which can be done with EM if the
class labels are (partially) unobserved. E.g., if Xi is Gaussian, just
compute the sample mean and variance of feature i for each class label,
weighting each sample by the posterior on C (the mixing weights). For
distributions which are not in the exponential family, you will need to
use an iterative M step.
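
For the Gaussian case, here is a minimal NumPy sketch of that
computation (the function name m_step_gaussian and the array shapes are
my own illustration; when the labels are fully observed, the
responsibility matrix is just the one-hot encoding of the labels, so
the same code does a single supervised pass):

import numpy as np

def m_step_gaussian(X, resp):
    # X    : (n, d) array of continuous features
    # resp : (n, k) posteriors P(C=c | x_i), used as mixing weights;
    #        a one-hot indicator matrix if the labels are observed
    Nk = resp.sum(axis=0)                    # effective count per class
    prior = Nk / Nk.sum()                    # P(C)
    mean = (resp.T @ X) / Nk[:, None]        # per-class feature means
    var = (resp.T @ X**2) / Nk[:, None] - mean**2  # per-class variances
    return prior, mean, var

# e.g., with observed labels y in {0, ..., k-1}: resp = np.eye(k)[y]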

The second model only requires fitting the softmax function
P(C|X1,...,Xn), which can be done using iteratively reweighted least
squares (IRLS).
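
A minimal sketch of IRLS for the binary case, i.e. Newton's method on
the logistic log-likelihood (irls_logistic is my name for it, and the
small ridge term is just a numerical safeguard I've added; the
multiclass softmax case generalizes this with a block-structured
Hessian):

import numpy as np

def irls_logistic(X, y, n_iter=25, ridge=1e-8):
    # X : (n, d) design matrix (prepend a column of ones for the bias)
    # y : (n,) labels in {0, 1}
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # current P(C=1 | x)
        s = p * (1.0 - p)                    # IRLS weights
        # Newton update: w <- w + (X' S X)^{-1} X' (y - p)
        H = X.T @ (s[:, None] * X) + ridge * np.eye(X.shape[1])
        w = w + np.linalg.solve(H, X.T @ (y - p))
    return w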

I recommend the following article for a discussion of these two models:

@techreport{Jordan95,
  author      = "M. I. Jordan",
  title       = "Why the logistic function? {A} tutorial discussion on
                 probabilities and neural networks",
  institution = "MIT Computational Cognitive Science Report",
  number      = 9503,
  month       = "August",
  year        = 1995,
  annote      = "Discusses the relative merits of causal (BN) and
                 diagnostic (NN) models"
}

Kevin
