As Matthew Brand pointed out, there are fundamental limits on what any
learner can do with very small data sets.
But have you tried the Bayesian Dirichlet score? It gives exact posterior
probabilities when the structure-specific prior knowledge is represented by
the conjugate Dirichlet distribution. I've compared belief net learning
with naive Bayes and a number of simple heuristic learning algorithms on
data sets with very small numbers of observations, and have found that
Bayes net learning with the Bayesian Dirichlet score outperforms the other
methods on holdout samples. It helps on very small data sets to use models
that allow for local structure (see Friedman, UAI 97 I think, but don't
have the ref handy).
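
In case it's useful, here is a minimal sketch of that score for a single
node (discrete variables, complete data; counts is a q-by-r table of the
sufficient statistics N_jk, one row per parent configuration and one
column per child state, and alpha is an assumed equivalent sample size
for a uniform BDeu-style prior). The network score is just the sum of
this over nodes:

import numpy as np
from scipy.special import gammaln

def bdeu_node_score(counts, alpha=1.0):
    # Exact log marginal likelihood, log p(data | structure), for one
    # node under a Dirichlet prior with pseudo-counts alpha/(q*r).
    q, r = counts.shape
    a_jk = alpha / (q * r)               # per-cell pseudo-count
    a_j = alpha / q                      # per-row pseudo-count total
    n_j = counts.sum(axis=1)
    row_terms = gammaln(a_j) - gammaln(a_j + n_j)
    cell_terms = gammaln(a_jk + counts) - gammaln(a_jk)
    return row_terms.sum() + cell_terms.sum()

Because this is the exact posterior rather than BIC's asymptotic
approximation, nothing in it breaks down at N = 20.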
Kathy Laskey
At 5:56 PM +0100 8/17/99, Corney, David wrote:
>Hi,
>
>I'm a PhD student looking for some advice about model selection of Bayes
>Belief Networks.
>
>I'm building BBNs using very small data sets (e.g. 20 records), and I'm
>using the Bayesian Information Criterion (BIC) to measure the quality of
>each network. The BIC is equivalent to the minimum description length (MDL)
>and has two terms:
>
>BIC = -log p(data|model) + 0.5 * log(N) * model_dimension
>
>or, more simply:
>BIC = error + complexity_penalty
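>
>(As code, this is roughly what I'm computing -- natural logs throughout,
>which is what the numbers below assume:)
>
>import numpy as np
>
>def bic(log_lik, dim, n):
>    # log_lik: log p(data | model), summed over all n records
>    # dim: number of free parameters; n: number of records
>    # Smaller is better in this description-length form.
>    return -log_lik + 0.5 * np.log(n) * dim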
>
>As the number of records (N) increases, the error measure grows faster than
>the complexity penalty: the error is a *sum* of per-record log-likelihoods,
>so it grows roughly linearly in N, whereas the penalty grows only with the
>*log* of N. Thus for large N, the error dominates and accurate models are
>selected.
>
>For my 20-record data set, with around 4-10 variables, I typically get error
>measures of 8-15 and model dimensions of 20-30. So with an error of 8 and a
>dimension of 20, the BIC becomes:
>
>BIC = 8 + 0.5*log(20)*20
>    = 8 + 29.96
>
>Thus the complexity penalty (29.96) is far larger than the error (8), so the
>system always selects the simplest model possible, regardless of error.
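>
>As a sanity check on the scaling: at my per-record fit of 8/20 = 0.4, the
>error term would only overtake a dimension-20 penalty above roughly a
>hundred records:
>
>import numpy as np
>
>per_record = 8.0 / 20                  # fit per record at N = 20
>dim = 20
>for n in (20, 100, 200, 1000):
>    error = per_record * n             # grows linearly in N
>    penalty = 0.5 * np.log(n) * dim    # grows only like log N
>    print(n, round(error, 1), round(penalty, 1))
># N=20: 8.0 vs 30.0 ... N=1000: 400.0 vs 69.1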
>
>So my questions are:
>1) is the above understanding of BIC correct?
>2) how can BIC be applied to very small data sets?
>3) is there an alternative measure appropriate for such small sets?
>
>Thanks in advance for any help/pointers/advice etc.
>
>Regards,
>
>David
>
>David Corney
>Dept. Computer Science
>University College London
>www.cs.ucl.ac.uk/staff/D.Corney