Hi, all.

       Given Matt Brand's e-mail below and its mention of MDL, Rissanen,
minimum message length (MML) and Kolmogorov complexity, I invite readers
who are interested in these issues to look out for a forthcoming issue of
the Computer Journal, intended as a 30th anniversary issue marking the
seminal Wallace and Boulton (1968) MML paper.

The issue will contain several articles, discussions and rejoinders, but,
of course :-) , one of the ones I encourage you to read is:

Wallace, C.S. and D.L. Dowe.  Minimum Message Length and Kolmogorov
Complexity, to appear, Computer Journal.

The issue certainly should contain much up-to-date research in this area.


Regards.          - David Dowe.

From [EMAIL PROTECTED]  Wed Aug 18 07:05:00 1999
Date: Tue, 17 Aug 1999 16:55:25 -0400
From: Matthew Brand <[EMAIL PROTECTED]>
To: "Corney, David" <[EMAIL PROTECTED]>
CC: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
Subject: Re: BBN and model complexity

David --

BIC is a large-sample approximation that arises in the study of the
marginal likelihood and the deviance.  It is known to select for overly
simple explanations of small data-sets.  The original MDL theory, which
you are invoking, often bottoms out in approximations with this problem,
and Rissanen has moved on to stochastic complexity.  Related
density-estimation approaches include minimum message length and entropy
minimization.  All invoke the code-length minimizing prescriptions of
Kolmogorov complexity theory, but differ in their relations to this
non-computable gold standard.  The gold standard itself is somewhat
overrated, since careful analysis of the current mathematics shows that
the probability that the minimum-K-complexity coding is the best one can
vary over several orders of magnitude.
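
To make the BIC penalty concrete, here is a minimal sketch (the function
name and form are my own illustration, using the usual definition
BIC = -2 log L + k log n):

    import math

    def bic(log_likelihood, k, n):
        # k = number of free parameters, n = sample size
        return -2.0 * log_likelihood + k * math.log(n)

    # For small n, differences in the k*log(n) penalty can swamp
    # differences in fit, which is one way to see why BIC favours
    # overly simple models on small datasets.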

In short, minimizing expected coding length is the best policy, but the
guarantees are weak and some of the more popular approximations cannot
be trusted with a small dataset, which, unfortunately, is exactly the
circumstance that calls for a good complexity penalty.
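
For concreteness, the two-part code length being minimized looks roughly
like this (a crude textbook approximation of my own, coding each
parameter to the usual 1/sqrt(n) precision -- not Rissanen's refined
stochastic complexity):

    import math

    def two_part_code_length(neg_log2_likelihood, k, n):
        model_bits = 0.5 * k * math.log2(n)  # cost of stating k parameters
        data_bits = neg_log2_likelihood      # cost of the data given them
        return model_bits + data_bits

    # Choosing the model that minimizes this total is, up to a factor
    # of 2*ln(2), the same criterion as BIC above.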

More specifically, with very limited training data you should expect an
ideal learner to infer a model that is simpler than the true
data-generating mechanism, simply because the data does not encompass
all the interesting behavior the data-generating mechanism is capable
of.
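
A toy experiment (entirely my own construction) shows the effect: draw
data from a cubic and let BIC pick a polynomial degree; with few samples
it typically settles on a degree lower than the truth.

    import numpy as np

    rng = np.random.default_rng(0)

    def bic_selected_degree(n, noise=1.0):
        # true mechanism: the cubic 0.5*x^3 - x^2 + 2
        x = rng.uniform(-1.0, 1.0, n)
        y = np.polyval([0.5, -1.0, 0.0, 2.0], x) + rng.normal(0.0, noise, n)
        best_d, best_score = 0, np.inf
        for d in range(6):
            resid = y - np.polyval(np.polyfit(x, y, d), x)
            sigma2 = max(resid @ resid / n, 1e-12)
            log_l = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
            score = -2.0 * log_l + (d + 2) * np.log(n)  # d+1 coeffs + variance
            if score < best_score:
                best_d, best_score = d, score
        return best_d

    for n in (10, 30, 1000):
        print(n, bic_selected_degree(n))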

BIC uses just the first few terms of a Taylor expansion; the analytic
form and meaning of the next few terms have been explored by
Balasubramanian and might profitably be applied to your problem.
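
For reference, the expansion in question is roughly (from memory -- see
Balasubramanian's paper for the exact conditions)

    -\ln P(D \mid M) \approx -\ln P(D \mid \hat\theta)
        + \frac{k}{2} \ln \frac{n}{2\pi}
        + \ln \int d\theta \, \sqrt{\det I(\theta)}
        + \frac{1}{2} \ln \frac{\det J(\hat\theta)}{\det I(\hat\theta)},

where I is the Fisher information, J is the empirical per-sample Hessian
of the negative log-likelihood at the MLE \hat\theta, and BIC keeps only
the first two terms.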

-- Matt
