I have had the same experience with the mentioned algorithm of Binder,
Koller, Russell and Kanazawa - it usually converges to local maxima of
the log-likelihood, which are far from optimal. I compared it with EM
on the problem of learning transition probabilities between hidden
state nodes in temporal probabilistic networks - more details can be
found in:

Nikovski, D. (1998). Learning stationary temporal probabilistic
networks. Conference on Automated Learning and Discovery, Pittsburgh,
June 11-13, 1998. http://www.cs.cmu.edu/~danieln/conald.ps
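For concreteness, the EM re-estimation of transition probabilities that I compared against can be sketched roughly as follows. This is a minimal Baum-Welch-style update for a toy 2-state, 2-symbol hidden chain; all function names and the toy parameters below are my own illustrative assumptions, not taken from the paper:

```python
# Toy 2-state, 2-symbol hidden Markov chain (illustrative sketch only).
# pi: initial state probabilities, A: transition matrix, B: emission matrix.

def forward(obs, pi, A, B):
    # alpha[t][i] = P(o_1..o_t, s_t = i)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(2)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][j] * A[j][i] for j in range(2)) * B[i][o]
                      for i in range(2)])
    return alpha

def backward(obs, A, B):
    # beta[t][i] = P(o_{t+1}..o_T | s_t = i)
    beta = [[1.0, 1.0]]
    for o in reversed(obs[1:]):
        beta.insert(0, [sum(A[i][j] * B[j][o] * beta[0][j] for j in range(2))
                        for i in range(2)])
    return beta

def em_transition_update(obs, pi, A, B):
    """One EM (Baum-Welch) re-estimation step for the transition matrix A."""
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    T = len(obs)
    # Unnormalized expected transition counts; the constant P(O) cancels
    # in the row-wise normalization below.
    xi = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
               for t in range(T - 1)) for j in range(2)] for i in range(2)]
    return [[xi[i][j] / sum(xi[i]) for j in range(2)] for i in range(2)]
```

Each EM step moves A to the expected-count ratios in closed form, which is what makes it a useful baseline against gradient ascent on the same likelihood surface.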

I have also experimented with continuous observation nodes, whose
emission densities are Gaussian. The algorithm of Binder, Koller,
Russell and Kanazawa works very well if the initial values of the
means and standard deviations are sufficiently close to their true
values, but converges to local maxima of log-likelihood when they are
initialized randomly. K-means clustering of the observations usually
solves this problem, though, as has also been found when learning
hidden Markov models.
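A minimal sketch of that initialization step, assuming 1-D observations: run k-means to cluster the data, then use per-cluster means and standard deviations as starting values for the Gaussian emission densities. The function names and the floor on sigma are illustrative assumptions of mine:

```python
# Illustrative sketch: seed Gaussian emission parameters with k-means
# before running EM or gradient-based learning.

def kmeans_1d(xs, k, iters=50):
    # Initialize centers by spreading them over the sorted data.
    xs_sorted = sorted(xs)
    centers = [xs_sorted[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            # Assign each point to its nearest center.
            clusters[min(range(k), key=lambda c: abs(x - centers[c]))].append(x)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

def init_gaussians(xs, k):
    """Per-cluster mean and standard deviation as EM starting values."""
    centers, clusters = kmeans_1d(xs, k)
    params = []
    for mu, c in zip(centers, clusters):
        var = sum((x - mu) ** 2 for x in c) / max(len(c), 1)
        params.append((mu, max(var ** 0.5, 1e-3)))  # floor sigma to avoid collapse
    return params
```

Starting the means near the cluster centers keeps the subsequent likelihood ascent out of the poor local maxima that random initialization tends to produce.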

Again, learning of the CPTs of hidden nodes proceeds very slowly,
orders of magnitude slower than learning of the emission
probabilities, and often converges to a value different from the true
one (although in its general vicinity).

Cheers,

--daniel


Frank Wittig <[EMAIL PROTECTED]> wrote:

>Dear colleagues,
>
>We have been using the Adaptive Probabilistic Networks method
>of Binder, Koller, Russell and Kanazawa (1997) to learn Bayesian
>networks with hidden variables. (The context is the modeling of
>unobservable properties of computer users.) We use
>Polak-Ribière's method for choosing the direction of the next
>hill-climbing step.
>
>Although the results so far have been reasonable, the learned
>nets seem to be local optima that could be improved upon. Also,
>the algorithm seems to concentrate its learning mainly on the
>CPTs that link a hidden variable with its observable children,
>leaving the CPT for the hidden variable largely unchanged, even
>when this initial CPT does not represent a particularly promising
>choice.
>
>We'd be interested to hear of methods that others have used to
>get good results with this type of algorithm - for example, by
>specifying constraints that the learned network ought to satisfy.
>
>We are also interested in experiences with implementations of
>the EM algorithm for learning Bayesian networks and how these or
>similar problems have been solved in that context.
>
>Thanks in advance,
>
>       Frank
>
>Binder, J., Koller, D., Russell, S., & Kanazawa, K. (1997).
>   Adaptive probabilistic networks with hidden variables.
>   Machine Learning, 29, 213-244.
>
>
>======================================================================
>Frank Wittig                Email: [EMAIL PROTECTED]
>University of Saarbruecken  WWW:   http://w5.cs.uni-sb.de/~fwittig
>P.O. Box 15 11 50           Phone: +49 681 302 4135
>D-66041 Saarbruecken        Fax:   +49 681 302 4136
>======================================================================
