Hello everyone,

My name is Daniel Duckworth, and I am the person responsible for this pull
request.  To give you a quick idea of my background, I'm going to be a
first-year PhD student at the University of Massachusetts, Amherst this
fall, and I've done (not really publication-quality) research in Bayesian
Inference at U.C. Berkeley.  Like Gael's email, this is a long one, but I
wanted to address (1) what sklearn.kalman is, (2) why I wrote it, (3)
Gael's concerns, and (4) my own opinion on whether it should be merged,
and if so, how.

What
====

The sklearn.kalman module currently implements the standard Kalman
Filter[1], Kalman Smoother, and EM algorithm as applied to a Markov model
with hidden states and noisy observations.  If you are familiar with
sklearn.hmm's Hidden Markov Model, this model makes precisely the same
assumptions, except that all distributions are Gaussian instead of
discrete.
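
Concretely, the assumed model is the textbook linear-Gaussian state space
(the notation here is mine, not identifiers from the module):

    x_{t+1} = A x_t + w_t,   w_t ~ N(0, Q)   (hidden state transition)
    z_t     = C x_t + v_t,   v_t ~ N(0, R)   (noisy observation)

where A and C are the transition and observation matrices, and Q and R
are the corresponding noise covariances.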

The Kalman Filter and Smoother estimate some "hidden state" from a
sequence of noisy measurements, based on model parameters that are either
given or learned.  They are the continuous-state-space equivalents of the
"forward" and "forward-backward" passes of the Forward-Backward
Algorithm[2] used in sklearn.hmm, and thus actually recover a
*distribution* over hidden states for each time step in a sequence.
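
To give a feel for the interface, here is a minimal sketch.  The class and
method names follow my pull request as it stands today, but please treat
them as illustrative rather than final:

    import numpy as np
    from sklearn.kalman import KalmanFilter

    # Toy 1-D random walk observed with noise: x_{t+1} = x_t + w_t,
    # z_t = x_t + v_t.
    A, C = np.eye(1), np.eye(1)
    Q, R = 0.1 * np.eye(1), 1.0 * np.eye(1)
    Z = np.random.randn(50, 1).cumsum(axis=0)  # a (T, n_dim_obs) sequence

    kf = KalmanFilter(transition_matrices=A, observation_matrices=C,
                      transition_covariance=Q, observation_covariance=R)

    # Both methods return one Gaussian (a mean and a covariance) per time
    # step, i.e. a distribution over hidden states.
    filtered_means, filtered_covs = kf.filter(Z)
    smoothed_means, smoothed_covs = kf.smooth(Z)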

The Expectation-Maximization (EM) algorithm is conceptually the same as
the one used to estimate the HMM's model parameters, though its
implementation differs wildly in practice.  It is an iterative method that
runs the Kalman Smoother, then updates the parameter values, and repeats
until convergence.
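
Continuing the sketch above (again, illustrative names only), learning
parameters looks roughly like this:

    # Start from rough initial guesses and let em() alternate between
    # running the Smoother and updating the parameters.
    kf = KalmanFilter(transition_matrices=A, observation_matrices=C)
    kf.em(Z, n_iter=10)                # 10 Smoother + update rounds
    smoothed_means, smoothed_covs = kf.smooth(Z)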

Why
===

In robotics, the first (and probably last) algorithm used for estimating
the state of a robot is the Kalman Filter or some variation thereof.  Most
people end up implementing it on their own, as the entire algorithm can be
written in no more than ten lines of linear algebra code (see the sketch
just below).  The Kalman Smoother is a bit more complex, but manageable,
and the EM algorithm is rarely implemented unless necessary.  It saddened
me that I could find no easy-to-use reference implementation with all the
bells and whistles we usually ignore when the algorithm is taught in
class, so I endeavored to write my own.  I had been taught how to
implement the Kalman Filter and Kalman Smoother in multiple classes during
my undergraduate program, so I thought it would be an easy contribution to
the community.  sklearn.kalman is the result of that work.
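
For the curious, here is the kind of ten-line implementation I mean: one
predict/update step of the textbook Kalman Filter in plain NumPy, without
any of the bells and whistles (a sketch, not the module's actual code):

    import numpy as np

    def kalman_step(mu, Sigma, z, A, C, Q, R):
        # Predict: push the previous state estimate (mu, Sigma) through
        # the transition model.
        mu_pred = A.dot(mu)
        Sigma_pred = A.dot(Sigma).dot(A.T) + Q
        # Update: correct the prediction using the new observation z.
        S = C.dot(Sigma_pred).dot(C.T) + R             # innovation covariance
        K = Sigma_pred.dot(C.T).dot(np.linalg.inv(S))  # Kalman gain
        mu_new = mu_pred + K.dot(z - C.dot(mu_pred))
        Sigma_new = Sigma_pred - K.dot(C).dot(Sigma_pred)
        return mu_new, Sigma_new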

I initially intended to help sklearn by fixing/speeding up the code in
sklearn.hmm, but I saw that it was already taken care of (more or less),
so I thought, "why not write the continuous-space equivalent?"  I had
already
contributed to sklearn with some simple patches, and I thought this would
be a good chance to add something more substantial.  Since the Kalman
Filter/Smoother + EM algorithm is conceptually the same as what's already
in sklearn.hmm, I assumed that getting it merged would just be an issue of
code standards.

This pull request encompasses only the vanilla Kalman Filter, Kalman
Smoother, and EM algorithm.  While these methods are all commonly used in
tracking/state estimation, so are Particle Filters[3] and variations of
the Kalman Filter, namely the Extended and Unscented Kalman Filters[4].  I
currently have a very detailed implementation of the former in another
language, and have already finished a first-round implementation of the
latter in the 'unscented' fork[5] on my github account.  In the future, I
would like to see a general framework for doing inference in arbitrary
Bayesian Networks and Conditional Random Fields, but that is a long, long
way off and would require much work.

Response
========

a. Speed: each iteration of the EM algorithm requires at least O(T d^3)
time (here, T is the length of the observation sequence and d is the
dimension of the state space; the cubic factor comes from the d-by-d
matrix multiplications and inversions performed at every time step), which
is to say the EM algorithm is slow by design.  It is, however, the
standard method for learning Kalman Filter parameters, and the one I have
been taught most.  I had never heard of spectral methods for parameter
estimation before Gael mentioned them, and I'd like to take a look to see
how they actually perform.

b. Usefulness of the EM algorithm: the EM algorithm iteratively attempts
to *maximize the likelihood of the observation sequence*.  This means that
it is extremely sensitive to overfitting when insufficient data is
available, which is also why I give the user the option to specify
parameters manually or to disable estimation for particular parameters at
initialization.  The EM algorithm does not typically have any form of
regularization, but a variation of it can make use of "fake observations"
to bias its output toward a particular value.  This is called Maximum A
Posteriori (or MAP) estimation.
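
For example, a user might fix the transition model entirely and let EM
estimate only the observation noise.  Using the illustrative names from
the sketches above (the parameter names are a sketch, not a promise):

    # A is supplied by the user and never re-estimated; EM updates only
    # the observation covariance R.
    kf = KalmanFilter(transition_matrices=A,
                      em_vars=['observation_covariance'])
    kf.em(Z, n_iter=10)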

c. In relation to sklearn's API: as I hear it discussed more and more, I
believe that the Kalman Filter is not a very good fit for the current API.
While the EM algorithm maps easily onto the fit() method, the Filter and
Smoother make far less sense as the predict() method, in that they produce
sequences of probability distributions rather than point estimates.  If
sklearn had an API for predicting sequences, I think the Kalman Filter
would fit far more easily.  Perhaps I could redesign the Kalman Filter to
take a list of independent observation sequences, treating each as a
single "sample" in the sklearn sense (a sketch follows below).

Conclusion
==========

In conclusion, I would be happy to have my code merged into sklearn, but
in my opinion it would only make sense if an API is designed around
prediction for sequences or if a framework for arbitrary graphical models
is established.  If there is insufficient time for either, I will happily
repackage my code outside of sklearn and release it on my own.  I make
minimal use of sklearn's other modules, so that should not be difficult,
but I am unsure of how I should go about extracting the components I need
from a licensing perspective.  Until I start my PhD, I cannot say how much
time I will have on hand; otherwise, I would volunteer to draft a spec for
sequence classification myself.

I'd like to thank Gael, Oliver, and Alexandre for their comments and
suggestions on my pull request.  I honestly think this is the most
accessible code I've ever written (take that for what it's worth, haha)
thanks to their input, and I look forward to hearing everyone's replies.

- Daniel Duckworth


[1]: http://en.wikipedia.org/wiki/Kalman_filter
[2]: http://en.wikipedia.org/wiki/Forward%E2%80%93backward_algorithm
[3]: http://en.wikipedia.org/wiki/Particle_filter
[4]: http://en.wikipedia.org/wiki/Kalman_filter#Unscented_Kalman_filter
[5]: https://github.com/duckworthd/scikit-learn/tree/unscented