Thanks Jake, I was actually just reading this:
http://www.cs.mcgill.ca/~dprecup/courses/ML/Lectures/ml-lecture16.pdf
and starting to put all the pieces together when you sent this. In the
pdf, the K-means example you gave is basically Hard EM for GMM while the
latter is Soft EM that I am seeing in the code.
I really appreciate your explanation too, everything is starting to make
more sense.
I think the problem I routinely have is transforming math to code. I just
figured out that:
weighted_X_sum * inverse_weights
is just the weighted mean of the samples!
On Sat, Sep 7, 2013 at 12:19 PM, Jacob Vanderplas <[email protected]
> wrote:
> David,
> Have you looked at the K Means algorithm? It uses a similar approach of a
> two-phase iteration to determine clustering. In K means you're looking for
> K cluster centers, such that when each point is assigned to the nearest
> cluster, the total of the distances from points to their clusters is
> minimized. The general idea, then, is that you need to determine two
> things: (1) cluster membership and (2) cluster locations. Solving for (1)
> & (2) simultaneously is difficult, but by iteratively fixing (1) and
> minimizing (2), then fixing (2) and minimizing (1), it can be shown that
> you eventually converge to the optimum solution. In this case the E-step
> corresponds to assigning cluster membership (which cluster is the point
> *expected* to lie in?) and the M-step corresponds to shifting the cluster
> locations (minimizing the cost function), *keeping the cluster assignments
> constant*. Keeping things constant might seem like an ad-hoc assumption,
> but it's actually a very well-founded method.
>
> GMM is a bit more complicated: in step (1), rather than a cluster ID for
> each point, you have an array of weights describing how much each cluster
> "contributes" to each point. In step (2), rather than having a set of
> cluster locations, you have a set of cluster centers, a set of cluster
> widths or width matrices (for N-dimensional gaussians) and a set of
> normalizations. Despite the added complexity, the E-M algorithm still
> works (which, frankly, still amazes me). In the E step, you hold the
> cluster characteristics constant and compute cluster weights for each
> point. In the M step, you fix the weights of each point and minimize the
> cost by changing the cluster characteristics. The math (and therefore the
> code) is a bit more opaque, but at each step it essentially comes down to
> minimizing a Gaussian likelihood function while keeping either the cluster
> weights or the cluster characteristics constant.
>
> I hope that explanation is helpful...
> Jake
>
>
> On Sat, Sep 7, 2013 at 8:38 AM, David Reed <[email protected]> wrote:
>
>> ok, this is what I can gather from the code:
>>
>> ================================
>>
>> Expectation Step
>> --------------------------
>> Calculate the loglikelihood and responsibilities for each sample.
>> a. for each sample the loglikelihood is calculated for each gaussian
>> and then sum across models (logprob = logsumexp(lpr, axis=1))
>> b. responsibilites is just lpr, but normalized to 1 across each row?
>> This is figure out which gaussian model is responsible for the
>> loglikelihood of this sample correct?
>>
>> Maximization Step
>> ---------------------------
>> Need help here, can't quite figure out whats going on
>>
>> ================================
>>
>> As you can see the maximization step is where things fall apart for me.
>> I've followed many tutorials, including this (
>> https://www.youtube.com/watch?v=AnbiNaVp3eQ). They eventually involve
>> sufficient statistics, etc. which seem to make some sense at some high
>> level, but I can never connect that theory to code, in particular the code
>> in the GMM module.
>>
>> Can someone help me fill in the blanks? Sorry if this is a little open
>> ended
>>
>> -Dave
>>
>>
>>
>>
>>
>>
>> On Tue, Sep 3, 2013 at 4:51 PM, Olivier Grisel
>> <[email protected]>wrote:
>>
>>> 2013/9/3 David Reed <[email protected]>:
>>> > Thanks Jake, I will look into this more. Is this the proper forum to
>>> ask
>>> > questions about its implementation if I am confused?
>>>
>>> Yes it is.
>>>
>>> --
>>> Olivier
>>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
>>> Discover the easy way to master current and previous Microsoft
>>> technologies
>>> and advance your career. Get an incredible 1,500+ hours of step-by-step
>>> tutorial videos with LearnDevNow. Subscribe today and save!
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
>> Discover the easy way to master current and previous Microsoft
>> technologies
>> and advance your career. Get an incredible 1,500+ hours of step-by-step
>> tutorial videos with LearnDevNow. Subscribe today and save!
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=58041391&iu=/4140/ostg.clktrk
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
> ------------------------------------------------------------------------------
> Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
> Discover the easy way to master current and previous Microsoft technologies
> and advance your career. Get an incredible 1,500+ hours of step-by-step
> tutorial videos with LearnDevNow. Subscribe today and save!
> http://pubads.g.doubleclick.net/gampad/clk?id=58041391&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
------------------------------------------------------------------------------
Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
Discover the easy way to master current and previous Microsoft technologies
and advance your career. Get an incredible 1,500+ hours of step-by-step
tutorial videos with LearnDevNow. Subscribe today and save!
http://pubads.g.doubleclick.net/gampad/clk?id=58041391&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general