>
> I agree with Fabian that this might be a bug, as these models are
> still relatively untested, and I'd love to see a testing script
> showing the errors.
>

I'll submit a bug report and script today.
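Roughly, the script will be along these lines (a sketch only -- the data
file path is a placeholder, and the exact estimator arguments and attribute
names depend on the release):

import numpy as np
from sklearn import mixture

# Two-dimensional data, e.g. the Old Faithful set, standardised to
# zero mean and unit variance.
X = np.loadtxt('faithful.txt')  # placeholder path
X = (X - X.mean(axis=0)) / X.std(axis=0)

for Model in (mixture.GMM, mixture.VBGMM, mixture.DPGMM):
    model = Model(n_components=2)
    model.fit(X)
    # GMM recovers uneven weights; VBGMM/DPGMM stay at [0.5, 0.5]
    print Model.__name__, model.weights  # may be 'weights_' in some versions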


>
> On Sat, Oct 8, 2011 at 09:21, Martin Fergie <[email protected]> wrote:
> > Hi,
> >
> > I've been experimenting with the variational clustering method introduced
> > in the latest version of scikits-learn.
> < ... snip ... >
> > Martin
> >
> > [1] Old faithful dataset:
>
> Glancing at these data it seems that the variables live in completely
> different orders of magnitude. Have you tried scaling/centering the
> data? Because of the way priors are used the DPGMM and VBGMM models
> are unfortunately biased towards zero, as you noticed, and this might
> be part of the reason why bad things are happening.
>
> Also, have you tried using more components to see what happens?
>
>
I've scaled the data to zero mean and unit variance, and tried a variety
of covariance structures and values for alpha. I had a brief poke in the
VBGMM._do_mstep method and there doesn't seem to be an

if 'w' in params:
    ...

block where the weights are updated. There is an _update_concentration(),
but this updates gamma, not the weights themselves. I'm not quite sure what
gamma is in your implementation; from what I've been able to understand,
is it the degrees-of-freedom parameter on the Wishart prior over the
component covariances? I'm less familiar with the Dirichlet process
clustering, but for the variational clustering I'd expect a weight update
along the lines of

pi_k = (alpha_0 + N_k) / (n_components * alpha_0 + N)

for a symmetric Dirichlet prior with concentration alpha_0; this is from
(Bishop, Pattern Recognition and Machine Learning).
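
For concreteness, here is a minimal sketch of the update I have in mind
(my own illustration of Bishop's update, not scikit-learn's code), where
resp holds the responsibilities from the E-step:

import numpy as np

def expected_weights(resp, alpha_0):
    """E-step responsibilities -> E[pi_k] under a symmetric Dirichlet prior."""
    n_samples, n_components = resp.shape
    N_k = resp.sum(axis=0)  # effective number of points per component
    return (alpha_0 + N_k) / (n_components * alpha_0 + n_samples)

# e.g. responsibilities concentrated 64/36 across two components:
resp = np.repeat(np.array([[1.0, 0.0], [0.0, 1.0]]), [64, 36], axis=0)
print expected_weights(resp, alpha_0=1.0)  # -> approximately [0.64, 0.36]

If that step were missing, the reported weights would stay at their uniform
initialisation, which matches what I'm seeing.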

Sorry if I've got the wrong end of the stick somewhere!

Thanks for your help and for providing this implementation; these
clustering methods have been the one thing I've been missing in Python
machine learning tools!

Martin

On 12 October 2011 12:09, <[email protected]> wrote:

>
>
> Today's Topics:
>
>   1. Re: Question about mixture.VBGMM and mixture.DPGMM
>      (Alexandre Passos)
>   2. Léon Bottou SGD version 2.0 is out: Averaged SGD (Olivier Grisel)
>   3. Re: Léon Bottou SGD version 2.0 is out: Averaged SGD
>      (Peter Prettenhofer)
>   4. Re: Léon Bottou SGD version 2.0 is out: Averaged SGD
>      (Mathieu Blondel)
>   5. Re: Faster hierarchical clustering (Conrad Lee)
>   6. Re: Faster hierarchical clustering (Gael Varoquaux)
>   7. Re: Léon Bottou SGD version 2.0 is out: Averaged SGD
>      (Alexandre Passos)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 11 Oct 2011 07:20:33 -0400
> From: Alexandre Passos <[email protected]>
> Subject: Re: [Scikit-learn-general] Question about mixture.VBGMM and
>        mixture.DPGMM
> To: [email protected]
>
> I agree with Fabian that this might be a bug, as these models are
> still relatively untested, and I'd love to see a testing script
> showing the errors.
>
> On Sat, Oct 8, 2011 at 09:21, Martin Fergie <[email protected]> wrote:
> > Hi,
> >
> > I've been experimenting with the variational clustering method introduced
> > in the latest version of scikits-learn. I'm having trouble getting these
> > models to fit properly. I've been experimenting with two small data sets,
> > one is the 'old faithful' data set [1], and the other is a 4 component
> > data set from [2]. I'm using the script given on the scikits website [3]
> > but have replaced the example data with the data sets above.
> >
> > Clustering using EM (with mixture.GMM) seems to give reasonably reliable
> > results on both data sets. However, when I use DPGMM and VBGMM the
> > clusters are heavily biased towards 0, and often over-generalise. What is
> > more concerning is that the component weights don't appear to change
> > during training. For example, a 2 component DPGMM/VBGMM will have
> > weights = [0.5, 0.5] whereas the GMM will have weights = [0.64, 0.36].
> > Both models behave like this with default initialisation parameters and I
> > have tried a range of alphas.
> >
> > I have a MATLAB implementation of variational Bayes EM (non-Dirichlet
> > process) which is able to cluster this data effectively.
> >
> > Does anyone have any experience with these models who might be able to
> > shed some light on the problems I am having? I can send a tar of the
> > code/data I'm using to anyone who is interested.
> >
> > Thanks for such a useful toolkit!
> >
> > Martin
> >
> > [1] Old faithful dataset:
>
> Glancing at these data it seems that the variables live in completely
> different orders of magnitude. Have you tried scaling/centering the
> data? Because of the way priors are used the DPGMM and VBGMM models
> are unfortunately biased towards zero, as you noticed, and this might
> be part of the reason why bad things are happening.
>
> Also, have you tried using more components to see what happens?
>
> >
> http://research.microsoft.com/en-us/um/people/cmbishop/prml/webdatasets/datasets.htm
> > [2] Figueiredo and Jain, Unsupervised Learning of Finite Mixture Models,
> > PAMI 2002
> > [3]
> >
> http://scikit-learn.sourceforge.net/stable/auto_examples/mixture/plot_gmm.html#example-mixture-plot-gmm-py
> >
> >
> >
> >
> >
>
> --
> - Alexandre
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 11 Oct 2011 23:43:53 +0200
> From: Olivier Grisel <[email protected]>
> Subject: [Scikit-learn-general] Léon Bottou SGD version 2.0 is out:
>        Averaged SGD
> To: scikit-learn-general <[email protected]>
>
> I think people here (e.g. @pprett) might be interested in the
> following new release of Léon Bottou's influential project:
>
>  http://leon.bottou.org/projects/sgd
>
> I did not know about Averaged SGD. I will have to read the cited
> references.
>
> I wonder if those results carry over to online clustering and/or
> online matrix factorization.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 12 Oct 2011 07:52:54 +0200
> From: Peter Prettenhofer <[email protected]>
> Subject: Re: [Scikit-learn-general] Léon Bottou SGD version 2.0 is
>        out: Averaged SGD
> To: [email protected]
>
> Thanks Olivier! I really appreciate your updates on these issues!
>
> As far as I can tell, Averaged SGD is similar to the Averaged
> Perceptron in that you simply average the weight vectors after each
> iteration (i.e. training sample). Of course, this can be done very
> efficiently in constant time and memory. AFAIK you cannot use this
> strategy for L1 regularization, though.
>
> The results in [Xu 2011] are pretty impressive given the simplicity of
> the algorithm - we should definitely give it a try. Unfortunately, the
> algorithm shares some of the undesirable properties of SGD: you need a
> number of heuristics to make it work (e.g. learning rate schedule,
> averaging start point t_0).
>
> best,
>  Peter
>
> [Xu 2011] http://arxiv.org/pdf/1107.2490v1
>
> 2011/10/11 Olivier Grisel <[email protected]>:
> > I think people here (e.g. @pprett) might be interested in the
> > following new release of Léon Bottou's influential project:
> >
> > http://leon.bottou.org/projects/sgd
> >
> > I did not know about Averaged SGD. I will have to read the cited
> > references.
> >
> > I wonder if those results carry over to online clustering and/or
> > online matrix factorization.
> >
> > --
> > Olivier
> > http://twitter.com/ogrisel - http://github.com/ogrisel
> >
> >
>
>
>
> --
> Peter Prettenhofer
>
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 12 Oct 2011 15:56:25 +0900
> From: Mathieu Blondel <[email protected]>
> Subject: Re: [Scikit-learn-general] Léon Bottou SGD version 2.0 is
>        out: Averaged SGD
> To: [email protected]
>
> On Wed, Oct 12, 2011 at 2:52 PM, Peter Prettenhofer
> <[email protected]> wrote:
>
> > The results in [Xu 2011] are pretty impressive given the simplicity of
> > the algorithm - we should definitely give it a try. Unfortunately, the
> > algorithm shares some of the undesirable properties of SGD: you need a
> > number of heuristics to make it work (e.g. learning rate schedule,
> > averaging start point t_0)
>
> Indeed, averaging has been used for ages in the Perceptron community.
> CRFsuite has been supporting averaging for quite some time too, I
> think. ASGD's results do look impressive, though.
>
> Mathieu
>
>
>
> ------------------------------
>
> Message: 5
> Date: Wed, 12 Oct 2011 10:27:23 +0100
> From: Conrad Lee <[email protected]>
> Subject: Re: [Scikit-learn-general] Faster hierarchical clustering
> To: [email protected]
>
> I got in touch with the author of fastcluster, Daniel Müllner, and he says
> that he will release the code under the BSD-2 license if we agree to
> integrate it into scikit-learn.
>
> Even better, he suggests (and I agree) that we make this change upstream
> and replace scipy.cluster's hierarchy code with Müllner's faster code.
> Then scikit-learn could benefit simply by building on
> scipy.cluster.hierarchy. This would mean that scikit-learn relies on
> possibly hard-to-maintain C++ code. Olivier mentioned:
>
> ...there is a policy of trying to stay away from adding more C++ in
> > the scikit code base because of the maintenance cost inherent to C++.
>
>
> So I'm not sure how this relates to depending on a C++ implementation via
> scipy.
>
> Müllner's code has the same interface as the scipy.cluster.hierarchy
> implementation, so perhaps the integration with scipy would not be so
> difficult. I have no experience working with the scipy team, so I have a
> question: where is the appropriate place to run this suggestion by them?
> Should I just post my suggestion on the Scipy-Dev mailing list?
>
> Conrad
>
> ------------------------------
>
> Message: 6
> Date: Wed, 12 Oct 2011 11:32:38 +0200
> From: Gael Varoquaux <[email protected]>
> Subject: Re: [Scikit-learn-general] Faster hierarchical clustering
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=iso-8859-1
>
> On Wed, Oct 12, 2011 at 10:27:23AM +0100, Conrad Lee wrote:
> >    Even better, he suggests (and I agree) that we make this change
> >    upstream and replace scipy.cluster's hierarchy code with Müllner's
> >    faster code.
>
> I agree.
>
> >    Then scikit-learn could benefit simply by building on
> >    scipy.cluster.hierarchy.
>
> It already does: we benchmarked the two implementations and have a
> decision rule that chooses the more efficient one.
>
> >    This would mean that scikit-learn relies on
> >    possibly hard-to-maintain C++ code.
>
> IMHO the hierarchical clustering code of scipy is already hard to
> maintain.
>
> >    I have no experience working with the scipy team, so I have a
> >    question: where is the appropriate place to run this suggestion by
> >    them? Should I just post my suggestion on the Scipy-Dev mailing list?
>
> Yes, scipy-dev is the right place to hold such a discussion. A pull
> request to scipy should also be prepared. Ideally, even if you don't
> submit a pull request, it would be useful to point to the code, so that
> the discussion can be held on a technical basis.
>
> Cheers,
>
> Gaël
>
>
>
> ------------------------------
>
> Message: 7
> Date: Wed, 12 Oct 2011 07:09:24 -0400
> From: Alexandre Passos <[email protected]>
> Subject: Re: [Scikit-learn-general] Léon Bottou SGD version 2.0 is
>        out: Averaged SGD
> To: [email protected], [email protected]
>
> On Wed, Oct 12, 2011 at 02:56, Mathieu Blondel <[email protected]>
> wrote:
> > On Wed, Oct 12, 2011 at 2:52 PM, Peter Prettenhofer
> > <[email protected]> wrote:
> >
> >> The results in [Xu 2011] are pretty impressive given the simplicity of
> >> the algorithm - we should definitely give it a try. Unfortunately, the
> >> algorithm shares some of the undesirable properties of SGD: you need a
> >> number of heuristics to make it work (e.g. learning rate schedule,
> >> averaging start point t_0)
> >
> > Indeed, averaging has been used for ages in the Perceptron community.
> > CRFsuite has been supporting averaging for quite some time too I
> > think. ASGD's results look indeed impressive, though.
>
> Does anyone know how to implement parameter averaging without touching
> every feature at every iteration? With things like CRFs you easily have
> millions of features but only a few hundred active per example, so it's
> a pain to touch everything all the time. On the page he mentions that
>
>    Both the stochastic gradient weights and the averaged weights are
>    represented using a linear transformation that yields efficiency gains
>    for sparse training data.
>
> Does anyone know what format this is?
>
> --
> - Alexandre
>
>
>
> ------------------------------
>
>
> End of Scikit-learn-general Digest, Vol 21, Issue 21
> ****************************************************
>
