Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization only 1 dimension

2024-03-11 Thread Craig Brautigam
Thanks, I'll check it out!


From: Alex Herbert 
Sent: Monday, March 11, 2024 3:59 PM
To: Commons Users List 
Subject: Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization 
only 1 dimension

On Mon, 11 Mar 2024 at 20:58, Craig Brautigam 
wrote:

> Alex,
>
> Thanks for getting back.
>
> Sorry for my lack of knowledge. What does IIUC stand for?
>

If I Understand Correctly.

I've tried the GaussianMixture with 1 component and it produces a fit
similar to Matlab's with 1 dimension and 1 component. It will be very
inefficient. However, it will output a likelihood value comparable to other
mixture models, which is what you require.

I've pushed the change to the CM4 master branch and you should be able to
pick it up with the snapshots repo.

Alex


> The code I'm trying to port uses fitgmdist exclusively. It tries anywhere
> from 1 to 5 components, and then chooses the best component count by
> selecting the lowest AIC value from the 5 tries, as documented here:
> https://www.mathworks.com/help/stats/fitgmdist.html
> So in some cases the code selects more than 1 component and in other cases
> it selects just 1 component, all data dependent. I also see that
> scikit-learn's Gaussian Mixture Model allows 1 component as well:
> https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html
> So while there might be more efficient ways of calculating it, from a
> completeness and usability perspective it does seem that
> MultivariateNormalMixtureExpectationMaximization should allow a user to
> select just 1 component for a GMM, like libraries in other languages. This
> does not appear to be uncommon, given that both scikit-learn and Matlab
> support it.
>
> I will look into your suggestion with GaussianCurveFitter.
>
> thanks again!
>
> -Craig
>
>
>
>
> 
> From: Alex Herbert 
> Sent: Monday, March 11, 2024 1:23 PM
> To: Commons Users List 
> Subject: Re: [External] - Re:
> MultivariateNormalMixtureExpectationMaximization only 1 dimension
>
> On Mon, 11 Mar 2024 at 18:06, Craig Brautigam 
> wrote:
>
> > Just bumping this up...Would it be possible to get a fix for this?
> >
>
> You are requesting the estimate method of a mixture to support 1 component.
> This is not a mixture. IIUC this is the equivalent of fitting a normal
> distribution to your data with maximum likelihood estimation.
>
> Validated in Matlab:
>
> >> X = normrnd(4.125, 0.25, 100, 1);
> >> fitdist(X,'norm')
> ans =
>   NormalDistribution
>
>   Normal distribution
>    mu = 4.13821   [4.08781, 4.1886]
> sigma = 0.25396   [0.222979, 0.29502]
>
> >> GMModel = fitgmdist(X,1)
> GMModel =
> Gaussian mixture distribution with 1 components in 1 dimensions
> Component 1:
> Mixing proportion: 1.00
> Mean: 4.1382
> >> sqrt(GMModel.Sigma)
> ans =
> 0.2527
>
> I've tried this with a few different input X and the fit is slightly
> different for the sigma so Matlab is not simply calling fitdist from within
> fitgmdist (at least not with the defaults).
>
> So you can get around this issue by fitting the data with a Normal
> distribution. It will be a lot faster than the
> MultivariateNormalMixtureExpectationMaximization class.
>
> You could try this:
>
> org.apache.commons.math4.legacy.fitting.GaussianCurveFitter
>
> However that class does fit a normalisation factor in addition to the mean
> and standard deviation. If you only wish to fit mean and standard deviation
> then you could create your own fitter 

Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization only 1 dimension

2024-03-11 Thread Alex Herbert
On Mon, 11 Mar 2024 at 20:58, Craig Brautigam 
wrote:

> Alex,
>
> Thanks for getting back.
>
> Sorry for my lack of knowledge. What does IIUC stand for?
>

If I Understand Correctly.

I've tried the GaussianMixture with 1 component and it produces a fit
similar to Matlab's with 1 dimension and 1 component. It will be very
inefficient. However, it will output a likelihood value comparable to other
mixture models, which is what you require.
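For a single component the maximised log-likelihood also has a closed form, which gives a quick cross-check on the value a 1-component fit reports. A short derivation, assuming no variance regularisation and writing $\hat\mu$ for the sample mean and $\hat\sigma^2$ for the divide-by-$n$ variance (the maximum-likelihood estimates):

```latex
\ln L = \sum_{i=1}^{n} \ln \mathcal{N}\!\left(x_i \mid \hat\mu, \hat\sigma^2\right)
      = -\frac{n}{2}\ln\!\left(2\pi\hat\sigma^2\right)
        - \frac{1}{2\hat\sigma^2}\sum_{i=1}^{n}\left(x_i-\hat\mu\right)^2
      = -\frac{n}{2}\left(\ln\!\left(2\pi\hat\sigma^2\right) + 1\right)
```

The last step uses $\sum_i (x_i-\hat\mu)^2 = n\hat\sigma^2$. With two free parameters ($\mu$ and $\sigma^2$) the corresponding criterion is $\mathrm{AIC}_1 = 4 - 2\ln L$.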

I've pushed the change to the CM4 master branch and you should be able to
pick it up with the snapshots repo.

Alex


> The code I'm trying to port uses fitgmdist exclusively. It tries anywhere
> from 1 to 5 components, and then chooses the best component count by
> selecting the lowest AIC value from the 5 tries, as documented here:
> https://www.mathworks.com/help/stats/fitgmdist.html
> So in some cases the code selects more than 1 component and in other cases
> it selects just 1 component, all data dependent. I also see that
> scikit-learn's Gaussian Mixture Model allows 1 component as well:
> https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html
> So while there might be more efficient ways of calculating it, from a
> completeness and usability perspective it does seem that
> MultivariateNormalMixtureExpectationMaximization should allow a user to
> select just 1 component for a GMM, like libraries in other languages. This
> does not appear to be uncommon, given that both scikit-learn and Matlab
> support it.
>
> I will look into your suggestion with GaussianCurveFitter.
>
> thanks again!
>
> -Craig
>
>
>
>
> 
> From: Alex Herbert 
> Sent: Monday, March 11, 2024 1:23 PM
> To: Commons Users List 
> Subject: Re: [External] - Re:
> MultivariateNormalMixtureExpectationMaximization only 1 dimension
>
> On Mon, 11 Mar 2024 at 18:06, Craig Brautigam 
> wrote:
>
> > Just bumping this up...Would it be possible to get a fix for this?
> >
>
> You are requesting the estimate method of a mixture to support 1 component.
> This is not a mixture. IIUC this is the equivalent of fitting a normal
> distribution to your data with maximum likelihood estimation.
>
> Validated in Matlab:
>
> >> X = normrnd(4.125, 0.25, 100, 1);
> >> fitdist(X,'norm')
> ans =
>   NormalDistribution
>
>   Normal distribution
>    mu = 4.13821   [4.08781, 4.1886]
> sigma = 0.25396   [0.222979, 0.29502]
>
> >> GMModel = fitgmdist(X,1)
> GMModel =
> Gaussian mixture distribution with 1 components in 1 dimensions
> Component 1:
> Mixing proportion: 1.00
> Mean: 4.1382
> >> sqrt(GMModel.Sigma)
> ans =
> 0.2527
>
> I've tried this with a few different input X and the fit is slightly
> different for the sigma so Matlab is not simply calling fitdist from within
> fitgmdist (at least not with the defaults).
>
> So you can get around this issue by fitting the data with a Normal
> distribution. It will be a lot faster than the
> MultivariateNormalMixtureExpectationMaximization class.
>
> You could try this:
>
> org.apache.commons.math4.legacy.fitting.GaussianCurveFitter
>
> However that class does fit a normalisation factor in addition to the mean
> and standard deviation. If you only wish to fit mean and standard deviation
> then you could create your own fitter based on the CurveFitter class that
> is extended by GaussianCurveFitter.
>
> See if this works for you.
>
> Regards,
>
> Alex
>
>
>
> >
> >
> > Thx!
> >
> >
> > 
> > From: Craig Brautigam 
> > Sent: Thursday, March 7, 2024 2:47 PM
> > To: Commons Users List 
> > Subject: Re: [External] - Re:
> > MultivariateNormalMixtureExpectationMaximization only 1 dimension
> >
> > Alex,
> >
> > Your fix seems to be working; however, there is a similar problem in
> > MultivariateNormalMixtureExpectationMaximization.estimate(): the number of
> > components must be at least 2. I think that you should be able to try to
> > estimate with 1 component if you want to. The Matlab function fitgmdist
> > does allow 1 component, and much of our data does in fact best fit
> > only 1 component.
> >
> > Thoughts on fixing that restriction as well?
> >
> >
> > Thx!
> > Craig
> >
> >
> > 
> > From: Alex Herbert 
> > Sent: Tuesday, March 5, 2024 11:35 AM
> > To: Commons Users List 
> > Subject: [External] - Re: MultivariateNormalMixtureExpectationMaximization
> > only 1 dimension
> >

Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization only 1 dimension

2024-03-11 Thread Craig Brautigam
Alex,

Thanks for getting back.

Sorry for my lack of knowledge. What does IIUC stand for?

The code I'm trying to port uses fitgmdist exclusively. It tries anywhere from 1
to 5 components, and then chooses the best component count by selecting the
lowest AIC value from the 5 tries, as documented here:
https://www.mathworks.com/help/stats/fitgmdist.html
So in some cases the code selects more than 1 component and in other cases it
selects just 1 component, all data dependent. I also see that scikit-learn's
Gaussian Mixture Model allows 1 component as well:
https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html
So while there might be more efficient ways of calculating it, from a
completeness and usability perspective it does seem that
MultivariateNormalMixtureExpectationMaximization should allow a user to
select just 1 component for a GMM, like libraries in other languages. This
does not appear to be uncommon, given that both scikit-learn and Matlab
support it.
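The selection described above (fit K = 1..5, keep the lowest AIC) can be sketched in Java as follows. This is a minimal sketch assuming each candidate has already been fitted and its total log-likelihood is known; the class name, method names and the log-likelihood values in `main` are hypothetical. The parameter count 3K - 1 assumes 1-column data with one variance per component (K means, K variances, K - 1 free mixing weights), which is what a full-covariance mixture reduces to in one dimension.

```java
/**
 * Sketch: pick the number of components of a 1-D Gaussian mixture by AIC,
 * given the maximised total log-likelihood of each candidate fit.
 */
public class AicSelection {

    /** Free parameters for K components in 1-D: K means + K variances + (K - 1) weights. */
    static int numParameters(int k) {
        return 3 * k - 1;
    }

    /** Akaike information criterion: AIC = 2p - 2 ln L. */
    static double aic(double logLikelihood, int k) {
        return 2.0 * numParameters(k) - 2.0 * logLikelihood;
    }

    /**
     * Returns the component count with the lowest AIC, where
     * logLikelihoods[i] is the total log-likelihood of the fit with K = i + 1.
     */
    static int bestComponentCount(double[] logLikelihoods) {
        int best = 1;
        double bestAic = Double.POSITIVE_INFINITY;
        for (int i = 0; i < logLikelihoods.length; i++) {
            double a = aic(logLikelihoods[i], i + 1);
            if (a < bestAic) {
                bestAic = a;
                best = i + 1;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical log-likelihoods for K = 1..5 (illustrative numbers only).
        double[] ll = {-310.0, -290.0, -288.5, -287.9, -287.5};
        System.out.println("best K = " + bestComponentCount(ll));
    }
}
```

If the log-likelihood comes from MultivariateNormalMixtureExpectationMaximization.getLogLikelihood(), check its Javadoc before using it here: as far as we can tell it is averaged over the data points, in which case it would need multiplying by the number of rows to give the total that AIC expects.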

I will look into your suggestion with GaussianCurveFitter.

thanks again!

-Craig





From: Alex Herbert 
Sent: Monday, March 11, 2024 1:23 PM
To: Commons Users List 
Subject: Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization 
only 1 dimension

On Mon, 11 Mar 2024 at 18:06, Craig Brautigam 
wrote:

> Just bumping this up...Would it be possible to get a fix for this?
>

You are requesting the estimate method of a mixture to support 1 component.
This is not a mixture. IIUC this is the equivalent of fitting a normal
distribution to your data with maximum likelihood estimation.

Validated in Matlab:

>> X = normrnd(4.125, 0.25, 100, 1);
>> fitdist(X,'norm')
ans =
  NormalDistribution

  Normal distribution
   mu = 4.13821   [4.08781, 4.1886]
sigma = 0.25396   [0.222979, 0.29502]

>> GMModel = fitgmdist(X,1)
GMModel =
Gaussian mixture distribution with 1 components in 1 dimensions
Component 1:
Mixing proportion: 1.00
Mean: 4.1382
>> sqrt(GMModel.Sigma)
ans =
0.2527

I've tried this with a few different input X and the fit is slightly
different for the sigma so Matlab is not simply calling fitdist from within
fitgmdist (at least not with the defaults).

So you can get around this issue by fitting the data with a Normal
distribution. It will be a lot faster than the
MultivariateNormalMixtureExpectationMaximization class.

You could try this:

org.apache.commons.math4.legacy.fitting.GaussianCurveFitter

However that class does fit a normalisation factor in addition to the mean
and standard deviation. If you only wish to fit mean and standard deviation
then you could create your own fitter based on the CurveFitter class that
is extended by GaussianCurveFitter.

See if this works for you.

Regards,

Alex



>
>
> Thx!
>
>
> 
> From: Craig Brautigam 
> Sent: Thursday, March 7, 2024 2:47 PM
> To: Commons Users List 
> Subject: Re: [External] - Re:
> MultivariateNormalMixtureExpectationMaximization only 1 dimension
>
> Alex,
>
> Your fix seems to be working; however, there is a similar problem in
> MultivariateNormalMixtureExpectationMaximization.estimate(): the number of
> components must be at least 2. I think that you should be able to try to
> estimate with 1 component if you want to. The Matlab function fitgmdist
> does allow 1 component, and much of our data does in fact best fit
> only 1 component.
>
> Thoughts on fixing that restriction as well?
>
>
> Thx!
> Craig
>
>
> 
> From: Alex Herbert 
> Sent: Tuesday, March 5, 2024 11:35 AM
> To: Commons Users List 
> Subject: [External] - Re: MultivariateNormalMixtureExpectationMaximization
> only 1 dimension
>
> I have updated the master branch with a change to allow fitting a mixture
> with 1-column data.
>
> You should be able to pick up the 4.0-SNAPSHOT from the ASF snapshots repo
> if you configure your build to add the snapshot repository (see [1]).
>
> Let us know if this works for you. Note that if you only require fitting 1
> column data then you would be able to optimise the implementation as it
> will no longer require matrix inversion to compute the mixture probability
> distribution. The CM implementation can act as a reference point for your
> own implementation if desired.
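To illustrate the point about 1-column data: with a single column each component's covariance is a scalar variance, so an EM loop needs only scalar arithmetic and no matrix inversion. The following is a minimal standalone sketch of that idea, not the Commons Math implementation; the class name Gmm1d, the variance floor of 1e-6 and the stopping rule are assumptions made for the example.

```java
import java.util.Arrays;

/**
 * Minimal EM sketch for a 1-D Gaussian mixture. Each component has a scalar
 * variance, so no covariance matrix or matrix inversion is involved.
 */
public class Gmm1d {
    final double[] weights;
    final double[] means;
    final double[] variances;

    Gmm1d(double[] weights, double[] means, double[] variances) {
        this.weights = weights.clone();
        this.means = means.clone();
        this.variances = variances.clone();
    }

    /** Density of N(mean, variance) at x. */
    static double normalPdf(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-0.5 * d * d / variance) / Math.sqrt(2 * Math.PI * variance);
    }

    /** Total log-likelihood of the data under the current parameters. */
    double logLikelihood(double[] data) {
        double ll = 0;
        for (double x : data) {
            double p = 0;
            for (int j = 0; j < means.length; j++) {
                p += weights[j] * normalPdf(x, means[j], variances[j]);
            }
            ll += Math.log(p);
        }
        return ll;
    }

    /** Alternates E and M steps until the log-likelihood gain drops below tol. */
    void fit(double[] data, int maxIter, double tol) {
        int n = data.length;
        int k = means.length;
        double[][] resp = new double[n][k];
        double previous = Double.NEGATIVE_INFINITY;
        for (int iter = 0; iter < maxIter; iter++) {
            // E-step: posterior probability that point i came from component j.
            for (int i = 0; i < n; i++) {
                double total = 0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = weights[j] * normalPdf(data[i], means[j], variances[j]);
                    total += resp[i][j];
                }
                for (int j = 0; j < k; j++) {
                    resp[i][j] /= total;
                }
            }
            // M-step: responsibility-weighted weight, mean and variance per component.
            for (int j = 0; j < k; j++) {
                double nj = 0;
                double sum = 0;
                for (int i = 0; i < n; i++) {
                    nj += resp[i][j];
                    sum += resp[i][j] * data[i];
                }
                double mu = sum / nj;
                double ss = 0;
                for (int i = 0; i < n; i++) {
                    double d = data[i] - mu;
                    ss += resp[i][j] * d * d;
                }
                weights[j] = nj / n;
                means[j] = mu;
                // Assumed floor so a component cannot collapse onto a single point.
                variances[j] = Math.max(ss / nj, 1e-6);
            }
            double current = logLikelihood(data);
            if (current - previous < tol) {
                break;
            }
            previous = current;
        }
    }

    public static void main(String[] args) {
        double[] data = {-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2};
        Gmm1d gmm = new Gmm1d(new double[] {0.5, 0.5}, new double[] {1.0, 9.0},
                new double[] {1.0, 1.0});
        gmm.fit(data, 100, 1e-10);
        System.out.println("weights   = " + Arrays.toString(gmm.weights));
        System.out.println("means     = " + Arrays.toString(gmm.means));
        System.out.println("variances = " + Arrays.toString(gmm.variances));
    }
}
```

With a single component every responsibility is 1, so the first M-step already yields the sample mean and the divide-by-n variance. The sketch leaves out a data-driven initial guess (the role the estimate method plays in Commons Math) and guards for points whose density underflows for every component, or components that lose all their responsibility.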
>
> Regards,
>
> Alex
>
> [1]
>
> 

Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization only 1 dimension

2024-03-11 Thread Alex Herbert
On Mon, 11 Mar 2024 at 18:06, Craig Brautigam 
wrote:

> Just bumping this up...Would it be possible to get a fix for this?
>

You are requesting the estimate method of a mixture to support 1 component.
This is not a mixture. IIUC this is the equivalent of fitting a normal
distribution to your data with maximum likelihood estimation.

Validated in Matlab:

>> X = normrnd(4.125, 0.25, 100, 1);
>> fitdist(X,'norm')
ans =
  NormalDistribution

  Normal distribution
   mu = 4.13821   [4.08781, 4.1886]
sigma = 0.25396   [0.222979, 0.29502]

>> GMModel = fitgmdist(X,1)
GMModel =
Gaussian mixture distribution with 1 components in 1 dimensions
Component 1:
Mixing proportion: 1.00
Mean: 4.1382
>> sqrt(GMModel.Sigma)
ans =
0.2527

I've tried this with a few different input X and the fit is slightly
different for the sigma so Matlab is not simply calling fitdist from within
fitgmdist (at least not with the defaults).
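For 1-column data the one-component fit needs no iteration: the maximum-likelihood mean is the sample mean, and the maximum-likelihood variance divides the sum of squared deviations by n. A minimal sketch follows; the class and method names are hypothetical. One possible explanation of the sigma difference above, offered as an assumption rather than a check of Matlab's internals: fitdist's normal sigma uses the n - 1 (unbiased) variance while an EM fit converges to the divide-by-n estimate, and 0.25396 * sqrt(99/100) is about 0.2527 for the 100 samples used.

```java
/**
 * Sketch: closed-form maximum-likelihood fit of one normal distribution to 1-D data,
 * i.e. what a one-component Gaussian mixture reduces to.
 */
public class NormalMle {

    /** Sample mean: the MLE of mu. */
    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Sum of squared deviations from mu. */
    static double sumSq(double[] x, double mu) {
        double ss = 0;
        for (double v : x) {
            double d = v - mu;
            ss += d * d;
        }
        return ss;
    }

    /** MLE of sigma: divides by n. */
    static double sigmaMle(double[] x) {
        return Math.sqrt(sumSq(x, mean(x)) / x.length);
    }

    /** Square root of the unbiased variance: divides by n - 1. */
    static double sigmaUnbiased(double[] x) {
        return Math.sqrt(sumSq(x, mean(x)) / (x.length - 1));
    }

    public static void main(String[] args) {
        double[] x = {3.9, 4.1, 4.0, 4.3, 4.2};
        System.out.printf("mu = %.4f, sigma(n) = %.4f, sigma(n-1) = %.4f%n",
                mean(x), sigmaMle(x), sigmaUnbiased(x));
    }
}
```

The two sigma estimates always differ by the factor sqrt((n - 1) / n), so the gap shrinks as the sample grows.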

So you can get around this issue by fitting the data with a Normal
distribution. It will be a lot faster than the
MultivariateNormalMixtureExpectationMaximization class.

You could try this:

org.apache.commons.math4.legacy.fitting.GaussianCurveFitter

However that class does fit a normalisation factor in addition to the mean
and standard deviation. If you only wish to fit mean and standard deviation
then you could create your own fitter based on the CurveFitter class that
is extended by GaussianCurveFitter.
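For reference, the curve that GaussianCurveFitter fits to (x, y) observations is the three-parameter Gaussian below, with the fitted parameters returned as norm, mean and sigma. Tying the normalisation to sigma, as in the right-hand condition, makes the curve a probability density and leaves only the mean and standard deviation free, which is the two-parameter fitter suggested above. Note that a curve fitter works on points such as histogram bin heights rather than on the raw samples, so its estimates will generally differ somewhat from the maximum-likelihood ones.

```latex
f(x; a, \mu, \sigma) = a \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
a = \frac{1}{\sigma\sqrt{2\pi}} \;\Longrightarrow\; \int_{-\infty}^{\infty} f(x)\,dx = 1
```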

See if this works for you.

Regards,

Alex



>
>
> Thx!
>
>
> 
> From: Craig Brautigam 
> Sent: Thursday, March 7, 2024 2:47 PM
> To: Commons Users List 
> Subject: Re: [External] - Re:
> MultivariateNormalMixtureExpectationMaximization only 1 dimension
>
> Alex,
>
> Your fix seems to be working; however, there is a similar problem in
> MultivariateNormalMixtureExpectationMaximization.estimate(): the number of
> components must be at least 2. I think that you should be able to try to
> estimate with 1 component if you want to. The Matlab function fitgmdist
> does allow 1 component, and much of our data does in fact best fit
> only 1 component.
>
> Thoughts on fixing that restriction as well?
>
>
> Thx!
> Craig
>
>
> 
> From: Alex Herbert 
> Sent: Tuesday, March 5, 2024 11:35 AM
> To: Commons Users List 
> Subject: [External] - Re: MultivariateNormalMixtureExpectationMaximization
> only 1 dimension
>
> I have updated the master branch with a change to allow fitting a mixture
> with 1-column data.
>
> You should be able to pick up the 4.0-SNAPSHOT from the ASF snapshots repo
> if you configure your build to add the snapshot repository (see [1]).
>
> Let us know if this works for you. Note that if you only require fitting 1
> column data then you would be able to optimise the implementation as it
> will no longer require matrix inversion to compute the mixture probability
> distribution. The CM implementation can act as a reference point for your
> own implementation if desired.
>
> Regards,
>
> Alex
>
> [1]
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-math4-legacy/4.0-SNAPSHOT/
>
> On Tue, 5 Mar 2024 at 00:06, Alex Herbert 
> wrote:
>
> > Hi,
> >
> > I think this is a bug in the
> > MultivariateNormalMixtureExpectationMaximization class. When I update the
> > code to allow 1 column in the rows it outputs a similar fit to matlab.
> > Here's an example of Matlab:
> >
> > X = [normrnd(0, 1, 100, 1); normrnd(2, 2, 100, 1)]
> > GMModel = fitgmdist(X,2);
> >
> > >> GMModel.mu
> > ans =
> > 0.0737
> > 3.0914
> > >> GMModel.ComponentProportion
> > ans =
> > 0.6750    0.3250
> > >> GMModel.Sigma
> > ans(:,:,1) =
> > 1.0505
> > ans(:,:,2) =
> > 1.6593
> >
> > I pasted the same X data into a test for
> > MultivariateNormalMixtureExpectationMaximization that had been updated to
> > allow data with a single column and got the following fit:
> >
> > MultivariateNormalMixtureExpectationMaximization fitter
> > = new MultivariateNormalMixtureExpectationMaximization(data);
> >
> > MixtureMultivariateNormalDistribution initialMix
> > = 

Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization only 1 dimension

2024-03-11 Thread Gary Gregory
I'll let someone familiar with that code base opine 

Gary

On Mon, Mar 11, 2024, 2:48 PM Craig Brautigam 
wrote:

> Gary,
> Is this something you would agree is an issue? I don't want to go
> through the effort of putting an official issue out there if it's something
> I'm misunderstanding. Also, I definitely do not have enough math chops to
> change this type of code; I'm simply trying to do a port from Matlab to
> Java.
>
> -Craig
>
>
> 
> From: Gary Gregory 
> Sent: Monday, March 11, 2024 12:27 PM
> To: Commons Users List 
> Subject: Re: [External] - Re:
> MultivariateNormalMixtureExpectationMaximization only 1 dimension
>
> Hi Craig,
>
> In general, the fastest way to address an issue you care about is to
> provide a PR in GitHub, with a unit test 
>
> Gary
>
>
> On Mon, Mar 11, 2024, 2:06 PM Craig Brautigam 
> wrote:
>
> > Just bumping this up...Would it be possible to get a fix for this?
> >
> >
> > Thx!
> >
> >
> > 
> > From: Craig Brautigam 
> > Sent: Thursday, March 7, 2024 2:47 PM
> > To: Commons Users List 
> > Subject: Re: [External] - Re:
> > MultivariateNormalMixtureExpectationMaximization only 1 dimension
> >
> > Alex,
> >
> > Your fix seems to be working; however, there is a similar problem in
> > MultivariateNormalMixtureExpectationMaximization.estimate(): the number of
> > components must be at least 2. I think that you should be able to try to
> > estimate with 1 component if you want to. The Matlab function fitgmdist
> > does allow 1 component, and much of our data does in fact best fit
> > only 1 component.
> >
> > Thoughts on fixing that restriction as well?
> >
> >
> > Thx!
> > Craig
> >
> >
> > 
> > From: Alex Herbert 
> > Sent: Tuesday, March 5, 2024 11:35 AM
> > To: Commons Users List 
> > Subject: [External] - Re: MultivariateNormalMixtureExpectationMaximization
> > only 1 dimension
> >
> > I have updated the master branch with a change to allow fitting a mixture
> > with 1-column data.
> >
> > You should be able to pick up the 4.0-SNAPSHOT from the ASF snapshots repo
> > if you configure your build to add the snapshot repository (see [1]).
> >
> > Let us know if this works for you. Note that if you only require fitting 1
> > column data then you would be able to optimise the implementation as it
> > will no longer require matrix inversion to compute the mixture probability
> > distribution. The CM implementation can act as a reference point for your
> > own implementation if desired.
> >
> > Regards,
> >
> > Alex
> >
> > [1]
> >
> > https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-math4-legacy/4.0-SNAPSHOT/
> >
> > On Tue, 5 Mar 2024 at 00:06, Alex Herbert 
> > wrote:
> >
> > > Hi,
> > >
> > > I think this is a bug in the
> > > MultivariateNormalMixtureExpectationMaximization class. When I update the
> > > code to allow 1 column in the rows it outputs a similar fit to matlab.
> > > Here's an example of Matlab:
> > >
> > > X = [normrnd(0, 1, 100, 1); normrnd(2, 2, 100, 1)]
> > > GMModel = fitgmdist(X,2);
> > >
> > > >> GMModel.mu
> > > ans =
> > > 0.0737
> > > 

Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization only 1 dimension

2024-03-11 Thread Craig Brautigam
Gary,
Is this something you would agree is an issue? I don't want to go through
the effort of putting an official issue out there if it's something I'm
misunderstanding. Also, I definitely do not have enough math chops to change
this type of code; I'm simply trying to do a port from Matlab to Java.

-Craig



From: Gary Gregory 
Sent: Monday, March 11, 2024 12:27 PM
To: Commons Users List 
Subject: Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization 
only 1 dimension

Hi Craig,

In general, the fastest way to address an issue you care about is to
provide a PR in GitHub, with a unit test 

Gary


On Mon, Mar 11, 2024, 2:06 PM Craig Brautigam 
wrote:

> Just bumping this up...Would it be possible to get a fix for this?
>
>
> Thx!
>
>
> 
> From: Craig Brautigam 
> Sent: Thursday, March 7, 2024 2:47 PM
> To: Commons Users List 
> Subject: Re: [External] - Re:
> MultivariateNormalMixtureExpectationMaximization only 1 dimension
>
> Alex,
>
> Your fix seems to be working; however, there is a similar problem in
> MultivariateNormalMixtureExpectationMaximization.estimate(): the number of
> components must be at least 2. I think that you should be able to try to
> estimate with 1 component if you want to. The Matlab function fitgmdist
> does allow 1 component, and much of our data does in fact best fit
> only 1 component.
>
> Thoughts on fixing that restriction as well?
>
>
> Thx!
> Craig
>
>
> 
> From: Alex Herbert 
> Sent: Tuesday, March 5, 2024 11:35 AM
> To: Commons Users List 
> Subject: [External] - Re: MultivariateNormalMixtureExpectationMaximization
> only 1 dimension
>
> I have updated the master branch with a change to allow fitting a mixture
> with 1-column data.
>
> You should be able to pick up the 4.0-SNAPSHOT from the ASF snapshots repo
> if you configure your build to add the snapshot repository (see [1]).
>
> Let us know if this works for you. Note that if you only require fitting 1
> column data then you would be able to optimise the implementation as it
> will no longer require matrix inversion to compute the mixture probability
> distribution. The CM implementation can act as a reference point for your
> own implementation if desired.
>
> Regards,
>
> Alex
>
> [1]
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-math4-legacy/4.0-SNAPSHOT/
>
> On Tue, 5 Mar 2024 at 00:06, Alex Herbert 
> wrote:
>
> > Hi,
> >
> > I think this is a bug in the
> > MultivariateNormalMixtureExpectationMaximization class. When I update the
> > code to allow 1 column in the rows it outputs a similar fit to matlab.
> > Here's an example of Matlab:
> >
> > X = [normrnd(0, 1, 100, 1); normrnd(2, 2, 100, 1)]
> > GMModel = fitgmdist(X,2);
> >
> > >> GMModel.mu
> > ans =
> > 0.0737
> > 3.0914
> > >> GMModel.ComponentProportion
> > ans =
> > 0.6750    0.3250
> > >> GMModel.Sigma
> > ans(:,:,1) =
> > 1.0505
> > ans(:,:,2) =
> > 1.6593
> >
> > I pasted the same X data into a test for
> > MultivariateNormalMixtureExpectationMaximization that had been updated to
> > allow data with a single column and got the following fit:
> >
> > 

Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization only 1 dimension

2024-03-11 Thread Gary Gregory
Hi Craig,

In general, the fastest way to address an issue you care about is to
provide a PR in GitHub, with a unit test 

Gary


On Mon, Mar 11, 2024, 2:06 PM Craig Brautigam 
wrote:

> Just bumping this up...Would it be possible to get a fix for this?
>
>
> Thx!
>
>
> 
> From: Craig Brautigam 
> Sent: Thursday, March 7, 2024 2:47 PM
> To: Commons Users List 
> Subject: Re: [External] - Re:
> MultivariateNormalMixtureExpectationMaximization only 1 dimension
>
> Alex,
>
> Your fix seems to be working; however, there is a similar problem in
> MultivariateNormalMixtureExpectationMaximization.estimate(): the number of
> components must be at least 2. I think that you should be able to try to
> estimate with 1 component if you want to. The Matlab function fitgmdist
> does allow 1 component, and much of our data does in fact best fit
> only 1 component.
>
> Thoughts on fixing that restriction as well?
>
>
> Thx!
> Craig
>
>
> 
> From: Alex Herbert 
> Sent: Tuesday, March 5, 2024 11:35 AM
> To: Commons Users List 
> Subject: [External] - Re: MultivariateNormalMixtureExpectationMaximization
> only 1 dimension
>
> I have updated the master branch with a change to allow fitting a mixture
> with 1-column data.
>
> You should be able to pick up the 4.0-SNAPSHOT from the ASF snapshots repo
> if you configure your build to add the snapshot repository (see [1]).
>
> Let us know if this works for you. Note that if you only require fitting 1
> column data then you would be able to optimise the implementation as it
> will no longer require matrix inversion to compute the mixture probability
> distribution. The CM implementation can act as a reference point for your
> own implementation if desired.
>
> Regards,
>
> Alex
>
> [1]
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-math4-legacy/4.0-SNAPSHOT/
>
> On Tue, 5 Mar 2024 at 00:06, Alex Herbert 
> wrote:
>
> > Hi,
> >
> > I think this is a bug in the
> > MultivariateNormalMixtureExpectationMaximization class. When I update the
> > code to allow 1 column in the rows it outputs a similar fit to MATLAB.
> > Here's an example in MATLAB:
> >
> > X = [normrnd(0, 1, 100, 1); normrnd(2, 2, 100, 1)]
> > GMModel = fitgmdist(X,2);
> >
> > >> GMModel.mu
> > ans =
> > 0.0737
> > 3.0914
> > >> GMModel.ComponentProportion
> > ans =
> > 0.6750    0.3250
> > >> GMModel.Sigma
> > ans(:,:,1) =
> > 1.0505
> > ans(:,:,2) =
> > 1.6593
> >
> > I pasted the same X data into a test for
> > MultivariateNormalMixtureExpectationMaximization that had been updated to
> > allow data with a single column and got the following fit:
> >
> > MultivariateNormalMixtureExpectationMaximization fitter
> >     = new MultivariateNormalMixtureExpectationMaximization(data);
> >
> > MixtureMultivariateNormalDistribution initialMix
> >     = MultivariateNormalMixtureExpectationMaximization.estimate(data, 2);
> > fitter.fit(initialMix);
> > MixtureMultivariateNormalDistribution fittedMix = fitter.getFittedModel();
> > List<Pair<Double, MultivariateNormalDistribution>> components =
> >     fittedMix.getComponents();
> >
> > for (Pair<Double, MultivariateNormalDistribution> component : components) {
> >     final double weight = component.getFirst();
> >     final MultivariateNormalDistribution mvn = component.getSecond();
> >     final double[] mean = mvn.getMeans();
> >     final RealMatrix covMat = mvn.getCovariances();
> >     System.out.printf("%s : %s : %s%n", weight, Arrays.toString(mean),
> >         covMat.toString());
> > }
> >
> > 0.6420433138817465 : [0.016942587744259194] :
> > Array2DRowRealMatrix{{0.9929681356}}
> > 0.3579566861182536 : [2.9152176347671754] :
> > Array2DRowRealMatrix{{1.8940290549}}
> >
> > The numbers are close enough to indicate that the fit is valid.
> >
> > I think the error has been in assuming that because you require 2
> > components to have a mixture model then you must have 2 columns in the
> > input data. However, this is not true. You can fit one-dimensional data
> > with a mixture of univariate Gaussians.
> >
> > Is this the functionality that you are expecting?
> >
> > Regards,
> >
> > Alex
> >
> >
> > On Mon, 4 Mar 2024 at 20:48, Craig 

Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization only 1 dimension

2024-03-11 Thread Craig Brautigam
Just bumping this up... Would it be possible to get a fix for this?


Thx!



From: Craig Brautigam 
Sent: Thursday, March 7, 2024 2:47 PM
To: Commons Users List 
Subject: Re: [External] - Re: MultivariateNormalMixtureExpectationMaximization 
only 1 dimension

Alex,

Your fix seems to be working; however, there is a similar problem in
MultivariateNormalMixtureExpectationMaximization.estimate(): the number of
components must be at least 2.  I think that you should be able to
estimate with 1 component if you want to.  The MATLAB function fitgmdist does
allow for 1 component, and much of our data does in fact best fit to only 1
component.

Thoughts on fixing that restriction as well?
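For reference: with only one component, EM has nothing to mix, and the fit reduces to an ordinary maximum-likelihood Gaussian fit that can be computed directly. The sketch below is plain Java and does not call Commons Math; the class name `SingleGaussianFit` and its members are hypothetical. It assumes 1-column data and AIC defined as 2k - 2 ln L with k = 2 free parameters (mean and variance). It does not guard against a zero variance, for example when all values are identical.

```java
/**
 * Hypothetical helper (not part of Commons Math): maximum-likelihood fit of a
 * single univariate Gaussian, i.e. a "mixture" with exactly one component.
 */
final class SingleGaussianFit {
    final double mean;
    final double variance;      // MLE variance: divides by n, not n - 1
    final double logLikelihood; // log-likelihood of the data at the MLE

    SingleGaussianFit(double[] x) {
        final int n = x.length;
        double sum = 0;
        for (double v : x) {
            sum += v;
        }
        mean = sum / n;
        double ss = 0;
        for (double v : x) {
            ss += (v - mean) * (v - mean);
        }
        variance = ss / n;
        // At the MLE, sum((x - mean)^2) / variance == n, so ln L simplifies to
        // -n/2 * (ln(2 * pi * variance) + 1).
        logLikelihood = -0.5 * n * (Math.log(2 * Math.PI * variance) + 1);
    }

    /** AIC = 2k - 2 ln L with k = 2 free parameters (mean and variance). */
    double aic() {
        return 2.0 * 2 - 2.0 * logLikelihood;
    }

    public static void main(String[] args) {
        double[] data = {-1.0, 0.0, 1.0, 2.0};
        SingleGaussianFit fit = new SingleGaussianFit(data);
        System.out.printf("mean=%s variance=%s logL=%s AIC=%s%n",
                fit.mean, fit.variance, fit.logLikelihood, fit.aic());
    }
}
```

To compare this result with multi-component fits, note that a 1-D mixture with m components is usually counted as having 3m - 1 free parameters (m means, m variances, m - 1 independent weights). That count gives k = 2 for the single-component case used above.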


Thx!
Craig



From: Alex Herbert 
Sent: Tuesday, March 5, 2024 11:35 AM
To: Commons Users List 
Subject: [External] - Re: MultivariateNormalMixtureExpectationMaximization only 
1 dimension


I have updated the master branch with a change to allow fitting a mixture
with 1-column data.

You should be able to pick up the 4.0-SNAPSHOT from the ASF snapshots repo
if you configure your build to add the snapshot repository (see [1]).

Let us know if this works for you. Note that if you only require fitting 1
column data then you would be able to optimise the implementation as it
will no longer require matrix inversion to compute the mixture probability
distribution. The CM implementation can act as a reference point for your
own implementation if desired.
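To illustrate the point about 1-column data: here is a minimal sketch of expectation-maximization for a univariate Gaussian mixture in plain Java. It is not the Commons Math implementation, and the class and method names are hypothetical. Each component stores a scalar variance, so every density is evaluated directly, with no covariance matrix and no inversion. Unlike estimate(), it expects the caller to supply the starting weights, means and variances. It also runs a fixed number of iterations instead of testing for convergence, and it has no guards against density underflow or collapsing variances.

```java
/** Hypothetical 1-D Gaussian mixture fitted by EM; scalar variances replace covariance matrices. */
final class UnivariateGmmEm {
    final double[] weights;
    final double[] means;
    final double[] variances;

    UnivariateGmmEm(double[] weights, double[] means, double[] variances) {
        this.weights = weights.clone();
        this.means = means.clone();
        this.variances = variances.clone();
    }

    /** Normal density with a scalar variance: no matrix inversion needed. */
    static double normalDensity(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-0.5 * d * d / variance) / Math.sqrt(2 * Math.PI * variance);
    }

    /** Log-likelihood of the data under the current mixture parameters. */
    double logLikelihood(double[] x) {
        double ll = 0;
        for (double v : x) {
            double p = 0;
            for (int k = 0; k < weights.length; k++) {
                p += weights[k] * normalDensity(v, means[k], variances[k]);
            }
            ll += Math.log(p);
        }
        return ll;
    }

    /** Runs a fixed number of EM iterations (no convergence test in this sketch). */
    void fit(double[] x, int iterations) {
        int n = x.length;
        int m = weights.length;
        double[][] resp = new double[n][m];
        for (int it = 0; it < iterations; it++) {
            // E-step: responsibility of component k for point i.
            for (int i = 0; i < n; i++) {
                double total = 0;
                for (int k = 0; k < m; k++) {
                    resp[i][k] = weights[k] * normalDensity(x[i], means[k], variances[k]);
                    total += resp[i][k];
                }
                for (int k = 0; k < m; k++) {
                    resp[i][k] /= total;
                }
            }
            // M-step: responsibility-weighted weight, mean and variance per component.
            for (int k = 0; k < m; k++) {
                double nk = 0;
                double sum = 0;
                for (int i = 0; i < n; i++) {
                    nk += resp[i][k];
                    sum += resp[i][k] * x[i];
                }
                double mu = sum / nk;
                double ss = 0;
                for (int i = 0; i < n; i++) {
                    double d = x[i] - mu;
                    ss += resp[i][k] * d * d;
                }
                weights[k] = nk / n;
                means[k] = mu;
                variances[k] = ss / nk;
            }
        }
    }
}
```

With a single component, the E-step assigns every point a responsibility of 1. One iteration therefore yields the plain maximum-likelihood mean and variance, which matches the 1-component case discussed in this thread.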

Regards,

Alex

[1]
https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-math4-legacy/4.0-SNAPSHOT/

On Tue, 5 Mar 2024 at 00:06, Alex Herbert  wrote:

> Hi,
>
> I think this is a bug in the
> MultivariateNormalMixtureExpectationMaximization class. When I update the
> code to allow 1 column in the rows it outputs a similar fit to MATLAB.
> Here's an example in MATLAB:
>
> X = [normrnd(0, 1, 100, 1); normrnd(2, 2, 100, 1)]
> GMModel = fitgmdist(X,2);
>
> >> GMModel.mu
> ans =
> 0.0737
> 3.0914
> >> GMModel.ComponentProportion
> ans =
> 0.6750    0.3250
> >> GMModel.Sigma
> ans(:,:,1) =
> 1.0505
> ans(:,:,2) =
> 1.6593
>
> I pasted the same X data into a test for
> MultivariateNormalMixtureExpectationMaximization that had been updated to
> allow data with a single column and got the following fit:
>
> MultivariateNormalMixtureExpectationMaximization fitter
>     = new MultivariateNormalMixtureExpectationMaximization(data);
>
> MixtureMultivariateNormalDistribution initialMix
>     = MultivariateNormalMixtureExpectationMaximization.estimate(data, 2);
> fitter.fit(initialMix);
> MixtureMultivariateNormalDistribution fittedMix = fitter.getFittedModel();
> List<Pair<Double, MultivariateNormalDistribution>> components =
>     fittedMix.getComponents();
>
> for (Pair<Double, MultivariateNormalDistribution> component : components) {
>     final double weight = component.getFirst();
>     final MultivariateNormalDistribution mvn = component.getSecond();
>     final double[] mean = mvn.getMeans();
>     final RealMatrix covMat = mvn.getCovariances();
>     System.out.printf("%s : %s : %s%n", weight, Arrays.toString(mean),
>         covMat.toString());
> }
>
> 0.6420433138817465 : [0.016942587744259194] :
> Array2DRowRealMatrix{{0.9929681356}}
> 0.3579566861182536 : [2.9152176347671754] :
> Array2DRowRealMatrix{{1.8940290549}}
>
> The numbers are close enough to indicate that the fit is valid.
>
> I think the error has been in assuming that because you require 2
> components to have a mixture model then you must have 2 columns in the
> input data. However, this is not true. You can fit one-dimensional data
> with a mixture of univariate Gaussians.
>
> Is this the functionality that you are expecting?
>
> Regards,
>
> Alex
>
>
> On Mon, 4 Mar 2024 at 20:48, Craig Brautigam 
> wrote:
>
>> Forgive me if this comes in twice... I did not subscribe first before
>> sending the message below.
>>
>>
>> 
>> From: Craig Brautigam
>> Sent: Monday, March 4, 2024 1:33 PM
>> To: user@commons.apache.org 
>> Subject: MultivariateNormalMixtureExpectationMaximization only 1 dimension
>>
>> Hi,
>>
>> Full disclosure, I'm not a mathematician so I cannot go into the weeds
>>