Hi Craig,

In general, the fastest way to address an issue you care about is to provide a PR in GitHub, with a unit test 😉

Gary

On Mon, Mar 11, 2024, 2:06 PM Craig Brautigam <cbrauti...@icr-team.com> wrote:

> Just bumping this up... Would it be possible to get a fix for this?
>
> Thx!
>
> ________________________________
> From: Craig Brautigam <cbrauti...@icr-team.com>
> Sent: Thursday, March 7, 2024 2:47 PM
> To: Commons Users List <user@commons.apache.org>
> Subject: Re: [External] - Re:
> MultivariateNormalMixtureExpectationMaximization only 1 dimension
>
> Alex,
>
> Your fix seems to be working. However, there is a similar problem in
> MultivariateNormalMixtureExpectationMaximization.estimate(): the number
> of components must be at least 2. I think you should be able to
> estimate with 1 component if you want to. The MATLAB function fitgmdist
> does allow for 1 component, and much of our data does in fact best fit
> only 1 component.
>
> Thoughts on fixing that restriction as well?
>
> Thx!
> Craig
>
> ________________________________
> From: Alex Herbert <alex.d.herb...@gmail.com>
> Sent: Tuesday, March 5, 2024 11:35 AM
> To: Commons Users List <user@commons.apache.org>
> Subject: [External] - Re: MultivariateNormalMixtureExpectationMaximization
> only 1 dimension
>
> [You don't often get email from alex.d.herb...@gmail.com. Learn why this
> is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you recognize the sender and know
> the content is safe.
>
> I have updated the master branch with a change to allow fitting a mixture
> with 1-column data.
>
> You should be able to pick up the 4.0-SNAPSHOT from the ASF snapshots repo
> if you configure your build to add the snapshot repository (see [1]).
>
> Let us know if this works for you. Note that if you only require fitting
> 1-column data then you would be able to optimise the implementation, as it
> will no longer require matrix inversion to compute the mixture probability
> distribution. The CM implementation can act as a reference point for your
> own implementation if desired.
>
> Regards,
>
> Alex
>
> [1]
> https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-math4-legacy/4.0-SNAPSHOT/
>
> On Tue, 5 Mar 2024 at 00:06, Alex Herbert <alex.d.herb...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I think this is a bug in the
> > MultivariateNormalMixtureExpectationMaximization class. When I update
> > the code to allow 1 column in the rows, it outputs a similar fit to
> > MATLAB. Here's an example from MATLAB:
> >
> > X = [normrnd(0, 1, 100, 1); normrnd(2, 2, 100, 1)]
> > GMModel = fitgmdist(X,2);
> >
> > >> GMModel.mu
> > ans =
> >     0.0737
> >     3.0914
> > >> GMModel.ComponentProportion
> > ans =
> >     0.6750    0.3250
> > >> GMModel.Sigma
> > ans(:,:,1) =
> >     1.0505
> > ans(:,:,2) =
> >     1.6593
> >
> > I pasted the same X data into a test for
> > MultivariateNormalMixtureExpectationMaximization that had been updated
> > to allow data with a single column, and got the following fit:
> >
> > MultivariateNormalMixtureExpectationMaximization fitter
> >     = new MultivariateNormalMixtureExpectationMaximization(data);
> >
> > MixtureMultivariateNormalDistribution initialMix
> >     = MultivariateNormalMixtureExpectationMaximization.estimate(data, 2);
> > fitter.fit(initialMix);
> > MixtureMultivariateNormalDistribution fittedMix = fitter.getFittedModel();
> > List<Pair<Double, MultivariateNormalDistribution>> components =
> >     fittedMix.getComponents();
> >
> > for (Pair<Double, MultivariateNormalDistribution> component : components) {
> >     final double weight = component.getFirst();
> >     final MultivariateNormalDistribution mvn = component.getSecond();
> >     final double[] mean = mvn.getMeans();
> >     final RealMatrix covMat = mvn.getCovariances();
> >     System.out.printf("%s : %s : %s%n", weight, Arrays.toString(mean),
> >                       covMat.toString());
> > }
> >
> > 0.6420433138817465 : [0.016942587744259194] : Array2DRowRealMatrix{{0.9929681356}}
> > 0.3579566861182536 : [2.9152176347671754] : Array2DRowRealMatrix{{1.8940290549}}
> >
> > The numbers are close enough to indicate that the fit is valid.
> >
> > I think the error has been in assuming that because you require 2
> > components to have a mixture model, you must also have 2 columns in
> > the input data. However, this is not true. You can fit
> > single-dimension data with a mixture of single Gaussians.
> >
> > Is this the functionality that you are expecting?
> >
> > Regards,
> >
> > Alex
> >
> > On Mon, 4 Mar 2024 at 20:48, Craig Brautigam <cbrauti...@icr-team.com>
> > wrote:
> >
> >> Forgive me if this comes in twice... I did not subscribe first before
> >> sending the message below.
> >>
> >> ________________________________
> >> From: Craig Brautigam
> >> Sent: Monday, March 4, 2024 1:33 PM
> >> To: user@commons.apache.org <user@commons.apache.org>
> >> Subject: MultivariateNormalMixtureExpectationMaximization only 1
> >> dimension
> >>
> >> Hi,
> >>
> >> Full disclosure: I'm not a mathematician, so I cannot go into the
> >> weeds of the math. However, I am tasked with porting some MATLAB code
> >> that fits Gaussian mixture models (GMMs) to Java. I really want to use
> >> Apache Commons Math if possible. However, the code that I'm porting
> >> creates GMMs from 1 dimension (a single variable/attribute/property).
> >>
> >> MultivariateNormalMixtureExpectationMaximization looks to be a pretty
> >> close drop-in replacement for the MATLAB functions
> >> <https://www.mathworks.com/help/stats/fitgmdist.html> and
> >> <http://www.mathworks.com/help/stats/gmdistribution.html>. However, the
> >> constructor for MultivariateNormalMixtureExpectationMaximization clearly
> >> states that the number of columns in the double[][] data array MUST be
> >> no less than 2. I'm completely baffled as to why this is the case if
> >> I want to fit data with 1 dimension in it. Is there a workaround I
> >> can use, like providing a dummy column of data with all 0s to pacify
> >> the constructor? Is there another class I should be using?
> >>
> >> Any help would be greatly appreciated.
> >>
> >> Thx!
> >>
> >> ________________________________
> >> The information contained in this e-mail and any attachments from ICR,
> >> Inc. may contain confidential and/or proprietary information, and is
> >> intended only for the named recipient to whom it was originally
> >> addressed. If you are not the intended recipient, any disclosure,
> >> distribution, or copying of this e-mail or its attachments is strictly
> >> prohibited. If you have received this e-mail in error, please notify
> >> the sender immediately by return e-mail and permanently delete the
> >> e-mail and any attachments.
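To make Alex's remark about 1-column data concrete (no matrix inversion is needed when every covariance is a scalar variance), here is a minimal sketch of expectation-maximization for a univariate Gaussian mixture in plain Java. It is not the Commons Math implementation and has not been compared against it. The class name UnivariateGmmSketch, the sorted-slice initialisation, the variance floor and the log-likelihood stopping rule are all assumptions made for illustration, and there is no handling of degenerate cases such as points whose density underflows to zero under every component.

```java
import java.util.Arrays;

/** Illustrative EM fit of a k-component Gaussian mixture to 1-D data. */
public final class UnivariateGmmSketch {
    /** Assumed floor that keeps a component's variance from collapsing to zero. */
    private static final double MIN_VARIANCE = 1e-12;

    /** Fitted weight, mean and variance of each component. */
    public static final class Fit {
        public final double[] weights;
        public final double[] means;
        public final double[] variances;

        Fit(double[] weights, double[] means, double[] variances) {
            this.weights = weights;
            this.means = means;
            this.variances = variances;
        }
    }

    /** Univariate normal density: a scalar division replaces the matrix inverse. */
    private static double density(double x, double mean, double variance) {
        final double d = x - mean;
        return Math.exp(-0.5 * d * d / variance) / Math.sqrt(2 * Math.PI * variance);
    }

    public static Fit fit(double[] data, int k, int maxIterations, double tolerance) {
        final int n = data.length;
        if (k < 1 || n < 2 * k) {
            throw new IllegalArgumentException("need k >= 1 and at least 2 points per component");
        }
        final double[] w = new double[k];
        final double[] mu = new double[k];
        final double[] var = new double[k];

        // Initial guess: sort the data and give each component one contiguous slice.
        final double[] sorted = data.clone();
        Arrays.sort(sorted);
        for (int j = 0; j < k; j++) {
            final int from = j * n / k;
            final int to = (j + 1) * n / k;
            double sum = 0;
            for (int i = from; i < to; i++) {
                sum += sorted[i];
            }
            mu[j] = sum / (to - from);
            double ss = 0;
            for (int i = from; i < to; i++) {
                ss += (sorted[i] - mu[j]) * (sorted[i] - mu[j]);
            }
            var[j] = Math.max(ss / (to - from), MIN_VARIANCE);
            w[j] = (double) (to - from) / n;
        }

        final double[][] resp = new double[n][k];
        double previousLogLik = Double.NEGATIVE_INFINITY;
        for (int iter = 0; iter < maxIterations; iter++) {
            // E-step: responsibility of component j for point i.
            double logLik = 0;
            for (int i = 0; i < n; i++) {
                double total = 0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = w[j] * density(data[i], mu[j], var[j]);
                    total += resp[i][j];
                }
                for (int j = 0; j < k; j++) {
                    resp[i][j] /= total;
                }
                logLik += Math.log(total);
            }
            // M-step: responsibility-weighted weight, mean and variance.
            for (int j = 0; j < k; j++) {
                double nj = 0;
                double sum = 0;
                for (int i = 0; i < n; i++) {
                    nj += resp[i][j];
                    sum += resp[i][j] * data[i];
                }
                mu[j] = sum / nj;
                double ss = 0;
                for (int i = 0; i < n; i++) {
                    ss += resp[i][j] * (data[i] - mu[j]) * (data[i] - mu[j]);
                }
                var[j] = Math.max(ss / nj, MIN_VARIANCE);
                w[j] = nj / n;
            }
            // Stop once the log-likelihood no longer changes noticeably.
            if (Math.abs(logLik - previousLogLik) < tolerance) {
                break;
            }
            previousLogLik = logLik;
        }
        return new Fit(w, mu, var);
    }
}
```

With k = 1 every responsibility is 1, so a call such as UnivariateGmmSketch.fit(values, 1, 100, 1e-9) reduces to the sample mean and the population variance. That is the single-component case Craig asks about for estimate().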