Hi all,
I see that gmm.score(X) returns the per-sample log probability density of X under the fitted model.
I'm interested in integrating this density over a region -- for example,
finding the probability that a ball lies within (x, y, z) +/- (delta_x,
delta_y, delta_z). In this example, I'd be using past ball locations as
training data.
So the solution seems to be to integrate the probability density function
(the exponential of what gmm.score() returns) over the region of interest.
Right now I'm using SciPy's integrate.nquad, but coupling that with
gmm.score is SLOW: the example pasted below takes more than 10 minutes on my i7.
The example is in 7-D because my actual use case is in 7-D. The region here is
pretty small -- though the actual region I'll be using is bigger. Think
delta = 0.1.
I'm wondering: is there a way to speed this up? An analytical method to do
this? A built-in method?
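In the meantime, one workaround I can sketch is plain Monte Carlo: draw samples from the mixture and count the fraction that land inside the box. The helper below is self-contained NumPy; the function name and the explicit weights/means/variances arguments are illustrative (with a fitted scikit-learn model you'd pull them from weights_, means_, and covars_), and it assumes diagonal covariances:

```python
import numpy as np

rng = np.random.RandomState(0)

def mc_box_probability(weights, means, variances, lower, upper, n=200000):
    # Sample from the mixture: pick a component index per sample according
    # to the mixture weights, then draw from that component's diagonal
    # Gaussian. means/variances have shape (n_components, n_dims).
    weights = np.asarray(weights)
    comps = rng.choice(len(weights), size=n, p=weights)
    samples = rng.normal(np.asarray(means)[comps],
                         np.sqrt(np.asarray(variances)[comps]))
    # Fraction of samples falling inside the axis-aligned box.
    inside = np.all((samples >= lower) & (samples <= upper), axis=1)
    return inside.mean()
```

The standard error shrinks like 1/sqrt(n), so this is cheap for a rough answer but converges slowly if you need many digits.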
I've also attached this as a file.
import numpy as np
from sklearn import mixture
from scipy import integrate

# Five training points, each a constant 7-D vector.
all_outcomes_together = np.array(
    [np.full(7, v) for v in (1.0, 2.0, 3.0, 4.0, 5.0)])

prob_model = mixture.GMM(n_components=5)
prob_model.fit(all_outcomes_together)

# Axis-aligned integration box: center_point +/- delta in each dimension.
center_point = np.full(7, 3.0)
delta = np.full(7, 0.001)
lower_bounds = center_point - delta
upper_bounds = center_point + delta
both_bounds = [list(a) for a in zip(lower_bounds, upper_bounds)]
print(both_bounds)

# Integrate exp(log-density) over the 7-D box -- this is the slow part.
# nquad returns (value, estimated_error).
prob_within_delta, abs_err = integrate.nquad(
    lambda *state: np.exp(prob_model.score([list(state)])),
    both_bounds)
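For reference, when every component has a diagonal covariance (the GMM default), the integral over an axis-aligned box has a closed form: per component, the probability mass factorizes into a product of one-dimensional Gaussian CDF differences, and the mixture result is the weight-averaged sum. No numerical quadrature is needed. A sketch (the function name and the explicit weights/means/variances arguments are illustrative, not a scikit-learn API; with a fitted model they'd come from weights_, means_, and covars_):

```python
import numpy as np
from scipy.stats import norm

def gmm_box_probability(weights, means, variances, lower, upper):
    # For diagonal-covariance components, the mass inside an axis-aligned
    # box [lower, upper] factorizes into a product of 1-D Gaussian CDF
    # differences per dimension, summed over components with their weights.
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        sd = np.sqrt(var)
        per_dim = norm.cdf(upper, loc=mu, scale=sd) - norm.cdf(lower, loc=mu, scale=sd)
        total += w * np.prod(per_dim)
    return total
```

This evaluates in microseconds regardless of dimension, versus minutes for nquad in 7-D, because the 7-D quadrature is replaced by 7 one-dimensional CDF lookups per component.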
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general