I'm slightly out of my depth on the theory here. I've been experimenting
with kernel density estimation (KDE) on my data and getting good results.
I would now like to extend this to ensemble measurements.

I have data collected from n realisations of a lab experiment*. Each
realisation is a time series of velocities with N data points per
ensemble member, where n ~ 15 and N ~ 2,500,000.

I have made a KDE for a single set of data using KernelDensity+. My
naive approach to building an ensemble estimator is to pool all the
ensemble data together and fit a single KDE to that. However, I am aware
that this is going to be overfitted.
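A minimal sketch of the pooled approach, assuming each realisation is a
1-D velocity series. The data below is synthetic (normal noise standing
in for the real runs), and the per-member subsample size and the
bandwidth of 0.2 are placeholders rather than tuned values. Subsampling
each member is one way to keep the fit tractable at N ~ 2.5 million.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic stand-in for the real data: n realisations of a 1-D velocity
# series (much shorter than the real N ~ 2.5 million).
rng = np.random.default_rng(0)
n_members, n_points = 15, 10_000
ensemble = [rng.normal(loc=0.0, scale=1.0, size=n_points) for _ in range(n_members)]

# Pool all realisations into one (n_samples, n_features) array,
# randomly subsampling each member so the fit stays cheap.
per_member = 1_000  # placeholder subsample size
pooled = np.concatenate(
    [rng.choice(member, size=per_member, replace=False) for member in ensemble]
)[:, np.newaxis]

# Placeholder bandwidth; in practice choose it by cross-validation.
kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(pooled)

# score_samples returns the log-density, so exponentiate to get the pdf.
grid = np.linspace(-4, 4, 201)[:, np.newaxis]
pdf = np.exp(kde.score_samples(grid))
```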

I would like to make an ensemble KDE for this data. I think I need to
cross-validate the estimators somehow. My first thought is to do a
leave-one-out cross-validation on the ensemble data, but I'm sure there
is a more appropriate way to do it.
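One variant of that idea that respects the ensemble structure is to
leave out a whole realisation at a time (leave-one-group-out) rather than
single points. Neighbouring samples in a velocity time series are likely
to be strongly correlated, so pointwise leave-one-out would probably
favour too small a bandwidth; holding out an entire run instead scores
each bandwidth on data it has genuinely not seen. A sketch, again on
synthetic 1-D stand-in data with placeholder sizes and bandwidth grid:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.neighbors import KernelDensity

# Synthetic stand-in data; real members would be subsampled first.
rng = np.random.default_rng(1)
n_members, per_member = 15, 500
members = [rng.normal(size=per_member) for _ in range(n_members)]
X = np.concatenate(members)[:, np.newaxis]
# Group label = index of the realisation each sample came from.
groups = np.repeat(np.arange(n_members), per_member)

# One split per realisation: train on 14 runs, test on the held-out run.
splits = list(LeaveOneGroupOut().split(X, groups=groups))

# With no `scoring` given, GridSearchCV uses KernelDensity.score, i.e. the
# total log-likelihood of the held-out realisation.
search = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": np.logspace(-1, 0.5, 8)},  # placeholder grid
    cv=splits,
)
search.fit(X)
best_bandwidth = search.best_params_["bandwidth"]
```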

I'm looking at the methods in sklearn.ensemble, but I'm a bit lost as to
which one I should use. I also feel somewhat constrained computationally
by the quantity of data that I have.
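As far as I know, sklearn.ensemble does not include a density estimator,
so one hand-rolled alternative is to fit one KDE per realisation and
average them. The helpers `fit_member_kdes` and `ensemble_log_density`
below are hypothetical names, the data is synthetic, and the subsample
size, bandwidth and `rtol` value are assumptions. Subsampling each member
and setting a non-zero `rtol` (KernelDensity's relative tolerance for its
tree-based evaluation) are two ways to reduce the computational cost.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.neighbors import KernelDensity


def fit_member_kdes(members, bandwidth, max_points=2_000, rtol=1e-4, seed=0):
    """Hypothetical helper: fit one KDE per realisation on a random subsample."""
    rng = np.random.default_rng(seed)
    kdes = []
    for member in members:
        size = min(max_points, len(member))
        sample = rng.choice(member, size=size, replace=False)[:, np.newaxis]
        # rtol > 0 trades a little accuracy for faster evaluation.
        kdes.append(KernelDensity(bandwidth=bandwidth, rtol=rtol).fit(sample))
    return kdes


def ensemble_log_density(kdes, x):
    """Hypothetical helper: log of the equal-weight mean of member densities."""
    log_dens = np.stack([kde.score_samples(x) for kde in kdes])  # (n_members, n_x)
    return logsumexp(log_dens, axis=0) - np.log(len(kdes))


# Synthetic stand-in for 15 realisations of a 1-D velocity series.
rng = np.random.default_rng(2)
members = [rng.normal(size=20_000) for _ in range(15)]
kdes = fit_member_kdes(members, bandwidth=0.2)
grid = np.linspace(-4, 4, 201)[:, np.newaxis]
pdf = np.exp(ensemble_log_density(kdes, grid))
```

Note that with equal-sized subsamples and a shared bandwidth, this average
is the same as a single KDE fitted to the pooled subsamples. The main
extra value of keeping the members separate is the spread between their
individual densities, which gives a rough measure of run-to-run
variability.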

Any input here would be appreciated.

cheers,
aaron



* For interest, it is a fluids experiment looking at turbulence with
PIV (particle image velocimetry). You can get a flavor here:

http://nbviewer.ipython.org/github/aaren/notebooks/blob/master/2d_pdf.ipynb

+ i.e. sklearn.neighbors.KernelDensity. Thanks @jakevdp for the
write-up!

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
