Hi Raphael.

One option for locating the dense region in your vector is to use a density estimator (http://scikit-learn.org/stable/modules/density.html).
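To make the density-estimator idea concrete, here is a minimal sketch using scikit-learn's KernelDensity on the simulation from your message. The `simulate` helper, the bandwidth of 5.0, and the "at least 20% of the peak density" rule are my own assumptions for illustration; you would need to tune them for the real data.

```python
import random

import numpy as np
from sklearn.neighbors import KernelDensity


def simulate(n=100, seed=0):
    """Hypothetical helper: sparse / dense / sparse points, as in the question."""
    rng = random.Random(seed)
    points, start = [], 0.0
    for rate, count in ((0.1, n), (10, n * 10), (0.1, n)):
        for _ in range(count):
            points.append(start)
            start += rng.expovariate(rate)
    return np.array(points)


points = simulate()
X = points.reshape(-1, 1)  # scikit-learn expects a 2-D array

# bandwidth=5.0 is a guess that suits this simulation, not a general default.
kde = KernelDensity(kernel="gaussian", bandwidth=5.0).fit(X)
density = np.exp(kde.score_samples(X))  # score_samples returns the log-density

# Assumed rule: a point is "dense" if its density is at least 20% of the peak.
dense = points[density >= 0.2 * density.max()]
print(f"dense region: {dense.min():.1f} to {dense.max():.1f} ({dense.size} points)")
```

A relative threshold is used here because the absolute density values depend on the bandwidth and on the number of points; a fixed cut-off would have to be re-derived for each data set.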
That said, I think the Python module jenkspy (https://pypi.python.org/pypi/jenkspy and https://github.com/mthh/jenkspy) could also help. The method finds the natural breaks of 1-D data (https://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization). I think that if you find a good value for the 'nb_class' parameter, you can separate the dense region of your data from the sparse ones.

K-means is a generalization of Jenks natural breaks optimization to multivariate data, so you could perhaps use the K-means module of scikit-learn for this as well. For this approach, though, I personally find the jenkspy module more straightforward.

I hope it helps.

Pedro Pazzini

2018-04-12 16:22 GMT-03:00 Raphael C <drr...@gmail.com>:

> I have a set of points in 1d represented by a list X of floating point
> numbers. The list has one dense section and the rest is sparse and I
> want to find the dense part. I can't release the actual data but here
> is a simulation:
>
> N = 100
>
> start = 0
> points = []
> rate = 0.1
> for i in range(N):
>     points.append(start)
>     start = start + random.expovariate(rate)
> rate = 10
> for i in range(N*10):
>     points.append(start)
>     start = start + random.expovariate(rate)
> rate = 0.1
> for i in range(N):
>     points.append(start)
>     start = start + random.expovariate(rate)
> plt.hist(points, bins = 100)
> plt.show()
>
> I would like to use scikit learn to find the dense region. This feels
> a little like outlier detection or the task of finding one cluster
> with noise.
>
> Is there a suitable method in scikit learn for this task?
>
> Raphael
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
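P.S. For the K-means route mentioned above, a rough sketch with scikit-learn's KMeans on the same kind of simulated data might look like this. It uses KMeans rather than jenkspy; the `simulate` helper and the choice of three clusters (sparse, dense, sparse) are assumptions made for illustration, not something tuned on real data.

```python
import random

import numpy as np
from sklearn.cluster import KMeans


def simulate(n=100, seed=0):
    """Hypothetical helper: sparse / dense / sparse points, as in the question."""
    rng = random.Random(seed)
    points, start = [], 0.0
    for rate, count in ((0.1, n), (10, n * 10), (0.1, n)):
        for _ in range(count):
            points.append(start)
            start += rng.expovariate(rate)
    return np.array(points)


points = simulate()
X = points.reshape(-1, 1)  # scikit-learn expects a 2-D array

# n_clusters=3 is an assumption: one cluster per sparse side plus the dense part.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Take the most populated cluster as the candidate dense region.
sizes = np.bincount(km.labels_)
biggest = sizes.argmax()
cluster = points[km.labels_ == biggest]
print(f"largest cluster: {cluster.min():.1f} to {cluster.max():.1f} "
      f"({cluster.size} points)")
```

Note that K-means places its boundaries halfway between neighbouring cluster centers, so the largest cluster will usually also pick up some sparse points on either side of the dense part. It gives a coarse location of the dense region, not sharp edges.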