The choice of algorithm matters less than defining a metric that measures
the distance or affinity between items, or else finding a way to reduce
your data to a more standard metric space.
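
As a rough illustration of the "define a metric" route, here is a minimal
sketch that assumes Jaccard distance between data points is a sensible
affinity for this problem (that choice is mine, not something the data
dictates). Each data point is written as a frozenset of frozensets of set-bit
positions, the toy data is made up, and the eps/min_samples values are
arbitrary. The pairwise distances are handed to DBSCAN as a precomputed
matrix:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets; 0.0 when both are empty."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0

def fs(*bitsets):
    """Helper (hypothetical): build an unordered set of bitsets."""
    return frozenset(frozenset(b) for b in bitsets)

# Toy data: points 0/1 share three bitsets, as do points 2/3.
points = [
    fs({0, 2, 3, 8}, {1, 9}, {6, 7}, {4, 5}),
    fs({0, 2, 3, 8}, {1, 9}, {6, 7}, {4}),
    fs({0, 1}, {2, 3}, {4, 5, 6}, {7, 8, 9}),
    fs({0, 1}, {2, 3}, {4, 5, 6}, {7, 8}),
]

# Symmetric matrix of pairwise distances between data points.
n = len(points)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = jaccard_distance(points[i], points[j])

# metric="precomputed" tells DBSCAN that D already holds distances.
labels = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(D)
```

The full distance matrix is O(n^2) in memory, so this only suits modest
numbers of data points.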

For instance, I have clustered sets of objects by first minhashing them (an
approximate dimensionality reduction for sets) and then running DBSCAN in
Hamming space. One benefit was that objects that differed only a little
might be reduced to the same hash, so there were fewer distinct samples to
cluster; the duplicates were instead passed to DBSCAN as sample weights.
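
A minimal sketch of that minhash-then-DBSCAN idea follows. It is not the exact
pipeline I used: the hash family, the number of hashes, the toy data, and the
eps/min_samples values are all assumptions chosen for illustration. Each
bitset is encoded as a non-negative integer below 2**31, and identical
signatures are collapsed into one row with a weight:

```python
import numpy as np
from sklearn.cluster import DBSCAN

P = 2_147_483_647  # Mersenne prime 2**31 - 1, used as the hash modulus

def minhash(items, a, b):
    """MinHash signature of a non-empty collection of ints in [0, P)."""
    x = np.fromiter(items, dtype=np.int64)
    # One row per hash function h_k(x) = (a_k * x + b_k) mod P; keep the min.
    return ((a[:, None] * x[None, :] + b[:, None]) % P).min(axis=1)

rng = np.random.default_rng(0)
n_hashes = 32  # assumed signature length
a = rng.integers(1, P, size=n_hashes)
b = rng.integers(0, P, size=n_hashes)

# Toy data: each point is a set of bitsets written as binary literals.
points = [
    {0b1011000010, 0b0100000001, 0b0000001100, 0b0000110000},
    {0b1011000010, 0b0100000001, 0b0000001100, 0b0000110000},  # duplicate
    {0b1011000010, 0b0100000001, 0b0000001100, 0b0000010000},
    {0b1100000000, 0b0011000000, 0b0000111000, 0b0000000111},
    {0b1100000000, 0b0011000000, 0b0000111000, 0b0000000011},
]
sigs = [tuple(minhash(p, a, b).tolist()) for p in points]

# Collapse identical signatures into one sample each, weighted by count.
unique = sorted(set(sigs))
index = {s: i for i, s in enumerate(unique)}
rows = [index[s] for s in sigs]
weights = np.bincount(rows)
X = np.array(unique, dtype=np.float64)

# Hamming distance here is the fraction of signature positions that differ,
# which approximates 1 - Jaccard similarity of the original sets.
db = DBSCAN(eps=0.7, min_samples=2, metric="hamming")
db.fit(X, sample_weight=weights)

# Map cluster labels back from unique signatures to the original points.
labels = db.labels_[rows]
```

With sample weights, a signature shared by enough points can qualify as a core
sample on its own, which is what makes the collapsing step safe for DBSCAN.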

On 1 May 2015 at 06:32, Paul Frandsen <paulbfrand...@gmail.com> wrote:

> Hello,
>
> I'm interested in clustering many unordered sets of bitsets. In general, a
> data point would look like: {1011000010, 0100000001, 0000001100,
> 0000110000}, where each bitset has the same number of digits and are
> ordered, but the set is unordered. Alternatively (with this particular data
> set), I could represent the same data point as a set of sets of integers:
> {{0,2,3,8},{1,9},{6,7},{4,5}}. Ideally, I'd like to use k-means, but I
> imagine that figuring out centroids would be difficult. Are there any
> clustering algorithms in scikit-learn that could cluster data like these?
> I've looked through the docs, but I am coming up short.
>
> Thank you,
>
> Paul Frandsen
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general