Whoops, I think I misread: I thought each point was binary, not a set of
binary vectors.
I agree with Joel: it is more about defining a distance or an embedding.
You could min-hash, count occurrences, or use a set kernel.
It depends a lot on the semantics of the sets, I'd think.
On 04/30/2015 07:31 PM, Joel Nothman wrote:
The algorithm isn't the issue so much as defining a metric that
measures the distance or affinity between items, or else finding a way
to reduce your data to a more standard metric space.
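As one possible illustration of "defining a metric", the sketch below treats every bitset as an atomic item, uses the Jaccard distance between the outer sets, and hands a precomputed distance matrix to DBSCAN. The helper names (jaccard_distance, pairwise_jaccard), the example points and the eps/min_samples values are assumptions made for this example, not something from the thread; whether Jaccard is a sensible distance depends on what the sets mean.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of hashable items."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def pairwise_jaccard(points):
    """Symmetric distance matrix over data points given as sets of frozensets."""
    n = len(points)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = jaccard_distance(points[i], points[j])
    return dist


# Hypothetical data: the first two points share three of their inner sets.
points = [
    {frozenset({0, 2, 3, 8}), frozenset({1, 9}), frozenset({6, 7}), frozenset({4, 5})},
    {frozenset({0, 2, 3, 8}), frozenset({1, 9}), frozenset({6, 7}), frozenset({4})},
    {frozenset({0, 1, 2, 3, 4}), frozenset({5, 6, 7, 8, 9})},
]
dist = pairwise_jaccard(points)
# metric="precomputed" tells DBSCAN that `dist` already holds distances.
labels = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(dist)
```

The quadratic distance matrix is only practical for modest sample counts; for larger data a reduction to a standard metric space is likely the better route.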
I have for instance clustered sets of objects by first minhashing them
(an approximate dim reduction for sets) then DBSCAN clustering in
Hamming space. One benefit of this was that objects that differed only
a little might be reduced to the same hash, which shrinks the number of
distinct samples to cluster; the collapsed duplicates are then passed to
DBSCAN as weighted samples.
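A minimal sketch of that pipeline, under stated assumptions: each bitset is turned into one integer item, the minhash signature uses a simple (a * x + b) % PRIME hash family, identical signatures are collapsed with np.unique, and their counts are passed as sample_weight to DBSCAN with the Hamming metric. The names make_hashes, minhash and cluster_sets, the modulus, and the parameter values are invented for this example and are not necessarily what was used in the work described above.

```python
import random

import numpy as np
from sklearn.cluster import DBSCAN

PRIME = (1 << 31) - 1  # hash modulus; small enough to stay exact as float64


def make_hashes(n_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a * x + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(n_hashes)]


def minhash(point, hashes):
    """Signature of one non-empty data point, given as bitset strings."""
    items = {int(bits, 2) for bits in point}  # each bitset becomes one integer
    return [min((a * x + b) % PRIME for x in items) for a, b in hashes]


def cluster_sets(points, n_hashes=64, eps=0.3, min_samples=2, seed=0):
    hashes = make_hashes(n_hashes, seed)
    signatures = np.array([minhash(p, hashes) for p in points], dtype=np.int64)
    # Collapse identical signatures; their counts become DBSCAN sample weights.
    unique, inverse, counts = np.unique(
        signatures, axis=0, return_inverse=True, return_counts=True
    )
    db = DBSCAN(eps=eps, min_samples=min_samples, metric="hamming")
    db.fit(unique, sample_weight=counts)
    # Map labels of the unique signatures back onto the original points.
    return db.labels_[np.ravel(inverse)]


# Hypothetical data: the first two points are the same set in a different order.
points = [
    ["1011000010", "0100000001", "0000001100", "0000110000"],
    ["0000110000", "0000001100", "0100000001", "1011000010"],
    ["1111100000", "0000011111"],
]
labels = cluster_sets(points)
```

The fraction of signature positions on which two points disagree estimates their Jaccard distance, so eps can be read roughly as a Jaccard-distance threshold.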
On 1 May 2015 at 06:32, Paul Frandsen <paulbfrand...@gmail.com> wrote:
Hello,
I'm interested in clustering many unordered sets of bitsets. In
general, a data point would look like: {1011000010, 0100000001,
0000001100, 0000110000}, where every bitset has the same number of
digits and the digits within a bitset are ordered, but the set itself
is unordered. Alternatively
(with this particular data set), I could represent the same data
point as a set of sets of integers: {{0,2,3,8},{1,9},{6,7},{4,5}}.
Ideally, I'd like to use k-means, but I imagine that figuring out
centroids would be difficult. Are there any clustering algorithms
in scikit-learn that could cluster data like these? I've looked
through the docs, but I am coming up short.
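For reference, the two representations described above (bitset strings and sets of integer positions) can be converted into each other. This is a small sketch; the function names are made up for the example, and positions are counted from the left starting at 0, which matches the example data point given above.

```python
def bitset_to_indices(bits):
    """Positions (from the left, starting at 0) of the '1' digits."""
    return frozenset(i for i, ch in enumerate(bits) if ch == "1")


def indices_to_bitset(indices, width):
    """Inverse of bitset_to_indices for a bitset of the given width."""
    return "".join("1" if i in indices else "0" for i in range(width))


def point_to_index_sets(point):
    """Convert a data point (collection of bitset strings) to a set of sets."""
    return frozenset(bitset_to_indices(bits) for bits in point)


point = {"1011000010", "0100000001", "0000001100", "0000110000"}
as_sets = point_to_index_sets(point)
```

Using frozenset for both levels keeps the data point hashable and makes its unordered nature explicit.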
Thank you,
Paul Frandsen
------------------------------------------------------------------------------
One dashboard for servers and applications across
Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable
Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
<mailto:Scikit-learn-general@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general