Many thanks for the replies. I'll give your suggestions a shot.

I think the key for me is that the items *within* each data point are
unordered. To this end, Jaccard distance could do the trick, if I can figure
out how to get the data into the right form.
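
One way to get the data into a form where Jaccard distance applies is sketched below. It is only a sketch under some assumptions: each data point is turned into a frozenset of frozensets of integers, each inner bitset is treated as a single indivisible item, and the helper names (bitset_to_indices, to_point, jaccard_distance, distance_matrix) and the second and third toy points are made up for illustration.

```python
from itertools import combinations


def bitset_to_indices(bits):
    """Turn a bit string such as '1011000010' into a frozenset of set-bit positions."""
    return frozenset(i for i, b in enumerate(bits) if b == "1")


def to_point(bitsets):
    """Represent one data point as an unordered set of frozensets."""
    return frozenset(bitset_to_indices(b) for b in bitsets)


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, where each inner bitset counts as one item."""
    if not a and not b:
        return 0.0  # convention: two empty sets are identical
    return 1.0 - len(a & b) / len(a | b)


def distance_matrix(points):
    """Symmetric matrix of pairwise Jaccard distances (list of lists)."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jaccard_distance(points[i], points[j])
    return d


# Toy data: the first point is the example from the original question,
# the other two are hypothetical.
raw = [
    ["1011000010", "0100000001", "0000001100", "0000110000"],
    ["0000110000", "0000001100", "0100000001", "1011000010"],  # same sets, reordered
    ["1011000010", "0100000001", "0000111100"],                # two sets merged
]
points = [to_point(r) for r in raw]
D = distance_matrix(points)
```

A matrix like D could then be passed to a clustering algorithm that accepts precomputed distances, for example DBSCAN in scikit-learn with metric="precomputed". Note that with this choice two inner bitsets that differ in a single bit count as completely different items, which may or may not match the semantics of the data.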

Thanks!

Paul

On Thu, Apr 30, 2015 at 7:35 PM, Andreas Mueller <t3k...@gmail.com> wrote:

>  Whoops, I think I misread: I thought each point was binary, not a binary
> set.
> I agree with Joel; it is more about defining a distance or an embedding.
>
> You could min-hash, count occurrences, or use a set kernel?
> It depends a lot on the semantics of the sets, I'd think.
>
>
>
> On 04/30/2015 07:31 PM, Joel Nothman wrote:
>
> The algorithm isn't the issue so much as defining a metric that measures
> the distance or affinity between items, or else finding a way to reduce
> your data to a more standard metric space.
>
>  I have, for instance, clustered sets of objects by first minhashing them
> (an approximate dimensionality reduction for sets) and then DBSCAN
> clustering in Hamming space. One benefit of this was that objects that
> differed only a little might be reduced to the same hash, which made the
> number of distinct samples to cluster smaller; the duplicates were instead
> represented as weighted samples in DBSCAN.
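
The minhash-then-Hamming idea described above could be sketched roughly as follows. This is not Joel's actual code: the MD5-based hash family, the signature length of 16, and the helper names (item_key, seeded_hash, minhash_signature, hamming) are all assumptions made for illustration, and points are assumed to be non-empty.

```python
import hashlib
from collections import Counter


def item_key(item):
    """Canonical string for one inner set, independent of iteration order."""
    return ",".join(map(str, sorted(item)))


def seeded_hash(seed, key):
    """Deterministic 32-bit hash of key; each seed acts as a different hash function."""
    digest = hashlib.md5(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def minhash_signature(point, n_hashes=16):
    """For each seed, keep the smallest hash value over the items in the point."""
    keys = [item_key(item) for item in point]
    return tuple(min(seeded_hash(seed, k) for k in keys) for seed in range(n_hashes))


def hamming(sig_a, sig_b):
    """Fraction of signature positions that differ (estimates Jaccard distance)."""
    return sum(x != y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical toy points, each a collection of sets of integers.
points = [
    [{0, 2, 3, 8}, {1, 9}, {6, 7}, {4, 5}],
    [{4, 5}, {6, 7}, {1, 9}, {0, 2, 3, 8}],  # same point, different order
    [{0, 2, 3, 8}, {1, 9}, {4, 5, 6, 7}],
]
signatures = [minhash_signature(p) for p in points]

# Points with identical signatures collapse into one weighted sample.
weights = Counter(signatures)
unique_signatures = list(weights)
```

The unique signatures and their counts could then be given to a clustering routine that supports Hamming distance and per-sample weights; in scikit-learn, DBSCAN accepts metric="hamming" and a sample_weight argument to fit. A shorter signature makes collisions between similar points more likely, which shrinks the number of distinct samples at the cost of resolution.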
>
> On 1 May 2015 at 06:32, Paul Frandsen <paulbfrand...@gmail.com> wrote:
>
>> Hello,
>>
>>  I'm interested in clustering many unordered sets of bitsets. In
>> general, a data point would look like: {1011000010, 0100000001, 0000001100,
>> 0000110000}, where each bitset has the same number of digits and its digits
>> are ordered, but the set is unordered. Alternatively (with this particular data
>> set), I could represent the same data point as a set of sets of integers:
>> {{0,2,3,8},{1,9},{6,7},{4,5}}. Ideally, I'd like to use k-means, but I
>> imagine that figuring out centroids would be difficult. Are there any
>> clustering algorithms in scikit-learn that could cluster data like these?
>> I've looked through the docs, but I am coming up short.
>>
>>  Thank you,
>>
>>  Paul Frandsen
>>
>>
>>
>> ------------------------------------------------------------------------------
>> One dashboard for servers and applications across Physical-Virtual-Cloud
>> Widest out-of-the-box monitoring support with 50+ applications
>> Performance metrics, stats and reports that give you Actionable Insights
>> Deep dive visibility with transaction tracing using APM Insight.
>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>


-- 
Bioinformatics and Genomics
Office of Research Information Services
Office of the CIO
Smithsonian Institution
paulfrandsen.com