Re: [sqlite] finding pairwise similarity

Jean-Christophe Deschamps Sat, 14 Jan 2012 16:44:45 -0800

where `similarity` is a measure of overlap of `t` between each pair of`c` and is calculated as the number of `t` common between a given pairdivided by the total number of unique `t` in the pair.


Just like Igor, I take "unique `t` in pair" to mean "distinct `t` in pair".

I want the result to be

    source_c target_c similarity
    -------- -------- ----------
1 2 0.2 (= 1 common `t` between the 2 `c` / bytotal 4 `t`)


Here I take 0.2 to mean 0.25.


If so, here's some more amunition:

select src.c_no "Source c",
       trg.c_no "Target c",

(select count(*) from (select 1 from c_t where c_t.c_no in(src.c_no, trg.c_no) group by c_t.t_no having count(*) > 1))

       * 1.0 /

(select count(*) from (select distinct t_no from c_t wherec_t.c_no in (src.c_no, trg.c_no))) Similarity

       from c src join c trg on (src.c_no < trg.c_no);

If I'm not mistaken, both versions should give identicalresults. Check by yourself on real data and see which is faster foryou (most probably Igor's).


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] finding pairwise similarity

Reply via email to