where `similarity` is a measure of overlap of `t` between each pair of
`c` and is calculated as the number of `t` common between a given pair
divided by the total number of unique `t` in the pair.
Just like Igor, I take "unique `t` in pair" to mean "distinct `t` in pair".
I want the result to be
source_c target_c similarity
-------- -------- ----------
1 2 0.2 (= 1 common `t` between the 2 `c` / by
total 4 `t`)
Here I take 0.2 to mean 0.25.
If so, here's some more amunition:
select src.c_no "Source c",
trg.c_no "Target c",
(select count(*) from (select 1 from c_t where c_t.c_no in
(src.c_no, trg.c_no) group by c_t.t_no having count(*) > 1))
* 1.0 /
(select count(*) from (select distinct t_no from c_t where
c_t.c_no in (src.c_no, trg.c_no))) Similarity
from c src join c trg on (src.c_no < trg.c_no);
If I'm not mistaken, both versions should give identical
results. Check by yourself on real data and see which is faster for
you (most probably Igor's).
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users