Thank you very much for all the feedback! The last example also crashed, so I
traced it by trial and error into the library, and it looks like the problem
is these 2 lines in the file
src/libsimmetrics/simmetrics/tokenizer.c:
tmp = calloc((init_len + qtype->qgram_len),
sizeof(char));
Probably both lines should be changed to:
tmp = calloc((init_len + 2 * qtype->qgram_len),
sizeof(char));
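
One possible reason the larger buffer helps, sketched under an assumption that has not been checked against tokenizer.c: if the q-gram tokenizer pads the string with qgram_len - 1 characters on each side, the padded copy needs init_len + 2 * (qgram_len - 1) + 1 bytes, which exceeds init_len + qgram_len for any qgram_len of 2 or more. The helper names and the padding characters '#' and '@' below are hypothetical and are not taken from libsimmetrics.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not libsimmetrics API): bytes needed to hold `len`
 * characters padded with (q - 1) characters on each side, plus the NUL. */
static size_t padded_size(size_t len, size_t q)
{
    assert(q >= 1);
    return len + 2 * (q - 1) + 1;
}

/* Hypothetical helper: return a newly allocated padded copy of `s`,
 * e.g. "##Roubal@@" for q = 3. The caller must free() the result. */
static char *pad_for_qgrams(const char *s, size_t q)
{
    size_t len = strlen(s);
    size_t pad;
    char *tmp;

    assert(q >= 1);
    pad = q - 1;

    /* Same size expression as the proposed fix: len + 2 * q bytes.
     * This is always at least padded_size(len, q). */
    tmp = calloc(len + 2 * q, sizeof(char));
    if (tmp == NULL)
        return NULL;

    memset(tmp, '#', pad);              /* assumed start padding */
    memcpy(tmp + pad, s, len);          /* the original string */
    memset(tmp + pad + len, '@', pad);  /* assumed end padding */
    /* calloc zero-filled the buffer, so the string is NUL-terminated. */
    return tmp;
}
```

Under this assumption, "Roubal" with qgram_len = 3 needs 11 bytes, while init_len + qgram_len gives only 9 and init_len + 2 * qgram_len gives 12, so the smaller allocation would be overrun by two bytes.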
@Andrea: can you verify this and include it in the library?
However, your last SQL example shows an interesting thing: calling
stringmetrics twice in one query results in the values 100 and 36 in one
order and 40 and 100 in the opposite order. This is also not good :(
sqlite> .load ./libstringmetrics.so
sqlite> select a.firstname, b.firstname, a.lastname, b.lastname,
stringmetrics("qgrams_distance","similarity",a.firstname,
b.firstname,"") first_dist,
stringmetrics("qgrams_distance","similarity",a.lastname, b.lastname,"")
last_dist
from
(select "Milan" as firstname, "Roubal" as lastname ) a,
(select "Milan" as firstname, "RoubalRoubalRoubalRo" as lastname ) b
;
Milan|Milan|Roubal|RoubalRoubalRoubalRo|100.0|36.6666679382324
sqlite> select a.firstname, b.firstname, a.lastname, b.lastname,
stringmetrics("qgrams_distance","similarity",a.lastname, b.lastname,"")
last_dist,
stringmetrics("qgrams_distance","similarity",a.firstname,
b.firstname,"") first_dist
from
(select "Milan" as firstname, "Roubal" as lastname ) a,
(select "Milan" as firstname, "RoubalRoubalRoubalRo" as lastname ) b
;
Milan|Milan|Roubal|RoubalRoubalRoubalRo|40.0|100.0
Thank you,
Best regards,
Milan