Re: [sqlite] Proximity ranking with FTS

Dan Kennedy Tue, 17 Jun 2014 10:37:45 -0700

On 06/17/2014 10:48 AM, Josh Wilson wrote:

Yeah I had thought about using the byte distance between words but you get
these instances:


[Example A]
|word1|10charword|word2|

[Example B]
|word1|3charword|4charword|3charword|word2|

By using byte distances, both of these score the same, where Example A
should score more highly.
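
To make the problem concrete, here is a minimal sketch in Python that measures both the byte distance and the token distance between two words. The sample strings and the helper name `distances` are made up for illustration, and the filler in Example A is 12 characters rather than 10 so that the byte offsets coincide exactly once the separators are counted. Tokens are assumed to be whitespace-delimited, which is simpler than what a real FTS tokenizer does.

```python
import re


def distances(text, first, second):
    """Return (byte_distance, token_distance) between two words in text.

    Assumes whitespace-delimited tokens and that each word occurs once.
    """
    data = text.encode("utf-8")
    found = {}
    for index, match in enumerate(re.finditer(rb"\S+", data)):
        # Record the byte offset and the token index of every token.
        found[match.group().decode("utf-8")] = (match.start(), index)
    byte1, tok1 = found[first]
    byte2, tok2 = found[second]
    return abs(byte2 - byte1), abs(tok2 - tok1)


# Illustrative stand-ins for Example A and Example B.
example_a = "word1 abcdefghijkl word2"
example_b = "word1 abc defg hij word2"

print(distances(example_a, "word1", "word2"))
print(distances(example_b, "word1", "word2"))
```

Both strings give the same byte distance, but the token distance is 2 for Example A and 4 for Example B, which is the distinction a proximity ranking would want.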

But it would seem I can use the fts3_tokenizer somehow to get the token
positions or that this underlying value is available but just not stored in
an accessible manner.

I think it's possible to do. When it visits a row as part of a full-text search, internally FTS has a list of matches within the current row for each phrase in the query. Each match is stored as a column and token offset - the number of tokens that precede the match within the column text.


Is that what you need? Do you have any ideas for an fts4 interface to it?
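
As a rough sketch of what could be done with such data if it were exposed, the Python below takes two match lists, one per phrase, each a list of (column, token offset) pairs, and finds the smallest token gap between the phrases within the same column. The data layout and the function name `min_token_gap` are assumptions for illustration only; fts4 does not currently offer this as an interface.

```python
from collections import defaultdict


def min_token_gap(matches_a, matches_b):
    """Smallest token-offset gap between two phrases in the same column.

    Each argument is a list of (column, token_offset) pairs (hypothetical
    layout). Returns None if the phrases never share a column.
    """
    by_col_a = defaultdict(list)
    by_col_b = defaultdict(list)
    for col, off in matches_a:
        by_col_a[col].append(off)
    for col, off in matches_b:
        by_col_b[col].append(off)

    best = None
    for col in by_col_a.keys() & by_col_b.keys():
        a = sorted(by_col_a[col])
        b = sorted(by_col_b[col])
        i = j = 0
        # Two-pointer sweep: always advance the side with the smaller offset.
        while i < len(a) and j < len(b):
            gap = abs(a[i] - b[j])
            if best is None or gap < best:
                best = gap
            if a[i] < b[j]:
                i += 1
            else:
                j += 1
    return best


# Made-up match lists for two phrases in one row.
phrase1 = [(0, 3), (0, 40), (1, 7)]
phrase2 = [(0, 10), (1, 8), (0, 45)]
print(min_token_gap(phrase1, phrase2))
```

Offsets in different columns are never compared, since a token offset is only meaningful relative to the text of its own column.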

Dan.


I implemented OkapiBM25f [1] but was hoping to implement something like the
following proximity ranking [2] as it combines Bag-Of-Words ranking and
proximity ranking. Although that article proposes to precalculate the
distance pairs for all tokens, I'm happy to accept the TimeCost and
calculate on the fly as that SpaceCost won't be worth it.
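
A minimal sketch of the on-the-fly approach, in Python: given the token positions of each query term in one document, add a proximity bonus to an existing bag-of-words score. The inverse-square weighting, the `weight` parameter and the function names are assumptions chosen for illustration, not the exact formula from [2] and not part of the BM25 code in [1].

```python
def proximity_bonus(positions_by_term):
    """Sum 1/d^2 over every pair of occurrences of two different terms.

    positions_by_term maps a query term to its token positions in one
    document. The inverse-square weighting is an illustrative assumption.
    """
    terms = sorted(positions_by_term)
    bonus = 0.0
    for i, term in enumerate(terms):
        for other in terms[i + 1:]:
            for p in positions_by_term[term]:
                for q in positions_by_term[other]:
                    if p != q:
                        bonus += 1.0 / (p - q) ** 2
    return bonus


def combined_score(bag_of_words_score, positions_by_term, weight=1.0):
    """Add the weighted proximity bonus to a precomputed bag-of-words score."""
    return bag_of_words_score + weight * proximity_bonus(positions_by_term)


# Token positions corresponding to Example A and Example B.
score_a = combined_score(1.0, {"word1": [0], "word2": [2]})
score_b = combined_score(1.0, {"word1": [0], "word2": [4]})
print(score_a, score_b)
```

With equal bag-of-words scores, Example A ends up ranked above Example B because its terms are fewer tokens apart. The pairwise loop is quadratic in the number of occurrences, which is the time cost being traded against precomputed storage.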

[1] https://github.com/neozenith/sqlite-okapi-bm25
[2] http://infolab.stanford.edu/~theobald/pub/proximity-spire07.pdf



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Proximity-ranking-with-FTS-tp76149p76152.html
Sent from the SQLite mailing list archive at Nabble.com.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

