Hi,

I know I'm a newcomer to the SQLite project, but I'm excited about what
FTS5 has to offer. To me it seems simple and powerful, and has some really
nice ideas.

Is it possible for me to contribute to the module, or is it too late for
that?

I would like to mention two ideas I could contribute. First, a
customizable list of stopwords:

https://en.wikipedia.org/wiki/Stop_words
(I didn't find anything like that in the documentation. Am I missing
something?)

I know I can add it via a custom tokenizer, but wouldn't it be useful to
have it straight out of the box?
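
To make the idea concrete, here is a minimal sketch in Python of the
filtering step I have in mind. It is not FTS5 code: the names STOPWORDS
and tokenize are made up for illustration, and the word list is only a
placeholder for whatever list the user would configure.

```python
import re

# Hypothetical stopword list; in the proposal this would be user-configurable.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "is"}

def tokenize(text, stopwords=STOPWORDS):
    """Split text into lowercase word tokens and drop any stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]

print(tokenize("The history of the SQLite project"))
```

The same list would have to be applied to both documents and queries, so
that a query made up only of stopwords is handled consistently.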


Also, I would like to mention the usefulness of some statistics to create
more advanced ranking formulas. Things like: the Longest Common Subsequence
between query and document, number of unique matched keywords, etc. These
and other values are really useful in applications where bm25 is not
suitable or not sufficient.
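
As a rough illustration of the two statistics named above, here is a
small Python sketch. It assumes the textbook, word-level definition of
longest common subsequence, which may not match how any particular
engine defines its own factor, and the function names are hypothetical.

```python
def lcs_length(query_tokens, doc_tokens):
    """Length of the longest common subsequence of words (classic DP)."""
    prev = [0] * (len(doc_tokens) + 1)
    for q in query_tokens:
        curr = [0]
        for j, d in enumerate(doc_tokens, start=1):
            if q == d:
                # Extend the best subsequence ending before both words.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def unique_matched_keywords(query_tokens, doc_tokens):
    """Number of distinct query words that occur in the document."""
    return len(set(query_tokens) & set(doc_tokens))

query = ["full", "text", "search"]
doc = ["sqlite", "full", "text", "search", "engine", "text"]
print(lcs_length(query, doc), unique_matched_keywords(query, doc))
```

A custom ranker could then combine values like these with bm25, for
example by boosting documents where the query words appear in order.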

I come from using an engine called Sphinx Search (used on huge sites like
Craigslist), which offers such factors. Using them, they have defined
rankers that mix bm25 with proximity, and another one they call
SPH_RANK_SPH04, which includes a weighting boost for the result appearing
at the beginning of the text field, and a bigger boost if it's an exact
match:

http://sphinxsearch.com/docs/latest/builtin-rankers.html

The formulas for them (in Sphinx, higher is better) are:
http://sphinxsearch.com/docs/latest/formulas-for-builtin-rankers.html

And the list of supported factors is:
http://sphinxsearch.com/docs/latest/ranking-factors.html

Of course, having all of them would be overkill, but if you find them
interesting, we could pick the most useful ones, allowing people to build
rankers for their own needs.


Once again, you people are the experts and know whether such ideas are
feasible and where the right place to include them is, so please tell me
your opinions.

