Thanks Mike, that is very helpful. Am I reading the code correctly that the
lossy norm encoding is done in the Similarity? How do you set the number
of bytes used for the norms?
Thanks,
Matt
On Thu, Jan 2, 2020 at 10:31 AM Michael McCandless <luc...@mikemccandless.com> wrote:
Norms encode the number of tokens in the field, but in a lossy manner (1
byte by default), so you could probably create a custom query that filtered
based on that, if you could tolerate the loss in precision? Or maybe
change your norms storage to use more precision?
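One way to realize "more precision" is a custom Similarity: in recent Lucene (8.x), the lossy encoding happens inside Similarity.computeNorm() — BM25Similarity compresses the field length to one byte via SmallFloat, and there is no knob for the number of bytes; you swap in your own Similarity instead. A rough sketch, with scoring stubbed out on the assumption that such a field would only be filtered on, not ranked:

```java
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.similarities.Similarity;

// Sketch (Lucene 8.x API): keep the exact token count in the norm
// instead of BM25's lossy 1-byte SmallFloat encoding.
public class ExactLengthSimilarity extends Similarity {

  @Override
  public long computeNorm(FieldInvertState state) {
    return state.getLength(); // exact token count, no compression
  }

  @Override
  public SimScorer scorer(float boost, CollectionStatistics collectionStats,
                          TermStatistics... termStats) {
    // Scoring is irrelevant if this field is only used for length filtering.
    return new SimScorer() {
      @Override
      public float score(float freq, long norm) {
        return boost;
      }
    };
  }
}
```

You would install it via IndexWriterConfig.setSimilarity() at index time (and the matching IndexSearcher.setSimilarity() at search time); to change norms for only one field, wrap it in a PerFieldSimilarityWrapper.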
You could use NormsFieldExistsQuery
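Reading the norms back at search time is straightforward off the leaf readers; decoding the value assumes BM25's default SmallFloat encoding (a sketch — "title" is a placeholder field name):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.SmallFloat;

public class NormScanner {
  // Sketch: scan the norms of "title" and recover the (lossy) token count,
  // assuming the default BM25Similarity 1-byte encoding was used at index time.
  static void scanTitleLengths(IndexReader reader) throws IOException {
    for (LeafReaderContext ctx : reader.leaves()) {
      NumericDocValues norms = ctx.reader().getNormValues("title");
      if (norms == null) continue; // no norms for this field in this segment
      while (norms.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        int approxLength = SmallFloat.byte4ToInt((byte) norms.longValue());
        // e.g. approxLength >= 3 would approximate "3 or more tokens"
      }
    }
  }
}
```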
This comes up occasionally; it’d be a neat thing to add to Solr if you’re
motivated. It gets tricky though.
- Part of the config would have to be the name of the length field to put the
result into; that part’s easy.
- The trickier part is “when should the count be incremented?”. For instance,
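On the Solr side, such a thing would likely live in an update request processor. A minimal sketch against Solr's UpdateRequestProcessor API — the field names are placeholders, and the whitespace split is a stand-in for the real analysis question raised above:

```java
import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

// Sketch: write a token count for "title" into "title_len" at index time.
// Whitespace splitting stands in for real analysis, which is the tricky part.
public class TokenCountProcessor extends UpdateRequestProcessor {

  public TokenCountProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    Object title = doc.getFieldValue("title");
    if (title != null) {
      doc.setField("title_len", title.toString().trim().split("\\s+").length);
    }
    super.processAdd(cmd); // pass the document down the chain
  }
}
```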
That is a clever idea. I would still prefer something cleaner but this
could work. Thanks!
On Sat, Dec 28, 2019 at 10:11 PM Michael Sokolov wrote:
I don't know of any pre-existing thing that does exactly this, but how
about a token filter that counts tokens (or positions maybe), and then
appends some special token encoding the length?
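That filter could look roughly like this (a sketch against the Lucene 8.x analysis API; the "__len_" marker prefix is an invented convention, not anything Lucene defines):

```java
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Sketch: count tokens as they stream by, then append one synthetic
// token like "__len_3" after the last real token.
public final class TokenCountFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private int count = 0;
  private boolean emittedLength = false;

  public TokenCountFilter(TokenStream in) {
    super(in);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (input.incrementToken()) {
      count++;
      return true;
    }
    if (!emittedLength) {
      emittedLength = true;
      clearAttributes();
      termAtt.setEmpty().append("__len_").append(Integer.toString(count));
      return true;
    }
    return false;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    count = 0;
    emittedLength = false;
  }
}
```

At search time, "titles with exactly 3 tokens" then becomes a TermQuery for `__len_3`; matching "3 or more" would need a disjunction over the plausible counts, or a different encoding of the length.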
On Sat, Dec 28, 2019, 9:36 AM Matt Davis wrote:
Hello,
I was wondering if it is possible to search for the number of tokens in a
text field. For example, find book titles with 3 or more words. I don't
mind adding a field that is the number of tokens to the search index, but I
would like to avoid analyzing the text twice. Can Lucene search