Re: Searching number of tokens in text field

2020-01-02 Thread Matt Davis
Thanks Mike, that is very helpful. Am I reading the code correctly that the lossy norm encoding is done in the similarity? How do you set the number of bytes used for the norms?

Thanks,
Matt

On Thu, Jan 2, 2020 at 10:31 AM Michael McCandless <luc...@mikemccandless.com> wrote:
> Norms encode th

Re: Searching number of tokens in text field

2020-01-02 Thread Michael McCandless
Norms encode the number of tokens in the field, but in a lossy manner (1 byte by default), so you could probably create a custom query that filters on that, if you can tolerate the loss in precision? Or maybe change your norms storage to use more precision? You could use NormsFieldExistsQuer
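To make "lossy, 1 byte" concrete, here is a minimal sketch of how a token count can be squeezed into a single byte and what that does to a length filter. The scheme below (exact up to 15, then only the top four significant bits kept) is a hypothetical illustration and is not claimed to be the encoding Lucene actually uses; the names `encode_length` and `decode_length` are made up for this example.

```python
def encode_length(n: int) -> int:
    """Map a non-negative token count to a code in 0..255 (hypothetical scheme)."""
    if n < 0:
        raise ValueError("token count must be non-negative")
    if n < 16:
        return n                      # small counts are stored exactly
    shift = n.bit_length() - 4        # number of low bits that get discarded
    mantissa = (n >> shift) & 0x7     # top 4 bits, minus the implicit leading 1
    return min(255, 16 + (shift - 1) * 8 + mantissa)  # clamp to one byte


def decode_length(code: int) -> int:
    """Return the smallest token count that maps to this code."""
    if code < 16:
        return code
    shift = (code - 16) // 8 + 1
    mantissa = (code - 16) % 8
    return (8 | mantissa) << shift
```

Under this assumed scheme a count of 3 round-trips exactly, while 17 decodes as 16 and 1000 decodes as 960. A filter such as "3 or more tokens" would therefore be exact here, whereas "exactly 17 tokens" could not be told apart from "exactly 16". Whether the same holds for a real index depends on the similarity in use and should be checked there.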

Re: Searching number of tokens in text field

2019-12-30 Thread Erick Erickson
This comes up occasionally; it’d be a neat thing to add to Solr if you’re motivated. It gets tricky though.

- Part of the config would have to be the name of the length field to put the result into; that part’s easy.
- The trickier part is “when should the count be incremented?”. For instance,

Re: Searching number of tokens in text field

2019-12-29 Thread Matt Davis
That is a clever idea. I would still prefer something cleaner, but this could work. Thanks!

On Sat, Dec 28, 2019 at 10:11 PM Michael Sokolov wrote:
> I don't know of any pre-existing thing that does exactly this, but how
> about a token filter that counts tokens (or positions maybe), and then
>

Re: Searching number of tokens in text field

2019-12-28 Thread Michael Sokolov
I don't know of any pre-existing thing that does exactly this, but how about a token filter that counts tokens (or positions maybe), and then appends some special token encoding the length?

On Sat, Dec 28, 2019, 9:36 AM Matt Davis wrote:
> Hello,
>
> I was wondering if it is possible to search f
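The counting-filter idea can be sketched without any search library: pass the tokens through once, count them on the way, and emit one extra marker token at the end that carries the count. Everything below is an illustrative assumption, including the `__len__` marker prefix, the whitespace `analyze` function, and the toy in-memory inverted index; it is not Lucene's TokenFilter API.

```python
from typing import Dict, Iterable, Iterator, List, Set

LENGTH_PREFIX = "__len__"  # hypothetical marker; must not collide with real terms


def with_length_token(tokens: Iterable[str]) -> Iterator[str]:
    """Pass tokens through unchanged, then append one token encoding the count."""
    count = 0
    for tok in tokens:
        count += 1
        yield tok
    yield f"{LENGTH_PREFIX}{count}"


def analyze(text: str) -> List[str]:
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Toy inverted index; each document is analyzed exactly once."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in with_length_token(analyze(text)):
            index.setdefault(term, set()).add(doc_id)
    return index


def docs_with_at_least(index: Dict[str, Set[int]], min_tokens: int) -> Set[int]:
    """Union the postings of every length marker whose count is >= min_tokens."""
    hits: Set[int] = set()
    for term, doc_ids in index.items():
        suffix = term[len(LENGTH_PREFIX):]
        if term.startswith(LENGTH_PREFIX) and suffix.isdigit():
            if int(suffix) >= min_tokens:
                hits |= doc_ids
    return hits
```

With titles such as "Dune", "Brave New World" and "The Old Man and the Sea" indexed under ids 1, 2 and 3, `docs_with_at_least(index, 3)` selects ids 2 and 3. The text is tokenized only once, which was the constraint in the original question; the cost is that a "3 or more" query has to enumerate the marker terms rather than run a numeric range.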

Searching number of tokens in text field

2019-12-28 Thread Matt Davis
Hello,

I was wondering if it is possible to search for the number of tokens in a text field. For example, find book titles with 3 or more words. I don't mind adding a field that is the number of tokens to the search index, but I would like to avoid analyzing the text two times. Can Lucene search