Yeah, but part of the problem is that an input string is not converted to "words" until analysis, which doesn't happen until after Solr creates the Lucene Document and hands it off to Lucene. In other words (Ha!Ha!), there are no words during the Solr side of indexing. That said, you can always fake it by writing a JavaScript StatelessScriptUpdateProcessorFactory script that simulates basic tokenization: converting punctuation to white space, trimming and eliminating excess white space, and then doing a split and counting the results. Or, we could add a new update processor that does exactly that - CountWordsUpdateProcessorFactory. Much like FieldLengthUpdateProcessorFactory... maybe it could be an option on FLUPF - count="words/chars".
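
For example, here's a rough sketch of what such a script might look like (untested; the field names "title" and "title_len" are just placeholders, matching Walter's suggestion below):

    // wordcount.js - registered in an update chain in solrconfig.xml via
    // solr.StatelessScriptUpdateProcessorFactory, e.g. <str name="script">wordcount.js</str>
    function processAdd(cmd) {
      var doc = cmd.solrDoc;                    // the SolrInputDocument being indexed
      var value = doc.getFieldValue("title");
      if (value != null) {
        // crude "tokenization": punctuation -> space, collapse/trim white space, then split
        var text = String(value).replace(/[^A-Za-z0-9]+/g, " ").trim();
        var count = (text.length == 0) ? 0 : text.split(" ").length;
        doc.setField("title_len", count);       // assumes an int field "title_len" in the schema
      }
    }
    // Define the other update-lifecycle functions as no-ops as well.
    function processDelete(cmd) { }
    function processMergeIndexes(cmd) { }
    function processCommit(cmd) { }
    function processRollback(cmd) { }
    function finish() { }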

-- Jack Krupansky

-----Original Message----- From: Walter Underwood
Sent: Thursday, June 06, 2013 9:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Filtering on results with more than N words.

Someone else asked about this recently. The best approach is to count the words at index time and add a field with the count, e.g. "title" and "title_len", or something like that.
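
Then "more than N words" becomes an ordinary range filter on that field at query time, e.g. for N=5 (assuming title_len is a plain integer field):

    fq=title_len:[6 TO *]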

wunder

On Jun 6, 2013, at 4:20 AM, Jack Krupansky wrote:

I don't recall seeing any such filter. Sounds like a good idea, though. Although maybe it is another good idea that isn't really necessary for solving many real-world problems.

-- Jack Krupansky

-----Original Message----- From: Dotan Cohen
Sent: Thursday, June 06, 2013 3:45 AM
To: solr-user@lucene.apache.org
Subject: Filtering on results with more than N words.

Is there any way to restrict the search results to only those documents with more than N words / tokens in the searched field? I thought that this would be an easy one to Google for, but I cannot figure it out or find any references. There are many references to word size in characters, but not to field size in words.

Thank you.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


