Yeah, but part of the problem is that an input string is not converted to
"words" until analysis, which doesn't happen until after Solr creates the
Lucene Document and hands it off to Lucene. In other words (Ha!Ha!), there
are no words on the Solr side of indexing. That said, you can always
fake it by writing a JavaScript StatelessScriptUpdateProcessorFactory script
that simulates basic tokenization, like converting punctuation to white
space, trimming and eliminating excess white space, and then doing a split
and counting the results. Or, we could add a new update processor that did
exactly that - CountWordsUpdateProcessorFactory. Much like
FieldLengthUpdateProcessorFactory... maybe it could be an option on FLUPF -
count="words/chars".
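To make the script idea concrete, here is a rough sketch of the counting logic such a script might use. The field names ("title", "title_len") are placeholders, and the processAdd wiring is only an outline of how the StatelessScriptUpdateProcessorFactory hook would look:

```javascript
// Simulate basic tokenization: punctuation -> whitespace, collapse
// excess whitespace, trim, split, count.
function countWords(text) {
  var cleaned = String(text)
    .replace(/[^\w\s]|_/g, ' ')  // turn punctuation into spaces
    .replace(/\s+/g, ' ')        // collapse runs of whitespace
    .trim();                     // drop leading/trailing spaces
  return cleaned.length === 0 ? 0 : cleaned.split(' ').length;
}

// Inside the update-processor script, processAdd would look roughly like:
function processAdd(cmd) {
  var doc = cmd.solrDoc;                      // the SolrInputDocument
  var v = doc.getFieldValue('title');         // hypothetical source field
  if (v !== null) {
    doc.setField('title_len', countWords(v)); // store the word count
  }
}
```

This is only an approximation of real analysis, of course - it won't match what a tokenizer chain does with things like hyphens or CJK text - but it is usually close enough for a length filter.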
-- Jack Krupansky
-----Original Message-----
From: Walter Underwood
Sent: Thursday, June 06, 2013 9:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Filtering on results with more than N words.
Someone else asked about this recently. The best approach is to count the
words at index time and add a field with the count, so for "title" you'd
add "title_len", or something like that.
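Once such a count field exists, the filtering becomes an ordinary numeric range filter query. A hypothetical example, assuming an integer field named "title_len" was populated with the word count at index time, restricting results to documents with at least 5 words:

```
fq=title_len:[5 TO *]
```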
wunder
On Jun 6, 2013, at 4:20 AM, Jack Krupansky wrote:
I don't recall seeing any such filter. Sounds like a good idea though.
Although, maybe it is another good idea that really isn't too necessary
for solving many real world problems.
-- Jack Krupansky
-----Original Message----- From: Dotan Cohen
Sent: Thursday, June 06, 2013 3:45 AM
To: solr-user@lucene.apache.org
Subject: Filtering on results with more than N words.
Is there any way to restrict the search results to only those
documents with more than N words / tokens in the searched field? I
thought that this would be an easy one to Google for, but I cannot
figure it out or find any references. There are many references to
word size in characters, but not to field size in words.
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com