I was thinking of counting the words before the field is indexed. It is quite 
possible that splitting on white space would be sufficient.
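
A minimal client-side sketch of that idea. It assumes documents are built as Python dicts before being sent to Solr. The field names `title` and `title_len` follow the suggestion further down this thread, and `add_word_count` is a hypothetical helper, not a Solr API.

```python
def add_word_count(doc, field="title", count_field="title_len"):
    """Return a copy of doc with a word count for `field`, split on white space."""
    text = doc.get(field, "")
    out = dict(doc)
    # str.split() with no argument splits on runs of white space and ignores
    # leading/trailing white space, so "  a  b " counts as 2 words.
    out[count_field] = len(text.split())
    return out

doc = add_word_count({"id": "1", "title": "Filtering on results with more than N words"})
print(doc["title_len"])  # 8
```

The count is stored alongside the original text, so it can be indexed as an ordinary integer field and filtered on at query time.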

Of course, some idea of what problem this is supposed to solve would be very 
helpful.

wunder

On Jun 6, 2013, at 7:07 AM, Jack Krupansky wrote:

> Yeah, but part of the problem is that an input string is not converted to 
> "words" until analysis, which doesn't happen until after Solr creates the 
> Lucene Document and hands it off to Lucene. In other words (Ha!Ha!), there 
> are no words during the Solr-side of indexing. That said, you can always fake 
> it by writing a JavaScript StatelessScriptUpdateProcessorFactory script that 
> simulates basic tokenization, like converting punctuation to white space,
> trimming and eliminating excess white space, and then splitting and counting
> the results. Or, we could add a new update processor that did exactly that - 
> CountWordsUpdateProcessorFactory. Much like 
> FieldLengthUpdateProcessorFactory... maybe it could be an option on FLUPF - 
> count="words/chars".
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: Walter Underwood
> Sent: Thursday, June 06, 2013 9:54 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Filtering on results with more than N words.
> 
> Someone else asked about this recently. The best approach is to count the 
> words at index time and add a field with the count, so "title" and 
> "title_len" or something like that.
> 
> wunder
> 
> On Jun 6, 2013, at 4:20 AM, Jack Krupansky wrote:
> 
>> I don't recall seeing any such filter. Sounds like a good idea, though.
>> Then again, maybe it is another of those good ideas that isn't really
>> necessary for solving many real-world problems.
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message----- From: Dotan Cohen
>> Sent: Thursday, June 06, 2013 3:45 AM
>> To: solr-user@lucene.apache.org
>> Subject: Filtering on results with more than N words.
>> 
>> Is there any way to restrict the search results to only those
>> documents with more than N words / tokens in the searched field? I
>> thought that this would be an easy one to Google for, but I cannot
>> figure it out or find any references. There are many references to
>> word size in characters, but not to field size in words.
>> 
>> Thank you.
>> 
>> --
>> Dotan Cohen
>> 
>> http://gibberish.co.il
>> http://what-is-what.com
> 
> 
> 

--
Walter Underwood
wun...@wunderwood.org
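
A rough sketch of the approach Jack describes (convert punctuation to white space, trim, split, count). It is written in Python for client-side use, not as the JavaScript StatelessScriptUpdateProcessorFactory script he mentions. The helpers `count_words` and `more_than_n_words_filter` are hypothetical. The filter assumes the count is stored in an integer field such as `title_len`, as Walter suggests, and uses Solr's standard range query syntax.

```python
import re

# Any run of non-word characters (anything other than letters, digits, underscore)
# is treated as a separator. This is a crude stand-in for real analysis: it splits
# "doesn't" into two words and knows nothing about the field's actual analyzer.
_SEPARATORS = re.compile(r"\W+")

def count_words(text):
    """Approximate the token count: punctuation becomes white space, then split and count."""
    normalized = _SEPARATORS.sub(" ", text)
    # str.split() with no argument also trims and collapses excess white space.
    return len(normalized.split())

def more_than_n_words_filter(count_field, n):
    """Build a filter query matching documents whose integer count field is greater than n."""
    # Inclusive lower bound n + 1 with an open upper bound is equivalent to count > n.
    return f"{count_field}:[{n + 1} TO *]"

print(count_words("In other words (Ha!Ha!), there are no words"))  # 9
print(more_than_n_words_filter("title_len", 10))  # title_len:[11 TO *]
```

The filter string would be passed as an `fq` parameter, e.g. `fq=title_len:[11 TO *]` to keep only documents whose title has more than 10 words. Because the count is computed outside Solr's analysis chain, it can differ from the number of tokens the field actually indexes.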