Yeah, but part of the problem is that an input string is not converted to
"words" until analysis, which doesn't happen until after Solr creates the
Lucene Document and hands it off to Lucene. In other words (Ha!Ha!), there
are no words on the Solr side of indexing. That said, you can always
fake it by writing a JavaScript StatelessScriptUpdateProcessorFactory script
that simulates basic tokenization, like converting punctuation to white
space, trimming and eliminating excess white space, and then doing a split
and counting the results. Or, we could add a new update processor that did
exactly that - CountWordsUpdateProcessorFactory. Much like
FieldLengthUpdateProcessorFactory... maybe it could be an option on FLUPF -
count="words/chars".
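To make the script idea concrete, here is a rough sketch of the counting logic such a script might use. The field names ("title", "title_len") are placeholders, and the processAdd wiring is only an outline of how the StatelessScriptUpdateProcessorFactory hook would look:

```javascript
// Simulate basic tokenization: punctuation -> whitespace, collapse
// excess whitespace, trim, split, count.
function countWords(text) {
  var cleaned = String(text)
    .replace(/[^\w\s]|_/g, ' ')  // turn punctuation into spaces
    .replace(/\s+/g, ' ')        // collapse runs of whitespace
    .trim();                     // drop leading/trailing spaces
  return cleaned.length === 0 ? 0 : cleaned.split(' ').length;
}

// Inside the update-processor script, processAdd would look roughly like:
function processAdd(cmd) {
  var doc = cmd.solrDoc;                      // the SolrInputDocument
  var v = doc.getFieldValue('title');         // hypothetical source field
  if (v !== null) {
    doc.setField('title_len', countWords(v)); // store the word count
  }
}
```

This is only an approximation of real analysis, of course - it won't match what a tokenizer chain does with things like hyphens or CJK text - but it is usually close enough for a length filter.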
-- Jack Krupansky
-----Original Message-----
From: Walter Underwood
Sent: Thursday, June 06, 2013 9:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Filtering on results with more than N words.
Someone else asked about this recently. The best approach is to count the
words at index time and add a field with the count, so for "title" you'd
add "title_len", or something like that.
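Once such a count field exists, the filtering becomes an ordinary numeric range filter query. A hypothetical example, assuming an integer field named "title_len" was populated with the word count at index time, restricting results to documents with at least 5 words:

```
fq=title_len:[5 TO *]
```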
wunder
On Jun 6, 2013, at 4:20 AM, Jack Krupansky wrote:
I don't recall seeing any such filter. Sounds like a good idea though.
Although, maybe it is another good idea that really isn't too necessary
for solving many real world problems.
-- Jack Krupansky
-----Original Message----- From: Dotan Cohen
Sent: Thursday, June 06, 2013 3:45 AM
To: solr-user@lucene.apache.org
Subject: Filtering on results with more than N words.
Is there any way to restrict the search results to only those
documents with more than N words / tokens in the searched field? I
thought that this would be an easy one to Google for, but I cannot
figure it out or find any references. There are many references to
word size in characters, but not to field size in words.
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com