Re: Which Tokeniser (and/or filter)

Erik Hatcher Tue, 07 Feb 2012 13:03:58 -0800

A custom tokenizer/tokenfilter could set the position increment when a newline 
comes through as well.
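
A rough way to see the effect of such a filter is to simulate the positions it would assign. The sketch below is plain Python, not Lucene or Solr code: the function names and the gap of 100 are assumptions chosen to mirror the positionIncrementGap="100" value quoted further down, and the tokenisation is a naive word split.

```python
import re

NEWLINE_GAP = 100  # assumed value, mirrors positionIncrementGap="100" in the thread


def positions_with_newline_gap(text, gap=NEWLINE_GAP):
    """Return (token, position) pairs for `text`.

    The first token after a newline receives an extra position
    increment of `gap`; every other token advances by 1.
    """
    out = []
    pos = -1
    crossed_newline = False
    for idx, line in enumerate(text.split("\n")):
        if idx > 0:
            crossed_newline = True
        for tok in re.findall(r"\w+", line.lower()):
            # Only bump when a newline was crossed and a token already exists.
            increment = 1 + gap if (crossed_newline and out) else 1
            pos += increment
            out.append((tok, pos))
            crossed_newline = False
    return out


def phrase_matches(tokens, phrase):
    """Exact (slop 0) phrase check: the words must sit at consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return False
    index = {}
    for tok, pos in tokens:
        index.setdefault(tok, set()).add(pos)
    return any(
        all(start + i in index.get(w, ()) for i, w in enumerate(words))
        for start in index.get(words[0], ())
    )


doc = "i am fluent\ngerman racing"
tokens = positions_with_newline_gap(doc)
print(tokens)
print(phrase_matches(tokens, "fluent german"))
print(phrase_matches(tokens, "german racing"))
```

With the gap in place, "fluent" and "german" are no longer adjacent, so the exact phrase stops matching across the line break while phrases within a line still match.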


   Erik

On Feb 7, 2012, at 15:28, Erick Erickson <erickerick...@gmail.com> wrote:

> Well, this is a common approach. Someone has to split up the
> input as "sentences" (whatever they are). Putting them in multi-valued
> fields is trivial.
> 
> Then, to confine matches to within a sentence, you search with phrase
> queries whose slop is less than your positionIncrementGap...
> 
> Best
> Erick
> 
> On Tue, Feb 7, 2012 at 12:27 PM, Robert Brown <r...@intelcompute.com> wrote:
>> This all seems a bit too much work for such a real-world scenario?
>> 
>> 
>> ---
>> 
>> IntelCompute
>> Web Design & Local Online Marketing
>> 
>> http://www.intelcompute.com
>> 
>> 
>> On Tue, 7 Feb 2012 05:11:01 -0800 (PST), Ahmet Arslan
>> <iori...@yahoo.com> wrote:
>>>> I'm still finding matches across
>>>> newlines
>>>> 
>>>> index...
>>>> 
>>>> i am fluent
>>>> german racing
>>>> 
>>>> search...
>>>> 
>>>> "fluent german"
>>>> 
>>>> Any suggestions?
>>> 
>>> You can use a multiValued field for this. Split your document
>>> on newlines on the client side.
>>> 
>>> <arr>i am fluent</arr>
>>> <arr>german racing</arr>
>>> 
>>> positionIncrementGap="100" will prevent the phrase query "fluent german" from matching.
>>> 
>>> Or maybe you can inject artificial tokens via
>>> 
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory
>>> 
>>> Your document becomes : i am fluent NEWLINE german racing
>>
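
The multiValued approach quoted above, together with the "slop less than the gap" rule, can be illustrated with a small simulation. This is a plain-Python sketch under stated assumptions, not Solr's implementation: `index_values` and `sloppy_match` are hypothetical helpers, tokens are split on whitespace, and the slop check is a simplified in-order approximation of a sloppy phrase query.

```python
from itertools import product


def index_values(values, gap=100):
    """Assign token positions across the values of one multi-valued field.

    `gap` plays the role of positionIncrementGap: it is added to the
    running position before each value after the first.
    """
    postings = {}
    pos = -1
    for n, value in enumerate(values):
        if n > 0:
            pos += gap
        for tok in value.lower().split():
            pos += 1
            postings.setdefault(tok, []).append(pos)
    return postings


def sloppy_match(postings, phrase, slop=0):
    """Simplified sloppy phrase check (in-order terms only).

    Matches when the terms appear in order and the number of extra
    positions between the first and last term is at most `slop`.
    """
    words = phrase.lower().split()
    if not words:
        return False
    lists = [postings.get(w, []) for w in words]
    for combo in product(*lists):
        in_order = all(a < b for a, b in zip(combo, combo[1:]))
        if in_order and (combo[-1] - combo[0]) - (len(words) - 1) <= slop:
            return True
    return False


postings = index_values(["i am fluent", "german racing"], gap=100)
print(sloppy_match(postings, "fluent german", slop=0))
print(sloppy_match(postings, "fluent german", slop=99))
print(sloppy_match(postings, "i fluent", slop=1))
```

In this model, a slop below the gap keeps matches inside a single value, while a slop equal to or above the gap lets a phrase span two values again.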
