Re: Skip first word

Finotti Simone Fri, 27 Jul 2012 02:53:59 -0700

Brilliant!
Thank you very much :)

________________________________________
From: Chantal Ackermann [c.ackerm...@it-agenten.com]
Sent: Friday, 27 July 2012 11:20
To: solr-user@lucene.apache.org
Subject: Re: Skip first word


Hi Simone,

no, I meant that you populate both fields with the same input, which is best 
done via a copyField directive.

The first field will contain ngrams of size 1 and 2. The other field will 
contain ngrams of size 3 and longer (you might want to set a decent maxsize 
there).

The query for the autocomplete list uses the first field when the input (typed 
in by the user) is one or two characters long. Your examples were "D" or "G", 
and then "Do" or "Ga". Such a query searches only the single-token field, which 
for the input "Dolce & Gabbana" contains only the ngrams "D" and "Do". So only 
the input "D" or "Do" would result in a hit on "Dolce & Gabbana".
Once the user has typed the third letter ("Dol" or "Gab"), you query the 
second, more tokenized field, which for "Dolce & Gabbana" would contain the 
ngrams "Dol", "Dolc", "Dolce", "Gab", "Gabb", "Gabba", etc.
Both inputs "Gab" and "Dol" would then return "Dolce & Gabbana".
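
To make the tokenization described above concrete, here is a small Python 
sketch that imitates the terms the two fields would roughly contain. It is not 
Solr code: the helper names (edge_ngrams, short_field_terms, long_field_terms) 
are made up for illustration, and the upper bound of 10 is only an example 
value.

```python
def edge_ngrams(token, min_size, max_size):
    """Return the front edge n-grams of token with lengths min_size..max_size."""
    upper = min(max_size, len(token))
    return [token[:n] for n in range(min_size, upper + 1)]


def short_field_terms(text):
    # Imitates a keyword tokenizer: the whole input is one single token.
    return edge_ngrams(text, 1, 2)


def long_field_terms(text, max_size=10):
    # Imitates a whitespace tokenizer: split on whitespace, then n-gram
    # each token. Tokens shorter than 3 characters (like "&") yield nothing.
    terms = []
    for token in text.split():
        terms.extend(edge_ngrams(token, 3, max_size))
    return terms


brand = "Dolce & Gabbana"
print(short_field_terms(brand))  # ['D', 'Do']
print(long_field_terms(brand))
```

In this sketch "G" and "Ga" are not among the short-field terms, while "Gab" 
is among the long-field terms, which is the behaviour asked for.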

1. First field type:

<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="2" 
side="front"/>

2. Second field type:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- maybe add WordDelimiter etc. -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10" 
side="front"/>

3. field declarations:

<field name="short_prefix" type="short_ngram" … />
<field name="long_prefix" type="long_ngram" … />

<copyField source="short_prefix" dest="long_prefix" />
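
On the query side, the choice between the two fields could be made by the 
client. The following is a minimal sketch, assuming the field names 
short_prefix and long_prefix from the declarations above, a single-word input, 
and a query analyzer that does not n-gram the input again. pick_field and 
build_query are hypothetical helper names, and the escaping is deliberately 
simplistic.

```python
def pick_field(user_input):
    """Choose the field to query based on how many characters were typed."""
    return "short_prefix" if len(user_input) < 3 else "long_prefix"


def build_query(user_input):
    # Quote the input; escape backslashes and double quotes only
    # (a simplification, not a complete query-syntax escaper).
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    return '{}:"{}"'.format(pick_field(user_input), escaped)


print(build_query("Do"))   # short_prefix:"Do"
print(build_query("Gab"))  # long_prefix:"Gab"
```

The resulting string would be passed as the q parameter of the autocomplete 
request.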


Chantal

On 27.07.2012 at 11:05, Finotti Simone wrote:

> Hi Chantal,
>
> if I understand correctly, this implies that I have to populate different 
> fields according to their length. Since I'm not aware of any logical 
> condition you can apply to the copyField directive, it means that this logic 
> has to be implemented by the process that populates the Solr core. Is this 
> assumption correct?
>
> That's kind of bad, because I'd like to keep this kind of "rule" in the Solr 
> configuration. Of course, if that's the only way... :)
>
> Thank you
>
> ________________________________________
> From: Chantal Ackermann [c.ackerm...@it-agenten.com]
> Sent: Thursday, 26 July 2012 18:32
> To: solr-user@lucene.apache.org
> Subject: Re: Skip first word
>
> Hi,
>
> use two fields:
> 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for 
> inputs of length < 3,
> 2. the other one tokenized as appropriate with minsize=3 and longer for all 
> longer inputs
>
>
> Cheers,
> Chantal
>
>
> On 26.07.2012 at 09:05, Finotti Simone wrote:
>
>> Hi Ahmet,
>> business asked me to apply EdgeNGram with minGramSize=1 on the first term 
>> and with minGramSize=3 on the latter terms.
>>
>> We are developing a search suggestion mechanism. The idea is that if the 
>> user types "D", the engine should suggest "Dolce & Gabbana", but if they 
>> type "G", it should suggest other brands. Only if the user types "Gab" 
>> should it suggest "Dolce & Gabbana".
>>
>> Thanks
>> S
>> ________________________________________
>> From: Ahmet Arslan [iori...@yahoo.com]
>> Sent: Wednesday, 25 July 2012 18:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Skip first word
>>
>>> is there a tokenizer and/or a combination of filter to
>>> remove the first term from a field?
>>>
>>> For example:
>>> The quick brown fox
>>>
>>> should be tokenized as:
>>> quick
>>> brown
>>> fox
>>
>> There is no such filter that I know of. However, you can implement one by 
>> modifying the source code of LengthFilterFactory or StopFilterFactory. They 
>> both remove tokens. Out of curiosity, what is the use case for this?
>>
