Brilliant! Thank you very much :)

________________________________________
From: Chantal Ackermann [c.ackerm...@it-agenten.com]
Sent: Friday, 27 July 2012 11:20
To: solr-user@lucene.apache.org
Subject: Re: Skip first word

Hi Simone,

no, I meant that you populate the two fields with the same input, which is best done via the copyField directive.

The first field will contain ngrams of size 1 and 2. The other field will contain ngrams of size 3 and longer (you might want to set a sensible maximum size there).

The query for the autocomplete list uses the first field when the input typed in by the user is one or two characters long. Your example was "D" or "G", and then "Do" or "Ga". That query searches only the single-token field, which for the input "Dolce & Gabbana" contains only the ngrams "D" and "Do". So only the input "D" or "Do" would result in a hit on "Dolce & Gabbana".

Once the user has typed the third letter, "Dol" or "Gab", you query the second, more tokenized field, which would contain for "Dolce & Gabbana" the ngrams "Dol", "Dolc", "Dolce", "Gab", "Gabb", "Gabba", etc. Both inputs "Gab" and "Dol" would then return "Dolce & Gabbana".

1. First field type:

   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="2" side="front"/>

2. Second field type:

   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <!-- maybe add WordDelimiter etc. -->
   <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10" side="front"/>

3. Field declarations:

   <field name="short_prefix" type="short_ngram" … />
   <field name="long_prefix" type="long_ngram" … />
   <copyField source="short_prefix" dest="long_prefix" />

Chantal

On 27.07.2012 at 11:05, Finotti Simone wrote:

> Hi Chantal,
>
> if I understand correctly, this implies that I have to populate different
> fields according to their length. Since I'm not aware of any logical
> condition you can apply to the copyField directive, it means that this logic
> has to be implemented by the process that populates the Solr core. Is this
> assumption correct?
>
> That's kind of bad, because I'd like to have this kind of "rules" in the Solr
> configuration. Of course, if that's the only way...
> :)
>
> Thank you
>
> ________________________________________
> From: Chantal Ackermann [c.ackerm...@it-agenten.com]
> Sent: Thursday, 26 July 2012 18:32
> To: solr-user@lucene.apache.org
> Subject: Re: Skip first word
>
> Hi,
>
> use two fields:
> 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for
>    inputs of length < 3,
> 2. the other one tokenized as appropriate with minsize=3 and longer for all
>    longer inputs
>
> Cheers,
> Chantal
>
> On 26.07.2012 at 09:05, Finotti Simone wrote:
>
>> Hi Ahmet,
>> business asked me to apply EdgeNGram with minGramSize=1 on the first term
>> and with minGramSize=3 on the latter terms.
>>
>> We are developing a search suggestion mechanism. The idea is that if the
>> user types "D", the engine should suggest "Dolce & Gabbana", but if the
>> user types "G", it should suggest other brands. Only if the user types
>> "Gab" should it suggest "Dolce & Gabbana".
>>
>> Thanks
>> S
>> ________________________________________
>> From: Ahmet Arslan [iori...@yahoo.com]
>> Sent: Wednesday, 25 July 2012 18:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Skip first word
>>
>>> is there a tokenizer and/or a combination of filters to
>>> remove the first term from a field?
>>>
>>> For example:
>>> The quick brown fox
>>>
>>> should be tokenized as:
>>> quick
>>> brown
>>> fox
>>
>> There is no such filter that I know of. However, you can implement one by
>> modifying the source code of LengthFilterFactory or StopFilterFactory. They
>> both remove tokens. Out of curiosity, what is the use case for this?
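
The two-field scheme discussed in this thread can be checked outside Solr with a small simulation. What follows is a minimal Python sketch, not Solr code: the function names (edge_ngrams, short_prefix_terms, long_prefix_terms, field_for_input, suggests) are hypothetical, splitting on whitespace only approximates WhitespaceTokenizerFactory, and the sketch assumes that a token shorter than the minimum gram size produces no grams and that the typed input is matched as a single, un-ngrammed, case-sensitive term. The field names and gram sizes are the ones given in the thread.

```python
def edge_ngrams(token, min_size, max_size):
    """Return the front edge n-grams of one token, shortest first."""
    upper = min(max_size, len(token))
    # An empty list results when the token is shorter than min_size.
    return [token[:n] for n in range(min_size, upper + 1)]


def short_prefix_terms(value):
    """Terms for the 'short_prefix' field: whole value as one token, grams 1-2."""
    return edge_ngrams(value, 1, 2)


def long_prefix_terms(value):
    """Terms for the 'long_prefix' field: whitespace tokens, grams 3-10 each."""
    terms = []
    for token in value.split():
        terms.extend(edge_ngrams(token, 3, 10))
    return terms


def field_for_input(typed):
    """Pick the field to query from the length of the typed input."""
    return "short_prefix" if len(typed) < 3 else "long_prefix"


def suggests(typed, value):
    """True if the typed input would match the value in the chosen field."""
    if field_for_input(typed) == "short_prefix":
        return typed in short_prefix_terms(value)
    return typed in long_prefix_terms(value)


if __name__ == "__main__":
    brand = "Dolce & Gabbana"
    for typed in ["D", "Do", "G", "Ga", "Dol", "Gab"]:
        print(typed, field_for_input(typed), suggests(typed, brand))
```

In this sketch "D", "Do", "Dol" and "Gab" match "Dolce & Gabbana", while "G" and "Ga" do not, which is the behaviour requested in the original question. The choice between the two fields happens in the client that builds the query, not in the Solr configuration.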