Re: indexing two words, searching single word

2018-08-03 Thread Susheel Kumar
And, as you suggested, apply the stop-word filter before the shingle filter...
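
For readers of the archive, here is a minimal sketch of what such a chain could look like. It is not the configuration posted in the thread: the field type name "text_shingled", the tokenizer choice, and the file stopwords.txt are assumptions for illustration. Only outputUnigrams="true", tokenSeparator="" and maxShingleSize="2" come from the quoted messages.

```xml
<!-- Hypothetical field type; the name and stopwords.txt are assumptions -->
<fieldType name="text_shingled" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stop words are removed first, so shingles are built from the remaining tokens -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Emits the single words plus two-word shingles joined without a separator:
         "sound stage" -> sound, soundstage, stage -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2"
            outputUnigrams="true" tokenSeparator=""/>
  </analyzer>
  <analyzer type="query">
    <!-- No shingling at query time: the query "soundstage" stays one token -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

One thing worth checking in the Analysis screen: removing a stop word leaves a position gap, and as far as I know the shingle filter fills such gaps with a filler token (an underscore by default), so shingles around removed stop words may not look the way you expect.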

On Fri, Aug 3, 2018 at 8:10 AM, Clemens Wyss DEV wrote:

> [...] outputUnigrams="true" tokenSeparator=""/>
>
> seems to "work"
>
> -Original message-
> From: Clemens Wyss DEV
> Sent: Friday, 3 August 2018 13:46
> To: solr-user@lucene.apache.org
> Subject: RE: indexing two words, searching single word
>
> > Because you probably are not looking for "andthe" kind of tokens
> (unfortunately) I guess I am, as we don't know what people enter...
>
> > a shingle plus regex to remove whitespace
> sounds interesting. What would that filter chain look like? Would that be
> a type="index" analyzer?
> I guess we could shingle after stop-word filtering, and I guess
> maxShingleSize="2" would suffice
>
> -Original message-
> From: Alexandre Rafalovitch
> Sent: Friday, 3 August 2018 13:33
> To: solr-user
> Subject: Re: indexing two words, searching single word
>
> But what is your generic problem, then? Because you probably are not
> looking for "andthe" kind of tokens.
>
> However a shingle plus regex to remove whitespace can give you "anytwo
> wordstogether smooshed" tokens in the index.
>
> Regards,
>  Alex
>
>
> On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV wrote:
>
> > Hi Markus,
> > thanks for the quick answer.
> >
> > "sound stage" was just an example. We are looking for a generic
> > solution ...
> >
> > Is it "ok" to apply an NGramFilter for query-analyzing?
> >
> > [...] maxGramSize="15" />
> >
> > I guess (besides the performance impact) this reduces search results
> > accuracy?
> >
> > -Clemens
> >
> > -Original message-
> > From: Markus Jelsma
> > Sent: Friday, 3 August 2018 12:43
> > To: solr-user@lucene.apache.org
> > Subject: RE: indexing two words, searching single word
> >
> > Hello,
> >
> > If your case is English you could use synonyms to work around the
> > problem of the few compound words of the language. However, if you
> > are dealing with a Germanic compounding language, the
> > HyphenationCompoundWordTokenFilter
> > [1] or DictionaryCompoundWordTokenFilter is a better choice. The
> > former is much more flexible but has its drawbacks.
> >
> > Regards,
> > Markus
> >
> >
> > [1] https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
> >
> >
> >
> > -Original message-
> > > From:Clemens Wyss DEV 
> > > Sent: Friday 3rd August 2018 12:22
> > > To: solr-user@lucene.apache.org
> > > Subject: indexing two words, searching single word
> > >
> > > Sounds like a rather simple issue:
> > > if I index "sound stage" and search for "soundstage" I get no hits
> > >
> > > What am I doing wrong
> > > a) when indexing
> > > b) when searching
> > > ?
> > >
> > > Thx in advance
> > > - Clemens
> > >
> >
>


Re: indexing two words, searching single word

2018-08-03 Thread Alexandre Rafalovitch
But what is your generic problem, then? Because you probably are not looking
for "andthe" kind of tokens.

However a shingle plus regex to remove whitespace can give you "anytwo
wordstogether smooshed" tokens in the index.
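
A rough sketch of the "shingle plus regex" idea for the archive, offered as one possible reading and not a tested configuration. The tokenizer and lowercase filter are assumptions; the shingle filter keeps its default single-space separator, and the pattern-replace filter then strips that whitespace from each token.

```xml
<!-- Index-time analyzer only; goes inside a solr.TextField fieldType definition -->
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Two-word shingles joined by a space, plus the original single words:
       "sound stage" -> sound, "sound stage", stage -->
  <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  <!-- Remove the whitespace inside each shingle: "sound stage" -> soundstage -->
  <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="" replace="all"/>
</analyzer>
```

Setting tokenSeparator="" on the shingle filter should give the same tokens without the extra regex step, which is the variant reported as working elsewhere in this thread.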

Regards,
 Alex


On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV wrote:

> Hi Markus,
> thanks for the quick answer.
>
> "sound stage" was just an example. We are looking for a generic solution
> ...
>
> Is it "ok" to apply an NGramFilter for query-analyzing?
>
> [...] maxGramSize="15" />
>
> I guess (besides the performance impact) this reduces search results
> accuracy?
>
> -Clemens
>
> -Original message-
> From: Markus Jelsma
> Sent: Friday, 3 August 2018 12:43
> To: solr-user@lucene.apache.org
> Subject: RE: indexing two words, searching single word
>
> Hello,
>
> If your case is English you could use synonyms to work around the problem
> of the few compound words of the language. However, if you are dealing
> with a Germanic compounding language, the HyphenationCompoundWordTokenFilter
> [1] or DictionaryCompoundWordTokenFilter is a better choice. The former is
> much more flexible but has its drawbacks.
>
> Regards,
> Markus
>
>
> [1] https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
>
>
>
> -Original message-
> > From:Clemens Wyss DEV 
> > Sent: Friday 3rd August 2018 12:22
> > To: solr-user@lucene.apache.org
> > Subject: indexing two words, searching single word
> >
> > Sounds like a rather simple issue:
> > if I index "sound stage" and search for "soundstage" I get no hits
> >
> > What am I doing wrong
> > a) when indexing
> > b) when searching
> > ?
> >
> > Thx in advance
> > - Clemens
> >
>


RE: indexing two words, searching single word

2018-08-03 Thread Markus Jelsma
Hello,

If your case is English you could use synonyms to work around the problem of
the few compound words of the language. However, if you are dealing with a
Germanic compounding language, the HyphenationCompoundWordTokenFilter [1] or
DictionaryCompoundWordTokenFilter is a better choice. The former is much more
flexible but has its drawbacks.
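
As an illustration of the dictionary-based option, a query-time analyzer might look roughly like the sketch below. The file name compound-words.txt (one known word part per line, for example "sound" and "stage") and the size limits are assumptions chosen for the example, not recommended values.

```xml
<!-- Query-time analyzer only; goes inside a solr.TextField fieldType definition -->
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Keeps the original token and adds every dictionary word found inside it,
       so "soundstage" can also match the separately indexed "sound" and "stage" -->
  <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
          dictionary="compound-words.txt"
          minWordSize="5" minSubwordSize="4" maxSubwordSize="15"
          onlyLongestMatch="false"/>
</analyzer>
```

The quality of the results depends heavily on the dictionary, so checking a few real queries in the Analysis screen before relying on this is probably worthwhile.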

Regards,
Markus

[1] https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html

 
 
-Original message-
> From:Clemens Wyss DEV 
> Sent: Friday 3rd August 2018 12:22
> To: solr-user@lucene.apache.org
> Subject: indexing two words, searching single word
> 
> Sounds like a rather simple issue:
> if I index "sound stage" and search for "soundstage" I get no hits
> 
> What am I doing wrong 
> a) when indexing
> b) when searching
> ?
> 
> Thx in advance
> - Clemens
>