RE: Solr Wildcard Search
A slightly more refined answer: in my experience with the systems I've worked with, Porter and other stemmers can be useful in a "fallback field" with a really low boost, but you should be really careful if you're only searching on one field. I cannot recommend Doug Turnbull and John Berryman's "Relevant Search" enough on how to layer fields, among many other great insights: https://www.manning.com/books/relevant-search
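The "fallback field with a really low boost" idea can be sketched as query parameters for the edismax parser. This is only an illustration: the field names title_exact and title_stemmed and the boost values are assumptions, not anything taken from the thread; defType and qf are the standard edismax parameters.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query,
                         exact_field="title_exact",      # hypothetical unstemmed field
                         stemmed_field="title_stemmed",  # hypothetical Porter-stemmed field
                         exact_boost=10.0,
                         stemmed_boost=0.1):
    """Build edismax parameters that prefer exact matches over stemmed ones."""
    # The stemmed field still contributes matches, but its low boost keeps
    # stem-only hits below documents that match the unstemmed field.
    qf = f"{exact_field}^{exact_boost} {stemmed_field}^{stemmed_boost}"
    return {"defType": "edismax", "q": user_query, "qf": qf}


params = build_edismax_params("publication")
print(urlencode(params))
```

The point of the layering is that a document containing "publication" outranks one that only contains "public", instead of both being treated as equal matches.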
RE: Solr Wildcard Search
From: Allison, Timothy B. Sent: Thursday, November 30, 2017 9:20 AM
At the very least, keep the English possessive filter, which you have. Great! Depending on what your query log analysis finds (perhaps users are pretty much only searching on nouns?), you might consider EnglishMinimalStemFilterFactory. I wouldn't say that Porter was or wasn't chosen intentionally. It may be good for some use cases. However, for the use cases I've seen, it has been disastrous.
I have code that shows "equivalence sets" for analysis chain A vs. analysis chain B (with some noise, and assuming the same tokenization). I should probably share that code on GitHub or fold it into Luke somehow? You can see this on a one-off basis in the Solr admin window via the Analysis tab, but seeing it on your corpus/corpora across terms can be eye-opening, and cross-checking it against query logs is quite powerful.
On one corpus, when I compared the same analysis chain A without Porter and B with Porter, the output is e.g. "stemmed\tunstemmed #docs|unstemmed #docs...":
public	public 9834 | publication 1429 | publications 960 | publicly 662 | public's 176 | publicize 118 | publicized 107 | publicity 91 | publically 66 | publicizing 63 | publication's 6 | publicizes 4 | public_ 1 | publication_ 1 | publiced 1
effect	effective 6329 | effect 3157 | effectively 1745 | effectiveness 1198 | effects 831 | effected 139 | effecting 85 | effectives 1
new	new 13279 | newness 6 | newed 3 | newe 2 | newing 1
order	order 7256 | orders 3125 | ordered 1840 | ordering 758 | orderly 241 | order's 17 | orderable 3 | orders_ 1
Imagine users searching for "publication" (~2500 docs) and getting back every document that mentions "public" (~10k). That's a huge problem in many circumstances. Good luck finding the name "newing".
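The "equivalence sets" comparison described in this message can be approximated in a few lines. This is a rough sketch, not the author's code: toy_stem is a hypothetical lookup table standing in for analysis chain B (it is not a real Porter implementation), and the document counts are a subset of the figures quoted in the post.

```python
from collections import defaultdict

# Hypothetical stand-in for "chain B": maps each unstemmed term to its stem.
TOY_STEMS = {
    "public": "public", "publication": "public", "publications": "public",
    "publicly": "public", "new": "new", "newing": "new",
}


def toy_stem(term):
    return TOY_STEMS.get(term, term)


def equivalence_sets(doc_freqs, stem):
    """Group unstemmed terms by stem; keep only groups that merge 2+ terms."""
    groups = defaultdict(list)
    for term, n_docs in doc_freqs.items():
        groups[stem(term)].append((term, n_docs))
    return {
        stemmed: sorted(members, key=lambda m: -m[1])  # most frequent first
        for stemmed, members in groups.items()
        if len(members) > 1
    }


def format_sets(sets):
    """Render as 'stemmed<TAB>unstemmed #docs | unstemmed #docs ...'."""
    return [
        f"{stemmed}\t" + " | ".join(f"{term} {n}" for term, n in members)
        for stemmed, members in sorted(sets.items())
    ]


doc_freqs = {"public": 9834, "publication": 1429, "publications": 960,
             "publicly": 662, "new": 13279, "newing": 1}
report = format_sets(equivalence_sets(doc_freqs, toy_stem))
for line in report:
    print(line)
```

In a real run the term/document-frequency pairs would come from the index built with chain A, and stem would be whatever chain B does to a single token.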
RE: Solr Wildcard Search
From: Georgy Nevsky Sent: Thursday, November 30, 2017 8:31 AM
I understand the reason for the stemming now. Thank you. What do you suggest using for stemming instead of "Porter"? I guess it wasn't chosen intentionally. In the best we trust, Georgy Nevsky
RE: Solr Wildcard Search
From: Allison, Timothy B. Sent: Thursday, November 30, 2017 8:25 AM
The initial question wasn't about a phrasal search, but I largely agree that different query parsers handle the analysis chain differently for multiterm queries. Yes, Porter is crazily aggressive. USE WITH CAUTION! As has been pointed out, use the Solr admin window and the "debug" option on the query to see what's going on. Use the Solr admin Analysis feature to see how your tokens are modified by each step in the analysis chain. If you use Solr admin and debug the query for "shipping", you see that it is stemmed to "ship", hence all of your matches work. Porter has no rule that reduces a bare "shipp" to "ship". So your wildcard query is looking for words that start with "shipp", and given that "shipping" was indexed as "ship", it won't find it. It would find "shippqrs", because Porter wouldn't know what to do with that. Again, Porter can be very dangerous if it doesn't align with user expectations.
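The "ship*" vs. "shipp*" behaviour can be reproduced with a toy model of the index. The sketch below assumes whitespace tokenization and lowercasing, and uses a hypothetical toy_stem lookup in place of the real Porter filter; the wildcard prefix is compared against indexed terms as typed, without stemming.

```python
def toy_stem(token):
    # Hypothetical stand-in for the stemmer's effect on these example words only.
    return {"shipping": "ship", "ships": "ship", "shipped": "ship"}.get(token, token)


def index_terms(texts):
    """Terms as they end up in the index: lowercased, then stemmed."""
    terms = set()
    for text in texts:
        for token in text.lower().split():
            terms.add(toy_stem(token))
    return terms


def prefix_matches(prefix, terms):
    """Expand a trailing-wildcard query against the indexed terms."""
    return sorted(t for t in terms if t.startswith(prefix.lower()))


terms = index_terms(["Shipping Weight", "Ship From", "Shipping Calculator"])
print(prefix_matches("ship", terms))   # the stemmed term "ship" starts with "ship"
print(prefix_matches("shipp", terms))  # nothing in the index starts with "shipp"

# A token the stemmer leaves alone is still reachable through "shipp*".
terms_with_noise = index_terms(["Shipping Weight", "shippqrs"])
print(prefix_matches("shipp", terms_with_noise))
```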
Re: Solr Wildcard Search
From: Atita Arora Sent: Thursday, November 30, 2017 8:16 AM
As Rick raised, the most important aspect here is that the phrase is broken into multiple terms ORed together. I believe that if the use case requires wildcard search on phrases, we would need to store the entire phrase as a single term in the index, which is probably not happening right now, and hence the phrases are not found when sent as phrase queries. I tried this on my local Solr 7.1: without a phrase it works as expected, but as soon as I do a phrase search it fails for the reason I mentioned above. Let me know if I can clarify further.
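The point about storing the entire phrase as a single term can be illustrated with a small model. This is a sketch under simplifying assumptions: analysis is reduced to lowercasing, the "tokenized" field splits on whitespace, the "keyword" field keeps the whole value as one term, and Python's fnmatchcase stands in for wildcard matching against a single indexed term.

```python
from fnmatch import fnmatchcase


def tokenized_terms(value):
    """Terms for a tokenized text field: one term per word."""
    return value.lower().split()


def keyword_terms(value):
    """Terms for a keyword-style field: the whole value is one term."""
    return [value.lower()]


def wildcard_hits(pattern, terms):
    """A wildcard pattern is matched against individual indexed terms."""
    return [t for t in terms if fnmatchcase(t, pattern.lower())]


value = "Shipping Weight"
tokenized_result = wildcard_hits("shipping w*", tokenized_terms(value))
keyword_result = wildcard_hits("shipping w*", keyword_terms(value))
print(tokenized_result)
print(keyword_result)
```

Neither "shipping" nor "weight" alone matches the pattern "shipping w*", so the tokenized field finds nothing, while the single-term field matches.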
RE: Solr Wildcard Search
From: Georgy Nevsky Sent: Thursday, November 30, 2017
I wish to understand whether I can do something to get the term "shipping" in the results when I search for "shipp*". Here is the field definition (the XML tags were lost in the archive; only these attribute fragments survive):
multiValued="false"/>
positionIncrementGap="100">
ignoreCase="true" words="lang/stopwords_en.txt" />
protected="protwords.txt"/>
Is anything else important? Most configuration parameters are the Apache Solr 7.1.0 defaults. In the best we trust, Georgy Nevsky
Re: Solr Wildcard Search
From: Rick Leir Sent: Thursday, November 30, 2017 7:32 AM
George, When you get those results it could be due to stemming. Wildcard processing expands your term to multiple terms, OR'd together. It also takes you down a different analysis pathway, as many analysis components do not work with multiple terms. Look into the SolrAdmin console, and use the analysis tab to understand what is going on. If you still have doubts, tell us more about your config. Cheers --Rick
-- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Solr Wildcard Search
From: Georgy Nevsky Sent: November 30, 2017 7:06:42 AM EST
Can somebody help me understand how Solr wildcard search works? If I search for the term “ship*”, I get many strings in the results, like “Shipping Weight”, “Ship From”, “Shipping Calculator”, etc. But if I search for “shipp*”, I don't get any results. In the best we trust, Georgy Nevsky
Solr Wildcard Search for large amount of text
From: octopus Sent: Sat, Jun 27, 2015
Hi, I'm looking at Solr's features for wildcard search over a large amount of text. I read on the net that solr.EdgeNGramFilterFactory is used to generate tokens for wildcard searching. For Nigerian = ni, nig, nige, niger, nigeri, nigeria, nigerian. However, I have a large amount of text that requires wildcard search, and it's not viable to use EdgeNGramFilterFactory, as the amount of processing would be too large. Do you have any suggestions/advice please? Thank you so much for your time!
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Wildcard-Search-for-large-amount-of-text-tp4214392.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Wildcard Search for large amount of text
From: Shawn Heisey Sent: Sat, Jun 27, 2015 at 10:06 AM
Both edgengrams and wildcards are ways to do this. There are advantages and disadvantages to both. To do a wildcard search, Solr (Lucene really) must look up all the matching terms in the index and substitute them into the query, so that it becomes a large number of simple string matches. If you have a large number of terms in your index, that can be slow. The expensive work (expanding the terms) is done for every single query. The edgengram filter does similar work, but it does it at *index* time rather than query time. At query time, you are doing a simple string match with one term, although the index contains many more terms, because the very expensive work was done at index time. It's difficult to know which approach will be more efficient on *your* index without experimentation, but there is a general rule when it comes to Solr performance: as much as possible, do the expensive work at index time. Thanks, Shawn
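The index-time vs. query-time trade-off described here can be sketched with two tiny in-memory indexes. This is a simplified model, not how Lucene stores or expands terms: analysis is assumed to be whitespace tokenization plus lowercasing, and the function names are made up for the illustration.

```python
def edge_ngrams(term, min_gram=2, max_gram=None):
    """Leading substrings of a term, e.g. ni, nig, nige, ... for 'nigerian'."""
    max_gram = len(term) if max_gram is None else min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, max_gram + 1)]


def build_plain_index(docs):
    """term -> set of doc ids, one entry per whole token."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(doc_id)
    return index


def build_edge_ngram_index(docs, min_gram=2):
    """gram -> set of doc ids; the expansion cost is paid once, at index time."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for gram in edge_ngrams(token, min_gram):
                index.setdefault(gram, set()).add(doc_id)
    return index


def wildcard_prefix_search(prefix, plain_index):
    """Query-time expansion: scan the terms and OR together every match."""
    hits = set()
    for term, doc_ids in plain_index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return hits


docs = {1: "Nigerian exports", 2: "Niger river", 3: "night shift"}
plain = build_plain_index(docs)
grams = build_edge_ngram_index(docs)

print(edge_ngrams("nigerian"))
print(wildcard_prefix_search("nige", plain))  # work done on every query
print(grams.get("nige", set()))               # single-term lookup
```

Both approaches return the same documents; the difference is whether the term expansion happens once per indexed token or once per query.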
Re: Solr Wildcard Search for large amount of text
That is one way to implement wildcards, but it isn't the most efficient. Just index normally, tokenized, and search with an asterisk suffix, e.g. foo*. This will build a finite state automaton that makes wildcard handling efficient. Upayavira
Re: Solr Wildcard Search for large amount of text
Try it and see ;). My experience is that wildcards work fine (although what "fine" means is up to you to decide) _if_ you require at least two leading real characters, and I actually prefer three, i.e. ab* or abc*. Note that if you need leading wildcards, use the reversed wildcard filter (ReversedWildcardFilterFactory). I will vociferously argue that single-letter wildcards are not useful anyway. I mean, probably every single document in your corpus will match every single-letter wildcard (a*, b*, whatever), providing no benefit to the user. And the need for wildcards can often be reduced or eliminated if you can use autosuggest or autocomplete. Of course, if you're trying to satisfy more complex use cases where users compose their own complex clauses, that may not apply. FWIW, Erick
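The "at least two or three leading real characters" rule is usually enforced in the application before the query reaches Solr. The following is a minimal sketch of such a check; the function names and the default of three characters are assumptions for the example, not a Solr feature.

```python
WILDCARD_CHARS = "*?"


def leading_literal_length(term):
    """Number of non-wildcard characters before the first * or ?."""
    count = 0
    for ch in term:
        if ch in WILDCARD_CHARS:
            break
        count += 1
    return count


def is_acceptable_wildcard(term, min_prefix=3):
    """Accept plain terms, and wildcard terms with a long enough literal prefix."""
    if not any(ch in term for ch in WILDCARD_CHARS):
        return True
    return leading_literal_length(term) >= min_prefix


for candidate in ["a*", "ab*", "abc*", "*abc", "abc"]:
    print(candidate, is_acceptable_wildcard(candidate))
```

A term such as *abc is rejected by this check, since a leading wildcard has no literal prefix; that case is the one the reversed wildcard filter is meant for.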
Re: Solr Wildcard Search for large amount of text
What do you want actual user queries to look like? I mean, having to explicitly write asterisks after every term is a real pain. Indexing ngrams has the advantage that phrase queries and edismax phrase boosting work automatically. Phrases don't work with explicit wildcard queries. The only real downside to ngrams is that they explode the size of the index. But memory is supposed to be cheap these days. I mean, compare the cost of the extra RAM (to keep the full index in memory) to the cost of the productivity users lose constructing queries, plus the expensive staff needed to help them figure out why various queries don't work as expected. How big is your corpus: number of documents and average document size? -- Jack Krupansky
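How much ngrams "explode the size of the index" depends on the vocabulary, and a quick count of distinct terms gives a first impression. The sketch below counts distinct edge n-grams for a made-up six-word vocabulary; it measures only the number of terms, not postings or bytes on disk, so treat it as a rough indicator.

```python
def edge_ngram_term_count(vocabulary, min_gram=2):
    """Number of distinct edge n-grams produced for a set of terms."""
    grams = set()
    for term in vocabulary:
        for n in range(min_gram, len(term) + 1):
            grams.add(term[:n])
    return len(grams)


# Hypothetical vocabulary; shared prefixes (ni, nig, ship, ...) are stored once.
vocab = {"niger", "nigeria", "nigerian", "night", "ship", "shipping"}
plain_terms = len(vocab)
gram_terms = edge_ngram_term_count(vocab)
print(plain_terms, gram_terms)
```

Running the same count over the real term dictionary, together with the document count and average document size asked about above, gives a more useful estimate than guessing.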
Re: Solr wildcard search
Also be aware that some analysis steps may not be performed on wildcard terms. The filter has to be MultiTermAware. See: https://wiki.apache.org/solr/MultitermQueryAnalysis and http://searchhub.org/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/ Best, Erick
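The effect of an analysis step being skipped for wildcard terms is easiest to see with lowercasing. The sketch below is a toy model, not Solr's implementation: the index-time chain lowercases every token, and the lowercase_multiterm flag is a made-up switch that stands for "the lowercase filter is applied to wildcard terms too".

```python
def analyze_for_index(text):
    """Index-time analysis in this toy model: whitespace split + lowercase."""
    return {token.lower() for token in text.split()}


def prefix_query(raw_prefix, indexed_terms, lowercase_multiterm):
    """Match a trailing-wildcard prefix against indexed terms.

    lowercase_multiterm is a hypothetical flag: when False, the prefix is
    used exactly as the user typed it.
    """
    prefix = raw_prefix.lower() if lowercase_multiterm else raw_prefix
    return sorted(t for t in indexed_terms if t.startswith(prefix))


terms = analyze_for_index("Google Technology")
print(prefix_query("Goo", terms, lowercase_multiterm=False))
print(prefix_query("Goo", terms, lowercase_multiterm=True))
```

With the step skipped, "Goo*" is compared against the lowercased term "google" and finds nothing; with it applied, the same query matches.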
Re: Solr wildcard search
From: Jack Krupansky Sent: Fri, Sep 13, 2013 at 12:12 PM
Wildcard applies only to a single term. The escaped space suggests that you are trying to match a wildcard on multiple terms. Try the contrib complex phrase query parser. -- Jack Krupansky
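Why the escaped space matters can be shown with a simplified model of how a query string is split into terms. This is not Solr's actual query parser, only a sketch: terms are split on whitespace that is not preceded by a backslash, and the indexed terms are assumed to be the lowercased words of the field.

```python
import re
from fnmatch import fnmatchcase


def split_query_terms(q):
    """Split on unescaped whitespace, then drop the escape characters."""
    parts = re.split(r"(?<!\\)\s+", q.strip())
    return [p.replace("\\ ", " ") for p in parts if p]


def matching_terms(query_term, indexed_terms):
    """Each query term (wildcard or not) is compared to single indexed terms."""
    return sorted(t for t in indexed_terms if fnmatchcase(t, query_term))


indexed = {"google", "technology"}

unescaped = split_query_terms("google tech*")
escaped = split_query_terms("google\\ tech*")
print(unescaped)
print(escaped)
print([matching_terms(t, indexed) for t in unescaped])
print([matching_terms(t, indexed) for t in escaped])
```

With the space escaped, the whole string becomes one wildcard term containing a space, and no single indexed term can match it.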
Solr wildcard search
From: Prasi S Sent: Friday, September 13, 2013 6:37 AM
Hi all, I am working with wildcard queries and a few things are confusing.
1. Does a wildcard search skip the analyzers on a particular field?
2. I have searched for:
q=google\ technology - gives results
q=google technology - gives results
q=google tech* - gives results
q=google\ tech* - 0 results
The debug query for the last query is <str name="parsedquery_toString">text:google tech*</str>
Why does this happen? Thanks, Prasi