RE: search with wildcard
I know it's documented that Lucene/Solr doesn't apply filters to queries with wildcards, but this seems to trip up a lot of users. I can also see why wildcards break a number of filters, but some filters (e.g. mapping charsets) could mostly or entirely work. The N-gram filter is another one that would be great to still run when there are wildcards. If you indexed 4-grams and the query is *testp*, you currently won't get any results; but the N-gram filter could have a wildcard mode that, in this case, would return just the first 4-gram as a token.

Is this something you've considered? It would have to be enabled in the core network, but disabled by default for existing filters; then it could be enabled one by one for existing filters. Apologies if the dev list is a better place for this.

Scott

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Thursday, November 21, 2013 8:40 AM
To: solr-user@lucene.apache.org
Subject: Re: search with wildcard

Hi Andreas,

If you don't want to use wildcards at query time, an alternative is to use NGrams at indexing time. This will produce a lot of tokens. For example, the 4-grams of your example:

Supertestplan = supe uper pert erte rtes *test* estp stpl tpla plan

Is that what you want? By the way, why do you want to search inside of words?

<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="4"/>
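To make the proposed "wildcard mode" a bit more concrete, here is a minimal Python sketch of the idea. It is not how Lucene or Solr behave today; `wildcard_query_gram` is a hypothetical helper that strips the wildcards from an infix pattern and keeps only the first n characters, so that the query term lines up with an indexed n-gram.

```python
def ngrams(text, n):
    """Return all contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def wildcard_query_gram(pattern, n):
    """Hypothetical 'wildcard mode': reduce a *infix* pattern to its first n-gram.

    Returns None when the literal part is shorter than n, because no
    indexed n-gram could then equal it.
    """
    literal = pattern.strip("*").lower()
    if len(literal) < n:
        return None
    return literal[:n]


# Index side: 4-grams of the example word from this thread.
indexed = set(ngrams("supertestplan", 4))

# Query side: *testp* is reduced to a single 4-gram.
query_gram = wildcard_query_gram("*testp*", 4)
print(query_gram, query_gram in indexed)
```

One consequence of keeping only the first gram is that the query becomes broader than the original pattern: any word containing that gram would match, not only words containing the full literal.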
search with wildcard
I am querying "test" in Solr 4.3.1 over the field below and it's not finding all occurrences. It seems that if it is a substring of a word like Supertestplan, it isn't found unless I use a wildcard: *test*. This is right because of my tokenizer, but does someone know a way around this? I don't want to add wildcards because that messes up queries with multiple words.

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/> <!-- remove noun/adjective inflections like plural endings -->
  </analyzer>
</fieldType>
Re: search with wildcard
Hi Andreas,

If you don't want to use wildcards at query time, an alternative is to use NGrams at indexing time. This will produce a lot of tokens. For example, the 4-grams of your example:

Supertestplan = supe uper pert erte rtes *test* estp stpl tpla plan

Is that what you want? By the way, why do you want to search inside of words?

<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="4"/>

On Thursday, November 21, 2013 5:23 PM, Andreas Owen a...@conx.ch wrote:
> I suppose I have to create another field with different tokenizers and set
> the boost very low so it doesn't really mess with my ranking, because the
> word is now in 2 fields. What kind of tokenizer can do the job?
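For readers who want to see where the token list above comes from, here is a small Python sketch that generates character n-grams for a range of sizes. It only illustrates the idea; `char_ngrams` is a made-up helper, and the order in which a real Solr filter emits its tokens may differ.

```python
def char_ngrams(token, min_size, max_size):
    """Return every character n-gram of token with min_size <= n <= max_size."""
    grams = []
    for size in range(min_size, max_size + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


# Only 4-grams, as in the list above.
four_grams = char_ngrams("supertestplan", 4, 4)
print(four_grams)

# Sizes 3 and 4 together, mirroring minGramSize=3 and maxGramSize=4.
mixed = char_ngrams("supertestplan", 3, 4)
print(len(mixed), "test" in mixed)
```

With a 13-character word, sizes 3 and 4 already give 21 tokens for a single term, which is the index growth the message warns about.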
RE: search with wildcard
I suppose I have to create another field with different tokenizers and set the boost very low so it doesn't really mess with my ranking, because the word is now in 2 fields. What kind of tokenizer can do the job?

From: Andreas Owen [mailto:a...@conx.ch]
Sent: Thursday, 21 November 2013 16:13
To: solr-user@lucene.apache.org
Subject: search with wildcard

I am querying "test" in Solr 4.3.1 over the field below and it's not finding all occurrences. It seems that if it is a substring of a word like Supertestplan, it isn't found unless I use a wildcard: *test*. This is right because of my tokenizer, but does someone know a way around this? I don't want to add wildcards because that messes up queries with multiple words.

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/> <!-- remove noun/adjective inflections like plural endings -->
  </analyzer>
</fieldType>
Re: search with wildcard
You might be able to make use of the dictionary compound word filter, but you will have to build up a dictionary of words to use:

http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html

My e-book has some examples and a better description.

-- Jack Krupansky

-----Original Message-----
From: Ahmet Arslan
Sent: Thursday, November 21, 2013 11:40 AM
To: solr-user@lucene.apache.org
Subject: Re: search with wildcard

Hi Andreas,

If you don't want to use wildcards at query time, an alternative is to use NGrams at indexing time. This will produce a lot of tokens. For example, the 4-grams of your example:

Supertestplan = supe uper pert erte rtes *test* estp stpl tpla plan

Is that what you want? By the way, why do you want to search inside of words?

<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="4"/>
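As a rough illustration of what dictionary-based decompounding does, here is a simplified Python sketch. It is not the Lucene implementation; `decompound`, its `min_subword` parameter, and the three-word dictionary are assumptions made for the example. The idea is that the original token is kept and every dictionary word found inside it is added as an extra token.

```python
def decompound(token, dictionary, min_subword=3):
    """Return the lowercased token followed by every dictionary word found inside it."""
    lowered = token.lower()
    found = []
    for start in range(len(lowered)):
        for end in range(start + min_subword, len(lowered) + 1):
            part = lowered[start:end]
            if part in dictionary:
                found.append(part)
    return [lowered] + found


# A tiny, hand-made dictionary; a real one would have to be built from your data.
dictionary = {"super", "test", "plan"}
print(decompound("Supertestplan", dictionary))
```

Compared with n-grams, this produces far fewer tokens, but a query for "test" only matches compounds whose parts are in the dictionary.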
Proximity search with wildcard
Hi, I am new to Solr. Is it possible to do a proximity search with a wildcard in Solr? For example: "comp* engage"~5.

--
View this message in context: http://lucene.472066.n3.nabble.com/Proximity-search-with-wildcard-tp4096285.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Proximity search with wildcard
Hi Sayeed,

you can use fuzzy search: comp engage~0.2.

Regards,
harshvardhan ojha

On Fri, Oct 18, 2013 at 10:28 AM, sayeed abdulsayeed...@gmail.com wrote:
> Hi, I am new to Solr. Is it possible to do a proximity search with a
> wildcard in Solr? For example: "comp* engage"~5.
Re: Proximity search with wildcard
Generally in Solr, if we give "Company engage"~5, it returns results that contain "engage" within 5 words of "Company". I want to get the same kind of results when I give the query with a wildcard, as in "Compa* engage"~5.

- Sayeed

--
View this message in context: http://lucene.472066.n3.nabble.com/Proximity-search-with-wildcard-tp4096285p4096354.html
Sent from the Solr - User mailing list archive at Nabble.com.
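To pin down what is being asked for, here is a small Python sketch of the intended matching rule: a word starting with a given prefix must occur close to another word. This is only a plain-Python illustration of the semantics, not a Solr query and not how Lucene counts slop; `words_near` and its `max_gap` parameter (the number of words allowed in between) are assumptions made for the example.

```python
def words_near(text, prefix, term, max_gap):
    """True if a word starting with prefix occurs with at most max_gap words
    between it and an occurrence of term (in either order)."""
    words = text.lower().split()
    prefix_positions = [i for i, w in enumerate(words) if w.startswith(prefix.lower())]
    term_positions = [i for i, w in enumerate(words) if w == term.lower()]
    return any(
        abs(p - t) - 1 <= max_gap
        for p in prefix_positions
        for t in term_positions
        if p != t
    )


sentence = "The company wants to engage new customers"
print(words_near(sentence, "compa", "engage", 5))
print(words_near(sentence, "compa", "engage", 1))
```

In the sample sentence two words sit between "company" and "engage", so a gap of 5 matches and a gap of 1 does not.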
Re: Solr admin search with wildcard
This is a no-op, or rather I'm not sure what it does:

<copyField source="url" dest="url"/>

This is the key:

<copyField source="iframe" dest="text"/>

But be aware that if you copy anything else into the text field you'll be searching there too. Now you can search the text field.

Assuming this is from the example, the text field uses the text_general fieldType, which is defined to use the StandardTokenizerFactory to break up the incoming stream. Take a look at the javadocs and/or http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory

The admin/analysis page will show you exactly what each step in an analyzer chain does to the input; you _really_ want to get familiar with that.

One final note: depending on your use case, you may not need any copyField at all; just use the text_general type for your iframe field. If you choose this, be sure to delete your index and re-index from scratch...

Best,
Erick

On Thu, Jun 27, 2013 at 9:41 AM, Amit Sela am...@infolinks.com wrote:
> Forgive my ignorance, but I want to be sure: do I add
> <copyField source="iframe" dest="text"/> to solrindex-mapping.xml,
> so that my solrindex-mapping.xml looks like this:
>
> <fields>
>   <field dest="content" source="content"/>
>   <field dest="title" source="title"/>
>   <field dest="iframe" source="iframe"/>
>   <field dest="host" source="host"/>
>   <field dest="segment" source="segment"/>
>   <field dest="boost" source="boost"/>
>   <field dest="digest" source="digest"/>
>   <field dest="tstamp" source="tstamp"/>
>   <field dest="id" source="url"/>
>   <copyField source="url" dest="url"/>
>   <copyField source="iframe" dest="text"/>
> </fields>
> <uniqueKey>url</uniqueKey>
>
> And what do you mean by standard tokenization? Thanks!
Solr admin search with wildcard
I'm looking to search (in the Solr admin search screen) a certain field for *youtube*. I know that leading wildcards take a lot of resources, but I'm not worried about that. My only question is about the syntax. Would this work: field:"*youtube*"?

Thanks, I'm using Solr 3.6.2
Re: Solr admin search with wildcard
No, you cannot use wildcards within a quoted term. Tell us a little more about what your strings look like. You might want to consider tokenizing or using ngrams to avoid the need for wildcards.

-- Jack Krupansky

-----Original Message-----
From: Amit Sela
Sent: Thursday, June 27, 2013 3:33 AM
To: solr-user@lucene.apache.org
Subject: Solr admin search with wildcard

I'm looking to search (in the Solr admin search screen) a certain field for *youtube*. I know that leading wildcards take a lot of resources, but I'm not worried about that. My only question is about the syntax. Would this work: field:"*youtube*"?

Thanks, I'm using Solr 3.6.2
Re: Solr admin search with wildcard
The stored and indexed string is actually a URL like "http://www.youtube.com/somethingsomething". It looks like removing the quotes does the job: iframe:*youtube*, or am I wrong?

For now, performance is not an issue, but accuracy is, and I would like to know, for example, how many URLs have an iframe source leading to YouTube. So a query like iframe:*youtube* with max rows 10 or something will return, in the numFound field of the response, the total number of pages that have an iframe tag with a source matching *youtube*, no?

On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky j...@basetechnology.com wrote:
> No, you cannot use wildcards within a quoted term. Tell us a little more
> about what your strings look like. You might want to consider tokenizing
> or using ngrams to avoid the need for wildcards.
>
> -- Jack Krupansky
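On the numFound question: numFound reports the total number of matching documents independent of the rows parameter, so a count-only request can ask for zero rows. Below is a minimal Python sketch of building such a request and reading the count from a JSON response. The base URL, the helper names, and the sample response (including its count) are made up for illustration; adjust them to your own setup.

```python
import json
from urllib.parse import urlencode


def build_count_url(base_url, field, pattern):
    """Build a select URL that asks only for the match count (rows=0)."""
    params = {"q": f"{field}:{pattern}", "rows": 0, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def num_found(response_text):
    """Extract the total number of matches from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


# Hypothetical local Solr instance.
url = build_count_url("http://localhost:8983/solr", "iframe", "*youtube*")
print(url)

# Made-up response body, shaped like Solr's JSON output.
sample = '{"response": {"numFound": 42, "start": 0, "docs": []}}'
print(num_found(sample))
```

Sending the request (for example with urllib.request.urlopen) is left out so the sketch runs without a server.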
Re: Solr admin search with wildcard
Just copyField from the string field to a text field and use standard tokenization; then you can search the text field for youtube or even something that is a component of the URL path. No wildcard required.

-- Jack Krupansky

-----Original Message-----
From: Amit Sela
Sent: Thursday, June 27, 2013 8:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr admin search with wildcard

The stored and indexed string is actually a URL like "http://www.youtube.com/somethingsomething". It looks like removing the quotes does the job: iframe:*youtube*, or am I wrong?

For now, performance is not an issue, but accuracy is, and I would like to know, for example, how many URLs have an iframe source leading to YouTube. So a query like iframe:*youtube* with max rows 10 or something will return, in the numFound field of the response, the total number of pages that have an iframe tag with a source matching *youtube*, no?
Re: Solr admin search with wildcard
Forgive my ignorance, but I want to be sure: do I add <copyField source="iframe" dest="text"/> to solrindex-mapping.xml, so that my solrindex-mapping.xml looks like this:

<fields>
  <field dest="content" source="content"/>
  <field dest="title" source="title"/>
  <field dest="iframe" source="iframe"/>
  <field dest="host" source="host"/>
  <field dest="segment" source="segment"/>
  <field dest="boost" source="boost"/>
  <field dest="digest" source="digest"/>
  <field dest="tstamp" source="tstamp"/>
  <field dest="id" source="url"/>
  <copyField source="url" dest="url"/>
  <copyField source="iframe" dest="text"/>
</fields>
<uniqueKey>url</uniqueKey>

And what do you mean by standard tokenization? Thanks!

On Thu, Jun 27, 2013 at 3:43 PM, Jack Krupansky j...@basetechnology.com wrote:
> Just copyField from the string field to a text field and use standard
> tokenization; then you can search the text field for youtube or even
> something that is a component of the URL path. No wildcard required.
>
> -- Jack Krupansky
Re: Search Phrase Wildcard?
Yes, you can search for phrases with wildcards. There is no direct support for it, but you can achieve it like the following:

User input: Solr we
Query should be: (name:Solr AND (name:we* OR name:we)) OR name:"Solr we"

The query builder parses the original input and builds a query that simulates a wildcard phrase query. It takes all the words the user entered and adds a wildcard (*) to the last word. It also searches for the whole phrase the user entered using a phrase query, in case the whole phrase is found in the index.

This should work! Let me know if you have any issues.

--
View this message in context: http://www.nabble.com/Search-Phrase-Wildcard--tp23978330p23996409.html
Sent from the Solr - User mailing list archive at Nabble.com.
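The query builder described above could look roughly like the following Python sketch. The function name is made up, and it assumes the input is plain whitespace-separated words with no query-syntax characters that would need escaping.

```python
def build_wildcard_phrase_query(field, user_input):
    """Simulate a wildcard phrase query: all words required, the last word
    also as a prefix, OR the whole input as an exact phrase."""
    words = user_input.split()
    if not words:
        return ""
    *leading, last = words
    clauses = [f"{field}:{word}" for word in leading]
    clauses.append(f"({field}:{last}* OR {field}:{last})")
    all_terms = " AND ".join(clauses)
    phrase = f'{field}:"{" ".join(words)}"'
    return f"({all_terms}) OR {phrase}"


print(build_wildcard_phrase_query("name", "Solr we"))
```

Note that the AND part does not enforce word order or adjacency, so it can match documents where the words appear far apart; only the phrase clause is a true phrase match.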
Search Phrase Wildcard?
Hi all,

I have my document like this:

<doc>
  <name>Solr web service</name>
</doc>

Is there any way that I can search like startswith:

So* We* : found
Sol* : found
We* : not found

Cheers,
Samnang
Re: Search Phrase Wildcard?
Solr does not support wildcards in phrase queries, yet.

Cheers,
Aleks

On Thu, 11 Jun 2009 11:48:13 +0200, Samnang Chhun samnang.ch...@gmail.com wrote:
> Hi all,
>
> I have my document like this:
>
> <doc>
>   <name>Solr web service</name>
> </doc>
>
> Is there any way that I can search like startswith:
>
> So* We* : found
> Sol* : found
> We* : not found
>
> Cheers,
> Samnang

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail
Re: Search Phrase Wildcard?
In fact, Lucene does not support that.

"Lucene supports single and multiple character wildcard searches within single terms (*not within phrase queries*)."

Taken from http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Wildcard%20Searches

Cheers,
Avlesh

On Thu, Jun 11, 2009 at 4:32 PM, Aleksander M. Stensby aleksander.sten...@integrasco.no wrote:
> Solr does not support wildcards in phrase queries, yet.
>
> Cheers,
> Aleks
Re: Search Phrase Wildcard?
Well yes :) Since Solr does in fact support the entire Lucene query parser syntax :)

- Aleks

On Thu, 11 Jun 2009 13:57:23 +0200, Avlesh Singh avl...@gmail.com wrote:
> In fact, Lucene does not support that.
>
> "Lucene supports single and multiple character wildcard searches within
> single terms (*not within phrase queries*)."
>
> Taken from
> http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Wildcard%20Searches
>
> Cheers,
> Avlesh
Re: Search Phrase Wildcard?
You might be interested in this Lucene issue: https://issues.apache.org/jira/browse/LUCENE-1486

Aleksander M. Stensby wrote:
> Well yes :) Since Solr does in fact support the entire Lucene query
> parser syntax :)
>
> - Aleks

--
- Mark
http://www.lucidimagination.com