Re: wild card search and lower-casing
Yes, it should be OK, as we are currently on the English side. If it would help the effort, I could do a field test on 3.4 after you close the JIRA.

Best,
Dmitry

On Wed, Nov 23, 2011 at 2:52 PM, Erick Erickson wrote:
> Ah, I see what you're doing, go for it.
>
> I intend to commit it today, but things happen.
>
> About changing the setLowercaseExpandedTerms(true): yes,
> that'll take care of this issue, although it has some
> locale-specific assumptions (i.e. String.toLowerCase() uses the
> default locale). That may not matter in your situation though.
>
> Best
> Erick
Re: wild card search and lower-casing
Ah, I see what you're doing, go for it.

I intend to commit it today, but things happen.

About changing the setLowercaseExpandedTerms(true): yes, that'll take care of this issue, although it has some locale-specific assumptions (i.e. String.toLowerCase() uses the default locale). That may not matter in your situation though.

Best
Erick

On Tue, Nov 22, 2011 at 10:46 AM, Dmitry Kan wrote:
> Thanks, Erick. I was in fact reading the patch (the one attached as a
> file to the aforementioned JIRA) you updated sometime yesterday. I'll
> watch the issue, but as said the change of a hard-coded boolean to its
> opposite worked just fine for me.
>
> Best,
> Dmitry
Re: wild card search and lower-casing
Thanks, Erick. I was in fact reading the patch (the one attached as a file to the aforementioned JIRA) you updated sometime yesterday. I'll watch the issue, but as said the change of a hard-coded boolean to its opposite worked just fine for me.

Best,
Dmitry

On 11/22/11, Erick Erickson wrote:
> No, no, no. That's something buried in Lucene, it has nothing to
> do with the patch! The patch has NOT yet been applied to any
> released code.
>
> You could pull the patch from the JIRA and apply it to trunk locally if
> you wanted. But there's no patch for 3.x, I'll probably put that up
> over the holiday.
>
> But things have changed a bit (one of the things I'll have to do is
> create some documentation). You *should* be able to specify
> just legacyMultiTerm="true" in your <fieldType> if you want to
> apply the 3.x patch to pre 3.6 code. It would be a good field test
> if that worked for you.
>
> But you can't do any of this until the JIRA (SOLR-2438) is
> marked "Resolution: Fixed".
>
> Don't be fooled by "Fix Version". "Fix Version" simply says
> that those are the earliest versions it *could* go in.
>
> Best
> Erick

-- 
Regards,
Dmitry Kan
Re: wild card search and lower-casing
No, no, no. That's something buried in Lucene, it has nothing to do with the patch! The patch has NOT yet been applied to any released code.

You could pull the patch from the JIRA and apply it to trunk locally if you wanted. But there's no patch for 3.x, I'll probably put that up over the holiday.

But things have changed a bit (one of the things I'll have to do is create some documentation). You *should* be able to specify just legacyMultiTerm="true" in your <fieldType> if you want to apply the 3.x patch to pre 3.6 code. It would be a good field test if that worked for you.

But you can't do any of this until the JIRA (SOLR-2438) is marked "Resolution: Fixed".

Don't be fooled by "Fix Version". "Fix Version" simply says that those are the earliest versions it *could* go in.

Best
Erick

On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan wrote:
> I guess I have found your comment, thanks.
>
> For our current needs I have just set:
>
>     setLowercaseExpandedTerms(true); // changed from default false
>
> in the SolrQueryParser's constructor and that seems to work so far.
>
> In order not to start a separate thread on wildcards: is it so that for
> the trailing wildcard there is a minimum of 2 preceding characters for a
> search to happen?
>
> Dmitry
Re: wild card search and lower-casing
I guess I have found your comment, thanks.

For our current needs I have just set:

    setLowercaseExpandedTerms(true); // changed from default false

in the SolrQueryParser's constructor and that seems to work so far.

In order not to start a separate thread on wildcards: is it so that for the trailing wildcard there is a minimum of 2 preceding characters for a search to happen?

Dmitry

On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson wrote:
> It may be. The tricky bit is that there is a constant governing the
> behavior of this that restricts it to 3.6 and above. You'll have to
> change it after applying the patch for this to work for you. Should be
> trivial, I'll leave a note in the code about this, look for SOLR-2438
> in the 3x code line for the place to change.
Re: wild card search and lower-casing
It may be. The tricky bit is that there is a constant governing the behavior of this that restricts it to 3.6 and above. You'll have to change it after applying the patch for this to work for you. Should be trivial, I'll leave a note in the code about this, look for SOLR-2438 in the 3x code line for the place to change.

On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan wrote:
> Thanks Erick.
>
> Do you think the patch you are working on will be applicable as well to 3.4?
>
> Best,
> Dmitry
Re: wild card search and lower-casing
Thanks Erick.

Do you think the patch you are working on will be applicable as well to 3.4?

Best,
Dmitry

On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson wrote:
> As it happens I'm working on SOLR-2438 which should address this. This
> patch will provide two things:
>
> The ability to define a new analysis chain in your schema.xml, currently
> called "multiterm", that will be applied to queries of various sorts,
> including wildcard, prefix, range. This will be somewhat of an "expert"
> thing to make yourself...
>
> In the absence of an explicit definition it'll synthesize a multiterm
> analyzer out of the query analyzer, taking any char filters, and
> LowerCaseFilter (if present), and ASCIIFoldingFilter (if present) and
> putting them in the multiterm analyzer along with a (hardcoded)
> WhitespaceTokenizer.
>
> As of 3.6 and 4.0, this will be the default behavior, although you can
> explicitly define a field type parameter to specify the current behavior.
>
> The reason it is on 3.6 is that I want it to bake for a while before
> getting into the wild, so I have no intention of trying to get it into
> the 3.5 release.
>
> The patch is up for review now, I'd like another set of eyeballs or
> two on it before committing.
>
> The patch that's up there now is against trunk but I hope to have a 3x
> patch that I'll apply to the 3x code line after 3.5 RC1 is cut.
>
> Best
> Erick
Re: wild card search and lower-casing
As it happens I'm working on SOLR-2438 which should address this. This patch will provide two things:

The ability to define a new analysis chain in your schema.xml, currently called "multiterm", that will be applied to queries of various sorts, including wildcard, prefix, range. This will be somewhat of an "expert" thing to make yourself...

In the absence of an explicit definition it'll synthesize a multiterm analyzer out of the query analyzer, taking any char filters, and LowerCaseFilter (if present), and ASCIIFoldingFilter (if present) and putting them in the multiterm analyzer along with a (hardcoded) WhitespaceTokenizer.

As of 3.6 and 4.0, this will be the default behavior, although you can explicitly define a field type parameter to specify the current behavior.

The reason it is on 3.6 is that I want it to bake for a while before getting into the wild, so I have no intention of trying to get it into the 3.5 release.

The patch is up for review now, I'd like another set of eyeballs or two on it before committing.

The patch that's up there now is against trunk but I hope to have a 3x patch that I'll apply to the 3x code line after 3.5 RC1 is cut.

Best
Erick

On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan wrote:
>> You're right:
>>
>>     public SolrQueryParser(IndexSchema schema, String defaultField) {
>>         ...
>>         setLowercaseExpandedTerms(false);
>>         ...
>>     }
>
> Please note that lowercaseExpandedTerms uses String.toLowerCase() (uses
> default Locale), which is a Locale-sensitive operation.
>
> In Lucene, AnalyzingQueryParser exists for this purpose, but I am not
> sure if it is ported to Solr.
>
> http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
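To make the effect of the synthesized chain concrete, here is a rough, self-contained sketch of what lowercasing plus ASCII folding would do to a wildcard term before it is matched against the index. It does not use Solr or Lucene classes: the class name MultiTermNormalizeSketch and the method normalize are made up for illustration, and java.text.Normalizer (decompose, then strip combining marks) is only a stand-in for ASCIIFoldingFilter, which covers more characters than this does.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration only; not the SOLR-2438 implementation.
public class MultiTermNormalizeSketch {

    /** Folds accented letters to their base letters, then lowercases. */
    public static String normalize(String term) {
        // Decompose, e.g. 'É' becomes 'E' followed by a combining accent.
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        // Drop the combining marks (approximation of ASCII folding).
        String folded = decomposed.replaceAll("\\p{M}+", "");
        // Locale.ROOT keeps the lowercasing independent of the JVM default locale.
        return folded.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("*OCVD"));      // wildcard characters pass through untouched
        System.out.println(normalize("Caf\u00C9*")); // accented capital E is folded and lowercased
    }
}
```

The point of the sketch is only that the wildcard characters survive while the literal part of the term ends up in the same form the index-time analyzer produced.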
Re: wild card search and lower-casing
> You're right:
>
>     public SolrQueryParser(IndexSchema schema, String defaultField) {
>         ...
>         setLowercaseExpandedTerms(false);
>         ...
>     }

Please note that lowercaseExpandedTerms uses String.toLowerCase() (uses default Locale), which is a Locale-sensitive operation.

In Lucene, AnalyzingQueryParser exists for this purpose, but I am not sure if it is ported to Solr.

http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
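A small sketch of why the locale matters, using only the JDK. The class name LocaleLowercaseDemo and the helper lower are made up for this example; the Turkish behavior shown is standard String.toLowerCase(Locale) behavior, where capital 'I' maps to the dotless i (U+0131).

```java
import java.util.Locale;

// Hypothetical illustration of locale-sensitive lowercasing.
public class LocaleLowercaseDemo {

    /** Lowercases a query term using an explicitly chosen locale. */
    public static String lower(String term, Locale locale) {
        return term.toLowerCase(locale);
    }

    public static void main(String[] args) {
        String term = "TITLE*";
        Locale turkish = Locale.forLanguageTag("tr");

        // Locale-neutral lowercasing yields plain ASCII "title*".
        System.out.println(lower(term, Locale.ROOT).equals("title*"));

        // Under a Turkish locale the 'I' becomes a dotless i, so the
        // result no longer equals "title*" and would miss terms indexed that way.
        System.out.println(lower(term, turkish).equals("title*"));
    }
}
```

So a JVM whose default locale is Turkish could lowercase a wildcard term differently from the index-time filter, which is presumably the kind of mismatch the warning above is about. For English-only content it should not make a difference.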
Re: wild card search and lower-casing
You're right:

    public SolrQueryParser(IndexSchema schema, String defaultField) {
        ...
        setLowercaseExpandedTerms(false);
        ...
    }

OK, thanks for pointing that out.

On Fri, Nov 18, 2011 at 4:12 PM, Ahmet Arslan wrote:
> > Actually I have just checked the source code of Lucene's QueryParser
> > and lowercaseExpandedTerms there is set to true by default (version
> > 3.4). The code there does lower-casing by default. So in that sense I
> > don't need to do anything in the client code. Is something wrong here?
>
> But SolrQueryParser extends that and the default behavior may be
> different. For clarification, see the source code of SolrQueryParser.

-- 
Regards,
Dmitry Kan
Re: wild card search and lower-casing
> Actually I have just checked the source code of Lucene's QueryParser
> and lowercaseExpandedTerms there is set to true by default (version
> 3.4). The code there does lower-casing by default. So in that sense I
> don't need to do anything in the client code. Is something wrong here?

But SolrQueryParser extends that and the default behavior may be different. For clarification, see the source code of SolrQueryParser.
Re: wild card search and lower-casing
OK.

Actually I have just checked the source code of Lucene's QueryParser and lowercaseExpandedTerms there is set to true by default (version 3.4). The code there does lower-casing by default. So in that sense I don't need to do anything in the client code. Is something wrong here?

On Fri, Nov 18, 2011 at 3:49 PM, Ahmet Arslan wrote:
> > Hi Ahmet,
> >
> > Thanks for the link.
> >
> > I'm a bit puzzled by the explanation found there regarding lower-casing:
> >
> > These queries are case-insensitive anyway because QueryParser makes
> > them lowercase.
> >
> > That's exactly what I want to achieve, but somehow the queries *are*
> > case-sensitive. Probably I should play around with the code of a
> > query parser.
>
> There is an effort for this:
> https://issues.apache.org/jira/browse/SOLR-218
> You can vote for this issue. For the time being you can lowercase them
> on the client side.

-- 
Regards,
Dmitry Kan
Re: wild card search and lower-casing
> Hi Ahmet,
>
> Thanks for the link.
>
> I'm a bit puzzled by the explanation found there regarding lower-casing:
>
> These queries are case-insensitive anyway because QueryParser makes
> them lowercase.
>
> That's exactly what I want to achieve, but somehow the queries *are*
> case-sensitive. Probably I should play around with the code of a
> query parser.

There is an effort for this: https://issues.apache.org/jira/browse/SOLR-218

You can vote for this issue. For the time being you can lowercase them on the client side.
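A minimal sketch of the client-side workaround, assuming the query string is built in Java before it is sent to Solr. The class WildcardLowercaser and its method are hypothetical helpers, not part of SolrJ or Lucene. It lowercases only whitespace-separated tokens that contain a wildcard, so operators such as AND and any field-name prefix are left alone; it does not try to handle quoted phrases, escapes, or nested syntax.

```java
import java.util.Locale;

// Hypothetical client-side helper; a sketch, not a full query parser.
public class WildcardLowercaser {

    /** Lowercases the term part of every token that contains '*' or '?'. */
    public static String lowercaseWildcardTerms(String query) {
        StringBuilder out = new StringBuilder();
        for (String token : query.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            boolean hasWildcard = token.indexOf('*') >= 0 || token.indexOf('?') >= 0;
            if (!hasWildcard) {
                out.append(token); // plain terms and operators stay as typed
                continue;
            }
            // Keep an optional "field:" prefix unchanged; colon is -1 when absent.
            int colon = token.indexOf(':');
            out.append(token, 0, colon + 1);
            out.append(token.substring(colon + 1).toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowercaseWildcardTerms("*OCVD"));
        System.out.println(lowercaseWildcardTerms("Title:MOCV* AND Foo"));
    }
}
```

Non-wildcard terms are deliberately not touched, since those already pass through the field's query analyzer on the Solr side.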
Re: wild card search and lower-casing
Hi Ahmet,

Thanks for the link.

I'm a bit puzzled by the explanation found there regarding lower-casing:

    These queries are case-insensitive anyway because QueryParser makes them lowercase.

That's exactly what I want to achieve, but somehow the queries *are* case-sensitive. Probably I should play around with the code of a query parser.

On Fri, Nov 18, 2011 at 2:50 PM, Ahmet Arslan wrote:
> > Here is one puzzle I couldn't yet find a key for:
> >
> > for the wild-card query:
> >
> > *ocvd
> >
> > SOLR 3.4 returns hits. But for
> >
> > *OCVD
> >
> > it doesn't
>
> This is a FAQ. Please see
>
> http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F

-- 
Regards,
Dmitry Kan
Re: wild card search and lower-casing
> Here is one puzzle I couldn't yet find a key for:
>
> for the wild-card query:
>
> *ocvd
>
> SOLR 3.4 returns hits. But for
>
> *OCVD
>
> it doesn't

This is a FAQ. Please see

http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F
wild card search and lower-casing
Hello,

Here is one puzzle I couldn't yet find a key for:

for the wild-card query:

*ocvd

SOLR 3.4 returns hits. But for

*OCVD

it doesn't.

On the indexing side two following tokenizers/filters are defined:

On the query side:

The SOLR analysis tool shows that OCVD gets lower-cased to ocvd. Does SOLR skip a lower-casing step when doing the actual wild-card search?

BTW, the same issue occurs for a trailing wild-card: mocv* produces hits, while MOCV* doesn't.

Appreciate any help or pointers.

-- 
Regards,
Dmitry Kan