Re: wild card search and lower-casing

2011-11-23 Thread Dmitry Kan
Yes, it should be ok, as currently we are on the English side. If that's
beneficial for the effort, I could do a field test on 3.4 after you close
the jira.

Best,
Dmitry

On Wed, Nov 23, 2011 at 2:52 PM, Erick Erickson wrote:

> Ah, I see what you're doing, go for it.
>
> I intend to commit it today, but things happen.
>
> About changing the setLowerCaseExpandedTerms(true), yes
> that'll take care of this issue, although it has some
> locale-specific assumptions (i.e. string.toLowerCase() uses the
> default locale). That may not matter in your situation though.
>
> Best
> Erick
>
> On Tue, Nov 22, 2011 at 10:46 AM, Dmitry Kan  wrote:
> > Thanks, Erick. I was in fact reading the patch (the one attached as a
> > file to the aforementioned jira) you updated sometime yesterday. I'll
> > watch the issue, but as said the change of a hard-coded boolean to its
> > opposite worked just fine for me.
> >
> > Best,
> > Dmitry
> >
> >
> > On 11/22/11, Erick Erickson  wrote:
> >> No, no, no That's something buried in Lucene, it has nothing to
> >> do with the patch! The patch has NOT yet been applied to any
> >> released code.
> >>
> >> You could pull the patch from the JIRA and apply it to trunk locally if
> >> you wanted. But there's no patch for 3.x, I'll probably put that up
> >> over the holiday.
> >>
> >> But things have changed a bit (one of the things I'll have to do is
> >> create some documentation). You *should* be able to specify
> >> just legacyMultiTerm="true" in your  if you want to
> >> apply the 3.x patch to pre 3.6 code. It would be a good field test
> >> if that worked for you.
> >>
> >> But you can't do any of this until the JIRA (SOLR-2438) is
> >> marked "Resolution: Fixed".
> >>
> >> Don't be fooled by "Fix Version". "Fix Version" simply says
> >> that those are the earliest versions it *could* go in.
> >>
> >> Best
> >> Erick
> >>
> >> Best
> >> Erick
> >>
> >> On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan 
> wrote:
> >>> I guess, I have found your comment, thanks.
> >>>
> >>> For our current needs I have just set:
> >>>
> >>> setLowercaseExpandedTerms(true); // changed from default false
> >>>
> >>> in the SolrQueryParser's constructor and that seem to work so far.
> >>>
> >>> In order not to start a separate thread on wildcards. Is it so, that
> for
> >>> the trailing wildcard there is a minimum of 2 preceding characters for
> a
> >>> search to happen?
> >>>
> >>> Dmitry
> >>>

Re: wild card search and lower-casing

2011-11-23 Thread Erick Erickson
Ah, I see what you're doing, go for it.

I intend to commit it today, but things happen.

About changing it to setLowercaseExpandedTerms(true): yes,
that'll take care of this issue, although it has some
locale-specific assumptions (i.e. String.toLowerCase() uses the
default locale). That may not matter in your situation though.
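
To make the locale caveat concrete, here is a minimal, self-contained sketch in plain Java. It does not use Lucene or Solr; the class name LocaleLowercaseDemo and the helper lower() are made up for illustration. It only shows that the same input lower-cases differently under Turkish casing rules than under Locale.ROOT, which is why lower-casing with the JVM default locale can surprise you.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Lucene or Solr.
public class LocaleLowercaseDemo {

    // Lower-case a term with an explicit locale instead of the JVM default.
    static String lower(String term, Locale locale) {
        return term.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Under Turkish rules, 'I' maps to dotless 'ı' (U+0131), so the
        // result no longer equals the ASCII string "title".
        String turkishResult = lower("TITLE", turkish);
        String rootResult = lower("TITLE", Locale.ROOT);

        System.out.println(turkishResult.equals("title")); // false
        System.out.println(rootResult.equals("title"));    // true
    }
}
```

If the index was built with a locale-neutral lower-casing filter, a query term lower-cased under a Turkish default locale would not match it, even though both started from the same upper-case text.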

Best
Erick

On Tue, Nov 22, 2011 at 10:46 AM, Dmitry Kan  wrote:
> Thanks, Erick. I was in fact reading the patch (the one attached as a
> file to the aforementioned jira) you updated sometime yesterday. I'll
> watch the issue, but as said the change of a hard-coded boolean to its
> opposite worked just fine for me.
>
> Best,
> Dmitry
>
>
> On 11/22/11, Erick Erickson  wrote:
>> No, no, no That's something buried in Lucene, it has nothing to
>> do with the patch! The patch has NOT yet been applied to any
>> released code.
>>
>> You could pull the patch from the JIRA and apply it to trunk locally if
>> you wanted. But there's no patch for 3.x, I'll probably put that up
>> over the holiday.
>>
>> But things have changed a bit (one of the things I'll have to do is
>> create some documentation). You *should* be able to specify
>> just legacyMultiTerm="true" in your  if you want to
>> apply the 3.x patch to pre 3.6 code. It would be a good field test
>> if that worked for you.
>>
>> But you can't do any of this until the JIRA (SOLR-2438) is
>> marked "Resolution: Fixed".
>>
>> Don't be fooled by "Fix Version". "Fix Version" simply says
>> that those are the earliest versions it *could* go in.
>>
>> Best
>> Erick
>>
>> Best
>> Erick
>>
>> On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan  wrote:
>>> I guess, I have found your comment, thanks.
>>>
>>> For our current needs I have just set:
>>>
>>> setLowercaseExpandedTerms(true); // changed from default false
>>>
>>> in the SolrQueryParser's constructor and that seem to work so far.
>>>
>>> In order not to start a separate thread on wildcards. Is it so, that for
>>> the trailing wildcard there is a minimum of 2 preceding characters for a
>>> search to happen?
>>>
>>> Dmitry
>>>

Re: wild card search and lower-casing

2011-11-22 Thread Dmitry Kan
Thanks, Erick. I was in fact reading the patch (the one attached as a
file to the aforementioned JIRA) that you updated sometime yesterday. I'll
watch the issue, but as I said, changing a hard-coded boolean to its
opposite worked just fine for me.

Best,
Dmitry


On 11/22/11, Erick Erickson  wrote:
> No, no, no That's something buried in Lucene, it has nothing to
> do with the patch! The patch has NOT yet been applied to any
> released code.
>
> You could pull the patch from the JIRA and apply it to trunk locally if
> you wanted. But there's no patch for 3.x, I'll probably put that up
> over the holiday.
>
> But things have changed a bit (one of the things I'll have to do is
> create some documentation). You *should* be able to specify
> just legacyMultiTerm="true" in your  if you want to
> apply the 3.x patch to pre 3.6 code. It would be a good field test
> if that worked for you.
>
> But you can't do any of this until the JIRA (SOLR-2438) is
> marked "Resolution: Fixed".
>
> Don't be fooled by "Fix Version". "Fix Version" simply says
> that those are the earliest versions it *could* go in.
>
> Best
> Erick
>
> Best
> Erick
>
> On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan  wrote:
>> I guess, I have found your comment, thanks.
>>
>> For our current needs I have just set:
>>
>> setLowercaseExpandedTerms(true); // changed from default false
>>
>> in the SolrQueryParser's constructor and that seem to work so far.
>>
>> In order not to start a separate thread on wildcards. Is it so, that for
>> the trailing wildcard there is a minimum of 2 preceding characters for a
>> search to happen?
>>
>> Dmitry
>>
>> On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson
>> wrote:
>>
>>> It may be. The tricky bit is that there is a constant governing the
>>> behavior of
>>> this that restricts it to 3.6 and above. You'll have to change it after
>>> applying
>>> the patch for this to work for you. Should be trivial, I'll leave a note
>>> in the
>>> code about this, look for SOLR-2438 in the 3x code line for the place
>>> to change.
>>>
>>> On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan  wrote:
>>> > Thanks Erick.
>>> >
>>> > Do you think the patch you are working on will be applicable as well to
>>> 3.4?
>>> >
>>> > Best,
>>> > Dmitry
>>> >
>>> > On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson
>>> > >> >wrote:
>>> >
>>> >> As it happens I'm working on SOLR-2438 which should address this. This
>>> >> patch
>>> >> will provide two things:
>>> >>
>>> >> The ability to define a new analysis chain in your schema.xml,
>>> >> currently
>>> >> called
>>> >> "multiterm" that will be applied to queries of various sorts,
>>> >> including wildcard,
>>> >> prefix, range. This will be somewhat of an "expert" thing to make
>>> >> yourself...
>>> >>
>>> >> In the absence of an explicit definition it'll synthesize a multiterm
>>> >> analyzer
>>> >> out of the query analyzer, taking any char fitlers, and
>>> >> lowercaseFilter (if present),
>>> >> and ASCIIFoldingfilter (if present) and putting them in the multiterm
>>> >> analyzer along
>>> >> with a (hardcoded) WhitespaceTokenizer.
>>> >>
>>> >> As of 3.6 and 4.0, this will be the default behavior, although you can
>>> >> explicitly
>>> >> define a field type parameter to specify the current behavior.
>>> >>
>>> >> The reason it is on 3.6 is that I want it to bake for a while before
>>> >> getting into the
>>> >> wild, so I have no intention of trying to get it into the 3.5 release.
>>> >>
>>> >> The patch is up for review now, I'd like another set of eyeballs or
>>> >> two on it before
>>> >> committing.
>>> >>
>>> >> The patch that's up there now is against trunk but I hope to have a 3x
>>> >> patch that
>>> >> I'll apply to the 3x code line after 3.5 RC1 is cut.
>>> >>
>>> >> Best
>>> >> Erick
>>> >>
>>> >>
>>> >> On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan 
>>> wrote:
>>> >> >
>>> >> >> You're right:
>>> >> >>
>>> >> >> public SolrQueryParser(IndexSchema schema, String
>>> >> >> defaultField) {
>>> >> >> ...
>>> >> >> setLowercaseExpandedTerms(false);
>>> >> >> ...
>>> >> >> }
>>> >> >
>>> >> > Please note that lowercaseExpandedTerms uses String.toLowercase()
>>> (uses
>>> >>  default Locale) which is a Locale sensitive operation.
>>> >> >
>>> >> > In Lucene AnalyzingQueryParser exists for this purposes, but I am
>>> >> > not
>>> >> sure if it is ported to solr.
>>> >> >
>>> >> >
>>> >>
>>> http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
>>> >> >
>>> >>
>>> >
>>>
>>
>


-- 
Regards,

Dmitry Kan


Re: wild card search and lower-casing

2011-11-22 Thread Erick Erickson
No, no, no. That's something buried in Lucene; it has nothing to
do with the patch! The patch has NOT yet been applied to any
released code.

You could pull the patch from the JIRA and apply it to trunk locally if
you wanted. But there's no patch for 3.x, I'll probably put that up
over the holiday.

But things have changed a bit (one of the things I'll have to do is
create some documentation). You *should* be able to specify
just legacyMultiTerm="true" in your <fieldType> if you want to
apply the 3.x patch to pre 3.6 code. It would be a good field test
if that worked for you.

But you can't do any of this until the JIRA (SOLR-2438) is
marked "Resolution: Fixed".

Don't be fooled by "Fix Version". "Fix Version" simply says
that those are the earliest versions it *could* go in.

Best
Erick

On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan  wrote:
> I guess, I have found your comment, thanks.
>
> For our current needs I have just set:
>
> setLowercaseExpandedTerms(true); // changed from default false
>
> in the SolrQueryParser's constructor and that seem to work so far.
>
> In order not to start a separate thread on wildcards. Is it so, that for
> the trailing wildcard there is a minimum of 2 preceding characters for a
> search to happen?
>
> Dmitry
>
> On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson 
> wrote:
>
>> It may be. The tricky bit is that there is a constant governing the
>> behavior of
>> this that restricts it to 3.6 and above. You'll have to change it after
>> applying
>> the patch for this to work for you. Should be trivial, I'll leave a note
>> in the
>> code about this, look for SOLR-2438 in the 3x code line for the place
>> to change.
>>
>> On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan  wrote:
>> > Thanks Erick.
>> >
>> > Do you think the patch you are working on will be applicable as well to
>> 3.4?
>> >
>> > Best,
>> > Dmitry
>> >
>> > On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson > >wrote:
>> >
>> >> As it happens I'm working on SOLR-2438 which should address this. This
>> >> patch
>> >> will provide two things:
>> >>
>> >> The ability to define a new analysis chain in your schema.xml, currently
>> >> called
>> >> "multiterm" that will be applied to queries of various sorts,
>> >> including wildcard,
>> >> prefix, range. This will be somewhat of an "expert" thing to make
>> >> yourself...
>> >>
>> >> In the absence of an explicit definition it'll synthesize a multiterm
>> >> analyzer
>> >> out of the query analyzer, taking any char fitlers, and
>> >> lowercaseFilter (if present),
>> >> and ASCIIFoldingfilter (if present) and putting them in the multiterm
>> >> analyzer along
>> >> with a (hardcoded) WhitespaceTokenizer.
>> >>
>> >> As of 3.6 and 4.0, this will be the default behavior, although you can
>> >> explicitly
>> >> define a field type parameter to specify the current behavior.
>> >>
>> >> The reason it is on 3.6 is that I want it to bake for a while before
>> >> getting into the
>> >> wild, so I have no intention of trying to get it into the 3.5 release.
>> >>
>> >> The patch is up for review now, I'd like another set of eyeballs or
>> >> two on it before
>> >> committing.
>> >>
>> >> The patch that's up there now is against trunk but I hope to have a 3x
>> >> patch that
>> >> I'll apply to the 3x code line after 3.5 RC1 is cut.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >>
>> >> On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan 
>> wrote:
>> >> >
>> >> >> You're right:
>> >> >>
>> >> >> public SolrQueryParser(IndexSchema schema, String
>> >> >> defaultField) {
>> >> >> ...
>> >> >> setLowercaseExpandedTerms(false);
>> >> >> ...
>> >> >> }
>> >> >
>> >> > Please note that lowercaseExpandedTerms uses String.toLowercase()
>> (uses
>> >>  default Locale) which is a Locale sensitive operation.
>> >> >
>> >> > In Lucene AnalyzingQueryParser exists for this purposes, but I am not
>> >> sure if it is ported to solr.
>> >> >
>> >> >
>> >>
>> http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
>> >> >
>> >>
>> >
>>
>


Re: wild card search and lower-casing

2011-11-22 Thread Dmitry Kan
I think I have found your comment, thanks.

For our current needs I have just set:

setLowercaseExpandedTerms(true); // changed from default false

in the SolrQueryParser's constructor and that seems to work so far.

So as not to start a separate thread on wildcards: is it true that for
a trailing wildcard there is a minimum of 2 preceding characters for a
search to happen?

Dmitry

On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson wrote:

> It may be. The tricky bit is that there is a constant governing the
> behavior of
> this that restricts it to 3.6 and above. You'll have to change it after
> applying
> the patch for this to work for you. Should be trivial, I'll leave a note
> in the
> code about this, look for SOLR-2438 in the 3x code line for the place
> to change.
>
> On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan  wrote:
> > Thanks Erick.
> >
> > Do you think the patch you are working on will be applicable as well to
> 3.4?
> >
> > Best,
> > Dmitry
> >
> > On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson  >wrote:
> >
> >> As it happens I'm working on SOLR-2438 which should address this. This
> >> patch
> >> will provide two things:
> >>
> >> The ability to define a new analysis chain in your schema.xml, currently
> >> called
> >> "multiterm" that will be applied to queries of various sorts,
> >> including wildcard,
> >> prefix, range. This will be somewhat of an "expert" thing to make
> >> yourself...
> >>
> >> In the absence of an explicit definition it'll synthesize a multiterm
> >> analyzer
> >> out of the query analyzer, taking any char fitlers, and
> >> lowercaseFilter (if present),
> >> and ASCIIFoldingfilter (if present) and putting them in the multiterm
> >> analyzer along
> >> with a (hardcoded) WhitespaceTokenizer.
> >>
> >> As of 3.6 and 4.0, this will be the default behavior, although you can
> >> explicitly
> >> define a field type parameter to specify the current behavior.
> >>
> >> The reason it is on 3.6 is that I want it to bake for a while before
> >> getting into the
> >> wild, so I have no intention of trying to get it into the 3.5 release.
> >>
> >> The patch is up for review now, I'd like another set of eyeballs or
> >> two on it before
> >> committing.
> >>
> >> The patch that's up there now is against trunk but I hope to have a 3x
> >> patch that
> >> I'll apply to the 3x code line after 3.5 RC1 is cut.
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan 
> wrote:
> >> >
> >> >> You're right:
> >> >>
> >> >> public SolrQueryParser(IndexSchema schema, String
> >> >> defaultField) {
> >> >> ...
> >> >> setLowercaseExpandedTerms(false);
> >> >> ...
> >> >> }
> >> >
> >> > Please note that lowercaseExpandedTerms uses String.toLowercase()
> (uses
> >>  default Locale) which is a Locale sensitive operation.
> >> >
> >> > In Lucene AnalyzingQueryParser exists for this purposes, but I am not
> >> sure if it is ported to solr.
> >> >
> >> >
> >>
> http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
> >> >
> >>
> >
>


Re: wild card search and lower-casing

2011-11-21 Thread Erick Erickson
It may be. The tricky bit is that there is a constant governing the behavior of
this that restricts it to 3.6 and above. You'll have to change it after applying
the patch for this to work for you. That should be trivial; I'll leave a note
in the code about this. Look for SOLR-2438 in the 3x code line for the place
to change.

On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan  wrote:
> Thanks Erick.
>
> Do you think the patch you are working on will be applicable as well to 3.4?
>
> Best,
> Dmitry
>
> On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson 
> wrote:
>
>> As it happens I'm working on SOLR-2438 which should address this. This
>> patch
>> will provide two things:
>>
>> The ability to define a new analysis chain in your schema.xml, currently
>> called
>> "multiterm" that will be applied to queries of various sorts,
>> including wildcard,
>> prefix, range. This will be somewhat of an "expert" thing to make
>> yourself...
>>
>> In the absence of an explicit definition it'll synthesize a multiterm
>> analyzer
>> out of the query analyzer, taking any char fitlers, and
>> lowercaseFilter (if present),
>> and ASCIIFoldingfilter (if present) and putting them in the multiterm
>> analyzer along
>> with a (hardcoded) WhitespaceTokenizer.
>>
>> As of 3.6 and 4.0, this will be the default behavior, although you can
>> explicitly
>> define a field type parameter to specify the current behavior.
>>
>> The reason it is on 3.6 is that I want it to bake for a while before
>> getting into the
>> wild, so I have no intention of trying to get it into the 3.5 release.
>>
>> The patch is up for review now, I'd like another set of eyeballs or
>> two on it before
>> committing.
>>
>> The patch that's up there now is against trunk but I hope to have a 3x
>> patch that
>> I'll apply to the 3x code line after 3.5 RC1 is cut.
>>
>> Best
>> Erick
>>
>>
>> On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan  wrote:
>> >
>> >> You're right:
>> >>
>> >> public SolrQueryParser(IndexSchema schema, String
>> >> defaultField) {
>> >> ...
>> >> setLowercaseExpandedTerms(false);
>> >> ...
>> >> }
>> >
>> > Please note that lowercaseExpandedTerms uses String.toLowercase() (uses
>>  default Locale) which is a Locale sensitive operation.
>> >
>> > In Lucene AnalyzingQueryParser exists for this purposes, but I am not
>> sure if it is ported to solr.
>> >
>> >
>> http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
>> >
>>
>


Re: wild card search and lower-casing

2011-11-20 Thread Dmitry Kan
Thanks Erick.

Do you think the patch you are working on will be applicable as well to 3.4?

Best,
Dmitry

On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson wrote:

> As it happens I'm working on SOLR-2438 which should address this. This
> patch
> will provide two things:
>
> The ability to define a new analysis chain in your schema.xml, currently
> called
> "multiterm" that will be applied to queries of various sorts,
> including wildcard,
> prefix, range. This will be somewhat of an "expert" thing to make
> yourself...
>
> In the absence of an explicit definition it'll synthesize a multiterm
> analyzer
> out of the query analyzer, taking any char fitlers, and
> lowercaseFilter (if present),
> and ASCIIFoldingfilter (if present) and putting them in the multiterm
> analyzer along
> with a (hardcoded) WhitespaceTokenizer.
>
> As of 3.6 and 4.0, this will be the default behavior, although you can
> explicitly
> define a field type parameter to specify the current behavior.
>
> The reason it is on 3.6 is that I want it to bake for a while before
> getting into the
> wild, so I have no intention of trying to get it into the 3.5 release.
>
> The patch is up for review now, I'd like another set of eyeballs or
> two on it before
> committing.
>
> The patch that's up there now is against trunk but I hope to have a 3x
> patch that
> I'll apply to the 3x code line after 3.5 RC1 is cut.
>
> Best
> Erick
>
>
> On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan  wrote:
> >
> >> You're right:
> >>
> >> public SolrQueryParser(IndexSchema schema, String
> >> defaultField) {
> >> ...
> >> setLowercaseExpandedTerms(false);
> >> ...
> >> }
> >
> > Please note that lowercaseExpandedTerms uses String.toLowercase() (uses
>  default Locale) which is a Locale sensitive operation.
> >
> > In Lucene AnalyzingQueryParser exists for this purposes, but I am not
> sure if it is ported to solr.
> >
> >
> http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
> >
>


Re: wild card search and lower-casing

2011-11-20 Thread Erick Erickson
As it happens I'm working on SOLR-2438, which should address this. This patch
will provide two things:

The ability to define a new analysis chain in your schema.xml, currently called
"multiterm", that will be applied to queries of various sorts, including
wildcard, prefix, and range. This will be somewhat of an "expert" thing to make
yourself...

In the absence of an explicit definition it'll synthesize a multiterm analyzer
out of the query analyzer, taking any char filters, the LowerCaseFilter (if
present), and the ASCIIFoldingFilter (if present) and putting them in the
multiterm analyzer along with a (hardcoded) WhitespaceTokenizer.
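
The selection rule described above can be sketched roughly as follows. This is plain Java with no Lucene or Solr dependency, and it is not the code from the SOLR-2438 patch: analysis components are modelled as plain strings, and the class name MultiTermChainSketch and the method synthesize() are hypothetical. It only illustrates "keep every char filter, use a WhitespaceTokenizer, and keep only the lower-casing and ASCII-folding filters".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; component names are strings, not real Lucene classes.
public class MultiTermChainSketch {

    // Build the component list of a synthesized multiterm chain from the
    // char filters and token filters of a query analyzer.
    static List<String> synthesize(List<String> charFilters, List<String> tokenFilters) {
        // All char filters are carried over unchanged.
        List<String> chain = new ArrayList<>(charFilters);
        // The tokenizer is fixed, regardless of what the query analyzer uses.
        chain.add("WhitespaceTokenizer");
        // Only lower-casing and ASCII folding survive from the token filters.
        for (String filter : tokenFilters) {
            if (filter.equals("LowerCaseFilter") || filter.equals("ASCIIFoldingFilter")) {
                chain.add(filter);
            }
        }
        return chain;
    }

    public static void main(String[] args) {
        List<String> chain = synthesize(
                Arrays.asList("MappingCharFilter"),
                Arrays.asList("StopFilter", "LowerCaseFilter", "PorterStemFilter"));
        // Prints [MappingCharFilter, WhitespaceTokenizer, LowerCaseFilter]
        System.out.println(chain);
    }
}
```

Filters such as stemmers and stop filters are dropped in this sketch because they can rewrite or remove a partial term like "mocv" in ways that would stop the wildcard from matching.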

As of 3.6 and 4.0, this will be the default behavior, although you can
explicitly define a field type parameter to specify the current behavior.

The reason it is on 3.6 is that I want it to bake for a while before getting
into the wild, so I have no intention of trying to get it into the 3.5 release.

The patch is up for review now; I'd like another set of eyeballs or two on it
before committing.

The patch that's up there now is against trunk, but I hope to have a 3x patch
that I'll apply to the 3x code line after 3.5 RC1 is cut.

Best
Erick


On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan  wrote:
>
>> You're right:
>>
>> public SolrQueryParser(IndexSchema schema, String
>> defaultField) {
>> ...
>> setLowercaseExpandedTerms(false);
>> ...
>> }
>
> Please note that lowercaseExpandedTerms uses String.toLowercase() (uses  
> default Locale) which is a Locale sensitive operation.
>
> In Lucene AnalyzingQueryParser exists for this purposes, but I am not sure if 
> it is ported to solr.
>
>  http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
>


Re: wild card search and lower-casing

2011-11-18 Thread Ahmet Arslan

> You're right:
> 
> public SolrQueryParser(IndexSchema schema, String
> defaultField) {
> ...
> setLowercaseExpandedTerms(false);
> ...
> }

Please note that lowercaseExpandedTerms uses String.toLowerCase() (which uses
the default Locale), a Locale-sensitive operation.

In Lucene, AnalyzingQueryParser exists for this purpose, but I am not sure if
it has been ported to Solr.

  
http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html


Re: wild card search and lower-casing

2011-11-18 Thread Dmitry Kan
You're right:

public SolrQueryParser(IndexSchema schema, String defaultField) {
...
setLowercaseExpandedTerms(false);
...
}

OK, thanks for pointing that out.

On Fri, Nov 18, 2011 at 4:12 PM, Ahmet Arslan  wrote:

> > Actually I have just checked the source code of Lucene's
> > QueryParser and
> > lowercaseExpandedTerms there is set to true by default
> > (version 3.4). The
> > code there does lower-casing by default. So in that sense I
> > don't need to
> > do anything in the client code. Is something wrong here?
>
> But SolrQueryParser extends that and default behavior may different. For
> clarification see source code of SolrQueryParser.
>



-- 
Regards,

Dmitry Kan


Re: wild card search and lower-casing

2011-11-18 Thread Ahmet Arslan
> Actually I have just checked the source code of Lucene's
> QueryParser and
> lowercaseExpandedTerms there is set to true by default
> (version 3.4). The
> code there does lower-casing by default. So in that sense I
> don't need to
> do anything in the client code. Is something wrong here?

But SolrQueryParser extends that class, and its default behavior may be
different. For clarification, see the source code of SolrQueryParser.
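
A rough illustration of how a subclass can silently change an inherited default, in plain Java. The classes BaseParser and SchemaAwareParser are hypothetical stand-ins, not the real Lucene QueryParser or SolrQueryParser; the default values simply mirror what is reported in this thread (true in the base class, set to false in the subclass constructor).

```java
import java.util.Locale;

// Hypothetical stand-ins for a base parser and a subclass that overrides
// the base default in its constructor.
public class ParserDefaultsSketch {

    static class BaseParser {
        // The base class defaults to lower-casing expanded (wildcard) terms.
        private boolean lowercaseExpandedTerms = true;

        void setLowercaseExpandedTerms(boolean value) {
            this.lowercaseExpandedTerms = value;
        }

        // Return the term as it would be used for a wildcard search.
        String expand(String term) {
            return lowercaseExpandedTerms ? term.toLowerCase(Locale.ROOT) : term;
        }
    }

    static class SchemaAwareParser extends BaseParser {
        SchemaAwareParser() {
            // The subclass flips the inherited default, so callers that only
            // read the base class source get the wrong impression.
            setLowercaseExpandedTerms(false);
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseParser().expand("MOCV*"));        // mocv*
        System.out.println(new SchemaAwareParser().expand("MOCV*")); // MOCV*
    }
}
```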


Re: wild card search and lower-casing

2011-11-18 Thread Dmitry Kan
OK.

Actually I have just checked the source code of Lucene's QueryParser and
lowercaseExpandedTerms there is set to true by default (version 3.4). The
code there does lower-casing by default. So in that sense I don't need to
do anything in the client code. Is something wrong here?

On Fri, Nov 18, 2011 at 3:49 PM, Ahmet Arslan  wrote:

> > Hi Ahmet,
> >
> > Thanks for the link.
> >
> > I'm a bit puzzled with the explanation found there
> > regarding lower casing:
> >
> > These queries are case-insensitive anyway because
> > QueryParser makes them
> > lowercase.
> >
> > that's exactly what I want to achieve, but somehow the
> > queries *are*
> > case-sensitive. Probably I should play around with code of
> > a query parser.
>
> There is an effort for this :
> https://issues.apache.org/jira/browse/SOLR-218
> You can vote this issue. For the time being you can lowercase them in the
> client side.
>



-- 
Regards,

Dmitry Kan


Re: wild card search and lower-casing

2011-11-18 Thread Ahmet Arslan
> Hi Ahmet,
> 
> Thanks for the link.
> 
> I'm a bit puzzled with the explanation found there
> regarding lower casing:
> 
> These queries are case-insensitive anyway because
> QueryParser makes them
> lowercase.
> 
> that's exactly what I want to achieve, but somehow the
> queries *are*
> case-sensitive. Probably I should play around with code of
> a query parser.

There is an effort for this:
https://issues.apache.org/jira/browse/SOLR-218
You can vote for this issue. For the time being you can lowercase them on the
client side.
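
A minimal sketch of such client-side lower-casing, in plain Java with no Solr client dependency. The class WildcardLowercaser and its method are hypothetical, and the approach is deliberately naive: it splits on single spaces, lower-cases only tokens containing * or ?, and leaves any field-name prefix before the first colon untouched. It assumes the indexed terms were lower-cased in a locale-neutral way, and it does not handle quoted phrases or escaped characters.

```java
import java.util.Locale;

// Hypothetical client-side helper; not part of any Solr client library.
public class WildcardLowercaser {

    // Lower-case only the space-separated tokens that contain a wildcard,
    // so operators such as AND/OR and ordinary terms are left as typed.
    static String lowercaseWildcardTerms(String query) {
        String[] tokens = query.split(" ", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            String token = tokens[i];
            boolean hasWildcard = token.indexOf('*') >= 0 || token.indexOf('?') >= 0;
            if (!hasWildcard) {
                out.append(token);
                continue;
            }
            // Keep a "field:" prefix as-is; lower-case only the term part.
            int colon = token.indexOf(':');
            out.append(token, 0, colon + 1);
            out.append(token.substring(colon + 1).toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: mocv* AND Title:*ocvd
        System.out.println(lowercaseWildcardTerms("MOCV* AND Title:*OCVD"));
    }
}
```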


Re: wild card search and lower-casing

2011-11-18 Thread Dmitry Kan
Hi Ahmet,

Thanks for the link.

I'm a bit puzzled by the explanation found there regarding lower-casing:

"These queries are case-insensitive anyway because QueryParser makes them
lowercase."

That's exactly what I want to achieve, but somehow the queries *are*
case-sensitive. I should probably play around with the code of a query parser.

On Fri, Nov 18, 2011 at 2:50 PM, Ahmet Arslan  wrote:

> > Here is one puzzle I couldn't yet find a key for:
> >
> > for the wild-card query:
> >
> > *ocvd
> >
> > SOLR 3.4 returns hits. But for
> >
> > *OCVD
> >
> > it doesn't
>
> This is a FAQ. Please see
>
>
> http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F
>



-- 
Regards,

Dmitry Kan


Re: wild card search and lower-casing

2011-11-18 Thread Ahmet Arslan
> Here is one puzzle I couldn't yet find a key for:
> 
> for the wild-card query:
> 
> *ocvd
> 
> SOLR 3.4 returns hits. But for
> 
> *OCVD
> 
> it doesn't

This is a FAQ. Please see 

http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F


wild card search and lower-casing

2011-11-18 Thread Dmitry Kan
Hello,

Here is one puzzle I couldn't yet find a key for:

for the wild-card query:

*ocvd

SOLR 3.4 returns hits. But for

*OCVD

it doesn't

On the indexing side, the following two tokenizers/filters are defined:






On the query side:




The SOLR analysis tool shows that OCVD gets lower-cased to ocvd. Does SOLR
skip the lower-casing step when doing the actual wildcard search?

BTW, the same issue occurs for a trailing wildcard:

mocv*

produces hits, while

MOCV*

doesn't. I'd appreciate any help or pointers.


-- 
Regards,

Dmitry Kan