subject:"case sensitivity"

eDisMax and Boolean operator case-sensitivity

2013-11-06 Thread Tom Mortimer

Hi,

I'm using the eDisMax query parser, and need to support the Boolean operators AND
and OR. It seems from testing that these are *not* case sensitive: e.g.
with mm set to 0, oscar AND wilde returns the same results as oscar and
wilde (15 hits), while oscar foo wilde returns the same results as oscar
wilde (2000 hits).

Is it possible to configure eDisMax to do case-sensitive parsing, so that
AND is an operator but and is just another term?

thanks,
Tom

Re: eDisMax and Boolean operator case-sensitivity

2013-11-06 Thread Shawn Heisey



Include another query parameter: lowercaseOperators=false

http://wiki.apache.org/solr/ExtendedDisMax#lowercaseOperators
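
For illustration, here is a minimal Python sketch of passing this parameter along with an eDisMax query. The endpoint URL and the helper name edismax_url are assumptions made for the example, not something from this thread; only defType, mm and lowercaseOperators come from the discussion above.

```python
from urllib.parse import urlencode

# Assumed endpoint for the example; adjust host, port and core for your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def edismax_url(query, lowercase_operators=False, mm="0"):
    """Build a select URL for the eDisMax parser (hypothetical helper).

    With lowercaseOperators=false, only upper-case AND/OR should be read
    as operators, so "and"/"or" are parsed as ordinary terms.
    """
    params = {
        "defType": "edismax",
        "q": query,
        "mm": mm,
        "lowercaseOperators": "true" if lowercase_operators else "false",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


# The two queries from the question, now expected to behave differently.
print(edismax_url("oscar and wilde"))
print(edismax_url("oscar AND wilde"))
```

The same parameter could instead be set as a default on the request handler, so that clients do not have to send it with every request.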

Thanks,
Shawn

Re: eDisMax and Boolean operator case-sensitivity

2013-11-06 Thread Tom Mortimer

Oh, good grief - I was just reading that page, how did I miss that? *derp*

Thanks Shawn!!!

Tom



Re: why does * affect case sensitivity of query results

2013-04-30 Thread Erick Erickson

Actually, look at the referenced JIRA
https://issues.apache.org/jira/browse/SOLR-2438 and you'll see it's
changed in 3.6.

Best
Erick


Re: why does * affect case sensitivity of query results

2013-04-30 Thread geeky2

hello erik,

thank you for the info - yes - i did notice ;)

one more reason for us to upgrade from 3.5.

thx
mark




--
View this message in context: 
http://lucene.472066.n3.nabble.com/why-does-affect-case-sensitivity-of-query-results-tp4059801p406.html
Sent from the Solr - User mailing list archive at Nabble.com.

why does * affect case sensitivity of query results

2013-04-29 Thread geeky2

hello,

environment: solr 3.5


problem statement: when query has * appended, it turns case sensitive.

assumption: query should NOT be case sensitive

actual value in database at time of index: 4387828BULK

here is a snapshot of what works and does not work.

what works:

  itemModelNoExactMatchStr:4387828bULk (and any variation of upper and lower
case letters for *bulk*)

  itemModelNoExactMatchStr:4387828bu*
  itemModelNoExactMatchStr:4387828bul*
  itemModelNoExactMatchStr:4387828bulk*


what does NOT work:

 itemModelNoExactMatchStr:4387828BU*
 itemModelNoExactMatchStr:4387828BUL*
 itemModelNoExactMatchStr:4387828BULK*


below are the specifics of my field and fieldType

<field name="itemModelNoExactMatchStr" type="text_exact" indexed="true"
       stored="true"/>


<fieldType name="text_exact" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

thx
mark






Re: why does * affect case sensitivity of query results

2013-04-29 Thread Alexandre Rafalovitch

http://wiki.apache.org/solr/MultitermQueryAnalysis

Sorry, that is not available in your version of Solr (3.5).

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)



Re: why does * affect case sensitivity of query results

2013-04-29 Thread geeky2

i was looking in Smiley's book, pages 129 and 130.

from the book:


"No text analysis is performed on the search word containing the wildcard,
not even lowercasing. So if you want to find a word starting with Sma, then
sma* is required instead of Sma*, assuming the index side of the field's
type includes lowercasing. This shortcoming is tracked on SOLR-219. Moreover,
if the field that you want to use the wildcard query on is stemmed in the
analysis, then smashing* would not find the original text Smashing because
the stemming process transforms this to smash. Consequently, don't stem."


thx
mark





Re: why does * affect case sensitivity of query results

2013-04-29 Thread geeky2

here is the jira link:

https://issues.apache.org/jira/browse/SOLR-219






Re: Solr Case-sensitivity issue with search field name

2013-03-01 Thread hyrax

Hi Shawn,
Thanks for your reply.
So you mean the field name can't be case-insensitive when specified in a
query?
I'm gonna stop doing research on this issue if this is confirmed...
Thanks,
Hyrax




Re: Solr Case-sensitivity issue with search field name

2013-03-01 Thread hyrax

Hi wunder,
Great advice!
As a matter of fact, I chose to use upper case because of the documents I
indexed, but it is really a pain in the ass to type the field names all in
upper case.
I thought there would probably be a way to make field names case-insensitive.
I was wrong, wasn't I?
Thanks,
Hyrax




Solr Case-sensitivity issue with search field name

2013-02-28 Thread hyrax

Hi guys,

I'm using Solr 4.0 and I recently noticed an issue that bothers me a lot:
if you define a field in your schema named 'HOST', then in the
query you have to specify this field as 'HOST'; if you use 'host', it
throws an 'undefined field' error.

I have done some googling, but I only found a JIRA ticket which says this
issue had been fixed: https://issues.apache.org/jira/browse/SOLR-873

I know I can use copyField to accomplish this, but I'm wondering if there is a
way to apply this change to all the fields on the fly, not one by one ...

Many many thanks in advance!
Thanks,
Hyrax




Re: Solr Case-sensitivity issue with search field name

2013-02-28 Thread Shawn Heisey



It appears that the issue you have linked is specific to the dataimport 
handler (importing from a database or another structured data source), 
not searching.  I've always read that fields in a Solr schema are case 
sensitive.


My own recommendation is that you pick a standard, either all uppercase 
or all lowercase, and that you stick with it.  I prefer all lowercase 
myself.


Thanks,
Shawn

Re: Solr Case-sensitivity issue with search field name

2013-02-28 Thread Walter Underwood

Lower case is safer than upper case. For Unicode, uppercasing is a lossy
conversion. There are sets of different lower case characters that convert to
the same upper case character. When you convert back to lower case, you don't
know which one it was originally.

Always use lower case for text. That avoids some really subtle bugs.
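
A small Python sketch can make the lossy round trip concrete. The three character pairs below are illustrative picks from Unicode case mapping as Python's str.upper() and str.lower() implement it; they are not examples given in this thread, and the names LOSSY_PAIRS and round_trip are made up for the sketch.

```python
# Pairs of distinct lower-case characters that share one upper-case form.
LOSSY_PAIRS = [
    ("s", "\u017f"),       # s and LATIN SMALL LETTER LONG S -> "S"
    ("i", "\u0131"),       # i and LATIN SMALL LETTER DOTLESS I -> "I"
    ("\u03c3", "\u03c2"),  # Greek sigma and final sigma -> capital sigma
]


def round_trip(text):
    """Upper-case the text, then lower-case it again."""
    return text.upper().lower()


for a, b in LOSSY_PAIRS:
    # Both characters collapse to the same upper-case form ...
    assert a != b and a.upper() == b.upper()
    # ... so lower-casing again cannot tell which one was there originally.
    assert round_trip(b) == a
    print(repr(b), "->", repr(b.upper()), "->", repr(round_trip(b)))
```

Going the other way (normalizing everything to lower case once and never uppercasing) avoids this particular ambiguity, which is the point of the advice above.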

wunder


Re: Text field case sensitivity problem

2011-06-30 Thread Jamie Johnson

I'm not familiar with the CharFilters, I'll look into those now.

Is the solr.LowerCaseFilterFactory not handling wildcards the expected
result or is this a bug?

On Wed, Jun 15, 2011 at 4:34 PM, Mike Sokolov soko...@ifactory.com wrote:
 I wonder whether CharFilters are applied to wildcard terms?  I suspect they
 might be.  If that's the case, you could use the MappingCharFilter to
 perform lowercasing (and strip diacritics too if you want that)

 -Mike

 On 06/15/2011 10:12 AM, Jamie Johnson wrote:

 So simply lowercasing the query works, but it can get complex.  The query that I'm
 executing may have things like ranges, which require some words to be upper
 case (e.g. TO).  I think this would be much better solved on Solr's end; is
 there a JIRA about this?

 On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov soko...@ifactory.com wrote:

 oops, please s/Highlight/Wildcard/

 On 06/14/2011 05:31 PM, Mike Sokolov wrote:

 Wildcard queries aren't analyzed, I think?  I'm not completely sure what
 the best workaround is here: perhaps simply lowercasing the query terms
 yourself in the application.  Also - I hope someone more knowledgeable will
 say that the new HighlightQuery in trunk doesn't have this restriction, but
 I'm not sure about that.

 -Mike

 On 06/14/2011 05:13 PM, Jamie Johnson wrote:

 Also of interest to me: this returns results
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine


 On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnson jej2...@gmail.com wrote:

 I am using the following for my text field:

 <fieldType name="text" class="solr.TextField"
            positionIncrementGap="100" autoGeneratePhraseQueries="true">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
     -->
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
             protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
             protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
 </fieldType>

 I have a field defined as
 <field name="Person_Name" type="text" stored="true" indexed="true" />

 When I go to the following URL I get results:
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
 but if I do
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
 I get nothing.  I thought the LowerCaseFilterFactory would have handled
 lowercasing both the query and what is being indexed; am I missing
 something?

Re: Text field case sensitivity problem

2011-06-30 Thread Jamie Johnson

I think my answer is here...

"On wildcard and fuzzy searches, no text analysis is performed on the
search word."

taken from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers



Re: Text field case sensitivity problem

2011-06-30 Thread Mike Sokolov

Yes, after posting that response, I read some more and came to the same 
conclusion... there seems to be some interest on the dev list in 
building a capability to specify an analysis chain for use with wildcard 
and related queries, but it doesn't exist now.


-Mike


Re: Text field case sensitivity problem

2011-06-30 Thread Erik Hatcher

Jamie - there is a JIRA about this, at least one: 
https://issues.apache.org/jira/browse/SOLR-218

Erik
 

Re: Text field case sensitivity problem

2011-06-30 Thread Mike Sokolov


Yes, and this too: https://issues.apache.org/jira/browse/SOLR-219


Re: Text field case sensitivity problem

2011-06-15 Thread Jamie Johnson

So simply lowercasing the query works, but it can get complex.  The query that I'm
executing may have things like ranges, which require some words to be upper
case (e.g. TO).  I think this would be much better solved on Solr's end; is
there a JIRA about this?
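
As a rough illustration of the application-side workaround and why it gets complex, here is a minimal Python sketch that lowercases query text while leaving operator keywords and field names alone. The function name, the operator list and the tokenizing regex are assumptions made for this example. It deliberately ignores quoted phrases, escaped characters and values that themselves contain colons (such as dates), so it is a starting point rather than a complete query rewriter.

```python
import re

# Upper-case keywords that must keep their case (assumed list for this sketch).
OPERATORS = {"AND", "OR", "NOT", "TO"}

# Split on whitespace and range/grouping brackets, keeping the separators.
_SPLIT = re.compile(r"(\s+|[\[\]{}()])")


def _lower_token(tok):
    """Lower-case one token, preserving operators and field names."""
    if tok in OPERATORS:
        return tok
    field, sep, value = tok.partition(":")
    if sep:
        # Field names are case sensitive, so only the value is lower-cased.
        return field + sep + value.lower()
    return tok.lower()


def lowercase_terms(query):
    """Lower-case search terms in a simple Lucene-style query string."""
    return "".join(_lower_token(tok) for tok in _SPLIT.split(query))


print(lowercase_terms("Person_Name:Kris*"))
print(lowercase_terms("Person_Name:Kris* AND Title:[Alpha TO Omega]"))
```

This only makes sense for fields whose index-time analysis lowercases its tokens, as the text field type in this thread does.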

On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov soko...@ifactory.com wrote:

 opps, please s/Highlight/Wildcard/


 On 06/14/2011 05:31 PM, Mike Sokolov wrote:

 Wildcard queries aren't analyzed, I think?  I'm not completely sure what
 the best workaround is here: perhaps simply lowercasing the query terms
 yourself in the application.  Also - I hope someone more knowledgeable will
 say that the new HighlightQuery in trunk doesn't have this restriction, but
 I'm not sure about that.

 -Mike

 On 06/14/2011 05:13 PM, Jamie Johnson wrote:

 Also of interest to me is this returns results
 http://localhost:8983/solr/select?defType=luceneq=Person_Name:Kristine



Re: Text field case sensitivity problem

2011-06-15 Thread Mike Sokolov

I wonder whether CharFilters are applied to wildcard terms?  I suspect 
they might be.  If that's the case, you could use the MappingCharFilter 
to perform lowercasing (and strip diacritics too if you want that)


-Mike

On 06/15/2011 10:12 AM, Jamie Johnson wrote:
So simply lowercasing works, but it can get complex.  The query that
I'm executing may have things like ranges, which require some words to
be upper case (e.g. TO).  I think this would be much better solved on
Solr's end; is there a JIRA about this?


On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov soko...@ifactory.com 
mailto:soko...@ifactory.com wrote:


opps, please s/Highlight/Wildcard/


On 06/14/2011 05:31 PM, Mike Sokolov wrote:

Wildcard queries aren't analyzed, I think?  I'm not completely
sure what the best workaround is here: perhaps simply
lowercasing the query terms yourself in the application.  Also
- I hope someone more knowledgeable will say that the new
HighlightQuery in trunk doesn't have this restriction, but I'm
not sure about that.

-Mike

On 06/14/2011 05:13 PM, Jamie Johnson wrote:

Also of interest to me is this returns results

http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine



Text field case sensitivity problem

2011-06-14 Thread Jamie Johnson

I am using the following for my text field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I have a field defined as
   <field name="Person_Name" type="text" stored="true" indexed="true" />

When I go to the following URL I get results:
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
but if I do
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
I get nothing.  I thought the LowerCaseFilterFactory would have handled
lowercasing both the query and what is being indexed; am I missing
something?

Re: Text field case sensitivity problem

2011-06-14 Thread Jamie Johnson

Also of interest to me is this returns results
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine



Re: Text field case sensitivity problem

2011-06-14 Thread Mike Sokolov

Wildcard queries aren't analyzed, I think?  I'm not completely sure what 
the best workaround is here: perhaps simply lowercasing the query terms 
yourself in the application.  Also - I hope someone more knowledgeable 
will say that the new HighlightQuery in trunk doesn't have this 
restriction, but I'm not sure about that.


-Mike

On 06/14/2011 05:13 PM, Jamie Johnson wrote:

Also of interest to me is this returns results
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine



RE: Text field case sensitivity problem

2011-06-14 Thread Bob Sandiford

Unfortunately, wildcard search terms don't get processed by the analyzers.

One suggestion that's fairly common is to make sure you lowercase your
wildcard search terms yourself before issuing the query.
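
The client-side lowercasing suggested here can be sketched roughly as follows. This is only an illustration in Python, not anything Solr provides: the function name and the operator list are assumptions, and it deliberately leaves upper-case keywords such as AND, OR, NOT and TO untouched, since range queries need TO in upper case. It splits on whitespace only, so quoted phrases that contain wildcards are not handled.

```python
import re

# Upper-case keywords that the Lucene query syntax treats as operators
# (assumed list for this sketch).
OPERATORS = {"AND", "OR", "NOT", "TO"}


def lowercase_wildcard_terms(query: str) -> str:
    """Lowercase only the whitespace-separated tokens that contain * or ?."""

    def fix(match):
        token = match.group(0)
        if token in OPERATORS:
            return token  # keep operators exactly as typed
        if "*" not in token and "?" not in token:
            return token  # non-wildcard terms are analyzed by Solr anyway
        # Keep the field name (before the last colon) as-is, lowercase the term.
        field, sep, term = token.rpartition(":")
        return field + sep + term.lower()

    return re.sub(r"\S+", fix, query)


print(lowercase_wildcard_terms("Person_Name:Kris* AND Age:[10 TO 20]"))
```

Only the wildcard term is changed; the field name, the AND operator and the range clause are passed through as typed.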

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

 -Original Message-
 From: Jamie Johnson [mailto:jej2...@gmail.com]
 Sent: Tuesday, June 14, 2011 5:13 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Text field case sensitivity problem
 
 Also of interest to me is this returns results
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine
 
 

Re: Text field case sensitivity problem

2011-06-14 Thread Mike Sokolov


oops, please s/Highlight/Wildcard/

On 06/14/2011 05:31 PM, Mike Sokolov wrote:
Wildcard queries aren't analyzed, I think?  I'm not completely sure 
what the best workaround is here: perhaps simply lowercasing the query 
terms yourself in the application.  Also - I hope someone more 
knowledgeable will say that the new HighlightQuery in trunk doesn't 
have this restriction, but I'm not sure about that.


-Mike

On 06/14/2011 05:13 PM, Jamie Johnson wrote:

Also of interest to me is this returns results
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine



How to ignore whitespace/ case sensitivity with dedupe

2011-05-28 Thread tinman

Hi all,

I've followed the instructions at this link
http://wiki.apache.org/solr/Deduplication and got the basic dedupe field
working. However, it doesn't seem to ignore case differences or whitespace
differences, even though I've defined the type of the fields to be
used for dedupe, as well as the signature field, as follows in schema.xml:

<fieldType autoGeneratePhraseQueries="true" class="solr.TextField"
           name="text_ws_lower" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="name" type="text_ws_lower"/>
<field name="signatureField" type="text_ws_lower"/>

and in the solrconfig.xml

<updateRequestProcessorChain name="dedupe">
  <processor
      class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureField">signatureField</str>
    <str name="fields">name</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

I know a possible solution is to lowercase and remove whitespace from the
name field before submitting documents to Solr, but are there any other
alternatives, so that when the following data is given
Name: "JOHN SMITH" and "jOhn  SMITh" the documents have the same outcome in
signatureField?

Thanks heaps
Cheers
tinman







--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-ignore-whitespace-case-sensitivity-with-dedupe-tp2997624p2997624.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to ignore whitespace/ case sensitivity with dedupe

2011-05-28 Thread Koji Sekiguchi


(11/05/29 8:47), tinman wrote:

I know a possible solution is to lowercase and remove whitespace from the
name field before submitting documents to Solr, but are there any other
alternatives, so that when the following data is given
Name: "JOHN SMITH" and "jOhn  SMITh" the documents have the same outcome in
signatureField?


I can't believe this. Those signatures should be different.

Are you sure you see the same signatures in signatureField (it should be
stored=true in order to see the result of the signature)? Or did you just see
that those duplicate documents were registered, without checking
signatureField yourself? If the latter, it is a feature: you set
overwriteDupes=false, which means the duplication check works on the
uniqueKey field.

koji
--
http://www.rondhuit.com/en/

Re: How to ignore whitespace/ case sensitivity with dedupe

2011-05-28 Thread tinman

By default, stored = true and indexed = true. In any case, this is example
output from the Solr search console.

<result name="response" numFound="2" start="0">
  <doc>
    <str name="id">1234</str>
    <str name="name">JOHN   SMITH </str>
    <str name="signatureField">5430fbe9e6374611</str></doc>
  <doc>
    <str name="id">1233</str>
    <str name="name">   john SMITh</str>
    <str name="signatureField">49867a7835ff6741</str></doc>
</result>

As you can see, the 2 signature fields are different. And I want
overwriteDupes=false, as I want to use field collapsing to remove duplicates
at query time.
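
The differing signatures are consistent with the signature being computed from the raw submitted value rather than from the analyzed tokens; as far as I understand, update processors run before the field type's analyzer, though that is an assumption here rather than something the thread confirms. The workaround tinman mentions, normalizing the value before it is sent, can be sketched in Python as below. The function names are made up for the example, and MD5 from hashlib is only a stand-in for Solr's Lookup3Signature, so the hex values will not match the ones shown above.

```python
import hashlib


def normalize_name(value: str) -> str:
    """Trim, collapse runs of whitespace to one space, and lowercase."""
    return " ".join(value.split()).lower()


def illustrative_signature(value: str) -> str:
    """Hash the normalized value (MD5 is a stand-in, not Lookup3Signature)."""
    return hashlib.md5(normalize_name(value).encode("utf-8")).hexdigest()


# The two names from the search output above now hash identically.
a = illustrative_signature("JOHN   SMITH ")
b = illustrative_signature("   john SMITh")
print(a == b)
```

One way to use this without changing what is displayed would be to send the normalized string in a separate field and list that field in the processor's fields setting, keeping the original name field untouched.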

Thanks
tinman


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-ignore-whitespace-case-sensitivity-with-dedupe-tp2997624p2997738.html
Sent from the Solr - User mailing list archive at Nabble.com.

DataImportHandler - case sensitivity of column names

2010-02-08 Thread Alexey Serba

I encountered a problem with Oracle converting column names to upper
case. As a result, SolrInputDocument is created with field names in
upper case and a "Document [null] missing required field: id" exception
is thrown (although the ID field is defined).

I do not specify field elements explicitly.

I know that I can rewrite all my queries to the
select id as "id", body as "body" from document
format, but is there any other workaround for
this? A case-insensitive option or something?

Here's my data-config:
<dataConfig>
  <dataSource convertType="true"
      driver="oracle.jdbc.driver.OracleDriver" password="oracle"
      url="jdbc:oracle:thin:@localhost:1521:xe" user="SYSTEM"/>
  <document name="items">
    <entity name="root" pk="id" preImportDeleteQuery="db:db1"
        query="select id, body from document"
        transformer="TemplateTransformer">
      <entity name="nested1" query="select category from
          document_category where doc_id='${root.id}'"/>
      <entity name="nested2" query="select tag from document_tag where
          doc_id='${root.id}'"/>
      <field column="db" template="db1"/>
    </entity>
  </document>
</dataConfig>

Alexey

Re: DataImportHandler - case sensitivity of column names

2010-02-08 Thread Shalin Shekhar Mangar


Fields are imported in a case-insensitive manner as long as they are not
specified explicitly. In this case, however, the problem is that
${root.id} is case sensitive. There is no way right now to resolve variables
in a case-insensitive manner.
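
The difference between the two kinds of lookup can be illustrated with a small Python sketch. This is not DataImportHandler code; the function names and the sample row are invented for the example, and it only mimics what happens when Oracle reports an unquoted column as ID while the template asks for id.

```python
def resolve_case_sensitive(row: dict, column: str):
    """Exact-match lookup, analogous to how ${root.id} is described above."""
    return row.get(column)


def resolve_case_insensitive(row: dict, column: str):
    """Hypothetical lookup that ignores case (not offered for variables)."""
    lowered = {key.lower(): value for key, value in row.items()}
    return lowered.get(column.lower())


# A row shaped the way Oracle reports unquoted column names (upper case).
oracle_row = {"ID": 42, "BODY": "some text"}

print(resolve_case_sensitive(oracle_row, "id"))    # nothing is found
print(resolve_case_insensitive(oracle_row, "id"))  # finds the ID column
```

With the exact-match lookup the nested queries would receive no value for doc_id, which is why aliasing the columns to quoted lower-case names in the SQL (as Alexey describes) makes the variable resolve.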

-- 
Regards,
Shalin Shekhar Mangar.

documentation deficiency : case sensitivity of boolean operators

2009-09-15 Thread Jonathan Vanasco


I couldn't find this anywhere on solr's docs / faq

i finally found a reference on lucene
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

this should really be added somewhere.  i'm not sure where, but I  
thought this was worth bringing up to the list -- as it really  
confused the hell out of me :)

Re: documentation deficiency : case sensitivity of boolean operators

2009-09-15 Thread Chris Hostetter


: Subject: documentation deficiency : case sensitivity of boolean operators
: 
: I couldn't find this anywhere on solr's docs / faq

if you have suggestions on places to add it, feel free to update the wiki.

(most of the documentation is deliberately agnostic to the specifics of the 
query parser syntax, instead relying on links to point you to the same 
reference URL you found ... so I can't actually think of anywhere in the 
Solr docs that mentions the AND/OR/NOT syntax where it would make sense to 
clarify this)

-Hoss

Re: documentation deficiency : case sensitivity of boolean operators

2009-09-15 Thread Yonik Seeley

That's already linked from
http://wiki.apache.org/solr/SolrQuerySyntax

-Yonik
http://www.lucidimagination.com


On Tue, Sep 15, 2009 at 5:38 PM, Jonathan Vanasco jvana...@2xlp.com wrote:
 I couldn't find this anywhere on solr's docs / faq

 i finally found a reference on lucene
        http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

 this should really be added somewhere.  i'm not sure where, but I thought
 this was worth bringing up to the list -- as it really confused the hell out
 of me :)

solr field types and case sensitivity

2007-12-18 Thread Dryganets Sergey


Can I change the query analyzer for a particular request to Solr?
I.e., I want to add an option on my site to choose whether a search request
is case-sensitive or not, but I can't find a good solution ...

I think that creating duplicates (index-only fields with different analyzer
configurations) for each field is a bad idea ...

Maybe someone knows a good solution to this problem?

-- 
View this message in context: 
http://www.nabble.com/solr-field-types-and-case-sensitivity-tp14395912p14395912.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: solr field types and case sensitivity

2007-12-18 Thread Ryan McKinley


Dryganets Sergey wrote:

Can I change the query analyzer for a particular request to Solr?
I.e., I want to add an option on my site to choose whether a search request
is case-sensitive or not, but I can't find a good solution ...

I think that creating duplicates (index-only fields with different analyzer
configurations) for each field is a bad idea ...



yes, you would index a field twice - once with a LowerCaseFilter and 
once without.  That is a good solution.
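
On the client side, the two-field approach comes down to choosing which field to query. A minimal Python sketch, assuming the schema has a lowercased field called name and a hypothetical copy called name_exact that is indexed without a LowerCaseFilter (both field names, the function name and the base URL are assumptions for illustration):

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, term: str, case_sensitive: bool) -> str:
    """Pick the field by search mode and build a select URL."""
    # name_exact: hypothetical field indexed without lowercasing
    # name:       hypothetical field indexed with a LowerCaseFilter
    field = "name_exact" if case_sensitive else "name"
    params = {"defType": "lucene", "q": f"{field}:{term}"}
    return base_url + "?" + urlencode(params)


print(build_select_url("http://localhost:8983/solr/select", "Wilde", True))
print(build_select_url("http://localhost:8983/solr/select", "Wilde", False))
```

The per-request switch then lives entirely in the application, and the analyzers stay fixed per field.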


ryan

Re: solr field types and case sensitivity

2007-12-18 Thread Dryganets Sergey




ryantxu wrote:
 
 yes, you would index a field twice - once with a LowerCaseFilter and 
 once without.  That is a good solution.
 

Hm... 
So I should create n*n indexes, where n is the number of search options ...

Can I copy fields automatically?  

For example, I have a field named "name" and a subset of fields with
prefixes or suffixes, so can I use a regexp to copy the field?

Or maybe I can describe a copy-field policy for a fieldType (for me this
solution would be better: less effort to add a new search option).

-- 
View this message in context: 
http://www.nabble.com/solr-field-types-and-case-sensitivity-tp14395912p14411420.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Erick Erickson

DISCLAIMER: This is from a Lucene-centric viewpoint. That said, this may be
useful.

For your line number, page number etc perspective, it is possible to index
special guaranteed-to-not-match tokens then use the termdocs/termenum
data, along with SpanQueries to figure this out at search time. For
instance, coincident with the last term in each line, index the token "$".
Coincident with the last token of every paragraph index the token "#". If
you get the offsets of the matching terms, you can quite quickly simply
count the number of line and paragraph tokens using TermDocs/TermEnums and
correlate hits to lines and paragraphs. The trick is to index your special
tokens with an increment of 0 (see SynonymAnalyzer in Lucene In Action for
more on this).


Another possibility is to add a special field with each document with the
offsets of each end-of-sentence and end-of-paragraph (stored, not indexed).
Again, given the offsets, you can read in this field and figure out what
line/paragraph your hits are in.
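
The lookup step for the stored-offsets idea can be sketched in Python as below. It assumes the stored field has already been read back as two sorted lists of end offsets; the function name and the sample numbers are invented for the example. The hit is located by counting how many boundaries fall strictly before it.

```python
from bisect import bisect_left


def locate(hit_offset: int, sentence_ends: list, paragraph_ends: list) -> tuple:
    """Return 1-based (paragraph, sentence) numbers for a hit offset.

    Both lists must be sorted. A hit that falls exactly on an end offset
    still belongs to the sentence or paragraph that ends there.
    """
    sentence = bisect_left(sentence_ends, hit_offset) + 1
    paragraph = bisect_left(paragraph_ends, hit_offset) + 1
    return paragraph, sentence


# Three sentences ending at offsets 4, 9 and 15; the first paragraph
# ends with the second sentence, the second paragraph with the third.
sentence_ends = [4, 9, 15]
paragraph_ends = [9, 15]

print(locate(5, sentence_ends, paragraph_ends))
print(locate(10, sentence_ends, paragraph_ends))
```

Sentence numbers here run across the whole document rather than restarting in each paragraph; mapping them to page:line references would need the same kind of list for page and line boundaries.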

How suitable either of these is depends on a lot of characteristics of your
particular problem space. I'm not sure either of them is suitable for very
high volume applications.

Also, I'm approaching this from an in-the-guts-of-lucene perspective, so
don't even *think* of asking me how to really make this work in SOLR <G>.

Best
Erick

On Nov 11, 2007 12:44 AM, David Neubert [EMAIL PROTECTED] wrote:

 Ryan (and others who need something to put them to sleep :) )

 Wow -- the light-bulb finally went off -- the Analyzer admin page is very
 cool -- I just was not at all thinking the SOLR/Lucene way.

 I need to rethink my whole approach now that I understand (from reviewing
 the schema.xml closer and playing with the Analyzer) how compatible index
 and query policies can be applied automatically on a field by field basis by
 SOLR at both index and query time.

 I still may have a stumper here, but I need to give it some thought, and
 may return again with another question:

 The problem is that my text is book text (fairly large) that looks very
 much like one would expect:
 <book>
 <chapter>
 <para><sen>...</sen><sen></sen></para>
 <para><sen>...</sen><sen></sen></para>
 <para><sen>...</sen><sen>...</sen></para>
 </chapter>
 </book>

 The search results need to return exact sentences or paragraphs with their
 exact page:line numbers (which is available in the embedded markup in the
 text).

 There were previous responses by others, suggesting I look into payloads,
 but I did not fully understand that -- I may have to re-read those e-mails
 now that I am getting a clearer picture of SOLR/Lucene.

 However, the reason I resorted to indexing each paragraph as a single
 document, and then redundantly indexing each sentence as a single document,
 is because I was planning on pre-parsing the text myself (outside of SOLR)
 -- and feeding separate doc elements to the add because in that way I
 could produce the page:line reference in the pre-parsing (again outside of
 SOLR) and feed it in as explict field in the doc elements of the add
 requests.  Therefore at query time, I will have the exact page:line
 corresponding to the start of the paragraph or sentence.

 But I am beginning to suspect, I was planning to do a lot of work that
 SOLR can do for me.

 I will continue to study this and respond when I am a bit clearer, but the
 closer I could get to just submitting the books a chapter at a time -- and
 letting SOLR do the work, the better (cause I have all the books in well
 formed xml at chapter levels).  However, I don't  see yet how I could get
 par/sen granular search result hits, along with their exact page:line
 coordinates unless I approach it by explicitly indexing the pars and sens as
 single documents, not chapters hits, and also return the entire text of the
 sen or par, and highlight the keywords within (for the search result hit).
  Once a search result hit is selected, it would then act as expected and
 position into the chapter, at the selected reference, highlight again the
 key words, but this time in the context of an entire chapter (the whole
 document to the user's mind).

 Even with my new understanding you (and others) have given me, which I can
 use to certainly improve my approach -- it still seems to me that because
 multi-valued fields concatenate text -- even if you use the
 positionGapIncrment feature to prohibit unwanted phrase matches, how do you
 produce a well definied search result hit, bounded by the exact sen or par,
 unless you index them as single documents?

 Should I still read up on the payload discussion?

 Dave




 - Original Message 
 From: Ryan McKinley [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Saturday, November 10, 2007 5:00:43 PM
 Subject: Re: Redundant indexing * 4 only solution (for par/sen and case
 sensitivity)


 David Neubert wrote:
  Ryan,
 
  Thanks for your response.  I infer from your response that you can
  have a different analyzer for each field

 yes

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread David Neubert

Erik,

Probably because of my newness to SOLR/Lucene, I see now what you/Yonik meant
by a "case" field, but I am not clear about your wording "per-book setting
attached at index time" -- would you mind elaborating on that, so I am clear?

Dave


 __
 Do You Yahoo!?
 Tired of spam?  Yahoo! Mail has the best spam protection around
 http://mail.yahoo.com







Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread David Neubert

Erik - thanks, I am considering this approach, versus explicit redundant
indexing -- and am also considering Lucene -- problem is, I am one week into
both technologies (though I have years in the search space) -- wish I could go to
Hong Kong -- any discounts available anywhere :)

Dave

- Original Message 
From: Erick Erickson [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, November 12, 2007 2:11:14 PM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case 
sensitivity)

DISCLAIMER: This is from a Lucene-centric viewpoint. That said, this may be
useful.

For your line number, page number etc perspective, it is possible to index
special guaranteed-to-not-match tokens, then use the termdocs/termenum
data, along with SpanQueries, to figure this out at search time. For
instance, coincident with the last term in each line, index the token $.
Coincident with the last token of every paragraph, index the token #. If you
get the offsets of the matching terms, you can quickly count the number
of line and paragraph tokens using TermDocs/TermEnums and correlate hits
to lines and paragraphs. The trick is to index your special tokens with an
increment of 0 (see SynonymAnalyzer in Lucene In Action for more on this).

Another possibility is to add a special field to each document with the
offsets of each end-of-sentence and end-of-paragraph (stored, not indexed).
Again, given the offsets, you can read in this field and figure out what
line/paragraph your hits are in.

How suitable either of these is depends on a lot of characteristics of your
particular problem space. I'm not sure either of them is suitable for very
high volume applications.

Also, I'm approaching this from an in-the-guts-of-lucene perspective, so
don't even *think* of asking me how to really make this work in SOLR <G>.

Best
Erick

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Yonik Seeley

On Nov 12, 2007 2:20 PM, David Neubert [EMAIL PROTECTED] wrote:
> Erik - thanks, I am considering this approach, versus explicit redundant
> indexing -- and am also considering Lucene -

There's not a well defined solution in either IMO.

> - problem is, I am one week into both technologies (though have years in the
> search space) -- wish I could
> go to Hong Kong -- any discounts available anywhere :)

Unfortunately the OS Summit has been canceled.

-Yonik

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Chris Hostetter


:  - problem is, I am one week into both technologies (though have years in
:  the search space) -- wish I could
:  go to Hong Kong -- any discounts available anywhere :)
: 
: Unfortunately the OS Summit has been canceled.

Or rescheduled to 2008 ... depending on whether you are a half-empty /
half-full kind of person.

And let's not forget Atlanta ... starting today and all...

http://us.apachecon.com/us2007/



-Hoss

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-11 Thread Erik Hatcher

Solr query syntax is documented here:
http://wiki.apache.org/solr/SolrQuerySyntax

What Yonik is referring to is creating your own "case" field with the
per-book setting attached at index time.


Erik



Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert

Hi all,

Using SOLR, I believe I have to index the same content 4 times (not desirable)
into 2 indexes -- and I don't know how you can practically do multiple indexes
in SOLR (if indeed there is no better solution than 4 indexing runs into two
indexes)?

My need is case-sensitive and case-insensitive searches over well formed XML
content (books), performing exact searches at the paragraph and sentence levels 
-- no errors over approximate boundaries -- the source content has exact 
par/sen tags.

I have already proven a pretty nice solution for par/sen indexing twice into
the same index in SOLR.  I have added a "tags" field, and put correlative XML
tags (comma delimited) into this field (one of which is either a para or sen
flag), which flags the (partial) document as a paragraph or sentence.  Thus all
paragraphs of the book are indexed as single documents (with their sentences
combined and concatenated) and then all sentences in the book are indexed again
as single documents.  Both go into the same SOLR index. I just add an AND
tags:para or tags:sen to my search and everything works fine.
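As a rough sketch of that pre-parsing step: the snippet below walks a chapter and emits one record per paragraph and one per sentence, each flagged in a tags field. The element names `para` and `sen`, the `ref` attribute carrying page:line, and the field names `ref` and `content` are all assumptions made for illustration; the real markup is not shown in the thread. The tags value is simplified to a single flag rather than a comma-delimited list.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter markup; the ref attribute stands in for page:line.
CHAPTER = """<chapter>
  <para ref="12:3"><sen ref="12:3">Oscar Wilde wrote plays.</sen><sen ref="12:4">He was Irish.</sen></para>
</chapter>"""


def build_docs(chapter_xml):
    """Return one dict per paragraph and one per sentence, ready to be
    turned into documents for an add request."""
    docs = []
    root = ET.fromstring(chapter_xml)
    for para in root.iter("para"):
        sentences = [(s.get("ref"), (s.text or "").strip())
                     for s in para.iter("sen")]
        # Paragraph document: its sentences concatenated.
        docs.append({
            "tags": "para",
            "ref": para.get("ref"),
            "content": " ".join(text for _, text in sentences),
        })
        # Sentence documents: the same text indexed again, one per sentence.
        for ref, text in sentences:
            docs.append({"tags": "sen", "ref": ref, "content": text})
    return docs


for doc in build_docs(CHAPTER):
    print(doc)
```

A query restricted with tags:para or tags:sen then only sees the records of the chosen granularity, at the price of storing every sentence twice.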

The obvious downside to this approach is the 2X indexing, but it does execute 
quite nicely on a single Index using SOLR. This obviously doesn't scale nicely, 
but will do for quite a while probably.

I thought I could live with that

But then I moved on to case sensitive and case-insensitive searches, and my 
research so far is pointing to one index for each case.

So now I have:
(1) 4X in content indexing
(2) 2X in actual SOLR/Lucene indices
(3) I don't know how to practically do multiple indices using SOLR?

If there is a better way of attacking this problem, I would appreciate 
recommendations!!!

Also, I don't know how to do multiple indices in SOLR -- I have heard it might
be available in 1.3.0?  If this is my only recourse, please advise me where
really good documentation is available on building 1.3.0.  I am not admin
savvy, but I did succeed in getting SOLR up myself and navigating through it
with the help of this forum.  But I have heard that building 1.3.0 (as opposed to
downloading and installing it, like in 1.2.0) is a whole different experience
and much more complex.

Thanks

Dave






Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert

Ryan,

Thanks for your response.  I infer from your response that you can have a 
different analyzer for each field -- I guess I should have figured that out 
--but because I had not thought of that, I concluded that  I needed multiple 
indices (sorry , I am still very new to Solr/Lucene).  

Does such an approach make querying difficult under the following condition?

The app that I am replacing (and trying to enhance) has the ability to search
multiple books at once with sen/par and case sensitivity settings individually
selectable per book (e.g. default search modes per book).  So with a single
query request (just the query word(s)), you can search one book by par, with
case, another by sen w/o case, etc. -- all settable as user defaults.  I need
to try to figure out how to match that in Solr/Lucene -- I believe that the
Analyzer approach you suggested requires the use of the same Analyzer at query
time that was used during indexing.   So if I am hitting multiple fields (in
the same search request) that invoke different Analyzers -- am I at a dead end,
and have to resort to consecutive multiple queries instead (and sort/merge
results afterwards)?  Or am I just overcomplicating this?

Dave

- Original Message 
From: Ryan McKinley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Saturday, November 10, 2007 2:18:00 PM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case 
sensitivity)



 So now I have:
 (1) 4X in content indexing
 (2) 2X in actual SOLR/Lucene indices
 (3) I don't know how to practically due multiple indices using SOLR?
 
 If there is a better way of attacking this problem, I would
 appreciate recommendations!!!
 

I don't quite follow your current approach, but it sounds like you just
need some copyFields to index the same content with multiple analyzers.

for example, say you have fields:

  <field name="content" type="string" indexed="true" stored="true"/>
  <field name="content_sentence" type="sentence" indexed="true" stored="false"/>
  <field name="content_paragraph" type="paragraph" indexed="true" stored="false"/>
  <field name="content_text" type="text" indexed="true" stored="false"/>

and copy fields:

   <copyField source="content" dest="content_sentence"/>
   <copyField source="content" dest="content_paragraph"/>
   <copyField source="content" dest="content_text"/>


The 4X indexing cost?  If you *need* to index the content 4 different 
ways, you don't have any way around that - do you?  But is it really a 
big deal?  How often does it need to index?  How big is the data?

I'm not quite following your need for multiple Solr indices, but in 1.3
it is possible.

ryan






Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread Yonik Seeley

On Nov 10, 2007 4:24 PM, David Neubert [EMAIL PROTECTED] wrote:
> So if I am hitting multiple fields (in the same search request) that invoke
> different Analyzers -- am I at a dead end, and have to resort to consecutive
> multiple queries instead

Solr handles that for you automatically.

> The app that I am replacing (and trying to enhance) has the ability to search
> multiple books at once
> with sen/par and case sensitivity settings individually selectable per book

You could easily select case sensitivity or not *per query* across all books.
You should step back and see what the requirements actually are (i.e.
the reasons why one needs to be able to select case
sensitive/insensitive on a book level... it doesn't make sense to me
at first blush).

It could be done on a per-book level in solr with a more complex query
structure though...

(+case:sensitive +(normal relevancy query on the case sensitive fields
goes here)) OR (+case:insensitive +(normal relevancy query on the case
insensitive fields goes here))
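One way to picture that structure is a small helper that builds the query string. This is only a sketch: the `case` field comes from the message above, while the field names `content_cs` and `content_ci` and the function name are hypothetical, and the user query is inserted without any escaping.

```python
def per_book_case_query(user_query,
                        sensitive_field="content_cs",
                        insensitive_field="content_ci"):
    """Build a query that searches the case-sensitive field only for books
    flagged case:sensitive, and the case-insensitive field for the rest.

    Both field names are placeholders; no escaping is applied to user_query.
    """
    sensitive = f"(+case:sensitive +{sensitive_field}:({user_query}))"
    insensitive = f"(+case:insensitive +{insensitive_field}:({user_query}))"
    return f"{sensitive} OR {insensitive}"


print(per_book_case_query("wilde"))
```

This relies on every document carrying a case value assigned at index time, so changing a book's setting means re-indexing that book.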

-Yonik

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread Ryan McKinley


David Neubert wrote:

> Ryan,
>
> Thanks for your response.  I infer from your response that you can have a
> different analyzer for each field


yes!  each field can have its own indexing strategy.


> I believe that the Analyzer approach you suggested requires the use
> of the same Analyzer at query time that was used during indexing.


It does not require the *same* Analyzer - it just requires one that
generates compatible tokens.  That is, you may want the indexing to
split the input into sentences, but the query time analyzer keeps the
input as a single token.
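To make "compatible tokens" concrete, here is a toy pair of analyzers in plain Python, not Solr code. The index-time function splits text into one lowercased token per sentence, and the query-time function keeps the whole input as a single lowercased token. The function names and the naive sentence-splitting rule are assumptions for illustration only.

```python
import re


def index_analyzer(text):
    """Index time: one lowercased token per sentence (naive split on . ! ?)."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip().lower() for p in parts if p.strip()]


def query_analyzer(text):
    """Query time: keep the whole input as a single lowercased token."""
    return [text.strip().lower()]


indexed = set(index_analyzer("Oscar Wilde wrote plays. He was Irish."))
# The two analyzers differ, but the tokens they emit can still match.
print(query_analyzer("He was Irish.")[0] in indexed)
```

The match works because both sides lowercase and both treat a sentence as the unit; if only one side lowercased, the tokens would no longer be compatible.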


check the example schema.xml file -- the 'text' field type applies 
synonyms at index time, but does at query time.


Re: searching across multiple fields, don't worry, Lucene handles this
well.  You may want to do that explicitly or with the dismax handler.


I'd suggest you play around with indexing some data.  check the 
analysis.jsp in the admin section.  It is a great tool to help figure 
out what analyzers do at index vs query time.


ryan

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert

Ryan (and others who need something to put them to sleep :) )

Wow -- the light-bulb finally went off -- the Analyzer admin page is very cool
-- I just was not at all thinking the SOLR/Lucene way.

I need to rethink my whole approach now that I understand (from reviewing the
schema.xml closer and playing with the Analyzer) how compatible index and query
policies can be applied automatically on a field by field basis by SOLR at both
index and query time.

I still may have a stumper here, but I need to give it some thought, and may 
return again with another question:

The problem is that my text is book text (fairly large) that looks very much
like one would expect:
<book>
<chapter>
<para><sen>...</sen><sen></sen></para>
<para><sen>...</sen><sen></sen></para>
<para><sen>...</sen><sen>...</sen></para>
</chapter>
</book>

The search results need to return exact sentences or paragraphs with their 
exact page:line numbers (which is available in the embedded markup in the text).

There were previous responses by others, suggesting I look into payloads, but I 
did not fully understand that -- I may have to re-read those e-mails now that I 
am getting a clearer picture of SOLR/Lucene.

However, the reason I resorted to indexing each paragraph as a single document,
and then redundantly indexing each sentence as a single document, is because I
was planning on pre-parsing the text myself (outside of SOLR) -- and feeding
separate <doc> elements to the <add> because in that way I could produce the
page:line reference in the pre-parsing (again outside of SOLR) and feed it in
as an explicit field in the <doc> elements of the <add> requests.  Therefore at
query time, I will have the exact page:line corresponding to the start of the
paragraph or sentence.

But I am beginning to suspect, I was planning to do a lot of work that SOLR can 
do for me.

I will continue to study this and respond when I am a bit clearer, but the
closer I could get to just submitting the books a chapter at a time -- and
letting SOLR do the work, the better (because I have all the books in well formed
xml at chapter levels).  However, I don't see yet how I could get par/sen
granular search result hits, along with their exact page:line coordinates
unless I approach it by explicitly indexing the pars and sens as single
documents, not chapter hits, and also return the entire text of the sen or
par, and highlight the keywords within (for the search result hit).  Once a
search result hit is selected, it would then act as expected and position into
the chapter, at the selected reference, highlight again the key words, but this
time in the context of an entire chapter (the whole document to the user's
mind).

Even with my new understanding you (and others) have given me, which I can use
to certainly improve my approach -- it still seems to me that because
multi-valued fields concatenate text -- even if you use the positionIncrementGap
feature to prohibit unwanted phrase matches, how do you produce a well defined
search result hit, bounded by the exact sen or par, unless you index them as
single documents?

Should I still read up on the payload discussion?

Dave





Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert

Yonik (or anyone else)

Do you know where on-line documentation on the +case: syntax is located?  I 
can't seem to find it.

Dave


TextField case sensitivity

2007-06-07 Thread Xuesong Luo

I ran into a problem when searching on a TextField. When I pass q=William or
q=WILLiam, solr is able to find records whose default search field value
is William; however, if I pass q=WilliAm, solr does not return anything.
I searched the archive; Yonik mentioned that the LowerCaseFilterFactory
doesn't work for wildcards because the QueryParser does not invoke
analysis for partial words, which makes sense. But in my case, it's a
whole word. Does anyone know why it's not working? Below is my schema info.

Thanks
Xuesong

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

Re: TextField case sensitivity

2007-06-07 Thread Yonik Seeley


On 6/7/07, Xuesong Luo [EMAIL PROTECTED] wrote:

> I run a problem when searching on a TextField. When I pass q=William or
> q=WILLiam, solr is able to find records whose default search field value
> is William, however if I pass q=WilliAm, solr did not return any thing.


Sounds like WordDelimiterFilter is still being used for your fieldType.
After you changed the fieldType for text, did you restart Solr and
re-index your collection?

-Yonik




Re: TextField case sensitivity

2007-06-07 Thread Ryan McKinley


Have you taken a look at the output from the admin/analysis page?
http://localhost:8983/solr/admin/analysis.jsp?highlight=on

This lets you see what tokens are generated for index/query.  From your
description, I suspect that the generated tokens are actually:

 willi am

Also, if you want the same analyzer for indexing and query, just define one:

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>




RE: TextField case sensitivity

2007-06-07 Thread Xuesong Luo

I have WordDelimiterFilter defined in the schema; I didn't include it in
my original email because I thought it didn't matter. It seems it
does. It looks like WilliAm is treated as two words, which is why it
didn't find a match.
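The effect can be approximated in a few lines of Python. This is a simplified stand-in, not the actual WordDelimiterFilter: it only splits where a lowercase letter is followed by an uppercase one and then lowercases the parts. The function name is hypothetical.

```python
import re


def split_on_case_change(token):
    """Split at each lowercase-to-uppercase transition, then lowercase.

    A rough approximation of the case-change splitting described above;
    the real filter has more rules (digits, punctuation, and so on).
    """
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", token)
    return [part.lower() for part in spaced.split()]


for query in ("William", "WILLiam", "WilliAm"):
    print(query, split_on_case_change(query))
```

Under this approximation William and WILLiam both stay a single token, william, while WilliAm becomes willi and am, which would explain why only that query fails to match the indexed value.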

Thanks
Xuesong


RE: TextField case sensitivity

2007-06-07 Thread Xuesong Luo

Ryan, you are right, that's the problem. WilliAM is treated as two words
by the WordDelimiterFilterFactory.

Thanks
Xuesong

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 07, 2007 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: TextField case sensitivity

Have you taken a look at the output from admin/analysis?
http://localhost:8983/solr/admin/analysis.jsp?highlight=on

This lets you see what tokens are generated for index/query.  From your 
description, I'm suspicious that the generated tokens are actually:
  willi am

Also, if you want the same analyzer for indexing and query, just define
one:

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

Xuesong Luo wrote:
> I run a problem when searching on a TextField. When I pass q=William or
> q=WILLiam, solr is able to find records whose default search field value
> is William, however if I pass q=WilliAm, solr did not return any thing.
> I searched on the archive, Yonik mentioned the lowercasefilterfactory
> doesn't work for wildcard because the QueryParser does not invoke
> analysis for partial word, that makes sense. But in my case, it's a
> whole word. Anyone knows why it's not working? Below is my schema info.
>
> Thanks
> Xuesong
>
> <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldtype>

Re: TextField case sensitivity

2007-06-07 Thread Mike Klaas



On 7-Jun-07, at 1:04 PM, Xuesong Luo wrote:

> Ryan, you are right, that's the problem. WilliAM is treated as two words
> by the WordDelimiterFilterFactory.


I have found this behaviour a little too aggressive for my needs, so I
added an option to disable it.  Patch is here:

http://issues.apache.org/jira/browse/SOLR-257

I'll probably commit it in a day or so, at which point it will be  
part of the Solr nightly build.


-Mike
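
The splitting described in this thread can be illustrated with a small sketch. This is a simplified stand-in written for illustration, not Solr's WordDelimiterFilter (which handles many more cases); the names `split_on_case_change`, `analyze`, and the `split_case` flag are hypothetical. It assumes a whitespace tokenizer, an optional split wherever a lowercase letter is followed by an uppercase one, and then lowercasing.

```python
import re


def split_on_case_change(token):
    """Split a token wherever a lowercase letter is followed by an uppercase one."""
    # Insert a space at each lower->upper boundary, then split on whitespace.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()


def analyze(text, split_case=True):
    """Toy analysis chain: whitespace tokenizer, optional case split, lowercase."""
    tokens = []
    for word in text.split():
        parts = split_on_case_change(word) if split_case else [word]
        tokens.extend(part.lower() for part in parts)
    return tokens


for query in ("William", "WILLiam", "WilliAm"):
    print(query, analyze(query), analyze(query, split_case=False))
```

Under these assumptions, William and WILLiam each come out as a single lowercased token, while WilliAm becomes two tokens unless the split is switched off, which matches the symptom reported above.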

Re: case sensitivity

2007-04-27 Thread Yonik Seeley


On 4/26/07, Michael Kimsal [EMAIL PROTECTED] wrote:

> We're (and by 'we' I mean my esteemed colleague!) working on patching a few
> of these items to be in the solrconf.xml file and should likely have some
> patches submitted next week.  It's being done on 'company time' and I'm not
> sure about the exact policy/procedure for this sort of thing here (or
> indeed, if there is one at all).


That's fine, as long as your company has agreed to contribute back the
patch (under the Apache license).  Apache enjoys a lot of business
support (being business friendly) and a *lot* of contributions are done
on company time.

Anything really big would probably need a CLA, but patches only
require clicking the "grant license to ASF" button in JIRA.

-Yonik

Re: case sensitivity

2007-04-27 Thread Michael Kimsal


Can you point me to the process for submitting these small patches?  I'm
looking at the jira site but don't see much of anything there outlining a
process for submitting patches.  Sorry to be so basic about this, but I'm
trying to follow correct procedures on both sides of the aisle, so to speak.







--
Michael Kimsal
http://webdevradio.com

Re: case sensitivity

2007-04-27 Thread Otis Gospodnetic

Once the code/patch in the issue is put/committed to SVN, it means it will be 
in the next release.  You get your patch committed faster if it's clear, well 
written and explained, if it comes with a unit test if it's a code change, and 
so on.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message -----
From: Michael Kimsal [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, April 27, 2007 1:47:06 PM
Subject: Re: case sensitivity

What's the procedure then for something to get included in the next
release?

Thanks again all!

On 4/27/07, Michael Kimsal [EMAIL PROTECTED] wrote:
>
> So I just create my own 'issue' first?  OK.  Thanks.
>
> On 4/27/07, Ryan McKinley [EMAIL PROTECTED] wrote:
> >
> > Michael Kimsal wrote:
> > > Can you point me to the process for submitting these small patches?  I'm
> > > looking at the jira site but don't see much of anything there outlining a
> > > process for submitting patches.  Sorry to be so basic about this, but I'm
> > > trying to follow correct procedures on both sides of the aisle, so to
> > > speak.
> >
> > Check: http://wiki.apache.org/solr/HowToContribute
> >
> > Essentially you will create a new issue on JIRA, then upload a svn diff
> > to that issue.
> >
> > holler if you have any troubles
> >
> > ryan
>
> --
> Michael Kimsal
> http://webdevradio.com

-- 
Michael Kimsal
http://webdevradio.com

Re: case sensitivity

2007-04-27 Thread Yonik Seeley


On 4/26/07, Erik Hatcher [EMAIL PROTECTED] wrote:

> I think we should open up as many of the switches as we can to
> QueryParser, allowing users to tinker with them if they want, setting
> the defaults to the most common reasonable settings we can agree upon.


I think we should also try and handle what we can automatically too.
Always lowercasing or not isn't elegant, as the right thing to do
depends on the field.

I always had it in my head that the QueryParser should figure it out.
Actually, for good performance, the fieldType should figure it out just once.
The presence of a LowerCaseFilter could be one signal to lowercase
prefix strings, or one could actually run a test token through analysis
and test if it comes out lowercased.

Numeric fields are a sticking point... prefix queries and wildcard
queries aren't even possible there.  Of course, even stemming is
problematic with wildcard queries.

-Yonik
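
The "run a test token through analysis" idea can be sketched as follows. This is only an illustration under stated assumptions: an analyzer is modelled as a plain function from a string to a list of tokens, and `lowercases_terms` and the two toy analyzers are hypothetical names, not Solr or Lucene APIs.

```python
def lowercases_terms(analyze, probe="AbC"):
    """Return True if every token produced for a mixed-case probe is lowercase."""
    tokens = analyze(probe)
    return bool(tokens) and all(token == token.lower() for token in tokens)


def lowercasing_analyzer(text):
    # Toy analyzer: whitespace split followed by lowercasing.
    return text.lower().split()


def verbatim_analyzer(text):
    # Toy analyzer: whitespace split only, case preserved.
    return text.split()


print(lowercases_terms(lowercasing_analyzer))
print(lowercases_terms(verbatim_analyzer))
```

Computing the answer once per field type and caching it would keep the probe out of the per-query path, in line with the performance remark above.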

Re: case sensitivity

2007-04-27 Thread Yonik Seeley


On 4/26/07, Michael Kimsal [EMAIL PROTECTED] wrote:

> My colleague, after some digging, found in SolrQueryParser
>
> (around line 62)
> setLowercaseExpandedTerms(false);
>
> The default for Lucene is true.  Was this intentional?  Or an oversight?


Way back before Solr was open-sourced, and Chris was the only
user, I thought he needed to do case-sensitive prefix and wildcard
queries (hence I set it to false).  I think I may have been
mistaken about that need, but by that time, I didn't know if anyone
depended on it, so I never changed it back.

A default of false is actually more powerful too.  You can do prefix
queries on fields that have a LowercaseFilter in their analyzer, and
also fields that don't.  If it's set to true, you can't reliably do
prefix queries on fields that don't have a LowercaseFilter.

-Yonik
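
The trade-off described above can be sketched with a toy model. It assumes that a prefix query is matched directly against the indexed terms without going through the field's analyzer, and that the only knob is whether the prefix is lowercased first; `prefix_search` and its `lowercase_expanded_terms` argument are hypothetical names chosen to mirror the setting under discussion, not real Solr or Lucene calls.

```python
def prefix_search(indexed_terms, prefix, lowercase_expanded_terms):
    """Return the indexed terms that start with the (never analyzed) prefix."""
    if lowercase_expanded_terms:
        prefix = prefix.lower()
    return [term for term in indexed_terms if term.startswith(prefix)]


lowercased_field = ["windows95"]  # field whose analyzer lowercases terms
verbatim_field = ["Windows95"]    # field indexed without lowercasing

# With the setting off, the caller controls the case of the prefix.
print(prefix_search(lowercased_field, "windows", False))
print(prefix_search(verbatim_field, "Windows", False))

# With the setting on, the verbatim field can no longer be reached by prefix.
print(prefix_search(verbatim_field, "Windows", True))
```

With the setting off, both kinds of field are reachable as long as the client sends the prefix in the right case; with it on, mixed-case prefixes work against lowercased fields but the verbatim field cannot be matched.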

Re: case sensitivity

2007-04-27 Thread Michael Pelz Sherman

In our experience, setting a LowercaseFilter in the query did not work; we had 
to call setLowercaseExpandedTerms(true) to get wildcard queries to be 
case-insensitive.
   
  Here's our analyzer definition from our solr schema:
   
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
   
  If calling setLowercaseExpandedTerms(true) is *not* in fact necessary for 
case-insensitive wildcard queries, could you please provide an example of a 
solr schema that can achieve this?
   
  Thanks!
  - mps
  

Re: case sensitivity

2007-04-27 Thread Yonik Seeley


On 4/27/07, Michael Pelz Sherman [EMAIL PROTECTED] wrote:

> In our experience, setting a LowercaseFilter in the query did not work; we had
> to call setLowercaseExpandedTerms(true) to get wildcard queries to be
> case-insensitive.


Correct, because in that case the QueryParser does not invoke analysis
(because it's a partial word, not a whole word).


> If calling setLowercaseExpandedTerms(true) is *not* in fact necessary for
> case-insensitive wildcard queries, could you please provide an example of a
> solr schema that can achieve this?


I didn't say that :-)

I'm saying setLowercaseExpandedTerms(true) is not sufficient for
wildcard queries in general.  If the term is indexed as Windows95,
then a prefix query of Windows* won't find anything when
setLowercaseExpandedTerms(true) is in effect, because the prefix is
lowercased to windows* and no longer matches the indexed term.

-Yonik




case sensitivity

2007-04-26 Thread Michael Kimsal


I've looked through the mailing lists and can't find much of anything
regarding case sensitivity.  It
seems SOLR is case sensitive by default - I'm using the default settings
with a very basic schema - just text fields.

Is there any way to tell the query parser to be case insensitive during a
query?  Or do I have to reindex
all my data again with lowercase values?



--
Michael Kimsal
http://webdevradio.com

Re: case sensitivity

2007-04-26 Thread Erik Hatcher



On Apr 26, 2007, at 5:43 PM, Michael Kimsal wrote:

> I've looked through the mailing lists and can't find much of anything
> regarding case sensitivity.  It seems SOLR is case sensitive by default -
> I'm using the default settings with a very basic schema - just text fields.


All depends on the analysis you have set up for the fields.  If  
you're indexing string-type fields in the default example schema,  
there is effectively no analysis so searches must be exact matches  
case and all.


> Is there any way to tell the query parser to be case insensitive during a
> query?  Or do I have to reindex all my data again with lowercase values?


Terms are indexed in a case-sensitive manner, so if you need case  
insensitivity you need to lowercase on the way in and on querying.


Erik
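
Why lowercasing has to happen on both sides can be shown with a toy inverted index. This is a sketch under simple assumptions, not how Solr is implemented: `build_index`, `search`, and the two analyzers are hypothetical helpers, and the "exact" analyzer stands in for a string-type field with effectively no analysis.

```python
def lowercase_analyzer(text):
    # Whitespace split plus lowercasing, as a text field might do.
    return text.lower().split()


def exact_analyzer(text):
    # No analysis: the whole value is a single term, case and all.
    return [text]


def build_index(docs, analyze):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, analyze):
    """Return ids of documents matching any analyzed query term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


docs = {1: "Fox", 2: "fox"}
lower_index = build_index(docs, lowercase_analyzer)
exact_index = build_index(docs, exact_analyzer)

print(search(lower_index, "FOX", lowercase_analyzer))
print(search(exact_index, "Fox", exact_analyzer))
print(search(lower_index, "Fox", exact_analyzer))
```

The last call lowercases only at index time, so the mixed-case query term finds nothing; case-insensitive matching needs the same normalization when indexing and when querying.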

Re: case sensitivity

2007-04-26 Thread Michael Kimsal


I was just writing a followup.

I'm using the default text field type

   <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <!-- in this example, we will only use synonyms at query time
       <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
       -->
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldtype>


That looks to me like it's got LowerCaseFilterFactory in the query analyzer
and the index analyzer.

I'm still digging in to this, but are there any other things to look for
anyone can point me to?  (Thanks Erik!)










--
Michael Kimsal
http://webdevradio.com

Re: case sensitivity

2007-04-26 Thread Michael Kimsal


type:changelog AND ( ( (listing:Fox) or (listing:Fox*) or (listing:*Fox) ) )
and
type:changelog AND ( ( (listing:fox) or (listing:fox*) or (listing:*fox) ) )

Is this to do with the wildcards?

Actually, I've just answered my own question.

type:changelog AND ( ( (listing:fox) ) )
and
type:changelog AND ( ( (listing:Fox) ) )

give the same results.

But adding in the or listing:fox* or listing:*fox is always case-sensitive.
However,
http://wiki.apache.org/lucene-java/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a
seems to say that wildcard searches are not case-sensitive.

Unless someone can point out a way around this, it seems I'll need to
manually reindex and lower-case everything on the way in, then reformat my
search queries to be lower-case as well.








--
Michael Kimsal
http://webdevradio.com

Re: case sensitivity

2007-04-26 Thread Michael Kimsal


My colleague, after some digging, found in SolrQueryParser

(around line 62)
setLowercaseExpandedTerms(false);

The default for Lucene is true.  Was this intentional?  Or an oversight?

Perhaps it's not related to my problem, but it seems that it might be.

Thanks in advance!






--
Michael Kimsal
http://webdevradio.com

Re: case sensitivity

2007-04-26 Thread Erik Hatcher



On Apr 26, 2007, at 6:03 PM, Michael Kimsal wrote:

> My colleague, after some digging, found in SolrQueryParser
>
> (around line 62)
> setLowercaseExpandedTerms(false);
>
> The default for Lucene is true.  Was this intentional?  Or an oversight?


I was just about to respond that this is likely the issue with your  
non-totally-lowercased wildcard terms.


I don't consider it an oversight; rather, this whole analysis
business and wildcards are things that vary from project to project
in how they should be handled.  If you have, for example, a string
field and want to do prefix queries on it (trailing asterisk), you
wouldn't want the term to be lowercased.


I think we should open up as many of the switches as we can to  
QueryParser, allowing users to tinker with them if they want, setting  
the defaults to the most common reasonable settings we can agree upon.


Erik

Re: case sensitivity

2007-04-26 Thread Michael Kimsal


We're (and by 'we' I mean my esteemed colleague!) working on patching a few
of these items to be in the solrconf.xml file and should likely have some
patches submitted next week.  It's being done on 'company time' and I'm not
sure about the exact policy/procedure for this sort of thing here (or
indeed, if there is one at all).







--
Michael Kimsal
http://webdevradio.com

Case sensitivity on hostnames and email addresses

2006-12-13 Thread Wade Leftwich

I've run into some unexpected case sensitivity on searches, at least
unexpected by me.

If you index a text field containing this sentence:

A sentence containing CamelCase words by [EMAIL PROTECTED] is found
at StudlyCaps.org

The document will be found by searching for camelcase but not for
[EMAIL PROTECTED] or studlycaps.org.

This happens with the Standard or the DisMax query handler.

A bit of a problem for me, because I'm indexing a bunch of business
magazines, and domain names are frequently capitalized, often in CamelCase.

Is this maybe a bug? Or a WAD?

-- Wade Leftwich
Ithaca, NY

70 matches
