Re: Problems with WordDelimiterFilterFactory

2009-10-10 Thread Shalin Shekhar Mangar
On Fri, Oct 9, 2009 at 3:33 AM, Patrick Jungermann 
patrick.jungerm...@googlemail.com wrote:

 Hi Bern,

 the problem is the character sequence --. A query must not contain two
 consecutive minus characters. Remove one minus character and the query
 will be parsed without problems.


Or you could escape the hyphen character. If you are using SolrJ, use
ClientUtils.escapeQueryChars on the query string.
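
For clients that are not written in Java, the same idea can be sketched in a few lines of Python. This is only an illustration of backslash-escaping, not the SolrJ implementation: the function name escape_query_chars and the character list are assumptions based on the special characters documented for the classic Lucene query syntax.

```python
# Characters assumed to carry meaning in the classic Lucene query syntax.
LUCENE_SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\&|')


def escape_query_chars(text: str) -> str:
    """Prefix every special character in user input with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in text)


# Escape only the user-supplied part, then add the application's own syntax.
user_input = "Asia -- Civilization"
query = "(" + escape_query_chars(user_input) + " AND status_i:(2))"
print(query)
```

Escaping only the user-supplied text matters here: running the whole query string through the function would also escape the parentheses and the colon in status_i:(2).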

-- 
Regards,
Shalin Shekhar Mangar.


Re: Problems with WordDelimiterFilterFactory

2009-10-09 Thread Chantal Ackermann

Hi Bernadette,

Bernadette Houghton wrote:

Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up 
the error, but still doesn't find the right record. I see from marklo's 
analysis page that solr is still parsing it with a hyphen. Changing this part 
of our schema.xml -


That's probably because the hyphen/minus has a special meaning in the
query syntax (it excludes the term that follows). Try putting the input in
quotes. But I agree with Christian that the hyphens should have been
removed at index time by the token filters.
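
The quoting suggestion could look roughly like this on the client side. It is a sketch under assumptions: as_phrase is a hypothetical helper name, and whether the resulting phrase matches the record still depends on how the field is analyzed.

```python
def as_phrase(user_input: str) -> str:
    """Wrap user input in double quotes so it is parsed as one phrase."""
    # Backslashes and double quotes would end or corrupt the phrase, so escape them.
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


query = "(" + as_phrase("Asia - Civilization") + " AND status_i:(2))"
print(query)
```

Inside the quotes the hyphen is no longer read as an operator; it is passed to the query analyzer together with the other words.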


cheers
chantal






<filter class="solr.PatternReplaceFilterFactory"
        pattern="([^a-z])" replacement="" replace="all"
/>

To

<filter class="solr.PatternReplaceFilterFactory"
        pattern="([^a-z])" replacement=" " replace="all"
/>

i.e. replacing non-alpha chars with a space, looks like it may handle that 
aspect.

Regards
Bern


RE: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Bernadette Houghton
Here's the query and the error - 

Oct 09 08:20:17  [debug] [196] Solr query string:(Asia -- Civilization AND 
status_i:(2)) 
Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc 
Oct 09 08:20:17  [error] Error on searching: 400 Status: 
org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- 
Civilization AND status_i:(2)) ': Encount

Bern

-Original Message-
From: Christian Zambrano [mailto:czamb...@gmail.com] 
Sent: Thursday, 8 October 2009 12:48 PM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Bern,

I am interested in the Solr query, in other words, the query that your
system sends to Solr.

Thanks,


Christian


RE: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Bernadette Houghton
Sorry, the last line was truncated -

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 
'(Asia -- Civilization AND status_i:(2)) ': Encountered - at line 1, column 
7. Was expecting one of: ( ... * ... QUOTED ... TERM ... PREFIXTERM 
... WILDTERM ... [ ... { ... NUMBER ...


Re: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Patrick Jungermann
Hi Bern,

the problem is the character sequence --. A query must not contain two
consecutive minus characters. Remove one minus character and the query
will be parsed without problems.

Because of this parsing problem, I'd recommend cleaning up the query
before submitting it to the Solr server, replacing each sequence of minus
characters with a single one.
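
A minimal sketch of such a cleanup step, assuming the client can pre-process the query string before sending it (the function name collapse_minus_runs is made up for illustration):

```python
import re


def collapse_minus_runs(query: str) -> str:
    """Replace every run of two or more minus characters with a single one."""
    return re.sub(r"-{2,}", "-", query)


print(collapse_minus_runs("(Asia -- Civilization AND status_i:(2))"))
```

This only addresses the parse error. A single remaining minus is still query syntax, so it may change what the query matches.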


Regards, Patrick



Bernadette Houghton wrote:
 Sorry, the last line was truncated -
 
 HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 
 '(Asia -- Civilization AND status_i:(2)) ': Encountered - at line 1, column 
 7. Was expecting one of: ( ... * ... QUOTED ... TERM ... PREFIXTERM 
 ... WILDTERM ... [ ... { ... NUMBER ...
 

RE: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Bernadette Houghton
Thanks for this, marklo; it is a *very* useful page.
bern

-Original Message-
From: marklo [mailto:mar...@pcmall.com] 
Sent: Thursday, 8 October 2009 1:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory


Use http://solr-url/solr/admin/analysis.jsp to see how your data is
indexed/queried




RE: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Bernadette Houghton
Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up 
the error, but still doesn't find the right record. I see from marklo's 
analysis page that solr is still parsing it with a hyphen. Changing this part 
of our schema.xml -

<filter class="solr.PatternReplaceFilterFactory"
        pattern="([^a-z])" replacement="" replace="all"
/>

To

<filter class="solr.PatternReplaceFilterFactory"
        pattern="([^a-z])" replacement=" " replace="all"
/>

i.e. replacing non-alpha chars with a space, looks like it may handle that 
aspect. 
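
The difference between the two replacement values can be tried outside Solr with the same regular expression. This is only a sketch of the regex behaviour on a single token; replace_non_alpha is a hypothetical helper, and inside Solr the result also depends on where the filter sits in the analyzer chain.

```python
import re

# Same character class as in the filter definition: anything that is not a-z.
NON_ALPHA = re.compile(r"[^a-z]")


def replace_non_alpha(token: str, replacement: str) -> str:
    """Replace every character outside a-z, as replace="all" would."""
    return NON_ALPHA.sub(replacement, token)


print(repr(replace_non_alpha("asia-civilization", "")))   # words are glued together
print(repr(replace_non_alpha("asia-civilization", " ")))  # words stay separated
print(repr(replace_non_alpha("Asia", "")))                # uppercase is outside a-z too
```

Note that the pattern also matches uppercase letters and digits, so it behaves as "non-alpha" only if the token has already been lowercased.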

Regards
Bern

-Original Message-
From: Patrick Jungermann [mailto:patrick.jungerm...@googlemail.com] 
Sent: Friday, 9 October 2009 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Hi Bern,

the problem is the character sequence --. A query must not contain two
consecutive minus characters. Remove one minus character and the query
will be parsed without problems.

Because of this parsing problem, I'd recommend cleaning up the query
before submitting it to the Solr server, replacing each sequence of minus
characters with a single one.


Regards, Patrick




Re: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Christian Zambrano

Bern,

The only way that could be happening is if you are not using the field
type you described in your original e-mail. The TokenFilter
WordDelimiterFilterFactory should take care of the hyphen.
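
As a rough illustration of why a word delimiter step would remove the hyphen, here is a deliberately simplified Python approximation. It is not the Solr filter: it handles ASCII only, ignores options such as catenateWords and splitting on case changes, and the names word_parts and analyze are invented for this sketch.

```python
import re
from typing import List


def word_parts(token: str) -> List[str]:
    """Split one token into runs of letters or digits, dropping delimiters."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


def analyze(text: str) -> List[str]:
    """Whitespace-tokenize, split each token into parts, then lowercase."""
    parts: List[str] = []
    for token in text.split():
        parts.extend(part.lower() for part in word_parts(token))
    return parts


print(analyze("asia-civilization"))
print(analyze("asia - civilization"))
```

In this approximation a token that consists only of delimiters yields no parts at all, so both spellings end up as the same two terms.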


On 10/08/2009 05:30 PM, Bernadette Houghton wrote:

Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up 
the error, but still doesn't find the right record. I see from marklo's 
analysis page that solr is still parsing it with a hyphen. Changing this part 
of our schema.xml -

 <filter class="solr.PatternReplaceFilterFactory"
         pattern="([^a-z])" replacement="" replace="all"
 />

To

 <filter class="solr.PatternReplaceFilterFactory"
         pattern="([^a-z])" replacement=" " replace="all"
 />

i.e. replacing non-alpha chars with a space, looks like it may handle that 
aspect.

Regards
Bern


Re: Problems with WordDelimiterFilterFactory

2009-10-07 Thread Christian Zambrano
Could you please provide the exact URL of a query where you are
experiencing this problem?

e.g. (not URL-encoded): q=fieldName:hot and cold: temperatures
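
For reference, the URL-encoded form of such a request can be produced with the Python standard library. The host, port and /solr/select path below are placeholders for illustration, not the poster's actual server, and build_select_url is a made-up helper name.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, query: str) -> str:
    """Append the query as a URL-encoded q parameter."""
    return base_url + "?" + urlencode({"q": query})


# Hypothetical Solr endpoint; replace with the real one.
url = build_select_url("http://localhost:8983/solr/select",
                       "fieldName:hot and cold: temperatures")
print(url)
```

URL encoding only protects the characters in transit. Solr decodes the parameter again, so the colon after "cold" still reaches the query parser as syntax.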

On 10/07/2009 05:32 PM, Bernadette Houghton wrote:

We are having some issues with our solr parent application not retrieving
records as expected.

For example, if the input query includes a colon (e.g. "hot and cold:
temperatures"), the relevant record (which contains a colon in the same
place) does not get retrieved; if the input query does not include the
colon, all is fine. Ditto if the user searches for a query containing
hyphens, e.g. "asia - civilization", although with the qualifier that
something like "asia-civilization" (no spaces either side of the hyphen)
works fine, whereas "asia - civilization" (spaces either side of the
hyphen) doesn't work.

Our schema.xml contains the following -

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230
Fax: 03 5227 8000 International: +61 3 5227 8000
MSN: bern_hough...@hotmail.com
Email: bernadette.hough...@deakin.edu.au
Website: http://www.deakin.edu.au
Deakin University CRICOS Provider Code 00113B (Vic)

Important Notice: The contents of this email are intended solely for the named 
addressee and are confidential; any unauthorised use, reproduction or storage 
of the contents is expressly prohibited. If you have received this email in 
error, please delete it and any attachments immediately and advise the sender 
by return email or telephone.
Deakin University does not warrant that this email and any attachments are 
error or virus free


   


RE: Problems with WordDelimiterFilterFactory

2009-10-07 Thread Bernadette Houghton
Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601

Either scroll down and click one of the "television broadcasting -- asia"
links, or type it in the Quick Search box.


TIA

bern

-Original Message-
From: Christian Zambrano [mailto:czamb...@gmail.com] 
Sent: Thursday, 8 October 2009 9:43 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Could you please provide the exact URL of a query where you are 
experiencing this problem?
eg(Not URL encoded): q=fieldName:hot and cold: temperatures

On 10/07/2009 05:32 PM, Bernadette Houghton wrote:
 We are having some issues with our solr parent application not retrieving 
 records as expected.

 For example, if the input query includes a colon (e.g. hot and cold: 
 temperatures), the relevant record (which contains a colon in the same place) 
 does not get retrieved; if the input query does not include the colon, all is 
 fine.  Ditto if the user searches for a query containing hyphens, e.g. asia 
 - civilization, although with the qualifier that something like 
 asia-civilization (no spaces either side of the hyphen) works fine, whereas 
 asia - civilization (spaces either side of hyphen) doesn't work.

 Our schema.xml contains the following -

  fieldType name=text class=solr.TextField positionIncrementGap=100
analyzer type=index
  tokenizer class=solr.WhitespaceTokenizerFactory/
  !-- in this example, we will only use synonyms at query time
  filter class=solr.SynonymFilterFactory 
 synonyms=index_synonyms.txt ignoreCase=true expand=false/
  --
  filter 
 class=solr.ISOLatin1AccentFilterFactory/
  filter class=solr.StopFilterFactory ignoreCase=true 
 words=stopwords.txt/
  filter class=solr.WordDelimiterFilterFactory 
 generateWordParts=1 generateNumberParts=1 catenateWords=1 
 catenateNumbers=1 catenateAll=0/
  filter class=solr.LowerCaseFilterFactory/
  filter class=solr.EnglishPorterFilterFactory 
 protected=protwords.txt/
  filter class=solr.RemoveDuplicatesTokenFilterFactory/
/analyzer
analyzer type=query
  tokenizer class=solr.WhitespaceTokenizerFactory/
  filter 
 class=solr.ISOLatin1AccentFilterFactory/
  filter class=solr.SynonymFilterFactory synonyms=synonyms.txt 
 ignoreCase=true expand=true/
  filter class=solr.StopFilterFactory ignoreCase=true 
 words=stopwords.txt/
  filter class=solr.WordDelimiterFilterFactory 
 generateWordParts=1 generateNumberParts=1 catenateWords=0 
 catenateNumbers=0 catenateAll=0/
  filter class=solr.LowerCaseFilterFactory/
  filter class=solr.EnglishPorterFilterFactory 
 protected=protwords.txt/
  filter class=solr.RemoveDuplicatesTokenFilterFactory/
/analyzer
  /fieldType

 Bernadette Houghton, Library Business Applications Developer
 Deakin University Geelong Victoria 3217 Australia.
 Phone: 03 5227 8230 International: +61 3 5227 8230
 Fax: 03 5227 8000 International: +61 3 5227 8000
 MSN: bern_hough...@hotmail.com
 Email: bernadette.hough...@deakin.edu.au
 Website: http://www.deakin.edu.au
 Deakin University CRICOS Provider Code 00113B (Vic)

 Important Notice: The contents of this email are intended solely for the 
 named addressee and are confidential; any unauthorised use, reproduction or 
 storage of the contents is expressly prohibited. If you have received this 
 email in error, please delete it and any attachments immediately and advise 
 the sender by return email or telephone.
 Deakin University does not warrant that this email and any attachments are 
 error or virus free





Re: Problems with WordDelimiterFilterFactory

2009-10-07 Thread Christian Zambrano

Bern,

I am interested in the Solr query. In other words, the query that your
system sends to Solr.


Thanks,


Christian

On Oct 7, 2009, at 5:56 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au 
 wrote:



Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601

Either scroll down and click one of the television broadcasting --  
asia links, or type it in the Quick Search box.



TIA

bern

-Original Message-
From: Christian Zambrano [mailto:czamb...@gmail.com]
Sent: Thursday, 8 October 2009 9:43 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Could you please provide the exact URL of a query where you are
experiencing this problem?
e.g. (not URL-encoded): q=fieldName:hot and cold: temperatures

On 10/07/2009 05:32 PM, Bernadette Houghton wrote:
We are having some issues with our solr parent application not  
retrieving records as expected.


For example, if the input query includes a colon (e.g. hot and  
cold: temperatures), the relevant record (which contains a colon in  
the same place) does not get retrieved; if the input query does not  
include the colon, all is fine.  Ditto if the user searches for a  
query containing hyphens, e.g. asia - civilization, although with  
the qualifier that something like asia-civilization (no spaces  
either side of the hyphen) works fine, whereas asia -  
civilization (spaces either side of hyphen) doesn't work.


Our schema.xml contains the following -

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>







Re: Problems with WordDelimiterFilterFactory

2009-10-07 Thread marklo

Use http://solr-url/solr/admin/analysis.jsp to see how your data is
indexed/queried

-- 
View this message in context: 
http://www.nabble.com/Problems-with-WordDelimiterFilterFactory-tp25795589p25797377.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Problems with WordDelimiterFilterFactory

2009-10-07 Thread Sandeep Tagore

Hi Bern,
I indexed some records containing - and : today using your configuration, and I
searched with the following URLs:
http://localhost/solr/select?q=CONTENT:cold : temperature
http://localhost/solr/select?q=CONTENT:cold: temperature
http://localhost/solr/select?q=CONTENT:cold :temperature
http://localhost/solr/select?q=CONTENT:cold temperature
and 
http://localhost/solr/select?q=CONTENT:asia - civilization
http://localhost/solr/select?q=CONTENT:asia- civilization
http://localhost/solr/select?q=CONTENT:asia -civilization
http://localhost/solr/select?q=CONTENT:asia civilization
The variations made no difference to the results. The search worked every time
and I saw the relevant records.

Regards,
Sandeep
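
For anyone building such URLs programmatically, here is a minimal sketch in Python of escaping the user's text and then URL-encoding the whole q value. The helper names escape_term and build_query_url are hypothetical, and the list of special characters is an assumption based on the Lucene query parser syntax documentation. It is not the code of ClientUtils.escapeQueryChars, which is what SolrJ users would normally call instead.

```python
from urllib.parse import quote

# Characters treated as query syntax by the Lucene query parser.
# Assumed list for illustration; check it against your Lucene version.
LUCENE_SPECIAL = set('+-!(){}[]^"~*?:\\/&|')


def escape_term(text):
    """Backslash-escape every query-syntax character in user input."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def build_query_url(base, field, user_input):
    """Build a select URL whose q parameter is escaped and URL-encoded."""
    # Group the escaped terms so they all apply to the given field.
    q = "{}:({})".format(field, escape_term(user_input))
    # safe="" percent-encodes ':', '(', ')', '\' and spaces as well.
    return "{}?q={}".format(base, quote(q, safe=""))


if __name__ == "__main__":
    print(escape_term("asia -- civilization"))
    print(build_query_url("http://localhost/solr/select", "CONTENT", "cold: temperature"))
```

With the input escaped this way, a double hyphen or a stray colon reaches the analyzer as ordinary text rather than being read as query syntax.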



Re: Problems with WordDelimiterFilterFactory

2008-12-24 Thread GPS.

I tried this once again with different combinations; none of them actually worked.


David Smiley @MITRE.org wrote:
 
 It seems you want Id to match only on complete field values.  If that is
 the case, then you should not tokenize, nor perhaps do any text analysis
 at all.  Consider removing the whole <analyzer/> block or using
 KeywordTokenizerFactory plus a modicum of other stuff (perhaps
 lowercasing).  For the particular examples you gave, it would be
 sufficient to simply remove WordDelimiterFilterFactory as an analysis
 step.
 
 ~ David
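
 The difference between the two approaches can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not Solr's implementation: split_parts is a hypothetical stand-in that only lowercases and splits at letter/digit boundaries, and it ignores the generate/catenate flags, synonyms, stop words and stemming.

```python
import re


def split_parts(value):
    """Toy word-delimiter style analysis: lowercase, then split the value
    into runs of letters and runs of digits."""
    return set(re.findall(r"[a-z]+|[0-9]+", value.lower()))


def tokenized_match(query, stored):
    """Match if the query and the stored value share at least one part."""
    return bool(split_parts(query) & split_parts(stored))


def whole_value_match(query, stored):
    """Match only on the complete value, ignoring case (keyword-style)."""
    return query.lower() == stored.lower()


if __name__ == "__main__":
    for stored in ("ARMZ", "ARMZ117", "ARMZ129"):
        print(stored, tokenized_match("ARMZ", stored), whole_value_match("ARMZ", stored))
```

 In this model a search for ARMZ shares the part "armz" with ARMZ117 and ARMZ129, so the split analysis matches all three Ids, while the whole-value comparison matches only ARMZ.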
 
 
 GPS. wrote:
 
 I am using a fieldType, with following configuration:
 
 <!-- Less flexible matching, but less false matches.  Probably not ideal for product names,
      but may be good for SKUs.  Can insert dashes in the wrong place and still match. -->
 <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>
 
 I have <field name="Id" type="textTight" indexed="true" stored="true" omitNorms="true"/>
 
 When I try searching with :
 http://localhost:8001/solr/select/?q=Id:ARMZ
 It gives me the complete list, where Id is ARMZ, ARMZ117 or ARMZ129
 
 What I want is if I search for ARMZ, it should tightly match only ARMZ
 and shouldn't return ARMZ117 OR ARMZ129
 Similarly, If I try searching for ARMZ1, it shouldn't give me any of
 ARMZ117 OR ARMZ129
 
 Is it possible to achieve this, by somehow strictly mapping the input
 text with field Id?
 Any help on this matter would be deeply appreciated.
 
 Thanks
 GPS.
 
 
 
 
