RE: Problems with WordDelimiterFilterFactory

Bernadette Houghton Thu, 08 Oct 2009 15:31:32 -0700

Thanks for this, Patrick. If I remove one of the hyphens, Solr no longer throws
the error, but it still doesn't find the right record. I see from marklo's
analysis page that Solr is still parsing it with a hyphen. Changing this part
of our schema.xml -


        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z])" replacement="" replace="all"
        />

To 

        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z])" replacement=" " replace="all"
        />

i.e. replacing non-alpha chars with a space, looks like it may handle that 
aspect. 
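You can see what the two `replacement` settings do to a token's text by running the same regular expression through `java.util.regex`, which Solr's PatternReplaceFilter uses. This is a minimal sketch. The class and method names are hypothetical, and it only mimics `replace="all"` on a single token string, not the whole analyzer chain. Note that `[^a-z]` also matches uppercase letters, so the sketch assumes the text has already been lowercased.

```java
import java.util.regex.Pattern;

public final class NonAlphaReplaceDemo {
    // Same regex as the schema snippet: any character that is not a lowercase ASCII letter
    private static final Pattern NON_ALPHA = Pattern.compile("([^a-z])");

    /** Mimics replace="all": every match in the token text is replaced. */
    public static String applyPattern(String tokenText, String replacement) {
        return NON_ALPHA.matcher(tokenText).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String token = "asia-civilization";
        System.out.println("[" + applyPattern(token, "") + "]");  // prints [asiacivilization]
        System.out.println("[" + applyPattern(token, " ") + "]"); // prints [asia civilization]
    }
}
```

As far as I know, the filter rewrites each token's text in place rather than re-tokenizing it. With `replacement=" "`, a token such as `asia-civilization` would therefore become a single token that contains a space. Whether that helps matching depends on where the filter sits in the analyzer chain.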

Regards
Bern

-----Original Message-----
From: Patrick Jungermann [mailto:patrick.jungerm...@googlemail.com] 
Sent: Friday, 9 October 2009 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Hi Bern,

the problem is the character sequence "--". The query parser does not accept
two minus characters in a row. Remove one of them and the query will be
parsed without problems.
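This thread does not propose it, but another option besides editing the user's text is to backslash-escape the characters that have special meaning in the classic Lucene query syntax. The parser then reads `--` (or the colon in "hot and cold: temperatures") as ordinary term text instead of as an operator or field separator. Here is a minimal sketch. It assumes the application builds the query string itself, and the class name is hypothetical. The character set follows the classic Lucene QueryParser's list of special characters, plus `/`, which is harmless to escape. Escape only the user-entered text, not the application's own `AND status_i:(2)` clause.

```java
public final class QueryEscaper {
    // Characters with special meaning in the classic Lucene query syntax (plus '/')
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes every query-syntax character in user-entered text. */
    public static String escape(String userInput) {
        StringBuilder sb = new StringBuilder(userInput.length() * 2);
        for (int i = 0; i < userInput.length(); i++) {
            char c = userInput.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix the special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String userInput = "Asia -- Civilization";
        // Escape only the user's text; the application's filter clause stays unescaped
        String q = "(" + escape(userInput) + " AND status_i:(2))";
        System.out.println(q); // prints (Asia \-\- Civilization AND status_i:(2))
    }
}
```

If the Lucene jar is already on the classpath, the static `QueryParser.escape(String)` method does the same job.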

Because of this parsing problem, I'd recommend cleaning up the query before
submitting it to the Solr server, replacing each sequence of minus
characters with a single one.
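The cleanup recommended here can be sketched with a single regular expression. This is a minimal, hypothetical helper, and the names are only illustrative. It assumes the raw user input is available before the application wraps it in its own query clauses. As the reply at the top of this thread notes, collapsing the dashes avoids the parse error but does not by itself make the record match.

```java
import java.util.regex.Pattern;

public final class QueryCleanup {
    // Two or more consecutive minus characters
    private static final Pattern MINUS_RUN = Pattern.compile("-{2,}");

    /** Replaces each run of minus characters with a single one. */
    public static String collapseMinusRuns(String userInput) {
        return MINUS_RUN.matcher(userInput).replaceAll("-");
    }

    public static void main(String[] args) {
        System.out.println(collapseMinusRuns("Asia -- Civilization")); // prints Asia - Civilization
    }
}
```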


Regards, Patrick



Bernadette Houghton wrote:
> Sorry, the last line was truncated -
> 
> HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 
> '(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 
> 7. Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> 
> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...
> 
> -----Original Message-----
> From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] 
> Sent: Friday, 9 October 2009 8:22 AM
> To: 'solr-user@lucene.apache.org'
> Subject: RE: Problems with WordDelimiterFilterFactory
> 
> Here's the query and the error - 
> 
> Oct 09 08:20:17  [debug] [196] Solr query string:    (Asia -- Civilization 
> AND status_i:(2)) 
> Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc 
> Oct 09 08:20:17  [error] Error on searching: "400" Status: 
> org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- 
> Civilization AND status_i:(2)) ': Encount
> 
> Bern
> 
> -----Original Message-----
> From: Christian Zambrano [mailto:czamb...@gmail.com] 
> Sent: Thursday, 8 October 2009 12:48 PM
> To: solr-user@lucene.apache.org
> Cc: solr-user@lucene.apache.org
> Subject: Re: Problems with WordDelimiterFilterFactory
> 
> Bern,
> 
> I am interested in the Solr query, in other words, the query that your
> system sends to Solr.
> 
> Thanks,
> 
> 
> Christian
> 
> On Oct 7, 2009, at 5:56 PM, Bernadette Houghton
> <bernadette.hough...@deakin.edu.au> wrote:
> 
>> Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:30000601
>>
>> Either scroll down and click one of the "television broadcasting --  
>> asia" links, or type it in the Quick Search box.
>>
>>
>> TIA
>>
>> bern
>>
>> -----Original Message-----
>> From: Christian Zambrano [mailto:czamb...@gmail.com]
>> Sent: Thursday, 8 October 2009 9:43 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Problems with WordDelimiterFilterFactory
>>
>> Could you please provide the exact URL of a query where you are
>> experiencing this problem?
>> e.g. (not URL-encoded): q=fieldName:"hot and cold: temperatures"
>>
>> On 10/07/2009 05:32 PM, Bernadette Houghton wrote:
>>> We are having some issues with our Solr parent application not
>>> retrieving records as expected.
>>>
>>> For example, if the input query includes a colon (e.g. hot and
>>> cold: temperatures), the relevant record (which contains a colon in
>>> the same place) does not get retrieved; if the input query does not
>>> include the colon, all is fine. Ditto if the user searches for a
>>> query containing hyphens, e.g. "asia - civilization", although with
>>> the qualifier that something like "asia-civilization" (no spaces
>>> either side of the hyphen) works fine, whereas "asia -
>>> civilization" (spaces either side of the hyphen) doesn't work.
>>>
>>> Our schema.xml contains the following -
>>>
>>>     <fieldType name="text" class="solr.TextField"
>>>                positionIncrementGap="100">
>>>       <analyzer type="index">
>>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>         <!-- in this example, we will only use synonyms at query time
>>>         <filter class="solr.SynonymFilterFactory"
>>>                 synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>>         -->
>>>         <filter class="solr.ISOLatin1AccentFilterFactory"/>
>>>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>                 words="stopwords.txt"/>
>>>         <filter class="solr.WordDelimiterFilterFactory"
>>>                 generateWordParts="1" generateNumberParts="1"
>>>                 catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>>>         <filter class="solr.LowerCaseFilterFactory"/>
>>>         <filter class="solr.EnglishPorterFilterFactory"
>>>                 protected="protwords.txt"/>
>>>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>       </analyzer>
>>>       <analyzer type="query">
>>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>         <filter class="solr.ISOLatin1AccentFilterFactory"/>
>>>         <filter class="solr.SynonymFilterFactory"
>>>                 synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>                 words="stopwords.txt"/>
>>>         <filter class="solr.WordDelimiterFilterFactory"
>>>                 generateWordParts="1" generateNumberParts="1"
>>>                 catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>>>         <filter class="solr.LowerCaseFilterFactory"/>
>>>         <filter class="solr.EnglishPorterFilterFactory"
>>>                 protected="protwords.txt"/>
>>>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>       </analyzer>
>>>     </fieldType>
>>>
>>> Bernadette Houghton, Library Business Applications Developer
>>> Deakin University Geelong Victoria 3217 Australia.
>>> Phone: 03 5227 8230 International: +61 3 5227 8230
>>> Fax: 03 5227 8000 International: +61 3 5227 8000
>>> MSN: bern_hough...@hotmail.com
>>> Email: bernadette.hough...@deakin.edu.au
>>> Website: http://www.deakin.edu.au
>>> Deakin University CRICOS Provider Code 00113B (Vic)