I tried it, and PRFF (PatternReplaceFilterFactory) is indeed generating an
empty token. I don't know how Lucene will index or query an empty term, or
rather what it "should" do. In any case, it is best to avoid empty terms.
You should be using a "charFilter" to simply filter raw characters before
tokenizing. So, try:
<charFilter class="solr.PatternReplaceCharFilterFactory"/>
It takes the same "pattern" and "replacement" attributes as PRFF.
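To illustrate why the order matters, here is a minimal plain-Java sketch. It does not use the Solr or Lucene APIs; it only imitates the two orderings with whitespace splitting. The class name, the method names, and the "\p{Punct}" pattern are assumptions made for illustration, since the actual pattern from the schema is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

class EmptyTokenDemo {
    // Assumed pattern for illustration only; not taken from the poster's schema.
    static final String PUNCT = "\\p{Punct}";

    // Imitates a token filter: split into tokens first, then strip
    // punctuation from each token. A token that was only punctuation
    // survives as an empty string.
    static List<String> filterAfterTokenizing(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            tokens.add(t.replaceAll(PUNCT, ""));
        }
        return tokens;
    }

    // Imitates a char filter: strip punctuation from the raw text first,
    // then split. Nothing is left over to become an empty token.
    static List<String> filterBeforeTokenizing(String text) {
        List<String> tokens = new ArrayList<>();
        String cleaned = text.replaceAll(PUNCT, "").trim();
        if (!cleaned.isEmpty()) {
            for (String t : cleaned.split("\\s+")) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(filterAfterTokenizing("foo { bar"));
        System.out.println(filterBeforeTokenizing("foo { bar"));
    }
}
```

With the input "foo { bar", the first method returns three tokens, the middle one empty, while the second returns only "foo" and "bar". A real Solr analyzer chain will differ in detail, but the empty term in the parsed query looks like the same effect.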
-- Jack Krupansky
-----Original Message-----
From: Jack Krupansky
Sent: Monday, September 24, 2012 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Remove specific punctuation marks
1. Which query parser are you using?
2. I see the following comment in the Java 6 doc for regex "\p{Punct}":
"POSIX character classes (US-ASCII only)", so if any of the punctuation is
a non-ASCII Unicode character, it won't be matched or removed.
3. It seems very odd that the parsed query has empty terms - normally the
query parsers will ignore terms that analyze to zero tokens. Maybe your "{"
is not an ASCII left brace code and is (apparently) unprintable in the
parsed query. Or, maybe there is some encoding problem in the analyzer.
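Point 2 can be checked with a small standalone Java sketch. The class and method names are hypothetical, and the fullwidth left curly bracket (U+FF5B) is only an assumed example of a non-ASCII brace; I don't know which character is actually in the poster's data. The "\p{P}" class is the Unicode punctuation category, shown here as one possible alternative rather than a drop-in replacement.

```java
import java.util.regex.Pattern;

class PunctClassDemo {
    // POSIX class: US-ASCII punctuation only (without Unicode flags).
    static final Pattern ASCII_PUNCT = Pattern.compile("\\p{Punct}");
    // Unicode general category "P" (punctuation), which covers non-ASCII too.
    static final Pattern UNICODE_PUNCT = Pattern.compile("\\p{P}");

    static boolean matchesAscii(String s) {
        return ASCII_PUNCT.matcher(s).matches();
    }

    static boolean matchesUnicode(String s) {
        return UNICODE_PUNCT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String asciiBrace = "{";          // U+007B
        String fullwidthBrace = "\uFF5B"; // assumed example of a non-ASCII brace

        System.out.println(matchesAscii(asciiBrace));       // ASCII brace vs \p{Punct}
        System.out.println(matchesAscii(fullwidthBrace));   // fullwidth brace vs \p{Punct}
        System.out.println(matchesUnicode(fullwidthBrace)); // fullwidth brace vs \p{P}
    }
}
```

Note that "\p{P}" is not a superset of "\p{Punct}": ASCII symbols such as "+" and "$" belong to the Unicode symbol categories, so a pattern meant to cover both would need something like "[\p{P}\p{S}]".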
-- Jack Krupansky
-----Original Message-----
From: Daisy
Sent: Monday, September 24, 2012 9:26 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr - Remove specific punctuation marks
I tried &amp; and it fixed the 500 error. But the query could still find
punctuation marks.
The parsed query didn't contain the punctuation mark:
<str name="rawquerystring">"{"</str>
<str name="querystring">"{"</str>
<str name="parsedquery">text:</str>
<str name="parsedquery_toString">text:</str>
but numFound is still 1:
<result name="response" numFound="1" start="0">
and the highlighting shows the punctuation mark as the match:
<em>{</em>
The steps I followed:
1- edit the schema
2- restart the server
3- delete the file
4- index the file
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009835.html