Re: Solr - Remove specific punctuation marks

Jack Krupansky Mon, 24 Sep 2012 10:07:34 -0700

I tried it, and PRFF (PatternReplaceFilterFactory) is indeed generating an empty token. I don't know how Lucene will index or query an empty term. I mean, what it "should" do. In any case, it is best to avoid them.

You should be using a "charFilter" to simply filter raw characters before tokenizing. So, try:


<charFilter class="solr.PatternReplaceCharFilterFactory"/>

It has the same pattern and replacement attributes.
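
To illustrate why the order matters, here is a minimal sketch in plain Java. It is only an analogy using java.util.regex and whitespace splitting, not the Lucene/Solr analysis API, and the \p{Punct} pattern and the sample text are assumptions for the example (the actual pattern from the schema is not shown in this thread). Replacing after tokenizing can leave an empty token behind, while replacing in the raw text before tokenizing cannot:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CharFilterSketch {
    // Assumed pattern for illustration; the real schema pattern is not shown in the thread.
    static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    // Token-filter order: split into tokens first, then strip punctuation from each token.
    // A token made only of punctuation becomes an empty string.
    static List<String> filterAfterTokenizing(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            tokens.add(PUNCT.matcher(raw).replaceAll(""));
        }
        return tokens;
    }

    // Char-filter order: strip punctuation from the raw text, then split into tokens.
    // Nothing is left over where the punctuation used to be.
    static List<String> filterBeforeTokenizing(String text) {
        List<String> tokens = new ArrayList<>();
        String cleaned = PUNCT.matcher(text).replaceAll("").trim();
        if (cleaned.isEmpty()) {
            return tokens; // avoid the single empty string that split() would return
        }
        for (String raw : cleaned.split("\\s+")) {
            tokens.add(raw);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(filterAfterTokenizing("foo { bar"));  // [foo, , bar]
        System.out.println(filterBeforeTokenizing("foo { bar")); // [foo, bar]
    }
}
```

The empty string in the first result corresponds to the empty token described above.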

-- Jack Krupansky

-----Original Message-----
From: Jack Krupansky

Sent: Monday, September 24, 2012 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Remove specific punctuation marks

1. Which query parser are you using?
2. I see the following comment in the Java 6 doc for regex "\p{Punct}":
"POSIX character classes (US-ASCII only)", so if any of the punctuation is
some higher Unicode character code, it won't be matched/removed.
3. It seems very odd that the parsed query has empty terms - normally the
query parsers will ignore terms that analyze to zero tokens. Maybe your "{"
is not an ASCII left brace code and is (apparently) unprintable in the
parsed query. Or, maybe there is some encoding problem in the analyzer.
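
Point 2 can be checked with a small Java sketch. The fullwidth left brace (U+FF5B) is only a hypothetical stand-in for a non-ASCII brace; nothing in this thread confirms which character the input actually contains. The sketch compares the POSIX class \p{Punct} with the Unicode category \p{P}:

```java
import java.util.regex.Pattern;

public class PunctClassSketch {
    // POSIX class: US-ASCII punctuation only (without the UNICODE_CHARACTER_CLASS flag).
    static final Pattern POSIX_PUNCT = Pattern.compile("\\p{Punct}");
    // Unicode general category P: punctuation in any script.
    static final Pattern UNICODE_PUNCT = Pattern.compile("\\p{P}");

    // Remove every match of the given pattern from the input.
    static String strip(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String ascii = "a{b";           // ASCII left brace, U+007B
        String fullwidth = "a\uFF5Bb";  // fullwidth left brace, U+FF5B (hypothetical input)

        System.out.println(strip(POSIX_PUNCT, ascii).equals("ab"));       // true
        System.out.println(strip(POSIX_PUNCT, fullwidth).equals("ab"));   // false: brace survives
        System.out.println(strip(UNICODE_PUNCT, ascii).equals("ab"));     // true
        System.out.println(strip(UNICODE_PUNCT, fullwidth).equals("ab")); // true
    }
}
```

Note that \p{P} is not a drop-in superset of \p{Punct}: ASCII characters such as $, +, < and | belong to the symbol categories (\p{S}), so a pattern that must cover them would need both classes.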

-- Jack Krupansky

-----Original Message-----
From: Daisy

Sent: Monday, September 24, 2012 9:26 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr - Remove specific punctuation marks

I tried &amp; and it solved the 500 error code. But it could still find
punctuation marks, even though the parsed query didn't contain the
punctuation mark:

<str name="rawquerystring">"{"</str>
<str name="querystring">"{"</str>
<str name="parsedquery">text:</str>
<str name="parsedquery_toString">text:</str>

but numFound is still 1:

<result name="response" numFound="1" start="0">

and the highlighting shows the punctuation mark as a match:
<em>{</em>

The steps I did:
1- edit the schema
2- restart the server
3- delete the file
4- index the file




--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009835.html

Sent from the Solr - User mailing list archive at Nabble.com.
