I've had problems with empty tokens. You can remove them by adding this filter as a step in the analyzer chain:
<filter class="solr.LengthFilterFactory" min="1" max="1024"/>

wunder

On Sep 24, 2012, at 10:07 AM, Jack Krupansky wrote:

> I tried it and PRFF is indeed generating an empty token. I don't know how
> Lucene will index or query an empty term. I mean, what it "should" do. In any
> case, it is best to avoid them.
>
> You should be using a "charFilter" to simply filter raw characters before
> tokenizing. So, try:
>
> <charFilter class="solr.PatternReplaceCharFilterFactory"/>
>
> It has the same pattern and replacement attributes.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Jack Krupansky
> Sent: Monday, September 24, 2012 12:43 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr - Remove specific punctuation marks
>
> 1. Which query parser are you using?
> 2. I see the following comment in the Java 6 doc for regex "\p{Punct}":
> "POSIX character classes (US-ASCII only)", so if any of the punctuation is
> some higher Unicode character code, it won't be matched/removed.
> 3. It seems very odd that the parsed query has empty terms - normally the
> query parsers will ignore terms that analyze to zero tokens. Maybe your "{"
> is not an ASCII left brace code and is (apparently) unprintable in the
> parsed query. Or, maybe there is some encoding problem in the analyzer.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Daisy
> Sent: Monday, September 24, 2012 9:26 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr - Remove specific punctuation marks
>
> I tried & and it solved the 500 error code. But still it could find
> punctuation marks.
> Although the parsed query didnt contain the punctuation mark,
>
> <str name="rawquerystring">"{"</str>
> <str name="querystring">"{"</str>
> <str name="parsedquery">text:</str>
> <str name="parsedquery_toString">text:</str>
>
> but still the numfound gives 1
>
> <result name="response" numFound="1" start="0">
>
> and the highlight shows the result of punctuation mark
> <em>{</em>
> The steps I did:
> 1- editing the schema
> 2- restart the server
> 3-delete the file
> 4-index the file
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009835.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Walter Underwood
wun...@wunderwood.org
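
P.S. For anyone reading this in the archive, here is a rough sketch of how the two suggestions in this thread (Jack's charFilter and the LengthFilterFactory step at the top of this message) might sit together in schema.xml. The field type name "text_nopunct", the whitespace tokenizer, and the use of \p{Punct} as the pattern are assumptions for illustration only. The original schema was not posted, so substitute the real pattern and tokenizer.

```xml
<!-- Sketch only: field type name, tokenizer, and pattern are assumptions,
     not taken from the original poster's schema. -->
<fieldType name="text_nopunct" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Strip punctuation from the raw character stream before tokenizing.
         Note that \p{Punct} covers US-ASCII punctuation only. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\p{Punct}" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Safety net: drop any zero-length tokens that still get through. -->
    <filter class="solr.LengthFilterFactory" min="1" max="1024"/>
  </analyzer>
</fieldType>
```

Because the charFilter runs before the tokenizer, a term made up entirely of punctuation should never reach the tokenizer as a token at all, and the length filter only catches whatever slips past. As with any analyzer change, the documents need to be reindexed after the schema is edited and the server restarted.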