The previously given pattern will solve the '<' character issue. However,
you will get the following exception in the log:
Caused by: java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 48
(?<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)
^
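So revisit your regex pattern, particularly around position 48, where the
look-behind group closes. Java needs to be able to work out a maximum
length for a look-behind, and the '*' quantifiers inside '(?<=...)' leave
it unbounded.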
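To check a pattern outside Solr before restarting, a small Java sketch like the one below may help. The class name LookBehindCheck, the helper tryCompile, and the simplified markSentences pattern are all made up for illustration; they are not part of Solr. Whether the original pattern is rejected depends on the JDK: older JDKs report the look-behind error shown above, while newer ones may accept some unbounded look-behinds, so no expected output is given for it. The simplified pattern drops the look-behind entirely, which is an assumption about what is acceptable here and does not treat abbreviations specially.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LookBehindCheck {

    // Hypothetical helper: returns null if the regex compiles,
    // otherwise the error description and index reported by Java.
    public static String tryCompile(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " near index " + e.getIndex();
        }
    }

    // Assumed simplification: no look-behind, just end punctuation
    // followed by whitespace or end of input.
    public static String markSentences(String text) {
        return text.replaceAll("[.!?]+(?=\\s|$)", " monkeysentence");
    }

    public static void main(String[] args) {
        // The pattern from the log, written as a Java string literal.
        String original =
            "(?<=[^.!?\\\\s][^.!?]*(?:[.!?](?![']?\\s|$)[^.!?]*)*)[.!?]+(?=\\\\s|$)";
        // Result depends on the JDK version, so it is only printed.
        System.out.println("original: " + tryCompile(original));

        // A look-behind with an explicit upper bound compiles.
        System.out.println("bounded:  " + tryCompile("(?<=[.!?]{1,3})\\s"));

        System.out.println(markSentences("Hello world. How are you?"));
    }
}
```

Note also that the attribute value is handed to the regex engine as written, so '\\s' in schema.xml means a literal backslash followed by 's', not whitespace. A single '\s' is probably what is intended there.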
-Jeevanandam
On 19-04-2012 7:06 pm, Jeevanandam wrote:
Try this one, with the '<' written as the XML entity '&lt;':
pattern="(?&lt;=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
I tested it locally and Solr starts fine. Now please test it with your data.
-Jeevanandam
On 19-04-2012 9:29 am, smooth almonds wrote:
I'm using Solr 3.5.0, and in my schema.xml I use the following to mark
the end of sentences and replace the end punctuation with a symbolic token:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(?<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
replacement=" monkeysentence"/>
I'm not sure if that will even work for what I want, but first I need to
solve the problem of escaping the '<' character in the first '?<='
lookbehind.
I get the following error:
org.xml.sax.SAXParseException: The value of attribute "pattern" associated with an element type "null" must not contain the '<' character.
I've tried using a '\' as in:
pattern="(?\<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
But I get the same error.
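For reference, the error comes from the XML parser rather than the regex engine, so a backslash makes no difference: a literal '<' is not allowed inside an attribute value, and the XML way to write it is the entity '&lt;'. A minimal sketch of this with the JDK's own parser, where the class AttributeEscapeDemo and the helper readPattern are hypothetical names and the short pattern '(?<=a)b' is only a stand-in for the real one:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class AttributeEscapeDemo {

    // Hypothetical helper: parses a one-element XML snippet and returns
    // its "pattern" attribute, or null if the parser rejects the input.
    public static String readPattern(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttribute("pattern");
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // '&lt;' is decoded by the parser, so the regex sees a real '<'.
        System.out.println(readPattern("<charFilter pattern=\"(?&lt;=a)b\"/>"));

        // A backslash before '<' still leaves a raw '<' in the attribute,
        // so the parser rejects it and the helper returns null.
        System.out.println(readPattern("<charFilter pattern=\"(?\\<=a)b\"/>"));
    }
}
```

The parser may also print its own fatal-error line to stderr for the second snippet; that comes from the default error handler, not from this code.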
--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-escape-character-in-regex-in-Solr-schema-xml-tp3921961p3921961.html
Sent from the Solr - User mailing list archive at Nabble.com.