Hi,
 
I got a SolrException when submitting XML for indexing (using Solr 3.6.1):
 
////
Jan 15, 2013 10:22:42 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR, code 31))
 at [row,col {unknown-source}]: [2,1169]
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)
 
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 31))
...
 at [row,col {unknown-source}]: [2,1169]
        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:660)
        at com.ctc.wstx.sr.BasicStreamReader.readCDataPrimary(BasicStreamReader.java:4240)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromTreeCommentOrCData(BasicStreamReader.java:3280)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2824)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
        at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
////
 
I checked the details; the data causing the trouble is
 
word1chr(31)word2
 
Here both word1 and word2 are normal English characters, and "chr(31)" is the
return value of the PHP function chr(31), i.e. the control character with
code 31. Our XML is otherwise well formed, and the encoding/charset are
properly declared.
 
The problem is due to chr(31): if I replace it with another UTF-8 character,
indexing works.
 
I checked the source of com.ctc.wstx.sr.BasicStreamReader, and it seems to be
by design that control characters are not allowed inside CDATA text. What
puzzles me is how we can avoid control characters in text in general (it is
certainly not a common occurrence, but it can still happen)?
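
For reference, XML 1.0 only permits tab, line feed and carriage return below
U+0020, so code 31 cannot appear in a document even inside CDATA. One possible
approach is to strip such characters before building the XML. Below is a
minimal sketch of that idea, written in Python for illustration only; the
function name is hypothetical, and the same character class could presumably be
applied on the PHP side with preg_replace.

```python
import re

# Matches anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Return text with characters that XML 1.0 forbids removed or replaced.

    Hypothetical helper, not part of Solr or Woodstox.
    """
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    raw = "word1" + chr(31) + "word2"
    # repr() makes the control character visible before and after cleaning.
    print(repr(raw), "->", repr(strip_invalid_xml_chars(raw)))
```

Passing replacement=" " instead of the default keeps word1 and word2 as
separate tokens, which may matter for how the field is analyzed.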
 
Thanks very much for any help, Lisheng
