escaping/removing control characters?

2008-12-13 Thread Ryan McKinley
I'm indexing some mail archives and, across the various formats, encodings, etc., some messages contain invalid control characters.

doc.setField( body, content.toString() );

In the Solr logs, I get:

[java] SEVERE: java.io.IOException: Illegal character ((CTRL-CHAR, code 22))
[java] at

Re: escaping/removing control characters?

2008-12-13 Thread Yonik Seeley
On Sat, Dec 13, 2008 at 1:45 PM, Ryan McKinley ryan...@gmail.com wrote:
> Is there any standard way to escape invalid xml control characters?

Not that I know of... it's a shame that XML can't carry the full Unicode range. Good reason to get a binary or JSON indexing interface at some point... I
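
Absent a standard helper, one possible workaround is to drop the characters that XML 1.0 cannot represent before the text is handed to setField. The following is only a sketch under that assumption: the class name XmlCharFilter and the method stripInvalidXmlChars are hypothetical and not part of Solr or SolrJ. The ranges follow the Char production in the XML 1.0 specification, and stripping (rather than escaping) is a choice made here for simplicity.

```java
// Hypothetical helper, not part of Solr/SolrJ: removes code points that
// XML 1.0 does not allow, such as the CTRL-CHAR code 22 from the log above.
public final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with all invalid XML 1.0 characters removed. */
    public static String stripInvalidXmlChars(String in) {
        if (in == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            // Read whole code points so valid surrogate pairs are kept intact;
            // an unpaired surrogate falls in 0xD800-0xDFFF and is dropped.
            int cp = in.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With something like this in place, the indexing call could pass XmlCharFilter.stripInvalidXmlChars( content.toString() ) as the field value instead of the raw string. Note that this silently discards data; replacing each bad character with a space would be an equally reasonable variant.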