On Mon, Aug 5, 2013 at 11:42 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> : I agree with you, 0xfffe is a special character, that is why I was asking
> : how it's handled in solr.
> : In my document, 0xfffe does not appear at the beginning, it's in the
> : content.
>
> Unless I'm misunderstanding something (and it's very likely that I am)...
>
> 0xfffe is not a special character -- it is explicitly *not* a character in
> Unicode at all, it is set aside as "not a character" specifically so
> that the character 0xfeff can be used as a BOM, and if the BOM is read
> incorrectly, it will cause an error.
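
XML doesn't allow code points like this. The XML 1.0 specification defines a character as:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
         /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

0xFFFE sits between #xFFFD and #x10000, so it is outside every allowed range, wherever in the document it appears.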
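
If it helps, the Char production above can be turned into a quick check to run on content before posting it as XML. This is only a minimal sketch in Python, not a description of what Solr does internally; the function names are made up for illustration, and whether to drop or replace the offending characters is an assumption you would adjust for your data.

```python
import re

# Complement of the XML 1.0 Char production quoted above:
# anything NOT in #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (index, hex code point) pairs for characters XML 1.0 forbids.

    Hypothetical helper, named here for illustration only.
    """
    return [
        (m.start(), hex(ord(m.group())))
        for m in _INVALID_XML_CHARS.finditer(text)
    ]


def strip_invalid_xml_chars(text, replacement=""):
    """Replace every forbidden character with `replacement` (dropped by default).

    Hypothetical helper, named here for illustration only.
    """
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    sample = "some content \ufffe more content"
    print(find_invalid_xml_chars(sample))
    print(strip_invalid_xml_chars(sample))
```

Running the finder first tells you where the bad code points are, which is useful for tracking down where they come from in the source documents before deciding to strip them.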