Tom Bradford wrote:
Isn't the default/implied encoding for an XML document UTF-8 though? So wouldn't the Latin character set be available? I would think a parser would be able to make this assumption and behave properly without being told specifically.
Right, Tom, that's what I thought too, but it doesn't seem to be the case: if the same file is read from disk everything is fine, but if it comes from XIndice it just doesn't work. The only difference I could see between the two XML streams is the missing encoding declaration. Actually, I don't even understand what this small encoding issue has to do with it, since the only things flowing around should be SAX events, as in Cocoon, where only getContentAsSAX() is called. Isn't it weird?
We can always modify the serializer to write out an encoding of UTF-8,
since really, that's all we support at the moment anyway :-). Once we
shift to the DTSM format, we should be able to maintain any non-UTF-8
encodings.
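
For illustration only, here is a rough sketch of what "writing out an encoding of UTF-8" could look like using the standard JAXP transformer API rather than the actual XIndice serializer (which is not shown in this thread). The class name Utf8SerializerSketch and its helper methods are hypothetical; the idea is just that the output carries an explicit encoding in its XML declaration, so whoever reparses the bytes does not have to guess.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: NOT the XIndice serializer, only plain JAXP.
public class Utf8SerializerSketch {

    // Serializes a DOM document to bytes with an explicit UTF-8 XML declaration.
    public static byte[] serializeAsUtf8(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Ask for UTF-8 output and make sure the XML declaration is kept.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Builds a tiny document containing one non-ASCII Latin character.
    public static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = doc.createElement("greeting");
        root.setTextContent("Citt\u00e0"); // "Citta" with a grave accent on the final a
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serializeAsUtf8(sampleDocument());
        // Decode as UTF-8 only to display the serialized form.
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Reparsing the resulting bytes with any conforming parser should give back the same accented character, since the declaration and the actual byte encoding agree.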
Eagerly waiting for exotic encodings :)
Ciao,
-- Gianugo Rabellino
