Or, if you feel you have to store the file in memory before parsing, then
at least store it in a byte array and not a char array. Then you can feed
the parser an InputStream over the byte array (a ByteArrayInputStream, for
example), and avoid the encoding and byte-order problems that Marshall
describes.
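
A minimal sketch of that approach, using the JDK's own DOM parser rather
than UIMA so that it runs standalone. The class name, method name and sample
document are made up for illustration. Since John's message says that
XmiCasDeserializer.deserialize takes an InputStream, the same kind of stream
could presumably be handed to it instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ByteCacheExample {
    // Parse XML held in a byte[] cache. The parser sees the raw bytes, so it
    // can read the encoding declaration (and any byte order mark) itself.
    static Document parseCached(byte[] cachedXml)
            throws ParserConfigurationException, SAXException, IOException {
        try (InputStream in = new ByteArrayInputStream(cachedXml)) {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the cached file; in practice the bytes would come
        // from something like Files.readAllBytes(path).
        byte[] cached =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><cas><item id=\"1\"/></cas>"
                        .getBytes(StandardCharsets.UTF_8);
        Document doc = parseCached(cached);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The point of the design is that no charset is chosen anywhere on the read
path: the bytes go from the cache to the parser unchanged.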
On Sun, 22 Aug 2010 13:48:39 -0700, Marshall Schor <[email protected]> wrote:
I'm not an expert here, but I found by googling that at least one person
thinks it's a bad practice to read things into char arrays, and then send
those to an XML parser.
The web page http://www.odi.ch/prog/design/newbies.php#7 says:
It is a very bad idea to read an XML file and store it in a String. An XML
specifies its encoding in the XML header. But when reading a file you have
to know the encoding beforehand! Also storing an XML file in a String wastes
memory. All XML parsers accept an InputStream as a parsing source and they
figure out the encoding themselves correctly. So you can feed them an
InputStream instead of storing the whole file in memory temporarily. The
byte order (big-endian, little-endian) is another trap when a multi-byte
encoding (such as UTF-8) is used. XML files may carry a byte order mark at
the beginning that specifies the byte order. XML parsers handle them
correctly.
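
To illustrate the quoted point about encodings, here is a small sketch with
a made-up document that declares ISO-8859-1. Parsing the raw bytes lets the
parser honor the declaration, while decoding the same bytes into a String
with a guessed charset (UTF-8 here) damages the non-ASCII character before
any parser sees it. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EncodingDemo {
    // Bytes of a tiny document that declares ISO-8859-1 and contains "café".
    static byte[] latin1Xml() {
        return "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><t>caf\u00e9</t>"
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    // Hand the raw bytes to the parser and let it apply the declared encoding.
    static String textViaStream(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getTextContent();
    }

    // Decode with a guessed charset first, as happens when the file is
    // cached in a String or char[] without knowing its real encoding.
    static String decodedWithWrongGuess(byte[] xml) {
        return new String(xml, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = latin1Xml();
        // Whether the parser recovered the original text from the bytes.
        System.out.println(textViaStream(xml).equals("caf\u00e9"));
        // Whether the wrong guess introduced a U+FFFD replacement character.
        System.out.println(decodedWithWrongGuess(xml).indexOf('\uFFFD') >= 0);
    }
}
```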
-Marshall
On 8/22/2010 8:52 AM, John Wiesel wrote:
Dear all,
I am currently stalled in my project by XmiCasDeserializer.deserialize: I
am wondering why there is no method that allows setting up the XML parser
directly with an InputSource instead of an InputStream. I would like to
load my CAS from an XMI file that I have cached in a CharArray. As I cannot
generate an InputStream from a String (StringBufferInputStream is
deprecated since JDK 1.1) but should be able to do so using an InputSource
w/o much trouble, I hope there is a sensible solution for this that I just
haven't thought of yet.
Any suggestions?
Thanks folks.
John