Or, if you feel you have to store the file in memory before parsing, then
at least store it in a ByteArray and not a CharArray.  Then you can feed
the parser with an InputStream on the ByteArray, and avoid the encoding
and byte-order problems that Marshall describes.
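
A minimal sketch of that byte-array approach, using only the JDK's DOM parser. The class name, method name and sample document are made up for illustration. As far as I know, the same ByteArrayInputStream could be handed to XmiCasDeserializer.deserialize(InputStream, CAS), but that call is not exercised here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ByteArrayXmlDemo {

    // Parses XML held in memory as raw bytes and returns the root element's text.
    // The parser reads the encoding from the XML declaration (or BOM) itself,
    // so the caller never has to decode the bytes into chars.
    static String parseRootText(byte[] xmlBytes) throws Exception {
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a cached file: ISO-8859-1 bytes with a matching declaration.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc>caf\u00e9</doc>";
        byte[] cached = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseRootText(cached));
    }
}
```

The point of the sketch is that the cached bytes stay exactly as they were on disk, so whatever encoding the file declares is still the one the parser sees.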

On Sun, 22 Aug 2010 13:48:39 -0700, Marshall Schor <[email protected]> wrote:

I'm not an expert here, but googling turned up at least one person who
considers it bad practice to read things into char arrays and then send those
to an XML parser.

The web page http://www.odi.ch/prog/design/newbies.php#7 says:

It is a very bad idea to read an XML file and store it in a String. An XML
specifies its encoding in the XML header. But when reading a file you have to
know the encoding beforehand! Also storing an XML file in a String wastes
memory. All XML parsers accept an InputStream as a parsing source and they
figure out the encoding themselves correctly. So you can feed them an
InputStream instead of storing the whole file in memory temporarily. The byte
order (big-endian, little-endian) is another trap when a multi-byte encoding
(such as UTF-8) is used. XML files may carry a byte order mark at the beginning
that specifies the byte order. XML parsers handle them correctly.

-Marshall

On 8/22/2010 8:52 AM, John Wiesel wrote:
Dear all,

I am currently stalled in my project by XmiCasDeserializer.deserialize: I am
wondering why there is no method that allows setting up the XML parser directly
with an InputSource instead of an InputStream. I would like to load my CAS from
an XMI file that I have cached in a CharArray. As I cannot generate an
InputStream from a String (StringBufferInputStream has been deprecated since
JDK 1.1) but should be able to do so using an InputSource without much trouble,
I hope there is a sensible solution for this that I just haven't thought of yet.

Any suggestions?
Thanks folks.

John
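
For the case where the data really is only available as chars, here is a rough sketch of feeding a SAX parser through an InputSource wrapped around a CharArrayReader, using only JDK classes. CountingHandler is a hypothetical stand-in for whatever ContentHandler would actually build the CAS; I believe XmiCasDeserializer can supply one via getXmiCasHandler(CAS), but treat that as an assumption to check against the UIMA javadoc.

```java
import java.io.CharArrayReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CharSourceDemo {

    // Hypothetical stand-in handler: it only counts start tags.
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    // Parses already-decoded XML from a char array. Because the InputSource
    // carries a character stream, the parser does no byte decoding of its own.
    static int countElements(char[] cachedXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CountingHandler handler = new CountingHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new CharArrayReader(cachedXml)));
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        char[] cached = "<a><b/><b/></a>".toCharArray();
        System.out.println(countElements(cached));
    }
}
```

This only works cleanly if the chars were decoded with the right encoding in the first place, which is the problem the replies above warn about.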




