My understanding is that if you read the way the currently checked-in
FileUtils does, into a buffer of any reasonable size, there is no
additional advantage to using a BufferedInputStream (you are
essentially implementing the buffering yourself anyway). The advantage
comes when you use the 1-byte read method, since that would be highly
inefficient without a BufferedInputStream to manage the buffer for you.

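To make the distinction concrete, here is a minimal sketch of the two read styles. The class and method names (ReadStyles, countWithOwnBuffer, countByteByByte) and the 8192-byte buffer size are made up for illustration; this is not the actual FileUtils code.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
final class ReadStyles {

    // Bulk reads: the caller's byte[] already acts as the buffer, so
    // wrapping the stream in a BufferedInputStream would add little.
    static long countWithOwnBuffer(InputStream in) throws IOException {
        byte[] buf = new byte[8192]; // arbitrary "reasonable" size (assumption)
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // Single-byte reads: on a raw FileInputStream every read() is a
    // separate request to the OS; a BufferedInputStream serves most of
    // them from its internal buffer instead.
    static long countByteByByte(InputStream in) throws IOException {
        long total = 0;
        while (in.read() != -1) {
            total++;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("readstyles", ".bin");
        try {
            Files.write(tmp, new byte[100_000]);
            // Own buffer on a raw stream: no wrapper needed.
            try (InputStream in = new FileInputStream(tmp.toFile())) {
                System.out.println("bulk:   " + countWithOwnBuffer(in));
            }
            // 1-byte reads: wrap the stream so the buffering is done for us.
            try (InputStream in = new BufferedInputStream(new FileInputStream(tmp.toFile()))) {
                System.out.println("single: " + countByteByByte(in));
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Both loops see the same bytes; the only difference is who owns the buffer.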
If Xerces didn't perform well when passed a FileInputStream, I'd say
that would be a bug in Xerces for sure. It would be terrible to force
your users to create a BufferedInputStream every time they wanted to
parse something at a reasonable speed.

Tweaking buffer sizes could help, I guess; feel free to run a test. My
gut says the Xerces default will perform just fine, or somebody would
have changed it by now.
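
If anyone wants to try it, a rough timing sketch could look like the following. It uses whatever SAX parser JAXP hands back (Xerces only if that is what is on the classpath), and the ParseTiming class, its method names, and the synthetic document are all invented for this example. Wall-clock numbers from a loop like this are noisy, so treat them as a hint rather than a measurement.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical test harness, for illustration only.
final class ParseTiming {

    // Parses the stream with the default JAXP SAX parser and returns
    // the number of start-element events seen.
    static int countElements(InputStream in) throws Exception {
        final int[] count = {0};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    // Writes a synthetic document: one <root> with `items` <item> children.
    static Path writeSample(int items) throws IOException {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < items; i++) {
            sb.append("<item>").append(i).append("</item>");
        }
        sb.append("</root>");
        Path p = Files.createTempFile("sample", ".xml");
        Files.write(p, sb.toString().getBytes(StandardCharsets.UTF_8));
        return p;
    }

    public static void main(String[] args) throws Exception {
        Path xml = writeSample(200_000);
        try {
            // Repeat a few rounds so JIT warm-up does not dominate.
            for (int round = 0; round < 5; round++) {
                long t0 = System.nanoTime();
                try (InputStream in = new FileInputStream(xml.toFile())) {
                    countElements(in);
                }
                long t1 = System.nanoTime();
                try (InputStream in = new BufferedInputStream(
                        new FileInputStream(xml.toFile()))) {
                    countElements(in);
                }
                long t2 = System.nanoTime();
                System.out.printf("raw=%d ms, buffered=%d ms%n",
                        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
            }
        } finally {
            Files.deleteIfExists(xml);
        }
    }
}
```

If the two columns stay close across rounds, the parser's own buffering is doing the work and the extra wrapper is not needed.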
-Adam
On 2/12/07, Marshall Schor <[EMAIL PROTECTED]> wrote:
Adam Lally wrote:
> I doubt it. Is there something that led you to believe this would be
> necessary?
Just doing some code inspection and seeing this - that it is perfectly
feasible to pass a buffered version of the input to this, and that the
general contract for IO seems to imply that you should use buffering
for performance considerations.
But I see from some web surfing that the Xerces impl does some
buffering, and you can set the buffer size via a property (do we do
that? default = 2k I think, and the Apache license is about 1K by
itself :-) ).
I guess some simple test would tell...
Some web surfing turned up:
Parsers like Apache Xerces have the ability to set the input buffer size:

    // Set the chunk to read in by SAX
    parser.setProperty("http://apache.org/xml/properties/input-buffer-size",
                       Integer.valueOf(2048));

See also http://xerces.apache.org/xerces2-j/properties.html
which gives some advice on how large to set this.

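For a slightly fuller picture, here is a hedged sketch of setting that property through JAXP. The property is Xerces-specific, so the sketch assumes the reader may reject it and reports that instead of failing. The BufferSizeProperty class, the trySetBufferSize helper, and the 8192 value are hypothetical, chosen only for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
final class BufferSizeProperty {

    // Property name as quoted in the snippet above.
    static final String INPUT_BUFFER_SIZE =
            "http://apache.org/xml/properties/input-buffer-size";

    // Tries to set the input buffer size; returns false if this
    // XMLReader does not recognize or support the property.
    static boolean trySetBufferSize(XMLReader reader, int bytes) {
        try {
            reader.setProperty(INPUT_BUFFER_SIZE, Integer.valueOf(bytes));
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println("property accepted: " + trySetBufferSize(reader, 8192));

        // Parsing works the same way whether or not the property was accepted.
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new ByteArrayInputStream(
                "<doc/>".getBytes(StandardCharsets.UTF_8))));
    }
}
```

Checking the return value keeps the code portable across parsers that do not know the property.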
-Marshall