My understanding is that if one does reads like the
currently-checked-in FileUtils, where you read into a buffer of any
reasonable size, there is no additional advantage to using a
BufferedInputStream (you are essentially implementing buffering
yourself anyway).  The advantage comes if you want to use the 1-byte
read method, since this would be highly inefficient if you did not use
a BufferedInputStream to manage the buffer for you.
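
To make the distinction concrete, here is a minimal sketch of the two read styles. The class and method names are hypothetical and are not taken from FileUtils; in real use the InputStream would typically be a FileInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadStyles {
    // Chunked read: the caller's byte[] already acts as the buffer, so
    // wrapping the stream in a BufferedInputStream would add little.
    static byte[] readChunked(InputStream in, int bufSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Single-byte read: without the BufferedInputStream wrapper, each
    // read() call would go straight to the underlying stream.
    static byte[] readByteAtATime(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream buffered = new BufferedInputStream(in)) {
            int b;
            while ((b = buffered.read()) != -1) {
                out.write(b);
            }
        }
        return out.toByteArray();
    }
}
```

Both methods return the same bytes; the difference is only in who manages the buffer.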

If Xerces didn't perform well when passed a FileInputStream, I'd say
that would be a bug in Xerces for sure.  It would be terrible to force
your users to create a BufferedInputStream every time they wanted to
parse something at a reasonable speed.

Tweaking buffer sizes could help, I guess; feel free to run a test.  My
gut says the Xerces default will perform just fine, or somebody would
have changed it by now.

-Adam

On 2/12/07, Marshall Schor <[EMAIL PROTECTED]> wrote:
Adam Lally wrote:
> I doubt it.  Is there something that led you to believe this would be
> necessary?

Just doing some code inspection and seeing this - that it is perfectly
feasible to pass a buffered version of the input to this, and that the
general contract for IO seems to imply that you should use buffering
for performance considerations.  But I see from some web surfing that
the Xerces impl does some buffering, and you can set the buffer size
via a property (do we do that?  default = 2k I think, and the Apache
license is about 1K by itself :-) ).

I guess some simple test would tell...

Some web surfing turned up:

Parsers like Apache Xerces have the ability to set the input buffer size:

  // Set the chunk to read in by SAX
  parser.setProperty("http://apache.org/xml/properties/input-buffer-size",
      new Integer(2048));

See also http://xerces.apache.org/xerces2-j/properties.html
which gives some advice on how large to set this.
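
For anyone who wants to try this, a slightly fuller sketch of setting the property on a SAX XMLReader might look like the following. The class and method names are hypothetical. Whether the property is recognized depends on the parser implementation actually in use, so the sketch treats a rejected property as non-fatal and parses with the default buffer size in that case.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class InputBufferSizeSketch {
    // Property name as documented on the Xerces2-J properties page.
    static final String INPUT_BUFFER_SIZE =
        "http://apache.org/xml/properties/input-buffer-size";

    // Returns true if the parser accepted the property, false if it
    // did not recognize or support it.
    static boolean trySetBufferSize(XMLReader reader, int size) {
        try {
            reader.setProperty(INPUT_BUFFER_SIZE, Integer.valueOf(size));
            return true;
        } catch (SAXException e) {
            // SAXNotRecognizedException or SAXNotSupportedException
            return false;
        }
    }

    // Parses the given XML text and counts start-element events.
    static int countElements(String xml, int bufferSize) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        trySetBufferSize(reader, bufferSize);
        final int[] count = new int[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }
}
```

Timing countElements over a large document with a few different buffer sizes would be one simple way to run the test mentioned above.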


-Marshall

