Adam Lally wrote:
I think this is fine.

Here's my thinking on the pros/cons:

Heap used:

The overall heap consumed by both methods, for a file N bytes long, ignoring the heap consumed by the result, which would be the same in both cases:

Current: 10,000 chars for buf + approx. N to N/2 chars (depending on encoding) in the string buffer + possibly a lot of garbage as the string buffer is repeatedly expanded (estimated at approx. N to N/2 chars). One way to reduce this is to get the file length in bytes and preallocate the string buffer to, say, N/2.
Previous:  N chars in buf
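
For illustration, the "current" approach with the preallocation suggested above might look roughly like the sketch below. The class and method names are hypothetical and not taken from the actual code under discussion; only the 10,000-char buf and the N/2 initial capacity come from the description above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class ChunkedFileReader {
    // Hypothetical helper: reads a whole file into a String using a
    // fixed 10,000-char chunk buffer and a preallocated StringBuffer.
    public static String readFile(File file, String encoding) throws IOException {
        // N is the file length in bytes; preallocate about N/2 chars so that
        // the StringBuffer needs at most one expansion for single-byte encodings.
        long n = file.length();
        int initialCapacity = (int) Math.min(Integer.MAX_VALUE - 8, Math.max(16, n / 2));
        StringBuffer sb = new StringBuffer(initialCapacity);

        char[] buf = new char[10000];
        Reader reader = new InputStreamReader(new FileInputStream(file), encoding);
        try {
            int read;
            // Append only the chars actually read in each pass.
            while ((read = reader.read(buf)) != -1) {
                sb.append(buf, 0, read);
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }
}
```

Preallocating N/2 rather than N is a compromise: it is exact for a two-bytes-per-char encoding and costs one doubling for a single-byte one.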


IIRC the StringBuffer "cheats" and doesn't reallocate the memory
again when you call toString() on it (an advantage of being in the
java.lang package, I guess; user code can't do that)... unless you
subsequently append more to the buffer.  If true, then the "previous"
approach has an additional N to N/2 chars in the String itself, which
the current approach does not have.

So for large files, the previous approach could be wasteful by overallocating buf when a multi-byte character encoding is used, and the current approach is wasteful in that the StringBuffer is reallocated repeatedly.


But what about a file that was, say, 100 MB, regardless of character
encoding?  Surely it is wasteful to allocate a 100 million character
array as temporary storage and then also allocate about that much (or
half that much) again for the String itself.
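
To put rough numbers on the 100 MB case, here is a back-of-the-envelope sketch in Java. Everything in it is an assumption made for illustration: the class and method names are hypothetical, charsPerByte is a simplification (1.0 for a single-byte encoding, 0.5 for two bytes per char), garbage from StringBuffer growth is left out, and the toStringShares flag models whether toString() reuses the buffer's char array.

```java
public class HeapEstimate {
    // "Previous" approach: a temporary char[N] plus the resulting String.
    public static long previousPeakChars(long nBytes, double charsPerByte) {
        long tempBuf = nBytes;                         // one char slot per file byte
        long result = (long) (nBytes * charsPerByte);  // chars in the final String
        return tempBuf + result;
    }

    // "Current" approach: a 10,000-char chunk buffer plus the StringBuffer,
    // plus a separate String only if toString() has to copy.
    public static long currentPeakChars(long nBytes, double charsPerByte,
                                        boolean toStringShares) {
        long chunkBuf = 10000;
        long stringBuffer = (long) (nBytes * charsPerByte);
        long result = toStringShares ? 0 : stringBuffer;
        return chunkBuf + stringBuffer + result;
    }

    public static void main(String[] args) {
        long n = 100000000L; // 100 MB file
        System.out.println(previousPeakChars(n, 1.0));
        System.out.println(currentPeakChars(n, 1.0, true));
    }
}
```

Under these assumptions, with a single-byte encoding the previous approach peaks at about 200 million chars and the current one at about 100 million when toString() shares the array.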

I agree with you if, as you say, the StringBuffer "cheats". I had presumed, perhaps incorrectly, that it made a copy of the underlying char array. The Javadocs imply that it can share the array instead:

Implementation advice: This method can be coded so as to create a new |String| object without allocating new memory to hold a copy of the character sequence. Instead, the string can share the memory used by the string buffer. Any subsequent operation that alters the content or capacity of the string buffer must then make a copy of the internal buffer at that time. This strategy is effective for reducing the amount of memory allocated by a string concatenation operation when it is implemented using a string buffer.
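
The visible contract behind that advice can be checked with a small sketch like the one below (the class and method names are hypothetical). It only shows that a String obtained from toString() is unaffected by later appends; whether the char array is shared or copied underneath depends on the JDK implementation and cannot be observed this way.

```java
public class ToStringSnapshot {
    // Returns the String taken before the append and the buffer contents after it.
    public static String[] demo() {
        StringBuffer sb = new StringBuffer(32);
        sb.append("file contents");

        // An implementation may share the internal char array with this String.
        String snapshot = sb.toString();

        // Altering the buffer afterwards must not change the String, so a
        // sharing implementation has to copy the array at this point.
        sb.append(" plus more");

        return new String[] { snapshot, sb.toString() };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0]); // prints "file contents"
        System.out.println(r[1]); // prints "file contents plus more"
    }
}
```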

Thanks for pointing that out!  -Marshall
