Hmm,
I thought some more about this, and I'm not so sure anymore.
Let's look again at the code:
    int index = offset & CHUNK_MASK;
    byte[] data = fData;
    boolean skiplf = false;
    while (offset < endOffset) {
        int b0 = data[index++] & 0xff;
        offset++;
        if (index == CHUNK_SIZE && offset < endOffset) {
            dataChunk = dataChunk.fNextChunk;
            data = dataChunk.fData;
            index = 0;
        }
        <process-byte 0>
        int b1 = data[index++] & 0xff;
        offset++;
        if (index == CHUNK_SIZE && offset < endOffset) {
            dataChunk = dataChunk.fNextChunk;
            data = dataChunk.fData;
            index = 0;
        }
        <process-byte 1>
        int b2 = data[index++] & 0xff;
        offset++;
        if (index == CHUNK_SIZE && offset < endOffset) {
            dataChunk = dataChunk.fNextChunk;
            data = dataChunk.fData;
            index = 0;
        }
        <process-byte 2>
        int b3 = data[index++] & 0xff;
        offset++;
        if (index == CHUNK_SIZE && offset < endOffset) {
            dataChunk = dataChunk.fNextChunk;
            data = dataChunk.fData;
            index = 0;
        }
        <process-byte 3>
    }
The initial value of index is guaranteed to be in range, as it can only
be <= CHUNK_MASK == (CHUNK_SIZE - 1). So the first time through the loop
we're fine. After that, every time we increment index, we do:
    if (index == CHUNK_SIZE && offset < endOffset) {
        dataChunk = dataChunk.fNextChunk;
        data = dataChunk.fData;
        index = 0;
    }
See, that is no longer obvious. If the test had been:
    if (index == CHUNK_SIZE) { ...
I would have understood, as it obviously guarantees that index
is never equal to CHUNK_SIZE when we read the next byte.
However, the way it's currently written, it makes me wonder: what if index
_is_ equal to CHUNK_SIZE, and offset is not less than endOffset? When can
this happen? Well, it can happen if we have a Unicode char encoded in more
than one byte (2, 3, 4), the encoding spans two chunks, and the end offset
falls within the encoding.
That is, say we have a two-byte encoded char (say 0x88, 0x40). The first
byte (0x88) is in the last byte (16383) of one chunk (chunk0), and the
second byte (0x40) is in the first byte (0) of the next chunk (chunk1).
Moreover, the end offset ends up being right on the _first_ byte (0x88).
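As you can see, after reading b0 the index is still equal to CHUNK_SIZE
(the chunk switch is skipped because offset is no longer < endOffset), so
we will get an exception (an ArrayIndexOutOfBoundsException, if I read it
right) when we read b1. What protects us from that happening?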
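
To make the scenario concrete, here is a minimal sketch of it. It is not
the real scanner: the class name, the Chunk stand-in, the tiny CHUNK_SIZE
of 4 and the "high bit set means a second byte follows" rule are all my
assumptions for the demo. I also assume endOffset is exclusive, so "right
on the first byte" means it points just past 0x88.

```java
public class ChunkBoundaryDemo {
    // Tiny chunk size so the boundary is easy to hit (assumption for the demo).
    static final int CHUNK_SIZE = 4;
    static final int CHUNK_MASK = CHUNK_SIZE - 1;

    // Hypothetical stand-in for the chunk class: only the two fields used above.
    static final class Chunk {
        final byte[] fData = new byte[CHUNK_SIZE];
        Chunk fNextChunk;
    }

    // Builds chunk0 -> chunk1 with a two-byte char split across the boundary.
    static Chunk pair() {
        Chunk chunk0 = new Chunk();
        Chunk chunk1 = new Chunk();
        chunk0.fNextChunk = chunk1;
        chunk0.fData[CHUNK_SIZE - 1] = (byte) 0x88; // first byte, last slot of chunk0
        chunk1.fData[0] = (byte) 0x40;              // second byte, first slot of chunk1
        return chunk0;
    }

    // Same guarded chunk advance as in the loop above, cut down to two bytes.
    // Returns the number of bytes read.
    static int scan(Chunk dataChunk, int offset, int endOffset) {
        int index = offset & CHUNK_MASK;
        byte[] data = dataChunk.fData;
        int read = 0;
        while (offset < endOffset) {
            int b0 = data[index++] & 0xff;
            offset++;
            read++;
            if (index == CHUNK_SIZE && offset < endOffset) {
                dataChunk = dataChunk.fNextChunk;
                data = dataChunk.fData;
                index = 0;
            }
            if (b0 < 0x80) {
                continue; // single-byte char (demo rule, not the real decoder)
            }
            // If the advance above was skipped, index is still CHUNK_SIZE here.
            int b1 = data[index++] & 0xff;
            offset++;
            read++;
            if (index == CHUNK_SIZE && offset < endOffset) {
                dataChunk = dataChunk.fNextChunk;
                data = dataChunk.fData;
                index = 0;
            }
        }
        return read;
    }

    // True if scanning fails when endOffset lands just past the first byte.
    static boolean failsAtBoundary() {
        try {
            scan(pair(), CHUNK_SIZE - 1, CHUNK_SIZE);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails at boundary: " + failsAtBoundary());
        System.out.println("bytes read with endOffset one further: "
                + scan(pair(), CHUNK_SIZE - 1, CHUNK_SIZE + 1));
    }
}
```

With endOffset one byte further along, the same input scans cleanly, so in
this sketch the failure depends only on where endOffset falls relative to
the chunk boundary.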
--
Dimi.