>However, the way it's currently written, it makes me wonder: what if index
>_is_ equal to CHUNK_SIZE, and offset is not less than endOffset? When can
>this happen? Well, it can happen if we have a Unicode char encoded in more
>than one byte (2, 3, 4), and the encoding spans across two chunks, and the
>end offset is within the encoding.

Here is the beginning of toString() again:

    public String toString(int offset, int length) {

        synchronized (fgTempBufferLock) {
            int outOffset = 0;
            UTF8DataChunk dataChunk = this;
            int endOffset = offset + length;

Keep in mind that when the toString() method is called it is reading the
UTF-8 bytestream for the SECOND time.  The UTF8Reader has already parsed
the bytestream to find the offset and length of each "string" that the
StringPool provides a handle for.  Therefore, the endOffset will never
be in the middle of a multibyte character.

However, let us assume that this were not true, or that there is a bug
elsewhere that breaks the invariant that endOffset not land in the middle
of an encoding.

Let's look at the code again:

147:            while (offset < endOffset) {
148:                int b0 = data[index++] & 0xff;

Your exception is thrown on the "b0 byte fetch".  Even if it were the
case that offset is not less than endOffset and one of the

                if (index == CHUNK_SIZE && offset < endOffset) {

tests was false, it would have been one of the other byte fetches (b1,
b2 or b3) that got the ArrayIndexOutOfBoundsException.  How can offset be
not less than endOffset and still get past the "while (offset < endOffset)"
test?  If the code were executing correctly it could not happen.  An
exception at line 148 requires the test on line 147 to be true.  Since
offset does not change between the most recent chunk test and the next
evaluation of line 147, the "offset < endOffset" half of that chunk test
must have been true as well, and index would already have been reset to 0.

The only reason for the && offset < endOffset part of the test is to
handle the case when a string ends on the last byte of a chunk.  In this
case the fNextChunk pointer is null and we do not want to dereference
the null pointer to move to the next chunk when there is no data to
fetch from that chunk.  The code could be:

                if (index == CHUNK_SIZE) {
                    if (offset < endOffset) {
                        dataChunk = dataChunk.fNextChunk;
                        data = dataChunk.fData;
                    }
                    index = 0;
                }

But this rewrite should not change the behavior you are seeing.  (Of
course, it is always possible that it could, just as any other purely
cosmetic reordering of code can cause a JIT bug to disappear...)
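
To make the null-pointer case concrete, here is a minimal sketch of the
nested form of the test.  It is not the Xerces code: the Chunk class, the
copyBytes method and the tiny CHUNK_SIZE of 4 are assumptions made only
for illustration, and the loop copies raw bytes instead of decoding UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class ChunkWalkSketch {
    // Tiny chunk size for illustration only; must be a power of two.
    static final int CHUNK_SIZE = 4;
    static final int CHUNK_MASK = CHUNK_SIZE - 1;

    // Hypothetical stand-in for UTF8DataChunk: a byte array plus a link.
    static final class Chunk {
        final byte[] data = new byte[CHUNK_SIZE];
        Chunk next; // null on the last chunk
    }

    // Copies bytes [offset, offset + length), where 'first' is assumed to
    // be the chunk that contains 'offset'.
    static byte[] copyBytes(Chunk first, int offset, int length) {
        byte[] out = new byte[length];
        int outOffset = 0;
        int endOffset = offset + length;
        Chunk chunk = first;
        byte[] data = chunk.data;
        int index = offset & CHUNK_MASK;
        while (offset < endOffset) {
            out[outOffset++] = data[index++];
            offset++;
            if (index == CHUNK_SIZE) {
                if (offset < endOffset) {
                    // More bytes are wanted, so a next chunk must exist.
                    chunk = chunk.next;
                    data = chunk.data;
                }
                // Reset even when we do not advance; the loop is ending.
                index = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Chunk only = new Chunk();
        only.data[2] = 'h';
        only.data[3] = 'i';
        // The string ends on the last byte of the last chunk; next is null
        // and is never dereferenced.
        byte[] bytes = copyBytes(only, 2, 2);
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // hi
    }
}
```

Without the inner "offset < endOffset" test, the call in main would try
to read only.next.data and fail with a NullPointerException.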

I would guess that I have needed to rewrite one part of Xerces or another
at least a dozen times in the last few years just to work around assorted
JIT bugs, so very little surprises me at this point.  I guess the fates
needed to give me something to do after I was no longer spending my time
chasing down C++ optimizer bugs!!  :-)

Regards,
Glenn



<[EMAIL PROTECTED]> on 07/23/2001 03:12:20 PM

Please respond to [EMAIL PROTECTED]

To:   [EMAIL PROTECTED], Glenn Marcy/Cupertino/IBM@IBMUS
cc:
Subject:  Re: Race condition in org.apache.xerces.utils.UTF8DataChunk



Hmm,

I thought some more about this, and I'm not so sure anymore.
Let's look again at the code:


            int index = offset & CHUNK_MASK;
            byte[] data = fData;
            boolean skiplf = false;
            while (offset < endOffset) {
                int b0 = data[index++] & 0xff;
                offset++;
                if (index == CHUNK_SIZE && offset < endOffset) {
                    dataChunk = dataChunk.fNextChunk;
                    data = dataChunk.fData;
                    index = 0;
                }
          <process-byte 0>
                int b1 = data[index++] & 0xff;
                offset++;
                if (index == CHUNK_SIZE && offset < endOffset) {
                    dataChunk = dataChunk.fNextChunk;
                    data = dataChunk.fData;
                    index = 0;
                }
          <process-byte 1>
                int b2 = data[index++] & 0xff;
                offset++;
                if (index == CHUNK_SIZE && offset < endOffset) {
                    dataChunk = dataChunk.fNextChunk;
                    data = dataChunk.fData;
                    index = 0;
                }
          <process-byte 2>
                int b3 = data[index++] & 0xff;
                offset++;
                if (index == CHUNK_SIZE && offset < endOffset) {
                    dataChunk = dataChunk.fNextChunk;
                    data = dataChunk.fData;
                    index = 0;
                }
          <process-byte 3>
            }

Now, the initial value of index is guaranteed to be correct, as it can only
be <= CHUNK_MASK == (CHUNK_SIZE - 1). So the first time through the loop
we're fine. Now, every time we increment index, we do:
                if (index == CHUNK_SIZE && offset < endOffset) {
                    dataChunk = dataChunk.fNextChunk;
                    data = dataChunk.fData;
                    index = 0;
                }
See, that is no longer obvious. If the test had been:
                if (index == CHUNK_SIZE) { ...
I would have understood, as it obviously guarantees that index
is never equal to CHUNK_SIZE.
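
The claim about the initial value of index can be checked with a small
sketch.  The CHUNK_SIZE of 16384 is an assumption inferred from the byte
position 16383 mentioned in this thread, and indexInChunk is a
hypothetical helper, not a Xerces method.

```java
public class ChunkIndexSketch {
    // Assumed value: the thread cites 16383 as the last byte position in
    // a chunk, which implies a chunk size of 16384 (2^14).
    static final int CHUNK_SIZE = 16384;
    static final int CHUNK_MASK = CHUNK_SIZE - 1;

    // Position of an absolute offset inside its chunk.  The mask keeps
    // only the low 14 bits, so the result is always 0..CHUNK_MASK.
    static int indexInChunk(int offset) {
        return offset & CHUNK_MASK;
    }

    public static void main(String[] args) {
        System.out.println(indexInChunk(16383)); // 16383, last slot of a chunk
        System.out.println(indexInChunk(16384)); // 0, first slot of the next chunk
        System.out.println(indexInChunk(40000)); // 7232
    }
}
```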

However, the way it's currently written, it makes me wonder: what if index
_is_ equal to CHUNK_SIZE, and offset is not less than endOffset? When can
this happen? Well, it can happen if we have a Unicode char encoded in more
than one byte (2, 3, 4), and the encoding spans across two chunks, and the
end offset is within the encoding.

That is, say we have a two-byte encoded char (say 0xC3, 0xA9). The first
byte (0xC3) is in the last byte (16383) of one chunk (chunk0), and the
second byte (0xA9) is in the first byte (0) of the next chunk (chunk1).
Moreover, the end offset falls right after the _first_ byte (0xC3), so
after reading it offset is no longer less than endOffset and index stays
equal to CHUNK_SIZE. As you can see, we will get an exception when we
read b1. What protects us from that happening?
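
The scenario can be reproduced in a reduced sketch.  Everything here is
an assumption made for illustration: the Chunk class, the decode and
sample methods, a CHUNK_SIZE of 4 instead of the real one, a decoder that
handles only one- and two-byte sequences, and the bytes 0xC3 0xA9
(U+00E9) as the split character.  Only the chunk test is copied from the
code quoted above.

```java
public class SplitCharSketch {
    static final int CHUNK_SIZE = 4; // tiny, for illustration only
    static final int CHUNK_MASK = CHUNK_SIZE - 1;

    // Hypothetical stand-in for UTF8DataChunk.
    static final class Chunk {
        final byte[] data = new byte[CHUNK_SIZE];
        Chunk next;
    }

    // Decodes one- and two-byte UTF-8 sequences only, using the same
    // "index == CHUNK_SIZE && offset < endOffset" test as the real code.
    static String decode(Chunk first, int offset, int length) {
        StringBuilder out = new StringBuilder();
        int endOffset = offset + length;
        Chunk chunk = first;
        byte[] data = chunk.data;
        int index = offset & CHUNK_MASK;
        while (offset < endOffset) {
            int b0 = data[index++] & 0xff;
            offset++;
            if (index == CHUNK_SIZE && offset < endOffset) {
                chunk = chunk.next;
                data = chunk.data;
                index = 0;
            }
            if (b0 < 0x80) { // single-byte (ASCII) character
                out.append((char) b0);
                continue;
            }
            // Two-byte sequence assumed.  If the test above was skipped
            // because offset == endOffset, index is still CHUNK_SIZE here.
            int b1 = data[index++] & 0xff;
            offset++;
            if (index == CHUNK_SIZE && offset < endOffset) {
                chunk = chunk.next;
                data = chunk.data;
                index = 0;
            }
            out.append((char) (((b0 & 0x1f) << 6) | (b1 & 0x3f)));
        }
        return out.toString();
    }

    // Builds "abc" plus a two-byte character split across two chunks.
    static Chunk sample() {
        Chunk chunk0 = new Chunk();
        Chunk chunk1 = new Chunk();
        chunk0.next = chunk1;
        chunk0.data[0] = 'a';
        chunk0.data[1] = 'b';
        chunk0.data[2] = 'c';
        chunk0.data[3] = (byte) 0xC3; // lead byte, last slot of chunk0
        chunk1.data[0] = (byte) 0xA9; // continuation byte, first slot of chunk1
        return chunk0;
    }

    public static void main(String[] args) {
        // endOffset covers the whole character: decoding succeeds.
        System.out.println(decode(sample(), 0, 5).length()); // 4
        try {
            // endOffset falls inside the two-byte encoding.
            decode(sample(), 0, 4);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("b1 fetch failed");
        }
    }
}
```

In this sketch the failure is on the b1 fetch, never on b0: the while
test stops the loop before b0 can be read with index equal to CHUNK_SIZE.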

--
Dimi.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


