There are two bugs in the skippedString function of UTF8Reader. Both appear when the string being skipped is split across two chunks.
- At line 725, index should be reset to fCurrentIndex; otherwise the test for the part of the string in the first chunk is never performed.
- At line 749, fCurrentIndex is incremented by the full length of the skipped string, whereas it should be incremented only by the number of characters in the second chunk.

Vassili Dzuba, TetraSys

Index: UTF8Reader.java
===================================================================
RCS file: /home/cvspublic/xml-xerces/java/src/org/apache/xerces/readers/UTF8Reader.java,v
retrieving revision 1.3
diff -r1.3 UTF8Reader.java
724a725
> index = fCurrentIndex;
728a730
> int nbBytesInFirstChunk = i;
744a747
> fCurrentIndex -= nbBytesInFirstChunk;
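
For readers without the Xerces source at hand, the index arithmetic behind the two fixes can be illustrated with a small standalone sketch. This is not the UTF8Reader code: the class ChunkedSkipSketch, its fields chunks, currentChunk and currentIndex, and the helper ascii are hypothetical stand-ins, and the sketch assumes the skipped string spans at most two chunks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of a chunked reader; not the Xerces implementation.
class ChunkedSkipSketch {
    final byte[][] chunks;   // input split into chunks (assumption for illustration)
    int currentChunk = 0;    // chunk currently being read
    int currentIndex = 0;    // offset inside that chunk, playing the role of fCurrentIndex

    ChunkedSkipSketch(byte[][] chunks) { this.chunks = chunks; }

    static byte[] ascii(String s) { return s.getBytes(StandardCharsets.US_ASCII); }

    // Returns true and advances past s if the input at the current position starts with s.
    // On a mismatch the position is left unchanged.
    boolean skippedString(byte[] s) {
        byte[] first = chunks[currentChunk];
        int index = currentIndex;   // first fix: start comparing at the current offset
        int i = 0;
        while (i < s.length && index < first.length) {
            if (first[index++] != s[i++]) return false;
        }
        if (i == s.length) {        // whole string was inside the first chunk
            currentIndex += s.length;
            return true;
        }
        int nbBytesInFirstChunk = i;                 // bytes matched before the boundary
        if (currentChunk + 1 >= chunks.length) return false;
        byte[] second = chunks[currentChunk + 1];
        index = 0;
        while (i < s.length) {
            if (index >= second.length || second[index++] != s[i++]) return false;
        }
        currentChunk++;
        // second fix: only the bytes that fell in the second chunk count towards the offset
        currentIndex = s.length - nbBytesInFirstChunk;
        return true;
    }
}
```

With the chunks "<?xm" and "l version", skipping "<?xml" matches four bytes in the first chunk and one in the second, so the reader ends up at offset 1 of the second chunk rather than offset 5.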