I have a problem getting the position of an XML tag. The position
reported by the Locator is wrong if there are German umlauts between my tags:
the value I get is the real position minus the number of German umlauts.
If I put "\r\n" after each tag, the Locator returns the right
position.
I noticed this behavior in version 2.0.15 of the XML4J package, and it is
also present in the new Xerces version (3.0.1).
I attach a Java class that reproduces the behavior.
After looking at the Xerces code (3.0.1), I found the point where this
behavior is produced: the method "skipMultiByteCharData" in the class
"org.apache.xerces.readers.AbstractCharReader". If I extend the method by
two lines that increment fCharacterCounter, it works fine. Please have a
look at the modified code.
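To make the symptom easier to picture, here is a minimal sketch of the kind of counting I mean. It is a simplified, hypothetical model and not the Xerces source: the class name ColumnCounterModel, the method columnAfter, and the 0x80 threshold for the "multi-byte" path are all assumptions made for illustration.

```java
// Simplified, hypothetical model of a column counter. NOT the Xerces source.
final class ColumnCounterModel {

    /**
     * Returns the 1-based column after consuming data[0..endIndex).
     * When countMultiByte is false, characters >= 0x80 are consumed without
     * incrementing the counter (assumed to mimic the reported behavior).
     */
    static int columnAfter(String data, int endIndex, boolean countMultiByte) {
        int column = 1; // columns are 1-based
        for (int i = 0; i < endIndex; i++) {
            boolean ascii = data.charAt(i) < 0x80; // assumed fast-path threshold
            if (ascii || countMultiByte) {
                column++;
            }
        }
        return column;
    }

    public static void main(String[] args) {
        // Three umlauts between the tags, written as Unicode escapes.
        String data = "<A>\u00e4\u00f6\u00fc.</A>";
        int end = data.length();
        System.out.println("all characters counted: " + columnAfter(data, end, true));
        System.out.println("umlauts not counted:    " + columnAfter(data, end, false));
    }
}
```

In this model the second value is smaller than the first by exactly the number of umlauts, which matches what I see from the Locator.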
Does anyone know whether this is a bug or an incorrect use of the SAXParser?
many thanks,
joerg flammiger
_______________________________________________________________
INTERSHOP. Creating the Digital Economy(tm)
Joerg Flammiger
Software Engineer, Cartridge Division, R&D
INTERSHOP Communications GmbH, Leutragraben 2-4, D-07743 Jena, Germany
Phone +49 (0) 3641-894 323, Fax +49 (0) 3641-894 111, www.intershop.de
Modified code:
private boolean skipMultiByteCharData(int ch) throws Exception {
    if (ch < 0xD800) {
        /* ---------- */
        /* modified -start--*/
        /* ---------- */
        fCharacterCounter++;
        /* ---------- */
        /* modified -end--*/
        /* ---------- */
        loadNextChar();
        return true;
    }
    if (ch > 0xFFFD)
        return false;
    if (ch >= 0xDC00 && ch < 0xE000)
        return false;
    if (ch >= 0xD800 && ch < 0xDC00) {
        CharDataChunk savedChunk = fCurrentChunk;
        int savedIndex = fCurrentIndex;
        int savedOffset = fCurrentOffset;
        ch = loadNextChar();
        if (ch < 0xDC00 || ch >= 0xE000) {
            fCurrentChunk = savedChunk;
            fCurrentIndex = savedIndex;
            fCurrentOffset = savedOffset;
            fMostRecentData = savedChunk.toCharArray();
            fMostRecentChar = fMostRecentData[savedIndex] & 0xFFFF;
            return false;
        }
    }
    /* ---------- */
    /* modified -start--*/
    /* ---------- */
    fCharacterCounter++;
    /* ---------- */
    /* modified -end--*/
    /* ---------- */
    loadNextChar();
    return true;
}
example tester:
import java.io.StringReader;
import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.*;

/**
 * Compares the true column at the end of </PRODUCT> with the column
 * reported by the SAX Locator.
 */
public class ApacheLocatorTester
{
    private String xmlData;

    ApacheLocatorTester()
    {
        // The ATTR_VALUE content is three German umlauts followed by a dot.
        // The umlauts are written as Unicode escapes; a-, o- and u-umlaut
        // stand in for the original characters, which were lost in transit.
        xmlData = "<?xml version='1.0' encoding='ISO-8859-1'?>"
            + "<IS_DATA><PRODUCT><ATTRIBUTES_BLOCK><PRODUCT_ATTRIBUTE>"
            + "<ATTR_NAME>LongDescription</ATTR_NAME>"
            + "<ATTR_VALUE>\u00E4\u00F6\u00FC.</ATTR_VALUE>"
            + "<ATTR_TYPE>text</ATTR_TYPE>"
            + "</PRODUCT_ATTRIBUTE></ATTRIBUTES_BLOCK></PRODUCT></IS_DATA>";
    }

    public static void main(String args[])
    {
        ApacheLocatorTester mTester = new ApacheLocatorTester();
        mTester.parse();
    }

    private void parse()
    {
        // "</PRODUCT>" is 10 characters long, so +11 gives the 1-based
        // column just past the end tag.
        System.out.println("True position when finishing PRODUCT:");
        System.out.println("ColumnNumber=" + (xmlData.indexOf("</PRODUCT>") + 11));
        SAXParser parser = new SAXParser();
        parser.setContentHandler(new TesterContentHandler());
        try
        {
            parser.parse(new InputSource(new StringReader(xmlData)));
        }
        catch (Exception ex)
        {
            System.out.println(ex.getMessage());
            ex.printStackTrace();
            return;
        }
    }

    class TesterContentHandler extends org.xml.sax.helpers.DefaultHandler
    {
        Locator mLocator;

        public void setDocumentLocator(Locator locator)
        {
            mLocator = locator;
            super.setDocumentLocator(locator);
        }

        public void endElement(String uri, String localName, String rawName)
        {
            if (localName.equals("PRODUCT"))
            {
                System.out.println("Position given by the locator at the end of the tag PRODUCT:");
                System.out.println("LineNumber=" + mLocator.getLineNumber());
                System.out.println("ColumnNumber=" + mLocator.getColumnNumber());
            }
        }
    }
}
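
For anyone who wants to run the same check against whatever parser is on their classpath, here is a sketch of the tester written against the standard JAXP entry point (javax.xml.parsers.SAXParserFactory) instead of the Xerces class directly. The class name JaxpLocatorCheck and the helper columnAtEndOf are my own names, and the shortened test document is only an assumption of what is enough to show the effect.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical JAXP variant of the tester; names are illustrative only.
final class JaxpLocatorCheck {

    /** Returns the locator column reported at the end tag of elementName, or -1. */
    static int columnAtEndOf(String xml, final String elementName) throws Exception {
        final int[] column = { -1 };
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // qName is compared because the factory is not namespace-aware by default.
                if (qName.equals(elementName) && locator != null) {
                    column[0] = locator.getColumnNumber();
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return column[0];
    }

    public static void main(String[] args) throws Exception {
        // Shortened document with three umlauts (Unicode escapes) before </PRODUCT>.
        String xml = "<?xml version='1.0'?><IS_DATA><PRODUCT><ATTR_VALUE>"
                + "\u00e4\u00f6\u00fc.</ATTR_VALUE></PRODUCT></IS_DATA>";
        int expected = xml.indexOf("</PRODUCT>") + 11; // 1-based column past the end tag
        System.out.println("expected=" + expected
                + " locator=" + columnAtEndOf(xml, "PRODUCT"));
    }
}
```

If the two printed numbers differ by the number of umlauts, the parser in use shows the same behavior described above.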