I have a problem getting the position of an XML tag. The position
given by the locator is wrong if there are German umlauts between my tags:
the result I get is the real position minus the number of German umlauts.
If I put "\r\n" after each tag, the locator returns the correct
position.
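
To make the arithmetic concrete, here is a small sketch that computes the true column and the column I would expect the locator to report according to the description above. The class name, the method names and the short sample document are made up for illustration, and treating every character above 0x7F like an umlaut is my assumption; the sketch does not call the parser at all.

```java
public class LocatorOffsetEstimate {

    /** 1-based column just past the first occurrence of tag (document is one line). */
    public static int trueColumn(String xml, String tag) {
        int start = xml.indexOf(tag);
        if (start < 0) {
            throw new IllegalArgumentException("tag not found: " + tag);
        }
        return start + tag.length() + 1;
    }

    /** Counts characters above 0x7F (assumed to behave like umlauts) up to the end of tag. */
    public static int nonAsciiBefore(String xml, String tag) {
        int start = xml.indexOf(tag);
        if (start < 0) {
            throw new IllegalArgumentException("tag not found: " + tag);
        }
        int end = start + tag.length();
        int count = 0;
        for (int i = 0; i < end; i++) {
            if (xml.charAt(i) > 0x7F) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample: three umlauts between the tags, no line breaks.
        String xml = "<A><B>\u00E4\u00F6\u00FC.</B></A>";
        int expected = trueColumn(xml, "</B>");
        int umlauts = nonAsciiBefore(xml, "</B>");
        System.out.println("true column             = " + expected);
        System.out.println("umlauts before position = " + umlauts);
        System.out.println("column per description  = " + (expected - umlauts));
    }
}
```

For this sample the true column is 15 and there are 3 umlauts, so the description above predicts that the locator reports 12.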

I see this behavior in version 2.0.15 of the XML4J package, and it is
also present in the new Xerces version (3.0.1).

I attach a Java class that reproduces the behavior.

After looking at the Xerces code (3.0.1), I found the place where this
behavior is produced: the method "skipMultiByteCharData" in the class
"org.apache.xerces.readers.AbstractCharReader". If I extend the method by
two lines that increment fCharacterCounter, it works fine. Please have a
look at the modified code.

Does anyone know whether this is a bug or whether I am using the SAXParser
incorrectly?

many thanks,
joerg flammiger

_________________________________________________________________
INTERSHOP. Creating the Digital Economy(tm)
Joerg Flammiger
Software Engineer, Cartridge Division, R&D
INTERSHOP Communications GmbH, Leutragraben 2-4, D-07743 Jena, Germany
Phone +49 (0) 3641-894 323, Fax +49 (0) 3641-894 111, www.intershop.de



Modified code:
private boolean skipMultiByteCharData(int ch) throws Exception {
        if (ch < 0xD800) {
/* ---------- */
/* modified -start--*/
/* ---------- */
        fCharacterCounter++;
/* ---------- */
/* modified -end--*/
/* ---------- */
            loadNextChar();
            return true;
        }
        if (ch > 0xFFFD)
            return false;
        if (ch >= 0xDC00 && ch < 0xE000)
            return false;
        if (ch >= 0xD800 && ch < 0xDC00) {
            CharDataChunk savedChunk = fCurrentChunk;
            int savedIndex = fCurrentIndex;
            int savedOffset = fCurrentOffset;
            ch = loadNextChar();
            if (ch < 0xDC00 || ch >= 0xE000) {
                fCurrentChunk = savedChunk;
                fCurrentIndex = savedIndex;
                fCurrentOffset = savedOffset;
                fMostRecentData = savedChunk.toCharArray();
                fMostRecentChar = fMostRecentData[savedIndex] & 0xFFFF;
                return false;
            }
        }
/* ---------- */
/* modified -start--*/
/* ---------- */
        fCharacterCounter++;
/* ---------- */
/* modified -end--*/
/* ---------- */
        loadNextChar();
        return true;
    }
 

example tester:

import java.io.StringReader;

import org.apache.xerces.parsers.SAXParser;

import org.xml.sax.*;

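/**
 * Reproduces the wrong column number reported by the SAX locator when
 * German umlauts appear between the tags of a document without line breaks.
 */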

public class ApacheLocatorTester
{ 
    private String xmlData;
    
    ApacheLocatorTester()
    {
        // The whole document is one line; the ATTR_VALUE holds three German umlauts.
        xmlData = "<?xml version='1.0' encoding='ISO-8859-1'?>"
                + "<IS_DATA><PRODUCT><ATTRIBUTES_BLOCK><PRODUCT_ATTRIBUTE>"
                + "<ATTR_NAME>LongDescription</ATTR_NAME>"
                + "<ATTR_VALUE>\u00E4\u00F6\u00FC.</ATTR_VALUE>"
                + "<ATTR_TYPE>text</ATTR_TYPE>"
                + "</PRODUCT_ATTRIBUTE></ATTRIBUTES_BLOCK></PRODUCT></IS_DATA>";
    }
    
    public static void main(String args[])
    {   
        ApacheLocatorTester mTester=new ApacheLocatorTester();
        mTester.parse();
    }

    private void parse()
    {
        System.out.println("True position when finishing PRODUCT:");
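        // indexOf is 0-based and the tag is 10 characters long, so the
        // 1-based column just past "</PRODUCT>" is index + 11.
        System.out.println("ColumnNumber=" + (xmlData.indexOf("</PRODUCT>") + 11));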
        SAXParser parser=new SAXParser();
        parser.setContentHandler(new TesterContentHandler());
        try
        {
            parser.parse(new InputSource(new StringReader(xmlData)));
        }
        catch(Exception ex)
        {
            System.out.println(ex.getMessage());
            ex.printStackTrace();
            return;
        }
    }
    class TesterContentHandler extends org.xml.sax.helpers.DefaultHandler
    {
         Locator mLocator;
         
         public void setDocumentLocator(Locator locator) 
         {
            mLocator=locator;
                super.setDocumentLocator(locator);
         }
         public void endElement(String uri, String localName, String rawName)
         {
            if(localName.equals("PRODUCT"))
            {
                System.out.println("Position given by the locator at the end of the tag PRODUCT:");
                System.out.println("LineNumber="+mLocator.getLineNumber());
                System.out.println("ColumnNumber="+mLocator.getColumnNumber());
            }
         }
    }
}
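
For anyone who wants to run the same check without the org.apache.xerces.parsers.SAXParser class on the classpath, the following is a rough sketch that goes through the standard JAXP factory instead. The class name JaxpLocatorCheck, the method locateEnd and the short sample document are made up for illustration. Which parser the factory returns depends on the installation, so the reported column may or may not show the behavior described above.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpLocatorCheck {

    /** Returns {line, column} as reported by the locator at the end of the given element. */
    public static int[] locateEnd(String xml, final String element) throws Exception {
        final int[] pos = {-1, -1};
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace awareness is needed so that localName is filled in.
        factory.setNamespaceAware(true);
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator l) {
                locator = l;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (element.equals(localName) && locator != null) {
                    pos[0] = locator.getLineNumber();
                    pos[1] = locator.getColumnNumber();
                }
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return pos;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample: one line, three umlauts between the tags.
        String xml = "<A><B>\u00E4\u00F6\u00FC.</B></A>";
        int[] pos = locateEnd(xml, "B");
        System.out.println("True ColumnNumber=" + (xml.indexOf("</B>") + 5));
        System.out.println("LineNumber=" + pos[0]);
        System.out.println("ColumnNumber=" + pos[1]);
    }
}
```

Comparing the two ColumnNumber lines shows directly whether the parser in use undercounts the umlauts.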
