If you are using UTF-16 encoding, then <bar> will probably not start at byte 5. Do you want the character position or the byte position?
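To make the difference concrete, here is a rough sketch that computes the byte offset of <bar> in the '<foo><bar></bar></foo>' example under a few encodings. The class and method names (ByteOffsetDemo, byteOffset) are made up for this illustration and are not part of any parser API. It simply encodes the prefix of the string, which is good enough for this ASCII-only example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteOffsetDemo {

    // Hypothetical helper: number of bytes that precede the character at
    // charIndex when xml is encoded with cs. Assumes charIndex does not
    // fall in the middle of a surrogate pair.
    static int byteOffset(String xml, int charIndex, Charset cs) {
        return xml.substring(0, charIndex).getBytes(cs).length;
    }

    public static void main(String[] args) {
        String xml = "<foo><bar></bar></foo>";
        int charIndex = xml.indexOf("<bar>"); // character position 5

        // One byte per ASCII character.
        System.out.println("UTF-8:    " + byteOffset(xml, charIndex, StandardCharsets.UTF_8));
        // Two bytes per character, no byte order mark.
        System.out.println("UTF-16BE: " + byteOffset(xml, charIndex, StandardCharsets.UTF_16BE));
        // Java's "UTF-16" encoder writes a two-byte BOM before the text.
        System.out.println("UTF-16:   " + byteOffset(xml, charIndex, StandardCharsets.UTF_16));
    }
}
```

The character position stays at 5 in every case, while the byte position depends on both the encoding and the presence of a byte order mark.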
Also, if the stream starts with a Unicode byte order mark (BOM), the byte position shifts again. I think that was one reason why Xerces-J does not provide that kind of API.

/Suresh

>>-----Original Message-----
>>From: Xiaoming Liu [mailto:[EMAIL PROTECTED]
>>Sent: Tuesday, January 04, 2005 4:44 AM
>>To: [EMAIL PROTECTED]
>>Subject: read byte offset information during xml parsing
>>
>>
>>hi,
>>
>>I am looking for a Java XML parser which supports reading byte offset
>>information during xml parsing, e.g. in '<foo><bar></bar></foo>', the
>>parser can report '<bar>' starts from byte 5; and '</bar>' starts from
>>byte 10.
>>
>>I went through standard APIs like DOM, SAX, and XMLPull and cannot find
>>related APIs. In Sax, the nearest interface is org.xml.sax.Locator. I also
>>checked Xerces XNI and found the nearest class is
>>org.apache.xerces.xni.XMLLocator. In either class, only line number and
>>column number are reported.
>>
>>However, similar functions are provided in other languages, such as the
>>"XML_GetCurrentByteIndex" of expat parser (C, perl).
>>
>>so my question is whether there is a Java XML Parser reporting byte offset
>>information during parsing, and if not, is there any plan to
>>implement this feature?
>>
>>many thanks,
>>Xiaoming
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: [EMAIL PROTECTED]
>>For additional commands, e-mail: [EMAIL PROTECTED]
