Hi.
I previously wrote some serialization code that used the SAX API of
Xerces-C for reading, and a modified SAXPrint example for writing stuff
back out.
I've started blowing the dust off this thing. The last time I made sure it
compiled and ran with Xerces-C was with version 1.3.0, so now, with 1.5.0
and G++ 3.0, I decided to clean stuff up.
The problem is local codepage characters.
I have tried reading in a file that contains Norwegian-specific
characters (æøå), but I keep getting a segmentation fault in strlen,
deep within the SAX code:
<STACKTRACE>
#0 0x403e8361 in strlen () from /lib/libc.so.6
#1 0x0806fc91 in std::char_traits<char>::length(char const*) (__s=0x0)
at /usr/include/g++-v3/bits/char_traits.h:158
#2 0x08071906 in std::string::append(char const*) (this=0xbfffeed0,
__s=0x0)
at /usr/include/g++-v3/bits/basic_string.h:473
#3 0x0808cf21 in std::string::operator+=(char const*) (this=0xbfffeed0,
__s=0x0) at /usr/include/g++-v3/bits/basic_string.h:457
#4 0x08065286 in txos::SAXDeserializerContext::characters(unsigned
short const*, unsigned) (this=0xbfffee80, chars=0x80efe58, length=9)
at src/SAXDeserializerContext.cpp:377
#5 0x4011629b in SAXParser::docCharacters(unsigned short const*,
unsigned, bool) () from
/home/trustix/devel/kentda/devel/cpp/xml/xerces/lib/libxerces-c1_5.so
#6 0x40144147 in XMLScanner::sendCharData(XMLBuffer&) ()
from
/home/trustix/devel/kentda/devel/cpp/xml/xerces/lib/libxerces-c1_5.so
#7 0x40146083 in XMLScanner::scanCharData(XMLBuffer&) ()
from
/home/trustix/devel/kentda/devel/cpp/xml/xerces/lib/libxerces-c1_5.so
#8 0x4014aa75 in XMLScanner::scanContent(bool) ()
from
/home/trustix/devel/kentda/devel/cpp/xml/xerces/lib/libxerces-c1_5.so
#9 0x40148b81 in XMLScanner::scanDocument(InputSource const&, bool) ()
from
/home/trustix/devel/kentda/devel/cpp/xml/xerces/lib/libxerces-c1_5.so
#10 0x40148a59 in XMLScanner::scanDocument(unsigned short const*, bool)
()
from
/home/trustix/devel/kentda/devel/cpp/xml/xerces/lib/libxerces-c1_5.so
#11 0x40115cad in SAXParser::parse(unsigned short const*, bool) ()
</STACKTRACE>
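A note on reading the trace above: frames #1 to #3 show a null pointer
(__s=0x0) being passed to std::string::operator+= from line 377 of my own
characters() handler, so the strlen crash is triggered by whatever that
line appends rather than by the parser's own buffers. The sketch below is
one possible guard, under the assumption (not confirmed here) that the
null pointer comes from a conversion of the unsigned short data to the
local code page failing on the Norwegian characters. The helper names
appendIfNotNull and utf16ToUtf8 are hypothetical and not part of Xerces-C;
the second one sidesteps the local code page by encoding the UTF-16 code
units as UTF-8 directly.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Xerces-C API): append a C string to a
// std::string, treating a null pointer as "nothing to append".
// Returns false when src was null so the caller can report it.
inline bool appendIfNotNull(std::string& dest, const char* src)
{
    if (src == 0)
        return false;
    dest += src;
    return true;
}

// Hypothetical helper (not Xerces-C API): encode UTF-16 code units, the
// form in which characters() receives its data, as UTF-8 bytes.
inline std::string utf16ToUtf8(const unsigned short* chars, std::size_t length)
{
    std::string out;
    for (std::size_t i = 0; i < length; ++i) {
        unsigned long cp = chars[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < length
            && chars[i + 1] >= 0xDC00 && chars[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (chars[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Lone surrogate: substitute U+FFFD REPLACEMENT CHARACTER.
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Inside characters(chars, length), something like
"buffer += utf16ToUtf8(chars, length);" could then stand in for the append
at line 377, where buffer is a placeholder for whatever string the handler
accumulates into.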
I have tried with the file encoded both as ISO8859-1 and as UTF8, running
the text file through iconv for conversion, but neither works:
---------------8<------------------
<?xml version="1.0" encoding="UTF8"?>
<person>
<name>Ola Børre</name>
</person>
---------------8<------------------
<?xml version="1.0" encoding="ISO8859-1"?>
<person>
<name>Ola Børre</name>
</person>
---------------8<------------------
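To make the difference between the two files concrete: in ISO8859-1 the
"ø" is the single byte 0xF8, while in UTF8 it is the two bytes 0xC3 0xB8.
The sketch below performs the same ISO8859-1 to UTF8 step by hand, as a
way to double-check that the bytes in each file match its encoding
declaration. The function name latin1ToUtf8 is hypothetical, and this is
an illustration of the byte layout, not a replacement for iconv.

```cpp
#include <string>

// Hypothetical helper: re-encode ISO8859-1 bytes as UTF8.
// Every ISO8859-1 byte value equals its Unicode code point, so bytes
// below 0x80 pass through and the rest become two-byte sequences.
inline std::string latin1ToUtf8(const std::string& latin1)
{
    std::string out;
    for (std::string::size_type i = 0; i < latin1.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(latin1[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation
        }
    }
    return out;
}
```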
Is this a known problem with G++ 3.0, Xerces-C 1.5.0 or the old SAX API,
or is it covered in some FAQ I've missed?
--
<[ Kent Dahl ]>================<[ http://www.stud.ntnu.no/~kentda/ ]>
)____(stud.techn.;ind.øk.data)||(softwareDeveloper.at(Trustix))_(
/"Opinions expressed are mine and not those of my Employer, "\
( "the University, my girlfriend, stray cats, banana fruitflies, " )
\"nor the frontal lobe of my left cerebral hemisphere. "/