"Stefan Berglund" <[EMAIL PROTECTED]> writes:

> I've tried to create entities in the DTD like: <!ENTITY aring "&#229;"> and
> then it works better - getNodeValue on "&aring;" then gets me the right
> character. But using the DOMPrint code on a document like that gives me
> "&amp;ring;"...

It's kind of irritating to have to do that. Given that 8859-1 is a
standard encoding, you'd expect the parsers to recognize &aring; if your
encoding is set properly. I had a lot of trouble with '&egrave;',
'&eacute;', etc. when converting Roget's thesaurus to XML. I finally
had to declare them in the DTD, as you did.
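
For reference, XML itself predefines only five entities (&lt;, &gt;,
&amp;, &apos; and &quot;), so names like eacute have to come from a DTD.
Here is a minimal sketch of the declare-it-yourself workaround, using
Python's standard library parser rather than Xerces. The tiny document
and its internal DTD subset are made up for illustration; they are not
my real roget.dtd.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the thesaurus file. The internal DTD subset
# declares eacute as a numeric character reference (U+00E9).
DOC = """<!DOCTYPE thesaurus [
  <!ENTITY eacute "&#233;">
]>
<thesaurus>
  <synonym name="nev&eacute;e"/>
</thesaurus>"""

root = ET.fromstring(DOC)

# The parser expands the declared entity inside the attribute value.
name = root.find("synonym").get("name")
print(name)
```

Without the <!ENTITY ...> line, this parser rejects the document with an
undefined-entity error, whatever encoding is in use.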

Strange that DOMPrint screws it up. I tested a small piece of my
roget.xml against SAXPrint, SAX2Print, and DOMPrint. DOMPrint got it
wrong; the other two got it right.

 $ SAXPrint -v=auto /tmp/tst.xml
<?xml version="1.0" encoding="LATIN1"?>
<thesaurus>
  <section title="THESAURUS OF ENGLISH WORDS AND PHRASES">
    <major id="major_383" name="Cold">
      <minor part_of_speech="NOUN">
        <synonym-group>
          <related-synonym>
            <synonym name="nevée"></synonym>
            <synonym name="serac" comments="obs3"></synonym>
          </related-synonym>
        </synonym-group>
      </minor>
    </major>
  </section>
</thesaurus>

 $ DOMPrint -v=auto /tmp/tst.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<!DOCTYPE thesaurus SYSTEM "roget.dtd">
<thesaurus>
  <section title="THESAURUS OF ENGLISH WORDS AND PHRASES">
    <major id="major_383" name="Cold">
      <minor part_of_speech="NOUN">
        <synonym-group>
          <related-synonym>
            <synonym name="nevée"/>
            <synonym name="serac" comments="obs3"/>
          </related-synonym>
        </synonym-group>
      </minor>
    </major>
  </section>
</thesaurus>

 $ SAX2Print -v=auto /tmp/tst.xml
<?xml version="1.0" encoding="LATIN1"?>
<thesaurus>
  <section title="THESAURUS OF ENGLISH WORDS AND PHRASES">
    <major id="major_383" name="Cold">
      <minor part_of_speech="NOUN">
        <synonym-group>
          <related-synonym>
            <synonym name="nevée"></synonym>
            <synonym name="serac" comments="obs3"></synonym>
          </related-synonym>
        </synonym-group>
      </minor>
    </major>
  </section>
</thesaurus>
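
One thing worth keeping in mind when comparing these: the three runs do
not declare the same output encoding (LATIN1 for the SAX tools, UTF-8 for
DOMPrint), so the accented character is written as different bytes. A
small Python sketch of that difference follows. It only illustrates the
byte-level encoding; it is an assumption on my part, not a confirmed
diagnosis, that this has anything to do with what DOMPrint does.

```python
# The attribute value from the sample, with U+00E9 written as an escape.
name = "nev\u00e9e"

latin1_bytes = name.encode("latin-1")  # one byte for the accented letter
utf8_bytes = name.encode("utf-8")      # two bytes for the accented letter

print(latin1_bytes)
print(utf8_bytes)

# What the UTF-8 bytes look like if a viewer assumes Latin-1.
misread = utf8_bytes.decode("latin-1")
print(misread)
```

So a UTF-8 serialization viewed in a Latin-1 terminal shows two odd
characters where the single accented letter should be.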

Looks like DOMPrint is buggered.

jas.
