Adam,

I'd encourage you to "follow the money" or, in this case, "follow the
characters".

Internally, Xerces (both SAX and DOM) stores all characters as UTF-16, so
each of the quotes should occupy just one UTF-16 code unit in that
representation. The real issue is therefore probably one of the source and
destination encodings, and whether these are correct.
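
To make that concrete, here is a small sketch in Python (not Xerces itself;
the helper name describe is purely illustrative, and it assumes Python 3.8
or later with its standard mac_roman codec). It takes the four Mac OS Roman
smart-quote bytes and shows that each is a single UTF-16 code unit but three
bytes in UTF-8, so seeing three "characters" per quote in a UTF-8 file is
not by itself a sign of trouble:

```python
# Mac OS Roman bytes for the four "smart quote" characters
# (left/right double quote, left/right single quote).
MAC_ROMAN_QUOTES = bytes([0xD2, 0xD3, 0xD4, 0xD5])


def describe(mac_bytes):
    """Return (code point, UTF-16 code units, UTF-8 bytes as hex) per character."""
    rows = []
    for ch in mac_bytes.decode("mac_roman"):
        utf16_units = len(ch.encode("utf-16-be")) // 2  # 2 bytes per code unit
        utf8_hex = ch.encode("utf-8").hex(" ")
        rows.append((f"U+{ord(ch):04X}", utf16_units, utf8_hex))
    return rows


for row in describe(MAC_ROMAN_QUOTES):
    print(row)
```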

I would start by verifying:

    - What the source encoding is
    - Whether the characters are properly encoded in it
    - Whether you've communicated this encoding properly to Xerces

Then:

    - Whether the characters get properly transcoded by Xerces into UTF-16
      for internal use

Then:

    - What your destination encoding is
    - Whether you've communicated this encoding properly to Xerces
    - Whether the characters are properly transcoded into it

Recent versions of BBEdit on the Mac do a pretty good job of handling text
files in a variety of encodings, and so may be able to help you with
checking the source and destination encodings.

Hope that helps. If you do find a problem with the Mac transcoder in Xerces,
I would of course love to hear more about it... ;)

-jdb


On 9/18/03 6:22 AM, "Adam Heinz" <[EMAIL PROTECTED]> wrote:

> My DOMWriter is transcoding from one of the native Macintosh code pages to
> UTF-8.  I'm exporting XML from a document layout program.  My specific issue
> is that high byte "smart quotes" characters are being serialized as three
> unrelated garbage characters (which I don't assume are wrong, since I'm
> looking at UTF-8), depending on the kind of quotes.  After transcoding by my
> SAX parser, these characters turn into three different garbage characters,
> which then promptly choke the target application (on Windows).  In the
> short-term, I'm probably going to just switch over the character and convert
> back to regular quotes.  In the long-term, I'd like to do away with this hack,
> as I expect it's just a matter of time before I hit more code page problems.
> 
> Sorry for being cranky about Mac code pages; I should have said unfamiliar
> instead of non-standard.  I'm no Windows zealot.
> 
> Adam Heinz
> Development Consultant
> 
> Exstream Software
> 2424 Harrodsburg Road, Suite 200
> Lexington, KY 40503
> (317) 879-2831
> [EMAIL PROTECTED]
> 
> connecting with the eGeneration
> www.exstream.com
> 
> -----Original Message-----
> From: James Berry [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 17, 2003 8:13 PM
> To: Xerces C Dev
> Cc: Adam Heinz
> Subject: Re: native macintosh code page
> 
> 
> Hi Adam,
> 
> Well, the Mac's native character encoding is no more non-standard than any
> other, it's just different! (And it has the moral advantage of having been
> defined before the Windows code page).
> 
> But to answer your question may require more information about what you're
> really trying to do, and what's not working. You can set the output encoding
> by using the setEncoding method on DOMWriter, for instance.
> 
> Does that help at all?
> 
> -jdb
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


