Message: The following issue has been resolved as FIXED.
Resolver: James Berry
Date: Mon, 20 Sep 2004 1:01 PM

As a result of this bug report, the default encoding for Mac OS X was changed to utf-8 to account for the fact that this is the default encoding coming off the command line. A build option XML_MACOS_LCP_TRADITIONAL is available to revert to the previous behavior.
---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESC-1166

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESC-1166
    Summary: Xerces cannot open file whose name includes UTF8 characters
       Type: Bug
     Status: Resolved
 Resolution: FIXED
    Project: Xerces-C++
 Components: Utilities
   Fix Fors: Nightly build (please specify the date)
   Versions: 2.4.0
   Assignee: James Berry
   Reporter: Mark Goldstein
    Created: Thu, 26 Feb 2004 9:35 PM
    Updated: Mon, 20 Sep 2004 1:01 PM
Environment: Operating System: Other
             Platform: Macintosh

Description:
I originally wrote about this as attached below. James Berry asked me to file the bug report; see his e-mail below as well.

On Feb 25, 2004, at 5:31 PM, Mark Goldstein wrote:

Hello,

Using Xalan/Xerces I tried to transform a file with a name that included an "e" with accent. Your mailer might show it: féébad.xml

The command line call (using Mac OS X copy/paste, which converts the characters to octal constants) looks like this:

mark$ ./Xalan -o foo.out fe\314\201e\314\201bad.xml foo.xsl

And it results in this error:

Fatal Error at (unknown file , line 0 , column {null} ): An exception occurred! Type:RuntimeException, Message:The primary document entity could not be opened. Id=féébad.xml
SAXParseException: An exception occurred! Type:RuntimeException, Message:The primary document entity could not be opened. Id=féébad.xml (, line 0, column 0)

Is this a known bug? Is there a work-around?

This isn't a known bug, but, having done a bit of snooping, I do believe that it is a bug.
Here's what I think is going on:

Xerces creates a transcoder that converts from the local code page to unicode (LCP Transcoder). On Mac OS, it assumes the local code page is whatever the default system script encoding is, which is often MacRoman. This LCP Transcoder is used whenever a XMLString is created from a char*. That is done, for instance, as part of taking a file off the command line and creating a parser from it.

The problem in your case is that the characters coming off the Mac OS X command line are actually utf-8, not (MacRoman, or whatever). They're being converted to utf-16 as if they were MacRoman. And all hell breaks loose, including the unfortunate fact that the file can't be opened.

This is a bit of a no-win situation. We could simply make the LCP Transcoder assume the LCP is always utf-8, but that would require a major re-architecting of the transcoder, since we rely on the lower level unicode converter, which can't transcode between unicode encodings, only to and from them. It also may not be quite the right answer either, since it just fixes the situation for the command line and ignores the fact that there are a number of other LCP encodings being used, which this decision could affect.

There are probably a number of workarounds, but they all basically boil down to not relying on the LCP transcoder to convert the utf-8 string from the command line into unicode in the first place. For instance, you could explicitly call the intrinsic utf-8 transcoder through Transervice, or cheat and call TranscodeUTF8ToUniChars, which is buried down in MacOSPlatformUtils. There are probably better solutions, but it's getting late for me now. Once you have the filename in utf-16, pass that directly into the parser.

There may be other simpler workarounds, which might include simply changing the encoding of text in the terminal to MacRoman, or whatever.
But it's making my head hurt to understand the interactions that would occur in that case... your file wouldn't list correctly in that case, I would think.

Please let me know how it goes, and if you could write a bug report that would help as well.

James.

---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.

If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa

If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira
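
The mismatch James describes (utf-8 bytes from the command line converted to utf-16 as if they were MacRoman) can be reproduced outside Xerces. What follows is a minimal sketch in plain C++, not Xerces code: the function names decodeAsUtf8, decodeAsMacRoman and macRomanToUtf16 are hypothetical, the utf-8 decoder skips overlong and surrogate checks, and the MacRoman table is cut down to the two bytes that appear in the file name from the report (0xCC and 0x81).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Illustrative only: a MacRoman lookup reduced to the two high bytes used in
// this example. A real table covers every byte from 0x80 to 0xFF.
char16_t macRomanToUtf16(unsigned char b) {
    if (b < 0x80) return b;        // the ASCII range is shared
    switch (b) {
        case 0x81: return 0x00C5;  // A with ring above
        case 0xCC: return 0x00C3;  // A with tilde
        default:   return 0xFFFD;  // not covered by this sketch
    }
}

// The failing path: every byte is treated as one MacRoman character.
std::u16string decodeAsMacRoman(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) out.push_back(macRomanToUtf16(b));
    return out;
}

// The intended path: the bytes are decoded as utf-8 into utf-16.
// Minimal decoder: malformed input becomes U+FFFD.
std::u16string decodeAsUtf8(const std::string& bytes) {
    std::u16string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }

        bool ok = (i + extra < bytes.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char bk = static_cast<unsigned char>(bytes[i + k]);
            if ((bk & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (bk & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        i += extra + 1;

        if (cp >= 0x10000) {  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

Calling both functions on "fe\314\201e\314\201bad.xml" (the octal form from the command line above) gives 12 utf-16 code units on the utf-8 path, with U+0301 (combining acute accent) after each "e". The MacRoman path gives 14 code units, with U+00C3 U+00C5 where each accent should be, so the resulting name no longer matches the file on disk.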