On Mar 7, 2004, at 7:42 PM, Keith Rogers wrote:
All of our file XML is UTF-8 input, and I haven't seen any problems with direct file transforms using Xerces 1.6/Xalan 1.3 or Xerces 2.3/Xalan 1.6. I never saw a reason for ICU, since all of our stuff is UTF-8 (or UTF-16), so don't build with it. Like I said, the only time I saw (what should have been Japanese) characters incorrectly converted to entities was when the original got mangled, either through an incorrect conversion, or sometimes ignored URL encoding flubbups. Sorry I can't see what you're doing wrong, offhand.
You were right - the text was totally mangled - thanks for the advice, as it saved me a lot of time tearing my hair out. This turns out not to be the fault of Xalan/Xerces, but rather of Windows copy/paste. If you copy UTF-8 text, don't expect it to paste as UTF-8 text. Whee. This may work with some apps, I suppose, but it didn't work with the XSLT development application we are working with. We've still got some problems, but things are looking better now.
Does anybody have a favorite XML editor for Windows that correctly supports Unicode? Also, we're having some trouble moving UTF-8 encoded files across platforms - moving between Solaris and Win32 seems to be a pretty good way to screw up your files. Right now I'm creating all the files on my Mac and moving them to Windows, which seems to work (Mac OS X really seems to have the best built-in Unicode support of any OS I've seen), but we're going to need some way to produce source text on Windows and Solaris, or at least be able to create on Windows and move to Solaris without blowing everything away. Anybody have any suggestions for tools?
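In the meantime, a quick sanity check along these lines might at least show whether a file survived a copy or a transfer. This is only a rough sketch in Python, not anything Xalan or Xerces provides; the function name check_utf8 and the report fields are made up for illustration. It decodes the raw bytes as strict UTF-8 and also notes a leading BOM and the line-ending mix, since those are things that can change when a file moves between Windows and Solaris.

```python
import codecs


def check_utf8(data: bytes) -> dict:
    """Report whether raw bytes are well-formed UTF-8, plus BOM and line-ending hints.

    Hypothetical helper for illustration only.
    """
    crlf = data.count(b"\r\n")
    report = {
        "has_bom": data.startswith(codecs.BOM_UTF8),  # EF BB BF at the start
        "crlf": crlf,                                 # Windows-style line endings
        "bare_lf": data.count(b"\n") - crlf,          # Unix-style line endings
        "valid_utf8": True,
        "error_offset": None,
    }
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as exc:
        report["valid_utf8"] = False
        report["error_offset"] = exc.start  # byte offset of the first bad sequence
    return report
```

To check a file, read it in binary mode so nothing gets converted on the way in, e.g. check_utf8(open(path, "rb").read()), where path is whatever file you just moved. One caveat: this only catches bytes that are not UTF-8 at all. Text that was misread in another encoding and then saved again as UTF-8 still decodes cleanly, so it would pass this check even though the characters are wrong.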
-- Nick
