> 1) Include no header in the XML file being read.  This results in 
> non-English characters being read in as a ? character.

I'm surprised that you didn't get an encoding exception, since a document without an 
encoding declaration is treated as UTF-8 and ISO-8859-1 bytes above 0x7F would rarely 
form legal UTF-8 sequences.
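
To illustrate why I'd expect an error rather than a ? character, here is a minimal 
sketch of a structural UTF-8 check. The function name is made up for illustration and 
is not a Xerces API; it only checks lead/continuation byte structure and does not 
reject overlong forms or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not a Xerces API: returns true if `bytes` has the
// lead-byte/continuation-byte structure of UTF-8. Simplified: it does not
// reject overlong encodings or surrogate code points.
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t trailing;
        if (lead < 0x80)                trailing = 0;  // ASCII
        else if ((lead & 0xE0) == 0xC0) trailing = 1;  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) trailing = 2;  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) trailing = 3;  // 11110xxx
        else return false;             // stray continuation or invalid lead byte
        if (trailing > bytes.size() - i - 1) return false;  // truncated sequence
        for (std::size_t k = 1; k <= trailing; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // must be 10xxxxxx
        }
        i += trailing + 1;
    }
    return true;
}
```

An ISO-8859-1 e-acute is the single byte 0xE9, which looks like the lead byte of a 
three-byte UTF-8 sequence; unless two continuation bytes happen to follow, the check 
fails.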

> 
> 2) Including the header <?xml version="1.0" 
> encoding="iso-8859-1" ?>.  This 
> causes the file not to be read at all.  Looking at the Xerces 
> code I was 
> able to track down one of the problems to the way Xerces 
> detects codepages 
> in Win32TransService.cpp.  In the constructor it checks for 
> the codepages on 
> the machine by looking in the registry under 
> HKCR\MIME\Database\Codepage 
> (and Charset), which doesn't exist on a base Windows 95 system.

There does seem to be an internal ISO-8859-1 transcoder, but it looks like it might be 
sensitive to capitalization.  What happens if you use encoding="ISO-8859-1"?
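
For what it's worth, the XML spec says processors should match encoding names in a 
case-insensitive way, so both spellings ought to behave the same. A minimal sketch of 
such a comparison follows; the helper name is hypothetical, and this is not how Xerces 
actually performs its lookup.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not taken from Xerces: compares two encoding names
// while ignoring ASCII case, so "iso-8859-1" matches "ISO-8859-1".
bool sameEncodingName(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int ca = std::toupper(static_cast<unsigned char>(a[i]));
        const int cb = std::toupper(static_cast<unsigned char>(b[i]));
        if (ca != cb) return false;
    }
    return true;
}
```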

> 
> I was able to add this set of registry keys by installing IE 
> 4.01, but the 
> iso-8859-1 encoding still doesn't work for non-English 
> characters.  In this 
> case Xerces ignores the entire file if it contains such characters.  
> Unfortunately, the 1252 codepage (which is what iso-8859-1 
> looks like it is 
> mapped to) appears to be the only one installed on this 
> version of Windows 
> 95.  The 1252 codepage is named "Western European (Windows)" 
> in the registry 
> which sounds like the character set I am looking for.  
> Looking at Xerces 
> documentation it looks like they support iso-8859-1 as "ISO 
> Latin 1" which 
> sounds promising as well.  So it looks like I am using the 
> proper codepage, 
> but it just isn't working for some reason.

CP-1252 is ISO-8859-1 plus a few additional characters in the ranges 0x82 to 0x8C and 
0x91 to 0x9C, and at 0x9F.
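
Since the two agree on every byte from 0xA0 up, accented Latin letters should not be 
affected by the difference. A rough sketch of a test for the bytes where they differ, 
using only the ranges listed above (the helper name is hypothetical, and later 
revisions of the code page assign a few more positions, the euro sign at 0x80 for 
example):

```cpp
#include <cassert>

// Hypothetical helper: true if `b` is one of the byte values where CP-1252,
// per the ranges given in this message, carries a printable character that
// ISO-8859-1 does not. Later code page revisions are not covered.
bool isCp1252Addition(unsigned char b) {
    return (b >= 0x82 && b <= 0x8C) ||
           (b >= 0x91 && b <= 0x9C) ||
           b == 0x9F;
}
```

If none of the bytes in your file pass this test, the file reads the same under either 
name.
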
> 
> On a side note, I found that using iso-8859-3 (1254) does 
> allow Xerces to 
> use these non-English characters.  Though this encoding is 
> not installed on 
> these Windows 95 systems.  If anyone knows an easy way to 
> install this 
> encoding (without installing a whole application like IE) 
> that would be 
> helpful as well.
> 
> Any help is greatly appreciated.

There are also a few unnecessary dependencies on IE 4 components (urlmon and wininet) 
in the COM wrapper.  

For equivalence with MSXML, the COM wrapper provides an XMLHttpRequest object that is 
implemented using WININET.  Unfortunately, this causes xml4com.dll not to load if IE4+ 
isn't present, even if you weren't planning on using XMLHttpRequest.  I have a personal 
copy in which XMLHttpRequest has been rewritten so that it dynamically loads WININET 
if and only if you try to do something with XMLHttpRequest.
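
The load-on-first-use idea can be sketched roughly as follows. This is illustrative 
only and not the code from my copy: the LazyLibrary class is a made-up name, the 
Windows calls (LoadLibraryA, GetProcAddress, FreeLibrary) are the standard Win32 ones, 
and on other platforms the sketch simply never loads anything.

```cpp
#include <cassert>
#include <string>
#include <utility>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: remembers a DLL name but does not load the DLL
// until a symbol is first requested, so merely linking against code that
// uses it does not create a load-time dependency.
class LazyLibrary {
public:
    explicit LazyLibrary(std::string dllName) : dllName_(std::move(dllName)) {}
    LazyLibrary(const LazyLibrary&) = delete;
    LazyLibrary& operator=(const LazyLibrary&) = delete;

    ~LazyLibrary() {
#ifdef _WIN32
        if (handle_) ::FreeLibrary(handle_);
#endif
    }

    bool loaded() const { return handle_ != nullptr; }

    // Returns the address of `symbol`, loading the DLL on first use.
    // Returns nullptr if the DLL or the symbol is unavailable.
    void* resolve(const char* symbol) {
#ifdef _WIN32
        if (!handle_) handle_ = ::LoadLibraryA(dllName_.c_str());
        if (!handle_) return nullptr;
        return reinterpret_cast<void*>(::GetProcAddress(handle_, symbol));
#else
        (void)symbol;
        return nullptr;  // non-Windows build: nothing to load
#endif
    }

private:
    std::string dllName_;
#ifdef _WIN32
    HMODULE handle_ = nullptr;
#else
    void* handle_ = nullptr;
#endif
};
```

With something like this, constructing LazyLibrary("wininet.dll") costs nothing, and 
only a call such as resolve("InternetOpenA") touches the DLL; a null result can then 
be turned into an ordinary error from the XMLHttpRequest method.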

Also, XMLDOMDocument makes calls to PathIsURL and PathIsRelative (from shlwapi, which 
also ships with IE 4) and to URLDownloadToCacheFile in urlmon.  PathIsURL and 
PathIsRelative can both be trivially implemented locally.  For URLDownloadToCacheFile, 
my proxy will return the local file name if the URL is a local file and dynamically 
load urlmon if the URL is remote.  This at least allows you to parse local files 
without having IE present.
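
A rough sketch of what local replacements for the two path checks could look like, 
along with the local-versus-remote decision the proxy has to make. All three function 
names are hypothetical, this is not the code from my copy, and the checks only 
approximate what PathIsURL and PathIsRelative do.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for PathIsURL: true if `path` starts with
// "<scheme>:" where the scheme has at least two characters, so a drive
// letter such as "C:" is not mistaken for a scheme.
bool looksLikeUrl(const std::string& path) {
    const std::size_t colon = path.find(':');
    if (colon == std::string::npos || colon < 2) return false;
    if (!std::isalpha(static_cast<unsigned char>(path[0]))) return false;
    for (std::size_t i = 1; i < colon; ++i) {
        const unsigned char c = static_cast<unsigned char>(path[i]);
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.') return false;
    }
    return true;
}

// Hypothetical stand-in for PathIsRelative: a path counts as absolute if it
// starts with a slash or backslash (rooted or UNC) or with a drive letter
// followed by a colon.
bool looksRelative(const std::string& path) {
    if (path.empty()) return true;
    if (path[0] == '\\' || path[0] == '/') return false;
    if (path.size() >= 2 &&
        std::isalpha(static_cast<unsigned char>(path[0])) && path[1] == ':') {
        return false;
    }
    return true;
}

// Decides whether a download proxy could skip urlmon entirely: plain paths
// and file: URLs are local. The scheme test is case-sensitive for brevity.
bool isLocalFile(const std::string& location) {
    if (!looksLikeUrl(location)) return true;
    return location.compare(0, 5, "file:") == 0;
}
```

Only when isLocalFile returns false would the proxy need to load urlmon and hand the 
URL to the real URLDownloadToCacheFile.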

Since the COM wrapper is moderately comatose, and Win95 without IE 4 even more so, I 
haven't prepared these changes for inclusion in CVS.  However, if you would like them 
as is, let me know.
