On 02/16/2011 11:37 PM, Andy Black wrote:
> Hussein:
> 
> One of our users recently ran into a frustrating situation. He had a 
> <link> element whose href attribute was set as a relative URI to a sound 
> file in MP3 format. The sound file was on his hard disk in a 
> subdirectory of where the main file is located. (This is for the 
> XLingPaper custom configuration files we have for use with XXE, but I 
> suspect that is not a factor.) The file name included accented vowels. 
> As an example, one was
> 
> ɔ̀kɔ́ àbɔ̀n.mp3
> 
> (The name may not come through correctly in this email. The Unicode 
> characters are: U+0254 U+0300 k U+0254 U+0301 (space) a U+0300 b U+0254 
> U+0300 n . m p 3.)

Surprising! Why, for example, express a-grave as U+0061 U+0300 (that is,
the letter 'a' followed by the combining grave accent '`') when there is a
precomposed Unicode character for a-grave: U+00E0?
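
For reference, the two spellings are canonically equivalent, and a Unicode normalization pass converts one into the other. Below is a minimal sketch in Python (chosen only for illustration; XXE itself is Java), using the standard unicodedata module. The variable names are made up for the example.

```python
import unicodedata

decomposed = "a\u0300"   # LATIN SMALL LETTER A + COMBINING GRAVE ACCENT
composed = "\u00e0"      # LATIN SMALL LETTER A WITH GRAVE

# The two strings render identically but differ code point by code point.
assert decomposed != composed

# NFC composes, NFD decomposes.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed

# U+0254 (open o) has no precomposed form with a grave accent,
# so NFC leaves that pair as two code points.
open_o_grave = "\u0254\u0300"
assert unicodedata.normalize("NFC", open_o_grave) == open_o_grave
```

This would explain why only the a changes in the URI quoted below while the accented open o characters stay as two code points each.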



> 
> When he used the Browse Files... option in the Attributes Editor of XXE, 
> the result was
> 
> %C9%94%CC%80k%C9%94%CC%81%20%C3%A0b%C9%94%CC%80n.mp3
> 
> I understand that this is the same file name using percent-encoding 
> (with UTF-8 encoding values) for those characters requiring it, with the 
> exception that the grave a in the file name is two Unicode characters 
> (U+0061 U+0300) while the grave a in the URI is one Unicode character 
> (U+00E0). Apparently, this difference is crucial.
> 
> The problem becomes clear when the user's XML file is converted to 
> either a web page output or a PDF output and the user clicks on the 
> link. The browser or PDF reader indicates that it will look for the 
> correct file name (at least, it looks correct - one can see the grave a, 
> for example), but these applications report that they could not find the 
> file. Looking at the web page, the file name is exactly as the URI 
> returned from the Attributes Editor as given above. Similarly for the 
> PDF file. So why is it that everything looks good, but these 
> applications say that they cannot find the file?
> 
> From what we can tell, the problem is with characters like the 
> grave-accented a. The file system on the user's hard drive 

Probably Mac OS X HFS+. See
http://stackoverflow.com/questions/3610013/file-listfiles-mangles-unicode-names-with-jdk-6-unicode-normalization-issues



> uses the decomposed 
> form (NFD) of the grave a (i.e. it is U+0061 U+0300) while the result of 
> the Browse Files... option in the Attributes Tool (U+00E0) uses the 
> composed form (NFC). When the composed form is used by a web browser or 
> PDF reader, it does not match the file name on the hard drive, so 
> the file cannot be found.
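
The mismatch described here can be checked by decoding the URI quoted earlier and comparing it with the code points listed for the file name. This is a sketch in Python (for illustration only), using urllib.parse.unquote and unicodedata from the standard library. The variable names are made up, and on_disk assumes the file name is exactly the code point sequence given in the original message.

```python
import unicodedata
from urllib.parse import unquote

# URI as returned by Browse Files... (copied from the message).
uri = "%C9%94%CC%80k%C9%94%CC%81%20%C3%A0b%C9%94%CC%80n.mp3"

# File name as listed code point by code point in the message (assumed exact).
on_disk = "\u0254\u0300k\u0254\u0301 a\u0300b\u0254\u0300n.mp3"

decoded = unquote(uri)  # percent-decoding uses UTF-8 by default

# The names differ only in how the grave a is spelled.
assert decoded != on_disk
assert "\u00e0" in decoded and "\u00e0" not in on_disk

# After normalizing both to the same form they compare equal.
assert unicodedata.normalize("NFD", decoded) == on_disk
assert unicodedata.normalize("NFC", on_disk) == decoded
```

An application that compares names byte for byte, without normalizing, would therefore treat these as two different files.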
> 
> Is this a known issue with the Browse Files... dialog box in the 
> Attributes Tool? That is, is it known that this tool converts NFD format 
> to NFC? Is there a preferences setting that can be set to control this? 
> Is there some other work-around available?
> 

XXE simply uses the characters of the filenames passed to it by the Java
runtime. Therefore it's a Java issue and not an XXE issue. Java does not
seem to keep the original decomposed form of the characters. I don't
know of any (simple) workaround.
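
For anyone who wants to experiment outside XXE anyway, one possible approach is to rewrite the href values after the fact so that they use the decomposed form. The sketch below is in Python and is only an assumption about what might help, not something XXE does. The function name to_nfd_uri is hypothetical, it only handles simple relative paths (no query or fragment), and whether a browser or PDF reader then finds the file on a given file system is untested. In Java, the corresponding normalization call would be java.text.Normalizer.normalize.

```python
import unicodedata
from urllib.parse import quote, unquote

def to_nfd_uri(uri: str) -> str:
    """Re-encode a percent-encoded relative path with its characters in NFD.

    Hypothetical helper: handles plain relative paths only, since quote()
    would also escape '?' and '#'.
    """
    decoded = unquote(uri)                       # UTF-8 percent-decoding
    nfd = unicodedata.normalize("NFD", decoded)  # split precomposed characters
    return quote(nfd)                            # encode again; '/' is left as-is

fixed = to_nfd_uri("%C9%94%CC%80k%C9%94%CC%81%20%C3%A0b%C9%94%CC%80n.mp3")
print(fixed)
```

With this input, the only change is that %C3%A0 becomes a%CC%80; the rest of the URI is already in decomposed form.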
 
--
XMLmind XML Editor Support List
[email protected]
http://www.xmlmind.com/mailman/listinfo/xmleditor-support
