Marcos Caceres wrote:
Ok, as I know little of SVG, I've asked Doug Schepers to help me

That sounds like an excellent plan.  Thank you!

It is, but this affects more than just Zip. See also [3] for the
problems LimeWire had with Unicode normalization on Mac OS
X.

Note that this is a pretty old article. I agree that this stuff doesn't work as well as would be ideal, of course.
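
To illustrate the normalization problem referred to above: the same visible file name can be stored as two different code point sequences (precomposed NFC or decomposed NFD), so a naive comparison fails. The following is only a minimal sketch; the file name and the helper `same_file_name` are hypothetical and not taken from any of the cited articles.

```python
import unicodedata

# "café.txt" written with a precomposed e-acute (U+00E9).
name = "caf\u00e9.txt"

nfc = unicodedata.normalize("NFC", name)  # e-acute stays one code point
nfd = unicodedata.normalize("NFD", name)  # becomes "e" + U+0301 (combining acute)

def same_file_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(nfc == nfd)                 # raw comparison of the two forms
print(len(nfc), len(nfd))         # code point counts differ
print(same_file_name(nfc, nfd))   # comparison after normalization
```

A tool that compares raw strings or raw bytes sees two different names here; normalizing both sides first is one way to avoid that.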

Sort of.  We use JAR, not ZIP.  Any JAR file is a ZIP file, but not vice
versa.  In particular, the JAR spec [1] defines that all non-ASCII bytes are
UTF-8.

AFAIK, JAR uses Java's Modified UTF-8 so it's quite proprietary.

The difference between standard UTF-8 and Modified UTF-8 that matters most here is how the character U+0000 is encoded; characters outside the BMP are also encoded differently (as surrogate pairs). If someone is putting U+0000 in their filenames, I have no problem saying that behavior is undefined as long as it's secure.
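
A rough sketch of how the two encodings compare, for readers who want to see the bytes. The function `to_modified_utf8` is a hypothetical helper written for this illustration, not an API of Java or of any ZIP library. It encodes U+0000 as the two bytes C0 80 and, as far as I know matching Java's behavior, writes each character above U+FFFF as a surrogate pair of two three-byte sequences. Everything else is plain UTF-8.

```python
def to_modified_utf8(text: str) -> bytes:
    """Hypothetical helper: encode text roughly the way Modified UTF-8 does."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # U+0000 becomes an overlong two-byte form instead of a single 0x00.
            out += b"\xc0\x80"
        elif cp > 0xFFFF:
            # Supplementary characters: split into a UTF-16 surrogate pair and
            # encode each surrogate as its own three-byte sequence.
            cp -= 0x10000
            for unit in (0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)):
                out += bytes((
                    0xE0 | (unit >> 12),
                    0x80 | ((unit >> 6) & 0x3F),
                    0x80 | (unit & 0x3F),
                ))
        else:
            # Everything else is identical to standard UTF-8.
            out += ch.encode("utf-8")
    return bytes(out)

print("a\x00b".encode("utf-8").hex())        # standard UTF-8
print(to_modified_utf8("a\x00b").hex())      # Modified UTF-8
print(to_modified_utf8("r\u00e9sum\u00e9.txt") == "r\u00e9sum\u00e9.txt".encode("utf-8"))
```

For names made only of BMP characters other than U+0000, the two encodings produce the same bytes, which is the case the last line checks.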

The use of Modified UTF-8 in Java wrt Zip has led to significant problems
[2] (this bug appeared in 1999 (!))

Looks like that bug is more about the fact that using Java's ZIP-manipulation functionality on JARs fails because the ZIP-manipulation stuff uses the OS-default encoding...

Which does bring us back to the issue of ZIP tools sucking in this regard, of course.
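
The encoding mismatch described above can be sketched as follows. The assumptions are loud ones: the archive is taken to store the name as UTF-8 bytes, and the reading tool is taken to decode with a legacy default, for which cp1252 is chosen purely for illustration. The file name is made up.

```python
# Assumption: the writer stored the entry name as UTF-8 bytes.
stored = "r\u00e9sum\u00e9.txt".encode("utf-8")

# A reader that knows the name is UTF-8 recovers it intact.
as_utf8 = stored.decode("utf-8")

# A reader that assumes the OS-default encoding (cp1252 here, only as an
# example) turns each two-byte sequence into two unrelated characters.
as_default = stored.decode("cp1252")

print(as_utf8 == "r\u00e9sum\u00e9.txt")
print(as_utf8 == as_default)
print(len(as_utf8), len(as_default))
```

The bytes in the archive are the same in both cases; only the reader's assumption about the encoding differs, which is why the file shows up under a garbled name.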

My gut feeling is that we run with this known issue; we have a warning
in the spec that authors should avoid using file names outside the
ASCII range.

I can live with that, as long as the issue has been considered. In practice, I'll just hope that everyone involved migrates to UTF-8 and is done with it.

-Boris
