Marcos Caceres wrote:
Ok, as I know little of SVG, I've asked Doug Schepers to help me
That sounds like an excellent plan. Thank you!
It is, but this affects more than just ZIP. See also [3] for the
problems LimeWire had with Unicode normalization on Mac OS X.
Note that this is a pretty old article. I agree that this stuff doesn't
work as well as would be ideal, of course.
Sort of. We use JAR, not ZIP. Any JAR file is a ZIP file, but not vice
versa. In particular, the JAR spec [1] defines that all non-ASCII bytes are
UTF-8.
AFAIK, JAR uses Java's Modified UTF-8 so it's quite proprietary.
The only difference between standard UTF-8 and Modified UTF-8 is how the
character U+0000 is encoded. If someone is putting that particular
character in their filenames, I have no problem saying that behavior is
undefined as long as it's secure.
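To make the U+0000 point concrete, here is a minimal Python sketch, not
taken from the JAR spec or any real library. The function name
encode_modified_utf8_bmp is made up for illustration, and the sketch
assumes filenames contain only characters up to U+FFFF; anything above
that range is rejected rather than handled.

```python
def encode_modified_utf8_bmp(text: str) -> bytes:
    """Encode text as Java-style Modified UTF-8 (hypothetical helper).

    Assumption: only characters up to U+FFFF are supported. Modified
    UTF-8 also treats characters above U+FFFF differently (it encodes
    them as surrogate pairs), which this sketch does not cover.
    """
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("sketch handles only characters up to U+FFFF")
    # Standard UTF-8 encodes U+0000 as the single byte 0x00.
    # Modified UTF-8 uses the two-byte form 0xC0 0x80 instead, so the
    # encoded string never contains an embedded zero byte.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


# A name without U+0000 encodes identically under both schemes.
plain = encode_modified_utf8_bmp("r\u00e9sum\u00e9.txt")
# A name containing U+0000 differs only at that one character.
with_nul = encode_modified_utf8_bmp("a\x00b")
```

For any filename that stays within that range and contains no U+0000,
the two encodings produce the same bytes, which is why the difference
should rarely matter in practice.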
The use of Modified UTF-8 in Java wrt ZIP has led to significant problems
[2] (this bug appeared in 1999!).
Looks like that bug is more about the fact that using Java's
ZIP-manipulation functionality on JARs fails because the
ZIP-manipulation stuff uses the OS-default encoding...
Which does bring us back to the issue of ZIP tools sucking in this
regard, of course.
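As a rough illustration of that failure mode (a sketch only, not the
code from the bug report): the Python snippet below stores a filename
as UTF-8 bytes and then decodes it with a different encoding. cp1252
stands in for an "OS-default" encoding here; that choice and the
decode_name helper are assumptions made for the example.

```python
def decode_name(raw: bytes, encoding: str) -> str:
    """Decode raw filename bytes with the given encoding (hypothetical helper)."""
    return raw.decode(encoding)


# The filename as a JAR-style writer would store it: UTF-8 bytes.
stored = "r\u00e9sum\u00e9.txt".encode("utf-8")

# A reader that also assumes UTF-8 recovers the original name.
good = decode_name(stored, "utf-8")

# A reader that assumes an OS-default encoding (cp1252 as a stand-in)
# turns each two-byte UTF-8 sequence into two unrelated characters.
bad = decode_name(stored, "cp1252")
```

The bytes in the archive are the same in both cases; only the reader's
assumption about the encoding changes the resulting name.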
My gut feeling is that we run with this known issue; we have a warning
in the spec that authors should avoid using file names outside the
ASCII range.
I can live with that, as long as the issue has been considered. In
practice, I'll just hope that everyone involved migrates to UTF-8 and is
done with it.
-Boris