Marcos Caceres wrote:
Ok, that sounds like a completely reasonable proposal. And you are right, I
had thought about this in totally the wrong way. I did as you suggested:
* widget engines may now support SVG 1.1.
* authors, however, should try to conform to SVG Tiny 1.2.
* conformance checkers should warn authors when their icons don't conform
to SVG Tiny 1.2.
Note that SVG Tiny 1.2 is not a subset of SVG 1.1, by the way... I'm
not sure whether that should affect this section; just pointing it out.
I think it makes more sense to just allow widget engines to implement
whatever SVG version they want (as in, place no restrictions on it beyond
the fact that .svg files should be processed per the image/svg+xml MIME
type registration).
Correct. So what is wrong with limiting sniffing to the table in the spec?
Nothing. In fact it's highly desirable.
Or to the content-sniffing internet draft I pointed you to earlier?... I'm
not sure I'm understanding what you want me to specify here.
I was just pointing out that current implementations of something like
widgets which don't use a MIME manifest or some such use an alternate
system (aggressive extension sniffing) that we don't want to use here.
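To make "limit sniffing to the table in the spec" concrete, here is a minimal sketch of a UA that resolves a media type purely from a fixed extension table and never inspects file contents. The table entries, the function name media_type_for, and the case-insensitive match are all hypothetical illustrations, not the spec's actual table or rules.

```python
import posixpath
from typing import Optional

# HYPOTHETICAL table for illustration only; the normative mapping is whatever the spec defines.
EXTENSION_TABLE = {
    ".html": "text/html",
    ".htm": "text/html",
    ".css": "text/css",
    ".js": "application/javascript",
    ".svg": "image/svg+xml",
    ".png": "image/png",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}


def media_type_for(path: str) -> Optional[str]:
    """Look up a media type by file extension only; file contents are never inspected."""
    # splitext keeps the leading dot and looks only at the last path segment:
    # "icons/logo.SVG" -> ("icons/logo", ".SVG")
    _, ext = posixpath.splitext(path)
    # Case-insensitive matching is an ASSUMPTION, not something the thread specifies.
    # None means "unknown type", not "go sniff the bytes".
    return EXTENSION_TABLE.get(ext.lower())
```

With this sketch, media_type_for("icons/logo.SVG") gives "image/svg+xml" and media_type_for("LICENSE") gives None. The point of the design is that an unknown extension stays unknown instead of falling back to the aggressive content sniffing Boris describes.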
Understood. However, wouldn't you still have to deal with the fact that
non-conforming zip implementations are used to create the widgets in the
first place?
That's a good question, actually. I'm not sure I have enough of a grasp
of the issue to tell you what this would mean for a widget UA in
practice...
Do we have any data to support this supposition? That's certainly how
things work with web pages, and in small market segments like Western
Europe there are multiple encodings in common use (ISO-8859-1 and
UTF-8).
No, not directly. I only have anecdotal evidence: a podcast from the Harvard
Business Review about globalization and the internet, though I don't have a
link to it. In that podcast, research was presented indicating that only 15%
of internet traffic actually crosses a country's borders, and that this share
is decreasing. That would mean 85% or more of all communication is, in
theory, carried out in the same language and, by extension, the same
character encoding.
Unfortunately, the language to character encoding mapping is not
one-to-one... See above about Western Europe.
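A small illustration of why language does not pin down the encoding: the same French word produces different bytes under the two encodings mentioned for Western Europe, and decoding with the wrong guess either garbles the text or fails. The word and variable names are just examples.

```python
# The same French word under ISO-8859-1 and UTF-8.
word = "caf\u00e9"  # "café", written with an escape so the source file encoding doesn't matter

latin1_bytes = word.encode("iso-8859-1")  # b'caf\xe9': one byte for the accented letter
utf8_bytes = word.encode("utf-8")         # b'caf\xc3\xa9': two bytes for the accented letter

# Guessing wrong in one direction garbles the text ("cafÃ©")...
misread_as_latin1 = utf8_bytes.decode("iso-8859-1")
# ...and in the other it is not valid UTF-8 at all (here replaced with U+FFFD).
misread_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
```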
I reached similar conclusions through my own testing/research [1]. Note that
on Mac it is apparently some proprietary variant of UTF-8 in fully
decomposed canonical form. I'm not sure what different flavors of Linux use
Nowadays UTF-8 for the most part, at least for new data being created.
but again: things seem bad on the file name encoding front. In essence, you
can't share Zip files across operating systems if they contain file names
with characters outside the ASCII range.
This seems like a problem to me...
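To show one concrete way the Mac behavior bites: a name typed with a precomposed "é" and the same name stored in fully decomposed form (roughly NFD, "e" plus a combining accent) are different code point sequences and different UTF-8 bytes, so a UA that matches Zip entry names byte-for-byte would not find the file. Normalizing both sides to NFC before comparing, as the hypothetical helper same_entry_name does, is just one possible policy and an assumption on my part, not something anyone in this thread proposed.

```python
import unicodedata


def same_entry_name(a: str, b: str) -> bool:
    """Compare two file names after normalizing both to NFC (one possible policy)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "r\u00e9sum\u00e9.html"                   # each "é" is one code point (NFC)
decomposed = unicodedata.normalize("NFD", composed)  # each "é" is "e" + U+0301 COMBINING ACUTE ACCENT

print(composed == decomposed)                                          # False
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 13 15
print(same_entry_name(composed, decomposed))                           # True
```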
By "reality" I meant the reality of zip implementations, i.e., no
respect for encodings.
OK.
MHTML *may* be technically superior and architecturally better, but
there is more tool support for Zip than MHTML. AFAIK, MHTML packaging tools
do not ship with any operating system. Zipping tools do.
Quite true. At the same time, we're discussing the fact that once you
want non-ASCII filenames the zip tools hinder more than help, right?
I don't have any statistics, but I assume Zip is used around the world; I
mean, the fact that it is a standard tool on every OS has to mean something
significant.
True.
Also, Mozilla uses it to ship add-ons, right? What problems, if any,
have you guys experienced wrt zip in internationalized contexts?
Sort of. We use JAR, not ZIP. Any JAR file is a ZIP file, but not vice
versa. In particular, the JAR spec [1] defines that all non-ASCII bytes
are UTF-8.
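For comparison with the JAR rule, plain Zip has a per-entry signal: general purpose bit 11 (the "language encoding flag" in PKWARE's APPNOTE) says the name is UTF-8; without it, APPNOTE's historical default is IBM Code Page 437. The sketch below shows how a reader might decode a raw entry name under that rule, and checks what CPython's zipfile writes for an ASCII and a non-ASCII name. The helper names are hypothetical, and this only helps when the creating tool actually sets the flag; a tool that writes bytes in the local code page without it still produces names a reader cannot decode reliably.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11, the "language encoding flag" in APPNOTE


def decode_entry_name(raw: bytes, flag_bits: int) -> str:
    """Decode a raw Zip entry name per the APPNOTE flag rule."""
    if flag_bits & UTF8_NAME_FLAG:
        return raw.decode("utf-8")
    # Without the flag, APPNOTE's historical default is IBM Code Page 437.
    return raw.decode("cp437")


def flag_bits_written_for(name: str) -> int:
    """Write one entry with Python's zipfile and report the flag bits read back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"<!doctype html>")
    with zipfile.ZipFile(buf) as zf:
        return zf.infolist()[0].flag_bits


raw = "caf\u00e9.html".encode("utf-8")
with_flag = decode_entry_name(raw, UTF8_NAME_FLAG)  # "café.html"
without_flag = decode_entry_name(raw, 0)            # mojibake: UTF-8 bytes read as CP437
```

CPython sets bit 11 on its own when a name is not pure ASCII, so flag_bits_written_for("caf\u00e9.html") has the flag and flag_bits_written_for("index.html") does not.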
Again, I'm not sure how to proceed.
That really depends on how much you care about allowing any ZIP
implementation to be used for creating widgets vs how much you care
about internationalization issues that might arise as a result...
"In result, excluding any U+0020 SPACE characters, convert any sequence of
one or more characters marked with the [Unicode] property "White_Space" into
a single U+0020 SPACE."
The next step collapses sequences of two or more U+0020 SPACE into a single
U+0020 SPACE.
Sounds great.
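The two quoted steps can be sketched directly. The White_Space set below is the one listed in Unicode's PropList.txt; it is hard-coded because Python's str.isspace() is a different set (for example, it also accepts U+001C..U+001F, which are not White_Space). The function name is hypothetical, and like the quoted text, the sketch does not trim leading or trailing spaces.

```python
import re

# Code points with the Unicode White_Space property (PropList.txt).
WHITE_SPACE = (
    "\u0009\u000A\u000B\u000C\u000D\u0020\u0085\u00A0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A"
    "\u2028\u2029\u202F\u205F\u3000"
)
# None of these characters is special inside a regex character class.
NON_SPACE_WS_RUN = re.compile("[" + WHITE_SPACE.replace(" ", "") + "]+")
SPACE_RUN = re.compile(" {2,}")


def collapse_white_space(text: str) -> str:
    # Step 1: each run of White_Space characters other than U+0020 becomes one U+0020.
    text = NON_SPACE_WS_RUN.sub(" ", text)
    # Step 2: each run of two or more U+0020 becomes a single U+0020.
    return SPACE_RUN.sub(" ", text)
```

For example, "a\t\n b\u3000\u3000c" becomes "a b c", while U+200B ZERO WIDTH SPACE is left alone because it is not White_Space. Taken together, the two steps amount to replacing every maximal run of White_Space characters (spaces included) with one U+0020; keeping them separate just mirrors the spec wording.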
-Boris