Re: Request for Comments: Last Call WD of Widgets 1.0: Packaging & Configuration spec; deadline 31 Jan 2009

Boris Zbarsky Wed, 28 Jan 2009 07:10:39 -0800


Marcos Caceres wrote:

Ok, that sounds like a completely reasonable proposal. And you are right, I
had thought about this in totally the wrong way. I did as you suggested:
  * widget engines may now support SVG 1.1.
  * authors, however, should try to conform to SVG Tiny 1.2.
  * conformance checkers should warn authors when their icons don't conform
    to SVG Tiny 1.2.

Note that SVG Tiny 1.2 is not a subset of SVG 1.1, by the way... I'm
not sure whether that should affect this section; just pointing it out.

I think it makes more sense to just allow widget engines to implement
whatever SVG version they want (as in, place no restrictions on it, past
the fact that .svg files should be processed per the image/svg+xml MIME
type registration).

Correct. So what is wrong with limiting sniffing to the table in the spec?


Nothing.  In fact it's highly desirable.

Or to the content-sniffing internet draft I pointed you to earlier?... I'm
not sure I'm understanding what you want me to specify here.

I was just pointing out that current implementations of something like
widgets which don't use a MIME manifest or some such use an alternate
system (aggressive extension sniffing) that we don't want to use here.

Understood. However, wouldn't you have to deal with the fact that
non-conforming zip implementations are used to create the widgets in the
first place?

That's a good question, actually. I'm not sure I have enough of a grasp
of the issue to tell you what this would mean for a widget UA in
practice....

Do we have any data to support this supposition?  That's certainly how
things work with web pages, and in small market segments like Western
Europe there are multiple encodings in common use (ISO-8859-1 and UTF-8).


No, not directly. I only have anecdotal evidence: a podcast from the Harvard
Business Review about globalization and the internet, but I don't have a
pointer. In that podcast, some research was presented that indicated that
only 15% of internet traffic actually leaves the boundaries of a country, and
that share is decreasing. That means that 85% or more of all communication
would, in theory, be done using the same language and, by extension, the same
character encoding.

Unfortunately, the language to character encoding mapping is not
one-to-one... See above about Western Europe.

I reached similar conclusions through my own testing/research [1]. Note that
on Mac it is apparently some proprietary variant of UTF-8 in fully
decomposed canonical form. I'm not sure what different flavors of Linux use


Nowadays UTF-8 for the most part, at least for new data being created.

but again: things seem bad on the file name encoding front. In essence, you
can't share Zip files across OS if they contain characters outside the ASCII
range.


This seems like a problem to me...
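
To make the problem concrete, here is a small Python sketch of how one
file name turns into different byte sequences depending on the encoding
and normalization form a zip tool happens to use. The name "café.png" is
a made-up example, and the three variants are only meant to stand in for
the cases discussed above (ISO-8859-1, UTF-8, and decomposed UTF-8):

```python
import unicodedata

name = "caf\u00e9.png"  # hypothetical file name, precomposed (NFC) form

# The same name as three different tools might store it in an archive.
latin1_bytes = name.encode("iso-8859-1")
utf8_nfc_bytes = name.encode("utf-8")
utf8_nfd_bytes = unicodedata.normalize("NFD", name).encode("utf-8")

for label, raw in [
    ("ISO-8859-1", latin1_bytes),
    ("UTF-8 (NFC)", utf8_nfc_bytes),
    ("UTF-8 (NFD)", utf8_nfd_bytes),
]:
    print(f"{label:12} {raw!r}")

# A reader that guesses the wrong encoding sees a different name.
print(utf8_nfc_bytes.decode("iso-8859-1"))
```

A UA that compares names byte-for-byte would treat all three as
different files, even though a user would call them the same name.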

By "reality" I meant the reality about zip implementations - i.e., no
respect for encodings.

OK.

MHTML *may* be technically superior and architecturally better, but
there is more tool support for Zip than MHTML. AFAIK, MHTML packaging tools
do not ship with any operating system. Zipping tools do.

Quite true. At the same time, we're discussing the fact that once you
want non-ASCII filenames the zip tools hinder more than help, right?

I don't have any statistics, but I assume Zip is used around the world - I
mean the fact that it is a standard tool on all OS has to mean something
significant.


True.

Also, Mozilla uses it to ship add-ons, right? What, if any,
problems have you guys experienced wrt zip in internationalized contexts?

Sort of. We use JAR, not ZIP. Any JAR file is a ZIP file, but not vice
versa. In particular, the JAR spec [1] defines that all non-ASCII bytes
are UTF-8.
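
As a rough illustration of what a UA could check (this is not something
the JAR spec or the widgets draft mandates): the ZIP format has a general
purpose flag, bit 11, that marks an entry name as UTF-8. The Python
sketch below reads that flag for each entry. The helper name
names_flagged_utf8 and the entry names are made up for the example.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # bit 11 of the ZIP general purpose bit flag

def names_flagged_utf8(zip_bytes):
    """Map each entry name to whether its UTF-8 name flag is set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
            for info in zf.infolist()
        }

# Build a tiny archive in memory; both entry names are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("icon.png", b"")
    zf.writestr("ic\u00f4ne.png", b"")

print(names_flagged_utf8(buf.getvalue()))
```

Python's zipfile module sets the flag itself when it writes a non-ASCII
name; archives produced by other tools may leave it unset, in which case
the encoding of the name is anyone's guess.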

Again, I'm not sure how to proceed.

That really depends on how much you care about allowing any ZIP
implementation to be used for creating widgets vs how much you care
about internationalization issues that might arise as a result...

"In result, excluding any U+0020 SPACE characters, convert any sequence of
one or more characters marked with the [Unicode] property "White_Space" into
a single U+0020 SPACE."

The next step collapses sequences of two or more U+0020 SPACE into a single
U+0020 SPACE.


Sounds great.
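
For what it's worth, the two steps quoted above could be sketched in
Python roughly like this. The function name normalize_whitespace is made
up, and the White_Space code points are listed by hand as an assumption
on my part, since Python's own notion of whitespace (\s, str.isspace)
does not match the Unicode property exactly:

```python
import re

# Code points with the Unicode White_Space property, minus U+0020 itself.
_WS_NOT_SPACE = (
    "\u0009-\u000d\u0085\u00a0\u1680\u2000-\u200a"
    "\u2028\u2029\u202f\u205f\u3000"
)
_WS_RUN = re.compile(f"[{_WS_NOT_SPACE}]+")
_SPACE_RUN = re.compile(" {2,}")

def normalize_whitespace(result):
    # Step 1: each run of non-U+0020 white space becomes one U+0020 SPACE.
    result = _WS_RUN.sub(" ", result)
    # Step 2: each run of two or more U+0020 SPACE becomes one U+0020 SPACE.
    return _SPACE_RUN.sub(" ", result)

print(repr(normalize_whitespace("a\t\n b\u00a0\u00a0c  d")))
```

Doing it in two passes mirrors the wording of the draft; a single pass
over all White_Space characters would give the same result.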

-Boris
