Re: Cost of ICU data

2013-10-22 Thread wsel...@mozilla.com
Hi,

Bill from user research here. 

We just finished some research in Thailand and Indonesia where we conducted ~40 
interviews with desktop browser users (half of whom were Firefox users). We'll 
be presenting findings from the research next month, but I'd like to share a 
few observations from the field that give a clear picture of the Internet 
infrastructure in emerging markets in SE Asia.

Indonesia is worth focusing on for the discussion because it has a large 
population and Firefox has a large market share there. The infrastructure is 
similar to that of India, which has an even larger population.

Some context:

First, the connection speeds are really, really slow and stability is poor. 
Only 3% of the population in Indonesia has wired home connections. Everyone 
else connects at wifi hotspots, at internet cafes, or over the 3G network. 
Even these connections are slow. Average connection speed is 3 Mbps compared to 
20 Mbps in the US. An example to give some context: most people are not able to 
stream video from YouTube. They install add-ons (such as IDM) to download the 
videos to watch them later.

Second, most users buy their computers from local vendors, not chain stores. 
The local vendors preinstall software on the computers including Firefox (and 
Chrome). Many of these versions of Firefox are older. We saw versions 12, 15, 
18. Some of these have add-ons preinstalled (such as Yahoo, etc.). Others are 
configured to prevent updates.

There is a high correlation between download speed and being up-to-date with 
Firefox. We know from metrics data that ~50% of users in Indonesia are using 
versions other than the current version of Firefox. Only the wealthiest of our 
participants had the most current version of Firefox. Our lower-income 
participants who were connecting to the Internet had older versions and add-ons 
that were hijacking search and the user experience in general. 

The key point is that download size is very important in these markets. Also, 
it is important for us to think about two related topics: 
1) How to get people in these markets to current versions of Firefox?
2) If downloading is not currently the most effective distribution model in 
emerging markets, how can we think of alternatives or make downloading work?

One final point: we have observed that in rural parts of N. America, 
connection speeds and stability are similar. So, it's not only an emerging vs. 
emerged markets challenge.

Please let me know if you have more specific or follow-up questions. I'd love 
to share what we learned.

Thanks!
Bill 

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Cost of ICU data

2013-10-22 Thread Benjamin Smedberg

On 10/15/2013 12:06 PM, Benjamin Smedberg wrote:
With the landing of bug 853301, we are now shipping ICU in desktop 
Firefox builds. This costs us about 10% in both download and on-disk 
footprint

I'm going to try and summarize the discussion and indicate next steps.

==

First, I want to be clear that I am approaching this question for 
desktop Firefox only. B2G has different adoption requirements and a 
different localization strategy than desktop.


==

Several people thought that it would be a bad thing to include only one 
or some subset of language data in Firefox as shipped by default, on the 
grounds that there should be one web platform. Anne brought up the use 
case of a computer in a hotel/hostel.


Axel points out that ICU locales are more analogous to language 
dictionaries. Users can choose to install dictionaries independent of 
the UI locale of Firefox.


Jeff is worried that any sort of dynamic system would cause the Intl.* 
methods to return different results over time, which is surprising to 
platform developers.


Jeff also said that Chromium may not be shipping the full language list, 
but perhaps only a subset of languages. I tested Chrome's behavior, and 
it appears to be shipping a fairly full set of language data, including 
languages such as Amharic which I'm pretty sure it doesn't ship.


I'll also mention that this came up in the previous discussion last 
December, and at the time we discussed whether it would be better for 
websites to provide their own implementation of these intl functions and 
download whatever data they needed; the obvious disadvantage of this is 
that each site would be downloading the data separately without sharing, 
which is not a good experience for developers.


==

jwatt mentioned that he has a dependency on DecimalFormat for parsing 
numbers from input type=number. What locale data does this actually 
require?


==

mbrubeck wonders why this particular feature is being questioned based 
on its size, when in general the Firefox package size has gotten larger 
with other features but without a lot of fuss.


I am questioning this feature now because it is a sizeable jump even by 
historical standards, and because I was made aware of data that shows 
that download size affects both initial adoption and update rates. 
Perhaps we have been adding features to the platform too liberally and 
affecting adoption. Perhaps we need to set an absolute cap on download 
size, and figure out how to work within that cap. I don't really know 
the answers, but we should all be worried about our adoption and market 
share numbers; death by a fairly small set of 10% increases is still a 
big deal.


==

There was a technical discussion about how we could implement dynamic 
download of more languages, and whether the spec made that easy or hard. 
It is clear that the current spec is synchronous and doesn't have a way 
to request additional languages and wait for them. We could do the 
download and start showing results later, but we can't really block on 
that data.


My only other thought here is whether we should propose for the Intl 
draft an additional async API to request new languages and get a promise 
back for when they are ready.
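
To make the async idea concrete, here is a minimal TypeScript sketch of what
such an API could look like. It is only an illustration under stated
assumptions: `requestLocales`, `downloadLocaleData`, and the `bundled` set are
hypothetical names, not part of the Intl draft or of any Firefox code, and the
download is simulated rather than fetched over the network.

```typescript
// Hypothetical sketch: nothing here is part of the Intl draft or of Firefox.
type LocaleTag = string;

// Locales whose data is assumed to ship with the build (illustrative value).
const bundled = new Set<LocaleTag>(["en-US"]);

// In-flight or completed downloads, so each locale is fetched at most once.
const pending = new Map<LocaleTag, Promise<void>>();

// Stand-in for a real network fetch; it completes asynchronously, on a later
// microtask, and then marks the locale as available.
function downloadLocaleData(locale: LocaleTag): Promise<void> {
  return Promise.resolve().then(() => {
    bundled.add(locale);
  });
}

// Proposed shape: ask for locales and get back a promise that resolves once
// all of them are usable by the existing synchronous Intl constructors.
function requestLocales(locales: LocaleTag[]): Promise<LocaleTag[]> {
  const waits = locales.map((locale): Promise<void> => {
    if (bundled.has(locale)) {
      return Promise.resolve();
    }
    let download = pending.get(locale);
    if (download === undefined) {
      download = downloadLocaleData(locale);
      pending.set(locale, download);
    }
    return download;
  });
  return Promise.all(waits).then(() => locales);
}
```

A page would call `requestLocales(["th"])` and construct its Intl objects only
after the promise resolves, so the synchronous constructors never have to block
on a download.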


==

cpeterson asked whether we have funnelcake data to actually measure the 
effects of additional download weight.


I had been pushing this with our stats/UR group, and this is now filed 
as bug 928017. I don't have commitments to make this happen yet, but I'm 
working on it.


==

Bill provided more details about the user research data about connection 
speeds and update rates. The summary seems to be that update rates are 
much lower for users with slower connection speeds.


==

I don't think that there is enough data yet to make a decision. 
Hopefully funnelcake results will help make a more informed choice. If 
it turns out that Firefox wants this decision reconsidered, what 
groups and goals would be affected by asking for ICU or at least the 
number-format and date-format APIs to be disabled for download weight 
reasons in 27?


To be clear, the final decision is definitely not mine to make: I just 
want to make sure that we know what we're trading off and that it's 
clearly what we ought to be doing.


--BDS



Re: Cost of ICU data

2013-10-22 Thread Chris Peterson

On 10/22/13 11:34 AM, wsel...@mozilla.com wrote:

The key point is that download size is very important in these markets. Also, 
it is important for us to think about two related topics:
1) How to get people in these markets to current versions of Firefox?
2) If downloading is not currently the most effective distribution model in 
emerging markets, how can we think of alternatives or make downloading work?


AOL had good success with CDs. :)

I'm only half-joking! CDs are cheap and we have enthusiastic Mozilla 
Reps in many countries. We could make official ISO images and CD art 
designs available on our website.


We could provide or subsidize blank CDs, CD burners, and CD stickers to 
official Mozilla Reps. They could help update people's browsers. The CD 
art could include slogans like "After you install Firefox, pass this CD 
along to your friends."



cpeterson


Re: Cost of ICU data

2013-10-22 Thread Chris Peterson

On 10/22/13, 2:09 PM, wsel...@gmail.com wrote:

One suggestion that our team came up with is to provide Firefox-branded USB 
keys and distribute them through reps, chains like KFC and 7-11, and local 
computer vendors where people connect online. These would have installers for 
the latest version of Firefox and FF for Android.


I like the USB thumb drive idea because they are reusable, but (I 
assume) they are more expensive than CDs. But maybe USB thumb drives are 
cheaper because you don't need to buy a CD burner or blank CDs.



chris


Re: Cost of ICU data

2013-10-22 Thread Ehsan Akhgari

On 2013-10-22 4:06 PM, Benjamin Smedberg wrote:

I don't think that there is enough data yet to make a decision.
Hopefully funnelcake results will help make a more informed choice. If
it turns out that Firefox wants this decision reconsidered, what
groups and goals would be affected by asking for ICU or at least the
number-format and date-format APIs to be disabled for download weight
reasons in 27?


More and more people in libxul want access to ICU.  See the dependency 
list for bug 915735 for a (partial) list.


Cheers,
Ehsan



Re: Cost of ICU data

2013-10-22 Thread Benjamin Smedberg

On 10/22/2013 6:19 PM, Ehsan Akhgari wrote:

On 2013-10-22 4:06 PM, Benjamin Smedberg wrote:

I don't think that there is enough data yet to make a decision.
Hopefully funnelcake results will help make a more informed choice. If
it turns out that Firefox wants this decision reconsidered, what
groups and goals would be affected by asking for ICU or at least the
number-format and date-format APIs to be disabled for download weight
reasons in 27?


More and more people in libxul want access to ICU.  See the dependency 
list for bug 915735 for a (partial) list.

I'm aware of that, but I'm not clear on whether any of those features 
require the language data we're talking about here, and whether having 
the single UI locale or all locales would be necessary. I know that the 
indexeddb use-case only requires the collation tables and not any of the 
locale data.


--BDS



Re: Cost of ICU data

2013-10-22 Thread Ehsan Akhgari

On 2013-10-22 6:34 PM, Benjamin Smedberg wrote:

On 10/22/2013 6:19 PM, Ehsan Akhgari wrote:

On 2013-10-22 4:06 PM, Benjamin Smedberg wrote:

I don't think that there is enough data yet to make a decision.
Hopefully funnelcake results will help make a more informed choice. If
it turns out that Firefox wants this decision reconsidered, what
groups and goals would be affected by asking for ICU or at least the
number-format and date-format APIs to be disabled for download weight
reasons in 27?


More and more people in libxul want access to ICU.  See the dependency
list for bug 915735 for a (partial) list.

I'm aware of that, but I'm not clear on whether any of those features
require the language data we're talking about here, and whether having
the single UI locale or all locales would be necessary. I know that the
indexeddb use-case only requires the collation tables and not any of the
locale data.


Yes, that's correct.  Simon and Jonathan can probably clarify which 
parts of ICU they're hoping to use.


Cheers,
Ehsan


Re: Cost of ICU data

2013-10-18 Thread Chris Peterson

On 10/17/13 11:43 AM, Matt Brubeck wrote:

For this reason, I'm a bit confused at the level of scrutiny of ICU's
size when we've added many times that amount to our download size over
the past couple of years without any pushback or even discussion.


Do we have Funnelcake data comparing download size vs successful 
installations for 2013? If we don't know how big is too big, blocking 
ICU seems premature (but still worth investigation).


Download size is a concern for users in developing countries, but the 
same users will benefit from ICU.



chris


Re: Cost of ICU data

2013-10-18 Thread Chris Peterson

On 10/18/13 4:06 PM, Chris Peterson wrote:

On 10/17/13 11:43 AM, Matt Brubeck wrote:

For this reason, I'm a bit confused at the level of scrutiny of ICU's
size when we've added many times that amount to our download size over
the past couple of years without any pushback or even discussion.


Do we have Funnelcake data comparing download size vs successful
installations for 2013? If we don't know how big is too big, blocking
ICU seems premature (but still worth investigation).

Download size is a concern for users in developing countries, but the
same users will benefit from ICU.


Also, if the ICU data does push us over the download size limit, then we 
may be able to decrease download size and/or improve download success 
through other means.



chris


Re: Cost of ICU data

2013-10-17 Thread Axel Hecht

On 10/17/13 12:02 PM, Gervase Markham wrote:

On 16/10/13 16:02, Axel Hecht wrote:

We'll need to go down a path that works for Firefox OS.


With Firefox OS, we don't have the download-size issue, do we? So we can
ship all the data.

Gerv



We have issues with disk space, currently. We're already in the 
situation where all our keyboard data doesn't fit on quite a few of the 
devices out there.


Also, FOTA size matters a bit, though that's probably less of a problem.

Axel


Re: Cost of ICU data

2013-10-17 Thread Axel Hecht

On 10/16/13 5:39 PM, Jeff Walden wrote:

On 10/16/2013 02:10 PM, Axel Hecht wrote:

I wonder how far we can get by doing something along the lines we use for 
webfonts, starting to do the best we can with the data we already have, and 
improve once the perfect data is local.

Having the Intl.Foo algorithms returning different data over time is, IMO, even 
worse than deciding that certain locales are less important than others.  Aside 
from Math.random, of course, I can't think of anything in JS that has different 
behavior on the same inputs over time.

Jeff

For one, I don't think that's true for the web.  You might think so in 
terms of stuff in the js specs, but the distinction between that and 
html5 and all kinds of server errors and timing differences is just theory.


More importantly, the impact of supporting a finite set of languages can 
easily be the nail in the coffin for the others. I don't think that's 
what mozilla stands for.


Axel


Re: Cost of ICU data

2013-10-17 Thread Dao

On 16.10.2013 17:02, Axel Hecht wrote:

We'll need to go down a path that works for Firefox OS.


[...]


But, yes, I think we'll need a hosted service to provide that data on
demand in the end.


This sounds like a non-starter for mobile devices, doesn't it?


Re: Cost of ICU data

2013-10-17 Thread Axel Hecht

On 10/17/13 2:41 PM, Dao wrote:

On 16.10.2013 17:02, Axel Hecht wrote:

We'll need to go down a path that works for Firefox OS.


[...]


But, yes, I think we'll need a hosted service to provide that data on
demand in the end.


This sounds like a non-starter for mobile devices, doesn't it?


Well, it makes the implementation trickier.

Of course, Telefonica just updated the phones from 1.0.1 to 1.1 in 
Spain, over the air without charges, so the infrastructure is there.


It's an organizational effort to tie into that infrastructure. We'll 
need a reference implementation like we have with software update, and 
then get our partner contacts in shape to explain how to do that on 
their side. Plus customizable hooks, of course.


And then, yes, we'd need to still disable the downloads, or make them 
really optional, if you're on roaming data or something. But software 
update can do that already, too, I suspect.


Axel


Re: Cost of ICU data

2013-10-17 Thread Brian Smith
On Thu, Oct 17, 2013 at 3:46 AM, Axel Hecht l...@mozilla.com wrote:
 We have issues with disk space, currently. We're already in the situation
 where all our keyboard data doesn't fit on quite a few of the devices out
 there.

Where can one read more about this? This ICU data is not *that* huge.
If we can't afford a couple of megabytes now on B2G then it seems like
we're in for severe problems soon. Isn't Gecko alone growing by
megabytes per year?

Cheers,
Brian
-- 
Mozilla Networking/Crypto/Security (Necko/NSS/PSM)


Re: Cost of ICU data

2013-10-17 Thread Axel Hecht

On 10/17/13 3:41 PM, Brian Smith wrote:

On Thu, Oct 17, 2013 at 3:46 AM, Axel Hecht l...@mozilla.com wrote:

We have issues with disk space, currently. We're already in the situation
where all our keyboard data doesn't fit on quite a few of the devices out
there.


Where can one read more about this? This ICU data is not *that* huge.
If we can't afford a couple of megabytes now on B2G then it seems like
we're in for severe problems soon. Isn't Gecko alone growing by
megabytes per year?


I wish there were docs and clear cuts. We've been in dire problems 
already, when our QA smoketest phones wouldn't get updates for days due 
to system.img being too large. And thus we didn't get QA to run tests.


These are the questions I asked last time, and don't have answers to:

- What exactly are the limiting sizes?
-- image size (per bootloader?)
-- disk partition size
--- at which point in time? user dependent?
--- can we have telemetry for this, if so?

I suspect we're talking about the joint size for gaia and gecko, but I'm 
not sure that's the case, or at least always the case. I.e., do we get a 
cookie if we move data from gaia into gecko?


There's probably more that I don't know, just because I don't know much 
about phones, and the various processes to get software on to them.


Axel


Re: Cost of ICU data

2013-10-17 Thread Matt Brubeck

On 10/17/2013 10:24 AM, Ehsan Akhgari wrote:

We used to have codesighs measurements (and perhaps still do) but
historically many people just ignored them.


We stopped collecting codesighs measurements in November 2012 (bug 
803736).  As Ehsan says, it was widely ignored.  It regressed 
constantly, and it never seemed reasonable to demand that people 
implement desired features and fixes without adding any code.


For this reason, I'm a bit confused at the level of scrutiny of ICU's 
size when we've added many times that amount to our download size over 
the past couple of years without any pushback or even discussion.


(On a related note, what happened to http://www.arewesmallyet.com/?)


Re: Cost of ICU data

2013-10-16 Thread Axel Hecht

Jumping in late, so top posting.

I think being able to load language data dynamically is a good idea. I 
don't see a reason why this should be tied in to a language pack, 
though. The other way around is a different question, i.e.:


language data doesn't include UI localization
UI localization should include language data

We have several multi-language products by now, those should work, in 
particular Firefox OS. We're doing quite a few things there that already 
duplicate language data. Much of that is in /shared, which isn't shared, 
but copied to many apps. Having that data inside gecko would actually 
get it to be shared.


I think much of the ICU data (which is technically mostly CLDR data packed 
in ICU) follows lines similar to our hyphenation dictionaries. 
The web should just work, independent of which UI locale you're using.


I wonder how far we can get by doing something along the lines we use 
for webfonts, starting to do the best we can with the data we already 
have, and improve once the perfect data is local. I'm personally OK if 
this is a notification bar to reload, even.
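
A minimal sketch of the webfont-style approach, in TypeScript, assuming a
hypothetical table of month names stands in for real locale data. The names
`monthName`, `onLocaleDataArrived`, `loaded`, and `wanted` are illustrative
and not any Gecko API. The lookup never blocks: it answers from whatever data
is local and notes which locale still needs fetching.

```typescript
// Hypothetical sketch of "best effort now, better data later".
type MonthNames = string[];

// Fallback data assumed to ship in every build (illustrative, truncated).
const fallbackMonths: MonthNames = ["January", "February", "March"];

// Locale data that is currently available locally.
const loaded = new Map<string, MonthNames>();

// Locales that were asked for but are not local yet; a background task
// (not shown) would download these.
const wanted = new Set<string>();

// Synchronous lookup that never waits on the network.
function monthName(locale: string, monthIndex: number): string {
  const data = loaded.get(locale);
  if (data === undefined) {
    wanted.add(locale); // remember to fetch it later
  }
  const names = data === undefined ? fallbackMonths : data;
  const found = names[monthIndex];
  // Out-of-range index: fall back to the month number as text.
  return found === undefined ? String(monthIndex + 1) : found;
}

// Called when a background download finishes; later lookups use the new data.
function onLocaleDataArrived(locale: string, data: MonthNames): void {
  loaded.set(locale, data);
  wanted.delete(locale);
}
```

With this sketch, `monthName("de", 0)` returns "January" until
`onLocaleDataArrived("de", ["Januar", "Februar", "März"])` has run, and
"Januar" afterwards. The same call giving different answers over time is the
trade-off this approach accepts.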


Axel

PS: ICU is driven by the JS globalization API. That API was driven by MS and 
Google to get the data into their HTML app platforms. For Mozilla, IMHO, 
the driver for the g11n API should be Firefox OS: we're struggling to work 
around the lack of data for sorting, timezones, language data all around.


On 10/15/13 6:06 PM, Benjamin Smedberg wrote:

With the landing of bug 853301, we are now shipping ICU in desktop
Firefox builds. This costs us about 10% in both download and on-disk
footprint: see https://bugzilla.mozilla.org/show_bug.cgi?id=853301#c2.
After a discussion with Waldo, I'm going to post some details here about
how much this costs in terms of disk footprint, to discuss whether there
are things we can remove from this footprint, and whether the footprint
is actually worth the cost. This is particularly important because our
user research team has identified Firefox download weight as an
important factor affecting Firefox adoption and update rates in some
markets.

On-disk, ICU data breaks into the following categories:

* collation tables - 3.3MB

These are rules for sorting strings in multiple languages and
situations. See http://userguide.icu-project.org/collation for basic
background. These tables are necessary for implementing Intl.Collator.

The Intl.Collator API has methods to expose a subset of languages. It is
not clear from my reading of the specification whether it is expected
that browsers will normally ship with the full set of languages or only
the subset of the browser locale.

* currency tables - 1.9 MB

These are primarily the localized name of each currency in each
language. This is used by the Intl.NumberFormat API to format
international currencies.

* timezone tables - 1.7MB

Primarily the name of every time zone in each language. This data is
necessary for implementing Intl.DateTimeFormat.

* language data - 2.1 MB

This is a bunch of other data associated with displaying information for
a particular language: number formatting in various long and short
formats, calendar formats and names for the various world calendar systems.
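
As a point of reference for the Intl.Collator paragraph above, the API lets a
page ask which of its preferred locales the runtime has collation data for.
The TypeScript sketch below uses only `Intl.Collator.supportedLocalesOf` and
the collator's `compare` function; the locale tags are arbitrary examples, and
the result depends on which data the browser or runtime actually ships.

```typescript
// Locales a page might care about (arbitrary examples).
const requested: string[] = ["en-US", "th", "am"];

// Returns the subset of `requested` that the runtime can collate without
// falling back to its default locale. A build that ships only the UI
// locale's data would return a shorter list than one shipping everything.
const supported: string[] = Intl.Collator.supportedLocalesOf(requested);

// Sorting with an explicit locale. If "de" data is missing, the runtime
// silently falls back to its default locale instead of throwing.
const collator = new Intl.Collator("de");
const sorted: string[] = ["z", "ä", "a"].sort(collator.compare);
```

A site can therefore detect a reduced data set, but it cannot ask for more
data through this API.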

==

Do we need this data for any language other than the language Firefox
ships in? Can we just include the relevant language data in each
localized build of Firefox, and allow users to get other language data
via downloadable language packs, similarly to how dictionaries are handled?

Is it possible that some of this data (the collation tables?) should be
in all Firefox locales, but other data (currency and timezone names) is
not as important and we can ship it only in one language?

As far as I can tell, the spec allows user agents to ship whatever
languages they need; the real question is what users and site authors
actually need and expect out of the API. (I'm reading the spec out of
http://wiki.ecmascript.org/doku.php?id=globalization:specification_drafts)

I am still working to get better numbers to quantify the costs in terms
of lost adoption for additional download weight.

Also, we are currently duplicating the data tables on mac universal
builds, because they are compiled-in symbols. We should clearly use a
separate file for these tables to avoid unnecessary download/install
weight. This is now filed as bug 926980.

--BDS






Re: Cost of ICU data

2013-10-16 Thread Gervase Markham
On 15/10/13 17:06, Benjamin Smedberg wrote:
 With the landing of bug 853301, we are now shipping ICU in desktop
 Firefox builds. This costs us about 10% in both download and on-disk
 footprint: see https://bugzilla.mozilla.org/show_bug.cgi?id=853301#c2.
 After a discussion with Waldo, I'm going to post some details here about
 how much this costs in terms of disk footprint, to discuss whether there
 are things we can remove from this footprint, and whether the footprint
 is actually worth the cost. This is particularly important because our
 user research team has identified Firefox download weight as an
 important factor affecting Firefox adoption and update rates in some
 markets.

You have given on-disk footprint values, but surely download size values
are the important ones for the issue you are raising? After all, some of
this data may be very compressible, and some may not.

 * currency tables - 1.9 MB
 
 These are primarily the localized name of each currency in each
 language. This is used by the Intl.NumberFormat API to format
 international currencies.
 
 * timezone tables - 1.7MB
 
 Primarily the name of every time zone in each language. This data is
 necessary for implementing Intl.DateTimeFormat.

I wonder if we could do this as a webservice? That is, when the browser
is asked to render a timezone string or a currency string in a
particular language, it goes and grabs all the data for that language.
We could therefore have full support, but a one-off delay for each new
language the user wanted to see UI rendered in (which, for most people,
will be a very small set). We could ship a set of common ones plus the
UI language one to reduce still further the number of times the service
would get hit.

Gerv



Re: Cost of ICU data

2013-10-16 Thread Anne van Kesteren
On Wed, Oct 16, 2013 at 2:39 PM, Gervase Markham g...@mozilla.org wrote:
 I wonder if we could do this as a webservice? That is, when the browser
 is asked to render a timezone string or a currency string in a
 particular language, it goes and grabs all the data for that language.
 We could therefore have full support, but a one-off delay for each new
 language the user wanted to see UI rendered in (which, for most people,
 will be a very small set). We could ship a set of common ones plus the
 UI language one to reduce still further the number of times the service
 would get hit.

The API is synchronous so that seems like a bad idea.


-- 
http://annevankesteren.nl/


Re: Cost of ICU data

2013-10-16 Thread Gervase Markham
On 16/10/13 14:47, Anne van Kesteren wrote:
 The API is synchronous so that seems like a bad idea.

As in, it'll cause the tab to freeze (one time only, when a new language
is called for) while the file is downloading? OK, that's bad, but so is
having Firefox be a lot bigger...

Perhaps, as Brian suggested, we should be looking at using the Windows
APIs and/or system ICU for some of this data, even if there are some
things for which we want to ship our own implementation.

Gerv



Re: Cost of ICU data

2013-10-16 Thread Axel Hecht

On 10/16/13 3:50 PM, Gervase Markham wrote:

On 16/10/13 14:47, Anne van Kesteren wrote:

The API is synchronous so that seems like a bad idea.


As in, it'll cause the tab to freeze (one time only, when a new language
is called for) while the file is downloading? OK, that's bad, but so is
having Firefox be a lot bigger...

Perhaps, as Brian suggested, we should be looking at using the Windows
APIs and/or system ICU for some of this data, even if there are some
things for which we want to ship our own implementation.

Gerv



We'll need to go down a path that works for Firefox OS.

I think that being less-than-great the first time you hit something 
off the main track is OK. We should see what actually happens with 
what's in the g11n APIs now.


We'll likely also need a way to reclaim excessive disk usage, and to guard 
against DoS attacks that sneak in little fragments of language content for 
200 languages or somesuch.
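
One way to bound the disk usage is a size-capped cache that evicts the least
recently used locale. The TypeScript sketch below is hypothetical:
`LocaleDataCache` and its cap are illustrative names and numbers, not an
existing Gecko or Firefox OS component, and sizes are tracked as plain numbers
rather than real files.

```typescript
// Hypothetical sketch: a size-capped store for downloaded locale data that
// evicts the least recently used locale first.
class LocaleDataCache {
  // locale -> size in bytes; Map keeps insertion order, oldest first.
  private entries = new Map<string, number>();
  private used = 0;
  private capBytes: number;

  constructor(capBytes: number) {
    this.capBytes = capBytes;
  }

  // Record a use so the locale moves to the "most recent" end of the map.
  touch(locale: string): boolean {
    const size = this.entries.get(locale);
    if (size === undefined) {
      return false;
    }
    this.entries.delete(locale);
    this.entries.set(locale, size);
    return true;
  }

  // Store newly downloaded data, evicting old locales until it fits.
  // Data larger than the whole cap is refused outright.
  add(locale: string, sizeBytes: number): boolean {
    if (sizeBytes > this.capBytes) {
      return false;
    }
    const previous = this.entries.get(locale);
    if (previous !== undefined) {
      this.entries.delete(locale);
      this.used -= previous;
    }
    while (this.used + sizeBytes > this.capBytes) {
      const oldest = this.entries.keys().next().value as string;
      this.used -= this.entries.get(oldest) as number;
      this.entries.delete(oldest);
    }
    this.entries.set(locale, sizeBytes);
    this.used += sizeBytes;
    return true;
  }

  has(locale: string): boolean {
    return this.entries.has(locale);
  }

  get usedBytes(): number {
    return this.used;
  }
}
```

With a cap in place, a page that requests fragments for 200 languages can at
worst churn the cache; it cannot grow the on-disk footprint past the cap.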


But, yes, I think we'll need a hosted service to provide that data on 
demand in the end.


Axel


Re: Cost of ICU data

2013-10-16 Thread Jeff Walden
On 10/16/2013 12:45 AM, Karl Tomlinson wrote:
 When sync I/O is performed to read in-binary-object data, how is
 that better?
 
 Just readahead?

Readahead, it being part of the binary/libxul/whatever so already one coherent 
file to load, etc.  I'm not aware that you can reasonably expect adjacency 
guarantees from the OS if you use separate files.  But I could be mistaken 
about that.

Jeff


Re: Cost of ICU data

2013-10-16 Thread Chris Peterson

On 10/16/13 6:39 AM, Gervase Markham wrote:

You have given on-disk footprint values, but surely download size values
are the important ones for the issue you are raising? After all, some of
this data may be very compressible, and some may not.


Can we repackage the ICU data so we can compress it using a smarter 
content-aware algorithm? We could decompress the ICU data on first use.



chris


Re: Cost of ICU data

2013-10-16 Thread Benjamin Smedberg

On 10/16/2013 9:39 AM, Gervase Markham wrote:

On 15/10/13 17:06, Benjamin Smedberg wrote:

You have given on-disk footprint values, but surely download size values
are the important ones for the issue you are raising? After all, some of
this data may be very compressible, and some may not.

Correct. The download weight costs are listed in the bug, 
https://bugzilla.mozilla.org/show_bug.cgi?id=853301#c2


                                 with ICU  without   increase
MacOS X, 32+64 bit (dmg):        60.7 MB   54.7 MB   5.9 MB   10.8 %
Windows, 32 bit (installer.exe): 22.4 MB   20.5 MB   1.9 MB    9.2 %

I don't know whether there is a way to more optimally compress these in 
the installer.
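
To put these sizes in terms of transfer time, here is a rough TypeScript
calculation. It is a sketch under stated assumptions: it uses the 3 Mbps and
20 Mbps average speeds quoted in the user research elsewhere in this thread,
treats 1 MB as 8 megabits, and assumes the link sustains its average speed.
It ignores latency, retries, and dropped connections, so real downloads on
unstable links would take longer. `transferSeconds` is an illustrative helper,
not an existing tool.

```typescript
// Idealized transfer time in seconds for a payload given in megabytes over a
// link given in megabits per second (1 byte = 8 bits, no overhead).
function transferSeconds(sizeMB: number, speedMbps: number): number {
  return (sizeMB * 8) / speedMbps;
}

// Average speeds quoted in the user research (Mbps).
const INDONESIA_MBPS = 3;
const US_MBPS = 20;

// Sizes taken from the table above (MB).
const windowsInstallerMB = 22.4; // larger Windows installer figure
const windowsIcuCostMB = 1.9;    // Windows size difference
const macIcuCostMB = 5.9;        // Mac size difference

const fullWindowsSlow = transferSeconds(windowsInstallerMB, INDONESIA_MBPS).toFixed(1); // "59.7"
const icuWindowsSlow = transferSeconds(windowsIcuCostMB, INDONESIA_MBPS).toFixed(1);    // "5.1"
const icuMacSlow = transferSeconds(macIcuCostMB, INDONESIA_MBPS).toFixed(1);            // "15.7"
const icuWindowsFast = transferSeconds(windowsIcuCostMB, US_MBPS).toFixed(1);           // "0.8"
```

These are best-case figures only; they say nothing about how often a download
on a slow, unstable link fails outright, which is what the funnelcake data
would need to measure.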


--BDS




Re: Cost of ICU data

2013-10-16 Thread Gregory Szorc
Possible crazy idea: do we actively track and send tree management 
notices when package or binary size changes? This seems like something 
we'd want to cover under the "perf regressions get backed out or need 
approval" policy. It may also help identify build system regressions and 
compiler oddities where sections in the binaries change in size 
surprisingly.


On 10/15/13 9:06 AM, Benjamin Smedberg wrote:

With the landing of bug 853301, we are now shipping ICU in desktop
Firefox builds. This costs us about 10% in both download and on-disk
footprint: see https://bugzilla.mozilla.org/show_bug.cgi?id=853301#c2.
After a discussion with Waldo, I'm going to post some details here about
how much this costs in terms of disk footprint, to discuss whether there
are things we can remove from this footprint, and whether the footprint
is actually worth the cost. This is particularly important because our
user research team has identified Firefox download weight as an
important factor affecting Firefox adoption and update rates in some
markets.

On-disk, ICU data breaks into the following categories:

* collation tables - 3.3 MB

These are rules for sorting strings in multiple languages and
situations. See http://userguide.icu-project.org/collation for basic
background. These tables are necessary for implementing Intl.Collator.

The Intl.Collator API has methods to expose a subset of languages. It is
not clear from my reading of the specification whether it is expected
that browsers will normally ship with the full set of languages or only
the subset of the browser locale.

* currency tables - 1.9 MB

These are primarily the localized name of each currency in each
language. This is used by the Intl.NumberFormat API to format
international currencies.

* timezone tables - 1.7 MB

Primarily the name of every time zone in each language. This data is
necessary for implementing Intl.DateTimeFormat.

* language data - 2.1 MB

This is a bunch of other data associated with displaying information for
a particular language: number formatting in various long and short
formats, calendar formats and names for the various world calendar systems.
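
Adding up the four categories listed above gives the approximate on-disk total under discussion. This is only arithmetic on the figures in the message, which are themselves rounded.

```python
# On-disk ICU data sizes in MB, as listed in the message.
ICU_DATA_MB = {
    "collation tables": 3.3,
    "currency tables": 1.9,
    "timezone tables": 1.7,
    "language data": 2.1,
}

total_mb = sum(ICU_DATA_MB.values())

# Print each category with its share of the listed total, largest first.
for name, size in sorted(ICU_DATA_MB.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {size:.1f} MB ({100.0 * size / total_mb:.0f} % of listed data)")
print(f"total: {total_mb:.1f} MB")
```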

==

Do we need this data for any language other than the language Firefox
ships in? Can we just include the relevant language data in each
localized build of Firefox, and allow users to get other language data
via downloadable language packs, similarly to how dictionaries are handled?

Is it possible that some of this data (the collation tables?) should be
in all Firefox locales, but other data (currency and timezone names) is
not as important and we can ship it only in one language?

As far as I can tell, the spec allows user agents to ship whatever
languages they need; the real question is what users and site authors
actually need and expect out of the API. (I'm reading the spec out of
http://wiki.ecmascript.org/doku.php?id=globalization:specification_drafts)

I am still working to get better numbers to quantify the costs in terms
of lost adoption for additional download weight.

Also, we are currently duplicating the data tables on Mac universal
builds, because they are compiled-in symbols. We should clearly use a
separate file for these tables to avoid unnecessary download/install
weight. This is now filed as bug 926980.

--BDS



Re: Cost of ICU data

2013-10-16 Thread Ed Morley

On 16 October 2013 23:10:39, Gregory Szorc wrote:

Possible crazy idea: do we actively track and send tree management
notices when package or binary size changes?


Not at present as far as I know, though Tim Taubert created something 
temporary last year (no longer accessible, but perhaps worth following 
up with him):

https://groups.google.com/d/msg/mozilla.dev.apps.firefox/k7fzkhdt9io/n6jnbeFsIBMJ

Best wishes,

Ed


Re: Cost of ICU data

2013-10-15 Thread Jonathan Watt

On 15/10/2013 17:06, Benjamin Smedberg wrote:

I'm going to post some details here about
how much this costs in terms of disk footprint, to discuss whether there
are things we can remove from this footprint, and whether the footprint
is actually worth the cost.


As a heads up, I currently intend to use ICU's DecimalFormat (a subclass of 
NumberFormat) to parse numbers from strings as part of implementing input 
type=number.




Re: Cost of ICU data

2013-10-15 Thread Brian Smith
On Tue, Oct 15, 2013 at 9:06 AM, Benjamin Smedberg
benja...@smedbergs.us wrote:
 Do we need this data for any language other than the language Firefox ships
 in? Can we just include the relevant language data in each localized build
 of Firefox, and allow users to get other language data via downloadable
 language packs, similarly to how dictionaries are handled?

My understanding is that web content should not be able to tell which
locale the browser is configured to use, for privacy (fingerprinting)
reasons. If we went the route suggested above, it would be easy to
figure out, for many users, which locale they are using.

 I am still working to get better numbers to quantify the costs in terms of
 lost adoption for additional download weight.

My (naive) understanding is that Windows has its own API that does
what ICU does. I believe that Internet Explorer 11 is an existence
proof of that. If we used the Windows API on Windows, maybe we could
avoid building ICU altogether on Windows. Since Windows accounts for 90+%
of our users, that would almost make it "problem solved" all on its
own even if we did nothing else.

Cheers,
Brian
-- 
Mozilla Networking/Crypto/Security (Necko/NSS/PSM)


Re: Cost of ICU data

2013-10-15 Thread Jeff Walden
On 10/15/2013 06:06 PM, Benjamin Smedberg wrote:
 Do we need this data for any language other than the language Firefox ships 
 in? Can we just include the relevant language data in each localized build of 
 Firefox, and allow users to get other language data via downloadable language 
 packs, similarly to how dictionaries are handled?
 
 Is it possible that some of this data (the collation tables?) should be in 
 all Firefox locales, but other data (currency and timezone names) is not as 
 important and we can ship it only in one language?

It seems a fairly bad thing to me for us to get into the habit of prioritizing 
certain languages above others.

Technically, if the data is compiled into the code, this would mean language 
repacks would...not be repacks any more.  If you had sidealong data files 
everywhere, then you could perhaps repack still.  This might require some 
repacking adjustments, possibly.  ICU provides a udata_setCommonData function 
that lets you load data from anywhere, so there's some flexibility here.

It's worth noting we currently have no central hook to insert this call 
before ICU's ever used.  We init ICU at startup, but that init-call is fast.  
Presumably this new call can't be so fast, because you have to page in all the 
ICU data.  And if you can't delay that til ICU is used, there's really no 
difference between the current setup and a setup that calls udata_setCommonData 
at startup.  Of course, this is all just software.  :-)
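
The trade-off described above, paying the data-loading cost at startup versus on first use, can be sketched with a generic lazy-initialization wrapper. This is an illustration in Python rather than Gecko or ICU code: `LazyData` and `load_tables` are hypothetical names, and in the real case the deferred step would be the udata_setCommonData call made before ICU is first used.

```python
import threading


class LazyData:
    """Load a data blob the first time it is requested, not at startup."""

    def __init__(self, loader):
        self._loader = loader      # callable that returns the data (e.g. reads a file)
        self._lock = threading.Lock()
        self._data = None
        self._loaded = False

    def get(self):
        # Double-checked locking: the lock is only taken while nothing is loaded yet.
        if not self._loaded:
            with self._lock:
                if not self._loaded:
                    self._data = self._loader()
                    self._loaded = True
        return self._data


load_count = 0


def load_tables():
    """Stand-in for the expensive step of reading the data file from disk."""
    global load_count
    load_count += 1
    return b"\x00" * 1024


tables = LazyData(load_tables)   # cheap: nothing is read yet
first = tables.get()             # first use pays the loading cost
second = tables.get()            # later uses reuse the already loaded data
print(load_count)
```

The catch raised above still applies: deferring the load only helps if every path into ICU goes through one such hook, which is exactly the central hook that is currently missing.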

 As far as I can tell, the spec allows user agents to ship whatever languages 
 they need; the real question is what users and site authors actually need and 
 expect out of the API. (I'm reading the spec out of 
 http://wiki.ecmascript.org/doku.php?id=globalization:specification_drafts)

Grunging through v8's code, I...think...they cull locale lists for stuff to 
some degree.  Maybe to the language set they ship.  I'm looking at 
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/icu/README.chromium
 and honestly don't understand enough about ICU to fully grok the substantial 
set of changes they've made.

 Also, we are currently duplicating the data tables on mac universal builds, 
 because they are compiled-in symbols.

That means sync I/O on the main thread, and not well-optimized because it won't 
be part of the binary.  Just to note.

Jeff


Re: Cost of ICU data

2013-10-15 Thread Benjamin Smedberg

On 10/15/2013 1:18 PM, Brian Smith wrote:

On Tue, Oct 15, 2013 at 9:06 AM, Benjamin Smedberg
benja...@smedbergs.us wrote:

Do we need this data for any language other than the language Firefox ships
in? Can we just include the relevant language data in each localized build
of Firefox, and allow users to get other language data via downloadable
language packs, similarly to how dictionaries are handled?

My understanding is that web content should not be able to tell which
locale the browser is configured to use, for privacy (fingerprinting)
reasons.

I haven't heard this rule before. By default your browser language 
affects the HTTP accept-lang setting, as well as things like default 
font choices. You can certainly customize those back to a 
non-fingerprintable setting, but I'm not convinced that we should worry 
about this as a fingerprinting vector.


--BDS



Re: Cost of ICU data

2013-10-15 Thread Anne van Kesteren
On Tue, Oct 15, 2013 at 6:45 PM, Benjamin Smedberg
benja...@smedbergs.us wrote:
 On 10/15/2013 1:18 PM, Brian Smith wrote:
 My understanding is that web content should not be able to tell which
 locale the browser is configured to use, for privacy (fingerprinting)
 reasons.

 I haven't heard this rule before. By default your browser language affects
 the HTTP accept-lang setting, as well as things like default font choices.
 You can certainly customize those back to a non-fingerprintable setting, but
 I'm not convinced that we should worry about this as a fingerprinting
 vector.

I think preventing fingerprinting at a technical level is something
we've lost, though we should try to avoid introducing new vectors.

As far as JavaScript API features go, I don't think we should vary our
offering by locale. E.g. for Firefox OS we want changing locale to
just work and not require a new version of Firefox OS. The same goes
for a computer in a hotel or hostel or some such. Firefox should work
for each locale users might have set in Gmail.


-- 
http://annevankesteren.nl/


Re: Cost of ICU data

2013-10-15 Thread Benjamin Smedberg

On 10/15/2013 1:50 PM, Anne van Kesteren wrote:
As far as JavaScript API features go, I don't think we should vary our 
offering by locale. E.g. for Firefox OS we want changing locale to 
just work and not require a new version of Firefox OS. The same goes 
for a computer in a hotel or hostel or some such. Firefox should work 
for each locale users might have set in Gmail. 

And yet, we don't ship by default a version of Firefox that has all the 
languages in it, even though that would be good for those use cases also.


If it didn't cost us anything to include all languages, I wouldn't be 
harping on this. But we know that increased package sizes cost us 
Firefox desktop adoption. So what would the practical effect be of only 
including the English data files in the English Firefox, and so forth, 
and allowing users to get additional ICU data via langpacks, the same 
way we get a Firefox translation?


Is there a primary use case for supporting these Intl APIs for languages 
that a user normally doesn't see?


--BDS



Re: Cost of ICU data

2013-10-15 Thread Brian Smith
On Tue, Oct 15, 2013 at 10:50 AM, Anne van Kesteren ann...@annevk.nl wrote:
 On Tue, Oct 15, 2013 at 6:45 PM, Benjamin Smedberg
 benja...@smedbergs.us wrote:
 On 10/15/2013 1:18 PM, Brian Smith wrote:
 My understanding is that web content should not be able to tell which
 locale the browser is configured to use, for privacy (fingerprinting)
 reasons.

 I haven't heard this rule before. By default your browser language affects
 the HTTP accept-lang setting, as well as things like default font choices.
 You can certainly customize those back to a non-fingerprintable setting, but
 I'm not convinced that we should worry about this as a fingerprinting
 vector.

 I think preventing fingerprinting at a technical level is something
 we've lost though we should try to avoid introducing new vectors.

I think, at least, we should consider ways to avoid adding new vectors
when we are making decisions. It doesn't have to be *the* deciding
factor.

 As far as JavaScript API features go, I don't think we should vary our
 offering by locale. E.g. for Firefox OS we want changing locale to
 just work and not require a new version of Firefox OS. The same goes
 for a computer in a hotel or hostel or some such. Firefox should work
 for each locale users might have set in Gmail.

I strongly agree with this. No doubt there is a strong correlation
between the UI locale and the locale used for web content, but it is
far from a perfect correlation. Socially, we should be erring on the
side of encouraging a multilingual society instead of discouraging a
multilingual society. Technically, we should minimize the web-facing
differences between different installations of Firefox, because having
a consistent platform for web developers is a good thing. That is why
we create web standards, and that is why making parts of standards
optional is generally a bad thing.

I have no idea how to install a langpack. Presumably it is something
that is done through AMO. I am skeptical that this is easy enough to
make it acceptable to push this task off to the user. We should at
least automate it for them. If this data is too large and contributing
towards aborted installs, why not just split the installation phase
into two parts, and install the locale data in parallel to starting up
the browser?

Cheers,
Brian
-- 
Mozilla Networking/Crypto/Security (Necko/NSS/PSM)


Re: Cost of ICU data

2013-10-15 Thread Chris Peterson

On 10/15/13 12:28 PM, Brian Smith wrote:

I have no idea how to install a langpack. Presumably it is something
that is done through AMO. I am skeptical that this is easy enough to
make it acceptable to push this task off to the user. We should at
least automate it for them. If this data is too large and contributing
towards aborted installs, why not just split the installation phase
into two parts, and install the locale data in parallel to starting up
the browser?


How large is a langpack? Could Firefox install (all) langpacks in the 
background or on demand?


I've heard rumblings about a Firefox updater project to unify updates 
for Firefox data files that are not coupled to a particular Firefox 
release (such as CRLs and GPU driver blocklists).



chris


Re: Cost of ICU data

2013-10-15 Thread Karl Tomlinson
Jeff Walden writes:

 On 10/15/2013 06:06 PM, Benjamin Smedberg wrote:

 That means sync I/O on the main thread, and not well-optimized because it
 won't be part of the binary.  Just to note.

When sync I/O is performed to read in-binary-object data, how is
that better?

Just readahead?
Wouldn't something similar be possible with separate files?


Re: Cost of ICU data

2013-10-15 Thread Jorge Villalobos
On 10/15/13 2:41 PM, Chris Peterson wrote:
 On 10/15/13 12:28 PM, Brian Smith wrote:
 I have no idea how to install a langpack. Presumably it is something
 that is done through AMO. I am skeptical that this is easy enough to
 make it acceptable to push this task off to the user. We should at
 least automate it for them. If this data is too large and contributing
 towards aborted installs, why not just split the installation phase
 into two parts, and install the locale data in parallel to starting up
 the browser?
 
 How large is a langpack? Could Firefox install (all) langpacks in the
 background or on demand?
 
 I've heard rumblings about a Firefox updater project to unify updates
 for Firefox data files that are not coupled to a particular Firefox
 release (such as CRLs and GPU driver blocklists).
 
 
 chris

A quick look at this page
(https://addons.mozilla.org/firefox/language-tools/) shows that they're
generally in the 350-400 KB range, each. I don't know how those would
compare with ICU lang packs.

Jorge