Mats Blakstad wrote:

> For myself I was not actually considering the amount of speakers in
> each country, but to map languages with countries/territories where
> the language originated or have been spoken traditionally.

And that is where I think you'll have disagreement on the details.

> So I guess what matters is which language people mostly expect to find
> under the country/territory.

Yep, that's the challenge.

> Would it be possible to extend this dataset to all languages and start
> build an open source data set for language-territory mapping?
> http://www.unicode.org/cldr/charts/latest/supplemental/language_territory_information.html

That's a good question for the CLDR folks, who have their own mailing list.

Keep in mind that the CLDR table documents 675 of the world's best-known languages, counting variants such as three different orthographies of Uzbek. While anything is possible, extending this to "all languages," e.g. the other 6,300 lesser-known living languages, might require a bit of time and money.

There is also a resource in the "UDHR in Unicode" project that might be worth investigating, though it too is an imperfect match with what you seem to be looking for.

--
Doug Ewell | Thornton, CO, US | ewellic.org