Human Rights translations
Hi

Who is currently organizing the human rights translations in Unicode? How can we submit new translations?

Best regards
Mats Blakstad
Re: Dataset for all ISO639 code sorted by country/territory?
On 20 September 2016 at 18:34, Doug Ewell wrote:

> > Is there any dataset that contains all languages in the world sorted
> > by country/territory?
>
> As others have pointed out, be careful about how slippery this slope can
> get. Everyone has his or her own opinion about how many speakers of
> Language X in country Y need to be identified, estimated, or conjectured
> in order to say that "language X is spoken in country Y."

For myself, I was not actually considering the number of speakers in each country, but mapping languages to the countries/territories where the language originated or has been spoken traditionally. For instance, in Norway we have many immigrants from Pakistan, but I doubt any of them would expect to see Urdu sorted under Norway, even though many people in Norway speak Urdu. They would expect to see it under Pakistan, which is their heritage country. I guess this is largely an identity issue as well.

I do understand that it is not easy to get a perfect language-country mapping, and I guess the mapping also depends on the use. For myself, I want people to be able to sort languages by country/territory to make it easier to make lists of translations. I think it can be good to be able to sort by territory instead of providing one very long list of languages. So I guess what matters is which languages people mostly expect to find under each country/territory.

> > I manage to find a dataset on the website of Ethnologue, though it
> > doesn't look like open source, need to check with them exactly how I'm
> > allowed to use it:
> > http://www.ethnologue.com/codes/download-code-tables
>
> The readme file included in the downloadable zip file makes SIL's terms
> very clear. Basically you need to credit SIL as the source of the data,
> not change it, and not make the data directly available for others to
> download. It's best not to get caught up in "open source" as if any
> other terms would make the data totally unusable.

I agree that a dataset is not unusable just because it is not open source, but for my purposes I do in fact need a downloadable file! I tried contacting SIL, but they will only sell the dataset for a fee and will not give an open source license.

Would it be possible to extend this dataset to all languages and start building an open source dataset for language-territory mapping?
http://www.unicode.org/cldr/charts/latest/supplemental/language_territory_information.html
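The per-territory sorting described in this message could be sketched roughly as follows. This is only an illustration: the sample table is a small hand-written assumption (it is not taken from CLDR or Ethnologue), and the name `languages_by_territory` is hypothetical.

```python
from collections import defaultdict

# Hand-written sample for illustration only; NOT taken from CLDR or Ethnologue.
# Keys are ISO 639 language codes, values are ISO 3166-1 territory codes
# where the language is traditionally spoken.
SAMPLE_LANGUAGE_TERRITORIES = {
    "nb": ["NO"],         # Norwegian Bokmål
    "nn": ["NO"],         # Norwegian Nynorsk
    "ur": ["PK", "IN"],   # Urdu
    "ee": ["TG", "GH"],   # Ewe
    "kbp": ["TG"],        # Kabiyè
}


def languages_by_territory(language_territories):
    """Invert {language: [territories]} into {territory: [languages]}."""
    result = defaultdict(list)
    for language, territories in language_territories.items():
        for territory in territories:
            result[territory].append(language)
    # Sort territories and the languages inside each one for stable listings.
    return {t: sorted(langs) for t, langs in sorted(result.items())}


if __name__ == "__main__":
    table = languages_by_territory(SAMPLE_LANGUAGE_TERRITORIES)
    for territory, languages in table.items():
        print(territory, ", ".join(languages))
```

Keeping the source table keyed by language means a language with several traditional territories is entered once and simply appears under each of them after inversion, while a language such as Urdu is not listed under Norway unless someone deliberately adds that territory.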
Re: Possible to add new precomposed characters for local language in Togo?
> Don’t use dead keys on the keyboard layout, then you can have the same keyboard on Windows and Ubuntu.

As we try to keep the French keyboard 1:1 and only extend it with extra functionality, I guess we need to keep the dead keys already present there?

> Shouldn’t you already have broken the French layout by reassigning keys to Togo language letters Ɛ, Ǝ, Ɩ, Ɔ, Ʋ, Ʊ, Ŋ?
> If not, it sounds like it will slow down typing in those languages.

No, in XKB we managed to keep the French keyboard 1:1 and only extend it with extra symbols. We can't reassign keys, as the local languages in Togo also use all the letters of the French alphabet. Besides, people mostly use the French keyboard, so it will be a lot easier and faster for them if they simply get extra keys on a keyboard they already know.

> You can also do dead keys in reverse where, instead of having the diacritic key as a dead key that one pressed before a letter key, you have the letter key as a dead key that you press before the diacritic key.

I managed to make such a solution, but then the keyboard is no longer 1:1 with the French keyboard, as users cannot use it exactly as they are used to using the French keyboard. What I am trying to achieve is to keep the French keyboard unchanged, extend it with symbols for Togolese local languages, and keep the assignment of diacritics consistent with that of the French keyboard.

> Windows keymap compiler supports chained dead keys, it's only the visual editor that does not allow it

> Serial dead keys are a Windows feature, and implementing them is feasible around MSKLC although not in the GUI

Is there any framework other than MSKLC that is simple and easy to use? Or do we need to build from scratch?

> http://charupdate.info#drivers
> Further I recommend to program the deadtrans list in C because this has the advantage of working on a flat list, while in the .klc source it is grouped.

> http://keyman.com/

Thanks for these great leads! I guess Keyman would require users to install extra software? And charupdate is not available. To me it now seems that the best approach is to do it in C; I will try to investigate this further.

Thanks for all the helpful feedback!

On 3 November 2016 at 08:56, Marcel Schneider <charupd...@orange.fr> wrote:

> On Thu, 3 Nov 2016 01:05:13 +0100, Mats Blakstad wrote:
>
> > After managing to add the keyboard to XKB I started on a new venture of
> > trying to make a windows version of the keyboard using this:
> > https://msdn.microsoft.com/en-us/globalization/keyboardlayouts.aspx
> >
> > It is nearly impossible to replicate as it seems like you can only add dead
> > keys if they have a precomposed character.
>
> This Windows limitation is indeed a significant drawback. You may wish to browse
> the archive back and forth starting from here:
> http://www.unicode.org/mail-arch/unicode-ml/y2010-m01/0040.html
>
> > Also, in Togo it is used double tones like these:
> >
> > "Ɛ̃́" LATIN CAPITAL LETTER EPSILON WITH TILDE AND ACUTE
> > "Ɛ̃̀" LATIN CAPITAL LETTER EPSILON WITH TILDE AND GRAVE
> >
> > And windows do not even allow dead keys with double symbols...
>
> I top on Philippe Verdyʼs reply. Serial dead keys are a Windows feature,
> and implementing them is feasible around MSKLC although not in the GUI, as
> its developer Michael Kaplan explained in a blog post that Doug Ewell
> shared in:
> http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0214.html
>
> Actually Iʼm localizing in English an interactive, self-explaining script in batch
> to facilitate generating the sources and layout drivers. It will soon be
> for free download here:
> http://charupdate.info#drivers
>
> Even the EULA issue is settled, as you may read there.
>
> Further I recommend to program the deadtrans list in C because this has the
> advantage of working on a flat list, while in the .klc source it is
> grouped.
>
> > So I wonder if it could be a solution for a precomposed double tone?
> > So one unicode for tilde+acute and another for tilde+grave?
> >
> > The only way we manage to make the keyboard now is to add all the tones
> > behind the letters instead of before the letters.
> > I think in fact it seems easier than on French keyboard, but it will also
> > break the French keyboard when it comes to what order you click buttons to
> > add tones.
> > I also think it would be a benefit to have the keyboard on windows and
> > Ubuntu work mostly the same.
> >
> > Not sure if there are any other good ideas for how to solve it?
>
> Additionally to Denis Jacqueryeʼs replies, I would mention again a software
> that I believe is best fit to get what you need on Windows:
> Keyman.
> Keyman is now a part of SIL and is being made available for free.
> http://keyman.com/
>
> Best regards,
>
> Marcel
>
> > On 25 February 2016 at 09:35, Marcel Schneider wrote:
> > > […]
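The chained (serial) dead keys discussed in this message can be illustrated with a small, platform-independent sketch. It is not how MSKLC, XKB or Keyman implement them; the key labels in `DEAD_KEYS` and the function `type_keys` are assumptions made for the example. The idea is that each dead key only queues a combining mark, and the next base letter receives all queued marks in the order they were typed.

```python
import unicodedata

# Assumed dead-key table for this sketch: key label -> combining mark.
DEAD_KEYS = {
    "dead_tilde": "\u0303",  # COMBINING TILDE
    "dead_acute": "\u0301",  # COMBINING ACUTE ACCENT
    "dead_grave": "\u0300",  # COMBINING GRAVE ACCENT
}


def type_keys(keys):
    """Turn a sequence of key presses into text, allowing chained dead keys."""
    output = []
    pending = []  # combining marks waiting for the next base letter
    for key in keys:
        if key in DEAD_KEYS:
            pending.append(DEAD_KEYS[key])
        else:
            # Base letter: attach every queued mark, in typing order.
            output.append(key + "".join(pending))
            pending.clear()
    # Marks still pending at the end (no base letter followed) are dropped.
    # NFC uses a precomposed character where Unicode has one and otherwise
    # leaves the base letter followed by its combining marks.
    return unicodedata.normalize("NFC", "".join(output))


if __name__ == "__main__":
    double_tone = type_keys(["dead_tilde", "dead_acute", "\u0190"])
    print([f"U+{ord(ch):04X}" for ch in double_tone])
```

With this approach a double tone such as tilde plus acute on Ɛ needs no precomposed code point: the result is simply the letter followed by two combining marks, while French letters such as é still come out precomposed after normalization.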
Re: Possible to add new precomposed characters for local language in Togo?
> > We'll continue to live for long with the 3 basic layouts for Latin (QWERTY,
> > AZERTY, QWERTZ). And nothing will really change without a strong national
> > standard that will convince manufacturers to propose it at normal prices,
> > and force software vendors to include it in the builtin layouts for their
> > OSes.
>
> When I wrote: «The only difference […] should be […]», I swapped over into
> an ideal world… let alone that the historic swap from QWERTY to AZERTY was
> triggered by an «accessibility» issue based «on frequencies of use». My
> purpose being not to *enforce* ergonomics as about the alphabetical layout,
> I fully agree with Mats Blakstad, whose «method of extending the main
> layout is likely to be the only useful one» as I wrote in the same
> e-mail, and with Doug Ewell and Philippe Verdy, whose valuable contributions
> came on to sustain.
>
> All parts of the Latin script as provided by Unicode, that are not used to
> write local and national languages e.g. of Togo, or of France, may be
> hidden as on keytops, but accessible on software side, i.e. in the layout
> driver or in the configuration files. One other challenge in Togo would be
> how to give easy access to the seven supplemental letters Ɛ, Ɩ, Ɔ, Ǝ, Ʋ, Ʊ
> and Ŋ, while the five French precomposed letters are to be maintained, let
> alone Œ and Æ (the latter being rather seldom in French however) that are
> part of the new governmental requirements in France, among other characters
> like the angle quotation marks, called guillemets-chevrons[1].
>
> Generally talking, I canʼt help believe that providing the ability to type
> any Latin script using language on any Latin keyboard would be a good idea.
> Again, that is feasible without overloading the keyboard with dead keys,
> just providing the most frequently used ones, six in Togo as I can see.
>
> Marcel
>
> [1] Vers une norme française pour les claviers informatiques - Langue
> française et langues de France - Ministère de la Culture et de la
> Communication. (2016, January 15). Retrieved January 19, 2016, from
> http://www.culturecommunication.gouv.fr/Politiques-ministerielles/Langue-francaise-et-langues-de-France/Politiques-de-la-langue/Langues-et-numerique/Les-technologies-de-la-langue-et-la-normalisation/Vers-une-norme-francaise-pour-les-claviers-informatiques
Re: Dataset for all ISO639 code sorted by country/territory?
I managed to find a dataset on the website of Ethnologue, though it doesn't look like open source; I need to check with them exactly how I'm allowed to use it:
http://www.ethnologue.com/codes/download-code-tables

Thanks for the explanation, Philippe. I know it is not an easy issue. I am looking at different resources on the web, so any specific links or feedback would be helpful.

On 17 September 2016 at 13:35, Philippe Verdy <verd...@wanadoo.fr> wrote:

> Not all languages are sorted, only those for which there are released data
> in CLDR.
> And languages frequently belong to several countries/territories at the
> same time, with different official or recognized status (itself independent
> of the number of actual speakers, which is very frequently roughly
> estimated).
> Some countries are giving official statistics about their national or
> regional languages, but frequently these stats are old, or underestimated
> or overestimated for political reasons, or some languages are mixed as if
> they were only one, or simply discarded if it is considered locally as a
> secondary language, even if the official language is superficially
> understood but taken as a primary one.
> Statistics are also forgetting native speakers living abroad in a
> diaspora, or secondary learners of a language taught in foreign countries.
>
> 2016-09-17 11:19 GMT+02:00 Mats Blakstad <mats.gbproj...@gmail.com>:
>
>> Hi
>>
>> Is there any dataset that contains all languages in the world sorted by
>> country/territory?
>>
>> I found this at Unicode, but seems like only containing the most spoken
>> languages in each country and not the smaller ones:
>> http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html
>>
>> Thanks in advance for help.
>>
>> Best regards
>> Mats Blakstad
Dataset for all ISO639 code sorted by country/territory?
Hi

Is there any dataset that contains all languages in the world sorted by country/territory?

I found this at Unicode, but it seems to contain only the most spoken languages in each country and not the smaller ones:
http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html

Thanks in advance for help.

Best regards
Mats Blakstad
Re: Possible to add new precomposed characters for local language in Togo?
Thanks for all the useful feedback and ideas!

Exactly where should these combinations be documented?

2016-02-16 15:01 GMT+01:00 Marcel Schneider:

> Experience shows however that training on dead key layouts as used for
> French, can be extended to the use of combining diacritics entered after
> the base letter, with an appropriate keyboard layout driver. These
> combining characters being actually the most useful form of most
> diacritics, it is recommended that they be generated when the space bar is
> hit after a dead key if such are present. More obviously all needed
> diacritics are allocated to key positions, so that they can be added to any
> letter by the means of a single keystroke. One example is the keyboard
> layout for Bamanankan and French on the /Mali Pense/ site that Don Osbornʼs
> /Beyond Niamey/ blog links to [2]. Anyway, entering diacritics _after_ the
> base letter is the most up-to-date way to input composed characters,
> because it is very intuitive, and because it realizes the spirit of the
> character representation scheme of Unicode.

Thanks for this info. However, how much difference does it make whether people add the diacritics before or after the letter? If people are used to adding diacritics before the letter, would it not be pedagogically better to continue that logic on a new keyboard?

What we tried to do is make a keyboard that simply extends the French keyboard (which is by far the most used in Togo), so that people get more keys on a keyboard they already know. There are also other keyboards used locally by linguists, but people tend not to learn them, and it can be a barrier when people need to switch between "French" and a "Local languages keyboard" all the time. I guess people prefer a keyboard that they can use to write both.

Anyway, thanks a lot for these really useful ideas, which I will keep in mind!
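To make the before/after question concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper names `stored_text` and `has_precomposed_form` are hypothetical. It shows that the typing order is purely an input-method matter: whichever key comes first, the stored text is the base letter followed by the combining mark, and NFC normalization only yields a single precomposed character where Unicode already encodes one.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def stored_text(keystrokes):
    """Return the text stored for one letter plus its diacritics.

    Combining marks are placed after the base letter regardless of whether
    they were typed before it (dead-key habit) or after it.
    """
    marks = "".join(k for k in keystrokes if unicodedata.combining(k))
    base = "".join(k for k in keystrokes if not unicodedata.combining(k))
    return unicodedata.normalize("NFC", base + marks)


def has_precomposed_form(text):
    """True if NFC folds the whole sequence into a single code point."""
    return len(unicodedata.normalize("NFC", text)) == 1


if __name__ == "__main__":
    before = stored_text([ACUTE, "\u025b"])  # diacritic first, then ɛ
    after = stored_text(["\u025b", ACUTE])   # ɛ first, then diacritic
    print(before == after)                         # True
    print(has_precomposed_form("e" + ACUTE))       # True: é is U+00E9
    print(has_precomposed_form("\u025b" + ACUTE))  # False: stays a sequence
```

So the choice between the two orders can be made on pedagogical grounds alone; it does not change what ends up in the document.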
Possible to add new precomposed characters for local language in Togo?
I've worked to upload a keyboard for local languages in Togo to the XKB project. It is a combination keyboard based on the French keyboard and extended to make it possible to write all the local languages in Togo.

However, many of the languages have several tones and even use combined tones. When I tried to update the composer to make it work, it seemed that the composer can only give back a precomposed character and not a string of combined characters.

I now wonder, generally, is it best to add new precomposed characters to Unicode? Should there be a Unicode symbol for each combination used? What is best practise?

I ask because I see that some Unicode code points are precomposed characters. I'm not sure why they are useful, but if they are, maybe we should also add these?

For reference, here are the combinations needed; as you can see there are many! I've tried to check, and I don't think precomposed characters exist for any of them.

ɛ / epsilon = U025B
: "ɛ́" LATIN SMALL LETTER EPSILON WITH ACUTE
: "ɛ̀" LATIN SMALL LETTER EPSILON WITH GRAVE
: "ɛ̂" LATIN SMALL LETTER EPSILON WITH CIRCUMFLEX
: "ɛ̌" LATIN SMALL LETTER EPSILON WITH CARON
: "ɛ̄" LATIN SMALL LETTER EPSILON WITH MACRON
: "ɛ̃" LATIN SMALL LETTER EPSILON WITH TILDE
: "ɛ̃́" LATIN SMALL LETTER EPSILON WITH TILDE AND ACUTE
: "ɛ̃̀" LATIN SMALL LETTER EPSILON WITH TILDE AND GRAVE

Ɛ / EPSILON = U0190
: "Ɛ́" LATIN CAPITAL LETTER EPSILON WITH ACUTE
: "Ɛ̀" LATIN CAPITAL LETTER EPSILON WITH GRAVE
: "Ɛ̂" LATIN CAPITAL LETTER EPSILON WITH CIRCUMFLEX
: "Ɛ̌" LATIN CAPITAL LETTER EPSILON WITH CARON
: "Ɛ̄" LATIN CAPITAL LETTER EPSILON WITH MACRON
: "Ɛ̃" LATIN CAPITAL LETTER EPSILON WITH TILDE
: "Ɛ̃́" LATIN CAPITAL LETTER EPSILON WITH TILDE AND ACUTE
: "Ɛ̃̀" LATIN CAPITAL LETTER EPSILON WITH TILDE AND GRAVE

ɩ / iota = U0269
: "ɩ́" LATIN SMALL LETTER IOTA WITH ACUTE
: "ɩ̀" LATIN SMALL LETTER IOTA WITH GRAVE
: "ɩ̂" LATIN SMALL LETTER IOTA WITH CIRCUMFLEX
: "ɩ̌" LATIN SMALL LETTER IOTA WITH CARON
: "ɩ̄" LATIN SMALL LETTER IOTA WITH MACRON

Ɩ / IOTA = U0196
: "Ɩ́" LATIN CAPITAL LETTER IOTA WITH ACUTE
: "Ɩ̀" LATIN CAPITAL LETTER IOTA WITH GRAVE
: "Ɩ̂" LATIN CAPITAL LETTER IOTA WITH CIRCUMFLEX
: "Ɩ̌" LATIN CAPITAL LETTER IOTA WITH CARON
: "Ɩ̄" LATIN CAPITAL LETTER IOTA WITH MACRON

ɔ / open o = U0254
: "ɔ́" LATIN SMALL LETTER OPEN O WITH ACUTE
: "ɔ̀" LATIN SMALL LETTER OPEN O WITH GRAVE
: "ɔ̂" LATIN SMALL LETTER OPEN O WITH CIRCUMFLEX
: "ɔ̌" LATIN SMALL LETTER OPEN O WITH CARON
: "ɔ̄" LATIN SMALL LETTER OPEN O WITH MACRON
: "ɔ̃" LATIN SMALL LETTER OPEN O WITH TILDE
: "ɔ̃́" LATIN SMALL LETTER OPEN O WITH TILDE AND ACUTE
: "ɔ̃̀" LATIN SMALL LETTER OPEN O WITH TILDE AND GRAVE

Ɔ / OPEN O = U0186
: "Ɔ́" LATIN CAPITAL LETTER OPEN O WITH ACUTE
: "Ɔ̀" LATIN CAPITAL LETTER OPEN O WITH GRAVE
: "Ɔ̂" LATIN CAPITAL LETTER OPEN O WITH CIRCUMFLEX
: "Ɔ̌" LATIN CAPITAL LETTER OPEN O WITH CARON
: "Ɔ̄" LATIN CAPITAL LETTER OPEN O WITH MACRON
: "Ɔ̃" LATIN CAPITAL LETTER OPEN O WITH TILDE
: "Ɔ̃́" LATIN CAPITAL LETTER OPEN O WITH TILDE AND ACUTE
: "Ɔ̃̀" LATIN CAPITAL LETTER OPEN O WITH TILDE AND GRAVE

ǝ / turned e = U01DD
: "ǝ́" LATIN SMALL LETTER TURNED E WITH ACUTE
: "ǝ̀" LATIN SMALL LETTER TURNED E WITH GRAVE
: "ǝ̂" LATIN SMALL LETTER TURNED E WITH CIRCUMFLEX
: "ǝ̌" LATIN SMALL LETTER TURNED E WITH CARON
: "ǝ̄" LATIN SMALL LETTER TURNED E WITH MACRON
: "ǝ̃" LATIN SMALL LETTER TURNED E WITH TILDE
: "ǝ̃́" LATIN SMALL LETTER TURNED E WITH TILDE AND ACUTE
: "ǝ̃̀" LATIN SMALL LETTER TURNED E WITH TILDE AND GRAVE

Ǝ / TURNED E = U018E
: "Ǝ́" LATIN CAPITAL LETTER TURNED E WITH ACUTE
: "Ǝ̀" LATIN CAPITAL LETTER TURNED E WITH GRAVE
: "Ǝ̂" LATIN CAPITAL LETTER TURNED E WITH CIRCUMFLEX
: "Ǝ̌" LATIN CAPITAL LETTER TURNED E WITH CARON
: "Ǝ̄" LATIN CAPITAL LETTER TURNED E WITH MACRON
: "Ǝ̃" LATIN CAPITAL LETTER TURNED E WITH TILDE
: "Ǝ̃́" LATIN