Human Rights translations

2017-05-09 Thread Mats Blakstad via Unicode
Hi

Who is currently organizing the Human Rights translations at Unicode?
How can we submit new translations?

Best regards
Mats Blakstad


Re: Dataset for all ISO639 code sorted by country/territory?

2016-11-10 Thread Mats Blakstad
On 20 September 2016 at 18:34, Doug Ewell  wrote:

> > Is there any dataset that contains all languages in the world sorted
> > by country/territory?
>
> As others have pointed out, be careful about how slippery this slope can
> get. Everyone has his or her own opinion about how many speakers of
> Language X in country Y need to be identified, estimated, or conjectured
> in order to say that "language X is spoken in country Y."
>

For my part, I was not actually considering the number of speakers in each
country. I was thinking of mapping languages to the countries/territories where
they originated or have traditionally been spoken.
For instance, in Norway we have many immigrants from Pakistan, but I
doubt any of them would expect to see Urdu listed under Norway, even though
many people in Norway speak Urdu.
They would expect to see it under Pakistan, which is their heritage
country; I guess this is largely an identity issue as well.

I do understand that it is not easy to get a perfect language-country
mapping, and I guess the mapping also depends on its use.
In my case, I want people to be able to sort languages by
country/territory to make it easier to build lists of translations. I
think it is better to let people browse by territory than to provide
a very long list of languages.
So I guess what matters is which languages people mostly expect to find
under each country/territory.
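The idea of listing a language under its heritage territory, not under every place it is spoken, can be sketched as a simple grouping step. This is a minimal sketch that assumes each record carries a hypothetical relation label (heritage, official, diaspora). The records are illustrative and are not drawn from CLDR, Ethnologue or any other dataset.

```python
from collections import defaultdict

# Hypothetical sample records: (language code, territory code, relation).
# Illustrative only; not taken from CLDR, Ethnologue or any other dataset.
#   "heritage": the language originated or is traditionally spoken there
#   "official": official status without being a heritage language
#   "diaspora": spoken there mainly by immigrant communities
RECORDS = [
    ("nb", "NO", "heritage"),
    ("se", "NO", "heritage"),
    ("ur", "NO", "diaspora"),
    ("ur", "PK", "heritage"),
    ("pa", "PK", "heritage"),
    ("ee", "TG", "heritage"),
    ("kbp", "TG", "heritage"),
    ("fr", "TG", "official"),
]


def languages_by_territory(records, include=("heritage",)):
    """Group language codes under each territory, keeping only the listed relations."""
    grouped = defaultdict(set)
    for lang, territory, relation in records:
        if relation in include:
            grouped[territory].add(lang)
    # Sorting gives a stable, readable per-territory listing.
    return {territory: sorted(langs) for territory, langs in sorted(grouped.items())}


if __name__ == "__main__":
    for territory, langs in languages_by_territory(RECORDS).items():
        print(territory, ", ".join(langs))
```

With the default `include=("heritage",)`, Urdu appears under PK but not under NO. Passing `include=("heritage", "diaspora")` would list it under both, so the same data can serve either expectation.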


>
> > I managed to find a dataset on the Ethnologue website, though it
> > doesn't look like it's open source; I need to check with them exactly how I'm
> > allowed to use it:
> > http://www.ethnologue.com/codes/download-code-tables
>
> The readme file included in the downloadable zip file makes SIL's terms
> very clear. Basically you need to credit SIL as the source of the data,
> not change it, and not make the data directly available for others to
> download. It's best not to get caught up in "open source" as if any
> other terms would make the data totally unusable.
>
>
I agree that a dataset is not unusable just because it is not open source,
but in my case I actually need a downloadable file!

I tried contacting SIL, but they will only sell the dataset for a fee and will
not give an open source license.

Would it be possible to extend this dataset to all languages and start
building an open source dataset for language-territory mapping?
http://www.unicode.org/cldr/charts/latest/supplemental/language_territory_information.html
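As far as I know, the CLDR chart linked above is generated from the `territoryInfo` block in CLDR's `supplementalData.xml`, where each `territory` element lists `languagePopulation` children. The sketch below reads that shape and inverts it into a language-to-territories view, which shows the kind of entries an extended dataset would need. The sample is written in that shape with the numeric attributes (such as `populationPercent`) left out, and it is not real CLDR data. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A tiny excerpt shaped like the <territoryInfo> block of CLDR's
# supplementalData.xml. Numeric attributes (population, populationPercent,
# etc.) are deliberately omitted: this is NOT real CLDR data.
SAMPLE = """
<supplementalData>
  <territoryInfo>
    <territory type="NO">
      <languagePopulation type="nb"/>
      <languagePopulation type="se"/>
    </territory>
    <territory type="TG">
      <languagePopulation type="fr" officialStatus="official"/>
      <languagePopulation type="ee"/>
    </territory>
  </territoryInfo>
</supplementalData>
"""


def territory_languages(xml_text):
    """Return {territory code: [language codes]} from a territoryInfo block."""
    root = ET.fromstring(xml_text.strip())
    info = root.find("territoryInfo")
    mapping = {}
    for territory in info.findall("territory"):
        mapping[territory.get("type")] = [
            lp.get("type") for lp in territory.findall("languagePopulation")
        ]
    return mapping


def language_territories(mapping):
    """Invert the mapping to {language code: [territory codes]}."""
    inverted = defaultdict(list)
    for territory, langs in mapping.items():
        for lang in langs:
            inverted[lang].append(territory)
    return dict(inverted)


if __name__ == "__main__":
    by_territory = territory_languages(SAMPLE)
    print(by_territory)
    print(language_territories(by_territory))
```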


Re: Possible to add new precomposed characters for local language in Togo?

2016-11-03 Thread Mats Blakstad
> Don’t use dead keys on the keyboard layout, then you can have the same
> keyboard on Windows and Ubuntu.

As we are trying to keep the French keyboard 1:1 and only extend it with extra
functionality, I guess we need to keep the dead keys already present
there?

> Shouldn’t you already have broken the French layout by reassigning keys
> to Togo language letters Ɛ, Ǝ, Ɩ, Ɔ, Ʋ, Ʊ, Ŋ?
> If not, it sounds like it will slow down typing in those languages.

No, in XKB we managed to keep the French keyboard 1:1 and only extend it with
extra symbols.
We can't reassign keys, because local languages in Togo also use all the letters
of the French alphabet.
Besides, since people mostly use the French keyboard, it will be a lot easier
and faster for them if they can simply get extra keys on a keyboard they already
know.

> You can also do dead keys in reverse where, instead of having the
> diacritic key as a dead key that one presses before a letter key, you have
> the letter key as a dead key that you press before the diacritic key.

I managed to make such a solution, but then the keyboard is no longer
1:1 with the French keyboard, so users cannot use it exactly as they are
used to using the French keyboard.
What I am trying to achieve is to keep the French keyboard unchanged, extend it with
symbols for Togolese local languages, and keep the assignment of diacritics
consistent with that of the French keyboard.
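The two input orders discussed here can produce the same stored text. In Unicode the base letter is always stored first and its combining marks follow, so whether the diacritic key is pressed before the letter (French-style dead keys) or after it only changes the keyboard's input logic. The sketch below uses hypothetical key names and ignores real layout details, such as a dead key followed by space.

```python
import unicodedata

# Hypothetical key names mapped to Unicode combining marks.
COMBINING = {
    "acute": "\u0301",
    "grave": "\u0300",
    "circumflex": "\u0302",
    "tilde": "\u0303",
}


def type_prefix(keys):
    """French-style dead keys: diacritic keys are pressed BEFORE the letter."""
    out, pending = [], []
    for key in keys:
        if key in COMBINING:
            pending.append(COMBINING[key])  # remember the mark, emit nothing yet
        else:
            out.append(key + "".join(pending))  # stored as letter first, then marks
            pending = []
    # Simplification: a dead key still pending at the end is discarded.
    return unicodedata.normalize("NFC", "".join(out))


def type_postfix(keys):
    """Reverse order: the letter is typed first, each diacritic key appends a mark."""
    return unicodedata.normalize("NFC", "".join(COMBINING.get(k, k) for k in keys))


if __name__ == "__main__":
    print(type_prefix(["circumflex", "e"]) == type_postfix(["e", "circumflex"]))
    print(type_prefix(["tilde", "acute", "\u025b"]) == type_postfix(["\u025b", "tilde", "acute"]))
```

Keeping the French convention corresponds to `type_prefix`. The sketch suggests that doing so costs nothing in the resulting text, because only the key sequence differs.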

> Windows keymap compiler supports chained dead keys, it's only the visual
> editor that does not allow it
> Serial dead keys are a Windows feature, and implementing them is feasible
> around MSKLC although not in the GUI

Is there any framework other than MSKLC that is simple and easy to use?
Or do we need to build one from scratch?

> http://charupdate.info#drivers
> Further I recommend to program the deadtrans list in C because this has
> the advantage of working on a flat list, while in the .klc source it is
> grouped.
> http://keyman.com/

Thanks for these great leads! I guess Keyman will require users to
install extra software? And the charupdate download is not available yet.
To me it now seems that the best approach is to do it in C; I will
investigate this further.
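Marcel's point about keeping the deadtrans list flat can be modelled outside any particular platform. This sketch is written in Python rather than C for readability. Every dead-key transition is one row in a flat list, including chained (serial) dead keys such as tilde followed by acute. The key names, states and rows are all hypothetical. The outputs here are multi-code-point strings, but as discussed in this thread, Windows dead keys appear to be limited to a single precomposed result. The sketch therefore models the desired behaviour, not what a Windows deadtrans entry can express directly.

```python
# A flat dead-key table: one row per (current state, key pressed).
# Each row gives (output text, next state). A next state of None ends the
# sequence; otherwise the row chains into another dead-key state.
# Key names, states and rows are hypothetical, for illustration only.
ROWS = [
    ("tilde", "acute", "", "tilde+acute"),                  # chained dead key
    ("tilde", "grave", "", "tilde+grave"),                  # chained dead key
    ("tilde", "\u025b", "\u025b\u0303", None),              # ɛ̃
    ("tilde+acute", "\u025b", "\u025b\u0303\u0301", None),  # ɛ̃́
    ("tilde+grave", "\u025b", "\u025b\u0303\u0300", None),  # ɛ̃̀
    ("acute", "\u025b", "\u025b\u0301", None),              # ɛ́
]
TABLE = {(state, key): (output, nxt) for state, key, output, nxt in ROWS}
DEAD_KEYS = {"tilde", "acute", "grave"}


def run(keys):
    """Feed key names through the flat table and return the produced text."""
    out, state = [], None
    for key in keys:
        if state is not None and (state, key) in TABLE:
            output, nxt = TABLE[(state, key)]
            out.append(output)
            state = nxt
        elif key in DEAD_KEYS:
            # Start (or restart) a dead-key sequence; nothing is emitted yet.
            state = key
        else:
            # Plain key, or no table entry: emit the key and leave dead-key mode.
            # (Simplified: real layouts typically also emit the accent itself.)
            out.append(key)
            state = None
    return "".join(out)


if __name__ == "__main__":
    print(run(["tilde", "acute", "\u025b"]))
```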

Thanks for all the helpful feedback!

On 3 November 2016 at 08:56, Marcel Schneider <charupd...@orange.fr> wrote:

> On Thu, 3 Nov 2016 01:05:13 +0100, Mats Blakstad wrote:
>
> > After managing to add the keyboard to XKB I started on a new venture of
> > trying to make a Windows version of the keyboard using this:
> > https://msdn.microsoft.com/en-us/globalization/keyboardlayouts.aspx
> >
> > It is nearly impossible to replicate, as it seems like you can only add dead
> > keys if they have a precomposed character.
>
> This Windows limitation is indeed a significant drawback. You may wish to
> browse the archive back and forth starting from here:
> http://www.unicode.org/mail-arch/unicode-ml/y2010-m01/0040.html
>
> >
> > Also, in Togo double tones like these are used:
> >
> > "Ɛ̃́" LATIN CAPITAL LETTER EPSILON WITH TILDE AND ACUTE
> > "Ɛ̃̀" LATIN CAPITAL LETTER EPSILON WITH TILDE AND GRAVE
> >
> > And Windows does not even allow dead keys with double symbols...
>
> Adding to Philippe Verdyʼs reply: serial dead keys are a Windows feature,
> and implementing them is feasible around MSKLC although not in the GUI, as
> its developer Michael Kaplan explained in a blog post that Doug Ewell
> shared in:
> http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0214.html
>
> Currently Iʼm localizing into English an interactive, self-explaining batch
> script to facilitate generating the sources and layout drivers. It will soon be
> available for free download here:
> http://charupdate.info#drivers
>
> Even the EULA issue is settled, as you may read there.
>
> Further I recommend to program the deadtrans list in C because this has the
> advantage of working on a flat list, while in the .klc source it is
> grouped.
>
> >
> > So I wonder if a precomposed double tone could be a solution?
> > So one Unicode character for tilde+acute and another for tilde+grave?
> >
> > The only way we have managed to make the keyboard so far is to add all the
> > tones after the letters instead of before them.
> > I think it actually seems easier than on the French keyboard, but it will also
> > break the French keyboard convention for the order in which you press keys
> > to add tones.
> > I also think it would be a benefit to have the keyboard on Windows and
> > Ubuntu work mostly the same.
> >
> > Not sure if there are any other good ideas for how to solve it?
>
> In addition to Denis Jacqueryeʼs replies, I would mention again a piece of
> software that I believe is the best fit for what you need on Windows:
> Keyman.
> Keyman is now a part of SIL and is being made available for free.
> http://keyman.com/
>
> Best regards,
>
> Marcel
>
> >
> > On 25 February 2016 at 09:35, Marcel Schneider  wrote:
> >
> […]
>
>


Re: Possible to add new precomposed characters for local language in Togo?

2016-11-02 Thread Mats Blakstad
> > We'll continue to live for long with the 3 basic layouts for Latin
> > (QWERTY, AZERTY, QWERTZ). And nothing will really change without a strong national
> > standard that will convince manufacturers to propose it at normal prices,
> > and force software vendors to include it in the builtin layouts for their
> > OSes.
>
> When I wrote: «The only difference […] should be […]», I slipped into
> an ideal world… not to mention that the historic swap from QWERTY to AZERTY was
> triggered by an «accessibility» issue based «on frequencies of use». As my
> purpose is not to *enforce* ergonomics with regard to the alphabetical layout,
> I fully agree with Mats Blakstad, whose «method of extending the main
> layout is likely to be the only useful one» as I wrote in the same
> e-mail―and with Doug Ewell and Philippe Verdy, whose valuable contributions
> came in support.
>
> All parts of the Latin script as provided by Unicode that are not used to
> write local and national languages, e.g. of Togo or of France, may be
> left off the keytops but remain accessible on the software side, i.e. in the layout
> driver or in the configuration files. One other challenge in Togo would be
> how to give easy access to the seven supplemental letters Ɛ, Ɩ, Ɔ, Ǝ, Ʋ, Ʊ
> and Ŋ, while the five French precomposed letters are to be maintained, not to
> mention Œ and Æ―the latter being rather rare in French however―that are
> part of the new governmental requirements in France, among other characters
> like the angle quotation marks, called guillemets-chevrons[1].
>
> Generally speaking, I canʼt help believing that providing the ability to type
> any Latin-script language on any Latin keyboard would be a good idea.
> Again, that is feasible without overloading the keyboard with dead keys,
> just providing the most frequently used ones, six in Togo as I can see.
>
> Marcel
>
> [1] Vers une norme française pour les claviers informatiques [Towards a French
> standard for computer keyboards] - Langue française et langues de France -
> Ministère de la Culture et de la Communication. (2016, January 15). Retrieved
> January 19, 2016, from
> http://www.culturecommunication.gouv.fr/Politiques-ministerielles/Langue-francaise-et-langues-de-France/Politiques-de-la-langue/Langues-et-numerique/Les-technologies-de-la-langue-et-la-normalisation/Vers-une-norme-francaise-pour-les-claviers-informatiques
>


Re: Dataset for all ISO639 code sorted by country/territory?

2016-09-17 Thread Mats Blakstad
I managed to find a dataset on the Ethnologue website, though it doesn't
look like it's open source; I need to check with them exactly how I'm allowed to
use it:
http://www.ethnologue.com/codes/download-code-tables

Thanks for the explanation, Philippe. I know it is not an easy issue. I'm looking
for different resources on the web; any specific links or feedback would
be helpful.

On 17 September 2016 at 13:35, Philippe Verdy <verd...@wanadoo.fr> wrote:

> Not all languages are listed, only those for which there is released data
> in CLDR.
> And languages frequently belong to several countries/territories at the
> same time, with different official or recognized status (itself independent
> of the number of actual speakers, which is very frequently only roughly
> estimated).
> Some countries give official statistics about their national or
> regional languages, but frequently these stats are old, or underestimated
> or overestimated for political reasons, or some languages are merged as if
> they were only one, or a language is simply discarded if it is locally
> considered a secondary language, even if the official language is only
> superficially understood but is taken as the primary one.
> Statistics also forget native speakers living abroad in a
> diaspora, or secondary learners of a language taught in foreign countries.
>
>
> 2016-09-17 11:19 GMT+02:00 Mats Blakstad <mats.gbproj...@gmail.com>:
>
>> Hi
>>
>> Is there any dataset that contains all languages in the world sorted by
>> country/territory?
>>
>> I found this at Unicode, but it seems to contain only the most spoken
>> languages in each country and not the smaller ones:
>> http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html
>>
>> Thanks in advance for help.
>>
>> Best regards
>> Mats Blakstad
>>
>
>


Dataset for all ISO639 code sorted by country/territory?

2016-09-17 Thread Mats Blakstad
Hi

Is there any dataset that contains all languages in the world sorted by
country/territory?

I found this at Unicode, but it seems to contain only the most spoken
languages in each country and not the smaller ones:
http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html

Thanks in advance for help.

Best regards
Mats Blakstad


Re: Possible to add new precomposed characters for local language in Togo?

2016-02-22 Thread Mats Blakstad
Thanks for all the useful feedback and ideas!

Exactly where should these combinations be documented?

2016-02-16 15:01 GMT+01:00 Marcel Schneider :

>
> Experience shows, however, that training on dead key layouts as used for
> French can be extended to the use of combining diacritics entered after
> the base letter, with an appropriate keyboard layout driver. Since these
> combining characters are actually the most useful form of most
> diacritics, it is recommended that they be generated when the space bar is
> hit after a dead key, if such keys are present. More obviously, all needed
> diacritics are assigned to key positions, so that they can be added to any
> letter with a single keystroke. One example is the keyboard
> layout for Bamanankan and French on the /Mali Pense/ site that Don Osbornʼs
> /Beyond Niamey/ blog links to [2]. Anyway, entering diacritics _after_ the
> base letter is the most up-to-date way to input composed characters,
> because it is very intuitive, and because it realizes the spirit of the
> character representation scheme of Unicode.
>
>
Thanks for this info; however, how much difference does it make whether
people add the diacritics before or after the letter? If people are used to
adding diacritics before the letter, would it not be pedagogically a better
idea to continue that logic on a new keyboard? What we tried to do is to
make a keyboard that simply extends the French keyboard (which is by far
the most used in Togo), so that people get more keys on a keyboard
they already know. There are also other keyboards used locally by
linguists, but people tend not to learn them, and it can be a barrier
when people need to click to switch from the "French" keyboard to a "Local
languages keyboard" all the time; I guess people prefer a keyboard that
they can use to write both. Anyway, thanks a lot for these really useful
ideas, which I will keep in mind!


Possible to add new precomposed characters for local language in Togo?

2016-02-15 Thread Mats Blakstad
I've worked on uploading a keyboard for local languages in Togo to the XKB
project. It is a combination keyboard based on the French keyboard and extended
to make it possible to write all the local languages in Togo. However, many
of the languages have several tones and even use combined tones, and
when I tried to update the composer to make this work, it seemed that the
composer can only give back a precomposed character and not a string of
combining characters.

I now wonder, generally: is it best to add new precomposed characters to
Unicode? Should there be a Unicode character for each combination used? What
is best practice? I ask because I see that some Unicode characters are
precomposed. I'm not sure why they are useful, but if they are, maybe we
should also add these?
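The difference between a precomposed character and a combining sequence can be checked with Python's standard `unicodedata` module. In the sketch below, `e` followed by a combining acute normalizes (NFC) to the single precomposed `é`. By contrast, `ɛ` followed by the same mark has no precomposed form and stays two code points, and such sequences are already valid Unicode text. As far as I know, Unicode's stated policy is not to encode new precomposed letters for sequences that combining marks can already represent. The sketch also shows why the order of marks in double tones matters for a keyboard. Tilde and acute share the same canonical combining class, so normalization never reorders them, and `ɛ` + tilde + acute and `ɛ` + acute + tilde remain distinct strings. The helper name `code_points` is hypothetical.

```python
import unicodedata


def code_points(s):
    """List each code point of a string with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


# e + COMBINING ACUTE ACCENT has a precomposed equivalent (é, U+00E9),
# so NFC normalization turns the pair into one code point.
e_acute = unicodedata.normalize("NFC", "e\u0301")

# Open e (ɛ, U+025B) + COMBINING ACUTE ACCENT has no precomposed form,
# so it stays a two-code-point sequence after NFC.
open_e_acute = unicodedata.normalize("NFC", "\u025b\u0301")

# Double tones: COMBINING TILDE and COMBINING ACUTE ACCENT share canonical
# combining class 230, so normalization never reorders them; the two
# orders below remain different strings.
tilde_then_acute = unicodedata.normalize("NFD", "\u025b\u0303\u0301")
acute_then_tilde = unicodedata.normalize("NFD", "\u025b\u0301\u0303")

if __name__ == "__main__":
    print(code_points(e_acute))
    print(code_points(open_e_acute))
    print(unicodedata.combining("\u0303"), unicodedata.combining("\u0301"))
    print(tilde_then_acute == acute_then_tilde)
```

For a tone such as ɛ̃́ (acute stacked above the tilde), the keyboard therefore has to emit the tilde first and the acute second, consistently.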

For reference, here are the combinations needed; as you can see, there are
many! I've checked them over, and I don't think precomposed characters exist
for any of them.

ɛ / epsilon = U+025B
  : "ɛ́"   LATIN SMALL LETTER EPSILON WITH ACUTE
  : "ɛ̀"   LATIN SMALL LETTER EPSILON WITH GRAVE
  : "ɛ̂"   LATIN SMALL LETTER EPSILON WITH CIRCUMFLEX
  : "ɛ̌"   LATIN SMALL LETTER EPSILON WITH CARON
  : "ɛ̄"   LATIN SMALL LETTER EPSILON WITH MACRON
  : "ɛ̃"   LATIN SMALL LETTER EPSILON WITH TILDE
  : "ɛ̃́"   LATIN SMALL LETTER EPSILON WITH TILDE AND ACUTE
  : "ɛ̃̀"   LATIN SMALL LETTER EPSILON WITH TILDE AND GRAVE

Ɛ / EPSILON = U+0190
  : "Ɛ́"   LATIN CAPITAL LETTER EPSILON WITH ACUTE
  : "Ɛ̀"   LATIN CAPITAL LETTER EPSILON WITH GRAVE
  : "Ɛ̂"   LATIN CAPITAL LETTER EPSILON WITH CIRCUMFLEX
  : "Ɛ̌"   LATIN CAPITAL LETTER EPSILON WITH CARON
  : "Ɛ̄"   LATIN CAPITAL LETTER EPSILON WITH MACRON
  : "Ɛ̃"   LATIN CAPITAL LETTER EPSILON WITH TILDE
  : "Ɛ̃́"   LATIN CAPITAL LETTER EPSILON WITH TILDE AND ACUTE
  : "Ɛ̃̀"   LATIN CAPITAL LETTER EPSILON WITH TILDE AND GRAVE

ɩ / iota = U+0269
  : "ɩ́"   LATIN SMALL LETTER IOTA WITH ACUTE
  : "ɩ̀"   LATIN SMALL LETTER IOTA WITH GRAVE
  : "ɩ̂"   LATIN SMALL LETTER IOTA WITH CIRCUMFLEX
  : "ɩ̌"   LATIN SMALL LETTER IOTA WITH CARON
  : "ɩ̄"   LATIN SMALL LETTER IOTA WITH MACRON

Ɩ / IOTA = U+0196
  : "Ɩ́"   LATIN CAPITAL LETTER IOTA WITH ACUTE
  : "Ɩ̀"   LATIN CAPITAL LETTER IOTA WITH GRAVE
  : "Ɩ̂"   LATIN CAPITAL LETTER IOTA WITH CIRCUMFLEX
  : "Ɩ̌"   LATIN CAPITAL LETTER IOTA WITH CARON
  : "Ɩ̄"   LATIN CAPITAL LETTER IOTA WITH MACRON

ɔ / open o = U+0254
  : "ɔ́"   LATIN SMALL LETTER OPEN O WITH ACUTE
  : "ɔ̀"   LATIN SMALL LETTER OPEN O WITH GRAVE
  : "ɔ̂"   LATIN SMALL LETTER OPEN O WITH CIRCUMFLEX
  : "ɔ̌"   LATIN SMALL LETTER OPEN O WITH CARON
  : "ɔ̄"   LATIN SMALL LETTER OPEN O WITH MACRON
  : "ɔ̃"   LATIN SMALL LETTER OPEN O WITH TILDE
  : "ɔ̃́"   LATIN SMALL LETTER OPEN O WITH TILDE AND ACUTE
  : "ɔ̃̀"   LATIN SMALL LETTER OPEN O WITH TILDE AND GRAVE

Ɔ / OPEN O = U+0186
  : "Ɔ́"   LATIN CAPITAL LETTER OPEN O WITH ACUTE
  : "Ɔ̀"   LATIN CAPITAL LETTER OPEN O WITH GRAVE
  : "Ɔ̂"   LATIN CAPITAL LETTER OPEN O WITH CIRCUMFLEX
  : "Ɔ̌"   LATIN CAPITAL LETTER OPEN O WITH CARON
  : "Ɔ̄"   LATIN CAPITAL LETTER OPEN O WITH MACRON
  : "Ɔ̃"   LATIN CAPITAL LETTER OPEN O WITH TILDE
  : "Ɔ̃́"   LATIN CAPITAL LETTER OPEN O WITH TILDE AND ACUTE
  : "Ɔ̃̀"   LATIN CAPITAL LETTER OPEN O WITH TILDE AND GRAVE

ǝ / turned e = U+01DD
  : "ǝ́"   LATIN SMALL LETTER TURNED E WITH ACUTE
  : "ǝ̀"   LATIN SMALL LETTER TURNED E WITH GRAVE
  : "ǝ̂"   LATIN SMALL LETTER TURNED E WITH CIRCUMFLEX
  : "ǝ̌"   LATIN SMALL LETTER TURNED E WITH CARON
  : "ǝ̄"   LATIN SMALL LETTER TURNED E WITH MACRON
  : "ǝ̃"   LATIN SMALL LETTER TURNED E WITH TILDE
  : "ǝ̃́"   LATIN SMALL LETTER TURNED E WITH TILDE AND ACUTE
  : "ǝ̃̀"   LATIN SMALL LETTER TURNED E WITH TILDE AND GRAVE

Ǝ / TURNED E = U+018E
  : "Ǝ́"   LATIN CAPITAL LETTER TURNED E WITH ACUTE
  : "Ǝ̀"   LATIN CAPITAL LETTER TURNED E WITH GRAVE
  : "Ǝ̂"   LATIN CAPITAL LETTER TURNED E WITH CIRCUMFLEX
  : "Ǝ̌"   LATIN CAPITAL LETTER TURNED E WITH CARON
  : "Ǝ̄"   LATIN CAPITAL LETTER TURNED E WITH MACRON
  : "Ǝ̃"   LATIN CAPITAL LETTER TURNED E WITH TILDE
 : "Ǝ̃́"   LATIN