Re: Unicode Bidi Algorithm – Java reference implementation

2016-09-17 Thread Deepak Jois
On Sat, Sep 17, 2016 at 9:53 PM, Khaled Hosny wrote:
> I think there is a C implementation that is kept up to date,

Yes, I found that one after I posted. FWIW, here are the changes for
the latest version:

https://gist.github.com/deepakjois/5a3ae81a105abd3523ed0efe2e52f52e/revisions

> is also a Python implementation that should pass the tests

That implementation looks very different from the C and Java versions.
I can’t tell at a glance whether it has been updated for the changes in
Unicode 8.0, but it definitely will not pass the tests in
BidiCharacterTest.txt because it lacks support for paired brackets.
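
For context on the paired-bracket step, here is a minimal Python sketch
of how bracket pairs can be identified, loosely following rule BD16 of
UAX #9. It is not taken from any of the implementations mentioned in
this thread. The name find_bracket_pairs and the three-entry ASCII
bracket table are assumptions made for illustration; a real
implementation would load the pairs from BidiBrackets.txt and would
also deal with canonical equivalents and bidi classes.

```python
# Simplified sketch of bracket-pair identification (rule BD16, UAX #9).
# Assumption: only three ASCII pairs are known here; real code reads
# the full table from BidiBrackets.txt.
PAIRS = {"(": ")", "[": "]", "{": "}"}  # opening -> closing
CLOSERS = set(PAIRS.values())
MAX_DEPTH = 63  # stack limit given in UAX #9


def find_bracket_pairs(text):
    """Return (open_index, close_index) pairs, sorted by open_index."""
    stack = []  # entries: (expected closing bracket, opener position)
    pairs = []
    for pos, ch in enumerate(text):
        if ch in PAIRS:
            if len(stack) == MAX_DEPTH:
                break  # no room left: this sketch simply stops scanning
            stack.append((PAIRS[ch], pos))
        elif ch in CLOSERS:
            # Search from the top of the stack downwards for a match.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == ch:
                    pairs.append((stack[i][1], pos))
                    del stack[i:]  # pop through the matched element
                    break
            # No match: the closer is skipped, the stack is unchanged.
    return sorted(pairs)


print(find_bracket_pairs("a(b[c)d]"))
```

In the last line only the parentheses form a pair: matching ")" pops
the unmatched "[" along with "(", so the later "]" finds nothing to
pair with.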

I just finished writing a reference implementation in Lua[1] which is
a line-by-line port of the Java reference implementation and passes
nearly all tests in BidiCharacterTest.txt.
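
For anyone wanting to run the same tests against another port, here is
a rough Python sketch of reading one data line of
BidiCharacterTest.txt. It assumes the five semicolon-separated fields
described in the file header (code points, paragraph direction,
resolved paragraph level, resolved levels, visual order). The names
BidiCase and parse_case are hypothetical, and the sample line is made
up in the same layout rather than copied from the file.

```python
from typing import List, NamedTuple, Optional


class BidiCase(NamedTuple):
    text: str                    # decoded from the hex code points
    paragraph_direction: int     # 0 = LTR, 1 = RTL, 2 = auto-LTR
    paragraph_level: int         # expected resolved paragraph level
    levels: List[Optional[int]]  # None where the file has "x"
    visual_order: List[int]      # logical indices, left to right


def parse_case(line: str) -> Optional[BidiCase]:
    """Parse one data line; return None for comments and blank lines."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    fields = [f.strip() for f in line.split(";")]
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    text = "".join(chr(int(cp, 16)) for cp in fields[0].split())
    levels = [None if lv == "x" else int(lv) for lv in fields[3].split()]
    order = [int(i) for i in fields[4].split()]
    return BidiCase(text, int(fields[1]), int(fields[2]), levels, order)


# Made-up sample line in the same layout (not copied from the file).
case = parse_case("0061 0062 0063;0;0;0 0 0;0 1 2")
print(case.text, case.levels, case.visual_order)
```

A test harness would then feed case.text and case.paragraph_direction
to the implementation under test and compare its output with
case.levels and case.visual_order, skipping positions marked None.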

I now need to make the updates to support the changes in Unicode 8.0,
and I am finding it a bit hard to grok the changes in the C code at a
glance.

Deepak

[1]: https://github.com/deepakjois/luabidi/blob/master/src/bidi.lua



Re: Unicode Bidi Algorithm – Java reference implementation

2016-09-17 Thread Khaled Hosny
On Sat, Sep 17, 2016 at 05:01:10PM +0530, Deepak Jois wrote:
> Hi
> 
> It seems that the Java reference implementation for the Unicode Bidi
> algorithm that I downloaded from the unicode.org site fails against
> some test cases in the BidiCharacterTest.txt file – the ones that are
> specifically meant to test for changes in Unicode 8.0.
> 
> Has the reference implementation been updated, and does anyone have a
> copy they can share? Is there a reference implementation in some other
> language that I could look at, which has been updated?

I think there is a C implementation that is kept up to date, and there
is also a Python implementation that should pass the tests:
https://github.com/behdad/pybyedie

Regards,
Khaled


Unicode Bidi Algorithm – Java reference implementation

2016-09-17 Thread Deepak Jois
Hi

It seems that the Java reference implementation for the Unicode Bidi
algorithm that I downloaded from the unicode.org site fails against
some test cases in the BidiCharacterTest.txt file – the ones that are
specifically meant to test for changes in Unicode 8.0.

Has the reference implementation been updated, and does anyone have a
copy they can share? Is there a reference implementation in some other
language that I could look at, which has been updated?

Thank you
Deepak



Re: Dataset for all ISO639 code sorted by country/territory?

2016-09-17 Thread Mats Blakstad
I managed to find a dataset on the Ethnologue website, though it doesn't
look like it is open source; I need to check with them exactly how I'm
allowed to use it:
http://www.ethnologue.com/codes/download-code-tables

Thanks for the explanation, Philippe. I know it is not an easy issue. I am
looking for different resources on the web, so any specific links or
feedback would be helpful.

On 17 September 2016 at 13:35, Philippe Verdy wrote:

> Not all languages are covered, only those for which data has been
> released in CLDR.
> Languages also frequently belong to several countries/territories at the
> same time, with different official or recognized status (which is itself
> independent of the number of actual speakers, a number that is very
> often only roughly estimated).
> Some countries publish official statistics about their national or
> regional languages, but these figures are frequently old, or
> underestimated or overestimated for political reasons. Some languages
> are merged as if they were a single language, or simply discarded when
> they are considered locally to be secondary languages, even where the
> official language is only superficially understood but is counted as the
> primary one.
> Statistics also omit native speakers living abroad in a diaspora, and
> secondary learners of a language taught in foreign countries.
>
>
> 2016-09-17 11:19 GMT+02:00 Mats Blakstad:
>
>> Hi
>>
>> Is there any dataset that contains all languages in the world sorted by
>> country/territory?
>>
>> I found this at Unicode, but it seems to contain only the most widely
>> spoken languages in each country and not the smaller ones:
>> http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html
>>
>> Thanks in advance for help.
>>
>> Best regards
>> Mats Blakstad
>>
>
>


Re: Dataset for all ISO639 code sorted by country/territory?

2016-09-17 Thread Philippe Verdy
Not all languages are covered, only those for which data has been
released in CLDR.
Languages also frequently belong to several countries/territories at the
same time, with different official or recognized status (which is itself
independent of the number of actual speakers, a number that is very
often only roughly estimated).
Some countries publish official statistics about their national or
regional languages, but these figures are frequently old, or
underestimated or overestimated for political reasons. Some languages
are merged as if they were a single language, or simply discarded when
they are considered locally to be secondary languages, even where the
official language is only superficially understood but is counted as the
primary one.
Statistics also omit native speakers living abroad in a diaspora, and
secondary learners of a language taught in foreign countries.


2016-09-17 11:19 GMT+02:00 Mats Blakstad:

> Hi
>
> Is there any dataset that contains all languages in the world sorted by
> country/territory?
>
> I found this at Unicode, but it seems to contain only the most widely
> spoken languages in each country and not the smaller ones:
> http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html
>
> Thanks in advance for help.
>
> Best regards
> Mats Blakstad
>


Re: Dataset for all ISO639 code sorted by country/territory?

2016-09-17 Thread Otto Stolz

Hello,

On 2016-09-17 at 11:19, Mats Blakstad wrote:

> Is there any dataset that contains all languages in the world sorted by
> country/territory?


Have you tried , already?

Also, 
and 
may provide partial answers.

Best wishes,
  Otto Stolz



Dataset for all ISO639 code sorted by country/territory?

2016-09-17 Thread Mats Blakstad
Hi

Is there any dataset that contains all languages in the world sorted by
country/territory?

I found this at Unicode, but it seems to contain only the most widely
spoken languages in each country and not the smaller ones:
http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html

Thanks in advance for help.

Best regards
Mats Blakstad