Re: [elixir-core:7865] Proposal: String normalization in compatibility mode and transliteration

2018-02-12 Thread Michał Muskała
Is there some Unicode standard around transliteration?

As far as I know, transliteration is very language-specific: for example, 
Russian is transliterated differently when embedded in an English text than 
in a Polish text. 

Michał. 

> On 12.02.2018, at 14:30, Nicolas Goy wrote:
> 
> 1.
> 
> String.normalize should support the NFKC and NFKD Unicode normalization forms.
> 
> Reference: https://www.unicode.org/reports/tr15/
> 
> Those are particularly useful to generate "machine identifiers" from user 
> input, like usernames.
> 
> 2.
> 
> The second part (which is independent but related) is support for Unicode 
> transliteration.
> 
> Basically, this is a "non-destructive" Unicode-to-ASCII conversion.
> 
> There is a library doing it in Elixir:
> https://github.com/fcevado/unidecode 
> 
> and a JavaScript example:
> https://github.com/pid/speakingurl
> 
> Also some discussion on the forum:
> https://elixirforum.com/t/how-to-replace-accented-letters-with-ascii-letters/539/8
> 
> My thinking is that all those libraries do it a bit differently, 
> because, well, Unicode is hard.
> And because Unicode is so hard, I think it should be implemented at the 
> language level (or in a core library) so that it is done right and supported.
> It might not matter much for English readers, but for other languages it is 
> something you will eventually implement, often poorly.
> 
> Some references:
> http://cldr.unicode.org/index/cldr-spec/transliteration-guidelines
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elixir-lang-core" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elixir-lang-core+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elixir-lang-core/d2839fb2-984c-4bcf-b8fd-c891c8c24c83%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



[elixir-core:7865] Proposal: String normalization in compatibility mode and transliteration

2018-02-12 Thread Nicolas Goy
1.

String.normalize should support the NFKC and NFKD Unicode normalization forms.

Reference: https://www.unicode.org/reports/tr15/

Those are particularly useful to generate "machine identifiers" from user 
input, like usernames.
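
As a rough sketch of the username case, assuming Erlang/OTP 20 or later (whose :unicode module provides characters_to_nfkc_binary/1), compatibility normalization could be applied as below. The MachineId module and from_input/1 are hypothetical names used only for illustration, and the lowercasing step is my own assumption about what an identifier needs:

```elixir
defmodule MachineId do
  # Hypothetical helper, not part of Elixir or any library.
  # NFKC folds compatibility characters (fullwidth letters, ligatures, ...)
  # into their plain equivalents; lowercasing is an extra assumption here.
  def from_input(input) when is_binary(input) do
    input
    |> :unicode.characters_to_nfkc_binary()
    |> String.downcase()
  end
end

# Fullwidth Latin letters become ordinary ASCII letters.
IO.inspect(MachineId.from_input("Ｎｉｃｏｌａｓ"))
#=> "nicolas"

# The "ﬁ" ligature (U+FB01) becomes the two letters "fi".
IO.inspect(MachineId.from_input("ﬁona"))
#=> "fiona"
```

Note that this sketch assumes valid UTF-8 input; on invalid input the Erlang function returns an error tuple, which the sketch does not handle.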

2.

The second part (which is independent but related) is support for Unicode 
transliteration.

Basically, this is a "non-destructive" Unicode-to-ASCII conversion.

There is a library doing it in Elixir:
https://github.com/fcevado/unidecode 

and a JavaScript example:
https://github.com/pid/speakingurl

Also some discussion on the forum:
https://elixirforum.com/t/how-to-replace-accented-letters-with-ascii-letters/539/8
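
To make the idea concrete, here is a minimal sketch of one common approximation: decompose with NFD, drop the combining marks, and map the few letters that have no decomposition through a small table. This is not how the libraries above necessarily work and it is not CLDR transliteration; the AsciiFold module, fold/1, and the tiny fallback table are hypothetical and only for illustration:

```elixir
defmodule AsciiFold do
  # Hypothetical, deliberately tiny table for letters that have no
  # canonical decomposition. A real implementation would need far more.
  @fallback %{"ł" => "l", "Ł" => "L", "ø" => "o", "Ø" => "O", "ß" => "ss"}

  def fold(string) when is_binary(string) do
    string
    # Split accented letters into base letter + combining mark.
    |> String.normalize(:nfd)
    # Remove the combining marks (Unicode category Mn).
    |> String.replace(~r/\p{Mn}/u, "")
    # Replace remaining special letters using the fallback table.
    |> String.graphemes()
    |> Enum.map_join(&Map.get(@fallback, &1, &1))
  end
end

IO.inspect(AsciiFold.fold("Crème brûlée"))
#=> "Creme brulee"

IO.inspect(AsciiFold.fold("Straße"))
#=> "Strasse"
```

A fixed table like this cannot express language-specific conventions (German usually writes "ü" as "ue", while this sketch yields "u"), which is part of why the existing libraries differ.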

My thinking is that all those libraries do it a bit differently, 
because, well, Unicode is hard.
And because Unicode is so hard, I think it should be implemented at the 
language level (or in a core library) so that it is done right and supported.
It might not matter much for English readers, but for other languages it 
is something you will eventually implement, often poorly.

Some references:
http://cldr.unicode.org/index/cldr-spec/transliteration-guidelines
