Re: GREETINGS FROM iSTANBUL

2021-08-01 Thread Salih Dincer via Digitalmars-d-learn

On Sunday, 1 August 2021 at 18:22:05 UTC, Paul Backus wrote:


A common solution to this in other languages is to have a 
version of toUpper that takes a locale as an argument. Some 
examples:


- JavaScript: 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toLocaleUpperCase


I did not know that; it is exactly what I wanted to talk about. Such 
clean code...


Thank you Paul.


Re: GREETINGS FROM iSTANBUL

2021-08-01 Thread Salih Dincer via Digitalmars-d-learn

On Sunday, 1 August 2021 at 18:22:05 UTC, Paul Backus wrote:
On Sunday, 1 August 2021 at 17:56:00 UTC, rikki cattermole 
wrote:

It appears you are using the wrong lowercase character.


I think so too, here's the proof:
```d
import std.string, std.stdio;

void main()
{
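  // Starts with U+0131, the dotless lower-case 'ı'.
  auto istanbul = "\u0131stanbul";
  enum capitalized = "Istanbul";
  // Both the dotless 'ı' and the ASCII 'i' capitalize to ASCII 'I'.
  assert(istanbul.capitalize == capitalized);
  assert("istanbul".capitalize == capitalized);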
}
```
Different first characters, but both capitalize to the same result...


Re: GREETINGS FROM iSTANBUL

2021-08-01 Thread Paul Backus via Digitalmars-d-learn

On Sunday, 1 August 2021 at 17:56:00 UTC, rikki cattermole wrote:

It appears you are using the wrong lowercase character.

https://en.wikipedia.org/wiki/Dotted_and_dotless_I

From a quick experiment, it appears std.uni is treating the 
lower case of the upper-case dotted I as a grapheme, which it 
probably shouldn't, as there is an actual character for that.


We might need to update our Unicode database... or something.


It's not the wrong lower-case character. Turkish uses U+0069 
(a.k.a. ASCII 'i') for lower-case dotted I, but has a non-default 
case mapping that pairs U+0069 with U+0130 ('İ') rather than 
U+0049 (ASCII 'I'). Phobos' std.uni uses the default case mapping 
for its toUpper function, so it does not produce the correct 
result for Turkish text.


Source: https://www.unicode.org/faq/casemap_charprop.html#1

A common solution to this in other languages is to have a version 
of toUpper that takes a locale as an argument. Some examples:


- JavaScript: 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toLocaleUpperCase

- Go: https://pkg.go.dev/strings#ToUpperSpecial
- Java: 
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#toUpperCase(java.util.Locale)
- C#: 
https://docs.microsoft.com/en-US/dotnet/api/system.string.toupper?view=net-5.0
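
For illustration, here is a minimal sketch of what a Turkish-aware 
upper-casing helper could look like in D. The name toUpperTurkish is 
hypothetical and not part of Phobos; it only special-cases U+0069 and 
defers to std.uni.toUpper for everything else, so it is not a general 
locale API.

```d
import std.array : appender;
import std.uni : toUpper;

/// Hypothetical helper, not part of Phobos: upper-cases `text` with the
/// Turkish mapping for 'i' and the default mapping for all other characters.
string toUpperTurkish(string text)
{
    auto result = appender!string();
    foreach (dchar c; text) // decode the UTF-8 string one code point at a time
    {
        if (c == 'i')
            result.put('\u0130'); // Turkish: U+0069 'i' pairs with U+0130 'İ'
        else
            result.put(toUpper(c)); // default mapping, which already sends U+0131 'ı' to 'I'
    }
    return result.data;
}

void main()
{
    // The default mapping in std.uni pairs 'i' with ASCII 'I'.
    assert("istanbul".toUpper == "ISTANBUL");

    // The sketch pairs 'i' with U+0130 instead.
    assert(toUpperTurkish("istanbul") == "\u0130STANBUL");

    // The dotless 'ı' (U+0131) still becomes ASCII 'I'.
    assert(toUpperTurkish("\u0131s\u0131") == "ISI");
}
```

A real solution would more likely take the locale as a parameter, as 
the functions linked above do, instead of hard-coding one language.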


Re: GREETINGS FROM iSTANBUL

2021-08-01 Thread rikki cattermole via Digitalmars-d-learn

It appears you are using the wrong lowercase character.

https://en.wikipedia.org/wiki/Dotted_and_dotless_I

From a quick experiment, it appears std.uni is treating the lower case 
of the upper-case dotted I as a grapheme, which it probably shouldn't, 
as there is an actual character for that.


We might need to update our Unicode database... or something.