Re: what's the correct way to handle unicode? - trying to print out graphemes here.

2018-07-04 Thread crimaniak via Digitalmars-d-learn

On Tuesday, 3 July 2018 at 14:39:34 UTC, ag0aep6g wrote:

Looks like forum.dlang.org has a problem when they appear 
side-by-side.


Works (in the preview): 👩‍👩‍👦‍👦 🏳️‍🌈
Doesn't work: 👩‍👩‍👦‍👦🏳️‍🌈


To me, it looks as if the font in use has ligatures for these faces. 
Mozilla under Linux; I guess it's the 'EmojiOne Mozilla' font.




Re: what's the correct way to handle unicode? - trying to print out graphemes here.

2018-07-04 Thread ag0aep6g via Digitalmars-d-learn

On 07/04/2018 05:12 PM, aliak wrote:

Is updating unicode stuff to the latest a matter of some config file
somewhere with the code point configurations that result in specific
graphemes?


I don't know.

[...]
Also, any reason (technical or otherwise) that we have to slice a 
grapheme to get it printed? Or just no one implemented something like
toString or the like?


I don't know.

[...]

I can't really imagine anyone figuring out that they have to slice a
grapheme to get it to print 🤔


You can figure it out by reading the documentation for `Grapheme`.
However, the documentation doesn't make it clear that `byGrapheme` is a
range of `Grapheme`s. That's an easy fix, though:

https://github.com/dlang/phobos/pull/6627
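
For reference, here is a minimal sketch of what the `Grapheme` documentation describes: each element of `byGrapheme` is a `Grapheme`, which exposes `length`, indexing and slicing over its code points. The helper name `firstGraphemeLength` and the sample string are made up for illustration only.

```d
import std.stdio : writeln;
import std.uni : byGrapheme, Grapheme;

/// Hypothetical helper: number of code points in the first grapheme of `s`.
size_t firstGraphemeLength(string s)
{
    auto r = s.byGrapheme;
    Grapheme g = r.front;   // the range's elements are `Grapheme` values
    return g.length;        // length counts code points, not bytes
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT forms one grapheme.
    auto r = "e\u0301x".byGrapheme;
    Grapheme g = r.front;

    writeln(g.length);  // 2 (code points in this grapheme)
    writeln(g[0]);      // first code point, the plain 'e'
    writeln(g[]);       // the slice prints as text rather than as a struct
}
```

The explicit `Grapheme g = r.front;` is only there to make the element type visible; `auto` works just as well.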


Re: what's the correct way to handle unicode? - trying to print out graphemes here.

2018-07-04 Thread aliak via Digitalmars-d-learn

On Tuesday, 3 July 2018 at 14:37:32 UTC, Adam D. Ruppe wrote:

On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:

[...]


What system are you on? Successfully printing this stuff 
depends on a lot of display details too, like writeln goes to a 
terminal/console and they are rarely configured to support such 
characters by default.


You might actually be better off printing it to a file instead 
of to a display, then opening that file in your browser or 
something, just to confirm the code printed is correctly 
displayed by the other program.



  [...]


prolly just printing `c` itself would work and if not try `c[]`

but then again it might see it as multiple graphemes, idk if it 
is even implemented.


Just 'c' didn't but 'c[]' seems like the thing to do! Thankies!

Terminal on osx, and yeah you're right. Seems like just trying to 
paste rainbow flag right in to terminal results in the 3 separate 
code points
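
To see exactly which code points a string holds, independent of what the terminal makes of them, something like the following sketch can help. The helper name `codePointNames` is made up; the escape sequence spells out the rainbow flag as listed on Emojipedia (white flag, variation selector-16, zero width joiner, rainbow), two of which are invisible on their own.

```d
import std.format : format;
import std.stdio : writeln;

/// Hypothetical helper: lists the code points of `s` in U+XXXX notation.
string[] codePointNames(string s)
{
    string[] result;
    foreach (dchar c; s)  // `dchar` makes foreach decode whole code points
        result ~= format("U+%04X", cast(uint) c);
    return result;
}

void main()
{
    // Rainbow flag written with escapes, so the source file stays ASCII.
    writeln(codePointNames("\U0001F3F3\uFE0F\u200D\U0001F308"));
    // prints ["U+1F3F3", "U+FE0F", "U+200D", "U+1F308"]
}
```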




Re: what's the correct way to handle unicode? - trying to print out graphemes here.

2018-07-04 Thread aliak via Digitalmars-d-learn
On Tuesday, 3 July 2018 at 14:43:37 UTC, Steven Schveighoffer 
wrote:

On 7/3/18 10:37 AM, ag0aep6g wrote:

On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:

foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
  writeln(c);
}

So basically the above just doesn't work. Prints gibberish.


Because you're printing one UTF-8 code unit (`char`) per line.

So I figured, std.uni.byGrapheme would help, since that's 
what they are, but I can't get it to print them back out? Is 
there a way?


foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
  writeln(c.);
}


You're looking for `c[]`. But that won't work, because std.uni 
apparently doesn't recognize those as grapheme clusters. The 
emojis may be too new. std.uni is based on Unicode version 
6.2, which is a couple years old.


Oops! I didn't realize this, ignore my message about reporting 
a bug.


I still think it's very odd for printing a grapheme to print 
the data structure.


-Steve



Aha, ok I see. Many gracias!

Though, seems by a couple years old you mean 6 years! :) Is 
updating unicode stuff to the latest a matter of some config file 
somewhere with the code point configurations that result in 
specific graphemes? Feels kinda ... quite bad that we're 6 years 
behind the current standard.


Also, any reason (technical or otherwise) that we have to slice a 
grapheme to get it printed? Or just no one implemented something 
like toString or the like? It's quite non-intuitive as it is 
right now IMO. I can't really imagine anyone figuring out that 
they have to slice a grapheme to get it to print 🤔


Cheers,
- Ali


Re: what's the correct way to handle unicode? - trying to print out graphemes here.

2018-07-03 Thread Steven Schveighoffer via Digitalmars-d-learn

On 7/3/18 10:37 AM, ag0aep6g wrote:

On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:

foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
  writeln(c);
}

So basically the above just doesn't work. Prints gibberish.


Because you're printing one UTF-8 code unit (`char`) per line.

So I figured, std.uni.byGrapheme would help, since that's what they 
are, but I can't get it to print them back out? Is there a way?


foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
  writeln(c.);
}


You're looking for `c[]`. But that won't work, because std.uni 
apparently doesn't recognize those as grapheme clusters. The emojis may 
be too new. std.uni is based on Unicode version 6.2, which is a couple 
years old.


Oops! I didn't realize this, ignore my message about reporting a bug.

I still think it's very odd for printing a grapheme to print the data 
structure.
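
Until something like a `toString` exists, a small wrapper around the slice can stand in. This is only a sketch: `asString` is a made-up helper, not part of Phobos, and it simply copies the grapheme's code points out and re-encodes them as UTF-8.

```d
import std.array : array;
import std.conv : to;
import std.stdio : writeln;
import std.uni : byGrapheme, Grapheme;

/// Hypothetical helper (not in Phobos): the text of a single grapheme.
string asString(Grapheme g)
{
    // g[] is a range over the code points; copy them, then encode as UTF-8.
    return g[].array.to!string;
}

void main()
{
    // "noël" spelled with 'e' + U+0308 COMBINING DIAERESIS.
    foreach (g; "noe\u0308l".byGrapheme)
        writeln(asString(g));  // one grapheme per line
}
```

Copying into a `string` costs an allocation per grapheme, so for plain printing `writeln(g[])` is cheaper; the helper is mainly useful when the grapheme has to be stored.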


-Steve


Re: what's the correct way to handle unicode? - trying to print out graphemes here.

2018-07-03 Thread Adam D. Ruppe via Digitalmars-d-learn

On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:

So basically the above just doesn't work. Prints gibberish.


What system are you on? Successfully printing this stuff depends 
on a lot of display details too, like writeln goes to a 
terminal/console and they are rarely configured to support such 
characters by default.


You might actually be better off printing it to a file instead of 
to a display, then opening that file in your browser or 
something, just to confirm the code printed is correctly 
displayed by the other program.
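
A minimal sketch of that file check, assuming a made-up file name `graphemes.txt` and a made-up helper `roundTrip`. It writes the rainbow flag (spelled with escapes) as UTF-8 and reads it back, so the bytes can be verified before opening the file in a browser.

```d
import std.file : readText, write;

/// Hypothetical helper: writes `text` to `path` and reads it back.
string roundTrip(string path, string text)
{
    write(path, text);      // std.file.write stores the raw UTF-8 bytes
    return readText(path);  // readText validates the UTF-8 on the way back
}

void main()
{
    // Rainbow flag: U+1F3F3 U+FE0F U+200D U+1F308
    enum flag = "\U0001F3F3\uFE0F\u200D\U0001F308";

    // If this passes, the program's output is fine and any remaining
    // garbage comes from whatever displays the text.
    assert(roundTrip("graphemes.txt", flag ~ "\n") == flag ~ "\n");
}
```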



foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
  writeln(c.);


prolly just printing `c` itself would work and if not try `c[]`

but then again it might see it as multiple graphemes, idk if it 
is even implemented.


Re: what's the correct way to handle unicode? - trying to print out graphemes here.

2018-07-03 Thread ag0aep6g via Digitalmars-d-learn

On Tuesday, 3 July 2018 at 13:36:56 UTC, aliak wrote:

Hehe I guess the forum really is using D :p

The two graphemes I'm talking about (which seem to not be 
rendered correctly above) are:


family emoji: https://emojipedia.org/family-woman-woman-boy-boy/
rainbow flag: https://emojipedia.org/rainbow-flag/


Looks like forum.dlang.org has a problem when they appear 
side-by-side.


Works (in the preview): 👩‍👩‍👦‍👦 🏳️‍🌈
Doesn't work: 👩‍👩‍👦‍👦🏳️‍🌈


Re: what's the correct way to handle unicode? - trying to print out graphemes here.

2018-07-03 Thread ag0aep6g via Digitalmars-d-learn

On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:

foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
  writeln(c);
}

So basically the above just doesn't work. Prints gibberish.


Because you're printing one UTF-8 code unit (`char`) per line.
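
A small sketch of the difference, using made-up helper names `countUnits` and `countPoints`: a plain `foreach` over a `string` walks UTF-8 code units, while typing the loop variable as `dchar` makes it decode whole code points.

```d
import std.stdio : writeln;

/// Hypothetical helper: number of UTF-8 code units (`char`s) in `s`.
size_t countUnits(string s)
{
    size_t n;
    foreach (char c; s)   // same as the untyped loop: one code unit per step
        ++n;
    return n;
}

/// Hypothetical helper: number of code points in `s`.
size_t countPoints(string s)
{
    size_t n;
    foreach (dchar c; s)  // decodes UTF-8 into code points
        ++n;
    return n;
}

void main()
{
    // U+00E9 (e with acute) is one code point but two UTF-8 code units.
    string s = "\u00E9";
    writeln(countUnits(s), " code units, ", countPoints(s), " code point(s)");
    // prints: 2 code units, 1 code point(s)
}
```

Printing each `char` on its own line splits multi-byte sequences apart, which is where the gibberish comes from.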

So I figured, std.uni.byGrapheme would help, since that's what 
they are, but I can't get it to print them back out? Is there a 
way?


foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
  writeln(c.);
}


You're looking for `c[]`. But that won't work, because std.uni 
apparently doesn't recognize those as grapheme clusters. The 
emojis may be too new. std.uni is based on Unicode version 6.2, 
which is a couple years old.
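
One way to check what a given Phobos makes of such a sequence is to count the graphemes it produces. This is a sketch with a made-up helper `graphemeCount`; the emoji result is deliberately left unstated, since it depends on the Unicode version std.uni implements.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

/// Hypothetical helper: how many grapheme clusters std.uni sees in `s`.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // 'e' + U+0301 COMBINING ACUTE ACCENT is a single cluster.
    writeln(graphemeCount("e\u0301"));  // 1

    // Rainbow flag (U+1F3F3 U+FE0F U+200D U+1F308). A result greater
    // than 1 means this std.uni does not treat it as one cluster.
    writeln(graphemeCount("\U0001F3F3\uFE0F\u200D\U0001F308"));
}
```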


Re: what's the correct way to handle unicode? - trying to print out graphemes here.

2018-07-03 Thread Steven Schveighoffer via Digitalmars-d-learn

On 7/3/18 9:32 AM, aliak wrote:
Hi, trying to figure out how to loop through a string of characters and 
then spit them back out.


Eg:

foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
   writeln(c);
}

So basically the above just doesn't work. Prints gibberish.

So I figured, std.uni.byGrapheme would help, since that's what they are, 
but I can't get it to print them back out? Is there a way?


foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
   writeln(c.);
}

And then if I type the loop variable as dchar, then it seems that the 
family emoji is printed out as 4 faces - so the code points I guess - 
and the rainbow flag is other stuff (also its code points I assume)


Yeah, it appears that you can't actually print a grapheme. I would have 
assumed writeln(c) works. It does work, it just prints the struct data 
instead of converting back to utf.


Is there a type that I can use to store graphemes and then output them 
as a grapheme as well? Or do I have to use like lib ICU maybe or 
something similar?


I honestly can't figure it out. I think directly writing graphemes as 
viewable UTF was not something that was considered.


Definitely needs a bugzilla issue.

-Steve


Re: what's the correct way to handle unicode? - trying to print out graphemes here.

2018-07-03 Thread aliak via Digitalmars-d-learn

On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
Hi, trying to figure out how to loop through a string of 
characters and then spit them back out.


Eg:

foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
  writeln(c);
}

So basically the above just doesn't work. Prints gibberish.

So I figured, std.uni.byGrapheme would help, since that's what 
they are, but I can't get it to print them back out? Is there a 
way?


foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
  writeln(c.);
}

And then if I type the loop variable as dchar, then it seems 
that the family emoji is printed out as 4 faces - so the code 
points I guess - and the rainbow flag is other stuff (also its 
code points I assume)


Is there a type that I can use to store graphemes and then 
output them as a grapheme as well? Or do I have to use like lib 
ICU maybe or something similar?


Cheers,
- Ali


Hehe I guess the forum really is using D :p

The two graphemes I'm talking about (which seem to not be 
rendered correctly above) are:


family emoji: https://emojipedia.org/family-woman-woman-boy-boy/
rainbow flag: https://emojipedia.org/rainbow-flag/



what's the correct way to handle unicode? - trying to print out graphemes here.

2018-07-03 Thread aliak via Digitalmars-d-learn
Hi, trying to figure out how to loop through a string of 
characters and then spit them back out.


Eg:

foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
  writeln(c);
}

So basically the above just doesn't work. Prints gibberish.

So I figured, std.uni.byGrapheme would help, since that's what 
they are, but I can't get it to print them back out? Is there a 
way?


foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
  writeln(c.);
}

And then if I type the loop variable as dchar, then it seems 
that the family emoji is printed out as 4 faces - so the code 
points I guess - and the rainbow flag is other stuff (also its 
code points I assume)


Is there a type that I can use to store graphemes and then output 
them as a grapheme as well? Or do I have to use like lib ICU 
maybe or something similar?


Cheers,
- Ali