Re: what's the correct way to handle unicode? - trying to print out graphemes here.
On Tuesday, 3 July 2018 at 14:39:34 UTC, ag0aep6g wrote:
> Looks like forum.dlang.org has a problem when they appear side by side.
>
> Works (in the preview): 👩‍👩‍👦‍👦 🏳️‍🌈
> Doesn't work: 👩‍👩‍👦‍👦🏳️‍🌈

To me, it looks as if the font in use has ligatures for these faces. I'm on Mozilla under Linux; I guess it's the 'EmojiOne Mozilla' font.
Re: what's the correct way to handle unicode? - trying to print out graphemes here.
On 07/04/2018 05:12 PM, aliak wrote:
> Is updating unicode stuff to the latest a matter of some config file somewhere with the code point configurations that result in specific graphemes?

I don't know.

[...]
> Also, any reason (technical or otherwise) that we have to slice a grapheme to get it printed? Or just no one implemented something like toString or the like?

I don't know.

[...]
> I can't really imagine anyone figuring out that they have to slice a grapheme to get it to print 🤔

You can figure it out by reading the documentation for `Grapheme`. However, the documentation doesn't make it clear that `byGrapheme` is a range of `Grapheme`s. That's an easy fix, though: https://github.com/dlang/phobos/pull/6627
Re: what's the correct way to handle unicode? - trying to print out graphemes here.
On Tuesday, 3 July 2018 at 14:37:32 UTC, Adam D. Ruppe wrote:
> On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
>> [...]
>
> What system are you on? Successfully printing this stuff depends on a lot of display details too, like writeln goes to a terminal/console and they are rarely configured to support such characters by default.
>
> You might actually be better off printing it to a file instead of to a display, then opening that file in your browser or something, just to confirm the code printed is correctly displayed by the other program.
>
> [...]
>
> prolly just printing `c` itself would work and if not try `c[]` but then again it might see it as multiple graphemes, idk if it is even implemented.

Just `c` didn't, but `c[]` seems like the thing to do! Thankies!

Terminal on osx, and yeah, you're right. Seems like just trying to paste the rainbow flag right into the terminal results in the 3 separate code points.
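To see exactly which code points a terminal is being handed, it can help to dump them as hex instead of relying on what gets rendered. The sketch below is only an illustration: `codePoints` is a hypothetical helper, not a Phobos function, and the rainbow flag is spelled with escapes so the source stays ASCII. Two of the code points in the sequence (the variation selector and the zero-width joiner) are normally invisible, which may be why fewer pieces show up on screen.

```d
import std.array : join;
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: list the code points of s in U+XXXX notation.
string codePoints(string s)
{
    string[] parts;
    foreach (dchar c; s) // dchar makes foreach decode UTF-8 into code points
        parts ~= format("U+%04X", cast(uint) c);
    return parts.join(" ");
}

void main()
{
    // Rainbow flag: white flag, variation selector-16, zero-width joiner, rainbow.
    writeln(codePoints("\U0001F3F3\uFE0F\u200D\U0001F308"));
    // prints "U+1F3F3 U+FE0F U+200D U+1F308"
}
```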
Re: what's the correct way to handle unicode? - trying to print out graphemes here.
On Tuesday, 3 July 2018 at 14:43:37 UTC, Steven Schveighoffer wrote:
> On 7/3/18 10:37 AM, ag0aep6g wrote:
>> On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
>>> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
>>>     writeln(c);
>>> }
>>>
>>> So basically the above just doesn't work. Prints gibberish.
>>
>> Because you're printing one UTF-8 code unit (`char`) per line.
>>
>>> So I figured, std.uni.byGrapheme would help, since that's what they are, but I can't get it to print them back out? Is there a way?
>>>
>>> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
>>>     writeln(c.);
>>> }
>>
>> You're looking for `c[]`. But that won't work, because std.uni apparently doesn't recognize those as grapheme clusters. The emojis may be too new. std.uni is based on Unicode version 6.2, which is a couple years old.
>
> Oops! I didn't realize this, ignore my message about reporting a bug.
>
> I still think it's very odd for printing a grapheme to print the data structure.
>
> -Steve

Aha, ok I see. Many gracias! Though, it seems that by "a couple years old" you mean 6 years! :)

Is updating unicode stuff to the latest a matter of some config file somewhere with the code point configurations that result in specific graphemes? Feels kinda ... quite bad that we're 6 years behind the current standard.

Also, any reason (technical or otherwise) that we have to slice a grapheme to get it printed? Or just no one implemented something like toString or the like? It's quite non-intuitive as it is right now IMO. I can't really imagine anyone figuring out that they have to slice a grapheme to get it to print 🤔

Cheers,
- Ali
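Whether a given emoji sequence counts as one grapheme cluster depends on the Unicode tables shipped with the installed Phobos, so one way to find out is to ask std.uni directly. This is a rough sketch: `graphemeCount` is a hypothetical helper name, and no expected output is given because the answer is assumed to vary between Phobos releases.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helper: how many grapheme clusters does std.uni see in s?
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // Family emoji: woman, ZWJ, woman, ZWJ, boy, ZWJ, boy (7 code points).
    enum family = "\U0001F469\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466";

    // 1 means the whole ZWJ sequence is treated as a single cluster;
    // anything larger means this std.uni splits it into several.
    writeln(graphemeCount(family));
}
```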
Re: what's the correct way to handle unicode? - trying to print out graphemes here.
On 7/3/18 10:37 AM, ag0aep6g wrote:
> On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
>> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
>>     writeln(c);
>> }
>>
>> So basically the above just doesn't work. Prints gibberish.
>
> Because you're printing one UTF-8 code unit (`char`) per line.
>
>> So I figured, std.uni.byGrapheme would help, since that's what they are, but I can't get it to print them back out? Is there a way?
>>
>> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
>>     writeln(c.);
>> }
>
> You're looking for `c[]`. But that won't work, because std.uni apparently doesn't recognize those as grapheme clusters. The emojis may be too new. std.uni is based on Unicode version 6.2, which is a couple years old.

Oops! I didn't realize this, ignore my message about reporting a bug.

I still think it's very odd for printing a grapheme to print the data structure.

-Steve
Re: what's the correct way to handle unicode? - trying to print out graphemes here.
On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
> So basically the above just doesn't work. Prints gibberish.

What system are you on? Successfully printing this stuff depends on a lot of display details too, like writeln goes to a terminal/console and they are rarely configured to support such characters by default.

You might actually be better off printing it to a file instead of to a display, then opening that file in your browser or something, just to confirm the code printed is correctly displayed by the other program.

> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
>     writeln(c.);

prolly just printing `c` itself would work and if not try `c[]` but then again it might see it as multiple graphemes, idk if it is even implemented.
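The file-based check suggested above could look roughly like this. It is a minimal sketch: `dumpToFile` is a hypothetical helper and `graphemes.txt` an arbitrary file name; the emoji are written with escapes so the source file stays ASCII.

```d
import std.stdio : File;

// Hypothetical helper: write s to a file as UTF-8, bypassing the terminal.
void dumpToFile(string path, string s)
{
    auto f = File(path, "w"); // closed automatically when f goes out of scope
    f.writeln(s);
}

void main()
{
    // Family (woman, woman, boy, boy) followed by the rainbow flag.
    enum family = "\U0001F469\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466";
    enum flag = "\U0001F3F3\uFE0F\u200D\U0001F308";

    dumpToFile("graphemes.txt", family ~ flag);
}
```

Opening the resulting file in a browser or an editor with emoji support then shows whether the bytes are right, independently of how the terminal is configured.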
Re: what's the correct way to handle unicode? - trying to print out graphemes here.
On Tuesday, 3 July 2018 at 13:36:56 UTC, aliak wrote:
> Hehe I guess the forum really is using D :p
>
> The two graphemes I'm talking about (which seem to not be rendered correctly above) are:
>
> family emoji: https://emojipedia.org/family-woman-woman-boy-boy/
> rainbow flag: https://emojipedia.org/rainbow-flag/

Looks like forum.dlang.org has a problem when they appear side by side.

Works (in the preview): 👩‍👩‍👦‍👦 🏳️‍🌈
Doesn't work: 👩‍👩‍👦‍👦🏳️‍🌈
Re: what's the correct way to handle unicode? - trying to print out graphemes here.
On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
>     writeln(c);
> }
>
> So basically the above just doesn't work. Prints gibberish.

Because you're printing one UTF-8 code unit (`char`) per line.

> So I figured, std.uni.byGrapheme would help, since that's what they are, but I can't get it to print them back out? Is there a way?
>
> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
>     writeln(c.);
> }

You're looking for `c[]`. But that won't work, because std.uni apparently doesn't recognize those as grapheme clusters. The emojis may be too new. std.uni is based on Unicode version 6.2, which is a couple years old.
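To make the difference between code units, code points, and grapheme clusters concrete, here is a minimal sketch. `Counts` and `count` are hypothetical names introduced for illustration, and the sample string uses a combining accent rather than emoji, so the result should not depend on how recent the Unicode tables in std.uni are.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical container for the three ways of counting "characters".
struct Counts
{
    size_t units;    // UTF-8 code units (char)
    size_t points;   // code points (dchar)
    size_t clusters; // grapheme clusters (Grapheme)
}

// Hypothetical helper, not part of Phobos.
Counts count(string s)
{
    Counts r;
    foreach (char c; s)  // no decoding: one iteration per UTF-8 code unit
        ++r.units;
    foreach (dchar c; s) // foreach decodes UTF-8 when asked for dchar
        ++r.points;
    r.clusters = s.byGrapheme.walkLength;
    return r;
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as one character.
    writeln(count("noe\u0301l")); // prints "Counts(6, 5, 4)"
}
```

Printing each `char` of such a string on its own line splits the two-byte accent into separate lines, which is where the gibberish comes from.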
Re: what's the correct way to handle unicode? - trying to print out graphemes here.
On 7/3/18 9:32 AM, aliak wrote:
> Hi, trying to figure out how to loop through a string of characters and then spit them back out.
>
> Eg:
>
> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
>     writeln(c);
> }
>
> So basically the above just doesn't work. Prints gibberish.
>
> So I figured, std.uni.byGrapheme would help, since that's what they are, but I can't get it to print them back out? Is there a way?
>
> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
>     writeln(c.);
> }
>
> And then if I type the loop variable as dchar, then it seems that the family emoji is printed out as 4 faces - so the code points I guess - and the rainbow flag is other stuff (also its code points I assume)

Yeah, it appears that you can't actually print a grapheme. I would have assumed writeln(c) works. It does work, it just prints the struct data instead of converting back to utf.

> Is there a type that I can use to store graphemes and then output them as a grapheme as well? Or do I have to use like lib ICU maybe or something similar?

I honestly can't figure it out. I think directly writing graphemes as viewable UTF was not something that was considered. Definitely needs a bugzilla issue.

-Steve
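One possible workaround for storing graphemes and printing them later is to convert each `Grapheme` back into a string by slicing it. The sketch below assumes that `g[]` yields the cluster's code points as a range of `dchar` (which is what the `c[]` suggestion elsewhere in this thread relies on); `toUtf8` is a hypothetical helper name, not a Phobos function.

```d
import std.array : array;
import std.conv : to;
import std.stdio : writeln;
import std.uni : byGrapheme, Grapheme;

// Hypothetical helper: re-encode one grapheme cluster as a UTF-8 string.
string toUtf8(Grapheme g)
{
    // g[] is a range over the cluster's code points; collect them into
    // a dchar[] and let std.conv transcode that to UTF-8.
    return g[].array.to!string;
}

void main()
{
    // 'e' + U+0301 (combining acute accent) forms a single cluster.
    foreach (g; "noe\u0301l".byGrapheme)
        writeln(toUtf8(g)); // one cluster per line: n, o, e with accent, l
}
```

The resulting strings can be kept in an ordinary `string[]` and written out with `writeln` like any other text.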
Re: what's the correct way to handle unicode? - trying to print out graphemes here.
On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
> Hi, trying to figure out how to loop through a string of characters and then spit them back out.
>
> Eg:
>
> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
>     writeln(c);
> }
>
> So basically the above just doesn't work. Prints gibberish.
>
> So I figured, std.uni.byGrapheme would help, since that's what they are, but I can't get it to print them back out? Is there a way?
>
> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
>     writeln(c.);
> }
>
> And then if I type the loop variable as dchar, then it seems that the family emoji is printed out as 4 faces - so the code points I guess - and the rainbow flag is other stuff (also its code points I assume)
>
> Is there a type that I can use to store graphemes and then output them as a grapheme as well? Or do I have to use like lib ICU maybe or something similar?
>
> Cheers,
> - Ali

Hehe I guess the forum really is using D :p

The two graphemes I'm talking about (which seem to not be rendered correctly above) are:

family emoji: https://emojipedia.org/family-woman-woman-boy-boy/
rainbow flag: https://emojipedia.org/rainbow-flag/
what's the correct way to handle unicode? - trying to print out graphemes here.
Hi, trying to figure out how to loop through a string of characters and then spit them back out.

Eg:

    foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
        writeln(c);
    }

So basically the above just doesn't work. Prints gibberish.

So I figured, std.uni.byGrapheme would help, since that's what they are, but I can't get it to print them back out? Is there a way?

    foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
        writeln(c.);
    }

And then if I type the loop variable as dchar, then it seems that the family emoji is printed out as 4 faces - so the code points I guess - and the rainbow flag is other stuff (also its code points I assume)

Is there a type that I can use to store graphemes and then output them as a grapheme as well? Or do I have to use like lib ICU maybe or something similar?

Cheers,
- Ali