Re: [go-nuts] understanding utf-8 for a newbie

2017-05-07 Thread Jan Mercl
On Sun, May 7, 2017 at 8:39 PM peterGo wrote: > "[Rob Pike and Ken Thompson] they made sure it was backwards compatible with ASCII." > ASCII is 7-bits. So is any UTF-8 encoded ASCII. -- -j -- You received this message because you are subscribed to the Google Groups
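
The point about 7-bit ASCII can be checked directly. The following is only a minimal sketch: `isASCII` is a helper name made up for this illustration, not something from the thread. It shows that an ASCII-only string has the same bytes under UTF-8 and that every one of those bytes is below 0x80.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII (hypothetical helper) reports whether every byte of s is
// below 0x80, i.e. whether each byte fits in 7 bits.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf { // utf8.RuneSelf is 0x80
			return false
		}
	}
	return true
}

func main() {
	plain := "hello"
	// Go strings hold UTF-8; for ASCII text the bytes are the ASCII codes.
	fmt.Printf("% x\n", []byte(plain)) // 68 65 6c 6c 6f
	// One byte per character, so byte count equals rune count.
	fmt.Println(isASCII(plain), len(plain) == utf8.RuneCountInString(plain))
	// Curly quotes are outside ASCII and encode to bytes >= 0x80.
	fmt.Println(isASCII("\u201chello\u201d"))
}
```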

Re: [go-nuts] understanding utf-8 for a newbie

2017-05-07 Thread peterGo
Sam, "[Rob Pike and Ken Thompson] they made sure it was backwards compatible with ASCII." ASCII is 7-bits. Peter On Sunday, May 7, 2017 at 11:29:53 AM UTC-4, Sam Whited wrote: > > On Sun, May 7, 2017 at 9:44 AM, rob solomon > wrote: > > I now understand that the bytes

Re: [go-nuts] understanding utf-8 for a newbie

2017-05-07 Thread peterGo
Sam, "I'd be surprised if Windows didn't understand UTF-8 these days," Be surprised! For Unicode, Microsoft Windows uses UTF-16. Peter On Sunday, May 7, 2017 at 11:29:53 AM UTC-4, Sam Whited wrote: > > On Sun, May 7, 2017 at 9:44 AM, rob solomon > wrote: > > I now
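
To make the UTF-8 versus UTF-16 difference concrete, here is a small sketch using the standard `unicode/utf16` package. The helper name `utf16Units` is invented for this example. It encodes the same open-quote character both ways, so the different storage forms can be compared.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf16Units (hypothetical helper) returns the UTF-16 code units for s.
func utf16Units(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

func main() {
	quote := "\u201c" // left double quotation mark

	// UTF-8: three bytes.
	fmt.Printf("UTF-8 bytes:  % x\n", []byte(quote)) // e2 80 9c

	// UTF-16: a single 16-bit code unit for this character.
	fmt.Printf("UTF-16 units: %04x\n", utf16Units(quote))

	// Characters outside the Basic Multilingual Plane need two units.
	fmt.Println(len(utf16Units("\U0001F600")))
}
```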

Re: [go-nuts] understanding utf-8 for a newbie

2017-05-07 Thread Sam Whited
On Sun, May 7, 2017 at 9:44 AM, rob solomon wrote: > I now understand that the bytes may be different. It's also worth noting that when Ken Thompson and Rob Pike (yes, the same Rob Pike and Ken Thompson that created Go) created UTF-8, they made sure it was backwards

[go-nuts] understanding utf-8 for a newbie

2017-05-07 Thread rob solomon
Thanks to those who answered. I grew up in the EBCDIC vs ASCII era, and I've always expected that the bytes in the file were the same as those that represented a character. I now understand that the bytes may be different. Thanks guys. -- rob solomon

Re: [go-nuts] understanding utf-8 for a newbie

2017-05-06 Thread Sam Whited
On Fri, May 5, 2017 at 8:11 PM, rob solomon wrote: > I decided to first change ", ' and emdash characters. Using hexdump -C in > Ubuntu, the runes in the file are: > > open quote = 0xE2809C > > close quote = 0xE2809D > > apostrophe = 0xE28099 > > emdash = 0xE28094 The

Re: [go-nuts] understanding utf-8 for a newbie

2017-05-05 Thread Andy Balholm
Hexdump shows the actual bytes in the file: the UTF-8 encoding of the runes (Unicode code points). Apparently you are reading them with utf8.DecodeRune or something like that; those return the code points, without the UTF-8 encoding. Andy
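
The difference between the bytes and the code points can be seen with a short sketch. It feeds the four byte sequences reported from hexdump in the thread to `utf8.DecodeRune`; the wrapper `decodeFirst` is a name made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeFirst (hypothetical helper) returns the first code point in b
// and the number of bytes its UTF-8 encoding occupies.
func decodeFirst(b []byte) (rune, int) {
	return utf8.DecodeRune(b)
}

func main() {
	// Byte sequences as hexdump showed them in the original question.
	samples := [][]byte{
		{0xE2, 0x80, 0x9C}, // open quote
		{0xE2, 0x80, 0x9D}, // close quote
		{0xE2, 0x80, 0x99}, // apostrophe
		{0xE2, 0x80, 0x94}, // em dash
	}
	for _, b := range samples {
		r, size := decodeFirst(b)
		// The first line printed is: bytes E2 80 9C -> U+201C (3 bytes)
		fmt.Printf("bytes % X -> U+%04X (%d bytes)\n", b, r, size)
	}
}
```

So the file holds three bytes per character, while the rune the program sees is the single code point (U+201C, U+201D, U+2019, U+2014).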

[go-nuts] understanding utf-8 for a newbie

2017-05-05 Thread rob solomon
Hi. I decided to write a small program in Go to convert utf8 to simple ASCII. The need arose when I copied a file created in Ubuntu 16.04 amd64 to a win10 computer and used it there. I decided to first change ", ' and emdash characters. Using hexdump -C in Ubuntu, the runes in the file are:
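
One possible shape for such a conversion, offered only as a sketch and not as the poster's actual program, is a `strings.NewReplacer` over the curly quotes, apostrophe and em dash. The names `asciiPunct` and `toASCII` are invented here, and replacing the em dash with `--` is an assumption about the desired output.

```go
package main

import (
	"fmt"
	"strings"
)

// asciiPunct maps the four characters named in the question to ASCII
// stand-ins. The "--" replacement for the em dash is an assumption.
var asciiPunct = strings.NewReplacer(
	"\u201c", `"`, // open quote
	"\u201d", `"`, // close quote
	"\u2019", "'", // apostrophe
	"\u2014", "--", // em dash
)

// toASCII (hypothetical helper) rewrites those characters and leaves
// every other character of s untouched.
func toASCII(s string) string {
	return asciiPunct.Replace(s)
}

func main() {
	in := "\u201cIt\u2019s done\u201d\u2014finally"
	fmt.Println(toASCII(in)) // "It's done"--finally
}
```

Working on strings this way avoids handling the UTF-8 bytes by hand, since Go string literals and the replacer already operate on the encoded form.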