[go-nuts] Re: casting slice of rune to string picks up extra characters for some inputs

2017-03-03 Thread gary . willoughby
Go strings are UTF-8 encoded, as others have mentioned. On top of that, each human-readable character in the string is really a cluster of one or more runes. Some characters are made up of one rune, some are made up of many. Some runes combine with others to create different characters. Also,
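
To make the rune-versus-character point concrete, here is a minimal sketch comparing two ways of writing "é": as one precomposed rune, and as "e" followed by a combining accent. The helper name describe is made up for this illustration; it only wraps len and utf8.RuneCountInString from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: it reports the number of bytes
// and the number of runes in s.
func describe(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // U+00E9 LATIN SMALL LETTER E WITH ACUTE
	combining := "e\u0301"  // 'e' followed by U+0301 COMBINING ACUTE ACCENT

	// Both strings render as one visible character, but they differ
	// in rune count and in byte length.
	for _, s := range []string{precomposed, combining} {
		b, r := describe(s)
		fmt.Printf("%d bytes, %d runes, runes=%U\n", b, r, []rune(s))
	}
}
```

The first string is 2 bytes and 1 rune; the second is 3 bytes and 2 runes. Neither count equals the number of characters a reader sees in the second case, so counting or slicing by rune is not the same as working with visible characters.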

[go-nuts] Re: casting slice of rune to string picks up extra characters for some inputs

2017-02-28 Thread Robert Johnstone
Hello, Strings are encoded using UTF-8, which is a multi-byte encoding. Different runes require different numbers of bytes to encode, and the prefix you noted is how that length is signalled (although the ranges in your message don't seem to be correct). Robert On Tuesday, 28 February 2017
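
As a rough illustration of how the leading byte signals the sequence length, the sketch below reads the length off the high bits of the first byte and compares it with what utf8.EncodeRune produces. The function seqLenFromLead is a name invented for this example, and it checks the bit pattern only; it does not reject bytes such as 0xC0 that never occur in valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// seqLenFromLead (hypothetical helper) returns the sequence length implied
// by the high bits of a leading byte, or 0 for a continuation byte
// (10xxxxxx) or any other pattern. It inspects the bit pattern only.
func seqLenFromLead(b byte) int {
	switch {
	case b&0x80 == 0x00: // 0xxxxxxx: single byte (ASCII)
		return 1
	case b&0xE0 == 0xC0: // 110xxxxx: two-byte sequence
		return 2
	case b&0xF0 == 0xE0: // 1110xxxx: three-byte sequence
		return 3
	case b&0xF8 == 0xF0: // 11110xxx: four-byte sequence
		return 4
	}
	return 0
}

func main() {
	buf := make([]byte, utf8.UTFMax)
	for _, r := range []rune{'A', 0xA9, 0x20AC, 0x1F600} {
		n := utf8.EncodeRune(buf, r) // n is the number of bytes written
		fmt.Printf("%U: % x (encoded length %d, lead byte implies %d)\n",
			r, buf[:n], n, seqLenFromLead(buf[0]))
	}
}
```

For every valid rune the two lengths agree, which is the sense in which the prefix carries the length.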

[go-nuts] Re: casting slice of rune to string picks up extra characters for some inputs

2017-02-28 Thread Fraser Hanson
I understand now: it's just the UTF-8 representation of these runes. Even though the code points 128-255 would each fit in a single byte (e.g. 0x80), UTF-8 reserves single bytes for ASCII (0-127) and encodes these code points as two bytes instead. The results seen in my output match the UTF-8 representation shown in the Unicode tables:
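
A small sketch of the behaviour described here, assuming the goal was one byte per rune for values below 256. Both helper names are made up for this example: runesToUTF8 shows what the string conversion does, and runesToLatin1 shows one way to get single bytes by building a []byte directly.

```go
package main

import "fmt"

// runesToUTF8 (hypothetical helper) converts the runes to a string, which
// UTF-8 encodes each one, and returns the resulting bytes.
func runesToUTF8(rs []rune) []byte {
	return []byte(string(rs))
}

// runesToLatin1 (hypothetical helper) stores each rune as exactly one byte.
// It reports false if any rune is outside 0-255 and so cannot fit.
func runesToLatin1(rs []rune) ([]byte, bool) {
	out := make([]byte, 0, len(rs))
	for _, r := range rs {
		if r < 0 || r > 0xFF {
			return nil, false
		}
		out = append(out, byte(r))
	}
	return out, true
}

func main() {
	rs := []rune{0x41, 0x80, 0xA9, 0xFF}

	fmt.Printf("UTF-8:   % x\n", runesToUTF8(rs)) // 41 c2 80 c2 a9 c3 bf

	if b, ok := runesToLatin1(rs); ok {
		fmt.Printf("1 byte:  % x\n", b) // 41 80 a9 ff
	}
}
```

Note that the single-byte output is not valid UTF-8 for values above 0x7F, so it should be kept as a []byte rather than treated as text.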

[go-nuts] Re: casting slice of rune to string picks up extra characters for some inputs

2017-02-28 Thread Volker Dobler
Strings in Go are UTF-8, so it's expected. See e.g. http://www.fileformat.info/info/unicode/char/00a9/index.htm What bytes for © _would_ you expect? And why? V. On Tuesday, 28 February 2017 18:29:07 UTC+1, Fraser Hanson wrote: > > https://play.golang.org/p/05wZM9BhfB > > I'm working on some