[go-nuts] Re: casting slice of rune to string picks up extra characters for some inputs

2017-03-03 Thread gary . willoughby
Go strings are UTF-8 encoded, as others have mentioned. Each human-readable 
character in a string is really a cluster of one or more runes (Unicode code 
points). Some characters are made up of one rune, some of several, and some 
runes combine with others to form different characters. Also, a rune has no 
fixed size once it is encoded: UTF-8 uses between one and four bytes per 
rune.

In your example, the character © is made up of one rune (U+00A9), which is 
encoded as two bytes with the values 0xc2 and 0xa9 respectively.

On Tuesday, 28 February 2017 17:29:07 UTC, Fraser Hanson wrote:
>
> https://play.golang.org/p/05wZM9BhfB
>
> I'm working on some code that reads UTF-32 and converts it to Go strings. 
> I'm finding some surprising behavior when casting slices of runes to 
> strings.
>
>  runes := []rune{'©'}
>  fmt.Printf(" cast to string: (%s)\n", string(runes))
>  fmt.Printf("bytes in string: (%x)\n", string(runes))
> Output:
>
>  cast to string: (©)
> bytes in string: (c2a9) // <-- where's the C2 byte coming from??
>
>
> The weird part is that casting the rune slice to a string causes it to 
> pick up an additional leading character. 
>
> runes 0x00-0x7f get nothing prepended.
> runes 0x80-0xbf get a leading c2 byte as seen above.
> runes 0xc0-0xff get a leading c3 byte.
> rune 0x100 gets a leading c4 byte.  Seems like a pattern here.
>
> The same thing happens if I add the runes into a bytes.Buffer with 
> WriteRune(), then print it out with bytes.Buffer.String().
>
> Can anyone explain this?  
> What's the correct way to convert a slice of runes into a string?
>
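
For the UTF-32 part of the question quoted above, a sketch along these lines 
might help. It assumes little-endian input with no byte order mark, which the 
original post does not state, and decodeUTF32LE is a made-up name. The point 
is only that string(runes) is already the right conversion; the c2 byte is 
part of the result, not a bug.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf8"
)

// decodeUTF32LE (hypothetical helper) converts little-endian UTF-32 bytes
// into a Go string. Assumption: no byte order mark is present.
func decodeUTF32LE(b []byte) (string, error) {
	if len(b)%4 != 0 {
		return "", fmt.Errorf("input length %d is not a multiple of 4", len(b))
	}
	runes := make([]rune, 0, len(b)/4)
	for i := 0; i < len(b); i += 4 {
		v := binary.LittleEndian.Uint32(b[i:])
		// Reject values above U+10FFFF and surrogate halves.
		if v > utf8.MaxRune || !utf8.ValidRune(rune(v)) {
			return "", fmt.Errorf("invalid code point %#x at offset %d", v, i)
		}
		runes = append(runes, rune(v))
	}
	// Converting []rune to string produces the UTF-8 encoding.
	return string(runes), nil
}

func main() {
	in := []byte{0xa9, 0, 0, 0, 0x41, 0, 0, 0} // U+00A9, U+0041
	s, err := decodeUTF32LE(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s %x\n", s, s) // ©A c2a941
}
```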

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[go-nuts] Re: casting slice of rune to string picks up extra characters for some inputs

2017-02-28 Thread Robert Johnstone
Hello,

Strings are encoded using UTF-8, which is a variable-length, multi-byte 
encoding.  Different runes need different numbers of bytes, and the leading 
byte you noted is how that length is signaled (although the ranges in your 
message don't seem to be correct).
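
To see the leading byte and the encoded length for a few code points, 
something like the following can be run. It is only a sketch; leadByte is a 
made-up helper, and the sample code points are picked around the boundaries 
mentioned in the question plus one three-byte and one four-byte case.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// leadByte (hypothetical helper) returns the first byte of the UTF-8
// encoding of r.
func leadByte(r rune) byte {
	return string(r)[0]
}

func main() {
	samples := []rune{0x7f, 0x80, 0xbf, 0xc0, 0xff, 0x100, 0x20ac, 0x1f600}
	for _, r := range samples {
		fmt.Printf("U+%04X lead=%02x len=%d\n", r, leadByte(r), utf8.RuneLen(r))
	}
	// Output:
	// U+007F lead=7f len=1
	// U+0080 lead=c2 len=2
	// U+00BF lead=c2 len=2
	// U+00C0 lead=c3 len=2
	// U+00FF lead=c3 len=2
	// U+0100 lead=c4 len=2
	// U+20AC lead=e2 len=3
	// U+1F600 lead=f0 len=4
}
```

The high bits of the leading byte (0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx) 
give the length; its remaining bits hold the top bits of the code point, 
which is why the value changes within a given length.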

Robert




[go-nuts] Re: casting slice of rune to string picks up extra characters for some inputs

2017-02-28 Thread Fraser Hanson
I understand now, it's just the UTF-8 representation of these runes.

Even though code points 128-255 would fit in a single byte (e.g. 0x80), 
UTF-8 doesn't encode them that way: only 0-127 (ASCII) are single bytes, and 
everything from 0x80 up takes at least two.
The results seen in my output match the UTF-8 representation shown in the 
Unicode tables:

https://unicode-table.com/en/0080/
https://unicode-table.com/en/00FF/

As described in the Go docs, converting a rune or a slice of runes to a 
string yields its UTF-8 encoding. 
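
A small sketch of what that means in practice. The latin1Bytes helper is 
made up for this example and covers the case where you really do want one 
byte per code point (which only works up to 0xff and does not give you valid 
UTF-8).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// latin1Bytes (hypothetical helper) emits one byte per rune. It reports
// ok=false if any rune does not fit in a single byte.
func latin1Bytes(rs []rune) (out []byte, ok bool) {
	out = make([]byte, 0, len(rs))
	for _, r := range rs {
		if r < 0 || r > 0xff {
			return nil, false
		}
		out = append(out, byte(r))
	}
	return out, true
}

func main() {
	runes := []rune{'©'}

	s := string(runes)                             // UTF-8 encoded
	fmt.Printf("%x\n", s)                          // c2a9
	fmt.Println(len(s), utf8.RuneCountInString(s)) // 2 1
	fmt.Println([]rune(s)[0] == '©')               // true: the round trip is lossless

	b, ok := latin1Bytes(runes)
	fmt.Printf("%x %v\n", b, ok) // a9 true
	fmt.Println(utf8.Valid(b))   // false: a lone 0xa9 is not valid UTF-8
}
```

Note that len(s) counts bytes, not runes, so the string holding one © has 
length 2.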
 




[go-nuts] Re: casting slice of rune to string picks up extra characters for some inputs

2017-02-28 Thread Volker Dobler
Strings in Go are UTF-8, so it's expected.
See e.g. http://www.fileformat.info/info/unicode/char/00a9/index.htm
What bytes for © _would_ you expect? And why?

V.
