Re: [go-nuts] `for i, v := range` causes allocation

2021-04-12 Thread Luke Wilson

On 4/12/21 11:31 AM, Jan Mercl wrote:

I believe no silent allocation and no conversion to a slice of runes
occurs. A single instance of variable c, of type rune, exists within
the loop. There's no problem with modifying 'c'. A problem exists if
the _address_ of 'c' is assumed to point to different variables in
each cycle. That's not the case.


And Ian Lance Taylor:


There are different Go compilers, but I'm not aware of any compiler
that implements range over a string by first converting the string to
a slice of runes.  The default Go compiler implements this by calling
runtime.decoderune in a loop to fetch each new rune.  You can see that
code athttps://golang.org/src/runtime/utf8.go#L51.  There won't be
any additional allocation in this statement.


Sweet, I am very excited to hear this. Thanks Ian, and Jan. I'll let 
people know.


--
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/4432d4a3-1722-0efc-d414-33f76c0fd3d3%40gmail.com.


Re: [go-nuts] `for i, v := range` causes allocation

2021-04-12 Thread Ian Lance Taylor
On Mon, Apr 12, 2021 at 9:01 AM Luke Wilson  wrote:
>
> I've heard several times from members of the community (on Matrix and 
> possibly on answers) that a simple iteration like
>
> const mixed = "\b5Ὂg̀9! ℃ᾭG"
> for _, c := range mixed {
> ... do something with c (but not write to it)
>
> will actually silently allocate a slice of runes and decode the string into 
> it, before iteration. I've heard it is done to prevent problems that occur 
> when a programmer might overwrite data being iterated, which should be a 
> no-brainer for programmers in general, but sure, whatever. So is it true in 
> the case for constants? Is it true always, or only when writes occur to the 
> source string or `c` in that case?

There are different Go compilers, but I'm not aware of any compiler
that implements range over a string by first converting the string to
a slice of runes.  The default Go compiler implements this by calling
runtime.decoderune in a loop to fetch each new rune.  You can see that
code at https://golang.org/src/runtime/utf8.go#L51.  There won't be
any additional allocation in this statement.


> Since hearing of this, I started doing all of my utf-8 iteration of string 
> runes as such:
>
> import "unicode/utf8"
> str := "Some UTF-8 string."
> var i int
> for i < len(str) {
> r, size := utf8.DecodeRune(str[i:])
> // do something with r...
> i += size
> }
>
> Which admittedly is a bit of a burden, and possibly premature optimization if 
> I'm wrong. But I just think it shouldn't silently allocate a ginormous slice 
> in the background to iterate the runes of a string I might not read all of, 
> and especially considering `for i, v := range` should be idiomatic.

There is nothing wrong with that code, but it's essentially the same
as what the simple range over a string does.


> In addition, I wonder if it's the same for other types being iterated? What 
> if a byte slice for example has some bytes modified during iteration? It must 
> create a copy, but it shouldn't copy for loops that do not write to the 
> iterated variable.

Ranging over a slice does not create a copy of the slice.

However, ranging over an array (not a pointer to an array) does, in
the general case, create a copy of the array.  That is required by the
semantics of range.  In some cases that copy can be optimized away.

Ian

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOyqgcXb6GESuPDTHBXR-5bL1Y6G_UoJFG_wD5%2B2S4j1n7VP2g%40mail.gmail.com.


Re: [go-nuts] `for i, v := range` causes allocation

2021-04-12 Thread Jan Mercl
On Mon, Apr 12, 2021 at 6:01 PM Luke Wilson  wrote:

> I've heard several times from members of the community (on Matrix and 
> possibly on answers) that a simple iteration like
>
> const mixed = "\b5Ὂg̀9! ℃ᾭG"
> for _, c := range mixed {
> ... do something with c (but not write to it)
>
> will actually silently allocate a slice of runes and decode the string into 
> it, before iteration. I've heard it is done to prevent problems that occur 
> when a programmer might overwrite data being iterated

I believe no silent allocation and no conversion to a slice of runes
occurs. A single instance of variable c, of type rune, exists within
the loop. There's no problem with modifying 'c'. A problem exists if
the _address_ of 'c' is assumed to point to different variables in
each cycle. That's not the case.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAA40n-Wfk%3DSEFxvb%3Dqot9dpC%3Dz6BM48dzPzwct76v2ceCJoNVw%40mail.gmail.com.


[go-nuts] `for i, v := range` causes allocation

2021-04-12 Thread Luke Wilson
I've heard several times from members of the community (on Matrix and 
possibly on answers) that a simple iteration like


const mixed = "\b5Ὂg̀9! ℃ᾭG"
for _, c := range mixed {
... do something with c (but not write to it)

will actually silently allocate a slice of runes and decode the string 
into it, before iteration. I've heard it is done to prevent problems 
that occur when a programmer might overwrite data being iterated, which 
should be a no-brainer for programmers in general, but sure, whatever. 
So is it true in the case for constants? Is it true always, or only when 
writes occur to the source string or `c` in that case?


And if it always occurs, wouldn't it be great optimization to only 
decode runes when we get to them?


Since hearing of this, I started doing all of my utf-8 iteration of 
string runes as such:


import "unicode/utf8"
str := "Some UTF-8 string."
var i int
for i < len(str) {
r, size := utf8.DecodeRune(str[i:])
// do something with r...
i += size
}

Which admittedly is a bit of a burden, and possibly premature 
optimization if I'm wrong. But I just think it shouldn't silently 
allocate a ginormous slice in the background to iterate the runes of a 
string I might not read all of, and especially considering `for i, v := 


range` should be idiomatic.

I tried to make a test on Compiler Explorer 
 
modifying the string being iterated, and not. I learned that I can't 
read the assembly gc produces. But what I gathered is that it takes a 
pointer to the original string and actually calls runtime.decoderune on 
each iteration. Modifications to mixed change variable mixed (to point 
to the newly allocated string(s)), while again, the original string 
pointer is kept in stack while iterating. I'd like to have somebody who 
can read the assembly confirm.


If that's the case, then it completely voids my argument and concerns. 
And will make many people, including myself, very happy to learn that it 
is in fact optimal.


In addition, I wonder if it's the same for other types being iterated? 
What if a byte slice for example has some bytes modified during 
iteration? It must create a copy, but it shouldn't copy for loops that 
do not write to the iterated variable.


Sorry for the rant, I'm passionate about it.

Luke

--
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/4390361c-288c-4c55-91c9-e9f53ffd74a2%40gmail.com.