[go-nuts] Re: Clarification on unsafe conversion between string <-> []byte

2019-10-07 Thread francis
Thank you Serhat :)

On Monday, October 7, 2019 at 10:04:05 AM UTC+2, Serhat Şevki Dinçer wrote:
>
> Turns out it did not really need uintptr so I switched to unsafe.Pointer. 
> Also added more tests.
>
> Thanks Francis..
>
> On Friday, October 4, 2019 at 12:38:22 PM UTC+3, fra...@adeven.com wrote:
>>
>> Serhat,
>>
>> That implementation looks very tidy. But it still uses uintptr. So it 
>> doesn't solve the GC problems discussed above.
>>
>> On Thursday, September 26, 2019 at 10:59:08 PM UTC+2, Serhat Şevki Dinçer 
>> wrote:
>>>
>>> Hi,
>>>
>>> I wrote a string utility library  with 
>>> many tests to ensure the conversion assumptions are correct. It could be 
>>> helpful.
>>>
>>> Cheers..
>>>
>>> On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote:

 I am looking at the correct way to convert from a byte slice to a 
 string and back with no allocations. All very unsafe.

 I think these two cases are fairly symmetrical. So to simplify the 
 discussion below I will only talk about converting from a string to []byte.

 func StringToBytes(s string) (b []byte)

 From what I have read it is currently not clear how to perform this 
 correctly.

 When I say correctly I mean that the function returns a `[]byte` which 
 contains all of and only the bytes in the string and never confuses the 
 garbage collector. We fully expect that the `[]byte` returned will contain 
 the same underlying memory as the string and modifying its contents will 
 modify the string, making the string dangerously mutable. We are 
 comfortable with the dangerously mutable string.

 Following the directions in unsafe you _might_ think that this would be 
 a good solution.

 func StringToBytes(s string) []byte {
 return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
 Data: 
 (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
 Len:  len(s),
 Cap:  len(s),
 }))
 }

 The line

 Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,

 here is a really carefully commented example of this approach from 
 github

 seems to satisfy unsafe  rule 5 about 
 converting uintptr to and from unsafe.Pointer in a singe expression.

 However, it clearly violates rule 6 which states `...SliceHeader and 
 StringHeader are only valid when interpreting the content of an actual 
 slice or string value.`. The `[]byte` we are returning here is built from 
 a 
 `&reflect.SliceHeader` and not based on an existing `[]byte`.

 So we can switch to

 func StringToBytes(s string) (b []byte) {
 stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
 sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
 sliceHeader.Data = stringHeader.Data
 sliceHeader.Len = stringHeader.Len
 sliceHeader.Cap = stringHeader.Len
 return b
 }

 Now we are using an existing []byte to build `sliceHeader` which is 
 good. But we end up with a new problem. sliceHeader.Data and 
 stringHeader.Data are both uintptr. So by creating them in one expression 
 and then writing them in another expression we violate the rule that 
 `uintptr cannot be stored in variable`.

 There is a possible sense that we are protected because both of our 
 `uinptr`s are actually real pointers inside a real string and []byte. This 
 seems to be indicated by the line `In this usage hdr.Data is really an 
 alternate way to refer to the underlying pointer in the string header, not 
 a uintptr variable itself.`

 This feels very unclear to me.

 In particular the code example in the unsafe package

 var s string
 hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1
 hdr.Data = uintptr(unsafe.Pointer(p))  // case 6 (this case)
 hdr.Len = n


 is not the same as the case we are dealing with here. Specifically in 
 the unsafe package documentation we are writing from a uintpr stored in a 
 separate variable to another uinptr. They are probably very similar in 
 practice, but it isn't obvious and in my experience subtly incorrect code 
 often comes from relying on vague understandings of important documents.

 If we assume that our uinptrs are safe because they are backed by real 
 pointers then there is another issue with our string being garbage 
 collected.

 func StringToBytes(s string) (b []byte) {
 stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
 // Our string is no longer referenced anywhere and could 
 potentially be garbage collected
 sliceHeader := (*reflect.SliceHeader)(unsafe.Pointe

[go-nuts] Re: Clarification on unsafe conversion between string <-> []byte

2019-10-07 Thread Serhat Şevki Dinçer
Turns out it did not really need uintptr so I switched to unsafe.Pointer. 
Also added more tests.

Thanks Francis..

On Friday, October 4, 2019 at 12:38:22 PM UTC+3, fra...@adeven.com wrote:
>
> Serhat,
>
> That implementation looks very tidy. But it still uses uintptr. So it 
> doesn't solve the GC problems discussed above.
>
> On Thursday, September 26, 2019 at 10:59:08 PM UTC+2, Serhat Şevki Dinçer 
> wrote:
>>
>> Hi,
>>
>> I wrote a string utility library  with 
>> many tests to ensure the conversion assumptions are correct. It could be 
>> helpful.
>>
>> Cheers..
>>
>> On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote:
>>>
>>> I am looking at the correct way to convert from a byte slice to a string 
>>> and back with no allocations. All very unsafe.
>>>
>>> I think these two cases are fairly symmetrical. So to simplify the 
>>> discussion below I will only talk about converting from a string to []byte.
>>>
>>> func StringToBytes(s string) (b []byte)
>>>
>>> From what I have read it is currently not clear how to perform this 
>>> correctly.
>>>
>>> When I say correctly I mean that the function returns a `[]byte` which 
>>> contains all of and only the bytes in the string and never confuses the 
>>> garbage collector. We fully expect that the `[]byte` returned will contain 
>>> the same underlying memory as the string and modifying its contents will 
>>> modify the string, making the string dangerously mutable. We are 
>>> comfortable with the dangerously mutable string.
>>>
>>> Following the directions in unsafe you _might_ think that this would be 
>>> a good solution.
>>>
>>> func StringToBytes(s string) []byte {
>>> return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
>>> Data: 
>>> (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>>> Len:  len(s),
>>> Cap:  len(s),
>>> }))
>>> }
>>>
>>> The line
>>>
>>> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>>>
>>> here is a really carefully commented example of this approach from github
>>>
>>> seems to satisfy unsafe  rule 5 about 
>>> converting uintptr to and from unsafe.Pointer in a singe expression.
>>>
>>> However, it clearly violates rule 6 which states `...SliceHeader and 
>>> StringHeader are only valid when interpreting the content of an actual 
>>> slice or string value.`. The `[]byte` we are returning here is built from a 
>>> `&reflect.SliceHeader` and not based on an existing `[]byte`.
>>>
>>> So we can switch to
>>>
>>> func StringToBytes(s string) (b []byte) {
>>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>> sliceHeader.Data = stringHeader.Data
>>> sliceHeader.Len = stringHeader.Len
>>> sliceHeader.Cap = stringHeader.Len
>>> return b
>>> }
>>>
>>> Now we are using an existing []byte to build `sliceHeader` which is 
>>> good. But we end up with a new problem. sliceHeader.Data and 
>>> stringHeader.Data are both uintptr. So by creating them in one expression 
>>> and then writing them in another expression we violate the rule that 
>>> `uintptr cannot be stored in variable`.
>>>
>>> There is a possible sense that we are protected because both of our 
>>> `uinptr`s are actually real pointers inside a real string and []byte. This 
>>> seems to be indicated by the line `In this usage hdr.Data is really an 
>>> alternate way to refer to the underlying pointer in the string header, not 
>>> a uintptr variable itself.`
>>>
>>> This feels very unclear to me.
>>>
>>> In particular the code example in the unsafe package
>>>
>>> var s string
>>> hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1
>>> hdr.Data = uintptr(unsafe.Pointer(p))  // case 6 (this case)
>>> hdr.Len = n
>>>
>>>
>>> is not the same as the case we are dealing with here. Specifically in 
>>> the unsafe package documentation we are writing from a uintpr stored in a 
>>> separate variable to another uinptr. They are probably very similar in 
>>> practice, but it isn't obvious and in my experience subtly incorrect code 
>>> often comes from relying on vague understandings of important documents.
>>>
>>> If we assume that our uinptrs are safe because they are backed by real 
>>> pointers then there is another issue with our string being garbage 
>>> collected.
>>>
>>> func StringToBytes(s string) (b []byte) {
>>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>> // Our string is no longer referenced anywhere and could 
>>> potentially be garbage collected
>>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>> sliceHeader.Data = stringHeader.Data
>>> sliceHeader.Len = stringHeader.Len
>>> sliceHeader.Cap = stringHeader.Len
>>> return b
>>> }
>>>
>>>
>>> There is a discussion where this potent

[go-nuts] Re: Clarification on unsafe conversion between string <-> []byte

2019-10-04 Thread francis
I wrote a self contained implementation of the solution described by Keith 
Randall.

It seemed at the end of this long thread it would be nice to have something 
concrete to look at.

https://github.com/fmstephe/unsafeutil

On Friday, October 4, 2019 at 11:38:22 AM UTC+2, Francis Stephens wrote:
>
> Serhat,
>
> That implementation looks very tidy. But it still uses uintptr. So it 
> doesn't solve the GC problems discussed above.
>
> On Thursday, September 26, 2019 at 10:59:08 PM UTC+2, Serhat Şevki Dinçer 
> wrote:
>>
>> Hi,
>>
>> I wrote a string utility library  with 
>> many tests to ensure the conversion assumptions are correct. It could be 
>> helpful.
>>
>> Cheers..
>>
>> On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote:
>>>
>>> I am looking at the correct way to convert from a byte slice to a string 
>>> and back with no allocations. All very unsafe.
>>>
>>> I think these two cases are fairly symmetrical. So to simplify the 
>>> discussion below I will only talk about converting from a string to []byte.
>>>
>>> func StringToBytes(s string) (b []byte)
>>>
>>> From what I have read it is currently not clear how to perform this 
>>> correctly.
>>>
>>> When I say correctly I mean that the function returns a `[]byte` which 
>>> contains all of and only the bytes in the string and never confuses the 
>>> garbage collector. We fully expect that the `[]byte` returned will contain 
>>> the same underlying memory as the string and modifying its contents will 
>>> modify the string, making the string dangerously mutable. We are 
>>> comfortable with the dangerously mutable string.
>>>
>>> Following the directions in unsafe you _might_ think that this would be 
>>> a good solution.
>>>
>>> func StringToBytes(s string) []byte {
>>> return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
>>> Data: 
>>> (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>>> Len:  len(s),
>>> Cap:  len(s),
>>> }))
>>> }
>>>
>>> The line
>>>
>>> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>>>
>>> here is a really carefully commented example of this approach from github
>>>
>>> seems to satisfy unsafe  rule 5 about 
>>> converting uintptr to and from unsafe.Pointer in a singe expression.
>>>
>>> However, it clearly violates rule 6 which states `...SliceHeader and 
>>> StringHeader are only valid when interpreting the content of an actual 
>>> slice or string value.`. The `[]byte` we are returning here is built from a 
>>> `&reflect.SliceHeader` and not based on an existing `[]byte`.
>>>
>>> So we can switch to
>>>
>>> func StringToBytes(s string) (b []byte) {
>>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>> sliceHeader.Data = stringHeader.Data
>>> sliceHeader.Len = stringHeader.Len
>>> sliceHeader.Cap = stringHeader.Len
>>> return b
>>> }
>>>
>>> Now we are using an existing []byte to build `sliceHeader` which is 
>>> good. But we end up with a new problem. sliceHeader.Data and 
>>> stringHeader.Data are both uintptr. So by creating them in one expression 
>>> and then writing them in another expression we violate the rule that 
>>> `uintptr cannot be stored in variable`.
>>>
>>> There is a possible sense that we are protected because both of our 
>>> `uinptr`s are actually real pointers inside a real string and []byte. This 
>>> seems to be indicated by the line `In this usage hdr.Data is really an 
>>> alternate way to refer to the underlying pointer in the string header, not 
>>> a uintptr variable itself.`
>>>
>>> This feels very unclear to me.
>>>
>>> In particular the code example in the unsafe package
>>>
>>> var s string
>>> hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1
>>> hdr.Data = uintptr(unsafe.Pointer(p))  // case 6 (this case)
>>> hdr.Len = n
>>>
>>>
>>> is not the same as the case we are dealing with here. Specifically in 
>>> the unsafe package documentation we are writing from a uintpr stored in a 
>>> separate variable to another uinptr. They are probably very similar in 
>>> practice, but it isn't obvious and in my experience subtly incorrect code 
>>> often comes from relying on vague understandings of important documents.
>>>
>>> If we assume that our uinptrs are safe because they are backed by real 
>>> pointers then there is another issue with our string being garbage 
>>> collected.
>>>
>>> func StringToBytes(s string) (b []byte) {
>>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>> // Our string is no longer referenced anywhere and could 
>>> potentially be garbage collected
>>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>> sliceHeader.Data = stringHeader.Data
>>> sliceHeader.Len = stringHeader.Len
>>> sliceH

[go-nuts] Re: Clarification on unsafe conversion between string <-> []byte

2019-10-04 Thread francis
Serhat,

That implementation looks very tidy. But it still uses uintptr. So it 
doesn't solve the GC problems discussed above.

On Thursday, September 26, 2019 at 10:59:08 PM UTC+2, Serhat Şevki Dinçer 
wrote:
>
> Hi,
>
> I wrote a string utility library  with many 
> tests to ensure the conversion assumptions are correct. It could be helpful.
>
> Cheers..
>
> On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote:
>>
>> I am looking at the correct way to convert from a byte slice to a string 
>> and back with no allocations. All very unsafe.
>>
>> I think these two cases are fairly symmetrical. So to simplify the 
>> discussion below I will only talk about converting from a string to []byte.
>>
>> func StringToBytes(s string) (b []byte)
>>
>> From what I have read it is currently not clear how to perform this 
>> correctly.
>>
>> When I say correctly I mean that the function returns a `[]byte` which 
>> contains all of and only the bytes in the string and never confuses the 
>> garbage collector. We fully expect that the `[]byte` returned will contain 
>> the same underlying memory as the string and modifying its contents will 
>> modify the string, making the string dangerously mutable. We are 
>> comfortable with the dangerously mutable string.
>>
>> Following the directions in unsafe you _might_ think that this would be a 
>> good solution.
>>
>> func StringToBytes(s string) []byte {
>> return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
>> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>> Len:  len(s),
>> Cap:  len(s),
>> }))
>> }
>>
>> The line
>>
>> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>>
>> here is a really carefully commented example of this approach from github
>>
>> seems to satisfy unsafe  rule 5 about 
>> converting uintptr to and from unsafe.Pointer in a singe expression.
>>
>> However, it clearly violates rule 6 which states `...SliceHeader and 
>> StringHeader are only valid when interpreting the content of an actual 
>> slice or string value.`. The `[]byte` we are returning here is built from a 
>> `&reflect.SliceHeader` and not based on an existing `[]byte`.
>>
>> So we can switch to
>>
>> func StringToBytes(s string) (b []byte) {
>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>> sliceHeader.Data = stringHeader.Data
>> sliceHeader.Len = stringHeader.Len
>> sliceHeader.Cap = stringHeader.Len
>> return b
>> }
>>
>> Now we are using an existing []byte to build `sliceHeader` which is good. 
>> But we end up with a new problem. sliceHeader.Data and stringHeader.Data 
>> are both uintptr. So by creating them in one expression and then writing 
>> them in another expression we violate the rule that `uintptr cannot be 
>> stored in variable`.
>>
>> There is a possible sense that we are protected because both of our 
>> `uinptr`s are actually real pointers inside a real string and []byte. This 
>> seems to be indicated by the line `In this usage hdr.Data is really an 
>> alternate way to refer to the underlying pointer in the string header, not 
>> a uintptr variable itself.`
>>
>> This feels very unclear to me.
>>
>> In particular the code example in the unsafe package
>>
>> var s string
>> hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1
>> hdr.Data = uintptr(unsafe.Pointer(p))  // case 6 (this case)
>> hdr.Len = n
>>
>>
>> is not the same as the case we are dealing with here. Specifically in the 
>> unsafe package documentation we are writing from a uintpr stored in a 
>> separate variable to another uinptr. They are probably very similar in 
>> practice, but it isn't obvious and in my experience subtly incorrect code 
>> often comes from relying on vague understandings of important documents.
>>
>> If we assume that our uinptrs are safe because they are backed by real 
>> pointers then there is another issue with our string being garbage 
>> collected.
>>
>> func StringToBytes(s string) (b []byte) {
>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>> // Our string is no longer referenced anywhere and could 
>> potentially be garbage collected
>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>> sliceHeader.Data = stringHeader.Data
>> sliceHeader.Len = stringHeader.Len
>> sliceHeader.Cap = stringHeader.Len
>> return b
>> }
>>
>>
>> There is a discussion where this potential problem is raised
>>
>> https://github.com/golang/go/issues/25484
>>
>> we also see this issue mentioned in
>>
>> https://groups.google.com/forum/#!msg/golang-nuts/dcjzJy-bSpw/tcZYBzQqAQAJ
>>
>> The solution of 
>>
>> func StringToBytes(s string) (b []byte) {
>> stringHeader := (*reflect.StringHeader)(unsafe.P

[go-nuts] Re: Clarification on unsafe conversion between string <-> []byte

2019-09-26 Thread Serhat Şevki Dinçer
Hi,

I wrote a string utility library  with many 
tests to ensure the conversion assumptions are correct. It could be helpful.

Cheers..

On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote:
>
> I am looking at the correct way to convert from a byte slice to a string 
> and back with no allocations. All very unsafe.
>
> I think these two cases are fairly symmetrical. So to simplify the 
> discussion below I will only talk about converting from a string to []byte.
>
> func StringToBytes(s string) (b []byte)
>
> From what I have read it is currently not clear how to perform this 
> correctly.
>
> When I say correctly I mean that the function returns a `[]byte` which 
> contains all of and only the bytes in the string and never confuses the 
> garbage collector. We fully expect that the `[]byte` returned will contain 
> the same underlying memory as the string and modifying its contents will 
> modify the string, making the string dangerously mutable. We are 
> comfortable with the dangerously mutable string.
>
> Following the directions in unsafe you _might_ think that this would be a 
> good solution.
>
> func StringToBytes(s string) []byte {
> return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
> Len:  len(s),
> Cap:  len(s),
> }))
> }
>
> The line
>
> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>
> here is a really carefully commented example of this approach from github
>
> seems to satisfy unsafe  rule 5 about 
> converting uintptr to and from unsafe.Pointer in a singe expression.
>
> However, it clearly violates rule 6 which states `...SliceHeader and 
> StringHeader are only valid when interpreting the content of an actual 
> slice or string value.`. The `[]byte` we are returning here is built from a 
> `&reflect.SliceHeader` and not based on an existing `[]byte`.
>
> So we can switch to
>
> func StringToBytes(s string) (b []byte) {
> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
> sliceHeader.Data = stringHeader.Data
> sliceHeader.Len = stringHeader.Len
> sliceHeader.Cap = stringHeader.Len
> return b
> }
>
> Now we are using an existing []byte to build `sliceHeader` which is good. 
> But we end up with a new problem. sliceHeader.Data and stringHeader.Data 
> are both uintptr. So by creating them in one expression and then writing 
> them in another expression we violate the rule that `uintptr cannot be 
> stored in variable`.
>
> There is a possible sense that we are protected because both of our 
> `uinptr`s are actually real pointers inside a real string and []byte. This 
> seems to be indicated by the line `In this usage hdr.Data is really an 
> alternate way to refer to the underlying pointer in the string header, not 
> a uintptr variable itself.`
>
> This feels very unclear to me.
>
> In particular the code example in the unsafe package
>
> var s string
> hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1
> hdr.Data = uintptr(unsafe.Pointer(p))  // case 6 (this case)
> hdr.Len = n
>
>
> is not the same as the case we are dealing with here. Specifically in the 
> unsafe package documentation we are writing from a uintpr stored in a 
> separate variable to another uinptr. They are probably very similar in 
> practice, but it isn't obvious and in my experience subtly incorrect code 
> often comes from relying on vague understandings of important documents.
>
> If we assume that our uinptrs are safe because they are backed by real 
> pointers then there is another issue with our string being garbage 
> collected.
>
> func StringToBytes(s string) (b []byte) {
> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
> // Our string is no longer referenced anywhere and could 
> potentially be garbage collected
> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
> sliceHeader.Data = stringHeader.Data
> sliceHeader.Len = stringHeader.Len
> sliceHeader.Cap = stringHeader.Len
> return b
> }
>
>
> There is a discussion where this potential problem is raised
>
> https://github.com/golang/go/issues/25484
>
> we also see this issue mentioned in
>
> https://groups.google.com/forum/#!msg/golang-nuts/dcjzJy-bSpw/tcZYBzQqAQAJ
>
> The solution of 
>
> func StringToBytes(s string) (b []byte) {
> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
> // Our string is no longer referenced anywhere and could 
> potentially be garbage collected
> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
> sliceHeader.Data = stringHeader.Data
> sliceHeader.Len = stringHeader.Len
> sliceHeader.Cap = stringHeader.Len
>