[go-nuts] Re: Golang wrapper for wolfSSL

2022-06-22 Thread Vitaly Isaev
Hi! Thanks for the announcement. Could you please clarify whether wolfSSL 
supports BLAKE3, and are there any benchmarks comparing it with the default 
BLAKE3 implementation?

Wednesday, June 22, 2022 at 02:10:54 UTC+3, Lealem Amedie: 

> Check out the blog post announcing the Golang wrapper for the wolfSSL 
> SSL/TLS crypto library!
>
> https://www.wolfssl.com/wolfssl-golang-wrapper/
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/29f1eda7-adf4-4fd1-9558-71fd3ceef556n%40googlegroups.com.


[go-nuts] Re: Go allocator: allocates arenas, reclaims pages?

2022-06-21 Thread Vitaly Isaev
Michael, many thanks for such a comprehensive description!

Tuesday, June 21, 2022 at 04:50:23 UTC+3, Michael Knyszek: 

> Just to clarify, when I said "publicly visible" I meant via blog posts and 
> talks. There are a few design documents and runtime-internal comments that 
> go into more depth.
>
> On Monday, June 20, 2022 at 5:46:36 PM UTC-4 Michael Knyszek wrote:
>
>> Thanks for the question. The scavenger isn't as publicly visible as other 
>> parts of the runtime. You've got it mostly right, but I'm going to repeat 
>> some things you've already said to make it clear what's different.
>>
>> The Go runtime maps new heap memory (specifically: a new virtual memory 
>> mapping for the heap) as read/write in increments called arenas. (Note: my 
>> use of "heap" here is a little loose; that pool of memory is also used for 
>> e.g. goroutine stacks.) The concept of arena is carried forward to how GC 
>> metadata is managed (chunk of metadata per arena) but is otherwise 
>> orthogonal to everything else I'm about to describe. To the scavenger, the 
>> concept of an arena doesn't really exist.
>>
>> The platform (OS + architecture) has some underlying physical page size 
>> (typically between 4 and 64 KiB, inclusive), but Go has an internal page 
>> size of 8 KiB. It divides all of memory up into these 8 KiB pages, 
>> including heap memory.
>>
>> The runtime assumes, in general, that new virtual memory is not backed by 
>> physical memory until first use (or an explicit system call on some 
>> platforms, like Windows). As free pages get allocated for the heap (for 
>> spans, as you say), they are assumed to be backed by physical memory. Once 
>> those pages are released, they are still assumed to be backed by physical 
>> memory.
>>
>> This is where the scavenger comes in: it tells the OS that these free 
>> regions of the address space, which it assumes are backed by physical 
>> pages, are no longer needed in the short term. So, the OS is free to take 
>> the physical memory back. "Telling the OS" is the madvise system call on 
>> Linux platforms. Note that the Go runtime could be wrong about whether the 
>> region is backed by physical memory; that's fine, madvise is just a hint 
>> anyway (a really useful one). (Also, it's really unlikely to be wrong, 
>> because memory needs to be zeroed before it's handed to the application. 
>> Still, it's theoretically possible.)
>>
>> The scavenger doesn't really have any impact on fragmentation, because 
>> the Go runtime is free to allocate a span out of a mix of scavenged and 
>> unscavenged pages. When it's actively scavenging, it briefly takes those 
>> pages out of the allocation pool, which can affect fragmentation, but the 
>> system is organized such that such a collision (and thus potentially some 
>> fragmentation) is less likely.
>>
>> The result is basically just fewer physical pages consumed by Go 
>> applications (what "top" reports as "RSS") at the cost of about 1% of total 
>> CPU time. The CPU cost, however, is usually much less; 1% is just the 
>> target while it's active, but in the steady-state there's typically not too 
>> much work to do.
>>
>> The Go runtime also never unmaps heap memory, because virtual memory 
>> that's guaranteed to not be backed by physical memory is very cheap (likely 
>> just a single interval in some OS bookkeeping). Unmapping virtual address 
>> space is also fairly expensive in comparison to madvise, so it's worthwhile 
>> to avoid.
>>
>> I don't fully understand what you mean by "layered cake" in this context. 
>> The memory allocator in general is certainly a "layered cake," but the 
>> scavenger just operates directly on the pool of free pages (which again, 
>> don't have much to do with arenas other than that happens to be the 
>> increment that new pages are added to the pool).
>>
>> There are also two additional complications to all of this:
>> (1) Because the Go runtime's page size doesn't usually match the system's 
>> physical page size, the scavenger needs to be careful to only return 
>> contiguous and aligned runs of pages that add up to the physical page size. 
>> This makes it less effective on platforms with physical pages larger than 8 
>> KiB because fragmentation can prevent an entire physical page from being 
>> free. This is fine, though; the scavenger is most useful when, for example, 
>> the heap size shrinks significantly. Then there's almost always a large 
>> swathe of available free pages. Note also that platforms with smaller 
>> physical page sizes are fine, because every scavenge operation releases 
>> 

[go-nuts] Go allocator: allocates arenas, reclaims pages?

2022-06-20 Thread Vitaly Isaev
The Go allocator requests memory from the OS in large arenas (on Linux 
x86_64 the arena size is 64 MB), then the allocator splits each arena into 
8 KiB pages and merges pages into spans of different size classes 
(according to https://go.dev/src/runtime/sizeclasses.go). This process is 
well described in various blog posts and presentations.

But there is much less information about the scavenger. Is it true that, in 
contrast to the allocation process, the scavenger reclaims to the OS not 
arenas, but the pages underlying idle spans? This is performed with 
madvise(MADV_DONTNEED). 

If so, am I correct that after a while the virtual address space of a Go 
application resembles a "layered cake" of interleaved used and reclaimed 
memory regions (a kind of classic memory fragmentation problem)? It looks 
like, if the application requires more virtual memory after some time, the 
OS won't be able to reuse these page-sized regions to allocate a contiguous 
space sufficient for an arena allocation.

Are there any consequences of this design for the runtime performance, 
especially for RSS consumption? 

Finally, how does the runtime decide what to use (munmap or madvise) for 
the purposes of memory reclamation?

Thank you



Re: [go-nuts] How to interpret the HeapReleased decrease?

2022-06-19 Thread Vitaly Isaev
Ian, thank you for the clarification, but what if this happens on Linux 
with Go >= 1.16 (where MADV_DONTNEED is default)?

Sunday, June 19, 2022 at 03:02:54 UTC+3, Ian Lance Taylor: 

> On Sat, Jun 18, 2022 at 6:35 AM Vitaly Isaev  wrote:
> >
> > Hi everyone, I've read the thread 
> https://github.com/golang/go/issues/33376,
> > but still I can't figure out what it means when the HeapReleased value 
> is decreasing.
> > When the Go runtime returns memory to the OS, isn't it an 
> irreversible process? So why does this indicator have to go down?
> >
> > Or should we interpret HeapReleased as the amount of HeapIdle virtual 
> memory that is not backed by physical memory?
> >
> > Could anyone please elaborate on this.
>
> HeapReleased goes up when the runtime tells the OS that it no longer
> needs the memory. On Linux this is done via madvise(MADV_FREE).
> HeapReleased goes down when the runtime decides that it wants to use
> the memory again. Linux permits this for MADV_FREE pages; it's just
> that the pages may or may not be zeroed.
>
> In other words, no, in general returning memory to the OS is not an
> irreversible process.
>
> Ian
>



[go-nuts] How to interpret the HeapReleased decrease?

2022-06-18 Thread Vitaly Isaev
Hi everyone, I've read the thread 
https://github.com/golang/go/issues/33376, 
but still I can't figure out what it means when the HeapReleased value is 
decreasing.
When the Go runtime returns memory to the OS, isn't it an irreversible 
process? So why does this indicator have to go down?

Or should we interpret HeapReleased as the amount of HeapIdle virtual 
memory that is not backed by physical memory?

Could anyone please elaborate on this.




[go-nuts] Go: pointer bit stealing technique

2021-05-06 Thread Vitaly Isaev
 

In the well-known book "The Art of Multiprocessor Programming" by Herlihy 
and Shavit, some of the lock-free and wait-free algorithms utilize Java's 
generic AtomicMarkableReference type. It allows performing a single atomic 
CAS operation on a pair consisting of a T reference and a boolean mark.

There is no similar type in the C/C++/Go stdlib, but at least in C++ it's 
possible to model it using the bit stealing approach (see C++ example).
On the x86_64 architecture only 48 of the 64 bits of a pointer are 
actually used, so one can store arbitrary data in the remaining 16 bits 
and work with the whole pointer and the data atomically.

As far as I understand, there are two requirements to implement this 
approach:

   1. Pointers must be aligned.
   2. Pointer's low bits must be clear (if you want to store something like 
   bool in this area, it must not be already occupied).

But some Stack Overflow users have questioned whether these remaining bits 
are really free in Go. Perhaps the Go runtime already uses this area for 
GC or some other background routines?

So is it possible to use pointer bit stealing technique in Go? Are there 
any working examples?

Thank you



[go-nuts] Compile Go application with multiple versions of C library

2019-10-08 Thread Vitaly Isaev
Suppose we're implementing a very specific Go application like a data 
migrator (for example, one that transfers data from an old database to a 
new database with different data types). Such an application must be 
compiled with two different versions of one library (e.g. v1 and v2). This 
seems to be impossible in terms of Go modules (see this ticket), but can 
we do this trick with C shared libraries?

Minimal example:

v1/lib.h
#ifndef LIBCGO_MULTIVERSION_V1
#define LIBCGO_MULTIVERSION_V1

int add(int a, int b);

#endif

v1/lib.c
#include "lib.h"

int add(int a, int b) {
return a + b;
}

v1/lib.go
package cgomultiversion

// #cgo LDFLAGS: -l:libcgomultiversion.so.1
// #include "lib.h"
import "C"

func Add(a, b int) int {
return int(C.add(C.int(a), C.int(b)))
}

v2/lib.h
#ifndef LIBCGO_MULTIVERSION_V2
#define LIBCGO_MULTIVERSION_V2

int add(int a, int b);

#endif

v2/lib.c
#include "lib.h"

int add(int a, int b) {
// for tests purposes addition is replaced with multiplication in v2
return a * b;
}

v2/lib.go
package cgomultiversion

// #cgo LDFLAGS: -l:libcgomultiversion.so.2
// #include "lib.h"
import "C"

func Add(a, b int) int {
return int(C.add(C.int(a), C.int(b)))
}

Finally we have a test that tries to utilize both versions of C library:
package cgomultiversion

import (
"testing"
"github.com/stretchr/testify/assert"
cgomultiversion1 "github.com/vitalyisaev2/cgo_multiversion_go_lib/v1"
cgomultiversion2 "github.com/vitalyisaev2/cgo_multiversion_go_lib/v2"
)

func TestAdd(t *testing.T) {
assert.Equal(t, 10, cgomultiversion1.Add(5, 5))
assert.Equal(t, 25, cgomultiversion2.Add(5, 5))
}

This won't compile because of a naming conflict on the C side:

go test -count=1 -v
# github.com/vitalyisaev2/cgo_multiversion_go_lib.test
/usr/local/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
/usr/bin/ld: /tmp/go-link-218143790/05.o: in function `add':
/home/isaev/go/src/github.com/vitalyisaev2/cgo_multiversion_go_lib/v2/lib.c:6: 
multiple definition of `add'; 
/tmp/go-link-218143790/02.o:/home/isaev/go/src/github.com/vitalyisaev2/cgo_multiversion_go_lib/v1/lib.c:5:
 first defined here
collect2: error: ld returned 1 exit status



Full example is available at: 
https://github.com/vitalyisaev2/cgo_multiversion_go_lib

Is it possible to achieve this with some other methods? Thanks a lot.



Re: [go-nuts] Does it make sense to make expensive syscalls from different goroutines?

2017-03-18 Thread Vitaly Isaev


On Saturday, March 18, 2017 at 14:37:11 UTC+3, Konstantin Khomoutov 
wrote:
>
> On Sat, 18 Mar 2017 03:50:39 -0700 (PDT) 
> Vitaly Isaev <vitaly...@gmail.com > wrote: 
>
> [...] 
> > Assume that application does some heavy lifting with multiple file 
> > descriptors (e.g., opening - writing data - syncing - closing), what 
> > actually happens to Go runtime? Does it block all the goroutines at 
> > the time when expensive syscall occures (like syscall.Fsync)? Or only 
> > the calling goroutine is blocked while the others are still operating? 
>
> IIUC, since there's no general mechanism to have the kernel somehow notify 
> the process of the completion of any generic syscall, when a goroutine 
> enters a syscall, it essentially locks its underlying OS thread and 
> waits until the syscall completes.  The scheduler detects the goroutine 
> is about to sleep in the syscall and schedules another goroutine(s) to 
> run, but the underlying OS thread is not freed. 
>
> This is in contrast to network I/O which uses the platform-specific 
> poller (such as IOCP on Windows, epoll on Linux, kqueue on FreeBSD and 
> so on) so when an I/O operation on a socket is about to block, the 
> goroutine which performed that syscall is suspended, put on the wait 
> list, its socket is added to the set the poller monitors and its 
> underlying OS thread is freed to be able to serve a runnable goroutine. 
>
> > So does it make sense to write programs with multiple workers that do 
> > a lot of user space - kernel space context switching? Does it make 
> > sense to use multithreading patterns for disk input? 
>
> It may or may not.  A syscall-heavy workload might degrade the 
> goroutine scheduling to actually be N×N instead of M×N.  This might not 
> be the problem in itself (not counting a big number of OS threads 
> allocated and mostly sleeping) but concurrent access to the same slow 
> resource such as a rotating medium is almost always a bad idea: say, your 
> HDD (and the file system on it) might only provide such and such read 
> bandwidth, so spreading the processing of the data being read across 
> multiple goroutines is only worth the deal if this processing is so 
> computationally complex that a single goroutine won't cope with that 
> full bandwidth.  If one goroutine is OK with keeping up with that full 
> bandwidth, having two goroutines read that same data will make each deal 
> with only half the bandwidth, so they will sleep > 50% of the time. 
> Note that reading two files in parallel off the filesystem located on 
> the same rotating medium will usually result in lowered full 
> bandwidth due to seek times required to jump around the blocks of 
> different files.
>
> SSDs and other kinds of medium might have way better performance 
> characteristics, so it's worth measuring. 
>
> IOW, I'd say that trying to parallelize might be a premature 
> optimization.  It's worth keeping in mind that goroutines serve two 
> separate purposes: 1) they allow you to write natural sequential 
> control flow instead of callback-ridden spaghetti code; 2) they allow 
> performing tasks truly in parallel--if the hardware supports it 
> (multiple CPUs and/or cores). 
>
> This (2) is tricky because it assumes such goroutines have something to 
> do; if they instead contend on some shared resource, the parallelization 
> won't really happen. 
>

Thanks, that's a very good point.  



[go-nuts] Does it make sense to make expensive syscalls from different goroutines?

2017-03-18 Thread Vitaly Isaev


I would appreciate it if someone could clarify how the Go runtime operates 
under these circumstances:


Assume that the application does some heavy lifting with multiple file 
descriptors (e.g., opening, writing data, syncing, closing): what actually 
happens in the Go runtime? Does it block all the goroutines at the time 
when an expensive syscall occurs (like syscall.Fsync)? Or is only the 
calling goroutine blocked while the others are still operating?


So does it make sense to write programs with multiple workers that do a lot 
of user space to kernel space context switching? Does it make sense to use 
multithreading patterns for disk input?


Minimal example: https://play.golang.org/p/O0omcPBMAJ
