Re: [go-nuts] Safe ways to call C with less overhead

2020-04-30 Thread Michael Jones
Function call intensity seems directly addressed by a tree- or DAG-like
chain of command buffers: not necessarily a full scene graph (with logic
and selection), but a single call at the top plus a traversal routine, so
that you make just a few cgo transitions to C per frame.

I’ve done this several ways myself (non-Go) and it works a charm.

On Thu, Apr 30, 2020 at 6:18 AM Constantine Shablya wrote:

> Thanks for the reply, Ian.
>
> To clear up, by safety I only mean the presence of stack guards or, more
> generally, a means of ensuring the program doesn't silently end up writing
> past the stack.
>
> From this I take it that my next step will be to make something between
> systemstack and asmcgocall, so that I still run (a subset of) Go while on
> the system stack, but otherwise it would be as if it were a Cgo call, with
> the minor difference that aligning the stack and changing the calling
> convention would happen at the calls to C rather than at asmcgocall.
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/golang-nuts/CAEp866QQrqNUznSOFR4j_s6vW4X7ftEQVy48%3DH42UTBoe4DtXw%40mail.gmail.com
> .
>
-- 

*Michael T. Jones michael.jo...@gmail.com*



Re: [go-nuts] Safe ways to call C with less overhead

2020-04-30 Thread Constantine Shablya
Thanks for the reply, Ian.

To clear up, by safety I only mean the presence of stack guards or, more generally,
a means of ensuring the program doesn't silently end up writing past the stack.

From this I take it that my next step will be to make something between systemstack
and asmcgocall, so that I still run (a subset of) Go while on the system stack, but
otherwise it would be as if it were a Cgo call, with the minor difference that aligning
the stack and changing the calling convention would happen at the calls to C rather
than at asmcgocall.



Re: [go-nuts] Safe ways to call C with less overhead

2020-04-29 Thread Ian Lance Taylor
On Wed, Apr 29, 2020 at 12:45 PM  wrote:
>
> 1) I have heard gccgo can call C much quicker than the standard go
> implementation can. If this statement is true, why is that?

It's because gccgo uses the C calling convention.  So you can use a
magic //go:linkname comment to rename a Go function declaration to a C
function, and then just call it.  This is completely unsafe, because
if the C function blocks or misuses pointers your program will crash
horribly.  But it is fast.

> 2) How could a systemstack-like function that is friendly to the
> preemption mechanism of Go 1.14 be achieved?

That is basically a cgo call: a preemption friendly switch to the system stack.

Ian



[go-nuts] Safe ways to call C with less overhead

2020-04-29 Thread nanokatze
I am interested in having less overhead for Go-C-Go round trips, for C
programs that I know (or at least am very much sure) will behave, for
the most part, like non-preemptible (as in prior to Go 1.14) loops
in Go. Concretely, I have a self-imposed exercise of making a
frankenprogram that talks graphics using Vulkan (this part is in C)
and talks to other computers over the network (this part is in Go).

Background

A sizable class of Vulkan programs spend the overwhelming majority
of their CPU time in a loop recording command buffers. This means
calling vkCmd* functions some 1 to 10 or more times per
second. The frequency with which these functions are invoked makes it
infeasible to use the common Cgo mechanism, because its constant
overhead becomes significantly larger than the time these functions
individually run for. In fact, it is likely that most of the vkCmd*
functions these programs are interested in merely copy their arguments
into an array and bump a counter. This property makes Cgo's assumption
that a C function may block redundant. But this is an implementation
detail of a Vulkan driver and differs between drivers (we still assume
that the driver is good and won't do nasty things like hanging
forever). Even if we knew the memory layout of the command buffer and
replicated the relevant vkCmd* functions from, say, mesa radv in Go,
this would be broken by an inevitable driver update (the .so part of a
driver is almost always linked dynamically) and would not work
on a different driver such as mesa anvil (the Intel Vulkan driver).

There are also a number of minor nuisances, such as the dynamic cgocheck
being constantly angry at pointers to Go memory being sent to C
(runtime.KeepAlive calls were carefully placed in the program) and the
general discomfort of writing "Go-looking C" in Go.

I suspect there's an alternative way of making the graphics card do work
by using indirect draws, similar to the OpenGL AZDO approach (this
is speculation, since I'm not at all familiar with this
approach). This lets us just write indirect draw commands to a
large array from Go and call into C much less frequently. This
approach appears to come at a big cost in ergonomics: we have no way of
interleaving binding of anything with the draw calls. We would need a
separate logical buffer (offset in vkCmdDraw*Indirect) of indirect
draw commands between any two vkCmdBind* calls.

Another approach would be to have intermediate command buffer of our
own which would be, for example, an array of tagged unions (structs
with an integer tag and a union) describing which vkCmd* to call and
what parameters to give it.
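
A minimal sketch of such an intermediate command buffer on the Go side. Go has
no unions, so each entry here is a tag plus a fixed array of arguments whose
meaning depends on the tag. All names (cmdTag, command, recorder, replay) are
hypothetical, and only two commands are modelled. The replay function is a
pure-Go stand-in: in the real program that loop would live in C behind a single
Cgo call and would invoke the matching vkCmd* function for each entry. The
pipeline argument is assumed to be an index into a table kept on the C side,
not a real Vulkan handle.

```go
package main

import "fmt"

// cmdTag says which vkCmd* call an entry stands for.
type cmdTag uint32

const (
	cmdBindPipeline cmdTag = iota
	cmdDraw
)

// command is a flat, pointer-free record, so a slice of them could be handed
// to C in one call without upsetting cgocheck. args is interpreted per tag.
type command struct {
	tag  cmdTag
	args [4]uint32
}

// recorder accumulates commands in Go memory with no transition to C.
type recorder struct {
	cmds []command
}

// bindPipeline records a bind; pipelineIndex is a hypothetical table index.
func (r *recorder) bindPipeline(pipelineIndex uint32) {
	r.cmds = append(r.cmds, command{tag: cmdBindPipeline, args: [4]uint32{pipelineIndex}})
}

// draw records a draw with the same four parameters vkCmdDraw takes after
// the command buffer handle.
func (r *recorder) draw(vertexCount, instanceCount, firstVertex, firstInstance uint32) {
	r.cmds = append(r.cmds, command{
		tag:  cmdDraw,
		args: [4]uint32{vertexCount, instanceCount, firstVertex, firstInstance},
	})
}

// replay stands in for the C-side loop: it walks the buffer once and
// dispatches on the tag. Here it only builds a trace of what it would call.
func replay(cmds []command) []string {
	var trace []string
	for _, c := range cmds {
		switch c.tag {
		case cmdBindPipeline:
			trace = append(trace, fmt.Sprintf("bind pipeline %d", c.args[0]))
		case cmdDraw:
			trace = append(trace, fmt.Sprintf("draw %d %d %d %d",
				c.args[0], c.args[1], c.args[2], c.args[3]))
		}
	}
	return trace
}

func main() {
	var r recorder
	r.bindPipeline(7)
	r.draw(3, 1, 0, 0)
	r.draw(6, 1, 3, 0)
	for _, line := range replay(r.cmds) {
		fmt.Println(line)
	}
}
```

With this shape the Cgo overhead is paid once per recorded buffer instead of
once per vkCmd* call, at the cost of maintaining the dispatch loop in C.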

This hoop-jumping motivates us to write a considerable portion of the
program in C, which is not necessarily what we desire, which in
turn motivates us to find ways to call classes of C programs that we
hope will not misbehave, with overhead significantly lower than
that of Cgo.

rustgo: calling Rust from Go with near-zero overhead

I recalled stumbling upon https://blog.filippo.io/rustgo/ and reread
the article. The approach described is simple but extremely dangerous
for opaque C calls (which is what vkCmd* are). We have no idea
how much stack the C function is going to need, and even if we made a
conservative guess, we would still want a stack guard at the
bottom, to be safe.

https://github.com/minio/c2goasm lets us do what is described in the
article in a less involved manner but (as far as I understood) assumes
that the C function is a leaf function.

I never tried installing stack guards on goroutine stacks in Go, but in
my own exercise of re-implementing rsc's libtask I tried mprotecting
4k at the bottom of the stack to PROT_READ. I noted that I couldn't
have more than about 15k tasks. This is due to the default limit of
about 32k memory maps in Linux. Mprotecting to PROT_READ and then
restoring the protection at each context switch removed this handicap but
slowed down context switching considerably. In Go, I imagine, a stack
guard for C functions could be done as follows:

    Withguard(func() {
        // C functions are called here in the way described in the rustgo article
    })

where Withguard would request a large stack (via morestack) plus 8k at the
bottom, install a PROT_READ page somewhere in the 8k part near the bottom of
the stack, call the function passed to it, restore the page's protection and
leave. But I speculate this would lead to deadly interactions with GC, such as
occasional segfaults whenever GC would, for whatever unknown reason, access the
PROT_READ part of the stack.

runtime.systemstack

I stumbled upon this function when exploring the Go runtime. It lets me
achieve what I wanted Withguard for. It has an important
caveat: while we're on the system stack, we cannot be preempted (also no
defers). This means that we probably should not allocate anything, for
fear of dangerous interactions with GC. We also should probably leave
systemstack sooner, because I suspect GC will not be able to see roots
of the goroutine that switched to