Re: [go-nuts] Port to powerpc 440fpu

2020-08-10 Thread David Riley
On Aug 10, 2020, at 4:59 AM, Hugo Cornelis  wrote:
> 
> 
> Hi,
> 
> Bottom line: Docker works reliably on powerpc 440fpu 32 bit using gccgo as 
> the compiler.  We will likely soon start working on powerpc e6500 in 32bit 
> mode.
> 
> After a fix in the structures used by the epoll system calls, the problem 
> disappeared.  I assume the problem was a starvation issue similar to the one 
> described in
> 
> https://github.com/moby/moby/issues/39461
> 
> We had to correct the system call numbers used for the fstat family of system 
> calls, as well as for sendfile, fadvise, ftruncate, truncate and fcntl.
> 
> We also had to fix the alignment of some of the structures used by these 
> functions and of the EpollEvent structure (i.e. the generator did not always 
> generate correct structures).  The fix for EpollEvent also fixed the I/O 
> starvation problem.
> 
> It remains unclear why the generator did not generate correct structures.  We 
> updated the post-processor mkpost.go to fix the structures (alignment + 
> member names), but did not look further into the underlying problem.

Glad to see updates!  I hope there's a chance to mainline this; I would welcome 
running Go on 32-bit PPC on my Net/OpenBSD machines on that platform.


- Dave

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/9E061F1B-2197-4A68-8B5A-9359C28DBA8F%40gmail.com.


Re: [go-nuts] Port to powerpc 440fpu

2020-08-10 Thread Hugo Cornelis
Hi,

Bottom line: Docker works reliably on powerpc 440fpu 32 bit using gccgo as
the compiler.  We will likely soon start working on powerpc e6500 in 32bit
mode.

After a fix in the structures used by the epoll system calls, the problem
disappeared.  I assume the problem was a starvation issue similar to the one
described in

https://github.com/moby/moby/issues/39461

We had to correct the system call numbers used for the fstat family of system
calls, as well as for sendfile, fadvise, ftruncate, truncate and fcntl.

We also had to fix the alignment of some of the structures used by these
functions and of the EpollEvent structure (i.e. the generator did not
always generate correct structures).  The fix for EpollEvent also fixed the
I/O starvation problem.

It remains unclear why the generator did not generate correct structures.
We updated the post-processor mkpost.go to fix the structures (alignment +
member names), but did not look further into the underlying problem.

Hugo




On Thu, Jul 9, 2020 at 10:08 AM Hugo Cornelis 
wrote:

>
>
> On Fri, Jul 3, 2020 at 9:36 PM Ian Lance Taylor  wrote:
>
>> That looks like the process is writing to a pipe that nothing is reading
>> from.
>>
>
> Yes, that is correct.  The question is: why doesn't the reader read from
> the pipe?  And why does it suddenly start reading when the Docker daemon
> process is terminated?
>
> At first sight this looks like a starvation problem, but our
> investigation is still inconclusive.
>
> Here is what we know about the reader process / goroutine:
>
> - It is a goroutine that becomes active when Docker terminates.
>
> - This same goroutine gets stuck at io.CopyBuffer(epollConsole, in, *bp)
> before Docker terminates.  During this time the writer writes 18778
> characters (and then gets stuck).
>
> - All the configurations we tested gave this or similar behaviour.
> However, the behaviour is slightly timing-dependent, i.e. inserting logging
> statements may result in small changes to this behaviour.
>
>
> During the last few days of investigation we found one race in containerd
> and several bugs in our system call bindings for ppc (Ftruncate, Truncate,
> Fstatfs, Statfs, Lstat).
>
> We have fixed these, but the problem with the reader not reading / blocked
> I/O persists.
>
> It may be a case of starvation, or a race, or something else.
>
> More investigation is required.  We are now looking further into the
> system call bindings, debugging the code of Docker and its tools, and the
> gccgo runtime.
>
> Thanks for your reply.
>
> Hugo
>
>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAARrXCSWbTLZ0QEWpuENEHUJaE_Bj5qcuMKGMkvHkyghBFZTqQ%40mail.gmail.com.


Re: [go-nuts] Port to powerpc 440fpu

2020-07-09 Thread Hugo Cornelis
On Fri, Jul 3, 2020 at 9:36 PM Ian Lance Taylor  wrote:

> That looks like the process is writing to a pipe that nothing is reading
> from.
>

Yes, that is correct.  The question is: why doesn't the reader read from
the pipe?  And why does it suddenly start reading when the Docker daemon
process is terminated?

At first sight this looks like a starvation problem, but our
investigation is still inconclusive.

Here is what we know about the reader process / goroutine:

- It is a goroutine that becomes active when Docker terminates.

- This same goroutine gets stuck at io.CopyBuffer(epollConsole, in, *bp)
before Docker terminates.  During this time the writer writes 18778
characters (and then gets stuck).

- All the configurations we tested gave this or similar behaviour.
However, the behaviour is slightly timing-dependent, i.e. inserting logging
statements may result in small changes to this behaviour.


During the last few days of investigation we found one race in containerd
and several bugs in our system call bindings for ppc (Ftruncate, Truncate,
Fstatfs, Statfs, Lstat).

We have fixed these, but the problem with the reader not reading / blocked
I/O persists.

It may be a case of starvation, or a race, or something else.

More investigation is required.  We are now looking further into the system
call bindings, debugging the code of Docker and its tools, and the gccgo
runtime.

Thanks for your reply.

Hugo

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAARrXCTG2NcYVuw8GOZw8t0fYf%3DM_hY1h9%3DPor_H1_wgeBo8jQ%40mail.gmail.com.


Re: [go-nuts] Port to powerpc 440fpu

2020-07-03 Thread Ian Lance Taylor
On Fri, Jul 3, 2020 at 5:54 AM Hugo Cornelis
 wrote:
>
> Thanks for your answer.
>
> On Mon, Jun 29, 2020 at 9:10 PM Ian Lance Taylor  wrote:
>>
>> Thanks for the background.
>>
>> Earlier I suggested looking at the output of "strace -f" for the
>> programs that fail.  Does that show anything of interest?
>
>
> What follows is the analysis of one strace (strace -fv -s 100) attached to 
> the docker daemon.
>
> The strace log file shows the creation of a chain of processes: dockerd forks 
> containerd forks containerd-shim forks runc forks a command that runs inside 
> the container (the command is '/usr/bin/find .').  This is also expected.
>
> When the I/O of the process /usr/bin/find in the docker container is blocked, 
> strace shows that the Golang schedulers are still active:
>
> (line 298015)
> [pid  2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1} 
> 
> [pid  2264] sched_yield( 
> [pid  2270] swapcontext(0x2190a580, 0 
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 0
> [pid  2264] sched_yield( 
> [pid  2270] swapcontext(0, 0x214dda80 
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 558750336
> [pid  2264] sched_yield( 
> [pid  2270] swapcontext(0, 0x2190a580 
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 563127680
> [pid  2264] sched_yield( 
> [pid  2270] swapcontext(0x2190a580, 0 
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 0
> ...
> (line 298134)
> [pid  2266] <... _newselect resumed>)   = 0 (Timeout)
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 563127680
> [pid  2266] epoll_wait(4,  
> [pid  2264] sched_yield( 
> [pid  2270] swapcontext(0x2190a580, 0 
> [pid  2266] <... epoll_wait resumed>[], 128, 0) = 0
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 0
> [pid  2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1} 
> 
>
> TIDs 2264, 2266 and 2270 belong to the process runc.  The strace log has 
> similar straces for the other processes (dockerd, containerd, 
> containerd-shim), so I assume their goroutine schedulers were also active.  I 
> am wondering how to relate the arguments listed in the strace file 
> to the Golang or C code.
>
> Just before the container gets blocked, it runs the command 'find .', which 
> should produce output to the terminal (there is no output at first; that is 
> the problem).  The data is visible in the strace log through the 'write()' 
> system call:
>
> [pid  2204] execve("/usr/bin/find", ["/usr/bin/find", "."], 
> ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", 
> "HOSTNAME=caae2fb3bccf", "TERM=xterm", "HOME=/root"] 
> ...
> [pid  2204] lstat64(".",  
> ...
> [pid  2204] write(1, ".\n", 2 
> ...
> [pid  2204] lstat64("./var",  
> ...
> [pid  2204] write(1, "./var\n", 6 
> ...
>
> The 'find' process writes 626 lines to stdout (18778 characters, this seems 
> to be reproducible).  The last lines are:
>
> ...
> [pid  2204] lstat64("./sys/kernel/slab/rpc_buffers",  
> ...
> [pid  2204] write(1, "./sys/kernel/slab/rpc_buffers\n", 30 
>
> All the write() system calls except the last one are successfully completed.  
> The last one remains blocked.  At that time PID 2204 hangs for a long time.
>
> runc / dockerd / containerd / containerd-shim have continuous activity as in 
> the first strace that I showed above.
>
> When I terminate the docker process with a signal SIGINT, the characters 
> written by PID 2204 are suddenly flushed to the terminal.
>
> Are there any specific things to look for in the strace files (e.g. 
> specific epoll() calls)?  Is there a way to map the arguments and return 
> values of swapcontext() to goroutines, or would this be a useless thing to 
> try to do?
>
> This is the analysis of one trial.  Some of the other trials did not start 
> the full chain of processes; it looks like the behaviour of the bug is also 
> timing-dependent.


That looks like the process is writing to a pipe that nothing is reading from.

Ian

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOyqgcWn-em6UzwUifemOttSToZNyNJ2VoqMZsH%2B-%3DhdUVoC%3DQ%40mail.gmail.com.


Re: [go-nuts] Port to powerpc 440fpu

2020-07-03 Thread Hugo Cornelis
Thanks for your answer.

On Mon, Jun 29, 2020 at 9:10 PM Ian Lance Taylor  wrote:

> Thanks for the background.
>
> Earlier I suggested looking at the output of "strace -f" for the
> programs that fail.  Does that show anything of interest?
>

What follows is the analysis of one strace (strace -fv -s 100) attached to
the docker daemon.

The strace log file shows the creation of a chain of processes: dockerd
forks containerd forks containerd-shim forks runc forks a command that runs
inside the container (the command is '/usr/bin/find .').  This is also
expected.

When the I/O of the process /usr/bin/find in the docker container is
blocked, strace shows that the Golang schedulers are still active:

(line 298015)
[pid  2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1}

[pid  2264] sched_yield( 
[pid  2270] swapcontext(0x2190a580, 0 
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 0
[pid  2264] sched_yield( 
[pid  2270] swapcontext(0, 0x214dda80 
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 558750336
[pid  2264] sched_yield( 
[pid  2270] swapcontext(0, 0x2190a580 
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 563127680
[pid  2264] sched_yield( 
[pid  2270] swapcontext(0x2190a580, 0 
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 0
...
(line 298134)
[pid  2266] <... _newselect resumed>)   = 0 (Timeout)
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 563127680
[pid  2266] epoll_wait(4,  
[pid  2264] sched_yield( 
[pid  2270] swapcontext(0x2190a580, 0 
[pid  2266] <... epoll_wait resumed>[], 128, 0) = 0
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 0
[pid  2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1}


TIDs 2264, 2266 and 2270 belong to the process runc.  The strace log has
similar straces for the other processes (dockerd, containerd,
containerd-shim), so I assume their goroutine schedulers were also active.
I am wondering how to relate the arguments listed in the strace file to the
Golang or C code.

Just before the container gets blocked, it runs the command 'find .', which
should produce output to the terminal (there is no output at first; that is
the problem).  The data is visible in the strace log through the 'write()'
system call:

[pid  2204] execve("/usr/bin/find", ["/usr/bin/find", "."],
["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"HOSTNAME=caae2fb3bccf", "TERM=xterm", "HOME=/root"] 
...
[pid  2204] lstat64(".",  
...
[pid  2204] write(1, ".\n", 2 
...
[pid  2204] lstat64("./var",  
...
[pid  2204] write(1, "./var\n", 6 
...

The 'find' process writes 626 lines to stdout (18778 characters, this seems
to be reproducible).  The last lines are:

...
[pid  2204] lstat64("./sys/kernel/slab/rpc_buffers",  
...
[pid  2204] write(1, "./sys/kernel/slab/rpc_buffers\n", 30 

All the write() system calls except the last one are successfully
completed.  The last one remains blocked.  At that time PID 2204 hangs for
a long time.

runc / dockerd / containerd / containerd-shim have continuous activity as
in the first strace that I showed above.

When I terminate the docker process with a signal SIGINT, the characters
written by PID 2204 are suddenly flushed to the terminal.

Are there any specific things to look for in the strace files (e.g.
specific epoll() calls)?  Is there a way to map the arguments and return
values of swapcontext() to goroutines, or would this be a useless thing to
try to do?

This is the analysis of one trial.  Some of the other trials did not start
the full chain of processes; it looks like the behaviour of the bug is also
timing-dependent.

Hugo



>
> Ian
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAARrXCTpc%2BFzT9xZn%3DD_beLaupnes6fLfdM2OGXj5kJM5c__zg%40mail.gmail.com.


Re: [go-nuts] Port to powerpc 440fpu

2020-06-29 Thread Ian Lance Taylor
On Mon, Jun 29, 2020 at 1:01 AM Hugo Cornelis
 wrote:
>
> The standard Go distribution doesn't support 32-bit PPC.
>
> To compile Golang code to 32-bit PPC we first built a proof of concept based 
> on docker-cli using the gccgo packages for Ubuntu.  We got this working 
> without too much effort.  Afterwards we integrated this type of 
> cross-compilation into Buildroot to compile the entire Docker tool suite for 
> use on an embedded system.
>
> Most of Docker seems to be working fine on the embedded device; however, local 
> interactive terminal input / output with a running container is not working.
>
> What we observe is similar to what is described here: 
> https://github.com/moby/moby/issues/39461
>
> Investigation shows that two specific goroutines in the container daemon that 
> are responsible for forwarding the input and output from the container to the 
> user are not scheduled (they don't receive CPU cycles) until after Docker 
> terminates.
>
> These two goroutines use the functions io.CopyBuffer() and ReadFrom() / 
> WriteTo() to forward the traffic (the method used to forward traffic is 
> demonstrated in recvtty.go at 
> https://github.com/opencontainers/runc/blob/master/contrib/cmd/recvtty/recvtty.go)
>
> When Docker terminates, it sends signal 15 (TERM) to these processes.  This 
> somehow allows the two goroutines to be scheduled, which flushes the output 
> buffers to the terminal.
>
> This may be due to wrong system call bindings for 32-bit PPC in the unix 
> package; however, inspection of these bindings has not revealed any problem so 
> far.
>
> We have been working on this for several weeks now, any help would be greatly 
> appreciated.
>
> Thanks!


Thanks for the background.

Earlier I suggested looking at the output of "strace -f" for the
programs that fail.  Does that show anything of interest?

Ian

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOyqgcXDJPFnn3Wh2G5kUxrhRZorZcXUnSAn4PzeesAOOF1qUw%40mail.gmail.com.


Re: [go-nuts] Port to powerpc 440fpu

2020-06-29 Thread Hugo Cornelis
Hi,

The standard Go distribution doesn't support 32-bit PPC.

To compile Golang code to 32-bit PPC we first built a proof of concept
based on docker-cli using the gccgo packages for Ubuntu.  We got this
working without too much effort.  Afterwards we integrated this type of
cross-compilation into Buildroot to compile the entire Docker tool suite
for use on an embedded system.

Most of Docker seems to be working fine on the embedded device; however,
local interactive terminal input / output with a running container is not
working.

What we observe is similar to what is described here:
https://github.com/moby/moby/issues/39461

Investigation shows that two specific goroutines in the container daemon
that are responsible for forwarding the input and output from the container
to the user are not scheduled (they don't receive CPU cycles) until after
Docker terminates.

These two goroutines use the functions io.CopyBuffer() and ReadFrom() /
WriteTo() to forward the traffic (the method used to forward traffic is
demonstrated in recvtty.go at
https://github.com/opencontainers/runc/blob/master/contrib/cmd/recvtty/recvtty.go
)

When Docker terminates, it sends signal 15 (TERM) to these processes.  This
somehow allows the two goroutines to be scheduled, which flushes the output
buffers to the terminal.

This may be due to wrong system call bindings for 32-bit PPC in the unix
package; however, inspection of these bindings has not revealed any problem
so far.

We have been working on this for several weeks now, any help would be
greatly appreciated.

Thanks!

Hugo



On Fri, Jun 19, 2020 at 3:33 AM Ian Lance Taylor  wrote:

> On Thu, Jun 18, 2020 at 1:17 PM Hugo Cornelis
>  wrote:
> >
> > Does anyone have experience with porting go applications to the powerpc
> 440fpu 32 bit.
> >
> > We have a team that is porting the Docker tool suite to a device that
> uses this CPU; we generated the system call bindings and compiled the
> Docker tool suite without many problems.
> >
> > Most of Docker seems to be working fine on the device, however
> interactive terminal input/output is not working.
> >
> > Investigation shows that two specific goroutines that are responsible
> for forwarding the I/O between two Docker-related processes are not
> scheduled (they don't receive CPU cycles) until Docker terminates
> these processes with a signal 15 (TERM); at that point the two goroutines
> are suddenly scheduled and all of the output buffers are flushed to the
> terminal.
> >
> > We have looked at the terminal settings and flags applied to the file
> descriptors and these all seem fine (although I must admit that the code
> flows inside Docker and its tools are complicated).
> >
> > We suspect there may be a problem with one or more of the system call
> bindings, for instance that there may be a system call declared with
> //sysnb where it should be just //sys, if that makes sense.  I would
> actually not know how to distinguish between these two flags.
> >
> > We would now like to inspect the status of the two goroutines to
> understand what they are waiting for, and why the scheduler does not
> schedule them.
> >
> > Debugging with GODEBUG=schedtrace=1000,scheddetail=1 helps somewhat, but we
> have no idea how to relate the output of the scheduler state to the two
> goroutines (if it would make sense at all).
> >
> > Does anyone have any experience debugging this type of problem?  How
> would we look at where exactly these processes are blocked without
> developing core knowledge about the Docker tool suite?
> >
> > We have been working on this for several weeks now, any help would be
> greatly appreciated.
>
>
> The standard Go distribution doesn't support 32-bit PPC, so I feel
> like there is some missing background information here.
>
> If the problem is with making system calls, then it often helps to
> look at the "strace -f" output to see what is going on at the system
> call level.
>
> Ian
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAARrXCQ43XxbS5kbMy_Rn66FG3ifW9%2B9%3D7y%3DB4UbmajEP0WcJw%40mail.gmail.com.


Re: [go-nuts] Port to powerpc 440fpu

2020-06-18 Thread Ian Lance Taylor
On Thu, Jun 18, 2020 at 1:17 PM Hugo Cornelis
 wrote:
>
> Does anyone have experience with porting go applications to the powerpc 
> 440fpu 32 bit.
>
> We have a team that is porting the Docker tool suite to a device that uses 
> this CPU; we generated the system call bindings and compiled the Docker tool 
> suite without many problems.
>
> Most of Docker seems to be working fine on the device; however, interactive 
> terminal input/output is not working.
>
> Investigation shows that two specific goroutines that are responsible for 
> forwarding the I/O between two Docker-related processes are not scheduled 
> (they don't receive CPU cycles) until Docker terminates these processes 
> with a signal 15 (TERM); at that point the two goroutines are suddenly 
> scheduled and all of the output buffers are flushed to the terminal.
>
> We have looked at the terminal settings and flags applied to the file 
> descriptors and these all seem fine (although I must admit that the code 
> flows inside Docker and its tools are complicated).
>
> We suspect there may be a problem with one or more of the system call 
> bindings, for instance that there may be a system call declared with //sysnb 
> where it should be just //sys, if that makes sense.  I would actually not 
> know how to distinguish between these two flags.
>
> We would now like to inspect the status of the two goroutines to understand 
> what they are waiting for, and why the scheduler does not schedule them.
>
> Debugging with GODEBUG=schedtrace=1000,scheddetail=1 helps somewhat, but we 
> have no idea how to relate the output of the scheduler state to the two 
> goroutines (if it would make sense at all).
>
> Does anyone have any experience debugging this type of problem?  How would we 
> look at where exactly these processes are blocked without developing core 
> knowledge about the Docker tool suite?
>
> We have been working on this for several weeks now, any help would be greatly 
> appreciated.


The standard Go distribution doesn't support 32-bit PPC, so I feel
like there is some missing background information here.

If the problem is with making system calls, then it often helps to
look at the "strace -f" output to see what is going on at the system
call level.

Ian

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOyqgcVMV_ya4vWw9faj9ebTQxegbQbksGjkf2w4eZbgUG2KVg%40mail.gmail.com.