Re: [go-nuts] The program stucked on __vdso_clock_gettime (x86_64)
> > > For me, this looks like the program is merely idling, and what you're > observing is reading current time by the scheduler implemented by the Go > runtime. > I didn't try it. The environment located in customer's data center, not allowed to try this kind of things on it. > > [...] > >1. Any body know the reason of the problem? > > I'm inclined to think that you're merely facing some deadlock or similar > condition which precludes your normal Go code from doing useful progress > while > the runtime itself is OK. > > I agree with this, too. But the program didn't respond to any pprof interaction (We implemented a pprof server using domain socket), so I can not get goroutine information. > >2. What can I do to get more information if this problem happened > again? > > Do the following: > > - Have a way to collect whatever the process is writing to its stderr. > - When the process wedges in the same way, send it SIGQUIT or SIGABRT so > that >it dumps the stacks of all the active goroutines to the stderr and > exits. > > I will try this next time. > If that service handles HTTP requests, you might enable the handlers of the > core debug/pprof package and when the process it apparently wedged > navigate to > its /debug/pprof/goroutine?debug=2 (unless you have grafted the root of > those endpoints somewhere else) to have the stacks sent in the response. > > As I mentioned above, pprof is enabled by default, but the program ceased to respond to anything. Thanks very much. -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAF_Sdb9imBONz%3DYL4%3DUQ9aqTSwLQT%3DN4eL7_g5AKAEdPsjGrOQ%40mail.gmail.com.
Re: [go-nuts] The program stucked on __vdso_clock_gettime (x86_64)
On Mon, Jul 04, 2022 at 01:57:06AM -0700, Chenhong Liu wrote: [...] > After profiling the process with perf command, the perf data shows the > program looped to call __vdso_clock_gettime . > > 64.20% controllerd [vdso] [.] __vdso_clock_gettime > 8.82% controllerd controllerd [.] runtime.procyield > 8.82% controllerd controllerd [.] runtime.suspendG > 8.01% controllerd controllerd [.] runtime.nanotime1 > 1.47% controllerd [kernel.kallsyms] [k] __enqueue_entity > 0.79% controllerd [kernel.kallsyms] [k] system_call_after_swapgs > 0.79% controllerd [kernel.kallsyms] [k] set_next_entity > 0.68% controllerd [kernel.kallsyms] [k] _raw_gspin_lock > 0.68% controllerd [kernel.kallsyms] [k] change_pte_range > 0.65% controllerd [kernel.kallsyms] [k] auditsys > 0.57% controllerd [kernel.kallsyms] [k] update_curr > 0.45% controllerd [kernel.kallsyms] [k] native_sched_clock > 0.45% controllerd [kernel.kallsyms] [k] cpuacct_charge > 0.45% controllerd [kernel.kallsyms] [k] __x86_indirect_thunk_+rax > 0.45% controllerd [kernel.kallsyms] [k] __audit_syscall_exit > 0.34% controllerd [kernel.kallsyms] [k] pick_next_task_fail > 0.23% controllerd controllerd [.] runtime.osyield > 0.23% controllerd [kernel.kallsyms] [k] native_queuedc_spin_lock_slowpath > 0.23% controllerd [kernel.kallsyms] [k] dput > 0.23% controllerd [kernel.kallsyms] [k] __schedule > 0.23% controllerd [kernel.kallsyms] [k] put_prev_task_fair > 0.23% controllerd [kernel.kallsyms] [k] yield_task_fair > 0.22% controllerd [kernel.kallsyms] [k] sys_sched_yield > 0.11% controllerd [kernel.kallsyms] [k] clear_buddies > 0.11% controllerd [kernel.kallsyms] [k] update_rq_clock.part.78 > 0.11% controllerd [kernel.kallsyms] [k] update_min_vruntime > 0.11% controllerd [kernel.kallsyms] [k] tick_do_update_jiffies64 > 0.11% controllerd [kernel.kallsyms] [k] system_call > 0.11% controllerd [kernel.kallsyms] [k] rb_next > 0.11% controllerd [kernel.kallsyms] [k] rb_insert_color > 0.00% controllerd controllerd [.] runtime.notesleep > 0.00% controllerd [kernel.kallsyms] [k] load_balance > 0.00% controllerd controllerd [.] runtime.runggrab > 0.00% controllerd controllerd [.] runtime.findrunnable > 0.00% controllerd [kernel.kallsyms] [k] update_numa_stats > 0.00% controllerd [kernel.kallsyms] [k] __lock_task_sighand > 0.00% controllerd [kernel.kallsyms] [k] idle_cpu > 0.00% controllerd [kernel.kallsyms] [k] kmem_cache_free_bulk > 0.00% controllerd [kernel.kallsyms] [k] task_numa_find_cpu > 0.00% controllerd [kernel.kallsyms] [k] __queue_work > 0.00% controllerd [kernel.kallsyms] [k] __perf_event_task_sched_in > 0.00% controllerd [kernel.kallsyms] [k] finish_task_switch > 0.00% controllerd [kernel.kallsyms] [k] perf_ctx_unlock > 0.00% controllerd [kernel.kallsyms] [k] native_write_msr_safe > 0.00% controllerd [kernel.kallsyms] [k] __perf_event_enable [...] For me, this looks like the program is merely idling, and what you're observing is reading current time by the scheduler implemented by the Go runtime. [...] >1. Any body know the reason of the problem? I'm inclined to think that you're merely facing some deadlock or similar condition which precludes your normal Go code from doing useful progress while the runtime itself is OK. >2. What can I do to get more information if this problem happened again? Do the following: - Have a way to collect whatever the process is writing to its stderr. - When the process wedges in the same way, send it SIGQUIT or SIGABRT so that it dumps the stacks of all the active goroutines to the stderr and exits. If that service handles HTTP requests, you might enable the handlers of the core debug/pprof package and when the process it apparently wedged navigate to its /debug/pprof/goroutine?debug=2 (unless you have grafted the root of those endpoints somewhere else) to have the stacks sent in the response. -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/20220705095534.ay2wdgeinruwuf2w%40carbon.
[go-nuts] The program stucked on __vdso_clock_gettime (x86_64)
Hi, I encountered a wiered problem recently. *Environment* Arch: x86_64 CPU Vendor: hygon (AMD Zen1 OEM) (www.hygon.cn) (Hygon C86 7285) OS: CentOS 7.6 Kernel: 3.10.0-957 Go version: 1.15.6 *What happened* Our pure go program, named controllerd, stopped working except looping on one cpu core (from top command, it consumed one cpu core). Before it stopped working, it runs from mid April to Jun 12. After profiling the process with perf command, the perf data shows the program looped to call __vdso_clock_gettime . 64.20% controllerd [vdso] [.] __vdso_clock_gettime 8.82% controllerd controllerd [.] runtime.procyield 8.82% controllerd controllerd [.] runtime.suspendG 8.01% controllerd controllerd [.] runtime.nanotime1 1.47% controllerd [kernel.kallsyms] [k] __enqueue_entity 0.79% controllerd [kernel.kallsyms] [k] system_call_after_swapgs 0.79% controllerd [kernel.kallsyms] [k] set_next_entity 0.68% controllerd [kernel.kallsyms] [k] _raw_gspin_lock 0.68% controllerd [kernel.kallsyms] [k] change_pte_range 0.65% controllerd [kernel.kallsyms] [k] auditsys 0.57% controllerd [kernel.kallsyms] [k] update_curr 0.45% controllerd [kernel.kallsyms] [k] native_sched_clock 0.45% controllerd [kernel.kallsyms] [k] cpuacct_charge 0.45% controllerd [kernel.kallsyms] [k] __x86_indirect_thunk_+rax 0.45% controllerd [kernel.kallsyms] [k] __audit_syscall_exit 0.34% controllerd [kernel.kallsyms] [k] pick_next_task_fail 0.23% controllerd controllerd [.] runtime.osyield 0.23% controllerd [kernel.kallsyms] [k] native_queuedc_spin_lock_slowpath 0.23% controllerd [kernel.kallsyms] [k] dput 0.23% controllerd [kernel.kallsyms] [k] __schedule 0.23% controllerd [kernel.kallsyms] [k] put_prev_task_fair 0.23% controllerd [kernel.kallsyms] [k] yield_task_fair 0.22% controllerd [kernel.kallsyms] [k] sys_sched_yield 0.11% controllerd [kernel.kallsyms] [k] clear_buddies 0.11% controllerd [kernel.kallsyms] [k] update_rq_clock.part.78 0.11% controllerd [kernel.kallsyms] [k] update_min_vruntime 0.11% controllerd [kernel.kallsyms] [k] tick_do_update_jiffies64 0.11% controllerd [kernel.kallsyms] [k] system_call 0.11% controllerd [kernel.kallsyms] [k] rb_next 0.11% controllerd [kernel.kallsyms] [k] rb_insert_color 0.00% controllerd controllerd [.] runtime.notesleep 0.00% controllerd [kernel.kallsyms] [k] load_balance 0.00% controllerd controllerd [.] runtime.runggrab 0.00% controllerd controllerd [.] runtime.findrunnable 0.00% controllerd [kernel.kallsyms] [k] update_numa_stats 0.00% controllerd [kernel.kallsyms] [k] __lock_task_sighand 0.00% controllerd [kernel.kallsyms] [k] idle_cpu 0.00% controllerd [kernel.kallsyms] [k] kmem_cache_free_bulk 0.00% controllerd [kernel.kallsyms] [k] task_numa_find_cpu 0.00% controllerd [kernel.kallsyms] [k] __queue_work 0.00% controllerd [kernel.kallsyms] [k] __perf_event_task_sched_in 0.00% controllerd [kernel.kallsyms] [k] finish_task_switch 0.00% controllerd [kernel.kallsyms] [k] perf_ctx_unlock 0.00% controllerd [kernel.kallsyms] [k] native_write_msr_safe 0.00% controllerd [kernel.kallsyms] [k] __perf_event_enable B.T.W, perf command is: perf record -o controllerd_perf.data -F 99 -p PID sleep 10 I'm not familiar with Go's runtime detail. But after getting the perf data and reading code of function suspendG of Go's runtime. I guessed: The program looping in suspendG and failed to find any other goroutine to execute. *What was did* - Firstly, I suspected the clock source was not set correctly on the system. But it's set to tsc, and it's right. - Then, I tried to send SIGUSR1 to the program to see if it could be revived (the controllerd program can be triggered with SIGUSR1 to flush logs), but nothing changed. - Finally, we have to restart the program to recover. *Ask for help* 1. Any body know the reason of the problem? 2. What can I do to get more information if this problem happened again? Thanks. -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/1552be0a-3c36-4561-8fd0-2e649195a269n%40googlegroups.com.