Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-22 Thread Robert Engels
With a 500k machine cluster I suggest getting professional Go support - someone experienced in troubleshooting that can sit with you and review the code and configuration to diagnose the issue. Personally, it sounds to me like overallocated machines causing thrashing delays in context switching.

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-22 Thread Peter Z
Sorry, I made a mistake: I wrote 'hyperthread closed', but hyperthreading is actually on. On Tuesday, June 22, 2021 at 10:01:48 PM UTC+8, wrote: > I just checked the monitor data and found that the machine suffered from > high 'load average' (about 30+) at approximately the time the agent got > stuck. > A 24 cores(2 CPUs * 14

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-22 Thread Peter Z
I just checked the monitor data and found that the machine suffered from high 'load average' (about 30+) at approximately the time the agent got stuck. A 24 cores(2 CPUs * 14 cores), hyperthread closed machine with load average over 30 seems bad. But after the load average got down to below 1,
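To put a load average in context, it helps to compare it with the number of CPUs the process can actually use. The following is a minimal sketch, assuming a Linux host where `/proc/loadavg` is readable; the helper name `parseLoad1` is hypothetical and not part of the agent discussed in this thread.

```go
package main

import (
	"fmt"
	"io/ioutil" // used instead of os.ReadFile so the sketch builds on Go 1.14
	"runtime"
	"strconv"
	"strings"
)

// parseLoad1 extracts the 1-minute load average from the contents of
// /proc/loadavg, which look like "30.12 28.40 12.03 5/1234 4321".
func parseLoad1(s string) (float64, error) {
	fields := strings.Fields(s)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty loadavg data")
	}
	return strconv.ParseFloat(fields[0], 64)
}

func main() {
	data, err := ioutil.ReadFile("/proc/loadavg")
	if err != nil {
		fmt.Println("cannot read load average:", err)
		return
	}
	load, err := parseLoad1(string(data))
	if err != nil {
		fmt.Println("cannot parse load average:", err)
		return
	}
	// runtime.NumCPU reports the CPUs usable by this process, so the
	// ratio is relative to what the process is allowed to run on.
	cpus := runtime.NumCPU()
	fmt.Printf("load1=%.2f usable CPUs=%d load per CPU=%.2f\n",
		load, cpus, load/float64(cpus))
}
```

A sustained ratio well above 1 suggests more runnable tasks than CPUs, which fits the scheduling-latency explanation raised elsewhere in the thread, though it does not prove it.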

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-22 Thread Peter Z
> > He is stating he has a cloud cluster consisting of 500k machines - each > machine runs an agent process - each agent has 7000 goroutines. > Aha. Yes, this is what I mean. > Sorry, now I am completely confused. > > So, you have about 500,000 *processes* running this agent on each >>>

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-22 Thread Robert Engels
He is stating he has a cloud cluster consisting of 500k machines - each machine runs an agent process - each agent has 7000 goroutines. > On Jun 22, 2021, at 7:07 AM, jake...@gmail.com wrote: > > Sorry, now I am completely confused. > >>> So, you have about 500,000 processes running

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-22 Thread jake...@gmail.com
Sorry, now I am completely confused. So, you have about 500,000 *processes* running this agent on each machine, >> and each process has around 7,000 goroutines? Is that correct? >> > > Yes, that's exactly what I mean. > but then you say: "Only one process per machine". Is there a language

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-22 Thread Peter Z
Only one process per machine. We use '*taskset -c $last_2nd_core,$last_3rd_core,$last_4th_core ./agent -c ../conf/agent.toml*' to start the agent. I wonder if it has any relationship with this problem? On Tuesday, June 22, 2021 at 12:56:13 AM UTC+8, wrote: > How many processes per machine? It seems like
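Pinning the agent with `taskset` matters to the Go scheduler: on Linux, `runtime.NumCPU` reflects the CPU affinity mask the process started with, and `GOMAXPROCS` defaults to that value. A minimal sketch for checking what the runtime sees follows; the helper name `schedulerWidth` is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerWidth reports how many CPUs the process may use and how many
// Ps (scheduler contexts) the Go runtime will run goroutines on.
func schedulerWidth() (cpus, procs int) {
	cpus = runtime.NumCPU()       // CPUs usable by this process at startup
	procs = runtime.GOMAXPROCS(0) // an argument of 0 queries without changing the value
	return cpus, procs
}

func main() {
	cpus, procs := schedulerWidth()
	fmt.Printf("NumCPU=%d GOMAXPROCS=%d\n", cpus, procs)
}
```

Started under a three-core `taskset -c` mask like the one quoted above, this should report 3 for both values unless the `GOMAXPROCS` environment variable overrides it, meaning all 7000 goroutines share three Ps and compete with whatever else the OS schedules on those cores.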

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-21 Thread Robert Engels
How many processes per machine? It seems like scheduling latency to me. > On Jun 21, 2021, at 6:31 AM, Peter Z wrote: > >> So, you have about 500,000 processes running this agent on each machine, and >> each process has around 7,000 goroutines? Is that correct? > > Yes, that's exactly

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-21 Thread Peter Z
> > So, you have about 500,000 *processes* running this agent on each > machine, and each process has around 7,000 goroutines? Is that correct? > Yes, that's exactly what I mean.

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-21 Thread jake...@gmail.com
Could you clarify something? You say: " We have about half a million agents running on each of our machines" in your initial message. I thought maybe it was a language thing, and you meant 500,000 goroutines. But then you said: "There are 7000 goroutines total" So, you have about 500,000

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-20 Thread Peter Z
> > On Thu, Jun 17, 2021 at 9:19 AM Peter Z wrote: > > > > The original post is on stackoverflow > https://stackoverflow.com/questions/67999117/unexpected-stuck-in-sync-pool-get > > > > > Golang ENV: > > go1.14.3 linux/amd64 > > > > Description: > > We have about half a million agents

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-17 Thread Robert Engels
You’re right. Inspecting the code it is internally partitioned by P. I agree that it looks like the pool is being continually created. > On Jun 17, 2021, at 12:18 PM, Ian Lance Taylor wrote: > > On Thu, Jun 17, 2021 at 9:19 AM Peter Z wrote: >> The original post is on stackoverflow >>
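The concern about a pool being continually created can be illustrated with the usual pattern: declare one long-lived `sync.Pool` and share it, rather than constructing a new pool per call or per object. This is a minimal sketch, not the agent's actual code; `bufPool` and `render` are hypothetical names, and the goroutine count merely mirrors the 7000 mentioned in the thread.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool is created once at package level and shared by all goroutines.
// Creating a fresh sync.Pool on every call would defeat its per-P caches.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render borrows a buffer, uses it, and returns it to the pool.
func render(id int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from a previous use
	defer bufPool.Put(buf)
	fmt.Fprintf(buf, "agent-%d", id)
	return buf.String() // String copies, so the result stays valid after Put
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 7000; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			_ = render(id)
		}(i)
	}
	wg.Wait()
	fmt.Println(render(1)) // prints "agent-1"
}
```

With a single pool, most `Get` and `Put` calls should stay on the current P's local cache, which is consistent with the observation above that the pool is internally partitioned by P.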

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-17 Thread Ian Lance Taylor
On Thu, Jun 17, 2021 at 10:11 AM Robert Engels wrote: > > You probably need multiple pools and partition them. 500k accessors of a > shared lock is going to have contention. That might well help, but note that sync.Pool does not have a shared lock in general use. The shared lock is only

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-17 Thread Ian Lance Taylor
On Thu, Jun 17, 2021 at 9:19 AM Peter Z wrote: > > The original post is on stackoverflow > https://stackoverflow.com/questions/67999117/unexpected-stuck-in-sync-pool-get > > Golang ENV: > go1.14.3 linux/amd64 > > Description: > We have about half a million agents running on each of our

Re: [go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-17 Thread Robert Engels
You probably need multiple pools and partition them. 500k accessors of a shared lock is going to have contention. github.com/robaho/go-concurrency-test might be helpful. > On Jun 17, 2021, at 11:19 AM, Peter Z wrote: > > The original post is on stackoverflow >

[go-nuts] unexpected stuck in sync.(*Pool).Get()

2021-06-17 Thread Peter Z
The original post is on stackoverflow https://stackoverflow.com/questions/67999117/unexpected-stuck-in-sync-pool-get Golang ENV: go1.14.3 linux/amd64 Description: We have about half a million agents running on each of our machines. The agent is written in Go. Recently we found that the