Re: [go-nuts] Go routines stuck in runtime_pollwait

2021-03-20 Thread Rahul Yadav
Any updates? I am also facing this issue: in my case the Golang server just 
gets stuck and the number of open FDs spikes.
-Rahul

On Thursday, August 27, 2020 at 8:40:19 PM UTC+5:30 Siddhesh Divekar wrote:

> Ok, so your data collectors never complete. A simple change to make this 
>> easier to diagnose is to not spin up another collector controller at the 2 
>> min mark if the previous has not completed. 
>>
>> I would determine if the stuck collector is BQ or Elastic and check the 
>> server side logs. 
>>
> We start a data collector (DC) every 2 mins irrespective of whether the 
> previous DC has completed or not, so we would have a few DCs stuck but not 
> all. (2 from backtrace)
>
> I would determine if the stuck collector is BQ or Elastic and check the 
>> server side logs. 
>>
> In server side logs we don't see any activity at all. That is the 
> confusing part why regular user requests are not reaching the go server 
> from the users.
> Will check BQ and elastic logs around that time. But after restart 
> everything was working fine.
>
> Could be a bug in the http/2 implementation. I would disable http/2 and 
>> see if you encounter the problem. 
>>
> If we hit it again we will collect github.com/robaho/goanalyzer & see 
> what is happening.
> Between is there a compile time flag to disable http2 while building ?
>
> Thanks,
>
> On Thu, Aug 27, 2020 at 4:05 AM Robert Engels  
> wrote:
>
>> Could be a bug in the http/2 implementation. I would disable http/2 and 
>> see if you encounter the problem. 
>>
>> On Aug 27, 2020, at 6:02 AM, Robert Engels  wrote:
>>
>> 
>> Ok, so your data collectors never complete. A simple change to make this 
>> easier to diagnose is to not spin up another collector controller at the 2 
>> min mark if the previous has not completed. 
>>
>> I would determine if the stuck collector is BQ or Elastic and check the 
>> server side logs. 
>>
>> On Aug 26, 2020, at 11:29 PM, Kurtis Rader  wrote:
>>
>> 
>> On Wed, Aug 26, 2020 at 8:51 PM Siddhesh Divekar  
>> wrote:
>>
>>> Right, then it looks less likely that we are blocked on a mutex.
>>>
>>> Every 2 minutes we spin up a go routine which then in turn spins up a 
>>> bunch of go routines to collect data from big query & elastic (data 
>>> collector routines).
>>> The 2 minutes go routine then in turn waits on waitgroup for data 
>>> collector routines to come back. So its not single go routine from our side 
>>> at least.
>>> From backtrace we have from sigabort we see only one such data collector 
>>> go routine blocked and other 2 2 mins thread waiting on waitgroup.
>>>
>>
>> Are you spinning up a Go routine every 2 minutes regardless of whether 
>> the previous instance had completed? Your comment is not clear whether you 
>> have one, or more than one, goroutine issuing a BigQuery query at the time 
>> you saw the problem and captured the backtraces. If you are, in fact, 
>> starting those Go routines without regard to whether the prior instance had 
>> completed that seems like a "yellow flag", "here be dragons", situation. 
>>
>> -- 
>> Kurtis Rader
>> Caretaker of the exceptional canines Junior and Hank
>>
>>
>>
>
> -- 
> -Siddhesh.
>



Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-27 Thread Siddhesh Divekar
>
> Ok, so your data collectors never complete. A simple change to make this
> easier to diagnose is to not spin up another collector controller at the 2
> min mark if the previous has not completed.
>
> I would determine if the stuck collector is BQ or Elastic and check the
> server side logs.
>
We start a data collector (DC) every 2 mins irrespective of whether the
previous DC has completed or not, so we would expect only a few DCs to be
stuck, not all of them (2 in the backtrace).

I would determine if the stuck collector is BQ or Elastic and check the
> server side logs.
>
In the server side logs we don't see any activity at all. That is the
confusing part: why are regular user requests not reaching the go server
at all?
We will check the BQ and elastic logs around that time, but after a restart
everything was working fine.

Could be a bug in the http/2 implementation. I would disable http/2 and see
> if you encounter the problem.
>
If we hit it again we will collect data with github.com/robaho/goanalyzer & see
what is happening.
By the way, is there a compile time flag to disable http2 while building?

Thanks,

On Thu, Aug 27, 2020 at 4:05 AM Robert Engels  wrote:

> Could be a bug in the http/2 implementation. I would disable http/2 and
> see if you encounter the problem.
>
> On Aug 27, 2020, at 6:02 AM, Robert Engels  wrote:
>
> 
> Ok, so your data collectors never complete. A simple change to make this
> easier to diagnose is to not spin up another collector controller at the 2
> min mark if the previous has not completed.
>
> I would determine if the stuck collector is BQ or Elastic and check the
> server side logs.
>
> On Aug 26, 2020, at 11:29 PM, Kurtis Rader  wrote:
>
> 
> On Wed, Aug 26, 2020 at 8:51 PM Siddhesh Divekar <
> siddhesh.dive...@gmail.com> wrote:
>
>> Right, then it looks less likely that we are blocked on a mutex.
>>
>> Every 2 minutes we spin up a go routine which then in turn spins up a
>> bunch of go routines to collect data from big query & elastic (data
>> collector routines).
>> The 2 minutes go routine then in turn waits on waitgroup for data
>> collector routines to come back. So its not single go routine from our side
>> at least.
>> From backtrace we have from sigabort we see only one such data collector
>> go routine blocked and other 2 2 mins thread waiting on waitgroup.
>>
>
> Are you spinning up a Go routine every 2 minutes regardless of whether the
> previous instance had completed? Your comment is not clear whether you have
> one, or more than one, goroutine issuing a BigQuery query at the time you
> saw the problem and captured the backtraces. If you are, in fact, starting
> those Go routines without regard to whether the prior instance had
> completed that seems like a "yellow flag", "here be dragons", situation.
>
> --
> Kurtis Rader
> Caretaker of the exceptional canines Junior and Hank
>
>
>

-- 
-Siddhesh.



Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-27 Thread Robert Engels
Could be a bug in the http/2 implementation. I would disable http/2 and see if 
you encounter the problem. 
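
There is no compile-time build flag for this as far as I know; a minimal sketch of 
the two runtime knobs (assuming nothing about your code, just standard net/http 
behavior): either set GODEBUG=http2client=0,http2server=0 in the environment, or 
give the client transport / server a non-nil, empty TLSNextProto map so the 
automatic HTTP/2 upgrade never happens.

package httponly1

import (
	"crypto/tls"
	"net/http"
)

// newHTTP1OnlyClient returns a client whose transport never negotiates HTTP/2:
// a non-nil (even empty) TLSNextProto map stops net/http from wiring up its
// bundled http2 support, so requests stay on HTTP/1.1.
func newHTTP1OnlyClient() *http.Client {
	tr := &http.Transport{
		TLSNextProto: map[string]func(string, *tls.Conn) http.RoundTripper{},
	}
	return &http.Client{Transport: tr}
}

// newHTTP1OnlyServer does the same on the server side.
func newHTTP1OnlyServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:         addr,
		Handler:      h,
		TLSNextProto: map[string]func(*http.Server, *tls.Conn, http.Handler){},
	}
}

The GODEBUG route needs no code change, so you can flip it per deployment while 
you test whether the hang goes away.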

> On Aug 27, 2020, at 6:02 AM, Robert Engels  wrote:
> 
> 
> Ok, so your data collectors never complete. A simple change to make this 
> easier to diagnose is to not spin up another collector controller at the 2 
> min mark if the previous has not completed. 
> 
> I would determine if the stuck collector is BQ or Elastic and check the 
> server side logs. 
> 
>>> On Aug 26, 2020, at 11:29 PM, Kurtis Rader  wrote:
>>> 
>> 
>>> On Wed, Aug 26, 2020 at 8:51 PM Siddhesh Divekar 
>>>  wrote:
>> 
>>> Right, then it looks less likely that we are blocked on a mutex.
>>> 
>>> Every 2 minutes we spin up a go routine which then in turn spins up a bunch 
>>> of go routines to collect data from big query & elastic (data collector 
>>> routines).
>>> The 2 minutes go routine then in turn waits on waitgroup for data collector 
>>> routines to come back. So its not single go routine from our side at least.
>>> From backtrace we have from sigabort we see only one such data collector go 
>>> routine blocked and other 2 2 mins thread waiting on waitgroup.
>> 
>> Are you spinning up a Go routine every 2 minutes regardless of whether the 
>> previous instance had completed? Your comment is not clear whether you have 
>> one, or more than one, goroutine issuing a BigQuery query at the time you 
>> saw the problem and captured the backtraces. If you are, in fact, starting 
>> those Go routines without regard to whether the prior instance had completed 
>> that seems like a "yellow flag", "here be dragons", situation. 
>> 
>> -- 
>> Kurtis Rader
>> Caretaker of the exceptional canines Junior and Hank



Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-27 Thread Robert Engels
Ok, so your data collectors never complete. A simple change to make this easier 
to diagnose is to not spin up another collector controller at the 2 min mark if 
the previous has not completed. 

I would determine if the stuck collector is BQ or Elastic and check the server 
side logs. 
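
A minimal sketch of that change, assuming a hypothetical runCollectors() that 
fans out the BQ/Elastic queries and returns when they all finish; the ticker 
still fires every 2 minutes, but a tick is skipped (and logged) whenever the 
previous run is still in flight, so collector controllers cannot pile up:

package collector

import (
	"log"
	"time"
)

func StartCollectorLoop(runCollectors func()) {
	ticker := time.NewTicker(2 * time.Minute)
	busy := make(chan struct{}, 1) // holds a token while a run is in flight

	go func() {
		for range ticker.C {
			select {
			case busy <- struct{}{}: // previous run finished; claim the token
				go func() {
					defer func() { <-busy }() // release the token when done
					runCollectors()
				}()
			default:
				// Previous collection is still running; skip this tick.
				log.Println("collector: previous run still in progress, skipping this cycle")
			}
		}
	}()
}

The skipped-cycle log line also tells you exactly when a collector first started 
hanging.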

> On Aug 26, 2020, at 11:29 PM, Kurtis Rader  wrote:
> 
> 
>> On Wed, Aug 26, 2020 at 8:51 PM Siddhesh Divekar 
>>  wrote:
> 
>> Right, then it looks less likely that we are blocked on a mutex.
>> 
>> Every 2 minutes we spin up a go routine which then in turn spins up a bunch 
>> of go routines to collect data from big query & elastic (data collector 
>> routines).
>> The 2 minutes go routine then in turn waits on waitgroup for data collector 
>> routines to come back. So its not single go routine from our side at least.
>> From backtrace we have from sigabort we see only one such data collector go 
>> routine blocked and other 2 2 mins thread waiting on waitgroup.
> 
> Are you spinning up a Go routine every 2 minutes regardless of whether the 
> previous instance had completed? Your comment is not clear whether you have 
> one, or more than one, goroutine issuing a BigQuery query at the time you saw 
> the problem and captured the backtraces. If you are, in fact, starting those 
> Go routines without regard to whether the prior instance had completed that 
> seems like a "yellow flag", "here be dragons", situation. 
> 
> -- 
> Kurtis Rader
> Caretaker of the exceptional canines Junior and Hank



Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-26 Thread Siddhesh Divekar
That's right Kurtis, we don't wait or check whether the prior go routine which
fired the big query has completed or not.
The rationale was that the data which comes out of big query is not much
and should not take more than 2 mins to fetch.

To be precise, every 2 mins we fire 6 independent Big queries & 24
independent elastic queries, each in its own go routine.
The data coming out is cached locally, and when the user requests that data
we serve it out of the cache.

What we are unable to understand is why this would cause the http server to
stop responding to user requests which are not even asking for this data.
Regular requests were also not getting served. Since the system was not
responding we had to send sig abort and collect the back trace.
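
(For the next occurrence, a minimal sketch of grabbing the same goroutine dump 
from the live process instead of killing it; this is a generic net/http/pprof 
setup, assuming nothing about our actual server code:)

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on DefaultServeMux
)

func main() {
	// Serve pprof on localhost only; the main application server runs elsewhere.
	log.Println(http.ListenAndServe("localhost:6060", nil))
	// A full goroutine dump (same content as the SIGABRT traceback) is then at:
	//   curl http://localhost:6060/debug/pprof/goroutine?debug=2
}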


goroutine 3332366 [select, *79 minutes*]:
net/http.(*http2ClientConn).*roundTrip*(0xc003759b00, 0xc00050b700, 0x0,
0x0, 0x0, 0x0)
/usr/local/go/src/net/http/h2_bundle.go:7573 +0x9a0
net/http.(*http2Transport).*RoundTripOpt*(0xca0480, 0xc00050b700,
0xc0037a8600, 0x74aba5, 0x0, 0xc0001c8000)
/usr/local/go/src/net/http/h2_bundle.go:6936 +0x172
net/http.(*http2Transport).*RoundTrip*(...)
/usr/local/go/src/net/http/h2_bundle.go:6897
net/http.http2noDialH2RoundTripper.*RoundTrip*(0xca0480, 0xc00050b700,
0xc0047e97a0, 0x5, 0xc0001c8088)
/usr/local/go/src/net/http/h2_bundle.go:9032 +0x3e
net/http.(*Transport).*roundTrip*(0x5b48c00, 0xc00050b700, 0x2b3ce20,
0x5b48801, 0xc0048deb00)
/usr/local/go/src/net/http/transport.go:485 +0xdc8
net/http.(*Transport).*RoundTrip*(0x5b48c00, 0xc00050b700, 0x3301658, 0xa,
0xc0048debd0)
/usr/local/go/src/net/http/roundtrip.go:17 +0x35
google.golang.org/api/transport/http.(*parameterTransport).*RoundTrip*(0xc0028d9e40,
0xc00050b600, 0x5, 0x5, 0x0)
/builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/transport/http/dial.go:145 +0x224
go.opencensus.io/plugin/ochttp.(*traceTransport).*RoundTrip*(0xc00284d6c0,
0xc00050b600, 0xc002080980, 0x1, 0x1)

/builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/trace.go:84
+0x456
go.opencensus.io/plugin/ochttp.statsTransport.*RoundTrip*(0x38d4e60,
0xc00284d6c0, 0xc00050b400, 0xc0025e27e0, 0xc0025e27e0, 0x2ecb720)

/builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/client_stats.go:57
+0x5cc
go.opencensus.io/plugin/ochttp.(*Transport).*RoundTrip*(0xc003731d60,
0xc00050b400, 0x0, 0x0, 0x150)

/builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/client.go:99
+0x20e
golang.org/x/oauth2.(*Transport).*RoundTrip*(0xc00376bc20, 0xc00050b300,
0x0, 0x0, 0x0)
/builds/fusionio/fusion/controller/gopath/pkg/mod/
golang.org/x/oauth2@v0.0.0-20200107190931-bf48bf16ab8d/transport.go:55 +0xf4
net/http.send(0xc00050b300, 0x38d5060, 0xc00376bc20, 0x0, 0x0, 0x0,
0xc0036a64c8, 0x1d6cede, 0x1, 0x0)
/usr/local/go/src/net/http/client.go:250 +0x443
net/http.(*Client).send(0xc003afdb60, 0xc00050b300, 0x0, 0x0, 0x0,
0xc0036a64c8, 0x0, 0x1, 0x0)
/usr/local/go/src/net/http/client.go:174 +0xfa
net/http.(*Client).do(0xc003afdb60, 0xc00050b300, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/client.go:641 +0x3ce
net/http.(*Client).Do(...)
/usr/local/go/src/net/http/client.go:509
google.golang.org/api/internal/gensupport.send(0x393ede0, 0xc00346b8c0,
0xc003afdb60, 0xc00050b200, 0x5b7fee0, 0xc0037a9300, 0x1d6f876)
/builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/internal/gensupport/send.go:67 +0xef
google.golang.org/api/internal/gensupport.SendRequest(0x393ede0,
0xc00346b8c0, 0xc003afdb60, 0xc00050b200, 0xc0037a95e8, 0x84, 0x0)
/builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/internal/gensupport/send.go:52 +0x193
google.golang.org/api/bigquery/v2.(*JobsGetQueryResultsCall).doRequest(0xc0037a98f8,
0x32f6d9c, 0x4, 0x0, 0x0, 0x203001)
/builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/bigquery/v2/bigquery-gen.go:7768 +0x765
google.golang.org/api/bigquery/v2.(*JobsGetQueryResultsCall).Do(0xc0037a98f8,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/builds/fusionio/fusion/controller/gopath/pkg/mod/
google.golang.org/api@v0.19.0/bigquery/v2/bigquery-gen.go:7780 +0xa8
cloud.google.com/go/bigquery.(*Job).waitForQuery.func1(0x6d183a,
0xc0037a97c0, 0xc0037a97e0)
/builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go/bigquery@v1.4.0/job.go:320 +0x48
cloud.google.com/go/internal.retry(0x393ede0, 0xc00346b8c0, 0x3b9aca00,
0xdf8475800, 0x4000, 0x0, 0xc0037a98e0, 0x3433818, 0x0,
0x203000)
/builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go@v0.53.0/internal/retry.go:38 +0x48
cloud.google.com/go/internal.Retry(...)
/builds/fusionio/fusion/controller/gopath/pkg/mod/
cloud.google.com/go@v0.53.0/internal/retry.go:31

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-26 Thread Kurtis Rader
On Wed, Aug 26, 2020 at 8:51 PM Siddhesh Divekar 
wrote:

> Right, then it looks less likely that we are blocked on a mutex.
>
> Every 2 minutes we spin up a go routine which then in turn spins up a
> bunch of go routines to collect data from big query & elastic (data
> collector routines).
> The 2 minutes go routine then in turn waits on waitgroup for data
> collector routines to come back. So its not single go routine from our side
> at least.
> From backtrace we have from sigabort we see only one such data collector
> go routine blocked and other 2 2 mins thread waiting on waitgroup.
>

Are you spinning up a Go routine every 2 minutes regardless of whether the
previous instance had completed? Your comment is not clear whether you have
one, or more than one, goroutine issuing a BigQuery query at the time you
saw the problem and captured the backtraces. If you are, in fact, starting
those Go routines without regard to whether the prior instance had
completed that seems like a "yellow flag", "here be dragons", situation.

-- 
Kurtis Rader
Caretaker of the exceptional canines Junior and Hank



Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-26 Thread Siddhesh Divekar
Typo.

> From the backtrace we have from sigabort we see only *2* such data collector go
> routines blocked and the other two 2-minute goroutines waiting on the waitgroup.



On Wed, Aug 26, 2020 at 8:50 PM Siddhesh Divekar 
wrote:

> Right, then it looks less likely that we are blocked on a mutex.
>
> Every 2 minutes we spin up a go routine which then in turn spins up a
> bunch of go routines to collect data from big query & elastic (data
> collector routines).
> The 2 minutes go routine then in turn waits on waitgroup for data
> collector routines to come back. So its not single go routine from our side
> at least.
> From backtrace we have from sigabort we see only one such data collector
> go routine blocked and other 2 2 mins thread waiting on waitgroup.
>
> I see similar issues on the internet. Kubernetes hit the following issue
> which has exactly the same signature as us.
>
> https://github.com/kubernetes/kubernetes/issues/53191
>
> I will follow up with big query guys to see what their behavior is. If one
> of the requests gets stuck on big query, what do they do with
> other requests?
> But in that case in our back trace we should see all other requests
> waiting in their code which is not the case.
>
>
>
> On Wed, Aug 26, 2020 at 7:44 PM Robert Engels 
> wrote:
>
>> If you look at the stack trace the futex is because it is trying to
>> shutdown the entire process - killing all of the M’s - probably because you
>> sent the sigabort
>>
>> On Aug 26, 2020, at 9:41 PM, Robert Engels  wrote:
>>
>> 
>> The big query client may be “single threaded” meaning you may need a pool
>> of connections. Not sure - haven’t used the BQ library from Go.
>>
>> If that’s the case a single slow query will block all of your connections
>> (if they are all accessing BQ). Plus there are BQ rate limits - when you
>> hit those the connector may block - again blocking all clients.
>>
>> On Aug 26, 2020, at 8:27 PM, Siddhesh Divekar 
>> wrote:
>>
>> 
>> Robert,
>>
>> That's where the confusion is.
>> From the backtrace we see that two go routines are blocked on the
>> waitgroup.
>> The go routine on which they are blocked is waiting on big query to
>> return.
>>
>> On every user request we create a new go routine so they shouldn't get
>> blocked because of these two.
>>
>> However, we do see one go routine blocked on futex. Is there a way to
>> know from where it originated from ?
>> When does golang use futex under the hold so that we can investigate from
>> that angle.
>>
>> goroutine 0 [idle]:
>> runtime.futex(0x5b607a8, 0x80, 0x0, 0x0, 0x0, 0x7ffc5dd1bd20, 0x437640,
>> 0xc003638038, 0x7ffc5dd1bd48, 0x40c78f, ...)
>> /usr/local/go/src/runtime/sys_linux_amd64.s:535 +0x21
>> runtime.futexsleep(0x5b607a8, 0x0, 0x)
>> /usr/local/go/src/runtime/os_linux.go:44 +0x46
>> runtime.notesleep(0x5b607a8)
>> /usr/local/go/src/runtime/lock_futex.go:151 +0x9f
>> runtime.stopm()
>> /usr/local/go/src/runtime/proc.go:1928 +0xc0
>> runtime.exitsyscall0(0xc003638000)
>> /usr/local/go/src/runtime/proc.go:3140 +0x111
>> runtime.mcall(0x0)
>> /usr/local/go/src/runtime/asm_amd64.s:318 +0x5b
>>
>> On Wed, Aug 26, 2020 at 5:02 PM robert engels 
>> wrote:
>>
>>> That should allow your server to clean up “dead” clients. Typically you
>>> use this in conjunction with a ‘keep alive’ in the protocol.
>>>
>>> I am doubtful that a bunch of dead clients hanging around would cause a
>>> CPU spike. You really don’t have too many Go routines/connections involved
>>> (I’ve worked with 1000’s of live connections).
>>>
>>> I would look at something else… I am guessing your synchronization is
>>> incorrect and your threads are blocking, and you have a few that are
>>> spinning expecting something to happen that never will (because the source
>>> of the event is blocked on the mutex/lock).
>>>
>>>
>>>
>>> On Aug 26, 2020, at 10:09 AM, Siddhesh Divekar <
>>> siddhesh.dive...@gmail.com> wrote:
>>>
>>> Robert,
>>>
>>> I assume we can safely add these timeouts based on what we expect
>>> should be a reasonable timeout for our clients ?
>>>
>>> s.ReadTimeout = expTimeOut * time.Second
>>> s.WriteTimeout = expTimeOut * time.Second
>>>
>>> On Tue, Aug 25, 2020 at 1:14 PM Siddhesh Divekar <
>>> siddhesh.dive...@gmail.com> wrote:
>>>
 Both servers and data sources are in the cloud.
 I would not say a lot of data, it's precomputed data which shouldn't
 take that long.


 On Tue, Aug 25, 2020 at 11:25 AM Robert Engels 
 wrote:

> Are you transferring a lot of data? Are the servers non-cloud hosted?
>
> You could be encountering “tcp stalls”.
>
> On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar <
> siddhesh.dive...@gmail.com> wrote:
>
> 
> Clients are over the internet.
>
> On Tue, Aug 25, 2020 at 3:25 AM Robert Engels 
> wrote:
>
>> The tcp protocol allows the connection to wait for hours. Go routines
>> stuck in wait do not burn CPU. Are the 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-26 Thread Siddhesh Divekar
Right, then it looks less likely that we are blocked on a mutex.

Every 2 minutes we spin up a go routine which in turn spins up a bunch
of go routines to collect data from big query & elastic (data collector
routines).
The 2-minute go routine then waits on a waitgroup for the data collector
routines to come back, so it's not a single go routine on our side, at least.
From the backtrace we have from sigabort we see only one such data collector go
routine blocked and the other two 2-minute goroutines waiting on the waitgroup.
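
For reference, a minimal sketch of that fan-out, with runQuery as a hypothetical
stand-in for one BQ/Elastic call; the per-collector context.WithTimeout shown
here is one way to keep a hung query from pinning the waitgroup forever:

package collector

import (
	"context"
	"log"
	"sync"
	"time"
)

func collectAll(ctx context.Context, queries []string,
	runQuery func(context.Context, string) error) {

	var wg sync.WaitGroup
	for _, q := range queries {
		wg.Add(1)
		go func(q string) {
			defer wg.Done()
			// Give each data collector a hard deadline shorter than the
			// 2-minute schedule so a stuck query cannot outlive the cycle.
			qctx, cancel := context.WithTimeout(ctx, 90*time.Second)
			defer cancel()
			if err := runQuery(qctx, q); err != nil {
				log.Printf("collector %q: %v", q, err)
			}
		}(q)
	}
	wg.Wait() // the controller goroutine still blocks here, but now with a bound
}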

I see similar issues on the internet. Kubernetes hit the following issue,
which has exactly the same signature as ours.

https://github.com/kubernetes/kubernetes/issues/53191
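
(For what it's worth, newer versions of golang.org/x/net/http2 expose a
connection health check aimed at exactly this class of hang; a minimal sketch,
assuming a recent x/net/http2 and nothing about our actual transport setup:)

package transportcfg

import (
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

// newPingingTransport configures the HTTP/2 side of a standard transport so
// that idle connections are health-checked: if no frames arrive for
// ReadIdleTimeout, a PING is sent, and the connection is closed when the PING
// is not answered within PingTimeout, instead of leaving RoundTrip stuck.
func newPingingTransport() (*http.Transport, error) {
	t := &http.Transport{}
	h2, err := http2.ConfigureTransports(t)
	if err != nil {
		return nil, err
	}
	h2.ReadIdleTimeout = 30 * time.Second
	h2.PingTimeout = 15 * time.Second
	return t, nil
}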

I will follow up with the big query guys to see what their behavior is: if one
of the requests gets stuck on big query, what do they do with the
other requests?
But in that case our back trace should show all the other requests waiting
in their code, which is not the case.



On Wed, Aug 26, 2020 at 7:44 PM Robert Engels  wrote:

> If you look at the stack trace the futex is because it is trying to
> shutdown the entire process - killing all of the M’s - probably because you
> sent the sigabort
>
> On Aug 26, 2020, at 9:41 PM, Robert Engels  wrote:
>
> 
> The big query client may be “single threaded” meaning you may need a pool
> of connections. Not sure - haven’t used the BQ library from Go.
>
> If that’s the case a single slow query will block all of your connections
> (if they are all accessing BQ). Plus there are BQ rate limits - when you
> hit those the connector may block - again blocking all clients.
>
> On Aug 26, 2020, at 8:27 PM, Siddhesh Divekar 
> wrote:
>
> 
> Robert,
>
> That's where the confusion is.
> From the backtrace we see that two go routines are blocked on the
> waitgroup.
> The go routine on which they are blocked is waiting on big query to return.
>
> On every user request we create a new go routine so they shouldn't get
> blocked because of these two.
>
> However, we do see one go routine blocked on futex. Is there a way to know
> from where it originated from ?
> When does golang use futex under the hold so that we can investigate from
> that angle.
>
> goroutine 0 [idle]:
> runtime.futex(0x5b607a8, 0x80, 0x0, 0x0, 0x0, 0x7ffc5dd1bd20, 0x437640,
> 0xc003638038, 0x7ffc5dd1bd48, 0x40c78f, ...)
> /usr/local/go/src/runtime/sys_linux_amd64.s:535 +0x21
> runtime.futexsleep(0x5b607a8, 0x0, 0x)
> /usr/local/go/src/runtime/os_linux.go:44 +0x46
> runtime.notesleep(0x5b607a8)
> /usr/local/go/src/runtime/lock_futex.go:151 +0x9f
> runtime.stopm()
> /usr/local/go/src/runtime/proc.go:1928 +0xc0
> runtime.exitsyscall0(0xc003638000)
> /usr/local/go/src/runtime/proc.go:3140 +0x111
> runtime.mcall(0x0)
> /usr/local/go/src/runtime/asm_amd64.s:318 +0x5b
>
> On Wed, Aug 26, 2020 at 5:02 PM robert engels 
> wrote:
>
>> That should allow your server to clean up “dead” clients. Typically you
>> use this in conjunction with a ‘keep alive’ in the protocol.
>>
>> I am doubtful that a bunch of dead clients hanging around would cause a
>> CPU spike. You really don’t have too many Go routines/connections involved
>> (I’ve worked with 1000’s of live connections).
>>
>> I would look at something else… I am guessing your synchronization is
>> incorrect and your threads are blocking, and you have a few that are
>> spinning expecting something to happen that never will (because the source
>> of the event is blocked on the mutex/lock).
>>
>>
>>
>> On Aug 26, 2020, at 10:09 AM, Siddhesh Divekar <
>> siddhesh.dive...@gmail.com> wrote:
>>
>> Robert,
>>
>> I assume we can safely add these timeouts based on what we expect
>> should be a reasonable timeout for our clients ?
>>
>> s.ReadTimeout = expTimeOut * time.Second
>> s.WriteTimeout = expTimeOut * time.Second
>>
>> On Tue, Aug 25, 2020 at 1:14 PM Siddhesh Divekar <
>> siddhesh.dive...@gmail.com> wrote:
>>
>>> Both servers and data sources are in the cloud.
>>> I would not say a lot of data, it's precomputed data which shouldn't
>>> take that long.
>>>
>>>
>>> On Tue, Aug 25, 2020 at 11:25 AM Robert Engels 
>>> wrote:
>>>
 Are you transferring a lot of data? Are the servers non-cloud hosted?

 You could be encountering “tcp stalls”.

 On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar <
 siddhesh.dive...@gmail.com> wrote:

 
 Clients are over the internet.

 On Tue, Aug 25, 2020 at 3:25 AM Robert Engels 
 wrote:

> The tcp protocol allows the connection to wait for hours. Go routines
> stuck in wait do not burn CPU. Are the clients local or remote (over
> internet)?
>
> On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar <
> siddhesh.dive...@gmail.com> wrote:
>
> 
> Robert,
>
> We will do the profiling next time we hit the issue again & see what
> is happening.
> This was the first time we saw the issue & don't want to get rid of
> 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-26 Thread Robert Engels
If you look at the stack trace, the futex is there because the runtime is trying 
to shut down the entire process - killing all of the M's - probably because you 
sent the sigabort.

> On Aug 26, 2020, at 9:41 PM, Robert Engels  wrote:
> 
> 
> The big query client may be “single threaded” meaning you may need a pool of 
> connections. Not sure - haven’t used the BQ library from Go. 
> 
> If that’s the case a single slow query will block all of your connections (if 
> they are all accessing BQ). Plus there are BQ rate limits - when you hit 
> those the connector may block - again blocking all clients. 
> 
>>> On Aug 26, 2020, at 8:27 PM, Siddhesh Divekar  
>>> wrote:
>>> 
>> 
>> Robert,
>> 
>> That's where the confusion is.
>> From the backtrace we see that two go routines are blocked on the waitgroup.
>> The go routine on which they are blocked is waiting on big query to return.
>> 
>> On every user request we create a new go routine so they shouldn't get 
>> blocked because of these two.
>> 
>> However, we do see one go routine blocked on futex. Is there a way to know 
>> from where it originated from ?
>> When does golang use futex under the hold so that we can investigate from 
>> that angle.
>> 
>> goroutine 0 [idle]:
>> runtime.futex(0x5b607a8, 0x80, 0x0, 0x0, 0x0, 0x7ffc5dd1bd20, 0x437640, 
>> 0xc003638038, 0x7ffc5dd1bd48, 0x40c78f, ...)
>> /usr/local/go/src/runtime/sys_linux_amd64.s:535 +0x21
>> runtime.futexsleep(0x5b607a8, 0x0, 0x)
>> /usr/local/go/src/runtime/os_linux.go:44 +0x46
>> runtime.notesleep(0x5b607a8)
>> /usr/local/go/src/runtime/lock_futex.go:151 +0x9f
>> runtime.stopm()
>> /usr/local/go/src/runtime/proc.go:1928 +0xc0
>> runtime.exitsyscall0(0xc003638000)
>> /usr/local/go/src/runtime/proc.go:3140 +0x111
>> runtime.mcall(0x0)
>> /usr/local/go/src/runtime/asm_amd64.s:318 +0x5b
>> 
>>> On Wed, Aug 26, 2020 at 5:02 PM robert engels  wrote:
>>> That should allow your server to clean up “dead” clients. Typically you use 
>>> this in conjunction with a ‘keep alive’ in the protocol.
>>> 
>>> I am doubtful that a bunch of dead clients hanging around would cause a CPU 
>>> spike. You really don’t have too many Go routines/connections involved 
>>> (I’ve worked with 1000’s of live connections).
>>> 
>>> I would look at something else… I am guessing your synchronization is 
>>> incorrect and your threads are blocking, and you have a few that are 
>>> spinning expecting something to happen that never will (because the source 
>>> of the event is blocked on the mutex/lock).
>>> 
>>> 
>>> 
 On Aug 26, 2020, at 10:09 AM, Siddhesh Divekar 
  wrote:
 
 Robert,
 
 I assume we can safely add these timeouts based on what we expect
 should be a reasonable timeout for our clients ?
 
 s.ReadTimeout = expTimeOut * time.Second
 s.WriteTimeout = expTimeOut * time.Second
 
> On Tue, Aug 25, 2020 at 1:14 PM Siddhesh Divekar 
>  wrote:
> Both servers and data sources are in the cloud.
> I would not say a lot of data, it's precomputed data which shouldn't take 
> that long.
> 
> 
>> On Tue, Aug 25, 2020 at 11:25 AM Robert Engels  
>> wrote:
>> Are you transferring a lot of data? Are the servers non-cloud hosted? 
>> 
>> You could be encountering “tcp stalls”. 
>> 
 On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar 
  wrote:
 
>>> 
>>> Clients are over the internet.
>>> 
 On Tue, Aug 25, 2020 at 3:25 AM Robert Engels  
 wrote:
 The tcp protocol allows the connection to wait for hours. Go routines 
 stuck in wait do not burn CPU. Are the clients local or remote (over 
 internet)?
 
>> On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar 
>>  wrote:
>> 
> 
> Robert,
> 
> We will do the profiling next time we hit the issue again & see what 
> is happening.
> This was the first time we saw the issue & don't want to get rid of 
> http2 advantages without making sure it's the actual culprit.
> 
> Do you think in the meanwhile we should do what the discussion below 
> suggests anyways ?
> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
> 
> 
>> On Mon, Aug 24, 2020 at 5:37 PM Robert Engels 
>>  wrote:
>> I think it is too hard to tell with the limited information. It 
>> could be exhausted connections or it could be thrashing (the claim 
>> of high cpu)
>> 
>> I think you want to run profiling capture prior to hitting the stick 
>> state - you should be able to detect what is happening. 
>> 
>> If the referenced issue is related I would assume you should be able 
>> to connect by forcing http/1. 
>> 
>> You can also try disabling http/2 and 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-26 Thread Robert Engels
The big query client may be “single threaded” meaning you may need a pool of 
connections. Not sure - haven’t used the BQ library from Go. 

If that’s the case a single slow query will block all of your connections (if 
they are all accessing BQ). Plus there are BQ rate limits - when you hit those 
the connector may block - again blocking all clients. 

> On Aug 26, 2020, at 8:27 PM, Siddhesh Divekar  
> wrote:
> 
> 
> Robert,
> 
> That's where the confusion is.
> From the backtrace we see that two go routines are blocked on the waitgroup.
> The go routine on which they are blocked is waiting on big query to return.
> 
> On every user request we create a new go routine so they shouldn't get 
> blocked because of these two.
> 
> However, we do see one go routine blocked on futex. Is there a way to know 
> from where it originated from ?
> When does golang use futex under the hold so that we can investigate from 
> that angle.
> 
> goroutine 0 [idle]:
> runtime.futex(0x5b607a8, 0x80, 0x0, 0x0, 0x0, 0x7ffc5dd1bd20, 0x437640, 
> 0xc003638038, 0x7ffc5dd1bd48, 0x40c78f, ...)
> /usr/local/go/src/runtime/sys_linux_amd64.s:535 +0x21
> runtime.futexsleep(0x5b607a8, 0x0, 0x)
> /usr/local/go/src/runtime/os_linux.go:44 +0x46
> runtime.notesleep(0x5b607a8)
> /usr/local/go/src/runtime/lock_futex.go:151 +0x9f
> runtime.stopm()
> /usr/local/go/src/runtime/proc.go:1928 +0xc0
> runtime.exitsyscall0(0xc003638000)
> /usr/local/go/src/runtime/proc.go:3140 +0x111
> runtime.mcall(0x0)
> /usr/local/go/src/runtime/asm_amd64.s:318 +0x5b
> 
> On Wed, Aug 26, 2020 at 5:02 PM robert engels  wrote:
>> That should allow your server to clean up “dead” clients. Typically you use 
>> this in conjunction with a ‘keep alive’ in the protocol.
>> 
>> I am doubtful that a bunch of dead clients hanging around would cause a CPU 
>> spike. You really don’t have too many Go routines/connections involved (I’ve 
>> worked with 1000’s of live connections).
>> 
>> I would look at something else… I am guessing your synchronization is 
>> incorrect and your threads are blocking, and you have a few that are 
>> spinning expecting something to happen that never will (because the source 
>> of the event is blocked on the mutex/lock).
>> 
>> 
>> 
>>> On Aug 26, 2020, at 10:09 AM, Siddhesh Divekar  
>>> wrote:
>>> 
>>> Robert,
>>> 
>>> I assume we can safely add these timeouts based on what we expect
>>> should be a reasonable timeout for our clients ?
>>> 
>>> s.ReadTimeout = expTimeOut * time.Second
>>> s.WriteTimeout = expTimeOut * time.Second
>>> 
>>> On Tue, Aug 25, 2020 at 1:14 PM Siddhesh Divekar 
>>>  wrote:
 Both servers and data sources are in the cloud.
 I would not say a lot of data, it's precomputed data which shouldn't take 
 that long.
 
 
 On Tue, Aug 25, 2020 at 11:25 AM Robert Engels  
 wrote:
> Are you transferring a lot of data? Are the servers non-cloud hosted? 
> 
> You could be encountering “tcp stalls”. 
> 
>> On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar 
>>  wrote:
>> 
>> 
>> Clients are over the internet.
>> 
>> On Tue, Aug 25, 2020 at 3:25 AM Robert Engels  
>> wrote:
>>> The tcp protocol allows the connection to wait for hours. Go routines 
>>> stuck in wait do not burn CPU. Are the clients local or remote (over 
>>> internet)?
>>> 
 On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar 
  wrote:
 
 
 Robert,
 
 We will do the profiling next time we hit the issue again & see what 
 is happening.
 This was the first time we saw the issue & don't want to get rid of 
 http2 advantages without making sure it's the actual culprit.
 
 Do you think in the meanwhile we should do what the discussion below 
 suggests anyways ?
 https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
 
 
 On Mon, Aug 24, 2020 at 5:37 PM Robert Engels  
 wrote:
> I think it is too hard to tell with the limited information. It could 
> be exhausted connections or it could be thrashing (the claim of high 
> cpu)
> 
> I think you want to run a profiling capture prior to hitting the stuck
> state - you should be able to detect what is happening.
> 
> If the referenced issue is related I would assume you should be able 
> to connect by forcing http/1. 
> 
> You can also try disabling http/2 and see if your issue goes away. 
> 
> On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar 
>  wrote:
> 
>> Hi Robert,
>> 
>> Sorry I missed your earlier response.
>> 
>> From what we saw our UI was blocked and since everything was 
>> unresponsive
>> we had to recover the system by sending 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-26 Thread Siddhesh Divekar
Robert,

That's where the confusion is.
From the backtrace we see that two go routines are blocked on the waitgroup.
The go routine on which they are blocked is waiting on big query to return.

On every user request we create a new go routine, so they shouldn't get
blocked because of these two.

However, we do see one go routine blocked on a futex. Is there a way to know
where it originated from?
When does golang use futex under the hood, so that we can investigate from
that angle?

goroutine 0 [idle]:
runtime.futex(0x5b607a8, 0x80, 0x0, 0x0, 0x0, 0x7ffc5dd1bd20, 0x437640,
0xc003638038, 0x7ffc5dd1bd48, 0x40c78f, ...)
/usr/local/go/src/runtime/sys_linux_amd64.s:535 +0x21
runtime.futexsleep(0x5b607a8, 0x0, 0x)
/usr/local/go/src/runtime/os_linux.go:44 +0x46
runtime.notesleep(0x5b607a8)
/usr/local/go/src/runtime/lock_futex.go:151 +0x9f
runtime.stopm()
/usr/local/go/src/runtime/proc.go:1928 +0xc0
runtime.exitsyscall0(0xc003638000)
/usr/local/go/src/runtime/proc.go:3140 +0x111
runtime.mcall(0x0)
/usr/local/go/src/runtime/asm_amd64.s:318 +0x5b

On Wed, Aug 26, 2020 at 5:02 PM robert engels  wrote:

> That should allow your server to clean up “dead” clients. Typically you
> use this in conjunction with a ‘keep alive’ in the protocol.
>
> I am doubtful that a bunch of dead clients hanging around would cause a
> CPU spike. You really don’t have too many Go routines/connections involved
> (I’ve worked with 1000’s of live connections).
>
> I would look at something else… I am guessing your synchronization is
> incorrect and your threads are blocking, and you have a few that are
> spinning expecting something to happen that never will (because the source
> of the event is blocked on the mutex/lock).
>
>
>
> On Aug 26, 2020, at 10:09 AM, Siddhesh Divekar 
> wrote:
>
> Robert,
>
> I assume we can safely add these timeouts based on what we expect
> should be a reasonable timeout for our clients ?
>
> s.ReadTimeout = expTimeOut * time.Second
> s.WriteTimeout = expTimeOut * time.Second
>
> On Tue, Aug 25, 2020 at 1:14 PM Siddhesh Divekar <
> siddhesh.dive...@gmail.com> wrote:
>
>> Both servers and data sources are in the cloud.
>> I would not say a lot of data, it's precomputed data which shouldn't take
>> that long.
>>
>>
>> On Tue, Aug 25, 2020 at 11:25 AM Robert Engels 
>> wrote:
>>
>>> Are you transferring a lot of data? Are the servers non-cloud hosted?
>>>
>>> You could be encountering “tcp stalls”.
>>>
>>> On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar <
>>> siddhesh.dive...@gmail.com> wrote:
>>>
>>> 
>>> Clients are over the internet.
>>>
>>> On Tue, Aug 25, 2020 at 3:25 AM Robert Engels 
>>> wrote:
>>>
 The tcp protocol allows the connection to wait for hours. Go routines
 stuck in wait do not burn CPU. Are the clients local or remote (over
 internet)?

 On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar <
 siddhesh.dive...@gmail.com> wrote:

 
 Robert,

 We will do the profiling next time we hit the issue again & see what is
 happening.
 This was the first time we saw the issue & don't want to get rid of
 http2 advantages without making sure it's the actual culprit.

 Do you think in the meanwhile we should do what the discussion below
 suggests anyways ?

 https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time


 On Mon, Aug 24, 2020 at 5:37 PM Robert Engels 
 wrote:

> I think it is too hard to tell with the limited information. It could
> be exhausted connections or it could be thrashing (the claim of high cpu)
>
> I think you want to run a profiling capture prior to hitting the stuck
> state - you should be able to detect what is happening.
>
> If the referenced issue is related I would assume you should be able
> to connect by forcing http/1.
>
> You can also try disabling http/2 and see if your issue goes away.
>
> On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar <
> siddhesh.dive...@gmail.com> wrote:
>
> Hi Robert,
>
> Sorry I missed your earlier response.
>
> From what we saw our UI was blocked and since everything was
> unresponsive
> we had to recover the system by sending sig abort.
>
> On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar <
> siddhesh.dive...@gmail.com> wrote:
>
>> Looking at the no. of go routines we have does this apply to this
>> issue ?
>> https://github.com/golang/go/issues/27044
>>
>> On Mon, Aug 24, 2020 at 12:54 PM Robert Engels 
>> wrote:
>>
>>> Go routines in a waiting state should not be consuming CPU. Are you
>>> certain they are not in constant transition from waiting to processing -
>>> this could show up as high CPU usage while everything looks blocks.
>>>
>>> I would use pprof - github.com/robaho/goanalyzer might be of
>>> assistance here to 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-26 Thread robert engels
That should allow your server to clean up “dead” clients. Typically you use 
this in conjunction with a ‘keep alive’ in the protocol.

I am doubtful that a bunch of dead clients hanging around would cause a CPU 
spike. You really don’t have too many Go routines/connections involved (I’ve 
worked with 1000’s of live connections).

I would look at something else… I am guessing your synchronization is incorrect 
and your threads are blocking, and you have a few that are spinning expecting 
something to happen that never will (because the source of the event is blocked 
on the mutex/lock).



> On Aug 26, 2020, at 10:09 AM, Siddhesh Divekar  
> wrote:
> 
> Robert,
> 
> I assume we can safely add these timeouts based on what we expect
> should be a reasonable timeout for our clients ?
> 
> s.ReadTimeout = expTimeOut * time.Second
> s.WriteTimeout = expTimeOut * time.Second
> 
> On Tue, Aug 25, 2020 at 1:14 PM Siddhesh Divekar  > wrote:
> Both servers and data sources are in the cloud.
> I would not say a lot of data, it's precomputed data which shouldn't take 
> that long.
> 
> 
> On Tue, Aug 25, 2020 at 11:25 AM Robert Engels  > wrote:
> Are you transferring a lot of data? Are the servers non-cloud hosted? 
> 
> You could be encountering “tcp stalls”. 
> 
>> On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar > > wrote:
>> 
>> 
>> Clients are over the internet.
>> 
>> On Tue, Aug 25, 2020 at 3:25 AM Robert Engels > > wrote:
>> The tcp protocol allows the connection to wait for hours. Go routines stuck 
>> in wait do not burn CPU. Are the clients local or remote (over internet)?
>> 
>>> On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar >> > wrote:
>>> 
>>> 
>>> Robert,
>>> 
>>> We will do the profiling next time we hit the issue again & see what is 
>>> happening.
>>> This was the first time we saw the issue & don't want to get rid of http2 
>>> advantages without making sure it's the actual culprit.
>>> 
>>> Do you think in the meanwhile we should do what the discussion below 
>>> suggests anyways ?
>>> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
>>>  
>>> 
>>> 
>>> 
>>> On Mon, Aug 24, 2020 at 5:37 PM Robert Engels >> > wrote:
>>> I think it is too hard to tell with the limited information. It could be 
>>> exhausted connections or it could be thrashing (the claim of high cpu)
>>> 
>>> I think you want to run profiling capture prior to hitting the stick state 
>>> - you should be able to detect what is happening. 
>>> 
>>> If the referenced issue is related I would assume you should be able to 
>>> connect by forcing http/1. 
>>> 
>>> You can also try disabling http/2 and see if your issue goes away. 
>>> 
>>> On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar >> > wrote:
>>> 
 Hi Robert,
 
 Sorry I missed your earlier response.
 
 From what we saw our UI was blocked and since everything was unresponsive
 we had to recover the system by sending sig abort. 
 
 On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar 
 mailto:siddhesh.dive...@gmail.com>> wrote:
 Looking at the no. of go routines we have does this apply to this issue ?
 https://github.com/golang/go/issues/27044 
 
 
 On Mon, Aug 24, 2020 at 12:54 PM Robert Engels >>> > wrote:
 Go routines in a waiting state should not be consuming CPU. Are you 
 certain they are not in constant transition from waiting to processing - 
 this could show up as high CPU usage while everything looks blocks. 
 
 I would use pprof - github.com/robaho/goanalyzer 
  might be of assistance here to see 
 the actual work being done. 
 
> On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar  > wrote:
> 
> 
> Hi Ian,
> 
> Thanks for replying. 
> 
> We have a go server running which handles user requests & collects data 
> from various sources like gcp's cloud sql and big query.
> We are using shopify's sarama library to do kafka operations.
> 
> We are seeing lots of go routines in a waiting state for several 
> minutes. 
> Over the period of time around 587 goroutines have been spun up.
> 
> We see that two go routines are stuck on gcp big query and we are using 
> wait groups there.
> However, it's not clear why that would cause all other go routines to get 
> hung & make cpu go high.
> 
> goroutine 3332131 [semacquire, 79 minutes]:
> sync.runtime_Semacquire(0xc001c4fcf8)
> /usr/local/go/src/runtime/sema.go:56 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-26 Thread Siddhesh Divekar
Robert,

I assume we can safely add these timeouts, based on what we expect
should be a reasonable timeout for our clients?

s.ReadTimeout = expTimeOut * time.Second
s.WriteTimeout = expTimeOut * time.Second
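
(A minimal sketch of where those fields live on http.Server, with placeholder
values rather than recommendations: WriteTimeout needs to be longer than the
slowest legitimate response, IdleTimeout closes keep-alive connections that sit
with no requests, and ReadHeaderTimeout bounds clients that never finish
sending headers.)

package server

import (
	"net/http"
	"time"
)

// newServer shows the timeout fields discussed above; the durations are
// placeholders to be tuned to the slowest legitimate client and response.
func newServer(handler http.Handler) *http.Server {
	return &http.Server{
		Addr:              ":8443",
		Handler:           handler,
		ReadHeaderTimeout: 10 * time.Second,  // max time to read request headers
		ReadTimeout:       30 * time.Second,  // max time to read the entire request
		WriteTimeout:      60 * time.Second,  // max time to write the response
		IdleTimeout:       120 * time.Second, // max idle time between keep-alive requests
	}
}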

On Tue, Aug 25, 2020 at 1:14 PM Siddhesh Divekar 
wrote:

> Both servers and data sources are in the cloud.
> I would not say a lot of data, it's precomputed data which shouldn't take
> that long.
>
>
> On Tue, Aug 25, 2020 at 11:25 AM Robert Engels 
> wrote:
>
>> Are you transferring a lot of data? Are the servers non-cloud hosted?
>>
>> You could be encountering “tcp stalls”.
>>
>> On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar 
>> wrote:
>>
>> 
>> Clients are over the internet.
>>
>> On Tue, Aug 25, 2020 at 3:25 AM Robert Engels 
>> wrote:
>>
>>> The tcp protocol allows the connection to wait for hours. Go routines
>>> stuck in wait do not burn CPU. Are the clients local or remote (over
>>> internet)?
>>>
>>> On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar <
>>> siddhesh.dive...@gmail.com> wrote:
>>>
>>> 
>>> Robert,
>>>
>>> We will do the profiling next time we hit the issue again & see what is
>>> happening.
>>> This was the first time we saw the issue & don't want to get rid of
>>> http2 advantages without making sure it's the actual culprit.
>>>
>>> Do you think in the meanwhile we should do what the discussion below
>>> suggests anyways ?
>>>
>>> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
>>>
>>>
>>> On Mon, Aug 24, 2020 at 5:37 PM Robert Engels 
>>> wrote:
>>>
 I think it is too hard to tell with the limited information. It could
 be exhausted connections or it could be thrashing (the claim of high cpu)

 I think you want to run profiling capture prior to hitting the stick
 state - you should be able to detect what is happening.

 If the referenced issue is related I would assume you should be able to
 connect by forcing http/1.

 You can also try disabling http/2 and see if your issue goes away.

 On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar <
 siddhesh.dive...@gmail.com> wrote:

 Hi Robert,

 Sorry I missed your earlier response.

 From what we saw our UI was blocked and since everything was
 unresponsive
 we had to recover the system by sending sig abort.

 On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar <
 siddhesh.dive...@gmail.com> wrote:

> Looking at the no. of go routines we have does this apply to this
> issue ?
> https://github.com/golang/go/issues/27044
>
> On Mon, Aug 24, 2020 at 12:54 PM Robert Engels 
> wrote:
>
>> Go routines in a waiting state should not be consuming CPU. Are you
>> certain they are not in constant transition from waiting to processing -
>> this could show up as high CPU usage while everything looks blocks.
>>
>> I would use pprof - github.com/robaho/goanalyzer might be of
>> assistance here to see the actual work being done.
>>
>> On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar <
>> siddhesh.dive...@gmail.com> wrote:
>>
>> 
>> Hi Ian,
>>
>> Thanks for replying.
>>
>> We have a go server running which handles user requests & collects
>> data from various sources like gcp's cloud sql and big query.
>> We are using shopify's sarama library to do kafka operations.
>>
>> There are seeing lots of go routines in waiting state for several
>> minutes.
>> Over the period of time around 587 goroutines have been spun up.
>>
>> We see that two go routines are stuck on gcp big query and we are
>> using wait groups there.
>> However, it's not clear why that would cause all other go routines to
>> get hung & make cpu go high.
>>
>> goroutine 3332131 [semacquire, 79 minutes]:
>> sync.runtime_Semacquire(0xc001c4fcf8)
>> /usr/local/go/src/runtime/sema.go:56 +0x42
>> sync.(*WaitGroup).Wait(0xc001c4fcf0)
>> /usr/local/go/src/sync/waitgroup.go:130 +0x64
>>
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
>> 0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>>
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
>> 0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>>
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
>> 0xc002356810, 0x24, 0x32f7b78, 0x5)
>> /builds/fusionio/fusion/controller/stats/prefetcher.go:77
>> +0xf2
>> created by
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
>> /builds/fusionio/fusion/controller/stats/prefetcher.go:100
>> +0xd8
>>

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-25 Thread Siddhesh Divekar
Both the servers and the data sources are in the cloud.
I would not say it's a lot of data; it's precomputed data which shouldn't take
that long to transfer.


On Tue, Aug 25, 2020 at 11:25 AM Robert Engels 
wrote:

> Are you transferring a lot of data? Are the servers non-cloud hosted?
>
> You could be encountering “tcp stalls”.
>
> On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar 
> wrote:
>
> 
> Clients are over the internet.
>
> On Tue, Aug 25, 2020 at 3:25 AM Robert Engels 
> wrote:
>
>> The tcp protocol allows the connection to wait for hours. Go routines
>> stuck in wait do not burn CPU. Are the clients local or remote (over
>> internet)?
>>
>> On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar <
>> siddhesh.dive...@gmail.com> wrote:
>>
>> 
>> Robert,
>>
>> We will do the profiling next time we hit the issue again & see what is
>> happening.
>> This was the first time we saw the issue & don't want to get rid of http2
>> advantages without making sure it's the actual culprit.
>>
>> Do you think in the meanwhile we should do what the discussion below
>> suggests anyways ?
>>
>> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
>>
>>
>> On Mon, Aug 24, 2020 at 5:37 PM Robert Engels 
>> wrote:
>>
>>> I think it is too hard to tell with the limited information. It could be
>>> exhausted connections or it could be thrashing (the claim of high cpu)
>>>
>>> I think you want to run profiling capture prior to hitting the stick
>>> state - you should be able to detect what is happening.
>>>
>>> If the referenced issue is related I would assume you should be able to
>>> connect by forcing http/1.
>>>
>>> You can also try disabling http/2 and see if your issue goes away.
>>>
>>> On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar <
>>> siddhesh.dive...@gmail.com> wrote:
>>>
>>> Hi Robert,
>>>
>>> Sorry I missed your earlier response.
>>>
>>> From what we saw our UI was blocked and since everything was unresponsive
>>> we had to recover the system by sending sig abort.
>>>
>>> On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar <
>>> siddhesh.dive...@gmail.com> wrote:
>>>
 Looking at the no. of go routines we have does this apply to this issue
 ?
 https://github.com/golang/go/issues/27044

 On Mon, Aug 24, 2020 at 12:54 PM Robert Engels 
 wrote:

> Go routines in a waiting state should not be consuming CPU. Are you
> certain they are not in constant transition from waiting to processing -
> this could show up as high CPU usage while everything looks blocked.
>
> I would use pprof - github.com/robaho/goanalyzer might be of
> assistance here to see the actual work being done.
>
> On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar <
> siddhesh.dive...@gmail.com> wrote:
>
> 
> Hi Ian,
>
> Thanks for replying.
>
> We have a go server running which handles user requests & collects
> data from various sources like gcp's cloud sql and big query.
> We are using shopify's sarama library to do kafka operations.
>
> We are seeing lots of go routines in a waiting state for several
> minutes.
> Over the period of time around 587 goroutines have been spun up.
>
> We see that two go routines are stuck on gcp big query and we are
> using wait groups there.
> However, it's not clear why that would cause all other go routines to
> get hung & make cpu go high.
>
> goroutine 3332131 [semacquire, 79 minutes]:
> sync.runtime_Semacquire(0xc001c4fcf8)
> /usr/local/go/src/runtime/sema.go:56 +0x42
> sync.(*WaitGroup).Wait(0xc001c4fcf0)
> /usr/local/go/src/sync/waitgroup.go:130 +0x64
>
> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
> 0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>
> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
> 0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>
> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
> 0xc002356810, 0x24, 0x32f7b78, 0x5)
> /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
> created by
> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
> /builds/fusionio/fusion/controller/stats/prefetcher.go:100
> +0xd8
>
>
> goroutine 3332149 [semacquire, 79 minutes]:
> sync.runtime_Semacquire(0xc0015ede48)
> /usr/local/go/src/runtime/sema.go:56 +0x42
> sync.(*WaitGroup).Wait(0xc0015ede40)
> /usr/local/go/src/sync/waitgroup.go:130 +0x64
>
> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
> 0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-25 Thread Robert Engels
Are you transferring a lot of data? Are the servers non-cloud hosted? 

You could be encountering “tcp stalls”. 
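
If it is a stalled outbound connection (to BigQuery/Elastic or anywhere else), one generic mitigation is to bound how long a single request may sit on the wire. Below is a minimal sketch of a client with dialer keepalives and transport timeouts; the helper name and the values are illustrative only, not taken from the code discussed in this thread:

package main

import (
	"net"
	"net/http"
	"time"
)

// newBoundedClient builds an *http.Client whose connections cannot stall
// indefinitely: TCP keepalives probe dead peers, and the transport and
// client timeouts cap how long any single request can hang.
func newBoundedClient() *http.Client {
	dialer := &net.Dialer{
		Timeout:   30 * time.Second, // bound connection establishment
		KeepAlive: 30 * time.Second, // probe idle/stalled TCP connections
	}
	return &http.Client{
		Transport: &http.Transport{
			DialContext:           dialer.DialContext,
			TLSHandshakeTimeout:   10 * time.Second,
			ResponseHeaderTimeout: 60 * time.Second, // server must start replying
			IdleConnTimeout:       90 * time.Second, // drop idle pooled connections
		},
		Timeout: 5 * time.Minute, // hard cap on the whole request, body included
	}
}

func main() {
	resp, err := newBoundedClient().Get("https://example.com/")
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
}

With a cap like this, a wedged transfer surfaces as an error instead of a goroutine parked in pollWait for hours.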

> On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar  
> wrote:
> 
> 
> Clients are over the internet.
> 
>> On Tue, Aug 25, 2020 at 3:25 AM Robert Engels  wrote:
>> The tcp protocol allows the connection to wait for hours. Go routines stuck 
>> in wait do not burn CPU. Are the clients local or remote (over internet)?
>> 
 On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar 
  wrote:
 
>>> 
>>> Robert,
>>> 
>>> We will do the profiling next time we hit the issue again & see what is 
>>> happening.
>>> This was the first time we saw the issue & don't want to get rid of http2 
>>> advantages without making sure it's the actual culprit.
>>> 
>>> Do you think in the meanwhile we should do what the discussion below 
>>> suggests anyways ?
>>> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
>>> 
>>> 
 On Mon, Aug 24, 2020 at 5:37 PM Robert Engels  
 wrote:
 I think it is too hard to tell with the limited information. It could be 
 exhausted connections or it could be thrashing (the claim of high cpu)
 
 I think you want to run profiling capture prior to hitting the stick state 
 - you should be able to detect what is happening. 
 
 If the referenced issue is related I would assume you should be able to 
 connect by forcing http/1. 
 
 You can also try disabling http/2 and see if your issue goes away. 
 
> On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar 
>  wrote:
> 
> Hi Robert,
> 
> Sorry I missed your earlier response.
> 
> From what we saw our UI was blocked and since everything was unresponsive
> we had to recover the system by sending sig abort. 
> 
>> On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar 
>>  wrote:
>> Looking at the no. of go routines we have does this apply to this issue ?
>> https://github.com/golang/go/issues/27044
>> 
>>> On Mon, Aug 24, 2020 at 12:54 PM Robert Engels  
>>> wrote:
>>> Go routines in a waiting state should not be consuming CPU. Are you 
>>> certain they are not in constant transition from waiting to processing 
>>> - this could show up as high CPU usage while everything looks blocks. 
>>> 
>>> I would use pprof - github.com/robaho/goanalyzer might be of assistance 
>>> here to see the actual work being done. 
>>> 
> On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar 
>  wrote:
> 
 
 Hi Ian,
 
 Thanks for replying. 
 
 We have a go server running which handles user requests & collects 
 data from various sources like gcp's cloud sql and big query.
 We are using shopify's sarama library to do kafka operations.
 
 There are seeing lots of go routines in waiting state for several 
 minutes. 
 Over the period of time around 587 goroutines have been spun up.
 
 We see that two go routines are stuck on gcp big query and we are 
 using wait groups there.
 However, it's not clear why that would cause all other go routines to 
 get hung & make cpu go high.
 
 goroutine 3332131 [semacquire, 79 minutes]:
 sync.runtime_Semacquire(0xc001c4fcf8)
 /usr/local/go/src/runtime/sema.go:56 +0x42
 sync.(*WaitGroup).Wait(0xc001c4fcf0)
 /usr/local/go/src/sync/waitgroup.go:130 +0x64
 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
  0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
 /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
  0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
 /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
 git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
  0xc002356810, 0x24, 0x32f7b78, 0x5)
 /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
 created by 
 git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
 /builds/fusionio/fusion/controller/stats/prefetcher.go:100 
 +0xd8
 
 
 goroutine 3332149 [semacquire, 79 minutes]:
 sync.runtime_Semacquire(0xc0015ede48)
 /usr/local/go/src/runtime/sema.go:56 +0x42
 sync.(*WaitGroup).Wait(0xc0015ede40)
 /usr/local/go/src/sync/waitgroup.go:130 +0x64
 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
  0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3)
 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-25 Thread Siddhesh Divekar
Clients are over the internet.

On Tue, Aug 25, 2020 at 3:25 AM Robert Engels  wrote:

> The tcp protocol allows the connection to wait for hours. Go routines
> stuck in wait do not burn CPU. Are the clients local or remote (over
> internet)?
>
> On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar 
> wrote:
>
> 
> Robert,
>
> We will do the profiling next time we hit the issue again & see what is
> happening.
> This was the first time we saw the issue & don't want to get rid of http2
> advantages without making sure it's the actual culprit.
>
> Do you think in the meanwhile we should do what the discussion below
> suggests anyways ?
>
> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
>
>
> On Mon, Aug 24, 2020 at 5:37 PM Robert Engels 
> wrote:
>
>> I think it is too hard to tell with the limited information. It could be
>> exhausted connections or it could be thrashing (the claim of high cpu)
>>
>> I think you want to run profiling capture prior to hitting the stick
>> state - you should be able to detect what is happening.
>>
>> If the referenced issue is related I would assume you should be able to
>> connect by forcing http/1.
>>
>> You can also try disabling http/2 and see if your issue goes away.
>>
>> On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar 
>> wrote:
>>
>> Hi Robert,
>>
>> Sorry I missed your earlier response.
>>
>> From what we saw our UI was blocked and since everything was unresponsive
>> we had to recover the system by sending sig abort.
>>
>> On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar <
>> siddhesh.dive...@gmail.com> wrote:
>>
>>> Looking at the no. of go routines we have does this apply to this issue ?
>>> https://github.com/golang/go/issues/27044
>>>
>>> On Mon, Aug 24, 2020 at 12:54 PM Robert Engels 
>>> wrote:
>>>
 Go routines in a waiting state should not be consuming CPU. Are you
 certain they are not in constant transition from waiting to processing -
 this could show up as high CPU usage while everything looks blocks.

 I would use pprof - github.com/robaho/goanalyzer might be of
 assistance here to see the actual work being done.

 On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar <
 siddhesh.dive...@gmail.com> wrote:

 
 Hi Ian,

 Thanks for replying.

 We have a go server running which handles user requests & collects data
 from various sources like gcp's cloud sql and big query.
 We are using shopify's sarama library to do kafka operations.

 There are seeing lots of go routines in waiting state for several
 minutes.
 Over the period of time around 587 goroutines have been spun up.

 We see that two go routines are stuck on gcp big query and we are using
 wait groups there.
 However, it's not clear why that would cause all other go routines to
 get hung & make cpu go high.

 goroutine 3332131 [semacquire, 79 minutes]:
 sync.runtime_Semacquire(0xc001c4fcf8)
 /usr/local/go/src/runtime/sema.go:56 +0x42
 sync.(*WaitGroup).Wait(0xc001c4fcf0)
 /usr/local/go/src/sync/waitgroup.go:130 +0x64

 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
 0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
 /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7

 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
 0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
 /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d

 git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
 0xc002356810, 0x24, 0x32f7b78, 0x5)
 /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
 created by
 git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
 /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8


 goroutine 3332149 [semacquire, 79 minutes]:
 sync.runtime_Semacquire(0xc0015ede48)
 /usr/local/go/src/runtime/sema.go:56 +0x42
 sync.(*WaitGroup).Wait(0xc0015ede40)
 /usr/local/go/src/sync/waitgroup.go:130 +0x64

 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
 0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3)
 /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7

 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
 0xc00249ddb8, 0xc003200668, 0xc00407a520, 0xc003200590, 0x46ee97)
 /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d

 git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
 0xc002356ba0, 0x24, 0x32f7b78, 0x5)
 /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-25 Thread Robert Engels
The TCP protocol allows a connection to wait for hours. Go routines stuck in 
wait do not burn CPU. Are the clients local or remote (over the internet)?
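
On the server side, where some goroutines in the reported dump sit in net/http.(*conn).readRequest waiting on remote clients, the usual guard is to set timeouts on the http.Server so half-dead connections are reaped instead of being allowed to wait for hours. A minimal sketch, with arbitrary values and not taken from the code in this thread:

package main

import (
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	srv := &http.Server{
		Addr:              ":8080",
		Handler:           mux,
		ReadHeaderTimeout: 10 * time.Second, // stalled or missing request headers
		ReadTimeout:       30 * time.Second, // whole request, body included
		WriteTimeout:      60 * time.Second, // time allowed to write the response
		IdleTimeout:       2 * time.Minute,  // keep-alive connections with no traffic
	}
	// Plain HTTP for brevity; a TLS server would use ListenAndServeTLS.
	if err := srv.ListenAndServe(); err != nil {
		panic(err)
	}
}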

> On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar  
> wrote:
> 
> 
> Robert,
> 
> We will do the profiling next time we hit the issue again & see what is 
> happening.
> This was the first time we saw the issue & don't want to get rid of http2 
> advantages without making sure it's the actual culprit.
> 
> Do you think in the meanwhile we should do what the discussion below suggests 
> anyways ?
> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
> 
> 
>> On Mon, Aug 24, 2020 at 5:37 PM Robert Engels  wrote:
>> I think it is too hard to tell with the limited information. It could be 
>> exhausted connections or it could be thrashing (the claim of high cpu)
>> 
>> I think you want to run profiling capture prior to hitting the stick state - 
>> you should be able to detect what is happening. 
>> 
>> If the referenced issue is related I would assume you should be able to 
>> connect by forcing http/1. 
>> 
>> You can also try disabling http/2 and see if your issue goes away. 
>> 
>>> On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar  
>>> wrote:
>>> 
>>> Hi Robert,
>>> 
>>> Sorry I missed your earlier response.
>>> 
>>> From what we saw our UI was blocked and since everything was unresponsive
>>> we had to recover the system by sending sig abort. 
>>> 
 On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar 
  wrote:
 Looking at the no. of go routines we have does this apply to this issue ?
 https://github.com/golang/go/issues/27044
 
> On Mon, Aug 24, 2020 at 12:54 PM Robert Engels  
> wrote:
> Go routines in a waiting state should not be consuming CPU. Are you 
> certain they are not in constant transition from waiting to processing - 
> this could show up as high CPU usage while everything looks blocks. 
> 
> I would use pprof - github.com/robaho/goanalyzer might be of assistance 
> here to see the actual work being done. 
> 
>>> On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar 
>>>  wrote:
>>> 
>> 
>> Hi Ian,
>> 
>> Thanks for replying. 
>> 
>> We have a go server running which handles user requests & collects data 
>> from various sources like gcp's cloud sql and big query.
>> We are using shopify's sarama library to do kafka operations.
>> 
>> There are seeing lots of go routines in waiting state for several 
>> minutes. 
>> Over the period of time around 587 goroutines have been spun up.
>> 
>> We see that two go routines are stuck on gcp big query and we are using 
>> wait groups there.
>> However, it's not clear why that would cause all other go routines to 
>> get hung & make cpu go high.
>> 
>> goroutine 3332131 [semacquire, 79 minutes]:
>> sync.runtime_Semacquire(0xc001c4fcf8)
>> /usr/local/go/src/runtime/sema.go:56 +0x42
>> sync.(*WaitGroup).Wait(0xc001c4fcf0)
>> /usr/local/go/src/sync/waitgroup.go:130 +0x64
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
>>  0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
>>  0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
>>  0xc002356810, 0x24, 0x32f7b78, 0x5)
>> /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
>> created by 
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
>> /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
>> 
>> 
>> goroutine 3332149 [semacquire, 79 minutes]:
>> sync.runtime_Semacquire(0xc0015ede48)
>> /usr/local/go/src/runtime/sema.go:56 +0x42
>> sync.(*WaitGroup).Wait(0xc0015ede40)
>> /usr/local/go/src/sync/waitgroup.go:130 +0x64
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
>>  0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3)
>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
>>  0xc00249ddb8, 0xc003200668, 0xc00407a520, 0xc003200590, 0x46ee97)
>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
>>  0xc002356ba0, 0x24, 0x32f7b78, 0x5)
>> /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-24 Thread Siddhesh Divekar
Robert,

We will do the profiling next time we hit the issue again & see what is
happening.
This was the first time we saw the issue & don't want to get rid of http2
advantages without making sure it's the actual culprit.

Do you think in the meantime we should do what the discussion below
suggests anyway?
https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
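
Separately from the HTTP settings, the 79-minute sync.WaitGroup.Wait frames in the dump could be bounded by waiting on the group through a channel and racing it against a context deadline, so one stuck query cannot pin the whole collector run. A rough sketch; the collector signature and the timeout are hypothetical:

package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// runCollectors fans out one goroutine per collector and waits for them,
// but gives up at the context deadline so the caller is never stuck for
// hours if one collector hangs. The collector funcs are placeholders.
func runCollectors(ctx context.Context, collectors []func(context.Context) error) error {
	var wg sync.WaitGroup
	for _, c := range collectors {
		wg.Add(1)
		go func(collect func(context.Context) error) {
			defer wg.Done()
			if err := collect(ctx); err != nil {
				fmt.Println("collector error:", err)
			}
		}(c)
	}

	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	select {
	case <-done:
		return nil
	case <-ctx.Done():
		// The collectors must honor ctx to actually stop;
		// this only unblocks the caller.
		return ctx.Err()
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 90*time.Second)
	defer cancel()

	slow := func(ctx context.Context) error {
		select {
		case <-time.After(5 * time.Minute): // simulates a stuck query
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	fmt.Println(runCollectors(ctx, []func(context.Context) error{slow}))
}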


On Mon, Aug 24, 2020 at 5:37 PM Robert Engels  wrote:

> I think it is too hard to tell with the limited information. It could be
> exhausted connections or it could be thrashing (the claim of high cpu)
>
> I think you want to run profiling capture prior to hitting the stick state
> - you should be able to detect what is happening.
>
> If the referenced issue is related I would assume you should be able to
> connect by forcing http/1.
>
> You can also try disabling http/2 and see if your issue goes away.
>
> On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar 
> wrote:
>
> Hi Robert,
>
> Sorry I missed your earlier response.
>
> From what we saw our UI was blocked and since everything was unresponsive
> we had to recover the system by sending sig abort.
>
> On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar <
> siddhesh.dive...@gmail.com> wrote:
>
>> Looking at the no. of go routines we have does this apply to this issue ?
>> https://github.com/golang/go/issues/27044
>>
>> On Mon, Aug 24, 2020 at 12:54 PM Robert Engels 
>> wrote:
>>
>>> Go routines in a waiting state should not be consuming CPU. Are you
>>> certain they are not in constant transition from waiting to processing -
>>> this could show up as high CPU usage while everything looks blocks.
>>>
>>> I would use pprof - github.com/robaho/goanalyzer might be of assistance
>>> here to see the actual work being done.
>>>
>>> On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar <
>>> siddhesh.dive...@gmail.com> wrote:
>>>
>>> 
>>> Hi Ian,
>>>
>>> Thanks for replying.
>>>
>>> We have a go server running which handles user requests & collects data
>>> from various sources like gcp's cloud sql and big query.
>>> We are using shopify's sarama library to do kafka operations.
>>>
>>> There are seeing lots of go routines in waiting state for several
>>> minutes.
>>> Over the period of time around 587 goroutines have been spun up.
>>>
>>> We see that two go routines are stuck on gcp big query and we are using
>>> wait groups there.
>>> However, it's not clear why that would cause all other go routines to
>>> get hung & make cpu go high.
>>>
>>> goroutine 3332131 [semacquire, 79 minutes]:
>>> sync.runtime_Semacquire(0xc001c4fcf8)
>>> /usr/local/go/src/runtime/sema.go:56 +0x42
>>> sync.(*WaitGroup).Wait(0xc001c4fcf0)
>>> /usr/local/go/src/sync/waitgroup.go:130 +0x64
>>>
>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
>>> 0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
>>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>>>
>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
>>> 0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
>>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>>>
>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
>>> 0xc002356810, 0x24, 0x32f7b78, 0x5)
>>> /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
>>> created by
>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
>>> /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
>>>
>>>
>>> goroutine 3332149 [semacquire, 79 minutes]:
>>> sync.runtime_Semacquire(0xc0015ede48)
>>> /usr/local/go/src/runtime/sema.go:56 +0x42
>>> sync.(*WaitGroup).Wait(0xc0015ede40)
>>> /usr/local/go/src/sync/waitgroup.go:130 +0x64
>>>
>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
>>> 0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3)
>>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>>>
>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
>>> 0xc00249ddb8, 0xc003200668, 0xc00407a520, 0xc003200590, 0x46ee97)
>>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>>>
>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
>>> 0xc002356ba0, 0x24, 0x32f7b78, 0x5)
>>> /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
>>> created by
>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
>>> /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
>>>
>>> I found the link below which kind of co-relates to our scenario.
>>>
>>> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
>>>
>>> Most of the go routines in the backtrace are in a net/http package so
>>> our suspicion is 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-24 Thread Robert Engels
I think it is too hard to tell with the limited information. It could be 
exhausted connections or it could be thrashing (given the claim of high CPU).

I think you want to run a profiling capture prior to hitting the stuck state - 
you should be able to detect what is happening. 

If the referenced issue is related I would assume you should be able to connect 
by forcing http/1. 

You can also try disabling http/2 and see if your issue goes away. 
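
For reference, HTTP/2 can be switched off without a rebuild: net/http documents that setting TLSNextProto to a non-nil empty map disables it (Transport for clients, Server for servers), and GODEBUG=http2client=0 / GODEBUG=http2server=0 do the same at runtime. A minimal client-side sketch:

package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

func main() {
	// A non-nil, empty TLSNextProto map tells the transport not to negotiate
	// HTTP/2, so requests go out over HTTP/1.1 even on TLS connections.
	client := &http.Client{
		Transport: &http.Transport{
			TLSNextProto: map[string]func(authority string, c *tls.Conn) http.RoundTripper{},
		},
	}

	resp, err := client.Get("https://example.com/")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("negotiated protocol:", resp.Proto) // expect HTTP/1.1
}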

> On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar  
> wrote:
> 
> Hi Robert,
> 
> Sorry I missed your earlier response.
> 
> From what we saw our UI was blocked and since everything was unresponsive
> we had to recover the system by sending sig abort. 
> 
>> On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar 
>>  wrote:
>> Looking at the no. of go routines we have does this apply to this issue ?
>> https://github.com/golang/go/issues/27044
>> 
>>> On Mon, Aug 24, 2020 at 12:54 PM Robert Engels  
>>> wrote:
>>> Go routines in a waiting state should not be consuming CPU. Are you certain 
>>> they are not in constant transition from waiting to processing - this could 
>>> show up as high CPU usage while everything looks blocks. 
>>> 
>>> I would use pprof - github.com/robaho/goanalyzer might be of assistance 
>>> here to see the actual work being done. 
>>> 
 On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar  
 wrote:
 
 
 Hi Ian,
 
 Thanks for replying. 
 
 We have a go server running which handles user requests & collects data 
 from various sources like gcp's cloud sql and big query.
 We are using shopify's sarama library to do kafka operations.
 
 There are seeing lots of go routines in waiting state for several minutes. 
 Over the period of time around 587 goroutines have been spun up.
 
 We see that two go routines are stuck on gcp big query and we are using 
 wait groups there.
 However, it's not clear why that would cause all other go routines to get 
 hung & make cpu go high.
 
 goroutine 3332131 [semacquire, 79 minutes]:
 sync.runtime_Semacquire(0xc001c4fcf8)
 /usr/local/go/src/runtime/sema.go:56 +0x42
 sync.(*WaitGroup).Wait(0xc001c4fcf0)
 /usr/local/go/src/sync/waitgroup.go:130 +0x64
 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
  0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
 /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
  0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
 /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
 git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
  0xc002356810, 0x24, 0x32f7b78, 0x5)
 /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
 created by 
 git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
 /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
 
 
 goroutine 3332149 [semacquire, 79 minutes]:
 sync.runtime_Semacquire(0xc0015ede48)
 /usr/local/go/src/runtime/sema.go:56 +0x42
 sync.(*WaitGroup).Wait(0xc0015ede40)
 /usr/local/go/src/sync/waitgroup.go:130 +0x64
 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
  0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3)
 /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
  0xc00249ddb8, 0xc003200668, 0xc00407a520, 0xc003200590, 0x46ee97)
 /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
 git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
  0xc002356ba0, 0x24, 0x32f7b78, 0x5)
 /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
 created by 
 git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
 /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
 
 I found the link below which kind of co-relates to our scenario.
 https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
 
 Most of the go routines in the backtrace are in a net/http package so our 
 suspicion is that above bug in our code might be causing that.
 Even the bigquery is getting hung in net/http.
 
 We are using go version 1.13.8 & are running on gcp kubernetes cluster on 
 ubuntu 18.04 docker.
 
 go env
 GO111MODULE=""
 GOARCH="amd64"
 GOBIN=""
 GOCACHE="/root/.cache/go-build"
 GOENV="/root/.config/go/env"
 GOEXE=""
 GOFLAGS=""
 GOHOSTARCH="amd64"
 GOHOSTOS="linux"
 GONOPROXY=""

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-24 Thread Siddhesh Divekar
Hi Robert,

Sorry I missed your earlier response.

From what we saw, our UI was blocked and since everything was unresponsive
we had to recover the system by sending SIGABRT.
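
A dump can also be taken without bringing the process down, for example by writing the pprof goroutine profile when a chosen signal arrives. A small sketch; SIGUSR1 is an arbitrary choice, not something used by the code in this thread:

package main

import (
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

// dumpGoroutinesOnSignal writes a full goroutine stack dump to stderr every
// time the process receives SIGUSR1, and leaves the process running.
func dumpGoroutinesOnSignal() {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGUSR1)
	go func() {
		for range ch {
			// debug=2 prints stacks in the same format as a crash dump.
			pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
		}
	}()
}

func main() {
	dumpGoroutinesOnSignal()

	// Stand-in for the real server: block until interrupted.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
	<-stop
}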

On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar 
wrote:

> Looking at the no. of go routines we have does this apply to this issue ?
> https://github.com/golang/go/issues/27044
>
> On Mon, Aug 24, 2020 at 12:54 PM Robert Engels 
> wrote:
>
>> Go routines in a waiting state should not be consuming CPU. Are you
>> certain they are not in constant transition from waiting to processing -
>> this could show up as high CPU usage while everything looks blocks.
>>
>> I would use pprof - github.com/robaho/goanalyzer might be of assistance
>> here to see the actual work being done.
>>
>> On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar 
>> wrote:
>>
>> 
>> Hi Ian,
>>
>> Thanks for replying.
>>
>> We have a go server running which handles user requests & collects data
>> from various sources like gcp's cloud sql and big query.
>> We are using shopify's sarama library to do kafka operations.
>>
>> There are seeing lots of go routines in waiting state for several
>> minutes.
>> Over the period of time around 587 goroutines have been spun up.
>>
>> We see that two go routines are stuck on gcp big query and we are using
>> wait groups there.
>> However, it's not clear why that would cause all other go routines to get
>> hung & make cpu go high.
>>
>> goroutine 3332131 [semacquire, 79 minutes]:
>> sync.runtime_Semacquire(0xc001c4fcf8)
>> /usr/local/go/src/runtime/sema.go:56 +0x42
>> sync.(*WaitGroup).Wait(0xc001c4fcf0)
>> /usr/local/go/src/sync/waitgroup.go:130 +0x64
>>
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
>> 0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>>
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
>> 0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>>
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
>> 0xc002356810, 0x24, 0x32f7b78, 0x5)
>> /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
>> created by
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
>> /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
>>
>>
>> goroutine 3332149 [semacquire, 79 minutes]:
>> sync.runtime_Semacquire(0xc0015ede48)
>> /usr/local/go/src/runtime/sema.go:56 +0x42
>> sync.(*WaitGroup).Wait(0xc0015ede40)
>> /usr/local/go/src/sync/waitgroup.go:130 +0x64
>>
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
>> 0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3)
>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>>
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
>> 0xc00249ddb8, 0xc003200668, 0xc00407a520, 0xc003200590, 0x46ee97)
>> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>>
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
>> 0xc002356ba0, 0x24, 0x32f7b78, 0x5)
>> /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
>> created by
>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
>> /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
>>
>> I found the link below which kind of co-relates to our scenario.
>>
>> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
>>
>> Most of the go routines in the backtrace are in a net/http package so our
>> suspicion is that above bug in our code might be causing that.
>> Even the bigquery is getting hung in net/http.
>>
>> We are using go version 1.13.8 & are running on gcp kubernetes cluster on
>> ubuntu 18.04 docker.
>>
>> go env
>> GO111MODULE=""
>> GOARCH="amd64"
>> GOBIN=""
>> GOCACHE="/root/.cache/go-build"
>> GOENV="/root/.config/go/env"
>> GOEXE=""
>> GOFLAGS=""
>> GOHOSTARCH="amd64"
>> GOHOSTOS="linux"
>> GONOPROXY=""
>> GONOSUMDB=""
>> GOOS="linux"
>> GOPATH="/root/go"
>> GOPRIVATE=""
>> GOPROXY="https://proxy.golang.org,direct"
>> GOROOT="/usr/local/go"
>> GOSUMDB="sum.golang.org"
>> GOTMPDIR=""
>> GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
>> GCCGO="gccgo"
>> AR="ar"
>> CC="gcc"
>> CXX="g++"
>> CGO_ENABLED="1"
>> GOMOD="/builds/prosimoio/prosimo/pdash/go.mod"
>> CGO_CFLAGS="-g -O2"
>> CGO_CPPFLAGS=""
>> CGO_CXXFLAGS="-g -O2"
>> CGO_FFLAGS="-g -O2"
>> CGO_LDFLAGS="-g -O2"
>> PKG_CONFIG="pkg-config"
>> GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0
>> -fdebug-prefix-map=/tmp/go-build048009048=/tmp/go-build
>> -gno-record-gcc-switches"
>>
>> Let me know if any other information is needed.
>>
>>
>> On Sat, 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-24 Thread Siddhesh Divekar
Looking at the number of goroutines we have, does this apply to this issue?
https://github.com/golang/go/issues/27044

On Mon, Aug 24, 2020 at 12:54 PM Robert Engels 
wrote:

> Go routines in a waiting state should not be consuming CPU. Are you
> certain they are not in constant transition from waiting to processing -
> this could show up as high CPU usage while everything looks blocks.
>
> I would use pprof - github.com/robaho/goanalyzer might be of assistance
> here to see the actual work being done.
>
> On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar 
> wrote:
>
> 
> Hi Ian,
>
> Thanks for replying.
>
> We have a go server running which handles user requests & collects data
> from various sources like gcp's cloud sql and big query.
> We are using shopify's sarama library to do kafka operations.
>
> There are seeing lots of go routines in waiting state for several minutes.
> Over the period of time around 587 goroutines have been spun up.
>
> We see that two go routines are stuck on gcp big query and we are using
> wait groups there.
> However, it's not clear why that would cause all other go routines to get
> hung & make cpu go high.
>
> goroutine 3332131 [semacquire, 79 minutes]:
> sync.runtime_Semacquire(0xc001c4fcf8)
> /usr/local/go/src/runtime/sema.go:56 +0x42
> sync.(*WaitGroup).Wait(0xc001c4fcf0)
> /usr/local/go/src/sync/waitgroup.go:130 +0x64
>
> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
> 0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>
> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
> 0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>
> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
> 0xc002356810, 0x24, 0x32f7b78, 0x5)
> /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
> created by
> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
> /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
>
>
> goroutine 3332149 [semacquire, 79 minutes]:
> sync.runtime_Semacquire(0xc0015ede48)
> /usr/local/go/src/runtime/sema.go:56 +0x42
> sync.(*WaitGroup).Wait(0xc0015ede40)
> /usr/local/go/src/sync/waitgroup.go:130 +0x64
>
> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
> 0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3)
> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>
> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
> 0xc00249ddb8, 0xc003200668, 0xc00407a520, 0xc003200590, 0x46ee97)
> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>
> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
> 0xc002356ba0, 0x24, 0x32f7b78, 0x5)
> /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
> created by
> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
> /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
>
> I found the link below which kind of co-relates to our scenario.
>
> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
>
> Most of the go routines in the backtrace are in a net/http package so our
> suspicion is that above bug in our code might be causing that.
> Even the bigquery is getting hung in net/http.
>
> We are using go version 1.13.8 & are running on gcp kubernetes cluster on
> ubuntu 18.04 docker.
>
> go env
> GO111MODULE=""
> GOARCH="amd64"
> GOBIN=""
> GOCACHE="/root/.cache/go-build"
> GOENV="/root/.config/go/env"
> GOEXE=""
> GOFLAGS=""
> GOHOSTARCH="amd64"
> GOHOSTOS="linux"
> GONOPROXY=""
> GONOSUMDB=""
> GOOS="linux"
> GOPATH="/root/go"
> GOPRIVATE=""
> GOPROXY="https://proxy.golang.org,direct"
> GOROOT="/usr/local/go"
> GOSUMDB="sum.golang.org"
> GOTMPDIR=""
> GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
> GCCGO="gccgo"
> AR="ar"
> CC="gcc"
> CXX="g++"
> CGO_ENABLED="1"
> GOMOD="/builds/prosimoio/prosimo/pdash/go.mod"
> CGO_CFLAGS="-g -O2"
> CGO_CPPFLAGS=""
> CGO_CXXFLAGS="-g -O2"
> CGO_FFLAGS="-g -O2"
> CGO_LDFLAGS="-g -O2"
> PKG_CONFIG="pkg-config"
> GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0
> -fdebug-prefix-map=/tmp/go-build048009048=/tmp/go-build
> -gno-record-gcc-switches"
>
> Let me know if any other information is needed.
>
>
> On Sat, Aug 22, 2020 at 12:30 PM Ian Lance Taylor  wrote:
>
>> On Sat, Aug 22, 2020 at 12:06 PM Siddhesh Divekar
>>  wrote:
>> >
>> > We saw an issue with our process running in k8s on ubuntu 18.04.
>> > Most of the go routines are stuck for several minutes in http/http2 net
>> code.
>> >
>> > Have you seen similar issues ?
>> >
>> > goroutine 2800143 [select, 324 minutes]:
>> 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-24 Thread Robert Engels
Go routines in a waiting state should not be consuming CPU. Are you certain 
they are not in constant transition from waiting to processing - this could 
show up as high CPU usage while everything looks blocked. 

I would use pprof - github.com/robaho/goanalyzer might be of assistance here to 
see the actual work being done. 
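
Exposing net/http/pprof on a side port makes those profiles easy to grab while the process is wedged; goanalyzer reads the same data. A minimal sketch (the loopback port is arbitrary):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// In a real server this listener would run in its own goroutine next to
	// the application listener; here it is the whole program.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}

Then curl http://localhost:6060/debug/pprof/goroutine?debug=2 dumps every goroutine with its wait reason and duration, and go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30 captures a CPU profile while the box is busy.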

> On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar  
> wrote:
> 
> 
> Hi Ian,
> 
> Thanks for replying. 
> 
> We have a go server running which handles user requests & collects data from 
> various sources like gcp's cloud sql and big query.
> We are using shopify's sarama library to do kafka operations.
> 
> There are seeing lots of go routines in waiting state for several minutes. 
> Over the period of time around 587 goroutines have been spun up.
> 
> We see that two go routines are stuck on gcp big query and we are using wait 
> groups there.
> However, it's not clear why that would cause all other go routines to get 
> hung & make cpu go high.
> 
> goroutine 3332131 [semacquire, 79 minutes]:
> sync.runtime_Semacquire(0xc001c4fcf8)
> /usr/local/go/src/runtime/sema.go:56 +0x42
> sync.(*WaitGroup).Wait(0xc001c4fcf0)
> /usr/local/go/src/sync/waitgroup.go:130 +0x64
> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
>  0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
>  0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
>  0xc002356810, 0x24, 0x32f7b78, 0x5)
> /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
> created by 
> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
> /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
> 
> 
> goroutine 3332149 [semacquire, 79 minutes]:
> sync.runtime_Semacquire(0xc0015ede48)
> /usr/local/go/src/runtime/sema.go:56 +0x42
> sync.(*WaitGroup).Wait(0xc0015ede40)
> /usr/local/go/src/sync/waitgroup.go:130 +0x64
> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40,
>  0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3)
> /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40,
>  0xc00249ddb8, 0xc003200668, 0xc00407a520, 0xc003200590, 0x46ee97)
> /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780,
>  0xc002356ba0, 0x24, 0x32f7b78, 0x5)
> /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
> created by 
> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
> /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
> 
> I found the link below which kind of co-relates to our scenario.
> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
> 
> Most of the go routines in the backtrace are in a net/http package so our 
> suspicion is that above bug in our code might be causing that.
> Even the bigquery is getting hung in net/http.
> 
> We are using go version 1.13.8 & are running on gcp kubernetes cluster on 
> ubuntu 18.04 docker.
> 
> go env
> GO111MODULE=""
> GOARCH="amd64"
> GOBIN=""
> GOCACHE="/root/.cache/go-build"
> GOENV="/root/.config/go/env"
> GOEXE=""
> GOFLAGS=""
> GOHOSTARCH="amd64"
> GOHOSTOS="linux"
> GONOPROXY=""
> GONOSUMDB=""
> GOOS="linux"
> GOPATH="/root/go"
> GOPRIVATE=""
> GOPROXY="https://proxy.golang.org,direct"
> GOROOT="/usr/local/go"
> GOSUMDB="sum.golang.org"
> GOTMPDIR=""
> GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
> GCCGO="gccgo"
> AR="ar"
> CC="gcc"
> CXX="g++"
> CGO_ENABLED="1"
> GOMOD="/builds/prosimoio/prosimo/pdash/go.mod"
> CGO_CFLAGS="-g -O2"
> CGO_CPPFLAGS=""
> CGO_CXXFLAGS="-g -O2"
> CGO_FFLAGS="-g -O2"
> CGO_LDFLAGS="-g -O2"
> PKG_CONFIG="pkg-config"
> GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 
> -fdebug-prefix-map=/tmp/go-build048009048=/tmp/go-build 
> -gno-record-gcc-switches"
> 
> Let me know if any other information is needed.
> 
> 
>> On Sat, Aug 22, 2020 at 12:30 PM Ian Lance Taylor  wrote:
>> On Sat, Aug 22, 2020 at 12:06 PM Siddhesh Divekar
>>  wrote:
>> >
>> > We saw an issue with our process running in k8s on ubuntu 18.04.
>> > Most of the go routines are stuck for several minutes in http/http2 net 
>> > code.
>> >
>> > Have you seen similar issues ?
>> >
>> > goroutine 2800143 [select, 324 minutes]: 
>> > net/http.(*persistConn).readLoop(0xc00187d440) 
>> > /usr/local/go/src/net/http/transport.go:2032 +0x999 created by 
>> > net/http.(*Transport).dialConn 
>> 

Re: [go-nuts] Go routines stuck in runtime_pollwait

2020-08-22 Thread Ian Lance Taylor
On Sat, Aug 22, 2020 at 12:06 PM Siddhesh Divekar
 wrote:
>
> We saw an issue with our process running in k8s on ubuntu 18.04.
> Most of the go routines are stuck for several minutes in http/http2 net code.
>
> Have you seen similar issues ?
>
> goroutine 2800143 [select, 324 minutes]: 
> net/http.(*persistConn).readLoop(0xc00187d440) 
> /usr/local/go/src/net/http/transport.go:2032 +0x999 created by 
> net/http.(*Transport).dialConn /usr/local/go/src/net/http/transport.go:1580 
> +0xb0d goroutine 2738894 [IO wait, 352 minutes]: 
> internal/poll.runtime_pollWait(0x7f5b61b280c0, 0x72, 0x) 
> /usr/local/go/src/runtime/netpoll.go:184 +0x55 
> internal/poll.(*pollDesc).wait(0xc0017e7e18, 0x72, 0x1000, 0x1000, 
> 0x) /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 
> +0x45 internal/poll.(*pollDesc).waitRead(...) 
> /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 
> internal/poll.(*FD).Read(0xc0017e7e00, 0xc0044a9000, 0x1000, 0x1000, 0x0, 
> 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:169 +0x1cf 
> net.(*netFD).Read(0xc0017e7e00, 0xc0044a9000, 0x1000, 0x1000, 0xc0026359e8, 
> 0x49d7fd, 0xc0017e7e00) /usr/local/go/src/net/fd_unix.go:202 +0x4f 
> net.(*conn).Read(0xcdb8b8, 0xc0044a9000, 0x1000, 0x1000, 0x0, 0x0, 0x0) 
> /usr/local/go/src/net/net.go:184 +0x68 
> net/http.(*connReader).Read(0xc004a4fec0, 0xc0044a9000, 0x1000, 0x1000, 0x0, 
> 0x0, 0x0) /usr/local/go/src/net/http/server.go:785 +0xf4 
> bufio.(*Reader).fill(0xc003f1a360) /usr/local/go/src/bufio/bufio.go:100 
> +0x103 bufio.(*Reader).Peek(0xc003f1a360, 0x4, 0x0, 0x0, 0x0, 0x0, 
> 0xc002635ad0) /usr/local/go/src/bufio/bufio.go:138 +0x4f 
> net/http.(*conn).readRequest(0xc0028e1d60, 0x393ed20, 0xc0024e9780, 0x0, 0x0, 
> 0x0) /usr/local/go/src/net/http/server.go:962 +0xb3b 
> net/http.(*conn).serve(0xc0028e1d60, 0x393ed20, 0xc0024e9780) 
> /usr/local/go/src/net/http/server.go:1817 +0x6d4 created by 
> net/http.(*Server).Serve /usr/local/go/src/net/http/server.go:2928 +0x384
>
> Is there a know issue or something obvious from the backtrace here.

It's entirely normal for goroutines to sit in pollWait if they are
waiting for network I/O.  There may be reasons why this is incorrect
for your program, but you'll have to tell us those reasons.

Also, along with those reasons, please tell us the version of Go and
the exact environment that you are running.  Thanks.

Ian



[go-nuts] Go routines stuck in runtime_pollwait

2020-08-22 Thread Siddhesh Divekar
Hi All,

We saw an issue with our process running in k8s on Ubuntu 18.04.
Most of the go routines are stuck for several minutes in http/http2 net 
code.

Have you seen similar issues?

goroutine 2800143 [select, 324 minutes]:
net/http.(*persistConn).readLoop(0xc00187d440)
/usr/local/go/src/net/http/transport.go:2032 +0x999
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1580 +0xb0d

goroutine 2738894 [IO wait, 352 minutes]:
internal/poll.runtime_pollWait(0x7f5b61b280c0, 0x72, 0x)
/usr/local/go/src/runtime/netpoll.go:184 +0x55
internal/poll.(*pollDesc).wait(0xc0017e7e18, 0x72, 0x1000, 0x1000, 0x)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0017e7e00, 0xc0044a9000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:169 +0x1cf
net.(*netFD).Read(0xc0017e7e00, 0xc0044a9000, 0x1000, 0x1000, 0xc0026359e8, 0x49d7fd, 0xc0017e7e00)
/usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xcdb8b8, 0xc0044a9000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:184 +0x68
net/http.(*connReader).Read(0xc004a4fec0, 0xc0044a9000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/server.go:785 +0xf4
bufio.(*Reader).fill(0xc003f1a360)
/usr/local/go/src/bufio/bufio.go:100 +0x103
bufio.(*Reader).Peek(0xc003f1a360, 0x4, 0x0, 0x0, 0x0, 0x0, 0xc002635ad0)
/usr/local/go/src/bufio/bufio.go:138 +0x4f
net/http.(*conn).readRequest(0xc0028e1d60, 0x393ed20, 0xc0024e9780, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/server.go:962 +0xb3b
net/http.(*conn).serve(0xc0028e1d60, 0x393ed20, 0xc0024e9780)
/usr/local/go/src/net/http/server.go:1817 +0x6d4
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2928 +0x384

Is there a known issue, or is there something obvious from the backtrace here?
