OK, I will add to that issue, but for the record here is our machine type and the env vars:
This GCP hardware has 64 virtual CPUs (a hyperthread on a 2.6GHz Xeon E5) and 416GB of memory. This is shared among other docker containers, of course, but as the trace file shows, our process uses 8 CPUs (and around 16GB of memory). The environment variables of concern: we set GOMAXPROCS to 8 and SetGCPercent to 20, and while generating the profiles we set GODEBUG=gctrace=1,gcpacertrace=1.

On Wednesday, July 26, 2017 at 5:56:10 AM UTC-7, r...@golang.org wrote:
>
> I would add to 14812 <https://github.com/golang/go/issues/14812>.
>
> The report should include the environment variables, HW, and RAM. The
> report should indicate if any environment variables are not set to the
> default, such as GOGC (SetGCPercent).
>
> On Monday, July 24, 2017 at 8:44:10 PM UTC-4, stb...@laserlike.com wrote:
>>
>> Hi,
>>
>> We are experiencing a problem that I believe may be related to issue
>> 14812 <https://github.com/golang/go/issues/14812>,
>> but I wanted to ask here before adding to that case or filing a new issue.
>> Of course, we’d also greatly appreciate any advice about how to make our
>> program performant.
>>
>> Here is what we observe: at Laserlike one of our core user-facing
>> services (the “leaf”) typically responds to a particular rpc in <400ms.
>> During GC we see spikes in latency to >5s on some simple requests. The
>> stop-the-world periods are short, so the GC spikes appear to happen at
>> other times.
>>
>> We have been unable to replicate this in a simple program, but we did run
>> our code in a test mode that repros it. In our test environment the server
>> loads ~10GB of persistent data (never deleted, so it doesn’t really need to be
>> GCed), and we ask for 8 processors. We are running go version 1.8.3 on
>> kubernetes on a GCP machine of type n1-highmem-64.
>> To create the problem we send the server a single request with >500 tasks.
>>
>> This google drive folder has leaf-logs-full.redacted.txt as well as other
>> performance tooling files:
>> <https://drive.google.com/drive/folders/0BypqYg6r4ebqb0lQNnM5eWtISms?usp=sharing>
>>
>> A snippet from that log here shows normal query responses and timing:
>>
>> I0719 22:50:22.467367 leaf.go:363] Worker #5 done search for '[redacted]', took 0.013 seconds
>> I0719 22:50:22.467406 leaf.go:225] Worker #5 starting search for '[redacted]'
>> I0719 22:50:22.467486 leaf.go:363] Worker #6 done search for '[redacted]', took 0.001 seconds
>> I0719 22:50:22.467520 leaf.go:225] Worker #6 starting search for '[redacted]'
>> I0719 22:50:22.468050 leaf.go:363] Worker #9 done search for '[redacted]', took 0.005 seconds
>>
>> We have observed that if a GC cycle happens to start while serving
>> traffic (which is often) AND there is a large amount of time spent in
>> assist, then our serving latency skyrockets by 10x. In the log, the
>> slowdown commences roughly when pacer assist starts at I0719
>> 22:50:31.079283 and then reverts to normal latencies shortly after the gc
>> cycle completes at I0719 22:50:36.806085.
>>
>> Below I copy parts of the log where we see latencies of up to 729ms on
>> tasks. I also bold the line that shows 32929ms spent on alloc gc assist.
>>
>> We have captured an attached cpu profile during this time which seems to
>> confirm a large amount of time spent in runtime.gcAssistAlloc.func1.
>>
>> Pardon our ignorance about GC in golang, but our hypothesis about what
>> may be going wrong is that our large in-memory data structures are causing
>> gc to often go into assist mode, and that for reasons we don’t understand
>> malloc becomes expensive in that mode. Since we also create ~100k new data
>> objects when processing user requests, we are guessing those allocs become
>> very slow.
>> Another piece of evidence for this hypothesis is that we have
>> another (prototype) implementation of this kind of service that makes more
>> use of object pools and doesn’t seem to have as much of a slowdown during GC.
>>
>> Note on large in-memory data structures:
>>
>> The principal data structures can be thought of as:
>>
>> map[uint64][]byte (about 10M map entries, slice lengths between 5K and
>> 50K; around ~10GB total memory usage)
>>
>> map[uint64][]uint64 (about 50M map entries, slice lengths between 10 and
>> 100K in a zipfian distribution; about 3GB total memory usage)
>>
>> These data structures mostly stay as-is over the life of the program.
>>
>> We are trying to figure out how to solve this, so we would appreciate any
>> advice. An engineer on our team wrote up the following ideas, none of which
>> are that great:
>>
>> 1. Figure out a simple way to prevent our goroutines slowing down during
>> GC. I had some hopes LockOSThread() could be made to work, but it didn't
>> seem to help in my experiments. I'm not ruling this solution out entirely,
>> but if it's the write barriers that are the main problem, I don't have much
>> hope.
>>
>> 2. Run at least 2 replicas of all our servers. Manage their GC cycles
>> ourselves, synchronized so that at most one replica is in GC at any given
>> time. The clients should either send all requests to both replicas (and
>> cancel when one replies), or use some more complicated Kubernetes and
>> client logic so a GCing replica is never sent requests. This is the
>> simplest solution that is likely to work, and doesn't require us to change
>> our habits too much. It just costs more. :)
>>
>> 3. Refactor our current servers and program future servers with the
>> explicit goal of reducing GC burden. Give our servers more memory and
>> increase SetGCPercent() so that garbage collection happens less frequently.
>> Use simpler data structures with fewer pointers to reduce the length of
>> time the GC cycle lasts when it does run. This isn't a perfect solution,
>> because the GC will still hit us on occasion, but it reduces the harm.
>>
>> 4. Throw out much of the convenience of Go and write our own
>> allocators/shared pointers that index into a flat byte buffer, making no
>> use of the GC. (Either do this for particular objects, or just write a
>> general solution.) Make GC cycles so infrequent that we can either turn GC
>> off completely without going OOM, or at least run it only once per day,
>> during off hours. This seems very hard, and we'll be fighting with Go a
>> lot.
>>
>> 5. Implement the core serving data structures of our service in C++,
>> paired off with another server written in Go that manages populating that
>> server’s data -- the Go server would manage the data in the C++ server via
>> rpc, and query it over rpc.
>>
>> 6. The nuclear option: decide that Go is simply not a suitable language
>> for our serving stack. Rewrite major servers in C++.
>>
>> Right now #5 is our (terrible but only feasible-seeming) choice.
>>
>> Appreciate any advice!
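For what it's worth, the shape of idea 4 would be roughly the following: a flat byte buffer addressed by integer offsets rather than pointers, so the GC sees a single pointer (the buffer itself) instead of tens of millions of slice headers. The type and method names here are hypothetical, just to make the idea concrete.

```go
// Sketch of idea 4: an offset-addressed "arena" that is invisible to
// the GC mark phase because it contains no interior pointers.
package main

import "fmt"

type arena struct {
	buf []byte
	off int
}

// alloc reserves n bytes and returns the offset of the reservation.
// Index structures store these integer offsets instead of pointers,
// so a 10GB arena adds nothing for the GC to scan.
func (a *arena) alloc(n int) int {
	if a.off+n > len(a.buf) {
		panic("arena exhausted")
	}
	start := a.off
	a.off += n
	return start
}

// bytes returns the slice backing an earlier allocation.
func (a *arena) bytes(off, n int) []byte { return a.buf[off : off+n] }

func main() {
	a := &arena{buf: make([]byte, 1<<20)}
	off := a.alloc(5)
	copy(a.bytes(off, 5), "hello")
	fmt.Printf("offset=%d data=%q\n", off, a.bytes(off, 5))
}
```

The obvious costs are the ones mentioned above: no individual frees without writing a real allocator on top, and every access goes through manual offset bookkeeping.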
>> Steve Baker
>>
>>
>> Logs with gc log lines:
>>
>> pacer: assist ratio=+3.155887e+001 (scan 2669 MB in 10072->10157 MB) workers=2+0
>>
>> I0719 22:50:31.097707 leaf.go:363] Worker #3 done search for '[redacted]', took 0.017 seconds
>> I0719 22:50:31.098036 leaf.go:225] Worker #3 starting search for '[redacted]'
>> I0719 22:50:31.133959 leaf.go:363] Worker #6 done search for '[redacted]', took 0.069 seconds
>> I0719 22:50:31.134300 leaf.go:225] Worker #6 starting search for '[redacted]'
>> I0719 22:50:31.142191 leaf.go:363] Worker #9 done search for '[redacted]', took 0.066 seconds
>> I0719 22:50:31.142275 leaf.go:225] Worker #9 starting search for '[redacted]'
>> I0719 22:50:31.173921 leaf.go:363] Worker #10 done search for '[redacted]', took 0.098 seconds
>> I0719 22:50:31.174009 leaf.go:225] Worker #10 starting search for '[redacted]'
>> I0719 22:50:31.328796 leaf.go:363] Worker #15 done search for '[redacted]', took 0.260 seconds
>> I0719 22:50:31.330194 leaf.go:225] Worker #15 starting search for '[redacted]'
>> I0719 22:50:31.395579 leaf.go:363] Worker #11 done search for '[redacted]', took 0.374 seconds
>>
>> <SNIP>
>>
>> I0719 22:50:36.317004 leaf.go:363] Worker #5 done search for '[redacted]', took 0.729 seconds
>> I0719 22:50:36.317190 leaf.go:225] Worker #5 starting search for '[redacted]'
>> I0719 22:50:36.370191 leaf.go:363] Worker #15 done search for '[redacted]', took 0.192 seconds
>> I0719 22:50:36.371446 leaf.go:225] Worker #15 starting search for '[redacted]'
>> I0719 22:50:36.421953 leaf.go:363] Worker #7 done search for '[redacted]', took 0.116 seconds
>> I0719 22:50:36.422092 leaf.go:225] Worker #7 starting search for '[redacted]'
>> I0719 22:50:36.570008 leaf.go:363] Worker #12 done search for '[redacted]', took 0.778 seconds
>> I0719 22:50:36.572970 leaf.go:225] Worker #12 starting search for '[redacted]'
>> I0719 22:50:36.573571 leaf.go:363] Worker #6 done search for '[redacted]', took 0.710 seconds
>> I0719 22:50:36.573721 leaf.go:225] Worker #6 starting search for '[redacted]'
>>
>> pacer: H_m_prev=8875421544 h_t=+1.900000e-001 H_T=10561751637 h_a=+1.983366e-001 H_a=10635742360 h_g=+2.000000e-001 H_g=10650505852 u_a=+9.735779e-001 u_g=+2.500000e-001 W_a=2354135120 goalΔ=+1.000000e-002 actualΔ=+8.336587e-003 u_a/u_g=+3.894312e+000
>>
>> gc 1072 @4337.737s 39%: 0.37+5688+0.32 ms clock, 2.2+32929/11310/21+1.9 ms cpu, 10072->10143->8661 MB, 10157 MB goal, 8 P