Re: [akka-user][deprecated] Re: [akka-user] Spray->Akka-Http Migration - seeing high 99th percentile latencies post-migration
That the entity directive is part of the picture could be a hint that streaming requests are indeed the cause of this. In spray, request streaming was not enabled by default: the engine collected the complete stream into a buffer and dispatched it to the app only after everything had been received. This has changed in akka-http, where streaming is on by default whenever the complete request wasn't received in one go from the network. The streaming case is actually more likely to happen on low-traffic servers with a real network, where network packets are not aggregated at lower levels but are processed immediately as they are received.

The question is still whether the 200ms is really added latency in akka-http or just an artifact of how request processing time is measured. There's definitely *some* overhead to processing a request in streaming fashion, but it's not 200ms. I haven't checked seriously, but it seems that Kamon might be measuring something other than what you think it is in akka-http: it seems to start measuring from when the request is dispatched to your app, but at that point the request body might not have been received fully. That means that whenever the HTTP client is slow to send a request, for whatever reason, it will show up in your request processing times.

Johannes

On Mon, Oct 8, 2018 at 10:42 PM Gary Malouf wrote:

> We ultimately decided to roll out despite this glitch. Not happy about it,
> and hoping whatever is causing this gets resolved in a future release. My
> hunch is that it's a fixed price being paid that, if thousands more
> requests/second were sent to the app, would be unnoticeable.
>
> On Sun, Oct 7, 2018 at 11:18 AM Avshalom Manevich wrote:
>
>> Hi Gary,
>>
>> Did you end up finding a solution to this?
>>
>> We're hitting a similar issue with Akka HTTP (10.0.11) and a low-load
>> server.
>>
>> Average latency is great but 99th percentile is horrible (~200ms).
>>
>> Appreciate your input.
>>
>> Regards,
>> Avshalom
>>
>>> I wonder if you could start a timer when you enter the trace block and
>>> then e.g. after 200ms trigger one or multiple stack dumps (using JMX or
>>> just by printing out the result of `Thread.getAllStackTraces`). It's not
>>> super likely that something will turn up, but it seems like a simple
>>> enough thing to try.
>>>
>>> Johannes
>>>
>>> On Thursday, November 16, 2017 at 1:28:23 PM UTC+1, Gary Malouf wrote:
>>>
>>> Hi Johannes,
>>>
>>> Yes; we are seeing 2-3 requests/second (only in production) with the
>>> latency spikes. We found no correlation between the GC times and these
>>> request latencies, nor between the size/type of requests.
>>>
>>> We had to pause the migration effort for 2 weeks because of the time
>>> being taken, but just jumped back on it the other day.
>>>
>>> Our current strategy is to implement this with the low-level API to see
>>> if we get the same results.
>>>
>>> Gary
>>>
>>> On Nov 16, 2017 6:57 AM, wrote:
>>>
>>> Hi Gary,
>>>
>>> Did you find out what's going on by now? If I understand correctly, you
>>> get latency spikes as soon as you use the `entity(as[String])` directive?
>>> Could you narrow down whether there's anything special about those
>>> requests? I guess you monitor your GC times?
>>>
>>> Johannes
>>>
>>> On Wednesday, November 1, 2017 at 8:56:50 PM UTC+1, Gary Malouf wrote:
>>>
>>> The only way I was able to successfully identify the suspicious code was
>>> to route a percentage of my production traffic to a stubbed route that I
>>> incrementally added pieces of our implementation back into. What I found
>>> was that we started getting spikes when the entity(as[CaseClassFromJson])
>>> stub was added back in. To figure out whether it was the JSON parsing or
>>> the 'POST' entity consumption itself, I replaced that class with a
>>> String; it turns out we experience the latency spikes with that as well
>>> (on low traffic, as noted earlier in this thread).
>>>
>>> I by no means have a deep understanding of streams, but it makes me
>>> wonder if the way our code consumes the entity is not correct.
>
> On Monday, October 30, 2017 at 4:27:13 PM UTC-4, Gary Malouf wrote:
>
>> Hi Roland - thank you for the tip. We shrunk the thread pool size down
>> to 1, but were disheartened to still see the latency spikes. Using
>> Kamon's tracing library (which we validated with various tests to ensure
>> its own numbers are most likely correct), we could not find anything in
>> our code within the route that was causing the latency (it all appeared
>> to be attributed to that route, but to no code segment within it).
>>
>> As mentioned earlier, running loads of 100-1000 requests/second
>> completely hides the issue (save for the max latency), as everything
>> through the 99th percentile is under a few milliseconds.
>>
>> On Tuesday, October
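The stack-dump idea quoted above (start a timer on entering the trace block and dump all thread stacks if the request is still in flight after 200ms) could be sketched roughly as follows. This is a minimal standalone sketch; `withSlowRequestDump` is a hypothetical helper name, not part of Kamon or akka-http, and the threshold value is illustrative:

```scala
import java.util.{Timer, TimerTask}
import scala.jdk.CollectionConverters._

object SlowRequestWatchdog {
  // Runs `body`, and if it has not finished within `thresholdMillis`,
  // prints a stack dump of all live threads (via Thread.getAllStackTraces)
  // so you can see where time is being spent in slow requests.
  def withSlowRequestDump[T](thresholdMillis: Long)(body: => T): T = {
    val timer = new Timer("slow-request-watchdog", /* isDaemon = */ true)
    val task = new TimerTask {
      def run(): Unit = {
        val dump = Thread.getAllStackTraces.asScala
          .map { case (thread, frames) =>
            s"Thread ${thread.getName}:\n" +
              frames.map("  at " + _).mkString("\n")
          }
          .mkString("\n\n")
        println(s"Request exceeded ${thresholdMillis}ms; thread stacks:\n$dump")
      }
    }
    timer.schedule(task, thresholdMillis)
    try body
    finally { task.cancel(); timer.cancel() }
  }
}
```

One would wrap the body of the traced route in `withSlowRequestDump(200) { ... }`; fast requests cancel the timer before it fires, so only the slow outliers produce a dump.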
> On Tuesday, October 24, 2017 at 2:23:07 AM UTC-4, rkuhn wrote:
>
>> You could try to decrease your thread pool size to 1 to exclude
>> wakeup latencies when things (like CPU cores) have gone to sleep.
>>
>> Regards, Roland
>>
>> Sent from my iPhone
>>
>> On 23. Oct 2017, at 22:49, Gary Malouf wrote:
>>
>> Yes, it gets parsed using entity(as[]) with spray-json support.
>> Under a load test of, say, 1000 requests/second these latencies are not
>> visible in the percentiles; they are easy to see because this web server
>> is currently getting 10-20 requests/second. Trying to brainstorm whether
>> a dispatcher needs to be tuned or something of that sort, but have yet
>> to see evidence supporting that.
>>
>> path("foos") {
>>   traceName("FooSelection") {
>>     entity(as[ExternalPageRequest]) { pr =>
>>       val spr = toSelectionPageRequest(pr)
>>       shouldTracePageId(spr.pageId).fold(
>>         Tracer.currentContext.withNewSegment(s"Page-${pr.pageId}", "PageTrace", "kamon") {
>>           processPageRequestAndComplete(pr, spr)
>>         },
>>         processPageRequestAndComplete(pr, spr)
>>       )
>>     }
>>   }
>> }
>>
>> On Mon, Oct 23, 2017 at 4:42 PM, Viktor Klang wrote:
>>
>>> And you consume the entityBytes I presume?
>>>
>>> On Mon, Oct 23, 2017 at 10:35 PM, Gary Malouf wrote:
>>>
>>> It is from when I start the Kamon trace (just inside of my
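If Johannes's theory is right, that the trace starts when the request is dispatched but before the body has fully arrived, one way to test it is to buffer the complete request entity before the traced route runs, restoring spray-like behavior. A sketch of how the route above might be wrapped, assuming akka-http's `toStrictEntity` directive (available in the 10.0.x line) and the thread's own `ExternalPageRequest` type and handlers; the 3-second timeout is illustrative, not a recommendation:

```scala
import scala.concurrent.duration._
import akka.http.scaladsl.server.Directives._

// Collect the full request body into memory before entering the inner route,
// so the Kamon trace only starts once the entity has completely arrived.
// Requests whose body takes longer than the timeout to arrive are failed.
val route =
  toStrictEntity(3.seconds) {
    path("foos") {
      traceName("FooSelection") {
        entity(as[ExternalPageRequest]) { pr =>
          // ... existing processing, as in the snippet above
          processPageRequestAndComplete(pr, toSelectionPageRequest(pr))
        }
      }
    }
  }
```

If the 99th-percentile latencies disappear under this change, that would point at slow body delivery (or the measurement artifact) rather than at akka-http's request processing itself, at the cost of losing streaming and buffering every entity in memory.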