On 2018/11/08 00:13:39, "Matthias J. Sax" <matth...@confluent.io> wrote: 
> That is what I am trying to figure out. I went over the 0.10.2.2 to 0.11.0.3
> Jiras but found nothing I could point out. There are a couple of
> SessionStore related tickets, but none of them should have an effect
> like this.
> 
> To narrow it down, it would be helpful to test with other versions, too.
> Maybe 0.10.2.2 and 0.11.0.0 to see when the issue was introduced.

Done. So far here's what my tests have shown:
With 0.10.2.1 (the current version we're running) and 0.10.2.2, the local cache 
works properly and we see thread profiles similar to what I posted earlier, 
where the majority of time is spent in RocksDB and there's no lag. 

With 0.11.0.0, 0.11.0.3, 1.1.1, 2.0.0 and 2.0.1, we spend the majority of 
time in the local cache and lag considerably:

https://imgur.com/l5VEsC2

> Can you also profile v0.10.2.1 so we can compare?

Here's a recent profile for 0.10.2.1:

https://imgur.com/a/Sto636s

> > What would you recommend for our next steps? 
> 
> Not sure. If you could help us to track down the issue, that would be
> most helpful to get a fix (and you could run from a SNAPSHOT version to
> get the fix -- not sure if this would be an option for you).

Another developer took a look at the code and he had some thoughts:

"It appears we're scanning an order of magnitude more keys for every call to 
`findSessions`. You can see this manifest in the flush logs where version 
0.11.0.3 and later will have a billion hits on the cache in 10 minutes, even 
though the number of events consumed is only 1M. It seems that the fixes made 
to ensure all possible windows for a session merge are found resulted in 
having to scan every entry in the cache."

Is there a way for us to refine the cache search so we're not searching the 
entire key space? 
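
To illustrate the kind of refinement we have in mind, here is a minimal sketch. 
It is not Kafka Streams code and does not reflect how the actual cache is 
implemented: the class `SessionScanSketch`, its composite key layout (record 
key, a NUL separator, then a zero-padded session end time) and both lookup 
methods are hypothetical. It assumes non-negative timestamps and record keys 
that contain no NUL character. It only contrasts filtering every cache entry 
with a range scan bounded to one key:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SessionScanSketch {
    // Hypothetical cached session: record key plus session start/end timestamps.
    static final class Session {
        final String key;
        final long start;
        final long end;
        Session(String key, long start, long end) {
            this.key = key;
            this.start = start;
            this.end = end;
        }
    }

    // Sorted cache, ordered by (record key, session end). Assumed layout, not Kafka's.
    final TreeMap<String, Session> cache = new TreeMap<>();
    long visited = 0; // number of cache entries touched by the last lookups

    // Zero-padding makes lexicographic order match numeric order for end >= 0.
    static String cacheKey(String key, long end) {
        return key + '\u0000' + String.format("%019d", end);
    }

    void put(String key, long start, long end) {
        cache.put(cacheKey(key, end), new Session(key, start, end));
    }

    // Touches every entry in the cache and filters afterwards.
    List<Session> findFullScan(String key, long earliestEnd, long latestStart) {
        List<Session> out = new ArrayList<>();
        for (Session s : cache.values()) {
            visited++;
            if (s.key.equals(key) && s.end >= earliestEnd && s.start <= latestStart) {
                out.add(s);
            }
        }
        return out;
    }

    // Touches only entries for `key` whose end time is >= earliestEnd.
    List<Session> findBounded(String key, long earliestEnd, long latestStart) {
        List<Session> out = new ArrayList<>();
        Map<String, Session> range = cache.subMap(
                cacheKey(key, earliestEnd), true,
                cacheKey(key, Long.MAX_VALUE), true);
        for (Session s : range.values()) {
            visited++;
            if (s.start <= latestStart) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SessionScanSketch c = new SessionScanSketch();
        // 1000 keys with 10 sessions each: 10,000 cache entries in total.
        for (int k = 0; k < 1000; k++) {
            for (long t = 0; t < 10; t++) {
                c.put("key-" + k, t * 100, t * 100 + 50);
            }
        }
        c.visited = 0;
        int fullMatches = c.findFullScan("key-42", 500, 700).size();
        long fullVisited = c.visited;
        c.visited = 0;
        int boundedMatches = c.findBounded("key-42", 500, 700).size();
        long boundedVisited = c.visited;
        System.out.println("full scan: matches=" + fullMatches + " visited=" + fullVisited);
        System.out.println("bounded:   matches=" + boundedMatches + " visited=" + boundedVisited);
    }
}
```

With the toy data in `main`, both lookups return the same three sessions, but 
the full scan visits all 10,000 entries while the bounded scan visits 5. 
Whether the real cache can be bounded this way depends on its actual key 
ordering, which is what we are asking about.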


