[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
Caffeine doesn't allocate on read, so that would make sense. I saw a [25x 
boost](https://github.com/google/guava/issues/2063#issuecomment-107169736) 
(compared to 
[current](https://github.com/google/guava/issues/2063#issue-82444927)) when 
porting the buffers to Guava.
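As an illustration of what "doesn't allocate on read" can mean, here is a minimal sketch of a fixed-size ring buffer for recording reads. It is loosely based on the general idea (pre-allocated slots, lossy when full) and is not Caffeine's actual code; the class name `RingReadBuffer` is hypothetical.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not Caffeine's code): slots are allocated once up front,
 * so recording a read never creates an object, unlike
 * ConcurrentLinkedQueue.offer, which allocates a node per call.
 */
final class RingReadBuffer<E> {
  private final AtomicReferenceArray<E> slots;
  private final int mask;
  private final AtomicLong writeCounter = new AtomicLong();
  private final AtomicLong readCounter = new AtomicLong();

  RingReadBuffer(int capacity) {
    if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
      throw new IllegalArgumentException("capacity must be a power of two");
    }
    slots = new AtomicReferenceArray<>(capacity);
    mask = capacity - 1;
  }

  /** Records a read; returns false (dropping the event) if full or contended. */
  boolean offer(E e) {
    Objects.requireNonNull(e); // null marks an empty slot
    long head = readCounter.get();
    long tail = writeCounter.get();
    if (tail - head >= slots.length()) {
      return false; // full: lossy instead of blocking or growing
    }
    if (writeCounter.compareAndSet(tail, tail + 1)) {
      slots.lazySet((int) (tail & mask), e); // publish into a pre-allocated slot
      return true;
    }
    return false; // lost a race with another writer: drop this read
  }

  /** Replays buffered reads; assumes one draining thread at a time. */
  void drainTo(Consumer<? super E> consumer) {
    long head = readCounter.get();
    long tail = writeCounter.get();
    while (head != tail) {
      int index = (int) (head & mask);
      E e = slots.get(index);
      if (e == null) {
        break; // slot claimed but not yet published; pick it up next time
      }
      slots.lazySet(index, null);
      consumer.accept(e);
      head++;
    }
    readCounter.lazySet(head);
  }
}
```

Dropping a read only loses a hint for the eviction policy, not data, which is why being lossy on this path is an acceptable trade for bounded memory and no per-read garbage.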


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
Interesting. Then I guess the larger size must make the read bottleneck 
outweigh the cost of writes. Perhaps it is incurring a lot more GC overhead, 
which causes more collections? Each CLQ addition requires allocating a new queue 
node. That node and the cache entry probably get promoted to old gen due to the 
high churn rate, causing everything to slow down. It probably isn't too 
interesting to investigate vs. just swapping libraries :)


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
Internally Guava uses a `ConcurrentLinkedQueue` and an `AtomicInteger` to 
record its size, per segment. When a read occurs, it records that in the queue 
and then drains it under the segment's lock (via tryLock) to replay the events. 
This is similar to Caffeine, which uses optimized structures instead. I 
intended the CLQ & counter as baseline scaffolding for replacement, as it is an 
obvious bottleneck, but I could never get it replaced despite advocating for 
it. The penalty of draining the buffers is amortized, but unfortunately this 
buffer isn't capped.

Since there would be a higher hit rate with a larger cache, the reads would 
be recorded more often. Perhaps contention there and the penalty of draining 
the queue are more observable than a cache miss. That's still surprising, since 
a cache miss usually means expensive I/O. Is the loader doing expensive work in 
your case?

Caffeine gets around this problem by using better-optimized buffers and being 
lossy (on reads only) if it can't keep up. By default it delegates the 
amortized maintenance work to a ForkJoinPool to avoid user-facing latencies, 
since you'll want those variances to be tight. Much of that could be backported 
to Guava for a nice boost.
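One way to hand the maintenance work to a pool without flooding it with tasks is to coalesce drain requests behind a small state machine, so at most one task is pending or running at a time. A minimal sketch, assuming a three-state flag; the class and state names are hypothetical and this is not Caffeine's actual implementation. It defaults to `ForkJoinPool.commonPool()`.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: run maintenance off the caller's thread, coalescing requests. */
final class MaintenanceScheduler {
  private static final int IDLE = 0;
  private static final int REQUIRED = 1;   // a task is pending, or must run again
  private static final int PROCESSING = 2; // a task is running right now

  private final AtomicInteger status = new AtomicInteger(IDLE);
  private final Executor executor;
  private final Runnable maintenance;

  MaintenanceScheduler(Runnable maintenance) {
    this(ForkJoinPool.commonPool(), maintenance);
  }

  MaintenanceScheduler(Executor executor, Runnable maintenance) {
    this.executor = executor;
    this.maintenance = maintenance;
  }

  /** Cheap, non-blocking; e.g. called when a read buffer fills up. */
  void requestDrain() {
    for (;;) {
      int s = status.get();
      if (s == IDLE) {
        if (status.compareAndSet(IDLE, REQUIRED)) {
          try {
            executor.execute(this::runMaintenance);
          } catch (RejectedExecutionException e) {
            runMaintenance(); // fall back to the caller if the pool refuses
          }
          return;
        }
      } else if (s == REQUIRED) {
        return; // a pending or running task will see this request
      } else if (status.compareAndSet(PROCESSING, REQUIRED)) {
        return; // the running task will loop once more
      }
    }
  }

  private void runMaintenance() {
    do {
      status.set(PROCESSING);
      maintenance.run();
    } while (!status.compareAndSet(PROCESSING, IDLE));
  }
}
```

Readers only pay for a few atomic operations; the replay itself happens on a pool thread, which keeps the tail of user-facing latency tight.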


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
Guava defaults to a `concurrencyLevel` of 4, given its age and a desire to 
not waste memory in low-concurrency situations. You probably want to increase 
it to 64 under a heavy workload, which gives a ~4x throughput gain on reads. It 
won't scale much higher, since it has internal bottlenecks and I could never get 
patches reviewed to fix those.

I've only noticed overall throughput depend on the number of threads, and 
never realized there was a capacity constraint on its performance. One should 
expect some, since the older hash table design results in more collisions, 
whereas CHMv8 does much better there. Still, I would have expected it to even 
out enough, unless you have a bad hash code?
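A small sketch of why the hash code matters: power-of-two tables index by the low bits, so keys whose hashes differ only in high bits collide unless the table mixes the bits first. Java 8's `ConcurrentHashMap` (CHMv8) does this in its private `spread` method as `(h ^ (h >>> 16)) & HASH_BITS`; the helper below mirrors just the XOR-shift part. A truly bad `hashCode` (e.g. a constant) cannot be rescued by any mixing. The class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical demo: bucket spread with and without Java 8-style bit mixing. */
final class BucketSpread {
  /** Folds the high 16 bits into the low 16 bits before masking. */
  static int spread(int h) {
    return h ^ (h >>> 16);
  }

  /** Counts distinct buckets hit; tableSize must be a power of two. */
  static int distinctBuckets(int[] hashes, int tableSize, boolean spreadHashes) {
    Set<Integer> buckets = new HashSet<>();
    for (int h : hashes) {
      int mixed = spreadHashes ? spread(h) : h;
      buckets.add(mixed & (tableSize - 1)); // index from the low bits only
    }
    return buckets.size();
  }
}
```

For example, 16 keys with hashes `i << 16` all land in bucket 0 of a 16-slot table without mixing, but in 16 distinct buckets with it; 16 keys that all hash to the same constant land in one bucket either way.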


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
That makes sense. A uniform distribution will, of course, degrade all 
policies to random replacement so the test is then about how well the 
implementations handle concurrency. Most often caches exhibit a Zipfian 
distribution (80-20 rule), so our bias towards frequency is a net gain. We have 
observed a few rare cases where frequency is a poor signal and LRU is optimal, 
and we are exploring adaptive techniques to dynamically tune the cache based on 
the workload's characteristics. These cases don't seem to occur in many 
real-world scenarios that we know of, but it is always nice to know what users 
are experiencing and how much better (or worse) we perform than a standard LRU 
cache.
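A quick simulation makes the uniform-vs-Zipfian point concrete. With N equally likely keys and room for C of them, any policy hits about C/N of the time, so there is nothing for recency or frequency to exploit; under a Zipf distribution the popular keys dominate and even plain LRU does far better. This sketch uses an LRU built on `LinkedHashMap`; the class name, sizes, and seed are arbitrary choices for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.IntSupplier;

/** Hypothetical demo: LRU hit rate under uniform vs. Zipf-distributed keys. */
final class DistributionDemo {
  static IntSupplier uniform(int n, Random rnd) {
    return () -> rnd.nextInt(n);
  }

  /** Zipf(s) over ranks 0..n-1 via inverse-CDF sampling; rank 0 is most popular. */
  static IntSupplier zipf(int n, double s, Random rnd) {
    double[] cdf = new double[n];
    double sum = 0;
    for (int k = 1; k <= n; k++) {
      sum += 1.0 / Math.pow(k, s);
      cdf[k - 1] = sum;
    }
    for (int i = 0; i < n; i++) {
      cdf[i] /= sum; // normalize so cdf[n - 1] == 1.0
    }
    return () -> {
      int idx = Arrays.binarySearch(cdf, rnd.nextDouble());
      int rank = idx >= 0 ? idx : -idx - 1; // first index with cdf >= u
      return Math.min(rank, n - 1);
    };
  }

  static double lruHitRate(int capacity, int accesses, IntSupplier keys) {
    Map<Integer, Boolean> lru = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
        return size() > capacity; // evict least-recently-used
      }
    };
    long hits = 0;
    for (int i = 0; i < accesses; i++) {
      int key = keys.getAsInt();
      if (lru.get(key) != null) {
        hits++;
      } else {
        lru.put(key, Boolean.TRUE);
      }
    }
    return (double) hits / accesses;
  }
}
```

With 1,000 keys and a 100-entry cache, the uniform run lands near 10%, which any replacement policy would match; the gap under Zipf is where a frequency bias like W-TinyLFU's can pay off over LRU.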


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
Do you know what the hit rates were, for the same data set, between Guava 
and Caffeine? The caches use different policies, so it is always interesting to 
see how they handle given workloads. As we continue to refine our adaptive 
algorithm W-TinyLFU, it's handy to know what types of workloads to investigate. 
(P.S. We have a simulator for re-running persisted traces if useful for your 
tuning)
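For live caches, both Guava and Caffeine report hit rates when built with `recordStats()`, via `cache.stats().hitRate()`. For comparing policies offline on the same data set, a trace can be replayed through each candidate. A minimal sketch, assuming the trace is simply a list of keys; it compares LRU and FIFO (both via `LinkedHashMap`) and is not the simulator mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: replay one trace through different policies. */
final class TraceReplay {
  /** Bounded LinkedHashMap: accessOrder=true gives LRU, false gives FIFO. */
  static final class BoundedPolicy {
    private final Map<String, Boolean> entries;

    BoundedPolicy(int capacity, boolean accessOrder) {
      entries = new LinkedHashMap<String, Boolean>(16, 0.75f, accessOrder) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
          return size() > capacity;
        }
      };
    }

    /** Returns true on a hit; on a miss, admits the key (possibly evicting). */
    boolean access(String key) {
      if (entries.get(key) != null) {
        return true;
      }
      entries.put(key, Boolean.TRUE);
      return false;
    }
  }

  static double hitRate(List<String> trace, BoundedPolicy policy) {
    long hits = 0;
    for (String key : trace) {
      if (policy.access(key)) {
        hits++;
      }
    }
    return trace.isEmpty() ? 0.0 : (double) hits / trace.size();
  }
}
```

On the toy trace `a b a c a d a e` with room for two entries, LRU keeps the hot `a` and hits 3 of 8 times, while FIFO evicts it and hits only 2 of 8: the same data, different policy, different hit rate.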


---