[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-06 Thread nickwallen
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/940 +1 The unified topology works great. ---

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-06 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 Ok, README is updated with the new topology diagram. Let me know if there's anything else. ---

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-06 Thread nickwallen
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/940 That's great @cestella . Many thanks. I will run it up in the lab. No problem. ---

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ottobackwards
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/940 Maybe the issue has to do with our keys, and their distribution as the size gets larger? Maybe when we get larger sizes we get more collisions and end up calling equals() more or something.

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ottobackwards
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/940 This should have the equivalent diagram and documentation (I believe as shown above) to the original split-join strategy. ---

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 @nickwallen Ok, I refactored the abstraction to separate some concerns, name things a bit, and collapse some of the more onerous abstractions. Also updated javadocs. Can you give it

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 Ahhh, that makes sense. I bet we were getting killed by small allocations in the caching layer. ---

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Caffeine doesn't allocate on read, so that would make sense. I saw a [25x boost](https://github.com/google/guava/issues/2063#issuecomment-107169736) (compared to

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 I actually suspect GC as well. We adjusted the garbage collector to the G1GC and saw throughput gains, but not nearly the kinds of gains as we got with a drop-in of Caffeine to replace Guava.

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Interesting. Then I guess the size must trigger the read bottleneck as larger than writes. Perhaps it is incurring a lot more GC overhead that causes more collections? The CLQ additions requires

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 In this case, the loader isn't doing anything terribly expensive, though it may in the future (incur a hbase get or some more expensive computation). ---

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Internally Guava uses a `ConcurrentLinkedQueue` and an `AtomicInteger` to record its size, per segment. When a read occurs, it records that in the queue and then drains it under the segment's lock
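
The read path described in this comment can be pictured with a small sketch. This is not Guava's source: the class name `ReadBufferSketch`, the drain threshold of 64, and the `LinkedHashSet` standing in for the segment's access-ordered entry list are all assumptions made for illustration. It only shows the shape of the idea, namely that every read appends to a `ConcurrentLinkedQueue`, bumps an `AtomicInteger`, and occasionally drains the queue under a lock.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative per-segment read buffer; hypothetical, not taken from Guava. */
final class ReadBufferSketch<K> {
  /** Assumed threshold for illustration only. */
  static final int DRAIN_THRESHOLD = 64;

  private final ConcurrentLinkedQueue<K> recencyQueue = new ConcurrentLinkedQueue<>();
  private final AtomicInteger pendingReads = new AtomicInteger();
  private final ReentrantLock segmentLock = new ReentrantLock();
  /** Stand-in for the access-ordered entry list; guarded by segmentLock. */
  private final LinkedHashSet<K> accessOrder = new LinkedHashSet<>();

  /** Called on every cache hit: lock-free, but allocates one queue node per read. */
  void recordRead(K key) {
    recencyQueue.add(key);
    if (pendingReads.incrementAndGet() >= DRAIN_THRESHOLD) {
      tryDrain();
    }
  }

  /** Replays buffered reads against the access order if the lock is free. */
  void tryDrain() {
    if (!segmentLock.tryLock()) {
      return; // another thread is already draining
    }
    try {
      K key;
      while ((key = recencyQueue.poll()) != null) {
        pendingReads.decrementAndGet();
        accessOrder.remove(key); // move the key to the most-recently-used end
        accessOrder.add(key);
      }
    } finally {
      segmentLock.unlock();
    }
  }

  int pendingReadCount() {
    return pendingReads.get();
  }

  /** Returns keys from least to most recently used. */
  List<K> accessOrderSnapshot() {
    segmentLock.lock();
    try {
      return new ArrayList<>(accessOrder);
    } finally {
      segmentLock.unlock();
    }
  }
}
```

Under this shape, a read-heavy workload produces one short-lived queue node per hit, which is consistent with the GC pressure suspected elsewhere in the thread.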

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 We actually did increase the concurrency level for guava to 64; that is what confused us as well. The hash code is mostly standard, should be evenly distributed (the key is pretty much a POJO).

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Guava defaults to a `concurrencyLevel` of 4, given its age and a desire to not abuse memory in low concurrent situations. You probably want to increase it to 64 in a heavy workload, which has a

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 The interesting thing that we found was that guava seems to be doing poorly when the # of items in the cache gets large. When we scaled the test down (830 distinct IP addresses chosen randomly and

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 That makes sense. A uniform distribution will, of course, degrade all policies to random replacement so the test is then about how well the implementations handle concurrency. Most often caches

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 We were being purposefully unkind to the cache in the tests. The load simulation chose an IP address at random to present, so each IP had an equal probability of being selected. Whereas, in real
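
To see why a uniformly random key stream is unkind to a cache, here is a minimal sketch using a plain `LinkedHashMap`-based LRU (neither Guava nor Caffeine). The key counts, the cache size, and the 80/10 "hot key" skew are assumptions chosen for illustration; they are not the parameters of the lab test described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Compares LRU hit rates for uniform and skewed key streams; illustrative only. */
final class CacheHitRateSketch {

  /** Tiny access-ordered LRU built on LinkedHashMap. */
  static final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
      super(16, 0.75f, true); // accessOrder = true keeps least recently used first
      this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
      return size() > capacity;
    }
  }

  /** Replays `requests` lookups and returns the fraction served from the cache. */
  static double hitRate(int distinctKeys, int capacity, int requests, boolean skewed, long seed) {
    Random random = new Random(seed);
    LruCache<Integer, Boolean> cache = new LruCache<>(capacity);
    int hits = 0;
    for (int i = 0; i < requests; i++) {
      int key;
      if (skewed && random.nextDouble() < 0.8) {
        // Assumed skew: 80% of requests go to the hottest 10% of keys.
        key = random.nextInt(Math.max(1, distinctKeys / 10));
      } else {
        key = random.nextInt(distinctKeys); // every key equally likely
      }
      if (cache.get(key) != null) {
        hits++;
      } else {
        cache.put(key, Boolean.TRUE); // simulate loading the missing value
      }
    }
    return (double) hits / requests;
  }

  public static void main(String[] args) {
    System.out.println("uniform: " + hitRate(10_000, 1_000, 200_000, false, 42L));
    System.out.println("skewed:  " + hitRate(10_000, 1_000, 200_000, true, 42L));
  }
}
```

With uniform keys the hit rate stays close to the ratio of cache size to distinct keys no matter which eviction policy is used, so a benchmark built that way mostly measures how the cache behaves under concurrency rather than how well its policy predicts reuse.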

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Do you know what the hit rates were, for the same data set, between Guava and Caffeine? The caches use different policies so it is always interesting to see how they handle given workloads. As we

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread nickwallen
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/940 I completed some fairly extensive performance testing comparing this new Unified topology against the existing Split-Join implementation. The difference was dramatic. - The Unified

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 I ran this up with vagrant and ensured: * Normal stellar works still in field transformations as well as enrichments * swapped in and out new enrichments live * swapped in and out new

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-02 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 Just FYI, as part of the performance experimentation in the lab here, we found that one major impediment to scale was the guava cache in this topology when the size of the cache becomes non-trivial

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-02 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 @arunmahadevan Thanks for chiming in Arun. I would say that most of the enrichment work is I/O bound and we try to avoid it whenever possible with a time-evicted LRU cache in front of the
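
A time-evicted LRU cache in front of an expensive lookup can be sketched as follows. This is a single-lock illustration built on `LinkedHashMap`, not Metron's actual cache (the thread indicates that is Guava, later swapped for Caffeine). The class name `ExpiringLruCacheSketch`, the injected clock, and the loader function are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Size-bounded LRU cache whose entries also expire after a fixed time; illustrative only. */
final class ExpiringLruCacheSketch<K, V> {

  /** A cached value plus the time it was loaded. */
  private static final class Timed<V> {
    final V value;
    final long loadedAtMillis;

    Timed(V value, long loadedAtMillis) {
      this.value = value;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final long ttlMillis;
  private final Function<K, V> loader; // stands in for the I/O-bound enrichment lookup
  private final LongSupplier clock;    // injected so expiry can be tested without sleeping
  private final LinkedHashMap<K, Timed<V>> entries;
  private int loadCount;

  ExpiringLruCacheSketch(int maxSize, long ttlMillis, Function<K, V> loader, LongSupplier clock) {
    this.ttlMillis = ttlMillis;
    this.loader = loader;
    this.clock = clock;
    // accessOrder = true: iteration order runs from least to most recently used.
    this.entries = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
        return size() > maxSize;
      }
    };
  }

  /** Returns the cached value, invoking the loader on a miss or after expiry. */
  synchronized V get(K key) {
    long now = clock.getAsLong();
    Timed<V> cached = entries.get(key);
    if (cached != null && now - cached.loadedAtMillis < ttlMillis) {
      return cached.value; // fresh hit: no I/O needed
    }
    V value = loader.apply(key);
    loadCount++;
    entries.put(key, new Timed<>(value, now));
    return value;
  }

  /** Number of times the loader has been invoked. */
  synchronized int loadCount() {
    return loadCount;
  }
}
```

The single `synchronized` lock keeps the sketch short; it is exactly the kind of contention point that a concurrent cache library is designed to avoid.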

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-02 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/metron/pull/940 Managing threadpools within a bolt isn't fundamentally wrong, we have seen some use cases where this is done. However, we have been putting efforts to reduce the overall number of threads
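
For readers unfamiliar with the pattern under discussion, the following is a rough sketch of running several enrichments for one message on a thread pool owned by a single component and merging the results. It uses only `java.util.concurrent` and no Storm APIs. The class name, the enrichment function type, and the field names are hypothetical, and this is not the code in the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Fans enrichments out to a thread pool and joins the results in the caller; illustrative only. */
final class InBoltThreadPoolSketch {

  /**
   * Applies every enrichment to the message in parallel and merges the returned fields.
   * Enrichments must treat the message as read-only because they run concurrently.
   */
  static Map<String, Object> enrich(
      Map<String, Object> message,
      List<Function<Map<String, Object>, Map<String, Object>>> enrichments,
      ExecutorService pool) {
    List<Callable<Map<String, Object>>> tasks = new ArrayList<>();
    for (Function<Map<String, Object>, Map<String, Object>> enrichment : enrichments) {
      tasks.add(() -> enrichment.apply(message));
    }
    Map<String, Object> joined = new LinkedHashMap<>(message);
    try {
      // invokeAll blocks until every task has completed, which plays the role of the join.
      for (Future<Map<String, Object>> result : pool.invokeAll(tasks)) {
        joined.putAll(result.get());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while enriching", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("enrichment failed", e.getCause());
    }
    return joined;
  }
}
```

The pool size becomes a tuning knob that the surrounding framework cannot see, which is the tradeoff raised in this thread.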

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-02 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 @ottobackwards I haven't sent an email to the storm team, but I did run the PR past a storm committer that I know and asked his opinion prior to submitting the PR. The general answer was something

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-02 Thread ottobackwards
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/940 have we thought to send a mail to the storm dev list and ask if anyone has done this? potential issues? ---

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-02 Thread ottobackwards
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/940 If we integrated storm with yarn this would also be a problem, as our resource management may be at odds with yarn's. I think? What would be nice is if storm could manage the pool and

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-02 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 @mraliagha It's definitely a tradeoff. This is why this is offered as a complement to the original split/join topology. Keep in mind, also, that this architecture enables use-cases that the other would

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-02-28 Thread mraliagha
Github user mraliagha commented on the issue: https://github.com/apache/metron/pull/940 @cestella Thanks, Casey. Wouldn't it still be hard to tune this solution? Still, thread pool tuning and probably the race condition between these threads and normal Storm workers makes the tuning

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-02-28 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 @nickwallen Sounds good. When scale tests are done, can we make sure that we also include #944 ? ---

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-02-27 Thread nickwallen
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/940 I'd hold on merging this until we can get this tested at some decent scale. Unless it already has been? Otherwise, I don't see a need to merge this until we know it actually addresses a

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-02-26 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/940 I tested this in full dev and worked as expected. +1 ---

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-02-23 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 The current architecture is described ![Image of Yaktocat](https://github.com/apache/metron/raw/master/metron-platform/metron-enrichment/enrichment_arch.png) In short, for each message each

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-02-22 Thread mraliagha
Github user mraliagha commented on the issue: https://github.com/apache/metron/pull/940 Is there any document somewhere to show how the previous approach was implemented? I would like to understand the previous architecture in detail. Because some of the pros/cons didn't make sense