Github user nickwallen commented on the issue:
https://github.com/apache/metron/pull/940
+1 The unified topology works great.
---
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
Ok, README is updated with the new topology diagram. Let me know if
there's anything else.
---
Github user nickwallen commented on the issue:
https://github.com/apache/metron/pull/940
That's great @cestella. Many thanks. I will run it up in the lab. No
problem.
---
Github user ottobackwards commented on the issue:
https://github.com/apache/metron/pull/940
Maybe the issue has to do with our keys and their distribution as the size
gets larger? Maybe at larger sizes we get more collisions and end up
calling equals() more often, or something.
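A minimal sketch of the concern (a hypothetical key class, not the actual Metron enrichment key): if `hashCode()` covers the key poorly, entries pile into a few buckets as the cache grows and lookups fall back to long `equals()` chains.

```java
import java.util.Objects;

// Hypothetical cache key standing in for the enrichment key POJO discussed above.
public final class EnrichmentKey {
  private final String ip;
  private final String enrichmentType;

  public EnrichmentKey(String ip, String enrichmentType) {
    this.ip = ip;
    this.enrichmentType = enrichmentType;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof EnrichmentKey)) return false;
    EnrichmentKey that = (EnrichmentKey) o;
    return ip.equals(that.ip) && enrichmentType.equals(that.enrichmentType);
  }

  // If this only hashed enrichmentType, every IP of a given type would share a bucket
  // and every lookup would degrade into a linear scan of equals() calls.
  @Override
  public int hashCode() {
    return Objects.hash(ip, enrichmentType);
  }
}
```
---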
Github user ottobackwards commented on the issue:
https://github.com/apache/metron/pull/940
This should have the equivalent diagram and documentation (I believe as shown
above) to the original split-join strategy.
---
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
@nickwallen Ok, I refactored the abstraction to separate some concerns,
rename a few things, and collapse some of the more onerous abstractions. Also
updated the javadocs.
Can you give it
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
Ahhh, that makes sense. I bet we were getting killed by small allocations
in the caching layer.
---
Github user ben-manes commented on the issue:
https://github.com/apache/metron/pull/940
Caffeine doesn't allocate on read, so that would make sense. I saw a [25x
boost](https://github.com/google/guava/issues/2063#issuecomment-107169736)
(compared to
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
I actually suspect GC as well. We switched the garbage collector to
G1GC and saw throughput gains, but not nearly the kind of gains we got with
a drop-in of Caffeine to replace Guava.
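For context, switching the worker JVMs to G1 is a one-line topology setting; a minimal sketch using Storm's `Config` (the pause target here is illustrative, not the value used in the lab run):

```java
import org.apache.storm.Config;

public class G1GcWorkerOpts {
  public static Config workerChildOpts() {
    Config conf = new Config();
    // Launch this topology's worker JVMs with G1GC; tune the pause target for the workload.
    conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-XX:+UseG1GC -XX:MaxGCPauseMillis=100");
    return conf;
  }
}
```
---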
Github user ben-manes commented on the issue:
https://github.com/apache/metron/pull/940
Interesting. Then I guess at that size the read path must become a bigger
bottleneck than the writes. Perhaps it is incurring a lot more GC overhead that causes
more collections? The CLQ additions require
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
In this case, the loader isn't doing anything terribly expensive, though it
may in the future (incur an HBase get or some more expensive computation).
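A rough sketch of what a more expensive loader could look like, assuming a Guava `CacheLoader` front and a hypothetical `enrichment` HBase table (names are illustrative, not the actual Metron loader):

```java
import com.google.common.cache.CacheLoader;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical loader that fetches an enrichment row from HBase on a cache miss.
public class HBaseEnrichmentLoader extends CacheLoader<String, Result> {
  private final Connection connection;

  public HBaseEnrichmentLoader(Connection connection) {
    this.connection = connection;
  }

  @Override
  public Result load(String rowKey) throws Exception {
    try (Table table = connection.getTable(TableName.valueOf("enrichment"))) {
      // One Get per miss; the cache exists to amortize exactly this round trip.
      return table.get(new Get(Bytes.toBytes(rowKey)));
    }
  }
}
```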
---
Github user ben-manes commented on the issue:
https://github.com/apache/metron/pull/940
Internally Guava uses a `ConcurrentLinkedQueue` and an `AtomicInteger` to
record its size, per segment. When a read occurs, it records that in the queue
and then drains it under the segment's lock
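A much-simplified model of that mechanism (just the shape of it, not Guava's actual code): each segment buffers read notifications in a `ConcurrentLinkedQueue` and drains them under the segment lock, which means every read allocates a queue node.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Simplified, illustrative segment: reads enqueue a recency event, and the queue is
// drained while holding the segment's lock to update the access order.
class Segment<K> {
  private final ConcurrentLinkedQueue<K> recencyQueue = new ConcurrentLinkedQueue<>();
  private final AtomicInteger readCount = new AtomicInteger();
  private final ReentrantLock lock = new ReentrantLock();

  void recordRead(K key) {
    recencyQueue.add(key);                    // allocates a queue node on every read
    if (readCount.incrementAndGet() > 64) {   // drain threshold is illustrative
      tryDrain();
    }
  }

  private void tryDrain() {
    if (lock.tryLock()) {
      try {
        K key;
        while ((key = recencyQueue.poll()) != null) {
          // move 'key' to the tail of the access-ordered list (omitted)
        }
        readCount.set(0);
      } finally {
        lock.unlock();
      }
    }
  }
}
```
---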
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
We actually did increase the concurrency level for Guava to 64; that is
what confused us as well. The hash code is mostly standard and should be evenly
distributed (the key is pretty much a POJO).
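For readers following along, the two builders side by side (sizes, expiry, and the loader are placeholders, not the values from the lab test):

```java
import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.github.benmanes.caffeine.cache.Caffeine;

public class EnrichmentCaches {
  // Guava, with concurrencyLevel raised from its default of 4 to 64.
  static com.google.common.cache.LoadingCache<String, String> guava(CacheLoader<String, String> loader) {
    return CacheBuilder.newBuilder()
        .concurrencyLevel(64)
        .maximumSize(100_000)
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .build(loader);
  }

  // Caffeine drop-in with equivalent settings; there is no concurrencyLevel knob to tune.
  static com.github.benmanes.caffeine.cache.LoadingCache<String, String> caffeine() {
    return Caffeine.newBuilder()
        .maximumSize(100_000)
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .build(key -> expensiveLookup(key));
  }

  // Placeholder for whatever the real loader does (e.g. an HBase get).
  private static String expensiveLookup(String key) {
    return "enrichment-for-" + key;
  }
}
```
---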
Github user ben-manes commented on the issue:
https://github.com/apache/metron/pull/940
Guava defaults to a `concurrencyLevel` of 4, given its age and a desire to
not abuse memory in low-concurrency situations. You probably want to increase it
to 64 in a heavy workload, which has a
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
The interesting thing we found was that Guava seems to do poorly
when the number of items in the cache gets large. When we scaled the test down (830
distinct IP addresses chosen randomly and
Github user ben-manes commented on the issue:
https://github.com/apache/metron/pull/940
That makes sense. A uniform distribution will, of course, degrade all
policies to random replacement, so the test is then about how well the
implementations handle concurrency. Most often caches
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
We were being purposefully unkind to the cache in the tests. The load
simulation chose an IP address at random to present, so each IP had an equal
probability of being selected. Whereas, in real
Github user ben-manes commented on the issue:
https://github.com/apache/metron/pull/940
Do you know what the hit rates were, for the same data set, between Guava
and Caffeine? The caches use different policies, so it is always interesting to
see how they handle given workloads. As we
Github user nickwallen commented on the issue:
https://github.com/apache/metron/pull/940
I completed some fairly extensive performance testing comparing this new
Unified topology against the existing Split-Join implementation. The
difference was dramatic.
- The Unified
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
I ran this up with vagrant and ensured:
* Normal Stellar still works in field transformations as well as enrichments
* swapped in and out new enrichments live
* swapped in and out new
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
Just FYI, as part of the performance experimentation in the lab here, we
found that one major impediment to scale was the Guava cache in this topology
when the size of the cache becomes non-trivial
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
@arunmahadevan Thanks for chiming in Arun. I would say that most of the
enrichment work is I/O bound and we try to avoid it whenever possible with a
time-evicted LRU cache in front of the
Github user arunmahadevan commented on the issue:
https://github.com/apache/metron/pull/940
Managing thread pools within a bolt isn't fundamentally wrong; we have seen
some use cases where this is done. However, we have been putting effort into
reducing the overall number of threads
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
@ottobackwards I haven't sent an email to the Storm team, but I did run the
PR past a Storm committer that I know and asked his opinion prior to submitting
the PR. The general answer was something
Github user ottobackwards commented on the issue:
https://github.com/apache/metron/pull/940
Have we thought about sending a mail to the Storm dev list to ask if anyone has
done this? Potential issues?
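For anyone skimming the thread, the pattern being asked about is roughly the following (a hedged sketch against the Storm 1.x bolt API, not the PR's actual implementation): the bolt owns an `ExecutorService` created in `prepare()`, enrichment lookups run on it, and results are emitted and acked when each lookup completes.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Illustrative bolt that manages its own thread pool for I/O-bound enrichment lookups.
public class AsyncEnrichmentBolt extends BaseRichBolt {
  private transient ExecutorService pool;
  private transient OutputCollector collector;

  @Override
  public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
    this.pool = Executors.newFixedThreadPool(8);   // pool size is illustrative
  }

  @Override
  public void execute(Tuple tuple) {
    pool.submit(() -> {
      Object enriched = lookup(tuple.getStringByField("ip"));   // stand-in for the cached lookup
      synchronized (collector) {   // pre-2.0 OutputCollector is not thread-safe
        collector.emit(tuple, new Values(enriched));
        collector.ack(tuple);
      }
    });
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("enriched"));
  }

  @Override
  public void cleanup() {
    pool.shutdown();
  }

  private Object lookup(String ip) {
    return ip;   // placeholder for the time-evicted cache / HBase enrichment
  }
}
```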
---
Github user ottobackwards commented on the issue:
https://github.com/apache/metron/pull/940
If we integrated Storm with YARN this would also be a problem, as our
resource management may be at odds with YARN's. I think?
What would be nice is if Storm could manage the pool and
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
@mraliagha It's definitely a tradeoff. This is why this is offered as a complement
to the original split/join topology. Keep in mind, also, that this
architecture enables use cases that the other would
Github user mraliagha commented on the issue:
https://github.com/apache/metron/pull/940
@cestella Thanks, Casey. Wouldn't it still be hard to tune this solution?
Still, thread pool tuning and probably the race condition between these threads
and normal Storm workers makes the tuning
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
@nickwallen Sounds good. When scale tests are done, can we make sure that
we also include #944?
---
Github user nickwallen commented on the issue:
https://github.com/apache/metron/pull/940
I'd hold off on merging this until we can get it tested at some decent scale.
Unless it already has been? Otherwise, I don't see a need to merge this until
we know it actually addresses a
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/940
I tested this in full dev and worked as expected. +1
---
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/940
The current architecture is described in the diagram below.
![Image of Yaktocat](https://github.com/apache/metron/raw/master/metron-platform/metron-enrichment/enrichment_arch.png)
In short, for each message each
Github user mraliagha commented on the issue:
https://github.com/apache/metron/pull/940
Is there any document somewhere that shows how the previous approach was
implemented? I would like to understand the previous architecture in detail,
because some of the pros/cons didn't make sense