What Nick says is absolutely right, but I want to add a bit of color
around the architectural differences between split-join and unified here.

Split-join was our first approach to parallelizing the various enrichment
adapters that we have (e.g. hbase, geo, stellar).  We took a very "stormy"
approach to this (see:
https://groups.google.com/forum/#!topic/storm-user/7Gk34vwUATk): each
message is split up, the pieces are enriched in separate bolts, and the
results are joined back together.  During performance evaluation, however,
we found that throughput was extremely pinched with this architecture (see
architecture at
https://github.com/apache/metron/tree/master/metron-platform/metron-enrichment#enrichment-architecture).
Specifically, the network overhead of the extra hops between the split,
enrichment, and join steps was overwhelming us.

We then moved to the unified topology (architecture:
https://github.com/apache/metron/tree/master/metron-platform/metron-enrichment#unified-enrichment-topology),
which removed the network overhead, but did things in a less stormy way.
Specifically, enrichments are still done in parallel, but inside of a
threadpool in the enrichment bolt rather than across separate bolts.  This
saved us network hops at the expense of running our own threadpool inside
Storm.  In our tests, we've found this to be the preferred approach.
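
To make the unified idea concrete, here is a minimal sketch of running
several enrichments in parallel on a threadpool inside a single process and
merging the results, with no network hop between the steps.  This is an
illustration only, not Metron's code (Metron is written in Java); the adapter
functions, field names, and the enrich() helper are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for enrichment adapters (e.g. geo, hbase, stellar).
# Each one takes a message and returns the fields it wants to add.
def geo_adapter(message):
    return {"geo.country": "unknown"}

def hbase_adapter(message):
    return {"hbase.known_host": False}

def stellar_adapter(message):
    return {"stellar.ip_len": len(message.get("ip", ""))}

ADAPTERS = [geo_adapter, hbase_adapter, stellar_adapter]

# One pool shared by every message this process handles, analogous to a
# threadpool that lives inside the enrichment bolt.
POOL = ThreadPoolExecutor(max_workers=len(ADAPTERS))

def enrich(message):
    """Run all adapters in parallel, then merge their results in-process."""
    futures = [POOL.submit(adapter, message) for adapter in ADAPTERS]
    enriched = dict(message)  # copy so the input message is not mutated
    for future in futures:
        enriched.update(future.result())  # the "join" is a local dict merge
    return enriched
```

Calling enrich({"ip": "10.0.0.1"}) returns the original field plus the three
added ones.  In a split-join layout, each of those adapter calls would instead
sit behind its own hop between bolts, which is the overhead described above.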

Hope this helps add color!

Best,

Casey

On Thu, Jul 26, 2018 at 5:31 AM Stefan Kupstaitis-Dunkler <
[email protected]> wrote:

> Hi,
>
> What are the key differences between Split-Join and Unified in the
> enrichment topology? Which should be used when, and why?
>
> Best,
> Stefan
> --
> Stefan Kupstaitis-Dunkler
> https://datahovel.com/
> https://www.meetup.com/Hadoop-User-Group-Vienna/
> https://twitter.com/StefanDunkler
>
