What Nick says is absolutely right, but I want to add just a bit of color around the architectural differences between split-join and unified here.
Split-join was our first approach to parallelizing the various enrichment adapters that we have (e.g. hbase, geo, stellar). We took a very "stormy" approach to this (see: https://groups.google.com/forum/#!topic/storm-user/7Gk34vwUATk). What we found during performance evaluation, however, was that throughput was extremely pinched with this architecture (see architecture at https://github.com/apache/metron/tree/master/metron-platform/metron-enrichment#enrichment-architecture). Specifically, the cost of the network overhead in a split/join topology was overwhelming us.

We then moved to the unified topology (architecture: https://github.com/apache/metron/tree/master/metron-platform/metron-enrichment#unified-enrichment-topology), which removed the network latency overhead, but did things in a less stormy way. Specifically, enrichments are still done in parallel, but inside of a threadpool in the enrichment bolt rather than across separate bolts. This saved us network hops at the expense of adding our own threadpool inside Storm. In our tests, we've found this to be the preferred approach.

Hope this helps add color!

Best,
Casey

On Thu, Jul 26, 2018 at 5:31 AM Stefan Kupstaitis-Dunkler <[email protected]> wrote:

> Hi,
>
> what are the key differences of Split-Join and Unified in the enrichment
> topology. Which should be used when and why?
>
> Best,
> Stefan
> --
> Stefan Kupstaitis-Dunkler
> https://datahovel.com/
> https://www.meetup.com/Hadoop-User-Group-Vienna/
> https://twitter.com/StefanDunkler
>
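P.S. To make the unified idea a bit more concrete, here is a rough sketch of "run the enrichments in parallel inside one worker's threadpool and merge the results in place". It is not Metron code: the adapter functions (`geo_enrich`, `host_enrich`), the field names, and `enrich_unified` are all made up for illustration, and it is in Python rather than Java just to keep it short.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for enrichment adapters (not Metron APIs).
def geo_enrich(message):
    # Pretend lookup: tag any message that has a destination IP.
    return {"geo.country": "US"} if message.get("ip_dst_addr") else {}

def host_enrich(message):
    # Pretend lookup: flag one well-known source address.
    return {"host.known": message.get("ip_src_addr") == "10.0.0.1"}

ADAPTERS = [geo_enrich, host_enrich]

def enrich_unified(message, pool, adapters=ADAPTERS):
    """Run every adapter on the shared pool, then merge the results locally.

    The split and the join both happen in this one process, so no
    message has to cross the network between the two steps.
    """
    futures = [pool.submit(adapter, message) for adapter in adapters]
    enriched = dict(message)  # copy so the input message is not mutated
    for future in futures:
        enriched.update(future.result())  # blocks until that adapter finishes
    return enriched

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        msg = {"ip_src_addr": "10.0.0.1", "ip_dst_addr": "8.8.8.8"}
        print(enrich_unified(msg, pool))
```

In the split-join version, each adapter would instead sit behind its own bolt, and the partial results would travel over the network to a separate join step. That extra travel is the overhead described above.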
