fieldsGrouping data skew (?) : localOrShuffleGrouping, shuffleGrouping, and partialKeyGrouping

John Yost Fri, 01 Jan 2016 08:44:03 -0800

Hi Everyone,

As I've posted previously (
http://mail-archives.apache.org/mod_mbox/storm-user/201512.mbox/%3CCAC_ccQAHfQLrQcF4AGypg3A-sht5fRhO1pRhuvdUWxkFhwK7xA%40mail.gmail.com%3E),
I am attempting to solve a throughput problem I am having using
fieldsGrouping. I just posted the following update:


*As an update, the fan-in Bolt idea appears to work to a certain extent.
Initial throughput, measured as tuples acked per minute, nearly matches the
configuration with localOrShuffleGrouping between Bolt A and Bolt B, where
all messaging is local. Moreover, the time spent in
com.lmax.disruptor.BlockingWaitStrategy.waitFor drops from 99% (no
fan-in Bolt) to 60% (fan-in Bolt). This is a dramatic improvement
because, without the fan-in concept, fieldsGrouping throughput in my
topology is 1/4 or less of localOrShuffleGrouping throughput, and executors
(and the corresponding workers) fail quickly. So it appears that the fan-in
Bolt concept does help.*

*The problem is that Bolt B acking lags behind fan-in Bolt emitting by
about 10-15%, and eventually I get a series of tuple failures, coupled with
executor and worker failures. So, short-term, the fan-in Bolt concept
dramatically improves throughput with fieldsGrouping, but long-term
throughput and topology stability remain an issue.*

I recently tested shuffleGrouping and fieldsGrouping, each with the fan-in
Bolt concept. The shuffleGrouping configuration virtually matches
localOrShuffleGrouping throughput and, importantly, the topology remains
stable over several hours. Consequently, it appears that data skew in the
fieldsGrouping *may* be the problem.

I am going to upgrade to 0.10.0 and try out partialKeyGrouping to see if
that helps things.
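To make the skew hypothesis concrete, here is a toy simulation (not Storm code) of how a skewed key stream spreads over downstream tasks under three routing strategies: a single hash per key (fieldsGrouping-like), key-agnostic round-robin (a stand-in for shuffleGrouping), and "pick the less-loaded of two hashed candidates" (the idea behind partialKeyGrouping). The task count, the Zipf-like key distribution, the hash functions, and the class name GroupingSkewSim are all assumptions made for illustration; Storm's actual hashing and load tracking may differ.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Toy simulation, NOT Storm code: compares how three routing strategies
 * spread a skewed key stream over a fixed number of downstream tasks.
 * Task count, hash functions, and key distribution are assumptions.
 */
public class GroupingSkewSim {
    static final int TASKS = 8;

    // Murmur3-style 32-bit finalizer, used here as a stand-in hash function.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // First candidate task (also the only task the fieldsGrouping-like strategy uses).
    static int choice1(int key) {
        return Math.floorMod(mix(key), TASKS);
    }

    // Second candidate task; the 1..TASKS-1 offset keeps it distinct from choice1.
    static int choice2(int key) {
        int offset = 1 + Math.floorMod(mix(key ^ 0x5bd1e995), TASKS - 1);
        return (choice1(key) + offset) % TASKS;
    }

    // Draws n keys in [0, 1000) skewed toward small values (key 0 gets about 18%).
    static int[] skewedKeys(int n, long seed) {
        Random rng = new Random(seed);
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = (int) (Math.pow(rng.nextDouble(), 4) * 1000);
        }
        return keys;
    }

    // fieldsGrouping-like: every tuple with the same key lands on one task.
    static long[] fields(int[] keys) {
        long[] load = new long[TASKS];
        for (int k : keys) load[choice1(k)]++;
        return load;
    }

    // shuffleGrouping-like (round-robin stand-in): ignores the key entirely.
    static long[] shuffle(int[] keys) {
        long[] load = new long[TASKS];
        for (int i = 0; i < keys.length; i++) load[i % TASKS]++;
        return load;
    }

    // partialKeyGrouping-like: send each tuple to the less-loaded of two
    // candidate tasks, using only the counts seen by this single sender.
    static long[] partialKey(int[] keys) {
        long[] load = new long[TASKS];
        for (int k : keys) {
            int a = choice1(k), b = choice2(k);
            load[load[a] <= load[b] ? a : b]++;
        }
        return load;
    }

    static long max(long[] xs) {
        return Arrays.stream(xs).max().getAsLong();
    }

    public static void main(String[] args) {
        int[] keys = skewedKeys(100_000, 42L);
        System.out.println("fields      max load: " + max(fields(keys)));
        System.out.println("shuffle     max load: " + max(shuffle(keys)));
        System.out.println("partial-key max load: " + max(partialKey(keys)));
    }
}
```

With 8 tasks the ideal per-task load is 12,500 of 100,000 tuples. In this sketch the single hottest key alone exceeds that under the single-hash strategy, while splitting each key across two candidates lets the hot key share two tasks. That is the same qualitative gap between fieldsGrouping and shuffleGrouping described above, though it says nothing about the specific topology's real key distribution.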

--John

