Hi everyone,

As I've posted previously (http://mail-archives.apache.org/mod_mbox/storm-user/201512.mbox/%3CCAC_ccQAHfQLrQcF4AGypg3A-sht5fRhO1pRhuvdUWxkFhwK7xA%40mail.gmail.com%3E), I am trying to solve a throughput problem I'm having with fieldsGrouping. I just posted the following update:
*As an update, the fan-in Bolt idea appears to work to a certain extent. Initial throughput, measured as tuples acked per minute, nearly matches that of the configuration using localOrShuffleGrouping between Bolt A and Bolt B, where all messaging is local. Moreover, the time spent in com.lmax.disruptor.BlockingWaitStrategy.waitFor drops from 99% (no fan-in Bolt) to 60% (fan-in Bolt). This is a dramatic improvement: without the fan-in concept, fieldsGrouping throughput in my topology is 1/4 or less of localOrShuffleGrouping throughput, and executors (and their workers) fail quickly. So it appears that the fan-in Bolt concept does help.*

*The problem is that Bolt B acking lags behind fan-in Bolt emitting by about 10-15%, and eventually I get a series of tuple failures along with executor and worker failures. So in the short term the fan-in Bolt concept dramatically improves throughput with fieldsGrouping, but long-term throughput and topology stability are still an issue.*

I recently tested both shuffleGrouping and fieldsGrouping with the fan-in Bolt concept. The shuffleGrouping configuration virtually matches localOrShuffleGrouping throughput and, importantly, the topology remains stable over several hours. Consequently, it appears that data skew in the fieldsGrouping *may* be the problem. I am going to upgrade to 0.10.0 and try partialKeyGrouping to see if that helps.

--John
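To illustrate why skew in fieldsGrouping could produce this kind of lag while shuffleGrouping stays healthy, here is a minimal, standard-library-only Java simulation. It is not Storm code: the class GroupingSkewSketch, its methods, the stand-in hash function, the 30% "hot" key, and the 8 downstream tasks are all hypothetical choices for illustration, not measurements from this topology. Key grouping sends every tuple with a given key to one task (hash of the key modulo the task count), so a single hot key pins a large share of the load on one task. The partial key grouping variant gives each key two candidate tasks and sends each tuple to whichever has received fewer tuples so far. As I understand it, Storm's partialKeyGrouping is built on this same "two choices" idea, but its actual hash functions and bookkeeping differ from the stand-ins here.

```java
import java.util.Random;

/**
 * Hypothetical, stdlib-only simulation (not Storm code): compares how a skewed
 * key stream spreads over downstream tasks under key grouping (as with
 * fieldsGrouping) versus "two choices" partial key grouping.
 */
final class GroupingSkewSketch {

    // Stand-in hash (an assumption, not Storm's hash); the seed yields a second, different hash.
    static int hash(String key, int seed) {
        int h = key.hashCode() ^ (seed * 0x9E3779B9);
        h ^= (h >>> 16);
        h *= 0x85EBCA6B;
        h ^= (h >>> 13);
        return h & 0x7fffffff; // clear the sign bit so % gives a valid task index
    }

    // Key grouping: every tuple with the same key goes to the same task.
    static long[] keyGrouping(String[] keys, int tasks) {
        long[] load = new long[tasks];
        for (String k : keys) {
            load[hash(k, 0) % tasks]++;
        }
        return load;
    }

    // Partial key grouping: each key has two candidate tasks; pick the one with fewer tuples so far.
    static long[] partialKeyGrouping(String[] keys, int tasks) {
        long[] load = new long[tasks];
        for (String k : keys) {
            int a = hash(k, 0) % tasks;
            int b = hash(k, 1) % tasks;
            load[load[a] <= load[b] ? a : b]++;
        }
        return load;
    }

    // Busiest task's load divided by the mean load (1.0 means perfectly balanced).
    static double imbalance(long[] load) {
        long max = 0;
        long sum = 0;
        for (long l : load) {
            max = Math.max(max, l);
            sum += l;
        }
        return max / ((double) sum / load.length);
    }

    // Synthetic skewed stream: one "hot" key gets about hotFraction of the tuples,
    // the rest are spread uniformly over coldKeys distinct keys.
    static String[] skewedStream(int n, double hotFraction, int coldKeys, long seed) {
        Random rnd = new Random(seed);
        String[] keys = new String[n];
        for (int i = 0; i < n; i++) {
            keys[i] = rnd.nextDouble() < hotFraction ? "hot" : "key-" + rnd.nextInt(coldKeys);
        }
        return keys;
    }

    public static void main(String[] args) {
        String[] stream = skewedStream(100_000, 0.3, 10_000, 42L);
        int tasks = 8; // hypothetical number of Bolt B tasks
        System.out.printf("key grouping imbalance:         %.2f%n",
                imbalance(keyGrouping(stream, tasks)));
        System.out.printf("partial key grouping imbalance: %.2f%n",
                imbalance(partialKeyGrouping(stream, tasks)));
    }
}
```

With one key carrying roughly 30% of the tuples across 8 tasks, key grouping must leave one task with at least about 0.30 / 0.125 = 2.4 times the average load, and that single task sets the pace for acking. Splitting the hot key across two candidates roughly halves its contribution to any one task. The trade-off is that a key's tuples can now land on two different tasks, so any per-key state kept in Bolt B would need to be combined further downstream.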
