I have a topology in which one of the bolts read from HBase. That bolt is setup to have one task per executor, and it got tuples from shuffle grouping so every executor of the bolt will have the same number of tuples.
The problem is that some executors will take longer than others (because of hot-spotting in HBase region servers). For example, 5% of the executors have latency of 100ms while the rest 95% have around 25ms. Now with the guarantee of equal number of tuples per executors, my understanding is that 95% of the fast executors will have to wait. Thus it brings down the throughput. Please correct me if I mis-understand. Ideally, I would love to have a load-balance-biased shuffle grouping so that the 95% of the fast executors would get more tuples. Is this something I can leverage other existing groupings or patterns to implement? In an earlier post entitled "long running bolts", Mike Thomsen and Nathan Leung discussed a nice idea of taking long running tuples elsewhere (paraphrase below): "*create tasks for the bolt using the same class but different name in the topology... route long running bolts (without acks) to the separate instances and they will not affect your normal processing*" Since my long running executors are not taking that long (just around 100ms), this may not be worth the effort to take the tuples elsewhere. I would appreciate any comment and suggestions. Many thanks. BH
