Hi Nathan, thank you for your immediate response!
Quoting Nathan Leung <[email protected]>:
Do you have max spout pending set in the topology? It could be that your spout is throttling data since there are too many tuples in flight simultaneously, which will have the effect of reducing your maximum throughput.
Yes, we have that set, but at approximately 13 times our per-second throughput. Since we see a "complete latency" of about 200 ms, 13 seconds' worth of pending tuples should be more than enough headroom, shouldn't it?
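As a rough sanity check of that reasoning, Little's law says the average number of tuples in flight is about the emit rate multiplied by the complete latency. The sketch below applies it to the figures above (a cap of 13 times the per-second rate, 200 ms complete latency). The spout rate of 1000 tuples/s and the class and method names are hypothetical, chosen only for illustration. It also assumes the cap and the rate refer to the same spout task, since max spout pending applies per spout task.

```java
public class InFlightEstimate {
    // Little's law: average tuples in flight = emit rate * time each tuple spends in the topology.
    static double requiredInFlight(long tuplesPerSecond, long completeLatencyMillis) {
        return tuplesPerSecond * completeLatencyMillis / 1000.0;
    }

    // How many times larger the configured cap is than the estimated in-flight count.
    static double headroom(long maxSpoutPending, long tuplesPerSecond, long completeLatencyMillis) {
        return maxSpoutPending / requiredInFlight(tuplesPerSecond, completeLatencyMillis);
    }

    public static void main(String[] args) {
        long spoutRate = 1000;             // hypothetical spout emit rate, tuples per second
        long latencyMillis = 200;          // "complete latency" reported in the thread
        long maxPending = 13 * spoutRate;  // cap set to about 13 seconds of data

        System.out.printf("estimated tuples in flight: %.0f%n",
                requiredInFlight(spoutRate, latencyMillis));
        System.out.printf("cap / estimate: %.0fx%n",
                headroom(maxPending, spoutRate, latencyMillis));
    }
}
```

The ratio does not depend on the assumed rate: it is simply 13 s divided by 0.2 s. If the measured complete latency is representative, the cap is unlikely to be what limits throughput.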
Whether you are reaching framework communications limits depends on your application (how many tuples you are sending), and your network. If you suspect this to be the case, you can turn some shuffleGroupings (if you have any) into localOrShuffleGroupings, which try to keep the data in process if possible, thereby reducing network congestion.
Currently, we're processing about 16k tuples per second in the complete tree (so including those emitted by intermediate bolts), which shouldn't be "that high" a number - we're aiming for at least 10 times that. Everything runs across a local Gigabit Ethernet segment, with no routers or WAN links in between.
I'll give your suggestion a try - shuffleGroupings make up 50% of our groupings; the other half are fieldsGroupings (to balance load across aggregators).
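To put the tuple rates above in relation to the link, here is a back-of-the-envelope budget: how large a tuple could be, on average, before a single Gigabit link is saturated. It is a deliberately pessimistic sketch that assumes every tuple in the tree crosses the same link and ignores serialization and protocol overhead. The class and method names are hypothetical.

```java
public class LinkBudget {
    // Raw Gigabit Ethernet capacity in bytes per second (1 Gbit/s divided by 8), before any overhead.
    static final double GIGABIT_BYTES_PER_SECOND = 1_000_000_000 / 8.0;

    // Average bytes available per tuple if every tuple crosses the same link.
    static double maxBytesPerTuple(long tuplesPerSecond) {
        return GIGABIT_BYTES_PER_SECOND / tuplesPerSecond;
    }

    public static void main(String[] args) {
        long currentRate = 16_000;           // tuples per second across the whole tree today
        long targetRate = 10 * currentRate;  // the stated goal of at least 10 times that

        System.out.printf("budget at current rate: %.2f bytes per tuple%n",
                maxBytesPerTuple(currentRate));
        System.out.printf("budget at target rate: %.2f bytes per tuple%n",
                maxBytesPerTuple(targetRate));
    }
}
```

If the serialized tuples are well below these budgets, raw bandwidth is probably not the limit, and per-message overhead or the inter-worker queues are more likely places to look.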
Are there any Storm-specific options/tools to monitor inter-worker queues?
With regards,
Jens
