Matthias J. Sax created FLINK-2620: -------------------------------------- Summary: StreamPartitioner is not properly initialized for shuffle and rebalance Key: FLINK-2620 URL: https://issues.apache.org/jira/browse/FLINK-2620 Project: Flink Issue Type: Bug Components: Streaming Reporter: Matthias J. Sax
Using rebalance connection pattern, the round-robin distribution starts with the same receiver index in all parallel tasks. For high receiver dop and (very) small data, it might result in an inbalanced distribution. For example, 100 source tasks, each emitting 10 record to 100 consumer tasks. Thus, only the first 10 consumer tasks receive data (10 records each) while the other 90 do not. A possible fix would be, to compute different starting indexes for different producer tasks like {noformat} startIdx = (numReceivers / numSenders) * myIdx {noformat} For shuffle grouping, the random data generator is initialized with a unique seed for all parallel tasks. This should be changed such that each task uses a different seed. To achieve both, the StreamPartitioner class should be extended with a configuration / initialize method which is called on each parallel operator. For a full discussion please see the mailing list archive: https://mail-archives.apache.org/mod_mbox/flink-dev/201509.mbox/browser -- This message was sent by Atlassian JIRA (v6.3.4#6332)