Github user JasonMWhite commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10089#discussion_r46449467

    --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala ---
    @@ -364,8 +365,8 @@ class DirectKafkaStreamSuite
         val batchIntervalMilliseconds = 100
         val estimator = new ConstantEstimator(100)
    -    val messageKeys = (1 to 200).map(_.toString)
    -    val messages = messageKeys.map((_, 1)).toMap
    +    val messages = Map("foo" -> 200)
    +    kafkaTestUtils.sendMessages(topic, messages)
    --- End diff --

    The Kafka messages don't actually need to be sent three times; sending them once is sufficient for all the tests.

    When I send "foo" 200 times in one batch, they all go to the same partition. However, when I did this three times (once for each of the rates 100, 50, and 20), each batch of 200 went to a random partition. I suspect this is down to how the test Kafka cluster does the partitioning: I was usually seeing 200 messages on one partition and 400 on the other two. Since I can't control the split deterministically, I explicitly changed the test case to "all the messages are on one partition".
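To illustrate the partitioning behavior the comment describes: Kafka's default partitioner hashes the record key (Murmur2 over the key bytes) modulo the partition count, so every record with the same key lands on the same partition, while distinct keys spread out. The sketch below is a toy illustration of that idea using `String.hashCode` rather than Kafka's actual hash; `PartitionSketch` and `partitionFor` are hypothetical names, not Kafka API.

```scala
// Toy model of key-based partitioning: same key => same partition,
// distinct keys => spread across partitions. Illustration only; Kafka's
// DefaultPartitioner uses Murmur2 on the serialized key, not hashCode.
object PartitionSketch {
  def partitionFor(key: String, numPartitions: Int): Int =
    math.abs(key.hashCode) % numPartitions

  def main(args: Array[String]): Unit = {
    val numPartitions = 3

    // 200 copies of "foo" (the new test data) all map to one partition.
    val same = Seq.fill(200)("foo").map(partitionFor(_, numPartitions)).toSet
    println(same.size) // 1

    // The old test data, keys "1".."200", spreads over all partitions.
    val spread = (1 to 200).map(k => partitionFor(k.toString, numPartitions)).toSet
    println(spread.size) // 3
  }
}
```

This is why sending "foo" 200 times is a deterministic way to get "all the messages on one partition", whereas 200 distinct keys are not.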