GitHub user JasonMWhite commented on a diff in the pull request:
https://github.com/apache/spark/pull/10089#discussion_r46449467
--- Diff:
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
---
@@ -364,8 +365,8 @@ class DirectKafkaStreamSuite
val batchIntervalMilliseconds = 100
val estimator = new ConstantEstimator(100)
- val messageKeys = (1 to 200).map(_.toString)
- val messages = messageKeys.map((_, 1)).toMap
+ val messages = Map("foo" -> 200)
+ kafkaTestUtils.sendMessages(topic, messages)
--- End diff --
The Kafka messages don't actually need to be sent three times; sending them once is
sufficient for all tests. When I send "foo" 200 times in one batch, all 200 messages go
to the same partition. However, when I do this three times (once for each of the rates
100, 50, and 20), each batch of 200 goes to a random partition. I suspect this comes
down to how the test Kafka cluster does its partitioning.
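For readers unfamiliar with the Map-based overload used in the diff: as far as I can
tell, `KafkaTestUtils.sendMessages(topic, messageToFreq)` simply expands each
(message, count) pair into that many copies and hands them to the `Array[String]`
overload. A rough sketch of that expansion (my reading of the test utility, not code
from this PR):

```scala
// Sketch of the assumed expansion inside KafkaTestUtils (not this PR's code):
val messageToFreq = Map("foo" -> 200)
val expanded: Array[String] =
  messageToFreq.flatMap { case (msg, count) => Seq.fill(count)(msg) }.toArray
kafkaTestUtils.sendMessages(topic, expanded) // the Array[String] overload
```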
I was usually getting 200 on one partition and 400 on the other two. Since I can't
control the split deterministically, I explicitly changed the test case to assert that
all the messages are on one partition.
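If anyone wants to see the skew for themselves, here is a minimal sketch of how the
per-partition counts could be checked after the sends. It assumes a running
`kafkaTestUtils`, a 3-partition topic, and that the package-private `KafkaCluster`
helper is visible (which holds for code in `org.apache.spark.streaming.kafka`, like
this suite):

```scala
import kafka.common.TopicAndPartition
import org.apache.spark.streaming.kafka.KafkaCluster

// The topic starts empty, so each partition's latest offset equals the
// number of messages it has received so far.
val kc = new KafkaCluster(Map("metadata.broker.list" -> kafkaTestUtils.brokerAddress))
val partitions = (0 until 3).map(TopicAndPartition(topic, _)).toSet
kc.getLatestLeaderOffsets(partitions).right.get.foreach {
  case (tp, lo) => println(s"partition ${tp.partition}: ${lo.offset} messages")
}
```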