Github user JasonMWhite commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10089#discussion_r46449467
  
    --- Diff: 
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 ---
    @@ -364,8 +365,8 @@ class DirectKafkaStreamSuite
     
         val batchIntervalMilliseconds = 100
         val estimator = new ConstantEstimator(100)
    -    val messageKeys = (1 to 200).map(_.toString)
    -    val messages = messageKeys.map((_, 1)).toMap
    +    val messages = Map("foo" -> 200)
    +    kafkaTestUtils.sendMessages(topic, messages)
    --- End diff --
    
    The kafka messages don't actually need to be sent three times; just once is 
sufficient for all tests. When I send "foo" 200 times in one batch, they all go 
to the same partition. However, when I do this 3 times (once each for 100, 50, 
and 20), each batch of 200 goes to a random partition. I suspect this is due to 
how the test kafka cluster does the partitioning.
    
    I was usually getting 200 on 1 partition and 400 on the other 2. I 
explicitly changed the test case to "all the messages are on one partition" 
since I can't control the split deterministically.
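
    For context, a minimal sketch of why identical keys land on one partition: 
Kafka's default partitioner routes keyed messages by hashing the key modulo the 
partition count, so every copy of "foo" maps to the same partition. The 
`partitionFor` helper below is hypothetical and uses `hashCode` as a simplified 
stand-in for Kafka's actual murmur2 hash; it is not the test utils' code.

    ```scala
    // Sketch (assumption): simplified stand-in for Kafka's default key-based
    // partitioner, which computes hash(key) % numPartitions.
    object PartitionSketch {
      def partitionFor(key: String, numPartitions: Int): Int =
        math.abs(key.hashCode % numPartitions)

      def main(args: Array[String]): Unit = {
        val numPartitions = 3
        // Send the same key 200 times: every message maps to one partition.
        val targets = (1 to 200).map(_ => partitionFor("foo", numPartitions)).toSet
        assert(targets.size == 1)
        println(s"all 200 messages routed to partition ${targets.head}")
      }
    }
    ```

    This also suggests why unkeyed (or re-keyed) sends spread across 
partitions: without a stable key, the producer is free to pick a partition per 
batch.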

