[GitHub] spark pull request #18234: [SPARK-19185][DSTREAM] Make Kafka consumer cache ...

koeninger Wed, 07 Jun 2017 11:59:53 -0700

Github user koeninger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18234#discussion_r120716157
  
    --- Diff: docs/streaming-kafka-0-10-integration.md ---
    @@ -91,7 +91,7 @@ The new Kafka consumer API will pre-fetch messages into buffers.  Therefore it i
     
     In most cases, you should use `LocationStrategies.PreferConsistent` as shown above.  This will distribute partitions evenly across available executors.  If your executors are on the same hosts as your Kafka brokers, use `PreferBrokers`, which will prefer to schedule partitions on the Kafka leader for that partition.  Finally, if you have a significant skew in load among partitions, use `PreferFixed`. This allows you to specify an explicit mapping of partitions to hosts (any unspecified partitions will use a consistent location).
     
    -The cache for consumers has a default maximum size of 64.  If you expect to be handling more than (64 * number of executors) Kafka partitions, you can change this setting via `spark.streaming.kafka.consumer.cache.maxCapacity`
    +The cache for consumers has a default maximum size of 64.  If you expect to be handling more than (64 * number of executors) Kafka partitions, you can change this setting via `spark.streaming.kafka.consumer.cache.maxCapacity`. If you would like to disable the caching for Kafka consumers, you can set `spark.streaming.kafka.consumer.cache.enabled` to `false`.
    --- End diff --
    
    Code change LGTM.
    
    I'd prefer clarifying / adding caveats to the documentation, rather than leaving it undocumented.
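
    For readers of the doc text in the diff, the sizing rule it describes (raise the capacity once you expect more than 64 * number of executors partitions) could be sketched roughly as below. This is only an illustration, not code from the pull request: the helper names `consumer_cache_conf` and `to_submit_args` are hypothetical, and sizing the cache to the ceiling of partitions per executor is an assumption that only holds when partitions are spread evenly, as with `PreferConsistent`. The two configuration keys are the ones named in the diff.

```python
import math

# Default cache size stated in the documentation text quoted in the diff.
DEFAULT_MAX_CAPACITY = 64

CACHE_ENABLED_KEY = "spark.streaming.kafka.consumer.cache.enabled"
CACHE_CAPACITY_KEY = "spark.streaming.kafka.consumer.cache.maxCapacity"


def consumer_cache_conf(num_partitions, num_executors, enable_cache=True):
    """Hypothetical helper: build Spark conf entries for the Kafka consumer cache."""
    if num_executors < 1:
        raise ValueError("num_executors must be >= 1")
    if not enable_cache:
        # Caching switched off entirely; capacity is then irrelevant.
        return {CACHE_ENABLED_KEY: "false"}
    conf = {}
    # Rule from the docs: the default suffices up to 64 * number of executors.
    if num_partitions > DEFAULT_MAX_CAPACITY * num_executors:
        # Assumption: partitions are distributed evenly across executors.
        per_executor = math.ceil(num_partitions / num_executors)
        conf[CACHE_CAPACITY_KEY] = str(per_executor)
    return conf


def to_submit_args(conf):
    """Render conf entries as spark-submit style --conf arguments."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


if __name__ == "__main__":
    print(to_submit_args(consumer_cache_conf(1000, 4)))
```

    With skewed partition placement (for example under `PreferFixed`), a single executor may hold more than the even share, so the computed value should be treated as a lower bound.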


