Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6863#discussion_r32788398
  
    --- Diff: docs/streaming-kafka-integration.md ---
    @@ -161,6 +161,8 @@ Next, we discuss how to use this approach in your 
streaming application.
     
        You can use this to update Zookeeper yourself if you want 
Zookeeper-based Kafka monitoring tools to show progress of the streaming 
application.
     
    -   Another thing to note is that since this approach does not use 
Receivers, the standard receiver-related 
[configurations](configuration.html) (that is, configurations of the form 
`spark.streaming.receiver.*`) will not apply to the input DStreams created by 
this approach (they will apply to other input DStreams though). Instead, use 
the [configurations](configuration.html) `spark.streaming.kafka.*`. An 
important one is `spark.streaming.kafka.maxRatePerPartition`, which is the 
maximum rate at which each Kafka partition will be read by this direct API.
    +   Note that the typecast to HasOffsetRanges will only succeed if it is 
done in the first method called on the directKafkaStream, not later down a 
chain of methods. You can use transform() instead of foreachRDD() as your first 
method call in order to access offsets, then call further Spark methods. 
However, be aware that the one-to-one mapping between RDD partitions and Kafka 
partitions does not remain after any method that shuffles or repartitions, e.g. 
reduceByKey() or window().
    --- End diff ---
    
    Good addition. :)
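    
    For anyone reading this later, here is a minimal sketch of the pattern the 
new sentence describes: capture the offset ranges in transform(), the first 
method called on the stream, then continue with other operations. Names like 
`ssc`, `kafkaParams`, and `topics` are placeholders for an existing 
StreamingContext and the usual createDirectStream arguments, not part of the 
doc change itself.
    
    ```scala
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}
    
    // Assumes ssc, kafkaParams, and topics are already defined elsewhere.
    val directKafkaStream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
    
    // Hold a reference to the offset ranges captured in transform(), the first
    // method called on the stream, before any shuffle breaks the one-to-one
    // mapping between RDD partitions and Kafka partitions.
    var offsetRanges = Array.empty[OffsetRange]
    
    directKafkaStream.transform { rdd =>
      offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd
    }.map(_._2)                      // further transformations are fine after the cast
     .foreachRDD { rdd =>
       for (o <- offsetRanges) {
         println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
       }
     }
    ```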

