Have you read http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#kafka-itself ?
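The linked "Kafka itself" section covers the pattern being asked about: since the direct stream forces enable.auto.commit to false, you commit offsets back to Kafka's __consumer_offsets topic yourself, after processing each batch. A minimal sketch along those lines (it assumes an already-created direct stream named `stream`; this is at-least-once, so messages may be replayed on failure, which matches the requirement below):

```scala
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

stream.foreachRDD { rdd =>
  // The offset ranges for this batch, one per Kafka partition.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // ... process the rdd here ...

  // Commit the consumed offsets back to Kafka (__consumer_offsets)
  // once processing has finished. commitAsync is non-blocking and
  // commits on a later poll, so the semantics are at-least-once.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
```

Note that commitAsync must be called on the stream itself (the cast to CanCommitOffsets), not inside an executor-side closure, because only the driver-side consumer holds the group's offset-commit channel.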
On Wed, Apr 26, 2017 at 1:17 PM, Dominik Safaric <dominiksafa...@gmail.com> wrote:
> The reason why I want to obtain this information, i.e. <partition, offset,
> timestamp> tuples, is to relate the consumption rate with the production
> rate using the __consumer_offsets Kafka internal topic. Interestingly,
> Spark's KafkaConsumer implementation does not auto-commit the offsets upon
> offset commit expiration because, as seen in the logs, Spark overrides the
> enable.auto.commit property to false.
>
> Any idea on how to use the KafkaConsumer's auto offset commits? Keep in
> mind that I do not care about exactly-once, hence having messages replayed
> is perfectly fine.
>
>> On 26 Apr 2017, at 19:26, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> What is it you're actually trying to accomplish?
>>
>> You can get topic, partition, and offset bounds from an offset range like
>>
>> http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#obtaining-offsets
>>
>> Timestamp isn't really a meaningful idea for a range of offsets.
>>
>> On Tue, Apr 25, 2017 at 2:43 PM, Dominik Safaric
>> <dominiksafa...@gmail.com> wrote:
>>> Hi all,
>>>
>>> Because the Spark Streaming direct Kafka consumer maps offsets for a
>>> given Kafka topic and partition internally while having
>>> enable.auto.commit set to false, how can I retrieve the offset of each
>>> of the consumer's poll calls using the offset ranges of an RDD? More
>>> precisely, the information I seek after each poll call is the following:
>>> <timestamp, offset, partition>.
>>>
>>> Thanks in advance,
>>> Dominik
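For the "obtaining offsets" part of the thread: the offset ranges of a batch carry topic, partition, and from/until offsets, but no timestamp, as Cody notes. The closest available stand-in is the batch time, which foreachRDD exposes alongside the RDD. A sketch of logging <batchTime, partition, offset> tuples under that substitution (again assuming an existing direct stream named `stream`):

```scala
import org.apache.spark.streaming.kafka010.HasOffsetRanges

// foreachRDD has an overload that also passes the batch Time.
stream.foreachRDD { (rdd, batchTime) =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { r =>
    // An OffsetRange has no per-record timestamp; the batch time is
    // an approximation of when these offsets were polled, not of
    // when the records were produced.
    println(s"${batchTime.milliseconds} topic=${r.topic} " +
            s"partition=${r.partition} " +
            s"offsets=[${r.fromOffset}, ${r.untilOffset})")
  }
}
```

If per-record production timestamps are what is actually needed, each ConsumerRecord in the stream exposes a timestamp() field populated by Kafka (0.10+), which can be read while processing the records themselves.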