Hey Cody, I would have responded to the mailing list but it looks like this
thread got aged off. I have the problem where one of my topics dumps more
data than my Spark job can keep up with. We limit the input rate with
maxRatePerPartition. Eventually, when the data is aged off, I get the
OffsetOutOfRange exception.
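For reference, that rate cap is an ordinary Spark configuration knob; a minimal sketch, assuming a standard SparkConf setup (the numbers are made up):

```scala
val conf = new SparkConf()
  .setAppName("kafka-direct")
  // cap: max messages per second read from each Kafka partition
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
  // let the backpressure controller adapt the rate below that cap (Spark 1.5+)
  .set("spark.streaming.backpressure.enabled", "true")
```

The cap only slows consumption; if the topic's retention expires before the job catches up, the requested offsets no longer exist, hence the exception.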
> …ranges. See the definition of count() in recent versions of
> KafkaRDD for an example.
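The count() Cody points to works because the direct stream records exactly which offset range each partition covers, so a batch's size is arithmetic on offsets rather than a scan of the data. A self-contained stand-in sketch (this OffsetRange is a simplification of org.apache.spark.streaming.kafka.OffsetRange, which carries more fields):

```scala
// Simplified stand-in for Spark's OffsetRange: one Kafka partition's slice.
case class OffsetRange(topic: String, partition: Int,
                       fromOffset: Long, untilOffset: Long) {
  def count: Long = untilOffset - fromOffset // messages in this slice
}

// Batch size = sum over partitions; this is what KafkaRDD.count() can
// return without reading any messages.
def batchCount(ranges: Seq[OffsetRange]): Long = ranges.map(_.count).sum

val ranges = Seq(
  OffsetRange("topicA", 0, 100L, 150L), // 50 messages
  OffsetRange("topicA", 1, 200L, 230L)  // 30 messages
)
println(batchCount(ranges)) // prints 80
```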
>
> On Wed, Dec 9, 2015 at 9:39 AM, Dan Dutrow wrote:
>
>> Is there documentation for how to update the metrics (#messages per
>> batch) in the Spark Streaming tab when using the Direct
Is there documentation for how to update the metrics (#messages per batch)
in the Spark Streaming tab when using the Direct API? Does the Streaming
tab get its information from Zookeeper or something else internally?
--
Dan ✆
…a particular partition, will that partition be ignored?
On Wed, Dec 2, 2015 at 3:17 PM Cody Koeninger wrote:
> Use the direct stream. You can put multiple topics in a single stream,
> and differentiate them on a per-partition basis using the offset range.
>
> On Wed, Dec 2,
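A sketch of what Cody describes, against the Spark 1.x direct API; `ssc` and `kafkaParams` are assumed to exist and the topic names are placeholders (an outline requiring a Spark cluster, not a drop-in implementation):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka._

// One direct stream covering several topics. Each RDD partition corresponds
// 1:1 to a Kafka (topic, partition), and offsetRanges is indexed by RDD
// partition, so the topic can be recovered per partition.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("topicA", "topicB"))

stream.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val tagged = rdd.mapPartitionsWithIndex { (i, iter) =>
    val topic = ranges(i).topic // partition index lines up with offsetRanges
    iter.map { case (_, value) => (topic, value) }
  }
  // ...process `tagged`, now keyed by topic name
}
```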
After digging into the Kafka code some more (specifically
kafka.consumer.KafkaStream, kafka.consumer.ConsumerIterator and
kafka.message.MessageAndMetadata), it appears that the Left value of the
tuple is not the topic name but rather a key that Kafka puts on each
message. See http://kafka.apache.or
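The createDirectStream overload that takes a messageHandler is one way around this: the handler sees the full MessageAndMetadata and can emit the topic itself instead of the key. The handler logic in isolation, using a hypothetical stand-in for kafka.message.MessageAndMetadata (only the fields the handler touches are modeled):

```scala
// Hypothetical stand-in for kafka.message.MessageAndMetadata[String, String].
case class MessageAndMetadata(topic: String, partition: Int,
                              offset: Long, key: String, message: String)

// The messageHandler you would pass to createDirectStream: replace the
// (often null) key with the topic name, so downstream code sees
// (topic name, payload) instead of (null, payload).
val handler: MessageAndMetadata => (String, String) =
  mmd => (mmd.topic, mmd.message)

println(handler(MessageAndMetadata("topicA", 0, 42L, null, "payload")))
// prints (topicA,payload)
```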
> Use the direct stream. You can put multiple topics in a single stream,
> and differentiate them on a per-partition basis using the offset range.
>
> On Wed, Dec 2, 2015 at 2:13 PM, dutrow wrote:
>
>> I found the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-2388
>
I found the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-2388
It was marked as invalid.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-streaming-from-multiple-topics-tp8678p25550.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
My need is similar; I have 10+ topics and don't want to dedicate 10 cores to
processing all of them. Like yourself and others, the (String, String) pairs
that come out of the DStream have (null, StringData...) values instead of
(topic name, StringData...).
Did anyone ever find a way around this issue?
> …for determining whether to start the
> stream at the beginning or the end of the log... if someone's provided
> explicit offsets, it's pretty clear where to start the stream.
>
> On Tue, Sep 8, 2015 at 1:19 PM, Dan Dutrow wrote:
>
>> Yes, but one implementation checks…
On Tue, Sep 8, 2015 at 11:44 AM, Dan Dutrow wrote:
>
>> The two methods of createDirectStream appear to have different
>> implementations, the second checks the offset.reset flags and does some
>> error handling while the first does not. Besides the use of a
>> messageHandler
The two methods of createDirectStream appear to have different
implementations, the second checks the offset.reset flags and does some
error handling while the first does not. Besides the use of a
messageHandler, are they intended to be used in different situations?
def createDirectStream[K: Clas…
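For reference, the two overloads under discussion look roughly like this in the Spark 1.x spark-streaming-kafka module (abridged; consult the actual source for the authoritative signatures):

```scala
// Simple overload: starting position comes from auto.offset.reset
// (smallest/largest), and the stream yields (key, value) pairs.
def createDirectStream[K: ClassTag, V: ClassTag,
    KD <: Decoder[K]: ClassTag, VD <: Decoder[V]: ClassTag](
  ssc: StreamingContext,
  kafkaParams: Map[String, String],
  topics: Set[String]): InputDStream[(K, V)]

// Explicit-offsets overload: you supply fromOffsets yourself, plus a
// messageHandler that sees the full MessageAndMetadata (topic, partition,
// offset, key, message) and can emit any type R.
def createDirectStream[K: ClassTag, V: ClassTag,
    KD <: Decoder[K]: ClassTag, VD <: Decoder[V]: ClassTag, R: ClassTag](
  ssc: StreamingContext,
  kafkaParams: Map[String, String],
  fromOffsets: Map[TopicAndPartition, Long],
  messageHandler: MessageAndMetadata[K, V] => R): InputDStream[R]
```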
> …save offsets to ZK, you can. The right way to make that easier is
> to expose the (currently private) methods that already exist in
> KafkaCluster.scala for committing offsets through Kafka's api. I don't
> think adding another "do the wrong thing" option is beneficial.
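If those KafkaCluster methods were exposed, committing offsets after each batch might look something like this sketch; setConsumerOffsets is private in the Spark version under discussion, and `stream` and `kafkaParams` are assumed to already exist:

```scala
import kafka.common.TopicAndPartition
import org.apache.spark.streaming.kafka._

// Sketch only: relies on a private API being made public.
val kc = new KafkaCluster(kafkaParams)

stream.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ...process the batch first, then record progress...
  val offsets = ranges.map { r =>
    TopicAndPartition(r.topic, r.partition) -> r.untilOffset
  }.toMap
  kc.setConsumerOffsets("my-group", offsets) // hypothetical: private in 1.x
}
```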
In summary, it appears that the use of the DirectAPI was intended
specifically to enable exactly-once semantics. This can be achieved for
idempotent transformations and with transactional processing using the
database to guarantee an "onto" mapping of results based on inputs. For the
latter, you need…
For those who find this post and may be interested, the most thorough
documentation on the subject may be found here:
https://github.com/koeninger/kafka-exactly-once
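The transactional half of that summary can be illustrated with a toy in-memory store: outputs and the batch's ending offset are committed together, so a replayed batch is detected and skipped. This is a hypothetical stand-in for a real database transaction, not Spark code:

```scala
import scala.collection.mutable

// Toy "database": results and the committed offset live together, and
// writeBatch only applies a batch whose starting offset matches what was
// last committed. In a real DB the two writes would be one transaction.
object Store {
  var committedOffset: Long = 0L         // last untilOffset written
  val results = mutable.Buffer[String]() // the batch outputs

  def writeBatch(fromOffset: Long, untilOffset: Long, out: Seq[String]): Boolean = {
    if (fromOffset != committedOffset) return false // replay or gap: skip
    results ++= out
    committedOffset = untilOffset
    true
  }
}

Store.writeBatch(0L, 10L, Seq("a", "b")) // first delivery: applied
Store.writeBatch(0L, 10L, Seq("a", "b")) // replay after a failure: skipped
println(Store.results.size) // prints 2, not 4
```

Because the offset check and the result write succeed or fail together, re-running a failed batch cannot double-count its output, which is the exactly-once guarantee the thread is after.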
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Maintaining-Kafka-Direct-API-Offsets-tp24246
How do I get beyond the "This post has NOT been accepted by the mailing list
yet" message? This message was posted through the nabble interface; one
would think that would be enough to get the message accepted.