Thanks, Cody. It sounds like Spark Streaming has enough state information to know
how many batches have been processed, and if not all of them have been, the RDD is
'unfinished'. I wonder whether it would know if the last micro-batch has
been fully processed successfully. Hypothetically, the driver program could

You'll resume and re-process the RDD that didn't finish
On Fri, Aug 14, 2015 at 1:31 PM, Dmitry Goldenberg wrote:
Our additional question on checkpointing is basically the logistics of it --
At which point does the data get written into checkpointing? Is it written
as soon as the driver program retrieves an RDD from Kafka (or another
source)? Or, is it written after that RDD has been processed and we're
bas
It looks like there's an issue with the 'Parameters' POJO I'm using within
my driver program. For some reason it needs to be serializable, which is
odd.

java.io.NotSerializableException: com.kona.consumer.kafka.spark.Parameters

Giving it another whirl, though having to make it serializable seems odd.
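For what it's worth, the exception reflects plain Java serialization rules: Spark serializes the closures it ships to executors along with anything they capture, and any non-transient field reachable from a Serializable object must itself be serializable. A minimal, self-contained sketch of the failure and one fix (class names here are illustrative stand-ins, not the actual project code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the Parameters POJO; it does NOT implement Serializable.
class Parameters {
    String topic = "events";
}

class Driver implements Serializable {
    // Non-transient field of a non-serializable type: serializing Driver
    // throws java.io.NotSerializableException for Parameters.
    Parameters params = new Parameters();
}

class DriverFixed implements Serializable {
    // Marking the field transient excludes it from serialization,
    // so the enclosing class serializes cleanly.
    transient Parameters params = new Parameters();
}

public class SerializationDemo {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        try {
            serialize(new Driver());
        } catch (NotSerializableException e) {
            System.out.println("Driver failed to serialize: " + e.getMessage());
        }
        System.out.println("DriverFixed serialized, "
            + serialize(new DriverFixed()).length + " bytes");
    }
}
```

The usual alternatives are to make the POJO implement Serializable, mark the field transient, or hold it in a local variable so the closure never captures it.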
I'll check the log info message...

Meanwhile, the code is basically:

public class KafkaSparkStreamingDriver implements Serializable {
  ...
  SparkConf sparkConf = createSparkConf(appName, kahunaEnv);
  JavaStreamingContext jssc = params.isCheckpointed() ?
      createCheckpointedContext(sparkCon
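In case it helps, the canonical shape for the checkpointed branch is JavaStreamingContext.getOrCreate, which recreates the context from checkpoint data if any exists and otherwise builds a fresh one. A sketch against the Spark 1.x Java API; the checkpoint directory, app name, and batch interval here are placeholders, not the actual project values:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

public class CheckpointedDriverSketch {
    private static final String CHECKPOINT_DIR = "/tmp/spark-checkpoint"; // placeholder

    public static void main(String[] args) throws Exception {
        JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
            @Override
            public JavaStreamingContext create() {
                SparkConf conf = new SparkConf().setAppName("kafka-consumer");
                JavaStreamingContext jssc =
                    new JavaStreamingContext(conf, Durations.seconds(1));
                // All DStream setup must happen here, inside the factory,
                // so the lineage is captured in the checkpoint.
                jssc.checkpoint(CHECKPOINT_DIR);
                return jssc;
            }
        };
        // Restores from CHECKPOINT_DIR if a checkpoint exists,
        // otherwise calls factory.create().
        JavaStreamingContext jssc =
            JavaStreamingContext.getOrCreate(CHECKPOINT_DIR, factory);
        jssc.start();
        jssc.awaitTermination();
    }
}
```

One common pitfall: if the DStream graph is built outside the factory, the restored context won't have it, which can lead to exactly the kind of "directories created but nothing useful in them" behavior discussed downthread.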
If you've set the checkpoint dir, it seems like indeed the intent is
to use a default checkpoint interval in DStream:

private[streaming] def initialize(time: Time) {
  ...
  // Set the checkpoint interval to be slideDuration or 10 seconds, which ever is larger
  if (mustCheckpoint && checkpointDuration == null) {
    checkpointDuration = slideDuration * math.ceil(Seconds(10) / slideDuration).toInt
    ...
  }
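That comment boils down to: the default checkpoint interval is the smallest multiple of slideDuration that is at least 10 seconds. A tiny standalone model of that rule (my own arithmetic sketch, not Spark's actual code):

```java
public class CheckpointInterval {
    // Smallest multiple of slideMs that is >= 10 seconds,
    // mirroring the default-interval rule quoted above.
    static long defaultCheckpointMs(long slideMs) {
        long targetMs = 10_000L; // 10 seconds
        long multiples = (long) Math.ceil((double) targetMs / slideMs);
        return slideMs * Math.max(multiples, 1);
    }

    public static void main(String[] args) {
        System.out.println(defaultCheckpointMs(2_000));  // 10000: 5 batches of 2 s
        System.out.println(defaultCheckpointMs(15_000)); // 15000: slide already > 10 s
    }
}
```

So with a 2-second batch interval you'd see checkpoints every 10 seconds, and with a 15-second batch interval every 15 seconds.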
Show us the relevant code.
On Fri, Jul 31, 2015 at 12:16 PM, Dmitry Goldenberg <
dgoldenberg...@gmail.com> wrote:
I've instrumented checkpointing per the programming guide, and I can tell
that Spark Streaming is creating the checkpoint directories, but I'm not
seeing any content being created in those directories, nor am I seeing the
effects I'd expect from checkpointing. I'd expect any data that comes into
Kafka