Thanks, Cody. It sounds like Spark Streaming has enough state information to know
how many batches have been processed, and if not all of them have been, the RDD is
'unfinished'. I wonder whether it would know if the last micro-batch has
been fully processed successfully. Hypothetically, the driver program could

You'll resume and re-process the RDD that didn't finish
On Fri, Aug 14, 2015 at 1:31 PM, Dmitry Goldenberg wrote:
Our additional question on checkpointing is basically the logistics of it --
At which point does the data get written into checkpointing? Is it written
as soon as the driver program retrieves an RDD from Kafka (or another
source)? Or, is it written after that RDD has been processed and we're
bas
It looks like there's an issue with the 'Parameters' POJO I'm using within
my driver program. For some reason it needs to be serializable, which is
odd.

java.io.NotSerializableException: com.kona.consumer.kafka.spark.Parameters

Giving it another whirl, though having to make it serializable seems odd.
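For what it's worth, the exception reflects plain Java serialization rules: Spark serializes the closures it ships to executors along with anything they capture, and any non-transient field reachable from a Serializable object must itself be serializable. A minimal, self-contained sketch of the failure and one fix (class names here are illustrative stand-ins, not the actual project code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the Parameters POJO; it does NOT implement Serializable.
class Parameters {
    String topic = "events";
}

class Driver implements Serializable {
    // Non-transient field of a non-serializable type: serializing Driver
    // throws java.io.NotSerializableException for Parameters.
    Parameters params = new Parameters();
}

class DriverFixed implements Serializable {
    // Marking the field transient excludes it from serialization,
    // so the enclosing class serializes cleanly.
    transient Parameters params = new Parameters();
}

public class SerializationDemo {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        try {
            serialize(new Driver());
        } catch (NotSerializableException e) {
            System.out.println("Driver failed to serialize: " + e.getMessage());
        }
        System.out.println("DriverFixed serialized, "
            + serialize(new DriverFixed()).length + " bytes");
    }
}
```

The usual alternatives are to make the POJO implement Serializable, mark the field transient, or hold it in a local variable so the closure never captures it.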
I'll check the log info message...

Meanwhile, the code is basically:

public class KafkaSparkStreamingDriver implements Serializable {
  ...
  SparkConf sparkConf = createSparkConf(appName, kahunaEnv);
  JavaStreamingContext jssc = params.isCheckpointed() ?
      createCheckpointedContext(sparkCon
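In case it helps, the canonical shape for the checkpointed branch is JavaStreamingContext.getOrCreate, which recreates the context from checkpoint data if any exists and otherwise builds a fresh one. A sketch against the Spark 1.x Java API; the checkpoint directory, app name, and batch interval here are placeholders, not the actual project values:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

public class CheckpointedDriverSketch {
    private static final String CHECKPOINT_DIR = "/tmp/spark-checkpoint"; // placeholder

    public static void main(String[] args) throws Exception {
        JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
            @Override
            public JavaStreamingContext create() {
                SparkConf conf = new SparkConf().setAppName("kafka-consumer");
                JavaStreamingContext jssc =
                    new JavaStreamingContext(conf, Durations.seconds(1));
                // All DStream setup must happen here, inside the factory,
                // so the lineage is captured in the checkpoint.
                jssc.checkpoint(CHECKPOINT_DIR);
                return jssc;
            }
        };
        // Restores from CHECKPOINT_DIR if a checkpoint exists,
        // otherwise calls factory.create().
        JavaStreamingContext jssc =
            JavaStreamingContext.getOrCreate(CHECKPOINT_DIR, factory);
        jssc.start();
        jssc.awaitTermination();
    }
}
```

One common pitfall: if the DStream graph is built outside the factory, the restored context won't have it, which can lead to exactly the kind of "directories created but nothing useful in them" behavior discussed downthread.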
If you've set the checkpoint dir, it seems like indeed the intent is
to use a default checkpoint interval in DStream:

private[streaming] def initialize(time: Time) {
  ...
  // Set the checkpoint interval to be slideDuration or 10 seconds, which ever is larger
  if (mustCheckpoint && checkpointDuration == null) {
    checkpointDuration = slideDuration * math.ceil(Seconds(10) / slideDuration).toInt
    ...
  }
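That comment boils down to: the default checkpoint interval is the smallest multiple of slideDuration that is at least 10 seconds. A tiny standalone model of that rule (my own arithmetic sketch, not Spark's actual code):

```java
public class CheckpointInterval {
    // Smallest multiple of slideMs that is >= 10 seconds,
    // mirroring the default-interval rule quoted above.
    static long defaultCheckpointMs(long slideMs) {
        long targetMs = 10_000L; // 10 seconds
        long multiples = (long) Math.ceil((double) targetMs / slideMs);
        return slideMs * Math.max(multiples, 1);
    }

    public static void main(String[] args) {
        System.out.println(defaultCheckpointMs(2_000));  // 10000: 5 batches of 2 s
        System.out.println(defaultCheckpointMs(15_000)); // 15000: slide already > 10 s
    }
}
```

So with a 2-second batch interval you'd see checkpoints every 10 seconds, and with a 15-second batch interval every 15 seconds.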
Show us the relevant code.
On Fri, Jul 31, 2015 at 12:16 PM, Dmitry Goldenberg <
dgoldenberg...@gmail.com> wrote:
I've instrumented checkpointing per the programming guide, and I can tell
that Spark Streaming is creating the checkpoint directories, but I'm not
seeing any content being created in those directories, nor am I seeing the
effects I'd expect from checkpointing. I'd expect any data that comes into
Kafka