queueStream does not support driver checkpoint recovery, since the RDDs in
the queue are arbitrarily generated by the user and it's hard for Spark
Streaming to keep track of the data in the RDDs (that's necessary for
recovering from a checkpoint). Anyway, queueStream is meant for testing and
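(For context, the usual way to make a PySpark Streaming app recoverable is the StreamingContext.getOrCreate pattern from the programming guide: the whole DStream graph is built inside a factory function, and on restart the context is restored from the checkpoint instead. A minimal sketch; the checkpoint path, app name, and socket source here are placeholders, not from this thread, and it needs a live Spark installation to run:)

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

CHECKPOINT_DIR = "/tmp/spark-checkpoint"  # placeholder path

def create_context():
    # Build the context and define the full DStream graph here;
    # this function is only invoked when no checkpoint exists yet.
    sc = SparkContext(appName="recoverable-app")
    ssc = StreamingContext(sc, 5)  # 5-second batches
    ssc.checkpoint(CHECKPOINT_DIR)
    lines = ssc.socketTextStream("localhost", 9999)  # example source
    lines.count().pprint()
    return ssc

# On restart, this restores the context (and DStream graph) from the
# checkpoint; queueStream-based graphs cannot be restored this way.
ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
ssc.start()
ssc.awaitTermination()
```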
Thanks,
I've updated my code to use updateStateByKey but am still getting these
errors when I resume from a checkpoint.
One thought of mine was that I used sc.parallelize to generate the RDDs for
the queue, but perhaps on resume, it doesn't recreate the context needed?
--
Shaanan Cohney
PhD
Where does task_batches come from?
On 22 Jun 2015 4:48 pm, Shaanan Cohney shaan...@gmail.com wrote:
Thanks,
I've updated my code to use updateStateByKey but am still getting these
errors when I resume from a checkpoint.
One thought of mine was that I used sc.parallelize to generate the RDDs
It's a generated set of shell commands to run (a highly optimized numerical
computation written in C), which is created from a set of user-provided
parameters.
The snippet above is:
task_outfiles_to_cmds = OrderedDict(run_sieving.leftover_tasks)
Counts is a list (counts = []) in the driver, used to collect the results.
It seems like it's also not the best way to be doing things, but I'm new to
Spark and editing someone else's code, so I'm still learning.
Thanks!
def update_state(out_files, counts, curr_rdd):
    try:
        for c in curr_rdd.collect():  # collect results on the driver
            counts.append(c)
    except Exception as e:
        print(e)
I'm receiving the SPARK-5063 error (RDD transformations and actions can
only be invoked by the driver, not inside of other transformations)
whenever I try to restore from a checkpoint in Spark Streaming in my app.
I'm using Python 3 and my RDDs are inside a queueStream DStream.
This is the
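(For reference, SPARK-5063 is raised when RDD methods are invoked from code that itself runs inside a transformation on a worker, rather than on the driver. A minimal illustration of the mistake and the fix; the app name and sample data are made up, and running it requires a Spark installation:)

```python
from pyspark import SparkContext

sc = SparkContext(appName="spark-5063-demo")
rdd1 = sc.parallelize([1, 2, 3])
rdd2 = sc.parallelize([10, 20])

# WRONG: rdd2 is captured inside rdd1's map closure, so rdd2.count()
# would execute on a worker -> raises the SPARK-5063 error.
# bad = rdd1.map(lambda x: x * rdd2.count()).collect()

# RIGHT: materialize the value on the driver first, then capture the
# plain Python int in the closure.
n = rdd2.count()
good = rdd1.map(lambda x: x * n).collect()
print(good)  # [2, 4, 6]
```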
What does counts refer to?
Could you also paste the code of your update_state function?
On 22 Jun 2015 12:48 pm, Shaanan Cohney shaan...@gmail.com wrote:
I'm receiving the SPARK-5063 error (RDD transformations and actions can
only be invoked by the driver, not inside of other transformations)
I would suggest you have a look at the updateStateByKey transformation in
the Spark Streaming programming guide, which should fit your needs better
than your update_state function.
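(To make the suggestion concrete: updateStateByKey calls a user-supplied function once per key per batch, passing that batch's new values and the key's previous state, and keeps the returned state for the next batch. A plain-Python simulation of those semantics, no Spark needed; all names here are illustrative:)

```python
from collections import defaultdict

def update_func(new_values, last_state):
    # Mirrors the (new_values, last_state) contract of updateStateByKey:
    # last_state is None the first time a key is seen.
    return sum(new_values) + (last_state or 0)

def run_batches(batches):
    """Apply update_func per key across a sequence of (key, value) batches."""
    state = {}
    for batch in batches:
        by_key = defaultdict(list)
        for key, value in batch:
            by_key[key].append(value)
        for key, values in by_key.items():
            state[key] = update_func(values, state.get(key))
    return state

print(run_batches([[("a", 1), ("a", 2)], [("a", 3), ("b", 5)]]))
# -> {'a': 6, 'b': 5}
```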
On 22 Jun 2015 1:03 pm, Shaanan Cohney shaan...@gmail.com wrote:
Counts is a list (counts = []) in the driver, used