Hello,

Thanks all for considering our problem.  We are doing transformations in
Spark Streaming.  We have also since learned that WAL to S3 on 1.4 is "not
reliable" [1]

We are just going to wait for EMR to support 1.5 and hopefully this won't
be a problem anymore [2].

Alan

1.
https://mail-archives.apache.org/mod_mbox/spark-user/201508.mbox/%3CCA+AHuKkH9r0BwQMgQjDG+j=qdcqzpow1rw1u4d0nrcgmq5x...@mail.gmail.com%3E
2. https://issues.apache.org/jira/browse/SPARK-9215

On Fri, Sep 18, 2015 at 4:23 AM, Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> Are you doing actual transformations / aggregation in Spark Streaming? Or
> just using it to bulk write to S3?
>
> If the latter, then you could just use your AWS Lambda function to read
> directly from the Kinesis stream. If the former, then perhaps either look
> into the WAL option that Aniket mentioned, or perhaps you could write the
> processed RDD back to Kinesis, and have the Lambda function read the
> Kinesis stream and write to Redshift?
>
> On Thu, Sep 17, 2015 at 5:48 PM, Alan Dipert <a...@dipert.org> wrote:
>
>> Hello,
>> We are using Spark Streaming 1.4.1 in AWS EMR to process records from
>> Kinesis.  Our Spark program saves RDDs to S3, after which the records are
>> picked up by a Lambda function that loads them into Redshift.  That no data
>> is lost during processing is important to us.
>>
>> We have set our Kinesis checkpoint interval to 15 minutes, which is also
>> our window size.
>>
>> Unfortunately, checkpointing happens after receiving data from Kinesis,
>> not after we have successfully written to S3.  If batches back up in Spark,
>> and the cluster is terminated, whatever data was in-memory will be lost
>> because it was checkpointed but not actually saved to S3.
>>
>> We are considering forking and modifying the kinesis-asl library with
>> changes that would allow us to perform the checkpoint manually and at the
>> right time.  We'd rather not do this.
>>
>> Are we overlooking an easier way to deal with this problem?  Thank you in
>> advance for your insight!
>>
>> Alan
>>
>
>

Reply via email to