Just an update: Kinesis checkpointing works well with both orderly and
kill -9 driver shutdowns when there are fewer than 4 shards. We use 20+.
I created a case with Amazon support, since it is the AWS Kinesis getRecords
API that is hanging.
Regards,
Heji
On Thu, Nov 12, 2015 at 10:37 AM
Are you doing actual transformations / aggregations in Spark Streaming? Or
just using it to bulk-write to S3?
If the latter, then you could just use your AWS Lambda function to read
directly from the Kinesis stream. If the former, then perhaps look into the
WAL option that Aniket mentioned.
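For the Lambda-only route, a minimal handler sketch (the function name and the
Redshift-loading step are hypothetical; the event shape is the standard Kinesis
event-source-mapping payload, with record data base64-encoded):

```python
import base64
import json


def handler(event, context):
    """Hypothetical Lambda entry point wired to a Kinesis event source.

    Each record's payload arrives base64-encoded under
    event["Records"][i]["kinesis"]["data"].
    """
    rows = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        rows.append(json.loads(payload))
    # A real function would COPY/INSERT `rows` into Redshift here;
    # this sketch just returns the decoded records.
    return rows
```

This skips Spark entirely, so it only fits if no transformations or
aggregations are needed before the load.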
FYI re WAL on S3
http://search-hadoop.com/m/q3RTtFMpd41A7TnH/WAL+S3=WAL+on+S3
On 18 September 2015 at 13:32, Alan Dipert wrote:
Hello,
Thanks all for considering our problem. We are doing transformations in
Spark Streaming. We have also since learned that WAL to S3 on 1.4 is "not
reliable" [1].
We are just going to wait for EMR to support 1.5 and hopefully this won't
be a problem anymore [2].
Alan
1.
Hello,
We are using Spark Streaming 1.4.1 in AWS EMR to process records from
Kinesis. Our Spark program saves RDDs to S3, after which the records are
picked up by a Lambda function that loads them into Redshift. It is important
to us that no data is lost during processing.
We have set our Kinesis
You can perhaps set up a WAL that logs to S3? A new cluster should pick up
the records that weren't processed due to the previous cluster's termination.
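For reference, the receiver write-ahead log is turned on with a single Spark
configuration flag; the log files are written under the streaming checkpoint
directory set via ssc.checkpoint(...), which would need to be an S3 path in
this setup (a minimal sketch; the property below is the standard Spark
Streaming setting, everything else depends on your deployment):

```
# spark-defaults.conf, or pass with --conf on spark-submit
spark.streaming.receiver.writeAheadLog.enable   true
```

Note that with the WAL enabled, received data is written to the checkpoint
directory before being acknowledged, so throughput depends on how fast that
storage accepts writes.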
Thanks,
Aniket
On Thu, Sep 17, 2015, 9:19 PM Alan Dipert wrote: