[jira] [Assigned] (SPARK-24970) Spark Kinesis streaming application fails to recover from streaming checkpoint due to ProvisionedThroughputExceededException

2018-07-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24970:


Assignee: (was: Apache Spark)

> Spark Kinesis streaming application fails to recover from streaming 
> checkpoint due to ProvisionedThroughputExceededException
> 
>
> Key: SPARK-24970
> URL: https://issues.apache.org/jira/browse/SPARK-24970
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.2.0
>Reporter: bruce_zhao
>Priority: Major
>  Labels: kinesis
>
>  
> We're using Spark streaming to consume Kinesis data, and found that it reads 
> more data from Kinesis and is easy to touch 
> ProvisionedThroughputExceededException *when it recovers from streaming 
> checkpoint*. 
>  
> Normally, it's a WARN in spark log. But when we have multiple streaming 
> applications (i.e., 5 applications) to consume the same Kinesis stream, the 
> situation becomes serious. *The application will fail to recover due to the 
> following exception in driver.*  And one application failure will also affect 
> the other running applications. 
>   
> {panel:title=Exception}
> org.apache.spark.SparkException: Job aborted due to stage failure: 
> {color:#ff}*Task 5 in stage 7.0 failed 4 times, most recent 
> failure*:{color} Lost task 5.3 in stage 7.0 (TID 128, 
> ip-172-31-14-36.ap-northeast-1.compute.internal, executor 1): 
> org.apache.spark.SparkException: Gave up after 3 retries while getting 
> records using shard iterator, last exception: at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator$$anonfun$retryOrTimeout$2.apply(KinesisBackedBlockRDD.scala:288)
>  at scala.Option.getOrElse(Option.scala:121) at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator.retryOrTimeout(KinesisBackedBlockRDD.scala:282)
>  at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator.getRecordsAndNextKinesisIterator(KinesisBackedBlockRDD.scala:223)
>  at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator.getRecords(KinesisBackedBlockRDD.scala:207)
>  at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator.getNext(KinesisBackedBlockRDD.scala:162)
>  at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator.getNext(KinesisBackedBlockRDD.scala:133)
>  at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) at 
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at 
> scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438) at 
> scala.collection.Iterator$class.foreach(Iterator.scala:893) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at 
> org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:509)
>  at 
> org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:333)
>  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954) at 
> org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
> Caused by: 
> *{color:#ff}com.amazonaws.services.kinesis.model.ProvisionedThroughputExceededException:
>  Rate exceeded for shard shardId- in stream rellfsstream-an under 
> account 1{color}.* (Service: AmazonKinesis; Status Code: 400; 
> Error Code: ProvisionedThroughputExceededException; Request ID: 
> d3520677-060e-14c4-8014-2886b6b75f03) at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1587)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1257)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1029)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:741)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:715)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:697)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:665)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:647)
>  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:511) at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke(AmazonKinesisClient.java:2219)
>  at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2195)
>  at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.executeGetRecords(AmazonKinesisClient.java:1004)
>  at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.getRecords(AmazonKinesisClient.java:980)
>  at 
> 

[jira] [Assigned] (SPARK-24970) Spark Kinesis streaming application fails to recover from streaming checkpoint due to ProvisionedThroughputExceededException

2018-07-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24970:


Assignee: Apache Spark

> Spark Kinesis streaming application fails to recover from streaming 
> checkpoint due to ProvisionedThroughputExceededException
> 
>
> Key: SPARK-24970
> URL: https://issues.apache.org/jira/browse/SPARK-24970
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.2.0
>Reporter: bruce_zhao
>Assignee: Apache Spark
>Priority: Major
>  Labels: kinesis
>
>  
> We're using Spark streaming to consume Kinesis data, and found that it reads 
> more data from Kinesis and is easy to touch 
> ProvisionedThroughputExceededException *when it recovers from streaming 
> checkpoint*. 
>  
> Normally, it's a WARN in spark log. But when we have multiple streaming 
> applications (i.e., 5 applications) to consume the same Kinesis stream, the 
> situation becomes serious. *The application will fail to recover due to the 
> following exception in driver.*  And one application failure will also affect 
> the other running applications. 
>   
> {panel:title=Exception}
> org.apache.spark.SparkException: Job aborted due to stage failure: 
> {color:#ff}*Task 5 in stage 7.0 failed 4 times, most recent 
> failure*:{color} Lost task 5.3 in stage 7.0 (TID 128, 
> ip-172-31-14-36.ap-northeast-1.compute.internal, executor 1): 
> org.apache.spark.SparkException: Gave up after 3 retries while getting 
> records using shard iterator, last exception: at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator$$anonfun$retryOrTimeout$2.apply(KinesisBackedBlockRDD.scala:288)
>  at scala.Option.getOrElse(Option.scala:121) at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator.retryOrTimeout(KinesisBackedBlockRDD.scala:282)
>  at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator.getRecordsAndNextKinesisIterator(KinesisBackedBlockRDD.scala:223)
>  at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator.getRecords(KinesisBackedBlockRDD.scala:207)
>  at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator.getNext(KinesisBackedBlockRDD.scala:162)
>  at 
> org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator.getNext(KinesisBackedBlockRDD.scala:133)
>  at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) at 
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at 
> scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438) at 
> scala.collection.Iterator$class.foreach(Iterator.scala:893) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at 
> org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:509)
>  at 
> org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:333)
>  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954) at 
> org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
> Caused by: 
> *{color:#ff}com.amazonaws.services.kinesis.model.ProvisionedThroughputExceededException:
>  Rate exceeded for shard shardId- in stream rellfsstream-an under 
> account 1{color}.* (Service: AmazonKinesis; Status Code: 400; 
> Error Code: ProvisionedThroughputExceededException; Request ID: 
> d3520677-060e-14c4-8014-2886b6b75f03) at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1587)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1257)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1029)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:741)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:715)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:697)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:665)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:647)
>  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:511) at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke(AmazonKinesisClient.java:2219)
>  at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2195)
>  at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.executeGetRecords(AmazonKinesisClient.java:1004)
>  at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.getRecords(AmazonKinesisClient.java:980)
>