sekikn opened a new pull request #24651: [SPARK-27420][DSTREAMS][Kinesis] 
KinesisInputDStream should expose a way to configure CloudWatch metrics
URL: https://github.com/apache/spark/pull/24651
 
 
   ## What changes were proposed in this pull request?
   
   KinesisInputDStream currently does not provide a way to disable
   CloudWatch metrics push. Its default level is "DETAILED", which pushes
   tens of metrics every 10 seconds. When dealing with multiple streaming
   jobs, this adds up quickly, leading to thousands of dollars in cost.
   To address this problem, this PR adds interfaces for accessing
   KinesisClientLibConfiguration's `withMetrics` and
   `withMetricsEnabledDimensions` methods to KinesisInputDStream
   so that users can configure KCL's metrics levels and dimensions.
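   
   As a rough sketch of how a user might turn metrics off with the new
   options: the builder method name `metricsLevel` is an assumption (this
   description does not spell it out), and the stream name, endpoint,
   region, and application name are placeholders.
   
   ```scala
   import com.amazonaws.services.kinesis.metrics.interfaces.MetricsLevel
   import org.apache.spark.streaming.StreamingContext
   import org.apache.spark.streaming.dstream.DStream
   import org.apache.spark.streaming.kinesis.KinesisInputDStream

   object QuietKinesisExample {
     // Hypothetical helper; every string literal below is a placeholder.
     def quietKinesisStream(ssc: StreamingContext): DStream[Array[Byte]] =
       KinesisInputDStream.builder
         .streamingContext(ssc)
         .streamName("my-stream")
         .endpointUrl("https://kinesis.us-east-1.amazonaws.com")
         .regionName("us-east-1")
         .checkpointAppName("my-kcl-app")
         // Assumed name of the new option: ask KCL not to publish
         // any CloudWatch metrics for this stream.
         .metricsLevel(MetricsLevel.NONE)
         .build()
   }
   ```
   
   A less drastic setup would keep a coarser level such as
   `MetricsLevel.SUMMARY` and restrict the dimensions through the second
   new option instead of disabling metrics entirely.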
   
   ## How was this patch tested?
   
   By running updated unit tests in KinesisInputDStreamBuilderSuite.
   In addition, I ran a Streaming job with MetricsLevel.NONE and confirmed:
   
   * there's no data point for the "Operation", "Operation, ShardId" and
     "WorkerIdentifier" dimensions on the AWS management console
   * there's no DEBUG level message from Amazon KCL, such as
     "Successfully published xx datums."
   
