GitHub user cfregly commented on a diff in the pull request:
https://github.com/apache/spark/pull/6147#discussion_r30775176
--- Diff:
extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala
---
@@ -16,29 +16,75 @@
*/
package org.apache.spark.streaming.kinesis
-import org.apache.spark.annotation.Experimental
+import com.amazonaws.regions.RegionUtils
+import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+
import org.apache.spark.storage.StorageLevel
-import org.apache.spark.streaming.Duration
-import org.apache.spark.streaming.StreamingContext
-import org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-import org.apache.spark.streaming.api.java.JavaStreamingContext
+import org.apache.spark.streaming.api.java.{JavaReceiverInputDStream, JavaStreamingContext}
import org.apache.spark.streaming.dstream.ReceiverInputDStream
-
-import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+import org.apache.spark.streaming.{Duration, StreamingContext}
-/**
- * Helper class to create Amazon Kinesis Input Stream
- * :: Experimental ::
- */
-@Experimental
object KinesisUtils {
/**
- * Create an InputDStream that pulls messages from a Kinesis stream.
- * :: Experimental ::
- * @param ssc StreamingContext object
+ * Create an input stream that pulls messages from a Kinesis stream.
+ * This uses the Kinesis Client Library (KCL) to pull messages from Kinesis.
+ *
+ * Note: The AWS credentials will be discovered using the DefaultAWSCredentialsProviderChain
+ * on the workers. See AWS documentation to understand how DefaultAWSCredentialsProviderChain
+ * gets the AWS credentials.
+ *
+ * @param ssc StreamingContext object
+ * @param kinesisAppName Kinesis application name used by the Kinesis Client Library
+ * (KCL) to update DynamoDB
+ * @param streamName Kinesis stream name
+ * @param endpointUrl Url of Kinesis service (e.g., https://kinesis.us-east-1.amazonaws.com)
+ * @param regionName Name of region used by the Kinesis Client Library (KCL) to update
+ * DynamoDB (lease coordination and checkpointing) and CloudWatch (metrics)
+ * @param initialPositionInStream In the absence of Kinesis checkpoint info, this is the
+ * worker's initial starting position in the stream.
+ * The values are either the beginning of the stream
+ * per Kinesis' limit of 24 hours
+ * (InitialPositionInStream.TRIM_HORIZON) or
+ * the tip of the stream (InitialPositionInStream.LATEST).
+ * @param checkpointInterval Checkpoint interval for Kinesis checkpointing.
--- End diff --
Not sure why I keep thinking this checkpointInterval should go above
initialPositionInStream. Not a big deal, but I remember the order being
different for some reason.
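For reference, the start-position rule described in the `initialPositionInStream` doc comment (resume from checkpoint info if it exists, otherwise start at TRIM_HORIZON or LATEST) can be modeled in a few lines of plain Scala. This is a hypothetical sketch, not the KCL or Spark API: `StartPositionSketch`, `Record`, `firstRecords`, and the `TrimHorizon`/`Latest` case objects standing in for `InitialPositionInStream` are illustrative names, and the fixed 24-hour retention window is an assumption taken from the doc comment.

```scala
// Hypothetical model of the documented start-position rule; not the KCL or Spark API.
object StartPositionSketch {
  // Stand-ins for InitialPositionInStream.TRIM_HORIZON and InitialPositionInStream.LATEST.
  sealed trait InitialPosition
  case object TrimHorizon extends InitialPosition
  case object Latest extends InitialPosition

  // A simplified record: a sequence number plus the time it arrived in the stream.
  final case class Record(sequenceNumber: Long, arrivalMillis: Long)

  // Assumed 24-hour retention, matching the limit mentioned in the doc comment.
  val RetentionMillis: Long = 24L * 60 * 60 * 1000

  /** Records a newly started worker would read, given what is currently in the stream. */
  def firstRecords(
      stream: Seq[Record],
      checkpoint: Option[Long],
      initial: InitialPosition,
      nowMillis: Long): Seq[Record] = {
    // Only records still inside the retention window can be read at all.
    val retained = stream.filter(r => nowMillis - r.arrivalMillis < RetentionMillis)
    checkpoint match {
      // Checkpoint info exists: resume after the last checkpointed sequence number.
      case Some(lastSeq) => retained.filter(_.sequenceNumber > lastSeq)
      // No checkpoint info: fall back to the configured initial position.
      case None =>
        initial match {
          case TrimHorizon => retained  // oldest retained record onward
          case Latest      => Seq.empty // only records that arrive after startup
        }
    }
  }

  def main(args: Array[String]): Unit = {
    val hour = 60L * 60 * 1000
    val now = 30 * hour
    val stream = Seq(
      Record(1, now - 25 * hour), // older than the retention window
      Record(2, now - 2 * hour),
      Record(3, now - 1 * hour))
    println(firstRecords(stream, None, TrimHorizon, now).map(_.sequenceNumber))
    println(firstRecords(stream, None, Latest, now).map(_.sequenceNumber))
    println(firstRecords(stream, Some(2L), Latest, now).map(_.sequenceNumber))
  }
}
```

The point of the model is that `initialPositionInStream` only matters on a cold start; once checkpoint info exists, the checkpoint wins regardless of which initial position was configured.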