[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-09-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4473


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-15 Thread bowenli86
Github user bowenli86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r133121728
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -20,14 +20,24 @@
 import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
 
 /**
+ * @deprecated
--- End diff --

@tzulitai  Thanks for the reminder!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-14 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r133119309
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -20,14 +20,24 @@
 import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
 
 /**
+ * @deprecated
--- End diff --

@bowenli86 one thing to mention:
We should always try to have good Javadoc explaining why something is deprecated. 
Sorry, I overlooked this during my review. I'll address it myself this time 
while merging.
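
As a rough illustration of the point above, a deprecation Javadoc that states the reason and the replacement might look like the sketch below. The class name `ProducerConfigConstantsSketch` and the Javadoc wording are assumptions made for this example; they are not the text that was merged.

```java
/**
 * Hypothetical stand-in for ProducerConfigConstants; the wording is illustrative only.
 *
 * @deprecated Producer settings are read directly from the KPL configuration keys,
 *             so Flink-specific producer keys are no longer needed.
 */
@Deprecated
class ProducerConfigConstantsSketch {

    /**
     * Maximum number of items to pack into a PutRecords request.
     *
     * @deprecated Use the KPL key {@code CollectionMaxCount} instead.
     */
    @Deprecated
    public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";

    /**
     * Maximum number of items to pack into an aggregated record.
     *
     * @deprecated Use the KPL key {@code AggregationMaxCount} instead.
     */
    @Deprecated
    public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
}
```

The `@deprecated` Javadoc tag carries the explanation, while the `@Deprecated` annotation is what makes the compiler warn at call sites.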


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-14 Thread bowenli86
Github user bowenli86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132879596
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -57,6 +55,9 @@
    /** Properties to parametrize settings such as AWS service region, access key etc. */
    private final Properties configProps;
 
+   /** Configuration for KinesisProducer. */
+   private final KinesisProducerConfiguration producerConfig;
--- End diff --

Agreed. I'm moving it to `open()`.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-14 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132879159
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -57,6 +55,9 @@
    /** Properties to parametrize settings such as AWS service region, access key etc. */
    private final Properties configProps;
 
+   /** Configuration for KinesisProducer. */
+   private final KinesisProducerConfiguration producerConfig;
--- End diff --

`KinesisProducerConfiguration` isn't serializable. We need to make it 
`transient`.
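
To make the serialization concern concrete, here is a minimal, self-contained sketch using plain Java serialization. `NonSerializableConfig` is a hypothetical stand-in for a configuration type that does not implement `Serializable`; nothing here uses the real KPL or Flink classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientFieldSketch {

    /** Hypothetical stand-in for a config type that does not implement Serializable. */
    static class NonSerializableConfig { }

    /** Keeps the config in a regular field, so serializing this object fails. */
    static class SinkWithPlainField implements Serializable {
        private static final long serialVersionUID = 1L;
        private final NonSerializableConfig config = new NonSerializableConfig();
    }

    /** Marks the field transient, so it is skipped during serialization. */
    static class SinkWithTransientField implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient NonSerializableConfig config = new NonSerializableConfig();
    }

    /** Returns true if Java serialization of the given object succeeds. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain field:     " + canSerialize(new SinkWithPlainField()));
        System.out.println("transient field: " + canSerialize(new SinkWithTransientField()));
    }
}
```

Only the variant with the `transient` field can be serialized; the plain-field variant fails with `NotSerializableException`.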


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-14 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132879261
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -57,6 +55,9 @@
    /** Properties to parametrize settings such as AWS service region, access key etc. */
    private final Properties configProps;
 
+   /** Configuration for KinesisProducer. */
+   private final KinesisProducerConfiguration producerConfig;
--- End diff --

On second thought: do we actually need a field for this?
Or can we just instantiate it in the method (it doesn't seem to be used 
across different methods)?


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-14 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132879203
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -115,13 +116,13 @@ public String getTargetStream(OUT element) {
     * This is a constructor supporting {@see KinesisSerializationSchema}.
     *
     * @param schema Kinesis serialization schema for the data type
-    * @param configProps The properties used to configure AWS credentials and AWS region
+    * @param configProps The properties used to configure KinesisProducer, including AWS credentials and AWS region
     */
    public FlinkKinesisProducer(KinesisSerializationSchema schema, Properties configProps) {
-       this.configProps = checkNotNull(configProps, "configProps can not be null");
-
-       // check the configuration properties for any conflicting settings
-       KinesisConfigUtil.validateProducerConfiguration(this.configProps);
+       checkNotNull(configProps, "configProps can not be null");
+       this.configProps = KinesisConfigUtil.replaceDeprecatedProducerKeys(configProps);
+       // check the configuration properties for any invalid settings
+       this.producerConfig = KinesisConfigUtil.validateProducerConfiguration(configProps);
--- End diff --

For non-serializable fields that need to be `transient`, we should only 
initialize them in `open`.
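
The reasoning can be sketched with a small round trip: a `transient` field is not shipped with the serialized function, so anything assigned in the constructor is gone after deserialization and has to be rebuilt in `open`. The classes `Sink` and `RuntimeConfig` and the `"region"` key below are hypothetical stand-ins, not the connector's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Properties;

class OpenLifecycleSketch {

    /** Hypothetical non-serializable runtime object built from the properties. */
    static class RuntimeConfig {
        final String region;
        RuntimeConfig(String region) { this.region = region; }
    }

    /** Hypothetical sink: serializable settings are kept, runtime state is rebuilt in open(). */
    static class Sink implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Properties configProps;          // serializable, travels with the sink
        private transient RuntimeConfig runtimeConfig; // skipped; null after deserialization

        Sink(Properties configProps) { this.configProps = configProps; }

        /** Builds the runtime state; meant to be called after deserialization. */
        void open() {
            runtimeConfig = new RuntimeConfig(configProps.getProperty("region"));
        }

        RuntimeConfig getRuntimeConfig() { return runtimeConfig; }
    }

    /** Serializes and deserializes the sink, mimicking shipping it to a worker. */
    static Sink roundTrip(Sink sink) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(sink);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Sink) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After `roundTrip`, `getRuntimeConfig()` returns `null` until `open()` is called, which is why initializing such a field anywhere else is fragile.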


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-11 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132636379
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -22,12 +22,14 @@
 /**
  * Optional producer specific configuration keys for {@link FlinkKinesisProducer}.
  */
+@Deprecated
 public class ProducerConfigConstants extends AWSConfigConstants {
 
-   /** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
-
-   /** Maximum number of items to pack into an aggregated record. **/
-   public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+   /** Deprecated key. **/
+   @Deprecated
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
 
+   /** Deprecated key. **/
+   @Deprecated
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
--- End diff --

This cannot be renamed, as it completely breaks user code that uses 
`ProducerConfigConstants.AGGREGATION_MAX_COUNT`.
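
A sketch of the source-compatible alternative: keep the public constant under its original name and only mark it deprecated, so existing callers keep compiling (with a deprecation warning). Both class names below are hypothetical and exist only for this illustration.

```java
import java.util.Properties;

/** Hypothetical constants class: the old public names stay, they are only marked deprecated. */
class BackwardCompatibleConstantsSketch {

    /** Kept under its original name so existing user code still compiles. */
    @Deprecated
    public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";

    /** Kept under its original name so existing user code still compiles. */
    @Deprecated
    public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
}

/** Hypothetical user code written against the old constant names. */
class ExistingUserCode {

    static Properties buildProps() {
        Properties props = new Properties();
        // Compiles unchanged; the compiler only emits a deprecation warning.
        props.setProperty(BackwardCompatibleConstantsSketch.AGGREGATION_MAX_COUNT, "100");
        return props;
    }
}
```

Renaming the field to something like `DEPRECATED_AGGREGATION_MAX_COUNT` would instead turn the `setProperty` line into a compile error for every such caller.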


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-11 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132636613
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisConfigUtil.java
 ---
@@ -38,6 +39,23 @@
  * Utilities for Flink Kinesis connector configuration.
  */
 public class KinesisConfigUtil {
+   /** Maximum number of items to pack into an PutRecords request. **/
+   @Deprecated
+   protected static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
+
+   /** Maximum number of items to pack into an aggregated record. **/
+   @Deprecated
+   protected static final String AGGREGATION_MAX_COUNT = "AggregationMaxCount";
--- End diff --

Same here; deprecation not required.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-11 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132636498
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisConfigUtil.java
 ---
@@ -38,6 +39,23 @@
  * Utilities for Flink Kinesis connector configuration.
  */
 public class KinesisConfigUtil {
+   /** Maximum number of items to pack into an PutRecords request. **/
--- End diff --

nit: Add empty line before comment block.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-11 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132636582
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisConfigUtil.java
 ---
@@ -38,6 +39,23 @@
  * Utilities for Flink Kinesis connector configuration.
  */
 public class KinesisConfigUtil {
+   /** Maximum number of items to pack into an PutRecords request. **/
+   @Deprecated
+   protected static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
--- End diff --

This doesn't require deprecation, as it isn't exposed to the user.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-11 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132636335
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -22,12 +22,14 @@
 /**
  * Optional producer specific configuration keys for {@link FlinkKinesisProducer}.
  */
+@Deprecated
 public class ProducerConfigConstants extends AWSConfigConstants {
 
-   /** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
-
-   /** Maximum number of items to pack into an aggregated record. **/
-   public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+   /** Deprecated key. **/
+   @Deprecated
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
--- End diff --

This cannot be renamed, as it completely breaks user code that uses 
`ProducerConfigConstants.COLLECTION_MAX_COUNT`.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread bowenli86
Github user bowenli86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132376988
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+
+   /** Deprecated key. **/
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+
    /** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+   public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
--- End diff --

`CollectionMaxCount` and `AggregationMaxCount` will be `protected` since 
there's a unit test using them.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread bowenli86
Github user bowenli86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132376112
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -165,17 +167,10 @@ public void setCustomPartitioner(KinesisPartitioner partitioner) {
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
 
-       KinesisProducerConfiguration producerConfig = new KinesisProducerConfiguration();
-
-       producerConfig.setRegion(configProps.getProperty(ProducerConfigConstants.AWS_REGION));
        producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
-       if (configProps.containsKey(ProducerConfigConstants.COLLECTION_MAX_COUNT)) {
-           producerConfig.setCollectionMaxCount(PropertiesUtil.getLong(configProps,
-               ProducerConfigConstants.COLLECTION_MAX_COUNT, producerConfig.getCollectionMaxCount(), LOG));
-       }
-       if (configProps.containsKey(ProducerConfigConstants.AGGREGATION_MAX_COUNT)) {
-           producerConfig.setAggregationMaxCount(PropertiesUtil.getLong(configProps,
-               ProducerConfigConstants.AGGREGATION_MAX_COUNT, producerConfig.getAggregationMaxCount(), LOG));
+       // Override KPL default value if it's not specified by user
+       if (!configProps.containsKey(ProducerConfigConstants.RATE_LIMIT)) {
+           producerConfig.setRateLimit(ProducerConfigConstants.DEFAULT_RATE_LIMIT);
--- End diff --

Yeah, either way. This is not really a validation, but a replacement.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132375356
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+
+   /** Deprecated key. **/
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+
    /** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+   public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
--- End diff --

Sounds like a good idea.
It seems that `RATE_LIMIT`, `CollectionMaxCount`, and `AggregationMaxCount` 
can be moved to `KinesisConfigUtil` as private static final Strings, though, 
as that should be the only place where they are used.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread bowenli86
Github user bowenli86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132375079
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+
+   /** Deprecated key. **/
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+
    /** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+   public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
--- End diff --

`ProducerConfigConstants` and `FlinkKinesisProducer` don't live in the same 
package. Let me use plain strings and see how it looks.

Besides, why don't we push a step forward now? How about I move RATE_LIMIT 
to `FlinkKinesisProducer`, and deprecate `ProducerConfigConstants`?


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132374210
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+
+   /** Deprecated key. **/
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+
    /** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+   public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
--- End diff --

Ideally, we should deprecate `ProducerConfigConstants` and remove the class 
in the long run, since we rely solely on `Properties` parsing now and have no 
"Flink-specific" key.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132373930
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+
+   /** Deprecated key. **/
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+
    /** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+   public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
--- End diff --

Hmmm, maybe have package-private static Strings for "CollectionMaxCount" 
and "AggregationMaxCount", with comments on what they are used for?


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread bowenli86
Github user bowenli86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132373651
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+
+   /** Deprecated key. **/
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+
    /** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+   public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
--- End diff --

Are you OK with using the plain strings "CollectionMaxCount" and 
"AggregationMaxCount" directly in the code?

I actually used those strings directly in the code at first, then changed 
them to these static final definitions because I worried that checkstyle might 
flag them as "magic strings", similar to "magic numbers".


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132373446
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -165,17 +167,10 @@ public void setCustomPartitioner(KinesisPartitioner partitioner) {
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
 
-       KinesisProducerConfiguration producerConfig = new KinesisProducerConfiguration();
-
-       producerConfig.setRegion(configProps.getProperty(ProducerConfigConstants.AWS_REGION));
        producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
-       if (configProps.containsKey(ProducerConfigConstants.COLLECTION_MAX_COUNT)) {
-           producerConfig.setCollectionMaxCount(PropertiesUtil.getLong(configProps,
-               ProducerConfigConstants.COLLECTION_MAX_COUNT, producerConfig.getCollectionMaxCount(), LOG));
-       }
-       if (configProps.containsKey(ProducerConfigConstants.AGGREGATION_MAX_COUNT)) {
-           producerConfig.setAggregationMaxCount(PropertiesUtil.getLong(configProps,
-               ProducerConfigConstants.AGGREGATION_MAX_COUNT, producerConfig.getAggregationMaxCount(), LOG));
+       // Override KPL default value if it's not specified by user
+       if (!configProps.containsKey(ProducerConfigConstants.RATE_LIMIT)) {
+           producerConfig.setRateLimit(ProducerConfigConstants.DEFAULT_RATE_LIMIT);
--- End diff --

Ah I see why now, just read your Javadocs :)
Should we move this override to 
`KinesisConfigUtil.validateProducerConfiguration`, where the producer config is 
built?
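
A minimal sketch of what such an override could look like when done on the `Properties` before the producer config is built. The helper name `withDefaultRateLimit` is hypothetical, and the value `"100"` is taken from the Javadoc in the diff (100% in Flink versus KPL's 150%); the real `DEFAULT_RATE_LIMIT` constant is not quoted in this thread.

```java
import java.util.Properties;

class RateLimitDefaultSketch {

    /** KPL key name as it appears in the diff. */
    static final String RATE_LIMIT = "RateLimit";

    /** Assumed default, based on the "100% in Flink" Javadoc in the diff. */
    static final String DEFAULT_RATE_LIMIT = "100";

    /** Returns a copy of the user properties with the rate limit filled in if it was not set. */
    static Properties withDefaultRateLimit(Properties userProps) {
        Properties result = new Properties();
        result.putAll(userProps);
        if (!result.containsKey(RATE_LIMIT)) {
            result.setProperty(RATE_LIMIT, DEFAULT_RATE_LIMIT);
        }
        return result;
    }
}
```

Working on a copy leaves the caller's `Properties` untouched, and an explicit user setting always wins over the default.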


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132373167
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+
+   /** Deprecated key. **/
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+
    /** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+   public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
--- End diff --

I think what we should do is just deprecate `COLLECTION_MAX_COUNT` and 
`AGGREGATION_MAX_COUNT`, and in the Javadoc direct the user to directly refer 
to the KPL config keys.
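
One way to keep the deprecated keys working while pointing users at the KPL keys is to translate them before the producer configuration is built. A diff elsewhere in this thread calls `KinesisConfigUtil.replaceDeprecatedProducerKeys`, but its body is not quoted, so the following is only a hypothetical sketch of the idea. It assumes that a value set under an old key overwrites any value already present under the new key.

```java
import java.util.Properties;

class DeprecatedKeySketch {

    /** Returns a copy of the properties with the old Flink-specific keys renamed to KPL keys. */
    static Properties replaceDeprecatedKeys(Properties configProps) {
        Properties result = new Properties();
        result.putAll(configProps);
        moveKey(result, "aws.producer.collectionMaxCount", "CollectionMaxCount");
        moveKey(result, "aws.producer.aggregationMaxCount", "AggregationMaxCount");
        return result;
    }

    /** Moves the value stored under oldKey to newKey, if oldKey is present. */
    private static void moveKey(Properties props, String oldKey, String newKey) {
        if (props.containsKey(oldKey)) {
            props.setProperty(newKey, props.getProperty(oldKey));
            props.remove(oldKey);
        }
    }
}
```

All other entries pass through unchanged, so users can set any KPL key directly without Flink having to expose a constant for it.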


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132373023
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+
+   /** Deprecated key. **/
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+
    /** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+   public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
--- End diff --

Do we really want to expose these config keys now?
AFAIK, with this change, users who want to tweak these can simply refer to the 
KPL docs to see which keys are available. Exposing only some of the keys in 
Flink but not others is a bit weird, IMO.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132372556
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+
+   /** Deprecated key. **/
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+
    /** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+   public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
 
    /** Maximum number of items to pack into an aggregated record. **/
-   public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+   public static final String AGGREGATION_MAX_COUNT = "AggregationMaxCount";
+
+   /** Limits the maximum allowed put rate for a shard, as a percentage of the backend limits.
+    * The default value is set as 100% in Flink. KPL's default value is 150% but it throws RateLimitExceededException
+    * too frequently and breaks Flink sink.
+    **/
+   public static final String RATE_LIMIT = "RateLimit";
+
+
--- End diff --

nit: two unnecessary empty lines


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132372454
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+
+   /** Deprecated key. **/
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+
/** Maximum number of items to pack into an PutRecords request. **/
-   public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+   public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
 
/** Maximum number of items to pack into an aggregated record. **/
-   public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+   public static final String AGGREGATION_MAX_COUNT = "AggregationMaxCount";
+
+   /** Limits the maximum allowed put rate for a shard, as a percentage of the backend limits.
+* The default value is set as 100% in Flink. KPL's default value is 150% but it throws RateLimitExceededException
+* too frequently and breaks Flink sink.
--- End diff --

We might need to document this, if we're not following the KPL defaults.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132372254
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
--- End diff --

Add `@deprecated` annotation


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132372291
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,28 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
+   /** Deprecated key. **/
+   public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
+
+   /** Deprecated key. **/
+   public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
--- End diff --

Add `@deprecated` annotation
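
A minimal sketch of what the requested deprecation could look like, assuming the two key strings shown in the diff above. The Javadoc wording, the class name `DeprecatedKeySketch`, and the `migrate` helper are hypothetical and are not part of the PR; they only illustrate pairing the `@Deprecated` annotation with a Javadoc `@deprecated` tag that says what to use instead, and one possible way to keep old keys working.

```java
import java.util.Properties;

public class DeprecatedKeySketch {

    /**
     * Old Flink-specific key.
     *
     * @deprecated Use {@link #COLLECTION_MAX_COUNT} instead (wording is an assumption).
     */
    @Deprecated
    public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";

    /** Maximum number of items to pack into a PutRecords request. */
    public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";

    /**
     * Hypothetical helper: returns a copy of the given properties in which a value
     * set under the deprecated key is moved to the new key, unless the new key is
     * already set by the user.
     */
    public static Properties migrate(Properties props) {
        Properties out = new Properties();
        out.putAll(props);
        Object old = out.remove(DEPRECATED_COLLECTION_MAX_COUNT);
        if (old != null && !out.containsKey(COLLECTION_MAX_COUNT)) {
            out.setProperty(COLLECTION_MAX_COUNT, old.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(DEPRECATED_COLLECTION_MAX_COUNT, "100");
        System.out.println(migrate(props).getProperty(COLLECTION_MAX_COUNT));
    }
}
```

The same pattern would apply to `DEPRECATED_AGGREGATION_MAX_COUNT`.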


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-10 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r132372133
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -165,17 +167,10 @@ public void setCustomPartitioner(KinesisPartitioner partitioner) {
public void open(Configuration parameters) throws Exception {
super.open(parameters);
 
-   KinesisProducerConfiguration producerConfig = new KinesisProducerConfiguration();
-
-   producerConfig.setRegion(configProps.getProperty(ProducerConfigConstants.AWS_REGION));
producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
-   if (configProps.containsKey(ProducerConfigConstants.COLLECTION_MAX_COUNT)) {
-   producerConfig.setCollectionMaxCount(PropertiesUtil.getLong(configProps,
-   ProducerConfigConstants.COLLECTION_MAX_COUNT, producerConfig.getCollectionMaxCount(), LOG));
-   }
-   if (configProps.containsKey(ProducerConfigConstants.AGGREGATION_MAX_COUNT)) {
-   producerConfig.setAggregationMaxCount(PropertiesUtil.getLong(configProps,
-   ProducerConfigConstants.AGGREGATION_MAX_COUNT, producerConfig.getAggregationMaxCount(), LOG));
+   // Override KPL default value if it's not specified by user
+   if (!configProps.containsKey(ProducerConfigConstants.RATE_LIMIT)) {
+   producerConfig.setRateLimit(ProducerConfigConstants.DEFAULT_RATE_LIMIT);
--- End diff --

Why do we need to explicitly override the default `RATE_LIMIT`?
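
For context, the Javadoc added for `RATE_LIMIT` in this PR states the motivation: KPL's default of 150% throws RateLimitExceededException too frequently, so Flink uses 100% unless the user sets a value. The sketch below shows that fallback logic with plain `java.util.Properties`. The class name, the method `effectiveRateLimit`, and the literal `100L` for `DEFAULT_RATE_LIMIT` are assumptions for illustration; the PR applies the value through `producerConfig.setRateLimit(...)` instead.

```java
import java.util.Properties;

public class RateLimitDefaultSketch {

    /** Key string as shown in the diff for ProducerConfigConstants. */
    public static final String RATE_LIMIT = "RateLimit";

    /** Assumed value: the Javadoc in the PR says Flink's default is 100 (percent). */
    public static final long DEFAULT_RATE_LIMIT = 100L;

    /**
     * Returns the rate limit to apply: the user's value if the key is present,
     * otherwise Flink's default rather than the KPL default.
     */
    public static long effectiveRateLimit(Properties configProps) {
        if (!configProps.containsKey(RATE_LIMIT)) {
            return DEFAULT_RATE_LIMIT;
        }
        return Long.parseLong(configProps.getProperty(RATE_LIMIT));
    }

    public static void main(String[] args) {
        Properties userProps = new Properties();
        System.out.println(effectiveRateLimit(userProps));
        userProps.setProperty(RATE_LIMIT, "150");
        System.out.println(effectiveRateLimit(userProps));
    }
}
```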


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-07 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r131578056
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java
 ---
@@ -24,10 +24,30 @@
  */
 public class ProducerConfigConstants extends AWSConfigConstants {
 
-   /** Maximum number of items to pack into an PutRecords request. **/
+   /** Maximum number of KPL user records to store in a single Kinesis Streams record (an aggregated record). */
+   public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+
+   /** Maximum number of Kinesis Streams records to pack into an PutRecords request. */
public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
 
-   /** Maximum number of items to pack into an aggregated record. **/
-   public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
+   /** Maximum number of connections to open to the backend. HTTP requests are
+* sent in parallel over multiple connections */
--- End diff --

Style consistency: missing period at the end.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-07 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4473#discussion_r131577895
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -169,14 +169,27 @@ public void open(Configuration parameters) throws Exception {
 
producerConfig.setRegion(configProps.getProperty(ProducerConfigConstants.AWS_REGION));
producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
-   if (configProps.containsKey(ProducerConfigConstants.COLLECTION_MAX_COUNT)) {
-   producerConfig.setCollectionMaxCount(PropertiesUtil.getLong(configProps,
-   ProducerConfigConstants.COLLECTION_MAX_COUNT, producerConfig.getCollectionMaxCount(), LOG));
-   }
-   if (configProps.containsKey(ProducerConfigConstants.AGGREGATION_MAX_COUNT)) {
-   producerConfig.setAggregationMaxCount(PropertiesUtil.getLong(configProps,
-   ProducerConfigConstants.AGGREGATION_MAX_COUNT, producerConfig.getAggregationMaxCount(), LOG));
-   }
+
+   producerConfig.setAggregationMaxCount(PropertiesUtil.getLong(configProps,
+   ProducerConfigConstants.AGGREGATION_MAX_COUNT, producerConfig.getAggregationMaxCount(), LOG));
+
+   producerConfig.setCollectionMaxCount(PropertiesUtil.getLong(configProps,
+   ProducerConfigConstants.COLLECTION_MAX_COUNT, producerConfig.getCollectionMaxCount(), LOG));
+
+   producerConfig.setMaxConnections(PropertiesUtil.getLong(configProps,
+   ProducerConfigConstants.MAX_CONNECTIONS, producerConfig.getMaxConnections(), LOG));
+
+   producerConfig.setRateLimit(PropertiesUtil.getLong(configProps,
+   ProducerConfigConstants.RATE_LIMIT, producerConfig.getRateLimit(), LOG));
--- End diff --

Starting from this line, the indentation is not consistent.


---


[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

2017-08-03 Thread bowenli86
GitHub user bowenli86 opened a pull request:

https://github.com/apache/flink/pull/4473

[FLINK-7367][kinesis connector] Parameterize more configs for 
FlinkKinesisProducer (RecordMaxBufferedTime, MaxConnections, RequestTimeout, 
etc)


## What is the purpose of the change

Right now, FlinkKinesisProducer exposes only two configs for the underlying 
KinesisProducer:

- AGGREGATION_MAX_COUNT
- COLLECTION_MAX_COUNT

According to the AWS docs and their sample on GitHub, developers can set 
more configs to make the most of KinesisProducer and to make it more 
fault-tolerant (e.g. by increasing timeouts). I selected a few more configs 
that we need when using Flink with Kinesis:

- MAX_CONNECTIONS
- RATE_LIMIT
- RECORD_MAX_BUFFERED_TIME
- RECORD_TIME_TO_LIVE
- REQUEST_TIMEOUT

We need to parameterize FlinkKinesisProducer so that the above params can be 
passed in.
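
A rough sketch of how a user might supply these settings, using only `java.util.Properties` so that it runs without Flink on the classpath. The key strings `CollectionMaxCount`, `AggregationMaxCount` and `RateLimit` are the ones shown in the later revision of `ProducerConfigConstants` quoted in the review comments; `MaxConnections`, `RecordMaxBufferedTime` and `RequestTimeout` are taken from the PR title and are assumed to follow the same naming. The `aws.region` key, the class name, and all values are illustrative assumptions, not recommendations.

```java
import java.util.Properties;

public class ProducerPropsSketch {

    /** Builds an example producer configuration; every value here is hypothetical. */
    public static Properties buildProducerConfig() {
        Properties config = new Properties();
        // Assumed key string for the AWS region setting.
        config.setProperty("aws.region", "us-east-1");
        // Keys shown in the reviewed diff.
        config.setProperty("AggregationMaxCount", "1000");
        config.setProperty("CollectionMaxCount", "500");
        config.setProperty("RateLimit", "100");
        // Keys assumed from the names in the PR title.
        config.setProperty("MaxConnections", "24");
        config.setProperty("RecordMaxBufferedTime", "100");
        config.setProperty("RequestTimeout", "6000");
        return config;
    }

    public static void main(String[] args) {
        Properties config = buildProducerConfig();
        // In a Flink job these properties would be handed to FlinkKinesisProducer;
        // that call is omitted here to keep the sketch free of Flink dependencies.
        config.stringPropertyNames().stream().sorted()
            .forEach(k -> System.out.println(k + "=" + config.getProperty(k)));
    }
}
```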

## Brief change log

  - *Added more config values into `ProducerConfigConstants`*
  - *Made FlinkKinesisProducer pick up more configs*
  - *Added an example in doc*


## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.


## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

## Documentation

  - Does this pull request introduce a new feature? (no)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bowenli86/flink FLINK-7363

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4473.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4473


commit ac0b4d22fde763dde62b26ed9a022d537bb29e58
Author: Bowen Li 
Date:   2017-08-04T03:59:02Z

FLINK-7367 Parameterize FlinkKinesisProducer on RecordMaxBufferedTime, 
MaxConnections, RequestTimeout, and more




---