[jira] [Updated] (FLINK-7243) Add ParquetInputFormat

2017-07-23 Thread godfrey he (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he updated FLINK-7243:
--
Description: Add a {{ParquetInputFormat}} to read data from an Apache 
Parquet file.   (was: Add a ```ParquetInputFormat``` to read data from an Apache 
Parquet file. )

> Add ParquetInputFormat
> --
>
> Key: FLINK-7243
> URL: https://issues.apache.org/jira/browse/FLINK-7243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: godfrey he
>
> Add a {{ParquetInputFormat}} to read data from an Apache Parquet file. 
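For illustration, a minimal reading loop over a Parquet file using the
parquet-avro API (a hedged sketch of the kind of access such an input format
would wrap; the class name and file path are assumptions, and parquet-avro is
assumed to be on the classpath):

{code:java}
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class ParquetReadSketch {
    public static void main(String[] args) throws Exception {
        // Build a reader for generic Avro records over a Parquet file.
        ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(new Path("/tmp/data.parquet")).build();
        GenericRecord record;
        while ((record = reader.read()) != null) { // null signals end of input
            System.out.println(record);
        }
        reader.close();
    }
}
{code}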



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-7243) Add ParquetInputFormat

2017-07-23 Thread godfrey he (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he updated FLINK-7243:
--
Description: Add a ```ParquetInputFormat``` to read data from an Apache 
Parquet file.   (was: Add a `ParquetInputFormat` to read data from an Apache 
Parquet file. )

> Add ParquetInputFormat
> --
>
> Key: FLINK-7243
> URL: https://issues.apache.org/jira/browse/FLINK-7243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: godfrey he
>
> Add a ```ParquetInputFormat``` to read data from an Apache Parquet file. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-7243) Add ParquetInputFormat

2017-07-23 Thread godfrey he (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he updated FLINK-7243:
--
Description: Add a `ParquetInputFormat` to read data from an Apache Parquet 
file.   (was: Add a ParquetInputFormat to read data from an Apache Parquet file. 
)

> Add ParquetInputFormat
> --
>
> Key: FLINK-7243
> URL: https://issues.apache.org/jira/browse/FLINK-7243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: godfrey he
>
> Add a `ParquetInputFormat` to read data from an Apache Parquet file. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-7243) Add ParquetInputFormat

2017-07-23 Thread godfrey he (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he updated FLINK-7243:
--
Description: Add a ParquetInputFormat to read data from an Apache Parquet 
file. 

> Add ParquetInputFormat
> --
>
> Key: FLINK-7243
> URL: https://issues.apache.org/jira/browse/FLINK-7243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: godfrey he
>
> Add a ParquetInputFormat to read data from an Apache Parquet file. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4259: [FLINK-7105] Make ActorSystems non daemonic

2017-07-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4259


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-7118) Remove hadoop1.x code in HadoopUtils

2017-07-23 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097533#comment-16097533
 ] 

Aljoscha Krettek commented on FLINK-7118:
-

Agreed, it doesn't hurt to have these in 1.3.x.

> Remove hadoop1.x code in HadoopUtils
> 
>
> Key: FLINK-7118
> URL: https://issues.apache.org/jira/browse/FLINK-7118
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: mingleizhang
>Assignee: mingleizhang
> Fix For: 1.4.0
>
>
> Since Flink no longer supports Hadoop 1.x, we should remove this code. The code 
> below resides in {{org.apache.flink.api.java.hadoop.mapred.utils.HadoopUtils}}
>   
> {code:java}
> public static JobContext instantiateJobContext(Configuration configuration, 
> JobID jobId) throws Exception {
>   try {
>   Class clazz;
>   // for Hadoop 1.xx
>   if(JobContext.class.isInterface()) {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.task.JobContextImpl", true, 
> Thread.currentThread().getContextClassLoader());
>   }
>   // for Hadoop 2.xx
>   else {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.JobContext", true, 
> Thread.currentThread().getContextClassLoader());
>   }
>   Constructor constructor = 
> clazz.getConstructor(Configuration.class, JobID.class);
>   JobContext context = (JobContext) 
> constructor.newInstance(configuration, jobId);
>   
>   return context;
>   } catch(Exception e) {
>   throw new Exception("Could not create instance of 
> JobContext.");
>   }
>   }
> {code}
> And 
> {code:java}
>   public static TaskAttemptContext 
> instantiateTaskAttemptContext(Configuration configuration,  TaskAttemptID 
> taskAttemptID) throws Exception {
>   try {
>   Class clazz;
>   // for Hadoop 1.xx
>   if(JobContext.class.isInterface()) {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl");
>   }
>   // for Hadoop 2.xx
>   else {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext");
>   }
>   Constructor constructor = 
> clazz.getConstructor(Configuration.class, TaskAttemptID.class);
>   TaskAttemptContext context = (TaskAttemptContext) 
> constructor.newInstance(configuration, taskAttemptID);
>   
>   return context;
>   } catch(Exception e) {
>   throw new Exception("Could not create instance of 
> TaskAttemptContext.");
>   }
>   }
> {code}
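With Hadoop 1.x support dropped, the reflection becomes unnecessary. A hedged
sketch of what the simplified, Hadoop-2.x-only version could look like (an
illustration, not the committed fix; the class name is an assumption):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.task.JobContextImpl;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

public final class HadoopContextFactory {

    // In Hadoop 2.x, JobContext is always an interface, so the
    // implementation class can be instantiated directly.
    public static JobContext instantiateJobContext(Configuration configuration, JobID jobId) {
        return new JobContextImpl(configuration, jobId);
    }

    public static TaskAttemptContext instantiateTaskAttemptContext(
            Configuration configuration, TaskAttemptID taskAttemptID) {
        return new TaskAttemptContextImpl(configuration, taskAttemptID);
    }
}
{code}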



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-7118) Remove hadoop1.x code in HadoopUtils

2017-07-23 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-7118:

Fix Version/s: (was: 1.3.2)
   1.4.0

> Remove hadoop1.x code in HadoopUtils
> 
>
> Key: FLINK-7118
> URL: https://issues.apache.org/jira/browse/FLINK-7118
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: mingleizhang
>Assignee: mingleizhang
> Fix For: 1.4.0
>
>
> Since Flink no longer supports Hadoop 1.x, we should remove this code. The code 
> below resides in {{org.apache.flink.api.java.hadoop.mapred.utils.HadoopUtils}}
>   
> {code:java}
> public static JobContext instantiateJobContext(Configuration configuration, 
> JobID jobId) throws Exception {
>   try {
>   Class clazz;
>   // for Hadoop 1.xx
>   if(JobContext.class.isInterface()) {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.task.JobContextImpl", true, 
> Thread.currentThread().getContextClassLoader());
>   }
>   // for Hadoop 2.xx
>   else {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.JobContext", true, 
> Thread.currentThread().getContextClassLoader());
>   }
>   Constructor constructor = 
> clazz.getConstructor(Configuration.class, JobID.class);
>   JobContext context = (JobContext) 
> constructor.newInstance(configuration, jobId);
>   
>   return context;
>   } catch(Exception e) {
>   throw new Exception("Could not create instance of 
> JobContext.");
>   }
>   }
> {code}
> And 
> {code:java}
>   public static TaskAttemptContext 
> instantiateTaskAttemptContext(Configuration configuration,  TaskAttemptID 
> taskAttemptID) throws Exception {
>   try {
>   Class clazz;
>   // for Hadoop 1.xx
>   if(JobContext.class.isInterface()) {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl");
>   }
>   // for Hadoop 2.xx
>   else {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext");
>   }
>   Constructor constructor = 
> clazz.getConstructor(Configuration.class, TaskAttemptID.class);
>   TaskAttemptContext context = (TaskAttemptContext) 
> constructor.newInstance(configuration, taskAttemptID);
>   
>   return context;
>   } catch(Exception e) {
>   throw new Exception("Could not create instance of 
> TaskAttemptContext.");
>   }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7244) Add ParquetTableSource Implementation based on ParquetInputFormat

2017-07-23 Thread godfrey he (JIRA)
godfrey he created FLINK-7244:
-

 Summary: Add ParquetTableSource Implementation based on 
ParquetInputFormat
 Key: FLINK-7244
 URL: https://issues.apache.org/jira/browse/FLINK-7244
 Project: Flink
  Issue Type: Sub-task
Reporter: godfrey he






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7118) Remove hadoop1.x code in HadoopUtils

2017-07-23 Thread mingleizhang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097542#comment-16097542
 ] 

mingleizhang commented on FLINK-7118:
-

Thanks to [~greghogan] and [~aljoscha]. Nice!

> Remove hadoop1.x code in HadoopUtils
> 
>
> Key: FLINK-7118
> URL: https://issues.apache.org/jira/browse/FLINK-7118
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: mingleizhang
>Assignee: mingleizhang
> Fix For: 1.4.0
>
>
> Since Flink no longer supports Hadoop 1.x, we should remove this code. The code 
> below resides in {{org.apache.flink.api.java.hadoop.mapred.utils.HadoopUtils}}
>   
> {code:java}
> public static JobContext instantiateJobContext(Configuration configuration, 
> JobID jobId) throws Exception {
>   try {
>   Class clazz;
>   // for Hadoop 1.xx
>   if(JobContext.class.isInterface()) {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.task.JobContextImpl", true, 
> Thread.currentThread().getContextClassLoader());
>   }
>   // for Hadoop 2.xx
>   else {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.JobContext", true, 
> Thread.currentThread().getContextClassLoader());
>   }
>   Constructor constructor = 
> clazz.getConstructor(Configuration.class, JobID.class);
>   JobContext context = (JobContext) 
> constructor.newInstance(configuration, jobId);
>   
>   return context;
>   } catch(Exception e) {
>   throw new Exception("Could not create instance of 
> JobContext.");
>   }
>   }
> {code}
> And 
> {code:java}
>   public static TaskAttemptContext 
> instantiateTaskAttemptContext(Configuration configuration,  TaskAttemptID 
> taskAttemptID) throws Exception {
>   try {
>   Class clazz;
>   // for Hadoop 1.xx
>   if(JobContext.class.isInterface()) {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl");
>   }
>   // for Hadoop 2.xx
>   else {
>   clazz = 
> Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext");
>   }
>   Constructor constructor = 
> clazz.getConstructor(Configuration.class, TaskAttemptID.class);
>   TaskAttemptContext context = (TaskAttemptContext) 
> constructor.newInstance(configuration, taskAttemptID);
>   
>   return context;
>   } catch(Exception e) {
>   throw new Exception("Could not create instance of 
> TaskAttemptContext.");
>   }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-4421) Make clocks and time measurements monotonous

2017-07-23 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097627#comment-16097627
 ] 

Stephan Ewen commented on FLINK-4421:
-

The clocks can jump due to re-synchronization, and also due to other factors in 
virtualized settings. We have seen that happen for example in Travis tests.

Flink's runtime makes no use of time for correctness in any place, so this is 
not critical, as far as I know.
But it may lead to test instability when tests use 
{{System.currentTimeMillis()}} to measure deadlines.

It may also lead to "weird metrics", when the JobManager reports a negative 
duration for the restart time, for example.

I would actually migrate these lazily, rather than doing a big refactoring effort, 
because that is prone to introducing other bugs...
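As an illustration of the lazy-migration target, elapsed time can be measured
with the monotonic {{System.nanoTime()}} instead of the wall clock (a minimal
sketch; the class name is illustrative, not Flink code):

{code:java}
import java.util.concurrent.TimeUnit;

public class MonotonicStopwatch {
    // System.nanoTime() is monotonic within a JVM, so clock
    // re-synchronization cannot produce a negative duration.
    private final long startNanos = System.nanoTime();

    public long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }
}
{code}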


> Make clocks and time measurements monotonous
> 
>
> Key: FLINK-4421
> URL: https://issues.apache.org/jira/browse/FLINK-4421
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stephan Ewen
>Priority: Minor
>
> Currently, many places use {{System.currentTimeMillis()}} to acquire 
> timestamps or measure time intervals.
> Since this relies on the system clock, and the system clock is not 
> necessarily monotonous (in the presence of clock updates), this can lead to 
> negative durations and decreasing timestamps where increasing timestamps are 
> expected.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7245) Enhance the operators to support holding back watermarks

2017-07-23 Thread Xingcan Cui (JIRA)
Xingcan Cui created FLINK-7245:
--

 Summary: Enhance the operators to support holding back watermarks
 Key: FLINK-7245
 URL: https://issues.apache.org/jira/browse/FLINK-7245
 Project: Flink
  Issue Type: New Feature
  Components: DataStream API
Reporter: Xingcan Cui
Assignee: Xingcan Cui


Currently, the watermarks are applied and emitted instantly by the 
{{AbstractStreamOperator}}. 
{code:java}
public void processWatermark(Watermark mark) throws Exception {
if (timeServiceManager != null) {
timeServiceManager.advanceWatermark(mark);
}
output.emitWatermark(mark);
}
{code}

Some calculation results (with timestamp fields) triggered by these watermarks 
(e.g., join or aggregate results) may be regarded as delayed by the downstream 
operators since their timestamps must be less than or equal to the 
corresponding triggers. 

This issue aims to add another "working mode", which supports holding back 
watermarks, to the current operators. These watermarks should be blocked and 
stored by the operators until all the corresponding newly generated results are 
emitted (a rough sketch follows).
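A rough design sketch of the holding-back behavior (all names here are
illustrative assumptions, not an actual Flink operator; state handling and
checkpointing are omitted):

{code:java}
import org.apache.flink.streaming.api.watermark.Watermark;

public class WatermarkHolder {
    private Watermark heldBack; // in a real operator this would be checkpointed state

    /** Called instead of forwarding the watermark: it is blocked and stored. */
    public void hold(Watermark mark) {
        heldBack = mark;
    }

    /** Called once all results triggered by the watermark have been emitted. */
    public Watermark release() {
        Watermark w = heldBack;
        heldBack = null;
        return w; // the caller forwards this via output.emitWatermark(w)
    }
}
{code}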



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6301) Flink KafkaConnector09 leaks memory on reading compressed messages due to a Kafka consumer bug

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097624#comment-16097624
 ] 

ASF GitHub Bot commented on FLINK-6301:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4015
  
@vidhu5269 The account management from the Apache/Github integration 
actually means that we cannot manually close pull requests, only via commit 
messages. So we would need you to close the pull request for us ;-)




> Flink KafkaConnector09 leaks memory on reading compressed messages due to a 
> Kafka consumer bug
> --
>
> Key: FLINK-6301
> URL: https://issues.apache.org/jira/browse/FLINK-6301
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.2.0, 1.1.3, 1.1.4
>Reporter: Rahul Yadav
>Assignee: Rahul Yadav
> Fix For: 1.2.2, 1.4.0
>
> Attachments: jeprof.24611.1228.i1228.heap.svg, 
> jeprof.24611.1695.i1695.heap.svg, jeprof.24611.265.i265.heap.svg, 
> jeprof.24611.3138.i3138.heap.svg, jeprof.24611.595.i595.heap.svg, 
> jeprof.24611.705.i705.heap.svg, jeprof.24611.81.i81.heap.svg, 
> POSTFIX.jeprof.14880.1944.i1944.heap.svg, 
> POSTFIX.jeprof.14880.4129.i4129.heap.svg, 
> POSTFIX.jeprof.14880.961.i961.heap.svg, POSTFIX.jeprof.14880.99.i99.heap.svg, 
> POSTFIX.jeprof.14880.9.i9.heap.svg
>
>
> Hi
> We are running Flink on a standalone cluster with 8 TaskManagers having 8 
> vCPUs and 8 slots each. Each host has 16 GB of RAM.
> In our jobs, 
> # We are consuming gzip compressed messages from Kafka using 
> *KafkaConnector09* and use *rocksDB* backend for checkpoint storage.
> # To debug the leak, we used *jemalloc and jprof* to profile the sources of 
> malloc calls from the java process and attached are the profiles generated at 
> various stages of the job. As we can see, apart from the os.malloc and 
> rocksDB.allocateNewBlock, there are additional malloc calls coming from 
> the inflate() method of java.util.zip.Inflater. These calls are innocuous as long 
> as the Inflater.end() method is called after its use.
> # To look for the sources of the inflate() calls, we used BTrace on the running 
> process to dump the caller stack on the method call. Following is the stack trace 
> we got: 
> {code}
> java.util.zip.Inflater.inflate(Inflater.java)
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
> java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
> java.io.DataInputStream.readFully(DataInputStream.java:195)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:253)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.innerDone(MemoryRecords.java:282)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:233)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.clients.consumer.internals.Fetcher.handleFetchResponse(Fetcher.java:563)
> org.apache.kafka.clients.consumer.internals.Fetcher.access$000(Fetcher.java:69)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:139)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:136)
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:227)
> {code}
> The end() method on Inflater is called inside the close() method of 
> 

[GitHub] flink issue #4015: [FLINK-6301] [flink-connector-kafka-0.10] Upgrading Kafka...

2017-07-23 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4015
  
@vidhu5269 The account management from the Apache/Github integration 
actually means that we cannot manually close pull requests, only via commit 
messages. So we would need you to close the pull request for us ;-)




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-7242) Drop Java 7 Support

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-7242:

Fix Version/s: 1.4.0

> Drop Java 7 Support
> ---
>
> Key: FLINK-7242
> URL: https://issues.apache.org/jira/browse/FLINK-7242
> Project: Flink
>  Issue Type: Task
>Reporter: Eron Wright 
>Priority: Critical
> Fix For: 1.4.0
>
>
> This is the umbrella issue for dropping Java 7 support.   The decision was 
> taken following a vote 
> [here|http://mail-archives.apache.org/mod_mbox/flink-dev/201707.mbox/%3CCANC1h_tawd90CU12v%2BfQ%2BQU2ORsh%3Dnob7AehT11jGHs1g5Hqtg%40mail.gmail.com%3E]
>  and announced 
> [here|http://mail-archives.apache.org/mod_mbox/flink-dev/201707.mbox/%3CCANC1h_vnxpiBnAB0OmQPD6NMH6L_PLCyWYsX32mZ0H%2BXP3%2BheQ%40mail.gmail.com%3E].
>   Reasons cited include new language features and compatibility with Akka 2.4 
> and Scala 2.12.
> Please open sub-tasks as necessary.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7202) Split supressions for flink-core, flink-java, flink-optimizer per package

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097617#comment-16097617
 ] 

ASF GitHub Bot commented on FLINK-7202:
---

GitHub user dawidwys opened a pull request:

https://github.com/apache/flink/pull/4384

[FLINK-7202] Split supressions for flink-core, flink-java, flink-opti…

…mizer per package

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dawidwys/flink checkstyle-suppressions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4384.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4384


commit 26ab6dee90f907c54b17a4c8e707134faecd1f0e
Author: Dawid Wysakowicz 
Date:   2017-07-23T12:12:46Z

[FLINK-7202] Split supressions for flink-core, flink-java, flink-optimizer 
per package




> Split supressions for flink-core, flink-java, flink-optimizer per package
> -
>
> Key: FLINK-7202
> URL: https://issues.apache.org/jira/browse/FLINK-7202
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System, Checkstyle
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4384: [FLINK-7202] Split supressions for flink-core, fli...

2017-07-23 Thread dawidwys
GitHub user dawidwys opened a pull request:

https://github.com/apache/flink/pull/4384

[FLINK-7202] Split supressions for flink-core, flink-java, flink-opti…

…mizer per package

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dawidwys/flink checkstyle-suppressions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4384.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4384


commit 26ab6dee90f907c54b17a4c8e707134faecd1f0e
Author: Dawid Wysakowicz 
Date:   2017-07-23T12:12:46Z

[FLINK-7202] Split supressions for flink-core, flink-java, flink-optimizer 
per package




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-7243) Add ParquetInputFormat

2017-07-23 Thread godfrey he (JIRA)
godfrey he created FLINK-7243:
-

 Summary: Add ParquetInputFormat
 Key: FLINK-7243
 URL: https://issues.apache.org/jira/browse/FLINK-7243
 Project: Flink
  Issue Type: Sub-task
Reporter: godfrey he
Assignee: godfrey he






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (FLINK-2169) Add ParquetTableSource

2017-07-23 Thread godfrey he (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he reassigned FLINK-2169:
-

Assignee: godfrey he  (was: Xu Pingyong)

> Add ParquetTableSource
> --
>
> Key: FLINK-2169
> URL: https://issues.apache.org/jira/browse/FLINK-2169
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: godfrey he
>Priority: Minor
>  Labels: starter
>
> Add a {{ParquetTableSource}} to read data from an Apache Parquet file. The 
> {{ParquetTableSource}} should implement the {{ProjectableTableSource}} 
> (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-4849) trustStorePassword should be checked against null in SSLUtils#createSSLClientContext

2017-07-23 Thread mingleizhang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097548#comment-16097548
 ] 

mingleizhang commented on FLINK-4849:
-

There is already a null check before this, like the following, so I think the 
load() call cannot throw an NPE:

 {{Preconditions.checkNotNull(trustStorePassword, 
SecurityOptions.SSL_TRUSTSTORE_PASSWORD.key() + " was not configured.");}}

> trustStorePassword should be checked against null in 
> SSLUtils#createSSLClientContext
> 
>
> Key: FLINK-4849
> URL: https://issues.apache.org/jira/browse/FLINK-4849
> Project: Flink
>  Issue Type: Bug
>  Components: Security
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
>   String trustStorePassword = sslConfig.getString(
> ConfigConstants.SECURITY_SSL_TRUSTSTORE_PASSWORD,
> null);
> ...
>   try {
> trustStoreFile = new FileInputStream(new File(trustStoreFilePath));
> trustStore.load(trustStoreFile, trustStorePassword.toCharArray());
> {code}
> If trustStorePassword is null, the load() call would throw NPE.
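A hedged sketch of such a guard (illustrative only, not the actual SSLUtils
code; the surrounding variables are taken from the snippet above):

{code:java}
String trustStorePassword = sslConfig.getString(
        ConfigConstants.SECURITY_SSL_TRUSTSTORE_PASSWORD, null);
// Fail fast with a descriptive message instead of an NPE inside load().
Preconditions.checkNotNull(trustStorePassword,
        ConfigConstants.SECURITY_SSL_TRUSTSTORE_PASSWORD + " was not configured.");
try (FileInputStream trustStoreFile = new FileInputStream(new File(trustStoreFilePath))) {
    trustStore.load(trustStoreFile, trustStorePassword.toCharArray());
}
{code}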



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-4849) trustStorePassword should be checked against null in SSLUtils#createSSLClientContext

2017-07-23 Thread mingleizhang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097549#comment-16097549
 ] 

mingleizhang commented on FLINK-4849:
-

It is probably not a problem.

> trustStorePassword should be checked against null in 
> SSLUtils#createSSLClientContext
> 
>
> Key: FLINK-4849
> URL: https://issues.apache.org/jira/browse/FLINK-4849
> Project: Flink
>  Issue Type: Bug
>  Components: Security
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
>   String trustStorePassword = sslConfig.getString(
> ConfigConstants.SECURITY_SSL_TRUSTSTORE_PASSWORD,
> null);
> ...
>   try {
> trustStoreFile = new FileInputStream(new File(trustStoreFilePath));
> trustStore.load(trustStoreFile, trustStorePassword.toCharArray());
> {code}
> If trustStorePassword is null, the load() call would throw NPE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7105) Make ActorSystem creation per default non-daemonic

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097612#comment-16097612
 ] 

ASF GitHub Bot commented on FLINK-7105:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4259


> Make ActorSystem creation per default non-daemonic
> --
>
> Key: FLINK-7105
> URL: https://issues.apache.org/jira/browse/FLINK-7105
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>
> At the moment, we create all {{ActorSystems}} with the setting 
> {{daemonic=on}}. This has the consequence that we have to wait in the main 
> thread on the {{ActorSystem's}} termination. By making the {{ActorSystems}} 
> non-daemonic, we could get rid of this artifact, especially since we have the 
> {{ProcessReapers}}, which terminate the process once a registered actor 
> terminates.
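A hedged sketch of flipping the flag when building the {{ActorSystem}}
configuration (illustrative only; {{akka.daemonic}} is the standard Akka
setting, while the class and system name are assumptions):

{code:java}
import akka.actor.ActorSystem;
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class NonDaemonicActorSystemSketch {
    public static ActorSystem create() {
        // daemonic = off keeps the JVM alive until the ActorSystem terminates,
        // so the main thread no longer has to block on its termination.
        Config config = ConfigFactory.parseString("akka.daemonic = off")
                .withFallback(ConfigFactory.load());
        return ActorSystem.create("flink", config);
    }
}
{code}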



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (FLINK-6665) Pass a ScheduledExecutorService to the RestartStrategy

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-6665.
-
   Resolution: Fixed
Fix Version/s: 1.3.2
 Release Note: 
Fixed in 
  - 1.4.0 via 65400bd0a0f6a3bdd3e0bad05e2179534eaf6e9e
  - 1.3.2 via 65400bd0a0f6a3bdd3e0bad05e2179534eaf6e9e

1.3.2 needed to be fixed because this was a blocker for a critical bug fix

> Pass a ScheduledExecutorService to the RestartStrategy
> --
>
> Key: FLINK-6665
> URL: https://issues.apache.org/jira/browse/FLINK-6665
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Reporter: Stephan Ewen
>Assignee: Fang Yong
> Fix For: 1.4.0, 1.3.2
>
>
> Currently, the {{RestartStrategy}} is called when the {{ExecutionGraph}} 
> should be restarted.
> To facilitate delays before restarting, the strategy simply sleeps, blocking 
> the thread that runs the ExecutionGraph's recovery method.
> I suggest passing a {{ScheduledExecutorService}} to the {{RestartStrategy}} 
> and letting it schedule the restart call that way, avoiding any sleeps.
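A minimal sketch of the suggested non-blocking delay (an illustration under
assumed names, not the merged implementation):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedDelayRestartSketch {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final long delayMillis;

    public FixedDelayRestartSketch(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /** Schedules the restart action after the delay; the caller never sleeps. */
    public void restart(Runnable restartAction) {
        executor.schedule(restartAction, delayMillis, TimeUnit.MILLISECONDS);
    }
}
{code}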



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (FLINK-6665) Pass a ScheduledExecutorService to the RestartStrategy

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-6665.
---

> Pass a ScheduledExecutorService to the RestartStrategy
> --
>
> Key: FLINK-6665
> URL: https://issues.apache.org/jira/browse/FLINK-6665
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Reporter: Stephan Ewen
>Assignee: Fang Yong
> Fix For: 1.4.0, 1.3.2
>
>
> Currently, the {{RestartStrategy}} is called when the {{ExecutionGraph}} 
> should be restarted.
> To facilitate delays before restarting, the strategy simply sleeps, blocking 
> the thread that runs the ExecutionGraph's recovery method.
> I suggest passing a {{ScheduledExecutorService}} to the {{RestartStrategy}} 
> and letting it schedule the restart call that way, avoiding any sleeps.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (FLINK-6667) Pass a callback type to the RestartStrategy, rather than the full ExecutionGraph

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-6667.
-
   Resolution: Fixed
Fix Version/s: 1.3.2
   1.4.0
 Release Note: 
Fixed in 
  - 1.4.0 via 65400bd0a0f6a3bdd3e0bad05e2179534eaf6e9e
  - 1.3.2 via 65400bd0a0f6a3bdd3e0bad05e2179534eaf6e9e

1.3.2 needed to be fixed because this was a blocker for a critical bug fix

> Pass a callback type to the RestartStrategy, rather than the full 
> ExecutionGraph
> 
>
> Key: FLINK-6667
> URL: https://issues.apache.org/jira/browse/FLINK-6667
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Affects Versions: 1.3.0
>Reporter: Stephan Ewen
>Assignee: Fang Yong
> Fix For: 1.4.0, 1.3.2
>
>
> To let the {{RestartStrategy}} work across multiple {{FailoverStrategy}} 
> implementations, it needs to be passed a "callback" to call to trigger the 
> restart of tasks/regions/etc.
> Such a "callback" would be a nice abstraction to use for global restarts as 
> well, to not expose the full execution graph.
> Ideally, the callback is one-shot, so it cannot accidentally be used to call 
> restart() multiple times.
> This would also make the testing of RestartStrategies much easier.
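A minimal sketch of such a one-shot callback (illustrative names, not the
merged code):

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class OneShotRestartCallback {
    private final Runnable restartAction;
    private final AtomicBoolean used = new AtomicBoolean(false);

    public OneShotRestartCallback(Runnable restartAction) {
        this.restartAction = restartAction;
    }

    /** Triggers the restart at most once; later calls are ignored. */
    public void triggerRestart() {
        if (used.compareAndSet(false, true)) {
            restartAction.run();
        }
    }
}
{code}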



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7195) FlinkKafkaConsumer should not respect fetched partitions to filter restored partition states

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097735#comment-16097735
 ] 

ASF GitHub Bot commented on FLINK-7195:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4344
  
@tzulitai Could you close this PR since it's subsumed by #4357?


> FlinkKafkaConsumer should not respect fetched partitions to filter restored 
> partition states
> 
>
> Key: FLINK-7195
> URL: https://issues.apache.org/jira/browse/FLINK-7195
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.3.1
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.2
>
>
> This issue is a re-appearance of FLINK-6006. On restore, we should not 
> respect any fetched partitions list from Kafka and perform any filtering of 
> the restored partition states. There are corner cases where, due to Kafka 
> broker downtime, some partitions may be missing in the fetched partition 
> list. To be more precise, we actually should not require fetching partitions 
> on restore.
> We've stepped on our own foot again and reintroduced this bug in 
> https://github.com/apache/flink/pull/3378/commits/ed68fedbe90db03823d75a020510ad3c344fa73e.
>  The previous test for this behavior was too implementation-specific, and 
> therefore failed to catch this across different internal implementations.
> We should have a proper unit test for this that does not rely on the internal 
> implementations and test only on public abstractions of 
> {{FlinkKafkaConsumerBase}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4344: (release-1.3) [FLINK-7195] [kafka] Remove partition list ...

2017-07-23 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4344
  
@tzulitai Could you close this PR since it's subsumed by #4357?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-7246) Big latency shown on operator.latency

2017-07-23 Thread yinhua.dai (JIRA)
yinhua.dai created FLINK-7246:
-

 Summary: Big latency shown on operator.latency
 Key: FLINK-7246
 URL: https://issues.apache.org/jira/browse/FLINK-7246
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2.1
 Environment: Local
Reporter: yinhua.dai


I was running Flink 1.2.1, and I have set the metrics reporter to JMX to check 
the latency of my job.

But the result is that the latency I observed is over 100ms even when there is no 
processing in my job.

And then I ran the example SocketWordCount streaming job, and again I saw a 
latency of over 100ms. I am wondering if there is some misconfiguration or 
another problem.

I was using start-local.bat and flink run to start the job, all with default 
configs.

Thank you in advance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4302: (master) [FLINK-7143] [kafka] Stricter tests for determin...

2017-07-23 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4302
  
@tzulitai You can merge this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097740#comment-16097740
 ] 

ASF GitHub Bot commented on FLINK-7143:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4302
  
@tzulitai You can merge this.


> Partition assignment for Kafka consumer is not stable
> -
>
> Key: FLINK-7143
> URL: https://issues.apache.org/jira/browse/FLINK-7143
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.3.1
>Reporter: Steven Zhen Wu
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.2
>
>
> While deploying the Flink 1.3 release to hundreds of routing jobs, we found some 
> issues with partition assignment for the Kafka consumer: some partitions weren't 
> assigned and some partitions got assigned more than once.
> Here is the bug introduced in Flink 1.3. 
> {code}
>   protected static void initializeSubscribedPartitionsToStartOffsets(...) 
> {
> ...
>   for (int i = 0; i < kafkaTopicPartitions.size(); i++) {
>   if (i % numParallelSubtasks == indexOfThisSubtask) {
>   if (startupMode != 
> StartupMode.SPECIFIC_OFFSETS) {
>   
> subscribedPartitionsToStartOffsets.put(kafkaTopicPartitions.get(i), 
> startupMode.getStateSentinel());
>   }
> ...
>  }
> {code}
> The bug is using the array index {{i}} to mod against {{numParallelSubtasks}}. If 
> the {{kafkaTopicPartitions}} has a different order among different subtasks, the 
> assignment is not stable across subtasks and creates the assignment issue 
> mentioned earlier. 
> The fix is also very simple: we should use the partition id to do the mod, {{if 
> (kafkaTopicPartitions.get\(i\).getPartition() % numParallelSubtasks == 
> indexOfThisSubtask)}}. That would result in a stable assignment across subtasks 
> that is independent of the ordering in the array (see the sketch below).
> Marking it as a blocker because of its impact.
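A hedged sketch of the fix described above (an illustration based on the quoted
snippet, not the merged patch):

{code:java}
for (int i = 0; i < kafkaTopicPartitions.size(); i++) {
    KafkaTopicPartition partition = kafkaTopicPartitions.get(i);
    // Mod on the stable partition id instead of the array index, so the
    // assignment no longer depends on the ordering of kafkaTopicPartitions.
    if (partition.getPartition() % numParallelSubtasks == indexOfThisSubtask) {
        if (startupMode != StartupMode.SPECIFIC_OFFSETS) {
            subscribedPartitionsToStartOffsets.put(partition, startupMode.getStateSentinel());
        }
    }
}
{code}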



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-7174) Bump dependency of Kafka 0.10.x to the latest one

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-7174:

Fix Version/s: 1.3.2
   1.4.0

> Bump dependency of Kafka 0.10.x to the latest one
> -
>
> Key: FLINK-7174
> URL: https://issues.apache.org/jira/browse/FLINK-7174
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.2.1, 1.3.1, 1.4.0
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
> Fix For: 1.4.0, 1.3.2
>
>
> We are using a pretty old Kafka version for 0.10. Besides the bug fixes and 
> improvements that were made between 0.10.0.1 and 0.10.2.1, the 0.10.2.1 
> version is more similar to 0.11.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-7174) Bump dependency of Kafka 0.10.x to the latest one

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-7174:

Affects Version/s: 1.4.0
   1.2.1
   1.3.1

> Bump dependency of Kafka 0.10.x to the latest one
> -
>
> Key: FLINK-7174
> URL: https://issues.apache.org/jira/browse/FLINK-7174
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.2.1, 1.3.1, 1.4.0
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
> Fix For: 1.4.0, 1.3.2
>
>
> We are using a pretty old Kafka version for 0.10. Besides the bug fixes and 
> improvements that were made between 0.10.0.1 and 0.10.2.1, the 0.10.2.1 
> version is more similar to 0.11.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (FLINK-6301) Flink KafkaConnector09 leaks memory on reading compressed messages due to a Kafka consumer bug

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-6301.
-
   Resolution: Duplicate
Fix Version/s: (was: 1.4.0)
   (was: 1.2.2)
 Release Note: Duplicate of FLINK-7174

> Flink KafkaConnector09 leaks memory on reading compressed messages due to a 
> Kafka consumer bug
> --
>
> Key: FLINK-6301
> URL: https://issues.apache.org/jira/browse/FLINK-6301
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.2.0, 1.1.3, 1.1.4
>Reporter: Rahul Yadav
>Assignee: Rahul Yadav
> Attachments: jeprof.24611.1228.i1228.heap.svg, 
> jeprof.24611.1695.i1695.heap.svg, jeprof.24611.265.i265.heap.svg, 
> jeprof.24611.3138.i3138.heap.svg, jeprof.24611.595.i595.heap.svg, 
> jeprof.24611.705.i705.heap.svg, jeprof.24611.81.i81.heap.svg, 
> POSTFIX.jeprof.14880.1944.i1944.heap.svg, 
> POSTFIX.jeprof.14880.4129.i4129.heap.svg, 
> POSTFIX.jeprof.14880.961.i961.heap.svg, POSTFIX.jeprof.14880.99.i99.heap.svg, 
> POSTFIX.jeprof.14880.9.i9.heap.svg
>
>
> Hi
> We are running Flink on a standalone cluster with 8 TaskManagers having 8 
> vCPUs and 8 slots each. Each host has 16 GB of RAM.
> In our jobs, 
> # We are consuming gzip compressed messages from Kafka using 
> *KafkaConnector09* and use *rocksDB* backend for checkpoint storage.
> # To debug the leak, we used *jemalloc and jprof* to profile the sources of 
> malloc calls from the java process and attached are the profiles generated at 
> various stages of the job. As we can see, apart from the os.malloc and 
> rocksDB.allocateNewBlock, there are additional malloc calls coming from 
> the inflate() method of java.util.zip.Inflater. These calls are innocuous as long 
> as the Inflater.end() method is called after its use.
> # To look for the sources of the inflate() calls, we used BTrace on the running 
> process to dump the caller stack on the method call. Following is the stack trace 
> we got: 
> {code}
> java.util.zip.Inflater.inflate(Inflater.java)
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
> java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
> java.io.DataInputStream.readFully(DataInputStream.java:195)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:253)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.innerDone(MemoryRecords.java:282)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:233)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.clients.consumer.internals.Fetcher.handleFetchResponse(Fetcher.java:563)
> org.apache.kafka.clients.consumer.internals.Fetcher.access$000(Fetcher.java:69)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:139)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:136)
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:227)
> {code}
> The end() method on Inflater is called inside the close() method of 
> *InflaterInputStream* (extended by *GZIPInputStream*) but looking through the 
> Kafka consumer code, we found that RecordsIterator is not closing the 
> compressor stream after use and hence, causing the memory leak:
> 

[jira] [Closed] (FLINK-6301) Flink KafkaConnector09 leaks memory on reading compressed messages due to a Kafka consumer bug

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-6301.
---

> Flink KafkaConnector09 leaks memory on reading compressed messages due to a 
> Kafka consumer bug
> --
>
> Key: FLINK-6301
> URL: https://issues.apache.org/jira/browse/FLINK-6301
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.2.0, 1.1.3, 1.1.4
>Reporter: Rahul Yadav
>Assignee: Rahul Yadav
> Attachments: jeprof.24611.1228.i1228.heap.svg, 
> jeprof.24611.1695.i1695.heap.svg, jeprof.24611.265.i265.heap.svg, 
> jeprof.24611.3138.i3138.heap.svg, jeprof.24611.595.i595.heap.svg, 
> jeprof.24611.705.i705.heap.svg, jeprof.24611.81.i81.heap.svg, 
> POSTFIX.jeprof.14880.1944.i1944.heap.svg, 
> POSTFIX.jeprof.14880.4129.i4129.heap.svg, 
> POSTFIX.jeprof.14880.961.i961.heap.svg, POSTFIX.jeprof.14880.99.i99.heap.svg, 
> POSTFIX.jeprof.14880.9.i9.heap.svg
>
>
> Hi
> We are running Flink on a standalone cluster with 8 TaskManagers having 8 
> vCPUs and 8 slots each. Each host has 16 GB of RAM.
> In our jobs, 
> # We are consuming gzip compressed messages from Kafka using 
> *KafkaConnector09* and use *rocksDB* backend for checkpoint storage.
> # To debug the leak, we used *jemalloc and jprof* to profile the sources of 
> malloc calls from the java process and attached are the profiles generated at 
> various stages of the job. As we can see, apart from the os.malloc and 
> rocksDB.allocateNewBlock, there are additional malloc calls coming from 
> the inflate() method of java.util.zip.Inflater. These calls are innocuous as long 
> as the Inflater.end() method is called after its use.
> # To look for the sources of the inflate() calls, we used BTrace on the running 
> process to dump the caller stack on the method call. Following is the stack trace 
> we got: 
> {code}
> java.util.zip.Inflater.inflate(Inflater.java)
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
> java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
> java.io.DataInputStream.readFully(DataInputStream.java:195)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:253)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.innerDone(MemoryRecords.java:282)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:233)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.clients.consumer.internals.Fetcher.handleFetchResponse(Fetcher.java:563)
> org.apache.kafka.clients.consumer.internals.Fetcher.access$000(Fetcher.java:69)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:139)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:136)
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:227)
> {code}
> The end() method on Inflater is called inside the close() method of 
> *InflaterInputStream* (extended by *GZIPInputStream*) but looking through the 
> Kafka consumer code, we found that RecordsIterator is not closing the 
> compressor stream after use and hence, causing the memory leak:
> https://github.com/apache/kafka/blob/23c69d62a0cabf06c4db8b338f0ca824dc6d81a7/clients/src/main/java/org/apache/kafka/common/record/MemoryRecords.java#L210
> 
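A hedged illustration of the remedy (not Kafka's code): closing an
{{InflaterInputStream}}/{{GZIPInputStream}} calls {{Inflater.end()}}, which
releases the native zlib memory that otherwise leaks.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class GzipCloseSketch {
    static long countBytes(byte[] gzipped) throws IOException {
        // try-with-resources guarantees close() -> Inflater.end(), which is
        // exactly what the RecordsIterator was missing.
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            long n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        }
    }
}
{code}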

[jira] [Resolved] (FLINK-7105) Make ActorSystem creation per default non-daemonic

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-7105.
-
   Resolution: Fixed
Fix Version/s: 1.4.0
 Release Note: Fixed in 02bf80cf7108253dfc3444fd3fbdeda79fabe333

> Make ActorSystem creation per default non-daemonic
> --
>
> Key: FLINK-7105
> URL: https://issues.apache.org/jira/browse/FLINK-7105
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
> Fix For: 1.4.0
>
>
> At the moment, we create all {{ActorSystems}} with the setting 
> {{daemonic=on}}. This has the consequence that we have to wait in the main 
> thread on the {{ActorSystem's}} termination. By making the {{ActorSystems}} 
> non-daemonic, we could get rid of this artifact, especially since we have the 
> {{ProcessReapers}}, which terminate the process once a registered actor 
> terminates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (FLINK-7105) Make ActorSystem creation per default non-daemonic

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-7105.
---

> Make ActorSystem creation per default non-daemonic
> --
>
> Key: FLINK-7105
> URL: https://issues.apache.org/jira/browse/FLINK-7105
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
> Fix For: 1.4.0
>
>
> At the moment, we create all {{ActorSystems}} with the setting 
> {{daemonic=on}}. This has the consequence that we have to wait in the main 
> thread on the {{ActorSystem's}} termination. By making the {{ActorSystems}} 
> non-daemonic, we could get rid of this artifact, especially since we have the 
> {{ProcessReapers}}, which terminate the process once a registered actor 
> terminates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6667) Pass a callback type to the RestartStrategy, rather than the full ExecutionGraph

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097641#comment-16097641
 ] 

ASF GitHub Bot commented on FLINK-6667:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4277
  
Thank you for the contribution!
The code from this PR has been improved and merged in this commit: 
65400bd0a0f6a3bdd3e0bad05e2179534eaf6e9e

If you agree, could you close this PR?


> Pass a callback type to the RestartStrategy, rather than the full 
> ExecutionGraph
> 
>
> Key: FLINK-6667
> URL: https://issues.apache.org/jira/browse/FLINK-6667
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Affects Versions: 1.3.0
>Reporter: Stephan Ewen
>Assignee: Fang Yong
> Fix For: 1.4.0, 1.3.2
>
>
> To let the {{RestartStrategy}} work across multiple {{FailoverStrategy}} 
> implementations, it needs to be passed a "callback" to call to trigger the 
> restart of tasks/regions/etc.
> Such a "callback" would be a nice abstraction to use for global restarts as 
> well, to not expose the full execution graph.
> Ideally, the callback is one-shot, so it cannot accidentally be used to call 
> restart() multiple times.
> This would also make the testing of RestartStrategies much easier.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6665) Pass a ScheduledExecutorService to the RestartStrategy

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097642#comment-16097642
 ] 

ASF GitHub Bot commented on FLINK-6665:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4220
  
Thank you for the contribution!
The code from this PR has been improved and merged in this commit: 
65400bd0a0f6a3bdd3e0bad05e2179534eaf6e9e

If you agree, could you close this PR?


> Pass a ScheduledExecutorService to the RestartStrategy
> --
>
> Key: FLINK-6665
> URL: https://issues.apache.org/jira/browse/FLINK-6665
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Reporter: Stephan Ewen
>Assignee: Fang Yong
> Fix For: 1.4.0, 1.3.2
>
>
> Currently, the {{RestartStrategy}} is called when the {{ExecutionGraph}} 
> should be restarted.
> To facilitate delays before restarting, the strategy simply sleeps, blocking 
> the thread that runs the ExecutionGraph's recovery method.
> I suggest passing a {{ScheduledExecutorService}} to the {{RestartStrategy}} 
> and letting it schedule the restart call that way, avoiding any sleeps.
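
A hedged sketch of the non-blocking variant (class and method names are
illustrative, not Flink's actual API):

{code}
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class FixedDelayRestartStrategy {
    private final ScheduledExecutorService scheduler;
    private final long delayMillis;

    FixedDelayRestartStrategy(ScheduledExecutorService scheduler, long delayMillis) {
        this.scheduler = scheduler;
        this.delayMillis = delayMillis;
    }

    void restart(Runnable restartAction) {
        // Schedule the restart instead of sleeping; the thread that runs the
        // ExecutionGraph's recovery method returns immediately.
        scheduler.schedule(restartAction, delayMillis, TimeUnit.MILLISECONDS);
    }
}
{code}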



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4220: [FLINK-6665] Pass a ScheduledExecutorService to the Resta...

2017-07-23 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4220
  
Thank you for the contribution!
The code from this PR has been improved and merged in this commit: 
65400bd0a0f6a3bdd3e0bad05e2179534eaf6e9e

If you agree, could you close this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-6667) Pass a callback type to the RestartStrategy, rather than the full ExecutionGraph

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-6667.
---

> Pass a callback type to the RestartStrategy, rather than the full 
> ExecutionGraph
> 
>
> Key: FLINK-6667
> URL: https://issues.apache.org/jira/browse/FLINK-6667
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Affects Versions: 1.3.0
>Reporter: Stephan Ewen
>Assignee: Fang Yong
> Fix For: 1.4.0, 1.3.2
>
>
> To let the {{RestartStrategy}} work across multiple {{FailoverStrategy}} 
> implementations, it needs to be passed a "callback" that it can invoke to 
> trigger the restart of tasks/regions/etc.
> Such a "callback" would be a nice abstraction to use for global restarts as 
> well, to not expose the full execution graph.
> Ideally, the callback is one-shot, so it cannot accidentally be used to call 
> restart() multiple times.
> This would also make the testing of RestartStrategies much easier.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4277: [FLINK-6667] Pass a callback type to the RestartStrategy,...

2017-07-23 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4277
  
Thank you for the contribution!
The code from this PR has been improved and merged in this commit: 
65400bd0a0f6a3bdd3e0bad05e2179534eaf6e9e

If you agree, could you close this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-7242) Drop Java 7 Support

2017-07-23 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-7242:
--
Component/s: Build System

> Drop Java 7 Support
> ---
>
> Key: FLINK-7242
> URL: https://issues.apache.org/jira/browse/FLINK-7242
> Project: Flink
>  Issue Type: Task
>  Components: Build System
>Reporter: Eron Wright 
>Priority: Critical
> Fix For: 1.4.0
>
>
> This is the umbrella issue for dropping Java 7 support.   The decision was 
> taken following a vote 
> [here|http://mail-archives.apache.org/mod_mbox/flink-dev/201707.mbox/%3CCANC1h_tawd90CU12v%2BfQ%2BQU2ORsh%3Dnob7AehT11jGHs1g5Hqtg%40mail.gmail.com%3E]
>  and announced 
> [here|http://mail-archives.apache.org/mod_mbox/flink-dev/201707.mbox/%3CCANC1h_vnxpiBnAB0OmQPD6NMH6L_PLCyWYsX32mZ0H%2BXP3%2BheQ%40mail.gmail.com%3E].
>   Reasons cited include new language features and compatibility with Akka 2.4 
> and Scala 2.12.
> Please open sub-tasks as necessary.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7174) Bump dependency of Kafka 0.10.x to the latest one

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097708#comment-16097708
 ] 

ASF GitHub Bot commented on FLINK-7174:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4321


> Bump dependency of Kafka 0.10.x to the latest one
> -
>
> Key: FLINK-7174
> URL: https://issues.apache.org/jira/browse/FLINK-7174
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.2.1, 1.3.1, 1.4.0
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
> Fix For: 1.4.0, 1.3.2
>
>
> We are using a pretty old Kafka version for 0.10. Besides the bug fixes and 
> improvements that were made between 0.10.0.1 and 0.10.2.1, the 0.10.2.1 
> version is more similar to 0.11.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4321: [FLINK-7174] Bump Kafka 0.10 dependency to 0.10.2....

2017-07-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4321


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-7241) Fix YARN high availability documentation

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097635#comment-16097635
 ] 

ASF GitHub Bot commented on FLINK-7241:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4382
  
I would go a bit further in updating this:

  - The ZooKeeper root node has a default value of `/flink` (I think). You 
only need to set this when you want to organize your ZooKeeper node tree in a 
specific way.

  - The cluster-id docs should be changed to say that you *do not need to 
set this manually* in Yarn / Mesos mode and that it generates a new subtree per 
launch. You only need to set this if you want to manually recover an earlier HA 
job (which is different from re-launching the job from a savepoint or 
externalized checkpoint)



> Fix YARN high availability documentation
> 
>
> Key: FLINK-7241
> URL: https://issues.apache.org/jira/browse/FLINK-7241
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, YARN
>Affects Versions: 1.3.0, 1.3.1
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 1.3.2
>
>
> The documentation (jobmanager_high_availability.md) incorrectly suggests this 
> configuration template when running on YARN:
> {code}
> high-availability: zookeeper
> high-availability.zookeeper.quorum: localhost:2181
> high-availability.zookeeper.storageDir: hdfs:///flink/recovery
> high-availability.zookeeper.path.root: /flink
> high-availability.zookeeper.path.namespace: /cluster_one # important: 
> customize per cluster
> yarn.application-attempts: 10
> {code}
> while above it says that the namespace should not be set on YARN because it 
> will be automatically generated.
> Also, the documentation still refers to {{namespace}} while this has been 
> renamed to {{cluster-id}}.
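
Putting those points together, a corrected template might look roughly like this
(hedged sketch; key names as in the 1.3 configuration docs, with the cluster id
left unset so that it is generated per launch):

{code}
high-availability: zookeeper
high-availability.zookeeper.quorum: localhost:2181
high-availability.zookeeper.storageDir: hdfs:///flink/recovery
# high-availability.zookeeper.path.root defaults to /flink; only set it to
# organize the ZooKeeper node tree in a specific way.
# high-availability.cluster-id is generated automatically on YARN/Mesos; only
# set it to manually recover an earlier HA job.
yarn.application-attempts: 10
{code}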



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4382: [FLINK-7241] Fix YARN high availability documentation

2017-07-23 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4382
  
I would go a bit further in updating this:

  - The ZooKeeper root node has a default value of `/flink` (I think). You 
only need to set this when you want to organize your ZooKeeper node tree in a 
specific way.

  - The cluster-id docs should be changed to say that you *do not need to 
set this manually* in Yarn / Mesos mode and that it generates a new subtree per 
launch. You only need to set this if you want to manually recover an earlier HA 
job (which is different from re-launching the job from a savepoint or 
externalized checkpoint)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-7241) Fix YARN high availability documentation

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097639#comment-16097639
 ] 

ASF GitHub Bot commented on FLINK-7241:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/4382
  
Thanks  Will update


> Fix YARN high availability documentation
> 
>
> Key: FLINK-7241
> URL: https://issues.apache.org/jira/browse/FLINK-7241
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, YARN
>Affects Versions: 1.3.0, 1.3.1
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 1.3.2
>
>
> The documentation (jobmanager_high_availability.md) incorrectly suggests this 
> configuration template when running on YARN:
> {code}
> high-availability: zookeeper
> high-availability.zookeeper.quorum: localhost:2181
> high-availability.zookeeper.storageDir: hdfs:///flink/recovery
> high-availability.zookeeper.path.root: /flink
> high-availability.zookeeper.path.namespace: /cluster_one # important: 
> customize per cluster
> yarn.application-attempts: 10
> {code}
> while above it says that the namespace should not be set on YARN because it 
> will be automatically generated.
> Also, the documentation still refers to {{namespace}} while this has been 
> renamed to {{cluster-id}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4382: [FLINK-7241] Fix YARN high availability documentation

2017-07-23 Thread aljoscha
Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/4382
  
Thanks 👌 Will update


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-7231) SlotSharingGroups are not always released in time for new restarts

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-7231.
---

> SlotSharingGroups are not always released in time for new restarts
> --
>
> Key: FLINK-7231
> URL: https://issues.apache.org/jira/browse/FLINK-7231
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2
>
>
> In the case where there are not enough resources to schedule the streaming 
> program, a race condition can lead to a sequence of the following errors:
> {code}
> java.lang.IllegalStateException: SlotSharingGroup cannot clear task 
> assignment, group still has allocated resources.
> {code}
> This eventually recovers, but may involve many fast restart attempts before 
> doing so.
> The root cause is that slots are not cleared before the next restart attempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (FLINK-7231) SlotSharingGroups are not always released in time for new restarts

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-7231.
-
  Resolution: Fixed
Release Note: 
Fixed in
  - 1.4.0 via 605319b550aeba5612b0e32fa193521081b7adc5
  - 1.3.2 via 39f5b1144167dcb80e8708f4cb5426e76f648026

> SlotSharingGroups are not always released in time for new restarts
> --
>
> Key: FLINK-7231
> URL: https://issues.apache.org/jira/browse/FLINK-7231
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2
>
>
> In the case where there are not enough resources to schedule the streaming 
> program, a race condition can lead to a sequence of the following errors:
> {code}
> java.lang.IllegalStateException: SlotSharingGroup cannot clear task 
> assignment, group still has allocated resources.
> {code}
> This eventually recovers, but may involve many fast restart attempts before 
> doing so.
> The root cause is that slots are not cleared before the next restart attempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7174) Bump dependency of Kafka 0.10.x to the latest one

2017-07-23 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097767#comment-16097767
 ] 

Stephan Ewen commented on FLINK-7174:
-

Merged for {{1.4.0}} in 02850545e3143600c7265e737e278663e3264317

Issue is pending backport of the change to the release branch for {{1.3.2}}

> Bump dependency of Kafka 0.10.x to the latest one
> -
>
> Key: FLINK-7174
> URL: https://issues.apache.org/jira/browse/FLINK-7174
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.2.1, 1.3.1, 1.4.0
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
> Fix For: 1.4.0, 1.3.2
>
>
> We are using a pretty old Kafka version for 0.10. Besides the bug fixes and 
> improvements that were made between 0.10.0.1 and 0.10.2.1, the 0.10.2.1 
> version is more similar to 0.11.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (FLINK-7225) Cutoff exception message in StateDescriptor

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-7225.
---

> Cutoff exception message in StateDescriptor
> ---
>
> Key: FLINK-7225
> URL: https://issues.apache.org/jira/browse/FLINK-7225
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Stephan Ewen
> Fix For: 1.4.0, 1.3.2
>
>
> When the type extraction fails in the StateDescriptor constructor an 
> exception is thrown, but the message is cutoff and doesn't contain any advice 
> to remedy the situation.
> {code}
> try {
>     this.typeInfo = TypeExtractor.createTypeInfo(type);
> } catch (Exception e) {
>     throw new RuntimeException("Cannot create full type information based on the given class. If the type has generics, please", e);
> }
> {code}
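
The shape of the fix is presumably to complete the message and name a remedy; a
hedged sketch (the exact wording in the fix commit may differ):

{code}
try {
    this.typeInfo = TypeExtractor.createTypeInfo(type);
} catch (Exception e) {
    throw new RuntimeException(
        "Cannot create full type information based on the given class. " +
        "If the type has generics, please use a TypeHint instead.", e);
}
{code}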



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (FLINK-7225) Cutoff exception message in StateDescriptor

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-7225.
-
  Resolution: Fixed
Assignee: Stephan Ewen
Release Note: 
Fixed in
  - 1.4.0 via 3c756085375a003c7fbf8d477924f5b17efcb115
  - 1.3.2 via 618d544491664e9fb0e67d6e95596895cdc9d56d

> Cutoff exception message in StateDescriptor
> ---
>
> Key: FLINK-7225
> URL: https://issues.apache.org/jira/browse/FLINK-7225
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Stephan Ewen
> Fix For: 1.4.0, 1.3.2
>
>
> When the type extraction fails in the StateDescriptor constructor an 
> exception is thrown, but the message is cutoff and doesn't contain any advice 
> to remedy the situation.
> {code}
> try {
>     this.typeInfo = TypeExtractor.createTypeInfo(type);
> } catch (Exception e) {
>     throw new RuntimeException("Cannot create full type information based on the given class. If the type has generics, please", e);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (FLINK-7216) ExecutionGraph can perform concurrent global restarts to scheduling

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-7216.
-
  Resolution: Fixed
Release Note: 
Fixed in 
  - 1.4.0 via 74a6cbab4e736cdb353d100cdd29f51809325796
  - 1.3.2 via e6348fbde1fc0ee8ea682063a4d6503ba3b68864

> ExecutionGraph can perform concurrent global restarts to scheduling
> ---
>
> Key: FLINK-7216
> URL: https://issues.apache.org/jira/browse/FLINK-7216
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.2.1, 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2
>
>
> Because ExecutionGraph restarts happen asynchronously and possibly delayed, 
> it can happen in rare corner cases that two restarts are attempted 
> concurrently, in which case some structures on the Execution Graph undergo a 
> concurrent access:
> Sample stack trace:
> {code}
> WARN  org.apache.flink.runtime.executiongraph.ExecutionGraph - Failed to restart the job.
> java.lang.IllegalStateException: SlotSharingGroup cannot clear task assignment, group still has allocated resources.
>     at org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroup.clearTaskAssignment(SlotSharingGroup.java:78)
>     at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:535)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1151)
>     at org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestarter$1.call(ExecutionGraphRestarter.java:40)
>     at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:95)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> The solution is to strictly guard against "subsumed" restarts via the 
> {{globalModVersion}} in a similar way as we fence local restarts against 
> global restarts.
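
An illustrative sketch of that fencing idea (hedged; simplified, not Flink's
actual code):

{code}
import java.util.concurrent.atomic.AtomicLong;

class RestartFencing {
    private final AtomicLong globalModVersion = new AtomicLong(0);

    // Each new global restart bumps the version, invalidating all restarts
    // that were scheduled under an older version.
    long startGlobalRestart() {
        return globalModVersion.incrementAndGet();
    }

    // A scheduled restart only executes if it is still the current one;
    // otherwise it has been subsumed by a newer restart and is dropped.
    boolean executeIfCurrent(long expectedVersion, Runnable restartAction) {
        if (globalModVersion.get() != expectedVersion) {
            return false;
        }
        restartAction.run();
        return true;
    }
}
{code}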



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (FLINK-7216) ExecutionGraph can perform concurrent global restarts to scheduling

2017-07-23 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-7216.
---

> ExecutionGraph can perform concurrent global restarts to scheduling
> ---
>
> Key: FLINK-7216
> URL: https://issues.apache.org/jira/browse/FLINK-7216
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.2.1, 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2
>
>
> Because ExecutionGraph restarts happen asynchronously and possibly delayed, 
> it can happen in rare corner cases that two restarts are attempted 
> concurrently, in which case some structures on the Execution Graph undergo 
> concurrent access:
> Sample stack trace:
> {code}
> WARN  org.apache.flink.runtime.executiongraph.ExecutionGraph - Failed to restart the job.
> java.lang.IllegalStateException: SlotSharingGroup cannot clear task assignment, group still has allocated resources.
>     at org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroup.clearTaskAssignment(SlotSharingGroup.java:78)
>     at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:535)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1151)
>     at org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestarter$1.call(ExecutionGraphRestarter.java:40)
>     at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:95)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> The solution is to strictly guard against "subsumed" restarts via the 
> {{globalModVersion}} in a similar way as we fence local restarts against 
> global restarts.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097916#comment-16097916
 ] 

ASF GitHub Bot commented on FLINK-7143:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4302


> Partition assignment for Kafka consumer is not stable
> -
>
> Key: FLINK-7143
> URL: https://issues.apache.org/jira/browse/FLINK-7143
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.3.1
>Reporter: Steven Zhen Wu
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.2
>
>
> while deploying Flink 1.3 release to hundreds of routing jobs, we found some 
> issues with partition assignment for Kafka consumer. some partitions weren't 
> assigned and some partitions got assigned more than once.
> Here is the bug introduced in Flink 1.3. 
> {code}
> protected static void initializeSubscribedPartitionsToStartOffsets(...) {
>     ...
>     for (int i = 0; i < kafkaTopicPartitions.size(); i++) {
>         if (i % numParallelSubtasks == indexOfThisSubtask) {
>             if (startupMode != StartupMode.SPECIFIC_OFFSETS) {
>                 subscribedPartitionsToStartOffsets.put(kafkaTopicPartitions.get(i), startupMode.getStateSentinel());
>             }
>     ...
> }
> {code}
> The bug is using the array index {{i}} to mod against {{numParallelSubtasks}}. If 
> {{kafkaTopicPartitions}} has a different order among different subtasks, the 
> assignment is not stable across subtasks, which creates the assignment issue 
> mentioned earlier.
> The fix is also very simple: we should use the partition id to do the mod, {{if 
> (kafkaTopicPartitions.get(i).getPartition() % numParallelSubtasks == 
> indexOfThisSubtask)}}. That results in a stable assignment across subtasks 
> that is independent of the ordering in the array.
> Marking this as a blocker because of its impact.
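
A hedged sketch contrasting the two rules (illustrative names, not Flink's
actual code):

{code}
class PartitionAssignment {
    // Buggy rule: depends on the position of the partition in the list, which
    // may differ between subtasks.
    static boolean assignedByIndex(int i, int numParallelSubtasks, int indexOfThisSubtask) {
        return i % numParallelSubtasks == indexOfThisSubtask;
    }

    // Fixed rule: keyed on the partition id itself, so every subtask computes
    // the same owner regardless of list ordering.
    static boolean assignedByPartitionId(int partitionId, int numParallelSubtasks, int indexOfThisSubtask) {
        return partitionId % numParallelSubtasks == indexOfThisSubtask;
    }
}
{code}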



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4310: [misc] Commit read offsets in Kafka integration te...

2017-07-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4310


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #4375: [Flink-6365][kinesis-connector] Adapt default valu...

2017-07-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4375


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #4302: (master) [FLINK-7143] [kafka] Stricter tests for d...

2017-07-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4302


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-7224) Incorrect Javadoc description in all Kafka consumer versions

2017-07-23 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097917#comment-16097917
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-7224:


Fixed for {{master}} via 58b53748293c160c28c7f9d08c3a0ad23152d34f.
Fixed for {{release-1.3}} via cf70255d427c4f16cb5cda008952d28f20655da1.

> Incorrect Javadoc description in all Kafka consumer versions
> 
>
> Key: FLINK-7224
> URL: https://issues.apache.org/jira/browse/FLINK-7224
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.4.0, 1.3.2
>
>
> Currently, all Kafka consumer versions still have this in their Javadoc:
> {code}
> The implementation currently accesses partition metadata when the consumer
> is constructed. That means that the client that submits the program needs to 
> be able to
> reach the Kafka brokers or ZooKeeper.
> {code}
> This is also the case for the documentation:
> {code}
> The current FlinkKafkaConsumer implementation will establish a connection 
> from the client (when calling the constructor) for querying the list of 
> topics and partitions.
> For this to work, the consumer needs to be able to access the consumers from 
> the machine submitting the job to the Flink cluster. If you experience any 
> issues with the Kafka consumer on the client side, the client log might 
> contain information about failed requests, etc.
> {code}
> These statements are no longer true starting from Flink 1.3; fetching the 
> partition metadata happens only in {{open()}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (FLINK-7224) Incorrect Javadoc description in all Kafka consumer versions

2017-07-23 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-7224.
--
   Resolution: Fixed
Fix Version/s: 1.3.2
   1.4.0

> Incorrect Javadoc description in all Kafka consumer versions
> 
>
> Key: FLINK-7224
> URL: https://issues.apache.org/jira/browse/FLINK-7224
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.4.0, 1.3.2
>
>
> Currently, all Kafka consumer versions still have this in their Javadoc:
> {code}
> The implementation currently accesses partition metadata when the consumer
> is constructed. That means that the client that submits the program needs to 
> be able to
> reach the Kafka brokers or ZooKeeper.
> {code}
> This is also the case for the documentation:
> {code}
> The current FlinkKafkaConsumer implementation will establish a connection 
> from the client (when calling the constructor) for querying the list of 
> topics and partitions.
> For this to work, the consumer needs to be able to access the consumers from 
> the machine submitting the job to the Flink cluster. If you experience any 
> issues with the Kafka consumer on the client side, the client log might 
> contain information about failed requests, etc.
> {code}
> These statements are no longer true starting from Flink 1.3; fetching the 
> partition metadata happens only in {{open()}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4363: [FLINK-7224] [kafka, docs] Fix incorrect Javadoc /...

2017-07-23 Thread tzulitai
Github user tzulitai closed the pull request at:

https://github.com/apache/flink/pull/4363


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4015: [FLINK-6301] [flink-connector-kafka-0.10] Upgrading Kafka...

2017-07-23 Thread vidhu5269
Github user vidhu5269 commented on the issue:

https://github.com/apache/flink/pull/4015
  
@StephanEwen : Oh, ok. Didn't know the policy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6365) Adapt default values of the Kinesis connector

2017-07-23 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097956#comment-16097956
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-6365:


Thanks for the discussions and contribution [~sthm] and [~phoenixjiangnan]!
Fixed for master via 35564f25c844b827ce325453b5d518416e1bd5a8.

> Adapt default values of the Kinesis connector
> -
>
> Key: FLINK-6365
> URL: https://issues.apache.org/jira/browse/FLINK-6365
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.2.0
>Reporter: Steffen Hausmann
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0, 1.3.2
>
>
> As discussed in 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-connector-SHARD-GETRECORDS-MAX-default-value-td12332.html,
>  it seems reasonable to change the default values of the Kinesis connector to 
> follow KCL’s default settings. I suggest to adapt at least the values for 
> SHARD_GETRECORDS_MAX and SHARD_GETRECORDS_INTERVAL_MILLIS. 
> As a Kinesis shard is currently limited to 5 get operations per second, you 
> can observe high ReadProvisionedThroughputExceeded rates with the current 
> default value for SHARD_GETRECORDS_INTERVAL_MILLIS of 0; it seems reasonable 
> to increase it to 200. As described in the email thread, it seems 
> furthermore desirable to increase the default value for SHARD_GETRECORDS_MAX 
> to 1.
> The values that are used by the KCL can be found here: 
> https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/KinesisClientLibConfiguration.java
> Thanks for looking into this!
> Steffen
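
Until the defaults change, the values can be overridden explicitly; a hedged
example (constant names assumed from the Kinesis connector's
ConsumerConfigConstants, and 10000 assumed as KCL's batch-size default per the
linked KinesisClientLibConfiguration):

{code}
import java.util.Properties;

import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

class KinesisDefaultsExample {
    static Properties consumerConfig() {
        Properties props = new Properties();
        // 5 GetRecords calls/sec per shard -> a 200 ms poll interval stays within the limit.
        props.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS, "200");
        // Assumed KCL default batch size; verify against KinesisClientLibConfiguration.
        props.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_MAX, "10000");
        return props;
    }
}
{code}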



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (FLINK-6365) Adapt default values of the Kinesis connector

2017-07-23 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-6365.
--
Resolution: Fixed

> Adapt default values of the Kinesis connector
> -
>
> Key: FLINK-6365
> URL: https://issues.apache.org/jira/browse/FLINK-6365
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.2.0
>Reporter: Steffen Hausmann
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> As discussed in 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-connector-SHARD-GETRECORDS-MAX-default-value-td12332.html,
>  it seems reasonable to change the default values of the Kinesis connector to 
> follow KCL’s default settings. I suggest to adapt at least the values for 
> SHARD_GETRECORDS_MAX and SHARD_GETRECORDS_INTERVAL_MILLIS. 
> As a Kinesis shard is currently limited to 5 get operations per second, you 
> can observe high ReadProvisionedThroughputExceeded rates with the current 
> default value for SHARD_GETRECORDS_INTERVAL_MILLIS of 0; it seems reasonable 
> to increase it to 200. As described in the email thread, it seems 
> furthermore desirable to increase the default value for SHARD_GETRECORDS_MAX 
> to 1.
> The values that are used by the KCL can be found here: 
> https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/KinesisClientLibConfiguration.java
> Thanks for looking into this!
> Steffen



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7174) Bump dependency of Kafka 0.10.x to the latest one

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097936#comment-16097936
 ] 

ASF GitHub Bot commented on FLINK-7174:
---

GitHub user tzulitai opened a pull request:

https://github.com/apache/flink/pull/4386

(backport-1.3) [FLINK-7174] [kafka] Bump Kafka 0.10 dependency to 0.10.2.1

Backport of #4321 to `release-1.3`, with the following differences:
1. No need to touch `KafkaConsumerThread`, because in 1.3 the code in 
`KafkaConsumerThread` will only ever be reached if there are partitions to 
subscribe to (and therefore would not run into the changed exception behaviour 
described in #4321).
2. Some touched tests and classes do not exist in 1.3 (e.g. the partition 
reassignment tests and `AbstractPartitionDiscoverer`) and are therefore not 
relevant for the backport.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tzulitai/flink FLINK-7174-flink13

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4386


commit 9653b60974f68b14fd4be5ee5b9f0f687b764bdb
Author: Piotr Nowojski 
Date:   2017-07-13T09:07:28Z

[FLINK-7174] [kafka connector] Bump Kafka 0.10 dependency to 0.10.2.1

This closes #4321




> Bump dependency of Kafka 0.10.x to the latest one
> -
>
> Key: FLINK-7174
> URL: https://issues.apache.org/jira/browse/FLINK-7174
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.2.1, 1.3.1, 1.4.0
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
> Fix For: 1.4.0, 1.3.2
>
>
> We are using a pretty old Kafka version for 0.10. Besides the bug fixes and 
> improvements that were made between 0.10.0.1 and 0.10.2.1, the 0.10.2.1 
> version is more similar to 0.11.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4386: (backport-1.3) [FLINK-7174] [kafka] Bump Kafka 0.1...

2017-07-23 Thread tzulitai
GitHub user tzulitai opened a pull request:

https://github.com/apache/flink/pull/4386

(backport-1.3) [FLINK-7174] [kafka] Bump Kafka 0.10 dependency to 0.10.2.1

Backport of #4321 to `release-1.3`, with the following differences:
1. No need to touch `KafkaConsumerThread`, because in 1.3 the code in 
`KafkaConsumerThread` will only ever be reached if there are partitions to 
subscribe to (and therefore would not run into the changed exception behaviour 
described in #4321).
2. Some touched tests and classes do not exist in 1.3 (e.g. the partition 
reassignment tests and `AbstractPartitionDiscoverer`) and are therefore not 
relevant for the backport.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tzulitai/flink FLINK-7174-flink13

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4386


commit 9653b60974f68b14fd4be5ee5b9f0f687b764bdb
Author: Piotr Nowojski 
Date:   2017-07-13T09:07:28Z

[FLINK-7174] [kafka connector] Bump Kafka 0.10 dependency to 0.10.2.1

This closes #4321




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4015: [FLINK-6301] [flink-connector-kafka-0.10] Upgrading Kafka...

2017-07-23 Thread vidhu5269
Github user vidhu5269 commented on the issue:

https://github.com/apache/flink/pull/4015
  
Closing this pull request as the change mentioned here is already covered 
as part of #4321.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6301) Flink KafkaConnector09 leaks memory on reading compressed messages due to a Kafka consumer bug

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097928#comment-16097928
 ] 

ASF GitHub Bot commented on FLINK-6301:
---

Github user vidhu5269 commented on the issue:

https://github.com/apache/flink/pull/4015
  
@StephanEwen : Oh, ok. Didn't know the policy.


> Flink KafkaConnector09 leaks memory on reading compressed messages due to a 
> Kafka consumer bug
> --
>
> Key: FLINK-6301
> URL: https://issues.apache.org/jira/browse/FLINK-6301
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.2.0, 1.1.3, 1.1.4
>Reporter: Rahul Yadav
>Assignee: Rahul Yadav
> Attachments: jeprof.24611.1228.i1228.heap.svg, 
> jeprof.24611.1695.i1695.heap.svg, jeprof.24611.265.i265.heap.svg, 
> jeprof.24611.3138.i3138.heap.svg, jeprof.24611.595.i595.heap.svg, 
> jeprof.24611.705.i705.heap.svg, jeprof.24611.81.i81.heap.svg, 
> POSTFIX.jeprof.14880.1944.i1944.heap.svg, 
> POSTFIX.jeprof.14880.4129.i4129.heap.svg, 
> POSTFIX.jeprof.14880.961.i961.heap.svg, POSTFIX.jeprof.14880.99.i99.heap.svg, 
> POSTFIX.jeprof.14880.9.i9.heap.svg
>
>
> Hi
> We are running Flink on a standalone cluster with 8 TaskManagers having 8 
> vCPUs and 8 slots each. Each host has 16 GB of RAM.
> In our jobs, 
> # We are consuming gzip compressed messages from Kafka using 
> *KafkaConnector09* and use *rocksDB* backend for checkpoint storage.
> # To debug the leak, we used *jemalloc and jprof* to profile the sources of 
> malloc calls from the java process and attached are the profiles generated at 
> various stages of the job. As we can see, apart from the os.malloc and 
> rocksDB.allocateNewBlock, there are additional malloc calls coming from 
> inflate() method of java.util.zip.inflater. These calls are innocuous as long 
> as the inflater.end() method is called after its use.
> # To look for sources of inflate() method, we used Btrace on the running 
> process to dump caller stack on the method call. Following is the stackTrace 
> we got: 
> {code}
> java.util.zip.Inflater.inflate(Inflater.java)
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
> java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
> java.io.DataInputStream.readFully(DataInputStream.java:195)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:253)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.innerDone(MemoryRecords.java:282)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:233)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.clients.consumer.internals.Fetcher.handleFetchResponse(Fetcher.java:563)
> org.apache.kafka.clients.consumer.internals.Fetcher.access$000(Fetcher.java:69)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:139)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:136)
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:227)
> {code}
> The end() method on Inflater is called inside the close() method of 
> *InflaterInputStream* (extended by *GZIPInputStream*) but looking through the 
> Kafka consumer code, we found that RecordsIterator is not closing the 
> compressor stream after use and hence, causing the memory leak:
> 
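
The leak-free pattern implied above is to close the decompressor stream, so that
InflaterInputStream.close() calls Inflater.end() and releases the native zlib
buffers; a hedged sketch (illustrative only, the actual fix belongs in the Kafka
client's RecordsIterator):

{code}
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

class GzipRecordReader {
    static byte[] readAll(InputStream compressed, int size) throws IOException {
        byte[] out = new byte[size];
        // try-with-resources closes the stream, which in turn calls
        // Inflater.end() and frees the native buffers.
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(compressed))) {
            in.readFully(out);
        }
        return out;
    }
}
{code}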

[jira] [Commented] (FLINK-6301) Flink KafkaConnector09 leaks memory on reading compressed messages due to a Kafka consumer bug

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097930#comment-16097930
 ] 

ASF GitHub Bot commented on FLINK-6301:
---

Github user vidhu5269 commented on the issue:

https://github.com/apache/flink/pull/4015
  
Closing this pull request as the change mentioned here is already covered 
as part of #4321.


> Flink KafkaConnector09 leaks memory on reading compressed messages due to a 
> Kafka consumer bug
> --
>
> Key: FLINK-6301
> URL: https://issues.apache.org/jira/browse/FLINK-6301
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.2.0, 1.1.3, 1.1.4
>Reporter: Rahul Yadav
>Assignee: Rahul Yadav
> Attachments: jeprof.24611.1228.i1228.heap.svg, 
> jeprof.24611.1695.i1695.heap.svg, jeprof.24611.265.i265.heap.svg, 
> jeprof.24611.3138.i3138.heap.svg, jeprof.24611.595.i595.heap.svg, 
> jeprof.24611.705.i705.heap.svg, jeprof.24611.81.i81.heap.svg, 
> POSTFIX.jeprof.14880.1944.i1944.heap.svg, 
> POSTFIX.jeprof.14880.4129.i4129.heap.svg, 
> POSTFIX.jeprof.14880.961.i961.heap.svg, POSTFIX.jeprof.14880.99.i99.heap.svg, 
> POSTFIX.jeprof.14880.9.i9.heap.svg
>
>
> Hi
> We are running Flink on a standalone cluster with 8 TaskManagers having 8 
> vCPUs and 8 slots each. Each host has 16 GB of RAM.
> In our jobs, 
> # We are consuming gzip compressed messages from Kafka using 
> *KafkaConnector09* and use *rocksDB* backend for checkpoint storage.
> # To debug the leak, we used *jemalloc and jprof* to profile the sources of 
> malloc calls from the java process and attached are the profiles generated at 
> various stages of the job. As we can see, apart from the os.malloc and 
> rocksDB.allocateNewBlock, there are additional malloc calls coming from 
> inflate() method of java.util.zip.inflater. These calls are innocuous as long 
> as the inflater.end() method is called after its use.
> # To look for sources of inflate() method, we used Btrace on the running 
> process to dump caller stack on the method call. Following is the stackTrace 
> we got: 
> {code}
> java.util.zip.Inflater.inflate(Inflater.java)
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
> java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
> java.io.DataInputStream.readFully(DataInputStream.java:195)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:253)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.innerDone(MemoryRecords.java:282)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:233)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.clients.consumer.internals.Fetcher.handleFetchResponse(Fetcher.java:563)
> org.apache.kafka.clients.consumer.internals.Fetcher.access$000(Fetcher.java:69)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:139)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:136)
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:227)
> {code}
> The end() method on Inflater is called inside the close() method of 
> *InflaterInputStream* (extended by *GZIPInputStream*) but looking through the 
> Kafka consumer code, we found that RecordsIterator is not closing the 
> compressor stream after use and hence, causing the memory leak:

[jira] [Commented] (FLINK-7224) Incorrect Javadoc description in all Kafka consumer versions

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097927#comment-16097927
 ] 

ASF GitHub Bot commented on FLINK-7224:
---

Github user tzulitai closed the pull request at:

https://github.com/apache/flink/pull/4363


> Incorrect Javadoc description in all Kafka consumer versions
> 
>
> Key: FLINK-7224
> URL: https://issues.apache.org/jira/browse/FLINK-7224
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.4.0, 1.3.2
>
>
> Currently, all Kafka consumer versions still have this in their Javadoc:
> {code}
> The implementation currently accesses partition metadata when the consumer
> is constructed. That means that the client that submits the program needs to 
> be able to
> reach the Kafka brokers or ZooKeeper.
> {code}
> This is also the case for the documentation:
> {code}
> The current FlinkKafkaConsumer implementation will establish a connection 
> from the client (when calling the constructor) for querying the list of 
> topics and partitions.
> For this to work, the consumer needs to be able to access the consumers from 
> the machine submitting the job to the Flink cluster. If you experience any 
> issues with the Kafka consumer on the client side, the client log might 
> contain information about failed requests, etc.
> {code}
> These statements are no longer true starting from Flink 1.3; fetching the 
> partition metadata happens only in {{open()}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6301) Flink KafkaConnector09 leaks memory on reading compressed messages due to a Kafka consumer bug

2017-07-23 Thread Rahul Yadav (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097953#comment-16097953
 ] 

Rahul Yadav commented on FLINK-6301:


[~StephanEwen]: https://github.com/apache/flink/pull/4321 takes care of the 
Kafka 0.10 dependency bump, but FLINK-6301 is not a `duplicate` of FLINK-7174 as:
1. Apart from kafka-connector-0.10, we found the leak in the 0.9 connector as 
well, which is not actually solved by the other bug. It may not be solved here 
either, since kafka-clients did not solve it, but we should document it 
somewhere as a known issue.
2. Also, the fix versions of FLINK-7174 do not list `1.2.2`, which is there in 
the current one.
We can probably mark them as `related` and close this one after adding the 
relevant documentation.

Thoughts?

> Flink KafkaConnector09 leaks memory on reading compressed messages due to a 
> Kafka consumer bug
> --
>
> Key: FLINK-6301
> URL: https://issues.apache.org/jira/browse/FLINK-6301
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.2.0, 1.1.3, 1.1.4
>Reporter: Rahul Yadav
>Assignee: Rahul Yadav
> Attachments: jeprof.24611.1228.i1228.heap.svg, 
> jeprof.24611.1695.i1695.heap.svg, jeprof.24611.265.i265.heap.svg, 
> jeprof.24611.3138.i3138.heap.svg, jeprof.24611.595.i595.heap.svg, 
> jeprof.24611.705.i705.heap.svg, jeprof.24611.81.i81.heap.svg, 
> POSTFIX.jeprof.14880.1944.i1944.heap.svg, 
> POSTFIX.jeprof.14880.4129.i4129.heap.svg, 
> POSTFIX.jeprof.14880.961.i961.heap.svg, POSTFIX.jeprof.14880.99.i99.heap.svg, 
> POSTFIX.jeprof.14880.9.i9.heap.svg
>
>
> Hi
> We are running Flink on a standalone cluster with 8 TaskManagers having 8 
> vCPUs and 8 slots each. Each host has 16 GB of RAM.
> In our jobs, 
> # We are consuming gzip compressed messages from Kafka using 
> *KafkaConnector09* and use *rocksDB* backend for checkpoint storage.
> # To debug the leak, we used *jemalloc and jprof* to profile the sources of 
> malloc calls from the java process and attached are the profiles generated at 
> various stages of the job. As we can see, apart from the os.malloc and 
> rocksDB.allocateNewBlock, there are additional malloc calls coming from 
> inflate() method of java.util.zip.inflater. These calls are innocuous as long 
> as the inflater.end() method is called after its use.
> # To look for sources of inflate() method, we used Btrace on the running 
> process to dump caller stack on the method call. Following is the stackTrace 
> we got: 
> {code}
> java.util.zip.Inflater.inflate(Inflater.java)
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
> java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
> java.io.DataInputStream.readFully(DataInputStream.java:195)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:253)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.innerDone(MemoryRecords.java:282)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:233)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.clients.consumer.internals.Fetcher.handleFetchResponse(Fetcher.java:563)
> org.apache.kafka.clients.consumer.internals.Fetcher.access$000(Fetcher.java:69)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:139)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:136)
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
> 

[GitHub] flink pull request #4015: [FLINK-6301] [flink-connector-kafka-0.10] Upgradin...

2017-07-23 Thread vidhu5269
Github user vidhu5269 closed the pull request at:

https://github.com/apache/flink/pull/4015


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6301) Flink KafkaConnector09 leaks memory on reading compressed messages due to a Kafka consumer bug

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097955#comment-16097955
 ] 

ASF GitHub Bot commented on FLINK-6301:
---

Github user vidhu5269 closed the pull request at:

https://github.com/apache/flink/pull/4015


> Flink KafkaConnector09 leaks memory on reading compressed messages due to a 
> Kafka consumer bug
> --
>
> Key: FLINK-6301
> URL: https://issues.apache.org/jira/browse/FLINK-6301
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.2.0, 1.1.3, 1.1.4
>Reporter: Rahul Yadav
>Assignee: Rahul Yadav
> Attachments: jeprof.24611.1228.i1228.heap.svg, 
> jeprof.24611.1695.i1695.heap.svg, jeprof.24611.265.i265.heap.svg, 
> jeprof.24611.3138.i3138.heap.svg, jeprof.24611.595.i595.heap.svg, 
> jeprof.24611.705.i705.heap.svg, jeprof.24611.81.i81.heap.svg, 
> POSTFIX.jeprof.14880.1944.i1944.heap.svg, 
> POSTFIX.jeprof.14880.4129.i4129.heap.svg, 
> POSTFIX.jeprof.14880.961.i961.heap.svg, POSTFIX.jeprof.14880.99.i99.heap.svg, 
> POSTFIX.jeprof.14880.9.i9.heap.svg
>
>
> Hi
> We are running Flink on a standalone cluster with 8 TaskManagers having 8 
> vCPUs and 8 slots each. Each host has 16 GB of RAM.
> In our jobs,
> # We are consuming gzip-compressed messages from Kafka using 
> *KafkaConnector09* and use the *rocksDB* backend for checkpoint storage.
> # To debug the leak, we used *jemalloc and jeprof* to profile the sources of 
> malloc calls from the Java process; attached are the profiles generated at 
> various stages of the job. As we can see, apart from os.malloc and 
> rocksDB.allocateNewBlock, there are additional malloc calls coming from the 
> inflate() method of java.util.zip.Inflater. These calls are innocuous as long 
> as the Inflater.end() method is called after its use.
> # To find the callers of the inflate() method, we used BTrace on the running 
> process to dump the caller stack on each method call. Following is the stack 
> trace we got:
> {code}
> java.util.zip.Inflater.inflate(Inflater.java)
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
> java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
> java.io.DataInputStream.readFully(DataInputStream.java:195)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:253)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.innerDone(MemoryRecords.java:282)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:233)
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator.makeNext(MemoryRecords.java:210)
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> org.apache.kafka.clients.consumer.internals.Fetcher.handleFetchResponse(Fetcher.java:563)
> org.apache.kafka.clients.consumer.internals.Fetcher.access$000(Fetcher.java:69)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:139)
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:136)
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:227)
> {code}
> The end() method on Inflater is called inside the close() method of 
> *InflaterInputStream* (extended by *GZIPInputStream*), but looking through the 
> Kafka consumer code, we found that RecordsIterator does not close the 
> compressor stream after use, hence causing the memory leak.
> 
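> For illustration, here is a minimal sketch of the leak-free pattern (the 
> class and method names are ours, not Kafka's actual fix): closing the stream 
> releases the native zlib memory held by the underlying Inflater.
> {code}
> import java.io.ByteArrayInputStream;
> import java.io.DataInputStream;
> import java.io.IOException;
> import java.util.zip.GZIPInputStream;
>
> public class GzipReadExample {
>
>     // Reads an entire gzip-compressed buffer. try-with-resources guarantees
>     // close(), which in turn calls Inflater.end() and frees the native
>     // buffers even if readFully() throws.
>     static byte[] readAll(byte[] compressed, int uncompressedLength) throws IOException {
>         byte[] out = new byte[uncompressedLength];
>         try (DataInputStream in = new DataInputStream(
>                 new GZIPInputStream(new ByteArrayInputStream(compressed)))) {
>             in.readFully(out);
>         }
>         return out;
>     }
> }
> {code}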

[jira] [Reopened] (FLINK-6365) Adapt default values of the Kinesis connector

2017-07-23 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai reopened FLINK-6365:


> Adapt default values of the Kinesis connector
> -
>
> Key: FLINK-6365
> URL: https://issues.apache.org/jira/browse/FLINK-6365
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.2.0
>Reporter: Steffen Hausmann
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> As discussed in 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-connector-SHARD-GETRECORDS-MAX-default-value-td12332.html,
>  it seems reasonable to change the default values of the Kinesis connector to 
> follow KCL’s default settings. I suggest adapting at least the values for 
> SHARD_GETRECORDS_MAX and SHARD_GETRECORDS_INTERVAL_MILLIS. 
> As a Kinesis shard is currently limited to 5 get operations per second, you 
> can observe high ReadProvisionedThroughputExceeded rates with the current 
> default value for SHARD_GETRECORDS_INTERVAL_MILLIS of 0; it seems reasonable 
> to increase it to 200. As described in the email thread, it furthermore seems 
> desirable to increase the default value for SHARD_GETRECORDS_MAX to 10,000.
> The values that are used by the KCL can be found here: 
> https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/KinesisClientLibConfiguration.java
> Thanks for looking into this!
> Steffen



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-6365) Adapt default values of the Kinesis connector

2017-07-23 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-6365:
---
Fix Version/s: (was: 1.3.2)

> Adapt default values of the Kinesis connector
> -
>
> Key: FLINK-6365
> URL: https://issues.apache.org/jira/browse/FLINK-6365
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.2.0
>Reporter: Steffen Hausmann
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> As discussed in 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-connector-SHARD-GETRECORDS-MAX-default-value-td12332.html,
>  it seems reasonable to change the default values of the Kinesis connector to 
> follow KCL’s default settings. I suggest adapting at least the values for 
> SHARD_GETRECORDS_MAX and SHARD_GETRECORDS_INTERVAL_MILLIS. 
> As a Kinesis shard is currently limited to 5 get operations per second, you 
> can observe high ReadProvisionedThroughputExceeded rates with the current 
> default value for SHARD_GETRECORDS_INTERVAL_MILLIS of 0; it seems reasonable 
> to increase it to 200. As described in the email thread, it furthermore seems 
> desirable to increase the default value for SHARD_GETRECORDS_MAX to 10,000.
> The values that are used by the KCL can be found here: 
> https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/KinesisClientLibConfiguration.java
> Thanks for looking into this!
> Steffen



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (FLINK-6365) Adapt default values of the Kinesis connector

2017-07-23 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai resolved FLINK-6365.

  Resolution: Fixed
Release Note: Some default values for configurations of AWS API call 
behaviors in the Flink Kinesis Consumer were adapted for better default 
consumption performance: 1) SHARD_GETRECORDS_MAX default changed to 10,000, and 
2) SHARD_GETRECORDS_INTERVAL_MILLIS default changed to 200ms.
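
For illustration, a minimal sketch of setting these values on the consumer 
(the property constants come from the connector's ConsumerConfigConstants; 
the stream name and region below are made up):

{code}
import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KinesisDefaultsExample {
    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        config.setProperty(ConsumerConfigConstants.AWS_REGION, "us-east-1");
        // The adapted defaults; set explicitly here only for illustration.
        config.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_MAX, "10000");
        config.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS, "200");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(new FlinkKinesisConsumer<>("myStream", new SimpleStringSchema(), config))
           .print();
        env.execute("kinesis-defaults-example");
    }
}
{code}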

> Adapt default values of the Kinesis connector
> -
>
> Key: FLINK-6365
> URL: https://issues.apache.org/jira/browse/FLINK-6365
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.2.0
>Reporter: Steffen Hausmann
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0, 1.3.2
>
>
> As discussed in 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-connector-SHARD-GETRECORDS-MAX-default-value-td12332.html,
>  it seems reasonable to change the default values of the Kinesis connector to 
> follow KCL’s default settings. I suggest adapting at least the values for 
> SHARD_GETRECORDS_MAX and SHARD_GETRECORDS_INTERVAL_MILLIS. 
> As a Kinesis shard is currently limited to 5 get operations per second, you 
> can observe high ReadProvisionedThroughputExceeded rates with the current 
> default value for SHARD_GETRECORDS_INTERVAL_MILLIS of 0; it seems reasonable 
> to increase it to 200. As described in the email thread, it furthermore seems 
> desirable to increase the default value for SHARD_GETRECORDS_MAX to 10,000.
> The values that are used by the KCL can be found here: 
> https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/KinesisClientLibConfiguration.java
> Thanks for looking into this!
> Steffen



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (FLINK-6365) Adapt default values of the Kinesis connector

2017-07-23 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-6365.
--

> Adapt default values of the Kinesis connector
> -
>
> Key: FLINK-6365
> URL: https://issues.apache.org/jira/browse/FLINK-6365
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.2.0
>Reporter: Steffen Hausmann
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0, 1.3.2
>
>
> As discussed in 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-connector-SHARD-GETRECORDS-MAX-default-value-td12332.html,
>  it seems reasonable to change the default values of the Kinesis connector to 
> follow KCL’s default settings. I suggest adapting at least the values for 
> SHARD_GETRECORDS_MAX and SHARD_GETRECORDS_INTERVAL_MILLIS. 
> As a Kinesis shard is currently limited to 5 get operations per second, you 
> can observe high ReadProvisionedThroughputExceeded rates with the current 
> default value for SHARD_GETRECORDS_INTERVAL_MILLIS of 0; it seems reasonable 
> to increase it to 200. As described in the email thread, it furthermore seems 
> desirable to increase the default value for SHARD_GETRECORDS_MAX to 10,000.
> The values that are used by the KCL can be found here: 
> https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/KinesisClientLibConfiguration.java
> Thanks for looking into this!
> Steffen



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-4421) Make clocks and time measurements monotonous

2017-07-23 Thread mingleizhang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097851#comment-16097851
 ] 

mingleizhang commented on FLINK-4421:
-

We already have {{System.nanoTime}} for measuring time monotonically. Why do 
we still introduce a Clock that provides a currentTimeMillis() function which 
calls {{System.currentTimeMillis()}}? I think the two would be equivalent if 
the Clock remembers the max returned timestamp so far; that way it would never 
return decreasing timestamps.
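
A minimal sketch of such a wrapping clock (illustrative only; these names are 
not from the Flink codebase) might look like:

{code}
import java.util.concurrent.atomic.AtomicLong;

// Remembers the max timestamp returned so far, so callers never observe
// the clock going backwards even if the system clock is adjusted.
public final class MonotonicMillisClock {

    private final AtomicLong maxSeen = new AtomicLong(Long.MIN_VALUE);

    public long currentTimeMillis() {
        return maxSeen.accumulateAndGet(System.currentTimeMillis(), Math::max);
    }
}
{code}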

> Make clocks and time measurements monotonous
> 
>
> Key: FLINK-4421
> URL: https://issues.apache.org/jira/browse/FLINK-4421
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stephan Ewen
>Priority: Minor
>
> Currently, many places use {{System.currentTimeMillis()}} to acquire 
> timestamps or measure time intervals.
> Since this relies on the system clock, and the system clock is not 
> necessarily monotonic (in the presence of clock updates), this can lead to 
> negative durations and decreasing timestamps where increasing timestamps are 
> expected.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097875#comment-16097875
 ] 

ASF GitHub Bot commented on FLINK-7143:
---

Github user tzulitai closed the pull request at:

https://github.com/apache/flink/pull/4357


> Partition assignment for Kafka consumer is not stable
> -
>
> Key: FLINK-7143
> URL: https://issues.apache.org/jira/browse/FLINK-7143
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.3.1
>Reporter: Steven Zhen Wu
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.2
>
>
> while deploying Flink 1.3 release to hundreds of routing jobs, we found some 
> issues with partition assignment for Kafka consumer. some partitions weren't 
> assigned and some partitions got assigned more than once.
> Here is the bug introduced in Flink 1.3. 
> {code}
> protected static void initializeSubscribedPartitionsToStartOffsets(...) {
>     ...
>     for (int i = 0; i < kafkaTopicPartitions.size(); i++) {
>         if (i % numParallelSubtasks == indexOfThisSubtask) {
>             if (startupMode != StartupMode.SPECIFIC_OFFSETS) {
>                 subscribedPartitionsToStartOffsets.put(
>                     kafkaTopicPartitions.get(i), startupMode.getStateSentinel());
>             }
>             ...
>         }
>     }
> }
> {code}
> The bug is using the array index {{i}} to mod against {{numParallelSubtasks}}. 
> If {{kafkaTopicPartitions}} has a different order among different subtasks, 
> the assignment is not stable across subtasks, which creates the assignment 
> issue mentioned earlier. 
> The fix is also very simple: we should use the partition id for the mod, {{if 
> (kafkaTopicPartitions.get\(i\).getPartition() % numParallelSubtasks == 
> indexOfThisSubtask)}}. That would result in a stable assignment across 
> subtasks that is independent of the ordering in the array.
> Marking it as a blocker because of its impact.
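> A sketch of the suggested fix (illustrative; the merged patch may differ):
> {code}
> for (int i = 0; i < kafkaTopicPartitions.size(); i++) {
>     // key the assignment on the stable partition id instead of the index
>     // into a list whose order may differ across subtasks
>     if (kafkaTopicPartitions.get(i).getPartition() % numParallelSubtasks == indexOfThisSubtask) {
>         if (startupMode != StartupMode.SPECIFIC_OFFSETS) {
>             subscribedPartitionsToStartOffsets.put(
>                 kafkaTopicPartitions.get(i), startupMode.getStateSentinel());
>         }
>     }
> }
> {code}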



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4385: wrong parameter check

2017-07-23 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/4385
  
+1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-7134) Remove hadoop1.x code in mapreduce.utils.HadoopUtils

2017-07-23 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang updated FLINK-7134:

Fix Version/s: (was: 1.3.2)
   1.4.0

> Remove hadoop1.x code in mapreduce.utils.HadoopUtils
> 
>
> Key: FLINK-7134
> URL: https://issues.apache.org/jira/browse/FLINK-7134
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: mingleizhang
>Assignee: mingleizhang
> Fix For: 1.4.0
>
>
> This JIRA is similar to FLINK-7118. For a clearer format and easier review, I 
> separated the two JIRAs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097879#comment-16097879
 ] 

ASF GitHub Bot commented on FLINK-7143:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/4302
  
Merging :) 


> Partition assignment for Kafka consumer is not stable
> -
>
> Key: FLINK-7143
> URL: https://issues.apache.org/jira/browse/FLINK-7143
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.3.1
>Reporter: Steven Zhen Wu
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.2
>
>
> while deploying Flink 1.3 release to hundreds of routing jobs, we found some 
> issues with partition assignment for Kafka consumer. some partitions weren't 
> assigned and some partitions got assigned more than once.
> Here is the bug introduced in Flink 1.3. 
> {code}
> protected static void initializeSubscribedPartitionsToStartOffsets(...) {
>     ...
>     for (int i = 0; i < kafkaTopicPartitions.size(); i++) {
>         if (i % numParallelSubtasks == indexOfThisSubtask) {
>             if (startupMode != StartupMode.SPECIFIC_OFFSETS) {
>                 subscribedPartitionsToStartOffsets.put(
>                     kafkaTopicPartitions.get(i), startupMode.getStateSentinel());
>             }
>             ...
>         }
>     }
> }
> {code}
> The bug is using the array index {{i}} to mod against {{numParallelSubtasks}}. 
> If {{kafkaTopicPartitions}} has a different order among different subtasks, 
> the assignment is not stable across subtasks, which creates the assignment 
> issue mentioned earlier. 
> The fix is also very simple: we should use the partition id for the mod, {{if 
> (kafkaTopicPartitions.get\(i\).getPartition() % numParallelSubtasks == 
> indexOfThisSubtask)}}. That would result in a stable assignment across 
> subtasks that is independent of the ordering in the array.
> Marking it as a blocker because of its impact.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4302: (master) [FLINK-7143] [kafka] Stricter tests for determin...

2017-07-23 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/4302
  
Merging :) 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6665) Pass a ScheduledExecutorService to the RestartStrategy

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097861#comment-16097861
 ] 

ASF GitHub Bot commented on FLINK-6665:
---

Github user zjureel commented on the issue:

https://github.com/apache/flink/pull/4220
  
@StephanEwen Thank you for merging :)


> Pass a ScheduledExecutorService to the RestartStrategy
> --
>
> Key: FLINK-6665
> URL: https://issues.apache.org/jira/browse/FLINK-6665
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Reporter: Stephan Ewen
>Assignee: Fang Yong
> Fix For: 1.4.0, 1.3.2
>
>
> Currently, the {{RestartStrategy}} is called when the {{ExecutionGraph}} 
> should be restarted.
> To facilitate delays before restarting, the strategy simply sleeps, blocking 
> the thread that runs the ExecutionGraph's recovery method.
> I suggest passing a {{ScheduledExecutorService}} to the {{RestartStrategy}} 
> and let it schedule the restart call that way, avoiding any sleeps.
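> For illustration, a delayed restart could be scheduled as in this minimal 
> sketch (names are illustrative, not the actual Flink API):
> {code}
> import java.util.concurrent.Executors;
> import java.util.concurrent.ScheduledExecutorService;
> import java.util.concurrent.TimeUnit;
>
> public class DelayedRestartExample {
>     public static void main(String[] args) {
>         ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
>         Runnable restartCallback = () -> System.out.println("restarting job...");
>         // Schedule the restart after the delay instead of blocking with sleep().
>         scheduler.schedule(restartCallback, 10_000, TimeUnit.MILLISECONDS);
>     }
> }
> {code}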



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6667) Pass a callback type to the RestartStrategy, rather than the full ExecutionGraph

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097864#comment-16097864
 ] 

ASF GitHub Bot commented on FLINK-6667:
---

Github user zjureel closed the pull request at:

https://github.com/apache/flink/pull/4277


> Pass a callback type to the RestartStrategy, rather than the full 
> ExecutionGraph
> 
>
> Key: FLINK-6667
> URL: https://issues.apache.org/jira/browse/FLINK-6667
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Affects Versions: 1.3.0
>Reporter: Stephan Ewen
>Assignee: Fang Yong
> Fix For: 1.4.0, 1.3.2
>
>
> To let the {{RestartStrategy}} work across multiple {{FailoverStrategy}} 
> implementations, it needs to be passed a "callback" that it can call to 
> trigger the restart of tasks/regions/etc.
> Such a "callback" would be a nice abstraction to use for global restarts as 
> well, to not expose the full execution graph.
> Ideally, the callback is one-shot, so it cannot accidentally be used to call 
> restart() multiple times.
> This would also make the testing of RestartStrategies much easier.
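> For illustration, a one-shot callback could look like this minimal sketch 
> (hypothetical names, not the Flink implementation):
> {code}
> import java.util.concurrent.atomic.AtomicBoolean;
>
> public final class OneShotRestartCallback {
>
>     private final AtomicBoolean used = new AtomicBoolean(false);
>     private final Runnable restartAction;
>
>     public OneShotRestartCallback(Runnable restartAction) {
>         this.restartAction = restartAction;
>     }
>
>     // Triggers the restart at most once; subsequent calls are ignored.
>     public void triggerRestart() {
>         if (used.compareAndSet(false, true)) {
>             restartAction.run();
>         }
>     }
> }
> {code}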



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4277: [FLINK-6667] Pass a callback type to the RestartSt...

2017-07-23 Thread zjureel
Github user zjureel closed the pull request at:

https://github.com/apache/flink/pull/4277


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6665) Pass a ScheduledExecutorService to the RestartStrategy

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097862#comment-16097862
 ] 

ASF GitHub Bot commented on FLINK-6665:
---

Github user zjureel closed the pull request at:

https://github.com/apache/flink/pull/4220


> Pass a ScheduledExecutorService to the RestartStrategy
> --
>
> Key: FLINK-6665
> URL: https://issues.apache.org/jira/browse/FLINK-6665
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Reporter: Stephan Ewen
>Assignee: Fang Yong
> Fix For: 1.4.0, 1.3.2
>
>
> Currently, the {{RestartStrategy}} is called when the {{ExecutionGraph}} 
> should be restarted.
> To facilitate delays before restarting, the strategy simply sleeps, blocking 
> the thread that runs the ExecutionGraph's recovery method.
> I suggest passing a {{ScheduledExecutorService}} to the {{RestartStrategy}} 
> and let it schedule the restart call that way, avoiding any sleeps.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4220: [FLINK-6665] Pass a ScheduledExecutorService to th...

2017-07-23 Thread zjureel
Github user zjureel closed the pull request at:

https://github.com/apache/flink/pull/4220


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6667) Pass a callback type to the RestartStrategy, rather than the full ExecutionGraph

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097863#comment-16097863
 ] 

ASF GitHub Bot commented on FLINK-6667:
---

Github user zjureel commented on the issue:

https://github.com/apache/flink/pull/4277
  
Thank you for merging!


> Pass a callback type to the RestartStrategy, rather than the full 
> ExecutionGraph
> 
>
> Key: FLINK-6667
> URL: https://issues.apache.org/jira/browse/FLINK-6667
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Affects Versions: 1.3.0
>Reporter: Stephan Ewen
>Assignee: Fang Yong
> Fix For: 1.4.0, 1.3.2
>
>
> To let the {{RestartStrategy}} work across multiple {{FailoverStrategy}} 
> implementations, it needs to be passed a "callback" that it can call to 
> trigger the restart of tasks/regions/etc.
> Such a "callback" would be a nice abstraction to use for global restarts as 
> well, to not expose the full execution graph.
> Ideally, the callback is one-shot, so it cannot accidentally be used to call 
> restart() multiple times.
> This would also make the testing of RestartStrategies much easier.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4277: [FLINK-6667] Pass a callback type to the RestartStrategy,...

2017-07-23 Thread zjureel
Github user zjureel commented on the issue:

https://github.com/apache/flink/pull/4277
  
Thank you for merging!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4220: [FLINK-6665] Pass a ScheduledExecutorService to the Resta...

2017-07-23 Thread zjureel
Github user zjureel commented on the issue:

https://github.com/apache/flink/pull/4220
  
@StephanEwen Thank you for merging :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #4357: (release-1.3) [FLINK-7143, FLINK-7195] Collection ...

2017-07-23 Thread tzulitai
Github user tzulitai closed the pull request at:

https://github.com/apache/flink/pull/4357


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4301: (release-1.3) [FLINK-7143] [kafka] Fix indeterminate part...

2017-07-23 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/4301
  
This was merged via #4357. Closing ..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097881#comment-16097881
 ] 

ASF GitHub Bot commented on FLINK-7143:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/4301
  
This was merged via #4357. Closing ..


> Partition assignment for Kafka consumer is not stable
> -
>
> Key: FLINK-7143
> URL: https://issues.apache.org/jira/browse/FLINK-7143
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.3.1
>Reporter: Steven Zhen Wu
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.2
>
>
> while deploying Flink 1.3 release to hundreds of routing jobs, we found some 
> issues with partition assignment for Kafka consumer. some partitions weren't 
> assigned and some partitions got assigned more than once.
> Here is the bug introduced in Flink 1.3. 
> {code}
> protected static void initializeSubscribedPartitionsToStartOffsets(...) {
>     ...
>     for (int i = 0; i < kafkaTopicPartitions.size(); i++) {
>         if (i % numParallelSubtasks == indexOfThisSubtask) {
>             if (startupMode != StartupMode.SPECIFIC_OFFSETS) {
>                 subscribedPartitionsToStartOffsets.put(
>                     kafkaTopicPartitions.get(i), startupMode.getStateSentinel());
>             }
>             ...
>         }
>     }
> }
> {code}
> The bug is using the array index {{i}} to mod against {{numParallelSubtasks}}. 
> If {{kafkaTopicPartitions}} has a different order among different subtasks, 
> the assignment is not stable across subtasks, which creates the assignment 
> issue mentioned earlier. 
> The fix is also very simple: we should use the partition id for the mod, {{if 
> (kafkaTopicPartitions.get\(i\).getPartition() % numParallelSubtasks == 
> indexOfThisSubtask)}}. That would result in a stable assignment across 
> subtasks that is independent of the ordering in the array.
> Marking it as a blocker because of its impact.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4301: (release-1.3) [FLINK-7143] [kafka] Fix indetermina...

2017-07-23 Thread tzulitai
Github user tzulitai closed the pull request at:

https://github.com/apache/flink/pull/4301


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6539) Add automated end-to-end tests

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097883#comment-16097883
 ] 

ASF GitHub Bot commented on FLINK-6539:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/3911
  
I think this LGTM now.


> Add automated end-to-end tests
> --
>
> Key: FLINK-6539
> URL: https://issues.apache.org/jira/browse/FLINK-6539
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> We should add simple tests that exercise all the paths that a user would use 
> when starting a cluster and submitting a program. Preferably with a simple 
> batch program and a streaming program that uses Kafka.
> This would have caught some of the bugs that we only discovered right before 
> the release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097882#comment-16097882
 ] 

ASF GitHub Bot commented on FLINK-7143:
---

Github user tzulitai closed the pull request at:

https://github.com/apache/flink/pull/4301


> Partition assignment for Kafka consumer is not stable
> -
>
> Key: FLINK-7143
> URL: https://issues.apache.org/jira/browse/FLINK-7143
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.3.1
>Reporter: Steven Zhen Wu
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.2
>
>
> while deploying Flink 1.3 release to hundreds of routing jobs, we found some 
> issues with partition assignment for Kafka consumer. some partitions weren't 
> assigned and some partitions got assigned more than once.
> Here is the bug introduced in Flink 1.3. 
> {code}
> protected static void initializeSubscribedPartitionsToStartOffsets(...) {
>     ...
>     for (int i = 0; i < kafkaTopicPartitions.size(); i++) {
>         if (i % numParallelSubtasks == indexOfThisSubtask) {
>             if (startupMode != StartupMode.SPECIFIC_OFFSETS) {
>                 subscribedPartitionsToStartOffsets.put(
>                     kafkaTopicPartitions.get(i), startupMode.getStateSentinel());
>             }
>             ...
>         }
>     }
> }
> {code}
> The bug is using the array index {{i}} to mod against {{numParallelSubtasks}}. 
> If {{kafkaTopicPartitions}} has a different order among different subtasks, 
> the assignment is not stable across subtasks, which creates the assignment 
> issue mentioned earlier. 
> The fix is also very simple: we should use the partition id for the mod, {{if 
> (kafkaTopicPartitions.get\(i\).getPartition() % numParallelSubtasks == 
> indexOfThisSubtask)}}. That would result in a stable assignment across 
> subtasks that is independent of the ordering in the array.
> Marking it as a blocker because of its impact.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4375: [Flink-6365][kinesis-connector] Adapt default values of t...

2017-07-23 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/4375
  
Thanks for the PR!
Since this was already thoroughly discussed, this LGTM.
Merging ..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3911: [FLINK-6539] Add end-to-end tests

2017-07-23 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/3911
  
I think this LGTM now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #4385: wrong parameter check

2017-07-23 Thread p1tz
GitHub user p1tz opened a pull request:

https://github.com/apache/flink/pull/4385

wrong parameter check



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/p1tz/flink patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4385


commit b7803ed891f986c1361e3ea2b04911b7423ccf44
Author: p1tz 
Date:   2017-07-23T23:19:17Z

Update Plan.java




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

