Docker Containers on YARN when using Flink on EMR

2018-06-17 Thread Vijay Balakrishnan
Hi,
Trying to use Docker Containers to be launched from YARN when using Flink
on EMR on Ubuntu. Can't seem to launch a Docker Container from YARN
Resource Manager while starting up the ./flink-yarn-session or Submitting a
Flink job ./bin/flink run ...
Following the docs here:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/config.html#yarn
,

Tried various combinations of these variables:
-Dcontainerized.master.env.YARN_CONTAINER_RUNTIME_TYPE=docker,

-Dcontainerized.master.env.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=ashahab/hadoop-trunk

Nothing seems to be working.

TIA,
Vijay


BUILD FAILURE with AbstractStreamOperator

2018-06-17 Thread Thodoris Bitsakis
Hello, When i made the following class in my demo project:


 

the build fails with "mvn clean package" :


 

Can somebody explain to me why this happens?
The demo project is made according to "Project Template for Java",so i am
using the default "pom.xml"
Thanks in advance!!!



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: [DISCUSS] Flink 1.6 features

2018-06-17 Thread zhangminglei
But if we do hive on flink , I think it should be a very big project.


> 在 2018年6月17日,下午9:36,Will Du  写道:
> 
> Agree, two missing pieces I think could make Flink more competitive against 
> Spark SQL/Stream and Kafka Stream
> 1. Flink over Hive or Flink SQL hive table source and sink
> 2. Flink ML on stream
> 
> 
>> On Jun 17, 2018, at 8:34 AM, zhangminglei <18717838...@163.com> wrote:
>> 
>> Actually, I have been an idea, how about support hive on flink ? Since lots 
>> of business are written by hive sql. And users wants to transform map reduce 
>> to fink without changing the sql.
>> 
>> Zhangminglei
>> 
>> 
>> 
>>> 在 2018年6月17日,下午8:11,zhangminglei <18717838...@163.com> 写道:
>>> 
>>> Hi, Sagar
>>> 
>>> There already has relative JIRAs for ORC and Parquet, you can take a look 
>>> here: 
>>> 
>>> https://issues.apache.org/jira/browse/FLINK-9407 
>>>  and 
>>> https://issues.apache.org/jira/browse/FLINK-9411 
>>> 
>>> 
>>> For ORC format, Currently only support basic data types, such as Long, 
>>> Boolean, Short, Integer, Float, Double, String. 
>>> 
>>> Best
>>> Zhangminglei
>>> 
>>> 
>>> 
 在 2018年6月17日,上午11:11,sagar loke  写道:
 
 We are eagerly waiting for 
 
 - Extends Streaming Sinks:
   - Bucketing Sink should support S3 properly (compensate for eventual 
 consistency), work with Flink's shaded S3 file systems, and efficiently 
 support formats that compress/index arcoss individual rows (Parquet, ORC, 
 ...)
 
 Especially for ORC and Parquet sinks. Since, We are planning to use 
 Kafka-jdbc to move data from rdbms to hdfs. 
 
 Thanks,
 
 On Sat, Jun 16, 2018 at 5:08 PM Elias Levy >>> > wrote:
 One more, since it we have to deal with it often:
 
 - Idling sources (Kafka in particular) and proper watermark propagation: 
 FLINK-5018 / FLINK-5479
 
 On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy >>> > wrote:
 Since wishes are free:
 
 - Standalone cluster job isolation: 
 https://issues.apache.org/jira/browse/FLINK-8886 
 
 - Proper sliding window joins (not overlapping hoping window joins): 
 https://issues.apache.org/jira/browse/FLINK-6243 
 
 - Sharing state across operators: 
 https://issues.apache.org/jira/browse/FLINK-6239 
 
 - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558 
 
 
 Seconded:
 - Atomic cancel-with-savepoint: 
 https://issues.apache.org/jira/browse/FLINK-7634 
 
 - Support dynamically changing CEP patterns : 
 https://issues.apache.org/jira/browse/FLINK-7129 
 
 
 
 On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen >>> > wrote:
 Hi all!
 
 Thanks for the discussion and good input. Many suggestions fit well with 
 the proposal above.
 
 Please bear in mind that with a time-based release model, we would release 
 whatever is mature by end of July.
 The good thing is we could schedule the next release not too far after 
 that, so that the features that did not quite make it will not be delayed 
 too long.
 In some sense, you could read this as as "what to do first" list, rather 
 than "this goes in, other things stay out".
 
 Some thoughts on some of the suggestions
 
 Kubernetes integration: An opaque integration with Kubernetes should be 
 supported through the "as a library" mode. For a deeper integration, I 
 know that some committers have experimented with some PoC code. I would 
 let Till add some thoughts, he has worked the most on the deployment parts 
 recently.
 
 Per partition watermarks with idleness: Good point, could one implement 
 that on the current interface, with a periodic watermark extractor?
 
 Atomic cancel-with-savepoint: Agreed, this is important. Making this work 
 with all sources needs a bit more work. We should have this in the roadmap.
 
 Elastic Bloomfilters: This seems like an interesting new feature - the 
 above suggested feature set was more about addressing some longer standing 
 issues/requests. However, nothing should prevent contributors to work on 
 that.
 
 Best,
 Stephan
 
 
 On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] >>> > wrote:
 +1 on https://issues.apache.org/jira/browse/FLINK-5479 
 
 [FLINK-5479] Per-partition wate

Re: [DISCUSS] Flink 1.6 features

2018-06-17 Thread Will Du
Agree, two missing pieces I think could make Flink more competitive against 
Spark SQL/Stream and Kafka Stream
1. Flink over Hive or Flink SQL hive table source and sink
2. Flink ML on stream


> On Jun 17, 2018, at 8:34 AM, zhangminglei <18717838...@163.com> wrote:
> 
> Actually, I have been an idea, how about support hive on flink ? Since lots 
> of business are written by hive sql. And users wants to transform map reduce 
> to fink without changing the sql.
> 
> Zhangminglei
> 
> 
> 
>> 在 2018年6月17日,下午8:11,zhangminglei <18717838...@163.com> 写道:
>> 
>> Hi, Sagar
>> 
>> There already has relative JIRAs for ORC and Parquet, you can take a look 
>> here: 
>> 
>> https://issues.apache.org/jira/browse/FLINK-9407 
>>  and 
>> https://issues.apache.org/jira/browse/FLINK-9411 
>> 
>> 
>> For ORC format, Currently only support basic data types, such as Long, 
>> Boolean, Short, Integer, Float, Double, String. 
>> 
>> Best
>> Zhangminglei
>> 
>> 
>> 
>>> 在 2018年6月17日,上午11:11,sagar loke  写道:
>>> 
>>> We are eagerly waiting for 
>>> 
>>> - Extends Streaming Sinks:
>>>- Bucketing Sink should support S3 properly (compensate for eventual 
>>> consistency), work with Flink's shaded S3 file systems, and efficiently 
>>> support formats that compress/index arcoss individual rows (Parquet, ORC, 
>>> ...)
>>> 
>>> Especially for ORC and Parquet sinks. Since, We are planning to use 
>>> Kafka-jdbc to move data from rdbms to hdfs. 
>>> 
>>> Thanks,
>>> 
>>> On Sat, Jun 16, 2018 at 5:08 PM Elias Levy >> > wrote:
>>> One more, since it we have to deal with it often:
>>> 
>>> - Idling sources (Kafka in particular) and proper watermark propagation: 
>>> FLINK-5018 / FLINK-5479
>>> 
>>> On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy >> > wrote:
>>> Since wishes are free:
>>> 
>>> - Standalone cluster job isolation: 
>>> https://issues.apache.org/jira/browse/FLINK-8886 
>>> 
>>> - Proper sliding window joins (not overlapping hoping window joins): 
>>> https://issues.apache.org/jira/browse/FLINK-6243 
>>> 
>>> - Sharing state across operators: 
>>> https://issues.apache.org/jira/browse/FLINK-6239 
>>> 
>>> - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558 
>>> 
>>> 
>>> Seconded:
>>> - Atomic cancel-with-savepoint: 
>>> https://issues.apache.org/jira/browse/FLINK-7634 
>>> 
>>> - Support dynamically changing CEP patterns : 
>>> https://issues.apache.org/jira/browse/FLINK-7129 
>>> 
>>> 
>>> 
>>> On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen >> > wrote:
>>> Hi all!
>>> 
>>> Thanks for the discussion and good input. Many suggestions fit well with 
>>> the proposal above.
>>> 
>>> Please bear in mind that with a time-based release model, we would release 
>>> whatever is mature by end of July.
>>> The good thing is we could schedule the next release not too far after 
>>> that, so that the features that did not quite make it will not be delayed 
>>> too long.
>>> In some sense, you could read this as as "what to do first" list, rather 
>>> than "this goes in, other things stay out".
>>> 
>>> Some thoughts on some of the suggestions
>>> 
>>> Kubernetes integration: An opaque integration with Kubernetes should be 
>>> supported through the "as a library" mode. For a deeper integration, I know 
>>> that some committers have experimented with some PoC code. I would let Till 
>>> add some thoughts, he has worked the most on the deployment parts recently.
>>> 
>>> Per partition watermarks with idleness: Good point, could one implement 
>>> that on the current interface, with a periodic watermark extractor?
>>> 
>>> Atomic cancel-with-savepoint: Agreed, this is important. Making this work 
>>> with all sources needs a bit more work. We should have this in the roadmap.
>>> 
>>> Elastic Bloomfilters: This seems like an interesting new feature - the 
>>> above suggested feature set was more about addressing some longer standing 
>>> issues/requests. However, nothing should prevent contributors to work on 
>>> that.
>>> 
>>> Best,
>>> Stephan
>>> 
>>> 
>>> On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] >> > wrote:
>>> +1 on https://issues.apache.org/jira/browse/FLINK-5479 
>>> 
>>> [FLINK-5479] Per-partition watermarks in ... 
>>> 
>>> issues.apache.org 
>>> Reported in ML: 
>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-ca

Re: [DISCUSS] Flink 1.6 features

2018-06-17 Thread zhangminglei
Actually, I have been an idea, how about support hive on flink ? Since lots of 
business are written by hive sql. And users wants to transform map reduce to 
fink without changing the sql.

Zhangminglei



> 在 2018年6月17日,下午8:11,zhangminglei <18717838...@163.com> 写道:
> 
> Hi, Sagar
> 
> There already has relative JIRAs for ORC and Parquet, you can take a look 
> here: 
> 
> https://issues.apache.org/jira/browse/FLINK-9407 
>  and 
> https://issues.apache.org/jira/browse/FLINK-9411 
> 
> 
> For ORC format, Currently only support basic data types, such as Long, 
> Boolean, Short, Integer, Float, Double, String. 
> 
> Best
> Zhangminglei
> 
> 
> 
>> 在 2018年6月17日,上午11:11,sagar loke  写道:
>> 
>> We are eagerly waiting for 
>> 
>>  - Extends Streaming Sinks:
>> - Bucketing Sink should support S3 properly (compensate for eventual 
>> consistency), work with Flink's shaded S3 file systems, and efficiently 
>> support formats that compress/index arcoss individual rows (Parquet, ORC, 
>> ...)
>> 
>> Especially for ORC and Parquet sinks. Since, We are planning to use 
>> Kafka-jdbc to move data from rdbms to hdfs. 
>> 
>> Thanks,
>> 
>> On Sat, Jun 16, 2018 at 5:08 PM Elias Levy > > wrote:
>> One more, since it we have to deal with it often:
>> 
>> - Idling sources (Kafka in particular) and proper watermark propagation: 
>> FLINK-5018 / FLINK-5479
>> 
>> On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy > > wrote:
>> Since wishes are free:
>> 
>> - Standalone cluster job isolation: 
>> https://issues.apache.org/jira/browse/FLINK-8886 
>> 
>> - Proper sliding window joins (not overlapping hoping window joins): 
>> https://issues.apache.org/jira/browse/FLINK-6243 
>> 
>> - Sharing state across operators: 
>> https://issues.apache.org/jira/browse/FLINK-6239 
>> 
>> - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558 
>> 
>> 
>> Seconded:
>> - Atomic cancel-with-savepoint: 
>> https://issues.apache.org/jira/browse/FLINK-7634 
>> 
>> - Support dynamically changing CEP patterns : 
>> https://issues.apache.org/jira/browse/FLINK-7129 
>> 
>> 
>> 
>> On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen > > wrote:
>> Hi all!
>> 
>> Thanks for the discussion and good input. Many suggestions fit well with the 
>> proposal above.
>> 
>> Please bear in mind that with a time-based release model, we would release 
>> whatever is mature by end of July.
>> The good thing is we could schedule the next release not too far after that, 
>> so that the features that did not quite make it will not be delayed too long.
>> In some sense, you could read this as as "what to do first" list, rather 
>> than "this goes in, other things stay out".
>> 
>> Some thoughts on some of the suggestions
>> 
>> Kubernetes integration: An opaque integration with Kubernetes should be 
>> supported through the "as a library" mode. For a deeper integration, I know 
>> that some committers have experimented with some PoC code. I would let Till 
>> add some thoughts, he has worked the most on the deployment parts recently.
>> 
>> Per partition watermarks with idleness: Good point, could one implement that 
>> on the current interface, with a periodic watermark extractor?
>> 
>> Atomic cancel-with-savepoint: Agreed, this is important. Making this work 
>> with all sources needs a bit more work. We should have this in the roadmap.
>> 
>> Elastic Bloomfilters: This seems like an interesting new feature - the above 
>> suggested feature set was more about addressing some longer standing 
>> issues/requests. However, nothing should prevent contributors to work on 
>> that.
>> 
>> Best,
>> Stephan
>> 
>> 
>> On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] > > wrote:
>> +1 on https://issues.apache.org/jira/browse/FLINK-5479 
>> 
>> [FLINK-5479] Per-partition watermarks in ... 
>> 
>> issues.apache.org 
>> Reported in ML: 
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html
>>  
>> 
>>  It's normally not a common case to have Kafka partitions not producing any 
>> data, but it'll probably be good to handle this as well. I ...
>> 
>> From: Rico Bergmann mailto:i...@ricobergmann.

Re: [DISCUSS] Flink 1.6 features

2018-06-17 Thread zhangminglei
Hi, Sagar

There already has relative JIRAs for ORC and Parquet, you can take a look here: 

 https://issues.apache.org/jira/browse/FLINK-9407 
 and 
https://issues.apache.org/jira/browse/FLINK-9411 


For ORC format, Currently only support basic data types, such as Long, Boolean, 
Short, Integer, Float, Double, String. 

Best
Zhangminglei



> 在 2018年6月17日,上午11:11,sagar loke  写道:
> 
> We are eagerly waiting for 
> 
>   - Extends Streaming Sinks:
>  - Bucketing Sink should support S3 properly (compensate for eventual 
> consistency), work with Flink's shaded S3 file systems, and efficiently 
> support formats that compress/index arcoss individual rows (Parquet, ORC, ...)
> 
> Especially for ORC and Parquet sinks. Since, We are planning to use 
> Kafka-jdbc to move data from rdbms to hdfs. 
> 
> Thanks,
> 
> On Sat, Jun 16, 2018 at 5:08 PM Elias Levy  > wrote:
> One more, since it we have to deal with it often:
> 
> - Idling sources (Kafka in particular) and proper watermark propagation: 
> FLINK-5018 / FLINK-5479
> 
> On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy  > wrote:
> Since wishes are free:
> 
> - Standalone cluster job isolation: 
> https://issues.apache.org/jira/browse/FLINK-8886 
> 
> - Proper sliding window joins (not overlapping hoping window joins): 
> https://issues.apache.org/jira/browse/FLINK-6243 
> 
> - Sharing state across operators: 
> https://issues.apache.org/jira/browse/FLINK-6239 
> 
> - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558 
> 
> 
> Seconded:
> - Atomic cancel-with-savepoint: 
> https://issues.apache.org/jira/browse/FLINK-7634 
> 
> - Support dynamically changing CEP patterns : 
> https://issues.apache.org/jira/browse/FLINK-7129 
> 
> 
> 
> On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen  > wrote:
> Hi all!
> 
> Thanks for the discussion and good input. Many suggestions fit well with the 
> proposal above.
> 
> Please bear in mind that with a time-based release model, we would release 
> whatever is mature by end of July.
> The good thing is we could schedule the next release not too far after that, 
> so that the features that did not quite make it will not be delayed too long.
> In some sense, you could read this as as "what to do first" list, rather than 
> "this goes in, other things stay out".
> 
> Some thoughts on some of the suggestions
> 
> Kubernetes integration: An opaque integration with Kubernetes should be 
> supported through the "as a library" mode. For a deeper integration, I know 
> that some committers have experimented with some PoC code. I would let Till 
> add some thoughts, he has worked the most on the deployment parts recently.
> 
> Per partition watermarks with idleness: Good point, could one implement that 
> on the current interface, with a periodic watermark extractor?
> 
> Atomic cancel-with-savepoint: Agreed, this is important. Making this work 
> with all sources needs a bit more work. We should have this in the roadmap.
> 
> Elastic Bloomfilters: This seems like an interesting new feature - the above 
> suggested feature set was more about addressing some longer standing 
> issues/requests. However, nothing should prevent contributors to work on that.
> 
> Best,
> Stephan
> 
> 
> On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science]  > wrote:
> +1 on https://issues.apache.org/jira/browse/FLINK-5479 
> 
> [FLINK-5479] Per-partition watermarks in ... 
> 
> issues.apache.org 
> Reported in ML: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html
>  
> 
>  It's normally not a common case to have Kafka partitions not producing any 
> data, but it'll probably be good to handle this as well. I ...
> 
> From: Rico Bergmann mailto:i...@ricobergmann.de>>
> Sent: Tuesday, June 5, 2018 9:12:00 PM
> To: Hao Sun
> Cc: d...@flink.apache.org ; user
> Subject: Re: [DISCUSS] Flink 1.6 features
>  
> +1 on K8s integration 
> 
> 
> 
> Am 06.06.2018 um 00:01 schrieb Hao Sun  >:
> 
>> adding my vote to K8S Job mode, maybe it is this?
>> > Smoothen the integration in Container environment, like "Flink as a