Re: [DISCUSS] Flink 1.6 features

2018-06-16 Thread sagar loke
We are eagerly waiting for - Extends Streaming Sinks: - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index across individual rows (Parquet, ORC, ...) Especially for

Re: [DISCUSS] Flink 1.6 features

2018-06-16 Thread Elias Levy
One more, since we have to deal with it often: - Idling sources (Kafka in particular) and proper watermark propagation: FLINK-5018 / FLINK-5479 On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy wrote: > Since wishes are free: > > - Standalone cluster job isolation: https://issues. >

Re: Exception while submitting jobs through Yarn

2018-06-16 Thread Ted Yu
The error for core-default.xml is interesting. Flink doesn't have this file. It probably came with YARN. Please check the Hadoop version Flink was built with against the Hadoop version in your cluster. Thanks Original message From: Garvit Sharma Date: 6/16/18 7:23 AM

Re: # of active session windows of a streaming job

2018-06-16 Thread Dongwon Kim
Hi Fabian, I'm still eager to expose # of active sessions as a key metric of our service but I haven’t figured it out yet. First of all, I want to ask you some questions regarding your suggestion. > You could implement a Trigger that fires when a new window is created and > when the window is

Re: Exception while submitting jobs through Yarn

2018-06-16 Thread Garvit Sharma
I am not able to figure this out and have been badly stuck on it for the last week. Any help would be appreciated. 2018-06-16 19:25:10,523 DEBUG org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator - Parallelism set: 1 for 8 2018-06-16 19:25:10,578 DEBUG

[BucketingSink] incorrect indexing of part files, when part suffix is specified

2018-06-16 Thread Rinat
Hi mates, since the 1.5 release, BucketingSink can be configured with a suffix for the part file. This is very useful when the file needs a specific extension. While using it, I found an issue: when a new part file is created, it has the same part index as the index of the just closed

Re: Flink application does not scale as expected, please help!

2018-06-16 Thread Siew Wai Yow
Hi Jorn, Please find the source at https://github.com/swyow/flink_sample_git Thank you! From: Jörn Franke Sent: Saturday, June 16, 2018 6:03 PM To: Siew Wai Yow Cc: user@flink.apache.org Subject: Re: Flink application does not scale as expected, please help! Can

Re: Flink application does not scale as expected, please help!

2018-06-16 Thread Jörn Franke
Can you share the app source on GitLab, GitHub, Bitbucket, etc.? > On 16. Jun 2018, at 11:46, Siew Wai Yow wrote: > > Hi, There is an interesting finding: the reason low parallelism works much > better is that all tasks run in the same TM; once we scale out, the tasks > are distributed

Re: Flink application does not scale as expected, please help!

2018-06-16 Thread Siew Wai Yow
Hi, There is an interesting finding: the reason low parallelism works much better is that all tasks run in the same TM; once we scale out, the tasks are distributed to different TMs and performance is worse than in the low-parallelism case. Is this expected? The more I scale the

Re: Flink application does not scale as expected, please help!

2018-06-16 Thread Siew Wai Yow
Hi Jorn, the input data is 1 KB per record. In production there will be 10 billion records per day, and that will increase, so scalability is quite important for us to handle more data. Unfortunately, this does not work as expected even with only 10 million test records. The test application

Re: Flink application does not scale as expected, please help!

2018-06-16 Thread Jörn Franke
How large is the input data? If the input data is very small, it does not make sense to scale it even more. The larger the data is, the more parallelism you will have. You can of course modify this behavior by changing the partitioning of the DataSet. > On 16. Jun 2018, at 10:41, Siew Wai Yow