FileStreamingSink is using the same counter for different files

2020-01-23 Thread Pawel Bartoszek
Hi, Flink Streaming Sink is designed to use global counter when creating files to avoid overwrites. I am running Flink 1.8.2 with Kinesis Analytics (managed flink provided by AWS) with bulk writes (rolling policy is hardcoded to roll over on checkpoint). My job is configured to checkpoint every m

Re: FileStreamingSink is using the same counter for different files

2020-01-24 Thread Pawel Bartoszek
ss part file for bucket id=2020-01-24T14_55_00Z on checkpoint. 2020-01-24 14:59:11 [Async Sink: Unnamed (1/1)] DEBUG org.apache.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 0 checkpointing: BucketState for bucketId=2020-01-24T14_55_00Z and bucketPath=s3://xxx Thanks, Pawel On

Re: FileStreamingSink is using the same counter for different files

2020-01-25 Thread Pawel Bartoszek
n > have a look here: https://issues.apache.org/jira/browse/FLINK-13609 > > Cheers, > Kostas > > On Fri, Jan 24, 2020 at 5:16 PM Pawel Bartoszek > wrote: > > > > I have looked into the source code and it looks likes that the same > counter counter value being used in two bucket

Job manager logs for previous YARN attempts

2018-10-08 Thread Pawel Bartoszek
Hi, I am looking into the cause YARN starts new application attempt on Flink 1.5.2. The challenge is getting the logs for the first attempt. After checking YARN I discovered that in the first attempt and the second one application manager (job manager) gets assigned the same container id (is this

Flink weird checkpointing behaviour

2018-10-24 Thread Pawel Bartoszek
Hi, We have just upgraded to Flink 1.5.2 on EMR from Flink 1.3.2. We have noticed that some checkpoints are taking a very long time to complete some of them event fails with exception Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager_0#-665361795]]

Support for multiple slots per task manager in Flink 1.5

2018-12-05 Thread Pawel Bartoszek
Hi, According to the Flink 1.5 release docs multiple slots per task manager are "not fully supported yet". Can you provide more information about what are the risks of running more than one slot per tm? We are running Flink on EMR on YARN. Previously we run 4 task task managers with 8 slot each n

Scheduling of GroupByKey and CombinePerKey operations

2018-01-18 Thread Pawel Bartoszek
Can I ask why some operations run only one slot? I understand that file writes should happen only one one slot but GroupByKey operation could be distributed across all slots. I am having around 20k distinct keys every minute. Is there any way to break this operator chain? I noticed that CombinePer

Re: Task Manager detached under load

2018-01-19 Thread Pawel Bartoszek
Thanks for this message. We also experience very similar issue under a heavy load. In job manager logs we see AskTimeoutExceptions. This correlates typicaly with almost 100% cpu in tak manager. Even if the job is stopped task manger is still busy for minutes or even hour acting like in `saturation`

How back pressure is handled by source?

2018-01-30 Thread Pawel Bartoszek
Hi, I am interested in how back pressure is handled by sources in Flink ie Kinesis source. From what I understood back pressure is a mechanism to slow down rate at which records are read from the stream. However, in the kinesis source code I can see that it configured to read the same number of ro

Java heap size in YARN

2018-02-15 Thread Pawel Bartoszek
Hi, I have a question regarding configuration of task manager heap size when running YARN session on EMR. I am running 2 task managers on m4.4xlarge (64GB RAM). I would like to use as much as possible of that memory for the task manager heap. However when requesting 56000 MB when staring YARN ac

Re: Java heap size in YARN

2018-02-15 Thread Pawel Bartoszek
at 16:03, Pawel Bartoszek wrote: > Hi, > > I have a question regarding configuration of task manager heap size when > running YARN session on EMR. > > I am running 2 task managers on m4.4xlarge (64GB RAM). I would like to use > as much as possible of that memory for

Re: Java heap size in YARN

2018-02-15 Thread Pawel Bartoszek
ff-heap. > > > You can reduce this in order to get a bigger JVM heap size or increase it > in order to reserve more memory for off-heap usage (for jobs with large > rocksdb state), > > but I suggest you not changing this setting without careful consideration. > > >

Re: Correlation between number of operators and Job manager memory requirements

2018-02-18 Thread Pawel Bartoszek
Hi, You could definitely try to find formula for heap size, but isnt's it easier just to try out different memory settings and see which works best for you? Thanks, Pawel 17 lut 2018 12:26 "Shailesh Jain" napisaƂ(a): Oops, hit send by mistake. In the configuration section, it is mentioned tha

How back pressure works in Flink?

2018-03-06 Thread Pawel Bartoszek
Can you explain how back pressure affect the source in flink? I read the great article https://data-artisans.com/blog/how-flink-handles-backpressure and got the idea but I would like to know more details. Let's consider org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext