Re: How does Flink handle backpressure in EMR

2019-12-05 Thread Piotr Nowojski
;mailto:khachatryan.ro...@gmail.com>> > Date: Thursday, December 5, 2019 at 9:47 AM > To: Michael Nguyen <mailto:michael.nguye...@t-mobile.com>> > Cc: Piotr Nowojski mailto:pi...@ververica.com>>, > "user@flink.apache.org <mailto:user@flink.apach

Re: How does Flink handle backpressure in EMR

2019-12-05 Thread Piotr Nowojski
t; > For the second article, I understand I can monitor the backpressure status > via the Flink Web UI. Can I refer to the same metrics in my Flink jobs > itself? For example, can I put in an if statement to check for when > outPoolUsage reaches 100%? > > Thank you, > Mi

Re: Mirror Maker 2.0 cluster and starting from latest offset and other queries

2019-10-18 Thread Piotr Nowojski
No problem, cheers :) Piotrek > On 17 Oct 2019, at 19:56, Vishal Santoshi wrote: > > oh shit.. sorry wrong wrong forum :) > > On Thu, Oct 17, 2019 at 1:41 PM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Hi, > > It sounds like setting up M

Re: Monitor number of keys per Taskmanager

2019-10-23 Thread Piotr Nowojski
Hi, This is a known issue of Flink. For example key groups can have sizes +/- 1 and they are currently randomly distributed across the cluster, so some machines will get more keys to handle then the others. If the number of keys is relatively small, like 3 keys per key group, the load

Re: Mirror Maker 2.0 cluster and starting from latest offset and other queries

2019-10-17 Thread Piotr Nowojski
Hi, It sounds like setting up Mirror Maker has very little to do with Flink. Shouldn’t you try asking this question on the Kafka mailing list? Piotrek > On 16 Oct 2019, at 16:06, Vishal Santoshi wrote: > > 1. still no clue, apart from the fact that ConsumerConfig gets it from > somewhere (

Re: Elasticsearch6UpsertTableSink how to trigger es delete index。

2019-10-17 Thread Piotr Nowojski
Hi, Take a look at the documentation. This [1] describes an example were a running query can produce updated results (and thus retracting the previous results). [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/dynamic_tables.html#table-to-stream-conversion

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-26 Thread Piotr Nowojski
Thanks for the confirmation, I’ve created Jira ticket to track this issue [1] https://issues.apache.org/jira/browse/FLINK-14952 Piotrek > On 26 Nov 2019, at 11:10, yingjie wrote: > > The new BlockingSubpartition implementation in 1.9 uses

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-26 Thread Piotr Nowojski
Hi, > @yingjie Do you have any idea how much memory will be stolen from OS when > using mmap for data reading? I think this is bounded only by the size of the written data. Also it will not be “stolen from OS”, as kernel is controlling the amount of pages residing currently in the RAM

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Piotr Nowojski
this exception. Could you double check if > this is the case? Thank you. > > Best, > Tony Wei > > Piotr Nowojski mailto:pi...@ververica.com>> 於 > 2019年11月27日 週三 下午8:50 寫道: > Hi, > > Maybe Chesney you are right, but I’m not sure. TwoPhaseCommitSink was based > on Prav

Re: Per Operator State Monitoring

2019-11-25 Thread Piotr Nowojski
Hi, I’m not sure if there is some simple way of doing that (maybe some other contributors will know more). There are two potential ideas worth exploring: - use periodically triggered save points for monitoring? If I remember correctly save points are never incremental - use save point

Re: Flink distributed runtime architucture

2019-11-25 Thread Piotr Nowojski
Hi, I’m glad to hear that you are interested in Flink! :) > In the picture, keyBy window and apply operators share the same circle. Is > is because these operators are chaining together? It’s not as much about chaining, as the chain of DataStream API invocations

Re: Window-join DataStream (or KeyedStream) with a broadcast stream

2019-11-25 Thread Piotr Nowojski
Hi, So you are trying to use the same window definition, but you want to aggregate the data in two different ways: 1. keyBy(userId) 2. Global aggregation Do you want to use exactly the same aggregation functions? If not, you can just process the events twice: DataStream<…> events = …;

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-25 Thread Piotr Nowojski
rs which fed into a CoGroup? > > // ah > > <>From: Zhijiang > Sent: Thursday, November 21, 2019 9:48 PM > To: Hailu, Andreas [Engineering] ; Piotr > Nowojski > Cc: user@flink.apache.org > Subject: Re: CoGroup SortMerger performance degradation from 1.6.4 -

Re: How to submit two jobs sequentially and view their outputs in .out file?

2019-11-25 Thread Piotr Nowojski
Hi, I would suggest the same thing as Vino did: it might be possible to use stdout somehow, but it’s a better idea to coordinate in some other way. Produce some (side?) output with a control message from one job once it finishes, that will control the second job. Piotrek > On 25 Nov 2019, at

Re: low performance in running queries

2019-11-01 Thread Piotr Nowojski
:57, Habib Mostafaei wrote: > > Hi Piotrek, > > Thanks for the list of profilers. I used VisualVM and here is the resource > usage for taskManager. > > > > Habib > > > > On 11/1/2019 9:48 AM, Piotr Nowojski wrote: >> Hi, >> >> &g

Re: possible backwards compatibility issue between 1.8->1.9?

2019-10-31 Thread Piotr Nowojski
Hi, (This question is more appropriate for the user mailing list, not dev - when responding to my e-mail please remove dev mailing list from the recipients, I’ve kept it just FYI that discussion has moved to user mailing list). Could it be, that the problem is caused by changes in chaining

Re: How to stream intermediate data that is stored in external storage?

2019-10-31 Thread Piotr Nowojski
ave to implement your own custom operator that would > output changes to it’s internal state as a side output" > > Yes, I am looking for this but I am not sure how to do this? Should I use the > processFunction(like the event-driven applications) ? > > On Wed, Oct 30, 2019 at 8:53

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-28 Thread Piotr Nowojski
est, > Tony Wei > > [1] https://issues.apache.org/jira/browse/FLINK-10377 > <https://issues.apache.org/jira/browse/FLINK-10377> > Piotr Nowojski mailto:pi...@ververica.com>> 於 > 2019年11月28日 週四 上午12:17寫道: > Hi Tony, > > Thanks for the explanation. Assuming that’s

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Piotr Nowojski
Hi, Maybe Chesney you are right, but I’m not sure. TwoPhaseCommitSink was based on Pravega’s sink for Flink, which was implemented by Stephan, and it has the same logic [1]. If I remember the discussions with Stephan/Till, the way how Flink is using Akka probably guarantees that messages will

Re: Backpressure tuning/failure

2019-10-10 Thread Piotr Nowojski
Hi, I’m not entirely sure what you are testing. I have looked at your code (only the constant straggler scenario) and please correct me if’m wrong, in your job you are basically measuring throughput of `Thread.sleep(straggler.waitMillis)`. In the first RichMap task (`subTaskId == 0`), per

Re: Backpressure tuning/failure

2019-10-10 Thread Piotr Nowojski
thing up? To clarify, the code is > attempting to simulate a straggler node due to high load, which therefore > processes data at a slower rate - not a failing node. Some degree of this is > a feature of multi-tenant Hadoop. > > Cheers, Owen > > On Thu, 10 Oct 2019 at

Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-11 Thread Piotr Nowojski
Hi Stephan, +1 for adding this to Apache Flink! Regarding the question if this should be committed to the main repository or as a separate one, I think it should be the main one. Previously we were discussing the idea of splitting Apache Flink into multiple repositories and I think the

Re: [Question] How to use different filesystem between checkpoint data and user data sink

2019-12-18 Thread Piotr Nowojski
Hi, As Yang Wang pointed out, you should use the new plugins mechanism. If it doesn’t work, first make sure that you are shipping/distributing the plugins jars correctly - the correct plugins directory structure both on the client machine. Next make sure that the cluster has the same correct

Re: [Question] How to use different filesystem between checkpointdata and user data sink

2019-12-19 Thread Piotr Nowojski
Hi, Can you share the full stack trace or just attach job manager and task managers logs? This exception should have had some cause logged below. Piotrek > On 19 Dec 2019, at 04:06, ouywl wrote: > > Hi Piotr Nowojski, >I have move my filesystem plugin to FLINK_HOME/pulgins in

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-02-13 Thread Piotr Nowojski
Hi, As Kostas has pointed out, the operator's and udf’s APIs are not thread safe and Flink always is calling them from the same, single Task thread. This also includes checkpointing state. Also as Kostas pointed out, the easiest way would be to try use AsyncWaitOperator. If that’s not

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-02-13 Thread Piotr Nowojski
You must switch to the Operator API to access the checkpointing lock. It was like by design - Operator API is not stable (@PublicEvolving) - that’s why we were able to deprecate and remove `checkpointingLock` in Flink 1.10/1.11. Piotrek > On 13 Feb 2020, at 14:54, Salva Alcántara wrote: > >

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-02-13 Thread Piotr Nowojski
Glad that we could help :) Yes, Arvid’s response is spot on. Piotrek > On 13 Feb 2020, at 14:17, Salva Alcántara wrote: > > Many thanks for your detailed response Piotr, it helped a lot! > > BTW, I got similar comments from Arvid Heise here: >

Re: Encountered error while consuming partitions

2020-02-13 Thread Piotr Nowojski
Hi 刘建刚, Could you explain how did you fix the problem for your case? Did you modify Flink code to use `IdleStateHandler`? Piotrek > On 13 Feb 2020, at 11:10, 刘建刚 wrote: > > Thanks for all the help. Following the advice, I have fixed the problem. > >> 2020年2月13日 下午6:05,Zhijiang >

Re: SSL configuration - default behaviour

2020-02-13 Thread Piotr Nowojski
Hi Krzysztof, Thanks for the suggestion. It was kind of implied in the first sentence on the page already, but I’m fixing it [1] to make it more clear. Piotrek [1] https://github.com/apache/flink/pull/11083 > On 11 Feb 2020, at 08:22, Krzysztof

Re: Scala string interpolation weird behaviour with Flink Streaming Java tests dependency.

2020-02-29 Thread Piotr Nowojski
Scala versions, I'm using 2.12 in every place. My > Java version is 1.8.0_221. > > Currently it is working, but not sure what happened here. > > Thanks! > > On Fri, Feb 28, 2020 at 10:50 AM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Also, don’t you

Re: Do I need to use Table API or DataStream API to construct sources?

2020-02-29 Thread Piotr Nowojski
Hi, You shouldn’t be using `KafkaTableSource` as it’s marked @Internal. It’s not part of any public API. You don’t have to convert DataStream into Table to read from Kafka in Table API. I guess you could, if you had used DataStream API’s FlinkKafkaConsumer as it’s documented here [1]. But

Re: Do I need to use Table API or DataStream API to construct sources?

2020-02-29 Thread Piotr Nowojski
16354 > <https://issues.apache.org/jira/browse/FLINK-16354> > kant kodali mailto:kanth...@gmail.com>> 于2020年3月1日周日 > 上午2:30写道: > Hi, > > Thanks for the pointer. Looks like the documentation says to use > tableEnv.registerTableSink however in my IDE it shows the m

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-03-02 Thread Piotr Nowojski
; offset in Kafka). > > Can this be achieved with a cancel or stop of the Flink pipeline? > > Best, > Tobias > > On Mon, Mar 2, 2020 at 11:09 AM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Hi Tobi, > > No, FlinkKafkaConsumer is not usi

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-03-02 Thread Piotr Nowojski
> sink. Is that correct? > > Best, > Tobi > > On Fri, Feb 28, 2020 at 3:17 PM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Yes, that’s correct. There shouldn’t be any data loss. Stop with savepoint is > a solution to make sure, that if you are stopp

Re: Flink 1.10 - Hadoop libraries integration with plugins and class loading

2020-02-28 Thread Piotr Nowojski
Hi, > Since we have "flink-s3-fs-hadoop" at the plugins folder and therefore being > dynamically loaded upon task/job manager(s) startup (also, we are keeping > Flink's default inverted class loading strategy), shouldn't Hadoop > dependencies be loaded by parent-first? (based on >

Re: Batch reading from Cassandra

2020-02-28 Thread Piotr Nowojski
Hi, I’m afraid that we don’t have any native support for reading from Cassandra at the moment. The only things that I could find, are streaming sinks [1][2]. Piotrek [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/cassandra.html

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-02-28 Thread Piotr Nowojski
emoved in flink 1.9.0 since it could not work > properly as I know > 2. if we have the feature of the stop with savepoint, we could add it to the > web UI, but it may still need some work on the rest API to support the new > feature > > > Best, > Yadong > > >

Re: Flink remote batch execution in dynamic cluster

2020-02-28 Thread Piotr Nowojski
Hi, I guess it depends what do you have already available in your cluster and try to use that. Running Flink in existing Yarn cluster is very easy, but setting up yarn cluster in the first place even if it’s easy (I’m not sure about if that’s the case), would add extra complexity. When I’m

Re: Scala string interpolation weird behaviour with Flink Streaming Java tests dependency.

2020-02-28 Thread Piotr Nowojski
Hey, What Java versions are you using? Also, could you check, if you are not mixing Scala versions somewhere? There are two different Flink binaries for Scala 2.11 and Scala 2.12. I guess if you mix them, of if you use incorrect Scala runtime not matching the supported version of the

Re: Scala string interpolation weird behaviour with Flink Streaming Java tests dependency.

2020-02-28 Thread Piotr Nowojski
Also, don’t you have a typo in your pattern? In your pattern you are using `$accountId`, while the variable is `account_id`? (Maybe I don’t understand it as I don’t know Scala very well). Piotrek > On 28 Feb 2020, at 11:45, Piotr Nowojski wrote: > > Hey, > > What Java version

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-02-28 Thread Piotr Nowojski
ng somewhere) and I click "cancel" and after that I > restart the pipeline - I should not expect any data to be lost - is that > correct? > > Best, > Tobias > > On Fri, Feb 28, 2020 at 2:51 PM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > T

Re: Flink and Presto integration

2020-01-28 Thread Piotr Nowojski
Hi, Yes, Presto (in presto-hive connector) is just using hive Metastore to get the table definitions/meta data. If you connect to the same hive Metastore with Flink, both systems should be able to see the same tables. Piotrek > On 28 Jan 2020, at 04:34, Jingsong Li wrote: > > Hi Flavio, >

Re: ActiveMQ connector

2020-01-30 Thread Piotr Nowojski
https://bahir.apache.org/docs/flink/current/flink-streaming-activemq/> > [3] > https://github.com/apache/bahir-flink/blob/master/flink-connector-activemq/src/main/java/org/apache/flink/streaming/connectors/activemq/AMQSource.java > > <https://github.com/apache/bahir-flink/blob/maste

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-02-03 Thread Piotr Nowojski
oblem. > > I think the DeleteOnExit problem will mean it needs to be restarted every few > weeks, but that's acceptable for now. > > Thanks again, > > Mark > From: Mark Harris > Sent: 30 January 2020 14:36 > To: Piotr Nowojski > Cc: Cliff Resnick ; Dav

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-27 Thread Piotr Nowojski
load.buffer=true > > To try and avoid writing the buffer files, but the taskmanager breaks with > the same problem. > > Best regards, > > Mark > From: Piotr Nowojski <mailto:pi...@data-artisans.com>> on behalf of Piotr Nowojski > mailto:pi...@ververica

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-30 Thread Piotr Nowojski
uld this be a factor? > > Best regards, > > Mark > From: Piotr Nowojski > Sent: 27 January 2020 16:16 > To: Cliff Resnick > Cc: David Magalhães ; Mark Harris > ; Till Rohrmann ; > flink-u...@apache.org ; kkloudas > Subject: Re: GC overhead limit exceeded, memory ful

Re: ActiveMQ connector

2020-01-30 Thread Piotr Nowojski
Hi, Regarding your last question, sorry I don’t know about ActiveMQ connectors. I’m not sure exactly how you are implementing your SourceFunction. Generally speaking `run()` method is executed in one thread, and other operations like checkpointing, timers (if any) are executed from another

Re: Flink job fails with org.apache.kafka.common.KafkaException: Failed to construct kafka consumer

2020-02-20 Thread Piotr Nowojski
me maximum parallelism would be 18. > > I will try increase the ulimit and hopefully, we wont see it... > > On Thu, 20 Feb 2020 at 04:56, Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > But it could be Kafka’s client issue on the Flink side (as the stack trace is > sugg

Re: Flink job fails with org.apache.kafka.common.KafkaException: Failed to construct kafka consumer

2020-02-17 Thread Piotr Nowojski
Hey, sorry but I know very little about the KafkaConsumer. I hope that someone else might know more. However, did you try to google this issue? It doesn’t sound like Flink specific problem, but like a general Kafka issue. Also a solution might be just as simple as bumping the limit of opened

Re: Flink job fails with org.apache.kafka.common.KafkaException: Failed to construct kafka consumer

2020-02-20 Thread Piotr Nowojski
nk task node where the task was > running. The brokers looked fine at the time. > I have about a dozen topics which all are single partition except one which > is 18. So I really doubt the broker machines ran out of files. > > On Mon, 17 Feb 2020 at 05:21, Piotr Nowojski <mail

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-22 Thread Piotr Nowojski
Hi, This is probably a known issue of Hadoop [1]. Unfortunately it was only fixed in 3.3.0. Piotrek [1] https://issues.apache.org/jira/browse/HADOOP-15658 > On 22 Jan 2020, at 13:56, Till Rohrmann wrote: > > Thanks for reporting this

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-17 Thread Piotr Nowojski
+1 for making it consistent. When using X state backend, timers should be stored in X by default. Also I think any configuration option controlling that needs to be well documented in some performance tuning section of the docs. Piotrek > On 17 Jan 2020, at 09:16, Congxian Qiu wrote: > > +1

Re: Data overflow in SpillingResettableMutableObjectIterator

2020-01-13 Thread Piotr Nowojski
Hi Jian, Thank your for reporting the issue. I see that you have already created a ticket for this [1]. Piotrek [1] https://issues.apache.org/jira/browse/FLINK-15549 > On 9 Jan 2020, at 09:10, Jian Cao wrote: > > Hi all: > We are using

Re: UnsupportedOperationException from org.apache.flink.shaded.asm6.org.objectweb.asm.ClassVisitor.visitNestHostExperimental using Java 11

2020-01-13 Thread Piotr Nowojski
Hi, Yes, this is work in progress [1]. It looks like the Java 11 support is targeted for Flink 1.10 which should be released this or the following month. Piotrek [1] https://issues.apache.org/jira/browse/FLINK-10725 > On 9 Jan 2020, at

Re: Checkpoints issue and job failing

2020-01-06 Thread Piotr Nowojski
; > I want to ask, does it happen by accident or frequently? > > Another concern is that since the 1.4 version is very far away, all > maintenance and response are not as timely as the recent versions. I > personally recommend upgrading as soon as possible. > > I can ping @Pio

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-04-08 Thread Piotr Nowojski
Hi Salva, Can not you take into account the pending element that’s stuck somewhere in the transit? Snapshot it as well and during recovery reprocess it? This is exactly that’s AsyncWaitOperator is doing. Piotrek > On 5 Apr 2020, at 15:00, Salva Alcántara wrote: > > Hi again Piotr, > > I

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-04-09 Thread Piotr Nowojski
Hi, With: > Can not you take into account the pending element that’s stuck somewhere in > the transit? Snapshot it as well and during recovery reprocess it? This is > exactly that’s AsyncWaitOperator is doing. I didn’t mean for you to use AsynWaitOperator, but what both me and Arvid

Re: Cancel the flink task and restore from checkpoint ,can I change the flink operator's parallelism

2020-03-13 Thread Piotr Nowojski
Hi, Yes, you can change the parallelism. One thing that you can not change is “max parallelism”. Piotrek > On 13 Mar 2020, at 04:34, Sivaprasanna wrote: > > I think you can modify the operator’s parallelism. It is only if you have set > maxParallelism, and while restoring from a checkpoint,

Re: Expected behaviour when changing operator parallelism but starting from an incremental checkpoint

2020-03-13 Thread Piotr Nowojski
Hi, Generally speaking changes of parallelism is supported between checkpoints and savepoints. Other changes to the job’s topology, like adding/changing/removing operators, changing types in the job graph are only officially supported via savepoints. But in reality, as for now, there is no

Re: [DISCUSS] FLIP-115: Filesystem connector in Table

2020-03-13 Thread Piotr Nowojski
Hi, Which actual sinks/sources are you planning to use in this feature? Is it about exposing StreamingFileSink in the Table API? Or do you want to implement new Sinks/Sources? Piotrek > On 13 Mar 2020, at 10:04, jinhai wang wrote: > > Thanks for FLIP-115. It is really useful feature for

Re: Implicit Flink Context Documentation

2020-03-13 Thread Piotr Nowojski
Hi, Please take a look for example here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#keyed-state And the example in particular

Re: Communication between two queries

2020-03-13 Thread Piotr Nowojski
Hi, Could you explain a bit more what are you trying to achieve? One problem that pops into my head is that currently in Flink Streaming (it is possible for processing bounded data), there is no way to “not ingest” the data reliably in general case, as this might deadlock the upstream

Re: Communication between two queries

2020-03-16 Thread Piotr Nowojski
Hi, Let us know if something doesn’t work :) Piotrek > On 16 Mar 2020, at 08:42, Mikael Gordani wrote: > > Hi, > I'll try it out =) > > Cheers! > > Den mån 16 mars 2020 kl 08:32 skrev Piotr Nowojski <mailto:pi...@ververica.com>>: > Hi, > > In

Re: Implicit Flink Context Documentation

2020-03-16 Thread Piotr Nowojski
eyContextElement1` > is what I'm looking for though. It would be cool if there were some > internals design doc however? Quite hard to dig through the code as there is > a log tied to how the execution of the job actually happens. > > Padarn > > On Fri, Mar 13, 2020 at 9:43

Re: Communication between two queries

2020-03-16 Thread Piotr Nowojski
Hi, In that case you could try to implement your `FilterFunction` as two input operator, with broadcast control input, that would be setting the `global_var`. Broadcast control input can be originating from some source, or from some operator. Piotrek > On 13 Mar 2020, at 15:47, Mikael

Re: Expected behaviour when changing operator parallelism but starting from an incremental checkpoint

2020-03-16 Thread Piotr Nowojski
avepoints > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-47%3A+Checkpoints+vs.+Savepoints> > > Best, > > Aaron Levin > > On Fri, Mar 13, 2020 at 9:08 AM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Hi, > > Generally spe

Re: Flink in EMR configuration problem

2020-04-01 Thread Piotr Nowojski
Hey, Isn’t explanation of the problem in the logs that you posted? Not enough memory? You have 2 EMR nodes, 8GB memory each, while trying to allocate 2 TaskManagers AND 1 JobManager with 6GB heap size each? Piotrek > On 31 Mar 2020, at 17:01, Antonio Martínez Carratalá > wrote: > > Hi, I'm

Re: some subtask taking too long

2020-04-01 Thread Piotr Nowojski
Hey, The thread you are referring to is about DataStream API job and long checkpointing issue. While from your message it seems like you are using Table API (SQL) to process a batch data? Or what exactly do you mean by: > i notice that there are one or two subtasks that take too long to

Re: Flink Kafka Consumer Throughput reduces over time

2020-04-01 Thread Piotr Nowojski
Hi, I haven’t heard about Flink specific problem like that. Have you checked that the records are not changing over time? That they are not for example twice as large or twice as heavy to process? Especially that you are using rate limiter with 12MB/s. If your records grew to 60KB in size,

Re: Flink in EMR configuration problem

2020-04-01 Thread Piotr Nowojski
un > on slaves so I need 3 instances instead of 2 as I guessed. > > Regards > > On Wed, Apr 1, 2020 at 1:31 PM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Hey, > > Isn’t explanation of the problem in the logs that you posted? Not enough > memory? Yo

Re: Task Assignment

2020-04-27 Thread Piotr Nowojski
Hi Navneeth, But what’s the problem with using `keyBy(…)`? If you have a set of keys that you want to process together, in other words they are are basically equal from the `keyBy(…)` perspective, why can’t you use this in your `KeySelector`? Maybe to make it clear, you can think about this in

Re: "Fill in" notification messages based on event time watermark

2020-04-27 Thread Piotr Nowojski
Hi, I’m not sure, but I don’t think there is an existing window that would do exactly what you want. I would suggest to go back to the `keyedProcessFunction` (or a custom operator?), and have a MapState currentStates field. Your key would be for example a timestamp of the beginning of your

[ANNOUNCE] Development progress of Apache Flink 1.11

2020-04-24 Thread Piotr Nowojski
and we have highlighted features that are already done and also the features that are no longer aimed for Flink 1.11 release and will be most likely postponed to a later date. Your release managers, Zhijiang & Piotr Nowojski Features already done and ready for Flink 1.11 - PyF

Re: KeyedStream and chained forward operators

2020-04-23 Thread Piotr Nowojski
Hi, I’m not sure how can we help you here. To my eye, your code looks ok, what you figured about pushing the keyBy in front of ContinuousFileReader is also valid and makes sense if you indeed can correctly perform the keyBy based on the input splits. The problem should be somewhere in your

Re: Debug Slowness in Async Checkpointing

2020-04-29 Thread Piotr Nowojski
Hi, Yes, for example [1]. Most of the points that you mentioned are already visible in the UI and/or via metrics, just take a look at the subtask checkpoint stats. > when barriers were instrumented at source from checkpoint coordinator That’s checkpoint trigger time. > when each down stream task

Re: "Fill in" notification messages based on event time watermark

2020-04-29 Thread Piotr Nowojski
Perhaps that can help you get started. >>Regards, >>David >>[1] >> >> https://ci.apache.org/projects/flink/flink-docs-master/tutorials/event_driven.html#example >> >> <https://ci.apache.org/projects/flink/flink-docs-master/tutoria

[ANNOUNECE] release-1.11 branch cut

2020-05-18 Thread Piotr Nowojski
Hi community, I have cut the release-1.11 from the master branch based on bf5594cdde428be3521810cb3f44db0462db35df commit. If you will be merging something into the master branch, please make sure to set the correct fix version in the JIRA, accordingly to which branch have you merged your code.

Re: Communication between two queries

2020-03-17 Thread Piotr Nowojski
Ops, sorry there was a misleading typo/auto correction in my previous e-mail. Second sentence should have been: > First of all you would have to use event time semantic for consistent results Piotrek > On 17 Mar 2020, at 14:43, Piotr Nowojski wrote: > > Hi, > > Ye

Re: Communication between two queries

2020-03-17 Thread Piotr Nowojski
ake sure that both queries has > processed the same amount of data before writing to the sink, but I'm a bit > unsure on how to do it. > Do you have any suggestions or thoughts? > Cheers, > > Den mån 16 mars 2020 kl 08:43 skrev Piotr Nowojski <mailto:pi...@ververica.com

Re: Combined streams backpressure

2020-09-03 Thread Piotr Nowojski
Hi, This is a known problem. As of recently, there was no way to solve this problem generically, for every source. This is changing now, as one of the motivations behind FLIP-27, was to actually address this issue [1]. Note, as of now, there are no FLIP-27 sources yet in the code base, but for

Re: Flink 1.8.3 GC issues

2020-09-03 Thread Piotr Nowojski
4 and Beam 2.4.0 > > Any insights into this will help me to debug further > > Thanks, > Josson > > > On Thu, Sep 3, 2020 at 3:34 AM Piotr Nowojski > wrote: > >> Hi, >> >> Have you tried using a more recent Flink version? 1.8.x is no longer >> sup

Re: Flink 1.8.3 GC issues

2020-09-09 Thread Piotr Nowojski
et > GCed/Finalized. Any change around this between Flink 1.4 and Flink 1.8. > > My understanding on back pressure is that it is not based on Heap memory > but based on how fast the Network buffers are filled. Is this correct?. > Does Flink use TCP connection to communicate between tas

Re: Flink 1.8.3 GC issues

2020-09-11 Thread Piotr Nowojski
and Cpu. Both are holding good. > > In Flin 1.8 I could reach only 160 Clients/Sec and the issue started > happening. Issue started within 15 minutes of starting the ingestion. @Piotr > Nowojski , you can see that there is no meta space > related issue. All the GC related details are available

Re: Flink 1.8.3 GC issues

2020-09-10 Thread Piotr Nowojski
om Heap dump to show you the difference > between Flink 1.4 and 1.8 the way HeapKeyedStateBackend is created. Not > sure whether this change has something to do with this memory issue that I > am facing. > Name Flink-1.4.jpg for the 1.4 and Flink-1.8.jpg for 1.8 > > > Thanks, >

Re: Flink 1.8.3 GC issues

2020-09-14 Thread Piotr Nowojski
tps://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html#network [3] https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#network pon., 14 wrz 2020 o 05:04 Josson Paul napisał(a): > @Piotr Nowojski @Nico Kruber > I have attached the Ta

Re: Flink 1.8.3 GC issues

2020-09-03 Thread Piotr Nowojski
Hi, Have you tried using a more recent Flink version? 1.8.x is no longer supported, and latest versions might not have this issue anymore. Secondly, have you tried backtracking those references to the Finalizers? Assuming that Finalizer is indeed the class causing problems. Also it may well be

Re: what's the datasets used in flink sql document?

2020-10-15 Thread Piotr Nowojski
-docs-release-1.11/dev/table/common.html > > Thanks for your help~ > > -- 原始邮件 ------ > *发件人:* "Piotr Nowojski" ; > *发送时间:* 2020年10月14日(星期三) 晚上10:20 > *收件人:* "大森林"; > *抄送:* "user"; > *主题:* Re: what's the datasets u

Re: Processing single events for minimum latency

2020-10-15 Thread Piotr Nowojski
No problem :) Piotrek czw., 15 paź 2020 o 08:18 Pankaj Chand napisał(a): > Thank you for the quick and informative reply, Piotrek! > > On Thu, Oct 15, 2020 at 2:09 AM Piotr Nowojski > wrote: > >> Hi Pankay, >> >> Yes, you can trigger a window per each e

Re: Broadcasting control messages to a sink

2020-10-15 Thread Piotr Nowojski
or not? The main problem I > encountered when playing around with broadcast state was that I couldn’t > figure out how to access the broadcast state within the sink, but perhaps I > just haven’t thought about it the right way. I’ll meditate on the docs > further  > > > > Julian

Re: Processing single events for minimum latency

2020-10-15 Thread Piotr Nowojski
r every processed record. > > How do I do this? > > Also, is there any way I can change the execution.buffer-timeout or > setbuffertimeout(milliseconds) dynamically while the job is running? > > Thank you, > > Pankaj > > On Wed, Oct 14, 2020 at 9:42 AM Piotr N

Re: NPE when checkpointing

2020-10-14 Thread Piotr Nowojski
if I upgrade to a newer > JDK version (I tried with JDK 1.8.0_265) the issue doesn’t happen. > > Thank you for helping > -Binh > > On Fri, Oct 9, 2020 at 11:36 AM Piotr Nowojski > wrote: > >> Hi Binh, >> >> Could you try upgrading Flink's Java

Re: NPE when checkpointing

2020-10-09 Thread Piotr Nowojski
nt (build 1.8.0_77-b03) > Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode) > > Thanks > -Binh > > On Fri, Oct 9, 2020 at 10:23 AM Piotr Nowojski > wrote: > >> Hi, >> >> One more thing. It looks like it's not a Flink issue, but some JDK bug.

Re: NPE when checkpointing

2020-10-09 Thread Piotr Nowojski
Hi, Thanks for reporting the problem. I think this is a known issue [1] on which we are working to fix. Piotrek [1] https://issues.apache.org/jira/browse/FLINK-18196 pon., 5 paź 2020 o 08:54 Binh Nguyen Van napisał(a): > Hi, > > I have a streaming job that is written in Apache Beam and uses

Re: Broadcasting control messages to a sink

2020-10-17 Thread Piotr Nowojski
| Broadcast > > | | > > Union -- > > | > > ___ > > | Sink | > >

Re: NPE when checkpointing

2020-10-09 Thread Piotr Nowojski
Hi, One more thing. It looks like it's not a Flink issue, but some JDK bug. Others reported that upgrading JDK version (for example to jdk1.8.0_251) seemed to be solving this problem. What JDK version are you using? Piotrek pt., 9 paź 2020 o 17:59 Piotr Nowojski napisał(a): > Hi, > &g

Re: Broadcasting control messages to a sink

2020-10-14 Thread Piotr Nowojski
Hi Julian, Have you seen Broadcast State [1]? I have never used it personally, but it sounds like something you want. Maybe your job should look like: 1. read raw messages from Kafka, without using the schema 2. read schema changes and broadcast them to 3. and 5. 3. deserialize kafka records in

Re: Processing single events for minimum latency

2020-10-14 Thread Piotr Nowojski
Hi Pankaj, I'm not entirely sure if I understand your question. If you want to minimize latency, you should avoid using windows or any other operators, that are buffering data for long periods of time. You still can use windowing, but you might want to emit updated value of the window per every

Re: Upgrade to Flink 1.11 in EMR 5.31 Command line interface

2020-10-14 Thread Piotr Nowojski
Hi, Are you sure you are loading the filesystems correctly? Are you using the plugin mechanism? [1] Since Flink 1.10 plugins can only be loaded in this way [2], while there were some changes to plug some holes in Flink 1.11 [3]. Best, Piotrek [1]

Re: Dynamic file name prefix - StreamingFileSink

2020-10-14 Thread Piotr Nowojski
Hi Yadav, What Flink version are you using? `getPartPrefix` and `getPartSufix` methods were not public before 1.10.1/1.11.0, which might be causing this problem for you. Other than that, if you are already using Flink 1.10.1 (or newer), maybe please double check what class are you extending? The

Re: what's the datasets used in flink sql document?

2020-10-14 Thread Piotr Nowojski
Hi, It depends how you defined `orders` in your example. For example here [1] > Table orders = tEnv.from("Orders"); // schema (a, b, c, rowtime) `orders` is obtained from the environment, from a table registered under the name "Orders". You would need to first register such table, or register a

<    1   2   3   4   5   6   7   >