Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-20 Thread Piotr Nowojski
Hi Alexey, I might be wrong (I don't know this side of Flink very well), but as far as I know the JobGraph is never stored in ZK. It's always recreated from the job's JAR. So you should be able to upgrade the job by replacing the JAR with a newer version, as long as the operator UIDs are the same

Re: No space left on device exception

2020-08-20 Thread Piotr Nowojski
Hi, As far as I know, when uploading a file to S3, the writer first needs to create some temporary files on the local disks. I would suggest double-checking all of the partitions on the local machine and monitoring available disk space continuously while the job is running. If you are just checking

Re: Decompose failure recovery time

2020-08-20 Thread Piotr Nowojski
Hi, > I want to decompose the recovery time into different parts, say > (1) the time to detect the failure, > (2) the time to restart the job, > (3) and the time to restore the checkpointing. 1. Maybe I'm missing something, but as far as I can tell, Flink cannot help you with that. Time to

Re: Status of a job when a kafka source dies

2020-08-14 Thread Piotr Nowojski
> > [1] > https://docs.spring.io/spring-kafka/api/org/springframework/kafka/event/NonResponsiveConsumerEvent.html > > On Wed, Aug 5, 2020 at 1:30 PM Piotr Nowojski > wrote: > >> Hi Nick, >> >> Could you elaborate more, what event and how would you lik

Re: Updating kafka connector with state

2020-08-12 Thread Piotr Nowojski
> , > Nikola Hrusov > > On Wed, Aug 12, 2020 at 11:49 AM Piotr Nowojski > wrote: > >> Hi Nikola, >> >> Which Flink version are you using? Can you describe step by step what you >> are doing? >> >> This error that you have should have been fixed i

Re: Updating kafka connector with state

2020-08-12 Thread Piotr Nowojski
Hi Nikola, Which Flink version are you using? Can you describe step by step what you are doing? This error that you have should have been fixed in Flink 1.9.0+ [1], so if you are using an older version of Flink, please first upgrade Flink - without upgrading the job, then upgrade the connector.

Re: Status of a job when a kafka source dies

2020-08-05 Thread Piotr Nowojski
Bendtner wrote: > >> Thanks Piotr but shouldn't this event be handled by the >> FlinkKafkaConsumer since the poll happens inside the FlinkKafkaConsumer. >> How can I catch this event in my code since I don't have control over the >> poll. >> >> Best, >

Re: Status of a job when a kafka source dies

2020-08-05 Thread Piotr Nowojski
Hi Nick, What Aljoscha was trying to say is that Flink is not trying to do any magic. If `KafkaConsumer` - which is used under the hood of the `FlinkKafkaConsumer` connector - throws an exception, this exception bubbles up, causing the job to fail over. If the failure is handled by the

Re: Kafka transaction error lead to data loss under end to end exact-once

2020-08-05 Thread Piotr Nowojski
Hi Lu, In this case, judging from the quite fragmented log/error message that you posted, the job has failed, so Flink did indeed detect some issue, and that probably means data loss in Kafka (in such a case you could probably recover some lost records by reading in `read_uncommitted` mode from

Re: Is there a way to use stream API with this program?

2020-07-28 Thread Piotr Nowojski
:08 PM David Anderson > wrote: > >> In this use case, couldn't the custom trigger register an event time >> timer for MAX_WATERMARK, which would be triggered when the bounded input >> reaches its end? >> >> David >> >> On Mon, Jul 20, 2020 at 5:4

Re: Global Hashmap & global static variable.

2020-07-20 Thread Piotr Nowojski
Hi Annemarie, You are missing some basic concepts in Flink, please take a look at [1]. > Weirdly enough it worked fine in my Intellij. It's completely normal. If you are accessing some static variable in your code and you are executing your Flink application in a testing local environment

Re: Is there a way to use stream API with this program?

2020-07-20 Thread Piotr Nowojski
Hi, I'm afraid that there is no out-of-the-box way of doing this. I've created a ticket [1] to write down and document a discussion that we had about this issue in the past. The issue is that currently, untriggered processing time timers are ignored on end of input and it seems like there might

Re: Beam flink runner job not keeping up with input rate after downscaling

2020-07-20 Thread Piotr Nowojski
Hi, maxParallelism = -1, the default value, is interpreted as described in the documentation you linked: > The default setting for the maximum parallelism is roughly operatorParallelism + (operatorParallelism / 2) with a lower bound of 128 and an upper bound of 32768. So maxParallelism should
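The quoted default can be worked through with a small sketch. This is not Flink's own code: the class and method names below are hypothetical, and the sketch applies the quoted formula literally. The documentation says "roughly", so any additional rounding that Flink applies internally is not reproduced here.

```java
// Hypothetical helper, not part of Flink: applies the documented default
// "operatorParallelism + (operatorParallelism / 2)", clamped to [128, 32768].
final class MaxParallelismSketch {
    static final int LOWER_BOUND = 128;
    static final int UPPER_BOUND = 32768;

    static int defaultMaxParallelism(int operatorParallelism) {
        // Integer division, as written in the quoted formula.
        int candidate = operatorParallelism + (operatorParallelism / 2);
        // Clamp the candidate to the documented lower and upper bounds.
        return Math.min(Math.max(candidate, LOWER_BOUND), UPPER_BOUND);
    }

    public static void main(String[] args) {
        for (int p : new int[] {10, 100, 30000}) {
            System.out.println(p + " -> " + defaultMaxParallelism(p));
        }
    }
}
```

For small jobs the lower bound dominates, so a parallelism of 10 still yields a default of 128 under this formula.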

Re: Flink Jobs are failing for running testcases when trying to build in Jenkins server

2020-07-20 Thread Piotr Nowojski
Hi, It looks like the Flink mini (test) cluster has trouble starting up on your Jenkins machine. Frankly, it's hard for me to guess what could be the issue here. 1. Are you following this guideline? [1] 2. Maybe there are some other error messages somewhere else in the logs/stderr/stdout? This 100s

Re: Flink ML

2020-06-17 Thread Piotr Nowojski
Hi, It looks like FLIP-39 is only partially implemented as of now [1], so I’m not sure which features are already done. I’m including Shaoxuan Wang in this thread, maybe he will be able to better answer your question. Piotrek [1] https://issues.apache.org/jira/browse/FLINK-12470

Re: Multiple Sinks for a Single Soure

2020-06-04 Thread Piotr Nowojski
help from both of you. Am able to add the sinks based > on the JSON and create DAG. > > Thanks, > Prasanna. > > On Wed, Jun 3, 2020 at 4:51 PM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Hi Prasanna, > > 1. > > > The object probably cont

Re: Multiple Sinks for a Single Soure

2020-06-03 Thread Piotr Nowojski
} > }, > { > "customername": "c2", > "method": "S3", > "methodparams": { > "folder": "aws://folderC2" >

Re: Collecting operators real output cardinalities as json files

2020-05-27 Thread Piotr Nowojski
about the exact netRunTime of each job maybe using the REST APIs > to get the other information will be more reliable? > > Thank you. Best, > > Francesco > >> On 25 May 2020, at 19:54, Piotr Nowojski > <mailto:pi...@ververica.com>> wrote: >

Re: Singal task backpressure problem with Credit-based Flow Control

2020-05-27 Thread Piotr Nowojski
e of back pressure. > > Best > Weihua Hu > >> On 26 May 2020, at 02:54, Piotr Nowojski > <mailto:pi...@ververica.com>> wrote: >> >> Hi Weihua, >> >> > After dumping the memory and analyzing it, I found:

Re: Multiple Sinks for a Single Soure

2020-05-26 Thread Piotr Nowojski
a generic job for the same , rather than writing > one for new case. > > Let me know your thoughts and flink suitability to this requirement. > > Thanks > Prasanna. > > > On Tue, May 26, 2020 at 3:34 PM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: >

Re: Multiple Sinks for a Single Soure

2020-05-26 Thread Piotr Nowojski
code > processing. > > Prasanna. > > > On Mon 25 May, 2020, 23:37 Piotr Nowojski, <mailto:pi...@ververica.com>> wrote: > Hi, > > To the best of my knowledge the following pattern should work just fine: > > DataStream myStream = env.ad

Re: close file on job crash

2020-05-26 Thread Piotr Nowojski
SIGINT will be issued anyway). In this case `#close` also will be called after source’s threads exit. Piotrek > On 25 May 2020, at 21:37, Laurent Exsteens > wrote: > > Thank you, we'll try that. > > On Mon, May 25, 2020, 21:09 Piotr Nowojski <mailto:pi...@verver

Re: close file on job crash

2020-05-25 Thread Piotr Nowojski
Hi, Cancel method is being invoked only when SourceTask is being cancelled from the outside, by JobManager - for example after detecting a failure of a different Task. > What is the proper way to handle this issue? Is there some kind of closable > source interface we should implement? Have

Re: Singal task backpressure problem with Credit-based Flow Control

2020-05-25 Thread Piotr Nowojski
Hi Weihua, > After dumping the memory and analyzing it, I found: > Sink (121)'s RemoteInputChannel.unannouncedCredit = 0, > Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0. > This is not consistent with my understanding of the Flink network > transmission mechanism.

Re: Multiple Sinks for a Single Soure

2020-05-25 Thread Piotr Nowojski
Hi, To the best of my knowledge the following pattern should work just fine: DataStream myStream = env.addSource(…).foo().bar() // for custom source, but any ; myStream.addSink(sink1); myStream.addSink(sink2); myStream.addSink(sink3); All of the records from `myStream` would be passed to each
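The fan-out behaviour described here (every record of one stream is handed to every attached sink) can be illustrated without any Flink dependency. The sketch below is plain Java, not the DataStream API: `FanOutSketch` and `emitToAll` are hypothetical names, and each `Consumer` merely stands in for a sink added with `addSink(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java analogy, not Flink code: one upstream list of records,
// several "sinks" that each receive every record.
final class FanOutSketch {
    static <T> void emitToAll(List<T> records, List<Consumer<T>> sinks) {
        for (T record : records) {
            for (Consumer<T> sink : sinks) {
                sink.accept(record); // each sink sees the full stream
            }
        }
    }

    public static void main(String[] args) {
        List<String> sink1 = new ArrayList<>();
        List<String> sink2 = new ArrayList<>();
        List<Consumer<String>> sinks = new ArrayList<>();
        sinks.add(sink1::add);
        sinks.add(sink2::add);

        emitToAll(List.of("a", "b", "c"), sinks);
        System.out.println(sink1 + " " + sink2);
    }
}
```

The point of the analogy is only that adding a sink does not split or partition the records: each sink receives its own complete copy.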

Re: Collecting operators real output cardinalities as json files

2020-05-25 Thread Piotr Nowojski
Hi Francesco, Have you taken a look at the metrics? [1] And IO metrics [2] in particular? You can use one of the pre-existing metric reporters [3] or implement a custom one. You could export metrics to some 3rd party system, and get JSONs from there, or export them to JSON directly via a

Re: java.lang.AbstractMethodError when implementing KafkaSerializationSchema

2020-05-25 Thread Piotr Nowojski
Hi, It would be helpful if you could provide full stack trace, what Flink version and which Kafka connector version are you using? It sounds like either a dependency convergence error (mixing Kafka dependencies/various versions of flink-connector-kafka inside a single job/jar) or some shading

[ANNOUNECE] release-1.11 branch cut

2020-05-18 Thread Piotr Nowojski
Hi community, I have cut the release-1.11 branch from the master branch, based on commit bf5594cdde428be3521810cb3f44db0462db35df. If you merge something into the master branch, please make sure to set the correct fix version in JIRA, according to which branch you have merged your code into.

Re: Debug Slowness in Async Checkpointing

2020-04-29 Thread Piotr Nowojski
Hi, Yes, for example [1]. Most of the points that you mentioned are already visible in the UI and/or via metrics, just take a look at the subtask checkpoint stats. > when barriers were instrumented at source from checkpoint coordinator That’s checkpoint trigger time. > when each down stream task

Re: "Fill in" notification messages based on event time watermark

2020-04-29 Thread Piotr Nowojski
Perhaps that can help you get started. >>Regards, >>David >>[1] >> >> https://ci.apache.org/projects/flink/flink-docs-master/tutorials/event_driven.html#example >> >> <https://ci.apache.org/projects/flink/flink-docs-master/tutoria

Re: "Fill in" notification messages based on event time watermark

2020-04-27 Thread Piotr Nowojski
Hi, I’m not sure, but I don’t think there is an existing window that would do exactly what you want. I would suggest to go back to the `keyedProcessFunction` (or a custom operator?), and have a MapState currentStates field. Your key would be for example a timestamp of the beginning of your

Re: Task Assignment

2020-04-27 Thread Piotr Nowojski
Hi Navneeth, But what’s the problem with using `keyBy(…)`? If you have a set of keys that you want to process together, in other words they are basically equal from the `keyBy(…)` perspective, why can’t you use this in your `KeySelector`? Maybe to make it clear, you can think about this in

[ANNOUNCE] Development progress of Apache Flink 1.11

2020-04-24 Thread Piotr Nowojski
and we have highlighted features that are already done and also the features that are no longer aimed for Flink 1.11 release and will be most likely postponed to a later date. Your release managers, Zhijiang & Piotr Nowojski Features already done and ready for Flink 1.11 - PyF

Re: KeyedStream and chained forward operators

2020-04-23 Thread Piotr Nowojski
Hi, I’m not sure how we can help you here. To my eye, your code looks ok; what you figured out about pushing the keyBy in front of ContinuousFileReader is also valid and makes sense if you can indeed correctly perform the keyBy based on the input splits. The problem should be somewhere in your

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-04-09 Thread Piotr Nowojski
Hi, With: > Can’t you take into account the pending element that’s stuck somewhere in > transit? Snapshot it as well and during recovery reprocess it? This is > exactly what AsyncWaitOperator is doing. I didn’t mean for you to use AsyncWaitOperator, but what both Arvid and I

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-04-08 Thread Piotr Nowojski
Hi Salva, Can’t you take into account the pending element that’s stuck somewhere in transit? Snapshot it as well and during recovery reprocess it? This is exactly what AsyncWaitOperator is doing. Piotrek > On 5 Apr 2020, at 15:00, Salva Alcántara wrote: > > Hi again Piotr, > > I

Re: Flink in EMR configuration problem

2020-04-01 Thread Piotr Nowojski
un > on slaves so I need 3 instances instead of 2 as I guessed. > > Regards > > On Wed, Apr 1, 2020 at 1:31 PM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Hey, > > Isn’t explanation of the problem in the logs that you posted? Not enough > memory? Yo

Re: some subtask taking too long

2020-04-01 Thread Piotr Nowojski
Hey, The thread you are referring to is about DataStream API job and long checkpointing issue. While from your message it seems like you are using Table API (SQL) to process a batch data? Or what exactly do you mean by: > i notice that there are one or two subtasks that take too long to

Re: Flink Kafka Consumer Throughput reduces over time

2020-04-01 Thread Piotr Nowojski
Hi, I haven’t heard about a Flink-specific problem like that. Have you checked that the records are not changing over time? That they are not for example twice as large or twice as heavy to process? Especially since you are using a rate limiter with 12MB/s. If your records grew to 60KB in size,
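The arithmetic behind this point can be sketched as follows. With a fixed byte-rate limit, the number of records per second is inversely proportional to the record size. The 12MB/s limit and the 60KB record size come from the message; the 30KB baseline is a hypothetical value chosen only to illustrate the "twice as large" case, and 1MB is assumed to be 2^20 bytes.

```java
// Hypothetical helper: how many whole records per second fit under a byte-rate limit.
final class RateLimitSketch {
    static long recordsPerSecond(long bytesPerSecond, long recordSizeBytes) {
        if (recordSizeBytes <= 0) {
            throw new IllegalArgumentException("record size must be positive");
        }
        return bytesPerSecond / recordSizeBytes; // whole records only
    }

    public static void main(String[] args) {
        long limit = 12L * 1024 * 1024;   // 12MB/s, from the message
        long baseline = 30L * 1024;       // 30KB, assumed baseline for illustration
        long grown = 60L * 1024;          // 60KB, from the message

        System.out.println("30KB records/s: " + recordsPerSecond(limit, baseline));
        System.out.println("60KB records/s: " + recordsPerSecond(limit, grown));
    }
}
```

Under these assumptions the record rate halves when the record size doubles, even though the byte throughput is unchanged, which would look like a throughput drop if you only watch records per second.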

Re: Flink in EMR configuration problem

2020-04-01 Thread Piotr Nowojski
Hey, Isn’t the explanation of the problem in the logs that you posted? Not enough memory? You have 2 EMR nodes, with 8GB of memory each, while trying to allocate 2 TaskManagers AND 1 JobManager with 6GB heap size each? Piotrek > On 31 Mar 2020, at 17:01, Antonio Martínez Carratalá > wrote: > > Hi, I'm
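The mismatch can be checked with simple arithmetic. The sketch below uses the numbers from the message (8GB nodes, 6GB heap per process, 3 processes); the class and method names are hypothetical, and the calculation deliberately ignores off-heap memory, OS overhead, and YARN container limits, all of which would only make the fit tighter.

```java
// Hypothetical helper: how many nodes are needed if each process needs its
// full heap on a single node and heap is the only memory consumer.
final class EmrMemorySketch {
    static int processesPerNode(int nodeMemoryGb, int heapPerProcessGb) {
        return nodeMemoryGb / heapPerProcessGb;
    }

    static int nodesNeeded(int processes, int nodeMemoryGb, int heapPerProcessGb) {
        int perNode = processesPerNode(nodeMemoryGb, heapPerProcessGb);
        if (perNode == 0) {
            throw new IllegalArgumentException("a single process does not fit on a node");
        }
        return (processes + perNode - 1) / perNode; // ceiling division
    }

    public static void main(String[] args) {
        int processes = 2 + 1; // 2 TaskManagers + 1 JobManager
        System.out.println("processes per 8GB node: " + processesPerNode(8, 6));
        System.out.println("nodes needed: " + nodesNeeded(processes, 8, 6));
    }
}
```

Only one 6GB process fits on an 8GB node, so three such processes need three nodes rather than two.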

Re: Communication between two queries

2020-03-17 Thread Piotr Nowojski
Oops, sorry, there was a misleading typo/autocorrection in my previous e-mail. The second sentence should have been: > First of all you would have to use event time semantics for consistent results Piotrek > On 17 Mar 2020, at 14:43, Piotr Nowojski wrote: > > Hi, > > Ye

Re: Communication between two queries

2020-03-17 Thread Piotr Nowojski
ake sure that both queries has > processed the same amount of data before writing to the sink, but I'm a bit > unsure on how to do it. > Do you have any suggestions or thoughts? > Cheers, > > Den mån 16 mars 2020 kl 08:43 skrev Piotr Nowojski <mailto:pi...@ververica.com

Re: Implicit Flink Context Documentation

2020-03-16 Thread Piotr Nowojski
eyContextElement1` > is what I'm looking for though. It would be cool if there were some > internals design doc however? Quite hard to dig through the code as there is > a log tied to how the execution of the job actually happens. > > Padarn > > On Fri, Mar 13, 2020 at 9:43

Re: Communication between two queries

2020-03-16 Thread Piotr Nowojski
Hi, Let us know if something doesn’t work :) Piotrek > On 16 Mar 2020, at 08:42, Mikael Gordani wrote: > > Hi, > I'll try it out =) > > Cheers! > > Den mån 16 mars 2020 kl 08:32 skrev Piotr Nowojski <mailto:pi...@ververica.com>>: > Hi, > > In

Re: Expected behaviour when changing operator parallelism but starting from an incremental checkpoint

2020-03-16 Thread Piotr Nowojski
avepoints > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-47%3A+Checkpoints+vs.+Savepoints> > > Best, > > Aaron Levin > > On Fri, Mar 13, 2020 at 9:08 AM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Hi, > > Generally spe

Re: Communication between two queries

2020-03-16 Thread Piotr Nowojski
Hi, In that case you could try to implement your `FilterFunction` as a two-input operator with a broadcast control input that would set the `global_var`. The broadcast control input can originate from some source or from some operator. Piotrek > On 13 Mar 2020, at 15:47, Mikael

Re: Implicit Flink Context Documentation

2020-03-13 Thread Piotr Nowojski
Hi, Please take a look for example here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#keyed-state And the example in particular

Re: Cancel the flink task and restore from checkpoint ,can I change the flink operator's parallelism

2020-03-13 Thread Piotr Nowojski
Hi, Yes, you can change the parallelism. One thing that you can not change is “max parallelism”. Piotrek > On 13 Mar 2020, at 04:34, Sivaprasanna wrote: > > I think you can modify the operator’s parallelism. It is only if you have set > maxParallelism, and while restoring from a checkpoint,

Re: Communication between two queries

2020-03-13 Thread Piotr Nowojski
Hi, Could you explain a bit more what you are trying to achieve? One problem that pops into my head is that currently in Flink Streaming (it is possible for processing bounded data), there is no way to “not ingest” the data reliably in the general case, as this might deadlock the upstream

Re: Expected behaviour when changing operator parallelism but starting from an incremental checkpoint

2020-03-13 Thread Piotr Nowojski
Hi, Generally speaking, changes of parallelism are supported with both checkpoints and savepoints. Other changes to the job’s topology, like adding/changing/removing operators or changing types in the job graph, are only officially supported via savepoints. But in reality, as of now, there is no

Re: [DISCUSS] FLIP-115: Filesystem connector in Table

2020-03-13 Thread Piotr Nowojski
Hi, Which actual sinks/sources are you planning to use in this feature? Is it about exposing StreamingFileSink in the Table API? Or do you want to implement new Sinks/Sources? Piotrek > On 13 Mar 2020, at 10:04, jinhai wang wrote: > > Thanks for FLIP-115. It is really useful feature for

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-03-02 Thread Piotr Nowojski
offset in Kafka). > > Can this be achieved with a cancel or stop of the Flink pipeline? > > Best, > Tobias > > On Mon, Mar 2, 2020 at 11:09 AM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Hi Tobi, > > No, FlinkKafkaConsumer is not usi

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-03-02 Thread Piotr Nowojski
> sink. Is that correct? > > Best, > Tobi > > On Fri, Feb 28, 2020 at 3:17 PM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Yes, that’s correct. There shouldn’t be any data loss. Stop with savepoint is > a solution to make sure, that if you are stopp

Re: Do I need to use Table API or DataStream API to construct sources?

2020-02-29 Thread Piotr Nowojski
16354 > <https://issues.apache.org/jira/browse/FLINK-16354> > kant kodali mailto:kanth...@gmail.com>> 于2020年3月1日周日 > 上午2:30写道: > Hi, > > Thanks for the pointer. Looks like the documentation says to use > tableEnv.registerTableSink however in my IDE it shows the m

Re: Do I need to use Table API or DataStream API to construct sources?

2020-02-29 Thread Piotr Nowojski
Hi, You shouldn’t be using `KafkaTableSource` as it’s marked @Internal. It’s not part of any public API. You don’t have to convert DataStream into Table to read from Kafka in Table API. I guess you could, if you had used DataStream API’s FlinkKafkaConsumer as it’s documented here [1]. But

Re: Scala string interpolation weird behaviour with Flink Streaming Java tests dependency.

2020-02-29 Thread Piotr Nowojski
Scala versions, I'm using 2.12 in every place. My > Java version is 1.8.0_221. > > Currently it is working, but not sure what happened here. > > Thanks! > > On Fri, Feb 28, 2020 at 10:50 AM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Also, don’t you

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-02-28 Thread Piotr Nowojski
ng somewhere) and I click "cancel" and after that I > restart the pipeline - I should not expect any data to be lost - is that > correct? > > Best, > Tobias > > On Fri, Feb 28, 2020 at 2:51 PM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > T

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-02-28 Thread Piotr Nowojski
emoved in flink 1.9.0 since it could not work > properly as I know > 2. if we have the feature of the stop with savepoint, we could add it to the > web UI, but it may still need some work on the rest API to support the new > feature > > > Best, > Yadong > > >

Re: Batch reading from Cassandra

2020-02-28 Thread Piotr Nowojski
Hi, I’m afraid that we don’t have any native support for reading from Cassandra at the moment. The only things that I could find, are streaming sinks [1][2]. Piotrek [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/cassandra.html

Re: Flink remote batch execution in dynamic cluster

2020-02-28 Thread Piotr Nowojski
Hi, I guess it depends on what you already have available in your cluster; try to use that. Running Flink in an existing Yarn cluster is very easy, but setting up a Yarn cluster in the first place, even if it’s easy (I’m not sure if that’s the case), would add extra complexity. When I’m

Re: Flink 1.10 - Hadoop libraries integration with plugins and class loading

2020-02-28 Thread Piotr Nowojski
Hi, > Since we have "flink-s3-fs-hadoop" at the plugins folder and therefore being > dynamically loaded upon task/job manager(s) startup (also, we are keeping > Flink's default inverted class loading strategy), shouldn't Hadoop > dependencies be loaded by parent-first? (based on >

Re: Scala string interpolation weird behaviour with Flink Streaming Java tests dependency.

2020-02-28 Thread Piotr Nowojski
Also, don’t you have a typo in your pattern? In your pattern you are using `$accountId`, while the variable is `account_id`? (Maybe I don’t understand it as I don’t know Scala very well). Piotrek > On 28 Feb 2020, at 11:45, Piotr Nowojski wrote: > > Hey, > > What Java version

Re: Scala string interpolation weird behaviour with Flink Streaming Java tests dependency.

2020-02-28 Thread Piotr Nowojski
Hey, What Java versions are you using? Also, could you check whether you are mixing Scala versions somewhere? There are two different Flink binaries for Scala 2.11 and Scala 2.12. I guess if you mix them, or if you use an incorrect Scala runtime not matching the supported version of the

Re: Flink job fails with org.apache.kafka.common.KafkaException: Failed to construct kafka consumer

2020-02-20 Thread Piotr Nowojski
me maximum parallelism would be 18. > > I will try increase the ulimit and hopefully, we wont see it... > > On Thu, 20 Feb 2020 at 04:56, Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > But it could be Kafka’s client issue on the Flink side (as the stack trace is > sugg

Re: Flink job fails with org.apache.kafka.common.KafkaException: Failed to construct kafka consumer

2020-02-20 Thread Piotr Nowojski
nk task node where the task was > running. The brokers looked fine at the time. > I have about a dozen topics which all are single partition except one which > is 18. So I really doubt the broker machines ran out of files. > > On Mon, 17 Feb 2020 at 05:21, Piotr Nowojski <mail

Re: Flink job fails with org.apache.kafka.common.KafkaException: Failed to construct kafka consumer

2020-02-17 Thread Piotr Nowojski
Hey, sorry, but I know very little about the KafkaConsumer. I hope that someone else might know more. However, did you try to google this issue? It doesn’t sound like a Flink-specific problem, but like a general Kafka issue. Also a solution might be just as simple as bumping the limit of open

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-02-13 Thread Piotr Nowojski
You must switch to the Operator API to access the checkpointing lock. That was by design - the Operator API is not stable (@PublicEvolving) - that’s why we were able to deprecate and remove `checkpointingLock` in Flink 1.10/1.11. Piotrek > On 13 Feb 2020, at 14:54, Salva Alcántara wrote: > >

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-02-13 Thread Piotr Nowojski
Glad that we could help :) Yes, Arvid’s response is spot on. Piotrek > On 13 Feb 2020, at 14:17, Salva Alcántara wrote: > > Many thanks for your detailed response Piotr, it helped a lot! > > BTW, I got similar comments from Arvid Heise here: >

Re: Encountered error while consuming partitions

2020-02-13 Thread Piotr Nowojski
Hi 刘建刚, Could you explain how you fixed the problem in your case? Did you modify Flink code to use `IdleStateHandler`? Piotrek > On 13 Feb 2020, at 11:10, 刘建刚 wrote: > > Thanks for all the help. Following the advice, I have fixed the problem. > >> 2020年2月13日 下午6:05,Zhijiang >

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-02-13 Thread Piotr Nowojski
Hi, As Kostas has pointed out, the operator's and UDF’s APIs are not thread safe and Flink always calls them from the same, single Task thread. This also includes checkpointing state. Also as Kostas pointed out, the easiest way would be to try to use AsyncWaitOperator. If that’s not

Re: SSL configuration - default behaviour

2020-02-13 Thread Piotr Nowojski
Hi Krzysztof, Thanks for the suggestion. It was kind of implied in the first sentence on the page already, but I’m fixing it [1] to make it more clear. Piotrek [1] https://github.com/apache/flink/pull/11083 > On 11 Feb 2020, at 08:22, Krzysztof

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-02-03 Thread Piotr Nowojski
oblem. > > I think the DeleteOnExit problem will mean it needs to be restarted every few > weeks, but that's acceptable for now. > > Thanks again, > > Mark > From: Mark Harris > Sent: 30 January 2020 14:36 > To: Piotr Nowojski > Cc: Cliff Resnick ; Dav

Re: ActiveMQ connector

2020-01-30 Thread Piotr Nowojski
https://bahir.apache.org/docs/flink/current/flink-streaming-activemq/> > [3] > https://github.com/apache/bahir-flink/blob/master/flink-connector-activemq/src/main/java/org/apache/flink/streaming/connectors/activemq/AMQSource.java > > <https://github.com/apache/bahir-flink/blob/maste

Re: ActiveMQ connector

2020-01-30 Thread Piotr Nowojski
Hi, Regarding your last question, sorry I don’t know about ActiveMQ connectors. I’m not sure exactly how you are implementing your SourceFunction. Generally speaking `run()` method is executed in one thread, and other operations like checkpointing, timers (if any) are executed from another

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-30 Thread Piotr Nowojski
uld this be a factor? > > Best regards, > > Mark > From: Piotr Nowojski > Sent: 27 January 2020 16:16 > To: Cliff Resnick > Cc: David Magalhães ; Mark Harris > ; Till Rohrmann ; > flink-u...@apache.org ; kkloudas > Subject: Re: GC overhead limit exceeded, memory ful

Re: Flink and Presto integration

2020-01-28 Thread Piotr Nowojski
Hi, Yes, Presto (in presto-hive connector) is just using hive Metastore to get the table definitions/meta data. If you connect to the same hive Metastore with Flink, both systems should be able to see the same tables. Piotrek > On 28 Jan 2020, at 04:34, Jingsong Li wrote: > > Hi Flavio, >

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-27 Thread Piotr Nowojski
load.buffer=true > > To try and avoid writing the buffer files, but the taskmanager breaks with > the same problem. > > Best regards, > > Mark > From: Piotr Nowojski <mailto:pi...@data-artisans.com>> on behalf of Piotr Nowojski > mailto:pi...@ververica

Re: java.lang.StackOverflowError

2020-01-23 Thread Piotr Nowojski
Hi, Thanks for reporting the issue. Could you first try to upgrade to Flink 1.6.4? This might be a known issue fixed in a later bug fix release [1]. Also, are you sure you are using (unmodified) Flink 1.6.2? Stack traces somehow do not match with the 1.6.2 release tag in the repository, for

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-22 Thread Piotr Nowojski
Hi, This is probably a known issue of Hadoop [1]. Unfortunately it was only fixed in 3.3.0. Piotrek [1] https://issues.apache.org/jira/browse/HADOOP-15658 > On 22 Jan 2020, at 13:56, Till Rohrmann wrote: > > Thanks for reporting this

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-17 Thread Piotr Nowojski
+1 for making it consistent. When using X state backend, timers should be stored in X by default. Also I think any configuration option controlling that needs to be well documented in some performance tuning section of the docs. Piotrek > On 17 Jan 2020, at 09:16, Congxian Qiu wrote: > > +1

Re: UnsupportedOperationException from org.apache.flink.shaded.asm6.org.objectweb.asm.ClassVisitor.visitNestHostExperimental using Java 11

2020-01-13 Thread Piotr Nowojski
Hi, Yes, this is work in progress [1]. It looks like the Java 11 support is targeted for Flink 1.10 which should be released this or the following month. Piotrek [1] https://issues.apache.org/jira/browse/FLINK-10725 > On 9 Jan 2020, at

Re: Data overflow in SpillingResettableMutableObjectIterator

2020-01-13 Thread Piotr Nowojski
Hi Jian, Thank you for reporting the issue. I see that you have already created a ticket for this [1]. Piotrek [1] https://issues.apache.org/jira/browse/FLINK-15549 > On 9 Jan 2020, at 09:10, Jian Cao wrote: > > Hi all: > We are using

Re: Checkpoints issue and job failing

2020-01-06 Thread Piotr Nowojski
> I want to ask, does it happen by accident or frequently? > > Another concern is that since the 1.4 version is very far away, all > maintenance and response are not as timely as the recent versions. I > personally recommend upgrading as soon as possible. > > I can ping @Pio

Re: [Question] How to use different filesystem between checkpointdata and user data sink

2019-12-19 Thread Piotr Nowojski
Hi, Can you share the full stack trace or just attach job manager and task managers logs? This exception should have had some cause logged below. Piotrek > On 19 Dec 2019, at 04:06, ouywl wrote: > > Hi Piotr Nowojski, >I have move my filesystem plugin to FLINK_HOME/pulgins in

Re: [Question] How to use different filesystem between checkpoint data and user data sink

2019-12-18 Thread Piotr Nowojski
Hi, As Yang Wang pointed out, you should use the new plugins mechanism. If it doesn’t work, first make sure that you are shipping/distributing the plugins jars correctly - the correct plugins directory structure both on the client machine. Next make sure that the cluster has the same correct

Re: [ANNOUNCE] Zhu Zhu becomes a Flink committer

2019-12-13 Thread Piotr Nowojski
Congratulations! :) > On 13 Dec 2019, at 18:05, Fabian Hueske wrote: > > Congrats Zhu Zhu and welcome on board! > > Best, Fabian > > Am Fr., 13. Dez. 2019 um 17:51 Uhr schrieb Till Rohrmann < > trohrm...@apache.org>: > >> Hi everyone, >> >> I'm very happy to announce that Zhu Zhu accepted

Re: How does Flink handle backpressure in EMR

2019-12-05 Thread Piotr Nowojski
> > For the second article, I understand I can monitor the backpressure status > via the Flink Web UI. Can I refer to the same metrics in my Flink jobs > itself? For example, can I put in an if statement to check for when > outPoolUsage reaches 100%? > > Thank you, > Mi

Re: How does Flink handle backpressure in EMR

2019-12-05 Thread Piotr Nowojski
<khachatryan.ro...@gmail.com> > Date: Thursday, December 5, 2019 at 9:47 AM > To: Michael Nguyen <michael.nguye...@t-mobile.com> > Cc: Piotr Nowojski <pi...@ververica.com>, > "user@flink.apache.org <mailto:user@flink.apach

Re: How does Flink handle backpressure in EMR

2019-12-05 Thread Piotr Nowojski
Hi Michael, As Roman pointed out, Flink currently doesn’t support auto-scaling. It’s on our roadmap, but it requires quite a bit of preliminary work first. Piotrek > On 5 Dec 2019, at 15:32, r_khachatryan wrote: > > Hi Michael > > Flink *does* detect backpressure but currently,

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-12-03 Thread Piotr Nowojski
Hi, Yes, it is only related to **batch** jobs, but not necessarily only to DataSet API jobs. If you are using, for example, the Blink SQL/Table API to process some bounded data streams (tables), it could also be affected there. If not, I would suggest starting a new user mailing list

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-28 Thread Piotr Nowojski
est, > Tony Wei > > [1] https://issues.apache.org/jira/browse/FLINK-10377 > Piotr Nowojski <pi...@ververica.com> wrote on Thu, 28 Nov 2019 at 12:17 AM: > Hi Tony, > > Thanks for the explanation. Assuming that’s

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Piotr Nowojski
this exception. Could you double check if > this is the case? Thank you. > > Best, > Tony Wei > > Piotr Nowojski <pi...@ververica.com> wrote on Wed, 27 Nov 2019 at 8:50 PM: > Hi, > > Chesnay, maybe you are right, but I’m not sure. TwoPhaseCommitSink was based > on Prav

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Piotr Nowojski
Hi, Chesnay, maybe you are right, but I’m not sure. TwoPhaseCommitSink was based on Pravega’s sink for Flink, which was implemented by Stephan, and it has the same logic [1]. If I remember the discussions with Stephan/Till correctly, the way Flink uses Akka probably guarantees that messages will

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-26 Thread Piotr Nowojski
Hi, > @yingjie Do you have any idea how much memory will be stolen from OS when > using mmap for data reading? I think this is bounded only by the size of the written data. Also, it will not be “stolen from OS”, as the kernel controls the number of pages currently residing in RAM

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-26 Thread Piotr Nowojski
Thanks for the confirmation. I’ve created a Jira ticket to track this issue [1] https://issues.apache.org/jira/browse/FLINK-14952 Piotrek > On 26 Nov 2019, at 11:10, yingjie wrote: > > The new BlockingSubpartition implementation in 1.9 uses

Re: How to submit two jobs sequentially and view their outputs in .out file?

2019-11-25 Thread Piotr Nowojski
Hi, I would suggest the same thing as Vino did: it might be possible to use stdout somehow, but it’s a better idea to coordinate in some other way. Have the first job produce some (side?) output with a control message once it finishes, and use that message to control the second job. Piotrek > On 25 Nov 2019, at

Re: Window-join DataStream (or KeyedStream) with a broadcast stream

2019-11-25 Thread Piotr Nowojski
Hi, So you are trying to use the same window definition, but you want to aggregate the data in two different ways: 1. keyBy(userId) 2. Global aggregation Do you want to use exactly the same aggregation functions? If not, you can just process the events twice: DataStream<…> events = …;

Re: Flink distributed runtime architecture

2019-11-25 Thread Piotr Nowojski
Hi, I’m glad to hear that you are interested in Flink! :) > In the picture, keyBy window and apply operators share the same circle. Is > it because these operators are chained together? It’s not as much about chaining, as the chain of DataStream API invocations

Re: Per Operator State Monitoring

2019-11-25 Thread Piotr Nowojski
Hi, I’m not sure if there is a simple way of doing that (maybe some other contributors will know more). There are two potential ideas worth exploring: - use periodically triggered savepoints for monitoring? If I remember correctly, savepoints are never incremental - use savepoint

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-25 Thread Piotr Nowojski
rs which fed into a CoGroup? > > // ah > > From: Zhijiang > Sent: Thursday, November 21, 2019 9:48 PM > To: Hailu, Andreas [Engineering] ; Piotr > Nowojski > Cc: user@flink.apache.org > Subject: Re: CoGroup SortMerger performance degradation from 1.6.4 -
