Re: what's the datasets used in flink sql document?

2020-10-14 Thread Piotr Nowojski
> > I mean: > Is there a dataset that can be downloaded > to satisfy all the examples in the document? > > Thanks for your help > > -- Original message ------ > *From:* "Piotr Nowojski" ; > *Sent:* Wednesday, 14 October 2020, 9:52 PM > *To:* "大森林"

Re: Dynamic file name prefix - StreamingFileSink

2020-10-14 Thread Piotr Nowojski
Great! Please let us know if it solves the issue or not. Best, Piotrek On Wed, 14 Oct 2020 at 17:46, Vijayendra Yadav wrote: > Hi Piotrek, > > That is correct, I was still on 1.10; I am upgrading to 1.11. > > Regards, > Vijay > > On Wed, Oct 14, 2020 at 6:12 AM Piotr No

Re: Upgrade to Flink 1.11 in EMR 5.31 Command line interface

2020-10-14 Thread Piotr Nowojski
I'm glad to hear that :) Best regards, Piotrek On Wed, 14 Oct 2020 at 18:28, Vijayendra Yadav wrote: > Thank you, Piotrek. I moved the *flink-s3-fs-hadoop* library to the plugins directory. Now > it's good. > > > On Wed, Oct 14, 2020 at 6:23 AM Piotr Nowojski > wrote: > >> Hi, >>

Re: No space left on device exception

2020-08-20 Thread Piotr Nowojski
Hi, As far as I know, when uploading a file to S3, the writer first needs to create some temporary files on the local disks. I would suggest double-checking all of the partitions on the local machine and monitoring available disk space continuously while the job is running. If you are just checking

Re: ERROR : RocksDBStateBackend

2020-08-20 Thread Piotr Nowojski
Hi, It looks more like a dependency convergence issue - you have two conflicting versions of `org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest` on the class path. Or you built your jar with one version and are trying to execute it with a different one. Till, is it some kind of a known

Re: Decompose failure recovery time

2020-08-20 Thread Piotr Nowojski
all operator instances > from CANCELING to CANCELED is around 30s, do you have any ideas about > this? > > Many thanks. > > Regards, > Zhinan > > On Thu, 20 Aug 2020 at 21:26, Piotr Nowojski wrote: > > > > Hi, > > > > > I want to decompose the recove

Re: How Flink distinguishes between late and in-time events?

2020-08-20 Thread Piotr Nowojski
Hi Ori, No. Flink does it differently. Operators that keep track of late events remember the latest watermark. If a new element arrives with an event time lower than the latest watermark, it is marked as a late event [1] Piotrek [1]
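The rule described in the reply above can be sketched in plain Java. This is only an illustration, not Flink's actual operator code: the class `LateEventCheck` and its methods `onWatermark` and `isLate` are hypothetical names, and the sketch assumes timestamps are plain `long` values.

```java
/** Hypothetical illustration of the late-event rule; not Flink's operator code. */
final class LateEventCheck {
    // Latest watermark this (hypothetical) operator has seen so far.
    private long currentWatermark = Long.MIN_VALUE;

    /** Remember the newest watermark; the stored value never moves backwards. */
    void onWatermark(long watermarkTimestamp) {
        currentWatermark = Math.max(currentWatermark, watermarkTimestamp);
    }

    /** An element counts as late if its event time is lower than the latest watermark. */
    boolean isLate(long eventTimestamp) {
        return eventTimestamp < currentWatermark;
    }

    public static void main(String[] args) {
        LateEventCheck check = new LateEventCheck();
        check.onWatermark(100L);
        // Event time 99 is below the remembered watermark 100, event time 100 is not.
        System.out.println(check.isLate(99L));
        System.out.println(check.isLate(100L));
    }
}
```

The point of the sketch is that only the last seen watermark needs to be stored; no per-event bookkeeping is required to decide whether an element is late.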

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-20 Thread Piotr Nowojski
Hi Alexey, I might be wrong (I don't know this side of Flink very well), but as far as I know JobGraph is never stored in the ZK. It's always recreated from the job's JAR. So you should be able to upgrade the job by replacing the JAR with a newer version, as long as the operator UIDs are the same

Re: Stream job with Event Timestamp (Backfill getting incorrect results)

2020-08-20 Thread Piotr Nowojski
Hi, It's hard for me to help you debug your code, but as long as: - you are using event time for processing records (in operators like `WindowOperator`) - you do not have late records - you are replaying the same records - your code is deterministic - you do not rely on the order of the records

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-20 Thread Piotr Nowojski
bGraph > (assuming you aren't changing the Zookeeper configuration). > > If you wish to update the job, then you should cancel it (along with > creating a savepoint), which will clear the Zookeeper state, and then > create a new deployment > > On 20/08/2020 15:43, Piotr Nowojski

Re: Status of a job when a kafka source dies

2020-08-14 Thread Piotr Nowojski
> > [1] > https://docs.spring.io/spring-kafka/api/org/springframework/kafka/event/NonResponsiveConsumerEvent.html > > On Wed, Aug 5, 2020 at 1:30 PM Piotr Nowojski > wrote: > >> Hi Nick, >> >> Could you elaborate more, what event and how would you lik

Re: Decompose failure recovery time

2020-08-20 Thread Piotr Nowojski
Hi, > I want to decompose the recovery time into different parts, say > (1) the time to detect the failure, > (2) the time to restart the job, > (3) and the time to restore the checkpointing. 1. Maybe I'm missing something, but as far as I can tell, Flink cannot help you with that. Time to

Re: I have a job with multiple Kafka sources. They all contain certain historical data.

2020-09-27 Thread Piotr Nowojski
d control streaming according to your > first suggestion. > > On Wed, 16 Sep 2020 at 11:56 PM, Piotr Nowojski wrote: > >> Hey, >> >> If you are worried about increased amount of buffered data by the >> WindowOperator if watermarks/event time is not progressing uniformly across >>

Re: Dynamic Kafka Source

2020-09-28 Thread Piotr Nowojski
Hi Prasanna, As Theo has suggested on Stack Overflow, can you use multiple independent jobs instead? Piotrek On Sat, 26 Sep 2020 at 19:17, Prasanna kumar wrote: > Hi, > > My requirement has been captured by the following stack overflow question. > > >

Re: Flink performance testing

2020-09-17 Thread Piotr Nowojski
salunkhe wrote: > I would like to do performance testing for my Flink job, especially related > to volume: how does my Flink job perform if more streaming data is coming to my > source connectors, and how can I measure benchmarks for various operators? > > On Wed, 16 Sep 2020 at 12:03, Piotr

Re: Flink Batch Processing

2020-09-28 Thread Piotr Nowojski
Hi Sunitha, First and foremost, the DataSet API will be deprecated soon [1] so I would suggest trying to migrate to the DataStream API. Using the DataStream API doesn't mean that you cannot work with bounded inputs - you can. Flink SQL (Blink planner) is in fact using DataStream API to

Re: Scala: Static methods in interface require -target:jvm-1.8

2020-09-28 Thread Piotr Nowojski
Hi, It sounds more like an IntelliJ issue, not a Flink issue. But have you checked your configured target language level for your modules? Best regards, Piotrek On Mon, 28 Sep 2020 at 10:57, Lu Weizheng wrote: > Hi all, > > I recently upgraded IntelliJ IDEA from 2019 to 2020.2 Community

Re: Singal task backpressure problem with Credit-based Flow Control

2020-05-27 Thread Piotr Nowojski
e of back pressure. > > Best > Weihua Hu > >> On 26 May 2020, at 02:54, Piotr Nowojski > <mailto:pi...@ververica.com>> wrote: >> >> Hi Weihua, >> >> > After dumping the memory and analyzing it, I found: >

Re: Multiple Sinks for a Single Soure

2020-05-26 Thread Piotr Nowojski
a generic job for the same , rather than writing > one for new case. > > Let me know your thoughts and flink suitability to this requirement. > > Thanks > Prasanna. > > > On Tue, May 26, 2020 at 3:34 PM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: >

Re: Collecting operators real output cardinalities as json files

2020-05-27 Thread Piotr Nowojski
about the exact netRunTime of each job maybe using the REST APIs > to get the other information will be more reliable? > > Thank you. Best, > > Francesco > >> On 25 May 2020, at 19:54, Piotr Nowojski > <mailto:pi...@ververica.com>> wrote: >>

Re: Multiple Sinks for a Single Soure

2020-05-25 Thread Piotr Nowojski
Hi, To the best of my knowledge the following pattern should work just fine: DataStream myStream = env.addSource(…).foo().bar() // for custom source, but any ; myStream.addSink(sink1); myStream.addSink(sink2); myStream.addSink(sink3); All of the records from `myStream` would be passed to each

Re: close file on job crash

2020-05-25 Thread Piotr Nowojski
Hi, The cancel method is invoked only when the SourceTask is cancelled from the outside, by the JobManager - for example after detecting a failure of a different Task. > What is the proper way to handle this issue? Is there some kind of closable > source interface we should implement? Have

Re: Singal task backpressure problem with Credit-based Flow Control

2020-05-25 Thread Piotr Nowojski
Hi Weihua, > After dumping the memory and analyzing it, I found: > Sink (121)'s RemoteInputChannel.unannouncedCredit = 0, > Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0. > This is not consistent with my understanding of the Flink network > transmission mechanism.

Re: java.lang.AbstractMethodError when implementing KafkaSerializationSchema

2020-05-25 Thread Piotr Nowojski
Hi, It would be helpful if you could provide the full stack trace. What Flink version and which Kafka connector version are you using? It sounds like either a dependency convergence error (mixing Kafka dependencies/various versions of flink-connector-kafka inside a single job/jar) or some shading

Re: Collecting operators real output cardinalities as json files

2020-05-25 Thread Piotr Nowojski
Hi Francesco, Have you taken a look at the metrics? [1] And IO metrics [2] in particular? You can use one of the pre-existing metric reporters [3] or implement a custom one. You could export metrics to some 3rd party system, and get JSONs from there, or export them to JSON directly via a

Re: close file on job crash

2020-05-26 Thread Piotr Nowojski
SIGINT will be issued anyway). In this case `#close` also will be called after source’s threads exit. Piotrek > On 25 May 2020, at 21:37, Laurent Exsteens > wrote: > > Thank you, we'll try that. > > On Mon, May 25, 2020, 21:09 Piotr Nowojski <mailto:pi...@verver

Re: Multiple Sinks for a Single Soure

2020-05-26 Thread Piotr Nowojski
code > processing. > > Prasanna. > > > On Mon 25 May, 2020, 23:37 Piotr Nowojski, <mailto:pi...@ververica.com>> wrote: > Hi, > > To the best of my knowledge the following pattern should work just fine: > > DataStream myStream = env.ad

Re: Flink Table SQL, Kafka, partitions and unnecessary shuffling

2020-09-16 Thread Piotr Nowojski
Hi, Have you seen the "Reinterpreting a pre-partitioned data stream as keyed stream" feature? [1] However, I'm not sure if and how it can be integrated with the Table API. Maybe someone more familiar with the Table API can help with that? Piotrek [1]

Re: Flink performance testing

2020-09-16 Thread Piotr Nowojski
Hi, I'm not sure what you are asking for. We do not provide benchmarks for all of the operators. We currently have a couple of micro benchmarks [1] for some of the operators, and we also set up some ad hoc benchmarks when implementing various features. If you want to benchmark something

Re: I have a job with multiple Kafka sources. They all contain certain historical data.

2020-09-16 Thread Piotr Nowojski
Hey, If you are worried about an increased amount of data buffered by the WindowOperator when watermarks/event time are not progressing uniformly across multiple sources, then there is little you can do currently. FLIP-27 [1] will allow us to address this problem in a more generic way. What you can

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-16 Thread Piotr Nowojski
Hi, Could it be related to https://issues.apache.org/jira/browse/FLINK-18223 ? Also, maybe as a workaround, does it work if you enable object reuse (`StreamExecutionEnvironment#getConfig()#enableObjectReuse()`)? Best regards Piotrek On Wed, 16 Sep 2020 at 08:09, Lian Jiang wrote: > Hi, > >

Re: Flink 1.8.3 GC issues

2020-10-23 Thread Piotr Nowojski
Hi Josson, Thanks for the great investigation and for coming back to us. Aljoscha, could you help us here? It looks like you were involved in this original BEAM-3087 issue. Best, Piotrek On Fri, 23 Oct 2020 at 07:36, Josson Paul wrote: > @Piotr Nowojski @Nico Kruber > > An update. >

Re: Updating kafka connector with state

2020-08-12 Thread Piotr Nowojski
Hi Nikola, Which Flink version are you using? Can you describe step by step what you are doing? This error that you have should have been fixed in Flink 1.9.0+ [1], so if you are using an older version of Flink, please first upgrade Flink - without upgrading the job, then upgrade the connector.

Re: Updating kafka connector with state

2020-08-12 Thread Piotr Nowojski
> , > Nikola Hrusov > > On Wed, Aug 12, 2020 at 11:49 AM Piotr Nowojski > wrote: > >> Hi Nikola, >> >> Which Flink version are you using? Can you describe step by step what you >> are doing? >> >> This error that you have should have been fixed i

Re: Kafka transaction error lead to data loss under end to end exact-once

2020-08-05 Thread Piotr Nowojski
Hi Lu, In this case, judging from the quite fragmented log/error message that you posted, the job has failed, so Flink indeed detected some issue, and that probably means a data loss in Kafka (in such a case you could probably recover some lost records by reading with `read_uncommitted` mode from

Re: Status of a job when a kafka source dies

2020-08-05 Thread Piotr Nowojski
Hi Nick, What Aljoscha was trying to say is that Flink is not trying to do any magic. If `KafkaConsumer` - which is being used under the hood of the `FlinkKafkaConsumer` connector - throws an exception, this exception bubbles up, causing the job to fail over. If the failure is handled by the

Re: Status of a job when a kafka source dies

2020-08-05 Thread Piotr Nowojski
Bendtner wrote: > >> Thanks Piotr but shouldn't this event be handled by the >> FlinkKafkaConsumer since the poll happens inside the FlinkKafkaConsumer. >> How can I catch this event in my code since I don't have control over the >> poll. >> >> Best, >

Re: Flink ML

2020-06-17 Thread Piotr Nowojski
Hi, It looks like FLIP-39 is only partially implemented as of now [1], so I’m not sure which features are already done. I’m including Shaoxuan Wang in this thread, maybe he will be able to better answer your question. Piotrek [1] https://issues.apache.org/jira/browse/FLINK-12470

Re: Multiple Sinks for a Single Soure

2020-06-03 Thread Piotr Nowojski
} > }, > { > "customername": "c2", > "method": "S3", > "methodparams": { > "folder": "aws://folderC2" >

Re: Multiple Sinks for a Single Soure

2020-06-04 Thread Piotr Nowojski
help from both of you. Am able to add the sinks based > on the JSON and create DAG. > > Thanks, > Prasanna. > > On Wed, Jun 3, 2020 at 4:51 PM Piotr Nowojski <mailto:pi...@ververica.com>> wrote: > Hi Prasanna, > > 1. > > > The object probably cont

Re: Is there a way to use stream API with this program?

2020-07-28 Thread Piotr Nowojski
:08 PM David Anderson > wrote: > >> In this use case, couldn't the custom trigger register an event time >> timer for MAX_WATERMARK, which would be triggered when the bounded input >> reaches its end? >> >> David >> >> On Mon, Jul 20, 2020 at 5:4

Re: Flink Jobs are failing for running testcases when trying to build in Jenkins server

2020-07-20 Thread Piotr Nowojski
Hi, It looks like the Flink mini (test) cluster has trouble starting up on your Jenkins machine. Frankly, it's hard for me to guess what could be the issue here. 1. Are you following this guideline? [1] 2. Maybe there are some other error messages somewhere else in the logs/stderr/stdout? This 100s

Re: Is there a way to use stream API with this program?

2020-07-20 Thread Piotr Nowojski
Hi, I'm afraid that there is no out-of-the-box way of doing this. I've created a ticket [1] to write down and document a discussion that we had about this issue in the past. The issue is that currently, untriggered processing time timers are ignored on end of input and it seems like there might

Re: Beam flink runner job not keeping up with input rate after downscaling

2020-07-20 Thread Piotr Nowojski
Hi, maxParallelism = -1, the default value, is interpreted as described in the documentation you linked: > The default setting for the maximum parallelism is roughly operatorParallelism + (operatorParallelism / 2) with a lower bound of 128 and an upper bound of 32768. So maxParallelism should
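The quoted documentation sentence can be turned into a small calculation. The sketch below is plain Java and only models the wording quoted in the reply; the class `DefaultMaxParallelism` and the method `estimate` are hypothetical names. The documentation says "roughly", so Flink's real computation may additionally round the value, which this sketch does not model.

```java
/** Hypothetical sketch of the quoted default rule; not Flink's own implementation. */
final class DefaultMaxParallelism {
    // Bounds taken from the documentation sentence quoted in the email.
    static final int LOWER_BOUND = 128;
    static final int UPPER_BOUND = 32768;

    /** operatorParallelism + (operatorParallelism / 2), clamped to [128, 32768]. */
    static int estimate(int operatorParallelism) {
        int candidate = operatorParallelism + (operatorParallelism / 2);
        return Math.min(Math.max(candidate, LOWER_BOUND), UPPER_BOUND);
    }

    public static void main(String[] args) {
        // Small parallelism values are lifted to the lower bound.
        System.out.println(estimate(4));
        // Very large values are capped at the upper bound.
        System.out.println(estimate(30000));
    }
}
```

For small jobs the lower bound dominates, so under this reading any operator parallelism up to 85 yields the same estimate of 128.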

Re: Global Hashmap & global static variable.

2020-07-20 Thread Piotr Nowojski
Hi Annemarie, You are missing some basic concepts in Flink, please take a look at [1]. > Weirdly enough it worked fine in my IntelliJ. It's completely normal. If you are accessing some static variable in your code and you are executing your Flink application in a testing local environment

Re: Flink kafka exceptions handling

2021-01-07 Thread Piotr Nowojski
ultez go/secu. > Be cautious before opening attachments or clicking on any links. If in > doubt, use "Suspicious email" button or visit go/secu. > > > > > > > > -- Forwarded message - > From: *Piotr Nowojski* > Date: Wed, 6 Jan 2021 at 17:26 >

Re: [External Sender] Re: ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue

2020-12-09 Thread Piotr Nowojski
> capacity for it or if ~1 GB is appropriate. > > > > taskmanager.memory.task.off-heap.size: 1536m > > taskmanager.memory.managed.size: 3g > > taskmanager.memory.task.heap.size: 6g > > taskmanager.memory.jvm-metaspace.size: 1536m >

Re: ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue

2020-12-08 Thread Piotr Nowojski
Hi Kye, Almost certainly this error is not the primary cause of the failure. This error means that the node reporting it has detected some fatal failure on the other side of the wire (connection reset by peer), but the original error is somehow too slow or unable to propagate to the JobManager

Re: [External Sender] Re: ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue

2020-12-08 Thread Piotr Nowojski
r with the jobmanager? > > -K > > On Tue, Dec 8, 2020 at 3:19 AM Piotr Nowojski > wrote: > >> Hi Kye, >> >> Almost for sure this error is not the primary cause of the failure. This >> error means that the node reporting it, has detected some fatal failure on

Re: Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

2020-12-11 Thread Piotr Nowojski
Hi, It's hard for me to guess what could be the problem. The same error was reported a couple of months ago [1], but there is frankly no extra information there. Can we start by looking at the full TaskManager and JobManager logs? Could you share them with us? Best, Piotrek [1]

Re: stream to table, gable to stream overhead

2020-12-11 Thread Piotr Nowojski
Hi Eric, We have never measured it. Probably the most important overhead (probably the only significant thing) is the type conversion. Especially if object reuse is disabled, this means a serialization step. The best thing for you would be to just try it out in your particular use case. Best,

Re: Using key.fields in 1.12

2021-01-06 Thread Piotr Nowojski
Hey, have you added the Kafka connector as a dependency? [1] [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/kafka.html#dependencies Best, Piotrek On Wed, 6 Jan 2021 at 04:37, Aeden Jameson wrote: > I've upgraded from 1.11.1 to 1.12 in hopes of using the

Re: reason for endless backpressure

2021-01-06 Thread Piotr Nowojski
Hi, If you have an unstable network, which is dropping packets in a weird way (data is lost, but the connection is still kept alive from the perspective of the underlying operating system), it could happen that a task will be perpetually blocked. But this is extremely rare. I would first suggest

Re: Roadmap for Execution Mode (Batch/Streaming) and interaction with Table/SQL APIs

2021-01-06 Thread Piotr Nowojski
Hi, 1. I think those changes will mostly bring new features/functionalities to the existing Streaming APIs in order to fully support batch executions. For example one way or another to better handle "bounded data streams" in the DataStream API. 2. I think there is and there is not going to be one

Re: Restoring from a checkpoint or savepoint on a different Kafka consumer group

2021-01-19 Thread Piotr Nowojski
from a checkpoint or savepoints offsets if there are > some (unless checkpointing offsets is turned off). > > Is this interpretation correct? > > Thanks! > > > On Mon, Jan 18, 2021 at 3:23 AM Piotr Nowojski > wrote: > >> Hi Rex, >> >> I believe this secti

Re: What is checkpoint start delay?

2021-01-19 Thread Piotr Nowojski
Hey Rex, What do you mean by "Start Delay" when recovering from a checkpoint? Did you mean when taking a checkpoint? If so: 1. https://www.google.com/search?q=flink+checkpoint+start+delay 2. top 3 result (at least for me)

Re: What is checkpoint start delay?

2021-01-19 Thread Piotr Nowojski
nt? Something else must be going on that's > in addition to the normal alignment process. > > On Tue, Jan 19, 2021 at 8:14 AM Piotr Nowojski > wrote: > >> Hey Rex, >> >> What do you mean by "Start Delay" when recovering from a checkpoint? Did >> you me

Re: Why do we need to use synchrnized(checkpointLock) in SourceFunction.run ?

2021-01-18 Thread Piotr Nowojski
Hi Kazunori, The checkpoint lock is acquired preemptively inside the SourceContext#collect call for the case where the source is stateless. However, this is not enough if you are implementing a stateful `SourceFunction`, since if you are modifying state in your source function outside of the
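Why a stateful source needs the lock can be illustrated without any Flink dependency. The sketch below is plain Java with hypothetical names (`CountingSourceSketch`, `emitNext`, `snapshotState`, `emittedSoFar`); the `checkpointLock` field merely stands in for the object a real `SourceFunction` would obtain from `SourceContext#getCheckpointLock()`, and the list stands in for `collect`. It assumes, as the reply describes, that the snapshot is taken while holding the same lock.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a stateful source; it does not use Flink classes. */
final class CountingSourceSketch {
    // Stands in for the lock returned by SourceContext#getCheckpointLock().
    private final Object checkpointLock = new Object();
    // Stands in for records handed to SourceContext#collect.
    private final List<Long> emitted = new ArrayList<>();
    // The source's state: the next value to emit.
    private long nextValue = 0L;

    /** Emit one record and advance the state as a single atomic step. */
    void emitNext() {
        synchronized (checkpointLock) {
            emitted.add(nextValue);
            nextValue++;
        }
    }

    /** Snapshot taken under the same lock, so it never sees "emitted but not advanced". */
    long snapshotState() {
        synchronized (checkpointLock) {
            return nextValue;
        }
    }

    /** Copy of everything emitted so far, also read under the lock. */
    List<Long> emittedSoFar() {
        synchronized (checkpointLock) {
            return new ArrayList<>(emitted);
        }
    }

    public static void main(String[] args) {
        CountingSourceSketch source = new CountingSourceSketch();
        source.emitNext();
        source.emitNext();
        // The snapshot always equals the number of records emitted so far.
        System.out.println(source.snapshotState() == source.emittedSoFar().size());
    }
}
```

If the state update happened outside the synchronized block, a snapshot could land between the emit and the increment, and the restored source would emit the same record again.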

Re: Why do we need to use synchrnized(checkpointLock) in SourceFunction.run ?

2021-01-19 Thread Piotr Nowojski
No problem :) Piotrek On Wed, 20 Jan 2021 at 02:12, Kazunori Shinhira wrote: > Hi, > > > Thank you for your explanation. > > I now understand the need for checkpoint lock :) > > > > Best, > > On Tue, 19 Jan 2021 at 18:00, Piotr Nowojski wrote: > >> Hi, >>

Re: Restoring from a checkpoint or savepoint on a different Kafka consumer group

2021-01-19 Thread Piotr Nowojski
duplicate a > job in order to do some testing out-of-bound from the normal job while > slightly tweaking / tuning things. Is there any way to transfer offsets > between consumer groups? > > On Tue, Jan 19, 2021 at 5:45 AM Piotr Nowojski > wrote: > >> Hi, >> &

Re: Restoring from a savepoint, constraining factors

2021-01-18 Thread Piotr Nowojski
Hi Rex, Good that you have found the source of your problem and thanks for reporting it back. Regarding your question about the recovery steps (ignoring scheduling and deploying), I think it depends on the state backend used. From your other emails I see you are using RocksDB, so I believe this

Re: Monitor the Flink

2021-01-18 Thread Piotr Nowojski
Hi Penguin, Building on top of Yangze's response, you can also take a look at the more detailed system resources usage [1] after adding an optional dependency to the class path/lib directory. Regarding the single task/task slot metrics, as Yangze noted there is "almost" no isolation of the

Re: Restoring from a checkpoint or savepoint on a different Kafka consumer group

2021-01-18 Thread Piotr Nowojski
Hi Rex, I believe this section answers your question [1] Piotrek [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration On Mon, 18 Jan 2021 at 09:00, 赵一旦 wrote: > If you changed the consumer group in your new job,

Re: Restoring from a savepoint, constraining factors

2021-01-19 Thread Piotr Nowojski
ed? Anything else I'm missing? > > Thanks! > > On Mon, Jan 18, 2021 at 2:49 AM Piotr Nowojski > wrote: > >> Hi Rex, >> >> Good that you have found the source of your problem and thanks for >> reporting it back. >> >> Regarding your question about th

Re: Why do we need to use synchrnized(checkpointLock) in SourceFunction.run ?

2021-01-19 Thread Piotr Nowojski
nding correct ? > > > Thank you for the information on the new Source interface. > > I’ll look into how to implement it. > > > > Best, > > On Mon, 18 Jan 2021 at 23:45, Piotr Nowojski wrote: > >> Hi Kazunori, >> >> The checkpoint lock is acquired preemptive

Re: Question about setNestedFileEnumeration()

2021-02-01 Thread Piotr Nowojski
Hi Billy, Could you maybe share some minimal code reproducing the problem? For example, I would suggest starting by reading from local files with some trivial application. Best Piotrek On Fri, 22 Jan 2021 at 00:21, Billy Bain wrote: > I have a Streaming process where new directories are

Re: Comment in source code of CoGroupedStreams

2021-02-01 Thread Piotr Nowojski
Hi Sudharsan, Sorry for the somewhat late response, but as far as I can tell, this comment refers to this piece of code: public void apply(KEY key, W window, Iterable<TaggedUnion<T1, T2>> values, Collector<T> out) throws Exception { List<T1> oneValues = new ArrayList<>();

Re: importing types doesn't fix “could not find implicit value for evidence parameter of type …TypeInformation”

2021-02-01 Thread Piotr Nowojski
Hey Devin, Have you maybe tried looking for an answer via Google? Just by copy-pasting your error message into Google I'm getting hundreds of results pointing towards: import org.apache.flink.api.scala._ Best, Piotrek On Thu, 28 Jan 2021 at 04:13, Devin Bost wrote: > I posted this

Re: Flink and Amazon EMR

2021-02-01 Thread Piotr Nowojski
Hi, Yes, it's working. You would need to analyse what's working slower than expected. Checkpointing times? (Async duration? Sync duration? Start delay/back pressure?) Throughput? Recovery/startup? Are you being rate limited by Amazon? Piotrek On Thu, 28 Jan 2021 at 03:46, Marco Villalobos

Re: importing types doesn't fix “could not find implicit value for evidence parameter of type …TypeInformation”

2021-02-01 Thread Piotr Nowojski
Hey, Sorry for my hasty response. I didn't notice you have the import inside the code block. Have you maybe tried one of the responses suggested on Stack Overflow by other users? Best, Piotrek On Mon, 1 Feb 2021 at 15:49, Piotr Nowojski wrote: > Hey Devin, > > Have you ma

Re: Flink and Amazon EMR

2021-02-01 Thread Piotr Nowojski
kpointing? I would expect > Amazon to have enough resources. When I turn my sink (the next operator) > into a print, it fails during checkpointing as well. > > I will explore what you mentioned though. Thank you. > > On Mon, Feb 1, 2021 at 6:53 AM Piotr Nowojski > wrote: > >>

Re: flink slot communication

2021-01-28 Thread Piotr Nowojski
Hi, Yes, Dawid is correct. Communications between two tasks on the same TaskManager are not going through the network, but via a "local" channel (`LocalInputChannel`). It's still serializing and deserializing the data, but there are no network overheads, and local channels have only half of the

Re: recover from svaepoint

2021-06-07 Thread Piotr Nowojski
>>>> Unexpected error in InitProducerIdResponse; Producer attempted an >>>> operation with an old epoch. Either there is a newer producer with the same >>>> transactionalId, or the producer's transaction has been expired by the >>>> broker. >>>

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-09 Thread Piotr Nowojski
Yes, good catch Kezhu, IllegalStateException sounds very much like FLINK-21028. Thomas, could you try upgrading to Flink 1.13.1 or 1.12.4 (1.11.4 hasn't been released yet)? Piotrek On Tue, 8 Jun 2021 at 17:18, Kezhu Wang wrote: > Could it be same as FLINK-21028[1] (titled as “Streaming

Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-06-08 Thread Piotr Nowojski
nts. > > Alex > > On Mon, Jun 7, 2021 at 3:12 AM Piotr Nowojski > wrote: > >> Hi Alex, >> >> A quick question. Are you using incremental checkpoints? >> >> Best, Piotrek >> >> On Sat, 5 Jun 2021 at 21:23 wrote: >> >>

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-17 Thread Piotr Nowojski
Hi Thomas. The bug https://issues.apache.org/jira/browse/FLINK-21028 is still present in 1.12.1. You would need to upgrade to at least 1.13.0, 1.12.2 or 1.11.4. However as I mentioned before, 1.11.4 hasn't yet been released. On the other hand both 1.12.2 and 1.13.0 have already been superseded by

Re: Web UI shows my AssignTImestamp is in high back pressure but in/outPoolUsage are both 0.

2021-06-18 Thread Piotr Nowojski
Hi Haocheng, Regarding the first part, yes. For a very long time there was a trivial bug that was displaying the maximum "backpressure status" ("HIGH" in your case) from all of the subtasks, for every subtask, instead of showing the subtask's individual status. [1] It is/will be fixed in Flink

Re: Flow of events when Flink Iterations are used in DataStream API

2021-06-18 Thread Piotr Nowojski
Hi, In old Flink versions (prior to 1.9) that would be the case. If operator D emitted a record to Operator B, but Operator B hasn't yet processed it when the checkpoint is happening, this record would be lost during recovery. Operator D would be recovered with its state as it was after emitting this

Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-18 Thread Piotr Nowojski
ctStreamOperator` or `OneInputStreamOperator`. Best, Piotrek On Fri, 18 Jun 2021 at 12:49, Felipe Gutierrez wrote: > Hello Piotrek, > > On Fri, Jun 18, 2021 at 11:48 AM Piotr Nowojski > wrote: > >> Hi, >> >> As far as I can tell timers should be che

Re: The memory usage of the job is very different between Flink1.9 and Flink1.12

2021-06-18 Thread Piotr Nowojski
Hi, when upgrading, I would always suggest checking the release notes first [1] Best, Piotrek [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html#memory-management On Fri, 18 Jun 2021 at 12:24, Haihang Jing wrote: > Ask a question, the same business

Re: Handling Large Broadcast States

2021-06-18 Thread Piotr Nowojski
Hi, As far as I know there are no plans to support other state backends with BroadcastState. I don't know about any particular technical limitation, it probably just hasn't been done. Also I don't know how much effort that would be. Probably it wouldn't be easy. Timo, can you chip in how for

Re: Discard checkpoint files through a single recursive call

2021-06-18 Thread Piotr Nowojski
Hi, Unfortunately at the moment I think there are no plans to push for this. I would suggest that you bump/cast a vote on https://issues.apache.org/jira/browse/FLINK-13856 in order to allow us to prioritise efforts more accurately. Best, Piotrek On Wed, 16 Jun 2021 at 05:46, Jiahui Jiang wrote: >

Re: Task is always created state after submit a example job

2021-06-18 Thread Piotr Nowojski
Hi, I would start by looking at the Job Manager and Task Manager logs. Check whether the Task Managers connected/registered with the Job Manager and, if so, whether there were any problems when submitting the job. It seems like either there are not enough slots, or slots are actually not available. Best,

Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-18 Thread Piotr Nowojski
Hi, As far as I can tell timers should be checkpointed and recovered. What may be happening is that the state of the last seen watermarks by operators on different inputs and different channels inside an input is not persisted. Flink is assuming that after the restart, watermark assigners will

Re: Task is always created state after submit a example job

2021-06-21 Thread Piotr Nowojski
17 Lei Wang wrote: > There are enough slots on the jobmanager UI, but the slots are not available. > > After I add taskmanager.host: localhost to flink-conf.yaml, it works. > > But I don't know why. > > Thanks, > Lei > > > On Fri, Jun 18, 2021 at 6:07 PM

Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-25 Thread Piotr Nowojski
TERMARKS. >> WHY? >> processing watermark: 0 >> processing watermark: 0 >> processing watermark: 0 >> Attempts restart: 1 >> processing watermark: 1 >> processing watermark: 1 >> processing watermark: 1 >> processing watermark: 1 >> A

Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-18 Thread Piotr Nowojski
I'm glad I could help, I hope it will solve your problem :) Best, Piotrek On Fri, 18 Jun 2021 at 14:38, Felipe Gutierrez wrote: > On Fri, Jun 18, 2021 at 1:41 PM Piotr Nowojski > wrote: > >> Hi, >> >> Keep in mind that this is a quite low level approach to this p

Re: Managing Jobs entirely with Flink Monitoring API

2021-05-26 Thread Piotr Nowojski
Glad to hear it! Thanks for confirming that it works. Piotrek On Wed, 26 May 2021 at 12:59, Barak Ben Nathan wrote: > > > Hi Piotrek, > > > > This is exactly what I was searching for. Thanks! > > > > Barak > > > > *From:* Piotr Nowojski > *Sent

Re: Managing Jobs entirely with Flink Monitoring API

2021-05-26 Thread Piotr Nowojski
Hi Barak, Before starting the JobManager I don't think there is any API running at all. If you want to be able to submit/stop multiple jobs to the same cluster, session mode is indeed the way to go. But first you need to start the cluster ( start-cluster.sh ) [1] Piotrek [1]

Re: Flink 1.11.3 NoClassDefFoundError: Could not initialize class

2021-05-26 Thread Piotr Nowojski
k.beforeInvoke(StreamTask.java:475) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:526) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) > at java

Re: yarn ship from s3

2021-05-25 Thread Piotr Nowojski
Hi Vijay, I'm not sure if I understand your question correctly. You have a jar and configs (1, 2, 3 and 4) on S3 and you want to start a Flink job using those? Can you simply download those things (the whole directory containing them) to the machine that will be starting the Flink job? Best, Piotrek

Re: DataStream API in Batch Mode job is timing out, please advise on how to adjust the parameters.

2021-05-25 Thread Piotr Nowojski
Hi Marco, How are you starting the job? For example, are you using Yarn as the resource manager? It looks like there are just enough resources in the cluster to run this job. Assuming the cluster is correctly configured and Task Managers are able to connect with the Job Manager (can you share full

Re: Time needed to read from Kafka source

2021-05-25 Thread Piotr Nowojski
Hi, That's a throughput of 700 records/second, which should be well below the theoretical limits of any deserializer (from hundreds of thousands up to tens of millions of records/second per single operator), unless your records are huge or very complex. Long story short, I don't know of a magic bullet to

Re: Customer operator in BATCH execution mode

2021-05-25 Thread Piotr Nowojski
Hi, 1. I don't know if there is a built-in way of doing it. You can always pass this information anyway on your own when you are starting the job (via operator/function's constructors). 2. Yes, I think this should work. Best, Piotrek Tue, 25 May 2021 at 17:05 ChangZhuo Chen (陳昌倬) wrote: >
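A minimal sketch of passing information through a function's constructor at job-construction time. The snippet does not say which information is meant, so `job_setting` is a made-up placeholder, and `ConfiguredFunction` is a plain Python class, not a Flink interface.

```python
class ConfiguredFunction:
    """A user function that receives a job-level setting via its constructor."""

    def __init__(self, job_setting):
        # The value is fixed when the job is built and travels with the function.
        self.job_setting = job_setting

    def process(self, record):
        # The function can consult the setting while handling each record.
        return (self.job_setting, record)


# The code that starts the job decides the value and hands it over explicitly.
fn = ConfiguredFunction(job_setting="value-chosen-at-submission")
result = fn.process(42)
```

The design choice is simply that the launcher, which already knows the value, is the one responsible for forwarding it.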

Re: Flink 1.11.3 NoClassDefFoundError: Could not initialize class

2021-05-25 Thread Piotr Nowojski
Hi Georgi, I don't think it's a bug in Flink. It sounds like some problem with dependencies or jars in your job. Can you explain a bit more what you mean by: > that some of them are constantly restarting with the following exception. After restart, everything is working as expected

Re: How to read large amount of data from hive and write to redis, in a batch manner?

2021-05-25 Thread Piotr Nowojski
Hi, You could always buffer records in your sink function/operator, until a large enough batch is accumulated and upload the whole batch at once. Note that if you want to have at-least-once or exactly-once semantics, you would need to take care of those buffered records in one way or another. For
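The buffering idea can be sketched as follows. This is a plain Python illustration, not Flink's sink API: `BufferingSink`, `upload_batch`, and `batch_size` are hypothetical names, and the upload target is just a list standing in for the external store.

```python
class BufferingSink:
    """Accumulates records and uploads them in batches of batch_size."""

    def __init__(self, upload_batch, batch_size):
        self.upload_batch = upload_batch  # callable that receives a list of records
        self.batch_size = batch_size
        self.buffer = []

    def write(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Upload whatever is buffered. For at-least-once delivery this would
        # also have to run before a checkpoint completes and on close
        # (an assumption of this sketch, not shown here).
        if self.buffer:
            self.upload_batch(list(self.buffer))
            self.buffer.clear()


uploaded = []  # stand-in for the external system
sink = BufferingSink(upload_batch=uploaded.append, batch_size=3)
for i in range(7):
    sink.write(i)

pending_before_flush = list(sink.buffer)  # records not yet uploaded
sink.flush()                              # final flush at end of input
```

The records still sitting in `pending_before_flush` are exactly the ones that the delivery-guarantee caveat in the message is about: if the job fails before `flush()`, they are lost unless they are persisted or replayed.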

Re: recover from savepoint

2021-06-02 Thread Piotr Nowojski
Hi, I think there is no generic way. If this error indeed happened after starting a second job from the same savepoint, or something like that, the user can change the sink operator's UID. If this is an issue of intentional recovery from an earlier checkpoint/savepoint, maybe

Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-06-07 Thread Piotr Nowojski
Hi Alex, A quick question. Are you using incremental checkpoints? Best, Piotrek Sat, 5 Jun 2021 at 21:23 wrote: > Small correction, in T4 and T5 I mean Job2, not Job 1 (as job 1 was save > pointed). > > Thank you, > Alex > > On Jun 4, 2021, at 3:07 PM, Alexander Filipchik > wrote: > > 

Re: Customer operator in BATCH execution mode

2021-05-27 Thread Piotr Nowojski
>> >> Yes, it should be possible to register a timer for Long.MAX_WATERMARK if >> you want to apply a transformation at the end of each key. You could >> also use the reduce operation (DataStream#keyBy#reduce) in BATCH mode. > > According to [0], timer time is irrelevant since timer will be

Re: How to read large amount of data from hive and write to redis, in a batch manner?

2021-07-08 Thread Piotr Nowojski
Great, thanks for coming back and I'm glad that it works for you! Piotrek Thu, 8 Jul 2021 at 13:34 Yik San Chan wrote: > Hi Piotr, > > Thanks! I ended up doing option 1, and that works great. > > Best, > Yik San > > On Tue, May 25, 2021 at 11:43 PM Piotr No
