Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2021-03-05 Thread bat man
Hi Xintong Song, I tried using the java options to generate heap dump referring to docs[1] in flink-conf.yaml, however after adding this the task manager containers are not coming up. Note that I am using EMR. Am i doing anything wrong here? env.java.opts: "-XX:+HeapDumpOnOutOfMemoryError -XX:Heap

Re: reading file from s3

2021-03-05 Thread Avi Levi
Does anyone by any chance have a working example (of course without the credentials etc') that can be shared on github ?simply reading/writing a file from/to s3. I keep on struggling with this one and getting weird exceptions Thanks On Thu, Mar 4, 2021 at 7:30 PM Avi Levi wrote: > Sure, This is

questions about broadcasts

2021-03-05 Thread Marco Villalobos
Is it possible for an operator to receive two different kinds of broadcasts? Is it possible for an operator to receive two different types of streams and a broadcast? For example, I know there is a KeyedCoProcessFunction, but is there a version of that which can also receive broadcasts?

Invitation to Beam College

2021-03-05 Thread Mara Ruvalcaba
Hi Flink community, You are invited to Improve your data processing skills with the *Beam College* webinars! If you know about Apache Beam but haven’t used it in production yet, or you want to learn best practices to optimize your Beam pipelines, then Beam College is for you! Beam College,

Re: Broadcasting to multiple operators

2021-03-05 Thread David Anderson
Glad to hear it! Thanks for letting us know. David On Fri, Mar 5, 2021 at 10:22 PM Roger wrote: > Confirmed. This worked! > Thanks! > Roger > > On Fri, Mar 5, 2021 at 12:41 PM Roger wrote: > >> Hey David. >> Thank you very much for your response. This is making sense now. It was >> confusing b

Re: Python DataStream API Questions -- Java/Scala Interoperability?

2021-03-05 Thread Kevin Lam
Thanks Shuiqiang! That's really helpful, we'll give the connectors a try. On Wed, Mar 3, 2021 at 4:02 AM Shuiqiang Chen wrote: > Hi Kevin, > > Thank you for your questions. Currently, users are not able to defined > custom source/sinks in Python. This is a greate feature that can unify the > end

Running Pyflink job on K8s Flink Cluster Deployment?

2021-03-05 Thread Kevin Lam
Hello everyone, I'm looking to run a Pyflink application run in a distributed fashion, using kubernetes, and am currently facing issues. I've successfully gotten a Scala Flink Application to run using the manifests provided at [0] I attempted to run the application by updating the jobmanager comm

Re: Broadcasting to multiple operators

2021-03-05 Thread Roger
Confirmed. This worked! Thanks! Roger On Fri, Mar 5, 2021 at 12:41 PM Roger wrote: > Hey David. > Thank you very much for your response. This is making sense now. It was > confusing because I was able to use the Broadcast stream prior to adding > the second stream. However, now I realize that th

Re: Dynamic JDBC Sink Support

2021-03-05 Thread David Anderson
Rion, A given JdbcSink can only write to one table, but if the number of tables involved isn't unreasonable, you could use a separate sink for each table, and use side outputs [1] from a process function to steer each record to the appropriate sink. I suggest you avoid trying to implement a sink.

Re: Broadcasting to multiple operators

2021-03-05 Thread Roger
Hey David. Thank you very much for your response. This is making sense now. It was confusing because I was able to use the Broadcast stream prior to adding the second stream. However, now I realize that this part of the pipeline occurs after the windowing so I'm not affected the same way. This is d

Re: Broadcasting to multiple operators

2021-03-05 Thread David Anderson
This is a watermarking issue. Whenever an operator has two or more input streams, its watermark is the minimum of watermarks of the incoming streams. In this case your broadcast stream doesn't have a watermark generator, so it is preventing the watermarks from advancing. This in turn is preventing

Dynamic JDBC Sink Support

2021-03-05 Thread Rion Williams
Hi all, I’ve been playing around with a proof-of-concept application with Flink to assist a colleague of mine. The application is fairly simple (take in a single input and identify various attributes about it) with the goal of outputting those to separate tables in Postgres: object AttributeIdent

Re: LocalWatermarkAssigner causes predicate pushdown to be skipped

2021-03-05 Thread Yuval Itzchakov
Hi Timo, After investigating this further, this is actually non related to implementing SupportsWatermarkPushdown. Once I create a TableSchema for my custom source's RowData, and assign it a watermark (see my example in the original mail), the plan will always include a LogicalWatermarkAssigner. T

RE: Need information on latency metrics

2021-03-05 Thread V N, Suchithra (Nokia - IN/Bangalore)
Hi Timo, Yes I have gone through the link. But for the other metrics documentation has description. For example, numBytesOut - The total number of bytes this task has emitted. lastCheckpointSize - The total size of the last checkpoint (in bytes). For the latency metrics I don't see such descr

Broadcasting to multiple operators

2021-03-05 Thread Roger
Hello. I am having an issue with a Flink 1.8 pipeline when trying to consume broadcast state across multiple operators. I currently have a working pipeline that looks like the following: records .assignTimestampsAndWatermarks( new BoundedOutOfOrdernessGenerator( Long.parseLong(proper

Error Starting PyFlink in Kubernetes Session Cluster "Could Not Get Rest Endpoint"

2021-03-05 Thread Robert Cullen
Trying to spin up a Python Flink instance in my Kubernetes cluster with this configuration ... sudo ./bin/flink run \ --target kubernetes-session \ -Dkubernetes.cluster-id=flink-python \ -Dkubernetes.namespace=cmdaa \ -Dkubernetes.container.image=cmdaa/pyflink:0.0.1 \ --pyModule word_count \ --pyF

Re: [ANNOUNCE] Apache Flink 1.12.2 released

2021-03-05 Thread Piotr Nowojski
Thanks Roman and Yuan for your work and driving the release process :) pt., 5 mar 2021 o 15:53 Till Rohrmann napisał(a): > Great work! Thanks a lot for being our release managers Roman and Yuan and > to everyone who has made this release possible. > > Cheers, > Till > > On Fri, Mar 5, 2021 at 10

Re: Flink KafkaProducer flushing on savepoints

2021-03-05 Thread Piotr Nowojski
Yes, that might be an issue. As far as I remember, the universal connector works with Kafka 0.10.x or higher. Piotrek pt., 5 mar 2021 o 11:20 Witzany, Tomas napisał(a): > Hi, > thanks for your answer. It seems like it will not be possible for me to > upgrade to the newer universal Flink produce

Re: Re: Independence of task parallelism

2021-03-05 Thread Piotr Nowojski
Yes, it might be the case. Hard to tell for sure without looking at the job, metrics etc. Just be mindful of what I described, and if you want to fine tune a job and set different parallelism values for different operators, pay attention to where those operators are being distributed. Usually in pr

Re: LocalWatermarkAssigner causes predicate pushdown to be skipped

2021-03-05 Thread Timo Walther
Hi Yuval, sorry that nobody replied earlier. Somehow your email fell through the cracks. If I understand you correctly, could would like to implement a table source that implements both `SupportsWatermarkPushDown` and `SupportsFilterPushDown`? The current behavior might be on purpose. Filt

New settings are not honored unless checkpoint is cleared.

2021-03-05 Thread Yordan Pavlov
Hello there, I am running Flink 1.11.3 on Kubernetes deployment. If I change a setting and re-deploy my Flink setup, the new setting is correctly applied in the config file but is not being honored by Flink. In other words, I can ssh into the pod and check the config file - it has the new setting a

Re: Convert BIGINT to TIMESTAMP in pyflink when using datastream api

2021-03-05 Thread Timo Walther
Hi Shilpa, Shuiqiang is right. Currently, we recommend to use SQL DDL until the connect API is updated. See here: https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/create/#create-table Especially the WATERMARK section shows how to declare a rowtime attribute. Regards,

Re: Need information on latency metrics

2021-03-05 Thread Timo Walther
Hi Suchithra, did you see this section in the docs? https://ci.apache.org/projects/flink/flink-docs-stable/ops/metrics.html#latency-tracking Regards, Timo On 05.03.21 15:31, V N, Suchithra (Nokia - IN/Bangalore) wrote: Hi, I am using flink 1.12.1 version and trying to explore latency metrics

Re: How to emit after a merge?

2021-03-05 Thread Timo Walther
I don't know how the resulting plan for you query looks like. You can print it via `env.sqlQuery().explain()`. But I could imagine that by simplifying the query you would also simplify the number of retraction messages/operators in the pipeline. Regards, Timo On 05.03.21 13:28, Yik San Chan

Re: [DISCUSSION] Introduce a separated memory pool for the TM merge shuffle

2021-03-05 Thread Till Rohrmann
Thanks for this proposal Guowei. +1 for it. Concerning the default size, maybe we can run some experiments and see how the system behaves with different pool sizes. Cheers, Till On Fri, Mar 5, 2021 at 2:45 PM Stephan Ewen wrote: > Thanks Guowei, for the proposal. > > As discussed offline alrea

Re: [ANNOUNCE] Apache Flink 1.12.2 released

2021-03-05 Thread Till Rohrmann
Great work! Thanks a lot for being our release managers Roman and Yuan and to everyone who has made this release possible. Cheers, Till On Fri, Mar 5, 2021 at 10:43 AM Yuan Mei wrote: > Cheers! > > Thanks, Roman, for doing the most time-consuming and difficult part of the > release! > > Best, >

Need information on latency metrics

2021-03-05 Thread V N, Suchithra (Nokia - IN/Bangalore)
Hi, I am using flink 1.12.1 version and trying to explore latency metrics with Prometheus. I have enabled latency metrics by adding "metrics.latency.interval: 1" in flink-conf.yaml. I have submitted a flink streaming job which has Source->flatmap->process->sink which is chained into single task

Re: [DISCUSSION] Introduce a separated memory pool for the TM merge shuffle

2021-03-05 Thread Stephan Ewen
Thanks Guowei, for the proposal. As discussed offline already, I think this sounds good. One thought is that 16m sounds very small for a default read buffer pool. How risky do you think it is to increase this to 32m or 64m? Best, Stephan On Fri, Mar 5, 2021 at 4:33 AM Guowei Ma wrote: > Hi, a

Re: How to emit after a merge?

2021-03-05 Thread Yik San Chan
Hi Timo, If I understand correctly, the UDF only simplifies the query, but not doing anything functionally different. Please correct me if I am wrong, thank you! Best, Yik San On Thu, Mar 4, 2021 at 8:34 PM Timo Walther wrote: > Yes, implementing a UDF might be the most convenient option for s

Re: Flink KafkaProducer flushing on savepoints

2021-03-05 Thread Witzany, Tomas
Hi, thanks for your answer. It seems like it will not be possible for me to upgrade to the newer universal Flink producer, because of an older Kafka version I am reading from. So unfortunately for now I will have to go with the hack. Thanks From: Piotr Nowojski S

Re: [ANNOUNCE] Apache Flink 1.12.2 released

2021-03-05 Thread Yuan Mei
Cheers! Thanks, Roman, for doing the most time-consuming and difficult part of the release! Best, Yuan On Fri, Mar 5, 2021 at 5:41 PM Roman Khachatryan wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink 1.12.2, which is the second bugfix release for th

[ANNOUNCE] Apache Flink 1.12.2 released

2021-03-05 Thread Roman Khachatryan
The Apache Flink community is very happy to announce the release of Apache Flink 1.12.2, which is the second bugfix release for the Apache Flink 1.12 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2021-03-05 Thread Xintong Song
Hi Hemant, This exception generally suggests that JVM is running out of heap memory. Per the official documentation [1], the amount of live data barely fits into the Java heap having little free space for new allocations. You can try to increase the heap size following these guides [2]. If a mem

Re: Multiple JobManager HA set up for Standalone Kubernetes

2021-03-05 Thread Yang Wang
Hi deepthi, Thanks for trying the Kubernetes HA service. > Do I need standby JobManagers? I think the answer is based on your production requirements. Usually, it is unnecessary to have more than one JobManagers. Because we are using the Kubernetes deployment to manage the JobManager. Once it cra

java.lang.OutOfMemoryError: GC overhead limit exceeded

2021-03-05 Thread bat man
Hi, Getting the below OOM but the job failed 4-5 times and recovered from there. j *ava.lang.Exception: java.lang.OutOfMemoryError: GC overhead limit exceededat org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.checkThrowSourceExecutionException(S

Aw: Re: Independence of task parallelism

2021-03-05 Thread Jan Nitschke
Hey Piotr,    thanks for your answer, that makes perfect sense. However, when looking at the number of messages being processed, we can see that both subtasks on task 2 will produce the same amount of messages in the (1-2-1-1-1) scenario, even with the first task hitting backpressure. We assume t