Re: Flink’s Kubernetes HA services - NOT working

2021-02-11 Thread Matthias Pohl
://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/ha/kubernetes_ha.html#configuration On Wed, Feb 10, 2021 at 6:08 PM Matthias Pohl wrote: > Hi Daniel, > what's the exact configuration you used? Did you use the resource > definitions provided in the Standalone Flink on K

Re: Should flink job manager crash during zookeeper upgrade?

2021-02-10 Thread Matthias Pohl
Hi Barisa, thanks for sharing this. I'm gonna add Till to this thread. He might have some insights. Best, Matthias On Wed, Feb 10, 2021 at 4:19 PM Barisa Obradovic wrote: > I'm trying to understand if behaviour of the flink jobmanager during > zookeeper upgrade is expected or not. > > I'm

Re: How does Flink handle shorted lived keyed streams

2021-02-10 Thread Matthias Pohl
Hi narashima, not sure whether this fits your use case, but have you considered creating a savepoint and analyzing it using the State Processor API [1]? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html#state-processor-api On Wed, Feb

Re: Flink’s Kubernetes HA services - NOT working

2021-02-10 Thread Matthias Pohl
Hi Daniel, what's the exact configuration you used? Did you use the resource definitions provided in the Standalone Flink on Kubernetes docs [1]? Did you do certain things differently in comparison to the documentation? Best, Matthias [1]

Re: S3 parquet files as Sink in the Table SQL API

2021-02-10 Thread Matthias Pohl
Hi, have tried using the bundled hadoop uber jar [1]. It looks like some Hadoop dependencies are missing. Best, Matthias [1] https://flink.apache.org/downloads.html#additional-components On Wed, Feb 10, 2021 at 1:24 PM meneldor wrote: > Hello, > I am using PyFlink and I want to write records

Re: [ANNOUNCE] Apache Flink 1.10.3 released

2021-02-01 Thread Matthias Pohl
Yes, thanks for taking over the release! Best, Matthias On Mon, Feb 1, 2021 at 5:04 AM Zhu Zhu wrote: > Thanks Xintong for being the release manager and everyone who helped with > the release! > > Cheers, > Zhu > > Dian Fu 于2021年1月29日周五 下午5:56写道: > >> Thanks Xintong for driving this release!

Re: memory tuning

2021-01-27 Thread Matthias Pohl
rg.apache.flink.configuration.GlobalConfiguration [] - Loading > configuration property: jobmanager.memory.process.size, 1600m > 2021-01-25 21:41:18,046 INFO > org.apache.flink.configuration.GlobalConfiguration [] - Loading > configuration property: taskmanager.memory.process.size, 1728m > 2021-01-25 21:41:18,0

Re: JobManager seems to be leaking temporary jar files

2021-01-26 Thread Matthias Pohl
Hi Maciek, my understanding is that the jars in the JobManager should be cleaned up after the job is terminated (I assume that your jobs successfully finished). The jars are managed by the BlobService. The dispatcher will trigger the jobCleanup in [1] after job termination. Are there any

Re: memory tuning

2021-01-26 Thread Matthias Pohl
Hi Marco, Could you share the preconfiguration logs? They are printed in the beginning of the taskmanagers' logs and contain a summary of the used memory configuration? Best, Matthias On Tue, Jan 26, 2021 at 6:35 AM Marco Villalobos wrote: > > I have a flink job that collects and aggregates

Re: Re: Re: Test failed in flink-end-to-end-tests/flink-end-to-end-tests-common-kafka

2021-01-25 Thread Matthias Pohl
luent/kafka-avro-serializer/5.5.2/kafka-avro-serializer-5.5.2.pom > [3]. https://mvnrepository.com/artifact/io.confluent/kafka-avro-serializer > > At 2021-01-22 21:22:51, "Matthias Pohl" wrote: > > Hi Smile, > Have you used a clean checkout? I second Robert's statement consideri

Re: FlinkSQL submit query and then the jobmanager failed.

2021-01-24 Thread Matthias Pohl
Hi, thanks for reaching out to the community. I'm not an Hive nor Orc format expert. But could it be that this is a configuration problem? The error is caused by an ArrayIndexOutOfBounds exception in ValidReadTxnList.readFromString on an array generated by splitting a String using colons as

Re: unsubscribe

2021-01-24 Thread Matthias Pohl
Hi Abhishek, unsubscribing works by sending an email to user-unsubscr...@flink.apache.org as stated in [1]. Best, Matthias [1] https://flink.apache.org/community.html#mailing-lists On Sun, Jan 24, 2021 at 3:06 PM Abhishek Jain wrote: > unsubscribe >

Re: [Flink SQL] CompilerFactory cannot be cast error when executing statement

2021-01-22 Thread Matthias Pohl
neral. > > :-) > > Thanks! > > On Fri, 22 Jan 2021 at 15:19, Matthias Pohl > wrote: > >> Hi Sebastián, >> have you tried changing the dependency scope to provided >> for flink-table-planner-blink as it is suggested in [1]? >> >> Best, >>

Re: [Flink SQL] CompilerFactory cannot be cast error when executing statement

2021-01-22 Thread Matthias Pohl
Hi Sebastián, have you tried changing the dependency scope to provided for flink-table-planner-blink as it is suggested in [1]? Best, Matthias [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-1-10-exception-Unable-to-instantiate-java-compiler-td38221.html On Fri, Jan 22,

Re: Flink 1.11 checkpoint compatibility issue

2021-01-22 Thread Matthias Pohl
Hi Lu, thanks for reaching out to the community, Lu. Interesting observation. There's no change between 1.9.1 and 1.11 that could explain this behavior as far as I can tell. Have you had a chance to debug the code? Can you provide the code so that we could look into it more closely? Another thing:

Re: Re: Test failed in flink-end-to-end-tests/flink-end-to-end-tests-common-kafka

2021-01-22 Thread Matthias Pohl
Hi Smile, Have you used a clean checkout? I second Robert's statement considering that the dependency you're talking about is already part of flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/pom.xml. It also has the correct scope set both in master and release-1.12. Best, Matthias On

Re: Handling validations/errors in the flink job

2021-01-22 Thread Matthias Pohl
Hi Sagar, have you had a look at CoProcessFunction [1]? CoProcessFunction enables you to join two streams into one and also provide context to use SideOutput [2]. Best, Matthias [1]

Re: JDBC connection pools

2021-01-22 Thread Matthias Pohl
Hi Marco, have you had a look into the connector documentation ([1] for the regular connector or [2] for the SQL connector)? Maybe, discussions about connection pooling in [3] and [4] or the code snippets provided in the JavaDoc of JdbcInputFormat [5] help as well. Best, Matthias [1]

Re: Question about setNestedFileEnumeration()

2021-01-22 Thread Matthias Pohl
Hi Wayne, based on other mailing list discussion ([1]) you can assume that the combination of FileProcessingMode.PROCESS_CONTINUOUSLY and setting FileInputFormat.setNestedFileEnumeration to true should work as you expect it to work. Can you provide more context on your issue like log files? Which

Re: Flink Application cluster/standalone job: some JVM Options added to Program Arguments

2021-01-19 Thread Matthias Pohl
tem property ? If > user code has nothing to do with such arguments, why Flink append these > arguments to user JOB args? > Thanks, > Alexey > > > -- > *From:* Matthias Pohl > *Sent:* Sunday, January 17, 2021 11:53:29 PM > *To:* Alex

Re: Flink Application cluster/standalone job: some JVM Options added to Program Arguments

2021-01-17 Thread Matthias Pohl
Hi Alexey, thanks for reaching out to the Flink community. I'm not 100% sure whether you have an actual issue or whether it's just the changed behavior you are confused about. The change you're describing was introduced in Flink 1.12 as part of the work on FLIP-104 [1] exposing the actual memory

Re: Re: Re: Idea import Flink source code

2021-01-13 Thread Matthias Pohl
s mean that only the > replied person can see the email? > > > If Maven fails to download plugins or dependencies, is mvn -clean > install -DskipTests a must? > I'll try first. > > penguin > > > > > 在 2021-01-13 16:35:10,"Matthias Pohl" 写道: >

Re: Re: Idea import Flink source code

2021-01-13 Thread Matthias Pohl
errors > Cannot resolve plugin org.codehaus.mojo:build-helper-maven-plugin: > > > Best, > penguin > > > > > 在 2021-01-13 15:24:22,"Matthias Pohl" 写道: > > Hi, > you might want to move these kinds of questions into the > user@flink.apache

Re: Idea import Flink source code

2021-01-12 Thread Matthias Pohl
Hi, you might want to move these kinds of questions into the user@flink.apache.org which is the mailing list for community support questions [1]. Coming back to your question: Is it just me or is the image not accessible? Could you provide a textual description of your problem? Best, Matthias

Re: Official Flink 1.12.0 image

2020-12-22 Thread Matthias Pohl
Hi Robert, there is a discussion about it in FLINK-20632 [1]. PR #9249 [2] still needs to get reviewed. You might want to follow that PR as Xintong suggested in [1]. I hope that helps. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-20632 [2]

Re: Putting record on kinesis (actually kinesalite) results with The security token included in the request is invalid.

2020-12-11 Thread Matthias Pohl
0, 2020 at 6:53 PM Matthias Pohl > wrote: > >> Hi Avi, >> thanks for reaching out to the Flink community. I haven't worked with the >> KinesisConsumer. Unfortenately, I cannot judge whether there's something >> missing in your setup. But first of all: Could you confirm tha

Re: Putting record on kinesis (actually kinesalite) results with The security token included in the request is invalid.

2020-12-10 Thread Matthias Pohl
Hi Avi, thanks for reaching out to the Flink community. I haven't worked with the KinesisConsumer. Unfortenately, I cannot judge whether there's something missing in your setup. But first of all: Could you confirm that the key itself is valid? Did you try to use it in other cases? Best, Matthias

Re: Accumulators storage path

2020-12-10 Thread Matthias Pohl
Hi Hanan, thanks for reaching out to the Flink community. Have you considered changing io.tmp.dirs [1][2]? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#io-tmp-dirs [2]

Re: Flink jobmanager TLS connectivity to Zookeeper

2020-12-10 Thread Matthias Pohl
can go about doing this? > > -- Matthias Pohl | Engineer Follow us @VervericaData Ververica <https://www.ververica.com/> -- Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference Stream Processing | Event Driven | Real Time -- Ververica GmbH | Inva

Re: Batch compressed file output

2020-11-27 Thread Matthias Pohl
Hi Flavio, others might have better ideas to solve this but I'll give it a try: Have you considered extending FileOutputFormat to achieve what you need? That approach (which is discussed in [1]) sounds like something you could do. Another pointer I want to give is the DefaultRollingPolicy [2]. It

Re: Questions on Table to Datastream conversion and onTimer method in KeyedProcessFunction

2020-11-13 Thread Matthias Pohl
Hi Fuyao, for your first question about the different behavior depending on whether you chain the methods or not: Keep in mind that you have to save the return value of the assignTimestampsAndWatermarks method call if you don't chain the methods together as it is also shown in [1]. At least the

Re: Is possible that make two operators always locate in same taskmanager?

2020-11-13 Thread Matthias Pohl
Hi Si-li, trying to answer your initial question: Theoretically, you could try using the co-location constraints to achieve this. But keep in mind that this might lead to multiple Join operators running in the same JVM reducing the amount of memory each operator can utilize. Best, Matthias On

Re: Assistance configuring access to GoogleCloudStorage for row format streaming file sink

2020-11-13 Thread Matthias Pohl
.withMaxPartSize(1024 * 1024 * 1024) > .build()) > .build(); > > //input.print(); > input.addSink(sink); > > > Not sure what else to try. Any pointers appreciated. > > > > Sent with ProtonMail <https://protonmail.com> Secure Email. > > -- Matthias Poh

Re: Filter By Value in List

2020-11-13 Thread Matthias Pohl
Hi Rex, after verifying with Timo I created a new issue to address your proposal of introducing a new operator [1]. Feel free to work on that one if you like. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-20148 On Thu, Nov 5, 2020 at 6:35 PM Rex Fenley wrote: > Thanks Timo, >

Re: Logs of JobExecutionListener

2020-11-13 Thread Matthias Pohl
Hi Flavio, thanks for sharing this with the Flink community. Could you answer the following questions, please: - What's the code of your Job's main method? - What cluster backend and application do you use to execute the job? - Is there anything suspicious you can find in the logs that might be

Re: Job is still in ACTIVE state after /jobs/:jobid/stop

2020-11-13 Thread Matthias Pohl
Hi Averell, thanks for sharing this with the Flink community. Is there anything suspicious in the logs which you could share? Best, Matthias On Fri, Nov 13, 2020 at 2:27 AM Averell wrote: > I have some updates. Some weird behaviours were found. Please refer to the > attached photo. > > All

Re: Data loss exception using hash join in batch mode

2020-11-13 Thread Matthias Pohl
Hi 键, we would need more context on your case (e.g. logs and more details on what you're doing exactly or any other useful information) to help. Best, Matthias On Thu, Nov 12, 2020 at 3:25 PM 键 <1941890...@qq.com> wrote: > Data loss exception using hash join in batch mode >

Re: Caching Mechanism in Flink

2020-11-11 Thread Matthias Pohl
s used when Flink uses HybridMemorySegments. Well, how the Flink knows > when to use these HybridMemorySegments and in which operations this is > happened? > > Best, > Iacovos > On 11/11/20 11:41 π.μ., Matthias Pohl wrote: > > Hi Iacovos, > The task's off-heap configuration value

Re: SSL setup for YARN deployment when hostnames are unknown.

2020-11-11 Thread Matthias Pohl
Hi Jiahui, thanks for reaching out to the mailing list. This is not something I have expertise in. But have you checked out the Flink SSL Setup documentation [1]? Maybe, you'd find some help there. Additionally, I did go through the code a bit: A SecurityContext is loaded during ClusterEntrypoint

Re: Caching Mechanism in Flink

2020-11-11 Thread Matthias Pohl
Hi Iacovos, The task's off-heap configuration value is used when spinning up TaskManager containers in a clustered environment. It will contribute to the overall memory reserved for a TaskManager container during deployment. This parameter can be used to influence the amount of memory allocated if

Re: a couple of memory questions

2020-11-05 Thread Matthias Pohl
Hello Edward, please find my answers within your message below: On Wed, Nov 4, 2020 at 1:35 PM Colletta, Edward wrote: > Using Flink 1.9.2 with FsStateBackend, Session cluster. > > > >1. Does heap state get cleaned up when a job is cancelled? > > We have jobs that we run on a daily basis.

Re: Running flink in a Local Execution Environment for Production Workloads

2020-10-29 Thread Matthias Pohl
Hi Joseph, thanks for reaching out to us. There shouldn't be any downsides other than the one you already mentioned as far as I know. Best, Matthias On Fri, Oct 23, 2020 at 1:27 PM Joseph Lorenzini wrote: > Hi all, > > > > I plan to run flink jobs as docker containers in a AWS Elastic

Re: Kubernetes Job Cluster, does it autoterminate?

2020-10-28 Thread Matthias Pohl
Hi Ruben, thanks for reaching out to us. Flink's native Kubernetes Application mode [1] might be what you're looking for. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#flink-kubernetes-application On Wed, Oct 28, 2020 at

Fwd: Flink memory usage monitoring

2020-10-27 Thread Matthias Pohl
I missed adding the mailing list in my previous email. -- Forwarded message - From: Matthias Pohl Date: Tue, Oct 27, 2020 at 12:39 PM Subject: Re: Flink memory usage monitoring To: Rajesh Payyappilly Jose Hi Rajesh, thanks for reaching out to us. We worked on providing metrics

Re: Ignoring invalid values in KafkaSerializationSchema

2020-09-24 Thread Matthias Pohl
Hi Yuval, thanks for bringing this issue up. You're right: There is no error handling currently implemented for SerializationSchema. FLIP-124 [1] addressed this for the DeserializationSchema, though. I created FLINK-19397 [2] to cover this feature. In the meantime, I cannot think of any other

Re: global state and single stream

2020-09-24 Thread Matthias Pohl
Hi Adam, sorry for the late reply. Introducing a global state is something that should be avoided as it introduces bottlenecks and/or concurrency/order issues. Broadcasting the state between different subtasks will also bring a loss in performance since each state change has to be shared with

Re: How can I drop events which are late by more than X hours/days?

2020-09-24 Thread Matthias Pohl
the difference between the watermark and my element's timestamp is > greater than X - drop the element. > > However, I do not have access to the current watermark inside any of > Flink's operators/functions including FilterFunction. > > How can such functionality be achieved?

Re: Re: Questions of "State Processing API in Scala"

2020-09-01 Thread Matthias Pohl
he State > >Processor API, but it's just that up to this point, we didn't have a plan > >for that yet. > >Can you open a JIRA for this? I think it'll be a reasonable extension to > >the API. > > > > > >> > >> And when I change `xxx.keyBy(_._

<    1   2   3