Flink, local development, finish processing a stream of Kafka data

2021-03-02 Thread Dan Hill
Hi. For local development and tests, I want to flush the events in my system to make sure I'm processing everything. My watermark does not progress to finish all of the data. What's the best practice for local development or tests? If I use idle sources for 1 Kafka partition, this appears broke

Flink Zookeeper leader change v 1.9.X

2021-03-02 Thread Varun Chakravarthy Senthilnathan
Hi, We are using Flink version 1.9.1 and, in a long-running environment, we encountered the specific issue mentioned in https://issues.apache.org/jira/browse/FLINK-14091 While we are working on upgrading our version, 1. Why does ZooKeeper go for a leader change? As far as we checked, there

Flink 1.12 Compatibility with hbase-shaded-client 2.1 in application jar

2021-03-02 Thread Debraj Manna
Hi, I am trying to deploy an application in Flink 1.12, in application mode, with hbase-shaded-client 2.1.0 as a dependency. On deploying the application I am seeing the below ClassCastE

Re: java Flink local test failure (Could not create actor system)

2021-03-02 Thread Vijayendra Yadav
Hi Smile, Thanks for your clarification, it helped. Thanks, Vijay > On Feb 28, 2021, at 7:06 PM, Smile wrote: > > Hi Vijay, > > Since version 1.7 Flink builds with Scala version 2.11 (default) and 2.12. > Flink has APIs, libraries, and runtime modules written in Scala. Users of > the Scala A

Re: Debugging long Flink checkpoint durations

2021-03-02 Thread Dan Hill
Thanks! Yes, I've looked at these. My job is facing backpressure starting at an early join step. I'm unclear if more time is fine for the backfill or if I need more resources. On Tue, Mar 2, 2021 at 12:50 AM Yun Gao wrote: > Hi Dan, > > I think you could see the detail of the checkpoints via

Re: Need help with JDBC Broken Pipeline Issue after some idle time

2021-03-02 Thread XU Qinghui
It sounds like the JDBC driver's connection is being closed somehow, which probably has nothing to do with Flink itself. Maybe you could check whether there are settings on the DB that close the connection after some inactivity, or otherwise it could be your network drops the inactive TCP connection af

Re: Sliding Window Count: Tricky Edge Case / Count Zero Problem

2021-03-02 Thread Jan Brusch
Hi Roman, thanks for your reply. Don't timers remove themselves after firing? Apart from that, the idea is indeed to have one timer per element, so that we count one up whenever the element comes in and count one down exactly later. So we emulate a sliding window without the "hops" in certa
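
The one-timer-per-element idea described above can be sketched outside Flink as a small simulation. This is only an illustration in plain Python, not Flink's timer API: the class `SlidingCounter`, its methods, and the integer timestamps are hypothetical names chosen for the sketch. Each element increments the count and schedules a decrement exactly one window length later, so the count can fall back to zero even when no new elements arrive.

```python
import heapq


class SlidingCounter:
    """Hypothetical simulation: count up on arrival, count down one window later."""

    def __init__(self, window):
        self.window = window
        self.count = 0
        self._timers = []  # min-heap of pending expiry times, one per element

    def advance_to(self, now):
        """Fire every timer due at or before `now`; each firing removes its timer."""
        while self._timers and self._timers[0] <= now:
            heapq.heappop(self._timers)
            self.count -= 1
        return self.count

    def on_element(self, ts):
        """Process an element with timestamp `ts` and return the updated count."""
        self.advance_to(ts)
        self.count += 1
        heapq.heappush(self._timers, ts + self.window)
        return self.count


counter = SlidingCounter(window=10)
counter.on_element(0)
counter.on_element(5)
print(counter.advance_to(15))  # both elements have expired by time 15
```

The heap holds one pending expiry per element, which is the cost of this approach: memory grows with the number of elements inside the window.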

Flink + Hive + Compaction + Parquet?

2021-03-02 Thread Theo Diefenthal
Hi there, Currently, I have a Flink 1.11 job which writes parquet files via the StreamingFileSink to HDFS (simply using DataStream API). I commit like every 3 minutes and thus have many small files in HDFS. Downstream, the generated table is consumed from Spark Jobs and Impala queries. HDFS do

Re: Sliding Window Count: Tricky Edge Case / Count Zero Problem

2021-03-02 Thread Roman Khachatryan
Hi Jan, Thanks for sharing your solution. You probably also want to remove previously created timer(s) in processElement; so that you don't end up with a timer per element. For that, you can store the previous time (in function state). Regards, Roman On Fri, Feb 26, 2021 at 10:29 PM Jan Brusch

Re: [Flink-SQL] FLOORing OR CEILing a DATE or TIMESTAMP to WEEK uses Thursdays as week start

2021-03-02 Thread Jaffe, Julian
Calcite does not follow ISO-8601. Instead, until very recently Calcite weeks started on Thursdays[1]. (As an aside, Calcite somewhat abuses the WEEK time unit - converting a date to a week returns an integer representing the week of the year the date falls in while FLOORing or CEILing a timesta

Python DataStream API Questions -- Java/Scala Interoperability?

2021-03-02 Thread Kevin Lam
Hello everyone, I have some questions about the Python API that hopefully folks in the Apache Flink community can help with. A little background: I’m interested in using the Python DataStream API because of stakeholders who don’t have a background in Scala/Java, and would prefer Python if possibl

Re: Need help with JDBC Broken Pipeline Issue after some idle time

2021-03-02 Thread Fuyao Li
Sorry for the incomplete email. Here is the error log of the broken pipeline; the failed SQL will be executed after automatic recovery from the checkpoint. Please share some ideas on this issue. Really appreciate it. Thanks! 09:20:02,868 ERROR org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat - JDBC

Need help with JDBC Broken Pipeline Issue after some idle time

2021-03-02 Thread Fuyao Li
Hi Flink Community, I need some help with the JDBC sink in the DataStream API. I can produce some records and sink them to the database correctly. However, if I wait for 5 minutes between insertions, I run into a broken pipeline issue. After that, the Flink application will restart and recover from checkpo

Re: Savepoint documentation

2021-03-02 Thread XU Qinghui
Out of curiosity, does it mean that savepoint created by flink 1.11 cannot be recovered by a job running with flink 1.10 or older versions (so downgrade is impossible)? Le mar. 2 mars 2021 à 12:25, David Anderson a écrit : > You are correct in thinking that the documentation wasn't updated. If y

Running Apache Flink on Android

2021-03-02 Thread Alexander Borgschulze
I was trying to run Apache Flink within an Android App. I just want to run a minimum working example, like this: @Override protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.activity_main); runFlinkExample(); } private v

Re: [Flink-SQL] FLOORing OR CEILing a DATE or TIMESTAMP to WEEK uses Thursdays as week start

2021-03-02 Thread Timo Walther
Hi Sebastián, it might be the case that some time functions are not correct due to the underlying refactoring of data structures. I will loop in Leonard in CC, who is currently working on improving this situation as part of FLIP-162 [1]. @Leonard: Is this wrong behavior on your list? Regards, Tim

Re: Best way to handle BIGING to TIMESTAMP conversions

2021-03-02 Thread Yik San Chan
I think you can also do CAST((e / 1000) AS TIMESTAMP) On Tue, Mar 2, 2021 at 7:27 PM Sebastián Magrí wrote: > Thanks a lot Jark, > > On Mon, 1 Mar 2021 at 02:38, Jark Wu wrote: > >> Hi Sebastián, >> >> You can use `TO_TIMESTAMP(FROM_UNIXTIME(e))` to get a timestamp value. >> The BIGINT should b
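
Both suggestions in this thread depend on the unit of the BIGINT column `e`: the quoted reply expects seconds, while dividing by 1000 implies milliseconds. The sketch below only illustrates that unit arithmetic in plain Python. It makes no claim about how Flink SQL evaluates either expression, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def from_epoch_seconds(e):
    """Interpret `e` as seconds since the Unix epoch (UTC)."""
    return datetime.fromtimestamp(e, tz=timezone.utc)


def from_epoch_millis(e):
    """Interpret `e` as milliseconds since the Unix epoch (UTC), keeping the millis part."""
    seconds, millis = divmod(e, 1000)
    return from_epoch_seconds(seconds).replace(microsecond=millis * 1000)


# The same instant expressed in both units.
print(from_epoch_seconds(1614643200))
print(from_epoch_millis(1614643200123))
```

Passing a millisecond value to the seconds-based conversion would produce a date far in the future, which is a quick way to spot a unit mismatch.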

[Flink-SQL] FLOORing OR CEILing a DATE or TIMESTAMP to WEEK uses Thursdays as week start

2021-03-02 Thread Sebastián Magrí
While using a simple query such as this SELECT `ts`, FLOOR(`ts` TO WEEK) as `week_start`, CEIL(`ts` TO WEEK) as `week_end` FROM some_table I get some weird results like these: 2021-03-01T00:00|2021-02-25T00:00|2021-03-04T00:00 Which is obviously wrong since March 1st is on Mond
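
The values reported here are consistent with flooring into 7-day buckets counted from the Unix epoch, because 1970-01-01 was a Thursday. Whether the engine actually computes weeks this way is an assumption of this sketch, not something the thread confirms. The plain-Python functions below (hypothetical names) reproduce the reported row and contrast it with a Monday-based week start.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # a Thursday
WEEK = timedelta(days=7)


def floor_week_epoch(ts):
    """Floor to 7-day buckets counted from the epoch, so buckets start on Thursdays."""
    return EPOCH + ((ts - EPOCH) // WEEK) * WEEK


def ceil_week_epoch(ts):
    """Ceil to the next epoch-aligned bucket boundary (unchanged if already on one)."""
    floored = floor_week_epoch(ts)
    return floored if floored == ts else floored + WEEK


def floor_week_monday(ts):
    """Floor to the most recent Monday at midnight (ISO-8601 week start)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=midnight.weekday())


ts = datetime(2021, 3, 1)
print(floor_week_epoch(ts), ceil_week_epoch(ts), floor_week_monday(ts))
```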

Re: Best way to handle BIGING to TIMESTAMP conversions

2021-03-02 Thread Sebastián Magrí
Thanks a lot Jark, On Mon, 1 Mar 2021 at 02:38, Jark Wu wrote: > Hi Sebastián, > > You can use `TO_TIMESTAMP(FROM_UNIXTIME(e))` to get a timestamp value. > The BIGINT should be in seconds. Please note to declare the computed > column > in DDL schema and declare a watermark strategy on this com

Re: Savepoint documentation

2021-03-02 Thread David Anderson
You are correct in thinking that the documentation wasn't updated. If you look at the master docs [1] you will see that they now say Can I move the Savepoint files on stable storage?

Flink KafkaProducer flushing on savepoints

2021-03-02 Thread Witzany, Tomas
Hi, I have a question about the at-least-once guarantees for Kafka producers when checkpointing is disabled. In our data pipeline we have a Flink job on an unlimited stream that originally had checkpoints turned on. Further, this job is cancelled with a savepoint once a day to do some data pr

How to get operator uid from a sql

2021-03-02 Thread XU Qinghui
Hello folks, I'm trying to use the Flink State Processor API to read the state of operators from a checkpoint. Currently, the operator lookup in the API relies on the operator `uid` (e.g. ExistingSavepoint.readKeyedState(uid, readerFunction)). But when it comes to a SQL job, where should I look

Re: Flink application kept restarting

2021-03-02 Thread Rainie Li
Thanks for checking, Matthias. I have another Flink job which failed last weekend with the same buffer pool destroyed error. This job is also running version 1.9. Here is the error I found in the task manager log. Any suggestions on what the root cause is and how to fix it? 2021-02-28 00:54:45,943

Savepoint documentation

2021-03-02 Thread Farouk
Hi, is this chapter outdated as of Flink 1.11? https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/state/savepoints.html#can-i-move-the-savepoint-files-on-stable-storage *Can I move the Savepoint files on stable storage?* *The quick answer to this question is currently “no” bec

Re: Debugging long Flink checkpoint durations

2021-03-02 Thread Yun Gao
Hi Dan, I think you can see the details of the checkpoints in the checkpoint UI [1]. Also, if the pending checkpoints show that some tasks have not taken a snapshot, you might check whether those tasks are backpressuring the previous tasks [2]. Best, Yun [1] https://ci.apache.org/projects/