Re: Does FileSource download all remote files for generating splits

2022-01-27 Thread Meghajit Mazumdar
Thanks Caizhi. This clarifies. On Fri, Jan 28, 2022 at 12:06 PM Caizhi Weng wrote: > Hi! > > FileEnumerator never reads the actual content of a file. FileEnumerator > lives in job managers and it only reads the necessary meta-data of the file > (for example how large is the file) so that it can

Re: Unbounded streaming with table API and large json as one of the columns

2022-01-27 Thread Caizhi Weng
Hi! This job will work as long as your SQL statement is valid. Did you meet some difficulties? Or what is your concern? A record of 100K is sort of large, but I've seen quite a lot of jobs with such record size so it is OK. HG 于2022年1月27日周四 02:57写道: > Hi, > > I need to calculate elapsed times

Re: Questions about checkpoint retention

2022-01-27 Thread Caizhi Weng
Hi! So you'd like to remove all checkpoints after a savepoint is completed? Could you elaborate more on why you'd like to retain 10 checkpoints? For most of the cases retaining one checkpoint is enough. Also you mentioned that you're keeping 10 checkpoints for each version of your app. For each

Re: Does FileSource download all remote files for generating splits

2022-01-27 Thread Caizhi Weng
Hi! FileEnumerator never reads the actual content of a file. FileEnumerator lives in job managers and it only reads the necessary meta-data of the file (for example how large is the file) so that it can split the work across all task managers. Corresponding file readers, in the other hand, lives

Re: Aggregation support in the Table API

2022-01-27 Thread Caizhi Weng
Hi! You can directly use .select() to call an aggregate function (either built-in or user-defined). See [1] for an example on how to use call() expression to call an user-defined aggregate function. [1]

Re: 如何给flink的输出削峰填谷?

2022-01-27 Thread 18703416...@163.com
类似kafka这样的消息管道应该用来 削峰填谷, 可以先sink 至kafka,再从kafka -> db > 2022年1月26日 上午2:11,Jing 写道: > > Hi Flink中文社区, > > 我碰到一个这样的问题,我的数据库有write throttle, 我的flink > app是一个10分钟窗口的聚合操作,这样导致,每10分钟有个非常大量的写请求。导致数据库的sink有时候会destroy. > 有什么办法把这些写请求均匀分布到10分钟吗? > > > 谢谢, > Jing

Re: Flink-ML: Sink model data in online training

2022-01-27 Thread Zhipeng Zhang
Hi thekingofcity, Thanks for your interest! Unfortunately we don't have an example for online learning for now. We are working on an online machine learning example. Hopefully it will be added here [1] in the next three weeks. [1] https://github.com/apache/flink-ml thekingofcity

Aggregation support in the Table API

2022-01-27 Thread Pouria Pirzadeh
I am using the Table api in Java to write queries with grouping/aggregation. The aggregations may use built-in functions or user defined aggregate functions. Therefore I am using the aggregate() method on a WindowGroupedTable. table.window(...) .groupBy(...)

Re: Flink POJO documentation for primitive boolean state variables

2022-01-27 Thread Alexander Fedulov
Hi Mac, I just verified that objects with isXXX methods indeed will be interpreted as POJOs. Would you be willing to contribute a documentation update? Here are some guidelines: [1]. [1] https://flink.apache.org/contributing/contribute-documentation.html Thanks, Alexander Fedulov On Thu,

Re: Determinism of interval joins

2022-01-27 Thread Alexis Sarda-Espinosa
I'm not sure if the issue in [1] is relevant since it mentions the Table API, but it could be. Since stream1 and stream2 in my example have a long chain of operators behind, I presume they might "run" at very different paces. Oh and, in the context of my unit tests, watermarks should be

Determinism of interval joins

2022-01-27 Thread Alexis Sarda-Espinosa
Hi everyone, I'm seeing a lack of determinism in unit tests when using an interval join. I am testing with both Flink 1.11.3 and 1.14.3 and the relevant bits of my pipeline look a bit like this: keySelector1 = ... keySelector2 = ... rightStream = stream1 .flatMap(...) .keyBy(keySelector1)

Re: Inaccurate checkpoint trigger time

2022-01-27 Thread Paul Lam
Hi Yun, Sorry for the late reply. I finally found some time to investigate this problem further. I upgraded the job to 1.14.0, but it’s still the same. I’ve checked the debug logs, and I found that Zookeeper notifies watched event of checkpoint id changes very late [1]. Each time a checkpoint

Does FileSource download all remote files for generating splits

2022-01-27 Thread Meghajit Mazumdar
Hello, I had a question about the FileSource in Flink 1.14 . Considering FileSource is set to read from a remote GCS URL, I could read and understand that the FileEnumerator is

Duplicate job submission error

2022-01-27 Thread Parag Somani
Hello All, While deploying on our one of environment, we encountered crashloopback of job manager pod. Env: K8s Flink: 1.14.2 Could you suggest, how can we troubleshoot this and possible handling of this? exception snipper as follows: 2022-01-27 06:58:07.326 ERROR 44 --- [lt-dispatcher-4]

Re: Flink POJO documentation for primitive boolean state variables

2022-01-27 Thread Makhanchan Pandey
Hi all, Just wanted to follow up on this again :) Thanks in advance. Regards, Mac Pandey On Tue, Jan 25, 2022 at 12:59 PM Makhanchan Pandey < makhanchanpan...@gmail.com> wrote: > Hi all, > > For Flink to treat a model class as a special POJO type, these are the > documented conditions: >