Re: Data Stream countWindow followed by keyBy does not preserve time order

2021-11-02 Thread Guowei Ma
Hi, I did not run your program directly, but I see that you are now using the Batch execution mode. I suspect this is related, because in Batch execution mode Flink will "sort" by the key (and this may be an unstable sort). So would you like to experiment with the results of running with
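A minimal sketch of the experiment being suggested: pin the runtime mode so the same keyBy/countWindow pipeline can be compared in STREAMING vs BATCH mode (the element values and key function are illustrative):

    import org.apache.flink.api.common.RuntimeExecutionMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RuntimeModeExperiment {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Switch this to RuntimeExecutionMode.BATCH to compare: in BATCH mode the input
            // is grouped/sorted by key, so output order across keys is no longer arrival order.
            env.setRuntimeMode(RuntimeExecutionMode.STREAMING);

            env.fromElements(1, 2, 3, 4, 5, 6)
               .keyBy(x -> x % 2)            // illustrative key
               .countWindow(2)               // tumbling count window of 2 elements per key
               .reduce(Integer::sum)
               .print();

            env.execute("runtime-mode-experiment");
        }
    }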

Re: Flink sink data to DB and then commit data to Kafka

2021-11-02 Thread Guowei Ma
Hi Qihua, AFAIK there is no way to do it. Maybe you need to implement a "new" sink to achieve this. Best, Guowei On Wed, Nov 3, 2021 at 12:40 PM Qihua Yang wrote: > Hi, > > Our flink application has two sinks (DB and kafka topic). We want to push > the same data to both sinks. Is it
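For illustration only, a rough sketch of such a "new" sink: a RichSinkFunction that inserts into the DB first and only publishes to Kafka if the insert succeeded. The connection string, table, and topic names are made up; this gives at-least-once at best, not an atomic commit across both systems.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.Properties;

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Hypothetical sink: insert into the DB first, publish to Kafka only if the insert succeeds.
    public class DbThenKafkaSink extends RichSinkFunction<String> {
        private transient Connection connection;
        private transient KafkaProducer<String, String> producer;

        @Override
        public void open(Configuration parameters) throws Exception {
            connection = DriverManager.getConnection("jdbc:postgresql://db:5432/app", "user", "pw"); // assumed DB
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092"); // assumed brokers
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producer = new KafkaProducer<>(props);
        }

        @Override
        public void invoke(String value, Context context) throws Exception {
            try (PreparedStatement stmt = connection.prepareStatement("INSERT INTO events (payload) VALUES (?)")) {
                stmt.setString(1, value);
                stmt.executeUpdate();            // a failure throws here, so the record never reaches Kafka
            }
            producer.send(new ProducerRecord<>("events-topic", value)); // only reached after the DB write succeeded
        }

        @Override
        public void close() throws Exception {
            if (producer != null) producer.close();
            if (connection != null) connection.close();
        }
    }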

Re: FlinkSQL streamingSink writing hive ORC data: how to control the number of files

2021-11-02 Thread yidan zhao
One more question: the names of the files written by FlinkSQL do not look like the format described in the docs, which is:
└── 2019-08-25--12
    ├── prefix-0-0.ext
    ├── prefix-0-1.ext.inprogress.bd053eb0-5ecf-4c85-8433-9eff486ac334
    ├── prefix-1-0.ext
    └── prefix-1-1.ext.inprogress.bc279efe-b16f-47d8-b828-00ef6e2fbd11

Flink sink data to DB and then commit data to Kafka

2021-11-02 Thread Qihua Yang
Hi, Our flink application has two sinks (DB and kafka topic). We want to push the same data to both sinks. Is it possible to push data to the kafka topic only after the data has been pushed to the DB successfully? If the commit to the DB fails, we don't want that data pushed to kafka. Thanks, Qihua

Why do the count windows in Flink Table APIs require processing time for sorting whereas in Flink Datastream APIs they do not

2021-11-02 Thread Long Nguyễn
I have read about the Window operator in the Flink documentation and know that it groups rows into finite groups based on time or row-count intervals. I saw an example of a sliding count window right
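A rough side-by-side sketch of the difference being asked about (the field names, data, and the proctime column are illustrative): the Table API row-count window has to be declared on a time attribute, while the DataStream countWindow needs none.

    import static org.apache.flink.table.api.Expressions.$;
    import static org.apache.flink.table.api.Expressions.rowInterval;

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.Tumble;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    public class CountWindowComparison {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
            DataStream<Integer> stream = env.fromElements(1, 2, 3, 4, 5, 6);

            // DataStream API: count windows are defined purely by element count.
            stream.keyBy(x -> x % 2)
                  .countWindow(3)
                  .reduce(Integer::sum)
                  .print();

            // Table API: the equivalent row-count window must be declared "on" a time
            // attribute (here a processing-time column), which gives the planner an ordering.
            Table t = tEnv.fromDataStream(stream, $("v"), $("proctime").proctime());
            Table counted = t
                .window(Tumble.over(rowInterval(3L)).on($("proctime")).as("w"))
                .groupBy($("w"))
                .select($("v").sum().as("v_sum"));
            tEnv.toDataStream(counted).print();

            env.execute("count-window-comparison");
        }
    }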

Re: FlinkSQL streamingSink writing hive ORC data: how to control the number of files

2021-11-02 Thread Caizhi Weng
Hi! The hive sink has a file compaction feature that can merge the data of one partition within one checkpoint into a single file. See [1]. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/#file-compaction yidan zhao wrote on Wed, Nov 3, 2021 at 10:03 AM: > Requirements: suppose my hive table is tmp, with several fields, partitioned by dt, hour and sid, where sid means the channel. > >
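For reference, a hedged sketch of what turning compaction on can look like when writing into the hive table from the thread. 'auto-compaction' and 'compaction.file-size' are the filesystem-sink options from [1], passed here as dynamic table options; the table and column names are made up, and SQL hints may require table.dynamic-table-options.enabled depending on the Flink version.

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    public class HiveCompactionSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(60_000); // compaction is applied per checkpoint
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

            // Assumed kafka source table `kafka_src` and hive table `tmp` (dt/hour/sid partitions).
            tEnv.executeSql(
                "INSERT INTO tmp /*+ OPTIONS('auto-compaction'='true', 'compaction.file-size'='128MB') */ "
                    + "SELECT field_a, field_b, dt, `hour`, sid FROM kafka_src");
        }
    }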

FlinkSQL streamingSink writing hive ORC data: how to control the number of files

2021-11-02 Thread yidan zhao
Requirements: suppose my hive table is tmp, with several fields, partitioned by dt, hour and sid, where sid means the channel. I currently use FlinkSQL to read data from a kafka table and write it into the hive table tmp with streaming writes, commit policies metastore and success-file, a process-time partition-commit trigger, and a delay of 1h. Checkpoints are triggered every 1 min, with a 10 min minimum pause between two consecutive checkpoints, so in effect a checkpoint is taken every 10 min. Current situation: since the data volume is large, the kafka topic has 60 partitions, so my job parallelism can be anything up to 60; assume I also chose 60.

New blog post published - Sort-Based Blocking Shuffle Implementation in Flink

2021-11-02 Thread Daisy Tsang
Hey everyone, we have a new two-part post published on the Apache Flink blog about the sort-based blocking shuffle implementation in Flink. It covers benchmark results, design and implementation details, and more! We hope you like it and welcome any sort of feedback on it. :)

Re: Flink + K8s

2021-11-02 Thread Austin Cawley-Edwards
Hi Rommel, That’s correct that K8s will restart the JM pod (assuming it’s been created by a K8s Job or Deployment), and it will pick up the HA data and resume work. The only use case for having multiple replicas is faster failover, so you don’t have to wait for K8s to provision that new pod

What is "Could not retrieve file from transient blob store"?

2021-11-02 Thread John Smith
Hi, running Flink 1.10.0 with 3 zookeepers, 3 job nodes and 3 task nodes, and I saw this exception in the job node logs... 2021-11-02 23:20:22,703 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Unhandled exception. org.apache.flink.util.FlinkException: Could not

Re: Question on BoundedOutOfOrderness

2021-11-02 Thread Alexey Trenikhun
Hi Oliver, I believe you also need to sort; the out-of-orderness watermark strategy only "postpones" the watermark by the given expected maximum out-of-orderness. Check the Ververica example -
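A small sketch of that point, assuming a hypothetical Event type with a millisecond timestamp: the bounded-out-of-orderness strategy only delays the watermark, so any actual reordering still has to happen downstream (for example by buffering in an event-time window or in keyed state with a timer).

    import java.time.Duration;

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.datastream.DataStream;

    public class BoundedOutOfOrdernessSketch {

        // Hypothetical event type for the example.
        public static class Event {
            public long timestampMillis;
            public String payload;
        }

        public static DataStream<Event> withWatermarks(DataStream<Event> events) {
            // The watermark lags 5s behind the max seen timestamp; records themselves are NOT reordered.
            return events.assignTimestampsAndWatermarks(
                WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((event, recordTs) -> event.timestampMillis));
        }
    }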

Re: How to refresh topics to ingest with KafkaSource?

2021-11-02 Thread Mason Chen
Hi Arvid, I have some bandwidth to contribute to this task and am familiar with the code. Could you or another committer assign me this ticket? Thanks, Mason > On Oct 30, 2021, at 5:24 AM, Arvid Heise wrote: > > Hi Mason, > > thanks for creating that. > > We are happy to take contributions

Re: Possibility of supporting Reactive mode for native Kubernetes application mode

2021-11-02 Thread Fuyao Li
I tried to remove the Re: [EXTERNAL] label just now, but it seems to break the thread on my side. I guess I can't do much about this; emails are forced to carry such a tag at the company level. Anyway, I guess we can continue to discuss the issue in this thread. Thanks, Fuyao

Is there a way to update checkpoint configuration for a job "in-place"?

2021-11-02 Thread Kevin Lam
Hi all, We run a Flink application on Kubernetes in Application Mode using Kafka with exactly-once-semantics and high availability. We are looking into a specific failure scenario: a flink job that has too short a checkpoint timeout (execution.checkpointing.timeout) and at some point during the
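For context, a minimal sketch of where that timeout is set programmatically (the values are examples). This is the same knob as execution.checkpointing.timeout in flink-conf.yaml, and changing it still means redeploying the job, which is what the question is about.

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointTimeoutSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(60_000);                           // checkpoint every 60s (example)
            // Equivalent to execution.checkpointing.timeout; 30 minutes is just an example value.
            env.getCheckpointConfig().setCheckpointTimeout(30 * 60_000);
        }
    }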

Re: Possibility of supporting Reactive mode for native Kubernetes application mode

2021-11-02 Thread Fuyao Li
Hi David, Nicolaus, Thanks for the reply. 1. For your first question, Yes. I want to use the checkpoint to stop and restart the application. I think this is similar to the Reactive mode strategy, right? (I don’t know the exact implementation behind the Reactive mode). From your

Flink + K8s

2021-11-02 Thread Rommel Holmes
Hi, From my understanding, when I set Flink in HA mode in K8s, I don't need to set up more than 1 job manager, because once the job manager dies, K8s will restart it for me. Is that the correct understanding, or do I still need to set up more than 1 job manager for HA purposes? Thanks. Rommel

Question on BoundedOutOfOrderness

2021-11-02 Thread Oliver Moser
Hi! I am investigating the use of Flink for a new project and started some simple demos. Currently I am stuck at the point where I need to deal with events arriving out of order based on their event time. I’ve spent quite some time researching on SO, the docs, the Ververica training

Re: NoClassDefFoundError for RMIServerImpl when running Flink with Scala 2.12 and Java 11

2021-11-02 Thread L. C. Hsieh
Hi Nicolaus, Thanks for your reply. It turns out to be the Java distribution used in the base image. I changed the base image and it works now. On Tue, Nov 2, 2021 at 10:14 AM Nicolaus Weidner wrote: > > Hi, > > I tried building Flink 1.13 with the Scala 2.12 profile and running some of > the

Re: NoClassDefFoundError for RMIServerImpl when running Flink with Scala 2.12 and Java 11

2021-11-02 Thread Nicolaus Weidner
Hi, I tried building Flink 1.13 with the Scala 2.12 profile and running some of the examples with Java 11, without encountering the issue you describe (with or without HA). Can you give more details on how exactly you built Flink (ideally the full command), and how you ran the job? Best, Nico

Data Stream countWindow followed by keyBy does not preserve time order

2021-11-02 Thread Yan Shen
Hi all, Can anyone advise on this? I wrote a simple test of the countWindow method (in Kotlin) as below package aero.airlab.flinkjobs.headingreminder import org.apache.flink.api.common.RuntimeExecutionMode import org.apache.flink.api.common.eventtime.WatermarkStrategy import

Re: Reactive mode in 1.13

2021-11-02 Thread Till Rohrmann
Hi Ravi, I think you also need to make the tornado.jar available to the TaskExecutor processes (e.g. putting them into the usrlib or lib directory where you started the process). When using the application mode, then Flink assumes that all processes have access to the user code jar. That's why

Re: Possibility of supporting Reactive mode for native Kubernetes application mode

2021-11-02 Thread Nicolaus Weidner
Hi Fuyao, About your second question: You are right that taking and restoring from savepoints will incur a performance loss. They cannot be incremental, and cannot use native (low-level) data formats - for now. These issues are on the list of things to improve for Flink 1.15, so if the changes

Statefun remote functions - acessing kafka headers from a remote function

2021-11-02 Thread Filip Karnicki
Hi, is there a neat way to access kafka headers from within a remote function without using the datastream api to insert the headers as part of a RoutableMessage payload? Many thanks Fil

Re: Statefun embedded functions - parallel per partition, sequential per key

2021-11-02 Thread Filip Karnicki
Hi All Just an update for future reference, it turned out that the machine we were using for this test didn't have enough memory for what we were asking it to do. It was that simple. The upside is that not even with the world's most unstable cluster did we manage to lose a single message. Just

Re: [External] : Re: Possibility of supporting Reactive mode for native Kubernetes application mode

2021-11-02 Thread David Morávek
> > Similar to Reactive mode, checkpoint must be enabled to support such > functionality. ... > Wouldn't that mean tearing down the whole Flink cluster in order to re-scale? That could be quite costly. We're aiming to speed up the recovery process for the reactive mode and this would most likely

Re: rowtime attribute lost after creating view v1 of table t1

2021-11-02 Thread godfrey he
Could you post the exact SQL so we can take a look? yidan zhao wrote on Tue, Nov 2, 2021 at 7:06 PM: > > As the subject says, I did not seem to have this problem on flink 1.11 and 1.12; it appears now on 1.13. > Problem description: > t1 is a kafka table with an event_time field that is a row time attribute, and I created view v1 via "select , > event_time from t1". The problem is that after creating it this way, queries against v1 fail with an error saying the aggre.. window can only be defined on time > attributes. >

Re: Upgrading to the new KafkaSink + Flink 1.14 from Flink 1.13.2 + FlinkKafkaProducer

2021-11-02 Thread Fabian Paul
Hi Yuval, Ok, no worries. One thing I would first check is why the TwoPhaseCommitSinkFunction is instantiated, because the KafkaSink does not use it. It seems an old FlinkKafkaProducer is still built somewhere. Best, Fabian
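For comparison, a hedged sketch of a sink built purely with the unified API (broker, topic, and delivery guarantee are placeholders): if TwoPhaseCommitSinkFunction still shows up in the stack trace with something like this in place, an old FlinkKafkaProducer is likely still wired in elsewhere in the job.

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.base.DeliveryGuarantee;
    import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
    import org.apache.flink.connector.kafka.sink.KafkaSink;

    public class KafkaSinkSketch {
        public static KafkaSink<String> build() {
            return KafkaSink.<String>builder()
                .setBootstrapServers("kafka:9092")                         // placeholder brokers
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                    .setTopic("output-topic")                              // placeholder topic
                    .setValueSerializationSchema(new SimpleStringSchema())
                    .build())
                .setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)      // builder method name as of 1.14
                .build();
        }
    }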

Re: Upgrading to the new KafkaSink + Flink 1.14 from Flink 1.13.2 + FlinkKafkaProducer

2021-11-02 Thread Yuval Itzchakov
Hi Fabian, The program is part of a very large Flink infrastructure we have in place, so unfortunately I can't share it. But perhaps you could point me to more concrete information you'd like to receive? On Tue, Nov 2, 2021 at 1:51 PM Fabian Paul wrote: > Hi Yuval, > > This error looks indeed

Re: Upgrading to the new KafkaSink + Flink 1.14 from Flink 1.13.2 + FlinkKafkaProducer

2021-11-02 Thread Fabian Paul
Hi Yuval, This error looks indeed strange. I do not think the old serializer should be invoked at all when switching to the unified KafkaSink. Can you maybe share more information about the job you are running, or share the program so that we can reproduce it? Best, Fabian

Re: Converting a Table to a DataStream[RowData] instead of DataStream[Row] with `toDataStream`

2021-11-02 Thread Yuval Itzchakov
It works! Thank you Nicolaus. On Tue, Nov 2, 2021 at 12:40 PM Nicolaus Weidner < nicolaus.weid...@ververica.com> wrote: > Hi Yuval, > > Can you try > toDataStream[RowData](tableSchema.toPhysicalRowDataType.bridgedTo(classOf[RowData]))? > > Best regards, > Nico > > On Thu, Oct 28, 2021 at 10:15

Upgrading to the new KafkaSink + Flink 1.14 from Flink 1.13.2 + FlinkKafkaProducer

2021-11-02 Thread Yuval Itzchakov
Hi, I'm trying to upgrade an application to Flink 1.14 + the new KafkaSink while restoring from a checkpoint. I changed the UID of the KafkaSink completely and ran the application with --allow-non-restored-state. However, when restoring, I still run into the following error: Caused by:

rowtime attribute lost after creating view v1 of table t1

2021-11-02 Thread yidan zhao
As the subject says, I did not seem to have this problem on flink 1.11 and 1.12; it appears now on 1.13. Problem description: t1 is a kafka table with an event_time field that is a row time attribute, and I created view v1 via "select , event_time from t1". The problem is that after creating it this way, queries against v1 fail with an error saying the aggre.. window can only be defined on time attributes. I'm not sure whether this is caused by the version change or by a mistake somewhere on my side.
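A hedged reconstruction of the setup being described, with made-up columns and a datagen stand-in for the kafka connector, just to show the shape of the DDL, the view, and the windowed query that reportedly fails on 1.13 when event_time loses its time-attribute property:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class RowtimeViewSketch {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

            // Assumed shape of the kafka table t1: event_time is a rowtime attribute via WATERMARK.
            // A datagen connector stands in for kafka so the sketch is self-contained.
            tEnv.executeSql(
                "CREATE TABLE t1 ("
                    + "  id STRING,"
                    + "  event_time TIMESTAMP(3),"
                    + "  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND"
                    + ") WITH ('connector' = 'datagen')");

            // The view from the thread: if event_time is no longer a time attribute here,
            // the windowed aggregation below fails with the reported error.
            tEnv.executeSql("CREATE VIEW v1 AS SELECT id, event_time FROM t1");
            tEnv.executeSql(
                "SELECT TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS wstart, COUNT(*) AS cnt "
                    + "FROM v1 GROUP BY TUMBLE(event_time, INTERVAL '1' MINUTE)").print();
        }
    }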

Re: Converting a Table to a DataStream[RowData] instead of DataStream[Row] with `toDataStream`

2021-11-02 Thread Nicolaus Weidner
Hi Yuval, Can you try toDataStream[RowData](tableSchema.toPhysicalRowDataType.bridgedTo(classOf[RowData]))? Best regards, Nico On Thu, Oct 28, 2021 at 10:15 PM Yuval Itzchakov wrote: > Flink 1.14 > Scala 2.12.5 > > Hi, > I want to be able to convert a Table into a DataStream[RowData]. I need
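For readers on the Java bridge, roughly the same suggestion might look like this (a sketch only; the Scala one-liner above is what was actually confirmed to work):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
    import org.apache.flink.table.data.RowData;

    public class ToRowDataSketch {
        // Bridge the table's physical row data type to the internal RowData class before converting.
        public static DataStream<RowData> toRowDataStream(StreamTableEnvironment tEnv, Table table) {
            return tEnv.toDataStream(
                table,
                table.getResolvedSchema().toPhysicalRowDataType().bridgedTo(RowData.class));
        }
    }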

Re: Reactive mode in 1.13

2021-11-02 Thread Till Rohrmann
Hi Ravi, The reactive mode shouldn't do things differently compared to a normal application cluster deployment. Maybe you can show us exactly how you submit a job, the contents of the bundled jar, how you build the fat jar and the logs of the failed Flink run. Moving this discussion to the user

Re: New blog post published - Sort-Based Blocking Shuffle Implementation in Flink

2021-11-02 Thread David Morávek
Thanks Daisy and Kevin for a great write up! ;) Especially the 2nd part was really interesting, I really like the idea of the single spill file with a custom scheduling of read requests. Best, D. On Mon, Nov 1, 2021 at 10:01 AM Daisy Tsang wrote: > Hey everyone, we have a new two-part post

Re: Kryo Serialization issues in Flink Jobs.

2021-11-02 Thread Timo Walther
Hi Prasanna, it could be a bug where the ExecutionConfig is not forwarded properly to all locations where the KryoSerializer is used. As a first step for debugging, I would recommend creating a custom TypeInformation (most methods are not relevant except for createTypeSerializer and

Re: Some questions about reading data from kafka and writing to hive with FlinkSQL

2021-11-02 Thread yidan zhao
Thanks. Tony Wei wrote on Tue, Nov 2, 2021 at 1:12 PM: > Hi yidan, > > You can try SQL Hints [1]. > > [1] > > https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/queries/hints/ > > > yidan zhao wrote on Tue, Nov 2, 2021 at 1:03 PM: > > Right, with the hive catalog there is indeed no need to recreate the table, but my scenario is: I need to use flinkSQL to stream kafka table data into

Unsubscribe

2021-11-02 Thread 李芳奎
Unsubscribe. felix felix_...@163.com