Re: Flink checkpointing - exactly once guaranteed understanding

2024-02-12 Thread Martijn Visser
Hi Kartik, I don't think there's much the Flink community can do to help you here. The Solace source and sink aren't owned by the Flink project; based on the source code, they haven't been touched in the last 7 years [1], and I'm actually not aware of anyone who uses Solace at all. Best

Re: Flink checkpointing - exactly once guaranteed understanding

2024-02-12 Thread Kartik Kushwaha
Any help here please. Regards, Kartik On Fri, Feb 9, 2024, 8:33 AM Kartik Kushwaha wrote: > I am using Flink checkpointing to restore the state of my job. I am using > unaligned checkpointing with a 100 ms checkpointing interval. I see a > few events getting dropped that were successfully

RE: Flink connection with AWS OpenSearch Service

2024-02-12 Thread Praveen Chandna via user
Hello, requesting an update, please. Thanks !! From: Praveen Chandna via user Sent: Friday, February 9, 2024 2:46 PM To: Praveen Chandna via user Subject: Flink connection with AWS OpenSearch Service Hello As per the Flink OpenSearch connector documentation, it specifies how to connect to

Re: Request to provide sample codes on Data stream using flink, Spring Boot & kafka

2024-02-08 Thread David Anderson
For a collection of several complete sample applications using Flink with Kafka, see https://github.com/confluentinc/flink-cookbook. And I agree with Marco -- in fact, I would go farther, and say that using Spring Boot with Flink is an anti-pattern. David On Wed, Feb 7, 2024 at 4:37 PM Marco

RE: Unsubscribe

2024-02-08 Thread Jiabao Sun
Please send an email with any content to user-unsubscr...@flink.apache.org if you want to unsubscribe from the user@flink.apache.org mailing list; you can refer to [1][2] for details on managing your subscription. Best, Jiabao [1]

Re: sink upsert materializer in SQL job

2024-02-08 Thread Marek Maj
Hi Xuyang, thank you for the explanation. The table.exec.sink.upsert-materialize = FORCE config was set unnecessarily; I just redeployed the job and confirmed that with the default AUTO, the materializer is still on. Thank you for the example you provided. My understanding of the upsert key was exactly as
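For reference, the materializer behavior discussed in this thread is controlled by a single table option; the following is a hedged sketch (option name and values per the Flink SQL documentation, with AUTO being the default):

```sql
-- AUTO (default): the planner inserts SinkUpsertMaterializer only when the
-- changelog's upsert key cannot be deduced to match the sink's primary key.
SET 'table.exec.sink.upsert-materialize' = 'AUTO';

-- FORCE always inserts the materializer; NONE never does (and risks
-- out-of-order upserts reaching the sink).
-- SET 'table.exec.sink.upsert-materialize' = 'FORCE';
```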

Re: Entering a busy loop when adding a new sink to the graph + checkpoint enabled

2024-02-07 Thread Daniel Peled
Good morning, any updates/progress on this issue? BR, Danny On Sun, Feb 4, 2024 at 13:20 Daniel Peled <daniel.peled.w...@gmail.com> wrote: > Hello, > > We have noticed that when we add a *new kafka sink* operator to the > graph, *and start from the last save point*, the operator

Re: Request to provide sample codes on Data stream using flink, Spring Boot & kafka

2024-02-07 Thread Marco Villalobos
Hi Nida, You can find sample code for using Kafka here: https://kafka.apache.org/documentation/ You can find sample code for using Flink here: https://nightlies.apache.org/flink/flink-docs-stable/ You can find sample code for using Flink with Kafka here:

Re: Request to provide sample codes on Data stream using flink, Spring Boot & kafka

2024-02-06 Thread Alexis Sarda-Espinosa
Hello, check this thread from some months ago, but keep in mind that it's not really officially supported by Flink itself: https://lists.apache.org/thread/l0pgm9o2vdywffzdmbh9kh7xorhfvj40 Regards, Alexis. On Tue, Feb 6, 2024 at 12:23 Fidea Lidea <lideafidea...@gmail.com> wrote: > Hi

Re: Idleness not working if watermark alignment is used

2024-02-06 Thread Alexis Sarda-Espinosa
added latency) > > > > Thias > > > > *From:* Alexis Sarda-Espinosa > *Sent:* Tuesday, February 6, 2024 9:48 AM > *To:* Schwalbe Matthias > *Cc:* user > *Subject:* Re: Idleness not working if watermark alignment is used > > > > ⚠*EXTERNAL MESSAGE – **

RE: Idleness not working if watermark alignment is used

2024-02-06 Thread Schwalbe Matthias
Subject: Re: Idleness not working if watermark alignment is used ⚠EXTERNAL MESSAGE – CAUTION: Think Before You Click ⚠ Hi Matthias, thanks for looking at this. Would you then say this comment in the source code is not really valid? https://github.com/apache/flink/blob/release-1.18/flink-runtime/src

Re: Idleness not working if watermark alignment is used

2024-02-06 Thread Alexis Sarda-Espinosa
leWatermarkExcemption(…) to > make it more obvious. > > > > Hope this helps > > > > > > Thias > > > > > > > > *From:* Alexis Sarda-Espinosa > *Sent:* Monday, February 5, 2024 6:04 PM > *To:* user > *Subject:* Re: Idleness not working if water

RE: Idleness not working if watermark alignment is used

2024-02-05 Thread Schwalbe Matthias
rather rename .withIdleness(…) to something like .idleWatermarkExcemption(…) to make it more obvious. Hope this helps Thias From: Alexis Sarda-Espinosa Sent: Monday, February 5, 2024 6:04 PM To: user Subject: Re: Idleness not working if watermark alignment is used Ah and I forgot to mention

Re:Re: Re: DESCRIBE CATALOG not available?

2024-02-05 Thread Xuyang
Hi, Yunhong. Thanks for volunteering :) -- Best! Xuyang On 2024-02-06 09:26:55, "yh z" wrote: Hi, Xuyang, I hope I can also participate in the development of the remaining FLIP features. Please cc me if there are any further developments. Thank you! Xuyang wrote on Mon, Jan 29, 2024

Re: Idleness not working if watermark alignment is used

2024-02-05 Thread Alexis Sarda-Espinosa
Ah, and I forgot to mention, this is with Flink 1.18.1. On Mon, Feb 5, 2024 at 18:00 Alexis Sarda-Espinosa <sarda.espin...@gmail.com> wrote: > Hello, > > I have 2 Kafka sources that are configured with a watermark strategy > instantiated like this: > >
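The idleness-plus-alignment combination being debugged in this thread can be set up with a strategy like the following — a minimal sketch against the Flink 1.18 WatermarkStrategy API, where the group name and all durations are invented for illustration:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class WatermarkSetup {
    public static WatermarkStrategy<String> strategy() {
        return WatermarkStrategy
                .<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                // mark the source split idle after 30s without records
                .withIdleness(Duration.ofSeconds(30))
                // align all sources in group "alignment-group",
                // allowing at most 2s of watermark drift between them
                .withWatermarkAlignment("alignment-group", Duration.ofSeconds(2));
    }
}
```

The thread's question is exactly about how these two decorators interact: an idle source stops emitting watermarks, while alignment throttles sources that run ahead of the group.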

Re: Obtaining Flink job lineage information for auditing

2024-02-03 Thread Feng Jin
My understanding is that the platform can configure this uniformly in flink-conf.yaml, so users don't need to set the related parameters individually. Best, Feng On Sun, Feb 4, 2024 at 11:19 AM 阿华田 wrote: > I took a look — this way every job has to configure the listener, so there is no system-level control, and pushing all downstream users to configure the listener would be difficult > > a15733178...@163.com > (signature customized by NetEase Mail Master) > > On Feb 2, 2024 at 19:38, Feng Jin wrote: > hi, > > You can refer to

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-03 Thread Benchao Li
Broadcast streaming join is a very interesting addition to streaming SQL, I'm glad to see it's been brought up. One of the major difference between streaming and batch is state. Regular join uses "Keyed State" (the key is deduced from join condition), so for a regular broadcast streaming join, we

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-03 Thread Shengkai Fang
+1 a FLIP to clarify the idea. Please be careful to choose which type of state you use here. The doc[1] says the broadcast state doesn't support RocksDB backend here. Best, Shengkai [1]

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-02 Thread David Anderson
I've seen enough demand for a streaming broadcast join in the community to justify a FLIP -- I think it's a good idea, and look forward to the discussion. David On Fri, Feb 2, 2024 at 6:55 AM Feng Jin wrote: > +1 a FLIP for this topic. > > > Best, > Feng > > On Fri, Feb 2, 2024 at 10:26 PM

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-02 Thread Feng Jin
+1 a FLIP for this topic. Best, Feng On Fri, Feb 2, 2024 at 10:26 PM Martijn Visser wrote: > Hi, > > I would definitely expect a FLIP on this topic before moving to > implementation. > > Best regards, > > Martijn > > On Fri, Feb 2, 2024 at 12:47 PM Xuyang wrote: > >> Hi, Prabhjot. >> >>

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-02 Thread Martijn Visser
Hi, I would definitely expect a FLIP on this topic before moving to implementation. Best regards, Martijn On Fri, Feb 2, 2024 at 12:47 PM Xuyang wrote: > Hi, Prabhjot. > > IIUC, the main reasons why the community has not previously considered > supporting join hints only in batch mode are as

Re: Obtaining Flink job lineage information for auditing

2024-02-02 Thread Feng Jin
hi, you can refer to the OpenLineage[1] implementation: configure a JobListener in Flink to get the Transformation information, then parse the sources and sinks to obtain the lineage. [1] https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/src/main/java/io/openlineage/flink/OpenLineageFlinkJobListener.java Best, Feng On Fri, Feb 2, 2024 at 6:36 PM 阿华田
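Registering such a listener is done on the execution environment; the following is a sketch against the Flink JobListener interface (the logging body is illustrative — a real lineage hook, as in OpenLineage, would inspect the job's transformations here):

```java
import javax.annotation.Nullable;
import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.JobListener;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LineageJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.registerJobListener(new JobListener() {
            @Override
            public void onJobSubmitted(@Nullable JobClient client,
                                       @Nullable Throwable throwable) {
                // Called when the job is submitted
                if (client != null) {
                    System.out.println("Submitted job " + client.getJobID());
                }
            }

            @Override
            public void onJobExecuted(@Nullable JobExecutionResult result,
                                      @Nullable Throwable throwable) {
                // Called when execution finishes or fails
            }
        });
        // ... build the pipeline, then env.execute("lineage-demo");
    }
}
```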

Re: In batch mode, how does StatefulSinkWriter store state so that failover or job restart avoids re-reading data from the beginning

2024-02-02 Thread jinzhuguang
"file:\\d:\\abc")); > > > 发件人: jinzhuguang > 发送时间: 2024-02-02 17:16 > 收件人: user-zh > 主题: Re: Batch模式下,StatefulSinkWriter如何存储状态,以保证在failover或者job restart的情况避免从头读取数据 > 你是在batch模式下手动开启了checkpoint吗 > >> 2024年2月2日 17:11,ha.fen...@aisino.com 写道: >> >> 今天正好测试了这

Re: Jobmanager restart after it has been requested to stop

2024-02-02 Thread Yang Wang
If you could find "Deregistering Flink Kubernetes cluster, clusterId" in the JobManager log, then it is not the expected behavior. Having the full logs of the JobManager pod from before it restarted would help a lot. Best, Yang On Fri, Feb 2, 2024 at 1:26 PM Liting Liu (litiliu) via user <

Re: In batch mode, how does StatefulSinkWriter store state so that failover or job restart avoids re-reading data from the beginning

2024-02-02 Thread tanjialiang
As I understand it, reading in streaming mode allows checkpoints (see each connector's checkpoint logic for details), while batch mode runs stage by stage: the result of a finished task is put on disk to wait for the next task to pull it, and if a task fails it re-pulls the previous task's result and reruns (I haven't looked at the implementation details in the source code — this is purely my guess; corrections from those in the know are welcome). Original message | From | ha.fen...@aisino.com | | Sent | Feb 2, 2024 17:21 | | To | user-zh | | Subject | Re: Re: In batch mode

Re: In batch mode, how does StatefulSinkWriter store state so that failover or job restart avoids re-reading data from the beginning

2024-02-02 Thread jinzhuguang
Did you manually enable checkpointing in batch mode? > On Feb 2, 2024 at 17:11, ha.fen...@aisino.com wrote: > > I happened to test this problem today: with checkpointing enabled, I read the contents of a file, stopped the program once checkpoints were recorded, and then restarted from the checkpoint — the reading did not start from the beginning, which shows that batch processing also saves state automatically. > > From: jinzhuguang > Sent: 2024-02-02 16:47 > To: user-zh > Subject:

RE: RE: Flink Kafka Sink + Schema Registry + Message Headers

2024-02-01 Thread Kirti Dhar Upadhyay K via user
Thanks Jiabao and Yaroslav for your quick responses. Regards, Kirti Dhar From: Yaroslav Tkachenko Sent: 01 February 2024 21:42 Cc: user@flink.apache.org Subject: Re: RE: Flink Kafka Sink + Schema Registry + Message Headers The schema registry support is provided

Re: Parallelism and Target TPS

2024-02-01 Thread Zhanghao Chen
Hi Patricia, Flink will create one Kafka consumer per parallel instance; however, you'll need some testing to measure the capability of a single task. Usually, one consumer can consume at a much higher rate than 1 record per second. Best, Zhanghao Chen From: patricia lee

Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-01 Thread Prabhjot Bharaj via user
Hi Feng, Thanks for your prompt response. If we were to solve this in Flink, my higher level viewpoint is: 1. First to implement Broadcast join in Flink Streaming SQL, that works across Table api (e.g. via a `left.join(right, , join_type="broadcast") 2. Then, support a Broadcast hint that would

Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-01 Thread Feng Jin
Hi Prabhjot, I think this is a reasonable scenario. If a regular join has a large table and a very small table, then without broadcasting it can easily cause data skew. We have encountered similar problems too. Currently, we can only make multiple copies of the small table

Re: RE: Flink Kafka Sink + Schema Registry + Message Headers

2024-02-01 Thread Yaroslav Tkachenko
Flink version? >> > >> > Also, any help on question 1 regarding Schema Registry? >> > >> > Regards, >> > Kirti Dhar >> > >> > -Original Message- >> > From: Jiabao Sun >> > Sent: 01 February 2024 1

Re: RE: Flink Kafka Sink + Schema Registry + Message Headers

2024-02-01 Thread Yaroslav Tkachenko
on question 1 regarding Schema Registry? > > > > Regards, > > Kirti Dhar > > > > -Original Message- > > From: Jiabao Sun > > Sent: 01 February 2024 13:29 > > To: user@flink.apache.org > > Subject: RE: Flink Kafka Sink +

RE: RE: Flink Kafka Sink + Schema Registry + Message Headers

2024-02-01 Thread Jiabao Sun
- > From: Jiabao Sun > Sent: 01 February 2024 13:29 > To: user@flink.apache.org > Subject: RE: Flink Kafka Sink + Schema Registry + Message Headers > > Hi Kirti, > > Kafka Sink supports sending messages with headers. > You should implement a HeaderProvider to

RE: Flink Kafka Sink + Schema Registry + Message Headers

2024-02-01 Thread Kirti Dhar Upadhyay K via user
: Jiabao Sun Sent: 01 February 2024 13:29 To: user@flink.apache.org Subject: RE: Flink Kafka Sink + Schema Registry + Message Headers Hi Kirti, Kafka Sink supports sending messages with headers. You should implement a HeaderProvider to extract headers from input element. KafkaSink sink

RE: Flink Kafka Sink + Schema Registry + Message Headers

2024-01-31 Thread Jiabao Sun
Hi Kirti, Kafka Sink supports sending messages with headers. You should implement a HeaderProvider to extract headers from input element. KafkaSink sink = KafkaSink.builder() .setBootstrapServers(brokers) .setRecordSerializer(KafkaRecordSerializationSchema.builder()
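The truncated builder snippet above can be fleshed out roughly as follows — a sketch against the flink-connector-kafka builder API, in which the topic name, broker argument, and header contents are invented, and the exact `HeaderProvider` signature should be checked against your connector version:

```java
import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.kafka.common.header.internals.RecordHeader;
import org.apache.kafka.common.header.internals.RecordHeaders;

public class KafkaSinkWithHeaders {
    public static KafkaSink<String> build(String brokers) {
        return KafkaSink.<String>builder()
                .setBootstrapServers(brokers)
                .setRecordSerializer(KafkaRecordSerializationSchema.<String>builder()
                        .setTopic("output-topic")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        // The HeaderProvider maps each element to Kafka headers
                        .setHeaderProvider(element -> new RecordHeaders(
                                new RecordHeader[] {
                                    new RecordHeader("source",
                                        element.getBytes(StandardCharsets.UTF_8))}))
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
                .build();
    }
}
```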

Re: Redis as a State Backend

2024-01-31 Thread David Anderson
When it comes to decoupling the state store from Flink, I suggest taking a look at FlinkNDB, which is an experimental state backend for Flink that puts the state into an external distributed database. There's a Flink Forward talk [1] and a master's thesis [2] available. [1]

Re: Redis as a State Backend

2024-01-31 Thread Chirag Dewan via user
Thanks Zakelly and Junrui. I was actually exploring RocksDB as a state backend and I thought maybe Redis could offer more features as a state backend. For example, state sharing between operators, geo-redundancy of state, partitioning, etc. I understand these are not native use cases for Flink, but

Re: Redis as a State Backend

2024-01-30 Thread Zakelly Lan
And I found some previous discussion, FYI: 1. https://issues.apache.org/jira/browse/FLINK-3035 2. https://www.mail-archive.com/dev@flink.apache.org/msg10666.html Hope this helps. Best, Zakelly On Tue, Jan 30, 2024 at 4:08 PM Zakelly Lan wrote: > Hi Chirag > > That's an interesting idea. IIUC,

Re: Redis as a State Backend

2024-01-30 Thread Zakelly Lan
Hi Chirag That's an interesting idea. IIUC, storing key-values can be simply implemented for Redis, but supporting checkpoint and recovery is relatively challenging. Flink's checkpoint should be consistent among all stateful operators at the same time. For an *embedded* and *file-based* key value

Re: Redis as a State Backend

2024-01-29 Thread Junrui Lee
Hi Chirag, Indeed, the possibility of using Redis as a state backend for Flink has been considered in the past. You can find a detailed discussion about this topic in the JIRA issue FLINK-3035[1] as well as in the comments section of this PR[2]. The outcome of these discussions was that Redis is

RE: Re: Request to provide example codes on ElasticsearchSinkFunction updaterequest

2024-01-29 Thread Jiabao Sun
Hi Fidea, When specifying an ID, the IndexRequest[1] performs a complete overwrite. If a partial update is needed, the UpdateRequest[2] can be used. @Override public void process( Tuple2 element, RuntimeContext ctx, RequestIndexer indexer) { UpdateRequest updateRequest = new
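Completing that truncated snippet, a partial update with upsert semantics might look like this — a sketch against the Elasticsearch 7 client types used by the deprecated ElasticsearchSinkFunction, with the index name and field names invented:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.elasticsearch.action.update.UpdateRequest;

public class UpsertSinkFunction
        implements ElasticsearchSinkFunction<Tuple2<String, String>> {
    @Override
    public void process(
            Tuple2<String, String> element, RuntimeContext ctx, RequestIndexer indexer) {
        Map<String, Object> json = new HashMap<>();
        json.put("value", element.f1);
        UpdateRequest request = new UpdateRequest("my-index", element.f0)
                .doc(json)          // fields to merge into the existing document
                .docAsUpsert(true); // insert the doc if the id does not exist yet
        indexer.add(request);
    }
}
```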

Re: Request to provide example codes on ElasticsearchSinkFunction updaterequest

2024-01-29 Thread Fidea Lidea
Hi Jiabao & Jiadong, Could you please share examples on how to "*update*" data using ElasticsearchSink? Thanks On Mon, Jan 29, 2024 at 9:07 PM Jiabao Sun wrote: > Hi Fidea, > > I found some examples in the Java documentation, and I hope they can be > helpful. > > private static class

RE: Request to provide example codes on ElasticsearchSinkFunction updaterequest

2024-01-29 Thread Jiabao Sun
Hi Fidea, I found some examples in the Java documentation, and I hope they can be helpful. private static class TestElasticSearchSinkFunction implements ElasticsearchSinkFunction> { public IndexRequest createIndexRequest(Tuple2 element) { Map json = new HashMap<>();

Re: Request to provide example codes on ElasticsearchSinkFunction updaterequest

2024-01-29 Thread jiadong.lu
Hi, Fidea. The ElasticsearchSinkFunction class has already been marked as deprecated[1]; please try to use the ElasticsearchSink class instead. Hope this helps. Best. Jiadong.Lu 1.

Re: Elasticsearch Sink 1.17.2 error message

2024-01-28 Thread archzi lu
Hi Tauseef, This error is because your Class com.hds.alta.pipeline.model.TopologyDTO cannot be serialized by ES xcontent util. The following solutions may fix it. 1. convert your TopologyDTO class data to a Map, and avoid using some custom Class that cannot be serialized by ES. or 2. make your

Re: Apache Flink lifecycle and Java 17 support

2024-01-28 Thread xiangyu feng
ed Java 17 in production, including ByteDance as mentioned by Xiangyu. > > Best, > Zhanghao Chen > -- > *From:* Deepti Sharma S via user > *Sent:* Friday, January 26, 2024 22:56 > *To:* xiangyu feng > *Cc:* user@flink.apache.org > *Subject:

Re: Apache Flink lifecycle and Java 17 support

2024-01-28 Thread Zhanghao Chen
a 17 in production, including ByteDance as mentioned by Xiangyu. Best, Zhanghao Chen From: Deepti Sharma S via user Sent: Friday, January 26, 2024 22:56 To: xiangyu feng Cc: user@flink.apache.org Subject: RE: Apache Flink lifecycle and Java 17 support Hello Xi

Re: DESCRIBE CATALOG not available?

2024-01-28 Thread Hang Ruan
Hi, Robin. I see that the `DESCRIBE CATALOG` SQL is not listed in the DESCRIBE document[1]; it is not available. Besides this, I checked the changes in Catalog.java from commits on May 9, 2019. I cannot find the method `explainCatalog` introduced by this FLIP. This FLIP is not finished yet.

Re: Long execution of SQL query to Kafka + Hive (q77 TPC-DS)

2024-01-26 Thread Вова Фролов
One of the MultiInput operators runs for 12 minutes. The screenshot shows all stages of the Flink job. [image: 3.png] Q77 query of the TPC-DS benchmark. All *_sales and *_returns tables (6 tables) are read from Kafka, and the remaining 3 tables (date_dim, web_page, store) from Hive. with ss as

RE: Apache Flink lifecycle and Java 17 support

2024-01-26 Thread Deepti Sharma S via user
is supported by Flink community with Flink 1.18? Can we use this combination in our commercial product release? Regards, Deepti Sharma From: xiangyu feng Sent: 24 January 2024 18:11 To: Deepti Sharma S Cc: user@flink.apache.org Subject: Re: Apache Flink lifecycle and Java 17 support Hi Deepti

Re: Reply: Operator data in the Flink UI stuck loading...

2024-01-25 Thread Xuyang
Hi, if you manually curl the endpoint of the problematic metric, does the returned data look normal? Did you find anything unusual in the JM log? -- Best! Xuyang On 2024-01-26 11:51:53, "阿华田" wrote: > I checked all those dimensions, all normal > > a15733178...@163.com > (signature customized by NetEase Mail Master) > > On Jan 23, 2024 at 21:57, Feng Jin wrote: > You can try the following to confirm the cause: > 1. Open the browser developer tools and check whether a request to some endpoint is stuck > 2. Check

Re: [ANNOUNCE] Apache Flink 1.18.1 released

2024-01-25 Thread Jing Ge via user
Hi folks, The bug has been fixed and PR at docker-library/official-images has been merged. The official images are available now. Best regards, Jing On Mon, Jan 22, 2024 at 11:39 AM Jing Ge wrote: > Hi folks, > > I am still working on the official images because of the issue >

RE: Elasticsearch Sink 1.17.2 error message

2024-01-25 Thread Jiabao Sun
Hi Tauseef, We cannot directly write POJO types into Elasticsearch. You can try serializing the TopologyDTO into a JSON string with a library like Jackson before writing it. public static void main(String[] args) throws IOException { try (RestHighLevelClient client = new RestHighLevelClient(
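The serialization step suggested here might look like the following — a sketch assuming jackson-databind and the ES 7 high-level client are on the classpath; the index name is invented, and in newer ES clients XContentType lives in a different package:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.common.xcontent.XContentType;

public class TopologyIndexer {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static IndexRequest toRequest(Object dto, String id) throws Exception {
        // Serialize the POJO to JSON first, so the ES xcontent util
        // never has to handle the custom class itself
        String json = MAPPER.writeValueAsString(dto);
        return new IndexRequest("topology-index")
                .id(id)
                .source(json, XContentType.JSON);
    }
}
```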

RE: Reply: RE: how to get flink accumulated sink record count

2024-01-25 Thread Jiabao Sun
ing the metric in Flink tasks. > > --Original message-- > From: "Jiabao Sun" Sent: Thursday, January 25, 2024, 3:11 PM > To: "user" Subject: RE: how to get flink accumulated sink record count > > I guess get

Re: Long execution of SQL query to Kafka + Hive (q77 TPC-DS)

2024-01-25 Thread Ron liu
Hi, Can you help to explain the q77 execution plan? And find which operator takes a long time in flink UI? Best Ron Вова Фролов 于2024年1月24日周三 09:09写道: > Hello, > > I am executing a heterogeneous SQL query (part of the data is in Hive > and part in Kafka. The query utilizes TPC-DS benchmark

Reply: RE: how to get flink accumulated sink record count

2024-01-25 Thread Enric Ott
Thanks, Jiabao, but what I mean is capturing the metric in Flink tasks. ---- Original message ---- From: "Jiabao Sun" https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/metrics/ [2]

RE: how to get flink accumulated sink record count

2024-01-24 Thread Jiabao Sun
Hi Enric, I guess getting the metrics[1] might be helpful for you. You can query the numRecordsOut metric by Metrics Reporter[2] or REST API[3]. Best, Jiabao [1] https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/metrics/ [2]
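As an illustration of the REST route mentioned above (endpoint paths per the Flink REST API; the host, `<job-id>`, and `<vertex-id>` are placeholders to fill in from your own cluster):

```shell
# List a job's vertices to find the sink's vertex id
curl "http://localhost:8081/jobs/<job-id>"

# Fetch numRecordsOut for that vertex, aggregated across its subtasks
curl "http://localhost:8081/jobs/<job-id>/vertices/<vertex-id>/subtasks/metrics?get=numRecordsOut"
```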

Re: Re: failed when job graph change

2024-01-24 Thread nick toker
Hi, I am adding a new sink that did not exist in the graph: 1. stop with a savepoint 2. run the new graph with the new sink operator (from the savepoint). In this case the job is stuck initializing forever because it can't complete a transaction (on the new operator's Kafka topic). I don't understand how

Re: Re: failed when job graph change

2024-01-24 Thread Feng Jin
Hi Nick, If you want to modify the sink operator, I think you can modify the uid of the operator to avoid restoring state that does not belong to it. Best, Feng On Thu, Jan 25, 2024 at 1:19 AM nick toker wrote: > hi > > i didn't find anything in the log > but i found that it happened
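Assigning an explicit uid as suggested looks roughly like this (a sketch; the uid string and operator name are invented):

```java
// Give the sink a stable, explicit uid so savepoint state is matched
// by uid rather than by an auto-generated hash; a fresh uid means no
// old state is restored into the new operator.
stream.sinkTo(newKafkaSink)
      .uid("kafka-sink-v2")
      .name("kafka-sink-v2");
```

When resuming, `flink run -s <savepoint> --allowNonRestoredState ...` additionally lets the job start even if some savepoint state no longer maps to any operator in the new graph.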

Re: Re: failed when job graph change

2024-01-24 Thread nick toker
hi, I didn't find anything in the log, but I found that it happened when I added a new sink operator; because I work with checkpoints, Flink can't finish the transaction (the new Kafka topic was not part of the transaction before I added the new sink operator), so I must cancel the job to

Re: Apache Flink lifecycle and Java 17 support

2024-01-24 Thread xiangyu feng
Hi Deepti, For the lifecycle of Flink 1.x version and more information about Flink 2.0, pls refer to Flink RoadMap[1] and this discussion thread[2]. Flink 1.18 currently has experimental support for Java17[3]. [1] https://flink.apache.org/what-is-flink/roadmap/ [2]

Re: Questions on implementing a real-time data warehouse

2024-01-23 Thread xiaohui zhang
For a real-time data warehouse, I suggest starting hands-on with one or two scenarios and putting them into real use. I have seen several projects whose initial goals were set too big — real-time warehouse, unified stream-batch processing, and data governance all planned in at once — and the projects sank into endless wrangling, with an architecture design like a castle in the air. In practice, don't lean too hard on the existing warehouse layering model: joining wide tables directly from the source systems into the DWS layer is enough for most needs. Downstream applications can then use an MPP engine for real-time aggregation and BI at the business layer. After a few such silos are built, you'll basically have a feel for how to build your own project's real-time warehouse

Re: Continuously growing checkpoint size with RocksDB incremental mode

2024-01-23 Thread yuanfeng hu
> On Jan 18, 2024 at 14:59, fufu wrote: > > The shard files on HDFS are much larger than chk-xxx. > > On 2024-01-18 14:49:14, "fufu" wrote: > > It's a DataStream job; the window operator itself has no TTL set, while the other operators do. In the Flink > UI I see the window operator's size growing continuously, by about 600~800 MB a day, without stopping. For example: the cp with ID 313 is almost 10 MB larger than the one with ID 304, and it will keep growing like this as the job runs. I'm looking at the cp files and the RocksDB files now ~ > On 2024-01-18

Re: Sending key with the event

2024-01-23 Thread Yaroslav Tkachenko
Hi Oscar, The only way you can define the Kafka message key is by providing a custom KafkaRecordSerializationSchema to your FlinkKafkaSink via the "setRecordSerializer" method. The KafkaRecordSerializationSchema.serialize method returns a ProducerRecord, so you can set things like the message key,
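A minimal custom schema along those lines — a sketch in which the topic name and key derivation are invented (here key and value are both the element itself):

```java
import java.nio.charset.StandardCharsets;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedSerializationSchema
        implements KafkaRecordSerializationSchema<String> {
    @Override
    public ProducerRecord<byte[], byte[]> serialize(
            String element, KafkaSinkContext context, Long timestamp) {
        byte[] key = element.getBytes(StandardCharsets.UTF_8);   // message key
        byte[] value = element.getBytes(StandardCharsets.UTF_8); // payload
        // The ProducerRecord carries topic, key, and value; headers and
        // partition could also be set via other constructors
        return new ProducerRecord<>("output-topic", key, value);
    }
}
```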

Re: Operator data in the Flink UI stuck loading...

2024-01-23 Thread Feng Jin
You can try the following to confirm the cause: 1. Open the browser developer tools and check whether a request to some endpoint is stuck 2. Check the JobManager's GC — is it doing frequent full GCs 3. Check the JobManager's log for missing resource files or disk problems that could make the web UI unreachable. Best, Feng On Tue, Jan 23, 2024 at 6:16 PM 阿华田 wrote: > >

Re: [ANNOUNCE] Apache Flink 1.18.1 released

2024-01-22 Thread Jing Ge
Hi folks, I am still working on the official images because of the issue https://issues.apache.org/jira/browse/FLINK-34165. Images under apache/flink are available. Best regards, Jing On Sun, Jan 21, 2024 at 11:06 PM Jing Ge wrote: > Thanks Leonard for the feedback! Also thanks @Jark Wu >

RE: Re: RE: Re: RE: How to build a custom full+incremental source connector on flink cdc 3.x without debezium?

2024-01-22 Thread Jiabao Sun
Hi, ResumeToken[1] can be considered globally unique[2]. Best, Jiabao [1] https://www.mongodb.com/docs/manual/changeStreams/#resume-tokens [2] https://img.alicdn.com/imgextra/i4/O1CN010e81SP1vkgoyL0nhd_!!66211-0-tps-2468-1360.jpg On 2024/01/22 09:36:42 "casel.chen" wrote: > > > >

Re: Why calling ListBucket for each file in a checkpoint

2024-01-21 Thread Zakelly Lan
Are you accessing the S3 API with the presto implementation? If so, you may read the code of `com.facebook.presto.hive.s3.PrestoS3FileSystem#create` and find that it checks the existence of the target path first, in which `getFileStatus` and `listPrefix` are called. There is no option for this. Best,

RE: Re: RE: How to build a custom full+incremental source connector on flink cdc 3.x without debezium?

2024-01-21 Thread Jiabao Sun
Hi, the Flink CDC MongoDB connector V1 is implemented on top of mongo-kafka, not on debezium. The Flink CDC MongoDB connector V2 is implemented on the incremental snapshot read framework and depends on neither mongo-kafka nor debezium. Best, Jiabao [1] https://github.com/mongodb/mongo-kafka On 2024/01/22 02:57:38 "casel.chen" wrote: > Flink CDC MongoDB

RE: How to build a custom full+incremental source connector on flink cdc 3.x without debezium?

2024-01-21 Thread Jiabao Sun
Hi, you can refer to the implementation of the Flink CDC MongoDB connector. Best, Jiabao On 2024/01/22 02:06:37 "casel.chen" wrote: > There is a data source not covered by debezium that needs full+incremental consumption through Flink SQL. I'm thinking of building on flink cdc > 3.x, but I found that most existing flink cdc source > connectors are built on the debezium library; only oceanbase and tidb are not, and those two still use the V1 source interface rather than the latest V2 incremental

Re: [ANNOUNCE] Apache Flink 1.18.1 released

2024-01-21 Thread Jing Ge via user
Thanks Leonard for the feedback! Also thanks @Jark Wu @Chesnay Schepler and each and everyone who worked closely with me for this release. We made it together! Best regards, Jing On Sun, Jan 21, 2024 at 9:25 AM Leonard Xu wrote: > Thanks Jing for driving the release, nice work! > > Thanks

Re: [ANNOUNCE] Apache Flink 1.18.1 released

2024-01-21 Thread Leonard Xu
Thanks Jing for driving the release, nice work! Thanks to all who were involved in this release! Best, Leonard > On Jan 20, 2024 at 12:01 AM, Jing Ge wrote: > > The Apache Flink community is very happy to announce the release of Apache > Flink 1.18.1, which is the first bugfix release for the Apache Flink 1.18 >

RE: Re: RE: Missing binlog file problem

2024-01-19 Thread Jiabao Sun
Hi, does the log contain anything about GTIDs? Run SHOW VARIABLES LIKE 'gtid_mode'; to confirm whether GTID is enabled. Best, Jiabao On 2024/01/19 09:36:38 wyk wrote: > Sorry, the specific error and code are as follows: > > Error part: > Caused by: java.lang.IllegalStateException: The connector is trying to read > binlog starting at >

RE: Re: Python flink statefun

2024-01-19 Thread Jiabao Sun
Hi Alex, I think that logic is in IngressWebServer[1] and EgressWebServer[2]. Best, Jiabao [1]

Re: Python flink statefun

2024-01-19 Thread Alexandre LANGUILLAT
Thanks Sun, I now use the 3.2 version and it works as described in the README tutorial! I don't see in the code where the port redirection is handled though, e.g. 8090 for PUT and 8091 for GET (they are in the module.yaml but I don't see where it's handled in Python). Bests, Alex On Fri, Jan 19, 2024 at

RE: Missing binlog file problem

2024-01-19 Thread Jiabao Sun
Hi, your image didn't come through — you can post an image-hosting link or paste the code directly. Best, Jiabao On 2024/01/19 09:16:55 wyk wrote: > > Hi all: > I have a missing-binlog-file problem and need your advice; details below: > > Description: > Scenario: our company's MySQL has two replicas: replica 1 and replica 2. > 1. Replica 1 needs to be decommissioned, so the jobs must migrate to replica 2. > 2. After taking a savepoint normally, I changed the connection info to replica 2 and started from the savepoint; it then reports that the binlog file does not exist, as shown in figure 1 below >

RE: Use different S3 access key for different S3 bucket

2024-01-18 Thread Qing Lim
Thank you for the tips, I will try these out From: Josh Mahonin Sent: 18 January 2024 21:07 To: Qing Lim Cc: Jun Qin ; User Subject: Re: Use different S3 access key for different S3 bucket Oops my syntax was a bit off there, as shown in the Hadoop docs, it looks like: fs.s3a.bucket.. Josh

Re: java.lang.UnsupportedOperationException: A serializer has already been registered for the state; re-registration is not allowed.

2024-01-18 Thread Zakelly Lan
t; from a state descriptor)? It seems you are creating a state in >>>> #processElement? >>>> >>>> >>>> Best, >>>> Zakelly >>>> >>>> On Thu, Jan 18, 2024 at 3:47

Re: Unsubscribe

2024-01-18 Thread Junrui Lee
Please send an email to user-unsubscr...@flink.apache.org and user-zh-unsubscr...@flink.apache.org if you want to unsubscribe from user@flink.apache.org and user...@flink.apache.org; you can refer to [1][2] for more details. Best, Junrui [1]

RE: Re: RE: RE: flink cdc dynamic table addition not taking effect

2024-01-18 Thread Jiabao Sun
Hi, the oracle cdc connector has already been ported onto the incremental snapshot read framework, so dynamic table addition can also be implemented there in a unified way. You can create an issue in the community, and direct contributions are welcome too. Best, Jiabao On 2024/01/19 04:46:21 "casel.chen" wrote: > I'd like to know why the oracle cdc connector does not support dynamically adding tables. Could I extend it myself? > > On 2024-01-19 11:53:49, "Jiabao Sun" wrote: >Hi, > >Oracle

Re: RE: RE: flink cdc dynamic table addition not taking effect

2024-01-18 Thread casel.chen
I'd like to know why the oracle cdc connector does not support dynamically adding tables. Could I extend it myself? On 2024-01-19 11:53:49, "Jiabao Sun" wrote: >Hi, > >The Oracle CDC connector[1] currently does not support dynamically adding tables. > >Best, >Jiabao > >[1] >https://ververica.github.io/flink-cdc-connectors/release-2.4/content/connectors/oracle-cdc.html > > >On 2024/01/19

RE: Unsubscribe

2024-01-18 Thread Jiabao Sun
Please send an email with any content to user-unsubscr...@flink.apache.org if you want to unsubscribe from the u...@flink.apache.org mailing list; you can refer to [1][2] for details on managing your subscription. Best, Jiabao [1]

RE: RE: flink cdc dynamic table addition not taking effect

2024-01-18 Thread Jiabao Sun
Hi, the Oracle CDC connector[1] currently does not support dynamically adding tables. Best, Jiabao [1] https://ververica.github.io/flink-cdc-connectors/release-2.4/content/connectors/oracle-cdc.html On 2024/01/19 03:37:41 Jiabao Sun wrote: > Hi, > > Please share the flink cdc version and which connector you are using. > If convenient, please also provide the logs. > Also, can the table regex match the newly added tables? > >

RE: flink cdc dynamic table addition not taking effect

2024-01-18 Thread Jiabao Sun
Hi, please share the flink cdc version and which connector you are using. If convenient, please also provide the logs. Also, can the table regex match the newly added tables? Best, Jiabao [1] https://ververica.github.io/flink-cdc-connectors/release-3.0/content/connectors/mysql-cdc%28ZH%29.html#id15 On 2024/01/19 03:27:22 王凯 wrote: > While using flink

RE: Python flink statefun

2024-01-18 Thread Jiabao Sun
Hi Alexandre, I couldn't find the image apache/flink-statefun-playground:3.3.0-1.0 on Docker Hub. You can temporarily use the release-3.2 version. Hi Martijn, did we miss pushing it to the Docker registry? Best, Jiabao [1] https://hub.docker.com/r/apache/flink-statefun-playground/tags On

Re: java.lang.UnsupportedOperationException: A serializer has already been registered for the state; re-registration is not allowed.

2024-01-18 Thread Konstantinos Karavitis
i...@gmail.com> wrote: >>> >>>> Have you ever met the following error when a flink application restarts >>>> and tries to restore the state from RocksDB? >>>> >>>> >>>> *Caused by: java.lang.UnsupportedOperationException: A ser

Re: java.lang.UnsupportedOperationException: A serializer has already been registered for the state; re-registration is not allowed.

2024-01-18 Thread Konstantinos Karavitis
gt;>> and tries to restore the state from RocksDB? >>> >>> >>> *Caused by: java.lang.UnsupportedOperationException: A serializer has >>> already been registered for the state; re-registration is not allowed. >>> at >>> org.apache.flink.runt

Re: Use different S3 access key for different S3 bucket

2024-01-18 Thread Josh Mahonin via user
Oops my syntax was a bit off there, as shown in the Hadoop docs, it looks like: fs.s3a.bucket.. Josh >
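Per-bucket credentials in Hadoop's S3A follow the pattern being corrected above. In flink-conf.yaml this would look roughly like the following — a sketch in which bucket names and key values are placeholders, with the per-bucket pattern taken from the Hadoop S3A documentation:

```yaml
# Default credentials, used for any bucket without an override
s3.access-key: DEFAULT_ACCESS_KEY
s3.secret-key: DEFAULT_SECRET_KEY

# Per-bucket override for a bucket named "other-bucket" (S3A only;
# fs.s3a.* keys are forwarded to the Hadoop S3 filesystem)
fs.s3a.bucket.other-bucket.access.key: OTHER_ACCESS_KEY
fs.s3a.bucket.other-bucket.secret.key: OTHER_SECRET_KEY
```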

Re: Use different S3 access key for different S3 bucket

2024-01-18 Thread Josh Mahonin via user
oyment/filesystems/s3/#configure-access-credentials > > > > Thanks > > > > *From:* Jun Qin > *Sent:* 18 January 2024 10:51 > *To:* User ; Qing Lim > *Subject:* Re: Use different S3 access key for different S3 bucket > > > > Hi Qing > > The S3 crede

Re: Flink autoscaler scaling report

2024-01-18 Thread Yu Chen
Hi Yang, you can run `StandaloneAutoscalerEntrypoint`[1], and the scaling report will be printed to the log (info level) by LoggingEventHandler[2]. [1] flink-kubernetes-operator/flink-autoscaler-standalone/src/main/java/org/apache/flink/autoscaler/standalone/StandaloneAutoscalerEntrypoint.java at main ·

RE: Use different S3 access key for different S3 bucket

2024-01-18 Thread Qing Lim
://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/s3/#configure-access-credentials Thanks From: Jun Qin Sent: 18 January 2024 10:51 To: User ; Qing Lim Subject: Re: Use different S3 access key for different S3 bucket Hi Qing The S3 credentials are associated with Flink SQL tables. I

Re: Use different S3 access key for different S3 bucket

2024-01-18 Thread Jun Qin
Hi Qing The S3 credentials are associated with Flink SQL tables. I assume you are talking about processing/joining from two different tables, backed up by two different S3 buckets. If so, you can provide different credentials for different tables, then use the two tables in your pipeline. Jun

RE: Flink Slow Execution

2024-01-18 Thread Qing Lim
multiple batch job throughout the day, I just initialize the cluster once at beginning of process and reuse it Best -Original Message- From: Jiabao Sun Sent: 18 January 2024 02:46 To: user@flink.apache.org Subject: RE: Flink Slow Execution Hi Dulce, MiniCluster is generally used

Re: Re: Re: Continuously growing checkpoint size with RocksDB incremental mode

2024-01-17 Thread Zakelly Lan
The image didn't come through; could you simply copy the text information instead? Also, does your ProcessWindowFunction access state, and if it does, did you implement the clear method? On Thu, Jan 18, 2024 at 3:01 PM fufu wrote: > The shard files on HDFS are much larger than chk-xxx. > > On 2024-01-18 14:49:14, "fufu" wrote: > It's a DataStream job; the window operator itself has no TTL set, while the other operators do. In the Flink
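The clear() hook asked about above is where per-window state created inside a ProcessWindowFunction should be released; a minimal sketch (state name and types are invented):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class CountingWindowFunction
        extends ProcessWindowFunction<Long, Long, String, TimeWindow> {
    private final ValueStateDescriptor<Long> desc =
            new ValueStateDescriptor<>("window-count", Long.class);

    @Override
    public void process(String key, Context ctx, Iterable<Long> elements,
                        Collector<Long> out) throws Exception {
        ValueState<Long> count = ctx.windowState().getState(desc);
        long n = count.value() == null ? 0L : count.value();
        for (Long ignored : elements) {
            n++;
        }
        count.update(n);
        out.collect(n);
    }

    @Override
    public void clear(Context ctx) {
        // Without this, per-window state lingers after the window fires,
        // and incremental checkpoints keep accumulating it.
        ctx.windowState().getState(desc).clear();
    }
}
```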

Re: java.lang.UnsupportedOperationException: A serializer has already been registered for the state; re-registration is not allowed.

2024-01-17 Thread Zakelly Lan
rror when a flink application restarts >> and tries to restore the state from RocksDB? >> >> >> *Caused by: java.lang.UnsupportedOperationException: A serializer has >> already been registered for the state; re-registratio

Re: Continuously growing checkpoint size with RocksDB incremental mode

2024-01-17 Thread Zakelly Lan
Hello, could you provide some details? For example: it's a DataStream job, right? Is state TTL set? Is the gradual growth observed through checkpoint monitoring, and what order of magnitude is the total size? Which files are largest among the cp files or in the local RocksDB directory? On Wed, Jan 17, 2024 at 4:09 PM fufu wrote: > >
