Re: Unsubscribe

2024-02-21 Thread Zhanghao Chen
Please send an email with any content to user-zh-unsubscr...@flink.apache.org to unsubscribe from the user-zh@flink.apache.org mailing list. Best, Zhanghao Chen From: 曹明勤 Sent: Thursday, February 22, 2024 9:42 To: user-zh@flink.apache.org Subject: Unsubscribe Unsubscribe

Unsubscribe

2024-02-21 Thread 曹明勤
Unsubscribe

Using Custom JSON Formatting with Flink Operator

2024-02-21 Thread Rion Williams
Hey Flinkers, Recently I’ve been in the process of migrating a series of older Flink jobs to use the official operator and have run into a snag on the logging front. I’ve attempted to use the following configuration for the job: ``` logConfiguration: log4j-console.properties: |+
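For readers hitting the same snag, a hedged sketch of how such a spec might continue to emit JSON logs. This is a hypothetical continuation, not the poster's elided config; the JsonTemplateLayout settings assume log4j-layout-template-json is on the image's classpath:

```yaml
# Hypothetical sketch (assumes log4j-layout-template-json is available):
logConfiguration:
  log4j-console.properties: |+
    rootLogger.level = INFO
    rootLogger.appenderRef.console.ref = ConsoleAppender
    appender.console.name = ConsoleAppender
    appender.console.type = CONSOLE
    appender.console.layout.type = JsonTemplateLayout
    appender.console.layout.eventTemplateUri = classpath:EcsLayout.json
```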

Temporal join on rolling aggregate

2024-02-21 Thread Sébastien Chevalley
Hi, I have been trying to write a temporal join in SQL on a rolling aggregate view. However, it does not work and throws: org.apache.flink.table.api.ValidationException: Event-Time Temporal Table Join requires both primary key and row time attribute in versioned table, but no row time
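The requirement behind that error: the versioned (build) side of an event-time temporal join must declare both a primary key and a watermark-bearing row time attribute, which the output of a plain rolling aggregate does not carry. A minimal sketch of a versioned table that satisfies both; table names, connector options, and the assumed `orders` probe table are illustrative, not from the original post:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TemporalJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Versioned side: both a PRIMARY KEY and a row time attribute are required.
        tableEnv.executeSql(
            "CREATE TABLE rates (" +
            "  currency STRING," +
            "  rate DECIMAL(10, 2)," +
            "  update_time TIMESTAMP(3)," +
            "  WATERMARK FOR update_time AS update_time - INTERVAL '5' SECOND," + // row time
            "  PRIMARY KEY (currency) NOT ENFORCED" +                             // primary key
            ") WITH ('connector' = 'upsert-kafka', 'topic' = 'rates'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'key.format' = 'json', 'value.format' = 'json')");

        // The join references the probe side's row time (orders is assumed to exist
        // with an order_time row time attribute):
        tableEnv.executeSql(
            "SELECT o.order_id, o.amount * r.rate " +
            "FROM orders AS o " +
            "JOIN rates FOR SYSTEM_TIME AS OF o.order_time AS r " +
            "ON o.currency = r.currency");
    }
}
```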

Flink Scala Positions in India or USA !

2024-02-21 Thread sri hari kali charan Tummala
Hi Folks, I am currently seeking full-time positions in Flink Scala in India or the USA (non-consulting), specifically at the Principal or Staff level. I require an H-1B transfer and assistance with relocation from India; my I-140 is approved. Thanks & Regards Sri

Flink - OpenSearch connector query

2024-02-21 Thread Praveen Chandna via user
Hello As per the OpenSearch connector documentation, OpensearchEmitter can be used to perform requests of different types i.e., IndexRequest, DeleteRequest, UpdateRequest etc.
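For context, a sketch of how an emitter is wired into the sink builder, following the connector documentation; the element type MyEvent and its getId()/asMap() accessors are hypothetical placeholders:

```java
import org.apache.flink.connector.opensearch.sink.OpensearchSink;
import org.apache.flink.connector.opensearch.sink.OpensearchSinkBuilder;
import org.apache.http.HttpHost;
import org.opensearch.client.Requests;

// MyEvent, getId() and asMap() are hypothetical placeholders.
OpensearchSink<MyEvent> sink = new OpensearchSinkBuilder<MyEvent>()
        .setHosts(new HttpHost("localhost", 9200, "http"))
        .setEmitter((element, context, indexer) ->
                // The emitter hands the RequestIndexer an action request:
                // an IndexRequest here, but DeleteRequest/UpdateRequest can
                // be added the same way.
                indexer.add(Requests.indexRequest()
                        .index("my-index")
                        .id(element.getId())
                        .source(element.asMap())))
        .build();
```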

Re: Preparing keyed state before snapshot

2024-02-21 Thread Lorenzo Nicora
Thanks Thias and Zakelly, I probably muddied the waters by saying that my use case was similar to kvCache. What I was calling "non-serializable state" is actually a Random Cut Forest ML model that cannot be serialized by itself, but you can extract a serializable state. That is serializable, but

RE: Preparing keyed state before snapshot

2024-02-21 Thread Schwalbe Matthias
Good morning all, Let me loop myself in … 1. Another even more convenient way to enable caching is to actually configure/assign RocksDB to use more off-heap memory for its cache; you might also consider enabling bloom filters (it all depends on how large your key-space is
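As a concrete reference for that suggestion, the relevant flink-conf.yaml keys are sketched below; these are documented Flink options, but the values are illustrative and sizing depends on the workload:

```yaml
# Give RocksDB a larger share of managed (off-heap) memory for its block
# cache and memtables, and enable bloom filters for point lookups.
taskmanager.memory.managed.fraction: 0.6    # default is 0.4
state.backend.rocksdb.memory.managed: true  # size RocksDB from managed memory
state.backend.rocksdb.use-bloom-filter: true
```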

How to measure end-to-end latency for a Flink SQL job

2024-02-20 Thread casel.chen
A Flink SQL job consumes canal-json messages (captured from MySQL) from Kafka, applies complex processing, and writes to Doris. How can I measure the end-to-end latency of the records in the Doris table? The MySQL table has an update_time field that records when the business record was updated. Doris can add an ingest_time column to the table schema, so on the Doris side the end-to-end latency can be computed as ingest_time - update_time, but this only works for offline analysis. Is there a way to measure it in real time, for real-time monitoring?

Re:Re:Re: How can a custom sink connector in Flink SQL obtain the event time and watermark defined in the source table?

2024-02-20 Thread casel.chen
Thanks! So if I want to compute sink latency, is it enough to use context.currentProcessingTime() - context.timestamp()? I see that SinkWriter.Context in the new Sink V2 no longer has a currentProcessingTime() method; can I use System.currentTimeMillis() - context.timestamp() directly to get the sink latency? On 2024-02-21 09:41:37, "Xuyang" wrote: >Hi, chen. >You can try, in the sink
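For the Sink V2 question, a sketch of the computation the thread converges on; the histogram metric is a hypothetical addition for illustration, and wall-clock time stands in for the removed currentProcessingTime():

```java
import java.io.IOException;
import org.apache.flink.api.connector.sink2.SinkWriter;
import org.apache.flink.metrics.Histogram;
import org.apache.flink.table.data.RowData;

// Sketch of a Sink V2 writer that measures per-record sink latency.
class LatencyMeasuringWriter implements SinkWriter<RowData> {
    private Histogram latencyHistogram; // hypothetical metric, registered via the sink's metric group

    @Override
    public void write(RowData element, Context context) {
        Long eventTime = context.timestamp(); // null when no event time is attached
        if (eventTime != null) {
            long sinkLatencyMs = System.currentTimeMillis() - eventTime;
            latencyHistogram.update(sinkLatencyMs);
        }
        // ... actual write logic ...
    }

    @Override
    public void flush(boolean endOfInput) {}

    @Override
    public void close() throws IOException {}
}
```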

Re: Preparing keyed state before snapshot

2024-02-20 Thread Zakelly Lan
Hi Lorenzo, I think the most convenient way is to modify the code of the state backend, adding a k-v cache as you want. Otherwise IIUC, there's no public interface to get keyContext. But well, you may try something hacky. You may use the passed-in `Context` instance in processElement, and

[ANNOUNCE] Apache Kyuubi 1.8.1 is available

2024-02-20 Thread Cheng Pan
Hi all, The Apache Kyuubi community is pleased to announce that Apache Kyuubi 1.8.1 has been released! Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses. Kyuubi provides a pure SQL gateway through Thrift JDBC/ODBC and RESTful

Entering a busy loop when adding a new sink to the graph + checkpoint enabled

2024-02-20 Thread Daniel Peled
Hello Guys, Can someone please assist us regarding the following issue? We have noticed that when we add a *new kafka sink* operator to the graph, *and start from the last save point*, the operator is 100% busy for several minutes and *even 1/2-1 hour*!!! The problematic code seems to be the

Re:Re: How can a custom sink connector in Flink SQL obtain the event time and watermark defined in the source table?

2024-02-20 Thread Xuyang
Hi, chen. You can try the following inside the sink function's invoke method:

@Override
public void invoke(RowData row, Context context) throws Exception {
    context.currentProcessingTime();
    context.currentWatermark();
    ...
}

-- Best! Xuyang On 2024-02-20 19:38:44, "Feng Jin"

Re: Support for ConfigMap for Runtime Arguments in Flink Kubernetes Operator

2024-02-20 Thread Surendra Singh Lilhore
Hi Arjun, Yes, direct support for external configuration files within Flink ConfigMaps is somewhat restricted. The current method involves simply copying two local files from the operator. Please check: FlinkConfMountDecorator#getLocalLogConfFiles()

Re: Preparing keyed state before snapshot

2024-02-20 Thread Lorenzo Nicora
Thanks Zakelly, I'd need to do something similar, with a map containing my non-serializable "state", similar to the kvCache in FastTop1Function. But I am not sure I understand how I can set the keyed state for a specific key in snapshotState(). FastTop1Function seems to rely on keyContext set

Help in designing the Flink usecase

2024-02-20 Thread neha goyal
Classification: External Hi, I have a use case involving calculating the lifetime order count of a customer in real-time. To reduce the memory footprint, I plan to run a batch job on stored data every morning (let's say at 5 am) to calculate the total order count up to that moment. Additionally,

Re: How can a custom sink connector in Flink SQL obtain the event time and watermark defined in the source table?

2024-02-20 Thread Feng Jin
My understanding is that it should not be obtained from the rowData; you can get the watermark and event time through the Context. Best, Feng On Tue, Feb 20, 2024 at 4:35 PM casel.chen wrote: > How can a custom sink connector in Flink SQL obtain the event time and watermark defined in the source table? > > > public class XxxSinkFunction extends RichSinkFunction implements > CheckpointedFunction,

Community Over Code Asia 2024 Travel Assistance Applications now open!

2024-02-20 Thread Gavin McDonald
Hello to all users, contributors and Committers! The Travel Assistance Committee (TAC) are pleased to announce that travel assistance applications for Community over Code Asia 2024 are now open! We will be supporting Community over Code Asia, Hangzhou, China July 26th - 28th, 2024. TAC exists

Unsubscribe

2024-02-20 Thread 任香帅
Unsubscribe

How can a custom sink connector in Flink SQL obtain the event time and watermark defined in the source table?

2024-02-20 Thread casel.chen
How can a custom sink connector in Flink SQL obtain the event time and watermark defined in the source table?

public class XxxSinkFunction extends RichSinkFunction implements CheckpointedFunction, CheckpointListener {
    @Override
    public synchronized void invoke(RowData rowData, Context context) throws IOException {

Re: CompletableFuture in a flat map operator

2024-02-19 Thread Lasse Nedergaard
Hi In my case I have to query an external system. The system returns n rows per page and I have to call the system until there is no more data. I could use AsyncFunction if I buffered all records and output them at the end, but I want to "stream" and don't have enough memory to hold all the data.
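What Lasse describes, sketched as a flatMap that emits page by page instead of buffering; PagedClient, Page, Request, and Record are hypothetical stand-ins for the external system's API:

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class PagedFetchFunction extends RichFlatMapFunction<Request, Record> {
    private transient PagedClient client; // hypothetical blocking client

    @Override
    public void open(Configuration parameters) {
        client = new PagedClient();
    }

    @Override
    public void flatMap(Request request, Collector<Record> out) {
        String cursor = null;
        do {
            Page page = client.fetch(request, cursor); // one page per call
            page.records().forEach(out::collect);      // emit immediately, nothing buffered
            cursor = page.nextCursor();                // null when no more data
        } while (cursor != null);
    }
}
```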

New Relic alerts for Flink job submission failure and recovery

2024-02-19 Thread elakiya udhayanan
Hi Team, I would like to know the possibilities of configuring the new relic alerts for a Flink job whenever the job is submitted, gets failed and recovers from the failure. In our case, we have configured the Flink environment as a Kubernetes pod running on an EKS cluster and the application

Re: CompletableFuture in a flat map operator

2024-02-19 Thread Ken Krugler
Is there some reason why you can’t use an AsyncFunction? https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/async/AsyncFunction.html Note that when dealing with event time and exactly once, an AsyncFunction provides required support for
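For readers unfamiliar with the interface, a minimal sketch of the AsyncFunction pattern Ken points to; EnrichmentClient and its query() method are hypothetical, assumed to return a CompletableFuture:

```java
import java.util.Collections;
import java.util.concurrent.TimeUnit;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

class AsyncLookup extends RichAsyncFunction<String, String> {
    private transient EnrichmentClient client; // hypothetical non-blocking client

    @Override
    public void open(Configuration parameters) {
        client = new EnrichmentClient();
    }

    @Override
    public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
        client.query(key) // assumed to return CompletableFuture<String>
              .whenComplete((result, error) -> {
                  if (error != null) {
                      resultFuture.completeExceptionally(error);
                  } else {
                      resultFuture.complete(Collections.singleton(result));
                  }
              });
    }
}

// Wiring (given a DataStream<String> input): timeout and capacity bound
// the number of in-flight requests.
DataStream<String> enriched =
        AsyncDataStream.unorderedWait(input, new AsyncLookup(), 30, TimeUnit.SECONDS, 100);
```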

CompletableFuture in a flat map operator

2024-02-19 Thread Lasse Nedergaard
Hi. I have a case where I would like to collect objects from a CompletableFuture in a flatMap function. I run into a problem where I get an exception regarding a buffer pool that doesn't exist when I collect the objects after some time. I can see that if, for testing, I don't return from

Re: Impact of RocksDB backend on the Java heap

2024-02-19 Thread Alexis Sarda-Espinosa
Hi Zakelly, Yeah that makes sense to me, I was just curious about whether reading could be a bottleneck or not, but I imagine user-specific logic would be better than a generic cache from Flink that might have a low hit rate. Thanks again, Alexis. On Mon, 19 Feb 2024, 07:29 Zakelly Lan, wrote:

Support for ConfigMap for Runtime Arguments in Flink Kubernetes Operator

2024-02-19 Thread arjun s
Hi team, I am currently in the process of deploying Flink on Kubernetes using the Flink Kubernetes Operator and have encountered a scenario where I need to pass runtime arguments to my Flink application from a properties file. Given the dynamic nature of Kubernetes environments and the need for

Unsubscribe

2024-02-19 Thread 曹明勤
Unsubscribe

Re: Impact of RocksDB backend on the Java heap

2024-02-18 Thread Zakelly Lan
Hi Alexis, Assuming the bulk load for a batch of sequential keys performs better than accessing them one by one, the main question is whether we really need to access all the keys that were bulk-loaded into the cache beforehand. In other words, the cache hit rate is the key issue. If the rate is high, even

Re:Re: sink upsert materializer in SQL job

2024-02-18 Thread Xuyang
Hi, Marek. Sorry for the late reply because of the Spring Festival in China. When the upsert keys are empty (they cannot be deduced) or differ from the pk in the sink, Flink will generate the upsert materializer when `table.exec.sink.upsert-materialize = FORCE`. You can see the code here[1].

Re: Impact of RocksDB backend on the Java heap

2024-02-18 Thread Alexis Sarda-Espinosa
Hi Zakelly, thanks for the information, that's interesting. Would you say that reading a subset from RocksDB is fast enough to be pretty much negligible, or could it be a bottleneck if the state of each key is "large"? Again assuming the number of distinct partition keys is large. Regards,

Re: Impact of RocksDB backend on the Java heap

2024-02-17 Thread Zakelly Lan
Hi Alexis, Flink does need some heap memory to bridge requests to RocksDB and gather the results. In most cases, the memory is discarded immediately (eventually collected by GC). In the case of timers, Flink does cache a limited subset of key-values on the heap to improve performance. In general you don't

Re: Preparing keyed state before snapshot

2024-02-17 Thread Zakelly Lan
Hi Lorenzo, It is not recommended to do this with keyed state. However, there is an example in the Flink code (FastTop1Function#snapshotState) [1] of setting keys during snapshotState(). Hope this helps. [1]
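For readers following the thread, the shape of that example, paraphrased as a sketch rather than a verbatim copy: keyContext is Flink's internal KeyContext handle injected into the function by the operator, and kvCache/dataState stand for the function's in-memory cache and keyed state, so treat this as internal, non-public API:

```java
// Paraphrased sketch of the FastTop1Function#snapshotState pattern: flush the
// in-memory cache into keyed state by switching the current key per entry.
@Override
public void snapshotState(FunctionSnapshotContext context) throws Exception {
    for (Map.Entry<RowData, RowData> entry : kvCache.entrySet()) {
        keyContext.setCurrentKey(entry.getKey()); // scope keyed state to this key
        dataState.update(entry.getValue());       // persist the cached value for the key
    }
}
```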

Re: Task Manager getting killed while executing sql queries.

2024-02-16 Thread Zhanghao Chen
Hi Kanchi, Could you provide more information on it? Like at what stage this log prints (job recovering, running, etc.), and any more detailed job logs or stacktraces. Best, Zhanghao Chen From: Kanchi Masalia via user Sent: Friday, February 16, 2024 4:07 To: Neha

Re: Flink use case feedback request

2024-02-16 Thread Zhanghao Chen
Hi Brent, Sounds like a good plan to start with. Application mode gives the best isolation level, but can be costly for large number of small jobs as at least 2 containers are required (1 JobManager and 1 TaskManager) for each job. If your jobs are mostly small in size (1 or 2 TM), you might

Re: The fault tolerance and recovery mechanism in batch mode within Apache Flink.

2024-02-16 Thread David Anderson
With streaming execution, the entire pipeline is always running, which is necessary so that results can be continuously produced. But with batch execution, the job graph can be segmented into separate pipelined stages that can be executed sequentially, each running to completion before the next
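For reference, batch execution for the DataStream API is enabled per job; a minimal sketch:

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BatchModeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // In BATCH mode the scheduler can run the job graph stage by stage and,
        // on failure, restart from persisted intermediate results instead of
        // re-running the whole pipeline.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        // ... define sources/transformations/sinks, then env.execute();
    }
}
```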

Flink use case feedback request

2024-02-16 Thread Brent
Hey everyone, I've been looking at Flink to handle a fairly complex use case and was hoping for some feedback on if the approach I'm thinking about with Flink seems reasonable. When researching what people build on Flink, it seems like a lot of the focus tends to be on running fewer

Re: Preparing keyed state before snapshot

2024-02-16 Thread Lorenzo Nicora
Hi Thias I considered CheckpointedFunction. In snapshotState() I would have to update the state of each key, extracting the in-memory "state" of each key and putting it in Flink state with state.update(...). This must happen per key, but snapshotState() has no visibility of the keys. And I have no

[Meta question] Sharing blog posts

2024-02-16 Thread Robin Moffatt via user
Hi, I have a netiquette question - is it ok to share blog posts here that are specific to Apache Flink and not vendor-focussed? thanks, Robin.

The fault tolerance and recovery mechanism in batch mode within Apache Flink.

2024-02-16 Thread Вова Фролов
Hi everyone, I am currently exploring the fault tolerance and recovery mechanism in batch mode within Apache Flink. If I terminate the task manager process while the job is running, the job restarts from the point of failure. However, at some point, the job restarts from the very beginning. The

RE: Preparing keyed state before snapshot

2024-02-15 Thread Schwalbe Matthias
Good morning Lorenzo, You may want to implement org.apache.flink.streaming.api.checkpoint.CheckpointedFunction interface in your KeyedProcessFunction. Btw. By the time initializeState(…) is called, the state backend is fully initialized and can be read and written to (which is not the case for

Re: Task Manager getting killed while executing sql queries.

2024-02-15 Thread Asimansu Bera
Hello Kanchi, It's recommended to submit a separate request or issue for the problem you're encountering, as the data pipeline is distinct from the one Neha raised. This will help ensure that each issue can be addressed individually and efficiently. Hello Neha, Not sure about the issue you are

Re: Impact of RocksDB backend on the Java heap

2024-02-15 Thread Asimansu Bera
Hello Alexis, I don't think data in RocksDB resides in JVM even with function calls. For more details, check the link below: https://github.com/facebook/rocksdb/wiki/RocksDB-Overview#3-high-level-architecture RocksDB has three main components - memtable, sstfile and WAL(not used in Flink as

Re: Task Manager getting killed while executing sql queries.

2024-02-15 Thread Kanchi Masalia via user
Hi! We just encountered a similar issue. This is usually caused by: 1) Akka failed sending the message silently, due to problems like oversized payload or serialization failures. In that case, you should find detailed error information in the logs. 2) The recipient needs more time for

Re: Impact of RocksDB backend on the Java heap

2024-02-15 Thread Alexis Sarda-Espinosa
Hi Asimansu The memory RocksDB manages is outside the JVM, yes, but the mentioned subsets must be bridged to the JVM somehow so that the data can be exposed to the functions running inside Flink, no? Regards, Alexis. On Thu, 15 Feb 2024, 14:06 Asimansu Bera, wrote: > Hello Alexis, > >

Preparing keyed state before snapshot

2024-02-15 Thread Lorenzo Nicora
Hello everyone, I have a convoluted problem. I am implementing a KeyedProcessFunction that keeps some non-serializable "state" in memory, in a transient Map (key = stream key, value = the non-serializable "state"). I can extract a serializable representation to put in Flink state, and I can
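One way to sidestep the snapshot-time problem entirely, sketched under the assumption that extracting the serializable representation per element is affordable: write it through to keyed state inside processElement(), where the key scope is already set. Event, Output, and RcfModel (with its create/fromBytes/toBytes/update/score methods) are hypothetical stand-ins for the poster's non-serializable state:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ModelFunction extends KeyedProcessFunction<String, Event, Output> {
    private transient Map<String, RcfModel> models;  // non-serializable per-key "state"
    private transient ValueState<byte[]> modelState; // serializable representation

    @Override
    public void open(Configuration parameters) {
        models = new HashMap<>();
        modelState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("model", byte[].class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Output> out) throws Exception {
        RcfModel model = models.get(ctx.getCurrentKey());
        if (model == null) { // restore from Flink state after recovery, or create fresh
            byte[] snapshot = modelState.value();
            model = snapshot == null ? RcfModel.create() : RcfModel.fromBytes(snapshot);
            models.put(ctx.getCurrentKey(), model);
        }
        model.update(event);
        modelState.update(model.toBytes()); // write-through: state is snapshot-ready at all times
        out.collect(model.score(event));
    }
}
```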

Re: Impact of RocksDB backend on the Java heap

2024-02-15 Thread Asimansu Bera
Hello Alexis, RocksDB resides off-heap and outside of JVM. The small subset of data ends up on the off-heap in the memory. For more details, check the following link: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_setup_tm/#managed-memory I hope this

Impact of RocksDB backend on the Java heap

2024-02-15 Thread Alexis Sarda-Espinosa
Hello, Most info regarding RocksDB memory for Flink focuses on what's needed independently of the JVM (although the Flink process configures its limits and so on). I'm wondering if there are additional special considerations with regards to the JVM heap in the following scenario. Assuming a key

Re: Redis as a State Backend

2024-02-14 Thread David Morávek
Here is my "little harsh/straightforward feedback", but it's based on fact and real-world experience with using Redis since ~2012. Redis is not a database, period. The best description of what Redis is is something along the lines of "in-memory - text only (base64 ftw) - data structures on top of

RE: Stream enrichment with ingest mode

2024-02-14 Thread LINZ, Arnaud
Hello, You’re right, one of our main use cases consists of adding missing fields, stored in a “small” reference table that is periodically refreshed, to a stream. Using a broadcast stream and a Flink join was not the choice we made, because we didn’t want to add tricky watermarks and hold one stream (it

Stream enrichment with ingest mode

2024-02-13 Thread Lars Skjærven
Dear all, A reoccurring challenge we have with stream enrichment in Flink is a robust mechanism to estimate that all messages of the source(s) have been consumed/processed before output is collected. A simple example is two sources of catalogue metadata: - source A delivers products, - source B

Re: [EXTERNAL]Re: Flink Kubernetes Operator - Deadlock when Cluster Cleanup Fails

2024-02-13 Thread Niklas Wilcke
Hi Mate, thanks for creating the issue and pointing it out. I think the issue you created is a bit more specific than my whole point. It rather focuses on the taskmanagers, which is of course fine. From my point of view the following two things are the low hanging fruits: 1. Improving the

Re: Flink Kubernetes Operator - Deadlock when Cluster Cleanup Fails

2024-02-13 Thread Mate Czagany
Hi, I have opened a JIRA [1] as I had the same error (AlreadyExists) last week and I could pinpoint the problem to the TaskManagers being still alive when creating the new Deployment. In native mode we only check for the JobManagers when we wait for the cluster to shut down in contrast to

Re: Continuous transfer of data from a partitioned table

2024-02-13 Thread Giannis Polyzos
You can check the Oracle CDC connector, which provides that https://ververica.github.io/flink-cdc-connectors/master/content/connectors/oracle-cdc.html Best, G. On Tue, Feb 13, 2024 at 3:25 PM К В wrote: > Hello! > > We need to read data from an Oracle database table in order to pass it to >

Continuous transfer of data from a partitioned table

2024-02-13 Thread К В
Hello! We need to read data from an Oracle database table in order to pass it to Kafka. Data is inserted in the table periodically. The table has multiple partitions. Data should be processed parallel, each task should consume one partition in the database. Can this be done using a Flink task?

Re: [EXTERNAL]Re: Flink Kubernetes Operator - Deadlock when Cluster Cleanup Fails

2024-02-13 Thread Niklas Wilcke
Hi Gyula, thanks for the advice. I requested a Jira account and will try to open a ticket as soon as I get access. Cheers, Niklas > On 13. Feb 2024, at 09:13, Gyula Fóra wrote: > > Hi Niklas! > > The best way to report the issue would be to open a JIRA ticket with the same > detailed

Re: Flink Kubernetes Operator - Deadlock when Cluster Cleanup Fails

2024-02-13 Thread Gyula Fóra
Hi Niklas! The best way to report the issue would be to open a JIRA ticket with the same detailed information. Otherwise I think your observations are correct and this is indeed a frequent problem that comes up, it would be good to improve on it. In addition to improving logging we could also

Re: Request for sample codes for Dockerizing Java application

2024-02-13 Thread Marco Villalobos
Hi Nida, I request that you read https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/ in order to learn how to Dockerize your Flink job. You're Welcome & Regard Marco A. Villalobos > On Feb 13, 2024, at 12:00 AM, Fidea Lidea wrote: > > Hi

Request for sample codes for Dockerizing Java application

2024-02-13 Thread Fidea Lidea
Hi Team, I request you to provide a few sample codes for dockerizing flink-java application. My application has only one job as of now. Awaiting your response. Thanks & Regards Nida Shaikh

Re: Flink checkpointing - exactly once guaranteed understanding

2024-02-12 Thread Martijn Visser
Hi Kartik, It should be the other way around: the connector should use the proper Source and Sink interfaces, and therefore get the right guarantees and integration with mechanisms like checkpoints and savepoints. I would say there's no other way to achieve your desired result, because of all the

Re: Flink checkpointing - exactly once guaranteed understanding

2024-02-12 Thread Kartik Kushwaha
Thank you Martijn, the article you provided had detailed explanation on the exactly once two phase commit. Returning to the best way to handle commits/acknowledgments on sources like JMS Queues or Solace topics to support guaranteed delivery, when they are not supported out of the box by Flink

Flink Kubernetes Operator - Deadlock when Cluster Cleanup Fails

2024-02-12 Thread Niklas Wilcke
Hi Flink Kubernetes Operator Community, I hope this is the right way to report an issue with the Apache Flink Kubernetes Operator. We are experiencing problems with some streaming job clusters which end up in a terminated state, because of the operator not behaving as expected. The problem is

Re: Flink checkpointing - exactly once guaranteed understanding

2024-02-12 Thread Martijn Visser
Sources don't need to support two phase commits, that's something for sinks. I think the example of exactly-once-processing (although the interfaces have changed since) at https://flink.apache.org/2018/02/28/an-overview-of-end-to-end-exactly-once-processing-in-apache-flink-with-apache-kafka-too/

Re: Flink checkpointing - exactly once guaranteed understanding

2024-02-12 Thread Kartik Kushwaha
Let me put the question in other words. What happens if a source does not support two-phase commit and the Flink job has to guarantee exactly-once delivery downstream? Checkpointing, as I understand it, works on an interval basis. New events that the checkpoint barrier has not yet reached will

Re: Flink checkpointing - exactly once guaranteed understanding

2024-02-12 Thread Martijn Visser
Hi Kartik, I don't think there's much that the Flink community can do here to help you. The Solace source and sink aren't owned by the Flink project, and based on the source code they haven't been touched for the last 7 years [1] and I'm actually not aware of anyone who uses Solace at all. Best

Re: Flink checkpointing - exactly once guaranteed understanding

2024-02-12 Thread Kartik Kushwaha
Any help here please. Regards, Kartik On Fri, Feb 9, 2024, 8:33 AM Kartik Kushwaha wrote: > I am using flink checkpointing to restore states of my job. I am using > unaligned checkpointing with 100 ms as the checkpointing interval. I see > few events getting dropped that were sucessfully

RE: Flink connection with AWS OpenSearch Service

2024-02-12 Thread Praveen Chandna via user
Hello, requesting an update please. Thanks!! From: Praveen Chandna via user Sent: Friday, February 9, 2024 2:46 PM To: Praveen Chandna via user Subject: Flink connection with AWS OpenSearch Service Hello As per the Flink OpenSearch connector documentation, it specifies how to connect to

Entering a busy loop when adding a new sink to the graph

2024-02-11 Thread nick toker
Hello, We have noticed that when we add a *new kafka sink* operator to the graph, *and start from the last save point*, the operator is 100% busy for several minutes and *even 1/2-1 hour* !!! The problematic code seems to be the following for-loop in getTransactionalProducer() method:

Flink checkpointing - exactly once guaranteed understanding

2024-02-09 Thread Kartik Kushwaha
I am using Flink checkpointing to restore the state of my job. I am using unaligned checkpointing with 100 ms as the checkpointing interval. I see a few events getting dropped that were successfully processed by the operators or were in-flight and not yet captured by a checkpoint. That is, these

Flink connection with AWS OpenSearch Service

2024-02-09 Thread Praveen Chandna via user
Hello As per the Flink OpenSearch connector documentation, it specifies how to connect to an OpenSearch database, but it doesn't mention anything about how to connect to the AWS OpenSearch service. https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/opensearch/ Whereas

Re: Request to provide sample codes on Data stream using flink, Spring Boot & kafka

2024-02-08 Thread David Anderson
For a collection of several complete sample applications using Flink with Kafka, see https://github.com/confluentinc/flink-cookbook. And I agree with Marco -- in fact, I would go farther, and say that using Spring Boot with Flink is an anti-pattern. David On Wed, Feb 7, 2024 at 4:37 PM Marco

RE: Unsubscribe

2024-02-08 Thread Jiabao Sun
Please send an email with any content to user-unsubscr...@flink.apache.org if you want to unsubscribe from the user@flink.apache.org mailing list; you can refer to [1][2] for more details on managing your subscription. Best, Jiabao [1]

Re: sink upsert materializer in SQL job

2024-02-08 Thread Marek Maj
Hi Xuyang, Thank you for the explanation. The table.exec.sink.upsert-materialize = FORCE config was set unnecessarily; I just redeployed the job and confirmed that with the default AUTO, the materializer is still on. Thank you for the example you provided. My understanding of upsert key was exactly as

How to implement real-time data quality monitoring and alerting for Flink jobs?

2024-02-08 Thread casel.chen
We are using Flink to build a real-time data warehouse and would like to know how Flink jobs implement data quality monitoring and alerting, covering timeliness, completeness, consistency, accuracy, and so on. For Spark Streaming there are frameworks like Amazon Deequ and Apache Griffin; is there a similar DQC framework for Flink jobs, ideally one that is non-intrusive (or minimally intrusive) to existing jobs? If not, how is real-time data quality usually implemented? Configuring a separate DQC job for every production job seems too costly; is there a way to expose data quality information through metrics? Below is a Deequ usage example that checks whether each micro-batch of data satisfies the rules. We have similar data quality check requirements

Re: Entering a busy loop when adding a new sink to the graph + checkpoint enabled

2024-02-07 Thread Daniel Peled
Good morning, Any updates/progress on this issue? BR, Danny On Sun, Feb 4, 2024 at 13:20 Daniel Peled <daniel.peled.w...@gmail.com> wrote: > Hello, > > We have noticed that when we add a *new kafka sink* operator to the > graph, *and start from the last save point*, the operator

Re: Request to provide sample codes on Data stream using flink, Spring Boot & kafka

2024-02-07 Thread Marco Villalobos
Hi Nida, You can find sample code for using Kafka here: https://kafka.apache.org/documentation/ You can find sample code for using Flink here: https://nightlies.apache.org/flink/flink-docs-stable/ You can find sample code for using Flink with Kafka here:

how to disable autoscaling

2024-02-07 Thread Yang LI
Dear Flink Community, I occasionally need to temporarily disable autoscaling and manually adjust the scale of a Flink job. However, this process has proven to be challenging. Although I've attempted to disable autoscaling by setting job.autoscaler.scaling.enabled to false and modifying the

[ANNOUNCE] Apache flink-connector-kafka v3.1.0 released

2024-02-07 Thread Martijn Visser
The Apache Flink community is very happy to announce the release of Apache flink-connector-kafka v3.1.0. This release is compatible with Apache Flink 1.17 and 1.18. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data

Data skew caused by large and small tables in Flink CDC whole-database synchronization

2024-02-06 Thread casel.chen
When using a Flink CDC 3.0 YAML job to synchronize a whole MySQL database to Doris, we observed data skew: the largest TM processes 180 GB of data while the smallest only handles 30 GB. Some large upstream tables carry heavy traffic while the small tables have almost none. Is there any way to avoid this data skew?

Watermark alignment without idleness

2024-02-06 Thread Alexis Sarda-Espinosa
Hello, I was reading through the comments in [1] and it seems that enabling watermark alignment implicitly activates some idleness logic "if the source waits for alignment for a long time" (even if withIdleness is not called explicitly during the creation of WatermarkStrategy). Is this time

[ANNOUNCE] Apache Celeborn(incubating) 0.4.0 available

2024-02-06 Thread Fu Chen
Hi all, Apache Celeborn(Incubating) community is glad to announce the new release of Apache Celeborn(Incubating) 0.4.0. Celeborn is dedicated to improving the efficiency and elasticity of different map-reduce engines and provides an elastic, high-efficient service for intermediate data including

Re: Request to provide sample codes on Data stream using flink, Spring Boot & kafka

2024-02-06 Thread Alexis Sarda-Espinosa
Hello, check this thread from some months ago, but keep in mind that it's not really officially supported by Flink itself: https://lists.apache.org/thread/l0pgm9o2vdywffzdmbh9kh7xorhfvj40 Regards, Alexis. On Tue, Feb 6, 2024 at 12:23, Fidea Lidea <lideafidea...@gmail.com> wrote: > Hi

Re: Idleness not working if watermark alignment is used

2024-02-06 Thread Alexis Sarda-Espinosa
Hi Matthias, I think I understand the implications of idleness. In my case I really do need it since even in the production environment one of the Kafka topics will receive messages only sporadically. With regards to the code, I have very limited understanding of Flink internals, but that part I

Request to provide sample codes on Data stream using flink, Spring Boot & kafka

2024-02-06 Thread Fidea Lidea
Hi Team, I request you to provide sample codes on data streaming using flink, kafka and spring boot. Awaiting your response. Thanks & Regards Nida Shaikh

RE: Idleness not working if watermark alignment is used

2024-02-06 Thread Schwalbe Matthias
Hi Alexis, Yes, I guess so, while not utterly acquainted with that part of the code. Apparently the SourceCoordinator cannot come up with a proper watermark time, if watermarking is turned off (idle mode of stream), and then it deducts watermark time from the remaining non-idle sources. It’s

Re: Idleness not working if watermark alignment is used

2024-02-06 Thread Alexis Sarda-Espinosa
Hi Matthias, thanks for looking at this. Would you then say this comment in the source code is not really valid? https://github.com/apache/flink/blob/release-1.18/flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java#L181 That's where the log I was

RE: Idleness not working if watermark alignment is used

2024-02-05 Thread Schwalbe Matthias
Good morning Alexis, withIdleness(…) is easily misunderstood, it actually means that the thus configured stream is exempt from watermark processing after 5 seconds (in your case). Hence also watermark alignment is turned off for the stream until a new event arrives. .withIdleness(…) is good

Unsubscribe

2024-02-05 Thread 杨作青

Re:Re: Re: DESCRIBE CATALOG not available?

2024-02-05 Thread Xuyang
Hi, Yunhong. Thanks for volunteering :) -- Best! Xuyang On 2024-02-06 09:26:55, "yh z" wrote: Hi, Xuyang, I hope I can also participate in the development of the remaining FLIP features. Please cc me if there are any further developments. Thank you! On Mon, Jan 29, 2024, Xuyang

Re: Idleness not working if watermark alignment is used

2024-02-05 Thread Alexis Sarda-Espinosa
Ah and I forgot to mention, this is with Flink 1.18.1 On Mon, Feb 5, 2024 at 18:00, Alexis Sarda-Espinosa <sarda.espin...@gmail.com> wrote: > Hello, > > I have 2 Kafka sources that are configured with a watermark strategy > instantiated like this: > >

Idleness not working if watermark alignment is used

2024-02-05 Thread Alexis Sarda-Espinosa
Hello, I have 2 Kafka sources that are configured with a watermark strategy instantiated like this:

WatermarkStrategy.forBoundedOutOfOrderness(maxAllowedWatermarkDrift)
    .withIdleness(idleTimeout) // 5 seconds currently
    .withWatermarkAlignment(alignmentGroup,
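For reference, a complete construction of such a strategy; the Event type, the duration values, and the choice of the three-argument alignment overload are illustrative, not necessarily what the poster used:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

// Event is a placeholder element type.
WatermarkStrategy<Event> strategy = WatermarkStrategy
        .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(2)) // maxAllowedWatermarkDrift
        .withIdleness(Duration.ofSeconds(5))                    // idleTimeout
        .withWatermarkAlignment(
                "alignment-group",       // alignmentGroup
                Duration.ofSeconds(2),   // max drift the group tolerates
                Duration.ofSeconds(1));  // how often alignment info is updated
```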

Watermark alignment with different allowed drifts

2024-02-05 Thread Alexis Sarda-Espinosa
Hello, is the behavior for this configuration well defined? Assigning two different (Kafka) sources to the same alignment group but configuring different max allowed drift in each one. Regards, Alexis.

Re: Jobmanager restart after it has been requested to stop

2024-02-04 Thread Liting Liu (litiliu) via user
Thank you, Yang: We have found the root cause. In the logic of the Flink operator, it calls Flink's REST API to stop the job and then calls the K8s API to delete the Flink JobManager deployment. However, it took more than one minute for K8s to delete that deployment, so when the JM's

Entering a busy loop when adding a new sink to the graph + checkpoint enabled

2024-02-04 Thread Daniel Peled
Hello, We have noticed that when we add a *new kafka sink* operator to the graph, *and start from the last save point*, the operator is 100% busy for several minutes and *even 1/2-1 hour* !!! The problematic code seems to be the following for-loop in getTransactionalProducer() method:

Re: Collecting Flink job lineage information for auditing

2024-02-03 Thread Feng Jin
My understanding is that the platform can configure this centrally in flink-conf.yaml, so individual users don't need to set the parameters themselves. Best, Feng On Sun, Feb 4, 2024 at 11:19 AM 阿华田 wrote: > I took a look; this approach requires every job to configure a listener, so there is no system-level control, and pushing downstream users to configure listeners would be difficult > > 阿华田 > a15733178...@163.com > On Feb 2, 2024 at 19:38, Feng Jin wrote: hi, you can refer to

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-03 Thread Benchao Li
Broadcast streaming join is a very interesting addition to streaming SQL; I'm glad to see it brought up. One of the major differences between streaming and batch is state. A regular join uses "Keyed State" (the key is deduced from the join condition), so for a regular broadcast streaming join, we

Re: Collecting Flink job lineage information for auditing

2024-02-03 Thread 阿华田
I took a look; this approach requires every job to configure a listener, so there is no system-level control, and pushing downstream users to configure listeners would be difficult. 阿华田 a15733178...@163.com On Feb 2, 2024 at 19:38, Feng Jin wrote: hi, you can refer to the OpenLineage[1] implementation: configure a Flink JobListener to obtain the Transformation information, then parse the Source and Sink to get the lineage information. [1]

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-03 Thread Shengkai Fang
+1 for a FLIP to clarify the idea. Please be careful about which type of state you use here. The doc[1] says broadcast state doesn't support the RocksDB backend. Best, Shengkai [1]

Re: Collecting Flink job lineage information for auditing

2024-02-03 Thread 阿华田
OK, thanks, I'll look into it. 阿华田 a15733178...@163.com On Feb 2, 2024 at 19:38, Feng Jin wrote: hi, you can refer to the OpenLineage[1] implementation: configure a Flink JobListener to obtain the Transformation information, then parse the Source and Sink to get the lineage information. [1]
