Please send an email with any content to user-zh-unsubscr...@flink.apache.org to unsubscribe from the
user-zh@flink.apache.org mailing list.
Best,
Zhanghao Chen
From: 曹明勤
Sent: Thursday, February 22, 2024 9:42
To: user-zh@flink.apache.org
Subject: Unsubscribe
Unsubscribe
Unsubscribe
Hey Flinkers,
Recently I’ve been in the process of migrating a series of older Flink jobs to
use the official operator and have run into a snag on the logging front.
I’ve attempted to use the following configuration for the job:
```
logConfiguration:
  log4j-console.properties: |+
```
Hi,
I have been trying to write a temporal join in SQL on a rolling aggregate
view. However, it does not work and throws:
org.apache.flink.table.api.ValidationException: Event-Time Temporal Table Join
requires both primary key and row time attribute in versioned table, but no row
time
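A minimal sketch (table names and connector options are hypothetical) of what this ValidationException asks for: the versioned side of an event-time temporal join needs both a primary key and a row-time attribute, and a plain rolling-aggregate view typically loses the time attribute, which triggers exactly this error.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TemporalJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Versioned side: PRIMARY KEY + WATERMARK (row-time attribute).
        tEnv.executeSql(
                "CREATE TABLE rates ("
                + "  currency STRING,"
                + "  rate DECIMAL(10, 2),"
                + "  update_time TIMESTAMP(3),"
                + "  WATERMARK FOR update_time AS update_time,"
                + "  PRIMARY KEY (currency) NOT ENFORCED"
                + ") WITH ("
                + "  'connector' = 'upsert-kafka',"
                + "  'topic' = 'rates',"
                + "  'properties.bootstrap.servers' = 'localhost:9092',"
                + "  'key.format' = 'json',"
                + "  'value.format' = 'json')");

        // Probe side: also needs its own row-time attribute for AS OF.
        tEnv.executeSql(
                "CREATE TABLE orders ("
                + "  id BIGINT,"
                + "  currency STRING,"
                + "  amount DECIMAL(10, 2),"
                + "  order_time TIMESTAMP(3),"
                + "  WATERMARK FOR order_time AS order_time"
                + ") WITH ('connector' = 'datagen')");

        tEnv.executeSql(
                "SELECT o.id, o.amount * r.rate AS converted "
                + "FROM orders AS o "
                + "JOIN rates FOR SYSTEM_TIME AS OF o.order_time AS r "
                + "ON o.currency = r.currency")
            .print();
    }
}
```

If the versioned side must be a view, a deduplication pattern (ROW_NUMBER() ordered by the row-time column, filtered to rownum = 1) keeps both the inferred primary key and the time attribute, whereas a plain GROUP BY aggregate does not.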
Hi Folks,
I am currently seeking full-time Flink/Scala positions (non-consulting) in
India or the USA, specifically at the Principal or Staff level.
I require an H-1B transfer and assistance with relocation from India; my
I-140 is approved.
Thanks & Regards
Sri
Hello
As per the OpenSearch connector documentation, OpensearchEmitter can be used to
perform requests of different types, e.g., IndexRequest, DeleteRequest,
UpdateRequest, etc.
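For illustration, a minimal sketch of such an emitter (the index name, input type, and the "op"/"id" routing fields are hypothetical), choosing a request type per record:

```java
import org.apache.flink.api.connector.sink2.SinkWriter;
import org.apache.flink.connector.opensearch.sink.OpensearchEmitter;
import org.apache.flink.connector.opensearch.sink.RequestIndexer;
import org.opensearch.action.delete.DeleteRequest;
import org.opensearch.action.index.IndexRequest;
import org.opensearch.action.update.UpdateRequest;

import java.util.Map;

public class RoutingEmitter implements OpensearchEmitter<Map<String, Object>> {

    @Override
    public void emit(Map<String, Object> element,
                     SinkWriter.Context context,
                     RequestIndexer indexer) {
        String op = (String) element.get("op"); // hypothetical routing field
        String id = (String) element.get("id");

        if ("delete".equals(op)) {
            indexer.add(new DeleteRequest("my-index", id));
        } else if ("update".equals(op)) {
            indexer.add(new UpdateRequest("my-index", id).doc(element));
        } else {
            indexer.add(new IndexRequest("my-index").id(id).source(element));
        }
    }
}
```

The emitter is then plugged into the sink via the builder's setEmitter(...) method.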
Thanks Thias and Zakelly,
I probably muddied the waters by saying that my use case was similar to
kvCache.
What I was calling "non-serializable state" is actually a Random Cut Forest
ML model that cannot be serialized by itself, but from which you can extract a
serializable state. That state is serializable, but
Good morning all,
Let me loop myself in …
1. Another even more convenient way to enable caching is to
configure/assign RocksDB to use more off-heap memory for its cache; you might also
consider enabling bloom filters (it all depends on how large your key-space is
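A minimal sketch of that suggestion, assuming recent Flink configuration keys; in a real deployment these values usually belong in flink-conf.yaml rather than in job code:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbCacheConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // give RocksDB more managed (off-heap) memory to use as block cache
        conf.setString("taskmanager.memory.managed.fraction", "0.6");
        conf.setString("state.backend.rocksdb.memory.managed", "true");
        // bloom filters cut disk reads for point lookups on large key spaces
        conf.setString("state.backend.rocksdb.use-bloom-filter", "true");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... build and execute the job ...
    }
}
```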
Our Flink SQL job consumes canal-json messages (CDC from MySQL) from Kafka, applies
complex processing, and writes to Doris. How can we measure the end-to-end latency of records in the Doris table? The MySQL table has an update_time column recording when the business record was updated.
Doris can add an ingest_time column to the table schema, so on the Doris table the end-to-end latency can be computed as ingest_time -
update_time, but that only works offline. Is there a way to compute it in real time, for real-time monitoring?
Thanks! So to compute sink latency, is it enough to use context.currentProcessingTime() - context.timestamp()?
I see that in the new Sink
V2, SinkWriter.Context no longer has the currentProcessingTime() method. Can I use System.currentTimeMillis()
- context.timestamp() to get the sink latency instead?
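As a sanity check on that idea, a minimal Sink V2 sketch (class name hypothetical), assuming the TaskManager wall clock is an acceptable stand-in for processing time; note that context.timestamp() can be null when records carry no event time:

```java
import org.apache.flink.api.connector.sink2.SinkWriter;

import java.io.IOException;

public class LatencyTrackingWriter implements SinkWriter<String> {

    @Override
    public void write(String element, Context context)
            throws IOException, InterruptedException {
        Long eventTime = context.timestamp(); // null if no event time assigned
        if (eventTime != null) {
            long sinkLatencyMs = System.currentTimeMillis() - eventTime;
            // report sinkLatencyMs to a metric (histogram/gauge) here
        }
        // ... deliver `element` to the external system ...
    }

    @Override
    public void flush(boolean endOfInput) {}

    @Override
    public void close() {}
}
```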
On 2024-02-21 09:41:37, "Xuyang" wrote:
>Hi, chen.
>You can try the following in the sink
Hi Lorenzo,
I think the most convenient way is to modify the code of the state backend,
adding a k-v cache as you want.
Otherwise, IIUC, there's no public interface to get the keyContext. But
you may try something hacky: use the passed-in `Context` instance
in processElement, and
Hi all,
The Apache Kyuubi community is pleased to announce that
Apache Kyuubi 1.8.1 has been released!
Apache Kyuubi is a distributed and multi-tenant gateway to provide
serverless SQL on data warehouses and lakehouses.
Kyuubi provides a pure SQL gateway through Thrift JDBC/ODBC and
RESTful
Hello Guys,
Can someone please assist us regarding the following issue?
We have noticed that when we add a *new kafka sink* operator to the graph, *and
start from the last save point*, the operator is 100% busy for several
minutes and *even 1/2-1 hour*!!!
The problematic code seems to be the
Hi, chen.
You can try the following in the sink function's invoke method:
@Override
public void invoke(RowData row, Context context) throws Exception {
    context.currentProcessingTime();
    context.currentWatermark();
    ...
}
--
Best!
Xuyang
On 2024-02-20 19:38:44, "Feng Jin"
Hi Arjun,
Yes, direct support for external configuration files within Flink
ConfigMaps is somewhat restricted. The current method involves simply
copying two local files from the operator.
Please check: FlinkConfMountDecorator#getLocalLogConfFiles()
Thanks Zakelly,
I'd need to do something similar, with a map containing my non-serializable
"state", similar to the kvCache in FastTop1Function.
But I am not sure I understand how I can set the keyed state for a specific
key, in snapshotState().
FastTop1Function seems to rely on keyContext set
Hi,
I have a use case involving calculating the lifetime order count of a
customer in real-time. To reduce the memory footprint, I plan to run a
batch job on stored data every morning (let's say at 5 am) to calculate the
total order count up to that moment. Additionally,
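One possible arrangement, sketched below with hypothetical types: the 5 am batch total arrives as a second keyed stream that resets the per-customer count, while live orders increment it (events between the batch cut-off and the snapshot's arrival would still need reconciling).

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class LifetimeOrderCount
        extends KeyedCoProcessFunction<String, Long, Long, Long> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("order-count", Long.class));
    }

    // Input 1: total computed by the 5 am batch job -> replace current state.
    @Override
    public void processElement1(Long batchTotal, Context ctx, Collector<Long> out)
            throws Exception {
        count.update(batchTotal);
        out.collect(batchTotal);
    }

    // Input 2: one live order -> increment and emit the running total.
    @Override
    public void processElement2(Long order, Context ctx, Collector<Long> out)
            throws Exception {
        long current = count.value() == null ? 0L : count.value();
        count.update(current + 1);
        out.collect(current + 1);
    }
}
```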
My understanding is that it should not be obtained through rowData; you can get the watermark and event time through the Context.
Best,
Feng
On Tue, Feb 20, 2024 at 4:35 PM casel.chen wrote:
> How can a custom sink connector in Flink SQL obtain the event time and watermark defined on the source table?
>
>
> public class XxxSinkFunction extends RichSinkFunction implements
> CheckpointedFunction,
Hello to all users, contributors and Committers!
The Travel Assistance Committee (TAC) is pleased to announce that
travel assistance applications for Community over Code Asia 2024 are now
open!
We will be supporting Community over Code Asia, Hangzhou, China
July 26th - 28th, 2024.
TAC exists
Unsubscribe
How can a custom sink connector in Flink SQL obtain the event time and watermark defined on the source table?
public class XxxSinkFunction extends RichSinkFunction implements
        CheckpointedFunction, CheckpointListener {

    @Override
    public synchronized void invoke(RowData rowData, Context context) throws
            IOException {
Hi, in my case I have to query an external system. The system returns n rows per page, and I have to keep calling the system until there is no more data. I could use an AsyncFunction if I buffered all records and output them at the end, but I want to "stream" them and I don't have enough memory to hold all the data.
Hi Team,
I would like to know the possibilities of configuring New Relic alerts
for a Flink job whenever the job is submitted, fails, or recovers
from a failure.
In our case, we have configured the Flink environment as a Kubernetes pod
running on an EKS cluster and the application
Is there some reason why you can’t use an AsyncFunction?
https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/async/AsyncFunction.html
Note that when dealing with event time and exactly once, an AsyncFunction
provides required support for
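For reference, a minimal sketch of that suggestion (the client and types are hypothetical); note that ResultFuture must be completed exactly once, so the page-walk is buffered per input record, not across the whole stream:

```java
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.functions.async.AsyncFunction;
import org.apache.flink.streaming.api.functions.async.ResultFuture;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class PagedLookup implements AsyncFunction<String, String> {

    @Override
    public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
        CompletableFuture
                .supplyAsync(() -> fetchAllPages(key)) // blocking call off the main thread
                .whenComplete((rows, err) -> {
                    if (err != null) {
                        resultFuture.completeExceptionally(err);
                    } else {
                        resultFuture.complete(rows); // hand results back to Flink safely
                    }
                });
    }

    private List<String> fetchAllPages(String key) {
        List<String> rows = new ArrayList<>();
        // hypothetical: call the external system page by page until it
        // reports no more data for this key
        rows.add(key);
        return rows;
    }
}

// usage (timeout/capacity values are illustrative):
// AsyncDataStream.unorderedWait(input, new PagedLookup(), 30, TimeUnit.SECONDS, 100);
```

Collecting from your own CompletableFuture callbacks inside a flatMap, by contrast, touches the operator's output buffers from a foreign thread, which matches the "buffer pool does not exist" symptom described below.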
Hi.
I have a case where I would like to collect objects from a CompletableFuture
in a flatMap function.
I run into a problem where I get an exception about a buffer pool that
doesn't exist when I collect the objects after some time. I can see that if, for
testing, I don't return from
Hi Zakelly,
Yeah that makes sense to me, I was just curious about whether reading could
be a bottleneck or not, but I imagine user-specific logic would be better
than a generic cache from Flink that might have a low hit rate.
Thanks again,
Alexis.
On Mon, 19 Feb 2024, 07:29 Zakelly Lan, wrote:
Hi team,
I am currently in the process of deploying Flink on Kubernetes using the
Flink Kubernetes Operator and have encountered a scenario where I need to
pass runtime arguments to my Flink application from a properties file.
Given the dynamic nature of Kubernetes environments and the need for
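A minimal sketch of one common approach, assuming the properties file is mounted into the pod (e.g. from a ConfigMap) and its path is passed as the first program argument:

```java
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class Main {
    public static void main(String[] args) throws Exception {
        ParameterTool params = ParameterTool.fromPropertiesFile(args[0]);

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // make the parameters visible in the web UI and to rich functions
        env.getConfig().setGlobalJobParameters(params);

        String topic = params.getRequired("kafka.topic"); // hypothetical key
        // ... build the pipeline using `topic` ...
        env.execute("parameterized-job");
    }
}
```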
Unsubscribe
Hi Alexis,
Assuming the bulk load of a batch of sequential keys performs better than
accessing them one by one, the main question becomes whether we really need to
access all the keys that were bulk-loaded into the cache. In other words,
cache hit rate is the key issue. If the rate is high, even
Hi, Marek. Sorry for the late reply; it was the Spring Festival in China.
When the upsert keys are empty (they can't be deduced) or differ from the
primary key of the sink, Flink will generate an upsert materializer when
`table.exec.sink.upsert-materialize = FORCE`. You can see the code here[1].
Hi Zakelly,
thanks for the information, that's interesting. Would you say that reading
a subset from RocksDB is fast enough to be pretty much negligible, or could
it be a bottleneck if the state of each key is "large"? Again assuming the
number of distinct partition keys is large.
Regards,
Hi Alexis,
Flink does need some heap memory to bridge requests to RocksDB and gather
the results. In most cases, the memory is discarded immediately (eventually
collected by GC). In the case of timers, Flink does cache a limited subset of
key-values on the heap to improve performance.
In general you don't
Hi Lorenzo,
It is not recommended to do this with keyed state. However, there is an
example in the Flink code (FastTop1Function#snapshotState) [1] of setting keys
in snapshotState().
Hope this helps.
[1]
Hi Kanchi,
Could you provide more information on it? For example, at what stage this log
prints (job recovering, running, etc.), and any more detailed job info or stacktrace.
Best,
Zhanghao Chen
From: Kanchi Masalia via user
Sent: Friday, February 16, 2024 4:07
To: Neha
Hi Brent,
Sounds like a good plan to start with. Application mode gives the best
isolation level, but can be costly for a large number of small jobs, as at least 2
containers are required (1 JobManager and 1 TaskManager) for each job. If your
jobs are mostly small in size (1 or 2 TMs), you might
With streaming execution, the entire pipeline is always running, which is
necessary so that results can be continuously produced. But with batch
execution, the job graph can be segmented into separate pipelined stages
that can be executed sequentially, each running to completion before the
next
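A short sketch of opting into that behavior explicitly (the same can be set via the execution.runtime-mode option at submission time):

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BatchModeExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // with bounded sources, BATCH enables stage-by-stage scheduling
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        env.fromElements(1, 2, 3)
                .map(i -> i * 2)
                .returns(Integer.class)
                .print();
        env.execute("batch-mode-demo");
    }
}
```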
Hey everyone,
I've been looking at Flink to handle a fairly complex use case and was
hoping for some feedback on whether the approach I'm thinking about with Flink
seems reasonable. When researching what people build on Flink, it seems
like a lot of the focus tends to be on running fewer
Hi Thias
I considered CheckpointedFunction.
In snapshotState() I would have to update the state of each key, extracting
the in-memory "state" of each key and putting it in the state with
state.update(...).
This must happen per key,
but snapshotState() has no visibility of the keys. And I have no
Hi,
I have a netiquette question - is it ok to share blog posts here that
are specific to Apache Flink and not vendor-focussed?
thanks,
Robin.
Hi everyone,
I am currently exploring the fault tolerance and recovery mechanism in
batch mode within Apache Flink.
If I terminate the task manager process while the job is running, the job
restarts from the point of failure. However, at some point, the job
restarts from the very beginning.
The
Good morning Lorenzo,
You may want to implement
org.apache.flink.streaming.api.checkpoint.CheckpointedFunction interface in
your KeyedProcessFunction.
Btw, by the time initializeState(…) is called, the state backend is fully
initialized and can be read from and written to (which is not the case for
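A skeleton of that suggestion (names hypothetical); note that snapshotState() runs without a key context, which is the limitation discussed elsewhere in this thread, so the safer variant writes the serializable form back to keyed state eagerly in processElement:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

import java.util.HashMap;
import java.util.Map;

public class ModelFunction
        extends KeyedProcessFunction<String, String, String>
        implements CheckpointedFunction {

    // non-serializable per-key model, keyed by the stream key
    private transient Map<String, Object> models;
    private transient ValueState<byte[]> modelState;

    @Override
    public void initializeState(FunctionInitializationContext context) {
        models = new HashMap<>();
        // keyed state is already usable here, as noted above
        modelState = context.getKeyedStateStore().getState(
                new ValueStateDescriptor<>("model", byte[].class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out)
            throws Exception {
        Object model = models.computeIfAbsent(ctx.getCurrentKey(), k -> new Object());
        // ... update the model with `value` ...
        // eager variant: persist the serializable form now, while the
        // key context is implicitly set to ctx.getCurrentKey()
        modelState.update(extractSerializableForm(model));
        out.collect(value);
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) {
        // called once per checkpoint, NOT per key: keyed state cannot be
        // updated for arbitrary keys from here via public APIs
    }

    private byte[] extractSerializableForm(Object model) {
        return new byte[0]; // hypothetical extraction logic
    }
}
```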
Hello Kanchi,
It's recommended to submit a separate request or issue for the problem
you're encountering, as the data pipeline is distinct from the one Neha
raised. This will help ensure that each issue can be addressed individually
and efficiently.
Hello Neha,
Not sure about the issue you are
Hello Alexis,
I don't think data in RocksDB resides in the JVM even with function calls.
For more details, check the link below:
https://github.com/facebook/rocksdb/wiki/RocksDB-Overview#3-high-level-architecture
RocksDB has three main components: memtable, SST files, and WAL (not used in
Flink, as
Hi!
We just encountered a similar issue.
This is usually caused by: 1) Akka failed sending the message silently, due
to problems like oversized payload or serialization failures. In that case,
you should find detailed error information in the logs. 2) The recipient
needs more time for
Hi Asimansu
The memory RocksDB manages is outside the JVM, yes, but the mentioned
subsets must be bridged to the JVM somehow so that the data can be exposed
to the functions running inside Flink, no?
Regards,
Alexis.
On Thu, 15 Feb 2024, 14:06 Asimansu Bera, wrote:
> Hello Alexis,
>
>
Hello everyone,
I have a convoluted problem.
I am implementing a KeyedProcessFunction that keeps some non-serializable
"state" in memory, in a transient Map (key = stream key, value = the
non-serializable "state").
I can extract a serializable representation to put in Flink state, and I
can
Hello Alexis,
RocksDB resides off-heap, outside of the JVM. The small subset of data ends
up off-heap in memory.
For more details, check the following link:
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_setup_tm/#managed-memory
I hope this
Hello,
Most info regarding RocksDB memory for Flink focuses on what's needed
independently of the JVM (although the Flink process configures its limits
and so on). I'm wondering if there are additional special considerations
with regards to the JVM heap in the following scenario.
Assuming a key
Here is my "little harsh/straightforward feedback", but it's based on fact
and real-world experience with using Redis since ~2012.
Redis is not a database, period. The best description of what Redis is is
something along the lines of "in-memory - text only (base64 ftw) - data
structures on top of
Hello,
You're right, one of our main use cases consists of adding missing fields,
stored in a "small" reference table, periodically refreshed, to a stream. Using
a broadcast stream and a Flink join was not the choice we made, because we didn't
want to add tricky watermarks and hold one stream (it
Dear all,
A recurring challenge we have with stream enrichment in Flink is a robust
mechanism to estimate that all messages of the source(s) have been
consumed/processed before output is collected.
A simple example is two sources of catalogue metadata:
- source A delivers products,
- source B
Hi Mate,
thanks for creating the issue and pointing it out. I think the issue you
created is a bit more specific than my whole point. It rather focuses on the
taskmanagers, which is of course fine. From my point of view the following two
things are the low-hanging fruit:
1. Improving the
Hi,
I have opened a JIRA [1] as I had the same error (AlreadyExists) last week
and I could pinpoint the problem to the TaskManagers being still alive when
creating the new Deployment. In native mode we only check for the
JobManagers when we wait for the cluster to shut down in contrast to
You can check the Oracle CDC connector, which provides that
https://ververica.github.io/flink-cdc-connectors/master/content/connectors/oracle-cdc.html
Best,
G.
On Tue, Feb 13, 2024 at 3:25 PM К В wrote:
> Hello!
>
> We need to read data from an Oracle database table in order to pass it to
>
Hello!
We need to read data from an Oracle database table in order to pass it to
Kafka.
Data is inserted in the table periodically.
The table has multiple partitions.
Data should be processed in parallel; each task should consume one partition
of the database table.
Can this be done using a Flink task?
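A heavily hedged sketch based on the Oracle CDC connector docs linked in the reply above (connection values are placeholders). Note this captures row changes via LogMiner; the SourceFunction variant shown here runs with parallelism 1, so true per-partition parallel snapshot reads would need a different (e.g. JDBC-based) source.

```java
import com.ververica.cdc.connectors.oracle.OracleSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class OracleToKafka {
    public static void main(String[] args) throws Exception {
        SourceFunction<String> source =
                OracleSource.<String>builder()
                        .hostname("oracle-host")
                        .port(1521)
                        .database("ORCLCDB")
                        .schemaList("INVENTORY")
                        .tableList("INVENTORY.PRODUCTS")
                        .username("flinkuser")
                        .password("flinkpw")
                        .deserializer(new JsonDebeziumDeserializationSchema())
                        .build();

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(source)
                // .sinkTo(kafkaSink) // hand the change records to a KafkaSink here
                .print();
        env.execute("oracle-to-kafka");
    }
}
```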
Hi Gyula,
thanks for the advice. I requested a Jira account and will try to open a ticket
as soon as I get access.
Cheers,
Niklas
> On 13. Feb 2024, at 09:13, Gyula Fóra wrote:
>
> Hi Niklas!
>
> The best way to report the issue would be to open a JIRA ticket with the same
> detailed
Hi Niklas!
The best way to report the issue would be to open a JIRA ticket with the
same detailed information.
Otherwise I think your observations are correct and this is indeed a
frequent problem that comes up, it would be good to improve on it. In
addition to improving logging we could also
Hi Nida,
I request that you read
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/
in order to learn how to Dockerize your Flink job.
You're Welcome & Regards
Marco A. Villalobos
> On Feb 13, 2024, at 12:00 AM, Fidea Lidea wrote:
>
> Hi
Hi Team,
I request you to provide a few code samples for dockerizing a
flink-java application.
My application has only one job as of now.
Awaiting your response.
Thanks & Regards
Nida Shaikh
Hi Kartik,
It should be the other way around: the connector should use the proper
Source and Sink interfaces, and therefore get the right guarantees and
integration with mechanisms like checkpoints and savepoints. I would
say there's no other way to achieve your desired result, because of
all the
Thank you Martijn, the article you provided had a detailed explanation of the
exactly-once two-phase commit.
Returning to the best way to handle commits/acknowledgments on sources like
JMS Queues or Solace topics to support guaranteed delivery, when they are
not supported out of the box by Flink
Hi Flink Kubernetes Operator Community,
I hope this is the right way to report an issue with the Apache Flink
Kubernetes Operator. We are experiencing problems with some streaming job
clusters which end up in a terminated state, because of the operator not
behaving as expected. The problem is
Sources don't need to support two-phase commits; that's something for
sinks. I think the example of exactly-once processing (although the
interfaces have changed since) at
https://flink.apache.org/2018/02/28/an-overview-of-end-to-end-exactly-once-processing-in-apache-flink-with-apache-kafka-too/
Let me put the question in other words.
What happens if a source does not support two-phase commit and the Flink
job has to guarantee exactly-once delivery downstream? Checkpointing, as
I understand it, works on an interval basis. New events that the checkpoint
barrier has not yet reached will
Hi Kartik,
I don't think there's much that the Flink community can do here to
help you. The Solace source and sink aren't owned by the Flink
project, and based on the source code they haven't been touched for
the last 7 years [1] and I'm actually not aware of anyone who uses
Solace at all.
Best
Any help here please.
Regards,
Kartik
On Fri, Feb 9, 2024, 8:33 AM Kartik Kushwaha
wrote:
> I am using flink checkpointing to restore states of my job. I am using
> unaligned checkpointing with 100 ms as the checkpointing interval. I see
> a few events getting dropped that were successfully
Hello
Request you to please update.
Thanks !!
From: Praveen Chandna via user
Sent: Friday, February 9, 2024 2:46 PM
To: Praveen Chandna via user
Subject: Flink connection with AWS OpenSearch Service
Hello
As per the Flink OpenSearch connector documentation, it specifies how to connect
to
Hello,
We have noticed that when we add a *new kafka sink* operator to the graph, *and
start from the last save point*, the operator is 100% busy for several
minutes and *even 1/2-1 hour*!!!
The problematic code seems to be the following for-loop in
getTransactionalProducer() method:
I am using flink checkpointing to restore states of my job. I am using
unaligned checkpointing with 100 ms as the checkpointing interval. I see a
few events getting dropped that were successfully processed by the operators,
or were in-flight and not yet captured by a checkpoint. That is, these
Hello
As per the Flink OpenSearch connector documentation, it specifies how to connect
to an OpenSearch database, but it doesn't mention how to connect to the
AWS OpenSearch service.
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/opensearch/
Whereas
For a collection of several complete sample applications using Flink with
Kafka, see https://github.com/confluentinc/flink-cookbook.
And I agree with Marco -- in fact, I would go farther, and say that using
Spring Boot with Flink is an anti-pattern.
David
On Wed, Feb 7, 2024 at 4:37 PM Marco
Please send email to user-unsubscr...@flink.apache.org if you want to
unsubscribe the mail from user@flink.apache.org, and you can refer [1][2]
for more details.
Please send an email with any content to user-unsubscr...@flink.apache.org to unsubscribe from the
user@flink.apache.org mailing list; you can refer to [1][2] to manage your subscription.
Best,
Jiabao
[1]
Hi Xuyang,
Thank you for the explanation. The table.exec.sink.upsert-materialize =
FORCE config
was set unnecessarily; I just redeployed the job and confirmed that when
using the default AUTO, the materializer is still on.
Thank you for the example you provided. My understanding of the upsert key was
exactly as
We are building a real-time data warehouse with Flink and would like to know how Flink jobs do data quality monitoring and alerting, covering data timeliness, completeness, consistency, accuracy, etc.
We found that Spark Streaming has the Amazon Deequ and Apache
Griffin frameworks for this. Is there a similar DQC framework for Flink jobs? Ideally one that is non-invasive, or minimally invasive, to existing jobs.
If there isn't, how is real-time data quality usually implemented?
Configuring a separate DQC job for every production job seems too costly. Is there a way to expose data quality information through metrics?
Below is an example of how Deequ is used: it checks whether each micro-batch satisfies the rules. We have similar data quality check requirements.
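A minimal sketch (the rule and types are hypothetical) of the metrics idea raised above: validate records inline and expose violation counts through Flink's metric system, so an existing metrics reporter (e.g. Prometheus) can drive real-time alerts without a separate DQC job.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class CompletenessCheck extends ProcessFunction<String, String> {

    private transient Counter violations;

    @Override
    public void open(Configuration parameters) {
        violations = getRuntimeContext()
                .getMetricGroup()
                .counter("dqc_completeness_violations");
    }

    @Override
    public void processElement(String record, Context ctx, Collector<String> out) {
        if (record == null || record.isEmpty()) {
            violations.inc(); // completeness rule violated
        }
        out.collect(record); // pass-through: minimally invasive to the job
    }
}
```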
Good morning,
Any updates/progress on this issue?
BR,
Danny
On Sun, Feb 4, 2024 at 13:20, Daniel Peled <
daniel.peled.w...@gmail.com>:
> Hello,
>
> We have noticed that when we add a *new kafka sink* operator to the
> graph, *and start from the last save point*, the operator
Hi Nida,
You can find sample code for using Kafka here:
https://kafka.apache.org/documentation/
You can find sample code for using Flink here:
https://nightlies.apache.org/flink/flink-docs-stable/
You can find sample code for using Flink with Kafka here:
Dear Flink Community,
I occasionally need to temporarily disable autoscaling and manually adjust
the scale of a Flink job. However, this process has proven to be
challenging. Although I've attempted to disable autoscaling by setting
job.autoscaler.scaling.enabled to false and modifying the
The Apache Flink community is very happy to announce the release of
Apache flink-connector-kafka v3.1.0. This release is compatible with
Apache Flink 1.17 and 1.18.
Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data
When using a Flink CDC 3.0
YAML job for whole-database synchronization from MySQL to Doris, we found data skew: the largest TM has to process 180 GB of data while the smallest handles only 30 GB. Some large upstream tables have heavy traffic while the small tables have almost none. Is there any way to avoid this data skew?
Hello,
I was reading through the comments in [1] and it seems that enabling
watermark alignment implicitly activates some idleness logic "if the source
waits for alignment for a long time" (even if withIdleness is not called
explicitly during the creation of WatermarkStrategy). Is this time
Hi all,
The Apache Celeborn (Incubating) community is glad to announce the
new release of Apache Celeborn (Incubating) 0.4.0.
Celeborn is dedicated to improving the efficiency and elasticity of
different map-reduce engines and provides an elastic, highly efficient
service for intermediate data including
Hello,
check this thread from some months ago, but keep in mind that it's not
really officially supported by Flink itself:
https://lists.apache.org/thread/l0pgm9o2vdywffzdmbh9kh7xorhfvj40
Regards,
Alexis.
Am Di., 6. Feb. 2024 um 12:23 Uhr schrieb Fidea Lidea <
lideafidea...@gmail.com>:
> Hi
Hi Matthias,
I think I understand the implications of idleness. In my case I really do
need it since even in the production environment one of the Kafka topics
will receive messages only sporadically.
With regards to the code, I have very limited understanding of Flink
internals, but that part I
Hi Team,
I request you to provide sample code for data streaming using Flink, Kafka,
and Spring Boot.
Awaiting your response.
Thanks & Regards
Nida Shaikh
Hi Alexis,
Yes, I guess so, though I'm not utterly acquainted with that part of the code.
Apparently the SourceCoordinator cannot come up with a proper watermark time
if watermarking is turned off (idle mode of the stream), and then it deduces the
watermark time from the remaining non-idle sources.
It's
Hi Matthias,
thanks for looking at this. Would you then say this comment in the source
code is not really valid?
https://github.com/apache/flink/blob/release-1.18/flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java#L181
That's where the log I was
Good morning Alexis,
withIdleness(…) is easily misunderstood: it actually means that the thus-
configured stream is exempt from watermark processing after 5 seconds (in your
case).
Hence watermark alignment is also turned off for the stream until a new event
arrives.
.withIdleness(…) is good
Hi, Yunhong.
Thanks for volunteering :)
--
Best!
Xuyang
On 2024-02-06 09:26:55, "yh z" wrote:
Hi, Xuyang, I hope I can also participate in the development of the remaining
FLIP features. Please cc me if there are any further developments. Thank you!
On Mon, Jan 29, 2024, Xuyang
Ah, and I forgot to mention: this is with Flink 1.18.1.
On Mon, Feb 5, 2024 at 18:00, Alexis Sarda-Espinosa <
sarda.espin...@gmail.com>:
> Hello,
>
> I have 2 Kafka sources that are configured with a watermark strategy
> instantiated like this:
>
>
Hello,
I have 2 Kafka sources that are configured with a watermark strategy
instantiated like this:
WatermarkStrategy.forBoundedOutOfOrderness(maxAllowedWatermarkDrift)
        .withIdleness(idleTimeout) // 5 seconds currently
        .withWatermarkAlignment(alignmentGroup,
Hello,
is the behavior for this configuration well defined? Assigning two
different (Kafka) sources to the same alignment group but configuring
different max allowed drift in each one.
Regards,
Alexis.
Thank you, Yang:
We have found the root cause.
In the logic of the Flink operator, it calls Flink's REST API to stop the
job, then calls the K8s API to stop the deployment of the Flink JobManager.
However, it took more than one minute for K8s to delete that deployment, so when
the JM's
Hello,
We have noticed that when we add a *new kafka sink* operator to the graph, *and
start from the last save point*, the operator is 100% busy for several
minutes and *even 1/2-1 hour*!!!
The problematic code seems to be the following for-loop in
getTransactionalProducer() method:
My understanding is that the platform can configure this uniformly in flink-conf.yaml; individual users don't need to configure the parameters separately.
Best,
Feng
On Sun, Feb 4, 2024 at 11:19 AM 阿华田 wrote:
> I took a look; this way every job needs to configure the listener, so there's no system-level control, and it would be difficult to get downstream users to configure the listener.
>
>
> 阿华田
> a15733178...@163.com
>
>
> On Feb 2, 2024 at 19:38, Feng Jin wrote:
> hi,
>
> You can refer to
Broadcast streaming join is a very interesting addition to streaming
SQL; I'm glad to see it's been brought up.
One of the major differences between streaming and batch is state.
A regular join uses "Keyed State" (the key is deduced from the join
condition), so for a regular broadcast streaming join, we
I took a look; this way every job needs to configure the listener, so there's no system-level control, and it would be difficult to get downstream users to configure the listener.
阿华田
a15733178...@163.com
On Feb 2, 2024 at 19:38, Feng Jin wrote:
hi,
You can refer to the OpenLineage[1] implementation: configure a JobListener in Flink to obtain the Transformation information, then parse the
Sources and Sinks to get the lineage information.
[1]
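A minimal sketch of that hook (the reporting logic is hypothetical); the listener is registered per execution environment, and recent Flink versions also have an execution.job-listeners option for loading listeners from the classpath cluster-wide, which fits the flink-conf.yaml suggestion elsewhere in the thread.

```java
import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.JobListener;

import javax.annotation.Nullable;

public class LineageListener implements JobListener {

    @Override
    public void onJobSubmitted(@Nullable JobClient jobClient, @Nullable Throwable t) {
        if (jobClient != null) {
            // hypothetical: report the job id plus parsed source/sink
            // information to a lineage service
            System.out.println("submitted: " + jobClient.getJobID());
        }
    }

    @Override
    public void onJobExecuted(@Nullable JobExecutionResult result, @Nullable Throwable t) {
        // job finished or failed: close out the lineage run here
    }
}

// usage: env.registerJobListener(new LineageListener());
```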
+1 a FLIP to clarify the idea.
Please be careful choosing which type of state you use here. The doc[1]
says broadcast state doesn't support the RocksDB backend.
Best,
Shengkai
[1]
OK, thanks. I'll look into it.
阿华田
a15733178...@163.com
On Feb 2, 2024 at 19:38, Feng Jin wrote:
hi,
You can refer to the OpenLineage[1] implementation: configure a JobListener in Flink to obtain the Transformation information, then parse the
Sources and Sinks to get the lineage information.
[1]