Hello,
I have a use case where I need to delete a cache file after a
successful sink operation (writing to the DB). My Flink pipeline is built using
Java. I am contemplating using Java's CompletableFuture.runAsync() to perform
the file deletion. I am wondering what issues this might cause.
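A minimal sketch of the idea (the class and method names here are placeholders, not part of any Flink API): using `CompletableFuture.supplyAsync` instead of `runAsync`, with a dedicated executor, so slow disk I/O never blocks the operator thread and the success/failure of the deletion can actually be observed instead of being lost:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CacheCleanup {

    // Delete a cache file off the caller's thread on a dedicated executor.
    // Returns true if the file existed and was deleted, false otherwise.
    static CompletableFuture<Boolean> deleteAsync(Path path, Executor executor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return Files.deleteIfExists(path);
            } catch (Exception e) {
                // Deletion is best-effort: report failure instead of throwing.
                return false;
            }
        }, executor);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Path tmp = Files.createTempFile("cache", ".tmp");
        boolean deleted = deleteAsync(tmp, pool).join();
        System.out.println("deleted=" + deleted + " exists=" + Files.exists(tmp));
        pool.shutdown();
    }
}
```

One caveat if this runs inside a Flink sink: the future may still be pending when a checkpoint completes or the task fails over, so the deletion has to be idempotent and best-effort; also remember to shut the executor down in the operator's `close()`.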
If you have some prior experience, I'd recommend setting a good parallelism
and TM resource spec first, to give the autotuner a good starting point.
Usually, the autoscaler can tune your jobs well within a few attempts (<= 3). As
for `pekko.ask.timeout`, the default value should be sufficient.
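As a starting point, such a spec might look like this in flink-conf.yaml (the keys are standard Flink options, but the values here are only illustrative guesses; tune them for your workload):

```yaml
# Example starting values for the autotuner -- not recommendations.
parallelism.default: 8
taskmanager.numberOfTaskSlots: 4
taskmanager.memory.process.size: 4g
# pekko.ask.timeout: 10s   # leave at the default unless you see ask timeouts
```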
Has anyone tried this in practice? Could you share some advice? Thanks!
On 2024-04-15 11:15:34, "casel.chen" wrote:
>I have recently been researching data-quality assurance for a Flink real-time
>data warehouse. I need to run periodic batch jobs (every 10/20/30 minutes) to
>reconcile the data produced by the real-time warehouse. The traditional approach
>is to run Spark batch jobs, such as Apache DolphinScheduler's data-quality
>module.
>But the biggest drawback of that approach is that the Flink SQL business logic
>has to be rewritten in Spark SQL, which makes it hard to guarantee the two stay
>consistent. So I am considering whether Flink's unified stream/batch capability
>could be used to reuse the Flink
Hi.
Does it make sense to specify `parallelism` for the task managers or the job,
and, similarly, to specify the memory amount for the task managers, or is it
better to leave it to the autoscaler and autotuner to pick the best values? How
many times would the autoscaler need to restart task managers
Hi Xuyang,
So if I check the side output way then my pipeline would be something like
this:
final OutputTag<MyData> lateOutputTag = new OutputTag<MyData>("late-data"){};
SingleOutputStreamOperator<MyData> reducedDataStream =
    dataStream
        .keyBy(new MyKeySelector())
        .window(TumblingEventTimeWindows.of(Time.seconds(60)))
        .allowedLateness(Time.seconds(180))
        .sideOutputLateData(lateOutputTag)
        .reduce(new MyDataReducer());
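And the late records would then be read back from the side output, e.g. (a sketch; `MyData` is a placeholder for the stream's element type):

```java
// getSideOutput is the standard DataStream API call for retrieving
// records routed to an OutputTag.
DataStream<MyData> lateDataStream =
        reducedDataStream.getSideOutput(lateOutputTag);
```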
Hi, Sachin.
IIUC, it is in the second situation you listed, that is:
[ reduceData (d1, d2, d3), reducedData(ld1, d2, d3, late d4, late d5, late d6)
].
However, because of `table.exec.emit.late-fire.delay`, it could also be such as
[ reduceData (d1, d2, d3), reducedData(ld1, d2, d3, late d4,
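For reference, the emit behavior described above is controlled by the following experimental (not publicly documented) table options; a sketch of how they might be set, assuming a `tableEnv` variable:

```java
// Experimental options -- names as in the discussion above; semantics may change.
tableEnv.getConfig().set("table.exec.emit.late-fire.enabled", "true");
tableEnv.getConfig().set("table.exec.emit.late-fire.delay", "5 min");
```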
Hi,
Suppose my pipeline is:
data
.keyBy(new MyKeySelector())
.window(TumblingEventTimeWindows.of(Time.seconds(60)))
.allowedLateness(Time.seconds(180))
.reduce(new MyDataReducer())
So I wanted to know if the final output stream would contain reduced data
at the end of the
Hi Tauseef.
I see that support for Elasticsearch 8[1] will be released
in elasticsearch-3.1.0, so there are no docs for the elasticsearch8 connector yet.
We can learn how to use it from some tests[2] before the docs are ready.
Best,
Hang
[1] https://issues.apache.org/jira/browse/FLINK-26088
[2]
Hi, David.
Have you added the parquet format[1] dependency to your project?
It seems that the class ParquetColumnarRowInputFormat cannot be found.
Best,
Hang
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/formats/parquet/
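If it is missing, the dependency would look something like this (adjust the version to match your Flink release; 1.19.0 below is only an example):

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-parquet</artifactId>
    <version>1.19.0</version>
</dependency>
```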
Sohil Shah, on Wednesday, April 17, 2024,