Async code inside Flink Sink

2024-04-17 Thread Jacob Rollings
Hello, I have a use case where I need to do a cache file deletion after a successful sink operation (writing to a DB). My Flink pipeline is built using Java. I am contemplating using Java's CompletableFuture.runAsync() to perform the file deletion activity. I am wondering what issues this might cause
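A minimal plain-Java sketch of the idea (not tied to any particular Flink sink API). Two caveats worth noting: `CompletableFuture.runAsync(Runnable)` without an executor runs on `ForkJoinPool.commonPool()`, which is shared JVM-wide, so a dedicated executor is safer inside a TaskManager; and a cleanup that is still in flight when a checkpoint fails could delete a file the job still needs on restore. The class and method names here are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncCacheCleanup {
    // Dedicated pool so cleanup never competes with the shared common ForkJoinPool.
    static final ExecutorService CLEANUP_POOL = Executors.newSingleThreadExecutor();

    /** Deletes the cache file off the caller's thread; logs instead of propagating failure. */
    public static CompletableFuture<Void> deleteAsync(Path cacheFile) {
        return CompletableFuture.runAsync(() -> {
            try {
                Files.deleteIfExists(cacheFile);
            } catch (IOException e) {
                // A failed cleanup should not fail the sink or the job.
                System.err.println("Cache cleanup failed for " + cacheFile + ": " + e);
            }
        }, CLEANUP_POOL);
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("flink-cache-", ".tmp");
        deleteAsync(tmp).join(); // a real sink would not block; shown here for demonstration
        System.out.println(Files.exists(tmp)); // false
        CLEANUP_POOL.shutdown();
    }
}
```

In a real sink you would typically trigger `deleteAsync` from the success callback and only `join()` (or cancel) outstanding futures in `close()`, so the operator shuts down cleanly.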

Re: Parallelism for auto-scaling, memory for auto-tuning - Flink operator

2024-04-17 Thread Zhanghao Chen
If you have some prior experience, I'd recommend setting a good parallelism and TM resource spec first, to give the autotuner a good starting point. Usually, the autoscaler can tune your jobs well within a few attempts (<=3). As for `pekko.ask.timeout`, the default value should be sufficient
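The advice above can be sketched as a starting spec for the Flink Kubernetes operator. This is a hedged example, not a recommendation: the deployment name and the concrete resource numbers are placeholders, and only the `job.autoscaler.*` keys and `pekko.ask.timeout` are taken from the discussion.

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-job                     # hypothetical name
spec:
  flinkConfiguration:
    job.autoscaler.enabled: "true"
    pekko.ask.timeout: "10s"       # the default; usually no need to raise it
  taskManager:
    resource:
      memory: "4g"                 # reasonable starting point for the autotuner
      cpu: 2
  job:
    parallelism: 4                 # initial value; the autoscaler adjusts per-vertex from here
```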

Re: What should we watch out for when applying Flink's unified stream-batch processing to data reconciliation in a real-time data warehouse?

2024-04-17 Thread casel.chen
Has anyone tried this in practice? Any advice would be appreciated. Thanks! On 2024-04-15 11:15:34, "casel.chen" wrote: >I've recently been researching data quality assurance for a Flink real-time data warehouse. We need to run periodic (every 10/20/30 minutes) batch jobs to reconcile the data the real-time warehouse produces. The traditional approach is Spark batch jobs, e.g. the data quality module of Apache >DolphinScheduler. >But the biggest drawback of that approach is having to rewrite the Flink >SQL business logic in Spark SQL, which makes it hard to guarantee the two stay consistent. So I'm considering whether Flink's unified stream-batch capability could be used to reuse the Flink

Parallelism for auto-scaling, memory for auto-tuning - Flink operator

2024-04-17 Thread Maxim Senin via user
Hi. Does it make sense to specify `parallelism` for task managers or the `job`, and, similarly, to specify memory amounts for the task managers, or is it better to leave it to the autoscaler and autotuner to pick the best values? How many times would the autoscaler need to restart task managers

Re: Understanding default firings in case of allowed lateness

2024-04-17 Thread Sachin Mittal
Hi Xuyang, So if I go the side-output route, my pipeline would be something like this: final OutputTag lateOutputTag = new OutputTag("late-data"){}; SingleOutputStreamOperator reducedDataStream = dataStream .keyBy(new MyKeySelector())

Re: Understanding default firings in case of allowed lateness

2024-04-17 Thread Xuyang
Hi, Sachin. IIUC, it is in the second situation you listed, that is: [ reduceData (d1, d2, d3), reducedData(d1, d2, d3, late d4, late d5, late d6) ]. However, because of `table.exec.emit.late-fire.delay`, it could also be such as [ reduceData (d1, d2, d3), reducedData(d1, d2, d3, late d4,
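The firing pattern described above can be illustrated with a small simulation. This is not Flink code, just a hedged plain-Java model of the DataStream default-trigger semantics for a 60s tumbling event-time window with 180s allowed lateness: the window fires once when the watermark passes the window end, then fires again with its full accumulated contents for each late element, until the watermark passes end + allowed lateness.

```java
import java.util.ArrayList;
import java.util.List;

public class LatenessSimulation {
    static final long WINDOW_END = 60_000;        // [0s, 60s) tumbling window
    static final long ALLOWED_LATENESS = 180_000; // 180s

    final List<String> contents = new ArrayList<>();
    final List<List<String>> firings = new ArrayList<>();
    long watermark = Long.MIN_VALUE;
    boolean fired = false;

    void advanceWatermark(long wm) {
        watermark = wm;
        if (!fired && watermark >= WINDOW_END) { // default trigger: fire at window end
            firings.add(new ArrayList<>(contents));
            fired = true;
        }
    }

    void element(String id, long timestamp) {
        if (timestamp >= WINDOW_END) return;                    // belongs to a later window
        if (watermark >= WINDOW_END + ALLOWED_LATENESS) return; // too late: dropped (or side output)
        contents.add(id);
        if (fired) {                 // late firing: emits the full accumulated window again
            firings.add(new ArrayList<>(contents));
        }
    }

    public static void main(String[] args) {
        LatenessSimulation w = new LatenessSimulation();
        w.element("d1", 10_000); w.element("d2", 20_000); w.element("d3", 30_000);
        w.advanceWatermark(61_000); // first firing: [d1, d2, d3]
        w.element("d4", 40_000);    // late firing:  [d1, d2, d3, d4]
        w.element("d5", 50_000);    // late firing:  [d1, d2, d3, d4, d5]
        System.out.println(w.firings);
        // [[d1, d2, d3], [d1, d2, d3, d4], [d1, d2, d3, d4, d5]]
    }
}
```

Note this models the per-late-element firing of the DataStream API's default `EventTimeTrigger`; `table.exec.emit.late-fire.delay`, mentioned above, is a Table/SQL option that batches late firings on a delay instead.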

Understanding default firings in case of allowed lateness

2024-04-17 Thread Sachin Mittal
Hi, Suppose my pipeline is: data .keyBy(new MyKeySelector()) .window(TumblingEventTimeWindows.of(Time.seconds(60))) .allowedLateness(Time.seconds(180)) .reduce(new MyDataReducer()) So I wanted to know if the final output stream would contain reduced data at the end of the

Re: Elasticsearch8 example

2024-04-17 Thread Hang Ruan
Hi Tauseef. I see that the support for Elasticsearch 8[1] will be released in elasticsearch-3.1.0. So there are no docs for elasticsearch8 yet. We could learn how to use it from some tests[2] before the docs are ready. Best, Hang [1] https://issues.apache.org/jira/browse/FLINK-26088 [2]

Re: Table Source from Parquet Bug

2024-04-17 Thread Hang Ruan
Hi, David. Have you added the parquet format[1] dependency to your dependencies? It seems that the class ParquetColumnarRowInputFormat cannot be found. Best, Hang [1] https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/formats/parquet/ Sohil Shah wrote on Wed, 2024-04-17
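For reference, the Maven dependency for the parquet format typically looks like the snippet below. The version is an assumption; it should match the Flink version in use (the linked docs are for 1.19).

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-parquet</artifactId>
    <version>1.19.0</version>
</dependency>
```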