Unsubscribe

2024-04-18 Thread junhua . xie


Re: Why RocksDB metrics cache-usage is larger than cache-capacity

2024-04-18 Thread Hangxiang Yu
Hi, Lei.
It's indeed a bit confusing. Could you share the related RocksDB log, which
may contain more detailed info?

On Fri, Apr 12, 2024 at 12:49 PM Lei Wang  wrote:

>
> I enabled RocksDB native metrics to do some performance tuning.
>
> state.backend.rocksdb.block.cache-size is set to 128m, with 4 slots per
> TaskManager.
>
> The observed result for one specific parallel slot:
> state.backend.rocksdb.metrics.block-cache-capacity is about 14.5M
> state.backend.rocksdb.metrics.block-cache-usage is about 24M
>
> I am a little confused: why is usage larger than capacity?
>
> Thanks,
> Lei
>


-- 
Best,
Hangxiang.
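
For anyone reproducing the setup in this thread, here is a minimal sketch of
enabling the two block-cache metrics. The option keys are standard Flink
configuration options; the job bootstrap itself is illustrative and not the
poster's actual code:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RocksDbCacheMetricsSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setString("state.backend", "rocksdb");
            // block.cache-size is typically only honored when RocksDB managed
            // memory is off; with managed memory on, the cache is sized from
            // the slot's managed memory share instead.
            conf.setString("state.backend.rocksdb.memory.managed", "false");
            conf.setString("state.backend.rocksdb.block.cache-size", "128m");
            // Expose the two native metrics compared in this thread.
            conf.setString("state.backend.rocksdb.metrics.block-cache-capacity", "true");
            conf.setString("state.backend.rocksdb.metrics.block-cache-usage", "true");

            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment(conf);
            // ... build and execute the job as usual
        }
    }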


Unsubscribe

2024-04-18 Thread dongming



RE: Watermark advancing too quickly when reprocessing events with event time from Kafka

2024-04-18 Thread Tyron Zerafa
Hi,

I’m experiencing the same issue on Flink 1.18.1.

I have a slightly different job graph. I have a single Kafka Source 
(parallelism 6) that is consuming from 2 topics, one topic with 4 partitions 
and one topic with 6 partitions.

Changing autoWatermarkInterval to 0 didn’t fix my issue.

Did you ever find a solution to this problem, please?

On 2024/01/17 16:38:40 "adrianbartnik.mailbox.org via user" wrote:
> Hi everyone,
> 
> we are struggling to understand how Flink handles watermarks from Kafka when 
> reprocessing events using their event time.
> 
> The goal of the Flink job is to consume events from 3 Kafka topics (3 
> partitions each), order them using their event time (using a process 
> function) and write them to a single output topic. We are using Flink 1.15.2 
> and have a BoundedOutOfOrderness of 5 seconds.
> 
> Job Graph:
> 
> Kafka Topic 1 --> Kafka Source --> Map Operator ---+
> (~80k events)                                      |
>                                                    v
> Kafka Topic 2 --> Kafka Source --> Map Operator --> Union Operator --> KeyBy Operator --> Process Function --> Kafka Output
> (~5k events)                                       ^
>                                                    |
> Kafka Topic 3 --> Kafka Source --> Map Operator ---+
> (~1k events)
> 
> Our Assumptions
> 
>   *   The Union Operator doesn't make a difference for the watermarks, i.e.
> it simply propagates the watermarks of the Kafka Sources to the
> ProcessFunction
>   *   Each Flink Kafka Source emits one watermark that is the minimum among
> the minimum extracted timestamps across all its partitions (per-partition
> watermarks)
> Desired Behaviour
> 
>   *   The Flink Kafka Sources advance their watermarks according to their
> progress in processing the events. The ProcessFunction buffers each event in
> its local state and only releases the events (ordered by observed event time)
> once it has received a watermark from all its Kafka Sources.
> 
> Actual Behaviour
> 
>   *   When starting the Flink job, it reprocesses the existing data in the
> Kafka topics. We see (in the Flink WebUI) that the watermark advances way too
> quickly (especially for the large stream), and the ProcessFunction quickly
> sees a very recent watermark while Flink is still processing lots of old
> events from the large topic. This very recent watermark causes the
> ProcessFunction to mark all other incoming events as late and drop them.
> 
> Other Observations
> 
>   *   If we set the AutoWatermarkInterval to 0, the process function
> orders the events correctly based on event time (we are generating watermarks
> with a TimestampAssigner and a custom watermarking strategy). The timestamp
> assignment is per event.
>   *   If we leave the AutoWatermarkInterval at 200 (the default value), the
> process function considers ~95% of all incoming events as late, regardless of
> whether we assign timestamps per event or emit them periodically.
>   *   We changed the watermark generator to periodic and see the same
> results.
>   *   We have tried watermark alignment groups; however, in this case the
> job doesn't make any progress at all and seems stuck.
> 
> Our Questions
> 
>   *   Why do we see the behavior that we are seeing? Where is the gap in our
> knowledge of how the Flink Kafka Source generates its watermarks?
>   *   Are the watermarks of the Flink Kafka Source generated and emitted
> after each event or periodically? How are they related to each partition?
>   *   Why does it work if we set the AutoWatermarkInterval to 0? What does
> this change for Flink in the watermark generation and propagation?
>   *   Why doesn't the watermark alignment group work in this context, and
> why does the job seem stuck?
> 
> We are thankful for any input!
> Cheers
> 
> Source Code of Process Function
> // Note: the generic type parameters were lost in the archive; the key type
> // (String) is an inferred reconstruction.
> public class ProcessFunction extends KeyedProcessFunction<String, Event, Event> {
>
>     // Buffers events by timestamp until the watermark passes them.
>     private transient MapState<Long, List<Event>> queueState = null;
>
>     @Override
>     public void open(Configuration config) {
>         TypeInformation<Long> key = TypeInformation.of(new TypeHint<Long>() {});
>         TypeInformation<List<Event>> value =
>                 TypeInformation.of(new TypeHint<List<Event>>() {});
>         queueState = getRuntimeContext().getMapState(
>                 new MapStateDescriptor<>("events-by-timestamp", key, value));
>     }
>
>     @Override
>     public void processElement(Event event,
>             KeyedProcessFunction<String, Event, Event>.Context ctx,
>             Collector<Event> out) throws Exception {
>
>         TimerService timerService = ctx.timerService();
>         // Only buffer events that are not already late w.r.t. the watermark.
>         if (ctx.timestamp() > timerService.currentWatermark()) {
>
>             List<Event> listEvents = queueState.get(ctx.timestamp());
>             if (isEmpty(listEvents)) {
>                 listEvents = new ArrayList<>();
>             }
> 
> 
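
For reference, here is a minimal, self-contained sketch of the per-partition
watermarking setup discussed in this thread; the broker address, topic names,
and durations are illustrative assumptions, not the original poster's code:

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class WatermarkSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            // The workaround discussed above: an interval of 0 disables
            // periodic watermark emission altogether.
            env.getConfig().setAutoWatermarkInterval(0);

            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("localhost:9092")      // illustrative
                    .setTopics("topic-1", "topic-2", "topic-3") // illustrative
                    .setStartingOffsets(OffsetsInitializer.earliest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            // Passed to fromSource(), the strategy is applied per split, so
            // each Kafka partition gets its own watermark and the source
            // emits their minimum; withIdleness() keeps an empty partition
            // from stalling the watermark.
            WatermarkStrategy<String> strategy = WatermarkStrategy
                    .<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withIdleness(Duration.ofMinutes(1));

            env.fromSource(source, strategy, "kafka-source").print();
            env.execute("watermark-sketch");
        }
    }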

Re: What should be considered when applying Flink's unified stream-batch processing to data reconciliation for a real-time data warehouse?

2024-04-18 Thread Yunfeng Zhou
Stream mode and batch mode differ in watermarks and in the semantics of some
operators, but I don't see any differences for the Join and Window operators,
so those should be supported in batch mode. For a detailed comparison of the
two modes, see this document:

https://nightlies.apache.org/flink/flink-docs-master/zh/docs/dev/datastream/execution_mode/

On Thu, Apr 18, 2024 at 9:44 AM casel.chen  wrote:
>
> Has anyone tried this in practice? Could you give some suggestions? Thanks!
>
>
> At 2024-04-15 11:15:34, "casel.chen"  wrote:
> >I have recently been researching data quality assurance for a Flink
> >real-time data warehouse, which requires periodically (every 10/20/30
> >minutes) running batch jobs to reconcile the data produced by the real-time
> >warehouse. The traditional approach is to run batch Spark jobs, e.g. the
> >data quality module of Apache DolphinScheduler.
> >The biggest drawback of that approach is that the Flink SQL business logic
> >has to be rewritten in Spark SQL, and it is hard to guarantee the two stay
> >consistent. So I am considering whether Flink's unified stream-batch
> >capability can be used to reuse the Flink SQL: simply switch the data
> >source from CDC or Kafka to a Hologres or StarRocks table, create a new
> >batch result table, and finally just compare the data of the real-time
> >result table and the batch result table over the same time window. A few
> >questions, though:
> >1. Can the watermark, process_time and event_time fields defined in the
> >original streaming Flink SQL tables be reused in batch mode?
> >2. Can streaming two-stream joins such as interval join and temporal join
> >be used in batch mode?
> >3. Can the window functions of the streaming job be reused in batch mode?
> >4. What other things should be paid attention to?
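
As a hedged sketch of the reuse idea in the question above, the same Flink SQL
can be submitted to a TableEnvironment created in batch mode; the connectors,
table names, and query below are illustrative assumptions, not the poster's
actual tables:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class BatchReconciliationSketch {
        public static void main(String[] args) {
            // Batch execution: the same SQL text as the streaming job, run
            // under batch semantics.
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inBatchMode());

            // Hypothetical bounded source standing in for the Kafka/CDC table.
            tEnv.executeSql(
                    "CREATE TABLE orders (order_id STRING, amount DECIMAL(10, 2)) "
                            + "WITH ('connector' = 'datagen', 'number-of-rows' = '1000')");

            // Hypothetical batch result table for the later reconciliation step.
            tEnv.executeSql(
                    "CREATE TABLE order_totals (order_id STRING, total DECIMAL(10, 2)) "
                            + "WITH ('connector' = 'print')");

            // The streaming job's aggregation SQL, reused verbatim in batch mode.
            tEnv.executeSql(
                    "INSERT INTO order_totals "
                            + "SELECT order_id, CAST(SUM(amount) AS DECIMAL(10, 2)) "
                            + "FROM orders GROUP BY order_id");
        }
    }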


Strange Problem (0 AvailableTask)

2024-04-18 Thread Hemi Grs
Hello,

I have several versions of Flink (1.17.0, 1.18.0, 1.18.1 and 1.19.0) on my
server.
I am still trying them out (on & off). I was running a job to sync a table
from MySQL to Elasticsearch, and it was running fine without any problems (I
was using version 1.18.1).
But after a few weeks, having forgotten about it, I checked the dashboard and
no job was running. The strange part is that there are 0 available task slots
(I configured it to have 10).

I tried restarting the service, but it still shows 0 available task slots. I
even tried different versions, and all of them (except 1.17.0) have no
available task slots. So now I am back on 1.17.0.

When I checked the log it has this message:
-
Tokens update task not started because either no tokens obtained or none of
the tokens specified its renewal date
-

Is it because of that? And what is the solution?

I appreciate all the help.

Thanks