Hi Flink community,
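We are running Flink 1.9.2 with the Blink planner. Today we hit a problem whose cause we have not been able to pin down. Here is what we see:
When we manually restart the job and tell it to restore from a specific checkpoint, part of the state is sometimes not restored. The problem is intermittent. After the restart the job itself runs normally, and the logs show no obvious errors.
Specifically, the rows with type=realshow are not restored from state: their counts start accumulating again from 0. The rows with type=show and type=click are restored from state correctly.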


The SQL is roughly as follows:
create view view1 as
select event_id, act_time, device_id
from table1
where `getStringFromJson`(`act_argv`, 'ispin', '') <> '1'
and event_id in ('article_newest_list_show', 'article_newest_list_sight_show',
'article_list_item_click', 'article_auto_video_play_click');


-- daily data
insert into table2
select platform, type, `time`, count(1) as pv, hll_uv(device_id) as uv
from
(select '03' as platform, trim(case
when event_id = 'article_newest_list_show' then 'show'
when event_id = 'article_newest_list_sight_show' then 'realshow'
when event_id = 'article_list_item_click' then 'click' else '' end) as type,
`date_parse`(`act_time`, 'yyyy-MM-dd HH:mm:ss', 'yyyyMMdd') as `time`, device_id
from view1
where event_id in ('article_newest_list_show', 'article_newest_list_sight_show',
'article_list_item_click')
union all
select '03' as platform, 'click_total' as type,
`date_parse`(`act_time`, 'yyyy-MM-dd HH:mm:ss', 'yyyyMMdd') as `time`, device_id
from view1
where event_id in ('article_list_item_click', 'article_auto_video_play_click')) a
group by platform, type, `time`;
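To make the grouping explicit, here is a minimal Python sketch that replays the query's logic on a few made-up events. It rests on several assumptions:

- The `ispin` value returned by `getStringFromJson` is modeled as a pre-extracted field.
- `date_parse` is assumed to reformat `yyyy-MM-dd HH:mm:ss` into `yyyyMMdd`.
- `hll_uv`, presumably an approximate distinct count, is replaced by an exact set count.
- The sample events are hypothetical.

The sketch shows that `realshow` is its own `(platform, type, time)` group key, separate from `show` and `click`. It also shows that `article_list_item_click` events feed both the `click` and the `click_total` groups.

```python
from collections import defaultdict
from datetime import datetime

# Event IDs kept by view1 (the SQL also filters on the JSON field 'ispin').
VIEW1_EVENTS = {
    "article_newest_list_show",
    "article_newest_list_sight_show",
    "article_list_item_click",
    "article_auto_video_play_click",
}

# CASE WHEN mapping used by the first branch of the UNION ALL.
TYPE_BY_EVENT = {
    "article_newest_list_show": "show",
    "article_newest_list_sight_show": "realshow",
    "article_list_item_click": "click",
}

# Events routed to the second branch ('click_total').
CLICK_TOTAL_EVENTS = {"article_list_item_click", "article_auto_video_play_click"}


def to_day(act_time: str) -> str:
    # Assumed semantics of date_parse(act_time, 'yyyy-MM-dd HH:mm:ss', 'yyyyMMdd').
    return datetime.strptime(act_time, "%Y-%m-%d %H:%M:%S").strftime("%Y%m%d")


def aggregate(events):
    """events: iterable of (event_id, act_time, device_id, ispin) tuples."""
    pv = defaultdict(int)
    uv = defaultdict(set)  # exact distinct devices, standing in for hll_uv
    for event_id, act_time, device_id, ispin in events:
        # view1 filter: drop ispin == '1' and events outside the four IDs.
        if ispin == "1" or event_id not in VIEW1_EVENTS:
            continue
        day = to_day(act_time)
        types = []
        if event_id in TYPE_BY_EVENT:       # first UNION ALL branch
            types.append(TYPE_BY_EVENT[event_id])
        if event_id in CLICK_TOTAL_EVENTS:  # second UNION ALL branch
            types.append("click_total")
        for t in types:
            key = ("03", t, day)  # GROUP BY platform, type, `time`
            pv[key] += 1
            uv[key].add(device_id)
    return {key: (pv[key], len(uv[key])) for key in pv}


# Hypothetical sample events: (event_id, act_time, device_id, ispin)
SAMPLE = [
    ("article_newest_list_show", "2020-06-01 10:00:00", "d1", ""),
    ("article_newest_list_sight_show", "2020-06-01 10:00:05", "d1", ""),
    ("article_newest_list_sight_show", "2020-06-01 10:01:00", "d2", ""),
    ("article_list_item_click", "2020-06-01 10:02:00", "d2", ""),
    ("article_auto_video_play_click", "2020-06-01 10:03:00", "d3", ""),
    ("article_list_item_click", "2020-06-01 10:04:00", "d4", "1"),  # filtered out
    ("some_other_event", "2020-06-01 10:05:00", "d5", ""),          # filtered out
]

result = aggregate(SAMPLE)
for key, (pv_count, uv_count) in sorted(result.items()):
    print(key, pv_count, uv_count)
```

Each printed key corresponds to one row of the aggregation result. Conceptually, each key is also a separate group whose running count the streaming job must carry across a restore.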


We look forward to your replies. Any ideas on how to troubleshoot this would be much appreciated!


