我又重试了次,不用重启job也会有问题,就是把并行度大于1会有问题!。
1、直接在sql-client里,启动/data/home/jindyliu/flink-demo/flink-1.11.2/bin/sql-client.sh
embedded -d
/data/home/jindyliu/flink-demo/flink-1.11.2//conf/sql-client-defaults.yaml
sql-client-defaults.yaml的并行度设置为40.
数据一样,其中test表规模是200w条,status表11条。
源表test:
CREATE TABLE test (
`id` INT,
`name` VARCHAR(255),
`time` TIMESTAMP(3),
`status` INT,
PRIMARY KEY(id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'localhost',
'port' = '3306',
'username' = 'root',
'password' = '1',
'database-name' = 'ai_audio_lyric_task',
'table-name' = 'test'
)
源表status
CREATE TABLE status (
`id` INT,
`name` VARCHAR(255),
PRIMARY KEY(id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'localhost',
'port' = '3306',
'username' = 'root',
'password' = '1',
'database-name' = 'ai_audio_lyric_task',
'table-name' = 'status'
);
//输出
CREATE TABLE test_status_print (
`id` INT,
`name` VARCHAR(255),
`time` TIMESTAMP(3),
`status` INT,
`status_name` VARCHAR(255),
PRIMARY KEY(id) NOT ENFORCED
) WITH (
'connector' = 'print'
);
//联接
INSERT into test_status_print
SELECT t.*, s.name
FROM test AS t
LEFT JOIN status AS s ON t.status = s.id;
复现操作,在mysql-cdc snapshot结束后,改test 表中的status字段,会出现顺序问题。我用print打印了。
snapshot后 test_status中的数据正常:
0, jindy0, 2020-07-06T20:01:15 , 0, statu0
1, jindy2, 2020-11-12T00:00:02 , 1, statu2
2, jindy2, 2020-07-03T18:04:22 , 2, statu3
snapshot后,将mysql表中记录id=0,1,2的行中的status值改为3,预期结果
0, jindy0, 2020-07-06T20:01:15 , 3, statu3
1, jindy2, 2020-11-12T00:00:02 , 3, statu3
2, jindy2, 2020-07-03T18:04:22 , 3, statu3
但输出顺序上有问题,会导致test_status表中的id=0,2两条记录丢失。
1、print输出:
<http://apache-flink.147419.n8.nabble.com/file/t670/1.1>
ps:
另外观察到另外一个问题是:source数据送到join算子里,好像没啥hash能力,基本都挤在了一个结点上处理了?为啥会这样?感觉这样join算子会是瓶颈!!!很容易反压?!
<http://apache-flink.147419.n8.nabble.com/file/t670/2.2>
@jark,帮忙看看,我的版本是Version: 1.11.2 Commit: fe36135 @
2020-09-09T16:19:03+02:00,官网下载的 ?
--
Sent from: http://apache-flink.147419.n8.nabble.com/