我又重试了次,不用重启job也会有问题,就是把并行度大于1会有问题!。

1、直接在sql-client里,启动/data/home/jindyliu/flink-demo/flink-1.11.2/bin/sql-client.sh
embedded -d 
/data/home/jindyliu/flink-demo/flink-1.11.2//conf/sql-client-defaults.yaml
sql-client-defaults.yaml的并行度设置为40.

数据一样,其中test表规模是200w条,status表11条。

源表test:
CREATE TABLE test (
`id` INT,
`name` VARCHAR(255),
`time` TIMESTAMP(3),
`status` INT,
PRIMARY KEY(id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = '1',
  'database-name' = 'ai_audio_lyric_task',
  'table-name' = 'test'
)
源表status
CREATE TABLE status (
`id` INT,
`name` VARCHAR(255),
PRIMARY KEY(id) NOT ENFORCED
) WITH (  
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = '1',
  'database-name' = 'ai_audio_lyric_task',
  'table-name' = 'status'
);

//输出
CREATE TABLE test_status_print (
`id` INT,
`name` VARCHAR(255),
`time` TIMESTAMP(3),
`status` INT,
`status_name` VARCHAR(255),
PRIMARY KEY(id) NOT ENFORCED
) WITH (
        'connector' = 'print'
);

//联接
INSERT into test_status_print 
SELECT t.*, s.name
FROM test AS t
LEFT JOIN status AS s ON t.status = s.id;

复现操作,在mysql-cdc snapshot结束后,改test 表中的status字段,会出现顺序问题。我用print打印了。
snapshot后 test_status中的数据正常:
    0, jindy0, 2020-07-06T20:01:15 , 0, statu0
    1, jindy2, 2020-11-12T00:00:02 , 1, statu2
    2, jindy2, 2020-07-03T18:04:22 , 2, statu3

snapshot后,将mysql表中记录id=0,1,2的行中的status值改为3,预期结果
    0, jindy0, 2020-07-06T20:01:15 , 3, statu3
    1, jindy2, 2020-11-12T00:00:02 , 3, statu3
    2, jindy2, 2020-07-03T18:04:22 , 3, statu3
但输出顺序上有问题,会导致test_status表中的id=0,2两条记录丢失。

1、print输出:
<http://apache-flink.147419.n8.nabble.com/file/t670/1.1> 


ps:
另外观察到另外一个问题是:source数据送到join算子里,好像没啥hash能力,基本都挤在了一个结点上处理了?为啥会这样?感觉这样join算子会是瓶颈!!!很容易反压?!
<http://apache-flink.147419.n8.nabble.com/file/t670/2.2> 

@jark,帮忙看看,我的版本是Version: 1.11.2 Commit: fe36135 @
2020-09-09T16:19:03+02:00,官网下载的 ?







--
Sent from: http://apache-flink.147419.n8.nabble.com/

回复