Dear ShardingSphere Community,
I am glad to discuss a new design for Sharding-Scaling Job. Same issue in GitHub [Issue#5106](https://github.com/apache/incubator-shardingsphere/issues/5106). ## Current Job Design Sharding-Scaling will extract `datasources` and `actualDataNodes` information from ShardingSphere's sharding configuration to generate Sharding-Scaling jobs. Each complete ShardingSphere configuration will generate a complete Sharding-Scaling job. After checking datasources, Sharding-Scaling job will do the first time job spliting according to the datasource. Each datasource will be one task of Sharding-Scaling job. In datasource sub-task, it also will be splited to inventory task(or named history task) and incremental task (or named real-time task). What's more, the inventory task will still be split further through the tables, and maybe then splited by primary key. Job structure like following: ``` Sharding-Scaling Job |--- Datasource1 Sub-task ||--- Inventory Task |||--- TableA Task |||--- TableA Task shard #1 |||--- TableA Task shard #2 ||..... |||--- TableB Task ||..... ||--- Incremental Task |--- Datasource2 Sub-task ..... ``` For each table task without sharding, table task shard and incremental task, there are at lease two thread to execute the task. One is `Reader` which query or subscribe change data and the other `Writer` which write data into targer ShardingSphere. The design is not friendly for distribution: - The leaf task may be the second layer(incremental task), third layer(table task without sharding and fourth layer(table task shard). - Concurrency or thread pool size is hard to control. - Limited thread pool may cause problem [Issue#5092](https://github.com/apache/incubator-shardingsphere/issues/5092) ## New Design In new design, the Sharding-Scaling job will not be splited with logic of database structure rather by the logic of distributed jobs. ``` Sharding-Scaling Job |--- Inventory Task |--- Inventory Task #1 ||--- Inner Inventory Task For ds1.table1 ||--- Inner Inventory Task For ds1.table2 #1 |--- Inventory Task #2 ||--- Inner Inventory Task For ds1.table2 #2 ||--- Inner Inventory Task For ds2.table3 ...... |--- Incremental Task |--- Incremental Task #1 For ds1 |--- Incremental Task #2 For ds2 ...... ``` In new design, Sharding-Scaling split job to inventory and incremental first, and then split inventory task by `job concurrency` users configured. The leaf task is `Inventory Task #n` and `Incremental Task #n`, which can be equally distributed to different Sharding-Scaling nodes. The inventory table task and inventory table task shard in old design will be as inner task of `Inventory Task #n`, and exeucte one by one. For `Reader` and `Writer`, they are similar to old design. But for new design, the thread pool size and concurrency become easier to understand and configure. Users only need to understand `job concurrency` property. Todo list: - [ ] Rename `Reader` and `Writer` [Issue#4595](https://github.com/apache/incubator-shardingsphere/issues/4595) - [ ] Rename `HistoryTask` and `RealtimeTask` - [ ] Refactor `ExecuteEngine` in Sharding-Scaling - [ ] Refactor `Inventory Task` - [ ] Refactor `TaskController` - [ ] Refactor `JobController` I think the new design is better, and more suitable for development of Sharding-Scaling. Any suggestions? -- Yi Yang(Sion) Apache ShardingSphere