[GitHub] [iotdb] cornmonster commented on a change in pull request #4514: [IOTDB-1971] Improve the parallelism of UDF framework execution in stand-alone mode

GitBox Thu, 02 Dec 2021 19:29:48 -0800


cornmonster commented on a change in pull request #4514:
URL: https://github.com/apache/iotdb/pull/4514#discussion_r761623714




##########
File path: server/src/main/java/org/apache/iotdb/db/query/udf/core/layer/LayerBuilder.java
##########
@@ -67,18 +82,30 @@ public DAGBuilder(
 
     expressionIntermediateLayerMap = new HashMap<>();
     expressionDataTypeMap = new HashMap<>();
+
+    fragmentDataSetIndexToLayerPointReaders = new ArrayList<>();
+    resultColumnOutputIndexToFragmentDataSetOutputIndex = new int[resultColumnExpressions.length][];
   }
 
-  public DAGBuilder buildLayerMemoryAssigner() {
+  public LayerBuilder buildLayerMemoryAssigner() {
     for (Expression expression : resultColumnExpressions) {
       expression.updateStatisticsForMemoryAssigner(memoryAssigner);
     }
     memoryAssigner.build();
     return this;
   }
 
-  public DAGBuilder buildResultColumnPointReaders() throws QueryProcessException, IOException {
-    for (int i = 0; i < resultColumnExpressions.length; ++i) {
+  public LayerBuilder buildResultColumnPointReaders() throws QueryProcessException, IOException {
+    for (int i = 0, n = resultColumnExpressions.length; i < n; ++i) {
+      // resultColumnExpressions[i] -> the index of the fragment it belongs to
+      Integer fragmentDataSetIndex =

Review comment:
   It might be more intuitive to decide how to split the computation during the planning stage. I also believe parallel execution should be a general framework that any splittable plan can use, not only UDTF plans.
   
   In standalone mode, the parallel execution framework could then look like this:
   
   UDTF plan -- split --> several sub-plans -- parallel execution (multi-threaded) --> several datasets -- merge --> merged dataset
   
   In cluster mode, the sub-plans could be distributed across the cluster (according to data locality) to achieve better performance:
   
   UDTF plan -- split --> several sub-plans -- parallel execution (distributed) --> several datasets -- merge --> merged dataset
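   
   To make the standalone pipeline concrete, here is a minimal sketch of split -> parallel execution -> merge using a plain thread pool. Everything in it is hypothetical: `SplitExecuteMergeSketch`, `split`, and `executeAndMerge` are not IoTDB classes or methods, a sub-plan is modelled as a `Callable` that returns its dataset as a list, and "merge" is plain concatenation in sub-plan order. A real merge would probably have to align rows by timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch only: none of these names exist in IoTDB.
final class SplitExecuteMergeSketch {

  /**
   * Split step (planning stage): distributes result column indexes round-robin
   * over at most {@code parts} sub-plans and drops sub-plans that stay empty.
   */
  static List<List<Integer>> split(int columnCount, int parts) {
    List<List<Integer>> subPlans = new ArrayList<>();
    for (int p = 0; p < parts; ++p) {
      subPlans.add(new ArrayList<>());
    }
    for (int column = 0; column < columnCount; ++column) {
      subPlans.get(column % parts).add(column);
    }
    subPlans.removeIf(List::isEmpty);
    return subPlans;
  }

  /**
   * Execute + merge steps: runs every sub-plan on a fixed-size thread pool and
   * concatenates the resulting datasets in sub-plan order.
   */
  static <T> List<T> executeAndMerge(List<Callable<List<T>>> subPlans, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // invokeAll blocks until all tasks finish and returns futures in task order.
      List<Future<List<T>>> datasets = pool.invokeAll(subPlans);
      List<T> merged = new ArrayList<>();
      for (Future<List<T>> dataset : datasets) {
        merged.addAll(dataset.get());
      }
      return merged;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while executing sub-plans", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a sub-plan failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

   In cluster mode, the same shape would hold if the thread pool were replaced by whatever dispatches sub-plans to remote nodes; only the execution step changes, while split and merge stay in the planner.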
   
   



