[PR] [IOTDB-6267] Load 2.0 [iotdb]

via GitHub Tue, 12 Dec 2023 23:56:54 -0800


yschengzi opened a new pull request, #11705:
URL: https://github.com/apache/iotdb/pull/11705


   - MPP Load has stability problems, is not fully compatible with the Pipe 
system, and leaves room for optimizing loading speed.
   Issue 1: No upper bound on the total memory used when multiple Load 
statements are executed concurrently.
   - Currently, Load strictly limits only the memory used by a single Load 
statement during its execution life cycle.
   - When a large number of Load statements are executed concurrently, their 
combined memory usage is therefore unbounded.
   - Please refer to MPP Load memory footprint for the memory usage during the 
execution life cycle of a single Load statement.
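
   One way to bound the combined usage described in Issue 1 is a process-wide 
budget that every Load statement reserves from before it buffers data. The 
following is a minimal sketch under that assumption, not the implementation in 
this PR; the class name LoadMemoryBudget and all of its methods are 
hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical budget shared by all concurrently running Load statements. */
final class LoadMemoryBudget {
    private final long capacityBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    LoadMemoryBudget(long capacityBytes) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacityBytes must be positive");
        }
        this.capacityBytes = capacityBytes;
    }

    /** Reserves the given number of bytes, or returns false if the cap would be exceeded. */
    boolean tryReserve(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        while (true) {
            long current = usedBytes.get();
            // Compare against the remaining space to avoid overflow in current + bytes.
            if (bytes > capacityBytes - current) {
                return false;
            }
            if (usedBytes.compareAndSet(current, current + bytes)) {
                return true;
            }
            // Another thread changed the counter first; retry with the new value.
        }
    }

    /** Returns previously reserved bytes to the budget. */
    void release(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        usedBytes.addAndGet(-bytes);
    }

    long usedBytes() {
        return usedBytes.get();
    }
}
```

   A Load statement that fails to reserve memory could wait, retry, or fail 
fast; which of these is appropriate is a design decision this sketch leaves 
open.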
   Issue 2: New data added by Load is not properly recognized by the Pipe 
system.
   - The Pipe system currently attaches a ProgressIndex to all new data 
written to IoTDB (see the discussion of Key Issues in Pipe System Design and 
Implementation).
   - In the normal write process, the ProgressIndex is attached by the 
consensus layer.
   - However, Load's current two-phase commit does not go through the 
consensus layer, so the loaded data has no valid ProgressIndex and cannot be 
correctly recognized by the Pipe system when a task restarts.
   Issue 3: Too many steps run serially in the Load TsFile process.
   - In the LoadTsFileScheduler class, MPP Load 1.0 works as follows:
     - Iterate through the TsFiles to be loaded.
     - For each TsFile, split it first, then send the pieces, and start the 
second phase only after all sends have completed.
     - After the second phase completes, move on to the next TsFile.
   - Since splitting a TsFile may involve in-memory computation, disk IO 
capacity is not fully utilized while that computation runs.
   - Splitting a single TsFile and sending it via Thrift also run serially, so 
the load waits on disk IO and network IO in turn rather than overlapping them.
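
   The serial split-then-send flow in Issue 3 could in principle be overlapped 
with a bounded queue between a splitting thread and a sending thread. The 
sketch below illustrates that general producer/consumer technique only; it is 
not the design adopted by this PR, and LoadPipelineSketch, run, splitter, and 
sender are hypothetical names standing in for the real split and Thrift send 
steps.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch: split on one thread while the caller's thread sends. */
final class LoadPipelineSketch {

    /** Splits every file and sends every piece; returns the number of pieces sent. */
    static <F, P> int run(List<F> files, Function<F, List<P>> splitter,
                          Consumer<P> sender, int queueCapacity) {
        // The bounded queue caps how many split pieces are buffered in memory.
        BlockingQueue<Optional<P>> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicReference<RuntimeException> failure = new AtomicReference<>();

        Thread splitterThread = new Thread(() -> {
            try {
                try {
                    for (F file : files) {
                        for (P piece : splitter.apply(file)) {
                            queue.put(Optional.of(piece));
                        }
                    }
                } catch (RuntimeException e) {
                    failure.set(e);
                }
                // An empty Optional marks the end of the stream, even after a failure.
                queue.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        splitterThread.start();

        int sent = 0;
        try {
            while (true) {
                Optional<P> next = queue.take();
                if (!next.isPresent()) {
                    break;
                }
                sender.accept(next.get());
                sent++;
            }
            splitterThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending", e);
        } finally {
            // Unblocks the splitter if sending stopped early; harmless otherwise.
            splitterThread.interrupt();
        }
        if (failure.get() != null) {
            throw failure.get();
        }
        return sent;
    }
}
```

   With this shape, the splitter can work on the next TsFile while earlier 
pieces are still being sent, and the queue capacity bounds the extra memory 
this overlap costs.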
   
   Document link: https://apache-iotdb.feishu.cn/docx/UE9Od5caDoLoYJxt4Ptc4s0hnof


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
