marchpure opened a new pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799


    ### Why is this PR needed?
   There are two major performance bottlenecks in 'insert stage':
   1) Getting the LastModifyTime of stage files requires many accesses to OBS.
   2) Parallelism is not supported.
   
    ### What changes were proposed in this PR?
   1) Cache the LastModifyTime info when listing stage files.
   2) Support running 'insert stage' in parallel. A 'loading' tag is added to the
   stages being processed. Each concurrent 'insert stage' process chooses only stages
   that have no 'loading' tag, or whose loading has timed out, so different processes
   load different data. This prevents concurrent 'insert stage' processes from loading
   the same data. The 'loading' tag is simply an empty file whose name ends with the
   '.loading' suffix.
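   The two changes above can be illustrated with a minimal Java sketch. It is not the
   CarbonData implementation: it uses `java.nio.file` on a local directory in place of
   OBS, and every name in it (`StageClaimSketch`, `StageFile`, `listStages`, `tryClaim`,
   `claimStages`, `LOADING_SUFFIX`, the timeout values) is hypothetical. It assumes that
   (1) a stage's last-modified time is read once from the listing result and cached, and
   (2) a process claims a stage by creating an empty `<stage>.loading` file, skipping
   stages whose marker is younger than a timeout. Whether marker creation is atomic on
   an object store such as OBS is not something this sketch establishes.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.*;

public class StageClaimSketch {
    // Hypothetical marker suffix, mirroring the PR's empty ".loading" tag file.
    static final String LOADING_SUFFIX = ".loading";

    /** A stage file plus the last-modified time captured once, during listing. */
    static final class StageFile {
        final Path path;
        final long lastModifiedMillis;

        StageFile(Path path, long lastModifiedMillis) {
            this.path = path;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** One directory walk; each mtime comes with the listing attributes and is cached. */
    static List<StageFile> listStages(Path stageDir) throws IOException {
        List<StageFile> stages = new ArrayList<>();
        Files.walkFileTree(stageDir, EnumSet.noneOf(FileVisitOption.class), 1,
                new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        String name = file.getFileName().toString();
                        // Skip directories and the ".loading" marker files themselves.
                        if (attrs.isRegularFile() && !name.endsWith(LOADING_SUFFIX)) {
                            stages.add(new StageFile(file, attrs.lastModifiedTime().toMillis()));
                        }
                        return FileVisitResult.CONTINUE;
                    }
                });
        // Oldest first, using only the cached times (no further metadata requests).
        stages.sort((x, y) -> Long.compare(x.lastModifiedMillis, y.lastModifiedMillis));
        return stages;
    }

    /** Marker path for a stage file: "stage_1" -> "stage_1.loading". */
    static Path markerOf(Path stage) {
        return stage.resolveSibling(stage.getFileName().toString() + LOADING_SUFFIX);
    }

    /**
     * Tries to claim one stage. On a local file system, Files.createFile checks for
     * existence and creates the file as one atomic operation, so only one process wins.
     * A marker older than timeoutMillis is assumed to be left by a failed loader and
     * is taken over; this takeover is a check-then-act and is NOT race-free.
     */
    static boolean tryClaim(Path stage, long timeoutMillis, long nowMillis) throws IOException {
        Path marker = markerOf(stage);
        try {
            Files.createFile(marker);
            return true;
        } catch (FileAlreadyExistsException alreadyLoading) {
            long age = nowMillis - Files.getLastModifiedTime(marker).toMillis();
            if (age <= timeoutMillis) {
                return false; // another process is still loading this stage
            }
            Files.setLastModifiedTime(marker, FileTime.fromMillis(nowMillis)); // refresh, take over
            return true;
        }
    }

    /** Claims up to maxStages stages that are untagged or whose tag has timed out. */
    static List<Path> claimStages(Path stageDir, int maxStages, long timeoutMillis,
                                  long nowMillis) throws IOException {
        List<Path> claimed = new ArrayList<>();
        for (StageFile s : listStages(stageDir)) {
            if (claimed.size() >= maxStages) {
                break;
            }
            if (tryClaim(s.path, timeoutMillis, nowMillis)) {
                claimed.add(s.path);
            }
        }
        return claimed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("stages");
        for (int i = 0; i < 4; i++) {
            Files.createFile(dir.resolve("stage_" + i));
        }
        long now = System.currentTimeMillis();
        List<Path> first = claimStages(dir, 2, 60_000, now);  // simulated loader A
        List<Path> second = claimStages(dir, 2, 60_000, now); // simulated loader B
        System.out.println(first.size() + " " + second.size()
                + " overlap=" + !Collections.disjoint(first, second));
    }
}
```

   Running `main` prints `2 2 overlap=false`: the two simulated loaders split the four
   stages between them. Reclaiming a timed-out marker lets a stage abandoned by a crashed
   process be loaded again, but two processes that detect the same timeout at the same
   moment could both take it over, so a production version would need a stronger
   guarantee for that step.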
   
    ### Does this PR introduce any user interface change?
   NO
   
    ### Is any new testcase added?
   YES
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

