[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3935: [CARBONDATA-3993] Remove deletePartialLoadData in data loading process

2020-09-18 Thread GitBox


ajantha-bhat edited a comment on pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#issuecomment-694708624


   Agree with @Zhangshunyu and @akashrn5.
   a) When compaction retries, it reuses the same segment ID. If the stale 
files are not cleaned, this produces duplicate data.
   So, before this change, we need #3934 to be merged, which can use a unique 
segment ID for compaction retry.
   b) Please check and move the logic of `deletePartialLoadsInCompaction` into 
the clean files command instead of permanently removing it. If clean files 
doesn't have this logic, it may not be able to clean stale files.
   c) Also, if the purpose of this PR is to avoid accidental data loss, you need 
to handle `cleanStaleDeltaFiles` in `CarbonUpdateUtil.java` and also identify 
other places. Handling it in just one place will not guarantee that we cannot 
have data loss.
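
   To illustrate point (a), here is a minimal sketch of why a retry under the 
same segment ID can duplicate data once stale files are no longer deleted. 
This is not CarbonData code: the class, the method names, the segment IDs and 
the in-memory map standing in for the store are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not CarbonData code): the store maps a segment ID
// to the data files that were written under that segment.
class CompactionRetrySketch {

    // Simulates one compaction attempt writing `fileCount` files into a segment.
    static void writeFiles(Map<String, List<String>> store, String segmentId,
                           String attempt, int fileCount) {
        List<String> files = store.computeIfAbsent(segmentId, id -> new ArrayList<>());
        for (int i = 0; i < fileCount; i++) {
            files.add("part-" + i + "-" + attempt);
        }
    }

    // Retry reuses the segment ID and the stale file is never cleaned.
    static int retrySameId() {
        Map<String, List<String>> store = new HashMap<>();
        writeFiles(store, "segA", "attempt1", 1); // attempt failed after 1 of 2 files
        writeFiles(store, "segA", "attempt2", 2); // retry writes both files again
        return store.get("segA").size();          // stale file sits next to the new ones
    }

    // Retry uses a fresh segment ID; the stale file stays in the old segment.
    static int retryUniqueId() {
        Map<String, List<String>> store = new HashMap<>();
        writeFiles(store, "segA", "attempt1", 1); // attempt failed after 1 of 2 files
        writeFiles(store, "segB", "attempt2", 2); // retry writes into a new segment
        return store.get("segB").size();          // only the retry's own files
    }

    public static void main(String[] args) {
        System.out.println("same ID:   " + retrySameId() + " files in the segment");
        System.out.println("unique ID: " + retryUniqueId() + " files in the segment");
    }
}
```

   In this model the same-ID retry ends up with three files in the segment, 
one of them stale, while the unique-ID retry reads only its own two files. The 
stale file under the old ID still has to be removed by something, which is why 
point (b) asks for that logic to live in the clean files command.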



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



