[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3935: [CARBONDATA-3993] Remove deletePartialLoadData in data loading process
ajantha-bhat edited a comment on pull request #3935: URL: https://github.com/apache/carbondata/pull/3935#issuecomment-694708624

Agree with @Zhangshunyu and @akashrn5.

a) When compaction retries, it reuses the same segment ID, so if stale files are not cleaned, the result contains duplicate data. So, before this change, we need #3934 to be merged, which uses a unique segment ID for compaction retry.
b) Please check and move the logic of `deletePartialLoadsInCompaction` into the clean files command instead of removing it permanently. If clean files does not have this logic, it may not be able to clean the stale files.
c) Also, if the purpose of this PR is to avoid accidental data loss, you need to handle `cleanStaleDeltaFiles` in `CarbonUpdateUtil.java` and identify the other places that need the same handling. Handling it in just one place will not guarantee that we cannot have data loss.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
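To make point (a) concrete, here is a minimal sketch built on a toy in-memory model. It does not use CarbonData's real segment, table-status, or compaction APIs. Every name here is hypothetical: `CompactionRetrySketch`, `compact`, `read`, and the segment IDs "0.1"/"0.2". The sketch assumes that a failed attempt leaves its partially written files behind and that a reader scans every file under a successful segment. Under those assumptions, retrying into the same segment ID duplicates rows, while a fresh segment ID per attempt does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CompactionRetrySketch {
    // Hypothetical model: segment id -> data files written under that segment's folder.
    static Map<String, List<String>> segmentFiles = new HashMap<>();
    // Segments committed as successful in a hypothetical table status.
    static Set<String> successSegments = new HashSet<>();

    // Writes one file per merged row; simulates a crash at row index failAfter (use -1 for no failure).
    static boolean compact(String segmentId, List<String> rows, int failAfter) {
        List<String> files = segmentFiles.computeIfAbsent(segmentId, k -> new ArrayList<>());
        for (int i = 0; i < rows.size(); i++) {
            if (i == failAfter) {
                return false; // files written so far stay on disk as stale files
            }
            files.add(rows.get(i));
        }
        successSegments.add(segmentId);
        return true;
    }

    // A reader scans every file of each successful segment.
    static List<String> read() {
        List<String> result = new ArrayList<>();
        for (String id : successSegments) {
            result.addAll(segmentFiles.getOrDefault(id, List.of()));
        }
        return result;
    }

    static void reset() {
        segmentFiles.clear();
        successSegments.clear();
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3");

        // Retry reuses segment id "0.1" and the stale files are not cleaned.
        compact("0.1", rows, 2);
        compact("0.1", rows, -1);
        System.out.println("same id:   " + read()); // [r1, r2, r1, r2, r3]

        reset();
        // Each attempt gets a fresh segment id; the failed one is never committed.
        compact("0.1", rows, 2);
        compact("0.2", rows, -1);
        System.out.println("unique id: " + read()); // [r1, r2, r3]
    }
}
```

In this toy model the unique-ID retry leaves the failed attempt's files orphaned rather than deleted. That is why point (b) still matters: some cleanup path, such as clean files, has to remove them eventually.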