[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-25 Thread GitBox


Pickupolddriver commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r511705764



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##
@@ -227,34 +224,9 @@ public static boolean deleteLoadFoldersFromFileSystem(
 if (details != null && details.length != 0) {
   for (LoadMetadataDetails oneLoad : details) {
 if (checkIfLoadCanBeDeleted(oneLoad, isForceDelete)) {
-  ICarbonLock segmentLock = 
CarbonLockFactory.getCarbonLockObj(absoluteTableIdentifier,

Review comment:
   Actually, this part of the modifications comes from #3971 contributed by 
@Indhumathi27 .  It removes the condition of insert_in_progress and lock 
acquirement, I think it may delete the segments which are in progress. So maybe 
I can revoke this modification. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-25 Thread GitBox


Pickupolddriver commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r511703767



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
##
@@ -323,10 +320,6 @@ public static boolean 
recordNewLoadMetadata(LoadMetadataDetails newMetaEntry,
 for (LoadMetadataDetails entry : listOfLoadFolderDetails) {

Review comment:
   Seems in `insertOverwrite` process, the stale data can be removed 
immediately.  I think we can keep the original `addToStaleFolders` function here





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-25 Thread GitBox


Pickupolddriver commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r511703719



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
##
@@ -323,10 +320,6 @@ public static boolean 
recordNewLoadMetadata(LoadMetadataDetails newMetaEntry,
 for (LoadMetadataDetails entry : listOfLoadFolderDetails) {

Review comment:
   > Let's discuss insertOverwrite behavior.
   > Should we delete old data immediately?
   
   Seems in `insertOverwrite` process, the stale data can be removed 
immediately.  I think we can keep the original `addToStaleFolders` function here





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-25 Thread GitBox


Pickupolddriver commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r511703571



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
##
@@ -323,10 +320,6 @@ public static boolean 
recordNewLoadMetadata(LoadMetadataDetails newMetaEntry,
 for (LoadMetadataDetails entry : listOfLoadFolderDetails) {

Review comment:
   Seems in `insertOverwrite` process, the stale data can be removed 
immediately.  I think we can keep the original `addToStaleFolders` function here
   
   > Let's discuss insertOverwrite behavior.
   > Should we delete old data immediately?
   
   Seems in `insertOverwrite` process, the stale data can be removed 
immediately.  I think we can keep the original `addToStaleFolders` function here





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-14 Thread GitBox


Pickupolddriver commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r504537575



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -267,9 +266,8 @@ object CarbonDataRDDFactory {
 throw new Exception("Exception in compaction " + 
exception.getMessage)
   }
 } finally {
-  executor.shutdownNow()
   try {
-compactor.deletePartialLoadsInCompaction()

Review comment:
   @ajantha-bhat after the merge of #3978 , cleanStaleDeltaFiles will only 
be called when there are exceptions during the update process. And it will only 
delete the delta and index file created by the exception update.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-13 Thread GitBox


Pickupolddriver commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r504382333



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForDeleteCommand.scala
##
@@ -108,9 +108,6 @@ private[sql] case class CarbonProjectForDeleteCommand(
   }
   val executorErrors = ExecutionErrors(FailureCauses.NONE, "")
 
-  // handle the clean up of IUD.
-  CarbonUpdateUtil.cleanUpDeltaFiles(carbonTable, false)

Review comment:
   Yes, the purpose of PR is to prevent delete data by accident, after 
checking the code of `cleanUpDeltaFiles` function, I think it's safe to keep it 
here since it will only delete the files in the aborted scenario and it's 
crucial for data accuracy. I will revert removing this function.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-09 Thread GitBox


Pickupolddriver commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r502735735



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -267,9 +266,8 @@ object CarbonDataRDDFactory {
 throw new Exception("Exception in compaction " + 
exception.getMessage)
   }
 } finally {
-  executor.shutdownNow()
   try {
-compactor.deletePartialLoadsInCompaction()

Review comment:
   (a). My PR will dependent on #3934 , if it merged first we don't need to 
worry about the same segment ID in compaction situations. So it the 
deletePartialLoadsInCompaction could be deleted. 
   
   (b). In this PR's design, the stale files would not be deleted 
automatically. Users need to call cleanFiles function to clean them besides PR 
#3917 can handle the clean files function.
   
   (c). More places of calling claenStaleFiles have been removed. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org