[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
CarbonDataQA2 commented on pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#issuecomment-738628213 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3306/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
CarbonDataQA2 commented on pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#issuecomment-738625327 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5065/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress seg
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738625173 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3302/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress seg
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738624282 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5061/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator
CarbonDataQA2 commented on pull request #4039: URL: https://github.com/apache/carbondata/pull/4039#issuecomment-738619646 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3300/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator
CarbonDataQA2 commented on pull request #4039: URL: https://github.com/apache/carbondata/pull/4039#issuecomment-738618787 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5059/
[jira] [Created] (CARBONDATA-4070) Handle the scenario mentioned in description for SI.
Nihal kumar ojha created CARBONDATA-4070: Summary: Handle the scenario mentioned in description for SI. Key: CARBONDATA-4070 URL: https://issues.apache.org/jira/browse/CARBONDATA-4070 Project: CarbonData Issue Type: Bug Reporter: Nihal kumar ojha
# SI creation should not be allowed on an SI table.
# An SI table should not be scanned with a like filter on the MT.
# Drop column should not be allowed on an SI table.
Add FTs for all of the above scenarios and the sort-column-related scenarios. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] nihal0107 opened a new pull request #4042: [CARBONDATA-4069] handled set streaming for SI table or table having SI.
nihal0107 opened a new pull request #4042: URL: https://github.com/apache/carbondata/pull/4042
### Why is this PR needed? Currently `alter table set streaming = true` is allowed on an SI table and on a main table (MT) that has an SI.
### What changes were proposed in this PR? Setting streaming to true on an SI table, or on an MT that has an SI, is now handled and throws an exception.
### Does this PR introduce any user interface change? - No
### Is any new testcase added? - Yes
[jira] [Created] (CARBONDATA-4069) Alter table set streaming=true should not be allowed on SI table or table having SI.
Nihal kumar ojha created CARBONDATA-4069: Summary: Alter table set streaming=true should not be allowed on SI table or table having SI. Key: CARBONDATA-4069 URL: https://issues.apache.org/jira/browse/CARBONDATA-4069 Project: CarbonData Issue Type: Bug Reporter: Nihal kumar ojha
# Create a carbon table and an SI.
# Now set streaming = true on either the SI table or the main table. Neither operation should be allowed, because SI is not supported on streaming tables.
create table maintable2 (a string,b string,c int) STORED AS carbondata;
insert into maintable2 values('k','x',2);
create index m_indextable on table maintable2(b) AS 'carbondata';
ALTER TABLE maintable2 SET TBLPROPERTIES('streaming'='true'); => operation should not be allowed.
ALTER TABLE m_indextable SET TBLPROPERTIES('streaming'='true'); => operation should not be allowed.
[GitHub] [carbondata] nihal0107 opened a new pull request #4041: [CARBONDATA-4068] handled set long string column on SI table.
nihal0107 opened a new pull request #4041: URL: https://github.com/apache/carbondata/pull/4041
### Why is this PR needed? Currently, setting a column as a long string is allowed even when an SI is already created on it. This operation should not be allowed, because SI doesn't support long string columns.
### What changes were proposed in this PR? Setting long string on SI columns is now handled and throws an exception.
### Does this PR introduce any user interface change? - No
### Is any new testcase added? - Yes
[jira] [Created] (CARBONDATA-4068) Alter table set long string should not allowed on SI column.
Nihal kumar ojha created CARBONDATA-4068: Summary: Alter table set long string should not allowed on SI column. Key: CARBONDATA-4068 URL: https://issues.apache.org/jira/browse/CARBONDATA-4068 Project: CarbonData Issue Type: Bug Reporter: Nihal kumar ojha
# Create a table and create an SI.
# Now try to set the data type to long string on the column on which the SI is created. The operation should not be allowed, because SI is not supported on long string columns.
create table maintable (a string,b string,c int) STORED AS carbondata;
create index indextable on table maintable(b) AS 'carbondata';
insert into maintable values('k','x',2);
ALTER TABLE maintable SET TBLPROPERTIES('long_String_columns'='b');
[GitHub] [carbondata] akkio-97 commented on a change in pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command
akkio-97 commented on a change in pull request #4001: URL: https://github.com/apache/carbondata/pull/4001#discussion_r535857150
## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
## @@ -89,8 +89,13 @@ case class CarbonAddLoadCommand( throw new ConcurrentOperationException(carbonTable, "insert overwrite", "add segment") }
-val inputPath = options.getOrElse(
+var givenPath = options.getOrElse(
"path", throw new UnsupportedOperationException("PATH is mandatory"))
+// remove file separator if already present
+if (givenPath.charAt(givenPath.length - 1) == '/') {
Review comment: done
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4040: [WIP][CI TEST]
CarbonDataQA2 commented on pull request #4040: URL: https://github.com/apache/carbondata/pull/4040#issuecomment-738578591 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3298/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4040: [WIP][CI TEST]
CarbonDataQA2 commented on pull request #4040: URL: https://github.com/apache/carbondata/pull/4040#issuecomment-738576741 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5057/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command
CarbonDataQA2 commented on pull request #4032: URL: https://github.com/apache/carbondata/pull/4032#issuecomment-738549058 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3297/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command
CarbonDataQA2 commented on pull request #4032: URL: https://github.com/apache/carbondata/pull/4032#issuecomment-738546905 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5056/
[GitHub] [carbondata] Zhangshunyu opened a new pull request #4040: [WIP][CI TEST]
Zhangshunyu opened a new pull request #4040: URL: https://github.com/apache/carbondata/pull/4040
### Why is this PR needed?
### What changes were proposed in this PR?
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
### Is any new testcase added?
- No
- Yes
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-738528336 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3296/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-738527880 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5055/
[jira] [Resolved] (CARBONDATA-4054) Size control of minor compaction
[ https://issues.apache.org/jira/browse/CARBONDATA-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-4054. Fix Version/s: 2.2.0 Resolution: Fixed
> Size control of minor compaction
> Key: CARBONDATA-4054
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4054
> Project: CarbonData
> Issue Type: Improvement
> Reporter: ZHANGSHUNYU
> Priority: Major
> Fix For: 2.2.0
> Time Spent: 10h
> Remaining Estimate: 0h
> Currently, minor compaction only considers the number of segments, and major compaction only considers the total size of segments. But consider a scenario where the user wants to use minor compaction by the number of segments, yet doesn't want to merge segments whose data size is larger than a threshold, for example 2GB, as there is no need to merge such big segments and it is time-costly. So we need a parameter to control the size threshold for segments included in minor compaction, so that the user can exclude a segment from minor compaction once its data size exceeds the threshold; of course, a default value must be there.
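The size control described in this issue amounts to a pre-filter over the candidate segments before minor compaction runs. The sketch below is a hypothetical, simplified stand-in, not CarbonData's actual compaction code; the class, names, and the "threshold <= 0 disables the check" convention are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class MinorCompactionFilter {
    // Hypothetical segment descriptor: id and on-disk data size in bytes.
    static final class Segment {
        final String id;
        final long sizeBytes;
        Segment(String id, long sizeBytes) {
            this.id = id;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Keep only segments whose data size is below the threshold, so
     * oversized segments are skipped by minor compaction. A threshold of
     * zero or less disables the size check (the pre-change behaviour,
     * where all candidate segments were merged).
     */
    static List<Segment> eligibleForMinorCompaction(List<Segment> segments, long thresholdBytes) {
        if (thresholdBytes <= 0) {
            return new ArrayList<>(segments);
        }
        List<Segment> eligible = new ArrayList<>();
        for (Segment s : segments) {
            if (s.sizeBytes < thresholdBytes) {
                eligible.add(s);
            }
        }
        return eligible;
    }
}
```

With a 2GB threshold, a 3GB segment would simply be left out of the merge list while smaller segments are still compacted by count, which matches the behaviour the reporter asks for.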
[GitHub] [carbondata] asfgit closed pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
asfgit closed pull request #4020: URL: https://github.com/apache/carbondata/pull/4020
[GitHub] [carbondata] Zhangshunyu commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command
Zhangshunyu commented on pull request #4032: URL: https://github.com/apache/carbondata/pull/4032#issuecomment-738502710 Added support for inserting literals. TODO: add docs and more test cases.
[GitHub] [carbondata] shenjiayu17 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
shenjiayu17 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-738483684 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress seg
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738280044
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator
CarbonDataQA2 commented on pull request #4039: URL: https://github.com/apache/carbondata/pull/4039#issuecomment-738201557 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3293/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator
CarbonDataQA2 commented on pull request #4039: URL: https://github.com/apache/carbondata/pull/4039#issuecomment-738200415 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5052/
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale
vikramahuja1001 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535465503
## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
## @@ -1095,7 +1099,8 @@ public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean
// if execute command 'clean files' or the number of invisible segment info
// exceeds the value of 'carbon.invisible.segments.preserve.count',
// it need to append the invisible segment list to 'tablestatus.history' file.
-if (isForceDeletion || (invisibleSegmentCnt > invisibleSegmentPreserveCnt)) {
+if (cleanStaleInprogress || cleanCompactedAndMFD || (invisibleSegmentCnt >
Review comment: Changed it to isCleanFilesOperation. If we do not keep this, we will change the behaviour of show history segments, as that file would never be updated; previously clean files was always a force operation, so clean files would always enter that if condition.
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale
vikramahuja1001 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535287432
## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } }
- private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+ private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean
Review comment: I have put it back.
## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
## @@ -173,40 +176,68 @@ public boolean accept(CarbonFile file) { }
private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
- boolean isForceDelete) {
-if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() ||
-SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() ||
-SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() ||
-SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus())
-&& oneLoad.getVisibility().equalsIgnoreCase("true")) {
- if (isForceDelete) {
-return true;
- }
- long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
- return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil.isMaxQueryTimeoutExceeded(deletionTime);
+ boolean cleanStaleInProgress, boolean cleanCompactedAndMFD) {
+if (oneLoad.getVisibility().equalsIgnoreCase("true")) {
Review comment: It's handled in the checkIfLoadCanBeDeletedPhysically method; it does not check visibility before deleting.
## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
## @@ -117,9 +119,9 @@ case class CarbonCleanFilesCommand( CleanFilesUtil.cleanStaleSegments(carbonTable) } if (forceTableClean) {
Review comment: It was removed in PR #4013 by David.
## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } }
- private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+ private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean
Review comment: Updated as per the above discussion.
## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } }
- private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
Review comment: done
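The review thread above is about which segment states clean files may physically delete, and under which flags. The following is a deliberately simplified sketch of that decision, not the real `DeleteLoadFolders.checkIfLoadCanBeDeleted` code: the enum values, flag names, and the mapping of states to flags are assumptions made for illustration, and the real method also consults visibility and trash-retention timeouts.

```java
public class CleanFilesPolicy {
    // Simplified stand-in for CarbonData's segment status values.
    enum SegmentStatus { SUCCESS, MARKED_FOR_DELETE, COMPACTED, INSERT_IN_PROGRESS }

    /**
     * Hypothetical eligibility check: MARKED_FOR_DELETE and COMPACTED
     * segments are removable only when cleanCompactedAndMFD is set, stale
     * in-progress segments only when cleanStaleInProgress is set, and
     * SUCCESS segments are never removed.
     */
    static boolean canBeDeleted(SegmentStatus status, boolean cleanCompactedAndMFD,
                                boolean cleanStaleInProgress) {
        switch (status) {
            case MARKED_FOR_DELETE:
            case COMPACTED:
                return cleanCompactedAndMFD;
            case INSERT_IN_PROGRESS:
                return cleanStaleInProgress;
            default:
                return false; // SUCCESS and any other state stay untouched
        }
    }
}
```

Splitting one `isForceDeletion` flag into two per-state flags, as the PR does, lets the command clean MFD/Compacted segments and stale in-progress segments independently instead of all-or-nothing.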
[GitHub] [carbondata] Indhumathi27 opened a new pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator
Indhumathi27 opened a new pull request #4039: URL: https://github.com/apache/carbondata/pull/4039
### Why is this PR needed?
### What changes were proposed in this PR?
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
### Is any new testcase added?
- No
- Yes
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress seg
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738128556 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3291/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress seg
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738128162 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5049/
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
ajantha-bhat commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535392789
## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
## @@ -90,7 +90,6 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel, while (loadsToMerge.size() > 1 || needSortSingleSegment(loadsToMerge)) { val lastSegment = sortedSegments.get(sortedSegments.size() - 1)
- deletePartialLoadsInCompaction()
Review comment: I think this will go as a stale segment into the trash folder when clean files is called, right?
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
ajantha-bhat commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535391315
## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala
## @@ -210,12 +210,6 @@ private[sql] case class DropIndexCommand( logError("Table metadata unlocking is unsuccessful, index table may be in stale state") } }
- // in case if the the physical folders still exists for the index table
- // but the carbon and hive info for the index table is removed,
- // DROP INDEX IF EXISTS should clean up those physical directories
- if (ifExistsSet && carbonTable.isEmpty) {
Review comment: I agree with @akashrn5: if the code is buggy, we need to fix it. If we remove it, it might bring back the original issues that caused this code change.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535385941
## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala
## @@ -346,4 +346,10 @@ class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll { sql(s"""INSERT INTO CLEANTEST SELECT "abc", 1, "name) }
+ def removeSegmentEntryFromTableStatusFile(carbonTable: CarbonTable, segmentNo: String) : Unit = {
Review comment: @vikramahuja1001: yes, instead of defining it in each file, move it to some test util.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
CarbonDataQA2 commented on pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#issuecomment-738119523 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3292/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
CarbonDataQA2 commented on pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#issuecomment-738118531 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5050/
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command
ajantha-bhat commented on a change in pull request #4001: URL: https://github.com/apache/carbondata/pull/4001#discussion_r535365912
## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
## @@ -89,8 +89,13 @@ case class CarbonAddLoadCommand( throw new ConcurrentOperationException(carbonTable, "insert overwrite", "add segment") }
-val inputPath = options.getOrElse(
+var givenPath = options.getOrElse(
"path", throw new UnsupportedOperationException("PATH is mandatory"))
+// remove file separator if already present
+if (givenPath.charAt(givenPath.length - 1) == '/') {
+ givenPath = givenPath.substring(0, givenPath.length - 1)
+}
+val inputPath = givenPath
Review comment: Can you modify the current add segment test case so that one path has / at the end and one doesn't? Then the test case will cover the new code also.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command
ajantha-bhat commented on a change in pull request #4001: URL: https://github.com/apache/carbondata/pull/4001#discussion_r535365097
## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
## @@ -89,8 +89,13 @@ case class CarbonAddLoadCommand( throw new ConcurrentOperationException(carbonTable, "insert overwrite", "add segment") }
-val inputPath = options.getOrElse(
+var givenPath = options.getOrElse(
"path", throw new UnsupportedOperationException("PATH is mandatory"))
+// remove file separator if already present
+if (givenPath.charAt(givenPath.length - 1) == '/') {
Review comment:
```suggestion
if (givenPath.charAt(givenPath.length - 1) == 'CarbonCommonConstans.FILE_SEPARATOR') {
```
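Note that the quoted suggestion, as written, compares a char against a quoted name (which would not compile), and `CarbonCommonConstans` looks like a typo for `CarbonCommonConstants`. A compile-clean version of the intended trim, shown here as a standalone Java sketch rather than the actual CarbonAddLoadCommand code (the class and method names below are hypothetical), would be:

```java
public class AddSegmentPath {
    // Stand-in for the project's CarbonCommonConstants.FILE_SEPARATOR constant.
    static final String FILE_SEPARATOR = "/";

    /**
     * Drop a single trailing separator so that paths given as
     * "/tmp/seg1/" and "/tmp/seg1" are treated the same. The root
     * path "/" is left untouched.
     */
    static String normalize(String givenPath) {
        if (givenPath.length() > 1 && givenPath.endsWith(FILE_SEPARATOR)) {
            return givenPath.substring(0, givenPath.length() - 1);
        }
        return givenPath;
    }
}
```

Using `endsWith` with the named constant avoids both the char-vs-string mismatch and an out-of-bounds `charAt` on an empty path.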
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogr
akashrn5 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535359638 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala ## @@ -53,13 +51,15 @@ class CleanFilesPostEventListener extends OperationEventListener with Logging { val carbonTable = cleanFilesPostEvent.carbonTable val indexTables = CarbonIndexUtil .getIndexCarbonTables(carbonTable, cleanFilesPostEvent.sparkSession) +val force = cleanFilesPostEvent.ifForceDeletion Review comment: ```suggestion val isForceDelete = cleanFilesPostEvent.ifForceDeletion ```
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogr
akashrn5 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535359069 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala ## @@ -58,8 +58,11 @@ case class CarbonCleanFilesCommand( var carbonTable: CarbonTable = _ var cleanFileCommands: List[CarbonCleanFilesCommand] = List.empty val optionsMap = options.getOrElse(List.empty[(String, String)]).toMap - // forceClean will empty trash + // forceClean will clean the MFD and Compacted segments immediately and also empty the trash + // folder val forceClean = optionsMap.getOrElse("force", "false").toBoolean + // stale_inprogress will clean the In Progress segments Review comment: Update the comment to say that it cleans based on the retention time, and that it cleans immediately when force is true.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535358582 ## File path: docs/clean-files.md ## @@ -37,10 +38,23 @@ Clean files command is used to remove the Compacted, Marked For Delete ,In Progr ``` Once the timestamp subdirectory is expired as per the configured expiration day value, that subdirectory is deleted from the trash folder in the subsequent clean files command. -### FORCE DELETE TRASH -The force option with clean files command deletes all the files and folders from the trash folder. +### FORCE OPTION +The force option with clean files command deletes all the files and folders from the trash folder and deletes the Marked For Delete and Compacted segments immediately. ``` CLEAN FILES FOR TABLE TABLE_NAME options('force'='true') ``` -Since Clean Files operation with force option will delete data that can never be recovered, the force option by default is disabled. Clean files with force option is only allowed when the carbon property ```carbon.clean.file.force.allowed``` is set to true. The default value of this property is false. \ No newline at end of file +Since the Clean Files operation with the force option will delete data that can never be recovered, the force option is disabled by default. Clean files with the force option is only allowed when the carbon property ```carbon.clean.file.force.allowed``` is set to true. The default value of this property is false. Review comment: Move this section above the force section, since users have to enable this property before trying force.
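The gating described in this doc section can be sketched as follows. This is a minimal Java sketch under stated assumptions: the class and method names are illustrative; only the `carbon.clean.file.force.allowed` property and the `force` option names come from the text above.

```java
import java.util.Map;

// Hedged sketch: the force option is honored only when the
// carbon.clean.file.force.allowed carbon property is set to true.
// CleanFilesGuard and isForceDeleteAllowed are illustrative names.
public class CleanFilesGuard {
    public static boolean isForceDeleteAllowed(Map<String, String> carbonProperties,
            Map<String, String> commandOptions) {
        boolean allowed = Boolean.parseBoolean(
            carbonProperties.getOrDefault("carbon.clean.file.force.allowed", "false"));
        boolean requested = Boolean.parseBoolean(
            commandOptions.getOrDefault("force", "false"));
        // both the property and the command option must be true
        return allowed && requested;
    }
}
```

Both defaults are `"false"`, matching the doc's statement that force is disabled out of the box.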
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535357362 ## File path: docs/clean-files.md ## @@ -24,6 +24,7 @@ Clean files command is used to remove the Compacted, Marked For Delete ,In Progr ``` CLEAN FILES FOR TABLE TABLE_NAME ``` +The above clean files command will clean Marked For Delete and Compacted segments depending on the query timeout (default 1 hr) and the trash retention timeout (default 7 days). It will also delete the timestamp subdirectories from the trash folder after the expiration period (default 7 days, configurable). Review comment: query time, trash retention timeout => use the original carbon property names
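The two timeouts mentioned in the doc line above combine as an AND: a segment becomes cleanable only once both have elapsed since its deletion/modification timestamp. A hedged sketch follows; the constant values mirror the stated defaults and the names are illustrative, not CarbonData identifiers.

```java
// Hedged sketch of the dual-timeout rule: a segment is cleanable only after
// BOTH the trash retention timeout (default 7 days) and the max query
// timeout (default 1 hour) have elapsed. Names here are illustrative.
public class RetentionCheck {
    static final long DEFAULT_RETENTION_MILLIS = 7L * 24 * 60 * 60 * 1000; // 7 days
    static final long DEFAULT_QUERY_TIMEOUT_MILLIS = 60L * 60 * 1000;      // 1 hour

    public static boolean isExpired(long deletionTime, long now,
            long retentionMillis, long queryTimeoutMillis) {
        long age = now - deletionTime;
        // both timeouts must be exceeded before cleanup is permitted
        return age > retentionMillis && age > queryTimeoutMillis;
    }
}
```

With retention set to 0 days, only the 1-hour query timeout remains in effect, which matches the "0 days / 1 hr" example given elsewhere in this thread.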
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535356095 ## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ## @@ -173,42 +176,73 @@ public boolean accept(CarbonFile file) { } private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad, - boolean isForceDelete) { -if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() || -SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || -SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || -SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus()) -&& oneLoad.getVisibility().equalsIgnoreCase("true")) { - if (isForceDelete) { -return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil - .isMaxQueryTimeoutExceeded(deletionTime); + boolean cleanStaleInProgress, boolean isForceDeletion) { +if (oneLoad.getVisibility().equalsIgnoreCase("true")) { + return checkLoadDeletionLogic(oneLoad, isForceDeletion, cleanStaleInProgress); } - return false; } private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad, - boolean isForceDelete) { + boolean isForceDeletion, boolean cleanStaleInProgress) { // Check if the segment is added externally and path is set then do not delete it -if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() -|| SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null -|| oneLoad.getPath().equalsIgnoreCase("NA"))) { - if (isForceDelete) { -return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); - - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil - .isMaxQueryTimeoutExceeded(deletionTime); - +if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) { + return checkLoadDeletionLogic(oneLoad, 
isForceDeletion, cleanStaleInProgress); } - return false; } + private static Boolean checkLoadDeletionLogic(LoadMetadataDetails oneLoad, + boolean isForceDeletion, boolean cleanStaleInProgress) { +/* + * if cleanStaleInProgress == false and force == false, clean MFD and Compacted + * segments will depend on query timeout(1 hr) and trashRetentionTimeout(7 days, default). + * For example: + * If trashRetentionTimeout is 7 days and query timeout is 1 hr--> Delete after 7 days + * If trashRetentionTimeout is 0 days and query timeout is 1 hr--> Delete after 1 hr + * + * if cleanStaleInProgress == false and force == true, clean MFD and Compacted + * segments immediately(Do not check for any timeout) + * + * if cleanStaleInProgress == true and force == false, clean Stale Inprogress, MFD and + * compacted segments after 7 days(taking carbon.trash.retention.time value) + * + * if cleanStaleInProgress == true and force == true, clean MFD, Compacted and + * stale inprogress segments immediately.(Do not check for any timeout) + */ +if (isForceDeletion) { + // immediately delete compacted and MFD + if (cleanStaleInProgress) { +// immediately delete inprogress segments too +return SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || SegmentStatus +.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus() || SegmentStatus +.COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus.MARKED_FOR_DELETE == +oneLoad.getSegmentStatus(); + } + return SegmentStatus.COMPACTED == + oneLoad.getSegmentStatus() || SegmentStatus.MARKED_FOR_DELETE == oneLoad + .getSegmentStatus(); +} +long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); +// in case there is no deletion or modification timestamp, take the load start time as +// deleteTime +if (deletionTime == 0) { + deletionTime = oneLoad.getLoadStartTime(); +} +if (cleanStaleInProgress) { + // delete MFD, compacted and Inprogress segments after trash timeout + return (SegmentStatus.INSERT_IN_PROGRESS == 
oneLoad.getSegmentStatus() || SegmentStatus Review comment: same as above
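The four cases spelled out in the comment block of the diff above can be condensed into a single predicate. This is a hedged Java sketch: `SegmentState` and `canDelete` are stand-ins for CarbonData's `SegmentStatus` and `checkLoadDeletionLogic`, and the two real timeout checks are abstracted into one `timeoutsExceeded` boolean.

```java
// Hedged sketch of the force/cleanStaleInProgress decision table described
// in the diff's comment block. Not the actual checkLoadDeletionLogic code.
public class DeletionDecision {
    public enum SegmentState {
        MARKED_FOR_DELETE, COMPACTED,
        INSERT_IN_PROGRESS, INSERT_OVERWRITE_IN_PROGRESS, SUCCESS
    }

    public static boolean canDelete(SegmentState state, boolean force,
            boolean cleanStaleInProgress, boolean timeoutsExceeded) {
        boolean mfdOrCompacted = state == SegmentState.MARKED_FOR_DELETE
            || state == SegmentState.COMPACTED;
        boolean inProgress = state == SegmentState.INSERT_IN_PROGRESS
            || state == SegmentState.INSERT_OVERWRITE_IN_PROGRESS;
        if (force) {
            // force ignores all timeouts; in-progress segments additionally
            // need the stale_inprogress flag
            return mfdOrCompacted || (cleanStaleInProgress && inProgress);
        }
        // without force, every eligible status also waits for the timeouts
        return (mfdOrCompacted || (cleanStaleInProgress && inProgress))
            && timeoutsExceeded;
    }
}
```

This also reflects the reviewer's restructuring suggestion: the MFD/Compacted check is evaluated first, and only the in-progress statuses are guarded by `cleanStaleInProgress`.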
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535355003 ## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ## @@ -173,42 +176,73 @@ public boolean accept(CarbonFile file) { } private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad, - boolean isForceDelete) { -if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() || -SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || -SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || -SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus()) -&& oneLoad.getVisibility().equalsIgnoreCase("true")) { - if (isForceDelete) { -return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil - .isMaxQueryTimeoutExceeded(deletionTime); + boolean cleanStaleInProgress, boolean isForceDeletion) { +if (oneLoad.getVisibility().equalsIgnoreCase("true")) { + return checkLoadDeletionLogic(oneLoad, isForceDeletion, cleanStaleInProgress); } - return false; } private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad, - boolean isForceDelete) { + boolean isForceDeletion, boolean cleanStaleInProgress) { // Check if the segment is added externally and path is set then do not delete it -if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() -|| SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null -|| oneLoad.getPath().equalsIgnoreCase("NA"))) { - if (isForceDelete) { -return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); - - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil - .isMaxQueryTimeoutExceeded(deletionTime); - +if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) { + return checkLoadDeletionLogic(oneLoad, 
isForceDeletion, cleanStaleInProgress); } - return false; } + private static Boolean checkLoadDeletionLogic(LoadMetadataDetails oneLoad, + boolean isForceDeletion, boolean cleanStaleInProgress) { +/* + * if cleanStaleInProgress == false and force == false, clean MFD and Compacted + * segments will depend on query timeout(1 hr) and trashRetentionTimeout(7 days, default). + * For example: + * If trashRetentionTimeout is 7 days and query timeout is 1 hr--> Delete after 7 days + * If trashRetentionTimeout is 0 days and query timeout is 1 hr--> Delete after 1 hr + * + * if cleanStaleInProgress == false and force == true, clean MFD and Compacted + * segments immediately(Do not check for any timeout) + * + * if cleanStaleInProgress == true and force == false, clean Stale Inprogress, MFD and + * compacted segments after 7 days(taking carbon.trash.retention.time value) + * + * if cleanStaleInProgress == true and force == true, clean MFD, Compacted and + * stale inprogress segments immediately.(Do not check for any timeout) + */ +if (isForceDeletion) { + // immediately delete compacted and MFD + if (cleanStaleInProgress) { +// immediately delete inprogress segments too +return SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || SegmentStatus +.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus() || SegmentStatus +.COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus.MARKED_FOR_DELETE == Review comment: move COMPACTED and MARKED_FOR_DELETE above and return if true. check only INSERT_OVERWRITE_IN_PROGRESS, INSERT_IN_PROGRESS in INSERT_IN_PROGRESS This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535348536 ## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ## @@ -173,42 +176,73 @@ public boolean accept(CarbonFile file) { } private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad, - boolean isForceDelete) { -if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() || -SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || -SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || -SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus()) -&& oneLoad.getVisibility().equalsIgnoreCase("true")) { - if (isForceDelete) { -return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil - .isMaxQueryTimeoutExceeded(deletionTime); + boolean cleanStaleInProgress, boolean isForceDeletion) { +if (oneLoad.getVisibility().equalsIgnoreCase("true")) { + return checkLoadDeletionLogic(oneLoad, isForceDeletion, cleanStaleInProgress); } - return false; } private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad, - boolean isForceDelete) { + boolean isForceDeletion, boolean cleanStaleInProgress) { // Check if the segment is added externally and path is set then do not delete it -if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() -|| SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null -|| oneLoad.getPath().equalsIgnoreCase("NA"))) { - if (isForceDelete) { -return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); - - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil - .isMaxQueryTimeoutExceeded(deletionTime); - +if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) { + return checkLoadDeletionLogic(oneLoad, 
isForceDeletion, cleanStaleInProgress); Review comment: ```suggestion return canDeleteThisLoad(oneLoad, isForceDeletion, cleanStaleInProgress); ```
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
akashrn5 commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535345986 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala ## @@ -210,12 +210,6 @@ private[sql] case class DropIndexCommand( logError("Table metadata unlocking is unsuccessful, index table may be in stale state") } } - // in case if the the physical folders still exists for the index table - // but the carbon and hive info for the index table is removed, - // DROP INDEX IF EXISTS should clean up those physical directories - if (ifExistsSet && carbonTable.isEmpty) { Review comment: Are you saying that even though t2 is not an index, calling drop index on t2 will treat it as an index and drop the t2 table folder? If so, we should handle it there and keep this code here. I think this was handled specially for SI in negative scenarios.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
akashrn5 commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535339696 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala ## @@ -90,7 +90,6 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel, while (loadsToMerge.size() > 1 || needSortSingleSegment(loadsToMerge)) { val lastSegment = sortedSegments.get(sortedSegments.size() - 1) - deletePartialLoadsInCompaction() Review comment: But the stale data will always be present inside the segment folder, right? If by chance the segment file is corrupted or deleted, carbon should still pass the query, which it does by listing; in that case we will get wrong data or the query will always fail. I think we need some way to clean them. What do you think @QiangCai @ajantha-bhat
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535334869 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1039,17 +999,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean + isForceDeletion, CarbonTable carbonTable, AbsoluteTableIdentifier + absoluteTableIdentifier, LoadMetadataDetails[] details) { // Delete marked loads boolean isUpdateRequired = DeleteLoadFolders -.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, isForceDeletion, details, -carbonTable.getMetadataPath()); +.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, cleanStaleInProgress, Review comment: it is always advisable to add a new argument at the end
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535333805 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1039,17 +999,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean + isForceDeletion, CarbonTable carbonTable, AbsoluteTableIdentifier + absoluteTableIdentifier, LoadMetadataDetails[] details) { // Delete marked loads boolean isUpdateRequired = DeleteLoadFolders -.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, isForceDeletion, details, -carbonTable.getMetadataPath()); +.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, cleanStaleInProgress, Review comment: check for all the places and handle
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
akashrn5 commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535332892 ## File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java ## @@ -482,176 +482,6 @@ public boolean accept(CarbonFile file) { } - /** - * Handling of the clean up of old carbondata files, index files , delete delta, - * update status files. - * @param table clean up will be handled on this table. - * @param forceDelete if true then max query execution timeout will not be considered. - */ - public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) throws IOException { Review comment: cleanStaleDeltaFiles will be called only in the exception case here; please see the points below:
1. In ProjectForDeleteCommand it is called twice, in both the exception case and the finally block; that's wrong, it should be called only in the finally block.
2. It is called multiple times in the `DeleteExecution` class, which needs to be checked and avoided.
3. Since it is handled only in the exception case, how is the application crash or shutdown scenario taken care of?
4. cleanStaleDeltaFiles handled the update and delete aborted cases; how is that handled now that the complete method is removed?
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535332006 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1039,17 +999,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean + isForceDeletion, CarbonTable carbonTable, AbsoluteTableIdentifier + absoluteTableIdentifier, LoadMetadataDetails[] details) { // Delete marked loads boolean isUpdateRequired = DeleteLoadFolders -.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, isForceDeletion, details, -carbonTable.getMetadataPath()); +.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, cleanStaleInProgress, Review comment: keep the original order of arguments (identifier, isForceDeletion, cleanStaleInProgress, details, path), to avoid sending wrong argument values from callers
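The reviewer's concern about argument order exists because two adjacent boolean parameters compile in either order, so a swapped call site fails silently at runtime rather than at compile time. One defensive alternative, sketched below with illustrative names (this is not CarbonData code), is a small options holder with named setters so each flag is explicit at the call site:

```java
// Hedged sketch: replacing a (boolean, boolean) parameter pair with a holder
// whose setters name each flag, so callers cannot swap them unnoticed.
// CleanOptions and its method names are illustrative, not CarbonData APIs.
public class CleanOptions {
    private boolean forceDelete;
    private boolean cleanStaleInProgress;

    public CleanOptions forceDelete(boolean value) {
        this.forceDelete = value;
        return this;
    }

    public CleanOptions cleanStaleInProgress(boolean value) {
        this.cleanStaleInProgress = value;
        return this;
    }

    public boolean isForceDelete() {
        return forceDelete;
    }

    public boolean isCleanStaleInProgress() {
        return cleanStaleInProgress;
    }
}
```

A call such as `new CleanOptions().forceDelete(true).cleanStaleInProgress(false)` reads unambiguously, whereas `method(id, true, false, details, path)` depends entirely on remembering the parameter order.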
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535321147 ## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ## @@ -90,13 +93,13 @@ public static void physicalFactAndMeasureMetadataDeletion(CarbonTable carbonTabl * Delete the invalid data physically from table. * @param carbonTable table * @param loadDetails Load details which need clean up - * @param isForceDelete is Force delete requested by user + * @param force Force delete Compacted and MFD segments Review comment: force --> isForceDelete. Also, it will force clean the trash; add that to the header description as well. ## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ## @@ -68,19 +68,22 @@ private static String getSegmentPath(AbsoluteTableIdentifier identifier, public static void physicalFactAndMeasureMetadataDeletion(CarbonTable carbonTable, LoadMetadataDetails[] newAddedLoadHistoryList, - boolean isForceDelete, + boolean isForceDeletion, Review comment: keep it the same as the previous `isForceDelete`
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [WIP] Handle Alter long string for SI col and added FTs for SI
CarbonDataQA2 commented on pull request #4037: URL: https://github.com/apache/carbondata/pull/4037#issuecomment-738046006 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3289/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [WIP] Handle Alter long string for SI col and added FTs for SI
CarbonDataQA2 commented on pull request #4037: URL: https://github.com/apache/carbondata/pull/4037#issuecomment-738041238 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5047/
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale
vikramahuja1001 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r534678032 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean Review comment: Instead of a single force option deleting all MFD, Compacted and stale Insert In Progress segments, split it into 2 parameters: cleanMFDandCompacted, which decides whether MFD and Compacted segments can be deleted, and cleanStaleInProgress, which decides whether stale In Progress segments can be deleted. With these 2 parameters, force can be removed; if the user wants force deletion, they can set both to true.
[jira] [Resolved] (CARBONDATA-4050) TPC-DS queries performance degraded when compared to older versions due to redundant getFileStatus() invocations
[ https://issues.apache.org/jira/browse/CARBONDATA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4050. -- Resolution: Fixed > TPC-DS queries performance degraded when compared to older versions due to > redundant getFileStatus() invocations > > > Key: CARBONDATA-4050 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4050 > Project: CarbonData > Issue Type: Improvement > Components: core >Affects Versions: 2.0.0 >Reporter: Venugopal Reddy K >Priority: Major > Fix For: 2.1.1 > > Time Spent: 2.5h > Remaining Estimate: 0h > > *Issue:* > In createCarbonDataFileBlockMetaInfoMapping method, we get list of carbondata > files in the segment, loop through all the carbon files and make a map of > fileNameToMetaInfoMapping > In that carbon files loop, if the file is of AbstractDFSCarbonFile > type, we get the org.apache.hadoop.fs.FileStatus thrice for each file. And > the method to get file status is an RPC call(fileSystem.getFileStatus(path)). > It takes ~2ms in the cluster for each call. Thus, incur an overhead of ~6ms > per file. So overall driver side query processing time has increased > significantly when there are more carbon files. Hence caused TPC-DS queries > performance degradation. 
> Have shown the methods/calls which get the file status for the carbon file in > loop: > {code:java} > public static Map<String, BlockMetaInfo> createCarbonDataFileBlockMetaInfoMapping( > String segmentFilePath, Configuration configuration) throws IOException { > Map<String, BlockMetaInfo> fileNameToMetaInfoMapping = new TreeMap<>(); > CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath, > configuration); > if (carbonFile instanceof AbstractDFSCarbonFile && !(carbonFile instanceof > S3CarbonFile)) { > PathFilter pathFilter = new PathFilter() { > @Override > public boolean accept(Path path) { > return CarbonTablePath.isCarbonDataFile(path.getName()); > } > }; > CarbonFile[] carbonFiles = carbonFile.locationAwareListFiles(pathFilter); > for (CarbonFile file : carbonFiles) { > String[] location = file.getLocations(); // RPC call - 1 > long len = file.getSize(); // RPC call - 2 > BlockMetaInfo blockMetaInfo = new BlockMetaInfo(location, len); > fileNameToMetaInfoMapping.put(file.getPath(), blockMetaInfo); // RPC > call - 3 in file.getPath() method > } > } > return fileNameToMetaInfoMapping; > } > {code} > > *Suggestion:* > I think we currently make an RPC call to get the file status upon each > invocation because the file status may change over a period of time, and we > shouldn't cache the file status in AbstractDFSCarbonFile. > In the current case, just before the loop of carbon files, we get the > file status of all the carbon files in the segment with the RPC call shown below. > LocatedFileStatus is a child class of FileStatus. It has BlockLocation along > with the file status. > {code:java} > RemoteIterator<LocatedFileStatus> iter = > fileSystem.listLocatedStatus(path);{code} > The intention of getting all the file status here is to create instances > of BlockMetaInfo and maintain the map of fileNameToMetaInfoMapping. > So it is safe to avoid these unnecessary RPC calls to get the file status again > in getLocations(), getSize() and getPath() methods. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
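The suggested direction can be sketched as below: perform the directory listing once, then build the map purely from the statuses that listing already returned, with no per-file round trips. This is a hedged sketch, not the CarbonData fix itself; `FileEntry` and `BlockInfo` are illustrative stand-ins for Hadoop's `LocatedFileStatus` and CarbonData's `BlockMetaInfo`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hedged sketch of the RPC-avoidance idea from the ticket: a single listing
// already carries path, length and block locations, so no further per-file
// getFileStatus() round trips are needed. Names are illustrative stand-ins.
public class MetaInfoMapping {
    public static final class FileEntry {          // stands in for LocatedFileStatus
        final String path;
        final long len;
        final String[] locations;

        public FileEntry(String path, long len, String[] locations) {
            this.path = path;
            this.len = len;
            this.locations = locations;
        }
    }

    public static final class BlockInfo {          // stands in for BlockMetaInfo
        final String[] locations;
        final long size;

        public BlockInfo(String[] locations, long size) {
            this.locations = locations;
            this.size = size;
        }

        public long getSize() {
            return size;
        }
    }

    // Build the file-name-to-meta-info map from the cached listing results,
    // filtering to carbondata files, as the original loop does by path filter.
    public static Map<String, BlockInfo> build(List<FileEntry> listing) {
        Map<String, BlockInfo> map = new TreeMap<>();
        for (FileEntry e : listing) {
            if (e.path.endsWith(".carbondata")) {
                map.put(e.path, new BlockInfo(e.locations, e.len));
            }
        }
        return map;
    }
}
```

The design point is the one made in the ticket: since `listLocatedStatus` was already called to enumerate the segment, the status objects it returned are the cache, and reusing them removes roughly three RPCs per carbondata file.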
[GitHub] [carbondata] asfgit closed pull request #4010: [CARBONDATA-4050]Avoid redundant RPC calls to get file status when CarbonFile is instantiated with fileStatus constructor
asfgit closed pull request #4010: URL: https://github.com/apache/carbondata/pull/4010 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (CARBONDATA-4046) Select count(*) fails on partition table.
[ https://issues.apache.org/jira/browse/CARBONDATA-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4046. -- Fix Version/s: 2.1.1 Resolution: Fixed > Select count(*) fails on partition table. > - > > Key: CARBONDATA-4046 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4046 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Fix For: 2.1.1 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Steps to reproduce: > 1. Set the property `carbon.read.partition.hive.direct=false`. > 2. Create a table which contains more than one partition column. > 3. Run the query select count(*). > > It fails with the exception `Key not found`. > > create table partition_cache(a string) partitioned by(b int, c String) stored as carbondata; > insert into partition_cache select 'k',1,'nihal'; > select count(*) from partition_cache where b = 1;
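A hypothetical sketch (illustrative only, not the actual PR #4002 change) of why a partition cache breaks with more than one partition column: if the cache key is built from all partition columns (e.g. `b=1/c=nihal`), a lookup built from only the first column (`b=1`) misses, producing a "Key not found"-style failure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionKeySketch {
  // Builds the cache key from ALL partition column (name, value) pairs,
  // e.g. [("b","1"), ("c","nihal")] -> "b=1/c=nihal".
  static String partitionKey(List<String[]> columnValues) {
    StringBuilder sb = new StringBuilder();
    for (String[] kv : columnValues) {
      if (sb.length() > 0) {
        sb.append('/');
      }
      sb.append(kv[0]).append('=').append(kv[1]);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, List<String>> cache = new HashMap<>();
    List<String[]> partition = new ArrayList<>();
    partition.add(new String[] {"b", "1"});
    partition.add(new String[] {"c", "nihal"});
    String key = partitionKey(partition);
    cache.put(key, List.of("part-0.carbondata"));
    System.out.println(key);
    // A lookup keyed on only the first partition column misses the entry:
    System.out.println(cache.containsKey("b=1"));
  }
}
```

The fix direction is that both the cache population and the lookup must agree on the full multi-column key.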
[GitHub] [carbondata] asfgit closed pull request #4002: [CARBONDATA-4046] Handled multiple partition columns for partition cache
asfgit closed pull request #4002: URL: https://github.com/apache/carbondata/pull/4002
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4038: [WIP] Fix MV having Subquery alias used in query projection
CarbonDataQA2 commented on pull request #4038: URL: https://github.com/apache/carbondata/pull/4038#issuecomment-738017364 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3288/
[jira] [Resolved] (CARBONDATA-4022) Getting the error - "PathName is not a valid DFS filename." with index server and after adding carbon SDK segments and then doing select/update/delete operations.
[ https://issues.apache.org/jira/browse/CARBONDATA-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4022. -- Fix Version/s: 2.1.1 Resolution: Fixed > Getting the error - "PathName is not a valid DFS filename." with index server > and after adding carbon SDK segments and then doing select/update/delete > operations. > -- > > Key: CARBONDATA-4022 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4022 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Prasanna Ravichandran >Priority: Major > Fix For: 2.1.1 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Getting this error - "PathName is not a valid DFS filename." during the > update/delete/select queries on a added SDK segment table. Also the path > represented in the error is not proper, which is the cause of error. This is > seen only when index server is running and disable fallback is true. > Queries and errors: > > create table sdk_2level_1(name string, rec1 > > struct>) stored as carbondata; > +-+ > | Result | > +-+ > +-+ > No rows selected (0.425 seconds) > > alter table sdk_2level_1 add segment > > options('path'='hdfs://hacluster/sdkfiles/twolevelnestedrecwitharray','format'='carbondata'); > +-+ > | Result | > +-+ > +-+ > No rows selected (0.77 seconds) > > select * from sdk_2level_1; > INFO : Execution ID: 1855 > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 600.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 600.0 (TID 21345, linux, executor 16): > java.lang.IllegalArgumentException: Pathname > /user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata > from > hdfs://hacluster/user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata > is 
not a valid DFS filename.
> at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:249)
> at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332)
> at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:955)
> at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:316)
> at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:293)
> at org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:198)
> at org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:188)
> at org.apache.carbondata.core.reader.ThriftReader.open(ThriftReader.java:100)
> at org.apache.carbondata.core.reader.CarbonHeaderReader.readHeader(CarbonHeaderReader.java:60)
> at org.apache.carbondata.core.util.DataFileFooterConverterV3.readDataFileFooter(DataFileFooterConverterV3.java:65)
> at org.apache.carbondata.core.util.CarbonUtil.getDataFileFooter(CarbonUtil.java:902)
> at org.apache.carbondata.core.util.CarbonUtil.readMetadataFile(CarbonUtil.java:874)
> at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getDataBlocks(AbstractQueryExecutor.java:216)
> at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:138)
> at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
> at org.apache.carbondata.core.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:47)
> at org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:117)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:584)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPl
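The malformed path in the trace above shows the table path and the external SDK segment path concatenated together. A hypothetical sketch of the fix direction (illustrative names, not the actual PR #4017 code): a segment added via `alter table add segment` already stores a fully qualified location, so it must not be prefixed with the table path again.

```java
public class AddSegmentPathSketch {
  // Resolves a data file path for a segment. External (added) segments carry
  // an absolute URI such as "hdfs://...", which must be used as-is; only
  // table-relative paths are anchored under the table path.
  static String resolveDataFilePath(String tablePath, String filePath) {
    if (filePath.contains("://") || filePath.startsWith("/")) {
      return filePath; // external/absolute location: use as-is
    }
    return tablePath + "/" + filePath; // relative: anchor under the table path
  }

  public static void main(String[] args) {
    String table = "hdfs://hacluster/user/hive/warehouse/carbon.store/rps/sdk_2level_1";
    String external = "hdfs://hacluster/sdkfiles/twolevelnestedrecwitharray/part-0.carbondata";
    // Naive concatenation reproduces the malformed path from the stack trace:
    System.out.println(table + external);
    // Guarded resolution keeps the external path intact:
    System.out.println(resolveDataFilePath(table, external));
  }
}
```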
[GitHub] [carbondata] asfgit closed pull request #4017: [CARBONDATA-4022] Fix invalid path issue for segment added through alter table add segment query.
asfgit closed pull request #4017: URL: https://github.com/apache/carbondata/pull/4017
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4038: [WIP] Fix MV having Subquery alias used in query projection
CarbonDataQA2 commented on pull request #4038: URL: https://github.com/apache/carbondata/pull/4038#issuecomment-738012712 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5046/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-738008523 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3287/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-738006908 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5045/
[GitHub] [carbondata] kunal642 commented on pull request #4002: [CARBONDATA-4046] Handled multiple partition columns for partition cache
kunal642 commented on pull request #4002: URL: https://github.com/apache/carbondata/pull/4002#issuecomment-738002378 LGTM
[GitHub] [carbondata] kunal642 commented on pull request #4017: [CARBONDATA-4022] Fix invalid path issue for segment added through alter table add segment query.
kunal642 commented on pull request #4017: URL: https://github.com/apache/carbondata/pull/4017#issuecomment-738002313 LGTM
[GitHub] [carbondata] akashrn5 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
akashrn5 commented on pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#issuecomment-738001983 LGTM
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command
CarbonDataQA2 commented on pull request #4001: URL: https://github.com/apache/carbondata/pull/4001#issuecomment-738000543 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3286/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command
CarbonDataQA2 commented on pull request #4001: URL: https://github.com/apache/carbondata/pull/4001#issuecomment-73704 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5044/
[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
QiangCai commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535224194 ## File path: integration/spark/src/main/scala/org/apache/carbondata/events/CleanFilesEvents.scala ## @@ -34,5 +34,6 @@ case class CleanFilesPreEvent(carbonTable: CarbonTable, sparkSession: SparkSessi * @param carbonTable * @param sparkSession */ -case class CleanFilesPostEvent(carbonTable: CarbonTable, sparkSession: SparkSession) - extends Event with CleanFilesEventInfo +case class CleanFilesPostEvent(carbonTable: CarbonTable, Review comment: ok
[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
QiangCai commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535228453 ## File path: integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.trash + +import scala.collection.JavaConverters._ + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.indexstore.PartitionSpec +import org.apache.carbondata.core.locks.{CarbonLockUtil, ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.SegmentFileStore +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.statusmanager.SegmentStatusManager +import org.apache.carbondata.core.util.{CarbonProperties, CleanFilesUtil, TrashUtil} + +/** + * This object will manage the following data. + * 1. .Trash folder + * 2. stale segments without metadata + * 3. 
expired segments (MARKED_FOR_DELETE, Compacted, In Progress) + */ +object DataTrashManager { + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) + + /** + * clean garbage data + * 1. clean .Trash folder + * 2. clean stale segments without metadata + * 3. clean expired segments (MARKED_FOR_DELETE, Compacted, In Progress) + * + * @param carbonTable : CarbonTable Object + * @param partitionSpecs : Hive Partitions details + */ + def cleanGarbageData( + carbonTable: CarbonTable, + force: Boolean = false, + partitionSpecs: Option[Seq[PartitionSpec]] = None): Unit = { +var carbonCleanFilesLock: ICarbonLock = null +val absoluteTableIdentifier = carbonTable.getAbsoluteTableIdentifier +try { + val errorMsg = "Clean files request is failed for " + +s"${ carbonTable.getQualifiedName }" + +". Not able to acquire the clean files lock due to another clean files " + +"operation is running in the background." + carbonCleanFilesLock = CarbonLockUtil.getLockObject(absoluteTableIdentifier, +LockUsage.CLEAN_FILES_LOCK, errorMsg) + // step 1: clean trash folder + cleanTrashFolder(carbonTable, force) + // step 2: clean stale segments which are not exists in metadata + cleanStaleSegments(carbonTable) Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
QiangCai commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535228252 ## File path: integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.trash + +import scala.collection.JavaConverters._ + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.indexstore.PartitionSpec +import org.apache.carbondata.core.locks.{CarbonLockUtil, ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.SegmentFileStore +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.statusmanager.SegmentStatusManager +import org.apache.carbondata.core.util.{CarbonProperties, CleanFilesUtil, TrashUtil} + +/** + * This object will manage the following data. + * 1. .Trash folder + * 2. stale segments without metadata + * 3. 
expired segments (MARKED_FOR_DELETE, Compacted, In Progress) + */ +object DataTrashManager { + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) + + /** + * clean garbage data + * 1. clean .Trash folder + * 2. clean stale segments without metadata + * 3. clean expired segments (MARKED_FOR_DELETE, Compacted, In Progress) + * + * @param carbonTable : CarbonTable Object + * @param partitionSpecs : Hive Partitions details + */ + def cleanGarbageData( + carbonTable: CarbonTable, + force: Boolean = false, + partitionSpecs: Option[Seq[PartitionSpec]] = None): Unit = { +var carbonCleanFilesLock: ICarbonLock = null +val absoluteTableIdentifier = carbonTable.getAbsoluteTableIdentifier +try { + val errorMsg = "Clean files request is failed for " + +s"${ carbonTable.getQualifiedName }" + +". Not able to acquire the clean files lock due to another clean files " + +"operation is running in the background." + carbonCleanFilesLock = CarbonLockUtil.getLockObject(absoluteTableIdentifier, +LockUsage.CLEAN_FILES_LOCK, errorMsg) + // step 1: clean trash folder + cleanTrashFolder(carbonTable, force) Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
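The lock-then-clean flow quoted from DataTrashManager above can be sketched in a self-contained way (hypothetical interface and names, not the actual ICarbonLock API): acquire the clean-files lock, run the cleanup steps, and always release the lock in `finally` so a failure in any step cannot leave the table locked.

```java
public class CleanFilesLockSketch {
  // Minimal stand-in for the clean-files table lock (hypothetical interface).
  interface SimpleLock {
    boolean lockWithRetries();
    boolean unlock();
  }

  static boolean cleaned = false;
  static boolean unlocked = false;

  // Mirrors the quoted flow: fail fast with a descriptive error if another
  // clean-files operation holds the lock; otherwise clean and always unlock.
  static void cleanGarbageData(SimpleLock lock) {
    if (!lock.lockWithRetries()) {
      throw new RuntimeException(
          "Clean files request failed: another clean files operation is running.");
    }
    try {
      // Stands in for the three steps: clean the .Trash folder, clean stale
      // segments without metadata, clean expired segments.
      cleaned = true;
    } finally {
      unlocked = lock.unlock(); // released even if a cleanup step throws
    }
  }

  public static void main(String[] args) {
    SimpleLock lock = new SimpleLock() {
      public boolean lockWithRetries() { return true; }
      public boolean unlock() { return true; }
    };
    cleanGarbageData(lock);
    System.out.println(cleaned + " " + unlocked);
  }
}
```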
[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
QiangCai commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535224083 ## File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java ## @@ -482,176 +482,6 @@ public boolean accept(CarbonFile file) { } - /** - * Handling of the clean up of old carbondata files, index files , delete delta, - * update status files. - * @param table clean up will be handled on this table. - * @param forceDelete if true then max query execution timeout will not be considered. - */ - public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) throws IOException { - -SegmentStatusManager ssm = new SegmentStatusManager(table.getAbsoluteTableIdentifier()); - -LoadMetadataDetails[] details = -SegmentStatusManager.readLoadMetadata(table.getMetadataPath()); - -SegmentUpdateStatusManager updateStatusManager = new SegmentUpdateStatusManager(table); -SegmentUpdateDetails[] segmentUpdateDetails = updateStatusManager.getUpdateStatusDetails(); -// hold all the segments updated so that wen can check the delta files in them, ne need to -// check the others. -Set updatedSegments = new HashSet<>(); -for (SegmentUpdateDetails updateDetails : segmentUpdateDetails) { - updatedSegments.add(updateDetails.getSegmentName()); -} - -String validUpdateStatusFile = ""; - -boolean isAbortedFile = true; - -boolean isInvalidFile = false; - -// take the update status file name from 0th segment. -validUpdateStatusFile = ssm.getUpdateStatusFileName(details); -// scan through each segment. -for (LoadMetadataDetails segment : details) { - // if this segment is valid then only we will go for delta file deletion. - // if the segment is mark for delete or compacted then any way it will get deleted. - if (segment.getSegmentStatus() == SegmentStatus.SUCCESS - || segment.getSegmentStatus() == SegmentStatus.LOAD_PARTIAL_SUCCESS) { -// when there is no update operations done on table, then no need to go ahead. 
So -// just check the update delta start timestamp and proceed if not empty -if (!segment.getUpdateDeltaStartTimestamp().isEmpty() -|| updatedSegments.contains(segment.getLoadName())) { - // take the list of files from this segment. - String segmentPath = CarbonTablePath.getSegmentPath( - table.getAbsoluteTableIdentifier().getTablePath(), segment.getLoadName()); - CarbonFile segDir = - FileFactory.getCarbonFile(segmentPath); - CarbonFile[] allSegmentFiles = segDir.listFiles(); - - // now handle all the delete delta files which needs to be deleted. - // there are 2 cases here . - // 1. if the block is marked as compacted then the corresponding delta files - //can be deleted if query exec timeout is done. - // 2. if the block is in success state then also there can be delete - //delta compaction happened and old files can be deleted. - - SegmentUpdateDetails[] updateDetails = updateStatusManager.readLoadMetadata(); - for (SegmentUpdateDetails block : updateDetails) { -CarbonFile[] completeListOfDeleteDeltaFiles; -CarbonFile[] invalidDeleteDeltaFiles; - -if (!block.getSegmentName().equalsIgnoreCase(segment.getLoadName())) { - continue; -} - -// aborted scenario. 
-invalidDeleteDeltaFiles = updateStatusManager -.getDeleteDeltaInvalidFilesList(block, false, -allSegmentFiles, isAbortedFile); -for (CarbonFile invalidFile : invalidDeleteDeltaFiles) { - boolean doForceDelete = true; - compareTimestampsAndDelete(invalidFile, doForceDelete, false); -} - -// case 1 -if (CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) { - completeListOfDeleteDeltaFiles = updateStatusManager - .getDeleteDeltaInvalidFilesList(block, true, - allSegmentFiles, isInvalidFile); - for (CarbonFile invalidFile : completeListOfDeleteDeltaFiles) { -compareTimestampsAndDelete(invalidFile, forceDelete, false); - } - -} else { - invalidDeleteDeltaFiles = updateStatusManager - .getDeleteDeltaInvalidFilesList(block, false, - allSegmentFiles, isInvalidFile); - for (CarbonFile invalidFile : invalidDeleteDeltaFiles) { -compareTimestampsAndDelete(invalidFile, forceDelete, false); - } -} - } -} -// handle cleanup of merge index files and data files after small files merge happened for -// SI table -cleanUpDataFilesAfterSmallFilesMerg
[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
QiangCai commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535218031 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala ## @@ -210,12 +210,6 @@ private[sql] case class DropIndexCommand( logError("Table metadata unlocking is unsuccessful, index table may be in stale state") } } - // in case if the the physical folders still exists for the index table - // but the carbon and hive info for the index table is removed, - // DROP INDEX IF EXISTS should clean up those physical directories - if (ifExistsSet && carbonTable.isEmpty) { Review comment: create table t1... create table t2... drop index t2 on t1 this drop index sql will remove table folder of t2 by mistake. ## File path: integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.trash + +import scala.collection.JavaConverters._ + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.indexstore.PartitionSpec +import org.apache.carbondata.core.locks.{CarbonLockUtil, ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.SegmentFileStore +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.statusmanager.SegmentStatusManager +import org.apache.carbondata.core.util.{CarbonProperties, CleanFilesUtil, TrashUtil} + +/** + * This object will manage the following data. + * 1. .Trash folder + * 2. stale segments without metadata + * 3. expired segments (MARKED_FOR_DELETE, Compacted, In Progress) + */ +object DataTrashManager { + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) + + /** + * clean garbage data + * 1. clean .Trash folder + * 2. clean stale segments without metadata + * 3. clean expired segments (MARKED_FOR_DELETE, Compacted, In Progress) + * + * @param carbonTable : CarbonTable Object + * @param partitionSpecs : Hive Partitions details Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
QiangCai commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535216366 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala ## @@ -90,7 +90,6 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel, while (loadsToMerge.size() > 1 || needSortSingleSegment(loadsToMerge)) { val lastSegment = sortedSegments.get(sortedSegments.size() - 1) - deletePartialLoadsInCompaction() Review comment: In this case, clean files has no chance to clean the data. Keeping the stale data in the folder will not impact compaction.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-737977578 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3285/
[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
QiangCai commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r535212465 ## File path: integration/spark/src/main/scala/org/apache/carbondata/events/CleanFilesEvents.scala ## @@ -34,5 +34,6 @@ case class CleanFilesPreEvent(carbonTable: CarbonTable, sparkSession: SparkSessi * @param carbonTable * @param sparkSession */ -case class CleanFilesPostEvent(carbonTable: CarbonTable, sparkSession: SparkSession) - extends Event with CleanFilesEventInfo +case class CleanFilesPostEvent(carbonTable: CarbonTable, Review comment: ok
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-737968960 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5043/
[GitHub] [carbondata] Indhumathi27 opened a new pull request #4038: [WIP] Fix MV having Subquery alias used in query projection
Indhumathi27 opened a new pull request #4038: URL: https://github.com/apache/carbondata/pull/4038 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes
[GitHub] [carbondata] akkio-97 commented on a change in pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command
akkio-97 commented on a change in pull request #4001: URL: https://github.com/apache/carbondata/pull/4001#discussion_r535160669 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -877,6 +877,10 @@ public SegmentFile getSegmentFile() { } if (entry.getValue().status.equals(SegmentStatus.SUCCESS.getMessage())) { String mergeFileName = entry.getValue().getMergeFileName(); + // remove file separator if already present Review comment: done
[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
shenjiayu17 commented on a change in pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#discussion_r535124613 ## File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java ## @@ -0,0 +1,411 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.geo; + +import java.math.BigDecimal; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Comparator; +import java.util.List; +import java.util.Objects; + +public class GeoHashUtils { + + /** + * Get the degree of each grid in the east-west direction. + * + * @param originLatitude the origin point latitude + * @param gridSize the grid size + * @return Delta X is the degree of each grid in the east-west direction + */ + public static double getDeltaX(double originLatitude, int gridSize) { +double mCos = Math.cos(originLatitude * Math.PI / GeoConstants.CONVERT_FACTOR); +return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS * mCos); + } + + /** + * Get the degree of each grid in the north-south direction. 
+ * + * @param gridSize the grid size + * @return Delta Y is the degree of each grid in the north-south direction + */ + public static double getDeltaY(int gridSize) { +return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS); + } + + /** + * Calculate the number of knives cut + * + * @param gridSize the grid size + * @param originLatitude the origin point latitude + * @return The number of knives cut + */ + public static int getCutCount(int gridSize, double originLatitude) { +double deltaX = getDeltaX(originLatitude, gridSize); +int countX = Double.valueOf( +Math.ceil(Math.log(2 * GeoConstants.CONVERT_FACTOR / deltaX) / Math.log(2))).intValue(); +double deltaY = getDeltaY(gridSize); +int countY = Double.valueOf( +Math.ceil(Math.log(GeoConstants.CONVERT_FACTOR / deltaY) / Math.log(2))).intValue(); +return Math.max(countX, countY); + } + + /** + * Convert input longitude and latitude to GeoID + * + * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient, + * and the floating-point calculation is converted to integer calculation + * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient, + * and the floating-point calculation is converted to integer calculation. 
+ * @param oriLatitude the origin point latitude + * @param gridSize the grid size + * @return GeoID + */ + public static long lonLat2GeoID(long longitude, long latitude, double oriLatitude, int gridSize) { +long longtitudeByRatio = longitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY; +long latitudeByRatio = latitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY; +int[] ij = lonLat2ColRow(longtitudeByRatio, latitudeByRatio, oriLatitude, gridSize); +return colRow2GeoID(ij[0], ij[1]); + } + + /** + * Calculate geo id through grid index coordinates, the row and column of grid coordinates + * can be transformed by latitude and longitude + * + * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient, + * and the floating-point calculation is converted to integer calculation + * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient, + * and the floating-point calculation is converted to integer calculation + * @param oriLatitude the latitude of origin point,which is used to calculate the deltaX and cut + * level. + * @param gridSize the size of minimal grid after cut + * @return Grid ID value [row, column], column starts from 1 + */ + public static int[] lonLat2ColRow(long longitude, long latitude, double oriLatitude, + int gridSize) { +int cutLevel = getCutCount(gridSize, oriLatitude); +int column = (int) Math.floor(longitude / getDeltaX(oriLatitude, gridSize) / +GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1)); +int row = (int) Math.floor(latitude / getDeltaY(gridSize) / +GeoConstants.CONVERSION_R
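For readers skimming the review, the grid math in the diff above can be exercised standalone. The sketch below mirrors getDeltaX, getDeltaY and getCutCount, but the constant values used here (CONVERT_FACTOR = 180 degrees and EARTH_RADIUS = 6371004 metres) are assumptions for illustration; the real values live in the GeoConstants class.

```java
// Standalone sketch of the grid math quoted from the GeoHashUtils diff above.
// ASSUMPTIONS: CONVERT_FACTOR = 180 (degrees per half-circle) and
// EARTH_RADIUS = 6371004 metres; the actual constants are in GeoConstants.
public class GeoGridSketch {
    static final double CONVERT_FACTOR = 180.0;
    static final double EARTH_RADIUS = 6371004.0;

    // Degrees spanned by one grid cell in the east-west direction;
    // shrinks with latitude by a cos(latitude) factor.
    public static double getDeltaX(double originLatitude, int gridSize) {
        double mCos = Math.cos(originLatitude * Math.PI / CONVERT_FACTOR);
        return (CONVERT_FACTOR * gridSize) / (Math.PI * EARTH_RADIUS * mCos);
    }

    // Degrees spanned by one grid cell in the north-south direction.
    public static double getDeltaY(int gridSize) {
        return (CONVERT_FACTOR * gridSize) / (Math.PI * EARTH_RADIUS);
    }

    // Number of recursive bisections ("knife cuts") so the finest cell is
    // no larger than gridSize in either direction: the longitude span is
    // 2 * CONVERT_FACTOR degrees, the latitude span is CONVERT_FACTOR degrees.
    public static int getCutCount(int gridSize, double originLatitude) {
        int countX = (int) Math.ceil(
            Math.log(2 * CONVERT_FACTOR / getDeltaX(originLatitude, gridSize)) / Math.log(2));
        int countY = (int) Math.ceil(
            Math.log(CONVERT_FACTOR / getDeltaY(gridSize)) / Math.log(2));
        return Math.max(countX, countY);
    }

    public static void main(String[] args) {
        // Cut count for a 50 m grid anchored at latitude 39.9 N.
        System.out.println(getCutCount(50, 39.9));
    }
}
```

Since countX always dominates countY for a positive origin latitude (deltaX covers twice the degree range with a smaller cell), the east-west direction typically determines the cut level.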
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogr
akashrn5 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535070890 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean Review comment: Agree with @ajantha-bhat: for clean files with options, let it do its specific functionality; along with that, if other data has expired we should clean it as well, to save time for users. It should be like (1+2), (1+3) and (1+4). I think this will be better.
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale
vikramahuja1001 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535066388 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean Review comment: @QiangCai @akashrn5, can you clarify whether we should delete the expired MFD and Compacted segments in point 3?
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535062363 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean Review comment: In point 3), if we don't clean expired MFD, expired Compacted and expired trash along with expired stale in-progress segments, we have to call clean files at least twice to clean all expired data. I feel that is not good. @QiangCai @akashrn5: what do you think?
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale
vikramahuja1001 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535054960 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean Review comment: 1. Trash behaviour remains unchanged. Trash is deleted automatically when option force = true (it's already in the document); if force = true is not given, trash is deleted according to the timestamp retention property. 2. in_progress = true should only delete in_progress segments; expired trash will be deleted anyway.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535043361 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean + cleanCompactedAndMFD, CarbonTable carbonTable, AbsoluteTableIdentifier + absoluteTableIdentifier, LoadMetadataDetails[] details) { // Delete marked loads boolean isUpdateRequired = DeleteLoadFolders -.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, isForceDeletion, details, -carbonTable.getMetadataPath()); +.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, cleanStaleInProgress, +cleanCompactedAndMFD, details, carbonTable.getMetadataPath()); Review comment: The `cleanCompactedAndMFD` flag is not needed, since we need to clean it every time once you fix your point 3.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535042468 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean Review comment: @vikramahuja1001: please update the clean files documentation. > 1) Default clean files behaviour (clean files on table t1): cleaning MFD and Compacted segments depends on the query timeout (1 hr) and trashRetentionTimeout (7 days, default). 2) clean files on table t1 options('force'='true'): clean MFD and Compacted segments immediately (do not check any timeout). 3) clean files on table t1 options('clean_inprgress'='true'): cleaning stale in-progress segments depends on trashRetentionTimeout, after 7 days (default behaviour). 4) clean files on table t1 options('clean_inprgress'='true', 'force'='true'): clean MFD, Compacted and stale in-progress segments immediately (do not check any timeout). **My comments on the description:** 1) You have not mentioned trash in any of these; mention what happens to trash in each case. 2) And in your point 3) we should delete the expired MFD, Compacted and expired trash folders as well.
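The four option combinations under discussion can be summarised as a small decision table. This is only a sketch of the proposal quoted above; the option names and behaviour strings are illustrative, not the final CarbonData API.

```java
// Illustrative decision table for the clean files options discussed above.
// The flags and behaviour strings are a sketch of the proposal under review,
// not the actual CarbonData implementation.
public class CleanFilesOptionSketch {
    public static String behaviour(boolean force, boolean cleanInProgress) {
        if (force && cleanInProgress) {
            // case 4): clean everything immediately, no timeout checks
            return "clean MFD, Compacted and stale in-progress segments immediately";
        } else if (force) {
            // case 2): MFD and Compacted cleaned immediately
            return "clean MFD and Compacted segments immediately";
        } else if (cleanInProgress) {
            // case 3): stale in-progress segments honour the retention timeout
            return "clean stale in-progress segments after the trash retention timeout";
        } else {
            // case 1): default behaviour, both timeouts respected
            return "clean MFD and Compacted segments honouring query timeout and trash retention";
        }
    }

    public static void main(String[] args) {
        System.out.println(behaviour(false, false));
        System.out.println(behaviour(true, true));
    }
}
```

The open review question above is whether case 3) should also fold in the expired MFD, Compacted and trash cleanup, so a single invocation covers all expired data.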
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA2 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-737786983 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5042/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA2 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-737784784 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3284/
[GitHub] [carbondata] ajantha-bhat commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#issuecomment-737782499 LGTM
[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
QiangCai commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r534983455 ## File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java ## @@ -482,176 +482,6 @@ public boolean accept(CarbonFile file) { } - /** - * Handling of the clean up of old carbondata files, index files , delete delta, - * update status files. - * @param table clean up will be handled on this table. - * @param forceDelete if true then max query execution timeout will not be considered. - */ - public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) throws IOException { Review comment: Only keep cleanStaleDeltaFiles to handle exceptions. Clean files should work with concurrent updates, so remove this.
[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
QiangCai commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r534982270 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonTruncateCommand.scala ## @@ -18,39 +18,31 @@ package org.apache.spark.sql.execution.command.mutation import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} -import org.apache.spark.sql.catalyst.analysis.NoSuchTableException -import org.apache.spark.sql.execution.command.{DataCommand, TruncateTableCommand} -import org.apache.spark.sql.execution.command.management.CarbonCleanFilesCommand -import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.execution.command.{Checker, DataCommand, TruncateTableCommand} import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.statusmanager.SegmentStatusManager case class CarbonTruncateCommand(child: TruncateTableCommand) extends DataCommand { override def processData(sparkSession: SparkSession): Seq[Row] = { val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) -val dbName = CarbonEnv.getDatabaseName(child.tableName.database)(sparkSession) -val tableName = child.tableName.table -setAuditTable(dbName, tableName) -val relation = CarbonEnv.getInstance(sparkSession).carbonMetaStore - .lookupRelation(Option(dbName), tableName)(sparkSession).asInstanceOf[CarbonRelation] -if (relation == null) { - throw new NoSuchTableException(dbName, tableName) -} -if (null == relation.carbonTable) { - LOGGER.error(s"Truncate table failed. 
table not found: $dbName.$child.tableName.table") - throw new NoSuchTableException(dbName, child.tableName.table) +Checker.validateTableExists(child.tableName.database, child.tableName.table, sparkSession) +val carbonTable = CarbonEnv.getCarbonTable( + child.tableName.database, child.tableName.table)(sparkSession) +setAuditTable(carbonTable) +if (!carbonTable.isTransactionalTable) { + LOGGER.error(s"Unsupported truncate non-transactional table") + throw new MalformedCarbonCommandException( +"Unsupported truncate non-transactional table") } if (child.partitionSpec.isDefined) { throw new MalformedCarbonCommandException( "Unsupported truncate table with specified partition") } -CarbonCleanFilesCommand( - databaseNameOp = Option(dbName), - tableName = Option(tableName), - None, - truncateTable = true -).run(sparkSession) + Review comment: Let clean files remove the data.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-737766059 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3283/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
akashrn5 commented on a change in pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#discussion_r534930213 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala ## @@ -90,7 +90,6 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel, while (loadsToMerge.size() > 1 || needSortSingleSegment(loadsToMerge)) { val lastSegment = sortedSegments.get(sortedSegments.size() - 1) - deletePartialLoadsInCompaction() Review comment: If you are removing this, how are you handling the case where the segment number is the same in the next compaction retry, since we don't have any entry in the table status file during compaction? ## File path: integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.trash + +import scala.collection.JavaConverters._ + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.indexstore.PartitionSpec +import org.apache.carbondata.core.locks.{CarbonLockUtil, ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.SegmentFileStore +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.statusmanager.SegmentStatusManager +import org.apache.carbondata.core.util.{CarbonProperties, CleanFilesUtil, TrashUtil} + +/** + * This object will manage the following data. + * 1. .Trash folder + * 2. stale segments without metadata + * 3. expired segments (MARKED_FOR_DELETE, Compacted, In Progress) + */ +object DataTrashManager { + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) + + /** + * clean garbage data + * 1. clean .Trash folder + * 2. clean stale segments without metadata + * 3. clean expired segments (MARKED_FOR_DELETE, Compacted, In Progress) + * + * @param carbonTable : CarbonTable Object + * @param partitionSpecs : Hive Partitions details Review comment: please remove the @param tags; the names clearly indicate what they are ## File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java ## @@ -482,176 +482,6 @@ public boolean accept(CarbonFile file) { } - /** - * Handling of the clean up of old carbondata files, index files , delete delta, - * update status files. - * @param table clean up will be handled on this table. - * @param forceDelete if true then max query execution timeout will not be considered. 
- */ - public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) throws IOException { - -SegmentStatusManager ssm = new SegmentStatusManager(table.getAbsoluteTableIdentifier()); - -LoadMetadataDetails[] details = -SegmentStatusManager.readLoadMetadata(table.getMetadataPath()); - -SegmentUpdateStatusManager updateStatusManager = new SegmentUpdateStatusManager(table); -SegmentUpdateDetails[] segmentUpdateDetails = updateStatusManager.getUpdateStatusDetails(); -// hold all the segments updated so that wen can check the delta files in them, ne need to -// check the others. -Set updatedSegments = new HashSet<>(); -for (SegmentUpdateDetails updateDetails : segmentUpdateDetails) { - updatedSegments.add(updateDetails.getSegmentName()); -} - -String validUpdateStatusFile = ""; - -boolean isAbortedFile = true; - -boolean isInvalidFile = false; - -// take the update status file name from 0th segment. -validUpdateStatusFile = ssm.getUpdateStatusFileName(details); -// scan through each segment. -for (LoadMetadataDetails segment : details) { - // if this segment is valid then only we will go for delta file deletion. - // if the segment is mark for delete or compacted then any way it will get deleted. - if (segment.getSegmentStatus() == SegmentStatus.SUCCESS - || segment.getSegmentStatus() == SegmentStatus.LOAD_PARTIAL_SUCCESS) { -// when there is no update operations done on table, then n
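The three-step clean sequence documented in the DataTrashManager comment quoted above can be sketched as an ordered pipeline. The step methods below are placeholders for illustration, not the real CarbonData helpers, which run under the clean-files lock.

```java
// Sketch of the three-step clean sequence described in the DataTrashManager
// scaladoc above. The step bodies are placeholders; the real implementation
// delegates to TrashUtil, CleanFilesUtil and SegmentStatusManager.
import java.util.ArrayList;
import java.util.List;

public class TrashCleanSequenceSketch {
    private final List<String> log = new ArrayList<>();

    // Step 1: purge the .Trash folder (expired entries, or all when forced).
    void cleanTrashFolder() { log.add("1: clean .Trash folder"); }

    // Step 2: move segments that have no metadata entry into the trash.
    void cleanStaleSegments() { log.add("2: clean stale segments without metadata"); }

    // Step 3: physically delete expired segments.
    void cleanExpiredSegments() { log.add("3: clean expired segments (MARKED_FOR_DELETE, Compacted, In Progress)"); }

    // The ordering matters: stale segments are trashed (step 2) before the
    // expired-segment pass (step 3), so nothing is deleted outside the trash flow.
    public List<String> cleanGarbageData() {
        cleanTrashFolder();
        cleanStaleSegments();
        cleanExpiredSegments();
        return log;
    }
}
```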
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-737759882 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5041/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [WIP] Handle Alter long string for SI col and added FTs for SI
CarbonDataQA2 commented on pull request #4037: URL: https://github.com/apache/carbondata/pull/4037#issuecomment-737747052 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3282/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [WIP] Handle Alter long string for SI col and added FTs for SI
CarbonDataQA2 commented on pull request #4037: URL: https://github.com/apache/carbondata/pull/4037#issuecomment-737741961 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5039/