[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r513197605

## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java

```
@@ -1143,28 +1148,62 @@ public static void cleanSegments(CarbonTable table,
    * @throws IOException
    */
   public static void deleteSegment(String tablePath, Segment segment,
-      List<PartitionSpec> partitionSpecs,
-      SegmentUpdateStatusManager updateStatusManager) throws Exception {
+      List<PartitionSpec> partitionSpecs, SegmentUpdateStatusManager updateStatusManager,
+      SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStampForTrashFolder)
+      throws Exception {
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
+    List<String> filesToDelete = new ArrayList<>();
     Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
     for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
-      FileFactory.deleteFile(entry.getKey());
+      // Move the file to the trash folder in case the segment status is insert in progress
+      if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS && isPartitionTable) {
+        TrashUtil.copyDataToTrashFolderByFile(tablePath, entry.getKey(), timeStampForTrashFolder
```

Review comment: changed

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717713614 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2955/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717711741 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4712/
[GitHub] [carbondata] Indhumathi27 commented on pull request #3952: [CARBONDATA-4006] Fix for currentUser as NULL in getcount method during index server fallback mode
Indhumathi27 commented on pull request #3952: URL: https://github.com/apache/carbondata/pull/3952#issuecomment-717703722 retest this please
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
akashrn5 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r513171515

## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java

```
@@ -2116,6 +2087,26 @@ public int getMaxSIRepairLimit(String dbName, String tableName) {
     return Math.abs(Integer.parseInt(thresholdValue));
   }

+  /**
+   * The below method returns the microseconds after which the trash folder will expire
+   */
+  public long getTrashFolderExpirationTime() {
+    String configuredValue = getProperty(CarbonCommonConstants.CARBON_TRASH_EXPIRATION_DAYS,
+        CarbonCommonConstants.CARBON_TRASH_EXPIRATION_DAYS_DEFAULT);
+    Integer result = 0;
+    try {
+      result = Integer.parseInt(configuredValue);
+      if (result < 0) {
+        LOGGER.warn("Value of carbon.trash.expiration.days is negative, taking default value");
+        result = Integer.parseInt(CARBON_TRASH_EXPIRATION_DAYS_DEFAULT);
+      }
+    } catch (NumberFormatException e) {
+      LOGGER.error("Error happened while parsing", e);
```

Review comment:
```suggestion
      LOGGER.error("Invalid value configured for "
          + "CarbonCommonConstants.CARBON_TRASH_EXPIRATION_DAYS, considering the default value", e);
```
Please refactor it: in case of failure it should also take the default value (currently it would be considered as 0), and the common message can be moved to a variable to reduce the code and keep it clean.
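The fallback behaviour requested in the review above can be sketched as follows. This is a minimal, self-contained sketch, not the actual CarbonProperties code: the class name, method name, and the default value of 7 days are assumptions for illustration. The point is that both a negative value and an unparsable value fall back to the default instead of 0.

```java
// Hypothetical sketch of the suggested refactor: any invalid configured value
// (negative or unparsable) resolves to the default, never to 0.
public class TrashConfigSketch {
    // assumed default; the real constant lives in CarbonCommonConstants
    static final String CARBON_TRASH_EXPIRATION_DAYS_DEFAULT = "7";

    static int resolveTrashExpirationDays(String configuredValue) {
        int defaultDays = Integer.parseInt(CARBON_TRASH_EXPIRATION_DAYS_DEFAULT);
        try {
            int result = Integer.parseInt(configuredValue);
            // negative values are invalid: fall back to the default
            return result < 0 ? defaultDays : result;
        } catch (NumberFormatException e) {
            // unparsable values also fall back to the default instead of staying 0
            return defaultDays;
        }
    }
}
```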
## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java

```
@@ -113,12 +116,24 @@ private static void physicalFactAndMeasureMetadataDeletion(CarbonTable carbonTable
     SegmentUpdateStatusManager updateStatusManager =
         new SegmentUpdateStatusManager(carbonTable, currLoadDetails);
     for (final LoadMetadataDetails oneLoad : loadDetails) {
-      if (checkIfLoadCanBeDeletedPhysically(oneLoad, isForceDelete)) {
+      if (checkIfLoadCanBeDeletedPhysically(oneLoad, isForceDelete, carbonTable
+          .getAbsoluteTableIdentifier())) {
         try {
+          // if insert in progress, then move it to trash
+          if (oneLoad.getSegmentStatus() == SegmentStatus.INSERT_IN_PROGRESS && !carbonTable
+              .isHivePartitionTable()) {
+            // move this segment to trash
+            TrashUtil.copyDataToTrashBySegment(FileFactory.getCarbonFile(CarbonTablePath
+                .getFactDir(carbonTable.getTablePath()) + "/Part0/Segment_" + oneLoad
```

Review comment: do not hardcode

## File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java

```
@@ -47,6 +47,7 @@
   public static final String BATCH_PREFIX = "_batchno";

   private static final String LOCK_DIR = "LockFiles";

+  public static final String SEGMENTS_METADATA_FOLDER = "segments";
```

Review comment: also change in `getSegmentFilesLocation` and `getSegmentFilePath` in this class, and check the other places that use the `"segments"` string.

## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java

```
@@ -1143,28 +1148,62 @@ public static void cleanSegments(CarbonTable table,
    * @throws IOException
    */
   public static void deleteSegment(String tablePath, Segment segment,
-      List<PartitionSpec> partitionSpecs,
-      SegmentUpdateStatusManager updateStatusManager) throws Exception {
+      List<PartitionSpec> partitionSpecs, SegmentUpdateStatusManager updateStatusManager,
+      SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStampForTrashFolder)
+      throws Exception {
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
+    List<String> filesToDelete = new ArrayList<>();
     Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
     for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
-      FileFactory.deleteFile(entry.getKey());
+      // Move the file to the trash folder in case the segment status is insert in progress
+      if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS && isPartitionTable) {
+        TrashUtil.copyDataToTrashFolderByFile(tablePath, entry.getKey(), timeStampForTrashFolder
```

Review comment: you are sending these many arguments in many places, can we refactor it to reduce the changes in many places?

## File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java

```
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You
```
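One way to address the "sending these many arguments in many places" comment above is a small parameter object that carries the trash-related values together. This is a hypothetical sketch only: the class name, field names, and the `.Trash` folder name are assumptions, not the names used in the PR.

```java
// Hypothetical parameter object grouping the trash-related arguments so that
// deleteSegment and its callers pass one context instead of several values.
public class TrashMoveContext {
    final String tablePath;
    final String timestampForTrashFolder;
    final boolean isPartitionTable;

    TrashMoveContext(String tablePath, String timestampForTrashFolder, boolean isPartitionTable) {
        this.tablePath = tablePath;
        this.timestampForTrashFolder = timestampForTrashFolder;
        this.isPartitionTable = isPartitionTable;
    }

    // Destination inside the (assumed) ".Trash" folder for a file that is
    // about to be removed from the segment directory.
    String trashPathFor(String fileName) {
        return tablePath + "/.Trash/" + timestampForTrashFolder + "/" + fileName;
    }
}
```

With this, the timestamp and partition-table flag are set once where the clean-files operation starts, and every helper that moves a file only needs the context and the file name.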
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717678439 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4711/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717678491 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2954/
[GitHub] [carbondata] QiangCai commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
QiangCai commented on a change in pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#discussion_r513152963

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala

```
@@ -121,7 +121,7 @@ case class UpdateTableModel(
     updatedTimeStamp: Long,
     var executorErrors: ExecutionErrors,
     deletedSegments: Seq[Segment],
-    loadAsNewSegment: Boolean = false)
+    loadAsNewSegment: Boolean = true)
```

Review comment: remove it, always update into new segment
[GitHub] [carbondata] QiangCai commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update
QiangCai commented on a change in pull request #3986: URL: https://github.com/apache/carbondata/pull/3986#discussion_r513147277

## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java

```
@@ -415,44 +415,41 @@ public boolean accept(CarbonFile pathName) {
   }

   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+    List deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
```

Review comment: please add a test case:
1. set CarbonCommonConstants.CARBON_HORIZONTAL_COMPACTION_ENABLE to false
2. update different rows in the same file three times (it should generate three delete delta files for the same blockname)
3. set CarbonCommonConstants.CARBON_HORIZONTAL_COMPACTION_ENABLE to true
4. update a different row in the same file again
5. query the table to check the result
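The new approach in the diff above generates delete delta file names from the timestamps recorded in `SegmentUpdateDetails` instead of listing the segment directory and filtering by a timestamp range. A self-contained sketch of that name construction (the `.deletedelta` extension and the helper names here are assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build delete delta file paths directly from recorded timestamps,
// following the file-name convention <blockName>-<timestamp><extension>.
public class DeltaNameBuilder {
    // assumed extension; the real constant is CarbonCommonConstants.DELETE_DELTA_FILE_EXT
    static final String DELETE_DELTA_FILE_EXT = ".deletedelta";

    static List<String> buildDeleteDeltaPaths(String segmentPath, String blockName,
            List<String> deltaFileTimestamps) {
        String prefix = segmentPath + "/" + blockName + "-";
        List<String> paths = new ArrayList<>();
        for (String timestamp : deltaFileTimestamps) {
            paths.add(prefix + timestamp + DELETE_DELTA_FILE_EXT);
        }
        return paths;
    }
}
```

This avoids a directory listing per block, which is where the time saving of the Horizontal Compaction improvement comes from.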
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717496054 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2953/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717492977 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4710/
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
ajantha-bhat commented on a change in pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#discussion_r512886166

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala

```
@@ -121,7 +121,7 @@ case class UpdateTableModel(
     updatedTimeStamp: Long,
     var executorErrors: ExecutionErrors,
     deletedSegments: Seq[Segment],
-    loadAsNewSegment: Boolean = false)
+    loadAsNewSegment: Boolean = true)
```

Review comment: There is no use of the horizontal compaction feature after this change? @QiangCai, @marchpure: do we need to mark it as deprecated?
[GitHub] [carbondata] akashrn5 commented on pull request #3972: [CARBONDATA-4042]Launch same number of task as select query for insert into select and ctas cases when target table is of no_sort
akashrn5 commented on pull request #3972: URL: https://github.com/apache/carbondata/pull/3972#issuecomment-717401461 LGTM
[GitHub] [carbondata] akashrn5 commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…
akashrn5 commented on pull request #3981: URL: https://github.com/apache/carbondata/pull/3981#issuecomment-717398581 @marchpure from the description, I understood that the table status file after insert overwrite does not contain the `updatetablestatusfile` name. So why do we need to add a boolean variable and send this boolean from many places? This may also lead to confusion later. Why don't we take care of it just while updating the table status in case of insert overwrite? I mean, while updating the status after insert overwrite, we can first copy the existing meta and then prepare the new entry, and the update case can be handled there; this way, it will lead to fewer changes and less confusion later.
[GitHub] [carbondata] akashrn5 commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…
akashrn5 commented on pull request #3981: URL: https://github.com/apache/carbondata/pull/3981#issuecomment-717398850

> > @marchpure please correct the PR title and please give a detailed explanation of the issue and the fix in the PR description.
>
> Yes. I have modified the PR desc as your suggestion

The PR title is still not formatted, please check.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…
akashrn5 commented on a change in pull request #3981: URL: https://github.com/apache/carbondata/pull/3981#discussion_r512875192

## File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java

```
@@ -276,14 +278,13 @@ public static boolean updateTableMetadataStatus(Set updatedSegmentsList
         SegmentStatusManager.readLoadMetadata(metaDataFilepath);
     for (LoadMetadataDetails loadMetadata : listOfLoadFolderDetailsArray) {
+      if (isUpdateStatusFileUpdateRequired &&
+          loadMetadata.getLoadName().equalsIgnoreCase("0")) {
+        loadMetadata.setUpdateStatusFileName(
+            CarbonUpdateUtil.getUpdateStatusFileName(updatedTimeStamp));
+      }
       if (isTimestampUpdateRequired) {
-        // we are storing the link between the 2 status files in the segment 0 only.
```

Review comment: why is this condition moved up? and why not `isTimestampUpdateRequired`?
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3952: [CARBONDATA-4006] Fix for currentUser as NULL in getcount method during index server fallback mode
vikramahuja1001 commented on pull request #3952: URL: https://github.com/apache/carbondata/pull/3952#issuecomment-717378005 @kunal642 @akashrn5 please merge this
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…
CarbonDataQA1 commented on pull request #3981: URL: https://github.com/apache/carbondata/pull/3981#issuecomment-717333719 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4709/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717329144 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2951/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…
CarbonDataQA1 commented on pull request #3981: URL: https://github.com/apache/carbondata/pull/3981#issuecomment-717328681 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2952/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717325746 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4708/
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
ajantha-bhat commented on a change in pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#discussion_r512760255

## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala

```
@@ -342,7 +342,8 @@ object CarbonDataRDDFactory {
     try {
       if (!carbonLoadModel.isCarbonTransactionalTable || segmentLock.lockWithRetries()) {
-        if (updateModel.isDefined && !updateModel.get.loadAsNewSegment) {
+        if (updateModel.isDefined && (!updateModel.get.loadAsNewSegment
```

Review comment: If you just set updateModel.get.loadAsNewSegment = true in the update flow, is that not enough? Please explain why this change is required.

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/alterTable/TestAlterTableSortColumnsProperty.scala

```
@@ -739,14 +739,14 @@
     val table = CarbonEnv.getCarbonTable(Option("default"), tableName)(sqlContext.sparkSession)
     val tablePath = table.getTablePath
-    (0 to 2).foreach { segmentId =>
+    (0 to 3).foreach { segmentId =>
```

Review comment: why is this change required? I think you can revert this.

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala

```
@@ -340,7 +340,8 @@ private[sql] case class CarbonProjectForUpdateCommand(
       case _ => sys.error("")
     }
-    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments)
+    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments,
+      !carbonRelation.carbonTable.isHivePartitionTable)
```

Review comment: so, for a partition table, update will not write as a new segment? How is the dirty data issue handled for the partition table update scenario?

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestPruneUsingSegmentMinMax.scala

```
@@ -103,7 +103,7 @@
     sql("update carbon set(a)=(10) where a=1").collect()
     checkAnswer(sql("select count(*) from carbon where a=10"), Seq(Row(3)))
     showCache = sql("show metacache on table carbon").collect()
-    assert(showCache(0).get(2).toString.equalsIgnoreCase("6/8 index files cached"))
+    assert(showCache(0).get(2).toString.equalsIgnoreCase("1/6 index files cached"))
```

Review comment: how did 8 index files become 6 (were the other two stale?), and out of 6, why is only 1 cached now?
[GitHub] [carbondata] marchpure closed pull request #3934: [WIP] Support Global Unique Id for SegmentNo
marchpure closed pull request #3934: URL: https://github.com/apache/carbondata/pull/3934
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA1 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-717268180 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2950/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA1 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-717267646 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4707/
[jira] [Closed] (CARBONDATA-3903) Documentation Issue in Github Docs Link https://github.com/apache/carbondata/tree/master/docs
[ https://issues.apache.org/jira/browse/CARBONDATA-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PURUJIT CHAUGULE closed CARBONDATA-3903.
----------------------------------------
Documentation update is resolved and verified.

> Documentation Issue in Github Docs Link https://github.com/apache/carbondata/tree/master/docs
> --
>
> Key: CARBONDATA-3903
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3903
> Project: CarbonData
> Issue Type: Bug
> Components: docs
> Affects Versions: 2.0.1
> Environment: https://github.com/apache/carbondata/tree/master/docs
> Reporter: PURUJIT CHAUGULE
> Priority: Minor
> Fix For: 2.1.0
>
> dml-of-carbondata.md
> LOAD DATA:
> * Mention each Load is considered as a Segment.
> * Give all possible options for SORT_SCOPE like GLOBAL_SORT/LOCAL_SORT/NO_SORT (with an explanation of the difference between each type).
> * Add an example of a complete Load query with/without use of OPTIONS.
> INSERT DATA:
> * Mention each insert is a Segment.
> LOAD Using Static/Dynamic Partitioning:
> * Can give a hyperlink to Static/Dynamic partitioning.
> UPDATE/DELETE:
> * Mention the delta files concept in update and delete.
> DELETE:
> * Add an example for deletion of all records from a table (delete from tablename).
> COMPACTION:
> * Can mention Minor compaction of two types, Auto and Manual (carbon.auto.load.merge=true/false), and that if carbon.auto.load.merge=false, the trigger should be done manually.
> * Hyperlink to configurable properties of Compaction.
> * Mention that compacted segments do not get cleaned automatically and should be triggered manually using clean files.
>
> flink-integration-guide.md
> * Mention what stages are and how they are used.
> * Process of insertion and deletion of stages in a carbon table (how it is stored in the carbon table).
>
> language-manual.md
> * Mention the Compaction hyperlink in the DML section.
>
> spatial-index-guide.md
> * Mention the TBLPROPERTIES supported / not supported for a Geo table.
> * Mention Spatial Index does not make a new column.
> * Can mention that CTAS from one geo table to another does not create another Geo table.
> * Mention that a certain combination of Spatial Index table properties needs to be added in create table, without which a geo table does not get created.
> * Mention that we cannot alter columns (change datatype, change name, drop) mentioned in spatial_index.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] marchpure commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…
marchpure commented on pull request #3981: URL: https://github.com/apache/carbondata/pull/3981#issuecomment-717259421 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717258126 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2949/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999: URL: https://github.com/apache/carbondata/pull/3999#issuecomment-717257966 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4706/
[jira] [Closed] (CARBONDATA-4010) "Alter table set tblproperties should support long string columns" and bad record handling of long string data for string columns need to be updated in https://github
[ https://issues.apache.org/jira/browse/CARBONDATA-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat closed CARBONDATA-4010. --- Issue is fixed in Carbon 2.1 version. > "Alter table set tblproperties should support long string columns" and bad > record handling of long string data for string columns need to be updated in > https://github.com/apache/carbondata/blob/master/docs > - > > Key: CARBONDATA-4010 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4010 > Project: CarbonData > Issue Type: Bug > Components: docs >Affects Versions: 2.1.0 > Environment: https://github.com/apache/carbondata/blob/master/docs >Reporter: Chetan Bhat >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > "Alter table set tblproperties should support long string columns" and bad > record handling of long string data for string columns need to be updated in > https://github.com/apache/carbondata/blob/master/docs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (CARBONDATA-3932) need to change discovery.uri and add hive.metastore.uri,hive.config.resources in https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md#presto-mul
[ https://issues.apache.org/jira/browse/CARBONDATA-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat closed CARBONDATA-3932. --- Issue is fixed in Carbon 2.1 version. > need to change discovery.uri and add > hive.metastore.uri,hive.config.resources in > https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md#presto-multinode-cluster-setup-for-carbondata > - > > Key: CARBONDATA-3932 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3932 > Project: CarbonData > Issue Type: Bug > Components: docs, presto-integration >Affects Versions: 2.0.1 > Environment: Documentation >Reporter: Chetan Bhat >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Need to change discovery.uri=:8086 to > discovery.uri=http://:8086 in > [https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md#presto-multinode-cluster-setup-for-carbondata] > Need to add these configurations as well in carbondata.properties and to be > updated in carbondata-presto opensource doc . > 1.hive.metastore.uri > 2.hive.config.resources > Ex : - > connector.name=carbondata > hive.metastore.uri=thrift://10.21.18.106:9083 > hive.config.resources=/opt/HA/C10/install/hadoop/datanode/etc/hadoop/core-site.xml,/opt/HA/C10/install/hadoop/datanode/etc/hadoop/hdfs-site.xml > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (CARBONDATA-3901) Documentation issues in https://github.com/apache/carbondata/tree/master/docs
[ https://issues.apache.org/jira/browse/CARBONDATA-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat closed CARBONDATA-3901. --- Issue is fixed in Carbon 2.1 version. > Documentation issues in https://github.com/apache/carbondata/tree/master/docs > - > > Key: CARBONDATA-3901 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3901 > Project: CarbonData > Issue Type: Bug > Components: docs > Affects Versions: 2.0.1 > Environment: https://github.com/apache/carbondata/tree/master/docs > Reporter: Chetan Bhat > Priority: Minor > Fix For: 2.1.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > *Issue 1 -* > [https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md] SORT_SCOPE: > Sort scope of the load. Options include no sort, local sort, batch sort and global sort --> Batch sort to be removed as it's not supported. > *Issue 2 -* > [https://github.com/apache/carbondata/blob/master/docs/streaming-guide.md#close-stream] > The CLOSE STREAM link is not working.
[jira] [Closed] (CARBONDATA-3824) Error when Secondary index tried to be created on table that does not exist is not correct.
[ https://issues.apache.org/jira/browse/CARBONDATA-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat closed CARBONDATA-3824. --- Issue is fixed in Carbon 2.1 version. > Error when Secondary index tried to be created on table that does not exist is not correct. > --- > > Key: CARBONDATA-3824 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3824 > Project: CarbonData > Issue Type: Bug > Components: data-query > Affects Versions: 2.0.0 > Environment: Spark 2.3.2, Spark 2.4.5 > Reporter: Chetan Bhat > Priority: Minor > Fix For: 2.1.0 > > > *Issue:* > Table uniqdata_double does not exist. > A secondary index is created on the non-existent table; the error message is incorrect. > CREATE INDEX indextable2 ON TABLE uniqdata_double (DOB) AS 'carbondata' PROPERTIES('carbon.column.compressor'='zstd'); > *Error: java.lang.RuntimeException: Operation not allowed on non-carbon table (state=,code=0)* > > *Expected:* > *Error: java.lang.RuntimeException: Table does not exist (state=,code=0)*
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
CarbonDataQA1 commented on pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#issuecomment-717190630 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2948/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
CarbonDataQA1 commented on pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#issuecomment-717189509 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4705/
[GitHub] [carbondata] maheshrajus commented on a change in pull request #3912: [CARBONDATA-3977] Global sort partitions should be determined dynamically
maheshrajus commented on a change in pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#discussion_r512579906 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ## @@ -143,10 +143,18 @@ object DataLoadProcessBuilderOnSpark { var numPartitions = CarbonDataProcessorUtil.getGlobalSortPartitions( configuration.getDataLoadProperty(CarbonCommonConstants.LOAD_GLOBAL_SORT_PARTITIONS)) -if (numPartitions <= 0) { - numPartitions = convertRDD.partitions.length + +// if numPartitions user does not specify and not specified in config then dynamically calculate +if (numPartitions == 0) { + // get the size in bytes and convert to size in MB + val sizeOfDataFrame = SizeEstimator.estimate(inputRDD)/100 + // data frame size can not be more than Int size + numPartitions = sizeOfDataFrame.toInt/inputRDD.getNumPartitions Review comment: @akashrn5 you reviewed the old code; I have now pushed the new code and testing is in progress.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3912: [CARBONDATA-3977] Global sort partitions should be determined dynamically
akashrn5 commented on a change in pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#discussion_r503187128 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ## @@ -143,10 +143,18 @@ object DataLoadProcessBuilderOnSpark { var numPartitions = CarbonDataProcessorUtil.getGlobalSortPartitions( configuration.getDataLoadProperty(CarbonCommonConstants.LOAD_GLOBAL_SORT_PARTITIONS)) -if (numPartitions <= 0) { - numPartitions = convertRDD.partitions.length + +// if numPartitions user does not specify and not specified in config then dynamically calculate +if (numPartitions == 0) { + // get the size in bytes and convert to size in MB + val sizeOfDataFrame = SizeEstimator.estimate(inputRDD)/100 + // data frame size can not be more than Int size + numPartitions = sizeOfDataFrame.toInt/inputRDD.getNumPartitions Review comment: I think here you should derive the number of partitions from the total load size and the target partition size, not from the input partition count. Please check again and handle this correctly. @QiangCai please have a look once
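The review comment above suggests sizing global sort partitions from the total load size divided by a target partition size, rather than from the number of input partitions. A minimal, hedged sketch of that idea is below; the class and method names are illustrative and not CarbonData's actual implementation.

```java
// Illustrative sketch of the reviewer's suggestion: derive the global sort
// partition count from total load size and a target per-partition size.
public final class PartitionEstimator {

    private PartitionEstimator() { }

    // Ceil-divide total size by target size so a partial trailing chunk
    // still gets its own partition; clamp to [1, Integer.MAX_VALUE].
    static int estimateGlobalSortPartitions(long totalLoadSizeBytes,
                                            long targetPartitionSizeBytes) {
        if (targetPartitionSizeBytes <= 0) {
            throw new IllegalArgumentException("target partition size must be positive");
        }
        long partitions =
            (totalLoadSizeBytes + targetPartitionSizeBytes - 1) / targetPartitionSizeBytes;
        return (int) Math.max(1L, Math.min(partitions, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 10000 MB load with a 512 MB target partition size -> 20 partitions
        System.out.println(estimateGlobalSortPartitions(10_000L * mb, 512L * mb));
        // tiny load still gets at least one partition
        System.out.println(estimateGlobalSortPartitions(1L, 512L * mb));
    }
}
```

This avoids the pitfall in the quoted diff, where dividing an estimated size by the existing partition count ties the result to however the input RDD happened to be partitioned.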
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3912: [CARBONDATA-3977] Global sort partitions should be determined dynamically
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-717143281 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4704/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3912: [CARBONDATA-3977] Global sort partitions should be determined dynamically
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-717141636 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2947/
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#issuecomment-717131900 retest this please
[jira] [Resolved] (CARBONDATA-4007) ArrayIndexOutofBoundsException when IUD operations performed using SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4007. -- Fix Version/s: 2.1.0 Resolution: Fixed > ArrayIndexOutofBoundsException when IUD operations performed using SDK > -- > > Key: CARBONDATA-4007 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4007 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 2.1.0 > Environment: Spark 2.4.5 jars used for compilation of SDK >Reporter: Chetan Bhat >Priority: Major > Fix For: 2.1.0 > > Time Spent: 9h 20m > Remaining Estimate: 0h > > Issue - > ArrayIndexOutofBoundsException when IUD operations performed using SDK. > Exception - > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$1.close(CarbonTableOutputFormat.java:579) > at org.apache.carbondata.sdk.file.CarbonIUD.delete(CarbonIUD.java:110) > at > org.apache.carbondata.sdk.file.CarbonIUD.deleteExecution(CarbonIUD.java:238) > at org.apache.carbondata.sdk.file.CarbonIUD.closeDelete(CarbonIUD.java:123) > at org.apache.carbondata.sdk.file.CarbonIUD.commit(CarbonIUD.java:221) > at com.apache.spark.SdkIUD_Test.testDelete(SdkIUD_Test.java:130) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at 
org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) > at > com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) > at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3916: [CARBONDATA-3935]Support partition table transactional write in presto
CarbonDataQA1 commented on pull request #3916: URL: https://github.com/apache/carbondata/pull/3916#issuecomment-717126913 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2943/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3916: [CARBONDATA-3935]Support partition table transactional write in presto
CarbonDataQA1 commented on pull request #3916: URL: https://github.com/apache/carbondata/pull/3916#issuecomment-717126479 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4700/
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512553653 ## File path: docs/dml-of-carbondata.md ## @@ -562,3 +563,50 @@ CarbonData DML statements are documented here,which includes: ``` CLEAN FILES FOR TABLE carbon_table ``` + +## CLEAN FILES + + Clean files command is used to remove the Compacted and Marked + For Delete Segments from the store. Carbondata also supports Trash + Folder where all the stale data is moved to after clean files + is called + + There are several types of compaction + + ``` + CLEAN FILES ON TABLE TableName + ``` + + - **Minor Compaction** Review comment: removed ## File path: docs/dml-of-carbondata.md ## @@ -562,3 +563,50 @@ CarbonData DML statements are documented here,which includes: ``` CLEAN FILES FOR TABLE carbon_table ``` + +## CLEAN FILES + + Clean files command is used to remove the Compacted and Marked Review comment: linked ## File path: integration/spark/src/main/scala/org/apache/carbondata/cleanfiles/CleanFilesUtil.scala ## @@ -0,0 +1,409 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.cleanfiles + +import java.util + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ListBuffer + +import org.apache.spark.sql.{AnalysisException, CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.exception.ConcurrentOperationException +import org.apache.carbondata.core.indexstore.PartitionSpec +import org.apache.carbondata.core.locks.{CarbonLockFactory, CarbonLockUtil, ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, CarbonMetadata, SegmentFileStore} +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.mutate.CarbonUpdateUtil +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatus, SegmentStatusManager} +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil} +import org.apache.carbondata.core.util.path.{CarbonTablePath, TrashUtil} +import org.apache.carbondata.processing.loading.TableProcessingOperations +import org.apache.carbondata.processing.loading.model.CarbonLoadModel + +object CleanFilesUtil { + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) + + /** + * The method deletes all data if forceTableClean and clean garbage segment + * (MARKED_FOR_DELETE state) if forceTableClean + * + * @param dbName : Database name + * @param tableName : Table name + * @param tablePath : Table path + * @param carbonTable: CarbonTable Object in case of force clean + * @param forceTableClean: for force clean it will delete all data + *it will clean garbage segment (MARKED_FOR_DELETE state) + * @param currentTablePartitions : 
Hive Partitions details + */ + def cleanFiles( +dbName: String, +tableName: String, +tablePath: String, +timeStamp: String, +carbonTable: CarbonTable, +forceTableClean: Boolean, +currentTablePartitions: Option[Seq[PartitionSpec]] = None, +truncateTable: Boolean = false): Unit = { +var carbonCleanFilesLock: ICarbonLock = null +val absoluteTableIdentifier = if (forceTableClean) { + AbsoluteTableIdentifier.from(tablePath, dbName, tableName, tableName) +} else { + carbonTable.getAbsoluteTableIdentifier +} +try { + val errorMsg = "Clean files request is failed for " + +s"$dbName.$tableName" + +". Not able to acquire the clean files lock due to another clean files " + +"operation is running in the background." + // in case of force clean the lock is not required + if
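The CleanFilesUtil snippet quoted above acquires a clean files lock and fails with an error message when another clean files operation already holds it. A generic sketch of that acquire/fail-fast/release-in-finally pattern is below; `ReentrantLock` stands in for CarbonData's `ICarbonLock`, and all names are hypothetical.

```java
// Generic lock pattern sketch (not CarbonData's actual code): try to take
// a table-level clean files lock, fail fast if a concurrent clean files
// operation holds it, and always release the lock in finally.
import java.util.concurrent.locks.ReentrantLock;

public final class CleanFilesLockSketch {

    static final ReentrantLock CLEAN_FILES_LOCK = new ReentrantLock();

    private CleanFilesLockSketch() { }

    static String runCleanFiles(Runnable work) {
        // non-blocking attempt, mirroring the "not able to acquire the
        // clean files lock" error path in the quoted snippet
        if (!CLEAN_FILES_LOCK.tryLock()) {
            return "Clean files request failed: another clean files operation is running";
        }
        try {
            work.run();
            return "ok";
        } finally {
            // release even if the clean-up work throws
            CLEAN_FILES_LOCK.unlock();
        }
    }
}
```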
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512553372 ## File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.util.path; + +import java.io.File; +import java.io.IOException; +import java.sql.Timestamp; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.exception.CarbonFileException; +import org.apache.carbondata.core.util.CarbonUtil; + +import org.apache.commons.io.FileUtils; + +import org.apache.log4j.Logger; + +public final class TrashUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CarbonUtil.class.getName()); + + /** + * The below method copies the complete a file to the trash folder. 
Provide necessary + * timestamp and the segment number in the suffixToAdd variable, so that the proper folder is + * created in the trash folder. + */ + public static void copyDataToTrashFolderByFile(String carbonTablePath, String pathOfFileToCopy, + String suffixToAdd) { +String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) + +CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd; +try { + if (new File(pathOfFileToCopy).exists()) { +FileUtils.copyFileToDirectory(new File(pathOfFileToCopy), new File(trashFolderPath)); +LOGGER.info("File: " + pathOfFileToCopy + " successfully copied to the trash folder: " ++ trashFolderPath); + } +} catch (IOException e) { + LOGGER.error("Unable to copy " + pathOfFileToCopy + " to the trash folder", e); +} + } + + /** + * The below method copies the complete segment folder to the trash folder. Provide necessary + * timestamp and the segment number in the suffixToAdd variable, so that the proper folder is + * created in the trash folder. + */ + public static void copyDataToTrashBySegment(CarbonFile path, String carbonTablePath, + String suffixToAdd) { +String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) + +CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd; +try { + FileUtils.copyDirectory(new File(path.getAbsolutePath()), new File(trashFolderPath)); + LOGGER.info("Segment: " + path.getAbsolutePath() + " has been copied to the trash folder" + + " successfully"); +} catch (IOException e) { + LOGGER.error("Unable to create the trash folder and copy data to it", e); +} + } + + /** + * The below method deletes timestamp subdirectories in the trash folder which have expired as + * per the user defined expiration time + */ + public static void deleteAllDataFromTrashFolderByTimeStamp(String carbonTablePath, Long timeStamp) + throws IOException { +String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath); +// Deleting the timestamp based subdirectories in the trashfolder by the given 
timestamp. +if (FileFactory.isFileExist(pathOfTrashFolder)) { + try { +List carbonFileList = FileFactory.getFolderList(pathOfTrashFolder); +for (CarbonFile carbonFile : carbonFileList) { + String[] aB = carbonFile.getAbsolutePath().split(CarbonCommonConstants.FILE_SEPARATOR); Review comment: different names for partition tables and normal tables, changed the variable name though ## File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the
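The TrashUtil code quoted above names trash subdirectories with a timestamp (plus a segment suffix) and deletes the ones older than the configured expiration. A hedged sketch of that expiry check, with hypothetical names and a made-up path layout, is below.

```java
// Illustrative sketch (not CarbonData's actual implementation) of expiring
// trash subdirectories by the timestamp encoded in their directory name,
// e.g. ".../Trash/1603900000000_2" -> created at 1603900000000, segment 2.
public final class TrashExpiry {

    private TrashExpiry() { }

    // A directory is expired once its age exceeds the retention window.
    static boolean isExpired(long dirTimestampMillis, long nowMillis, long retentionMillis) {
        return nowMillis - dirTimestampMillis > retentionMillis;
    }

    // Parse the leading timestamp component of a "<timestamp>_<segment>" name.
    static long parseTimestamp(String dirName) {
        return Long.parseLong(dirName.split("_")[0]);
    }
}
```

With this split, the clean-up loop from the quoted snippet only has to list the trash folder, call `parseTimestamp` on each entry, and delete those for which `isExpired` returns true.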
[GitHub] [carbondata] asfgit closed pull request #3970: [CARBONDATA-4007] Fix multiple issues in SDK
asfgit closed pull request #3970: URL: https://github.com/apache/carbondata/pull/3970
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512552552 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1049,7 +1049,7 @@ private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable } public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean isForceDeletion, - List partitionSpecs) throws IOException { + List partitionSpecs, String timeStamp) throws IOException { Review comment: I have changed this behaviour; after this change even one clean files command can create multiple timestamp subdirectories. The user can use the tree command to list the files and use the timestamp subfolder as desired. ## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ## @@ -2116,6 +2086,20 @@ public int getMaxSIRepairLimit(String dbName, String tableName) { return Math.abs(Integer.parseInt(thresholdValue)); } + /** + * The below method returns the microseconds after which the trash folder will expire + */ + public long getTrashFolderExpirationTime() { +String configuredValue = getProperty(CarbonCommonConstants.TRASH_EXPIRATION_DAYS, +CarbonCommonConstants.TRASH_EXPIRATION_DAYS_DEFAULT); +int result = Integer.parseInt(configuredValue); +if (result < 0) { + result = Integer.parseInt(TRASH_EXPIRATION_DAYS_DEFAULT); Review comment: done
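The CarbonProperties diff above parses a trash retention period configured in days and falls back to the default when the value is negative. A small, hedged sketch of that handling is below; the default of 7 days and all names are assumptions for illustration, not values from the source.

```java
// Hedged sketch of the configured-retention handling in the diff above:
// parse the days value, fall back to an assumed default when the value is
// negative or unparseable, and return the retention in milliseconds.
public final class TrashRetention {

    // assumed default for this sketch; the real default is not in the source
    static final int DEFAULT_EXPIRATION_DAYS = 7;

    private TrashRetention() { }

    static long expirationMillis(String configuredDays) {
        int days;
        try {
            days = Integer.parseInt(configuredDays);
        } catch (NumberFormatException e) {
            days = DEFAULT_EXPIRATION_DAYS;
        }
        if (days < 0) {
            // negative retention makes no sense; use the default instead
            days = DEFAULT_EXPIRATION_DAYS;
        }
        return days * 24L * 60L * 60L * 1000L;
    }
}
```

Note the quoted javadoc says "microseconds" while a day-based retention is naturally expressed in milliseconds; the sketch uses milliseconds throughout.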
[GitHub] [carbondata] ajantha-bhat commented on pull request #3916: [CARBONDATA-3935]Support partition table transactional write in presto
ajantha-bhat commented on pull request #3916: URL: https://github.com/apache/carbondata/pull/3916#issuecomment-717124467 @akashrn5 a) Please update the PR description to explain what problems existed and what changes were made to support this; currently it is not clear. b) Please confirm here whether the cluster test has passed.
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512552754 ## File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java ## @@ -47,6 +47,7 @@ public static final String BATCH_PREFIX = "_batchno"; private static final String LOCK_DIR = "LockFiles"; + public static final String SEGMENTS_FOLDER = "segments"; Review comment: done
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3916: [CARBONDATA-3935]Support partition table transactional write in presto
ajantha-bhat commented on a change in pull request #3916: URL: https://github.com/apache/carbondata/pull/3916#discussion_r512505051 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -369,7 +369,8 @@ private CarbonCommonConstants() { public static final String CARBON_MERGE_INDEX_IN_SEGMENT = "carbon.merge.index.in.segment"; - public static final String CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT = "true"; + // TODO: revert this after proper fix in this PR Review comment: please revert this ## File path: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonOutputCommitter.java ## @@ -123,13 +132,46 @@ public void commitJob(JobContext jobContext) throws IOException { try { Configuration configuration = jobContext.getConfiguration(); CarbonLoadModel carbonLoadModel = MapredCarbonOutputFormat.getLoadModel(configuration); - ThreadLocalSessionInfo.unsetAll(); - SegmentFileStore.writeSegmentFile(carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable(), - carbonLoadModel.getSegmentId(), String.valueOf(carbonLoadModel.getFactTimeStamp())); - SegmentFileStore - .mergeIndexAndWriteSegmentFile(carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable(), - carbonLoadModel.getSegmentId(), String.valueOf(carbonLoadModel.getFactTimeStamp())); - CarbonTableOutputFormat.setLoadModel(configuration, carbonLoadModel); + if (!carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable().isHivePartitionTable()) { +ThreadLocalSessionInfo.unsetAll(); +SegmentFileStore + .writeSegmentFile(carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable(), +carbonLoadModel.getSegmentId(), String.valueOf(carbonLoadModel.getFactTimeStamp())); +SegmentFileStore.mergeIndexAndWriteSegmentFile( +carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable(), +carbonLoadModel.getSegmentId(), String.valueOf(carbonLoadModel.getFactTimeStamp())); +CarbonTableOutputFormat.setLoadModel(configuration, carbonLoadModel); + } else { +String tableFactLocation 
= + carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable().getTablePath(); +List carbonFiles = +FileFactory.getCarbonFile(tableFactLocation).listFiles(true, new CarbonFileFilter() { Review comment: list files of whole table can be very slow when multiple segments are present. we list previous load segments also here. I think we need to keep list of index/merge index created for current load in memory and write in the segment file here. ## File path: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonOutputFormat.java ## @@ -115,9 +115,20 @@ public void checkOutputSpecs(FileSystem fileSystem, JobConf jobConf) throws IOEx carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable().getPartitionInfo(); final int partitionColumn = partitionInfo != null ? partitionInfo.getColumnSchemaList().size() : 0; +final String finalOutputPath; if (carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable().isHivePartitionTable()) { - carbonLoadModel.getMetrics().addToPartitionPath(finalOutPath.toString()); - context.getConfiguration().set("carbon.outputformat.writepath", finalOutPath.toString()); + String[] outputPathSplits = finalOutPath.toString().split("/"); + StringBuilder partitionDirs = new StringBuilder(); + for (int i = partitionColumn; i > 0; i--) { Review comment: so, how carbondata-hive partition write is working ? ## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoInsertIntoPartitionTableTestCase.scala ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.presto.integrationtest + +import java.io.File +import java.util +import java.util.concurrent.atomic.AtomicInteger + +import scala.collection.JavaConverters._ + +import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach, FunSuiteLike} +
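One way to realize the reviewer's suggestion above — keeping the index files of the current load in memory instead of listing the whole table path — is sketched below. All class and method names here are hypothetical illustrations, not CarbonData APIs:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: remember the index files written by the current load so
// commitJob can build the segment file without a recursive listFiles over the
// whole table path (which would also pick up previous loads' segments).
class CurrentLoadIndexTracker {
  private final List<String> indexFiles = new ArrayList<>();

  // Invoked by the writer whenever a .carbonindex / merge index file is flushed.
  public synchronized void onIndexFileCreated(String indexFilePath) {
    indexFiles.add(indexFilePath);
  }

  // Invoked once at commit time to write the segment file.
  public synchronized List<String> getIndexFilesForSegmentFile() {
    return Collections.unmodifiableList(indexFiles);
  }

  public static void main(String[] args) {
    CurrentLoadIndexTracker tracker = new CurrentLoadIndexTracker();
    tracker.onIndexFileCreated("/tbl/part=1/0_1603763776.carbonindex");
    tracker.onIndexFileCreated("/tbl/part=2/1_1603763776.carbonindex");
    System.out.println(tracker.getIndexFilesForSegmentFile().size()); // prints 2
  }
}
```

The key design point is that the cost becomes proportional to the current load's output, not to the number of segments already in the table.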
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512551259 ## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ## @@ -192,11 +208,17 @@ private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad, } private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad, - boolean isForceDelete) { + boolean isForceDelete, AbsoluteTableIdentifier absoluteTableIdentifier) { // Check if the segment is added externally and path is set then do not delete it if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() -|| SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null +|| SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus +.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null Review comment: i am not sure about this. maybe we can discuss with @ajantha-bhat or @akashrn5 once? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512551385 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -1427,6 +1428,25 @@ private CarbonCommonConstants() { public static final String BITSET_PIPE_LINE_DEFAULT = "true"; + public static final long MILLIS_SECONDS_IN_A_DAY = TimeUnit.DAYS.toMillis(1); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512551780 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, List partitio * @throws IOException */ public static void deleteSegment(String tablePath, Segment segment, - List partitionSpecs, - SegmentUpdateStatusManager updateStatusManager) throws Exception { + List partitionSpecs, SegmentUpdateStatusManager updateStatusManager, + SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp) + throws Exception { SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName()); List indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true, FileFactory.getConfiguration()); +List filesToDelete = new ArrayList<>(); Map> indexFilesMap = fileStore.getIndexFilesMap(); for (Map.Entry> entry : indexFilesMap.entrySet()) { - FileFactory.deleteFile(entry.getKey()); + // Move the file to the trash folder in case the segment status is insert in progress + if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS) { +if (!isPartitionTable) { + TrashUtil.copyDataToTrashFolderByFile(tablePath, entry.getKey(), timeStamp + Review comment: copying it whole segment wise for normal tables, but in case of partition table, doing it file level. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
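The segment-wise versus file-wise copy discussed in this review thread ultimately reduces to placing files under a timestamped trash directory inside the table path. A minimal, hypothetical sketch of that layout follows; it only illustrates the folder structure and is not the PR's actual TrashUtil implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: files from an INSERT_IN_PROGRESS segment end up under
// <tablePath>/.Trash/<timestamp>/Segment_<id>/ before the originals are deleted.
class TrashCopySketch {
  static Path copyToTrash(Path tablePath, Path file, String timestamp, String segmentId)
      throws IOException {
    Path trashDir = tablePath.resolve(".Trash").resolve(timestamp)
        .resolve("Segment_" + segmentId);
    Files.createDirectories(trashDir);
    // Keep the original file name so the segment can be re-added later.
    return Files.copy(file, trashDir.resolve(file.getFileName()),
        StandardCopyOption.REPLACE_EXISTING);
  }

  public static void main(String[] args) throws IOException {
    Path table = Files.createTempDirectory("demoTable");
    Path dataFile = Files.createTempFile(table, "part-0", ".carbondata");
    Path inTrash = copyToTrash(table, dataFile, "1603763776", "0");
    System.out.println(Files.exists(inTrash)); // prints true
  }
}
```

For a normal table the whole segment directory can be copied in one pass, while for a partition table each index/data file is copied individually, since its files are scattered across partition folders.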
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512550744 ## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ## @@ -138,8 +143,19 @@ public boolean accept(CarbonFile file) { if (filesToBeDeleted.length == 0) { status = true; } else { - for (CarbonFile eachFile : filesToBeDeleted) { + // If the file to be deleted is a carbondata file, index file, index merge file + // or a delta file, copy that file to the trash folder. + if ((eachFile.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT) || Review comment: Copying segment-wise in the case of a normal table; in the case of the partition flow I have kept it file by file. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512550325 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, List partitio * @throws IOException */ public static void deleteSegment(String tablePath, Segment segment, - List partitionSpecs, - SegmentUpdateStatusManager updateStatusManager) throws Exception { + List partitionSpecs, SegmentUpdateStatusManager updateStatusManager, + SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp) + throws Exception { SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName()); List indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true, FileFactory.getConfiguration()); +List filesToDelete = new ArrayList<>(); Map> indexFilesMap = fileStore.getIndexFilesMap(); for (Map.Entry> entry : indexFilesMap.entrySet()) { - FileFactory.deleteFile(entry.getKey()); + // Move the file to the trash folder in case the segment status is insert in progress + if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS) { Review comment: for the normal table flow, i have changed it to copy to trash by segment, but in case of partition table copying to trash by file because i will have to read the segment file to get the desired carbondata and the index files per segment, which will increase the IO time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512549332 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala ## @@ -0,0 +1,484 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + */ + +package org.apache.carbondata.spark.testsuite.cleanfiles + +import java.io.{File, PrintWriter} +import java.util +import java.util.List + +import org.apache.carbondata.cleanfiles.CleanFilesUtil +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.util.CarbonUtil +import org.apache.spark.sql.{CarbonEnv, Row} +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll + +import scala.io.Source + +class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll { + + var count = 0 + + test("clean up table and test trash folder with In Progress segments") { +sql("""DROP TABLE IF EXISTS CLEANTEST""") +sql("""DROP TABLE IF EXISTS CLEANTEST1""") +sql( + """ +| CREATE TABLE cleantest (name String, id Int) +| STORED AS carbondata + """.stripMargin) +sql(s"""INSERT INTO CLEANTEST SELECT "abc", 1""") +sql(s"""INSERT INTO CLEANTEST SELECT "abc", 1""") +sql(s"""INSERT INTO CLEANTEST SELECT "abc", 1""") +// run a select query before deletion +checkAnswer(sql(s"""select count(*) from cleantest"""), + Seq(Row(3))) + +val path = CarbonEnv.getCarbonTable(Some("default"), "cleantest")(sqlContext.sparkSession) + .getTablePath +val tableStatusFilePath = path + CarbonCommonConstants.FILE_SEPARATOR + "Metadata" + + CarbonCommonConstants.FILE_SEPARATOR + "tableStatus" +editTableStatusFile(path) +val trashFolderPath = path + CarbonCommonConstants.FILE_SEPARATOR + + CarbonCommonConstants.CARBON_TRASH_FOLDER_NAME + +assert(!FileFactory.isFileExist(trashFolderPath)) +val dryRun = sql(s"CLEAN FILES FOR TABLE cleantest OPTIONS('isDryRun'='true')").count() +// dry run shows 3 segments to move to trash +assert(dryRun == 3) + +sql(s"CLEAN FILES FOR TABLE cleantest").show + +checkAnswer(sql(s"""select count(*) from cleantest"""), + Seq(Row(0))) 
+assert(FileFactory.isFileExist(trashFolderPath)) +var list = getFileCountInTrashFolder(trashFolderPath) +assert(list == 6) + +val dryRun1 = sql(s"CLEAN FILES FOR TABLE cleantest OPTIONS('isDryRun'='true')").count() +sql(s"CLEAN FILES FOR TABLE cleantest").show + +count = 0 +list = getFileCountInTrashFolder(trashFolderPath) +// no carbondata file is added to the trash +assert(list == 6) + + +val timeStamp = getTimestampFolderName(trashFolderPath) + +// recovering data from trash folder +sql( + """ +| CREATE TABLE cleantest1 (name String, id Int) +| STORED AS carbondata + """.stripMargin) + +val segment0Path = trashFolderPath + CarbonCommonConstants.FILE_SEPARATOR + timeStamp + + CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + '0' +val segment1Path = trashFolderPath + CarbonCommonConstants.FILE_SEPARATOR + timeStamp + + CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + '1' +val segment2Path = trashFolderPath + CarbonCommonConstants.FILE_SEPARATOR + timeStamp + + CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + '2' + +sql(s"alter table cleantest1 add segment options('path'='$segment0Path'," + + s"'format'='carbon')").show() +sql(s"alter table cleantest1 add segment options('path'='$segment1Path'," + + s"'format'='carbon')").show() +sql(s"alter table cleantest1 add segment options('path'='$segment2Path'," + + s"'format'='carbon')").show() +sql(s"""INSERT INTO CLEANTEST SELECT * from cleantest1""") + +// test after recovering data from trash +checkAnswer(sql(s"""select count(*) from cleantest"""), + Seq(Row(3))) + +sql(s"CLEAN FILES FOR
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512548768 ## File path: docs/cleanfiles.md ## @@ -0,0 +1,78 @@ + + + +## CLEAN FILES + +The clean files command is used to remove the Compacted, Marked For Delete, stale In Progress and Partial (segments which are missing from the table status file but whose data is present) segments from the store. + + Clean Files Command + ``` + CLEAN FILES ON TABLE TABLE_NAME + ``` + + +### TRASH FOLDER + + CarbonData supports a trash folder, a redundant folder to which all unnecessary files and folders are moved during the clean files operation. + This trash folder is maintained inside the table path. It is a hidden folder (.Trash). The segments that are moved to the trash folder are maintained under a timestamp + subfolder (the timestamp at which the clean files operation was called). This helps the user to list segments by timestamp. By default, each timestamp sub-directory has an expiration + time of 3 days from that timestamp, which can be configured by the user using the following carbon property + ``` + carbon.trash.expiration.time = "Number of days" + ``` + Once a timestamp subdirectory has expired as per the configured expiration value, it is deleted from the trash folder by the subsequent clean files command. + + + + +### DRY RUN + Support for a dry run is provided before the actual clean files operation. The dry run lists all the segments which are going to be manipulated during + the clean files operation. The dry run result shows the current location of the segment (it can be in the FACT folder, a partition folder or the trash folder) and where that segment + will be moved (to the trash folder, or deleted from the store) once the actual operation is called.
+ + + ``` + CLEAN FILES ON TABLE TABLE_NAME options('dry_run'='true') + ``` + +### FORCE DELETE TRASH +The force option with the clean files command deletes all the files and folders from the trash folder. + + ``` + CLEAN FILES ON TABLE TABLE_NAME options('force'='true') + ``` + +### DATA RECOVERY FROM THE TRASH FOLDER + +The segments can be recovered from the trash folder by creating an external table from the desired segment location Review comment: changed ## File path: docs/dml-of-carbondata.md ## @@ -552,3 +553,50 @@ CarbonData DML statements are documented here, which includes: ``` CLEAN FILES FOR TABLE carbon_table ``` + +## CLEAN FILES + + The clean files command is used to remove the Compacted and Marked For Delete segments from the store. CarbonData also supports a trash folder to which all the stale data is moved after clean files is called. + + ``` + CLEAN FILES ON TABLE TableName + ``` Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
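The trash expiration described in the cleanfiles.md draft above — timestamp sub-directories older than the configured `carbon.trash.expiration.time` days (default 3) are removed by the next clean files run — reduces to a simple time comparison. A sketch with assumed names, not the actual CarbonData code:

```java
import java.util.concurrent.TimeUnit;

// Sketch of the expiration rule: a timestamped trash sub-directory is eligible
// for deletion once it is older than the configured number of days (default 3).
class TrashExpiry {
  static final long MILLIS_IN_A_DAY = TimeUnit.DAYS.toMillis(1);

  // true when the trash sub-directory created at folderTimestampMillis has expired
  static boolean isExpired(long folderTimestampMillis, long nowMillis, int expirationDays) {
    return nowMillis - folderTimestampMillis > expirationDays * MILLIS_IN_A_DAY;
  }
}
```

Because the sub-directory name is itself the creation timestamp, the check needs no extra metadata: the folder name is parsed and compared against the current time.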
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata
vikramahuja1001 commented on a change in pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#discussion_r512545772 ## File path: docs/cleanfiles.md ## @@ -0,0 +1,78 @@ + + + +## CLEAN FILES + +Clean files command is used to remove the Compacted, Marked For Delete ,In Progress which are stale and Partial(Segments which are missing from the table status file but their data is present) + segments from the store. + + Clean Files Command + ``` + CLEAN FILES ON TABLE TABLE_NAME Review comment: changed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA1 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-717104688 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2942/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA1 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-717095810 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4699/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 edited a comment on pull request #3916: [CARBONDATA-3935]Support partition table transactional write in presto
akashrn5 edited a comment on pull request #3916: URL: https://github.com/apache/carbondata/pull/3916#issuecomment-717074706 @ajantha-bhat @QiangCai i have rebased, please have a look. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 commented on pull request #3916: [CARBONDATA-3935]Support partition table transactional write in presto
akashrn5 commented on pull request #3916: URL: https://github.com/apache/carbondata/pull/3916#issuecomment-717074706 @ajantha-bhat i have rebased, please have a look. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (CARBONDATA-4044) Fix dirty data in indexfile while IUD with stale data in segment folder
[ https://issues.apache.org/jira/browse/CARBONDATA-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221222#comment-17221222 ] Yahui Liu commented on CARBONDATA-4044: --- I have faced this issue and am trying to solve it. Currently we call CarbonLoaderUtil.checkAndCreateCarbonDataLocation to check and create the Segment_XXX folder (if it does not exist), but we do not check whether stale data exists in the segment folder when the Segment_XXX folder already exists. My idea is to always remove the Segment_XXX folder before creating it again. This will make sure there is no stale data. Is this solution valid for all cases? Please provide some ideas to me. Thanks. > Fix dirty data in indexfile while IUD with stale data in segment folder > --- > > Key: CARBONDATA-4044 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4044 > Project: CarbonData > Issue Type: Bug > Reporter: Xingjun Hao > Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > XX.mergecarbonindex and XX..segment record the index file list of a segment. > Now, we generate the xx.mergeindexfile and xx.segment by listing all > index files (including carbonindex and mergecarbonindex), which will lead to > dirty data when there is stale data in the segment folder. > For example, there is a stale index file in the segment_0 folder, > "0_1603763776.carbonindex". > While loading, a new carbonindex "0_16037752342.carbonindex" is written; when > merging carbonindex files, we expect to merge only 0_16037752342.carbonindex, > but if we list all carbonindex files in the segment folder, both > "0_1603763776.carbonindex" and 0_16037752342.carbonindex will be merged and > recorded into the segment file. > > Updating has the same problem. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
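The proposal in the comment above — always removing any pre-existing Segment_XXX folder before recreating it, so stale carbonindex files cannot leak into the merge index or segment file — could look like the following sketch (names assumed, not CarbonData code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch: drop any pre-existing Segment_XXX folder (stale data from an earlier
// aborted load) before recreating it, so no stale carbonindex file can leak
// into the merge index or segment file.
class SegmentFolderCleaner {
  static void recreateSegmentFolder(Path segmentPath) throws IOException {
    if (Files.exists(segmentPath)) {
      try (Stream<Path> walk = Files.walk(segmentPath)) {
        // Delete children before parents by sorting paths in reverse order.
        walk.sorted(Comparator.reverseOrder()).forEach(p -> {
          try {
            Files.delete(p);
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        });
      }
    }
    Files.createDirectories(segmentPath);
  }
}
```

Whether this is safe for all cases is exactly the open question in the comment: concurrent loads writing into the same segment folder, for instance, would need to be ruled out first.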
[GitHub] [carbondata] akashrn5 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update
akashrn5 commented on pull request #3986: URL: https://github.com/apache/carbondata/pull/3986#issuecomment-717054658

> We tested with 10 segments total 10G and update more than 20 times to see the cost, update row count of each time is 176973.
> The result and comparison before and after improving is shown below, we can see that time-consuming of UPDATE reduces by about 30% after improving horizontal compaction.
>
> | | Time Before Improving | Time after Improving |
> | --- | --- | --- |
> | Average time from 2nd to 21st UPDATE | **99.19** | **67.55** |
> | UPDATE | | |
> | 1st | 38.397 | 71.91 |
> | 2nd | 74.918 | 67.386 |
> | 3rd | 43.851 | 46.171 |
> | 4th | 52.133 | 82.974 |
> | 5th | 86.021 | 44.499 |
> | 6th | 62.992 | 39.973 |
> | 7th | 99.077 | 69.292 |
> | 8th | 77.815 | 60.416 |
> | 9th | 74.949 | 50.187 |
> | 10th | 85.874 | 50.973 |
> | 11th | 103.902 | 58.005 |
> | 12th | 77.534 | 86.087 |
> | 13th | 79.154 | 76.283 |
> | 14th | 116.117 | 72.029 |
> | 15th | 112.846 | 74.182 |
> | 16th | 124.707 | 69.282 |
> | 17th | 124.677 | 67.546 |
> | 18th | 147.15 | 66.652 |
> | 19th | 135.127 | 109.385 |
> | 20th | 133.94 | 80.849 |
> | 21st | 171.112 | 78.92 |

Cool This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3970: [CARBONDATA-4007] Fix multiple issues in SDK
CarbonDataQA1 commented on pull request #3970: URL: https://github.com/apache/carbondata/pull/3970#issuecomment-717049704 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2941/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3970: [CARBONDATA-4007] Fix multiple issues in SDK
CarbonDataQA1 commented on pull request #3970: URL: https://github.com/apache/carbondata/pull/3970#issuecomment-717049604 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4698/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] marchpure commented on pull request #3997: [CARBONDATA-4045] Add TPCDS TestCase for Spark on CarbonData Integration Test
marchpure commented on pull request #3997: URL: https://github.com/apache/carbondata/pull/3997#issuecomment-717045718

> @marchpure : The reason why we don't have TPCH and TPCDS in UT is we need a huge data set, loading huge data takes time.
>
> What is the reason behind adding this? we can anyways have separate TPCH or TPCDS machines that can have automation script to give performance benchmark on every release
>
> Also no need to run TPCH and TPCDS on every PR builder. Running once per release is enough.
>
> @QiangCai , @kunal642 : What's your opinion on this?

1. The TPCDS dataset in this PR is really small (33 KB in total), so it won't take much time to load and query. It may help to avoid possible issues with acceptable overhead. 2. It also helps us debug TPCDS explain plans and analysis in a local environment. The inspiration for adding the TPCDS test case is CARBONDATA-4008, whose issue is that Spark on CarbonData fails in TPCDS Query 83. This issue seems to have been there for a long time, which implies that our UT coverage is not enough. I believe we can add a profile to turn the TPCDS test on/off in the future once the automated TPCDS machine is ready. Maybe we can have a module named 'carbondata-integration-test'? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] dependabot[bot] commented on pull request #3456: Bump solr.version from 6.3.0 to 8.3.0 in /datamap/lucene
dependabot[bot] commented on pull request #3456: URL: https://github.com/apache/carbondata/pull/3456#issuecomment-717039451 Dependabot tried to update this pull request, but something went wrong. We're looking into it, but in the meantime you can retry the update by commenting `@dependabot rebase`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] dependabot[bot] commented on pull request #3447: Bump dep.jackson.version from 2.6.5 to 2.10.1 in /store/sdk
dependabot[bot] commented on pull request #3447: URL: https://github.com/apache/carbondata/pull/3447#issuecomment-717038691 Dependabot tried to update this pull request, but something went wrong. We're looking into it, but in the meantime you can retry the update by commenting `@dependabot rebase`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on pull request #3997: [CARBONDATA-4045] Add TPCDS TestCase for Spark on CarbonData Integration Test
ajantha-bhat commented on pull request #3997: URL: https://github.com/apache/carbondata/pull/3997#issuecomment-717031195 @marchpure : The reason why we don't have TPCH and TPCDS in UT is that we need a huge data set, and loading huge data takes time. What is the reason behind adding this? We can anyway have separate TPCH or TPCDS machines with an automation script to produce a performance benchmark on every release. Also, there is no need to run TPCH and TPCDS on every PR builder; running once per release is enough. @QiangCai , @kunal642 : What's your opinion on this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Zhangshunyu commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update
Zhangshunyu commented on pull request #3986: URL: https://github.com/apache/carbondata/pull/3986#issuecomment-717027548 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Zhangshunyu commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update
Zhangshunyu commented on pull request #3986: URL: https://github.com/apache/carbondata/pull/3986#issuecomment-717025630

> We tested with 10 segments total 10G and update more than 20 times to see the cost, update row count of each time is 176973.
> The result and comparison before and after improving is shown below, we can see that time-consuming of UPDATE reduces by about 30% after improving horizontal compaction.
>
> | | Time Before Improving | Time after Improving |
> | --- | --- | --- |
> | Average time from 2nd to 21st UPDATE | **99.19** | **67.55** |
> | UPDATE | | |
> | 1st | 38.397 | 71.91 |
> | 2nd | 74.918 | 67.386 |
> | 3rd | 43.851 | 46.171 |
> | 4th | 52.133 | 82.974 |
> | 5th | 86.021 | 44.499 |
> | 6th | 62.992 | 39.973 |
> | 7th | 99.077 | 69.292 |
> | 8th | 77.815 | 60.416 |
> | 9th | 74.949 | 50.187 |
> | 10th | 85.874 | 50.973 |
> | 11th | 103.902 | 58.005 |
> | 12th | 77.534 | 86.087 |
> | 13th | 79.154 | 76.283 |
> | 14th | 116.117 | 72.029 |
> | 15th | 112.846 | 74.182 |
> | 16th | 124.707 | 69.282 |
> | 17th | 124.677 | 67.546 |
> | 18th | 147.15 | 66.652 |
> | 19th | 135.127 | 109.385 |
> | 20th | 133.94 | 80.849 |
> | 21st | 171.112 | 78.92 |

great! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] shenjiayu17 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update
shenjiayu17 commented on pull request #3986: URL: https://github.com/apache/carbondata/pull/3986#issuecomment-717024597 We tested with 10 segments, 10 GB in total, and ran more than 20 updates to measure the cost; each update changed 176973 rows. The result and comparison before and after the improvement are shown below; the time taken by UPDATE drops by about 30% after improving horizontal compaction.

| | Time Before Improving | Time after Improving |
| --- | --- | --- |
| Average time from 2nd to 21st UPDATE | **99.19** | **67.55** |
| UPDATE | | |
| 1st | 38.397 | 71.91 |
| 2nd | 74.918 | 67.386 |
| 3rd | 43.851 | 46.171 |
| 4th | 52.133 | 82.974 |
| 5th | 86.021 | 44.499 |
| 6th | 62.992 | 39.973 |
| 7th | 99.077 | 69.292 |
| 8th | 77.815 | 60.416 |
| 9th | 74.949 | 50.187 |
| 10th | 85.874 | 50.973 |
| 11th | 103.902 | 58.005 |
| 12th | 77.534 | 86.087 |
| 13th | 79.154 | 76.283 |
| 14th | 116.117 | 72.029 |
| 15th | 112.846 | 74.182 |
| 16th | 124.707 | 69.282 |
| 17th | 124.677 | 67.546 |
| 18th | 147.15 | 66.652 |
| 19th | 135.127 | 109.385 |
| 20th | 133.94 | 80.849 |
| 21st | 171.112 | 78.92 |
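As a sanity check on the "about 30%" figure, a minimal Python sketch (numbers copied from the table above, averaging the 2nd through 21st runs as the comment does) recomputes the averages and the relative reduction:

```python
# Per-UPDATE times in seconds for the 2nd..21st runs, copied from the table.
before = [74.918, 43.851, 52.133, 86.021, 62.992, 99.077, 77.815, 74.949,
          85.874, 103.902, 77.534, 79.154, 116.117, 112.846, 124.707,
          124.677, 147.15, 135.127, 133.94, 171.112]
after = [67.386, 46.171, 82.974, 44.499, 39.973, 69.292, 60.416, 50.187,
         50.973, 58.005, 86.087, 76.283, 72.029, 74.182, 69.282, 67.546,
         66.652, 109.385, 80.849, 78.92]

avg_before = sum(before) / len(before)   # ~99.19, matching the table
avg_after = sum(after) / len(after)      # ~67.55, matching the table
reduction = 1 - avg_after / avg_before   # ~0.32, i.e. roughly 30%

print(f"{avg_before:.2f}s -> {avg_after:.2f}s ({reduction:.1%} reduction)")
```

The recomputed averages match the table, and the reduction comes out near 32%, consistent with the claimed ~30% improvement.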
[GitHub] [carbondata] ajantha-bhat commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success
ajantha-bhat commented on pull request #3994: URL: https://github.com/apache/carbondata/pull/3994#issuecomment-717017303 @marchpure : Yes, if update writes into a new segment, it can fix the issue. If you handle this in your PR, then I will close mine.
[GitHub] [carbondata] ajantha-bhat closed pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success
ajantha-bhat closed pull request #3994: URL: https://github.com/apache/carbondata/pull/3994
[GitHub] [carbondata] ajantha-bhat commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success
ajantha-bhat commented on pull request #3994: URL: https://github.com/apache/carbondata/pull/3994#issuecomment-717017602 handled in #3999 , I will close this.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3997: [CARBONDATA-4045] Add TPCDS TestCase
CarbonDataQA1 commented on pull request #3997: URL: https://github.com/apache/carbondata/pull/3997#issuecomment-717017353 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2940/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3997: [CARBONDATA-4045] Add TPCDS TestCase
CarbonDataQA1 commented on pull request #3997: URL: https://github.com/apache/carbondata/pull/3997#issuecomment-717016610 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4697/
[GitHub] [carbondata] marchpure edited a comment on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success
marchpure edited a comment on pull request #3994: URL: https://github.com/apache/carbondata/pull/3994#issuecomment-717015851

> @marchpure : If you don't assign a UUID for all update cases or the compaction-after-update case, all test cases will pass. But the original data mismatch will still happen for a segment that has stale files and went through update and compaction (you can test that scenario)

Yes, I have tested it. In PR #3999, update will generate a new segment, like merge into, which avoids recreating the segment file.
[GitHub] [carbondata] marchpure commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success
marchpure commented on pull request #3994: URL: https://github.com/apache/carbondata/pull/3994#issuecomment-717015851

> @marchpure : If you don't assign a UUID for all update cases or the compaction-after-update case, all test cases will pass. But the original data mismatch will still happen for a segment that has stale files and went through update and compaction (you can test that scenario)

Yes, I have tested it. Update will generate a new segment, like merge into, which avoids recreating the segment file.
[GitHub] [carbondata] ajantha-bhat commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success
ajantha-bhat commented on pull request #3994: URL: https://github.com/apache/carbondata/pull/3994#issuecomment-717014461 @marchpure : If you don't assign a UUID for all update cases or the compaction-after-update case, all test cases will pass. But the original data mismatch will still happen for a segment that has stale files and went through update and compaction (you can test that scenario).
[GitHub] [carbondata] marchpure commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success
marchpure commented on pull request #3994: URL: https://github.com/apache/carbondata/pull/3994#issuecomment-717010368

> > Hi, Ajantha. I have fixed this in #3999 , please have a check.
>
> @marchpure: I think your PR is not handling all the scenarios, so the update scenario and test case will fail (even mine also)

Yes, the update test cases all pass. I am handling the other 10 test failures.
[GitHub] [carbondata] asfgit closed pull request #3996: [DOC] Adjust document for partition table
asfgit closed pull request #3996: URL: https://github.com/apache/carbondata/pull/3996