[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [HOTFIX] Fix Random CI Failures of HiveCarbonTest
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734699847 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3190/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
nihal0107 commented on a change in pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#discussion_r531434575

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala

## @@ -184,10 +184,12 @@ private[sql] case class DropIndexCommand(
       parentCarbonTable = getRefreshedParentTable(sparkSession, dbName)
      val indexMetadata = parentCarbonTable.getIndexMetadata
      if (null != indexMetadata && null != indexMetadata.getIndexesMap) {
-        val hasCgFgIndexes =
-          !(indexMetadata.getIndexesMap.size() == 1 &&
-            indexMetadata.getIndexesMap.containsKey(IndexType.SI.getIndexProviderName))
-        if (hasCgFgIndexes) {
+        // check if any CG or FG index exists. If not exists,

Review comment: OK, handled the scenario: when no CG or FG index exists, the `indexExists` property is now set to false. Earlier this case was not handled; when we dropped all indexes, we didn't set the `indexExists` property to false.
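The check being discussed in this thread can be sketched as follows. This is a minimal, hypothetical illustration (class, method, and provider-key names are assumptions, not CarbonData's actual API): decide whether any CG or FG index remains after a drop, treating a map that holds only the SI provider entry as "no CG/FG index", so the caller can reset the `indexExists` property to false.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the CG/FG-index check discussed above.
public class IndexExistsCheck {

  // illustrative provider key for secondary indexes
  static final String SI_PROVIDER = "secondaryindex";

  // Returns true only if the map holds at least one non-SI (i.e. CG/FG) provider entry.
  static boolean hasCgFgIndexes(Map<String, ?> indexesMap) {
    if (indexesMap == null || indexesMap.isEmpty()) {
      return false;
    }
    return !(indexesMap.size() == 1 && indexesMap.containsKey(SI_PROVIDER));
  }

  public static void main(String[] args) {
    Map<String, String> onlySi = new HashMap<>();
    onlySi.put(SI_PROVIDER, "idx1");
    Map<String, String> withBloom = new HashMap<>(onlySi);
    withBloom.put("bloomfilter", "idx2");
    System.out.println(hasCgFgIndexes(onlySi));    // false -> reset indexExists
    System.out.println(hasCgFgIndexes(withBloom)); // true
  }
}
```

With this shape, a `size() != 0` guard inside `hasCgFgIndexes` would indeed be redundant whenever the caller has already null-checked the map, which is the point VenuReddy2103 raises below.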
[GitHub] [carbondata] shenjiayu17 edited a comment on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
shenjiayu17 edited a comment on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-734690792

> @shenjiayu17 : please update `/docs/spatial-index-guide.md` about what new UDF is supported for query and what functionality changed

spatial-index-guide.md has been updated.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
VenuReddy2103 commented on a change in pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#discussion_r531416935

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala

## @@ -184,10 +184,12 @@ private[sql] case class DropIndexCommand(
       parentCarbonTable = getRefreshedParentTable(sparkSession, dbName)
      val indexMetadata = parentCarbonTable.getIndexMetadata
      if (null != indexMetadata && null != indexMetadata.getIndexesMap) {
-        val hasCgFgIndexes =
-          !(indexMetadata.getIndexesMap.size() == 1 &&
-            indexMetadata.getIndexesMap.containsKey(IndexType.SI.getIndexProviderName))
-        if (hasCgFgIndexes) {
+        // check if any CG or FG index exists. If not exists,
+        // then set indexExists as false to return empty index list for next query.
+        val hasCgFgIndexes = indexMetadata.getIndexesMap.size() != 0 &&

Review comment: I understand that `indexMetadata` will have SI indexes as well. But what I meant was that `indexMetadata.getIndexesMap.size() != 0` always evaluates to true at line 189.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4022: [CARBONDATA-4056] Added global sort for data files merge operation in SI segments.
CarbonDataQA2 commented on pull request #4022: URL: https://github.com/apache/carbondata/pull/4022#issuecomment-734683112 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3189/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4022: [CARBONDATA-4056] Added global sort for data files merge operation in SI segments.
CarbonDataQA2 commented on pull request #4022: URL: https://github.com/apache/carbondata/pull/4022#issuecomment-734682627 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4944/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
VenuReddy2103 commented on a change in pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#discussion_r531416935

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala

## @@ -184,10 +184,12 @@ private[sql] case class DropIndexCommand(
       parentCarbonTable = getRefreshedParentTable(sparkSession, dbName)
      val indexMetadata = parentCarbonTable.getIndexMetadata
      if (null != indexMetadata && null != indexMetadata.getIndexesMap) {
-        val hasCgFgIndexes =
-          !(indexMetadata.getIndexesMap.size() == 1 &&
-            indexMetadata.getIndexesMap.containsKey(IndexType.SI.getIndexProviderName))
-        if (hasCgFgIndexes) {
+        // check if any CG or FG index exists. If not exists,
+        // then set indexExists as false to return empty index list for next query.
+        val hasCgFgIndexes = indexMetadata.getIndexesMap.size() != 0 &&

Review comment: Got it.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-734677407 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3188/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-734675548 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4943/
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
Indhumathi27 commented on a change in pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#discussion_r531409958

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala

## @@ -184,10 +184,12 @@ private[sql] case class DropIndexCommand(
       parentCarbonTable = getRefreshedParentTable(sparkSession, dbName)
      val indexMetadata = parentCarbonTable.getIndexMetadata
      if (null != indexMetadata && null != indexMetadata.getIndexesMap) {
-        val hasCgFgIndexes =
-          !(indexMetadata.getIndexesMap.size() == 1 &&
-            indexMetadata.getIndexesMap.containsKey(IndexType.SI.getIndexProviderName))
-        if (hasCgFgIndexes) {
+        // check if any CG or FG index exists. If not exists,
+        // then set indexExists as false to return empty index list for next query.
+        val hasCgFgIndexes = indexMetadata.getIndexesMap.size() != 0 &&

Review comment: `indexMetadata` will also have SI indexes. The `indexExists` property is only for CG or FG indexes.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
VenuReddy2103 commented on a change in pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#discussion_r531409321

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala

## @@ -184,10 +184,12 @@ private[sql] case class DropIndexCommand(
       parentCarbonTable = getRefreshedParentTable(sparkSession, dbName)
      val indexMetadata = parentCarbonTable.getIndexMetadata
      if (null != indexMetadata && null != indexMetadata.getIndexesMap) {
-        val hasCgFgIndexes =
-          !(indexMetadata.getIndexesMap.size() == 1 &&
-            indexMetadata.getIndexesMap.containsKey(IndexType.SI.getIndexProviderName))
-        if (hasCgFgIndexes) {
+        // check if any CG or FG index exists. If not exists,

Review comment: For example, create 2 bloom indexes, then drop both of them. On the last index drop, `indexMetadata` will be null at this point (line 186). We do not seem to set the `indexExists` property to `false` in that case.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
akashrn5 commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r531365242

## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java

## @@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.SegmentFileStore;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.statusmanager.LoadMetadataDetails;
+import org.apache.carbondata.core.statusmanager.SegmentStatus;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.Logger;
+
+/**
+ * Mantains the clean files command in carbondata. This class has methods for clean files
+ * operation.
+ */
+public class CleanFilesUtil {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CleanFilesUtil.class.getName());
+
+  /**
+   * This method will clean all the stale segments for a table, delete the source folder after
+   * copying the data to the trash and also remove the .segment files of the stale segments
+   */
+  public static void cleanStaleSegments(CarbonTable carbonTable)
+      throws IOException {
+    long timeStampForTrashFolder = System.currentTimeMillis();
+    List staleSegments = getStaleSegments(carbonTable);
+    if (staleSegments.size() > 0) {
+      for (String staleSegment : staleSegments) {
+        String segmentNumber = staleSegment.split(CarbonCommonConstants.UNDERSCORE)[0];
+        SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(),
+            staleSegment);
+        Map locationMap = fileStore.getSegmentFile()
+            .getLocationMap();
+        if (locationMap != null) {
+          CarbonFile segmentLocation = FileFactory.getCarbonFile(carbonTable.getTablePath()
+              + CarbonCommonConstants.FILE_SEPARATOR + fileStore.getSegmentFile().getLocationMap()
+              .entrySet().iterator().next().getKey());
+          // copy the complete segment to the trash folder
+          TrashUtil.copySegmentToTrash(segmentLocation, CarbonTablePath.getTrashFolderPath(
+              carbonTable.getTablePath()) + CarbonCommonConstants.FILE_SEPARATOR
+              + timeStampForTrashFolder + CarbonCommonConstants.FILE_SEPARATOR + CarbonTablePath
+              .SEGMENT_PREFIX + segmentNumber);
+          // Deleting the stale Segment folders.
+          try {
+            CarbonUtil.deleteFoldersAndFiles(segmentLocation);
+          } catch (IOException | InterruptedException e) {
+            LOGGER.error("Unable to delete the segment: " + segmentNumber + " from after moving"
+                + " it to the trash folder : " + e.getMessage(), e);
+          }
+          // delete the segment file as well
+          FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(),
+              staleSegment));
+        }
+      }
+      staleSegments.clear();
+    }
+  }
+
+  /**
+   * This method will clean all the stale segments for partition table, delete the source folders
+   * after copying the data to the trash and also remove the .segment files of the stale segments
+   */
+  public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable)
+      throws IOException {
+    long timeStampForTrashFolder = System.currentTimeMillis();
+    List staleSegments = getStaleSegments(carbonTable);
+    if (staleSegments.size() > 0) {
+      for (String staleSegment : staleSegments) {
+        String segmentNumber = staleSegment.split(CarbonCommonConstants.UNDERSCORE)[0];
+        // for each segment we get the indexfile first, then we get the carbondata file. Move both
+        // of those to
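The copy-to-trash-then-delete flow in the quoted code can be sketched with plain `java.nio.file` calls. Everything here (the `.trash` folder layout, the timestamped subdirectory, all names) is an assumption for illustration, not CarbonData's actual `TrashUtil`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: move a stale segment folder into a timestamped trash
// directory under the table path, mirroring the "copy to trash, then delete
// the source" flow described above. Paths and names are hypothetical.
public class TrashMoveSketch {

  static Path moveSegmentToTrash(Path tablePath, String segmentDirName, long ts)
      throws IOException {
    // e.g. <tablePath>/.trash/<timestamp>/<segmentDirName>
    Path trashDir = tablePath.resolve(".trash").resolve(Long.toString(ts));
    Files.createDirectories(trashDir);
    Path target = trashDir.resolve(segmentDirName);
    // a move is equivalent to the copy + source-delete pair in the quoted code
    Files.move(tablePath.resolve(segmentDirName), target);
    return target;
  }

  public static void main(String[] args) throws IOException {
    Path table = Files.createTempDirectory("table");
    Files.createDirectories(table.resolve("Segment_0"));
    Path moved = moveSegmentToTrash(table, "Segment_0", System.currentTimeMillis());
    System.out.println(Files.isDirectory(moved));                 // true
    System.out.println(Files.exists(table.resolve("Segment_0"))); // false
  }
}
```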
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
VenuReddy2103 commented on a change in pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#discussion_r531399137

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala

## @@ -184,10 +184,12 @@ private[sql] case class DropIndexCommand(
       parentCarbonTable = getRefreshedParentTable(sparkSession, dbName)
      val indexMetadata = parentCarbonTable.getIndexMetadata
      if (null != indexMetadata && null != indexMetadata.getIndexesMap) {
-        val hasCgFgIndexes =
-          !(indexMetadata.getIndexesMap.size() == 1 &&
-            indexMetadata.getIndexesMap.containsKey(IndexType.SI.getIndexProviderName))
-        if (hasCgFgIndexes) {
+        // check if any CG or FG index exists. If not exists,
+        // then set indexExists as false to return empty index list for next query.
+        val hasCgFgIndexes = indexMetadata.getIndexesMap.size() != 0 &&

Review comment: `indexMetadata.getIndexesMap.size() != 0` would always be true at this point; `indexMetadata` will be null if empty. It is a redundant check.
[GitHub] [carbondata] Karan980 commented on pull request #4022: [CARBONDATA-4056] Added global sort for data files merge operation in SI segments.
Karan980 commented on pull request #4022: URL: https://github.com/apache/carbondata/pull/4022#issuecomment-734650646 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734650115 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3186/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734647802 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4941/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4029: refact carbon util
CarbonDataQA2 commented on pull request #4029: URL: https://github.com/apache/carbondata/pull/4029#issuecomment-734647130 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3185/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4022: [CARBONDATA-4056] Added global sort for data files merge operation in SI segments.
CarbonDataQA2 commented on pull request #4022: URL: https://github.com/apache/carbondata/pull/4022#issuecomment-734646663 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4940/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4022: [CARBONDATA-4056] Added global sort for data files merge operation in SI segments.
CarbonDataQA2 commented on pull request #4022: URL: https://github.com/apache/carbondata/pull/4022#issuecomment-734646180 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3184/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4029: refact carbon util
CarbonDataQA2 commented on pull request #4029: URL: https://github.com/apache/carbondata/pull/4029#issuecomment-734645757 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4939/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
akashrn5 commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r531089568

## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java

## @@ -2086,6 +2087,34 @@ public int getMaxSIRepairLimit(String dbName, String tableName) {
     return Math.abs(Integer.parseInt(thresholdValue));
   }

+  /**
+   * The below method returns the time(in milliseconds) for which timestamp folder retention in
+   * trash folder will take place.
+   */
+  public long getTrashFolderRetentionTime() {
+    String propertyValue = getProperty(CarbonCommonConstants.CARBON_TRASH_RETENTION_DAYS);

Review comment: instead of this, just call `getProperty` with the default value also; then all these null checks are not needed.

## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java

## @@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.SegmentFileStore;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.statusmanager.LoadMetadataDetails;
+import org.apache.carbondata.core.statusmanager.SegmentStatus;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.Logger;
+
+/**
+ * Mantains the clean files command in carbondata. This class has methods for clean files
+ * operation.
+ */
+public class CleanFilesUtil {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CleanFilesUtil.class.getName());
+
+  /**
+   * This method will clean all the stale segments for a table, delete the source folder after
+   * copying the data to the trash and also remove the .segment files of the stale segments
+   */
+  public static void cleanStaleSegments(CarbonTable carbonTable)
+      throws IOException {
+    long timeStampForTrashFolder = System.currentTimeMillis();
+    List staleSegments = getStaleSegments(carbonTable);
+    if (staleSegments.size() > 0) {
+      for (String staleSegment : staleSegments) {
+        String segmentNumber = staleSegment.split(CarbonCommonConstants.UNDERSCORE)[0];
+        SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(),
+            staleSegment);
+        Map locationMap = fileStore.getSegmentFile()
+            .getLocationMap();
+        if (locationMap != null) {
+          CarbonFile segmentLocation = FileFactory.getCarbonFile(carbonTable.getTablePath()
+              + CarbonCommonConstants.FILE_SEPARATOR + fileStore.getSegmentFile().getLocationMap()
+              .entrySet().iterator().next().getKey());
+          // copy the complete segment to the trash folder
+          TrashUtil.copySegmentToTrash(segmentLocation, CarbonTablePath.getTrashFolderPath(
+              carbonTable.getTablePath()) + CarbonCommonConstants.FILE_SEPARATOR
+              + timeStampForTrashFolder + CarbonCommonConstants.FILE_SEPARATOR + CarbonTablePath
+              .SEGMENT_PREFIX + segmentNumber);
+          // Deleting the stale Segment folders.
+          try {
+            CarbonUtil.deleteFoldersAndFiles(segmentLocation);
+          } catch (IOException | InterruptedException e) {
+            LOGGER.error("Unable to delete the segment: " + segmentNumber + " from after moving"
+                + " it to the trash folder : " + e.getMessage(), e);
+          }
+          // delete the segment file as well
+          FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(),
+              staleSegment));
+        }
+      }
+      staleSegments.clear();
+    }
+  }
+
+  /**
+   * This method will clean all the stale segments for pa
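The reviewer's suggestion, calling `getProperty` with a default so the caller needs no null checks, might look like the sketch below. The property key mirrors the one in the diff, but the default value, the `Properties`-backed lookup, and the surrounding class are all assumptions for illustration, not CarbonProperties' actual implementation:

```java
import java.util.Properties;

// Hypothetical sketch of a property lookup with a built-in default, so no
// null check is needed after the read. Not CarbonData's CarbonProperties API.
public class TrashRetentionSketch {

  static final String CARBON_TRASH_RETENTION_DAYS = "carbon.trash.retention.days";
  static final String DEFAULT_RETENTION_DAYS = "7"; // assumed default for the sketch

  static final Properties props = new Properties();

  // days -> milliseconds, falling back to the default when the key is unset
  static long getTrashFolderRetentionTime() {
    String days = props.getProperty(CARBON_TRASH_RETENTION_DAYS, DEFAULT_RETENTION_DAYS);
    return Long.parseLong(days) * 24L * 60L * 60L * 1000L;
  }

  public static void main(String[] args) {
    System.out.println(getTrashFolderRetentionTime()); // 604800000 with the assumed default
    props.setProperty(CARBON_TRASH_RETENTION_DAYS, "1");
    System.out.println(getTrashFolderRetentionTime()); // 86400000
  }
}
```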
[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
Zhangshunyu commented on a change in pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#discussion_r531372539

## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java

## @@ -736,11 +738,22 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_MAJOR_COMPACTION_SIZE = "carbon.major.compaction.size";

+  /**
+   * Size of Minor Compaction in MBs
+   */
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_MINOR_COMPACTION_SIZE = "carbon.minor.compaction.size";
+
   /**
    * By default size of major compaction in MBs.
    */
   public static final String DEFAULT_CARBON_MAJOR_COMPACTION_SIZE = "1024";

+  /**
+   * By default size of minor compaction in MBs.
+   */
+  public static final String DEFAULT_CARBON_MINOR_COMPACTION_SIZE = "1048576";

Review comment: @ajantha-bhat Yes, 1TB is not proper here. Will remove it.
[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
Zhangshunyu commented on a change in pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#discussion_r531372454

## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java

## @@ -736,11 +738,22 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_MAJOR_COMPACTION_SIZE = "carbon.major.compaction.size";

+  /**
+   * Size of Minor Compaction in MBs
+   */
+  @CarbonProperty(dynamicConfigurable = true)

Review comment: @ajantha-bhat Some users need a system-level value to control all tables when they use the same value, like the major compaction system property. I think by default we don't use it; the user can specify it for each table, and if they want to set it for all tables, they can use the system-level parameter.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#discussion_r531371146

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/MajorCompactionIgnoreInMinorTest.scala

## @@ -186,6 +187,78 @@ class MajorCompactionIgnoreInMinorTest extends QueryTest with BeforeAndAfterAll
   }

+  def generateData(numOrders: Int = 10): DataFrame = {
+    import sqlContext.implicits._
+    sqlContext.sparkContext.parallelize(1 to numOrders, 4)
+      .map { x => ("country" + x, x, "07/23/2015", "name" + x, "phonetype" + x,
+        "serialname" + x, x + 1)
+      }.toDF("country", "ID", "date", "name", "phonetype", "serialname", "salary")
+  }
+
+  test("test skip segment whose data size exceed threshold in minor compaction") {

Review comment: @Zhangshunyu: Agree, I got confused by the testcase; we need more than 1 MB of data for testing it.
[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
Zhangshunyu commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r531370782 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/MajorCompactionIgnoreInMinorTest.scala ## @@ -186,6 +187,78 @@ class MajorCompactionIgnoreInMinorTest extends QueryTest with BeforeAndAfterAll } + def generateData(numOrders: Int = 10): DataFrame = { +import sqlContext.implicits._ +sqlContext.sparkContext.parallelize(1 to numOrders, 4) + .map { x => ("country" + x, x, "07/23/2015", "name" + x, "phonetype" + x, +"serialname" + x, x + 1) + }.toDF("country", "ID", "date", "name", "phonetype", "serialname", "salary") + } + + test("test skip segment whose data size exceed threshold in minor compaction") { Review comment: @ajantha-bhat Setting it to 1 MB means a segment whose size exceeds 1 MB should be ignored in the compaction flow. How can 4 lines of data reach 1 MB? If we use 4 lines of data, both segments will be compacted. Here we use a large amount of data to exceed 1 MB so we can test that the huge segment is ignored.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [WIP] blockid code clean
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-734632195 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4942/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [WIP] blockid code clean
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-734632043 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3187/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [WIP] blockid code clean
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-734630191 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3183/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [WIP] blockid code clean
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-734629906 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4938/
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r531367494 ## File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java ## @@ -311,10 +311,8 @@ public static String getLoadNumberFromLoadName(String loadName) { listOfSegmentsToBeMerged = identifySegmentsToBeMergedBasedOnSize(compactionSize, listOfSegmentsLoadedInSameDateInterval, carbonLoadModel); } else { - - listOfSegmentsToBeMerged = - identifySegmentsToBeMergedBasedOnSegCount(listOfSegmentsLoadedInSameDateInterval, - tableLevelProperties); + listOfSegmentsToBeMerged = identifySegmentsToBeMergedBasedOnSegCount(compactionSize, Review comment: Maybe you can calculate the compaction size inside this method, as the table property is already there; no need to modify the method signature.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r531367024 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/MajorCompactionIgnoreInMinorTest.scala ## @@ -186,6 +187,78 @@ class MajorCompactionIgnoreInMinorTest extends QueryTest with BeforeAndAfterAll } + def generateData(numOrders: Int = 10): DataFrame = { +import sqlContext.implicits._ +sqlContext.sparkContext.parallelize(1 to numOrders, 4) + .map { x => ("country" + x, x, "07/23/2015", "name" + x, "phonetype" + x, +"serialname" + x, x + 1) + }.toDF("country", "ID", "date", "name", "phonetype", "serialname", "salary") + } + + test("test skip segment whose data size exceed threshold in minor compaction") { Review comment: Also test both the partition and non-partition flows.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r531366602 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/MajorCompactionIgnoreInMinorTest.scala ## @@ -186,6 +187,78 @@ class MajorCompactionIgnoreInMinorTest extends QueryTest with BeforeAndAfterAll } + def generateData(numOrders: Int = 10): DataFrame = { +import sqlContext.implicits._ +sqlContext.sparkContext.parallelize(1 to numOrders, 4) + .map { x => ("country" + x, x, "07/23/2015", "name" + x, "phonetype" + x, +"serialname" + x, x + 1) + }.toDF("country", "ID", "date", "name", "phonetype", "serialname", "salary") + } + + test("test skip segment whose data size exceed threshold in minor compaction") { Review comment: We can simplify the test case (no need for huge rows). Just insert 1 row 4 times and do minor compaction; it should work. Then set the table property to 1 MB, insert 1 row 4 times, and do minor compaction; it shouldn't compact.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r531365650 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -736,11 +738,22 @@ private CarbonCommonConstants() { @CarbonProperty(dynamicConfigurable = true) public static final String CARBON_MAJOR_COMPACTION_SIZE = "carbon.major.compaction.size"; + /** + * Size of Minor Compaction in MBs + */ + @CarbonProperty(dynamicConfigurable = true) + public static final String CARBON_MINOR_COMPACTION_SIZE = "carbon.minor.compaction.size"; + /** * By default size of major compaction in MBs. */ public static final String DEFAULT_CARBON_MAJOR_COMPACTION_SIZE = "1024"; + /** + * By default size of minor compaction in MBs. + */ + public static final String DEFAULT_CARBON_MINOR_COMPACTION_SIZE = "1048576"; Review comment: Don't keep a default value; I have seen many users with segments larger than 1 TB, so for them auto compaction would not work by default. I suggest that if the table property is configured, consider segment size for minor compaction; otherwise keep the base behavior of considering all segments based on count. Altering this table property could also be supported.
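The suggestion above (apply the size threshold only when the table property is configured, otherwise fall back to the count-based selection) can be sketched roughly as follows. This is a minimal illustration, not CarbonData's actual API: the class, method, and property names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reviewer's proposal: size-based filtering for
// minor compaction kicks in only when the table property is set; with no
// property, all candidate segments remain eligible (count-based selection
// then applies elsewhere).
class MinorCompactionSelector {
  static final String MINOR_COMPACTION_SIZE = "minor_compaction_size"; // illustrative key

  // segmentSizesMb maps segment id -> segment size in MB
  static List<String> selectSegments(Map<String, Long> segmentSizesMb,
                                     Map<String, String> tableProperties) {
    String configured = tableProperties.get(MINOR_COMPACTION_SIZE);
    List<String> selected = new ArrayList<>();
    if (configured == null) {
      // base behavior: no size threshold, consider every segment
      selected.addAll(segmentSizesMb.keySet());
      return selected;
    }
    long thresholdMb = Long.parseLong(configured);
    for (Map.Entry<String, Long> e : segmentSizesMb.entrySet()) {
      // skip segments whose data size exceeds the configured threshold
      if (e.getValue() <= thresholdMb) {
        selected.add(e.getKey());
      }
    }
    return selected;
  }
}
```

With a 1024 MB threshold configured, a 2000 MB segment is excluded from minor compaction; with no property set, it stays eligible.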
[GitHub] [carbondata] marchpure commented on pull request #4028: [WIP] Fix hivetest random failure
marchpure commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734627533 retest this please
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r531365161 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -736,11 +738,22 @@ private CarbonCommonConstants() { @CarbonProperty(dynamicConfigurable = true) public static final String CARBON_MAJOR_COMPACTION_SIZE = "carbon.major.compaction.size"; + /** + * Size of Minor Compaction in MBs + */ + @CarbonProperty(dynamicConfigurable = true) Review comment: Just the table property is enough; why is a separate carbon property needed as well?
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4022: [CARBONDATA-4056] Added global sort for data files merge operation in SI segments.
ajantha-bhat commented on a change in pull request #4022: URL: https://github.com/apache/carbondata/pull/4022#discussion_r531364008 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ## @@ -617,4 +637,157 @@ object SecondaryIndexUtil { identifiedSegments } + /** + * This method deletes the old carbondata files. + */ + private def deleteOldCarbonDataFiles(factTimeStamp: Long, + validSegments: util.List[Segment], + indexCarbonTable: CarbonTable): Unit = { +validSegments.asScala.foreach { segment => + val segmentPath = CarbonTablePath.getSegmentPath(indexCarbonTable.getTablePath, +segment.getSegmentNo) + val dataFiles = FileFactory.getCarbonFile(segmentPath).listFiles(new CarbonFileFilter { +override def accept(file: CarbonFile): Boolean = { + file.getName.endsWith(CarbonTablePath.CARBON_DATA_EXT) +}}) + dataFiles.foreach(dataFile => + if (DataFileUtil.getTimeStampFromFileName(dataFile.getAbsolutePath).toLong < factTimeStamp) { +dataFile.delete() + }) +} + } + + def mergeSISegmentDataFiles(sparkSession: SparkSession, + carbonLoadModel: CarbonLoadModel, + carbonMergerMapping: CarbonMergerMapping): Array[((String, Boolean), String)] = { +val validSegments = carbonMergerMapping.validSegments.toList +val indexCarbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +val absoluteTableIdentifier = indexCarbonTable.getAbsoluteTableIdentifier +val jobConf: JobConf = new JobConf(FileFactory.getConfiguration) +SparkHadoopUtil.get.addCredentials(jobConf) +val job: Job = new Job(jobConf) +val format = CarbonInputFormatUtil.createCarbonInputFormat(absoluteTableIdentifier, job) +CarbonInputFormat.setTableInfo(job.getConfiguration, indexCarbonTable.getTableInfo) +val proj = indexCarbonTable.getCreateOrderColumn + .asScala + .map(_.getColName) + .filterNot(_.equalsIgnoreCase(CarbonCommonConstants.POSITION_REFERENCE)).toSet +var mergeStatus = ArrayBuffer[((String, Boolean), String)]() +val mergeSize = getTableBlockSizeInMb(indexCarbonTable)(sparkSession) * 1024 * 1024 +val header = indexCarbonTable.getCreateOrderColumn.asScala.map(_.getColName).toArray +val outputModel = getLoadModelForGlobalSort(sparkSession, indexCarbonTable) +CarbonIndexUtil.initializeSILoadModel(outputModel, header) +outputModel.setFactTimeStamp(carbonLoadModel.getFactTimeStamp) +val segmentMetaDataAccumulator = sparkSession.sqlContext + .sparkContext + .collectionAccumulator[Map[String, SegmentMetaDataInfo]] +validSegments.foreach { segment => Review comment: This can be a Spark job over multiple segments; handling them sequentially is bad.
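The `deleteOldCarbonDataFiles` helper quoted above hinges on extracting a timestamp from each data file's name and comparing it with `factTimeStamp`. A minimal stand-alone sketch of that comparison follows; the file-name format used here is illustrative, and in CarbonData the real parsing lives in `DataFileUtil.getTimeStampFromFileName`.

```java
// Hedged sketch: decide whether a data file is stale (older than the
// current fact timestamp) based on a timestamp suffix in its name.
// Assumes a hypothetical name layout "<prefix>-<timestamp>.carbondata".
class OldFileFilter {
  // "part-0-0_batchno0-0-0-1606400000000.carbondata" -> 1606400000000
  static long timestampFromName(String fileName) {
    String base = fileName.substring(0, fileName.lastIndexOf('.'));
    return Long.parseLong(base.substring(base.lastIndexOf('-') + 1));
  }

  // stale files (written before factTimeStamp) are the ones deleted
  static boolean isStale(String fileName, long factTimeStamp) {
    return timestampFromName(fileName) < factTimeStamp;
  }
}
```

Files produced by the merge itself carry the new `factTimeStamp`, so only the pre-merge files match the `< factTimeStamp` check and get deleted.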
[GitHub] [carbondata] Zhangshunyu opened a new pull request #4029: refact carbon util
Zhangshunyu opened a new pull request #4029: URL: https://github.com/apache/carbondata/pull/4029 ### Why is this PR needed? Currently, we have several Carbon{$FUNCTION_NAME}Util classes as well as CarbonUtil/CarbonUtils, and CarbonUtil contains a mix of unrelated functions; we should clean up the code. ### What changes were proposed in this PR? Refactor the code to clean it up. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No
[GitHub] [carbondata] ajantha-bhat commented on pull request #4022: [CARBONDATA-4056] Added global sort for data files merge operation in SI segments.
ajantha-bhat commented on pull request #4022: URL: https://github.com/apache/carbondata/pull/4022#issuecomment-734625127 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734619484 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3182/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734612095 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4937/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734528399 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3181/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734526668 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4935/
[GitHub] [carbondata] marchpure commented on pull request #4028: [WIP] Fix hivetest random failure
marchpure commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734525504 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [WIP] blockid code clean
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-734523383 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3180/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [WIP] blockid code clean
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-734523043 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4934/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734522833 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3179/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734521243 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4933/
[GitHub] [carbondata] marchpure commented on pull request #4028: [WIP] Fix hivetest random failure
marchpure commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734504948 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734446125 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3178/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-73453 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4932/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [WIP] blockid code clean
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-734440008 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3177/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [WIP] blockid code clean
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-734439716 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4931/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734431498
[GitHub] [carbondata] marchpure commented on pull request #4028: [WIP] Fix hivetest random failure
marchpure commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734412704 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734410043 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3174/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734409813 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4928/
[GitHub] [carbondata] marchpure commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
marchpure commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734383891 retest this please
[GitHub] [carbondata] marchpure commented on pull request #4028: [WIP] Fix hivetest random failure
marchpure commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734359374 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4027: [WIP]added compression and range column based FT for SI
CarbonDataQA2 commented on pull request #4027: URL: https://github.com/apache/carbondata/pull/4027#issuecomment-734348605 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4926/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4027: [WIP]added compression and range column based FT for SI
CarbonDataQA2 commented on pull request #4027: URL: https://github.com/apache/carbondata/pull/4027#issuecomment-734345215 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3172/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-734341612 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3171/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734339926 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4927/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734339351 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3173/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-734337379 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4925/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734328797 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3170/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4028: [WIP] Fix hivetest random failure
CarbonDataQA2 commented on pull request #4028: URL: https://github.com/apache/carbondata/pull/4028#issuecomment-734323366 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4924/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-734312587 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3168/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-734308995 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4923/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [WIP] blockid code clean
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-734304033 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3165/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [WIP] blockid code clean
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-734301494 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4920/
[GitHub] [carbondata] marchpure commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
marchpure commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-734285029 retest this please
[GitHub] [carbondata] marchpure opened a new pull request #4028: [WIP] Fix hivetest random failure
marchpure opened a new pull request #4028: URL: https://github.com/apache/carbondata/pull/4028 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes
[GitHub] [carbondata] nihal0107 opened a new pull request #4027: [WIP]added compression testcase for SI
nihal0107 opened a new pull request #4027: URL: https://github.com/apache/carbondata/pull/4027 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes
[GitHub] [carbondata] marchpure opened a new pull request #4026: [WIP] blockid code clean
marchpure opened a new pull request #4026: URL: https://github.com/apache/carbondata/pull/4026 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r530973967 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.spark.testsuite.cleanfiles + +import java.io.{File, PrintWriter} + +import scala.io.Source + +import org.apache.spark.sql.{CarbonEnv, Row} +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.core.util.path.CarbonTablePath + +class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll { + + var count = 0 + + test("clean up table and test trash folder with IN PROGRESS segments") { +// do not send the segment folders to trash +createTable() +loadData() +val path = CarbonEnv.getCarbonTable(Some("default"), "cleantest")(sqlContext.sparkSession) + .getTablePath +val trashFolderPath = path + CarbonCommonConstants.FILE_SEPARATOR + CarbonTablePath.TRASH_DIR +editTableStatusFile(path) +assert(!FileFactory.isFileExist(trashFolderPath)) + +val segmentNumber1 = sql(s"""show segments for table cleantest""").count() +assert(segmentNumber1 == 4) +sql(s"CLEAN FILES FOR TABLE cleantest").show +val segmentNumber2 = sql(s"""show segments for table cleantest""").count() +assert(0 == segmentNumber2) +assert(!FileFactory.isFileExist(trashFolderPath)) +count = 0 Review comment: yes, removed.
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r530973423 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFilesCommandPartitionTable.scala ## @@ -0,0 +1,412 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.spark.testsuite.cleanfiles + +import java.io.{File, PrintWriter} + +import scala.io.Source + +import org.apache.spark.sql.{CarbonEnv, Row} +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.core.util.path.CarbonTablePath + +class TestCleanFilesCommandPartitionTable extends QueryTest with BeforeAndAfterAll { + + var count = 0 + + test("clean up table and test trash folder with IN PROGRESS segments") { +// do not send the segment folders to trash +createParitionTable() +loadData() +val path = CarbonEnv.getCarbonTable(Some("default"), "cleantest")(sqlContext.sparkSession) + .getTablePath +val trashFolderPath = path + CarbonCommonConstants.FILE_SEPARATOR + CarbonTablePath.TRASH_DIR +editTableStatusFile(path) +assert(!FileFactory.isFileExist(trashFolderPath)) +val segmentNumber1 = sql(s"""show segments for table cleantest""").count() +assert(segmentNumber1 == 4) +sql(s"CLEAN FILES FOR TABLE cleantest").show +val segmentNumber2 = sql(s"""show segments for table cleantest""").count() +assert(0 == segmentNumber2) +assert(!FileFactory.isFileExist(trashFolderPath)) +count = 0 +var list = getFileCountInTrashFolder(trashFolderPath) +// no carbondata file is added to the trash +assert(list == 0) +sql("""DROP TABLE IF EXISTS CLEANTEST""") + } + + test("clean up table and test trash folder with Marked For Delete segments") { +// do not send MFD folders to trash +createParitionTable() +loadData() +val path = CarbonEnv.getCarbonTable(Some("default"), "cleantest")(sqlContext.sparkSession) + .getTablePath +val trashFolderPath = path + CarbonCommonConstants.FILE_SEPARATOR + CarbonTablePath.TRASH_DIR +assert(!FileFactory.isFileExist(trashFolderPath)) +sql(s"""Delete from table cleantest where 
segment.id in(1)""") +val segmentNumber1 = sql(s"""show segments for table cleantest""").count() +sql(s"CLEAN FILES FOR TABLE cleantest").show +val segmentNumber2 = sql(s"""show segments for table cleantest""").count() +assert(segmentNumber1 == segmentNumber2 + 1) +assert(!FileFactory.isFileExist(trashFolderPath)) +count = 0 +var list = getFileCountInTrashFolder(trashFolderPath) +// no carbondata file is added to the trash +assert(list == 0) +sql("""DROP TABLE IF EXISTS CLEANTEST""") + } + + test("clean up table and test trash folder with compaction") { +// do not send compacted folders to trash +createParitionTable() +loadData() +sql(s"""ALTER TABLE CLEANTEST COMPACT "MINOR" """) + +val path = CarbonEnv.getCarbonTable(Some("default"), "cleantest")(sqlContext.sparkSession) + .getTablePath +val trashFolderPath = path + CarbonCommonConstants.FILE_SEPARATOR + CarbonTablePath.TRASH_DIR +assert(!FileFactory.isFileExist(trashFolderPath)) + +val segmentNumber1 = sql(s"""show segments for table cleantest""").count() +sql(s"CLEAN FILES FOR TABLE cleantest").show +val segmentNumber2 = sql(s"""show segments for table cleantest""").count() +assert(segmentNumber1 == segmentNumber2 + 4) +assert(!FileFactory.isFileExist(trashFolderPath)) +count = 0 +val list = getFileCountInTrashFolder(trashFolderPath) +// no carbondata file is added to the trash +assert(list == 0) +sql("""DROP TABLE IF EXISTS CLEANTEST""") + } + + + + test("test trash folder with 2 segments with same segment number") { +createParitionTable() +sql(s"""INSERT INTO CLEANTEST SELECT 1, 2,"hello","abc) + +val path = CarbonEnv.getCarbonTable(Some(
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734250541 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3164/
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r530970382 ## File path: docs/clean-files.md ## @@ -0,0 +1,56 @@ + + + +## CLEAN FILES + +Clean files command is used to remove the Compacted, Marked For Delete ,In Progress which are stale and partial(Segments which are missing from the table status file but their data is present) + segments from the store. + + Clean Files Command + ``` + CLEAN FILES FOR TABLE TABLE_NAME + ``` + + +### TRASH FOLDER + + Carbondata supports a Trash Folder which is used as a redundant folder where all stale(segments whose entry is not in tablestatus file) carbondata segments are moved to during clean files operation. + This trash folder is mantained inside the table path and is a hidden folder(.Trash). The segments that are moved to the trash folder are mantained under a timestamp + subfolder(each clean files operation is represented by a timestamp). This helps the user to list down segments in the trash folder by timestamp. By default all the timestamp sub-directory have an expiration + time of 7 days(since the timestamp it was created) and it can be configured by the user using the following carbon property. The supported values are between 0 and 365(both included.) + ``` + carbon.trash.retention.days = "Number of days" + ``` + Once the timestamp subdirectory is expired as per the configured expiration day value, that subdirectory is deleted from the trash folder in the subsequent clean files command. + +### FORCE DELETE TRASH +The force option with clean files command deletes all the files and folders from the trash folder. + + ``` + CLEAN FILES FOR TABLE TABLE_NAME options('force'='true') + ``` + +### DATA RECOVERY FROM THE TRASH FOLDER + +The segments can be recovered from the trash folder by creating table from the desired segment location Review comment: done
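The retention behaviour described in the quoted docs/clean-files.md (default of 7 days, supported values between 0 and 365 inclusive, property carbon.trash.retention.days) can be sketched as follows. This is a minimal illustration, not CarbonData's actual property-reading code; the fallback-to-default handling of invalid values is an assumption.

```java
public class TrashRetentionSketch {
    // Bounds and default taken from the quoted documentation.
    static final int MIN_DAYS = 0;
    static final int MAX_DAYS = 365;
    static final int DEFAULT_DAYS = 7;

    // Resolve the configured retention period; assumption: out-of-range or
    // unparsable values fall back to the 7-day default.
    static int resolveRetentionDays(String configured) {
        try {
            int days = Integer.parseInt(configured);
            return (days < MIN_DAYS || days > MAX_DAYS) ? DEFAULT_DAYS : days;
        } catch (NumberFormatException e) {
            return DEFAULT_DAYS;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveRetentionDays("30"));   // in range, kept: 30
        System.out.println(resolveRetentionDays("400"));  // above 365, default: 7
        System.out.println(resolveRetentionDays("abc"));  // unparsable, default: 7
    }
}
```

A timestamp subfolder older than the resolved number of days would then be deleted by the next clean files run, as the documentation describes.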
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r530969904 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.util; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.metadata.SegmentFileStore; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.statusmanager.LoadMetadataDetails; +import org.apache.carbondata.core.statusmanager.SegmentStatus; +import org.apache.carbondata.core.statusmanager.SegmentStatusManager; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +import org.apache.hadoop.fs.Path; +import org.apache.log4j.Logger; + +/** + * Mantains the clean files command in carbondata. This class has methods for clean files + * operation. 
+ */ +public class CleanFilesUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CleanFilesUtil.class.getName()); + + /** + * This method will clean all the stale segments for table given table. In this method, we first + * get the stale segments(segments whose entry is not in the table status, but are present in + * the metadata folder) or in case when table status is deleted. To identify the stale segments + * we compare the segment files in the metadata folder with table status file, if it exists. The + * identified stale segments are then copied to the trash folder and then their .segment files + * are also deleted from the metadata folder. We only compare with tablestatus file here, not + * with tablestatus history file. + */ + public static void cleanStaleSegments(CarbonTable carbonTable) +throws IOException { +String metaDataLocation = carbonTable.getMetadataPath(); +long timeStampForTrashFolder = System.currentTimeMillis(); +String segmentFilesLocation = +CarbonTablePath.getSegmentFilesLocation(carbonTable.getTablePath()); +CarbonFile[] segmentFilesList = FileFactory.getCarbonFile(segmentFilesLocation).listFiles(); +// there are no segments present in the Metadata folder. Can return here +if (segmentFilesList.length == 0) { + return; +} +LoadMetadataDetails[] details = SegmentStatusManager.readLoadMetadata(metaDataLocation); +List staleSegments = getStaleSegments(details, segmentFilesList); Review comment: Changed. Separated the flow for normal and partition tables. For a normal table, the segment path is taken from the .segment file location map and the complete segment is moved; for a partition table, files are moved one by one.
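The stale-segment detection described in the quoted javadoc (segments whose .segment file is present in the Metadata folder but which have no entry in tablestatus) can be sketched like this. The helper and the file-name parsing are hypothetical simplifications of the real CleanFilesUtil logic, which works on LoadMetadataDetails and CarbonFile objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StaleSegmentSketch {
    // Assumption for illustration: a .segment file is named
    // "<segmentNo>_<timestamp>.segment"; a file whose segment number has no
    // tablestatus entry is considered stale.
    static List<String> findStaleSegments(List<String> segmentFiles,
                                          Set<String> tableStatusSegmentNos) {
        List<String> stale = new ArrayList<>();
        for (String file : segmentFiles) {
            String segmentNo = file.substring(0, file.indexOf('_'));
            if (!tableStatusSegmentNos.contains(segmentNo)) {
                stale.add(file); // present on disk, missing from tablestatus
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<String> metadataFolder =
            Arrays.asList("0_1668145341233.segment", "1_1668145378890.segment");
        Set<String> tableStatus = new HashSet<>(Arrays.asList("0"));
        // Only segment 1 is stale: it has no tablestatus entry.
        System.out.println(findStaleSegments(metadataFolder, tableStatus));
    }
}
```

In the actual flow, each stale segment found this way is copied to the trash folder and its .segment file is then removed from the Metadata folder.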
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r530964261 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -1414,6 +1414,23 @@ private CarbonCommonConstants() { public static final String BITSET_PIPE_LINE_DEFAULT = "true"; + /** + * this is the user defined time(in days), timestamp subfolders in trash directory will take + * this value as retention time. They are deleted after this time. + */ + @CarbonProperty + public static final String CARBON_TRASH_RETENTION_DAYS = "carbon.trash.retention.days"; + + /** + * Default retention time of a subdirectory in trash folder is 7 days. + */ + public static final String CARBON_TRASH_RETENTION_DAYS_DEFAULT = "7"; Review comment: done
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734242575 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4919/
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r530957279 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala ## @@ -54,6 +52,12 @@ class CleanFilesPostEventListener extends OperationEventListener with Logging { val indexTables = CarbonIndexUtil .getIndexCarbonTables(carbonTable, cleanFilesPostEvent.sparkSession) indexTables.foreach { indexTable => + if (cleanFilesPostEvent.force) { +TrashUtil.emptyTrash(indexTable.getTablePath) + } else { +TrashUtil.deleteExpiredDataFromTrash(indexTable.getTablePath) + } + CleanFilesUtil.cleanStaleSegments(indexTable) Review comment: removed trash code flow from cleanfilespostevenlistener ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.spark.testsuite.cleanfiles + +import java.io.{File, PrintWriter} + +import scala.io.Source + +import org.apache.spark.sql.{CarbonEnv, Row} +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.core.util.path.CarbonTablePath + +class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll { + + var count = 0 + + test("clean up table and test trash folder with IN PROGRESS segments") { +// do not send the segment folders to trash +createTable() +loadData() +val path = CarbonEnv.getCarbonTable(Some("default"), "cleantest")(sqlContext.sparkSession) + .getTablePath +val trashFolderPath = path + CarbonCommonConstants.FILE_SEPARATOR + CarbonTablePath.TRASH_DIR Review comment: done
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r530956951 ## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.util; + +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.IOException; +import java.util.List; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +import org.apache.hadoop.io.IOUtils; +import org.apache.log4j.Logger; + +/** + * Mantains the trash folder in carbondata. This class has methods to copy data to the trash and + * remove data from the trash. + */ +public final class TrashUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(TrashUtil.class.getName()); + + /** + * Base method to copy the data to the trash folder. 
+ * + * @param fromPath the path from which to copy the file + * @param toPath the path where the file will be copied + * @return + */ + private static void copyToTrashFolder(String fromPath, String toPath) throws IOException { +DataOutputStream dataOutputStream = null; +DataInputStream dataInputStream = null; +try { + dataOutputStream = FileFactory.getDataOutputStream(toPath); + dataInputStream = FileFactory.getDataInputStream(fromPath); + IOUtils.copyBytes(dataInputStream, dataOutputStream, CarbonCommonConstants.BYTEBUFFER_SIZE); +} catch (IOException exception) { + LOGGER.error("Unable to copy " + fromPath + " to the trash folder", exception); + throw exception; +} finally { + CarbonUtil.closeStreams(dataInputStream, dataOutputStream); +} + } + + /** + * The below method copies the complete a file to the trash folder. + * + * @param filePathToCopy the files which are to be moved to the trash folder + * @param trashFolderWithTimestamptimestamp, partition folder(if any) and segment number + * @return + */ + public static void copyFileToTrashFolder(String filePathToCopy, + String trashFolderWithTimestamp) throws IOException { +CarbonFile carbonFileToCopy = FileFactory.getCarbonFile(filePathToCopy); +try { + if (carbonFileToCopy.exists()) { +if (!FileFactory.isFileExist(trashFolderWithTimestamp)) { + FileFactory.mkdirs(trashFolderWithTimestamp); +} +if (!FileFactory.isFileExist(trashFolderWithTimestamp + CarbonCommonConstants +.FILE_SEPARATOR + carbonFileToCopy.getName())) { + copyToTrashFolder(filePathToCopy, trashFolderWithTimestamp + CarbonCommonConstants + .FILE_SEPARATOR + carbonFileToCopy.getName()); +} + } +} catch (IOException e) { + LOGGER.error("Error while creating trash folder or copying data to the trash folder", e); + throw e; +} + } + + /** + * The below method copies the complete segment folder to the trash folder. Here, the data files + * in segment are listed and copied one by one to the trash folder. 
+ * + * @param segmentPath the folder which are to be moved to the trash folder + * @param trashFolderWithTimestamp trashfolderpath with complete timestamp and segment number + * @return + */ + public static void copySegmentToTrash(CarbonFile segmentPath, + String trashFolderWithTimestamp) throws IOException { +try { + List dataFiles = FileFactory.getFolderList(segmentPath.getAbsolutePath()); + for (CarbonFile carbonFile : dataFiles) { +copyFileToTrashFolder(carbonFile.getAbsolutePath(), trashFolderWithTimestamp); + } + LOGGER.info("Segment: " + segmentPath.getAbsolutePath() + " has been copied to" + + " the trash folder successfully"); +} catch (IOException e) { + LOGGER.error("Error while getting folder list for the segment", e); + throw e; +} + } + +
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r530955940

## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
## @@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.Logger;
+
+/**
+ * Maintains the trash folder in carbondata. This class has methods to copy data to the trash
+ * folder and remove data from the trash folder.
+ */
+public final class TrashUtil {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(TrashUtil.class.getName());
+
+  /**
+   * Base method to copy the data to the trash folder.
+   *
+   * @param fromPath the path from which to copy the file
+   * @param toPath   the path where the file will be copied
+   */
+  private static void copyToTrashFolder(String fromPath, String toPath) throws IOException {
+    DataOutputStream dataOutputStream = null;
+    DataInputStream dataInputStream = null;
+    try {
+      dataOutputStream = FileFactory.getDataOutputStream(toPath);
+      dataInputStream = FileFactory.getDataInputStream(fromPath);
+      IOUtils.copyBytes(dataInputStream, dataOutputStream, CarbonCommonConstants.BYTEBUFFER_SIZE);
+    } catch (IOException exception) {
+      LOGGER.error("Unable to copy " + fromPath + " to the trash folder", exception);
+      throw exception;
+    } finally {
+      CarbonUtil.closeStreams(dataInputStream, dataOutputStream);
+    }
+  }
+
+  /**
+   * The below method copies a complete file to the trash folder.
+   *
+   * @param filePathToCopy           the file which is to be moved to the trash folder
+   * @param trashFolderWithTimestamp trash folder path with timestamp, partition folder (if any)
+   *                                 and segment number
+   */
+  public static void copyFileToTrashFolder(String filePathToCopy,
+      String trashFolderWithTimestamp) throws IOException {
+    CarbonFile carbonFileToCopy = FileFactory.getCarbonFile(filePathToCopy);
+    try {
+      if (carbonFileToCopy.exists()) {
+        if (!FileFactory.isFileExist(trashFolderWithTimestamp)) {
+          FileFactory.mkdirs(trashFolderWithTimestamp);
+        }
+        if (!FileFactory.isFileExist(trashFolderWithTimestamp + CarbonCommonConstants
+            .FILE_SEPARATOR + carbonFileToCopy.getName())) {
+          copyToTrashFolder(filePathToCopy, trashFolderWithTimestamp + CarbonCommonConstants
+              .FILE_SEPARATOR + carbonFileToCopy.getName());
+        }
+      }
+    } catch (IOException e) {
+      LOGGER.error("Error while creating trash folder or copying data to the trash folder", e);
+      throw e;
+    }
+  }
+
+  /**
+   * The below method copies the complete segment folder to the trash folder. Here, the data files
+   * in the segment are listed and copied one by one to the trash folder.
+   *
+   * @param segmentPath              the folder which is to be moved to the trash folder
+   * @param trashFolderWithTimestamp trash folder path with complete timestamp and segment number
+   */
+  public static void copySegmentToTrash(CarbonFile segmentPath,

Review comment: This is now being used for the normal table clean stale segments flow.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
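The copy path in `copyFileToTrashFolder` above — create the timestamped trash directory if it does not exist, then copy the file only when no copy is already present — can be sketched with plain `java.nio.file` in place of CarbonData's `FileFactory` streams. The class name and the `<trashRoot>/<timestamp>/Segment_<n>/` layout below are illustrative assumptions, not the CarbonData API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class TrashCopySketch {

  // Copy 'file' into '<trashRoot>/<timestamp>/Segment_<segmentNo>/', creating
  // the directories as needed and skipping the copy if the target already exists.
  public static Path copyToTrash(Path file, Path trashRoot, long timestamp, String segmentNo)
      throws IOException {
    Path target = trashRoot
        .resolve(Long.toString(timestamp))
        .resolve("Segment_" + segmentNo)
        .resolve(file.getFileName());
    Files.createDirectories(target.getParent());
    if (!Files.exists(target)) {
      Files.copy(file, target, StandardCopyOption.COPY_ATTRIBUTES);
    }
    return target;
  }
}
```

The idempotence check (skip if the target exists) mirrors the `isFileExist` guard in the reviewed code, so re-running a failed clean-files operation does not re-copy files already in the trash.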
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r530955721

## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java
## @@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.SegmentFileStore;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.statusmanager.LoadMetadataDetails;
+import org.apache.carbondata.core.statusmanager.SegmentStatus;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.Logger;
+
+/**
+ * Maintains the clean files command in carbondata. This class has methods for the clean files
+ * operation.
+ */
+public class CleanFilesUtil {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CleanFilesUtil.class.getName());
+
+  /**
+   * This method will clean all the stale segments for the given table. In this method, we first
+   * get the stale segments (segments whose entry is not in the table status, but which are
+   * present in the metadata folder), which also covers the case where the table status is
+   * deleted. To identify the stale segments we compare the segment files in the metadata folder
+   * with the table status file, if it exists. The identified stale segments are then copied to
+   * the trash folder, and their .segment files are deleted from the metadata folder. We only
+   * compare with the tablestatus file here, not with the tablestatus history file.
+   */
+  public static void cleanStaleSegments(CarbonTable carbonTable)
+      throws IOException {
+    String metaDataLocation = carbonTable.getMetadataPath();
+    long timeStampForTrashFolder = System.currentTimeMillis();
+    String segmentFilesLocation =
+        CarbonTablePath.getSegmentFilesLocation(carbonTable.getTablePath());
+    CarbonFile[] segmentFilesList = FileFactory.getCarbonFile(segmentFilesLocation).listFiles();
+    // there are no segments present in the Metadata folder. Can return here
+    if (segmentFilesList.length == 0) {
+      return;
+    }
+    LoadMetadataDetails[] details = SegmentStatusManager.readLoadMetadata(metaDataLocation);
+    List<String> staleSegments = getStaleSegments(details, segmentFilesList);
+
+    if (staleSegments.size() > 0) {
+      for (String staleSegment : staleSegments) {
+        String segmentNumber = staleSegment.split(CarbonCommonConstants.UNDERSCORE)[0];
+        // for each segment we get the index file first, then we get the carbondata file.
+        // Move both of those to the trash folder
+        List<CarbonFile> filesToDelete = new ArrayList<>();
+        SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(),
+            staleSegment);
+        List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
+            FileFactory.getConfiguration());
+        for (String file : indexOrMergeFiles) {
+          // copy the index or merge file to the trash folder
+          TrashUtil.copyFileToTrashFolder(file, CarbonTablePath.getTrashFolderPath(carbonTable
+              .getTablePath()) + CarbonCommonConstants.FILE_SEPARATOR + timeStampForTrashFolder
+              + CarbonCommonConstants.FILE_SEPARATOR + CarbonTablePath.SEGMENT_PREFIX
+              + segmentNumber);
+          filesToDelete.add(FileFactory.getCarbonFile(file));
+        }
+        // get carbondata files from here
+        Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
+        for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
+          for (String file : entry.getValue()) {
+            // copy the carbondata file to trash
+            TrashUtil.copyFileToTrashFolder(file, C
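The stale-segment detection described in `cleanStaleSegments` reduces to a set difference: segment files found in the Metadata folder whose segment number (the part of the file name before the underscore) has no entry in the table status. A minimal sketch under that reading, with plain strings standing in for `LoadMetadataDetails` and `CarbonFile` (names here are hypothetical, not the CarbonData API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public final class StaleSegmentSketch {

  // Segment file names look like "<segmentNo>_<timestamp>.segment"; a segment
  // is stale when its segment number does not appear in the table status.
  public static List<String> findStaleSegments(List<String> segmentFiles,
      Set<String> segmentNosInTableStatus) {
    List<String> stale = new ArrayList<>();
    for (String file : segmentFiles) {
      String segmentNo = file.split("_")[0];
      if (!segmentNosInTableStatus.contains(segmentNo)) {
        stale.add(file);
      }
    }
    return stale;
  }
}
```

If the table status file is missing entirely, the status set is empty and every segment file in the metadata folder is reported stale, which matches the "table status is deleted" case mentioned in the javadoc.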
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-734227969 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4917/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-734190805 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3161/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] marchpure commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
marchpure commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734189785 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
ajantha-bhat commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-734184258 @shenjiayu17 : And how is the performance after changing the algorithm? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
ajantha-bhat commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-734182769 @shenjiayu17 : please update `/docs/spatial-index-guide.md` about what new UDF is supported for query and what functionality changed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiayu Shen updated CARBONDATA-4051:
-----------------------------------
Description:

The requirement is from SEQ; related algorithms are provided by the Discovery Team.

1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}

2. Add geo query UDFs

query filter UDFs :
 * _*InPolygonList (List polygonList, OperationType opType)*_
 * _*InPolylineList (List polylineList, Float bufferInMeter)*_
 * _*InPolygonRangeList (List RangeList, OperationType opType)*_

*operation only supports :*
 * *"OR", meaning the union of two polygons*
 * *"AND", meaning the intersection of two polygons*

geo util UDFs :
 * _*GeoIdToGridXy(Long geoId) : Pair*_
 * _*LatLngToGeoId(Long latitude, Long longitude) : Long*_
 * _*GeoIdToLatLng(Long geoId) : Pair*_
 * _*ToUpperLayerGeoId(Long geoId) : Long*_
 * _*ToRangeList (String polygon) : List*_

3. Currently GeoID is a column created internally for spatial tables; this PR will support the GeoID column being customized during LOAD/INSERT INTO. For example,
{code:java}
INSERT INTO geoTable SELECT 0,157542840,116285807,40084087;

It used to be as below, where '855280799612' is generated internally:
+------------+---------+---------+--------+
|mygeohash   |timevalue|longitude|latitude|
+------------+---------+---------+--------+
|855280799612|157542840|116285807|40084087|
+------------+---------+---------+--------+
but now is:
+---------+---------+---------+--------+
|mygeohash|timevalue|longitude|latitude|
+---------+---------+---------+--------+
|0        |157542840|116285807|40084087|
+---------+---------+---------+--------+{code}

> Geo spatial index algorithm improvement and UDFs enhancement
> ------------------------------------------------------------
>
>                 Key: CARBONDATA-4051
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Jiayu Shen
>            Priority: Minor
>         Attachments: CarbonData Spatial Index Design Doc v2.docx
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The requirement is from SEQ; related algorithms are provided by the Discovery Team.
> 1. Replace the geohash encoding algorithm,
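The `LatLngToGeoId`/`GeoIdToGridXy` pair described above maps grid coordinates to a single sortable ID and back. A classic way such an encoding works is Z-order bit interleaving, where bit i of x lands at position 2i and bit i of y at position 2i+1; the sketch below illustrates only that general idea and is not the actual CarbonData geohash algorithm, which also involves the origin latitude, grid size, and conversion ratio from the table properties:

```java
public final class GeoIdSketch {

  // Interleave the low 32 bits of grid coordinates x and y into one Z-order
  // value: bit i of x goes to bit 2i, bit i of y goes to bit 2i+1.
  public static long toGeoId(long x, long y) {
    long id = 0;
    for (int i = 0; i < 32; i++) {
      id |= ((x >> i) & 1L) << (2 * i);
      id |= ((y >> i) & 1L) << (2 * i + 1);
    }
    return id;
  }

  // Inverse of toGeoId: de-interleave the bits back into {x, y}.
  public static long[] toGridXy(long geoId) {
    long x = 0;
    long y = 0;
    for (int i = 0; i < 32; i++) {
      x |= ((geoId >> (2 * i)) & 1L) << i;
      y |= ((geoId >> (2 * i + 1)) & 1L) << i;
    }
    return new long[] {x, y};
  }
}
```

A useful property of any such interleaving is that nearby grid cells tend to share high-order ID bits, which is what makes a single sorted GeoID column usable as a spatial index; `ToUpperLayerGeoId` corresponds to dropping the lowest bit pair.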
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734177874 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4918/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734177288 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3162/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiayu Shen updated CARBONDATA-4051:
-----------------------------------
    Attachment: (was: Genex Cloud&Discvoery Carbon Spatial Index Specification.docx)

> Geo spatial index algorithm improvement and UDFs enhancement
> ------------------------------------------------------------
>
>                 Key: CARBONDATA-4051
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Jiayu Shen
>            Priority: Minor
>         Attachments: CarbonData Spatial Index Design Doc v2.docx
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The requirement is from SEQ; related algorithms are provided by group Discovery.
> 1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
> {code:java}
> CREATE TABLE geoTable(
>   timevalue BIGINT,
>   longitude LONG,
>   latitude LONG) COMMENT "This is a GeoTable"
> STORED AS carbondata
> TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
>   'SPATIAL_INDEX.mygeohash.type'='geohash',
>   'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
>   'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
>   'SPATIAL_INDEX.mygeohash.gridSize'='50',
>   'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
> 2. Add geo query UDFs
> query filter UDFs :
>  * _*InPolygonList (List polygonList, OperationType opType)*_
>  * _*InPolylineList (List polylineList, Float bufferInMeter)*_
>  * _*InPolygonRangeList (List RangeList, OperationType opType)*_
> *operation only supports :*
>  * *"OR", meaning the union of two polygons*
>  * *"AND", meaning the intersection of two polygons*
> geo util UDFs :
>  * _*GeoIdToGridXy(Long geoId) : Pair*_
>  * _*LatLngToGeoId(Long latitude, Long longitude) : Long*_
>  * _*GeoIdToLatLng(Long geoId) : Pair*_
>  * _*ToUpperLayerGeoId(Long geoId) : Long*_
>  * _*ToRangeList (String polygon) : List*_
> 3. Currently GeoID is a column created internally for spatial tables; this PR will support the GeoID column being customized during LOAD/INSERT INTO. For example,
> {code:java}
> INSERT INTO geoTable SELECT 0,157542840,116285807,40084087;
> It used to be as below, where '855280799612' is generated internally:
> +------------+---------+---------+--------+
> |mygeohash   |timevalue|longitude|latitude|
> +------------+---------+---------+--------+
> |855280799612|157542840|116285807|40084087|
> +------------+---------+---------+--------+
> but now is:
> +---------+---------+---------+--------+
> |mygeohash|timevalue|longitude|latitude|
> +---------+---------+---------+--------+
> |0        |157542840|116285807|40084087|
> +---------+---------+---------+--------+{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] marchpure commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
marchpure commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734175206 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CARBONDATA-4059) Block compaction on SI table.
Nihal kumar ojha created CARBONDATA-4059:
--------------------------------------------

             Summary: Block compaction on SI table.
                 Key: CARBONDATA-4059
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4059
             Project: CarbonData
          Issue Type: Bug
            Reporter: Nihal kumar ojha

Currently compaction is allowed on SI tables. Because of this, if only the SI table is compacted, a filter query on the main table ends up scanning more data from the SI table, which causes performance degradation.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
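The fix this issue asks for amounts to a guard at the compaction entry point that rejects secondary-index tables. A minimal sketch of that check — the `Table` holder, the `isIndexTable` flag, and the exception choice are hypothetical stand-ins, not the actual CarbonData API:

```java
public final class CompactionGuardSketch {

  // Minimal table descriptor for the sketch; the real CarbonTable carries
  // this information in its table properties.
  public static final class Table {
    final String name;
    final boolean isIndexTable;

    Table(String name, boolean isIndexTable) {
      this.name = name;
      this.isIndexTable = isIndexTable;
    }
  }

  // Reject compaction triggered directly on a secondary-index table; SI
  // segments should only be compacted along with their main table so the
  // two stay aligned segment-for-segment.
  public static void validateCompaction(Table table) {
    if (table.isIndexTable) {
      throw new UnsupportedOperationException(
          "Compaction is not supported on SI table: " + table.name);
    }
  }
}
```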
[GitHub] [carbondata] marchpure commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
marchpure commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-734161643 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org