[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532396937 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.metadata.SegmentFileStore; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.mutate.CarbonUpdateUtil; +import org.apache.carbondata.core.statusmanager.LoadMetadataDetails; +import org.apache.carbondata.core.statusmanager.SegmentStatus; +import org.apache.carbondata.core.statusmanager.SegmentStatusManager; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +import org.apache.hadoop.fs.Path; +import org.apache.log4j.Logger; + +/** + * Maintains the clean files command in carbondata. This class has methods for clean files + * operation. 
+ */ +public class CleanFilesUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CleanFilesUtil.class.getName()); + + /** + * This method will clean all the stale segments for a table, delete the source folder after + * copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegments(CarbonTable carbonTable) +throws IOException { +long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime(); +List staleSegments = getStaleSegments(carbonTable); +if (staleSegments.size() > 0) { + for (String staleSegment : staleSegments) { +String segmentNumber = staleSegment.split(CarbonCommonConstants.UNDERSCORE)[0]; +SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(), +staleSegment); +Map locationMap = fileStore.getSegmentFile() +.getLocationMap(); +if (locationMap != null) { + CarbonFile segmentLocation = FileFactory.getCarbonFile(carbonTable.getTablePath() + + CarbonCommonConstants.FILE_SEPARATOR + fileStore.getSegmentFile().getLocationMap() + .entrySet().iterator().next().getKey()); + // copy the complete segment to the trash folder + TrashUtil.copySegmentToTrash(segmentLocation, CarbonTablePath.getTrashFolderPath( + carbonTable.getTablePath()) + CarbonCommonConstants.FILE_SEPARATOR + + timeStampForTrashFolder + CarbonCommonConstants.FILE_SEPARATOR + CarbonTablePath + .SEGMENT_PREFIX + segmentNumber); + // Deleting the stale Segment folders. + try { +CarbonUtil.deleteFoldersAndFiles(segmentLocation); + } catch (IOException | InterruptedException e) { +LOGGER.error("Unable to delete the segment: " + segmentNumber + " after moving" + +" it to the trash folder. Please delete them manually : " + e.getMessage(), e); + } + // delete the segment file as well + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(), Review comment: added in the same try-catch This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
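For reference, the cleanStaleSegments logic quoted above builds the trash destination as trash root, then timestamp, then segment prefix plus segment number. A minimal, hypothetical Java sketch of that path assembly (the literal "Segment_" stands in for CarbonTablePath.SEGMENT_PREFIX, and the example paths are made up):

```java
public class TrashPathSketch {
    // Hypothetical helper mirroring how cleanStaleSegments assembles the
    // destination: <trash root>/<timestamp>/<segment prefix><segment number>.
    static String trashSegmentPath(String trashRoot, long timestamp, String segmentNumber) {
        return trashRoot + "/" + timestamp + "/" + "Segment_" + segmentNumber;
    }

    public static void main(String[] args) {
        // The segment number is taken from the segment file name,
        // e.g. "2_1606780800000.segment" -> "2", as in the quoted split on "_".
        String staleSegment = "2_1606780800000.segment";
        String segmentNumber = staleSegment.split("_")[0];
        System.out.println(trashSegmentPath("/warehouse/t1/.Trash", 1606780800000L, segmentNumber));
    }
}
```

Keeping the timestamp as a folder level means each clean-files run gets its own trash generation, so retries never collide with earlier runs.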
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532396566 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,180 @@ [...] + // delete the segment file as well + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(), + staleSegment)); +} + } + staleSegments.clear(); +} + } + + /** + * This method will clean all the stale segments for partition table, delete the source folders + * after copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable) +throws IOException { +long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime(); +List staleSegments = getStaleSegments(carbonTable); +if (staleSegments.size() > 0) { + for (String staleSegment : staleSegments) { +String segmentNumber = staleSegment.split(CarbonCommonConstants.UNDERSCORE)[0]; +//
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532396388 ## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ## @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.util; + +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.IOException; +import java.util.List; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.mutate.CarbonUpdateUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +import org.apache.hadoop.io.IOUtils; +import org.apache.log4j.Logger; + +/** + * Maintains the trash folder in carbondata. This class has methods to copy data to the trash and + * remove data from the trash. 
+ */ +public final class TrashUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(TrashUtil.class.getName()); + + /** + * Base method to copy the data to the trash folder. + * + * @param sourcePath the path from which to copy the file + * @param destinationPath the path where the file will be copied + * @return + */ + private static void copyToTrashFolder(String sourcePath, String destinationPath) + throws IOException { +DataOutputStream dataOutputStream = null; +DataInputStream dataInputStream = null; +try { + dataOutputStream = FileFactory.getDataOutputStream(destinationPath); + dataInputStream = FileFactory.getDataInputStream(sourcePath); + IOUtils.copyBytes(dataInputStream, dataOutputStream, CarbonCommonConstants.BYTEBUFFER_SIZE); +} catch (IOException exception) { + LOGGER.error("Unable to copy " + sourcePath + " to the trash folder", exception); + throw exception; +} finally { + CarbonUtil.closeStreams(dataInputStream, dataOutputStream); +} + } + + /** + * The below method copies a complete file to the trash folder. 
+ * + * @param filePathToCopy the files which are to be moved to the trash folder + * @param trashFolderWithTimestamp timestamp, partition folder(if any) and segment number + * @return + */ + public static void copyFileToTrashFolder(String filePathToCopy, + String trashFolderWithTimestamp) throws IOException { +CarbonFile carbonFileToCopy = FileFactory.getCarbonFile(filePathToCopy); +try { + if (carbonFileToCopy.exists()) { +if (!FileFactory.isFileExist(trashFolderWithTimestamp)) { + FileFactory.mkdirs(trashFolderWithTimestamp); +} +if (!FileFactory.isFileExist(trashFolderWithTimestamp + CarbonCommonConstants +.FILE_SEPARATOR + carbonFileToCopy.getName())) { + copyToTrashFolder(filePathToCopy, trashFolderWithTimestamp + CarbonCommonConstants + .FILE_SEPARATOR + carbonFileToCopy.getName()); +} + } +} catch (IOException e) { + // in case there is any issue while copying the file to the trash folder, we need to delete + // the complete segment folder from the trash folder. The trashFolderWithTimestamp contains + // the segment folder too. Delete the folder as it is. + FileFactory.deleteFile(trashFolderWithTimestamp); + LOGGER.error("Error while creating trash folder or copying data to the trash folder", e); + throw e; +} + } + + /** + * The below method copies the complete segment folder to the trash folder. Here, the data files + * in segment are listed and copied one by one to the trash folder. + * + * @param segmentPath the folder which is to be moved to the trash folder + * @param trashFolderWithTimestamp trash folder path with complete timestamp and segment number + * @return + */ + public static void copySegmentToTrash(CarbonFile segmentPath, + String trashFolderWithTimestamp) throws IOException { +try { + List dataFiles = FileFactory.getFolderList(segmentPath.getAbsolutePath());
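The copyToTrashFolder method quoted above streams the source into the destination with a fixed buffer and closes both streams in a finally block. A simplified, self-contained sketch of the same pattern using plain java.nio and try-with-resources instead of the CarbonData helpers (FileFactory, IOUtils, and CarbonUtil.closeStreams are not used here):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamCopySketch {
    // Stream the source file into the destination; try-with-resources closes
    // both streams even when the copy fails, which is what the quoted code
    // achieves with CarbonUtil.closeStreams in a finally block.
    static void copy(Path source, Path destination) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(destination)) {
            byte[] buffer = new byte[8192];  // stands in for BYTEBUFFER_SIZE
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("segfile", ".bin");
        Files.write(src, new byte[]{1, 2, 3});
        Path dst = src.resolveSibling("trashed.bin");
        copy(src, dst);
        System.out.println(Files.size(dst));  // destination has all 3 bytes
    }
}
```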
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532396184 ## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ## @@ -0,0 +1,178 @@ [...] + public static void copySegmentToTrash(CarbonFile segmentPath, + String trashFolderWithTimestamp) throws IOException { +try { + List dataFiles = FileFactory.getFolderList(segmentPath.getAbsolutePath());
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532395502 ## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ## @@ -0,0 +1,178 @@ [...] +if (!FileFactory.isFileExist(trashFolderWithTimestamp + CarbonCommonConstants +.FILE_SEPARATOR + carbonFileToCopy.getName())) { + copyToTrashFolder(filePathToCopy, trashFolderWithTimestamp + CarbonCommonConstants + .FILE_SEPARATOR + carbonFileToCopy.getName()); +} Review comment: done
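The existence checks discussed in this thread make copyFileToTrashFolder idempotent: the trash folder is created only if missing, and the copy is skipped when the target already exists. A simplified sketch of that guard logic in plain java.nio (names are illustrative, not the CarbonData API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class IdempotentTrashCopy {
    // Create the trash folder only when missing and skip the copy when the
    // target already exists, so a retried clean-files run does not overwrite
    // or duplicate data already moved to the trash.
    static boolean copyIfAbsent(Path source, Path trashFolder) throws IOException {
        if (!Files.isDirectory(trashFolder)) {
            Files.createDirectories(trashFolder);
        }
        Path target = trashFolder.resolve(source.getFileName());
        if (Files.exists(target)) {
            return false;  // already trashed; nothing to do
        }
        Files.copy(source, target);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("part-0", ".carbondata");
        Path trash = Files.createTempDirectory("Trash");
        System.out.println(copyIfAbsent(src, trash));  // first call copies
        System.out.println(copyIfAbsent(src, trash));  // second call is a no-op
    }
}
```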
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
nihal0107 commented on a change in pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#discussion_r532377244 ## File path: integration/spark/src/test/scala/org/apache/carbondata/index/bloom/BloomCoarseGrainIndexFunctionSuite.scala ## @@ -660,6 +660,20 @@ class BloomCoarseGrainIndexFunctionSuite sql(s"SELECT * FROM $normalTable WHERE salary='1040'")) } + test("test drop index when more than one bloom index exists") { +sql(s"CREATE TABLE $bloomSampleTable " + + "(id int,name string,salary int)STORED as carbondata TBLPROPERTIES('SORT_COLUMNS'='id')") +sql(s"CREATE index index1 ON TABLE $bloomSampleTable(id) as 'bloomfilter' " + + "PROPERTIES ( 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')") +sql(s"CREATE index index2 ON TABLE $bloomSampleTable (name) as 'bloomfilter' " + + "PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')") +sql(s"insert into $bloomSampleTable values(1,'nihal',20)") +sql(s"SHOW INDEXES ON TABLE $bloomSampleTable").collect() +checkExistence(sql(s"SHOW INDEXES ON TABLE $bloomSampleTable"), true, "index1", "index2") +sql(s"drop index index1 on $bloomSampleTable") +checkExistence(sql(s"SHOW INDEXES ON TABLE $bloomSampleTable"), true, "index2") Review comment: done ## File path: integration/spark/src/test/scala/org/apache/carbondata/index/bloom/BloomCoarseGrainIndexFunctionSuite.scala ## @@ -660,6 +660,20 @@ [...] +sql(s"insert into $bloomSampleTable values(1,'nihal',20)") +sql(s"SHOW INDEXES ON TABLE $bloomSampleTable").collect() Review comment: done
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532370448 ## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ## @@ -0,0 +1,178 @@ [...] +try { + if (carbonFileToCopy.exists()) { +if (!FileFactory.isFileExist(trashFolderWithTimestamp)) { Review comment: done
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
VenuReddy2103 commented on a change in pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#discussion_r532368311 ## File path: integration/spark/src/test/scala/org/apache/carbondata/index/bloom/BloomCoarseGrainIndexFunctionSuite.scala ## @@ -660,6 +660,20 @@ [...] +sql(s"insert into $bloomSampleTable values(1,'nihal',20)") +sql(s"SHOW INDEXES ON TABLE $bloomSampleTable").collect() Review comment: This line can be removed.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4030: [WIP][CARBONDATA-4064] Fix tpcds query failure with SI
ajantha-bhat commented on a change in pull request #4030: URL: https://github.com/apache/carbondata/pull/4030#discussion_r532368011 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala ## @@ -943,7 +943,11 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) { val filterAttributes = filter.condition collect { case attr: AttributeReference => attr.name.toLowerCase } -val parentTableRelation = MatchIndexableRelation.unapply(filter.child).get +val parentRelation = MatchIndexableRelation.unapply(filter.child) +if (parentRelation.isEmpty) { + return false +} +val parentTableRelation = parentRelation.get Review comment: OK, keep the PR in WIP to avoid review comments on an in-progress PR.
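The patch quoted above replaces an unconditional .get on the extractor's result, which throws NoSuchElementException when the pattern does not match, with an explicit empty check and an early return. The same defensive pattern, sketched in Java with Optional (the names below are illustrative, not from the PR):

```java
import java.util.Optional;

public class SafeMatchSketch {
    // Never call get() on a possibly-empty match result; bail out early
    // instead of letting NoSuchElementException escape, mirroring the
    // early `return false` added by the patch.
    static boolean canRewriteWithIndex(Optional<String> parentRelation) {
        if (parentRelation.isEmpty()) {
            return false;                       // the early return the patch adds
        }
        String relation = parentRelation.get(); // safe: checked above
        return relation.endsWith("_table");
    }

    public static void main(String[] args) {
        System.out.println(canRewriteWithIndex(Optional.empty()));        // false, no exception
        System.out.println(canRewriteWithIndex(Optional.of("fact_table")));
    }
}
```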
[GitHub] [carbondata] ajantha-bhat commented on pull request #4027: [WIP]added compression and range column based FT for SI
ajantha-bhat commented on pull request #4027: URL: https://github.com/apache/carbondata/pull/4027#issuecomment-735573280 @nihal0107 : I think you can add all the SI-related test cases in your existing #4023 itself.
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4030: [CARBONDATA-4064] Fix tpcds query failure with SI
Indhumathi27 commented on a change in pull request #4030: URL: https://github.com/apache/carbondata/pull/4030#discussion_r532367206 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala ##
@@ -943,7 +943,11 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
     val filterAttributes = filter.condition collect {
       case attr: AttributeReference => attr.name.toLowerCase
     }
-    val parentTableRelation = MatchIndexableRelation.unapply(filter.child).get
+    val parentRelation = MatchIndexableRelation.unapply(filter.child)
+    if (parentRelation.isEmpty) {
+      return false
+    }
+    val parentTableRelation = parentRelation.get
Review comment: Yes, in progress.
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
Indhumathi27 commented on a change in pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#discussion_r532367120 ## File path: integration/spark/src/test/scala/org/apache/carbondata/index/bloom/BloomCoarseGrainIndexFunctionSuite.scala ##
@@ -660,6 +660,20 @@ class BloomCoarseGrainIndexFunctionSuite
       sql(s"SELECT * FROM $normalTable WHERE salary='1040'"))
   }
+  test("test drop index when more than one bloom index exists") {
+    sql(s"CREATE TABLE $bloomSampleTable " +
+      "(id int,name string,salary int)STORED as carbondata TBLPROPERTIES('SORT_COLUMNS'='id')")
+    sql(s"CREATE index index1 ON TABLE $bloomSampleTable(id) as 'bloomfilter' " +
+      "PROPERTIES ( 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')")
+    sql(s"CREATE index index2 ON TABLE $bloomSampleTable (name) as 'bloomfilter' " +
+      "PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')")
+    sql(s"insert into $bloomSampleTable values(1,'nihal',20)")
+    sql(s"SHOW INDEXES ON TABLE $bloomSampleTable").collect()
+    checkExistence(sql(s"SHOW INDEXES ON TABLE $bloomSampleTable"), true, "index1", "index2")
+    sql(s"drop index index1 on $bloomSampleTable")
+    checkExistence(sql(s"SHOW INDEXES ON TABLE $bloomSampleTable"), true, "index2")
Review comment: Add a case for dropping all CG/FG indexes as well.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4030: [CARBONDATA-4064] Fix tpcds query failure with SI
ajantha-bhat commented on a change in pull request #4030: URL: https://github.com/apache/carbondata/pull/4030#discussion_r532366408 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala ##
@@ -943,7 +943,11 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
     val filterAttributes = filter.condition collect {
       case attr: AttributeReference => attr.name.toLowerCase
     }
-    val parentTableRelation = MatchIndexableRelation.unapply(filter.child).get
+    val parentRelation = MatchIndexableRelation.unapply(filter.child)
+    if (parentRelation.isEmpty) {
+      return false
+    }
+    val parentTableRelation = parentRelation.get
Review comment: Can you please add a small test case for this, and update the PR description to explain when the parent relation can be empty?
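The failure mode this patch fixes (calling `.get` on an empty match result) can be sketched outside Spark with Java's `Optional` as a stand-in for Scala's `Option`. The names `matchIndexableRelation` and `canRewriteWithIndex` below are illustrative, not the actual CarbonData APIs:

```java
import java.util.Optional;

public class OptionGuard {
  // Hypothetical stand-in for MatchIndexableRelation.unapply: yields a
  // relation name only when the plan node looks like an indexable relation.
  static Optional<String> matchIndexableRelation(String plan) {
    return plan.startsWith("carbon:") ? Optional.of(plan.substring(7)) : Optional.empty();
  }

  static boolean canRewriteWithIndex(String plan) {
    Optional<String> parentRelation = matchIndexableRelation(plan);
    // Before the fix: parentRelation.get() was called unconditionally and
    // threw on an empty result (the "None.get" exception in the bug report).
    // After the fix: check for emptiness and bail out first.
    if (parentRelation.isEmpty()) {
      return false;
    }
    String parentTableRelation = parentRelation.get(); // now safe
    return !parentTableRelation.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(canRewriteWithIndex("carbon:sales"));
    System.out.println(canRewriteWithIndex("parquet:sales")); // no exception, just false
  }
}
```

The early `return false` mirrors the patch: a non-indexable child simply means the secondary-index rewrite does not apply, rather than being an error.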
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r532364698 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/MajorCompactionIgnoreInMinorTest.scala ## @@ -186,6 +187,206 @@ class MajorCompactionIgnoreInMinorTest extends QueryTest with BeforeAndAfterAll } + def generateData(numOrders: Int = 10): DataFrame = { +import sqlContext.implicits._ +sqlContext.sparkContext.parallelize(1 to numOrders, 4) + .map { x => ("country" + x, x, "07/23/2015", "name" + x, "phonetype" + x % 10, +"serialname" + x, x + 1) + }.toDF("country", "ID", "date", "name", "phonetype", "serialname", "salary") + } + + test("test skip segment whose data size exceed threshold in minor compaction " + +"in system level control") { + CarbonProperties.getInstance().addProperty("carbon.compaction.level.threshold", "2,2") +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "mm/dd/") +// set threshold to 1MB in this test case +CarbonProperties.getInstance().addProperty("carbon.minor.compaction.size", "1") + +sql("drop table if exists minor_threshold") +sql("drop table if exists tmp") + +sql( + "CREATE TABLE IF NOT EXISTS minor_threshold (country String, ID Int, date Timestamp," + +" name String, phonetype String, serialname String, salary Int) STORED AS carbondata" +) +sql( + "CREATE TABLE IF NOT EXISTS tmp (country String, ID Int, date Timestamp," + +" name String, phonetype String, serialname String, salary Int) STORED AS carbondata" +) + +val initframe = generateData(10) +initframe.write + .format("carbondata") + .option("tablename", "tmp") + .mode(SaveMode.Overwrite) + .save() +// load 3 segments +sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold OPTIONS" + + "('DELIMITER'= ',', 'QUOTECHAR'= '\"')" +) +sql("LOAD DATA LOCAL INPATH '" + csvFilePath2 + "' INTO TABLE minor_threshold OPTIONS" + + "('DELIMITER'= ',', 
'QUOTECHAR'= '\"')"
+    )
+    sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold OPTIONS" +
+      "('DELIMITER'= ',', 'QUOTECHAR'= '\"')"
+    )
+
+    // insert a new segment(id is 3) data size exceed 1 MB
+    sql("insert into minor_threshold select * from tmp")
+
+    // load another 3 segments
+    sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold OPTIONS" +
+      "('DELIMITER'= ',', 'QUOTECHAR'= '\"')"
+    )
+    sql("LOAD DATA LOCAL INPATH '" + csvFilePath2 + "' INTO TABLE minor_threshold OPTIONS" +
+      "('DELIMITER'= ',', 'QUOTECHAR'= '\"')"
+    )
+    sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold OPTIONS" +
+      "('DELIMITER'= ',', 'QUOTECHAR'= '\"')"
+    )
+
+    sql("show segments for table minor_threshold").show(100, false)
+    // do minor compaction
+    sql("alter table minor_threshold compact 'minor'")
+    // check segment 3 whose size exceed the limit should not be compacted
+    val carbonTable = CarbonMetadata.getInstance().getCarbonTable(
+      CarbonCommonConstants.DATABASE_DEFAULT_NAME, "minor_threshold")
+    val carbonTablePath = carbonTable.getMetadataPath
+    val segments = SegmentStatusManager.readLoadMetadata(carbonTablePath);
+    assertResult(SegmentStatus.SUCCESS)(segments(3).getSegmentStatus)
+    assertResult(100030)(sql("select count(*) from minor_threshold").collect().head.get(0))
+    // reset to 0
+    CarbonProperties.getInstance().addProperty("carbon.minor.compaction.size", "0")
+  }
Review comment: Support dynamically changing the table property as well, via the ALTER TABLE SET/UNSET TBLPROPERTIES command. With that, you can test the table-level setting in the same test case by loading some more data.
No need to create new tables again to test it.
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532364299 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.metadata.SegmentFileStore; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.mutate.CarbonUpdateUtil; +import org.apache.carbondata.core.statusmanager.LoadMetadataDetails; +import org.apache.carbondata.core.statusmanager.SegmentStatus; +import org.apache.carbondata.core.statusmanager.SegmentStatusManager; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +import org.apache.hadoop.fs.Path; +import org.apache.log4j.Logger; + +/** + * Mantains the clean files command in carbondata. This class has methods for clean files + * operation. 
+ */ +public class CleanFilesUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CleanFilesUtil.class.getName()); + + /** + * This method will clean all the stale segments for a table, delete the source folder after + * copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegments(CarbonTable carbonTable) +throws IOException { +long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime(); +List staleSegments = getStaleSegments(carbonTable); +if (staleSegments.size() > 0) { + for (String staleSegment : staleSegments) { +String segmentNumber = staleSegment.split(CarbonCommonConstants.UNDERSCORE)[0]; +SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(), +staleSegment); +Map locationMap = fileStore.getSegmentFile() +.getLocationMap(); +if (locationMap != null) { + CarbonFile segmentLocation = FileFactory.getCarbonFile(carbonTable.getTablePath() + + CarbonCommonConstants.FILE_SEPARATOR + fileStore.getSegmentFile().getLocationMap() + .entrySet().iterator().next().getKey()); + // copy the complete segment to the trash folder + TrashUtil.copySegmentToTrash(segmentLocation, CarbonTablePath.getTrashFolderPath( + carbonTable.getTablePath()) + CarbonCommonConstants.FILE_SEPARATOR + + timeStampForTrashFolder + CarbonCommonConstants.FILE_SEPARATOR + CarbonTablePath + .SEGMENT_PREFIX + segmentNumber); + // Deleting the stale Segment folders. + try { +CarbonUtil.deleteFoldersAndFiles(segmentLocation); + } catch (IOException | InterruptedException e) { +LOGGER.error("Unable to delete the segment: " + segmentNumber + " from after moving" + +" it to the trash folder. 
Please delete them manually : " + e.getMessage(), e); + } + // delete the segment file as well + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(), + staleSegment)); +} + } + staleSegments.clear(); +} + } + + /** + * This method will clean all the stale segments for partition table, delete the source folders + * after copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable) +throws IOException { +long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime(); +List staleSegments = getStaleSegments(carbonTable); +if (staleSegments.size() > 0) { + for (String staleSegment : staleSegments) { +String segmentNumber = staleSegment.split(CarbonCommonConstants.UNDERSCORE)[0]; +//
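For illustration, the copy-to-trash-then-delete sequence that the quoted cleanStaleSegments code follows (copy the whole segment into a timestamped trash folder first, then delete the stale source, logging any delete failure for manual cleanup) can be sketched with plain java.nio in place of CarbonData's FileFactory/TrashUtil. The method name `moveToTrash` and the exact trash layout below are assumptions drawn from the quoted hunk, not CarbonData APIs:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

public class TrashSketch {
  // Copies segmentDir into <trashRoot>/<timestamp>/Segment_<segmentNo>,
  // then deletes the stale source folder. Returns the trash destination.
  static Path moveToTrash(Path segmentDir, Path trashRoot, long timestamp, String segmentNo)
      throws IOException {
    Path dest = trashRoot.resolve(Long.toString(timestamp)).resolve("Segment_" + segmentNo);
    Files.createDirectories(dest.getParent());
    // 1. copy the complete segment into the trash folder first
    try (Stream<Path> files = Files.walk(segmentDir)) {
      for (Path src : (Iterable<Path>) files::iterator) {
        Files.copy(src, dest.resolve(segmentDir.relativize(src).toString()),
            StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.COPY_ATTRIBUTES);
      }
    }
    // 2. only after the copy succeeds, delete the stale source folder;
    //    like the PR, a failure here is logged and left for manual cleanup
    try (Stream<Path> files = Files.walk(segmentDir)) {
      files.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          System.err.println("Unable to delete " + p + ", please delete manually: " + e);
        }
      });
    }
    return dest;
  }
}
```

Ordering matters here: deleting only after a successful copy means a crash mid-operation leaves the data either in place or duplicated in the trash, never lost.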
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532364095 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## (quotes the same cleanStaleSegments hunk shown in the comment above; the review comment itself is truncated in the archive)
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
Indhumathi27 commented on a change in pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#discussion_r532363660 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala ##
@@ -183,17 +183,20 @@ private[sql] case class DropIndexCommand(
           parentCarbonTable)
         parentCarbonTable = getRefreshedParentTable(sparkSession, dbName)
         val indexMetadata = parentCarbonTable.getIndexMetadata
+        var hasCgFgIndexes = false
Review comment: I think we can keep the old code, since hasCgFgIndexes is not used in multiple places; just change the hasCgFgIndexes logic.
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532363133 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## (on the LOGGER.error call in the cleanStaleSegments hunk quoted above: "Unable to delete the segment: ... Please delete them manually") Review comment: done
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r532363057 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala ##
@@ -191,6 +191,10 @@ private[sql] case class CarbonDescribeFormattedCommand(
       CarbonProperties.getInstance()
         .getProperty(CarbonCommonConstants.CARBON_MAJOR_COMPACTION_SIZE,
           CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE)), ""),
+      (CarbonCommonConstants.TABLE_MINOR_COMPACTION_SIZE.toUpperCase,
+        tblProps.getOrElse(CarbonCommonConstants.TABLE_MINOR_COMPACTION_SIZE,
+          CarbonProperties.getInstance()
+            .getProperty(CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE, "0")), ""),
Review comment: OK, let's keep the minimum configurable value at 1 MB, and keep -1 when the user has not configured it: 0 would mean "skip every segment larger than 0 MB" per the property's semantics.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r532362600 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ##
@@ -292,6 +293,33 @@ object CommonUtil {
     }
   }
+  /**
+   * This method will validate the minor compaction size specified by the user
+   * the property is used while doing minor compaction
+   *
+   * @param tableProperties
+   */
+  def validateMinorCompactionSize(tableProperties: Map[String, String]): Unit = {
+    var minorCompactionSize: Integer = 0
+    val tblPropName = CarbonCommonConstants.TABLE_MINOR_COMPACTION_SIZE
+    if (tableProperties.get(tblPropName).isDefined) {
+      val minorCompactionSizeStr: String =
+        parsePropertyValueStringInMB(tableProperties(tblPropName))
+      try {
+        minorCompactionSize = Integer.parseInt(minorCompactionSizeStr)
+      } catch {
+        case e: NumberFormatException =>
+          throw new MalformedCarbonCommandException(s"Invalid $tblPropName value found: " +
+            s"$minorCompactionSizeStr, only int value greater than 0 is supported.")
+      }
+      if (minorCompactionSize < 0) {
Review comment: If zero is not supported, then the check should be <= 0.
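The validation being discussed can be sketched minimally: parse the user-supplied value and reject anything that is not a strictly positive integer (the reviewer's point is that if zero is unsupported, the bound check must be `<= 0`, not `< 0`). The class name and the plain `IllegalArgumentException` below are illustrative substitutes for CarbonData's own types:

```java
public class CompactionSizeValidator {
  // Validates a minor-compaction size given in MB; returns the parsed value
  // or throws when it is not a positive integer.
  static int validateMinorCompactionSize(String value) {
    final int size;
    try {
      size = Integer.parseInt(value.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "Invalid minor compaction size: " + value + ", only an int value > 0 is supported");
    }
    // per the review, zero is rejected too, so the check is <= 0 rather than < 0
    if (size <= 0) {
      throw new IllegalArgumentException(
          "Invalid minor compaction size: " + value + ", only an int value > 0 is supported");
    }
    return size;
  }
}
```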
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r532362493 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ## (same validateMinorCompactionSize hunk as above, at the error message "only int value greater than 0 is supported.") Review comment: Is 0 also supported?
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r532362299 ## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ##
@@ -998,6 +998,23 @@ public long getMajorCompactionSize() {
     return compactionSize;
   }
+  /**
+   * returns minor compaction size value from carbon properties or 0 if it is not valid or
+   * not configured
+   *
+   * @return compactionSize
+   */
+  public long getMinorCompactionSize() {
+    long compactionSize = 0;
+    try {
+      compactionSize = Long.parseLong(getProperty(
Review comment: Please handle the case where the user sets a negative value; the user can set anything from 0 to Long.MAX_VALUE, I think.
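A hedged sketch of the getter behaviour the reviewer asks for: treat a missing, unparseable, or negative configured value as "not configured" and fall back to the -1 sentinel agreed on in the sibling comment. The property key comes from the thread; the class and constant names are illustrative, not CarbonData's:

```java
import java.util.Properties;

public class MinorCompactionSize {
  static final long NOT_CONFIGURED = -1L;

  // Returns the configured minor-compaction size in MB, or NOT_CONFIGURED
  // when the property is absent, malformed, or negative.
  static long getMinorCompactionSize(Properties props) {
    String raw = props.getProperty("carbon.minor.compaction.size");
    if (raw == null) {
      return NOT_CONFIGURED;
    }
    try {
      long size = Long.parseLong(raw.trim());
      // reject negative values instead of passing them through
      return size < 0 ? NOT_CONFIGURED : size;
    } catch (NumberFormatException e) {
      return NOT_CONFIGURED;
    }
  }
}
```

With a -1 sentinel, callers can distinguish "skip nothing, feature off" from a genuine 0 MB threshold, which is the ambiguity the review thread is working through.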
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
ajantha-bhat commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r532361537 ## File path: docs/dml-of-carbondata.md ##
@@ -529,6 +529,10 @@ CarbonData DML statements are documented here,which includes:
   * Level 1: Merging of the segments which are not yet compacted.
   * Level 2: Merging of the compacted segments again to form a larger segment.
+  The segment whose data size exceed limit of carbon.minor.compaction.size will not be included in
+  minor compaction. If user want to control the size of segment included in minor compaction,
+  configure the property with appropriate value in MB, if not configure, will merge segments only
+  based on num of segments.
Review comment:
```suggestion
  based on number of segments.
```
[jira] [Created] (CARBONDATA-4064) TPCDS queries are failing with NOne.get exception when table has SI configured
Indhumathi Muthu Murugesh created CARBONDATA-4064: - Summary: TPCDS queries are failing with NOne.get exception when table has SI configured Key: CARBONDATA-4064 URL: https://issues.apache.org/jira/browse/CARBONDATA-4064 Project: CarbonData Issue Type: Bug Reporter: Indhumathi Muthu Murugesh -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532360958 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,180 @@ [ASF license header and import block elided; identical to the quote above] +/** + * Mantains the clean files command in carbondata. This class has methods for clean files + * operation. 
+ */ +public class CleanFilesUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CleanFilesUtil.class.getName()); + + /** + * This method will clean all the stale segments for a table, delete the source folder after + * copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegments(CarbonTable carbonTable) +throws IOException { +long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime(); +List staleSegments = getStaleSegments(carbonTable); +if (staleSegments.size() > 0) { + for (String staleSegment : staleSegments) { +String segmentNumber = staleSegment.split(CarbonCommonConstants.UNDERSCORE)[0]; +SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(), +staleSegment); +Map locationMap = fileStore.getSegmentFile() +.getLocationMap(); +if (locationMap != null) { + CarbonFile segmentLocation = FileFactory.getCarbonFile(carbonTable.getTablePath() + + CarbonCommonConstants.FILE_SEPARATOR + fileStore.getSegmentFile().getLocationMap() + .entrySet().iterator().next().getKey()); + // copy the complete segment to the trash folder + TrashUtil.copySegmentToTrash(segmentLocation, CarbonTablePath.getTrashFolderPath( + carbonTable.getTablePath()) + CarbonCommonConstants.FILE_SEPARATOR + + timeStampForTrashFolder + CarbonCommonConstants.FILE_SEPARATOR + CarbonTablePath + .SEGMENT_PREFIX + segmentNumber); + // Deleting the stale Segment folders. + try { +CarbonUtil.deleteFoldersAndFiles(segmentLocation); + } catch (IOException | InterruptedException e) { +LOGGER.error("Unable to delete the segment: " + segmentNumber + " from after moving" + +" it to the trash folder. 
Please delete them manually : " + e.getMessage(), e); + } + // delete the segment file as well + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(), + staleSegment)); +} + } + staleSegments.clear(); +} + } + + /** + * This method will clean all the stale segments for partition table, delete the source folders + * after copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable) +throws IOException { +long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime(); +List staleSegments = getStaleSegments(carbonTable); +if (staleSegments.size() > 0) { + for (String staleSegment : staleSegments) { +String segmentNumber = staleSegment.split(CarbonCommonConstants.UNDERSCORE)[0]; +//
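The cleanStaleSegments flow quoted above (copy the stale segment folder into a timestamped trash location, delete the source folder, then drop the segment's .segment metadata file) can be sketched in plain Python. The directory layout, trash-folder name, and helper name below are assumptions for illustration; they are not CarbonData APIs.

```python
import os
import shutil
import time

def clean_stale_segment(table_path, segment_number, segment_dir):
    """Illustrative sketch of the quoted cleanStaleSegments flow:
    1. copy the stale segment folder into <table>/.Trash/<timestamp>/Segment_<n>,
    2. delete the original segment folder,
    3. delete the segment's .segment metadata file.
    Path layout and names here are assumptions, not CarbonData's actual paths."""
    timestamp = int(time.time() * 1000)
    trash_dest = os.path.join(table_path, ".Trash", str(timestamp),
                              "Segment_" + segment_number)
    # 1. Copy the complete segment to the trash folder.
    shutil.copytree(segment_dir, trash_dest)
    # 2. Delete the stale segment folder (in the quoted Java, a failure here is
    #    logged and the user is asked to delete it manually).
    shutil.rmtree(segment_dir, ignore_errors=True)
    # 3. Delete the .segment metadata file as well (assumed location).
    segment_file = os.path.join(table_path, "Metadata", "segments",
                                segment_number + ".segment")
    if os.path.exists(segment_file):
        os.remove(segment_file)
    return trash_dest
```

Note the ordering: the copy to trash happens before any deletion, so a failure partway through leaves the data recoverable from either the source or the trash folder.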
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532359426 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,180 @@ [quoted hunk elided; identical to the cleanStaleSegments context quoted above, ending at the line "TrashUtil.copySegmentToTrash(segmentLocation, CarbonTablePath.getTrashFolderPath("] Review comment: done
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532357875 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,180 @@ [quoted hunk elided; identical to the context quoted above, ending at the line "List staleSegments = getStaleSegments(carbonTable);"] Review comment: done ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,180 @@ [second quoted hunk elided; truncated in the original]
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532357801 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,180 @@ [quoted hunk elided; identical to the cleanStaleSegments context quoted above, ending at the line "staleSegments.clear();"] Review comment: yes, can be removed ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,180 @@ [second quoted hunk elided; truncated in the original]
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r532357687 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,180 @@ [quoted hunk elided; identical to the context quoted above, ending at the class-level comment "Mantains the clean files command in carbondata"] Review comment: done ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,180 @@ [second quoted hunk elided; identical to the context quoted above, ending at the line "if (staleSegments.size() > 0) {"] Review comment: done
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files become data trash manager
CarbonDataQA2 commented on pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#issuecomment-735523234 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3213/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files become data trash manager
CarbonDataQA2 commented on pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#issuecomment-735521831 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4968/
[jira] [Created] (CARBONDATA-4063) Refactor getBlockId and getShortBlockId function
Xingjun Hao created CARBONDATA-4063: --- Summary: Refactor getBlockId and getShortBlockId function Key: CARBONDATA-4063 URL: https://issues.apache.org/jira/browse/CARBONDATA-4063 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao Currently, the getBlockId and getShortBlockId functions are too complex and hard to read; they need to be made simpler and more readable.