[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
shenjiayu17 commented on a change in pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#discussion_r533962334 ## File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.geo.scan.expression; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Comparator; +import java.util.List; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.expression.ColumnExpression; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.UnknownExpression; +import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression; +import org.apache.carbondata.core.scan.filter.executer.FilterExecutor; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl; +import org.apache.carbondata.geo.GeoConstants; +import org.apache.carbondata.geo.GeoHashUtils; +import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl; + +/** + * InPolygonRangeList expression processor. It passes the InPolygonRangeList string to + * the Geo implementation's query method, takes the resulting lists of ID ranges, and computes + * their and/or/diff combination to use as the filter. It then builds an InExpression + * containing all the IDs present in the resulting list of ranges. 
+ */ +@InterfaceAudience.Internal +public class PolygonRangeListExpression extends UnknownExpression +implements ConditionalExpression { + + private String polygonRangeList; + + private String opType; + + private List<Long[]> ranges = new ArrayList<>(); + + private ColumnExpression column; + + private static final ExpressionResult trueExpRes = + new ExpressionResult(DataTypes.BOOLEAN, true); + + private static final ExpressionResult falseExpRes = + new ExpressionResult(DataTypes.BOOLEAN, false); + + public PolygonRangeListExpression(String polygonRangeList, String opType, String columnName) { +this.polygonRangeList = polygonRangeList; +this.opType = opType; +this.column = new ColumnExpression(columnName, DataTypes.LONG); + } + + private void processExpression() { +try { + // 1. parse the range list string + List<String> rangeLists = new ArrayList<>(); + Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION); + Matcher matcher = pattern.matcher(polygonRangeList); + while (matcher.find()) { +String matchedStr = matcher.group(); +rangeLists.add(matchedStr); + } + // 2. process the range lists + if (rangeLists.size() > 0) { +List<Long[]> processedRangeList = getRangeListFromString(rangeLists.get(0)); +for (int i = 1; i < rangeLists.size(); i++) { + List<Long[]> tempRangeList = getRangeListFromString(rangeLists.get(i)); + processedRangeList = GeoHashUtils.processRangeList( +processedRangeList, tempRangeList, opType); +} +ranges = processedRangeList; +GeoHashUtils.validateRangeList(ranges); + } +} catch (Exception e) { + throw new RuntimeException(e); +} + } + + private void sortRange(List<Long[]> rangeList) { +rangeList.sort(new Comparator<Long[]>() { + @Override + public int compare(Long[] x, Long[] y) { +return Long.compare(x[0], y[0]); + } +}); + } + + private void combineRange(List<Long[]> rangeList) { +if (rangeList.size() > 1) { + for (int i = 0, j = i + 1; i < rangeList.size() - 1; i++, j++) { +long previousEnd = rangeList.get(i)[1]; +long nextStart = rangeList.get(j)[0]; +if (previousEnd
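The truncated `combineRange` above sorts ranges by start value and then merges neighbours. A minimal standalone sketch of that sort-and-merge step, assuming ranges are `[start, end]` ID pairs and that adjacent or overlapping ranges collapse into one (the PR's exact merge condition is cut off above, so this is an assumption):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RangeMerge {
    // Sort ranges by start value, then merge any pair that overlaps or is
    // adjacent (next start <= previous end + 1). Mirrors the quoted
    // sortRange/combineRange steps, but as one self-contained method.
    public static List<long[]> sortAndCombine(List<long[]> ranges) {
        List<long[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(r -> r[0]));
        List<long[]> merged = new ArrayList<>();
        for (long[] r : sorted) {
            if (!merged.isEmpty() && r[0] <= merged.get(merged.size() - 1)[1] + 1) {
                long[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], r[1]);  // extend the previous range
            } else {
                merged.add(new long[] {r[0], r[1]});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<long[]> in = new ArrayList<>();
        in.add(new long[] {5, 9});
        in.add(new long[] {1, 3});
        in.add(new long[] {4, 6});
        // merges the three input ranges into the single range [1, 9]
        for (long[] r : sortAndCombine(in)) {
            System.out.println(r[0] + "-" + r[1]);
        }
    }
}
```

Note the PR itself stores ranges as boxed `Long[]`; primitive `long[]` is used here only to keep the sketch short.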
[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
shenjiayu17 commented on a change in pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#discussion_r533961464 ## File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonListExpression.java ## @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.geo.scan.expression; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.expression.ColumnExpression; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.UnknownExpression; +import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression; +import org.apache.carbondata.core.scan.filter.executer.FilterExecutor; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl; +import org.apache.carbondata.core.util.CustomIndex; +import org.apache.carbondata.geo.GeoConstants; +import org.apache.carbondata.geo.GeoHashIndex; +import org.apache.carbondata.geo.GeoHashUtils; +import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl; + +/** + * InPolygonList expression processor. It passes the InPolygonList string to the Geo + * implementation's query method, gets a list of ID ranges from each polygon, and computes + * their and/or/diff combination as the filter output. It then builds an InExpression + * containing all the IDs present in the resulting list of ranges. 
+ */ +@InterfaceAudience.Internal +public class PolygonListExpression extends UnknownExpression implements ConditionalExpression { + + private String polygonListString; + + private String opType; + + private GeoHashIndex instance; + + private List<Long[]> ranges = new ArrayList<>(); + + private ColumnExpression column; + + private static final ExpressionResult trueExpRes = + new ExpressionResult(DataTypes.BOOLEAN, true); + + private static final ExpressionResult falseExpRes = + new ExpressionResult(DataTypes.BOOLEAN, false); + + public PolygonListExpression(String polygonListString, String opType, String columnName, + CustomIndex indexInstance) { +this.polygonListString = polygonListString; +this.opType = opType; +this.instance = (GeoHashIndex) indexInstance; +this.column = new ColumnExpression(columnName, DataTypes.LONG); + } + + private void processExpression() { +try { + // 1. parse the polygon list string + List<String> polygons = new ArrayList<>(); + Pattern pattern = Pattern.compile(GeoConstants.POLYGON_REG_EXPRESSION); + Matcher matcher = pattern.matcher(polygonListString); + while (matcher.find()) { +String matchedStr = matcher.group(); +polygons.add(matchedStr); + } + if (polygons.size() < 2) { +throw new RuntimeException("polygon list needs at least 2 polygons, but actually has " + +polygons.size()); + } + // 2. get the range list of each polygon + List<Long[]> processedRangeList = instance.query(polygons.get(0)); + for (int i = 1; i < polygons.size(); i++) { +List<Long[]> tempRangeList = instance.query(polygons.get(i)); +processedRangeList = GeoHashUtils.processRangeList( +processedRangeList, tempRangeList, opType); + } + ranges = processedRangeList; + GeoHashUtils.validateRangeList(ranges); Review comment: Done. Have checked all uses of `instance.query`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
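`GeoHashUtils.processRangeList` itself is not quoted in this excerpt. As an illustration only, the AND (intersection) case over two sorted, non-overlapping range lists can be done with a two-pointer sweep like this hypothetical helper (not CarbonData's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class RangeIntersect {
    // Intersect two sorted, non-overlapping lists of [start, end] ID ranges.
    // Two-pointer sweep: emit the overlap of the current pair, then advance
    // whichever range ends first.
    public static List<long[]> and(List<long[]> a, List<long[]> b) {
        List<long[]> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            long start = Math.max(a.get(i)[0], b.get(j)[0]);
            long end = Math.min(a.get(i)[1], b.get(j)[1]);
            if (start <= end) {
                result.add(new long[] {start, end});
            }
            if (a.get(i)[1] < b.get(j)[1]) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> out = and(
            java.util.Arrays.asList(new long[] {1, 10}, new long[] {20, 30}),
            java.util.Arrays.asList(new long[] {5, 25}));
        for (long[] r : out) {
            System.out.println(r[0] + "-" + r[1]);  // 5-10, then 20-25
        }
    }
}
```

The OR case is the sort-and-merge sketched earlier; a diff would keep the parts of `a` not covered by `b` in the same sweep.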
[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
shenjiayu17 commented on a change in pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#discussion_r533960629 ## File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonListExpression.java ## +public class PolygonListExpression extends UnknownExpression implements ConditionalExpression { Review comment: Done ## File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java ##
[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
shenjiayu17 commented on a change in pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#discussion_r533960531 ## File path: geo/src/main/java/org/apache/carbondata/geo/GeoOperationType.java ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.geo; + +public enum GeoOperationType { + OR("OR"), + AND("AND"); + + private String type; + + GeoOperationType(String type) { Review comment: Done
[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
shenjiayu17 commented on a change in pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#discussion_r533959413 ## File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java ## @@ -0,0 +1,411 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.geo; + +import java.math.BigDecimal; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Comparator; +import java.util.List; +import java.util.Objects; + +public class GeoHashUtils { + + /** + * Get the degree of each grid in the east-west direction. + * + * @param originLatitude the origin point latitude + * @param gridSize the grid size + * @return Delta X is the degree of each grid in the east-west direction + */ + public static double getDeltaX(double originLatitude, int gridSize) { +double mCos = Math.cos(originLatitude * Math.PI / GeoConstants.CONVERT_FACTOR); +return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS * mCos); + } + + /** + * Get the degree of each grid in the north-south direction. 
+ * + * @param gridSize the grid size + * @return Delta Y is the degree of each grid in the north-south direction + */ + public static double getDeltaY(int gridSize) { +return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS); + } + + /** + * Calculate the number of cuts (how many times the coordinate space is halved) + * + * @param gridSize the grid size + * @param originLatitude the origin point latitude + * @return The number of cuts + */ + public static int getCutCount(int gridSize, double originLatitude) { +double deltaX = getDeltaX(originLatitude, gridSize); +int countX = Double.valueOf( +Math.ceil(Math.log(2 * GeoConstants.CONVERT_FACTOR / deltaX) / Math.log(2))).intValue(); +double deltaY = getDeltaY(gridSize); +int countY = Double.valueOf( +Math.ceil(Math.log(GeoConstants.CONVERT_FACTOR / deltaY) / Math.log(2))).intValue(); +return Math.max(countX, countY); + } + + /** + * Convert input longitude and latitude to GeoID + * + * @param longitude Longitude; the actual longitude is multiplied by a coefficient so that + * the floating-point calculation is converted to an integer calculation + * @param latitude Latitude; the actual latitude is multiplied by a coefficient so that + * the floating-point calculation is converted to an integer calculation. 
+ * @param oriLatitude the origin point latitude + * @param gridSize the grid size + * @return GeoID + */ + public static long lonLat2GeoID(long longitude, long latitude, double oriLatitude, int gridSize) { +long longitudeByRatio = longitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY; +long latitudeByRatio = latitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY; +int[] ij = lonLat2ColRow(longitudeByRatio, latitudeByRatio, oriLatitude, gridSize); +return colRow2GeoID(ij[0], ij[1]); + } + + /** + * Calculate the geo id through grid index coordinates; the row and column of the grid + * coordinates can be derived from latitude and longitude + * + * @param longitude Longitude; the actual longitude is multiplied by a coefficient so that + * the floating-point calculation is converted to an integer calculation + * @param latitude Latitude; the actual latitude is multiplied by a coefficient so that + * the floating-point calculation is converted to an integer calculation + * @param oriLatitude the latitude of the origin point, which is used to calculate the deltaX and cut + * level. + * @param gridSize the size of the minimal grid after the cut + * @return Grid ID value [row, column], column starts from 1 + */ + public static int[] lonLat2ColRow(long longitude, long latitude, double oriLatitude, + int gridSize) { +int cutLevel = getCutCount(gridSize, oriLatitude); +int column = (int) Math.floor(longitude / getDeltaX(oriLatitude, gridSize) / +GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1)); +int row = (int) Math.floor(latitude / getDeltaY(gridSize) / +
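The grid formulas above can be exercised in isolation. The sketch below assumes `CONVERT_FACTOR` is 180 (degrees in a half circle) and `EARTH_RADIUS` is in metres; the real values live in `GeoConstants` and may differ:

```java
public class GridMath {
    // Assumed values; the actual ones are defined in GeoConstants.
    static final double CONVERT_FACTOR = 180.0;    // degrees (assumption)
    static final double EARTH_RADIUS = 6371000.0;  // metres (assumption)

    // Degrees of longitude covered by one grid cell at the origin latitude;
    // shrinks as the cosine of the latitude grows toward the poles.
    public static double deltaX(double originLatitude, int gridSize) {
        double mCos = Math.cos(originLatitude * Math.PI / CONVERT_FACTOR);
        return (CONVERT_FACTOR * gridSize) / (Math.PI * EARTH_RADIUS * mCos);
    }

    // Degrees of latitude covered by one grid cell (independent of latitude).
    public static double deltaY(int gridSize) {
        return (CONVERT_FACTOR * gridSize) / (Math.PI * EARTH_RADIUS);
    }

    // Number of recursive halvings needed so the finest cell is no larger
    // than gridSize in either direction: ceil(log2(extent / cellSize)).
    public static int cutCount(int gridSize, double originLatitude) {
        int countX = (int) Math.ceil(
            Math.log(2 * CONVERT_FACTOR / deltaX(originLatitude, gridSize)) / Math.log(2));
        int countY = (int) Math.ceil(
            Math.log(CONVERT_FACTOR / deltaY(gridSize)) / Math.log(2));
        return Math.max(countX, countY);
    }

    public static void main(String[] args) {
        System.out.println("deltaY(50m grid)  = " + deltaY(50));
        System.out.println("cutCount(50m, 39.9) = " + cutCount(50, 39.9));
    }
}
```

With a 50 m grid and an origin latitude around 40 degrees, the cut count lands in the low twenties, which is why the resulting GeoIDs fit comfortably in a `long` after row/column interleaving.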
[GitHub] [carbondata] ajantha-bhat opened a new pull request #4034: [WIP] support prestosql 333
ajantha-bhat opened a new pull request #4034: URL: https://github.com/apache/carbondata/pull/4034

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-737051141 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3257/
[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
Zhangshunyu commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r533948673 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/MajorCompactionIgnoreInMinorTest.scala ## @@ -186,6 +187,134 @@ class MajorCompactionIgnoreInMinorTest extends QueryTest with BeforeAndAfterAll } + def generateData(numOrders: Int = 10): DataFrame = { +import sqlContext.implicits._ +sqlContext.sparkContext.parallelize(1 to numOrders, 4) + .map { x => ("country" + x, x, "07/23/2015", "name" + x, "phonetype" + x % 10, +"serialname" + x, x + 1) + }.toDF("country", "ID", "date", "name", "phonetype", "serialname", "salary") + } + + test("test skip segment whose data size exceed threshold in minor compaction " + +"in system level control and table level") { + CarbonProperties.getInstance().addProperty("carbon.compaction.level.threshold", "2,2") +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "mm/dd/") +// set threshold to 1MB in system level +CarbonProperties.getInstance().addProperty("carbon.minor.compaction.size", "1") + +sql("drop table if exists minor_threshold") +sql("drop table if exists tmp") +sql( + "CREATE TABLE IF NOT EXISTS minor_threshold (country String, ID Int, date" + +" Timestamp, name String, phonetype String, serialname String, salary Int) " + +"STORED AS carbondata" +) +sql( + "CREATE TABLE IF NOT EXISTS tmp (country String, ID Int, date Timestamp," + +" name String, phonetype String, serialname String, salary Int) STORED AS carbondata" +) +val initframe = generateData(10) +initframe.write + .format("carbondata") + .option("tablename", "tmp") + .mode(SaveMode.Overwrite) + .save() +// load 3 segments +for (i <- 0 to 2) { + sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold" + +" OPTIONS ('DELIMITER'= ',', 'QUOTECHAR'= '\"')" + ) +} +// insert a new segment(id is 3) data size exceed 1 MB +sql("insert into 
minor_threshold select * from tmp") +// load another 3 segments +for (i <- 0 to 2) { + sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold" + +" OPTIONS ('DELIMITER'= ',', 'QUOTECHAR'= '\"')" + ) +} +// do minor compaction +sql("alter table minor_threshold compact 'minor'") +// check segment 3 whose size exceed the limit should not be compacted but success +val carbonTable = CarbonMetadata.getInstance().getCarbonTable( + CarbonCommonConstants.DATABASE_DEFAULT_NAME, "minor_threshold") +val carbonTablePath = carbonTable.getMetadataPath +val segments = SegmentStatusManager.readLoadMetadata(carbonTablePath); +assertResult(SegmentStatus.SUCCESS)(segments(3).getSegmentStatus) +assertResult(100030)(sql("select count(*) from minor_threshold").collect().head.get(0)) + +// change the threshold to 5MB by dynamic table properties setting, then the segment whose id is +// 3 should be included in minor compaction +sql("alter table minor_threshold set TBLPROPERTIES('minor_compaction_size'='5')") +// reload some segments +for (i <- 0 to 2) { + sql("insert into minor_threshold select * from tmp") +} +// do minor compaction +sql("alter table minor_threshold compact 'minor'") +// check segment 3 whose size not exceed the new threshold limit should be compacted now +val segments2 = SegmentStatusManager.readLoadMetadata(carbonTablePath); +assertResult(SegmentStatus.COMPACTED)(segments2(3).getSegmentStatus) +assertResult(400030)(sql("select count(*) from minor_threshold").collect().head.get(0)) + +// reset to -1 +CarbonProperties.getInstance().addProperty("carbon.minor.compaction.size", "-1") Review comment: @akashrn5 Oh, I see. Now handled, pls check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
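The behaviour this test exercises — skipping segments whose size exceeds `carbon.minor.compaction.size` (or the `minor_compaction_size` table property), with `-1` meaning no limit — can be sketched with a hypothetical helper; the actual CarbonData segment-selection code is more involved:

```java
import java.util.ArrayList;
import java.util.List;

public class MinorCompactionFilter {
    // Keep only segments whose size is within the configured threshold.
    // thresholdMb <= 0 means "no limit", mirroring the -1 reset in the test.
    public static List<Long> segmentsEligible(List<Long> segmentSizesBytes, long thresholdMb) {
        List<Long> eligible = new ArrayList<>();
        long limitBytes = thresholdMb * 1024L * 1024L;
        for (long size : segmentSizesBytes) {
            if (thresholdMb <= 0 || size <= limitBytes) {
                eligible.add(size);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        List<Long> sizes = java.util.Arrays.asList(512L * 1024, 3L * 1024 * 1024);
        // with a 1 MB threshold, only the 512 KB segment qualifies
        System.out.println(segmentsEligible(sizes, 1).size());
        // -1 disables the size check entirely
        System.out.println(segmentsEligible(sizes, -1).size());
    }
}
```

This matches the test's expectation that the oversized segment 3 stays `SUCCESS` under a 1 MB threshold but becomes `COMPACTED` once the table-level threshold is raised to 5 MB.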
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-737047075 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5013/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
akashrn5 commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r533944878 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/MajorCompactionIgnoreInMinorTest.scala ## +// reset to -1 +CarbonProperties.getInstance().addProperty("carbon.minor.compaction.size", "-1") Review comment: not only resetting this, reset all properties which you have set, like the timestamp format and the threshold. Please check the below test case, same for that also.
[GitHub] [carbondata] nihal0107 commented on pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
nihal0107 commented on pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#issuecomment-737030967 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
CarbonDataQA2 commented on pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#issuecomment-737029820 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5010/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4023: [CARBONDATA-4059] added testcase for custom compaction, compression and range column for SI table
CarbonDataQA2 commented on pull request #4023: URL: https://github.com/apache/carbondata/pull/4023#issuecomment-737029308 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3256/
[jira] [Resolved] (CARBONDATA-4052) Select query on SI table after insert overwrite is giving wrong result.
[ https://issues.apache.org/jira/browse/CARBONDATA-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash R Nilugal resolved CARBONDATA-4052.
Fix Version/s: 2.2.0
Resolution: Fixed

> Select query on SI table after insert overwrite is giving wrong result.
> ---
>
> Key: CARBONDATA-4052
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4052
> Project: CarbonData
> Issue Type: Bug
> Reporter: Nihal kumar ojha
> Priority: Major
> Fix For: 2.2.0
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> # Create a carbon table.
> # Create an SI table on the same carbon table.
> # Do a load or insert operation.
> # Run insert overwrite on the main table.
> # A select query on the SI table now shows old as well as new data; it should show only the new data.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4023: [CARBONDATA-4059] added testcase for custom compaction, compression and range column for SI table
CarbonDataQA2 commented on pull request #4023: URL: https://github.com/apache/carbondata/pull/4023#issuecomment-737028313 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5012/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
CarbonDataQA2 commented on pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#issuecomment-737027155 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3255/
[GitHub] [carbondata] asfgit closed pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
asfgit closed pull request #4015: URL: https://github.com/apache/carbondata/pull/4015
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
CarbonDataQA2 commented on pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#issuecomment-737021905 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3253/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
CarbonDataQA2 commented on pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#issuecomment-737021680 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5008/
[GitHub] [carbondata] akashrn5 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
akashrn5 commented on pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#issuecomment-737001258 LGTM
[GitHub] [carbondata] vikramahuja1001 closed pull request #4007: [WIP]: Clean files behaviour change
vikramahuja1001 closed pull request #4007: URL: https://github.com/apache/carbondata/pull/4007
[GitHub] [carbondata] nihal0107 commented on pull request #4027: [WIP]added compression and range column based FT for SI
nihal0107 commented on pull request #4027: URL: https://github.com/apache/carbondata/pull/4027#issuecomment-736993272 Added in #4023
[GitHub] [carbondata] nihal0107 closed pull request #4027: [WIP]added compression and range column based FT for SI
nihal0107 closed pull request #4027: URL: https://github.com/apache/carbondata/pull/4027
[GitHub] [carbondata] Indhumathi27 commented on pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
Indhumathi27 commented on pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#issuecomment-736988368 retest this please
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
nihal0107 commented on a change in pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#discussion_r533892078

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ##

@@ -398,6 +404,22 @@ object SecondaryIndexCreator {
   secondaryIndexModel.sqlContext.sparkSession,
   carbonLoadModelForMergeDataFiles.getFactTimeStamp,
   rebuiltSegments)
+
+if (isInsertOverwrite) {
+  FileInternalUtil
+    .updateTableStatus(
+      SegmentStatusManager
+        .readLoadMetadata(indexCarbonTable.getMetadataPath)
+        .filter(loadMetadata => !successSISegments.contains(loadMetadata.getLoadName))
+        .map(_.getLoadName).toList,
+      secondaryIndexModel.carbonLoadModel.getDatabaseName,

Review comment: done
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
CarbonDataQA2 commented on pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#issuecomment-736961603 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3252/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
CarbonDataQA2 commented on pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#issuecomment-736961311 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5007/
[GitHub] [carbondata] asfgit closed pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
asfgit closed pull request #4005: URL: https://github.com/apache/carbondata/pull/4005
[GitHub] [carbondata] Zhangshunyu commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
Zhangshunyu commented on pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#issuecomment-736928238 @akashrn5 handled
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA2 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-736810208 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5006/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA2 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-736807198 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3251/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736790887 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3250/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736789397 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5005/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736788185 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3249/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736787550 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5004/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
CarbonDataQA2 commented on pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#issuecomment-736748233 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3248/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-736747782 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3247/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
CarbonDataQA2 commented on pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#issuecomment-736744974 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5003/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-736744592 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5002/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [CARBONDATA-4063] Refactor getBlockId and getShortBlockId functions
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-736742888 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3245/
[GitHub] [carbondata] akashrn5 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
akashrn5 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736739075 LGTM
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r533628020

## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ##

@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.*;
+import java.util.stream.Collectors;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.SegmentFileStore;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.statusmanager.LoadMetadataDetails;
+import org.apache.carbondata.core.statusmanager.SegmentStatus;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil;
+
+import org.apache.log4j.Logger;
+
+/**
+ * This util provides clean-stale-data methods for the clean files command
+ */
+public class CleanFilesUtil {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CleanFilesUtil.class.getName());
+
+  /**
+   * This method will clean all the stale segments for a table, delete the source folder after
+   * copying the data to the trash, and also remove the .segment files of the stale segments
+   */
+  public static void cleanStaleSegments(CarbonTable carbonTable)
+      throws IOException {
+    long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime();
+    List<String> staleSegmentFiles = new ArrayList<>();
+    List<String> redundantSegmentFile = new ArrayList<>();
+    getStaleSegmentFiles(carbonTable, staleSegmentFiles, redundantSegmentFile);
+    for (String staleSegmentFile : staleSegmentFiles) {
+      String segmentNumber = DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile);
+      SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(),
+          staleSegmentFile);
+      Map locationMap = fileStore.getSegmentFile()
+          .getLocationMap();
+      if (locationMap != null) {
+        if (locationMap.entrySet().iterator().next().getValue().isRelative()) {
+          CarbonFile segmentPath = FileFactory.getCarbonFile(CarbonTablePath.getSegmentPath(
+              carbonTable.getTablePath(), segmentNumber));
+          // copy the complete segment to the trash folder
+          TrashUtil.copySegmentToTrash(segmentPath, TrashUtil.getCompleteTrashFolderPath(
+              carbonTable.getTablePath(), timeStampForTrashFolder, segmentNumber));
+          // Deleting the stale segment folders and the segment file.
+          try {
+            CarbonUtil.deleteFoldersAndFiles(segmentPath);
+            // delete the segment file as well
+            FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(),
+                staleSegmentFile));
+            for (String duplicateStaleSegmentFile : redundantSegmentFile) {
+              if (DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile)
+                  .equals(segmentNumber)) {
+                FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable
+                    .getTablePath(), duplicateStaleSegmentFile));
+              }
+            }
+          } catch (IOException | InterruptedException e) {
+            LOGGER.error("Unable to delete the segment: " + segmentPath + " after moving" +
+                " it to the trash folder. Please delete them manually: " + e.getMessage(), e);
+          }
+        }
+      }
+    }
+  }
+
+  /**
+   * This method will clean all the stale segments for a partition table, delete the source
+   * folders after copying the data to the trash, and also remove the .segment files of the
+   * stale segments
+   */
+  public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable)
+      throws IOException {
+    long
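The method quoted above follows a copy-to-trash-then-delete pattern: the segment directory is first preserved under a timestamped trash folder, and only then removed from the table path. A rough standalone sketch of the same idea using `java.nio` (the paths, trash layout, and method name here are illustrative, not Carbon's actual `FileFactory`/`TrashUtil` API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TrashMoveSketch {

    // place the segment under <trashRoot>/<timestamp>/<segmentName>; after the
    // move the source directory no longer exists (move = copy + delete in one step)
    static Path moveToTrash(Path segmentDir, Path trashRoot, long timestamp) throws IOException {
        Path dest = trashRoot.resolve(Long.toString(timestamp))
            .resolve(segmentDir.getFileName().toString());
        Files.createDirectories(dest.getParent());
        return Files.move(segmentDir, dest, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("table");
        Path segment = Files.createDirectory(table.resolve("Segment_0"));
        Files.writeString(segment.resolve("part-0.carbondata"), "data");

        Path trash = table.resolve(".Trash");
        Path moved = moveToTrash(segment, trash, System.currentTimeMillis());

        System.out.println(Files.exists(segment));                            // source is gone
        System.out.println(Files.exists(moved.resolve("part-0.carbondata"))); // data preserved
    }
}
```

Carbon's real implementation copies first and deletes separately (so a failed delete still leaves the trash copy intact, as the error message above notes); the single `Files.move` here is just the compact form of the same intent on one filesystem.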
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity
CarbonDataQA2 commented on pull request #4004: URL: https://github.com/apache/carbondata/pull/4004#issuecomment-736734171 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3246/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [CARBONDATA-4063] Refactor getBlockId and getShortBlockId functions
CarbonDataQA2 commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-736733971 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5000/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4007: [WIP]: Clean files behaviour change
CarbonDataQA2 commented on pull request #4007: URL: https://github.com/apache/carbondata/pull/4007#issuecomment-736733142 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3244/
[GitHub] [carbondata] akashrn5 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
akashrn5 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736731895 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4007: [WIP]: Clean files behaviour change
CarbonDataQA2 commented on pull request #4007: URL: https://github.com/apache/carbondata/pull/4007#issuecomment-736731831 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4999/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity
CarbonDataQA2 commented on pull request #4004: URL: https://github.com/apache/carbondata/pull/4004#issuecomment-736730411 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5001/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
akashrn5 commented on a change in pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#discussion_r533614590

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ##

@@ -398,6 +404,22 @@ object SecondaryIndexCreator {
   secondaryIndexModel.sqlContext.sparkSession,
   carbonLoadModelForMergeDataFiles.getFactTimeStamp,
   rebuiltSegments)
+
+if (isInsertOverwrite) {
+  FileInternalUtil
+    .updateTableStatus(
+      SegmentStatusManager
+        .readLoadMetadata(indexCarbonTable.getMetadataPath)
+        .filter(loadMetadata => !successSISegments.contains(loadMetadata.getLoadName))
+        .map(_.getLoadName).toList,
+      secondaryIndexModel.carbonLoadModel.getDatabaseName,

Review comment: Please do not put the complete filter logic inside a method parameter; assign it to a variable first, for clean-code practice.
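The clean-code point in this review — bind a multi-line filter expression to a named variable instead of inlining it as a call argument — can be sketched in a few lines of Java (the names mirror the Scala diff above but are purely illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ExtractFilterSketch {

    // instead of passing the whole stream pipeline inline to updateTableStatus(...),
    // name the intermediate result so the call site stays readable
    static List<String> segmentsNotInSuccessList(List<String> allLoadedSegments,
                                                 List<String> successSISegments) {
        List<String> notInSuccess = allLoadedSegments.stream()
            .filter(name -> !successSISegments.contains(name))
            .collect(Collectors.toList());
        return notInSuccess;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("0", "1", "2", "3");
        List<String> success = Arrays.asList("1", "3");
        System.out.println(segmentsNotInSuccessList(all, success)); // [0, 2]
    }
}
```

The behavior is identical either way; the named variable simply documents what the filtered list means at the point where it is consumed.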
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
akashrn5 commented on a change in pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#discussion_r533604921 ## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ## @@ -998,6 +998,34 @@ public long getMajorCompactionSize() { return compactionSize; } + /** + * returns minor compaction size value from carbon properties or -1 if it is not valid or + * not configured + * + * @return compactionSize + */ + public long getMinorCompactionSize() { +long compactionSize = -1; +// if not configured, just use default -1 +if (null != getProperty(CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE)) { + try { +compactionSize = Long.parseLong(getProperty( +CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE)); + } catch (NumberFormatException e) { +LOGGER.warn("Invalid value is configured for property " ++ CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE + " considering the default" ++ " value -1 and not considering segment Size during minor compaction."); + } + if (compactionSize <= 0) { +LOGGER.warn("Invalid value is configured for property " ++ CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE + " considering the default" Review comment: same as above ## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ## @@ -998,6 +998,34 @@ public long getMajorCompactionSize() { return compactionSize; } + /** + * returns minor compaction size value from carbon properties or -1 if it is not valid or + * not configured + * + * @return compactionSize + */ + public long getMinorCompactionSize() { +long compactionSize = -1; +// if not configured, just use default -1 +if (null != getProperty(CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE)) { + try { +compactionSize = Long.parseLong(getProperty( +CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE)); + } catch (NumberFormatException e) { +LOGGER.warn("Invalid value is configured for property " ++ CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE + " 
considering the default" Review comment: ```suggestion + CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE + ", considering the default" ``` ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/MajorCompactionIgnoreInMinorTest.scala ## @@ -186,6 +187,141 @@ class MajorCompactionIgnoreInMinorTest extends QueryTest with BeforeAndAfterAll } + def generateData(numOrders: Int = 10): DataFrame = { +import sqlContext.implicits._ +sqlContext.sparkContext.parallelize(1 to numOrders, 4) + .map { x => ("country" + x, x, "07/23/2015", "name" + x, "phonetype" + x % 10, +"serialname" + x, x + 1) + }.toDF("country", "ID", "date", "name", "phonetype", "serialname", "salary") + } + + test("test skip segment whose data size exceed threshold in minor compaction " + +"in system level control and table level") { + CarbonProperties.getInstance().addProperty("carbon.compaction.level.threshold", "2,2") +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "mm/dd/") +// set threshold to 1MB in system level +CarbonProperties.getInstance().addProperty("carbon.minor.compaction.size", "1") + +sql("drop table if exists minor_threshold") +sql("drop table if exists tmp") +sql( + "CREATE TABLE IF NOT EXISTS minor_threshold (country String, ID Int, date" + +" Timestamp, name String, phonetype String, serialname String, salary Int) " + +"STORED AS carbondata" +) +sql( + "CREATE TABLE IF NOT EXISTS tmp (country String, ID Int, date Timestamp," + +" name String, phonetype String, serialname String, salary Int) STORED AS carbondata" +) +val initframe = generateData(10) +initframe.write + .format("carbondata") + .option("tablename", "tmp") + .mode(SaveMode.Overwrite) + .save() +// load 3 segments +for (i <- 0 to 2) { + sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold" + +" OPTIONS ('DELIMITER'= ',', 'QUOTECHAR'= '\"')" + ) +} +// insert a new segment(id is 3) data size exceed 1 MB 
+sql("insert into minor_threshold select * from tmp") +// load another 3 segments +for (i <- 0 to 2) { + sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold" + +" OPTIONS ('DELIMITER'= ',', 'QUOTECHAR'= '\"')" + ) +} +sql("show segments for table minor_threshold").show(100, false) +// do minor compaction +sql("alter table minor_threshold compact 'minor'" +) +// check segment 3 whose
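The review above concerns `getMinorCompactionSize()`, which parses a size property and falls back to -1 when the value is missing, non-numeric, or non-positive. A minimal sketch of that validation pattern, with illustrative names rather than CarbonData's actual `CarbonProperties` API:

```java
// Sketch of the validation pattern under review: parse a size property and
// fall back to -1 when it is unset, non-numeric, or non-positive.
// Class and method names are illustrative, not CarbonData's real API.
public class CompactionSizeParser {
    public static final long DEFAULT_INVALID = -1L;

    public static long parseCompactionSize(String configured) {
        if (configured == null) {
            return DEFAULT_INVALID; // property not configured
        }
        try {
            long size = Long.parseLong(configured.trim());
            // non-positive sizes are treated the same as an unset property
            return size > 0 ? size : DEFAULT_INVALID;
        } catch (NumberFormatException e) {
            return DEFAULT_INVALID; // property is not a number
        }
    }

    public static void main(String[] args) {
        assert parseCompactionSize(null) == -1L;
        assert parseCompactionSize("abc") == -1L;
        assert parseCompactionSize("0") == -1L;
        assert parseCompactionSize("1024") == 1024L;
    }
}
```

The reviewer's suggestion only touches the warning message wording; the fallback logic itself stays as above.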
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
vikramahuja1001 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r533611032 ## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.util; + +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.IOException; +import java.util.List; +import java.util.concurrent.TimeUnit; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.mutate.CarbonUpdateUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +import org.apache.hadoop.io.IOUtils; +import org.apache.log4j.Logger; + +/** + * Mantains the trash folder in carbondata. This class has methods to copy data to the trash and + * remove data from the trash. 
+ */ +public final class TrashUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(TrashUtil.class.getName()); + + /** + * Base method to copy the data to the trash folder. + * + * @param sourcePath the path from which to copy the file + * @param destinationPath the path where the file will be copied + * @return + */ + private static void copyToTrashFolder(String sourcePath, String destinationPath) +throws IOException { +DataOutputStream dataOutputStream = null; +DataInputStream dataInputStream = null; +try { + dataOutputStream = FileFactory.getDataOutputStream(destinationPath); + dataInputStream = FileFactory.getDataInputStream(sourcePath); + IOUtils.copyBytes(dataInputStream, dataOutputStream, CarbonCommonConstants.BYTEBUFFER_SIZE); +} catch (IOException exception) { + LOGGER.error("Unable to copy " + sourcePath + " to the trash folder", exception); + throw exception; +} finally { + CarbonUtil.closeStreams(dataInputStream, dataOutputStream); +} + } + + /** + * The below method copies the complete a file to the trash folder. + * + * @param filePathToCopy the files which are to be moved to the trash folder + * @param trashFolderWithTimestamp timestamp, partition folder(if any) and segment number + * @return + */ + public static void copyFileToTrashFolder(String filePathToCopy, + String trashFolderWithTimestamp) throws IOException { +CarbonFile carbonFileToCopy = FileFactory.getCarbonFile(filePathToCopy); +String destinationPath = trashFolderWithTimestamp + CarbonCommonConstants +.FILE_SEPARATOR + carbonFileToCopy.getName(); +try { + if (!FileFactory.isFileExist(destinationPath)) { +copyToTrashFolder(filePathToCopy, destinationPath); + } +} catch (IOException e) { + // in case there is any issue while copying the file to the trash folder, we need to delete + // the complete segment folder from the trash folder. The trashFolderWithTimestamp contains + // the segment folder too. Delete the folder as it is. 
+ FileFactory.deleteFile(trashFolderWithTimestamp); + LOGGER.error("Error while checking trash folder: " + destinationPath + " or copying" + + " file: " + filePathToCopy + " to the trash folder at path", e); + throw e; +} + } + + /** + * The below method copies the complete segment folder to the trash folder. Here, the data files + * in segment are listed and copied one by one to the trash folder. + * + * @param segmentPath the folder which are to be moved to the trash folder + * @param trashFolderWithTimestamp trashfolderpath with complete timestamp and segment number + * @return Review comment: done ## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional
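The quoted `copyToTrashFolder` closes its streams through a manual `finally` block with `CarbonUtil.closeStreams(...)`. The same copy loop can be expressed with try-with-resources; the sketch below uses plain `InputStream`/`OutputStream` in place of CarbonData's `FileFactory` streams, so it is an illustration of the pattern rather than the PR's implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

// Plain-java sketch of the copy loop behind copyToTrashFolder.
// try-with-resources replaces the manual finally/closeStreams() cleanup;
// generic streams stand in for FileFactory's HDFS-aware streams.
public class TrashCopySketch {
    static final int BUFFER_SIZE = 8 * 1024; // stand-in for BYTEBUFFER_SIZE

    public static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    public static byte[] roundTrip(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            copy(in, out);
            return out.toByteArray();
        } // both streams are closed here even if copy(...) throws
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "segment data".getBytes();
        assert Arrays.equals(roundTrip(payload), payload);
    }
}
```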
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
akashrn5 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r533599288 ## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.util; + +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.IOException; +import java.util.List; +import java.util.concurrent.TimeUnit; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.mutate.CarbonUpdateUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +import org.apache.hadoop.io.IOUtils; +import org.apache.log4j.Logger; + +/** + * Mantains the trash folder in carbondata. This class has methods to copy data to the trash and + * remove data from the trash. 
+ */ +public final class TrashUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(TrashUtil.class.getName()); + + /** + * Base method to copy the data to the trash folder. + * + * @param sourcePath the path from which to copy the file + * @param destinationPath the path where the file will be copied + * @return + */ + private static void copyToTrashFolder(String sourcePath, String destinationPath) +throws IOException { +DataOutputStream dataOutputStream = null; +DataInputStream dataInputStream = null; +try { + dataOutputStream = FileFactory.getDataOutputStream(destinationPath); + dataInputStream = FileFactory.getDataInputStream(sourcePath); + IOUtils.copyBytes(dataInputStream, dataOutputStream, CarbonCommonConstants.BYTEBUFFER_SIZE); +} catch (IOException exception) { + LOGGER.error("Unable to copy " + sourcePath + " to the trash folder", exception); + throw exception; +} finally { + CarbonUtil.closeStreams(dataInputStream, dataOutputStream); +} + } + + /** + * The below method copies the complete a file to the trash folder. + * + * @param filePathToCopy the files which are to be moved to the trash folder + * @param trashFolderWithTimestamp timestamp, partition folder(if any) and segment number + * @return + */ + public static void copyFileToTrashFolder(String filePathToCopy, + String trashFolderWithTimestamp) throws IOException { +CarbonFile carbonFileToCopy = FileFactory.getCarbonFile(filePathToCopy); +String destinationPath = trashFolderWithTimestamp + CarbonCommonConstants +.FILE_SEPARATOR + carbonFileToCopy.getName(); +try { + if (!FileFactory.isFileExist(destinationPath)) { +copyToTrashFolder(filePathToCopy, destinationPath); + } +} catch (IOException e) { + // in case there is any issue while copying the file to the trash folder, we need to delete + // the complete segment folder from the trash folder. The trashFolderWithTimestamp contains + // the segment folder too. Delete the folder as it is. 
+ FileFactory.deleteFile(trashFolderWithTimestamp); + LOGGER.error("Error while checking trash folder: " + destinationPath + " or copying" + + " file: " + filePathToCopy + " to the trash folder at path", e); + throw e; +} + } + + /** + * The below method copies the complete segment folder to the trash folder. Here, the data files + * in segment are listed and copied one by one to the trash folder. + * + * @param segmentPath the folder which are to be moved to the trash folder + * @param trashFolderWithTimestamp trashfolderpath with complete timestamp and segment number + * @return Review comment: remove return, please check for all the methods in class This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this
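The `copyFileToTrashFolder` method quoted above has two notable behaviors: it skips the copy when the destination already exists (making retries idempotent), and on failure it deletes the whole timestamped trash folder rather than leaving a partial copy. A self-contained sketch of that contract using `java.nio.file` in place of `FileFactory` (names and layout are illustrative assumptions):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch of copyFileToTrashFolder's contract: skip the copy when the target
// already exists, and roll back the trash folder when the copy fails.
// java.nio stands in for CarbonData's FileFactory; names are illustrative.
public class TrashMoveSketch {
    public static void copyIfAbsent(Path source, Path trashFolder) throws IOException {
        Path destination = trashFolder.resolve(source.getFileName());
        try {
            Files.createDirectories(trashFolder);
            if (!Files.exists(destination)) {
                Files.copy(source, destination);
            }
        } catch (IOException e) {
            // roll back: the timestamped trash folder holds only this segment,
            // so delete it wholesale rather than leave a partial copy behind
            deleteRecursively(trashFolder);
            throw e;
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // delete children before parents
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("trash-demo");
        Path src = Files.write(dir.resolve("part-0"), "data".getBytes());
        Path trash = dir.resolve("Segment_0");
        copyIfAbsent(src, trash);
        copyIfAbsent(src, trash); // second call is a no-op, not an error
        assert Files.exists(trash.resolve("part-0"));
    }
}
```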
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
akashrn5 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r533599088 ## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.util; + +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.IOException; +import java.util.List; +import java.util.concurrent.TimeUnit; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.mutate.CarbonUpdateUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +import org.apache.hadoop.io.IOUtils; +import org.apache.log4j.Logger; + +/** + * Mantains the trash folder in carbondata. This class has methods to copy data to the trash and + * remove data from the trash. 
+ */ +public final class TrashUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(TrashUtil.class.getName()); + + /** + * Base method to copy the data to the trash folder. + * + * @param sourcePath the path from which to copy the file + * @param destinationPath the path where the file will be copied + * @return + */ + private static void copyToTrashFolder(String sourcePath, String destinationPath) +throws IOException { +DataOutputStream dataOutputStream = null; +DataInputStream dataInputStream = null; +try { + dataOutputStream = FileFactory.getDataOutputStream(destinationPath); + dataInputStream = FileFactory.getDataInputStream(sourcePath); + IOUtils.copyBytes(dataInputStream, dataOutputStream, CarbonCommonConstants.BYTEBUFFER_SIZE); +} catch (IOException exception) { + LOGGER.error("Unable to copy " + sourcePath + " to the trash folder", exception); + throw exception; +} finally { + CarbonUtil.closeStreams(dataInputStream, dataOutputStream); +} + } + + /** + * The below method copies the complete a file to the trash folder. + * + * @param filePathToCopy the files which are to be moved to the trash folder + * @param trashFolderWithTimestamp timestamp, partition folder(if any) and segment number + * @return Review comment: remove return This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
akashrn5 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r533598665 ## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.util; + +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.IOException; +import java.util.List; +import java.util.concurrent.TimeUnit; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.mutate.CarbonUpdateUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +import org.apache.hadoop.io.IOUtils; +import org.apache.log4j.Logger; + +/** + * Mantains the trash folder in carbondata. This class has methods to copy data to the trash and Review comment: correct the spelling This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736708005 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3243/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
akashrn5 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r533596340 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.io.IOException; +import java.util.*; +import java.util.stream.Collectors; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.metadata.SegmentFileStore; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.mutate.CarbonUpdateUtil; +import org.apache.carbondata.core.statusmanager.LoadMetadataDetails; +import org.apache.carbondata.core.statusmanager.SegmentStatus; +import org.apache.carbondata.core.statusmanager.SegmentStatusManager; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil; + +import org.apache.log4j.Logger; + +/** + *This util provide clean stale data methods for clean files command + */ +public class CleanFilesUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CleanFilesUtil.class.getName()); + + /** + * This method will clean all the stale segments for a table, delete the source folder after + * copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegments(CarbonTable carbonTable) + throws IOException { +long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime(); +List<String> staleSegmentFiles = new ArrayList<>(); +List<String> redundantSegmentFile = new ArrayList<>(); +getStaleSegmentFiles(carbonTable, staleSegmentFiles, redundantSegmentFile); +for (String staleSegmentFile : staleSegmentFiles) { + String segmentNumber = DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile); + SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(), + staleSegmentFile); + Map<String, SegmentFileStore.FolderDetails> locationMap = fileStore.getSegmentFile() + .getLocationMap(); + if (locationMap != null) { +if
(locationMap.entrySet().iterator().next().getValue().isRelative()) { + CarbonFile segmentPath = FileFactory.getCarbonFile(CarbonTablePath.getSegmentPath( + carbonTable.getTablePath(), segmentNumber)); + // copy the complete segment to the trash folder + TrashUtil.copySegmentToTrash(segmentPath, TrashUtil.getCompleteTrashFolderPath( + carbonTable.getTablePath(), timeStampForTrashFolder, segmentNumber)); + // Deleting the stale Segment folders and the segment file. + try { +CarbonUtil.deleteFoldersAndFiles(segmentPath); +// delete the segment file as well + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(), +staleSegmentFile)); +for (String duplicateStaleSegmentFile : redundantSegmentFile) { + if (DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile) + .equals(segmentNumber)) { + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable +.getTablePath(), duplicateStaleSegmentFile)); + } +} + } catch (IOException | InterruptedException e) { +LOGGER.error("Unable to delete the segment: " + segmentPath + " from after moving" + +" it to the trash folder. Please delete them manually : " + e.getMessage(), e); + } +} + } +} + } + + /** + * This method will clean all the stale segments for partition table, delete the source folders + * after copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable) + throws IOException { +long
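The loop above keys everything on `DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile)`. A hypothetical stand-in for that helper, assuming segment file names of the form `<segmentNo>_<timestamp>.segment` (an assumption inferred from the surrounding code, not verified against CarbonData's actual naming scheme):

```java
// Hypothetical stand-in for DataFileUtil.getSegmentNoFromSegmentFile,
// assuming names like "2_1606780000000.segment" (segmentNo, underscore,
// timestamp). This is an illustrative assumption, not the real helper.
public class SegmentFileNameSketch {
    public static String segmentNoFrom(String segmentFileName) {
        int underscore = segmentFileName.indexOf('_');
        if (underscore < 0) {
            throw new IllegalArgumentException("unexpected name: " + segmentFileName);
        }
        return segmentFileName.substring(0, underscore);
    }

    public static void main(String[] args) {
        assert segmentNoFrom("2_1606780000000.segment").equals("2");
        assert segmentNoFrom("0_1.segment").equals("0");
    }
}
```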
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
akashrn5 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r533596098 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
akashrn5 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r533595909 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.io.IOException; +import java.util.*; +import java.util.stream.Collectors; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.metadata.SegmentFileStore; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.mutate.CarbonUpdateUtil; +import org.apache.carbondata.core.statusmanager.LoadMetadataDetails; +import org.apache.carbondata.core.statusmanager.SegmentStatus; +import org.apache.carbondata.core.statusmanager.SegmentStatusManager; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil; + +import org.apache.log4j.Logger; + +/** + *This util provide clean stale data methods for clean files command + */ +public class CleanFilesUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CleanFilesUtil.class.getName()); + + /** + * This method will clean all the stale segments for a table, delete the source folder after + * copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegments(CarbonTable carbonTable) + throws IOException { +long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime(); +List staleSegmentFiles = new ArrayList<>(); +List redundantSegmentFile = new ArrayList<>(); +getStaleSegmentFiles(carbonTable, staleSegmentFiles, redundantSegmentFile); +for (String staleSegmentFile : staleSegmentFiles) { + String segmentNumber = DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile); + SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(), + staleSegmentFile); + Map locationMap = fileStore.getSegmentFile() + .getLocationMap(); + if (locationMap != null) { +if 
(locationMap.entrySet().iterator().next().getValue().isRelative()) { + CarbonFile segmentPath = FileFactory.getCarbonFile(CarbonTablePath.getSegmentPath( + carbonTable.getTablePath(), segmentNumber)); + // copy the complete segment to the trash folder + TrashUtil.copySegmentToTrash(segmentPath, TrashUtil.getCompleteTrashFolderPath( + carbonTable.getTablePath(), timeStampForTrashFolder, segmentNumber)); + // Deleting the stale Segment folders and the segment file. + try { +CarbonUtil.deleteFoldersAndFiles(segmentPath); +// delete the segment file as well + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(), +staleSegmentFile)); +for (String duplicateStaleSegmentFile : redundantSegmentFile) { + if (DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile) + .equals(segmentNumber)) { + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable +.getTablePath(), duplicateStaleSegmentFile)); + } +} + } catch (IOException | InterruptedException e) { +LOGGER.error("Unable to delete the segment: " + segmentPath + " from after moving" + +" it to the trash folder. Please delete them manually : " + e.getMessage(), e); + } +} + } +} + } + + /** + * This method will clean all the stale segments for partition table, delete the source folders + * after copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable) + throws IOException { +long
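The review above quotes CleanFilesUtil's stale-segment cleanup, whose core pattern is: copy the whole segment directory into a timestamped trash folder first, and only then delete the original segment folder and its `.segment` file, so data stays recoverable if anything fails mid-way. Below is a standalone sketch of that copy-then-delete pattern using plain `java.nio.file` instead of CarbonData's `FileFactory`/`TrashUtil`/`CarbonTablePath` (those class names come from the PR; the trash-folder layout and all other names here are illustrative assumptions, not the PR's actual implementation):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

/** Illustrative sketch: move a segment directory into a timestamped trash folder, then delete the source. */
public class TrashMoveSketch {

    // Hypothetical layout mirroring the idea of TrashUtil.getCompleteTrashFolderPath in the PR.
    static Path trashFolderPath(Path tablePath, long timestamp, String segmentNo) {
        return tablePath.resolve(".Trash").resolve(Long.toString(timestamp)).resolve("Segment_" + segmentNo);
    }

    // Copy the segment tree into the trash, then delete the original.
    // Copy-before-delete keeps the data recoverable if deletion fails part-way.
    static void moveSegmentToTrash(Path segmentPath, Path trashPath) throws IOException {
        try (Stream<Path> files = Files.walk(segmentPath)) {
            for (Path src : (Iterable<Path>) files::iterator) {
                Path dst = trashPath.resolve(segmentPath.relativize(src).toString());
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dst);
                } else {
                    Files.createDirectories(dst.getParent());
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        // Delete the source only after the copy fully succeeded, deepest paths first.
        try (Stream<Path> files = Files.walk(segmentPath)) {
            files.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("table");
        Path segment = Files.createDirectories(table.resolve("Segment_0"));
        Files.writeString(segment.resolve("part-0.carbondata"), "data");
        Path trash = trashFolderPath(table, System.currentTimeMillis(), "0");
        moveSegmentToTrash(segment, trash);
        System.out.println(Files.exists(segment));                            // false: source removed
        System.out.println(Files.exists(trash.resolve("part-0.carbondata"))); // true: data preserved
    }
}
```

The PR additionally deletes the stale segment's `.segment` metadata file and any redundant segment files sharing the same segment number, which this sketch omits.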
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736705103 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4998/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
akashrn5 commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r533593984 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.io.IOException; +import java.util.*; +import java.util.stream.Collectors; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.metadata.SegmentFileStore; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.mutate.CarbonUpdateUtil; +import org.apache.carbondata.core.statusmanager.LoadMetadataDetails; +import org.apache.carbondata.core.statusmanager.SegmentStatus; +import org.apache.carbondata.core.statusmanager.SegmentStatusManager; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil; + +import org.apache.log4j.Logger; + +/** + *This util provide clean stale data methods for clean files command + */ +public class CleanFilesUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CleanFilesUtil.class.getName()); + + /** + * This method will clean all the stale segments for a table, delete the source folder after + * copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegments(CarbonTable carbonTable) + throws IOException { +long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime(); +List staleSegmentFiles = new ArrayList<>(); +List redundantSegmentFile = new ArrayList<>(); +getStaleSegmentFiles(carbonTable, staleSegmentFiles, redundantSegmentFile); +for (String staleSegmentFile : staleSegmentFiles) { + String segmentNumber = DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile); + SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(), + staleSegmentFile); + Map locationMap = fileStore.getSegmentFile() + .getLocationMap(); + if (locationMap != null) { +if 
(locationMap.entrySet().iterator().next().getValue().isRelative()) { + CarbonFile segmentPath = FileFactory.getCarbonFile(CarbonTablePath.getSegmentPath( + carbonTable.getTablePath(), segmentNumber)); + // copy the complete segment to the trash folder + TrashUtil.copySegmentToTrash(segmentPath, TrashUtil.getCompleteTrashFolderPath( + carbonTable.getTablePath(), timeStampForTrashFolder, segmentNumber)); + // Deleting the stale Segment folders and the segment file. + try { +CarbonUtil.deleteFoldersAndFiles(segmentPath); +// delete the segment file as well + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(), +staleSegmentFile)); +for (String duplicateStaleSegmentFile : redundantSegmentFile) { + if (DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile) + .equals(segmentNumber)) { + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable +.getTablePath(), duplicateStaleSegmentFile)); + } +} + } catch (IOException | InterruptedException e) { +LOGGER.error("Unable to delete the segment: " + segmentPath + " from after moving" + +" it to the trash folder. Please delete them manually : " + e.getMessage(), e); + } +} + } +} + } + + /** + * This method will clean all the stale segments for partition table, delete the source folders + * after copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable) + throws IOException { +long
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4031: Presto UT optimization
CarbonDataQA2 commented on pull request #4031: URL: https://github.com/apache/carbondata/pull/4031#issuecomment-736696049 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3242/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4031: Presto UT optimization
CarbonDataQA2 commented on pull request #4031: URL: https://github.com/apache/carbondata/pull/4031#issuecomment-736695799 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4997/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
nihal0107 commented on a change in pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#discussion_r533571838 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -398,6 +404,28 @@ object SecondaryIndexCreator { secondaryIndexModel.sqlContext.sparkSession, carbonLoadModelForMergeDataFiles.getFactTimeStamp, rebuiltSegments) + +if (isInsertOverwrite) { + var staleSegmentsList = new ListBuffer[String]() + SegmentStatusManager +.readLoadMetadata(indexCarbonTable.getMetadataPath).foreach { loadMetadata => +if (!successSISegments.contains(loadMetadata.getLoadName)) { Review comment: done
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
nihal0107 commented on a change in pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#discussion_r533571712 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -398,6 +404,28 @@ object SecondaryIndexCreator { secondaryIndexModel.sqlContext.sparkSession, carbonLoadModelForMergeDataFiles.getFactTimeStamp, rebuiltSegments) + +if (isInsertOverwrite) { + var staleSegmentsList = new ListBuffer[String]() Review comment: removed
[GitHub] [carbondata] marchpure commented on pull request #4026: [CARBONDATA-4063] Refactor getBlockId and getShortBlockId functions
marchpure commented on pull request #4026: URL: https://github.com/apache/carbondata/pull/4026#issuecomment-736672478 retest this please
[GitHub] [carbondata] ajantha-bhat commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
ajantha-bhat commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736668554 LGTM
[GitHub] [carbondata] QiangCai commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
QiangCai commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736650399 LGTM
[jira] [Resolved] (CARBONDATA-4064) TPCDS queries are failing with NOne.get exception when table has SI configured
[ https://issues.apache.org/jira/browse/CARBONDATA-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-4064. -- Fix Version/s: 2.2.0 Resolution: Fixed > TPCDS queries are failing with NOne.get exception when table has SI configured > -- > > Key: CARBONDATA-4064 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4064 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthu Murugesh >Priority: Minor > Fix For: 2.2.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] asfgit closed pull request #4030: [CARBONDATA-4064] Fix tpcds query failure with SI
asfgit closed pull request #4030: URL: https://github.com/apache/carbondata/pull/4030
[jira] [Resolved] (CARBONDATA-4066) data mismatch observed with SI and without SI when SI global sort and SI segment merge is true
[ https://issues.apache.org/jira/browse/CARBONDATA-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-4066. -- Fix Version/s: 2.2.0 Resolution: Fixed > data mismatch observed with SI and without SI when SI global sort and SI > segment merge is true > -- > > Key: CARBONDATA-4066 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4066 > Project: CarbonData > Issue Type: Bug >Reporter: Mahesh Raju Somalaraju >Priority: Major > Fix For: 2.2.0 > > Time Spent: 50m > Remaining Estimate: 0h > > data mismatch observed with SI and without SI when SI global sort and SI > segment merge is true > > test case for reproduce the issue: > CarbonProperties.getInstance() > .addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "true") > sql("create table complextable2 (id int, name string, country array) > stored as " + > "carbondata tblproperties('sort_scope'='global_sort','sort_columns'='name')") > sql( > s"load data inpath '$resourcesPath/secindex/array.csv' into table > complextable2 options('delimiter'=','," + > > "'quotechar'='\"','fileheader'='id,name,country','complex_delimiter_level_1'='$'," > + > "'global_sort_partitions'='10')") > val result = sql(" select * from complextable2 where > array_contains(country,'china')") > sql("create index index_2 on table complextable2(country) as 'carbondata' > properties" + > "('sort_scope'='global_sort')") > checkAnswer(sql("select count(*) from complextable2 where > array_contains(country,'china')"), > sql("select count(*) from complextable2 where > ni(array_contains(country,'china'))")) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] asfgit closed pull request #4033: [CARBONDATA-4066] data mismatch observed with SI and without SI when SI global sort and SI segment merge is true
asfgit closed pull request #4033: URL: https://github.com/apache/carbondata/pull/4033
[GitHub] [carbondata] ajantha-bhat commented on pull request #4033: [CARBONDATA-4066] data mismatch observed with SI and without SI when SI global sort and SI segment merge is true
ajantha-bhat commented on pull request #4033: URL: https://github.com/apache/carbondata/pull/4033#issuecomment-736644576 LGTM
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736640874 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3240/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736639647 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4995/
[GitHub] [carbondata] akashrn5 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
akashrn5 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736634922 retest this please
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
akashrn5 commented on a change in pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#discussion_r533511617 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -398,6 +404,28 @@ object SecondaryIndexCreator { secondaryIndexModel.sqlContext.sparkSession, carbonLoadModelForMergeDataFiles.getFactTimeStamp, rebuiltSegments) + +if (isInsertOverwrite) { + var staleSegmentsList = new ListBuffer[String]() + SegmentStatusManager +.readLoadMetadata(indexCarbonTable.getMetadataPath).foreach { loadMetadata => +if (!successSISegments.contains(loadMetadata.getLoadName)) { Review comment: you can use `filter` directly instead of creating a new buffer
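The reviewer's suggestion above is to derive the stale-segment list with a single `filter` expression rather than accumulating names into a mutable buffer inside a `foreach`. The diff under review is Scala, but the same refactor can be sketched with Java streams (this is an illustrative sketch; `staleSegments`, the segment names, and the argument types are all hypothetical, not the PR's code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Sketch of the reviewer's suggestion: compute the stale-segment list with a
 * filter instead of a loop appending to a mutable buffer. Names are illustrative.
 */
public class FilterVsBuffer {

    static List<String> staleSegments(String[] allSegments, Set<String> successSegments) {
        // One filter expression replaces the foreach + mutable ListBuffer from the diff.
        return Arrays.stream(allSegments)
                .filter(name -> !successSegments.contains(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> stale = staleSegments(new String[] {"0", "1", "2"}, Set.of("1"));
        System.out.println(stale); // [0, 2]
    }
}
```

Beyond brevity, the filter form avoids shared mutable state, which is why reviewers commonly prefer it in collection-pipeline code.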
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
akashrn5 commented on a change in pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#discussion_r533508633 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -398,6 +404,28 @@ object SecondaryIndexCreator { secondaryIndexModel.sqlContext.sparkSession, carbonLoadModelForMergeDataFiles.getFactTimeStamp, rebuiltSegments) + +if (isInsertOverwrite) { + var staleSegmentsList = new ListBuffer[String]() Review comment: the name `staleSegmentsList` is misleading; better to rename the variable to `overriddenSegments`
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736622862 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4996/
[GitHub] [carbondata] akashrn5 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data
akashrn5 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736621452 @kunal642 handled for CTAS and added test cases, please review and merge
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
CarbonDataQA2 commented on pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#issuecomment-736618751 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4993/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files feature become data trash manager
CarbonDataQA2 commented on pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#issuecomment-736618531 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3239/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files feature become data trash manager
CarbonDataQA2 commented on pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#issuecomment-736618069 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4994/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
CarbonDataQA2 commented on pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#issuecomment-736617364 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3238/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736570532 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3236/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736569211 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4991/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
nihal0107 commented on a change in pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#discussion_r533409267 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -385,7 +392,26 @@ object SecondaryIndexCreator { val rebuiltSegments = SecondaryIndexUtil .mergeDataFilesSISegments(secondaryIndexModel.segmentIdToLoadStartTimeMapping, indexCarbonTable, -loadMetadataDetails.toList.asJava, carbonLoadModelForMergeDataFiles)(sc) +loadMetadataDetail.toList.asJava, carbonLoadModelForMergeDataFiles)(sc) +if (isInsertOverwrite) { + var segmentList = new ListBuffer[String]() + for (loadMetadata <- loadMetadataDetails) { +if (loadMetadata.getSegmentStatus != SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS) { + segmentList += loadMetadata.getLoadName +} + } + if (segmentList.nonEmpty) { Review comment: done
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
nihal0107 commented on a change in pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#discussion_r533409156 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -385,7 +392,26 @@ object SecondaryIndexCreator { val rebuiltSegments = SecondaryIndexUtil .mergeDataFilesSISegments(secondaryIndexModel.segmentIdToLoadStartTimeMapping, indexCarbonTable, -loadMetadataDetails.toList.asJava, carbonLoadModelForMergeDataFiles)(sc) +loadMetadataDetail.toList.asJava, carbonLoadModelForMergeDataFiles)(sc) +if (isInsertOverwrite) { + var segmentList = new ListBuffer[String]() + for (loadMetadata <- loadMetadataDetails) { +if (loadMetadata.getSegmentStatus != SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS) { + segmentList += loadMetadata.getLoadName +} + } + if (segmentList.nonEmpty) { Review comment: done
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
nihal0107 commented on a change in pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#discussion_r533408885 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -371,11 +377,12 @@ object SecondaryIndexCreator { val loadMetadataDetails = SegmentStatusManager .readLoadMetadata(indexCarbonTable.getMetadataPath) - .filter(loadMetadataDetail => successSISegments.contains(loadMetadataDetail.getLoadName)) +val loadMetadataDetail = loadMetadataDetails Review comment: removed as logic changed ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -385,7 +392,26 @@ object SecondaryIndexCreator { val rebuiltSegments = SecondaryIndexUtil .mergeDataFilesSISegments(secondaryIndexModel.segmentIdToLoadStartTimeMapping, indexCarbonTable, -loadMetadataDetails.toList.asJava, carbonLoadModelForMergeDataFiles)(sc) +loadMetadataDetail.toList.asJava, carbonLoadModelForMergeDataFiles)(sc) +if (isInsertOverwrite) { + var segmentList = new ListBuffer[String]() + for (loadMetadata <- loadMetadataDetails) { Review comment: done ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -385,7 +392,26 @@ object SecondaryIndexCreator { val rebuiltSegments = SecondaryIndexUtil .mergeDataFilesSISegments(secondaryIndexModel.segmentIdToLoadStartTimeMapping, indexCarbonTable, -loadMetadataDetails.toList.asJava, carbonLoadModelForMergeDataFiles)(sc) +loadMetadataDetail.toList.asJava, carbonLoadModelForMergeDataFiles)(sc) +if (isInsertOverwrite) { + var segmentList = new ListBuffer[String]() Review comment: done
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
nihal0107 commented on a change in pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#discussion_r533408559 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -299,6 +300,13 @@ object CarbonIndexUtil { .Map((carbonLoadModel.getSegmentId, carbonLoadModel.getFactTimeStamp)) } val header = indexTable.getCreateOrderColumn.asScala.map(_.getColName).toArray +if (isInsertOverWrite) { + val loadMetadataDetails = carbonLoadModel.getLoadMetadataDetails.asScala + for (loadMetadata <- loadMetadataDetails) { Review comment: done ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/SILoadEventListener.scala ## @@ -79,13 +78,16 @@ class SILoadEventListener extends OperationEventListener with Logging { .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(sparkSession).asInstanceOf[CarbonRelation].carbonTable +val isInsertOverwrite = (operationContext.getProperties Review comment: done
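The two review threads above concern collecting the segments that an INSERT OVERWRITE supersedes so the secondary index can invalidate them. A minimal self-contained sketch of that idea, with `LoadDetail` and `segmentsToInvalidate` as hypothetical stand-ins for CarbonData's `LoadMetadataDetails` handling (not the real API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverwriteSegments {
  // Stand-in for LoadMetadataDetails: a segment id plus its load status.
  static class LoadDetail {
    final String loadName;
    final String status;
    LoadDetail(String loadName, String status) { this.loadName = loadName; this.status = status; }
  }

  // On an insert overwrite, every previously successful segment becomes stale.
  static List<String> segmentsToInvalidate(List<LoadDetail> details, boolean isInsertOverwrite) {
    List<String> stale = new ArrayList<>();
    if (!isInsertOverwrite) {
      return stale; // a normal load supersedes nothing
    }
    for (LoadDetail d : details) {
      if ("Success".equals(d.status)) { // collect prior successful loads only
        stale.add(d.loadName);
      }
    }
    return stale;
  }

  public static void main(String[] args) {
    List<LoadDetail> details = Arrays.asList(
        new LoadDetail("0", "Success"),
        new LoadDetail("1", "Success"),
        new LoadDetail("2", "Marked for Delete"));
    System.out.println(segmentsToInvalidate(details, true)); // prints [0, 1]
  }
}
```

The real change additionally threads an `isInsertOverwrite` flag from the operation context through `SILoadEventListener` into `CarbonIndexUtil`, which this sketch does not model.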
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4033: [CARBONDATA-4066] data mismatch observed with SI and without SI when SI global sort and SI segment merge is true
CarbonDataQA2 commented on pull request #4033: URL: https://github.com/apache/carbondata/pull/4033#issuecomment-736539558 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3235/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4033: [CARBONDATA-4066] data mismatch observed with SI and without SI when SI global sort and SI segment merge is true
CarbonDataQA2 commented on pull request #4033: URL: https://github.com/apache/carbondata/pull/4033#issuecomment-736539051 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4990/
[GitHub] [carbondata] QiangCai commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
QiangCai commented on a change in pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#discussion_r533390359 ## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ## @@ -0,0 +1,223 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.io.IOException; +import java.util.*; +import java.util.stream.Collectors; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.metadata.SegmentFileStore; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.mutate.CarbonUpdateUtil; +import org.apache.carbondata.core.statusmanager.LoadMetadataDetails; +import org.apache.carbondata.core.statusmanager.SegmentStatus; +import org.apache.carbondata.core.statusmanager.SegmentStatusManager; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil; + +import javafx.util.Pair; + +import org.apache.log4j.Logger; + +/** + * This util provides clean stale data methods for the clean files command + */ +public class CleanFilesUtil { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CleanFilesUtil.class.getName()); + + /** + * This method will clean all the stale segments for a table, delete the source folder after + * copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegments(CarbonTable carbonTable) +throws IOException { +long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime(); +Pair<List<String>, List<String>> staleSegmentFiles = getStaleSegmentFiles(carbonTable); +for (String staleSegmentFile : staleSegmentFiles.getKey()) { + String segmentNumber = DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile); + SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(), + staleSegmentFile); + Map<String, SegmentFileStore.FolderDetails> locationMap = fileStore.getSegmentFile() + .getLocationMap(); + if (locationMap != null) { +if (locationMap.entrySet().iterator().next().getValue().isRelative()) { +
CarbonFile segmentPath = FileFactory.getCarbonFile(CarbonTablePath.getSegmentPath( + carbonTable.getTablePath(), segmentNumber)); + // copy the complete segment to the trash folder + TrashUtil.copySegmentToTrash(segmentPath, TrashUtil.getCompleteTrashFolderPath( + carbonTable.getTablePath(), timeStampForTrashFolder, segmentNumber)); + // Deleting the stale Segment folders and the segment file. + try { +CarbonUtil.deleteFoldersAndFiles(segmentPath); +// delete the segment file as well + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(), +staleSegmentFile)); +for (String duplicateStaleSegmentFile : staleSegmentFiles.getValue()) { + if (DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile) + .equals(segmentNumber)) { + FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable +.getTablePath(), duplicateStaleSegmentFile)); + } +} + } catch (IOException | InterruptedException e) { +LOGGER.error("Unable to delete the segment: " + segmentPath + " after moving" + +" it to the trash folder. Please delete it manually: " + e.getMessage(), e); + } +} + } +} + } + + /** + * This method will clean all the stale segments for partition table, delete the source folders + * after copying the data to the trash and also remove the .segment files of the stale segments + */ + public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable) +throws IOException { +long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime(); +Pair<List<String>, List<String>>
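The inner loop in `cleanStaleSegments` above matches duplicate stale segment files against the segment currently being moved to the trash, by comparing segment numbers. A minimal self-contained sketch of that matching step, assuming file names of the form `<segmentNo>_<timestamp>.segment` and using hypothetical helpers (`segmentNo`, `duplicatesFor`) in place of CarbonData's `DataFileUtil`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleSegmentMatcher {
  // Extract the segment number from a name like "2_1606800000000.segment".
  static String segmentNo(String segmentFileName) {
    return segmentFileName.substring(0, segmentFileName.indexOf('_'));
  }

  // Return every candidate file that refers to the same segment as staleFile;
  // in the real utility these are deleted alongside the stale segment file.
  static List<String> duplicatesFor(String staleFile, List<String> candidates) {
    List<String> matches = new ArrayList<>();
    for (String c : candidates) {
      if (segmentNo(c).equals(segmentNo(staleFile))) {
        matches.add(c);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<String> dups = Arrays.asList("2_100.segment", "3_200.segment", "2_300.segment");
    System.out.println(duplicatesFor("2_50.segment", dups)); // prints [2_100.segment, 2_300.segment]
  }
}
```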
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736507588 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3234/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736502949 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4989/
[GitHub] [carbondata] Pickupolddriver commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command
Pickupolddriver commented on pull request #4032: URL: https://github.com/apache/carbondata/pull/4032#issuecomment-736463924 > @Pickupolddriver : PR shows 37KLOC code, please rebase and keep only required changes. I saw some codegen file of 24KLOC Yes, much of the code is auto-generated by ANTLR; I will try to remove it and have it generated during the compile process instead.
[GitHub] [carbondata] maheshrajus opened a new pull request #4033: [CARBONDATA-4066] data mismatch observed with SI and without SI when SI global sort and SI segment merge is true
maheshrajus opened a new pull request #4033: URL: https://github.com/apache/carbondata/pull/4033

### Why is this PR needed?
A data mismatch is observed between queries with SI and without SI when SI global sort and SI segment merge are both enabled. After the SI data files are merged, the position reference column is also sorted; rows therefore point at the wrong position references, causing the mismatch between results with and without SI.

### What changes were proposed in this PR?
Do not recalculate position references after the data files merge; use the existing position reference column from the SI table.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes
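The failure mode described in this PR can be illustrated with a toy model (not CarbonData code): each SI row carries a position reference back to a main-table row, and sorting the references independently of the index values silently re-pairs them. `SiRow`, `mergeKeepingRefs`, and `buggyResort` are hypothetical names for this sketch:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PositionRefDemo {
  // One SI row: an index value plus a pointer back at a main-table row.
  static class SiRow {
    final String value;
    final int positionRef;
    SiRow(String value, int positionRef) { this.value = value; this.positionRef = positionRef; }
  }

  // Correct merge: sort rows as whole units, so each value keeps the
  // position reference it already had.
  static List<SiRow> mergeKeepingRefs(List<SiRow> rows) {
    List<SiRow> out = new ArrayList<>(rows);
    out.sort(Comparator.comparing(r -> r.value));
    return out;
  }

  // Buggy variant: sort values and position references separately and
  // re-pair them, re-pointing rows at the wrong main-table rows.
  static List<SiRow> buggyResort(List<SiRow> rows) {
    List<String> values = new ArrayList<>();
    List<Integer> refs = new ArrayList<>();
    for (SiRow r : rows) { values.add(r.value); refs.add(r.positionRef); }
    values.sort(null);
    refs.sort(null);
    List<SiRow> out = new ArrayList<>();
    for (int i = 0; i < values.size(); i++) {
      out.add(new SiRow(values.get(i), refs.get(i)));
    }
    return out;
  }
}
```

With rows ("b",2), ("a",0), ("c",1), the correct merge keeps "b" pointing at main-table row 2, while the buggy variant re-points "b" at row 1, which is the kind of wrong lookup the PR fixes by reusing the existing position reference column.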