[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-01 Thread GitBox


shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533962334



##
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonRangeList expression processor. It parses the InPolygonRangeList
+ * string into lists of ID ranges, combines them with the given and/or/diff
+ * operation into a single range list to filter on, and then builds an
+ * InExpression with all the IDs present in that list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonRangeListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private String polygonRangeList;
+
+  private String opType;
+
+  private List<Long[]> ranges = new ArrayList<>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+  new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+  new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonRangeListExpression(String polygonRangeList, String opType, String columnName) {
+    this.polygonRangeList = polygonRangeList;
+    this.opType = opType;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the range list string
+      List<String> rangeLists = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonRangeList);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        rangeLists.add(matchedStr);
+      }
+      // 2. process the range lists
+      if (rangeLists.size() > 0) {
+        List<Long[]> processedRangeList = getRangeListFromString(rangeLists.get(0));
+        for (int i = 1; i < rangeLists.size(); i++) {
+          List<Long[]> tempRangeList = getRangeListFromString(rangeLists.get(i));
+          processedRangeList = GeoHashUtils.processRangeList(
+              processedRangeList, tempRangeList, opType);
+        }
+        ranges = processedRangeList;
+        GeoHashUtils.validateRangeList(ranges);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  private void sortRange(List<Long[]> rangeList) {
+    rangeList.sort(new Comparator<Long[]>() {
+      @Override
+      public int compare(Long[] x, Long[] y) {
+        return Long.compare(x[0], y[0]);
+      }
+    });
+  }
+
+  private void combineRange(List<Long[]> rangeList) {
+    if (rangeList.size() > 1) {
+      for (int i = 0, j = i + 1; i < rangeList.size() - 1; i++, j++) {
+        long previousEnd = rangeList.get(i)[1];
+        long nextStart = rangeList.get(j)[0];
+        if (previousEnd
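The truncated `combineRange` above follows a sort-then-merge pattern: sort ranges by start value, then fold neighbors whose bounds overlap or touch into one range. A self-contained sketch of that step (the `RangeCombiner` class and its merge-on-touch behavior are illustrative assumptions, not the PR's exact code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RangeCombiner {
  // Sort [start, end] ranges by start, then merge neighbors that overlap or
  // touch, mirroring the sortRange/combineRange flow quoted above.
  public static List<long[]> combine(List<long[]> ranges) {
    ranges.sort(Comparator.comparingLong((long[] r) -> r[0]));
    List<long[]> merged = new ArrayList<>();
    for (long[] range : ranges) {
      if (!merged.isEmpty() && merged.get(merged.size() - 1)[1] + 1 >= range[0]) {
        long[] last = merged.get(merged.size() - 1);
        last[1] = Math.max(last[1], range[1]);  // extend the previous range
      } else {
        merged.add(new long[] {range[0], range[1]});
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<long[]> in = new ArrayList<>();
    in.add(new long[] {40, 50});
    in.add(new long[] {10, 20});
    in.add(new long[] {21, 30});  // touches [10, 20], so the two merge
    for (long[] r : combine(in)) {
      System.out.println(r[0] + "-" + r[1]);  // prints 10-30, then 40-50
    }
  }
}
```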

[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-01 Thread GitBox


shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533961464



##
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonListExpression.java
##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.core.util.CustomIndex;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashIndex;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonList expression processor. It passes each polygon from the
+ * InPolygonList string to the Geo implementation's query method, gets a list
+ * of ID ranges from each polygon, and combines them with the given and/or/diff
+ * operation into the range list to filter on. It then builds an InExpression
+ * with all the IDs present in that list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private String polygonListString;
+
+  private String opType;
+
+  private GeoHashIndex instance;
+
+  private List<Long[]> ranges = new ArrayList<>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+  new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+  new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonListExpression(String polygonListString, String opType, String columnName,
+      CustomIndex indexInstance) {
+    this.polygonListString = polygonListString;
+    this.opType = opType;
+    this.instance = (GeoHashIndex) indexInstance;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the polygon list string
+      List<String> polygons = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.POLYGON_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonListString);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        polygons.add(matchedStr);
+      }
+      if (polygons.size() < 2) {
+        throw new RuntimeException("polygon list needs at least 2 polygons, actually has " +
+            polygons.size());
+      }
+      // 2. get the range list of each polygon
+      List<Long[]> processedRangeList = instance.query(polygons.get(0));
+      for (int i = 1; i < polygons.size(); i++) {
+        List<Long[]> tempRangeList = instance.query(polygons.get(i));
+        processedRangeList = GeoHashUtils.processRangeList(
+            processedRangeList, tempRangeList, opType);
+      }
+      ranges = processedRangeList;
+      GeoHashUtils.validateRangeList(ranges);

Review comment:
   Done. Have checked all usages of `instance.query`.
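`GeoHashUtils.processRangeList` combines the per-polygon range lists according to `opType`. As a rough illustration of what an AND (intersection) combination does over sorted, non-overlapping range lists, here is a two-pointer sketch; the `RangeListOps` class and method name are illustrative, not the actual utility:

```java
import java.util.ArrayList;
import java.util.List;

public class RangeListOps {
  // Intersect two sorted, non-overlapping [start, end] range lists with a
  // two-pointer sweep; a sketch of an AND combination of GeoID ranges.
  public static List<long[]> intersect(List<long[]> a, List<long[]> b) {
    List<long[]> result = new ArrayList<>();
    int i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
      long start = Math.max(a.get(i)[0], b.get(j)[0]);
      long end = Math.min(a.get(i)[1], b.get(j)[1]);
      if (start <= end) {
        result.add(new long[] {start, end});  // the part covered by both lists
      }
      // advance whichever current range ends first
      if (a.get(i)[1] < b.get(j)[1]) {
        i++;
      } else {
        j++;
      }
    }
    return result;
  }
}
```

An OR (union) combination would instead concatenate both lists and merge overlapping neighbors after a sort.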





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-01 Thread GitBox


shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533960629



##
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonListExpression.java
##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.core.util.CustomIndex;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashIndex;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonList expression processor. It passes each polygon from the
+ * InPolygonList string to the Geo implementation's query method, gets a list
+ * of ID ranges from each polygon, and combines them with the given and/or/diff
+ * operation into the range list to filter on. It then builds an InExpression
+ * with all the IDs present in that list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonListExpression extends UnknownExpression
+    implements ConditionalExpression {

Review comment:
   Done

##
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import 

[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-01 Thread GitBox


shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533960531



##
File path: geo/src/main/java/org/apache/carbondata/geo/GeoOperationType.java
##
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo;
+
+public enum GeoOperationType {
+  OR("OR"),
+  AND("AND");
+
+  private String type;
+
+  GeoOperationType(String type) {

Review comment:
   Done
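For context, the quoted `GeoOperationType` enum stores the operation name as a string; the "Done" above presumably addressed feedback on the constructor or an accompanying lookup. A hedged sketch of how such an enum with a string lookup typically looks (the `getEnum` helper is an assumption for illustration, not necessarily the PR's code):

```java
public enum GeoOperationType {
  OR("OR"),
  AND("AND");

  private final String type;

  GeoOperationType(String type) {
    this.type = type;
  }

  public String getType() {
    return type;
  }

  // Case-insensitive lookup; returns null for unknown operation strings.
  public static GeoOperationType getEnum(String value) {
    for (GeoOperationType op : values()) {
      if (op.type.equalsIgnoreCase(value)) {
        return op;
      }
    }
    return null;
  }
}
```

Callers can then validate a user-supplied operation string once, up front, instead of comparing raw strings throughout the filter code.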









[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-01 Thread GitBox


shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533959413



##
File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java
##
@@ -0,0 +1,411 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+public class GeoHashUtils {
+
+  /**
+   * Get the degree of each grid in the east-west direction.
+   *
+   * @param originLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return Delta X is the degree of each grid in the east-west direction
+   */
+  public static double getDeltaX(double originLatitude, int gridSize) {
+    double mCos = Math.cos(originLatitude * Math.PI / GeoConstants.CONVERT_FACTOR);
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS * mCos);
+  }
+
+  /**
+   * Get the degree of each grid in the north-south direction.
+   *
+   * @param gridSize the grid size
+   * @return Delta Y is the degree of each grid in the north-south direction
+   */
+  public static double getDeltaY(int gridSize) {
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS);
+  }
+
+  /**
+   * Calculate the cut count, i.e. how many times the space is bisected to reach the grid size
+   *
+   * @param gridSize the grid size
+   * @param originLatitude the origin point latitude
+   * @return the cut count
+   */
+  public static int getCutCount(int gridSize, double originLatitude) {
+    double deltaX = getDeltaX(originLatitude, gridSize);
+    int countX = Double.valueOf(
+        Math.ceil(Math.log(2 * GeoConstants.CONVERT_FACTOR / deltaX) / Math.log(2))).intValue();
+    double deltaY = getDeltaY(gridSize);
+    int countY = Double.valueOf(
+        Math.ceil(Math.log(GeoConstants.CONVERT_FACTOR / deltaY) / Math.log(2))).intValue();
+    return Math.max(countX, countY);
+  }
+
+  /**
+   * Convert input longitude and latitude to GeoID
+   *
+   * @param longitude Longitude; the actual longitude is multiplied by a coefficient so that
+   *  the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude; the actual latitude is multiplied by a coefficient so that
+   *  the floating-point calculation is converted to integer calculation
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return GeoID
+   */
+  public static long lonLat2GeoID(long longitude, long latitude, double oriLatitude, int gridSize) {
+    long longitudeByRatio = longitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    long latitudeByRatio = latitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    int[] ij = lonLat2ColRow(longitudeByRatio, latitudeByRatio, oriLatitude, gridSize);
+    return colRow2GeoID(ij[0], ij[1]);
+  }
+
+  /**
+   * Calculate the grid coordinates (row and column) from longitude and latitude; the grid
+   * coordinates can then be transformed into a geo ID
+   *
+   * @param longitude Longitude; the actual longitude is multiplied by a coefficient so that
+   * the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude; the actual latitude is multiplied by a coefficient so that
+   * the floating-point calculation is converted to integer calculation
+   * @param oriLatitude the latitude of the origin point, which is used to calculate the deltaX
+   * and the cut level
+   * @param gridSize the size of the minimal grid after cutting
+   * @return Grid ID value [row, column], column starts from 1
+   */
+  public static int[] lonLat2ColRow(long longitude, long latitude, double oriLatitude,
+      int gridSize) {
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    int column = (int) Math.floor(longitude / getDeltaX(oriLatitude, gridSize) /
+        GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1));
+    int row = (int) Math.floor(latitude / getDeltaY(gridSize) /
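The grid math above can be exercised standalone. The sketch below mirrors `getDeltaX`/`getDeltaY`/`getCutCount` with assumed constant values; `CONVERT_FACTOR` = 180 degrees and an Earth radius in meters are guesses for `GeoConstants`, not values taken from the PR:

```java
public class GeoGridMath {
  // Assumed stand-ins for GeoConstants (illustrative, not copied from the PR):
  static final double CONVERT_FACTOR = 180.0;   // degrees in a half circle
  static final double EARTH_RADIUS = 6371004.0; // mean Earth radius in meters

  // Degree span of one grid cell east-west at the given latitude (cf. getDeltaX).
  static double deltaX(double originLatitude, int gridSize) {
    double mCos = Math.cos(originLatitude * Math.PI / CONVERT_FACTOR);
    return (CONVERT_FACTOR * gridSize) / (Math.PI * EARTH_RADIUS * mCos);
  }

  // Degree span of one grid cell north-south (cf. getDeltaY).
  static double deltaY(int gridSize) {
    return (CONVERT_FACTOR * gridSize) / (Math.PI * EARTH_RADIUS);
  }

  // Number of bisections so every cell is at most gridSize meters (cf. getCutCount).
  static int cutCount(int gridSize, double originLatitude) {
    int countX = (int) Math.ceil(
        Math.log(2 * CONVERT_FACTOR / deltaX(originLatitude, gridSize)) / Math.log(2));
    int countY = (int) Math.ceil(
        Math.log(CONVERT_FACTOR / deltaY(gridSize)) / Math.log(2));
    return Math.max(countX, countY);
  }

  public static void main(String[] args) {
    // Smaller grids need more cuts; cells widen (in degrees) away from the equator.
    System.out.println("cutCount(50m)  = " + cutCount(50, 19.83));
    System.out.println("cutCount(500m) = " + cutCount(500, 19.83));
  }
}
```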

[GitHub] [carbondata] ajantha-bhat opened a new pull request #4034: [WIP] support prestosql 333

2020-12-01 Thread GitBox


ajantha-bhat opened a new pull request #4034:
URL: https://github.com/apache/carbondata/pull/4034


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-737051141


   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3257/
   







[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-12-01 Thread GitBox


Zhangshunyu commented on a change in pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#discussion_r533948673



##
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/MajorCompactionIgnoreInMinorTest.scala
##
@@ -186,6 +187,134 @@ class MajorCompactionIgnoreInMinorTest extends QueryTest with BeforeAndAfterAll
 
   }
 
+  def generateData(numOrders: Int = 10): DataFrame = {
+    import sqlContext.implicits._
+    sqlContext.sparkContext.parallelize(1 to numOrders, 4)
+      .map { x => ("country" + x, x, "07/23/2015", "name" + x, "phonetype" + x % 10,
+        "serialname" + x, x + 1)
+      }.toDF("country", "ID", "date", "name", "phonetype", "serialname", "salary")
+  }
+
+  test("test skip segment whose data size exceed threshold in minor compaction " +
+    "in system level control and table level") {
+    CarbonProperties.getInstance().addProperty("carbon.compaction.level.threshold", "2,2")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "mm/dd/")
+    // set threshold to 1MB in system level
+    CarbonProperties.getInstance().addProperty("carbon.minor.compaction.size", "1")
+
+    sql("drop table if exists minor_threshold")
+    sql("drop table if exists tmp")
+    sql(
+      "CREATE TABLE IF NOT EXISTS minor_threshold (country String, ID Int, date" +
+        " Timestamp, name String, phonetype String, serialname String, salary Int) " +
+        "STORED AS carbondata"
+    )
+    sql(
+      "CREATE TABLE IF NOT EXISTS tmp (country String, ID Int, date Timestamp," +
+        " name String, phonetype String, serialname String, salary Int) STORED AS carbondata"
+    )
+    val initframe = generateData(10)
+    initframe.write
+      .format("carbondata")
+      .option("tablename", "tmp")
+      .mode(SaveMode.Overwrite)
+      .save()
+    // load 3 segments
+    for (i <- 0 to 2) {
+      sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold" +
+        " OPTIONS ('DELIMITER'= ',', 'QUOTECHAR'= '\"')"
+      )
+    }
+    // insert a new segment (id is 3) whose data size exceeds 1 MB
+    sql("insert into minor_threshold select * from tmp")
+    // load another 3 segments
+    for (i <- 0 to 2) {
+      sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold" +
+        " OPTIONS ('DELIMITER'= ',', 'QUOTECHAR'= '\"')"
+      )
+    }
+    // do minor compaction
+    sql("alter table minor_threshold compact 'minor'")
+    // check that segment 3, whose size exceeds the limit, is not compacted and stays SUCCESS
+    val carbonTable = CarbonMetadata.getInstance().getCarbonTable(
+      CarbonCommonConstants.DATABASE_DEFAULT_NAME, "minor_threshold")
+    val carbonTablePath = carbonTable.getMetadataPath
+    val segments = SegmentStatusManager.readLoadMetadata(carbonTablePath)
+    assertResult(SegmentStatus.SUCCESS)(segments(3).getSegmentStatus)
+    assertResult(100030)(sql("select count(*) from minor_threshold").collect().head.get(0))
+
+    // change the threshold to 5MB by dynamic table properties setting; then the segment
+    // whose id is 3 should be included in minor compaction
+    sql("alter table minor_threshold set TBLPROPERTIES('minor_compaction_size'='5')")
+    // reload some segments
+    for (i <- 0 to 2) {
+      sql("insert into minor_threshold select * from tmp")
+    }
+    // do minor compaction
+    sql("alter table minor_threshold compact 'minor'")
+    // check that segment 3, whose size does not exceed the new threshold, is compacted now
+    val segments2 = SegmentStatusManager.readLoadMetadata(carbonTablePath)
+    assertResult(SegmentStatus.COMPACTED)(segments2(3).getSegmentStatus)
+    assertResult(400030)(sql("select count(*) from minor_threshold").collect().head.get(0))
+
+    // reset to -1
+    CarbonProperties.getInstance().addProperty("carbon.minor.compaction.size", "-1")

Review comment:
   @akashrn5 Oh, I see. Now handled, pls check.









[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-737047075


   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5013/
   







[GitHub] [carbondata] akashrn5 commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#discussion_r533944878



##
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/MajorCompactionIgnoreInMinorTest.scala
##
@@ -186,6 +187,134 @@ class MajorCompactionIgnoreInMinorTest extends QueryTest with BeforeAndAfterAll
 
   }
 
+  def generateData(numOrders: Int = 10): DataFrame = {
+    import sqlContext.implicits._
+    sqlContext.sparkContext.parallelize(1 to numOrders, 4)
+      .map { x => ("country" + x, x, "07/23/2015", "name" + x, "phonetype" + x % 10,
+        "serialname" + x, x + 1)
+      }.toDF("country", "ID", "date", "name", "phonetype", "serialname", "salary")
+  }
+
+  test("test skip segment whose data size exceed threshold in minor compaction " +
+    "in system level control and table level") {
+    CarbonProperties.getInstance().addProperty("carbon.compaction.level.threshold", "2,2")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "mm/dd/")
+    // set threshold to 1MB in system level
+    CarbonProperties.getInstance().addProperty("carbon.minor.compaction.size", "1")
+
+    sql("drop table if exists minor_threshold")
+    sql("drop table if exists tmp")
+    sql(
+      "CREATE TABLE IF NOT EXISTS minor_threshold (country String, ID Int, date" +
+        " Timestamp, name String, phonetype String, serialname String, salary Int) " +
+        "STORED AS carbondata"
+    )
+    sql(
+      "CREATE TABLE IF NOT EXISTS tmp (country String, ID Int, date Timestamp," +
+        " name String, phonetype String, serialname String, salary Int) STORED AS carbondata"
+    )
+    val initframe = generateData(10)
+    initframe.write
+      .format("carbondata")
+      .option("tablename", "tmp")
+      .mode(SaveMode.Overwrite)
+      .save()
+    // load 3 segments
+    for (i <- 0 to 2) {
+      sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold" +
+        " OPTIONS ('DELIMITER'= ',', 'QUOTECHAR'= '\"')"
+      )
+    }
+    // insert a new segment (id is 3) whose data size exceeds 1 MB
+    sql("insert into minor_threshold select * from tmp")
+    // load another 3 segments
+    for (i <- 0 to 2) {
+      sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold" +
+        " OPTIONS ('DELIMITER'= ',', 'QUOTECHAR'= '\"')"
+      )
+    }
+    // do minor compaction
+    sql("alter table minor_threshold compact 'minor'")
+    // check that segment 3, whose size exceeds the limit, is not compacted and stays SUCCESS
+    val carbonTable = CarbonMetadata.getInstance().getCarbonTable(
+      CarbonCommonConstants.DATABASE_DEFAULT_NAME, "minor_threshold")
+    val carbonTablePath = carbonTable.getMetadataPath
+    val segments = SegmentStatusManager.readLoadMetadata(carbonTablePath)
+    assertResult(SegmentStatus.SUCCESS)(segments(3).getSegmentStatus)
+    assertResult(100030)(sql("select count(*) from minor_threshold").collect().head.get(0))
+
+    // change the threshold to 5MB by dynamic table properties setting; then the segment
+    // whose id is 3 should be included in minor compaction
+    sql("alter table minor_threshold set TBLPROPERTIES('minor_compaction_size'='5')")
+    // reload some segments
+    for (i <- 0 to 2) {
+      sql("insert into minor_threshold select * from tmp")
+    }
+    // do minor compaction
+    sql("alter table minor_threshold compact 'minor'")
+    // check that segment 3, whose size does not exceed the new threshold, is compacted now
+    val segments2 = SegmentStatusManager.readLoadMetadata(carbonTablePath)
+    assertResult(SegmentStatus.COMPACTED)(segments2(3).getSegmentStatus)
+    assertResult(400030)(sql("select count(*) from minor_threshold").collect().head.get(0))
+
+    // reset to -1
+    CarbonProperties.getInstance().addProperty("carbon.minor.compaction.size", "-1")

Review comment:
   Not only this one: reset all the properties you have set, like the timestamp format and the compaction threshold. Please check the test case below as well; the same applies there.
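The review point is that every property a test sets should be restored afterwards. One common way to enforce this is a snapshot-and-restore helper wrapped around the test body in a try/finally; here is a generic, self-contained Java sketch of that pattern (the `PROPS` map is a stand-in for the `CarbonProperties` singleton, not the real API):

```java
import java.util.HashMap;
import java.util.Map;

public class PropertyResetDemo {
  // Stand-in for a global properties singleton such as CarbonProperties.
  static final Map<String, String> PROPS = new HashMap<>();

  static void runWithProperties(Map<String, String> overrides, Runnable test) {
    Map<String, String> saved = new HashMap<>();
    for (String key : overrides.keySet()) {
      saved.put(key, PROPS.get(key));  // remember the old value (may be null)
    }
    PROPS.putAll(overrides);
    try {
      test.run();
    } finally {
      // restore every overridden key, removing keys that were previously unset
      for (Map.Entry<String, String> e : saved.entrySet()) {
        if (e.getValue() == null) {
          PROPS.remove(e.getKey());
        } else {
          PROPS.put(e.getKey(), e.getValue());
        }
      }
    }
  }

  public static void main(String[] args) {
    PROPS.put("carbon.minor.compaction.size", "-1");
    Map<String, String> overrides = new HashMap<>();
    overrides.put("carbon.minor.compaction.size", "1");
    runWithProperties(overrides, () ->
        System.out.println("during test: " + PROPS.get("carbon.minor.compaction.size")));
    System.out.println("after test: " + PROPS.get("carbon.minor.compaction.size"));
    // prints "during test: 1" then "after test: -1"
  }
}
```

The finally block guarantees the restore runs even when an assertion inside the test fails, so one failing test cannot leak its settings into later tests.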





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists

2020-12-01 Thread GitBox


nihal0107 commented on pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#issuecomment-737030967


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#issuecomment-737029820


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5010/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4023: [CARBONDATA-4059] added testcase for custom compaction, compression and range column for SI table

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4023:
URL: https://github.com/apache/carbondata/pull/4023#issuecomment-737029308


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3256/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-4052) Select query on SI table after insert overwrite is giving wrong result.

2020-12-01 Thread Akash R Nilugal (Jira)


 [ https://issues.apache.org/jira/browse/CARBONDATA-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash R Nilugal resolved CARBONDATA-4052.
-
Fix Version/s: 2.2.0
   Resolution: Fixed

> Select query on SI table after insert overwrite is giving wrong result.
> ---
>
> Key: CARBONDATA-4052
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4052
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> # Create carbon table.
>  # Create SI table on the same carbon table.
>  # Do load or insert operation.
>  # Run query insert overwrite on maintable.
> # Now the select query on the SI table shows old as well as new data, whereas it should show only the new data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4023: [CARBONDATA-4059] added testcase for custom compaction, compression and range column for SI table

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4023:
URL: https://github.com/apache/carbondata/pull/4023#issuecomment-737028313


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5012/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#issuecomment-737027155


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3255/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] asfgit closed pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


asfgit closed pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#issuecomment-737021905


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3253/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#issuecomment-737021680


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5008/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


akashrn5 commented on pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#issuecomment-737001258


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 closed pull request #4007: [WIP]: Clean files behaviour change

2020-12-01 Thread GitBox


vikramahuja1001 closed pull request #4007:
URL: https://github.com/apache/carbondata/pull/4007


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on pull request #4027: [WIP]added compression and range column based FT for SI

2020-12-01 Thread GitBox


nihal0107 commented on pull request #4027:
URL: https://github.com/apache/carbondata/pull/4027#issuecomment-736993272


   Added in #4023



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 closed pull request #4027: [WIP]added compression and range column based FT for SI

2020-12-01 Thread GitBox


nihal0107 closed pull request #4027:
URL: https://github.com/apache/carbondata/pull/4027


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists

2020-12-01 Thread GitBox


Indhumathi27 commented on pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#issuecomment-736988368


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


nihal0107 commented on a change in pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#discussion_r533892078



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -398,6 +404,22 @@ object SecondaryIndexCreator {
           secondaryIndexModel.sqlContext.sparkSession,
           carbonLoadModelForMergeDataFiles.getFactTimeStamp,
           rebuiltSegments)
+
+        if (isInsertOverwrite) {
+          FileInternalUtil
+            .updateTableStatus(
+              SegmentStatusManager
+                .readLoadMetadata(indexCarbonTable.getMetadataPath)
+                .filter(loadMetadata => !successSISegments.contains(loadMetadata.getLoadName))
+                .map(_.getLoadName).toList,
+              secondaryIndexModel.carbonLoadModel.getDatabaseName,

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#issuecomment-736961603


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3252/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#issuecomment-736961311


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5007/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] asfgit closed pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


asfgit closed pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Zhangshunyu commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-12-01 Thread GitBox


Zhangshunyu commented on pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#issuecomment-736928238


   @akashrn5 handled



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-736810208


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5006/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-736807198


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3251/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736790887


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3250/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736789397


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5005/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736788185


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3249/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736787550


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5004/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#issuecomment-736748233


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3248/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4025:
URL: https://github.com/apache/carbondata/pull/4025#issuecomment-736747782


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3247/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#issuecomment-736744974


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5003/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4025:
URL: https://github.com/apache/carbondata/pull/4025#issuecomment-736744592


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5002/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [CARBONDATA-4063] Refactor getBlockId and getShortBlockId functions

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4026:
URL: https://github.com/apache/carbondata/pull/4026#issuecomment-736742888


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3245/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


akashrn5 commented on pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736739075


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


vikramahuja1001 commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r533628020



##
File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java
##
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.*;
+import java.util.stream.Collectors;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.SegmentFileStore;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.statusmanager.LoadMetadataDetails;
+import org.apache.carbondata.core.statusmanager.SegmentStatus;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil;
+
+import org.apache.log4j.Logger;
+
+/**
+ * This util provides clean stale data methods for the clean files command
+ */
+public class CleanFilesUtil {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CleanFilesUtil.class.getName());
+
+  /**
+   * This method will clean all the stale segments for a table, delete the source folder after
+   * copying the data to the trash and also remove the .segment files of the stale segments
+   */
+  public static void cleanStaleSegments(CarbonTable carbonTable)
+      throws IOException {
+    long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime();
+    List<String> staleSegmentFiles = new ArrayList<>();
+    List<String> redundantSegmentFile = new ArrayList<>();
+    getStaleSegmentFiles(carbonTable, staleSegmentFiles, redundantSegmentFile);
+    for (String staleSegmentFile : staleSegmentFiles) {
+      String segmentNumber = DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile);
+      SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(),
+          staleSegmentFile);
+      Map<String, SegmentFileStore.FolderDetails> locationMap = fileStore.getSegmentFile()
+          .getLocationMap();
+      if (locationMap != null) {
+        if (locationMap.entrySet().iterator().next().getValue().isRelative()) {
+          CarbonFile segmentPath = FileFactory.getCarbonFile(CarbonTablePath.getSegmentPath(
+              carbonTable.getTablePath(), segmentNumber));
+          // copy the complete segment to the trash folder
+          TrashUtil.copySegmentToTrash(segmentPath, TrashUtil.getCompleteTrashFolderPath(
+              carbonTable.getTablePath(), timeStampForTrashFolder, segmentNumber));
+          // Deleting the stale Segment folders and the segment file.
+          try {
+            CarbonUtil.deleteFoldersAndFiles(segmentPath);
+            // delete the segment file as well
+            FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(),
+                staleSegmentFile));
+            for (String duplicateStaleSegmentFile : redundantSegmentFile) {
+              if (DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile)
+                  .equals(segmentNumber)) {
+                FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable
+                    .getTablePath(), duplicateStaleSegmentFile));
+              }
+            }
+          } catch (IOException | InterruptedException e) {
+            LOGGER.error("Unable to delete the segment: " + segmentPath + " after moving" +
+                " it to the trash folder. Please delete them manually : " + e.getMessage(), e);
+          }
+        }
+      }
+    }
+  }
+
+  /**
+   * This method will clean all the stale segments for partition table, delete the source folders
+   * after copying the data to the trash and also remove the .segment files of the stale segments
+   */
+  public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable)
+      throws IOException {
+    long
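The copy-to-trash-then-delete flow of `cleanStaleSegments` above can be sketched with plain `java.nio.file` calls. This is an illustrative, non-recursive version assuming a flat segment directory; the class and method names are invented, and the real code goes through `FileFactory`/`TrashUtil`:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TrashMoveSketch {
  static void moveToTrash(Path segmentDir, Path trashRoot, long timestamp) throws IOException {
    // trash layout: <trashRoot>/<timestamp>/<segment name>/...
    Path target = trashRoot.resolve(Long.toString(timestamp))
        .resolve(segmentDir.getFileName());
    Files.createDirectories(target);
    // copy every file of the segment into the trash folder first
    try (DirectoryStream<Path> files = Files.newDirectoryStream(segmentDir)) {
      for (Path file : files) {
        Files.copy(file, target.resolve(file.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
      }
    }
    // delete the source only after the copy completed without throwing
    try (DirectoryStream<Path> files = Files.newDirectoryStream(segmentDir)) {
      for (Path file : files) {
        Files.delete(file);
      }
    }
    Files.delete(segmentDir);
  }
}
```

Copying before deleting is what makes the stale data recoverable: if the copy fails, the source is left untouched.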

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-736734171


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3246/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4026: [CARBONDATA-4063] Refactor getBlockId and getShortBlockId functions

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4026:
URL: https://github.com/apache/carbondata/pull/4026#issuecomment-736733971


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5000/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4007: [WIP]: Clean files behaviour change

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4007:
URL: https://github.com/apache/carbondata/pull/4007#issuecomment-736733142


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3244/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-01 Thread GitBox


akashrn5 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736731895


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4007: [WIP]: Clean files behaviour change

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4007:
URL: https://github.com/apache/carbondata/pull/4007#issuecomment-736731831


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4999/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-736730411


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5001/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#discussion_r533614590



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -398,6 +404,22 @@ object SecondaryIndexCreator {
           secondaryIndexModel.sqlContext.sparkSession,
           carbonLoadModelForMergeDataFiles.getFactTimeStamp,
           rebuiltSegments)
+
+        if (isInsertOverwrite) {
+          FileInternalUtil
+            .updateTableStatus(
+              SegmentStatusManager
+                .readLoadMetadata(indexCarbonTable.getMetadataPath)
+                .filter(loadMetadata => !successSISegments.contains(loadMetadata.getLoadName))
+                .map(_.getLoadName).toList,
+              secondaryIndexModel.carbonLoadModel.getDatabaseName,

Review comment:
   Please do not put the complete filter logic inside a method parameter; assign it to a variable first, for clean code practice.
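   A minimal before/after illustration of this suggestion, with invented names (`report`, `successSegments`, `staleSegments`):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ExtractVariableSketch {
  static String report(List<String> segments) {
    return String.join(",", segments);
  }

  public static void main(String[] args) {
    List<String> all = Arrays.asList("0", "1", "2", "3");
    List<String> successSegments = Arrays.asList("0", "1");

    // discouraged: the whole filter chain is buried inside the call
    String inline = report(all.stream()
        .filter(s -> !successSegments.contains(s))
        .collect(Collectors.toList()));

    // preferred: name the intermediate result, then pass it
    List<String> staleSegments = all.stream()
        .filter(s -> !successSegments.contains(s))
        .collect(Collectors.toList());
    String named = report(staleSegments);

    System.out.println(named); // prints "2,3"
  }
}
```

   The named variable documents what the filtered list means and keeps the call site to a single readable line.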





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#discussion_r533604921



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##
@@ -998,6 +998,34 @@ public long getMajorCompactionSize() {
 return compactionSize;
   }
 
+  /**
+   * returns minor compaction size value from carbon properties or -1 if it is not valid or
+   * not configured
+   *
+   * @return compactionSize
+   */
+  public long getMinorCompactionSize() {
+    long compactionSize = -1;
+    // if not configured, just use default -1
+    if (null != getProperty(CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE)) {
+      try {
+        compactionSize = Long.parseLong(getProperty(
+            CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE));
+      } catch (NumberFormatException e) {
+        LOGGER.warn("Invalid value is configured for property "
+            + CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE + " considering the default"
+            + " value -1 and not considering segment Size during minor compaction.");
+      }
+      if (compactionSize <= 0) {
+        LOGGER.warn("Invalid value is configured for property "
+            + CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE + " considering the default"

Review comment:
   same as above

##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##
@@ -998,6 +998,34 @@ public long getMajorCompactionSize() {
 return compactionSize;
   }
 
+  /**
+   * returns minor compaction size value from carbon properties or -1 if it is not valid or
+   * not configured
+   *
+   * @return compactionSize
+   */
+  public long getMinorCompactionSize() {
+long compactionSize = -1;
+// if not configured, just use default -1
+if (null != getProperty(CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE)) {
+  try {
+compactionSize = Long.parseLong(getProperty(
+CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE));
+  } catch (NumberFormatException e) {
+LOGGER.warn("Invalid value is configured for property "
++ CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE + " considering the default"

Review comment:
   ```suggestion
   + CarbonCommonConstants.CARBON_MINOR_COMPACTION_SIZE + ", considering the default"
   ```
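The validation convention in the quoted `getMinorCompactionSize` (fall back to -1 when the property is unset, unparsable, or non-positive) can be sketched as a standalone helper. This is an illustrative sketch only; the class and method names below are hypothetical, not CarbonData's:

```java
public class MinorCompactionSizeExample {

    // Return -1 when the raw property value is unset, unparsable,
    // or non-positive; otherwise return the parsed size.
    static long parseCompactionSize(String rawValue) {
        if (rawValue == null) {
            return -1;                        // not configured
        }
        try {
            long size = Long.parseLong(rawValue);
            return size > 0 ? size : -1;      // non-positive values are invalid
        } catch (NumberFormatException e) {
            return -1;                        // unparsable values fall back to the default
        }
    }

    public static void main(String[] args) {
        System.out.println(parseCompactionSize("1024")); // valid
        System.out.println(parseCompactionSize("abc"));  // unparsable
        System.out.println(parseCompactionSize("-5"));   // non-positive
    }
}
```

Keeping all three failure modes mapped to the same sentinel (-1) means callers only need a single "not configured" check.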

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/MajorCompactionIgnoreInMinorTest.scala
##
@@ -186,6 +187,141 @@ class MajorCompactionIgnoreInMinorTest extends QueryTest 
with BeforeAndAfterAll
 
   }
 
+  def generateData(numOrders: Int = 10): DataFrame = {
+import sqlContext.implicits._
+sqlContext.sparkContext.parallelize(1 to numOrders, 4)
+  .map { x => ("country" + x, x, "07/23/2015", "name" + x, "phonetype" + x % 10,
+"serialname" + x, x + 1)
+  }.toDF("country", "ID", "date", "name", "phonetype", "serialname", "salary")
+  }
+
+  test("test skip segment whose data size exceed threshold in minor compaction " +
+"in system level control and table level") {
+CarbonProperties.getInstance().addProperty("carbon.compaction.level.threshold", "2,2")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "mm/dd/")
+// set threshold to 1MB in system level
+CarbonProperties.getInstance().addProperty("carbon.minor.compaction.size", "1")
+
+sql("drop table if exists  minor_threshold")
+sql("drop table if exists  tmp")
+sql(
+  "CREATE TABLE IF NOT EXISTS minor_threshold (country String, ID Int, date" +
+" Timestamp, name String, phonetype String, serialname String, salary Int) " +
+"STORED AS carbondata"
+)
+sql(
+  "CREATE TABLE IF NOT EXISTS tmp (country String, ID Int, date Timestamp," +
+" name String, phonetype String, serialname String, salary Int) STORED AS carbondata"
+)
+val initframe = generateData(10)
+initframe.write
+  .format("carbondata")
+  .option("tablename", "tmp")
+  .mode(SaveMode.Overwrite)
+  .save()
+// load 3 segments
+for (i <- 0 to 2) {
+  sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold" +
+" OPTIONS ('DELIMITER'= ',', 'QUOTECHAR'= '\"')"
+  )
+}
+// insert a new segment(id is 3) data size exceed 1 MB
+sql("insert into minor_threshold select * from tmp")
+// load another 3 segments
+for (i <- 0 to 2) {
+  sql("LOAD DATA LOCAL INPATH '" + csvFilePath1 + "' INTO TABLE minor_threshold" +
+" OPTIONS ('DELIMITER'= ',', 'QUOTECHAR'= '\"')"
+  )
+}
+sql("show segments for table minor_threshold").show(100, false)
+// do minor compaction
+sql("alter table minor_threshold compact 'minor'")
+// check segment 3 whose 

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


vikramahuja1001 commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r533611032



##
File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.Logger;
+
+/**
+ * Mantains the trash folder in carbondata. This class has methods to copy data to the trash and
+ * remove data from the trash.
+ */
+public final class TrashUtil {
+
+  private static final Logger LOGGER =
+  LogServiceFactory.getLogService(TrashUtil.class.getName());
+
+  /**
+   * Base method to copy the data to the trash folder.
+   *
+   * @param sourcePath  the path from which to copy the file
+   * @param destinationPath the path where the file will be copied
+   * @return
+   */
+  private static void copyToTrashFolder(String sourcePath, String destinationPath)
+throws IOException {
+DataOutputStream dataOutputStream = null;
+DataInputStream dataInputStream = null;
+try {
+  dataOutputStream = FileFactory.getDataOutputStream(destinationPath);
+  dataInputStream = FileFactory.getDataInputStream(sourcePath);
+  IOUtils.copyBytes(dataInputStream, dataOutputStream, CarbonCommonConstants.BYTEBUFFER_SIZE);
+} catch (IOException exception) {
+  LOGGER.error("Unable to copy " + sourcePath + " to the trash folder", exception);
+  throw exception;
+} finally {
+  CarbonUtil.closeStreams(dataInputStream, dataOutputStream);
+}
+  }
+
+  /**
+   * The below method copies the complete a file to the trash folder.
+   *
+   * @param filePathToCopy   the files which are to be moved to the trash folder
+   * @param trashFolderWithTimestamp timestamp, partition folder(if any) and segment number
+   * @return
+   */
+  public static void copyFileToTrashFolder(String filePathToCopy,
+  String trashFolderWithTimestamp) throws IOException {
+CarbonFile carbonFileToCopy = FileFactory.getCarbonFile(filePathToCopy);
+String destinationPath = trashFolderWithTimestamp + CarbonCommonConstants
+.FILE_SEPARATOR + carbonFileToCopy.getName();
+try {
+  if (!FileFactory.isFileExist(destinationPath)) {
+copyToTrashFolder(filePathToCopy, destinationPath);
+  }
+} catch (IOException e) {
+  // in case there is any issue while copying the file to the trash folder, we need to delete
+  // the complete segment folder from the trash folder. The trashFolderWithTimestamp contains
+  // the segment folder too. Delete the folder as it is.
+  FileFactory.deleteFile(trashFolderWithTimestamp);
+  LOGGER.error("Error while checking trash folder: " + destinationPath + " or copying" +
+  " file: " + filePathToCopy + " to the trash folder at path", e);
+  throw e;
+}
+  }
+
+  /**
+   * The below method copies the complete segment folder to the trash folder. Here, the data files
+   * in segment are listed and copied one by one to the trash folder.
+   *
+   * @param segmentPath  the folder which are to be moved to the trash folder
+   * @param trashFolderWithTimestamp trashfolderpath with complete timestamp and segment number
+   * @return

Review comment:
   done
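As an aside, the manual null-initialize/close-in-`finally` pattern in the quoted `copyToTrashFolder` can also be expressed with try-with-resources. The sketch below is generic Java using standard-library streams, since `FileFactory`, `IOUtils.copyBytes`, and `CarbonUtil.closeStreams` are CarbonData-internal:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CopyExample {

    // Copy source to destination; both streams are closed automatically
    // even if the copy throws, which is what the finally block achieves
    // in the reviewed code.
    static void copy(Path source, Path destination) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(destination)) {
            in.transferTo(out);  // bulk stream copy (Java 9+)
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("trash-src", ".txt");
        Files.writeString(src, "hello");
        Path dst = Files.createTempFile("trash-dst", ".txt");
        copy(src, dst);
        System.out.println(Files.readString(dst));
    }
}
```

Try-with-resources also preserves the original exception when a close fails (the close failure is attached as a suppressed exception), which the manual pattern does not.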

##
File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional 

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r533599288



##
File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.Logger;
+
+/**
+ * Mantains the trash folder in carbondata. This class has methods to copy 
data to the trash and
+ * remove data from the trash.
+ */
+public final class TrashUtil {
+
+  private static final Logger LOGGER =
+  LogServiceFactory.getLogService(TrashUtil.class.getName());
+
+  /**
+   * Base method to copy the data to the trash folder.
+   *
+   * @param sourcePath  the path from which to copy the file
+   * @param destinationPath the path where the file will be copied
+   * @return
+   */
+  private static void copyToTrashFolder(String sourcePath, String 
destinationPath)
+throws IOException {
+DataOutputStream dataOutputStream = null;
+DataInputStream dataInputStream = null;
+try {
+  dataOutputStream = FileFactory.getDataOutputStream(destinationPath);
+  dataInputStream = FileFactory.getDataInputStream(sourcePath);
+  IOUtils.copyBytes(dataInputStream, dataOutputStream, 
CarbonCommonConstants.BYTEBUFFER_SIZE);
+} catch (IOException exception) {
+  LOGGER.error("Unable to copy " + sourcePath + " to the trash folder", 
exception);
+  throw exception;
+} finally {
+  CarbonUtil.closeStreams(dataInputStream, dataOutputStream);
+}
+  }
+
+  /**
+   * The below method copies the complete a file to the trash folder.
+   *
+   * @param filePathToCopy   the files which are to be moved to the 
trash folder
+   * @param trashFolderWithTimestamp timestamp, partition folder(if any) and 
segment number
+   * @return
+   */
+  public static void copyFileToTrashFolder(String filePathToCopy,
+  String trashFolderWithTimestamp) throws IOException {
+CarbonFile carbonFileToCopy = FileFactory.getCarbonFile(filePathToCopy);
+String destinationPath = trashFolderWithTimestamp + CarbonCommonConstants
+.FILE_SEPARATOR + carbonFileToCopy.getName();
+try {
+  if (!FileFactory.isFileExist(destinationPath)) {
+copyToTrashFolder(filePathToCopy, destinationPath);
+  }
+} catch (IOException e) {
+  // in case there is any issue while copying the file to the trash 
folder, we need to delete
+  // the complete segment folder from the trash folder. The 
trashFolderWithTimestamp contains
+  // the segment folder too. Delete the folder as it is.
+  FileFactory.deleteFile(trashFolderWithTimestamp);
+  LOGGER.error("Error while checking trash folder: " + destinationPath + " 
or copying" +
+  " file: " + filePathToCopy + " to the trash folder at path", e);
+  throw e;
+}
+  }
+
+  /**
+   * The below method copies the complete segment folder to the trash folder. 
Here, the data files
+   * in segment are listed and copied one by one to the trash folder.
+   *
+   * @param segmentPath  the folder which are to be moved to the 
trash folder
+   * @param trashFolderWithTimestamp trashfolderpath with complete timestamp 
and segment number
+   * @return

Review comment:
   remove return, please check for all the methods in class





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this 

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r533599088



##
File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.Logger;
+
+/**
+ * Mantains the trash folder in carbondata. This class has methods to copy 
data to the trash and
+ * remove data from the trash.
+ */
+public final class TrashUtil {
+
+  private static final Logger LOGGER =
+  LogServiceFactory.getLogService(TrashUtil.class.getName());
+
+  /**
+   * Base method to copy the data to the trash folder.
+   *
+   * @param sourcePath  the path from which to copy the file
+   * @param destinationPath the path where the file will be copied
+   * @return
+   */
+  private static void copyToTrashFolder(String sourcePath, String 
destinationPath)
+throws IOException {
+DataOutputStream dataOutputStream = null;
+DataInputStream dataInputStream = null;
+try {
+  dataOutputStream = FileFactory.getDataOutputStream(destinationPath);
+  dataInputStream = FileFactory.getDataInputStream(sourcePath);
+  IOUtils.copyBytes(dataInputStream, dataOutputStream, 
CarbonCommonConstants.BYTEBUFFER_SIZE);
+} catch (IOException exception) {
+  LOGGER.error("Unable to copy " + sourcePath + " to the trash folder", 
exception);
+  throw exception;
+} finally {
+  CarbonUtil.closeStreams(dataInputStream, dataOutputStream);
+}
+  }
+
+  /**
+   * The below method copies the complete a file to the trash folder.
+   *
+   * @param filePathToCopy   the files which are to be moved to the 
trash folder
+   * @param trashFolderWithTimestamp timestamp, partition folder(if any) and 
segment number
+   * @return

Review comment:
   remove return





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r533598665



##
File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.io.IOUtils;
+import org.apache.log4j.Logger;
+
+/**
+ * Mantains the trash folder in carbondata. This class has methods to copy 
data to the trash and

Review comment:
   correct the spelling





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736708005


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3243/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r533596340



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java
##
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.*;
+import java.util.stream.Collectors;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.SegmentFileStore;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.statusmanager.LoadMetadataDetails;
+import org.apache.carbondata.core.statusmanager.SegmentStatus;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil;
+
+import org.apache.log4j.Logger;
+
+/**
+ *This util provide clean stale data methods for clean files command
+ */
+public class CleanFilesUtil {
+
+  private static final Logger LOGGER =
+  LogServiceFactory.getLogService(CleanFilesUtil.class.getName());
+
+  /**
+   * This method will clean all the stale segments for a table, delete the 
source folder after
+   * copying the data to the trash and also remove the .segment files of the 
stale segments
+   */
+  public static void cleanStaleSegments(CarbonTable carbonTable)
+  throws IOException {
+long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime();
+List<String> staleSegmentFiles = new ArrayList<>();
+List<String> redundantSegmentFile = new ArrayList<>();
+getStaleSegmentFiles(carbonTable, staleSegmentFiles, redundantSegmentFile);
+for (String staleSegmentFile : staleSegmentFiles) {
+  String segmentNumber = DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile);
+  SegmentFileStore fileStore = new SegmentFileStore(carbonTable.getTablePath(),
+  staleSegmentFile);
+  Map<String, SegmentFileStore.FolderDetails> locationMap = fileStore.getSegmentFile()
+  .getLocationMap();
+  if (locationMap != null) {
+if (locationMap.entrySet().iterator().next().getValue().isRelative()) {
+  CarbonFile segmentPath = FileFactory.getCarbonFile(CarbonTablePath.getSegmentPath(
+  carbonTable.getTablePath(), segmentNumber));
+  // copy the complete segment to the trash folder
+  TrashUtil.copySegmentToTrash(segmentPath, TrashUtil.getCompleteTrashFolderPath(
+  carbonTable.getTablePath(), timeStampForTrashFolder, segmentNumber));
+  // Deleting the stale Segment folders and the segment file.
+  try {
+CarbonUtil.deleteFoldersAndFiles(segmentPath);
+// delete the segment file as well
+FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(),
+staleSegmentFile));
+for (String duplicateStaleSegmentFile : redundantSegmentFile) {
+  if (DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile)
+  .equals(segmentNumber)) {
+FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable
+.getTablePath(), duplicateStaleSegmentFile));
+  }
+}
+  } catch (IOException | InterruptedException e) {
+LOGGER.error("Unable to delete the segment: " + segmentPath + " from after moving" +
+" it to the trash folder. Please delete them manually : " + e.getMessage(), e);
+  }
+}
+  }
+}
+  }
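The cleanup loop above keys everything off the segment number embedded in the segment-file name. Assuming the common `<segmentNo>_<timestamp>.segment` naming (an assumption for illustration, not confirmed by this thread), the extraction can be sketched as:

```java
public class SegmentNoExample {

    // Extract the segment-number prefix from a segment file name of the
    // (assumed) form "<segmentNo>_<timestamp>.segment". If there is no
    // underscore, return the name unchanged.
    static String segmentNoFromSegmentFile(String segmentFileName) {
        int underscore = segmentFileName.indexOf('_');
        return underscore < 0 ? segmentFileName : segmentFileName.substring(0, underscore);
    }

    public static void main(String[] args) {
        System.out.println(segmentNoFromSegmentFile("2_1606780000000.segment"));
    }
}
```

Matching redundant `.segment` files to their segment this way is why the loop compares extracted numbers rather than full file names.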
+
+  /**
+   * This method will clean all the stale segments for partition table, delete 
the source folders
+   * after copying the data to the trash and also remove the .segment files of 
the stale segments
+   */
+  public static void cleanStaleSegmentsForPartitionTable(CarbonTable 
carbonTable)
+  throws IOException {
+long 

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r533596098



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java
##
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.*;
+import java.util.stream.Collectors;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.SegmentFileStore;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.statusmanager.LoadMetadataDetails;
+import org.apache.carbondata.core.statusmanager.SegmentStatus;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil;
+
+import org.apache.log4j.Logger;
+
+/**
+ *This util provide clean stale data methods for clean files command
+ */
+public class CleanFilesUtil {
+
+  private static final Logger LOGGER =
+  LogServiceFactory.getLogService(CleanFilesUtil.class.getName());
+
+  /**
+   * This method will clean all the stale segments for a table, delete the 
source folder after
+   * copying the data to the trash and also remove the .segment files of the 
stale segments
+   */
+  public static void cleanStaleSegments(CarbonTable carbonTable)
+  throws IOException {
+long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime();
+List<String> staleSegmentFiles = new ArrayList<>();
+List<String> redundantSegmentFile = new ArrayList<>();
+getStaleSegmentFiles(carbonTable, staleSegmentFiles, redundantSegmentFile);
+for (String staleSegmentFile : staleSegmentFiles) {
+  String segmentNumber = 
DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile);
+  SegmentFileStore fileStore = new 
SegmentFileStore(carbonTable.getTablePath(),
+  staleSegmentFile);
+  Map<String, SegmentFileStore.FolderDetails> locationMap = fileStore.getSegmentFile()
+  .getLocationMap();
+  if (locationMap != null) {
+if (locationMap.entrySet().iterator().next().getValue().isRelative()) {
+  CarbonFile segmentPath = 
FileFactory.getCarbonFile(CarbonTablePath.getSegmentPath(
+  carbonTable.getTablePath(), segmentNumber));
+  // copy the complete segment to the trash folder
+  TrashUtil.copySegmentToTrash(segmentPath, 
TrashUtil.getCompleteTrashFolderPath(
+  carbonTable.getTablePath(), timeStampForTrashFolder, 
segmentNumber));
+  // Deleting the stale Segment folders and the segment file.
+  try {
+CarbonUtil.deleteFoldersAndFiles(segmentPath);
+// delete the segment file as well
+
FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(),
+staleSegmentFile));
+for (String duplicateStaleSegmentFile : redundantSegmentFile) {
+  if 
(DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile)
+  .equals(segmentNumber)) {
+
FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable
+.getTablePath(), duplicateStaleSegmentFile));
+  }
+}
+  } catch (IOException | InterruptedException e) {
+LOGGER.error("Unable to delete the segment: " + segmentPath + " 
from after moving" +
+" it to the trash folder. Please delete them manually : " + 
e.getMessage(), e);
+  }
+}
+  }
+}
+  }
+
+  /**
+   * This method will clean all the stale segments for partition table, delete 
the source folders
+   * after copying the data to the trash and also remove the .segment files of 
the stale segments
+   */
+  public static void cleanStaleSegmentsForPartitionTable(CarbonTable 
carbonTable)
+  throws IOException {
+long 
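The duplicate-cleanup loop in the quoted `cleanStaleSegments` above matches redundant segment files to the stale segment by segment number. A minimal, self-contained sketch of that matching, assuming the CarbonData segment-file naming convention `<segmentNo>_<timestamp>.segment` (the hypothetical `segmentNoFromSegmentFile` helper below stands in for `DataFileUtil.getSegmentNoFromSegmentFile`):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StaleSegmentMatchDemo {
    // Assumed naming convention: "<segmentNo>_<timestamp>.segment".
    // Stand-in for DataFileUtil.getSegmentNoFromSegmentFile.
    static String segmentNoFromSegmentFile(String segmentFileName) {
        return segmentFileName.substring(0, segmentFileName.indexOf('_'));
    }

    // Select the redundant segment files belonging to the same segment
    // number as the stale segment file being deleted.
    static List<String> redundantFilesForSegment(List<String> redundantFiles,
                                                 String segmentNo) {
        return redundantFiles.stream()
            .filter(f -> segmentNoFromSegmentFile(f).equals(segmentNo))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> redundant =
            Arrays.asList("2_1001.segment", "3_1002.segment", "2_1003.segment");
        System.out.println(redundantFilesForSegment(redundant, "2"));
        // prints [2_1001.segment, 2_1003.segment]
    }
}
```

This is only an illustration of the matching rule, not the actual utility; the real code also moves the segment folder to the trash before deleting the matched files.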

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r533595909



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java
##
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.*;
+import java.util.stream.Collectors;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.SegmentFileStore;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.statusmanager.LoadMetadataDetails;
+import org.apache.carbondata.core.statusmanager.SegmentStatus;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil;
+
+import org.apache.log4j.Logger;
+
+/**
+ * This util provides methods to clean stale data for the clean files command
+ */
+public class CleanFilesUtil {
+
+  private static final Logger LOGGER =
+  LogServiceFactory.getLogService(CleanFilesUtil.class.getName());
+
+  /**
+   * This method will clean all the stale segments for a table, delete the 
source folder after
+   * copying the data to the trash and also remove the .segment files of the 
stale segments
+   */
+  public static void cleanStaleSegments(CarbonTable carbonTable)
+  throws IOException {
+long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime();
+List<String> staleSegmentFiles = new ArrayList<>();
+List<String> redundantSegmentFile = new ArrayList<>();
+getStaleSegmentFiles(carbonTable, staleSegmentFiles, redundantSegmentFile);
+for (String staleSegmentFile : staleSegmentFiles) {
+  String segmentNumber = 
DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile);
+  SegmentFileStore fileStore = new 
SegmentFileStore(carbonTable.getTablePath(),
+  staleSegmentFile);
+  Map<String, SegmentFileStore.FolderDetails> locationMap = fileStore.getSegmentFile()
+  .getLocationMap();
+  if (locationMap != null) {
+if (locationMap.entrySet().iterator().next().getValue().isRelative()) {
+  CarbonFile segmentPath = 
FileFactory.getCarbonFile(CarbonTablePath.getSegmentPath(
+  carbonTable.getTablePath(), segmentNumber));
+  // copy the complete segment to the trash folder
+  TrashUtil.copySegmentToTrash(segmentPath, 
TrashUtil.getCompleteTrashFolderPath(
+  carbonTable.getTablePath(), timeStampForTrashFolder, 
segmentNumber));
+  // Deleting the stale Segment folders and the segment file.
+  try {
+CarbonUtil.deleteFoldersAndFiles(segmentPath);
+// delete the segment file as well
+
FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(),
+staleSegmentFile));
+for (String duplicateStaleSegmentFile : redundantSegmentFile) {
+  if 
(DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile)
+  .equals(segmentNumber)) {
+
FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable
+.getTablePath(), duplicateStaleSegmentFile));
+  }
+}
+  } catch (IOException | InterruptedException e) {
+LOGGER.error("Unable to delete the segment: " + segmentPath + " 
from after moving" +
+" it to the trash folder. Please delete them manually : " + 
e.getMessage(), e);
+  }
+}
+  }
+}
+  }
+
+  /**
+   * This method will clean all the stale segments for partition table, delete 
the source folders
+   * after copying the data to the trash and also remove the .segment files of 
the stale segments
+   */
+  public static void cleanStaleSegmentsForPartitionTable(CarbonTable 
carbonTable)
+  throws IOException {
+long 

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736705103


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4998/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r533593984



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java
##
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.*;
+import java.util.stream.Collectors;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.SegmentFileStore;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.statusmanager.LoadMetadataDetails;
+import org.apache.carbondata.core.statusmanager.SegmentStatus;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil;
+
+import org.apache.log4j.Logger;
+
+/**
+ * This util provides methods to clean stale data for the clean files command
+ */
+public class CleanFilesUtil {
+
+  private static final Logger LOGGER =
+  LogServiceFactory.getLogService(CleanFilesUtil.class.getName());
+
+  /**
+   * This method will clean all the stale segments for a table, delete the 
source folder after
+   * copying the data to the trash and also remove the .segment files of the 
stale segments
+   */
+  public static void cleanStaleSegments(CarbonTable carbonTable)
+  throws IOException {
+long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime();
+List<String> staleSegmentFiles = new ArrayList<>();
+List<String> redundantSegmentFile = new ArrayList<>();
+getStaleSegmentFiles(carbonTable, staleSegmentFiles, redundantSegmentFile);
+for (String staleSegmentFile : staleSegmentFiles) {
+  String segmentNumber = 
DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile);
+  SegmentFileStore fileStore = new 
SegmentFileStore(carbonTable.getTablePath(),
+  staleSegmentFile);
+  Map<String, SegmentFileStore.FolderDetails> locationMap = fileStore.getSegmentFile()
+  .getLocationMap();
+  if (locationMap != null) {
+if (locationMap.entrySet().iterator().next().getValue().isRelative()) {
+  CarbonFile segmentPath = 
FileFactory.getCarbonFile(CarbonTablePath.getSegmentPath(
+  carbonTable.getTablePath(), segmentNumber));
+  // copy the complete segment to the trash folder
+  TrashUtil.copySegmentToTrash(segmentPath, 
TrashUtil.getCompleteTrashFolderPath(
+  carbonTable.getTablePath(), timeStampForTrashFolder, 
segmentNumber));
+  // Deleting the stale Segment folders and the segment file.
+  try {
+CarbonUtil.deleteFoldersAndFiles(segmentPath);
+// delete the segment file as well
+
FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(),
+staleSegmentFile));
+for (String duplicateStaleSegmentFile : redundantSegmentFile) {
+  if 
(DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile)
+  .equals(segmentNumber)) {
+
FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable
+.getTablePath(), duplicateStaleSegmentFile));
+  }
+}
+  } catch (IOException | InterruptedException e) {
+LOGGER.error("Unable to delete the segment: " + segmentPath + " 
from after moving" +
+" it to the trash folder. Please delete them manually : " + 
e.getMessage(), e);
+  }
+}
+  }
+}
+  }
+
+  /**
+   * This method will clean all the stale segments for partition table, delete 
the source folders
+   * after copying the data to the trash and also remove the .segment files of 
the stale segments
+   */
+  public static void cleanStaleSegmentsForPartitionTable(CarbonTable 
carbonTable)
+  throws IOException {
+long 

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4031: Presto UT optimization

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4031:
URL: https://github.com/apache/carbondata/pull/4031#issuecomment-736696049


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3242/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4031: Presto UT optimization

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4031:
URL: https://github.com/apache/carbondata/pull/4031#issuecomment-736695799


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4997/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


nihal0107 commented on a change in pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#discussion_r533571838



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -398,6 +404,28 @@ object SecondaryIndexCreator {
   secondaryIndexModel.sqlContext.sparkSession,
   carbonLoadModelForMergeDataFiles.getFactTimeStamp,
   rebuiltSegments)
+
+if (isInsertOverwrite) {
+  var staleSegmentsList = new ListBuffer[String]()
+  SegmentStatusManager
+.readLoadMetadata(indexCarbonTable.getMetadataPath).foreach { 
loadMetadata =>
+if (!successSISegments.contains(loadMetadata.getLoadName)) {

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


nihal0107 commented on a change in pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#discussion_r533571712



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -398,6 +404,28 @@ object SecondaryIndexCreator {
   secondaryIndexModel.sqlContext.sparkSession,
   carbonLoadModelForMergeDataFiles.getFactTimeStamp,
   rebuiltSegments)
+
+if (isInsertOverwrite) {
+  var staleSegmentsList = new ListBuffer[String]()

Review comment:
   removed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] marchpure commented on pull request #4026: [CARBONDATA-4063] Refactor getBlockId and getShortBlockId functions

2020-12-01 Thread GitBox


marchpure commented on pull request #4026:
URL: https://github.com/apache/carbondata/pull/4026#issuecomment-736672478


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


ajantha-bhat commented on pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736668554


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


QiangCai commented on pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736650399


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-4064) TPCDS queries are failing with NOne.get exception when table has SI configured

2020-12-01 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4064.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> TPCDS queries are failing with NOne.get exception when table has SI configured
> --
>
> Key: CARBONDATA-4064
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4064
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] asfgit closed pull request #4030: [CARBONDATA-4064] Fix tpcds query failure with SI

2020-12-01 Thread GitBox


asfgit closed pull request #4030:
URL: https://github.com/apache/carbondata/pull/4030


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-4066) data mismatch observed with SI and without SI when SI global sort and SI segment merge is true

2020-12-01 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4066.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> data mismatch observed with SI and without SI when SI global sort and SI 
> segment merge is true
> --
>
> Key: CARBONDATA-4066
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4066
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> data mismatch observed with SI and without SI when SI global sort and SI 
> segment merge is true
>  
> test case for reproduce the issue:
> CarbonProperties.getInstance()
>  .addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "true")
> sql("create table complextable2 (id int, name string, country array) 
> stored as " +
>  "carbondata tblproperties('sort_scope'='global_sort','sort_columns'='name')")
> sql(
>  s"load data inpath '$resourcesPath/secindex/array.csv' into table 
> complextable2 options('delimiter'=','," +
>  
> "'quotechar'='\"','fileheader'='id,name,country','complex_delimiter_level_1'='$',"
>  +
>  "'global_sort_partitions'='10')")
> val result = sql(" select * from complextable2 where 
> array_contains(country,'china')")
> sql("create index index_2 on table complextable2(country) as 'carbondata' 
> properties" +
>  "('sort_scope'='global_sort')")
> checkAnswer(sql("select count(*) from complextable2 where 
> array_contains(country,'china')"),
>  sql("select count(*) from complextable2 where 
> ni(array_contains(country,'china'))"))



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] asfgit closed pull request #4033: [CARBONDATA-4066] data mismatch observed with SI and without SI when SI global sort and SI segment merge is true

2020-12-01 Thread GitBox


asfgit closed pull request #4033:
URL: https://github.com/apache/carbondata/pull/4033


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #4033: [CARBONDATA-4066] data mismatch observed with SI and without SI when SI global sort and SI segment merge is true

2020-12-01 Thread GitBox


ajantha-bhat commented on pull request #4033:
URL: https://github.com/apache/carbondata/pull/4033#issuecomment-736644576


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736640874


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3240/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736639647


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4995/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-01 Thread GitBox


akashrn5 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736634922


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#discussion_r533511617



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -398,6 +404,28 @@ object SecondaryIndexCreator {
   secondaryIndexModel.sqlContext.sparkSession,
   carbonLoadModelForMergeDataFiles.getFactTimeStamp,
   rebuiltSegments)
+
+if (isInsertOverwrite) {
+  var staleSegmentsList = new ListBuffer[String]()
+  SegmentStatusManager
+.readLoadMetadata(indexCarbonTable.getMetadataPath).foreach { 
loadMetadata =>
+if (!successSISegments.contains(loadMetadata.getLoadName)) {

Review comment:
   you can use directly filter instead of creating new buffer
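The reviewer's point (the SI code in question is Scala, so this is only an illustration in Java streams) is that "segments not in the success list" is naturally a filter over the metadata, rather than a mutable buffer appended to inside a foreach. The load names and success set below are made-up values:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class FilterVsBufferDemo {
    public static void main(String[] args) {
        // Hypothetical load names read from the SI table's metadata.
        List<String> allLoads = Arrays.asList("0", "1", "2", "3");
        Set<String> successSISegments = Set.of("1", "3");

        // Instead of appending to a mutable buffer inside a foreach,
        // express "segments not covered by the successful SI loads"
        // directly as a filter.
        List<String> staleSegments = allLoads.stream()
            .filter(load -> !successSISegments.contains(load))
            .collect(Collectors.toList());

        System.out.println(staleSegments); // prints [0, 2]
    }
}
```

The Scala equivalent would be `loadMetadataDetails.filter(l => !successSISegments.contains(l.getLoadName))`, avoiding the `ListBuffer` entirely.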





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


akashrn5 commented on a change in pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#discussion_r533508633



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -398,6 +404,28 @@ object SecondaryIndexCreator {
   secondaryIndexModel.sqlContext.sparkSession,
   carbonLoadModelForMergeDataFiles.getFactTimeStamp,
   rebuiltSegments)
+
+if (isInsertOverwrite) {
+  var staleSegmentsList = new ListBuffer[String]()

Review comment:
   `staleSegmentsList ` name depicts wrong info, better to rename variable 
to `overriddenSegments`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736622862


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4996/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-01 Thread GitBox


akashrn5 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-736621452


   @kunal642 handled for CTAS and added test cases, please review and merge



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#issuecomment-736618751


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4993/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files feature become data trash manager

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#issuecomment-736618531


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3239/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files feature become data trash manager

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#issuecomment-736618069


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4994/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#issuecomment-736617364


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3238/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736570532


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3236/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736569211


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4991/
   







[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


nihal0107 commented on a change in pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#discussion_r533409267



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -385,7 +392,26 @@ object SecondaryIndexCreator {
 val rebuiltSegments = SecondaryIndexUtil
   
.mergeDataFilesSISegments(secondaryIndexModel.segmentIdToLoadStartTimeMapping,
 indexCarbonTable,
-loadMetadataDetails.toList.asJava, 
carbonLoadModelForMergeDataFiles)(sc)
+loadMetadataDetail.toList.asJava, 
carbonLoadModelForMergeDataFiles)(sc)
+if (isInsertOverwrite) {
+  var segmentList = new ListBuffer[String]()
+  for (loadMetadata <- loadMetadataDetails) {
+if (loadMetadata.getSegmentStatus != 
SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS) {
+  segmentList += loadMetadata.getLoadName
+}
+  }
+  if (segmentList.nonEmpty) {

Review comment:
   done
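The quoted change collects, in an explicit loop, the names of all segments that are not part of the overwrite. A minimal self-contained sketch of that filter — the type names here are simplified stand-ins for CarbonData's `LoadMetadataDetails`/`SegmentStatus`, not the actual PR code:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for CarbonData's LoadMetadataDetails / SegmentStatus.
class SegmentFilterSketch {
    enum SegmentStatus { SUCCESS, INSERT_OVERWRITE_IN_PROGRESS }

    record LoadMetadata(String loadName, SegmentStatus status) {}

    // Collect the names of all segments that are NOT being overwritten,
    // mirroring the loop in the diff above.
    static List<String> segmentsToKeep(List<LoadMetadata> details) {
        List<String> segmentList = new ArrayList<>();
        for (LoadMetadata d : details) {
            if (d.status() != SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS) {
                segmentList.add(d.loadName());
            }
        }
        return segmentList;
    }
}
```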









[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


nihal0107 commented on a change in pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#discussion_r533409156



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -385,7 +392,26 @@ object SecondaryIndexCreator {
 val rebuiltSegments = SecondaryIndexUtil
   
.mergeDataFilesSISegments(secondaryIndexModel.segmentIdToLoadStartTimeMapping,
 indexCarbonTable,
-loadMetadataDetails.toList.asJava, 
carbonLoadModelForMergeDataFiles)(sc)
+loadMetadataDetail.toList.asJava, 
carbonLoadModelForMergeDataFiles)(sc)
+if (isInsertOverwrite) {
+  var segmentList = new ListBuffer[String]()
+  for (loadMetadata <- loadMetadataDetails) {
+if (loadMetadata.getSegmentStatus != 
SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS) {
+  segmentList += loadMetadata.getLoadName
+}
+  }
+  if (segmentList.nonEmpty) {

Review comment:
   done









[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


nihal0107 commented on a change in pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#discussion_r533408885



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -371,11 +377,12 @@ object SecondaryIndexCreator {
 
 val loadMetadataDetails = SegmentStatusManager
   .readLoadMetadata(indexCarbonTable.getMetadataPath)
-  .filter(loadMetadataDetail => 
successSISegments.contains(loadMetadataDetail.getLoadName))
 
+val loadMetadataDetail = loadMetadataDetails

Review comment:
   removed as logic changed

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -385,7 +392,26 @@ object SecondaryIndexCreator {
 val rebuiltSegments = SecondaryIndexUtil
   
.mergeDataFilesSISegments(secondaryIndexModel.segmentIdToLoadStartTimeMapping,
 indexCarbonTable,
-loadMetadataDetails.toList.asJava, 
carbonLoadModelForMergeDataFiles)(sc)
+loadMetadataDetail.toList.asJava, 
carbonLoadModelForMergeDataFiles)(sc)
+if (isInsertOverwrite) {
+  var segmentList = new ListBuffer[String]()
+  for (loadMetadata <- loadMetadataDetails) {

Review comment:
   done

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -385,7 +392,26 @@ object SecondaryIndexCreator {
 val rebuiltSegments = SecondaryIndexUtil
   
.mergeDataFilesSISegments(secondaryIndexModel.segmentIdToLoadStartTimeMapping,
 indexCarbonTable,
-loadMetadataDetails.toList.asJava, 
carbonLoadModelForMergeDataFiles)(sc)
+loadMetadataDetail.toList.asJava, 
carbonLoadModelForMergeDataFiles)(sc)
+if (isInsertOverwrite) {
+  var segmentList = new ListBuffer[String]()

Review comment:
   done









[GitHub] [carbondata] nihal0107 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-12-01 Thread GitBox


nihal0107 commented on a change in pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#discussion_r533408559



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -299,6 +300,13 @@ object CarbonIndexUtil {
 .Map((carbonLoadModel.getSegmentId, carbonLoadModel.getFactTimeStamp))
 }
 val header = 
indexTable.getCreateOrderColumn.asScala.map(_.getColName).toArray
+if (isInsertOverWrite) {
+  val loadMetadataDetails = carbonLoadModel.getLoadMetadataDetails.asScala
+  for (loadMetadata <- loadMetadataDetails) {

Review comment:
   done

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/SILoadEventListener.scala
##
@@ -79,13 +78,16 @@ class SILoadEventListener extends OperationEventListener 
with Logging {
   .lookupRelation(Some(carbonLoadModel.getDatabaseName),
 
indexTableName)(sparkSession).asInstanceOf[CarbonRelation].carbonTable
 
+val isInsertOverwrite = (operationContext.getProperties

Review comment:
   done









[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4033: [CARBONDATA-4066] data mismatch observed with SI and without SI when SI global sort and SI segment merge is true

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4033:
URL: https://github.com/apache/carbondata/pull/4033#issuecomment-736539558


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3235/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4033: [CARBONDATA-4066] data mismatch observed with SI and without SI when SI global sort and SI segment merge is true

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4033:
URL: https://github.com/apache/carbondata/pull/4033#issuecomment-736539051


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4990/
   







[GitHub] [carbondata] QiangCai commented on a change in pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


QiangCai commented on a change in pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#discussion_r533390359



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java
##
@@ -0,0 +1,223 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.*;
+import java.util.stream.Collectors;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.SegmentFileStore;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.statusmanager.LoadMetadataDetails;
+import org.apache.carbondata.core.statusmanager.SegmentStatus;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil;
+
+import javafx.util.Pair;
+
+import org.apache.log4j.Logger;
+
+/**
+ * This util provides clean-stale-data methods for the clean files command
+ */
+public class CleanFilesUtil {
+
+  private static final Logger LOGGER =
+  LogServiceFactory.getLogService(CleanFilesUtil.class.getName());
+
+  /**
+   * This method will clean all the stale segments for a table, delete the 
source folder after
+   * copying the data to the trash and also remove the .segment files of the 
stale segments
+   */
+  public static void cleanStaleSegments(CarbonTable carbonTable)
+throws IOException {
+long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime();
+Pair<List<String>, List<String>> staleSegmentFiles = getStaleSegmentFiles(carbonTable);
+for (String staleSegmentFile : staleSegmentFiles.getKey()) {
+  String segmentNumber = 
DataFileUtil.getSegmentNoFromSegmentFile(staleSegmentFile);
+  SegmentFileStore fileStore = new 
SegmentFileStore(carbonTable.getTablePath(),
+  staleSegmentFile);
+  Map<String, SegmentFileStore.FolderDetails> locationMap = fileStore.getSegmentFile()
+  .getLocationMap();
+  if (locationMap != null) {
+if (locationMap.entrySet().iterator().next().getValue().isRelative()) {
+  CarbonFile segmentPath = 
FileFactory.getCarbonFile(CarbonTablePath.getSegmentPath(
+  carbonTable.getTablePath(), segmentNumber));
+  // copy the complete segment to the trash folder
+  TrashUtil.copySegmentToTrash(segmentPath, 
TrashUtil.getCompleteTrashFolderPath(
+  carbonTable.getTablePath(), timeStampForTrashFolder, 
segmentNumber));
+  // Deleting the stale Segment folders and the segment file.
+  try {
+CarbonUtil.deleteFoldersAndFiles(segmentPath);
+// delete the segment file as well
+
FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(),
+staleSegmentFile));
+for (String duplicateStaleSegmentFile : 
staleSegmentFiles.getValue()) {
+  if 
(DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile)
+  .equals(segmentNumber)) {
+
FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable
+.getTablePath(), duplicateStaleSegmentFile));
+  }
+}
+  } catch (IOException | InterruptedException e) {
+LOGGER.error("Unable to delete the segment: " + segmentPath + " after moving" +
+" it to the trash folder. Please delete it manually: " + 
e.getMessage(), e);
+  }
+}
+  }
+}
+  }
+
+  /**
+   * This method will clean all the stale segments for partition table, delete 
the source folders
+   * after copying the data to the trash and also remove the .segment files of 
the stale segments
+   */
+  public static void cleanStaleSegmentsForPartitionTable(CarbonTable 
carbonTable)
+throws IOException {
+long timeStampForTrashFolder = CarbonUpdateUtil.readCurrentTime();
+Pair<List<String>, List<String>> 
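The clean-files flow quoted above follows a copy-then-delete pattern: each stale segment is copied into a timestamped trash folder first, and the source is deleted only afterwards, so data is never lost mid-operation. A minimal sketch of that pattern using plain `java.nio` in place of CarbonData's `FileFactory`/`TrashUtil` — the directory layout and names here are illustrative assumptions, not CarbonData's actual trash layout:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class TrashSketch {
    // Copy segmentDir into <tableDir>/.Trash/<timestamp>/<segmentName>,
    // then delete the stale source directory.
    static void moveSegmentToTrash(Path segmentDir, Path tableDir, long ts)
            throws IOException {
        Path trashDir = tableDir.resolve(".Trash").resolve(Long.toString(ts))
                .resolve(segmentDir.getFileName().toString());
        // Copy the whole segment directory into the trash folder first ...
        try (Stream<Path> files = Files.walk(segmentDir)) {
            for (Path src : (Iterable<Path>) files::iterator) {
                Path dst = trashDir.resolve(segmentDir.relativize(src).toString());
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dst);
                } else {
                    Files.createDirectories(dst.getParent());
                    Files.copy(src, dst);
                }
            }
        }
        // ... and only then delete the source (children before parents).
        try (Stream<Path> files = Files.walk(segmentDir)) {
            files.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
        }
    }
}
```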

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736507588


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3234/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata

2020-12-01 Thread GitBox


CarbonDataQA2 commented on pull request #4005:
URL: https://github.com/apache/carbondata/pull/4005#issuecomment-736502949


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4989/
   







[GitHub] [carbondata] Pickupolddriver commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

2020-12-01 Thread GitBox


Pickupolddriver commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-736463924


   > @Pickupolddriver : PR shows 37KLOC code, please rebase and keep only 
required changes. I saw some codegen file of 24KLOC
   
   Yes, most of this code is auto-generated by ANTLR. I will try to remove it 
and have it generated during the compile process instead.







[GitHub] [carbondata] maheshrajus opened a new pull request #4033: [CARBONDATA-4066] data mismatch observed with SI and without SI when SI global sort and SI segment merge is true

2020-12-01 Thread GitBox


maheshrajus opened a new pull request #4033:
URL: https://github.com/apache/carbondata/pull/4033


   
### Why is this PR needed?
A data mismatch is observed between queries with SI and without SI when SI 
global sort and SI segment merge are enabled. After the SI data files are 
merged, the position reference column is also sorted; the rows then point to 
the wrong position references, causing the mismatch between results with and 
without SI.
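A toy illustration of the failure mode (not CarbonData code): an SI row pairs an index-column value with a position reference into the main table. If, after merging, positions are re-derived from the rows' new sorted order instead of carrying the stored references, lookups point at the wrong main-table rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PositionRefSketch {
    // An SI row: index-column value plus a stored reference to a
    // main-table row position.
    record SiRow(String value, int posRef) {}

    public static void main(String[] args) {
        List<SiRow> si = new ArrayList<>(List.of(
                new SiRow("b", 0), new SiRow("a", 1), new SiRow("c", 2)));
        si.sort(Comparator.comparing(SiRow::value)); // merge sorts by value

        // WRONG: recompute positions from the new (sorted) order.
        List<Integer> recomputed = new ArrayList<>();
        for (int i = 0; i < si.size(); i++) {
            recomputed.add(i);
        }

        // RIGHT: keep the stored position reference column.
        List<Integer> kept = si.stream().map(SiRow::posRef).toList();

        System.out.println(recomputed); // [0, 1, 2] - "a" no longer maps to row 1
        System.out.println(kept);       // [1, 0, 2] - still correct
    }
}
```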

### What changes were proposed in this PR?
   There is no need to recalculate the position references after the data-files 
merge; the existing position reference column from the SI table should be used instead.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes
   
   
   






