[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties

2019-01-17 Thread GitBox
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move 
Range_Column into the table level properties
URL: https://github.com/apache/carbondata/pull/3063#discussion_r248571391
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala
 ##
 @@ -511,6 +515,21 @@ object AlterTableUtil {
 }
   }
 
+  def validateRangeColumnProperties(carbonTable: CarbonTable,
+  propertiesMap: mutable.Map[String, String]): Unit = {
+if (propertiesMap.get(CarbonCommonConstants.RANGE_COLUMN).isDefined) {
+  val rangeColumnProp = 
propertiesMap.get(CarbonCommonConstants.RANGE_COLUMN).get
 
 Review comment:
   ok


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties

2019-01-17 Thread GitBox
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move 
Range_Column into the table level properties
URL: https://github.com/apache/carbondata/pull/3063#discussion_r248571355
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ##
 @@ -787,6 +787,19 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
   }
 }
 
+// range_column should be there in create table cols
+if (tableProperties.get(CarbonCommonConstants.RANGE_COLUMN).isDefined) {
+  val rangeColumn = 
tableProperties.get(CarbonCommonConstants.RANGE_COLUMN).get.trim
 
 Review comment:
   ok


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties

2019-01-17 Thread GitBox
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move 
Range_Column into the table level properties
URL: https://github.com/apache/carbondata/pull/3063#discussion_r248570562
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala
 ##
 @@ -219,24 +219,25 @@ case class CarbonLoadDataCommand(
   carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
 CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT))
 
-  optionsFinal
-.put("bad_record_path", 
CarbonBadRecordUtil.getBadRecordsPath(options.asJava, table))
-  val factPath = if (dataFrame.isDefined) {
-""
-  } else {
-FileUtils.getPaths(factPathFromUser, hadoopConf)
-  }
-  carbonLoadModel.setParentTablePath(parentTablePath)
-  carbonLoadModel.setFactFilePath(factPath)
-  
carbonLoadModel.setCarbonTransactionalTable(table.getTableInfo.isTransactionalTable)
-  carbonLoadModel.setAggLoadRequest(
-internalOptions.getOrElse(CarbonCommonConstants.IS_INTERNAL_LOAD_CALL,
-  CarbonCommonConstants.IS_INTERNAL_LOAD_CALL_DEFAULT).toBoolean)
-  
carbonLoadModel.setSegmentId(internalOptions.getOrElse("mergedSegmentName", ""))
-  val columnCompressor = 
table.getTableInfo.getFactTable.getTableProperties.asScala
-.getOrElse(CarbonCommonConstants.COMPRESSOR,
-  CompressorFactory.getInstance().getCompressor.getName)
-  carbonLoadModel.setColumnCompressor(columnCompressor)
+optionsFinal
+  .put("bad_record_path", 
CarbonBadRecordUtil.getBadRecordsPath(options.asJava, table))
+val factPath = if (dataFrame.isDefined) {
+  ""
+} else {
+  FileUtils.getPaths(factPathFromUser, hadoopConf)
+}
+carbonLoadModel.setParentTablePath(parentTablePath)
+carbonLoadModel.setFactFilePath(factPath)
+
carbonLoadModel.setCarbonTransactionalTable(table.getTableInfo.isTransactionalTable)
+carbonLoadModel.setAggLoadRequest(
+  internalOptions.getOrElse(CarbonCommonConstants.IS_INTERNAL_LOAD_CALL,
+CarbonCommonConstants.IS_INTERNAL_LOAD_CALL_DEFAULT).toBoolean)
+
carbonLoadModel.setSegmentId(internalOptions.getOrElse("mergedSegmentName", ""))
+val columnCompressor = 
table.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.COMPRESSOR,
+CompressorFactory.getInstance().getCompressor.getName)
+carbonLoadModel.setColumnCompressor(columnCompressor)
+carbonLoadModel.setRangePartitionColumn(table.getRangeColumn)
 
 Review comment:
   you found a bug of Github...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties

2019-01-17 Thread GitBox
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move 
Range_Column into the table level properties
URL: https://github.com/apache/carbondata/pull/3063#discussion_r248569818
 
 

 ##
 File path: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestRangeColumnDataLoad.scala
 ##
 @@ -99,19 +114,53 @@ class TestRangeColumnDataLoad extends QueryTest with 
BeforeAndAfterEach with Bef
   """
 | CREATE TABLE carbon_range_column4(id INT, name STRING, city STRING, 
age INT)
 | STORED BY 'org.apache.carbondata.format'
-| TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT', 'SORT_COLUMNS'='name, 
city')
+| TBLPROPERTIES('SORT_SCOPE'='LOCAL_SORT', 'SORT_COLUMNS'='name, 
city', 'range_column'='name')
   """.stripMargin)
 
 val dataSkewPath = s"$resourcesPath/range_column"
 
 sql(s"LOAD DATA LOCAL INPATH '$dataSkewPath' INTO TABLE 
carbon_range_column4 " +
-"OPTIONS('GLOBAL_SORT_PARTITIONS'='5', 'range_column'='name', " +
-"'BAD_RECORDS_ACTION'='force')")
+"OPTIONS('GLOBAL_SORT_PARTITIONS'='5', 'BAD_RECORDS_ACTION'='force')")
 
 assert(getIndexFileCount("carbon_range_column4") === 5)
 checkAnswer(sql("SELECT COUNT(*) FROM carbon_range_column4"), Seq(Row(10)))
   }
 
+  test("range_column with system property carbon.range.column.scale.factor") {
+CarbonProperties.getInstance().addProperty(
+  CarbonCommonConstants.CARBON_RANGE_COLUMN_SCALE_FACTOR,
+  "10"
+)
+
+sql(
+  """
+| CREATE TABLE carbon_range_column5(id INT, name STRING, city STRING, 
age INT)
+| STORED BY 'org.apache.carbondata.format'
+| TBLPROPERTIES('SORT_SCOPE'='LOCAL_SORT', 'SORT_COLUMNS'='name, 
city', 'range_column'='name')
+  """.stripMargin)
+
+sql(s"LOAD DATA LOCAL INPATH '$filePath' INTO TABLE carbon_range_column5 ")
+
+assert(getIndexFileCount("carbon_range_column5") === 1)
+checkAnswer(sql("SELECT COUNT(*) FROM carbon_range_column5"), Seq(Row(12)))
+checkAnswer(sql("SELECT * FROM carbon_range_column5"),
+  sql("SELECT * FROM carbon_range_column5 ORDER BY name"))
+  }
+
+  test("set and unset table property: range_column") {
+sql(
+  """
+| CREATE TABLE carbon_range_column6(id INT, name STRING, city STRING, 
age INT)
+| STORED BY 'org.apache.carbondata.format'
+| TBLPROPERTIES('SORT_SCOPE'='LOCAL_SORT', 'SORT_COLUMNS'='name, city')
+  """.stripMargin)
+
+sql("ALTER TABLE carbon_range_column6 SET 
TBLPROPERTIES('range_column'='city')")
+sql("ALTER TABLE carbon_range_column6 SET 
TBLPROPERTIES('range_column'='name')")
+sql("ALTER TABLE carbon_range_column6 UNSET TBLPROPERTIES('range_column')")
+sql("ALTER TABLE carbon_range_column6 SET 
TBLPROPERTIES('range_column'='name')")
+  }
+
 
 Review comment:
   now only support load data command


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties

2019-01-17 Thread GitBox
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move 
Range_Column into the table level properties
URL: https://github.com/apache/carbondata/pull/3063#discussion_r248569666
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ##
 @@ -787,6 +787,19 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
   }
 }
 
+// range_column should be there in create table cols
+if (tableProperties.get(CarbonCommonConstants.RANGE_COLUMN).isDefined) {
+  val rangeColumn = 
tableProperties.get(CarbonCommonConstants.RANGE_COLUMN).get.trim
+  val rangeField = fields.find(_.column.equalsIgnoreCase(rangeColumn))
+  if (rangeField.isEmpty) {
+val errorMsg = "range_column: " + rangeColumn +
+   " does not exist in table. Please check the create 
table statement."
+throw new MalformedCarbonCommandException(errorMsg)
+  } else {
+tableProperties.put(CarbonCommonConstants.RANGE_COLUMN, 
rangeField.get.column)
 
 Review comment:
   both  yes


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties

2019-01-17 Thread GitBox
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move 
Range_Column into the table level properties
URL: https://github.com/apache/carbondata/pull/3063#discussion_r248569407
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ##
 @@ -1176,6 +1177,15 @@ private CarbonCommonConstants() {
*/
   public static final int SORT_SIZE_MIN_VAL = 1000;
 
+  /**
+   * For Range_Column, it will use SCALE_FACTOR to control the size of each 
partition.
+   * When SCALE_FACTOR is about the compression ratio, each task will generate 
one CarbonData file.
+   * And the size of the file is about TABLE_BLOCKSIZE of this table.
 
 Review comment:
   changed


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties

2019-01-17 Thread GitBox
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move 
Range_Column into the table level properties
URL: https://github.com/apache/carbondata/pull/3063#discussion_r248569358
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
 ##
 @@ -1364,6 +1364,30 @@ public int getHeapMemoryPoolingThresholdBytes() {
 return thresholdSize;
   }
 
+  public int getRangeColumnScaleFactor() {
 
 Review comment:
   because this one has a range, it is different with  the above and the below 
one,


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services