[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties URL: https://github.com/apache/carbondata/pull/3063#discussion_r248571391 ## File path: integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala ## @@ -511,6 +515,21 @@ object AlterTableUtil { } } + def validateRangeColumnProperties(carbonTable: CarbonTable, + propertiesMap: mutable.Map[String, String]): Unit = { +if (propertiesMap.get(CarbonCommonConstants.RANGE_COLUMN).isDefined) { + val rangeColumnProp = propertiesMap.get(CarbonCommonConstants.RANGE_COLUMN).get Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties URL: https://github.com/apache/carbondata/pull/3063#discussion_r248571355 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -787,6 +787,19 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } +// range_column should be there in create table cols +if (tableProperties.get(CarbonCommonConstants.RANGE_COLUMN).isDefined) { + val rangeColumn = tableProperties.get(CarbonCommonConstants.RANGE_COLUMN).get.trim Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties URL: https://github.com/apache/carbondata/pull/3063#discussion_r248570562 ## File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ## @@ -219,24 +219,25 @@ case class CarbonLoadDataCommand( carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT)) - optionsFinal -.put("bad_record_path", CarbonBadRecordUtil.getBadRecordsPath(options.asJava, table)) - val factPath = if (dataFrame.isDefined) { -"" - } else { -FileUtils.getPaths(factPathFromUser, hadoopConf) - } - carbonLoadModel.setParentTablePath(parentTablePath) - carbonLoadModel.setFactFilePath(factPath) - carbonLoadModel.setCarbonTransactionalTable(table.getTableInfo.isTransactionalTable) - carbonLoadModel.setAggLoadRequest( -internalOptions.getOrElse(CarbonCommonConstants.IS_INTERNAL_LOAD_CALL, - CarbonCommonConstants.IS_INTERNAL_LOAD_CALL_DEFAULT).toBoolean) - carbonLoadModel.setSegmentId(internalOptions.getOrElse("mergedSegmentName", "")) - val columnCompressor = table.getTableInfo.getFactTable.getTableProperties.asScala -.getOrElse(CarbonCommonConstants.COMPRESSOR, - CompressorFactory.getInstance().getCompressor.getName) - carbonLoadModel.setColumnCompressor(columnCompressor) +optionsFinal + .put("bad_record_path", CarbonBadRecordUtil.getBadRecordsPath(options.asJava, table)) +val factPath = if (dataFrame.isDefined) { + "" +} else { + FileUtils.getPaths(factPathFromUser, hadoopConf) +} +carbonLoadModel.setParentTablePath(parentTablePath) +carbonLoadModel.setFactFilePath(factPath) + carbonLoadModel.setCarbonTransactionalTable(table.getTableInfo.isTransactionalTable) +carbonLoadModel.setAggLoadRequest( + internalOptions.getOrElse(CarbonCommonConstants.IS_INTERNAL_LOAD_CALL, +CarbonCommonConstants.IS_INTERNAL_LOAD_CALL_DEFAULT).toBoolean) + carbonLoadModel.setSegmentId(internalOptions.getOrElse("mergedSegmentName", "")) +val columnCompressor = table.getTableInfo.getFactTable.getTableProperties.asScala + .getOrElse(CarbonCommonConstants.COMPRESSOR, +CompressorFactory.getInstance().getCompressor.getName) +carbonLoadModel.setColumnCompressor(columnCompressor) +carbonLoadModel.setRangePartitionColumn(table.getRangeColumn) Review comment: you found a bug of Github... This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties URL: https://github.com/apache/carbondata/pull/3063#discussion_r248569818 ## File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestRangeColumnDataLoad.scala ## @@ -99,19 +114,53 @@ class TestRangeColumnDataLoad extends QueryTest with BeforeAndAfterEach with Bef """ | CREATE TABLE carbon_range_column4(id INT, name STRING, city STRING, age INT) | STORED BY 'org.apache.carbondata.format' -| TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT', 'SORT_COLUMNS'='name, city') +| TBLPROPERTIES('SORT_SCOPE'='LOCAL_SORT', 'SORT_COLUMNS'='name, city', 'range_column'='name') """.stripMargin) val dataSkewPath = s"$resourcesPath/range_column" sql(s"LOAD DATA LOCAL INPATH '$dataSkewPath' INTO TABLE carbon_range_column4 " + -"OPTIONS('GLOBAL_SORT_PARTITIONS'='5', 'range_column'='name', " + -"'BAD_RECORDS_ACTION'='force')") +"OPTIONS('GLOBAL_SORT_PARTITIONS'='5', 'BAD_RECORDS_ACTION'='force')") assert(getIndexFileCount("carbon_range_column4") === 5) checkAnswer(sql("SELECT COUNT(*) FROM carbon_range_column4"), Seq(Row(10))) } + test("range_column with system property carbon.range.column.scale.factor") { +CarbonProperties.getInstance().addProperty( + CarbonCommonConstants.CARBON_RANGE_COLUMN_SCALE_FACTOR, + "10" +) + +sql( + """ +| CREATE TABLE carbon_range_column5(id INT, name STRING, city STRING, age INT) +| STORED BY 'org.apache.carbondata.format' +| TBLPROPERTIES('SORT_SCOPE'='LOCAL_SORT', 'SORT_COLUMNS'='name, city', 'range_column'='name') + """.stripMargin) + +sql(s"LOAD DATA LOCAL INPATH '$filePath' INTO TABLE carbon_range_column5 ") + +assert(getIndexFileCount("carbon_range_column5") === 1) +checkAnswer(sql("SELECT COUNT(*) FROM carbon_range_column5"), Seq(Row(12))) +checkAnswer(sql("SELECT * FROM carbon_range_column5"), + sql("SELECT * FROM carbon_range_column5 ORDER BY name")) + } + + test("set and unset table property: range_column") { +sql( + """ +| CREATE TABLE carbon_range_column6(id INT, name STRING, city STRING, age INT) +| STORED BY 'org.apache.carbondata.format' +| TBLPROPERTIES('SORT_SCOPE'='LOCAL_SORT', 'SORT_COLUMNS'='name, city') + """.stripMargin) + +sql("ALTER TABLE carbon_range_column6 SET TBLPROPERTIES('range_column'='city')") +sql("ALTER TABLE carbon_range_column6 SET TBLPROPERTIES('range_column'='name')") +sql("ALTER TABLE carbon_range_column6 UNSET TBLPROPERTIES('range_column')") +sql("ALTER TABLE carbon_range_column6 SET TBLPROPERTIES('range_column'='name')") + } + Review comment: now only support load data command This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties URL: https://github.com/apache/carbondata/pull/3063#discussion_r248569666 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -787,6 +787,19 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } +// range_column should be there in create table cols +if (tableProperties.get(CarbonCommonConstants.RANGE_COLUMN).isDefined) { + val rangeColumn = tableProperties.get(CarbonCommonConstants.RANGE_COLUMN).get.trim + val rangeField = fields.find(_.column.equalsIgnoreCase(rangeColumn)) + if (rangeField.isEmpty) { +val errorMsg = "range_column: " + rangeColumn + + " does not exist in table. Please check the create table statement." +throw new MalformedCarbonCommandException(errorMsg) + } else { +tableProperties.put(CarbonCommonConstants.RANGE_COLUMN, rangeField.get.column) Review comment: both yes This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties URL: https://github.com/apache/carbondata/pull/3063#discussion_r248569407 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -1176,6 +1177,15 @@ private CarbonCommonConstants() { */ public static final int SORT_SIZE_MIN_VAL = 1000; + /** + * For Range_Column, it will use SCALE_FACTOR to control the size of each partition. + * When SCALE_FACTOR is about the compression ratio, each task will generate one CarbonData file. + * And the size of the file is about TABLE_BLOCKSIZE of this table. Review comment: changed This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties
QiangCai commented on a change in pull request #3063: [CARBONDATA-3242] Move Range_Column into the table level properties URL: https://github.com/apache/carbondata/pull/3063#discussion_r248569358 ## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ## @@ -1364,6 +1364,30 @@ public int getHeapMemoryPoolingThresholdBytes() { return thresholdSize; } + public int getRangeColumnScaleFactor() { Review comment: because this one has a range, it is different with the above and the below one, This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services