[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1321


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-16 Thread chenerlu
Github user chenerlu commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r139301875
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -433,10 +442,23 @@ case class LoadTable(
   val dateFormat = options.getOrElse("dateformat", null)
   ValidateUtil.validateDateFormat(dateFormat, table, tableName)
   val maxColumns = options.getOrElse("maxcolumns", null)
-  val sortScope = options.getOrElse("sort_scope", null)
+
+  val tableProperties = 
table.getTableInfo.getFactTable.getTableProperties
+  val sortScope = if (null == tableProperties) {
+CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT
+  } else {
+tableProperties.getOrDefault("sort_scope",
+  CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT)
+  }
+
--- End diff --

ok, already update.


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-16 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r139300959
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -433,10 +442,23 @@ case class LoadTable(
   val dateFormat = options.getOrElse("dateformat", null)
   ValidateUtil.validateDateFormat(dateFormat, table, tableName)
   val maxColumns = options.getOrElse("maxcolumns", null)
-  val sortScope = options.getOrElse("sort_scope", null)
+
+  val tableProperties = 
table.getTableInfo.getFactTable.getTableProperties
+  val sortScope = if (null == tableProperties) {
+CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT
+  } else {
+tableProperties.getOrDefault("sort_scope",
+  CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT)
+  }
+
--- End diff --

Don't hardcode the sort scope here, you should get from carbon properties. 
Please use the code like below.
```
val sortScopeDefault = CarbonProperties.getInstance().
getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,

CarbonProperties.getInstance().getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
  CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT))
  val sortScope = if (null == tableProperties) {
sortScopeDefault
  } else {
tableProperties.getOrDefault("sort_scope", sortScopeDefault)
  }
```


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-14 Thread chenerlu
Github user chenerlu commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r138853538
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -432,7 +440,9 @@ case class LoadTable(
   val dateFormat = options.getOrElse("dateformat", null)
   ValidateUtil.validateDateFormat(dateFormat, table, tableName)
   val maxColumns = options.getOrElse("maxcolumns", null)
-  val sortScope = options.getOrElse("sort_scope", null)
+
+  val tableProperties = 
table.getTableInfo.getFactTable.getTableProperties
+  val sortScope = if (null == tableProperties) null else 
tableProperties.get("sort_scope")
--- End diff --

I use CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT as its default value.


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-14 Thread chenerlu
Github user chenerlu commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r138852303
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -172,6 +173,13 @@ case class CreateTable(cm: TableModel) extends 
RunnableCommand {
 
 val tableInfo: TableInfo = TableNewProcessor(cm)
 
+// Add validation for sort scope when create table
+val sortScope = 
tableInfo.getFactTable.getTableProperties.get("sort_scope")
+if (null != sortScope && !CarbonUtil.isValidSortOption(sortScope)) {
+  throw new InvalidConfigurationException("The sort scope " + sortScope
--- End diff --

For this, I just keep same with error message which is already exists.


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r138832604
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -874,6 +884,8 @@ private[sql] case class DescribeCommandFormatted(
 results ++= Seq(("CARBON Store Path: ", relation.tableMeta.storePath, 
""))
 val carbonTable = relation.tableMeta.carbonTable
 results ++= Seq(("Table Block Size : ", carbonTable.getBlockSizeInMB + 
" MB", ""))
+results ++= Seq(("SORT_SCOPE", carbonTable.getTableInfo.getFactTable
+  .getTableProperties.getOrDefault("sort_scope", ""), ""))
--- End diff --

Should give a default sort_scope instead of ""


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r138832339
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -432,7 +440,9 @@ case class LoadTable(
   val dateFormat = options.getOrElse("dateformat", null)
   ValidateUtil.validateDateFormat(dateFormat, table, tableName)
   val maxColumns = options.getOrElse("maxcolumns", null)
-  val sortScope = options.getOrElse("sort_scope", null)
+
+  val tableProperties = 
table.getTableInfo.getFactTable.getTableProperties
+  val sortScope = if (null == tableProperties) null else 
tableProperties.get("sort_scope")
--- End diff --

If tableProperties is null, it need to use default sort scope, right?


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r138831288
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -172,6 +173,13 @@ case class CreateTable(cm: TableModel) extends 
RunnableCommand {
 
 val tableInfo: TableInfo = TableNewProcessor(cm)
 
+// Add validation for sort scope when create table
+val sortScope = 
tableInfo.getFactTable.getTableProperties.get("sort_scope")
+if (null != sortScope && !CarbonUtil.isValidSortOption(sortScope)) {
+  throw new InvalidConfigurationException("The sort scope " + sortScope
--- End diff --

suggest to change to 
```
s"Passing invalid sort scope '$sortScope', valid sort scopes are 'NO_SORT', 
'BATCH_SORT', 'LOCAL_SORT' or 'GLOBAL_SORT' "
```


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r138830621
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -172,6 +173,13 @@ case class CreateTable(cm: TableModel) extends 
RunnableCommand {
 
 val tableInfo: TableInfo = TableNewProcessor(cm)
 
+// Add validation for sort scope when create table
+val sortScope = 
tableInfo.getFactTable.getTableProperties.get("sort_scope")
+if (null != sortScope && !CarbonUtil.isValidSortOption(sortScope)) {
--- End diff --

`null != sortScope` can be removed, it is checked inside 
`CarbonUtil.isValidSortOption`


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r138830655
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -344,6 +345,13 @@ case class CreateTable(cm: TableModel, createDSTable: 
Boolean = true) extends Ru
 
 val tableInfo: TableInfo = TableNewProcessor(cm)
 
+// Add validation for sort scope when create table
+val sortScope = 
tableInfo.getFactTable.getTableProperties.get("sort_scope")
+if (null != sortScope && !CarbonUtil.isValidSortOption(sortScope)) {
--- End diff --

null != sortScope can be removed, it is checked inside 
CarbonUtil.isValidSortOption


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-06 Thread chenerlu
Github user chenerlu commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r137442307
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala
 ---
@@ -318,12 +329,12 @@ class TestGlobalSortDataLoad extends QueryTest with 
BeforeAndAfterEach with Befo
  | charField CHAR(5),
  | floatField FLOAT
  | )
- | STORED BY 'org.apache.carbondata.format'
+ | STORED BY 'org.apache.carbondata.format' 
TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT')
""".stripMargin)
 sql(
   s"""
  | LOAD DATA LOCAL INPATH '$path' INTO TABLE 
carbon_globalsort_difftypes
- | OPTIONS('SORT_SCOPE'='GLOBAL_SORT',
+ | OPTIONS(
  | 
'FILEHEADER'='shortField,intField,bigintField,doubleField,stringField,timestampField,decimalField,dateField,charField,floatField')
""".stripMargin)
--- End diff --

ok


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-06 Thread chenerlu
Github user chenerlu commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r137441759
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -639,6 +639,23 @@ case class LoadTable(
 val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
 carbonProperty.addProperty("zookeeper.enable.lock", "false")
 val optionsFinal = getFinalOptions(carbonProperty)
+val tableProperties = relation.tableMeta.carbonTable.getTableInfo
+  .getFactTable.getTableProperties
+
+optionsFinal.put("sort_scope", 
tableProperties.getOrDefault("sort_scope",
+
carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
+  carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
+CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT
+
+optionsFinal.put("batch_sort_size_inmb", 
tableProperties.getOrDefault("batch_sort_size_inmb",
--- End diff --

Yes, this is only needed for batch sort, but I think if users specify this 
parameter in global sort, it is better to ignore it.


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-06 Thread chenerlu
Github user chenerlu commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r137441815
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -639,6 +639,23 @@ case class LoadTable(
 val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
 carbonProperty.addProperty("zookeeper.enable.lock", "false")
 val optionsFinal = getFinalOptions(carbonProperty)
+val tableProperties = relation.tableMeta.carbonTable.getTableInfo
+  .getFactTable.getTableProperties
+
+optionsFinal.put("sort_scope", 
tableProperties.getOrDefault("sort_scope",
+
carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
+  carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
+CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT
+
+optionsFinal.put("batch_sort_size_inmb", 
tableProperties.getOrDefault("batch_sort_size_inmb",
+  
carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_BATCH_SORT_SIZE_INMB,
+
carbonProperty.getProperty(CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB,
+  CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB_DEFAULT
+
+optionsFinal.put("global_sort_partitions", 
tableProperties.getOrDefault("global_sort_partitions",
--- End diff --

Same as batch sort size I think.


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-04 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r136830112
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala
 ---
@@ -318,12 +329,12 @@ class TestGlobalSortDataLoad extends QueryTest with 
BeforeAndAfterEach with Befo
  | charField CHAR(5),
  | floatField FLOAT
  | )
- | STORED BY 'org.apache.carbondata.format'
+ | STORED BY 'org.apache.carbondata.format' 
TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT')
""".stripMargin)
 sql(
   s"""
  | LOAD DATA LOCAL INPATH '$path' INTO TABLE 
carbon_globalsort_difftypes
- | OPTIONS('SORT_SCOPE'='GLOBAL_SORT',
+ | OPTIONS(
  | 
'FILEHEADER'='shortField,intField,bigintField,doubleField,stringField,timestampField,decimalField,dateField,charField,floatField')
""".stripMargin)
--- End diff --

Add a new testcase to test using SORT_SCOPE in LOAD DATA statement, it 
should fail


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-04 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r136829860
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -556,21 +556,21 @@ case class LoadTable(
   
carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT,
 CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT_DEFAULT)))
 
-optionsFinal.put("global_sort_partitions", 
options.getOrElse("global_sort_partitions",
-  carbonProperty
-
.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_GLOBAL_SORT_PARTITIONS, 
null)))
+//optionsFinal.put("global_sort_partitions", 
options.getOrElse("global_sort_partitions",
--- End diff --

delete unused code


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-04 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r136829721
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -639,6 +639,23 @@ case class LoadTable(
 val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
 carbonProperty.addProperty("zookeeper.enable.lock", "false")
 val optionsFinal = getFinalOptions(carbonProperty)
+val tableProperties = relation.tableMeta.carbonTable.getTableInfo
+  .getFactTable.getTableProperties
+
+optionsFinal.put("sort_scope", 
tableProperties.getOrDefault("sort_scope",
+
carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
+  carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
+CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT
+
+optionsFinal.put("batch_sort_size_inmb", 
tableProperties.getOrDefault("batch_sort_size_inmb",
--- End diff --

This is needed only if using batch sort


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-04 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1321#discussion_r136828909
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala
 ---
@@ -109,23 +118,22 @@ class TestGlobalSortDataLoad extends QueryTest with 
BeforeAndAfterEach with Befo
   }
 
   // --- Configuration Validity 
---
-  test("Don't support GLOBAL_SORT on partitioned table") {
+  ignore("Don't support GLOBAL_SORT on partitioned table") {
--- End diff --

why ignore this testcase?


---


[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...

2017-09-04 Thread chenerlu
GitHub user chenerlu opened a pull request:

https://github.com/apache/carbondata/pull/1321

[CARBONDATA-1438] Unify the sort column and sort scope in create table 
command

Background:
In order to improve the ease of usage for users, unify the sort column and 
sort scope in create table command.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenerlu/incubator-carbondata pr-1438

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1321.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1321


commit c20caf973a70e6c6dc94d3cb26bf22404646e8eb
Author: chenerlu 
Date:   2017-09-04T12:54:55Z

Unify the sort column and sort scope in create table command




---