[GitHub] carbondata pull request #1496: [CARBONDATA-1709][DataFrame] Support sort_col...

2017-12-13 Thread asfgit
GitHub user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1496


---



2017-11-14 Thread jackylk
GitHub user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1496#discussion_r150824372
  
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataFrame.scala ---
@@ -199,6 +202,61 @@ class TestLoadDataFrame extends QueryTest with BeforeAndAfterAll {
     )
   }
 
+  private def getSortColumnValue(tableName: String) = {
+    val desc = sql(s"desc formatted $tableName")
+    val sortColumnRow = desc.collect.find(r =>
+      r(0).asInstanceOf[String].trim.equalsIgnoreCase("SORT_COLUMNS")
+    )
+    assert(sortColumnRow.isDefined)
+    sortColumnRow.get.get(1).asInstanceOf[String].split(",")
+      .map(_.trim.toLowerCase).filter(_.length > 0)
+  }
+
+  private def getDefaultWriter(tableName: String) = {
--- End diff --

Please add the return type to the function signature.
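
For illustration, an explicit return type on a helper like the one in the diff might look like the sketch below. It is not the code from the pull request: `parseSortColumns` is a hypothetical standalone function that takes the raw `SORT_COLUMNS` value as a string, so it can run without a Spark session.

```scala
object SortColumnsParsing {
  // Hypothetical helper (not part of the PR): the return type is declared
  // explicitly instead of being left to type inference.
  def parseSortColumns(raw: String): Array[String] =
    raw.split(",")               // split the comma-separated value
      .map(_.trim.toLowerCase)   // normalize whitespace and case
      .filter(_.nonEmpty)        // drop empty entries
}
```

Declaring `Array[String]` in the signature lets a reader see what the helper returns without tracing the whole method chain.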


---



2017-11-14 Thread xuchuanyin
GitHub user xuchuanyin opened a pull request:

https://github.com/apache/carbondata/pull/1496

[CARBONDATA-1709][DataFrame] Support sort_columns option in dataframe writer

Be sure to complete the following checklist to help us incorporate 
your contribution quickly and easily:

 - [X] Any interfaces changed?
   `NO`
 - [X] Any backward compatibility impacted?
   `NO`
 - [X] Document update required?
   `NO`
 - [X] Testing done
   Please provide details on:
   - Whether new unit test cases have been added, or why no new tests are required?
     `ADDED NEW TESTS`
   - How is it tested? Please attach the test report.
     `TEST THE CORRECTNESS OF SORT_COLUMNS OPTION`
   - Is it a performance-related change? Please attach the performance test report.
     `NO`
   - Any additional information to help reviewers test this change.
     `NO`
 - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   `UNRELATED`

COPY FROM JIRA
===

When a CarbonData table is created from a DataFrame, the `sort_columns` 
property is not specified, so by default all string columns are used as 
sort columns. An option is therefore needed to specify it, as below:
```scala
df.write
  .format("carbondata")
  .options(...)
  .option("sort_columns", "c1,c2")
  .save
```
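
The default described above, and how an explicit `sort_columns` value would override it, can be modeled roughly as follows. This is a minimal sketch under stated assumptions, not CarbonData's implementation: `Column`, `resolveSortColumns`, and the plain-string type tags are hypothetical names introduced only for illustration.

```scala
object SortColumnsDefault {
  // Hypothetical, simplified column description (not a CarbonData or Spark type).
  final case class Column(name: String, dataType: String)

  // Returns the columns to sort by: the explicit option value if one is given,
  // otherwise every string column, in schema order.
  def resolveSortColumns(schema: Seq[Column], option: Option[String]): Seq[String] =
    option match {
      case Some(raw) =>
        raw.split(",").map(_.trim.toLowerCase).filter(_.nonEmpty).toSeq
      case None =>
        schema.filter(_.dataType.equalsIgnoreCase("string")).map(_.name.toLowerCase)
    }
}
```

With a schema of `c1: string, c2: int, c3: string`, passing `None` selects the two string columns, while `Some("c1,c2")` selects exactly the columns named in the option.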

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata opt_df_write_sort_column

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1496.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1496


commit 0ed4a13a046968be6d8da15262abfd05458bfdd8
Author: xuchuanyin 
Date:   2017-11-14T12:23:35Z

Support sort_columns option in dataframe writer




---