[GitHub] carbondata pull request #3065: [HOTFIX] Optimize presto-guide
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3065#discussion_r246996198 --- Diff: docs/presto-guide.md --- @@ -220,7 +220,8 @@ Now you can use the Presto CLI on the coordinator to query data sources in the c Secondly: Create a folder named 'carbondata' under $PRESTO_HOME$/plugin and copy all jars from carbondata/integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT to $PRESTO_HOME$/plugin/carbondata - + **NOTE:** Not copy one assemble jar, need to copy many jars from integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT --- End diff -- How about: Do not copy the assembled jar; make sure to copy all jars ... ---
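The note under review can be sketched as shell commands. All paths below are illustrative placeholders (this demo even creates dummy jars so it can run anywhere); substitute your real Presto install and the versioned folder under integration/presto/target:

```shell
# Illustrative paths only -- point these at your real Presto install and
# CarbonData build output folder.
PRESTO_HOME=/tmp/presto-demo
CARBON_TARGET=/tmp/carbondata-presto-target

# (demo setup only) create dummy jars so this sketch runs anywhere
mkdir -p "$CARBON_TARGET"
touch "$CARBON_TARGET/carbondata-presto.jar" "$CARBON_TARGET/dep-a.jar" "$CARBON_TARGET/dep-b.jar"

# Create the 'carbondata' plugin folder and copy ALL jars from the target
# folder -- not just a single assembled jar.
mkdir -p "$PRESTO_HOME/plugin/carbondata"
cp "$CARBON_TARGET"/*.jar "$PRESTO_HOME/plugin/carbondata/"

ls "$PRESTO_HOME/plugin/carbondata"
```

The glob `*.jar` is the key point of the review comment: every jar in the target folder must land in the plugin directory, not only the one assembly artifact.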
[GitHub] carbondata pull request #3065: Optimize presto-guide
GitHub user chenliang613 opened a pull request: https://github.com/apache/carbondata/pull/3065 Optimize presto-guide Some users made a mistake: they copied the assembled jar. Add more description to clarify that many jars need to be copied from integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenliang613/carbondata patch-9 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3065.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3065 commit b9b629f5b114ced1034761a1f39ca9c8adda1e8f Author: Liang Chen Date: 2019-01-10T15:28:38Z Optimize presto-guide Some users made a mistake: they copied the assembled jar. Add more description to clarify that many jars need to be copied from integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT ---
[GitHub] carbondata issue #3033: [CARBONDATA-3215] Optimize the documentation
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3033 retest this please ---
[GitHub] carbondata issue #3033: [CARBONDATA-3215] Optimize the documentation
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3033 @sraghunandan please review it. ---
[GitHub] carbondata issue #3021: [CARBONDATA-3193] Cdh5.14.2 spark2.2.0 support
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3021 @chandrasaripaka please let us know if #3026 solved your issues. ---
[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3056 Reviewed, LGTM ---
[GitHub] carbondata issue #3054: [CARBONDATA-3232] Optimize carbonData using alluxio
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3054 The PR title is not consistent with the PR content. How about: Add example for alluxio integration ---
[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Optimize carbonData using a...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3054#discussion_r245676210 --- Diff: README.md --- @@ -68,8 +68,8 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com * [FAQs](https://github.com/apache/carbondata/blob/master/docs/faq.md) ## Integration -* [Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md) -* [Presto](https://github.com/apache/carbondata/blob/master/docs/presto-guide.md) +* [Hive](https://github.com/apache/carbondata/blob/master/docs/Integration/hive-guide.md) --- End diff -- I don't suggest creating many folders under docs. ---
[GitHub] carbondata issue #3036: [CARBONDATA-3208] Remove unused parameters, imports ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3036 LGTM, thanks for the good contributions. ---
[GitHub] carbondata issue #3036: [CARBONDATA-3208]Remove unused parameters and import...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3036 @runzhliu please correct the PR title. ---
[GitHub] carbondata issue #3034: [CARBONDATA-3126]Correct some spell error in CarbonD...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3034 LGTM. One small issue with the PR title: please add a space after [CARBONDATA-3126] ---
[GitHub] carbondata issue #3034: [CARBONDATA-3126]Correct some spell error in CarbonD...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3034 add to whitelist ---
[GitHub] carbondata issue #3030: [HOTFIX] Optimize the code style in csdk/sdk markdow...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3030 LGTM ---
[GitHub] carbondata issue #3019: [CARBONDATA-3194] Integrating Carbon with Presto usi...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3019 This PR removes CarbondataConnector.java by using the hive connector. In the future, if we consider contributing the carbondata integration to the presto community, how will that be handled? ---
[GitHub] carbondata pull request #3019: [CARBONDATA-3194] Integrating Carbon with Pre...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3019#discussion_r244003542 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala --- @@ -72,69 +74,107 @@ object CarbonSessionExample { val path = s"$rootPath/examples/spark2/src/main/resources/data.csv" // scalastyle:off -spark.sql( - s""" - | LOAD DATA LOCAL INPATH '$path' - | INTO TABLE source - | OPTIONS('HEADER'='true', 'COMPLEX_DELIMITER_LEVEL_1'='#') - """.stripMargin) -// scalastyle:on - -spark.sql( - s""" - | SELECT charField, stringField, intField - | FROM source - | WHERE stringfield = 'spark' AND decimalField > 40 - """.stripMargin).show() - -spark.sql( - s""" - | SELECT * - | FROM source WHERE length(stringField) = 5 - """.stripMargin).show() - -spark.sql( - s""" - | SELECT * - | FROM source WHERE date_format(dateField, "-MM-dd") = "2015-07-23" - """.stripMargin).show() - -spark.sql("SELECT count(stringField) FROM source").show() - -spark.sql( - s""" - | SELECT sum(intField), stringField - | FROM source - | GROUP BY stringField - """.stripMargin).show() - -spark.sql( - s""" - | SELECT t1.*, t2.* - | FROM source t1, source t2 - | WHERE t1.stringField = t2.stringField - """.stripMargin).show() - -spark.sql( - s""" - | WITH t1 AS ( - | SELECT * FROM source - | UNION ALL - | SELECT * FROM source - | ) - | SELECT t1.*, t2.* - | FROM t1, source t2 - | WHERE t1.stringField = t2.stringField - """.stripMargin).show() - -spark.sql( - s""" - | SELECT * - | FROM source - | WHERE stringField = 'spark' and floatField > 2.8 - """.stripMargin).show() +//spark.sql( +// s""" +// | LOAD DATA LOCAL INPATH '$path' +// | INTO TABLE source +// | OPTIONS('HEADER'='true', 'COMPLEX_DELIMITER_LEVEL_1'='#') +// """.stripMargin) +//// scalastyle:on +// +//spark.sql( +// s""" +// | CREATE TABLE source_cs( +// | shortField SHORT, +// | intField INT, +// | bigintField LONG, +// | doubleField DOUBLE, +// | stringField STRING, +// | timestampField TIMESTAMP, +// | decimalField DECIMAL(18,2), +// | dateField DATE, +// | charField CHAR(5), +// | floatField FLOAT +// | ) +// | using carbon +// | location 'file://${ExampleUtils.storeLocation}' +// """.stripMargin) +// +//spark.sql("insert into source_cs select * from source") +// +//spark.sql( +// s""" +// | CREATE TABLE source_par( +// | shortField SHORT, +// | intField INT, +// | bigintField LONG, +// | doubleField DOUBLE, +// | stringField STRING, +// | timestampField TIMESTAMP, +// | decimalField DECIMAL(18,2), +// | dateField DATE, +// | charField CHAR(5), +// | floatField FLOAT +// | ) +// | using parquet +// """.stripMargin) +// +//spark.sql("insert into source_par select * from source") +//spark.sql( +// s""" +// | SELECT charField, stringField, intField +// | FROM source +// | WHERE stringfield = 'spark' AND decimalField > 40 +// """.stripMargin).show() +// +//spark.sql( +// s""" +// | SELECT * +// | FROM source WHERE length(stringField) = 5 +// """.stripMargin).show() +// +//spark.sql( +// s""" +// | SELECT * +// | FROM source WHERE date_format(dateField, "-MM-dd") = "2015-07-23" +// """.stripMargin).show() +// +//spark.sql("SELECT count(stringField) FROM source").show() +// +//spark.sql( +// s""" +// | SELECT sum(intField), stringField +// | FROM source +// | GROUP BY stringField +// """.stripMargin).show() +// +//spark.sql( +// s""" --- End diff -- why disable all this code? ---
[GitHub] carbondata issue #3018: [HOTFIX] rename field "thread_pool_size" to match ca...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3018 Can you please explain why the rename is needed? ---
[GitHub] carbondata issue #3021: [CARBONDATA-3193] Cdh5.14.2 spark2.2.0 support
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3021 @chandrasaripaka As far as I know, spark 2.2.0 is not a stable version; it would be better to consider other, more stable versions. ---
[GitHub] carbondata issue #2890: [CARBONDATA-3002] Fix some spell error
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2890 LGTM ---
[GitHub] carbondata issue #2978: [CARBONDATA-3157] Added lazy load and direct vector ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2978 For 1.5.2: can we consider merging vector code such as CarbonVectorBatch from the presto integration module into the core module, or not? ---
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2981 @kunal642 please review it ---
[GitHub] carbondata issue #2978: [WIP] Added lazy load and direct vector fill support...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2978 retest this please ---
[GitHub] carbondata issue #2954: [CARBONDATA-3128]Fix the HiveExample exception
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2954 @SteNicholas Thanks for your good contribution. Can you squash all commits into one commit? ---
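Squashing is usually done with an interactive rebase; a non-interactive equivalent is to soft-reset the feature branch to the base branch and re-commit everything as one commit. The sketch below runs on a throwaway demo repo with illustrative branch and commit names (on a real PR you would finish with `git push --force origin <branch>`):

```shell
# Throwaway demo repo so the squash commands can run anywhere.
rm -rf /tmp/squash-demo
git init -q /tmp/squash-demo
cd /tmp/squash-demo
git symbolic-ref HEAD refs/heads/master   # make the base branch name deterministic

# helper: commit with an in-line identity; --allow-empty keeps the demo data-free
ci() { git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "$1"; }

ci "base"                                 # pretend this is master's tip
git checkout -q -b fix-hive-example
ci "fix part 1"
ci "fix part 2"

# Squash: move the branch pointer back to master but keep the work staged,
# then record it all as a single commit.
git reset -q --soft master
ci "[CARBONDATA-3128] Fix the HiveExample exception"

git rev-list --count master..fix-hive-example   # prints 1
```

After the reset the branch carries exactly one commit on top of master, which is what the reviewer asked for.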
[GitHub] carbondata issue #2961: Fixing the getOrCreateCarbonSession method parameter...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2961 The PR title is not correct; the format should be: [JIRA NUMBER] PR description ---
[GitHub] carbondata issue #2961: Fixing the getOrCreateCarbonSession method parameter...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2961 add to whitelist ---
[GitHub] carbondata issue #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2963 Consider writing an example of how to use MinMaxDataMap to build an index for a CSV file. ---
[GitHub] carbondata issue #2954: [CARBONDATA-3128]Fix the HiveExample exception
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2954 add to whitelist ---
[GitHub] carbondata pull request #2950: [Test PR] How to set PR labels
GitHub user chenliang613 opened a pull request: https://github.com/apache/carbondata/pull/2950 [Test PR] How to set PR labels Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenliang613/carbondata patch-8 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2950.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2950 commit 5a7f32b4e6fe75ab76e31c44bf6541a95e1a3347 Author: Liang Chen Date: 2018-11-24T03:09:25Z [Test PR] How to set PR labels ---
[GitHub] carbondata issue #2934: [CARBONDATA-3111] Readme updated some error links ha...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2934 LGTM ---
[GitHub] carbondata issue #2934: [CARBONDATA-3111] Readme updated some error links ha...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2934 add to whitelist ---
[GitHub] carbondata issue #2890: [CARBONDATA-3002] Fix some spell error
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2890 LGTM ---
[GitHub] carbondata pull request #2838: [HOTFIX] Upgrade pom version to 1.6-SNAPSHOT
GitHub user chenliang613 opened a pull request: https://github.com/apache/carbondata/pull/2838 [HOTFIX] Upgrade pom version to 1.6-SNAPSHOT Upgrade pom version to 1.6-SNAPSHOT You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenliang613/carbondata upgrade_1.6 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2838.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2838 commit 01fe9f12fbadec8f8fcada8f183ec8c7faa4b6b5 Author: chenliang613 Date: 2018-10-20T09:49:25Z [HOTFIX] Upgrade pom version to 1.6-SNAPSHOT ---
[GitHub] carbondata issue #2802: [HOTFIX] Correct Create Table documentation contents
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2802 please rebase it. ---
[GitHub] carbondata issue #2810: [WIP] Add CarbonSession Java Example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2810 add to whitelist ---
[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2779 LGTM ---
[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2779#discussion_r221444503 --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala --- @@ -0,0 +1,55 @@ +/* --- End diff -- ok. ---
[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2779 My comment: copying the whole file (CarbonDataSourceScan.scala) for the spark 2.3 integration just for 4 parameters may not be required. See if a judgement for the different spark versions, with different code/parameters, can be added instead. ---
[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2779#discussion_r221414692 --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala --- @@ -0,0 +1,55 @@ +/* --- End diff -- My comment: copying the whole file (CarbonDataSourceScan.scala) for the spark 2.3 integration just for 4 parameters may not be required. See if a judgement for the different spark versions, with different code/parameters, can be added instead. ---
[GitHub] carbondata pull request #2779: [WIP] Upgrade spark integration version to 2....
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2779#discussion_r221131678 --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala --- @@ -0,0 +1,55 @@ +/* --- End diff -- Why does CarbonDataSourceScan.scala need to be moved? ---
[GitHub] carbondata issue #2777: [HOTFIX] Upgrade spark integration version to 2.3.2
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2777 same as #2779 , so close this pr. ---
[GitHub] carbondata pull request #2777: [HOTFIX] Upgrade spark integration version to...
Github user chenliang613 closed the pull request at: https://github.com/apache/carbondata/pull/2777 ---
[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2779 retest this please ---
[GitHub] carbondata pull request #2777: [HOTFIX] Upgrade spark integration version to...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2777#discussion_r221129765 --- Diff: pom.xml --- @@ -608,13 +608,12 @@ spark-2.3 -2.3.1 +2.3.2 2.11 2.11.8 integration/spark2 -integration/hive --- End diff -- hive integration with spark 2.3.2 is not working. ---
[GitHub] carbondata pull request #2777: [HOTFIX] Upgrade spark integration version to...
GitHub user chenliang613 opened a pull request: https://github.com/apache/carbondata/pull/2777 [HOTFIX] Upgrade spark integration version to 2.3.2 1. Upgrade spark integration version to 2.3.2 2. Currently, the hive integration module is not supported along with spark 2.3.2, so it is removed from the pom. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenliang613/carbondata spark2.3.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2777.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2777 commit e75dde20946873ef42f8c447c71c980375c76a96 Author: chenliang613 Date: 2018-09-27T15:10:15Z [HOTFIX] upgrade spark integration version to 2.3.2 ---
[GitHub] carbondata issue #2733: [CARBONDATA-2818] Upgrade presto integration version...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2733 (screenshot: https://user-images.githubusercontent.com/8075709/45723721-52736800-bbe5-11e8-853f-30f530156396.png) verified! ---
[GitHub] carbondata pull request #2733: [CARBONDATA-2818] Upgrade presto integration ...
GitHub user chenliang613 opened a pull request: https://github.com/apache/carbondata/pull/2733 [CARBONDATA-2818] Upgrade presto integration version to 0.210 As per the mailing list discussion:http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Propose-to-upgrade-the-version-of-integration-presto-from-0-187-to-0-206-td57336.html Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [X] Any interfaces changed? NO - [X] Any backward compatibility impacted? YES - [X] Document update required? YES - [X] Testing done YES - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. YES You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenliang613/carbondata presto_210 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2733.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2733 commit 8ecb48b1d3b9e678f89047b9cc9b0063e435d256 Author: chenliang613 Date: 2018-09-19T00:18:28Z [CARBONDATA-2818] Upgrade presto integration version to 0.210 ---
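With the plugin jars in place, Presto also needs a catalog file before the CLI can query through the connector. A minimal sketch follows; the `$PRESTO_HOME` path is an illustrative placeholder, and beyond the standard `connector.name` key any store-location properties are environment-specific, so check docs/presto-guide.md for the real keys:

```shell
# Illustrative Presto home; replace with your real install path.
PRESTO_HOME=/tmp/presto-demo

# Register a 'carbondata' catalog. connector.name must match the plugin
# folder name created under $PRESTO_HOME/plugin.
mkdir -p "$PRESTO_HOME/etc/catalog"
cat > "$PRESTO_HOME/etc/catalog/carbondata.properties" <<'EOF'
connector.name=carbondata
EOF

cat "$PRESTO_HOME/etc/catalog/carbondata.properties"
```

After restarting the server, the catalog shows up in the CLI (e.g. `presto --catalog carbondata`); this is a sketch of the catalog-registration step, not a complete configuration.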
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2607 LGTM, spark 2.3.1 CI is another issue. ---
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2607 retest this please ---
[GitHub] carbondata issue #2714: [CARBONDATA-2875]Two different threads overwriting t...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2714 add to whitelist ---
[GitHub] carbondata pull request #2691: [CARBONDATA-2912] Support CSV table load csv ...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2691#discussion_r216216999 --- Diff: integration/spark-common-test/src/test/resources/cars.csv --- @@ -0,0 +1,4 @@ +name,age --- End diff -- Can you reuse an existing csv file? No need to add a new one. ---
[GitHub] carbondata issue #2638: [CARBONDATA-2859][SDV] Add sdv test cases for bloomf...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2638 LGTM ---
[GitHub] carbondata issue #2695: [CARBONDATA-2919] Support ingest from Kafka in Strea...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2695 retest this please ---
[GitHub] carbondata issue #2693: [CARBONDATA-2915] Reformat Documentation of CarbonDa...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2693 LGTM ---
[GitHub] carbondata pull request #2693: [CARBONDATA-2915] Reformat Documentation of C...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2693#discussion_r215881276 --- Diff: docs/datamap-developer-guide.md --- @@ -3,14 +3,28 @@ ### Introduction DataMap is a data structure that can be used to accelerate certain query of the table. Different DataMap can be implemented by developers. Currently, there are two 2 types of DataMap supported: -1. IndexDataMap: DataMap that leveraging index to accelerate filter query -2. MVDataMap: DataMap that leveraging Materialized View to accelerate olap style query, like SPJG query (select, predicate, join, groupby) +1. IndexDataMap: DataMap that leverages index to accelerate filter query +2. MVDataMap: DataMap that leverages Materialized View to accelerate olap style query, like SPJG query (select, predicate, join, groupby) ### DataMap provider When user issues `CREATE DATAMAP dm ON TABLE main USING 'provider'`, the corresponding DataMapProvider implementation will be created and initialized. Currently, the provider string can be: -1. preaggregate: one type of MVDataMap that do pre-aggregate of single table -2. timeseries: one type of MVDataMap that do pre-aggregate based on time dimension of the table +1. preaggregate: A type of MVDataMap that do pre-aggregate of single table +2. timeseries: A type of MVDataMap that do pre-aggregate based on time dimension of the table 3. class name IndexDataMapFactory implementation: Developer can implement new type of IndexDataMap by extending IndexDataMapFactory -When user issues `DROP DATAMAP dm ON TABLE main`, the corresponding DataMapProvider interface will be called. \ No newline at end of file +When user issues `DROP DATAMAP dm ON TABLE main`, the corresponding DataMapProvider interface will be called. + +Details about [DataMap Management](./datamap-management.md#datamap-management) and supported [DSL](./datamap-management.md#overview) are documented [here](./datamap-management.md). --- End diff -- this link is not working. ---
[GitHub] carbondata pull request #2693: [CARBONDATA-2915] Reformat Documentation of C...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2693#discussion_r215879697 --- Diff: docs/configuration-parameters.md --- @@ -235,3 +235,16 @@ RESET * Success will be recorded in the driver log. * Failure will be displayed in the UI. + + +
[GitHub] carbondata pull request #2693: [CARBONDATA-2915] Reformat Documentation of C...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2693#discussion_r215879237 --- Diff: docs/carbondata-architecture-design.md --- @@ -0,0 +1,140 @@ +## Architecture + --- End diff -- Please remove this architecture md file from this PR; there is a lot of information in it that needs to be confirmed. It would be better if you could put it on the mailing list for discussion. ---
[GitHub] carbondata issue #2684: [CARBONDATA-2908]the option of sort_scope don't effe...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2684 @qiuchenjian please update the PR's title; it doesn't display completely. ---
[GitHub] carbondata issue #2592: [CARBONDATA-2915] Updated & enhanced Documentation o...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2592 LGTM ---
[GitHub] carbondata pull request #2683: [CARBONDATA-2916] Add CarbonCli tool for data...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2683#discussion_r215669915 --- Diff: pom.xml --- @@ -706,6 +706,12 @@ datamap/mv/core + + tool --- End diff -- suggest using "tools" ---
[GitHub] carbondata pull request #2592: [CARBONDATA-2915] Updated & enhanced Document...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2592#discussion_r215659429 --- Diff: docs/configuration-parameters.md --- @@ -16,152 +16,135 @@ --> # Configuring CarbonData - This tutorial guides you through the advanced configurations of CarbonData : - + This guide explains the configurations that can be used to tune CarbonData to achieve better performance.Some of the properties can be set dynamically and are explained in the section Dynamic Configuration In CarbonData Using SET-RESET.Most of the properties that control the internal settings have reasonable default values.They are listed along with the properties along with explanation. --- End diff -- suggest removing this sentence : Some of the properties can be set dynamically and are explained in the section Dynamic Configuration In CarbonData Using SET-RESET ---
[GitHub] carbondata pull request #2592: [CARBONDATA-2915] Updated & enhanced Document...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2592#discussion_r215655761 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -470,15 +447,6 @@ */ @CarbonProperty public static final String CARBON_DATE_FORMAT = "carbon.date.format"; - /** - * STORE_LOCATION_HDFS - */ - @CarbonProperty - public static final String STORE_LOCATION_HDFS = "carbon.storelocation.hdfs"; --- End diff -- Can you please explain why "STORE_LOCATION_HDFS" needs to be removed? ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 LGTM ---
[GitHub] carbondata issue #2686: upgrade to scala 2.12.6 and binary 2.11
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2686 Same question as @zzcclp. It would be better to raise a discussion on the mailing list first. ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 Two comments: 1. In this example, suggest listing all the typical cases that are currently supported, together with a performance comparison showing how performance improves after creating the mv datamap. 2. No need to add this example to CI because, per the first comment, it contains many performance comparisons. ---
[GitHub] carbondata pull request #2614: [CARBONDATA-2837] Added MVExample in example ...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2614#discussion_r214526743 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/MVDataMapExample.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.examples + +import java.io.File + +import org.apache.spark.sql.SparkSession + +import org.apache.carbondata.examples.util.ExampleUtils + +/** + * This example is for pre-aggregate tables. + */ + +object MVDataMapExample { + + def main(args: Array[String]) { +val spark = ExampleUtils.createCarbonSession("MVDataMapExample") +exampleBody(spark) +spark.close() + } + + def exampleBody(spark: SparkSession): Unit = { +val rootPath = new File(this.getClass.getResource("/").getPath ++ "../../../..").getCanonicalPath +val testData = s"$rootPath/integration/spark-common-test/src/test/resources/sample.csv" + +// 1. simple usage for Pre-aggregate tables creation and query +spark.sql("DROP TABLE IF EXISTS mainTable") +spark.sql("DROP TABLE IF EXISTS dimtable") +spark.sql( + """ +| CREATE TABLE mainTable +| (id Int, +| name String, +| city String, +| age Int) +| STORED BY 'org.apache.carbondata.format' + """.stripMargin) + +spark.sql( + """ +| CREATE TABLE dimtable +| (name String, +| address String) +| STORED BY 'org.apache.carbondata.format' + """.stripMargin) + +spark.sql(s"""LOAD DATA LOCAL INPATH '$testData' into table mainTable""") + +spark.sql(s"""insert into dimtable select name, concat(city, ' street1') as address from + |mainTable group by name, address""".stripMargin) --- End diff -- Why do we need to add "group by name, address"? ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 retest this please ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 retest this please ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 retest this please ---
[GitHub] carbondata issue #2668: [CARBONDATA-2899] Add MV module class to assembly JA...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2668 LGTM ---
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2607 @bhavya411 any new progress ? ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 retest this please ---
[GitHub] carbondata issue #2615: [HOTFIX] [presto] presto integration code cleanup
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2615 LGTM ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 retest this please ---
[GitHub] carbondata issue #2637: [HOTFIX] Correct the sentence to be meaningful
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2637 add to whitelist ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2636 LGTM ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2636 add to whitelist ---
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2607 @bhavya411 I tested this PR; the performance (simple aggregation) shows no improvement (0.206 compared to 0.187). I just checked 0.207 and 0.208: they fix many memory issues, so I propose upgrading to 0.208 for the CarbonData integration. ---
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2607 retest this please ---
[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2620 LGTM ---
[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2620 retest this please ---
[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2620 retest this please ---
[GitHub] carbondata pull request #2620: [CARBONDATA-2839] Add custom compaction examp...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2620#discussion_r208780697 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CustomCompactionExample.scala --- @@ -0,0 +1,69 @@ +package org.apache.carbondata.examples + +import java.io.File + +import org.apache.spark.sql.SparkSession + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.examples.util.ExampleUtils + + +object CustomCompactionExample { + + def main(args: Array[String]): Unit = { +val spark = ExampleUtils.createCarbonSession("CustomCompactionExample") +exampleBody(spark) +spark.close() + } + + def exampleBody(spark : SparkSession): Unit = { +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/MM/dd") + +spark.sql("DROP TABLE IF EXISTS custom_compaction_table") + +spark.sql( + s""" + | CREATE TABLE IF NOT EXISTS custom_compaction_table( + | ID Int, + | date Date, + | country String, + | name String, + | phonetype String, + | serialname String, + | salary Int, + | floatField float + | ) STORED BY 'carbondata' + """.stripMargin) + +val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath +val path = s"$rootPath/examples/spark2/src/main/resources/dataSample.csv" + +// load 4 segments +// scalastyle:off +(1 to 4).foreach(_ => spark.sql( + s""" + | LOAD DATA LOCAL INPATH '$path' + | INTO TABLE custom_compaction_table + | OPTIONS('HEADER'='true') + """.stripMargin)) +// scalastyle:on + +// show all segments: 0,1,2,3 +spark.sql("SHOW SEGMENTS FOR TABLE custom_compaction_table").show() + +// do custom compaction, segments specified will be merged +spark.sql("ALTER TABLE custom_compaction_table COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (1,2)") +spark.sql("SHOW SEGMENTS FOR TABLE custom_compaction_table").show() + +CarbonProperties.getInstance().addProperty( + 
CarbonCommonConstants.CARBON_DATE_FORMAT, + CarbonCommonConstants.CARBON_DATE_DEFAULT_FORMAT) + --- End diff -- After the custom compaction, please query the table data once to verify that it is correct. ---
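The verification step this review asks for could look like the following sketch (assuming the `custom_compaction_table` built by the example above and an active SparkSession `spark`):

```scala
// Sketch only: verify the data after the custom compaction in the example.
// The same CSV was loaded 4 times, so the row count should be 4x the file's
// rows; compaction merges segments without changing the data itself.
spark.sql("SELECT count(*) FROM custom_compaction_table").show()

// Spot-check a few rows to confirm values survived the merge.
spark.sql("SELECT ID, date, country, name FROM custom_compaction_table LIMIT 5").show()

// SHOW SEGMENTS should now list a merged segment, with segments 1 and 2
// marked as Compacted.
spark.sql("SHOW SEGMENTS FOR TABLE custom_compaction_table").show()
```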
[GitHub] carbondata pull request #2620: [CARBONDATA-2839] Add custom compaction examp...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2620#discussion_r208780252 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CustomCompactionExample.scala --- @@ -0,0 +1,69 @@ +package org.apache.carbondata.examples + +import java.io.File + +import org.apache.spark.sql.SparkSession + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.examples.util.ExampleUtils + + --- End diff -- please add a description explaining the example. ---
[GitHub] carbondata pull request #2620: [CARBONDATA-2839] Add custom compaction examp...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2620#discussion_r208780123 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CustomCompactionExample.scala --- @@ -0,0 +1,69 @@ +package org.apache.carbondata.examples --- End diff -- please add the apache license header ---
[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2620 retest this please ---
[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2590 LGTM ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2576 retest this please ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r206553068 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. - - By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + **Bottleneck for Local Dictionary:** The memory size will increase when local dictionary is enabled. --- End diff -- Please change "bottleneck" to "The cost" ---
[GitHub] carbondata issue #2582: [CARBONDATA-2801]Added documentation for flat folder
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2582 LGTM ---
[GitHub] carbondata pull request #2568: [Presto-integration-Technical-note] created d...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2568#discussion_r206490628 --- Diff: integration/presto/Presto-integration-in-carbondata.md --- @@ -0,0 +1,132 @@ + + +# PRESTO INTEGRATION IN CARBONDATA + +1. [Document Purpose](#document-purpose) +1. [Purpose](#purpose) +1. [Scope](#scope) +1. [Definitions and Acronyms](#definitions-and-acronyms) +1. [Requirements addressed](#requirements-addressed) +1. [Design Considerations](#design-considerations) +1. [Row Iterator Implementation](#row-iterator-implementation) +1. [ColumnarReaders or StreamReaders approach](#columnarreaders-or-streamreaders-approach) +1. [Module Structure](#module-structure) +1. [Detailed design](#detailed-design) +1. [Modules](#modules) +1. [Functions Developed](#functions-developed) +1. [Integration Tests](#integration-tests) +1. [Tools and languages used](#tools-and-languages-used) +1. [References](#references) + +## Document Purpose + + * _Purpose_ + The purpose of this document is to outline the technical design of the Presto Integration in CarbonData. + + Its main purpose is to - + * Provide the link between the Functional Requirement and the detailed Technical Design documents. + * Detail the functionality which will be provided by each component or group of components and show how the various components interact in the design. + + This document is not intended to address installation and configuration details of the actual implementation. Installation and configuration details are provided in technology guides provided on CarbonData wiki page. As is true with any high level design, this document will be updated and refined based on changing requirements. + * _Scope_ + Presto Integration with CarbonData will allow execution of CarbonData queries on the Presto CLI. CarbonData can be added easily as a Data Source among the multiple heterogeneous data sources for Presto.
+ * _Definitions and Acronyms_ + **CarbonData :** CarbonData is a fully indexed columnar and Hadoop native data-store for processing heavy analytical workloads and detailed queries on big data. In customer benchmarks, CarbonData has proven to manage Petabyte of data running on extraordinarily low-cost hardware and answers queries around 10 times faster than the current open source solutions (column-oriented SQL on Hadoop data-stores). + + **Presto :** Presto is a distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources. + +## Requirements addressed +This integration of Presto mainly serves two purpose: + * Support of Apache CarbonData as Data Source in Presto. + * Execution of Apache CarbonData Queries on Presto. + +## Design Considerations +Following are the design considerations for the Presto Integration with the Carbondata. + + Row Iterator Implementation + + Presto provides a way to iterate the records through a RecordSetProvider which creates a RecordCursor so we have to extend this class to create a CarbondataRecordSetProvider and CarbondataRecordCursor to read data from Carbondata core module. The CarbondataRecordCursor will utilize the DictionaryBasedResultCollector class of Core module to read data row by row. This approach has two drawbacks. + * The Presto converts this row data into columnar data again since carbondata itself store data in columnar format we are adding an additional conversion to row to column instead of directly using the column. + * The cursor reads the data row by row instead of a batch of data , so this is a costly operation as we are already storing the data in pages or batches we can directly read the batches of data. + + ColumnarReaders or StreamReaders approach + + In this design we can create StreamReaders that can read data from the Carbondata Column based on DataType and directly convert it into Presto Block. 
This approach saves us the row by row processing as well as reduces the transition and conversion of data. By this approach we can achieve the fastest read from Presto and create a Presto Page by extending PageSourceProvider and PageSource class. This design will be discussed in detail in the next sections of this document. + +## Module Structure + + +![module structure](../presto/images/module-structure.jpg?raw=true) + + + +## Detailed design + Modules + +Based on the above functionality, Presto Integration is implemented as the following module: + +1. **Presto** + +Integration of Presto with CarbonData includes implementation of the connector API of Presto. + Functions developed + +![functionas
[GitHub] carbondata pull request #2582: [CARBONDATA-2801]Added documentation for flat...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2582#discussion_r206487262 --- Diff: docs/data-management-on-carbondata.md --- @@ -284,6 +286,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa ALTER TABLE employee SET TBLPROPERTIES ('CACHE_LEVEL'='Blocklet') ``` +- **Support Flat folder** --- End diff -- change to : **Support Flat folder same as Hive/Parquet** ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r206486006 --- Diff: docs/data-management-on-carbondata.md --- @@ -508,6 +511,9 @@ Users can specify which columns to include and exclude for local dictionary gene ``` ALTER TABLE tablename UNSET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE','LOCAL_DICTIONARY_THRESHOLD','LOCAL_DICTIONARY_INCLUDE','LOCAL_DICTIONARY_EXCLUDE') ``` + + **NOTE:** For old tables, by default, local dictionary is disabled. If user wants local dictionary, he/she can enable/disable local dictionary for new data on those tables at his/her discretion. --- End diff -- "he/she" change to "user" ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r206485782 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. --- End diff -- Please explain : what is the cost for enabling local dictionary. ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r206485268 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. - By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + By default, Local Dictionary will be disabled. Users will be able to pass following properties in create table command: | Properties | Default value | Description | | -- | - | --- | - | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table | - | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (range- 1000 to 10) | - | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. | + | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (maximum - 10) | + | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns which are not included in dictionary include| Columns for which Local Dictionary is generated. 
| --- End diff -- "which are not included in dictionary include" -- please refine. ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r206484679 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: --- End diff -- Please add a note listing which data types are not supported. ---
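As an illustration of the local dictionary properties discussed in this review thread, a hedged sketch of a CREATE TABLE that sets them (the table and column names here are hypothetical; the `LOCAL_DICTIONARY_*` property names come from the documentation table and the UNSET example quoted in the diffs above):

```scala
// Sketch only: hypothetical table/column names, assuming a CarbonSession
// bound to `spark`. The property names match those in the doc under review.
spark.sql(
  """CREATE TABLE IF NOT EXISTS sales_local_dict (
    |  id INT,
    |  city STRING,
    |  comment STRING)
    | STORED BY 'carbondata'
    | TBLPROPERTIES (
    |  'LOCAL_DICTIONARY_ENABLE'='true',
    |  'LOCAL_DICTIONARY_THRESHOLD'='10000',
    |  'LOCAL_DICTIONARY_INCLUDE'='city',
    |  'LOCAL_DICTIONARY_EXCLUDE'='comment')""".stripMargin)
```

Here a low-cardinality column (`city`) is included for local dictionary generation, while a free-text column (`comment`) is excluded to avoid the memory cost the review asks to document.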
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r206481821 --- Diff: docs/s3-guide.md --- @@ -0,0 +1,63 @@ + + +#S3 Guide (Alpha Feature 1.4.1) +Amazon S3 is a cloud storage service that is recommended for storing large data files. You can +use this feature if you want to store data on amazon cloud. Since the data is stored on to cloud +storage there are no restrictions on the size of data and the data can be accessed from anywhere at any time. +Carbon can support any Object store that conforms to Amazon S3 API. + +#Writing to Object Store +To store carbondata files on to Object Store location, you need to set `carbon +.storelocation` property to Object Store path in CarbonProperties file. For example, carbon +.storelocation=s3a://mybucket/carbonstore. By setting this property, all the tables will be created on the specified Object Store path. + +If your existing store is HDFS, and you want to store specific tables on S3 location, then `location` parameter has to be set during create +table. +For example: + +``` +CREATE TABLE IF NOT EXISTS db1.table1(col1 string, col2 int) STORED AS carbondata LOCATION 's3a://mybucket/carbonstore' +``` + +For more details on create table, Refer [data-management-on-carbondata](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table) + +#Authentication +You need to set authentication properties to store the carbondata files on to S3 location. 
For +more details on authentication properties, refer +[hadoop authentication document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties) + +Another way of setting the authentication parameters is as follows: + +``` + SparkSession + .builder() + .master(masterURL) + .appName("S3Example") + .config("spark.driver.host", "localhost") + .config("spark.hadoop.fs.s3a.access.key", "") + .config("spark.hadoop.fs.s3a.secret.key", "") + .config("spark.hadoop.fs.s3a.endpoint", "1.1.1.1") + .getOrCreateCarbonSession() +``` + +#Recommendations +1. Object stores like S3 does not support file leasing mechanism(supported by HDFS) that is +required to take locks which ensure consistency between concurrent operations therefore, it is +recommended to set the configurable lock path property([carbon.lock.path](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md#miscellaneous-configuration)) + to a HDFS directory. +2. As Object stores are eventual consistent meaning that any put request can take some time to reflect when trying to list objects from that bucket therefore concurrent queries are not supported. --- End diff -- Changes to : Object Storage ---
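The lock-path recommendation quoted above can be sketched as follows (the HDFS path is a placeholder; `carbon.lock.path` is the configurable property named in the diff):

```scala
import org.apache.carbondata.core.util.CarbonProperties

// Sketch only: when the store location is on S3, point the lock path at an
// HDFS directory so concurrent operations still get consistent locking,
// since S3 lacks the file-leasing mechanism locks rely on.
CarbonProperties.getInstance()
  .addProperty("carbon.lock.path", "hdfs://namenode:8020/carbon/locks")
```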
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r206481369 --- Diff: docs/s3-guide.md --- @@ -0,0 +1,63 @@ + + +#S3 Guide (Alpha Feature 1.4.1) +Amazon S3 is a cloud storage service that is recommended for storing large data files. You can --- End diff -- Suggest changing to: S3 is an object storage API in the cloud; it is recommended for storing large data files. You can use this feature if you want to store data on Amazon cloud or Huawei cloud (OBS). Since the data is stored on cloud storage, there are no restrictions on the size of data, and the data can be accessed from anywhere at any time. CarbonData can support any object storage that conforms to the Amazon S3 API. ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r206480055 --- Diff: docs/datamap/preaggregate-datamap-guide.md --- @@ -7,6 +24,7 @@ * [Querying Data](#querying-data) * [Compaction](#compacting-pre-aggregate-tables) * [Data Management](#data-management-with-pre-aggregate-tables) +* [Limitations](#Limitations) --- End diff -- Why does this item need to be added? ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2576 retest this please ---
[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] add CTable interface in CarbonSto...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2589 Can you explain what "CTable" is for? ---
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] add CTable interface in Ca...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r206458466 --- Diff: store/core/pom.xml --- @@ -48,8 +48,8 @@ org.apache.maven.plugins maven-compiler-plugin - 1.7 - 1.7 + 8 --- End diff -- 1.8 ? ---