[GitHub] carbondata pull request #3065: [HOTFIX] Optimize presto-guide
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3065#discussion_r246996198 --- Diff: docs/presto-guide.md --- @@ -220,7 +220,8 @@ Now you can use the Presto CLI on the coordinator to query data sources in the c Secondly: Create a folder named 'carbondata' under $PRESTO_HOME$/plugin and copy all jars from carbondata/integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT to $PRESTO_HOME$/plugin/carbondata - + **NOTE:** Not copy one assemble jar, need to copy many jars from integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT --- End diff -- How about: Do not copy the assembled jar; make sure to copy all jars ... ---
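The note under review can be sketched as shell commands. All paths below are illustrative placeholders (this demo even creates dummy jars so it can run anywhere); substitute your real Presto install and the versioned folder under integration/presto/target:

```shell
# Illustrative paths only -- point these at your real Presto install and
# CarbonData build output folder.
PRESTO_HOME=/tmp/presto-demo
CARBON_TARGET=/tmp/carbondata-presto-target

# (demo setup only) create dummy jars so this sketch runs anywhere
mkdir -p "$CARBON_TARGET"
touch "$CARBON_TARGET/carbondata-presto.jar" "$CARBON_TARGET/dep-a.jar" "$CARBON_TARGET/dep-b.jar"

# Create the 'carbondata' plugin folder and copy ALL jars from the target
# folder -- not just a single assembled jar.
mkdir -p "$PRESTO_HOME/plugin/carbondata"
cp "$CARBON_TARGET"/*.jar "$PRESTO_HOME/plugin/carbondata/"

ls "$PRESTO_HOME/plugin/carbondata"
```

The glob `*.jar` is the key point of the review comment: every jar in the target folder must land in the plugin directory, not only the one assembly artifact.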
[GitHub] carbondata pull request #3065: Optimize presto-guide
GitHub user chenliang613 opened a pull request: https://github.com/apache/carbondata/pull/3065 Optimize presto-guide Some users made a mistake: they copied the assembled jar. Add more description to clarify that many jars need to be copied from integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenliang613/carbondata patch-9 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3065.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3065 commit b9b629f5b114ced1034761a1f39ca9c8adda1e8f Author: Liang Chen Date: 2019-01-10T15:28:38Z Optimize presto-guide Some users made a mistake: they copied the assembled jar. Add more description to clarify that many jars need to be copied from integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT ---
[GitHub] carbondata issue #3033: [CARBONDATA-3215] Optimize the documentation
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3033 retest this please ---
[GitHub] carbondata issue #3033: [CARBONDATA-3215] Optimize the documentation
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3033 @sraghunandan please review it. ---
[GitHub] carbondata issue #3021: [CARBONDATA-3193] Cdh5.14.2 spark2.2.0 support
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3021 @chandrasaripaka please let us know if #3026 solved your issues. ---
[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3056 Reviewed, LGTM ---
[GitHub] carbondata issue #3054: [CARBONDATA-3232] Optimize carbonData using alluxio
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3054 The PR title is not consistent with the PR content. How about: Add example for alluxio integration ---
[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Optimize carbonData using a...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3054#discussion_r245676210 --- Diff: README.md --- @@ -68,8 +68,8 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com * [FAQs](https://github.com/apache/carbondata/blob/master/docs/faq.md) ## Integration -* [Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md) -* [Presto](https://github.com/apache/carbondata/blob/master/docs/presto-guide.md) +* [Hive](https://github.com/apache/carbondata/blob/master/docs/Integration/hive-guide.md) --- End diff -- I don't suggest creating many folders under docs. ---
[GitHub] carbondata issue #3036: [CARBONDATA-3208] Remove unused parameters, imports ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3036 LGTM, thanks for the good contributions. ---
[GitHub] carbondata issue #3036: [CARBONDATA-3208]Remove unused parameters and import...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3036 @runzhliu please correct the PR title. ---
[GitHub] carbondata issue #3034: [CARBONDATA-3126]Correct some spell error in CarbonD...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3034 LGTM. One small issue with the PR title: please add a space after [CARBONDATA-3126] ---
[GitHub] carbondata issue #3034: [CARBONDATA-3126]Correct some spell error in CarbonD...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3034 add to whitelist ---
[GitHub] carbondata issue #3030: [HOTFIX] Optimize the code style in csdk/sdk markdow...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3030 LGTM ---
[GitHub] carbondata issue #3019: [CARBONDATA-3194] Integrating Carbon with Presto usi...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3019 This PR removes CarbondataConnector.java by using the hive connector. In the future, if we consider contributing the carbondata integration to the presto community, how will that be handled? ---
[GitHub] carbondata pull request #3019: [CARBONDATA-3194] Integrating Carbon with Pre...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3019#discussion_r244003542 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala --- @@ -72,69 +74,107 @@ object CarbonSessionExample { val path = s"$rootPath/examples/spark2/src/main/resources/data.csv" // scalastyle:off -spark.sql( - s""" - | LOAD DATA LOCAL INPATH '$path' - | INTO TABLE source - | OPTIONS('HEADER'='true', 'COMPLEX_DELIMITER_LEVEL_1'='#') - """.stripMargin) -// scalastyle:on - -spark.sql( - s""" - | SELECT charField, stringField, intField - | FROM source - | WHERE stringfield = 'spark' AND decimalField > 40 - """.stripMargin).show() - -spark.sql( - s""" - | SELECT * - | FROM source WHERE length(stringField) = 5 - """.stripMargin).show() - -spark.sql( - s""" - | SELECT * - | FROM source WHERE date_format(dateField, "-MM-dd") = "2015-07-23" - """.stripMargin).show() - -spark.sql("SELECT count(stringField) FROM source").show() - -spark.sql( - s""" - | SELECT sum(intField), stringField - | FROM source - | GROUP BY stringField - """.stripMargin).show() - -spark.sql( - s""" - | SELECT t1.*, t2.* - | FROM source t1, source t2 - | WHERE t1.stringField = t2.stringField - """.stripMargin).show() - -spark.sql( - s""" - | WITH t1 AS ( - | SELECT * FROM source - | UNION ALL - | SELECT * FROM source - | ) - | SELECT t1.*, t2.* - | FROM t1, source t2 - | WHERE t1.stringField = t2.stringField - """.stripMargin).show() - -spark.sql( - s""" - | SELECT * - | FROM source - | WHERE stringField = 'spark' and floatField > 2.8 - """.stripMargin).show() +//spark.sql( +// s""" +// | LOAD DATA LOCAL INPATH '$path' +// | INTO TABLE source +// | OPTIONS('HEADER'='true', 'COMPLEX_DELIMITER_LEVEL_1'='#') +// """.stripMargin) +//// scalastyle:on +// +//spark.sql( +// s""" +// | CREATE TABLE source_cs( +// | shortField SHORT, +// | intField INT, +// | bigintField LONG, +// | doubleField DOUBLE, +// | stringField STRING, +// | timestampField TIMESTAMP, +// | decimalField DECIMAL(18,2), +// | dateField DATE, +// | charField CHAR(5), +// | floatField FLOAT +// | ) +// | using carbon +// | location 'file://${ExampleUtils.storeLocation}' +// """.stripMargin) +// +//spark.sql("insert into source_cs select * from source") +// +//spark.sql( +// s""" +// | CREATE TABLE source_par( +// | shortField SHORT, +// | intField INT, +// | bigintField LONG, +// | doubleField DOUBLE, +// | stringField STRING, +// | timestampField TIMESTAMP, +// | decimalField DECIMAL(18,2), +// | dateField DATE, +// | charField CHAR(5), +// | floatField FLOAT +// | ) +// | using parquet +// """.stripMargin) +// +//spark.sql("insert into source_par select * from source") +//spark.sql( +// s""" +// | SELECT charField, stringField, intField +// | FROM source +// | WHERE stringfield = 'spark' AND decimalField > 40 +// """.stripMargin).show() +// +//spark.sql( +// s""" +// | SELECT * +// | FROM source WHERE length(stringField) = 5 +// """.stripMargin).show() +// +//spark.sql( +// s""" +// | SELECT * +// | FROM source WHERE date_format(dateField, "-MM-dd") = "2015-07-23" +// """.stripMargin).show() +// +//spark.sql("SELECT count(stringField) FROM source").show() +// +//spark.sql( +// s""" +// | SELECT sum(intField), stringField +// | FROM source +// | GROUP BY stringField +// """.stripMargin).show() +// +//spark.sql( +// s""" --- End diff -- why disable all this code? ---
[GitHub] carbondata issue #3018: [HOTFIX] rename field "thread_pool_size" to match ca...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3018 Can you please explain why the rename is needed? ---
[GitHub] carbondata issue #3021: [CARBONDATA-3193] Cdh5.14.2 spark2.2.0 support
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3021 @chandrasaripaka As far as I know, spark 2.2.0 is not a stable version; it would be better to consider other, more stable versions. ---
[GitHub] carbondata issue #2890: [CARBONDATA-3002] Fix some spell error
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2890 LGTM ---
[GitHub] carbondata issue #2978: [CARBONDATA-3157] Added lazy load and direct vector ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2978 For 1.5.2: can we consider merging vector code such as CarbonVectorBatch from the presto integration module into the core module, or not? ---
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2981 @kunal642 please review it ---
[GitHub] carbondata issue #2978: [WIP] Added lazy load and direct vector fill support...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2978 retest this please ---
[GitHub] carbondata issue #2954: [CARBONDATA-3128]Fix the HiveExample exception
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2954 @SteNicholas Thanks for your good contribution. Can you squash all commits into one commit? ---
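Squashing is usually done with an interactive rebase; a non-interactive equivalent is to soft-reset the feature branch to the base branch and re-commit everything as one commit. The sketch below runs on a throwaway demo repo with illustrative branch and commit names (on a real PR you would finish with `git push --force origin <branch>`):

```shell
# Throwaway demo repo so the squash commands can run anywhere.
rm -rf /tmp/squash-demo
git init -q /tmp/squash-demo
cd /tmp/squash-demo
git symbolic-ref HEAD refs/heads/master   # make the base branch name deterministic

# helper: commit with an in-line identity; --allow-empty keeps the demo data-free
ci() { git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "$1"; }

ci "base"                                 # pretend this is master's tip
git checkout -q -b fix-hive-example
ci "fix part 1"
ci "fix part 2"

# Squash: move the branch pointer back to master but keep the work staged,
# then record it all as a single commit.
git reset -q --soft master
ci "[CARBONDATA-3128] Fix the HiveExample exception"

git rev-list --count master..fix-hive-example   # prints 1
```

After the reset the branch carries exactly one commit on top of master, which is what the reviewer asked for.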
[GitHub] carbondata issue #2961: Fixing the getOrCreateCarbonSession method parameter...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2961 The PR title is not correct; the format should be: [JIRA NUMBER] PR description ---
[GitHub] carbondata issue #2961: Fixing the getOrCreateCarbonSession method parameter...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2961 add to whitelist ---
[GitHub] carbondata issue #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2963 Consider writing an example of how to use MinMaxDataMap to build an index for a CSV file. ---
[GitHub] carbondata issue #2954: [CARBONDATA-3128]Fix the HiveExample exception
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2954 add to whitelist ---
[GitHub] carbondata pull request #2950: [Test PR] How to set PR labels
GitHub user chenliang613 opened a pull request: https://github.com/apache/carbondata/pull/2950 [Test PR] How to set PR labels Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenliang613/carbondata patch-8 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2950.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2950 commit 5a7f32b4e6fe75ab76e31c44bf6541a95e1a3347 Author: Liang Chen Date: 2018-11-24T03:09:25Z [Test PR] How to set PR labels ---
[GitHub] carbondata issue #2934: [CARBONDATA-3111] Readme updated some error links ha...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2934 LGTM ---
[GitHub] carbondata issue #2934: [CARBONDATA-3111] Readme updated some error links ha...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2934 add to whitelist ---
[GitHub] carbondata issue #2890: [CARBONDATA-3002] Fix some spell error
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2890 LGTM ---
[GitHub] carbondata pull request #2838: [HOTFIX] Upgrade pom version to 1.6-SNAPSHOT
GitHub user chenliang613 opened a pull request: https://github.com/apache/carbondata/pull/2838 [HOTFIX] Upgrade pom version to 1.6-SNAPSHOT Upgrade pom version to 1.6-SNAPSHOT You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenliang613/carbondata upgrade_1.6 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2838.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2838 commit 01fe9f12fbadec8f8fcada8f183ec8c7faa4b6b5 Author: chenliang613 Date: 2018-10-20T09:49:25Z [HOTFIX] Upgrade pom version to 1.6-SNAPSHOT ---
[GitHub] carbondata issue #2802: [HOTFIX] Correct Create Table documentation contents
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2802 please rebase it. ---
[GitHub] carbondata issue #2810: [WIP] Add CarbonSession Java Example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2810 add to whitelist ---
[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2779 LGTM ---
[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2779#discussion_r221444503 --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala --- @@ -0,0 +1,55 @@ +/* --- End diff -- ok. ---
[GitHub] carbondata issue #2779: [CARBONDATA-2989] Upgrade spark integration version ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2779 My comment: copying the whole file (CarbonDataSourceScan.scala) for the spark 2.3 integration just for 4 parameters may not be required. See if a judgement for the different spark versions, with different code/parameters, can be added instead. ---
[GitHub] carbondata pull request #2779: [CARBONDATA-2989] Upgrade spark integration v...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2779#discussion_r221414692 --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala --- @@ -0,0 +1,55 @@ +/* --- End diff -- My comment: copying the whole file (CarbonDataSourceScan.scala) for the spark 2.3 integration just for 4 parameters may not be required. See if a judgement for the different spark versions, with different code/parameters, can be added instead. ---
[GitHub] carbondata pull request #2779: [WIP] Upgrade spark integration version to 2....
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2779#discussion_r221131678 --- Diff: integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala --- @@ -0,0 +1,55 @@ +/* --- End diff -- Why does CarbonDataSourceScan.scala need to be moved? ---
[GitHub] carbondata issue #2777: [HOTFIX] Upgrade spark integration version to 2.3.2
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2777 same as #2779 , so close this pr. ---
[GitHub] carbondata pull request #2777: [HOTFIX] Upgrade spark integration version to...
Github user chenliang613 closed the pull request at: https://github.com/apache/carbondata/pull/2777 ---
[GitHub] carbondata issue #2779: [WIP] Upgrade spark integration version to 2.3.2
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2779 retest this please ---
[GitHub] carbondata pull request #2777: [HOTFIX] Upgrade spark integration version to...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2777#discussion_r221129765 --- Diff: pom.xml --- @@ -608,13 +608,12 @@ spark-2.3 -2.3.1 +2.3.2 2.11 2.11.8 integration/spark2 -integration/hive --- End diff -- hive integration with spark 2.3.2 is not working. ---
[GitHub] carbondata pull request #2777: [HOTFIX] Upgrade spark integration version to...
GitHub user chenliang613 opened a pull request: https://github.com/apache/carbondata/pull/2777 [HOTFIX] Upgrade spark integration version to 2.3.2 1. Upgrade spark integration version to 2.3.2 2. Currently, the hive integration module is not supported along with spark 2.3.2, so it is removed from the pom. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenliang613/carbondata spark2.3.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2777.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2777 commit e75dde20946873ef42f8c447c71c980375c76a96 Author: chenliang613 Date: 2018-09-27T15:10:15Z [HOTFIX] upgrade spark integration version to 2.3.2 ---
[GitHub] carbondata issue #2733: [CARBONDATA-2818] Upgrade presto integration version...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2733 (screenshot: https://user-images.githubusercontent.com/8075709/45723721-52736800-bbe5-11e8-853f-30f530156396.png) verified! ---
[GitHub] carbondata pull request #2733: [CARBONDATA-2818] Upgrade presto integration ...
GitHub user chenliang613 opened a pull request: https://github.com/apache/carbondata/pull/2733 [CARBONDATA-2818] Upgrade presto integration version to 0.210 As per the mailing list discussion:http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Propose-to-upgrade-the-version-of-integration-presto-from-0-187-to-0-206-td57336.html Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [X] Any interfaces changed? NO - [X] Any backward compatibility impacted? YES - [X] Document update required? YES - [X] Testing done YES - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. YES You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenliang613/carbondata presto_210 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2733.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2733 commit 8ecb48b1d3b9e678f89047b9cc9b0063e435d256 Author: chenliang613 Date: 2018-09-19T00:18:28Z [CARBONDATA-2818] Upgrade presto integration version to 0.210 ---
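With the plugin jars in place, Presto also needs a catalog file before the CLI can query through the connector. A minimal sketch follows; the `$PRESTO_HOME` path is an illustrative placeholder, and beyond the standard `connector.name` key any store-location properties are environment-specific, so check docs/presto-guide.md for the real keys:

```shell
# Illustrative Presto home; replace with your real install path.
PRESTO_HOME=/tmp/presto-demo

# Register a 'carbondata' catalog. connector.name must match the plugin
# folder name created under $PRESTO_HOME/plugin.
mkdir -p "$PRESTO_HOME/etc/catalog"
cat > "$PRESTO_HOME/etc/catalog/carbondata.properties" <<'EOF'
connector.name=carbondata
EOF

cat "$PRESTO_HOME/etc/catalog/carbondata.properties"
```

After restarting the server, the catalog shows up in the CLI (e.g. `presto --catalog carbondata`); this is a sketch of the catalog-registration step, not a complete configuration.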
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2607 LGTM, spark 2.3.1 CI is another issue. ---
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2607 retest this please ---
[GitHub] carbondata issue #2714: [CARBONDATA-2875]Two different threads overwriting t...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2714 add to whitelist ---
[GitHub] carbondata pull request #2691: [CARBONDATA-2912] Support CSV table load csv ...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2691#discussion_r216216999 --- Diff: integration/spark-common-test/src/test/resources/cars.csv --- @@ -0,0 +1,4 @@ +name,age --- End diff -- Can you reuse an existing csv file? No need to add a new one. ---
[GitHub] carbondata issue #2638: [CARBONDATA-2859][SDV] Add sdv test cases for bloomf...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2638 LGTM ---
[GitHub] carbondata issue #2695: [CARBONDATA-2919] Support ingest from Kafka in Strea...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2695 retest this please ---
[GitHub] carbondata issue #2693: [CARBONDATA-2915] Reformat Documentation of CarbonDa...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2693 LGTM ---
[GitHub] carbondata pull request #2693: [CARBONDATA-2915] Reformat Documentation of C...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2693#discussion_r215881276 --- Diff: docs/datamap-developer-guide.md --- @@ -3,14 +3,28 @@ ### Introduction DataMap is a data structure that can be used to accelerate certain query of the table. Different DataMap can be implemented by developers. Currently, there are two 2 types of DataMap supported: -1. IndexDataMap: DataMap that leveraging index to accelerate filter query -2. MVDataMap: DataMap that leveraging Materialized View to accelerate olap style query, like SPJG query (select, predicate, join, groupby) +1. IndexDataMap: DataMap that leverages index to accelerate filter query +2. MVDataMap: DataMap that leverages Materialized View to accelerate olap style query, like SPJG query (select, predicate, join, groupby) ### DataMap provider When user issues `CREATE DATAMAP dm ON TABLE main USING 'provider'`, the corresponding DataMapProvider implementation will be created and initialized. Currently, the provider string can be: -1. preaggregate: one type of MVDataMap that do pre-aggregate of single table -2. timeseries: one type of MVDataMap that do pre-aggregate based on time dimension of the table +1. preaggregate: A type of MVDataMap that do pre-aggregate of single table +2. timeseries: A type of MVDataMap that do pre-aggregate based on time dimension of the table 3. class name IndexDataMapFactory implementation: Developer can implement new type of IndexDataMap by extending IndexDataMapFactory -When user issues `DROP DATAMAP dm ON TABLE main`, the corresponding DataMapProvider interface will be called. \ No newline at end of file +When user issues `DROP DATAMAP dm ON TABLE main`, the corresponding DataMapProvider interface will be called. + +Details about [DataMap Management](./datamap-management.md#datamap-management) and supported [DSL](./datamap-management.md#overview) are documented [here](./datamap-management.md). --- End diff -- this link is not working. ---
[GitHub] carbondata pull request #2693: [CARBONDATA-2915] Reformat Documentation of C...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2693#discussion_r215879697 --- Diff: docs/configuration-parameters.md --- @@ -235,3 +235,16 @@ RESET * Success will be recorded in the driver log. * Failure will be displayed in the UI. + + +
[GitHub] carbondata pull request #2693: [CARBONDATA-2915] Reformat Documentation of C...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2693#discussion_r215879237 --- Diff: docs/carbondata-architecture-design.md --- @@ -0,0 +1,140 @@ +## Architecture + --- End diff -- Please remove this architecture md file from this PR; there is a lot of information in it that needs to be confirmed. It would be better if you could put it on the mailing list for discussion. ---
[GitHub] carbondata issue #2684: [CARBONDATA-2908]the option of sort_scope don't effe...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2684 @qiuchenjian please update the PR's title; it doesn't display completely. ---
[GitHub] carbondata issue #2592: [CARBONDATA-2915] Updated & enhanced Documentation o...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2592 LGTM ---
[GitHub] carbondata pull request #2683: [CARBONDATA-2916] Add CarbonCli tool for data...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2683#discussion_r215669915 --- Diff: pom.xml --- @@ -706,6 +706,12 @@ datamap/mv/core + + tool --- End diff -- suggest using "tools" ---
[GitHub] carbondata pull request #2592: [CARBONDATA-2915] Updated & enhanced Document...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2592#discussion_r215659429 --- Diff: docs/configuration-parameters.md --- @@ -16,152 +16,135 @@ --> # Configuring CarbonData - This tutorial guides you through the advanced configurations of CarbonData : - + This guide explains the configurations that can be used to tune CarbonData to achieve better performance.Some of the properties can be set dynamically and are explained in the section Dynamic Configuration In CarbonData Using SET-RESET.Most of the properties that control the internal settings have reasonable default values.They are listed along with the properties along with explanation. --- End diff -- suggest removing this sentence : Some of the properties can be set dynamically and are explained in the section Dynamic Configuration In CarbonData Using SET-RESET ---
[GitHub] carbondata pull request #2592: [CARBONDATA-2915] Updated & enhanced Document...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2592#discussion_r215655761 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -470,15 +447,6 @@ */ @CarbonProperty public static final String CARBON_DATE_FORMAT = "carbon.date.format"; - /** - * STORE_LOCATION_HDFS - */ - @CarbonProperty - public static final String STORE_LOCATION_HDFS = "carbon.storelocation.hdfs"; --- End diff -- Can you please explain why "STORE_LOCATION_HDFS" needs to be removed? ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 LGTM ---
[GitHub] carbondata issue #2686: upgrade to scala 2.12.6 and binary 2.11
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2686 Same question as @zzcclp. It would be better to raise a discussion on the mailing list first. ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 Two comments: 1. In this example, suggest listing all the typical cases that are currently supported, together with a performance comparison showing how performance improves after creating the mv datamap. 2. No need to add this example to CI because, per the first comment, it contains many performance comparisons. ---
[GitHub] carbondata pull request #2614: [CARBONDATA-2837] Added MVExample in example ...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2614#discussion_r214526743 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/MVDataMapExample.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.examples + +import java.io.File + +import org.apache.spark.sql.SparkSession + +import org.apache.carbondata.examples.util.ExampleUtils + +/** + * This example is for pre-aggregate tables. + */ + +object MVDataMapExample { + + def main(args: Array[String]) { +val spark = ExampleUtils.createCarbonSession("MVDataMapExample") +exampleBody(spark) +spark.close() + } + + def exampleBody(spark: SparkSession): Unit = { +val rootPath = new File(this.getClass.getResource("/").getPath ++ "../../../..").getCanonicalPath +val testData = s"$rootPath/integration/spark-common-test/src/test/resources/sample.csv" + +// 1. simple usage for Pre-aggregate tables creation and query +spark.sql("DROP TABLE IF EXISTS mainTable") +spark.sql("DROP TABLE IF EXISTS dimtable") +spark.sql( + """ +| CREATE TABLE mainTable +| (id Int, +| name String, +| city String, +| age Int) +| STORED BY 'org.apache.carbondata.format' + """.stripMargin) + +spark.sql( + """ +| CREATE TABLE dimtable +| (name String, +| address String) +| STORED BY 'org.apache.carbondata.format' + """.stripMargin) + +spark.sql(s"""LOAD DATA LOCAL INPATH '$testData' into table mainTable""") + +spark.sql(s"""insert into dimtable select name, concat(city, ' street1') as address from + |mainTable group by name, address""".stripMargin) --- End diff -- Why do we need to add "group by name, address"? ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 retest this please ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 retest this please ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 retest this please ---
[GitHub] carbondata issue #2668: [CARBONDATA-2899] Add MV module class to assembly JA...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2668 LGTM ---
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2607 @bhavya411 any new progress ? ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 retest this please ---
[GitHub] carbondata issue #2615: [HOTFIX] [presto] presto integration code cleanup
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2615 LGTM ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2614 retest this please ---
[GitHub] carbondata issue #2637: [HOTFIX] Correct the sentence to be meaningful
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2637 add to whitelist ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2636 LGTM ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2636 add to whitelist ---
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2607 @bhavya411 I tested this PR; the performance (simple aggregation) shows no improvement (0.206 compared to 0.187). I just checked 0.207 and 0.208: they fix many memory issues, so I propose upgrading to 0.208 for the CarbonData integration. ---
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2607 retest this please ---
[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2620 LGTM ---
[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2620 retest this please ---
[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2620 retest this please ---
[GitHub] carbondata pull request #2620: [CARBONDATA-2839] Add custom compaction examp...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2620#discussion_r208780697 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CustomCompactionExample.scala --- @@ -0,0 +1,69 @@ +package org.apache.carbondata.examples + +import java.io.File + +import org.apache.spark.sql.SparkSession + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.examples.util.ExampleUtils + + +object CustomCompactionExample { + + def main(args: Array[String]): Unit = { +val spark = ExampleUtils.createCarbonSession("CustomCompactionExample") +exampleBody(spark) +spark.close() + } + + def exampleBody(spark : SparkSession): Unit = { +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/MM/dd") + +spark.sql("DROP TABLE IF EXISTS custom_compaction_table") + +spark.sql( + s""" + | CREATE TABLE IF NOT EXISTS custom_compaction_table( + | ID Int, + | date Date, + | country String, + | name String, + | phonetype String, + | serialname String, + | salary Int, + | floatField float + | ) STORED BY 'carbondata' + """.stripMargin) + +val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath +val path = s"$rootPath/examples/spark2/src/main/resources/dataSample.csv" + +// load 4 segments +// scalastyle:off +(1 to 4).foreach(_ => spark.sql( + s""" + | LOAD DATA LOCAL INPATH '$path' + | INTO TABLE custom_compaction_table + | OPTIONS('HEADER'='true') + """.stripMargin)) +// scalastyle:on + +// show all segments: 0,1,2,3 +spark.sql("SHOW SEGMENTS FOR TABLE custom_compaction_table").show() + +// do custom compaction, segments specified will be merged +spark.sql("ALTER TABLE custom_compaction_table COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (1,2)") +spark.sql("SHOW SEGMENTS FOR TABLE custom_compaction_table").show() + +CarbonProperties.getInstance().addProperty( + 
CarbonCommonConstants.CARBON_DATE_FORMAT, + CarbonCommonConstants.CARBON_DATE_DEFAULT_FORMAT) + --- End diff -- After the custom compaction, please query the table data once to verify that it is correct. ---
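The verification step this review asks for could look like the following sketch (assuming the `custom_compaction_table` built by the example above and an active SparkSession `spark`):

```scala
// Sketch only: verify the data after the custom compaction in the example.
// The same CSV was loaded 4 times, so the row count should be 4x the file's
// rows; compaction merges segments without changing the data itself.
spark.sql("SELECT count(*) FROM custom_compaction_table").show()

// Spot-check a few rows to confirm values survived the merge.
spark.sql("SELECT ID, date, country, name FROM custom_compaction_table LIMIT 5").show()

// SHOW SEGMENTS should now list a merged segment, with segments 1 and 2
// marked as Compacted.
spark.sql("SHOW SEGMENTS FOR TABLE custom_compaction_table").show()
```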
[GitHub] carbondata pull request #2620: [CARBONDATA-2839] Add custom compaction examp...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2620#discussion_r208780252 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CustomCompactionExample.scala --- @@ -0,0 +1,69 @@ +package org.apache.carbondata.examples + +import java.io.File + +import org.apache.spark.sql.SparkSession + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.examples.util.ExampleUtils + + --- End diff -- please add a description explaining the example. ---
[GitHub] carbondata pull request #2620: [CARBONDATA-2839] Add custom compaction examp...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2620#discussion_r208780123 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CustomCompactionExample.scala --- @@ -0,0 +1,69 @@ +package org.apache.carbondata.examples --- End diff -- please add the apache license header ---
[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2620 retest this please ---
[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2590 LGTM ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2576 retest this please ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r206553068 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. - - By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + **Bottleneck for Local Dictionary:** The memory size will increase when local dictionary is enabled. --- End diff -- Please change "bottleneck" to "The cost" ---
[GitHub] carbondata issue #2582: [CARBONDATA-2801]Added documentation for flat folder
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2582 LGTM ---
[GitHub] carbondata pull request #2568: [Presto-integration-Technical-note] created d...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2568#discussion_r206490628 --- Diff: integration/presto/Presto-integration-in-carbondata.md --- @@ -0,0 +1,132 @@ + + +# PRESTO INTEGRATION IN CARBONDATA + +1. [Document Purpose](#document-purpose) +1. [Purpose](#purpose) +1. [Scope](#scope) +1. [Definitions and Acronyms](#definitions-and-acronyms) +1. [Requirements addressed](#requirements-addressed) +1. [Design Considerations](#design-considerations) +1. [Row Iterator Implementation](#row-iterator-implementation) +1. [ColumnarReaders or StreamReaders approach](#columnarreaders-or-streamreaders-approach) +1. [Module Structure](#module-structure) +1. [Detailed design](#detailed-design) +1. [Modules](#modules) +1. [Functions Developed](#functions-developed) +1. [Integration Tests](#integration-tests) +1. [Tools and languages used](#tools-and-languages-used) +1. [References](#references) + +## Document Purpose + + * _Purpose_ + The purpose of this document is to outline the technical design of the Presto Integration in CarbonData. + + Its main purpose is to - + * Provide the link between the Functional Requirement and the detailed Technical Design documents. + * Detail the functionality which will be provided by each component or group of components and show how the various components interact in the design. + + This document is not intended to address installation and configuration details of the actual implementation. Installation and configuration details are provided in technology guides provided on CarbonData wiki page. As is true with any high level design, this document will be updated and refined based on changing requirements. + * _Scope_ + Presto Integration with CarbonData will allow execution of CarbonData queries on the Presto CLI. CarbonData can be added easily as a Data Source among the multiple heterogeneous data sources for Presto.
+ * _Definitions and Acronyms_ + **CarbonData :** CarbonData is a fully indexed columnar and Hadoop native data-store for processing heavy analytical workloads and detailed queries on big data. In customer benchmarks, CarbonData has proven to manage Petabyte of data running on extraordinarily low-cost hardware and answers queries around 10 times faster than the current open source solutions (column-oriented SQL on Hadoop data-stores). + + **Presto :** Presto is a distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources. + +## Requirements addressed +This integration of Presto mainly serves two purpose: + * Support of Apache CarbonData as Data Source in Presto. + * Execution of Apache CarbonData Queries on Presto. + +## Design Considerations +Following are the design considerations for the Presto Integration with the Carbondata. + + Row Iterator Implementation + + Presto provides a way to iterate the records through a RecordSetProvider which creates a RecordCursor so we have to extend this class to create a CarbondataRecordSetProvider and CarbondataRecordCursor to read data from Carbondata core module. The CarbondataRecordCursor will utilize the DictionaryBasedResultCollector class of Core module to read data row by row. This approach has two drawbacks. + * The Presto converts this row data into columnar data again since carbondata itself store data in columnar format we are adding an additional conversion to row to column instead of directly using the column. + * The cursor reads the data row by row instead of a batch of data , so this is a costly operation as we are already storing the data in pages or batches we can directly read the batches of data. + + ColumnarReaders or StreamReaders approach + + In this design we can create StreamReaders that can read data from the Carbondata Column based on DataType and directly convert it into Presto Block. 
This approach saves us the row by row processing as well as reduces the transition and conversion of data. By this approach we can achieve the fastest read from Presto and create a Presto Page by extending PageSourceProvider and PageSource class. This design will be discussed in detail in the next sections of this document. + +## Module Structure + + +![module structure](../presto/images/module-structure.jpg?raw=true) + + + +## Detailed design + Modules + +Based on the above functionality, Presto Integration is implemented as the following module: + +1. **Presto** + +Integration of Presto with CarbonData includes implementation of the connector API of Presto. + Functions developed + +![functionas
[GitHub] carbondata pull request #2582: [CARBONDATA-2801]Added documentation for flat...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2582#discussion_r206487262 --- Diff: docs/data-management-on-carbondata.md --- @@ -284,6 +286,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa ALTER TABLE employee SET TBLPROPERTIES ('CACHE_LEVEL'='Blocklet') ``` +- **Support Flat folder** --- End diff -- change to : **Support Flat folder same as Hive/Parquet** ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r206486006 --- Diff: docs/data-management-on-carbondata.md --- @@ -508,6 +511,9 @@ Users can specify which columns to include and exclude for local dictionary gene ``` ALTER TABLE tablename UNSET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE','LOCAL_DICTIONARY_THRESHOLD','LOCAL_DICTIONARY_INCLUDE','LOCAL_DICTIONARY_EXCLUDE') ``` + + **NOTE:** For old tables, by default, local dictionary is disabled. If user wants local dictionary, he/she can enable/disable local dictionary for new data on those tables at his/her discretion. --- End diff -- "he/she" change to "user" ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r206485782 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. --- End diff -- Please explain : what is the cost for enabling local dictionary. ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r206485268 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. - By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + By default, Local Dictionary will be disabled. Users will be able to pass following properties in create table command: | Properties | Default value | Description | | -- | - | --- | - | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table | - | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (range- 1000 to 10) | - | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. | + | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (maximum - 10) | + | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns which are not included in dictionary include| Columns for which Local Dictionary is generated. 
| --- End diff -- "which are not included in dictionary include" -- please refine. ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r206484679 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: --- End diff -- Please add a note listing which data types are not supported. ---
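As an illustration of the local dictionary properties discussed in this review thread, a hedged sketch of a CREATE TABLE that sets them (the table and column names here are hypothetical; the `LOCAL_DICTIONARY_*` property names come from the documentation table and the UNSET example quoted in the diffs above):

```scala
// Sketch only: hypothetical table/column names, assuming a CarbonSession
// bound to `spark`. The property names match those in the doc under review.
spark.sql(
  """CREATE TABLE IF NOT EXISTS sales_local_dict (
    |  id INT,
    |  city STRING,
    |  comment STRING)
    | STORED BY 'carbondata'
    | TBLPROPERTIES (
    |  'LOCAL_DICTIONARY_ENABLE'='true',
    |  'LOCAL_DICTIONARY_THRESHOLD'='10000',
    |  'LOCAL_DICTIONARY_INCLUDE'='city',
    |  'LOCAL_DICTIONARY_EXCLUDE'='comment')""".stripMargin)
```

Here a low-cardinality column (`city`) is included for local dictionary generation, while a free-text column (`comment`) is excluded to avoid the memory cost the review asks to document.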
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r206481821 --- Diff: docs/s3-guide.md --- @@ -0,0 +1,63 @@ + + +#S3 Guide (Alpha Feature 1.4.1) +Amazon S3 is a cloud storage service that is recommended for storing large data files. You can +use this feature if you want to store data on amazon cloud. Since the data is stored on to cloud +storage there are no restrictions on the size of data and the data can be accessed from anywhere at any time. +Carbon can support any Object store that conforms to Amazon S3 API. + +#Writing to Object Store +To store carbondata files on to Object Store location, you need to set `carbon +.storelocation` property to Object Store path in CarbonProperties file. For example, carbon +.storelocation=s3a://mybucket/carbonstore. By setting this property, all the tables will be created on the specified Object Store path. + +If your existing store is HDFS, and you want to store specific tables on S3 location, then `location` parameter has to be set during create +table. +For example: + +``` +CREATE TABLE IF NOT EXISTS db1.table1(col1 string, col2 int) STORED AS carbondata LOCATION 's3a://mybucket/carbonstore' +``` + +For more details on create table, Refer [data-management-on-carbondata](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table) + +#Authentication +You need to set authentication properties to store the carbondata files on to S3 location. 
For +more details on authentication properties, refer +[hadoop authentication document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties) + +Another way of setting the authentication parameters is as follows: + +``` + SparkSession + .builder() + .master(masterURL) + .appName("S3Example") + .config("spark.driver.host", "localhost") + .config("spark.hadoop.fs.s3a.access.key", "") + .config("spark.hadoop.fs.s3a.secret.key", "") + .config("spark.hadoop.fs.s3a.endpoint", "1.1.1.1") + .getOrCreateCarbonSession() +``` + +#Recommendations +1. Object stores like S3 does not support file leasing mechanism(supported by HDFS) that is +required to take locks which ensure consistency between concurrent operations therefore, it is +recommended to set the configurable lock path property([carbon.lock.path](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md#miscellaneous-configuration)) + to a HDFS directory. +2. As Object stores are eventual consistent meaning that any put request can take some time to reflect when trying to list objects from that bucket therefore concurrent queries are not supported. --- End diff -- Changes to : Object Storage ---
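The lock-path recommendation quoted above can be sketched as follows (the HDFS path is a placeholder; `carbon.lock.path` is the configurable property named in the diff):

```scala
import org.apache.carbondata.core.util.CarbonProperties

// Sketch only: when the store location is on S3, point the lock path at an
// HDFS directory so concurrent operations still get consistent locking,
// since S3 lacks the file-leasing mechanism locks rely on.
CarbonProperties.getInstance()
  .addProperty("carbon.lock.path", "hdfs://namenode:8020/carbon/locks")
```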
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r206481369 --- Diff: docs/s3-guide.md --- @@ -0,0 +1,63 @@ + + +#S3 Guide (Alpha Feature 1.4.1) +Amazon S3 is a cloud storage service that is recommended for storing large data files. You can --- End diff -- Suggest changing to: S3 is an object storage API in the cloud; it is recommended for storing large data files. You can use this feature if you want to store data on Amazon cloud or Huawei cloud (OBS). Since the data is stored on cloud storage, there are no restrictions on the size of data, and the data can be accessed from anywhere at any time. CarbonData can support any object storage that conforms to the Amazon S3 API. ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r206480055 --- Diff: docs/datamap/preaggregate-datamap-guide.md --- @@ -7,6 +24,7 @@ * [Querying Data](#querying-data) * [Compaction](#compacting-pre-aggregate-tables) * [Data Management](#data-management-with-pre-aggregate-tables) +* [Limitations](#Limitations) --- End diff -- Why does this item need to be added? ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2576 retest this please ---
[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] add CTable interface in CarbonSto...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2589 Can you explain what "CTable" is for? ---
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] add CTable interface in Ca...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r206458466 --- Diff: store/core/pom.xml --- @@ -48,8 +48,8 @@ org.apache.maven.plugins maven-compiler-plugin - 1.7 - 1.7 + 8 --- End diff -- 1.8 ? ---