[GitHub] carbondata pull request #2965: [Documentation] Editorial review
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2965 [Documentation] Editorial review Corrected spelling mistakes and grammar You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata DTS Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2965.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2965 commit 2dd9603aece16a53f781265ebc0db6cd482a4d5f Author: sgururajshetty Date: 2018-11-29T13:14:22Z Spelling mistakes corrected ---
[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2805 @sraghunandan kindly review ---
[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2805 @sraghunandan kindly review and help me to merge my changes ---
[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2888#discussion_r229732059 --- Diff: docs/sdk-guide.md --- @@ -124,7 +124,7 @@ public class TestSdkAvro { try { CarbonWriter writer = CarbonWriter.builder() .outputPath(path) - .withAvroInput(new org.apache.avro.Schema.Parser().parse(avroSchema)).build(); + .withAvroInput(new org.apache.avro.Schema.Parser().parse(avroSchema))..writtenBy("SDK").build(); --- End diff -- There are two consecutive dots (..) before writtenBy; please remove one. ---
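For reference, a sketch of the corrected builder chain (context assumed from the TestSdkAvro example in sdk-guide.md; the only change is removing the duplicated dot):

```java
CarbonWriter writer = CarbonWriter.builder()
    .outputPath(path)
    // one dot between chained calls; the diff accidentally wrote ".."
    .withAvroInput(new org.apache.avro.Schema.Parser().parse(avroSchema))
    .writtenBy("SDK")
    .build();
```
---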
[GitHub] carbondata pull request #2805: Local dictionary Data which are not supported...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2805 Local dictionary: the note is updated to state that the float and byte data types are not supported You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata Data_not_supported Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2805.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2805 commit 335ae173c72657852539caaecce3437c70e1e07a Author: sgururajshetty Date: 2018-10-09T11:04:20Z Data which are not supported float and byte updated in the note ---
[GitHub] carbondata issue #2788: [Documentation] Readme updated with latest topics an...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2788 @sraghunandan kindly review the PR and help me to merge the same ---
[GitHub] carbondata pull request #2788: [Documentation] Readme updated with latest to...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2788#discussion_r222666063 --- Diff: docs/carbon-as-spark-datasource-guide.md --- @@ -41,25 +42,23 @@ Carbon table can be created with spark's datasource DDL syntax as follows. | Property | Default Value | Description | |---|--|| -| table_blocksize | 1024 | Size of blocks to write onto hdfs | +| table_blocksize | 1024 | Size of blocks to write onto hdfs. For more details, see [Table Block Size Configuration](./ddl-of-carbondata.md#table-block-size-configuration) | --- End diff -- Added reference to all properties ---
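A hedged illustration of setting the property through Spark's datasource DDL (table and column names are hypothetical; the CREATE TABLE ... USING carbon syntax is assumed from carbon-as-spark-datasource-guide.md):

```sql
CREATE TABLE sales_fact (id INT, amount DOUBLE)
USING carbon
OPTIONS ('table_blocksize'='512');
```
---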
[GitHub] carbondata issue #2788: [Documentation] Readme updated with latest topics an...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2788 @sraghunandan & @kunal642 kindly review and merge the doc ---
[GitHub] carbondata issue #2788: [Documentation] Readme updated with latest topics an...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2788 @sraghunandan review ---
[GitHub] carbondata pull request #2788: [Documentation] Readme updated with latest to...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2788 [Documentation] Readme updated with latest topics and new TOC
> Readme updated with the new structure
> Formatting issue fixed
> Review comments fixed
You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata doc_sept Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2788.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2788 commit c8bc47ec164e43736d6f5b39b7c883d2b11bd7f7 Author: sgururajshetty Date: 2018-09-28T13:43:08Z Readme updated with latest topics and new TOC Formatting issues fixed ---
[GitHub] carbondata issue #2766: [WIP] Added documentation for fallback condition for...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2766 LGTM ---
[GitHub] carbondata issue #2757: [DOC] Add spark carbon file format documentation
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2757 LGTM ---
[GitHub] carbondata issue #2744: [CARBONDATA-2957][doc] update doc for supporting com...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2744 LGTM ---
[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2756 LGTM ---
[GitHub] carbondata pull request #2756: [CARBONDATA-2966]Update Documentation For Avr...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2756#discussion_r220428630 --- Diff: docs/sdk-guide.md --- @@ -181,22 +181,31 @@ public class TestSdkJson { ``` ## Datatypes Mapping -Each of SQL data types are mapped into data types of SDK. Following are the mapping: +Each of SQL data types and Avro Data Types are mapped into data types of SDK. Following are the mapping: -| SQL DataTypes | Mapped SDK DataTypes | +| SQL DataTypes | Avro DataTypes | Mapped SDK DataTypes | |---|--| --- End diff -- Check the table formatting: the separator row (|---|--|) has fewer cells than the new three-column header. ---
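A sketch of what the corrected Markdown table header could look like (one separator cell per column):

```
| SQL DataTypes | Avro DataTypes | Mapped SDK DataTypes |
|---|---|---|
```
---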
[GitHub] carbondata pull request #2756: [CARBONDATA-2966]Update Documentation For Avr...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2756#discussion_r220427375 --- Diff: docs/configuration-parameters.md --- @@ -42,6 +42,7 @@ This section provides the details of all the configurations required for the Car | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. | | carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. | | carbon.unsafe.working.memory.in.mb | 512 | CarbonData supports storing data in off-heap memory for certain operations during data loading and query.This helps to avoid the Java GC and thereby improve the overall performance.The Minimum value recommeded is 512MB.Any value below this is reset to default value of 512MB.**NOTE:** The below formulas explain how to arrive at the off-heap size required.Memory Required For Data Loading:(*carbon.number.of.cores.while.loading*) * (Number of tables to load in parallel) * (*offheap.sort.chunk.size.inmb* + *carbon.blockletgroup.size.in.mb* + *carbon.blockletgroup.size.in.mb*/3.5 ). Memory required for Query:SPARK_EXECUTOR_INSTANCES * (*carbon.blockletgroup.size.in.mb* + *carbon.blockletgroup.size.in.mb* * 3.5) * spark.executor.cores | +| carbon.unsafe.driver.working.memory.in.mb | 60% of JVM Heap Memory | CarbonData supports storing data in unsafe on-heap memory in driver for certain operations like insert into, query for loading datamap cache. The Minimum value recommended is 512MB. | --- End diff -- The parameter description should answer the following questions:
a. What does this parameter do?
b. In what scenario does the user need to configure this parameter?
c. Are there any benefits to configuring this parameter?
d. What is the default value?
e. What is the value range, if any?
f. Are there any limitations?
g. Any key information to be highlighted?
---
[GitHub] carbondata pull request #2756: [CARBONDATA-2966]Update Documentation For Avr...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2756#discussion_r220427408 --- Diff: docs/configuration-parameters.md --- @@ -42,6 +42,7 @@ This section provides the details of all the configurations required for the Car | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. | | carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. | | carbon.unsafe.working.memory.in.mb | 512 | CarbonData supports storing data in off-heap memory for certain operations during data loading and query.This helps to avoid the Java GC and thereby improve the overall performance.The Minimum value recommeded is 512MB.Any value below this is reset to default value of 512MB.**NOTE:** The below formulas explain how to arrive at the off-heap size required.Memory Required For Data Loading:(*carbon.number.of.cores.while.loading*) * (Number of tables to load in parallel) * (*offheap.sort.chunk.size.inmb* + *carbon.blockletgroup.size.in.mb* + *carbon.blockletgroup.size.in.mb*/3.5 ). Memory required for Query:SPARK_EXECUTOR_INSTANCES * (*carbon.blockletgroup.size.in.mb* + *carbon.blockletgroup.size.in.mb* * 3.5) * spark.executor.cores | +| carbon.unsafe.driver.working.memory.in.mb | 60% of JVM Heap Memory | CarbonData supports storing data in unsafe on-heap memory in driver for certain operations like insert into, query for loading datamap cache. The Minimum value recommended is 512MB. | --- End diff -- Kindly follow the same for all the parameter descriptions ---
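For illustration, such parameters are typically set in carbon.properties; a minimal sketch with hypothetical values (the quoted text recommends a minimum of 512 MB):

```
# carbon.properties (values are hypothetical)
carbon.unsafe.working.memory.in.mb=1024
carbon.unsafe.driver.working.memory.in.mb=1024
```
---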
[GitHub] carbondata issue #2753: [CARBONDATA-2964] Fix for unsupported float data typ...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2753 LGTM ---
[GitHub] carbondata pull request #2643: Formatting fix s3
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2643 Formatting fix s3 Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata formattingFix_S3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2643.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2643 commit 529f80dda6db3ce34e0baf766b03a9a13190b286 Author: sgururajshetty Date: 2018-07-25T12:44:07Z Documentation for support for COLUMN_META_CACHE in create table and alter table properties commit d816aaa7a89155b3579906f960ed6a0ba4d4a59f Author: sgururajshetty Date: 2018-07-25T12:48:43Z Documentation to support for CACHE_LEVEL in create table and alter table properties commit 8ac243f8e9cff8359b6064352deb823eda7b9835 Author: sgururajshetty Date: 2018-07-25T13:24:52Z Review comment fixed commit 98501d35cfd110bcb9e75eb02628f3bce0c0f4ab Author: sgururajshetty Date: 2018-07-25T13:26:58Z review comment fixed commit 62caf822cbcde1e519501c1d5db3c5cfc05fbd63 Author: Indhumathi27 Date: 2018-07-21T10:46:21Z [CARBONDATA-2606]Fix Complex array Pushdown and block auto merge compaction 1.Check for if Complex Column contains ArrayType at n levels and add parent to projection if contains array. 2.Block Auto merge compaction for table containing complex datatype columns. 3.Fix Decimal Datatype scale and precision with two level struct type 4.Fix Dictionary Include for ComplexDataType - If other complex columns other than first complex column is given in dictionary include, then its insertion fails. 
5.Fix BadRecord and dateformat for Complex primitive type-DATE This closes #2535 commit d287a102b5c96e54261ac00c77038a1a56161fe9 Author: kumarvishal09 Date: 2018-07-24T14:40:54Z [CARBONDATA-2779]Fixed filter query issue in case of V1/v2 format store Problem: Filter query is failing for V1/V2 carbondata store Root Cause: in V1 store measure min max was not added in blockminmaxindex in executor when filter is applied min max pruning is failing with array index out of cound exception Solution: Need to add min max for measure column same as already handled in driver block pruning This closes #2550 commit b08745f68624ff066e0b23a41ce12d4a99618ac5 Author: Manhua Date: 2018-07-25T08:51:49Z [CARBONDATA-2783][BloomDataMap][Doc] Update document for bloom filter datamap add example for enable/disable datamap This closes #2554 commit 964d26866468df6be130e9d65d339439cb4cf3b0 Author: praveenmeenakshi56 Date: 2018-07-25T15:31:37Z [CARBONDATA-2750] Added Documentation for Local Dictionary Support Added Documentation for Local Dictionary Support This closes #2520 commit 1fa9f64d70123d0bc988427a34c0750283f5daae Author: BJangir Date: 2018-07-23T16:44:12Z [CARBONDATA-2772] Size based dictionary fallback is failing even threshold is not reached. Issue:- Size Based Fallback happened even threshold is not reached. RootCause:- Current size calculation is wrong. it is calculated for each data. instead of generated dictionary data . Solution :- Current size should be calculated only for generated dictionary data. This closes #2542 commit eae5817e56a20aecb7694c8d387dbb05b96e1045 Author: kunal642 Date: 2018-07-24T10:42:54Z [CARBONDATA-2778]Fixed bug when select after delete and cleanup is showing empty records Problem: In case if delete operation when it is found that the data being deleted is leading to a state where one complete block data is getting deleted. In that case the status if that block is marked for delete and during the next delete operation run the block is deleted along with its carbonIndex file. The problem arises due to deletion of carbonIndex file because for multiple blocks there can be one carbonIndex file as one carbonIndex file represents one task
[GitHub] carbondata pull request #2603: [Documentation] Editorial review comment fixe...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2603#discussion_r207516087 --- Diff: docs/configuration-parameters.md --- @@ -140,7 +140,7 @@ This section provides the details of all the configurations required for CarbonD | carbon.enableMinMax | true | Min max is feature added to enhance query performance. To disable this feature, set it false. | | carbon.dynamicallocation.schedulertimeout | 5 | Specifies the maximum time (unit in seconds) the scheduler can wait for executor to be active. Minimum value is 5 sec and maximum value is 15 sec. | | carbon.scheduler.minregisteredresourcesratio | 0.8 | Specifies the minimum resource (executor) ratio needed for starting the block distribution. The default value is 0.8, which indicates 80% of the requested resource is allocated for starting block distribution. The minimum value is 0.1 min and the maximum value is 1.0. | -| carbon.search.enabled | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. | +| carbon.search.enabled (Alpha Feature) | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. | * **Global Dictionary Configurations** --- End diff -- This issue is handled in a different PR #2576 ---
[GitHub] carbondata pull request #2603: [Documentation] Editorial review comment fixe...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2603#discussion_r207516006 --- Diff: docs/configuration-parameters.md --- @@ -140,7 +140,7 @@ This section provides the details of all the configurations required for CarbonD | carbon.enableMinMax | true | Min max is feature added to enhance query performance. To disable this feature, set it false. | | carbon.dynamicallocation.schedulertimeout | 5 | Specifies the maximum time (unit in seconds) the scheduler can wait for executor to be active. Minimum value is 5 sec and maximum value is 15 sec. | | carbon.scheduler.minregisteredresourcesratio | 0.8 | Specifies the minimum resource (executor) ratio needed for starting the block distribution. The default value is 0.8, which indicates 80% of the requested resource is allocated for starting block distribution. The minimum value is 0.1 min and the maximum value is 1.0. | -| carbon.search.enabled | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. | +| carbon.search.enabled (Alpha Feature) | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. | * **Global Dictionary Configurations** --- End diff -- The minimum value need not be mentioned now ---
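For context, a hedged sketch of turning on the alpha feature shown in the quoted table (assuming it is set through carbon.properties like other carbon.* parameters):

```
# carbon.properties: alpha feature, default is false
carbon.search.enabled=true
```
---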
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2576 LGTM ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207223686 --- Diff: docs/configuration-parameters.md --- @@ -106,7 +106,12 @@ This section provides the details of all the configurations required for CarbonD |-|--|-| | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. | | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. | -| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. +| carbon.lock.path | TABLEPATH | Locks on the files are used to prevent concurrent operation from modifying the same files. This +configuration specifies the path where lock files have to be created. Recommended to configure +HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. +**Note:** If this property is not set to HDFS location for S3 store, then there is a possibility +of data corruption because multiple data manipulation calls might try to update the status file +and as lock is not acquired before updation data might get overwritten. --- End diff -- Since it is a table, end the line with a pipe (|). ---
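A hedged sketch of the recommended S3 setup described in the diff (the HDFS namenode address and lock path are hypothetical):

```
# carbon.properties: for an S3 store, keep lock files on HDFS since locking is not feasible on S3
carbon.lock.type=HDFSLOCK
carbon.lock.path=hdfs://namenode:9000/carbondata/locks
```
---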
[GitHub] carbondata pull request #2603: [Documentation] Editorial review comment fixe...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2603 [Documentation] Editorial review comment fixed Minor issues fixed (spelling, syntax, and missing info) You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata editorial_review1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2603.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2603 commit 529f80dda6db3ce34e0baf766b03a9a13190b286 Author: sgururajshetty Date: 2018-07-25T12:44:07Z Documentation for support for COLUMN_META_CACHE in create table and alter table properties commit d816aaa7a89155b3579906f960ed6a0ba4d4a59f Author: sgururajshetty Date: 2018-07-25T12:48:43Z Documentation to support for CACHE_LEVEL in create table and alter table properties commit 8ac243f8e9cff8359b6064352deb823eda7b9835 Author: sgururajshetty Date: 2018-07-25T13:24:52Z Review comment fixed commit 98501d35cfd110bcb9e75eb02628f3bce0c0f4ab Author: sgururajshetty Date: 2018-07-25T13:26:58Z review comment fixed commit 62caf822cbcde1e519501c1d5db3c5cfc05fbd63 Author: Indhumathi27 Date: 2018-07-21T10:46:21Z [CARBONDATA-2606]Fix Complex array Pushdown and block auto merge compaction 1.Check for if Complex Column contains ArrayType at n levels and add parent to projection if contains array. 2.Block Auto merge compaction for table containing complex datatype columns. 3.Fix Decimal Datatype scale and precision with two level struct type 4.Fix Dictionary Include for ComplexDataType - If other complex columns other than first complex column is given in dictionary include, then its insertion fails. 5.Fix BadRecord and dateformat for Complex primitive type-DATE This closes #2535 commit d287a102b5c96e54261ac00c77038a1a56161fe9 Author: kumarvishal09 Date: 2018-07-24T14:40:54Z [CARBONDATA-2779]Fixed filter query issue in case of V1/v2 format store Problem: Filter query is failing for V1/V2 carbondata store Root Cause: in V1 store measure min max was not added in blockminmaxindex in executor when filter is applied min max pruning is failing with array index out of cound exception Solution: Need to add min max for measure column same as already handled in driver block pruning This closes #2550 commit b08745f68624ff066e0b23a41ce12d4a99618ac5 Author: Manhua Date: 2018-07-25T08:51:49Z [CARBONDATA-2783][BloomDataMap][Doc] Update document for bloom filter datamap add example for enable/disable datamap This closes #2554 commit 964d26866468df6be130e9d65d339439cb4cf3b0 Author: praveenmeenakshi56 Date: 2018-07-25T15:31:37Z [CARBONDATA-2750] Added Documentation for Local Dictionary Support Added Documentation for Local Dictionary Support This closes #2520 commit 1fa9f64d70123d0bc988427a34c0750283f5daae Author: BJangir Date: 2018-07-23T16:44:12Z [CARBONDATA-2772] Size based dictionary fallback is failing even threshold is not reached. Issue:- Size Based Fallback happened even threshold is not reached. RootCause:- Current size calculation is wrong. it is calculated for each data. instead of generated dictionary data . Solution :- Current size should be calculated only for generated dictionary data. 
This closes #2542 commit eae5817e56a20aecb7694c8d387dbb05b96e1045 Author: kunal642 Date: 2018-07-24T10:42:54Z [CARBONDATA-2778]Fixed bug when select after delete and cleanup is showing empty records Problem: In case if delete operation when it is found that the data being deleted is leading to a state where one complete block data is getting deleted. In that case the status if that block is marked for delete and during the next delete operation run the block is deleted along with its carbonIndex file. The problem arises due to deletion of carbonIndex file because for multiple blocks there can be one carbonIndex file as one carbonIndex file represents one task. Solution: Do not delete the carbondata and carbonIndex file. After compaction it will automatically take care of deleting the stale data and stale segments. This closes #2548 commit 6d6874a11482a8aa79f2280f6572e84b5e3cbc93 Author: dhatchayani Date: 2018-07-25T09:11:58Z [CARBONDATA-2753][Compatibility] Row count of page is calculated wrong for old store(V2 store) Row count of page is calculated wrong for V2 store. commit b6f5af6af96140876ec10ff09c3313d9b35ceb36 Author: Sssan520 Date: 2018-07-25T11:36:00Z [CARBONDATA-2782]delete dead code in class 'CarbonCleanFilesCommand' The variables(dms, indexDms) in function processMetadata
[GitHub] carbondata pull request #2572: [CARBONDATA-2793][32k][Doc] Add 32k support i...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2572#discussion_r206017345 --- Diff: docs/data-management-on-carbondata.md --- @@ -283,7 +283,29 @@ This tutorial is going to introduce all commands and data operations on CarbonDa ``` ALTER TABLE employee SET TBLPROPERTIES ('CACHE_LEVEL'='Blocklet') ``` - + + - **String longer than 32000 characters** --- End diff -- If it is an Alpha feature, then please mention in brackets: (Alpha Feature 1.4.1). ---
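Since the diff introduces the "String longer than 32000 characters" section, a hedged DDL sketch of how the feature is typically declared (assuming the LONG_STRING_COLUMNS table property; table and column names are hypothetical):

```sql
CREATE TABLE long_string_table (id INT, description STRING)
STORED BY 'carbondata'
TBLPROPERTIES ('LONG_STRING_COLUMNS'='description');
```
---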
[GitHub] carbondata issue #2572: [CARBONDATA-2793][32k][Doc] Add 32k support in docum...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2572 LGTM ---
[GitHub] carbondata issue #2520: [CARBONDATA-2750] Added Documentation for Local Dict...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2520 LGTM ---
[GitHub] carbondata pull request #2558: [CARBONDATA-2648] Documentation for support f...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2558 [CARBONDATA-2648] Documentation for support for COLUMN_META_CACHE in create table and a… You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2558.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2558 commit 529f80dda6db3ce34e0baf766b03a9a13190b286 Author: sgururajshetty Date: 2018-07-25T12:44:07Z Documentation for support for COLUMN_META_CACHE in create table and alter table properties ---
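For context, a hedged sketch of the COLUMN_META_CACHE property this PR documents (table and column names are hypothetical; the alter-table form is assumed by analogy with the CACHE_LEVEL example quoted elsewhere in this digest):

```sql
CREATE TABLE employee (id INT, name STRING, city STRING)
STORED BY 'carbondata'
TBLPROPERTIES ('COLUMN_META_CACHE'='name,city');

-- assumed alter-table form for an existing table
ALTER TABLE employee SET TBLPROPERTIES ('COLUMN_META_CACHE'='name');
```
---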
[GitHub] carbondata pull request #:
Github user sgururajshetty commented on the pull request: https://github.com/apache/carbondata/commit/71048d7b7d2e4aa4a06536b45f2a33a4542f8b76#commitcomment-29761755 LGTM ---
[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2520#discussion_r203281996 --- Diff: docs/data-management-on-carbondata.md --- @@ -291,6 +330,11 @@ This tutorial is going to introduce all commands and data operations on CarbonDa ALTER TABLE carbon ADD COLUMNS (a1 INT, b1 STRING) TBLPROPERTIES('DEFAULT.VALUE.a1'='10') ``` + Users can specify which columns to include and exclude for local dictionary generation after adding new columns. These will be appended with the already existing local dictionary include and exclude columns of main table respectively. --- End diff -- check the spacing between words ---
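A hedged sketch of the behaviour the quoted text describes, i.e. specifying local dictionary columns while adding new columns (the LOCAL_DICTIONARY_INCLUDE property name is assumed):

```sql
ALTER TABLE carbon ADD COLUMNS (a1 INT, b1 STRING)
TBLPROPERTIES ('LOCAL_DICTIONARY_INCLUDE'='b1');
```
---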
[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2520#discussion_r203277284 --- Diff: docs/data-management-on-carbondata.md --- @@ -122,6 +122,45 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. --- End diff -- Convert this into a table: | Properties | Default Value | Description |. The **description** should satisfy the following points:
a. What does this parameter do?
b. In what scenario does the user need to configure this parameter?
c. Are there any benefits in configuring this parameter?
d. What is the default value?
e. What is the value range, if any?
f. Are there any limitations?
g. Any key information to be highlighted?
---
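A sketch of the requested table form, using the default stated in the quoted text (the LOCAL_DICTIONARY_ENABLE property name is assumed; the description is illustrative):

```
| Properties | Default Value | Description |
|---|---|---|
| LOCAL_DICTIONARY_ENABLE | true | Generates a local dictionary for no-dictionary string/varchar columns, improving compression on low-cardinality dimension columns and speeding up filter and full scan queries, since filtering is done on encoded data. |
```
---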
[GitHub] carbondata issue #2502: [CARBONDATA-2738]Update documentation for Complex da...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2502 LGTM ---
[GitHub] carbondata issue #2361: [CARBONDATA-2577] [CARBONDATA-2579] Fixed issue in A...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2361 LGTM ---
[GitHub] carbondata issue #2356: [CARBONDATA-2566] Optimize CarbonReaderExample
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2356 LGTM ---
[GitHub] carbondata pull request #2320: [Documentation] Editorial review comment fixe...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2320 [Documentation] Editorial review comment fixed Editorial review comment fixed You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata Editorial_Review Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2320.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2320 commit f8236ae2eb9e52d22f37a4b61967260071a50957 Author: sgururajshetty <sgururajshetty@...> Date: 2018-05-18T11:29:38Z Editorial review comment fixed ---
[GitHub] carbondata issue #2274: [CARBONDATA-2440] doc updated to set the property fo...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2274 LGTM ---
[GitHub] carbondata issue #2296: [CARBONDATA-2369] updated the document about AVRO to...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2296 LGTM ---
[GitHub] carbondata issue #2258: [CARBONDATA-2424] Added documentation for properties...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2258 LGTM ---
[GitHub] carbondata issue #2199: [CARBONDATA-2370] Added document for presto multinod...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2199 LGTM ---
[GitHub] carbondata issue #2220: [CARBONDATA-2369] FAQ update related to carbon SDK s...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2220 LGTM ---
[GitHub] carbondata issue #2198: [CARBONDATA-2369] Add a document for Non Transaction...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2198 LGTM ---
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183617096 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example +Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME + +Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars +```shell +mvn clean package -DskipTests -Pspark-2.2 +``` + +Start spark-shell in new terminal, type :paste, then copy and run the following code. +```scala + import java.io.File + import org.apache.spark.sql.{CarbonEnv, SparkSession} + import org.apache.spark.sql.CarbonSession._ + import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} + import org.apache.carbondata.core.util.path.CarbonStorePath + + val warehouse = new File("./warehouse").getCanonicalPath + val metastore = new File("./metastore").getCanonicalPath + + val spark = SparkSession + .builder() + .master("local") + .appName("preAggregateExample") + .config("spark.sql.warehouse.dir", warehouse) + .getOrCreateCarbonSession(warehouse, metastore) + + spark.sparkContext.setLogLevel("ERROR") + + // drop table if exists previously + spark.sql(s"DROP TABLE IF EXISTS datamap_test") + + // Create main table + spark.sql( + s""" + |CREATE TABLE datamap_test ( + |name string, + |age int, + |city string, + |country string) + |STORED BY 'carbondata' +""".stripMargin) + + // Create lucene datamap on the main table + spark.sql( + s""" + |CREATE DATAMAP dm + |ON TABLE datamap_test + |USING "lucene" + |DMPROPERTIES ('TEXT_COLUMNS' = 'name, country') + + import spark.implicits._ + import org.apache.spark.sql.SaveMode + import scala.util.Random + + // Load data to the main table, if + // lucene index writing fails, the datamap + // will be disabled in query + val r = new Random() + spark.sparkContext.parallelize(1 to 10) + .map(x => ("c1" + x % 8, x % 8, "city" + x % 50, "country" + x % 60)) + .toDF("name", "age", "city", "country") + .write + .format("carbondata") + .option("tableName", "datamap_test") + .option("compress", "true") + .mode(SaveMode.Append) + .save() + + spark.sql( +s""" + |SELECT * + |from datamap_test where + |TEXT_MATCH('name:c10') + """.stripMargin).show + + spark.stop +``` + + DataMap Management +Lucene DataMap can be created using following DDL + ``` + CREATE DATAMAP [IF NOT EXISTS] datamap_name + ON TABLE main_table + USING "lucene" + DMPROPERTIES ('text_columns'='city, name', ...) + ``` + +DataMap can be dropped using following DDL + ``` + DROP DATAMAP [IF EXISTS] datamap_name + ON TABLE main_table + ``` +To show all DataMaps created, use: + ``` + SHOW DATAMAP + ON TABLE main_table + ``` +It will show all DataMaps created on main table. + + +## Lucene DataMap Introduction + Lucene datamap are created as index DataMaps and managed along with main tables by CarbonData. + User can create as many lucene datamaps required to improve query performance, + provided the storage requirements and loading speeds are acceptable. + + Once lucene datamaps are created, the indexes generated by lucene will be read for pruning till + row level for the filter query by launching a spark datamap job. 
This pruned data will be read to + give the proper and faster result --- End diff -- End all sentences with a period (.). ---
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183617702 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example +Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME + +Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars +```shell +mvn clean package -DskipTests -Pspark-2.2 +``` + +Start spark-shell in new terminal, type :paste, then copy and run the following code. +```scala + import java.io.File + import org.apache.spark.sql.{CarbonEnv, SparkSession} + import org.apache.spark.sql.CarbonSession._ + import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} + import org.apache.carbondata.core.util.path.CarbonStorePath + + val warehouse = new File("./warehouse").getCanonicalPath + val metastore = new File("./metastore").getCanonicalPath + + val spark = SparkSession + .builder() + .master("local") + .appName("preAggregateExample") + .config("spark.sql.warehouse.dir", warehouse) + .getOrCreateCarbonSession(warehouse, metastore) + + spark.sparkContext.setLogLevel("ERROR") + + // drop table if exists previously + spark.sql(s"DROP TABLE IF EXISTS datamap_test") + + // Create main table + spark.sql( + s""" + |CREATE TABLE datamap_test ( + |name string, + |age int, + |city string, + |country string) + |STORED BY 'carbondata' +""".stripMargin) + + // Create lucene datamap on the main table + spark.sql( + s""" + |CREATE DATAMAP dm + |ON TABLE datamap_test + |USING "lucene" + |DMPROPERTIES ('TEXT_COLUMNS' = 'name, country') + + import spark.implicits._ + import org.apache.spark.sql.SaveMode + import scala.util.Random + + // Load data to the main table, if + // lucene index writing fails, the datamap + // will be disabled in query + val r = new Random() + spark.sparkContext.parallelize(1 to 10) + .map(x => ("c1" + x % 8, x % 8, "city" + x % 50, "country" + x % 60)) + .toDF("name", "age", "city", "country") + .write + .format("carbondata") + .option("tableName", "datamap_test") + .option("compress", "true") + .mode(SaveMode.Append) + .save() + + spark.sql( +s""" + |SELECT * + |from datamap_test where + |TEXT_MATCH('name:c10') + """.stripMargin).show + + spark.stop +``` + + DataMap Management +Lucene DataMap can be created using following DDL + ``` + CREATE DATAMAP [IF NOT EXISTS] datamap_name + ON TABLE main_table + USING "lucene" + DMPROPERTIES ('text_columns'='city, name', ...) + ``` + +DataMap can be dropped using following DDL + ``` + DROP DATAMAP [IF EXISTS] datamap_name + ON TABLE main_table + ``` +To show all DataMaps created, use: + ``` + SHOW DATAMAP + ON TABLE main_table + ``` +It will show all DataMaps created on main table. + + +## Lucene DataMap Introduction + Lucene datamap are created as index DataMaps and managed along with main tables by CarbonData. + User can create as many lucene datamaps required to improve query performance, + provided the storage requirements and loading speeds are acceptable. + + Once lucene datamaps are created, the indexes generated by lucene will be read for pruning till + row level for the filter query by launching a spark datamap job. 
This pruned data will be read to + give the proper and faster result + + For instance, main table called **sales** which is defined as + + ``` + CREATE TABLE datamap_test ( +name string, +age int, +city string, +country string) + STORED BY 'carbondata' + ``` + + User can create Lucene datamap using the Create DataMap DDL + + ``` + CREATE DATAMAP dm + ON TABLE datamap_test + USING "lucene"
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183616698 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example +Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME + +Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars +```shell +mvn clean package -DskipTests -Pspark-2.2 +``` + +Start spark-shell in new terminal, type :paste, then copy and run the following code. +```scala + import java.io.File + import org.apache.spark.sql.{CarbonEnv, SparkSession} + import org.apache.spark.sql.CarbonSession._ + import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} + import org.apache.carbondata.core.util.path.CarbonStorePath + + val warehouse = new File("./warehouse").getCanonicalPath + val metastore = new File("./metastore").getCanonicalPath + + val spark = SparkSession + .builder() + .master("local") + .appName("preAggregateExample") + .config("spark.sql.warehouse.dir", warehouse) + .getOrCreateCarbonSession(warehouse, metastore) + + spark.sparkContext.setLogLevel("ERROR") + + // drop table if exists previously + spark.sql(s"DROP TABLE IF EXISTS datamap_test") + + // Create main table + spark.sql( + s""" + |CREATE TABLE datamap_test ( + |name string, + |age int, + |city string, + |country string) + |STORED BY 'carbondata' +""".stripMargin) + + // Create lucene datamap on the main table + spark.sql( + s""" + |CREATE DATAMAP dm + |ON TABLE datamap_test + |USING "lucene" + |DMPROPERTIES ('TEXT_COLUMNS' = 'name, country') + + import spark.implicits._ + import org.apache.spark.sql.SaveMode + import scala.util.Random + + // Load data to the main table, if + // lucene index writing fails, the datamap + // will be disabled in query + val r = new Random() + spark.sparkContext.parallelize(1 to 10) + .map(x => ("c1" + x % 8, x % 8, "city" + x % 50, "country" + x % 60)) + .toDF("name", "age", "city", "country") + .write + .format("carbondata") + .option("tableName", "datamap_test") + .option("compress", "true") + .mode(SaveMode.Append) + .save() + + spark.sql( +s""" + |SELECT * + |from datamap_test where + |TEXT_MATCH('name:c10') + """.stripMargin).show + + spark.stop +``` + + DataMap Management +Lucene DataMap can be created using following DDL --- End diff -- Lucene DataMap can be created using following DDL: ---
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183615908 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example +Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME --- End diff -- These are procedure steps, so we can have a numbered list ---
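Applying the suggestion, the quick-example steps from the quoted diff might read:

```
1. Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME.
2. Package the carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars.
3. Start spark-shell in a new terminal, type :paste, then copy and run the example code.
```
---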
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183618083 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example +Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME + +Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars +```shell +mvn clean package -DskipTests -Pspark-2.2 +``` + +Start spark-shell in new terminal, type :paste, then copy and run the following code. +```scala + import java.io.File + import org.apache.spark.sql.{CarbonEnv, SparkSession} + import org.apache.spark.sql.CarbonSession._ + import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} + import org.apache.carbondata.core.util.path.CarbonStorePath + + val warehouse = new File("./warehouse").getCanonicalPath + val metastore = new File("./metastore").getCanonicalPath + + val spark = SparkSession + .builder() + .master("local") + .appName("preAggregateExample") + .config("spark.sql.warehouse.dir", warehouse) + .getOrCreateCarbonSession(warehouse, metastore) + + spark.sparkContext.setLogLevel("ERROR") + + // drop table if exists previously + spark.sql(s"DROP TABLE IF EXISTS datamap_test") + + // Create main table + spark.sql( + s""" + |CREATE TABLE datamap_test ( + |name string, + |age int, + |city string, + |country string) + |STORED BY 'carbondata' +""".stripMargin) + + // Create lucene datamap on the main table + spark.sql( + s""" + |CREATE DATAMAP dm + |ON TABLE datamap_test + |USING "lucene" + |DMPROPERTIES ('TEXT_COLUMNS' = 'name, country') + + import spark.implicits._ + import org.apache.spark.sql.SaveMode + import scala.util.Random + + // Load data to the main table, if + // lucene index writing fails, the datamap + // will be disabled in query + val r = new Random() + spark.sparkContext.parallelize(1 to 10) + .map(x => ("c1" + x % 8, x % 8, "city" + x % 50, "country" + x % 60)) + .toDF("name", "age", "city", "country") + .write + .format("carbondata") + .option("tableName", "datamap_test") + .option("compress", "true") + .mode(SaveMode.Append) + .save() + + spark.sql( +s""" + |SELECT * + |from datamap_test where + |TEXT_MATCH('name:c10') + """.stripMargin).show + + spark.stop +``` + + DataMap Management +Lucene DataMap can be created using following DDL + ``` + CREATE DATAMAP [IF NOT EXISTS] datamap_name + ON TABLE main_table + USING "lucene" + DMPROPERTIES ('text_columns'='city, name', ...) + ``` + +DataMap can be dropped using following DDL + ``` + DROP DATAMAP [IF EXISTS] datamap_name + ON TABLE main_table + ``` +To show all DataMaps created, use: + ``` + SHOW DATAMAP + ON TABLE main_table + ``` +It will show all DataMaps created on main table. + + +## Lucene DataMap Introduction + Lucene datamap are created as index DataMaps and managed along with main tables by CarbonData. + User can create as many lucene datamaps required to improve query performance, + provided the storage requirements and loading speeds are acceptable. + + Once lucene datamaps are created, the indexes generated by lucene will be read for pruning till + row level for the filter query by launching a spark datamap job. 
This pruned data will be read to + give the proper and faster result + + For instance, main table called **sales** which is defined as + + ``` + CREATE TABLE datamap_test ( +name string, +age int, +city string, +country string) + STORED BY 'carbondata' + ``` + + User can create Lucene datamap using the Create DataMap DDL + + ``` + CREATE DATAMAP dm + ON TABLE datamap_test + USING "lucene"
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183617201 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example +Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME + +Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars +```shell +mvn clean package -DskipTests -Pspark-2.2 +``` + +Start spark-shell in new terminal, type :paste, then copy and run the following code. +```scala + import java.io.File + import org.apache.spark.sql.{CarbonEnv, SparkSession} + import org.apache.spark.sql.CarbonSession._ + import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} + import org.apache.carbondata.core.util.path.CarbonStorePath + + val warehouse = new File("./warehouse").getCanonicalPath + val metastore = new File("./metastore").getCanonicalPath + + val spark = SparkSession + .builder() + .master("local") + .appName("preAggregateExample") + .config("spark.sql.warehouse.dir", warehouse) + .getOrCreateCarbonSession(warehouse, metastore) + + spark.sparkContext.setLogLevel("ERROR") + + // drop table if exists previously + spark.sql(s"DROP TABLE IF EXISTS datamap_test") + + // Create main table + spark.sql( + s""" + |CREATE TABLE datamap_test ( + |name string, + |age int, + |city string, + |country string) + |STORED BY 'carbondata' +""".stripMargin) + + // Create lucene datamap on the main table + spark.sql( + s""" + |CREATE DATAMAP dm + |ON TABLE datamap_test + |USING "lucene" + |DMPROPERTIES ('TEXT_COLUMNS' = 'name, country') + + import spark.implicits._ + import org.apache.spark.sql.SaveMode + import scala.util.Random + + // Load data to the main table, if + // lucene index writing fails, the datamap + // will be disabled in query + val r = new Random() + spark.sparkContext.parallelize(1 to 10) + .map(x => ("c1" + x % 8, x % 8, "city" + x % 50, "country" + x % 60)) + .toDF("name", "age", "city", "country") + .write + .format("carbondata") + .option("tableName", "datamap_test") + .option("compress", "true") + .mode(SaveMode.Append) + .save() + + spark.sql( +s""" + |SELECT * + |from datamap_test where + |TEXT_MATCH('name:c10') + """.stripMargin).show + + spark.stop +``` + + DataMap Management +Lucene DataMap can be created using following DDL + ``` + CREATE DATAMAP [IF NOT EXISTS] datamap_name + ON TABLE main_table + USING "lucene" + DMPROPERTIES ('text_columns'='city, name', ...) + ``` + +DataMap can be dropped using following DDL + ``` + DROP DATAMAP [IF EXISTS] datamap_name + ON TABLE main_table + ``` +To show all DataMaps created, use: + ``` + SHOW DATAMAP + ON TABLE main_table + ``` +It will show all DataMaps created on main table. + + +## Lucene DataMap Introduction + Lucene datamap are created as index DataMaps and managed along with main tables by CarbonData. + User can create as many lucene datamaps required to improve query performance, + provided the storage requirements and loading speeds are acceptable. + + Once lucene datamaps are created, the indexes generated by lucene will be read for pruning till + row level for the filter query by launching a spark datamap job. 
This pruned data will be read to + give the proper and faster result + + For instance, main table called **sales** which is defined as --- End diff -- For instance, main table called **sales** which is defined as: ---
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183617996 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example +Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME + +Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars +```shell +mvn clean package -DskipTests -Pspark-2.2 +``` + +Start spark-shell in new terminal, type :paste, then copy and run the following code. +```scala + import java.io.File + import org.apache.spark.sql.{CarbonEnv, SparkSession} + import org.apache.spark.sql.CarbonSession._ + import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} + import org.apache.carbondata.core.util.path.CarbonStorePath + + val warehouse = new File("./warehouse").getCanonicalPath + val metastore = new File("./metastore").getCanonicalPath + + val spark = SparkSession + .builder() + .master("local") + .appName("preAggregateExample") + .config("spark.sql.warehouse.dir", warehouse) + .getOrCreateCarbonSession(warehouse, metastore) + + spark.sparkContext.setLogLevel("ERROR") + + // drop table if exists previously + spark.sql(s"DROP TABLE IF EXISTS datamap_test") + + // Create main table + spark.sql( + s""" + |CREATE TABLE datamap_test ( + |name string, + |age int, + |city string, + |country string) + |STORED BY 'carbondata' +""".stripMargin) + + // Create lucene datamap on the main table + spark.sql( + s""" + |CREATE DATAMAP dm + |ON TABLE datamap_test + |USING "lucene" + |DMPROPERTIES ('TEXT_COLUMNS' = 'name, country') + + import spark.implicits._ + import org.apache.spark.sql.SaveMode + import scala.util.Random + + // Load data to the main table, if + // lucene index writing fails, the datamap + // will be disabled in query + val r = new Random() + spark.sparkContext.parallelize(1 to 10) + .map(x => ("c1" + x % 8, x % 8, "city" + x % 50, "country" + x % 60)) + .toDF("name", "age", "city", "country") + .write + .format("carbondata") + .option("tableName", "datamap_test") + .option("compress", "true") + .mode(SaveMode.Append) + .save() + + spark.sql( +s""" + |SELECT * + |from datamap_test where + |TEXT_MATCH('name:c10') + """.stripMargin).show + + spark.stop +``` + + DataMap Management +Lucene DataMap can be created using following DDL + ``` + CREATE DATAMAP [IF NOT EXISTS] datamap_name + ON TABLE main_table + USING "lucene" + DMPROPERTIES ('text_columns'='city, name', ...) + ``` + +DataMap can be dropped using following DDL + ``` + DROP DATAMAP [IF EXISTS] datamap_name + ON TABLE main_table + ``` +To show all DataMaps created, use: + ``` + SHOW DATAMAP + ON TABLE main_table + ``` +It will show all DataMaps created on main table. + + +## Lucene DataMap Introduction + Lucene datamap are created as index DataMaps and managed along with main tables by CarbonData. + User can create as many lucene datamaps required to improve query performance, + provided the storage requirements and loading speeds are acceptable. + + Once lucene datamaps are created, the indexes generated by lucene will be read for pruning till + row level for the filter query by launching a spark datamap job. 
This pruned data will be read to + give the proper and faster result + + For instance, main table called **sales** which is defined as + + ``` + CREATE TABLE datamap_test ( +name string, +age int, +city string, +country string) + STORED BY 'carbondata' + ``` + + User can create Lucene datamap using the Create DataMap DDL + + ``` + CREATE DATAMAP dm + ON TABLE datamap_test + USING "lucene"
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183616317 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example +Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME --- End diff -- Close all the sentences with a period (.). This applies to all the sentences in this topic. ---
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183617239 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example +Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME + +Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars +```shell +mvn clean package -DskipTests -Pspark-2.2 +``` + +Start spark-shell in new terminal, type :paste, then copy and run the following code. +```scala + import java.io.File + import org.apache.spark.sql.{CarbonEnv, SparkSession} + import org.apache.spark.sql.CarbonSession._ + import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} + import org.apache.carbondata.core.util.path.CarbonStorePath + + val warehouse = new File("./warehouse").getCanonicalPath + val metastore = new File("./metastore").getCanonicalPath + + val spark = SparkSession + .builder() + .master("local") + .appName("preAggregateExample") + .config("spark.sql.warehouse.dir", warehouse) + .getOrCreateCarbonSession(warehouse, metastore) + + spark.sparkContext.setLogLevel("ERROR") + + // drop table if exists previously + spark.sql(s"DROP TABLE IF EXISTS datamap_test") + + // Create main table + spark.sql( + s""" + |CREATE TABLE datamap_test ( + |name string, + |age int, + |city string, + |country string) + |STORED BY 'carbondata' +""".stripMargin) + + // Create lucene datamap on the main table + spark.sql( + s""" + |CREATE DATAMAP dm + |ON TABLE datamap_test + |USING "lucene" + |DMPROPERTIES ('TEXT_COLUMNS' = 'name, country') + + import spark.implicits._ + import org.apache.spark.sql.SaveMode + import scala.util.Random + + // Load data to the main table, if + // lucene index writing fails, the datamap + // will be disabled in query + val r = new Random() + spark.sparkContext.parallelize(1 to 10) + .map(x => ("c1" + x % 8, x % 8, "city" + x % 50, "country" + x % 60)) + .toDF("name", "age", "city", "country") + .write + .format("carbondata") + .option("tableName", "datamap_test") + .option("compress", "true") + .mode(SaveMode.Append) + .save() + + spark.sql( +s""" + |SELECT * + |from datamap_test where + |TEXT_MATCH('name:c10') + """.stripMargin).show + + spark.stop +``` + + DataMap Management +Lucene DataMap can be created using following DDL + ``` + CREATE DATAMAP [IF NOT EXISTS] datamap_name + ON TABLE main_table + USING "lucene" + DMPROPERTIES ('text_columns'='city, name', ...) + ``` + +DataMap can be dropped using following DDL + ``` + DROP DATAMAP [IF EXISTS] datamap_name + ON TABLE main_table + ``` +To show all DataMaps created, use: + ``` + SHOW DATAMAP + ON TABLE main_table + ``` +It will show all DataMaps created on main table. + + +## Lucene DataMap Introduction + Lucene datamap are created as index DataMaps and managed along with main tables by CarbonData. + User can create as many lucene datamaps required to improve query performance, + provided the storage requirements and loading speeds are acceptable. + + Once lucene datamaps are created, the indexes generated by lucene will be read for pruning till + row level for the filter query by launching a spark datamap job. 
This pruned data will be read to + give the proper and faster result + + For instance, main table called **sales** which is defined as + + ``` + CREATE TABLE datamap_test ( +name string, +age int, +city string, +country string) + STORED BY 'carbondata' + ``` + + User can create Lucene datamap using the Create DataMap DDL --- End diff -- User can create Lucene datamap using the Create DataMap DDL: ---
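A side note on the quoted Quick Example: the `spark.sql` call that creates the lucene datamap is never closed — the statement is missing its terminating `""".stripMargin)` — which leaves the string and the surrounding code fence unterminated. The pair of statements the example intends, as plain SQL lifted from the quoted guide (semicolons added so they run standalone):

```sql
-- Create the lucene index datamap on the main table
CREATE DATAMAP dm
ON TABLE datamap_test
USING "lucene"
DMPROPERTIES ('TEXT_COLUMNS' = 'name, country');

-- Filter query pruned through the lucene index
SELECT *
FROM datamap_test
WHERE TEXT_MATCH('name:c10');
```
---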
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183616213 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example --- End diff -- The content below is a procedure, so put it in a numbered list: Step 1, Step 2, and so on. ---
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183616653 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example +Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME + +Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars +```shell +mvn clean package -DskipTests -Pspark-2.2 +``` + +Start spark-shell in new terminal, type :paste, then copy and run the following code. +```scala + import java.io.File + import org.apache.spark.sql.{CarbonEnv, SparkSession} + import org.apache.spark.sql.CarbonSession._ + import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} + import org.apache.carbondata.core.util.path.CarbonStorePath + + val warehouse = new File("./warehouse").getCanonicalPath + val metastore = new File("./metastore").getCanonicalPath + + val spark = SparkSession + .builder() + .master("local") + .appName("preAggregateExample") + .config("spark.sql.warehouse.dir", warehouse) + .getOrCreateCarbonSession(warehouse, metastore) + + spark.sparkContext.setLogLevel("ERROR") + + // drop table if exists previously + spark.sql(s"DROP TABLE IF EXISTS datamap_test") + + // Create main table + spark.sql( + s""" + |CREATE TABLE datamap_test ( + |name string, + |age int, + |city string, + |country string) + |STORED BY 'carbondata' +""".stripMargin) + + // Create lucene datamap on the main table + spark.sql( + s""" + |CREATE DATAMAP dm + |ON TABLE datamap_test + |USING "lucene" + |DMPROPERTIES ('TEXT_COLUMNS' = 'name, country') + + import spark.implicits._ + import org.apache.spark.sql.SaveMode + import scala.util.Random + + // Load data to the main table, if + // lucene index writing fails, the datamap + // will be disabled in query + val r = new Random() + spark.sparkContext.parallelize(1 to 10) + .map(x => ("c1" + x % 8, x % 8, "city" + x % 50, "country" + x % 60)) + .toDF("name", "age", "city", "country") + .write + .format("carbondata") + .option("tableName", "datamap_test") + .option("compress", "true") + .mode(SaveMode.Append) + .save() + + spark.sql( +s""" + |SELECT * + |from datamap_test where + |TEXT_MATCH('name:c10') + """.stripMargin).show --- End diff -- Why a red background? Please check once. ---
[GitHub] carbondata pull request #2215: [wip]add documentation for lucene datamap
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2215#discussion_r183616769 --- Diff: docs/datamap/lucene-datamap-guide.md --- @@ -0,0 +1,180 @@ +# CarbonData Lucene DataMap + +* [Quick Example](#quick-example) +* [DataMap Management](#datamap-management) +* [Lucene Datamap](#lucene-datamap-introduction) +* [Loading Data](#loading-data) +* [Querying Data](#querying-data) +* [Data Management](#data-management-with-pre-aggregate-tables) + +## Quick example +Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME + +Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars +```shell +mvn clean package -DskipTests -Pspark-2.2 +``` + +Start spark-shell in new terminal, type :paste, then copy and run the following code. +```scala + import java.io.File + import org.apache.spark.sql.{CarbonEnv, SparkSession} + import org.apache.spark.sql.CarbonSession._ + import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} + import org.apache.carbondata.core.util.path.CarbonStorePath + + val warehouse = new File("./warehouse").getCanonicalPath + val metastore = new File("./metastore").getCanonicalPath + + val spark = SparkSession + .builder() + .master("local") + .appName("preAggregateExample") + .config("spark.sql.warehouse.dir", warehouse) + .getOrCreateCarbonSession(warehouse, metastore) + + spark.sparkContext.setLogLevel("ERROR") + + // drop table if exists previously + spark.sql(s"DROP TABLE IF EXISTS datamap_test") + + // Create main table + spark.sql( + s""" + |CREATE TABLE datamap_test ( + |name string, + |age int, + |city string, + |country string) + |STORED BY 'carbondata' +""".stripMargin) + + // Create lucene datamap on the main table + spark.sql( + s""" + |CREATE DATAMAP dm + |ON TABLE datamap_test + |USING "lucene" + |DMPROPERTIES ('TEXT_COLUMNS' = 'name, country') + + import spark.implicits._ + import org.apache.spark.sql.SaveMode + import scala.util.Random + + // Load data to the main table, if + // lucene index writing fails, the datamap + // will be disabled in query + val r = new Random() + spark.sparkContext.parallelize(1 to 10) + .map(x => ("c1" + x % 8, x % 8, "city" + x % 50, "country" + x % 60)) + .toDF("name", "age", "city", "country") + .write + .format("carbondata") + .option("tableName", "datamap_test") + .option("compress", "true") + .mode(SaveMode.Append) + .save() + + spark.sql( +s""" + |SELECT * + |from datamap_test where + |TEXT_MATCH('name:c10') + """.stripMargin).show + + spark.stop +``` + + DataMap Management +Lucene DataMap can be created using following DDL + ``` + CREATE DATAMAP [IF NOT EXISTS] datamap_name + ON TABLE main_table + USING "lucene" + DMPROPERTIES ('text_columns'='city, name', ...) + ``` + +DataMap can be dropped using following DDL --- End diff -- DataMap can be dropped using following DDL: ---
[GitHub] carbondata issue #2198: [CARBONDATA-2369] Add a document for Non Transaction...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2198 LGTM ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183304329 --- Diff: integration/presto/Presto_Cluster_Setup_For_Carbondata.md --- @@ -0,0 +1,133 @@ +# Presto Multinode Cluster setup For Carbondata + +## Installing Presto + + 1. Download the 0.187 version of Presto using: + `wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz` + + 2. Extract Presto tar file: `tar zxvf presto-server-0.187.tar.gz` + + 3. Download the Presto CLI for the coordinator and name it presto. + + ``` +wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar + +mv presto-cli-0.187-executable.jar presto + +chmod +x presto + ``` + + ## Create Configuration Files + + 1. Create `etc` folder in presto-server-0.187 directory. + 2. Create `config.properties`, `jvm.config`, `log.properties`, and `node.properties` files. + 3. Install uuid to generate a node.id + + ``` + sudo apt-get install uuid + + uuid + ``` + + +# Contents of your node.properties file + + ``` + node.environment=production + node.id= + node.data-dir=/home/ubuntu/data + ``` + +# Contents of your jvm.config file + + ``` + -server + -Xmx16G + -XX:+UseG1GC + -XX:G1HeapRegionSize=32M + -XX:+UseGCOverheadLimit + -XX:+ExplicitGCInvokesConcurrent + -XX:+HeapDumpOnOutOfMemoryError + -XX:OnOutOfMemoryError=kill -9 %p + ``` + +# Contents of your log.properties file + ``` + com.facebook.presto=INFO + ``` + + The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`. + +## Coordinator Configurations + + # Contents of your config.properties + ``` + coordinator=true + node-scheduler.include-coordinator=false + http-server.http.port=8080 + query.max-memory=50GB + query.max-memory-per-node=2GB + discovery-server.enabled=true + discovery.uri=:8080 + ``` +The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers. + +**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`. + +Also relation between below two configuration-properties should be like: +If, `query.max-memory-per-node=30GB` +Then, `query.max-memory=<30GB * number of nodes>` + +## Worker Configurations + +# Contents of your config.properties + + ``` + coordinator=false + http-server.http.port=8080 + query.max-memory=50GB + query.max-memory-per-node=2GB + discovery.uri=:8080 + ``` + +**Note**: `jvm.config` and `node.properties` files are same for all the nodes (worker + coordinator). All the nodes should have different `node.id` + +## Catalog Configurations + +1. Create a folder named `catalog` in etc directory of presto on all the nodes of the cluster including the coordinator. + +# Configuring Carbondata in Presto +1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes. + +## Add Plugins + +1. Create a directory named `carbondata` in plugin directory of presto --- End diff -- Add a period at the end of the sentence for both points. Check all the sentences. ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183304497 --- Diff: integration/presto/Presto_Cluster_Setup_For_Carbondata.md --- @@ -0,0 +1,133 @@ +# Presto Multinode Cluster setup For Carbondata + +## Installing Presto + + 1. Download the 0.187 version of Presto using: + `wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz` + + 2. Extract Presto tar file: `tar zxvf presto-server-0.187.tar.gz` + + 3. Download the Presto CLI for the coordinator and name it presto. + + ``` +wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar + +mv presto-cli-0.187-executable.jar presto + +chmod +x presto + ``` + + ## Create Configuration Files + + 1. Create `etc` folder in presto-server-0.187 directory. + 2. Create `config.properties`, `jvm.config`, `log.properties`, and `node.properties` files. + 3. Install uuid to generate a node.id + + ``` + sudo apt-get install uuid + + uuid + ``` + + +# Contents of your node.properties file + + ``` + node.environment=production + node.id= + node.data-dir=/home/ubuntu/data + ``` + +# Contents of your jvm.config file + + ``` + -server + -Xmx16G + -XX:+UseG1GC + -XX:G1HeapRegionSize=32M + -XX:+UseGCOverheadLimit + -XX:+ExplicitGCInvokesConcurrent + -XX:+HeapDumpOnOutOfMemoryError + -XX:OnOutOfMemoryError=kill -9 %p + ``` + +# Contents of your log.properties file + ``` + com.facebook.presto=INFO + ``` + + The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`. + +## Coordinator Configurations + + # Contents of your config.properties + ``` + coordinator=true + node-scheduler.include-coordinator=false + http-server.http.port=8080 + query.max-memory=50GB + query.max-memory-per-node=2GB + discovery-server.enabled=true + discovery.uri=:8080 + ``` +The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers. + +**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`. + +Also relation between below two configuration-properties should be like: +If, `query.max-memory-per-node=30GB` +Then, `query.max-memory=<30GB * number of nodes>` + +## Worker Configurations + +# Contents of your config.properties + + ``` + coordinator=false + http-server.http.port=8080 + query.max-memory=50GB + query.max-memory-per-node=2GB + discovery.uri=:8080 + ``` + +**Note**: `jvm.config` and `node.properties` files are same for all the nodes (worker + coordinator). All the nodes should have different `node.id` + +## Catalog Configurations + +1. Create a folder named `catalog` in etc directory of presto on all the nodes of the cluster including the coordinator. + +# Configuring Carbondata in Presto +1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes. + +## Add Plugins + +1. Create a directory named `carbondata` in plugin directory of presto +2. Copy `carbondata` jars to `plugin/carbondata` directory on all nodes + +## Start Presto Server on all nodes + +``` +./presto-server-0.187/bin/launcher start +``` +To run it as a background process. + +``` +./presto-server-0.187/bin/launcher run +``` +To run it in foreground. 
+ +## Start Presto CLI +``` +./presto +``` +To connect to carbondata catalog use the following command: + +``` +./presto --server :8080 --catalog carbondata --schema +``` +Execute the following command to ensure the workers are connected --- End diff -- Add a colon (:) at the end of the sentence. ---
[GitHub] carbondata pull request #2198: [CARBONDATA-2369] Add a document for Non Tran...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2198#discussion_r183031656 --- Diff: docs/sdk-writer-guide.md --- @@ -0,0 +1,140 @@ +# SDK Writer Guide +In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar. +This SDK writer, writes carbondata file and carbonindex file at a given path. +External client can make use of this writer to convert other format data or live data to create carbondata and index files. +These SDK writer output contains just a carbondata and carbonindex files. No metadata folder will be present. + +## Quick example + +```scala + import java.io.IOException; + + import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException; + import org.apache.carbondata.core.metadata.datatype.DataTypes; + import org.apache.carbondata.sdk.file.CarbonWriter; + import org.apache.carbondata.sdk.file.CarbonWriterBuilder; + import org.apache.carbondata.sdk.file.Field; + import org.apache.carbondata.sdk.file.Schema; + + public class TestSdk { + + public static void main(String[] args) throws IOException, InvalidLoadOptionException { + testSdkWriter(); + } + + public static void testSdkWriter() throws IOException, InvalidLoadOptionException { + String path ="/home/root1/Documents/ab/temp"; + + Field[] fields =new Field[2]; + fields[0] = new Field("name", DataTypes.STRING); + fields[1] = new Field("age", DataTypes.INT); + + Schema schema =new Schema(fields); + + CarbonWriterBuilder builder = CarbonWriter.builder() + .withSchema(schema) + .outputPath(path); + + CarbonWriter writer = builder.buildWriterForCSVInput(); + + int rows = 5; + for (int i = 0; i < rows; i++) { + writer.write(new String[]{"robot" + (i % 10), String.valueOf(i)}); + } + writer.close(); + } + } +``` + +## Datatypes Mapping +Each of SQL data types are mapped into data types of SDK. Following are the mapping: +| SQL DataTypes | Mapped SDK DataTypes | --- End diff -- Table formatting has an issue; please check. ---
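On the table issue flagged above: GitHub-flavored markdown requires a delimiter row of dashes between a table's header and its body, and the quoted diff shows the header with no delimiter row, so the table renders as plain text. A minimal sketch of the expected layout (the two data rows here are illustrative, not taken from the guide):

```
| SQL DataTypes | Mapped SDK DataTypes |
|---------------|----------------------|
| INT           | DataTypes.INT        |
| STRING        | DataTypes.STRING     |
```
---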
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183030442 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata + +### Install Presto + + * Download the 0.187 version of presto using: + +``wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz + `` + * Extract presto tar file + ``tar zxvf presto-server-0.187.tar.gz`` + + * Download the presto CLI for the coordinator and name it presto. + +``` +wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar + +mv presto-cli-0.187-executable.jar presto + +chmod +x presto +``` + + ### Create configuration Files + + * Create etc folder in presto-server-0.187 directory. + * Create config.properties, jvm.config, log.properties, and node.properties files. + * Install uuid to generate a node.id + + ``` + sudo apt-get install uuid + + uuid + ``` + + +# Contents of your node.properties file + + ``` + node.environment=production + node.id= + node.data-dir=/home/ubuntu/data + ``` + +# Contents of your jvm.config file + + ``` + -server + -Xmx16G + -XX:+UseG1GC + -XX:G1HeapRegionSize=32M + -XX:+UseGCOverheadLimit + -XX:+ExplicitGCInvokesConcurrent + -XX:+HeapDumpOnOutOfMemoryError + -XX:OnOutOfMemoryError=kill -9 %p + ``` + +# Contents of your log.properties file + ``` + com.facebook.presto=INFO + ``` + + The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`. + +### Coordinator Configurations --- End diff -- Heading 2 ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183030674 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata + +### Install Presto + + * Download the 0.187 version of presto using: + +``wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz + `` + * Extract presto tar file + ``tar zxvf presto-server-0.187.tar.gz`` + + * Download the presto CLI for the coordinator and name it presto. + +``` +wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar + +mv presto-cli-0.187-executable.jar presto + +chmod +x presto +``` + + ### Create configuration Files + + * Create etc folder in presto-server-0.187 directory. + * Create config.properties, jvm.config, log.properties, and node.properties files. + * Install uuid to generate a node.id + + ``` + sudo apt-get install uuid + + uuid + ``` + + +# Contents of your node.properties file + + ``` + node.environment=production + node.id= + node.data-dir=/home/ubuntu/data + ``` + +# Contents of your jvm.config file + + ``` + -server + -Xmx16G + -XX:+UseG1GC + -XX:G1HeapRegionSize=32M + -XX:+UseGCOverheadLimit + -XX:+ExplicitGCInvokesConcurrent + -XX:+HeapDumpOnOutOfMemoryError + -XX:OnOutOfMemoryError=kill -9 %p + ``` + +# Contents of your log.properties file + ``` + com.facebook.presto=INFO + ``` + + The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`. + +### Coordinator Configurations + + # Contents of your config.properties +``` +coordinator=true +node-scheduler.include-coordinator=false +http-server.http.port=8080 +query.max-memory=50GB +query.max-memory-per-node=2GB +discovery-server.enabled=true +discovery.uri=:8080 +``` +The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers. + +**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`. + +Also relation between below two configuration-properties should be like: +If, `query.max-memory-per-node=30GB` +Then, `query.max-memory=<30GB * number of nodes>` + +### Worker Configurations + +# Contents of your config.properties + +``` +coordinator=false +http-server.http.port=8080 +query.max-memory=50GB +query.max-memory-per-node=2GB +discovery.uri=:8080 +``` + +**Note**: `jvm.config`, `node.properties` file is same for all the nodes (worker + coordinator). All the nodes should have different `node.id` --- End diff -- `jvm.config` and `node.properties` files are same for all the nodes (worker + coordinator). All the nodes should have different `node.id`. ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183031033 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata + +### Install Presto + + * Download the 0.187 version of presto using: + +``wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz + `` + * Extract presto tar file + ``tar zxvf presto-server-0.187.tar.gz`` + + * Download the presto CLI for the coordinator and name it presto. + +``` +wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar + +mv presto-cli-0.187-executable.jar presto + +chmod +x presto +``` + + ### Create configuration Files + + * Create etc folder in presto-server-0.187 directory. + * Create config.properties, jvm.config, log.properties, and node.properties files. + * Install uuid to generate a node.id + + ``` + sudo apt-get install uuid + + uuid + ``` + + +# Contents of your node.properties file + + ``` + node.environment=production + node.id= + node.data-dir=/home/ubuntu/data + ``` + +# Contents of your jvm.config file + + ``` + -server + -Xmx16G + -XX:+UseG1GC + -XX:G1HeapRegionSize=32M + -XX:+UseGCOverheadLimit + -XX:+ExplicitGCInvokesConcurrent + -XX:+HeapDumpOnOutOfMemoryError + -XX:OnOutOfMemoryError=kill -9 %p + ``` + +# Contents of your log.properties file + ``` + com.facebook.presto=INFO + ``` + + The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`. + +### Coordinator Configurations + + # Contents of your config.properties +``` +coordinator=true +node-scheduler.include-coordinator=false +http-server.http.port=8080 +query.max-memory=50GB +query.max-memory-per-node=2GB +discovery-server.enabled=true +discovery.uri=:8080 +``` +The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers. + +**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`. + +Also relation between below two configuration-properties should be like: +If, `query.max-memory-per-node=30GB` +Then, `query.max-memory=<30GB * number of nodes>` + +### Worker Configurations + +# Contents of your config.properties + +``` +coordinator=false +http-server.http.port=8080 +query.max-memory=50GB +query.max-memory-per-node=2GB +discovery.uri=:8080 +``` + +**Note**: `jvm.config`, `node.properties` file is same for all the nodes (worker + coordinator). All the nodes should have different `node.id` + +### Catalog Configurations + +Create a folder named `catalog` in etc directory of presto on all the nodes of the cluster including the coordinator. + +# Configuring Carbondata in Presto +* Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes. + +### Add Plugins + +* Create a directory named `carbondata` in plugin directory of presto +* Copy `carbondata` jars to `plugin/carbondata` directory on all nodes + +### Start Presto Server on all nodes + +``` +./presto-server-0.187/bin/launcher start +``` +To run it as a background process. + +``` +./presto-server-0.187/bin/launcher run +``` +To run it in foreground. 
+ +### Start presto CLI +``` +./presto +``` +To connect to carbondata catalog use the following command: --- End diff -- To connect to carbondata catalog, use the following command: ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183027350 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata + +### Install Presto --- End diff -- We can make it heading 2 (##) and change the heading to "Installing Presto" ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183030457 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata + +### Install Presto + + * Download the 0.187 version of presto using: + +``wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz + `` + * Extract presto tar file + ``tar zxvf presto-server-0.187.tar.gz`` + + * Download the presto CLI for the coordinator and name it presto. + +``` +wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar + +mv presto-cli-0.187-executable.jar presto + +chmod +x presto +``` + + ### Create configuration Files + + * Create etc folder in presto-server-0.187 directory. + * Create config.properties, jvm.config, log.properties, and node.properties files. + * Install uuid to generate a node.id + + ``` + sudo apt-get install uuid + + uuid + ``` + + +# Contents of your node.properties file + + ``` + node.environment=production + node.id= + node.data-dir=/home/ubuntu/data + ``` + +# Contents of your jvm.config file + + ``` + -server + -Xmx16G + -XX:+UseG1GC + -XX:G1HeapRegionSize=32M + -XX:+UseGCOverheadLimit + -XX:+ExplicitGCInvokesConcurrent + -XX:+HeapDumpOnOutOfMemoryError + -XX:OnOutOfMemoryError=kill -9 %p + ``` + +# Contents of your log.properties file + ``` + com.facebook.presto=INFO + ``` + + The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`. + +### Coordinator Configurations + + # Contents of your config.properties +``` +coordinator=true +node-scheduler.include-coordinator=false +http-server.http.port=8080 +query.max-memory=50GB +query.max-memory-per-node=2GB +discovery-server.enabled=true +discovery.uri=:8080 +``` +The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers. + +**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`. + +Also relation between below two configuration-properties should be like: +If, `query.max-memory-per-node=30GB` +Then, `query.max-memory=<30GB * number of nodes>` + +### Worker Configurations --- End diff -- Heading 2 ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183027087 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata --- End diff -- Give a space after # ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183028214 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata + +### Install Presto + + * Download the 0.187 version of presto using: --- End diff -- If these are steps, then change from bulleted points to numbered points. ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183027976 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata --- End diff -- Leave a space after # ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183030962 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata + +### Install Presto + + * Download the 0.187 version of presto using: + +``wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz + `` + * Extract presto tar file + ``tar zxvf presto-server-0.187.tar.gz`` + + * Download the presto CLI for the coordinator and name it presto. + +``` +wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar + +mv presto-cli-0.187-executable.jar presto + +chmod +x presto +``` + + ### Create configuration Files + + * Create etc folder in presto-server-0.187 directory. + * Create config.properties, jvm.config, log.properties, and node.properties files. + * Install uuid to generate a node.id + + ``` + sudo apt-get install uuid + + uuid + ``` + + +# Contents of your node.properties file + + ``` + node.environment=production + node.id= + node.data-dir=/home/ubuntu/data + ``` + +# Contents of your jvm.config file + + ``` + -server + -Xmx16G + -XX:+UseG1GC + -XX:G1HeapRegionSize=32M + -XX:+UseGCOverheadLimit + -XX:+ExplicitGCInvokesConcurrent + -XX:+HeapDumpOnOutOfMemoryError + -XX:OnOutOfMemoryError=kill -9 %p + ``` + +# Contents of your log.properties file + ``` + com.facebook.presto=INFO + ``` + + The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`. + +### Coordinator Configurations + + # Contents of your config.properties +``` +coordinator=true +node-scheduler.include-coordinator=false +http-server.http.port=8080 +query.max-memory=50GB +query.max-memory-per-node=2GB +discovery-server.enabled=true +discovery.uri=:8080 +``` +The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers. + +**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`. + +Also relation between below two configuration-properties should be like: +If, `query.max-memory-per-node=30GB` +Then, `query.max-memory=<30GB * number of nodes>` + +### Worker Configurations + +# Contents of your config.properties + +``` +coordinator=false +http-server.http.port=8080 +query.max-memory=50GB +query.max-memory-per-node=2GB +discovery.uri=:8080 +``` + +**Note**: `jvm.config`, `node.properties` file is same for all the nodes (worker + coordinator). All the nodes should have different `node.id` + +### Catalog Configurations + +Create a folder named `catalog` in etc directory of presto on all the nodes of the cluster including the coordinator. + +# Configuring Carbondata in Presto +* Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes. + +### Add Plugins + +* Create a directory named `carbondata` in plugin directory of presto --- End diff -- Procedure so change to numbered step ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183028486 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata + +### Install Presto + + * Download the 0.187 version of presto using: + +``wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz + `` + * Extract presto tar file + ``tar zxvf presto-server-0.187.tar.gz`` + + * Download the presto CLI for the coordinator and name it presto. --- End diff -- All 'presto' instances can be changed to title case 'Presto' ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183029300 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata + +### Install Presto + + * Download the 0.187 version of presto using: + +``wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz + `` + * Extract presto tar file + ``tar zxvf presto-server-0.187.tar.gz`` + + * Download the presto CLI for the coordinator and name it presto. + +``` +wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar + +mv presto-cli-0.187-executable.jar presto + +chmod +x presto +``` + + ### Create configuration Files + + * Create etc folder in presto-server-0.187 directory. --- End diff -- This is a procedure, so change it to numbered points. ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183028111 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata + +### Install Presto --- End diff -- We can change it to Heading 2 (##) and change the heading to "Installing Presto" ---
[GitHub] carbondata pull request #2199: [CARBONDATA-2370] Added document for presto m...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2199#discussion_r183028525 --- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md --- @@ -0,0 +1,135 @@ +#Presto Multinode Cluster setup For Carbondata + +### Install Presto + + * Download the 0.187 version of presto using: + +``wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz + `` + * Extract presto tar file + ``tar zxvf presto-server-0.187.tar.gz`` + + * Download the presto CLI for the coordinator and name it presto. + +``` +wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar + +mv presto-cli-0.187-executable.jar presto + +chmod +x presto +``` + + ### Create configuration Files --- End diff -- Heading 2 (##) and make it title case. ---
[GitHub] carbondata issue #2183: [Documentation] FAQ added for Why all executors are ...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2183 @rahulforallp kindly review ---
[GitHub] carbondata pull request #2183: [Documentation] FAQ added for Why all executo...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2183 [Documentation] FAQ added for Why all executors are showing success in Spark UI even … FAQ added for Why are all executors showing success in Spark UI even after Dataload command failed at Driver side? You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata Faq_dadaload Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2183.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2183 commit aca05e32a4acb12491b85e9989c82b3331f20c0f Author: sgururajshetty <sgururajshetty@...> Date: 2018-04-18T11:38:05Z FAQ added for Why all executors are showing success in Spark UI even after Dataload command failed at Driver side? ---
[GitHub] carbondata issue #1944: [Documentation] Added a FAQ for executor returning s...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/1944 Closed as this FAQ is not needed ---
[GitHub] carbondata pull request #1944: [Documentation] Added a FAQ for executor retu...
Github user sgururajshetty closed the pull request at: https://github.com/apache/carbondata/pull/1944 ---
[GitHub] carbondata issue #2138: [CARBONDATA-2230][Documentation]add documentation fo...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2138 LGTM ---
[GitHub] carbondata pull request #2138: [CARBONDATA-2230][Documentation]add documenta...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2138#discussion_r179039767 --- Diff: docs/configuration-parameters.md --- @@ -39,6 +39,7 @@ This section provides the details of all the configurations required for the Car | carbon.streaming.auto.handoff.enabled | true | If this parameter value is set to true, auto trigger handoff function will be enabled.| | carbon.streaming.segment.max.size | 102400 | This parameter defines the maximum size of the streaming segment. Setting this parameter to appropriate value will avoid impacting the streaming ingestion. The value is in bytes.| | carbon.query.show.datamaps | true | If this parameter value is set to true, show tables command will list all the tables including datatmaps(eg: Preaggregate table), else datamaps will be excluded from the table list. | +| carbon.segment.lock.files.preserve.hours | 48 | This property value indicates the number of hours the segment lock files will be preserved after dataload. These lock fils will be deleted with clean files command after the configured amount of hours. | --- End diff -- Spelling error "fils". These lock files will be deleted with the clean files command after the configured number of hours. ---
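For reference, the cleanup path the corrected row describes — a minimal sketch, assuming a hypothetical table named `sales`; the preserve window itself is set in `carbon.properties`:

```sql
-- carbon.properties (not SQL):
--   carbon.segment.lock.files.preserve.hours=48
-- Removes stale files, including segment lock files older than
-- the configured preserve window:
CLEAN FILES FOR TABLE sales;
```
---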
[GitHub] carbondata pull request #2116: [Documentation] The syntax and the example is...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2116 [Documentation] The syntax and the example is corrected Overwrite syntax and examples were corrected as they were throwing errors You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata Syntax_correction Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2116.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2116 commit b9ad02b94f9ab8293143125cdf60bc4a2c7d515b Author: sgururajshetty <sgururajshetty@...> Date: 2018-03-30T12:14:05Z The syntax and the example is corrected as it was throwing error ---
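For context, the shape of the overwrite statement these corrections concern — a minimal sketch with hypothetical table names:

```sql
-- Replace the contents of sales with the query result
INSERT OVERWRITE TABLE sales
SELECT * FROM sales_staging;
```
---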
[GitHub] carbondata pull request #2079: [Documentation] Editorial Review
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2079 [Documentation] Editorial Review Editorial Review You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata review1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2079.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2079 commit af6e717f895001fa44416b1cdabb97f948b08af7 Author: sgururajshetty <sgururajshetty@...> Date: 2018-03-20T05:31:11Z Spelling corrected ---
[GitHub] carbondata pull request #2067: [Documentation] Example added for Drop Partit...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2067 [Documentation] Example added for Drop Partition Example added for Drop Partition You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata exampleDropPartition Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2067.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2067 commit ae1f161764b4464d3051dc750aa20139e0480b8d Author: sgururajshetty <sgururajshetty@...> Date: 2018-03-15T10:50:41Z Example added for Drop Partition ---
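For reference, the DDL shape the added example documents — a minimal sketch with a hypothetical table and partition column:

```sql
-- Drop a single standard (Hive-style) partition from a partitioned table
ALTER TABLE sales DROP IF EXISTS PARTITION (country = 'US');
```
---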
[GitHub] carbondata issue #2041: [CARBONDATA-2235]Update configuration-parameters.md
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2041 LGTM ---
[GitHub] carbondata pull request #2044: [Documentation] Updated Readme for Datamap Fe...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2044 [Documentation] Updated Readme for Datamap Feature Readme is updated with the links to the following new topics on Datamap > CarbonData Pre-aggregate DataMap > CarbonData Timeseries DataMap You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata Datamap Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2044.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2044 commit 574c69f93bb78cf09daf492634791a3ef30f27fd Author: sgururajshetty <sgururajshetty@...> Date: 2018-03-08T06:02:17Z Updated Readme for Datamap > CarbonData Pre-aggregate DataMap > CarbonData Timeseries DataMap ---
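As a pointer into the first of the newly linked topics, a minimal sketch of a pre-aggregate DataMap (table, datamap, and column names are hypothetical):

```sql
-- Pre-compute SUM(amount) per country; matching aggregate
-- queries are transparently rewritten to this table
CREATE DATAMAP agg_sales
ON TABLE sales
USING 'preaggregate'
AS SELECT country, SUM(amount)
   FROM sales
   GROUP BY country;
```
---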
[GitHub] carbondata pull request #1992: [Documentation] Editorial review
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/1992 [Documentation] Editorial review Editorial review You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata reviewed Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1992.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1992 commit e9e4b92772eb0ea04b0d56dd7b94828f7648a657 Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-23T11:35:17Z Editorial review ---
[GitHub] carbondata issue #1936: [CARBONDATA-2135] Documentation for Table comment an...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/1936 @chenliang613 kindly review ---
[GitHub] carbondata issue #1944: [Documentation] Added a FAQ for executor returning s...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/1944 @sraghunandan kindly review ---
[GitHub] carbondata issue #1954: [Documentation] Formatting issue fixed
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/1954 LGTM ---
[GitHub] carbondata pull request #1944: [Documentation] Added a FAQ for executor retu...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/1944 [Documentation] Added a FAQ for executor returning successful even after the query fa… Added a FAQ for executor returning successful even after the query failed You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata faq1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1944.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1944 commit d32c887b4b683c2cdd4160bf9552e64f706a654c Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-07T11:24:57Z Added a FAQ for executor returning successful even after the query failed commit 411267affcfec7e6e6955de9435adfc9bb4497d9 Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-07T11:27:54Z Fixed a link issue ---
[GitHub] carbondata issue #1938: [CARBONDATA-2138] Added documentation for HEADER opt...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/1938 @QiangCai fixed the review comment. Kindly review and merge. ---
[GitHub] carbondata pull request #1938: [CARBONDATA-2138] Added documentation for HEA...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/1938 [CARBONDATA-2138] Added documentation for HEADER option while loading data Added documentation for HEADER option in load data You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata 2138 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1938.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1938 commit 20ec4b85d4bf41d1ee99402a8e59aa4ad5d1f08f Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-06T15:28:59Z Added documentation for HEADER option while loading data ---
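For context, the option this PR documents — a minimal sketch, assuming a hypothetical headerless CSV whose column order matches the table schema:

```sql
-- 'HEADER'='false' tells the loader the CSV file has no header row
LOAD DATA INPATH 'hdfs://hacluster/data/sales.csv'
INTO TABLE sales
OPTIONS('HEADER' = 'false');
```
---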
[GitHub] carbondata pull request #1936: [CARBONDATA-2135] Documentation for Table com...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/1936 [CARBONDATA-2135] Documentation for Table comment and Column Comment Documentation for table comment and column comment You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata 2135 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1936.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1936 commit c78d3a56f209dea3259e0af147a96641de4e9c0f Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-06T10:36:42Z Documentation for Table comment and Column Comment ---
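For reference, the two comment forms being documented — a minimal sketch with hypothetical names:

```sql
CREATE TABLE sales (
  id INT COMMENT 'unique sale identifier',  -- column comment
  country STRING
)
COMMENT 'fact table holding sale records'   -- table comment
STORED BY 'carbondata';
```
---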
[GitHub] carbondata pull request #1927: [CARBONDATA-2128] Documentation for table pat...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/1927 [CARBONDATA-2128] Documentation for table path while creating the table Documentation for table path while creating the table You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata 2128 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1927.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1927 commit c640768a31f97cb5278d1804a9ceba8283889726 Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-03T15:50:41Z Documentation for table path while creating the table ---
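For reference, the documented shape of creating a table at an explicit path — a sketch assuming the standard LOCATION clause, with a hypothetical HDFS path:

```sql
CREATE TABLE sales (
  id INT,
  country STRING
)
STORED BY 'carbondata'
LOCATION 'hdfs://hacluster/user/warehouse/sales_store';
```
---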
[GitHub] carbondata pull request #1926: [CARBONDATA-2127] Documentation for Hive Stan...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/1926 [CARBONDATA-2127] Documentation for Hive Standard Partition Documentation for Hive Standard Partition You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata 2127 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1926.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1926 commit f97d0b5eca4d06d40d14436c7b62d93691d77e58 Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-03T15:34:23Z Documentation for Hive Standard Partition ---
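For context, the Hive-style partition DDL being documented — a minimal sketch with hypothetical names; note the partition column is declared outside the column list:

```sql
CREATE TABLE sales (
  id INT,
  amount DOUBLE
)
PARTITIONED BY (country STRING)
STORED BY 'carbondata';

-- Static insert into one partition
INSERT INTO TABLE sales PARTITION (country = 'IN')
SELECT id, amount FROM sales_staging;
```
---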
[GitHub] carbondata issue #1925: [CARBONDATA-2126] Documentation for Create database ...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/1925 Closed ---
[GitHub] carbondata pull request #1925: [CARBONDATA-2126] Documentation for Create da...
Github user sgururajshetty closed the pull request at: https://github.com/apache/carbondata/pull/1925 ---
[GitHub] carbondata pull request #1925: [CARBONDATA-2126] Documentation for Create da...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/1925 [CARBONDATA-2126] Documentation for Create database and custom location Link issue fixed You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata 2126 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1925.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1925 commit d66529dd9cca066b2030f3926a8d8c2945016cfc Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-03T13:08:10Z Documentation for create database and custom location commit 241ba99b939ea3b14150485d016794b761d2fe50 Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-03T14:35:55Z Link issue fixed ---
[GitHub] carbondata pull request #1923: [CARBONDATA-2126] Documentation for create da...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/1923 [CARBONDATA-2126] Documentation for create database and custom location Added documentation for creating a database and also specifying a custom location You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata 2126 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1923.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1923 commit d66529dd9cca066b2030f3926a8d8c2945016cfc Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-03T13:08:10Z Documentation for create database and custom location ---
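For reference, the documented statement — a minimal sketch with a hypothetical database name and HDFS path:

```sql
CREATE DATABASE IF NOT EXISTS custom_db
LOCATION 'hdfs://hacluster/user/warehouse/custom_db';
```
---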
[GitHub] carbondata pull request #1907: Data types
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/1907 Data types Spelling fixed You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata data_types Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1907.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1907 commit e9fa0a2a09b89bbef10ea7a635ea929c54a58edc Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-01T15:00:12Z The supported datatype mentioned for dictionary exclude and sort columns commit 0a0af9d38359414cc194f8a9e2f6c5850c88e9f9 Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-01T15:03:34Z Spelling correction ---
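For context on the commit above, a minimal sketch of where those two properties appear (names hypothetical; the comments reflect the datatype restrictions the docs now spell out):

```sql
CREATE TABLE sales (
  id BIGINT,
  name STRING,
  amount DOUBLE
)
STORED BY 'carbondata'
TBLPROPERTIES (
  'DICTIONARY_EXCLUDE' = 'name',    -- string-typed columns only
  'SORT_COLUMNS'       = 'name,id'  -- double/float/decimal columns not supported here
);
```
---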
[GitHub] carbondata pull request #1906: [CARBONDATA-2116] Documentation for CTAS
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/1906 [CARBONDATA-2116] Documentation for CTAS Added the documentation for CTAS You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata 2116 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1906.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1906 commit 7527f15e021613df89efc97375ce293898abf9e1 Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-01T14:34:54Z Documentation for CTAS ---
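For reference, the documented CTAS shape — a minimal sketch with hypothetical names:

```sql
CREATE TABLE sales_summary
STORED BY 'carbondata'
AS SELECT country, SUM(amount) AS total_amount
   FROM sales
   GROUP BY country;
```
---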
[GitHub] carbondata pull request #1905: [CARBONDATA-2115] Scenarios in which aggregat...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/1905 [CARBONDATA-2115] Scenarios in which aggregate query is not fetching … Added the FAQ on scenarios in which the aggregate query is not fetching data from the aggregate table You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata 2115 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1905.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1905 commit 4d96e0f8f48f64a261f63eca6e60ae385f70f872 Author: sgururajshetty <sgururajshetty@...> Date: 2018-02-01T12:29:17Z [CARBONDATA-2115] Scenarios in which aggregate query is not fetching data from aggregate table ---
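One illustrative pair for the FAQ above, assuming a pre-aggregate datamap that pre-computes only SUM(amount) grouped by country (all names hypothetical):

```sql
-- Matches the datamap definition: rewritten to the aggregate table
SELECT country, SUM(amount) FROM sales GROUP BY country;

-- AVG was not pre-computed in the datamap: answered from the main table
SELECT country, AVG(amount) FROM sales GROUP BY country;
```
---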