This is an automated email from the ASF dual-hosted git repository.

ajantha pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new fbf311c  [CARBONDATA-3791] Correct spelling, query, default value, in performance-tuning, prestodb and prestosql documentation.
fbf311c is described below

commit fbf311c83c3e3523c4150133f320c7bcd39b82b8
Author: Nihal kumar ojha <nihalnit...@gmail.com>
AuthorDate: Mon May 4 10:12:45 2020 +0530

    [CARBONDATA-3791] Correct spelling, query, default value, in performance-tuning, prestodb and prestosql documentation.
    
    Why is this PR needed?
    The performance-tuning, prestodb and prestosql documentation contain spelling mistakes, an incorrect example query, and wrong default values.
    
    What changes were proposed in this PR?
    Corrected the spelling mistakes, the example query, and the default values in the performance-tuning, prestodb and prestosql documentation.
    
    Does this PR introduce any user interface change?
    No
    
    Is any new testcase added?
    No
    
    This closes #3737
---
 docs/performance-tuning.md | 14 ++++++++------
 docs/prestodb-guide.md     | 13 ++++++++-----
 docs/prestosql-guide.md    | 14 +++++++++-----
 3 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/docs/performance-tuning.md b/docs/performance-tuning.md
index f485388..05352db 100644
--- a/docs/performance-tuning.md
+++ b/docs/performance-tuning.md
@@ -54,7 +54,7 @@
     BEGIN_TIME bigint,
     HOST String,
     Dime_1 String,
-    counter_1, Decimal
+    counter_1 Decimal,
     ...
     
     )STORED AS carbondata
@@ -79,7 +79,7 @@
       BEGIN_TIME bigint,
       HOST String,
       Dime_1 String,
-      counter_1, Decimal
+      counter_1 Decimal,
       ...
       
       )STORED AS carbondata
@@ -128,6 +128,9 @@
 
   **NOTE:**
   + BloomFilter can be created to enhance performance for queries with precise equal/in conditions. You can find more information about it in BloomFilter index [document](./index/bloomfilter-index-guide.md).
+  + Lucene index can be created on string columns that contain long text, to enhance query performance. You can find more information about it in the Lucene index [document](./index/lucene-index-guide.md).
+  + Secondary index can be created based on the column position in the main table (recommended for columns towards the right), and queries that filter on that column will get improved filter performance. You can find more information about it in the secondary index [document](./index/secondary-index-guide.md).
+  + Materialized view can be created to improve query performance, provided the storage requirements and loading time are acceptable. You can find more information about it in the materialized view [document](./mv-guide.md).
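
As a rough illustration of the three options above, the DDL sketch below uses a hypothetical table name (`sales_tab`) together with column names from the earlier example schema; the exact syntax and supported properties are described in the linked guides, so treat this as an assumption-laden sketch rather than a copy-paste recipe:

```
-- Lucene index on a long string column (table and column choices are illustrative)
CREATE INDEX sales_dime_lucene ON TABLE sales_tab (Dime_1) AS 'lucene';

-- Secondary index on a column that frequently appears in filters
CREATE INDEX sales_host_si ON TABLE sales_tab (HOST) AS 'carbondata';

-- Materialized view pre-aggregating a common query
CREATE MATERIALIZED VIEW sales_agg AS
  SELECT HOST, SUM(counter_1) FROM sales_tab GROUP BY HOST;
```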
 
 
 ## Configuration for Optimizing Data Loading performance for Massive Data
@@ -141,12 +144,12 @@
 | Parameter | Default Value | Description/Tuning |
 |-----------|-------------|--------|
 |carbon.number.of.cores.while.loading|Default: 2. This value should be >= 2|Specifies the number of cores used for data processing during data loading in CarbonData. |
-|carbon.sort.size|Default: 100000. The value should be >= 100.|Threshold to write local file in sort step when loading data|
-|carbon.sort.file.write.buffer.size|Default:  16384.|CarbonData sorts and writes data to intermediate files to limit the memory usage. This configuration determines the buffer size to be used for reading and writing such files. |
+|carbon.sort.size|Default: 100000. The value should be >= 1000.|Threshold to write local file in sort step when loading data|
+|carbon.sort.file.write.buffer.size|Default:  16384. The value should be >= 10240 and <= 10485760.|CarbonData sorts and writes data to intermediate files to limit the memory usage. This configuration determines the buffer size to be used for reading and writing such files. |
 |carbon.merge.sort.reader.thread|Default: 3 |Specifies the number of cores used for temp file merging during data loading in CarbonData.|
 |carbon.merge.sort.prefetch|Default: true | You may want set this value to false if you have not enough memory|
 
-  For example, if there are 10 million records, and i have only 16 cores, 64GB memory, will be loaded to CarbonData table.
+  For example, suppose 10 million records are to be loaded into a CarbonData table on a machine with only 16 cores and 64 GB memory.
   Using the default configuration  always fail in sort step. Modify carbon.properties as suggested below:
 
   ```
@@ -172,7 +175,6 @@
 | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether use YARN local directories for multi-table load disk load balance | If this is set it to true CarbonData will use YARN local directories for multi-table load disk load balance, that will improve the data load performance. |
 | carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data loading | Specify the name of compressor to compress the intermediate sort temporary files during sort procedure in data loading. | The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD', and empty. Specially, empty means that Carbondata will not compress the sort temp files. This parameter will be useful if you encounter disk bottleneck. |
 | carbon.load.skewedDataOptimization.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable size based block allocation strategy for data loading. | When loading, carbondata will use file size based block allocation strategy for task distribution. It will make sure that all the executors process the same size of data -- It's useful if the size of your input data files varies widely, say 1MB to 1GB. |
-| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable node minumun input data size allocation strategy for data loading.| When loading, carbondata will use node minumun input data size allocation strategy for task distribution. It will make sure the nodes load the minimum amount of data -- It's useful if the size of your input data files very small, say 1MB to 256MB,Avoid generating a large number of small files. |
 
   Note: If your CarbonData instance is provided only for query, you may specify the property 'spark.speculation=true' which is in conf directory of spark.
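
All of the loading-related parameters discussed in the two tables above live in carbon.properties. As a hedged sketch only -- the values below are illustrative assumptions for the 16-core / 64 GB example, not the guide's recommended settings -- such a file might look like:

```
# Illustrative values; tune for your own hardware and data volume
carbon.number.of.cores.while.loading=12
carbon.sort.size=100000
# Buffer size must stay within the documented 10240..10485760 range
carbon.sort.file.write.buffer.size=10485760
carbon.merge.sort.reader.thread=3
# Disable prefetch when memory is tight
carbon.merge.sort.prefetch=false
# Compress intermediate sort temp files to ease disk pressure
carbon.sort.temp.compressor=SNAPPY
carbon.use.local.dir=true
```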
 
diff --git a/docs/prestodb-guide.md b/docs/prestodb-guide.md
index b048d9d..7b2b2a9 100644
--- a/docs/prestodb-guide.md
+++ b/docs/prestodb-guide.md
@@ -28,8 +28,8 @@ This tutorial provides a quick introduction to using current integration/presto
 ### Installing Presto
 
 To know about which version of presto is supported by this version of carbon, visit
-https://github.com/apache/carbondata/blob/master/integration/presto/pom.xml
-and look for ```<presto.version>```
+https://github.com/apache/carbondata/blob/master/pom.xml
+and look for ```<presto.version>``` inside `prestodb` profile.
 
 _Example:_ 
   `<presto.version>0.217</presto.version>`
@@ -139,11 +139,14 @@ Then, `query.max-memory=<30GB * number of nodes>`.
 
 ##### Configuring Carbondata in Presto
 1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes.
+2. As the carbondata connector extends the hive connector, all the configurations (including S3) are the same as for the hive connector.
+Just replace the connector name in the hive configuration and copy it to carbondata.properties:
+`connector.name = carbondata`
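
To make step 2 concrete, here is a minimal, assumption-laden sketch of `catalog/carbondata.properties`; apart from `connector.name`, the property names follow the standard Presto hive connector configuration (a thrift metastore is assumed), so adapt them to your deployment:

```
connector.name=carbondata
# Same metastore property as the hive connector (assumed thrift metastore host/port)
hive.metastore.uri=thrift://<metastore-host>:9083
```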
 
 ### Add Plugins
 
 1. Create a directory named `carbondata` in plugin directory of presto.
-2. Copy `carbondata` jars to `plugin/carbondata` directory on all nodes.
+2. Copy all the jars from ../integration/presto/target/carbondata-presto-X.Y.Z-SNAPSHOT to `plugin/carbondata` directory on all nodes.
 
 ### Start Presto Server on all nodes
 
@@ -295,6 +298,6 @@ carbondata files.
 
 ### Supported features of presto carbon
 Presto carbon only supports reading the carbon table which is written by spark carbon or carbon SDK.
-During reading, it supports the non-distributed datamaps like block datamap and bloom datamap.
+During reading, it supports the non-distributed indexes like block index and bloom index.
 It doesn't support Materialized View as it needs query plan to be changed and presto does not allow it.
-Also Presto carbon supports streaming segment read from streaming table created by spark.
+Also, Presto carbon supports streaming segment read from streaming table created by spark.
diff --git a/docs/prestosql-guide.md b/docs/prestosql-guide.md
index 8832b7a..11bb385 100644
--- a/docs/prestosql-guide.md
+++ b/docs/prestosql-guide.md
@@ -28,8 +28,8 @@ This tutorial provides a quick introduction to using current integration/presto
 ### Installing Presto
 
 To know about which version of presto is supported by this version of carbon, visit
-https://github.com/apache/carbondata/blob/master/integration/presto/pom.xml
-and look for ```<presto.version>```
+https://github.com/apache/carbondata/blob/master/pom.xml
+and look for ```<presto.version>``` inside `prestosql` profile.
 
 _Example:_ 
   `<presto.version>316</presto.version>`
@@ -139,11 +139,15 @@ Then, `query.max-memory=<30GB * number of nodes>`.
 
 ##### Configuring Carbondata in Presto
 1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes.
+2. As the carbondata connector extends the hive connector, all the configurations (including S3) are the same as for the hive connector.
+Just replace the connector name in the hive configuration and copy it to carbondata.properties:
+`connector.name = carbondata`
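
As in the prestodb guide, a minimal sketch of `catalog/carbondata.properties` might look like the following; everything except `connector.name` reuses hive connector property names (the metastore URI and the S3 credential keys shown here are assumptions drawn from the standard hive connector configuration):

```
connector.name=carbondata
hive.metastore.uri=thrift://<metastore-host>:9083
# S3 settings, if needed, keep their hive connector names
hive.s3.aws-access-key=<access-key>
hive.s3.aws-secret-key=<secret-key>
```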
 
 ### Add Plugins
 
 1. Create a directory named `carbondata` in plugin directory of presto.
-2. Copy `carbondata` jars to `plugin/carbondata` directory on all nodes.
+2. Copy all the jars from ../integration/presto/target/carbondata-presto-X.Y.Z-SNAPSHOT to `plugin/carbondata` directory on all nodes.
+
 
 ### Start Presto Server on all nodes
 
@@ -294,6 +298,6 @@ carbondata files.
 
 ### Supported features of presto carbon
 Presto carbon only supports reading the carbon table which is written by spark carbon or carbon SDK.
-During reading, it supports the non-distributed datamaps like block datamap and bloom datamap.
+During reading, it supports the non-distributed indexes like block index and bloom index.
 It doesn't support Materialized View as it needs query plan to be changed and presto does not allow it.
-Also Presto carbon supports streaming segment read from streaming table created by spark.
+Also, Presto carbon supports streaming segment read from streaming table created by spark.
