[02/47] carbondata git commit: [CARBONDATA-2793][32k][Doc] Add 32k support in document

ravipesala Thu, 09 Aug 2018 11:26:35 -0700

[CARBONDATA-2793][32k][Doc] Add 32k support in document

This closes #2572



Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/b2972ce6
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/b2972ce6
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/b2972ce6

Branch: refs/heads/branch-1.4
Commit: b2972ce60d7c733eb6269262a6032fc3271ba3cf
Parents: 5cfb1c1
Author: xuchuanyin <xuchuan...@hust.edu.cn>
Authored: Fri Jul 27 16:10:44 2018 +0800
Committer: ravipesala <ravi.pes...@gmail.com>
Committed: Thu Aug 9 23:38:51 2018 +0530

----------------------------------------------------------------------
 docs/data-management-on-carbondata.md      | 48 +++++++++++++++++++------
 docs/supported-data-types-in-carbondata.md |  3 ++
 2 files changed, 40 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/b2972ce6/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md b/docs/data-management-on-carbondata.md
index 6aaaaa3..836fff9 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -137,7 +137,7 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
           
      | Properties | Default value | Description |
      | ---------- | ------------- | ----------- |
-     | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table | 
+     | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table |
      | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range: 1000 to 100000) |
      | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
      | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated |
@@ -240,11 +240,11 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
         ```
         
    - **Caching at Block or Blocklet Level**
-   
+
     This feature allows you to maintain the cache at Block level, resulting in optimized memory usage. Memory consumption is high if caching is maintained at Blocklet level, because a Block can have multiple Blocklets.
         
         Following are the valid values for CACHE_LEVEL:
-        
+
         *Configuration for caching in driver at Block level (default value).*
         
         ```
@@ -285,21 +285,47 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
         ```
         ALTER TABLE employee SET TBLPROPERTIES ('CACHE_LEVEL'='Blocklet')
         ```
-        
-        - **Support Flat folder same as Hive/Parquet**
-        
+
+    - **Support Flat folder same as Hive/Parquet**
+
          This feature allows all carbondata and index files to be kept directly under the table path. Currently, all carbondata/carbonindex files are written under the tablepath/Fact/Part0/Segment_NUM folder, which differs from the Hive/Parquet folder structure. With this feature, all files are written directly under the table path, without any segment folder structure. This improves interoperability between execution engines and makes integration with other engines such as Hive or Presto easier.
-         
+
          The following table property enables this feature. Its default value is false.
          ```
           'flat_folder'='true'
-         ``` 
+         ```
          Example:
          ```
          CREATE TABLE employee (name String, city String, id int) STORED BY 'carbondata' TBLPROPERTIES ('flat_folder'='true')
          ```
-         
-        
+
+    - **String longer than 32000 characters**
+
+     In common scenarios, the length of a string is less than 32000 characters,
+     so CarbonData stores the length of the content as a Short to reduce memory and space consumption.
+     To support strings longer than 32000 characters, CarbonData introduces a table property called `LONG_STRING_COLUMNS`.
+     For these columns, CarbonData internally stores the length of the content as an Integer.
+
+     You can specify columns as long string columns using the following table property:
+
+     ```
+     -- specify col1 and col2 as long string columns
+     TBLPROPERTIES ('LONG_STRING_COLUMNS'='col1,col2')
+     ```
+
+     You can also set this property when writing a DataFrame:
+     ```
+     df.write
+       .format("carbondata")
+       .option("tableName", "carbonTable")
+       .option("long_string_columns", "col1, col2")
+       .save()
+     ```
+
+     If you are using the Carbon SDK, specify the data type of a long string column as `varchar`.
+     Refer to SDKwriterTestCase for an example.
+
+     **NOTE:** LONG_STRING_COLUMNS can only contain string/char/varchar columns, and these columns cannot be dictionary_include, sort_columns, or complex columns.
+
 ## CREATE TABLE AS SELECT
   This function allows users to create a Carbon table from any Parquet, Hive, or Carbon table. This is beneficial when the user wants to create a Carbon table from another Parquet/Hive table and use the Carbon query engine to achieve better query performance in cases where Carbon is faster than other file formats. This feature can also be used for backing up data.
 
@@ -745,7 +771,7 @@ Users can specify which columns to include and exclude for local dictionary gene
   * If the FORCE option is used, then it auto-converts the data by storing the bad records as NULL before loading data.
   * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
   * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
-  * The maximum number of characters per column is 32000. If there are more than 32000 characters in a column, data loading will fail.
+  * The default maximum number of characters per column is 32000. If there are more than 32000 characters in a column, please refer to the *String longer than 32000 characters* section.
 
   Example:
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/b2972ce6/docs/supported-data-types-in-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/supported-data-types-in-carbondata.md b/docs/supported-data-types-in-carbondata.md
index 7260afe..eb74a2e 100644
--- a/docs/supported-data-types-in-carbondata.md
+++ b/docs/supported-data-types-in-carbondata.md
@@ -35,6 +35,9 @@
     * CHAR
     * VARCHAR
 
+    **NOTE**: For strings longer than 32000 characters, use the `LONG_STRING_COLUMNS` table property.
+    Please refer to TBLProperties in [CreateTable](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table) for more information.
+
   * Complex Types
     * arrays: ARRAY``<data_type>``
     * structs: STRUCT``<col_name : data_type COMMENT col_comment, ...>``
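
The 32000-character limit and the `LONG_STRING_COLUMNS` property described in this commit come down to the width of a length prefix. The Scala object below is a rough illustration only. It is a hypothetical sketch, not CarbonData's actual page or file encoding. It writes a value as its length followed by its UTF-8 bytes in two ways:

* with a 2-byte `Short` prefix, which cannot describe more than `Short.MaxValue` (32767) bytes;
* with a 4-byte `Int` prefix.

`LengthPrefixSketch` and its methods are invented names for this example.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

import scala.util.Try

// Hypothetical sketch: NOT CarbonData's real encoding. Each value is stored
// as [length prefix][UTF-8 bytes] to show why the prefix width caps the size.
object LengthPrefixSketch {

  // 2-byte (Short) prefix: the byte length must not exceed Short.MaxValue (32767).
  def encodeWithShortLength(s: String): Array[Byte] = {
    val bytes = s.getBytes(StandardCharsets.UTF_8)
    require(bytes.length <= Short.MaxValue,
      s"${bytes.length} bytes cannot be described by a Short length prefix")
    val buf = ByteBuffer.allocate(2 + bytes.length)
    buf.putShort(bytes.length.toShort).put(bytes)
    buf.array()
  }

  // 4-byte (Int) prefix: allows much longer values at the cost of 2 more bytes.
  def encodeWithIntLength(s: String): Array[Byte] = {
    val bytes = s.getBytes(StandardCharsets.UTF_8)
    val buf = ByteBuffer.allocate(4 + bytes.length)
    buf.putInt(bytes.length).put(bytes)
    buf.array()
  }

  // Reads the Int prefix, then exactly that many bytes of UTF-8 content.
  def decodeWithIntLength(encoded: Array[Byte]): String = {
    val buf = ByteBuffer.wrap(encoded)
    val bytes = new Array[Byte](buf.getInt())
    buf.get(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }

  def main(args: Array[String]): Unit = {
    val smallValue = "a" * 100
    val largeValue = "a" * 40000
    println(encodeWithShortLength(smallValue).length)         // 102
    println(encodeWithIntLength(largeValue).length)           // 40004
    println(Try(encodeWithShortLength(largeValue)).isSuccess) // false
    println(decodeWithIntLength(encodeWithIntLength(largeValue)) == largeValue) // true
  }
}
```

In this sketch, the `Int` variant costs 2 extra bytes per value. That cost fits the commit's description of using the wider length only for columns listed in `LONG_STRING_COLUMNS`.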
