[GitHub] carbondata pull request #1898: [CARBONDATA-1880] Documentation for merging s...

2018-02-01 Thread sgururajshetty
GitHub user sgururajshetty closed the pull request at:

https://github.com/apache/carbondata/pull/1898


---


[GitHub] carbondata pull request #1898: [CARBONDATA-1880] Documentation for merging s...

2018-01-31 Thread QiangCai
GitHub user QiangCai commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1898#discussion_r165247013
  
--- Diff: docs/configuration-parameters.md ---
@@ -60,6 +60,7 @@ This section provides the details of all the configurations required for CarbonD
 | carbon.options.is.empty.data.bad.record | false | If false, then empty ("" or '' or ,,) data will not be considered as bad record and vice versa. | |
 | carbon.options.bad.record.path |  | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must to be configured by the user if bad record logger is enabled or bad record action redirect. | |
 | carbon.enable.vector.reader | true | This parameter increases the performance of select queries as it fetch columnar batch of size 4*1024 rows instead of fetching data row by row. | |
+| carbon.task.distribution | merge_small_files | Setting this parameter value to *merge_small_files* will merge all the small files to a size of (128 MB). During data loading, all the small CSV files are combined to a map task to reduce the number of read task. This enhances the performance. | | 
--- End diff --

1. carbon.task.distribution applies only to queries; it is not used by data
loading. Global_Sort loading always merges small CSV files and does not
require this configuration.
2. It would be better to list all values of carbon.task.distribution:
custom, block (default), blocklet, merge_small_files
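
For illustration, the setting discussed above might be written as the following properties fragment. This is only a sketch: it assumes the parameter is set in carbon.properties like the other entries in docs/configuration-parameters.md, and the value list in the comment is taken from point 2 rather than verified against the code.

```properties
# Sketch only; assumes the property is set in carbon.properties.
# Values per the review comment: custom, block (default), blocklet, merge_small_files
# Per point 1, this affects query task distribution, not data loading.
carbon.task.distribution=merge_small_files
```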


---


[GitHub] carbondata pull request #1898: [CARBONDATA-1880] Documentation for merging s...

2018-01-31 Thread sgururajshetty
GitHub user sgururajshetty opened a pull request:

https://github.com/apache/carbondata/pull/1898

[CARBONDATA-1880] Documentation for merging small files

Added documentation for merging small files for better performance.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sgururajshetty/carbondata 1880

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1898.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1898


commit 313586c0bff34405672339d9819260146ae61816
Author: sgururajshetty 
Date:   2018-01-31T13:55:16Z

Documentation for small files




---