[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

2017-11-14 Thread vandana7
Github user vandana7 closed the pull request at:

https://github.com/apache/carbondata/pull/1205


---


[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

2017-08-02 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130924110
  
--- Diff: docs/dml-operation-on-carbondata.md ---
@@ -149,6 +149,50 @@ You can use the following options to load data:

* If this option is set to TRUE, then high.cardinality.identify.enable 
property will be disabled during data load.

+- **SORT_SCOPE:** This property can have four possible values :
+
+* BATCH_SORT : The sorting scope is smaller and more index tree will 
be created,thus loading is faster but query maybe slower.
+
+* LOCAL_SORT : The sorting scope is bigger and one index tree per data 
node will be created, thus loading is slower but query is faster.
+
+* GLOBAL_SORT : The sorting scope is bigger and one index tree per 
task will be created, thus loading is slower but query is faster.
+
+* NO_SORT : Feasible if we want to load our data in unsorted 
manner.
--- End diff --

Do not use "we" here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

2017-08-02 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130924023
  
--- Diff: docs/dml-operation-on-carbondata.md ---
@@ -149,6 +149,50 @@ You can use the following options to load data:

* If this option is set to TRUE, then high.cardinality.identify.enable 
property will be disabled during data load.

+- **SORT_SCOPE:** This property can have four possible values :
+
+* BATCH_SORT : The sorting scope is smaller and more index tree will 
be created,thus loading is faster but query maybe slower.
+
+* LOCAL_SORT : The sorting scope is bigger and one index tree per data 
node will be created, thus loading is slower but query is faster.
+
+* GLOBAL_SORT : The sorting scope is bigger and one index tree per 
task will be created, thus loading is slower but query is faster.
--- End diff --

Not per task; GLOBAL_SORT sorts across the whole cluster.


---


[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

2017-08-02 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130923792
  
--- Diff: docs/dml-operation-on-carbondata.md ---
@@ -149,6 +149,50 @@ You can use the following options to load data:

* If this option is set to TRUE, then high.cardinality.identify.enable 
property will be disabled during data load.

+- **SORT_SCOPE:** This property can have four possible values :
+
+* BATCH_SORT : The sorting scope is smaller and more index tree will 
be created,thus loading is faster but query maybe slower.
--- End diff --

Mention that this sort is based on memory size, and that Carbon will create one
index for each batch.
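
As a rough illustration of the memory-based batch size: the documentation text under review in this thread states that batch_sort_size_inmb defaults to 45 percent of sort.inmemory.size.inmb. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the sketch makes no claim about how CarbonData rounds or applies the value internally.

```python
def default_batch_sort_size_mb(sort_inmemory_size_mb):
    """Return 45 percent of the in-memory sort size, in MB.

    Hypothetical helper: it mirrors the default described in the reviewed
    documentation text, not CarbonData's actual implementation.
    """
    return sort_inmemory_size_mb * 45 / 100
```

With sort.inmemory.size.inmb set to 1000, for instance, this gives 450 MB per batch.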


---


[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

2017-08-02 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130923366
  
--- Diff: docs/dml-operation-on-carbondata.md ---
@@ -149,6 +149,50 @@ You can use the following options to load data:

* If this option is set to TRUE, then high.cardinality.identify.enable 
property will be disabled during data load.

+- **SORT_SCOPE:** This property can have four possible values :
+
+* BATCH_SORT : The sorting scope is smaller and more index tree will 
be created,thus loading is faster but query maybe slower.
+
+* LOCAL_SORT : The sorting scope is bigger and one index tree per data 
node will be created, thus loading is slower but query is faster.
+
+* GLOBAL_SORT : The sorting scope is bigger and one index tree per 
task will be created, thus loading is slower but query is faster.
+
+* NO_SORT : Feasible if we want to load our data in unsorted 
manner.
--- End diff --

Introduce this first


---


[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

2017-08-01 Thread zzcclp
Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130581766
  
--- Diff: docs/dml-operation-on-carbondata.md ---
@@ -149,6 +149,50 @@ You can use the following options to load data:

* If this option is set to TRUE, then high.cardinality.identify.enable 
property will be disabled during data load.

+- **SORT_SCOPE:** This property can have four possible values :
+
+* BATCH_SORT : The sorting scope is smaller and more index tree will 
be created,thus loading is faster but query maybe slower.
+
+* LOCAL_SORT : The sorting scope is bigger and one index tree per data 
node will be created, thus loading is slower but query is faster.
+
+* GLOBAL_SORT : The sorting scope is bigger and one index tree per 
task will be created, thus loading is slower but query is faster.
+
+* NO_SORT : Feasible if we want to load our data in unsorted 
manner.
+
+For BATCH_SORT:
+
+```
+OPTIONS ('SORT_SCOPE'='BATCH_SORT')
+```
+
+You can also specify the sort size option for sort scope.
+
+```
+OPTIONS('SORT_SCOPE'='BATCH_SORT', 'batch_sort_size_inmb'='7')
+```
+
+Note :
+
+* batch_sort_size_inmb : Size of data in MB to be processed in batch. 
By default it is the 45 percent size of sort.inmemory.size.inmb(Memory size in 
MB available for in-memory sort).
+
+For GLOBAL_SORT :
--- End diff --

I mean that if SORT_SCOPE=GLOBAL_SORT, SINGLE_PASS must be false.
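
The constraint above could be checked before a LOAD statement is issued. The following is a minimal sketch, assuming the rule is exactly as stated in this comment (GLOBAL_SORT together with SINGLE_PASS=TRUE is rejected). The helper name build_load_options is hypothetical and is not part of CarbonData; a missing SINGLE_PASS key is treated as "not enabled" purely for the purpose of this check.

```python
def build_load_options(options):
    """Render a dict of load options as an OPTIONS(...) clause.

    Raises ValueError for the one combination flagged in this review
    thread: SORT_SCOPE=GLOBAL_SORT together with SINGLE_PASS=TRUE.
    """
    # Normalise keys and values so the check is case-insensitive.
    normalised = {key.upper(): str(value).upper() for key, value in options.items()}
    sort_scope = normalised.get("SORT_SCOPE", "")
    # Assumption: an absent SINGLE_PASS is treated as FALSE for this check.
    single_pass = normalised.get("SINGLE_PASS", "FALSE")

    if sort_scope == "GLOBAL_SORT" and single_pass == "TRUE":
        raise ValueError("SINGLE_PASS must be false when SORT_SCOPE is GLOBAL_SORT")

    # Emit the options in the order given, quoted as in the documentation examples.
    pairs = ", ".join(f"'{key}'='{value}'" for key, value in options.items())
    return f"OPTIONS({pairs})"
```

Calling it with {'SORT_SCOPE': 'BATCH_SORT', 'batch_sort_size_inmb': '7'} reproduces the OPTIONS clause from the diff, while {'SORT_SCOPE': 'GLOBAL_SORT', 'SINGLE_PASS': 'TRUE'} raises ValueError.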


---


[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

2017-08-01 Thread vandana7
Github user vandana7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130578370
  
--- Diff: docs/dml-operation-on-carbondata.md ---
@@ -149,6 +149,50 @@ You can use the following options to load data:

* If this option is set to TRUE, then high.cardinality.identify.enable 
property will be disabled during data load.

+- **SORT_SCOPE:** This property can have four possible values :
+
+* BATCH_SORT : The sorting scope is smaller and more index tree will 
be created,thus loading is faster but query maybe slower.
+
+* LOCAL_SORT : The sorting scope is bigger and one index tree per data 
node will be created, thus loading is slower but query is faster.
+
+* GLOBAL_SORT : The sorting scope is bigger and one index tree per 
task will be created, thus loading is slower but query is faster.
+
+* NO_SORT : Feasible if we want to load our data in unsorted 
manner.
+
+For BATCH_SORT:
+
+```
+OPTIONS ('SORT_SCOPE'='BATCH_SORT')
+```
+
+You can also specify the sort size option for sort scope.
+
+```
+OPTIONS('SORT_SCOPE'='BATCH_SORT', 'batch_sort_size_inmb'='7')
+```
+
+Note :
+
+* batch_sort_size_inmb : Size of data in MB to be processed in batch. 
By default it is the 45 percent size of sort.inmemory.size.inmb(Memory size in 
MB available for in-memory sort).
+
+For GLOBAL_SORT :
--- End diff --

Hi,
I executed the LOAD query with 'SINGLE_PASS'='TRUE' and
'SORT_SCOPE'='BATCH_SORT'. It ran successfully, and I was able to fetch the
records in sorted order.
The load query I used:

LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' INTO TABLE uniqdata
OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1', 'SINGLE_PASS'='TRUE', 'SORT_SCOPE'='BATCH_SORT', 'batch_sort_size_inmb'='7');


Please let me know if I am doing anything wrong.


---


[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

2017-08-01 Thread zzcclp
Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130534145
  
--- Diff: docs/dml-operation-on-carbondata.md ---
@@ -149,6 +149,50 @@ You can use the following options to load data:

* If this option is set to TRUE, then high.cardinality.identify.enable 
property will be disabled during data load.

+- **SORT_SCOPE:** This property can have four possible values :
+
+* BATCH_SORT : The sorting scope is smaller and more index tree will 
be created,thus loading is faster but query maybe slower.
+
+* LOCAL_SORT : The sorting scope is bigger and one index tree per data 
node will be created, thus loading is slower but query is faster.
+
+* GLOBAL_SORT : The sorting scope is bigger and one index tree per 
task will be created, thus loading is slower but query is faster.
+
+* NO_SORT : Feasible if we want to load our data in unsorted 
manner.
+
+For BATCH_SORT:
+
+```
+OPTIONS ('SORT_SCOPE'='BATCH_SORT')
+```
+
+You can also specify the sort size option for sort scope.
+
+```
+OPTIONS('SORT_SCOPE'='BATCH_SORT', 'batch_sort_size_inmb'='7')
+```
+
+Note :
+
+* batch_sort_size_inmb : Size of data in MB to be processed in batch. 
By default it is the 45 percent size of sort.inmemory.size.inmb(Memory size in 
MB available for in-memory sort).
+
+For GLOBAL_SORT :
--- End diff --

Suggestion: add the note below:
`'SINGLE_PASS' must be false.`


---


[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

2017-07-28 Thread vandana7
GitHub user vandana7 opened a pull request:

https://github.com/apache/carbondata/pull/1205

[CARBONDATA-1086] updated configuration-parameters.md and 
dml-operation-on-carbondata for SORT_SCOPE




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vandana7/incubator-carbondata sort_scope_update

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1205.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1205


commit 31f237e9c46247b60f84917da3c0368a5114531b
Author: vandana 
Date:   2017-07-28T07:07:07Z

 updated configuration-parameters.md and dml-operation-on-carbondata.md for 
sort_scope feature




---