[GitHub] carbondata issue #2363: [WIP][CARBONDATA-2591] SDK CarbonReader support filt...

2018-06-06 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2363
  
SDV Build Fail, Please check CI
http://144.76.159.231:8080/job/ApacheSDVTests/5236/



---


[GitHub] carbondata issue #2363: [WIP][CARBONDATA-2591] SDK CarbonReader support filt...

2018-06-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2363
  
Build Failed with Spark 2.1.0, Please check CI
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6266/



---


[GitHub] carbondata issue #2363: [WIP][CARBONDATA-2591] SDK CarbonReader support filt...

2018-06-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2363
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5104/



---


[GitHub] carbondata pull request #2363: [WIP][CARBONDATA-2591] SDK CarbonReader suppo...

2018-06-06 Thread xubo245
GitHub user xubo245 opened a pull request:

https://github.com/apache/carbondata/pull/2363

[WIP][CARBONDATA-2591] SDK CarbonReader support filter

Be sure to complete all of the following checklist items to help us incorporate
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
 Yes
 - [ ] Testing done
   Added some test cases
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 

No

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xubo245/carbondata CARBONDATA-2591-CarbonReaderFilter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2363


commit f7407681697be7570089c9936ecc9df1f457af63
Author: xubo245 
Date:   2018-06-07T03:34:56Z

[CARBONDATA-2591] SDK CarbonReader support filter




---


[jira] [Created] (CARBONDATA-2591) SDK CarbonReader support filter

2018-06-06 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-2591:
---

 Summary: SDK CarbonReader support filter
 Key: CARBONDATA-2591
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2591
 Project: CarbonData
  Issue Type: Improvement
Reporter: xubo245
Assignee: xubo245


SDK CarbonReader support filter



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (CARBONDATA-2585) Support Adding Local Dictionary configuration in Create table statement

2018-06-06 Thread Akash R Nilugal (JIRA)


 [https://issues.apache.org/jira/browse/CARBONDATA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Akash R Nilugal reassigned CARBONDATA-2585:
---

Assignee: Akash R Nilugal

> Support Adding Local Dictionary configuration in Create table statement
> ---
>
> Key: CARBONDATA-2585
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2585
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: kumar vishal
>Assignee: Akash R Nilugal
>Priority: Major
>
> Allow the user to pass local dictionary configuration in the CREATE TABLE statement:
> *ENABLE_LOCAL_DICT*
> *CARBON_LOCALDICT_THRESHOLD*
> CREATE TABLE carbontable(
>   column1 string,
>   column2 string,
>   column3 LONG)
> STORED BY 'carbondata'
> TBLPROPERTIES('ENABLE_LOCAL_DICT'='true', 'CARBON_LOCALDICT_THRESHOLD'='1000')





[jira] [Assigned] (CARBONDATA-2586) Support Showing local dictionary configuration in desc formatted command

2018-06-06 Thread Akash R Nilugal (JIRA)


 [https://issues.apache.org/jira/browse/CARBONDATA-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Akash R Nilugal reassigned CARBONDATA-2586:
---

Assignee: Akash R Nilugal

> Support Showing local dictionary configuration in desc formatted command
> 
>
> Key: CARBONDATA-2586
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2586
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: kumar vishal
>Assignee: Akash R Nilugal
>Priority: Major
>
> Support showing the local dictionary parameters in the DESC FORMATTED command:
>  # *CARBON_LOCALDICT_THRESHOLD*
>  # *ENABLE_LOCAL_DICT*





[GitHub] carbondata issue #2207: [CARBONDATA-2428] Support flat folder for managed ca...

2018-06-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2207
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5103/



---


[jira] [Updated] (CARBONDATA-2584) CarbonData Local Dictionary Support

2018-06-06 Thread kumar vishal (JIRA)


 [https://issues.apache.org/jira/browse/CARBONDATA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

kumar vishal updated CARBONDATA-2584:
-
Attachment: CarbonData Local Dictionary Support Design Doc.docx

> CarbonData Local Dictionary Support
> ---
>
> Key: CARBONDATA-2584
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2584
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: kumar vishal
>Priority: Major
> Attachments: CarbonData Local Dictionary Support Design Doc.docx
>
>
> Currently CarbonData supports either a global dictionary or No-Dictionary
> (plain text stored in LV format) for storing dimension column data.
> *Bottleneck with Global Dictionary*
> It is difficult for the user to decide whether a column should be a
> dictionary column when the table has many columns.
> Global dictionary generation generally slows down the load process.
> Multiple IO operations are made during load even though the dictionary
> already exists.
> During a query, multiple IO operations are needed to read both the dictionary
> files and the carbondata files.
> *Bottleneck with No-Dictionary*
> Storage size is high because the data is stored in LV format.
> Queries on No-Dictionary columns are slower because more data is read and
> processed.
> Filtering on No-Dictionary columns is slower because more comparisons are
> needed.
> Memory footprint is high.
> *The above bottlenecks can be solved by generating a dictionary for
> low-cardinality columns at the blocklet level, which brings the following
> benefits:*
> It eliminates the extra read/write IO on the dictionary files that a global
> dictionary requires.
> It removes the need for the user to identify dictionary columns when a table
> has many columns.
> It gives better compression on low-cardinality dimension columns.
> Filter queries and full-scan queries on No-Dictionary columns with a local
> dictionary will be faster because filtering is done on the encoded data.
> It reduces the store size and memory footprint because only the unique
> values are stored in the local dictionary, and the column data itself is
> stored as encoded values.





[GitHub] carbondata issue #2207: [CARBONDATA-2428] Support flat folder for managed ca...

2018-06-06 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2207
  
SDV Build Success, Please check CI
http://144.76.159.231:8080/job/ApacheSDVTests/5235/



---


[GitHub] carbondata issue #2362: [CARBONDATA-2578] fixed memory leak inside CarbonRea...

2018-06-06 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2362
  
SDV Build Success, Please check CI
http://144.76.159.231:8080/job/ApacheSDVTests/5234/



---


[GitHub] carbondata issue #2265: Added Performance Optimization for Presto by using M...

2018-06-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2265
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5102/



---


[GitHub] carbondata issue #2265: Added Performance Optimization for Presto by using M...

2018-06-06 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2265
  
SDV Build Success, Please check CI
http://144.76.159.231:8080/job/ApacheSDVTests/5233/



---


[GitHub] carbondata issue #2207: [CARBONDATA-2428] Support flat folder for managed ca...

2018-06-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2207
  
Build Failed with Spark 2.1.0, Please check CI
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6265/



---


[GitHub] carbondata issue #2265: Added Performance Optimization for Presto by using M...

2018-06-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2265
  
Build Failed with Spark 2.1.0, Please check CI
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6263/



---


[jira] [Created] (CARBONDATA-2590) Support Query on Local dictionary Complex type column

2018-06-06 Thread kumar vishal (JIRA)
kumar vishal created CARBONDATA-2590:


 Summary: Support Query on Local dictionary Complex type column
 Key: CARBONDATA-2590
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2590
 Project: CarbonData
  Issue Type: Sub-task
Reporter: kumar vishal


Support queries on complex type columns for which a local dictionary is generated.





[jira] [Created] (CARBONDATA-2589) Support Query on Local dictionary columns

2018-06-06 Thread kumar vishal (JIRA)
kumar vishal created CARBONDATA-2589:


 Summary: Support Query on Local dictionary columns
 Key: CARBONDATA-2589
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2589
 Project: CarbonData
  Issue Type: Sub-task
Reporter: kumar vishal


Support queries on columns for which a local dictionary is generated.





[jira] [Created] (CARBONDATA-2588) Support Local dictionary in data loading with complex type columns

2018-06-06 Thread kumar vishal (JIRA)
kumar vishal created CARBONDATA-2588:


 Summary: Support Local dictionary in data loading with complex type columns
 Key: CARBONDATA-2588
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2588
 Project: CarbonData
  Issue Type: Sub-task
Reporter: kumar vishal


Generate a local dictionary for the primitive columns inside complex types
(no-dictionary, low-cardinality columns).





[jira] [Created] (CARBONDATA-2587) Support Local dictionary in data loading

2018-06-06 Thread kumar vishal (JIRA)
kumar vishal created CARBONDATA-2587:


 Summary: Support Local dictionary in data loading
 Key: CARBONDATA-2587
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2587
 Project: CarbonData
  Issue Type: Sub-task
Reporter: kumar vishal


Support local dictionary generation during data loading for low-cardinality,
no-dictionary columns of string data type.
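As a rough illustration of blocklet-level dictionary generation during loading, here is a minimal Python sketch. It is not CarbonData's Java implementation. It assumes that a threshold such as CARBON_LOCALDICT_THRESHOLD caps the number of distinct values per blocklet, and that a column exceeding the cap falls back to plain no-dictionary storage. That reading of the threshold is an assumption, and all names below (`EncodedBlocklet`, `encode_blocklet`, `decode_blocklet`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EncodedBlocklet:
    """Hypothetical in-memory form of one blocklet of a string column."""
    dictionary: Optional[List[str]] = None  # distinct values; index = key
    keys: Optional[List[int]] = None        # one key per row (dictionary case)
    plain: Optional[List[str]] = None       # raw values (fallback case)


def encode_blocklet(values: List[str], threshold: int) -> EncodedBlocklet:
    """Build a blocklet-local dictionary, falling back to plain storage once
    more than `threshold` distinct values are seen (assumed semantics)."""
    lookup = {}
    dictionary: List[str] = []
    keys: List[int] = []
    for value in values:
        key = lookup.get(value)
        if key is None:
            if len(dictionary) >= threshold:
                # Too many distinct values: keep the column no-dictionary.
                return EncodedBlocklet(plain=list(values))
            key = len(dictionary)
            lookup[value] = key
            dictionary.append(value)
        keys.append(key)
    return EncodedBlocklet(dictionary=dictionary, keys=keys)


def decode_blocklet(blocklet: EncodedBlocklet) -> List[str]:
    """Recover the row values from either representation."""
    if blocklet.plain is not None:
        return list(blocklet.plain)
    return [blocklet.dictionary[key] for key in blocklet.keys]


if __name__ == "__main__":
    rows = ["IN", "US", "IN", "CN", "US", "IN"]
    encoded = encode_blocklet(rows, threshold=1000)
    print(encoded.dictionary)  # ['IN', 'US', 'CN']
    print(encoded.keys)        # [0, 1, 0, 2, 1, 0]
    print(encode_blocklet(rows, threshold=2).plain is not None)  # True
```

In this sketch each distinct string is stored once and each row becomes a small integer, which is where the size and memory savings would come from. A high-cardinality blocklet gains little from this, which is why a fallback is sketched.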





[jira] [Created] (CARBONDATA-2586) Support Showing local dictionary configuration in desc formatted command

2018-06-06 Thread kumar vishal (JIRA)
kumar vishal created CARBONDATA-2586:


 Summary: Support Showing local dictionary configuration in desc formatted command
 Key: CARBONDATA-2586
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2586
 Project: CarbonData
  Issue Type: Sub-task
Reporter: kumar vishal


Support showing the local dictionary parameters in the DESC FORMATTED command:
 # *CARBON_LOCALDICT_THRESHOLD*
 # *ENABLE_LOCAL_DICT*





[jira] [Created] (CARBONDATA-2585) Support Adding Local Dictionary configuration in Create table statement

2018-06-06 Thread kumar vishal (JIRA)
kumar vishal created CARBONDATA-2585:


 Summary: Support Adding Local Dictionary configuration in Create table statement
 Key: CARBONDATA-2585
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2585
 Project: CarbonData
  Issue Type: Sub-task
Reporter: kumar vishal


Allow the user to pass local dictionary configuration in the CREATE TABLE statement:

*ENABLE_LOCAL_DICT*

*CARBON_LOCALDICT_THRESHOLD*

CREATE TABLE carbontable(
  column1 string,
  column2 string,
  column3 LONG)
STORED BY 'carbondata'
TBLPROPERTIES('ENABLE_LOCAL_DICT'='true', 'CARBON_LOCALDICT_THRESHOLD'='1000')





[jira] [Created] (CARBONDATA-2584) CarbonData Local Dictionary Support

2018-06-06 Thread kumar vishal (JIRA)
kumar vishal created CARBONDATA-2584:


 Summary: CarbonData Local Dictionary Support
 Key: CARBONDATA-2584
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2584
 Project: CarbonData
  Issue Type: New Feature
Reporter: kumar vishal


Currently CarbonData supports either a global dictionary or No-Dictionary
(plain text stored in LV format) for storing dimension column data.

*Bottleneck with Global Dictionary*

It is difficult for the user to decide whether a column should be a dictionary
column when the table has many columns.

Global dictionary generation generally slows down the load process.

Multiple IO operations are made during load even though the dictionary already
exists.

During a query, multiple IO operations are needed to read both the dictionary
files and the carbondata files.

*Bottleneck with No-Dictionary*

Storage size is high because the data is stored in LV format.

Queries on No-Dictionary columns are slower because more data is read and
processed.

Filtering on No-Dictionary columns is slower because more comparisons are needed.

Memory footprint is high.

*The above bottlenecks can be solved by generating a dictionary for
low-cardinality columns at the blocklet level, which brings the following
benefits:*

It eliminates the extra read/write IO on the dictionary files that a global
dictionary requires.

It removes the need for the user to identify dictionary columns when a table
has many columns.

It gives better compression on low-cardinality dimension columns.

Filter queries and full-scan queries on No-Dictionary columns with a local
dictionary will be faster because filtering is done on the encoded data.

It reduces the store size and memory footprint because only the unique values
are stored in the local dictionary, and the column data itself is stored as
encoded values.
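The claim that filters on a local-dictionary column run on the encoded data can be sketched in Python as follows. The sketch assumes that a blocklet column is held as its list of distinct values plus one integer key per row. `filter_equals` and `filter_in` are hypothetical helpers, not CarbonData's filter executors. The filter literal is translated to a key once per blocklet, after which each row costs only an integer comparison.

```python
from typing import List


def filter_equals(dictionary: List[str], keys: List[int], literal: str) -> List[int]:
    """Row ids whose value equals `literal`, matched on surrogate keys."""
    try:
        # One string lookup per blocklet maps the literal to its key.
        target = dictionary.index(literal)
    except ValueError:
        # The literal never occurs in this blocklet: no row can match,
        # so the encoded rows are not scanned at all.
        return []
    # Per-row work is an integer comparison, not a string comparison.
    return [row for row, key in enumerate(keys) if key == target]


def filter_in(dictionary: List[str], keys: List[int], literals: List[str]) -> List[int]:
    """IN-list variant: map every literal to a key once, then test membership."""
    literal_set = set(literals)
    wanted = {key for key, value in enumerate(dictionary) if value in literal_set}
    if not wanted:
        return []
    return [row for row, key in enumerate(keys) if key in wanted]


if __name__ == "__main__":
    # Hypothetical blocklet: distinct values plus one key per row.
    dictionary = ["IN", "US", "CN"]
    keys = [0, 1, 0, 2, 1, 0]
    print(filter_equals(dictionary, keys, "US"))      # [1, 4]
    print(filter_in(dictionary, keys, ["IN", "CN"]))  # [0, 2, 3, 5]
    print(filter_equals(dictionary, keys, "JP"))      # []
```

If the literal is absent from a blocklet's dictionary, the whole blocklet can be skipped without touching its rows. This is one plausible source of the speed-up on selective filters.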





[GitHub] carbondata issue #2207: [CARBONDATA-2428] Support flat folder for managed ca...

2018-06-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2207
  
Build Failed with Spark 2.1.0, Please check CI
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6262/



---


[GitHub] carbondata issue #2207: [CARBONDATA-2428] Support flat folder for managed ca...

2018-06-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2207
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5100/



---