[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2552
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6274/



---


[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2552
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7519/



---


[GitHub] carbondata issue #2560: [HOTFIX] Removed file existence check to improve dat...

2018-07-25 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2560
  
retest this please


---


[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...

2018-07-25 Thread praveenmeenakshi56
Github user praveenmeenakshi56 commented on the issue:

https://github.com/apache/carbondata/pull/2552
  
retest this please


---


[GitHub] carbondata issue #2555: [CARBONDATA-2753][Compatibility] Row count of page i...

2018-07-25 Thread brijoobopanna
Github user brijoobopanna commented on the issue:

https://github.com/apache/carbondata/pull/2555
  
retest sdv please


---


[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...

2018-07-25 Thread brijoobopanna
Github user brijoobopanna commented on the issue:

https://github.com/apache/carbondata/pull/2517
  
retest sdv please



---


[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local

2018-07-25 Thread brijoobopanna
Github user brijoobopanna commented on the issue:

https://github.com/apache/carbondata/pull/2484
  
retest this please



---


[GitHub] carbondata pull request #2528: [CARBONDATA-2767][CarbonStore] Fix task local...

2018-07-25 Thread QiangCai
Github user QiangCai closed the pull request at:

https://github.com/apache/carbondata/pull/2528


---


[GitHub] carbondata pull request #2554: [CARBONDATA-2783][BloomDataMap][Doc] Update d...

2018-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2554


---


[GitHub] carbondata issue #2561: [CARBONDATA-2784][SDK writer] Fixed:Forever blocking...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2561
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6271/



---


[GitHub] carbondata issue #2561: [CARBONDATA-2784][SDK writer] Fixed:Forever blocking...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2561
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7516/



---


[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2524
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6272/



---


[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2524
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7517/



---


[GitHub] carbondata issue #2562: [HOTFIX] CreateDataMapPost Event was skipped in case...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2562
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7515/



---


[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2533
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6269/



---


[GitHub] carbondata issue #2520: [CARBONDATA-2750] Added Documentation for Local Dict...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2520
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6266/



---


[GitHub] carbondata issue #2562: [HOTFIX] CreateDataMapPost Event was skipped in case...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2562
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6270/



---


[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2533
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7514/



---


[GitHub] carbondata pull request #2562: [HOTFIX] CreateDataMapPost Event was skipped ...

2018-07-25 Thread jatin9896
GitHub user jatin9896 opened a pull request:

https://github.com/apache/carbondata/pull/2562

[HOTFIX] CreateDataMapPost Event was skipped in case of preaggregate datamap


Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed? No
 
 - [ ] Any backward compatibility impacted? No
 
 - [ ] Document update required? No

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required? NA
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. NA



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jatin9896/incubator-carbondata hotfix1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2562.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2562


commit bfa56deb66175c40a75cdeabab68990fa3d7d58f
Author: Jatin 
Date:   2018-07-25T19:12:50Z

hotfix : CreateDataMapPost Event was skipped in case of preaggregate datamap




---


[GitHub] carbondata issue #2561: [CARBONDATA-2784][SDK writer] Fixed:Forever blocking...

2018-07-25 Thread ajantha-bhat
Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2561
  
@kumarvishal09 : please review


---


[GitHub] carbondata pull request #2561: [CARBONDATA-2784][SDK writer] Fixed:Forever b...

2018-07-25 Thread ajantha-bhat
GitHub user ajantha-bhat opened a pull request:

https://github.com/apache/carbondata/pull/2561

[CARBONDATA-2784][SDK writer] Fixed:Forever blocking wait with more than 21 
batch of data

**Problem:**
[SDK writer] The writer blocks forever once more than 21 batches of data are 
written while the consumer is dead due to a data loading exception (bad record / 
out of memory).

**Root cause:**
When the consumer dies due to a data loading exception, the writer is forcefully 
closed, but queue.clear() removes only a snapshot of the entries (10 batches), 
and close is set to true only afterwards. If more than 10 batches are put into 
the queue between clear() and close = true, the 11th queue.put() blocks forever 
because the consumer is dead.

**Solution:**
Set close = true before clearing the queue. This prevents write() from adding 
more batches to the queue.
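The close-before-clear ordering described above can be sketched as a minimal producer-side guard. This is an illustrative sketch only; the class and method names are hypothetical, not CarbonData's actual SDK writer code.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical sketch of the fix: set the close flag BEFORE draining the queue.
class WriterQueue {
    // Small capacity mirrors the "10 batches" snapshot mentioned above.
    private final ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<>(10);
    private volatile boolean close = false;

    // Producer side (write()): once closed, drop the batch instead of
    // calling put() on a queue that no consumer will ever drain.
    void offerBatch(Object batch) {
        if (close) {
            return; // consumer is dead; discard to avoid a forever block
        }
        try {
            queue.put(batch); // may block while the consumer is alive and the queue is full
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Called when the consumer dies: set the flag FIRST, then drain.
    // The old order (clear() before close = true) left a window where
    // write() could refill the queue and block forever on a full queue.
    void closeOnFault() {
        close = true;
        queue.clear();
    }

    boolean isClosed() { return close; }

    int size() { return queue.size(); }
}
```

With this ordering, a write() racing with closeOnFault() either sees the flag and drops its batch, or enqueues before clear() runs and has the batch discarded; it can no longer block on a queue nobody drains.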

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed? NA
 
 - [ ] Any backward compatibility impacted? NA
 
 - [ ] Document update required? NA

 - [ ] Testing done.

 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. NA



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajantha-bhat/carbondata master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2561.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2561


commit ebbe1ef21a1864c3b179ebfb5f0b5d1e2812ef24
Author: ajantha-bhat 
Date:   2018-07-25T19:05:36Z

[CARBONDATA-2784][SDK writer] Forever blocking wait with more than 20 batch 
of data, when consumer is dead due to data loading exception

problem:

[SDK writer] Forever blocking wait with more than 21 batch of data, when
consumer is dead due to data loading exception (bad record / out of
memory)

root cause:

When the consumer is dead due to data loading exception, writer will be
forcefully closed. but queue.clear() cleared only snapshot of entries
(10 batches) and close is set to true after that. In between clear() and
close = true, If more than 10 batches of data is again put into queue.
For 11th batch, queue.put() goes for forever block as consumer is dead.

Solution:

set close = true, before clearing the queue. This will avoid adding more
batches to queue from write().




---


[jira] [Created] (CARBONDATA-2784) [SDK writer] Forever blocking wait with more than 20 batch of data, when consumer is dead due to data loading exception

2018-07-25 Thread Ajantha Bhat (JIRA)
Ajantha Bhat created CARBONDATA-2784:


 Summary: [SDK writer] Forever blocking wait with more than 20 
batch of data, when consumer is dead due to data loading exception 
 Key: CARBONDATA-2784
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2784
 Project: CarbonData
  Issue Type: Bug
Reporter: Ajantha Bhat
Assignee: Ajantha Bhat


problem:

[SDK writer] The writer blocks forever once more than 21 batches of data are 
written while the consumer is dead due to a data loading exception (bad record / 
out of memory).

 

root cause:

When the consumer dies due to a data loading exception, the writer is forcefully 
closed, but queue.clear() removes only a snapshot of the entries (10 batches), 
and close is set to true only afterwards. If more than 10 batches are put into 
the queue between clear() and close = true, the 11th queue.put() blocks forever 
because the consumer is dead.

Solution:

Set close = true before clearing the queue. This prevents write() from adding 
more batches to the queue.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2520: [CARBONDATA-2750] Added Documentation for Local Dict...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2520
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7512/



---


[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2533
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7513/



---


[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

2018-07-25 Thread dhatchayani
Github user dhatchayani commented on the issue:

https://github.com/apache/carbondata/pull/2549
  
refactored the code in #2559. So closing this PR


---


[GitHub] carbondata pull request #2549: [CARBONDATA-2606][Complex DataType Enhancemen...

2018-07-25 Thread dhatchayani
Github user dhatchayani closed the pull request at:

https://github.com/apache/carbondata/pull/2549


---


[GitHub] carbondata issue #2555: [CARBONDATA-2753][Compatibility] Row count of page i...

2018-07-25 Thread dhatchayani
Github user dhatchayani commented on the issue:

https://github.com/apache/carbondata/pull/2555
  
retest sdv please


---


[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2552
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6265/



---


[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2552
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7511/



---


[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2559
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6263/



---


[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...

2018-07-25 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2533
  
@ravipesala please review the changes in BlockDatamap


---


[GitHub] carbondata issue #2560: [HOTFIX] Removed file existence check to improve dat...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2560
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6262/



---


[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...

2018-07-25 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2533
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5997/



---


[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...

2018-07-25 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2533
  
retest this please


---


[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...

2018-07-25 Thread brijoobopanna
Github user brijoobopanna commented on the issue:

https://github.com/apache/carbondata/pull/2517
  
retest sdv please 


---


[jira] [Updated] (CARBONDATA-2779) Filter query is failing for store created with V1/V2 format

2018-07-25 Thread Manish Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Gupta updated CARBONDATA-2779:
-
Issue Type: Bug  (was: Improvement)

> Filter query is failing for store created with V1/V2 format
> ---
>
> Key: CARBONDATA-2779
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2779
> Project: CarbonData
>  Issue Type: Bug
>Reporter: kumar vishal
>Assignee: kumar vishal
>Priority: Major
> Fix For: 1.4.1
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Filter query is failing for store created with V1/V2 format with 
> Arrayindexoutofbound exception





[jira] [Resolved] (CARBONDATA-2779) Filter query is failing for store created with V1/V2 format

2018-07-25 Thread Manish Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Gupta resolved CARBONDATA-2779.
--
   Resolution: Fixed
Fix Version/s: 1.4.1

> Filter query is failing for store created with V1/V2 format
> ---
>
> Key: CARBONDATA-2779
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2779
> Project: CarbonData
>  Issue Type: Bug
>Reporter: kumar vishal
>Assignee: kumar vishal
>Priority: Major
> Fix For: 1.4.1
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Filter query is failing for store created with V1/V2 format with 
> Arrayindexoutofbound exception





[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2524
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6264/



---


[jira] [Updated] (CARBONDATA-2584) CarbonData Local Dictionary Support

2018-07-25 Thread kumar vishal (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kumar vishal updated CARBONDATA-2584:
-
Attachment: CarbonData Local Dictionary Support Design Doc(2).docx

> CarbonData Local Dictionary Support
> ---
>
> Key: CARBONDATA-2584
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2584
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: kumar vishal
>Priority: Major
> Attachments: CarbonData Local Dictionary Support Design Doc(2).docx
>
>
> Currently CarbonData supports global dictionary or No-Dictionary (Plain-Text 
> stored in LV format) for storing dimension column data.
> *Bottleneck with Global Dictionary*
> It is difficult for the user to determine whether a column should be 
> dictionary-encoded when the number of columns in the table is high.
> Global dictionary generation generally slows down the load process.
> Multiple IO operations are made during load even when the dictionary already 
> exists.
> During query, multiple IO operations are done to read the dictionary files and 
> carbondata files.
> *Bottleneck with No-Dictionary*
> Storage size is high, as the data is stored in LV format
> Queries on No-Dictionary columns are slower, as more data is read and processed
> Filtering is slower on No-Dictionary columns, as the number of comparisons is high
> Memory footprint is high
> *The above bottlenecks can be solved by generating dictionary for low 
> cardinality columns at each blocklet level, which will help to achieve below 
> benefits:*
> Reduces the extra IO operations read/write on the dictionary files generated 
> in case of global dictionary.
> It will eliminate the problem for user to identify the dictionary columns 
> when the number of columns are more in a table.
> It helps in getting more compression on dimension columns with less 
> cardinality.
> Filter queries and full scan queries on No-dictionary columns with local 
> dictionary will be faster as filter will be done on encoded data.
> It will help in reducing the store size and memory footprint, as only unique 
> values will be stored as part of the local dictionary and the 
> corresponding data will be stored as encoded data.
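The blocklet-level local dictionary idea above can be sketched as a minimal encoder: each distinct value is stored once, and the column data becomes small integer codes. This is illustrative only, not CarbonData's actual encoder.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-blocklet dictionary encoder: unique values are stored
// once; the column stores compact integer codes instead of raw strings.
class LocalDictionaryEncoder {
    private final Map<String, Integer> dictionary = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    // Returns the code for the value, assigning a new code on first sight.
    int encode(String value) {
        Integer code = dictionary.get(value);
        if (code == null) {
            code = values.size();
            dictionary.put(value, code);
            values.add(value);
        }
        return code;
    }

    // Reverse lookup used when decoding a scanned page.
    String decode(int code) { return values.get(code); }

    // Number of distinct values actually stored.
    int cardinality() { return values.size(); }
}
```

For a low-cardinality column this keeps one copy of each string and lets filters compare integer codes, which is the source of the storage and scan-speed benefits listed above.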





[jira] [Resolved] (CARBONDATA-2638) Implement driver min max caching for specified columns and segregate block and blocklet cache

2018-07-25 Thread Manish Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Gupta resolved CARBONDATA-2638.
--
   Resolution: Fixed
Fix Version/s: 1.4.1

> Implement driver min max caching for specified columns and segregate block 
> and blocklet cache
> -
>
> Key: CARBONDATA-2638
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2638
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Manish Gupta
>Assignee: Manish Gupta
>Priority: Major
> Fix For: 1.4.1
>
> Attachments: Driver_Block_Cache.docx
>
>
> *Background*
> Current implementation of Blocklet dataMap caching in driver is that it 
> caches the min and max values of all the columns in schema by default. 
> *Problem*
>  The problem with this implementation is that as the number of loads increases, 
> the memory required to hold the min and max values also increases considerably. 
> In most scenarios there is a single driver, and the memory 
> configured for the driver is less than that of the executors. With a continuously 
> increasing memory requirement, the driver can even go out of memory, which makes 
> the situation even worse.
> *Solution*
> 1. Cache only the required columns in Driver
> 2. Segregation of block and blocklet level cache
> For more details please check the attached document
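Point 1 above (cache only the required columns in the driver) can be sketched as a cache that keeps min/max values only for a configured column subset. Class and method names here are hypothetical, not the actual CarbonData implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical driver-side min/max cache restricted to a configured
// subset of columns, bounding driver memory as the number of loads grows.
class MinMaxCache {
    private final Set<String> cachedColumns;              // user-specified subset
    private final Map<String, long[]> minMaxByColumn = new HashMap<>();

    MinMaxCache(Set<String> cachedColumns) {
        this.cachedColumns = cachedColumns;
    }

    // Only columns the user asked for are retained; others are dropped,
    // so memory no longer scales with the full schema width.
    void put(String column, long min, long max) {
        if (cachedColumns.contains(column)) {
            minMaxByColumn.put(column, new long[] {min, max});
        }
    }

    // Returns {min, max} if cached, or null when the column is not cached
    // (the caller must then fall back to reading the index).
    long[] get(String column) { return minMaxByColumn.get(column); }
}
```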





[jira] [Commented] (CARBONDATA-2651) Update IDG for COLUMN_META_CACHE and CACHE_LEVEL properties

2018-07-25 Thread Manish Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555654#comment-16555654
 ] 

Manish Gupta commented on CARBONDATA-2651:
--

https://github.com/apache/carbondata/pull/2558

> Update IDG for COLUMN_META_CACHE and CACHE_LEVEL properties
> ---
>
> Key: CARBONDATA-2651
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2651
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Manish Gupta
>Assignee: Manish Gupta
>Priority: Minor
> Fix For: 1.4.1
>
>
> Update document for caching properties





[jira] [Resolved] (CARBONDATA-2651) Update IDG for COLUMN_META_CACHE and CACHE_LEVEL properties

2018-07-25 Thread Manish Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Gupta resolved CARBONDATA-2651.
--
   Resolution: Fixed
 Assignee: Gururaj Shetty  (was: Manish Gupta)
Fix Version/s: 1.4.1

> Update IDG for COLUMN_META_CACHE and CACHE_LEVEL properties
> ---
>
> Key: CARBONDATA-2651
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2651
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Manish Gupta
>Assignee: Gururaj Shetty
>Priority: Minor
> Fix For: 1.4.1
>
>
> Update document for caching properties





[GitHub] carbondata issue #2555: [CARBONDATA-2753][Compatibility] Row count of page i...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2555
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7507/



---


[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2549
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6260/



---


[jira] [Updated] (CARBONDATA-2767) Query take more than 5 seconds for RACK_LOCAL

2018-07-25 Thread Jacky Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-2767:
-
Fix Version/s: (was: 1.4.1)
   (was: 1.5.0)

> Query take more than 5 seconds for RACK_LOCAL
> -
>
> Key: CARBONDATA-2767
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2767
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> If the Spark cluster and the Hadoop cluster are two different machine 
> cluster, the Spark tasks will run in RACK_LOCAL mode. So no need to provide 
> the preferred locations to the task.





[jira] [Assigned] (CARBONDATA-2562) Support datamaps on external CSV format

2018-07-25 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin reassigned CARBONDATA-2562:
--

Assignee: xuchuanyin

> Support datamaps on external CSV format
> ---
>
> Key: CARBONDATA-2562
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2562
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xuchuanyin
>Assignee: xuchuanyin
>Priority: Major
>
> Support creating indexed datamap on external CSV datasource.
> Support rebuilding the indexed datamap for the external CSV datasource.
> Query on external datasource make use of datamap if it is available.





[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2517
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6261/



---


[GitHub] carbondata issue #2520: [CARBONDATA-2750] Added Documentation for Local Dict...

2018-07-25 Thread sgururajshetty
Github user sgururajshetty commented on the issue:

https://github.com/apache/carbondata/pull/2520
  
LGTM



---


[GitHub] carbondata issue #2560: [HOTFIX] Removed file existence check to improve dat...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2560
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7510/



---


[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2559
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7509/



---


[GitHub] carbondata issue #2441: [CARBONDATA-2625] optimize CarbonReader performance

2018-07-25 Thread brijoobopanna
Github user brijoobopanna commented on the issue:

https://github.com/apache/carbondata/pull/2441
  
retest sdv please



---


[GitHub] carbondata issue #2550: [CARBONDATA-2779]Fixed filter query issue in case of...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2550
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6259/



---


[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...

2018-07-25 Thread praveenmeenakshi56
GitHub user praveenmeenakshi56 reopened a pull request:

https://github.com/apache/carbondata/pull/2520

[CARBONDATA-2750] Added Documentation for Local Dictionary Support

### What has been added?
Documentation for Local Dictionary Support has been added.
 - [x] Any interfaces changed?
 NA
 - [x] Any backward compatibility impacted?
 NA
 - [x] Document update required?
Document has been added in this PR.
 - [x] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
 NA
 - [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/praveenmeenakshi56/carbondata local_dict_doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2520.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2520


commit 0e45c06137eac49508de1844bfc31321ba29acf2
Author: praveenmeenakshi56 
Date:   2018-07-18T06:07:29Z

Added Documentation for Local Dictionary Support

Conflicts:
docs/data-management-on-carbondata.md

commit 9093c09463758aafca590ee4fd476a679902fe94
Author: praveenmeenakshi56 
Date:   2018-07-25T15:08:05Z

Added Documentation for Local Dictionary Support




---


[GitHub] carbondata issue #2551: [HOTFIX] Fix a spelling mistake after PR2511 merged.

2018-07-25 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2551
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5996/



---


[GitHub] carbondata issue #2557: [CARBONDATA-2782]delete dead code in class 'CarbonCl...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2557
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7506/



---


[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...

2018-07-25 Thread praveenmeenakshi56
Github user praveenmeenakshi56 closed the pull request at:

https://github.com/apache/carbondata/pull/2520


---


[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2524
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7508/



---


[GitHub] carbondata issue #2558: [CARBONDATA-2648] Documentation for support for COLU...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2558
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7505/



---


[GitHub] carbondata issue #2553: [HOTFIX] Fixed random test failure

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2553
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6258/



---


[GitHub] carbondata pull request #2550: [CARBONDATA-2779]Fixed filter query issue in ...

2018-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2550


---


[GitHub] carbondata issue #2554: [CARBONDATA-2783][BloomDataMap][Doc] Update document...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2554
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6257/



---


[GitHub] carbondata pull request #2560: [HOTFIX] Removed file existence check to impr...

2018-07-25 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2560

[HOTFIX] Removed file existence check to improve dataMap loading performance

**Problem**
DataMap loading performance degraded after a file existence check was added.

**Analysis**
When a carbonIndex file is read and the map from carbondata file path to its 
metadata info is prepared, the physical existence of every file is checked, 
which on the HDFS file system means a namenode call each time. This degrades the 
dataMap loading performance. The check was added to avoid failures in two scenarios:
1. Compatibility with a 1.3 version store, where the segment file lists the 
mergeIndex as well as the index file names even though the index files no longer 
physically exist after the merge index file is created.
2. IUD scenarios where, after a delete operation, a carbondata file is 
deleted but its entry still exists in the index file.

**Fix**
Modified the code to check for physical file existence only when an 
IUD operation has happened on the table
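The fix above can be sketched as a filter that skips the per-file check (a namenode round trip on HDFS) unless an IUD operation has touched the table. A simple set lookup stands in for the real file-system existence check; the class and method names are hypothetical, not CarbonData's actual code.

```java
import java.util.Set;

// Hypothetical sketch: trust the index file unless IUD operations may have
// deleted carbondata files out from under it.
class IndexFileFilter {
    private final Set<String> existingFiles;      // stand-in for FileSystem lookups
    private final boolean tableHasIudOperations;

    IndexFileFilter(Set<String> existingFiles, boolean tableHasIudOperations) {
        this.existingFiles = existingFiles;
        this.tableHasIudOperations = tableHasIudOperations;
    }

    // An entry from the carbonindex file is accepted unless an IUD operation
    // may have deleted the underlying carbondata file.
    boolean accept(String carbonDataFilePath) {
        if (!tableHasIudOperations) {
            return true; // fast path: no existence check, no namenode call
        }
        return existingFiles.contains(carbonDataFilePath); // stand-in for the real check
    }
}
```

On the fast path no existence check runs at all, which is where the dataMap loading time is recovered; only tables with IUD history pay the per-file cost.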

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
Verified in a cluster with 20 billion records
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
No


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
query_slow_executor_pruning

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2560.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2560


commit 7081c8d480f580414583beef8798a5e3a208f549
Author: manishgupta88 
Date:   2018-07-25T14:18:41Z

Removed file existence check to improve dataMap loading performance




---


[jira] [Assigned] (CARBONDATA-2782) dead code in class 'CarbonCleanFilesCommand'

2018-07-25 Thread lianganping (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lianganping reassigned CARBONDATA-2782:
---

Assignee: lianganping

> dead code in class 'CarbonCleanFilesCommand'
> 
>
> Key: CARBONDATA-2782
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2782
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: lianganping
>Assignee: lianganping
>Priority: Trivial
> Fix For: 1.4.1
>
>
> class: CarbonCleanFilesCommand
> dead code:
> override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
>   carbonTable = CarbonEnv.getCarbonTable(databaseNameOp, tableName.get)(sparkSession)
>   val dms = carbonTable.getTableInfo.getDataMapSchemaList.asScala.map(_.getDataMapName)
>   val indexDms = DataMapStoreManager.getInstance.getAllDataMap(carbonTable).asScala
>     .filter(_.getDataMapSchema.isIndexDataMap)
>   ...
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2555: [CARBONDATA-2753][Compatibility] Row count of page i...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2555
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6256/



---


[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

2018-07-25 Thread ajantha-bhat
Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2549
  
Resolved conflicts and refactored the code in #2559.

please close this PR


---


[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

2018-07-25 Thread ajantha-bhat
GitHub user ajantha-bhat opened a pull request:

https://github.com/apache/carbondata/pull/2559

[CARBONDATA-2606][Complex DataType Enhancements]Fix Null result if 
projection column have null primitive column and struct 

Problem:
If the actual value of a primitive data type is null, PR #2489 moves all null 
values to the end of the collected row without considering the data type.

Solution:
Place a null at the end of the output only if the null value belongs to a 
complex-primitive column.
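A rough sketch of the intended behaviour (hypothetical types; not the actual row-assembly code): a null is shifted to the end of the row only when it comes from a complex-primitive column.

```java
import java.util.ArrayList;
import java.util.List;

public class NullPlacementSketch {
    // One projected cell: its value plus whether its column is a
    // primitive child of a complex (struct/array) column.
    record Cell(Object value, boolean complexPrimitive) {}

    // Nulls from complex-primitive columns move to the end of the output;
    // all other values (including ordinary nulls) keep their order.
    static List<Object> arrange(List<Cell> row) {
        List<Object> leading = new ArrayList<>();
        List<Object> trailing = new ArrayList<>();
        for (Cell c : row) {
            if (c.value() == null && c.complexPrimitive()) {
                trailing.add(null);
            } else {
                leading.add(c.value());
            }
        }
        leading.addAll(trailing);
        return leading;
    }
}
```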

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed? NA
 
 - [ ] Any backward compatibility impacted?NA
 
 - [ ] Document update required?NA

 - [ ] Testing done
 updated UT 

- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajantha-bhat/carbondata master_doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2559


commit 7af27b66fb1491f5ac7f9bf155723289d39ad7b0
Author: ajantha-bhat 
Date:   2018-07-25T13:51:02Z

[CARBONDATA-2606][Complex DataType Enhancements] Fix Null result if 
projection column have null primitive column and struct




---


[GitHub] carbondata issue #2542: [CARBONDATA-2772] Size based dictionary fallback is ...

2018-07-25 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2542
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5995/



---


[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2549
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7503/



---


[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2517
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7504/



---


[jira] [Updated] (CARBONDATA-2584) CarbonData Local Dictionary Support

2018-07-25 Thread kumar vishal (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kumar vishal updated CARBONDATA-2584:
-
Attachment: CarbonData Local Dictionary Support Design Doc.docx

> CarbonData Local Dictionary Support
> ---
>
> Key: CARBONDATA-2584
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2584
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: kumar vishal
>Priority: Major
> Attachments: CarbonData Local Dictionary Support Design Doc.docx
>
>
> Currently CarbonData supports global dictionary or No-Dictionary (plain text 
> stored in LV format) for storing dimension column data.
> *Bottleneck with Global Dictionary*
> It's difficult for the user to determine whether a column should be dictionary 
> or not when the number of columns in the table is high.
> Global dictionary generation generally slows down the load process.
> Multiple IO operations are made during load even though the dictionary already 
> exists.
> During query, multiple IO operations are done for reading dictionary files and 
> carbondata files.
> *Bottleneck with No-Dictionary*
> Storage size is high as the data is stored in LV format.
> Queries on No-Dictionary columns are slower as more data is read and processed.
> Filtering is slower on No-Dictionary columns as the number of comparisons is high.
> Memory footprint is high.
> *The above bottlenecks can be solved by generating a dictionary for low 
> cardinality columns at each blocklet level, which will help to achieve the below 
> benefits:*
> Reduces the extra IO operations (read/write) on the dictionary files generated 
> in case of global dictionary.
> Eliminates the problem of the user having to identify dictionary columns when 
> the number of columns in a table is large.
> Helps in getting more compression on dimension columns with less cardinality.
> Filter queries and full scan queries on No-Dictionary columns with local 
> dictionary will be faster as filtering is done on encoded data.
> Helps in reducing the store size and memory footprint, as only unique values 
> are stored as part of the local dictionary and the corresponding data is 
> stored as encoded data.
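The blocklet-level encoding proposed above can be sketched roughly as follows (a simplified illustration under assumed names, not CarbonData's actual encoder): each unique value is stored once, rows hold small surrogate keys, and the encoder falls back to plain storage when cardinality exceeds the threshold.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalDictionarySketch {
    // Encoded column chunk: the unique values plus one surrogate key per row.
    record Encoded(List<String> dictionary, int[] keys) {}

    // Builds a local dictionary for one blocklet's worth of column values.
    // Returns null when cardinality exceeds the threshold, signalling a
    // fallback to plain (LV-format) storage.
    static Encoded encode(List<String> values, int threshold) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        int[] keys = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            Integer key = dict.get(values.get(i));
            if (key == null) {
                key = dict.size();
                dict.put(values.get(i), key);
                if (dict.size() > threshold) {
                    return null; // too many unique values for a local dictionary
                }
            }
            keys[i] = key;
        }
        return new Encoded(new ArrayList<>(dict.keySet()), keys);
    }
}
```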





[jira] [Updated] (CARBONDATA-2584) CarbonData Local Dictionary Support

2018-07-25 Thread kumar vishal (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kumar vishal updated CARBONDATA-2584:
-
Attachment: (was: CarbonData Local Dictionary Support Design 
Doc(2).docx)






[jira] [Updated] (CARBONDATA-2584) CarbonData Local Dictionary Support

2018-07-25 Thread kumar vishal (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kumar vishal updated CARBONDATA-2584:
-
Attachment: (was: CarbonData Local Dictionary Support Design Doc.docx)






[jira] [Resolved] (CARBONDATA-2771) Block update and delete if compaction is in progress

2018-07-25 Thread kumar vishal (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kumar vishal resolved CARBONDATA-2771.
--
Resolution: Fixed
  Assignee: Akash R Nilugal

> Block update and delete if compaction is in progress
> 
>
> Key: CARBONDATA-2771
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2771
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Akash R Nilugal
>Assignee: Akash R Nilugal
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Block update and delete if compaction is in progress, as it may lead to data 
> mismatch





[GitHub] carbondata issue #2550: [CARBONDATA-2779]Fixed filter query issue in case of...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2550
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7502/



---


[GitHub] carbondata issue #2535: [CARBONDATA-2606]Fix Complex array Pushdown

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2535
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6252/



---


[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...

2018-07-25 Thread praveenmeenakshi56
Github user praveenmeenakshi56 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2520#discussion_r205108171
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
  TBLPROPERTIES ('streaming'='true')
  ```
 
+  - **Local Dictionary Configuration**
+  
+  Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+  1. Getting more compression on dimension columns with less cardinality.
+  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
+  3. Reducing the store size and memory footprint as only unique values 
will be stored as part of local dictionary and corresponding data will be 
stored as encoded data.
+
+   By default, Local Dictionary will be enabled and generated for all 
no-dictionary string/varchar datatype columns.
+   
+   Users will be able to pass following properties in create table 
command: 
+   
+   | Properties | Default value | Description |
+   | -- | - | --- |
+   | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary 
will be enabled for the table | 
+   | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for 
local dictionary generation (range: 1000 to 100000) |
--- End diff --

It is Segment/Task Level. Please refer to JIRA 2584.


---


[GitHub] carbondata pull request #2558: [CARBONDATA-2648] Documentation for support f...

2018-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2558


---


[GitHub] carbondata issue #2558: [CARBONDATA-2648] Documentation for support for COLU...

2018-07-25 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2558
  
LGTM


---


[GitHub] carbondata issue #2528: [CARBONDATA-2767][CarbonStore] Fix task locality iss...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2528
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6253/



---


[GitHub] carbondata pull request #2558: [CARBONDATA-2648] Documentation for support f...

2018-07-25 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2558#discussion_r205102509
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -141,7 +141,103 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
'SORT_SCOPE'='NO_SORT')
```
   **NOTE:** CarbonData also supports "using carbondata". Find example code 
at 
[SparkSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/SparkSessionExample.scala)
 in the CarbonData repo.
-
+   
+   - **Caching Min/Max Value for Required Columns**
+ By default, CarbonData caches min and max values of all the columns 
in schema.  As the load increases, the memory required to hold the min and max 
values increases considerably. This feature enables you to configure min and 
max values only for the required columns, resulting in optimized memory usage. 
+
+Following are the valid values for COLUMN_META_CACHE:
+* If you want no column min/max values to be caches in the driver.
+
+```
+COLUMN_META_CACHE=’’
+```
+
+* If you want only col1 min/max values to be cached in the driver.
+
+```
+COLUMN_META_CACHE=’col1’
+```
+
+* If you want min/max values to be caches in driver for all the 
specified columns.
--- End diff --

correct the typo...caches to cached


---


[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...

2018-07-25 Thread praveenmeenakshi56
Github user praveenmeenakshi56 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2520#discussion_r205102030
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
  TBLPROPERTIES ('streaming'='true')
  ```
 
+  - **Local Dictionary Configuration**
+  
+  Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+  1. Getting more compression on dimension columns with less cardinality.
+  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
+  3. Reducing the store size and memory footprint as only unique values 
will be stored as part of local dictionary and corresponding data will be 
stored as encoded data.
+
+   By default, Local Dictionary will be enabled and generated for all 
no-dictionary string/varchar datatype columns.
+   
+   Users will be able to pass following properties in create table 
command: 
+   
+   | Properties | Default value | Description |
+   | -- | - | --- |
+   | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary 
will be enabled for the table | 
+   | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for 
local dictionary generation (range: 1000 to 100000) |
+   | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar 
columns | Columns for which Local Dictionary is generated. |
+   | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local 
Dictionary is not generated |
+
--- End diff --

All the aforesaid are supported with Local Dictionary. The additional 
information is already present in the Design Document in the JIRA. Please refer 
the same.


---


[jira] [Resolved] (CARBONDATA-2767) Query take more than 5 seconds for RACK_LOCAL

2018-07-25 Thread Jacky Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-2767.
--
   Resolution: Fixed
Fix Version/s: 1.4.1
   1.5.0

> Query take more than 5 seconds for RACK_LOCAL
> -
>
> Key: CARBONDATA-2767
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2767
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Priority: Minor
> Fix For: 1.5.0, 1.4.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> If the Spark cluster and the Hadoop cluster are two different machine 
> clusters, the Spark tasks will run in RACK_LOCAL mode, so there is no need to 
> provide preferred locations to the tasks.
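A sketch of the locality decision (a hypothetical helper; the actual fix lives in CarbonData's Spark RDD integration): when no executor host matches a block's datanode hosts, reporting no preferred locations stops the scheduler from waiting for NODE_LOCAL slots that can never be satisfied.

```java
import java.util.Collections;
import java.util.List;

public class TaskLocalitySketch {
    // Preferred hosts for one split. If the Spark executors and the HDFS
    // datanodes are disjoint (separate clusters), report no preference so
    // tasks are scheduled immediately instead of waiting for locality.
    static List<String> preferredLocations(List<String> blockHosts,
                                           List<String> executorHosts) {
        boolean coLocated = blockHosts.stream().anyMatch(executorHosts::contains);
        return coLocated ? blockHosts : Collections.emptyList();
    }
}
```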





[GitHub] carbondata issue #2554: [CARBONDATA-2783][BloomDataMap][Doc] Update document...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2554
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7501/



---


[GitHub] carbondata issue #2557: [CARBONDATA-2782]delete dead code in class 'CarbonCl...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2557
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7500/



---


[GitHub] carbondata issue #2537: [CARBONDATA-2768][CarbonStore] Fix error in tests fo...

2018-07-25 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2537
  
please raise to external-format branch


---


[GitHub] carbondata issue #2544: [CARBONDATA-2776][CarbonStore] Support ingesting dat...

2018-07-25 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2544
  
please rebase


---


[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local

2018-07-25 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2484
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5994/



---


[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2517
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7499/



---


[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...

2018-07-25 Thread praveenmeenakshi56
Github user praveenmeenakshi56 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2520#discussion_r205094917
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -122,6 +122,45 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
  TBLPROPERTIES ('streaming'='true')
  ```
 
+  - **Local Dictionary Configuration**
+  
+  Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+  1. Getting more compression on dimension columns with less cardinality.
+  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
+  3. Reducing the store size and memory footprint as only unique values 
will be stored as part of local dictionary and corresponding data will be 
stored as encoded data.
+
+   By default, Local Dictionary will be enabled and generated for all 
no-dictionary string/varchar datatype columns.
--- End diff --

Data loading performance is affected by only 8% (tested with 3.5 billion 
records, 103 columns).


---


[GitHub] carbondata issue #2528: [CARBONDATA-2767][CarbonStore] Fix task locality iss...

2018-07-25 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2528
  
LGTM


---


[GitHub] carbondata issue #2557: [CARBONDATA-2782]delete dead code in class 'CarbonCl...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2557
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6255/



---


[GitHub] carbondata pull request #2558: [CARBONDATA-2648] Documentation for support f...

2018-07-25 Thread sgururajshetty
GitHub user sgururajshetty opened a pull request:

https://github.com/apache/carbondata/pull/2558

[CARBONDATA-2648] Documentation for support for COLUMN_META_CACHE in create 
table and a…



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sgururajshetty/carbondata master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2558.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2558


commit 529f80dda6db3ce34e0baf766b03a9a13190b286
Author: sgururajshetty 
Date:   2018-07-25T12:44:07Z

Documentation for support for COLUMN_META_CACHE in create table and alter 
table properties




---


[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

2018-07-25 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2549
  
@dhatchayani Please rebase


---


[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown

2018-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2535


---


[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2524
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6254/



---


[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2484
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6248/



---


[GitHub] carbondata issue #2535: [CARBONDATA-2606]Fix Complex array Pushdown

2018-07-25 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2535
  
LGTM


---


[GitHub] carbondata issue #2553: [HOTFIX] Fixed random test failure

2018-07-25 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2553
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6251/



---


[GitHub] carbondata issue #2555: [CARBONDATA-2753][Compatibility] Row count of page i...

2018-07-25 Thread dhatchayani
Github user dhatchayani commented on the issue:

https://github.com/apache/carbondata/pull/2555
  
retest this please


---


[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...

2018-07-25 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2533
  
LGTM


---

