[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2552 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6274/ ---
[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2552 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7519/ ---
[GitHub] carbondata issue #2560: [HOTFIX] Removed file existence check to improve dat...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2560 retest this please ---
[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...
Github user praveenmeenakshi56 commented on the issue: https://github.com/apache/carbondata/pull/2552 retest this please ---
[GitHub] carbondata issue #2555: [CARBONDATA-2753][Compatibility] Row count of page i...
Github user brijoobopanna commented on the issue: https://github.com/apache/carbondata/pull/2555 retest sdv please ---
[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...
Github user brijoobopanna commented on the issue: https://github.com/apache/carbondata/pull/2517 retest sdv please ---
[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local
Github user brijoobopanna commented on the issue: https://github.com/apache/carbondata/pull/2484 retest this please ---
[GitHub] carbondata pull request #2528: [CARBONDATA-2767][CarbonStore] Fix task local...
Github user QiangCai closed the pull request at: https://github.com/apache/carbondata/pull/2528 ---
[GitHub] carbondata pull request #2554: [CARBONDATA-2783][BloomDataMap][Doc] Update d...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2554 ---
[GitHub] carbondata issue #2561: [CARBONDATA-2784][SDK writer] Fixed:Forever blocking...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2561 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6271/ ---
[GitHub] carbondata issue #2561: [CARBONDATA-2784][SDK writer] Fixed:Forever blocking...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2561 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7516/ ---
[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2524 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6272/ ---
[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2524 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7517/ ---
[GitHub] carbondata issue #2562: [HOTFIX] CreateDataMapPost Event was skipped in case...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2562 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7515/ ---
[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2533 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6269/ ---
[GitHub] carbondata issue #2520: [CARBONDATA-2750] Added Documentation for Local Dict...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2520 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6266/ ---
[GitHub] carbondata issue #2562: [HOTFIX] CreateDataMapPost Event was skipped in case...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2562 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6270/ ---
[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2533 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7514/ ---
[GitHub] carbondata pull request #2562: [HOTFIX] CreateDataMapPost Event was skipped ...
GitHub user jatin9896 opened a pull request: https://github.com/apache/carbondata/pull/2562 [HOTFIX] CreateDataMapPost Event was skipped in case of preaggregate datamap Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? No - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? NA - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/jatin9896/incubator-carbondata hotfix1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2562.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2562 commit bfa56deb66175c40a75cdeabab68990fa3d7d58f Author: Jatin Date: 2018-07-25T19:12:50Z hotfix : CreateDataMapPost Event was skipped in case of preaggregate datamap ---
[GitHub] carbondata issue #2561: [CARBONDATA-2784][SDK writer] Fixed:Forever blocking...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2561 @kumarvishal09 : please review ---
[GitHub] carbondata pull request #2561: [CARBONDATA-2784][SDK writer] Fixed:Forever b...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2561 [CARBONDATA-2784][SDK writer] Fixed: Forever blocking wait with more than 21 batches of data **Problem:** [SDK writer] Forever blocking wait with more than 21 batches of data, when the consumer is dead due to a data loading exception (bad record / out of memory). **Root cause:** When the consumer dies due to a data loading exception, the writer is forcefully closed, but queue.clear() clears only a snapshot of the entries (10 batches), and close is set to true only after that. If more than 10 batches of data are put into the queue between clear() and close = true, the 11th batch's queue.put() blocks forever because the consumer is dead. **Solution:** Set close = true before clearing the queue. This prevents write() from adding more batches to the queue. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2561.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2561 commit ebbe1ef21a1864c3b179ebfb5f0b5d1e2812ef24 Author: ajantha-bhat Date: 2018-07-25T19:05:36Z [CARBONDATA-2784][SDK writer] Forever blocking wait with more than 20 batch of data, when consumer is dead due to data loading exception problem: [SDK writer] Forever blocking wait with more than 21 batch of data, when consumer is dead due to data loading exception (bad record / out of memory) root cause: When the consumer is dead due to data loading exception, writer will be forcefully closed. but queue.clear() cleared only snapshot of entries (10 batches) and close is set to true after that. In between clear() and close = true, If more than 10 batches of data is again put into queue. For 11th batch, queue.put() goes for forever block as consumer is dead. Solution: set close = true, before clearing the queue. This will avoid adding more batches to queue from write(). ---
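The close/clear ordering described in PR #2561 can be sketched as follows. This is a minimal illustrative model with hypothetical class and method names, not CarbonData's actual SDK writer code; it only shows why flipping the `closed` flag before draining the queue prevents a producer from blocking on `put()` after the consumer has died.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Sketch (hypothetical names) of the fix: mark the writer closed BEFORE
// clearing the queue, so write() fails fast instead of blocking forever
// once the consumer thread has died.
class SketchWriter {
    private final ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(10);
    private volatile boolean closed = false;

    // Called by the producer for each batch of rows.
    boolean write(String batch) {
        if (closed) {
            return false;              // consumer is gone: reject instead of blocking
        }
        try {
            queue.put(batch);          // may block only while the consumer is alive
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Called when the consumer dies (e.g. bad record / out of memory).
    void forceClose() {
        closed = true;                 // 1) stop accepting new batches first ...
        queue.clear();                 // 2) ... then drop the buffered snapshot
    }

    int buffered() {
        return queue.size();
    }
}
```

With the original ordering (clear first, flag second), a producer could refill the 10 free slots between the two steps and then block forever on the 11th `put()`; setting the flag first closes that window.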
[jira] [Created] (CARBONDATA-2784) [SDK writer] Forever blocking wait with more than 20 batch of data, when consumer is dead due to data loading exception
Ajantha Bhat created CARBONDATA-2784: Summary: [SDK writer] Forever blocking wait with more than 20 batch of data, when consumer is dead due to data loading exception Key: CARBONDATA-2784 URL: https://issues.apache.org/jira/browse/CARBONDATA-2784 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat problem: [SDK writer] Forever blocking wait with more than 21 batch of data, when consumer is dead due to data loading exception (bad record / out of memory) root cause: When the consumer is dead due to data loading exception, writer will be forcefully closed. but queue.clear() cleared only snapshot of entries (10 batches) and close is set to true after that. In between clear() and close = true, If more than 10 batches of data is again put into queue. For 11th batch, queue.put() goes for forever block as consumer is dead. Solution: set close = true, before clearing the queue. This will avoid adding more batches to queue from write(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2520: [CARBONDATA-2750] Added Documentation for Local Dict...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2520 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7512/ ---
[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2533 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7513/ ---
[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...
Github user dhatchayani commented on the issue: https://github.com/apache/carbondata/pull/2549 refactored the code in #2559. So closing this PR ---
[GitHub] carbondata pull request #2549: [CARBONDATA-2606][Complex DataType Enhancemen...
Github user dhatchayani closed the pull request at: https://github.com/apache/carbondata/pull/2549 ---
[GitHub] carbondata issue #2555: [CARBONDATA-2753][Compatibility] Row count of page i...
Github user dhatchayani commented on the issue: https://github.com/apache/carbondata/pull/2555 retest sdv please ---
[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2552 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6265/ ---
[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2552 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7511/ ---
[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2559 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6263/ ---
[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2533 @ravipesala please review the changes in BlockDatamap ---
[GitHub] carbondata issue #2560: [HOTFIX] Removed file existence check to improve dat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2560 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6262/ ---
[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2533 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5997/ ---
[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2533 retest this please ---
[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...
Github user brijoobopanna commented on the issue: https://github.com/apache/carbondata/pull/2517 retest sdv please ---
[jira] [Updated] (CARBONDATA-2779) Filter query is failing for store created with V1/V2 format
[ https://issues.apache.org/jira/browse/CARBONDATA-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish Gupta updated CARBONDATA-2779: - Issue Type: Bug (was: Improvement) > Filter query is failing for store created with V1/V2 format > --- > > Key: CARBONDATA-2779 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2779 > Project: CarbonData > Issue Type: Bug >Reporter: kumar vishal >Assignee: kumar vishal >Priority: Major > Fix For: 1.4.1 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Filter query is failing for store created with V1/V2 format with > Arrayindexoutofbound exception -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2779) Filter query is failing for store created with V1/V2 format
[ https://issues.apache.org/jira/browse/CARBONDATA-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish Gupta resolved CARBONDATA-2779. -- Resolution: Fixed Fix Version/s: 1.4.1 > Filter query is failing for store created with V1/V2 format > --- > > Key: CARBONDATA-2779 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2779 > Project: CarbonData > Issue Type: Bug >Reporter: kumar vishal >Assignee: kumar vishal >Priority: Major > Fix For: 1.4.1 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Filter query is failing for store created with V1/V2 format with > Arrayindexoutofbound exception -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2524 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6264/ ---
[jira] [Updated] (CARBONDATA-2584) CarbonData Local Dictionary Support
[ https://issues.apache.org/jira/browse/CARBONDATA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal updated CARBONDATA-2584: - Attachment: CarbonData Local Dictionary Support Design Doc(2).docx > CarbonData Local Dictionary Support > --- > > Key: CARBONDATA-2584 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2584 > Project: CarbonData > Issue Type: New Feature >Reporter: kumar vishal >Priority: Major > Attachments: CarbonData Local Dictionary Support Design Doc(2).docx > > > Currently CarbonData supports global dictionary or No-Dictionary (Plain-Text > stored in LV format) for storing dimension column data. > *Bottleneck with Global Dictionary* > It’s difficult for user to determine whether the column should be dictionary > or not if number of columns in table is high. > Global dictionary generation generally slows down the load process. > Multiple IO operations are made during load even though dictionary already > exists. > During query, multiple IO operations done for reading dictionary files and > carbondata files. > *Bottleneck with No-Dictionary* > Storage size is high as we store the data in LV format > Query on No-Dictionary column is slower as data read/processed is more > Filtering is slower on No-Dictionary columns as number of comparison is high > Memory footprint is high > *The above bottlenecks can be solved by generating dictionary for low > cardinality columns at each blocklet level, which will help to achieve below > benefits:* > Reduces the extra IO operations read/write on the dictionary files generated > in case of global dictionary. > It will eliminate the problem for user to identify the dictionary columns > when the number of columns are more in a table. > It helps in getting more compression on dimension columns with less > cardinality. > Filter queries and full scan queries on No-dictionary columns with local > dictionary will be faster as filter will be done on encoded data. 
> It will help in reducing the store size and memory footprint as only unique > values will be stored as part of local dictionary and > corresponding data will be stored as encoded data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
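The blocklet-level dictionary idea in CARBONDATA-2584 can be illustrated with a toy encoder. This is a sketch of the general technique only, with hypothetical names; CarbonData's real local dictionary involves fallback thresholds, encoded pages, and binary formats not modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (not CarbonData's implementation) of a blocklet-local
// dictionary: each distinct value is stored once, and the column data becomes
// a list of small integer surrogate keys, which compresses well for
// low-cardinality columns and lets filters compare encoded keys.
class LocalDictionaryPage {
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();
    private final List<Integer> encoded = new ArrayList<>();

    void add(String value) {
        // assign the next surrogate key on the first occurrence of a value
        Integer key = dictionary.computeIfAbsent(value, v -> dictionary.size());
        encoded.add(key);
    }

    int distinctValues() { return dictionary.size(); }
    List<Integer> encodedColumn() { return encoded; }
}
```

Because the dictionary lives inside the blocklet, no separate global dictionary files need to be read or written during load or query, which is the IO saving the issue describes.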
[jira] [Resolved] (CARBONDATA-2638) Implement driver min max caching for specified columns and segregate block and blocklet cache
[ https://issues.apache.org/jira/browse/CARBONDATA-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish Gupta resolved CARBONDATA-2638. -- Resolution: Fixed Fix Version/s: 1.4.1 > Implement driver min max caching for specified columns and segregate block > and blocklet cache > - > > Key: CARBONDATA-2638 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2638 > Project: CarbonData > Issue Type: New Feature >Reporter: Manish Gupta >Assignee: Manish Gupta >Priority: Major > Fix For: 1.4.1 > > Attachments: Driver_Block_Cache.docx > > > *Background* > Current implementation of Blocklet dataMap caching in driver is that it > caches the min and max values of all the columns in schema by default. > *Problem* > Problem with this implementation is that as the number of loads increases > the memory required to hold min and max values also increases considerably. > We know that in most of the scenarios there is a single driver and memory > configured for driver is less as compared to executor. With continuous > increase in memory requirement driver can even go out of memory which makes > the situation further worse. > *Solution* > 1. Cache only the required columns in Driver > 2. Segregation of block and Blocklet level cache** > For more details please check the attached document -- This message was sent by Atlassian JIRA (v7.6.3#76005)
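The "cache only the required columns" solution in CARBONDATA-2638 can be sketched as a cache that stores min/max statistics only for the user-configured columns. Names and structure here are hypothetical, not the actual CarbonData driver cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch (hypothetical names): keep driver-side min/max stats only for the
// columns the user listed (e.g. via COLUMN_META_CACHE), instead of every
// column in the schema, bounding driver memory as load count grows.
class BlockletMinMaxCache {
    private final Set<String> cachedColumns;              // configured columns
    private final Map<String, long[]> minMax = new HashMap<>();

    BlockletMinMaxCache(Set<String> cachedColumns) {
        this.cachedColumns = cachedColumns;
    }

    void put(String column, long min, long max) {
        if (cachedColumns.contains(column)) {             // skip non-configured columns
            minMax.put(column, new long[] {min, max});
        }
    }

    // null means "no stats cached": the pruner must NOT skip the blocklet,
    // it falls back to scanning, so correctness is preserved.
    long[] get(String column) {
        return minMax.get(column);
    }
}
```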
[jira] [Commented] (CARBONDATA-2651) Update IDG for COLUMN_META_CACHE and CACHE_LEVEL properties
[ https://issues.apache.org/jira/browse/CARBONDATA-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555654#comment-16555654 ] Manish Gupta commented on CARBONDATA-2651: -- https://github.com/apache/carbondata/pull/2558 > Update IDG for COLUMN_META_CACHE and CACHE_LEVEL properties > --- > > Key: CARBONDATA-2651 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2651 > Project: CarbonData > Issue Type: Sub-task >Reporter: Manish Gupta >Assignee: Manish Gupta >Priority: Minor > Fix For: 1.4.1 > > > Update document for caching properties -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2651) Update IDG for COLUMN_META_CACHE and CACHE_LEVEL properties
[ https://issues.apache.org/jira/browse/CARBONDATA-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish Gupta resolved CARBONDATA-2651. -- Resolution: Fixed Assignee: Gururaj Shetty (was: Manish Gupta) Fix Version/s: 1.4.1 > Update IDG for COLUMN_META_CACHE and CACHE_LEVEL properties > --- > > Key: CARBONDATA-2651 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2651 > Project: CarbonData > Issue Type: Sub-task >Reporter: Manish Gupta >Assignee: Gururaj Shetty >Priority: Minor > Fix For: 1.4.1 > > > Update document for caching properties -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2555: [CARBONDATA-2753][Compatibility] Row count of page i...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2555 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7507/ ---
[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2549 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6260/ ---
[jira] [Updated] (CARBONDATA-2767) Query take more than 5 seconds for RACK_LOCAL
[ https://issues.apache.org/jira/browse/CARBONDATA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-2767: - Fix Version/s: (was: 1.4.1) (was: 1.5.0) > Query take more than 5 seconds for RACK_LOCAL > - > > Key: CARBONDATA-2767 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2767 > Project: CarbonData > Issue Type: Bug >Reporter: QiangCai >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > If the Spark cluster and the Hadoop cluster are two different machine > cluster, the Spark tasks will run in RACK_LOCAL mode. So no need to provide > the preferred locations to the task. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2562) Support datamaps on external CSV format
[ https://issues.apache.org/jira/browse/CARBONDATA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin reassigned CARBONDATA-2562: -- Assignee: xuchuanyin > Support datamaps on external CSV format > --- > > Key: CARBONDATA-2562 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2562 > Project: CarbonData > Issue Type: Sub-task >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > > Support creating indexed datamap on external CSV datasource. > Support rebuilding the indexed datamap for the external CSV datasource. > Query on external datasource make use of datamap if it is available. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2517 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6261/ ---
[GitHub] carbondata issue #2520: [CARBONDATA-2750] Added Documentation for Local Dict...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/2520 LGTM ---
[GitHub] carbondata issue #2560: [HOTFIX] Removed file existence check to improve dat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2560 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7510/ ---
[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2559 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7509/ ---
[GitHub] carbondata issue #2441: [CARBONDATA-2625] optimize CarbonReader performance
Github user brijoobopanna commented on the issue: https://github.com/apache/carbondata/pull/2441 retest sdv please ---
[GitHub] carbondata issue #2550: [CARBONDATA-2779]Fixed filter query issue in case of...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2550 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6259/ ---
[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...
GitHub user praveenmeenakshi56 reopened a pull request: https://github.com/apache/carbondata/pull/2520 [CARBONDATA-2750] Added Documentation for Local Dictionary Support ### What has been added? Documentation for Local Dictionary Support has been added. - [x] Any interfaces changed? NA - [x] Any backward compatibility impacted? NA - [x] Document update required? Document has been added in this PR. - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. NA - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/praveenmeenakshi56/carbondata local_dict_doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2520.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2520 commit 0e45c06137eac49508de1844bfc31321ba29acf2 Author: praveenmeenakshi56 Date: 2018-07-18T06:07:29Z Added Documentation for Local Dictionary Support Conflicts: docs/data-management-on-carbondata.md commit 9093c09463758aafca590ee4fd476a679902fe94 Author: praveenmeenakshi56 Date: 2018-07-25T15:08:05Z Added Documentation for Local Dictionary Support ---
[GitHub] carbondata issue #2551: [HOTFIX] Fix a spelling mistake after PR2511 merged.
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2551 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5996/ ---
[GitHub] carbondata issue #2557: [CARBONDATA-2782]delete dead code in class 'CarbonCl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2557 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7506/ ---
[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...
Github user praveenmeenakshi56 closed the pull request at: https://github.com/apache/carbondata/pull/2520 ---
[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2524 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7508/ ---
[GitHub] carbondata issue #2558: [CARBONDATA-2648] Documentation for support for COLU...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2558 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7505/ ---
[GitHub] carbondata issue #2553: [HOTFIX] Fixed random test failure
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2553 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6258/ ---
[GitHub] carbondata pull request #2550: [CARBONDATA-2779]Fixed filter query issue in ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2550 ---
[GitHub] carbondata issue #2554: [CARBONDATA-2783][BloomDataMap][Doc] Update document...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2554 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6257/ ---
[GitHub] carbondata pull request #2560: [HOTFIX] Removed file existence check to impr...
GitHub user manishgupta88 opened a pull request: https://github.com/apache/carbondata/pull/2560 [HOTFIX] Removed file existence check to improve dataMap loading performance **Problem** DataMap loading performance degraded after a file existence check was added. **Analysis** When the carbonIndex file is read and the map from carbondata file path to its metadata info is prepared, the physical existence of each file is checked every time, which on HDFS means a namenode call. This degrades dataMap loading performance. The check was added to avoid failures in 2 scenarios: 1. Compatibility with 1.3 version stores, where the segment file contains both the mergeIndex and the index file names even though the index file no longer physically exists after creation of the merge index file. 2. The IUD scenario, where after a delete operation the carbondata file is deleted but its entry still exists in the index file. **Fix** Modified the code to check for physical file existence only when an IUD operation has happened on the table - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? No - [ ] Testing done Verified in cluster on 20 billion data - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. No You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishgupta88/carbondata query_slow_executor_pruning Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2560.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2560 commit 7081c8d480f580414583beef8798a5e3a208f549 Author: manishgupta88 Date: 2018-07-25T14:18:41Z Removed file existence check to improve dataMap loading performance ---
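The fix in PR #2560 amounts to gating the expensive existence check behind the table's IUD status. The sketch below uses a hypothetical interface, not the actual CarbonData code: only when an insert/update/delete has happened can an index entry point at a physically deleted carbondata file, so only then is the namenode call paid.

```java
// Sketch (hypothetical names) of gating a per-file HDFS existence check
// behind the table's IUD status, as described in the PR above.
interface FileChecker {
    boolean exists(String path);   // expensive on HDFS: one namenode RPC per call
}

class DataMapLoader {
    private final FileChecker fs;
    private final boolean tableHasIUD;

    DataMapLoader(FileChecker fs, boolean tableHasIUD) {
        this.fs = fs;
        this.tableHasIUD = tableHasIUD;
    }

    boolean shouldLoad(String dataFilePath) {
        // fast path: no IUD ever happened, so the index file's entries
        // can be trusted without touching the namenode
        if (!tableHasIUD) {
            return true;
        }
        return fs.exists(dataFilePath);
    }
}
```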
[jira] [Assigned] (CARBONDATA-2782) dead code in class 'CarbonCleanFilesCommand'
[ https://issues.apache.org/jira/browse/CARBONDATA-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lianganping reassigned CARBONDATA-2782: --- Assignee: lianganping > dead code in class 'CarbonCleanFilesCommand' > > > Key: CARBONDATA-2782 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2782 > Project: CarbonData > Issue Type: Improvement >Reporter: lianganping >Assignee: lianganping >Priority: Trivial > Fix For: 1.4.1 > > > class:CarbonCleanFilesCommand > dead code: > override def processMetadata(sparkSession: SparkSession): Seq[Row] = { > carbonTable = CarbonEnv.getCarbonTable(databaseNameOp, > tableName.get)(sparkSession) > val dms = > carbonTable.getTableInfo.getDataMapSchemaList.asScala.map(_.getDataMapName) > val indexDms = > DataMapStoreManager.getInstance.getAllDataMap(carbonTable).asScala > .filter(_.getDataMapSchema.isIndexDataMap) > ... > } -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2555: [CARBONDATA-2753][Compatibility] Row count of page i...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2555 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6256/ ---
[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2549 Resolved conflicts and refactored the code in #2559. please close this PR ---
[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2559 [CARBONDATA-2606][Complex DataType Enhancements]Fix Null result if projection column have null primitive column and struct Problem: If the actual value of the primitive data type is null, PR#2489 moves all null values to the end of the collected row without considering the data type. Solution: Place a null at the end of the output only if the null value belongs to a complex primitive column. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Updated UT - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata master_doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2559.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2559 commit 7af27b66fb1491f5ac7f9bf155723289d39ad7b0 Author: ajantha-bhat Date: 2018-07-25T13:51:02Z [CARBONDATA-2606][Complex DataType Enhancements] Fix Null result if projection column have null primitive column and struct ---
[GitHub] carbondata issue #2542: [CARBONDATA-2772] Size based dictionary fallback is ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2542 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5995/ ---
[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2549 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7503/ ---
[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2517 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7504/ ---
[jira] [Updated] (CARBONDATA-2584) CarbonData Local Dictionary Support
[ https://issues.apache.org/jira/browse/CARBONDATA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal updated CARBONDATA-2584: - Attachment: CarbonData Local Dictionary Support Design Doc.docx > CarbonData Local Dictionary Support > --- > > Key: CARBONDATA-2584 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2584 > Project: CarbonData > Issue Type: New Feature >Reporter: kumar vishal >Priority: Major > Attachments: CarbonData Local Dictionary Support Design Doc.docx > > > Currently CarbonData supports global dictionary or No-Dictionary (Plain-Text > stored in LV format) for storing dimension column data. > *Bottleneck with Global Dictionary* > It's difficult for the user to determine whether a column should be dictionary > or not if the number of columns in the table is high. > Global dictionary generation generally slows down the load process. > Multiple IO operations are made during load even though the dictionary already > exists. > During query, multiple IO operations are done for reading dictionary files and > carbondata files. > *Bottleneck with No-Dictionary* > Storage size is high as we store the data in LV format. > Query on a No-Dictionary column is slower as more data is read/processed. > Filtering is slower on No-Dictionary columns as the number of comparisons is high. > Memory footprint is high. > *The above bottlenecks can be solved by generating a dictionary for low > cardinality columns at each blocklet level, which helps achieve the below > benefits:* > Reduces the extra IO operations to read/write the dictionary files generated > in case of global dictionary. > Eliminates the problem of the user having to identify the dictionary columns > when the number of columns in a table is large. > Helps in getting more compression on dimension columns with less > cardinality. > Filter queries and full scan queries on No-dictionary columns with local > dictionary will be faster as filtering is done on encoded data. 
> It will help in reducing the store size and memory footprint as only unique > values will be stored as part of the local dictionary and the > corresponding data will be stored as encoded data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2584) CarbonData Local Dictionary Support
[ https://issues.apache.org/jira/browse/CARBONDATA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal updated CARBONDATA-2584: - Attachment: (was: CarbonData Local Dictionary Support Design Doc(2).docx) > CarbonData Local Dictionary Support > --- > > Key: CARBONDATA-2584 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2584 > Project: CarbonData > Issue Type: New Feature >Reporter: kumar vishal >Priority: Major > Attachments: CarbonData Local Dictionary Support Design Doc.docx -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2584) CarbonData Local Dictionary Support
[ https://issues.apache.org/jira/browse/CARBONDATA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal updated CARBONDATA-2584: - Attachment: (was: CarbonData Local Dictionary Support Design Doc.docx) > CarbonData Local Dictionary Support > --- > > Key: CARBONDATA-2584 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2584 > Project: CarbonData > Issue Type: New Feature >Reporter: kumar vishal >Priority: Major > Attachments: CarbonData Local Dictionary Support Design Doc(2).docx -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2771) Block update and delete if compaction is in progress
[ https://issues.apache.org/jira/browse/CARBONDATA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2771. -- Resolution: Fixed Assignee: Akash R Nilugal > Block update and delete if compaction is in progress > > > Key: CARBONDATA-2771 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2771 > Project: CarbonData > Issue Type: Bug >Reporter: Akash R Nilugal >Assignee: Akash R Nilugal >Priority: Minor > Time Spent: 3h 10m > Remaining Estimate: 0h > > Block update and delete if compaction is in progress, as it may lead to data > mismatch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2550: [CARBONDATA-2779]Fixed filter query issue in case of...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2550 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7502/ ---
[GitHub] carbondata issue #2535: [CARBONDATA-2606]Fix Complex array Pushdown
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2535 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6252/ ---
[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...
Github user praveenmeenakshi56 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2520#discussion_r205108171 --- Diff: docs/data-management-on-carbondata.md --- @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + Users will be able to pass following properties in create table command: + + | Properties | Default value | Description | + | -- | - | --- | + | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (range- 1000 to 10) | --- End diff -- It is Segment/Task Level. Please refer to JIRA 2584. ---
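Taken together, the local dictionary properties listed in the diff above would be used in a CREATE TABLE statement roughly like this sketch. The table, column names, and the threshold value are illustrative assumptions, not taken from the PR or its documentation.

```sql
-- Illustrative table enabling local dictionary for two string columns.
CREATE TABLE sales (
  country STRING,
  product STRING,
  quantity INT
) STORED BY 'carbondata'
TBLPROPERTIES (
  'LOCAL_DICTIONARY_ENABLE'='true',
  'LOCAL_DICTIONARY_THRESHOLD'='10000',
  'LOCAL_DICTIONARY_INCLUDE'='country,product'
)
```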
[GitHub] carbondata pull request #2558: [CARBONDATA-2648] Documentation for support f...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2558 ---
[GitHub] carbondata issue #2558: [CARBONDATA-2648] Documentation for support for COLU...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2558 LGTM ---
[GitHub] carbondata issue #2528: [CARBONDATA-2767][CarbonStore] Fix task locality iss...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2528 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6253/ ---
[GitHub] carbondata pull request #2558: [CARBONDATA-2648] Documentation for support f...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2558#discussion_r205102509 --- Diff: docs/data-management-on-carbondata.md --- @@ -141,7 +141,103 @@ This tutorial is going to introduce all commands and data operations on CarbonDa 'SORT_SCOPE'='NO_SORT') ``` **NOTE:** CarbonData also supports "using carbondata". Find example code at [SparkSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/SparkSessionExample.scala) in the CarbonData repo. - + + - **Caching Min/Max Value for Required Columns** + By default, CarbonData caches min and max values of all the columns in schema. As the load increases, the memory required to hold the min and max values increases considerably. This feature enables you to configure min and max values only for the required columns, resulting in optimized memory usage. + +Following are the valid values for COLUMN_META_CACHE: +* If you want no column min/max values to be caches in the driver. + +``` +COLUMN_META_CACHE='' +``` + +* If you want only col1 min/max values to be cached in the driver. + +``` +COLUMN_META_CACHE='col1' +``` + +* If you want min/max values to be caches in driver for all the specified columns. --- End diff -- correct the typo...caches to cached ---
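For context, the COLUMN_META_CACHE values being documented above would appear in a table definition roughly as in this sketch (table and column names are hypothetical, not from the PR):

```sql
-- Cache min/max values in the driver only for col1; col2 is not cached.
CREATE TABLE t1 (
  col1 STRING,
  col2 INT
) STORED BY 'carbondata'
TBLPROPERTIES ('COLUMN_META_CACHE'='col1')
```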
[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...
Github user praveenmeenakshi56 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2520#discussion_r205102030 --- Diff: docs/data-management-on-carbondata.md --- @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + Users will be able to pass following properties in create table command: + + | Properties | Default value | Description | + | -- | - | --- | + | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (range- 1000 to 10) | + | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. | + | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated | + --- End diff -- All the aforesaid are supported with Local Dictionary. The additional information is already present in the Design Document in the JIRA. Please refer the same. ---
[jira] [Resolved] (CARBONDATA-2767) Query take more than 5 seconds for RACK_LOCAL
[ https://issues.apache.org/jira/browse/CARBONDATA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-2767. -- Resolution: Fixed Fix Version/s: 1.4.1 1.5.0 > Query take more than 5 seconds for RACK_LOCAL > - > > Key: CARBONDATA-2767 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2767 > Project: CarbonData > Issue Type: Bug >Reporter: QiangCai >Priority: Minor > Fix For: 1.5.0, 1.4.1 > > Time Spent: 2.5h > Remaining Estimate: 0h > > If the Spark cluster and the Hadoop cluster are two different machine > cluster, the Spark tasks will run in RACK_LOCAL mode. So no need to provide > the preferred locations to the task. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2554: [CARBONDATA-2783][BloomDataMap][Doc] Update document...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2554 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7501/ ---
[GitHub] carbondata issue #2557: [CARBONDATA-2782]delete dead code in class 'CarbonCl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2557 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7500/ ---
[GitHub] carbondata issue #2537: [CARBONDATA-2768][CarbonStore] Fix error in tests fo...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2537 please raise to external-format branch ---
[GitHub] carbondata issue #2544: [CARBONDATA-2776][CarbonStore] Support ingesting dat...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2544 please rebase ---
[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2484 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5994/ ---
[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2517 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7499/ ---
[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...
Github user praveenmeenakshi56 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2520#discussion_r205094917 --- Diff: docs/data-management-on-carbondata.md --- @@ -122,6 +122,45 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. --- End diff -- Data Loading Performance is affected only by 8%. Test with 3.5 billion records (103 columns) ---
[GitHub] carbondata issue #2528: [CARBONDATA-2767][CarbonStore] Fix task locality iss...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2528 LGTM ---
[GitHub] carbondata issue #2557: [CARBONDATA-2782]delete dead code in class 'CarbonCl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2557 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6255/ ---
[GitHub] carbondata pull request #2558: [CARBONDATA-2648] Documentation for support f...
GitHub user sgururajshetty opened a pull request: https://github.com/apache/carbondata/pull/2558 [CARBONDATA-2648] Documentation for support for COLUMN_META_CACHE in create table and a… You can merge this pull request into a Git repository by running: $ git pull https://github.com/sgururajshetty/carbondata master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2558.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2558 commit 529f80dda6db3ce34e0baf766b03a9a13190b286 Author: sgururajshetty Date: 2018-07-25T12:44:07Z Documentation for support for COLUMN_META_CACHE in create table and alter table properties ---
[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2549 @dhatchayani Please rebase ---
[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2535 ---
[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2524 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6254/ ---
[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2484 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6248/ ---
[GitHub] carbondata issue #2535: [CARBONDATA-2606]Fix Complex array Pushdown
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2535 LGTM ---
[GitHub] carbondata issue #2553: [HOTFIX] Fixed random test failure
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2553 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6251/ ---
[GitHub] carbondata issue #2555: [CARBONDATA-2753][Compatibility] Row count of page i...
Github user dhatchayani commented on the issue: https://github.com/apache/carbondata/pull/2555 retest this please ---
[GitHub] carbondata issue #2533: [CARBONDATA-2765]handle flat folder support for impl...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2533 LGTM ---