[jira] [Commented] (CARBONDATA-2340) Load data exceeding 32000 bytes
[ https://issues.apache.org/jira/browse/CARBONDATA-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436926#comment-16436926 ] xuchuanyin commented on CARBONDATA-2340: [~niaoshu] This is a known issue/restriction in carbondata. The reason is that carbondata stores the length of a string using a `short`. > Load data exceeding 32000 bytes > - > > Key: CARBONDATA-2340 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2340 > Project: CarbonData > Issue Type: Bug > Components: data-load > Affects Versions: 1.3.0 > Reporter: niaoshu > Priority: Blocker > Original Estimate: 12h > Remaining Estimate: 12h > > INFO storage.BlockManagerMasterEndpoint: Registering block manager > spark1:12603 with 5.2 GB RAM, BlockManagerId(1, spark1, 12603, None) > 18/04/11 14:24:23 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in > memory on spark1:12603 (size: 34.9 KB, free: 5.2 GB) > 18/04/11 14:24:34 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 > (TID 0, spark1, executor 1): > org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: > Dataload failed, String size cannot exceed 32000 bytes > at > org.apache.carbondata.processing.loading.converter.impl.NonDictionaryFieldConverterImpl.convert(NonDictionaryFieldConverterImpl.java:75) > at > org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:162) > at > org.apache.carbondata.processing.loading.steps.DataConverterProcessorStepImpl.processRowBatch(DataConverterProcessorStepImpl.java:104) > at > org.apache.carbondata.processing.loading.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:91) > at > org.apache.carbondata.processing.loading.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:77) > at > org.apache.carbondata.processing.loading.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.run(ParallelReadMergeSorterImpl.java:214) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
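As the comment notes, the 32000-byte cap follows from storing each string's length in a 2-byte signed `short` (maximum 32767). A minimal illustrative sketch of such a check (hypothetical class and method names, not the actual NonDictionaryFieldConverterImpl code):

```java
// Sketch only: a 2-byte signed short tops out at 32767, so a store format that
// prefixes each string with a short length must reject longer values. CarbonData
// enforces a slightly lower cap of 32000 bytes, matching the error message above.
public class ShortLengthLimit {
    static final int MAX_STRING_BYTES = 32000;

    // Returns the short length prefix that would be written before the data,
    // rejecting values that would overflow the 2-byte length field.
    static short lengthPrefix(byte[] value) {
        if (value.length > MAX_STRING_BYTES) {
            throw new IllegalArgumentException(
                "Dataload failed, String size cannot exceed " + MAX_STRING_BYTES + " bytes");
        }
        return (short) value.length;  // safe: 32000 < Short.MAX_VALUE (32767)
    }
}
```

This also explains why the suggested fix in the follow-up comment is a new datatype (e.g. `TEXT`) rather than a configuration change: the limit is baked into the storage layout.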
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[jira] [Comment Edited] (CARBONDATA-2340) Load data exceeding 32000 bytes
[ https://issues.apache.org/jira/browse/CARBONDATA-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436926#comment-16436926 ] xuchuanyin edited comment on CARBONDATA-2340 at 4/13/18 7:02 AM: - [~niaoshu] This is a known issue/restriction in carbondata. The reason is that carbondata stores the length of a string using a `short`. If we want to solve this problem, maybe we can add a new datatype called `TEXT` in carbondata to support this scenario. was (Author: xuchuanyin): [~niaoshu] This is a known issue/restriction in carbondata. The reason is that carbondata stores the length of a string using a `short`. > Load data exceeding 32000 bytes > - > > Key: CARBONDATA-2340 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2340 > Project: CarbonData > Issue Type: Bug > Components: data-load > Affects Versions: 1.3.0 > Reporter: niaoshu > Priority: Blocker > Original Estimate: 12h > Remaining Estimate: 12h
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 @jackylk & @gvramana : please review this PR. ---
[jira] [Created] (CARBONDATA-2343) Improper filter resolver causes more filter scan on data that could be skipped
xuchuanyin created CARBONDATA-2343: -- Summary: Improper filter resolver causes more filter scan on data that could be skipped Key: CARBONDATA-2343 URL: https://issues.apache.org/jira/browse/CARBONDATA-2343 Project: CarbonData Issue Type: Bug Components: data-query Reporter: xuchuanyin Assignee: xuchuanyin In DataMapChooser, Carbondata tries to choose and combine datamaps for expressions. In some scenarios, it generates a `TrueConditionalResolverImpl` to wrap the sub-expression, which causes a data scan on blocklets that could be skipped (for `TrueConditionalResolverImpl`, the `TrueFilterExecutor` always scans the data, even if it simply wraps a range expression).
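The pruning loss described in CARBONDATA-2343 can be illustrated with a toy sketch (simplified, hypothetical interfaces, not the real CarbonData DataMap classes): a "true" executor reports every blocklet as requiring a scan, while a range-aware executor can skip any blocklet whose min/max statistics miss the predicate range.

```java
// Illustrative sketch only: shows why wrapping a range predicate in an
// always-true resolver throws away blocklet-level min/max pruning.
public class FilterPruningSketch {
    interface FilterExecutor {
        // Decides, from a blocklet's min/max statistics, whether it must be scanned.
        boolean isScanRequired(int blockletMin, int blockletMax);
    }

    // Analogous to TrueFilterExecutor: admits every blocklet, so nothing is pruned.
    static final FilterExecutor TRUE_FILTER = (min, max) -> true;

    // A range-aware executor: skips blocklets whose [min, max] misses [lo, hi].
    static FilterExecutor rangeFilter(int lo, int hi) {
        return (min, max) -> !(max < lo || min > hi);
    }
}
```

For a blocklet with values in [0, 10] and a predicate on [100, 200], the range executor skips the blocklet entirely, while the always-true executor still forces a scan.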
[jira] [Assigned] (CARBONDATA-2327) Invalid schema name _system shows when executing show schemas in presto
[ https://issues.apache.org/jira/browse/CARBONDATA-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anubhav tarar reassigned CARBONDATA-2327: - Assignee: anubhav tarar > Invalid schema name _system shows when executing show schemas in presto > --- > > Key: CARBONDATA-2327 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2327 > Project: CarbonData > Issue Type: Bug > Components: presto-integration > Affects Versions: 1.4.0 > Reporter: anubhav tarar > Assignee: anubhav tarar > Priority: Trivial > Time Spent: 1h 50m > Remaining Estimate: 0h > > presto> show schemas; > Schema > > _system > default > information_schema > (3 rows) > Query 20180410_101915_00010_sidw4, FINISHED, 1 node > Splits: 18 total, 18 done (100.00%) > 0:00 [3 rows, 47B] [25 rows/s, 395B/s]
[jira] [Resolved] (CARBONDATA-2023) Optimization in data loading for skewed data
[ https://issues.apache.org/jira/browse/CARBONDATA-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2023. Resolution: Fixed > Optimization in data loading for skewed data > > > Key: CARBONDATA-2023 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2023 > Project: CarbonData > Issue Type: Improvement > Components: data-load > Affects Versions: 1.3.0 > Reporter: xuchuanyin > Assignee: xuchuanyin > Priority: Major > Time Spent: 16h 40m > Remaining Estimate: 0h > > In one of my cases, carbondata has to load skewed data files. The sizes of the > data files range from 1KB to about 5GB. > In the current implementation, carbondata distributes the file blocks (splits) > among the nodes to maximize data locality and distribute data evenly; we call > this `block-node-assignment` for short. > However, the current implementation has some problems. > The assignment is based on block count. The goal is to make sure that all the > nodes handle the same number of blocks. In the skewed-data scenario described > above, a block of a small file and a block of a big file differ greatly in > size (1KB vs. 64MB). As a result, the difference in total data size assigned > to each data node is very large. > In order to solve this problem, the size of each block should be considered > during block-node-assignment. One node can handle more blocks than another as > long as the total sizes of the blocks are almost the same.
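The size-aware assignment described in CARBONDATA-2023 could look roughly like the following greedy sketch (illustrative only; the class name and the greedy heuristic are assumptions, not the actual CarbonData block-node-assignment code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: assign each block to the node with the smallest total assigned size so
// far, so per-node totals stay balanced even when block sizes are heavily skewed.
public class SizeBasedAssignment {
    static Map<String, List<Long>> assign(List<Long> blockSizes, List<String> nodes) {
        Map<String, Long> totals = new HashMap<>();
        Map<String, List<Long>> result = new HashMap<>();
        for (String n : nodes) {
            totals.put(n, 0L);
            result.put(n, new ArrayList<>());
        }
        // Placing the largest blocks first gives the greedy heuristic a tighter balance.
        List<Long> sorted = new ArrayList<>(blockSizes);
        sorted.sort(Collections.reverseOrder());
        for (long size : sorted) {
            // Pick the node with the smallest running total and give it this block.
            String target = Collections.min(totals.entrySet(),
                Map.Entry.comparingByValue()).getKey();
            totals.merge(target, size, Long::sum);
            result.get(target).add(size);
        }
        return result;
    }
}
```

With blocks of 100, 60, and 40 units over two nodes, each node ends up with exactly 100 units, whereas count-based assignment could give one node 160 units and the other 40. (Note this sketch ignores data locality, which the real assignment also has to weigh.)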
[jira] [Resolved] (CARBONDATA-2288) Compaction should be able to run concurrently with data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2288. Resolution: Fixed > Compaction should be able to run concurrently with data loading > --- > > Key: CARBONDATA-2288 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2288 > Project: CarbonData > Issue Type: Improvement > Components: data-load > Reporter: xuchuanyin > Assignee: xuchuanyin > Priority: Major > > Currently in carbondata, compaction can be triggered in two ways: > 1. Manually trigger compaction using an ALTER statement. > 2. Automatically trigger compaction when doing data loading. > In both ways, compaction and data loading cannot run concurrently. In way 1, > compaction will fail if a data load is in progress. In way 2, the compaction > only starts after the main data load has finished, and the user has to wait > until the compaction is finished. > In my opinion, data loading works on a new segment, whereas compaction works > on the existing segments, so we can let them run concurrently. > For the 1st way, compaction will succeed even if data loading is in progress; > for the 2nd way, compaction will run concurrently with the data loading, or > after it (we can configure this), and the user will not have to wait for the > compaction to finish.
[GitHub] carbondata issue #2157: [CARBONDATA-2334] Added Property enabling user to bl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2157 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4986/ ---
[GitHub] carbondata pull request #2167: [CARBONDATA-2337][BACKPORT-1.3] Fix duplicate...
Github user zzcclp closed the pull request at: https://github.com/apache/carbondata/pull/2167 ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2097 retest this please ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2136 retest sdv please ---
[GitHub] carbondata pull request #2168: [CARBONDATA-2343][DataMap]Improper filter res...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/2168 [CARBONDATA-2343][DataMap] Improper filter resolver causes more filter scan on data that could be skipped Currently DataMapChooser chooses and combines datamaps for expressions and wraps the expression in a `TrueConditionalResolverImpl`. However, the executor `TrueFilterExecutor` always scans blocklets that could be skipped. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? `NO, only internal interface has been changed` - [x] Any backward compatibility impacted? `NO` - [x] Document update required? `NO` - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xuchuanyin/carbondata 0413_bug_dm Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2168.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2168 commit 2e2c0683f867a7ecda7a4e7f80b2c7030220cd4a Author: xuchuanyin Date: 2018-04-13T07:14:25Z Fix bugs in datamap chooser Currently DataMapChooser chooses and combines datamaps for expressions and wraps the expression in a `TrueConditionalResolverImpl`. However, the executor `TrueFilterExecutor` always scans blocklets that could be skipped. ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4989/ ---
[jira] [Commented] (CARBONDATA-2318) Remove invalid table name(.ds_store) of presto integration
[ https://issues.apache.org/jira/browse/CARBONDATA-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436954#comment-16436954 ] Liang Chen commented on CARBONDATA-2318: I tested it in spark-shell: Step1: ./bin/spark-shell --master local --jars ${carbon_jar} --driver-memory 4G Step2: import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("/Users/apple/DEMO/presto_test/data","/Users/apple/DEMO/presto_test/metadata") Step3: reuse the old carbondata 1): copy all data "default/carbon_table/.." to the new location: /Users/apple/DEMO/presto_test/data 2): run carbon.sql("refresh table carbon_table") > Remove invalid table name (.ds_store) of presto integration > --- > > Key: CARBONDATA-2318 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2318 > Project: CarbonData > Issue Type: Improvement > Components: presto-integration > Reporter: Liang Chen > Priority: Minor > > For presto integration, we get an invalid table name via "show tables from default", > as below. > presto:default> show tables from default; > Table > > .ds_store > carbon_table > carbontable > partition_bigtable > partition_table > (5 rows)
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3772/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2141 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4435/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4990/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3771/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2148 retest this please ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2113 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2168 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3774/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3776/ ---
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2168 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4992/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4993/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3775/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4991/ ---
[jira] [Created] (CARBONDATA-2344) Fix bugs in BlockletDataMap
xuchuanyin created CARBONDATA-2344: -- Summary: Fix bugs in BlockletDataMap Key: CARBONDATA-2344 URL: https://issues.apache.org/jira/browse/CARBONDATA-2344 Project: CarbonData Issue Type: Bug Components: data-query Reporter: xuchuanyin Assignee: xuchuanyin DMStore stores DataMapRows for each blocklet. Currently carbondata accesses the DMStore by blockletId, which is not unique and will cause problems.
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: (was: carbon unamanged table desgin doc_V1.0.pdf) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: carbon unmanaged table desgin doc_V1.0.pdf > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2169: [CARBONDATA-2344][DataMap] Fix bugs in mappin...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/2169 [CARBONDATA-2344][DataMap] Fix bugs in mapping blocklet to UnsafeDMStore rows In BlockletDataMap, carbondata stores a DMRow in an array for each blocklet. But currently carbondata accesses the DMRow only by blockletId (0, 1, etc.), which causes problems since different blocks can have the same blockletId. This PR adds a map from blockId#blockletId to the array index, so carbondata can access the DMRow by blockId and blockletId. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? `NO, only internal interfaces have been changed` - [x] Any backward compatibility impacted? `NO` - [x] Document update required? `NO` - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? `NO` - How it is tested? Please attach test report. `Tested in local` - Is it a performance related change? Please attach the performance test report. `No` - Any additional information to help reviewers in testing this change. `NO` - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `Not related` You can merge this pull request into a Git repository by running: $ git pull https://github.com/xuchuanyin/carbondata 0413_bug_blocklet_dm_unsafe_row Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2169.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2169 commit dd010297c7f7428dc8f42ec1a292b8cdddcc09aa Author: xuchuanyin Date: 2018-04-13T08:18:23Z Fix bugs in mapping blocklet to UnsafeDMStore In BlockletDataMap, carbondata stores a DMRow in an array for each blocklet. But currently carbondata accesses the DMRow only by blockletId (0, 1, etc.), which causes problems since different blocks can have the same blockletId. This PR adds a map from blockId#blockletId to the array index, so carbondata can access the DMRow by blockId and blockletId. ---
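The fix idea in PR #2169 — keying the lookup by block plus blocklet rather than blocklet alone — can be sketched as follows (hypothetical simplified class, not the real BlockletDataMap/UnsafeDMStore code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: blockletId alone is not unique across blocks (every block numbers its
// blocklets 0, 1, ...), so the lookup key must combine blockId with blockletId.
public class BlockletIndexSketch {
    private final Map<String, Integer> rowIndex = new HashMap<>();

    // Record which array slot holds the DMRow for (blockId, blockletId).
    void put(String blockId, int blockletId, int arrayIndex) {
        rowIndex.put(blockId + "#" + blockletId, arrayIndex);
    }

    // Resolve the array slot for (blockId, blockletId), or null if unknown.
    Integer lookup(String blockId, int blockletId) {
        return rowIndex.get(blockId + "#" + blockletId);
    }
}
```

Keying by blockletId alone would make two blocks' blocklet 0 collide on the same entry; the composite key keeps them distinct.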
[jira] [Commented] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437058#comment-16437058 ] Ajantha Bhat commented on CARBONDATA-2313: -- Attached the design document. > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Description: h1. Support unmanaged carbon table > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > > h1. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Description: h5. Support unmanaged carbon table (was: h1. Support unmanaged carbon table) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2113 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4436/ ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3777/ ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4994/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3778/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2169 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3780/ ---
[GitHub] carbondata pull request #2149: [CARBONDATA-2325]Page level uncompress and Im...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2149#discussion_r181342780

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/unsafe/UnsafeVariableLengthDimensionDataChunkStore.java ---
@@ -78,70 +88,96 @@ public UnsafeVariableLengthDimensionDataChunkStore(long totalSize, boolean isInv
     // start position will be used to store the current data position
     int startOffset = 0;
-    // position from where offsets will start
-    long pointerOffsets = this.dataPointersOffsets;
     // as first position will be start from 2 byte as data is stored first in the memory block
     // we need to skip first two bytes this is because first two bytes will be length of the data
     // which we have to skip
-    CarbonUnsafe.getUnsafe().putInt(dataPageMemoryBlock.getBaseObject(),
-        dataPageMemoryBlock.getBaseOffset() + pointerOffsets,
-        CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-    // incrementing the pointers as first value is already filled and as we are storing as int
-    // we need to increment the 4 bytes to set the position of the next value to set
-    pointerOffsets += CarbonCommonConstants.INT_SIZE_IN_BYTE;
+    int[] dataOffsets = new int[numberOfRows];
+    dataOffsets[0] = CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
     // creating a byte buffer which will wrap the length of the row
-    // using byte buffer as unsafe will return bytes in little-endian encoding
-    ByteBuffer buffer = ByteBuffer.allocate(CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-    // store length of data
-    byte[] length = new byte[CarbonCommonConstants.SHORT_SIZE_IN_BYTE];
-    // as first offset is already stored, we need to start from the 2nd row in data array
+    ByteBuffer buffer = ByteBuffer.wrap(data);
     for (int i = 1; i < numberOfRows; i++) {
-      // first copy the length of previous row
-      CarbonUnsafe.getUnsafe().copyMemory(dataPageMemoryBlock.getBaseObject(),
-          dataPageMemoryBlock.getBaseOffset() + startOffset, length, CarbonUnsafe.BYTE_ARRAY_OFFSET,
-          CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-      buffer.put(length);
-      buffer.flip();
+      buffer.position(startOffset);
      // so current row position will be
      // previous row length + 2 bytes used for storing previous row data
-      startOffset += CarbonCommonConstants.SHORT_SIZE_IN_BYTE + buffer.getShort();
+      startOffset += buffer.getShort() + CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
      // as same byte buffer is used to avoid creating many byte buffer for each row
      // we need to clear the byte buffer
-      buffer.clear();
-      // now put the offset of current row, here we need to add 2 more bytes as current will
-      // also have length part so we have to skip length
-      CarbonUnsafe.getUnsafe().putInt(dataPageMemoryBlock.getBaseObject(),
-          dataPageMemoryBlock.getBaseOffset() + pointerOffsets,
-          startOffset + CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-      // incrementing the pointers as first value is already filled and as we are storing as int
-      // we need to increment the 4 bytes to set the position of the next value to set
-      pointerOffsets += CarbonCommonConstants.INT_SIZE_IN_BYTE;
+      dataOffsets[i] = startOffset + CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
     }
-
+    CarbonUnsafe.getUnsafe().copyMemory(dataOffsets, CarbonUnsafe.INT_ARRAY_OFFSET,
+        dataPageMemoryBlock.getBaseObject(),
+        dataPageMemoryBlock.getBaseOffset() + this.dataPointersOffsets,
+        dataOffsets.length * CarbonCommonConstants.INT_SIZE_IN_BYTE);
   }

   /**
    * Below method will be used to get the row based on row id passed
-   *
+   * Getting the row from unsafe works in below logic
+   * 1. if inverted index is present then get the row id based on reverse inverted index
+   * 2. get the current row id data offset
+   * 3. if it's not a last row - get the next row offset
+   *    Subtract the current row offset + 2 bytes(to skip the data length) with next row offset
+   * 4. if it's last row
+   *    subtract the current row offset + 2 bytes(to skip the data length) with complete data length
    * @param rowId
    * @return row
    */
   @Override public byte[] getRow(int rowId) {
+    // get the actual row id
+    rowId = getRowId(rowId);
+    // get offset of data in unsafe
+    int currentDataOffset = getOffSet(rowId);
+    // get the data length
+    short length = getLength(rowId, currentDataOffset);
+    // create data array
+    byte[] data = new byte[length];
+    // fill the row data
+    fillRowInternal(length,
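Assuming the page layout implied by the new code in the PR #2149 diff — each row stored as a 2-byte length prefix followed by its bytes — the offset computation can be reproduced in a standalone sketch (hypothetical class name; `Short.BYTES` stands in for CarbonCommonConstants.SHORT_SIZE_IN_BYTE):

```java
import java.nio.ByteBuffer;

// Sketch of the offset computation discussed in the review above: the page is
// laid out as [short length][data][short length][data]..., and offsets[i] points
// at row i's data, just past its 2-byte length prefix.
public class VarLengthOffsets {
    static int[] computeOffsets(byte[] page, int numberOfRows) {
        int[] offsets = new int[numberOfRows];
        offsets[0] = Short.BYTES;          // first row's data starts after its prefix
        // ByteBuffer reads big-endian by default, matching the {0, 2} style prefixes here
        ByteBuffer buffer = ByteBuffer.wrap(page);
        int startOffset = 0;               // position of the current row's length prefix
        for (int i = 1; i < numberOfRows; i++) {
            buffer.position(startOffset);
            // advance past the previous row: its data length plus the 2-byte prefix
            startOffset += buffer.getShort() + Short.BYTES;
            offsets[i] = startOffset + Short.BYTES;
        }
        return offsets;
    }
}
```

For a page holding "ab" then "cde" (bytes 00 02 61 62 00 03 63 64 65), the offsets come out as 2 and 6, i.e. the positions of 'a' and 'c'.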
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3781/ ---
[jira] [Created] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
ocean created CARBONDATA-2345: - Summary: "Task failed while writing rows" error occurs when streaming ingest into carbondata table Key: CARBONDATA-2345 URL: https://issues.apache.org/jira/browse/CARBONDATA-2345 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 1.3.1 Reporter: ocean carbondata version: 1.3.1; spark: 2.2.1. When using Spark Structured Streaming to ingest data into a carbondata table, the following error occurs: warning: there was one deprecation warning; re-run with -deprecation for details qry: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@7ddf193a [Stage 1:> (0 + 2) / 5]18/04/13 18:03:56 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while writing rows at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126) at 
org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164) at org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186) at org.apache.carbondata.streaming.segment.StreamSegment.appendBatchData(StreamSegment.java:244) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:336) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:338) ... 8 more [Stage 1:===> (1 + 2) / 5]18/04/13 18:03:57 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job 18/04/13 18:03:57 ERROR CarbonAppendableStreamSink$: stream execution thread for [id = 3abdadea-65f6-4d94-8686-306fccae4559, runId = 689adf7e-a617-41d9-96bc-de075ce4dd73] Aborting job job_20180413180354_. 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 11, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while writing rows at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126) at org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164) at org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186) at org.apache.carbondata.st
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2169 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4996/ ---
[jira] [Commented] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
[ https://issues.apache.org/jira/browse/CARBONDATA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437140#comment-16437140 ] ocean commented on CARBONDATA-2345:
---
The stream source is a parquet file. The issue can be reproduced with this code:

val tableName = "profile_carbondata_stream2"
val pqtpath = "/test/stream"
val warehouse = new File("./warehouse").getCanonicalPath
val metastore = new File("./metastore").getCanonicalPath
val spark = SparkSession
  .builder()
  .appName("StreamExample")
  .config("spark.sql.warehouse.dir", warehouse)
  .getOrCreateCarbonSession(warehouse, metastore)
val carbonTable = CarbonEnv.getCarbonTable(Some("default"), tableName)(spark)
val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier)
var qry: StreamingQuery = null
val userSchema = spark.read.parquet(pqtpath).schema
val readSocketDF = spark.readStream.schema(userSchema).parquet(pqtpath)
// Write data from socket stream to carbondata file
qry = readSocketDF.writeStream
  .format("carbondata")
  .trigger(ProcessingTime("20 seconds"))
  .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
  .option("dbName", "default")
  .option("tableName", tableName)
  .outputMode("append")
  .start()
qry.awaitTermination()

> "Task failed while writing rows" error occurs when streaming ingest into carbondata table
> --
> Key: CARBONDATA-2345
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2345
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.1
> Reporter: ocean
> Priority: Major
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4997/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4995/ ---
[jira] [Created] (CARBONDATA-2346) Dropping partition failing with null error for Partition table with Pre-Aggregate tables
Praveen M P created CARBONDATA-2346: --- Summary: Dropping partition failing with null error for Partition table with Pre-Aggregate tables Key: CARBONDATA-2346 URL: https://issues.apache.org/jira/browse/CARBONDATA-2346 Project: CarbonData Issue Type: Bug Reporter: Praveen M P Assignee: Praveen M P -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2161: [CARBONDATA-2218] AlluxioCarbonFile while trying to ...
Github user chandrasaripaka commented on the issue: https://github.com/apache/carbondata/pull/2161 @CarbonDataQA May I know if this has to be fixed from my side, as part of the pull request? Kindly advise. @xubo245 Also, I don't have access to resolve the conflicts and recommit. Please advise. ---
[GitHub] carbondata pull request #2170: [CARBONDATA-2346] Added fix for NULL error wh...
GitHub user praveenmeenakshi56 opened a pull request: https://github.com/apache/carbondata/pull/2170 [CARBONDATA-2346] Added fix for NULL error while dropping partition with multiple Pre-Aggregate tables Fixed null value issue for childcolumn - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. NA - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/praveenmeenakshi56/carbondata defect_part Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2170.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2170 commit dd3d3d1181847a1930048144740bfa053c878dd8 Author: praveenmeenakshi56 Date: 2018-04-13T10:31:35Z Added fix for error while dropping partition with multiple Pre-Aggregate tables ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3782/ ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2136 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4438/ ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4998/ ---
[GitHub] carbondata pull request #2171: [wip]test lucene sdv and UT in CI
GitHub user Indhumathi27 opened a pull request: https://github.com/apache/carbondata/pull/2171 [wip]test lucene sdv and UT in CI Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Indhumathi27/carbondata test_ci_luc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2171.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2171 commit 46b29dd2103156a5096a04cc72960dd5170e2d9a Author: Indhumathi27 Date: 2018-04-13T06:29:22Z Added UT & SDV Testcases for LuceneDataMap commit 1be3dfd26a96cfa123de403512d4d04121340aed Author: akashrn5 Date: 2018-03-29T14:29:36Z load issue in lucene datamap, make multiple directory based on taskId make the datamap distributable object based on lucene index path written during load Added Lucene Listener and Fixed Show Datamap ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5000/ ---
[jira] [Created] (CARBONDATA-2347) Fix Functional issues in LuceneDatamap in load and query and make stable
Akash R Nilugal created CARBONDATA-2347: --- Summary: Fix Functional issues in LuceneDatamap in load and query and make stable Key: CARBONDATA-2347 URL: https://issues.apache.org/jira/browse/CARBONDATA-2347 Project: CarbonData Issue Type: Bug Components: data-load, data-query Reporter: Akash R Nilugal Assignee: Akash R Nilugal -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3784/ ---
[jira] [Updated] (CARBONDATA-2347) Fix Functional issues in LuceneDatamap in load and query and make stable
[ https://issues.apache.org/jira/browse/CARBONDATA-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2347:
Description:
1) The index write location for Lucene is the same for all tasks, and IndexWriter takes a lock file called write.lock in the write location while writing the index files. During a Carbon load, the writer tasks are launched in parallel, so that many writers are opened. Since the write.lock file is acquired by one writer, all other tasks fail and the data load fails.
2) On the query side, the Lucene index was read from a single path, but after the load fix there will be multiple index directories after load.
3) Functional issues in drop table, drop datamap, show datamap.
> Fix Functional issues in LuceneDatamap in load and query and make stable
> Key: CARBONDATA-2347
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2347
> Project: CarbonData
> Issue Type: Bug
> Components: data-load, data-query
> Reporter: Akash R Nilugal
> Assignee: Akash R Nilugal
> Priority: Major
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
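The write.lock contention described in point 1 can be reproduced with plain JDK file locks, independent of Lucene or CarbonData. This is a minimal sketch, not the project's actual code: `tryAcquire` stands in for what Lucene's IndexWriter does when it grabs write.lock, and the per-task sub-directories stand in for the taskId-based directories that the fix introduces.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteLockDemo {

    // Stand-in for Lucene's IndexWriter lock: try to take an exclusive
    // lock on <dir>/write.lock; returns false if another writer holds it.
    public static boolean tryAcquire(Path dir) {
        try {
            Files.createDirectories(dir);
            FileChannel ch = FileChannel.open(dir.resolve("write.lock"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            try {
                FileLock lock = ch.tryLock();
                if (lock == null) { ch.close(); return false; }
                return true; // channel kept open on purpose: it holds the lock
            } catch (OverlappingFileLockException e) {
                ch.close();
                return false; // lock already held within this JVM
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static Path tempDir() {
        try { return Files.createTempDirectory("lucene-index"); }
        catch (IOException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        // Two "loader tasks" writing to the SAME index directory: only the
        // first wins the lock, so the second task fails -- the bug above.
        Path shared = tempDir();
        System.out.println("task1=" + tryAcquire(shared));  // true
        System.out.println("task2=" + tryAcquire(shared));  // false

        // The fix sketched in the issue: one index directory per task.
        System.out.println("perTask1=" + tryAcquire(shared.resolve("task_0")));  // true
        System.out.println("perTask2=" + tryAcquire(shared.resolve("task_1")));  // true
    }
}
```

Giving every writer its own directory sidesteps the lock entirely, which is why the multi-directory layout in point 2 then has to be handled on the query side.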
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3783/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4999/ ---
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2168 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4437/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2141 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4439/ ---
[GitHub] carbondata pull request #2172: [CARBONDATA-2333] Block insert overwrite if a...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2172 [CARBONDATA-2333] Block insert overwrite if all partition columns are not present in any one of the datamaps Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata preagg_partition_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2172.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2172 commit 5c53f555d3a46b1a0961b7eb82e7ba5df628e994 Author: kunal642 Date: 2018-04-11T11:22:08Z block insert overwrite if all partition columns are not present in any one of the datamaps ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5001/ ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3785/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3787/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5003/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2169 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4440/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2169 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5004/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3788/ ---
[GitHub] carbondata issue #2166: [CARBONDATA-2341] Added Clean up of files for Pre-Ag...
Github user praveenmeenakshi56 commented on the issue: https://github.com/apache/carbondata/pull/2166 retest SDV please ---
[jira] [Commented] (CARBONDATA-2318) Remove invalid table name(.ds_store) of presto integration
[ https://issues.apache.org/jira/browse/CARBONDATA-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437283#comment-16437283 ] anubhav tarar commented on CARBONDATA-2318:
---
Hi, I tried again using the same steps you provided but was not able to replicate the issue.

Step 1: create a CarbonSession using spark-shell

val carbon = SparkSession.builder().config(sc.getConf)
  .getOrCreateCarbonSession("/home/anubhav/Documents/prestostore/data", "/home/anubhav/Documents/prestostore/metadata")

Step 2: copy the old carbondata store from /home/anubhav/Documents/carbondata/carbondata/examples/spark2/target/store/default to the new store location /home/anubhav/Documents/prestostore/data

Step 3: refresh the table

scala> carbon.sql("refresh table carbonsession_table").show
18/04/13 13:43:48 AUDIT CarbonCreateTableCommand: [anubhav-Vostro-3559][anubhav][Thread-1]Creating Table with Database name [default] and Table name [carbonsession_table]
18/04/13 13:43:49 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`carbonsession_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
18/04/13 13:43:49 AUDIT CarbonCreateTableCommand: [anubhav-Vostro-3559][anubhav][Thread-1]Table created with Database name [default] and Table name [carbonsession_table]
18/04/13 13:43:49 AUDIT RefreshCarbonTableCommand: [anubhav-Vostro-3559][anubhav][Thread-1]Table registration with Database name [default] and Table name [carbonsession_table] is successful.

Step 4: query the new store from Presto

./presto-cli-0.187-executable.jar --server localhost:9000 --catalog carbondata
presto> show tables from default;
Table
-
carbonsession_table
(1 row)
Query 20180413_080021_0_vev2q, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:02 [1 rows, 36B] [0 rows/s, 22B/s]

> Remove invalid table name(.ds_store) of presto integration
> ---
> Key: CARBONDATA-2318
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2318
> Project: CarbonData
> Issue Type: Improvement
> Components: presto-integration
> Reporter: Liang Chen
> Priority: Minor
>
> For presto integration, you will get the invalid table name via "show tables from default", as below:
> presto:default> show tables from default;
> Table
>
> .ds_store
> carbon_table
> carbontable
> partition_bigtable
> partition_table
> (5 rows)
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2169 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4441/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2148 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4442/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5006/ ---
[GitHub] carbondata issue #2172: [CARBONDATA-2333] Block insert overwrite if all part...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2172 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3789/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3790/ ---
[GitHub] carbondata issue #2172: [CARBONDATA-2333] Block insert overwrite if all part...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2172 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5005/ ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181394048 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1642,6 +1642,16 @@ public static final String CARBON_SEARCH_MODE_THREAD_DEFAULT = "3"; + /** + * compression mode used by lucene for index writing + */ + public static final String CARBON_LUCENE_COMPRESSION_MODE = "carbon.lucene.compression.mode"; --- End diff -- what are the options available for this property? ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181396888 --- Diff: datamap/lucene/pom.xml --- @@ -141,6 +141,34 @@ + --- End diff -- I realize that in this pom, it should not depend on carbon-spark2, please modify the dependency in this pom ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181397988 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala --- @@ -173,6 +174,10 @@ object CarbonEnv { .addListener(classOf[AlterTableDropPartitionPostStatusEvent], AlterTableDropPartitionPostStatusListener) .addListener(classOf[AlterTableDropPartitionMetaEvent], AlterTableDropPartitionMetaListener) + .addListener(classOf[AlterTableRenamePreEvent], LuceneRenameTablePreListener) --- End diff -- Is this required? Ideally, lucene datamap is a separate module which should not have intrusive modification in other modules ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181399425 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala --- @@ -69,11 +69,33 @@ case class CarbonCreateDataMapCommand( } dataMapSchema = new DataMapSchema(dataMapName, dmClassName) -if (mainTable != null && -mainTable.isStreamingTable && - !(dataMapSchema.getProviderName.equalsIgnoreCase(DataMapClassProvider.PREAGGREGATE.toString) - || dataMapSchema.getProviderName -.equalsIgnoreCase(DataMapClassProvider.TIMESERIES.toString))) { +if (dataMapSchema.getProviderName.equalsIgnoreCase(DataMapClassProvider.LUCENEFG.toString) || --- End diff -- I think we should abstract an interface for this. We cannot add an if-check for every new datamap that is added. ---
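The abstraction being asked for could look like the following sketch. All names here (DataMapProvider, supportsStreamingTable, the two provider classes) are illustrative, not CarbonData's actual API: the point is that each provider declares its own capabilities, so the create-datamap command can validate generically instead of growing a provider-specific if-chain.

```java
// Hypothetical capability interface: each datamap provider answers for itself.
interface DataMapProvider {
    String name();
    boolean supportsStreamingTable();  // replaces the PREAGGREGATE/TIMESERIES/LUCENE if-checks
}

final class PreAggregateProvider implements DataMapProvider {
    public String name() { return "preaggregate"; }
    public boolean supportsStreamingTable() { return true; }
}

final class LuceneProvider implements DataMapProvider {
    public String name() { return "lucene"; }
    public boolean supportsStreamingTable() { return false; }
}

public class DataMapCheck {
    // Generic validation: no provider-specific branching needed here,
    // so adding a new datamap type never touches this code.
    public static void validate(DataMapProvider p, boolean streamingTable) {
        if (streamingTable && !p.supportsStreamingTable()) {
            throw new UnsupportedOperationException(
                "Datamap '" + p.name() + "' is not supported on streaming tables");
        }
    }

    public static void main(String[] args) {
        validate(new PreAggregateProvider(), true);   // accepted
        try {
            validate(new LuceneProvider(), true);     // rejected
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```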
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2136 retest this please ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2097 retest this please ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2170 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4443/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2169 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5007/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2169 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3791/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3793/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2097 retest this please ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5009/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2171 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests// ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2136 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3792/ ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2136 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5008/ ---
[jira] [Commented] (CARBONDATA-2345) "Task failed while writing rows" error occuers when streaming ingest into carbondata table
[ https://issues.apache.org/jira/browse/CARBONDATA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437420#comment-16437420 ] Zhichao Zhang commented on CARBONDATA-2345:

[~oceaneast], you need to add the option below to the 'writeStream' block:

{code:java}
.option(CarbonStreamParser.CARBON_STREAM_PARSER,
  CarbonStreamParser.CARBON_STREAM_PARSER_ROW_PARSER)
{code}

For example:

{code:java}
qry = readSocketDF.writeStream
  .format("carbondata")
  .trigger(ProcessingTime("20 seconds"))
  .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
  .option("dbName", "default")
  .option("tableName", tableName)
  .option(CarbonStreamParser.CARBON_STREAM_PARSER,
    CarbonStreamParser.CARBON_STREAM_PARSER_ROW_PARSER)
  .outputMode("append")
  .start()
{code}

Please try again.

> "Task failed while writing rows" error occurs when streaming ingest into
> carbondata table
> --
>
> Key: CARBONDATA-2345
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2345
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.1
> Reporter: ocean
> Priority: Major
>
> carbondata version: 1.3.1, spark: 2.2.1
> When using Spark structured streaming to ingest data into a carbondata table, the following error occurs:
> warning: there was one deprecation warning; re-run with -deprecation for details
> qry: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@7ddf193a
> [Stage 1:> (0 + 2) / 5]18/04/13 18:03:56 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while writing rows
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:108)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126)
> at org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164)
> at org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186)
> at org.apache.carbondata.streaming.segment.StreamSegment.appendBatchData(StreamSegment.java:244)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:336)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326)
> at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:338)
> ... 8 more
> [Stage 1:===> (1 + 2) / 5]18/04/13 18:03:57 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
> 18/04/13 18:03:57 ERROR CarbonAppendableStreamSink$: stream execution thread for [id = 3abdadea-65f6-4d94-8686-306fccae4559, runId = 689adf7e-a617-41d9-96bc-de075ce4dd73] Aborting job job_20180413180354_.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 11, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while writing rows
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply
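The `Caused by: java.lang.NullPointerException` at `BadRecordsLogger.addBadRecordsToBuilder` in the trace above is consistent with a row that trips the bad-records path while that path is not properly set up for the streaming writer, which is why switching to the row parser avoids it. A minimal Python sketch of that general failure mode (all class and method names here are illustrative analogues, not CarbonData's actual code):

```python
class BadRecordsLogger:
    """Toy stand-in for a bad-records logger that collects rejected rows."""
    def __init__(self):
        self.bad_rows = []

    def add(self, row, reason):
        self.bad_rows.append((row, reason))


class RowConverter:
    """Toy converter: routes rows with null fields to the bad-records path."""
    def __init__(self, bad_records_logger=None):
        # In the broken path, nothing wires a logger in, so this stays
        # None and the bad-records branch fails instead of logging.
        self.logger = bad_records_logger

    def convert(self, row):
        if any(v is None for v in row):
            if self.logger is None:
                # Python analogue of the Java NPE in the stack trace above.
                raise RuntimeError("bad-records logger was never initialized")
            self.logger.add(row, "null field")
            return None  # bad row is dropped, not written
        return row


# A properly wired converter survives a bad row; an unwired one crashes.
logger = BadRecordsLogger()
wired = RowConverter(logger)
assert wired.convert(("bob", None)) is None
assert logger.bad_rows == [(("bob", None), "null field")]
assert wired.convert(("bob", "21")) == ("bob", "21")
```

The sketch only illustrates why a parser mismatch surfaces as an NPE far downstream: the bad data itself is survivable, but the unconfigured logging path is not.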
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5010/ ---
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2168 LGTM ---
[GitHub] carbondata pull request #2168: [CARBONDATA-2343][DataMap]Improper filter res...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2168 ---
[jira] [Resolved] (CARBONDATA-2343) Improper filter resolver causes more filter scan on data that could be skipped
[ https://issues.apache.org/jira/browse/CARBONDATA-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-2343.
--
Resolution: Fixed
Fix Version/s: 1.4.0

> Improper filter resolver causes more filter scan on data that could be skipped
> --
>
> Key: CARBONDATA-2343
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2343
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Priority: Major
> Fix For: 1.4.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> In DataMapChooser, CarbonData tries to choose and combine datamaps for expressions. In some scenarios it generates a `TrueConditionalResolverImpl` to wrap a sub-expression, which causes a data scan on blocklets that could otherwise be skipped (for `TrueConditionalResolverImpl`, the `TrueFilterExecutor` always scans the data, even when it merely wraps a range expression).
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
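The pruning loss described in CARBONDATA-2343 can be illustrated with a toy model of blocklet min/max skipping. This is a hedged sketch in Python, not CarbonData's actual executor code; all names (`Blocklet`, `RangeFilter`, `TrueFilter`) are hypothetical. A range filter can skip any blocklet whose [min, max] column statistics fall entirely outside the predicate, while a "true" wrapper that always reports a possible match forces a scan of every blocklet, even though it merely delegates to the same range filter:

```python
class Blocklet:
    """Toy blocklet: holds rows plus min/max stats for one column."""
    def __init__(self, values):
        self.values = values
        self.min, self.max = min(values), max(values)

class RangeFilter:
    """Skips blocklets whose stats prove no row can match lo <= v <= hi."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def might_match(self, b):
        return b.max >= self.lo and b.min <= self.hi
    def scan(self, b):
        return [v for v in b.values if self.lo <= v <= self.hi]

class TrueFilter:
    """Wrapper that always claims a match, so no blocklet is ever skipped."""
    def __init__(self, inner):
        self.inner = inner
    def might_match(self, b):
        return True                 # the pruning loss the issue describes
    def scan(self, b):
        return self.inner.scan(b)

def query(filt, blocklets):
    """Returns (number of blocklets scanned, matching rows)."""
    scanned, rows = 0, []
    for b in blocklets:
        if filt.might_match(b):
            scanned += 1
            rows += filt.scan(b)
    return scanned, rows

data = [Blocklet([1, 5]), Blocklet([10, 20]), Blocklet([30, 40])]
rf = RangeFilter(12, 25)
assert query(rf, data) == (1, [20])               # two blocklets pruned
assert query(TrueFilter(rf), data) == (3, [20])   # same result, full scan
```

Both filters return identical rows; the wrapper only changes how much data is touched, which is exactly why the resolver choice matters for performance rather than correctness.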