[GitHub] carbondata issue #2433: [CARBONDATA-2676]Support local Dictionary for SDK Wr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2433 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5518/ ---
[GitHub] carbondata issue #2433: [CARBONDATA-2676]Support local Dictionary for SDK Wr...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2433 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5544/ ---
[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2432 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5517/ ---
[GitHub] carbondata issue #2425: [CARBONDATA-2637][BloomDataMap] Fix bugs for deferre...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2425 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5515/ ---
[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2410 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5516/ ---
[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2410 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6690/ ---
[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2432 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6689/ ---
[GitHub] carbondata pull request #2403: [CARBONDATA-2633][BloomDataMap] Fix bugs in b...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2403#discussion_r199325848 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/DirectDictionaryFieldConverterImpl.java --- @@ -65,16 +65,22 @@ public DirectDictionaryFieldConverterImpl(DataField dataField, String nullFormat @Override public void convert(CarbonRow row, BadRecordLogHolder logHolder) { String value = row.getString(index); -if (value == null) { +row.update(convert(value, logHolder), index); + } + + @Override public Object convert(Object value, BadRecordLogHolder logHolder) + throws RuntimeException { +String literalValue = (String) value; +if (literalValue == null) { logHolder.setReason( CarbonDataProcessorUtil.prepareFailureReason(column.getColName(), column.getDataType())); - row.update(1, index); -} else if (value.equals(nullFormat)) { - row.update(1, index); + return 1; --- End diff -- Suggest to create a constant for null value (1) ---
[GitHub] carbondata pull request #2403: [CARBONDATA-2633][BloomDataMap] Fix bugs in b...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2403#discussion_r199325844 --- Diff: datamap/bloom/pom.xml --- @@ -23,6 +23,18 @@ carbondata-core ${project.version} + + org.apache.carbondata + carbondata-processing + ${project.version} + + --- End diff -- You have not added the compile scope for this dependency ---
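For reference, a sketch of the dependency entry with the scope stated explicitly (Maven defaults to `compile` when the scope element is omitted, so this only makes the default visible):

```xml
<!-- Hedged sketch of the dependency from the diff with an explicit scope.
     Maven uses "compile" by default when <scope> is omitted. -->
<dependency>
  <groupId>org.apache.carbondata</groupId>
  <artifactId>carbondata-processing</artifactId>
  <version>${project.version}</version>
  <scope>compile</scope>
</dependency>
```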
[GitHub] carbondata pull request #2403: [CARBONDATA-2633][BloomDataMap] Fix bugs in b...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2403#discussion_r199325781 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomDataMapWriter.java --- @@ -69,6 +86,27 @@ indexBloomFilters = new ArrayList<>(indexColumns.size()); initDataMapFile(); resetBloomFilters(); + +keyGenerator = segmentProperties.getDimensionKeyGenerator(); --- End diff -- Can we optimize this instead of passing the whole `SegmentProperties` into this Writer class? Please check @ravipesala ---
[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2391 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6688/ ---
[jira] [Resolved] (CARBONDATA-2653) Fix bugs in incorrect blocklet number in bloomfilter
[ https://issues.apache.org/jira/browse/CARBONDATA-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-2653. -- Resolution: Fixed Fix Version/s: 1.4.1 1.5.0 > Fix bugs in incorrect blocklet number in bloomfilter > > > Key: CARBONDATA-2653 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2653 > Project: CarbonData > Issue Type: Sub-task >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > Fix For: 1.5.0, 1.4.1 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > An incorrect blocklet number can be found during bloomfilter pruning. > This is because the bloomfilter writer writes an extra blocklet before it finishes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in i...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2408 ---
[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2408 LGTM ---
[jira] [Resolved] (CARBONDATA-2644) Validation not present for carbon.load.sortMemory.spill.percentage parameter
[ https://issues.apache.org/jira/browse/CARBONDATA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-2644. -- Resolution: Fixed Fix Version/s: 1.5.0 > Validation not present for carbon.load.sortMemory.spill.percentage parameter > - > > Key: CARBONDATA-2644 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2644 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.4.0 >Reporter: wangsen >Assignee: wangsen >Priority: Minor > Fix For: 1.5.0, 1.4.1 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > For the carbon.load.sortMemory.spill.percentage parameter, the user can input a > value outside the range of 0-100 without any validation being performed.
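The validation described by this issue — rejecting values outside 0-100 — could be sketched roughly as below. The class name, method name, and the fallback default are hypothetical assumptions, not CarbonData's actual API:

```java
// Hedged sketch of range validation for a percentage property such as
// carbon.load.sortMemory.spill.percentage. Names and the fallback default
// value (0) are assumptions for illustration only.
public class SpillPercentageValidator {
    // Assumed default used when the configured value is invalid
    public static final int DEFAULT_SPILL_PERCENTAGE = 0;

    // Parse the user-supplied value; fall back to the default when the
    // value is non-numeric or outside the valid 0-100 range.
    public static int validate(String userValue) {
        try {
            int pct = Integer.parseInt(userValue.trim());
            if (pct < 0 || pct > 100) {
                return DEFAULT_SPILL_PERCENTAGE;
            }
            return pct;
        } catch (NumberFormatException e) {
            return DEFAULT_SPILL_PERCENTAGE;
        }
    }

    public static void main(String[] args) {
        assert validate("50") == 50;
        assert validate("150") == DEFAULT_SPILL_PERCENTAGE;
        assert validate("abc") == DEFAULT_SPILL_PERCENTAGE;
    }
}
```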
[GitHub] carbondata pull request #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.so...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2397 ---
[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2397 LGTM ---
[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2432 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5543/ ---
[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2432 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5514/ ---
[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2408 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6687/ ---
[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2391 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5513/ ---
[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2410 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5542/ ---
[GitHub] carbondata issue #2425: [CARBONDATA-2637][BloomDataMap] Fix bugs for deferre...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2425 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6685/ ---
[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2410 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6686/ ---
[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2432 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5541/ ---
[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2410 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5511/ ---
[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2432 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6684/ ---
[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...
Github user kevinjmh commented on the issue: https://github.com/apache/carbondata/pull/2432 retest this please ---
[GitHub] carbondata issue #2425: [CARBONDATA-2637][BloomDataMap] Fix bugs for deferre...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2425 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5540/ ---
[GitHub] carbondata issue #2431: [MINOR] Adding a testcase for stream-table join in S...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2431 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5509/ ---
[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2432 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5510/ ---
[GitHub] carbondata issue #2416: [CARBONDATA-2660][BloomDataMap] Add test for queryin...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2416 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6681/ ---
[GitHub] carbondata issue #2431: [MINOR] Adding a testcase for stream-table join in S...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2431 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6683/ ---
[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2397 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5508/ ---
[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2391 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5539/ ---
[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2432 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5538/ ---
[GitHub] carbondata issue #2413: [CARBONDATA-2657][BloomDataMap] Fix bugs in loading ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2413 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5507/ ---
[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2397 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6682/ ---
[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2397 @chenliang613 This parameter controls how much sort temp file merging is done in memory ---
[GitHub] carbondata issue #2431: [MINOR] Adding a testcase for stream-table join in S...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2431 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5537/ ---
[GitHub] carbondata issue #2416: [CARBONDATA-2660][BloomDataMap] Add test for queryin...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2416 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5506/ ---
[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2410 retest this please ---
[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2391#discussion_r199317103 --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java --- @@ -49,6 +50,7 @@ */ protected CarbonLRUCache lruCache; + Map> segInfoCache; --- End diff -- It's used to reduce S3 IO. Previously it needed 70*140 IO operations; now it only needs 140 ---
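The caching idea described here — reading each segment's metadata from storage once instead of once per block — can be sketched as a simple memoized lookup. The class and method names below are hypothetical, not CarbonData's actual API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch of a per-segment info cache: repeated lookups for the same
// segment hit the map instead of issuing another storage (e.g. S3) read.
// All names here are illustrative stand-ins, not CarbonData classes.
public class SegInfoCacheSketch {
    private final Map<String, String> segInfoCache = new HashMap<>();
    final AtomicInteger ioCount = new AtomicInteger();

    // Simulated expensive read (one storage IO per call).
    private String readFromStorage(String segmentId) {
        ioCount.incrementAndGet();
        return "info-for-" + segmentId;
    }

    // Memoized lookup: issues at most one storage read per distinct segment.
    public String getSegmentInfo(String segmentId) {
        return segInfoCache.computeIfAbsent(segmentId, this::readFromStorage);
    }

    public static void main(String[] args) {
        SegInfoCacheSketch cache = new SegInfoCacheSketch();
        // 70 block lookups per segment over 2 segments: without the cache
        // this would cost 140 reads; with it, only 2 (one per segment).
        for (int block = 0; block < 70; block++) {
            cache.getSegmentInfo("seg-0");
            cache.getSegmentInfo("seg-1");
        }
        assert cache.ioCount.get() == 2;
    }
}
```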
[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2391#discussion_r199317072 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SDKDetailQueryExecutor.java --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.scan.executor.impl; + +import java.io.IOException; +import java.util.List; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.scan.result.iterator.SearchModeResultIterator; +import org.apache.carbondata.core.util.CarbonProperties; + +/** + * It's for SDK carbon reader to execute the detail query + */ +public class SDKDetailQueryExecutor extends AbstractQueryExecutor { --- End diff -- There are some differences; the method for getting nThread is different ---
[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2391#discussion_r199317048 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java --- @@ -207,6 +209,8 @@ public CarbonReaderBuilder setEndPoint(String value) { format.getSplits(new JobContextImpl(job.getConfiguration(), new JobID())); List> readers = new ArrayList<>(splits.size()); + CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.ENABLE_SDK_QUERY_EXECUTOR, "true"); --- End diff -- not always, only for SDK reader ---
[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2391#discussion_r199316994 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/test/Spark2TestQueryExecutor.scala --- @@ -71,8 +70,8 @@ object Spark2TestQueryExecutor { .getOrCreateCarbonSession(null, TestQueryExecutor.metastoredb) if (warehouse.startsWith("hdfs://")) { System.setProperty(CarbonCommonConstants.HDFS_TEMP_LOCATION, warehouse) - CarbonProperties.getInstance().addProperty(CarbonCommonConstants.LOCK_TYPE, - CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS) +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.LOCK_TYPE, CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS) --- End diff -- OK, done ---
[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2408 retest this please ---
[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2391 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6680/ ---
[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2391#discussion_r199316375 --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java --- @@ -49,6 +50,7 @@ */ protected CarbonLRUCache lruCache; + Map> segInfoCache; --- End diff -- What is this used for? ---
[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2391#discussion_r199316362 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SDKDetailQueryExecutor.java --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.scan.executor.impl; + +import java.io.IOException; +import java.util.List; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.scan.result.iterator.SearchModeResultIterator; +import org.apache.carbondata.core.util.CarbonProperties; + +/** + * It's for SDK carbon reader to execute the detail query + */ +public class SDKDetailQueryExecutor extends AbstractQueryExecutor { --- End diff -- It seems no different from `SearchModeDetailQueryExecutor`, why not use it directly? ---
[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2391#discussion_r199316329 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java --- @@ -207,6 +209,8 @@ public CarbonReaderBuilder setEndPoint(String value) { format.getSplits(new JobContextImpl(job.getConfiguration(), new JobID())); List> readers = new ArrayList<>(splits.size()); + CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.ENABLE_SDK_QUERY_EXECUTOR, "true"); --- End diff -- If it is always setting to true, then no need to add this configuration ---
[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2391#discussion_r199316300 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/test/Spark2TestQueryExecutor.scala --- @@ -71,8 +70,8 @@ object Spark2TestQueryExecutor { .getOrCreateCarbonSession(null, TestQueryExecutor.metastoredb) if (warehouse.startsWith("hdfs://")) { System.setProperty(CarbonCommonConstants.HDFS_TEMP_LOCATION, warehouse) - CarbonProperties.getInstance().addProperty(CarbonCommonConstants.LOCK_TYPE, - CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS) +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.LOCK_TYPE, CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS) --- End diff -- do not change this ---
[GitHub] carbondata pull request #2399: [CARBONDATA-2629] Support SDK carbon reader r...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2399 ---
[jira] [Resolved] (CARBONDATA-2629) SDK carbon reader don't support filter in HDFS and S3
[ https://issues.apache.org/jira/browse/CARBONDATA-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-2629. -- Resolution: Fixed Fix Version/s: 1.4.1 1.5.0 > SDK carbon reader don't support filter in HDFS and S3 > - > > Key: CARBONDATA-2629 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2629 > Project: CarbonData > Issue Type: Bug >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Fix For: 1.5.0, 1.4.1 > > Time Spent: 5h > Remaining Estimate: 0h > > SDK carbon reader don't support filter in HDFS and S3 > Code: > {code:java} >EqualToExpression equalToExpression = new EqualToExpression( > new ColumnExpression("name", DataTypes.STRING), > new LiteralExpression("robot1", DataTypes.STRING)); > CarbonReader reader = CarbonReader > .builder(path, "_temp") > .projection(new String[]{"name", "age"}) > .setAccessKey(args[0]) > .setSecretKey(args[1]) > .filter(equalToExpression) > .setEndPoint(args[2]) > .build(); > {code} > Error: > {code:java} > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Exception in thread "main" java.lang.RuntimeException: Carbon index file not > exists. > at > org.apache.carbondata.core.metadata.schema.table.CarbonTable.buildTable(CarbonTable.java:249) > at > org.apache.carbondata.sdk.file.CarbonReaderBuilder.build(CarbonReaderBuilder.java:184) > at > org.apache.carbondata.examples.sdk.SDKS3Example.main(SDKS3Example.java:77) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2399: [CARBONDATA-2629] Support SDK carbon reader read dat...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2399 LGTM ---
[GitHub] carbondata pull request #2432: [CARBONDATA-2675][32K] Support config long_st...
GitHub user kevinjmh opened a pull request: https://github.com/apache/carbondata/pull/2432 [CARBONDATA-2675][32K] Support config long_string_columns when create datamap Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. Create datamap use select statement, but long string column is defined with StringType in the result dataframe if this column is selected. This PR allows to set long_string_columns property in dmproperties. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinjmh/carbondata longstr_datamap Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2432.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2432 commit 5b7b7bfcce8d33015f6ef47e2723918198b176b3 Author: Manhua Date: 2018-06-30T02:53:41Z support config long_string_columns when create datamap ---
[GitHub] carbondata issue #2413: [CARBONDATA-2657][BloomDataMap] Fix bugs in loading ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2413 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6678/ ---
[GitHub] carbondata issue #2416: [CARBONDATA-2660][BloomDataMap] Add test for queryin...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2416 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5536/ ---
[GitHub] carbondata pull request #2431: [MINOR] Adding a testcase for stream-table jo...
GitHub user jackylk opened a pull request: https://github.com/apache/carbondata/pull/2431 [MINOR] Adding a testcase for stream-table join in StreamSQL This PR only adds a testcase for stream-table join in StreamSQL - [X] Any interfaces changed? No - [X] Any backward compatibility impacted? No - [X] Document update required? No - [X] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. Yes - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata stream-join Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2431.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2431 commit f6f5789e2ee726ea544ce385219eaae12cf79742 Author: Jacky Li Date: 2018-06-26T11:10:35Z add testcase ---
[jira] [Created] (CARBONDATA-2675) Support config long_string_columns when create datamap
jiangmanhua created CARBONDATA-2675: --- Summary: Support config long_string_columns when create datamap Key: CARBONDATA-2675 URL: https://issues.apache.org/jira/browse/CARBONDATA-2675 Project: CarbonData Issue Type: Sub-task Reporter: jiangmanhua Assignee: jiangmanhua
[GitHub] carbondata issue #2412: [CARBONDATA-2656] Presto vector stream readers perfo...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2412 Used the below script to build data: ``` import scala.util.Random val r = new Random() val df = spark.sparkContext.parallelize(1 to 10).map(x => ("No." + r.nextInt(1), "country" + x % 8, "city" + x % 50, x % 300)).toDF("ID", "country", "city", "population") ``` Two issues: 1. On presto client, i ran two times as per the below script but get the different results: ``` presto:default> select country,sum(population) from carbon_table group by country; country |_col1 --+- country4 | 18508531250 country2 | 18758431703 country0 | 18508717865 country7 | 18884021774 country1 | 18633160595 country5 | 18633480022 country6 | 18757895175 country3 | 18883151243 (8 rows) Query 20180630_041406_4_crn9q, FINISHED, 1 node Splits: 65 total, 65 done (100.00%) 1:01 [1000M rows, 8.4GB] [16.5M rows/s, 142MB/s] presto:default> select country,sum(population) from carbon_table group by country; country |_col1 --+- country4 | 18500014852 country0 | 1843972 country5 | 18624989449 country1 | 18625008398 country3 | 1887496 country6 | 18749995166 country7 | 18874992446 country2 | 1874687 (8 rows) Query 20180630_041510_5_crn9q, FINISHED, 1 node Splits: 65 total, 65 done (100.00%) 0:59 [1000M rows, 8.4GB] [17M rows/s, 146MB/s] ``` 2. For aggregation scenarios with 1 billion row data, presto performance is much lower than spark, as below: (presto is around 1 mins, spark is around 33 seconds) ``` scala> benchmark { carbon.sql("select country,sum(population) from carbon_table group by country").show} ++---+ | country|sum(population)| ++---+ |country4|1848700| |country1|18624998800| |country3|18874998800| |country7|18874998700| |country2|18749998800| |country6|18749998700| |country5|18624998700| |country0|1848900| ++---+ 33849.999703ms ``` ---
[GitHub] carbondata issue #2419: [CARBONDATA-2545] Fix some spell error in CarbonData
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2419 LGTM ---
[GitHub] carbondata pull request #2419: [CARBONDATA-2545] Fix some spell error in Car...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2419 ---
[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2397 retest this please ---
[GitHub] carbondata issue #2394: [CARBONDATA- 2243] Added test case for database and ...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2394 Did you find any issues by adding these test cases? ---
[GitHub] carbondata pull request #2399: [CARBONDATA-2629] Support SDK carbon reader r...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2399#discussion_r199315517 --- Diff: examples/spark2/src/main/java/org/apache/carbondata/examples/sdk/SDKS3Example.java --- @@ -60,13 +63,19 @@ public static void main(String[] args) throws Exception { } writer.close(); // Read data + +EqualToExpression equalToExpression = new EqualToExpression( --- End diff -- Yes, I am also worried about this exposure, I think it is better to create a simple DSL for user to pass the filter expression. For example: ``` c1 > 3 c1 < 1 and c2 = 'apple' c1 in (3,4,5) c1 like ab* ``` ---
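The simple DSL suggested above (e.g. `c1 > 3`, `c1 in (3,4,5)`) could start from a minimal parser like the sketch below. This is an illustration only; CarbonData's actual Expression classes are stood in for by a plain (column, operator, literal) triple, and all names are hypothetical:

```java
// Hedged sketch of parsing a single "column OP literal" filter clause of
// the proposed DSL. Not CarbonData's API; Filter is a hypothetical stand-in
// for expressions such as EqualToExpression.
public class FilterDslSketch {
    static final class Filter {
        final String column, op, literal;
        Filter(String column, String op, String literal) {
            this.column = column;
            this.op = op;
            this.literal = literal;
        }
    }

    // Split into at most 3 whitespace-separated parts so literals that
    // contain spaces, e.g. "(3, 4, 5)", stay intact.
    public static Filter parse(String expr) {
        String[] parts = expr.trim().split("\\s+", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected: column op literal");
        }
        return new Filter(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        Filter f = parse("c1 > 3");
        assert f.column.equals("c1") && f.op.equals(">") && f.literal.equals("3");
        Filter g = parse("c2 = 'apple'");
        assert g.literal.equals("'apple'");
        Filter h = parse("c1 in (3, 4, 5)");
        assert h.op.equals("in") && h.literal.equals("(3, 4, 5)");
    }
}
```

A real DSL would still need to map each operator onto the existing filter expression tree and handle `and`/`or` composition; the sketch only covers the single-clause case.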
[GitHub] carbondata pull request #2400: [HOTFIX] Removed BatchedDataSourceScanExec cl...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2400#discussion_r199315290

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.strategy
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
+import org.apache.spark.sql.execution.FileSourceScanExec
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+
+/**
+ * Physical plan node for scanning data.
--- End diff --

Please describe whether both `STORED AS CARBONDATA` and `USING` tables will use this physical plan.
---
[GitHub] carbondata pull request #2400: [HOTFIX] Removed BatchedDataSourceScanExec cl...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2400#discussion_r199315264

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -673,4 +673,26 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
     supportCodegen && vectorizedReader.toBoolean &&
     cols.forall(_.dataType.isInstanceOf[AtomicType])
   }
+
+  private def createHadoopFSRelation(relation: LogicalRelation) = {
+    val sparkSession = relation.relation.sqlContext.sparkSession
+    relation.catalogTable match {
+      case Some(catalogTable) =>
+        HadoopFsRelation(new CatalogFileIndex(
--- End diff --

If the parameter list is long, it is better to also pass the parameter names for better readability.
---
[GitHub] carbondata pull request #2400: [HOTFIX] Removed BatchedDataSourceScanExec cl...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2400#discussion_r199315241

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -673,4 +673,26 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
     supportCodegen && vectorizedReader.toBoolean &&
     cols.forall(_.dataType.isInstanceOf[AtomicType])
   }
+
+  private def createHadoopFSRelation(relation: LogicalRelation) = {
+    val sparkSession = relation.relation.sqlContext.sparkSession
+    relation.catalogTable match {
+      case Some(catalogTable) =>
+        HadoopFsRelation(new CatalogFileIndex(
--- End diff --

move `new CatalogFileIndex(` to the next line
---
[GitHub] carbondata pull request #2400: [HOTFIX] Removed BatchedDataSourceScanExec cl...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2400#discussion_r199315246

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -673,4 +673,26 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
     supportCodegen && vectorizedReader.toBoolean &&
     cols.forall(_.dataType.isInstanceOf[AtomicType])
   }
+
+  private def createHadoopFSRelation(relation: LogicalRelation) = {
+    val sparkSession = relation.relation.sqlContext.sparkSession
+    relation.catalogTable match {
+      case Some(catalogTable) =>
+        HadoopFsRelation(new CatalogFileIndex(
+          sparkSession,
+          catalogTable, relation.relation.sizeInBytes),
+          catalogTable.partitionSchema,
+          catalogTable.schema,
+          catalogTable.bucketSpec,
+          new SparkCarbonTableFormat,
+          catalogTable.storage.properties)(sparkSession)
+      case _ =>
+        HadoopFsRelation(new InMemoryFileIndex(sparkSession, Seq.empty, Map.empty, None),
--- End diff --

move `new InMemoryFileIndex(sparkSession, Seq.empty, Map.empty, None)` to the next line
---
[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2397 retest this case please ---
[GitHub] carbondata pull request #2423: [CARBONDATA-2530][MV] Fix wrong data displaye...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2423#discussion_r199315149

--- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala ---
@@ -75,6 +77,13 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] {
         plan
       }
     } else {
+      if (catalog != null && (plan.isInstanceOf[InsertIntoCarbonTable]
+        || plan.isInstanceOf[CarbonLoadDataCommand])) {
+        val allSchema = catalog.asInstanceOf[SummaryDatasetCatalog].listAllSchema()
+        for (schema <- allSchema) {
--- End diff --

use `foreach` instead of `for`, which is faster in Scala
---
[GitHub] carbondata pull request #2423: [CARBONDATA-2530][MV] Fix wrong data displaye...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2423#discussion_r199315133

--- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala ---
@@ -75,6 +77,13 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] {
         plan
       }
     } else {
+      if (catalog != null && (plan.isInstanceOf[InsertIntoCarbonTable]
--- End diff --

please move `(plan.isInstanceOf[InsertIntoCarbonTable]` to the next line
---
[GitHub] carbondata issue #2413: [CARBONDATA-2657][BloomDataMap] Fix bugs in loading ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2413 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5535/ ---
[GitHub] carbondata pull request #2407: [CARBONDATA-2646][DataLoad]change the log lev...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2407 ---
[GitHub] carbondata issue #2411: [CARBONDATA-2654][Datamap] Optimize output for expla...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2411 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6679/ ---
[GitHub] carbondata issue #2407: [CARBONDATA-2646][DataLoad]change the log level whil...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2407 LGTM ---
[jira] [Resolved] (CARBONDATA-2635) Support different provider based index datamaps on same column
[ https://issues.apache.org/jira/browse/CARBONDATA-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-2635.
----------------------------------
          Resolution: Fixed
       Fix Version/s: 1.4.1
                      1.5.0

> Support different provider based index datamaps on same column
> --------------------------------------------------------------
>
>                 Key: CARBONDATA-2635
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2635
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>            Priority: Major
>             Fix For: 1.5.0, 1.4.1
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> It is wasteful to build a bloom index on the same column more than once

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] carbondata pull request #2405: [CARBONDATA-2635][BloomDataMap] Support diffe...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2405 ---
[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2391 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5505/ ---
[GitHub] carbondata issue #2405: [CARBONDATA-2635][BloomDataMap] Support different in...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2405 LGTM ---