[GitHub] carbondata issue #2433: [CARBONDATA-2676]Support local Dictionary for SDK Wr...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2433
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5518/



---


[GitHub] carbondata issue #2433: [CARBONDATA-2676]Support local Dictionary for SDK Wr...

2018-06-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2433
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5544/



---


[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2432
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5517/



---


[GitHub] carbondata issue #2425: [CARBONDATA-2637][BloomDataMap] Fix bugs for deferre...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2425
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5515/



---


[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2410
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5516/



---


[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2410
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6690/



---


[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2432
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6689/



---


[GitHub] carbondata pull request #2403: [CARBONDATA-2633][BloomDataMap] Fix bugs in b...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2403#discussion_r199325848
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/DirectDictionaryFieldConverterImpl.java ---
@@ -65,16 +65,22 @@ public DirectDictionaryFieldConverterImpl(DataField dataField, String nullFormat
   @Override
   public void convert(CarbonRow row, BadRecordLogHolder logHolder) {
     String value = row.getString(index);
-    if (value == null) {
+    row.update(convert(value, logHolder), index);
+  }
+
+  @Override public Object convert(Object value, BadRecordLogHolder logHolder)
+      throws RuntimeException {
+    String literalValue = (String) value;
+    if (literalValue == null) {
       logHolder.setReason(
           CarbonDataProcessorUtil.prepareFailureReason(column.getColName(), column.getDataType()));
-      row.update(1, index);
-    } else if (value.equals(nullFormat)) {
-      row.update(1, index);
+      return 1;
--- End diff --

Suggest creating a constant for the null value (1).
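
A minimal sketch of that suggestion, with hypothetical names; it only illustrates replacing the magic number 1 used as the null surrogate:

```java
// Hypothetical holder for the constant; the author would pick the real name and location.
public final class DirectDictionaryConstants {

  private DirectDictionaryConstants() {
  }

  /** Surrogate key written when a direct-dictionary value is null (currently the literal 1). */
  public static final int NULL_SURROGATE_KEY = 1;
}

// The converter above would then return the named constant instead of the literal:
//   return DirectDictionaryConstants.NULL_SURROGATE_KEY;
```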


---


[GitHub] carbondata pull request #2403: [CARBONDATA-2633][BloomDataMap] Fix bugs in b...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2403#discussion_r199325844
  
--- Diff: datamap/bloom/pom.xml ---
@@ -23,6 +23,18 @@
      <artifactId>carbondata-core</artifactId>
      <version>${project.version}</version>
    </dependency>
+    <dependency>
+      <groupId>org.apache.carbondata</groupId>
+      <artifactId>carbondata-processing</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
--- End diff --

You have not added the compile scope.


---


[GitHub] carbondata pull request #2403: [CARBONDATA-2633][BloomDataMap] Fix bugs in b...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2403#discussion_r199325781
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomDataMapWriter.java
 ---
@@ -69,6 +86,27 @@
 indexBloomFilters = new ArrayList<>(indexColumns.size());
 initDataMapFile();
 resetBloomFilters();
+
+keyGenerator = segmentProperties.getDimensionKeyGenerator();
--- End diff --

Can we optimize this instead of passing the whole `SegmentProperties` into 
this Writer class? Please check @ravipesala 
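
A sketch of one way to narrow that dependency, assuming the writer only needs the dimension `KeyGenerator`; the class name and constructor shape below are illustrative, not the actual code:

```java
import org.apache.carbondata.core.keygenerator.KeyGenerator;

// Illustrative writer that accepts only the key generator it actually uses,
// instead of the whole SegmentProperties.
public class BloomIndexWriterSketch {

  private final KeyGenerator keyGenerator;

  public BloomIndexWriterSketch(KeyGenerator dimensionKeyGenerator) {
    this.keyGenerator = dimensionKeyGenerator;
  }
}

// Caller side: extract the generator once and pass it in.
//   new BloomIndexWriterSketch(segmentProperties.getDimensionKeyGenerator());
```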


---


[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2391
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6688/



---


[jira] [Resolved] (CARBONDATA-2653) Fix bugs in incorrect blocklet number in bloomfilter

2018-06-30 Thread Jacky Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-2653.
--
   Resolution: Fixed
Fix Version/s: 1.4.1
   1.5.0

> Fix bugs in incorrect blocklet number in bloomfilter
> 
>
> Key: CARBONDATA-2653
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2653
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xuchuanyin
>Assignee: xuchuanyin
>Priority: Major
> Fix For: 1.5.0, 1.4.1
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> An incorrect blocklet number can be observed during bloom filter pruning.
> This is because the bloom filter writer writes an extra blocklet before it finishes.





[GitHub] carbondata pull request #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in i...

2018-06-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2408


---


[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...

2018-06-30 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2408
  
LGTM


---


[jira] [Resolved] (CARBONDATA-2644) Validation not present for carbon.load.sortMemory.spill.percentage parameter

2018-06-30 Thread Jacky Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-2644.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

> Validation not present for carbon.load.sortMemory.spill.percentage parameter 
> -
>
> Key: CARBONDATA-2644
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2644
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.4.0
>Reporter: wangsen
>Assignee: wangsen
>Priority: Minor
> Fix For: 1.5.0, 1.4.1
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> For the carbon.load.sortMemory.spill.percentage parameter, the user can input a
> value outside the valid range of 0-100 without any validation.





[GitHub] carbondata pull request #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.so...

2018-06-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2397


---


[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...

2018-06-30 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2397
  
LGTM


---


[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...

2018-06-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2432
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5543/



---


[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2432
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5514/



---


[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2408
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6687/



---


[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2391
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5513/



---


[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

2018-06-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2410
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5542/



---


[GitHub] carbondata issue #2425: [CARBONDATA-2637][BloomDataMap] Fix bugs for deferre...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2425
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6685/



---


[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2410
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6686/



---


[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...

2018-06-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2432
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5541/



---


[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2410
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5511/



---


[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2432
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6684/



---


[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...

2018-06-30 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2432
  
retest this please


---


[GitHub] carbondata issue #2425: [CARBONDATA-2637][BloomDataMap] Fix bugs for deferre...

2018-06-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2425
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5540/



---


[GitHub] carbondata issue #2431: [MINOR] Adding a testcase for stream-table join in S...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2431
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5509/



---


[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2432
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5510/



---


[GitHub] carbondata issue #2416: [CARBONDATA-2660][BloomDataMap] Add test for queryin...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2416
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6681/



---


[GitHub] carbondata issue #2431: [MINOR] Adding a testcase for stream-table join in S...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2431
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6683/



---


[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2397
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5508/



---


[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

2018-06-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2391
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5539/



---


[GitHub] carbondata issue #2432: [CARBONDATA-2675][32K] Support config long_string_co...

2018-06-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2432
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5538/



---


[GitHub] carbondata issue #2413: [CARBONDATA-2657][BloomDataMap] Fix bugs in loading ...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2413
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5507/



---


[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2397
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6682/



---


[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...

2018-06-30 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2397
  
@chenliang613 This parameter controls how much of the sort temp file data is
merged in memory.
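
The related fix (CARBONDATA-2644) adds a range check so the configured value stays within 0-100. A minimal sketch of such validation; the class and method names and the fall-back behaviour are assumptions, only the property semantics come from the issue:

```java
// Returns the configured spill percentage, falling back to a default when the
// value is not a number or is outside the valid range of 0-100.
public final class SortSpillPercentageValidator {

  private SortSpillPercentageValidator() {
  }

  public static int validate(String configuredValue, int defaultValue) {
    try {
      int value = Integer.parseInt(configuredValue);
      if (value < 0 || value > 100) {
        return defaultValue;
      }
      return value;
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }
}
```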


---


[GitHub] carbondata issue #2431: [MINOR] Adding a testcase for stream-table join in S...

2018-06-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2431
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5537/



---


[GitHub] carbondata issue #2416: [CARBONDATA-2660][BloomDataMap] Add test for queryin...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2416
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5506/



---


[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

2018-06-30 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2410
  
retest this please


---


[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

2018-06-30 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199317103
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java
 ---
@@ -49,6 +50,7 @@
*/
   protected CarbonLRUCache lruCache;
 
+  Map> segInfoCache;
--- End diff --

It's used to reduce the S3 IO. It needed 70*140 IO operations before; now it
only needs 140.
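
A sketch of the caching pattern being described; the map key and value types are hypothetical, the point is that each segment's index information is read from the store (e.g. S3) only once and served from memory afterwards:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SegmentInfoCacheSketch {

  // Per-segment cache so repeated pruning calls do not re-read the store.
  private final Map<String, List<String>> segInfoCache = new HashMap<>();

  public List<String> getIndexFiles(String segmentId) {
    return segInfoCache.computeIfAbsent(segmentId, this::readIndexFilesFromStore);
  }

  // Stand-in for the expensive remote listing that previously ran for every lookup.
  private List<String> readIndexFilesFromStore(String segmentId) {
    return Collections.emptyList();
  }
}
```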


---


[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

2018-06-30 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199317072
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SDKDetailQueryExecutor.java
 ---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.scan.executor.impl;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import 
org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import 
org.apache.carbondata.core.scan.result.iterator.SearchModeResultIterator;
+import org.apache.carbondata.core.util.CarbonProperties;
+
+/**
+ * It's for SDK carbon reader to execute the detail query
+ */
+public class SDKDetailQueryExecutor extends AbstractQueryExecutor {
--- End diff --

There are some differences; the method for getting nThread is different.
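
A sketch of the kind of difference being referred to: the executor sizes its thread pool from its own configuration. The handling below is illustrative; the real property name and default are not shown in this diff:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class QueryThreadPoolSketch {

  private QueryThreadPoolSketch() {
  }

  // Builds the pool from a configured thread count, falling back to the number
  // of available processors when the value is missing or not a number.
  public static ExecutorService newQueryPool(String configuredThreads) {
    int nThread;
    try {
      nThread = Integer.parseInt(configuredThreads);
    } catch (NumberFormatException e) {
      nThread = Runtime.getRuntime().availableProcessors();
    }
    return Executors.newFixedThreadPool(nThread);
  }
}
```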


---


[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

2018-06-30 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199317048
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java 
---
@@ -207,6 +209,8 @@ public CarbonReaderBuilder setEndPoint(String value) {
   format.getSplits(new JobContextImpl(job.getConfiguration(), new 
JobID()));
 
   List> readers = new ArrayList<>(splits.size());
+  CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.ENABLE_SDK_QUERY_EXECUTOR, 
"true");
--- End diff --

Not always; only for the SDK reader.


---


[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

2018-06-30 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199316994
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/test/Spark2TestQueryExecutor.scala
 ---
@@ -71,8 +70,8 @@ object Spark2TestQueryExecutor {
 .getOrCreateCarbonSession(null, TestQueryExecutor.metastoredb)
   if (warehouse.startsWith("hdfs://")) {
 System.setProperty(CarbonCommonConstants.HDFS_TEMP_LOCATION, warehouse)
-
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.LOCK_TYPE,
-  CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS)
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.LOCK_TYPE, 
CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS)
--- End diff --

OK,done


---


[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...

2018-06-30 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2408
  
retest this please


---


[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2391
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6680/



---


[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199316375
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java
 ---
@@ -49,6 +50,7 @@
*/
   protected CarbonLRUCache lruCache;
 
+  Map> segInfoCache;
--- End diff --

What is this used for?


---


[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199316362
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SDKDetailQueryExecutor.java
 ---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.scan.executor.impl;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import 
org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import 
org.apache.carbondata.core.scan.result.iterator.SearchModeResultIterator;
+import org.apache.carbondata.core.util.CarbonProperties;
+
+/**
+ * It's for SDK carbon reader to execute the detail query
+ */
+public class SDKDetailQueryExecutor extends AbstractQueryExecutor {
--- End diff --

It seems no different from `SearchModeDetailQueryExecutor`, why not use it 
directly?


---


[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199316329
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java 
---
@@ -207,6 +209,8 @@ public CarbonReaderBuilder setEndPoint(String value) {
   format.getSplits(new JobContextImpl(job.getConfiguration(), new 
JobID()));
 
   List> readers = new ArrayList<>(splits.size());
+  CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.ENABLE_SDK_QUERY_EXECUTOR, 
"true");
--- End diff --

If it is always set to true, then there is no need to add this configuration.


---


[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199316300
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/test/Spark2TestQueryExecutor.scala
 ---
@@ -71,8 +70,8 @@ object Spark2TestQueryExecutor {
 .getOrCreateCarbonSession(null, TestQueryExecutor.metastoredb)
   if (warehouse.startsWith("hdfs://")) {
 System.setProperty(CarbonCommonConstants.HDFS_TEMP_LOCATION, warehouse)
-
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.LOCK_TYPE,
-  CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS)
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.LOCK_TYPE, 
CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS)
--- End diff --

do not change this


---


[GitHub] carbondata pull request #2399: [CARBONDATA-2629] Support SDK carbon reader r...

2018-06-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2399


---


[jira] [Resolved] (CARBONDATA-2629) SDK carbon reader don't support filter in HDFS and S3

2018-06-30 Thread Jacky Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-2629.
--
   Resolution: Fixed
Fix Version/s: 1.4.1
   1.5.0

> SDK carbon reader don't support filter in HDFS and S3
> -
>
> Key: CARBONDATA-2629
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2629
> Project: CarbonData
>  Issue Type: Bug
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
> Fix For: 1.5.0, 1.4.1
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The SDK carbon reader doesn't support filters in HDFS and S3.
> Code:
> {code:java}
>EqualToExpression equalToExpression = new EqualToExpression(
> new ColumnExpression("name", DataTypes.STRING),
> new LiteralExpression("robot1", DataTypes.STRING));
> CarbonReader reader = CarbonReader
> .builder(path, "_temp")
> .projection(new String[]{"name", "age"})
> .setAccessKey(args[0])
> .setSecretKey(args[1])
> .filter(equalToExpression)
> .setEndPoint(args[2])
> .build();
> {code}
> Error:
> {code:java}
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Exception in thread "main" java.lang.RuntimeException: Carbon index file not 
> exists.
>   at 
> org.apache.carbondata.core.metadata.schema.table.CarbonTable.buildTable(CarbonTable.java:249)
>   at 
> org.apache.carbondata.sdk.file.CarbonReaderBuilder.build(CarbonReaderBuilder.java:184)
>   at 
> org.apache.carbondata.examples.sdk.SDKS3Example.main(SDKS3Example.java:77)
> {code}





[GitHub] carbondata issue #2399: [CARBONDATA-2629] Support SDK carbon reader read dat...

2018-06-30 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2399
  
LGTM


---


[GitHub] carbondata pull request #2432: [CARBONDATA-2675][32K] Support config long_st...

2018-06-30 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2432

[CARBONDATA-2675][32K] Support config long_string_columns when create 
datamap

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 


Creating a datamap uses a select statement, but a long string column is defined
with StringType in the result dataframe if that column is selected. This PR
allows setting the long_string_columns property in dmproperties.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata longstr_datamap

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2432


commit 5b7b7bfcce8d33015f6ef47e2723918198b176b3
Author: Manhua 
Date:   2018-06-30T02:53:41Z

support config long_string_columns when create datamap




---


[GitHub] carbondata issue #2413: [CARBONDATA-2657][BloomDataMap] Fix bugs in loading ...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2413
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6678/



---


[GitHub] carbondata issue #2416: [CARBONDATA-2660][BloomDataMap] Add test for queryin...

2018-06-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2416
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5536/



---


[GitHub] carbondata pull request #2431: [MINOR] Adding a testcase for stream-table jo...

2018-06-30 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/carbondata/pull/2431

[MINOR] Adding a testcase for stream-table join in StreamSQL

This PR only adds a testcase for stream-table join in StreamSQL

 - [X] Any interfaces changed?
 No
 - [X] Any backward compatibility impacted?
 No
 - [X] Document update required?
No
 - [X] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
 Yes 
 - [X] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata stream-join

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2431.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2431


commit f6f5789e2ee726ea544ce385219eaae12cf79742
Author: Jacky Li 
Date:   2018-06-26T11:10:35Z

add testcase




---


[jira] [Created] (CARBONDATA-2675) Support config long_string_columns when create datamap

2018-06-30 Thread jiangmanhua (JIRA)
jiangmanhua created CARBONDATA-2675:
---

 Summary: Support config long_string_columns when create datamap
 Key: CARBONDATA-2675
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2675
 Project: CarbonData
  Issue Type: Sub-task
Reporter: jiangmanhua
Assignee: jiangmanhua








[GitHub] carbondata issue #2412: [CARBONDATA-2656] Presto vector stream readers perfo...

2018-06-30 Thread chenliang613
Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/2412
  
Used the below script to build data:
```
import scala.util.Random
val r = new Random()
val df = spark.sparkContext.parallelize(1 to 10).map(x => ("No." + 
r.nextInt(1), "country" + x % 8, "city" + x % 50, x % 300)).toDF("ID", 
"country", "city", "population")
```
Two issues:
1. On the presto client, I ran the query below twice but got different results:
```
presto:default> select country,sum(population) from carbon_table group by 
country;
 country  |_col1
--+-
 country4 | 18508531250
 country2 | 18758431703
 country0 | 18508717865
 country7 | 18884021774
 country1 | 18633160595
 country5 | 18633480022
 country6 | 18757895175
 country3 | 18883151243
(8 rows)

Query 20180630_041406_4_crn9q, FINISHED, 1 node
Splits: 65 total, 65 done (100.00%)
1:01 [1000M rows, 8.4GB] [16.5M rows/s, 142MB/s]

presto:default> select country,sum(population) from carbon_table group by 
country;
 country  |_col1
--+-
 country4 | 18500014852
 country0 | 1843972
 country5 | 18624989449
 country1 | 18625008398
 country3 | 1887496
 country6 | 18749995166
 country7 | 18874992446
 country2 | 1874687
(8 rows)

Query 20180630_041510_5_crn9q, FINISHED, 1 node
Splits: 65 total, 65 done (100.00%)
0:59 [1000M rows, 8.4GB] [17M rows/s, 146MB/s]
```
2. For aggregation scenarios with 1 billion rows of data, presto performance is
much lower than spark, as shown below (presto takes around 1 minute, spark
around 33 seconds):
```
scala> benchmark { carbon.sql("select country,sum(population) from 
carbon_table group by country").show}
++---+
| country|sum(population)|
++---+
|country4|1848700|
|country1|18624998800|
|country3|18874998800|
|country7|18874998700|
|country2|18749998800|
|country6|18749998700|
|country5|18624998700|
|country0|1848900|
++---+

33849.999703ms
```


---


[GitHub] carbondata issue #2419: [CARBONDATA-2545] Fix some spell error in CarbonData

2018-06-30 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2419
  
LGTM


---


[GitHub] carbondata pull request #2419: [CARBONDATA-2545] Fix some spell error in Car...

2018-06-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2419


---


[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...

2018-06-30 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2397
  
retest this please


---


[GitHub] carbondata issue #2394: [CARBONDATA- 2243] Added test case for database and ...

2018-06-30 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2394
  
Did you find any issues by adding these test cases?


---


[GitHub] carbondata pull request #2399: [CARBONDATA-2629] Support SDK carbon reader r...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2399#discussion_r199315517
  
--- Diff: 
examples/spark2/src/main/java/org/apache/carbondata/examples/sdk/SDKS3Example.java
 ---
@@ -60,13 +63,19 @@ public static void main(String[] args) throws Exception {
     }
     writer.close();
     // Read data
+
+    EqualToExpression equalToExpression = new EqualToExpression(
--- End diff --

Yes, I am also worried about this exposure. I think it is better to create
a simple DSL for users to pass the filter expression. For example:
```
c1 > 3
c1 < 1 and c2 = 'apple'
c1 in (3,4,5)
c1 like ab*
```
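
A sketch of how such a DSL could be mapped onto the expression classes already used in the example earlier in this thread; only the equality case is handled, and the parser itself is hypothetical:

```java
import org.apache.carbondata.core.metadata.datatype.DataTypes;
import org.apache.carbondata.core.scan.expression.ColumnExpression;
import org.apache.carbondata.core.scan.expression.Expression;
import org.apache.carbondata.core.scan.expression.LiteralExpression;
import org.apache.carbondata.core.scan.expression.conditional.EqualToExpression;

public final class SimpleFilterParser {

  private SimpleFilterParser() {
  }

  // Translates a predicate such as "c2 = apple" into an EqualToExpression
  // over string columns; other operators and data types are left out.
  public static Expression parseEquals(String predicate) {
    String[] parts = predicate.split("=", 2);
    String column = parts[0].trim();
    String value = parts[1].trim();
    return new EqualToExpression(
        new ColumnExpression(column, DataTypes.STRING),
        new LiteralExpression(value, DataTypes.STRING));
  }
}
```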


---


[GitHub] carbondata pull request #2400: [HOTFIX] Removed BatchedDataSourceScanExec cl...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2400#discussion_r199315290
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.strategy
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
+import org.apache.spark.sql.execution.FileSourceScanExec
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, 
LogicalRelation}
+
+/**
+ *  Physical plan node for scanning data.
--- End diff --

Please describe whether both `STORED AS CARBONDATA` and `USING` will use this
physical plan.


---


[GitHub] carbondata pull request #2400: [HOTFIX] Removed BatchedDataSourceScanExec cl...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2400#discussion_r199315264
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
 ---
@@ -673,4 +673,26 @@ private[sql] class CarbonLateDecodeStrategy extends 
SparkStrategy {
 supportCodegen && vectorizedReader.toBoolean &&
 cols.forall(_.dataType.isInstanceOf[AtomicType])
   }
+
+  private def createHadoopFSRelation(relation: LogicalRelation) = {
+val sparkSession = relation.relation.sqlContext.sparkSession
+relation.catalogTable match {
+  case Some(catalogTable) =>
+HadoopFsRelation(new CatalogFileIndex(
--- End diff --

If the parameter list is long, it is better to also add the parameter names for
better readability.


---


[GitHub] carbondata pull request #2400: [HOTFIX] Removed BatchedDataSourceScanExec cl...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2400#discussion_r199315241
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
 ---
@@ -673,4 +673,26 @@ private[sql] class CarbonLateDecodeStrategy extends 
SparkStrategy {
 supportCodegen && vectorizedReader.toBoolean &&
 cols.forall(_.dataType.isInstanceOf[AtomicType])
   }
+
+  private def createHadoopFSRelation(relation: LogicalRelation) = {
+val sparkSession = relation.relation.sqlContext.sparkSession
+relation.catalogTable match {
+  case Some(catalogTable) =>
+HadoopFsRelation(new CatalogFileIndex(
--- End diff --

move `new CatalogFileIndex(` to next line


---


[GitHub] carbondata pull request #2400: [HOTFIX] Removed BatchedDataSourceScanExec cl...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2400#discussion_r199315246
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
 ---
@@ -673,4 +673,26 @@ private[sql] class CarbonLateDecodeStrategy extends 
SparkStrategy {
 supportCodegen && vectorizedReader.toBoolean &&
 cols.forall(_.dataType.isInstanceOf[AtomicType])
   }
+
+  private def createHadoopFSRelation(relation: LogicalRelation) = {
+val sparkSession = relation.relation.sqlContext.sparkSession
+relation.catalogTable match {
+  case Some(catalogTable) =>
+HadoopFsRelation(new CatalogFileIndex(
+  sparkSession,
+  catalogTable, relation.relation.sizeInBytes),
+  catalogTable.partitionSchema,
+  catalogTable.schema,
+  catalogTable.bucketSpec,
+  new SparkCarbonTableFormat,
+  catalogTable.storage.properties)(sparkSession)
+  case _ =>
+HadoopFsRelation(new InMemoryFileIndex(sparkSession, Seq.empty, 
Map.empty, None),
--- End diff --

move `new InMemoryFileIndex(sparkSession, Seq.empty, Map.empty, None)` to 
next line


---


[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...

2018-06-30 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2397
  
retest this case please


---


[GitHub] carbondata pull request #2423: [CARBONDATA-2530][MV] Fix wrong data displaye...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2423#discussion_r199315149
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
 ---
@@ -75,6 +77,13 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends 
Rule[LogicalPlan] {
 plan
   }
 } else {
+  if (catalog != null && (plan.isInstanceOf[InsertIntoCarbonTable]
+|| plan.isInstanceOf[CarbonLoadDataCommand])) {
+val allSchema = 
catalog.asInstanceOf[SummaryDatasetCatalog].listAllSchema()
+for (schema <- allSchema) {
--- End diff --

Use `foreach` instead of `for`, which is faster in Scala.


---


[GitHub] carbondata pull request #2423: [CARBONDATA-2530][MV] Fix wrong data displaye...

2018-06-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2423#discussion_r199315133
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
 ---
@@ -75,6 +77,13 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends 
Rule[LogicalPlan] {
 plan
   }
 } else {
+  if (catalog != null && (plan.isInstanceOf[InsertIntoCarbonTable]
--- End diff --

please move `(plan.isInstanceOf[InsertIntoCarbonTable]` to next line


---


[GitHub] carbondata issue #2413: [CARBONDATA-2657][BloomDataMap] Fix bugs in loading ...

2018-06-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2413
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/5535/



---


[GitHub] carbondata pull request #2407: [CARBONDATA-2646][DataLoad]change the log lev...

2018-06-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2407


---


[GitHub] carbondata issue #2411: [CARBONDATA-2654][Datamap] Optimize output for expla...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2411
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6679/



---


[GitHub] carbondata issue #2407: [CARBONDATA-2646][DataLoad]change the log level whil...

2018-06-30 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2407
  
LGTM


---


[jira] [Resolved] (CARBONDATA-2635) Support different provider based index datamaps on same column

2018-06-30 Thread Jacky Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-2635.
--
   Resolution: Fixed
Fix Version/s: 1.4.1
   1.5.0

> Support different provider based index datamaps on same column
> --
>
> Key: CARBONDATA-2635
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2635
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xuchuanyin
>Assignee: xuchuanyin
>Priority: Major
> Fix For: 1.5.0, 1.4.1
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> It is wasteful to build a bloom index on the same column more than once.





[GitHub] carbondata pull request #2405: [CARBONDATA-2635][BloomDataMap] Support diffe...

2018-06-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2405


---


[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

2018-06-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2391
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5505/



---


[GitHub] carbondata issue #2405: [CARBONDATA-2635][BloomDataMap] Support different in...

2018-06-30 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2405
  
LGTM


---