[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2614
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/146/



---


[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2614
  
Build Failed with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8217/



---


[GitHub] carbondata issue #2662: [CARBONDATA-2889]Add decoder based fallback mechanis...

2018-08-31 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2662
  
@kumarvishal09 I have handled the review comments, please review.


---


[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...

2018-08-31 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2662#discussion_r214505238
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/FallbackDecoderBasedColumnPageEncoder.java ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.datastore.page;
+
+import java.nio.ByteBuffer;
+import java.util.concurrent.Callable;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.TableSpec;
+import org.apache.carbondata.core.datastore.columnar.UnBlockIndexer;
+import org.apache.carbondata.core.datastore.compression.CompressorFactory;
+import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage;
+import org.apache.carbondata.core.keygenerator.KeyGenerator;
+import org.apache.carbondata.core.keygenerator.factory.KeyGeneratorFactory;
+import org.apache.carbondata.core.localdictionary.generator.LocalDictionaryGenerator;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.format.Encoding;
+
+public class FallbackDecoderBasedColumnPageEncoder implements Callable {
--- End diff --

Change the class name to DecoderBasedFallbackEncoder.


---


[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...

2018-08-31 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2662#discussion_r214505230
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/FallbackActualDataBasedColumnPageEncoder.java ---
@@ -19,17 +19,17 @@
 import java.util.concurrent.Callable;
 
 import org.apache.carbondata.core.datastore.TableSpec;
-import org.apache.carbondata.core.datastore.page.encoding.ColumnPageEncoder;
-import org.apache.carbondata.core.datastore.page.encoding.DefaultEncodingFactory;
 import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage;
+import org.apache.carbondata.core.util.CarbonUtil;
 
 /**
  * Below class will be used to encode column pages for which local dictionary was generated
  * but all the pages in blocklet was not encoded with local dictionary.
  * This is required as all the pages of a column in blocklet either it will be local dictionary
  * encoded or without local dictionary encoded.
  */
-public class FallbackColumnPageEncoder implements Callable {
+public class FallbackActualDataBasedColumnPageEncoder
--- End diff --

Change the class name to ActualDataBasedFallbackEncoder.


---


[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...

2018-08-31 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2662#discussion_r214505217
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java ---
@@ -86,7 +92,8 @@
* @param encodedColumnPage
* encoded column page
*/
-  void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage) {
+  void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage,
+  LocalDictionaryGenerator localDictionaryGenerator) {
--- End diff --

It is better to pass the local dictionary generator in the constructor.


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214504899
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorage.java ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.columnar;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+
+public class BlockIndexerStorage {
+
+  /**
+   * It compresses depends up on the sequence numbers.
+   * [1,2,3,4,6,8,10,11,12,13] is translated to [1,4,6,8,10,13] and [0,6]. In
+   * first array the start and end of sequential numbers and second array
+   * keeps the indexes of where sequential numbers starts. If there is no
+   * sequential numbers then the same array it returns with empty second
+   * array.
+   *
+   * @param rowIds
+   */
+  public static Map<String, short[]> rleEncodeOnRowId(short[] rowIds, short[] rowIdPage,
--- End diff --

Move this code to CarbonUtil.


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214504900
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorage.java ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.columnar;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+
+public class BlockIndexerStorage {
+
+  /**
+   * It compresses depends up on the sequence numbers.
+   * [1,2,3,4,6,8,10,11,12,13] is translated to [1,4,6,8,10,13] and [0,6]. In
+   * first array the start and end of sequential numbers and second array
+   * keeps the indexes of where sequential numbers starts. If there is no
+   * sequential numbers then the same array it returns with empty second
+   * array.
+   *
+   * @param rowIds
+   */
+  public static Map<String, short[]> rleEncodeOnRowId(short[] rowIds, short[] rowIdPage,
+      short[] rowIdRlePage) {
+    List<Short> list = new ArrayList<Short>(CarbonCommonConstants.CONSTANT_SIZE_TEN);
+    List<Short> map = new ArrayList<Short>(CarbonCommonConstants.CONSTANT_SIZE_TEN);
+    int k = 0;
+    int i = 1;
+    for (; i < rowIds.length; i++) {
+      if (rowIds[i] - rowIds[i - 1] == 1) {
+        k++;
+      } else {
+        if (k > 0) {
+          map.add(((short) list.size()));
+          list.add(rowIds[i - k - 1]);
+          list.add(rowIds[i - 1]);
+        } else {
+          list.add(rowIds[i - 1]);
+        }
+        k = 0;
+      }
+    }
+    if (k > 0) {
+      map.add(((short) list.size()));
+      list.add(rowIds[i - k - 1]);
+      list.add(rowIds[i - 1]);
+    } else {
+      list.add(rowIds[i - 1]);
+    }
+    int compressionPercentage = (((list.size() + map.size()) * 100) / rowIds.length);
+    if (compressionPercentage > 70) {
+      rowIdPage = rowIds;
+    } else {
+      rowIdPage = convertToArray(list);
+    }
+    if (rowIds.length == rowIdPage.length) {
+      rowIdRlePage = new short[0];
+    } else {
+      rowIdRlePage = convertToArray(map);
+    }
+    Map<String, short[]> rowIdAndRowRleIdPages = new HashMap<>(2);
+    rowIdAndRowRleIdPages.put("rowIdPage", rowIdPage);
+    rowIdAndRowRleIdPages.put("rowRlePage", rowIdRlePage);
+    return rowIdAndRowRleIdPages;
+  }
+
+  public static short[] convertToArray(List<Short> list) {
--- End diff --

Move this code to CarbonUtil.
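
A minimal, self-contained sketch of the row-id RLE described above (hypothetical illustration, not the PR's implementation; note that, matching the quoted code rather than the javadoc's [0,6] example, the second array here records run offsets within the encoded output):

import java.util.ArrayList;
import java.util.List;

public class RowIdRleSketch {
  public static void main(String[] args) {
    short[] rowIds = {1, 2, 3, 4, 6, 8, 10, 11, 12, 13};
    List<Short> encoded = new ArrayList<>();   // run bounds and isolated ids
    List<Short> runStarts = new ArrayList<>(); // offsets of runs in 'encoded'
    int i = 0;
    while (i < rowIds.length) {
      int j = i;
      // extend j to the end of a consecutive run starting at i
      while (j + 1 < rowIds.length && rowIds[j + 1] - rowIds[j] == 1) {
        j++;
      }
      if (j > i) {
        // a run [rowIds[i] .. rowIds[j]] collapses to its two endpoints
        runStarts.add((short) encoded.size());
        encoded.add(rowIds[i]);
        encoded.add(rowIds[j]);
      } else {
        encoded.add(rowIds[i]); // isolated id is stored as-is
      }
      i = j + 1;
    }
    System.out.println(encoded);   // [1, 4, 6, 8, 10, 13]
    System.out.println(runStarts); // [0, 4]
  }
}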


---


[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...

2018-08-31 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2662#discussion_r214504804
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/EncodedBlocklet.java ---
@@ -87,19 +91,24 @@ private void addPageMetadata(EncodedTablePage encodedTablePage) {
    * @param encodedTablePage
    * encoded table page
    */
-  private void addEncodedMeasurePage(EncodedTablePage encodedTablePage) {
+  private void addEncodedMeasurePage(EncodedTablePage encodedTablePage,
+      Map localDictionaryGeneratorMap) {
     // for first page create new list
     if (null == encodedMeasureColumnPages) {
       encodedMeasureColumnPages = new ArrayList<>();
       // adding measure pages
       for (int i = 0; i < encodedTablePage.getNumMeasures(); i++) {
-        BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null);
-        blockletEncodedColumnPage.addEncodedColumnColumnPage(encodedTablePage.getMeasure(i));
+        BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null,
+            Boolean.parseBoolean(CarbonProperties.getInstance()
--- End diff --

Instead of parsing the property every time for each encoded page, parse it once in the constructor and keep it in a private field. For measures you can directly pass false, since a local dictionary will not be generated for a measure page.
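
A rough sketch of this suggestion (hypothetical class and property name; the CarbonProperties lookup is stubbed with a system property just for illustration):

public class EncodedBlockletSketch {
  // parsed once at construction instead of once per encoded page
  private final boolean decoderBasedFallbackEnabled;

  public EncodedBlockletSketch() {
    // assumption: property key and default shown for illustration only
    this.decoderBasedFallbackEnabled = Boolean.parseBoolean(
        System.getProperty("carbon.fallback.decoder.based", "true"));
  }

  void addEncodedDimensionPage() {
    System.out.println("dimension page, decoder fallback = " + decoderBasedFallbackEnabled);
  }

  void addEncodedMeasurePage() {
    // measures never get a local dictionary, so pass false directly
    boolean useDecoderFallback = false;
    System.out.println("measure page, decoder fallback = " + useDecoderFallback);
  }

  public static void main(String[] args) {
    EncodedBlockletSketch sketch = new EncodedBlockletSketch();
    sketch.addEncodedDimensionPage();
    sketch.addEncodedMeasurePage();
  }
}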


---


[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...

2018-08-31 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2662#discussion_r214504750
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java ---
@@ -105,8 +112,15 @@ void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage) {
       LOGGER.info(
           "Local dictionary Fallback is initiated for column: " + this.columnName + " for page:"
               + encodedColumnPageList.size());
-      fallbackFutureQueue.add(fallbackExecutorService
-          .submit(new FallbackColumnPageEncoder(encodedColumnPage, encodedColumnPageList.size())));
+      if (isDecoderBasedFallBackEnabled) {
--- End diff --

Move this code to some private method and pass encodedColumnPage and pageIndex.


---


[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...

2018-08-31 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2662#discussion_r214504736
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java ---
@@ -86,7 +92,8 @@
* @param encodedColumnPage
* encoded column page
*/
-  void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage) {
+  void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage,
--- End diff --

Can you please update the method name to addEncodedColumnPage?


---


[GitHub] carbondata pull request #2614: [CARBONDATA-2837] Added MVExample in example ...

2018-08-31 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2614#discussion_r214504621
  
--- Diff: examples/spark2/pom.xml ---
@@ -49,6 +49,11 @@
       <artifactId>carbondata-store-sdk</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.apache.carbondata</groupId>
+      <artifactId>carbondata-mv-core</artifactId>
--- End diff --

Yes, profile added


---


[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-08-31 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2642
  
Build Failed with Spark 2.3, please check CI 
http://136.243.101.176:8080/job/ManualApacheCarbonPRBuilder2.1/176/ 


---


[GitHub] carbondata issue #2662: [WIP][CARBONDATA-2889]Add decoder based fallback mec...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2662
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/144/



---


[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2642
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/145/



---


[GitHub] carbondata issue #2662: [WIP][CARBONDATA-2889]Add decoder based fallback mec...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2662
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8215/



---


[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2642
  
Build Failed with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8216/



---


[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...

2018-08-31 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2662#discussion_r214426930
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/EncodedBlocklet.java ---
@@ -87,19 +91,24 @@ private void addPageMetadata(EncodedTablePage encodedTablePage) {
    * @param encodedTablePage
    * encoded table page
    */
-  private void addEncodedMeasurePage(EncodedTablePage encodedTablePage) {
+  private void addEncodedMeasurePage(EncodedTablePage encodedTablePage,
+      Map localDictionaryGeneratorMap) {
     // for first page create new list
     if (null == encodedMeasureColumnPages) {
       encodedMeasureColumnPages = new ArrayList<>();
       // adding measure pages
       for (int i = 0; i < encodedTablePage.getNumMeasures(); i++) {
-        BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null);
-        blockletEncodedColumnPage.addEncodedColumnColumnPage(encodedTablePage.getMeasure(i));
+        BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null,
+            Boolean.parseBoolean(CarbonProperties.getInstance()
--- End diff --

@xuchuanyin I have tested it and published the results. I think we can keep it as a property with the default set to true, as we are getting good results with respect to memory.


---


[GitHub] carbondata issue #2662: [WIP][CARBONDATA-2889]Add decoder based fallback mec...

2018-08-31 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2662
  
@kumarvishal09 @jackylk I have updated the PR description with the performance and memory report, and I have also published the results on the mailing list. Please have a look.


---


[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2654
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/143/



---


[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2654
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8214/



---


[GitHub] carbondata issue #2680: [CARBONDATA-2905] Set stream property for streaming ...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2680
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/140/



---


[GitHub] carbondata issue #2680: [CARBONDATA-2905] Set stream property for streaming ...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2680
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8211/



---


[GitHub] carbondata issue #2678: [WIP] Multi user support for SDK on S3

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2678
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/141/



---


[GitHub] carbondata issue #2678: [WIP] Multi user support for SDK on S3

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2678
  
Build Failed with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8212/



---


[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2654
  
@dhatchayani 
You can raise one more PR to improve the code in some places:
1. Unify the isScanRequired code in all the filter classes using an enum and a flag based on the min/max comparison
2. Create a new page wrapper that extends ColumnPageWrapper and sends the actual data for no dictionary primitive type columns


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214371546
  
--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
@@ -331,8 +332,18 @@ private BloomQueryModel buildQueryModelInternal(CarbonColumn carbonColumn,
       // for dictionary/date columns, convert the surrogate key to bytes
       internalFilterValue = CarbonUtil.getValueAsBytes(DataTypes.INT, convertedValue);
     } else {
-      // for non dictionary dimensions, is already bytes,
-      internalFilterValue = (byte[]) convertedValue;
+      // for non dictionary dimensions, numeric columns will be of original data,
+      // so convert the data to bytes
+      if (DataTypeUtil.isPrimitiveColumn(carbonColumn.getDataType())) {
+        if (convertedValue == null) {
--- End diff --

If possible, initialize and store the flag in the constructor and remove the `DataTypeUtil.isPrimitiveColumn` check wherever applicable in the code below.


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214361007
  
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -110,8 +112,19 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       boolean isDecoded = false;
       for (int i = 0; i < dimensionRawColumnChunk.getPagesCount(); i++) {
         if (dimensionRawColumnChunk.getMaxValues() != null) {
-          if (isScanRequired(dimensionRawColumnChunk.getMaxValues()[i],
-              dimensionRawColumnChunk.getMinValues()[i], dimColumnExecuterInfo.getFilterKeys())) {
+          boolean scanRequired;
+          // for no dictionary measure column comparison can be done
+          // on the original data as like measure column
+          if (DataTypeUtil.isPrimitiveColumn(dimColumnEvaluatorInfo.getDimension().getDataType())
+              && !dimColumnEvaluatorInfo.getDimension().hasEncoding(Encoding.DICTIONARY)) {
+            scanRequired = isScanRequired(dimensionRawColumnChunk.getMaxValues()[i],
--- End diff --

You can create an `isPrimitiveNoDictionaryColumn` flag and check `DataTypeUtil.isPrimitiveColumn` in the constructor. This will avoid the check for every page. Do this for all the filters.
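
A small sketch of the flag-hoisting idea (hypothetical class; the data-type and encoding checks are reduced to booleans so the sketch stays self-contained):

public class FilterFlagSketch {
  // computed once in the constructor, standing in for
  // DataTypeUtil.isPrimitiveColumn(dataType) && !hasEncoding(DICTIONARY)
  private final boolean isPrimitiveNoDictionaryColumn;

  public FilterFlagSketch(boolean isPrimitiveType, boolean hasDictionaryEncoding) {
    this.isPrimitiveNoDictionaryColumn = isPrimitiveType && !hasDictionaryEncoding;
  }

  String comparisonPath() {
    // the per-page branch now reads a cached boolean instead of
    // re-evaluating the data type for every page
    return isPrimitiveNoDictionaryColumn
        ? "typed min/max comparison on original data"
        : "byte[] min/max comparison";
  }

  public static void main(String[] args) {
    FilterFlagSketch filter = new FilterFlagSketch(true, false);
    for (int page = 0; page < 3; page++) {
      System.out.println("page " + page + ": " + filter.comparisonPath());
    }
  }
}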


---


[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-08-31 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/carbondata/pull/2642
  
@ravipesala please retrigger the 2.3 build; the test case issues are fixed.


---


[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2642
  
Build Failed with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8210/



---


[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2642
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/139/



---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214356896
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java ---
@@ -239,12 +239,25 @@ private boolean isEncodedWithMeta(DataChunk2 pageMetadata) {
   protected DimensionColumnPage decodeDimension(DimensionRawColumnChunk rawColumnPage,
       ByteBuffer pageData, DataChunk2 pageMetadata, int offset)
       throws IOException, MemoryException {
+    List<Encoding> encodings = pageMetadata.getEncoders();
     if (isEncodedWithMeta(pageMetadata)) {
       ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, pageData, offset,
           null != rawColumnPage.getLocalDictionary());
       decodedPage.setNullBits(QueryUtil.getNullBitSet(pageMetadata.presence));
-      return new ColumnPageWrapper(decodedPage, rawColumnPage.getLocalDictionary(),
-          isEncodedWithAdaptiveMeta(pageMetadata));
+      int[] invertedIndexes = new int[0];
--- End diff --

Add a comment to explain that this scenario handles no dictionary primitive type columns, for which an inverted index can be created on row ids during data load.


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214354541
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java ---
@@ -363,7 +398,16 @@ public EncodedTablePage getEncodedTablePage() {
           columnPageEncoder = encodingFactory.createEncoder(
               spec,
               noDictDimensionPages[noDictIndex]);
-          encodedPage = columnPageEncoder.encode(noDictDimensionPages[noDictIndex++]);
+          encodedPage = columnPageEncoder.encode(noDictDimensionPages[noDictIndex]);
+          DataType targetDataType =
+              columnPageEncoder.getTargetDataType(noDictDimensionPages[noDictIndex]);
+          if (null != targetDataType) {
+            LOGGER.info("Encoder result ---> Source data type: " + noDictDimensionPages[noDictIndex]
--- End diff --

Make this log statement debug level and guard it with an isDebugEnabled check.


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214352965
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java ---
@@ -346,12 +371,21 @@ static ColumnPageCodec selectCodecByAlgorithmForDecimal(SimpleStatsResult stats,
       // no effect to use adaptive or delta, use compression only
       return new DirectCompressCodec(stats.getDataType());
     }
+    boolean isSort = false;
+    boolean isInvertedIndex = false;
+    if (columnSpec instanceof TableSpec.DimensionSpec
+        && columnSpec.getColumnType() != ColumnType.COMPLEX_PRIMITIVE) {
+      isSort = ((TableSpec.DimensionSpec) columnSpec).isInSortColumns();
+      isInvertedIndex = isSort && ((TableSpec.DimensionSpec) columnSpec).isDoInvertedIndex();
+    }
--- End diff --

Extract the above changes into one method, since the same code is used in the places above as well, and then call that method while creating the encoding type.


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214351650
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/TableSpec.java ---
@@ -91,6 +92,30 @@ private void addMeasures(List measures) {
 }
   }
 
+  /**
+   * No dictionary and complex dimensions of the table
+   *
+   * @return
+   */
+  public DimensionSpec[] getNoDictAndComplexDimensions() {
+    List<Integer> noDicOrCompIndexes = new ArrayList<>(dimensionSpec.length);
+    int noDicCount = 0;
+    for (int i = 0; i < dimensionSpec.length; i++) {
+      if (dimensionSpec[i].getColumnType() == ColumnType.PLAIN_VALUE
+          || dimensionSpec[i].getColumnType() == ColumnType.COMPLEX_PRIMITIVE
+          || dimensionSpec[i].getColumnType() == ColumnType.COMPLEX) {
+        noDicOrCompIndexes.add(i);
+        noDicCount++;
+      }
+    }
+
+    DimensionSpec[] dims = new DimensionSpec[noDicCount];
+    for (int i = 0; i < dims.length; i++) {
+      dims[i] = dimensionSpec[noDicOrCompIndexes.get(i)];
+    }
+    return dims;
--- End diff --

Avoid the second for loop in this method.


---


[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2654
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/138/



---


[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2654
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8209/



---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214338168
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java ---
@@ -375,6 +454,47 @@ public void writeRawRowAsIntermediateSortTempRowToOutputStream(Object[] row,
     outputStream.write(rowBuffer.array(), 0, packSize);
   }
 
+  /**
+   * Write the data to stream
+   *
+   * @param data
+   * @param outputStream
+   * @param idx
+   * @throws IOException
+   */
+  private void writeDataToStream(Object data, DataOutputStream outputStream, int idx)
+      throws IOException {
+    DataType dataType = noDicSortDataTypes[idx];
+    if (null == data) {
+      outputStream.writeBoolean(false);
+      return;
--- End diff --

Do not use an early return statement; structure the if-else block properly instead.
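
A sketch of the restructured method (simplified to two hypothetical types; the point is the single if/else with no early return):

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WriteDataSketch {
  // a sketch of writeDataToStream restructured as one if/else, no early return
  static void writeDataToStream(Object data, DataOutputStream outputStream)
      throws IOException {
    if (null == data) {
      outputStream.writeBoolean(false);   // null marker only
    } else {
      outputStream.writeBoolean(true);    // presence marker, then the value
      if (data instanceof Integer) {
        outputStream.writeInt((Integer) data);
      } else if (data instanceof Long) {
        outputStream.writeLong((Long) data);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      writeDataToStream(null, out);
      writeDataToStream(42, out);
    }
    System.out.println(buffer.size() + " bytes written"); // 1 + (1 + 4)
  }
}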


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214336180
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java ---
@@ -224,10 +237,15 @@ public IntermediateSortTempRow readWithNoSortFieldConvert(
 
     // read no-dict & sort data
     for (int idx = 0; idx < this.noDictSortDimCnt; idx++) {
-      short len = inputStream.readShort();
-      byte[] bytes = new byte[len];
-      inputStream.readFully(bytes);
-      noDictSortDims[idx] = bytes;
+      // for no dict measure column get the original data
+      if (DataTypeUtil.isPrimitiveColumn(noDicSortDataTypes[idx])) {
+        noDictSortDims[idx] = readDataFromStream(inputStream, idx);
+      } else {
+        short len = inputStream.readShort();
+        byte[] bytes = new byte[len];
+        inputStream.readFully(bytes);
+        noDictSortDims[idx] = bytes;
+      }
--- End diff --

There is similar code above as well; refactor it into one method and call it from both places.


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214341633
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/partition/impl/RawRowComparator.java ---
@@ -30,24 +33,39 @@
 public class RawRowComparator implements Comparator<CarbonRow> {
   private int[] sortColumnIndices;
   private boolean[] isSortColumnNoDict;
+  private DataType[] noDicDataTypes;
 
-  public RawRowComparator(int[] sortColumnIndices, boolean[] isSortColumnNoDict) {
+  public RawRowComparator(int[] sortColumnIndices, boolean[] isSortColumnNoDict,
+      DataType[] noDicDataTypes) {
     this.sortColumnIndices = sortColumnIndices;
     this.isSortColumnNoDict = isSortColumnNoDict;
+    this.noDicDataTypes = noDicDataTypes;
   }
 
   @Override
   public int compare(CarbonRow o1, CarbonRow o2) {
     int diff = 0;
     int i = 0;
+    int noDicIdx = 0;
     for (int colIdx : sortColumnIndices) {
       if (isSortColumnNoDict[i]) {
-        byte[] colA = (byte[]) o1.getObject(colIdx);
-        byte[] colB = (byte[]) o2.getObject(colIdx);
-        diff = UnsafeComparer.INSTANCE.compareTo(colA, colB);
-        if (diff != 0) {
-          return diff;
+        if (DataTypeUtil.isPrimitiveColumn(noDicDataTypes[noDicIdx])) {
+          // for no dictionary numeric column get comparator based on the data type
+          SerializableComparator comparator = org.apache.carbondata.core.util.comparator.Comparator
--- End diff --

Increment `noDicIdx` in the if block and remove it from the method end.
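
A condensed, hypothetical sketch of the comparison dispatch under review (byte[] columns compare lexicographically, standing in for UnsafeComparer; primitive no-dictionary columns compare on the typed value):

import java.util.Arrays;
import java.util.Comparator;

public class RowComparatorSketch implements Comparator<Object[]> {
  private final boolean[] isPrimitiveNoDict; // one flag per sort column

  public RowComparatorSketch(boolean[] isPrimitiveNoDict) {
    this.isPrimitiveNoDict = isPrimitiveNoDict;
  }

  @Override
  @SuppressWarnings("unchecked")
  public int compare(Object[] rowA, Object[] rowB) {
    for (int i = 0; i < isPrimitiveNoDict.length; i++) {
      int diff;
      if (isPrimitiveNoDict[i]) {
        // typed comparison, standing in for the data-type based SerializableComparator
        diff = ((Comparable<Object>) rowA[i]).compareTo(rowB[i]);
      } else {
        // lexicographic byte[] comparison, standing in for UnsafeComparer
        diff = Arrays.compare((byte[]) rowA[i], (byte[]) rowB[i]);
      }
      if (diff != 0) {
        return diff;
      }
    }
    return 0;
  }

  public static void main(String[] args) {
    RowComparatorSketch cmp = new RowComparatorSketch(new boolean[] {true});
    System.out.println(cmp.compare(new Object[] {5}, new Object[] {9})); // negative
  }
}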


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214341135
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/IntermediateSortTempRowComparator.java ---
@@ -45,18 +52,31 @@ public int compare(IntermediateSortTempRow rowA, IntermediateSortTempRow rowB) {
     int diff = 0;
     int dictIndex = 0;
     int nonDictIndex = 0;
+    int noDicTypeIdx = 0;
 
     for (boolean isNoDictionary : isSortColumnNoDictionary) {
 
       if (isNoDictionary) {
-        byte[] byteArr1 = rowA.getNoDictSortDims()[nonDictIndex];
-        byte[] byteArr2 = rowB.getNoDictSortDims()[nonDictIndex];
-        nonDictIndex++;
+        if (DataTypeUtil.isPrimitiveColumn(noDicSortDataTypes[noDicTypeIdx])) {
+          // use data types based comparator for the no dictionary measure columns
+          SerializableComparator comparator = org.apache.carbondata.core.util.comparator.Comparator
--- End diff --

Increment the no dictionary type index `noDicTypeIdx` in the if block and not at the end.


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214341442
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/NewRowComparator.java ---
@@ -43,15 +53,31 @@ public NewRowComparator(boolean[] noDictionarySortColumnMaping) {
   public int compare(Object[] rowA, Object[] rowB) {
     int diff = 0;
     int index = 0;
+    int dataTypeIdx = 0;
+    int noDicSortIdx = 0;
 
-    for (boolean isNoDictionary : noDictionarySortColumnMaping) {
-      if (isNoDictionary) {
-        byte[] byteArr1 = (byte[]) rowA[index];
-        byte[] byteArr2 = (byte[]) rowB[index];
+    for (int i = 0; i < noDicDimColMapping.length; i++) {
+      if (noDicDimColMapping[i]) {
+        if (noDicSortColumnMapping[noDicSortIdx++]) {
+          if (DataTypeUtil.isPrimitiveColumn(noDicDataTypes[dataTypeIdx])) {
+            // use data types based comparator for the no dictionary measure columns
+            SerializableComparator comparator =
--- End diff --

Increment `dataTypeIdx` in the if block and remove it from the method end.


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214337384
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java ---
@@ -359,9 +433,14 @@ public void writeRawRowAsIntermediateSortTempRowToOutputStream(Object[] row,
 
     // write no-dict & sort
     for (int idx = 0; idx < this.noDictSortDimCnt; idx++) {
-      byte[] bytes = (byte[]) row[this.noDictSortDimIdx[idx]];
-      outputStream.writeShort(bytes.length);
-      outputStream.write(bytes);
+      if (DataTypeUtil.isPrimitiveColumn(noDicSortDataTypes[idx])) {
--- End diff --

I can see that DataTypeUtil.isPrimitiveColumn is used in multiple places for every row. Please check the load performance impact of this.


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214341815
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/comparator/UnsafeRowComparator.java ---
@@ -60,26 +64,50 @@ public int compare(UnsafeCarbonRow rowL, Object baseObjectL, UnsafeCarbonRow row
       if (isNoDictionary) {
         short lengthA = CarbonUnsafe.getUnsafe().getShort(baseObjectL,
             rowA + dictSizeInMemory + sizeInNonDictPartA);
-        byte[] byteArr1 = new byte[lengthA];
         sizeInNonDictPartA += 2;
-        CarbonUnsafe.getUnsafe()
-            .copyMemory(baseObjectL, rowA + dictSizeInMemory + sizeInNonDictPartA,
-                byteArr1, CarbonUnsafe.BYTE_ARRAY_OFFSET, lengthA);
-        sizeInNonDictPartA += lengthA;
-
         short lengthB = CarbonUnsafe.getUnsafe().getShort(baseObjectR,
             rowB + dictSizeInMemory + sizeInNonDictPartB);
-        byte[] byteArr2 = new byte[lengthB];
         sizeInNonDictPartB += 2;
-        CarbonUnsafe.getUnsafe()
-            .copyMemory(baseObjectR, rowB + dictSizeInMemory + sizeInNonDictPartB,
-                byteArr2, CarbonUnsafe.BYTE_ARRAY_OFFSET, lengthB);
-        sizeInNonDictPartB += lengthB;
+        DataType dataType = tableFieldStat.getNoDicSortDataType()[noDicSortIdx];
+        if (DataTypeUtil.isPrimitiveColumn(dataType)) {
+          Object data1 = null;
--- End diff --

Increment `noDicSortIdx` in the if block and remove it from the method end.


---


[GitHub] carbondata issue #2638: [CARBONDATA-2859][SDV] Add sdv test cases for bloomf...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2638
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/136/



---


[GitHub] carbondata issue #2638: [CARBONDATA-2859][SDV] Add sdv test cases for bloomf...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2638
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8207/



---


[GitHub] carbondata issue #2672: [HOTFIX] improve sdk multi-thread performance

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2672
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/135/



---


[GitHub] carbondata issue #2672: [HOTFIX] improve sdk multi-thread performance

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2672
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8206/



---


[GitHub] carbondata issue #2680: [CARBONDATA-2905] Set stream property for streaming ...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2680
  
Build Failed with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8205/



---


[GitHub] carbondata issue #2679: WIP: [CARBONDATA-2904] Support minmax datamap for ex...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2679
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/133/



---


[GitHub] carbondata issue #2680: [CARBONDATA-2905] Set stream property for streaming ...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2680
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/134/



---


[GitHub] carbondata issue #2679: WIP: [CARBONDATA-2904] Support minmax datamap for ex...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2679
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8204/



---


[GitHub] carbondata issue #2638: [CARBONDATA-2859][SDV] Add sdv test cases for bloomf...

2018-08-31 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2638
  
retest this please


---


[GitHub] carbondata pull request #2676: [CARBONDATA-2902][DataMap] Fix showing negati...

2018-08-31 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2676#discussion_r214320712
  
--- Diff: core/src/main/java/org/apache/carbondata/core/profiler/TablePruningInfo.java ---
@@ -99,4 +107,44 @@ public String toString() {
     }
     return builder.toString();
   }
+
+  /**
+   * when CACHE_LEVEL = BLOCK or carbon data file is LegacyStore
+   * only show pruned result size of datamaps in block/blocklet level
+   */
+  private String getHitInfoAfterPruning() {
+    StringBuilder builder = new StringBuilder();
+    builder
+        .append(" - total blocks: ").append(totalBlocklets).append("\n")
+        .append(" - filter: ").append(filterStatement).append("\n");
+    if (defaultDataMap != null) {
+      builder
+          .append(" - pruned by Main DataMap").append("\n")
+          .append("- hit blocks: ").append(numBlockletsAfterDefaultPruning).append("\n");
--- End diff --

It is better to unify this with the "blocklet case"; you can change it to use "hit blocklets" in `getSkipBlockletInfoAfterPruning`.


---


[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2654
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/132/



---


[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2654
  
Build Failed with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8203/



---


[GitHub] carbondata issue #2644: [CARBONDATA-2853] Implement file-level min/max index...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2644
  
@QiangCai In general I can see that you have put empty lines in many places in the code. Please remove those empty lines everywhere and add some code comments for better understanding.


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214310465
  
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java ---
@@ -96,14 +96,35 @@ private static FileFooter3 getFileFooter3(List infoList,
     return footer;
   }
 
-  public static BlockletIndex getBlockletIndex(
-      org.apache.carbondata.core.metadata.blocklet.index.BlockletIndex info) {
+  public static org.apache.carbondata.core.metadata.blocklet.index.BlockletMinMaxIndex
+      convertExternalMinMaxIndex(BlockletMinMaxIndex minMaxIndex) {
--- End diff --

Please add a method comment to explain the meaning of convertExternalMinMaxIndex.


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214303472
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/StreamDataMap.java ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datamap;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.reader.CarbonIndexFileReader;
+import org.apache.carbondata.core.scan.filter.FilterUtil;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecuter;
+import org.apache.carbondata.core.scan.filter.executer.ImplicitColumnFilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.util.CarbonMetadataUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.format.BlockIndex;
+
+@InterfaceAudience.Internal
+public class StreamDataMap {
+
+  private CarbonTable carbonTable;
+
+  private AbsoluteTableIdentifier identifier;
--- End diff --

If carbonTable is being stored, then there is no need to store the identifier as well; you can get it from carbonTable.


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214307411
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/StreamDataMap.java ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datamap;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.reader.CarbonIndexFileReader;
+import org.apache.carbondata.core.scan.filter.FilterUtil;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecuter;
+import org.apache.carbondata.core.scan.filter.executer.ImplicitColumnFilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.util.CarbonMetadataUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.format.BlockIndex;
+
+@InterfaceAudience.Internal
+public class StreamDataMap {
+
+  private CarbonTable carbonTable;
+
+  private AbsoluteTableIdentifier identifier;
+
+  private FilterExecuter filterExecuter;
+
+  public StreamDataMap(CarbonTable carbonTable) {
+    this.carbonTable = carbonTable;
+    this.identifier = carbonTable.getAbsoluteTableIdentifier();
+  }
+
+  public void init(FilterResolverIntf filterExp) {
+    if (filterExp != null) {
+
+      List<CarbonColumn> minMaxCacheColumns = new ArrayList<>();
+      for (CarbonDimension dimension : carbonTable.getDimensions()) {
+        if (!dimension.isComplex()) {
+          minMaxCacheColumns.add(dimension);
+        }
+      }
+      minMaxCacheColumns.addAll(carbonTable.getMeasures());
+
+      List<ColumnSchema> listOfColumns =
+          carbonTable.getTableInfo().getFactTable().getListOfColumns();
+      int[] columnCardinality = new int[listOfColumns.size()];
+      for (int index = 0; index < columnCardinality.length; index++) {
+        columnCardinality[index] = Integer.MAX_VALUE;
+      }
+
+      SegmentProperties segmentProperties =
+          new SegmentProperties(listOfColumns, columnCardinality);
+
+      filterExecuter = FilterUtil.getFilterExecuterTree(
+          filterExp, segmentProperties, null, minMaxCacheColumns);
+    }
+  }
+
+  public List<StreamFile> prune(List<Segment> segments) throws IOException {
+    if (filterExecuter == null) {
+      return listAllStreamFiles(segments, false);
+    } else {
+      List<StreamFile> streamFileList = new ArrayList<>();
+      for (StreamFile streamFile : listAllStreamFiles(segments, true)) {
+        if (isScanRequire(streamFile)) {
+          streamFileList.add(streamFile);
+          streamFile.setMinMaxIndex(null);
+        }
+      }
+      return streamFileList;
+    }
+  }
+
+  private boolean isScanRequire(StreamFile streamFile) {
+    // backward compatibility, old stream file without min/max index
+    if (streamFile.getMinMaxIndex() == null) {
+      return true;
+    }
+
+    byte[][] maxValue = streamFile.getMinMaxIndex().getMaxValues();
+    byte[][] minValue = streamFile.getMinMaxIndex().getMinValues();

---

[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214313170
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/StreamHandoffRDD.scala ---
@@ -205,8 +205,9 @@ class StreamHandoffRDD[K, V](
     segmentList.add(Segment.toSegment(handOffSegmentId, null))
     val splits = inputFormat.getSplitsOfStreaming(
       job,
-      carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.getAbsoluteTableIdentifier,
-      segmentList
+      segmentList,
+      carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
+      null
--- End diff --

Once you add the overloaded method as explained in the comment above, you can call the method with 3 arguments from here.


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214311953
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
@@ -342,60 +341,52 @@ public void refreshSegmentCacheIfRequired(JobContext job, CarbonTable carbonTabl
   /**
    * use file list in .carbonindex file to get the split of streaming.
    */
-  public List<InputSplit> getSplitsOfStreaming(JobContext job, AbsoluteTableIdentifier identifier,
-      List<Segment> streamSegments) throws IOException {
+  public List<InputSplit> getSplitsOfStreaming(JobContext job, List<Segment> streamSegments,
+      CarbonTable carbonTable, FilterResolverIntf filterResolverIntf) throws IOException {
     List<InputSplit> splits = new ArrayList<InputSplit>();
     if (streamSegments != null && !streamSegments.isEmpty()) {
       numStreamSegments = streamSegments.size();
       long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
       long maxSize = getMaxSplitSize(job);
-      for (Segment segment : streamSegments) {
-        String segmentDir =
-            CarbonTablePath.getSegmentPath(identifier.getTablePath(), segment.getSegmentNo());
-        FileFactory.FileType fileType = FileFactory.getFileType(segmentDir);
-        if (FileFactory.isFileExist(segmentDir, fileType)) {
-          SegmentIndexFileStore segmentIndexFileStore = new SegmentIndexFileStore();
-          segmentIndexFileStore.readAllIIndexOfSegment(segmentDir);
-          Map<String, byte[]> carbonIndexMap = segmentIndexFileStore.getCarbonIndexMap();
-          CarbonIndexFileReader indexReader = new CarbonIndexFileReader();
-          for (byte[] fileData : carbonIndexMap.values()) {
-            indexReader.openThriftReader(fileData);
-            try {
-              // map block index
-              while (indexReader.hasNext()) {
-                BlockIndex blockIndex = indexReader.readBlockIndexInfo();
-                String filePath = segmentDir + File.separator + blockIndex.getFile_name();
-                Path path = new Path(filePath);
-                long length = blockIndex.getFile_size();
-                if (length != 0) {
-                  BlockLocation[] blkLocations;
-                  FileSystem fs = FileFactory.getFileSystem(path);
-                  FileStatus file = fs.getFileStatus(path);
-                  blkLocations = fs.getFileBlockLocations(path, 0, length);
-                  long blockSize = file.getBlockSize();
-                  long splitSize = computeSplitSize(blockSize, minSize, maxSize);
-                  long bytesRemaining = length;
-                  while (((double) bytesRemaining) / splitSize > 1.1) {
-                    int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
-                    splits.add(makeSplit(segment.getSegmentNo(), path, length - bytesRemaining,
-                        splitSize, blkLocations[blkIndex].getHosts(),
-                        blkLocations[blkIndex].getCachedHosts(), FileFormat.ROW_V1));
-                    bytesRemaining -= splitSize;
-                  }
-                  if (bytesRemaining != 0) {
-                    int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
-                    splits.add(makeSplit(segment.getSegmentNo(), path, length - bytesRemaining,
-                        bytesRemaining, blkLocations[blkIndex].getHosts(),
-                        blkLocations[blkIndex].getCachedHosts(), FileFormat.ROW_V1));
-                  }
-                } else {
-                  //Create empty hosts array for zero length files
-                  splits.add(makeSplit(segment.getSegmentNo(), path, 0, length, new String[0],
-                      FileFormat.ROW_V1));
-                }
-              }
-            } finally {
-              indexReader.closeThriftReader();
+
+      if (filterResolverIntf == null) {
+        if (carbonTable != null) {
+          Expression filter = getFilterPredicates(job.getConfiguration());
+          if (filter != null) {
+            carbonTable.processFilterExpression(filter, null, null);
+            filterResolverIntf = carbonTable.resolveFilter(filter);
+          }
+        }
+      }
+      StreamDataMap streamDataMap =
+          DataMapStoreManager.getInstance().getStreamDataMap(carbonTable);
+      streamDataMap.init(filterResolverIntf);
+      List<StreamFile> streamFiles = streamDataMap.prune(streamSegments);
+      for (StreamFile streamFile : streamFiles) {
+        if (FileFactory.isFileExist(streamFile.getFilePath())) {
+          Path path = new Path(streamFile.getFilePath());
+          long length = streamFile.getFileSize();
+

---

[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214311329
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
@@ -342,60 +341,52 @@ public void refreshSegmentCacheIfRequired(JobContext job, CarbonTable carbonTabl
   /**
    * use file list in .carbonindex file to get the split of streaming.
    */
-  public List<InputSplit> getSplitsOfStreaming(JobContext job, AbsoluteTableIdentifier identifier,
-      List<Segment> streamSegments) throws IOException {
+  public List<InputSplit> getSplitsOfStreaming(JobContext job, List<Segment> streamSegments,
+      CarbonTable carbonTable, FilterResolverIntf filterResolverIntf) throws IOException {
--- End diff --

You can write an overloaded method for getSplitsOfStreaming: one which accepts 3 parameters and one with 4 parameters.
1. getSplitsOfStreaming(JobContext job, AbsoluteTableIdentifier identifier, List<Segment> streamSegments)
-- From this method you can call the other method and pass null as the 4th argument. This will avoid passing null at all the call sites above.
2. getSplitsOfStreaming(JobContext job, List<Segment> streamSegments, CarbonTable carbonTable, FilterResolverIntf filterResolverIntf)
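
A sketch of the suggested overload pair (the CarbonData types are reduced to placeholders here just to show the delegation):

import java.util.ArrayList;
import java.util.List;

public class SplitsOverloadSketch {
  // 3-argument convenience overload: existing call sites stay unchanged
  // and no longer need to pass null themselves
  List<String> getSplitsOfStreaming(String job, String identifier, List<String> segments) {
    return getSplitsOfStreaming(job, segments, identifier, null);
  }

  // 4-argument overload: the filter is used for pruning only when supplied
  List<String> getSplitsOfStreaming(String job, List<String> segments,
      String table, Object filterResolver) {
    List<String> splits = new ArrayList<>(segments); // placeholder pruning
    return splits;
  }

  public static void main(String[] args) {
    SplitsOverloadSketch format = new SplitsOverloadSketch();
    List<String> segments = List.of("segment_0", "segment_1");
    System.out.println(format.getSplitsOfStreaming("job", "table", segments));
  }
}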


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214305126
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/StreamDataMap.java ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datamap;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.reader.CarbonIndexFileReader;
+import org.apache.carbondata.core.scan.filter.FilterUtil;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecuter;
+import org.apache.carbondata.core.scan.filter.executer.ImplicitColumnFilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.util.CarbonMetadataUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.format.BlockIndex;
+
+@InterfaceAudience.Internal
+public class StreamDataMap {
--- End diff --

Please check the feasibility of extending the DataMap interface and 
implementing all its methods, to keep this similar to BlockDataMap. I 
think it should be feasible.


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214316607
  
--- Diff: 
streaming/src/main/java/org/apache/carbondata/streaming/CarbonStreamRecordWriter.java
 ---
@@ -212,9 +271,13 @@ private void initializeAtFirstRow() throws IOException, InterruptedException {
         byte[] col = (byte[]) columnValue;
         output.writeShort(col.length);
         output.writeBytes(col);
+        dimensionStatsCollectors[dimCount].update(col);
       } else {
         output.writeInt((int) columnValue);
+        dimensionStatsCollectors[dimCount].update(ByteUtil.toBytes((int) columnValue));
--- End diff --

For the min/max comparison you are converting from int to byte array 
for every row, which can hurt write performance. Instead, typecast to 
int and compare the primitive values. After all the data is loaded, 
convert the final min/max values into byte arrays based on the 
datatype, so there is only one conversion per column. A sketch of this 
pattern follows.
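
A rough sketch of that pattern (IntStatsCollector is a hypothetical 
name; ByteUtil.toBytes is the same conversion the diff already uses):

    import org.apache.carbondata.core.util.ByteUtil;

    // Hypothetical per-column collector: primitive comparisons per row,
    // a single byte[] conversion when the load finishes.
    class IntStatsCollector {
      private int min = Integer.MAX_VALUE;
      private int max = Integer.MIN_VALUE;

      void update(int value) {
        if (value < min) min = value;   // no byte[] allocation per row
        if (value > max) max = value;
      }

      byte[] getMinBytes() { return ByteUtil.toBytes(min); }  // once, at the end
      byte[] getMaxBytes() { return ByteUtil.toBytes(max); }  // once, at the end
    }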


---


[GitHub] carbondata pull request #2672: [HOTFIX] improve sdk multi-thread performance

2018-08-31 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2672#discussion_r214318648
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/steps/InputProcessorStepWithNoConverterImpl.java
 ---
@@ -64,10 +63,13 @@
 
   private Map dataFieldsWithComplexDataType;
 
+  private short sdkUserCore;
--- End diff --

done.


---


[GitHub] carbondata pull request #2676: [CARBONDATA-2902][DataMap] Fix showing negati...

2018-08-31 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2676#discussion_r214317180
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/profiler/ExplainCollector.java ---
@@ -125,6 +125,13 @@ public static void addTotalBlocklets(int numBlocklets) {
     }
   }
 
+  public static void setDefaultDMBlockLevel(boolean isBlockLevel) {
--- End diff --

please add comment
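
For example, the requested comment could read roughly as follows (the 
wording is a guess at the method's intent, based on this PR's fix for 
negative pruning results):

    /**
     * Sets whether the default datamap prunes at block level (true) or at
     * blocklet level (false), so pruning results are reported in a
     * consistent unit and negative counts are avoided.
     */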


---


[GitHub] carbondata pull request #2680: [CARBONDATA-2905] Set stream property for str...

2018-08-31 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/carbondata/pull/2680

[CARBONDATA-2905] Set stream property for streaming table

For a streaming table with table property "streaming"="true", we should 
allow setting the streaming property to false. After it is set to 
false, only batch segments will be queried.

 - [X] Any interfaces changed?
 No
 - [X] Any backward compatibility impacted?
 No
 - [X] Document update required?
No
 - [X] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
 rerun all test  
 - [X] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata 
set_stream_property

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2680


commit dec6a565dd9ac7872c1baf03058a83a9cdae3ee5
Author: Jacky Li 
Date:   2018-08-31T10:51:25Z

set stream property




---


[jira] [Created] (CARBONDATA-2905) Should allow set stream property on streaming table

2018-08-31 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-2905:


 Summary: Should allow set stream property on streaming table
 Key: CARBONDATA-2905
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2905
 Project: CarbonData
  Issue Type: Improvement
Reporter: Jacky Li
Assignee: Jacky Li
 Fix For: 1.5.0


For a streaming table with table property "streaming"="true", we should 
allow setting the streaming table property to false.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2672: [HOTFIX] improve sdk multi-thread performance

2018-08-31 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2672#discussion_r214314596
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
 ---
@@ -460,27 +461,29 @@ public CarbonLoadModel getLoadModel() {
 
 private CarbonOutputIteratorWrapper[] iterators;
 
-private int counter;
+private AtomicLong counter;
--- End diff --

done


---


[GitHub] carbondata pull request #2672: [HOTFIX] improve sdk multi-thread performance

2018-08-31 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2672#discussion_r214314587
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
 ---
@@ -460,27 +461,29 @@ public CarbonLoadModel getLoadModel() {
 
 private CarbonOutputIteratorWrapper[] iterators;
 
-private int counter;
+private AtomicLong counter;
 
     CarbonMultiRecordWriter(CarbonOutputIteratorWrapper[] iterators,
         DataLoadExecutor dataLoadExecutor, CarbonLoadModel loadModel, Future future,
         ExecutorService executorService) {
       super(null, dataLoadExecutor, loadModel, future, executorService);
       this.iterators = iterators;
+      counter = new AtomicLong(0);
     }
 
-    @Override public synchronized void write(NullWritable aVoid, ObjectArrayWritable objects)
+    @Override public void write(NullWritable aVoid, ObjectArrayWritable objects)
         throws InterruptedException {
-      iterators[counter].write(objects.get());
-      if (++counter == iterators.length) {
-        //round robin reset
-        counter = 0;
+      int hash = (int) (counter.incrementAndGet() % iterators.length);
--- End diff --

done


---


[GitHub] carbondata pull request #2672: [HOTFIX] improve sdk multi-thread performance

2018-08-31 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2672#discussion_r214313919
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
 ---
@@ -460,27 +461,29 @@ public CarbonLoadModel getLoadModel() {
 
 private CarbonOutputIteratorWrapper[] iterators;
 
-private int counter;
+private AtomicLong counter;
 
     CarbonMultiRecordWriter(CarbonOutputIteratorWrapper[] iterators,
         DataLoadExecutor dataLoadExecutor, CarbonLoadModel loadModel, Future future,
         ExecutorService executorService) {
       super(null, dataLoadExecutor, loadModel, future, executorService);
       this.iterators = iterators;
+      counter = new AtomicLong(0);
     }
 
-    @Override public synchronized void write(NullWritable aVoid, ObjectArrayWritable objects)
+    @Override public void write(NullWritable aVoid, ObjectArrayWritable objects)
         throws InterruptedException {
-      iterators[counter].write(objects.get());
-      if (++counter == iterators.length) {
-        //round robin reset
-        counter = 0;
+      int hash = (int) (counter.incrementAndGet() % iterators.length);
--- End diff --

If the counter is an int and write is called more than INT_MAX times, 
the increment overflows and gives negative results, so a long is used 
to cover very large record counts.

And since the result of long % int always fits in an int, the cast is 
safe.


https://stackoverflow.com/questions/7262133/will-a-long-int-will-always-fit-into-an-int
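
Putting the hunks together, the lock-free round-robin dispatch looks 
like this (a sketch assembled from the diff above; the final write line 
is an assumption, since the hunk is cut off):

    private final AtomicLong counter = new AtomicLong(0);

    @Override public void write(NullWritable aVoid, ObjectArrayWritable objects)
        throws InterruptedException {
      // the long counter cannot realistically overflow, and long % int
      // always fits in an int, so the cast to int is safe
      int hash = (int) (counter.incrementAndGet() % iterators.length);
      iterators[hash].write(objects.get());
    }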



---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread dhatchayani
Github user dhatchayani commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214313592
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java
 ---
@@ -240,8 +258,44 @@ public IntermediateSortTempRow readWithNoSortFieldConvert(
     return new IntermediateSortTempRow(dictSortDims, noDictSortDims, measure);
   }
 
+  /**
+   * Read the data from the stream
+   *
+   * @param inputStream
+   * @param idx
+   * @return
+   * @throws IOException
+   */
+  private Object readDataFromStream(DataInputStream inputStream, int idx) throws IOException {
--- End diff --

For measures, the value is always packed into and unpacked from a 
ByteBuffer.
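
As a generic illustration of that packing with plain java.nio (not the 
project's code):

    import java.nio.ByteBuffer;

    public class MeasurePackDemo {
      public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(Double.BYTES);
        buffer.putDouble(3.14);                   // pack the measure value
        buffer.flip();                            // switch to reading
        System.out.println(buffer.getDouble());   // unpack by datatype: 3.14
      }
    }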


---


[GitHub] carbondata pull request #2679: WIP: [CARBONDATA-2904] Support minmax datamap...

2018-08-31 Thread xuchuanyin
GitHub user xuchuanyin opened a pull request:

https://github.com/apache/carbondata/pull/2679

WIP: [CARBONDATA-2904] Support minmax datamap for external format table

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata ef_index_dm_minmax

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2679.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2679


commit 8c0e84804c266c56cada024384d9ab2eaa89e9f2
Author: xuchuanyin 
Date:   2018-08-20T01:38:12Z

Support building file-level index for external format table

+ support directly generating a file-level index
+ support creating and generating a file index on existing data
+ We flatten the input files recursively and remove duplicated
input files in one load

The folder structure of the index file looks like below:

${datamap_name}/${segment_name}/File_level_${fact_file1_path_with_base64_encoding}/${column_name}.bloomindex
 
../File_level_${fact_file2_path_with_base64_encoding}/${column_name}.bloomindex

Note that in this commit, the index datamap is not used during query.

commit 30a861a92ff53df8befd14ee48b7a37499ab7c96
Author: xuchuanyin 
Date:   2018-08-25T10:49:43Z

Support querying external format using bloomfilter datamaps

support querying external format using bloomfilter datamap

commit 0664a1abd19e97cbe09920635a00619c945f0a20
Author: xuchuanyin 
Date:   2018-08-28T06:17:51Z

rename path for minmax datamap

commit f29ec1d80acea6fcb85a55ab37c02847e9282e5b
Author: xuchuanyin 
Date:   2018-08-29T12:26:38Z

Fix bugs in MinMaxDataMap

make minmax datamap useable and add more tests for it




---


[jira] [Created] (CARBONDATA-2904) Support minmax datamap for external format

2018-08-31 Thread xuchuanyin (JIRA)
xuchuanyin created CARBONDATA-2904:
--

 Summary: Support minmax datamap for external format
 Key: CARBONDATA-2904
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2904
 Project: CarbonData
  Issue Type: Sub-task
Reporter: xuchuanyin
Assignee: xuchuanyin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2678: [WIP] Multi user support for SDK on S3

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2678
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8202/



---


[GitHub] carbondata issue #2678: [WIP] Multi user support for SDK on S3

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2678
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/131/



---


[GitHub] carbondata pull request #2675: [CARBONDATA-2901] Fixed JVM crash in Load sce...

2018-08-31 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2675#discussion_r214306748
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java
 ---
@@ -140,19 +143,25 @@ public void initialize() throws MemoryException, CarbonSortKeyAndGroupByExceptio
     semaphore = new Semaphore(parameters.getNumberOfCores());
   }
 
-  private UnsafeCarbonRowPage createUnsafeRowPage()
-      throws MemoryException, CarbonSortKeyAndGroupByException {
-    MemoryBlock baseBlock =
-        UnsafeMemoryManager.allocateMemoryWithRetry(this.taskId, inMemoryChunkSize);
-    boolean isMemoryAvailable =
-        UnsafeSortMemoryManager.INSTANCE.isMemoryAvailable(baseBlock.size());
-    if (isMemoryAvailable) {
-      UnsafeSortMemoryManager.INSTANCE.allocateDummyMemory(baseBlock.size());
-    } else {
-      // merge and spill in-memory pages to disk if memory is not enough
-      unsafeInMemoryIntermediateFileMerger.tryTriggerInmemoryMerging(true);
+  private UnsafeCarbonRowPage createUnsafeRowPage() {
+    try {
+      MemoryBlock baseBlock =
+          UnsafeMemoryManager.allocateMemoryWithRetry(this.taskId, inMemoryChunkSize);
+      boolean isMemoryAvailable =
+          UnsafeSortMemoryManager.INSTANCE.isMemoryAvailable(baseBlock.size());
+      if (isMemoryAvailable) {
+        UnsafeSortMemoryManager.INSTANCE.allocateDummyMemory(baseBlock.size());
+      } else {
+        // merge and spill in-memory pages to disk if memory is not enough
+        unsafeInMemoryIntermediateFileMerger.tryTriggerInmemoryMerging(true);
+      }
+      return new UnsafeCarbonRowPage(tableFieldStat, baseBlock, !isMemoryAvailable, taskId);
+    } catch (MemoryException | CarbonSortKeyAndGroupByException e) {
+      // Returning null sets the caller's rowPage reference to null. If it
+      // were not cleared, other threads would keep using the old reference,
+      // but handlePreviousPage() has already freed that rowPage, so it must
+      // not be accessed again after the free.
+      return null;
--- End diff --

The issue came from exactly this: throwing an exception does not set 
the rowPage reference to null, so another thread would access a rowPage 
that the previous thread had already freed, hence the JVM crash. See 
the sketch below.
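
A hypothetical caller-side view of why returning null (rather than 
throwing) matters; the surrounding code here is an assumption, not the 
actual PR:

    // assigning the result unconditionally clears the stale reference, so no
    // thread can reach the page that handlePreviousPage() already freed
    rowPage = createUnsafeRowPage();   // null when allocation failed
    if (rowPage == null) {
      throw new RuntimeException("failed to allocate a new unsafe row page");
    }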


---


[GitHub] carbondata issue #2673: [WIP] Test Carbonstore

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2673
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/130/



---


[GitHub] carbondata issue #2673: [WIP] Test Carbonstore

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2673
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8201/



---


[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-08-31 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2642
  
Build failed with 2.3 
http://136.243.101.176:8080/job/ManualApacheCarbonPRBuilder2.1/175/


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2671


---


[GitHub] carbondata issue #2676: [CARBONDATA-2902][DataMap] Fix showing negative prun...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2676
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/128/



---


[GitHub] carbondata issue #2676: [CARBONDATA-2902][DataMap] Fix showing negative prun...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2676
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8199/



---


[GitHub] carbondata pull request #2677: [CARBONDATA-2903] Fix compiler warning

2018-08-31 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/carbondata/pull/2677

[CARBONDATA-2903] Fix compiler warning

When building with mvn, there are some compiler warnings. They are 
fixed in this PR.

 - [X] Any interfaces changed?
 No
 - [X] Any backward compatibility impacted?
 No
 - [X] Document update required?
No
 - [X] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   rerun all tests
 - [X] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata remove_warning

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2677


commit b16c030886ea330ac0ca521fb33f4c265ee26152
Author: Jacky Li 
Date:   2018-08-31T08:07:40Z

remove warning




---


[jira] [Created] (CARBONDATA-2903) Fix compiler warnings

2018-08-31 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-2903:


 Summary: Fix compiler warnings
 Key: CARBONDATA-2903
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2903
 Project: CarbonData
  Issue Type: Improvement
Reporter: Jacky Li
 Fix For: 1.5.0, 1.4.2


When building with mvn, there are some compiler warnings. They should be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2661: [CARBONDATA-2888] Support multi level subfold...

2018-08-31 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2661#discussion_r214273342
  
--- Diff: 
integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndexReplaceRule.scala
 ---
@@ -82,4 +82,23 @@ class CarbonFileIndexReplaceRule extends Rule[LogicalPlan] {
   fileIndex
 }
   }
+
+  /**
+   * Get datafolders recursively
+   */
+  private def getDataFolders(carbonFile: CarbonFile): Seq[CarbonFile] = {
+    val files = carbonFile.listFiles()
+    var folders: Seq[CarbonFile] = Seq()
+    files.foreach { f =>
+      if (f.isDirectory) {
+        val files = f.listFiles()
+        if (files.nonEmpty && !files(0).isDirectory) {
+          folders = Seq(f) ++ folders
+        } else {
+          folders = getDataFolders(f) ++ folders
--- End diff --

This statement can be moved under the files.nonEmpty check, so empty 
folders are not recursed into again.


---


[GitHub] carbondata issue #2676: [CARBONDATA-2902][DataMap] Fix showing negative prun...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2676
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8198/



---


[GitHub] carbondata issue #2676: [CARBONDATA-2902][DataMap] Fix showing negative prun...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2676
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/127/



---


[GitHub] carbondata issue #2671: [CARBONDATA-2876]AVRO datatype support through SDK

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2671
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/126/



---


[GitHub] carbondata issue #2671: [CARBONDATA-2876]AVRO datatype support through SDK

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2671
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8197/



---


[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-08-31 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2642
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/125/



---