[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2725


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218298762
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1831,6 +1831,16 @@
 
   public static final short LOCAL_DICT_ENCODED_BYTEARRAY_SIZE = 3;
 
+  /**
+   * property to be used for specifying the max character limit for 
string/varchar data type till
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218298776
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java ---
@@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader 
fileReader, int columnInde
* @return
*/
   BitSetGroup getIndexedData();
+
+  /**
+   * Return the array which contains the flag for each column whether the 
min max for that column
+   * is written in metadata or not
+   *
+   * @return
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218298749
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapRowIndexes.java
 ---
@@ -40,12 +40,14 @@
 
   int BLOCK_LENGTH = 8;
 
+  int BLOCK_MIN_MAX_FLAG = 9;
+
   // below variables are specific for blockletDataMap
-  int BLOCKLET_INFO_INDEX = 9;
+  int BLOCKLET_INFO_INDEX = 10;
--- End diff --

No it will not have any impact as the change this code is in query part


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218273696
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1831,6 +1831,16 @@
 
   public static final short LOCAL_DICT_ENCODED_BYTEARRAY_SIZE = 3;
 
+  /**
+   * property to be used for specifying the max character limit for 
string/varchar data type till
--- End diff --

’specifying the max character limit‘ ->'specifying the max byte limit'


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218273764
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java ---
@@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader 
fileReader, int columnInde
* @return
*/
   BitSetGroup getIndexedData();
+
+  /**
+   * Return the array which contains the flag for each column whether the 
min max for that column
+   * is written in metadata or not
+   *
+   * @return
--- End diff --

unnecessary ’return‘


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218274542
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapRowIndexes.java
 ---
@@ -40,12 +40,14 @@
 
   int BLOCK_LENGTH = 8;
 
+  int BLOCK_MIN_MAX_FLAG = 9;
+
   // below variables are specific for blockletDataMap
-  int BLOCKLET_INFO_INDEX = 9;
+  int BLOCKLET_INFO_INDEX = 10;
--- End diff --

will this modification affect legacy store?


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139529
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java ---
@@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader 
fileReader, int columnInde
* @return
*/
   BitSetGroup getIndexedData();
+
+  /**
+   * Return the array which contains the flag for each column whether the 
min max for that column
+   * is written in metadata or not
+   *
+   * @return
+   */
+  boolean[] isMinMaxSet();
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139409
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtThanFiterExecuterImpl.java
 ---
@@ -194,9 +198,12 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks 
rawBlockletColumnChunks,
   BitSetGroup bitSetGroup = new 
BitSetGroup(rawColumnChunk.getPagesCount());
   FilterExecuter filterExecuter = null;
   boolean isExclude = false;
+  boolean isMinMaxSetForFilterDimension =
+  rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
   for (int i = 0; i < rawColumnChunk.getPagesCount(); i++) {
 if (rawColumnChunk.getMaxValues() != null) {
-  if (isScanRequired(rawColumnChunk.getMaxValues()[i], 
this.filterRangeValues)) {
+  if (!isMinMaxSetForFilterDimension || 
isScanRequired(rawColumnChunk.getMaxValues()[i],
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139494
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
 ---
@@ -133,6 +134,21 @@
 return null;
   }
 
+  @Override
+  public boolean[] isMinMaxSet() {
+BlockletIndex blockletIndex =
+
blockInfos.get(index).getDetailInfo().getBlockletInfo().getBlockletIndex();
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139378
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java
 ---
@@ -122,11 +122,11 @@ public boolean isScanRequired(DataRefNode dataBlock) {
 bitSet = ((ImplicitColumnFilterExecutor) filterExecuter)
 .isFilterValuesPresentInBlockOrBlocklet(
 dataBlock.getColumnsMaxValue(),
-dataBlock.getColumnsMinValue(), blockletId);
+dataBlock.getColumnsMinValue(), blockletId, 
dataBlock.isMinMaxSet());
   } else {
 bitSet = this.filterExecuter
 .isScanRequired(dataBlock.getColumnsMaxValue(),
-dataBlock.getColumnsMinValue());
+dataBlock.getColumnsMinValue(), dataBlock.isMinMaxSet());
--- End diff --

For blocklet min max comparison the footer is getting read in executor and 
the same min max flag information is getting used. The information serialized 
from driver is not getting used for blocklet min max comparison in executor


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139444
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java
 ---
@@ -108,10 +108,13 @@ public BitSetGroup 
applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
   BitSetGroup bitSetGroup = new 
BitSetGroup(dimensionRawColumnChunk.getPagesCount());
   filterValues = dimColumnExecuterInfo.getFilterKeys();
   boolean isDecoded = false;
+  boolean isMinMaxSetForFilterDimension =
+  rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
   for (int i = 0; i < dimensionRawColumnChunk.getPagesCount(); i++) {
 if (dimensionRawColumnChunk.getMaxValues() != null) {
-  if (isScanRequired(dimensionRawColumnChunk.getMaxValues()[i],
-  dimensionRawColumnChunk.getMinValues()[i], 
dimColumnExecuterInfo.getFilterKeys())) {
+  if (!isMinMaxSetForFilterDimension || isScanRequired(
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139467
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/blocklet/index/BlockletIndex.java
 ---
@@ -75,4 +80,12 @@ public void setMinMaxIndex(BlockletMinMaxIndex 
minMaxIndex) {
 this.minMaxIndex = minMaxIndex;
   }
 
+  @Override public void write(DataOutput out) throws IOException {
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218138883
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java ---
@@ -250,6 +275,42 @@ public static BlockletIndex 
getBlockletIndex(EncodedBlocklet encodedBlocklet,
   }
 
   /**
+   * This method will combine the writeMinMax flag from all the pages. If 
any page for a given
+   * dimension has writeMinMax flag set to false then min max for that 
dimension will nto be
+   * written in any of the page and metadata
+   *
+   * @param blockletMinMaxIndex
+   * @param encodedBlocklet
+   */
+  private static List mergeWriteMinMaxFlagForAllPages(
+  BlockletMinMaxIndex blockletMinMaxIndex, EncodedBlocklet 
encodedBlocklet) {
+Boolean[] mergedWriteMinMaxFlag =
+new Boolean[encodedBlocklet.getNumberOfDimension() + 
encodedBlocklet.getNumberOfMeasure()];
+// set writeMinMax flag to true for all the columns by default and 
then update if stats object
+// has the this flag set to false
+Arrays.fill(mergedWriteMinMaxFlag, true);
+for (int i = 0; i < encodedBlocklet.getNumberOfDimension(); i++) {
+  for (int pageIndex = 0; pageIndex < 
encodedBlocklet.getNumberOfPages(); pageIndex++) {
+EncodedColumnPage encodedColumnPage =
+
encodedBlocklet.getEncodedDimensionColumnPages().get(i).getEncodedColumnPageList()
+.get(pageIndex);
+SimpleStatsResult stats = encodedColumnPage.getStats();
+if (!stats.writeMinMax()) {
+  mergedWriteMinMaxFlag[i] = stats.writeMinMax();
+  String columnName = 
encodedColumnPage.getActualPage().getColumnSpec().getFieldName();
+  LOGGER.info("Min Max writing ignored for column " + columnName + 
" from page 0 to "
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218066737
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java ---
@@ -250,6 +275,42 @@ public static BlockletIndex 
getBlockletIndex(EncodedBlocklet encodedBlocklet,
   }
 
   /**
+   * This method will combine the writeMinMax flag from all the pages. If 
any page for a given
+   * dimension has writeMinMax flag set to false then min max for that 
dimension will nto be
+   * written in any of the page and metadata
+   *
+   * @param blockletMinMaxIndex
+   * @param encodedBlocklet
+   */
+  private static List mergeWriteMinMaxFlagForAllPages(
+  BlockletMinMaxIndex blockletMinMaxIndex, EncodedBlocklet 
encodedBlocklet) {
+Boolean[] mergedWriteMinMaxFlag =
+new Boolean[encodedBlocklet.getNumberOfDimension() + 
encodedBlocklet.getNumberOfMeasure()];
+// set writeMinMax flag to true for all the columns by default and 
then update if stats object
+// has the this flag set to false
+Arrays.fill(mergedWriteMinMaxFlag, true);
+for (int i = 0; i < encodedBlocklet.getNumberOfDimension(); i++) {
+  for (int pageIndex = 0; pageIndex < 
encodedBlocklet.getNumberOfPages(); pageIndex++) {
+EncodedColumnPage encodedColumnPage =
+
encodedBlocklet.getEncodedDimensionColumnPages().get(i).getEncodedColumnPageList()
+.get(pageIndex);
+SimpleStatsResult stats = encodedColumnPage.getStats();
+if (!stats.writeMinMax()) {
+  mergedWriteMinMaxFlag[i] = stats.writeMinMax();
+  String columnName = 
encodedColumnPage.getActualPage().getColumnSpec().getFieldName();
+  LOGGER.info("Min Max writing ignored for column " + columnName + 
" from page 0 to "
--- End diff --

Add generic message saying that `"Min Max writing of blocklet ignored for 
column with name " + columnName + "  `


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218062043
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtThanFiterExecuterImpl.java
 ---
@@ -194,9 +198,12 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks 
rawBlockletColumnChunks,
   BitSetGroup bitSetGroup = new 
BitSetGroup(rawColumnChunk.getPagesCount());
   FilterExecuter filterExecuter = null;
   boolean isExclude = false;
+  boolean isMinMaxSetForFilterDimension =
+  rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
   for (int i = 0; i < rawColumnChunk.getPagesCount(); i++) {
 if (rawColumnChunk.getMaxValues() != null) {
-  if (isScanRequired(rawColumnChunk.getMaxValues()[i], 
this.filterRangeValues)) {
+  if (!isMinMaxSetForFilterDimension || 
isScanRequired(rawColumnChunk.getMaxValues()[i],
--- End diff --

Please use the minaxflags from the current page and do the scan require 
check


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218062180
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java
 ---
@@ -122,11 +122,11 @@ public boolean isScanRequired(DataRefNode dataBlock) {
 bitSet = ((ImplicitColumnFilterExecutor) filterExecuter)
 .isFilterValuesPresentInBlockOrBlocklet(
 dataBlock.getColumnsMaxValue(),
-dataBlock.getColumnsMinValue(), blockletId);
+dataBlock.getColumnsMinValue(), blockletId, 
dataBlock.isMinMaxSet());
   } else {
 bitSet = this.filterExecuter
 .isScanRequired(dataBlock.getColumnsMaxValue(),
-dataBlock.getColumnsMinValue());
+dataBlock.getColumnsMinValue(), dataBlock.isMinMaxSet());
--- End diff --

Please use the minaxflags from the current footer and do the scan require 
check


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218061484
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java
 ---
@@ -108,10 +108,13 @@ public BitSetGroup 
applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
   BitSetGroup bitSetGroup = new 
BitSetGroup(dimensionRawColumnChunk.getPagesCount());
   filterValues = dimColumnExecuterInfo.getFilterKeys();
   boolean isDecoded = false;
+  boolean isMinMaxSetForFilterDimension =
+  rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
   for (int i = 0; i < dimensionRawColumnChunk.getPagesCount(); i++) {
 if (dimensionRawColumnChunk.getMaxValues() != null) {
-  if (isScanRequired(dimensionRawColumnChunk.getMaxValues()[i],
-  dimensionRawColumnChunk.getMinValues()[i], 
dimColumnExecuterInfo.getFilterKeys())) {
+  if (!isMinMaxSetForFilterDimension || isScanRequired(
--- End diff --

Please use the minaxflags from the current page and do the scan require 
check


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218060271
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/blocklet/index/BlockletIndex.java
 ---
@@ -75,4 +80,12 @@ public void setMinMaxIndex(BlockletMinMaxIndex 
minMaxIndex) {
 this.minMaxIndex = minMaxIndex;
   }
 
+  @Override public void write(DataOutput out) throws IOException {
--- End diff --

Please don't serialize as it is not required to pass to executor


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218059894
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
 ---
@@ -133,6 +134,21 @@
 return null;
   }
 
+  @Override
+  public boolean[] isMinMaxSet() {
+BlockletIndex blockletIndex =
+
blockInfos.get(index).getDetailInfo().getBlockletInfo().getBlockletIndex();
--- End diff --

No need to pass this information from driver to executor.


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218048262
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java ---
@@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader 
fileReader, int columnInde
* @return
*/
   BitSetGroup getIndexedData();
+
+  /**
+   * Return the array which contains the flag for each column whether the 
min max for that column
+   * is written in metadata or not
+   *
+   * @return
+   */
+  boolean[] isMinMaxSet();
--- End diff --

Better change name to minMaxFlagArray`


---