[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2725 ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218298762 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1831,6 +1831,16 @@ public static final short LOCAL_DICT_ENCODED_BYTEARRAY_SIZE = 3; + /** + * property to be used for specifying the max character limit for string/varchar data type till --- End diff -- ok ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218298776 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java --- @@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader fileReader, int columnInde * @return */ BitSetGroup getIndexedData(); + + /** + * Return the array which contains the flag for each column whether the min max for that column + * is written in metadata or not + * + * @return --- End diff -- ok ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218298749 --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapRowIndexes.java --- @@ -40,12 +40,14 @@ int BLOCK_LENGTH = 8; + int BLOCK_MIN_MAX_FLAG = 9; + // below variables are specific for blockletDataMap - int BLOCKLET_INFO_INDEX = 9; + int BLOCKLET_INFO_INDEX = 10; --- End diff -- No it will not have any impact as the change this code is in query part ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218273696 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1831,6 +1831,16 @@ public static final short LOCAL_DICT_ENCODED_BYTEARRAY_SIZE = 3; + /** + * property to be used for specifying the max character limit for string/varchar data type till --- End diff -- âspecifying the max character limitâ ->'specifying the max byte limit' ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218273764 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java --- @@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader fileReader, int columnInde * @return */ BitSetGroup getIndexedData(); + + /** + * Return the array which contains the flag for each column whether the min max for that column + * is written in metadata or not + * + * @return --- End diff -- unnecessary âreturnâ ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218274542 --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapRowIndexes.java --- @@ -40,12 +40,14 @@ int BLOCK_LENGTH = 8; + int BLOCK_MIN_MAX_FLAG = 9; + // below variables are specific for blockletDataMap - int BLOCKLET_INFO_INDEX = 9; + int BLOCKLET_INFO_INDEX = 10; --- End diff -- will this modification affect legacy store? ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218139529 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java --- @@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader fileReader, int columnInde * @return */ BitSetGroup getIndexedData(); + + /** + * Return the array which contains the flag for each column whether the min max for that column + * is written in metadata or not + * + * @return + */ + boolean[] isMinMaxSet(); --- End diff -- ok ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218139409 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtThanFiterExecuterImpl.java --- @@ -194,9 +198,12 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks, BitSetGroup bitSetGroup = new BitSetGroup(rawColumnChunk.getPagesCount()); FilterExecuter filterExecuter = null; boolean isExclude = false; + boolean isMinMaxSetForFilterDimension = + rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex]; for (int i = 0; i < rawColumnChunk.getPagesCount(); i++) { if (rawColumnChunk.getMaxValues() != null) { - if (isScanRequired(rawColumnChunk.getMaxValues()[i], this.filterRangeValues)) { + if (!isMinMaxSetForFilterDimension || isScanRequired(rawColumnChunk.getMaxValues()[i], --- End diff -- ok ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218139494 --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java --- @@ -133,6 +134,21 @@ return null; } + @Override + public boolean[] isMinMaxSet() { +BlockletIndex blockletIndex = + blockInfos.get(index).getDetailInfo().getBlockletInfo().getBlockletIndex(); --- End diff -- ok ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218139378 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java --- @@ -122,11 +122,11 @@ public boolean isScanRequired(DataRefNode dataBlock) { bitSet = ((ImplicitColumnFilterExecutor) filterExecuter) .isFilterValuesPresentInBlockOrBlocklet( dataBlock.getColumnsMaxValue(), -dataBlock.getColumnsMinValue(), blockletId); +dataBlock.getColumnsMinValue(), blockletId, dataBlock.isMinMaxSet()); } else { bitSet = this.filterExecuter .isScanRequired(dataBlock.getColumnsMaxValue(), -dataBlock.getColumnsMinValue()); +dataBlock.getColumnsMinValue(), dataBlock.isMinMaxSet()); --- End diff -- For blocklet min max comparison the footer is getting read in executor and the same min max flag information is getting used. The information serialized from driver is not getting used for blocklet min max comparison in executor ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218139444 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java --- @@ -108,10 +108,13 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks, BitSetGroup bitSetGroup = new BitSetGroup(dimensionRawColumnChunk.getPagesCount()); filterValues = dimColumnExecuterInfo.getFilterKeys(); boolean isDecoded = false; + boolean isMinMaxSetForFilterDimension = + rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex]; for (int i = 0; i < dimensionRawColumnChunk.getPagesCount(); i++) { if (dimensionRawColumnChunk.getMaxValues() != null) { - if (isScanRequired(dimensionRawColumnChunk.getMaxValues()[i], - dimensionRawColumnChunk.getMinValues()[i], dimColumnExecuterInfo.getFilterKeys())) { + if (!isMinMaxSetForFilterDimension || isScanRequired( --- End diff -- ok ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218139467 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/blocklet/index/BlockletIndex.java --- @@ -75,4 +80,12 @@ public void setMinMaxIndex(BlockletMinMaxIndex minMaxIndex) { this.minMaxIndex = minMaxIndex; } + @Override public void write(DataOutput out) throws IOException { --- End diff -- ok ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218138883 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java --- @@ -250,6 +275,42 @@ public static BlockletIndex getBlockletIndex(EncodedBlocklet encodedBlocklet, } /** + * This method will combine the writeMinMax flag from all the pages. If any page for a given + * dimension has writeMinMax flag set to false then min max for that dimension will nto be + * written in any of the page and metadata + * + * @param blockletMinMaxIndex + * @param encodedBlocklet + */ + private static List mergeWriteMinMaxFlagForAllPages( + BlockletMinMaxIndex blockletMinMaxIndex, EncodedBlocklet encodedBlocklet) { +Boolean[] mergedWriteMinMaxFlag = +new Boolean[encodedBlocklet.getNumberOfDimension() + encodedBlocklet.getNumberOfMeasure()]; +// set writeMinMax flag to true for all the columns by default and then update if stats object +// has the this flag set to false +Arrays.fill(mergedWriteMinMaxFlag, true); +for (int i = 0; i < encodedBlocklet.getNumberOfDimension(); i++) { + for (int pageIndex = 0; pageIndex < encodedBlocklet.getNumberOfPages(); pageIndex++) { +EncodedColumnPage encodedColumnPage = + encodedBlocklet.getEncodedDimensionColumnPages().get(i).getEncodedColumnPageList() +.get(pageIndex); +SimpleStatsResult stats = encodedColumnPage.getStats(); +if (!stats.writeMinMax()) { + mergedWriteMinMaxFlag[i] = stats.writeMinMax(); + String columnName = encodedColumnPage.getActualPage().getColumnSpec().getFieldName(); + LOGGER.info("Min Max writing ignored for column " + columnName + " from page 0 to " --- End diff -- ok ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218066737 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java --- @@ -250,6 +275,42 @@ public static BlockletIndex getBlockletIndex(EncodedBlocklet encodedBlocklet, } /** + * This method will combine the writeMinMax flag from all the pages. If any page for a given + * dimension has writeMinMax flag set to false then min max for that dimension will nto be + * written in any of the page and metadata + * + * @param blockletMinMaxIndex + * @param encodedBlocklet + */ + private static List mergeWriteMinMaxFlagForAllPages( + BlockletMinMaxIndex blockletMinMaxIndex, EncodedBlocklet encodedBlocklet) { +Boolean[] mergedWriteMinMaxFlag = +new Boolean[encodedBlocklet.getNumberOfDimension() + encodedBlocklet.getNumberOfMeasure()]; +// set writeMinMax flag to true for all the columns by default and then update if stats object +// has the this flag set to false +Arrays.fill(mergedWriteMinMaxFlag, true); +for (int i = 0; i < encodedBlocklet.getNumberOfDimension(); i++) { + for (int pageIndex = 0; pageIndex < encodedBlocklet.getNumberOfPages(); pageIndex++) { +EncodedColumnPage encodedColumnPage = + encodedBlocklet.getEncodedDimensionColumnPages().get(i).getEncodedColumnPageList() +.get(pageIndex); +SimpleStatsResult stats = encodedColumnPage.getStats(); +if (!stats.writeMinMax()) { + mergedWriteMinMaxFlag[i] = stats.writeMinMax(); + String columnName = encodedColumnPage.getActualPage().getColumnSpec().getFieldName(); + LOGGER.info("Min Max writing ignored for column " + columnName + " from page 0 to " --- End diff -- Add generic message saying that `"Min Max writing of blocklet ignored for column with name " + columnName + " ` ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218062043 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtThanFiterExecuterImpl.java --- @@ -194,9 +198,12 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks, BitSetGroup bitSetGroup = new BitSetGroup(rawColumnChunk.getPagesCount()); FilterExecuter filterExecuter = null; boolean isExclude = false; + boolean isMinMaxSetForFilterDimension = + rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex]; for (int i = 0; i < rawColumnChunk.getPagesCount(); i++) { if (rawColumnChunk.getMaxValues() != null) { - if (isScanRequired(rawColumnChunk.getMaxValues()[i], this.filterRangeValues)) { + if (!isMinMaxSetForFilterDimension || isScanRequired(rawColumnChunk.getMaxValues()[i], --- End diff -- Please use the minaxflags from the current page and do the scan require check ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218062180 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java --- @@ -122,11 +122,11 @@ public boolean isScanRequired(DataRefNode dataBlock) { bitSet = ((ImplicitColumnFilterExecutor) filterExecuter) .isFilterValuesPresentInBlockOrBlocklet( dataBlock.getColumnsMaxValue(), -dataBlock.getColumnsMinValue(), blockletId); +dataBlock.getColumnsMinValue(), blockletId, dataBlock.isMinMaxSet()); } else { bitSet = this.filterExecuter .isScanRequired(dataBlock.getColumnsMaxValue(), -dataBlock.getColumnsMinValue()); +dataBlock.getColumnsMinValue(), dataBlock.isMinMaxSet()); --- End diff -- Please use the minaxflags from the current footer and do the scan require check ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218061484 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java --- @@ -108,10 +108,13 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks, BitSetGroup bitSetGroup = new BitSetGroup(dimensionRawColumnChunk.getPagesCount()); filterValues = dimColumnExecuterInfo.getFilterKeys(); boolean isDecoded = false; + boolean isMinMaxSetForFilterDimension = + rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex]; for (int i = 0; i < dimensionRawColumnChunk.getPagesCount(); i++) { if (dimensionRawColumnChunk.getMaxValues() != null) { - if (isScanRequired(dimensionRawColumnChunk.getMaxValues()[i], - dimensionRawColumnChunk.getMinValues()[i], dimColumnExecuterInfo.getFilterKeys())) { + if (!isMinMaxSetForFilterDimension || isScanRequired( --- End diff -- Please use the minaxflags from the current page and do the scan require check ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218060271 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/blocklet/index/BlockletIndex.java --- @@ -75,4 +80,12 @@ public void setMinMaxIndex(BlockletMinMaxIndex minMaxIndex) { this.minMaxIndex = minMaxIndex; } + @Override public void write(DataOutput out) throws IOException { --- End diff -- Please don't serialize as it is not required to pass to executor ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218059894 --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java --- @@ -133,6 +134,21 @@ return null; } + @Override + public boolean[] isMinMaxSet() { +BlockletIndex blockletIndex = + blockInfos.get(index).getDetailInfo().getBlockletInfo().getBlockletIndex(); --- End diff -- No need to pass this information from driver to executor. ---
[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2725#discussion_r218048262 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java --- @@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader fileReader, int columnInde * @return */ BitSetGroup getIndexedData(); + + /** + * Return the array which contains the flag for each column whether the min max for that column + * is written in metadata or not + * + * @return + */ + boolean[] isMinMaxSet(); --- End diff -- Better change name to minMaxFlagArray` ---