[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2614 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/146/ ---
[GitHub] carbondata issue #2614: [CARBONDATA-2837] Added MVExample in example module
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2614 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8217/ ---
[GitHub] carbondata issue #2662: [CARBONDATA-2889]Add decoder based fallback mechanis...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2662 @kumarvishal09 handled the review comments, please review ---
[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2662#discussion_r214505238 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/FallbackDecoderBasedColumnPageEncoder.java --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.datastore.page; + +import java.nio.ByteBuffer; +import java.util.concurrent.Callable; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.TableSpec; +import org.apache.carbondata.core.datastore.columnar.UnBlockIndexer; +import org.apache.carbondata.core.datastore.compression.CompressorFactory; +import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage; +import org.apache.carbondata.core.keygenerator.KeyGenerator; +import org.apache.carbondata.core.keygenerator.factory.KeyGeneratorFactory; +import org.apache.carbondata.core.localdictionary.generator.LocalDictionaryGenerator; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.format.Encoding; + +public class FallbackDecoderBasedColumnPageEncoder implements Callable { --- End diff -- change class name to DecoderBasedFallbackEncoder ---
[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2662#discussion_r214505230 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/FallbackActualDataBasedColumnPageEncoder.java --- @@ -19,17 +19,17 @@ import java.util.concurrent.Callable; import org.apache.carbondata.core.datastore.TableSpec; -import org.apache.carbondata.core.datastore.page.encoding.ColumnPageEncoder; -import org.apache.carbondata.core.datastore.page.encoding.DefaultEncodingFactory; import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage; +import org.apache.carbondata.core.util.CarbonUtil; /** * Below class will be used to encode column pages for which local dictionary was generated * but all the pages in blocklet was not encoded with local dictionary. * This is required as all the pages of a column in blocklet either it will be local dictionary * encoded or without local dictionary encoded. */ -public class FallbackColumnPageEncoder implements Callable { +public class FallbackActualDataBasedColumnPageEncoder --- End diff -- Change class name to ActualDataBasedFallbackEncoder ---
[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2662#discussion_r214505217 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java --- @@ -86,7 +92,8 @@ * @param encodedColumnPage * encoded column page */ - void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage) { + void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage, + LocalDictionaryGenerator localDictionaryGenerator) { --- End diff -- better to add local dictionary generator in constructor ---
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214504899 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorage.java --- @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.columnar; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; + +public class BlockIndexerStorage { + + /** + * It compresses depends up on the sequence numbers. + * [1,2,3,4,6,8,10,11,12,13] is translated to [1,4,6,8,10,13] and [0,6]. In + * first array the start and end of sequential numbers and second array + * keeps the indexes of where sequential numbers starts. If there is no + * sequential numbers then the same array it returns with empty second + * array. + * + * @param rowIds + */ + public static Map rleEncodeOnRowId(short[] rowIds, short[] rowIdPage, --- End diff -- move this code to carbonutil ---
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214504900 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorage.java --- @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.columnar; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; + +public class BlockIndexerStorage { + + /** + * It compresses depends up on the sequence numbers. + * [1,2,3,4,6,8,10,11,12,13] is translated to [1,4,6,8,10,13] and [0,6]. In + * first array the start and end of sequential numbers and second array + * keeps the indexes of where sequential numbers starts. If there is no + * sequential numbers then the same array it returns with empty second + * array. + * + * @param rowIds + */ + public static Map rleEncodeOnRowId(short[] rowIds, short[] rowIdPage, + short[] rowIdRlePage) { +List list = new ArrayList(CarbonCommonConstants.CONSTANT_SIZE_TEN); +List map = new ArrayList(CarbonCommonConstants.CONSTANT_SIZE_TEN); +int k = 0; +int i = 1; +for (; i < rowIds.length; i++) { + if (rowIds[i] - rowIds[i - 1] == 1) { +k++; + } else { +if (k > 0) { + map.add(((short) list.size())); + list.add(rowIds[i - k - 1]); + list.add(rowIds[i - 1]); +} else { + list.add(rowIds[i - 1]); +} +k = 0; + } +} +if (k > 0) { + map.add(((short) list.size())); + list.add(rowIds[i - k - 1]); + list.add(rowIds[i - 1]); +} else { + list.add(rowIds[i - 1]); +} +int compressionPercentage = (((list.size() + map.size()) * 100) / rowIds.length); +if (compressionPercentage > 70) { + rowIdPage = rowIds; +} else { + rowIdPage = convertToArray(list); +} +if (rowIds.length == rowIdPage.length) { + rowIdRlePage = new short[0]; +} else { + rowIdRlePage = convertToArray(map); +} +Map rowIdAndRowRleIdPages = new HashMap<>(2); +rowIdAndRowRleIdPages.put("rowIdPage", rowIdPage); +rowIdAndRowRleIdPages.put("rowRlePage", rowIdRlePage); +return rowIdAndRowRleIdPages; + } + + public static short[] convertToArray(List list) { --- End diff -- move this code to carbonutil ---
[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2662#discussion_r214504804 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/EncodedBlocklet.java --- @@ -87,19 +91,24 @@ private void addPageMetadata(EncodedTablePage encodedTablePage) { * @param encodedTablePage * encoded table page */ - private void addEncodedMeasurePage(EncodedTablePage encodedTablePage) { + private void addEncodedMeasurePage(EncodedTablePage encodedTablePage, + Map localDictionaryGeneratorMap) { // for first page create new list if (null == encodedMeasureColumnPages) { encodedMeasureColumnPages = new ArrayList<>(); // adding measure pages for (int i = 0; i < encodedTablePage.getNumMeasures(); i++) { -BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null); - blockletEncodedColumnPage.addEncodedColumnColumnPage(encodedTablePage.getMeasure(i)); +BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null, +Boolean.parseBoolean(CarbonProperties.getInstance() --- End diff -- Instead of parsing the property every time for each encoded page, parse it once in the constructor and store it in a private field. For measures you can directly pass false, as a local dictionary will not be generated for measure pages. ---
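A minimal sketch of the parse-once suggestion (the field name, property key, and constructor shape below are illustrative assumptions, not the merged code):

    // In EncodedBlocklet: read the property a single time and keep the result
    private final boolean isDecoderBasedFallBackEnabled = Boolean.parseBoolean(
        CarbonProperties.getInstance().getProperty(
            "carbon.local.dictionary.decoder.fallback",   // hypothetical property key
            "true"));                                     // hypothetical default

    // dimension pages reuse the cached flag
    new BlockletEncodedColumnPage(null, isDecoderBasedFallBackEnabled, localDictionaryGenerator);

    // measure pages: a local dictionary is never generated for measures,
    // so false can be passed directly
    new BlockletEncodedColumnPage(null, false, null);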
[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2662#discussion_r214504750 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java --- @@ -105,8 +112,15 @@ void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage) { LOGGER.info( "Local dictionary Fallback is initiated for column: " + this.columnName + " for page:" + encodedColumnPageList.size()); - fallbackFutureQueue.add(fallbackExecutorService - .submit(new FallbackColumnPageEncoder(encodedColumnPage, encodedColumnPageList.size())))); + if (isDecoderBasedFallBackEnabled) { --- End diff -- Move this code to some private method and pass encodedColumnPage and pageIndex ---
[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2662#discussion_r214504736 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java --- @@ -86,7 +92,8 @@ * @param encodedColumnPage * encoded column page */ - void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage) { + void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage, --- End diff -- Can you please update the method name to addEncodedColumnPage? ---
[GitHub] carbondata pull request #2614: [CARBONDATA-2837] Added MVExample in example ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2614#discussion_r214504621 --- Diff: examples/spark2/pom.xml --- @@ -49,6 +49,11 @@ carbondata-store-sdk ${project.version} + + org.apache.carbondata + carbondata-mv-core --- End diff -- Yes, profile added ---
[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2642 Build Failed with 2.3 http://136.243.101.176:8080/job/ManualApacheCarbonPRBuilder2.1/176/ ---
[GitHub] carbondata issue #2662: [WIP][CARBONDATA-2889]Add decoder based fallback mec...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2662 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/144/ ---
[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2642 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/145/ ---
[GitHub] carbondata issue #2662: [WIP][CARBONDATA-2889]Add decoder based fallback mec...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2662 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8215/ ---
[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2642 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8216/ ---
[GitHub] carbondata pull request #2662: [WIP][CARBONDATA-2889]Add decoder based fallb...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2662#discussion_r214426930 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/EncodedBlocklet.java --- @@ -87,19 +91,24 @@ private void addPageMetadata(EncodedTablePage encodedTablePage) { * @param encodedTablePage * encoded table page */ - private void addEncodedMeasurePage(EncodedTablePage encodedTablePage) { + private void addEncodedMeasurePage(EncodedTablePage encodedTablePage, + Map localDictionaryGeneratorMap) { // for first page create new list if (null == encodedMeasureColumnPages) { encodedMeasureColumnPages = new ArrayList<>(); // adding measure pages for (int i = 0; i < encodedTablePage.getNumMeasures(); i++) { -BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null); - blockletEncodedColumnPage.addEncodedColumnColumnPage(encodedTablePage.getMeasure(i)); +BlockletEncodedColumnPage blockletEncodedColumnPage = new BlockletEncodedColumnPage(null, +Boolean.parseBoolean(CarbonProperties.getInstance() --- End diff -- @xuchuanyin I have tested this and published the results; I think we can keep it as a property with the default set to true, since we are getting good results with respect to memory ---
[GitHub] carbondata issue #2662: [WIP][CARBONDATA-2889]Add decoder based fallback mec...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2662 @kumarvishal09 @jackylk I have updated the PR description with the performance and memory report, and I have also published the results on the mailing list; please have a look ---
[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2654 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/143/ ---
[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2654 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8214/ ---
[GitHub] carbondata issue #2680: [CARBONDATA-2905] Set stream property for streaming ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2680 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/140/ ---
[GitHub] carbondata issue #2680: [CARBONDATA-2905] Set stream property for streaming ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2680 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8211/ ---
[GitHub] carbondata issue #2678: [WIP] Multi user support for SDK on S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2678 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/141/ ---
[GitHub] carbondata issue #2678: [WIP] Multi user support for SDK on S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2678 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8212/ ---
[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2654 @dhatchayani You can raise one more PR to improve the code in some places: 1. Unify the isScanRequired code in all the filter classes using an ENUM and a flag based on min/max comparison 2. Create a new page wrapper that extends ColumnPageWrapper and sends the actual data for no dictionary primitive type columns ---
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214371546 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java --- @@ -331,8 +332,18 @@ private BloomQueryModel buildQueryModelInternal(CarbonColumn carbonColumn, // for dictionary/date columns, convert the surrogate key to bytes internalFilterValue = CarbonUtil.getValueAsBytes(DataTypes.INT, convertedValue); } else { - // for non dictionary dimensions, is already bytes, - internalFilterValue = (byte[]) convertedValue; + // for non dictionary dimensions, numeric columns will be of original data, + // so convert the data to bytes + if (DataTypeUtil.isPrimitiveColumn(carbonColumn.getDataType())) { +if (convertedValue == null) { --- End diff -- if possible initialize and store the flag in constructor and remove the check `DataTypeUtil.isPrimitiveColumn` wherever applicable in the below code ---
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214361007 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java --- @@ -110,8 +112,19 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks, boolean isDecoded = false; for (int i = 0; i < dimensionRawColumnChunk.getPagesCount(); i++) { if (dimensionRawColumnChunk.getMaxValues() != null) { - if (isScanRequired(dimensionRawColumnChunk.getMaxValues()[i], - dimensionRawColumnChunk.getMinValues()[i], dimColumnExecuterInfo.getFilterKeys())) { + boolean scanRequired; + // for no dictionary measure column comparison can be done + // on the original data as like measure column + if (DataTypeUtil.isPrimitiveColumn(dimColumnEvaluatorInfo.getDimension().getDataType()) + && !dimColumnEvaluatorInfo.getDimension().hasEncoding(Encoding.DICTIONARY)) { +scanRequired = isScanRequired(dimensionRawColumnChunk.getMaxValues()[i], --- End diff -- You can create an `isPrimitiveNoDictionaryColumn` flag and evaluate `DataTypeUtil.isPrimitiveColumn` in the constructor. This will avoid the check for every page. Do this for all the filters ---
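A sketch of the suggested refactor; only the calls already visible in the diff are real, the constructor arguments are abbreviated:

    public class IncludeFilterExecuterImpl /* ... */ {
      // computed once per filter executor instead of once per page
      private boolean isPrimitiveNoDictionaryColumn;

      public IncludeFilterExecuterImpl(/* existing arguments */) {
        this.isPrimitiveNoDictionaryColumn =
            DataTypeUtil.isPrimitiveColumn(dimColumnEvaluatorInfo.getDimension().getDataType())
                && !dimColumnEvaluatorInfo.getDimension().hasEncoding(Encoding.DICTIONARY);
      }

      // applyFilter() then branches on the cached boolean for every page:
      // if (isPrimitiveNoDictionaryColumn) { compare on original data } else { ... }
    }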
[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user sandeep-katta commented on the issue: https://github.com/apache/carbondata/pull/2642 @ravipesala please retrigger the 2.3 build; the test case issues are fixed ---
[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2642 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8210/ ---
[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2642 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/139/ ---
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214356896 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java --- @@ -239,12 +239,25 @@ private boolean isEncodedWithMeta(DataChunk2 pageMetadata) { protected DimensionColumnPage decodeDimension(DimensionRawColumnChunk rawColumnPage, ByteBuffer pageData, DataChunk2 pageMetadata, int offset) throws IOException, MemoryException { +List encodings = pageMetadata.getEncoders(); if (isEncodedWithMeta(pageMetadata)) { ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, pageData, offset, null != rawColumnPage.getLocalDictionary()); decodedPage.setNullBits(QueryUtil.getNullBitSet(pageMetadata.presence)); - return new ColumnPageWrapper(decodedPage, rawColumnPage.getLocalDictionary(), - isEncodedWithAdaptiveMeta(pageMetadata)); + int[] invertedIndexes = new int[0]; --- End diff -- add a comment to explain that this scenario is to handle no dictionary primitive type columns where inverted index can be created on row id's during data load ---
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214371546 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java --- @@ -363,7 +398,16 @@ public EncodedTablePage getEncodedTablePage() { columnPageEncoder = encodingFactory.createEncoder( spec, noDictDimensionPages[noDictIndex]); - encodedPage = columnPageEncoder.encode(noDictDimensionPages[noDictIndex++]); + encodedPage = columnPageEncoder.encode(noDictDimensionPages[noDictIndex]); + DataType targetDataType = + columnPageEncoder.getTargetDataType(noDictDimensionPages[noDictIndex]); + if (null != targetDataType) { +LOGGER.info("Encoder result ---> Source data type: " + noDictDimensionPages[noDictIndex] --- End diff -- make this log statement debug level and guard it with an isDebugEnabled check ---
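A sketch of the guarded logging, assuming the logger exposes the usual isDebugEnabled() check (sourceDataType stands in for the message fields, which are abbreviated here):

    if (LOGGER.isDebugEnabled()) {
      // guard avoids building the concatenated message when debug is off
      LOGGER.debug("Encoder result ---> Source data type: " + sourceDataType
          + ", target data type: " + targetDataType);
    }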
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214352965 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java --- @@ -346,12 +371,21 @@ static ColumnPageCodec selectCodecByAlgorithmForDecimal(SimpleStatsResult stats, // no effect to use adaptive or delta, use compression only return new DirectCompressCodec(stats.getDataType()); } +boolean isSort = false; +boolean isInvertedIndex = false; +if (columnSpec instanceof TableSpec.DimensionSpec +&& columnSpec.getColumnType() != ColumnType.COMPLEX_PRIMITIVE) { + isSort = ((TableSpec.DimensionSpec) columnSpec).isInSortColumns(); + isInvertedIndex = isSort && ((TableSpec.DimensionSpec) columnSpec).isDoInvertedIndex(); +} --- End diff -- Put the above changes in one method, as the same code is used in the places above as well, and then call this method while creating the encoding type ---
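A sketch of the extracted helper; the method name is illustrative, the body is the duplicated block from the diff:

    private static boolean[] getSortAndInvertedIndexFlags(TableSpec.ColumnSpec columnSpec) {
      boolean isSort = false;
      boolean isInvertedIndex = false;
      if (columnSpec instanceof TableSpec.DimensionSpec
          && columnSpec.getColumnType() != ColumnType.COMPLEX_PRIMITIVE) {
        isSort = ((TableSpec.DimensionSpec) columnSpec).isInSortColumns();
        isInvertedIndex = isSort && ((TableSpec.DimensionSpec) columnSpec).isDoInvertedIndex();
      }
      return new boolean[] { isSort, isInvertedIndex };
    }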
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214351650 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/TableSpec.java --- @@ -91,6 +92,30 @@ private void addMeasures(List measures) { } } + /** + * No dictionary and complex dimensions of the table + * + * @return + */ + public DimensionSpec[] getNoDictAndComplexDimensions() { +List noDicOrCompIndexes = new ArrayList<>(dimensionSpec.length); +int noDicCount = 0; +for (int i = 0; i < dimensionSpec.length; i++) { + if (dimensionSpec[i].getColumnType() == ColumnType.PLAIN_VALUE + || dimensionSpec[i].getColumnType() == ColumnType.COMPLEX_PRIMITIVE + || dimensionSpec[i].getColumnType() == ColumnType.COMPLEX) { +noDicOrCompIndexes.add(i); +noDicCount++; + } +} + +DimensionSpec[] dims = new DimensionSpec[noDicCount]; +for (int i = 0; i < dims.length; i++) { + dims[i] = dimensionSpec[noDicOrCompIndexes.get(i)]; +} +return dims; --- End diff -- Avoid the below for loop in this method ---
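One way to drop the second loop, sketched on the assumption that a temporary list of specs is acceptable here:

    public DimensionSpec[] getNoDictAndComplexDimensions() {
      List<DimensionSpec> result = new ArrayList<>(dimensionSpec.length);
      for (DimensionSpec spec : dimensionSpec) {
        ColumnType type = spec.getColumnType();
        if (type == ColumnType.PLAIN_VALUE || type == ColumnType.COMPLEX_PRIMITIVE
            || type == ColumnType.COMPLEX) {
          result.add(spec);  // collect specs directly, no index resolution pass
        }
      }
      return result.toArray(new DimensionSpec[0]);
    }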
[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2654 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/138/ ---
[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2654 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8209/ ---
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214338168 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java --- @@ -375,6 +454,47 @@ public void writeRawRowAsIntermediateSortTempRowToOutputStream(Object[] row, outputStream.write(rowBuffer.array(), 0, packSize); } + /** + * Write the data to stream + * + * @param data + * @param outputStream + * @param idx + * @throws IOException + */ + private void writeDataToStream(Object data, DataOutputStream outputStream, int idx) + throws IOException { +DataType dataType = noDicSortDataTypes[idx]; +if (null == data) { + outputStream.writeBoolean(false); + return; --- End diff -- do not use an early return statement; structure the if/else block instead ---
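A sketch of the same method with the early return folded into an if/else (the else branch is abbreviated):

    private void writeDataToStream(Object data, DataOutputStream outputStream, int idx)
        throws IOException {
      if (null == data) {
        // null marker only
        outputStream.writeBoolean(false);
      } else {
        outputStream.writeBoolean(true);
        DataType dataType = noDicSortDataTypes[idx];
        // ... write the value according to dataType, as in the original method body ...
      }
    }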
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214336180 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java --- @@ -224,10 +237,15 @@ public IntermediateSortTempRow readWithNoSortFieldConvert( // read no-dict & sort data for (int idx = 0; idx < this.noDictSortDimCnt; idx++) { - short len = inputStream.readShort(); - byte[] bytes = new byte[len]; - inputStream.readFully(bytes); - noDictSortDims[idx] = bytes; + // for no dict measure column get the original data + if (DataTypeUtil.isPrimitiveColumn(noDicSortDataTypes[idx])) { +noDictSortDims[idx] = readDataFromStream(inputStream, idx); + } else { +short len = inputStream.readShort(); +byte[] bytes = new byte[len]; +inputStream.readFully(bytes); +noDictSortDims[idx] = bytes; + } --- End diff -- There is similar code above as well; refactor it into one method and call it from both places ---
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214341633 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/partition/impl/RawRowComparator.java --- @@ -30,24 +33,39 @@ public class RawRowComparator implements Comparator { private int[] sortColumnIndices; private boolean[] isSortColumnNoDict; + private DataType[] noDicDataTypes; - public RawRowComparator(int[] sortColumnIndices, boolean[] isSortColumnNoDict) { + public RawRowComparator(int[] sortColumnIndices, boolean[] isSortColumnNoDict, + DataType[] noDicDataTypes) { this.sortColumnIndices = sortColumnIndices; this.isSortColumnNoDict = isSortColumnNoDict; +this.noDicDataTypes = noDicDataTypes; } @Override public int compare(CarbonRow o1, CarbonRow o2) { int diff = 0; int i = 0; +int noDicIdx = 0; for (int colIdx : sortColumnIndices) { if (isSortColumnNoDict[i]) { -byte[] colA = (byte[]) o1.getObject(colIdx); -byte[] colB = (byte[]) o2.getObject(colIdx); -diff = UnsafeComparer.INSTANCE.compareTo(colA, colB); -if (diff != 0) { - return diff; +if (DataTypeUtil.isPrimitiveColumn(noDicDataTypes[noDicIdx])) { + // for no dictionary numeric column get comparator based on the data type + SerializableComparator comparator = org.apache.carbondata.core.util.comparator.Comparator --- End diff -- increment `noDicIdx` in if block and remove from method end ---
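The same pattern applies to all four comparators discussed in this thread (RawRowComparator, IntermediateSortTempRowComparator, NewRowComparator, UnsafeRowComparator); a schematic sketch:

    for (int colIdx : sortColumnIndices) {
      if (isSortColumnNoDict[i]) {
        if (DataTypeUtil.isPrimitiveColumn(noDicDataTypes[noDicIdx])) {
          // data-type aware comparison via SerializableComparator
        } else {
          // byte[] comparison via UnsafeComparer
        }
        noDicIdx++;  // advance only when a no-dictionary column was actually consumed
      }
      i++;
    }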
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214341135 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/IntermediateSortTempRowComparator.java --- @@ -45,18 +52,31 @@ public int compare(IntermediateSortTempRow rowA, IntermediateSortTempRow rowB) { int diff = 0; int dictIndex = 0; int nonDictIndex = 0; +int noDicTypeIdx = 0; for (boolean isNoDictionary : isSortColumnNoDictionary) { if (isNoDictionary) { -byte[] byteArr1 = rowA.getNoDictSortDims()[nonDictIndex]; -byte[] byteArr2 = rowB.getNoDictSortDims()[nonDictIndex]; -nonDictIndex++; +if (DataTypeUtil.isPrimitiveColumn(noDicSortDataTypes[noDicTypeIdx])) { + // use data types based comparator for the no dictionary measure columns + SerializableComparator comparator = org.apache.carbondata.core.util.comparator.Comparator --- End diff -- Increment the no dictionary type index here `noDicTypeIdx` in if block and not at the end ---
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214341442 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/NewRowComparator.java --- @@ -43,15 +53,31 @@ public NewRowComparator(boolean[] noDictionarySortColumnMaping) { public int compare(Object[] rowA, Object[] rowB) { int diff = 0; int index = 0; +int dataTypeIdx = 0; +int noDicSortIdx = 0; -for (boolean isNoDictionary : noDictionarySortColumnMaping) { - if (isNoDictionary) { -byte[] byteArr1 = (byte[]) rowA[index]; -byte[] byteArr2 = (byte[]) rowB[index]; +for (int i = 0; i < noDicDimColMapping.length; i++) { + if (noDicDimColMapping[i]) { +if (noDicSortColumnMapping[noDicSortIdx++]) { + if (DataTypeUtil.isPrimitiveColumn(noDicDataTypes[dataTypeIdx])) { +// use data types based comparator for the no dictionary measure columns +SerializableComparator comparator = --- End diff -- increment `dataTypeIdx` in if block and remove from method end ---
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214337384 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java --- @@ -359,9 +433,14 @@ public void writeRawRowAsIntermediateSortTempRowToOutputStream(Object[] row, // write no-dict & sort for (int idx = 0; idx < this.noDictSortDimCnt; idx++) { - byte[] bytes = (byte[]) row[this.noDictSortDimIdx[idx]]; - outputStream.writeShort(bytes.length); - outputStream.write(bytes); + if (DataTypeUtil.isPrimitiveColumn(noDicSortDataTypes[idx])) { --- End diff -- I can see that DataTypeUtil.isPrimitiveColumn is called for every row in multiple places. Please check the load performance impact of this ---
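One way to take the check off the per-row hot path, assuming a new field in SortStepRowHandler (the field name is illustrative):

    // computed once in the constructor
    private boolean[] noDicSortIsPrimitive;

    // in the constructor:
    noDicSortIsPrimitive = new boolean[noDicSortDataTypes.length];
    for (int i = 0; i < noDicSortDataTypes.length; i++) {
      noDicSortIsPrimitive[i] = DataTypeUtil.isPrimitiveColumn(noDicSortDataTypes[i]);
    }

    // the per-row read/write loops then test only the cached boolean:
    // if (noDicSortIsPrimitive[idx]) { ... } else { ... }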
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214341815 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/comparator/UnsafeRowComparator.java --- @@ -60,26 +64,50 @@ public int compare(UnsafeCarbonRow rowL, Object baseObjectL, UnsafeCarbonRow row if (isNoDictionary) { short lengthA = CarbonUnsafe.getUnsafe().getShort(baseObjectL, rowA + dictSizeInMemory + sizeInNonDictPartA); -byte[] byteArr1 = new byte[lengthA]; sizeInNonDictPartA += 2; -CarbonUnsafe.getUnsafe() -.copyMemory(baseObjectL, rowA + dictSizeInMemory + sizeInNonDictPartA, -byteArr1, CarbonUnsafe.BYTE_ARRAY_OFFSET, lengthA); -sizeInNonDictPartA += lengthA; - short lengthB = CarbonUnsafe.getUnsafe().getShort(baseObjectR, rowB + dictSizeInMemory + sizeInNonDictPartB); -byte[] byteArr2 = new byte[lengthB]; sizeInNonDictPartB += 2; -CarbonUnsafe.getUnsafe() -.copyMemory(baseObjectR, rowB + dictSizeInMemory + sizeInNonDictPartB, -byteArr2, CarbonUnsafe.BYTE_ARRAY_OFFSET, lengthB); -sizeInNonDictPartB += lengthB; +DataType dataType = tableFieldStat.getNoDicSortDataType()[noDicSortIdx]; +if (DataTypeUtil.isPrimitiveColumn(dataType)) { + Object data1 = null; --- End diff -- increment `noDicSortIdx` in if block and remove from method end ---
[GitHub] carbondata issue #2638: [CARBONDATA-2859][SDV] Add sdv test cases for bloomf...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2638 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/136/ ---
[GitHub] carbondata issue #2638: [CARBONDATA-2859][SDV] Add sdv test cases for bloomf...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2638 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8207/ ---
[GitHub] carbondata issue #2672: [HOTFIX] improve sdk multi-thread performance
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2672 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/135/ ---
[GitHub] carbondata issue #2672: [HOTFIX] improve sdk multi-thread performance
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2672 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8206/ ---
[GitHub] carbondata issue #2680: [CARBONDATA-2905] Set stream property for streaming ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2680 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8205/ ---
[GitHub] carbondata issue #2679: WIP: [CARBONDATA-2904] Support minmax datamap for ex...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2679 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/133/ ---
[GitHub] carbondata issue #2680: [CARBONDATA-2905] Set stream property for streaming ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2680 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/134/ ---
[GitHub] carbondata issue #2679: WIP: [CARBONDATA-2904] Support minmax datamap for ex...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2679 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8204/ ---
[GitHub] carbondata issue #2638: [CARBONDATA-2859][SDV] Add sdv test cases for bloomf...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2638 retest this please ---
[GitHub] carbondata pull request #2676: [CARBONDATA-2902][DataMap] Fix showing negati...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2676#discussion_r214320712 --- Diff: core/src/main/java/org/apache/carbondata/core/profiler/TablePruningInfo.java --- @@ -99,4 +107,44 @@ public String toString() { } return builder.toString(); } + + /** + * when CACHE_LEVEL = BLOCK or carbon data file is LegacyStore + * only show pruned result size of datamaps in block/blocklet level + */ + private String getHitInfoAfterPruning() { +StringBuilder builder = new StringBuilder(); +builder +.append(" - total blocks: ").append(totalBlocklets).append("\n") +.append(" - filter: ").append(filterStatement).append("\n"); +if (defaultDataMap != null) { + builder + .append(" - pruned by Main DataMap").append("\n") + .append("- hit blocks: ").append(numBlockletsAfterDefaultPruning).append("\n"); --- End diff -- It is better to unify this with the "blocklet" case; you can change it to use "hit blocklets" in `getSkipBlockletInfoAfterPruning` ---
[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2654 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/132/ ---
[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2654 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8203/ ---
[GitHub] carbondata issue #2644: [CARBONDATA-2853] Implement file-level min/max index...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2644 @QiangCai In general, I can see that you have put empty lines in many places in the code. Please remove those empty lines everywhere and add some code comments for better understanding ---
[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2644#discussion_r214310465 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java --- @@ -96,14 +96,35 @@ private static FileFooter3 getFileFooter3(List infoList, return footer; } - public static BlockletIndex getBlockletIndex( - org.apache.carbondata.core.metadata.blocklet.index.BlockletIndex info) { + public static org.apache.carbondata.core.metadata.blocklet.index.BlockletMinMaxIndex + convertExternalMinMaxIndex(BlockletMinMaxIndex minMaxIndex) { --- End diff -- please add a method comment to explain the meaning of convertExternalMinMaxIndex ---
[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2644#discussion_r214303472 --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/StreamDataMap.java --- @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datamap; + +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.BitSet; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore; +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.reader.CarbonIndexFileReader; +import org.apache.carbondata.core.scan.filter.FilterUtil; +import org.apache.carbondata.core.scan.filter.executer.FilterExecuter; +import org.apache.carbondata.core.scan.filter.executer.ImplicitColumnFilterExecutor; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.util.CarbonMetadataUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.format.BlockIndex; + +@InterfaceAudience.Internal +public class StreamDataMap { + + private CarbonTable carbonTable; + + private AbsoluteTableIdentifier identifier; --- End diff -- If carbonTable is stored, there is no need to store identifier as well; you can get it from carbonTable ---
[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2644#discussion_r214307411 --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/StreamDataMap.java --- @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datamap; + +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.BitSet; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore; +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.reader.CarbonIndexFileReader; +import org.apache.carbondata.core.scan.filter.FilterUtil; +import org.apache.carbondata.core.scan.filter.executer.FilterExecuter; +import org.apache.carbondata.core.scan.filter.executer.ImplicitColumnFilterExecutor; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.util.CarbonMetadataUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.format.BlockIndex; + +@InterfaceAudience.Internal +public class StreamDataMap { + + private CarbonTable carbonTable; + + private AbsoluteTableIdentifier identifier; + + private FilterExecuter filterExecuter; + + public StreamDataMap(CarbonTable carbonTable) { +this.carbonTable = carbonTable; +this.identifier = carbonTable.getAbsoluteTableIdentifier(); + } + + public void init(FilterResolverIntf filterExp) { +if (filterExp != null) { + + List minMaxCacheColumns = new ArrayList<>(); + for (CarbonDimension dimension : carbonTable.getDimensions()) { +if (!dimension.isComplex()) { + minMaxCacheColumns.add(dimension); +} + } + minMaxCacheColumns.addAll(carbonTable.getMeasures()); + + List listOfColumns = + carbonTable.getTableInfo().getFactTable().getListOfColumns(); + int[] columnCardinality = new int[listOfColumns.size()]; + for (int index = 0; index < columnCardinality.length; index++) { +columnCardinality[index] = Integer.MAX_VALUE; + } + + SegmentProperties segmentProperties = + new SegmentProperties(listOfColumns, columnCardinality); + + filterExecuter = FilterUtil.getFilterExecuterTree( + filterExp, 
segmentProperties, null, minMaxCacheColumns); +} + } + + public List prune(List segments) throws IOException { +if (filterExecuter == null) { + return listAllStreamFiles(segments, false); +} else { + List streamFileList = new ArrayList<>(); + for (StreamFile streamFile : listAllStreamFiles(segments, true)) { +if (isScanRequire(streamFile)) { + streamFileList.add(streamFile); + streamFile.setMinMaxIndex(null); +} + } + return streamFileList; +} + } + + private boolean isScanRequire(StreamFile streamFile) { +// backward compatibility, old stream file without min/max index +if (streamFile.getMinMaxIndex() == null) { + return true; +} + +byte[][] maxValue = streamFile.getMinMaxIndex().getMaxValues(); +byte[][] minValue =
[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2644#discussion_r214313170 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/StreamHandoffRDD.scala --- @@ -205,8 +205,9 @@ class StreamHandoffRDD[K, V]( segmentList.add(Segment.toSegment(handOffSegmentId, null)) val splits = inputFormat.getSplitsOfStreaming( job, - carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.getAbsoluteTableIdentifier, - segmentList + segmentList, + carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable, + null --- End diff -- Once you add the overloaded method as explained in the above comment, you can call the 3-argument method from here ---
[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2644#discussion_r214311953 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java --- @@ -342,60 +341,52 @@ public void refreshSegmentCacheIfRequired(JobContext job, CarbonTable carbonTabl /** * use file list in .carbonindex file to get the split of streaming. */ - public List getSplitsOfStreaming(JobContext job, AbsoluteTableIdentifier identifier, - List streamSegments) throws IOException { + public List getSplitsOfStreaming(JobContext job, List streamSegments, + CarbonTable carbonTable, FilterResolverIntf filterResolverIntf) throws IOException { List splits = new ArrayList(); if (streamSegments != null && !streamSegments.isEmpty()) { numStreamSegments = streamSegments.size(); long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job)); long maxSize = getMaxSplitSize(job); - for (Segment segment : streamSegments) { -String segmentDir = -CarbonTablePath.getSegmentPath(identifier.getTablePath(), segment.getSegmentNo()); -FileFactory.FileType fileType = FileFactory.getFileType(segmentDir); -if (FileFactory.isFileExist(segmentDir, fileType)) { - SegmentIndexFileStore segmentIndexFileStore = new SegmentIndexFileStore(); - segmentIndexFileStore.readAllIIndexOfSegment(segmentDir); - Map carbonIndexMap = segmentIndexFileStore.getCarbonIndexMap(); - CarbonIndexFileReader indexReader = new CarbonIndexFileReader(); - for (byte[] fileData : carbonIndexMap.values()) { -indexReader.openThriftReader(fileData); -try { - // map block index - while (indexReader.hasNext()) { -BlockIndex blockIndex = indexReader.readBlockIndexInfo(); -String filePath = segmentDir + File.separator + blockIndex.getFile_name(); -Path path = new Path(filePath); -long length = blockIndex.getFile_size(); -if (length != 0) { - BlockLocation[] blkLocations; - FileSystem fs = FileFactory.getFileSystem(path); - FileStatus file = fs.getFileStatus(path); - blkLocations = fs.getFileBlockLocations(path, 0, length); - long blockSize = file.getBlockSize(); - long splitSize = computeSplitSize(blockSize, minSize, maxSize); - long bytesRemaining = length; - while (((double) bytesRemaining) / splitSize > 1.1) { -int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining); -splits.add(makeSplit(segment.getSegmentNo(), path, length - bytesRemaining, -splitSize, blkLocations[blkIndex].getHosts(), -blkLocations[blkIndex].getCachedHosts(), FileFormat.ROW_V1)); -bytesRemaining -= splitSize; - } - if (bytesRemaining != 0) { -int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining); -splits.add(makeSplit(segment.getSegmentNo(), path, length - bytesRemaining, -bytesRemaining, blkLocations[blkIndex].getHosts(), -blkLocations[blkIndex].getCachedHosts(), FileFormat.ROW_V1)); - } -} else { - //Create empty hosts array for zero length files - splits.add(makeSplit(segment.getSegmentNo(), path, 0, length, new String[0], - FileFormat.ROW_V1)); -} - } -} finally { - indexReader.closeThriftReader(); + + if (filterResolverIntf == null) { +if (carbonTable != null) { + Expression filter = getFilterPredicates(job.getConfiguration()); + if (filter != null) { +carbonTable.processFilterExpression(filter, null, null); +filterResolverIntf = carbonTable.resolveFilter(filter); + } +} + } + StreamDataMap streamDataMap = + DataMapStoreManager.getInstance().getStreamDataMap(carbonTable); + streamDataMap.init(filterResolverIntf); + List streamFiles = streamDataMap.prune(streamSegments); + for (StreamFile 
streamFile : streamFiles) { +if (FileFactory.isFileExist(streamFile.getFilePath())) { + Path path = new Path(streamFile.getFilePath()); + long length = streamFile.getFileSize(); +
[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2644#discussion_r214311329 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java --- @@ -342,60 +341,52 @@ public void refreshSegmentCacheIfRequired(JobContext job, CarbonTable carbonTabl /** * use file list in .carbonindex file to get the split of streaming. */ - public List getSplitsOfStreaming(JobContext job, AbsoluteTableIdentifier identifier, - List streamSegments) throws IOException { + public List getSplitsOfStreaming(JobContext job, List streamSegments, + CarbonTable carbonTable, FilterResolverIntf filterResolverIntf) throws IOException { --- End diff -- You can write an overloaded method for getSplitsOfStreaming: one which accepts 3 parameters and one with 4 parameters. 1. getSplitsOfStreaming(JobContext job, AbsoluteTableIdentifier identifier, List streamSegments) -- From this method you can call the other method and pass null as the 4th argument. This will avoid passing null at all the places above. 2. getSplitsOfStreaming(JobContext job, List streamSegments, CarbonTable carbonTable, FilterResolverIntf filterResolverIntf) ---
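A sketch of the delegation, assuming the 3-argument variant keeps the CarbonTable parameter used by the StreamHandoffRDD caller discussed above (signature details are illustrative):

    public List<InputSplit> getSplitsOfStreaming(JobContext job, List<Segment> streamSegments,
        CarbonTable carbonTable) throws IOException {
      // no explicit filter: the 4-argument variant resolves one from the
      // job configuration when filterResolverIntf is null
      return getSplitsOfStreaming(job, streamSegments, carbonTable, null);
    }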
[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2644#discussion_r214305126 --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/StreamDataMap.java --- @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datamap; + +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.BitSet; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore; +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.reader.CarbonIndexFileReader; +import org.apache.carbondata.core.scan.filter.FilterUtil; +import org.apache.carbondata.core.scan.filter.executer.FilterExecuter; +import org.apache.carbondata.core.scan.filter.executer.ImplicitColumnFilterExecutor; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.util.CarbonMetadataUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.format.BlockIndex; + +@InterfaceAudience.Internal +public class StreamDataMap { --- End diff -- Please check the feasibility of extending the DataMap interface and implementing all its methods to keep it similar to BlockDataMap. I think it should be feasible ---
[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2644#discussion_r214316607 --- Diff: streaming/src/main/java/org/apache/carbondata/streaming/CarbonStreamRecordWriter.java --- @@ -212,9 +271,13 @@ private void initializeAtFirstRow() throws IOException, InterruptedException { byte[] col = (byte[]) columnValue; output.writeShort(col.length); output.writeBytes(col); +dimensionStatsCollectors[dimCount].update(col); } else { output.writeInt((int) columnValue); + dimensionStatsCollectors[dimCount].update(ByteUtil.toBytes((int) columnValue)); --- End diff -- For the min/max comparison you are converting from int to byte array for every row, which can impact write performance. Instead, typecast to int and compare on the primitive value; after all the data is loaded, convert the final min/max values to byte arrays based on the data type, so there is only one conversion per column ---
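A sketch of the suggested collector for an int column (field and method names are illustrative assumptions): compare primitives per row, convert to bytes once when the page is complete.

    // running min/max kept as primitives while rows stream in
    private int minValue = Integer.MAX_VALUE;
    private int maxValue = Integer.MIN_VALUE;

    void update(int value) {
      if (value < minValue) minValue = value;
      if (value > maxValue) maxValue = value;
    }

    // one int-to-bytes conversion after the last row
    byte[] getMinAsBytes() { return ByteUtil.toBytes(minValue); }
    byte[] getMaxAsBytes() { return ByteUtil.toBytes(maxValue); }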
[GitHub] carbondata pull request #2672: [HOTFIX] improve sdk multi-thread performance
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2672#discussion_r214318648 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/steps/InputProcessorStepWithNoConverterImpl.java --- @@ -64,10 +63,13 @@ private Map dataFieldsWithComplexDataType; + private short sdkUserCore; --- End diff -- done. ---
[GitHub] carbondata pull request #2676: [CARBONDATA-2902][DataMap] Fix showing negati...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2676#discussion_r214317180 --- Diff: core/src/main/java/org/apache/carbondata/core/profiler/ExplainCollector.java --- @@ -125,6 +125,13 @@ public static void addTotalBlocklets(int numBlocklets) { } } + public static void setDefaultDMBlockLevel(boolean isBlockLevel) { --- End diff -- please add comment ---
[GitHub] carbondata pull request #2680: [CARBONDATA-2905] Set stream property for str...
GitHub user jackylk opened a pull request: https://github.com/apache/carbondata/pull/2680 [CARBONDATA-2905] Set stream property for streaming table For a streaming table with table property "streaming"="true", we should allow setting the streaming table property to false. After setting it to false, only batch segments will be queried. - [X] Any interfaces changed? No - [X] Any backward compatibility impacted? No - [X] Document update required? No - [X] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. rerun all tests - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata set_stream_property Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2680.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2680 commit dec6a565dd9ac7872c1baf03058a83a9cdae3ee5 Author: Jacky Li Date: 2018-08-31T10:51:25Z set stream property ---
[jira] [Created] (CARBONDATA-2905) Should allow setting stream property on streaming table
Jacky Li created CARBONDATA-2905: Summary: Should allow setting stream property on streaming table Key: CARBONDATA-2905 URL: https://issues.apache.org/jira/browse/CARBONDATA-2905 Project: CarbonData Issue Type: Improvement Reporter: Jacky Li Assignee: Jacky Li Fix For: 1.5.0 For a streaming table with table property "streaming"="true", we should allow setting the streaming table property to false -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2672: [HOTFIX] improve sdk multi-thread performance
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2672#discussion_r214314596 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java --- @@ -460,27 +461,29 @@ public CarbonLoadModel getLoadModel() { private CarbonOutputIteratorWrapper[] iterators; -private int counter; +private AtomicLong counter; --- End diff -- done ---
[GitHub] carbondata pull request #2672: [HOTFIX] improve sdk multi-thread performance
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2672#discussion_r214314587 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java --- @@ -460,27 +461,29 @@ public CarbonLoadModel getLoadModel() { private CarbonOutputIteratorWrapper[] iterators; -private int counter; +private AtomicLong counter; CarbonMultiRecordWriter(CarbonOutputIteratorWrapper[] iterators, DataLoadExecutor dataLoadExecutor, CarbonLoadModel loadModel, Future future, ExecutorService executorService) { super(null, dataLoadExecutor, loadModel, future, executorService); this.iterators = iterators; + counter = new AtomicLong(0); } -@Override public synchronized void write(NullWritable aVoid, ObjectArrayWritable objects) +@Override public void write(NullWritable aVoid, ObjectArrayWritable objects) throws InterruptedException { - iterators[counter].write(objects.get()); - if (++counter == iterators.length) { -//round robin reset -counter = 0; + int hash = (int) (counter.incrementAndGet() % iterators.length); --- End diff -- done ---
[GitHub] carbondata pull request #2672: [HOTFIX] improve sdk multi-thread performance
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2672#discussion_r214313919 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java --- @@ -460,27 +461,29 @@ public CarbonLoadModel getLoadModel() { private CarbonOutputIteratorWrapper[] iterators; -private int counter; +private AtomicLong counter; CarbonMultiRecordWriter(CarbonOutputIteratorWrapper[] iterators, DataLoadExecutor dataLoadExecutor, CarbonLoadModel loadModel, Future future, ExecutorService executorService) { super(null, dataLoadExecutor, loadModel, future, executorService); this.iterators = iterators; + counter = new AtomicLong(0); } -@Override public synchronized void write(NullWritable aVoid, ObjectArrayWritable objects) +@Override public void write(NullWritable aVoid, ObjectArrayWritable objects) throws InterruptedException { - iterators[counter].write(objects.get()); - if (++counter == iterators.length) { -//round robin reset -counter = 0; + int hash = (int) (counter.incrementAndGet() % iterators.length); --- End diff -- If the counter were an int and write() were called more than INT_MAX times, the increment would overflow and give negative results, so a long is used to cover very large record counts. And long % int always fits within an int, so the type cast is safe. https://stackoverflow.com/questions/7262133/will-a-long-int-will-always-fit-into-an-int ---
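A self-contained sketch of the lock-free round-robin discussed here (illustrative names, not the actual CarbonMultiRecordWriter fields):

    import java.util.concurrent.atomic.AtomicLong;

    final class RoundRobinDispatcher {
      private final AtomicLong counter = new AtomicLong(0);
      private final int slots;   // number of iterator wrappers

      RoundRobinDispatcher(int slots) { this.slots = slots; }

      // Thread-safe without synchronized: each caller gets the next slot.
      // long % int is always within int range, so the cast cannot truncate.
      int nextSlot() {
        return (int) (counter.incrementAndGet() % slots);
      }
    }

Using a long counter keeps the value positive for up to Long.MAX_VALUE writes, avoiding the negative index an int counter would produce after overflowing at INT_MAX increments.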
[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...
Github user dhatchayani commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2654#discussion_r214313592 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java --- @@ -240,8 +258,44 @@ public IntermediateSortTempRow readWithNoSortFieldConvert( return new IntermediateSortTempRow(dictSortDims, noDictSortDims,measure); } + /** + * Read the data from the stream + * + * @param inputStream + * @param idx + * @return + * @throws IOException + */ + private Object readDataFromStream(DataInputStream inputStream, int idx) throws IOException { --- End diff -- For measures, the value will always be packed into / unpacked from a ByteBuffer ---
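For illustration, a tiny self-contained round trip showing the pack/unpack idea (the framing here is an assumption for the example, not the actual CarbonData sort-temp format):

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;

    public final class MeasurePackingDemo {
      public static void main(String[] args) throws IOException {
        double measure = 42.5;
        // Pack: ByteBuffer writes big-endian by default.
        byte[] packed = ByteBuffer.allocate(Double.BYTES).putDouble(measure).array();
        // Unpack: DataInputStream also reads big-endian, so the round trip is lossless.
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
          System.out.println(in.readDouble());   // prints 42.5
        }
      }
    }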
[GitHub] carbondata pull request #2679: WIP: [CARBONDATA-2904] Support minmax datamap...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/2679 WIP: [CARBONDATA-2904] Support minmax datamap for external format table Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xuchuanyin/carbondata ef_index_dm_minmax Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2679.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2679 commit 8c0e84804c266c56cada024384d9ab2eaa89e9f2 Author: xuchuanyin Date: 2018-08-20T01:38:12Z Support building file level index for external format table + support directly generating file level index + support creating and generating file index on existing data + We will flatten the input files recursively and remove duplicated input files in one load The folder structure of the index file looks like below: ${datamap_name}/${segment_name}/File_level_${fact_file1_path_with_base64_encoding}/${column_name}.bloomindex ../File_level_${fact_file2_path_with_base64_encoding}/${column_name}.bloomindex Note that in this commit, the index datamap is not used during query. commit 30a861a92ff53df8befd14ee48b7a37499ab7c96 Author: xuchuanyin Date: 2018-08-25T10:49:43Z Support querying external format using bloomfilter datamaps commit 0664a1abd19e97cbe09920635a00619c945f0a20 Author: xuchuanyin Date: 2018-08-28T06:17:51Z rename path for minmax datamap commit f29ec1d80acea6fcb85a55ab37c02847e9282e5b Author: xuchuanyin Date: 2018-08-29T12:26:38Z Fix bugs in MinMaxDataMap make minmax datamap usable and add more tests for it ---
[jira] [Created] (CARBONDATA-2904) Support minmax datamap for external format
xuchuanyin created CARBONDATA-2904: -- Summary: Support minmax datamap for external format Key: CARBONDATA-2904 URL: https://issues.apache.org/jira/browse/CARBONDATA-2904 Project: CarbonData Issue Type: Sub-task Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2678: [WIP] Multi user support for SDK on S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2678 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8202/ ---
[GitHub] carbondata issue #2678: [WIP] Multi user support for SDK on S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2678 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/131/ ---
[GitHub] carbondata pull request #2675: [CARBONDATA-2901] Fixed JVM crash in Load sce...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2675#discussion_r214306748 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java --- @@ -140,19 +143,25 @@ public void initialize() throws MemoryException, CarbonSortKeyAndGroupByExceptio semaphore = new Semaphore(parameters.getNumberOfCores()); } - private UnsafeCarbonRowPage createUnsafeRowPage() - throws MemoryException, CarbonSortKeyAndGroupByException { -MemoryBlock baseBlock = -UnsafeMemoryManager.allocateMemoryWithRetry(this.taskId, inMemoryChunkSize); -boolean isMemoryAvailable = - UnsafeSortMemoryManager.INSTANCE.isMemoryAvailable(baseBlock.size()); -if (isMemoryAvailable) { - UnsafeSortMemoryManager.INSTANCE.allocateDummyMemory(baseBlock.size()); -} else { - // merge and spill in-memory pages to disk if memory is not enough - unsafeInMemoryIntermediateFileMerger.tryTriggerInmemoryMerging(true); + private UnsafeCarbonRowPage createUnsafeRowPage() { +try { + MemoryBlock baseBlock = + UnsafeMemoryManager.allocateMemoryWithRetry(this.taskId, inMemoryChunkSize); + boolean isMemoryAvailable = + UnsafeSortMemoryManager.INSTANCE.isMemoryAvailable(baseBlock.size()); + if (isMemoryAvailable) { + UnsafeSortMemoryManager.INSTANCE.allocateDummyMemory(baseBlock.size()); + } else { +// merge and spill in-memory pages to disk if memory is not enough + unsafeInMemoryIntermediateFileMerger.tryTriggerInmemoryMerging(true); + } + return new UnsafeCarbonRowPage(tableFieldStat, baseBlock, !isMemoryAvailable, taskId); +} catch (MemoryException | CarbonSortKeyAndGroupByException e) { + // This will set rowPage reference to null. If not set, other threads will use same reference. + // As handlePreviousPage() free the rowPage. + // If not set to null, rowPage will be accessed again after free by other thread. + return null; --- End diff -- The issue came from exactly this. Throwing an exception would not set the rowPage reference to null, so another thread would access this rowPage even though it had already been freed by the previous thread, hence the JVM crash ---
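A minimal, self-contained sketch of the pattern behind this fix (all names are hypothetical stand-ins for UnsafeCarbonRowPage and its memory manager, not the actual CarbonData API):

    final class PageHolder {
      static final class MemoryException extends Exception {}

      private volatile long[] rowPage;   // stands in for the shared UnsafeCarbonRowPage reference

      private static long[] allocate(int words) throws MemoryException {
        if (words > (1 << 20)) { throw new MemoryException(); }  // simulated allocation failure
        return new long[words];
      }

      // Returning null on failure (instead of propagating the exception)
      // guarantees the shared reference is cleared, so no thread can keep
      // using a page that the failure path has already freed.
      private static long[] createPageOrNull(int words) {
        try {
          return allocate(words);
        } catch (MemoryException e) {
          return null;
        }
      }

      void rollOver(int words) {
        // Direct assignment: on failure rowPage becomes null rather than
        // silently retaining the stale, already-freed page.
        rowPage = createPageOrNull(words);
      }
    }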
[GitHub] carbondata issue #2673: [WIP] Test Carbonstore
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2673 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/130/ ---
[GitHub] carbondata issue #2673: [WIP] Test Carbonstore
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2673 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8201/ ---
[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2642 Build failed with Spark 2.3 http://136.243.101.176:8080/job/ManualApacheCarbonPRBuilder2.1/175/ ---
[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2671 ---
[GitHub] carbondata issue #2676: [CARBONDATA-2902][DataMap] Fix showing negative prun...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2676 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/128/ ---
[GitHub] carbondata issue #2676: [CARBONDATA-2902][DataMap] Fix showing negative prun...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2676 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8199/ ---
[GitHub] carbondata pull request #2677: [CARBONDATA-2903] Fix compiler warning
GitHub user jackylk opened a pull request: https://github.com/apache/carbondata/pull/2677 [CARBONDATA-2903] Fix compiler warning When building with mvn, there are some compiler warnings. They are fixed in this PR. - [X] Any interfaces changed? No - [X] Any backward compatibility impacted? No - [X] Document update required? No - [X] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. rerun all tests - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata remove_warning Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2677.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2677 commit b16c030886ea330ac0ca521fb33f4c265ee26152 Author: Jacky Li Date: 2018-08-31T08:07:40Z remove warning ---
[jira] [Created] (CARBONDATA-2903) Fix compiler warnings
Jacky Li created CARBONDATA-2903: Summary: Fix compiler warnings Key: CARBONDATA-2903 URL: https://issues.apache.org/jira/browse/CARBONDATA-2903 Project: CarbonData Issue Type: Improvement Reporter: Jacky Li Fix For: 1.5.0, 1.4.2 When building with mvn, there are some compiler warnings. They should be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2661: [CARBONDATA-2888] Support multi level subfold...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2661#discussion_r214273342 --- Diff: integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndexReplaceRule.scala --- @@ -82,4 +82,23 @@ class CarbonFileIndexReplaceRule extends Rule[LogicalPlan] { fileIndex } } + + /** + * Get datafolders recursively + */ + private def getDataFolders(carbonFile: CarbonFile): Seq[CarbonFile] = { +val files = carbonFile.listFiles() +var folders: Seq[CarbonFile] = Seq() +files.foreach { f => + if (f.isDirectory) { +val files = f.listFiles() +if (files.nonEmpty && !files(0).isDirectory) { + folders = Seq(f) ++ folders +} else { + folders = getDataFolders(f) ++ folders --- End diff -- This statement can be moved under files.nonEmpty check ---
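A sketch of the suggested restructure, rendered here in Java with a minimal stand-in for CarbonFile (the actual code is Scala; names are illustrative): recursing only inside the non-empty branch avoids a redundant recursive call on empty directories.

    import java.util.ArrayList;
    import java.util.List;

    // Minimal stand-in for CarbonFile: only the two methods the logic needs.
    interface DataFile {
      boolean isDirectory();
      DataFile[] listFiles();
    }

    final class DataFolderScanner {
      // Collects leaf folders (directories whose first child is a file).
      static List<DataFile> getDataFolders(DataFile dir) {
        List<DataFile> folders = new ArrayList<>();
        for (DataFile f : dir.listFiles()) {
          if (f.isDirectory()) {
            DataFile[] children = f.listFiles();
            if (children.length > 0) {
              if (!children[0].isDirectory()) {
                folders.add(f);                     // leaf folder holding data files
              } else {
                folders.addAll(getDataFolders(f));  // recurse only when non-empty
              }
            }
          }
        }
        return folders;
      }
    }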
[GitHub] carbondata issue #2676: [CARBONDATA-2902][DataMap] Fix showing negative prun...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2676 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8198/ ---
[GitHub] carbondata issue #2676: [CARBONDATA-2902][DataMap] Fix showing negative prun...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2676 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/127/ ---
[GitHub] carbondata issue #2671: [CARBONDATA-2876]AVRO datatype support through SDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2671 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/126/ ---
[GitHub] carbondata issue #2671: [CARBONDATA-2876]AVRO datatype support through SDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2671 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8197/ ---
[GitHub] carbondata issue #2642: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2642 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/125/ ---