[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r200843765
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java ---
@@ -53,11 +60,75 @@ public int fillVector(int[] filteredRowId, ColumnVectorInfo[] vectorInfo, int ch
     throw new UnsupportedOperationException("internal error");
   }
-  @Override
-  public byte[] getChunkData(int rowId) {
-    return columnPage.getBytes(rowId);
+  @Override public byte[] getChunkData(int rowId) {
+    ColumnType columnType = columnPage.getColumnSpec().getColumnType();
+    DataType srcDataType = columnPage.getColumnSpec().getSchemaDataType();
+    DataType targetDataType = columnPage.getDataType();
+    if (columnPage.getNullBits().get(rowId)) {
+      // if this row is null, return default null represent in byte array
+      return CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY;
+    }
+    if ((columnType == ColumnType.COMPLEX_PRIMITIVE) && this.isAdaptiveComplexPrimitive()) {
+      if (srcDataType == DataTypes.DOUBLE || srcDataType == DataTypes.FLOAT) {
+        double doubleData = columnPage.getDouble(rowId);
+        if (srcDataType == DataTypes.FLOAT) {
+          float out = (float) doubleData;
--- End diff --
Converting to the actual type (float) and then getting its bytes adds one extra conversion per row. Can we avoid this by extracting/copying only the required bytes based on the type?
---
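A minimal sketch of the suggestion, assuming a page whose rows are already stored in their final fixed-width byte layout (4 bytes per FLOAT row here). `RawFixedWidthPage` and its methods are hypothetical, not CarbonData's `ColumnPage` API; an adaptive-encoded page that only exposes `getDouble` would first need such a raw-bytes accessor.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical fixed-width page: row i occupies bytes [i * width, (i + 1) * width).
// An illustration only, not CarbonData's ColumnPage API.
final class RawFixedWidthPage {
  private final byte[] data;
  private final int width;

  RawFixedWidthPage(byte[] data, int width) {
    this.data = data;
    this.width = width;
  }

  // Copies the stored bytes of one row directly: no double -> float -> bytes round trip.
  byte[] getRowBytes(int rowId) {
    int start = rowId * width;
    return Arrays.copyOfRange(data, start, start + width);
  }

  // Builds a FLOAT page using ByteBuffer's default big-endian layout.
  static RawFixedWidthPage ofFloats(float... values) {
    ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES);
    for (float value : values) {
      buffer.putFloat(value);
    }
    return new RawFixedWidthPage(buffer.array(), Float.BYTES);
  }
}
```

Each row then costs one `Arrays.copyOfRange`; whether that beats the numeric conversion depends on the page layout, so it is worth measuring.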
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r200843282
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java ---
@@ -359,38 +412,36 @@ public void freeMemory() {
     }
   }
-  @Override
-  public void convertValue(ColumnPageValueConverter codec) {
-    int pageSize = getPageSize();
+  @Override public void convertValue(ColumnPageValueConverter codec) {
     if (dataType == DataTypes.BYTE) {
-      for (long i = 0; i < pageSize; i++) {
+      for (long i = 0; i < totalLength / ByteUtil.SIZEOF_BYTE; i++) {
--- End diff --
The for-loop end condition (totalLength / ByteUtil.SIZEOF_BYTE) is evaluated for every row. If we extract the page-size computation into a method and compute it once before the loop, we can avoid this.
---
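The usual fix is to compute the loop bound once, before the loop. The class below is a hypothetical stand-in for `UnsafeFixLengthColumnPage`: a plain `byte[]` replaces off-heap memory, `SIZEOF_BYTE` mirrors `ByteUtil.SIZEOF_BYTE`, `getActualPageSize` is an assumed helper name, and an `IntUnaryOperator` plays the role of the codec.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical stand-in for a fixed-length page; a byte[] replaces off-heap memory.
final class FixLengthBytePage {
  static final int SIZEOF_BYTE = 1; // mirrors ByteUtil.SIZEOF_BYTE

  private final byte[] memory;
  private final long totalLength; // bytes used by the page

  FixLengthBytePage(byte[] memory) {
    this.memory = memory;
    this.totalLength = memory.length;
  }

  // Assumed helper: number of rows for a given element width.
  int getActualPageSize(int elementWidth) {
    return (int) (totalLength / elementWidth);
  }

  // Applies `codec` to every row; the bound is computed once, not on every iteration.
  void convertValue(IntUnaryOperator codec) {
    final int pageSize = getActualPageSize(SIZEOF_BYTE);
    for (int i = 0; i < pageSize; i++) {
      memory[i] = (byte) codec.applyAsInt(memory[i]);
    }
  }

  byte get(int rowId) {
    return memory[rowId];
  }
}
```

The JIT may be able to hoist such an invariant by itself, but a local `pageSize` makes the intent explicit without relying on that.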
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r200842962
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java ---
@@ -53,11 +60,75 @@ public int fillVector(int[] filteredRowId, ColumnVectorInfo[] vectorInfo, int ch
     throw new UnsupportedOperationException("internal error");
   }
-  @Override
-  public byte[] getChunkData(int rowId) {
-    return columnPage.getBytes(rowId);
+  @Override public byte[] getChunkData(int rowId) {
+    ColumnType columnType = columnPage.getColumnSpec().getColumnType();
+    DataType srcDataType = columnPage.getColumnSpec().getSchemaDataType();
+    DataType targetDataType = columnPage.getDataType();
+    if (columnPage.getNullBits().get(rowId)) {
+      // if this row is null, return default null represent in byte array
+      return CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY;
+    }
+    if ((columnType == ColumnType.COMPLEX_PRIMITIVE) && this.isAdaptiveComplexPrimitive()) {
+      if (srcDataType == DataTypes.DOUBLE || srcDataType == DataTypes.FLOAT) {
+        double doubleData = columnPage.getDouble(rowId);
+        if (srcDataType == DataTypes.FLOAT) {
+          float out = (float) doubleData;
+          return ByteUtil.toBytes(out);
+        } else {
+          return ByteUtil.toBytes(doubleData);
+        }
+      } else if (DataTypes.isDecimal(srcDataType)) {
+        throw new RuntimeException("unsupported type: " + srcDataType);
+      } else if ((srcDataType == DataTypes.BYTE) ||
+          (srcDataType == DataTypes.BOOLEAN) ||
+          (srcDataType == DataTypes.SHORT) ||
+          (srcDataType == DataTypes.SHORT_INT) ||
+          (srcDataType == DataTypes.INT) ||
+          (srcDataType == DataTypes.LONG) ||
+          (srcDataType == DataTypes.TIMESTAMP)) {
+        long longData = columnPage.getLong(rowId);
--- End diff --
Should we read the bytes from the column page based on the type? Otherwise, for small types such as byte and short, reading a long consumes 8 bytes from the page, which leads to wrong data.
---
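A minimal sketch of reading by stored width, assuming a flat page in which every row uses the width of its storage type (1, 2, 4 or 8 bytes). `WidthAwareReader` is hypothetical. It shows why the width matters: calling `getLong` on a page of 1-byte values assembles one value out of eight different rows.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for a flat page whose rows all share one storage width.
final class WidthAwareReader {
  // Reads row `rowId` and widens it to long, consuming exactly `width` bytes.
  static long readAsLong(ByteBuffer page, int rowId, int width) {
    int offset = rowId * width;
    switch (width) {
      case Byte.BYTES:
        return page.get(offset);
      case Short.BYTES:
        return page.getShort(offset);
      case Integer.BYTES:
        return page.getInt(offset);
      case Long.BYTES:
        return page.getLong(offset);
      default:
        throw new IllegalArgumentException("unsupported width: " + width);
    }
  }
}
```

For example, on a page of byte rows `{0, 0, 0, 0, 0, 0, 0, 42}`, reading row 0 with width 1 gives 0, but reading it with width 8 gives 42.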
GitHub user sounakr reopened a pull request: https://github.com/apache/carbondata/pull/2417
[WIP][Complex Column Enhancements]Primitive DataType Adaptive Encoding
Be sure to complete the following checklist to help us incorporate your contribution quickly and easily:
 - [ ] Any interfaces changed?
 - [ ] Any backward compatibility impacted?
 - [ ] Document update required?
 - [ ] Testing done
       Please provide details on
       - Whether new unit test cases have been added or why no new tests are required?
       - How it is tested? Please attach test report.
       - Is it a performance related change? Please attach the performance test report.
       - Any additional information to help reviewers in testing this change.
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/sounakr/incubator-carbondata primitive_adaptive
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/carbondata/pull/2417.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #2417
commit 546761fc2fba34b7694b9d127373ab11e792bb91
Author: ajantha-bhat
Date: 2018-07-03T13:35:09Z
    backup
commit dc35518631566dc7197c64ca9866e2c0971f2322
Author: ajantha-bhat
Date: 2018-07-03T17:06:49Z
    some more
commit e771e86ff0caf0ae74b3cfeea17c79de845468d8
Author: sounakr
Date: 2018-07-04T04:13:39Z
    Read
commit 14c387a518b2af904f75e30f855e4340280bbc24
Author: ajantha-bhat
Date: 2018-07-04T04:52:26Z
    fix negative array size issue
commit f65742253c0322b357522a759f062502a902e361
Author: ajantha-bhat
Date: 2018-07-04T07:07:01Z
    TODO: revert CCC and test case change
commit 396efbff6a1c10f0c5020836fb669a51f151db62
Author: ajantha-bhat
Date: 2018-07-04T10:24:04Z
    struct of int
commit 26c32ce7d8b3266c883d2f21d9f6fe02e6370141
Author: sounakr
Date: 2018-07-04T09:38:56Z
    Safe Page Changes
commit a0c2c60af324b24bbd111b9f62a1a7acdc698982
Author: sounakr
Date: 2018-07-04T10:25:07Z
    L1
commit dfaedea96449f07fed7a36ea7f47e54b44d83cfc
Author: sounakr
Date: 2018-07-04T12:01:45Z
    Unsafe Fix Changes
commit 210c5ab733ceeafdd3bc1593995da39acb0294a7
Author: ajantha-bhat
Date: 2018-07-04T12:41:43Z
    fixed array type
commit ce35af9fd9fb23e191f11d6d0f441ac9c8068147
Author: ajantha-bhat
Date: 2018-07-04T15:07:15Z
    issue fixes
commit 7eed4c75ea17197c602373ceab3fb6185e1d261b
Author: ajantha-bhat
Date: 2018-07-04T15:36:13Z
    unsafe issue fix
commit b244b9b62ad448c862a1fddfb19126c8e3f22dfd
Author: ajantha-bhat
Date: 2018-07-04T15:51:06Z
    fix style
commit 16d4280cd021ee389e538e23bcf1f62f6e849e7b
Author: ajantha-bhat
Date: 2018-07-05T02:28:09Z
    compilation fix
commit a37ee3dff7f9c43ed88d4205f72f8ebbc6a73c3a
Author: ajantha-bhat
Date: 2018-07-05T02:45:06Z
    null value
commit be861f3b1c7b748605c5be3b0a55d241c5de8394
Author: ajantha-bhat
Date: 2018-07-05T05:16:53Z
    refactoring changes
commit 19ec84d82e27e4ed5f23f8c12bec1b9d507e2ec6
Author: sounakr
Date: 2018-07-05T10:47:56Z
    Float DataType Support
commit f8b5b1adb21b1e42f23e30bd469730c23972ec35
Author: sounakr
Date: 2018-07-05T11:30:26Z
    Refactor
commit 2882c17552f81c821e43ccd6cda79feba7d60873
Author: ajantha-bhat
Date: 2018-07-05T11:33:14Z
    clean up
commit a9e8141b48eda7cabcd4748980196b64ab2cb685
Author: sounakr
Date: 2018-07-05T15:15:38Z
    Adaptive Complex
commit 663f0296dbee80efca0fda3fe39b91d3a9ad57c4
Author: sounakr
Date: 2018-07-05T16:47:12Z
    TimeStamp Adaptive and Date Block
---
Github user sounakr closed the pull request at: https://github.com/apache/carbondata/pull/2417 ---
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r200320193
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -3177,4 +3177,34 @@ public static void setLocalDictColumnsToWrapperSchema(List columns
     }
     return columnLocalDictGenMap;
   }
+
+  public static DataType getMappingDataType(String type) {
--- End diff --
Done.
---
Github user dhatchayani commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r200307851
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -3177,4 +3177,34 @@ public static void setLocalDictColumnsToWrapperSchema(List columns
     }
     return columnLocalDictGenMap;
   }
+
+  public static DataType getMappingDataType(String type) {
--- End diff --
Please reuse org.apache.carbondata.core.util.DataTypeUtil#valueOf(java.lang.String); this method seems to be a duplicate of it.
---
Github user gvramana commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r199482930
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java ---
@@ -161,14 +174,16 @@ private static DataType fitLongMinMax(long max, long min) {
   }
   private static DataType fitMinMax(DataType dataType, Object max, Object min) {
-    if (dataType == DataTypes.BYTE) {
+    if ((dataType == DataTypes.BYTE) || (dataType == DataTypes.BOOLEAN)) {
--- End diff --
Use a switch instead of if/else.
---
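A sketch of the switch form. Because the diff compares `DataTypes.BYTE` and friends with `==`, they appear to be objects rather than enum constants, so a Java `switch` would need an enum (or some id) to switch on; the `StorageType` enum and `FitMinMax` class below are hypothetical. The `BOOLEAN` case falls through to `BYTE`, mirroring the combined condition in the diff, under the assumption that BOOLEAN statistics are stored as `Byte`.

```java
// Hypothetical stand-in for a data type enum; CarbonData's own DataType is not assumed to be one.
enum StorageType { BOOLEAN, BYTE, SHORT, INT, LONG }

final class FitMinMax {
  // Picks the narrowest integral type whose range covers [min, max].
  static StorageType fitLongMinMax(long max, long min) {
    if (max <= Byte.MAX_VALUE && min >= Byte.MIN_VALUE) {
      return StorageType.BYTE;
    } else if (max <= Short.MAX_VALUE && min >= Short.MIN_VALUE) {
      return StorageType.SHORT;
    } else if (max <= Integer.MAX_VALUE && min >= Integer.MIN_VALUE) {
      return StorageType.INT;
    }
    return StorageType.LONG;
  }

  // One switch replaces the if/else chain over the source type.
  static StorageType fitMinMax(StorageType type, Object max, Object min) {
    switch (type) {
      case BOOLEAN: // assumed: BOOLEAN stats are stored as Byte, like BYTE
      case BYTE:
        return fitLongMinMax((Byte) max, (Byte) min);
      case SHORT:
        return fitLongMinMax((Short) max, (Short) min);
      case INT:
        return fitLongMinMax((Integer) max, (Integer) min);
      case LONG:
        return fitLongMinMax((Long) max, (Long) min);
      default:
        throw new IllegalArgumentException("unsupported type: " + type);
    }
  }
}
```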
Github user gvramana commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r199481217
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java ---
@@ -77,14 +127,70 @@ public boolean isExplicitSorted() {
     return false;
   }
-  @Override
-  public int compareTo(int rowId, byte[] compareValue) {
-    throw new UnsupportedOperationException("internal error");
+  @Override public int compareTo(int rowId, byte[] compareValue) {
+    throw new UnsupportedOperationException(
+        "internal error: should be called for only dictionary columns");
   }
   @Override public void freeMemory() {
   }
+  private void fillData(int[] rowMapping, ColumnVectorInfo columnVectorInfo,
+      CarbonColumnVector vector) {
+    int offsetRowId = columnVectorInfo.offset;
+    int vectorOffset = columnVectorInfo.vectorOffset;
+    int maxRowId = offsetRowId + columnVectorInfo.size;
+    BitSet nullBitSet = columnPage.getNullBits();
+    TableSpec.ColumnSpec columnSpec = columnPage.getColumnSpec();
+    if (columnSpec.getColumnType() == PLAIN_VALUE) {
+      for (int rowId = offsetRowId; rowId < maxRowId; rowId++) {
+        int currentRowId = (rowMapping == null) ? rowId : rowMapping[rowId];
+        if (nullBitSet.get(currentRowId)) {
+          // to handle the null values
+          vector.putNull(vectorOffset++);
+        } else {
+          if (columnSpec.getSchemaDataType() == DataTypes.STRING) {
+            byte[] data = columnPage.getBytes(currentRowId);
+            if (ByteUtil.UnsafeComparer.INSTANCE
+                .equals(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, data)) {
+              vector.putNull(vectorOffset++);
+            } else {
+              vector.putBytes(vectorOffset++, 0, data.length, data);
+            }
+          } else if (columnSpec.getSchemaDataType() == DataTypes.BOOLEAN) {
+            boolean data = columnPage.getBoolean(currentRowId);
+            vector.putBoolean(vectorOffset++, data);
+          } else if (columnSpec.getSchemaDataType() == DataTypes.SHORT) {
+            short data = columnPage.getShort(currentRowId);
+            vector.putShort(vectorOffset++, data);
--- End diff --
Use a switch instead of if/else.
---
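The row loop in `fillData` can be sketched independently of CarbonData's types. Below, a `long[]` plays the column page, a `BitSet` its null bits, and `LongVector` is a hypothetical stand-in for `CarbonColumnVector`. When `rowMapping` is null the rows are read in order; otherwise `rowMapping[i]` selects the page row, which is how the filtered `fillVector` overload reuses the same loop.

```java
import java.util.BitSet;

// Hypothetical stand-in for CarbonColumnVector: a long[] plus a null mask.
final class LongVector {
  final long[] values;
  final BitSet nulls = new BitSet();

  LongVector(int capacity) {
    values = new long[capacity];
  }
}

final class VectorFiller {
  // Copies `size` rows starting at `offset` into `vector`, beginning at `vectorOffset`.
  // rowMapping == null: read rows in order; otherwise rowMapping[i] is the page row to read.
  static void fill(long[] page, BitSet pageNulls, int[] rowMapping, int offset, int size,
      LongVector vector, int vectorOffset) {
    int maxRowId = offset + size;
    for (int rowId = offset; rowId < maxRowId; rowId++) {
      int currentRowId = (rowMapping == null) ? rowId : rowMapping[rowId];
      if (pageNulls.get(currentRowId)) {
        vector.nulls.set(vectorOffset++); // null row: mark it, store no value
      } else {
        vector.values[vectorOffset++] = page[currentRowId];
      }
    }
  }
}
```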
Github user gvramana commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r199481192
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java ---
@@ -30,32 +40,72 @@ public ColumnPageWrapper(ColumnPage columnPage) {
     this.columnPage = columnPage;
   }
+  public ColumnPage getColumnPage() {
+    return columnPage;
+  }
+
   @Override public int fillRawData(int rowId, int offset, byte[] data, KeyStructureInfo restructuringInfo) {
-    throw new UnsupportedOperationException("internal error");
+    throw new UnsupportedOperationException(
+        "internal error: should be called for only dictionary columns");
   }
   @Override public int fillSurrogateKey(int rowId, int chunkIndex, int[] outputSurrogateKey, KeyStructureInfo restructuringInfo) {
-    throw new UnsupportedOperationException("internal error");
+    throw new UnsupportedOperationException(
+        "internal error: should be called for only dictionary columns");
   }
   @Override public int fillVector(ColumnVectorInfo[] vectorInfo, int chunkIndex, KeyStructureInfo restructuringInfo) {
-    throw new UnsupportedOperationException("internal error");
+    // fill the vector with data in column page
+    ColumnVectorInfo columnVectorInfo = vectorInfo[chunkIndex];
+    CarbonColumnVector vector = columnVectorInfo.vector;
+    fillData(null, columnVectorInfo, vector);
+    return chunkIndex + 1;
   }
+
   @Override public int fillVector(int[] filteredRowId, ColumnVectorInfo[] vectorInfo, int chunkIndex, KeyStructureInfo restructuringInfo) {
-    throw new UnsupportedOperationException("internal error");
+    ColumnVectorInfo columnVectorInfo = vectorInfo[chunkIndex];
+    CarbonColumnVector vector = columnVectorInfo.vector;
+    fillData(filteredRowId, columnVectorInfo, vector);
+    return chunkIndex + 1;
   }
-  @Override
-  public byte[] getChunkData(int rowId) {
-    return columnPage.getBytes(rowId);
+  @Override public byte[] getChunkData(int rowId) {
+    ColumnType columnType = columnPage.getColumnSpec().getColumnType();
+    // TODO: No need to convert to Byte array, handle like measure
+    // But interface currently doesn't support, need to add new interface.
+    if (columnType == ColumnType.PLAIN_VALUE) {
+      if (columnPage.getNullBits().get(rowId)) {
+        // if this row is null, return default null represent in byte array
+        return CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY;
+      }
+      if (columnPage.getDataType() == DataTypes.BYTE) {
+        byte byteData = columnPage.getByte(rowId);
+        return ByteUtil.toBytes(byteData);
+      } else if (columnPage.getDataType() == DataTypes.SHORT) {
--- End diff --
Use a switch instead of if/else.
---
Github user sounakr commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r199081933
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java ---
@@ -405,13 +446,30 @@ public void putData(int rowId, Object value) {
     } else if (dataType == DataTypes.STRING || dataType == DataTypes.BYTE_ARRAY || dataType == DataTypes.VARCHAR) {
-      putBytes(rowId, (byte[]) value);
-      statsCollector.update((byte[]) value);
+      byte[] valueWithLength;
+      if (columnSpec.getColumnType() != ColumnType.PLAIN_VALUE) {
+        // This case is for GLOBAL_DICTIONARY and DIRECT_DICTIONARY. In this
+        // scenario the dataType is BYTE_ARRAY and passed bytearray should
+        // be saved.
+        putBytes(rowId, (byte[]) value);
+        statsCollector.update((byte[]) value);
+      } else {
+        if (dataType == DataTypes.VARCHAR) {
+          // Add length and then add the data.
+          valueWithLength = addIntLengthToByteArray((byte[]) value);
+        } else {
+          valueWithLength = addShortLengthToByteArray((byte[]) value);
+        }
+        putBytes(rowId, valueWithLength);
+        statsCollector.update((byte[]) valueWithLength);
+      }
     } else {
       throw new RuntimeException("unsupported data type: " + dataType);
     }
   }
+
--- End diff --
Done
---
Github user sounakr commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r199081333
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java ---
@@ -58,16 +67,36 @@ public static EncodingFactory getInstance() {
   @Override public ColumnPageEncoder createEncoder(TableSpec.ColumnSpec columnSpec, ColumnPage inputPage) {
     // TODO: add log
+    ColumnPageEncoder pageEncoder = null;
     if (columnSpec instanceof TableSpec.MeasureSpec) {
       return createEncoderForMeasure(inputPage);
-    } else {
-      if (newWay) {
-        return createEncoderForDimension((TableSpec.DimensionSpec) columnSpec, inputPage);
-      } else {
-        assert columnSpec instanceof TableSpec.DimensionSpec;
+    } else if (columnSpec instanceof TableSpec.DimensionSpec) {
+      pageEncoder = createCodecForDimension((TableSpec.DimensionSpec) columnSpec, inputPage);
+      if (pageEncoder == null) {
         return createEncoderForDimensionLegacy((TableSpec.DimensionSpec) columnSpec);
       }
     }
+    return pageEncoder;
+  }
+
+  private ColumnPageEncoder createCodecForDimension(TableSpec.DimensionSpec columnSpec,
+      ColumnPage inputPage) {
+    switch (columnSpec.getColumnType()) {
+      case PLAIN_VALUE:
+        if ((inputPage.getDataType() == DataTypes.BYTE) || (inputPage.getDataType()
+            == DataTypes.SHORT) || (inputPage.getDataType() == DataTypes.INT) || (
+            inputPage.getDataType() == DataTypes.LONG)) {
+          return selectCodecByAlgorithmForIntegral(inputPage.getStatistics()).createEncoder(null);
+        } else if ((inputPage.getDataType() == DataTypes.FLOAT) || (inputPage.getDataType()
+            == DataTypes.DOUBLE)) {
+          return selectCodecByAlgorithmForFloating(inputPage.getStatistics()).createEncoder(null);
+        } else if (inputPage.getDataType() == DataTypes.STRING) {
+          // TODO. Currently let string go through legacy encoding. Later will change the encoding.
+          return null;
+        }
+        break;
+    }
+    return null;
   }
   private ColumnPageEncoder createEncoderForDimension(TableSpec.DimensionSpec columnSpec,
--- End diff --
Done
---
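The control flow in `createEncoder` (try an adaptive codec, fall back to the legacy dimension encoder when none applies) can also be written with `Optional` instead of a `null` sentinel. A sketch under assumptions: the two enums and the string results are hypothetical placeholders for CarbonData's column types, data types and `ColumnPageEncoder` instances.

```java
import java.util.Optional;

// Hypothetical placeholders for CarbonData's ColumnType and DataType.
enum DimColumnType { PLAIN_VALUE, GLOBAL_DICTIONARY, DIRECT_DICTIONARY }

enum DimDataType { BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, STRING }

final class EncoderChooser {
  // Returns an adaptive codec name when one applies, or empty to request the legacy path.
  static Optional<String> adaptiveCodecFor(DimColumnType columnType, DimDataType dataType) {
    if (columnType != DimColumnType.PLAIN_VALUE) {
      return Optional.empty();
    }
    switch (dataType) {
      case BYTE:
      case SHORT:
      case INT:
      case LONG:
        return Optional.of("adaptive-integral");
      case FLOAT:
      case DOUBLE:
        return Optional.of("adaptive-floating");
      default:
        // e.g. STRING: keep the legacy encoding for now, as the diff's TODO says
        return Optional.empty();
    }
  }

  static String createEncoder(DimColumnType columnType, DimDataType dataType) {
    return adaptiveCodecFor(columnType, dataType).orElse("legacy-dimension");
  }
}
```

The empty `Optional` makes "no adaptive codec" explicit in the signature, so a caller cannot forget the legacy fallback.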
Github user sounakr commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r199081394
--- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -436,10 +436,11 @@ public static boolean isFixedSizeDataType(DataType dataType) {
    *
    * @param dataInBytesdata
    * @param actualDataType actual data type
+   * @param isTimeStampConversion
    * @return actual data after conversion
    */
   public static Object getDataBasedOnDataTypeForNoDictionaryColumn(byte[] dataInBytes,
-      DataType actualDataType) {
+      DataType actualDataType, boolean isTimeStampConversion) {
--- End diff --
Done
---
Github user sounakr commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r199081355
--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java ---
@@ -354,6 +341,18 @@ public EncodedTablePage getEncodedTablePage() {
             .getColumnType());
       }
     }
+    //for (int i = 0; i < dimensionPages.length; i++) {
--- End diff --
Done
---
Github user sounakr commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r199081296
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java ---
@@ -38,6 +38,15 @@ import org.apache.carbondata.core.metadata.datatype.DataTypes;
 import org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
+import static org.apache.carbondata.core.metadata.datatype.DataTypes.BOOLEAN;
--- End diff --
Done
---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198877123
--- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -436,10 +436,11 @@ public static boolean isFixedSizeDataType(DataType dataType) {
    *
    * @param dataInBytesdata
    * @param actualDataType actual data type
+   * @param isTimeStampConversion
    * @return actual data after conversion
    */
   public static Object getDataBasedOnDataTypeForNoDictionaryColumn(byte[] dataInBytes,
-      DataType actualDataType) {
+      DataType actualDataType, boolean isTimeStampConversion) {
--- End diff --
Add one more method (an overload) that takes `isTimeStampConversion`, instead of changing this signature.
---
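Keeping the existing two-argument signature and adding an overload avoids touching every existing caller. A sketch with hypothetical names: the type is passed as a string, the timestamp handling (scaling the stored value by 1000 when the flag is set) is only a placeholder because the diff does not show what `isTimeStampConversion` actually does, and the default that the old overload passes is likewise an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class NoDictionaryDecoder {
  // Existing signature kept for current callers; delegates with an assumed default.
  static Object getData(byte[] dataInBytes, String actualDataType) {
    return getData(dataInBytes, actualDataType, true);
  }

  // New overload that exposes the flag.
  static Object getData(byte[] dataInBytes, String actualDataType,
      boolean isTimeStampConversion) {
    switch (actualDataType) {
      case "INT":
        return ByteBuffer.wrap(dataInBytes).getInt();
      case "TIMESTAMP": {
        long stored = ByteBuffer.wrap(dataInBytes).getLong();
        // Placeholder conversion: scale only when the caller asks for it.
        return isTimeStampConversion ? stored * 1000L : stored;
      }
      default:
        return new String(dataInBytes, StandardCharsets.UTF_8);
    }
  }
}
```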
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198877038
--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java ---
@@ -354,6 +341,18 @@ public EncodedTablePage getEncodedTablePage() {
             .getColumnType());
       }
     }
+    //for (int i = 0; i < dimensionPages.length; i++) {
--- End diff --
Remove the commented-out code.
---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198874999
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java ---
@@ -58,16 +67,36 @@ public static EncodingFactory getInstance() {
   @Override public ColumnPageEncoder createEncoder(TableSpec.ColumnSpec columnSpec, ColumnPage inputPage) {
     // TODO: add log
+    ColumnPageEncoder pageEncoder = null;
     if (columnSpec instanceof TableSpec.MeasureSpec) {
       return createEncoderForMeasure(inputPage);
-    } else {
-      if (newWay) {
-        return createEncoderForDimension((TableSpec.DimensionSpec) columnSpec, inputPage);
-      } else {
-        assert columnSpec instanceof TableSpec.DimensionSpec;
+    } else if (columnSpec instanceof TableSpec.DimensionSpec) {
+      pageEncoder = createCodecForDimension((TableSpec.DimensionSpec) columnSpec, inputPage);
+      if (pageEncoder == null) {
         return createEncoderForDimensionLegacy((TableSpec.DimensionSpec) columnSpec);
       }
     }
+    return pageEncoder;
+  }
+
+  private ColumnPageEncoder createCodecForDimension(TableSpec.DimensionSpec columnSpec,
+      ColumnPage inputPage) {
+    switch (columnSpec.getColumnType()) {
+      case PLAIN_VALUE:
+        if ((inputPage.getDataType() == DataTypes.BYTE) || (inputPage.getDataType()
+            == DataTypes.SHORT) || (inputPage.getDataType() == DataTypes.INT) || (
+            inputPage.getDataType() == DataTypes.LONG)) {
+          return selectCodecByAlgorithmForIntegral(inputPage.getStatistics()).createEncoder(null);
+        } else if ((inputPage.getDataType() == DataTypes.FLOAT) || (inputPage.getDataType()
+            == DataTypes.DOUBLE)) {
+          return selectCodecByAlgorithmForFloating(inputPage.getStatistics()).createEncoder(null);
+        } else if (inputPage.getDataType() == DataTypes.STRING) {
+          // TODO. Currently let string go through legacy encoding. Later will change the encoding.
+          return null;
+        }
+        break;
+    }
+    return null;
   }
   private ColumnPageEncoder createEncoderForDimension(TableSpec.DimensionSpec columnSpec,
--- End diff --
Remove it; nobody uses it.
---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198874161
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java ---
@@ -38,6 +38,15 @@ import org.apache.carbondata.core.metadata.datatype.DataTypes;
 import org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
+import static org.apache.carbondata.core.metadata.datatype.DataTypes.BOOLEAN;
--- End diff --
Just add a single `import static org.apache.carbondata.core.metadata.datatype.DataTypes.*` instead.
---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198873521
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/LazyColumnPage.java ---
@@ -283,16 +283,16 @@ public byte getByte(int rowId) {
   @Override public short getShort(int rowId) {
--- End diff --
Check for float as well.
---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198872182
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java ---
@@ -405,13 +446,30 @@ public void putData(int rowId, Object value) {
     } else if (dataType == DataTypes.STRING || dataType == DataTypes.BYTE_ARRAY || dataType == DataTypes.VARCHAR) {
-      putBytes(rowId, (byte[]) value);
-      statsCollector.update((byte[]) value);
+      byte[] valueWithLength;
+      if (columnSpec.getColumnType() != ColumnType.PLAIN_VALUE) {
+        // This case is for GLOBAL_DICTIONARY and DIRECT_DICTIONARY. In this
+        // scenario the dataType is BYTE_ARRAY and passed bytearray should
+        // be saved.
+        putBytes(rowId, (byte[]) value);
+        statsCollector.update((byte[]) value);
+      } else {
+        if (dataType == DataTypes.VARCHAR) {
+          // Add length and then add the data.
+          valueWithLength = addIntLengthToByteArray((byte[]) value);
+        } else {
+          valueWithLength = addShortLengthToByteArray((byte[]) value);
+        }
+        putBytes(rowId, valueWithLength);
+        statsCollector.update((byte[]) valueWithLength);
+      }
     } else {
       throw new RuntimeException("unsupported data type: " + dataType);
     }
   }
+
--- End diff --
Remove the unnecessary blank lines.
---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198872075
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java ---
@@ -405,13 +446,30 @@ public void putData(int rowId, Object value) {
     } else if (dataType == DataTypes.STRING || dataType == DataTypes.BYTE_ARRAY || dataType == DataTypes.VARCHAR) {
-      putBytes(rowId, (byte[]) value);
-      statsCollector.update((byte[]) value);
+      byte[] valueWithLength;
+      if (columnSpec.getColumnType() != ColumnType.PLAIN_VALUE) {
+        // This case is for GLOBAL_DICTIONARY and DIRECT_DICTIONARY. In this
+        // scenario the dataType is BYTE_ARRAY and passed bytearray should
+        // be saved.
+        putBytes(rowId, (byte[]) value);
+        statsCollector.update((byte[]) value);
+      } else {
+        if (dataType == DataTypes.VARCHAR) {
+          // Add length and then add the data.
+          valueWithLength = addIntLengthToByteArray((byte[]) value);
+        } else {
+          valueWithLength = addShortLengthToByteArray((byte[]) value);
+        }
+        putBytes(rowId, valueWithLength);
+        statsCollector.update((byte[]) valueWithLength);
--- End diff --
Move down
---
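One reading of "Move down" is that `putBytes` and `statsCollector.update` are duplicated in both branches and can be called once after the `if/else`, with each branch only choosing which bytes to store. A sketch under that assumption: `PutDataSketch` is hypothetical, a list stands in for `putBytes`, a byte counter stands in for `statsCollector`, and the length headers are assumed to be big-endian (the diff does not show how `addShortLengthToByteArray` and `addIntLengthToByteArray` lay them out).

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical page: a list stands in for putBytes, a byte counter for statsCollector.
final class PutDataSketch {
  final List<byte[]> stored = new ArrayList<>();
  long statsBytes = 0;

  void putData(byte[] value, boolean isPlainValue, boolean isVarchar) {
    byte[] toStore;
    if (!isPlainValue) {
      toStore = value; // dictionary-encoded bytes are stored unchanged
    } else if (isVarchar) {
      toStore = withLength(value, Integer.BYTES);
    } else {
      toStore = withLength(value, Short.BYTES);
    }
    // Shared tail, written once after the branches instead of inside each of them.
    stored.add(toStore);
    statsBytes += toStore.length;
  }

  // Prefixes `value` with its length as a 2- or 4-byte big-endian header.
  private static byte[] withLength(byte[] value, int headerBytes) {
    ByteBuffer buffer = ByteBuffer.allocate(headerBytes + value.length);
    if (headerBytes == Integer.BYTES) {
      buffer.putInt(value.length);
    } else {
      buffer.putShort((short) value.length);
    }
    return buffer.put(value).array();
  }
}
```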
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198868873
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java ---
@@ -77,14 +134,115 @@ public boolean isExplicitSorted() {
     return false;
   }
-  @Override
-  public int compareTo(int rowId, byte[] compareValue) {
-    throw new UnsupportedOperationException("internal error");
+  @Override public int compareTo(int rowId, byte[] compareValue) {
+    if (columnPage.getColumnSpec().getColumnType() == ColumnType.DIRECT_DICTIONARY) {
+      int surrogate = columnPage.getInt(rowId);
+      int input = ByteBuffer.wrap(compareValue).getInt();
+      return surrogate - input;
+    } else {
+      byte[] data;
+      if (columnPage.getDataType() == DataTypes.INT) {
--- End diff --
First convert `compareValue` to the respective data type, and then compare it with the actual value.
---
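A sketch of the suggested direction: decode `compareValue` into the column's type and compare values, rather than encoding the row into bytes. The difference matters for signed numbers: under an unsigned lexicographic byte comparison (assumed here to be what a byte-array comparator such as `ByteUtil.UnsafeComparer` does), the big-endian bytes of -1 sort after those of 1. `TypedCompare` is hypothetical; using `Integer.compare` also avoids the overflow that `surrogate - input` can hit.

```java
import java.nio.ByteBuffer;

final class TypedCompare {
  // Decodes the 4-byte big-endian filter value once and compares numerically.
  static int compareInt(int rowValue, byte[] compareValue) {
    int input = ByteBuffer.wrap(compareValue).getInt();
    // Integer.compare cannot overflow, unlike `rowValue - input`.
    return Integer.compare(rowValue, input);
  }

  // Unsigned lexicographic byte comparison, shown for contrast.
  static int compareBytesUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  static byte[] toBytes(int value) {
    return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
  }
}
```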
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198867991
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java ---
@@ -77,14 +134,115 @@ public boolean isExplicitSorted() {
     return false;
   }
-  @Override
-  public int compareTo(int rowId, byte[] compareValue) {
-    throw new UnsupportedOperationException("internal error");
+  @Override public int compareTo(int rowId, byte[] compareValue) {
+    if (columnPage.getColumnSpec().getColumnType() == ColumnType.DIRECT_DICTIONARY) {
--- End diff --
Remove the dictionary handling.
---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198866126
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java ---
@@ -17,45 +17,102 @@ package org.apache.carbondata.core.datastore.chunk.store;
+import java.nio.ByteBuffer;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.ColumnType;
+import org.apache.carbondata.core.datastore.TableSpec;
 import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage;
 import org.apache.carbondata.core.datastore.page.ColumnPage;
+import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
 import org.apache.carbondata.core.scan.executor.infos.KeyStructureInfo;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
 import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
+import org.apache.carbondata.core.util.ByteUtil;
 public class ColumnPageWrapper implements DimensionColumnPage {
   private ColumnPage columnPage;
+  private TableSpec.ColumnSpec columnSpec;
+
+  private int columnValueSize = 0;
+
   public ColumnPageWrapper(ColumnPage columnPage) {
     this.columnPage = columnPage;
+    this.columnSpec = columnPage.getColumnSpec();
   }
   @Override public int fillRawData(int rowId, int offset, byte[] data, KeyStructureInfo restructuringInfo) {
-    throw new UnsupportedOperationException("internal error");
+    // TODO verify the implementation. Mostly this is for dictionary.
+    int surrogate = columnPage.getInt(rowId);
+    ByteBuffer buffer = ByteBuffer.wrap(data);
+    buffer.putInt(offset, surrogate);
+    return columnValueSize;
   }
   @Override public int fillSurrogateKey(int rowId, int chunkIndex, int[] outputSurrogateKey, KeyStructureInfo restructuringInfo) {
-    throw new UnsupportedOperationException("internal error");
+    outputSurrogateKey[chunkIndex] = columnPage.getInt(rowId);
+    return chunkIndex + 1;
   }
   @Override public int fillVector(ColumnVectorInfo[] vectorInfo, int chunkIndex, KeyStructureInfo restructuringInfo) {
-    throw new UnsupportedOperationException("internal error");
+    // fill the vector with data in column page
+    ColumnVectorInfo columnVectorInfo = vectorInfo[chunkIndex];
+    CarbonColumnVector vector = columnVectorInfo.vector;
+    fillData(null, columnVectorInfo, vector);
+    return chunkIndex + 1;
   }
+
   @Override public int fillVector(int[] filteredRowId, ColumnVectorInfo[] vectorInfo, int chunkIndex, KeyStructureInfo restructuringInfo) {
-    throw new UnsupportedOperationException("internal error");
+    ColumnVectorInfo columnVectorInfo = vectorInfo[chunkIndex];
+    CarbonColumnVector vector = columnVectorInfo.vector;
+    fillData(filteredRowId, columnVectorInfo, vector);
+    return chunkIndex + 1;
   }
-  @Override
-  public byte[] getChunkData(int rowId) {
-    return columnPage.getBytes(rowId);
+  @Override public byte[] getChunkData(int rowId) {
+    ColumnType columnType = columnPage.getColumnSpec().getColumnType();
+    if (columnType == ColumnType.DIRECT_DICTIONARY) {
--- End diff --
Remove the dictionary and direct dictionary handling.
---
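For reference, the `fillRawData` body above relies on `ByteBuffer.wrap` sharing the caller's array and on the absolute `putInt(index, value)`, which writes four big-endian bytes at `offset` without moving the buffer position. A hypothetical sketch of that step; returning `Integer.BYTES` reflects the assumption that callers expect the number of bytes written, whereas the diff returns `columnValueSize`, which is initialised to 0 and not updated in the lines shown.

```java
import java.nio.ByteBuffer;

final class RawFill {
  // Writes a 4-byte surrogate key into `data` at `offset`; returns the bytes written.
  static int fillSurrogate(int surrogate, byte[] data, int offset) {
    // wrap() shares `data`, so the absolute put updates the caller's array in place.
    ByteBuffer.wrap(data).putInt(offset, surrogate);
    return Integer.BYTES;
  }
}
```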
[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198865395 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java --- @@ -77,14 +134,115 @@ public boolean isExplicitSorted() { return false; } - @Override - public int compareTo(int rowId, byte[] compareValue) { -throw new UnsupportedOperationException("internal error"); + @Override public int compareTo(int rowId, byte[] compareValue) { +if (columnPage.getColumnSpec().getColumnType() == ColumnType.DIRECT_DICTIONARY) { + int surrogate = columnPage.getInt(rowId); + int input = ByteBuffer.wrap(compareValue).getInt(); + return surrogate - input; +} else { + byte[] data; + if (columnPage.getDataType() == DataTypes.INT) { +data = ByteUtil.toBytes(columnPage.getInt(rowId)); + } else if (columnPage.getDataType() == DataTypes.STRING) { +data = columnPage.getBytes(rowId); + } else { +throw new RuntimeException("invalid data type for dimension: " + columnPage.getDataType()); + } + return ByteUtil.UnsafeComparer.INSTANCE + .compareTo(data, 0, data.length, compareValue, 0, compareValue.length); +} } @Override public void freeMemory() { } + private void fillData(int[] rowMapping, ColumnVectorInfo columnVectorInfo, + CarbonColumnVector vector) { +int offsetRowId = columnVectorInfo.offset; +int vectorOffset = columnVectorInfo.vectorOffset; +int maxRowId = offsetRowId + columnVectorInfo.size; +BitSet nullBitset = columnPage.getNullBits(); +switch (columnSpec.getColumnType()) { + case DIRECT_DICTIONARY: +DirectDictionaryGenerator generator = columnVectorInfo.directDictionaryGenerator; +assert (generator != null); +DataType dataType = generator.getReturnType(); +for (int rowId = offsetRowId; rowId < maxRowId; rowId++) { + int currentRowId = (rowMapping == null) ? 
rowId : rowMapping[rowId]; + if (nullBitset.get(currentRowId)) { +vector.putNull(vectorOffset++); + } else { +int surrogate = columnPage.getInt(currentRowId); +Object valueFromSurrogate = generator.getValueFromSurrogate(surrogate); +if (valueFromSurrogate == null) { + vector.putNull(vectorOffset++); +} else { + if (dataType == DataTypes.INT) { +vector.putInt(vectorOffset++, (int) valueFromSurrogate); + } else { +vector.putLong(vectorOffset++, (long) valueFromSurrogate); + } +} + } +} +break; + case GLOBAL_DICTIONARY: +for (int rowId = offsetRowId; rowId < maxRowId; rowId++) { + int currentRowId = (rowMapping == null) ? rowId : rowMapping[rowId]; + if (nullBitset.get(currentRowId)) { +vector.putNull(vectorOffset++); + } else { +int data = columnPage.getInt(currentRowId); +vector.putInt(vectorOffset++, data); + } +} +break; + case PLAIN_VALUE: +for (int rowId = offsetRowId; rowId < maxRowId; rowId++) { + int currentRowId = (rowMapping == null) ? rowId : rowMapping[rowId]; + if (nullBitset.get(currentRowId)) { +vector.putNull(vectorOffset++); + } else { +if (columnSpec.getSchemaDataType() == DataTypes.STRING) { + byte[] data = columnPage.getBytes(currentRowId); + if (isNullPlainValue(data)) { +vector.putNull(vectorOffset++); + } else { +vector.putBytes(vectorOffset++, 0, data.length, data); + } +} else if (columnSpec.getSchemaDataType() == DataTypes.BOOLEAN) { + boolean data = columnPage.getBoolean(currentRowId); + vector.putBoolean(vectorOffset++, (boolean) data); +} else if (columnSpec.getSchemaDataType() == DataTypes.INT) { + // TODO have to check for other dataTypes. Only INT Specified Now. + int data = columnPage.getInt(currentRowId); + vector.putInt(vectorOffset++, (int) data); +} else if (columnSpec.getSchemaDataType() == DataTypes.LONG) { + long data = columnPage.getLong(currentRowId); + vector.putLong(vectorOffset++, (long) data); +} else if (columnSpec.getSchemaDataType() == DataTypes.TIMESTAMP) { + long
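For non-dictionary values, the `compareTo` in this diff converts the row to bytes and delegates to `ByteUtil.UnsafeComparer.INSTANCE.compareTo(data, 0, data.length, compareValue, 0, compareValue.length)`. A portable sketch of that kind of comparison follows. It assumes the intended ordering is lexicographic over unsigned bytes, with the shorter array first when one is a prefix of the other. This is an assumption; check `ByteUtil` for the exact contract. `ByteCompareSketch.compareUnsigned` is a hypothetical helper.

```java
// Hypothetical helper: lexicographic comparison treating each byte as unsigned (0..255).
final class ByteCompareSketch {
  private ByteCompareSketch() {}

  static int compareUnsigned(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      // Mask to 0..255 so that (byte) 0x80 sorts after (byte) 0x7F.
      int x = a[aOff + i] & 0xFF;
      int y = b[bOff + i] & 0xFF;
      if (x != y) {
        return x - y; // range -255..255, cannot overflow
      }
    }
    // Equal prefix: the shorter array sorts first.
    return aLen - bLen;
  }
}
```

Masking with `0xFF` matters because Java's `byte` is signed. Without it, `0x80` would compare as -128 and sort before `0x7F`.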
[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198864334 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java --- @@ -77,14 +134,115 @@ public boolean isExplicitSorted() { return false; } - @Override - public int compareTo(int rowId, byte[] compareValue) { -throw new UnsupportedOperationException("internal error"); + @Override public int compareTo(int rowId, byte[] compareValue) { +if (columnPage.getColumnSpec().getColumnType() == ColumnType.DIRECT_DICTIONARY) { + int surrogate = columnPage.getInt(rowId); + int input = ByteBuffer.wrap(compareValue).getInt(); + return surrogate - input; +} else { + byte[] data; + if (columnPage.getDataType() == DataTypes.INT) { +data = ByteUtil.toBytes(columnPage.getInt(rowId)); + } else if (columnPage.getDataType() == DataTypes.STRING) { +data = columnPage.getBytes(rowId); + } else { +throw new RuntimeException("invalid data type for dimension: " + columnPage.getDataType()); + } + return ByteUtil.UnsafeComparer.INSTANCE + .compareTo(data, 0, data.length, compareValue, 0, compareValue.length); +} } @Override public void freeMemory() { } + private void fillData(int[] rowMapping, ColumnVectorInfo columnVectorInfo, + CarbonColumnVector vector) { +int offsetRowId = columnVectorInfo.offset; +int vectorOffset = columnVectorInfo.vectorOffset; +int maxRowId = offsetRowId + columnVectorInfo.size; +BitSet nullBitset = columnPage.getNullBits(); +switch (columnSpec.getColumnType()) { + case DIRECT_DICTIONARY: +DirectDictionaryGenerator generator = columnVectorInfo.directDictionaryGenerator; +assert (generator != null); +DataType dataType = generator.getReturnType(); +for (int rowId = offsetRowId; rowId < maxRowId; rowId++) { + int currentRowId = (rowMapping == null) ? 
rowId : rowMapping[rowId]; + if (nullBitset.get(currentRowId)) { +vector.putNull(vectorOffset++); + } else { +int surrogate = columnPage.getInt(currentRowId); +Object valueFromSurrogate = generator.getValueFromSurrogate(surrogate); +if (valueFromSurrogate == null) { + vector.putNull(vectorOffset++); +} else { + if (dataType == DataTypes.INT) { +vector.putInt(vectorOffset++, (int) valueFromSurrogate); + } else { +vector.putLong(vectorOffset++, (long) valueFromSurrogate); + } +} + } +} +break; + case GLOBAL_DICTIONARY: +for (int rowId = offsetRowId; rowId < maxRowId; rowId++) { + int currentRowId = (rowMapping == null) ? rowId : rowMapping[rowId]; + if (nullBitset.get(currentRowId)) { +vector.putNull(vectorOffset++); + } else { +int data = columnPage.getInt(currentRowId); +vector.putInt(vectorOffset++, data); + } +} +break; + case PLAIN_VALUE: +for (int rowId = offsetRowId; rowId < maxRowId; rowId++) { + int currentRowId = (rowMapping == null) ? rowId : rowMapping[rowId]; + if (nullBitset.get(currentRowId)) { +vector.putNull(vectorOffset++); + } else { +if (columnSpec.getSchemaDataType() == DataTypes.STRING) { + byte[] data = columnPage.getBytes(currentRowId); + if (isNullPlainValue(data)) { +vector.putNull(vectorOffset++); + } else { +vector.putBytes(vectorOffset++, 0, data.length, data); + } +} else if (columnSpec.getSchemaDataType() == DataTypes.BOOLEAN) { + boolean data = columnPage.getBoolean(currentRowId); + vector.putBoolean(vectorOffset++, (boolean) data); +} else if (columnSpec.getSchemaDataType() == DataTypes.INT) { + // TODO have to check for other dataTypes. Only INT Specified Now. + int data = columnPage.getInt(currentRowId); + vector.putInt(vectorOffset++, (int) data); --- End diff -- remove all typecasts, not required ---
[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198864176 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java --- @@ -77,14 +134,115 @@ public boolean isExplicitSorted() { return false; } - @Override - public int compareTo(int rowId, byte[] compareValue) { -throw new UnsupportedOperationException("internal error"); + @Override public int compareTo(int rowId, byte[] compareValue) { +if (columnPage.getColumnSpec().getColumnType() == ColumnType.DIRECT_DICTIONARY) { + int surrogate = columnPage.getInt(rowId); + int input = ByteBuffer.wrap(compareValue).getInt(); + return surrogate - input; +} else { + byte[] data; + if (columnPage.getDataType() == DataTypes.INT) { +data = ByteUtil.toBytes(columnPage.getInt(rowId)); + } else if (columnPage.getDataType() == DataTypes.STRING) { +data = columnPage.getBytes(rowId); + } else { +throw new RuntimeException("invalid data type for dimension: " + columnPage.getDataType()); + } + return ByteUtil.UnsafeComparer.INSTANCE + .compareTo(data, 0, data.length, compareValue, 0, compareValue.length); +} } @Override public void freeMemory() { } + private void fillData(int[] rowMapping, ColumnVectorInfo columnVectorInfo, + CarbonColumnVector vector) { +int offsetRowId = columnVectorInfo.offset; +int vectorOffset = columnVectorInfo.vectorOffset; +int maxRowId = offsetRowId + columnVectorInfo.size; +BitSet nullBitset = columnPage.getNullBits(); +switch (columnSpec.getColumnType()) { + case DIRECT_DICTIONARY: +DirectDictionaryGenerator generator = columnVectorInfo.directDictionaryGenerator; +assert (generator != null); +DataType dataType = generator.getReturnType(); +for (int rowId = offsetRowId; rowId < maxRowId; rowId++) { + int currentRowId = (rowMapping == null) ? 
rowId : rowMapping[rowId]; + if (nullBitset.get(currentRowId)) { +vector.putNull(vectorOffset++); + } else { +int surrogate = columnPage.getInt(currentRowId); +Object valueFromSurrogate = generator.getValueFromSurrogate(surrogate); +if (valueFromSurrogate == null) { + vector.putNull(vectorOffset++); +} else { + if (dataType == DataTypes.INT) { +vector.putInt(vectorOffset++, (int) valueFromSurrogate); + } else { +vector.putLong(vectorOffset++, (long) valueFromSurrogate); + } +} + } +} +break; + case GLOBAL_DICTIONARY: +for (int rowId = offsetRowId; rowId < maxRowId; rowId++) { + int currentRowId = (rowMapping == null) ? rowId : rowMapping[rowId]; + if (nullBitset.get(currentRowId)) { +vector.putNull(vectorOffset++); + } else { +int data = columnPage.getInt(currentRowId); +vector.putInt(vectorOffset++, data); + } +} +break; + case PLAIN_VALUE: +for (int rowId = offsetRowId; rowId < maxRowId; rowId++) { + int currentRowId = (rowMapping == null) ? rowId : rowMapping[rowId]; + if (nullBitset.get(currentRowId)) { +vector.putNull(vectorOffset++); + } else { +if (columnSpec.getSchemaDataType() == DataTypes.STRING) { + byte[] data = columnPage.getBytes(currentRowId); + if (isNullPlainValue(data)) { +vector.putNull(vectorOffset++); + } else { +vector.putBytes(vectorOffset++, 0, data.length, data); + } +} else if (columnSpec.getSchemaDataType() == DataTypes.BOOLEAN) { + boolean data = columnPage.getBoolean(currentRowId); + vector.putBoolean(vectorOffset++, (boolean) data); +} else if (columnSpec.getSchemaDataType() == DataTypes.INT) { + // TODO have to check for other dataTypes. Only INT Specified Now. --- End diff -- remove it ---
[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198864088 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java --- @@ -77,14 +134,115 @@ public boolean isExplicitSorted() { return false; } - @Override - public int compareTo(int rowId, byte[] compareValue) { -throw new UnsupportedOperationException("internal error"); + @Override public int compareTo(int rowId, byte[] compareValue) { +if (columnPage.getColumnSpec().getColumnType() == ColumnType.DIRECT_DICTIONARY) { + int surrogate = columnPage.getInt(rowId); + int input = ByteBuffer.wrap(compareValue).getInt(); + return surrogate - input; +} else { + byte[] data; + if (columnPage.getDataType() == DataTypes.INT) { +data = ByteUtil.toBytes(columnPage.getInt(rowId)); + } else if (columnPage.getDataType() == DataTypes.STRING) { +data = columnPage.getBytes(rowId); + } else { +throw new RuntimeException("invalid data type for dimension: " + columnPage.getDataType()); + } + return ByteUtil.UnsafeComparer.INSTANCE + .compareTo(data, 0, data.length, compareValue, 0, compareValue.length); +} } @Override public void freeMemory() { } + private void fillData(int[] rowMapping, ColumnVectorInfo columnVectorInfo, + CarbonColumnVector vector) { +int offsetRowId = columnVectorInfo.offset; +int vectorOffset = columnVectorInfo.vectorOffset; +int maxRowId = offsetRowId + columnVectorInfo.size; +BitSet nullBitset = columnPage.getNullBits(); +switch (columnSpec.getColumnType()) { + case DIRECT_DICTIONARY: --- End diff -- No need to handle `DIRECT_DICTIONARY` and `GLOBAL_DICTIONARY` ---
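Without the `DIRECT_DICTIONARY` and `GLOBAL_DICTIONARY` cases, `fillData` reduces to the `PLAIN_VALUE` loop. Following the other comments on this hunk, it also needs neither the TODO nor the redundant casts. Below is a hedged, self-contained sketch of that loop for INT values only. `IntVectorSketch` is a hypothetical stand-in for `CarbonColumnVector`, and `fillPlainInts` is a hypothetical name. The row-mapping and null-bitset logic mirrors the diff.

```java
import java.util.BitSet;

// Hypothetical minimal vector: records ints and nulls by position.
class IntVectorSketch {
  final int[] values;
  final BitSet nulls = new BitSet();

  IntVectorSketch(int size) {
    values = new int[size];
  }

  void putInt(int pos, int v) {
    values[pos] = v;
  }

  void putNull(int pos) {
    nulls.set(pos);
  }
}

final class FillDataSketch {
  private FillDataSketch() {}

  // Copies rows [offset, offset + size) into the vector starting at vectorOffset.
  // When rowMapping is non-null (filtered scan), rowMapping[rowId] is the actual page row.
  static void fillPlainInts(int[] page, BitSet nullBits, int[] rowMapping,
      int offset, int size, IntVectorSketch vector, int vectorOffset) {
    int maxRowId = offset + size;
    for (int rowId = offset; rowId < maxRowId; rowId++) {
      int currentRowId = (rowMapping == null) ? rowId : rowMapping[rowId];
      if (nullBits.get(currentRowId)) {
        vector.putNull(vectorOffset++);
      } else {
        // page[currentRowId] is already an int, so no cast is needed.
        vector.putInt(vectorOffset++, page[currentRowId]);
      }
    }
  }
}
```

Passing `null` as `rowMapping` corresponds to the unfiltered `fillVector` overload in the diff, and a non-null array corresponds to the `filteredRowId` overload.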
[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198861716 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java --- @@ -17,45 +17,102 @@ package org.apache.carbondata.core.datastore.chunk.store; +import java.nio.ByteBuffer; +import java.util.BitSet; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.ColumnType; +import org.apache.carbondata.core.datastore.TableSpec; import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage; import org.apache.carbondata.core.datastore.page.ColumnPage; +import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; import org.apache.carbondata.core.scan.executor.infos.KeyStructureInfo; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo; +import org.apache.carbondata.core.util.ByteUtil; public class ColumnPageWrapper implements DimensionColumnPage { private ColumnPage columnPage; + private TableSpec.ColumnSpec columnSpec; + private int columnValueSize = 0; + public ColumnPageWrapper(ColumnPage columnPage) { this.columnPage = columnPage; +this.columnSpec = columnPage.getColumnSpec(); } @Override public int fillRawData(int rowId, int offset, byte[] data, KeyStructureInfo restructuringInfo) { -throw new UnsupportedOperationException("internal error"); +// TODO verify the implementation. Mostly this is for dictionary. 
+int surrogate = columnPage.getInt(rowId); +ByteBuffer buffer = ByteBuffer.wrap(data); +buffer.putInt(offset, surrogate); +return columnValueSize; } @Override public int fillSurrogateKey(int rowId, int chunkIndex, int[] outputSurrogateKey, KeyStructureInfo restructuringInfo) { -throw new UnsupportedOperationException("internal error"); +outputSurrogateKey[chunkIndex] = columnPage.getInt(rowId); --- End diff -- not required, remove ---
[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198861844 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java --- @@ -17,45 +17,102 @@ package org.apache.carbondata.core.datastore.chunk.store; +import java.nio.ByteBuffer; +import java.util.BitSet; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.ColumnType; +import org.apache.carbondata.core.datastore.TableSpec; import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage; import org.apache.carbondata.core.datastore.page.ColumnPage; +import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; import org.apache.carbondata.core.scan.executor.infos.KeyStructureInfo; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo; +import org.apache.carbondata.core.util.ByteUtil; public class ColumnPageWrapper implements DimensionColumnPage { private ColumnPage columnPage; + private TableSpec.ColumnSpec columnSpec; + private int columnValueSize = 0; + public ColumnPageWrapper(ColumnPage columnPage) { this.columnPage = columnPage; +this.columnSpec = columnPage.getColumnSpec(); } @Override public int fillRawData(int rowId, int offset, byte[] data, KeyStructureInfo restructuringInfo) { -throw new UnsupportedOperationException("internal error"); +// TODO verify the implementation. Mostly this is for dictionary. 
+int surrogate = columnPage.getInt(rowId); +ByteBuffer buffer = ByteBuffer.wrap(data); +buffer.putInt(offset, surrogate); +return columnValueSize; } @Override public int fillSurrogateKey(int rowId, int chunkIndex, int[] outputSurrogateKey, KeyStructureInfo restructuringInfo) { -throw new UnsupportedOperationException("internal error"); +outputSurrogateKey[chunkIndex] = columnPage.getInt(rowId); +return chunkIndex + 1; } @Override public int fillVector(ColumnVectorInfo[] vectorInfo, int chunkIndex, KeyStructureInfo restructuringInfo) { -throw new UnsupportedOperationException("internal error"); +// fill the vector with data in column page --- End diff -- no need for this comment ---
[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198861537 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java --- @@ -17,45 +17,102 @@ package org.apache.carbondata.core.datastore.chunk.store; +import java.nio.ByteBuffer; +import java.util.BitSet; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.ColumnType; +import org.apache.carbondata.core.datastore.TableSpec; import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage; import org.apache.carbondata.core.datastore.page.ColumnPage; +import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; import org.apache.carbondata.core.scan.executor.infos.KeyStructureInfo; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo; +import org.apache.carbondata.core.util.ByteUtil; public class ColumnPageWrapper implements DimensionColumnPage { private ColumnPage columnPage; + private TableSpec.ColumnSpec columnSpec; + private int columnValueSize = 0; + public ColumnPageWrapper(ColumnPage columnPage) { this.columnPage = columnPage; +this.columnSpec = columnPage.getColumnSpec(); } @Override public int fillRawData(int rowId, int offset, byte[] data, KeyStructureInfo restructuringInfo) { -throw new UnsupportedOperationException("internal error"); +// TODO verify the implementation. Mostly this is for dictionary. +int surrogate = columnPage.getInt(rowId); +ByteBuffer buffer = ByteBuffer.wrap(data); +buffer.putInt(offset, surrogate); +return columnValueSize; --- End diff -- No implementation required here as it is used only for dictionary. 
Dictionary columns will go to FixedChunReader, so just `throw new UnsupportedOperationException` here ---
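Following this suggestion (and the earlier "not required, remove" on `fillSurrogateKey`), the dictionary-only entry points can keep the original `throw`. Below is a minimal sketch with simplified signatures. The `KeyStructureInfo` parameter is dropped, and `NoDictionaryWrapperSketch` is a hypothetical name. The `"internal error"` message is the one the original code used.

```java
// Hypothetical no-dictionary page wrapper: dictionary-only entry points are rejected,
// on the premise that dictionary columns are served by a different reader.
class NoDictionaryWrapperSketch {
  int fillRawData(int rowId, int offset, byte[] data) {
    throw new UnsupportedOperationException("internal error");
  }

  int fillSurrogateKey(int rowId, int chunkIndex, int[] outputSurrogateKey) {
    throw new UnsupportedOperationException("internal error");
  }
}
```

Failing fast here makes an unexpected dictionary call on this wrapper surface immediately. A placeholder such as `return columnValueSize;` (which is `0` in the diff) would silently hand back wrong data.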
[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2417#discussion_r198857878 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java --- @@ -221,6 +221,9 @@ private boolean isEncodedWithMeta(DataChunk2 pageMetadata) { if (encodings != null && encodings.size() == 1) { Encoding encoding = encodings.get(0); switch (encoding) { +case ADAPTIVE_INTEGRAL: +case ADAPTIVE_DELTA_INTEGRAL: +case ADAPTIVE_FLOATING: --- End diff -- Doesn't `ADAPTIVE_DELTA_FLOATING` need to be added here as well? ---
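Grouped `case` labels in a Java `switch` fall through to a shared body, so covering `ADAPTIVE_DELTA_FLOATING` is just one more label. The sketch below uses a hypothetical local enum (`EncodingSketch`, where `OTHER` stands in for any encoding not listed). The real `isEncodedWithMeta` switches over the page's `Encoding` value, and its remaining cases are not shown in the diff.

```java
// Hypothetical local enum mirroring the adaptive encoding names from the diff.
enum EncodingSketch {
  ADAPTIVE_INTEGRAL, ADAPTIVE_DELTA_INTEGRAL, ADAPTIVE_FLOATING, ADAPTIVE_DELTA_FLOATING, OTHER
}

final class EncodedWithMetaSketch {
  private EncodedWithMetaSketch() {}

  // All four adaptive encodings share one branch via fall-through.
  static boolean isEncodedWithMeta(EncodingSketch encoding) {
    switch (encoding) {
      case ADAPTIVE_INTEGRAL:
      case ADAPTIVE_DELTA_INTEGRAL:
      case ADAPTIVE_FLOATING:
      case ADAPTIVE_DELTA_FLOATING:
        return true;
      default:
        return false;
    }
  }
}
```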
[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...
GitHub user sounakr opened a pull request: https://github.com/apache/carbondata/pull/2417

[WIP][Complex Column Enhancements]Primitive DataType Adaptive Encoding

Be sure to complete all of the following checklist items to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  Please provide details on:
  - Whether new unit test cases have been added, or why no new tests are required?
  - How is it tested? Please attach the test report.
  - Is it a performance-related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sounakr/incubator-carbondata primitive_adaptive

Alternatively, you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2417.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2417

commit 6d8a958726af35103d3a64d081e28fc2c423c990
Author: sounakr
Date: 2018-06-26T11:18:39Z

    Adaptive Encoding For No Dictionary For INT DataType

commit 86e1756f19c34ffd2020c1cf3f177bf3205fdabe
Author: sounakr
Date: 2018-06-26T14:04:49Z

    String DataType No Dictionary Rectification

commit 16bf26e8504221d7213ed655065bde921aef71e1
Author: sounakr
Date: 2018-06-26T14:47:31Z

    Long DataType No Dictionary

commit 97380d56dd70f7a158fdda86443e13fb270acc5e
Author: sounakr
Date: 2018-06-26T22:16:37Z

    TimeStamp DataType No Dictionary

commit 3343d0074c908cd6ee6b1f6fb481179f4f37d1fb
Author: sounakr
Date: 2018-06-27T03:07:01Z

    Review

---
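The commits add adaptive encoding for no-dictionary INT, LONG and TIMESTAMP columns. The sketch below roughly illustrates the general idea behind adaptive integral encoding; it is not CarbonData's codec. It scans a page for its min and max, then stores values in the narrowest signed integer width that holds them. The delta variant stores each value relative to the page minimum, so large but tightly clustered values such as timestamps can also fit a narrow type. All names are hypothetical, and CarbonData's own delta codec may choose a different base, so treat the delta choice here as an assumption.

```java
// Hypothetical sketch: choose the narrowest signed integer width (in bytes) for a page.
final class AdaptiveWidthSketch {
  private AdaptiveWidthSketch() {}

  // Smallest of 1, 2, 4 or 8 bytes whose signed range covers [min, max].
  static int widthFor(long min, long max) {
    if (min >= Byte.MIN_VALUE && max <= Byte.MAX_VALUE) return 1;
    if (min >= Short.MIN_VALUE && max <= Short.MAX_VALUE) return 2;
    if (min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE) return 4;
    return 8;
  }

  // Plain adaptive: width chosen from the raw values.
  // Assumes a non-empty page.
  static int adaptiveWidth(long[] page) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long v : page) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    return widthFor(min, max);
  }

  // Delta adaptive: width chosen from (value - min), which lies in [0, max - min].
  // Assumes a non-empty page whose range max - min fits in a long.
  static int deltaWidth(long[] page) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long v : page) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    return widthFor(0, max - min);
  }
}
```

For example, two millisecond timestamps 200 ms apart need 8 bytes each as raw values, but only 2 bytes each as deltas from the page minimum.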