[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2396 ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197724441 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -67,17 +69,38 @@ int noDictionaryColumnIndex; int complexTypeColumnIndex; + int noDictionaryComplexColumnIndex = 0; + int complexTypeComplexColumnIndex = 0; + boolean isDimensionExists; + private int[] surrogateResult; + private byte[][] noDictionaryKeys; + private byte[][] complexTypeKeyArray; + protected Map comlexDimensionInfoMap; + /** + * Field of this Map is the parent Column and associated child columns. + * Final Projection shuld be a merged list consist of only parents. + */ + public Map> mergedComplexDimensionColumns; --- End diff -- it should be at method level ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197724373 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -102,6 +123,9 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) { dictionaryColumnIndex = 0; noDictionaryColumnIndex = 0; complexTypeColumnIndex = 0; +mergedComplexDimensionDataMap = new HashMap<>(); --- End diff -- Why it is created for each row? why not reuse ? ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197723962 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -67,17 +69,38 @@ int noDictionaryColumnIndex; int complexTypeColumnIndex; + int noDictionaryComplexColumnIndex = 0; + int complexTypeComplexColumnIndex = 0; + boolean isDimensionExists; + private int[] surrogateResult; + private byte[][] noDictionaryKeys; + private byte[][] complexTypeKeyArray; + protected Map comlexDimensionInfoMap; + /** + * Field of this Map is the parent Column and associated child columns. + * Final Projection shuld be a merged list consist of only parents. + */ + public Map> mergedComplexDimensionColumns; + + /** + * Fields of this Map of Parent Ordinal with the List is the Child Column Dimension and + * the corresponding data buffer of that column. + */ + + public Map> mergedComplexDimensionDataMap; --- End diff -- it should be private ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197723498 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -67,17 +69,38 @@ int noDictionaryColumnIndex; int complexTypeColumnIndex; + int noDictionaryComplexColumnIndex = 0; --- End diff -- Why it cannot be moved? ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197678648 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -67,17 +69,38 @@ int noDictionaryColumnIndex; int complexTypeColumnIndex; + int noDictionaryComplexColumnIndex = 0; + int complexTypeComplexColumnIndex = 0; + boolean isDimensionExists; + private int[] surrogateResult; + private byte[][] noDictionaryKeys; + private byte[][] complexTypeKeyArray; + protected Map comlexDimensionInfoMap; + /** + * Field of this Map is the parent Column and associated child columns. + * Final Projection shuld be a merged list consist of only parents. + */ + public Map> mergedComplexDimensionColumns; + + /** + * Fields of this Map of Parent Ordinal with the List is the Child Column Dimension and + * the corresponding data buffer of that column. + */ + + public Map> mergedComplexDimensionDataMap; --- End diff -- It is already at class level and initialised only once and reused ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197678288 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -27,6 +28,7 @@ import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryKeyGeneratorFactory; import org.apache.carbondata.core.metadata.datatype.DataTypes; import org.apache.carbondata.core.metadata.encoder.Encoding; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; --- End diff -- Dictionary will be handled with Adaptive encoding PR ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197678273 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -19,6 +19,7 @@ import java.nio.ByteBuffer; import java.util.ArrayList; import java.util.Arrays; +import java.util.HashMap; --- End diff -- VectorCollector is not required for now. It will be handled in next Complex data type enhancement PR ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197645303 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java --- @@ -97,4 +99,56 @@ public void parseBlocksAndReturnComplexColumnByteArray(DimensionRawColumnChunk[] return DataTypeUtil.getDataTypeConverter().wrapWithGenericArrayData(data); } + @Override public Object getDataBasedOnColumn(ByteBuffer dataBuffer, CarbonDimension parent, + CarbonDimension child) { +int dataLength; +if (parent.getOrdinal() < child.getOrdinal()) { + dataLength = parent.getNumberOfChild(); + + if (dataLength == -1) { +return null; + } + Object[] data = new Object[dataLength]; + for (int i = 0; i < dataLength; i++) { +data[i] = children +.getDataBasedOnColumn(dataBuffer, parent.getListOfChildDimensions().get(i), child); + } + return DataTypeUtil.getDataTypeConverter().wrapWithGenericArrayData(data); +} else if (parent.getOrdinal() > child.getOrdinal()) { + return null; +} else { + // dataLength = dataBuffer.getInt(); + return DataTypeUtil.getDataTypeConverter() + .wrapWithGenericArrayData(getDataBasedOnDataType(dataBuffer)); +} + } + + @Override public Object getDataBasedOnColumnList(Map childBuffer, --- End diff -- changed ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197645280 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala --- @@ -67,14 +68,68 @@ case class CarbonDatasourceHadoopRelation( override def schema: StructType = tableSchema.getOrElse(carbonRelation.schema) def buildScan(requiredColumns: Array[String], + projects: Seq[NamedExpression], filters: Array[Filter], partitions: Seq[PartitionSpec]): RDD[InternalRow] = { val filterExpression: Option[Expression] = filters.flatMap { filter => CarbonFilters.createCarbonFilter(schema, filter) }.reduceOption(new AndExpression(_, _)) +var parentColumns = new ListBuffer[String] +// In case of Struct or StructofStruct Complex type, get the project column for given +// parent/child field and pushdown the corresponding project column. In case of Array, +// ArrayofStruct or StructofArray, pushdown parent column +var reqColumns = projects.map { + case a@Alias(s: GetStructField, name) => +val arrayTypeExists = s.childSchema.map(x => x.dataType) + .filter(dataType => dataType.isInstanceOf[ArrayType]) +if (0 == arrayTypeExists.length) { + val columnName = s.toString().replaceAll("#[0-9]*", "") + parentColumns += columnName.split("\\.")(0) + columnName +} +else { + None +} + case a@Alias(s: GetArrayItem, name) => +None + case other => other.name.replaceAll("#[0-9]*", "") +} + +var reqCols = reqColumns.filterNot(none => none.equals(None)).map(col => col.toString) +parentColumns = parentColumns.distinct +reqCols = reqCols.distinct + +// if the parent column is there in the projection list then we can filter out all the children +// in that projection list +val parentColumnOnProjectionList = reqCols.filter(col => parentColumns.contains(col)) --- End diff -- removed ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197645259 --- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java --- @@ -417,6 +417,44 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, } } + /** + * Returns true for fixed length DataTypes. + * @param dataType + * @return + */ + public static boolean isFixedSizeDataType(DataType dataType) { +if (dataType == DataTypes.STRING || DataTypes.isDecimal(dataType)) { + return false; +} else { + return true; +} + } + /** + * get the size of fixed size DataType + * @param actualDataType + * @return + */ + public static int getSizeOfDataType(DataType actualDataType) { --- End diff -- changed ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197645177 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -136,28 +189,81 @@ void fillDimensionData(BlockletScannedResult scannedResult, int[] surrogateResul } } else if (complexDataTypeArray[i]) { // Complex Type With No Dictionary Encoding. -row[order[i]] = comlexDimensionInfoMap.get(queryDimensions[i].getDimension().getOrdinal()) - .getDataBasedOnDataType(ByteBuffer.wrap(complexTypeKeyArray[complexTypeColumnIndex++])); +if (queryDimensions[i].getDimension().getParentOrdinal() != -1) { + if (mergedComplexDimensionColumns + .get(queryDimensions[i].getDimension().getParentOrdinal()).size() > 1) { +fillRowForComplexColumn(complexDimensionInfoMap, row, i); + } else { +row[order[i]] = + complexDimensionInfoMap.get(queryDimensions[i].getDimension().getParentOrdinal()) +.getDataBasedOnColumn( + ByteBuffer.wrap(complexTypeKeyArray[complexTypeColumnIndex++]), + queryDimensions[i].getDimension().getComplexParentDimension(), +queryDimensions[i].getDimension()); + } +} else { + row[order[i]] = + complexDimensionInfoMap.get(queryDimensions[i].getDimension().getOrdinal()) + .getDataBasedOnDataType( + ByteBuffer.wrap(complexTypeKeyArray[complexTypeColumnIndex++])); +} } else { -row[order[i]] = DataTypeUtil.getDataBasedOnDataTypeForNoDictionaryColumn( -noDictionaryKeys[noDictionaryColumnIndex++], -queryDimensions[i].getDimension().getDataType()); +if (queryDimensions[i].getDimension().getParentOrdinal() != -1) { --- End diff -- done ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197645154 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/PrimitiveQueryType.java --- @@ -146,4 +148,58 @@ public PrimitiveQueryType(String name, String parentname, int blockIndex, return actualData; } + @Override public Object getDataBasedOnColumn(ByteBuffer dataBuffer, CarbonDimension parent, + CarbonDimension child) { +Object actualData = null; + +if (parent.getOrdinal() != child.getOrdinal() || null == dataBuffer) { + return null; +} + +if (isDirectDictionary) { --- End diff -- removed and changed ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197645152 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/StructQueryType.java --- @@ -109,4 +111,53 @@ public StructQueryType(String name, String parentname, int blockIndex) { } return DataTypeUtil.getDataTypeConverter().wrapWithGenericRow(fields); } + + @Override public Object getDataBasedOnColumn(ByteBuffer dataBuffer, CarbonDimension parent, + CarbonDimension child) { +int childLength; +if (parent.getOrdinal() < child.getOrdinal()) { + childLength = parent.getNumberOfChild(); + Object[] fields = new Object[childLength]; + for (int i = 0; i < childLength; i++) { +fields[i] = children.get(i) +.getDataBasedOnColumn(dataBuffer, parent.getListOfChildDimensions().get(i), child); + } + return DataTypeUtil.getDataTypeConverter().wrapWithGenericRow(fields); +} else if (parent.getOrdinal() > child.getOrdinal()) { + return null; +} +else { --- End diff -- done ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197645136 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java --- @@ -156,8 +162,70 @@ private static boolean isColumnMatches(boolean isTransactionalTable, // If it is non transactional table just check the column names, no need to validate // column id as multiple sdk's output placed in a single folder doesn't have same // column ID but can have same column name -return (tableColumn.getColumnId().equals(queryColumn.getColumnId()) || -(!isTransactionalTable && tableColumn.getColName().equals(queryColumn.getColName(; +if (tableColumn.getDataType().isComplexType() && !(tableColumn.getDataType().getId() +== DataTypes.ARRAY_TYPE_ID)) { + if (tableColumn.getColumnId().equals(queryColumn.getColumnId())) { +return true; + } else { +return isColumnMatchesStruct(tableColumn, queryColumn); + } +} else { + return (tableColumn.getColumnId().equals(queryColumn.getColumnId()) || (!isTransactionalTable + && tableColumn.getColName().equals(queryColumn.getColName(; +} + } + + /** + * In case of Multilevel Complex column - STRUCT/STRUCTofSTRUCT, traverse all the child dimension + * to check column Id + * + * @param tableColumn + * @param queryColumn + * @return + */ + private static boolean isColumnMatchesStruct(CarbonColumn tableColumn, CarbonColumn queryColumn) { +if (tableColumn instanceof CarbonDimension) { + CarbonDimension dimension = (CarbonDimension) tableColumn; + if (dimension.getColName().equalsIgnoreCase(queryColumn.getColName().split("\\.")[0])) { +if (dimension.getListOfChildDimensions() != null) { + CarbonColumn columnMatchesStructlevel = + isColumnMatchesStructlevel(tableColumn, dimension.getListOfChildDimensions(), + queryColumn); + if (null != columnMatchesStructlevel) { +return true; + } else { +return false; + } +} + } else { +return (tableColumn.getColumnId().equals(queryColumn.getColumnId())); + } +} +return false; + } + + private static CarbonColumn isColumnMatchesStructlevel(CarbonColumn tableColumn, --- End diff -- changed ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197645133 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/model/QueryModelBuilder.java --- @@ -54,18 +58,134 @@ public QueryModelBuilder projectColumns(String[] projectionColumns) { } else { CarbonMeasure measure = table.getMeasureByName(factTableName, projectionColumnName); if (measure == null) { - throw new RuntimeException(projectionColumnName + - " column not found in the table " + factTableName); + throw new RuntimeException( + projectionColumnName + " column not found in the table " + factTableName); } projection.addMeasure(measure, i); i++; } } - +optimizeProjectionForComplexColumns(projection); this.projection = projection; return this; } + private void optimizeProjectionForComplexColumns(QueryProjection projection) { +// Get the List of Complex Column Projection. +// The optimization techniques which can be applied are +// A. Merging in Driver Side +// B. Merging in the result Collector side. +// Merging is driver side cases are +// Driver merging will eliminate one of the CarbonDimension. +// Executor merging will merge the column output in Result Collector. +// In this routine we are going to do driver merging and leave executor merging. +Map> complexColumnMap = new HashMap<>(); +List carbonDimensions = projection.getDimensions(); +for (ProjectionDimension cols : carbonDimensions) { + // get all the Projections with Parent Ordinal Set. + if (cols.getDimension().getParentOrdinal() != -1) { +if (complexColumnMap.get(cols.getDimension().getParentOrdinal()) != null) { + List childColumns = complexColumnMap.get(cols.getDimension().getParentOrdinal()); + childColumns.add(cols.getDimension().getOrdinal()); + complexColumnMap.put(cols.getDimension().getParentOrdinal(), childColumns); +} else { + List childColumns = new ArrayList<>(); + childColumns.add(cols.getDimension().getOrdinal()); + complexColumnMap.put(cols.getDimension().getParentOrdinal(), childColumns); +} + } +} + +// Traverse the Map to Find any columns are parent. +for (Map.Entry> entry : complexColumnMap.entrySet()) { + List childOrdinals = entry.getValue(); + if (childOrdinals.size() > 1) { +// In case of more that one child, have to check if the child columns are in the same path +// and have a common parent. +Collections.sort(childOrdinals); +List mergedOrdinals = mergeChildColumns(childOrdinals, entry.getKey()); +if (mergedOrdinals.size() > 0) { + projection = removeDimension(projection, mergedOrdinals); +} + } +} + } + + private QueryProjection removeDimension(QueryProjection projection, + List mergedOrdinals) { +List carbonDimensions = projection.getDimensions(); +QueryProjection outputProjection = new QueryProjection(); +int i = 0; +for (ProjectionDimension cols : carbonDimensions) { + if (!mergedOrdinals.contains(cols.getDimension().getOrdinal())) { +outputProjection.addDimension(cols.getDimension(), i++); + } +} +List carbonMeasures = projection.getMeasures(); +for (ProjectionMeasure cols : carbonMeasures) { + outputProjection.addMeasure(cols.getMeasure(), i++); +} +return outputProjection; + } + + private List mergeChildColumns(List childOrdinals, Integer key) { --- End diff -- removed ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197645110 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java --- @@ -97,4 +99,56 @@ public void parseBlocksAndReturnComplexColumnByteArray(DimensionRawColumnChunk[] return DataTypeUtil.getDataTypeConverter().wrapWithGenericArrayData(data); } + @Override public Object getDataBasedOnColumn(ByteBuffer dataBuffer, CarbonDimension parent, --- End diff -- Okay.done ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197645070 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -67,17 +69,38 @@ int noDictionaryColumnIndex; int complexTypeColumnIndex; + int noDictionaryComplexColumnIndex = 0; --- End diff -- Cannot be moved as it is used at class level ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197644958 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/column/CarbonDimension.java --- @@ -51,6 +51,16 @@ */ private int complexTypeOrdinal; + /** + * to store the Ordinal of the Complex Parent Column. + */ + private int parentOrdinal = -1; --- End diff -- Removed ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user Indhumathi27 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197644949 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java --- @@ -665,6 +665,34 @@ public CarbonDimension getDimensionByName(String tableName, String columnName) { if (dim.getColName().equalsIgnoreCase(columnName)) { carbonDimension = dim; break; + } else if (dim.getListOfChildDimensions() != null) { --- End diff -- simplified. please check ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197502390 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java --- @@ -97,4 +99,56 @@ public void parseBlocksAndReturnComplexColumnByteArray(DimensionRawColumnChunk[] return DataTypeUtil.getDataTypeConverter().wrapWithGenericArrayData(data); } + @Override public Object getDataBasedOnColumn(ByteBuffer dataBuffer, CarbonDimension parent, + CarbonDimension child) { +int dataLength; +if (parent.getOrdinal() < child.getOrdinal()) { + dataLength = parent.getNumberOfChild(); + + if (dataLength == -1) { +return null; + } + Object[] data = new Object[dataLength]; + for (int i = 0; i < dataLength; i++) { +data[i] = children +.getDataBasedOnColumn(dataBuffer, parent.getListOfChildDimensions().get(i), child); + } + return DataTypeUtil.getDataTypeConverter().wrapWithGenericArrayData(data); +} else if (parent.getOrdinal() > child.getOrdinal()) { + return null; +} else { + // dataLength = dataBuffer.getInt(); + return DataTypeUtil.getDataTypeConverter() + .wrapWithGenericArrayData(getDataBasedOnDataType(dataBuffer)); +} + } + + @Override public Object getDataBasedOnColumnList(Map childBuffer, --- End diff -- Don't implement it, throw unsupported exception ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197502370 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java --- @@ -97,4 +99,56 @@ public void parseBlocksAndReturnComplexColumnByteArray(DimensionRawColumnChunk[] return DataTypeUtil.getDataTypeConverter().wrapWithGenericArrayData(data); } + @Override public Object getDataBasedOnColumn(ByteBuffer dataBuffer, CarbonDimension parent, --- End diff -- Don't implement it, throw unsupported exception ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197502251 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala --- @@ -67,14 +68,68 @@ case class CarbonDatasourceHadoopRelation( override def schema: StructType = tableSchema.getOrElse(carbonRelation.schema) def buildScan(requiredColumns: Array[String], + projects: Seq[NamedExpression], filters: Array[Filter], partitions: Seq[PartitionSpec]): RDD[InternalRow] = { val filterExpression: Option[Expression] = filters.flatMap { filter => CarbonFilters.createCarbonFilter(schema, filter) }.reduceOption(new AndExpression(_, _)) +var parentColumns = new ListBuffer[String] +// In case of Struct or StructofStruct Complex type, get the project column for given +// parent/child field and pushdown the corresponding project column. In case of Array, +// ArrayofStruct or StructofArray, pushdown parent column +var reqColumns = projects.map { + case a@Alias(s: GetStructField, name) => +val arrayTypeExists = s.childSchema.map(x => x.dataType) + .filter(dataType => dataType.isInstanceOf[ArrayType]) +if (0 == arrayTypeExists.length) { + val columnName = s.toString().replaceAll("#[0-9]*", "") + parentColumns += columnName.split("\\.")(0) + columnName +} +else { + None +} + case a@Alias(s: GetArrayItem, name) => +None + case other => other.name.replaceAll("#[0-9]*", "") +} + +var reqCols = reqColumns.filterNot(none => none.equals(None)).map(col => col.toString) +parentColumns = parentColumns.distinct +reqCols = reqCols.distinct + +// if the parent column is there in the projection list then we can filter out all the children +// in that projection list +val parentColumnOnProjectionList = reqCols.filter(col => parentColumns.contains(col)) --- End diff -- Remove merge logic here ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197501751 --- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java --- @@ -417,6 +417,44 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, } } + /** + * Returns true for fixed length DataTypes. + * @param dataType + * @return + */ + public static boolean isFixedSizeDataType(DataType dataType) { +if (dataType == DataTypes.STRING || DataTypes.isDecimal(dataType)) { + return false; +} else { + return true; +} + } + /** + * get the size of fixed size DataType + * @param actualDataType + * @return + */ + public static int getSizeOfDataType(DataType actualDataType) { --- End diff -- Get the size in bytes directly from DataType ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197501384 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/model/QueryModelBuilder.java --- @@ -54,18 +58,134 @@ public QueryModelBuilder projectColumns(String[] projectionColumns) { } else { CarbonMeasure measure = table.getMeasureByName(factTableName, projectionColumnName); if (measure == null) { - throw new RuntimeException(projectionColumnName + - " column not found in the table " + factTableName); + throw new RuntimeException( + projectionColumnName + " column not found in the table " + factTableName); } projection.addMeasure(measure, i); i++; } } - +optimizeProjectionForComplexColumns(projection); this.projection = projection; return this; } + private void optimizeProjectionForComplexColumns(QueryProjection projection) { +// Get the List of Complex Column Projection. +// The optimization techniques which can be applied are +// A. Merging in Driver Side +// B. Merging in the result Collector side. +// Merging is driver side cases are +// Driver merging will eliminate one of the CarbonDimension. +// Executor merging will merge the column output in Result Collector. +// In this routine we are going to do driver merging and leave executor merging. +Map> complexColumnMap = new HashMap<>(); +List carbonDimensions = projection.getDimensions(); +for (ProjectionDimension cols : carbonDimensions) { + // get all the Projections with Parent Ordinal Set. + if (cols.getDimension().getParentOrdinal() != -1) { +if (complexColumnMap.get(cols.getDimension().getParentOrdinal()) != null) { + List childColumns = complexColumnMap.get(cols.getDimension().getParentOrdinal()); + childColumns.add(cols.getDimension().getOrdinal()); + complexColumnMap.put(cols.getDimension().getParentOrdinal(), childColumns); +} else { + List childColumns = new ArrayList<>(); + childColumns.add(cols.getDimension().getOrdinal()); + complexColumnMap.put(cols.getDimension().getParentOrdinal(), childColumns); +} + } +} + +// Traverse the Map to Find any columns are parent. +for (Map.Entry> entry : complexColumnMap.entrySet()) { + List childOrdinals = entry.getValue(); + if (childOrdinals.size() > 1) { +// In case of more that one child, have to check if the child columns are in the same path +// and have a common parent. +Collections.sort(childOrdinals); +List mergedOrdinals = mergeChildColumns(childOrdinals, entry.getKey()); +if (mergedOrdinals.size() > 0) { + projection = removeDimension(projection, mergedOrdinals); +} + } +} + } + + private QueryProjection removeDimension(QueryProjection projection, + List mergedOrdinals) { +List carbonDimensions = projection.getDimensions(); +QueryProjection outputProjection = new QueryProjection(); +int i = 0; +for (ProjectionDimension cols : carbonDimensions) { + if (!mergedOrdinals.contains(cols.getDimension().getOrdinal())) { +outputProjection.addDimension(cols.getDimension(), i++); + } +} +List carbonMeasures = projection.getMeasures(); +for (ProjectionMeasure cols : carbonMeasures) { + outputProjection.addMeasure(cols.getMeasure(), i++); +} +return outputProjection; + } + + private List mergeChildColumns(List childOrdinals, Integer key) { +// Check If children if they are in the path of not. +List mergedChild = new ArrayList<>(); + +for (int i = 0; i < childOrdinals.size(); i++) { + for (int j = i; j < childOrdinals.size(); j++) { +if (!mergedChild.contains(childOrdinals.get(j)) && checkChildsInSamePath( +childOrdinals.get(i), childOrdinals.get(j))) { + mergedChild.add(j); +} + } +} +return mergedChild; + } + + private boolean checkChildsInSamePath(Integer parentOrdinal, Integer childOrdinal) { +List dimList = table.getDimensions(); +CarbonDimension parentDimension = getDimensionBasedOnOrdinal(dimList, parentOrdinal); +CarbonDimension childDimension = getDimensionBasedOnOrdinal(dimList, childOrdinal); +if (checkForChildColumns(parentDimension, childDimension)) { + return true; +} else { + return false; +} + } + +
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197499107 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/model/QueryModelBuilder.java --- @@ -54,18 +58,134 @@ public QueryModelBuilder projectColumns(String[] projectionColumns) { } else { CarbonMeasure measure = table.getMeasureByName(factTableName, projectionColumnName); if (measure == null) { - throw new RuntimeException(projectionColumnName + - " column not found in the table " + factTableName); + throw new RuntimeException( + projectionColumnName + " column not found in the table " + factTableName); } projection.addMeasure(measure, i); i++; } } - +optimizeProjectionForComplexColumns(projection); this.projection = projection; return this; } + private void optimizeProjectionForComplexColumns(QueryProjection projection) { +// Get the List of Complex Column Projection. +// The optimization techniques which can be applied are +// A. Merging in Driver Side +// B. Merging in the result Collector side. +// Merging is driver side cases are +// Driver merging will eliminate one of the CarbonDimension. +// Executor merging will merge the column output in Result Collector. +// In this routine we are going to do driver merging and leave executor merging. +Map> complexColumnMap = new HashMap<>(); +List carbonDimensions = projection.getDimensions(); +for (ProjectionDimension cols : carbonDimensions) { + // get all the Projections with Parent Ordinal Set. + if (cols.getDimension().getParentOrdinal() != -1) { +if (complexColumnMap.get(cols.getDimension().getParentOrdinal()) != null) { + List childColumns = complexColumnMap.get(cols.getDimension().getParentOrdinal()); + childColumns.add(cols.getDimension().getOrdinal()); + complexColumnMap.put(cols.getDimension().getParentOrdinal(), childColumns); +} else { + List childColumns = new ArrayList<>(); + childColumns.add(cols.getDimension().getOrdinal()); + complexColumnMap.put(cols.getDimension().getParentOrdinal(), childColumns); +} + } +} + +// Traverse the Map to Find any columns are parent. +for (Map.Entry> entry : complexColumnMap.entrySet()) { + List childOrdinals = entry.getValue(); + if (childOrdinals.size() > 1) { +// In case of more that one child, have to check if the child columns are in the same path +// and have a common parent. +Collections.sort(childOrdinals); +List mergedOrdinals = mergeChildColumns(childOrdinals, entry.getKey()); +if (mergedOrdinals.size() > 0) { + projection = removeDimension(projection, mergedOrdinals); +} + } +} + } + + private QueryProjection removeDimension(QueryProjection projection, + List mergedOrdinals) { +List carbonDimensions = projection.getDimensions(); +QueryProjection outputProjection = new QueryProjection(); +int i = 0; +for (ProjectionDimension cols : carbonDimensions) { + if (!mergedOrdinals.contains(cols.getDimension().getOrdinal())) { +outputProjection.addDimension(cols.getDimension(), i++); + } +} +List carbonMeasures = projection.getMeasures(); +for (ProjectionMeasure cols : carbonMeasures) { + outputProjection.addMeasure(cols.getMeasure(), i++); +} +return outputProjection; + } + + private List mergeChildColumns(List childOrdinals, Integer key) { --- End diff -- remove unnecessary key ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197496409 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java --- @@ -156,8 +162,70 @@ private static boolean isColumnMatches(boolean isTransactionalTable, // If it is non transactional table just check the column names, no need to validate // column id as multiple sdk's output placed in a single folder doesn't have same // column ID but can have same column name -return (tableColumn.getColumnId().equals(queryColumn.getColumnId()) || -(!isTransactionalTable && tableColumn.getColName().equals(queryColumn.getColName(; +if (tableColumn.getDataType().isComplexType() && !(tableColumn.getDataType().getId() +== DataTypes.ARRAY_TYPE_ID)) { + if (tableColumn.getColumnId().equals(queryColumn.getColumnId())) { +return true; + } else { +return isColumnMatchesStruct(tableColumn, queryColumn); + } +} else { + return (tableColumn.getColumnId().equals(queryColumn.getColumnId()) || (!isTransactionalTable + && tableColumn.getColName().equals(queryColumn.getColName(; +} + } + + /** + * In case of Multilevel Complex column - STRUCT/STRUCTofSTRUCT, traverse all the child dimension + * to check column Id + * + * @param tableColumn + * @param queryColumn + * @return + */ + private static boolean isColumnMatchesStruct(CarbonColumn tableColumn, CarbonColumn queryColumn) { +if (tableColumn instanceof CarbonDimension) { + CarbonDimension dimension = (CarbonDimension) tableColumn; + if (dimension.getColName().equalsIgnoreCase(queryColumn.getColName().split("\\.")[0])) { +if (dimension.getListOfChildDimensions() != null) { + CarbonColumn columnMatchesStructlevel = + isColumnMatchesStructlevel(tableColumn, dimension.getListOfChildDimensions(), + queryColumn); + if (null != columnMatchesStructlevel) { +return true; + } else { +return false; + } +} + } else { +return (tableColumn.getColumnId().equals(queryColumn.getColumnId())); + } +} +return false; + } + + private static CarbonColumn isColumnMatchesStructlevel(CarbonColumn tableColumn, --- End diff -- CHange as per CarbonTable suggested code ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197492221 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/StructQueryType.java --- @@ -109,4 +111,53 @@ public StructQueryType(String name, String parentname, int blockIndex) { } return DataTypeUtil.getDataTypeConverter().wrapWithGenericRow(fields); } + + @Override public Object getDataBasedOnColumn(ByteBuffer dataBuffer, CarbonDimension parent, + CarbonDimension child) { +int childLength; +if (parent.getOrdinal() < child.getOrdinal()) { + childLength = parent.getNumberOfChild(); + Object[] fields = new Object[childLength]; + for (int i = 0; i < childLength; i++) { +fields[i] = children.get(i) +.getDataBasedOnColumn(dataBuffer, parent.getListOfChildDimensions().get(i), child); + } + return DataTypeUtil.getDataTypeConverter().wrapWithGenericRow(fields); +} else if (parent.getOrdinal() > child.getOrdinal()) { + return null; +} +else { --- End diff -- correct the style ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197491159 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/PrimitiveQueryType.java --- @@ -146,4 +148,58 @@ public PrimitiveQueryType(String name, String parentname, int blockIndex, return actualData; } + @Override public Object getDataBasedOnColumn(ByteBuffer dataBuffer, CarbonDimension parent, + CarbonDimension child) { +Object actualData = null; + +if (parent.getOrdinal() != child.getOrdinal() || null == dataBuffer) { + return null; +} + +if (isDirectDictionary) { --- End diff -- Remove the duplicate code, code is almost similar to `getDataBasedOnDataType` ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197488859 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -136,28 +189,81 @@ void fillDimensionData(BlockletScannedResult scannedResult, int[] surrogateResul } } else if (complexDataTypeArray[i]) { // Complex Type With No Dictionary Encoding. -row[order[i]] = comlexDimensionInfoMap.get(queryDimensions[i].getDimension().getOrdinal()) - .getDataBasedOnDataType(ByteBuffer.wrap(complexTypeKeyArray[complexTypeColumnIndex++])); +if (queryDimensions[i].getDimension().getParentOrdinal() != -1) { --- End diff -- Move this to new method and add a comment ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197488819 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -136,28 +189,81 @@ void fillDimensionData(BlockletScannedResult scannedResult, int[] surrogateResul } } else if (complexDataTypeArray[i]) { // Complex Type With No Dictionary Encoding. -row[order[i]] = comlexDimensionInfoMap.get(queryDimensions[i].getDimension().getOrdinal()) - .getDataBasedOnDataType(ByteBuffer.wrap(complexTypeKeyArray[complexTypeColumnIndex++])); +if (queryDimensions[i].getDimension().getParentOrdinal() != -1) { + if (mergedComplexDimensionColumns + .get(queryDimensions[i].getDimension().getParentOrdinal()).size() > 1) { +fillRowForComplexColumn(complexDimensionInfoMap, row, i); + } else { +row[order[i]] = + complexDimensionInfoMap.get(queryDimensions[i].getDimension().getParentOrdinal()) +.getDataBasedOnColumn( + ByteBuffer.wrap(complexTypeKeyArray[complexTypeColumnIndex++]), + queryDimensions[i].getDimension().getComplexParentDimension(), +queryDimensions[i].getDimension()); + } +} else { + row[order[i]] = + complexDimensionInfoMap.get(queryDimensions[i].getDimension().getOrdinal()) + .getDataBasedOnDataType( + ByteBuffer.wrap(complexTypeKeyArray[complexTypeColumnIndex++])); +} } else { -row[order[i]] = DataTypeUtil.getDataBasedOnDataTypeForNoDictionaryColumn( -noDictionaryKeys[noDictionaryColumnIndex++], -queryDimensions[i].getDimension().getDataType()); +if (queryDimensions[i].getDimension().getParentOrdinal() != -1) { --- End diff -- Move this to new method and add a comment ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197488339 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -67,17 +69,38 @@ int noDictionaryColumnIndex; int complexTypeColumnIndex; + int noDictionaryComplexColumnIndex = 0; + int complexTypeComplexColumnIndex = 0; + boolean isDimensionExists; + private int[] surrogateResult; + private byte[][] noDictionaryKeys; + private byte[][] complexTypeKeyArray; + protected Map comlexDimensionInfoMap; + /** + * Field of this Map is the parent Column and associated child columns. + * Final Projection shuld be a merged list consist of only parents. + */ + public Map> mergedComplexDimensionColumns; + + /** + * Fields of this Map of Parent Ordinal with the List is the Child Column Dimension and + * the corresponding data buffer of that column. + */ + + public Map> mergedComplexDimensionDataMap; --- End diff -- Move this also to method and create only once and reuse the same. If it is possible please use multidimensional array ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197486428 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -119,9 +143,38 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) { return listBasedResult; } + private void fillComplexColumnDataBuffer() { + +//dictionaryComplexColumnIndex = 0; +noDictionaryComplexColumnIndex = 0; +complexTypeComplexColumnIndex = 0; +for (int i = 0; i < queryDimensions.length; i++) { + if (queryDimensions[i].getDimension().getParentOrdinal() != -1) { --- End diff -- Take an array and fill the ordinals in a constructor ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197483516 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -67,17 +69,38 @@ int noDictionaryColumnIndex; int complexTypeColumnIndex; + int noDictionaryComplexColumnIndex = 0; --- End diff -- All these class variables not needed, move to method level ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197482635 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -27,6 +28,7 @@ import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryKeyGeneratorFactory; import org.apache.carbondata.core.metadata.datatype.DataTypes; import org.apache.carbondata.core.metadata.encoder.Encoding; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; --- End diff -- To handle dictionary complex type, try to place this code in ColumnPageWrapper ``` @Override public int fillSurrogateKey(int rowId, int chunkIndex, int[] outputSurrogateKey, KeyStructureInfo restructuringInfo) { byte[] rowBytes = columnPage.getBytes(rowId); outputSurrogateKey[chunkIndex] = CarbonUtil.getSurrogateInternal(rowBytes, 0, rowBytes.length); return chunkIndex + 1; } ``` ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197473335 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java --- @@ -19,6 +19,7 @@ import java.nio.ByteBuffer; import java.util.ArrayList; import java.util.Arrays; +import java.util.HashMap; --- End diff -- Please check and handle vectorCollector also if required. ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197472622 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/column/CarbonDimension.java --- @@ -51,6 +51,16 @@ */ private int complexTypeOrdinal; + /** + * to store the Ordinal of the Complex Parent Column. + */ + private int parentOrdinal = -1; --- End diff -- Please remove it as we can get the information from `complexParentDimension` ---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2396#discussion_r197470593 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java --- @@ -665,6 +665,34 @@ public CarbonDimension getDimensionByName(String tableName, String columnName) { if (dim.getColName().equalsIgnoreCase(columnName)) { carbonDimension = dim; break; + } else if (dim.getListOfChildDimensions() != null) { --- End diff -- Try to simplify as follow ``` /** * to get particular dimension from a table * * @param tableName * @param columnName [col1.col12.col123] * @return */ public CarbonDimension getDimensionByName(String tableName, String columnName) { CarbonDimension carbonDimension = null; List dimList = tableDimensionsMap.get(tableName); String[] colsplits = columnName.split("\\."); String tempColName = colsplits[0]; for (String colsplit : colsplits) { carbonDimension = getCarbonDimension(tempColName, dimList); if (carbonDimension.getListOfChildDimensions() != null) { tempColName = tempColName + "." + colsplit; dimList = carbonDimension.getListOfChildDimensions(); } } List implicitDimList = tableImplicitDimensionsMap.get(tableName); if (carbonDimension == null) { carbonDimension = getCarbonDimension(columnName, implicitDimList); } return carbonDimension; } private CarbonDimension getCarbonDimension(String columnName, List dimensions) { CarbonDimension carbonDimension = null; for (CarbonDimension dim : dimensions) { if (dim.getColName().equalsIgnoreCase(columnName)) { carbonDimension = dim; break; } } return carbonDimension; } ``` ---