[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/3053 @kumarvishal09 ...I agree with you that it is a functional issue and we need to merge it. My point was before merging we can do one load performance test to see if there is any performance degrade and if there is any then we can update the benchmark results ---
[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/3053 @akashrn5 .I agree with @xuchuanyin before merging the PR it is better to get the PR tested for performance. We can observe 2 things during the benchmark test - performance and compression ratio of rawcompress Vs compressDouble and then take the final decision ---
[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/3047 LGTM ---
[GitHub] carbondata issue #3044: [CARBONDATA-3149]Documentation for alter table colum...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/3044 LGTM ---
[GitHub] carbondata pull request #3044: [CARBONDATA-3149]Documentation for alter tabl...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3044#discussion_r245224509 --- Diff: docs/ddl-of-carbondata.md --- @@ -681,24 +682,24 @@ Users can specify which columns to include and exclude for local dictionary gene **NOTE:** Drop Complex child column is not supported. - - # CHANGE DATA TYPE + - # CHANGE COLUMN NAME/TYPE - This command is used to change the data type from INT to BIGINT or decimal precision from lower to higher. + This command is used to change the column's name and the data type from INT to BIGINT or decimal precision from lower to higher and rename column. --- End diff -- Write this as below `This command is used to change column name and the data type from INT to BIGINT or decimal precision from lower to higher.` ---
[GitHub] carbondata pull request #3044: [CARBONDATA-3149]Documentation for alter tabl...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3044#discussion_r245224386 --- Diff: docs/ddl-of-carbondata.md --- @@ -47,7 +47,8 @@ CarbonData DDL statements are documented here,which includes: * [RENAME TABLE](#rename-table) * [ADD COLUMNS](#add-columns) * [DROP COLUMNS](#drop-columns) -* [CHANGE DATA TYPE](#change-data-type) +* [RENAME COLUMN](#change-column-name-/-type) +* [CHANGE DATA TYPE](#change-column-name-/-type) --- End diff -- check for linking. With this text it will not link ---
[GitHub] carbondata pull request #3044: [CARBONDATA-3149]Documentation for alter tabl...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3044#discussion_r245224565 --- Diff: docs/ddl-of-carbondata.md --- @@ -681,24 +682,24 @@ Users can specify which columns to include and exclude for local dictionary gene **NOTE:** Drop Complex child column is not supported. - - # CHANGE DATA TYPE + - # CHANGE COLUMN NAME/TYPE - This command is used to change the data type from INT to BIGINT or decimal precision from lower to higher. + This command is used to change the column's name and the data type from INT to BIGINT or decimal precision from lower to higher and rename column. Change of decimal data type from lower precision to higher precision will only be supported for cases where there is no data loss. ``` - ALTER TABLE [db_name.]table_name CHANGE col_name col_name changed_column_type + ALTER TABLE [db_name.]table_name CHANGE old_col_name new_col_name column_data_type --- End diff -- Change this as below `ALTER TABLE [db_name.]table_name CHANGE col_old_name col_new_name column_type` ---
[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/3047 LGTM...can be merged once build passes ---
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3047#discussion_r244920921 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala --- @@ -46,9 +47,9 @@ object CarbonStore { def showSegments( limit: Option[String], - tablePath: String, + carbonTable: CarbonTable, --- End diff -- Move `carbonTable` as the first argument of method ---
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3047#discussion_r244922117 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala --- @@ -101,14 +102,23 @@ object CarbonStore { val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) { // for streaming segment, we should get the actual size from the index file // since it is continuously inserting data -val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName) +val segmentDir = CarbonTablePath + .getSegmentPath(carbonTable.getTablePath, load.getLoadName) val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir) val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath)) (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize) } else { // for batch segment, we can get the data size from table status file directly -(if (load.getDataSize == null) 0L else load.getDataSize.toLong, - if (load.getIndexSize == null) 0L else load.getIndexSize.toLong) +if (null == load.getDataSize || null == load.getIndexSize) { + // If either of datasize or indexsize comes to be null the we calculate the correct + // size and assign + val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false) --- End diff -- Boolean flag in the method call is to update the data and index size in the table status file. Pass the flag as true so that it computes the size and update the table status file. This will avoid calculation for each Show Segment call ---
[GitHub] carbondata pull request #3044: [CARBONDATA-3149]Documentation for alter tabl...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3044#discussion_r244919178 --- Diff: docs/ddl-of-carbondata.md --- @@ -681,24 +682,28 @@ Users can specify which columns to include and exclude for local dictionary gene **NOTE:** Drop Complex child column is not supported. - - # CHANGE DATA TYPE + - # RENAME COLUMN AND CHANGE DATATYPE --- End diff -- Change this to `Change Column Name/Type`. Please go through the below link and try to align the documentation in that way if possible https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnName/Type/Position/Comment ---
[GitHub] carbondata pull request #3039: [CARBONDATA-3217] Optimize implicit filter ex...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3039#discussion_r244668804 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.expression.conditional; + +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.LiteralExpression; +import org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; + +import org.apache.commons.lang.StringUtils; + +/** + * Custom class to handle filter values for Implicit filter + */ +public class ImplicitExpression extends Expression { + + /** + * map that contains the mapping of block id to the valid blocklets in that block which contain + * the data as per the applied filter + */ + private Map> blockIdToBlockletIdMapping; + + public ImplicitExpression(List implicitFilterList) { +// initialize map with half the size of filter list as one block id can contain +// multiple blocklets +blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2); +for (Expression value : implicitFilterList) { + String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString(); + addBlockEntry(blockletPath); +} + } + + public ImplicitExpression(Map> blockIdToBlockletIdMapping) { +this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping; + } + + private void addBlockEntry(String blockletPath) { +String blockId = +blockletPath.substring(0, blockletPath.lastIndexOf(CarbonCommonConstants.FILE_SEPARATOR)); +Set blockletIds = blockIdToBlockletIdMapping.get(blockId); +if (null == blockletIds) { + blockletIds = new HashSet<>(); + blockIdToBlockletIdMapping.put(blockId, blockletIds); +} + blockletIds.add(Integer.parseInt(blockletPath.substring(blockId.length() + 1))); --- End diff -- Not required to catch the NumberFormatException. Blocklet Id is always expected to be an integer number and in the complete flow it is used as an integer. So if there is any exception in the conversion then the code problem is not here but from the other part of code. So if we handle the exception here the actual cause will get suppressed here ---
[GitHub] carbondata pull request #3039: [CARBONDATA-3217] Optimize implicit filter ex...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3039#discussion_r244668632 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.expression.conditional; + +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.LiteralExpression; +import org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; + +import org.apache.commons.lang.StringUtils; + +/** + * Custom class to handle filter values for Implicit filter + */ +public class ImplicitExpression extends Expression { + + /** + * map that contains the mapping of block id to the valid blocklets in that block which contain + * the data as per the applied filter + */ + private Map> blockIdToBlockletIdMapping; + + public ImplicitExpression(List implicitFilterList) { +// initialize map with half the size of filter list as one block id can contain +// multiple blocklets +blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2); +for (Expression value : implicitFilterList) { + String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString(); + addBlockEntry(blockletPath); +} + } + + public ImplicitExpression(Map> blockIdToBlockletIdMapping) { +this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping; + } + + private void addBlockEntry(String blockletPath) { +String blockId = +blockletPath.substring(0, blockletPath.lastIndexOf(CarbonCommonConstants.FILE_SEPARATOR)); +Set blockletIds = blockIdToBlockletIdMapping.get(blockId); +if (null == blockletIds) { + blockletIds = new HashSet<>(); + blockIdToBlockletIdMapping.put(blockId, blockletIds); +} + blockletIds.add(Integer.parseInt(blockletPath.substring(blockId.length() + 1))); + } + + @Override public ExpressionResult evaluate(RowIntf value) + throws FilterUnsupportedException, FilterIllegalMemberException { +throw new UnsupportedOperationException("Operation not supported for Implicit expression"); + } + + public Map> getBlockIdToBlockletIdMapping() { +return blockIdToBlockletIdMapping; + } + + @Override public ExpressionType getFilterExpressionType() { +return ExpressionType.IMPLICIT; + } + + @Override public void findAndSetChild(Expression oldExpr, Expression newExpr) { --- End diff -- This method is not required to be implemented in this class so its implementation is empty ---
[GitHub] carbondata pull request #3039: [WIP] Optimize implicit filter expression per...
GitHub user manishgupta88 opened a pull request: https://github.com/apache/carbondata/pull/3039 [WIP] Optimize implicit filter expression performance by removing extra serialization Fixed performance issue for Implicit filter column 1. Removed serialization all the implicit filter values in each task. Instead serialized values only for the blocks going to particular task 2. Removed 2 times deserialization of implicit filter values in executor for each task. 1 time is sufficient Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishgupta88/carbondata implicit_column_filter_serialization Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3039.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3039 commit 73390e08aad788698dd9f12e3513a4a4814afd73 Author: manishgupta88 Date: 2018-12-27T09:48:07Z Fixed performance issue for Implicit filter column 1. Removed serialization all the implicit filter values in each task. Instead serialized values only for the blocks going to particular task 2. Removed 2 times deserialization of implicit filter values in executor for each task. 1 time is sufficient ---
[GitHub] carbondata issue #3027: [CARBONDATA-3202]update the schema to session catalo...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/3027 LGTM ---
[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3027#discussion_r244306298 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala --- @@ -262,13 +263,28 @@ private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand( carbonTable: CarbonTable, tableInfo: TableInfo, addColumnSchema: ColumnSchema, - schemaEvolutionEntry: SchemaEvolutionEntry): Unit = { + schemaEvolutionEntry: SchemaEvolutionEntry, + oldCarbonColumn: CarbonColumn): Unit = { val schemaConverter = new ThriftWrapperSchemaConverterImpl -val a = List(schemaConverter.fromExternalToWrapperColumnSchema(addColumnSchema)) -val (tableIdentifier, schemaParts, cols) = AlterTableUtil.updateSchemaInfo( - carbonTable, schemaEvolutionEntry, tableInfo, Some(a))(sparkSession) +// get the carbon column in schema order +val carbonColumns = carbonTable.getCreateOrderColumn(carbonTable.getTableName).asScala + .collect { case carbonColumn if !carbonColumn.isInvisible => carbonColumn.getColumnSchema } +// get the schema ordinal of the column for which the datatype changed or column is renamed +var schemaOrdinal: Int = 0 +carbonColumns.foreach { carbonColumn => + if (carbonColumn.getColumnName.equalsIgnoreCase(oldCarbonColumn.getColName)) { +schemaOrdinal = carbonColumns.indexOf(carbonColumn) --- End diff -- Use filter function to achieve the required output ---
[GitHub] carbondata issue #3022: [CARBONDATA-3196] [CARBONDATA-3203]Fixed Compaction ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/3022 LGTM ---
[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3027#discussion_r244271907 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala --- @@ -262,13 +263,26 @@ private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand( carbonTable: CarbonTable, tableInfo: TableInfo, addColumnSchema: ColumnSchema, - schemaEvolutionEntry: SchemaEvolutionEntry): Unit = { + schemaEvolutionEntry: SchemaEvolutionEntry, + oldCarbonColumn: CarbonColumn): Unit = { val schemaConverter = new ThriftWrapperSchemaConverterImpl -val a = List(schemaConverter.fromExternalToWrapperColumnSchema(addColumnSchema)) +// get the carbon column in schema order +val carbonColumns = carbonTable.getCreateOrderColumn(carbonTable.getTableName).asScala + .filter(!_.isInvisible).collect{case carbonColumn => carbonColumn.getColumnSchema} --- End diff -- Move filter operation to collect ---
[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3027#discussion_r244271865 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala --- @@ -93,11 +93,17 @@ private[sql] case class CarbonAlterTableAddColumnCommand( schemaEvolutionEntry.setAdded(newCols.toList.asJava) val thriftTable = schemaConverter .fromWrapperToExternalTableInfo(wrapperTableInfo, dbName, tableName) + // carbon columns based on schema order + val carbonColumns = carbonTable.getCreateOrderColumn(carbonTable.getTableName).asScala +.collect { case carbonColumn => carbonColumn.getColumnSchema } +.filter(!_.isInvisible) --- End diff -- Move filter operation in collect operation by adding if clause in the case statement ---
[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3027#discussion_r244272092 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala --- @@ -262,13 +263,26 @@ private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand( carbonTable: CarbonTable, tableInfo: TableInfo, addColumnSchema: ColumnSchema, - schemaEvolutionEntry: SchemaEvolutionEntry): Unit = { + schemaEvolutionEntry: SchemaEvolutionEntry, + oldCarbonColumn: CarbonColumn): Unit = { val schemaConverter = new ThriftWrapperSchemaConverterImpl -val a = List(schemaConverter.fromExternalToWrapperColumnSchema(addColumnSchema)) +// get the carbon column in schema order +val carbonColumns = carbonTable.getCreateOrderColumn(carbonTable.getTableName).asScala + .filter(!_.isInvisible).collect{case carbonColumn => carbonColumn.getColumnSchema} +// get the schema ordinal of the column for which the datatype changed or column is renamed +val schemaOrdinal = carbonColumns.collect { --- End diff -- Instead of collect try and use foreach ---
[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3027#discussion_r244271732 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala --- @@ -93,11 +93,17 @@ private[sql] case class CarbonAlterTableAddColumnCommand( schemaEvolutionEntry.setAdded(newCols.toList.asJava) val thriftTable = schemaConverter .fromWrapperToExternalTableInfo(wrapperTableInfo, dbName, tableName) + // carbon columns based on schema order + val carbonColumns = carbonTable.getCreateOrderColumn(carbonTable.getTableName).asScala +.collect { case carbonColumn => carbonColumn.getColumnSchema } +.filter(!_.isInvisible) + // sort the new columns based on schema order + val sortedColsBasedActualSchemaOrder = newCols.sortBy(a => a.getSchemaOrdinal) val (tableIdentifier, schemaParts, cols) = AlterTableUtil.updateSchemaInfo( carbonTable, schemaConverter.fromWrapperToExternalSchemaEvolutionEntry(schemaEvolutionEntry), thriftTable, - Some(newCols))(sparkSession) + Some(carbonColumns ++ sortedColsBasedActualSchemaOrder))(sparkSession) --- End diff -- `AlterTableUtil.updateSchemaInfo` is not making use of columns passed so remove the method argument and use columns for changing the hive schema ---
[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3027#discussion_r244270219 --- Diff: integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionState.scala --- @@ -105,47 +106,37 @@ class CarbonHiveSessionCatalog( .asInstanceOf[HiveExternalCatalog].client } - def alterTableRename(oldTableIdentifier: TableIdentifier, - newTableIdentifier: TableIdentifier, - newTablePath: String): Unit = { -getClient().runSqlHive( - s"ALTER TABLE ${ oldTableIdentifier.database.get }.${ oldTableIdentifier.table } " + - s"RENAME TO ${ oldTableIdentifier.database.get }.${ newTableIdentifier.table }") -getClient().runSqlHive( - s"ALTER TABLE ${ oldTableIdentifier.database.get }.${ newTableIdentifier.table} " + - s"SET SERDEPROPERTIES" + - s"('tableName'='${ newTableIdentifier.table }', " + - s"'dbName'='${ oldTableIdentifier.database.get }', 'tablePath'='${ newTablePath }')") - } - - override def alterTable(tableIdentifier: TableIdentifier, - schemaParts: String, - cols: Option[Seq[org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema]]) - : Unit = { -getClient() - .runSqlHive(s"ALTER TABLE ${tableIdentifier.database.get}.${ tableIdentifier.table } " + - s"SET TBLPROPERTIES(${ schemaParts })") - } - override def alterAddColumns(tableIdentifier: TableIdentifier, schemaParts: String, - cols: Option[Seq[org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema]]) - : Unit = { + cols: Option[Seq[ColumnSchema]]): Unit = { alterTable(tableIdentifier, schemaParts, cols) +CarbonSessionUtil + .alterExternalCatalogForTableWithUpdatedSchema(tableIdentifier, +cols, +schemaParts, +sparkSession) } override def alterDropColumns(tableIdentifier: TableIdentifier, --- End diff -- Unify `alterDropColumns` and `alterAddColumns` into one method...keep interface methods same but move the common code to 1 method and call it from the interface methods ---
[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3027#discussion_r244270598 --- Diff: integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionUtil.scala --- @@ -93,4 +98,34 @@ object CarbonSessionUtil { ) } + /** + * This method alter the table for datatype change or column rename operation, and update the + * external catalog directly + * + * @param tableIdentifier tableIdentifier for table + * @param colsall the column of table, which are updated with datatype change of + *new column name + * @param schemaParts schemaParts + * @param sparkSessionsparkSession + */ + def alterExternalCatalogForTableWithUpdatedSchema(tableIdentifier: TableIdentifier, + cols: Option[Seq[ColumnSchema]], + schemaParts: String, + sparkSession: SparkSession): Unit = { +val carbonTable = CarbonEnv.getCarbonTable(tableIdentifier)(sparkSession) +val colArray: scala.collection.mutable.ArrayBuffer[StructField] = ArrayBuffer() +cols.get.foreach(column => + if (!column.isInvisible) { +colArray += StructField(column.getColumnName, + SparkTypeConverter +.convertCarbonToSparkDataType(column, + carbonTable)) + } +) +sparkSession.sessionState.catalog.externalCatalog + .alterTableDataSchema(tableIdentifier.database.get, --- End diff -- add a comment for the usage of API `alterTableDataSchema` to explain its purpose ---
[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3022#discussion_r244086878 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -699,4 +701,33 @@ public static boolean isRawDataRequired(CarbonDataLoadConfiguration configuratio return iterators; } + public static int[] calcDimensionLengths(int numberOfSortColumns, int[] complexCardinality) { +if (!(numberOfSortColumns > 0)) { --- End diff -- 1. Rewrite the condition `if (!(numberOfSortColumns > 0))` as `if (numberOfSortColumns == 0)` 2. The functionality of this method is not clear. Add a comment to explain the logic explanation and use of this method ---
[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3022#discussion_r244085418 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestCompactionComplexType.scala --- @@ -1068,4 +1068,40 @@ class TestCompactionComplexType extends QueryTest with BeforeAndAfterAll { sql("Drop table if exists adaptive") } + test("Test major compaction for struct of array type") { +sql("DROP TABLE IF EXISTS carbon") --- End diff -- 1. Include dropping of table in afterAll also 2. Give the test case description as below `Test major compaction with dictionary include for struct of array type` ---
[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3022#discussion_r244086110 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java --- @@ -356,9 +359,20 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa .getColumnSchemaList(carbonTable.getDimensionByTableName(tableName), carbonTable.getMeasureByTableName(tableName)); carbonFactDataHandlerModel.setWrapperColumnSchema(wrapperColumnSchema); -// get the cardinality for all all the columns including no dictionary columns -int[] formattedCardinality = CarbonUtil - .getFormattedCardinality(segmentProperties.getDimColumnsCardinality(), wrapperColumnSchema); +// get the cardinality for all all the columns including no +// dictionary columns and complex columns +int[] dimAndComplexColumnCardinality = +new int[segmentProperties.getDimColumnsCardinality().length + segmentProperties +.getComplexDimColumnCardinality().length]; +for (int i = 0; i < segmentProperties.getDimColumnsCardinality().length; i++) { + dimAndComplexColumnCardinality[i] = segmentProperties.getDimColumnsCardinality()[i]; --- End diff -- Check for a resturcture drop column case when few loads are done and then dictionary column is dropped and compaction is triggered. In that case segmentProperties will contain cardinality of dropped column also check finally what is the schema and cardinality written during compaction ---
[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3022#discussion_r244087077 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -699,4 +701,33 @@ public static boolean isRawDataRequired(CarbonDataLoadConfiguration configuratio return iterators; } + public static int[] calcDimensionLengths(int numberOfSortColumns, int[] complexCardinality) { +if (!(numberOfSortColumns > 0)) { + for (int i = 0; i < complexCardinality.length; i++) { +if (complexCardinality[i] != 0) { + complexCardinality[i] = Integer.MAX_VALUE; +} + } +} +List dimsLenList = new ArrayList(); +for (int eachDimLen : complexCardinality) { + if (eachDimLen != 0) dimsLenList.add(eachDimLen); +} +int[] dimLens = new int[dimsLenList.size()]; +for (int i = 0; i < dimsLenList.size(); i++) { + dimLens[i] = dimsLenList.get(i); +} +return dimLens; + } + + public static KeyGenerator[] createKeyGeneratorForComplexDimension(int numberOfSortColumns, --- End diff -- Add a method comment to explain the logic and method usage ---
[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3022#discussion_r244086909 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -699,4 +701,33 @@ public static boolean isRawDataRequired(CarbonDataLoadConfiguration configuratio return iterators; } + public static int[] calcDimensionLengths(int numberOfSortColumns, int[] complexCardinality) { +if (!(numberOfSortColumns > 0)) { + for (int i = 0; i < complexCardinality.length; i++) { +if (complexCardinality[i] != 0) { + complexCardinality[i] = Integer.MAX_VALUE; +} + } +} +List dimsLenList = new ArrayList(); +for (int eachDimLen : complexCardinality) { + if (eachDimLen != 0) dimsLenList.add(eachDimLen); +} +int[] dimLens = new int[dimsLenList.size()]; --- End diff -- Conversion from arrayList to array can be done directly ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2990 LGTM ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r243164028 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala --- @@ -0,0 +1,331 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.schema + +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.execution.command.{AlterTableDataTypeChangeModel, DataTypeInfo, + MetadataCommand} +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.util.AlterTableUtil + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.features.TableOperation +import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl +import org.apache.carbondata.core.metadata.datatype.DecimalType +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn +import org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, + AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, OperationListenerBus} +import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, TableInfo} +import org.apache.carbondata.spark.util.DataTypeConverterUtil + +abstract class CarbonAlterTableColumnRenameCommand(oldColumnName: String, newColumnName: String) + extends MetadataCommand { + + protected def validColumnsForRenaming(carbonColumns: mutable.Buffer[CarbonColumn], + oldCarbonColumn: CarbonColumn, + carbonTable: CarbonTable): Unit = { +// check whether new column name is already an existing column name +if (carbonColumns.exists(_.getColName.equalsIgnoreCase(newColumnName))) { + throw new MalformedCarbonCommandException(s"Column Rename Operation failed. New " + +s"column name $newColumnName already exists" + +s" in table ${ carbonTable.getTableName }") +} + +// if the column rename is for complex column, block the operation +if (oldCarbonColumn.isComplex) { + throw new MalformedCarbonCommandException(s"Column Rename Operation failed. Rename " + +s"column is unsupported for complex datatype " + +s"column ${ oldCarbonColumn.getColName }") +} + +// if column rename operation is on partition column, then fail the rename operation +if (null != carbonTable.getPartitionInfo) { + val partitionColumns = carbonTable.getPartitionInfo.getColumnSchemaList + partitionColumns.asScala.foreach { +col => + if (col.getColumnName.equalsIgnoreCase(oldColumnName)) { +throw new MalformedCarbonCommandException( + s"Column Rename Operation failed. Renaming " + + s"the partition column $newColumnName is not " + + s"allowed") + } + } +} + + } +} + +private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand( +alterTableColRenameAndDataTypeChangeModel: AlterTableDataTypeChangeModel) + extends CarbonAlterTableColumnRenameCommand(alterTableColRenameAndDataTypeChangeModel.columnName, +alterTableColRenameAndDataTypeChangeModel.newColumnName) { + + override def
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r243163786 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -1511,7 +1514,15 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } DataTypeInfo("decimal", precision, scale) case _ => -throw new MalformedCarbonCommandException("Data type provided is invalid.") +if (isColumnRename) { + if (dataType.equalsIgnoreCase("decimal")) { --- End diff -- instead of if else block use case matching here ---
[GitHub] carbondata issue #3002: [CARBONDATA-3182] Fixed SDV Testcase failures
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/3002 LGTM ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242860813 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala --- @@ -171,11 +171,12 @@ case class DropPartitionCallableModel(carbonLoadModel: CarbonLoadModel, case class DataTypeInfo(dataType: String, precision: Int = 0, scale: Int = 0) -case class AlterTableDataTypeChangeModel(dataTypeInfo: DataTypeInfo, +case class AlterTableColRenameAndDataTypeChangeModel(dataTypeInfo: DataTypeInfo, --- End diff -- Seggregate Rename and DataType change model ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242837854 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala --- @@ -641,7 +641,7 @@ class LuceneFineGrainDataMapSuite extends QueryTest with BeforeAndAfterAll { val ex3 = intercept[MalformedCarbonCommandException] { sql("alter table datamap_test7 change id id BIGINT") } -assert(ex3.getMessage.contains("alter table change datatype is not supported")) +assert(ex3.getMessage.contains("alter table change datatype or column rename is not supported")) --- End diff -- Perform the validation separately for column rename and datatype change and change the message as per change in the TableOperation enum ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242870689 --- Diff: integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala --- @@ -249,32 +249,99 @@ object AlterTableUtil { * @param timeStamp * @param sparkSession */ - def revertDataTypeChanges(dbName: String, tableName: String, timeStamp: Long) + def revertColumnRenameAndDataTypeChanges(dbName: String, tableName: String, timeStamp: Long) (sparkSession: SparkSession): Unit = { val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore val carbonTable = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession) val thriftTable: TableInfo = metastore.getThriftTableInfo(carbonTable) val evolutionEntryList = thriftTable.fact_table.schema_evolution.schema_evolution_history -val updatedTime = evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp -if (updatedTime == timeStamp) { - LOGGER.error(s"Reverting changes for $dbName.$tableName") - val removedColumns = evolutionEntryList.get(evolutionEntryList.size() - 1).removed - thriftTable.fact_table.table_columns.asScala.foreach { columnSchema => -removedColumns.asScala.foreach { removedColumn => - if (columnSchema.column_id.equalsIgnoreCase(removedColumn.column_id) && - !columnSchema.isInvisible) { -columnSchema.setData_type(removedColumn.data_type) -columnSchema.setPrecision(removedColumn.precision) -columnSchema.setScale(removedColumn.scale) - } +// here, there can be maximum of two entries for schemaEvolution, when my operation is +// both column rename and datatype change. So check if last two Evolution entry timestamp is +// same, then it is both column rename and datatype change, so revert two entries,else one entry +if (evolutionEntryList.size() > 1 && +(evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp == timeStamp) && +(evolutionEntryList.get(evolutionEntryList.size() - 2).time_stamp == timeStamp)) { + LOGGER.error(s"Reverting column rename and datatype changes for $dbName.$tableName") + revertColumnSchemaChanges(thriftTable, evolutionEntryList, true) +} else { + if (evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp == timeStamp) { +LOGGER.error(s"Reverting changes for $dbName.$tableName") +revertColumnSchemaChanges(thriftTable, evolutionEntryList, false) + } --- End diff -- same comment as above ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242863058 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala --- @@ -0,0 +1,329 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.schema + +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.execution.command.{AlterTableColRenameAndDataTypeChangeModel, DataTypeInfo, MetadataCommand} +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.util.AlterTableUtil + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.features.TableOperation +import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl +import org.apache.carbondata.core.metadata.datatype.DecimalType +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn +import org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, OperationListenerBus} +import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, TableInfo} +import org.apache.carbondata.spark.util.DataTypeConverterUtil + +private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand( +alterTableColRenameAndDataTypeChangeModel: AlterTableColRenameAndDataTypeChangeModel) + extends MetadataCommand { + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { +val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) +val tableName = alterTableColRenameAndDataTypeChangeModel.tableName +val dbName = alterTableColRenameAndDataTypeChangeModel.databaseName + .getOrElse(sparkSession.catalog.currentDatabase) +var isColumnRenameOnly = false +var isDataTypeChangeOnly = false +var isBothColRenameAndDataTypeChange = false +setAuditTable(dbName, tableName) +setAuditInfo(Map( + "column" -> alterTableColRenameAndDataTypeChangeModel.columnName, + "newColumn" -> alterTableColRenameAndDataTypeChangeModel.newColumnName, + "newType" -> alterTableColRenameAndDataTypeChangeModel.dataTypeInfo.dataType)) +val locksToBeAcquired = List(LockUsage.METADATA_LOCK, LockUsage.COMPACTION_LOCK) +var locks = List.empty[ICarbonLock] +// get the latest carbon table and check for column existence +var carbonTable: CarbonTable = null +var timeStamp = 0L +try { + locks = AlterTableUtil +.validateTableAndAcquireLock(dbName, tableName, locksToBeAcquired)(sparkSession) + val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore + carbonTable = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession) + if (!carbonTable.canAllow(carbonTable, TableOperation.ALTER_COL_RENAME_AND_CHANGE_DATATYPE, +alterTableColRenameAndDataTypeChangeModel.columnName)) { +throw new MalformedCarbonCommandException( + "alter table change datatype or column rename is not supported for index datamap") + } + val operationContext = new OperationContext + val alterTableColRenameAndDataTypeChangePreEvent = +AlterTableColRenameAndDataTypeChangePreEvent(sparkSession, carbonTable, + alterTableColRenameAndDataTypeChangeModel) + OperationListenerBus.getInstance() +.fire
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242861256 --- Diff: integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/SqlAstBuilderHelper.scala --- @@ -38,25 +37,27 @@ trait SqlAstBuilderHelper extends SparkSqlAstBuilder { override def visitChangeColumn(ctx: ChangeColumnContext): LogicalPlan = { val newColumn = visitColType(ctx.colType) +var isColumnRename = false --- End diff -- Modify it to val ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242870325 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala --- @@ -318,15 +322,27 @@ class CarbonFileMetastore extends CarbonMetaStore { */ def revertTableSchemaInAlterFailure(carbonTableIdentifier: CarbonTableIdentifier, thriftTableInfo: org.apache.carbondata.format.TableInfo, - absoluteTableIdentifier: AbsoluteTableIdentifier)(sparkSession: SparkSession): String = { + absoluteTableIdentifier: AbsoluteTableIdentifier, + timeStamp: Long)(sparkSession: SparkSession): String = { val schemaConverter = new ThriftWrapperSchemaConverterImpl val wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo( thriftTableInfo, carbonTableIdentifier.getDatabaseName, carbonTableIdentifier.getTableName, absoluteTableIdentifier.getTablePath) val evolutionEntries = thriftTableInfo.fact_table.schema_evolution.schema_evolution_history -evolutionEntries.remove(evolutionEntries.size() - 1) +// we may need to remove two evolution entries if the operation is both col rename and datatype +// change operation +if (evolutionEntries.size() > 1 && +(evolutionEntries.get(evolutionEntries.size() - 1).time_stamp == +evolutionEntries.get(evolutionEntries.size() - 2).time_stamp)) { + evolutionEntries.remove(evolutionEntries.size() - 1) + evolutionEntries.remove(evolutionEntries.size() - 2) +} else { + if (evolutionEntries.get(evolutionEntries.size() - 1).time_stamp == timeStamp) { +evolutionEntries.remove(evolutionEntries.size() - 1) + } --- End diff -- Better not toi harcode the entries. Iterate and compare the entries and remove till timestamps are equal ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242849005 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -1487,31 +1487,46 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { * @param values * @return */ - def parseDataType(dataType: String, values: Option[List[(Int, Int)]]): DataTypeInfo = { + def parseDataType( + dataType: String, + values: Option[List[(Int, Int)]], + isColumnRename: Boolean): DataTypeInfo = { +def validateAndGetDecimalDatatype: DataTypeInfo = { + var precision: Int = 0 + var scale: Int = 0 + if (values.isDefined) { +precision = values.get(0)._1 +scale = values.get(0)._2 + } else { +throw new MalformedCarbonCommandException("Decimal format provided is invalid") + } + // precision should be > 0 and <= 38 and scale should be >= 0 and <= 38 + if (precision < 1 || precision > 38) { +throw new MalformedCarbonCommandException("Invalid value for precision") + } else if (scale < 0 || scale > 38) { +throw new MalformedCarbonCommandException("Invalid value for scale") + } + DataTypeInfo("decimal", precision, scale) +} + dataType match { case "bigint" | "long" => if (values.isDefined) { throw new MalformedCarbonCommandException("Invalid data type") } DataTypeInfo(dataType) case "decimal" => -var precision: Int = 0 -var scale: Int = 0 -if (values.isDefined) { - precision = values.get(0)._1 - scale = values.get(0)._2 +validateAndGetDecimalDatatype --- End diff -- This change is not required. You can revert. In default case handle for rename scenario ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242870457 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonHiveMetaStore.scala --- @@ -199,11 +202,22 @@ class CarbonHiveMetaStore extends CarbonFileMetastore { */ override def revertTableSchemaInAlterFailure(carbonTableIdentifier: CarbonTableIdentifier, thriftTableInfo: format.TableInfo, - identifier: AbsoluteTableIdentifier) + identifier: AbsoluteTableIdentifier, + timeStamp: Long) (sparkSession: SparkSession): String = { val schemaConverter = new ThriftWrapperSchemaConverterImpl val evolutionEntries = thriftTableInfo.fact_table.schema_evolution.schema_evolution_history -evolutionEntries.remove(evolutionEntries.size() - 1) +// we may need to remove two evolution entries if the operation is both col rename and datatype +// change operation +if (evolutionEntries.size() > 1 && (evolutionEntries.get(evolutionEntries.size() - 1).time_stamp + == evolutionEntries.get(evolutionEntries.size() - 2).time_stamp)) { + evolutionEntries.remove(evolutionEntries.size() - 1) + evolutionEntries.remove(evolutionEntries.size() - 2) +} else { + if (evolutionEntries.get(evolutionEntries.size() - 1).time_stamp == timeStamp) { +evolutionEntries.remove(evolutionEntries.size() - 1) + } --- End diff -- Same comment as above ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242869441 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala --- @@ -0,0 +1,329 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.schema + +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.execution.command.{AlterTableColRenameAndDataTypeChangeModel, DataTypeInfo, MetadataCommand} +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.util.AlterTableUtil + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.features.TableOperation +import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl +import org.apache.carbondata.core.metadata.datatype.DecimalType +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn +import org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, OperationListenerBus} +import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, TableInfo} +import org.apache.carbondata.spark.util.DataTypeConverterUtil + +private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand( +alterTableColRenameAndDataTypeChangeModel: AlterTableColRenameAndDataTypeChangeModel) + extends MetadataCommand { + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { +val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) +val tableName = alterTableColRenameAndDataTypeChangeModel.tableName +val dbName = alterTableColRenameAndDataTypeChangeModel.databaseName + .getOrElse(sparkSession.catalog.currentDatabase) +var isColumnRenameOnly = false +var isDataTypeChangeOnly = false +var isBothColRenameAndDataTypeChange = false --- End diff -- Using only 1 flag can serve the required purpose of code. So make use of only 2 flags `isDataTypeChangeOnly ` and modify the code logic accordingly. You can use the rename flag from `alterTableColRenameAndDataTypeChangeModel` model ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242865160 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala --- @@ -0,0 +1,329 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.schema + +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.execution.command.{AlterTableColRenameAndDataTypeChangeModel, DataTypeInfo, MetadataCommand} +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.util.AlterTableUtil + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.features.TableOperation +import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl +import org.apache.carbondata.core.metadata.datatype.DecimalType +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn +import org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, OperationListenerBus} +import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, TableInfo} +import org.apache.carbondata.spark.util.DataTypeConverterUtil + +private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand( +alterTableColRenameAndDataTypeChangeModel: AlterTableColRenameAndDataTypeChangeModel) + extends MetadataCommand { + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { +val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) +val tableName = alterTableColRenameAndDataTypeChangeModel.tableName +val dbName = alterTableColRenameAndDataTypeChangeModel.databaseName + .getOrElse(sparkSession.catalog.currentDatabase) +var isColumnRenameOnly = false +var isDataTypeChangeOnly = false +var isBothColRenameAndDataTypeChange = false +setAuditTable(dbName, tableName) +setAuditInfo(Map( + "column" -> alterTableColRenameAndDataTypeChangeModel.columnName, + "newColumn" -> alterTableColRenameAndDataTypeChangeModel.newColumnName, + "newType" -> alterTableColRenameAndDataTypeChangeModel.dataTypeInfo.dataType)) +val locksToBeAcquired = List(LockUsage.METADATA_LOCK, LockUsage.COMPACTION_LOCK) +var locks = List.empty[ICarbonLock] +// get the latest carbon table and check for column existence +var carbonTable: CarbonTable = null +var timeStamp = 0L +try { + locks = AlterTableUtil +.validateTableAndAcquireLock(dbName, tableName, locksToBeAcquired)(sparkSession) + val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore + carbonTable = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession) + if (!carbonTable.canAllow(carbonTable, TableOperation.ALTER_COL_RENAME_AND_CHANGE_DATATYPE, +alterTableColRenameAndDataTypeChangeModel.columnName)) { +throw new MalformedCarbonCommandException( + "alter table change datatype or column rename is not supported for index datamap") + } + val operationContext = new OperationContext + val alterTableColRenameAndDataTypeChangePreEvent = +AlterTableColRenameAndDataTypeChangePreEvent(sparkSession, carbonTable, + alterTableColRenameAndDataTypeChangeModel) + OperationListenerBus.getInstance() +.fire
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242871800 --- Diff: integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala --- @@ -249,32 +249,99 @@ object AlterTableUtil { * @param timeStamp * @param sparkSession */ - def revertDataTypeChanges(dbName: String, tableName: String, timeStamp: Long) + def revertColumnRenameAndDataTypeChanges(dbName: String, tableName: String, timeStamp: Long) (sparkSession: SparkSession): Unit = { val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore val carbonTable = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession) val thriftTable: TableInfo = metastore.getThriftTableInfo(carbonTable) val evolutionEntryList = thriftTable.fact_table.schema_evolution.schema_evolution_history -val updatedTime = evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp -if (updatedTime == timeStamp) { - LOGGER.error(s"Reverting changes for $dbName.$tableName") - val removedColumns = evolutionEntryList.get(evolutionEntryList.size() - 1).removed - thriftTable.fact_table.table_columns.asScala.foreach { columnSchema => -removedColumns.asScala.foreach { removedColumn => - if (columnSchema.column_id.equalsIgnoreCase(removedColumn.column_id) && - !columnSchema.isInvisible) { -columnSchema.setData_type(removedColumn.data_type) -columnSchema.setPrecision(removedColumn.precision) -columnSchema.setScale(removedColumn.scale) - } +// here, there can be maximum of two entries for schemaEvolution, when my operation is +// both column rename and datatype change. So check if last two Evolution entry timestamp is +// same, then it is both column rename and datatype change, so revert two entries,else one entry +if (evolutionEntryList.size() > 1 && +(evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp == timeStamp) && +(evolutionEntryList.get(evolutionEntryList.size() - 2).time_stamp == timeStamp)) { + LOGGER.error(s"Reverting column rename and datatype changes for $dbName.$tableName") + revertColumnSchemaChanges(thriftTable, evolutionEntryList, true) +} else { + if (evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp == timeStamp) { +LOGGER.error(s"Reverting changes for $dbName.$tableName") +revertColumnSchemaChanges(thriftTable, evolutionEntryList, false) + } +} +metastore + .revertTableSchemaInAlterFailure(carbonTable.getCarbonTableIdentifier, +thriftTable, carbonTable.getAbsoluteTableIdentifier, timeStamp)(sparkSession) + } + + /** + * This method reverts the column schema in case of failure in alter datatype change or col rename + * @param thriftTable thrift table + * @param evolutionEntryList SchemaEvolutionEntry List + * @param isBothColRenameAndDataTypeChange true if operation done is noth rename and datatype chng + */ + private def revertColumnSchemaChanges(thriftTable: TableInfo, + evolutionEntryList: util.List[SchemaEvolutionEntry], + isBothColRenameAndDataTypeChange: Boolean): Unit = { +var removedColumns: mutable.Buffer[org.apache.carbondata.format.ColumnSchema] = null +if (isBothColRenameAndDataTypeChange) { + removedColumns = evolutionEntryList.get(evolutionEntryList.size() - 1).removed.asScala ++ + evolutionEntryList.get(evolutionEntryList.size() - 2).removed.asScala +} else { + removedColumns = evolutionEntryList.get(evolutionEntryList.size() - 1).removed.asScala +} +thriftTable.fact_table.table_columns.asScala.foreach { columnSchema => + removedColumns.foreach { removedColumn => +if (columnSchema.column_id.equalsIgnoreCase(removedColumn.column_id) && +!columnSchema.isInvisible) { + columnSchema.setColumn_name(removedColumn.column_name) + columnSchema.setData_type(removedColumn.data_type) + columnSchema.setPrecision(removedColumn.precision) + columnSchema.setScale(removedColumn.scale) } } - metastore - .revertTableSchemaInAlterFailure(carbonTable.getCarbonTableIdentifier, - thriftTable, carbonTable.getAbsoluteTableIdentifier)(sparkSession) } } + /** + * This method modifies the table properties if column rename happened + * + * @param tableProperties + */ + def modifyTableProper
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242871412 --- Diff: integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala --- @@ -249,32 +249,99 @@ object AlterTableUtil { * @param timeStamp * @param sparkSession */ - def revertDataTypeChanges(dbName: String, tableName: String, timeStamp: Long) + def revertColumnRenameAndDataTypeChanges(dbName: String, tableName: String, timeStamp: Long) (sparkSession: SparkSession): Unit = { val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore val carbonTable = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession) val thriftTable: TableInfo = metastore.getThriftTableInfo(carbonTable) val evolutionEntryList = thriftTable.fact_table.schema_evolution.schema_evolution_history -val updatedTime = evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp -if (updatedTime == timeStamp) { - LOGGER.error(s"Reverting changes for $dbName.$tableName") - val removedColumns = evolutionEntryList.get(evolutionEntryList.size() - 1).removed - thriftTable.fact_table.table_columns.asScala.foreach { columnSchema => -removedColumns.asScala.foreach { removedColumn => - if (columnSchema.column_id.equalsIgnoreCase(removedColumn.column_id) && - !columnSchema.isInvisible) { -columnSchema.setData_type(removedColumn.data_type) -columnSchema.setPrecision(removedColumn.precision) -columnSchema.setScale(removedColumn.scale) - } +// here, there can be maximum of two entries for schemaEvolution, when my operation is +// both column rename and datatype change. So check if last two Evolution entry timestamp is +// same, then it is both column rename and datatype change, so revert two entries,else one entry +if (evolutionEntryList.size() > 1 && +(evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp == timeStamp) && +(evolutionEntryList.get(evolutionEntryList.size() - 2).time_stamp == timeStamp)) { + LOGGER.error(s"Reverting column rename and datatype changes for $dbName.$tableName") + revertColumnSchemaChanges(thriftTable, evolutionEntryList, true) +} else { + if (evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp == timeStamp) { +LOGGER.error(s"Reverting changes for $dbName.$tableName") +revertColumnSchemaChanges(thriftTable, evolutionEntryList, false) + } +} +metastore + .revertTableSchemaInAlterFailure(carbonTable.getCarbonTableIdentifier, +thriftTable, carbonTable.getAbsoluteTableIdentifier, timeStamp)(sparkSession) + } + + /** + * This method reverts the column schema in case of failure in alter datatype change or col rename + * @param thriftTable thrift table + * @param evolutionEntryList SchemaEvolutionEntry List + * @param isBothColRenameAndDataTypeChange true if operation done is noth rename and datatype chng + */ + private def revertColumnSchemaChanges(thriftTable: TableInfo, + evolutionEntryList: util.List[SchemaEvolutionEntry], + isBothColRenameAndDataTypeChange: Boolean): Unit = { +var removedColumns: mutable.Buffer[org.apache.carbondata.format.ColumnSchema] = null +if (isBothColRenameAndDataTypeChange) { + removedColumns = evolutionEntryList.get(evolutionEntryList.size() - 1).removed.asScala ++ + evolutionEntryList.get(evolutionEntryList.size() - 2).removed.asScala +} else { + removedColumns = evolutionEntryList.get(evolutionEntryList.size() - 1).removed.asScala --- End diff -- I can see the same code in multiple classes. Try and refine the code and move this method at a common place ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242837109 --- Diff: core/src/main/java/org/apache/carbondata/core/features/TableOperation.java --- @@ -21,7 +21,7 @@ ALTER_RENAME, ALTER_DROP, ALTER_ADD_COLUMN, - ALTER_CHANGE_DATATYPE, + ALTER_COL_RENAME_AND_CHANGE_DATATYPE, --- End diff -- Keep column rename and datatype change as different operations and make this change whereever applicable ---
[GitHub] carbondata pull request #2897: [CARBONDATA-3080] Supporting local dictionary...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2897#discussion_r242423291 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -94,10 +93,9 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d } @Override public void fillRow(int rowId, CarbonColumnVector vector, int vectorRow) { -if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); -} +// always set dictionary otherwise +// empty dictionary will get set if same col is called again in projection. +vector.setDictionary(dictionary); --- End diff -- @BJangir 1. Please check and confirm if the same problem occurs with CarbonSession also 2. Modify the PR description and specify the details for bug fixed in this PR after completion of point 1 ---
[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2980 LGTM ---
[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2980 @manishnalla1994 ...Please rebase and remove the commit for PR 2979 as it is already merged ---
[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2980 LGTM...check for test case failures if any and fix ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240886853 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java --- @@ -631,7 +620,9 @@ public void setFactTimeStamp(long factTimeStamp) { } public String[] getDelimiters() { -return new String[] { complexDelimiterLevel1, complexDelimiterLevel2 }; +String[] delimiters = new String[complexDelimiters.size()]; +delimiters = complexDelimiters.toArray(delimiters); +return delimiters; --- End diff -- This method is not required. Conversion can be done whereever required ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240886665 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java --- @@ -65,8 +65,7 @@ private String csvHeader; private String[] csvHeaderColumns; private String csvDelimiter; - private String complexDelimiterLevel1; - private String complexDelimiterLevel2; + private ArrayList complexDelimiters = new ArrayList<>(); --- End diff -- We can do lazy initialization in the setter method. This will avoid extra memory being consumed for non complex type schemas ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240152242 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/CarbonParserFactory.java --- @@ -51,23 +54,37 @@ public static GenericParser createParser(CarbonColumn carbonColumn, String[] com * delimiters * @return GenericParser */ - private static GenericParser createParser(CarbonColumn carbonColumn, String[] complexDelimiters, + private static GenericParser createParser(CarbonColumn carbonColumn, + Queue complexDelimiters, String nullFormat, int depth) { +if (depth > 2) { + return null; --- End diff -- I think we should throw exception with proper error message if depth is more than 2 ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240126484 --- Diff: examples/spark2/src/main/resources/maptest2.csv --- @@ -0,0 +1,2 @@ +1\002Nalla\0012\002Singh\0011\002Gupta\0014\002Kumar +10\002Nallaa\00120\002Sissngh\001100\002Gusspta\00140\002Kumar --- End diff -- Move the above newly added files to spark-common-test ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240151744 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java --- @@ -119,6 +119,10 @@ "complex_delimiter_level_2", Maps.getOrDefault(options, "complex_delimiter_level_2", ":")); +optionsFinal.put( +"complex_delimiter_level_3", +Maps.getOrDefault(options, "complex_delimiter_level_3", "003")); + --- End diff -- Remove hardcoding ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240150644 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -188,11 +188,13 @@ case class CarbonLoadDataCommand( val carbonLoadModel = new CarbonLoadModel() val tableProperties = table.getTableInfo.getFactTable.getTableProperties val optionsFinal = LoadOption.fillOptionWithDefaultValue(options.asJava) +// These two delimiters are non configurable and hardcoded for map type +optionsFinal.put("complex_delimiter_level_3", "\003") +optionsFinal.put("complex_delimiter_level_4", "\004") --- End diff -- Remove hardcoded values. Create one `ComplexDelimeterEnum` and use the values from there ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240153125 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java --- @@ -34,8 +37,12 @@ private int numberOfColumns; public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configuration) { -String[] complexDelimiters = +String[] tempComplexDelimiters = (String[]) configuration.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS); +Queue complexDelimiters = new LinkedList<>(); +for (int i = 0; i < 4; i++) { --- End diff -- Avoid using hardcoded values like `4` in the loops while codinginstead iterate over tempComplexDelimiters length ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240151175 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java --- @@ -222,8 +222,8 @@ public static CarbonDataLoadConfiguration createConfiguration(CarbonLoadModel lo configuration.setSegmentId(loadModel.getSegmentId()); configuration.setTaskNo(loadModel.getTaskNo()); configuration.setDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS, -new String[] { loadModel.getComplexDelimiterLevel1(), -loadModel.getComplexDelimiterLevel2() }); +new String[] { loadModel.getComplexDelimiterLevel1(), loadModel.getComplexDelimiterLevel2(), +loadModel.getComplexDelimiterLevel3(), loadModel.getComplexDelimiterLevel4() }); --- End diff -- Instead of adding these many delimeters methods in loadmodel, create a list or array of complex delimeters in loadmodel and then use it here ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240126217 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java --- @@ -338,12 +338,15 @@ public static CarbonLoadModel getLoadModel(Configuration conf) throws IOExceptio SKIP_EMPTY_LINE, carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE))); -String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":"); +String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":" + "," + "003"); --- End diff -- Move all the default delimters to constants and use it everywhere ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240147303 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala --- @@ -0,0 +1,452 @@ +/* + +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +* +http://www.apache.org/licenses/LICENSE-2.0 +* +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ +package org.apache.carbondata.spark.testsuite.createTable.TestCreateDDLForComplexMapType + +import java.io.File +import java.util + +import org.apache.hadoop.conf.Configuration +import org.apache.spark.sql.{AnalysisException, Row} +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk + +class TestCreateDDLForComplexMapType extends QueryTest with BeforeAndAfterAll { + private val conf: Configuration = new Configuration(false) + + val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath + + val path = s"$rootPath/examples/spark2/src/main/resources/maptest2.csv" + + private def checkForLocalDictionary(dimensionRawColumnChunks: util + .List[DimensionRawColumnChunk]): Boolean = { +var isLocalDictionaryGenerated = false +import scala.collection.JavaConversions._ +for (dimensionRawColumnChunk <- dimensionRawColumnChunks) { + if (dimensionRawColumnChunk.getDataChunkV3 +.isSetLocal_dictionary) { +isLocalDictionaryGenerated = true + } --- End diff -- You can directly use scala filter operation to check for local dictionary ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240145684 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java --- @@ -338,12 +338,15 @@ public static CarbonLoadModel getLoadModel(Configuration conf) throws IOExceptio SKIP_EMPTY_LINE, carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE))); -String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":"); +String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":" + "," + "003"); String[] split = complexDelim.split(","); model.setComplexDelimiterLevel1(split[0]); if (split.length > 1) { model.setComplexDelimiterLevel2(split[1]); } +if (split.length > 2) { + model.setComplexDelimiterLevel3(split[2]); +} --- End diff -- Modify the above code as below `if (split.length > 2) { model.setComplexDelimiterLevel2(split[1]); model.setComplexDelimiterLevel3(split[2]); } else if (split.length > 1) { model.setComplexDelimiterLevel2(split[1]); }` ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240151368 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java --- @@ -631,7 +651,7 @@ public void setFactTimeStamp(long factTimeStamp) { } public String[] getDelimiters() { -return new String[] { complexDelimiterLevel1, complexDelimiterLevel2 }; +return new String[] { complexDelimiterLevel1, complexDelimiterLevel2, complexDelimiterLevel3 }; --- End diff -- Modify this class and related changes in other classes as per the above comment ---
[GitHub] carbondata issue #2968: [CARBONDATA-3141] Removed Carbon Table Detail Comman...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2968 LGTM ---
[GitHub] carbondata pull request #2940: [CARBONDATA-3116] Support set carbon.query.di...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2940#discussion_r238152763 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala --- @@ -463,6 +464,39 @@ class TestPreAggCreateCommand extends QueryTest with BeforeAndAfterAll { executorService.shutdown() } + test("support set carbon.query.directQueryOnDataMap.enabled=true") { +val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath +val testData = s"$rootPath/integration/spark-common-test/src/test/resources/sample.csv" +sql("drop table if exists mainTable") +sql( + s""" + | CREATE TABLE mainTable + | (id Int, + | name String, + | city String, + | age Int) + | STORED BY 'org.apache.carbondata.format' + """.stripMargin); + + +sql( + s""" + | LOAD DATA LOCAL INPATH '$testData' + | into table mainTable + """.stripMargin); --- End diff -- for scala code semi-colon `;` is not required ---
[GitHub] carbondata issue #2956: [CARBONDATA-3134] fixed null values when cachelevel ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2956 LGTM ---
[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2956#discussion_r236644573 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java --- @@ -332,13 +334,42 @@ public void clear() { } SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper other = (SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper) obj; - return tableIdentifier.equals(other.tableIdentifier) && columnsInTable - .equals(other.columnsInTable) && Arrays + return tableIdentifier.equals(other.tableIdentifier) && checkColumnSchemaEquality( + columnsInTable, other.columnsInTable) && Arrays .equals(columnCardinality, other.columnCardinality); } +private boolean checkColumnSchemaEquality(List obj1, List obj2) { + List clonedObj1 = new ArrayList<>(obj1); --- End diff -- You can add the first check for length in the first line of method...if length of 2 lists is not same then we can return false from here itself ---
[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2956#discussion_r236646004 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java --- @@ -332,13 +334,42 @@ public void clear() { } SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper other = (SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper) obj; - return tableIdentifier.equals(other.tableIdentifier) && columnsInTable - .equals(other.columnsInTable) && Arrays + return tableIdentifier.equals(other.tableIdentifier) && checkColumnSchemaEquality( + columnsInTable, other.columnsInTable) && Arrays .equals(columnCardinality, other.columnCardinality); } +private boolean checkColumnSchemaEquality(List obj1, List obj2) { + List clonedObj1 = new ArrayList<>(obj1); + List clonedObj2 = new ArrayList<>(obj2); + clonedObj1.addAll(obj1); + clonedObj2.addAll(obj2); + Collections.sort(clonedObj1, new Comparator() { +@Override public int compare(ColumnSchema o1, ColumnSchema o2) { + return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId()); +} + }); + Collections.sort(clonedObj2, new Comparator() { +@Override public int compare(ColumnSchema o1, ColumnSchema o2) { + return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId()); +} + }); + boolean exists = true; + for (int i = 0; i < obj1.size(); i++) { +if (!clonedObj1.get(i).equalsWithColumnId(clonedObj2.get(i))) { + exists = false; + break; +} + } + return exists; +} + @Override public int hashCode() { - return tableIdentifier.hashCode() + columnsInTable.hashCode() + Arrays + int hashCode = 0; + for (ColumnSchema columnSchema: columnsInTable) { +hashCode += columnSchema.hashCodeWithColumnId(); --- End diff -- rename variable name to `allColumnsHashCode` ---
[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2923 LGTM ---
[GitHub] carbondata issue #2872: [CARBONDATA-3113] Fixed Local Dictionary Query Perfo...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2872 LGTM ---
[GitHub] carbondata issue #2863: [CARBONDATA-3112] Optimise decompressing while filli...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2863 LGTM ---
[GitHub] carbondata issue #2906: [CARBONDATA-3088][Compaction] support prefetch for c...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2906 LGTM ---
[GitHub] carbondata issue #2928: [CARBONDATA-3106] Written_by_APPNAME not serialized ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2928 LGTM ---
[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2923#discussion_r234499385 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala --- @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and sql("drop datamap if exists preaggTable on table partitionTable") } + test("validate data in partition table after dropping and adding a column") { +sql("drop table if exists par") +sql("create table par(name string) partitioned by (age double) stored by " + + "'carbondata'") +sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + +s"('header'='false')") +sql("alter table par drop columns(name)") +sql("alter table par add columns(name string)") +sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + +s"('header'='false')") --- End diff -- keeping partition column at the end is carbondata behavior which may or may not be known to user. For a normal table whenever a column is dropped and added, the added column data should either be added as the last column in csv file or it should be mapped through fileheader which is the correct behavior. As you are using the same csv file in your test case without changing the order of data and providing header the above explained behavior might not hold true. Please revisit the changes and take opinion from other PMC's/Committers on this behavioral change ---
[GitHub] carbondata issue #2872: [WIP] Added reusable buffer code
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2872 @kumarvishal09 ...In general, instead of passing the `ReusableDataBuffer` instance everwhere can we make this class singleton and create a `byte[]` pool and use the concept of object pool. This will help in following ways: 1. It will avoid modification of too many method/code modifications as wherever required we can take the object from pool and make the code simpler. 2. Currentlt it is being used for filling hthe decompressed data to avoid array copy. In future it will be easier to make use of the reusable buffer object pool wherever required without making too many changes in the code. We can discuss more on this and decide. ---
[GitHub] carbondata pull request #2872: [WIP] Added reusable buffer code
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2872#discussion_r234176956 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/MeasureColumnChunkReader.java --- @@ -56,13 +57,14 @@ MeasureRawColumnChunk readRawMeasureChunk(FileReader fileReader, int columnIndex * @return * @throws IOException */ - ColumnPage decodeColumnPage(MeasureRawColumnChunk measureRawColumnChunk, - int pageNumber) throws IOException, MemoryException; + ColumnPage decodeColumnPage(MeasureRawColumnChunk measureRawColumnChunk, int pageNumber, + ReusableDataBuffer reusableDataBuffer) throws IOException, MemoryException; /** * Decode raw data and fill the vector */ - void decodeColumnPageAndFillVector(MeasureRawColumnChunk measureRawColumnChunk, - int pageNumber, ColumnVectorInfo vectorInfo) throws IOException, MemoryException; + void decodeColumnPageAndFillVector(MeasureRawColumnChunk measureRawColumnChunk, int pageNumber, + ColumnVectorInfo vectorInfo, ReusableDataBuffer reusableDataBuffer) + throws IOException, MemoryException; --- End diff -- Can we have separate interface methods with and wothout `ReusableDataBuffer` for both the above methods. Many places null is being passed which can be avoided using separate interface methods and then the common code can be refactored for classes implementing these methods ---
[GitHub] carbondata pull request #2872: [WIP] Added reusable buffer code
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2872#discussion_r234201988 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/ZstdCompressor.java --- @@ -74,4 +74,16 @@ public long maxCompressedLength(long inputSize) { public boolean supportUnsafe() { return false; } + + @Override public int unCompressedLength(byte[] data, int offset, int length) { +throw new RuntimeException("Unsupported operation Exception"); + } + + @Override public int rawUncompress(byte[] data, int offset, int length, byte[] output) { +throw new RuntimeException("Unsupported operation Exception"); + } + + @Override public boolean supportReusableBuffer() { +return false; --- End diff -- Move the default implementation to Abstract class `AbstractCompressor` ---
[GitHub] carbondata issue #2918: [CARBONDATA-3098] Fix for negative exponents value g...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2918 LGTM ---
[GitHub] carbondata issue #2901: [CARBONDATA-3081] Fixed NPE for boolean type column ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2901 LGTM ---
[GitHub] carbondata issue #2895: [HOTFIX] Fix NPE in spark, when same vector reads fi...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2895 LGTM ---
[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2898 @ravipesala in this method we are passing the `List` . The list contains the segments file only from the new table. So only the new index files will be loaded and queried and the same is covered in test case added as part of this PR. The stale dataMap entries will never be referred. For clearing them we need to configure the LRU cache size which will automatically take care of eviction when threshold is reached. If LRU cache size is not configured then there is no mechanism to clear the stale datamaps ---
[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2898 @ravipesala ...which method exactly you are referring to?...In all `getDataMap` methods latest `carbonTable` object is passed.and used for fetching the dataMaps..there is only one `getAllDataMaps` method which does not have any parameter but that is being only in the test cases... ---
[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2898 retest this please ---
[GitHub] carbondata pull request #2902: [CARBONDATA-3083] Fixed data mismatch issue a...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2902#discussion_r231012403 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/UpdateCarbonTableTestCase.scala --- @@ -772,12 +772,33 @@ class UpdateCarbonTableTestCase extends QueryTest with BeforeAndAfterAll { sql("""drop table if exists iud.dest33_part""") } + test("check data after update with row.filter pushdown as false") { +CarbonProperties.getInstance().addProperty(CarbonCommonConstants + .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "false") +sql("""drop table if exists iud.dest33_flat""") +sql( + """create table iud.dest33_part (c1 int,c2 string, c3 short) STORED BY 'carbondata'""" +.stripMargin) +sql( + s"""LOAD DATA LOCAL INPATH '$resourcesPath/IUD/negativevalue.csv' INTO table iud + |.dest33_part options('header'='false')""".stripMargin) +sql( + """update iud.dest33_part d set (c1) = (5) where d.c1 = 0""".stripMargin).show() +checkAnswer(sql("select c3 from iud.dest33_part"), Seq(Row(-300), Row(0), Row(-200), Row(700) + , Row(100), Row(-100), Row(null))) +sql("""drop table if exists iud.dest33_part""") +CarbonProperties.getInstance().addProperty(CarbonCommonConstants + .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "true") + } + override def afterAll { sql("use default") sql("drop database if exists iud cascade") CarbonProperties.getInstance() .addProperty(CarbonCommonConstants.isHorizontalCompactionEnabled , "true") CarbonProperties.getInstance() .addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER , "true") +CarbonProperties.getInstance().addProperty(CarbonCommonConstants + .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "false") --- End diff -- instead of hard coding `"false"` use default value from constants ---
[GitHub] carbondata pull request #2902: [CARBONDATA-3083] Fixed data mismatch issue a...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2902#discussion_r231016130 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java --- @@ -257,7 +265,13 @@ private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, } else if (pageDataType == DataTypes.SHORT) { short[] shortData = columnPage.getShortPage(); if (vectorDataType == DataTypes.SHORT) { - vector.putShorts(0, pageSize, shortData, 0); + if (isUnderlyingVectorPresent) { +for (int i = 0; i < pageSize; i++) { + vector.putShort(i, shortData[i]); +} + } else { +vector.putShorts(0, pageSize, shortData, 0); --- End diff -- I think using `putShorts/putFloats` is common and unavoidable. In future also any new encoding class can make use of these method and then again the same problem can occur. Is it feasible to modify the vector classes implementation methods itself just like an example below `public void putShorts(int rowId, int count, short[] src, int srcIndex) { for (int i = srcIndex; i < count; i++) { putShort(rowId++, src[i]); } }` This way it will be better ---
[GitHub] carbondata pull request #2902: [CARBONDATA-3083] Fixed data mismatch issue a...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2902#discussion_r231014429 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/UpdateCarbonTableTestCase.scala --- @@ -772,12 +772,33 @@ class UpdateCarbonTableTestCase extends QueryTest with BeforeAndAfterAll { sql("""drop table if exists iud.dest33_part""") } + test("check data after update with row.filter pushdown as false") { +CarbonProperties.getInstance().addProperty(CarbonCommonConstants + .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "false") +sql("""drop table if exists iud.dest33_flat""") +sql( + """create table iud.dest33_part (c1 int,c2 string, c3 short) STORED BY 'carbondata'""" +.stripMargin) +sql( + s"""LOAD DATA LOCAL INPATH '$resourcesPath/IUD/negativevalue.csv' INTO table iud + |.dest33_part options('header'='false')""".stripMargin) +sql( + """update iud.dest33_part d set (c1) = (5) where d.c1 = 0""".stripMargin).show() +checkAnswer(sql("select c3 from iud.dest33_part"), Seq(Row(-300), Row(0), Row(-200), Row(700) + , Row(100), Row(-100), Row(null))) +sql("""drop table if exists iud.dest33_part""") +CarbonProperties.getInstance().addProperty(CarbonCommonConstants + .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "true") --- End diff -- After test case completion we should set the default value for `CARBON_PUSH_ROW_FILTERS_FOR_VECTOR`?...default property is false so I think at the start of test case no need to modify the property value ---
[GitHub] carbondata issue #2901: [CARBONDATA-3081] Fixed NPE for boolean type column ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2901 LGTM...please check the CI failure ---
[GitHub] carbondata pull request #2902: [CARBONDATA-3083] Fixed data mismatch issue a...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2902#discussion_r231009938 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1734,7 +1734,7 @@ private CarbonCommonConstants() { public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR = "carbon.push.rowfilters.for.vector"; - public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false"; + public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "true"; --- End diff -- Any specific reason for changing the default value? ---
[GitHub] carbondata issue #2895: [HOTFIX] Fix NPE in spark, when same vector reads fi...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2895 LGTM..can be merged once build is passed ---
[GitHub] carbondata pull request #2895: [HOTFIX] Fix NPE in spark, when same vector r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2895#discussion_r230999455 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d int columnValueSize = dimensionDataChunkStore.getColumnValueSize(); int rowsNum = data.length / columnValueSize; CarbonColumnVector vector = vectorInfo.vector; -if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); -} +vector.setDictionary(dictionary); --- End diff -- Both the method are called in carbon flow for vector filling. One is direct fill case and the other one is old vector fill flow. Please cross check once ---
[GitHub] carbondata issue #2901: [CARBONDATA-3081] Fixed NPE for boolean type column ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2901 Add PR description ---
[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2898 retest this please ---
[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2898 @xuchuanyin ...yes this scenario will work fine. In case of dropping normal table it will go through CarbonSession flow and drop table command is already taking care of clearing the datamaps. In case of fileFormat table drop, if the clear dataMap API is not integrated by customer in that case the changes done in this PR will take care of referring only to latest carbon table ---
[GitHub] carbondata pull request #2901: [CARBONDATA-3081] Fixed NPE for boolean type ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2901#discussion_r230818772 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java --- @@ -171,13 +171,20 @@ public Object getCurrentValue() throws IOException, InterruptedException { rowCount += 1; Object[] row = new Object[carbonColumnarBatch.columnVectors.length]; for (int i = 0; i < carbonColumnarBatch.columnVectors.length; i ++) { + Object data = carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1); if (carbonColumnarBatch.columnVectors[i].getType() == DataTypes.STRING || carbonColumnarBatch.columnVectors[i].getType() == DataTypes.VARCHAR) { -byte[] data = (byte[]) carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1); -row[i] = ByteUtil.toString(data, 0, data.length); +if (data == null) { + row[i] = null; +} else { + row[i] = ByteUtil.toString((byte[]) data, 0, (((byte[]) data).length)); +} } else if (carbonColumnarBatch.columnVectors[i].getType() == DataTypes.BOOLEAN) { -byte data = (byte) carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1); -row[i] = ByteUtil.toBoolean(data); +if (data == null) { + row[i] = null; +} else { + row[i] = ByteUtil.toBoolean((byte) data); +} --- End diff -- For other dataTypes is the same handling of null required?...If required then you can move the if check for `data == null` before first if check and set the row to null if data is null and continue ---
[GitHub] carbondata pull request #2901: [CARBONDATA-3081] Fixed NPE for boolean type ...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2901#discussion_r230816465 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java --- @@ -1844,4 +1844,53 @@ public void testVectorReader() { } } + @Test + public void testReadingNullValues() { +String path = "./testWriteFiles"; +try { + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[2]; + fields[0] = new Field("stringField", DataTypes.STRING); + fields[1] = new Field("shortField", DataTypes.BOOLEAN); --- End diff -- Rename `shortField` to `booleanField` ---
[GitHub] carbondata issue #2895: [HOTFIX] Fix NPE in spark, when same vector reads fi...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2895 LGTM apart from a minor comment ---
[GitHub] carbondata pull request #2895: [HOTFIX] Fix NPE in spark, when same vector r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2895#discussion_r230814030 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d int columnValueSize = dimensionDataChunkStore.getColumnValueSize(); int rowsNum = data.length / columnValueSize; CarbonColumnVector vector = vectorInfo.vector; -if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); -} +vector.setDictionary(dictionary); --- End diff -- Is the same handling required in `fillRow` method also in the same class?..If required then `isDictionaryUsed` and `setDictionaryUsed` API's will not be required and those can also be removed from the interface ---
[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2898 retest this please ---
[GitHub] carbondata issue #2897: [CARBONDATA-3080] Supporting local dictionary enable...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2897 @BJangir ...please go through the discussion and then decide whether for SDK be default we should enable local dictionary or not. We can further discuss on it based on your view http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Feature-Proposal-Proposal-for-offline-and-DDL-local-dictionary-support-td67620.html ---
[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2898 @xuchuanyin ...your point is correct. To explain this in detail 1. We have already a way to clear the cached DataMaps through API call `DataMapStoreManager.getInstance().clearDataMaps(AbsoluteTableIdentifier identifier)`. This API call ensures that for a given table all the dataMaps are cleared. 2. For FileFormat case if the above API is not integrated by the customer there is a possibility that drop table call will not come to carbondata layer and there can be few stale objects which can cause query failure. The PR is raised to handle the 2nd case. The other stale DataMaps are being already taken care by the LRU cache which will clear the stale entries one LRU cache threshold is reached. Let me know if you still have doubts ---
[GitHub] carbondata pull request #2898: [WIP] Fixed query failure in fileformat due s...
GitHub user manishgupta88 opened a pull request: https://github.com/apache/carbondata/pull/2898 [WIP] Fixed query failure in fileformat due stale cache issue **Problem** While using FileFormat API, if a table created, dropped and then recreated with the same name the query fails because of schema mismatch issue **Analysis** In case of carbondata used through FileFormat API, once a table is dropped and recreated with the same name again then because the dataMap contains the stale carbon table schema mismatch exception is thrown **Solution** To avoid such scenarios it is always better to update the carbon table object retrieved - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? No - [ ] Testing done Added UT to verify the scenario - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishgupta88/carbondata stale_carbon_table Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2898.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2898 commit 2b6789ee5464f90f43ecac3654e58424257eaa29 Author: m00258959 Date: 2018-11-05T10:15:46Z Fixed select query failure due to stale carbonTable in dataMapFactory class ---
[GitHub] carbondata issue #2869: [CARBONDATA-3057] Implement VectorizedReader for SDK...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2869 LGTM ---