[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...

2019-01-09 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/3053
  
@kumarvishal09 ...I agree with you that it is a functional issue and we 
need to merge it. My point was that before merging we can run one load 
performance test to see whether there is any performance degradation, and if 
there is, we can update the benchmark results


---


[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...

2019-01-09 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/3053
  
@akashrn5 ...I agree with @xuchuanyin that before merging it is 
better to get the PR tested for performance. We can observe two things during 
the benchmark test, the performance and the compression ratio of rawcompress 
vs compressDouble, and then take the final decision


---


[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

2019-01-06 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/3047
  
LGTM


---


[GitHub] carbondata issue #3044: [CARBONDATA-3149]Documentation for alter table colum...

2019-01-04 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/3044
  
LGTM


---


[GitHub] carbondata pull request #3044: [CARBONDATA-3149]Documentation for alter tabl...

2019-01-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3044#discussion_r245224509
  
--- Diff: docs/ddl-of-carbondata.md ---
@@ -681,24 +682,24 @@ Users can specify which columns to include and 
exclude for local dictionary gene
 
  **NOTE:** Drop Complex child column is not supported.
 
-   - # CHANGE DATA TYPE
+   - # CHANGE COLUMN NAME/TYPE

- This command is used to change the data type from INT to BIGINT or 
decimal precision from lower to higher.
+ This command is used to change the column's name and the data type 
from INT to BIGINT or decimal precision from lower to higher and rename column.
--- End diff --

Write this as below
`This command is used to change column name and the data type from INT to 
BIGINT or decimal precision from lower to higher.`


---


[GitHub] carbondata pull request #3044: [CARBONDATA-3149]Documentation for alter tabl...

2019-01-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3044#discussion_r245224386
  
--- Diff: docs/ddl-of-carbondata.md ---
@@ -47,7 +47,8 @@ CarbonData DDL statements are documented here,which 
includes:
 * [RENAME TABLE](#rename-table)
 * [ADD COLUMNS](#add-columns)
 * [DROP COLUMNS](#drop-columns)
-* [CHANGE DATA TYPE](#change-data-type)
+* [RENAME COLUMN](#change-column-name-/-type)
+* [CHANGE DATA TYPE](#change-column-name-/-type)
--- End diff --

Check the anchor linking. With this text the links will not resolve.
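
For reference (hedged, worth verifying in the rendered page): GitHub builds 
heading anchors by lowercasing the text, dropping punctuation such as `/`, and 
replacing spaces with hyphens, so both entries would need to point at 
`#change-column-nametype` to resolve.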


---


[GitHub] carbondata pull request #3044: [CARBONDATA-3149]Documentation for alter tabl...

2019-01-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3044#discussion_r245224565
  
--- Diff: docs/ddl-of-carbondata.md ---
@@ -681,24 +682,24 @@ Users can specify which columns to include and 
exclude for local dictionary gene
 
  **NOTE:** Drop Complex child column is not supported.
 
-   - # CHANGE DATA TYPE
+   - # CHANGE COLUMN NAME/TYPE

- This command is used to change the data type from INT to BIGINT or 
decimal precision from lower to higher.
+ This command is used to change the column's name and the data type 
from INT to BIGINT or decimal precision from lower to higher and rename column.
  Change of decimal data type from lower precision to higher precision 
will only be supported for cases where there is no data loss.
 
  ```
- ALTER TABLE [db_name.]table_name CHANGE col_name col_name 
changed_column_type
+ ALTER TABLE [db_name.]table_name CHANGE old_col_name new_col_name 
column_data_type
--- End diff --

Change this as below
`ALTER TABLE [db_name.]table_name CHANGE col_old_name col_new_name 
column_type`


---


[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

2019-01-03 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/3047
  
LGTM...can be merged once build passes


---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-02 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3047#discussion_r244920921
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
 ---
@@ -46,9 +47,9 @@ object CarbonStore {
 
   def showSegments(
   limit: Option[String],
-  tablePath: String,
+  carbonTable: CarbonTable,
--- End diff --

Move `carbonTable` to be the first argument of the method


---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-02 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3047#discussion_r244922117
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
 ---
@@ -101,14 +102,23 @@ object CarbonStore {
   val (dataSize, indexSize) = if (load.getFileFormat == 
FileFormat.ROW_V1) {
 // for streaming segment, we should get the actual size from 
the index file
 // since it is continuously inserting data
-val segmentDir = CarbonTablePath.getSegmentPath(tablePath, 
load.getLoadName)
+val segmentDir = CarbonTablePath
+  .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
 val indexPath = 
CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
 val indices = StreamSegment.readIndexFile(indexPath, 
FileFactory.getFileType(indexPath))
 (indices.asScala.map(_.getFile_size).sum, 
FileFactory.getCarbonFile(indexPath).getSize)
   } else {
 // for batch segment, we can get the data size from table 
status file directly
-(if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-  if (load.getIndexSize == null) 0L else 
load.getIndexSize.toLong)
+if (null == load.getDataSize || null == load.getIndexSize) {
+  // If either of datasize or indexsize comes to be null the 
we calculate the correct
+  // size and assign
+  val dataIndexSize = 
CarbonUtil.calculateDataIndexSize(carbonTable, false)
--- End diff --

The boolean flag in the method call controls whether the data and index sizes 
are updated in the table status file. Pass the flag as true so that it 
computes the sizes and updates the table status file. This will avoid 
recalculating them on every Show Segments call
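
A minimal sketch of the suggested change, with the two-argument call taken 
from the diff above (nothing else changed):

```
// pass true so that the computed sizes are also written back to the
// table status file instead of being recomputed on every call
val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
```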


---


[GitHub] carbondata pull request #3044: [CARBONDATA-3149]Documentation for alter tabl...

2019-01-02 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3044#discussion_r244919178
  
--- Diff: docs/ddl-of-carbondata.md ---
@@ -681,24 +682,28 @@ Users can specify which columns to include and 
exclude for local dictionary gene
 
  **NOTE:** Drop Complex child column is not supported.
 
-   - # CHANGE DATA TYPE
+   - # RENAME COLUMN AND CHANGE DATATYPE
--- End diff --

Change this to `Change Column Name/Type`. Please go through the below link 
and try to align the documentation in that way if possible

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnName/Type/Position/Comment


---


[GitHub] carbondata pull request #3039: [CARBONDATA-3217] Optimize implicit filter ex...

2019-01-01 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3039#discussion_r244668804
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
 ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.scan.expression.conditional;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.LiteralExpression;
+import 
org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException;
+import 
org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+
+import org.apache.commons.lang.StringUtils;
+
+/**
+ * Custom class to handle filter values for Implicit filter
+ */
+public class ImplicitExpression extends Expression {
+
+  /**
+   * map that contains the mapping of block id to the valid blocklets in 
that block which contain
+   * the data as per the applied filter
+   */
+  private Map<String, Set<Integer>> blockIdToBlockletIdMapping;
+
+  public ImplicitExpression(List<Expression> implicitFilterList) {
+// initialize map with half the size of filter list as one block id 
can contain
+// multiple blocklets
+blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 
2);
+for (Expression value : implicitFilterList) {
+  String blockletPath = ((LiteralExpression) 
value).getLiteralExpValue().toString();
+  addBlockEntry(blockletPath);
+}
+  }
+
+  public ImplicitExpression(Map<String, Set<Integer>> blockIdToBlockletIdMapping) {
+this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping;
+  }
+
+  private void addBlockEntry(String blockletPath) {
+String blockId =
+blockletPath.substring(0, 
blockletPath.lastIndexOf(CarbonCommonConstants.FILE_SEPARATOR));
+Set<Integer> blockletIds = blockIdToBlockletIdMapping.get(blockId);
+if (null == blockletIds) {
+  blockletIds = new HashSet<>();
+  blockIdToBlockletIdMapping.put(blockId, blockletIds);
+}
+
blockletIds.add(Integer.parseInt(blockletPath.substring(blockId.length() + 1)));
--- End diff --

It is not required to catch the NumberFormatException. The blocklet id is 
always expected to be an integer, and in the complete flow it is used as an 
integer. So if there is any exception in the conversion, the problem is not 
in this code but in some other part of the code; handling the exception here 
would only suppress the actual cause
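
As a Scala illustration of the fail-fast intent (the diff itself is Java, and 
`blockletPath`/`blockId` are the diff's own locals):

```
// let an unexpected non-numeric blocklet id fail loudly right here, so
// the real source of the bad id is visible instead of being suppressed
val blockletId = blockletPath.substring(blockId.length + 1).toInt // no try/catch
```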


---


[GitHub] carbondata pull request #3039: [CARBONDATA-3217] Optimize implicit filter ex...

2019-01-01 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3039#discussion_r244668632
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
 ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.scan.expression.conditional;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.LiteralExpression;
+import 
org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException;
+import 
org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+
+import org.apache.commons.lang.StringUtils;
+
+/**
+ * Custom class to handle filter values for Implicit filter
+ */
+public class ImplicitExpression extends Expression {
+
+  /**
+   * map that contains the mapping of block id to the valid blocklets in 
that block which contain
+   * the data as per the applied filter
+   */
+  private Map<String, Set<Integer>> blockIdToBlockletIdMapping;
+
+  public ImplicitExpression(List<Expression> implicitFilterList) {
+// initialize map with half the size of filter list as one block id 
can contain
+// multiple blocklets
+blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 
2);
+for (Expression value : implicitFilterList) {
+  String blockletPath = ((LiteralExpression) 
value).getLiteralExpValue().toString();
+  addBlockEntry(blockletPath);
+}
+  }
+
+  public ImplicitExpression(Map<String, Set<Integer>> blockIdToBlockletIdMapping) {
+this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping;
+  }
+
+  private void addBlockEntry(String blockletPath) {
+String blockId =
+blockletPath.substring(0, 
blockletPath.lastIndexOf(CarbonCommonConstants.FILE_SEPARATOR));
+Set<Integer> blockletIds = blockIdToBlockletIdMapping.get(blockId);
+if (null == blockletIds) {
+  blockletIds = new HashSet<>();
+  blockIdToBlockletIdMapping.put(blockId, blockletIds);
+}
+
blockletIds.add(Integer.parseInt(blockletPath.substring(blockId.length() + 1)));
+  }
+
+  @Override public ExpressionResult evaluate(RowIntf value)
+  throws FilterUnsupportedException, FilterIllegalMemberException {
+throw new UnsupportedOperationException("Operation not supported for 
Implicit expression");
+  }
+
+  public Map<String, Set<Integer>> getBlockIdToBlockletIdMapping() {
+return blockIdToBlockletIdMapping;
+  }
+
+  @Override public ExpressionType getFilterExpressionType() {
+return ExpressionType.IMPLICIT;
+  }
+
+  @Override public void findAndSetChild(Expression oldExpr, Expression 
newExpr) {
--- End diff --

This method is not required to be implemented in this class, so its 
implementation is left empty


---


[GitHub] carbondata pull request #3039: [WIP] Optimize implicit filter expression per...

2018-12-31 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/3039

[WIP] Optimize implicit filter expression performance by removing extra 
serialization

Fixed performance issue for the implicit filter column:
1. Removed serialization of all the implicit filter values in each task; 
instead, serialized only the values for the blocks going to a particular task
2. Removed the two-time deserialization of the implicit filter values in the 
executor for each task; one time is sufficient

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
implicit_column_filter_serialization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3039.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3039


commit 73390e08aad788698dd9f12e3513a4a4814afd73
Author: manishgupta88 
Date:   2018-12-27T09:48:07Z

Fixed performance issue for the implicit filter column:
1. Removed serialization of all the implicit filter values in each task; 
instead, serialized only the values for the blocks going to a particular task
2. Removed the two-time deserialization of the implicit filter values in the 
executor for each task; one time is sufficient




---


[GitHub] carbondata issue #3027: [CARBONDATA-3202]update the schema to session catalo...

2018-12-30 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/3027
  
LGTM


---


[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...

2018-12-28 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3027#discussion_r244306298
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala
 ---
@@ -262,13 +263,28 @@ private[sql] case class 
CarbonAlterTableColRenameDataTypeChangeCommand(
   carbonTable: CarbonTable,
   tableInfo: TableInfo,
   addColumnSchema: ColumnSchema,
-  schemaEvolutionEntry: SchemaEvolutionEntry): Unit = {
+  schemaEvolutionEntry: SchemaEvolutionEntry,
+  oldCarbonColumn: CarbonColumn): Unit = {
 val schemaConverter = new ThriftWrapperSchemaConverterImpl
-val a = 
List(schemaConverter.fromExternalToWrapperColumnSchema(addColumnSchema))
-val (tableIdentifier, schemaParts, cols) = 
AlterTableUtil.updateSchemaInfo(
-  carbonTable, schemaEvolutionEntry, tableInfo, Some(a))(sparkSession)
+// get the carbon column in schema order
+val carbonColumns = 
carbonTable.getCreateOrderColumn(carbonTable.getTableName).asScala
+  .collect { case carbonColumn if !carbonColumn.isInvisible => 
carbonColumn.getColumnSchema }
+// get the schema ordinal of the column for which the datatype changed 
or column is renamed
+var schemaOrdinal: Int = 0
+carbonColumns.foreach { carbonColumn =>
+  if 
(carbonColumn.getColumnName.equalsIgnoreCase(oldCarbonColumn.getColName)) {
+schemaOrdinal = carbonColumns.indexOf(carbonColumn)
--- End diff --

Use the filter function (or an equivalent single-pass lookup) to achieve the 
required output; see the sketch below
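
A self-contained Scala sketch of the idea with placeholder types (not the 
real CarbonData classes); `indexWhere` is a single-pass equivalent of 
filtering for the matching column:

```
case class Col(name: String)

val carbonColumns = Seq(Col("id"), Col("name"), Col("age"))
val oldColName = "name"

// index of the first column whose name matches, or -1 if none does;
// this replaces the mutable var plus foreach/indexOf combination
val schemaOrdinal = carbonColumns.indexWhere(_.name.equalsIgnoreCase(oldColName))
// schemaOrdinal == 1
```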


---


[GitHub] carbondata issue #3022: [CARBONDATA-3196] [CARBONDATA-3203]Fixed Compaction ...

2018-12-28 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/3022
  
LGTM


---


[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...

2018-12-27 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3027#discussion_r244271907
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala
 ---
@@ -262,13 +263,26 @@ private[sql] case class 
CarbonAlterTableColRenameDataTypeChangeCommand(
   carbonTable: CarbonTable,
   tableInfo: TableInfo,
   addColumnSchema: ColumnSchema,
-  schemaEvolutionEntry: SchemaEvolutionEntry): Unit = {
+  schemaEvolutionEntry: SchemaEvolutionEntry,
+  oldCarbonColumn: CarbonColumn): Unit = {
 val schemaConverter = new ThriftWrapperSchemaConverterImpl
-val a = 
List(schemaConverter.fromExternalToWrapperColumnSchema(addColumnSchema))
+// get the carbon column in schema order
+val carbonColumns = 
carbonTable.getCreateOrderColumn(carbonTable.getTableName).asScala
+  .filter(!_.isInvisible).collect{case carbonColumn => 
carbonColumn.getColumnSchema}
--- End diff --

Move the filter operation into the collect


---


[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...

2018-12-27 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3027#discussion_r244271865
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala
 ---
@@ -93,11 +93,17 @@ private[sql] case class 
CarbonAlterTableAddColumnCommand(
   schemaEvolutionEntry.setAdded(newCols.toList.asJava)
   val thriftTable = schemaConverter
 .fromWrapperToExternalTableInfo(wrapperTableInfo, dbName, 
tableName)
+  // carbon columns based on schema order
+  val carbonColumns = 
carbonTable.getCreateOrderColumn(carbonTable.getTableName).asScala
+.collect { case carbonColumn => carbonColumn.getColumnSchema }
+.filter(!_.isInvisible)
--- End diff --

Move the filter operation into the collect operation by adding an if guard to 
the case statement, as sketched below
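
A self-contained Scala sketch of the suggested idiom with placeholder types 
(not the real CarbonData classes):

```
case class Col(name: String, invisible: Boolean)

val cols = Seq(Col("a", invisible = false), Col("b", invisible = true))

// the filter predicate becomes a guard on the case pattern, so the
// sequence is traversed once instead of twice
val visible = cols.collect { case c if !c.invisible => c.name }
// visible == Seq("a")
```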


---


[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...

2018-12-27 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3027#discussion_r244272092
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala
 ---
@@ -262,13 +263,26 @@ private[sql] case class 
CarbonAlterTableColRenameDataTypeChangeCommand(
   carbonTable: CarbonTable,
   tableInfo: TableInfo,
   addColumnSchema: ColumnSchema,
-  schemaEvolutionEntry: SchemaEvolutionEntry): Unit = {
+  schemaEvolutionEntry: SchemaEvolutionEntry,
+  oldCarbonColumn: CarbonColumn): Unit = {
 val schemaConverter = new ThriftWrapperSchemaConverterImpl
-val a = 
List(schemaConverter.fromExternalToWrapperColumnSchema(addColumnSchema))
+// get the carbon column in schema order
+val carbonColumns = 
carbonTable.getCreateOrderColumn(carbonTable.getTableName).asScala
+  .filter(!_.isInvisible).collect{case carbonColumn => 
carbonColumn.getColumnSchema}
+// get the schema ordinal of the column for which the datatype changed 
or column is renamed
+val schemaOrdinal = carbonColumns.collect {
--- End diff --

Instead of collect, try to use foreach


---


[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...

2018-12-27 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3027#discussion_r244271732
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala
 ---
@@ -93,11 +93,17 @@ private[sql] case class 
CarbonAlterTableAddColumnCommand(
   schemaEvolutionEntry.setAdded(newCols.toList.asJava)
   val thriftTable = schemaConverter
 .fromWrapperToExternalTableInfo(wrapperTableInfo, dbName, 
tableName)
+  // carbon columns based on schema order
+  val carbonColumns = 
carbonTable.getCreateOrderColumn(carbonTable.getTableName).asScala
+.collect { case carbonColumn => carbonColumn.getColumnSchema }
+.filter(!_.isInvisible)
+  // sort the new columns based on schema order
+  val sortedColsBasedActualSchemaOrder = newCols.sortBy(a => 
a.getSchemaOrdinal)
   val (tableIdentifier, schemaParts, cols) = 
AlterTableUtil.updateSchemaInfo(
   carbonTable,
   
schemaConverter.fromWrapperToExternalSchemaEvolutionEntry(schemaEvolutionEntry),
   thriftTable,
-  Some(newCols))(sparkSession)
+  Some(carbonColumns ++ 
sortedColsBasedActualSchemaOrder))(sparkSession)
--- End diff --

`AlterTableUtil.updateSchemaInfo` is not making use of the columns passed in, 
so remove that method argument and use the columns for changing the hive schema


---


[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...

2018-12-27 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3027#discussion_r244270219
  
--- Diff: 
integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionState.scala
 ---
@@ -105,47 +106,37 @@ class CarbonHiveSessionCatalog(
   .asInstanceOf[HiveExternalCatalog].client
   }
 
-  def alterTableRename(oldTableIdentifier: TableIdentifier,
-  newTableIdentifier: TableIdentifier,
-  newTablePath: String): Unit = {
-getClient().runSqlHive(
-  s"ALTER TABLE ${ oldTableIdentifier.database.get }.${ 
oldTableIdentifier.table } " +
-  s"RENAME TO ${ oldTableIdentifier.database.get }.${ 
newTableIdentifier.table }")
-getClient().runSqlHive(
-  s"ALTER TABLE ${ oldTableIdentifier.database.get }.${ 
newTableIdentifier.table} " +
-  s"SET SERDEPROPERTIES" +
-  s"('tableName'='${ newTableIdentifier.table }', " +
-  s"'dbName'='${ oldTableIdentifier.database.get }', 'tablePath'='${ 
newTablePath }')")
-  }
-
-  override def alterTable(tableIdentifier: TableIdentifier,
-  schemaParts: String,
-  cols: 
Option[Seq[org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema]])
-  : Unit = {
-getClient()
-  .runSqlHive(s"ALTER TABLE ${tableIdentifier.database.get}.${ 
tableIdentifier.table } " +
-  s"SET TBLPROPERTIES(${ schemaParts })")
-  }
-
   override def alterAddColumns(tableIdentifier: TableIdentifier,
   schemaParts: String,
-  cols: 
Option[Seq[org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema]])
-  : Unit = {
+  cols: Option[Seq[ColumnSchema]]): Unit = {
 alterTable(tableIdentifier, schemaParts, cols)
+CarbonSessionUtil
+  .alterExternalCatalogForTableWithUpdatedSchema(tableIdentifier,
+cols,
+schemaParts,
+sparkSession)
   }
 
   override def alterDropColumns(tableIdentifier: TableIdentifier,
--- End diff --

Unify `alterDropColumns` and `alterAddColumns` into one method: keep the 
interface method signatures the same, but move the common code into a single 
method and call it from both interface methods, as sketched below
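
A generic, self-contained Scala sketch of the shape (placeholder names and 
signatures, not the real ones from the diff):

```
trait AlterHandler {
  // both interface methods keep their signatures and delegate
  def alterAddColumns(table: String, schemaParts: String): Unit =
    alterColumns(table, schemaParts)

  def alterDropColumns(table: String, schemaParts: String): Unit =
    alterColumns(table, schemaParts)

  // the common code from the two original methods lives here
  private def alterColumns(table: String, schemaParts: String): Unit =
    println(s"altering $table with $schemaParts")
}
```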


---


[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...

2018-12-27 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3027#discussion_r244270598
  
--- Diff: 
integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionUtil.scala
 ---
@@ -93,4 +98,34 @@ object CarbonSessionUtil {
 )
   }
 
+  /**
+   * This method alter the table for datatype change or column rename 
operation, and update the
+   * external catalog directly
+   *
+   * @param tableIdentifier tableIdentifier for table
+   * @param cols all the columns of the table, which are updated with
+   *the datatype change or new column name
+   * @param schemaParts schemaParts
+   * @param sparkSession sparkSession
+   */
+  def alterExternalCatalogForTableWithUpdatedSchema(tableIdentifier: 
TableIdentifier,
+  cols: Option[Seq[ColumnSchema]],
+  schemaParts: String,
+  sparkSession: SparkSession): Unit = {
+val carbonTable = 
CarbonEnv.getCarbonTable(tableIdentifier)(sparkSession)
+val colArray: scala.collection.mutable.ArrayBuffer[StructField] = 
ArrayBuffer()
+cols.get.foreach(column =>
+  if (!column.isInvisible) {
+colArray += StructField(column.getColumnName,
+  SparkTypeConverter
+.convertCarbonToSparkDataType(column,
+  carbonTable))
+  }
+)
+sparkSession.sessionState.catalog.externalCatalog
+  .alterTableDataSchema(tableIdentifier.database.get,
--- End diff --

Add a comment for the usage of the `alterTableDataSchema` API to explain its 
purpose


---


[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

2018-12-26 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3022#discussion_r244085418
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestCompactionComplexType.scala
 ---
@@ -1068,4 +1068,40 @@ class TestCompactionComplexType extends QueryTest 
with BeforeAndAfterAll {
 sql("Drop table if exists adaptive")
   }
 
+  test("Test major compaction for struct of array type") {
+sql("DROP TABLE IF EXISTS carbon")
--- End diff --

1. Include dropping of the table in afterAll also (see the sketch below)
2. Give the test case description as below:
`Test major compaction with dictionary include for struct of array type`
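
For point 1, a minimal sketch (assuming the suite's existing `sql` helper and 
the `BeforeAndAfterAll` mixin shown in the diff):

```
override def afterAll(): Unit = {
  // drop the table here as well, so a failed test does not leak
  // the table into suites that run afterwards
  sql("DROP TABLE IF EXISTS carbon")
}
```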


---


[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

2018-12-26 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3022#discussion_r244086878
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
 ---
@@ -699,4 +701,33 @@ public static boolean 
isRawDataRequired(CarbonDataLoadConfiguration configuratio
 return iterators;
   }
 
+  public static int[] calcDimensionLengths(int numberOfSortColumns, int[] 
complexCardinality) {
+if (!(numberOfSortColumns > 0)) {
--- End diff --

1. Rewrite the condition `if (!(numberOfSortColumns > 0))` as 
`if (numberOfSortColumns == 0)`
2. The functionality of this method is not clear. Add a comment to explain 
the logic and the use of this method


---


[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

2018-12-26 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3022#discussion_r244086110
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
 ---
@@ -356,9 +359,20 @@ public static CarbonFactDataHandlerModel 
getCarbonFactDataHandlerModel(CarbonLoa
 
.getColumnSchemaList(carbonTable.getDimensionByTableName(tableName),
 carbonTable.getMeasureByTableName(tableName));
 carbonFactDataHandlerModel.setWrapperColumnSchema(wrapperColumnSchema);
-// get the cardinality for all all the columns including no dictionary 
columns
-int[] formattedCardinality = CarbonUtil
-
.getFormattedCardinality(segmentProperties.getDimColumnsCardinality(), 
wrapperColumnSchema);
+// get the cardinality for all all the columns including no
+// dictionary columns and complex columns
+int[] dimAndComplexColumnCardinality =
+new int[segmentProperties.getDimColumnsCardinality().length + 
segmentProperties
+.getComplexDimColumnCardinality().length];
+for (int i = 0; i < 
segmentProperties.getDimColumnsCardinality().length; i++) {
+  dimAndComplexColumnCardinality[i] = 
segmentProperties.getDimColumnsCardinality()[i];
--- End diff --

Check the restructure drop-column case where a few loads are done, then a 
dictionary column is dropped and compaction is triggered. In that case 
segmentProperties will contain the cardinality of the dropped column as well; 
also check what schema and cardinality are finally written during compaction


---


[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

2018-12-26 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3022#discussion_r244087077
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
 ---
@@ -699,4 +701,33 @@ public static boolean 
isRawDataRequired(CarbonDataLoadConfiguration configuratio
 return iterators;
   }
 
+  public static int[] calcDimensionLengths(int numberOfSortColumns, int[] 
complexCardinality) {
+if (!(numberOfSortColumns > 0)) {
+  for (int i = 0; i < complexCardinality.length; i++) {
+if (complexCardinality[i] != 0) {
+  complexCardinality[i] = Integer.MAX_VALUE;
+}
+  }
+}
+List<Integer> dimsLenList = new ArrayList<>();
+for (int eachDimLen : complexCardinality) {
+  if (eachDimLen != 0) dimsLenList.add(eachDimLen);
+}
+int[] dimLens = new int[dimsLenList.size()];
+for (int i = 0; i < dimsLenList.size(); i++) {
+  dimLens[i] = dimsLenList.get(i);
+}
+return dimLens;
+  }
+
+  public static KeyGenerator[] createKeyGeneratorForComplexDimension(int 
numberOfSortColumns,
--- End diff --

Add a method comment to explain the logic and method usage


---


[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...

2018-12-26 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3022#discussion_r244086909
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
 ---
@@ -699,4 +701,33 @@ public static boolean 
isRawDataRequired(CarbonDataLoadConfiguration configuratio
 return iterators;
   }
 
+  public static int[] calcDimensionLengths(int numberOfSortColumns, int[] 
complexCardinality) {
+if (!(numberOfSortColumns > 0)) {
+  for (int i = 0; i < complexCardinality.length; i++) {
+if (complexCardinality[i] != 0) {
+  complexCardinality[i] = Integer.MAX_VALUE;
+}
+  }
+}
+List<Integer> dimsLenList = new ArrayList<>();
+for (int eachDimLen : complexCardinality) {
+  if (eachDimLen != 0) dimsLenList.add(eachDimLen);
+}
+int[] dimLens = new int[dimsLenList.size()];
--- End diff --

The conversion from ArrayList to array can be done directly


---


[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename

2018-12-20 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2990
  
LGTM


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r243164028
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala
 ---
@@ -0,0 +1,331 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.schema
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import 
org.apache.spark.sql.execution.command.{AlterTableDataTypeChangeModel, 
DataTypeInfo,
+  MetadataCommand}
+import org.apache.spark.sql.hive.CarbonSessionCatalog
+import org.apache.spark.util.AlterTableUtil
+
+import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.features.TableOperation
+import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage}
+import 
org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl
+import org.apache.carbondata.core.metadata.datatype.DecimalType
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
+import 
org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent,
+  AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, 
OperationListenerBus}
+import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, 
TableInfo}
+import org.apache.carbondata.spark.util.DataTypeConverterUtil
+
+abstract class CarbonAlterTableColumnRenameCommand(oldColumnName: String, 
newColumnName: String)
+  extends MetadataCommand {
+
+  protected def validColumnsForRenaming(carbonColumns: 
mutable.Buffer[CarbonColumn],
+  oldCarbonColumn: CarbonColumn,
+  carbonTable: CarbonTable): Unit = {
+// check whether new column name is already an existing column name
+if 
(carbonColumns.exists(_.getColName.equalsIgnoreCase(newColumnName))) {
+  throw new MalformedCarbonCommandException(s"Column Rename Operation 
failed. New " +
+s"column name 
$newColumnName already exists" +
+s" in table ${ 
carbonTable.getTableName }")
+}
+
+// if the column rename is for complex column, block the operation
+if (oldCarbonColumn.isComplex) {
+  throw new MalformedCarbonCommandException(s"Column Rename Operation 
failed. Rename " +
+s"column is unsupported 
for complex datatype " +
+s"column ${ 
oldCarbonColumn.getColName }")
+}
+
+// if column rename operation is on partition column, then fail the 
rename operation
+if (null != carbonTable.getPartitionInfo) {
+  val partitionColumns = 
carbonTable.getPartitionInfo.getColumnSchemaList
+  partitionColumns.asScala.foreach {
+col =>
+  if (col.getColumnName.equalsIgnoreCase(oldColumnName)) {
+throw new MalformedCarbonCommandException(
+  s"Column Rename Operation failed. Renaming " +
+  s"the partition column $newColumnName is not " +
+  s"allowed")
+  }
+  }
+}
+
+  }
+}
+
+private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand(
+alterTableColRenameAndDataTypeChangeModel: 
AlterTableDataTypeChangeModel)
+  extends 
CarbonAlterTableColumnRenameCommand(alterTableColRenameAndDataTypeChangeModel.columnName,
+alterTableColRenameAndDataTypeChangeModel.newColumnName) {
+
+  override def

[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r243163786
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -1511,7 +1514,15 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 }
 DataTypeInfo("decimal", precision, scale)
   case _ =>
-throw new MalformedCarbonCommandException("Data type provided is 
invalid.")
+if (isColumnRename) {
+  if (dataType.equalsIgnoreCase("decimal")) {
--- End diff --

Instead of an if/else block, use case matching here, as sketched below
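
A self-contained Scala sketch of the idiom with placeholder values (not the 
parser's real model classes):

```
val dataType = "decimal"
val isColumnRename = true

// a match with a pattern guard replaces the nested if/else blocks
dataType.toLowerCase match {
  case "decimal" if isColumnRename =>
    () // build the DataTypeInfo for the rename case here
  case _ =>
    throw new IllegalArgumentException("Data type provided is invalid.")
}
```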


---


[GitHub] carbondata issue #3002: [CARBONDATA-3182] Fixed SDV Testcase failures

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/3002
  
LGTM


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242860813
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala
 ---
@@ -171,11 +171,12 @@ case class 
DropPartitionCallableModel(carbonLoadModel: CarbonLoadModel,
 
 case class DataTypeInfo(dataType: String, precision: Int = 0, scale: Int = 
0)
 
-case class AlterTableDataTypeChangeModel(dataTypeInfo: DataTypeInfo,
+case class AlterTableColRenameAndDataTypeChangeModel(dataTypeInfo: 
DataTypeInfo,
--- End diff --

Segregate the Rename and DataType change models


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242837854
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala
 ---
@@ -641,7 +641,7 @@ class LuceneFineGrainDataMapSuite extends QueryTest 
with BeforeAndAfterAll {
 val ex3 = intercept[MalformedCarbonCommandException] {
   sql("alter table datamap_test7 change id id BIGINT")
 }
-assert(ex3.getMessage.contains("alter table change datatype is not 
supported"))
+assert(ex3.getMessage.contains("alter table change datatype or column 
rename is not supported"))
--- End diff --

Perform the validation separately for column rename and datatype change, and 
update the message as per the change in the TableOperation enum


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242870689
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala ---
@@ -249,32 +249,99 @@ object AlterTableUtil {
* @param timeStamp
* @param sparkSession
*/
-  def revertDataTypeChanges(dbName: String, tableName: String, timeStamp: 
Long)
+  def revertColumnRenameAndDataTypeChanges(dbName: String, tableName: 
String, timeStamp: Long)
 (sparkSession: SparkSession): Unit = {
 val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
 val carbonTable = CarbonEnv.getCarbonTable(Some(dbName), 
tableName)(sparkSession)
 val thriftTable: TableInfo = metastore.getThriftTableInfo(carbonTable)
 val evolutionEntryList = 
thriftTable.fact_table.schema_evolution.schema_evolution_history
-val updatedTime = evolutionEntryList.get(evolutionEntryList.size() - 
1).time_stamp
-if (updatedTime == timeStamp) {
-  LOGGER.error(s"Reverting changes for $dbName.$tableName")
-  val removedColumns = 
evolutionEntryList.get(evolutionEntryList.size() - 1).removed
-  thriftTable.fact_table.table_columns.asScala.foreach { columnSchema 
=>
-removedColumns.asScala.foreach { removedColumn =>
-  if 
(columnSchema.column_id.equalsIgnoreCase(removedColumn.column_id) &&
-  !columnSchema.isInvisible) {
-columnSchema.setData_type(removedColumn.data_type)
-columnSchema.setPrecision(removedColumn.precision)
-columnSchema.setScale(removedColumn.scale)
-  }
+// here, there can be maximum of two entries for schemaEvolution, when 
my operation is
+// both column rename and datatype change. So check if last two 
Evolution entry timestamp is
+// same, then it is both column rename and datatype change, so revert 
two entries,else one entry
+if (evolutionEntryList.size() > 1 &&
+(evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp 
== timeStamp) &&
+(evolutionEntryList.get(evolutionEntryList.size() - 2).time_stamp 
== timeStamp)) {
+  LOGGER.error(s"Reverting column rename and datatype changes for 
$dbName.$tableName")
+  revertColumnSchemaChanges(thriftTable, evolutionEntryList, true)
+} else {
+  if (evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp 
== timeStamp) {
+LOGGER.error(s"Reverting changes for $dbName.$tableName")
+revertColumnSchemaChanges(thriftTable, evolutionEntryList, false)
+  }
--- End diff --

same comment as above


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242863058
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala
 ---
@@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.schema
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import 
org.apache.spark.sql.execution.command.{AlterTableColRenameAndDataTypeChangeModel,
 DataTypeInfo, MetadataCommand}
+import org.apache.spark.sql.hive.CarbonSessionCatalog
+import org.apache.spark.util.AlterTableUtil
+
+import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.features.TableOperation
+import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage}
+import 
org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl
+import org.apache.carbondata.core.metadata.datatype.DecimalType
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
+import 
org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, 
AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, 
OperationListenerBus}
+import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, 
TableInfo}
+import org.apache.carbondata.spark.util.DataTypeConverterUtil
+
+private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand(
+alterTableColRenameAndDataTypeChangeModel: 
AlterTableColRenameAndDataTypeChangeModel)
+  extends MetadataCommand {
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+val tableName = alterTableColRenameAndDataTypeChangeModel.tableName
+val dbName = alterTableColRenameAndDataTypeChangeModel.databaseName
+  .getOrElse(sparkSession.catalog.currentDatabase)
+var isColumnRenameOnly = false
+var isDataTypeChangeOnly = false
+var isBothColRenameAndDataTypeChange = false
+setAuditTable(dbName, tableName)
+setAuditInfo(Map(
+  "column" -> alterTableColRenameAndDataTypeChangeModel.columnName,
+  "newColumn" -> 
alterTableColRenameAndDataTypeChangeModel.newColumnName,
+  "newType" -> 
alterTableColRenameAndDataTypeChangeModel.dataTypeInfo.dataType))
+val locksToBeAcquired = List(LockUsage.METADATA_LOCK, 
LockUsage.COMPACTION_LOCK)
+var locks = List.empty[ICarbonLock]
+// get the latest carbon table and check for column existence
+var carbonTable: CarbonTable = null
+var timeStamp = 0L
+try {
+  locks = AlterTableUtil
+.validateTableAndAcquireLock(dbName, tableName, 
locksToBeAcquired)(sparkSession)
+  val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+  carbonTable = CarbonEnv.getCarbonTable(Some(dbName), 
tableName)(sparkSession)
+  if (!carbonTable.canAllow(carbonTable, 
TableOperation.ALTER_COL_RENAME_AND_CHANGE_DATATYPE,
+alterTableColRenameAndDataTypeChangeModel.columnName)) {
+throw new MalformedCarbonCommandException(
+  "alter table change datatype or column rename is not supported 
for index datamap")
+  }
+  val operationContext = new OperationContext
+  val alterTableColRenameAndDataTypeChangePreEvent =
+AlterTableColRenameAndDataTypeChangePreEvent(sparkSession, 
carbonTable,
+  alterTableColRenameAndDataTypeChangeModel)
+  OperationListenerBus.getInstance()
+.fire

[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242861256
  
--- Diff: 
integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/SqlAstBuilderHelper.scala
 ---
@@ -38,25 +37,27 @@ trait SqlAstBuilderHelper extends SparkSqlAstBuilder {
   override def visitChangeColumn(ctx: ChangeColumnContext): LogicalPlan = {
 
 val newColumn = visitColType(ctx.colType)
+var isColumnRename = false
--- End diff --

Modify it to a `val`


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242870325
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
 ---
@@ -318,15 +322,27 @@ class CarbonFileMetastore extends CarbonMetaStore {
*/
   def revertTableSchemaInAlterFailure(carbonTableIdentifier: 
CarbonTableIdentifier,
   thriftTableInfo: org.apache.carbondata.format.TableInfo,
-  absoluteTableIdentifier: AbsoluteTableIdentifier)(sparkSession: 
SparkSession): String = {
+  absoluteTableIdentifier: AbsoluteTableIdentifier,
+  timeStamp: Long)(sparkSession: SparkSession): String = {
 val schemaConverter = new ThriftWrapperSchemaConverterImpl
 val wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
   thriftTableInfo,
   carbonTableIdentifier.getDatabaseName,
   carbonTableIdentifier.getTableName,
   absoluteTableIdentifier.getTablePath)
 val evolutionEntries = 
thriftTableInfo.fact_table.schema_evolution.schema_evolution_history
-evolutionEntries.remove(evolutionEntries.size() - 1)
+// we may need to remove two evolution entries if the operation is 
both col rename and datatype
+// change operation
+if (evolutionEntries.size() > 1 &&
+(evolutionEntries.get(evolutionEntries.size() - 1).time_stamp ==
+evolutionEntries.get(evolutionEntries.size() - 2).time_stamp)) {
+  evolutionEntries.remove(evolutionEntries.size() - 1)
+  evolutionEntries.remove(evolutionEntries.size() - 2)
+} else {
+  if (evolutionEntries.get(evolutionEntries.size() - 1).time_stamp == 
timeStamp) {
+evolutionEntries.remove(evolutionEntries.size() - 1)
+  }
--- End diff --

Better not to hardcode the entry positions. Iterate over the entries, compare 
the timestamps, and remove entries while the timestamps are equal; see the 
sketch below
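
A sketch of the suggestion using the diff's own names (`evolutionEntries` is 
a `java.util.List` and `time_stamp` comes from the thrift entry, as shown 
above):

```
// drop trailing entries while their timestamp matches, instead of
// hardcoding "remove the last one or two" positions
while (!evolutionEntries.isEmpty &&
    evolutionEntries.get(evolutionEntries.size() - 1).time_stamp == timeStamp) {
  evolutionEntries.remove(evolutionEntries.size() - 1)
}
```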


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242865160
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala
 ---
@@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.schema
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import 
org.apache.spark.sql.execution.command.{AlterTableColRenameAndDataTypeChangeModel,
 DataTypeInfo, MetadataCommand}
+import org.apache.spark.sql.hive.CarbonSessionCatalog
+import org.apache.spark.util.AlterTableUtil
+
+import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.features.TableOperation
+import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage}
+import 
org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl
+import org.apache.carbondata.core.metadata.datatype.DecimalType
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
+import 
org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, 
AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, 
OperationListenerBus}
+import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, 
TableInfo}
+import org.apache.carbondata.spark.util.DataTypeConverterUtil
+
+private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand(
+alterTableColRenameAndDataTypeChangeModel: 
AlterTableColRenameAndDataTypeChangeModel)
+  extends MetadataCommand {
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+val tableName = alterTableColRenameAndDataTypeChangeModel.tableName
+val dbName = alterTableColRenameAndDataTypeChangeModel.databaseName
+  .getOrElse(sparkSession.catalog.currentDatabase)
+var isColumnRenameOnly = false
+var isDataTypeChangeOnly = false
+var isBothColRenameAndDataTypeChange = false
+setAuditTable(dbName, tableName)
+setAuditInfo(Map(
+  "column" -> alterTableColRenameAndDataTypeChangeModel.columnName,
+  "newColumn" -> 
alterTableColRenameAndDataTypeChangeModel.newColumnName,
+  "newType" -> 
alterTableColRenameAndDataTypeChangeModel.dataTypeInfo.dataType))
+val locksToBeAcquired = List(LockUsage.METADATA_LOCK, 
LockUsage.COMPACTION_LOCK)
+var locks = List.empty[ICarbonLock]
+// get the latest carbon table and check for column existence
+var carbonTable: CarbonTable = null
+var timeStamp = 0L
+try {
+  locks = AlterTableUtil
+.validateTableAndAcquireLock(dbName, tableName, 
locksToBeAcquired)(sparkSession)
+  val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+  carbonTable = CarbonEnv.getCarbonTable(Some(dbName), 
tableName)(sparkSession)
+  if (!carbonTable.canAllow(carbonTable, 
TableOperation.ALTER_COL_RENAME_AND_CHANGE_DATATYPE,
+alterTableColRenameAndDataTypeChangeModel.columnName)) {
+throw new MalformedCarbonCommandException(
+  "alter table change datatype or column rename is not supported 
for index datamap")
+  }
+  val operationContext = new OperationContext
+  val alterTableColRenameAndDataTypeChangePreEvent =
+AlterTableColRenameAndDataTypeChangePreEvent(sparkSession, 
carbonTable,
+  alterTableColRenameAndDataTypeChangeModel)
+  OperationListenerBus.getInstance()
+.fire

[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242871800
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala ---
@@ -249,32 +249,99 @@ object AlterTableUtil {
* @param timeStamp
* @param sparkSession
*/
-  def revertDataTypeChanges(dbName: String, tableName: String, timeStamp: 
Long)
+  def revertColumnRenameAndDataTypeChanges(dbName: String, tableName: 
String, timeStamp: Long)
 (sparkSession: SparkSession): Unit = {
 val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
 val carbonTable = CarbonEnv.getCarbonTable(Some(dbName), 
tableName)(sparkSession)
 val thriftTable: TableInfo = metastore.getThriftTableInfo(carbonTable)
 val evolutionEntryList = 
thriftTable.fact_table.schema_evolution.schema_evolution_history
-val updatedTime = evolutionEntryList.get(evolutionEntryList.size() - 
1).time_stamp
-if (updatedTime == timeStamp) {
-  LOGGER.error(s"Reverting changes for $dbName.$tableName")
-  val removedColumns = 
evolutionEntryList.get(evolutionEntryList.size() - 1).removed
-  thriftTable.fact_table.table_columns.asScala.foreach { columnSchema 
=>
-removedColumns.asScala.foreach { removedColumn =>
-  if 
(columnSchema.column_id.equalsIgnoreCase(removedColumn.column_id) &&
-  !columnSchema.isInvisible) {
-columnSchema.setData_type(removedColumn.data_type)
-columnSchema.setPrecision(removedColumn.precision)
-columnSchema.setScale(removedColumn.scale)
-  }
+// here, there can be maximum of two entries for schemaEvolution, when 
my operation is
+// both column rename and datatype change. So check if last two 
Evolution entry timestamp is
+// same, then it is both column rename and datatype change, so revert 
two entries,else one entry
+if (evolutionEntryList.size() > 1 &&
+(evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp 
== timeStamp) &&
+(evolutionEntryList.get(evolutionEntryList.size() - 2).time_stamp 
== timeStamp)) {
+  LOGGER.error(s"Reverting column rename and datatype changes for 
$dbName.$tableName")
+  revertColumnSchemaChanges(thriftTable, evolutionEntryList, true)
+} else {
+  if (evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp 
== timeStamp) {
+LOGGER.error(s"Reverting changes for $dbName.$tableName")
+revertColumnSchemaChanges(thriftTable, evolutionEntryList, false)
+  }
+}
+metastore
+  
.revertTableSchemaInAlterFailure(carbonTable.getCarbonTableIdentifier,
+thriftTable, carbonTable.getAbsoluteTableIdentifier, 
timeStamp)(sparkSession)
+  }
+
+  /**
+   * This method reverts the column schema in case of failure in alter 
datatype change or col rename
+   * @param thriftTable thrift table
+   * @param evolutionEntryList SchemaEvolutionEntry List
+   * @param isBothColRenameAndDataTypeChange true if operation done is 
both column rename and datatype change
+   */
+  private def revertColumnSchemaChanges(thriftTable: TableInfo,
+  evolutionEntryList: util.List[SchemaEvolutionEntry],
+  isBothColRenameAndDataTypeChange: Boolean): Unit = {
+var removedColumns: 
mutable.Buffer[org.apache.carbondata.format.ColumnSchema] = null
+if (isBothColRenameAndDataTypeChange) {
+  removedColumns = evolutionEntryList.get(evolutionEntryList.size() - 
1).removed.asScala ++
+   evolutionEntryList.get(evolutionEntryList.size() - 
2).removed.asScala
+} else {
+  removedColumns = evolutionEntryList.get(evolutionEntryList.size() - 
1).removed.asScala
+}
+thriftTable.fact_table.table_columns.asScala.foreach { columnSchema =>
+  removedColumns.foreach { removedColumn =>
+if 
(columnSchema.column_id.equalsIgnoreCase(removedColumn.column_id) &&
+!columnSchema.isInvisible) {
+  columnSchema.setColumn_name(removedColumn.column_name)
+  columnSchema.setData_type(removedColumn.data_type)
+  columnSchema.setPrecision(removedColumn.precision)
+  columnSchema.setScale(removedColumn.scale)
 }
   }
-  metastore
-
.revertTableSchemaInAlterFailure(carbonTable.getCarbonTableIdentifier,
-  thriftTable, 
carbonTable.getAbsoluteTableIdentifier)(sparkSession)
 }
   }
 
+  /**
+   * This method modifies the table properties if column rename happened
+   *
+   * @param tableProperties
+   */
+  def modifyTableProper

[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242871412
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala ---
@@ -249,32 +249,99 @@ object AlterTableUtil {
* @param timeStamp
* @param sparkSession
*/
-  def revertDataTypeChanges(dbName: String, tableName: String, timeStamp: 
Long)
+  def revertColumnRenameAndDataTypeChanges(dbName: String, tableName: 
String, timeStamp: Long)
 (sparkSession: SparkSession): Unit = {
 val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
 val carbonTable = CarbonEnv.getCarbonTable(Some(dbName), 
tableName)(sparkSession)
 val thriftTable: TableInfo = metastore.getThriftTableInfo(carbonTable)
 val evolutionEntryList = 
thriftTable.fact_table.schema_evolution.schema_evolution_history
-val updatedTime = evolutionEntryList.get(evolutionEntryList.size() - 
1).time_stamp
-if (updatedTime == timeStamp) {
-  LOGGER.error(s"Reverting changes for $dbName.$tableName")
-  val removedColumns = 
evolutionEntryList.get(evolutionEntryList.size() - 1).removed
-  thriftTable.fact_table.table_columns.asScala.foreach { columnSchema 
=>
-removedColumns.asScala.foreach { removedColumn =>
-  if 
(columnSchema.column_id.equalsIgnoreCase(removedColumn.column_id) &&
-  !columnSchema.isInvisible) {
-columnSchema.setData_type(removedColumn.data_type)
-columnSchema.setPrecision(removedColumn.precision)
-columnSchema.setScale(removedColumn.scale)
-  }
+// here, there can be maximum of two entries for schemaEvolution, when 
my operation is
+// both column rename and datatype change. So check if last two 
Evolution entry timestamp is
+// same, then it is both column rename and datatype change, so revert 
two entries,else one entry
+if (evolutionEntryList.size() > 1 &&
+(evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp 
== timeStamp) &&
+(evolutionEntryList.get(evolutionEntryList.size() - 2).time_stamp 
== timeStamp)) {
+  LOGGER.error(s"Reverting column rename and datatype changes for 
$dbName.$tableName")
+  revertColumnSchemaChanges(thriftTable, evolutionEntryList, true)
+} else {
+  if (evolutionEntryList.get(evolutionEntryList.size() - 1).time_stamp 
== timeStamp) {
+LOGGER.error(s"Reverting changes for $dbName.$tableName")
+revertColumnSchemaChanges(thriftTable, evolutionEntryList, false)
+  }
+}
+metastore
+  
.revertTableSchemaInAlterFailure(carbonTable.getCarbonTableIdentifier,
+thriftTable, carbonTable.getAbsoluteTableIdentifier, 
timeStamp)(sparkSession)
+  }
+
+  /**
+   * This method reverts the column schema in case of failure in alter 
datatype change or col rename
+   * @param thriftTable thrift table
+   * @param evolutionEntryList SchemaEvolutionEntry List
+   * @param isBothColRenameAndDataTypeChange true if operation done is 
both column rename and datatype change
+   */
+  private def revertColumnSchemaChanges(thriftTable: TableInfo,
+  evolutionEntryList: util.List[SchemaEvolutionEntry],
+  isBothColRenameAndDataTypeChange: Boolean): Unit = {
+var removedColumns: 
mutable.Buffer[org.apache.carbondata.format.ColumnSchema] = null
+if (isBothColRenameAndDataTypeChange) {
+  removedColumns = evolutionEntryList.get(evolutionEntryList.size() - 
1).removed.asScala ++
+   evolutionEntryList.get(evolutionEntryList.size() - 
2).removed.asScala
+} else {
+  removedColumns = evolutionEntryList.get(evolutionEntryList.size() - 
1).removed.asScala
--- End diff --

I can see the same code in multiple classes. Try and refine the code and 
move this method at a common place
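
For illustration, a minimal sketch (helper name assumed, placed in a common
utility such as AlterTableUtil) of the shared logic that could be extracted:
deciding how many schema evolution entries belong to the failed operation.

    private def entriesToRevert(
        evolutionEntryList: util.List[SchemaEvolutionEntry],
        timeStamp: Long): Int = {
      val size = evolutionEntryList.size()
      // two entries with the same timestamp mean column rename + datatype change
      if (size > 1 && evolutionEntryList.get(size - 1).time_stamp == timeStamp &&
          evolutionEntryList.get(size - 2).time_stamp == timeStamp) {
        2
      } else if (size > 0 && evolutionEntryList.get(size - 1).time_stamp == timeStamp) {
        1
      } else {
        0
      }
    }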


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242837109
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/features/TableOperation.java ---
@@ -21,7 +21,7 @@
   ALTER_RENAME,
   ALTER_DROP,
   ALTER_ADD_COLUMN,
-  ALTER_CHANGE_DATATYPE,
+  ALTER_COL_RENAME_AND_CHANGE_DATATYPE,
--- End diff --

Keep column rename and datatype change as different operations and make 
this change wherever applicable
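
For illustration, a sketch of the split (the two new constant names are
assumptions, not the merged code):

    public enum TableOperation {
      ALTER_RENAME,
      ALTER_DROP,
      ALTER_ADD_COLUMN,
      // two independent operations, so each can be allowed or blocked
      // separately wherever applicable
      ALTER_COLUMN_RENAME,
      ALTER_CHANGE_DATATYPE
    }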


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242849005
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -1487,31 +1487,46 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
* @param values
* @return
*/
-  def parseDataType(dataType: String, values: Option[List[(Int, Int)]]): 
DataTypeInfo = {
+  def parseDataType(
+  dataType: String,
+  values: Option[List[(Int, Int)]],
+  isColumnRename: Boolean): DataTypeInfo = {
+def validateAndGetDecimalDatatype: DataTypeInfo = {
+  var precision: Int = 0
+  var scale: Int = 0
+  if (values.isDefined) {
+precision = values.get(0)._1
+scale = values.get(0)._2
+  } else {
+throw new MalformedCarbonCommandException("Decimal format provided 
is invalid")
+  }
+  // precision should be > 0 and <= 38 and scale should be >= 0 and <= 
38
+  if (precision < 1 || precision > 38) {
+throw new MalformedCarbonCommandException("Invalid value for 
precision")
+  } else if (scale < 0 || scale > 38) {
+throw new MalformedCarbonCommandException("Invalid value for 
scale")
+  }
+  DataTypeInfo("decimal", precision, scale)
+}
+
 dataType match {
   case "bigint" | "long" =>
 if (values.isDefined) {
   throw new MalformedCarbonCommandException("Invalid data type")
 }
 DataTypeInfo(dataType)
   case "decimal" =>
-var precision: Int = 0
-var scale: Int = 0
-if (values.isDefined) {
-  precision = values.get(0)._1
-  scale = values.get(0)._2
+validateAndGetDecimalDatatype
--- End diff --

This change is not required. You can revert.
In default case handle for rename scenario
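
A rough sketch of handling the rename scenario in the default case instead
(illustrative only, not the merged code):

    // default case, reached when the data type text is unchanged
    case others =>
      if (isColumnRename) {
        // pure column rename: keep the existing data type untouched
        DataTypeInfo(others)
      } else {
        throw new MalformedCarbonCommandException(s"Invalid data type: $others")
      }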


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242870457
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonHiveMetaStore.scala
 ---
@@ -199,11 +202,22 @@ class CarbonHiveMetaStore extends CarbonFileMetastore 
{
*/
   override def revertTableSchemaInAlterFailure(carbonTableIdentifier: 
CarbonTableIdentifier,
   thriftTableInfo: format.TableInfo,
-  identifier: AbsoluteTableIdentifier)
+  identifier: AbsoluteTableIdentifier,
+  timeStamp: Long)
 (sparkSession: SparkSession): String = {
 val schemaConverter = new ThriftWrapperSchemaConverterImpl
 val evolutionEntries = 
thriftTableInfo.fact_table.schema_evolution.schema_evolution_history
-evolutionEntries.remove(evolutionEntries.size() - 1)
+// we may need to remove two evolution entries if the operation is 
both col rename and datatype
+// change operation
+if (evolutionEntries.size() > 1 && 
(evolutionEntries.get(evolutionEntries.size() - 1).time_stamp
+   == evolutionEntries.get(evolutionEntries.size() - 
2).time_stamp)) {
+  evolutionEntries.remove(evolutionEntries.size() - 1)
+  evolutionEntries.remove(evolutionEntries.size() - 2)
+} else {
+  if (evolutionEntries.get(evolutionEntries.size() - 1).time_stamp == 
timeStamp) {
+evolutionEntries.remove(evolutionEntries.size() - 1)
+  }
--- End diff --

Same comment as above


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242869441
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala
 ---
@@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.schema
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import 
org.apache.spark.sql.execution.command.{AlterTableColRenameAndDataTypeChangeModel,
 DataTypeInfo, MetadataCommand}
+import org.apache.spark.sql.hive.CarbonSessionCatalog
+import org.apache.spark.util.AlterTableUtil
+
+import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.features.TableOperation
+import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage}
+import 
org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl
+import org.apache.carbondata.core.metadata.datatype.DecimalType
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
+import 
org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, 
AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, 
OperationListenerBus}
+import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, 
TableInfo}
+import org.apache.carbondata.spark.util.DataTypeConverterUtil
+
+private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand(
+alterTableColRenameAndDataTypeChangeModel: 
AlterTableColRenameAndDataTypeChangeModel)
+  extends MetadataCommand {
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+val tableName = alterTableColRenameAndDataTypeChangeModel.tableName
+val dbName = alterTableColRenameAndDataTypeChangeModel.databaseName
+  .getOrElse(sparkSession.catalog.currentDatabase)
+var isColumnRenameOnly = false
+var isDataTypeChangeOnly = false
+var isBothColRenameAndDataTypeChange = false
--- End diff --

Using only 1 extra flag can serve the required purpose of the code. So make use 
of only two flags, `isDataTypeChangeOnly` plus the rename flag already available 
on the `alterTableColRenameAndDataTypeChangeModel` model, and modify the code 
logic accordingly.
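
A hedged sketch of that two-flag version (the `isColumnRename` accessor on the
model is assumed from the comment above; the local flag name is simplified):

    val isColumnRename = alterTableColRenameAndDataTypeChangeModel.isColumnRename
    var isDataTypeChange = false
    // isDataTypeChange is set while validating the requested new type
    if (isColumnRename && isDataTypeChange) {
      // both operations: two schema evolution entries to revert on failure
    } else if (isDataTypeChange) {
      // datatype change only
    } else {
      // column rename only
    }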


---


[GitHub] carbondata pull request #2897: [CARBONDATA-3080] Supporting local dictionary...

2018-12-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2897#discussion_r242423291
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java
 ---
@@ -94,10 +93,9 @@ public void fillVector(int[] invertedIndex, int[] 
invertedIndexReverse, byte[] d
   }
 
   @Override public void fillRow(int rowId, CarbonColumnVector vector, int 
vectorRow) {
-if (!dictionary.isDictionaryUsed()) {
-  vector.setDictionary(dictionary);
-  dictionary.setDictionaryUsed();
-}
+// always set dictionary otherwise
+// empty dictionary will get set if same col is called again in 
projection.
+vector.setDictionary(dictionary);
--- End diff --

@BJangir 
1. Please check and confirm if the same problem occurs with CarbonSession 
also.
2. Modify the PR description and specify the details of the bug fixed in this 
PR after completion of point 1.


---


[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support

2018-12-14 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2980
  
LGTM


---


[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support

2018-12-14 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2980
  
@manishnalla1994 ...Please rebase and remove the commit for PR 2979 as it 
is already merged


---


[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support

2018-12-12 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2980
  
LGTM...check for test case failures if any and fix


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-11 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240886853
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java
 ---
@@ -631,7 +620,9 @@ public void setFactTimeStamp(long factTimeStamp) {
   }
 
   public String[] getDelimiters() {
-return new String[] { complexDelimiterLevel1, complexDelimiterLevel2 };
+String[] delimiters = new String[complexDelimiters.size()];
+delimiters = complexDelimiters.toArray(delimiters);
+return delimiters;
--- End diff --

This method is not required. Conversion can be done wherever required.


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-11 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240886665
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java
 ---
@@ -65,8 +65,7 @@
   private String csvHeader;
   private String[] csvHeaderColumns;
   private String csvDelimiter;
-  private String complexDelimiterLevel1;
-  private String complexDelimiterLevel2;
+  private ArrayList complexDelimiters = new ArrayList<>();
--- End diff --

We can do lazy initialization in the setter method. This will avoid extra 
memory being consumed for non complex type schemas
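
A small sketch of the lazy initialization idea (the setter name is illustrative):

    private ArrayList<String> complexDelimiters;

    public void setComplexDelimiter(String delimiter) {
      // allocate only when a complex type actually needs a delimiter
      if (complexDelimiters == null) {
        complexDelimiters = new ArrayList<>();
      }
      complexDelimiters.add(delimiter);
    }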


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-10 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240152242
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/parser/CarbonParserFactory.java
 ---
@@ -51,23 +54,37 @@ public static GenericParser createParser(CarbonColumn 
carbonColumn, String[] com
*   delimiters
* @return GenericParser
*/
-  private static GenericParser createParser(CarbonColumn carbonColumn, 
String[] complexDelimiters,
+  private static GenericParser createParser(CarbonColumn carbonColumn,
+  Queue complexDelimiters,
   String nullFormat, int depth) {
+if (depth > 2) {
+  return null;
--- End diff --

I think we should throw exception with proper error message if depth is 
more than 2
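
Something along these lines (the exception type is an assumption; any loading
exception with a clear message would do):

    if (depth > 2) {
      // fail fast with a clear message instead of silently returning null
      throw new CarbonDataLoadingException(
          "Complex types are supported only up to 2 levels of nesting, found depth: " + depth);
    }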


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-10 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240126484
  
--- Diff: examples/spark2/src/main/resources/maptest2.csv ---
@@ -0,0 +1,2 @@
+1\002Nalla\0012\002Singh\0011\002Gupta\0014\002Kumar
+10\002Nallaa\00120\002Sissngh\001100\002Gusspta\00140\002Kumar
--- End diff --

Move the above newly added files to spark-common-test


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-10 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240151744
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java
 ---
@@ -119,6 +119,10 @@
 "complex_delimiter_level_2",
 Maps.getOrDefault(options, "complex_delimiter_level_2", ":"));
 
+optionsFinal.put(
+"complex_delimiter_level_3",
+Maps.getOrDefault(options, "complex_delimiter_level_3", 
"003"));
+
--- End diff --

Remove hardcoding


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-10 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240150644
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala
 ---
@@ -188,11 +188,13 @@ case class CarbonLoadDataCommand(
 val carbonLoadModel = new CarbonLoadModel()
 val tableProperties = 
table.getTableInfo.getFactTable.getTableProperties
 val optionsFinal = 
LoadOption.fillOptionWithDefaultValue(options.asJava)
+// These two delimiters are non configurable and hardcoded for map type
+optionsFinal.put("complex_delimiter_level_3", "\003")
+optionsFinal.put("complex_delimiter_level_4", "\004")
--- End diff --

Remove hardcoded values. Create one `ComplexDelimeterEnum` and use the 
values from there
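
One possible shape for the suggested enum (the level 3/4 values are taken from
the hardcoded literals above; the body itself is illustrative):

    public enum ComplexDelimiterEnum {
      COMPLEX_DELIMITER_LEVEL_3("\003"),
      COMPLEX_DELIMITER_LEVEL_4("\004");

      private final String value;

      ComplexDelimiterEnum(String value) {
        this.value = value;
      }

      public String value() {
        return value;
      }
    }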


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-10 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240153125
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java
 ---
@@ -34,8 +37,12 @@
   private int numberOfColumns;
 
   public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration 
configuration) {
-String[] complexDelimiters =
+String[] tempComplexDelimiters =
 (String[]) 
configuration.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS);
+Queue complexDelimiters = new LinkedList<>();
+for (int i = 0; i < 4; i++) {
--- End diff --

Avoid using hardcoded values like `4` in the loops while coding; instead 
iterate over the tempComplexDelimiters length
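
That is, something like this sketch:

    Queue<String> complexDelimiters = new LinkedList<>();
    // driven by the actual array length, not a hardcoded 4
    for (String delimiter : tempComplexDelimiters) {
      complexDelimiters.offer(delimiter);
    }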


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-10 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240151175
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java
 ---
@@ -222,8 +222,8 @@ public static CarbonDataLoadConfiguration 
createConfiguration(CarbonLoadModel lo
 configuration.setSegmentId(loadModel.getSegmentId());
 configuration.setTaskNo(loadModel.getTaskNo());
 
configuration.setDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS,
-new String[] { loadModel.getComplexDelimiterLevel1(),
-loadModel.getComplexDelimiterLevel2() });
+new String[] { loadModel.getComplexDelimiterLevel1(), 
loadModel.getComplexDelimiterLevel2(),
+loadModel.getComplexDelimiterLevel3(), 
loadModel.getComplexDelimiterLevel4() });
--- End diff --

Instead of adding these many delimiter methods in loadmodel, create a list 
or array of complex delimiters in loadmodel and then use it here


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-10 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240126217
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
 ---
@@ -338,12 +338,15 @@ public static CarbonLoadModel 
getLoadModel(Configuration conf) throws IOExceptio
 SKIP_EMPTY_LINE,
 
carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE)));
 
-String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":");
+String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":" + 
"," + "003");
--- End diff --

Move all the default delimiters to constants and use them everywhere.


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-10 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240147303
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
 ---
@@ -0,0 +1,452 @@
+/*
+
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+*
+http://www.apache.org/licenses/LICENSE-2.0
+*
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+package 
org.apache.carbondata.spark.testsuite.createTable.TestCreateDDLForComplexMapType
+
+import java.io.File
+import java.util
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import 
org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk
+
+class TestCreateDDLForComplexMapType extends QueryTest with 
BeforeAndAfterAll {
+  private val conf: Configuration = new Configuration(false)
+
+  val rootPath = new File(this.getClass.getResource("/").getPath
+  + "../../../..").getCanonicalPath
+
+  val path = s"$rootPath/examples/spark2/src/main/resources/maptest2.csv"
+
+  private def checkForLocalDictionary(dimensionRawColumnChunks: util
+  .List[DimensionRawColumnChunk]): Boolean = {
+var isLocalDictionaryGenerated = false
+import scala.collection.JavaConversions._
+for (dimensionRawColumnChunk <- dimensionRawColumnChunks) {
+  if (dimensionRawColumnChunk.getDataChunkV3
+.isSetLocal_dictionary) {
+isLocalDictionaryGenerated = true
+  }
--- End diff --

You can directly use scala filter operation to check for local dictionary
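
For example, a sketch of the same check collapsed to an exists():

    private def checkForLocalDictionary(
        dimensionRawColumnChunks: util.List[DimensionRawColumnChunk]): Boolean = {
      import scala.collection.JavaConverters._
      dimensionRawColumnChunks.asScala
        .exists(_.getDataChunkV3.isSetLocal_dictionary)
    }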


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-10 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240145684
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
 ---
@@ -338,12 +338,15 @@ public static CarbonLoadModel 
getLoadModel(Configuration conf) throws IOExceptio
 SKIP_EMPTY_LINE,
 
carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE)));
 
-String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":");
+String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":" + 
"," + "003");
 String[] split = complexDelim.split(",");
 model.setComplexDelimiterLevel1(split[0]);
 if (split.length > 1) {
   model.setComplexDelimiterLevel2(split[1]);
 }
+if (split.length > 2) {
+  model.setComplexDelimiterLevel3(split[2]);
+}
--- End diff --

Modify the above code as below
`if (split.length > 2) {
  model.setComplexDelimiterLevel2(split[1]);
  model.setComplexDelimiterLevel3(split[2]);
} else if (split.length > 1) {
  model.setComplexDelimiterLevel2(split[1]);
}`


---


[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support

2018-12-10 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2980#discussion_r240151368
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java
 ---
@@ -631,7 +651,7 @@ public void setFactTimeStamp(long factTimeStamp) {
   }
 
   public String[] getDelimiters() {
-return new String[] { complexDelimiterLevel1, complexDelimiterLevel2 };
+return new String[] { complexDelimiterLevel1, complexDelimiterLevel2, 
complexDelimiterLevel3 };
--- End diff --

Modify this class and related changes in other classes as per the above 
comment


---


[GitHub] carbondata issue #2968: [CARBONDATA-3141] Removed Carbon Table Detail Comman...

2018-12-04 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2968
  
LGTM


---


[GitHub] carbondata pull request #2940: [CARBONDATA-3116] Support set carbon.query.di...

2018-12-02 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2940#discussion_r238152763
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
 ---
@@ -463,6 +464,39 @@ class TestPreAggCreateCommand extends QueryTest with 
BeforeAndAfterAll {
 executorService.shutdown()
   }
 
+  test("support set carbon.query.directQueryOnDataMap.enabled=true") {
+val rootPath = new File(this.getClass.getResource("/").getPath
+  + "../../../..").getCanonicalPath
+val testData = 
s"$rootPath/integration/spark-common-test/src/test/resources/sample.csv"
+sql("drop table if exists mainTable")
+sql(
+  s"""
+ | CREATE TABLE mainTable
+ |   (id Int,
+ |   name String,
+ |   city String,
+ |   age Int)
+ | STORED BY 'org.apache.carbondata.format'
+  """.stripMargin);
+
+
+sql(
+  s"""
+ | LOAD DATA LOCAL INPATH '$testData'
+ | into table mainTable
+   """.stripMargin);
--- End diff --

for scala code semi-colon `;` is not required


---


[GitHub] carbondata issue #2956: [CARBONDATA-3134] fixed null values when cachelevel ...

2018-11-28 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2956
  
LGTM


---


[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...

2018-11-27 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2956#discussion_r236644573
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java
 ---
@@ -332,13 +334,42 @@ public void clear() {
   }
   SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper other =
   (SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper) obj;
-  return tableIdentifier.equals(other.tableIdentifier) && 
columnsInTable
-  .equals(other.columnsInTable) && Arrays
+  return tableIdentifier.equals(other.tableIdentifier) && 
checkColumnSchemaEquality(
+  columnsInTable, other.columnsInTable) && Arrays
   .equals(columnCardinality, other.columnCardinality);
 }
 
+private boolean checkColumnSchemaEquality(List obj1, 
List obj2) {
+  List clonedObj1 = new ArrayList<>(obj1);
--- End diff --

You can add the length check in the first line of the method...if the 
lengths of the 2 lists are not the same then we can return false from there itself
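
A sketch with the early length check folded in (it also drops the extra
addAll calls, since the copy constructor already copies the lists):

    private boolean checkColumnSchemaEquality(List<ColumnSchema> obj1,
        List<ColumnSchema> obj2) {
      // different sizes can never be equal, so return false straight away
      if (obj1 == null || obj2 == null || obj1.size() != obj2.size()) {
        return false;
      }
      List<ColumnSchema> clonedObj1 = new ArrayList<>(obj1);
      List<ColumnSchema> clonedObj2 = new ArrayList<>(obj2);
      Comparator<ColumnSchema> byId =
          Comparator.comparing(ColumnSchema::getColumnUniqueId);
      clonedObj1.sort(byId);
      clonedObj2.sort(byId);
      for (int i = 0; i < clonedObj1.size(); i++) {
        if (!clonedObj1.get(i).equalsWithColumnId(clonedObj2.get(i))) {
          return false;
        }
      }
      return true;
    }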


---


[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...

2018-11-27 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2956#discussion_r236646004
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java
 ---
@@ -332,13 +334,42 @@ public void clear() {
   }
   SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper other =
   (SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper) obj;
-  return tableIdentifier.equals(other.tableIdentifier) && 
columnsInTable
-  .equals(other.columnsInTable) && Arrays
+  return tableIdentifier.equals(other.tableIdentifier) && 
checkColumnSchemaEquality(
+  columnsInTable, other.columnsInTable) && Arrays
   .equals(columnCardinality, other.columnCardinality);
 }
 
+private boolean checkColumnSchemaEquality(List obj1, 
List obj2) {
+  List clonedObj1 = new ArrayList<>(obj1);
+  List clonedObj2 = new ArrayList<>(obj2);
+  clonedObj1.addAll(obj1);
+  clonedObj2.addAll(obj2);
+  Collections.sort(clonedObj1, new Comparator() {
+@Override public int compare(ColumnSchema o1, ColumnSchema o2) {
+  return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId());
+}
+  });
+  Collections.sort(clonedObj2, new Comparator() {
+@Override public int compare(ColumnSchema o1, ColumnSchema o2) {
+  return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId());
+}
+  });
+  boolean exists = true;
+  for (int i = 0; i < obj1.size(); i++) {
+if (!clonedObj1.get(i).equalsWithColumnId(clonedObj2.get(i))) {
+  exists = false;
+  break;
+}
+  }
+  return exists;
+}
+
 @Override public int hashCode() {
-  return tableIdentifier.hashCode() + columnsInTable.hashCode() + 
Arrays
+  int hashCode = 0;
+  for (ColumnSchema columnSchema: columnsInTable) {
+hashCode += columnSchema.hashCodeWithColumnId();
--- End diff --

rename the variable to `allColumnsHashCode`


---


[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

2018-11-22 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2923
  
LGTM


---


[GitHub] carbondata issue #2872: [CARBONDATA-3113] Fixed Local Dictionary Query Perfo...

2018-11-21 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2872
  
LGTM


---


[GitHub] carbondata issue #2863: [CARBONDATA-3112] Optimise decompressing while filli...

2018-11-20 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2863
  
LGTM


---


[GitHub] carbondata issue #2906: [CARBONDATA-3088][Compaction] support prefetch for c...

2018-11-20 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2906
  
LGTM


---


[GitHub] carbondata issue #2928: [CARBONDATA-3106] Written_by_APPNAME not serialized ...

2018-11-19 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2928
  
LGTM


---


[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

2018-11-18 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2923#discussion_r234499385
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala
 ---
@@ -437,6 +437,20 @@ test("Creation of partition table should fail if the 
colname in table schema and
 sql("drop datamap if exists preaggTable on table partitionTable")
   }
 
+  test("validate data in partition table after dropping and adding a 
column") {
+sql("drop table if exists par")
+sql("create table par(name string) partitioned by (age double) stored 
by " +
+  "'carbondata'")
+sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' 
into table par options" +
+s"('header'='false')")
+sql("alter table par drop columns(name)")
+sql("alter table par add columns(name string)")
+sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' 
into table par options" +
+s"('header'='false')")
--- End diff --

Keeping the partition column at the end is carbondata behavior, which may or may 
not be known to the user. For a normal table, whenever a column is dropped and 
added, the added column's data should either be placed as the last column in the 
csv file or be mapped through the fileheader, which is the correct behavior.
As you are using the same csv file in your test case without changing the 
order of data or providing the header, the above explained behavior might not hold 
true. Please revisit the changes and take the opinion of other PMCs/Committers 
on this behavioral change.


---


[GitHub] carbondata issue #2872: [WIP] Added reusable buffer code

2018-11-16 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2872
  
@kumarvishal09 ...In general, instead of passing the `ReusableDataBuffer` 
instance everywhere, can we make this class a singleton and create a `byte[]` 
pool, using the concept of an object pool? This will help in the following ways:
1. It will avoid too many method/code modifications, as wherever required we 
can take the object from the pool and keep the code simpler.
2. Currently it is being used for filling the decompressed data to avoid an 
array copy. In future it will be easier to make use of the reusable buffer 
object pool wherever required without making too many changes in the code.
We can discuss more on this and decide.
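
A bare-bones sketch of the singleton pool idea (class name, sizing and
eviction policy are all open points for the discussion):

    public final class ReusableBufferPool {
      private static final ReusableBufferPool INSTANCE = new ReusableBufferPool();
      private final java.util.Deque<byte[]> pool =
          new java.util.concurrent.ConcurrentLinkedDeque<>();

      private ReusableBufferPool() { }

      public static ReusableBufferPool getInstance() {
        return INSTANCE;
      }

      public byte[] borrow(int minSize) {
        byte[] buffer = pool.pollFirst();
        // hand out a fresh, bigger buffer when the pooled one is too small
        return (buffer == null || buffer.length < minSize) ? new byte[minSize] : buffer;
      }

      public void release(byte[] buffer) {
        pool.offerFirst(buffer);
      }
    }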


---


[GitHub] carbondata pull request #2872: [WIP] Added reusable buffer code

2018-11-16 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2872#discussion_r234176956
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/MeasureColumnChunkReader.java
 ---
@@ -56,13 +57,14 @@ MeasureRawColumnChunk readRawMeasureChunk(FileReader 
fileReader, int columnIndex
* @return
* @throws IOException
*/
-  ColumnPage decodeColumnPage(MeasureRawColumnChunk measureRawColumnChunk,
-  int pageNumber) throws IOException, MemoryException;
+  ColumnPage decodeColumnPage(MeasureRawColumnChunk measureRawColumnChunk, 
int pageNumber,
+  ReusableDataBuffer reusableDataBuffer) throws IOException, 
MemoryException;
 
   /**
* Decode raw data and fill the vector
*/
-  void decodeColumnPageAndFillVector(MeasureRawColumnChunk 
measureRawColumnChunk,
-  int pageNumber, ColumnVectorInfo vectorInfo) throws IOException, 
MemoryException;
+  void decodeColumnPageAndFillVector(MeasureRawColumnChunk 
measureRawColumnChunk, int pageNumber,
+  ColumnVectorInfo vectorInfo, ReusableDataBuffer reusableDataBuffer)
+  throws IOException, MemoryException;
--- End diff --

Can we have separate interface methods with and without 
`ReusableDataBuffer` for both the above methods? In many places null is being 
passed, which can be avoided using separate interface methods, and then the 
common code can be refactored for the classes implementing these methods.
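
For instance (a sketch; a Java 8 default method would keep existing
implementations and call sites compiling while the nulls are removed):

    ColumnPage decodeColumnPage(MeasureRawColumnChunk measureRawColumnChunk,
        int pageNumber, ReusableDataBuffer reusableDataBuffer)
        throws IOException, MemoryException;

    // overload without the buffer, so call sites stop passing null explicitly
    default ColumnPage decodeColumnPage(MeasureRawColumnChunk measureRawColumnChunk,
        int pageNumber) throws IOException, MemoryException {
      return decodeColumnPage(measureRawColumnChunk, pageNumber, null);
    }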


---


[GitHub] carbondata pull request #2872: [WIP] Added reusable buffer code

2018-11-16 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2872#discussion_r234201988
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/ZstdCompressor.java
 ---
@@ -74,4 +74,16 @@ public long maxCompressedLength(long inputSize) {
   public boolean supportUnsafe() {
 return false;
   }
+
+  @Override public int unCompressedLength(byte[] data, int offset, int 
length) {
+throw new RuntimeException("Unsupported operation Exception");
+  }
+
+  @Override public int rawUncompress(byte[] data, int offset, int length, 
byte[] output) {
+throw new RuntimeException("Unsupported operation Exception");
+  }
+
+  @Override public boolean supportReusableBuffer() {
+return false;
--- End diff --

Move the default implementation to Abstract class `AbstractCompressor`


---


[GitHub] carbondata issue #2918: [CARBONDATA-3098] Fix for negative exponents value g...

2018-11-14 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2918
  
LGTM


---


[GitHub] carbondata issue #2901: [CARBONDATA-3081] Fixed NPE for boolean type column ...

2018-11-13 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2901
  
LGTM


---


[GitHub] carbondata issue #2895: [HOTFIX] Fix NPE in spark, when same vector reads fi...

2018-11-13 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2895
  
LGTM


---


[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...

2018-11-06 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2898
  
@ravipesala in this method we are passing the `List`. The 
list contains the segment files only from the new table. So only the new index 
files will be loaded and queried, and the same is covered in the test case added 
as part of this PR.
The stale dataMap entries will never be referred to. For clearing them we need 
to configure the LRU cache size, which will automatically take care of eviction 
once the threshold is reached. If the LRU cache size is not configured then there 
is no mechanism to clear the stale datamaps.


---


[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...

2018-11-06 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2898
  
@ravipesala ...which method exactly are you referring to?...In all 
`getDataMap` methods the latest `carbonTable` object is passed and used for 
fetching the dataMaps...there is only one `getAllDataMaps` method which does not 
have any parameter, but that is used only in the test cases...


---


[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2898
  
retest this please


---


[GitHub] carbondata pull request #2902: [CARBONDATA-3083] Fixed data mismatch issue a...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2902#discussion_r231012403
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/UpdateCarbonTableTestCase.scala
 ---
@@ -772,12 +772,33 @@ class UpdateCarbonTableTestCase extends QueryTest 
with BeforeAndAfterAll {
 sql("""drop table if exists iud.dest33_part""")
   }
 
+  test("check data after update with row.filter pushdown as false") {
+CarbonProperties.getInstance().addProperty(CarbonCommonConstants
+  .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "false")
+sql("""drop table if exists iud.dest33_flat""")
+sql(
+  """create table iud.dest33_part (c1 int,c2 string, c3 short) STORED 
BY 'carbondata'"""
+.stripMargin)
+sql(
+  s"""LOAD DATA LOCAL INPATH '$resourcesPath/IUD/negativevalue.csv' 
INTO table iud
+ |.dest33_part options('header'='false')""".stripMargin)
+sql(
+  """update iud.dest33_part d set (c1) = (5) where d.c1 = 
0""".stripMargin).show()
+checkAnswer(sql("select c3 from iud.dest33_part"), Seq(Row(-300), 
Row(0), Row(-200), Row(700)
+  , Row(100), Row(-100), Row(null)))
+sql("""drop table if exists iud.dest33_part""")
+CarbonProperties.getInstance().addProperty(CarbonCommonConstants
+  .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "true")
+  }
+
   override def afterAll {
 sql("use default")
 sql("drop database  if exists iud cascade")
 CarbonProperties.getInstance()
   .addProperty(CarbonCommonConstants.isHorizontalCompactionEnabled , 
"true")
 CarbonProperties.getInstance()
   .addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER , "true")
+CarbonProperties.getInstance().addProperty(CarbonCommonConstants
+  .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "false")
--- End diff --

instead of hardcoding `"false"`, use the default value from constants


---


[GitHub] carbondata pull request #2902: [CARBONDATA-3083] Fixed data mismatch issue a...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2902#discussion_r231016130
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
 ---
@@ -257,7 +265,13 @@ private void fillVector(ColumnPage columnPage, 
CarbonColumnVector vector,
   } else if (pageDataType == DataTypes.SHORT) {
 short[] shortData = columnPage.getShortPage();
 if (vectorDataType == DataTypes.SHORT) {
-  vector.putShorts(0, pageSize, shortData, 0);
+  if (isUnderlyingVectorPresent) {
+for (int i = 0; i < pageSize; i++) {
+  vector.putShort(i, shortData[i]);
+}
+  } else {
+vector.putShorts(0, pageSize, shortData, 0);
--- End diff --

I think using `putShorts/putFloats` is common and unavoidable. In future, 
any new encoding class can make use of these methods and then the same 
problem can occur again. Is it feasible to modify the vector class 
implementation methods themselves, just like the example below?
`public void putShorts(int rowId, int count, short[] src, int srcIndex) {
  for (int i = srcIndex; i < srcIndex + count; i++) {
    putShort(rowId++, src[i]);
  }
}`
This way it will be better.


---


[GitHub] carbondata pull request #2902: [CARBONDATA-3083] Fixed data mismatch issue a...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2902#discussion_r231014429
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/UpdateCarbonTableTestCase.scala
 ---
@@ -772,12 +772,33 @@ class UpdateCarbonTableTestCase extends QueryTest 
with BeforeAndAfterAll {
 sql("""drop table if exists iud.dest33_part""")
   }
 
+  test("check data after update with row.filter pushdown as false") {
+CarbonProperties.getInstance().addProperty(CarbonCommonConstants
+  .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "false")
+sql("""drop table if exists iud.dest33_flat""")
+sql(
+  """create table iud.dest33_part (c1 int,c2 string, c3 short) STORED 
BY 'carbondata'"""
+.stripMargin)
+sql(
+  s"""LOAD DATA LOCAL INPATH '$resourcesPath/IUD/negativevalue.csv' 
INTO table iud
+ |.dest33_part options('header'='false')""".stripMargin)
+sql(
+  """update iud.dest33_part d set (c1) = (5) where d.c1 = 
0""".stripMargin).show()
+checkAnswer(sql("select c3 from iud.dest33_part"), Seq(Row(-300), 
Row(0), Row(-200), Row(700)
+  , Row(100), Row(-100), Row(null)))
+sql("""drop table if exists iud.dest33_part""")
+CarbonProperties.getInstance().addProperty(CarbonCommonConstants
+  .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "true")
--- End diff --

After test case completion we should set the default value for 
`CARBON_PUSH_ROW_FILTERS_FOR_VECTOR`...the default for this property is false, so 
I think there is no need to modify the property value at the start of the test case.


---


[GitHub] carbondata issue #2901: [CARBONDATA-3081] Fixed NPE for boolean type column ...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2901
  
LGTM...please check the CI failure


---


[GitHub] carbondata pull request #2902: [CARBONDATA-3083] Fixed data mismatch issue a...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2902#discussion_r231009938
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1734,7 +1734,7 @@ private CarbonCommonConstants() {
   public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR =
   "carbon.push.rowfilters.for.vector";
 
-  public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = 
"false";
+  public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = 
"true";
--- End diff --

Any specific reason for changing the default value?


---


[GitHub] carbondata issue #2895: [HOTFIX] Fix NPE in spark, when same vector reads fi...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2895
  
LGTM...can be merged once the build passes


---


[GitHub] carbondata pull request #2895: [HOTFIX] Fix NPE in spark, when same vector r...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2895#discussion_r230999455
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java
 ---
@@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] 
invertedIndexReverse, byte[] d
 int columnValueSize = dimensionDataChunkStore.getColumnValueSize();
 int rowsNum = data.length / columnValueSize;
 CarbonColumnVector vector = vectorInfo.vector;
-if (!dictionary.isDictionaryUsed()) {
-  vector.setDictionary(dictionary);
-  dictionary.setDictionaryUsed();
-}
+vector.setDictionary(dictionary);
--- End diff --

Both the methods are called in the carbon flow for vector filling. One is the 
direct fill case and the other one is the old vector fill flow. Please cross check once.


---


[GitHub] carbondata issue #2901: [CARBONDATA-3081] Fixed NPE for boolean type column ...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2901
  
Add PR description


---


[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2898
  
retest this please


---


[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2898
  
@xuchuanyin ...yes, this scenario will work fine. In case of dropping a normal 
table it will go through the CarbonSession flow, and the drop table command 
already takes care of clearing the datamaps.
In case of a fileFormat table drop, if the clear dataMap API is not 
integrated by the customer, the changes done in this PR will take care 
of referring only to the latest carbon table.


---


[GitHub] carbondata pull request #2901: [CARBONDATA-3081] Fixed NPE for boolean type ...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2901#discussion_r230818772
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java
 ---
@@ -171,13 +171,20 @@ public Object getCurrentValue() throws IOException, 
InterruptedException {
 rowCount += 1;
 Object[] row = new Object[carbonColumnarBatch.columnVectors.length];
 for (int i = 0; i < carbonColumnarBatch.columnVectors.length; i ++) {
+  Object data = carbonColumnarBatch.columnVectors[i].getData(batchIdx 
- 1);
   if (carbonColumnarBatch.columnVectors[i].getType() == 
DataTypes.STRING
   || carbonColumnarBatch.columnVectors[i].getType() == 
DataTypes.VARCHAR) {
-byte[] data = (byte[]) 
carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1);
-row[i] = ByteUtil.toString(data, 0, data.length);
+if (data == null) {
+  row[i] = null;
+} else {
+  row[i] = ByteUtil.toString((byte[]) data, 0, (((byte[]) 
data).length));
+}
   } else if (carbonColumnarBatch.columnVectors[i].getType() == 
DataTypes.BOOLEAN) {
-byte data = (byte) 
carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1);
-row[i] = ByteUtil.toBoolean(data);
+if (data == null) {
+  row[i] = null;
+} else {
+  row[i] = ByteUtil.toBoolean((byte) data);
+}
--- End diff --

For other dataTypes is the same handling of null required?...If required, 
then you can move the `data == null` check before the first if check, set 
the row to null when data is null, and continue.
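
Roughly, a sketch of the loop body with the check hoisted (the trailing else
for the remaining data types is an assumption):

    Object data = carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1);
    // one null check up front covers every data type branch below
    if (data == null) {
      row[i] = null;
      continue;
    }
    if (carbonColumnarBatch.columnVectors[i].getType() == DataTypes.STRING
        || carbonColumnarBatch.columnVectors[i].getType() == DataTypes.VARCHAR) {
      row[i] = ByteUtil.toString((byte[]) data, 0, ((byte[]) data).length);
    } else if (carbonColumnarBatch.columnVectors[i].getType() == DataTypes.BOOLEAN) {
      row[i] = ByteUtil.toBoolean((byte) data);
    } else {
      row[i] = data;
    }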


---


[GitHub] carbondata pull request #2901: [CARBONDATA-3081] Fixed NPE for boolean type ...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2901#discussion_r230816465
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1844,4 +1844,53 @@ public void testVectorReader() {
 }
   }
 
+  @Test
+  public void testReadingNullValues() {
+String path = "./testWriteFiles";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[2];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.BOOLEAN);
--- End diff --

Rename `shortField` to `booleanField`


---


[GitHub] carbondata issue #2895: [HOTFIX] Fix NPE in spark, when same vector reads fi...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2895
  
LGTM apart from a minor comment


---


[GitHub] carbondata pull request #2895: [HOTFIX] Fix NPE in spark, when same vector r...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2895#discussion_r230814030
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java
 ---
@@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] 
invertedIndexReverse, byte[] d
 int columnValueSize = dimensionDataChunkStore.getColumnValueSize();
 int rowsNum = data.length / columnValueSize;
 CarbonColumnVector vector = vectorInfo.vector;
-if (!dictionary.isDictionaryUsed()) {
-  vector.setDictionary(dictionary);
-  dictionary.setDictionaryUsed();
-}
+vector.setDictionary(dictionary);
--- End diff --

Is the same handling required in the `fillRow` method also in the same 
class?...If required, then the `isDictionaryUsed` and `setDictionaryUsed` APIs 
will not be required and those can also be removed from the interface.


---


[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2898
  
retest this please


---


[GitHub] carbondata issue #2897: [CARBONDATA-3080] Supporting local dictionary enable...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2897
  
@BJangir ...please go through the discussion and then decide whether for 
SDK by default we should enable local dictionary or not. We can further discuss 
it based on your view.

http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Feature-Proposal-Proposal-for-offline-and-DDL-local-dictionary-support-td67620.html


---


[GitHub] carbondata issue #2898: [CARBONDATA-3077] Fixed query failure in fileformat ...

2018-11-05 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2898
  
@xuchuanyin ...your point is correct. To explain this in detail:
1. We already have a way to clear the cached DataMaps through the API call 
`DataMapStoreManager.getInstance().clearDataMaps(AbsoluteTableIdentifier 
identifier)`. This API call ensures that for a given table all the dataMaps are 
cleared.
2. For the FileFormat case, if the above API is not integrated by the customer, 
there is a possibility that the drop table call will not reach the carbondata 
layer and there can be a few stale objects which can cause query failure.
The PR is raised to handle the 2nd case. The other stale DataMaps are already 
taken care of by the LRU cache, which will clear the stale entries once the LRU 
cache threshold is reached.
Let me know if you still have doubts.


---


[GitHub] carbondata pull request #2898: [WIP] Fixed query failure in fileformat due s...

2018-11-05 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2898

[WIP] Fixed query failure in fileformat due stale cache issue

**Problem**
While using the FileFormat API, if a table is created, dropped and then recreated 
with the same name, the query fails because of a schema mismatch issue.

**Analysis**
In case of carbondata used through the FileFormat API, once a table is dropped 
and recreated with the same name again, then, because the dataMap contains the 
stale carbon table, a schema mismatch exception is thrown.

**Solution**
To avoid such scenarios it is always better to update the carbon table 
object retrieved in the dataMapFactory with the latest table object.

 - [ ] Any interfaces changed?
No 
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
Added UT to verify the scenario
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata stale_carbon_table

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2898.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2898


commit 2b6789ee5464f90f43ecac3654e58424257eaa29
Author: m00258959 
Date:   2018-11-05T10:15:46Z

Fixed select query failure due to stale carbonTable in dataMapFactory class




---


[GitHub] carbondata issue #2869: [CARBONDATA-3057] Implement VectorizedReader for SDK...

2018-11-04 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2869
  
LGTM


---

