[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586667966

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2013/

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586663473

Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/310/
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379837579

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/command/CreateMaterializedViewCommand.scala ##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension.command
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.execution.command._
+
+import org.apache.carbondata.common.exceptions.sql.MalformedMaterializedViewException
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.datamap.{DataMapProvider, DataMapStoreManager}
+import org.apache.carbondata.core.datamap.status.DataMapStatusManager
+import org.apache.carbondata.core.metadata.schema.datamap.DataMapProperty
+import org.apache.carbondata.core.metadata.schema.table.DataMapSchema
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.datamap.DataMapManager
+import org.apache.carbondata.events._
+import org.apache.carbondata.mv.extension.MVDataMapProvider
+
+/**
+ * Create Materialized View Command implementation
+ * It will create the MV table, load the MV table (if deferred rebuild is false),
+ * and register the MV schema in [[DataMapStoreManager]]
+ */
+case class CreateMaterializedViewCommand(
+    mvName: String,
+    properties: Map[String, String],
+    queryString: Option[String],
+    ifNotExistsSet: Boolean = false,
+    deferredRebuild: Boolean = false)
+  extends AtomicRunnableCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+  private var dataMapProvider: DataMapProvider = _
+  private var dataMapSchema: DataMapSchema = _
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+
+    setAuditInfo(Map("mvName" -> mvName) ++ properties)
+
+    dataMapSchema = new DataMapSchema(mvName, MVDataMapProvider.MV_PROVIDER_NAME)
+    val property = properties.map(x => (x._1.trim, x._2.trim)).asJava
+    val javaMap = new java.util.HashMap[String, String](property)

Review comment:
It needs a mutable Map, while the parameter passed in is an immutable Map. I changed it to use mutable.Map now.
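The review comment above turns on a detail of Scala/Java collection interop: calling `asJava` on an immutable Scala `Map` yields a read-only Java view, so any later `put` fails at runtime. The following is a minimal sketch of the two usual ways around that, assuming a Scala 2.x toolchain where `scala.collection.JavaConverters` is available. The object `MutablePropertiesSketch` and its method names are hypothetical and are not taken from the PR.

```scala
import scala.collection.JavaConverters._
import scala.collection.mutable

// Hypothetical helper, for illustration only; not part of the PR.
object MutablePropertiesSketch {

  // Trim keys and values, mirroring the properties.map(...) call in the quoted diff.
  def trimmed(properties: Map[String, String]): Map[String, String] =
    properties.map { case (k, v) => (k.trim, v.trim) }

  // Option A: copy the read-only view into a java.util.HashMap,
  // which accepts later put() calls.
  def toJavaHashMap(properties: Map[String, String]): java.util.Map[String, String] =
    new java.util.HashMap[String, String](trimmed(properties).asJava)

  // Option B: build a scala.collection.mutable.Map first;
  // its asJava view writes through to the Scala map.
  def toMutableView(properties: Map[String, String]): java.util.Map[String, String] = {
    val scalaMap = mutable.Map[String, String]()
    scalaMap ++= trimmed(properties)
    scalaMap.asJava
  }
}
```

Either option lets downstream code add entries to the property map. Option A decouples the Java map from the Scala one, while Option B keeps a single backing collection.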
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3623: [CARBONDATA-3692] Support NoneCompression during loading data.
CarbonDataQA1 commented on issue #3623: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3623#issuecomment-586617116

Can one of the admins verify this patch?
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586616331

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2012/
[GitHub] [carbondata] Pickupolddriver opened a new pull request #3623: Origin/none compress
Pickupolddriver opened a new pull request #3623: Origin/none compress
URL: https://github.com/apache/carbondata/pull/3623

### Why is this PR needed?
In some cases, data needs to remain uncompressed after it is loaded into a CarbonData file. The current version of the project does not support loading data without compression. Skipping the compress and uncompress steps could speed up data loading from Flink to OBS.

### What changes were proposed in this PR?
Provide a new compressor, NoneCompressor, implementing AbstractCompressor, which does not actually compress or uncompress anything. This compressor can be selected by calling
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "none");

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes
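To illustrate the idea of a compressor that deliberately does nothing, here is a minimal pass-through sketch. It assumes a simplified, hypothetical `ByteCompressor` trait; CarbonData's real `AbstractCompressor` is not reproduced here and may expose a different set of methods. Only the name `"none"` is taken from the PR description.

```scala
// Hypothetical, simplified interface for illustration only.
// It is NOT CarbonData's actual compressor API.
trait ByteCompressor {
  def name: String
  def compress(input: Array[Byte]): Array[Byte]
  def uncompress(input: Array[Byte]): Array[Byte]
}

// A pass-through compressor: both directions return the input unchanged.
object NoneCompressorSketch extends ByteCompressor {
  override val name: String = "none"

  // No compression: hand back the same bytes without copying.
  override def compress(input: Array[Byte]): Array[Byte] = input

  // No decompression: the stored bytes already are the raw data.
  override def uncompress(input: Array[Byte]): Array[Byte] = input
}
```

Returning the input array itself avoids an extra copy, which fits the stated goal of speeding up loading. A caller that mutates the array afterwards would need a defensive copy instead.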
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586609945

Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/309/
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379837579

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/command/CreateMaterializedViewCommand.scala ##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension.command
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.execution.command._
+
+import org.apache.carbondata.common.exceptions.sql.MalformedMaterializedViewException
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.datamap.{DataMapProvider, DataMapStoreManager}
+import org.apache.carbondata.core.datamap.status.DataMapStatusManager
+import org.apache.carbondata.core.metadata.schema.datamap.DataMapProperty
+import org.apache.carbondata.core.metadata.schema.table.DataMapSchema
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.datamap.DataMapManager
+import org.apache.carbondata.events._
+import org.apache.carbondata.mv.extension.MVDataMapProvider
+
+/**
+ * Create Materialized View Command implementation
+ * It will create the MV table, load the MV table (if deferred rebuild is false),
+ * and register the MV schema in [[DataMapStoreManager]]
+ */
+case class CreateMaterializedViewCommand(
+    mvName: String,
+    properties: Map[String, String],
+    queryString: Option[String],
+    ifNotExistsSet: Boolean = false,
+    deferredRebuild: Boolean = false)
+  extends AtomicRunnableCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+  private var dataMapProvider: DataMapProvider = _
+  private var dataMapSchema: DataMapSchema = _
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+
+    setAuditInfo(Map("mvName" -> mvName) ++ properties)
+
+    dataMapSchema = new DataMapSchema(mvName, MVDataMapProvider.MV_PROVIDER_NAME)
+    val property = properties.map(x => (x._1.trim, x._2.trim)).asJava
+    val javaMap = new java.util.HashMap[String, String](property)

Review comment:
fixed
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379837509

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/command/CreateMaterializedViewCommand.scala ##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension.command
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.execution.command._
+
+import org.apache.carbondata.common.exceptions.sql.MalformedMaterializedViewException
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.datamap.{DataMapProvider, DataMapStoreManager}
+import org.apache.carbondata.core.datamap.status.DataMapStatusManager
+import org.apache.carbondata.core.metadata.schema.datamap.DataMapProperty
+import org.apache.carbondata.core.metadata.schema.table.DataMapSchema
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.datamap.DataMapManager
+import org.apache.carbondata.events._
+import org.apache.carbondata.mv.extension.MVDataMapProvider
+
+/**
+ * Create Materialized View Command implementation
+ * It will create the MV table, load the MV table (if deferred rebuild is false),
+ * and register the MV schema in [[DataMapStoreManager]]
+ */
+case class CreateMaterializedViewCommand(
+    mvName: String,
+    properties: Map[String, String],
+    queryString: Option[String],
+    ifNotExistsSet: Boolean = false,
+    deferredRebuild: Boolean = false)
+  extends AtomicRunnableCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+  private var dataMapProvider: DataMapProvider = _
+  private var dataMapSchema: DataMapSchema = _
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+
+    setAuditInfo(Map("mvName" -> mvName) ++ properties)
+
+    dataMapSchema = new DataMapSchema(mvName, MVDataMapProvider.MV_PROVIDER_NAME)
+    val property = properties.map(x => (x._1.trim, x._2.trim)).asJava

Review comment:
fixed
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379837428

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/MVHelper.scala ##
@@ -0,0 +1,487 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension
+
+import java.util
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{CarbonEnv, SparkSession}
+import org.apache.spark.sql.catalyst.{CarbonParserUtil, TableIdentifier}
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Cast, Coalesce, Expression, Literal, ScalaUDF}
+import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Average}
+import org.apache.spark.sql.catalyst.plans.logical.{Join, Limit, LogicalPlan}
+import org.apache.spark.sql.execution.command.{Field, PartitionerField, TableModel, TableNewProcessor}
+import org.apache.spark.sql.execution.command.table.{CarbonCreateTableCommand, CarbonDropTableCommand}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types.{ArrayType, DateType, MapType, StructType}
+import org.apache.spark.util.{DataMapUtil, PartitionUtils}
+
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.metadata.datatype.DataTypes
+import org.apache.carbondata.core.metadata.schema.datamap.{DataMapClassProvider, DataMapProperty}
+import org.apache.carbondata.core.metadata.schema.table.{CarbonTable, DataMapSchema, RelationIdentifier}
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.datamap.DataMapManager
+import org.apache.carbondata.mv.plans.modular.{GroupBy, ModularPlan}
+import org.apache.carbondata.mv.plans.util.SQLBuilder
+import org.apache.carbondata.mv.rewrite.{SummaryDatasetCatalog, Utils}
+import org.apache.carbondata.mv.timeseries.{TimeSeriesFunction, TimeSeriesUtil}
+import org.apache.carbondata.spark.util.CommonUtil
+
+/**
+ * Utility for MV datamap operations.
+ */
+object MVHelper {
+
+  def createMVDataMap(
+      sparkSession: SparkSession,
+      dataMapSchema: DataMapSchema,
+      queryString: String,
+      ifNotExistsSet: Boolean = false): Unit = {
+    val dmProperties = dataMapSchema.getProperties.asScala
+    if (dmProperties.contains("streaming") && dmProperties("streaming").equalsIgnoreCase("true")) {
+      throw new MalformedCarbonCommandException(
+        s"Materialized view does not support streaming"
+      )
+    }
+    val mvUtil = new MVUtil
+    mvUtil.validateDMProperty(dmProperties)
+    val logicalPlan = Utils.dropDummyFunc(
+      MVParser.getMVPlan(queryString, sparkSession))
+    // if there is limit in MV ctas query string, throw exception, as its not a valid usecase
+    logicalPlan match {
+      case Limit(_, _) =>
+        throw new MalformedCarbonCommandException("Materialized view does not support the query " +
+          "with limit")
+      case _ =>
+    }
+    val selectTables = getTables(logicalPlan)
+    if (selectTables.isEmpty) {
+      throw new MalformedCarbonCommandException(
+        s"Non-Carbon table does not support creating MV datamap")
+    }
+    val modularPlan = validateMVQuery(sparkSession, logicalPlan)
+    val updatedQueryWithDb = modularPlan.asCompactSQL
+    val (timeSeriesColumn, granularity): (String, String) = validateMVTimeSeriesQuery(
+      logicalPlan,
+      dataMapSchema)
+    val fullRebuild = isFullReload(logicalPlan)
+    var counter = 0
+    // the ctas query can have duplicate columns, so we should take distinct and create fields,
+    // so that it won't fail during create mv table
+    val fields = logicalPlan.output.map { attr =>
+
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379837416

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/MVHelper.scala ##
@@ -0,0 +1,487 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension
+
+import java.util
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{CarbonEnv, SparkSession}
+import org.apache.spark.sql.catalyst.{CarbonParserUtil, TableIdentifier}
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Cast, Coalesce, Expression, Literal, ScalaUDF}
+import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Average}
+import org.apache.spark.sql.catalyst.plans.logical.{Join, Limit, LogicalPlan}
+import org.apache.spark.sql.execution.command.{Field, PartitionerField, TableModel, TableNewProcessor}
+import org.apache.spark.sql.execution.command.table.{CarbonCreateTableCommand, CarbonDropTableCommand}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types.{ArrayType, DateType, MapType, StructType}
+import org.apache.spark.util.{DataMapUtil, PartitionUtils}
+
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.metadata.datatype.DataTypes
+import org.apache.carbondata.core.metadata.schema.datamap.{DataMapClassProvider, DataMapProperty}
+import org.apache.carbondata.core.metadata.schema.table.{CarbonTable, DataMapSchema, RelationIdentifier}
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.datamap.DataMapManager
+import org.apache.carbondata.mv.plans.modular.{GroupBy, ModularPlan}
+import org.apache.carbondata.mv.plans.util.SQLBuilder
+import org.apache.carbondata.mv.rewrite.{SummaryDatasetCatalog, Utils}
+import org.apache.carbondata.mv.timeseries.{TimeSeriesFunction, TimeSeriesUtil}
+import org.apache.carbondata.spark.util.CommonUtil
+
+/**
+ * Utility for MV datamap operations.
+ */
+object MVHelper {
+
+  def createMVDataMap(
+      sparkSession: SparkSession,
+      dataMapSchema: DataMapSchema,
+      queryString: String,
+      ifNotExistsSet: Boolean = false): Unit = {
+    val dmProperties = dataMapSchema.getProperties.asScala
+    if (dmProperties.contains("streaming") && dmProperties("streaming").equalsIgnoreCase("true")) {
+      throw new MalformedCarbonCommandException(
+        s"Materialized view does not support streaming"
+      )
+    }
+    val mvUtil = new MVUtil
+    mvUtil.validateDMProperty(dmProperties)
+    val logicalPlan = Utils.dropDummyFunc(
+      MVParser.getMVPlan(queryString, sparkSession))
+    // if there is limit in MV ctas query string, throw exception, as its not a valid usecase
+    logicalPlan match {
+      case Limit(_, _) =>
+        throw new MalformedCarbonCommandException("Materialized view does not support the query " +
+          "with limit")
+      case _ =>
+    }
+    val selectTables = getTables(logicalPlan)
+    if (selectTables.isEmpty) {
+      throw new MalformedCarbonCommandException(
+        s"Non-Carbon table does not support creating MV datamap")
+    }
+    val modularPlan = validateMVQuery(sparkSession, logicalPlan)
+    val updatedQueryWithDb = modularPlan.asCompactSQL
+    val (timeSeriesColumn, granularity): (String, String) = validateMVTimeSeriesQuery(
+      logicalPlan,
+      dataMapSchema)
+    val fullRebuild = isFullReload(logicalPlan)
+    var counter = 0
+    // the ctas query can have duplicate columns, so we should take distinct and create fields,
+    // so that it won't fail during create mv table
+    val fields = logicalPlan.output.map { attr =>
+
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379834201

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/MVExtensionSqlParser.scala ##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension
+
+import org.apache.spark.sql.{CarbonEnv, CarbonUtils, SparkSession}
+import org.apache.spark.sql.catalyst.parser.ParserInterface
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkSqlParser
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.util.CarbonException
+
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.spark.util.CarbonScalaUtil
+
+/**
+ * Parser for Materialized View related command
+ */
+class MVExtensionSqlParser(
+    conf: SQLConf,
+    sparkSession: SparkSession,
+    initialParser: ParserInterface
+) extends SparkSqlParser(conf) {
+
+  val parser = new MVParser
+
+  override def parsePlan(sqlText: String): LogicalPlan = {
+    parser.synchronized {
+      CarbonEnv.getInstance(sparkSession)
+    }
+    CarbonUtils.updateSessionInfoToCurrentThread(sparkSession)
+    try {
+      val plan = parser.parse(sqlText)
+      plan
+    } catch {
+      case ce: MalformedCarbonCommandException =>
+        throw ce
+      case ex: Throwable =>
+        try {
+          val parsedPlan = initialParser.parsePlan(sqlText)
+          CarbonScalaUtil.cleanParserThreadLocals
+          parsedPlan
+        } catch {
+          case mce: MalformedCarbonCommandException =>
+            throw mce
+          case e: Throwable =>
+            e.printStackTrace(System.err)

Review comment:
removed
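The parser quoted above follows a try-then-delegate pattern: the MV parser gets the first attempt, a malformed-command error is rethrown as-is, and any other failure hands the statement to the parser Spark supplied. The sketch below isolates that control flow with plain functions. All names in it (`ParserFallbackSketch`, `MalformedCommandException`, `Parser`) are hypothetical stand-ins, and it catches `NonFatal` rather than `Throwable`, which is a deliberate deviation from the quoted code so that fatal JVM errors are not swallowed.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-ins for the types in the quoted diff; illustration only.
object ParserFallbackSketch {

  final class MalformedCommandException(msg: String) extends RuntimeException(msg)

  // A "parser" here is just a function from SQL text to a plan description.
  type Parser = String => String

  // Try the primary parser; rethrow malformed-command errors unchanged,
  // and delegate to the fallback parser on any other non-fatal failure.
  def parseWithFallback(primary: Parser, fallback: Parser)(sqlText: String): String =
    try {
      primary(sqlText)
    } catch {
      case e: MalformedCommandException => throw e
      case NonFatal(_) => fallback(sqlText)
    }
}
```

Rethrowing the malformed-command error before the generic case matters: it keeps a precise, user-facing message from being replaced by whatever the fallback parser reports for syntax it does not know.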
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379834185

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/MVExtension.scala ##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension
+
+import org.apache.spark.sql.{SparkSession, SparkSessionExtensions, SQLConf}
+import org.apache.spark.sql.catalyst.parser.ParserInterface
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+import org.apache.carbondata.mv.rewrite.MVUdf
+import org.apache.carbondata.mv.timeseries.TimeSeriesFunction
+
+// Materialized View extension for Apache Spark
+//
+// Following SQL command are added:
+// 1. CREATE MATERIALIZED VIEW
+// 2. DROP MATERIALIZED VIEW
+// 3. SHOW MATERIALIZED VIEW
+// 4. REFRESH MATERIALIZED VIEW
+//
+// Following optimizer rules are added:
+// 1. Rewrite SQL statement by matching existing MV and
+//    select the lowest cost MV
+//
+class MVExtension extends (SparkSessionExtensions => Unit) {
+
+  override def apply(extensions: SparkSessionExtensions): Unit = {
+    // MV parser
+    extensions.injectParser(
+      (sparkSession: SparkSession, parser: ParserInterface) =>
+        new MVExtensionSqlParser(new SQLConf, sparkSession, parser))
+
+    // MV optimizer rules
+    extensions.injectPostHocResolutionRule(
+      (session: SparkSession) => OptimizerRule(session) )
+  }
+}
+
+case class OptimizerRule(session: SparkSession) extends Rule[LogicalPlan] {
+  self =>
+
+  var initialized = false
+
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    if (!initialized) {
+      self.synchronized {
+        if (!initialized) {
+          initialized = true
+
+          addMVUdf(session)
+
+          val sessionState = session.sessionState
+          val field = sessionState.getClass.getDeclaredField("optimizer")
+          field.setAccessible(true)
+          field.set(sessionState,
+            new MVRules(session, sessionState.catalog, sessionState.optimizer))
+        }
+      }
+    }
+    plan
+  }
+
+  private def addMVUdf(sparkSession: SparkSession) = {
+    // added for handling MV table creation. when user will fire create ddl for

Review comment:
This is required; it is also used for MV. I will change the comment.
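The OptimizerRule hunk quoted above runs its setup once, using a check-lock-check on a plain `var` and setting the flag before the setup work. As a hedged illustration only (not the PR's code), the sketch below shows one common variant of that pattern: the flag is `@volatile` so the unsynchronized fast-path read observes the latest write, and it is set only after setup has completed. The class name OneTimeInit and its `setup` parameter are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper illustrating double-checked one-time initialization.
final class OneTimeInit(setup: () => Unit) {
  // @volatile so the lock-free first check sees writes from other threads.
  @volatile private var initialized = false

  def ensureInitialized(): Unit = {
    if (!initialized) {
      this.synchronized {
        if (!initialized) {
          setup()
          // Publish the flag only after setup has finished, so no thread
          // skips ahead while initialization is still in progress.
          initialized = true
        }
      }
    }
  }
}

object OneTimeInitDemo {
  def main(args: Array[String]): Unit = {
    val counter = new AtomicInteger(0)
    val init = new OneTimeInit(() => { counter.incrementAndGet(); () })
    val threads = (1 to 8).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = init.ensureInitialized()
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(counter.get()) // 1: setup ran exactly once
  }
}
```

With this ordering, a setup that throws leaves the flag unset, so a later call retries it; whether that is desirable for an optimizer rule is a design choice this sketch does not settle.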
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379834081

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/MatchMaker.scala ##
@@ -17,7 +17,7 @@ package org.apache.carbondata.mv.rewrite

-import org.apache.spark.Logging
+import org.apache.spark.internal.Logging

Review comment:
This needs to be changed; org.apache.spark.internal.Logging is the standard way to log in the Spark project.
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379834004

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/MVExtension.scala ##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension
+
+import org.apache.spark.sql.{SparkSession, SparkSessionExtensions, SQLConf}
+import org.apache.spark.sql.catalyst.parser.ParserInterface
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+import org.apache.carbondata.mv.rewrite.MVUdf
+import org.apache.carbondata.mv.timeseries.TimeSeriesFunction
+
+// Materialized View extension for Apache Spark
+//
+// Following SQL command are added:
+// 1. CREATE MATERIALIZED VIEW
+// 2. DROP MATERIALIZED VIEW
+// 3. SHOW MATERIALIZED VIEW
+// 4. REFRESH MATERIALIZED VIEW
+//
+// Following optimizer rules are added:
+// 1. Rewrite SQL statement by matching existing MV and
+//    select the lowest cost MV
+//
+class MVExtension extends (SparkSessionExtensions => Unit) {
+
+  override def apply(extensions: SparkSessionExtensions): Unit = {
+    // MV parser
+    extensions.injectParser(
+      (sparkSession: SparkSession, parser: ParserInterface) =>
+        new MVExtensionSqlParser(new SQLConf, sparkSession, parser))
+
+    // MV optimizer rules
+    extensions.injectPostHocResolutionRule(
+      (session: SparkSession) => OptimizerRule(session) )
+  }
+}
+
+case class OptimizerRule(session: SparkSession) extends Rule[LogicalPlan] {

Review comment:
fixed
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379833852

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/MVExtension.scala ##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension
+
+import org.apache.spark.sql.{SparkSession, SparkSessionExtensions, SQLConf}
+import org.apache.spark.sql.catalyst.parser.ParserInterface
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+import org.apache.carbondata.mv.rewrite.MVUdf
+import org.apache.carbondata.mv.timeseries.TimeSeriesFunction
+
+// Materialized View extension for Apache Spark

Review comment:
fixed
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379833604

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/MVDataMapProvider.scala ##
@@ -207,3 +207,7 @@ class MVDataMapProvider(
   override def supportRebuild(): Boolean = true
 }
+
+object MVDataMapProvider {

Review comment:
fixed
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586585818 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2011/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586579921 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/308/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586577486 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2010/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586577379 Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/307/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow
CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow URL: https://github.com/apache/carbondata/pull/3622#issuecomment-586575134 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2009/
[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
Pickupolddriver commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379815554

## File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala ##
@@ -272,6 +272,79 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with
     }
   }
 
+  test("test current none compressor on legacy store with snappy") {
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "snappy")
+    createTable()
+    loadData()
+
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "none")
+    loadData()
+    checkAnswer(sql(s"SELECT count(*) FROM $tableName"), Seq(Row(16)))

Review comment:
So you want to change all the test cases in this class from SELECT count(*) to SELECT *?
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow
CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow URL: https://github.com/apache/carbondata/pull/3622#issuecomment-586569602 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/306/
[GitHub] [carbondata] marchpure commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow
marchpure commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow URL: https://github.com/apache/carbondata/pull/3622#issuecomment-586568016 retest this please