[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [WIP] Improve partition pruning performance in presto carbon integration
CarbonDataQA1 commented on pull request #3913: URL: https://github.com/apache/carbondata/pull/3913#issuecomment-687335279 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2239/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [WIP] Improve partition pruning performance in presto carbon integration
CarbonDataQA1 commented on pull request #3913: URL: https://github.com/apache/carbondata/pull/3913#issuecomment-687331617 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3979/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-687324981 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2238/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-687324192 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3978/
[GitHub] [carbondata] ajantha-bhat opened a new pull request #3913: [WIP] Improve partition pruning performance in presto carbon integration
ajantha-bhat opened a new pull request #3913: URL: https://github.com/apache/carbondata/pull/3913 ### Why is this PR needed? a) For a table with 200K segments in the cloud, a presto partition query was taking more than 5 hours, because it was reading all segment files for partition pruning. Now it takes less than a minute! ### What changes were proposed in this PR? a) HiveTableHandle already has the partition specs matching the filters (it has queried the metastore to get all partitions and pruned them), so create the partitionSpec based on that. Handled for both prestodb and prestosql. b) #3885 broke prestodb compilation; only prestosql was compiled. c) #3887 also did not handle prestodb. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No [Need to add spark support and create better UT for presto, TODO] verified manually
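The core idea in this PR description is that the engine's table handle already carries the partition specs matched against the query filters (pruned via the metastore), so only files under those partition locations need to be considered, instead of reading every segment file. A minimal, illustrative Java sketch of that selection step (names and path layout are assumptions, not the actual CarbonData/Presto APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: select only files that live under partition
// directories already matched by the metastore, skipping the rest.
class PartitionPruner {
    static List<String> pruneFiles(List<String> allFiles, Set<String> matchedPartitionPaths) {
        List<String> selected = new ArrayList<>();
        for (String file : allFiles) {
            for (String partition : matchedPartitionPaths) {
                if (file.startsWith(partition)) {   // file is inside a matched partition dir
                    selected.add(file);
                    break;
                }
            }
        }
        return selected;
    }
}
```

With pre-pruned partition paths, the work scales with the matched partitions rather than with the total segment count, which is the claimed source of the 5-hours-to-under-a-minute improvement.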
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-687270749 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3977/
[GitHub] [carbondata] ajantha-bhat commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
ajantha-bhat commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-687269685 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-687269650 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2237/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-687265603 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2236/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-687265069 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3976/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3912: [WIP] Global sort partitions should be determined dynamically
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-687140085 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3975/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3912: [WIP] Global sort partitions should be determined dynamically
CarbonDataQA1 commented on pull request #3912: URL: https://github.com/apache/carbondata/pull/3912#issuecomment-687139428 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2235/
[GitHub] [carbondata] maheshrajus opened a new pull request #3912: [WIP] Global sort partitions should be determined dynamically
maheshrajus opened a new pull request #3912: URL: https://github.com/apache/carbondata/pull/3912 ### Why is this PR needed? [WIP] Global sort partitions should be determined dynamically ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-687014802 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2234/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
akashrn5 commented on a change in pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#discussion_r483475663 ## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ## @@ -2105,7 +2105,7 @@ public int getMaxSIRepairLimit(String dbName, String tableName) { // Check if user has enabled/disabled the use of property for the current db and table using // the set command String thresholdValue = getSessionPropertyValue( -CarbonCommonConstants.CARBON_LOAD_SI_REPAIR + "." + dbName + "." + tableName); Review comment: I think this was changed by mistake.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-687009894 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2233/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-687009718 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3974/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-687006962 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3973/
[GitHub] [carbondata] QiangCai commented on a change in pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
QiangCai commented on a change in pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#discussion_r483458848 ## File path: integration/spark/src/main/scala/org/apache/spark/util/PartitionCacheManger.scala ## @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.util + +import java.net.URI +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.log4j.Logger +import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTablePartition} + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.cache.{Cache, Cacheable, CarbonLRUCache} +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.SegmentFileStore +import org.apache.carbondata.core.statusmanager.SegmentStatusManager +import org.apache.carbondata.core.util.path.CarbonTablePath + +object PartitionCacheManager extends Cache[PartitionCacheKey, CacheablePartitionSpec] { + + private val CACHE = new CarbonLRUCache( +CarbonCommonConstants.CARBON_PARTITION_MAX_DRIVER_LRU_CACHE_SIZE, +CarbonCommonConstants.CARBON_MAX_LRU_CACHE_SIZE_DEFAULT) + + val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getName) + + def get(identifier: PartitionCacheKey): CacheablePartitionSpec = { +val cacheablePartitionSpec = + CACHE.get(identifier.tableId).asInstanceOf[CacheablePartitionSpec] +val tableStatusModifiedTime = FileFactory + .getCarbonFile(CarbonTablePath.getTableStatusFilePath(identifier.tablePath)) + .getLastModifiedTime +if (cacheablePartitionSpec != null) { + if (tableStatusModifiedTime > cacheablePartitionSpec.timestamp) { +readPartitions(identifier, tableStatusModifiedTime) + } else { +cacheablePartitionSpec + } +} else { + readPartitions(identifier, tableStatusModifiedTime) +} + } + + override def getAll(keys: util.List[PartitionCacheKey]): + util.List[CacheablePartitionSpec] = { +keys.asScala.map(get).toList.asJava + } + + override def getIfPresent(key: PartitionCacheKey): CacheablePartitionSpec = { +CACHE.get(key.tableId).asInstanceOf[CacheablePartitionSpec] + } + + override def invalidate(partitionCacheKey: PartitionCacheKey): Unit = 
{ +CACHE.remove(partitionCacheKey.tableId) + } + + private def readPartitions(identifier: PartitionCacheKey, tableStatusModifiedTime: Long) = { Review comment: Please check the query flow; it also uses incremental index loading. Maybe the segment info cache can be reused here.
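The `get` method quoted above follows a common cache pattern: look up the cached partition specs by table id, compare the cached timestamp against the table status file's last-modified time, and re-read only when the status file is newer. A minimal Java sketch of that pattern (class and method names here are hypothetical, not the actual `CarbonLRUCache` API):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the timestamp-checked cache lookup discussed above.
// PartitionCache/CachedSpec are illustrative names, not CarbonData classes.
class PartitionCache {
    static final class CachedSpec {
        final long timestamp;     // table status mtime when this entry was built
        final String partitions;  // stands in for the real partition specs
        CachedSpec(long ts, String p) { timestamp = ts; partitions = p; }
    }

    private final Map<String, CachedSpec> cache = new HashMap<>();
    private int reads = 0;

    // Re-read only when the table status file is newer than the cached entry,
    // mirroring the tableStatusModifiedTime check in PartitionCacheManager.get.
    CachedSpec get(String tableId, long tableStatusModifiedTime) {
        CachedSpec cached = cache.get(tableId);
        if (cached == null || tableStatusModifiedTime > cached.timestamp) {
            cached = readPartitions(tableStatusModifiedTime);
            cache.put(tableId, cached);
        }
        return cached;
    }

    void invalidate(String tableId) { cache.remove(tableId); }

    int readCount() { return reads; }

    private CachedSpec readPartitions(long ts) {
        reads++;  // simulates the expensive read of segment files
        return new CachedSpec(ts, "partitions@" + ts);
    }
}
```

Repeated lookups with an unchanged table status hit the cache; a newer modification time (a new load) triggers exactly one re-read.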
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3911: [wip] Fix update and delete issue when multiple partition columns are present
CarbonDataQA1 commented on pull request #3911: URL: https://github.com/apache/carbondata/pull/3911#issuecomment-686985522 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3972/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3911: [wip] Fix update and delete issue when multiple partition columns are present
CarbonDataQA1 commented on pull request #3911: URL: https://github.com/apache/carbondata/pull/3911#issuecomment-686984791 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2232/
[GitHub] [carbondata] kunal642 commented on a change in pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
kunal642 commented on a change in pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#discussion_r483445661 ## File path: integration/spark/src/main/scala/org/apache/spark/util/PartitionCacheManger.scala ## (same PartitionCacheManager hunk as quoted above) Review comment: ok got it
[GitHub] [carbondata] QiangCai commented on a change in pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
QiangCai commented on a change in pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#discussion_r483443382 ## File path: integration/spark/src/main/scala/org/apache/spark/util/PartitionCacheManger.scala ## (same PartitionCacheManager hunk as quoted above) Review comment: "readPartitions" will re-read all .segment files after a load; it would be better to load only the new .segment file.
[GitHub] [carbondata] kunal642 commented on a change in pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
kunal642 commented on a change in pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#discussion_r483419204 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala ## @@ -72,7 +72,9 @@ case class CarbonDatasourceHadoopRelation( projects: Seq[NamedExpression], filters: Array[Filter], partitions: Seq[PartitionSpec]): RDD[InternalRow] = { -val filterExpression: Option[Expression] = filters.flatMap { filter => +val reorderedFilter = filters.map(CarbonFilters.reorderFilter(_, carbonTable)).sortBy(_._2) +val filterExpression: Option[Expression] = reorderedFilter.flatMap { tuple => Review comment: done
[GitHub] [carbondata] kunal642 commented on a change in pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
kunal642 commented on a change in pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#discussion_r483419324 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala ## @@ -72,7 +72,9 @@ case class CarbonDatasourceHadoopRelation( projects: Seq[NamedExpression], filters: Array[Filter], partitions: Seq[PartitionSpec]): RDD[InternalRow] = { -val filterExpression: Option[Expression] = filters.flatMap { filter => +val reorderedFilter = filters.map(CarbonFilters.reorderFilter(_, carbonTable)).sortBy(_._2) Review comment: handled in a little different way, please check again ## File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ## @@ -373,6 +375,148 @@ object CarbonFilters { val carbonTable = CarbonEnv.getCarbonTable(identifier)(sparkSession) getPartitions(partitionFilters, sparkSession, carbonTable) } + + def getStorageOrdinal(filter: Filter, carbonTable: CarbonTable): Int = { +val column = filter.references.map(carbonTable.getColumnByName) +if (column.isEmpty) { + -1 +} else { + if (column.head.isDimension) { +column.head.getOrdinal + } else { +column.head.getOrdinal + carbonTable.getAllDimensions.size() + } +} + } + + def collectSimilarExpressions(filter: Filter, table: CarbonTable): Seq[(Filter, Int)] = { +filter match { + case sources.And(left, right) => +collectSimilarExpressions(left, table) ++ collectSimilarExpressions(right, table) + case sources.Or(left, right) => collectSimilarExpressions(left, table) ++ + collectSimilarExpressions(right, table) + case others => Seq((others, getStorageOrdinal(others, table))) +} + } + + /** + * This method will reorder the filter based on the Storage Ordinal of the column references. 
+ * + * Example1: + * And And + * Or And =>OrAnd + * col3 col1 col2 col1col1 col3col1 col2 + * + * **Mixed expression filter reordered locally, but wont be reordered globally.** + * + * Example2: + * And And + * And And => AndAnd + * col3 col1 col2 col1col1 col1col2 col3 + * + * OrOr + * Or Or =>OrOr + * col3 col1 col2 col1 col1 col1col2 col3 + * + * **Similar expression filters are reordered globally** + * + * @param filter the filter expression to be reordered + * @return The reordered filter with the current ordinal + */ + def reorderFilter(filter: Filter, table: CarbonTable): (Filter, Int) = { +val filterMap = mutable.HashMap[String, List[(Filter, Int)]]() +// If the filter size is one or the user has disabled reordering then no need to reorder. +if (filter.references.toSet.size == 1 || +!CarbonProperties.isFilterReorderingEnabled) { Review comment: agree, good catch
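The `getStorageOrdinal`/`reorderFilter` patch quoted above assigns each filter an ordinal from the referenced column's storage position — dimensions keep their ordinal, measures are offset by the dimension count — and sorts filters by it, so evaluation follows the on-disk column order. A simplified Java sketch of that ordering rule (the schema model here is a hypothetical stand-in for `CarbonTable`):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch of storage-ordinal-based filter reordering.
// ColumnMeta is a simplified stand-in for CarbonTable column metadata.
class FilterReorderer {
    record ColumnMeta(int ordinal, boolean isDimension) {}

    static int storageOrdinal(String column, Map<String, ColumnMeta> schema, int dimensionCount) {
        ColumnMeta meta = schema.get(column);
        if (meta == null) {
            return -1;  // unknown column reference, mirrors the -1 in getStorageOrdinal
        }
        // Dimensions come first in storage; measures follow after all dimensions.
        return meta.isDimension() ? meta.ordinal() : meta.ordinal() + dimensionCount;
    }

    static List<String> reorder(List<String> filterColumns, Map<String, ColumnMeta> schema,
                                int dimensionCount) {
        List<String> sorted = new ArrayList<>(filterColumns);
        sorted.sort(Comparator.comparingInt(c -> storageOrdinal(c, schema, dimensionCount)));
        return sorted;
    }
}
```

For example, with dimensions col1 (ordinal 0) and col3 (ordinal 1) and measure col2 (ordinal 0, effective ordinal 2), a filter over [col3, col2, col1] reorders to [col1, col3, col2].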
[GitHub] [carbondata] kunal642 commented on a change in pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
kunal642 commented on a change in pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#discussion_r483419069 ## File path: integration/spark/src/main/scala/org/apache/spark/util/PartitionCacheManger.scala ## (same PartitionCacheManager hunk as quoted above) Review comment: Do you mean caching of partitions after load?
[GitHub] [carbondata] Indhumathi27 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization
Indhumathi27 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-686944027 retest this please
[jira] [Created] (CARBONDATA-3971) Session level dynamic properties for repair(carbon.load.si.repair and carbon.si.repair.limit) are not updated in https://github.com/apache/carbondata/blob/master/doc
Chetan Bhat created CARBONDATA-3971: --- Summary: Session level dynamic properties for repair (carbon.load.si.repair and carbon.si.repair.limit) are not updated in https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md Key: CARBONDATA-3971 URL: https://issues.apache.org/jira/browse/CARBONDATA-3971 Project: CarbonData Issue Type: Bug Components: docs Affects Versions: 2.1.0 Reporter: Chetan Bhat Session level dynamic properties for repair (carbon.load.si.repair and carbon.si.repair.limit) are not mentioned in the GitHub link - https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md -- This message was sent by Atlassian Jira (v8.3.4#803005)