Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19810#discussion_r154845669 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -193,38 +195,68 @@ case class InMemoryTableScanExec( private val inMemoryPartitionPruningEnabled = sqlContext.conf.inMemoryPartitionPruning + private def doFilterCachedBatches( + rdd: RDD[CachedBatch], + partitionStatsSchema: Seq[AttributeReference]): RDD[CachedBatch] = { + val schemaIndex = partitionStatsSchema.zipWithIndex + rdd.mapPartitionsWithIndex { + case (partitionIndex, cachedBatches) => + if (inMemoryPartitionPruningEnabled) { + cachedBatches.filter { cachedBatch => + val partitionFilter = newPredicate( + partitionFilters.reduceOption(And).getOrElse(Literal(true)), + partitionStatsSchema) + partitionFilter.initialize(partitionIndex) + if (!partitionFilter.eval(cachedBatch.stats)) { --- End diff -- I might not understand your proposal well...are you trying to simplify the logic in https://github.com/apache/spark/pull/19810/files#diff-5fc188468d3066580ea9a766114b8f1dR74? it would make the code simpler but degrade pruning effect here,
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org