Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/19810#discussion_r154845669
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala
---
@@ -193,38 +195,68 @@ case class InMemoryTableScanExec(
private val inMemoryPartitionPruningEnabled =
sqlContext.conf.inMemoryPartitionPruning
+ private def doFilterCachedBatches(
+ rdd: RDD[CachedBatch],
+ partitionStatsSchema: Seq[AttributeReference]): RDD[CachedBatch] = {
+ val schemaIndex = partitionStatsSchema.zipWithIndex
+ rdd.mapPartitionsWithIndex {
+ case (partitionIndex, cachedBatches) =>
+ if (inMemoryPartitionPruningEnabled) {
+ cachedBatches.filter { cachedBatch =>
+ val partitionFilter = newPredicate(
+ partitionFilters.reduceOption(And).getOrElse(Literal(true)),
+ partitionStatsSchema)
+ partitionFilter.initialize(partitionIndex)
+ if (!partitionFilter.eval(cachedBatch.stats)) {
--- End diff --
I might not understand your proposal well...are you trying to simplify the
logic in
https://github.com/apache/spark/pull/19810/files#diff-5fc188468d3066580ea9a766114b8f1dR74?
it would make the code simpler but degrade pruning effect here,
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]