maryannxue commented on a change in pull request #24580: [SPARK-27674][SQL] the 
hint should not be dropped after cache lookup
URL: https://github.com/apache/spark/pull/24580#discussion_r283468838
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
 ##########
 @@ -212,17 +213,18 @@ class CacheManager extends Logging {
   def useCachedData(plan: LogicalPlan): LogicalPlan = {
     val newPlan = plan transformDown {
       case command: IgnoreCachedData => command
-      // Do not lookup the cache by hint node. Hint node is special, we should 
ignore it when
-      // canonicalizing plans, so that plans which are same except hint can 
hit the same cache.
-      // However, we also want to keep the hint info after cache lookup. Here 
we skip the hint
-      // node, so that the returned caching plan won't replace the hint node 
and drop the hint info
-      // from the original plan.
-      case hint: ResolvedHint => hint
 
       case currentFragment =>
-        lookupCachedData(currentFragment)
-          .map(_.cachedRepresentation.withOutput(currentFragment.output))
-          .getOrElse(currentFragment)
+        lookupCachedData(currentFragment).map { cached =>
+          // After cache lookup, we should still keep the hints from the input 
plan.
+          val hints = 
EliminateResolvedHint.extractHintsFromPlan(currentFragment)._2
+          val cachedPlan = 
cached.cachedRepresentation.withOutput(currentFragment.output)
+          // The returned hint list is in top-down order. We should reverse it 
so that the top hint
+          // is still in the top node.
+          hints.reverse.foldLeft[LogicalPlan](cachedPlan) { case (p, hint) =>
 
 Review comment:
   We don't need to drop them, right? Hints are transparent in 
canonicalization. But I agree the inner hints don't matter, coz they will be 
replaced with a leaf node anyway.
   
   I'm wondering though, can we change the `lookupCachedData` instead? like:
   ```
   def lookupCachedData(plan: LogicalPlan): Option[CachedData] = plan match {
     case ResolvedHint(child, hints) => lookupCachedData(child).map(p => 
ResolvedHint(p, hints))
     case _ => cachedData.find(cd => plan.sameResult(cd.plan))
   }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to