GitHub user xwu0226 opened a pull request:

    https://github.com/apache/spark/pull/9326

    [SPARK-11246] [SQL] Table cache for Parquet broken in 1.5

    The root cause is that when spark.sql.hive.convertMetastoreParquet is true 
(the default), the cached InMemoryRelation for the ParquetRelation cannot be 
looked up in the cachedData of CacheManager: the key comparison fails even 
though the lookup plan is the same LogicalPlan, a Subquery wrapping the 
ParquetRelation.
    The fix in this PR overrides LogicalPlan.sameResult in the Subquery case 
class to eliminate the Subquery node first and compare the child (the 
ParquetRelation) directly, so the lookup finds the key of the cached 
InMemoryRelation.
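The idea behind the fix can be sketched with a minimal, self-contained model. The types below (`LogicalPlan`, `ParquetRelation`, `Subquery`, `stripSubquery`) are hypothetical simplified stand-ins, not Spark's actual Catalyst classes: the cache lookup uses `sameResult` as its key comparison, and `Subquery` overrides it to ignore the wrapping alias node.

```scala
// Simplified stand-in for Catalyst's LogicalPlan: sameResult decides whether
// two plans produce the same result, and is what a cache lookup would compare.
sealed trait LogicalPlan {
  def children: Seq[LogicalPlan]

  // Default comparison: strip any Subquery wrapper off the other side, then
  // rely on case-class structural equality.
  def sameResult(other: LogicalPlan): Boolean = this == stripSubquery(other)

  protected def stripSubquery(p: LogicalPlan): LogicalPlan = p match {
    case Subquery(_, child) => stripSubquery(child)
    case other              => other
  }
}

// Stand-in for a Parquet data source relation.
case class ParquetRelation(path: String) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Nil
}

case class Subquery(alias: String, child: LogicalPlan) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Seq(child)

  // The fix: eliminate the Subquery node first, then compare the child
  // directly, so Subquery(t, rel) matches rel (and another Subquery over rel).
  override def sameResult(other: LogicalPlan): Boolean =
    child.sameResult(stripSubquery(other))
}
```

With this override, a `Subquery("t", rel)` built during analysis compares equal (via `sameResult`) to the bare `rel` that was cached, which is the behavior the PR restores for the Parquet table cache.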

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xwu0226/spark spark-11246-commit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9326.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9326
    
----
commit 402d8e495d0fec01c3b7bb7fc8dcdf4efa56d1d2
Author: xin Wu <[email protected]>
Date:   2015-10-28T06:26:19Z

    [SPARK-11246] Table cache for Parquet broken in 1.5

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
