[GitHub] spark pull request: [SPARK-2405][SQL] Reuse same byte buffers whe...

marmbrus Tue, 08 Jul 2014 10:54:13 -0700

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1332#issuecomment-48375419
  
    @aarondav, `newInstance()` is used before we perform resolution to ensure 
that all expression ids in a plan are unique.  Consider the case where you 
self-join an `InMemoryRelation`: we need to know which side of the join a 
given attribute is coming from, so we produce unique instances of the relation 
before resolving attributes.
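
    To make the self-join case concrete, here is a minimal sketch. The 
classes below are toy stand-ins written for this comment, not the actual 
Catalyst classes; only the name `newInstance()` comes from the patch, and 
`relation` and `ambiguous` are hypothetical helpers.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy stand-ins for illustration only; these are NOT Spark's Catalyst classes.
object NewInstanceSketch {
  private val nextId = new AtomicLong(0)

  final case class ExprId(id: Long)
  final case class Attribute(name: String, exprId: ExprId)

  final case class Relation(table: String, output: Seq[Attribute]) {
    // Same columns, fresh expression ids: a distinct occurrence in the plan.
    def newInstance(): Relation =
      copy(output = output.map(a => a.copy(exprId = ExprId(nextId.getAndIncrement()))))
  }

  // Hypothetical helper: build a relation whose columns get unique ids.
  def relation(table: String, columns: String*): Relation =
    Relation(table, columns.map(c => Attribute(c, ExprId(nextId.getAndIncrement()))))

  // True when some expression id appears on both sides of a join, i.e. an
  // attribute reference could not be traced back to exactly one side.
  def ambiguous(left: Relation, right: Relation): Boolean =
    left.output.map(_.exprId).intersect(right.output.map(_.exprId)).nonEmpty
}
```

    With `val t = NewInstanceSketch.relation("t", "key", "value")`, 
`ambiguous(t, t)` is true, while `ambiguous(t, t.newInstance())` is false.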
    
    I thought about the possible concurrency issues, but I think they will 
only arise in edge cases (e.g. simultaneous self-join queries on a table that 
is cached but not yet materialized), and they will only result in double 
caching, not in correctness issues.  So I think this patch is strictly better 
than what we had before.
    
    That said, I guess we could probably fix it with a `SyncVar`...  I'll have 
to think about it some more.
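
    A rough sketch of what a `SyncVar`-based guard could look like, assuming 
all instances of a relation share one holder object.  `SharedBuffers` and 
`build` are hypothetical names and nothing here is taken from the patch.  Note 
that `scala.concurrent.SyncVar` is deprecated in newer Scala versions.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.SyncVar

object SharedBuffersSketch {
  // Hypothetical holder shared by every instance of a cached relation.
  final class SharedBuffers[T](build: () => T) {
    private val claimed = new AtomicBoolean(false)
    private val result = new SyncVar[T]

    def get: T = {
      // Only the first caller wins the claim and runs the expensive build.
      if (claimed.compareAndSet(false, true)) result.put(build())
      // Every caller, including the builder, reads the published value;
      // late callers block here until it has been set.
      // Caveat: if build() throws, waiting callers block forever.
      result.get
    }
  }
}
```

    Concurrent callers of `get` would then trigger a single build and wait 
for its result instead of caching twice.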


