[GitHub] [spark] cloud-fan opened a new pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs

GitBox Tue, 28 May 2019 21:38:30 -0700

cloud-fan opened a new pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
URL: https://github.com/apache/spark/pull/24735
 
 
   ## What changes were proposed in this pull request?
   
   For simplicity, every `LambdaVariable` is currently globally unique, to avoid any potential conflicts. However, this causes a performance problem: encoder expressions that deal with collections (and therefore contain `LambdaVariable`s) can never hit the codegen cache, because two otherwise identical expressions always carry different variable identifiers.
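The toy Python model below (not Spark code) illustrates the cache-miss problem. It assumes, for illustration only, that a lambda variable's ID ends up in the generated source and that the cache is keyed on that source text; `LambdaVar`, `MapOverCollection`, `generate_code`, `compile_cached`, and `codegen_cache` are hypothetical names.

```python
import itertools
from dataclasses import dataclass

# Hypothetical global counter: every lambda variable ever created gets a new ID.
_global_ids = itertools.count()


@dataclass(frozen=True)
class LambdaVar:
    """Toy lambda variable (not Spark's LambdaVariable)."""
    name: str
    id: int


@dataclass(frozen=True)
class MapOverCollection:
    """Toy expression that applies `fn` to each element, bound to `var`."""
    var: LambdaVar
    fn: str


def build_encoder_expr() -> MapOverCollection:
    # Each construction draws a fresh, globally unique ID.
    return MapOverCollection(LambdaVar("x", next(_global_ids)), "upper")


def generate_code(expr: MapOverCollection) -> str:
    # Assumption: the ID becomes part of a variable name in the generated source.
    v = f"{expr.var.name}_{expr.var.id}"
    return f"for {v} in input: emit({expr.fn}({v}))"


# Toy codegen cache keyed on the generated source text.
codegen_cache: dict[str, str] = {}


def compile_cached(expr: MapOverCollection) -> str:
    src = generate_code(expr)
    if src not in codegen_cache:
        codegen_cache[src] = f"<compiled class #{len(codegen_cache)}>"
    return codegen_cache[src]


# Two queries build the same logical encoder expression...
compile_cached(build_encoder_expr())
compile_cached(build_encoder_expr())
# ...but their sources differ (x_0 vs x_1), so both are compiled.
print(len(codegen_cache))  # 2
```

Because the counter never resets, no two queries can ever produce the same source, so the cache only grows.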
   
   To overcome this problem, `LambdaVariable` IDs should be unique per query rather than globally. This PR does two things:
   1. Refactors `LambdaVariable` to carry an ID, so that the ID is easier to change.
   2. Adds an optimizer rule that reassigns `LambdaVariable` IDs so that they are unique within each query.
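A minimal sketch of the reassignment idea, again as a toy Python model rather than Spark's actual optimizer rule (whose name and traversal order the PR does not describe). `LambdaVar`, `MapOverCollection`, `Call`, and `reassign_lambda_ids` are hypothetical. The rewrite walks the tree in a fixed order and renumbers each distinct ID 0, 1, 2, ... on first sight, so structurally identical trees normalize to equal trees regardless of which global IDs they started with.

```python
import itertools
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class LambdaVar:
    """Toy lambda variable (not Spark's LambdaVariable)."""
    name: str
    id: int


@dataclass(frozen=True)
class Call:
    """Toy function call with a single argument."""
    fn: str
    arg: "Expr"


@dataclass(frozen=True)
class MapOverCollection:
    """Toy expression binding `var` for each element and evaluating `body`."""
    var: LambdaVar
    body: "Expr"


Expr = Union[LambdaVar, Call, MapOverCollection, str]


def reassign_lambda_ids(expr: Expr) -> Expr:
    """Renumber lambda-variable IDs 0, 1, 2, ... in order of first appearance."""
    mapping: dict[int, int] = {}
    counter = itertools.count()

    def new_var(v: LambdaVar) -> LambdaVar:
        # Distinct old IDs map to distinct new IDs, so uniqueness within the query holds.
        if v.id not in mapping:
            mapping[v.id] = next(counter)
        return LambdaVar(v.name, mapping[v.id])

    def rewrite(e: Expr) -> Expr:
        if isinstance(e, LambdaVar):
            return new_var(e)
        if isinstance(e, MapOverCollection):
            # Visit the binder before the body for a deterministic order.
            return MapOverCollection(new_var(e.var), rewrite(e.body))
        if isinstance(e, Call):
            return Call(e.fn, rewrite(e.arg))
        return e  # literals are left untouched

    return rewrite(expr)


# Pretend earlier queries already consumed IDs below 100.
_global_ids = itertools.count(100)


def build_encoder_expr() -> Expr:
    outer = LambdaVar("x", next(_global_ids))
    inner = LambdaVar("y", next(_global_ids))
    return MapOverCollection(outer, MapOverCollection(inner, Call("upper", inner)))


q1 = build_encoder_expr()  # IDs 100, 101
q2 = build_encoder_expr()  # IDs 102, 103
print(q1 == q2)  # False
print(reassign_lambda_ids(q1) == reassign_lambda_ids(q2))  # True
```

After normalization both queries yield the same tree, so anything keyed on it (such as generated code) can be reused across queries.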
   
   ## How was this patch tested?
   
   new tests

