viirya commented on pull request #29942:
URL: https://github.com/apache/spark/pull/29942#issuecomment-703851082


   > @viirya just to clarify, is it to avoid calling the same `from_json` 
multiple times? How does it relate to 
[SPARK-32939](https://issues.apache.org/jira/browse/SPARK-32939) and 
[SPARK-32943](https://issues.apache.org/jira/browse/SPARK-32943)?
   
   This patch specifically targets one pattern: a `CreateNamedStruct` over 
multiple `GetStructField`s of the same `JsonToStructs`. The pattern can be 
produced by the optimizer or written manually by users.
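   
   To make the pattern concrete, here is a minimal toy model in plain Python. 
It is not Spark code: `from_json`, `naive_struct` and `single_parse_struct` 
are hypothetical stand-ins for `JsonToStructs`, the duplicated 
`CreateNamedStruct` + `GetStructField` shape, and the shape where the shared 
parse is evaluated once. It only counts how often the JSON string is parsed.

```python
import json

# Mutable counter so we can observe how many times the JSON text is parsed.
calls = {"from_json": 0}


def from_json(text):
    """Hypothetical stand-in for JsonToStructs: parse a JSON string."""
    calls["from_json"] += 1
    return json.loads(text)


def naive_struct(text, fields):
    # Models CreateNamedStruct over GetStructField(JsonToStructs(text), f):
    # each field extraction re-evaluates the same from_json call.
    return {f: from_json(text).get(f) for f in fields}


def single_parse_struct(text, fields):
    # Same result, but the shared from_json is evaluated only once.
    parsed = from_json(text)
    return {f: parsed.get(f) for f in fields}


row = '{"a": 1, "b": "x", "c": 2.5}'
fields = ["a", "b", "c"]

calls["from_json"] = 0
naive = naive_struct(row, fields)
naive_calls = calls["from_json"]      # one parse per extracted field

calls["from_json"] = 0
single = single_parse_struct(row, fields)
single_calls = calls["from_json"]     # one parse in total
```

   Both functions return the same struct; the difference is only the number 
of parses, which grows with the number of extracted fields in the naive shape.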
   
   Sometimes the query optimizer rewrites a query so that it contains many 
duplicated expressions, e.g. `JsonToStructs`. That is what SPARK-32943 wants 
to fix; it targets a broader problem.
   
   As for SPARK-32939, I did not report it, so I might be missing some details 
beyond its description. Overall, we don't de-duplicate expressions in 
whole-stage codegen (only in specific operators). If we disable whole-stage 
codegen, the interpreted Project de-duplicates expressions in some cases 
(`GenerateUnsafeProjection`), but not always (it can also fall back to 
`InterpretedUnsafeProjection`). For specific expressions like `CaseWhen`, we 
have a chance to de-duplicate the condition expressions, if we want.
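
   As a rough illustration of what de-duplicating expressions means in an 
interpreted evaluator, the sketch below caches results by structural equality 
of expression subtrees. It is a toy in plain Python under stated assumptions: 
`Expr`, `evaluate` and the op names are hypothetical and do not mirror how 
`GenerateUnsafeProjection` or any other Spark class is implemented.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical expression node; frozen so equal subtrees hash equally."""
    op: str                        # "col", "lit", "upper" or "concat"
    args: Tuple["Expr", ...] = ()  # child expressions
    value: object = None           # payload for "col" / "lit"


def evaluate(expr, row, cache=None, stats=None):
    """Evaluate expr against row; with a cache, equal subtrees run once."""
    if cache is not None and expr in cache:
        return cache[expr]
    if stats is not None:
        # Count only real evaluations, not cache hits.
        stats[expr.op] = stats.get(expr.op, 0) + 1
    if expr.op == "col":
        result = row[expr.value]
    elif expr.op == "lit":
        result = expr.value
    elif expr.op == "upper":
        result = evaluate(expr.args[0], row, cache, stats).upper()
    elif expr.op == "concat":
        result = "".join(evaluate(a, row, cache, stats) for a in expr.args)
    else:
        raise ValueError(f"unknown op: {expr.op}")
    if cache is not None:
        cache[expr] = result
    return result


# The same upper(col("name")) subtree appears twice in the projection.
dup_a = Expr("upper", (Expr("col", value="name"),))
dup_b = Expr("upper", (Expr("col", value="name"),))
project = Expr("concat", (dup_a, Expr("lit", value="-"), dup_b))
row = {"name": "spark"}

plain_stats, cached_stats = {}, {}
plain = evaluate(project, row, None, plain_stats)   # no de-duplication
cached = evaluate(project, row, {}, cached_stats)   # per-row cache
```

   With the cache, the duplicated `upper` subtree is evaluated once per row 
instead of twice; the cache must be fresh for each row, since results depend 
on the input.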

