The issue of UDFs that return structs being evaluated many times when the
returned struct's fields are accessed sounds like
https://issues.apache.org/jira/browse/SPARK-17728; that issue mentions a
trick of using *array* and *explode* to prevent project collapsing.
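The repeated evaluation can be reproduced in a toy, pure-Python model. This is not Spark code: `CountingUDF`, `run_collapsed`, and `run_uncollapsed` are hypothetical names, and the user-agent "parsing" is a placeholder. When `ua = udf(user_agent)` is collapsed into the projection that reads `ua.browser` and `ua.os`, the UDF call is copied into each field access. Keeping the two projections separate, which is what preventing project collapsing is meant to achieve, computes the struct once per row.

```python
from typing import Dict, List

Row = Dict[str, str]


class CountingUDF:
    """Hypothetical stand-in for an expensive UDF that returns a struct (a dict here)."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, user_agent: str) -> Dict[str, str]:
        self.calls += 1
        browser, _, os_name = user_agent.partition("/")  # placeholder parsing
        return {"browser": browser, "os": os_name}


def run_collapsed(rows: List[Row], udf: CountingUDF) -> List[Row]:
    # One merged projection: the inner alias ua = udf(user_agent) has been
    # inlined into each field access, so udf runs once per field per row.
    return [
        {
            "browser": udf(r["user_agent"])["browser"],
            "os": udf(r["user_agent"])["os"],
        }
        for r in rows
    ]


def run_uncollapsed(rows: List[Row], udf: CountingUDF) -> List[Row]:
    # Two projections kept apart: the struct is computed once per row,
    # and the outer projection only reads fields from the materialized value.
    inner = [{"ua": udf(r["user_agent"])} for r in rows]
    return [{"browser": r["ua"]["browser"], "os": r["ua"]["os"]} for r in inner]


rows = [{"user_agent": "firefox/linux"}, {"user_agent": "chrome/mac"}]
collapsed_udf, separate_udf = CountingUDF(), CountingUDF()
assert run_collapsed(rows, collapsed_udf) == run_uncollapsed(rows, separate_udf)
print(collapsed_udf.calls, separate_udf.calls)  # 4 2
```

With two input rows and two accessed fields, the collapsed version calls the UDF four times, while the separate projections call it twice.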
On Thu, Apr 20, 2017 at 8:55 AM Rey
Doesn't common subexpression elimination address this issue as well?
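For intuition, common subexpression elimination within one collapsed projection means that when both field accesses share the subexpression `udf(user_agent)`, it is evaluated once per row and the result is reused. The sketch below is a toy, pure-Python model of that idea, not Spark's implementation. The tuple expression format, `evaluate`, `project`, and `CountingUDF` are all hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# Toy expression trees (hashable tuples, not Spark's Expression classes):
#   ("col", name) | ("udf", child) | ("field", child, field_name)
Expr = Tuple[Any, ...]


class CountingUDF:
    """Hypothetical expensive UDF returning a struct (a dict here)."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, user_agent: str) -> Dict[str, str]:
        self.calls += 1
        browser, _, os_name = user_agent.partition("/")  # placeholder parsing
        return {"browser": browser, "os": os_name}


def evaluate(expr: Expr, row: Dict[str, Any], udf: CountingUDF,
             memo: Optional[Dict[Expr, Any]]) -> Any:
    # With a memo, structurally identical subexpressions are computed once per row.
    if memo is not None and expr in memo:
        return memo[expr]
    kind = expr[0]
    if kind == "col":
        value = row[expr[1]]
    elif kind == "udf":
        value = udf(evaluate(expr[1], row, udf, memo))
    elif kind == "field":
        value = evaluate(expr[1], row, udf, memo)[expr[2]]
    else:
        raise ValueError(f"unknown expression kind: {kind}")
    if memo is not None:
        memo[expr] = value
    return value


def project(rows, outputs: Dict[str, Expr], udf: CountingUDF, cse: bool):
    results = []
    for row in rows:
        memo: Optional[Dict[Expr, Any]] = {} if cse else None  # fresh cache per row
        results.append({name: evaluate(e, row, udf, memo) for name, e in outputs.items()})
    return results


# A collapsed projection: both outputs share the subexpression udf(user_agent).
ua = ("udf", ("col", "user_agent"))
outputs = {"browser": ("field", ua, "browser"), "os": ("field", ua, "os")}
rows = [{"user_agent": "firefox/linux"}, {"user_agent": "chrome/mac"}]

plain, shared = CountingUDF(), CountingUDF()
assert project(rows, outputs, plain, cse=False) == project(rows, outputs, shared, cse=True)
print(plain.calls, shared.calls)  # 4 2
```

In this model, sharing the subexpression brings the UDF calls back down to one per row, the same count as keeping the projections separate.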
On Thu, Apr 20, 2017 at 6:40 AM Herman van Hövell tot Westerflier <
hvanhov...@databricks.com> wrote:
Hi Michael,
This sounds like a good idea. Can you open a JIRA to track this?
My initial feedback on your proposal would be that you might want to
express the no_collapse at the expression level and not at the plan level.
HTH
On Thu, Apr 20, 2017 at 3:31 PM, Michael Styles wrote:
Hello,
I am in the process of putting together a PR that introduces a new hint
called NO_COLLAPSE. This hint is essentially identical to Oracle's NO_MERGE
hint.
Let me first give an example of why I am proposing this.
df1 = spark.createDataFrame([(1, "abc")], ["id", "user_agent"])
df2 = df1.wit