Here it is:
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2991198123660769/823198936734135/866038034322120/latest.html
On Wed, Apr 11, 2018 at 10:55 AM, Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:
Hi Shiyuan,
can you show us the output of "explain" over df (as a last step)?
On 11 April 2018 at 19:47, Shiyuan wrote:
Variable name binding is a Python thing, and Spark should not care how the
variable is named. What matters is the dependency graph. Spark fails to
handle this dependency graph correctly, which surprises me quite a bit: this
is just a simple combination of three very common SQL operations.
On Tue,
Hi Shiyuan,
I do not know whether I am right, but I would prefer to avoid expressions
in Spark such as:
df = <>
Regards,
Gourav Sengupta
On Tue, Apr 10, 2018 at 10:42 PM, Shiyuan wrote:
Here is a pretty-print of the physical plan, which reveals some details
about what causes the bug (see the lines highlighted in bold):
WithColumnRenamed() fails to update the dependency graph correctly:
'Resolved attribute(s) kk#144L missing from ID#118,LABEL#119,kk#96L,score#121
in operator !Pr
The Spark warning about Row instead of Dict is not the culprit. The problem
persists after I use Row instead of Dict to generate the dataframe.
Here is the explain() output for the reassignment of df that Gourav
suggested running. The plans look the same except that the serial numbers
following
Hi,
what I am curious about is the reassignment of df.
Can you please look into the explain plan of df after the statement df =
df.join(df_t.select("ID"),["ID"])? And then compare with the explain plan
of df1 after the statement df1 = df.join(df_t.select("ID"),["ID"])?
It's late here, but I am ye
Hi Spark Users,
The following code snippet produces an "attribute missing" error although the
attribute exists. This bug is triggered by a particular sequence of
"select", "groupby" and "join". Note that if I take away the "select" in
#line B, the code runs without error. However, the "select