Hi,
I am debugging a program, and for some reason, a line calling the following is
failing:
df.filter("sum(OpenAccounts) > 5").show
It says it cannot find the column OpenAccounts, as if it were applying the sum()
function and looking for a column with that name, which does not exist. This
works fine if I rename the column to something without parentheses.
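For context, here is a minimal sketch of the situation (the DataFrame, the grouping column, and the threshold are illustrative; only the "sum(OpenAccounts)" column name is the real one from my job):

```scala
import org.apache.spark.sql.functions.sum

// Aggregation with the default result column name, which in 1.6 is
// "sum(OpenAccounts)" -- i.e. it contains parentheses.
val agg = df.groupBy("CustomerId").agg(sum("OpenAccounts"))

// The filter expression is parsed as SQL, so this fails: it is read as
// the aggregate function sum() applied to a column named OpenAccounts.
agg.filter("sum(OpenAccounts) > 5").show()
```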
I can't reproduce this issue in the Spark shell (1.6.0); any ideas on how I can
analyze this? The DataFrame is an aggregation result, with the default column
names left as-is.
PS: My workaround is to use toDF(cols) to rename all the columns, but I am
wondering whether toDF has any impact on the underlying RDD structure (e.g.
repartitioning, caching, etc.)
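Concretely, the workaround looks like this (new column names are illustrative; "agg" stands for the aggregation result above):

```scala
// Rename every column so none of them contains parentheses,
// then filter on the plain name.
val renamed = agg.toDF("CustomerId", "OpenAccountsSum")
renamed.filter("OpenAccountsSum > 5").show()
```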
Appreciated,
Saif