I don't know much about Python style, but I think the point Wes made about
usability on the JIRA is pretty powerful. IMHO, the number of methods on a
Spark DataFrame might not be much larger than on a Pandas DataFrame. Given that it
looks like users are okay with the possibility of collisions in Pandas, I
On Fri, May 8, 2015 at 12:18 AM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
Hi all,
In PySpark, a DataFrame column can be referenced using either df["abcd"]
(__getitem__) or df.abcd (__getattr__). There is a discussion on
SPARK-7035 about compatibility issues with the __getattr__ approach, and
I'd like to collect more input on this.
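To make the collision concrete, here is a toy model in plain Python. The class names are hypothetical and this is not how PySpark implements DataFrame. It relies only on standard Python behavior: `__getattr__` is called only after normal attribute lookup (instance dict, then class) has failed, so a method on the class always wins over a column of the same name, while `__getitem__` is never shadowed.

```python
class ToyColumn:
    """Hypothetical stand-in for a column reference."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Column<{self.name}>"


class ToyDataFrame:
    """Hypothetical DataFrame-like class; NOT PySpark's implementation."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getitem__(self, name):
        # df["abcd"]: always a column lookup, never shadowed by methods
        if name not in self.columns:
            raise KeyError(name)
        return ToyColumn(name)

    def __getattr__(self, name):
        # df.abcd: only reached when normal attribute lookup fails,
        # i.e. when no method or instance attribute has this name
        if name in self.columns:
            return ToyColumn(name)
        raise AttributeError(name)

    def count(self):
        # A method whose name collides with a column called "count";
        # returns a placeholder value for illustration
        return 0


df = ToyDataFrame(["abcd", "count"])
print(df.abcd)      # column via __getattr__
print(df["count"])  # column via __getitem__
print(df.count)     # the bound method, not the column
```

With a column named `count`, `df["count"]` still returns the column, but `df.count` resolves to the method. A method added in a later release would silently shadow an existing column in the same way, which is the compatibility concern raised on the JIRA.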
Basically, if in the future we introduce a new
Is there a foolproof way to access methods exclusively (instead of picking
between columns and methods at runtime)? Here are two ideas, neither of
which seems particularly Pythonic:
- pyspark.sql.methods(df).name()
- df.__methods__.name()
Punya
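The first idea can be sketched in plain Python as a proxy that resolves names against the object's class only, so the `__getattr__`-based column lookup is never consulted. `methods`, `_MethodsProxy`, and `Frame` are hypothetical names chosen for illustration; none of them is an existing PySpark API.

```python
class _MethodsProxy:
    """Hypothetical proxy exposing only attributes defined on the class."""

    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        cls = type(self._obj)
        # Walk the class hierarchy directly, skipping the instance's
        # __getattr__, so a column can never be returned by mistake
        for klass in cls.__mro__:
            if name in vars(klass):
                attr = vars(klass)[name]
                # Bind functions/descriptors to the wrapped object
                if hasattr(attr, "__get__"):
                    return attr.__get__(self._obj, cls)
                return attr
        raise AttributeError(f"{cls.__name__} has no method {name!r}")


def methods(obj):
    """Hypothetical helper in the spirit of pyspark.sql.methods(df)."""
    return _MethodsProxy(obj)


class Frame:
    """Hypothetical DataFrame-like class with attribute-style columns."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getattr__(self, name):
        if name in self.columns:
            return f"<column {name}>"
        raise AttributeError(name)

    def name(self):
        return "method result"


df = Frame(["name", "size"])
print(methods(df).name())  # always the method
print(df.size)             # column via __getattr__
```

The second idea, `df.__methods__`, could return the same proxy from a property, but names of the form `__*__` are reserved for Python's own use, which is one reason it reads as un-Pythonic.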
On Fri, May 8, 2015 at 10:06 AM Nicholas
And for the lazy, here's a link to SPARK-7035, which Xiangrui mentioned in
his initial email: https://issues.apache.org/jira/browse/SPARK-7035
On Fri, May 8, 2015 at 3:41 AM Xiangrui Meng men...@gmail.com wrote: