Re: Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Shivaram Venkataraman
I don't know much about Python style, but I think the point Wes made about usability on the JIRA is pretty powerful. IMHO the number of methods on a Spark DataFrame might not be much larger than on a Pandas DataFrame. Given that it looks like users are okay with the possibility of collisions in Pandas I

Re: Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Xiangrui Meng
On Fri, May 8, 2015 at 12:18 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: I don't know much about Python style, but I think the point Wes made about usability on the JIRA is pretty powerful. IMHO the number of methods on a Spark DataFrame might not be much larger than on a Pandas DataFrame.

Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Xiangrui Meng
Hi all, in PySpark, a DataFrame column can be referenced using df["abcd"] (__getitem__) or df.abcd (__getattr__). There is a discussion on SPARK-7035 about compatibility issues with the __getattr__ approach, and I want to collect more input on this. Basically, if in the future we introduce a new
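A minimal sketch of the collision being discussed, using a hypothetical ToyDataFrame class (not PySpark's actual implementation). Python only calls __getattr__ after normal attribute lookup fails, so if a method is ever added whose name matches an existing column, df.<name> silently starts returning the method, while df["<name>"] keeps returning the column.

```python
class ToyDataFrame:
    """Hypothetical DataFrame stand-in; illustrates lookup order only."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getitem__(self, name):
        # Bracket access always means "column", so it is unambiguous.
        if name not in self.columns:
            raise KeyError(name)
        return f"Column<{name}>"

    def __getattr__(self, name):
        # Reached only when normal lookup fails, so real methods and
        # attributes with the same name always take precedence.
        if name in self.__dict__.get("columns", []):
            return self[name]
        raise AttributeError(name)

    def count(self):
        # A method that happens to share its name with a column.
        return len(self.columns)


df = ToyDataFrame(["abcd", "count"])
print(df["abcd"])   # Column<abcd>
print(df.abcd)      # Column<abcd>
print(df["count"])  # Column<count>
print(df.count)     # the bound method, not the "count" column
```

In this sketch df["count"] is the only attribute-independent way to reach the shadowed column.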

Re: Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Punyashloka Biswal
Is there a foolproof way to access methods exclusively (instead of picking between columns and methods at runtime)? Here are two ideas, neither of which seems particularly Pythonic:
- pyspark.sql.methods(df).name()
- df.__methods__.name()
Punya
On Fri, May 8, 2015 at 10:06 AM Nicholas
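A rough sketch of how the first idea could behave, assuming a hypothetical methods() helper (modeled on the proposal, not an existing PySpark function) and a hypothetical ColumnFirstFrame whose attribute access prefers columns. The proxy resolves names only from the class's own definitions via the descriptor protocol, so a column can never shadow a method reached through it.

```python
class ColumnFirstFrame:
    """Hypothetical DataFrame whose attribute access prefers columns."""

    def __init__(self, columns):
        self._columns = list(columns)

    def __getattribute__(self, name):
        # Columns win over everything else in this toy design.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return f"Column<{name}>"
        return object.__getattribute__(self, name)

    def name(self):
        return "method: name"


class _MethodsOnly:
    """Hypothetical proxy that exposes only class-level attributes."""

    __slots__ = ("_obj",)

    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, attr_name):
        cls = type(self._obj)
        # Walk the MRO so instance-level/column lookups are never consulted.
        for klass in cls.__mro__:
            if attr_name in vars(klass):
                attr = vars(klass)[attr_name]
                getter = getattr(type(attr), "__get__", None)
                # Bind functions/properties to the wrapped instance.
                return getter(attr, self._obj, cls) if getter else attr
        raise AttributeError(f"{cls.__name__} has no method {attr_name!r}")


def methods(obj):
    """Hypothetical helper: methods(df).name() always means the method."""
    return _MethodsOnly(obj)


df = ColumnFirstFrame(["abcd", "name"])
print(df.name)              # Column<name>
print(methods(df).name())   # method: name
```

One trade-off of this design: names defined on the proxy class itself (such as __init__) would shadow same-named DataFrame methods, which is presumably why dunder-style spellings like df.__methods__ were also floated.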

Re: Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Nicholas Chammas
And a link to SPARK-7035 https://issues.apache.org/jira/browse/SPARK-7035 (which Xiangrui mentioned in his initial email) for the lazy. On Fri, May 8, 2015 at 3:41 AM Xiangrui Meng men...@gmail.com wrote: On Fri, May 8, 2015 at 12:18 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: