Re: non-deprecation compiler warnings are upgraded to build errors now

2015-07-24 Thread Punyashloka Biswal
Would it make sense to isolate the use of deprecation warnings to a subset of projects? That way we could turn on more stringent checks for the other ones. Punya On Thu, Jul 23, 2015 at 12:08 AM Reynold Xin r...@databricks.com wrote: Hi all, FYI, we just merged a patch that fails a build if

Re: PySpark on PyPi

2015-07-22 Thread Punyashloka Biswal
I agree with everything Justin just said. An additional advantage of publishing PySpark's Python code in a standards-compliant way is the fact that we'll be able to declare transitive dependencies (Pandas, Py4J) in a way that pip can use. Contrast this with the current situation, where
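A minimal sketch of what declaring PySpark's transitive dependencies in a pip-consumable way could look like. This is illustrative only: the version numbers, the Py4J pin, and treating Pandas as a hard requirement are assumptions for the example, not Spark's actual packaging metadata.

```python
# Hypothetical setup.py sketch (illustrative names and versions, not
# Spark's real metadata): declaring install_requires lets pip resolve
# transitive dependencies automatically, which a plain source checkout
# on PYTHONPATH cannot do.
from setuptools import setup, find_packages

setup(
    name="pyspark",
    version="1.4.0",            # assumed version for the example
    packages=find_packages(),
    install_requires=[
        "py4j==0.8.2.1",        # assumed pin; Py4J bridges Python to the JVM
        "pandas>=0.13",         # assumed floor; optional in real PySpark
    ],
)
```

With metadata like this, `pip install pyspark` would pull in Py4J and Pandas in one step, instead of users discovering missing modules at runtime.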

Re: Python UDF performance at large scale

2015-06-24 Thread Punyashloka Biswal
Hi Davies, In general, do we expect people to use CPython only for heavyweight UDFs that invoke an external library? Are there any examples of using Jython, especially performance comparisons to Java/Scala and CPython? When using Jython, do you expect the driver to send code to the executor as a

Re: Spark 1.4.0 pyspark and pylint breaking

2015-05-26 Thread Punyashloka Biswal
Davies: Can we use relative imports (from . import types) in the unit tests in order to disambiguate between the global and local module? Punya On Tue, May 26, 2015 at 3:09 PM Justin Uang justin.u...@gmail.com wrote: Thanks for clarifying! I don't understand python package and modules names that

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Punyashloka Biswal
of weeks :) Punya On Tue, May 19, 2015 at 12:39 PM Patrick Wendell pwend...@gmail.com wrote: Punya, Let me see if I can publish these under rc1 as well. In the future this will all be automated but currently it's a somewhat manual task. - Patrick On Tue, May 19, 2015 at 9:32 AM, Punyashloka

Re: Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Punyashloka Biswal
Is there a foolproof way to access methods exclusively (instead of picking between columns and methods at runtime)? Here are two ideas, neither of which seems particularly Pythonic - pyspark.sql.methods(df).name() - df.__methods__.name() Punya On Fri, May 8, 2015 at 10:06 AM Nicholas
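A toy stand-in (not pyspark) for the ambiguity under discussion: when `__getattr__` falls back to column lookup, a column whose name collides with a method can only be reached one way through dot access, while bracket access stays unambiguous. The `Record` class below is a hypothetical illustration of the mechanism.

```python
class Record:
    """Toy model of a DataFrame-like object where attribute access
    falls back to columns (illustrative only, not pyspark's code)."""

    def __init__(self, columns):
        self._columns = columns

    def __getattr__(self, item):
        # Only called when normal attribute lookup fails, so real
        # methods always shadow same-named columns under dot access.
        if item in self._columns:
            return self._columns[item]
        raise AttributeError(item)

    def __getitem__(self, item):
        # Bracket access is unambiguous: always a column.
        return self._columns[item]

    def count(self):
        # A real method; a column named 'count' can never reach the
        # caller through dot access.
        return len(self._columns)

df = Record({"count": [1, 2, 3], "city": ["a", "b"]})
print(df.count())    # the method wins: 2 columns
print(df["count"])   # brackets get the column: [1, 2, 3]
print(df.city)       # no method collision, so the column: ["a", "b"]
```

This is why the thread floats schemes like `df.__methods__.name()`: dot access alone cannot serve both namespaces once names collide, whereas `df[...]` already gives columns a collision-free path.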

Re: [build infra] quick downtime again tomorrow morning for DOCKER

2015-05-08 Thread Punyashloka Biswal
Just curious: will docker allow new capabilities for the Spark build? (Where can I read more?) Punya On Fri, May 8, 2015 at 10:00 AM shane knapp skn...@berkeley.edu wrote: this is happening now. On Thu, May 7, 2015 at 3:40 PM, shane knapp skn...@berkeley.edu wrote: yes, docker. that

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Punyashloka Biswal
I'm in favor of ending support for Java 6. We should also articulate a policy on how long we want to support current and future versions of Java after Oracle declares them EOL (Java 7 will be in that bucket in a matter of days). Punya On Thu, Apr 30, 2015 at 1:18 PM shane knapp

Re: Plans for upgrading Hive dependency?

2015-04-27 Thread Punyashloka Biswal
send a PR for that code. But at this time I don't really have plans to look at the thrift server. On Mon, Apr 27, 2015 at 11:58 AM, Punyashloka Biswal punya.bis...@gmail.com wrote: Dear Spark devs, Is there a plan for staying up-to-date with current (and future) versions of Hive

Re: Design docs: consolidation and discoverability

2015-04-27 Thread Punyashloka Biswal
that somehow this is solved by letting people make wikis. On Fri, Apr 24, 2015 at 7:42 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: Okay, I can understand wanting to keep Git history clean, and avoid bottlenecking on committers. Is it reasonable to establish

Plans for upgrading Hive dependency?

2015-04-27 Thread Punyashloka Biswal
Dear Spark devs, Is there a plan for staying up-to-date with current (and future) versions of Hive? Spark currently supports version 0.13 (June 2014), but the latest version of Hive is 1.1.0 (March 2015). I don't see any Jira tickets about updating beyond 0.13, so I was wondering if this was

Re: Design docs: consolidation and discoverability

2015-04-27 Thread Punyashloka Biswal
in a repo) is yet another approach we could take, though if we want to do that on the main Spark repo we'd need permission from Apache, which may be tough to get... On Mon, Apr 27, 2015 at 1:47 PM Punyashloka Biswal punya.bis...@gmail.com wrote: Nick, I like your idea of keeping it in a separate

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Punyashloka Biswal
at 2:57 PM, Sean Owen so...@cloudera.com wrote: Only catch there is it requires commit access to the repo. We need a way for people who aren't committers to write and collaborate (for point #1) On Fri, Apr 24, 2015 at 3:56 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: Sandy

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Punyashloka Biswal
follow. On Apr 24, 2015 8:14 AM, Punyashloka Biswal punya.bis...@gmail.com wrote: Dear Spark devs, Right now, design docs are stored on Google docs and linked from tickets. For someone new to the project, it's hard to figure out what subjects are being

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Punyashloka Biswal
the final design docs posted on JIRA. -Sandy On Fri, Apr 24, 2015 at 12:01 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: The Gradle dev team keep their design documents *checked into* their Git repository -- see https://github.com/gradle/gradle/blob/master/design-docs/build

Re: Graphical display of metrics on application UI page

2015-04-22 Thread Punyashloka Biswal
.js, you can possibly see it on the github. Here's a few of them https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=d3 Thanks Best Regards On Wed, Apr 22, 2015 at 8:08 AM, Punyashloka Biswal punya.bis...@gmail.com wrote: Dear Spark devs, Would people find it useful to have

Graphical display of metrics on application UI page

2015-04-21 Thread Punyashloka Biswal
Dear Spark devs, Would people find it useful to have a graphical display of metrics (such as duration, GC time, etc) on the application UI page? Has anybody worked on this before? Punya

Re: [discuss] new Java friendly InputSource API

2015-04-21 Thread Punyashloka Biswal
Reynold, thanks for this! At Palantir we're heavy users of the Java APIs and appreciate being able to stop hacking around with fake ClassTags :) Regarding this specific proposal, is the contract of RecordReader#get intended to be that it returns a fresh object each time? Or is it allowed to
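The fresh-versus-reused question matters because Hadoop-style readers traditionally reuse one mutable record object, so callers who merely hold references see every slot mutate to the last value. The `ReusingReader` below is a hypothetical Python sketch of that reuse contract, not the proposed InputSource API.

```python
class ReusingReader:
    """Toy reader that reuses a single mutable buffer across records,
    like Hadoop Writables (illustrative sketch, not the proposed API)."""

    def __init__(self, rows):
        self._rows = iter(rows)
        self._buf = {}

    def next(self):
        try:
            row = next(self._rows)
        except StopIteration:
            return False
        self._buf.clear()
        self._buf.update(row)
        return True

    def get(self):
        return self._buf  # the SAME object every call

reader = ReusingReader([{"k": 1}, {"k": 2}])
naive, copied = [], []
while reader.next():
    naive.append(reader.get())         # aliases the shared buffer
    copied.append(dict(reader.get()))  # defensive copy per record

# naive  -> [{"k": 2}, {"k": 2}]   every element mutated to the last row
# copied -> [{"k": 1}, {"k": 2}]   copies preserve each record
```

If the contract allows reuse, operations like `collect` or `cache` must copy records defensively; if `get` guarantees a fresh object, that copy (and its allocation cost) moves into the reader, which is exactly the trade-off the question probes.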