Packaging Java + Python library

2015-04-13 Thread Punya Biswal
Dear Spark users, My team is working on a small library that builds on PySpark and is organized like PySpark as well -- it has a JVM component (that runs in the Spark driver and executors) and a Python component (that runs in the PySpark driver and executor processes). What's a good approach ...

Spark profiler

2014-05-01 Thread Punya Biswal
Hi all, I am thinking of starting work on a profiler for Spark clusters. The current idea is that it would collect jstacks from executor nodes, put them into a central index (either a database or Elasticsearch), and present them in a UI that lets people slice and dice ...
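Before the collected jstacks can be sliced and diced, each dump has to be parsed into indexable records. A minimal sketch of that parsing step, assuming jstack output has already been fetched from the executors by some other means (the sample text and function name below are hypothetical, not from the thread):

```python
# Count how often each stack frame appears across threads in one jstack
# dump -- the kind of aggregate a profiler UI could index and query.
from collections import Counter

# Abridged, hypothetical jstack output for illustration.
SAMPLE_JSTACK = '''"Executor task launch worker-0" #25 daemon prio=5 runnable
   java.lang.Thread.State: RUNNABLE
\tat java.net.SocketInputStream.socketRead0(Native Method)
\tat java.net.SocketInputStream.read(SocketInputStream.java:150)

"Signal Dispatcher" #4 daemon prio=9 runnable
   java.lang.Thread.State: RUNNABLE
'''


def top_frames(jstack_output):
    """Tally stack frames ("at ..." lines) across all threads in a dump."""
    frames = Counter()
    for line in jstack_output.splitlines():
        line = line.strip()
        if line.startswith("at "):
            frames[line[3:]] += 1
    return frames
```

Aggregating many such dumps over time approximates a sampling profiler: frames that appear in most samples are where the executors spend their time.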

Re: Separating classloader management from SparkContexts

2014-03-19 Thread Punya Biswal
... .addJar(extras-v2.jar) print(sc2.filter(/* fn that depends on jar */).count) } ... even if classes in extras-v1.jar and extras-v2.jar have name collisions. Punya From: Punya Biswal pbis...@palantir.com Reply-To: user@spark.apache.org Date: Sunday, March 16, 2014 at 11:09 AM To: user ...

Maven repo for Spark pre-built with CDH4?

2014-03-18 Thread Punya Biswal
Hi all, The Maven Central repo contains an artifact for Spark 0.9.0 built against unmodified Hadoop, and the Cloudera repo contains an artifact for Spark 0.9.0 built against CDH 5 beta. Is there a repo that contains spark-core built against a non-beta version of CDH (such as 4.4.0)? Punya
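If such an artifact existed, consuming it would mean adding Cloudera's repository alongside the dependency. The POM fragment below is only a sketch of that shape: the repository URL is Cloudera's public repo, but the version string for a CDH4-built spark-core is hypothetical, since the thread is precisely asking whether that artifact is published anywhere:

```xml
<!-- Hypothetical: how a CDH4-built spark-core would be consumed if published.
     The repository URL is real; the version string is a guess. -->
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.0-cdh4.4.0</version>  <!-- hypothetical coordinate -->
  </dependency>
</dependencies>
```

Failing that, the usual fallback at the time was building Spark from source against the desired Hadoop/CDH version and installing the result into a local or internal repository.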