Re: Building Spark with a Custom Version of Hadoop: HDFS ClassNotFoundException

2016-02-11 Thread Ted Yu
The Hdfs class is in hadoop-hdfs-XX.jar. Can you check the classpath to see if that jar is there? Please describe the command lines you used for building Hadoop and Spark. Cheers On Thu, Feb 11, 2016 at 5:15 PM, Charlie Wright wrote: > I am having issues trying to run a test job on a built ver
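
One way to perform the classpath check suggested in this reply is sketched below. It is only an illustration: the helper name find_hdfs_jars and the example paths are hypothetical, and the sketch looks at literal jar entries only, so it does not expand wildcard entries such as a directory followed by /*.

```python
import os
import re


def find_hdfs_jars(classpath, sep=os.pathsep):
    """Return classpath entries whose file name looks like hadoop-hdfs-<version>.jar."""
    pattern = re.compile(r"^hadoop-hdfs-[^/\\]*\.jar$")
    return [
        entry
        for entry in classpath.split(sep)
        if entry and pattern.match(os.path.basename(entry))
    ]


# Hypothetical classpath string; substitute the one your Spark job actually uses.
example = "/opt/spark/jars/spark-core.jar:/opt/hadoop/share/hadoop-hdfs-2.7.1.jar"
print(find_hdfs_jars(example, sep=":"))
```

An empty result would suggest that the jar is missing from the classpath, which is consistent with a ClassNotFoundException for the Hdfs class.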

Building Spark with a Custom Version of Hadoop: HDFS ClassNotFoundException

2016-02-11 Thread Charlie Wright
I am having issues trying to run a test job on a built version of Spark with a custom Hadoop JAR. My custom Hadoop version runs without issues, and I can run jobs from a precompiled version of Spark (with Hadoop) without problems. However, whenever I try to run the same Spark example on the Spark versi

Spark Summit San Francisco 2016 call for presentations (CFP)

2016-02-11 Thread Reynold Xin
FYI, the call for presentations is now open for Spark Summit. The event will take place on June 6-8 in San Francisco. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, business value, Spark ecosystem and research. Please submit by Febr

Operations on DataFrames with User Defined Types in pyspark

2016-02-11 Thread Franklyn D'souza
I'm using the UDT API to work with a custom Money datatype in DataFrames. Here's how I have it set up: class StringUDT(UserDefinedType): @classmethod def sqlType(self): return StringType() @classmethod def module(cls): return cls.__module__ @classmethod def

Re: Making BatchPythonEvaluation actually Batch

2016-02-11 Thread Davies Liu
I had a quick look at your commit and I think it makes sense. Could you send a PR for it so we can review it? In order to support 2), we need to change the serialized Python function from `f(iter)` to `f(x)`, so that it processes one row at a time (not a partition); then we can easily combine them together:
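
The snippet is cut off before the author's own example, so the following is not the code from the message. It is a minimal sketch, in plain Python with no Spark dependency, of why per-row functions `f(x)` are easier to combine than per-partition functions `f(iter)`. The helper names chain, side_by_side and evaluate are hypothetical.

```python
def chain(f_outer, f_inner):
    """Combine two per-row functions into one: x -> f_outer(f_inner(x))."""
    return lambda x: f_outer(f_inner(x))


def side_by_side(*funcs):
    """Evaluate several per-row functions on the same row and return a tuple."""
    return lambda x: tuple(f(x) for f in funcs)


def evaluate(rows, f):
    """Apply one combined per-row function in a single pass over the rows."""
    for row in rows:
        yield f(row)


add_one = lambda x: x + 1
double = lambda x: x * 2

print(list(evaluate([1, 2, 3], chain(double, add_one))))      # [4, 6, 8]
print(list(evaluate([1, 2], side_by_side(add_one, double))))  # [(2, 2), (3, 4)]
```

Because each function sees a single row, combining them needs only one loop over the data, whereas functions that each consume a whole iterator would have to be nested or run in separate passes.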

[build system] additional jenkins downtime next thursday

2016-02-11 Thread shane knapp
there's a big security patch coming out next week, and i'd like to upgrade our jenkins installation so that we're covered. it'll be around 8am, again, and i'll send out more details about the upgrade when i get them. thanks! shane

Re: [build system] brief downtime, 8am PST thursday feb 10th

2016-02-11 Thread shane knapp
this is now done. On Thu, Feb 11, 2016 at 7:35 AM, shane knapp wrote: > reminder: this is happening in ~30 minutes > > > On Wed, Feb 10, 2016 at 10:58 AM, shane knapp wrote: >> reminder: this is happening tomorrow morning. >> >> On Mon, Feb 8, 2016 at 9:27 AM, shane knapp wrote: >>> happy mon

Re: [build system] brief downtime, 8am PST thursday feb 10th

2016-02-11 Thread shane knapp
reminder: this is happening in ~30 minutes On Wed, Feb 10, 2016 at 10:58 AM, shane knapp wrote: > reminder: this is happening tomorrow morning. > > On Mon, Feb 8, 2016 at 9:27 AM, shane knapp wrote: >> happy monday! >> >> i will be bringing down jenkins and the workers thursday morning to >>

Re: Long running Spark job on YARN throws "No AMRMToken"

2016-02-11 Thread Prabhu Joseph
Steve, When an application is submitted to the ResourceManager, AMLauncher creates the token YARN_AM_RM_TOKEN (the token used between the RM and the AM). When the ApplicationMaster is launched, it contacts the RM to send a register request, allocate requests to receive containers, and a finish request. In all the

SPARK_WORKER_MEMORY in Spark Standalone - conf.getenv vs System.getenv?

2016-02-11 Thread Jacek Laskowski
Hi, Is there a reason to use conf to read SPARK_WORKER_MEMORY rather than System.getenv, as is done for the other env vars? https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/WorkerArguments.scala#L45 Regards, Jacek Jacek Laskowski | https://medium.com/@jaceklaskow