Re: Can't submit job to standalone cluster

2015-12-29 Thread Greg Hill
On 12/28/15, 5:16 PM, "Daniel Valdivia" wrote: > Hi, I'm trying to submit a job to a small Spark cluster running in standalone mode, but it seems the jar file I'm submitting to the cluster is "not found" by the worker nodes. I might have understood
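A common fix for this class of problem (a hedged sketch; the cluster URL, paths, and class name are placeholders, not from the message) is to make the application jar readable from every node before submitting, since in standalone cluster mode the driver can be launched on any worker:

    # Put the jar somewhere all nodes can read, e.g. HDFS (paths are hypothetical)
    hdfs dfs -put myapp.jar /user/me/myapp.jar
    spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode cluster \
      --class com.example.MyApp \
      hdfs:///user/me/myapp.jar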

spark-submit problems with --packages and --deploy-mode cluster

2015-12-11 Thread Greg Hill
I'm using Spark 1.5.0 with the standalone scheduler, and for the life of me I can't figure out why this isn't working. I have an application that works fine with --deploy-mode client that I'm trying to get to run in cluster mode so I can use --supervise. I ran into a few issues with my
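For context, the failing invocation pattern would look roughly like this (the package coordinates, class name, and paths are illustrative assumptions, not quoted from the message):

    spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode cluster \
      --supervise \
      --packages com.databricks:spark-csv_2.10:1.3.0 \
      --class com.example.MyApp \
      hdfs:///user/me/myapp.jar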

Re: SPARK_SUBMIT_CLASSPATH question

2014-10-15 Thread Greg Hill
I guess I was a little light on the details in my haste. I'm using Spark on YARN, and this is in the driver process in yarn-client mode (most notably spark-shell). I've had to manually add a bunch of JARs that I had thought it would just pick up like everything else does: export

SPARK_SUBMIT_CLASSPATH question

2014-10-14 Thread Greg Hill
It seems that SPARK_SUBMIT_CLASSPATH, unlike other classpath tools, does not support wildcards in the paths you add. For some reason it also doesn't pick up the classpath information from yarn-site.xml when running on YARN, so I'm having to manually add every single
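For comparison, the plain java launcher does accept directory wildcards, which is presumably the behavior being expected here (paths are hypothetical):

    # Wildcard expansion works for java -cp, but reportedly not here:
    export SPARK_SUBMIT_CLASSPATH="/usr/lib/hadoop/lib/*:/etc/hadoop/conf"
    # Workaround sketch: enumerate the jars explicitly
    export SPARK_SUBMIT_CLASSPATH="$(echo /usr/lib/hadoop/lib/*.jar | tr ' ' ':'):/etc/hadoop/conf"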

Re: Spark on YARN driver memory allocation bug?

2014-10-09 Thread Greg Hill
memory allocation bug? Hi Greg, It does seem like a bug. What is the particular exception message that you see? Andrew 2014-10-08 12:12 GMT-07:00 Greg Hill greg.h...@rackspace.com: So, I think this is a bug, but I wanted to get some feedback before I reported

Spark on YARN driver memory allocation bug?

2014-10-08 Thread Greg Hill
So, I think this is a bug, but I wanted to get some feedback before I reported it as such. On Spark on YARN 1.1.0, if you set --driver-memory higher than the memory available on the client machine, Spark errors out because it fails to allocate enough memory. This happens
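A hedged illustration of the reported scenario (the values are invented): in yarn-cluster mode the driver runs inside the YARN application master on the cluster, so the client machine's memory arguably shouldn't matter, yet the launch fails on the client:

    # Client machine has e.g. 4 GB of RAM; the requested driver heap exceeds it.
    spark-submit \
      --master yarn-cluster \
      --driver-memory 8g \
      --class com.example.MyApp \
      myapp.jar
    # Errors out on the client with a JVM memory allocation failure, even though
    # the driver would actually run on a cluster node with enough memory.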

Re: Spark with YARN

2014-09-24 Thread Greg Hill
Do you have YARN_CONF_DIR set in your environment to point Spark to where your yarn configs are? Greg From: Raghuveer Chanda raghuveer.cha...@gmail.com Date: Wednesday, September 24, 2014 12:25 PM To:
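For reference, a minimal version of that check (the config path is an assumption; HDP typically keeps the Hadoop configs under /etc/hadoop/conf):

    # Spark reads the ResourceManager address and HDFS settings from the configs here.
    export YARN_CONF_DIR=/etc/hadoop/conf
    spark-submit --master yarn-client --class com.example.MyApp myapp.jar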

Re: clarification for some spark on yarn configuration options

2014-09-23 Thread Greg Hill
Nishkam Ravi nr...@cloudera.com: Maybe try --driver-memory if you are using spark-submit? Thanks, Nishkam On Mon, Sep 22, 2014 at 1:41 PM, Greg Hill greg.h...@rackspace.com wrote: Ah, I see. It turns out that my problem

recommended values for spark driver memory?

2014-09-23 Thread Greg Hill
I know the recommendation is "it depends", but can people share what sort of memory allocations they're using for their driver processes? I'd like to get an idea of what the range looks like so we can provide sensible defaults without necessarily knowing what the jobs will look like. The

Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Greg Hill
an environment variable you could set (SPARK_CLASSPATH), though this is now deprecated. Let me know if you have more questions about these options, -Andrew 2014-09-08 6:59 GMT-07:00 Greg Hill greg.h...@rackspace.com: Is SPARK_EXECUTOR_INSTANCES the total number of workers

Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Greg Hill
(SPARK_CLASSPATH), though this is now deprecated. Let me know if you have more questions about these options, -Andrew 2014-09-08 6:59 GMT-07:00 Greg Hill greg.h...@rackspace.com: Is SPARK_EXECUTOR_INSTANCES the total number of workers in the cluster or the workers per
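For the record, on YARN this variable is the total executor count for the application across the whole cluster, not a per-node count; it maps to the --num-executors flag (a hedged note; the values below are illustrative):

    # 10 executors total across the cluster:
    export SPARK_EXECUTOR_INSTANCES=10
    # equivalent spark-submit flag:
    spark-submit --master yarn-client --num-executors 10 --class com.example.MyApp myapp.jar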

Re: spark on yarn history server + hdfs permissions issue

2014-09-11 Thread Greg Hill
To answer my own question, in case someone else runs into this: the spark user needs to be in the same group on the namenode, and HDFS caches that information for what seems like at least an hour. It magically started working on its own. Greg From: Greg

spark on yarn history server + hdfs permissions issue

2014-09-09 Thread Greg Hill
I am running Spark on YARN with the HDP 2.1 technical preview. I'm having issues getting the Spark history server permissions to read the Spark event logs from HDFS. Both sides are configured to write/read logs from: hdfs:///apps/spark/events The history server is running as user spark, the
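A sketch of the configuration in question (the property names are the standard Spark 1.x ones and are assumed here, not quoted from the message; the log path is from the message):

    # spark-defaults.conf on the submitting side (writes the logs):
    spark.eventLog.enabled  true
    spark.eventLog.dir      hdfs:///apps/spark/events

    # history server side, e.g. in spark-env.sh (reads the logs):
    export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs:///apps/spark/events"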

Re: pyspark on yarn hdp hortonworks

2014-09-05 Thread Greg Hill
I'm running into a problem getting this working as well. I have spark-submit and spark-shell working fine, but pyspark in interactive mode can't seem to find the lzo jar: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found This is in
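A common workaround for this kind of missing-codec error (a sketch; the lzo jar location is distribution-specific and hypothetical here) was to put the codec jar on the classpath before launching the interactive shell:

    # Hypothetical HDP path; adjust to wherever hadoop-lzo is actually installed.
    export SPARK_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo-0.6.0.jar
    pyspark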

spark history server trying to hit port 8021

2014-09-03 Thread Greg Hill
My Spark history server won't start because it's trying to hit the namenode on port 8021, but the namenode is on 8020 (the default). How can I configure the history server to use the right port? I can't find any relevant setting in the docs:

Re: spark history server trying to hit port 8021

2014-09-03 Thread Greg Hill
Nevermind, PEBKAC. I had put the wrong port in the $LOG_DIR environment variable. Greg From: Greg greg.h...@rackspace.com Date: Wednesday, September 3, 2014 1:56 PM To: user@spark.apache.org
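In other words, the history server had been pointed at a log directory URI carrying the wrong namenode port, something like (illustrative values only):

    # Wrong: the namenode listens on 8020, not 8021
    export LOG_DIR=hdfs://namenode:8021/apps/spark/events
    # Fixed:
    export LOG_DIR=hdfs://namenode:8020/apps/spark/events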

Spark on YARN question

2014-09-02 Thread Greg Hill
I'm working on setting up Spark on YARN using the HDP technical preview - http://hortonworks.com/kb/spark-1-0-1-technical-preview-hdp-2-1-3/ I have installed the Spark JARs on all the slave nodes and configured YARN to find the JARs. It seems like everything is working. Unless I'm

Re: Spark on YARN question

2014-09-02 Thread Greg Hill
Thanks. That sounds like how I was thinking it worked. I did have to install the JARs on the slave nodes for yarn-cluster mode to work, FWIW. It's probably just whichever node ends up spawning the application master that needs it, but it wasn't passed along from spark-submit. Greg From:
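A related option from that era's Spark-on-YARN docs (hedged; verify against the exact version in use) was to host the Spark assembly on HDFS and point spark.yarn.jar at it, so the jar doesn't have to be preinstalled on every node; the jar name below assumes the HDP tech preview build:

    hdfs dfs -put spark-assembly-1.0.1-hadoop2.4.0.jar /apps/spark/
    # spark-defaults.conf:
    spark.yarn.jar  hdfs:///apps/spark/spark-assembly-1.0.1-hadoop2.4.0.jar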