Actually, the mode that requires installing the jar on each individual node is
standalone mode, which works alongside both MR1 and MR2. As far as I know,
Cloudera and Hortonworks currently support Spark in this way.

For both yarn-cluster and yarn-client modes, Spark distributes the jars through
the distributed cache, and each executor can find them there.
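
For example, a minimal yarn-cluster submission might look like the sketch below
(the jar names are just what a 1.0 download ships with; adjust them to your
build). The client uploads the Spark assembly and your application jar, and
YARN's distributed cache localizes them on the executor nodes, so nothing has
to be pre-installed on the workers:

    # Rough sketch -- paths and versions are assumptions, not a fixed recipe.
    export HADOOP_CONF_DIR=/etc/hadoop/conf   # tells spark-submit where YARN/HDFS live
    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn-cluster \
      --num-executors 2 \
      ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 10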

On Jul 7, 2014 6:23 AM, "Chester @work" <ches...@alpinenow.com> wrote:
>
> In YARN cluster mode, you can either have Spark installed on all the cluster nodes or 
> supply the Spark jar yourself. In the second case, you don't need to install Spark 
> on the cluster at all, since you supply the Spark assembly as well as your app jar 
> together.
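>
> As a rough sketch of that second case (the HDFS path, app class and jar name
> below are made up for illustration), you upload the assembly once and point the
> 1.0-era client at it with SPARK_JAR; YARN then localizes it for the application:
>
>     # Only the client machine needs the Spark download; nothing on the workers.
>     hdfs dfs -put lib/spark-assembly-1.0.0-hadoop2.2.0.jar /user/me/spark-assembly.jar
>     SPARK_JAR=hdfs:///user/me/spark-assembly.jar ./bin/spark-submit \
>       --class com.example.MyApp \
>       --master yarn-cluster \
>       my-app.jar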
>
> I hope this makes it clear.
>
> Chester
>
> Sent from my iPhone
>
> On Jul 7, 2014, at 5:05 AM, Konstantin Kudryavtsev 
> <kudryavtsev.konstan...@gmail.com> wrote:
>
> thank you Krishna!
>
> Could you please explain why I need to install Spark on each node if the Spark 
> official site says: "If you have a Hadoop 2 cluster, you can run Spark without 
> any installation needed"?
>
> I have HDP 2 (YARN), which is why I hope I don't need to install Spark on 
> each node.
>
> Thank you,
> Konstantin Kudryavtsev
>
>
> On Mon, Jul 7, 2014 at 1:57 PM, Krishna Sankar <ksanka...@gmail.com> wrote:
>>
>> Konstantin,
>>
>> You need to install the Hadoop RPMs on all nodes. If it is Hadoop 2, the 
>> nodes will then have HDFS & YARN.
>> Then you need to install Spark on all nodes. I haven't had experience with 
>> HDP, but the tech preview might have installed Spark as well.
>> In the end, you should have HDFS, YARN & Spark installed on all the nodes.
>> After installation, check the web consoles to make sure HDFS, YARN & Spark 
>> are running (a quick command-line check is sketched below).
>> Then you are ready to start experimenting with and developing Spark applications.
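>>
>> As a sanity-check sketch from a shell on one of the nodes (which daemons you 
>> expect per node depends on your layout):
>>
>>     jps                      # look for NameNode/DataNode, ResourceManager/NodeManager
>>     hdfs dfsadmin -report    # DataNodes should be listed and live
>>     yarn node -list          # NodeManagers should be registered with the ResourceManager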
>>
>> HTH.
>> Cheers
>> <k/>
>>
>>
>> On Mon, Jul 7, 2014 at 2:34 AM, Konstantin Kudryavtsev 
>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>
>>> Guys, I'm not talking about running Spark on a VM; I don't have a problem with that.
>>>
>>> What confuses me is this:
>>> 1) Hortonworks describes the installation process as RPMs on each node
>>> 2) the Spark home page says that everything I need is YARN
>>>
>>> And I'm stuck on understanding what I need to do to run Spark on YARN 
>>> (do I need the RPM installations, or only to build Spark on an edge node?)
>>>
>>>
>>> Thank you,
>>> Konstantin Kudryavtsev
>>>
>>>
>>> On Mon, Jul 7, 2014 at 4:34 AM, Robert James <srobertja...@gmail.com> wrote:
>>>>
>>>> I can say from my experience that getting Spark to work with Hadoop 2
>>>> is not for the beginner; after solving one problem after another
>>>> (dependencies, scripts, etc.), I went back to Hadoop 1.
>>>>
>>>> Spark's Maven build, EC2 scripts, and others all use Hadoop 1 - not sure
>>>> why, but given that, Hadoop 2 has too many bumps.
>>>>
>>>> On 7/6/14, Marco Shaw <marco.s...@gmail.com> wrote:
>>>> > That is confusing based on the context you provided.
>>>> >
>>>> > This might take more time than I can spare to try to understand.
>>>> >
>>>> > For sure, you need to add Spark yourself to run it on the HDP 2.1 express VM.
>>>> >
>>>> > Cloudera's CDH 5 express VM includes Spark, but the service isn't 
>>>> > running by
>>>> > default.
>>>> >
>>>> > I can't remember for MapR...
>>>> >
>>>> > Marco
>>>> >
>>>> >> On Jul 6, 2014, at 6:33 PM, Konstantin Kudryavtsev
>>>> >> <kudryavtsev.konstan...@gmail.com> wrote:
>>>> >>
>>>> >> Marco,
>>>> >>
>>>> >> Hortonworks provides a Tech Preview of Spark 0.9.1 with HDP 2.1 that you
>>>> >> can try
>>>> >> from
>>>> >> http://hortonworks.com/wp-content/uploads/2014/05/SparkTechnicalPreview.pdf
>>>> >> HDP 2.1 means YARN, yet at the same time they propose to install an RPM.
>>>> >>
>>>> >> On the other hand, http://spark.apache.org/ says: "
>>>> >> Integrated with Hadoop
>>>> >> Spark can run on Hadoop 2's YARN cluster manager, and can read any
>>>> >> existing Hadoop data.
>>>> >>
>>>> >> If you have a Hadoop 2 cluster, you can run Spark without any 
>>>> >> installation
>>>> >> needed. "
>>>> >>
>>>> >> And this is confusing for me... do I need an RPM installation or not?...
>>>> >>
>>>> >>
>>>> >> Thank you,
>>>> >> Konstantin Kudryavtsev
>>>> >>
>>>> >>
>>>> >>> On Sun, Jul 6, 2014 at 10:56 PM, Marco Shaw <marco.s...@gmail.com>
>>>> >>> wrote:
>>>> >>> Can you provide links to the sections that are confusing?
>>>> >>>
>>>> >>> My understanding is that the HDP1 binaries do not need YARN, while the HDP2
>>>> >>> binaries do.
>>>> >>>
>>>> >>> Now, you can also install Hortonworks Spark RPM...
>>>> >>>
>>>> >>> For production, in my opinion, RPMs are better for manageability.
>>>> >>>
>>>> >>>> On Jul 6, 2014, at 5:39 PM, Konstantin Kudryavtsev
>>>> >>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> Hello, thanks for your message... I'm confused: Hortonworks suggests
>>>> >>>> installing the Spark RPM on each node, but the Spark main page says that
>>>> >>>> YARN is enough and I don't need to install it... What's the difference?
>>>> >>>>
>>>> >>>> sent from my HTC
>>>> >>>>
>>>> >>>>> On Jul 6, 2014 8:34 PM, "vs" <vinayshu...@gmail.com> wrote:
>>>> >>>>> Konstantin,
>>>> >>>>>
>>>> >>>>> HWRK provides a Tech Preview of Spark 0.9.1 with HDP 2.1 that you can
>>>> >>>>> try
>>>> >>>>> from
>>>> >>>>> http://hortonworks.com/wp-content/uploads/2014/05/SparkTechnicalPreview.pdf
>>>> >>>>>
>>>> >>>>> Let me know if you see issues with the tech preview.
>>>> >>>>>
>>>> >>>>> "spark PI example on HDP 2.0
>>>> >>>>>
>>>> >>>>> I downloaded the Spark 1.0 pre-built package from
>>>> >>>>> http://spark.apache.org/downloads.html
>>>> >>>>> (for HDP2)
>>>> >>>>> Then I ran the example from the Spark web site:
>>>> >>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi
>>>> >>>>> --master
>>>> >>>>> yarn-cluster --num-executors 3 --driver-memory 2g --executor-memory 
>>>> >>>>> 2g
>>>> >>>>> --executor-cores 1 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2
>>>> >>>>>
>>>> >>>>> I got this error:
>>>> >>>>> Application application_1404470405736_0044 failed 3 times due to AM
>>>> >>>>> Container for appattempt_1404470405736_0044_000003 exited with
>>>> >>>>> exitCode: 1
>>>> >>>>> due to: Exception from container-launch:
>>>> >>>>> org.apache.hadoop.util.Shell$ExitCodeException:
>>>> >>>>> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>>>> >>>>> at org.apache.hadoop.util.Shell.run(Shell.java:379)
>>>> >>>>> at
>>>> >>>>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>>>> >>>>> at
>>>> >>>>> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>>>> >>>>> at
>>>> >>>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>>>> >>>>> at
>>>> >>>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>>>> >>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> >>>>> at
>>>> >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> >>>>> at
>>>> >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> >>>>> at java.lang.Thread.run(Thread.java:744)
>>>> >>>>> .Failing this attempt.. Failing the application.
>>>> >>>>>
>>>> >>>>> Unknown/unsupported param List(--executor-memory, 2048,
>>>> >>>>> --executor-cores, 1,
>>>> >>>>> --num-executors, 3)
>>>> >>>>> Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
>>>> >>>>> Options:
>>>> >>>>>   --jar JAR_PATH       Path to your application's JAR file (required)
>>>> >>>>>   --class CLASS_NAME   Name of your application's main class
>>>> >>>>> (required)
>>>> >>>>> ...bla-bla-bla
>>>> >>>>> "
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> --
>>>> >>>>> View this message in context:
>>>> >>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-run-Spark-1-0-SparkPi-on-HDP-2-0-tp8802p8873.html
>>>> >>>>> Sent from the Apache Spark User List mailing list archive at
>>>> >>>>> Nabble.com.
>>>> >>
>>>> >
>>>
>>>
>>
>
