Are you installing PIO on a client node created by HDP or something else?


On Wed, May 30, 2018 at 2:25 PM, suyash kharade <[email protected]>
wrote:

> I am using HDP 2.6.4
>
> On Wed, May 30, 2018 at 7:15 AM, Miller, Clifford <
> [email protected]> wrote:
>
>> That's the command that I'm using but it gives me the exception that I
>> listed in the previous email.  I've installed a Spark standalone cluster
>> and am using that for training for now but would like to use Spark on YARN
>> eventually.
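>>
>> For the standalone cluster I just point the training job at the standalone
>> master, something like the following (the host name and memory settings are
>> placeholders for my environment, not exact values):
>>
>>   pio train -- --master spark://spark-master-host:7077 --driver-memory 4g --executor-memory 4g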
>>
>> Are you using HDP? If so, what version of HDP are you using?  I'm using
>> *HDP-2.6.2.14.*
>>
>>
>>
>> On Tue, May 29, 2018 at 8:55 PM, suyash kharade <[email protected]
>> > wrote:
>>
>>> I use 'pio train -- --master yarn'
>>> It works for me to train the Universal Recommender
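>>>
>>> Everything after the -- is just passed through to spark-submit, so the
>>> usual Spark options can go there as well; for example (the executor
>>> settings here are only an illustration, not something required):
>>>
>>>   pio train -- --master yarn --deploy-mode client --num-executors 4 --executor-memory 4g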
>>>
>>> On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <
>>> [email protected]> wrote:
>>>
>>>> To add more details to this: when I attempt to execute my training job
>>>> using the command 'pio train -- --master yarn', I get the exception
>>>> included below.  Can anyone tell me how to correctly submit the training
>>>> job, or what setting I need to change to make this work?  I've made no
>>>> custom code changes and am simply using PIO 0.12.1 with the SimilarProduct
>>>> Recommender.
>>>>
>>>>
>>>>
>>>> [ERROR] [SparkContext] Error initializing SparkContext.
>>>> [INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
>>>> [WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
>>>> [WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>>         at scala.Option.foreach(Option.scala:257)
>>>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:817)
>>>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:911)
>>>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
>>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>>>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
>>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <
>>>> [email protected]> wrote:
>>>>
>>>>> So updating the version in the RELEASE file to 2.1.1 fixed the version
>>>>> detection problem, but I'm still not able to submit Spark jobs unless they
>>>>> are strictly local.  How are you submitting jobs to the HDP Spark
>>>>> installation?
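>>>>>
>>>>> For reference, the "update" was nothing more than replacing the
>>>>> HDP-flavoured version string in Spark's RELEASE file with a plain
>>>>> three-part version.  A rough sketch (the path and the original string are
>>>>> specific to my cluster):
>>>>>
>>>>>   cd /usr/hdp/2.6.2.14-5/spark2
>>>>>   cp RELEASE RELEASE.bak
>>>>>   # first line read something like "Spark 2.1.1.2.6.2.14-5 ..."; trim it to "Spark 2.1.1 ..."
>>>>>   sed -i 's/2\.1\.1\.2\.6\.2\.14-5/2.1.1/' RELEASE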
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --Cliff.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, May 28, 2018 at 1:12 AM, suyash kharade <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Miller,
>>>>>>     I faced the same issue.
>>>>>>     The error occurs because the RELEASE file has a '-' in the version string.
>>>>>>     Put a simple version in the RELEASE file, something like 2.6.
>>>>>>
>>>>>> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> *I've installed an HDP cluster with HBase and Spark with YARN.  As part
>>>>>>> of that installation I created some HDP (Ambari) managed clients.  I
>>>>>>> installed PIO on one of these clients and configured PIO to use the
>>>>>>> HDP-installed Hadoop, HBase, and Spark (the relevant pio-env.sh entries
>>>>>>> are sketched after the error below).  When I run the command 'pio
>>>>>>> eventserver &', I get the following error.*
>>>>>>>
>>>>>>> ####
>>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [: 2.2.6.2.14-5: integer expression expected
>>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
>>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
>>>>>>> You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/ which does not meet the minimum version requirement of 1.3.0.
>>>>>>> Aborting.
>>>>>>>
>>>>>>> ####
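>>>>>>>
>>>>>>> For context, pio-env.sh on this client points at the HDP-managed
>>>>>>> installs, roughly as follows (the conf paths are an illustration of my
>>>>>>> setup rather than exact values):
>>>>>>>
>>>>>>>   SPARK_HOME=/usr/hdp/2.6.2.14-5/spark2
>>>>>>>   HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>>>>   HBASE_CONF_DIR=/etc/hbase/conf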
>>>>>>>
>>>>>>> *If I then go to /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE
>>>>>>> with an empty file, I can then start the Eventserver, which gives me the
>>>>>>> following message:*
>>>>>>>
>>>>>>> ###
>>>>>>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a known problem with certain vendors (e.g. Cloudera). Please make sure you are using at least 1.3.0.
>>>>>>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>>>>>>> [WARN] [DomainSocketFactory] The short-circuit local reads feature
>>>>>>> cannot be used because libhadoop cannot be loaded.
>>>>>>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>>>>>>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>>>>>>> ####
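>>>>>>>
>>>>>>> Concretely, "replace the RELEASE with an empty file" amounts to something
>>>>>>> like the following (a blunt workaround rather than a fix):
>>>>>>>
>>>>>>>   cp /usr/hdp/2.6.2.14-5/spark2/RELEASE /tmp/RELEASE.orig
>>>>>>>   > /usr/hdp/2.6.2.14-5/spark2/RELEASE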
>>>>>>>
>>>>>>> *I can then send events to the Eventserver.  After sending the events
>>>>>>> listed in the SimilarProduct Recommender example, I am unable to train
>>>>>>> using the cluster.  If I use 'pio train' it successfully trains locally,
>>>>>>> but if I attempt to use the command "pio train -- --master yarn" then I
>>>>>>> get the following:*
>>>>>>>
>>>>>>> #######
>>>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>>>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>>>>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>>>>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
>>>>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>>>>>         at scala.Option.foreach(Option.scala:257)
>>>>>>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:817)
>>>>>>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:911)
>>>>>>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
>>>>>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>>>>>>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>>>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>>>>>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>>>>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>>>>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>>>>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
>>>>>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>>>>>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>>>>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>>>
>>>>>>> ########
>>>>>>>
>>>>>>> *What is the correct way to get PIO to use the YARN based Spark for
>>>>>>> training?*
>>>>>>>
>>>>>>> *Thanks,*
>>>>>>>
>>>>>>> *--Cliff.*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Suyash K
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Suyash K
>>>
>>
>>
>>
>>
>>
>
>
> --
> Regards,
> Suyash K
>
