I installed PIO on one of the HDP nodes.

On Wed, May 30, 2018 at 10:25 PM, Miller, Clifford <clifford.mil...@phoenix-opsgroup.com> wrote:
Are you installing PIO on a client node created by HDP, or something else?

On Wed, May 30, 2018 at 2:25 PM, suyash kharade <suyash.khar...@gmail.com> wrote:

I am using HDP 2.6.4.

On Wed, May 30, 2018 at 7:15 AM, Miller, Clifford <clifford.mil...@phoenix-opsgroup.com> wrote:

That's the command that I'm using, but it gives me the exception that I listed in the previous email. I've installed a standalone Spark cluster and am using that for training for now, but would like to use Spark on YARN eventually.

Are you using HDP? If so, what version of HDP are you using? I'm using HDP-2.6.2.14.

On Tue, May 29, 2018 at 8:55 PM, suyash kharade <suyash.khar...@gmail.com> wrote:

I use 'pio train -- --master yarn'. It works for me to train the Universal Recommender.

On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <clifford.mil...@phoenix-opsgroup.com> wrote:

To add more details to this: when I attempt to execute my training job using the command 'pio train -- --master yarn', I get the exception that I've included below. Can anyone tell me how to correctly submit the training job, or what setting I need to change to make this work? I've made no custom code changes and am simply using PIO 0.12.1 with the SimilarProduct Recommender.

[ERROR] [SparkContext] Error initializing SparkContext.
[INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
    at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
    at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:817)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:911)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
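For reference: in Spark 2.1, YarnSparkHadoopUtil.setEnvFromInputString splits a comma-separated environment string on ',' and then each entry on '=', indexing the part after the '='; an entry with no '=' raises exactly this ArrayIndexOutOfBoundsException: 1. The scala.Option.foreach frame in the trace suggests Client.setupLaunchEnv is parsing an optional user environment string such as SPARK_YARN_USER_ENV. A minimal sketch of what the parser expects — the variable values here are illustrative, not from this thread:

####
# Entries must be comma-separated NAME=VALUE pairs, e.g.:
export SPARK_YARN_USER_ENV="HDP_VERSION=2.6.2.14-5,MY_VAR=foo"   # parses cleanly
# A bare token with no '=' would reproduce the crash:
# export SPARK_YARN_USER_ENV="SOME_BARE_TOKEN"                   # ArrayIndexOutOfBoundsException: 1
####

So it is worth checking spark-env.sh, pio-env.sh, and the current shell environment for a malformed entry of this kind before the YARN submit.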
On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <clifford.mil...@phoenix-opsgroup.com> wrote:

So updating the version in the RELEASE file to 2.1.1 fixed the version-detection problem, but I'm still not able to submit Spark jobs unless they are strictly local. How are you submitting to the HDP Spark?

Thanks,

--Cliff.

On Mon, May 28, 2018 at 1:12 AM, suyash kharade <suyash.khar...@gmail.com> wrote:

Hi Miller,
I faced the same issue. It gives that error because the RELEASE file has a '-' in the version. Put a simple version in the RELEASE file, something like 2.6; a sketch of the workaround follows below.
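A minimal sketch of that RELEASE-file workaround, using the path and HDP build version from Cliff's install above. The exact line format that PIO's bin/semver.sh accepts is an assumption here — the point is a plain x.y.z version with no vendor build suffix:

####
SPARK_HOME=/usr/hdp/2.6.2.14-5/spark2
sudo cp "$SPARK_HOME/RELEASE" "$SPARK_HOME/RELEASE.orig"             # keep a backup
echo 'Spark 2.1.1 built for Hadoop 2.7.3' | sudo tee "$SPARK_HOME/RELEASE"
####

Note that Ambari may restore the vendor RELEASE file on the next stack upgrade, so the edit might need repeating.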
On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <clifford.mil...@phoenix-opsgroup.com> wrote:

I've installed an HDP cluster with HBase and Spark with YARN. As part of that installation I created some HDP (Ambari) managed clients. I installed PIO on one of these clients and configured PIO to use the HDP-installed Hadoop, HBase, and Spark. When I run the command 'pio eventserver &', I get the following error:

####
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [: 2.2.6.2.14-5: integer expression expected
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/ which does not meet the minimum version requirement of 1.3.0.
Aborting.
####

If I then go to /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE with an empty file, I can then start the Eventserver, which gives me the following message:

####
/usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a known problem with certain vendors (e.g. Cloudera). Please make sure you are using at least 1.3.0.
[INFO] [Management$] Creating Event Server at 0.0.0.0:7070
[WARN] [DomainSocketFactory] The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
[INFO] [HttpListener] Bound to /0.0.0.0:7070
[INFO] [EventServerActor] Bound received. EventServer is ready.
####

I can then send events to the Eventserver. After sending the events listed in the SimilarProduct Recommender example, however, I am unable to train using the cluster. If I use 'pio train' then it successfully trains locally, but if I attempt the command "pio train -- --master yarn" then I get the following:

####
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
[identical stack trace as shown earlier in the thread, from YarnSparkHadoopUtil.setEnvFromInputString down to SparkSubmit.main]
####

What is the correct way to get PIO to use the YARN-based Spark for training?

Thanks,

--Cliff.

--
Regards,
Suyash K
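Putting the thread's two workarounds together, a hedged end-to-end attempt might look like the sketch below. The paths and versions are taken from the messages above, and the culprit environment string is only a guess from the stack trace:

####
# 1. Give PIO's semver.sh a parseable Spark version (see the RELEASE sketch above).
# 2. Remove, or repair into NAME=VALUE,NAME2=VALUE2 form, any '='-less entry in
#    environment strings that Spark parses when launching on YARN:
unset SPARK_YARN_USER_ENV
# 3. Submit training through YARN in client mode:
pio train -- --master yarn --deploy-mode client
####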