I installed PIO on one of the HDP nodes.

On Wed, May 30, 2018 at 10:25 PM, Miller, Clifford <clifford.mil...@phoenix-opsgroup.com> wrote:
Are you installing PIO on a client node created by HDP, or something else?

On Wed, May 30, 2018 at 2:25 PM, suyash kharade <suyash.khar...@gmail.com> wrote:

I am using HDP 2.6.4.

On Wed, May 30, 2018 at 7:15 AM, Miller, Clifford <clifford.mil...@phoenix-opsgroup.com> wrote:

That's the command that I'm using, but it gives me the exception that I listed in the previous email. I've installed a standalone Spark cluster and am using that for training for now, but would like to use Spark on YARN eventually.

Are you using HDP? If so, what version of HDP are you using? I'm using HDP-2.6.2.14.

On Tue, May 29, 2018 at 8:55 PM, suyash kharade <suyash.khar...@gmail.com> wrote:

I use 'pio train -- --master yarn'. It works for me to train the Universal Recommender.

On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <clifford.mil...@phoenix-opsgroup.com> wrote:

To add more details to this: when I attempt to execute my training job using the command 'pio train -- --master yarn', I get the exception that I've included below. Can anyone tell me how to correctly submit the training job, or what setting I need to change to make this work? I've made no custom code changes and am simply using PIO 0.12.1 with the SimilarProduct Recommender.

[ERROR] [SparkContext] Error initializing SparkContext.
[INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
    at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
    at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:817)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:911)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
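For reference: in Spark 2.1, YarnSparkHadoopUtil.setEnvFromInputString splits a comma-separated environment string on ',' and then each entry on '=', indexing the part after the '='; an entry with no '=' raises exactly this ArrayIndexOutOfBoundsException: 1. The scala.Option.foreach frame in the trace suggests Client.setupLaunchEnv is parsing an optional user environment string such as SPARK_YARN_USER_ENV. A minimal sketch of what the parser expects — the variable values here are illustrative, not from this thread:

####
# Entries must be comma-separated NAME=VALUE pairs, e.g.:
export SPARK_YARN_USER_ENV="HDP_VERSION=2.6.2.14-5,MY_VAR=foo"   # parses cleanly
# A bare token with no '=' would reproduce the crash:
# export SPARK_YARN_USER_ENV="SOME_BARE_TOKEN"                   # ArrayIndexOutOfBoundsException: 1
####

So it is worth checking spark-env.sh, pio-env.sh, and the current shell environment for a malformed entry of this kind before the YARN submit.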
On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <clifford.mil...@phoenix-opsgroup.com> wrote:

So updating the version in the RELEASE file to 2.1.1 fixed the version-detection problem, but I'm still not able to submit Spark jobs unless they are strictly local. How are you submitting to the HDP Spark?

Thanks,

--Cliff.

On Mon, May 28, 2018 at 1:12 AM, suyash kharade <suyash.khar...@gmail.com> wrote:

Hi Miller,
I faced the same issue. It gives that error because the RELEASE file has a '-' in the version. Put a simple version in the RELEASE file, something like 2.6; a sketch of the workaround follows below.
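A minimal sketch of that RELEASE-file workaround, using the path and HDP build version from Cliff's install above. The exact line format that PIO's bin/semver.sh accepts is an assumption here — the point is a plain x.y.z version with no vendor build suffix:

####
SPARK_HOME=/usr/hdp/2.6.2.14-5/spark2
sudo cp "$SPARK_HOME/RELEASE" "$SPARK_HOME/RELEASE.orig"             # keep a backup
echo 'Spark 2.1.1 built for Hadoop 2.7.3' | sudo tee "$SPARK_HOME/RELEASE"
####

Note that Ambari may restore the vendor RELEASE file on the next stack upgrade, so the edit might need repeating.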
On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <clifford.mil...@phoenix-opsgroup.com> wrote:

I've installed an HDP cluster with HBase and Spark with YARN. As part of that installation I created some HDP (Ambari) managed clients. I installed PIO on one of these clients and configured PIO to use the HDP-installed Hadoop, HBase, and Spark. When I run the command 'pio eventserver &', I get the following error:

####
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [: 2.2.6.2.14-5: integer expression expected
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[: 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/ which does not meet the minimum version requirement of 1.3.0.
Aborting.
####

If I then go to /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE with an empty file, I can then start the Eventserver, which gives me the following message:

####
/usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a known problem with certain vendors (e.g. Cloudera). Please make sure you are using at least 1.3.0.
[INFO] [Management$] Creating Event Server at 0.0.0.0:7070
[WARN] [DomainSocketFactory] The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
[INFO] [HttpListener] Bound to /0.0.0.0:7070
[INFO] [EventServerActor] Bound received. EventServer is ready.
####

I can then send events to the Eventserver. After sending the events listed in the SimilarProduct Recommender example, however, I am unable to train using the cluster. If I use 'pio train' then it successfully trains locally, but if I attempt the command "pio train -- --master yarn" then I get the following:

####
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
[identical stack trace as shown earlier in the thread, from YarnSparkHadoopUtil.setEnvFromInputString down to SparkSubmit.main]
####

What is the correct way to get PIO to use the YARN-based Spark for training?

Thanks,

--Cliff.

--
Regards,
Suyash K
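Putting the thread's two workarounds together, a hedged end-to-end attempt might look like the sketch below. The paths and versions are taken from the messages above, and the culprit environment string is only a guess from the stack trace:

####
# 1. Give PIO's semver.sh a parseable Spark version (see the RELEASE sketch above).
# 2. Remove, or repair into NAME=VALUE,NAME2=VALUE2 form, any '='-less entry in
#    environment strings that Spark parses when launching on YARN:
unset SPARK_YARN_USER_ENV
# 3. Submit training through YARN in client mode:
pio train -- --master yarn --deploy-mode client
####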