Hi,

OK, so the full command is now:

pio train --scratch-uri hdfs://pio-cluster-m/pio -- --executor-memory 4g --driver-memory 4g --deploy-mode cluster --master yarn
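For readability, here is a sketch of the same invocation split over multiple lines. Everything after the lone `--` separator is passed through to spark-submit; the core flags shown are illustrative and must stay at or below the cluster's configured YARN maximum (2 vcores on this cluster, per the error quoted below):

```shell
# Sketch (untested): pio options before `--`, Spark options after it.
# --executor-cores / --driver-cores must not exceed
# yarn.scheduler.maximum-allocation-vcores (2 here).
pio train --scratch-uri hdfs://pio-cluster-m/pio -- \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 4g \
  --driver-cores 2 \
  --executor-cores 2
```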
The errors stopped after removing `--executor-cores 2 --driver-cores 2`. I had found this error:

Uncaught exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=4, maxVirtualCores=2

But now I have a problem with HBase :/ I have the HBase host set:

declare -x PIO_STORAGE_SOURCES_HBASE_HOSTS="pio-gc"

[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@2fdb4e2e
[INFO] [Engine$] Preparator: com.actionml.Preparator@d257dd4
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@400bbb7)
[INFO] [Engine$] Data sanity check is on.
[ERROR] [StorageClient] HBase master is not running (ZooKeeper ensemble: pio-cluster-m). Please make sure that HBase is running properly, and that the configuration is pointing at the correct ZooKeeper ensemble.
[ERROR] [Storage$] Error initializing storage client for source HBASE.
org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: hbase-master
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1645)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(HConnectionManager.java:1671)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1878)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isMasterRunning(HConnectionManager.java:894)
    at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2366)
    at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
    at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
    at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
    at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
    at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
    at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
    at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:364)
    at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:307)
    at org.apache.predictionio.data.storage.Storage$.getPEvents(Storage.scala:454)
    at org.apache.predictionio.data.store.PEventStore$.eventsDb$lzycompute(PEventStore.scala:37)
    at org.apache.predictionio.data.store.PEventStore$.eventsDb(PEventStore.scala:37)
    at org.apache.predictionio.data.store.PEventStore$.find(PEventStore.scala:73)
    at com.actionml.DataSource.readTraining(DataSource.scala:76)
    at com.actionml.DataSource.readTraining(DataSource.scala:48)
    at org.apache.predictionio.controller.PDataSource.readTrainingBase(PDataSource.scala:40)
    at org.apache.predictionio.controller.Engine$.train(Engine.scala:642)
    at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: hbase-master
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:42561)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(HConnectionManager.java:1682)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(HConnectionManager.java:1591)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1617)
    ... 36 more
Caused by: java.net.UnknownHostException: unknown host: hbase-master
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
    at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
    at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
    ... 41 more

From: Ambuj Sharma
Sent: 23 May 2018 08:59
To: user@predictionio.apache.org
Cc: Wojciech Kowalski
Subject: Re: Problem with training in yarn cluster

Hi Wojciech,

I also faced many problems while setting up YARN with PredictionIO. This may be a case where YARN is trying to find the pio.log file on the HDFS cluster. You can try "--master yarn --deploy-mode client". You need to pass this configuration with pio train, e.g.:

pio train -- --master yarn --deploy-mode client

Thanks and regards,
Ambuj Sharma
Sunrise may late, But Morning is sure.....
Team ML
Betaout

On Wed, May 23, 2018 at 4:53 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

Actually, you might search the archives for “yarn”, because I don’t recall offhand how the setup works. Archives here: https://lists.apache.org/list.html?user@predictionio.apache.org

Also check the Spark YARN requirements, and remember that `pio train … -- various Spark params` allows you to pass arbitrary Spark params on the pio command line exactly as you would to spark-submit. The double dash separates PIO params from Spark params.

From: Pat Ferrel <p...@occamsmachete.com>
Reply: user@predictionio.apache.org <user@predictionio.apache.org>
Date: May 22, 2018 at 4:07:38 PM
To: user@predictionio.apache.org <user@predictionio.apache.org>, Wojciech Kowalski <wojci...@tomandco.co.uk>
Subject: RE: Problem with training in yarn cluster

What is the command line for `pio train …`? Specifically, are you using yarn-cluster mode? This causes the driver code, which is a PIO process, to be executed on an executor. Special setup is required for this.
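To make the double-dash split concrete, here is a minimal sketch of the client-mode invocation suggested above (the memory flag is an illustrative addition, not something from the thread):

```shell
# Sketch: arguments before `--` are PIO options; arguments after it
# go verbatim to spark-submit. In client mode the driver (the PIO
# process) stays on the host where pio runs, so local files such as
# /pio/pio.log and pio-env.sh are resolved on that machine.
pio train -- --master yarn --deploy-mode client --driver-memory 4g
```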
From: Wojciech Kowalski <wojci...@tomandco.co.uk>
Reply: user@predictionio.apache.org <user@predictionio.apache.org>
Date: May 22, 2018 at 2:28:43 PM
To: user@predictionio.apache.org <user@predictionio.apache.org>
Subject: RE: Problem with training in yarn cluster

Hello,

Actually, I have another error in the logs that is preventing train as well:

[INFO] [RecommendationEngine$] (ActionML ASCII-art banner)
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6774ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}
[INFO] [Server] Started @7040ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[ERROR] [ApplicationMaster] Uncaught exception:

Thanks,
Wojciech

From: Wojciech Kowalski
Sent: 22 May 2018 23:20
To: user@predictionio.apache.org
Subject: Problem with training in yarn cluster

Hello,

I am trying to set up a distributed cluster with all services separated, but I have a problem while running train:

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /pio/pio.log (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
    at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
    at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
    at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
    at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
    at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
    at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
    at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
    at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
    at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
    at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
    at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
    at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
    at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
    at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:738)
    at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:738)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:753)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)

Setup:
- HBase
- Hadoop HDFS
- Spark cluster with YARN
- Training in cluster mode

I assume the Spark worker is trying to save the log to /pio/pio.log on the worker machine instead of on the pio host. How can I set the pio log destination to an HDFS path? Or any other advice?

Thanks,
Wojciech
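One possible workaround for the /pio/pio.log failure (an untested sketch, not from the thread): in yarn-cluster mode the driver runs inside a YARN container, where the hard-coded FileAppender path from PIO's conf/log4j.properties usually does not exist. Pointing the root logger at a ConsoleAppender instead sends output to the YARN container logs and sidesteps the missing path; the appender name `console` is an arbitrary choice:

```
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %c{1} - %m%n
```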