Re: Spark cluster error
A source build did not fix the problem. Has anyone run PIO 0.12.1 on a Spark cluster? The issue seems to be how to pass the correct code to Spark so it can connect to HBase:

```
[ERROR] [TransportRequestHandler] Error while invoking RpcHandler#receive() for one-way message.
[ERROR] [TransportRequestHandler] Error while invoking RpcHandler#receive() for one-way message.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent failure: Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.protobuf.ProtobufUtil
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
```

Now that we have these pluggable DBs, did I miss something? This works with master=local but not with a remote Spark master. I've passed in the hbase-client jar in the --jars part of spark-submit and it still fails. What am I missing?

From: Pat Ferrel
Reply: Pat Ferrel
Date: May 23, 2018 at 8:57:32 AM
To: user@predictionio.apache.org
Subject: Spark cluster error

Same CLI works using a local Spark master, but fails using a remote master for a cluster due to a missing class def for protobuf used in HBase. We are using the binary dist 0.12.1. Is this known? Is there a workaround?
We are now trying a source build in the hope that the class will be put into the assembly passed to Spark. The reasoning is that the executors don't contain the HBase classes, but a local executor does, due to some local classpath. If the source-built assembly does not have these classes, we will have the same problem: namely, how to get protobuf to the executors. Has anyone seen this?
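For anyone hitting the same NoClassDefFoundError: one commonly suggested approach is to ship the HBase client jars to the executors explicitly via `--jars`. The sketch below is not a confirmed fix for this thread's problem; `HBASE_HOME`, the jar name patterns, and the master URL are all assumptions about a typical install.

```shell
# A sketch, not a confirmed fix: HBASE_HOME and the jar names below are
# assumptions about a typical HBase install -- adjust to yours.
HBASE_HOME=${HBASE_HOME:-/opt/hbase}

# Build the comma-separated jar list that spark-submit's --jars expects.
HBASE_JARS=$(find "$HBASE_HOME/lib" \( -name 'hbase-*.jar' -o -name 'protobuf-java-*.jar' \) 2>/dev/null | paste -sd, -)

# Everything after the lone `--` is passed through to spark-submit unchanged.
CMD="pio train -- --master spark://your-master:7077 --jars $HBASE_JARS"
echo "$CMD"
```

Because `--jars` only ships the listed files, missing a transitive dependency (e.g. the protobuf jar itself) reproduces exactly this NoClassDefFoundError on the executors.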
RE: Problem with training in yarn cluster
I noticed the appName is different for DataSource ("shop _live") and Algorithm ("shop_live"). AppNames must match. Also, the eventNames are different, which should be OK, but it's still a question: why input something that is not used? Given the meaning of the events, I'd use them all for recommendations, but you may eventually want to create shopping-cart and wishlist models separately, since these will yield "complementary purchases" and "things you may be missing" in the wishlist.

From: Wojciech Kowalski
Reply: user@predictionio.apache.org
Date: May 23, 2018 at 5:17:06 AM
To: Ambuj Sharma, user@predictionio.apache.org
Subject: RE: Problem with training in yarn cluster

Hello again,

After moving HBase to the dataproc cluster from docker (probably DNS/hostname resolution issues), there is no more HBase error, but training still stops:

```
[INFO] [RecommendationEngine$] [ASCII-art banner]
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop _live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @10046ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[... Jetty ServletContextHandler startup lines omitted ...]
[INFO] [ServerConnector] Started Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:49349}
[INFO] [Server] Started @10430ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@379fcbd1{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[INFO] [DataSource]
```
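To illustrate the appName point: in a Universal Recommender-style engine.json, both the datasource and the algorithm carry an `appName`, and the two must be identical. The fragment below is an illustrative sketch with hypothetical values, not the poster's actual config.

```json
{
  "datasource": {
    "params": {
      "appName": "shop_live",
      "eventNames": ["purchase", "basket-add", "wishlist-add", "view"]
    }
  },
  "algorithms": [
    {
      "name": "ur",
      "params": {
        "appName": "shop_live",
        "eventNames": ["purchase", "basket-add", "wishlist-add", "view"]
      }
    }
  ]
}
```

A stray space, as in "shop _live" vs "shop_live", makes these two strings refer to different apps.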
RE: Problem with training in yarn cluster
Hello again,

After moving HBase to the dataproc cluster from docker (probably DNS/hostname resolution issues), there is no more HBase error, but training still stops:

```
[INFO] [RecommendationEngine$] [ASCII-art banner]
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop _live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @10046ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[... Jetty ServletContextHandler startup lines omitted ...]
[INFO] [ServerConnector] Started Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:49349}
[INFO] [Server] Started @10430ms
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[INFO] [DataSource]
  Init DataSource
  App name            shop _live
  Event window        None
  Event names         List(purchase, basket-add, wishlist-add, view)
  Min events per user None
[INFO] [URAlgorithm]
  Init URAlgorithm
  App name      shop_live
  ES index name oburindex
  ES type name  items
  RecsModel     all
```
RE: Problem with training in yarn cluster
Hi,

OK, so the full command is now:

```
pio train --scratch-uri hdfs://pio-cluster-m/pio -- --executor-memory 4g --driver-memory 4g --deploy-mode cluster --master yarn
```

The errors stopped after removing `--executor-cores 2 --driver-cores 2`. I found this error:

```
Uncaught exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=4, maxVirtualCores=2
```

But now I have a problem with HBase :/ I have the HBase host set:

```
declare -x PIO_STORAGE_SOURCES_HBASE_HOSTS="pio-gc"
```

```
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@2fdb4e2e
[INFO] [Engine$] Preparator: com.actionml.Preparator@d257dd4
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@400bbb7)
[INFO] [Engine$] Data sanity check is on.
[ERROR] [StorageClient] HBase master is not running (ZooKeeper ensemble: pio-cluster-m). Please make sure that HBase is running properly, and that the configuration is pointing at the correct ZooKeeper ensemble.
[ERROR] [Storage$] Error initializing storage client for source HBASE.
```
```
org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: hbase-master
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1645)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(HConnectionManager.java:1671)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1878)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isMasterRunning(HConnectionManager.java:894)
    at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2366)
    at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
    at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
    at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
    at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
    at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
    at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
    at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:364)
    at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:307)
    at org.apache.predictionio.data.storage.Storage$.getPEvents(Storage.scala:454)
    at org.apache.predictionio.data.store.PEventStore$.eventsDb$lzycompute(PEventStore.scala:37)
    at org.apache.predictionio.data.store.PEventStore$.eventsDb(PEventStore.scala:37)
    at org.apache.predictionio.data.store.PEventStore$.find(PEventStore.scala:73)
    at com.actionml.DataSource.readTraining(DataSource.scala:76)
    at com.actionml.DataSource.readTraining(DataSource.scala:48)
    at org.apache.predictionio.controller.PDataSource.readTrainingBase(PDataSource.scala:40)
    at org.apache.predictionio.controller.Engine$.train(Engine.scala:642)
    at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: hbase-master
    at
```
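The "unknown host: hbase-master" suggests something in the classpath is still carrying the old docker hostname rather than the new cluster host. As a sketch only, the client-side settings that need to agree on one resolvable hostname look like this; all hostnames below are placeholders, not the poster's real values.

```shell
# pio-env.sh sketch -- hostnames are placeholders for your cluster.
# Every node that runs driver or executor code must be able to resolve them.
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOSTS=pio-cluster-m
PIO_STORAGE_SOURCES_HBASE_PORTS=0
```

Note that a stale hbase-site.xml on the classpath (for example one baked into the docker image, still naming `hbase-master`) can override these values, which would match the symptom in the trace.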
Re: Problem with training in yarn cluster
Hi Wojciech,

I also faced many problems while setting up yarn with PredictionIO. This may be a case where yarn is trying to find the pio.log file on the hdfs cluster. You can try "--master yarn --deploy-mode client". You need to pass this configuration with pio train, e.g.:

```
pio train -- --master yarn --deploy-mode client
```

Thanks and Regards
Ambuj Sharma
Sunrise may late, But Morning is sure.
Team ML Betaout

On Wed, May 23, 2018 at 4:53 AM, Pat Ferrel wrote:

> Actually you might search the archives for "yarn" because I don't recall
> how the setup works off hand.
>
> Archives here: https://lists.apache.org/list.html?user@predictionio.apache.org
>
> Also check the Spark Yarn requirements and remember that `pio train … --
> various Spark params` allows you to pass arbitrary Spark params exactly as
> you would to spark-submit on the pio command line. The double dash
> separates PIO and Spark params.
>
> From: Pat Ferrel
> Reply: user@predictionio.apache.org
> Date: May 22, 2018 at 4:07:38 PM
> To: user@predictionio.apache.org, Wojciech Kowalski
> Subject: RE: Problem with training in yarn cluster
>
> What is the command line for `pio train …`? Specifically, are you using
> yarn-cluster mode? This causes the driver code, which is a PIO process, to
> be executed on an executor. Special setup is required for this.
>
> From: Wojciech Kowalski
> Reply: user@predictionio.apache.org
> Date: May 22, 2018 at 2:28:43 PM
> To: user@predictionio.apache.org
> Subject: RE: Problem with training in yarn cluster
>
> Hello,
>
> Actually I have another error in logs that is actually preventing train as
> well:
>
> [INFO] [RecommendationEngine$] [ASCII-art banner]
> [INFO] [Engine] Extracting datasource params...
> [INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
> [INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
> [INFO] [Engine] Extracting preparator params...
> [INFO] [Engine] Preparator params: (,Empty)
> [INFO] [Engine] Extracting serving params...
> [INFO] [Engine] Serving params: (,Empty)
> [INFO] [log] Logging initialized @6774ms
> [INFO] [Server] jetty-9.2.z-SNAPSHOT
> [... Jetty ServletContextHandler startup lines omitted ...]
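Pat's point about the double dash can be sketched as: arguments before the lone `--` are parsed by pio itself, and everything after it is handed to spark-submit verbatim. The helper below is a hypothetical illustration of that split, not PIO's actual launcher code.

```shell
# Illustration (POSIX sh) of splitting args at the lone `--`:
# args before it are the tool's own options, args after go to spark-submit.
split_args() {
  PIO_ARGS=""
  SPARK_ARGS=""
  seen=0
  for a in "$@"; do
    if [ "$a" = "--" ] && [ "$seen" -eq 0 ]; then
      seen=1            # first bare `--` flips which side we collect
    elif [ "$seen" -eq 0 ]; then
      PIO_ARGS="$PIO_ARGS $a"
    else
      SPARK_ARGS="$SPARK_ARGS $a"
    fi
  done
  PIO_ARGS=${PIO_ARGS# }      # strip the leading space
  SPARK_ARGS=${SPARK_ARGS# }
}

split_args --scratch-uri hdfs://pio-cluster-m/pio -- --master yarn --deploy-mode client
echo "pio args:   $PIO_ARGS"
echo "spark args: $SPARK_ARGS"
```

This is why `pio train -- --master yarn --deploy-mode client` works: `--master` and `--deploy-mode` never reach pio's own option parser.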