The problem I was having with HBase was a typo in my configuration. After correcting that and running 'pio eventserver &', I was able to submit events and have them stored into my remote HBase. I'm having issues with Spark and will open a separate thread for that.
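For anyone searching the archives: the pio-env.sh quoted later in this thread contains `HBASE_CONF_DIR==$PIO_HOME/vendors/hbase/conf` with a doubled `=`. Whether that was the exact typo Cliff means is my assumption (the thread never says), but it is the kind of error bash accepts silently:

```shell
# A doubled "=" in a bash assignment is not a syntax error; bash assigns
# the literal value "=<path>", so the variable points at a nonexistent path.
PIO_HOME=/opt/pio                               # example value for illustration
HBASE_CONF_DIR==$PIO_HOME/vendors/hbase/conf    # the typo: note the "=="
echo "$HBASE_CONF_DIR"                          # prints "=/opt/pio/vendors/hbase/conf"

HBASE_CONF_DIR=$PIO_HOME/vendors/hbase/conf     # corrected
echo "$HBASE_CONF_DIR"                          # prints "/opt/pio/vendors/hbase/conf"
```

Because bash treats `VAR==x` as assigning the literal value `=x`, nothing fails at parse time; the bad conf path only surfaces later, when the HBase client cannot find the cluster configuration.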
Thanks for the help.  --Cliff.

On Fri, May 25, 2018 at 7:21 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> How are you starting the EventServer? You should not use pio-start-all,
> which assumes all services are local.
>
> Configure pio-env.sh with your remote HBase.
> Start the EventServer with `pio eventserver &`, or some method that won’t
> kill it when you log off, like `nohup pio eventserver &`.
> This should not start a local HBase, so you should have your remote one
> running.
> Same for the remote Elasticsearch and HDFS: they should be in pio-env.sh
> and already started.
> `pio status` should be fine with the remote HBase.
>
> From: Miller, Clifford <clifford.mil...@phoenix-opsgroup.com>
> Reply: Miller, Clifford <clifford.mil...@phoenix-opsgroup.com>
> Date: May 25, 2018 at 10:16:01 AM
> To: Pat Ferrel <p...@occamsmachete.com>
> Cc: user@predictionio.apache.org
> Subject: Re: PIO not using HBase cluster
>
> I'll keep you informed. However, I'm having issues getting past this. If
> I have HBase installed with the cluster's config files, then it still does
> not communicate with the cluster. It does start HBase, but on the local
> PIO server. If I ONLY have the HBase config (which worked in version
> 0.10.0), then pio-start-all gives the following message.
>
> ####
> pio-start-all
> Starting Elasticsearch...
> Starting HBase...
> /home/centos/PredictionIO-0.12.1/bin/pio-start-all: line 65:
> /home/centos/PredictionIO-0.12.1/vendors/hbase/bin/start-hbase.sh:
> No such file or directory
> Waiting 10 seconds for Storage Repositories to fully initialize...
> Starting PredictionIO Event Server...
> ########
>
> "pio status" then returns:
>
> ####
> pio status
> [INFO] [Management$] Inspecting PredictionIO...
> [INFO] [Management$] PredictionIO 0.12.1 is installed at
> /home/centos/PredictionIO-0.12.1
> [INFO] [Management$] Inspecting Apache Spark...
> [INFO] [Management$] Apache Spark is installed at
> /home/centos/PredictionIO-0.12.1/vendors/spark
> [INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum
> requirement of 1.3.0)
> [INFO] [Management$] Inspecting storage backend connections...
> [INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
> [INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
> [WARN] [DomainSocketFactory] The short-circuit local reads feature cannot
> be used because libhadoop cannot be loaded.
> [INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
> [ERROR] [RecoverableZooKeeper] ZooKeeper exists failed after 1 attempts
> [ERROR] [ZooKeeperWatcher] hconnection-0x558756be, quorum=localhost:2181,
> baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
> [WARN] [ZooKeeperRegistry] Can't retrieve clusterId from Zookeeper
> [ERROR] [StorageClient] Cannot connect to ZooKeeper (ZooKeeper ensemble:
> localhost). Please make sure that the configuration is pointing at the
> correct ZooKeeper ensemble. By default, HBase manages its own ZooKeeper, so
> if you have not configured HBase to use an external ZooKeeper, that means
> your HBase is not started or configured properly.
> [ERROR] [Storage$] Error initializing storage client for source HBASE.
> org.apache.hadoop.hbase.ZooKeeperConnectionException: Can't connect to ZooKeeper
>   at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2358)
>   at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
>   at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
>   at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
>   at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
>   at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
>   at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
>   at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
>   at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
>   at org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:300)
>   at org.apache.predictionio.data.storage.Storage$.getLEvents(Storage.scala:448)
>   at org.apache.predictionio.data.storage.Storage$.verifyAllDataObjects(Storage.scala:384)
>   at org.apache.predictionio.tools.commands.Management$.status(Management.scala:156)
>   at org.apache.predictionio.tools.console.Pio$.status(Pio.scala:155)
>   at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:721)
>   at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:656)
>   at scala.Option.map(Option.scala:146)
>   at org.apache.predictionio.tools.console.Console$.main(Console.scala:656)
>   at org.apache.predictionio.tools.console.Console.main(Console.scala)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2349)
>   ... 23 more
>
> [ERROR] [Management$] Unable to connect to all storage backends
> successfully.
> The following shows the error message from the storage backend.
>
> Data source HBASE was not properly initialized.
> (org.apache.predictionio.data.storage.StorageClientException)
>
> Dumping configuration of initialized storage backend sources.
> Please make sure they are correct.
>
> Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOSTS ->
> ip-10-0-1-136.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal,
> TYPE -> elasticsearch, CLUSTERNAME -> dsp_es_cluster, HOME ->
> /home/centos/PredictionIO-0.12.1/vendors/elasticsearch
> Source Name: HBASE; Type: (error); Configuration: (error)
> Source Name: HDFS; Type: hdfs; Configuration: TYPE -> hdfs, PATH ->
> hdfs://ip-10-0-1-138.us-gov-west-1.compute.internal:8020/models
>
> ####
>
> On Fri, May 25, 2018 at 5:07 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
>> No, you need to have HBase installed, or at least the config installed
>> on the PIO machine. The servers defined in pio-env.sh will be configured
>> for cluster operation and will be started separately from PIO. PIO then
>> will not start HBase; it will only try to communicate with it. But PIO
>> still needs the config for the client code that is in the pio assembly
>> jar.
>>
>> Some services were not cleanly separated between client, master, and
>> slave, so a complete installation is easiest, though you can figure out
>> the minimum with experimentation; I think it is just the conf directory.
>>
>> BTW, we have a similar setup and are having trouble with the Spark
>> training phase getting a `classDefNotFound:
>> org.apache.hadoop.hbase.ProtobufUtil`, so can you let us know how it
>> goes?
>>
>> From: Miller, Clifford <clifford.mil...@phoenix-opsgroup.com>
>> Reply: user@predictionio.apache.org
>> Date: May 25, 2018 at 9:43:46 AM
>> To: user@predictionio.apache.org
>> Subject: PIO not using HBase cluster
>>
>> I'm attempting to use a remote cluster with PIO 0.12.1. When I run
>> pio-start-all, it starts HBase locally and does not use the remote
>> cluster as configured.
>> I've copied the HBase and Hadoop conf files from the cluster and put
>> them into the locally configured directories. I set this up in the past
>> using a similar configuration, but that was with PIO 0.10.0. With that
>> version I could start PIO with only the hbase and hadoop conf present.
>> This does not seem to be the case any longer.
>>
>> If I only put the cluster configs there, then it complains that it
>> cannot find start-hbase.sh. If I put an HBase installation with cluster
>> configs, then it will start a local HBase and not use the remote
>> cluster.
>>
>> Below is my PIO configuration.
>>
>> ########
>>
>> #!/usr/bin/env bash
>> #
>> # Safe config that will work if you expand your cluster later
>> SPARK_HOME=$PIO_HOME/vendors/spark
>> ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch
>> HADOOP_CONF_DIR=$PIO_HOME/vendors/hadoop/conf
>> HBASE_CONF_DIR==$PIO_HOME/vendors/hbase/conf
>>
>> # Filesystem paths that PredictionIO uses as block storage.
>> PIO_FS_BASEDIR=$HOME/.pio_store
>> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
>> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
>>
>> # PredictionIO Storage Configuration
>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
>>
>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
>>
>> # Need to use HDFS here instead of LOCALFS to enable deploying to
>> # machines without the local model
>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
>>
>> # What store to use for what data
>> # Elasticsearch Example
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch
>> # The next line should match the ES cluster.name in the ES config
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=dsp_es_cluster
>>
>> # For clustered Elasticsearch (use one host/port if not clustered)
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=ip-10-0-1-136.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal
>> #PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,9300,9300
>> #PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>> # PIO 0.12.0+ uses the REST client for ES 5+ and this defaults to
>> # port 9200; change if appropriate, but do not use the Transport Client
>> # port
>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,9200,9200
>>
>> PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
>> PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://ip-10-0-1-138.us-gov-west-1.compute.internal:8020/models
>>
>> # HBase Source config
>> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
>> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase
>>
>> # HBase clustered config (use one host/port if not clustered)
>> PIO_STORAGE_SOURCES_HBASE_HOSTS=ip-10-0-1-138.us-gov-west-1.compute.internal,ip-10-0-1-209.us-gov-west-1.compute.internal,ip-10-0-1-79.us-gov-west-1.compute.internal
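For later readers hitting the same `quorum=localhost:2181` error: falling back to localhost usually means the HBase client never read the cluster's config, so a quick sanity check is to confirm that the hbase-site.xml under HBASE_CONF_DIR actually names the cluster's ZooKeeper quorum. This is just a sketch of mine, not a PIO tool; the `zk_quorum` helper name is made up, and it assumes the `<value>` element sits on the line after `<name>`, as in the stock file layout.

```shell
# Print hbase.zookeeper.quorum from the hbase-site.xml in a given conf dir.
# If this prints nothing or "localhost", the PIO machine's HBase client
# config does not point at the cluster.
zk_quorum() {
  grep -A1 'hbase.zookeeper.quorum' "$1/hbase-site.xml" \
    | grep -o '<value>[^<]*</value>' \
    | sed -e 's/<value>//' -e 's|</value>||'
}

# Example, using the conf dir from the pio-env.sh in this thread:
# zk_quorum "$PIO_HOME/vendors/hbase/conf"
```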