The problem I was having with HBase was a typo in my configuration.  After
correcting that and running 'pio eventserver &', I was able to submit
events and have them stored in my remote HBase.  I'm having issues with
Spark and will open a separate thread for that.
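
For anyone following along, events were sent to the EventServer's REST
endpoint (default port 7070), roughly like this -- the access key and the
event fields below are just placeholders, not my real values:

  curl -i -X POST "http://localhost:7070/events.json?accessKey=YOUR_ACCESS_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "event": "rate",
      "entityType": "user",
      "entityId": "u0",
      "targetEntityType": "item",
      "targetEntityId": "i0",
      "properties": { "rating": 5 },
      "eventTime": "2018-05-25T12:34:56.789Z"
    }'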

Thanks for the help.

--Cliff.


On Fri, May 25, 2018 at 7:21 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> How are you starting the EventServer? You should not use pio-start-all,
> which assumes all services are local.
>
> Configure pio-env.sh with your remote HBase.
> Start the EventServer with `pio eventserver &`, or use some method that
> won’t kill it when you log off, like `nohup pio eventserver &`.
> This should not start a local HBase, so your remote one should already be
> running.
> The same goes for the remote Elasticsearch and HDFS: they should be in
> pio-env.sh and already started.
> `pio status` should then be fine with the remote HBase.
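>
> As a sketch, the relevant pio-env.sh lines for a remote HBase look
> something like this (the hostnames here are examples, substitute your
> cluster's own):
>
>   # point the client at the conf dir copied from the cluster
>   HBASE_CONF_DIR=$PIO_HOME/vendors/hbase/conf
>   PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
>   PIO_STORAGE_SOURCES_HBASE_HOSTS=hbase-node1.example.internal,hbase-node2.example.internal
>
> and then start the EventServer so it survives logout:
>
>   nohup pio eventserver &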
>
>
> From: Miller, Clifford <clifford.mil...@phoenix-opsgroup.com>
> Reply: Miller, Clifford <clifford.mil...@phoenix-opsgroup.com>
> Date: May 25, 2018 at 10:16:01 AM
> To: Pat Ferrel <p...@occamsmachete.com>
> Cc: user@predictionio.apache.org
> Subject: Re: PIO not using HBase cluster
>
> I'll keep you informed.  However, I'm having issues getting past this.  If
> I have HBase installed with the cluster's config files, it still does not
> communicate with the cluster; it does start HBase, but on the local PIO
> server.  If I ONLY have the HBase config (which worked in version 0.10.0),
> then pio-start-all gives the following message.
>
> ####
>  pio-start-all
> Starting Elasticsearch...
> Starting HBase...
> /home/centos/PredictionIO-0.12.1/bin/pio-start-all: line 65: /home/centos/PredictionIO-0.12.1/vendors/hbase/bin/start-hbase.sh: No such file or directory
> Waiting 10 seconds for Storage Repositories to fully initialize...
> Starting PredictionIO Event Server...
> ########
>
> "pio status" then returns:
>
> ####
>  pio status
> [INFO] [Management$] Inspecting PredictionIO...
> [INFO] [Management$] PredictionIO 0.12.1 is installed at
> /home/centos/PredictionIO-0.12.1
> [INFO] [Management$] Inspecting Apache Spark...
> [INFO] [Management$] Apache Spark is installed at
> /home/centos/PredictionIO-0.12.1/vendors/spark
> [INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum
> requirement of 1.3.0)
> [INFO] [Management$] Inspecting storage backend connections...
> [INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
> [INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
> [WARN] [DomainSocketFactory] The short-circuit local reads feature cannot
> be used because libhadoop cannot be loaded.
> [INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
> [ERROR] [RecoverableZooKeeper] ZooKeeper exists failed after 1 attempts
> [ERROR] [ZooKeeperWatcher] hconnection-0x558756be, quorum=localhost:2181,
> baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
> [WARN] [ZooKeeperRegistry] Can't retrieve clusterId from Zookeeper
> [ERROR] [StorageClient] Cannot connect to ZooKeeper (ZooKeeper ensemble:
> localhost). Please make sure that the configuration is pointing at the
> correct ZooKeeper ensemble. By default, HBase manages its own ZooKeeper, so
> if you have not configured HBase to use an external ZooKeeper, that means
> your HBase is not started or configured properly.
> [ERROR] [Storage$] Error initializing storage client for source HBASE.
> org.apache.hadoop.hbase.ZooKeeperConnectionException: Can't connect to ZooKeeper
>         at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2358)
>         at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
>         at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
>         at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
>         at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
>         at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
>         at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
>         at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
>         at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
>         at org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:300)
>         at org.apache.predictionio.data.storage.Storage$.getLEvents(Storage.scala:448)
>         at org.apache.predictionio.data.storage.Storage$.verifyAllDataObjects(Storage.scala:384)
>         at org.apache.predictionio.tools.commands.Management$.status(Management.scala:156)
>         at org.apache.predictionio.tools.console.Pio$.status(Pio.scala:155)
>         at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:721)
>         at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:656)
>         at scala.Option.map(Option.scala:146)
>         at org.apache.predictionio.tools.console.Console$.main(Console.scala:656)
>         at org.apache.predictionio.tools.console.Console.main(Console.scala)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
>         at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2349)
>         ... 23 more
>
>
>
> [ERROR] [Management$] Unable to connect to all storage backends
> successfully.
> The following shows the error message from the storage backend.
>
> Data source HBASE was not properly initialized.
> (org.apache.predictionio.data.storage.StorageClientException)
>
> Dumping configuration of initialized storage backend sources.
> Please make sure they are correct.
>
> Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOSTS -> ip-10-0-1-136.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal, TYPE -> elasticsearch, CLUSTERNAME -> dsp_es_cluster, HOME -> /home/centos/PredictionIO-0.12.1/vendors/elasticsearch
> Source Name: HBASE; Type: (error); Configuration: (error)
> Source Name: HDFS; Type: hdfs; Configuration: TYPE -> hdfs, PATH -> hdfs://ip-10-0-1-138.us-gov-west-1.compute.internal:8020/models
>
> ####
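>
> (The quorum=localhost:2181 above suggests the HBase client never picked up
> the cluster's hbase-site.xml. A quick sanity check on the PIO host, using
> the conf path from my pio-env.sh below; adjust it to your own layout:
>
>   grep -A1 hbase.zookeeper.quorum \
>     $PIO_HOME/vendors/hbase/conf/hbase-site.xml
>
> It should list the remote ZooKeeper hosts, not localhost.)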
>
>
>
> On Fri, May 25, 2018 at 5:07 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
>> No, you need to have HBase installed, or at least its config installed on
>> the PIO machine. The servers defined in pio-env.sh will be configured for
>> cluster operation and started separately from PIO. PIO then will not start
>> HBase; it will only try to communicate with it. But PIO still needs the
>> config for the client code that is in the pio assembly jar.
>>
>> Some services were not cleanly separated between client, master, and
>> slave, so a complete installation is easiest, though you can figure out
>> the minimum with experimentation; I think it is just the conf directory.
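>>
>> For example (just a sketch; the source path is a common default, adjust
>> to wherever your cluster keeps its HBase conf), copying only the conf
>> directory from a cluster node and pointing pio-env.sh at it looks like:
>>
>>   # on the PIO machine; hbase-master.example.internal is a placeholder
>>   scp -r centos@hbase-master.example.internal:/etc/hbase/conf \
>>     $PIO_HOME/vendors/hbase/conf
>>   # then in pio-env.sh:
>>   HBASE_CONF_DIR=$PIO_HOME/vendors/hbase/conf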
>>
>> BTW, we have a similar setup and are having trouble with the Spark
>> training phase, which fails with a `classDefNotFound:
>> org.apache.hadoop.hbase.ProtobufUtil`, so can you let us know how it goes?
>>
>>
>>
>> From: Miller, Clifford <clifford.mil...@phoenix-opsgroup.com>
>> Reply: user@predictionio.apache.org
>> Date: May 25, 2018 at 9:43:46 AM
>> To: user@predictionio.apache.org
>> Subject:  PIO not using HBase cluster
>>
>> I'm attempting to use a remote cluster with PIO 0.12.1.  When I run
>> pio-start-all, it starts HBase locally and does not use the remote
>> cluster as configured.  I've copied the HBase and Hadoop conf files from
>> the cluster and put them into the locally configured directories.  I set
>> this up in the past using a similar configuration, but with PIO 0.10.0.
>> With that version I could start PIO with only the HBase and Hadoop conf
>> present.  This does not seem to be the case any longer.
>>
>> If I only put in the cluster configs, it complains that it cannot find
>> start-hbase.sh.  If I put in an HBase installation with the cluster
>> configs, it starts a local HBase and does not use the remote cluster.
>>
>> Below is my PIO configuration:
>>
>> ########
>>
>> #!/usr/bin/env bash
>> #
>> # Safe config that will work if you expand your cluster later
>> SPARK_HOME=$PIO_HOME/vendors/spark
>> ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch
>> HADOOP_CONF_DIR=$PIO_HOME/vendors/hadoop/conf
>> HBASE_CONF_DIR==$PIO_HOME/vendors/hbase/conf
>>
>>
>> # Filesystem paths that PredictionIO uses as block storage.
>> PIO_FS_BASEDIR=$HOME/.pio_store
>> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
>> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
>>
>> # PredictionIO Storage Configuration
>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
>>
>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
>>
>> # Need to use HDFS here instead of LOCALFS to enable deploying to
>> # machines without the local model
>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
>>
>> # What store to use for what data
>> # Elasticsearch Example
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch
>> # The next line should match the ES cluster.name in ES config
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=dsp_es_cluster
>>
>> # For clustered Elasticsearch (use one host/port if not clustered)
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=ip-10-0-1-136.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal
>> #PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,9300,9300
>> #PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>> # PIO 0.12.0+ uses the REST client for ES 5+ and this defaults to
>> # port 9200, change if appropriate but do not use the Transport Client port
>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,9200,9200
>>
>> PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
>> PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://ip-10-0-1-138.us-gov-west-1.compute.internal:8020/models
>>
>> # HBase Source config
>> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
>> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase
>>
>> # Hbase clustered config (use one host/port if not clustered)
>> PIO_STORAGE_SOURCES_HBASE_HOSTS=ip-10-0-1-138.us-gov-west-1.compute.internal,ip-10-0-1-209.us-gov-west-1.compute.internal,ip-10-0-1-79.us-gov-west-1.compute.internal
>>
>>
>
>
>
