I'll keep you informed. However, I'm having issues getting past this. If
I have HBase installed with the cluster's config files, it still does not
communicate with the cluster; it does start HBase, but on the local PIO
server. If I have ONLY the HBase config (which worked in version 0.10.0),
then pio-start-all gives the following message:
####
pio-start-all
Starting Elasticsearch...
Starting HBase...
/home/centos/PredictionIO-0.12.1/bin/pio-start-all: line 65: /home/centos/PredictionIO-0.12.1/vendors/hbase/bin/start-hbase.sh: No such file or directory
Waiting 10 seconds for Storage Repositories to fully initialize...
Starting PredictionIO Event Server...
########
"pio status" then returns:
####
pio status
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.12.1 is installed at /home/centos/PredictionIO-0.12.1
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /home/centos/PredictionIO-0.12.1/vendors/spark
[INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
[WARN] [DomainSocketFactory] The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[ERROR] [RecoverableZooKeeper] ZooKeeper exists failed after 1 attempts
[ERROR] [ZooKeeperWatcher] hconnection-0x558756be, quorum=localhost:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
[WARN] [ZooKeeperRegistry] Can't retrieve clusterId from Zookeeper
[ERROR] [StorageClient] Cannot connect to ZooKeeper (ZooKeeper ensemble: localhost). Please make sure that the configuration is pointing at the correct ZooKeeper ensemble. By default, HBase manages its own ZooKeeper, so if you have not configured HBase to use an external ZooKeeper, that means your HBase is not started or configured properly.
[ERROR] [Storage$] Error initializing storage client for source HBASE.
org.apache.hadoop.hbase.ZooKeeperConnectionException: Can't connect to ZooKeeper
        at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2358)
        at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
        at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
        at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
        at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
        at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
        at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
        at org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:300)
        at org.apache.predictionio.data.storage.Storage$.getLEvents(Storage.scala:448)
        at org.apache.predictionio.data.storage.Storage$.verifyAllDataObjects(Storage.scala:384)
        at org.apache.predictionio.tools.commands.Management$.status(Management.scala:156)
        at org.apache.predictionio.tools.console.Pio$.status(Pio.scala:155)
        at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:721)
        at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:656)
        at scala.Option.map(Option.scala:146)
        at org.apache.predictionio.tools.console.Console$.main(Console.scala:656)
        at org.apache.predictionio.tools.console.Console.main(Console.scala)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
        at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2349)
        ... 23 more
[ERROR] [Management$] Unable to connect to all storage backends successfully.
The following shows the error message from the storage backend.
Data source HBASE was not properly initialized. (org.apache.predictionio.data.storage.StorageClientException)
Dumping configuration of initialized storage backend sources. Please make sure they are correct.
Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOSTS -> ip-10-0-1-136.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal, TYPE -> elasticsearch, CLUSTERNAME -> dsp_es_cluster, HOME -> /home/centos/PredictionIO-0.12.1/vendors/elasticsearch
Source Name: HBASE; Type: (error); Configuration: (error)
Source Name: HDFS; Type: hdfs; Configuration: TYPE -> hdfs, PATH -> hdfs://ip-10-0-1-138.us-gov-west-1.compute.internal:8020/models
####
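One detail worth noting: the error shows `quorum=localhost:2181`, which suggests the HBase client is falling back to its built-in defaults rather than reading the cluster's hbase-site.xml. As a quick sanity check (a sketch only; the sample file and the sed-based extraction are my assumptions, not PIO tooling), you can confirm which quorum the local config actually declares:

```shell
# Sketch: check which ZooKeeper quorum an hbase-site.xml declares.
# A sample file stands in for the real one, assumed from this thread to
# live at $PIO_HOME/vendors/hbase/conf/hbase-site.xml.
cat > /tmp/hbase-site.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.internal,zk2.example.internal</value>
  </property>
</configuration>
EOF

# Extract the quorum value. If this prints nothing (property absent) or
# "localhost", the HBase client falls back to localhost:2181 -- exactly
# the failure shown in the pio status output above.
quorum=$(sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p' /tmp/hbase-site.xml)
echo "quorum=$quorum"
# -> quorum=zk1.example.internal,zk2.example.internal
```

Running the same extraction against the conf directory PIO is actually picking up should tell you whether the client is seeing the cluster quorum or defaulting to localhost.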
On Fri, May 25, 2018 at 5:07 PM, Pat Ferrel <[email protected]> wrote:
> No, you need to have HBase installed, or at least its config installed on
> the PIO machine. The servers defined in pio-env.sh will be configured for
> cluster operations and started separately from PIO; PIO will then only try
> to communicate with HBase, not start it. But PIO still needs the config
> for the client code that is in the pio assembly jar.
>
> Some services were not cleanly separated between client, master, and
> slave, so a complete installation is easiest, though you can work out the
> minimum by experimentation; I think it is just the conf directory.
>
> BTW, we have a similar setup and are having trouble with the Spark
> training phase getting a `classDefNotFound:
> org.apache.hadoop.hbase.ProtobufUtil`, so can you let us know how it goes?
>
>
>
> From: Miller, Clifford <[email protected]>
> Reply: [email protected]
> Date: May 25, 2018 at 9:43:46 AM
> To: [email protected]
> Subject: PIO not using HBase cluster
>
> I'm attempting to use a remote cluster with PIO 0.12.1. When I run
> pio-start-all it starts HBase locally and does not use the remote cluster
> as configured. I've copied the HBase and Hadoop conf files from the
> cluster and put them into the locally configured directories. I set this
> up in the past using a similar configuration, but was using PIO 0.10.0.
> With that version I could start pio with only the HBase and Hadoop conf
> present. This does not seem to be the case any longer.
>
> If I put only the cluster configs in place, it complains that it cannot
> find start-hbase.sh. If I put a full HBase installation with cluster
> configs in place, it will start a local HBase and not use the remote
> cluster.
>
> Below is my PIO configuration
>
> ########
>
> #!/usr/bin/env bash
> #
> # Safe config that will work if you expand your cluster later
> SPARK_HOME=$PIO_HOME/vendors/spark
> ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch
> HADOOP_CONF_DIR=$PIO_HOME/vendors/hadoop/conf
> HBASE_CONF_DIR=$PIO_HOME/vendors/hbase/conf
>
>
> # Filesystem paths that PredictionIO uses as block storage.
> PIO_FS_BASEDIR=$HOME/.pio_store
> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
>
> # PredictionIO Storage Configuration
> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
>
> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
>
> # Need to use HDFS here instead of LOCALFS to enable deploying to
> # machines without the local model
> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
>
> # What store to use for what data
> # Elasticsearch Example
> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch
> # The next line should match the ES cluster.name in ES config
> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=dsp_es_cluster
>
> # For clustered Elasticsearch (use one host/port if not clustered)
> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=ip-10-0-1-136.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal
> #PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,9300,9300
> #PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
> # PIO 0.12.0+ uses the REST client for ES 5+ and this defaults to
> # port 9200, change if appropriate but do not use the Transport Client port
> # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,9200,9200
>
> PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
> PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://ip-10-0-1-138.us-gov-west-1.compute.internal:8020/models
>
> # HBase Source config
> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase
>
> # Hbase clustered config (use one host/port if not clustered)
> PIO_STORAGE_SOURCES_HBASE_HOSTS=ip-10-0-1-138.us-gov-west-1.compute.internal,ip-10-0-1-209.us-gov-west-1.compute.internal,ip-10-0-1-79.us-gov-west-1.compute.internal
>
>
--
Clifford Miller
Mobile | 321.431.9089