The data will come from HBase (or possibly JDBC, though that is not recommended); 
the model is always stored in Elasticsearch. The reason for storing the model in 
Elasticsearch is that the last step of the algorithm is performed by an ES query, 
which returns the k-nearest neighbors based on cosine similarity. This is not 
possible with HDFS: we are not fetching records by ID, we are performing a 
mathematical operation on the model that retrieves scored results.
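To make that concrete, the final step is roughly the following kind of Elasticsearch query (a sketch only, not the UR's exact DSL; the indicator field names and item IDs here are made up). ES scores every item in the model against the user's history and returns the top-k by similarity:

```json
{
  "size": 20,
  "query": {
    "bool": {
      "should": [
        { "terms": { "purchase": ["iPad", "iPhone"] } },
        { "terms": { "view": ["Galaxy-Tab"] } }
      ]
    }
  }
}
```

Because the model is indexed with Elasticsearch's similarity scoring, this single query performs the KNN step; a plain file store like HDFS has no equivalent operation.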

HDFS may be used for import/export but is not needed by the UR explicitly.

If you are using the setup instructions on actionml.com I suggest you look 
through that again. It looks like you have tried things that were outside of 
those instructions.


#!/usr/bin/env bash

# PredictionIO Main Configuration
#
# This section controls core behavior of PredictionIO. It is very likely that
# you need to change these to fit your site.

# Safe config that will work if you expand your cluster later
SPARK_HOME=/usr/local/spark
ES_CONF_DIR=/usr/local/elasticsearch
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
HBASE_CONF_DIR=/usr/local/hbase/conf

# Filesystem paths that PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# PredictionIO Storage Configuration
#
# This section controls programs that make use of PredictionIO's built-in
# storage facilities.

# Storage Repositories

PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH


PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

# Need to use HDFS here instead of LOCALFS to account for future expansion
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
# PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=ELASTICSEARCH


# Storage Data Sources: lower level than the repositories above, just a simple
# storage API to use

# Elasticsearch Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/usr/local/elasticsearch
# the next line should match the cluster.name in elasticsearch.yml
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=infoquest

# For single host Elasticsearch, may add hosts and ports later
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=some-master
# <-- put your DNS name or IP address for Elasticsearch here
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=some-master
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300

# dummy models are stored here so use HDFS in case you later want to
# expand the Event and PredictionServers
PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://some-master:9000/models

# HBase Source config
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=/usr/local/hbase
# HBase single-master config
# PIO_STORAGE_SOURCES_HBASE_HOSTS=some-master
# <-- put your DNS name or IP address for HBase here
PIO_STORAGE_SOURCES_HBASE_HOSTS=some-master
PIO_STORAGE_SOURCES_HBASE_PORTS=0

# I don’t think this is used
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_FS_PATH=/mymodels   # <-- really? /mymodels at the root of the local disk?




On Apr 3, 2017, at 7:01 AM, infoquest india <[email protected]> wrote:

Can we use HDFS or the local filesystem for the UR?

I am using a single-machine setup and changed my /etc/hosts file to point to the 
internal IP.
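For reference, a single-machine /etc/hosts entry usually looks something like this (the IP 10.0.0.5 and the hostname some-master are placeholders; use your machine's actual internal address and the name referenced in pio-env.sh):

```
10.0.0.5   some-master
```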

Please find attached pio-env.sh.

One thing I am not clear on: is the issue being caused by HDFS or Elasticsearch?


Thanks
Gaurav
http://www.infoquestsolutions.com
Turning Imagination To Reality
Skype:- infoquestsolutions
Gtalk:- infoquestindia

On Mon, Apr 3, 2017 at 6:52 PM, Pat Ferrel <[email protected]> wrote:
If you are still using the UR, you don’t need HDFS as a storage backend.

In the setup instructions, “some-master” is a placeholder; replace it with the 
DNS name or IP address of the machine actually running Elasticsearch. This can 
be a comma-separated list with no spaces.

Can you share your pio-env.sh?


On Apr 3, 2017, at 4:31 AM, infoquest india <[email protected]> wrote:

Hi 

When I run pio status I get this error:


SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in 
[jar:file:/home/aml/pio/PredictionIO-0.11.0-SNAPSHOT/lib/spark/pio-data-hdfs-assembly-0.11.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in 
[jar:file:/home/aml/pio/PredictionIO-0.11.0-SNAPSHOT/lib/pio-assembly-0.11.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

[INFO] [Management$] Inspecting PredictionIO...

[INFO] [Management$] PredictionIO 0.11.0-SNAPSHOT is installed at 
/home/aml/pio/PredictionIO-0.11.0-SNAPSHOT

[INFO] [Management$] Inspecting Apache Spark...

[INFO] [Management$] Apache Spark is installed at None

[INFO] [Management$] Apache Spark 1.6.3 detected (meets minimum requirement of 
1.3.0)

[INFO] [Management$] Inspecting storage backend connections...

[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...

[INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...

[ERROR] [Storage$] Error initializing storage client for source HDFS

[ERROR] [Management$] Unable to connect to all storage backends successfully.

The following shows the error message from the storage backend.



Data source HDFS was not properly initialized. 
(org.apache.predictionio.data.storage.StorageClientException)



Dumping configuration of initialized storage backend sources.

Please make sure they are correct.



Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOME -> 
/usr/local/elasticsearch, HOSTS -> some-master, PORTS -> 9300, CLUSTERNAME -> 
infoquest, TYPE -> elasticsearch

Source Name: HDFS; Type: (error); Configuration: (error)



Thanks
Gaurav



<pio-env.sh.rtf>
