The data will come from HBase (or possibly JDBC, though that is not recommended); the model is always stored in Elasticsearch. The reason for storing it in Elasticsearch is that the last step of the algorithm is performed by the ES query itself, which returns the k-nearest neighbors based on cosine similarity. This is not possible with HDFS. We are not fetching things by ID; we are running a similarity query against the model, which is a mathematical operation, not a simple lookup.
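As a purely illustrative sketch of what that last step amounts to (the index name "urindex" and the "purchase" field below are made up, and the real query the UR builds is more involved), the point is that Elasticsearch does the scoring internally rather than serving a key/value fetch:

    # Illustrative only: index and field names are hypothetical.
    # ES ranks the hits by similarity score; this is not a fetch by ID.
    curl -s -H 'Content-Type: application/json' \
      'http://some-master:9200/urindex/_search?pretty' -d '
    {
      "size": 10,
      "query": {
        "terms": { "purchase": ["item-1", "item-42"] }
      }
    }'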
HDFS may be used for import/export but is not needed by the UR explicitly. If you are using the setup instructions on actionml.com, I suggest you look through them again. It looks like you have tried things that were outside of those instructions.

#!/usr/bin/env bash

# PredictionIO Main Configuration
#
# This section controls core behavior of PredictionIO. It is very likely that
# you need to change these to fit your site.

# Safe config that will work if you expand your cluster later
SPARK_HOME=/usr/local/spark
ES_CONF_DIR=/usr/local/elasticsearch
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
HBASE_CONF_DIR=/usr/local/hbase/conf

# Filesystem paths that PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# PredictionIO Storage Configuration
#
# This section controls programs that make use of PredictionIO's built-in
# storage facilities.

# Storage Repositories
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

# Need to use HDFS here instead of LOCALFS to account for future expansion
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
# PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=ELASTICSEARCH

# Storage Data Sources, lower level than the repos above, just a simple
# storage API to use

# Elasticsearch Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/usr/local/elasticsearch
# the next line should match the cluster.name in elasticsearch.yml
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=infoquest
# For single host Elasticsearch, may add hosts and ports later
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=some-master
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=some-master <-- put your DNS name or IP address for ES here
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300

# dummy models are stored here so use HDFS in case you later want to
# expand the Event and PredictionServers
PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://some-master:9000/models

# HBase Source config
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=/usr/local/hbase
# HBase single master config
# PIO_STORAGE_SOURCES_HBASE_HOSTS=some-master
PIO_STORAGE_SOURCES_HBASE_HOSTS=some-master <-- put your DNS name or IP address for HBase here
PIO_STORAGE_SOURCES_HBASE_PORTS=0 # I don't think this is used

PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_FS_PATH=/mymodels <-- really? /mymodels at the root of the local disk?

On Apr 3, 2017, at 7:01 AM, infoquest india <[email protected]> wrote:

Can we use HDFS or LocalFileSystem for the UR? I am using a single-machine setup and have changed my /etc/hosts file to point to the internal IP. Please find the attached pio-env.sh. One thing I am not clear on: is the issue coming from HDFS or Elasticsearch?

Thanks
Gaurav

http://www.infoquestsolutions.com
Turning Imagination To Reality
Skype: infoquestsolutions
Gtalk: infoquestindia

On Mon, Apr 3, 2017 at 6:52 PM, Pat Ferrel <[email protected]> wrote:

If you are still using the UR you don't need HDFS as a storage backend. In the setup instructions, "some-master" is a placeholder; replace it with the DNS name or IP address of the actual master machine running Elasticsearch. This can be a comma-separated list, no spaces.
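To sanity-check the Elasticsearch side of this file, two ordinary commands will do it; "some-master" below is the same placeholder as above, and the extra host names in the comments are hypothetical:

    # The cluster_name in the response must match
    # PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME above (here: infoquest)
    curl -s 'http://some-master:9200/'

    # If you add ES nodes later, HOSTS and PORTS take comma-separated
    # lists, no spaces:
    # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=some-master,es-node-2
    # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,9300

    # Then re-verify all storage backends:
    pio status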
Can you share your pio-env.sh?

On Apr 3, 2017, at 4:31 AM, infoquest india <[email protected]> wrote:

Hi

When I run pio status I am getting this error:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/aml/pio/PredictionIO-0.11.0-SNAPSHOT/lib/spark/pio-data-hdfs-assembly-0.11.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/aml/pio/PredictionIO-0.11.0-SNAPSHOT/lib/pio-assembly-0.11.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.11.0-SNAPSHOT is installed at /home/aml/pio/PredictionIO-0.11.0-SNAPSHOT
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at None
[INFO] [Management$] Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
[ERROR] [Storage$] Error initializing storage client for source HDFS
[ERROR] [Management$] Unable to connect to all storage backends successfully. The following shows the error message from the storage backend.
Data source HDFS was not properly initialized. (org.apache.predictionio.data.storage.StorageClientException)
Dumping configuration of initialized storage backend sources. Please make sure they are correct.
Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOME -> /usr/local/elasticsearch, HOSTS -> some-master, PORTS -> 9300, CLUSTERNAME -> infoquest, TYPE -> elasticsearch
Source Name: HDFS; Type: (error); Configuration: (error)

Thanks
Gaurav

<pio-env.sh.rtf>
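For anyone landing on this thread with the same log: the failing backend here is the model store, which was configured as HDFS. The resolution discussed above is a two-line change in pio-env.sh (standard variable names, shown in the annotated file earlier in the thread), followed by a re-check:

    # Store the UR model in Elasticsearch instead of HDFS
    PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
    PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=ELASTICSEARCH

    # Verify every backend now connects
    pio status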
