We (ActionML) do contract and consulting work based on PIO, and I can assure you from many installations that it works quite well with clustered HDFS, HBase, Spark, and Elasticsearch. For a truly scalable setup I'd recommend separate clusters for Spark, HBase+HDFS, and ES 5.x. So three different clusters.

If you are using the ActionML setup instructions, I assume you are using the Universal Recommender? The UR was broken by the latest PIO release, 0.12.0, but we have an RC that will go out with the next Mahout release (should be next week). You can use it now by following some interim build instructions (see the UR 0.7.0-SNAPSHOT readme here: https://github.com/actionml/universal-recommender/tree/0.7.0-SNAPSHOT). BTW, the page you reference is for PIO 0.11.0 and is being updated as I write.
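If it helps, the short version of the interim build, assuming the standard PIO template workflow still applies to the snapshot branch (the readme there is authoritative, and it may also involve building the Mahout snapshot the UR depends on):

    git clone https://github.com/actionml/universal-recommender.git
    cd universal-recommender
    git checkout 0.7.0-SNAPSHOT
    pio build --verbose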

If you are using the UR you do not want an HDFS model storage backend, which is what `pio status` is checking here. Can you share your pio-env? Even if you are not using the UR, my theory is that something is set up wrong in pio-env, because using clustered HDFS for a store is quite typical and well tested.
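To give you something concrete to compare against, here is a minimal sketch of the storage section of pio-env.sh the way we typically set it up for the UR. The hostnames and paths are placeholders, not values from your machines, and note the UR keeps its real model in Elasticsearch, so a simple LOCALFS model store is all PIO itself needs:

    # pio-env.sh storage sketch -- placeholder hosts/paths, adjust to your cluster
    PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
    PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

    PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
    PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

    # The UR stores its model in Elasticsearch itself, so LOCALFS is enough here
    PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
    PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

    PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
    PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es-node-1,es-node-2,es-node-3
    PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,9200,9200

    PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
    PIO_STORAGE_SOURCES_HBASE_HOME=/usr/local/hbase

    PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
    PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models

And one educated guess about the "Wrong FS ... expected: file:///" error in your log: that is what the Hadoop client says when it never reads your core-site.xml and falls back to the local filesystem while the configured path says hdfs://. If you do keep an HDFS model store, check that HADOOP_CONF_DIR is set in pio-env.sh and points at your Hadoop conf directory.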


On Nov 28, 2017, at 8:49 AM, Thibaut Gensollen - Choose <[email protected]> wrote:

Hi Guys,

I tried to set up a PredictionIO cluster, working with Elasticsearch and Hadoop clusters.

The fact is, everything is working well on its own (HBase, ES, and Hadoop/Spark), but as soon as I run `pio-start-all` I get these errors:

aml@master:~$ pio status
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.12.0-incubating is installed at /home/aml
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /opt/spark/spark-1.6.3-bin-hadoop2.6
[INFO] [Management$] Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
[ERROR] [Storage$] Error initializing storage client for source HDFS.
java.lang.IllegalArgumentException: Wrong FS: hdfs://master.c.choose-ninja-01.internal:9000/models, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
        at org.apache.hadoop.fs.RawLocalFileSystem.setWorkingDirectory(RawLocalFileSystem.java:547)
        at org.apache.hadoop.fs.FilterFileSystem.setWorkingDirectory(FilterFileSystem.java:280)
        at org.apache.predictionio.data.storage.hdfs.StorageClient.<init>(StorageClient.scala:33)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
        at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
        at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
        at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
        at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
        at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
        at org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:300)
        at org.apache.predictionio.data.storage.Storage$.getModelDataModels(Storage.scala:442)
        at org.apache.predictionio.data.storage.Storage$.verifyAllDataObjects(Storage.scala:381)
        at org.apache.predictionio.tools.commands.Management$.status(Management.scala:156)
        at org.apache.predictionio.tools.console.Pio$.status(Pio.scala:155)
        at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:721)
        at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:656)
        at scala.Option.map(Option.scala:146)
        at org.apache.predictionio.tools.console.Console$.main(Console.scala:656)
        at org.apache.predictionio.tools.console.Console.main(Console.scala)

[ERROR] [Management$] Unable to connect to all storage backends successfully.
The following shows the error message from the storage backend.
Data source HDFS was not properly initialized. (org.apache.predictionio.data.storage.StorageClientException)
Dumping configuration of initialized storage backend sources.
Please make sure they are correct.
Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: TYPE -> elasticsearch, HOME -> /opt/elasticsearch/elasticsearch-5.5.2
Source Name: HDFS; Type: (error); Configuration: (error)

And then my Hadoop is not working anymore:

hdfs dfs -mkdir test
mkdir: Call From master.c.choose-ninja-01.internal/10.128.0.4 to master.c.choose-ninja-01.internal:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

I tried working with Google Cloud Dataproc, and also followed your tutorial http://actionml.com/docs/small_ha_cluster with some VM instances (still on Google Cloud), but the results are the same. Any idea? We have tried so many times that I am beginning to think that we cannot use PredictionIO with Hadoop and Elasticsearch clusters :/

Thanks for your help,
Regards

Thibaut



