Yes, those instructions tell you to run HDFS in pseudo-cluster mode. What
do you see in the HDFS GUI on localhost:50070?
Those setup instructions create pseudo-clustered Spark, HDFS, and HBase.
Everything runs on a single machine but, as the page says, is configured so
you can easily expand to a cluster by changing the config to point to remote
HDFS or Spark clusters.
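For example (the hostname below is hypothetical), expanding model storage
would mean changing only the URI in the same pio-env.sh key to point at a
remote NameNode:

PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://namenode.example.com:9000/models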
One fix, if you don't want to run those services in pseudo-cluster mode, is:
1) Remove any mention of PGSQL or JDBC; we are not using them. Those lines
are not in the setup on the page you linked to and are not used.
2) On a single machine you can put the dummy/empty model file in LOCALFS, so
change the lines
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://localhost:9000/models
to
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
PIO_STORAGE_SOURCES_LOCALFS_PATH=/path/to/models
substituting a directory where you want to save models. Note that the key
changes from HDFS_PATH to LOCALFS_PATH so it matches the new source.
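For reference, a minimal sketch of the full LOCALFS stanza, assuming PIO's
standard pio-env.sh storage-source naming (adjust the path to taste):

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models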
Running them in pseudo-cluster mode gives you GUIs to see job progress and
browse HDFS for files, among other things. We recommend it because it helps
you debug problems when you get to large amounts of data and begin running
out of resources.
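A couple of quick health checks for the pseudo-cluster, assuming standard
Hadoop and Spark defaults (jps ships with the JDK; the ports are the stock
ones):

jps    # lists the running daemons: NameNode, DataNode, HMaster, etc.
# HDFS GUI: http://localhost:50070   Spark master GUI: http://localhost:8080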
From: Anuj Kumar <[email protected]>
Date: June 19, 2018 at 10:35:02 AM
To: [email protected]
Cc: [email protected], [email protected]
Subject: Re: java.util.NoSuchElementException: head of empty list when
running train
Hi Pat,
I read it at the link below:
http://actionml.com/docs/single_machine
Here is my pio-env.sh:
SPARK_HOME=$PIO_HOME/vendors/spark-2.1.1-bin-hadoop2.6
POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.0.0.jar
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
HBASE_CONF_DIR=/usr/local/hbase/conf
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/usr/local/els
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=pio
PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://localhost:9000/models
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=/usr/local/hbase
Thanks,
Anuj Kumar
On Tue, Jun 19, 2018 at 9:16 PM Pat Ferrel <[email protected]> wrote:
> Can you show me where on the AML site it says to store models in HDFS? It
> should not say that. I think that may be from the PIO site, so you should
> ignore it.
>
> Can you share your pio-env? You need to go through the whole workflow, from
> pio build and pio train to pio deploy, using a template from the same
> directory and with the same engine.json and pio-env. I suspect something
> is wrong in pio-env.
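> A minimal sketch of that workflow (these are the standard PIO commands; the
> directory is a placeholder for wherever your template lives):
>
> cd /path/to/your-template   # same directory for every step
> pio build                   # compile the engine with this engine.json/pio-env
> pio train                   # train and write the model to MODELDATA_SOURCE
> pio deploy                  # serve queries from the trained model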
>
>
> From: Anuj Kumar <[email protected]>
> Date: June 19, 2018 at 1:28:11 AM
> To: [email protected]
> Cc: [email protected], [email protected]
> Subject: Re: java.util.NoSuchElementException: head of empty list when
> running train
>
> I tried the basic engine.json from the UR site examples. It seems to work,
> but I got stuck at "pio deploy", which throws the following error:
>
> [ERROR] [OneForOneStrategy] Failed to invert: [B@35c7052
>
>
> Before that, "pio train" was successful but gave the following error. I
> suspect this is the reason "pio deploy" is not working. Please help.
>
> [ERROR] [HDFSModels] File /models/pio_modelAWQXIr4APcDlNQi8DwVj could only
> be replicated to 0 nodes instead of minReplication (=1). There are 0
> datanode(s) running and no node(s) are excluded in this operation.
>
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1726)
> at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2565)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:829)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)
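> The key line above is "There are 0 datanode(s) running": HDFS has no
> DataNode to write the model to. A quick way to verify and recover, assuming
> a standard Hadoop install with its sbin directory on the PATH:
>
> hdfs dfsadmin -report   # shows live/dead datanodes; here it would report 0 live
> start-dfs.sh            # restarts the HDFS daemons so a DataNode comes up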
>
>
> On Tue, Jun 19, 2018 at 10:45 AM Anuj Kumar <[email protected]>
> wrote:
>
>> Sure, here it is.
>>
>> {
>>   "comment": " This config file uses default settings for all but the required values see README.md for docs",
>>   "id": "default",
>>   "description": "Default settings",
>>   "engineFactory": "com.actionml.RecommendationEngine",
>>   "datasource": {
>>     "params": {
>>       "name": "sample-handmad",
>>       "appName": "np",
>>       "eventNames": ["read", "search", "view", "category-pref"],
>>       "minEventsPerUser": 1,
>>       "eventWindow": {
>>         "duration": "300 days",
>>         "removeDuplicates": true,
>>         "compressProperties": true
>>       }
>>     }
>>   },
>>   "sparkConf": {
>>     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>>     "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
>>     "spark.kryo.referenceTracking": "false",
>>     "spark.kryoserializer.buffer": "300m",
>>     "spark.executor.memory": "4g",
>>     "spark.executor.cores": "2",
>>     "spark.task.cpus": "2",
>>     "spark.default.parallelism": "16",
>>     "es.index.auto.create": "true"
>>   },
>>   "algorithms": [
>>     {
>>       "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
>>       "name": "ur",
>>       "params": {
>>         "appName": "np",
>>         "indexName": "np",
>>         "typeName": "items",
>>         "blacklistEvents": [],
>>         "comment": "must have data for the first event or the model will not build, other events are optional",
>>         "indicators": [
>>           { "name": "read" },
>>           { "name": "search", "maxCorrelatorsPerItem": 5 },
>>           { "name": "category-pref", "maxCorrelatorsPerItem": 50 },
>>           { "name": "view", "maxCorrelatorsPerItem": 50 }
>>         ],
>>         "expireDateName": "itemExpiry",
>>         "dateName": "date",
>>         "num": 5
>>       }
>>     }
>>   ]
>> }
>>
>>
>> On Mon, Jun 18, 2018 at 8:55 PM Pat Ferrel <[email protected]> wrote:
>>
>>> This sounds like some missing required config in engine.json. Can you
>>> share the file?
>>>
>>>
>>> From: Anuj Kumar <[email protected]>
>>> Reply: [email protected]
>>> Date: June 18, 2018 at 5:05:22 AM
>>> To: [email protected]
>>> Subject: java.util.NoSuchElementException: head of empty list when
>>> running train
>>>
>>> I am getting this while running "pio train". Please help.
>>>
>>> Exception in thread "main" java.util.NoSuchElementException: head of
>>> empty list
>>>
>>> at scala.collection.immutable.Nil$.head(List.scala:420)
>>> at scala.collection.immutable.Nil$.head(List.scala:417)
>>> at org.apache.mahout.math.cf.SimilarityAnalysis$.crossOccurrenceDownsampled(SimilarityAnalysis.scala:177)
>>> at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:343)
>>> at com.actionml.URAlgorithm.train(URAlgorithm.scala:295)
>>> at com.actionml.URAlgorithm.train(URAlgorithm.scala:180)
>>> at org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)
>>> at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:690)
>>> at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:690)
>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>> at scala.collection.immutable.List.foreach(List.scala:381)
>>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>> at scala.collection.immutable.List.map(List.scala:285)
>>> at org.apache.predictionio.controller.Engine$.train(Engine.scala:690)
>>> at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
>>> at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
>>> at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>> at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
>>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>>
>>> --
>>> -
>>> Best,
>>> Anuj Kumar
>>>
>>>
>>
>> --
>> -
>> Best,
>> Anuj Kumar
>>
>
>
> --
> -
> Best,
> Anuj Kumar
>
>
--
-
Best,
Anuj Kumar