Yes, those instructions tell you to run HDFS in pseudo-cluster mode. What do you see in the HDFS GUI at localhost:50070?
Those setup instructions create a pseudo-clustered Spark, HDFS, and HBase. Everything runs on a single machine but, as the page says, is configured so you can easily expand to a cluster by changing the config to point to remote HDFS or Spark clusters.

If you don't want to run those services in pseudo-cluster mode, two fixes:

1) Remove any mention of PGSQL or JDBC; we are not using them. Those settings are not on the page you linked to and are unused.

2) On a single machine you can put the dummy/empty model file in LOCALFS, so change the lines

PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://localhost:9000/models

to

PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
PIO_STORAGE_SOURCES_HDFS_PATH=/path/to/models

substituting a directory where you want to save models.

Running the services in pseudo-cluster mode gives you GUIs to watch job progress and browse HDFS for files, among other things. We recommend it for helping to debug problems once you get to large amounts of data and begin running out of resources.
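If you do keep the pseudo-cluster, the "could only be replicated to 0 nodes" error quoted later in this thread usually means the namenode is up but no datanode is. A hedged sketch of how you might check, assuming the Hadoop 2.x pseudo-cluster from the single-machine guide with default ports:

```shell
# Are the HDFS daemons actually running? ('jps' ships with the JDK.)
jps | grep -E 'NameNode|DataNode'

# The namenode web GUI mentioned above; Hadoop 2.x serves it on port 50070.
curl -s http://localhost:50070/dfshealth.html | head

# How many live datanodes does the namenode see? "pio train" needs at
# least one before it can write a model to hdfs://localhost:9000/models.
hdfs dfsadmin -report | grep -i 'live datanodes'
```

These are environment-dependent diagnostics, not part of the pio workflow itself; run them on the machine hosting the pseudo-cluster.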
From: Anuj Kumar <anuj.ku...@timesinternet.in>
Date: June 19, 2018 at 10:35:02 AM
To: p...@occamsmachete.com <p...@occamsmachete.com>
Cc: user@predictionio.apache.org, actionml-u...@googlegroups.com
Subject: Re: java.util.NoSuchElementException: head of empty list when running train

Hi Pat,

Read it on the link below: http://actionml.com/docs/single_machine

Here is the pio-env.sh:

SPARK_HOME=$PIO_HOME/vendors/spark-2.1.1-bin-hadoop2.6
POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.0.0.jar
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
HBASE_CONF_DIR=/usr/local/hbase/conf
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/usr/local/els
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=pio
PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://localhost:9000/models
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=/usr/local/hbase

Thanks,
Anuj Kumar

On Tue, Jun 19, 2018 at 9:16 PM Pat Ferrel <p...@occamsmachete.com> wrote:
> Can you show me where on the AML site it says to store models in HDFS? It
> should not say that. I think that may be from the PIO site, so you should
> ignore it.
>
> Can you share your pio-env? You need to go through the whole workflow from
> pio build, pio train, to pio deploy using a template from the same
> directory and with the same engine.json and pio-env, and I suspect
> something is wrong in pio-env.
>
> From: Anuj Kumar <anuj.ku...@timesinternet.in>
> Date: June 19, 2018 at 1:28:11 AM
> To: p...@occamsmachete.com <p...@occamsmachete.com>
> Cc: user@predictionio.apache.org, actionml-u...@googlegroups.com
> Subject: Re: java.util.NoSuchElementException: head of empty list when
> running train
>
> Tried with the basic engine.json mentioned in the UR site examples. It
> seems to work, but got stuck at "pio deploy", which throws the following
> error:
>
> [ERROR] [OneForOneStrategy] Failed to invert: [B@35c7052
>
> Before that, "pio train" was successful but gave the following error. I
> suspect "pio deploy" is not working for this reason. Please help.
>
> [ERROR] [HDFSModels] File /models/pio_modelAWQXIr4APcDlNQi8DwVj could only
> be replicated to 0 nodes instead of minReplication (=1). There are 0
> datanode(s) running and no node(s) are excluded in this operation.
>
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1726)
> at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2565)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:829)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)
>
> On Tue, Jun 19, 2018 at 10:45 AM Anuj Kumar <anuj.ku...@timesinternet.in> wrote:
>
>> Sure, here it is.
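The "replicated to 0 nodes ... 0 datanode(s) running" message above means the namenode accepted the write but had no datanode to store the block on. A hedged sketch of a manual write test to the same path pio uses, assuming the Hadoop 2.x pseudo-cluster from the single-machine docs:

```shell
# Confirm at least one datanode is registered with the namenode.
hdfs dfsadmin -report | grep -i 'live datanodes'

# Try a small end-to-end write to the models path; if this fails with
# the same replication error, fix HDFS before rerunning "pio train".
echo test > /tmp/pio-hdfs-check.txt
hdfs dfs -mkdir -p /models
hdfs dfs -put -f /tmp/pio-hdfs-check.txt /models/
hdfs dfs -ls /models
```

These commands only make sense on the machine running the pseudo-cluster; if no datanode is live, check the datanode log under $HADOOP_HOME/logs.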
>>
>> {
>>   "comment": "This config file uses default settings for all but the required values; see README.md for docs",
>>   "id": "default",
>>   "description": "Default settings",
>>   "engineFactory": "com.actionml.RecommendationEngine",
>>   "datasource": {
>>     "params": {
>>       "name": "sample-handmad",
>>       "appName": "np",
>>       "eventNames": ["read", "search", "view", "category-pref"],
>>       "minEventsPerUser": 1,
>>       "eventWindow": {
>>         "duration": "300 days",
>>         "removeDuplicates": true,
>>         "compressProperties": true
>>       }
>>     }
>>   },
>>   "sparkConf": {
>>     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>>     "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
>>     "spark.kryo.referenceTracking": "false",
>>     "spark.kryoserializer.buffer": "300m",
>>     "spark.executor.memory": "4g",
>>     "spark.executor.cores": "2",
>>     "spark.task.cpus": "2",
>>     "spark.default.parallelism": "16",
>>     "es.index.auto.create": "true"
>>   },
>>   "algorithms": [
>>     {
>>       "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
>>       "name": "ur",
>>       "params": {
>>         "appName": "np",
>>         "indexName": "np",
>>         "typeName": "items",
>>         "blacklistEvents": [],
>>         "comment": "must have data for the first event or the model will not build, other events are optional",
>>         "indicators": [
>>           { "name": "read" },
>>           { "name": "search", "maxCorrelatorsPerItem": 5 },
>>           { "name": "category-pref", "maxCorrelatorsPerItem": 50 },
>>           { "name": "view", "maxCorrelatorsPerItem": 50 }
>>         ],
>>         "expireDateName": "itemExpiry",
>>         "dateName": "date",
>>         "num": 5
>>       }
>>     }
>>   ]
>> }
>>
>> On Mon, Jun 18, 2018 at 8:55 PM Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>>> This sounds like some missing required config in engine.json.
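The "head of empty list" thrown from crossOccurrenceDownsampled below typically fires when an indicator has no events behind it. As an illustrative sanity check (not an official UR validator), a small Python sketch that verifies every indicator named in the algorithm params also appears in the datasource eventNames, one easy mismatch to rule out:

```python
import json

# Trimmed copy of the engine.json above; only the fields the check reads.
ENGINE_JSON = """
{
  "datasource": {
    "params": {
      "appName": "np",
      "eventNames": ["read", "search", "view", "category-pref"]
    }
  },
  "algorithms": [
    {
      "name": "ur",
      "params": {
        "indicators": [
          {"name": "read"},
          {"name": "search"},
          {"name": "category-pref"},
          {"name": "view"}
        ]
      }
    }
  ]
}
"""

def unknown_indicators(engine: dict) -> list:
    """Return indicator names that are not listed in datasource eventNames."""
    event_names = set(engine["datasource"]["params"]["eventNames"])
    indicators = [
        ind["name"]
        for algo in engine["algorithms"]
        for ind in algo["params"].get("indicators", [])
    ]
    return [name for name in indicators if name not in event_names]

print(unknown_indicators(json.loads(ENGINE_JSON)))  # [] means the names line up
```

Even when the names line up, the first indicator ("read" here) must actually have events in the event store, per the comment in the engine.json itself.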
>>> Can you share the file?
>>>
>>> From: Anuj Kumar <anuj.ku...@timesinternet.in>
>>> Reply: user@predictionio.apache.org <user@predictionio.apache.org>
>>> Date: June 18, 2018 at 5:05:22 AM
>>> To: user@predictionio.apache.org <user@predictionio.apache.org>
>>> Subject: java.util.NoSuchElementException: head of empty list when running train
>>>
>>> Getting this while running "pio train". Please help.
>>>
>>> Exception in thread "main" java.util.NoSuchElementException: head of empty list
>>> at scala.collection.immutable.Nil$.head(List.scala:420)
>>> at scala.collection.immutable.Nil$.head(List.scala:417)
>>> at org.apache.mahout.math.cf.SimilarityAnalysis$.crossOccurrenceDownsampled(SimilarityAnalysis.scala:177)
>>> at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:343)
>>> at com.actionml.URAlgorithm.train(URAlgorithm.scala:295)
>>> at com.actionml.URAlgorithm.train(URAlgorithm.scala:180)
>>> at org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)
>>> at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:690)
>>> at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:690)
>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>> at scala.collection.immutable.List.foreach(List.scala:381)
>>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>> at scala.collection.immutable.List.map(List.scala:285)
>>> at org.apache.predictionio.controller.Engine$.train(Engine.scala:690)
>>> at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
>>> at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
>>> at
org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>> at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
>>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>> --
>>> Best,
>>> Anuj Kumar

--
You received this message because you are subscribed to the Google Groups "actionml-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to actionml-user+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/actionml-user/CAN5v0zfsuiGHsqgVdtAgc0t8%3DopRTGg6WE7KPEhhkjfrPvWVeg%40mail.gmail.com.