I have a model deployed for this app; it works if I keep only (facet, search) as event types. When I ask the deployed model for a prediction, I get an answer that is consistent with my data (about cars). I also checked that the data is sent with the right access key and ends up in the right ES index, so that part seems fine.
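For reference, this is roughly how I query the deployed model (a minimal sketch; it assumes the default deploy port 8000, and the host and user id below are placeholders rather than my real values):

import json
import urllib.request

# Sketch of a prediction query against the deployed UR engine.
# Assumes the default deploy port (8000); host and user id are placeholders.
query = {"user": "92ec6a38-9fee-4c99-92a5-46677ad9ca48", "num": 5}
req = urllib.request.Request(
    "http://localhost:8000/queries.json",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # e.g. {"itemScores": [{"item": "...", "score": ...}, ...]}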
I read the code of the integration test and I am confident my setup works, as I already checked pio status, created a new app, inserted some events into it, and trained and deployed a model. Sorry, I forgot to specify the version: it is UR v0.5.0.

2017-06-07 11:29 GMT+02:00 Vaghawan Ojha <[email protected]>:

Also, what version of UR are you on? Is it the latest one? I've only worked with UR 0.5.0.

On Wed, Jun 7, 2017 at 3:12 PM, Vaghawan Ojha <[email protected]> wrote:

Yes, you need to build the app again when you change something in engine.json, that is, every time you change something in engine.json.

Make sure the data corresponds to the same app that you have provided in engine.json.

Yes, you can test the UR installation with the ./examples/integration-test command.

You can find more here: http://actionml.com/docs/ur_quickstart

On Wed, Jun 7, 2017 at 3:07 PM, Bruno LEBON <[email protected]> wrote:

Yes, the three event types that I defined in engine.json exist in my dataset. facet is my primary event and I checked that it exists.

I think it is not necessary to build again when changing something in engine.json, since the file is read during the process, but I built it again and tried anyway and I still get the same error.

What is this examples/integration-test? I don't know about it. Where can I find this script?

2017-06-07 11:11 GMT+02:00 Vaghawan Ojha <[email protected]>:

Hi,

For me this problem happened when I had mistaken my primary event. The first eventName in the array "eventNames": ["facet","view","search"] is the primary one; make sure that event is present in your data.

Did you make sure you built the app again when you changed the eventNames in engine.json?

Also, you could verify that everything is fine with UR by running ./examples/integration-test.

Thanks

On Wed, Jun 7, 2017 at 2:49 PM, Bruno LEBON <[email protected]> wrote:

Thanks for your answer.

> You could explicitly do
> pio train -- --master spark://localhost:7077 --driver-memory 16G --executor-memory 24G
> and change the Spark master URL and the memory configuration, and see if that works.

Yes, that is the command I use to launch the train, except that I am on a cluster, so Spark is not local. Here is mine:

pio train -- --master spark://master:7077 --driver-memory 4g --executor-memory 10g

The train works with different datasets, and it also works with this dataset when I skip the event type view. So my guess is that there is something about this event type: either in the data, although the data looks fine to me, or there is a problem when I use more than two event types (this is the first time I have more than two, although I can't believe the problem is related to the number of event types).

The spelling is the same in the events sent to the event server (view) and in engine.json (view).

I am reading the code to figure out where this error comes from.

2017-06-07 10:17 GMT+02:00 Vaghawan Ojha <[email protected]>:

You could explicitly do

pio train -- --master spark://localhost:7077 --driver-memory 16G --executor-memory 24G

and change the Spark master URL and the memory configuration, and see if that works.

Thanks
On Wed, Jun 7, 2017 at 1:55 PM, Bruno LEBON <[email protected]> wrote:

Hi,

Using UR with PIO 0.10 I am trying to train my dataset. In return I get the following error:

...
[INFO] [DataSource] Received events List(facet, view, search)
[INFO] [DataSource] Number of events List(5, 4, 6)
[INFO] [Engine$] org.template.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] org.template.PreparedData does not support data sanity check. Skipping check.
[INFO] [URAlgorithm] Actions read now creating correlators
[WARN] [TaskSetManager] Lost task 0.0 in stage 56.0 (TID 50, ip-172-31-40-139.eu-west-1.compute.internal): java.lang.NegativeArraySizeException
    at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)
    at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:73)
    at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:72)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

[ERROR] [TaskSetManager] Task 0 in stage 56.0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 56.0 failed 4 times, most recent failure: Lost task 0.3 in stage 56.0 (TID 56, ip-172-1-1-1.eu-west-1.compute.internal): java.lang.NegativeArraySizeException
    at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)
    at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:73)
    at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:72)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
    at org.apache.mahout.sparkbindings.SparkEngine$.numNonZeroElementsPerColumn(SparkEngine.scala:81)
    at org.apache.mahout.math.drm.CheckpointedOps.numNonZeroElementsPerColumn(CheckpointedOps.scala:36)
    at org.apache.mahout.math.cf.SimilarityAnalysis$.sampleDownAndBinarize(SimilarityAnalysis.scala:397)
    at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$cooccurrences$1.apply(SimilarityAnalysis.scala:101)
    at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$cooccurrences$1.apply(SimilarityAnalysis.scala:95)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrences(SimilarityAnalysis.scala:95)
    at org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrencesIDSs(SimilarityAnalysis.scala:147)
    at org.template.URAlgorithm.calcAll(URAlgorithm.scala:280)
    at org.template.URAlgorithm.train(URAlgorithm.scala:251)
    at org.template.URAlgorithm.train(URAlgorithm.scala:169)
    at org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)
    at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
    at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.predictionio.controller.Engine$.train(Engine.scala:692)
    at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NegativeArraySizeException
    at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)
    at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:73)
    at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:72)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Usually this NegativeArraySizeException tells me that one of the events defined in engine.json does not exist in my dataset. However, that is not the case here: all three events are present in my dataset. Here is the proof:

http://x.x.x.x:7070/events.json?accessKey=df8ef7dd-0165-4b6f-a008-d1550adbb3df&startTime=2017-06-2T0:0:00.321Z&limit=1&event=facet

[{"eventId":"AYDE4TYMjU2dFGWVAYyUYwAAAVx5_afdpSyQHw_eNT0","event":"facet","entityType":"user","entityId":"92ec6a38-9fee-4c99-92a5-46677ad9ca48","targetEntityType":"item","targetEntityId":"alfa-romeo-marque","properties":{},"eventTime":"2017-06-05T20:41:25.725Z","creationTime":"2017-06-05T20:41:25.725Z"}]

http://x.x.x.x:7070/events.json?accessKey=df8ef7dd-0165-4b6f-a008-d1550adbb3df&startTime=2017-06-2T0:0:00.321Z&limit=1&event=view

[{"eventId":"IjuMNR7h40l_sylo-uqEsAAAAVxoIcPqnumP2B_qWAk","event":"view","entityType":"user","entityId":"bbc5bd25-b1ac-41e0-b771-43fe65a8827e","targetEntityType":"item","targetEntityId":"citroen-marque","properties":{},"eventTime":"2017-06-02T09:27:42.314Z","creationTime":"2017-06-02T09:27:42.314Z"}]

http://x.x.x.x:7070/events.json?accessKey=df8ef7dd-0165-4b6f-a008-d1550adbb3df&startTime=2017-06-2T0:0:00.321Z&limit=1&event=search

[{"eventId":"AI6NF05NJa3fP2bRpKUxAwAAAVxymnYYjm6nNt3TsGY","event":"search","entityType":"user","entityId":"b2c77901-0824-4583-9999-3cd56c1f34c9","targetEntityType":"item","targetEntityId":"peugeot-marque","properties":{},"eventTime":"2017-06-04T10:15:44.408Z","creationTime":"2017-06-04T10:15:44.408Z"}]

I selected only one event per type here, but there are more.

If I keep only the event types facet and search, it works: the train succeeds and I get my model. However, as soon as I add the event type view, it fails. I tried putting view as the primary event and it doesn't change anything. I am not sure why it would, but I tried anyway.
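To double-check that each configured event type really has data behind it, I count what the event server returns with a small script along these lines (a rough sketch; the host and access key are the ones above, and the limit is an arbitrary placeholder, so it only counts up to that many events per type):

import json
import urllib.request

# Rough sanity check: ask the EventServer for events of each configured
# event type and print how many come back.
EVENTSERVER = "http://x.x.x.x:7070/events.json"
ACCESS_KEY = "df8ef7dd-0165-4b6f-a008-d1550adbb3df"

for name in ["facet", "view", "search"]:
    url = (EVENTSERVER
           + "?accessKey=" + ACCESS_KEY
           + "&event=" + name
           + "&limit=500")
    with urllib.request.urlopen(url) as resp:
        events = json.loads(resp.read())
    print(name, "->", len(events), "event(s) returned")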
Here is my engine.json:

{
  "comment": "",
  "id": "car",
  "description": "settings",
  "engineFactory": "org.template.RecommendationEngine",
  "datasource": {
    "params": {
      "name": "sample-handmade-data.txt",
      "appName": "piourcar",
      "eventNames": ["facet", "view", "search"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true",
    "es.nodes": "espionode1:9200,espionode2:9200,espionode3:9200"
  },
  "algorithms": [
    {
      "name": "ur",
      "params": {
        "appName": "piourcar",
        "indexName": "urindex_car",
        "typeName": "items",
        "eventNames": ["facet", "view", "search"],
        "blacklistEvents": [],
        "maxEventsPerEventType": 50000,
        "maxCorrelatorsPerEventType": 100,
        "maxQueryEvents": 10,
        "num": 5,
        "userBias": 2,
        "returnSelf": true
      }
    }
  ]
}

Thanks in advance for your help, regards,
Bruno
