Hi,

I am using the Universal Recommender (UR) with PredictionIO 0.10 and am
trying to train on my dataset. Training fails with the following error:

...
[INFO] [DataSource] Received events List(facet, view, search)
[INFO] [DataSource] Number of events List(5, 4, 6)
[INFO] [Engine$] org.template.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] org.template.PreparedData does not support data sanity check. Skipping check.
[INFO] [URAlgorithm] Actions read now creating correlators
[WARN] [TaskSetManager] Lost task 0.0 in stage 56.0 (TID 50, ip-172-31-40-139.eu-west-1.compute.internal): java.lang.NegativeArraySizeException
        at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)
        at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:73)
        at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:72)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

[ERROR] [TaskSetManager] Task 0 in stage 56.0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 56.0 failed 4 times, most recent failure: Lost task 0.3 in stage 56.0 (TID 56, ip-172-1-1-1.eu-west-1.compute.internal): java.lang.NegativeArraySizeException
        at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)
        at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:73)
        at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:72)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
        at org.apache.mahout.sparkbindings.SparkEngine$.numNonZeroElementsPerColumn(SparkEngine.scala:81)
        at org.apache.mahout.math.drm.CheckpointedOps.numNonZeroElementsPerColumn(CheckpointedOps.scala:36)
        at org.apache.mahout.math.cf.SimilarityAnalysis$.sampleDownAndBinarize(SimilarityAnalysis.scala:397)
        at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$cooccurrences$1.apply(SimilarityAnalysis.scala:101)
        at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$cooccurrences$1.apply(SimilarityAnalysis.scala:95)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrences(SimilarityAnalysis.scala:95)
        at org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrencesIDSs(SimilarityAnalysis.scala:147)
        at org.template.URAlgorithm.calcAll(URAlgorithm.scala:280)
        at org.template.URAlgorithm.train(URAlgorithm.scala:251)
        at org.template.URAlgorithm.train(URAlgorithm.scala:169)
        at org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)
        at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
        at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.predictionio.controller.Engine$.train(Engine.scala:692)
        at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
        at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
        at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
        at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NegativeArraySizeException
        at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)
        at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:73)
        at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:72)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)


Usually a NegativeArraySizeException tells me that one of the events
defined in engine.json does not exist in my dataset. That is not the
case here: all three events are present. Here is the proof, one sample
query per event type:
http://x.x.x.x:7070/events.json?accessKey=df8ef7dd-0165-4b6f-a008-d1550adbb3df&startTime=2017-06-2T0:0:00.321Z&limit=1&event=facet

[{"eventId":"AYDE4TYMjU2dFGWVAYyUYwAAAVx5_afdpSyQHw_eNT0","event":"facet","entityType":"user","entityId":"92ec6a38-9fee-4c99-92a5-46677ad9ca48","targetEntityType":"item","targetEntityId":"alfa-romeo-marque","properties":{},"eventTime":"2017-06-05T20:41:25.725Z","creationTime":"2017-06-05T20:41:25.725Z"}]

http://x.x.x.x:7070/events.json?accessKey=df8ef7dd-0165-4b6f-a008-d1550adbb3df&startTime=2017-06-2T0:0:00.321Z&limit=1&event=view

[{"eventId":"IjuMNR7h40l_sylo-uqEsAAAAVxoIcPqnumP2B_qWAk","event":"view","entityType":"user","entityId":"bbc5bd25-b1ac-41e0-b771-43fe65a8827e","targetEntityType":"item","targetEntityId":"citroen-marque","properties":{},"eventTime":"2017-06-02T09:27:42.314Z","creationTime":"2017-06-02T09:27:42.314Z"}]

http://x.x.x.x:7070/events.json?accessKey=df8ef7dd-0165-4b6f-a008-d1550adbb3df&startTime=2017-06-2T0:0:00.321Z&limit=1&event=search

[{"eventId":"AI6NF05NJa3fP2bRpKUxAwAAAVxymnYYjm6nNt3TsGY","event":"search","entityType":"user","entityId":"b2c77901-0824-4583-9999-3cd56c1f34c9","targetEntityType":"item","targetEntityId":"peugeot-marque","properties":{},"eventTime":"2017-06-04T10:15:44.408Z","creationTime":"2017-06-04T10:15:44.408Z"}]


These queries return only one event per type (limit=1), but there are more of each in the dataset.
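
In case it is useful, this is roughly how the per-type counts can be
checked programmatically. A quick Python sketch; HOST and ACCESS_KEY
below stand in for my real values, and limit=-1 asks the event server
for all matching events:

    # Sketch: count stored events per type via the PIO event server.
    # HOST and ACCESS_KEY are placeholders; limit=-1 should return all
    # matching events (can be heavy on a large dataset).
    import requests

    HOST = "http://x.x.x.x:7070"
    ACCESS_KEY = "df8ef7dd-0165-4b6f-a008-d1550adbb3df"

    for event in ["facet", "view", "search"]:
        resp = requests.get(
            HOST + "/events.json",
            params={"accessKey": ACCESS_KEY, "event": event, "limit": -1},
        )
        resp.raise_for_status()
        print(event, len(resp.json()))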


If I keep only the event types *facet* and *search*, training succeeds
and I get my model. As soon as I add the event type *view*, it fails.
I also tried making *view* the primary event, which changed nothing;
I was not sure it would, but I tried anyway.
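
Since only *view* breaks the training, I can also sanity-check those
events for malformed rows. A quick Python sketch (same placeholder HOST
and ACCESS_KEY as above) that looks for view events with a missing
entityId or targetEntityId, since a malformed event could conceivably
break the correlator matrix Mahout builds during training:

    # Sketch: flag view events with a missing or empty entityId /
    # targetEntityId. Placeholders as in the previous sketch.
    import requests

    HOST = "http://x.x.x.x:7070"
    ACCESS_KEY = "df8ef7dd-0165-4b6f-a008-d1550adbb3df"

    resp = requests.get(
        HOST + "/events.json",
        params={"accessKey": ACCESS_KEY, "event": "view", "limit": -1},
    )
    resp.raise_for_status()
    suspicious = [e for e in resp.json()
                  if not e.get("entityId") or not e.get("targetEntityId")]
    print("suspicious view events:", len(suspicious))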


Here is my engine.json:

{
  "comment": "",
  "id": "car",
  "description": "settings",
  "engineFactory": "org.template.RecommendationEngine",
  "datasource": {
    "params": {
      "name": "sample-handmade-data.txt",
      "appName": "piourcar",
      "eventNames": ["facet", "view", "search"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true",
    "es.nodes": "espionode1:9200,espionode2:9200,espionode3:9200"
  },
  "algorithms": [
    {
      "name": "ur",
      "params": {
        "appName": "piourcar",
        "indexName": "urindex_car",
        "typeName": "items",
        "eventNames": ["facet", "view", "search"],
        "blacklistEvents": [],
        "maxEventsPerEventType": 50000,
        "maxCorrelatorsPerEventType": 100,
        "maxQueryEvents": 10,
        "num": 5,
        "userBias": 2,
        "returnSelf": true
      }
    }
  ]
}

Thanks in advance for your help. Regards,
Bruno
