OK but that doesn’t make cross-validation of item-based recs worth the effort.
Hold-out tests assume you have queries and conversions to test against. Here the query contains only an item (for similarity), but you have no item-based conversions. You might be able to hold out users, since you have their conversions, and then test every item they viewed to see whether any of the recommended similar items were converted on in the hold-out set. Again, though, this is of dubious benefit.

On May 1, 2017, at 11:10 AM, Dennis Honders wrote:

Yes, I already read that, but I am currently not able to set up an environment to do that.

On May 1, 2017, at 7:52 PM, "Pat Ferrel" wrote:

Cross-validation for item-based recs is problematic and of dubious value. I’d A/B test changes by starting from the defaults and going from there.

On May 1, 2017, at 8:34 AM, Dennis Honders wrote:

Hi,

I'm currently working on an Evaluator for the Similar product template. I'm not a Scala expert. I followed the Classification Quickstart, which is used for the Evaluator tutorial:
http://predictionio.incubator.apache.org/evaluation/paramtuning/

There, an RDD of LabeledPoint is used to retrieve data:

    val labeledPoints: RDD[LabeledPoint] = eventsDb.aggregateProperties(...

The Similar product template retrieves data like this:

    val usersRDD: RDD[(String, User)] = PEventStore.aggregateProperties(...
    val itemsRDD: RDD[(String, Item)] = PEventStore.aggregateProperties(
    val viewEventsRDD: RDD[ViewEvent] = PEventStore.find(

According to the docs, the above should be the same?
    }.cache()
    // End of reading from data store

    // K-fold splitting
    val evalK = dsp.evalK.get
    val indexedPointsUsers: RDD[((String, User), Long)] = usersRDD.zipWithIndex()
    val indexedPointsItems: RDD[((String, Item), Long)] = itemsRDD.zipWithIndex()
    val indexedPointsView: RDD[(ViewEvent, Long)] = viewEventsRDD.zipWithIndex()

    (0 until evalK).map { idx =>
      val trainingPointsUsers = indexedPointsUsers.filter(_._2 % evalK != idx).map(_._1)
      val testingPointsUsers = indexedPointsUsers.filter(_._2 % evalK == idx).map(_._1)
      val trainingPointsItems = indexedPointsItems.filter(_._2 % evalK != idx).map(_._1)
      val testingPointsItems = indexedPointsItems.filter(_._2 % evalK == idx).map(_._1)
      val trainingPointsView = indexedPointsView.filter(_._2 % evalK != idx).map(_._1)
      val testingPointsView = indexedPointsView.filter(_._2 % evalK == idx).map(_._1)

      (
        new TrainingData(trainingPointsUsers, trainingPointsItems, trainingPointsView),
        new EmptyEvaluationInfo(),
        testingPointsUsers.map { p =>
          (new Query(p.features(0), p.features(1), p.features(2)), new ActualResult(p.label))
        }
      )
    }

    class TrainingData(
      val users: RDD[(String, User)],
      val items: RDD[(String, Item)],
      val viewEvents: RDD[ViewEvent]
    ) extends Serializable {
      override def toString = {
        s"users: [${users.count()} (${users.take(2).toList}...)]" +
        s"items: [${items.count()} (${items.take(2).toList}...)]" +
        s"viewEvents: [${viewEvents.count()}] (${viewEvents.take(2).toList}...)"
      }
    }

What happens in this part? The red marks correspond to 'Cannot resolve symbol *':

    testingPointsUsers.map { p =>
      (new Query(p.features(0), p.features(1), p.features(2)), new ActualResult(p.label))
    }

Thanks in advance,
Dennis
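
For reference, the user hold-out idea from the top of this thread (hold out some users, query with every item they viewed, and check whether any recommended similar item also appears in their held-out views) might look roughly like the sketch below. It uses plain Scala collections instead of Spark RDDs so it runs on its own. Everything in it is an assumption made for illustration: `HoldOutSketch`, `ViewEvent`, `userFolds`, `coViewModel`, and `hitRate` are hypothetical names, not the template's or PredictionIO's classes, the co-view counter only stands in for the real trained model, and a held-out view is treated as the "conversion".

```scala
// Hypothetical, simplified stand-ins; none of these are PredictionIO types.
object HoldOutSketch {
  final case class ViewEvent(user: String, item: String)

  // Stand-in for a trained model: item id => ranked similar item ids.
  type Similar = String => Seq[String]

  // Assign each distinct user to one of k folds by position in sorted order.
  def userFolds(users: Seq[String], k: Int): Map[String, Int] =
    users.distinct.sorted.zipWithIndex.map { case (u, i) => u -> (i % k) }.toMap

  // Naive co-view "model" (an assumption, not the template's algorithm):
  // items viewed by the same user count as similar, ranked by co-view count.
  def coViewModel(train: Seq[ViewEvent], num: Int): Similar = {
    val itemsByUser: Map[String, Set[String]] =
      train.groupBy(_.user).map { case (u, es) => u -> es.map(_.item).toSet }
    val counts: Map[(String, String), Int] =
      itemsByUser.values.toSeq
        .flatMap(items => for (a <- items.toSeq; b <- items.toSeq if a != b) yield (a, b))
        .groupBy(identity)
        .map { case (pair, ps) => pair -> ps.size }
    item =>
      counts.toSeq
        .collect { case ((a, b), c) if a == item => (b, c) }
        .sortBy { case (b, c) => (-c, b) }
        .take(num)
        .map(_._1)
  }

  // Hold out the users in `fold`, train on everyone else, then issue one query
  // per (held-out user, viewed item). A query is a hit if any recommended item
  // is also among that user's held-out views. Returns hits / queries.
  def hitRate(events: Seq[ViewEvent], k: Int, fold: Int, num: Int): Double = {
    val folds = userFolds(events.map(_.user), k)
    val (test, train) = events.partition(e => folds(e.user) == fold)
    val model = coViewModel(train, num)
    val hits: Seq[Boolean] = test.groupBy(_.user).toSeq.flatMap { case (_, es) =>
      val viewed = es.map(_.item).toSet
      viewed.toSeq.map(item => model(item).exists(viewed.contains))
    }
    if (hits.isEmpty) 0.0 else hits.count(identity).toDouble / hits.size
  }
}
```

Note that the split is by user rather than by row index, and each test query is built from an item id taken from a held-out view event. Whether the resulting number says anything useful about recommendation quality is exactly the doubt raised above.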
