OK but that doesn’t make cross-validation of item-based recs worth the effort.

Hold-out tests assume you have queries and conversions you can test. The query
here takes only an item (for similarity), but you have no item-based
conversions to check the results against.

You might be able to hold out users since you have their conversions, then test 
every item they viewed to see if any of the similar items recommended were
converted on in the hold-out set, but again this is of dubious benefit.
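
A minimal sketch of that hold-out idea, using plain Scala collections instead
of RDDs so it stands alone. The names View, Conversion, hitRate and
similarItems are hypothetical and not part of the template; similarItems is
only a stand-in for an engine trained on the users that were not held out.

```scala
object HoldOutSketch {
  // Hypothetical event shapes; the real template defines its own ViewEvent.
  final case class View(user: String, item: String)
  final case class Conversion(user: String, item: String)

  /** Fraction of held-out (user, viewed item) pairs for which at least one
    * recommended similar item was converted on by that same user. */
  def hitRate(
      heldOutUsers: Set[String],
      views: Seq[View],
      conversions: Seq[Conversion],
      similarItems: String => Seq[String] // assumption: wraps the trained engine
  ): Double = {
    // Items each held-out user converted on.
    val convertedBy: Map[String, Set[String]] =
      conversions
        .filter(c => heldOutUsers.contains(c.user))
        .groupBy(_.user)
        .map { case (u, cs) => u -> cs.map(_.item).toSet }

    // One test case per distinct (user, viewed item) pair in the hold-out set.
    val testCases = views.filter(v => heldOutUsers.contains(v.user)).distinct

    // A test case is a hit if any recommended item was converted on by the user.
    val hits = testCases.count { v =>
      val converted = convertedBy.getOrElse(v.user, Set.empty[String])
      similarItems(v.item).exists(converted.contains)
    }

    if (testCases.isEmpty) 0.0 else hits.toDouble / testCases.size
  }
}
```

Even with a number like this in hand, it only says whether similar items
happened to overlap with later conversions, which is why I would still lean on
an A/B test.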



On May 1, 2017, at 11:10 AM, Dennis Honders <[email protected]> wrote:

Yes, I already read that. But I am currently not able to set up an environment
to do that.


On May 1, 2017, at 7:52 PM, "Pat Ferrel" <[email protected]> wrote:
Cross validation for item-based recs is problematic and of dubious value. I’d
A/B test changes by starting from the defaults and going from there.


On May 1, 2017, at 8:34 AM, Dennis Honders <[email protected]> wrote:

Hi,

I'm currently working on an Evaluator for the Similar product template. I'm not 
a Scala expert. 

I followed the Classification Quickstart, which is used for the Evaluator
tutorial: http://predictionio.incubator.apache.org/evaluation/paramtuning/

There, an RDD of LabeledPoint is used to retrieve the data:
val labeledPoints: RDD[LabeledPoint] = eventsDb.aggregateProperties(...
The Similar Product template retrieves its data like this:
val usersRDD: RDD[(String, User)] = PEventStore.aggregateProperties(...
val itemsRDD: RDD[(String, Item)] = PEventStore.aggregateProperties(
val viewEventsRDD: RDD[ViewEvent] = PEventStore.find(
According to the docs, the two approaches should be equivalent?

}.cache()
    // End of reading from data store

    // K-fold splitting
    val evalK = dsp.evalK.get
    val indexedPointsUsers: RDD[((String, User), Long)] = usersRDD.zipWithIndex()
    val indexedPointsItems: RDD[((String, Item), Long)] = itemsRDD.zipWithIndex()
    val indexedPointsView: RDD[(ViewEvent, Long)] = viewEventsRDD.zipWithIndex()

    (0 until evalK).map { idx =>
      val trainingPointsUsers = indexedPointsUsers.filter(_._2 % evalK != idx).map(_._1)
      val testingPointsUsers = indexedPointsUsers.filter(_._2 % evalK == idx).map(_._1)

      val trainingPointsItems = indexedPointsItems.filter(_._2 % evalK != idx).map(_._1)
      val testingPointsItems = indexedPointsItems.filter(_._2 % evalK == idx).map(_._1)

      val trainingPointsView = indexedPointsView.filter(_._2 % evalK != idx).map(_._1)
      val testingPointsView = indexedPointsView.filter(_._2 % evalK == idx).map(_._1)

      (
        new TrainingData(trainingPointsUsers, trainingPointsItems, trainingPointsView),
        new EmptyEvaluationInfo(),
        testingPointsUsers.map {
          p => (new Query(p.features(0), p.features(1), p.features(2)), new ActualResult(p.label))
        }
      )
    }

class TrainingData(
  val users: RDD[(String, User)],
  val items: RDD[(String, Item)],
  val viewEvents: RDD[ViewEvent]
) extends Serializable {
  override def toString = {
      s"users: [${users.count()} (${users.take(2).toList}...)]" +
      s"items: [${items.count()} (${items.take(2).toList}...)]" +
      s"viewEvents: [${viewEvents.count()}] (${viewEvents.take(2).toList}...)"
  }
}

What happens in this part? The red marks correspond to 'Cannot resolve
symbol *':
 testingPointsUsers.map {
          p => (new Query(p.features(0), p.features(1), p.features(2)), new ActualResult(p.label))
        }



Thanks in advance, 

Dennis

