I split the answers in two since the config is a completely separate thing.

Increasing maxCorrelatorsPerEventType is usually the wrong thing to do. It 
makes the model fuzzier, for lack of a better term. In fact we’d like to 
restrict the correlators to only the best ones, and maxCorrelatorsPerEventType 
is a crude way to do this that gets worse the more you allow. A newer method is 
an LLR threshold, which can be set per indicator and uses the correlation value 
as the cutoff for inclusion as a correlator. maxCorrelatorsPerEventType just 
takes the top ones even if their scores are low. This is why making this number 
big will not make things better: it will include more correlators of lower 
quality.
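
A rough sketch of the difference between the two selection rules, in Python. 
This is not the UR's actual code; the function names, item ids, and LLR scores 
are made up purely for illustration.

```python
from typing import Dict, List


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Keep the n highest-scoring correlators, however low their scores are."""
    ranked = sorted(scores, key=lambda item: scores[item], reverse=True)
    return ranked[:n]


def above_threshold(scores: Dict[str, float], min_llr: float) -> List[str]:
    """Keep only correlators whose score reaches the threshold."""
    ranked = sorted(scores, key=lambda item: scores[item], reverse=True)
    return [item for item in ranked if scores[item] >= min_llr]


# Hypothetical LLR scores of candidate correlators for a single item.
llr = {"item-a": 42.0, "item-b": 17.5, "item-c": 1.2, "item-d": 0.3}

print(top_n(llr, 3))              # a top-3 cut also lets in the weak item-c
print(above_threshold(llr, 5.0))  # a threshold of 5.0 keeps only the strong two
```

With a large top-N the weak correlators are always included; with a threshold 
the list simply stays short when few strong correlators exist.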

Also, increasing maxEventsPerEventType increases memory usage and takes far 
longer to calculate the model for very little if any gain. This finding is from 
a paper by Sebastian Schelter, one of the inventors of CCO: 
https://ssc.io/pdf/rec11-schelter.pdf

I’d leave those at their defaults and measure a baseline KPI before doing A/B 
tests or cross-validation to try different numbers there.
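
For the cross-validation side, one possible offline baseline is precision@k on 
held-out events. The choice of metric is an assumption here, not something the 
UR prescribes, and the user ids, items, and helper names below are hypothetical.

```python
from typing import Dict, List, Set


def precision_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the first k recommendations found in the held-out set."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def mean_precision_at_k(recs: Dict[str, List[str]],
                        held_out: Dict[str, Set[str]],
                        k: int) -> float:
    """Average precision@k over users that have at least one held-out item."""
    users = [user for user, items in held_out.items() if items]
    if not users:
        return 0.0
    total = sum(precision_at_k(recs.get(user, []), held_out[user], k)
                for user in users)
    return total / len(users)


# Hypothetical recommendations from a model trained with default settings,
# compared against events that were held out of training.
recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
held_out = {"u1": {"a", "c"}, "u2": {"x"}}

baseline = mean_precision_at_k(recs, held_out, k=3)
print(f"baseline precision@3 = {baseline:.3f}")
```

Record this number for the default config first, then change one parameter at a 
time and compare against it.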


On May 24, 2017, at 8:28 AM, Dennis Honders <[email protected]> wrote:

Current data: 

{"event": "cart-transaction", "entityId": "1", "entityType": "user", 
"targetEntityId": "12", "targetEntityType": "item"}, 

{"event": "$set", "entityType": "item", "entityId": "12", "properties": 
{"category": ["1", "2", "3", "4", "5", "6", "7"], "manufacturer": 1, "label": 
"test", "price": "$1-$2"}}

Questions: 

Cart-transaction is the primary event for shopping cart recommendations. Should 
I use user-buy-item as a secondary event, or is there no link between the two?

Item-based queries are for similar items. For shopping cart recommendations, 
would complementary recommendations suit better? If so, those are made by 
'user-id' (cart-id). How can this be done?

I'd like to do content-based recommendations for items that haven't been in a 
transaction. I think this can be configured in the engine.json. Any advice for 
doing this?

Engine.json: 

{
  "comment":" This config file uses default settings for all but the required values see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "ur-name",
      "appName": "Test",
      "eventNames": ["cart-transaction"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer.mb": "300",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
      "name": "ur",
      "params": {
                "appName": "Test",
                "indexName": "test",
                "typeName": "cart",
                "comment": "must have data for the first event or the model will not build, other events are optional",
                "eventNames": ["cart-transaction"],
                "maxEventsPerEventType": 50000,
                "maxCorrelatorsPerEventType": 5000,
                "num": 10, 
                "itemBias": 2.0,
                "rankings": [{
                        "name": "preferredRank",
                        "type": "userDefined"
                }]
      }
    }
  ]
}

