My plan was NOT to use Lucene to start with, though I see the benefits. This is because I want to experiment with weighting: IDF, no weighting, and a non-log IDF. I also want to experiment with temporal decay of recommendability, and maybe blend in item-similarity-based results in certain cases. Getting the raw recs is important to the experiment, so I'd like to use Mahout cf/taste if possible.
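As a rough illustration of the three weighting variants, here is a minimal Python sketch. The function name `item_weight`, the scheme labels, and the reading of "non-log IDF" as the plain ratio N/df are all assumptions made for illustration; none of this is Mahout API.

```python
import math

def item_weight(num_users, doc_freq, scheme="log_idf"):
    """Weight for one item under one of three hypothetical schemes.

    num_users: total number of users (N)
    doc_freq:  number of users who interacted with the item (df), must be > 0
    """
    if doc_freq <= 0:
        raise ValueError("doc_freq must be positive")
    if scheme == "none":
        # No weighting: every item counts equally.
        return 1.0
    if scheme == "log_idf":
        # Conventional IDF: log(N / df).
        return math.log(num_users / doc_freq)
    if scheme == "raw_idf":
        # Assumed meaning of "non-log IDF": the ratio N / df without the log.
        return num_users / doc_freq
    raise ValueError("unknown scheme: " + scheme)
```

The raw ratio penalizes popular items far more sharply than the log form, which is the kind of difference the experiment would be measuring.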
Therefore my discussion assumed the use of the entire Mahout cf/taste framework, even for retrieving recs. In that light, B'B h_p is just another way of stating the usual approach: train with users and the items they purchased, then get recs for users, so that part of the recs is covered. Since this also supports item-similarity-based queries (no user in the query), I'm covered there too.

As to the B'A h_v part, isn't that just replacing B'B, the self-join matrix that cf/taste would calculate, with the result of B'A? To use cf/taste you would ingest the user and items-viewed data to create h_v for all users, but instead of allowing cf/taste to calculate B'B you would replace it with B'A. Then at query time taste would take a user and return purchase recs by calculating B'A h_v. Isn't this correct? I hope someone comments on this because it is the route I plan to explore.

The downside is that without Lucene I would have two (or more) sets of recs to blend. I can make Lucene return the raw recs fields, but I'm not sure how to run similarity-based queries with Lucene and don't really want to tackle that just yet (keep it simple?). Also, in using cf/taste I don't need to create rows of the combined matrix; I can treat them as independent recommenders, which means I can tune them independently. I still have questions about how to generate the view values of A, but that's another discussion.

Then the question is how to blend B'A h_v with B'B h_p. The range of both of these will be identical, that is, both yield scores over the same set of items.

Each row of [B'A | B'B] corresponds to a document. One field (the view=>purchase indicators) contains a row of B'A and another field (the purchase=>purchase indicators) contains a row of B'B. The query will ultimately contain two fields corresponding to recent views and recent purchases. The search engine will combine the scores from these intelligently without any effort on your part. You can tune how this works, but I haven't ever found that very useful.
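To make the blending question concrete, here is a minimal pure-Python sketch of computing B'B h_p and B'A h_v and combining them with a weighted sum. The toy matrices, the helper names, and the weights `w_p` and `w_v` are hypothetical; the linear blend is just one simple choice and not what cf/taste or Lucene does internally. Raw cooccurrence counts are used with no IDF or other weighting.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Matrix product x * y for lists of rows."""
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]

def matvec(m, v):
    """Matrix-vector product m * v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def blend(btb, bta, h_p, h_v, w_p=0.5, w_v=0.5):
    """Weighted sum of B'B h_p and B'A h_v (w_p, w_v are assumed knobs)."""
    purchase_scores = matvec(btb, h_p)  # purchase=>purchase evidence
    view_scores = matvec(bta, h_v)      # view=>purchase evidence
    return [w_p * p + w_v * v for p, v in zip(purchase_scores, view_scores)]

# Toy data (hypothetical): rows are users.
B = [[1, 0, 1],   # user x purchased-item matrix (3 purchasable items)
     [0, 1, 1],
     [1, 1, 0]]
A = [[1, 1],      # user x viewed-item matrix (2 viewable items)
     [0, 1],
     [1, 0]]

BtB = matmul(transpose(B), B)  # 3 x 3 purchase cooccurrence
BtA = matmul(transpose(B), A)  # 3 x 2 purchase/view cross-occurrence

h_p = [1, 0, 0]  # query user's purchase history
h_v = [0, 1]     # query user's view history

scores = blend(BtB, BtA, h_p, h_v)
```

Both products come out as vectors with one entry per purchasable item, even though A and B have different numbers of columns, which is why the two score lists can be added element-wise. Because the two recommenders stay independent here, each can be tuned separately before the blend.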
