I agree with your reading of what the Herlocker paper is saying. The paper is focused on producing one estimated rating, not recommendations. While those tasks are related -- recommendations are those with the highest estimated ratings -- translating what's in Herlocker directly to a recommendation algorithm is a significant jump.
Done that way, you start with all items as the set of candidate recommendations and, for each one, construct a neighborhood to estimate a rating. Even with intelligent caching of user-user similarity, which the framework does, this is orders of magnitude slower than building a single neighborhood. It's possible, but I don't think it's realistic in practice. Instead, I had always assumed the extension to an actual algorithm was to let one neighborhood define the set of candidate items.
The issue isn't quite coverage, I think. If a user has no similarity to any other user, there can be no neighborhood under any approach, and so no recommendations. If there is any neighborhood, recommendations can be made. The issue here, it seems, is whether some particular item gets included in the recommendations when it is included in *some* neighborhood but not in all neighborhoods.
You give a good example where an item that, intuitively, should be recommended is not a candidate for recommendation. I think there are just as many examples of this idea going wrong. Say that the most similar users all have a similarity near -1. Under a simple threshold-based neighborhood approach, no recommendations would be made, although there is indeed *some* neighborhood, made up of those dissimilar users, from which recommendations could be made. But those are unlikely to be good recommendations. This is why I believe it's not, in general, a good idea to construct, for each item, *some* neighborhood that includes that item and predict from there. I can't say I've tested that claim, though.
But what the Herlocker paper suggests, and I agree with, is that using threshold-based definitions of neighborhoods is a good idea. And then I think the practical difference between constructing one neighborhood and getting candidate items from there, versus constructing a neighborhood for every item, is probably small. Again, I haven't tested that claim directly.
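To make the two strategies concrete, here is a minimal Python sketch, not the framework's code. Everything in it is an assumption made for illustration: the toy ratings, the function names, the use of Pearson correlation, skipping neighbors with non-positive similarity in the weighted average, and the use of a nearest-n neighborhood rather than a threshold-based one (chosen only so that the difference between the two strategies shows up on four users).

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not from any real data set.
RATINGS = {
    "u": {"a": 5.0, "b": 3.0, "c": 4.0},
    "v": {"a": 5.0, "b": 3.0, "c": 4.0, "d": 5.0},
    "w": {"a": 5.0, "b": 2.0, "c": 4.0, "e": 4.0},
    "x": {"a": 4.0, "b": 3.0, "c": 5.0, "f": 5.0},
}


def pearson(r1, r2):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = sorted(set(r1) & set(r2))
    if len(common) < 2:
        return 0.0
    m1 = sum(r1[i] for i in common) / len(common)
    m2 = sum(r2[i] for i in common) / len(common)
    num = sum((r1[i] - m1) * (r2[i] - m2) for i in common)
    den = sqrt(sum((r1[i] - m1) ** 2 for i in common)) * sqrt(
        sum((r2[i] - m2) ** 2 for i in common)
    )
    return num / den if den else 0.0


def nearest(user, n, pool):
    """The n users in `pool` most similar to `user`, as (similarity, user) pairs."""
    scored = [(pearson(RATINGS[user], RATINGS[o]), o) for o in pool if o != user]
    return sorted(scored, reverse=True)[:n]


def estimate(item, neighbors):
    """Similarity-weighted mean rating; neighbors with similarity <= 0 are skipped."""
    pairs = [(s, RATINGS[o][item]) for s, o in neighbors if s > 0 and item in RATINGS[o]]
    if not pairs:
        return None
    return sum(s * r for s, r in pairs) / sum(s for s, _ in pairs)


def one_neighborhood(user, n):
    """Build one neighborhood; only items its members rated become candidates."""
    hood = nearest(user, n, RATINGS)
    candidates = {i for _, o in hood for i in RATINGS[o]} - set(RATINGS[user])
    estimates = {i: estimate(i, hood) for i in candidates}
    return {i: e for i, e in estimates.items() if e is not None}


def neighborhood_per_item(user, n):
    """Every unrated item is a candidate and gets its own neighborhood of raters."""
    candidates = {i for r in RATINGS.values() for i in r} - set(RATINGS[user])
    estimates = {}
    for item in candidates:
        raters = [o for o in RATINGS if item in RATINGS[o]]
        est = estimate(item, nearest(user, n, raters))
        if est is not None:
            estimates[item] = est
    return estimates


print(sorted(one_neighborhood("u", 2)))       # ['d', 'e']
print(sorted(neighborhood_per_item("u", 2)))  # ['d', 'e', 'f']
```

In this toy data, item f only becomes a candidate under the per-item strategy, and its estimate rests entirely on user x, whose similarity to u is about 0.5, well below that of the two nearest neighbors. The per-item version also builds one neighborhood for every unrated item instead of one in total, which is where the extra cost comes from.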
That's why I think the current implementation is OK, or at least innocent until proven guilty, and why I believe it is also the canonical approach.
