In short -- you're mostly right. There's a tradeoff between fill-and-swap (no service interruption, but needs 2x memory) and a stop-the-world approach (no extra memory, but readers block while the refresh runs).
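
To make the tradeoff concrete, here is a rough Java sketch of the two refresh styles. The class names (FillAndSwapStore, StopTheWorldStore) and the filler callback are made up for illustration; this is not Mahout code, only the general shape of each approach as I understand it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical stand-ins for illustration only; not Mahout classes.

/** Fill-and-swap: build a complete new copy on the side, then publish it. */
final class FillAndSwapStore {
  private final Consumer<Map<Long, Double>> filler;
  private final AtomicReference<Map<Long, Double>> current = new AtomicReference<>();

  FillAndSwapStore(Consumer<Map<Long, Double>> filler) {
    this.filler = filler;
    refresh();
  }

  Double get(long key) {
    // Readers never wait; they see either the old copy or the new one.
    return current.get().get(key);
  }

  void refresh() {
    Map<Long, Double> fresh = new HashMap<>();
    filler.accept(fresh);   // old and new copies coexist here: roughly 2x peak memory
    current.set(fresh);     // atomic swap; the old copy becomes garbage
  }
}

/** Stop-the-world: rebuild in place under a write lock, so readers block. */
final class StopTheWorldStore {
  private final Consumer<Map<Long, Double>> filler;
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, Double> data = new HashMap<>();

  StopTheWorldStore(Consumer<Map<Long, Double>> filler) {
    this.filler = filler;
    refresh();
  }

  Double get(long key) {
    lock.readLock().lock();   // waits for as long as a refresh holds the write lock
    try {
      return data.get(key);
    } finally {
      lock.readLock().unlock();
    }
  }

  void refresh() {
    lock.writeLock().lock();  // excludes all readers for the whole rebuild
    try {
      data.clear();           // old data is dropped first, so only one copy is live
      filler.accept(data);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Both classes return the same answers once a refresh has finished. The difference is only in what happens during refresh(): the first pays in memory, the second pays in reader latency.
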
FileDataModel usually does an incremental update, but it will fill-and-swap, as you call it, when the main file is updated. SlopeOneRecommender does a stop-the-world refresh. If you use both, I can see getting into a worst-of-both-worlds situation.

I think the way forward is to edit SlopeOneRecommender. It is the less mature bit of code, and stop-the-world semantics aren't so good. I think it would be complex to implement anything but fill-and-swap. But that doubles peak memory requirements, for a component that is definitely memory bound.

I don't think there's any other way to do a proper refresh. I can think of ways to do a refresh in place that are not 100% accurate (reload, but don't throw out the old data at all). Maybe that's reasonable -- I haven't thought it through much.

Any comments so far?

On Fri, Nov 19, 2010 at 8:18 PM, Jordan, Eric <[email protected]> wrote:
> Hi,
>
> We are developing a system that issues recommendations in real-time based
> on data from a main data file (say, /tmp/data.lst) together with daily
> update files (/tmp/data.1.lst, /tmp/data.2.lst, etc.) We call refresh() on
> the SlopeOne recommender when the daily files are updated. We are concerned
> about the performance while the daily update files are being loaded, and are
> interested in any feedback on what to expect.
>
> I've been looking through the Mahout code to determine whether Mahout can
> make recommendations while the (SlopeOne) recommender is being refreshed.
>
> From what I can tell, the call to refresh() ends up in
> MemoryDiffStorage.buildAverageDiffs(), where the system acquires a write
> lock.
> This would stall any calls to MemoryDiffStorage.getDiffs(), where the
> system acquires a read lock.
> So, it looks to me like the MemoryDiffStorage is taking a locking-based
> approach, rather than a fill-and-swap approach.
>
> On the other hand, FileDataModel has a reload() method with:
>     delegate = buildModel()
> Which looks like a fill-and-swap based approach that would allow the system
> to seamlessly continue to serve recommendations even while the model is
> being refreshed.
>
> Is this correct? If so, should we be concerned about the locking of the
> MemoryDiffStorage? Are there any workarounds?
>
> Thanks in advance!
>
> Regards,
>
> Eric
