Shem,

What Steven says is very much correct. I have used web logs for recommendations several times, with very good results.

I would add to Steven's comment about interpreting user actions that you really need to think about which action indicates user interest. It is common to use clicks for this, but clicks are often a poor signal. It is better to use something more than a quick impulse, something that shows whether the user actually engaged with the item.

It is also very important to keep track of which items users had an opportunity to engage with. This ultimately helps with a lot of problems: it helps you figure out who is a spammer, and it helps you determine the actual level of interest.

I recommend that you group all interactions and impressions by user id or by session id and order them by time. That will let you extract features from the session. One important session feature is how long somebody actually spent on the item you might be recommending. If they went on to another item very quickly, that indicates a lack of engagement.

It is very common that the logs you have initially don't contain the high-quality information you want. For instance, you might have a search engine that you are trying to improve by looking at what people click on. Your log might include the search query and the clicks, but it probably doesn't include the top 20 items from the search, nor an indicator of how long the user spent on the page they clicked through to. The click is motivated by the snippet you show, but the snippet is a very noisy indicator of the content, so the click can be misleading.

The improvements you could make to logs like this would be to log all of the results that the user sees and to put a timer beacon on the first-level clicked page that tells you when the user has spent 20 seconds on that page. If you see search, impression, click and beacon, then you know that the page holds some real interest. You can get started with your initial log contents, but getting augmented data would help much more.

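To make the grouping and the search, impression, click, beacon check concrete, here is a rough Python sketch. The Event record, its field names, and the event kinds are my own assumptions about what a cleaned-up log might look like, not a format any particular server produces, and the beacon is assumed to fire after 20 seconds on the clicked page as described above.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Event(NamedTuple):
    # Hypothetical log record; adapt the fields to whatever your logs contain.
    session_id: str
    timestamp: float         # seconds
    kind: str                # "search", "impression", "click" or "beacon"
    item_id: Optional[str]   # None for search events


def group_by_session(events):
    """Group events by session id and order each session by time."""
    sessions = defaultdict(list)
    for e in events:
        sessions[e.session_id].append(e)
    for session in sessions.values():
        session.sort(key=lambda e: e.timestamp)
    return dict(sessions)


def dwell_times(session):
    """Seconds from each click to the next click or search in the session.

    The last click has no following navigation event, so its dwell time is
    unknown here; that gap is what the timer beacon is meant to fill.
    """
    nav = [e for e in session if e.kind in ("click", "search")]
    dwell = {}
    for cur, nxt in zip(nav, nav[1:]):
        if cur.kind == "click":
            dwell[cur.item_id] = nxt.timestamp - cur.timestamp
    return dwell


def engaged_items(session):
    """Items that were shown, then clicked, then beaconed, after a search."""
    seen_search = False
    shown, clicked, engaged = set(), set(), set()
    for e in session:
        if e.kind == "search":
            seen_search = True
        elif not seen_search:
            continue
        elif e.kind == "impression":
            shown.add(e.item_id)
        elif e.kind == "click" and e.item_id in shown:
            clicked.add(e.item_id)
        elif e.kind == "beacon" and e.item_id in clicked:
            engaged.add(e.item_id)
    return engaged


# Made-up example data, deliberately out of time order.
events = [
    Event("s1", 28.0, "beacon", "b"),      # 20 seconds after the click on "b"
    Event("s1", 0.0, "search", None),
    Event("s1", 0.1, "impression", "a"),
    Event("s1", 0.1, "impression", "b"),
    Event("s1", 5.0, "click", "a"),
    Event("s1", 8.0, "click", "b"),        # user left "a" after 3 seconds
]

sessions = group_by_session(events)
print(dwell_times(sessions["s1"]))    # {'a': 3.0}
print(engaged_items(sessions["s1"]))  # {'b'}
```

In this toy session both items were shown and clicked, but only "b" counts as engaged, because "a" was abandoned after 3 seconds and never sent a beacon. Keeping the impressions also gives you the denominator for an engagement rate per item.
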
As a secondary point, measuring engagement instead of initial interest can also make the recommendation process itself faster, because the data size goes down dramatically.

On Sat, May 7, 2011 at 12:02 PM, Steven Bourke <[email protected]> wrote:

> Hi Shem,
>
> I've tried something similar, and it is indeed more than possible. The real
> problems comes down to how you'll actually interpret user interactions on
> the site. A users session may vary drastically across multiple different
> sessions, also if you are just tracking by IP address you may lose the real
> personalisation aspect. In my case I used a IP, Webpage representation and
> recommended based on the most popular items.
>
> Seems to be sufficient.
>
> 2011/5/7 Danny Leshem <[email protected]>
>
> > (18) Reminds me a bit of Derby Bar... but I don't think it is related, going by an internet search.
> > (15) That is of course Nanuchka on Lilienblum.
> >
> > -----Original Message-----
> > From: Shem Cristobal [mailto:[email protected]]
> > Sent: Saturday, May 07, 2011 15:41
> > To: [email protected]
> > Subject: Anyone Experienced in HTTP Logs as Data Source for Recommendations
> >
> > Dear All, we are hoping to generate a recommendation from HTTP logs of a
> > certain web site. Is this even advisable? What sort of recommendations have
> > you experienced using such HTTP logs? Thanks a lot!
> >
> > Best regards,
> >
> > @shemcristobal
