I think that this is OK for cross recommendation. We have user x term (the
query history), and we have user x sku (the click history). We don't even
care if the click came from a particular search effort. We merely want to
associate behavior 1 (emitting search terms) to behavior 2 (clicking on
skus). This data should suffice for that.

On Tue, Apr 16, 2013 at 9:11 PM, Nick Kolegraff <[email protected]> wrote:

> You are correct. We only know the skus clicked.
>
> On Tue, Apr 16, 2013 at 1:46 PM, Pat Ferrel <[email protected]> wrote:
>
> > Hmm, I looked at the Kaggle description and there is no record of the
> > queries except for terms. So we do not know the results of the search,
> > only the skus clicked. Unless I missed something this is a problem for
> > my use.
> >
> > However the search terms are known so Ted's use case might work. You can
> > infer the search from the data, just not all search results.
> >
> > On Apr 16, 2013, at 1:24 PM, Pat Ferrel <[email protected]> wrote:
> >
> > I think Ted is talking about a different application of this idea:
> > http://www.slideshare.net/tdunning/search-as-recommendation
> >
> > The IDs in my case must be in the same space, at very least the user IDs
> > need to match. Since in retail apps views are a superset of
> > purchases--all purchases are viewed but not the other way around. So we
> > need data something like all clicked items are seen in a search result
> > but not all seen items are clicked. If we have user IDs for all item
> > IDs--either clicked or shown in a search result--we are ok.
> >
> > The data does get split but by action and the user ID space is the same:
> > Set 1: userID, skuID, 1 for clicked action
> > Set 2: userID, skuID, 1 for search result item viewed (one preference for
> > each item in the results page one user ID for all items seen)
> >
> > The technique is interesting because it's flexible and in effect learns
> > multi-hop inferences.
> >
> > On Apr 16, 2013, at 11:56 AM, Nick Kolegraff <[email protected]>
> > wrote:
> >
> > yep, product images are there as well.
> > have a go here:
> >
> > https://bbyopen.com/documentation/products-api/product-attributes#TableImages
> >
> > What Ted said. (if I understand correctly)
> > You could create two datasets from the one with:
> >
> > A dataset with:
> > userID,skuID,1
> >
> > Another with:
> > searchID,skuID,1
> >
> > Timestamps are there too if you want to get clever with preferences
> > rather than binary.
> > (i'd scrub the search terms before mapping them to IDs too)
> >
> > I also have an image with all this data packaged on AWS inside a postgres
> > database if you wanted to fart around with it.
> > in public images just do a search for "ACM hackathon" and you should see
> > it. Feel free to ping me off list with specific questions on that.
> >
> > On Tue, Apr 16, 2013 at 10:29 AM, Ted Dunning <[email protected]>
> > wrote:
> >
> > > Primary action can be emitting a search term. Secondary can be click to
> > > view.
> > >
> > > On Tue, Apr 16, 2013 at 4:53 PM, Pat Ferrel <[email protected]>
> > > wrote:
> > >
> > >> For the cross-recommender we need some replacement for a primary
> > >> action--purchases and a secondary action--views, clicks, impressions,
> > >> something.
> > >>
> > >> To use this data we would treat clicks like a purchase--the primary
> > >> action we want to recommend. Then the search-result-item-impressions is
> > >> like a view in the x-recommender description. In this case the SRII is
> > >> an item seen on a search results page. Each SRII would come with a user
> > >> ID, itemID, and implicit preference. Clicks also come with userID,
> > >> itemID and implicit preference.
> > >>
> > >> The cross-recommender would have the effect of finding click
> > >> recommendations from search result item impressions. At very least this
> > >> seems like a way to use clicks to re-rank search results.
> > >>
> > >> Is this good enough for testing the x-recommender algo? Do we have
> > >> SRIIs with item ID and user ID?
> > >> Maybe there are product page URLs we can use as item ids? I'll look,
> > >> thanks.
> > >>
> > >> On Apr 15, 2013, at 5:52 PM, Nick Kolegraff <[email protected]>
> > >> wrote:
> > >>
> > >> Hey Guys,
> > >> This is a dataset that kinda fits the bill, sorta -- probably the
> > >> closest thing out there. I got this extracted from BestBuy. Now, while
> > >> it is more focused on 'search' opposed to recommendations...could
> > >> probably double for a recs problem.
> > >>
> > >> basically, each userid is mapped to a query that resulted in a click on
> > >> a particular sku (product_id). They are the real skus as well, so they
> > >> can map back to real products in their products api (this data is also
> > >> provided in bulk on kaggle):
> > >>
> > >> https://bbyopen.com/api-profiles/products-api
> > >> http://www.kaggle.com/c/acm-sf-chapter-hackathon-big/data
> > >>
> > >> On Mon, Apr 15, 2013 at 2:03 PM, Pat Ferrel <[email protected]>
> > >> wrote:
> > >>
> > >>> MAJOR may be too tame a word.
> > >>>
> > >>> Furthermore there are several enhancements the community could make to
> > >>> support retail data and retail recommenders. For one thing without
> > >>> public data a *public* cross-recommender will probably not get built.
> > >>>
> > >>> The cross-recommender needs to separate actions types and use them in
> > >>> slightly different ways so it is important to have a data set with
> > >>> user's purchases but also views, add-to-cart, impressions, purchases
> > >>> in groups (shopping carts)--whatever events are available with
> > >>> anonymized user IDs.
> > >>>
> > >>> This data set would be significant in getting new techniques into the
> > >>> community and therefore back to people like you.
> > >>>
> > >>> On Apr 15, 2013, at 9:49 AM, Koobas <[email protected]> wrote:
> > >>>
> > >>> Definitely of MAJOR interest.
> > >>> I am sure it would also draw all kinds of desired attention to your
> > >>> business.
> > >>> Movie Lens is way too small to be meaningful any more.
> > >>> Wikipedia articles and Stackoverflow tags are not retail data!
> > >>> By all means, post some real retail data, if you can.
> > >>> Meaningful sizes would be appreciated: millions of customers,
> > >>> thousands - tens of thousands products.
> > >>>
> > >>> On Mon, Apr 15, 2013 at 12:27 PM, Robin Morris <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> I asked management here a while ago whether there would be a problem
> > >>>> with releasing an anonymized set of data from one of our retail
> > >>>> customers, and didn't get too much push-back. If this is something
> > >>>> that would be of major interest, I can ask again and see whether
> > >>>> there's something we can put out as a community resource.
> > >>>>
> > >>>> Robin
> > >>>>
> > >>>> On 4/10/13 8:37 PM, "Pat Ferrel" <[email protected]> wrote:
> > >>>>
> > >>>>> I have retail data but can't publish results from it. If I could get
> > >>>>> a public sample I'd share how the technique worked out.
> > >>>>>
> > >>>>> Not sure how to simulate this data. It has the important
> > >>>>> characteristic that every purchase is also a view but not the other
> > >>>>> way around and Ted's technique is a way to scrub the views that
> > >>>>> don't lead to purchases. All these are implicit preferences but
> > >>>>> that's not the important part for this technique.
> > >>>>>
> > >>>>> On Apr 10, 2013, at 4:15 PM, Koobas <[email protected]> wrote:
> > >>>>>
> > >>>>> Retail data may be hard to impossible, but one can improvise.
> > >>>>> It seems to be fairly common to use Wikipedia articles (Myrrix,
> > >>>>> GraphLab).
> > >>>>> Another idea is to use StackOverflow tags (Myrrix examples).
> > >>>>> Although they are only good for emulating implicit feedback.
> > >>>>>
> > >>>>> On Wed, Apr 10, 2013 at 6:48 PM, Ted Dunning <[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> On Wed, Apr 10, 2013 at 10:38 AM, Pat Ferrel <[email protected]>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Does anyone know of a public data set that provides things like
> > >>>>>>> views and purchases?
> > >>>>>>
> > >>>>>> I don't.
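
For anyone who wants to try the split discussed in this thread, here is a
minimal sketch, not a definitive implementation. It assumes each log row
carries a user, a raw query string and a clicked sku; the sample rows, the
scrub() rule and all function names are hypothetical and are not taken from
the Kaggle files. It builds the user x term and user x sku sets as binary
preferences and then counts, per (term, sku) pair, how many users did both.
The raw counts only stand in for whatever scoring a real cross-recommender
would apply.

```python
import re
from collections import defaultdict

# Hypothetical click-log rows: (user, raw query, clicked sku). Values are made up.
ROWS = [
    ("u1", "Xbox 360 ", "sku100"),
    ("u1", "xbox  360", "sku101"),
    ("u2", "iPad case", "sku200"),
    ("u2", "xbox 360", "sku100"),
]


def scrub(query):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", query.lower())
    return " ".join(cleaned.split())


def split_log(rows):
    """Split one log into two binary preference sets sharing the user ID space."""
    term_ids = {}        # scrubbed term -> integer ID
    user_term = set()    # behavior 1: user emitted a search term
    user_sku = set()     # behavior 2: user clicked a sku
    for user, query, sku in rows:
        term_id = term_ids.setdefault(scrub(query), len(term_ids))
        user_term.add((user, term_id, 1))
        user_sku.add((user, sku, 1))
    return user_term, user_sku, term_ids


def cross_cooccurrence(user_term, user_sku):
    """Count, for each (term_id, sku) pair, the users who did both."""
    terms_by_user = defaultdict(set)
    for user, term_id, _ in user_term:
        terms_by_user[user].add(term_id)
    counts = defaultdict(int)
    for user, sku, _ in user_sku:
        for term_id in terms_by_user[user]:
            counts[(term_id, sku)] += 1
    return dict(counts)


user_term, user_sku, term_ids = split_log(ROWS)
counts = cross_cooccurrence(user_term, user_sku)
```

Note that the click does not have to come from the same search as the term;
the two behaviors are only joined through the user ID, which is the point
made at the top of this message.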
