Hmm, I looked at the Kaggle description and there is no record of the queries except for the search terms. So we do not know the results of each search, only the SKUs that were clicked. Unless I missed something, this is a problem for my use.
However, the search terms are known, so Ted's use case might work. You can infer the search from the data, just not all of the search results.

On Apr 16, 2013, at 1:24 PM, Pat Ferrel wrote:

I think Ted is talking about a different application of this idea: http://www.slideshare.net/tdunning/search-as-recommendation

The IDs in my case must be in the same space; at the very least the user IDs need to match. In retail apps, views are a superset of purchases: all purchases are viewed, but not the other way around. So we need data in which every clicked item was seen in a search result but not every seen item was clicked. If we have user IDs for all item IDs (either clicked or shown in a search result), we are OK. The data does get split, but by action, and the user ID space is the same:

Set 1: userID, skuID, 1 for the clicked action
Set 2: userID, skuID, 1 for each search-result item viewed (one preference for each item on the results page, with one user ID for all items seen)

The technique is interesting because it's flexible and in effect learns multi-hop inferences.

On Apr 16, 2013, at 11:56 AM, Nick Kolegraff wrote:

Yep, product images are there as well. Have a go here: https://bbyopen.com/documentation/products-api/product-attributes#TableImages

What Ted said (if I understand correctly). You could create two datasets from the one:

A dataset with: userID,skuID,1
Another with: searchID,skuID,1

Timestamps are there too if you want to get clever with preferences rather than binary. (I'd scrub the search terms before mapping them to IDs, too.)

I also have an image with all this data packaged on AWS inside a Postgres database if you want to fart around with it. In public images just do a search for "ACM hackathon" and you should see it. Feel free to ping me off list with specific questions on that.

On Tue, Apr 16, 2013 at 10:29 AM, Ted Dunning wrote:

> Primary action can be emitting a search term.
> Secondary can be click to view.
>
> On Tue, Apr 16, 2013 at 4:53 PM, Pat Ferrel wrote:
>
>> For the cross-recommender we need some replacement for a primary action (purchases) and a secondary action (views, clicks, impressions, something).
>>
>> To use this data we would treat clicks like a purchase, the primary action we want to recommend. Then the search-result item impressions (SRIIs) are like views in the x-recommender description. In this case an SRII is an item seen on a search results page. Each SRII would come with a user ID, item ID, and implicit preference. Clicks also come with a user ID, item ID, and implicit preference.
>>
>> The cross-recommender would have the effect of finding click recommendations from search-result item impressions. At the very least this seems like a way to use clicks to re-rank search results.
>>
>> Is this good enough for testing the x-recommender algo? Do we have SRIIs with item ID and user ID? Maybe there are product page URLs we can use as item IDs? I'll look, thanks.
>>
>> On Apr 15, 2013, at 5:52 PM, Nick Kolegraff wrote:
>>
>> Hey Guys,
>> This is a dataset that kinda fits the bill, sorta, probably the closest thing out there. I got this extracted from BestBuy. Now, while it is more focused on 'search' as opposed to recommendations, it could probably double for a recs problem.
>>
>> Basically, each userid is mapped to a query that resulted in a click on a particular sku (product_id). They are the real skus as well, so they can map back to real products in their products API (this data is also provided in bulk on Kaggle):
>>
>> https://bbyopen.com/api-profiles/products-api
>> http://www.kaggle.com/c/acm-sf-chapter-hackathon-big/data
>>
>> On Mon, Apr 15, 2013 at 2:03 PM, Pat Ferrel wrote:
>>
>>> MAJOR may be too tame a word.
>>>
>>> Furthermore, there are several enhancements the community could make to support retail data and retail recommenders. For one thing, without public data a *public* cross-recommender will probably not get built.
>>>
>>> The cross-recommender needs to separate action types and use them in slightly different ways, so it is important to have a data set with users' purchases but also views, add-to-cart events, impressions, and purchases in groups (shopping carts), or whatever events are available with anonymized user IDs.
>>>
>>> This data set would be significant in getting new techniques into the community and therefore back to people like you.
>>>
>>> On Apr 15, 2013, at 9:49 AM, Koobas wrote:
>>>
>>> Definitely of MAJOR interest.
>>> I am sure it would also draw all kinds of desired attention to your business.
>>> MovieLens is way too small to be meaningful any more.
>>> Wikipedia articles and Stack Overflow tags are not retail data!
>>> By all means, post some real retail data, if you can.
>>> Meaningful sizes would be appreciated: millions of customers, thousands to tens of thousands of products.
>>>
>>> On Mon, Apr 15, 2013 at 12:27 PM, Robin Morris wrote:
>>>
>>>> I asked management here a while ago whether there would be a problem with releasing an anonymized set of data from one of our retail customers, and didn't get too much push-back. If this is something that would be of major interest, I can ask again and see whether there's something we can put out as a community resource.
>>>>
>>>> Robin
>>>>
>>>> On 4/10/13 8:37 PM, "Pat Ferrel" wrote:
>>>>
>>>>> I have retail data but can't publish results from it. If I could get a public sample I'd share how the technique worked out.
>>>>>
>>>>> Not sure how to simulate this data.
>>>>> It has the important characteristic that every purchase is also a view but not the other way around, and Ted's technique is a way to scrub the views that don't lead to purchases. All these are implicit preferences, but that's not the important part for this technique.
>>>>>
>>>>> On Apr 10, 2013, at 4:15 PM, Koobas wrote:
>>>>>
>>>>> Retail data may be hard to impossible, but one can improvise.
>>>>> It seems to be fairly common to use Wikipedia articles (Myrrix, GraphLab).
>>>>> Another idea is to use Stack Overflow tags (Myrrix examples),
>>>>> although they are only good for emulating implicit feedback.
>>>>>
>>>>> On Wed, Apr 10, 2013 at 6:48 PM, Ted Dunning wrote:
>>>>>
>>>>>> On Wed, Apr 10, 2013 at 10:38 AM, Pat Ferrel wrote:
>>>>>>
>>>>>>> Does anyone know of a public data set that provides things like views and purchases?
>>>>>>
>>>>>> I don't.
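Below is a minimal Python sketch of the Set 1 / Set 2 cross-recommender idea discussed in this thread. Clicks form the primary action and search-result item impressions form the secondary action, and both share one userID space, as Pat requires. The sketch assumes the cross-recommender scores each (primary item, secondary item) pair with Dunning's log-likelihood ratio computed over users. That is one way to "scrub the views that don't lead to purchases." The llr function uses the entropy-based form that Mahout's LogLikelihood class also uses. The names cross_indicators and recommend, along with the toy data, are hypothetical; this is not code from the thread or from Mahout.

```python
import math
from collections import defaultdict
from itertools import product


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy, as in Dunning's LLR formulation.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of user counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp round-off below zero


def by_user(triples):
    """Group (userID, itemID, 1) triples into userID -> set of itemIDs."""
    out = defaultdict(set)
    for user, item, _pref in triples:
        out[user].add(item)
    return out


def cross_indicators(primary, secondary, threshold=0.0):
    """LLR score for each (primary item i, secondary item j) pair.

    primary: Set 1 triples (e.g. clicks); secondary: Set 2 triples
    (e.g. search-result item impressions). Both share one userID space.
    """
    a, b = by_user(primary), by_user(secondary)
    users = set(a) | set(b)
    n = len(users)
    a_count = defaultdict(int)  # users with the primary action on i
    b_count = defaultdict(int)  # users with the secondary action on j
    both = defaultdict(int)     # users with primary on i AND secondary on j
    for u in users:
        for i in a.get(u, ()):
            a_count[i] += 1
        for j in b.get(u, ()):
            b_count[j] += 1
        for i, j in product(a.get(u, ()), b.get(u, ())):
            both[(i, j)] += 1
    scores = {}
    for (i, j), k11 in both.items():
        k12 = a_count[i] - k11      # primary on i, no secondary on j
        k21 = b_count[j] - k11      # secondary on j, no primary on i
        k22 = n - k11 - k12 - k21   # neither
        s = llr(k11, k12, k21, k22)
        if s > threshold:
            scores[(i, j)] = s
    return scores


def recommend(scores, secondary_history, top_n=10):
    """Rank primary items by summing LLR over items in a user's secondary history."""
    totals = defaultdict(float)
    for (i, j), s in scores.items():
        if j in secondary_history:
            totals[i] += s
    return sorted(totals, key=lambda i: (-totals[i], i))[:top_n]


if __name__ == "__main__":
    # Made-up toy data: every clicked SKU was also seen, but not vice versa.
    clicks = [("u1", "A", 1), ("u2", "A", 1), ("u3", "B", 1)]
    impressions = [("u1", "A", 1), ("u1", "X", 1), ("u2", "A", 1), ("u2", "X", 1),
                   ("u3", "B", 1), ("u3", "Y", 1), ("u4", "X", 1), ("u4", "Y", 1)]
    scores = cross_indicators(clicks, impressions)
    print(recommend(scores, {"X"}))  # → ['A']
```

The functions only assume opaque item labels, so the same sketch could cover Ted's variant: pass (userID, searchTermID, 1) triples as the primary input and click triples as the secondary input. Scrubbing the search terms before mapping them to IDs, as Nick suggests, is not shown here.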
