For the cross-recommender we need stand-ins for a primary action (purchases) and
a secondary action (views, clicks, impressions, or something similar).

To use this data we would treat clicks like purchases, the primary action we
want to recommend. The search result item impressions (SRIIs) then play the
role of views in the x-recommender description. Here an SRII is an item seen
on a search results page. Each SRII would come with a user ID, item ID, and
implicit preference, and so would each click.

The cross-recommender would then find click recommendations from search
result item impressions. At the very least this seems like a way to use
clicks to re-rank search results.
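A minimal Python sketch of that idea, under stated assumptions: it treats clicks as the primary action (A) and SRIIs as the secondary action (B), counts cross-cooccurrences (users who clicked item i and also saw item j, i.e. the A'B product), and keeps only pairs that pass Dunning's log-likelihood ratio test. Using LLR as the scrubbing step is an assumption about the technique, not a description of Mahout's code. The function names, the threshold, and the toy events are all hypothetical. The toy data keeps the property Pat mentions: every click is also an impression.

```python
import math
from collections import defaultdict

def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    # Unnormalized Shannon entropy, as used in the LLR test.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

def cross_indicators(clicks, impressions, threshold=0.0):
    """For each clicked item i, list impressed items j whose co-occurrence
    with i across users scores above `threshold` (hypothetical cutoff)."""
    clicked_by = defaultdict(set)  # primary action A: item -> users who clicked
    seen_by = defaultdict(set)     # secondary action B: item -> users who saw it
    users = set()
    for u, i in clicks:
        clicked_by[i].add(u)
        users.add(u)
    for u, j in impressions:
        seen_by[j].add(u)
        users.add(u)
    n = len(users)
    result = {}
    for i, a_users in clicked_by.items():
        scored = []
        for j, b_users in seen_by.items():
            k11 = len(a_users & b_users)   # clicked i and saw j
            if k11 == 0:
                continue
            k12 = len(a_users) - k11       # clicked i, never saw j
            k21 = len(b_users) - k11       # saw j, never clicked i
            k22 = n - k11 - k12 - k21      # neither
            score = llr(k11, k12, k21, k22)
            if score > threshold:
                scored.append((j, score))
        result[i] = sorted(scored, key=lambda p: -p[1])
    return result

def recommend(indicators, user_impressions, how_many=10):
    """Rank click candidates by how many of the user's impressed items
    appear in each candidate's indicator list."""
    history = set(user_impressions)
    scores = {}
    for item, inds in indicators.items():
        s = sum(1 for j, _ in inds if j in history)
        if s > 0:
            scores[item] = s
    return sorted(scores, key=lambda i: -scores[i])[:how_many]

# Made-up (user, item) events; every click also appears as an impression.
CLICKS = [("u1", "a"), ("u2", "a"), ("u3", "d"), ("u4", "d")]
IMPRESSIONS = [("u1", "a"), ("u1", "b"), ("u1", "c"),
               ("u2", "a"), ("u2", "b"),
               ("u3", "c"), ("u3", "d"),
               ("u4", "c"), ("u4", "d")]

if __name__ == "__main__":
    ind = cross_indicators(CLICKS, IMPRESSIONS, threshold=2.0)
    print(ind)
    print(recommend(ind, ["b"]))
```

With a threshold of 2.0, item "c" drops out of both indicator lists because it was seen by clickers and non-clickers alike, so a user who only saw "b" gets "a" as a click recommendation. The brute-force pairwise loop is only meant for toy data; real data would need a sparse matrix product.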

Is this good enough for testing the x-recommender algo? Do we have SRIIs with
item ID and user ID? Maybe there are product page URLs we can use as item IDs?
I'll look, thanks.


On Apr 15, 2013, at 5:52 PM, Nick Kolegraff <[email protected]> wrote:

Hey guys,
This is a dataset that kinda fits the bill; it's probably the closest thing
out there. I got it extracted from BestBuy. While it is more focused on
search than on recommendations, it could probably double as a recs problem.

Basically, each user ID is mapped to a query that resulted in a click on a
particular SKU (product_id). These are the real SKUs, so they map back to real
products in BestBuy's Products API (this data is also provided in bulk on
Kaggle):

https://bbyopen.com/api-profiles/products-api
http://www.kaggle.com/c/acm-sf-chapter-hackathon-big/data
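If the click log is one row per click, it could be collapsed into the user ID, item ID, implicit-preference triples discussed earlier by counting clicks per (user, SKU) pair. A sketch only: the column names `user`, `sku`, and `query` and the sample rows below are assumptions, not checked against the actual Kaggle files.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; the real file's column names and order may differ.
SAMPLE = """user,sku,query
u1,1001,ipod
u1,1001,ipod nano
u2,2002,xbox
u1,3003,headphones
"""

def click_triples(fileobj, user_col="user", item_col="sku"):
    """Collapse one-row-per-click records into sorted
    (user, item, preference) triples, using the click count
    as the implicit preference. The query text is dropped."""
    counts = Counter()
    for row in csv.DictReader(fileobj):
        counts[(row[user_col], row[item_col])] += 1
    return sorted((u, i, float(n)) for (u, i), n in counts.items())

if __name__ == "__main__":
    for triple in click_triples(io.StringIO(SAMPLE)):
        print(triple)
```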


On Mon, Apr 15, 2013 at 2:03 PM, Pat Ferrel <[email protected]> wrote:

> MAJOR may be too tame a word.
> 
> Furthermore, there are several enhancements the community could make to
> support retail data and retail recommenders. For one thing, without public
> data a *public* cross-recommender will probably not get built.
> 
> The cross-recommender needs to separate action types and use them in
> slightly different ways, so it is important to have a data set with users'
> purchases but also views, add-to-cart events, impressions, and purchases in
> groups (shopping carts): whatever events are available with anonymized user IDs.
> 
> This data set would be significant in getting new techniques into the
> community and therefore back to people like you.
> 
> On Apr 15, 2013, at 9:49 AM, Koobas <[email protected]> wrote:
> 
> Definitely of MAJOR interest.
> I am sure it would also draw all kinds of desired attention to your
> business.
> MovieLens is way too small to be meaningful anymore.
> Wikipedia articles and Stackoverflow tags are not retail data!
> By all means, post some real retail data, if you can.
> Meaningful sizes would be appreciated: millions of customers and
> thousands to tens of thousands of products.
> 
> 
> On Mon, Apr 15, 2013 at 12:27 PM, Robin Morris <[email protected]> wrote:
> 
>> I asked management here a while ago whether there would be a problem with
>> releasing an anonymized set of data from one of our retail customers, and
>> didn't get too much push-back.  If this is something that would be of
>> major interest, I can ask again and see whether there's something we can
>> put out as a community resource.
>> 
>> Robin
>> 
>> 
>> On 4/10/13 8:37 PM, "Pat Ferrel" <[email protected]> wrote:
>> 
>>> I have retail data but can't publish results from it. If I could get a
>>> public sample I'd share how the technique worked out.
>>> 
>>> Not sure how to simulate this data. It has the important characteristic
>>> that every purchase is also a view but not the other way around, and Ted's
>>> technique is a way to scrub the views that don't lead to purchases. All
>>> these are implicit preferences, but that's not the important part for this
>>> technique.
>>> 
>>> On Apr 10, 2013, at 4:15 PM, Koobas <[email protected]> wrote:
>>> 
>>> Retail data may be hard or impossible to get, but one can improvise.
>>> It seems to be fairly common to use Wikipedia articles (Myrrix, GraphLab).
>>> Another idea is to use StackOverflow tags (Myrrix examples),
>>> although both are only good for emulating implicit feedback.
>>> 
>>> 
>>> On Wed, Apr 10, 2013 at 6:48 PM, Ted Dunning <[email protected]> wrote:
>>> 
>>>> On Wed, Apr 10, 2013 at 10:38 AM, Pat Ferrel <[email protected]> wrote:
>>>> 
>>>>> Does anyone know of a public data set that provides things like views
>>>>> and purchases?
>>>>> 
>>>> 
>>>> I don't.
>>>> 
>>> 
>> 
>> 
> 
> 
