I think that this is OK for cross recommendation.

We have user x term (the query history).

And we have user x sku (the click history).

We don't even care if the click came from a particular search effort.  We
merely want to associate behavior 1 (emitting search terms) to behavior 2
(clicking on skus).  This data should suffice for that.
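
As a rough sketch of that association (not the actual cross-recommender code): join the two histories on user ID only and count, for each (term, sku) pair, how many users did both. The function name and the toy rows below are made up for illustration, and raw counts stand in for whatever significance test you would really apply.

```python
from collections import defaultdict


def cross_cooccurrence(query_history, click_history):
    """Count users who both emitted a term and clicked a sku.

    query_history: iterable of (user_id, term) pairs  -> user x term
    click_history: iterable of (user_id, sku) pairs   -> user x sku
    The histories are joined on user_id only; a click does not have to
    come from the search that produced the term.
    """
    terms_by_user = defaultdict(set)
    for user, term in query_history:
        terms_by_user[user].add(term)

    skus_by_user = defaultdict(set)
    for user, sku in click_history:
        skus_by_user[user].add(sku)

    counts = defaultdict(int)
    for user, terms in terms_by_user.items():
        # Users with no clicks contribute nothing.
        for sku in skus_by_user.get(user, ()):
            for term in terms:
                counts[(term, sku)] += 1
    return dict(counts)


# Hypothetical toy data, not taken from the Kaggle set.
queries = [("u1", "xbox"), ("u2", "xbox"), ("u2", "hdmi"), ("u3", "hdmi")]
clicks = [("u1", "sku-100"), ("u2", "sku-100"), ("u2", "sku-200"),
          ("u4", "sku-300")]
cooc = cross_cooccurrence(queries, clicks)
```

Here ("xbox", "sku-100") is counted twice because u1 and u2 both searched for the term and clicked the sku, while u3 (no clicks) and u4 (no queries) drop out of the join.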


On Tue, Apr 16, 2013 at 9:11 PM, Nick Kolegraff <[email protected]> wrote:

> You are correct.  We only know the skus clicked.
>
>
> On Tue, Apr 16, 2013 at 1:46 PM, Pat Ferrel <[email protected]> wrote:
>
> > Hmm, I looked at the Kaggle description and there is no record of the
> > queries except for terms. So we do not know the results of the search,
> > only the skus clicked. Unless I missed something this is a problem for
> > my use.
> >
> > However the search terms are known so Ted's use case might work. You can
> > infer the search from the data, just not all search results.
> >
> > On Apr 16, 2013, at 1:24 PM, Pat Ferrel <[email protected]> wrote:
> >
> > I think Ted is talking about a different application of this idea:
> > http://www.slideshare.net/tdunning/search-as-recommendation
> >
> > The IDs in my case must be in the same space, at very least the user IDs
> > need to match. In retail apps views are a superset of purchases--all
> > purchases are viewed but not the other way around. So we need data
> > something like all clicked items are seen in a search result but not all
> > seen items are clicked. If we have user IDs for all item IDs--either
> > clicked or shown in a search result--we are ok.
> >
> > The data does get split but by action and the user ID space is the same:
> > Set 1: userID, skuID, 1 for clicked action
> > Set 2: userID, skuID, 1 for search result item viewed (one preference for
> > each item in the results page, one user ID for all items seen)
> >
> > The technique is interesting because it's flexible and in effect learns
> > multi-hop inferences.
> >
> > On Apr 16, 2013, at 11:56 AM, Nick Kolegraff <[email protected]>
> > wrote:
> >
> > yep, product images are there as well.
> > have a go here:
> >
> >
> > https://bbyopen.com/documentation/products-api/product-attributes#TableImages
> >
> > What Ted said. (if I understand correctly)
> > You could create two datasets from the one with:
> >
> > A dataset with:
> > userID,skuID,1
> >
> > Another with:
> > searchID,skuID,1
> >
> > Timestamps are there too if you want to get clever with preferences
> > rather than binary.
> > (i'd scrub the search terms before mapping them to IDs too)
> >
> > I also have an image with all this data packaged on AWS inside a postgres
> > database if you wanted to fart around with it.
> > in public images just do a search for "ACM hackathon" and you should see
> > it.   Feel free to ping me off list with specific questions on that.
> >
> >
> > On Tue, Apr 16, 2013 at 10:29 AM, Ted Dunning <[email protected]>
> > wrote:
> >
> > > Primary action can be emitting a search term.  Secondary can be click
> > > to view.
> > >
> > >
> > > On Tue, Apr 16, 2013 at 4:53 PM, Pat Ferrel <[email protected]>
> > > wrote:
> > >
> > >> For the cross-recommender we need some replacement for a primary
> > >> action--purchases and a secondary action--views, clicks, impressions,
> > >> something.
> > >>
> > >> To use this data we would treat clicks like a purchase--the primary
> > >> action we want to recommend. Then the search-result-item-impression
> > >> (SRII) is like a view in the x-recommender description. In this case
> > >> the SRII is an item seen on a search results page. Each SRII would
> > >> come with a user ID, itemID, and implicit preference. Clicks also come
> > >> with userID, itemID and implicit preference.
> > >>
> > >> The cross-recommender would have the effect of finding click
> > >> recommendations from search result item impressions. At very least
> > >> this seems like a way to use clicks to re-rank search results.
> > >>
> > >> Is this good enough for testing the x-recommender algo? Do we have
> > >> SRIIs with item ID and user ID? Maybe there are product page URLs we
> > >> can use as item ids? I'll look, thanks.
> > >>
> > >>
> > >> On Apr 15, 2013, at 5:52 PM, Nick Kolegraff <[email protected]>
> > >> wrote:
> > >>
> > >> Hey Guys,
> > >> This is a dataset that kinda fits the bill, sorta -- probably the
> > >> closest thing out there.  I got this extracted from BestBuy.  Now,
> > >> while it is more focused on 'search' as opposed to recommendations,
> > >> it could probably double for a recs problem.
> > >>
> > >> basically, each userid is mapped to a query that resulted in a click
> > >> on a particular sku (product_id).  They are the real skus as well, so
> > >> they can map back to real products in their products api (this data is
> > >> also provided in bulk on kaggle):
> > >>
> > >> https://bbyopen.com/api-profiles/products-api
> > >> http://www.kaggle.com/c/acm-sf-chapter-hackathon-big/data
> > >>
> > >>
> > >> On Mon, Apr 15, 2013 at 2:03 PM, Pat Ferrel <[email protected]>
> > >> wrote:
> > >>
> > >>> MAJOR may be too tame a word.
> > >>>
> > >>> Furthermore there are several enhancements the community could make
> > >>> to support retail data and retail recommenders. For one thing without
> > >>> public data a *public* cross-recommender will probably not get built.
> > >>>
> > >>> The cross-recommender needs to separate action types and use them in
> > >>> slightly different ways so it is important to have a data set with
> > >>> users' purchases but also views, add-to-cart, impressions, purchases
> > >>> in groups (shopping carts)--whatever events are available with
> > >>> anonymized user IDs.
> > >>>
> > >>> This data set would be significant in getting new techniques into the
> > >>> community and therefore back to people like you.
> > >>>
> > >>> On Apr 15, 2013, at 9:49 AM, Koobas <[email protected]> wrote:
> > >>>
> > >>> Definitely of MAJOR interest.
> > >>> I am sure it would also draw all kinds of desired attention to your
> > >>> business.
> > >>> Movie Lens is way too small to be meaningful any more.
> > >>> Wikipedia articles and Stackoverflow tags are not retail data!
> > >>> By all means, post some real retail data, if you can.
> > >>> Meaningful sizes would be appreciated: millions of customers,
> > >>> thousands - tens of thousands products.
> > >>>
> > >>>
> > >>> On Mon, Apr 15, 2013 at 12:27 PM, Robin Morris <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> I asked management here a while ago whether there would be a problem
> > >>>> with releasing an anonymized set of data from one of our retail
> > >>>> customers, and didn't get too much push-back.  If this is something
> > >>>> that would be of major interest, I can ask again and see whether
> > >>>> there's something we can put out as a community resource.
> > >>>>
> > >>>> Robin
> > >>>>
> > >>>>
> > >>>> On 4/10/13 8:37 PM, "Pat Ferrel" <[email protected]> wrote:
> > >>>>
> > >>>>> I have retail data but can't publish results from it. If I could
> > >>>>> get a public sample I'd share how the technique worked out.
> > >>>>>
> > >>>>> Not sure how to simulate this data. It has the important
> > >>>>> characteristic that every purchase is also a view but not the other
> > >>>>> way around and Ted's technique is a way to scrub the views that
> > >>>>> don't lead to purchases. All these are implicit preferences but
> > >>>>> that's not the important part for this technique.
> > >>>>>
> > >>>>> On Apr 10, 2013, at 4:15 PM, Koobas <[email protected]> wrote:
> > >>>>>
> > >>>>> Retail data may be hard to impossible, but one can improvise.
> > >>>>> It seems to be fairly common to use Wikipedia articles (Myrrix,
> > >>>>> GraphLab).
> > >>>>> Another idea is to use StackOverflow tags (Myrrix examples).
> > >>>>> Although they are only good for emulating implicit feedback.
> > >>>>>
> > >>>>>
> > >>>>> On Wed, Apr 10, 2013 at 6:48 PM, Ted Dunning <[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> On Wed, Apr 10, 2013 at 10:38 AM, Pat Ferrel <[email protected]>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Does anyone know of a public data set that provides things like
> > >>>>>>> views and purchases?
> > >>>>>>>
> > >>>>>>
> > >>>>>> I don't.
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>
> > >>
> > >
> >
> >
> >
>
