So data has to be in "Order By UserId, ItemID, Preference"?  Hmm.. for a
file containing a billion rows this may take some time, but if that's what it
wants that's what I will provide.  Please confirm.  Thanks.
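
For anyone following along: one way to read the "ordering" in the quoted reply below is as a comparison function over ID values, i.e. code that takes two IDs and says which comes first. Here is a minimal Java sketch of that idea, assuming string IDs compared lexicographically. The class and method names are made up for illustration and are not part of Mahout or Hadoop.

```java
import java.util.Arrays;
import java.util.Comparator;

public class IdOrdering {
    // A total ordering over string IDs: any two values can be compared.
    // Assumption: plain lexicographic String order is acceptable for the IDs.
    public static final Comparator<String> BY_ID = Comparator.naturalOrder();

    // Negative if a sorts before b, zero if equal, positive otherwise.
    public static int compare(String a, String b) {
        return BY_ID.compare(a, b);
    }

    // Returns a sorted copy; the input array is left untouched.
    public static String[] sorted(String[] ids) {
        String[] copy = Arrays.copyOf(ids, ids.length);
        Arrays.sort(copy, BY_ID);
        return copy;
    }

    public static void main(String[] args) {
        String[] ids = {"carol", "alice", "bob"};
        System.out.println(Arrays.toString(sorted(ids)));
    }
}
```

Given such a comparator, a framework can do the sorting itself.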

On Mon, Jun 4, 2012 at 4:20 PM, Lance Norskog <[email protected]> wrote:

> It needs a complete "ordering", meaning code that takes any two values
> and says "this one before that one". This lets Hadoop do global
> sorting. If they're strings you would sort on the strings.
>
> On Mon, Jun 4, 2012 at 4:00 PM, Something Something
> <[email protected]> wrote:
> > Fair enough.  Just one more question:
> >
> > 1)  >>it just needs to have an ordering
> > The input data doesn't need to be in any particular sequence, correct?
> > Not sure what you mean by 'needs to have an ordering'.
> >
> >
> > On Mon, Jun 4, 2012 at 3:29 PM, Sean Owen <[email protected]> wrote:
> >
> >> That's how it used to work but it was restricted to integers a long time
> >> ago purely for speed and memory. It makes a big difference. Many (most?)
> >> use cases have some numeric ID for these guys already.  Otherwise there is
> >> no reason it needs to be an integer; it just needs to have an ordering.
> >>
> >> You can retain the mapping how you like. All you really need are the
> >> original ID values to recreate the mapping, as it is just based on MD5.
> >> So a file is sufficient, for example. But to do the mapping on the fly it
> >> has to be in memory, yes, or else it is too slow.
> >>
> >> Best is to find a numeric ID to use in your model if you can.
> >>
> >> Myrrix works this way too, if desired, but almost as a feature as the
> >> 'real' IDs need never be sent into the hosted recommender in the cloud,
> >> just a hashed numeric ID. That's nice from a security or privacy
> >> standpoint.
> >>  On Jun 4, 2012 11:05 PM, "Something Something" <[email protected]>
> >> wrote:
> >>
> >> > Hmm.. that's a bit weird.  Looking at the algorithm, I don't understand
> >> > why UserID has to be Long.  It's just an Identifier of a row, isn't it?
> >> > The algorithm really only works with Item IDs and even with ItemIDs I
> >> > would argue they don't need to be Numeric.  Am I missing something?
> >> >
> >> > We have over a billion user ids.  So for each ID I need to create a
> >> > corresponding 'long' value in Memory?  Is that what this class is doing?
> >> >
> >> > On Mon, Jun 4, 2012 at 2:50 PM, Manuel Blechschmidt <
> >> > [email protected]> wrote:
> >> >
> >> > > Hi Something,
> >> > > actually this is correct.
> >> > >
> >> > > You can use the MemoryIDMigrator
> >> > > https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/model/MemoryIDMigrator.html
> >> > > to create Longs from your strings.
> >> > >
> >> > > /Manuel
> >> > >
> >> > > On 04.06.2012, at 23:47, Something Something wrote:
> >> > >
> >> > > > Trying to use this class.  Noticed that 'UserID' must be Long.
> >> > > > That doesn't sound right.  Isn't there a way to tell this class that
> >> > > > the 'UserID' is String?  Please let me know.  Thanks.
> >> > >
> >> > > --
> >> > > Manuel Blechschmidt
> >> > > M.Sc. IT Systems Engineering
> >> > > Dortustr. 57
> >> > > 14467 Potsdam
> >> > > Mobil: 0173/6322621
> >> > > Twitter: http://twitter.com/Manuel_B
> >> > >
> >> > >
> >> >
> >>
>
>
>
> --
> Lance Norskog
> [email protected]
>
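
As a footnote to the thread above, here is a minimal Java sketch of the MD5-based string-to-long mapping that Sean describes. It assumes the long is built from the first 8 bytes of the MD5 digest of the UTF-8 string, and it keeps the reverse mapping in an in-memory map. The class and method names are hypothetical; this is an illustration of the idea, not Mahout's own code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class StringIdHasher {
    // Reverse mapping held in memory so a long ID can be turned back into its string.
    private final Map<Long, String> longToString = new HashMap<>();

    // Assumption: the numeric ID is the first 8 bytes of MD5(stringId), big-endian.
    public static long toLongId(String stringId) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(stringId.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Hashes the string ID and remembers the reverse mapping.
    public long store(String stringId) {
        long longId = toLongId(stringId);
        longToString.put(longId, stringId);
        return longId;
    }

    // Returns the original string, or null if this ID was never stored.
    public String toStringId(long longId) {
        return longToString.get(longId);
    }
}
```

Because the hash is deterministic, the reverse map can be rebuilt at any time from a file of the original string IDs. With a billion IDs the in-memory map is the expensive part, which is why a native numeric ID is preferable when one exists.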
