Re: Similarity between users' groups

Ted Dunning Fri, 18 Feb 2011 09:20:56 -0800

A better way to sample is to find groups with a very large number of users
and downsample each of them to a maximum of about 1000 users (or even 200
if you want to be more aggressive).  Do the same for users who belong to a
very large number of groups.

That won't remove a whole lot of data, but it will make most
recommendation algorithms go much faster.  The idea is that once you have
200 or more users in a group, you aren't learning anything new anyway.
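
As a rough illustration of the capping described above, here is a minimal
Python sketch.  It assumes the data is a list of (user, group) membership
pairs, and it reads "do the same with users" as capping the number of groups
kept per user; both are assumptions, and the names cap_per_key and downsample
are made up for this example, not taken from any library.

```python
import random
from collections import defaultdict


def cap_per_key(pairs, key_index, max_per_key, rng):
    """Keep at most max_per_key randomly chosen pairs for each key.

    key_index selects which element of the (user, group) pair is the key:
    0 groups the pairs by user, 1 groups them by group.
    """
    buckets = defaultdict(list)
    for pair in pairs:
        buckets[pair[key_index]].append(pair)

    kept = []
    for members in buckets.values():
        if len(members) > max_per_key:
            # Uniform sample without replacement from the oversized bucket.
            members = rng.sample(members, max_per_key)
        kept.extend(members)
    return kept


def downsample(pairs, max_users_per_group=1000, max_groups_per_user=1000, seed=0):
    """Cap users per group first, then groups per user (hypothetical helper)."""
    rng = random.Random(seed)
    pairs = cap_per_key(pairs, 1, max_users_per_group, rng)
    pairs = cap_per_key(pairs, 0, max_groups_per_user, rng)
    return pairs
```

Small groups and users with few memberships pass through untouched; only the
oversized ones are thinned, which is why the total volume changes little.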

On Fri, Feb 18, 2011 at 7:41 AM, Radek Maciaszek
<[email protected]> wrote:

> Each user can belong to many groups so the number of combinations is
> rather big. In fact this number of combinations is so large I am
> considering sampling the users and only analysing 1 in about 256 users.
> So essentially I would have about 1000+ groups and about 150k users.
> Since one user can potentially belong to many dozens of groups this will
> easily go into millions of records anyway but perhaps will be lower than
> the 100M margin you mentioned.
>
