Thank you. Passing the random seed rather than the projected matrix is
a cool trick. I'm sure I would have figured it out before the heat
death of the universe.
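
For anyone skimming the thread, here is a minimal sketch of the seed trick as I understand it, assuming the point is that every task can regenerate the same random matrix locally from a shared seed instead of receiving the matrix itself. The names random_matrix and project are made up for illustration and are not Mahout APIs.

```python
import random

def random_matrix(seed, rows, cols):
    """Rebuild the same random matrix from a seed (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]

def project(vector, seed, out_dims):
    """Project `vector` with a matrix regenerated locally from `seed`."""
    matrix = random_matrix(seed, out_dims, len(vector))
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Two "workers" that were handed only the seed compute the same projection.
worker_a = project([1.0, 2.0, 3.0], seed=42, out_dims=2)
worker_b = project([1.0, 2.0, 3.0], seed=42, out_dims=2)
```

Only the seed (one number) has to travel between tasks, which is the appeal compared with shipping the whole matrix.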

Looking at it again, I can split out some of the processing into
parallel chains and use the Vector classes. This would let the
output play nicely with other Mahout tools.

The use case is something called 'semantic vectors' to store
recommendations. The idea is to have two random vectors, one for users
and one for items. All users are projected to random positions between
0 and 1. All items are also projected to random positions between 0
and 1. Now, we have the users pull items toward themselves or push
them away, using the preference values. This creates an item vector
which is slightly perturbed according to the user/item preferences.
Drawing this displays the concept well; it is intuitively simple,
which is why I can understand it.
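
A minimal sketch of one projection-and-perturbation pass as described above. The update rule is my assumption, not something taken from the Semantic Vectors code: preferences are scaled to [0, 1], 0.5 is treated as neutral, and rate is a made-up step size. The name perturb_once is hypothetical.

```python
import random

def perturb_once(prefs, seed, rate=0.1):
    """One random 1-D projection followed by a preference-driven perturbation.

    prefs: {(user, item): preference in [0, 1]}; 0.5 is neutral (assumption).
    Returns (user_pos, item_pos), two dicts of positions on the line.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _ in prefs})
    items = sorted({i for _, i in prefs})
    user_pos = {u: rng.random() for u in users}  # users never move
    item_pos = {i: rng.random() for i in items}
    start = dict(item_pos)  # perturb from the original spots, so order is irrelevant
    for (u, i), p in prefs.items():
        strength = p - 0.5  # positive pulls toward the user, negative pushes away
        item_pos[i] += rate * strength * (user_pos[u] - start[i])
    return user_pos, item_pos

prefs = {("alice", "book"): 1.0, ("alice", "film"): 0.0, ("bob", "book"): 0.8}
user_pos, item_pos = perturb_once(prefs, seed=7)
```

Pushed items can end up slightly outside [0, 1]; this sketch does not clamp them.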

Do this paired projection/perturbation 100 to 200 times and combine
all of the user vectors and item vectors into parallel N-dimensional
spaces, one dimension per pass. The rearrangements collectively create
a reflection of the pref values: the ratios in a user's pref list
should be roughly reflected in the distances between the user's random
vector and each item's perturbed vector. Because every user perturbs
at least one item, the combined web of perturbations gives each user a
usable distance to all items, not just the ones that user rated.
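
Repeating the pass and stacking the results could look roughly like this. Again a sketch under assumptions: each pass contributes one coordinate, preferences are in [0, 1] with 0.5 neutral, and build_vectors, dims and rate are names I invented for the example.

```python
import random

def build_vectors(prefs, dims=100, seed=0, rate=0.1):
    """Run `dims` independent 1-D passes; each pass adds one coordinate.

    prefs: {(user, item): preference in [0, 1]}; 0.5 is neutral (assumption).
    Returns (user_vecs, item_vecs), dicts mapping ids to lists of length dims.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _ in prefs})
    items = sorted({i for _, i in prefs})
    user_vecs = {u: [] for u in users}
    item_vecs = {i: [] for i in items}
    for _ in range(dims):
        upos = {u: rng.random() for u in users}
        start = {i: rng.random() for i in items}
        ipos = dict(start)
        for (u, i), p in prefs.items():
            # Pull the item toward the user (p > 0.5) or push it away (p < 0.5).
            ipos[i] += rate * (p - 0.5) * (upos[u] - start[i])
        for u in users:
            user_vecs[u].append(upos[u])
        for i in items:
            item_vecs[i].append(ipos[i])
    return user_vecs, item_vecs

prefs = {("alice", "book"): 1.0, ("alice", "film"): 0.0, ("bob", "book"): 0.8}
user_vecs, item_vecs = build_vectors(prefs, dims=150, seed=1)
```

The user vectors stay purely random; only the item vectors carry the preference signal.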

Now, to recommend items for a user, do a nearest-neighbor check for
the user against all items.
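
The nearest-neighbor check can be sketched as a brute-force scan. Euclidean distance is my assumption here, since nothing above fixes a metric, and recommend plus the tiny hand-written vectors are only illustrative.

```python
import math

def recommend(user_vec, item_vecs, k=3):
    """Return the k item ids closest to user_vec by Euclidean distance."""
    ranked = sorted(item_vecs, key=lambda i: math.dist(user_vec, item_vecs[i]))
    return ranked[:k]

user = [0.2, 0.7, 0.5]
items = {
    "book": [0.25, 0.65, 0.5],
    "film": [0.9, 0.1, 0.4],
    "song": [0.3, 0.6, 0.8],
}
top = recommend(user, items, k=2)  # ["book", "song"]
```

A full scan is linear in the number of items per user, which is fine for a sketch but would want an index or a distributed job at real scale.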

This idea came from the "Semantic Vectors" project on Google Code:
http://code.google.com/p/semanticvectors/
They use this algorithm for document/term collocation: projected
documents perturb projected terms.

Enough for this braindump.

Lance

On Wed, Oct 13, 2010 at 7:22 AM, Ted Dunning <[email protected]> wrote:
> See
> http://tdunning.blogspot.com/2010/10/why-is-sum-of-two-uniform-randoms-not.html
>
> On Tue, Oct 12, 2010 at 11:07 PM, Ted Dunning <[email protected]> wrote:
>
>> I will put up a more detailed explanation on my blog where I can draw
>> pretty pictures and write mathematical notation, but the
>> crux of the argument is that if you are adding two random variables x and y,
>> then the region where there is non-zero probability is
>> the square [0,1] x [0,1].
>>
>



-- 
Lance Norskog
[email protected]
