Thank you. Passing the random seed rather than the projected matrix is a cool trick. I'm sure I would have figured it out before the heat death of the universe.
Looking at it again, I can split out some of the processing into parallel chains and use the Vector classes. This would allow the output to play with other Mahout tools.

The use case is something called 'semantic vectors' to store recommendations. The idea is to have two sets of random positions, one for users and one for items. All users are projected to random positions between 0 and 1. All items are also projected to random positions between 0 and 1. Now the users pull items toward themselves or push them away, using the preference values. This creates an item vector which is slightly perturbed according to the user/item preferences. Drawing this displays the concept well; it is intuitively simple, which is why I can understand it.

Do this paired projection/perturbation 100, 150 or 200 times and combine all of the user vectors and item vectors into parallel N-dimensional spaces. The rearrangements collectively create a reflection of the pref values: the ratios in a user's pref list should be roughly equivalent to the distances between the user's random vector and each item's perturbed vector. Since every item is perturbed by at least one user, the web of perturbations by all users gives each user a good distance to all items.

Now, to recommend items for a user, do a nearest-neighbor check for the user against all items.

This idea came from the "Semantic Vectors" project on Google Code:
http://code.google.com/p/semanticvectors/
They use this algorithm for document/term collocation: projected documents perturb projected terms.

Enough for this braindump.
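
Well, one more thing: here is a rough sketch of the above, in plain Python rather than Java/Mahout just to keep it short. Everything specific in it is my assumption, not something taken from the Semantic Vectors code or the Mahout API: the names build_vectors and recommend, prefs scaled to [0, 1], the pull strength of 2*pref - 1 (so 1.0 pulls, 0.0 pushes, 0.5 does nothing), the rate parameter, and averaging the shifts when several users rate the same item.

```python
import math
import random


def build_vectors(prefs, dims=100, rate=0.5, seed=42):
    """prefs maps (user, item) -> preference in [0, 1] (assumed scaling)."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in prefs})
    items = sorted({i for _, i in prefs})
    user_vecs = {u: [] for u in users}
    item_vecs = {i: [] for i in items}

    for _ in range(dims):
        # One paired 1-D projection: everybody gets a random spot in [0, 1).
        upos = {u: rng.random() for u in users}
        ipos = {i: rng.random() for i in items}

        # Each user pulls (pref > 0.5) or pushes (pref < 0.5) the items rated.
        shift = {i: 0.0 for i in items}
        count = {i: 0 for i in items}
        for (u, i), pref in prefs.items():
            pull = 2.0 * pref - 1.0  # assumed mapping to [-1, 1]
            shift[i] += rate * pull * (upos[u] - ipos[i])
            count[i] += 1

        # Users stay where they landed; items move by their average shift.
        for u in users:
            user_vecs[u].append(upos[u])
        for i in items:
            item_vecs[i].append(ipos[i] + shift[i] / count[i])

    return user_vecs, item_vecs


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(user, user_vecs, item_vecs, n=3):
    """Brute-force nearest-neighbor: the n items closest to the user."""
    uv = user_vecs[user]
    return sorted(item_vecs, key=lambda i: distance(uv, item_vecs[i]))[:n]


prefs = {
    ("alice", "x"): 1.0,  # alice likes x
    ("alice", "y"): 0.0,  # alice dislikes y
    ("bob", "z"): 1.0,    # bob likes z
}
user_vecs, item_vecs = build_vectors(prefs, dims=100)
print(recommend("alice", user_vecs, item_vecs))
```

Passing the seed around rather than the random positions themselves would fit here too, since the same seed regenerates the same projections.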
Lance

On Wed, Oct 13, 2010 at 7:22 AM, Ted Dunning <[email protected]> wrote:
> See
> http://tdunning.blogspot.com/2010/10/why-is-sum-of-two-uniform-randoms-not.html
>
> On Tue, Oct 12, 2010 at 11:07 PM, Ted Dunning <[email protected]> wrote:
>
>> I will put up a more detailed explanation on my blog where I can draw
>> pretty pictures and write mathematical notation, but the
>> crux of the argument that if you are adding two random variables x and y,
>> then the region where there is non-zero probability is
>> the square [0,1] x [0,1].
>>
>

--
Lance Norskog
[email protected]
