Re: parallelALS and RMSE TEST

Sebastian Schelter Sat, 01 Mar 2014 01:33:23 -0800

The output of parallelALS consists of two matrices, U and M, whose product is an approximation of your input matrix.

The matrices are written as sequence files with an IntWritable as key (the index of the row in the matrix) and a VectorWritable as value, which holds the contents of the row vector.
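
To make the "product is an approximation" part concrete, here is a minimal sketch in plain Java. It assumes the rows of U and M have already been read into double arrays, that each row of U is a user's feature vector and each row of M is an item's feature vector, and that the row index is the user or item ID. The class and method names (FactorizationSketch, predict, rmse) and the toy numbers are made up for illustration; this is not Mahout code.

```java
class FactorizationSketch {

    // Approximated entry of the input matrix for one (user, item) pair:
    // the dot product of the user's row in U and the item's row in M.
    static double predict(double[] userFeatures, double[] itemFeatures) {
        double sum = 0.0;
        for (int k = 0; k < userFeatures.length; k++) {
            sum += userFeatures[k] * itemFeatures[k];
        }
        return sum;
    }

    // Root mean squared error over held-out (user, item) pairs.
    // pairs[i] = {userIndex, itemIndex}, ratings[i] = the known value.
    static double rmse(double[][] u, double[][] m, int[][] pairs, double[] ratings) {
        double squaredError = 0.0;
        for (int i = 0; i < pairs.length; i++) {
            double err = ratings[i] - predict(u[pairs[i][0]], m[pairs[i][1]]);
            squaredError += err * err;
        }
        return Math.sqrt(squaredError / pairs.length);
    }

    public static void main(String[] args) {
        // Toy data: 2 users, 2 items, 2 features (hypothetical values).
        double[][] u = {{1.0, 0.5}, {0.2, 1.5}};
        double[][] m = {{2.0, 1.0}, {0.5, 2.0}};
        int[][] probe = {{0, 0}, {1, 1}};
        double[] ratings = {3.0, 3.0};

        System.out.println("prediction for user 0, item 0: " + predict(u[0], m[0]));
        System.out.println("RMSE on probe set: " + rmse(u, m, probe, ratings));
    }
}
```

Comparing such predictions against a held-out probe set is the same kind of check an RMSE evaluation performs, so the sketch can be used to sanity-check individual entries by hand.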


--sebastian

On 02/27/2014 06:30 PM, AJ Rader wrote:


Sean Owen <srowen <at> gmail.com> writes:


Parallel ALS is exactly an example of where you can use matrix
factorization for "0/1" data.

On Mon, May 6, 2013 at 9:22 PM, Tevfik Aytekin <tevfik.aytekin <at> gmail.com> wrote:

Hi Sean,
Aren't boolean preferences supported in the context of memory-based
recommendation algorithms in Mahout?
Are there matrix factorization algorithms in Mahout which can work
with this kind of data (that is, data which consists of
users and the movies they have seen)?




On Mon, May 6, 2013 at 10:34 PM, Sean Owen <srowen <at> gmail.com> wrote:

Yes, it goes by the name 'boolean prefs' in the project since target
variables don't have values -- they just exist or don't.
So, yes it's certainly supported but the question here is how to
evaluate the output.

On Mon, May 6, 2013 at 8:29 PM, Tevfik Aytekin <tevfik.aytekin <at> gmail.com> wrote:

This problem is called the one-class classification problem. In the domain
of collaborative filtering it is called one-class collaborative
filtering (since all you have are positive preferences). You may
search the web with these keywords to find papers providing
solutions. I'm not sure whether Mahout has algorithms for one-class
collaborative filtering.

On Mon, May 6, 2013 at 1:42 PM, Sean Owen <srowen <at> gmail.com> wrote:

ALS-WR weights the error on each term differently, so the average
error doesn't really have meaning here, even if you are comparing the
difference with "1". I think you will need to fall back to mean
average precision or something.
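
For reference, average precision for a single user can be sketched as below, in plain Java. It assumes you have a ranked list of recommended item IDs and a held-out set of items the user actually interacted with. The names (AveragePrecisionSketch, averagePrecision) and the sample IDs are hypothetical; this is not a Mahout API. Mean average precision would then be the mean of this value over all evaluated users.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AveragePrecisionSketch {

    // Average precision for one user: at every rank where a relevant item
    // appears, take the precision up to that rank, then divide the sum by
    // the number of relevant items.
    static double averagePrecision(List<Integer> ranked, Set<Integer> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    public static void main(String[] args) {
        // Hypothetical item IDs: items 10 and 30 were held out as "seen".
        List<Integer> ranked = Arrays.asList(10, 20, 30, 40);
        Set<Integer> relevant = new HashSet<>(Arrays.asList(10, 30));
        System.out.println("average precision: " + averagePrecision(ranked, relevant));
    }
}
```

Unlike RMSE, this only looks at the order of the recommendations, so it does not depend on how ALS-WR weights individual error terms.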

On Mon, May 6, 2013 at 11:24 AM, William <icswilliam2010 <at> gmail.com> wrote:

Sean Owen <srowen <at> gmail.com> writes:


If you have no ratings, how are you using RMSE? It typically
measures the error in reconstructing ratings.
I think you are probably measuring something meaningless.



I assume the rating of each seen movie is 1. Is that right?
If I use collaborative filtering with ALS-WR to get some
recommendations, must I have a real rating matrix?


I was wondering what kind of format the output produced by parallelALS is
stored in. More specifically I am looking for a way to decode/read this
information.

I have been able to run the mahout parallelALS command, calculate RMSE using
mahout evaluateFactorization, and generate recommendations via mahout
recommendfactorized.

However I would like to take a closer look at things like the factorized
products for my probeSet (stored in --tempDir from the 'mahout
evaluateFactorization' command) and the actual feature vectors stored in the
/out/U/ and /out/M/ directories.

thanks
AJ
