Sean Owen <srowen <at> gmail.com> writes:

>
> Parallel ALS is exactly an example of where you can use matrix
> factorization for "0/1" data.
>
> On Mon, May 6, 2013 at 9:22 PM, Tevfik Aytekin <tevfik.aytekin <at> gmail.com> wrote:
> > Hi Sean,
> > Aren't boolean preferences supported in the context of memory-based
> > recommendation algorithms in Mahout?
> > Are there matrix factorization algorithms in Mahout which can work
> > with this kind of data (that is, the kind of data which consists of
> > users and the movies they have seen)?
> >
> > On Mon, May 6, 2013 at 10:34 PM, Sean Owen <srowen <at> gmail.com> wrote:
> >> Yes, it goes by the name 'boolean prefs' in the project since target
> >> variables don't have values -- they just exist or don't.
> >> So, yes, it's certainly supported, but the question here is how to
> >> evaluate the output.
> >>
> >> On Mon, May 6, 2013 at 8:29 PM, Tevfik Aytekin <tevfik.aytekin <at> gmail.com> wrote:
> >>> This problem is called the one-class classification problem. In the domain
> >>> of collaborative filtering it is called one-class collaborative
> >>> filtering (since what you have are only positive preferences). You may
> >>> search the web with these keywords to find papers providing
> >>> solutions. I'm not sure whether Mahout has algorithms for one-class
> >>> collaborative filtering.
> >>>
> >>> On Mon, May 6, 2013 at 1:42 PM, Sean Owen <srowen <at> gmail.com> wrote:
> >>>> ALS-WR weights the error on each term differently, so the average
> >>>> error doesn't really have meaning here, even if you are comparing the
> >>>> difference with "1". I think you will need to fall back to mean
> >>>> average precision or something.
> >>>>
> >>>> On Mon, May 6, 2013 at 11:24 AM, William <icswilliam2010 <at> gmail.com> wrote:
> >>>>> Sean Owen <srowen <at> gmail.com> writes:
> >>>>>
> >>>>>>
> >>>>>> If you have no ratings, how are you using RMSE? This typically
> >>>>>> measures error in reconstructing ratings.
> >>>>>> I think you are probably measuring something meaningless.
> >>>>>>
> >>>>>
> >>>>> I suppose the rating of seen movies is 1. Is that right?
> >>>>> If I use Collaborative Filtering with ALS-WR to get some recommendations,
> >>>>> must I have a real rating matrix?
> >>>>>

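For reference, the mean average precision mentioned in the quoted thread as a fallback for boolean preferences can be sketched roughly as follows. This is only an illustration in plain Python, not Mahout's evaluator: the function names, the `recs` and `held_out` dictionaries, and the choice to divide by the number of held-out items are all assumptions made for the example.

```python
from typing import Dict, List, Set


def average_precision(ranked_items: List[int], relevant: Set[int]) -> float:
    """Average precision of one ranked list against a set of held-out items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a hit occurs.
            precision_sum += hits / rank
    # Assumption: normalize by the number of held-out (relevant) items.
    return precision_sum / len(relevant)


def mean_average_precision(recs: Dict[int, List[int]],
                           held_out: Dict[int, Set[int]]) -> float:
    """Mean of per-user average precision over users with held-out items."""
    users = [u for u, items in held_out.items() if items]
    if not users:
        return 0.0
    total = sum(average_precision(recs.get(u, []), held_out[u]) for u in users)
    return total / len(users)
```

Here `recs` would map each user ID to the ranked item IDs recommended for that user, and `held_out` would map each user ID to the items hidden from training (for example, movies the user has seen that were withheld).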
I was wondering what format the output produced by parallelALS is stored in. More specifically, I am looking for a way to decode and read this information.

I have been able to run the mahout parallelALS command, calculate RMSE using mahout evaluateFactorization, and generate recommendations via mahout recommendfactorized. However, I would like to take a closer look at things like the factorized products for my probeSet (stored in --tempDir from the 'mahout evaluateFactorization' command) and the actual feature vectors stored in the /out/U/ and /out/M/ directories.

Thanks,
AJ
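One possible way to inspect the feature vectors, sketched under assumptions: as far as I know, the U and M directories hold Hadoop SequenceFiles with an integer ID as key and a vector as value, and `mahout seqdumper -i <path>` prints a SequenceFile as text. The Python below assumes each dumped record looks like `Key: 7: Value: {0:0.12,1:-0.5}`; that line format, the helper names `parse_vector_dump` and `predict`, and the sample paths are assumptions, so check them against the actual dump before relying on this.

```python
import re
from typing import Dict

# Assumed shape of one dumped record: "Key: <id>: Value: {<index>:<value>,...}"
LINE_RE = re.compile(r"^Key:\s*(\d+):\s*Value:\s*\{(.*)\}\s*$")


def parse_vector_dump(text: str) -> Dict[int, Dict[int, float]]:
    """Parse a text dump into {row_id: {feature_index: value}}."""
    vectors: Dict[int, Dict[int, float]] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip header and footer lines that are not records
        row_id = int(match.group(1))
        body = match.group(2)
        entries: Dict[int, float] = {}
        if body:
            for pair in body.split(","):
                index, value = pair.split(":")
                entries[int(index)] = float(value)
        vectors[row_id] = entries
    return vectors


def predict(user_vec: Dict[int, float], item_vec: Dict[int, float]) -> float:
    """Dot product of a user and an item feature vector."""
    return sum(v * item_vec.get(i, 0.0) for i, v in user_vec.items())
```

With the text of a U dump and an M dump parsed this way, `predict(users[user_id], items[item_id])` gives the dot product of the two feature vectors, which is the reconstructed preference in a plain matrix factorization model.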
