FYI, adding to Pat's reply below: Slope-One has long been deprecated.

On Mon, Apr 6, 2015 at 5:00 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> Sorry, we are trying to get a release out.
>
> You can look at a custom similarity measure. Look at where SIMILARITY_COSINE leads you and customize that, maybe? There are in-memory and MapReduce versions, and I'm not sure which you are using. That is code I haven't looked at for a long time, so I can't get you much closer.
>
> On Apr 3, 2015, at 10:52 AM, PierLorenzo Bianchini <piell...@yahoo.com.INVALID> wrote:
>
> Hi again,
>
> Seeing the answers to this question and to the other one I had posted ("adjusted cosine similarity for item-based recommender?"), I think I should clarify a bit what I'm trying to achieve and why I (believe I should) do things the way I'm doing them.
>
> I'm taking a class called "Learning from User-Generated Data". Our first assignment deals with analysing the results of various types of recommenders. I'll go as far as saying "old-school" recommenders, given the content of your answers.
>
> We have been introduced to:
>
> * Memory-based:
>   - user-based
>   - item-based (*with* adjusted cosine similarity!)
>   - slope-one
>   - graph-based transitivity
> * Model-based:
>   - preprocessed item/user-based (this is unclear to me, but I haven't reached this part of the assignment yet, so I'll search for information before asking questions; I also found an article that listed slope-one among the model-based methods, so I guess I'll need to do more research on this)
>   - matrix-factorization-based (I saw that SVD is available in Mahout; my project partner is looking into that right now)
>
> We have a *static* training dataset (800,000 <user,movie,preference> triples) and another static dataset for which we have to extract the predicted preferences (200,000 <user,movie> tuples) and write them back to a file (i.e. recompose the <user,movie,preference> triples). Note that this will never go into a production environment, as it is merely a university requirement.
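At this scale, loading the 800,000 training triples into memory is cheap. A minimal sketch in plain Java, assuming the comma-separated layout described above; the nested-map structure here is purely illustrative, not Mahout's DataModel:

```java
import java.util.HashMap;
import java.util.Map;

public class LoadTriples {
    // Parses "userId,movieId,preference" lines into user -> (movie -> preference).
    // In practice the lines would come from a BufferedReader over the CSV file.
    static Map<Long, Map<Long, Double>> load(String csv) {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        for (String line : csv.split("\n")) {
            if (line.isEmpty()) continue;
            String[] f = line.split(",");
            prefs.computeIfAbsent(Long.parseLong(f[0]), u -> new HashMap<>())
                 .put(Long.parseLong(f[1]), Double.parseDouble(f[2]));
        }
        return prefs;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = load("1,10,4.5\n1,20,2.0\n2,10,3.0\n");
        System.out.println(prefs.get(1L).get(10L)); // prints 4.5
    }
}
```

800,000 boxed entries like this fit comfortably in far less than 8 GB of RAM, so memory is not the constraint here.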
> For the same reason, I would prefer not to mix things up too much, and I'd rather learn step by step (i.e. focus on Mahout for now, before I dig deeper and check the search-based approach, which uses DB + Mahout + Solr + Spark... maybe a bit too much to handle at once with the deadline we were given).
>
> So, if I may get back to my original questions (again, I'm sorry for being stubborn, but I'm under specific constraints; I'll really try to understand the search-based approach when I have more time) ;)
>
> 1. I'm guessing that to implement an adjusted cosine similarity I should extend AbstractSimilarity (or maybe even AbstractRecommender?). Is this right?
> 2. I still can't believe that it takes more than a few minutes at most to go through my 200,000 lines and find the already-calculated preference. What am I doing wrong? :/ Should I store my whole data model in a file (how?) and then read through the file? I don't see how that could be faster than just reading the exact value I'm searching for...
>
> Thanks again for your answers! Regards,
>
> Pier Lorenzo
>
> --------------------------------------------
> On Fri, 4/3/15, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> Subject: Re: fast performance way of writing preferences to file?
> To: "user@mahout.apache.org" <user@mahout.apache.org>
> Date: Friday, April 3, 2015, 5:52 PM
>
> Are you sure that the problem is writing the results? It seems to me that the real problem is the use of a user-based recommender.
>
> For such a small data set, for instance, a search-based recommender will be able to make recommendations in less than a millisecond, with multiple recommendations possible in parallel. This should allow you to do 200,000 recommendations in a few minutes on a single machine.
>
> With such a small dataset, indicator-based methods may not be the best option. To improve that, try using something larger such as the Million Song Dataset.
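On question 1: the adjusted-cosine computation itself is small and can be prototyped outside Mahout before wiring it into a similarity class. A minimal sketch, assuming a toy nested-map rating layout rather than Mahout's DataModel: each user's mean rating over all of their rated items is subtracted before taking the cosine over the users who co-rated both items.

```java
import java.util.HashMap;
import java.util.Map;

public class AdjustedCosine {
    // ratings: user -> (item -> rating); a toy layout for illustration only.
    static double adjustedCosine(Map<Long, Map<Long, Double>> ratings, long itemA, long itemB) {
        double dot = 0, normA = 0, normB = 0;
        for (Map<Long, Double> prefs : ratings.values()) {
            Double ra = prefs.get(itemA), rb = prefs.get(itemB);
            if (ra == null || rb == null) continue; // only co-rating users count
            // subtract this user's mean rating over ALL items they rated
            double mean = prefs.values().stream()
                               .mapToDouble(Double::doubleValue).average().orElse(0);
            double da = ra - mean, db = rb - mean;
            dot += da * db;
            normA += da * da;
            normB += db * db;
        }
        if (normA == 0 || normB == 0) return Double.NaN; // undefined: no variance
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> r = new HashMap<>();
        r.put(1L, Map.of(10L, 5.0, 20L, 1.0, 30L, 3.0));
        r.put(2L, Map.of(10L, 4.0, 20L, 2.0));
        System.out.println(adjustedCosine(r, 10L, 20L)); // prints -1.0
    }
}
```

In the toy data, both users rate item 10 above their personal mean and item 20 below it, so the similarity comes out at exactly -1.0.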
> See http://labrosa.ee.columbia.edu/millionsong/
>
> Also, using and estimating ratings is not a particularly good thing to be doing if you want to build a real recommender.
>
> On Fri, Apr 3, 2015 at 3:26 AM, PierLorenzo Bianchini <piell...@yahoo.com.invalid> wrote:
>
> > Hello everyone,
> > I'm new to Mahout, to recommender systems, and to the mailing list.
> >
> > I'm trying to find a (fast) way to write preferences back to a file. I tried a few methods, but I'm sure there must be a better approach. Here's the deal (you can find the same post on Stack Overflow [1]).
> >
> > I have a training dataset of 800,000 records from 6,000 users rating 3,900 movies. These are stored in a comma-separated file like: userId,movieId,preference. I have another dataset (200,000 records) in the format: userId,movieId. My goal is to use the first dataset as a training set, in order to determine the missing preferences of the second set.
> >
> > So far, I have managed to load the training dataset and I generated user-based recommendations. This is pretty smooth and doesn't take too much time. But I'm struggling when it comes to writing back the recommendations.
> >
> > The first method I tried is:
> >
> > * read a line from the file and get the userId,movieId tuple
> > * retrieve the calculated preference with estimatePreference(userId, movieId)
> > * append the preference to the line and save it in a new file
> >
> > This works, but it's incredibly slow (I added a counter to print every 10,000th iteration: after a couple of minutes it had only printed once. I have 8 GB of RAM with an i7 core... how long can it take to process 200,000 lines?!)
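For what it's worth, the loop itself (read a tuple, estimate, append) should take seconds with buffered I/O; the minutes are almost certainly spent inside estimatePreference, which for a user-based recommender recomputes similarities on every call. A sketch of the streaming part alone, with a hypothetical Estimator interface standing in for recommender.estimatePreference:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

public class WriteEstimates {
    // Stand-in for Mahout's recommender.estimatePreference(userId, movieId).
    interface Estimator { double estimate(long userId, long movieId); }

    // Turns a "userId,movieId" line into "userId,movieId,preference".
    static String appendEstimate(String line, Estimator est) {
        int comma = line.indexOf(',');
        long user = Long.parseLong(line.substring(0, comma).trim());
        long movie = Long.parseLong(line.substring(comma + 1).trim());
        return line + "," + est.estimate(user, movie);
    }

    public static void main(String[] args) throws IOException {
        Estimator dummy = (u, m) -> u + m * 0.1; // dummy estimator for the demo
        BufferedReader in = new BufferedReader(new StringReader("1,10\n2,20\n"));
        StringWriter sink = new StringWriter(); // a FileWriter in real use
        try (BufferedWriter out = new BufferedWriter(sink)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(appendEstimate(line, dummy));
                out.newLine(); // buffered: flushed in large chunks, not per line
            }
        }
        System.out.print(sink); // two lines: 1,10,2.0 and 2,20,4.0
    }
}
```

If this loop with a trivial estimator is fast but the real run is slow, that confirms the cost is in the estimation step, not the file writing.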
> > My second choice was:
> >
> > * create a new FileDataModel with the second dataset
> > * do something like this: newDataModel.setPreference(userId, movieId, recommender.estimatePreference(userId, movieId));
> >
> > Here I get several problems:
> >
> > * at runtime: java.lang.UnsupportedOperationException (as I found out in [2], FileDataModel actually can't be updated; I don't understand why the setPreference method exists in the first place...)
> > * the API of FileDataModel#setPreference states "This method should also be considered relatively slow."
> >
> > I read around that a solution would be to use delta files, but I couldn't find out what that actually means. Any suggestion on how I could speed up my writing-the-preferences process?
> >
> > Thank you!
> >
> > Pier Lorenzo
> >
> > [1] http://stackoverflow.com/questions/29423824/mahout-fast-performance-how-to-write-preferences-to-file
> > [2] http://comments.gmane.org/gmane.comp.apache.mahout.user/11330
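Rather than pushing estimates back into a FileDataModel, a simpler route is to cache the expensive per-user work and stream the results to a plain output file; Mahout also ships wrappers such as CachingRecommender and CachingUserSimilarity for this. The caching idea in isolation is just memoization (the names below are illustrative, not Mahout API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class MemoDemo {
    // Wraps an expensive per-key computation so it runs at most once per key.
    static <K, V> Function<K, V> memoize(Function<K, V> f) {
        Map<K, V> cache = new HashMap<>();
        return k -> cache.computeIfAbsent(k, f);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for the costly per-user step of a user-based recommender
        // (e.g. building the user's neighborhood); the 0.5 factor is arbitrary.
        Function<Long, Double> cached = memoize(user -> {
            calls.incrementAndGet();
            return user * 0.5;
        });
        for (long u : new long[]{1, 2, 3, 1, 2, 3}) cached.apply(u);
        System.out.println(calls.get()); // prints 3: once per distinct user
    }
}
```

Six queries over three distinct users trigger the costly step only three times; sorting or grouping the 200,000 query pairs by user gives such a cache the best possible hit rate.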