Yeah, IMHO that review had issues. It's important to note the
context: the person was selling their own services. The author was
trying to run some sample, non-distributed code in a sort of
distributed fashion. Predictably, the results were not so good. That
was a long time ago.

2M users and 10M items isn't big. It doesn't even sound hard for a
non-distributed Mahout recommender. Sure, tell us more and we can
suggest some ideas.

On Wed, Dec 29, 2010 at 4:08 AM, Sebastian Schelter <[email protected]> wrote:
> Hi all,
>
> Once again, I'm moving a Twitter conversation to this mailing list.
>
> Let me introduce Andy, who is currently evaluating recommendation
> components for his NYC-based startup and is looking into Mahout for
> that reason:
>
> "We are coding primarily in Scala and looking to build or license a
> recommendation component. The base requirement is that it be capable of
> hybrid recommendations on a body of ~2MM users and ~10MM items with rich
> metadata. The paper I referenced seems to indicate Mahout is not a
> great fit. Can you point me to recent improvements that make the
> assertions in the paper obsolete? Any guidance is very much appreciated!"
>
> The paper he's quoting is an old review of Mahout's recommender
> support, available at
> http://www.iletken-project.com/documents/mahout_review_by_iletken.pdf .
> I think we should give Andy good advice and simultaneously give the
> community an update on the points criticized in that review that no
> longer hold.
>
> I'll make a first attempt at addressing the points in that review:
>
> - Mahout currently offers parallel algorithms for collaborative
> filtering (see
> https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering
> ), which can also be used to precompute a model that can then be used
> for online recommendations.
>
> - Mahout has some support for matrix-factorization-based recommenders (
> https://hudson.apache.org/hudson/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/recommender/svd/SVDRecommender.html
> ). A superior algorithm to this one (
> https://issues.apache.org/jira/browse/MAHOUT-525 ) and a parallel
> implementation ( https://issues.apache.org/jira/browse/MAHOUT-542 ) are
> currently in the making.
>
> - The memory consumption of Taste has improved significantly. I have
> never tried to load the Netflix dataset, but I'm pretty sure it fits
> into a few hundred megabytes of memory.
>
> Furthermore, I think we need to know more details about Andy's use
> case to give him proper answers about whether Mahout fits his project:
>
> - Do you have explicit ratings from the users or are you working with
> implicit data?
>
> - What exactly do you mean by hybrid recommendations? Do you mean a
> combination of content-based and collaborative filtering techniques?
>
> - How fast do you need the recommendations? Would it be OK to have them
> precomputed, e.g. on a daily basis, or do you need them in real time?
>
> - How often do new users and new items enter your dataset? How sparse is
> your rating data?
>
> --sebastian
>
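To illustrate what a matrix-factorization-based recommender such as the SVDRecommender linked above computes, here is a minimal sketch that learns a short latent vector per user and per item so that their dot product approximates the known ratings. It uses plain stochastic gradient descent on squared error with L2 regularization; this is NOT Mahout's SVDRecommender or factorizer API and not necessarily the algorithm it uses internally. The class name, factor count, learning rate, regularization, epoch count, seed and toy ratings are all hypothetical.

```java
import java.util.Random;

public class MatrixFactorizationSketch {
    // Hypothetical known ratings, each as {userIndex, itemIndex, rating}.
    static final double[][] RATINGS = {
        {0, 0, 5}, {0, 1, 4}, {1, 0, 5}, {1, 1, 4},
        {1, 2, 5}, {2, 0, 1}, {2, 3, 5}, {3, 0, 5}
    };

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int f = 0; f < a.length; f++) s += a[f] * b[f];
        return s;
    }

    // Learns user factors u[numUsers][k] and item factors v[numItems][k] by SGD,
    // minimizing (rating - u.v)^2 + reg * (|u|^2 + |v|^2) one rating at a time.
    static double[][][] factorize(double[][] ratings, int numUsers, int numItems,
                                  int k, int epochs, double lr, double reg, long seed) {
        Random rnd = new Random(seed);
        double[][] u = new double[numUsers][k], v = new double[numItems][k];
        for (double[] row : u) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextDouble();
        for (double[] row : v) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextDouble();
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (double[] r : ratings) {
                int user = (int) r[0], item = (int) r[1];
                double err = r[2] - dot(u[user], v[item]);
                for (int f = 0; f < k; f++) {
                    // Use the old values of both factors so the update is simultaneous.
                    double uf = u[user][f], vf = v[item][f];
                    u[user][f] += lr * (err * vf - reg * uf);
                    v[item][f] += lr * (err * uf - reg * vf);
                }
            }
        }
        return new double[][][] {u, v};
    }

    // Root-mean-square error of the model on the given ratings.
    static double rmse(double[][] ratings, double[][] u, double[][] v) {
        double sq = 0;
        for (double[] r : ratings) {
            double e = r[2] - dot(u[(int) r[0]], v[(int) r[1]]);
            sq += e * e;
        }
        return Math.sqrt(sq / ratings.length);
    }

    public static void main(String[] args) {
        double[][][] model = factorize(RATINGS, 4, 4, 2, 5000, 0.01, 0.02, 42L);
        double[][] u = model[0], v = model[1];
        System.out.printf("training RMSE: %.3f%n", rmse(RATINGS, u, v));
        // Online prediction is just a dot product of precomputed factors.
        System.out.printf("predicted rating of user 3 for item 1: %.2f%n", dot(u[3], v[1]));
    }
}
```

Once the factors are learned offline, serving a prediction costs only a k-dimensional dot product, which is why this family of models pairs well with the "precompute, then recommend online" setup discussed in the thread.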
