On Wed, Nov 17, 2010 at 2:50 PM, Radu Spineanu <[email protected]> wrote:

> We're going to start with < 1,000 observations but we have to be able to
> scale out very quickly if it works. It could get to 100,000 observations in
> 6-8 months.
>

Use R.  Mahout is major overkill for this problem.  You can always
transition to Mahout later if you outgrow R.
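To put rough numbers on why this isn't a Mahout-sized problem, here is a back-of-envelope sketch in Python. It assumes (hypothetically, since the thread doesn't say) 100 numeric features per observation, each stored as an 8-byte double; `dense_matrix_bytes` is an illustrative helper, not part of R or Mahout.

```python
# Back-of-envelope memory estimate for a dense numeric data set.
# Assumption (hypothetical): 100 numeric features per observation,
# each stored as an 8-byte double-precision float.

def dense_matrix_bytes(n_observations: int, n_features: int,
                       bytes_per_value: int = 8) -> int:
    """Return the raw size in bytes of an n_observations x n_features dense matrix."""
    return n_observations * n_features * bytes_per_value


# Projected size after 6-8 months, per the thread: 100,000 observations.
size = dense_matrix_bytes(100_000, 100)
print(f"{size / 1e6:.0f} MB")  # prints "80 MB"
```

Even at the projected 100,000 observations that is on the order of 80 MB, which a single machine running R holds in memory without trouble.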

I am not saying your problem isn't difficult or that it isn't valuable to
solve.  Just that the virtue Mahout brings (scale) isn't the virtue you
need (getting models sooner with the least effort).


>
> The model is a combination of c) and b). All actions except the first
> one are independent. If we build the model around c) would it be hard to
> move to b) later on if that's the case? I want to go the easier route for
> now.
>

(c) is the easiest (independent models for each outcome).
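A minimal sketch of what (c) means in practice, in Python to keep it self-contained: the same feature vectors are reused, and one model is fit per outcome with nothing shared between them. Plain batch-gradient-descent logistic regression stands in here for whatever model you actually choose (in R that might be `glm(..., family = binomial)`); the helper names and the example action names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    # w has one more entry than x; zip stops at len(x), and w[-1] is the intercept.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Fit one logistic-regression model by batch gradient descent.
    Returns the weights; the last entry is the intercept."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # gradient of log-loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def train_independent_models(X: Sequence[Sequence[float]],
                             labels: Dict[str, Sequence[int]]) -> Dict[str, List[float]]:
    """Option (c): one separate model per outcome. All models see the same
    features, but no model knows anything about the other outcomes."""
    return {outcome: train_logistic(X, y) for outcome, y in labels.items()}


# Usage with toy data: one feature, two hypothetical outcomes.
X = [[0.0], [1.0], [2.0], [3.0]]
labels = {"action_a": [0, 0, 1, 1], "action_b": [1, 1, 0, 0]}
models = train_independent_models(X, labels)
for name, w in models.items():
    print(name, [round(predict(w, x), 2) for x in X])
```

Because each outcome gets its own model, adding or dropping an outcome later doesn't touch the others, which is a large part of why (c) is the easy starting point.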


>
> Could you point me to books, docs, howtos, articles about getting up and
> running with c)?
>

General data-mining books should be good.

Chris Bishop's book, Pattern Recognition and Machine Learning, is excellent but a bit advanced.
http://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738/ref=ntt_at_ep_dpi_1/192-5152996-7376364



> I'm a Debian Developer and I noticed Mahout is not in Debian. If I'm able
> to wrap my head around everything and get it working I would love to
> contribute back and package it.
>

We would love it if you did.  Mahout is fast-moving and trunk will be
significantly more useful for most people for a while yet.  How does that
affect packaging for Debian?


>
>
> > But your specific case will tell.  Your most important priority will be to
> > figure out how to test models realistically off-line.
>
> What do you mean by this?
>

I mean that you need to be able to tell whether your models are doing any good
without going back to your live audience for more data.
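One way to do this, sketched in Python under the assumption that you already have historical observations with known outcomes: hold out a random slice, train only on the rest, and score the held-out slice. AUC is used here as one common example metric; the thread doesn't prescribe one, and `holdout_split` and `auc` are hypothetical helpers.

```python
import random
from typing import Sequence, Tuple


def holdout_split(rows: Sequence, test_fraction: float = 0.2,
                  seed: int = 42) -> Tuple[list, list]:
    """Shuffle historical rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    positive example is scored above a randomly chosen negative one,
    with ties counted as half. O(P*N), fine for a few thousand held-out rows."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # prints 1.0
print(auc([0.5, 0.5], [1, 0]))                  # prints 0.5
```

An AUC near 0.5 on the held-out rows means the model ranks no better than chance. If the data has a time order, splitting by time (train on older rows, test on newer ones) is usually more realistic than a random shuffle.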
