Yes.

I would start with the SGD classifiers and possibly use the Naive Bayes
models if you have massive amounts of data.

In fact, if you have fewer than 100,000 observations, I would strongly
recommend using a more user-friendly system such as R.

Regardless of which system, you need to decide what kind of model you need
to build.  There are several natural alternatives:

a) only one of the possible actions matters (or only one can be done) and
the actions are not ordered.  Use multinomial logistic regression (SGD
implements this very nicely).

b) the actions nest in some way.  An example might be progression by a web
visitor toward economic conversion.  Action 1 might be any visitor, action 2
is clicking on product information, action 3 might be putting an item in a
shopping cart and action 4 might be buying an item.  These actions have a
clear and important ordering and all users who complete action n have
completed all lower actions.  Ordinal logistic regression is a natural
choice here.  Mahout does not really support this.  You can do the poor
man's version by just using the largest action completed as the target
and using multinomial logistic regression.

c) the actions are relatively independent.  Here you can start with n binary
logistic regression models, one per action.  This will ignore any nesting
or implication structure among actions.  Mahout can help here with the
binary logistic regression.
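
To make the difference between (b) and (c) concrete, here is a rough sketch
of how the targets would be built in each case, followed by the n binary
models from (c).  It is plain Python with nothing but the standard library,
not Mahout's API, and the toy rows, the feature scaling (age / 100, sex as
0/1) and all function names are made up for illustration.

```python
import math

N_ACTIONS = 4

# Hypothetical toy rows: ((age / 100, sex as 0/1), set of actions performed).
ROWS = [
    ((0.25, 1.0), {1, 2, 3}),
    ((0.41, 0.0), {1}),
    ((0.33, 1.0), {1, 2}),
    ((0.52, 0.0), {1}),
    ((0.29, 1.0), {1, 2, 3, 4}),
    ((0.47, 0.0), {1, 2}),
]


def largest_action(actions):
    """Case (b), poor man's version: one class label per user."""
    return max(actions)


def binary_targets(actions, n_actions=N_ACTIONS):
    """Case (c): one 0/1 target per action, each for its own model."""
    return [1 if a in actions else 0 for a in range(1, n_actions + 1)]


def sigmoid(z):
    # Branch on the sign so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_lr(xs, ys, epochs=200, rate=0.1):
    """Plain SGD on log loss; the last weight is the bias term."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            xb = list(x) + [1.0]
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            for i, xi in enumerate(xb):
                w[i] += rate * (y - p) * xi
    return w


def predict(w, x):
    """Estimated probability that a user with features x performs the action."""
    xb = list(x) + [1.0]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))


xs = [x for x, _ in ROWS]

# (b): a single multinomial target, the largest action each user completed.
labels_b = [largest_action(acts) for _, acts in ROWS]

# (c): n independent binary models, one per action.
targets_c = [binary_targets(acts) for _, acts in ROWS]
models = [
    train_binary_lr(xs, [t[k] for t in targets_c]) for k in range(N_ACTIONS)
]

new_user = (0.30, 1.0)  # hypothetical 30-year-old, sex coded as 1
scores = [predict(w, new_user) for w in models]
```

Here scores[k] is the estimated probability that the new user performs
action k + 1.  With (b) you would instead train one multinomial model on
labels_b and read off the probability of each "largest action" class.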

On Wed, Nov 17, 2010 at 1:12 PM, Radu Spineanu <[email protected]> wrote:

> Hi.
>
>
> We have data about users that perform certain actions:
> user, age, sex, interests has performed actions 1,2,3
> (training data)
>
> Our goal is to ask in real time how likely is it that another user having
> age, sex, interests would perform the same actions.
>
>
> Can we use mahout for this? If yes, which algorithm do you think would be
> best? Would it work if we had partial data, like only age?
>
>
> Thank you.
> -r.
>
