On Thu, Nov 18, 2010 at 8:54 AM, Radu Spineanu <[email protected]> wrote:

> Offtopic: I can't find examples about how to implement my setup with
> partial queries. In either mahout or R.
>

In R, you build a data frame with all of your columns.  Then, when training,
you specify your model using formula notation:

    m.all = glm(result ~ age + interest1 + interest2 + gender, yourDataHere,
                family=binomial())

or

    m.small = glm(result ~ age + gender, yourDataHere, family=binomial())

This gives you two models, m.all and m.small.  You can select which one you
want to use based on what data you have.
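A minimal sketch of that selection step, in case it helps.  The data here is
synthetic, and predictWithAvailable is a hypothetical helper name, not
something from R or any package.  It assumes every query has at least age and
gender.

```r
# Synthetic stand-in for yourDataHere; every value below is made up.
set.seed(1)
n <- 200
yourDataHere <- data.frame(
  age       = sample(18:70, n, replace = TRUE),
  gender    = factor(sample(c("f", "m"), n, replace = TRUE)),
  interest1 = factor(sample(c("books", "math", "music"), n, replace = TRUE)),
  interest2 = factor(sample(c("books", "math", "music"), n, replace = TRUE))
)
yourDataHere$result <- rbinom(n, 1, plogis(-2 + 0.04 * yourDataHere$age))

m.all   <- glm(result ~ age + interest1 + interest2 + gender,
               data = yourDataHere, family = binomial())
m.small <- glm(result ~ age + gender,
               data = yourDataHere, family = binomial())

# Hypothetical helper: use m.all only when the query carries every
# predictor it needs, otherwise fall back to m.small.
predictWithAvailable <- function(query) {
  needed <- all.vars(formula(m.all))[-1]   # drop the response, "result"
  model  <- if (all(needed %in% names(query))) m.all else m.small
  predict(model, newdata = query, type = "response")
}

full    <- yourDataHere[1, c("age", "gender", "interest1", "interest2")]
partial <- yourDataHere[1, c("age", "gender")]

p.full    <- predictWithAvailable(full)      # scored by m.all
p.partial <- predictWithAvailable(partial)   # scored by m.small
```

With more than two subsets of columns you would keep a list of models and
pick the largest one whose predictors are all present.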

This is not quite the same as what you were asking for.  Using m.all when
you only have age and gender is tricky because it requires picking some
values for interest1 and interest2.  One thing you can do is sample from your
training data all examples that match the specified age and gender and score
each of them.  This gives you a cloud of results, but it may not work: what
if you haven't seen *exactly* that combination of age and gender often enough
to get a good sample?
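Roughly, the sampling idea could look like the sketch below.  The data is
synthetic, and predictCloud and the minRows cutoff are hypothetical choices
for illustration.  A real cutoff would depend on your data.

```r
# Synthetic training data; every value below is made up.
set.seed(2)
n <- 500
train <- data.frame(
  age       = sample(c(25, 35, 45), n, replace = TRUE),
  gender    = factor(sample(c("f", "m"), n, replace = TRUE)),
  interest1 = factor(sample(c("books", "math", "music"), n, replace = TRUE)),
  interest2 = factor(sample(c("books", "math", "music"), n, replace = TRUE))
)
train$result <- rbinom(n, 1, 0.3)

m.all <- glm(result ~ age + interest1 + interest2 + gender,
             data = train, family = binomial())

# Hypothetical helper: score every training row that matches the query's
# age and gender, so the interests come from real examples.
# Returns NULL when there are too few matches to trust the sample.
predictCloud <- function(age, gender, minRows = 20) {
  matches <- train[train$age == age & train$gender == gender, ]
  if (nrow(matches) < minRows) {
    return(NULL)
  }
  predict(m.all, newdata = matches, type = "response")
}

cloud <- predictCloud(35, "m")
summary(cloud)                    # spread of the predicted probabilities
missing <- predictCloud(99, "m")  # no such age in the training data: NULL
```

The NULL branch is exactly the failure case described above: a combination
you have not seen often enough gives you nothing to sample from.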

You can't just put in zeros for interest1 and interest2 because of the way
the models encode these variables internally.  Putting in zeros implicitly
chooses the default value (typically interest1 because it sorts first),
which is definitely wrong.
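You can see the encoding issue with model.matrix.  This is only an
illustration with a made-up factor called interest.  It assumes R's default
treatment contrasts, where the first level in sort order becomes the
baseline.

```r
# A made-up categorical predictor with three levels.
interest <- factor(c("books", "math", "music", "math"))

# Design matrix as glm would build it with default treatment contrasts.
X <- model.matrix(~ interest)

colnames(X)   # "(Intercept)" "interestmath" "interestmusic"

# The "books" row has zeros in both dummy columns, so an all-zero input
# does not mean "unknown"; it means the first level, "books".
booksRow <- X[1, ]
```

So zeroing out the interest columns quietly tells the model that the person
has the baseline interest, not that the interest is missing.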


> I can train with "age", "interest1" ... "interestN", "demographic1",
> ..."demographicX" and when querying I could ask with "age", "interest1" ..
> "interestM" where M could be bigger or smaller than N.
>

You really only have the choice of synthesizing data or having multiple
models.  I recommend multiple models in most cases.

> I could break them into multiple rows, but it would result in fake results.
> Someone interested in Books + Math could yield results, but just Math
> wouldn't.
>

I don't fully understand that idea, but I don't recommend it.  The multiple
model approach above is much simpler.

> Do you guys know anyone that offers consulting services at a reasonable
> price to help with modelling?
>

I don't know about reasonable, but there are several people who can offer
some help:

- most university stats departments have somebody interested in data mining.
  There is probably a grad student who could help.

- Chris Poulin at Patterns and Predictions might be able to help.

- Mike Driscoll at Dataspora offers such consulting.

- Joseph Turian might be able to help.

- many others that don't come to mind instantly.

Good luck!
