On Sat, May 14, 2011 at 9:11 PM, Jake Mannix <[email protected]> wrote:

> Due to the whole Netflix data lawsuit, the training data is synthetic,
> which puts the contestants at a disadvantage, and another interesting
> fact: runtime performance is at issue: your code will be run *live*,
> with your model being used to produce recommendations with a hard
> timeout of 50ms - if you miss this more than 20% of the time, you fail
> to progress to the end of the semi-final round.
>

If the dataset is synthetic (and I assume not random), is the goal simply
to guess the model that generated the dataset? Assuming a submission
performs well, how far is the 'synthetic' model from actual customer
behavior, so that there are no 'surprises' when it runs 'live'?

Potentially, there are more avenues for a lawsuit than in the Netflix case
since money is involved (just a thought).

Alex K
