More contests at: 
http://challenge.gov/NIH/132-nlm-show-off-your-apps-innovative-uses-of-nlm-information


On May 15, 2011, at 10:25 PM, Alex Kozlov wrote:

> On Sat, May 14, 2011 at 9:11 PM, Jake Mannix <[email protected]> wrote:
> 
>> Due to the whole Netflix data lawsuit, the training data is synthetic,
>> which puts the contestants at a disadvantage. Another interesting fact:
>> runtime performance is at issue. Your code will be run *live*, with your
>> model producing recommendations under a hard 50ms timeout; if you miss
>> that deadline more than 20% of the time, you fail to progress to the end
>> of the semi-final round.
>> 
> 
> If the dataset is synthetic (and I assume not random), is the goal just to
> guess the model that generated the dataset?  Assuming it performs well, how
> far is the 'synthetic' model from actual customer behavior, so that there
> are no 'surprises' when it runs 'live'?
> 
> Potentially, there are more avenues for a lawsuit than in the Netflix case
> since money is involved (just a thought).
> 
> Alex K
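
For anyone curious what that 50ms-per-request rule might look like in
practice, below is a minimal, hypothetical sketch of enforcing a hard
per-call budget and tracking the miss rate against the 20% threshold.
The Recommender interface and the simulated latency are stand-ins I made
up for illustration; this is not the contest's actual evaluation harness.

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.*;

    // Sketch: wrap a placeholder recommender call in a hard 50 ms budget
    // and count how often the deadline is missed.
    public class TimeoutBudgetSketch {

        interface Recommender {
            List<Long> recommend(long userId, int howMany) throws Exception;
        }

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newSingleThreadExecutor();

            // Hypothetical recommender that sometimes exceeds the budget.
            Recommender slowRecommender = (userId, howMany) -> {
                Thread.sleep(userId % 3 == 0 ? 80 : 10); // simulate variable latency
                return Arrays.asList(1L, 2L, 3L);
            };

            int requests = 100;
            int misses = 0;

            for (long userId = 0; userId < requests; userId++) {
                final long uid = userId;
                Future<List<Long>> future =
                        pool.submit(() -> slowRecommender.recommend(uid, 10));
                try {
                    future.get(50, TimeUnit.MILLISECONDS); // hard 50 ms deadline
                } catch (TimeoutException e) {
                    future.cancel(true);                   // abandon the slow call
                    misses++;
                }
            }

            double missRate = (double) misses / requests;
            System.out.printf("missed deadline on %.0f%% of requests%n", missRate * 100);
            System.out.println(missRate > 0.20 ? "would be eliminated" : "within budget");

            pool.shutdownNow();
        }
    }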

--------------------------------------------
Grant Ingersoll
Join the LUCENE REVOLUTION
Lucene & Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org
