[RC] Blending machines and humans to get very high accuracy

Dr. Ernie Prabhakar Thu, 15 Sep 2011 14:17:49 -0700

For Billy...



Geeking with Greg: Blending machines and humans to get very high accuracy
http://glinden.blogspot.com/2011/09/blending-machines-and-humans-to-get.html

A paper by six Googlers from the recent KDD 2011 conference, “Detecting 
Adversarial Advertisements in the Wild” (PDF) is a broadly useful example of 
how to succeed at tasks requiring very high accuracy using a combination of 
many different machine learning algorithms, high quality human experts, and 
lower quality human judges.

Let’s start with an excerpt from the paper:

A small number of adversarial advertisers may seek to profit by attempting to 
promote low quality or untrustworthy content via online advertising systems …. 
[For example, some] attempt to sell counterfeit or otherwise fraudulent goods … 
[or] direct users to landing pages where they might unwittingly download 
malware.

Unlike many data-mining tasks in which the cost of false positives (FP’s) and 
false negatives (FN’s) may be traded off, in this setting both false positives 
and false negatives carry extremely high misclassification cost … [and] must be 
driven to zero, even for difficult edge cases.

[We present a] system currently deployed at Google for detecting and blocking 
adversarial advertisements …. At a high level, our system may be viewed as an 
ensemble composed of many large-scale component models …. Our automated … 
methods include a variety of … classifiers … [including] a single, coarse model 
… [to] filter out … the vast majority of easy, good ads … [and] a set of 
finely-grained models [trained] to detect each of [the] more difficult classes.
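
As an aside (not from the paper): the two-stage arrangement described in that paragraph, a coarse filter followed by per-class models, might be sketched roughly as below. Every name, feature, and cutoff here is hypothetical, and the toy scoring functions merely stand in for trained classifiers.

```python
from typing import Callable, Dict, List

# Hypothetical types; nothing here is taken from the paper itself.
Ad = Dict[str, float]            # an ad represented as a feature dict
Scorer = Callable[[Ad], float]   # score in [0, 1]; higher means more suspicious


def classify(ad: Ad, coarse: Scorer, fine: Dict[str, Scorer],
             coarse_cutoff: float = 0.1, fine_cutoff: float = 0.5) -> List[str]:
    """Return the bad-ad classes flagged for this ad (empty list = allow)."""
    # Stage 1: the cheap coarse model clears the obviously good ads.
    if coarse(ad) < coarse_cutoff:
        return []
    # Stage 2: only the remaining ads reach the finer per-class models.
    return [name for name, scorer in fine.items() if scorer(ad) >= fine_cutoff]


# Toy scorers standing in for trained classifiers (assumed features).
coarse = lambda ad: ad.get("risk", 0.0)
fine = {
    "counterfeit": lambda ad: ad.get("brand_mismatch", 0.0),
    "malware": lambda ad: ad.get("bad_download", 0.0),
}

easy_good = {"risk": 0.02, "brand_mismatch": 0.9}
suspicious = {"risk": 0.7, "brand_mismatch": 0.8, "bad_download": 0.1}

print(classify(easy_good, coarse, fine))   # cleared by the coarse stage
print(classify(suspicious, coarse, fine))  # passed on to the per-class models
```

The point of the split is cost: the fine-grained models only run on the small fraction of ads the coarse model cannot clear.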

Human experts … help detect evolving adversarial advertisements … [through] 
margin-based uncertainty sampling … [often] requiring only a few dozen 
hand-labeled examples … for rapid development of new models …. Expert users 
[also] search for positive examples guided by their intuition … [using a 
custom] tool … [and they have] surprised us … [by] developing hand-crafted, 
rule-based models with extremely high precision.
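
As an aside (not from the paper): margin-based uncertainty sampling, in its common textbook form, asks experts to label the examples that lie closest to the current model's decision boundary. A minimal sketch, assuming a linear model with made-up weights and a tiny made-up pool of unlabeled examples:

```python
from typing import List, Sequence


def margin(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Absolute raw score of a linear model; small values mean 'near the boundary'."""
    return abs(sum(w * xi for w, xi in zip(weights, x)) + bias)


def pick_for_labeling(weights: Sequence[float], bias: float,
                      unlabeled: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k unlabeled examples the model is least sure about."""
    order = sorted(range(len(unlabeled)),
                   key=lambda i: margin(weights, bias, unlabeled[i]))
    return order[:k]


# Hypothetical model and pool, for illustration only.
weights, bias = [1.0, -1.0], 0.0
pool = [[5.0, 0.0], [1.0, 0.9], [0.0, 4.0], [2.0, 2.2]]

# Examples 1 and 3 have the smallest margins, so they go to the experts first.
print(pick_for_labeling(weights, bias, pool, k=2))
```

Labeling near-boundary examples first is what would let a new model become useful after only a few dozen hand labels, as the excerpt reports.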

Because [many] models do not adapt over time, we have developed automated 
monitoring of the effectiveness of each … model; models that cease to be 
effective are removed …. We regularly evaluate the [quality] of our [human 
experts] … both to assess the performance of … raters and measure our 
confidence in these assessments … [We also use] an approach similar to 
crowd-sourcing … [to] calibrate our understanding of real user perception and 
ensure that our system continues to protect the interest of actual users.

I love this approach, which uses human experts and their intuition to 
help guide, assist, and correct algorithms running over big data. These 
Googlers used an ensemble of classifiers, trained by experts who focused on 
labeling the edge cases, and ran them over features extracted from a massive 
data set of advertisements. They then built custom tools to make it easy for 
experts to search over the ads, follow their intuition, dig in deep, and fix 
the hardest cases the classifiers missed. Because the bad guys never quit, the 
Googlers not only constantly add new models and rules, but also constantly 
evaluate existing rules, models, and the human experts to make sure they are 
still useful. Excellent.

I think the techniques described here are applicable well beyond detecting 
naughty advertisers. For example, I suspect a similar technique could be 
applied to mobile advertising, a hard problem where limited screen space and 
attention make relevance critical, but we usually have very little data on 
each user’s interests, each user’s intent, and each advertiser. Combining human 
experts with machines, as these Googlers have done, could be particularly 
useful in bootstrapping and overcoming sparse and noisy data, two problems that 
make it so difficult for startups to succeed on problems like mobile 
advertising.

(via Instapaper)




-- 
Centroids: The Center of the Radical Centrist Community 
<[email protected]>
Google Group: http://groups.google.com/group/RadicalCentrism
Radical Centrism website and blog: http://RadicalCentrism.org
