Ernie :
Sounds good in principle. What I wonder is how -- since there are humans
in the system -- it is possible to control costs.
 
Billy
 
 
-------------------------------------------------------------------------------------
 
In a message dated 9/15/2011 2:17:49 P.M. Pacific Daylight Time,
[email protected] writes:

 
For Billy... 

Geeking with Greg: Blending machines and humans to get very high accuracy
http://glinden.blogspot.com/2011/09/blending-machines-and-humans-to-get.html
 
____________________________________
  
A paper by six Googlers from the recent KDD 2011 conference, “Detecting
Adversarial Advertisements in the Wild” (PDF:
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/37195.pdf),
is a broadly useful example of how to succeed at tasks requiring very high
accuracy using a combination of many different machine learning algorithms,
high quality human experts, and lower quality human judges.
Let’s start with an excerpt from the paper: 
A small number of adversarial advertisers may seek to profit by attempting
to promote low quality or untrustworthy content via online advertising
systems …. [For example, some] attempt to sell counterfeit or otherwise
fraudulent goods … [or] direct users to landing pages where they might
unwittingly download malware.

Unlike many data-mining tasks in which the cost of false positives (FP’s)
and false negatives (FN’s) may be traded off, in this setting both false
positives and false negatives carry extremely high misclassification cost …
[and] must be driven to zero, even for difficult edge cases.
[We present a] system currently deployed at Google for detecting and
blocking adversarial advertisements …. At a high level, our system may be
viewed as an ensemble composed of many large-scale component models …. Our
automated … methods include a variety of … classifiers … [including] a single,
coarse model … [to] filter out … the vast majority of easy, good ads … [and]
a set of finely-grained models [trained] to detect each of [the] more
difficult classes.
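The coarse-then-fine cascade the excerpt describes can be sketched roughly as follows. Everything here is my own illustration, not code from the paper: the field names, thresholds, and the toy rule-based "models" are assumptions standing in for Google's large-scale learned classifiers.

```python
# Illustrative sketch of a coarse filter followed by per-class fine models.
# All names, thresholds, and rules are hypothetical, not from the paper.

def coarse_filter(ad):
    """Cheap first-pass model: pass the vast majority of easy, good ads."""
    return ad["spam_score"] < 0.2  # clearly good -> skip the fine models

# One finely-grained model per difficult adversarial class.
FINE_MODELS = {
    "counterfeit": lambda ad: ad["spam_score"] > 0.8 and "replica" in ad["text"],
    "malware":     lambda ad: ad["spam_score"] > 0.8 and ad["landing_page_flagged"],
}

def classify(ad):
    """Return the adversarial classes an ad is flagged for ([] = allowed)."""
    if coarse_filter(ad):
        return []  # easy, good ad: no expensive per-class scoring
    return [name for name, model in FINE_MODELS.items() if model(ad)]

ads = [
    {"text": "fresh flowers", "spam_score": 0.05, "landing_page_flagged": False},
    {"text": "cheap replica watches", "spam_score": 0.95, "landing_page_flagged": False},
]
print([classify(ad) for ad in ads])  # [[], ['counterfeit']]
```

The design point is cost: the coarse model is run on everything, while the per-class models only see the small residue it cannot clear.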
Human experts … help detect evolving adversarial advertisements …
[through] margin-based uncertainty sampling … [often] requiring only a few dozen
hand-labeled examples … for rapid development of new models …. Expert users
[also] search for positive examples guided by their intuition … [using a
custom] tool … [and they have] surprised us … [by] developing hand-crafted,
rule-based models with extremely high precision.
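Margin-based uncertainty sampling, as mentioned above, means routing to the human experts the examples on which the model is least decisive. A minimal sketch, with made-up scores and example IDs:

```python
# Margin-based uncertainty sampling sketch: pick the examples where the gap
# between the top two class scores is smallest. Scores here are invented.

def margin(scores):
    """Difference between the top two class scores; small margin = uncertain."""
    top_two = sorted(scores.values(), reverse=True)[:2]
    return top_two[0] - top_two[1]

def pick_for_labeling(pool, budget):
    """Select the `budget` most uncertain examples for expert review."""
    return sorted(pool, key=lambda ex: margin(ex["scores"]))[:budget]

pool = [
    {"id": "ad1", "scores": {"good": 0.95, "counterfeit": 0.03, "malware": 0.02}},
    {"id": "ad2", "scores": {"good": 0.40, "counterfeit": 0.38, "malware": 0.22}},
    {"id": "ad3", "scores": {"good": 0.55, "counterfeit": 0.10, "malware": 0.35}},
]
print([ex["id"] for ex in pick_for_labeling(pool, 2)])  # ['ad2', 'ad3']
```

This is why a few dozen hand labels can go a long way: each label is spent on a case the current model genuinely cannot resolve on its own.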
Because [many] models do not adapt over time, we have developed automated
monitoring of the effectiveness of each … model; models that cease to be
effective are removed …. We regularly evaluate the [quality] of our [human
experts] … both to assess the performance of … raters and measure our
confidence in these assessments … [We also use] an approach similar to
crowd-sourcing … [to] calibrate our understanding of real user perception and
ensure that our system continues to protect the interest of actual users.
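The automated monitoring the excerpt mentions can be imagined as a periodic check on each component model's recent effectiveness, retiring models that decay. The metric (reviewer-confirmed precision) and the threshold below are my assumptions; the paper does not specify them here.

```python
# Hypothetical sketch of automated model monitoring: track how often each
# model's recent positive calls were confirmed bad by human reviewers, and
# drop models whose precision has decayed. Metric and threshold are assumed.

def precision(recent_calls):
    """Fraction of a model's recent positive calls confirmed by reviewers."""
    if not recent_calls:
        return 0.0
    return sum(1 for confirmed in recent_calls if confirmed) / len(recent_calls)

def prune_models(models, threshold=0.5):
    """Keep only models whose recent precision stays above the threshold."""
    return {name: calls for name, calls in models.items()
            if precision(calls) >= threshold}

models = {
    "replica_rule":  [True, True, True, False],   # still effective (0.75)
    "old_heuristic": [False, False, True, False], # has decayed (0.25)
}
print(sorted(prune_models(models)))  # ['replica_rule']
```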

I love this approach, blending machines and the human intuition of experts
to help guide, assist, and correct algorithms running over big data. These
Googlers used an ensemble of classifiers, trained by experts who focused
on labeling the edge cases, and ran them over features extracted from a
massive data set of advertisements. They then built custom tools to make it
easy for experts to search over the ads, follow their intuition, dig in deep,
and fix the hardest cases the classifiers missed. Because the bad guys
never quit, the Googlers not only constantly add new models and rules, but
also constantly evaluate existing rules, models, and the human experts to make
sure they are still useful. Excellent.
I think the techniques described here are applicable well beyond detecting
naughty advertisers. For example, I suspect a similar technique could be
applied to mobile advertising, a hard problem where limited screen space and
attention make relevance critical, but we usually have very little data on
each user’s interests, each user’s intent, and each advertiser. Combining
human experts with machines, as these Googlers have done, could be
particularly useful in bootstrapping and overcoming sparse and noisy data, two
problems that make it so difficult for startups to succeed on problems like
mobile advertising.
 
____________________________________
(via Instapaper: http://www.instapaper.com/)



Sent from my iPhone
-- 
Centroids: The Center of the Radical Centrist Community
<[email protected]>
Google Group: http://groups.google.com/group/RadicalCentrism
Radical Centrism website and blog: http://RadicalCentrism.org



