[ https://issues.apache.org/jira/browse/PIG-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721012#action_12721012 ]
Dmitriy V. Ryaboy commented on PIG-855:
---------------------------------------

Jeff, the approach depends on whether you care more about false positives or false negatives.

The right way to do this is probably not to write a boolean function, but one that returns one of several codes: known browser, known crawler, monitor, tools like wget and curl, and "unknown".

IAB has a standard list of bots and spiders (http://www.iab.net/sites/login.php), and maintains an industry standard for the filters that should be applied before numbers are reported.

> Filter to determine if a UserAgent string is a bot
> --------------------------------------------------
>
>                 Key: PIG-855
>                 URL: https://issues.apache.org/jira/browse/PIG-855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Priority: Minor
>
> A PiggyBank contrib that would allow one to filter records by whether a
> UserAgent string represents a bot.
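
To make the multi-code idea from the comment concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: the class name `UserAgentClassifier`, the `AgentType` enum, and the short substring lists are hypothetical and are not part of Pig, PiggyBank, or the IAB list. A real filter would load a maintained list instead of hard-coding tokens.

```java
import java.util.Locale;

// Hypothetical sketch: classify a User-Agent string into one of several
// codes instead of returning a boolean "is bot" answer.
final class UserAgentClassifier {

    // Category names are assumptions mirroring the codes listed in the comment.
    enum AgentType { BROWSER, CRAWLER, MONITOR, TOOL, UNKNOWN }

    // Illustrative tokens only; a maintained list (such as the IAB's) would replace these.
    private static final String[] CRAWLERS = { "googlebot", "msnbot", "slurp" };
    private static final String[] MONITORS = { "pingdom", "nagios" };
    private static final String[] TOOLS    = { "wget", "curl" };
    private static final String[] BROWSERS = { "firefox", "chrome", "safari", "msie", "opera" };

    static AgentType classify(String userAgent) {
        if (userAgent == null || userAgent.trim().isEmpty()) {
            return AgentType.UNKNOWN;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        // Non-browser categories are checked first, because many crawlers
        // also send browser-like tokens such as "Mozilla".
        if (containsAny(ua, CRAWLERS)) return AgentType.CRAWLER;
        if (containsAny(ua, MONITORS)) return AgentType.MONITOR;
        if (containsAny(ua, TOOLS))    return AgentType.TOOL;
        if (containsAny(ua, BROWSERS)) return AgentType.BROWSER;
        // Anything unrecognized stays "unknown" rather than being guessed.
        return AgentType.UNKNOWN;
    }

    private static boolean containsAny(String haystack, String[] needles) {
        for (String needle : needles) {
            if (haystack.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Wget/1.11.4",
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
            "SomethingNew/1.0"
        };
        for (String sample : samples) {
            System.out.println(classify(sample) + "\t" + sample);
        }
    }
}
```

With a classifier of this shape, the false-positive versus false-negative trade-off becomes a decision for the caller: a strict report could keep only `BROWSER` records, while a lenient one could drop only `CRAWLER`, `MONITOR`, and `TOOL` and keep `UNKNOWN`. A boolean filter for Pig could then be a thin wrapper around this result.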