Comment #3 from Oliver Keyes
This bug (or class of bug) has continued to make itself known. It's
particularly concerning and frequent when running queries that contain
subqueries, since such a query is run as multiple jobs, which increases the
probability that one will fail - and if any ONE job fails, the whole query
fails. As an example, I've been running variants of:

INSERT OVERWRITE TABLE ironholds.distinct_ip
SELECT distip
FROM (
  SELECT ip AS distip, COUNT(*) AS count
  FROM wmf.webrequest_mobile
  WHERE year = 2014 AND month = 1 AND day = 20
    AND content_type IN ('text/html\; charset=utf-8', 'text/html\; charset=iso-8859-1',
                         'text/html\; charset=UTF-8', 'text/html')
  GROUP BY ip HAVING COUNT(*) >= 2
) sub1
LIMIT 10000;

and I've had three failures out of the previous four queries (which, counting
the subquery in each as a separate job, works out as 3/8 jobs). Syntactically
valid queries failing seemingly at random, with no explanation, is a pretty
substantial blocker to being able to rely on Hive for production tasks.
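
To put a rough number on why multi-job queries get hit harder, here is a minimal Python sketch. It assumes every job fails independently with the same probability, and it borrows the 3/8 figure above as a per-job failure rate. Both are assumptions for illustration, not measurements, and the function name is made up for this example.

```python
def query_failure_probability(per_job_failure: float, jobs: int) -> float:
    """Chance that a query fails when it is split into `jobs` jobs.

    Assumption: jobs fail independently, each with probability
    `per_job_failure`, and a single failed job fails the whole query.
    """
    if not 0.0 <= per_job_failure <= 1.0:
        raise ValueError("per_job_failure must be between 0 and 1")
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    # The query succeeds only if every job succeeds.
    return 1.0 - (1.0 - per_job_failure) ** jobs


if __name__ == "__main__":
    per_job = 3 / 8  # illustrative per-job failure rate, taken from the tally above
    for jobs in (1, 2, 3):
        print(jobs, round(query_failure_probability(per_job, jobs), 3))
```

Under those assumptions, each extra job a query is split into pushes the chance of the whole query failing up further, which fits with queries containing subqueries being the ones that fail most often.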

