Hi all, Noticed something extremely strange and concerning regarding subquery column selection. Here's the query, and the sample results in presto vs hive SELECT ipm.ip, ipm.master_id AS mid, COALESCE(ad.cookie, '') AS cid, COALESCE(ad.url_cid, '') AS campaign_id, COALESCE(ad.url_sid, '') AS segment_id, CASE WHEN ad.url_event='clk' THEN 'media_click' ELSE 'media_impression' END AS event, count(*) AS count, CASE WHEN COALESCE(o.name, '') != '' THEN 'first_party_media' ELSE ot.token_type END AS campaign_type, ad.dt, lower(COALESCE(ot.organization_name, o.name)) AS orgFROM ( SELECT base_ip, url_event, url_orgid, url_token, url_cid, url_sid, cookie, url, date_time, dt FROM tag.pixel WHERE dt BETWEEN '2017-08-23' AND '2017-08-23' AND user_agent NOT LIKE 'ELB-HealthChecker%' GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ) ad JOIN default.ip_to_master_v2 ipm ON ipm.ip = ad.base_ip lEFT JOIN default.organization_token ot ON ot.token = ad.url_token LEFT JOIN aa.organization o ON ad.url_orgid = o.idWHERE COALESCE(ot.organization_name, o.name, '') != '' GROUP BY 1, 2, 3, 4, 5, 6, 8, 9, 10LIMIT 10 Presto ip | mid | cid | campaign_id | segment_id | event | count | campaign_type | dt | org (ommitted) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 216.110.25.2 | 93c8a694f6c0d39- | 420ad817041900006549e258d9030000aba10300 | 50 | 2 | media_impression | 1 | first_party_media | 2017-08-23 | ****** 14.140.116.135 | 6dfbca80b99c365-60934b2c332bcaa | 96453917a70d0000cbeb59598e010000fe970400 | 43 | 7 | media_impression | 2 | first_party_media | 2017-08-23 | ****** Hive103.4.16.84 51 | ba82f9c1e7ef7-51ba82f9c1e7ef7 | 50 | 50 | 2 | media_impression | 2 | first_party_media | 2017-08-23 | 5012.175.247.66 | c4028d27f0ebbb2-ef86eff7815a004 | 109 | 109 | 21 | media_impression | 23 | first_party_media | 2017-08-23 | 109116.197.188.12 | e78268e3ce3e13c-19ffd91bd5236a8 | 43 | 43 | 7 | media_impression | 4 | first_party_media | 2017-08-23 | 43 Alright, so clearly the results are very different in presto compared to hive. What's interesting is that columns 3, 4, and 10 show the same stat: campaign_id. I tried a bunch of different things (different aliasing, grouping, etc.), but nothing worked. The thing that did the trick was pulling out the subquery into its own table DROP TABLE IF EXISTS aa.tmp_subquery;CREATE TABLE IF NOT EXISTS aa.tmp_subquery ASSELECT base_ip, url_event, url_orgid, url_token, url_cid, url_sid, cookie, url, date_time, dtFROM tag.pixelWHERE dt BETWEEN '{start-yyyy-mm-dd}' AND '{end-yyyy-mm-dd}' AND user_agent NOT LIKE 'ELB-HealthChecker%'GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10; INSERT OVERWRITE TABLE aa.campaign_accounts PARTITION (campaign_type, dt, org) SELECT ipm.ip, ipm.master_id AS mid, COALESCE(ad.cookie, '') AS cid, COALESCE(ad.url_cid, '') AS campaign_id, COALESCE(ad.url_sid, '') AS segment_id, CASE WHEN ad.url_event='clk' THEN 'media_click' ELSE 'media_impression' END AS event, count(*) AS count, CASE WHEN COALESCE(o.name, '') != '' THEN 'first_party_media' ELSE ot.token_type END AS campaign_type, ad.dt, lower(COALESCE(ot.organization_name, o.name)) AS orgFROM aa.tmp_subquery ad JOIN default.ip_to_master_v2 ipm ON ipm.ip = ad.base_ip lEFT JOIN default.organization_token ot ON ot.token = ad.url_token LEFT JOIN aa.organization o ON ad.url_orgid = o.idWHERE COALESCE(ot.organization_name, o.name, '') != '' GROUP BY 1, 2, 3, 4, 5, 6, 8, 9, 10DISTRIBUTE BY campaign_type, dt, orgSORT BY mid; The hive output matches presto's.Seems like a bug that's easy to miss and can cause some major damage. Anyone encountered this before? Thanks, Nitin