Github user jiexiong commented on the issue:
https://github.com/apache/spark/pull/15722
Here is the query:
    INSERT OVERWRITE TABLE lookalike_trainer_campaign_conv_users_with_country_shadow
    PARTITION (ds = '2016-10-19')
    SELECT c.source_id,
           c.country,
           c.user_id,
           c.conversion_time
    FROM (
        SELECT b.source_id,
               b.country,
               b.user_id,
               b.conversion_time,
               FB_NUMBER_ROWS(b.country, b.source_id) AS rank
        FROM (
            SELECT source_id,
                   country,
                   user_id,
                   MAX(conversion_time) / 1000 AS conversion_time
            FROM (
                SELECT v.campaigngroup_id,
                       v.campaign_id,
                       v.adgroup_id,
                       v.user_id,
                       Y.country,
                       v.last_conversion_time AS conversion_time
                FROM dim_all_users_fast:bi Y
                JOIN lookalike_trainer_campaign_conv_raw v
                  ON v.user_id = Y.userid
                WHERE v.ds = '2016-10-19'
                  AND Y.ds = '2016-10-19'
                  AND Y.country IS NOT NULL
            ) a LATERAL VIEW
                EXPLODE(ARRAY(campaigngroup_id, campaign_id, adgroup_id)) s AS source_id
            GROUP BY country, source_id, user_id
            DISTRIBUTE BY country, source_id
            SORT BY country, source_id, conversion_time DESC
        ) b
    ) c
    WHERE rank <= 60000
Before the fix, this query failed with an OOM error; after the fix, the OOM error went away.
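For context, FB_NUMBER_ROWS appears to be an internal UDF that numbers rows within each (country, source_id) group as they arrive after the DISTRIBUTE BY / SORT BY, so the outer WHERE rank <= 60000 keeps at most 60000 users per group. A rough sketch of the same top-N-per-group intent using the standard ROW_NUMBER window function (hypothetical, only to illustrate the shape of the query, not the actual internal UDF):

    SELECT c.source_id, c.country, c.user_id, c.conversion_time
    FROM (
        SELECT b.*,
               -- standard window function standing in for FB_NUMBER_ROWS
               ROW_NUMBER() OVER (PARTITION BY b.country, b.source_id
                                  ORDER BY b.conversion_time DESC) AS rank
        FROM ( ... ) b   -- same aggregated subquery as above
    ) c
    WHERE c.rank <= 60000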