It would probably have been an idea to mention that in this case, impression is 
an EXPLODEd view into a field in another table.

And that queries into this view produce odd errors.

And that EXPLODE in Hive seems generally questionable.

I'm dealing in another way.

On Sep 12, 2013, at 10:00 AM, Chuck Hardin <char...@namimedia.com> wrote:

> Please bear with me, because this is a pretty large query.
> 
> TL;DR:  I'm doing a UNION ALL on a bunch of subqueries.  The subqueries 
> return no results, but the overall query does.  Sadly, the constraints on the 
> subqueries are not being honored in the UNION ALL.
> 
> I'm executing the following:
> 
> SELECT lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id,
>       SUM(entries.clicks), SUM(entries.conversions),
>       SUM(entries.cost), SUM(entries.feed_calls),
>       SUM(entries.impressions), SUM(entries.revenue),
>       SUM(entries.used_listings)
> FROM
>  (
>    SELECT
>      used_listing.advertiserid AS advertiser_id,
>      -1 AS keyword,
>      used_listing.adgroupid AS adgroup_id,
>      used_listing.adid AS ad_id,
>      used_listing.feedid AS feed_id,
>      0 AS clicks,
>      used_listing.campaignid AS campaign_id,
>      concat(used_listing.publisherid, "_", used_listing.sourceid, "_", 
> used_listing.subid) AS subid,
>      0 AS conversions,
>      0 AS feed_calls,
>      0 AS impressions,
>      2013 AS lax_year,
>      used_listing.publisherid AS publisher_id,
>      0 AS cost,
>      used_listing.subid AS sub_id,
>      used_listing.sourceid AS source_id,
>      used_listing.timezone AS timezone,
>      09 AS lax_month,
>      0 AS revenue,
>      used_listing.networkid AS network_id,
>      05 AS lax_day,
>      1 AS used_listings,
>      used_listing.ul_datetime AS start_timestamp,
>      -1 AS bid_id FROM used_listing
> WHERE used_listing.ul_datetime >= 1378364400
> AND   used_listing.ul_datetime < 1378450800
> AND used_listing.networkid='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
> 
>    UNION ALL
>    SELECT
>      click.click.advertiser_id AS advertiser_id,
>      click.click.keyword AS keyword,
>      click.click.adgroup_id AS adgroup_id,
>      click.click.ad_id AS ad_id,
>      click.feed_id AS feed_id,
>      1 AS clicks,
>      click.click.campaign_id AS campaign_id,
>      click.click.subid AS subid,
>      0 AS conversions,
>      0 AS feed_calls,
>      0 AS impressions,
>      2013 AS lax_year,
>      click.click.publisher_id AS publisher_id,
>      click.click.publisher_cost AS cost,
>      click.click.sub_id AS sub_id,
>      click.click.source_id AS source_id,
>      click.click.timezone AS timezone,
>      09 AS lax_month,
>      click.click.click_cost AS revenue,
>      click.click.network_id AS network_id,
>      05 AS lax_day,
>      0 AS used_listings,
>      click.start_timestamp AS start_timestamp,
>      click.click.bid_id AS bid_id FROM click
> WHERE click.start_timestamp >= 1378364400
> AND   click.start_timestamp < 1378450800
> AND click.click.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
> 
>    UNION ALL
>    SELECT
>      result.ad.advertiser_id AS advertiser_id,
>      result.bid.text AS keyword,
>      result.ad.adgroup_id AS adgroup_id,
>      result.ad.ad_id AS ad_id,
>      result.ad.feed_id AS feed_id,
>      0 AS clicks,
>      result.ad.campaign_id AS campaign_id,
>      concat(result.ad.publisher_id, "_", result.ad.source_id, "_", 
> result.ad.sub_id) AS subid,
>      0 AS conversions,
>      0 AS feed_calls,
>      1 AS impressions,
>      2013 AS lax_year,
>      result.ad.publisher_id AS publisher_id,
>      0 AS cost,
>      result.ad.sub_id AS sub_id,
>      result.ad.source_id AS source_id,
>      result.ad.timezone AS timezone,
>      09 AS lax_month,
>      0 AS revenue,
>      result.ad.network_id AS network_id,
>      05 AS lax_day,
>      0 AS used_listings,
>      result.ad.start_timestamp AS start_timestamp,
>      result.bid.bid_id AS bid_id FROM impression
> WHERE result.ad.start_timestamp >= 1378364400
> AND   result.ad.start_timestamp < 1378450800
> AND result.ad.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
> 
>    UNION ALL
>    SELECT
>      conversion.conv_data.advertiser_id AS advertiser_id,
>      conversion.conv_data.keyword AS keyword,
>      conversion.conv_data.adgroup_id AS adgroup_id,
>      conversion.conv_data.ad_id AS ad_id,
>      conversion.conv_data.feed_id AS feed_id,
>      0 AS clicks,
>      conversion.conv_data.campaign_id AS campaign_id,
>      conversion.conv_data.subid AS subid,
>      1 AS conversions,
>      0 AS feed_calls,
>      0 AS impressions,
>      2013 AS lax_year,
>      conversion.conv_data.publisher_id AS publisher_id,
>      0 AS cost,
>      conversion.conv_data.sub_id AS sub_id,
>      conversion.conv_data.source_id AS source_id,
>      conversion.conv_data.timezone AS timezone,
>      09 AS lax_month,
>      0 AS revenue,
>      conversion.conv_data.network_id AS network_id,
>      05 AS lax_day,
>      0 AS used_listings,
>      conversion.start_timestamp AS start_timestamp,
>      -1 AS bid_id FROM conversion
> WHERE conversion.start_timestamp >= 1378364400
> AND   conversion.start_timestamp < 1378450800
> AND conversion.conv_data.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
> 
> )
> entries
> GROUP BY lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id
> 
> Got all that?  Good!
> 
> The first strange thing is that the overall query returns results, yet I get 
> no results for any of the subqueries on which I'm performing a UNION ALL.
> 
> The second strange thing is that the overall query does not honor the 
> constraint on network_id at the very least, and probably isn't honoring the 
> constraints on start_timestamp either (I haven't checked).
> 
> I tried putting the constraint on the output of the UNION ALL query:
> 
> WHERE entries.start_timestamp >= 1378364400
> AND   entries.start_timestamp < 1378450800
> AND entries.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
> 
> That didn't work either.  Same behavior.
> 
> Can anyone help me make this work?

Reply via email to