Zoltan Chovan created HIVE-17334:
------------------------------------

             Summary: Problem in interpreting structure columns in view.
                 Key: HIVE-17334
                 URL: https://issues.apache.org/jira/browse/HIVE-17334
             Project: Hive
          Issue Type: Improvement
            Reporter: Zoltan Chovan
            Priority: Minor


Reproduction steps, to set up the env:


{code:java}
drop view if exists test_db.view_a;
drop view if exists test_db.view_b;
drop table if exists test_db.table_a;

create database if not exists test_db;

create external table if not exists test_db.table_a (
`id` bigint,
`users` array<struct<user_id:int,counter:int>>
)
partitioned by (p_date int)
stored as parquet
location '/user/hive/';
{code}

With this, he following scenario will fail:


{code:java}
select
  t.p_date,
  t.id,
  p.user_id,
  sum(p.counter) as counter,
  sum(sum(p.counter)) over
    (partition by t.id) as total_actions
from test_db.table_a as t
lateral view explode(t.users) users_two as p
group by t.p_date, t.id, p.user_id;

create view if not exists test_db.view_a as 
select 
  t.p_date,
  t.id,
  p.user_id,
  sum(p.counter) as counter,
  sum(sum(p.counter)) over
    (partition by t.id) as total_actions
from test_db.table_a as t
lateral view explode(t.users) users_two as p
group by t.p_date, t.id, p.user_id;


select *
from test_db.view_a
where p_date = 20170711;
{code}


The following scenario will succeed:

{code:java}
create view if not exists test_db.view_b as
select
  base.*,
  sum(base.counter) over
    (partition by base.id) as total_actions
from (
  select 
    t.p_date,
    t.id,
    p.user_id,
    sum(p.counter) as counter
  from test_db.table_a as t
  lateral view explode(t.users) users_two as p
  group by t.p_date, t.id, p.user_id
) as base;

select *
from test_db.view_b
where p_date = 20170711;
{code}


As you can see the only difference is that the addition of "sum(sum(p.counter)) 
over (partition by t.id) as total_actions"

If the view is created as follows then the query that was breaking works.


{code:java}
create view if not exists test_db.view_c as 
select 
  dt.p_date,
  dt.id,
  users_two.p.user_id,
  sum(p.counter) as counter,
  sum(sum(p.counter)) over
    (partition by dt.id) as total_actions
from test_db.table_a as dt
lateral view explode(dt.users) users_two as p
group by dt.p_date, dt.id, users_two.p.user_id;

select *
from test_db.view_c
where p_date = 20170711;

{code}

The workaround is to use the table name along with the column name e.g 
users_two.p.user_id. Please advise if this is a bug and if it could be fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to