Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table

WanHong Fu Tue, 03 Sep 2019 23:09:02 -0700

Sorry, resend in pure text for cases mentioned before:

My case is as below:
1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED
BY (`col_year_month` string) clustered by (emp_id) into 4 buckets stored as
orc tblproperties ('transactional'='true');
2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values
(1, 'A'),(2, 'B');
3) alter table hive_bucketed2 add columns (age INT);
4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values
(11, '1A', 10),(12, '1B', 22);
5) select * from hive.hive_bucketed2;


WanHong Fu <[email protected]> 于2019年9月4日周三 上午9:52写道：

> Hi Igor,
>
> Thanks for your reply.
>
> For you case Drill works fine.
> But if the table created with Partition, will hit the issue.
>
> My case is as below:
> 1) *create* *table *hive_bucketed2 (emp_id *int*, first_name *string*)
> PARTITIONED *BY* (`col_year_month` *string*) clustered *by* (emp_id)
> *into* 4 buckets stored *as* orc tblproperties ('transactional'='true');
> 2) *insert* *into* hive_bucketed2 *PARTITION* (col_year_month = '2019-09')
> *values* (1, 'A'),(2, 'B');
> 3) *alter* *table *hive_bucketed2 *add* columns (age *INT*);
> 4) *insert* *into *hive_bucketed2 *PARTITION* (col_year_month = '2019-09')
> *values* (11, '1A', 10),(12, '1B', 22);
> 5) *select* * *from* hive.hive_bucketed2;
>
>
> The expected result for query 5):
> emp_id|first_name|age|col_year_month|
> ------|----------|---|--------------|
>     12|1B        | 22|2019-09       |
>      1|A         |   |2019-09       |
>      2|B         |   |2019-09       |
>     11|1A        | 10|2019-09       |
>
> The actual is:
> emp_id|first_name|age|col_year_month|
> ------|----------|---|--------------|
>     12|1B        |   |2019-09       |
>      1|A         |   |2019-09       |
>      2|B         |   |2019-09       |
>     11|1A        |   |2019-09       |
>
> Appreciated if you can advise us.
>
> Regards
> Jerry
>
> Igor Guzenko <[email protected]> 于2019年9月3日周二 下午8:56写道：
>
>> Hello Jerry,
>>
>> Could you please describe your use-case in details. I've tried following
>> steps and Drill reads data for new column just fine.
>>
>> 1) Create table
>> create table hive_bucketed(emp_id int, first_name string) clustered by
>> (emp_id) into 4 buckets stored as orc tblproperties
>> ('transactional'='true');
>>
>> 2) insert data
>> insert into hive_bucketed values (1, 'A'),(2, 'B');
>>
>> 3) queried table with Drill and got 2 columns
>> select * from hive.hive_bucketed
>>
>> 4) add new column
>> alter table hive_bucketed add columns (age INT);
>>
>> 5) set data for new column
>> update hive_bucketed SET age=100;
>>
>> 6) queried table with Drill and got 3 columns with data as expected
>>  select * from hive.hive_bucketed
>>
>> Please, provide small use-case to reproduce your issue. Also ddl and logs
>> are welcome:)
>>
>> Thanks,
>> Igor
>>
>>
>>
>>
>> On Tue, Sep 3, 2019 at 11:03 AM FuWH(傅万红/苏州) <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > We are using Drill v1.16.0 to query transactional (ACID) Hive bucketed
>> ORC
>> > table, everything is okay but when querying the columns added by HiveQL
>> > “alert table … add columns …”, Drill always returns the query result as
>> > NULL for such columns, HiveQL can query with expected results, can
>> anyone
>> > help to give an adivse on how to solve this issue?
>> >
>> > Appreciated & Thanks
>> > Jerry
>>
>

Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table

Reply via email to