Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table
Hi Igor,

Thanks for the reply. Your workaround requires touching the Hive metastore, which is not possible for us so far. Is there a plan to fix this issue on the Drill side? We are eager to get this working.

Thanks,
Jerry

On Wed, Sep 4, 2019 at 4:45 PM Igor Guzenko wrote:

> Hello Jerry,
>
> Thank you very much for the detailed steps. The issue is pretty complex: when
> Drill reads the metadata about a partition, the linked storage descriptor
> somehow doesn't have the newly added column. I've created DRILL-7365 [1],
> which at the end of its description contains one possible workaround.
>
> [1] https://issues.apache.org/jira/browse/DRILL-7365
>
> Thanks,
> Igor
Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table
Hello Jerry,

Thank you very much for the detailed steps. The issue is pretty complex: when Drill reads the metadata about a partition, the linked storage descriptor somehow doesn't have the newly added column. I've created DRILL-7365 [1], which at the end of its description contains one possible workaround.

[1] https://issues.apache.org/jira/browse/DRILL-7365

Thanks,
Igor
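To make the symptom described above concrete, here is a toy Python model of it. It is only a sketch under assumptions: it does not use or reproduce any Drill or Hive code, and the names `TABLE_COLUMNS`, `PARTITION_COLUMNS`, `STORED_ROWS` and `read_partition` are hypothetical. It assumes that a reader which takes its column list from a stale partition-level descriptor never loads the new column, so every row comes back with NULL (`None`) for it.

```python
# Toy model only: none of these names come from Drill or Hive.

# Table-level schema after "alter table ... add columns (age INT)".
TABLE_COLUMNS = ["emp_id", "first_name", "age"]

# Assumed stale partition-level column list (missing the new column).
PARTITION_COLUMNS = ["emp_id", "first_name"]

# Rows as written to the partition col_year_month = '2019-09':
# two before the ALTER TABLE, two after it.
STORED_ROWS = [
    {"emp_id": 1, "first_name": "A"},
    {"emp_id": 2, "first_name": "B"},
    {"emp_id": 11, "first_name": "1A", "age": 10},
    {"emp_id": 12, "first_name": "1B", "age": 22},
]


def read_partition(rows, reader_columns, output_columns):
    """Load only reader_columns from each row, then project to output_columns.

    Columns that the reader did not load are returned as None (NULL).
    """
    result = []
    for row in rows:
        loaded = {col: row.get(col) for col in reader_columns}
        result.append({col: loaded.get(col) for col in output_columns})
    return result


# Reader driven by the stale partition-level column list.
stale = read_partition(STORED_ROWS, PARTITION_COLUMNS, TABLE_COLUMNS)
# Reader driven by the up-to-date table-level column list.
fresh = read_partition(STORED_ROWS, TABLE_COLUMNS, TABLE_COLUMNS)

print([row["age"] for row in stale])  # [None, None, None, None]
print([row["age"] for row in fresh])  # [None, None, 10, 22]
```

In this model the data for `age` is present in storage in both cases; only the column list handed to the reader differs.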
Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table
Sorry, resending the case mentioned before in plain text. My case is as below:

1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED BY (`col_year_month` string) clustered by (emp_id) into 4 buckets stored as orc tblproperties ('transactional'='true');
2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (1, 'A'),(2, 'B');
3) alter table hive_bucketed2 add columns (age INT);
4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (11, '1A', 10),(12, '1B', 22);
5) select * from hive.hive_bucketed2;

On Wed, Sep 4, 2019 at 9:52 AM WanHong Fu wrote:

> Hi Igor,
>
> Thanks for your reply.
>
> For your case Drill works fine.
> But if the table is created with a partition, we hit the issue.
>
> My case is as below:
> 1) create table hive_bucketed2 (emp_id int, first_name string)
> PARTITIONED BY (`col_year_month` string) clustered by (emp_id)
> into 4 buckets stored as orc tblproperties ('transactional'='true');
> 2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09')
> values (1, 'A'),(2, 'B');
> 3) alter table hive_bucketed2 add columns (age INT);
> 4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09')
> values (11, '1A', 10),(12, '1B', 22);
> 5) select * from hive.hive_bucketed2;
>
> The expected result for query 5):
> emp_id|first_name|age|col_year_month|
> ------|----------|---|--------------|
> 12    |1B        | 22|2019-09       |
> 1     |A         |   |2019-09       |
> 2     |B         |   |2019-09       |
> 11    |1A        | 10|2019-09       |
>
> The actual result is:
> emp_id|first_name|age|col_year_month|
> ------|----------|---|--------------|
> 12    |1B        |   |2019-09       |
> 1     |A         |   |2019-09       |
> 2     |B         |   |2019-09       |
> 11    |1A        |   |2019-09       |
>
> We would appreciate it if you could advise us.
>
> Regards
> Jerry
>
> On Tue, Sep 3, 2019 at 8:56 PM Igor Guzenko wrote:
>
>> Hello Jerry,
>>
>> Could you please describe your use case in detail? I've tried the following
>> steps and Drill reads data for the new column just fine.
>>
>> 1) Create table
>> create table hive_bucketed(emp_id int, first_name string) clustered by
>> (emp_id) into 4 buckets stored as orc tblproperties
>> ('transactional'='true');
>>
>> 2) insert data
>> insert into hive_bucketed values (1, 'A'),(2, 'B');
>>
>> 3) queried table with Drill and got 2 columns
>> select * from hive.hive_bucketed
>>
>> 4) add new column
>> alter table hive_bucketed add columns (age INT);
>>
>> 5) set data for new column
>> update hive_bucketed SET age=100;
>>
>> 6) queried table with Drill and got 3 columns with data as expected
>> select * from hive.hive_bucketed
>>
>> Please provide a small use case to reproduce your issue. DDL and logs
>> are also welcome :)
>>
>> Thanks,
>> Igor
>>
>> On Tue, Sep 3, 2019 at 11:03 AM FuWH(傅万红/苏州) wrote:
>>
>> > Hi,
>> >
>> > We are using Drill v1.16.0 to query a transactional (ACID) Hive bucketed
>> > ORC table. Everything is okay, but when querying columns added by the
>> > HiveQL statement “alter table … add columns …”, Drill always returns
>> > NULL for such columns, while HiveQL returns the expected results. Can
>> > anyone advise on how to solve this issue?
>> >
>> > Appreciated & Thanks
>> > Jerry
Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table
Hi Igor,

Thanks for your reply.

For your case Drill works fine. But if the table is created with a partition, we hit the issue.

My case is as below:

1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED BY (`col_year_month` string) clustered by (emp_id) into 4 buckets stored as orc tblproperties ('transactional'='true');
2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (1, 'A'),(2, 'B');
3) alter table hive_bucketed2 add columns (age INT);
4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (11, '1A', 10),(12, '1B', 22);
5) select * from hive.hive_bucketed2;

The expected result for query 5):

emp_id|first_name|age|col_year_month|
------|----------|---|--------------|
12    |1B        | 22|2019-09       |
1     |A         |   |2019-09       |
2     |B         |   |2019-09       |
11    |1A        | 10|2019-09       |

The actual result is:

emp_id|first_name|age|col_year_month|
------|----------|---|--------------|
12    |1B        |   |2019-09       |
1     |A         |   |2019-09       |
2     |B         |   |2019-09       |
11    |1A        |   |2019-09       |

We would appreciate it if you could advise us.

Regards
Jerry

On Tue, Sep 3, 2019 at 8:56 PM Igor Guzenko wrote:

> Hello Jerry,
>
> Could you please describe your use case in detail? I've tried the following
> steps and Drill reads data for the new column just fine.
>
> 1) Create table
> create table hive_bucketed(emp_id int, first_name string) clustered by
> (emp_id) into 4 buckets stored as orc tblproperties
> ('transactional'='true');
>
> 2) insert data
> insert into hive_bucketed values (1, 'A'),(2, 'B');
>
> 3) queried table with Drill and got 2 columns
> select * from hive.hive_bucketed
>
> 4) add new column
> alter table hive_bucketed add columns (age INT);
>
> 5) set data for new column
> update hive_bucketed SET age=100;
>
> 6) queried table with Drill and got 3 columns with data as expected
> select * from hive.hive_bucketed
>
> Please provide a small use case to reproduce your issue. DDL and logs
> are also welcome :)
>
> Thanks,
> Igor
>
> On Tue, Sep 3, 2019 at 11:03 AM FuWH(傅万红/苏州) wrote:
>
> > Hi,
> >
> > We are using Drill v1.16.0 to query a transactional (ACID) Hive bucketed
> > ORC table. Everything is okay, but when querying columns added by the
> > HiveQL statement “alter table … add columns …”, Drill always returns
> > NULL for such columns, while HiveQL returns the expected results. Can
> > anyone advise on how to solve this issue?
> >
> > Appreciated & Thanks
> > Jerry
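For readers comparing the two result sets in the message above, a cell-by-cell diff narrows the discrepancy down. The following is a minimal Python sketch; the helper names `parse` and `diff` are hypothetical, and the data is copied from the expected and actual tables in this message with padding and trailing separators removed.

```python
# Expected and actual results of "select * from hive.hive_bucketed2",
# copied from the message above (padding and trailing '|' removed).
EXPECTED = """\
emp_id|first_name|age|col_year_month
12|1B|22|2019-09
1|A||2019-09
2|B||2019-09
11|1A|10|2019-09
"""

ACTUAL = """\
emp_id|first_name|age|col_year_month
12|1B||2019-09
1|A||2019-09
2|B||2019-09
11|1A||2019-09
"""


def parse(table):
    """Parse a pipe-delimited table into {emp_id: {column: value}}."""
    lines = table.strip().splitlines()
    header = lines[0].split("|")
    rows = {}
    for line in lines[1:]:
        cells = line.split("|")
        rows[cells[0]] = dict(zip(header, cells))
    return rows


def diff(expected, actual):
    """Return (emp_id, column, expected_value, actual_value) for each mismatch."""
    exp, act = parse(expected), parse(actual)
    mismatches = []
    for key, row in exp.items():
        for col, value in row.items():
            if act[key][col] != value:
                mismatches.append((key, col, value, act[key][col]))
    return mismatches


for mismatch in diff(EXPECTED, ACTUAL):
    print(mismatch)
# ('12', 'age', '22', '')
# ('11', 'age', '10', '')
```

The only mismatches are in the `age` column, and only for the two rows inserted after the column was added; the rows inserted before the ALTER TABLE are empty for `age` in both result sets.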
Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table
Hello Jerry,

Could you please describe your use case in detail? I've tried the following steps and Drill reads data for the new column just fine.

1) Create table
create table hive_bucketed(emp_id int, first_name string) clustered by (emp_id) into 4 buckets stored as orc tblproperties ('transactional'='true');

2) insert data
insert into hive_bucketed values (1, 'A'),(2, 'B');

3) queried table with Drill and got 2 columns
select * from hive.hive_bucketed

4) add new column
alter table hive_bucketed add columns (age INT);

5) set data for new column
update hive_bucketed SET age=100;

6) queried table with Drill and got 3 columns with data as expected
select * from hive.hive_bucketed

Please provide a small use case to reproduce your issue. DDL and logs are also welcome :)

Thanks,
Igor

On Tue, Sep 3, 2019 at 11:03 AM FuWH(傅万红/苏州) wrote:

> Hi,
>
> We are using Drill v1.16.0 to query a transactional (ACID) Hive bucketed
> ORC table. Everything is okay, but when querying columns added by the
> HiveQL statement “alter table … add columns …”, Drill always returns
> NULL for such columns, while HiveQL returns the expected results. Can
> anyone advise on how to solve this issue?
>
> Appreciated & Thanks
> Jerry
Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table
Hi,

We are using Drill v1.16.0 to query a transactional (ACID) Hive bucketed ORC table. Everything is okay, but when querying columns added by the HiveQL statement “alter table … add columns …”, Drill always returns NULL for such columns, while HiveQL returns the expected results. Can anyone advise on how to solve this issue?

Appreciated & Thanks
Jerry