Thanks Okumin. I am new to Hive and Tez and I have struggled to deploy a high-performance Dockerized Hive setup. I followed the documentation for setting up a remote Metastore. I have a single node with 32 GB of RAM and 8 cores, but I have a dataset of about 2 GB (Iceberg table partitioned on one column). However, when I run select queries, the performance has not been as fast as expected. Could someone share some insights, especially regarding hive-site.xml and Tez custom configuration?
Any help would be appreciated. On Sun, Aug 4, 2024 at 4:46 PM Okumin <m...@okumin.com> wrote: > Hi Clinton, > > I tested MERGE INTO with minimal reproduction. I saw the same error. > > ``` > CREATE TABLE src (col1 INT, col2 INT); > CREATE TABLE dst (id BIGINT DEFAULT SURROGATE_KEY(), col1 INT, col2 > INT, PRIMARY KEY (id) DISABLE NOVALIDATE) STORED BY ICEBERG; > > MERGE INTO dst d USING src s ON s.col1 = d.col1 > WHEN MATCHED THEN UPDATE SET col2 = s.col2 > WHEN NOT MATCHED THEN INSERT (col1, col2) VALUES (s.col1, s.col2); > ``` > > The following query, which explicitly inserts `id`, succeeded on my > machine. The default keyword is unlikely to work on INSERT on MERGE > INTO. I've yet to investigate whether ANSI allows us to omit it. > > ``` > MERGE INTO dst d USING src s ON s.col1 = d.col1 > WHEN MATCHED THEN UPDATE SET col2 = s.col2 > WHEN NOT MATCHED THEN INSERT (id, col1, col2) VALUES (SURROGATE_KEY(), > s.col1, s.col2); > ``` > > As another point, the SURROGATE_KEY might not work as you expected. It > doesn't generate globally unique ids on my attempts. > > Regards, > Okumin > > On Wed, Jul 31, 2024 at 4:54 PM clinton chikwata > <clintonfun...@gmail.com> wrote: > > > > Dear Team, > > > > Any help will be much appreciated. > > > > Error SQL Error [40000] [42000]: Error while compiling statement: > FAILED: SemanticException Schema of both sides of union should match. > > > > I have an ETL workload that stores data into temp_table with the schema > as shown below. > > > > CREATE EXTERNAL TABLE IF NOT EXISTS temp_table ( > > VC_ITEM_CODE STRING, > > VC_SUB_GROUP STRING, > > VC_PRODUCT_NAME STRING, > > VC_PRODUCT_UNIT STRING, > > VC_GROUP_CODE STRING, > > DT_VAT_START TIMESTAMP, > > VC_BAND_CODE STRING, > > VC_SEMI_BAND_CODE STRING, > > VC_DIVISIONS STRING, > > NU_UNIT_FACTOR DECIMAL(30, 0), > > VC_DIVISION_SEG_CODE STRING, > > VC_COLOR_COMB STRING, > > DT_MOD_DATE TIMESTAMP, > > VC_INACTIVE_PRODUCT STRING, > > RN DECIMAL(10, 0), > > country STRING > > ) > > STORED AS PARQUET > > LOCATION 'S{path}' > > > > Then i want to load it to the final table > > > > CREATE TABLE product_dimension ( > > `ID` BIGINT DEFAULT SURROGATE_KEY(), > > VC_ITEM_CODE STRING, > > VC_SUB_GROUP STRING, > > VC_PRODUCT_NAME STRING, > > VC_PRODUCT_UNIT STRING, > > VC_GROUP_CODE STRING, > > DT_VAT_START TIMESTAMP, > > VC_BAND_CODE STRING, > > VC_SEMI_BAND_CODE STRING, > > VC_DIVISIONS STRING, > > NU_UNIT_FACTOR DECIMAL(30, 0), > > VC_DIVISION_SEG_CODE STRING, > > VC_COLOR_COMB STRING, > > DT_MOD_DATE TIMESTAMP, > > VC_INACTIVE_PRODUCT STRING, > > RN DECIMAL(10, 0), > > country STRING, > > PRIMARY KEY (ID) DISABLE NOVALIDATE) > > STORED BY ICEBERG; > > > > When I attempt to perform a merge operation on column vc_item_code i > get the error as shown above : > > > > MERGE > > INTO > > product_dimension AS c > > USING ( > > SELECT > > * > > FROM > > temp_table) AS s ON s.vc_item_code = c.vc_item_code > > AND s.country = c.country > > WHEN MATCHED THEN > > UPDATE > > SET > > vc_item_code = s.vc_item_code, > > vc_sub_group = s.vc_sub_group, > > vc_product_name = s.vc_product_name, > > vc_product_unit = s.vc_product_unit, > > vc_group_code = s.vc_group_code, > > dt_vat_start = s.dt_vat_start, > > vc_band_code = s.vc_band_code, > > vc_semi_band_code = s.vc_semi_band_code, > > vc_divisions = s.vc_divisions, > > nu_unit_factor = s.nu_unit_factor, > > vc_division_seg_code = s.vc_division_seg_code, > > vc_color_comb = s.vc_color_comb, > > dt_mod_date = s.dt_mod_date, > > vc_inactive_product = s.vc_inactive_product, > > rn = s.rn, > > country = s.country > > WHEN NOT MATCHED THEN > > INSERT > > ( > > vc_item_code, > > vc_sub_group, > > vc_product_name, > > vc_product_unit, > > vc_group_code, > > dt_vat_start, > > vc_band_code, > > vc_semi_band_code, > > vc_divisions, > > nu_unit_factor, > > vc_division_seg_code, > > vc_color_comb, > > dt_mod_date, > > vc_inactive_product, > > rn, > > country > > ) > > VALUES ( > > s.vc_item_code, > > s.vc_sub_group, > > s.vc_product_name, > > s.vc_product_unit, > > s.vc_group_code, > > s.dt_vat_start, > > s.vc_band_code, > > s.vc_semi_band_code, > > s.vc_divisions, > > s.nu_unit_factor, > > s.vc_division_seg_code, > > s.vc_color_comb, > > s.dt_mod_date, > > s.vc_inactive_product, > > s.rn, > > s.country > > ); > > > > Warm Regards >