Thanks, Lisoda, for those insights.

@Okumin, this is what I observed when checking the log files.

Attached are the log file and my hive-site.xml configuration.

I have observed that this error occurs only when the execution engine is set
to Tez; the moment I switch to MR, the issue does not come up.
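The Tez-versus-MR comparison can be made inside one Beeline session without editing hive-site.xml. A minimal sketch follows. The failing query is not shown in this thread, so it appears only as a placeholder comment. Hive-on-MR is deprecated, so treat this as a diagnostic comparison rather than a long-term fix.

```sql
-- Print the engine currently in effect (hive-site.xml sets tez).
SET hive.execution.engine;

-- Run the failing query on Tez.
SET hive.execution.engine=tez;
-- <failing query goes here>

-- Switch only this session to MapReduce and run the same query again.
SET hive.execution.engine=mr;
-- <failing query goes here>
```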

The object being queried is a materialized view.
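One of the failing task attempts in the log reads the directory of `materialized_company_sales`. It may therefore help to check how Hive describes that object and whether automatic rewriting to materialized views is involved. This is a hedged diagnostic sketch. It assumes `materialized_company_sales` is the materialized view in question (the name is taken from the log), and the log does not confirm that rewriting plays any role.

```sql
-- Storage location, table type and input/output formats of the view.
DESCRIBE FORMATTED materialized_company_sales;

-- Materialized views defined in the current database.
SHOW MATERIALIZED VIEWS;

-- For this session only, stop the optimizer from rewriting queries to read
-- materialized views, then rerun the failing query on Tez and compare.
SET hive.materializedview.rewriting=false;
```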

Regards

On Mon, Sep 2, 2024 at 6:29 PM Okumin <m...@okumin.com> wrote:

> Hi Clinton,
>
> Thanks for sharing your problem. If you provide more information, such
> as a dataset or queries, we can reproduce it and file a ticket.
>
> Hi Lisoda,
>
> Thanks for giving us real examples; they are interesting. Could you tell
> me more about the first problem, which happens when an Iceberg table has
> a large data file? I could not reproduce it[1], and I am curious about
> the detailed conditions. As for map-side aggregation, we also found and
> resolved a similar problem[2], which may be worth checking. The other
> issues are also interesting; I would file tickets if I had evidence.
>
> - [1] https://gist.github.com/okumin/4fccec45109fc9927a22f40c166fe7f9
> - [2] https://issues.apache.org/jira/browse/HIVE-28428
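Lisoda profiled a slow path in VectorGroupByOperator$ProcessingModeHashAggregate further down the thread. One way to see whether map-side hash aggregation dominates a slow query is to run the same GROUP BY with it enabled and then disabled in a single session and compare timings. This is only a sketch. `my_iceberg_table` and `some_column` are hypothetical names. Disabling map-side aggregation can also make other queries slower, because more rows are shuffled.

```sql
-- Baseline: map-side (hash) aggregation enabled, which is Hive's default.
SET hive.map.aggr=true;
EXPLAIN SELECT some_column, COUNT(*) FROM my_iceberg_table GROUP BY some_column;
SELECT some_column, COUNT(*) FROM my_iceberg_table GROUP BY some_column;

-- Same query with map-side aggregation disabled for this session only;
-- all grouping then happens on the reduce side.
SET hive.map.aggr=false;
EXPLAIN SELECT some_column, COUNT(*) FROM my_iceberg_table GROUP BY some_column;
SELECT some_column, COUNT(*) FROM my_iceberg_table GROUP BY some_column;
```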
>
> Regards,
> Okumin
>
> On Mon, Sep 2, 2024 at 2:59 PM clinton chikwata <clintonfun...@gmail.com>
> wrote:
> >
> > Hello Lisoda,
> >
> > Thanks for this information.
> >
> >
> > On Sun, Sep 1, 2024 at 4:04 PM lisoda <lis...@yeah.net> wrote:
> >>
> >> Hello Clinton:
> >>
> >> We have actually encountered the same issue: in many cases, querying
> >> Iceberg tables is less efficient than expected and slower than querying
> >> regular ORC/Parquet tables. Because the current HiveIcebergInputSplit
> >> does not support splitting based on file size, reads can be slow when
> >> individual data files in an Iceberg table are very large. This needs
> >> improvement from community developers in future releases. Additionally,
> >> if Iceberg tables use zstd compression, Hive's current handling via the
> >> aircompressor library (a pure Java library) is noticeably less efficient
> >> than JNI implementations; this might only improve after a rewrite that
> >> leverages JDK SIMD. Furthermore, we analyzed execution latency using
> >> flame graphs and found potential issues in the implementation of
> >> VectorGroupByOperator$ProcessingModeHashAggregate, which performs very
> >> poorly. For now, with Iceberg tables, we work around these problems by
> >> increasing the number of map tasks and reducing the size of individual
> >> data files in the Iceberg table. We hope these issues can be resolved in
> >> subsequent development iterations.
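The workaround described above (smaller data files and more map tasks) might be expressed roughly as follows. This is a sketch under several assumptions:
- `my_iceberg_table` is a hypothetical name.
- The 128 MiB and 16 MiB values are illustrative, not recommendations.
- It is worth verifying that the Hive Iceberg writer and split generator in your release honor these settings.

Tez grouping only controls how many splits are combined into one task. It does not split a single oversized file, which is the limitation described above.

```sql
-- Iceberg table property: roll over to a new data file at about 128 MiB.
-- Only files written after this change are affected; existing large files
-- stay as they are until the data is rewritten.
ALTER TABLE my_iceberg_table
  SET TBLPROPERTIES ('write.target-file-size-bytes'='134217728');

-- Put fewer bytes into each Tez map task so that more map tasks run.
-- (Tez defaults: tez.grouping.max-size = 1 GiB, tez.grouping.min-size = 50 MiB.)
SET tez.grouping.max-size=134217728;
SET tez.grouping.min-size=16777216;
```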
> >>
> >> On 2024-08-28 14:41:03, "clinton chikwata" <clintonfun...@gmail.com> wrote:
> >>
> >> Thanks, Okumin.
> >>
> >> I am new to Hive and Tez, and I have struggled to deploy a
> >> high-performance Dockerized Hive setup. I followed the documentation for
> >> setting up a remote Metastore. I have a single node with 32 GB of RAM and
> >> 8 cores, and a dataset of about 2 GB (an Iceberg table partitioned on one
> >> column). However, when I run SELECT queries, the performance has not been
> >> as fast as expected. Could someone share some insights, especially
> >> regarding hive-site.xml and custom Tez configuration?
> >>
> >> Any help would be appreciated.
> >>
> >> On Sun, Aug 4, 2024 at 4:46 PM Okumin <m...@okumin.com> wrote:
> >>>
> >>> Hi Clinton,
> >>>
> >>> I tested MERGE INTO with a minimal reproduction and saw the same error.
> >>>
> >>> ```
> >>> CREATE TABLE src (col1 INT, col2 INT);
> >>> CREATE TABLE dst (id BIGINT DEFAULT SURROGATE_KEY(), col1 INT, col2
> >>> INT, PRIMARY KEY (id) DISABLE NOVALIDATE) STORED BY ICEBERG;
> >>>
> >>> MERGE INTO dst d USING src s ON s.col1 = d.col1
> >>> WHEN MATCHED THEN UPDATE SET col2 = s.col2
> >>> WHEN NOT MATCHED THEN INSERT (col1, col2) VALUES (s.col1, s.col2);
> >>> ```
> >>>
> >>> The following query, which explicitly inserts `id`, succeeded on my
> >>> machine. The column DEFAULT is unlikely to be applied in the INSERT
> >>> clause of MERGE INTO. I have yet to investigate whether ANSI SQL allows
> >>> omitting the column there.
> >>>
> >>> ```
> >>> MERGE INTO dst d USING src s ON s.col1 = d.col1
> >>> WHEN MATCHED THEN UPDATE SET col2 = s.col2
> >>> WHEN NOT MATCHED THEN INSERT (id, col1, col2) VALUES (SURROGATE_KEY(),
> >>> s.col1, s.col2);
> >>> ```
> >>>
> >>> As another point, SURROGATE_KEY might not work as you expect. In my
> >>> attempts, it did not generate globally unique IDs.
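One way to check for the non-unique IDs described above is to look for repeated keys after a load. Here is a sketch against the `dst` table from the reproduction above.

```sql
-- List any id values that appear more than once in dst.
SELECT id, COUNT(*) AS occurrences
FROM dst
GROUP BY id
HAVING COUNT(*) > 1;
```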
> >>>
> >>> Regards,
> >>> Okumin
> >>>
> >>> On Wed, Jul 31, 2024 at 4:54 PM clinton chikwata
> >>> <clintonfun...@gmail.com> wrote:
> >>> >
> >>> > Dear Team,
> >>> >
> >>> > Any help will be much appreciated.
> >>> >
> >>> > Error SQL Error [40000] [42000]: Error while compiling statement:
> >>> > FAILED: SemanticException Schema of both sides of union should match.
> >>> >
> >>> > I have an ETL workload that stores data into temp_table with the
> >>> > schema shown below.
> >>> >
> >>> > CREATE EXTERNAL TABLE IF NOT EXISTS temp_table (
> >>> >     VC_ITEM_CODE STRING,
> >>> >     VC_SUB_GROUP STRING,
> >>> >     VC_PRODUCT_NAME STRING,
> >>> >     VC_PRODUCT_UNIT STRING,
> >>> >     VC_GROUP_CODE STRING,
> >>> >     DT_VAT_START TIMESTAMP,
> >>> >     VC_BAND_CODE STRING,
> >>> >     VC_SEMI_BAND_CODE STRING,
> >>> >     VC_DIVISIONS STRING,
> >>> >     NU_UNIT_FACTOR DECIMAL(30, 0),
> >>> >     VC_DIVISION_SEG_CODE STRING,
> >>> >     VC_COLOR_COMB STRING,
> >>> >     DT_MOD_DATE TIMESTAMP,
> >>> >     VC_INACTIVE_PRODUCT STRING,
> >>> >     RN DECIMAL(10, 0),
> >>> >     country STRING
> >>> >     )
> >>> > STORED AS PARQUET
> >>> > LOCATION '${path}';
> >>> >
> >>> > Then I want to load it into the final table:
> >>> >
> >>> > CREATE TABLE product_dimension (
> >>> >     `ID` BIGINT DEFAULT SURROGATE_KEY(),
> >>> >     VC_ITEM_CODE STRING,
> >>> >     VC_SUB_GROUP STRING,
> >>> >     VC_PRODUCT_NAME STRING,
> >>> >     VC_PRODUCT_UNIT STRING,
> >>> >     VC_GROUP_CODE STRING,
> >>> >     DT_VAT_START TIMESTAMP,
> >>> >     VC_BAND_CODE STRING,
> >>> >     VC_SEMI_BAND_CODE STRING,
> >>> >     VC_DIVISIONS STRING,
> >>> >     NU_UNIT_FACTOR DECIMAL(30, 0),
> >>> >     VC_DIVISION_SEG_CODE STRING,
> >>> >     VC_COLOR_COMB STRING,
> >>> >     DT_MOD_DATE TIMESTAMP,
> >>> >     VC_INACTIVE_PRODUCT STRING,
> >>> >     RN DECIMAL(10, 0),
> >>> >     country STRING,
> >>> >     PRIMARY KEY (ID) DISABLE NOVALIDATE)
> >>> > STORED BY ICEBERG;
> >>> >
> >>> > When I attempt to perform a MERGE on column vc_item_code, I get the
> >>> > error shown above:
> >>> >
> >>> > MERGE INTO product_dimension AS c
> >>> > USING (SELECT * FROM temp_table) AS s
> >>> >   ON s.vc_item_code = c.vc_item_code
> >>> >   AND s.country = c.country
> >>> > WHEN MATCHED THEN UPDATE SET
> >>> >   vc_item_code = s.vc_item_code,
> >>> >   vc_sub_group = s.vc_sub_group,
> >>> >   vc_product_name = s.vc_product_name,
> >>> >   vc_product_unit = s.vc_product_unit,
> >>> >   vc_group_code = s.vc_group_code,
> >>> >   dt_vat_start = s.dt_vat_start,
> >>> >   vc_band_code = s.vc_band_code,
> >>> >   vc_semi_band_code = s.vc_semi_band_code,
> >>> >   vc_divisions = s.vc_divisions,
> >>> >   nu_unit_factor = s.nu_unit_factor,
> >>> >   vc_division_seg_code = s.vc_division_seg_code,
> >>> >   vc_color_comb = s.vc_color_comb,
> >>> >   dt_mod_date = s.dt_mod_date,
> >>> >   vc_inactive_product = s.vc_inactive_product,
> >>> >   rn = s.rn,
> >>> >   country = s.country
> >>> > WHEN NOT MATCHED THEN INSERT (
> >>> >   vc_item_code, vc_sub_group, vc_product_name, vc_product_unit,
> >>> >   vc_group_code, dt_vat_start, vc_band_code, vc_semi_band_code,
> >>> >   vc_divisions, nu_unit_factor, vc_division_seg_code, vc_color_comb,
> >>> >   dt_mod_date, vc_inactive_product, rn, country
> >>> > ) VALUES (
> >>> >   s.vc_item_code, s.vc_sub_group, s.vc_product_name, s.vc_product_unit,
> >>> >   s.vc_group_code, s.dt_vat_start, s.vc_band_code, s.vc_semi_band_code,
> >>> >   s.vc_divisions, s.nu_unit_factor, s.vc_division_seg_code,
> >>> >   s.vc_color_comb, s.dt_mod_date, s.vc_inactive_product, s.rn,
> >>> >   s.country
> >>> > );
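Okumin's workaround can be applied to this statement by listing `ID` explicitly in the INSERT branch and populating it with `SURROGATE_KEY()`. The sketch below is untested. Apart from that change, it keeps the original statement. The caveat that SURROGATE_KEY did not produce globally unique values in Okumin's attempts still applies.

```sql
-- Same MERGE as above; the only change is that ID is listed explicitly in the
-- INSERT branch and filled with SURROGATE_KEY() instead of the column DEFAULT.
MERGE INTO product_dimension AS c
USING (SELECT * FROM temp_table) AS s
  ON s.vc_item_code = c.vc_item_code AND s.country = c.country
WHEN MATCHED THEN UPDATE SET
  vc_item_code = s.vc_item_code,
  vc_sub_group = s.vc_sub_group,
  vc_product_name = s.vc_product_name,
  vc_product_unit = s.vc_product_unit,
  vc_group_code = s.vc_group_code,
  dt_vat_start = s.dt_vat_start,
  vc_band_code = s.vc_band_code,
  vc_semi_band_code = s.vc_semi_band_code,
  vc_divisions = s.vc_divisions,
  nu_unit_factor = s.nu_unit_factor,
  vc_division_seg_code = s.vc_division_seg_code,
  vc_color_comb = s.vc_color_comb,
  dt_mod_date = s.dt_mod_date,
  vc_inactive_product = s.vc_inactive_product,
  rn = s.rn,
  country = s.country
WHEN NOT MATCHED THEN INSERT (
  id, vc_item_code, vc_sub_group, vc_product_name, vc_product_unit,
  vc_group_code, dt_vat_start, vc_band_code, vc_semi_band_code,
  vc_divisions, nu_unit_factor, vc_division_seg_code, vc_color_comb,
  dt_mod_date, vc_inactive_product, rn, country)
VALUES (
  SURROGATE_KEY(), s.vc_item_code, s.vc_sub_group, s.vc_product_name,
  s.vc_product_unit, s.vc_group_code, s.dt_vat_start, s.vc_band_code,
  s.vc_semi_band_code, s.vc_divisions, s.nu_unit_factor,
  s.vc_division_seg_code, s.vc_color_comb, s.dt_mod_date,
  s.vc_inactive_product, s.rn, s.country);
```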
> >>> >
> >>> > Warm Regards
>
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1725219018549_0001_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 3, vertexId=vertex_1725219018549_0001_2_02, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1725219018549_0001_2_02 [Reducer 3] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1725219018549_0001_2_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1725219018549_0001_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:2
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1725219018549_0001_2_00, diagnostics=[Task failed, taskId=task_1725219018549_0001_2_00_000001, diagnostics=[TaskAttempt 0 failed, info=[Error: Node: 66a1865ec7dd/192.168.32.4 : Error while running task ( failure ) : attempt_1725219018549_0001_2_00_000001_0:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/supplier_wise_perfomance/supplier=OTHERS/000000_0
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/supplier_wise_perfomance/supplier=OTHERS/000000_0
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
        ... 16 more
Caused by: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/supplier_wise_perfomance/supplier=OTHERS/000000_0
        at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:120)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.cleanUpInputFileChangedOp(VectorMapOperator.java:602)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1168)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:861)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
        ... 19 more
], TaskAttempt 1 failed, info=[Error: Node: 66a1865ec7dd/192.168.32.4 : Error while running task ( failure ) : attempt_1725219018549_0001_2_00_000001_1:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/supplier_wise_perfomance/supplier=TULIP DIAGNOSTICS (P) LTD/000000_0
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/supplier_wise_perfomance/supplier=TULIP DIAGNOSTICS (P) LTD/000000_0
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
        ... 16 more
Caused by: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/supplier_wise_perfomance/supplier=TULIP DIAGNOSTICS (P) LTD/000000_0
        at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:120)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.cleanUpInputFileChangedOp(VectorMapOperator.java:602)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1168)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:861)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
        ... 19 more
], TaskAttempt 2 failed, info=[Error: Node: 66a1865ec7dd/192.168.32.4 : Error while running task ( failure ) : attempt_1725219018549_0001_2_00_000001_2:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/materialized_company_sales
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/materialized_company_sales
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
        ... 16 more
Caused by: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/materialized_company_sales
        at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:120)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.cleanUpInputFileChangedOp(VectorMapOperator.java:602)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1168)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:861)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
        ... 19 more
], TaskAttempt 3 failed, info=[Error: Node: 66a1865ec7dd/192.168.32.4 : Error while running task ( failure ) : attempt_1725219018549_0001_2_00_000001_3:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/supplier_wise_perfomance/supplier=OTHERS/000000_0
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/supplier_wise_perfomance/supplier=OTHERS/000000_0
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
        ... 16 more
Caused by: java.lang.IllegalStateException: Invalid input path file:/opt/hive/data/warehouse/warehouse.db/supplier_wise_perfomance/supplier=OTHERS/000000_0
        at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:120)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.cleanUpInputFileChangedOp(VectorMapOperator.java:602)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1168)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:861)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
        ... 19 more



My hive-site.xml:

<configuration>
    <property>
        <name>hive.server2.enable.doAs</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.tez.exec.inplace.progress</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/opt/hive/scratch_dir</value>
    </property>
    <property>
        <name>hive.user.install.directory</name>
        <value>/opt/hive/install_dir</value>
    </property>
    <property>
        <name>tez.runtime.optimize.local.fetch</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.exec.submit.local.task.via.child</name>
        <value>false</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
    </property>
    <property>
        <name>tez.local.mode</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>tez</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/opt/hive/data/warehouse</value>
    </property>
    <property>
        <name>metastore.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>iceberg.engine.hive.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.vectorized.execution.enabled</name>
        <value>true</value>
    </property>
</configuration>
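Every failing frame in the attached log passes through VectorMapOperator, and this hive-site.xml enables vectorized execution. One hedged experiment is to keep Tez but turn vectorization off for a single session, then rerun the failing query. If the error disappears, that points at the vectorized map path. If it persists, vectorization is probably not the cause. This is a diagnostic step, not a confirmed cause or fix, and the failing query appears only as a placeholder.

```sql
-- Keep Tez, but disable vectorized execution for this session only.
SET hive.execution.engine=tez;
SET hive.vectorized.execution.enabled=false;
-- <failing query goes here>

-- Restore the value configured in hive-site.xml afterwards.
SET hive.vectorized.execution.enabled=true;
```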
Any help will be much appreciated.
