Aki Tanaka created HIVE-18284:
---------------------------------

             Summary: NPE when inserting data with 'distribute by'
                 Key: HIVE-18284
                 URL: https://issues.apache.org/jira/browse/HIVE-18284
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 2.3.2, 2.3.1
            Reporter: Aki Tanaka


A NullPointerException occurs when inserting data with a 'distribute by' 
clause. The following query reproduces the issue:


{code:sql}
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);

set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
distribute by datekey ;
{code}

The insert query runs without the error if I remove the Distribute By clause 
or use a Cluster By clause instead.
It seems that the issue happens because Distribute By does not guarantee 
clustering or sorting on the distribution keys, so rows for the same 
partition are not necessarily adjacent when they reach the reducer.
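
For reference, a sketch of the Cluster By variant of the query above. Only the 
last clause changes; clustering on datekey is an assumption here, chosen to 
mirror the Distribute By key. In Hive, Cluster By is shorthand for Distribute By 
plus Sort By on the same columns, so each reducer should receive the rows of a 
given partition contiguously:

{code:sql}
set hive.exec.dynamic.partition.mode=nonstrict;

-- Same insert as the failing query, with the final clause swapped.
-- CLUSTER BY datekey distributes and sorts rows on datekey.
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
cluster by datekey;
{code}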

FileSinkOperator removes the previous fsp, which might be re-used later when 
we use Distribute By:
https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972

The following stack trace is logged.

{code:java}
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_000000, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
failure ) : 
attempt_1513111717879_0056_1_01_000000_0:java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
        ... 14 more
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
        ... 17 more
{code}





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
