> This problem has blocked me a whole week, anybodies have any ideas?

This might be a race condition here.

<https://github.com/apache/hive/blob/master/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java#L68>


The list returned by aclStatus.getEntries() is being modified in place
without being copied first (oddly, with Kerberos it might be okay).
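
To illustrate what I mean by "modified without being copied", here is a minimal sketch. It does not use the real Hadoop classes: FakeAclStatus and both helper methods are hypothetical stand-ins, and the ACL entries are plain strings. The only point is that adding to the list you got from a getter changes the object that handed it out, while adding to a copy does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AclCopySketch {
    // Hypothetical stand-in for a status object that returns its internal list.
    static final class FakeAclStatus {
        private final List<String> entries;

        FakeAclStatus(List<String> entries) {
            this.entries = new ArrayList<>(entries);
        }

        List<String> getEntries() {
            return entries; // no defensive copy, callers share this list
        }
    }

    // Unsafe: appends to the shared list, so every other holder sees the change.
    static List<String> withExtraEntryUnsafe(FakeAclStatus status, String extra) {
        List<String> entries = status.getEntries();
        entries.add(extra);
        return entries;
    }

    // Safe: copies first, so the status object is left untouched.
    static List<String> withExtraEntrySafe(FakeAclStatus status, String extra) {
        List<String> entries = new ArrayList<>(status.getEntries());
        entries.add(extra);
        return entries;
    }

    public static void main(String[] args) {
        FakeAclStatus status = new FakeAclStatus(Arrays.asList("user:hive:rwx"));

        withExtraEntrySafe(status, "group::r-x");
        System.out.println("after safe call:   " + status.getEntries());

        withExtraEntryUnsafe(status, "group::r-x");
        System.out.println("after unsafe call: " + status.getEntries());
    }
}
```

If two threads run the unsafe variant against the same status object, they are also mutating one ArrayList concurrently, which is where a race could come from.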


>> >= '1970-01-01 01:00:00' AND TBL_HIS_UWIP_SCAN_PROM.START_TIME <
>>'2010-01-01 01:00:00') DISTRIBUTE BY RAND();

Did Kylin generate this query? This pattern is known to cause data loss
at runtime.

DISTRIBUTE BY RAND() loses data when map tasks fail, because a re-run map
task assigns rows to different reducers than the first attempt did.
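
A toy model of the failure, as a sketch only: it is not Hive or MapReduce code, and the class, method names, row count and seeds are all made up for illustration. It assumes reducer 0 fetched its partition from the first map attempt, the map output was then lost, and reducer 1 fetched from the retried attempt. With random routing the two attempts disagree, so some rows reach no reducer and others reach both; with routing by a key the attempts agree and every row arrives exactly once.

```java
import java.util.Random;

class RandDistributeSketch {
    static final int REDUCERS = 2;

    // One map attempt that picks a reducer per row at random, like DISTRIBUTE BY RAND().
    // A different seed stands in for a re-executed task drawing new random numbers.
    static int[] randomAttempt(int rows, long seed) {
        Random rnd = new Random(seed);
        int[] target = new int[rows];
        for (int i = 0; i < rows; i++) {
            target[i] = rnd.nextInt(REDUCERS);
        }
        return target;
    }

    // One map attempt that routes by the row's key, so every attempt agrees.
    static int[] keyedAttempt(int rows) {
        int[] target = new int[rows];
        for (int i = 0; i < rows; i++) {
            target[i] = Math.floorMod(Integer.hashCode(i), REDUCERS);
        }
        return target;
    }

    // Reducer 0 reads from the first attempt, reducer 1 from the retry.
    // Returns {rows seen by no reducer, rows seen by both reducers}.
    static int[] lostAndDuplicated(int[] first, int[] retry) {
        int lost = 0;
        int duplicated = 0;
        for (int i = 0; i < first.length; i++) {
            int copies = (first[i] == 0 ? 1 : 0) + (retry[i] == 1 ? 1 : 0);
            if (copies == 0) lost++;
            if (copies == 2) duplicated++;
        }
        return new int[] {lost, duplicated};
    }

    public static void main(String[] args) {
        int rows = 1000;
        int[] random = lostAndDuplicated(randomAttempt(rows, 42L), randomAttempt(rows, 7L));
        int[] keyed = lostAndDuplicated(keyedAttempt(rows), keyedAttempt(rows));
        System.out.println("random routing: lost=" + random[0] + " duplicated=" + random[1]);
        System.out.println("keyed routing:  lost=" + keyed[0] + " duplicated=" + keyed[1]);
    }
}
```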

>        at org.apache.hadoop.hdfs.DFSClient.setAcl(DFSClient.java:3242)
...
>        at 
>org.apache.hadoop.hive.io.HdfsUtils.setFullFileStatus(HdfsUtils.java:126)

> An interesting thing is that if I narrow down the 'where' to make the
>select query only return about 300,000 line, the insert SQL can be
>completed successfully.

Producing exactly 1 file will fix the issue.

Cheers,
Gopal









