> This problem has blocked me a whole week, anybodies have any ideas?

This might be a race condition here.

<https://github.com/apache/hive/blob/master/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java#L68>

The list returned by aclStatus.getEntries() is being modified without being copied first (oddly, with Kerberos it might be okay).

>> >= '1970-01-01 01:00:00' AND TBL_HIS_UWIP_SCAN_PROM.START_TIME <
>> '2010-01-01 01:00:00') DISTRIBUTE BY RAND();

Did Kylin generate this query? This pattern is known to cause data loss at runtime: DISTRIBUTE BY RAND() loses data when map tasks fail.

> at org.apache.hadoop.hdfs.DFSClient.setAcl(DFSClient.java:3242)
...
> at org.apache.hadoop.hive.io.HdfsUtils.setFullFileStatus(HdfsUtils.java:126)

> An interesting thing is that if I narrow down the 'where' to make the
> select query only return about 300,000 line, the insert SQL can be
> completed successfully.

Producing exactly 1 file will fix the issue.

Cheers,
Gopal
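
P.S. A minimal sketch of the copy-before-modify idea, in case it helps to see the difference. This is not the Hive code: `Status`, `withExtraEntryInPlace` and `withExtraEntryCopied` are hypothetical stand-ins, and plain strings take the place of ACL entries. It only assumes that a getter hands out the same list object to every caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AclCopySketch {
    // Hypothetical stand-in for a status object whose getter exposes its internal list.
    static final class Status {
        final List<String> entries =
            new ArrayList<>(Arrays.asList("user::rwx", "group::r-x", "other::r-x"));

        List<String> getEntries() {
            return entries; // every caller receives the same list instance
        }
    }

    // Risky: adds to the list owned by the shared Status object,
    // so other callers (or other threads) see the change.
    static List<String> withExtraEntryInPlace(Status status, String extra) {
        List<String> entries = status.getEntries();
        entries.add(extra);
        return entries;
    }

    // Safer: work on a private copy and leave the shared list untouched.
    static List<String> withExtraEntryCopied(Status status, String extra) {
        List<String> entries = new ArrayList<>(status.getEntries());
        entries.add(extra);
        return entries;
    }

    public static void main(String[] args) {
        Status shared = new Status();

        withExtraEntryCopied(shared, "user:hive:rwx");
        System.out.println(shared.getEntries().size()); // 3: shared list unchanged

        withExtraEntryInPlace(shared, "user:hive:rwx");
        System.out.println(shared.getEntries().size()); // 4: shared list was modified
    }
}
```

With several files being handled at once against the same source status, the in-place variant is the one that could plausibly misbehave; the copy costs one small allocation per call.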