David Mollitor created HIVE-25831:
-------------------------------------

             Summary: Report Progress on Every Record Read for CompactorMR
                 Key: HIVE-25831
                 URL: https://issues.apache.org/jira/browse/HIVE-25831
             Project: Hive
          Issue Type: Improvement
            Reporter: David Mollitor
Progress should be updated on every record read. A MapReduce task is terminated after {{mapreduce.task.timeout}} milliseconds if it neither

{quote}
reads an input, writes an output, nor updates its status string
{quote}

The loop in question:

https://github.com/apache/hive/blob/fffb31f2346df2b8011a9949895de21f506c0117/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java#L813-L828

I think every iteration of that loop should simply call {{progress()}}. If a major compaction encounters a large number of deleted values, those rows are skipped without writing any output, so long stretches of time can pass without a progress update and the task may be killed for exceeding the timeout. I'm not 100% sure this is happening, but it's something I wanted to point out.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
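A minimal sketch of the proposed change, under assumptions: this is not the actual CompactorMR code at the linked lines, and the names ProgressReporter, Row and compact are hypothetical. ProgressReporter stands in for Hadoop's {{org.apache.hadoop.util.Progressable}} (which the old-API {{Reporter}} extends), and Row stands in for an ACID record carrying a delete-event flag. The point it illustrates is where the call sits: {{progress()}} runs once per record read, before the delete check, so a long run of skipped deleted rows still signals liveness.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch, not the real CompactorMR loop. ProgressReporter stands
 * in for Hadoop's Progressable; Row stands in for an ACID record.
 */
class CompactionLoopSketch {

  /** Stand-in for Progressable: tells the framework the task is still alive. */
  interface ProgressReporter {
    void progress();
  }

  /** Hypothetical record: a payload plus a flag marking a delete event. */
  static final class Row {
    final String value;
    final boolean isDelete;

    Row(String value, boolean isDelete) {
      this.value = value;
      this.isDelete = isDelete;
    }
  }

  /**
   * Copies rows from input to output and returns how many were written.
   * progress() is called once per record read, before any filtering.
   */
  static int compact(Iterator<Row> input, List<String> output,
                     ProgressReporter reporter, boolean isMajor) {
    int written = 0;
    while (input.hasNext()) {
      Row row = input.next();
      reporter.progress(); // report on every read, not only on every write
      if (isMajor && row.isDelete) {
        continue; // major compaction drops deleted rows: nothing is written
      }
      output.add(row.value);
      written++;
    }
    return written;
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(new Row("a", false), new Row("b", true), new Row("c", false));
    List<String> out = new ArrayList<>();
    int[] calls = {0};
    int written = compact(rows.iterator(), out, () -> calls[0]++, true);
    // prints "2 written, 3 progress calls, output [a, c]"
    System.out.println(written + " written, " + calls[0] + " progress calls, output " + out);
  }
}
```

With one deleted row in a major compaction, two rows are written but {{progress()}} is still called three times. Placing the call at the top of the loop, rather than next to the write, is what closes the gap described above; since the call is expected to be cheap, calling it per record should add little overhead, though that is an assumption worth measuring.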