David Mollitor created HIVE-25831:
-------------------------------------

             Summary: Report Progress on Every Record Read for CompactorMR
                 Key: HIVE-25831
                 URL: https://issues.apache.org/jira/browse/HIVE-25831
             Project: Hive
          Issue Type: Improvement
            Reporter: David Mollitor


Progress should be updated on every read of an input record.

{quote}
reads an input, writes an output, nor updates its status string
{quote}

https://github.com/apache/hive/blob/fffb31f2346df2b8011a9949895de21f506c0117/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java#L813-L828

I think every loop iteration should simply call {{progress()}}.  If a major 
compaction encounters a lot of deleted values, long stretches of time can pass 
without a progress update, and the job may be timed out by YARN.
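
To make the suggestion concrete, here is a minimal, self-contained sketch. It is not the actual CompactorMR code: the {{Progressable}} interface below is a local stand-in for Hadoop's interface of the same name, and {{Row}}, {{copyLiveRows}} and the sample data are hypothetical names made up for illustration. It assumes a loop that skips deleted rows and shows {{progress()}} being called before that check, so skipped rows still count as activity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ProgressPerRecordSketch {

    /** Local stand-in (assumption) for Hadoop's Progressable, which exposes progress(). */
    interface Progressable {
        void progress();
    }

    /** Hypothetical record type: a value plus a flag marking it as deleted. */
    static final class Row {
        final String value;
        final boolean deleted;

        Row(String value, boolean deleted) {
            this.value = value;
            this.deleted = deleted;
        }
    }

    /**
     * Copies live rows to out and returns how many were written.
     * progress() is called once per row read, before the deleted check,
     * so a long run of deleted rows still reports activity.
     */
    static int copyLiveRows(Iterator<Row> input, List<String> out, Progressable reporter) {
        int written = 0;
        while (input.hasNext()) {
            Row row = input.next();
            reporter.progress();   // report on every read, not only on writes
            if (row.deleted) {
                continue;          // skipped rows have already been reported above
            }
            out.add(row.value);
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        // Two live rows separated by a run of three deleted rows.
        List<Row> rows = Arrays.asList(
            new Row("a", false),
            new Row("b", true),
            new Row("c", true),
            new Row("d", true),
            new Row("e", false));

        int[] progressCalls = {0};
        List<String> out = new ArrayList<>();
        int written = copyLiveRows(rows.iterator(), out, () -> progressCalls[0]++);

        // Prints: written=2 progressCalls=5
        System.out.println("written=" + written + " progressCalls=" + progressCalls[0]);
    }
}
```

If {{progress()}} were called only after a write, the same input would report progress twice instead of five times, and a sufficiently long run of deleted rows could exceed the task timeout.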

I'm not 100% sure this is happening, but it is something I wanted to point out.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
