On a cluster recently upgraded to Hive 0.14 (HDP 2.2) we found that the ORC
packages were writing millions of additional INFO-level entries to hive.log,
amounting to gigabytes of extra log volume.
I feel these log entries should be at the DEBUG level.
Is there an existing bug filed against Hive or ORC for this?

Here is one example:
2015-04-06 15:12:43,212 INFO  orc.OrcInputFormat 
(OrcInputFormat.java:setSearchArgument(298)) - ORC pushdown predicate: leaf-0 = 
(EQUALS company XYZ)
leaf-1 = (EQUALS site DEF)
leaf-2 = (EQUALS table ABC)
expr = (and leaf-0 leaf-1 leaf-2)

To get logging down to an acceptable level that did not fill /tmp, we had to
add these entries to /etc/hive/conf/hive-log4j.properties:
log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger=WARN,DRFA
log4j.logger.org.apache.hadoop.hive.ql.io.orc.ReaderImpl=WARN,DRFA
log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcInputFormat=WARN,DRFA
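
(For anyone else hitting this: log4j reads that file once at process
startup, so HiveServer2 and the metastore need a restart before the change
takes effect. For a one-off CLI session the threshold can also be raised
without editing the file, e.g.

hive --hiveconf hive.root.logger=WARN,DRFA

though that suppresses INFO from everything, not just the ORC classes.)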
While I'm on the subject: to operationally harden Hive, I think it should
ship with a more aggressive rolling file appender by default, one that can
roll hourly or at a max size, compress the rolled logs… A rough sketch of
what I mean follows.
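
To illustrate, stock log4j 1.2 can already cap disk usage by size; this is
an untested sketch, and the RFA appender name, file cap, and backup count
are just illustrative:

# Size-capped rolling appender as an alternative to DRFA
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hive.log.dir}/${hive.log.file}
# Roll at 256MB and keep at most 20 rolled files (~5GB worst case)
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

Hourly rolling plus compression would, as far as I know, need the
log4j-extras companion (org.apache.log4j.rolling.RollingFileAppender with a
TimeBasedRollingPolicy and a FileNamePattern ending in .gz), which is
exactly why I'd rather see this handled in the default config than by every
operator.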

- Douglas
