Hi Eagle devs: We run Eagle on a Cloudera CDH cluster, set up by following the tutorial on the official website, and it ran fine for a long time. Today, however, our Kafka cluster crashed, and the Kafka crash caused a NameNode error: the standby NameNode automatically transitioned to active, and the Hadoop cluster ended up in a bad state.
We think the hdfs-audit log shipper should be a standalone daemon. It should not be wired into the NameNode's log4j configuration, with the Kafka jars loaded by the NameNode at startup. Because the NameNode and the "send to Kafka" appender run in the same JVM, a Kafka crash can take the NameNode down with it. Eagle should provide a separate daemon to send the hdfs audit log to Kafka; the design should decouple the two components, not couple them more tightly.

In more detail: we configured Eagle by following the official documentation, adding the appender to the NameNode's log4j configuration and putting the Eagle jars on the NameNode's classpath. After restarting the NameNode, hdfs audit logs were successfully delivered to Kafka and ran stably for some time. But today the Kafka cluster went down, which caused problems on the NameNode: DataNodes started timing out when connecting to it, the standby NameNode began taking over the cluster, yet the original active NameNode was still marked as active, and in the end the Hadoop cluster failed. After investigating, we found that when Kafka went down, the NameNode also became abnormal, and this triggered the NameNode problems. We suggest that the "send to Kafka" function not be bound into the NameNode. The two should be decoupled: design a separate process that reads the audit log file and sends it to Kafka, so that a Kafka outage cannot affect the NameNode. Thanks. javaca...@163.com
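P.S. To make the proposal concrete, below is a minimal sketch of such a standalone shipper. This is our own illustration, not Eagle code: the function names are made up, and the actual Kafka producer is passed in as a plain `send` callable so the sketch stays self-contained. The key point is that the tailer runs in its own process and swallows send failures, so a Kafka outage can never propagate into the process writing the log (the NameNode).

```python
# Hypothetical standalone hdfs-audit.log -> Kafka shipper (illustration only).
# Runs as its own daemon, outside the NameNode JVM.
import os
import time


def follow(path, poll_interval=1.0, from_start=False):
    """Yield lines appended to `path`, like `tail -f`."""
    with open(path, "r") as f:
        if not from_start:
            # Start at the end so we only ship new audit events.
            f.seek(0, os.SEEK_END)
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_interval)


def ship(path, send):
    """Forward each new audit line via `send` (e.g. a Kafka producer wrapper).

    Any exception from `send` is caught and logged, never propagated:
    if Kafka is down, the shipper keeps tailing (or drops lines), and the
    process that writes the log is completely unaffected.
    """
    for line in follow(path):
        try:
            send(line)
        except Exception as exc:
            print("kafka send failed, dropping line:", exc)
```

With this split, a Kafka outage only affects this small daemon; the NameNode's log4j configuration stays untouched and no Kafka jars live on the NameNode classpath.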