Hi Eagle dev team:
    We use Eagle on a Cloudera CDH cluster, set up by following the official website tutorial.
    It ran fine for a long time.
    But today our Kafka cluster crashed, and the Kafka crash in turn caused a NameNode error.
    The standby NameNode automatically transitioned to active status, and this led to errors across the Hadoop cluster.

    We think sending the hdfs_audit log should be done by a separate daemon. It should not be wired into the NameNode's log4j configuration file, with the NameNode loading the Kafka jars at startup, which is roughly what the current setup does:
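    (For illustration only: this snippet uses the stock Kafka log4j appender class and property names, which are an assumption on our part and may differ from the exact Eagle-documented config.)

        # hypothetical log4j snippet showing the coupled setup
        log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO,KAFKA_HDFS_AUDIT
        log4j.appender.KAFKA_HDFS_AUDIT=org.apache.kafka.log4jappender.KafkaLog4jAppender
        log4j.appender.KAFKA_HDFS_AUDIT.Topic=hdfs_audit_log
        log4j.appender.KAFKA_HDFS_AUDIT.BrokerList=broker1:9092
        log4j.appender.KAFKA_HDFS_AUDIT.Layout=org.apache.log4j.PatternLayout
        log4j.appender.KAFKA_HDFS_AUDIT.Layout.ConversionPattern=%m%n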

    Because the NameNode and the 'send to kafka' appender run in a single JVM, a Kafka outage can crash the NameNode.

    We think Eagle should provide a standalone daemon that sends the hdfs audit log to Kafka; the two components should be decoupled, not coupled even more tightly. A minimal sketch of such a daemon follows.
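    This is only a sketch of the idea, not Eagle's implementation: the broker address, topic name, and audit log path are placeholders, and log rotation, partial lines, and delivery guarantees are not handled.

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.util.Properties;

        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        // Standalone shipper: tails the audit log file and produces each line
        // to Kafka. If Kafka goes down, only this process stalls; the NameNode
        // JVM is never touched.
        public class AuditLogShipper {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                props.put("bootstrap.servers", "broker1:9092");  // placeholder
                props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
                props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
                props.put("max.block.ms", "5000");  // bound blocking when Kafka is down

                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
                     BufferedReader reader = new BufferedReader(
                         new FileReader("/var/log/hadoop-hdfs/hdfs-audit.log"))) {  // placeholder
                    while (true) {
                        String line = reader.readLine();
                        if (line == null) {
                            Thread.sleep(500);  // at EOF: wait for new entries, like tail -f
                            continue;
                        }
                        // Asynchronous fire-and-forget send; a failed send is dropped
                        // here instead of propagating anywhere near HDFS.
                        producer.send(new ProducerRecord<>("hdfs_audit_log", line));
                    }
                }
            }
        }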



    My English is not good; as long as you can understand it, that is OK.

    I know the Eagle dev team includes some Chinese members, so here is a fuller account of what happened (originally written in Chinese):
    
    We configured Eagle by following the official documentation: we edited the NameNode's log4j configuration as described and placed the relevant Eagle jars on the NameNode's classpath. After restarting the NameNode, the hdfs audit log was successfully sent to Kafka and ran stably for a while.

    But today the Kafka cluster went down, and this caused problems on the NameNode: DataNode connections to the NameNode timed out, the standby NameNode began taking over the cluster, yet the original active NameNode was still marked as active, and in the end the Hadoop cluster broke down.
    After investigating, we found that once Kafka went down the NameNode also became abnormal, and this is what triggered the NameNode problems.

    We suggest that the send-to-Kafka function should not be bound into the NameNode. The two should be decoupled: design a separate process (like the sketch above) that reads the audit log file and sends it to Kafka, so that a Kafka outage has no impact on the NameNode.

    Thank you.



javaca...@163.com
