Hi Eagle devs: We run Eagle on a Cloudera CDH cluster, set up by following the tutorial on the official website, and it ran fine for a long time. Today, however, our Kafka cluster crashed, and the Kafka crash caused a NameNode error: the standby NameNode automatically transitioned to active, and the Hadoop cluster ended up in a bad state.
We think the hdfs-audit log shipper should be a standalone daemon. It should not be wired into the NameNode's log4j configuration, with the Kafka jars loaded by the NameNode at startup. Because the NameNode and the "send to Kafka" appender run in the same JVM, a Kafka crash can take the NameNode down with it. Eagle should provide a separate daemon to send the hdfs audit log to Kafka; the design should decouple the two components, not couple them more tightly.

In more detail: we configured Eagle by following the official documentation, adding the appender to the NameNode's log4j configuration and putting the Eagle jars on the NameNode's classpath. After restarting the NameNode, hdfs audit logs were successfully delivered to Kafka and ran stably for some time. But today the Kafka cluster went down, which caused problems on the NameNode: DataNodes started timing out when connecting to it, the standby NameNode began taking over the cluster, yet the original active NameNode was still marked as active, and in the end the Hadoop cluster failed. After investigating, we found that when Kafka went down, the NameNode also became abnormal, and this triggered the NameNode problems. We suggest that the "send to Kafka" function not be bound into the NameNode. The two should be decoupled: design a separate process that reads the audit log file and sends it to Kafka, so that a Kafka outage cannot affect the NameNode. Thanks. javaca...@163.com
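P.S. To make the proposal concrete, below is a minimal sketch of such a standalone shipper. This is our own illustration, not Eagle code: the function names are made up, and the actual Kafka producer is passed in as a plain `send` callable so the sketch stays self-contained. The key point is that the tailer runs in its own process and swallows send failures, so a Kafka outage can never propagate into the process writing the log (the NameNode).

```python
# Hypothetical standalone hdfs-audit.log -> Kafka shipper (illustration only).
# Runs as its own daemon, outside the NameNode JVM.
import os
import time


def follow(path, poll_interval=1.0, from_start=False):
    """Yield lines appended to `path`, like `tail -f`."""
    with open(path, "r") as f:
        if not from_start:
            # Start at the end so we only ship new audit events.
            f.seek(0, os.SEEK_END)
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_interval)


def ship(path, send):
    """Forward each new audit line via `send` (e.g. a Kafka producer wrapper).

    Any exception from `send` is caught and logged, never propagated:
    if Kafka is down, the shipper keeps tailing (or drops lines), and the
    process that writes the log is completely unaffected.
    """
    for line in follow(path):
        try:
            send(line)
        except Exception as exc:
            print("kafka send failed, dropping line:", exc)
```

With this split, a Kafka outage only affects this small daemon; the NameNode's log4j configuration stays untouched and no Kafka jars live on the NameNode classpath.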