The data flow you described should be correct.

To be precise, though, there are two stream processing stages for HDFS audit log monitoring.

Processing 1: data preparation, i.e. enriching the raw audit log. Here "enrich"
means adding extra information to each raw audit log record.
The input is the topic hdfs_audit_log_sandbox, and the output is the topic
hdfs_audit_log_enriched_sandbox.
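
To make "enrich" concrete, here is a minimal Python sketch. It is only an
illustration and not Eagle's actual code: the field names (src, host,
sensitivityType, securityZone) and both lookup tables are hypothetical, loosely
modelled on the sensitivityJoin and ipZoneJoin steps mentioned in your mail.

```python
# Illustrative only: NOT Eagle's implementation.
# The lookup tables and field names below are hypothetical.
SENSITIVITY_BY_PATH = {"/tmp/private": "PRIVATE"}
ZONE_BY_HOST = {"10.0.0.5": "SECURITY"}


def enrich(event, sensitivity_by_path=SENSITIVITY_BY_PATH,
           zone_by_host=ZONE_BY_HOST):
    """Return a copy of a raw audit event with two extra fields added."""
    enriched = dict(event)  # copy, so the raw event is left untouched
    # Join on the accessed path (analogous to a sensitivity join).
    enriched["sensitivityType"] = sensitivity_by_path.get(event.get("src"), "NA")
    # Join on the client host (analogous to an IP zone join).
    enriched["securityZone"] = zone_by_host.get(event.get("host"), "NA")
    return enriched
```

The point is only that the enriched record carries everything the raw record
had, plus the joined fields, so a policy can filter on them later.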

Processing 2: policy evaluation.
The input is the topic hdfs_audit_log_enriched_sandbox, and the output is alerts,
which are persisted in the Eagle database. I am NOT sure whether the policy
evaluation results are also written to Kafka. If anyone knows, please
correct me.


For troubleshooting, I suggest tracking the data from the beginning, through
processing 1, and then through processing 2.

I normally use the Kafka command-line tools to check whether messages have
arrived, i.e. whether the topic offset is increasing. You can also check the
Storm console to see whether data is being processed.
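
As a rough sketch of that check, the Python below compares two offset snapshots
per topic and reports which stage to look at first. It assumes you have already
read the end offsets of each partition twice (for example by hand, from the
output of the Kafka tools) and typed them into dicts of partition -> offset.
The function names and return labels are my own and are not part of Eagle or
Kafka.

```python
def advanced_partitions(before, after):
    """Return the partitions whose end offset grew between two snapshots."""
    return sorted(p for p, off in after.items() if off > before.get(p, 0))


def locate_stall(raw_before, raw_after, enriched_before, enriched_after):
    """Guess which stage to inspect, given offset snapshots of both topics."""
    if not advanced_partitions(raw_before, raw_after):
        # Nothing is arriving in hdfs_audit_log_sandbox at all.
        return "source"
    if not advanced_partitions(enriched_before, enriched_after):
        # Raw topic grows, but hdfs_audit_log_enriched_sandbox does not.
        return "processing1"
    # Both topics grow, so the problem is more likely in policy evaluation.
    return "processing2"
```

For example, if the raw topic went from offset 100 to 250 but the enriched
topic stayed at 40, locate_stall({0: 100}, {0: 250}, {0: 40}, {0: 40}) returns
"processing1", which points at the enrichment topology.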



Thanks
Edward



On Thu, Dec 7, 2017 at 7:05 PM, 绿飕飕 <qi1070445...@gmail.com> wrote:

> *1.* To install 'Hdfs Audit Log Monitor', I set the following config:
>
>     1. Created two Kafka topics: *hdfs_audit_log_sandbox*,
> *hdfs_audit_log_enriched_sandbox*
>
>     2. Streamed the audit log into topic *hdfs_audit_log_sandbox*
>
>     3.Kafka Consumer Topic for HDFS Auditlog : *hdfs_audit_log_sandbox*
>
>     4. Kafka Topic for Auditlog Event Sink:
> *hdfs_audit_log_enriched_sandbox*
>
>     5. The Policy is from *HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX* insert
> into *hdfs_audit_log_enriched_stream_out*
>
>
> (a). But the monitor does not work. Is there any error in these settings?
>
> (b). I think some topics, such as hdfs_audit_log_enriched_stream_out, will be
> created by Eagle. Is that right?
>
>
> *2.* Is this data flow right, or am I missing some steps?
>
> data flow:   a-b-c-d-e-f
>
> a. hdfs --> *hdfs_audit_log*
>
> b. kafka topic -->  *hdfs_audit_log_sandbox*
>
>
> c. *HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX:*
>
>         storm Spouts --> parserBolt
>
>         storm Bolts --> sensitivityJoin
>
>         storm Bolts --> ipZoneJoin
>
>         storm Bolts --> kafkaSink
>
> d. kafka topic --> *hdfs_audit_log_enriched_sandbox*
>
>
> e. the policy processes *hdfs_audit_log_enriched_sandbox* and sends the
> alert result to *hdfs_audit_log_enriched_stream_out*
>
> f. the error message is put into the storage
>
>
>
> *3.* Could you recommend versions for the dependencies of Eagle v0.5.0?
>
>
> Thanks,
>
>     Qilv Wu
>
