The data flow you described looks correct. To be precise, though, there are two stream-processing stages in HDFS audit log monitoring.

Processing 1: data preparation, i.e. enriching the raw audit log. Here "enrich" means adding extra information to each raw audit log record. The input is the topic hdfs_audit_log_sandbox, and the output is the topic hdfs_audit_log_enriched_sandbox.

Processing 2: policy evaluation. The input is the topic hdfs_audit_log_enriched_sandbox, and the output is alerts, which are persisted in the Eagle database. I am NOT sure whether the policy evaluation result is also put into Kafka. If anyone knows, please correct me.

For troubleshooting, I would suggest you track the data from the beginning, through processing 1 and then processing 2. I normally use the Kafka command-line tools to check whether messages have arrived, i.e. whether the topic offset has increased. You can also check the Storm console to see whether data is being processed.

Thanks
Edward

On Thu, Dec 7, 2017 at 7:05 PM, 绿飕飕 <qi1070445...@gmail.com> wrote:

> *1.* For Install 'Hdfs Audit Log Monitor', I have set following config:
>
> 1. create two kafka topics: *hdfs_audit_log_sandbox*,
> *hdfs_audit_log_enriched_sandbox*
>
> 2. stream audit log into topic *hdfs_audit_log_sandbox*
>
> 3. Kafka Consumer Topic for HDFS Auditlog : *hdfs_audit_log_sandbox*
>
> 4. Kafka Topic for Auditlog Event Sink:
> *hdfs_audit_log_enriched_sandbox*
>
> 5. The Policy is from *HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX* insert
> into *hdfs_audit_log_enriched_stream_out*
>
> (a). But the monitor not work, are there any error about set these config?
>
> (b). I think some topic such as hdfs_audit_log_enriched_stream_out will be
> create by eagle, is it right?
>
>
> *2.* Is it right of the data flow? Or I miss some steps?
>
> data flow: a-b-c-d-e-f
>
> a. hdfa --> *hdfs_audit_log*
>
> b. kafka topic --> *hdfs_audit_log_sandbox*
>
> c. *HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX:*
>
>    storm Spouts --> parserBolt
>
>    storm Bolts --> sensitivityJoin
>
>    storm Bolts --> ipZoneJoin
>
>    storm Bolts --> kafkaSink
>
> d. kafka topic --> *hdfs_audit_log_enriched_sandbox*
>
> e. the Policy handle the *hdfs_audit_log_enriched_sandbox* and send the
> alert result to *hdfs_audit_log_enriched_stream_out*
>
> f. the error message would put into the storage
>
>
> *3.* could you recommend the version about the dependence of eagle v0.5.0?
>
>
> Thanks,
>
> Qilv Wu
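
As a rough sketch of the offset check suggested in the reply above: take two snapshots of the latest offsets for each topic a short time apart, and see which topic stopped growing. The code below assumes each snapshot is text in the `topic:partition:offset` form (the format printed by Kafka's GetOffsetShell tool, as far as I know; adjust the parser if your tool prints something else). The function names and the sample numbers in the usage note are hypothetical, made up for illustration only, and are not part of Eagle or Kafka.

```python
from typing import Dict


def parse_offsets(output: str) -> Dict[int, int]:
    """Parse lines of the assumed form 'topic:partition:offset' into {partition: offset}."""
    offsets: Dict[int, int] = {}
    for line in output.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so the partition and offset are always the last two fields.
        _topic, partition, offset = line.rsplit(":", 2)
        offsets[int(partition)] = int(offset)
    return offsets


def topic_advanced(before: Dict[int, int], after: Dict[int, int]) -> bool:
    """Return True if any partition's latest offset grew between the two snapshots."""
    return any(after[p] > before.get(p, 0) for p in after)


def locate_stall(raw_advanced: bool, enriched_advanced: bool) -> str:
    """Map the two observations onto the stage of the data flow to inspect first."""
    if not raw_advanced:
        return "raw topic is not growing: check how the audit log is streamed into Kafka"
    if not enriched_advanced:
        return "enriched topic is not growing: check processing 1 (enrichment) in Storm"
    return "both topics are growing: check processing 2 (policy evaluation) and the alert storage"
```

For example, if hdfs_audit_log_sandbox goes from offset 100 to 120 between snapshots while hdfs_audit_log_enriched_sandbox stays at 40, `locate_stall(True, False)` points at processing 1, which is where the Storm console would be the next thing to look at.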