Hi,

I have been evaluating Spark for analysing application and server logs. I
believe there are some downsides to doing this:

1. No direct mechanism for collecting logs, so other tools like Flume need
to be introduced into the pipeline.
2. Lots of code needs to be written to parse the different patterns in the
logs, whereas log analysis tools like Logstash or Loggly provide this out
of the box.
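To illustrate point 2: even a common, well-documented format means hand-maintaining a regex per pattern. A minimal sketch for Apache "combined" access-log lines (the field names here are my own choice, not from any library):

```python
import re

# Regex for the Apache "combined" access-log format. Every other log
# pattern in the pipeline would need its own regex like this one,
# which is the per-pattern code Logstash's grok filters avoid.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_line(line)["status"])  # -> "200"
```

In a Spark job this function would be applied per record (e.g. inside a map), so the parsing burden multiplies with each distinct log format ingested.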

On the benefits side, I believe Spark might be more performant (although I
have yet to benchmark it), and being a general-purpose processing engine,
it might handle complex use cases where the out-of-the-box functionality
of log analysis tools is not sufficient (although I don't have any such
use case right now).

One option I was considering was to use Logstash for collection and basic
processing, and then sink the processed logs to both Elasticsearch and
Kafka, so that Spark Streaming can pick up data from Kafka for the complex
use cases, while Logstash filters handle the simpler ones.
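As a sketch, the dual-sink part of that pipeline could be a Logstash output section along these lines (the hosts and the topic name "logs" are placeholders for your environment):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]    # simple use cases: query via Elasticsearch/Kibana
  }
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "logs"             # complex use cases: Spark Streaming subscribes here
  }
}
```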

I was wondering if someone has already done this evaluation and could
offer some pointers on how (or whether) to build this pipeline with Spark.

Regards,
Ashish
