On 31 Mar 2016, at 09:37, ashish rawat <[email protected]> wrote:
> Hi, I have been evaluating Spark for analysing application and server logs. I believe there are some downsides of doing this:
>
> 1. No direct mechanism of collecting logs, so we need to introduce other tools like Flume into the pipeline.

You need something to collect logs no matter what you run. Flume isn't so bad; if you bring it up on the same host as the app, you can even collect logs while the network is playing up. Or you can simply copy log4j files to HDFS and process them later.

> 2. Need to write lots of code for parsing different patterns from the logs, while log analysis tools like Logstash or Loggly provide this out of the box.

Log parsing is essentially an ETL problem, especially if you don't try to lock down the log event format. You can also configure Log4J to save events in an easy-to-parse format and/or forward them directly to your application. There's a log4j-to-flume connector to do that for you (http://www.thecloudavenue.com/2013/11/using-log4jflume-to-log-application.html), or you can output in, say, JSON (https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/log/Log4Json.java). I'd go with Flume unless you need to save the logs locally and copy them to HDFS later.

> On the benefits side, I believe Spark might be more performant (although I am yet to benchmark it) and, being a generic processing engine, might work with complex use cases where the out-of-the-box functionality of log analysis tools is not sufficient (although I don't have any such use case right now).
>
> One option I was considering was to use Logstash for collection and basic processing, then sink the processed logs to both Elasticsearch and Kafka, so that Spark Streaming can pick up data from Kafka for the complex use cases while Logstash filters handle the simpler ones.
>
> I was wondering if someone has already done this evaluation and could give me some pointers on how/if to create this pipeline with Spark.
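On the parsing point under 2 above: here's a minimal sketch of the kind of extraction code this involves. The regex, field names, and sample line are my assumptions for a common access-log format, not anything your apps are guaranteed to emit; real application logs will need their own patterns, which is exactly the ETL work in question:

```python
import re

# Assumed pattern for a Common Log Format access line; app logs will differ.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    d = m.groupdict()
    d["status"] = int(d["status"])
    d["size"] = 0 if d["size"] == "-" else int(d["size"])
    return d

sample = '127.0.0.1 - - [31/Mar/2016:09:37:00 +0000] "GET /index.html HTTP/1.1" 200 2326'
print(parse_line(sample))
```

In Spark this function would just be mapped over an RDD of lines, with the `None` results filtered out; the hard part is maintaining one such pattern per log format.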
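And on the JSON suggestion: the Log4Json class linked above does this on the Java side. Purely as an illustration of the idea, here is a rough Python analogue (the class and field names are my own) — a formatter that emits one JSON object per line, so the downstream consumer parses with `json.loads` instead of per-format regexes:

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Illustrative formatter: render each record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

# Log into an in-memory buffer just to show the round trip.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("user %s logged in", "ashish")

# Downstream, each line parses with no custom regex at all.
event = json.loads(buf.getvalue())
print(event["message"])
```

The same structured-output principle is what makes the Log4J-to-Flume or Log4Json routes attractive: you pay the formatting cost once at the source instead of writing parsers per pattern.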
> Regards, Ashish
