Re: Spark or custom processor?

Conrad Crampton Thu, 02 Jun 2016 23:31:45 -0700

Andre, helpful comments – I did consider the logstash part you describe a few 
weeks ago for another use case, but I can see why you are using it for this. 
When MiNiFi comes along, I may consider using that to mimic your topology.
Thanks
Conrad

From: Andre <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, 3 June 2016 at 00:26
To: "[email protected]" <[email protected]>
Subject: Re: Spark or custom processor?

Conrad,

Your work stream is very similar to mine. NiFi will work OK by itself, without 
the need for Spark (keep the Spark option open, but for other types of processing).

What we do is:

Syslog -> local disk -> logstash-forwarder (tail) -> ListenLumberjack processor
(PR290 -  experimental and not yet merged) -> ParseSyslog -> BlackMagicStuff

The reason we do it this way is to decouple the data flow from blocking 
mechanisms such as RELP and Lumberjack; no matter what happens with NiFi 
cluster, you still have a copy of the data for replay.

This is particularly relevant in environments where you would use TCP syslog or 
any other protocol that can block if unable to push log messages (search for 
TCP syslog causing an outage to Atlassian cloud a few years ago).
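
To illustrate the replay point, here is a minimal sketch of forwarding from a file on local disk with a saved byte offset. It is not how logstash-forwarder or ListenLumberjack work internally; the function names, the checkpoint file and the `send` callable are all assumptions made up for this example. The idea is only that the offset advances after the downstream side accepts a line, so anything unsent stays on disk for the next attempt.

```python
import os


def read_offset(checkpoint_path):
    """Return the last acknowledged byte offset, or 0 if no checkpoint exists."""
    try:
        with open(checkpoint_path) as fh:
            return int(fh.read().strip() or 0)
    except FileNotFoundError:
        return 0


def write_offset(checkpoint_path, offset):
    """Persist the offset via a temp file and rename, to avoid a partial write."""
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w") as fh:
        fh.write(str(offset))
    os.replace(tmp, checkpoint_path)


def forward_new_lines(log_path, checkpoint_path, send):
    """Forward complete lines written since the checkpoint.

    `send` is a hypothetical callable that returns True once the downstream
    side has accepted the line. The offset only advances on success, so
    unsent data stays on disk and is retried on the next call.
    """
    offset = read_offset(checkpoint_path)
    forwarded = 0
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        while True:
            line = fh.readline()
            if not line.endswith(b"\n"):
                break  # EOF or a partially written line; try again later
            if not send(line.decode("utf-8", errors="replace").rstrip("\n")):
                break  # downstream unavailable; keep the offset where it is
            offset = fh.tell()
            forwarded += 1
    write_offset(checkpoint_path, offset)
    return forwarded
```

Calling `forward_new_lines` on a timer keeps the writer (syslog to disk) independent of the reader, so a stalled receiver delays delivery but does not block the application that is logging.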

We are not an Internet-scale shop, but we still have enough logs to make a SIEM 
suffer, and in our experience NiFi performs well.

For load balancing, any session-based TCP load balancer will help you utilise 
all your NiFi nodes.

Cheers
On 2 Jun 2016 23:28, "Conrad Crampton" <[email protected]> wrote:
Hi,
ListenSyslog (using the approach that is currently being discussed in another 
thread – ListenSyslog running on the primary node as an RPG, all other nodes 
connecting to the port that the RPG exposes).
Various enrichment, routing on attributes etc. and finally into HDFS as Avro.
I want to branch off at an appropriate point in the flow and do some further 
realtime analysis – I have the output port feeding a Spark process working 
fine (notwithstanding the issue that you have been so kind to help with 
previously with the SSLContext), and am just wondering whether this is the most 
appropriate solution.

I have dabbled with a custom processor (for URL splitting, enriching etc. – 
in hindsight I could probably have done it with the ExecuteScript processor) so 
I am comfortable with going this route if that is deemed more appropriate.

Thanks
Conrad

From: Bryan Bende <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, 2 June 2016 at 13:12
To: "[email protected]" <[email protected]>
Subject: Re: Spark or custom processor?

Conrad,

I would think that you could do this all in NiFi.

How do the log files come into NiFi? TailFile, ListenUDP/ListenTCP, 
List+FetchFile?

-Bryan

On Thu, Jun 2, 2016 at 6:41 AM, Conrad Crampton <[email protected]> wrote:
Hi,
Any advice on the ‘best’ architectural approach where some processing function 
has to be applied to every flow file in a dataflow, with some (possible) output 
based on flowfile content?
e.g. inspect log files for specific ip then send message to syslog
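
The example use case above (inspect log lines for a specific IP, then emit a syslog message) can be sketched in a few lines, independent of whether it ends up in Spark or in a processor. This is only an illustration: the watch list, the function names and the message text are assumptions, and the only syslog detail relied on is the priority value, PRI = facility * 8 + severity.

```python
import re

# Hypothetical watch list; 203.0.113.7 is a documentation-range address.
WATCHED_IPS = {"203.0.113.7"}

# Matches dotted-quad IPv4 candidates; it does not validate octet ranges.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_watched_ips(line, watched=WATCHED_IPS):
    """Return the watched IPs that appear in one log line, in order of appearance."""
    return [ip for ip in IPV4_RE.findall(line) if ip in watched]


def build_alert(line, ip, facility=4, severity=4):
    """Build a minimal syslog-style payload: <PRI> followed by free text."""
    pri = facility * 8 + severity  # syslog priority value
    return f"<{pri}>watched ip {ip} seen: {line.strip()}"


def alerts_for(lines, watched=WATCHED_IPS):
    """Yield one alert per (line, watched IP) match."""
    for line in lines:
        for ip in find_watched_ips(line, watched):
            yield build_alert(line, ip)
```

The per-line work is a regex scan and a set lookup with no shared state, so it can be run on each flow file's content independently.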

Approach 1
Spark
Output port from NiFi -> Spark listens to that stream -> processes and outputs 
accordingly
Advantages – scale spark job on Yarn, decoupled (reusable) from NiFi
Disadvantages – adds complexity, decoupled from NiFi.

Approach 2
NiFi
Custom processor -> PutSyslog
Advantages – reuse existing NiFi processors/ capability, obvious flow (design 
intent)
Disadvantages – scale??

Any comments/ advice/ experience of either approach?

Thanks
Conrad

