Regarding why I didn't choose to load data with the flatfile loader script...
I want to be able to SEND enrichment data to Metron rather than have to set up
cron jobs to PULL data. At the moment I'm trying to prove that the process
works with a simple data source. In the future we will want enrichment data in
Metron that comes from systems (e.g. HR databases) that I won't have access to,
so we will need someone else to be able to send us the data.
> Carolyn: just call the flat file loader from a script processor...
I didn't believe that would work in my environment. I'm pretty sure the script
has dependencies on various Metron JARs, not least for the row id hashing
algorithm. I suppose this would require at least a partial install of Metron
alongside NiFi, and would introduce additional work on the NiFi cluster for any
Metron upgrade. In some (enterprise) environments there might be separation of
ownership between NiFi and Metron.
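To illustrate why the row key is the sticking point: as far as I understand it, Metron's enrichment rows are keyed with a hash-derived "salt" prefix ahead of the enrichment type and indicator (check `EnrichmentKey` in the Metron source for the real layout). Any writer outside Metron has to reproduce those bytes exactly, or enrichment lookups silently miss. The sketch below is loosely modeled on that idea but is NOT Metron's implementation: the name `hypothetical_row_key`, the use of MD5, the 16-byte prefix and the field order are all assumptions for illustration.

```python
import hashlib
import struct

PREFIX_LEN = 16  # ASSUMPTION: length of the salt prefix, for illustration only


def _write_utf(s: str) -> bytes:
    """Big-endian 2-byte length followed by the string bytes, in the style of
    Java's DataOutputStream.writeUTF (identical for ASCII text without NULs)."""
    data = s.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("string too long for a 2-byte length prefix")
    return struct.pack(">H", len(data)) + data


def hypothetical_row_key(enrichment_type: str, indicator: str) -> bytes:
    """HYPOTHETICAL salted row key: hash(indicator) prefix + type + indicator.

    MD5 stands in for whatever hash Metron really uses. If the hash, seed,
    prefix length or field order differ by even one byte, Metron will not
    find rows written with this key.
    """
    prefix = hashlib.md5(indicator.encode("utf-8")).digest()[:PREFIX_LEN]
    return prefix + _write_utf(enrichment_type) + _write_utf(indicator)


if __name__ == "__main__":
    key = hypothetical_row_key("dns", "server.domain.local")
    print(len(key), key.hex())
```

The hash prefix exists to spread keys evenly across HBase regions; the cost is that the key can no longer be derived by anyone who doesn't share the exact algorithm, which is why the flatfile loader pulls in Metron JARs.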
I would also prefer not to have a Java app calling a bash script that starts a
new Java process, where logs or error output might just be swallowed up
invisibly. Somewhere down the line this could hamper troubleshooting.
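If a script-based approach were used anyway, the minimum guard would be to capture the child process's output and fail loudly on a non-zero exit. A minimal sketch in plain Python 3: `run_loader` is a hypothetical name, and no specific flatfile loader flags are assumed; the command is whatever loader you choose.

```python
import subprocess
import sys
from typing import Sequence


def run_loader(cmd: Sequence[str], timeout: float = 600.0) -> str:
    """Run an external command and fail loudly instead of swallowing its output.

    Returns the child's stdout on success; raises RuntimeError carrying the
    child's exit code and stderr on failure, so the cause reaches the caller.
    A hung child raises subprocess.TimeoutExpired rather than blocking forever.
    """
    # capture_output=True collects both streams; text=True decodes them to str.
    result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command; a real deployment would run the chosen loader script here.
    print(run_loader([sys.executable, "-c", "print('loader ok')"]), end="")
```

Raising rather than logging and carrying on means the failure propagates to whatever invoked the script, where it can be routed or alerted on, instead of disappearing into a log nobody reads.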
> Simon: I have actually written a stellar processor, which applies stellar to
> all FlowFile attributes...
Gulp.
> Simon: what didn't you like about the flatfile loader script?
The flatfile loader script has worked fine for me when prepping enrichment data
in test systems; however, it was a bit of a chore to get the JSON configuration
files set up, especially for "wide" data sources that may have 15-20 fields,
e.g. Active Directory.
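One way to take the chore out of wide sources is to generate the extractor JSON from the CSV header rather than hand-writing 15-20 column mappings. A minimal sketch, assuming the CSV extractor config shape described in Metron's data-loading docs (a `config` object holding `columns`, `indicator_column`, `type` and `separator`, plus `"extractor": "CSV"`); check that shape against your Metron version. The helper name and the example column names are hypothetical.

```python
import json


def csv_extractor_config(columns, indicator_column, enrichment_type, separator=","):
    """Build a flatfile-loader-style CSV extractor config from an ordered column list."""
    if len(set(columns)) != len(columns):
        raise ValueError("duplicate column names")
    if indicator_column not in columns:
        raise ValueError(f"indicator column {indicator_column!r} not in {columns}")
    return {
        "config": {
            # Map each column name to its zero-based position in the CSV row.
            "columns": {name: i for i, name in enumerate(columns)},
            "indicator_column": indicator_column,
            "type": enrichment_type,
            "separator": separator,
        },
        "extractor": "CSV",
    }


if __name__ == "__main__":
    # HYPOTHETICAL Active Directory export header, in file order.
    header = ["sAMAccountName", "displayName", "department", "mail"]
    print(json.dumps(csv_extractor_config(header, "sAMAccountName", "ad_user"), indent=2))
```

Deriving positions from the header also removes a common source of off-by-one mistakes when a column is added to the export.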
More broadly, I want to embrace the streaming data paradigm and avoid batch
jobs. With the DNS example, you might imagine a future where the enrichment
data is streamed based on DHCP registrations, DNS update events, etc. In
principle this could shrink the window of time during which we might enrich a
data source with out-of-date data.
Charlie
-----Original Message-----
From: Carolyn Duby [mailto:cd...@hortonworks.com]
Sent: 12 June 2018 20:33
To: dev@metron.apache.org
Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
I like the streaming enrichment solutions, but it depends on how you are getting
the data in. If you get the data in a CSV file, just call the flat file loader
from a script processor. No special NiFi required.
If the enrichments don’t arrive in bulk, the streaming solution is better.
Thanks
Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584
On 6/12/18, 1:08 PM, "Simon Elliston Ball" wrote:
>Good solution. The streaming enrichment writer makes a lot of sense for
>this, especially if you're not using huge enrichment sources that need
>the batch based loaders.
>
>As it happens I have written most of a NiFi processor to handle this
>use case directly - both non-record and Record based, especially for Otto :).
>The one thing we need to figure out now is where to host that, and how
>to handle releases of a nifi-metron-bundle. I'll probably get round to
>putting the code in my github at least in the next few days, while we
>figure out a more permanent home.
>
>Charlie, out of curiosity, what didn't you like about the flatfile
>loader script?
>
>Simon
>
>On 12 June 2018 at 18:00, Charles Joynt wrote:
>
>> Thanks for the responses. I appreciate the willingness to look at
>> creating a NiFi processor. That would be great!
>>
>> Just to follow up on this (after a week looking after the "ops" side
>> of dev-ops): I really don't want to have to use the flatfile loader
>> script, and I'm not going to be able to write a Metron-style HBase
>> key generator any time soon, but I have had some success with a
>> different approach.
>>
>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>> 2. Send this to an HTTP listener in NiFi
>> 3. Write to a Kafka topic
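Steps 1-3 amount to pushing one CSV line per event to NiFi over HTTP. A minimal client-side sketch, assuming a NiFi ListenHTTP processor using its default base path `contentListener` on a hypothetical host and port; the helper names are hypothetical too. It sends one record per request so that each FlowFile, and typically each Kafka message, holds a single CSV line.

```python
import csv
import io
import urllib.request

# HYPOTHETICAL endpoint: a NiFi ListenHTTP processor with its default
# base path "contentListener"; host and port are placeholders.
NIFI_URL = "http://nifi.example.local:8081/contentListener"


def to_csv_line(fields) -> str:
    """Render one record with every field quoted, e.g.
    "server.domain.local","A","192.168.0.198" (no trailing newline)."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(fields)
    return buf.getvalue().rstrip("\r\n")


def build_request(fields, url: str = NIFI_URL) -> urllib.request.Request:
    """Build, but do not send, a POST carrying a single CSV record."""
    return urllib.request.Request(
        url,
        data=to_csv_line(fields).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/csv"},
    )


def push(fields, url: str = NIFI_URL, timeout: float = 10.0) -> int:
    """Send one record and return the HTTP status.

    urlopen raises HTTPError on 4xx/5xx responses, so failures are not silent.
    """
    with urllib.request.urlopen(build_request(fields, url), timeout=timeout) as resp:
        return resp.status
```

A source system (a DNS update hook, say) would call `push(["server.domain.local", "A", "192.168.0.198"])` once per changed record. Batching several lines into one POST is possible, but the flow would then likely need a split step (e.g. SplitText) before publishing to Kafka, so that the parser still sees one CSV line per message.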
>>
>> I then followed your instructions in this blog post:
>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>
>> 4. Create a new "dns" sensor in Metron
>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, with parserConfig
>> settings, to push this into HBase:
>>
>> {
>> "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>> "writerClassName": &q