Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-12 Thread Carolyn Duby
I like the streaming enrichment solutions, but it depends on how you are getting 
the data in.  If you get the data in a CSV file, just call the flat file loader 
from a script processor.  No special NiFi required.

If the enrichments don’t arrive in bulk, the streaming solution is better.

Thanks
Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com 

On 6/12/18, 1:08 PM, "Simon Elliston Ball"  wrote:

>Good solution. The streaming enrichment writer makes a lot of sense for
>this, especially if you're not using huge enrichment sources that need the
>batch based loaders.
>
>As it happens I have written most of a NiFi processor to handle this use
>case directly - both non-record and Record based, especially for Otto :).
>The one thing we need to figure out now is where to host that, and how to
>handle releases of a nifi-metron-bundle. I'll probably get round to putting
>the code in my github at least in the next few days, while we figure out a
>more permanent home.
>
>Charlie, out of curiosity, what didn't you like about the flatfile loader
>script?
>
>Simon
>
>On 12 June 2018 at 18:00, Charles Joynt 
>wrote:
>
>> Thanks for the responses. I appreciate the willingness to look at creating
>> a NiFi processer. That would be great!
>>
>> Just to follow up on this (after a week looking after the "ops" side of
>> dev-ops): I really don't want to have to use the flatfile loader script,
>> and I'm not going to be able to write a Metron-style HBase key generator
>> any time soon, but I have had some success with a different approach.
>>
>> 1. Generate data in CSV format, e.g. "server.domain.local","A","
>> 192.168.0.198"
>> 2. Send this to a HTTP listener in NiFi
>> 3. Write to a kafka topic
>>
>> I then followed your instructions in this blog:
>> https://cwiki.apache.org/confluence/display/METRON/
>> 2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>
>> 4. Create a new "dns" sensor in Metron
>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig
>> settings to push this into HBase:
>>
>> {
>> "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>> "writerClassName": "org.apache.metron.enrichment.writer.
>> SimpleHbaseEnrichmentWriter",
>> "sensorTopic": "dns",
>> "parserConfig": {
>> "shew.table": " dns",
>> "shew.cf": "dns",
>> "shew.keyColumns": "name",
>> "shew.enrichmentType": "dns",
>> "columns": {
>> "name": 0,
>> "type": 1,
>> "data": 2
>> }
>> },
>> }
>>
>> And... it seems to be working. At least, I have data in HBase which looks
>> more like the output of the flatfile loader.
>>
>> Charlie
>>
>> -Original Message-
>> From: Casey Stella [mailto:ceste...@gmail.com]
>> Sent: 05 June 2018 14:56
>> To: dev@metron.apache.org
>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>
>> The problem, as you correctly diagnosed, is the key in HBase.  We
>> construct the key very specifically in Metron, so it's unlikely to work out
>> of the box with the NiFi processor unfortunately.  The key that we use is
>> formed here in the codebase:
>> https://github.com/cestella/incubator-metron/blob/master/
>> metron-platform/metron-enrichment/src/main/java/org/
>> apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>
>> To put that in english, consider the following:
>>
>>- type - The enrichment type
>>- indicator - the indicator to use
>>- hash(*) - A murmur 3 128bit hash function
>>
>> the key is hash(indicator) + type + indicator
>>
>> This hash prefixing is a standard practice in hbase key design that allows
>> the keys to be uniformly distributed among the regions and prevents
>> hotspotting.  Depending on how the PutHBaseJSON processor works, if you can
>> construct the key and pass it in, then you might be able to either
>> construct the key in NiFi or write a processor to construct the key.
>> Ultimately though, what Carolyn said is true..the easiest approach is
>> probably using the flatfile loader.
>> If you do get this working in NiFi, however, do please let us know and/or
>> consider contributing it back to the project as a PR :)
>>
>>
>>
>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
>> charles.jo...@gresearch.co.uk>
>> wrote:
>>
>> > Hello,
>> >
>> > I work as a Dev/Ops Data Engineer within the security team at a
>> > company in London where we are in the process of implementing Metron.
>> > I have been tasked with implementing feeds of network environment data
>> > into HBase so that this data can be used as enrichment sources for our
>> s

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-12 Thread Simon Elliston Ball
Good solution. The streaming enrichment writer makes a lot of sense for
this, especially if you're not using huge enrichment sources that need the
batch-based loaders.

As it happens, I have written most of a NiFi processor to handle this use
case directly, both non-Record and Record-based, especially for Otto :).
The one thing we need to figure out now is where to host it, and how to
handle releases of a nifi-metron-bundle. I'll probably get round to putting
the code on my GitHub at least in the next few days, while we figure out a
more permanent home.

Charlie, out of curiosity, what didn't you like about the flatfile loader
script?

Simon

On 12 June 2018 at 18:00, Charles Joynt wrote:

RE: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-12 Thread Charles Joynt
Thanks for the responses. I appreciate the willingness to look at creating a 
NiFi processor. That would be great!

Just to follow up on this (after a week looking after the "ops" side of 
dev-ops): I really don't want to have to use the flatfile loader script, and 
I'm not going to be able to write a Metron-style HBase key generator any time 
soon, but I have had some success with a different approach.

1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
2. Send this to an HTTP listener in NiFi
3. Write to a Kafka topic
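Steps 1-3 can be sketched as follows. This is a hypothetical illustration, not taken from the thread: the NiFi host, port, and ListenHTTP path are assumptions, and the Kafka step happens inside NiFi after the listener receives the record.

```python
import urllib.request

# Hypothetical endpoint for a NiFi ListenHTTP processor; adjust host/port/path
# to your own flow.
NIFI_URL = "http://nifi.example.local:8081/contentListener"

def build_csv_line(name: str, rtype: str, data: str) -> bytes:
    # Match the quoted-CSV shape from step 1:
    # "server.domain.local","A","192.168.0.198"
    return f'"{name}","{rtype}","{data}"\n'.encode("utf-8")

def post_record(name: str, rtype: str, data: str) -> int:
    # POST one CSV record to the NiFi HTTP listener (step 2); the NiFi flow
    # then routes the flowfile on to a Kafka topic (step 3).
    req = urllib.request.Request(
        NIFI_URL,
        data=build_csv_line(name, rtype, data),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```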

I then followed your instructions in this blog:
https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment

4. Create a new "dns" sensor in Metron
5. Use the CSVParser and SimpleHbaseEnrichmentWriter, with parserConfig settings 
to push this into HBase:

{
  "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
  "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
  "sensorTopic": "dns",
  "parserConfig": {
    "shew.table": "dns",
    "shew.cf": "dns",
    "shew.keyColumns": "name",
    "shew.enrichmentType": "dns",
    "columns": {
      "name": 0,
      "type": 1,
      "data": 2
    }
  }
}
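For reference, the "columns" mapping in the parserConfig above assigns CSV positions to field names. A minimal sketch of how such a mapping parses one record (plain Python for illustration, not Metron's actual CSVParser):

```python
import csv
import io

# Mirrors the "columns" block of the parserConfig above.
COLUMNS = {"name": 0, "type": 1, "data": 2}

def parse_enrichment_row(line: str) -> dict:
    # Split one quoted-CSV record and map column positions to field names.
    fields = next(csv.reader(io.StringIO(line)))
    return {field: fields[idx] for field, idx in COLUMNS.items()}

row = parse_enrichment_row('"server.domain.local","A","192.168.0.198"')
# row == {"name": "server.domain.local", "type": "A", "data": "192.168.0.198"}
```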

And... it seems to be working. At least, I have data in HBase which looks more 
like the output of the flatfile loader.

Charlie

-Original Message-
From: Casey Stella [mailto:ceste...@gmail.com] 
Sent: 05 June 2018 14:56
To: dev@metron.apache.org
Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON

The problem, as you correctly diagnosed, is the key in HBase.  We construct the 
key very specifically in Metron, so it's unlikely to work out of the box with 
the NiFi processor unfortunately.  The key that we use is formed here in the 
codebase:
https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51

To put that in English, consider the following:

   - type - the enrichment type
   - indicator - the indicator to use
   - hash(*) - a Murmur3 128-bit hash function

The key is hash(indicator) + type + indicator.
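That layout can be sketched in Python. This is a hypothetical illustration only: hashlib's MD5 stands in for the Murmur3 128-bit hash Metron actually uses (Murmur3 needs a third-party package such as mmh3), and the real EnrichmentKey serialization may frame the type and indicator strings differently.

```python
import hashlib

def enrichment_row_key(enrichment_type: str, indicator: str) -> bytes:
    # hash(indicator) + type + indicator, per the description above.
    # MD5 (16 bytes) is a stand-in for Metron's Murmur3 128-bit hash.
    prefix = hashlib.md5(indicator.encode("utf-8")).digest()
    return prefix + enrichment_type.encode("utf-8") + indicator.encode("utf-8")

key = enrichment_row_key("dns", "server.domain.local")
# 16-byte hash prefix, then "dns", then the indicator itself.
```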

This hash prefixing is a standard practice in HBase key design: it lets the 
keys be uniformly distributed among the regions and prevents hotspotting.
Depending on how the PutHBaseJSON processor works, if you can construct the key 
and pass it in, then you might be able to either construct the key in NiFi or 
write a processor to construct it.
Ultimately, though, what Carolyn said is true: the easiest approach is probably 
using the flatfile loader.
If you do get this working in NiFi, however, do please let us know and/or 
consider contributing it back to the project as a PR :)



On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt wrote:

> Hello,
>
> I work as a Dev/Ops Data Engineer within the security team at a 
> company in London, where we are in the process of implementing Metron. 
> I have been tasked with implementing feeds of network environment data 
> into HBase so that this data can be used as enrichment sources for our 
> security events.
> First off, I wanted to pull in DNS data for an internal domain.
>
> I am assuming that I need to write data into HBase in such a way that 
> it exactly matches what I would get from the flatfile_loader.sh 
> script. A colleague of mine has already loaded some DNS data using 
> that script, so I am using that as a reference.
>
> I have implemented a flow in NiFi which takes JSON data from a HTTP 
> listener and routes it to a PutHBaseJSON processor. The flow is 
> working, in the sense that data is successfully written to HBase, but 
> despite (naively) specifying "Row Identifier Encoding Strategy = 
> Binary", the results in HBase don't look correct. Comparing the output 
> from HBase scan commands, I see:
>
> flatfile_loader.sh produced:
>
> ROW:
> \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\
> x0E192.168.0.198
> CELL: column=data:v, timestamp=1516896203840, 
> value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>
> PutHBaseJSON produced:
>
> ROW:  server.domain.local
> CELL: column=dns:v, timestamp=1527778603783, 
> value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>
> From source JSON:
>
>
> {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A
> ","data":"192.168.0.198"}}
>
> I know that there are some differences in column family / field names, 
> but my worry is the ROW id. Presumably I need to encode my row key ("k" 
> in the JSON data) in a way that matches how the flatfile_loader.sh 
> script did it.
>
> Can anyone explain how I might convert my Id to the correct format?
> -or-
> Does this matter-can Metron use the human-read