Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-13 Thread Simon Elliston Ball
To be honest, I'm not convinced we should be writing Jiras against the Metron
project, or the NiFi project, if we don't know where this is actually going to
end up. In any case, here is working code:
https://github.com/simonellistonball/metron/tree/nifi/nifi-metron-bundle
It currently lives in a Metron fork, for no particular reason. It still needs
proper tests, docs and all that jazz, but as a PoC it works, scales, and is
moderately robust as long as HBase doesn't fall over too much.

Simon

On 13 June 2018 at 15:24, Otto Fowler  wrote:

> Do we even have a Jira? If not, maybe Carolyn et al. can write one up that
> lays out some requirements and context.

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-13 Thread Otto Fowler
-jiras-


On June 13, 2018 at 10:30:26, Simon Elliston Ball (si...@simonellistonball.com) wrote:

That’s where something like the NiFi solution would come in...

With the PutEnrichment processor and a HandleHttpRequest processor, you do
have a web service for loading enrichments.

We could probably also create a REST service endpoint for it, which would
make some sense, but there is a nice multi-source, queuing, and lineage
element to the NiFi solution.

Simon

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-13 Thread Otto Fowler
Do we even have a Jira? If not, maybe Carolyn et al. can write one up that
lays out some requirements and context.


On June 13, 2018 at 10:04:27, Casey Stella (ceste...@gmail.com) wrote:

no, sadly we do not.

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-13 Thread Simon Elliston Ball
That’s where something like the NiFi solution would come in...

With the PutEnrichment processor and a HandleHttpRequest processor, you do
have a web service for loading enrichments.

We could probably also create a REST service endpoint for it, which would make
some sense, but there is a nice multi-source, queuing, and lineage element to
the NiFi solution.

Simon 
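
To make the "web service for loading enrichments" idea concrete from the sending side, here is a minimal Python client sketch. It assumes a NiFi flow that begins with an HTTP-receiving processor (for example HandleHttpRequest paired with HandleHttpResponse, or ListenHTTP) and ends in something like the PutEnrichment processor from Simon's bundle. The URL, port, and JSON field names (`indicator`, `type`, `values`) are hypothetical and must be adapted to whatever the flow actually expects.

```python
import json
import urllib.request

# Hypothetical endpoint: host, port and path depend entirely on how the
# HTTP-receiving processor in your NiFi flow is configured.
NIFI_ENRICHMENT_URL = "http://nifi.example.local:8081/enrichments"


def build_enrichment_request(indicator: str, enrichment_type: str, values: dict,
                             url: str = NIFI_ENRICHMENT_URL) -> urllib.request.Request:
    """Build (but do not send) a POST carrying one enrichment record as JSON.

    The payload field names are illustrative only, not a documented schema.
    """
    payload = {"indicator": indicator, "type": enrichment_type, "values": values}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code the flow responded with."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    req = build_enrichment_request(
        "server.domain.local", "dns", {"type": "A", "data": "192.168.0.198"}
    )
    print(req.get_method(), req.full_url)
    # send(req)  # only once a NiFi flow is actually listening at the URL above
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a running NiFi instance.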

> On 13 Jun 2018, at 15:04, Casey Stella  wrote:
> 
> no, sadly we do not.
> 
>> On Wed, Jun 13, 2018 at 10:01 AM Carolyn Duby  wrote:
>> 
>> Agreed….Streaming enrichments is the right solution for DNS data.
>> 
>> Do we have a web service for writing enrichments?

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-13 Thread Casey Stella
no, sadly we do not.

On Wed, Jun 13, 2018 at 10:01 AM Carolyn Duby  wrote:

> Agreed….Streaming enrichments is the right solution for DNS data.
>
> Do we have a web service for writing enrichments?

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-13 Thread Carolyn Duby
Agreed… Streaming enrichments are the right solution for DNS data.

Do we have a web service for writing enrichments?

Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com 
<https://community.hortonworks.com/answers/index.html>

On 6/13/18, 6:25 AM, "Charles Joynt"  wrote:

>Regarding why I didn't choose to load data with the flatfile loader script...
>
>I want to be able to SEND enrichment data to Metron rather than have to set up 
>cron jobs to PULL data. At the moment I'm trying to prove that the process 
>works with a simple data source. In the future we will want enrichment data in 
>Metron that comes from systems (e.g. HR databases) that I won't have access 
>to, hence will need someone to be able to send us the data.

RE: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-13 Thread Charles Joynt
Regarding why I didn't choose to load data with the flatfile loader script...

I want to be able to SEND enrichment data to Metron rather than have to set up 
cron jobs to PULL data. At the moment I'm trying to prove that the process 
works with a simple data source. In the future we will want enrichment data in 
Metron that comes from systems (e.g. HR databases) that I won't have access to, 
so we will need someone to be able to send us the data.

> Carolyn: just call the flat file loader from a script processor...

I didn't think that would work in my environment. I'm pretty sure the script 
has dependencies on various Metron JARs, not least for the row ID hashing 
algorithm. I suppose this would require at least a partial install of Metron 
alongside NiFi, and would add work on the NiFi cluster for every Metron 
upgrade. In some (enterprise) environments, NiFi and Metron might also be 
owned by separate teams.
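
The row-key point is worth spelling out. Whatever writes an enrichment and whatever later looks it up must compute byte-identical HBase row keys, so a loader outside Metron has to reproduce Metron's key function exactly, or load the JARs that implement it. The Python sketch below illustrates only that general point. Both key functions are hypothetical stand-ins and are deliberately not Metron's actual encoding. Each builds a hash prefix (a common HBase technique for spreading similar keys across regions) followed by the type and the indicator.

```python
import hashlib


def stand_in_row_key(enrichment_type: str, indicator: str) -> bytes:
    """Hypothetical hash-prefixed HBase row key (NOT Metron's real format).

    A 16-byte hash of the indicator goes first so that lexically similar
    indicators (e.g. hostnames in one domain) spread across regions; the
    type and indicator follow so the key stays unique and readable.
    """
    salt = hashlib.md5(indicator.encode("utf-8")).digest()  # 16 bytes
    return salt + enrichment_type.encode("utf-8") + b"\x00" + indicator.encode("utf-8")


def slightly_different_row_key(enrichment_type: str, indicator: str) -> bytes:
    """Same layout with a different hash: enough to make every lookup miss."""
    salt = hashlib.sha1(indicator.encode("utf-8")).digest()[:16]
    return salt + enrichment_type.encode("utf-8") + b"\x00" + indicator.encode("utf-8")


if __name__ == "__main__":
    written = stand_in_row_key("dns", "server.domain.local")
    looked_up = slightly_different_row_key("dns", "server.domain.local")
    print(written == looked_up)  # False: the enrichment would never be found
```

The write succeeds either way. The failure only shows up later, as enrichment lookups that silently return nothing.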

I also prefer not to have a Java app calling a bash script which calls a new 
Java process, with logs or error output that might just get swallowed up 
invisibly. Somewhere down the line this could hinder effective troubleshooting.
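
If someone does take the script route, one way to address the swallowed-errors concern is a wrapper that captures the child process's exit code and stderr and turns any failure into a visible error. Below is a generic Python sketch of that idea. It assumes the wrapper runs under ordinary CPython; NiFi's scripting processors do not necessarily use that runtime, so this is not a drop-in processor. The real loader path and arguments are deliberately omitted because they depend on the Metron install.

```python
import subprocess
import sys
from typing import Sequence


class LoaderFailed(RuntimeError):
    """Raised when the external loader exits non-zero; carries its stderr."""


def run_loader(cmd: Sequence[str], timeout: float = 600.0) -> str:
    """Run an external command and return its stdout, or raise with stderr attached.

    Capturing both streams and checking the exit code turns a failure in the
    child process into an error the caller (and its logs) can see.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise LoaderFailed(
            f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command; replace with the real loader invocation for your install.
    print(run_loader([sys.executable, "-c", "print('loader ok')"]), end="")
```

This does not remove the extra process hops Charles objects to. It only ensures that a failure somewhere in the chain is reported rather than lost.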

> Simon: I have actually written a stellar processor, which applies stellar to 
> all FlowFile attributes...

Gulp.

> Simon: what didn't you like about the flatfile loader script?

The flatfile loader script has worked fine for me when prepping enrichment data 
in test systems; however, it was a bit of a chore to get the JSON configuration 
files set up, especially for "wide" data sources that may have 15-20 fields, 
e.g. Active Directory.
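
For wide sources, much of the JSON chore is bookkeeping: every column needs a name and an index. A small generator can derive that mapping from a CSV header row. The sketch below assumes the CSV extractor layout used in Metron's flat-file loader examples: a `config` object with `columns`, `indicator_column`, `type` and `separator`, plus `"extractor": "CSV"`. Check this layout against the docs for your Metron version. The Active Directory-style column names in the usage are hypothetical.

```python
import json
from typing import List


def build_csv_extractor_config(header: List[str], indicator_column: str,
                               enrichment_type: str, separator: str = ",") -> dict:
    """Derive a flat-file-loader style CSV extractor config from a header row.

    Column indexes follow header order, so adding a 20th field means editing
    the header rather than hand-maintaining a JSON index map.
    """
    if indicator_column not in header:
        raise ValueError(f"indicator column {indicator_column!r} not in header")
    if len(set(header)) != len(header):
        raise ValueError("duplicate column names in header")
    return {
        "config": {
            "columns": {name: index for index, name in enumerate(header)},
            "indicator_column": indicator_column,
            "type": enrichment_type,
            "separator": separator,
        },
        "extractor": "CSV",
    }


if __name__ == "__main__":
    # Hypothetical Active Directory export columns.
    header = ["sAMAccountName", "displayName", "department", "mail", "manager"]
    print(json.dumps(build_csv_extractor_config(header, "sAMAccountName", "ad_user"), indent=2))
```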

More broadly speaking, I want to embrace the streaming data paradigm and avoid 
batch jobs. With the DNS example, you might imagine a future where the 
enrichment data is streamed based on DHCP registrations, DNS update events, 
etc. In principle this could shrink the window of time in which we might enrich 
a data source with out-of-date data.

Charlie

-----Original Message-----
From: Carolyn Duby [mailto:cd...@hortonworks.com] 
Sent: 12 June 2018 20:33
To: dev@metron.apache.org
Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON

I like the streaming enrichment solutions but it depends on how you are getting 
the data in.  If you get the data in a csv file just call the flat file loader 
from a script processor.  No special Nifi required.

If the enrichments don’t arrive in bulk, the streaming solution is better.

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-12 Thread Carolyn Duby
I like the streaming enrichment solutions, but it depends on how you are getting 
the data in. If you get the data in a CSV file, just call the flat file loader 
from a script processor. No special NiFi required.

If the enrichments don’t arrive in bulk, the streaming solution is better.

Thanks
Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

On 6/12/18, 1:08 PM, "Simon Elliston Ball"  wrote:

>Good solution. The streaming enrichment writer makes a lot of sense for
>this, especially if you're not using huge enrichment sources that need the
>batch-based loaders.
>
>As it happens, I have written most of a NiFi processor to handle this use
>case directly, both non-record and Record based, especially for Otto :).
>The one thing we need to figure out now is where to host that, and how to
>handle releases of a nifi-metron-bundle. I'll probably get round to putting
>the code on my GitHub, at least, in the next few days while we figure out a
>more permanent home.
>
>Charlie, out of curiosity, what didn't you like about the flatfile loader
>script?
>
>Simon
>
>On 12 June 2018 at 18:00, Charles Joynt 
>wrote:
>
>> Thanks for the responses. I appreciate the willingness to look at creating
>> a NiFi processor. That would be great!
>>
>> Just to follow up on this (after a week looking after the "ops" side of
>> dev-ops): I really don't want to have to use the flatfile loader script,
>> and I'm not going to be able to write a Metron-style HBase key generator
>> any time soon, but I have had some success with a different approach.
>>
>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>> 2. Send this to a HTTP listener in NiFi
>> 3. Write to a kafka topic
>>
>> I then followed your instructions in this blog:
>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>
>> 4. Create a new "dns" sensor in Metron
>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig
>> settings to push this into HBase:
>>
>> {
>>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>>   "sensorTopic": "dns",
>>   "parserConfig": {
>>     "shew.table": "dns",
>>     "shew.cf": "dns",
>>     "shew.keyColumns": "name",
>>     "shew.enrichmentType": "dns",
>>     "columns": {
>>       "name": 0,
>>       "type": 1,
>>       "data": 2
>>     }
>>   }
>> }
>>
>> And... it seems to be working. At least, I have data in HBase which looks
>> more like the output of the flatfile loader.
>>
>> Charlie
>>
>> -Original Message-
>> From: Casey Stella [mailto:ceste...@gmail.com]
>> Sent: 05 June 2018 14:56
>> To: dev@metron.apache.org
>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>
>> The problem, as you correctly diagnosed, is the key in HBase.  We
>> construct the key very specifically in Metron, so it's unlikely to work out
>> of the box with the NiFi processor unfortunately.  The key that we use is
>> formed here in the codebase:
>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>
>> To put that in English, consider the following:
>>
>>    - type - The enrichment type
>>    - indicator - the indicator to use
>>    - hash(*) - A Murmur3 128-bit hash function
>>
>> the key is hash(indicator) + type + indicator
>>
>> This hash prefixing is a standard practice in HBase key design that allows
>> the keys to be uniformly distributed among the regions and prevents
>> hotspotting.  Depending on how the PutHBaseJSON processor works, if you can
>> construct the key and pass it in, then you

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-12 Thread Simon Elliston Ball
Good solution. The streaming enrichment writer makes a lot of sense for
this, especially if you're not using huge enrichment sources that need the
batch based loaders.

As it happens I have written most of a NiFi processor to handle this use
case directly - both non-record and Record based, especially for Otto :).
The one thing we need to figure out now is where to host that, and how to
handle releases of a nifi-metron-bundle. I'll probably get round to putting
the code in my github at least in the next few days, while we figure out a
more permanent home.

Charlie, out of curiosity, what didn't you like about the flatfile loader
script?

Simon

On 12 June 2018 at 18:00, Charles Joynt 
wrote:

> Thanks for the responses. I appreciate the willingness to look at creating
> a NiFi processor. That would be great!
>
> Just to follow up on this (after a week looking after the "ops" side of
> dev-ops): I really don't want to have to use the flatfile loader script,
> and I'm not going to be able to write a Metron-style HBase key generator
> any time soon, but I have had some success with a different approach.
>
> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
> 2. Send this to a HTTP listener in NiFi
> 3. Write to a kafka topic
>
> I then followed your instructions in this blog:
> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>
> 4. Create a new "dns" sensor in Metron
> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig
> settings to push this into HBase:
>
> {
>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>   "sensorTopic": "dns",
>   "parserConfig": {
>     "shew.table": "dns",
>     "shew.cf": "dns",
>     "shew.keyColumns": "name",
>     "shew.enrichmentType": "dns",
>     "columns": {
>       "name": 0,
>       "type": 1,
>       "data": 2
>     }
>   }
> }
>
> And... it seems to be working. At least, I have data in HBase which looks
> more like the output of the flatfile loader.
>
> Charlie
>
> -Original Message-
> From: Casey Stella [mailto:ceste...@gmail.com]
> Sent: 05 June 2018 14:56
> To: dev@metron.apache.org
> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>
> The problem, as you correctly diagnosed, is the key in HBase.  We
> construct the key very specifically in Metron, so it's unlikely to work out
> of the box with the NiFi processor unfortunately.  The key that we use is
> formed here in the codebase:
> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>
> To put that in English, consider the following:
>
>    - type - The enrichment type
>    - indicator - the indicator to use
>    - hash(*) - A Murmur3 128-bit hash function
>
> the key is hash(indicator) + type + indicator
>
> This hash prefixing is a standard practice in HBase key design that allows
> the keys to be uniformly distributed among the regions and prevents
> hotspotting.  Depending on how the PutHBaseJSON processor works, if you can
> construct the key and pass it in, then you might be able to either
> construct the key in NiFi or write a processor to construct the key.
> Ultimately though, what Carolyn said is true: the easiest approach is
> probably using the flatfile loader.
> If you do get this working in NiFi, however, do please let us know and/or
> consider contributing it back to the project as a PR :)
>
>
>
> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> charles.jo...@gresearch.co.uk>
> wrote:
>
> > Hello,
> >
> > I work as a Dev/Ops Data Engineer within the security team at a
> > company in London where we are in the process of implementing Metron.
> > I have been tasked with implementing feeds of network environment data
> > into HBase so that this data can be used as enrichment sources for our
> security events.
> > First-off I wanted to pull in DNS data for an internal domain.
> >
> > I am assuming that I need to write data into HBase in such a way that
> > it exactly matches what I woul

RE: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-12 Thread Charles Joynt
Thanks for the responses. I appreciate the willingness to look at creating a 
NiFi processor. That would be great!

Just to follow up on this (after a week looking after the "ops" side of 
dev-ops): I really don't want to have to use the flatfile loader script, and 
I'm not going to be able to write a Metron-style HBase key generator any time 
soon, but I have had some success with a different approach.

1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
2. Send this to a HTTP listener in NiFi
3. Write to a kafka topic
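Steps 1 and 2 could be driven by a small client like the Python sketch below. It is illustrative only: it assumes a NiFi ListenHTTP processor, and the host, port and `contentListener` path (assumed here to be ListenHTTP's default base path) are placeholders for whatever the flow actually uses. The column order simply mirrors the `name`, `type`, `data` indices in the parser config further down.

```python
import csv
import io
import urllib.request
from typing import Iterable, Sequence


def to_csv_lines(records: Iterable[Sequence[str]]) -> bytes:
    """Render (name, type, data) records as fully quoted CSV, one per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for record in records:
        writer.writerow(record)
    return buf.getvalue().encode("utf-8")


def build_post(url: str, body: bytes) -> urllib.request.Request:
    """Build, but do not send, a POST of CSV content to a NiFi HTTP listener."""
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/csv"}, method="POST"
    )


body = to_csv_lines([("server.domain.local", "A", "192.168.0.198")])
# Hypothetical endpoint: replace host, port and path with your ListenHTTP settings.
req = build_post("http://nifi.example.local:8081/contentListener", body)
```

Calling `urllib.request.urlopen(req)` would then send the record to NiFi.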

I then followed your instructions in this blog:
https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment

4. Create a new "dns" sensor in Metron
5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig settings 
to push this into HBase:

{
  "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
  "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
  "sensorTopic": "dns",
  "parserConfig": {
    "shew.table": "dns",
    "shew.cf": "dns",
    "shew.keyColumns": "name",
    "shew.enrichmentType": "dns",
    "columns": {
      "name": 0,
      "type": 1,
      "data": 2
    }
  }
}

And... it seems to be working. At least, I have data in HBase which looks more 
like the output of the flatfile loader.
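A quick way to sanity-check a parser config like this one is to load it as strict JSON and walk a sample record through the `columns` mapping, where each CSV field is named by its index. The Python sketch below only approximates that mapping; it is not Metron's CSVParser, and it assumes `shew.keyColumns` names a single column. For the sample record, the mapped fields line up with the value PutHBaseJSON stored in the original message.

```python
import csv
import json
from typing import Dict

# The parser config written as strict JSON (json.loads rejects trailing commas).
CONFIG_TEXT = """
{
  "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
  "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
  "sensorTopic": "dns",
  "parserConfig": {
    "shew.table": "dns",
    "shew.cf": "dns",
    "shew.keyColumns": "name",
    "shew.enrichmentType": "dns",
    "columns": {"name": 0, "type": 1, "data": 2}
  }
}
"""


def map_columns(line: str, columns: Dict[str, int]) -> Dict[str, str]:
    """Name each CSV field by its configured index (illustrative only)."""
    row = next(csv.reader([line]))
    return {name: row[index] for name, index in columns.items()}


config = json.loads(CONFIG_TEXT)
parser_config = config["parserConfig"]
fields = map_columns('"server.domain.local","A","192.168.0.198"', parser_config["columns"])
# Assumption: a single key column supplies the enrichment indicator.
indicator = fields[parser_config["shew.keyColumns"]]
print(parser_config["shew.enrichmentType"], indicator)  # dns server.domain.local
```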

Charlie

-Original Message-
From: Casey Stella [mailto:ceste...@gmail.com] 
Sent: 05 June 2018 14:56
To: dev@metron.apache.org
Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON

The problem, as you correctly diagnosed, is the key in HBase.  We construct the 
key very specifically in Metron, so it's unlikely to work out of the box with 
the NiFi processor unfortunately.  The key that we use is formed here in the 
codebase:
https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51

To put that in English, consider the following:

   - type - The enrichment type
   - indicator - the indicator to use
   - hash(*) - A Murmur3 128-bit hash function

the key is hash(indicator) + type + indicator

This hash prefixing is a standard practice in HBase key design that allows the
keys to be uniformly distributed among the regions and prevents hotspotting.
Depending on how the PutHBaseJSON processor works, if you can construct the key
and pass it in, then you might be able to either construct the key in NiFi or
write a processor to construct the key.
Ultimately though, what Carolyn said is true: the easiest approach is probably
using the flatfile loader.
If you do get this working in NiFi, however, do please let us know and/or 
consider contributing it back to the project as a PR :)
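To see the byte layout Casey describes, the row key can be sketched in Python. This is a hypothetical illustration, not Metron's `EnrichmentKey` code. It assumes the 16-byte prefix is MurmurHash3 x64 128-bit over the UTF-8 bytes of the indicator, and, judging by the `\x00\x05whois` in the flatfile_loader.sh scan output quoted below, that the type and indicator are each written with a 2-byte big-endian length prefix (the layout Java's `DataOutputStream.writeUTF` produces for ASCII text). The hash seed Metron uses is not given in this thread, so `seed=0` is only a placeholder; the prefix bytes will match Metron's only if the seed and hash variant match.

```python
import struct
from typing import Tuple

MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k: int) -> int:
    # Final avalanche step of MurmurHash3.
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def _mix_k1(k1: int) -> int:
    k1 = (k1 * C1) & MASK64
    return (_rotl64(k1, 31) * C2) & MASK64


def _mix_k2(k2: int) -> int:
    k2 = (k2 * C2) & MASK64
    return (_rotl64(k2, 33) * C1) & MASK64


def murmur3_x64_128(data: bytes, seed: int = 0) -> bytes:
    """MurmurHash3 x64 128-bit; returns h1 then h2 as little-endian bytes.

    Masking to 64 bits sign-extends a negative seed, the way a Java int
    seed widens to a long.
    """
    h1 = h2 = seed & MASK64
    nblocks = len(data) // 16
    for i in range(nblocks):
        block = data[16 * i:16 * i + 16]
        h1 ^= _mix_k1(int.from_bytes(block[:8], "little"))
        h1 = (_rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= _mix_k2(int.from_bytes(block[8:], "little"))
        h2 = (_rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64
    tail = data[16 * nblocks:]
    if len(tail) > 8:
        h2 ^= _mix_k2(int.from_bytes(tail[8:], "little"))
    if tail:
        h1 ^= _mix_k1(int.from_bytes(tail[:8], "little"))
    h1 ^= len(data)
    h2 ^= len(data)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = _fmix64(h1)
    h2 = _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return h1.to_bytes(8, "little") + h2.to_bytes(8, "little")


def _write_utf(s: str) -> bytes:
    # 2-byte big-endian length, then the bytes (matches writeUTF for ASCII).
    raw = s.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def enrichment_row_key(enrichment_type: str, indicator: str, seed: int = 0) -> bytes:
    """hash(indicator) + length-prefixed type + length-prefixed indicator."""
    prefix = murmur3_x64_128(indicator.encode("utf-8"), seed)
    return prefix + _write_utf(enrichment_type) + _write_utf(indicator)


def decode_enrichment_key(key: bytes) -> Tuple[bytes, str, str]:
    """Split a key back into (16-byte hash prefix, type, indicator)."""
    prefix, rest = key[:16], key[16:]
    fields = []
    for _ in range(2):
        (n,) = struct.unpack(">H", rest[:2])
        fields.append(rest[2:2 + n].decode("utf-8"))
        rest = rest[2 + n:]
    return prefix, fields[0], fields[1]
```

For the DNS feed this would be `enrichment_row_key("dns", "server.domain.local", seed=...)`, and `decode_enrichment_key` can help when comparing against rows already in HBase.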



On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt 
wrote:

> Hello,
>
> I work as a Dev/Ops Data Engineer within the security team at a 
> company in London where we are in the process of implementing Metron. 
> I have been tasked with implementing feeds of network environment data 
> into HBase so that this data can be used as enrichment sources for our 
> security events.
> First-off I wanted to pull in DNS data for an internal domain.
>
> I am assuming that I need to write data into HBase in such a way that 
> it exactly matches what I would get from the flatfile_loader.sh 
> script. A colleague of mine has already loaded some DNS data using 
> that script, so I am using that as a reference.
>
> I have implemented a flow in NiFi which takes JSON data from a HTTP 
> listener and routes it to a PutHBaseJSON processor. The flow is 
> working, in the sense that data is successfully written to HBase, but 
> despite (naively) specifying "Row Identifier Encoding Strategy = 
> Binary", the results in HBase don't look correct. Comparing the output 
> from HBase scan commands I see:
>
> flatfile_loader.sh produced:
>
> ROW:
> \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> CELL: column=data:v, timestamp=1516896203840, 
> value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>
> PutHBaseJSON produced:
>
> ROW:  server.domain.local
> CELL: column=dns:v, timestamp=1527778603783, 
> value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>
> Fro

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Otto Fowler
Having it in its own repo doesn’t tie it to Metron any less, functionality-wise,
but it allows a new release with NiFi-only changes to be produced, or multiple
streams of releases across NiFi versions (1.7.x, 1.8.x) to be produced.



On June 5, 2018 at 15:14:38, Casey Stella (ceste...@gmail.com) wrote:

I agree with Simon here, the benefit of providing NiFi tooling is to enable
NiFi to use our infrastructure (e.g. our parsers, MaaS, stellar
enrichments, etc). This would tie it to Metron pretty closely.

On Tue, Jun 5, 2018 at 3:12 PM Otto Fowler  wrote:

> Nifi releases more often than Metron does, that might be an issue.
>
>
> On June 5, 2018 at 14:07:22, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> To be honest, I would expect this to be heavily linked to the Metron
> releases, since it's going to use other metron classes and dependencies to
> ensure compatibility. For example, a Stellar NiFi processor will be linked
> to Metron's stellar-common, the enrichment loader will depend on key
> construction code from metron-enrichment (and should align to it). I was
> also considering an opinionated PublishMetron which linked to the Metron
> kafka, and hid some of the dances you have to do to make the readMetadata
> functions work (i.e. some sugar around our mild abuse of kafka keys,
> which prevents people hurting their kafka by choosing the wrong
> partitioner).
>
> To that extent, I think the releases belong with Metron releases, though of
> course that does increase our release and test burden.
>
> On 5 June 2018 at 10:55, Otto Fowler  wrote:
>
> > Similar to Bro, we may need to release out of cycle.
> >
> >
> >
> > On June 5, 2018 at 13:17:55, Simon Elliston Ball (
> > si...@simonellistonball.com) wrote:
> >
> > Do you mean in the sense of a separate module, or are you suggesting we go
> > as far as a sub-project?
> >
> > On 5 June 2018 at 10:08, Otto Fowler  wrote:
> >
> > > If we do that, we should have it as a separate component maybe.
> > >
> > >
> > > On June 5, 2018 at 12:42:57, Simon Elliston Ball (
> > > si...@simonellistonball.com) wrote:
> > >
> > > @otto, well, of course we would use the record api... it's great.
> > >
> > > @casey, I have actually written a stellar processor, which applies stellar
> > > to all FlowFile attributes, outputting the resulting stellar variable space
> > > to either attributes or as json in the content.
> > >
> > > Is it worth us creating a nifi-metron-bundle? Happy to kick that off,
> > > since I'm half way there.
> > >
> > > Simon
> > >
> > >
> > >
> > > On 5 June 2018 at 08:41, Otto Fowler  wrote:
> > >
> > > > We have jiras about ‘diverting’ and reading from nifi flows already
> > > >
> > > >
> > > > On June 5, 2018 at 11:11:45, Casey Stella (ceste...@gmail.com) wrote:
> > > >
> > > > I'd be in strong support of that, Simon. I think we should have some other
> > > > NiFi components in Metron to enable users to interact with our
> > > > infrastructure from NiFi (e.g. being able to transform via stellar, etc).
> > > >
> > > > On Tue, Jun 5, 2018 at 10:32 AM Simon Elliston Ball <
> > > > si...@simonellistonball.com> wrote:
> > > >
> > > > > Do we, the community, think it would be a good idea to create a
> > > > > PutMetronEnrichment NiFi processor for this use case? It seems a number of
> > > > > people want to use NiFi to manage and schedule loading of enrichments for
> > > > > example.
> > > > >
> > > > > Simon
> > > > >
> > > > > On 5 June 2018 at 06:56, Casey Stella  wrote:
> > > > >
> > > > > > The problem, as you correctly diagnosed, is the key in HBase. We construct
> > > > > > the key very specifically in Metron, so it's unlikely to work out of the
> > > > > > box with the NiFi processor unfortunately. The key that we use is formed
> > > > > > here in the codebase:
> > > > > > https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> > > > > >
> > > > > > To put that in English, consider the following:
> > > > > >
> > > > > > - type - The enrichment type
> > > > > > - indicator - the indicator to use
> > > > > > - hash(*) - A Murmur3 128-bit hash function
> > > > > >
> > > > > > the key is hash(indicator) + type + indicator
> > > > > >
> > > > > > This hash prefixing is a standard practice in HBase key design that allows
> > > > > > the keys to be uniformly distributed among the regions and prevents
> > > > > > hotspotting. Depending on how the PutHBaseJSON processor works, if you can
> > > > > > construct the key and pass it in, then you might be able to either
> > > > > > construct the key in NiFi or write a processor to construct the key.
> > > > > > Ultimately though, what Carolyn said is true: the easiest approach is
> > > > > > probably using the flatfile loader.
> > >

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Simon Elliston Ball
Also, I expect the bundle would be part of the Metron project, so the NiFi
release cadence shouldn’t matter much, now that NiFi can version processors
independently.

Simon 

> On 5 Jun 2018, at 20:14, Casey Stella  wrote:
> 
> I agree with Simon here, the benefit of providing NiFi tooling is to enable 
> NiFi to use our infrastructure (e.g. our parsers, MaaS, stellar enrichments, 
> etc).  This would tie it to Metron pretty closely.
> 
>> On Tue, Jun 5, 2018 at 3:12 PM Otto Fowler  wrote:
>> Nifi releases more often than Metron does, that might be an issue.
>> 
>> 
>> On June 5, 2018 at 14:07:22, Simon Elliston Ball (
>> si...@simonellistonball.com) wrote:
>> 
>> To be honest, I would expect this to be heavily linked to the Metron
>> releases, since it's going to use other metron classes and dependencies to
>> ensure compatibility. For example, a Stellar NiFi processor will be linked
>> to Metron's stellar-common, the enrichment loader will depend on key
>> construction code from metron-enrichment (and should align to it). I was
>> also considering an opinionated PublishMetron which linked to the Metron
>> kafka, and hid some of the dances you have to do to make the readMetadata
>> functions work (i.e. some sugar around our mild abuse of kafka keys,
>> which prevents people hurting their kafka by choosing the wrong
>> partitioner).
>> 
>> To that extent, I think the releases belong with Metron releases, though of
>> course that does increase our release and test burden.
>> 
>> On 5 June 2018 at 10:55, Otto Fowler  wrote:
>> 
>> > Similar to Bro, we may need to release out of cycle.
>> >
>> >
>> >
>> > On June 5, 2018 at 13:17:55, Simon Elliston Ball (
>> > si...@simonellistonball.com) wrote:
>> >
>> > Do you mean in the sense of a separate module, or are you suggesting we go
>> > as far as a sub-project?
>> >
>> > On 5 June 2018 at 10:08, Otto Fowler  wrote:
>> >
>> > > If we do that, we should have it as a separate component maybe.
>> > >
>> > >
>> > > On June 5, 2018 at 12:42:57, Simon Elliston Ball (
>> > > si...@simonellistonball.com) wrote:
>> > >
>> > > @otto, well, of course we would use the record api... it's great.
>> > >
>> > > @casey, I have actually written a stellar processor, which applies stellar
>> > > to all FlowFile attributes, outputting the resulting stellar variable space
>> > > to either attributes or as json in the content.
>> > >
>> > > Is it worth us creating a nifi-metron-bundle? Happy to kick that off,
>> > > since I'm half way there.
>> > >
>> > > Simon
>> > >
>> > >
>> > >
>> > > On 5 June 2018 at 08:41, Otto Fowler  wrote:
>> > >
>> > > > We have jiras about ‘diverting’ and reading from nifi flows already
>> > > >
>> > > >
>> > > > On June 5, 2018 at 11:11:45, Casey Stella (ceste...@gmail.com) wrote:
>> > > >
>> > > > I'd be in strong support of that, Simon. I think we should have some other
>> > > > NiFi components in Metron to enable users to interact with our
>> > > > infrastructure from NiFi (e.g. being able to transform via stellar, etc).
>> > > >
>> > > > On Tue, Jun 5, 2018 at 10:32 AM Simon Elliston Ball <
>> > > > si...@simonellistonball.com> wrote:
>> > > >
>> > > > > Do we, the community, think it would be a good idea to create a
>> > > > > PutMetronEnrichment NiFi processor for this use case? It seems a number of
>> > > > > people want to use NiFi to manage and schedule loading of enrichments for
>> > > > > example.
>> > > > >
>> > > > > Simon
>> > > > >
>> > > > > On 5 June 2018 at 06:56, Casey Stella  wrote:
>> > > > >
>> > > > > > The problem, as you correctly diagnosed, is the key in HBase. We construct
>> > > > > > the key very specifically in Metron, so it's unlikely to work out of the
>> > > > > > box with the NiFi processor unfortunately. The key that we use is formed
>> > > > > > here in the codebase:
>> > > > > > https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>> > > > > >
>> > > > > > To put that in English, consider the following:
>> > > > > >
>> > > > > > - type - The enrichment type
>> > > > > > - indicator - the indicator to use
>> > > > > > - hash(*) - A Murmur3 128-bit hash function
>> > > > > >
>> > > > > > the key is hash(indicator) + type + indicator
>> > > > > >
>> > > > > > This hash prefixing is a standard practice in HBase key design that allows
>> > > > > > the keys to be uniformly distributed among the regions and prevents
>> > > > > > hotspotting. Depending on how the PutHBaseJSON processor works, if you can
>> > > > > > construct the key and pass it in, then you might be able to either
>> > > > > > construct the key in NiFi or write a processor to construct the key.
>> > > > > > Ultimately though, what Carolyn said is true: the easiest approach is
>

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Casey Stella
I agree with Simon here, the benefit of providing NiFi tooling is to enable
NiFi to use our infrastructure (e.g. our parsers, MaaS, stellar
enrichments, etc).  This would tie it to Metron pretty closely.

On Tue, Jun 5, 2018 at 3:12 PM Otto Fowler  wrote:

> Nifi releases more often than Metron does, that might be an issue.
>
>
> On June 5, 2018 at 14:07:22, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> To be honest, I would expect this to be heavily linked to the Metron
> releases, since it's going to use other metron classes and dependencies to
> ensure compatibility. For example, a Stellar NiFi processor will be linked
> to Metron's stellar-common, the enrichment loader will depend on key
> construction code from metron-enrichment (and should align to it). I was
> also considering an opinionated PublishMetron which linked to the Metron
> kafka, and hid some of the dances you have to do to make the readMetadata
> functions work (i.e. some sugar around our mild abuse of kafka keys,
> which prevents people hurting their kafka by choosing the wrong
> partitioner).
>
> To that extent, I think the releases belong with Metron releases, though of
> course that does increase our release and test burden.
>
> On 5 June 2018 at 10:55, Otto Fowler  wrote:
>
> > Similar to Bro, we may need to release out of cycle.
> >
> >
> >
> > On June 5, 2018 at 13:17:55, Simon Elliston Ball (
> > si...@simonellistonball.com) wrote:
> >
> > Do you mean in the sense of a separate module, or are you suggesting we go
> > as far as a sub-project?
> >
> > On 5 June 2018 at 10:08, Otto Fowler  wrote:
> >
> > > If we do that, we should have it as a separate component maybe.
> > >
> > >
> > > On June 5, 2018 at 12:42:57, Simon Elliston Ball (
> > > si...@simonellistonball.com) wrote:
> > >
> > > @otto, well, of course we would use the record api... it's great.
> > >
> > > @casey, I have actually written a stellar processor, which applies stellar
> > > to all FlowFile attributes, outputting the resulting stellar variable space
> > > to either attributes or as json in the content.
> > >
> > > Is it worth us creating a nifi-metron-bundle? Happy to kick that off,
> > > since I'm half way there.
> > >
> > > Simon
> > >
> > >
> > >
> > > On 5 June 2018 at 08:41, Otto Fowler  wrote:
> > >
> > > > We have jiras about ‘diverting’ and reading from nifi flows already
> > > >
> > > >
> > > > On June 5, 2018 at 11:11:45, Casey Stella (ceste...@gmail.com) wrote:
> > > >
> > > > I'd be in strong support of that, Simon. I think we should have some other
> > > > NiFi components in Metron to enable users to interact with our
> > > > infrastructure from NiFi (e.g. being able to transform via stellar, etc).
> > > >
> > > > On Tue, Jun 5, 2018 at 10:32 AM Simon Elliston Ball <
> > > > si...@simonellistonball.com> wrote:
> > > >
> > > > > Do we, the community, think it would be a good idea to create a
> > > > > PutMetronEnrichment NiFi processor for this use case? It seems a number of
> > > > > people want to use NiFi to manage and schedule loading of enrichments for
> > > > > example.
> > > > >
> > > > > Simon
> > > > >
> > > > > On 5 June 2018 at 06:56, Casey Stella  wrote:
> > > > >
> > > > > > The problem, as you correctly diagnosed, is the key in HBase. We construct
> > > > > > the key very specifically in Metron, so it's unlikely to work out of the
> > > > > > box with the NiFi processor unfortunately. The key that we use is formed
> > > > > > here in the codebase:
> > > > > > https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> > > > > >
> > > > > > To put that in English, consider the following:
> > > > > >
> > > > > > - type - The enrichment type
> > > > > > - indicator - the indicator to use
> > > > > > - hash(*) - A Murmur3 128-bit hash function
> > > > > >
> > > > > > the key is hash(indicator) + type + indicator
> > > > > >
> > > > > > This hash prefixing is a standard practice in HBase key design that allows
> > > > > > the keys to be uniformly distributed among the regions and prevents
> > > > > > hotspotting. Depending on how the PutHBaseJSON processor works, if you can
> > > > > > construct the key and pass it in, then you might be able to either
> > > > > > construct the key in NiFi or write a processor to construct the key.
> > > > > > Ultimately though, what Carolyn said is true: the easiest approach is
> > > > > > probably using the flatfile loader.
> > > > > > If you do get this working in NiFi, however, do please let us know
> > > > > > and/or consider contributing it back to the project as a PR :)
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> > > > > > charles.jo...@gresearch.co.uk>
> > > > 

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Otto Fowler
Nifi releases more often than Metron does, that might be an issue.


On June 5, 2018 at 14:07:22, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

To be honest, I would expect this to be heavily linked to the Metron
releases, since it's going to use other metron classes and dependencies to
ensure compatibility. For example, a Stellar NiFi processor will be linked
to Metron's stellar-common, the enrichment loader will depend on key
construction code from metron-enrichment (and should align to it). I was
also considering an opinionated PublishMetron which linked to the Metron
kafka, and hid some of the dances you have to do to make the readMetadata
functions work (i.e. some sugar around our mild abuse of kafka keys,
which prevents people hurting their kafka by choosing the wrong
partitioner).

To that extent, I think the releases belong with Metron releases, though of
course that does increase our release and test burden.

On 5 June 2018 at 10:55, Otto Fowler  wrote:

> Similar to Bro, we may need to release out of cycle.
>
>
>
> On June 5, 2018 at 13:17:55, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> Do you mean in the sense of a separate module, or are you suggesting we go
> as far as a sub-project?
>
> On 5 June 2018 at 10:08, Otto Fowler  wrote:
>
> > If we do that, we should have it as a separate component maybe.
> >
> >
> > On June 5, 2018 at 12:42:57, Simon Elliston Ball (
> > si...@simonellistonball.com) wrote:
> >
> > @otto, well, of course we would use the record api... it's great.
> >
> > @casey, I have actually written a stellar processor, which applies stellar
> > to all FlowFile attributes, outputting the resulting stellar variable space
> > to either attributes or as json in the content.
> >
> > Is it worth us creating a nifi-metron-bundle? Happy to kick that off,
> > since I'm half way there.
> >
> > Simon
> >
> >
> >
> > On 5 June 2018 at 08:41, Otto Fowler  wrote:
> >
> > > We have jiras about ‘diverting’ and reading from nifi flows already
> > >
> > >
> > > On June 5, 2018 at 11:11:45, Casey Stella (ceste...@gmail.com) wrote:
> > >
> > > I'd be in strong support of that, Simon. I think we should have some other
> > > NiFi components in Metron to enable users to interact with our
> > > infrastructure from NiFi (e.g. being able to transform via stellar, etc).
> > >
> > > On Tue, Jun 5, 2018 at 10:32 AM Simon Elliston Ball <
> > > si...@simonellistonball.com> wrote:
> > >
> > > > Do we, the community, think it would be a good idea to create a
> > > > PutMetronEnrichment NiFi processor for this use case? It seems a number of
> > > > people want to use NiFi to manage and schedule loading of enrichments for
> > > > example.
> > > >
> > > > Simon
> > > >
> > > > On 5 June 2018 at 06:56, Casey Stella  wrote:
> > > >
> > > > > The problem, as you correctly diagnosed, is the key in HBase. We construct
> > > > > the key very specifically in Metron, so it's unlikely to work out of the
> > > > > box with the NiFi processor unfortunately. The key that we use is formed
> > > > > here in the codebase:
> > > > > https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> > > > >
> > > > > To put that in English, consider the following:
> > > > >
> > > > > - type - The enrichment type
> > > > > - indicator - the indicator to use
> > > > > - hash(*) - A Murmur3 128-bit hash function
> > > > >
> > > > > the key is hash(indicator) + type + indicator
> > > > >
> > > > > This hash prefixing is a standard practice in HBase key design that allows
> > > > > the keys to be uniformly distributed among the regions and prevents
> > > > > hotspotting. Depending on how the PutHBaseJSON processor works, if you can
> > > > > construct the key and pass it in, then you might be able to either
> > > > > construct the key in NiFi or write a processor to construct the key.
> > > > > Ultimately though, what Carolyn said is true: the easiest approach is
> > > > > probably using the flatfile loader.
> > > > > If you do get this working in NiFi, however, do please let us know
> > > > > and/or consider contributing it back to the project as a PR :)
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> > > > > charles.jo...@gresearch.co.uk>
> > > > > wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I work as a Dev/Ops Data Engineer within the security team at a company in
> > > > > > London where we are in the process of implementing Metron. I have been
> > > > > > tasked with implementing feeds of network environment data into HBase so
> > > > > > that this data can be used as enrichment sources for our security events.
> > > > > > First-off I wanted to pull in DNS data for an internal domain.
> >

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Simon Elliston Ball
To be honest, I would expect this to be heavily linked to the Metron
releases, since it's going to use other Metron classes and dependencies to
ensure compatibility. For example, a Stellar NiFi processor will be linked
to Metron's stellar-common, and the enrichment loader will depend on key
construction code from metron-enrichment (and should align to it). I was
also considering an opinionated PublishMetron processor which linked to the
Metron Kafka and hid some of the dances you have to do to make the
readMetadata functions work (i.e. some sugar around our mild abuse of Kafka
keys, which prevents people hurting their Kafka by choosing the wrong
partitioner).

To that extent, I think these releases belong with Metron releases, though
of course that does increase our release and test burden.
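A NiFi loader's alignment with metron-enrichment's key construction can be checked mechanically: decode row keys from an HBase scan and compare them with rows the flatfile loader wrote. Below is a minimal Python sketch, not Metron code. It assumes the layout described in this thread: a 16-byte hash prefix, then the type and the indicator, each preceded by a 2-byte big-endian length. The `\x00\x05whois` in Charlie's scan output is what suggests the length prefix. `parse_hbase_binary` only roughly emulates how HBase turns `\xNN`-escaped strings back into bytes. Both function names are hypothetical.

```python
# Sketch: decode an (assumed) Metron enrichment row key into its parts.
from typing import Tuple

HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_hbase_binary(text: str) -> bytes:
    """Rough emulation of HBase Bytes.toBytesBinary: '\\xNN' becomes one
    byte, any other character becomes its code point truncated to a byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        if (text.startswith("\\x", i) and i + 3 < len(text)
                and text[i + 2] in HEX_DIGITS and text[i + 3] in HEX_DIGITS):
            out.append(int(text[i + 2:i + 4], 16))
            i += 4
        else:
            out.append(ord(text[i]) & 0xFF)
            i += 1
    return bytes(out)


def split_enrichment_row_key(key: bytes, prefix_len: int = 16) -> Tuple[bytes, str, str]:
    """Split a row key into (hash prefix, type, indicator).

    Assumed layout: prefix_len hash bytes, then two length-prefixed strings
    (2-byte big-endian length + UTF-8 bytes), type first, indicator second.
    """
    if len(key) < prefix_len:
        raise ValueError("key is shorter than the hash prefix")
    pos = prefix_len
    fields = []
    for name in ("type", "indicator"):
        if pos + 2 > len(key):
            raise ValueError(f"missing length field for {name}")
        size = int.from_bytes(key[pos:pos + 2], "big")
        pos += 2
        if pos + size > len(key):
            raise ValueError(f"{name} is truncated")
        fields.append(key[pos:pos + size].decode("utf-8"))
        pos += size
    if pos != len(key):
        raise ValueError("unexpected bytes after the indicator")
    return key[:prefix_len], fields[0], fields[1]
```

A key whose length fields do not agree with the bytes that follow raises `ValueError`, which makes truncated or mis-encoded row identifiers easy to spot when comparing NiFi-written rows with flatfile loader rows.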

On 5 June 2018 at 10:55, Otto Fowler  wrote:

> Similar to Bro, we may need to release out of cycle.

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Otto Fowler
Similar to Bro, we may need to release out of cycle.



On June 5, 2018 at 13:17:55, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Do you mean in the sense of a separate module, or are you suggesting we go
as far as a sub-project?


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Simon Elliston Ball
Do you mean in the sense of a separate module, or are you suggesting we go
as far as a sub-project?

On 5 June 2018 at 10:08, Otto Fowler  wrote:

> If we do that, we should have it as a separate component maybe.

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Otto Fowler
If we do that, we should have it as a separate component maybe.


On June 5, 2018 at 12:42:57, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

@otto, well, of course we would use the record API... it's great.

@casey, I have actually written a Stellar processor, which applies Stellar
to all FlowFile attributes, outputting the resulting Stellar variable space
either to attributes or as JSON in the content.

Is it worth us creating a nifi-metron-bundle? Happy to kick that off,
since I'm halfway there.

Simon




Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Simon Elliston Ball
@otto, well, of course we would use the record API... it's great.

@casey, I have actually written a Stellar processor, which applies Stellar
to all FlowFile attributes, outputting the resulting Stellar variable space
either to attributes or as JSON in the content.

Is it worth us creating a nifi-metron-bundle? Happy to kick that off,
since I'm halfway there.

Simon



On 5 June 2018 at 08:41, Otto Fowler  wrote:

> We already have Jiras about ‘diverting’ and reading from NiFi flows.

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Otto Fowler
We already have Jiras about ‘diverting’ and reading from NiFi flows.


On June 5, 2018 at 11:11:45, Casey Stella (ceste...@gmail.com) wrote:

I'd be in strong support of that, Simon. I think we should have some other
NiFi components in Metron to enable users to interact with our
infrastructure from NiFi (e.g. being able to transform via Stellar, etc.).

On Tue, Jun 5, 2018 at 10:32 AM Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Do we, the community, think it would be a good idea to create a
> PutMetronEnrichment NiFi processor for this use case? It seems a number of
> people want to use NiFi to manage and schedule loading of enrichments, for
> example.
>
> Simon
>
> On 5 June 2018 at 06:56, Casey Stella  wrote:
>
> > The problem, as you correctly diagnosed, is the key in HBase. We construct
> > the key very specifically in Metron, so it's unlikely to work out of the
> > box with the NiFi processor, unfortunately. The key that we use is formed
> > here in the codebase:
> > https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> >
> > To put that in English, consider the following:
> >
> > - type - The enrichment type
> > - indicator - the indicator to use
> > - hash(*) - A MurmurHash3 128-bit hash function
> >
> > The key is hash(indicator) + type + indicator
> >
> > This hash prefixing is a standard practice in HBase key design that allows
> > the keys to be uniformly distributed among the regions and prevents
> > hotspotting. Depending on how the PutHBaseJSON processor works, if you can
> > construct the key and pass it in, then you might be able to either
> > construct the key in NiFi or write a processor to construct the key.
> > Ultimately though, what Carolyn said is true: the easiest approach is
> > probably using the flatfile loader.
> > If you do get this working in NiFi, however, do please let us know and/or
> > consider contributing it back to the project as a PR :)
> >
> >
> >
> > On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> > charles.jo...@gresearch.co.uk>
> > wrote:
> >
> > > Hello,
> > >
> > > I work as a Dev/Ops Data Engineer within the security team at a company
> > > in London where we are in the process of implementing Metron. I have
> > > been tasked with implementing feeds of network environment data into
> > > HBase so that this data can be used as enrichment sources for our
> > > security events. First off, I wanted to pull in DNS data for an internal
> > > domain.
> > >
> > > I am assuming that I need to write data into HBase in such a way that it
> > > exactly matches what I would get from the flatfile_loader.sh script. A
> > > colleague of mine has already loaded some DNS data using that script, so
> > > I am using that as a reference.
> > >
> > > I have implemented a flow in NiFi which takes JSON data from an HTTP
> > > listener and routes it to a PutHBaseJSON processor. The flow is working,
> > > in the sense that data is successfully written to HBase, but despite
> > > (naively) specifying "Row Identifier Encoding Strategy = Binary", the
> > > results in HBase don't look correct. Comparing the output from HBase
> > > scan commands, I see:
> > >
> > > flatfile_loader.sh produced:
> > >
> > > ROW:
> > > \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> > > CELL: column=data:v, timestamp=1516896203840,
> > > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> > >
> > > PutHBaseJSON produced:
> > >
> > > ROW: server.domain.local
> > > CELL: column=dns:v, timestamp=1527778603783,
> > > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> > >
> > > From source JSON:
> > >
> > > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
> > >
> > > I know that there are some differences in column family / field names,
> > > but my worry is the ROW id. Presumably I need to encode my row key ("k"
> > > in the JSON data) in a way that matches how the flatfile_loader.sh
> > > script did it.
> > >
> > > Can anyone explain how I might convert my ID to the correct format?
> > > -or-
> > > Does this matter? Can Metron use the human-readable ROW ids?
> > >
> > > Charlie Joynt
> > >
> > > --
> > > G-RESEARCH believes the information provided herein is reliable. While
> > > every care has been taken to ensure accuracy, the information is
> > > furnished to the recipients with no warranty as to the completeness and
> > > accuracy of its contents and on condition that any errors or omissions
> > > shall not be made the basis of any claim, demand or cause of action.
> > > The information in this email is intended only for the named recipient.
> > > If you are not the intended recipient please notify us immediately and
> > > do not copy, distribute or take action based on this e-mail.
> 
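Charlie's question of how to convert the ID can be answered with a sketch of the layout Casey describes. This is a minimal Python sketch, not Metron's code. It hashes the indicator's UTF-8 bytes with a 128-bit MurmurHash3, then appends the type and the indicator. Each string is written Java `writeUTF`-style, as a 2-byte big-endian length followed by the bytes; the `\x00\x05whois` in the flatfile_loader.sh scan output above is what suggests this encoding.

The sketch also makes assumptions about the hash. It assumes Metron uses Guava's `Hashing.murmur3_128`, whose output is h1 then h2 in little-endian order and whose 32-bit seed is sign-extended. The seed Metron passes is not given in this thread, so it is a required argument here; look it up in EnrichmentKey.java. All function names are hypothetical. `to_binary_row_id` assumes PutHBaseJSON's Binary strategy decodes `\xNN` escapes the way HBase's `Bytes.toBytesBinary` does.

```python
# Sketch of a Metron-style enrichment row key: hash(indicator) + type + indicator.
# ASSUMPTIONS (not verified against Metron): Guava-compatible MurmurHash3 x64 128,
# Java writeUTF-style length prefixes, and a caller-supplied seed.

MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k: int) -> int:
    # MurmurHash3 64-bit finalization mix
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    return k ^ (k >> 33)


def _mix_k1(k1: int) -> int:
    return (_rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64


def _mix_k2(k2: int) -> int:
    return (_rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64


def murmur3_x64_128(data: bytes, seed: int) -> bytes:
    """MurmurHash3 x64 128-bit, returned as h1 then h2, each little-endian.

    The 32-bit seed is sign-extended into both 64-bit halves, which is what
    Guava's Hashing.murmur3_128(int) is assumed to do.
    """
    seed32 = seed & 0xFFFFFFFF
    if seed32 & 0x80000000:
        seed32 -= 1 << 32
    h1 = h2 = seed32 & MASK64
    nblocks = len(data) // 16
    for i in range(nblocks):
        block = data[16 * i:16 * i + 16]
        h1 ^= _mix_k1(int.from_bytes(block[:8], "little"))
        h1 = (_rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= _mix_k2(int.from_bytes(block[8:], "little"))
        h2 = (_rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64
    tail = data[16 * nblocks:]
    if len(tail) > 8:
        h2 ^= _mix_k2(int.from_bytes(tail[8:], "little"))
    if tail:
        h1 ^= _mix_k1(int.from_bytes(tail[:8], "little"))
    h1 ^= len(data)
    h2 ^= len(data)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1, h2 = _fmix64(h1), _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return h1.to_bytes(8, "little") + h2.to_bytes(8, "little")


def java_write_utf(s: str) -> bytes:
    """Mimic java.io.DataOutput.writeUTF: 2-byte big-endian length, then bytes.

    Java writes modified UTF-8, which matches standard UTF-8 unless the string
    contains U+0000 or characters outside the BMP; this sketch rejects those.
    """
    if "\x00" in s or any(ord(c) > 0xFFFF for c in s):
        raise ValueError("needs modified UTF-8, not handled in this sketch")
    encoded = s.encode("utf-8")
    if len(encoded) > 0xFFFF:
        raise ValueError("writeUTF is limited to 65535 bytes")
    return len(encoded).to_bytes(2, "big") + encoded


def enrichment_row_key(enrichment_type: str, indicator: str, seed: int) -> bytes:
    """hash(indicator) + type + indicator, following Casey's description."""
    prefix = murmur3_x64_128(indicator.encode("utf-8"), seed)
    return prefix + java_write_utf(enrichment_type) + java_write_utf(indicator)


def to_binary_row_id(key: bytes) -> str:
    """Escape every byte as \\xNN so an HBase-style binary string parser
    (e.g. Bytes.toBytesBinary) would turn it back into the same bytes."""
    return "".join(f"\\x{b:02X}" for b in key)
```

With the seed taken from EnrichmentKey.java, `to_binary_row_id(enrichment_row_key("whois", "192.168.0.198", seed))` gives a string that could be placed in the row identifier field ("k") of the JSON sent to PutHBaseJSON. Before loading real data, check that the computed bytes match a row the flatfile loader wrote for the same type and indicator. A mismatch in the first 16 bytes points at the seed or hashing assumptions.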

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Otto Fowler
PutMetronEnrichmentRecords* ;)


On June 5, 2018 at 10:32:43, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Do we, the community, think it would be a good idea to create a
PutMetronEnrichment NiFi processor for this use case? It seems a number of
people want to use NiFi to manage and schedule loading of enrichments for
example.

Simon

On 5 June 2018 at 06:56, Casey Stella  wrote:

> The problem, as you correctly diagnosed, is the key in HBase. We construct
> the key very specifically in Metron, so it's unlikely to work out of the
> box with the NiFi processor, unfortunately. The key that we use is formed
> here in the codebase:
> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>
> To put that in English, consider the following:
>
> - type - The enrichment type
> - indicator - the indicator to use
> - hash(*) - A MurmurHash3 128-bit hash function
>
> The key is hash(indicator) + type + indicator
>
> This hash prefixing is a standard practice in HBase key design that allows
> the keys to be uniformly distributed among the regions and prevents
> hotspotting. Depending on how the PutHBaseJSON processor works, if you can
> construct the key and pass it in, then you might be able to either
> construct the key in NiFi or write a processor to construct the key.
> Ultimately though, what Carolyn said is true: the easiest approach is
> probably using the flatfile loader.
> If you do get this working in NiFi, however, do please let us know and/or
> consider contributing it back to the project as a PR :)
>
>
>
> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> charles.jo...@gresearch.co.uk>
> wrote:
>
> > Hello,
> >
> > I work as a Dev/Ops Data Engineer within the security team at a company
> in
> > London where we are in the process of implementing Metron. I have been
> > tasked with implementing feeds of network environment data into HBase
so
> > that this data can be used as enrichment sources for our security
events.
> > First-off I wanted to pull in DNS data for an internal domain.
> >
> > I am assuming that I need to write data into HBase in such a way that
it
> > exactly matches what I would get from the flatfile_loader.sh script. A
> > colleague of mine has already loaded some DNS data using that script,
so
> I
> > am using that as a reference.
> >
> > I have implemented a flow in NiFi which takes JSON data from a HTTP
> > listener and routes it to a PutHBaseJSON processor. The flow is
working,
> in
> > the sense that data is successfully written to HBase, but despite
> (naively)
> > specifying "Row Identifier Encoding Strategy = Binary", the results in
> > HBase don't look correct. Comparing the output from HBase scan commands
I
> > see:
> >
> > flatfile_loader.sh produced:
> >
> > ROW:
> > \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\
> x05whois\x00\x0E192.168.0.198
> > CELL: column=data:v, timestamp=1516896203840,
> > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> >
> > PutHBaseJSON produced:
> >
> > ROW: server.domain.local
> > CELL: column=dns:v, timestamp=1527778603783,
> > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> >
> > From source JSON:
> >
> >
> > {"k":"server.domain.local","v":{"name":"server.domain.local"
> ,"type":"A","data":"192.168.0.198"}}
> >
> > I know that there are some differences in column family / field names,
> but
> > my worry is the ROW id. Presumably I need to encode my row key, "k" in
> the
> > JSON data, in a way that matches how the flatfile_loader.sh script did
> it.
> >
> > Can anyone explain how I might convert my Id to the correct format?
> > -or-
> > Does this matter-can Metron use the human-readable ROW ids?
> >
> > Charlie Joynt
> >
> > --
> > G-RESEARCH believes the information provided herein is reliable. While
> > every care has been taken to ensure accuracy, the information is
> furnished
> > to the recipients with no warranty as to the completeness and accuracy
of
> > its contents and on condition that any errors or omissions shall not be
> > made the basis of any claim, demand or cause of action.
> > The information in this email is intended only for the named recipient.
> > If you are not the intended recipient please notify us immediately and
do
> > not copy, distribute or take action based on this e-mail.
> > All messages sent to and from this e-mail address will be logged by
> > G-RESEARCH and are subject to archival storage, monitoring, review and
> > disclosure.
> > G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
> > Whittington House, 19-30 Alfred Place, London WC1E 7EA.
> > Trenchant Limited is a company registered in England with company
number
> > 08127121.
> > --
> >
>



-- 
-- 
simon elliston ball
@sireb


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Casey Stella
I'd be in strong support of that, Simon.  I think we should have some other
NiFi components in Metron to enable users to interact with our
infrastructure from NiFi (e.g. being able to transform via Stellar, etc.).

On Tue, Jun 5, 2018 at 10:32 AM Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Do we, the community, think it would be a good idea to create a
> PutMetronEnrichment NiFi processor for this use case? It seems a number of
> people want to use NiFi to manage and schedule loading of enrichments for
> example.
>
> Simon

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Simon Elliston Ball
Do we, the community, think it would be a good idea to create a
PutMetronEnrichment NiFi processor for this use case? It seems a number of
people want to use NiFi to manage and schedule loading of enrichments for
example.

Simon

On 5 June 2018 at 06:56, Casey Stella  wrote:

> The problem, as you correctly diagnosed, is the key in HBase.  We construct
> the key very specifically in Metron, so it's unlikely to work out of the
> box with the NiFi processor unfortunately.  The key that we use is formed
> here in the codebase:
> https://github.com/cestella/incubator-metron/blob/master/
> metron-platform/metron-enrichment/src/main/java/org/
> apache/metron/enrichment/converter/EnrichmentKey.java#L51
>
> To put that in English, consider the following:
>
>    - type - The enrichment type
>    - indicator - the indicator to use
>    - hash(*) - A Murmur3 128-bit hash function
>
> the key is hash(indicator) + type + indicator
>
> This hash prefixing is a standard practice in HBase key design that allows
> the keys to be uniformly distributed among the regions and prevents
> hotspotting.  Depending on how the PutHBaseJSON processor works, if you can
> construct the key and pass it in, then you might be able to either
> construct the key in NiFi or write a processor to construct the key.
> Ultimately, though, what Carolyn said is true: the easiest approach is
> probably using the flatfile loader.
> If you do get this working in NiFi, however, do please let us know and/or
> consider contributing it back to the project as a PR :)



-- 
--
simon elliston ball
@sireb


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Casey Stella
The problem, as you correctly diagnosed, is the key in HBase.  We construct
the key very specifically in Metron, so it's unlikely to work out of the
box with the NiFi processor unfortunately.  The key that we use is formed
here in the codebase:
https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51

To put that in English, consider the following:

   - type - The enrichment type
   - indicator - the indicator to use
   - hash(*) - A Murmur3 128-bit hash function

the key is hash(indicator) + type + indicator
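The layout above can be sketched in Python. This is a minimal sketch under stated assumptions, not Metron's code: it assumes the hash is MurmurHash3 x64 128-bit with seed 0 over the UTF-8 bytes of the indicator (the variant Guava exposes as `Hashing.murmur3_128()`), with h1 and h2 emitted little-endian, and that type and indicator are each written with a two-byte big-endian length prefix, as Java's `DataOutputStream.writeUTF` does for plain ASCII. The length prefixes are not part of the one-line description, but they are visible in the scan output in this thread (`\x00\x05whois`, `\x00\x13user_categorization`). `enrichment_row_key` is a hypothetical helper name.

```python
import struct

MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k: int) -> int:
    """MurmurHash3 64-bit finalisation mix."""
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def _mix_k1(k1: int) -> int:
    k1 = (k1 * C1) & MASK64
    k1 = _rotl64(k1, 31)
    return (k1 * C2) & MASK64


def _mix_k2(k2: int) -> int:
    k2 = (k2 * C2) & MASK64
    k2 = _rotl64(k2, 33)
    return (k2 * C1) & MASK64


def murmur3_x64_128(data: bytes, seed: int = 0) -> bytes:
    """MurmurHash3 x64 128-bit; returns h1 then h2, each little-endian."""
    h1 = h2 = seed & MASK64
    nblocks = len(data) // 16
    for i in range(nblocks):
        # Each 16-byte block is read as two little-endian 64-bit words.
        k1, k2 = struct.unpack_from("<QQ", data, i * 16)
        h1 ^= _mix_k1(k1)
        h1 = _rotl64(h1, 27)
        h1 = (h1 + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= _mix_k2(k2)
        h2 = _rotl64(h2, 31)
        h2 = (h2 + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64
    # Remaining 0..15 bytes: bytes 8.. feed k2, bytes 0..7 feed k1.
    tail = data[nblocks * 16:]
    if len(tail) > 8:
        h2 ^= _mix_k2(int.from_bytes(tail[8:], "little"))
    if tail:
        h1 ^= _mix_k1(int.from_bytes(tail[:8], "little"))
    h1 ^= len(data)
    h2 ^= len(data)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = _fmix64(h1)
    h2 = _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return struct.pack("<QQ", h1, h2)


def _write_utf(s: str) -> bytes:
    """2-byte big-endian length + bytes (matches writeUTF for ASCII text)."""
    b = s.encode("utf-8")
    if len(b) > 0xFFFF:
        raise ValueError("string too long for a 2-byte length prefix")
    return struct.pack(">H", len(b)) + b


def enrichment_row_key(enrichment_type: str, indicator: str) -> bytes:
    """hash(indicator) + type + indicator, using the assumed layout."""
    return (murmur3_x64_128(indicator.encode("utf-8"))
            + _write_utf(enrichment_type)
            + _write_utf(indicator))
```

If those assumptions hold, `enrichment_row_key("user_categorization", "tsausner")` should reproduce the `tsausner` row key in Carolyn's scan output further down this thread; comparing the two is a cheap check before relying on the sketch.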

This hash prefixing is a standard practice in HBase key design that allows
the keys to be uniformly distributed among the regions and prevents
hotspotting.  Depending on how the PutHBaseJSON processor works, if you can
construct the key and pass it in, then you might be able to either
construct the key in NiFi or write a processor to construct the key.
Ultimately, though, what Carolyn said is true: the easiest approach is
probably using the flatfile loader.
If you do get this working in NiFi, however, do please let us know and/or
consider contributing it back to the project as a PR :)
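Passing a binary key into PutHBaseJSON depends on how the processor interprets the row identifier. A minimal sketch, assuming that with "Row Identifier Encoding Strategy = Binary" the processor decodes the row id the way HBase's `Bytes.toBytesBinary` does (`\xNN` escapes for non-printable bytes); check that against the NiFi documentation for your version before relying on it. `to_string_binary` mirrors the escaping the HBase shell uses in scan output, and `put_hbase_json_payload` is a hypothetical helper that builds a flowfile body in the `{"k": ..., "v": ...}` shape used in Charlie's flow. The row-key bytes themselves have to be computed separately.

```python
import json


def to_string_binary(data: bytes) -> str:
    """Escape bytes like HBase's Bytes.toStringBinary: printable ASCII
    (0x20-0x7E) except backslash is kept, everything else becomes \\xNN."""
    out = []
    for b in data:
        if 0x20 <= b <= 0x7E and b != 0x5C:
            out.append(chr(b))
        else:
            out.append("\\x%02X" % b)
    return "".join(out)


def put_hbase_json_payload(row_key: bytes, value: dict) -> str:
    """Hypothetical helper: escaped row key in "k", enrichment value in "v".
    Assumes the processor is set to read the row id from "k" and to write the
    nested "v" object as JSON text, as Charlie's flow appears to do."""
    return json.dumps({"k": to_string_binary(row_key), "v": value})
```

`json.dumps` escapes each backslash as `\\` in the JSON text; a JSON parser turns it back into the single backslash that the binary decoding expects. The column family would also need to match what the enrichment lookups read (`data` in Charlie's flatfile_loader.sh scan, `t` in Carolyn's example).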



On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt 
wrote:

> Hello,
>
> I work as a Dev/Ops Data Engineer within the security team at a company in
> London where we are in the process of implementing Metron. I have been
> tasked with implementing feeds of network environment data into HBase so
> that this data can be used as enrichment sources for our security events.
> First-off I wanted to pull in DNS data for an internal domain.
>
> I am assuming that I need to write data into HBase in such a way that it
> exactly matches what I would get from the flatfile_loader.sh script. A
> colleague of mine has already loaded some DNS data using that script, so I
> am using that as a reference.
>
> I have implemented a flow in NiFi which takes JSON data from a HTTP
> listener and routes it to a PutHBaseJSON processor. The flow is working, in
> the sense that data is successfully written to HBase, but despite (naively)
> specifying "Row Identifier Encoding Strategy = Binary", the results in
> HBase don't look correct. Comparing the output from HBase scan commands I
> see:
>
> flatfile_loader.sh produced:
>
> ROW:
> \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> CELL: column=data:v, timestamp=1516896203840,
> value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>
> PutHBaseJSON produced:
>
> ROW:  server.domain.local
> CELL: column=dns:v, timestamp=1527778603783,
> value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>
> From source JSON:
>
>
> {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>
> I know that there are some differences in column family / field names, but
> my worry is the ROW id. Presumably I need to encode my row key, "k" in the
> JSON data, in a way that matches how the flatfile_loader.sh script did it.
>
> Can anyone explain how I might convert my Id to the correct format?
> -or-
> Does this matter? Can Metron use the human-readable ROW ids?
>
> Charlie Joynt
>
> --
> G-RESEARCH believes the information provided herein is reliable. While
> every care has been taken to ensure accuracy, the information is furnished
> to the recipients with no warranty as to the completeness and accuracy of
> its contents and on condition that any errors or omissions shall not be
> made the basis of any claim, demand or cause of action.
> The information in this email is intended only for the named recipient.
> If you are not the intended recipient please notify us immediately and do
> not copy, distribute or take action based on this e-mail.
> All messages sent to and from this e-mail address will be logged by
> G-RESEARCH and are subject to archival storage, monitoring, review and
> disclosure.
> G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
> Whittington House, 19-30 Alfred Place, London WC1E 7EA.
> Trenchant Limited is a company registered in England with company number
> 08127121.
> --
>


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-01 Thread Carolyn Duby
Hi Charles - 

I think your best bet is to create a CSV file and load it with flatfile_loader.sh. 
This will be easier, and you won’t have to worry if the format of HBase storage 
changes:

https://github.com/apache/metron/tree/master/metron-platform/metron-data-management#loading-utilities


The flat file loader is located here:

https://github.com/apache/metron/blob/master/metron-platform/metron-data-management/src/main/scripts/flatfile_loader.sh


Here is an example of an enrichment that maps a userid to a user category.

Here is the CSV mapping each userid to a category. For example, tsausner has 
user category BAD_GUY.

[centos@metron-demo-4 rangeraudit]$ cat user_enrichment.csv 
tsausner,BAD_GUY 
ndhanase,CONTRACTOR 
svelagap,ADMIN 
jprivite,EMPLOYEE 
nolan,EMPLOYEE

Create an extractor config file that maps the columns of the CSV file to 
enrichments. The indicator_column names the column whose value is the enrichment key.


[centos@metron-demo-4 rangeraudit]$ cat user_extraction.json 
{
  "config": {
    "columns": {
      "user_id": 0,
      "user_category": 1
    },
    "indicator_column": "user_id",
    "type": "user_categorization",
    "separator": ","
  },
  "extractor": "CSV"
}
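To make the mapping concrete, here is a rough Python sketch of what the loader appears to derive from each CSV line under this config: the enrichment type, the indicator (the `user_id` column), and a value map of all named columns. It is inferred from the scan output in this message, not taken from Metron's CSV extractor; `extract_row` and `EXTRACTOR_CONFIG` are hypothetical names, and quoting or trimming rules may differ from Metron's.

```python
import csv


def extract_row(line: str, config: dict):
    """One CSV line -> (type, indicator, value) under the extractor config.
    Whitespace is kept as-is, as the "BAD_GUY " values in the scan suggest."""
    fields = next(csv.reader([line], delimiter=config["separator"]))
    # Every named column goes into the value, including the indicator column.
    value = {name: fields[idx] for name, idx in config["columns"].items()}
    return config["type"], value[config["indicator_column"]], value


# The "config" section of user_extraction.json, as a Python dict.
EXTRACTOR_CONFIG = {
    "columns": {"user_id": 0, "user_category": 1},
    "indicator_column": "user_id",
    "type": "user_categorization",
    "separator": ",",
}
```

The (type, indicator) pair is what ends up in the row key; the value map is what is stored in the `t:v` cell.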

This optional step tells Metron where to use the enrichment at the same time 
as you import the enrichment data. You can skip it if the enrichments are 
already configured, or add them later.
This config file applies the user_categorization enrichment to the rangeradmin 
sensor, using the reqUser field as the key.

[centos@metron-demo-4 rangeraudit]$ cat rangeradmin_user_category_enrichment.json
{
  "zkQuorum": "metron-demo-2.field.hortonworks.com:2181,metron-demo-0.field.hortonworks.com:2181,metron-demo-1.field.hortonworks.com:2181",
  "sensorToFieldList": {
    "rangeradmin": {
      "type": "ENRICHMENT",
      "fieldToEnrichmentTypes": {
        "reqUser": [
          "user_categorization"
        ]
      }
    }
  }
}

The command below imports the enrichment mappings into HBase and adds the 
enrichment to the rangeradmin sensor configuration. The result is that when a 
Ranger admin event is enriched, Metron will use the reqUser field value as a key 
into the user_categorization enrichment. If the value of the field is present in 
the CSV data, the enriched event will have a new field indicating the user 
category:

[centos@metron-demo-4 rangeraudit]$ /usr/hcp/1.4.0.0-38/metron/bin/flatfile_loader.sh \
    -e user_extraction.json -t enrichment -i user_enrichment.csv -c t \
    -n rangeradmin_user_category_enrichment.json


HBase will look similar to this:

hbase(main):002:0> scan 'enrichment'
ROW COLUMN+CELL
 \x01\x12\x8Bjx@d.\xF3\xBF\xD3\xB2\x81\xEB\xB5\xD2\x00\x13user_categorization\x00\x08tsausner column=t:v, timestamp=1518118740456, value={"user_category":"BAD_GUY ","user_id":"tsausner"}
 /\xA8\xEB\xB1\xE0N\xBE\xCBv?\xCAz9\xF6;\xD3\x00\x13user_categorization\x00\x08svelagap column=t:v, timestamp=1518118740540, value={"user_category":"ADMIN ","user_id":"svelagap"}
 l\xF1F\x83t\xD6x\xF9\xBEwrk3\x00M2\x00\x13user_categorization\x00\x08ndhanase column=t:v, timestamp=1518118740522, value={"user_category":"CONTRACTOR ","user_id":"ndhanase"}
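To check which layout a given row uses, the escaped row keys printed by `scan` can be decoded back into their parts. A minimal Python sketch, assuming the 16-byte hash prefix followed by a length-prefixed type and indicator, as these rows suggest; `decode_enrichment_key` is a hypothetical helper, and the unescaping only approximates `Bytes.toBytesBinary` for plain ASCII scan output.

```python
import struct

HEX = set("0123456789abcdefABCDEF")


def to_bytes_binary(s: str) -> bytes:
    """Undo HBase shell escaping: \\xNN becomes one byte, any other (ASCII)
    character is taken as-is."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s.startswith("\\x", i) and len(s) >= i + 4 and set(s[i + 2:i + 4]) <= HEX:
            out.append(int(s[i + 2:i + 4], 16))
            i += 4
        else:
            out.append(ord(s[i]))
            i += 1
    return bytes(out)


def _read_utf(buf: bytes, pos: int):
    """Read a 2-byte big-endian length followed by that many UTF-8 bytes."""
    if pos + 2 > len(buf):
        raise ValueError("truncated length prefix")
    (n,) = struct.unpack_from(">H", buf, pos)
    end = pos + 2 + n
    if end > len(buf):
        raise ValueError("length prefix runs past end of key")
    return buf[pos + 2:end].decode("utf-8"), end


def decode_enrichment_key(row: str):
    """Split a scanned row key into (16-byte hash prefix, type, indicator)."""
    raw = to_bytes_binary(row)
    if len(raw) < 16:
        raise ValueError("row key shorter than the 16-byte hash prefix")
    etype, pos = _read_utf(raw, 16)
    indicator, pos = _read_utf(raw, pos)
    if pos != len(raw):
        raise ValueError("unexpected trailing bytes")
    return raw[:16], etype, indicator
```

Given a human-readable key such as `server.domain.local` from a plain PutHBaseJSON write, it raises `ValueError`: such a row does not follow the layout, so a lookup that computes the key from an indicator would presumably never find it.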



After the enrichment data is in HBase, create an event and add it to the 
rangeradmin topic. For example, if the reqUser field is set to nnolan, the 
enriched event will have the following fields:

enrichments:hbaseEnrichment:reqUser:user_categorization:user_category
EMPLOYEE

enrichments:hbaseEnrichment:reqUser:user_categorization:user_id
nnolan
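The enriched field names follow a fixed pattern: prefix, source field, enrichment type, stored column. A rough Python sketch of that pattern, using a dict keyed by (type, indicator) as a stand-in for the HBase table and the `fieldToEnrichmentTypes` map from the enrichment config; it only mirrors the naming shown in this example and is not Metron's enrichment adapter. `hbase_enrich` is a hypothetical name.

```python
def hbase_enrich(message: dict, field_to_types: dict, table: dict) -> dict:
    """For each configured field, use its value as the indicator for each
    enrichment type and copy the stored columns into the message under
    enrichments:hbaseEnrichment:<field>:<type>:<column>."""
    enriched = dict(message)
    for field, types in field_to_types.items():
        indicator = message.get(field)
        if indicator is None:
            continue
        for etype in types:
            # Stand-in for an HBase get on the (type, indicator) row key.
            stored = table.get((etype, indicator))
            if stored is None:
                continue  # no enrichment row: the event passes through unchanged
            for column, value in stored.items():
                enriched[f"enrichments:hbaseEnrichment:{field}:{etype}:{column}"] = value
    return enriched
```

When the reqUser value has no matching row, no enrichment fields are added, which is why the value must appear in the CSV data for the new fields to show up.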



Thanks

Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

On 6/1/18, 6:26 AM, "Charles Joynt"  wrote:

>Hello,
>
>I work as a Dev/Ops Data Engineer within the security team at a company in 
>London where we are in the process of implementing Metron. I have been tasked 
>with implementing feeds of network environment data into HBase so that this 
>data can be used as enrichment sources for our security events. First-off I 
>wanted to pul