RE: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-13 Thread Charles Joynt
Regarding why I didn't choose to load data with the flatfile loader script...

I want to be able to SEND enrichment data to Metron rather than have to set up 
cron jobs to PULL data. At the moment I'm trying to prove that the process 
works with a simple data source. In the future we will want enrichment data in 
Metron that comes from systems (e.g. HR databases) that I won't have access to, 
hence will need someone to be able to send us the data.

> Carolyn: just call the flat file loader from a script processor...

I didn't believe that would work in my environment. I'm pretty sure the script 
has dependencies on various Metron JARs, not least for the row id hashing 
algorithm. I suppose this would require at least a partial install of Metron 
alongside NiFi, and would introduce additional work on the NiFi cluster for any 
Metron upgrade. In some (enterprise) environments there might be separation of 
ownership between NiFi and Metron.

I also prefer not to have a Java app calling a bash script which calls a new 
java process, with logs or error output that might just get swallowed up 
invisibly. Somewhere down the line this could hold up effective troubleshooting.

> Simon: I have actually written a stellar processor, which applies stellar to 
> all FlowFile attributes...

Gulp.

> Simon: what didn't you like about the flatfile loader script?

The flatfile loader script has worked fine for me when prepping enrichment data 
in test systems, however it was a bit of a chore to get the JSON configuration 
files set up, especially for "wide" data sources that may have 15-20 fields, 
e.g. Active Directory.

More broadly speaking, I want to embrace the streaming data paradigm and avoid 
batch jobs. With the DNS example, you might imagine a future where the 
enrichment data is streamed based on DHCP registrations, DNS update events, 
etc. In principle this could reduce the window of time where we might enrich a 
data source with out-of-date data.

Charlie

-----Original Message-----
From: Carolyn Duby [mailto:cd...@hortonworks.com] 
Sent: 12 June 2018 20:33
To: dev@metron.apache.org
Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON

I like the streaming enrichment solutions, but it depends on how you are getting 
the data in.  If you get the data in a CSV file, just call the flat file loader 
from a script processor.  No special NiFi required.

If the enrichments don’t arrive in bulk, the streaming solution is better.

Thanks
Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584



On 6/12/18, 1:08 PM, "Simon Elliston Ball"  wrote:

>Good solution. The streaming enrichment writer makes a lot of sense for 
>this, especially if you're not using huge enrichment sources that need 
>the batch based loaders.
>
>As it happens I have written most of a NiFi processor to handle this 
>use case directly - both non-record and Record based, especially for Otto :).
>The one thing we need to figure out now is where to host that, and how 
>to handle releases of a nifi-metron-bundle. I'll probably get round to 
>putting the code in my github at least in the next few days, while we 
>figure out a more permanent home.
>
>Charlie, out of curiosity, what didn't you like about the flatfile 
>loader script?
>
>Simon
>
>On 12 June 2018 at 18:00, Charles Joynt wrote:
>
>> Thanks for the responses. I appreciate the willingness to look at 
>> creating a NiFi processor. That would be great!
>>
>> Just to follow up on this (after a week looking after the "ops" side of
>> dev-ops): I really don't want to have to use the flatfile loader 
>> script, and I'm not going to be able to write a Metron-style HBase 
>> key generator any time soon, but I have had some success with a different 
>> approach.
>>
>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>> 2. Send this to an HTTP listener in NiFi
>> 3. Write to a Kafka topic
>>
>> I then followed your instructions in this blog:
>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>
>> 4. Create a new "dns" sensor in Metron
>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig 
>> settings to push this into HBase:
>>
>> {
>> "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>> "writerClassName": &q
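
Steps 1 and 2 in the quoted message above (generate a CSV record, then send it to an HTTP listener in NiFi) can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the thread: the listener URL, port, and path are hypothetical placeholders, and the helper names `to_csv_line` and `build_request` are invented for this example. The request is built but deliberately not sent.

```python
import csv
import io
import urllib.request

# Hypothetical NiFi HTTP listener endpoint; replace with your own host, port and path.
LISTENER_URL = "http://nifi.example.local:8081/contentListener"


def to_csv_line(hostname: str, record_type: str, address: str) -> str:
    """Format one DNS record as a fully quoted CSV line, as in the example above."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([hostname, record_type, address])
    return buf.getvalue()


def build_request(line: str, url: str = LISTENER_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying the CSV line as its body."""
    return urllib.request.Request(
        url,
        data=line.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


if __name__ == "__main__":
    line = to_csv_line("server.domain.local", "A", "192.168.0.198")
    request = build_request(line)
    # Passing `request` to urllib.request.urlopen() would actually send it;
    # that call is omitted because the endpoint above is a placeholder.
    print(request.get_method(), request.full_url)
    print(line, end="")
```

Using `csv.QUOTE_ALL` keeps every field quoted, matching the sample record, so a CSV parser downstream sees the same three columns regardless of field content.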

RE: [DISCUSS] Treating null as false in boolean expressions in Stellar

2018-06-19 Thread Charles Joynt
I'd welcome both of these on the grounds that they'll make life easier writing 
short(er) Stellar code AND deciphering what someone else has written.

-----Original Message-----
From: Casey Stella [mailto:ceste...@gmail.com] 
Sent: 16 June 2018 18:33
To: dev@metron.apache.org
Subject: Re: [DISCUSS] Treating null as false in boolean expressions in Stellar

I created a PR for the empty collection falseyness as well:
https://github.com/apache/metron/pull/1064 so we can choose either of them if 
we so desire.

On Sat, Jun 16, 2018 at 1:10 PM Casey Stella  wrote:

> I created a PR for this functionality, in case we decided for it:
> https://github.com/apache/metron/pull/1063
>
> Also, while we're talking, perhaps we should treat empty lists as 
> false as well, like javascript and python.
> So, for instance, if [] then 'blah' else 'foo' would return foo.
>
> Thoughts?
>
> On Sat, Jun 16, 2018 at 10:17 AM Casey Stella  wrote:
>
>> Right now, because fields may not exist, users can have an awkward time.
>> For instance, checking for is_alert, you end up having to preface 
>> checks with exists(is_alert).
>>
>> For instance, in one of our use-cases:
>> https://github.com/apache/metron/tree/master/use-cases/geographic_login_outliers
>> we use
>>
>> "is_alert := exists(is_alert) && is_alert",
>> "is_alert := is_alert || (geo_outlier != null && geo_outlier == true)",
>>
>> instead of:
>>
>> "is_alert := is_alert || geo_outlier == true",
>>
>> I suggest that we adopt a convention from javascript whereby we 
>> assume a field not existing or being null should act as false in 
>> boolean expressions.  This will simplify stellar's use and hopefully 
>> result in less awkwardness.
>>
>> Thoughts?
>>
>
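
The convention proposed in the quoted messages (a missing or null field acts as false, and optionally an empty list does too) can be modelled in a few lines. The sketch below is written in Python rather than Stellar and is not Stellar's implementation: `is_truthy` and `update_is_alert` are hypothetical names, and values other than null and empty collections simply follow Python's own truthiness rules, which is an assumption of this example.

```python
from typing import Any, Mapping


def is_truthy(value: Any) -> bool:
    """Model of the proposed rule: null/missing and empty collections are false."""
    if value is None:
        return False
    if isinstance(value, (list, tuple, set, dict)) and len(value) == 0:
        return False
    # Assumption: everything else falls back to Python's truthiness.
    return bool(value)


def update_is_alert(message: Mapping[str, Any]) -> bool:
    """Mirror of: is_alert := is_alert || geo_outlier == true

    message.get() returns None for a missing field, so no exists() guard
    or explicit null check is needed.
    """
    return is_truthy(message.get("is_alert")) or message.get("geo_outlier") is True


if __name__ == "__main__":
    print(update_is_alert({}))                      # neither field present
    print(update_is_alert({"geo_outlier": True}))   # only geo_outlier set
    print("blah" if is_truthy([]) else "foo")       # empty-list case from the thread
```

With this rule the two guarded assignments from the use-case collapse into the single shorter expression quoted above, because the missing-field case is absorbed by the truthiness check.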
