Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-28 Thread Nick Allen
Thanks for all the reviews and support. I have merged the feature branch into master. On Thu, Sep 27, 2018 at 2:41 PM James Sirota wrote: > +1 from me as well. great work > > 27.09.2018, 11:15, "Ryan Merriman" : > > +1 from me. Great work. > > > > On Thu, Sep 27, 2018 at 12:41 PM Justin Leet

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-27 Thread James Sirota
+1 from me as well. great work 27.09.2018, 11:15, "Ryan Merriman" : > +1 from me. Great work. > > On Thu, Sep 27, 2018 at 12:41 PM Justin Leet wrote: > >>  I'm +1 on merging the feature branch into master. There's a lot of good >>  work here, and it's definitely been nice to see the couple

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-27 Thread Ryan Merriman
+1 from me. Great work. On Thu, Sep 27, 2018 at 12:41 PM Justin Leet wrote: > I'm +1 on merging the feature branch into master. There's a lot of good > work here, and it's definitely been nice to see the couple remaining > improvements make it in. > > Thanks a lot for the contribution, this is

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-27 Thread Justin Leet
I'm +1 on merging the feature branch into master. There's a lot of good work here, and it's definitely been nice to see the couple remaining improvements make it in. Thanks a lot for the contribution, this is great stuff! On Wed, Sep 26, 2018 at 6:26 PM Nick Allen wrote: > Or support to be

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-26 Thread Nick Allen
Or support to be offered for merging this feature branch into master? On Wed, Sep 26, 2018 at 6:20 PM Nick Allen wrote: > Thanks for the review. With https://github.com/apache/metron/pull/1209 > complete, > I think the feature branch is ready to be merged. Sounds like I have > Mike's

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-26 Thread Nick Allen
Thanks for the review. With https://github.com/apache/metron/pull/1209 complete, I think the feature branch is ready to be merged. Sounds like I have Mike's support. Anyone else have comments, concerns, questions? On Tue, Sep 25, 2018 at 10:33 PM Michael Miklavcic <

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-25 Thread Michael Miklavcic
I just made a couple minor comments on that PR, and I am in agreement about the readiness for merging with master. Good stuff Nick. On Fri, Sep 21, 2018 at 12:37 PM Nick Allen wrote: > Here is a PR that adds the input time constraints to the Batch Profiler > (METRON-1787);

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-21 Thread Nick Allen
Here is a PR that adds the input time constraints to the Batch Profiler (METRON-1787); https://github.com/apache/metron/pull/1209. It seems that the consensus is that this is probably the last feature we need before merging the FB into master. The other two can wait until after the feature

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Nick Allen
Yeah, agreed. Per use case 3, when deploying to production there really wouldn't be a huge overlap like 3 months of already profiled data. It's day 1, the profile was just deployed around the same time as you are running the Batch Profiler, so the overlap is in minutes, maybe hours. But I can

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Michael Miklavcic
I think we might want to allow the flexibility to choose the date range then. I don't yet feel like I have a good enough understanding of all the ways in which users would want to seed to force them to run the batch job over all the data. It might also make it easier to deal with remediation, i.e.

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Nick Allen
Assuming you have 9 months of data archived, yes. On Thu, Sep 20, 2018 at 1:22 PM Michael Miklavcic < michael.miklav...@gmail.com> wrote: > So in the case of 3 - if you had 6 months of data that hadn't been profiled > and another 3 that had been profiled (9 months total data), in its current >

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Michael Miklavcic
So in the case of 3 - if you had 6 months of data that hadn't been profiled and another 3 that had been profiled (9 months total data), in its current form the batch job runs over all 9 months? On Thu, Sep 20, 2018 at 11:13 AM Nick Allen wrote: > > How do we establish "tm" from 1.1 above? Any

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Nick Allen
> It's just cleaner from a usage/management perspective to say "I want to put a profile in prod, just use streaming profiler and the batch profiler with the same setup and they're good to go." Agreed. I can add it. It would be a simple addition. On Thu, Sep 20, 2018 at 12:49 PM Justin Leet

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Nick Allen
> How do we establish "tm" from 1.1 above? Any concerns about overlap or gaps after the seeding is performed? Good point. Right now, if the Streaming and Batch Profiler overlap, the last write wins. And presumably the outputs of the Streaming and Batch Profiler are the same, so no worries, right?

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Justin Leet
I think the main difference between this and the flatfile loader is that we actively maintain our profiles in ZK for streaming. Doing this from files is likely going to be the main usage, particularly for speculative usage. For me, the main use case for ZK is definitely use case 3. I can

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Michael Miklavcic
Ok, makes sense. That's sort of what I was thinking as well, Nick. Pulling at this thread just a bit more... 1. I have an existing system that's been up a while, and I have added k profiles - assume these are the first profiles I've created. 1. I would have t0 - tm (where m is the

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Nick Allen
I think more often than not, you would want to load your profile definition from a file. This is why I considered the 'load from Zk' more of a nice-to-have. - In use case 1 and 2, this would definitely be the case. The profiles I am working with are speculative and I am using the batch

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Michael Miklavcic
I think I'm torn on this, specifically because it's batch and would generally be run as-needed. Justin, can you elaborate on your concerns there? This feels functionally very similar to our flat file loaders, which all have inputs for config from the CLI only. On the other hand, our flat file

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Justin Leet
The profile not being able to read from ZK feels like a fairly substantial, if subtle, set of potential problems. I'd like to see that in either before merging or at least pretty soon after merging. Is it a lot of work to add that functionality based on where things are right now? On Thu, Sep

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Nick Allen
Here is another limitation that I just thought of. It can only read a profile definition from a file. It probably also makes sense to add an option that allows it to read the current Profiler configuration from ZooKeeper. > Is it worth setting up a default config that pulls from the main indexing

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-19 Thread James Sirota
I think what you have outlined above is a good initial stab at the feature. Manual install of Spark is not a big deal. Configuring via command line while we mature this feature is ok as well. Doesn't look like configuration steps are too hard. I think you should merge. James 19.09.2018,

[DISCUSS] Batch Profiler Feature Branch

2018-09-19 Thread Nick Allen
I would like to open a discussion to get the Batch Profiler feature branch merged into master as part of METRON-1699 [1] Create Batch Profiler. All of the work that I had in mind for our first draft of the Batch Profiler has been completed. Please take a look through what I have and let me know

Re: [DISCUSS] Batch Profiler

2018-08-16 Thread Nick Allen
FYI - Work is progressing on the Batch Profiler in Spark. For those interested, feel free to take a look at any of the PRs that are open on this feature branch. https://github.com/apache/metron/pulls/nickwallen On Mon, Jul 30, 2018 at 10:50 AM, Nick Allen wrote: > >> 1. We will need a break

Re: [DISCUSS] Batch Profiler

2018-07-30 Thread Nick Allen
>> 1. We will need a breakdown of introducing Spark to the stack; required version due to HDP support; do we want to update HDP support before this?; Spark tuning/defaults; Spark configuration support / UI etc All sounds useful. I'm not sure how much of that we can do before we have the code

Re: [DISCUSS] Batch Profiler

2018-07-30 Thread Simon Elliston Ball
Good points Otto, +1 to all that. On the Spark question, we should definitely be more deliberate about it. We currently have an implicit dependency on Spark through the Zeppelin notebooks. Most implementations I've seen of Metron also have some sort of Spark work built around them. The current

Re: [DISCUSS] Batch Profiler

2018-07-30 Thread Otto Fowler
I think the feature branch is a good idea, but what is in the feature branch or feature branches will have to shake out. I agree in concept with what you have in the jira, but I have two points. 1. We will need a breakdown of introducing Spark to the stack - required version due to HDP

Re: [DISCUSS] Batch Profiler

2018-07-28 Thread Nick Allen
Thanks. Opening up the feature branch lets me get a PR or two out. On Sat, Jul 28, 2018 at 1:01 PM Michael Miklavcic < michael.miklav...@gmail.com> wrote: > +1 on the feature branch, Nick. I'll start reviewing the write-ups shortly. > > On Fri, Jul 27, 2018, 9:29 AM Nick Allen wrote: > > >

Re: [DISCUSS] Batch Profiler

2018-07-28 Thread Michael Miklavcic
+1 on the feature branch, Nick. I'll start reviewing the write-ups shortly. On Fri, Jul 27, 2018, 9:29 AM Nick Allen wrote: > Hi Everyone - > > A while back I opened up a discuss thread around the general idea of a > Batch Profiler [1]. I'd like to start making progress on a first draft of >

[DISCUSS] Batch Profiler

2018-07-27 Thread Nick Allen
Hi Everyone - A while back I opened up a discuss thread around the general idea of a Batch Profiler [1]. I'd like to start making progress on a first draft of that functionality. I created METRON-1699 [2] which outlines the general approach and ideas. If you're interested, review that JIRA and