Thanks for all the reviews and support. I have merged the feature branch
On Thu, Sep 27, 2018 at 2:41 PM James Sirota wrote:
+1 from me as well. Great work.
27.09.2018, 11:15, "Ryan Merriman":
+1 from me. Great work.
On Thu, Sep 27, 2018 at 12:41 PM Justin Leet wrote:
I'm +1 on merging the feature branch into master. There's a lot of good
work here, and it's definitely been nice to see the couple remaining
improvements make it in.
Thanks a lot for the contribution, this is great stuff!
On Wed, Sep 26, 2018 at 6:26 PM Nick Allen wrote:
Or support to be offered for merging this feature branch into master?
On Wed, Sep 26, 2018 at 6:20 PM Nick Allen wrote:
Thanks for the review. With https://github.com/apache/metron/pull/1209,
I think the feature branch is ready to be merged. Sounds like I have
Mike's support. Anyone else have comments, concerns, questions?
On Tue, Sep 25, 2018 at 10:33 PM Michael Miklavcic wrote:
I just made a couple minor comments on that PR, and I am in agreement about
the readiness for merging with master. Good stuff Nick.
On Fri, Sep 21, 2018 at 12:37 PM Nick Allen wrote:
Here is a PR that adds the input time constraints to the Batch Profiler
It seems that the consensus is that this is probably the last feature we
need before merging the FB into master. The other two can wait until after
Yeah, agreed. Per use case 3, when deploying to production there really
wouldn't be a huge overlap like 3 months of already profiled data. It's day
1, the profile was just deployed around the same time as you are running
the Batch Profiler, so the overlap is in minutes, maybe hours. But I can
I think we might want to allow the flexibility to choose the date range
then. I don't yet feel like I have a good enough understanding of all the
ways in which users would want to seed to force them to run the batch job
over all the data. It might also make it easier to deal with remediation,
Assuming you have 9 months of data archived, yes.
On Thu, Sep 20, 2018 at 1:22 PM Michael Miklavcic wrote:
So in the case of 3 - if you had 6 months of data that hadn't been profiled
and another 3 that had been profiled (9 months total data), in its current
form the batch job runs over all 9 months?
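To make the scenario above concrete, here is a minimal sketch of how a begin/end constraint on the input could limit the batch job to the six months that have not yet been profiled. The record layout, the `select_window` helper, and the cutoff date are assumptions for illustration only, not the Batch Profiler's actual API or configuration.

```python
from datetime import datetime, timezone


def to_millis(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def select_window(records, begin=None, end=None):
    """Keep records whose 'timestamp' (epoch millis) falls in [begin, end).

    A bound of None leaves that side open, so passing neither bound
    reproduces the "run over all the data" behaviour.
    """
    return [
        r for r in records
        if (begin is None or r["timestamp"] >= begin)
        and (end is None or r["timestamp"] < end)
    ]


# Hypothetical archive: one telemetry record per month, Jan..Sep 2018.
archive = [
    {"ip_src_addr": "10.0.0.1",
     "timestamp": to_millis(datetime(2018, month, 1, tzinfo=timezone.utc))}
    for month in range(1, 10)
]

# Assume the Streaming Profiler has covered Jul..Sep, so the batch run stops at Jul 1.
cutoff = to_millis(datetime(2018, 7, 1, tzinfo=timezone.utc))
unprofiled = select_window(archive, end=cutoff)

print(len(archive), len(unprofiled))  # prints: 9 6
```

Without the `end` bound the same call returns all nine months, which is the current behaviour being asked about.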
On Thu, Sep 20, 2018 at 11:13 AM Nick Allen wrote:
> It's just cleaner from a usage/management perspective to say "I want to put
> a profile in prod, just use streaming profiler and the batch profiler with
> the same setup and they're good to go."
Agreed. I can add it. It would be a simple addition.
On Thu, Sep 20, 2018 at 12:49 PM Justin Leet wrote:
> How do we establish "tm" from 1.1 above? Any concerns about overlap or
> gaps after the seeding is performed?
Good point. Right now, if the Streaming and Batch Profiler overlap, the
last write wins. And presumably the output of the Streaming and Batch
Profiler is the same, so no worries, right?
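As a rough illustration of the "last write wins" behaviour described above, the sketch below uses a plain dictionary as a stand-in for the profile store. It assumes measurements are keyed by profile name, entity, and period; the `write` helper, the profile name, and the values are hypothetical.

```python
# Hypothetical in-memory stand-in for the profile store.
store = {}


def write(store, profile, entity, period, value):
    """Upsert a measurement; a later write to the same key replaces the earlier one."""
    store[(profile, entity, period)] = value


# The Streaming Profiler writes periods 3 and 4 first.
write(store, "hello-world", "10.0.0.1", 3, 42)
write(store, "hello-world", "10.0.0.1", 4, 17)

# The Batch Profiler then seeds periods 1..3; period 3 overlaps.
for period, value in [(1, 5), (2, 8), (3, 42)]:
    write(store, "hello-world", "10.0.0.1", period, value)

# Four distinct periods remain; the overlapping one holds the batch value.
print(sorted(store.items()))
```

Because both writers are assumed to compute the same value for period 3, the overwrite is harmless. If they disagreed, the later write would silently replace the earlier one, which is why the overlap question matters.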
I think the main difference between this and the flatfile loader is that we
actively maintain our profiles in ZK for streaming. Doing this from files
is likely going to be the main usage, particularly for speculative usage.
For me, the main use case for ZK is definitely use case 3.
Ok, makes sense. That's sort of what I was thinking as well, Nick. Pulling
at this thread just a bit more...
1. I have an existing system that's been up a while, and I have added k
profiles - assume these are the first profiles I've created.
1. I would have t0 - tm (where m is the
I think more often than not, you would want to load your profile definition
from a file. This is why I considered the 'load from Zk' more of a
- In use case 1 and 2, this would definitely be the case. The profiles
I am working with are speculative and I am using the batch
I think I'm torn on this, specifically because it's batch and would
generally be run as-needed. Justin, can you elaborate on your concerns
there? This feels functionally very similar to our flat file loaders, which
all have inputs for config from the CLI only. On the other hand, our flat
The profile not being able to read from ZK feels like a fairly substantial,
if subtle, set of potential problems. I'd like to see that added either
before merging or at least pretty soon after merging. Is it a lot of work
to add that functionality based on where things are right now?
On Thu, Sep
Here is another limitation that I just thought of. It can only read a profile
definition from a file. It probably also makes sense to add an option that
allows it to read the current Profiler configuration from ZooKeeper.
> * You do not configure the Batch Profiler in Ambari. It is configured
> and executed completely from the command-line.
Is it worth setting up a default config that pulls from the main indexing
output? I'm a little on the fence about it, but it seems like making the
most common case
I think what you have outlined above is a good initial stab at the feature.
Manual install of Spark is not a big deal. Configuring via command line while
we mature this feature is ok as well. Doesn't look like configuration steps
are too hard. I think you should merge.
I would like to open a discussion to get the Batch Profiler feature branch
merged into master as part of METRON-1699 (Create Batch Profiler). All
of the work that I had in mind for our first draft of the Batch Profiler
has been completed. Please take a look through what I have and let me know
FYI - Work is progressing on the Batch Profiler in Spark. For those
interested, feel free to take a look at any of the PRs that are open on
this feature branch.
On Mon, Jul 30, 2018 at 10:50 AM, Nick Allen wrote:
>> 1. We will need a break down of introducing Spark to the stack; required
>> version due to HDP support; do we want to update HDP support before
>> tuning/defaults; Spark configuration support / UI etc
All sounds useful. I'm not sure how much of that we can do before we have
Good points Otto +1 to all that.
On the Spark question, we should definitely be more deliberate about it. We
currently have an implicit dependency on Spark through the Zeppelin
notebooks. Most implementations I've seen of Metron also have some sort of
Spark work built around them. The current
I think the feature branch is a good idea, but what is in the feature
branch or feature branches will have to shake out.
I agree in concept with what you have in the jira, but I have two points.
1. We will need a break down of introducing Spark to the stack
- required version due to HDP
Thanks. Opening up the feature branch lets me get a PR or two out.
On Sat, Jul 28, 2018 at 1:01 PM Michael Miklavcic wrote:
+1 on the feature branch, Nick. I'll start reviewing the write-ups shortly.
On Fri, Jul 27, 2018, 9:29 AM Nick Allen wrote:
Hi Everyone -
A while back I opened up a discuss thread around the general idea of a
Batch Profiler. I'd like to start making progress on a first draft of
I created METRON-1699 which outlines the general approach and ideas.
If you're interested, review that JIRA and