Storm Question

2017-11-13 Thread Otto Fowler
Are the values in a storm Tuple guaranteed to always be in the same order,
such that you can reliably reference them
by position and not by name?

Asking for a friend.

ottO
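For the record: yes. A Storm tuple's values are stored in the order the component declared its output `Fields`, and by-name access is resolved to a declared position, so positional access is stable as long as the declaration does not change. A dependency-free sketch of that mapping (class names are hypothetical, mimicking `Tuple.getValue(int)` / `Tuple.getValueByField(String)` rather than using Storm's API):

```java
import java.util.Arrays;
import java.util.List;

// Mimics how Storm resolves field names to positions: the declared
// field order fixes each value's index.
class DeclaredFields {
    private final List<String> names;

    DeclaredFields(String... names) {
        this.names = Arrays.asList(names);
    }

    int fieldIndex(String name) {
        int i = names.indexOf(name);
        if (i < 0) throw new IllegalArgumentException(name + " does not exist");
        return i;
    }
}

class SimpleTuple {
    private final DeclaredFields fields;
    private final List<Object> values;   // always kept in declared order

    SimpleTuple(DeclaredFields fields, List<Object> values) {
        this.fields = fields;
        this.values = values;
    }

    Object getValue(int i) {                  // access by position
        return values.get(i);
    }

    Object getValueByField(String name) {     // access by name -> position
        return values.get(fields.fieldIndex(name));
    }
}

public class TupleOrderDemo {
    public static void main(String[] args) {
        DeclaredFields f = new DeclaredFields("source", "message");
        SimpleTuple t = new SimpleTuple(f, Arrays.asList("bro", "{...}"));
        // Both access styles hit the same slot.
        System.out.println(t.getValue(0).equals(t.getValueByField("source")));
    }
}
```

The caveat is the same as with any positional contract: if the declaring component reorders or inserts fields, positional consumers break silently, while by-name consumers keep working.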


Re: Assign METRON-1307 to Brian Hurley and close

2017-11-14 Thread Otto Fowler
I assigned it to myself and closed.


On November 14, 2017 at 09:02:04, zeo...@gmail.com (zeo...@gmail.com) wrote:

I'm unable to find Brian Hurley in the list of assignees, but he was the
one who contributed the fix[1]. Can someone assign and close this JIRA?
Thanks,

Jon

1: https://github.com/apache/metron/pull/835
-- 

Jon


Re: [MENTORS][DISCUSS] Release Procedure + 'Kafka Plugin for Bro'

2017-11-16 Thread Otto Fowler
How often do we expect to change this? If it is effectively pinned, then a
release process is not that bad.


On November 16, 2017 at 10:06:53, Nick Allen (n...@nickallen.org) wrote:

>
> I would suggest that we institute a release procedure for the package
> itself, but I don't think it necessarily has to line up with metron
> releases (happy to be persuaded otherwise). Then we can just link metron
> to metron-bro-plugin-kafka by pointing to specific
> metron-bro-plugin-kafka releases (git tags).
> Right now, full-dev spins up against the
> apache/metron-bro-plugin-kafka master branch, which is not a good idea to
> have in place for an upcoming release. That is the crux of why I think we
> need to finalize the move to bro 2.5.2 and the plugin packaging before
our
> next release (working on it as we speak).
> Jon


I replayed Jon's comments from the other thread above.

My initial thought is that I would not want to manage two separate release
processes. I don't want to have to do a roll call, cut release candidates,
and test both.

I was thinking we would just need to change some of the behind-the-scenes
processes handled by the release manager. This is one area where I had
thought using a submodule in Git would help.

On Thu, Nov 16, 2017 at 9:58 AM, Nick Allen  wrote:

> + Restarting the thread to include mentors.
>
> The code of the 'Kafka Plugin for Bro' is now maintained in the external
> repository that we set up a while back.
>
> - Metron Core: git://git.apache.org/metron.git
> - Kafka Plugin for Bro: git://git.apache.org/
> metron-bro-plugin-kafka.git
>
> (Q) Do we need to change anything in the release procedure to account for
> this?
>


Re: Using Storm Resource Aware Scheduler

2017-11-22 Thread Otto Fowler
How are you measuring the utilization?


On November 22, 2017 at 08:12:51, Ali Nazemian (alinazem...@gmail.com)
wrote:

Hi all,


One of the issues that we are dealing with is the fact that not all of
the Metron feeds have the same type of resource requirements. For example,
we have some feeds for which even a single Storm slot is way more than they
need. We thought we could improve overall utilisation by at least limiting
the amount of heap space available per feed to the parser topology
worker. However, since the Storm scheduler relies on available slots, it is
very hard, almost impossible, to utilise the cluster well when lots of
different topologies with different requirements are running at the same
time. Therefore, on a daily basis we can see that, for example, one of the
Storm hosts is 120% utilised and another is 20% utilised! I was wondering
whether we can address this situation by using the Storm Resource Aware
Scheduler or not.

P.S.: it would be very nice to have the ability to tune Storm
topology-related parameters per feed in the GUI (for example in the
Management UI).


Regards,
Ali


Re: Using Storm Resource Aware Scheduler

2017-11-24 Thread Otto Fowler
Hi Ali,

This is a holiday in the US (Thanksgiving) and many people have a 4-day
weekend. It is also common to travel for this holiday, so it is possible
that some of the community members who know a bit more about Storm will not
be online during this time.

I do not have experience with the RAS, but in doing some research I can see
the following:

1. Our parser topologies are built in code, and therefore would require a
code change to allow setting the CPU and memory component properties for
parser topology components. It is also not clear to me how we would set
those properties for Ambari-started topologies.
2. Our enrichment and indexing topologies are built with Flux, so I *think*
those configurations could be edited in the field to set the CPU and memory
configurations (as well as other RAS configs). But I have not seen any
Flux examples of how to do so.
3. I am not certain how many of the node-specific configurations in the
YAML files can be set from Ambari, but it may be possible.

TL;DR: we (I) would need to do more research on how we could support this.

Hopefully someone with more Storm know-how will hop on this soon.

I would recommend that you open a Jira titled "Metron Storm Topologies
Supporting Resource Aware Scheduling".
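To make point 2 above concrete, per-topology RAS settings could plausibly go into the `config` map of a Flux YAML file. The config keys below are Storm's documented RAS settings, but I have not verified them against Metron's actual enrichment flux file, so treat the fragment as a sketch:

```yaml
# Hypothetical fragment of a flux remote.yaml; only the config map is shown.
name: "enrichment"
config:
  # cap the worker heap considered by the scheduler
  topology.worker.max.heap.size.mb: 768.0
  # per-component defaults used by the Resource Aware Scheduler
  topology.component.resources.onheap.memory.mb: 256.0
  topology.component.resources.offheap.memory.mb: 0.0
  topology.component.cpu.pcore.percent: 10.0
  # ask for the RAS strategy (the cluster must be running the RAS scheduler)
  topology.scheduler.strategy: "org.apache.storm.scheduler.resource.strategies.scheduling.DefaultResourceAwareStrategy"
```

These topology-level keys set defaults for every component; true per-component values would need `setMemoryLoad`/`setCPULoad` calls in the topology-building code, which is the code change point 1 refers to.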


On November 24, 2017 at 02:56:28, Ali Nazemian (alinazem...@gmail.com)
wrote:

Any help regarding this question would be appreciated.


On Thu, Nov 23, 2017 at 8:57 AM, Ali Nazemian  wrote:

> 30 mins average of CPU load by checking Ambari.
>
> On 23 Nov. 2017 00:51, "Otto Fowler"  wrote:
>
> How are you measuring the utilization?
>
>
> On November 22, 2017 at 08:12:51, Ali Nazemian (alinazem...@gmail.com)
> wrote:
>
> Hi all,
>
>
> One of the issues that we are dealing with is the fact that not all of
> the Metron feeds have the same type of resource requirements. For example,
> we have some feeds that even a single Storm slot is way more than what it
> needs. We thought we could make it more utilised in total by limiting at
> least the amount of available heap space per feed to the parser topology
> worker. However, since Storm scheduler relies on available slots, it is
> very hard and almost impossible to utilise the cluster in the scenario that
> there will be lots of different topologies with different requirements
> running at the same time. Therefore, on a daily basis, we can see that for
> example one of the Storm hosts is 120% utilised and another is 20%
> utilised! I was wondering whether we can address this situation by using
> Storm Resource Aware scheduler or not.
>
> P.S: it would be very nice to have a functionality to tune Storm
> topology-related parameters per feed in the GUI (for example in Management
> UI).
>
>
> Regards,
> Ali
>
>
>


--
A.Nazemian


[DISCUSS] NPM / Node Problems

2017-11-24 Thread Otto Fowler
It seems like it is getting *very* common for people to have trouble
building recently. Errors with NPM and Node seem common, with fixes ranging
from updating C/C++ libs to changing the version of npm/node.

There has to be a better way to do this.

   - Are we out of date or missing requirements in our documentation?
   - Does our documentation need to be updated for building?
   - Is there a better way in maven to check the versions required for some
     of these things and fail faster with a better message?
   - Are we building correctly or are we asking for trouble?

The ability to build Metron is pretty important, and it seems that people
are having a lot of trouble related to the new technologies in the alerts
and config UIs.
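On the "better way in maven" question: the UI modules are built with the frontend-maven-plugin, which downloads its own node/npm when told to, so one option is to pin exact versions there and make the local install irrelevant. A sketch (the version numbers are illustrative, not what Metron's pom actually pins):

```xml
<!-- Hypothetical pom.xml fragment: pin node/npm so the contributor's
     locally installed versions do not matter for the build. -->
<plugin>
  <groupId>com.github.eirslett</groupId>
  <artifactId>frontend-maven-plugin</artifactId>
  <version>1.6</version>
  <executions>
    <execution>
      <id>install node and npm</id>
      <goals>
        <goal>install-node-and-npm</goal>
      </goals>
      <configuration>
        <nodeVersion>v6.11.0</nodeVersion>
        <npmVersion>3.10.10</npmVersion>
      </configuration>
    </execution>
  </executions>
</plugin>
```

This addresses the node/npm *version* question, though not the native-module problem discussed later in this thread, since node-gyp still compiles against whatever system libraries and C/C++ toolchain the host has.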


Re: [DISCUSS] NPM / Node Problems

2017-11-27 Thread Otto Fowler
s would be Vagrant and Virtualbox. We could cut new images
for
>>> each Metron release. Or selectively cut new dev images from master as
we
>>> see fit.
>>>
>>> (2) Distribute the Metron RPMs (and the MPack tarball?) so that users
can
>>> install Metron on a cluster without having to build it.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Nov 24, 2017 at 10:11 AM, Otto Fowler 
>>> wrote:
>>>
>>>> It seems like it is getting *very* common for people to have trouble
>>>> building recently. Errors with NPM and Node seem common, with fixes
>>>> ranging from updating C/C++ libs to changing the version of npm/node.
>>>>
>>>> There has to be a better way to do this.
>>>>
>>>> - Are we out of date or missing requirements in our documentation?
>>>> - Does our documentation need to be updated for building?
>>>> - Is there a better way in maven to check the versions required for some
>>>> of these things and fail faster with a better message?
>>>> - Are we building correctly or are we asking for trouble?
>>>>
>>>> The ability to build metron is pretty important, and it seems that
people
>>>> are having a lot of trouble related to the new technologies in alerts
and
>>>> config ui.
>>>>
>>


Re: [DISCUSS] NPM / Node Problems

2017-11-27 Thread Otto Fowler
First issue is that we need C++11 on CentOS 6.8.



On November 27, 2017 at 09:53:55, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Well, that’s good news on that issue. Reproducing the problem is half way
to solving it, right?

I would still say there are some systemic things going on that have
manifested in a variety of ways on both the users and dev list, so it’s
worth us having a good look at a more robust approach to node dependencies
(both npm ones, and the native ones)

Simon

On 27 Nov 2017, at 13:30, Otto Fowler  wrote:

I can reproduce the failure in our ansible docker build container, which is
also centos.
The issue is building our node on centos in all these cases.



On November 27, 2017 at 07:02:51, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Thinking about this, doesn’t our build plugin explicitly install its own
node? So actually all the node version things may be a red herring, since
this is under our control through the pom. Not sure if we are actually
exercising this control. It seems that some of the errors people report are
more to do with compilation failures for native node modules, which it
doesn’t pin (i.e. things like system library dependencies). I’m not sure
what we have in the dependency tree that requires complex native
dependencies, but this might just be one of those node things we could doc
around.

This scenario would be fixed by standardising the build container.

Yarn’s big thing is that it enables faster dependency resolution and local
caching, right? It does not seem to address any of the problems we see, but
sure, it’s the shiny new dependency system for node modules, which might
make npm less horrible to deal with, so worth looking into.

The other issue that I’ve seen people run into a lot is flat out download
errors. This could be helped by finding our versions, maybe with yarn, but
let’s face it, package-lock.json could also do that with npm, albeit with a
slightly slower algorithm. However, short of bundling and hosting deps
ourselves, I suspect the download errors are beyond our control, and
certainly beyond the scope of this project (fix maven, fix npm, fix all the
node hosting servers…)

Simon


> On 27 Nov 2017, at 07:28, RaghuMitra Kandikonda 
wrote:
>
> Looking at some of the build failure emails and past experience I
> would suggest having a node & npm version check in our build scripts
> and moving dependency management to yarn.
>
> We need not restrict the build to a specific version of node & npm but
> we can surely suggest a min version required to build UI successfully.
>
> -Raghu
>
>
>
> On Fri, Nov 24, 2017 at 10:21 PM, Simon Elliston Ball
>  wrote:
>> Agreeing with Nick, it seems like the main reason people are building
themselves, and hitting all these environmental issues, is that we do not
as a project produce binary release artefacts (the rpms which users could
just install) and instead leave that for the commercial distributors to do.
>>
>> Yarn may help with some of the dependency version issues we’re having,
but not afaik with the core missing library headers / build tools / node
and npm version issue, those would seem to fit a documentation fix and
improvements to platform-info to flag the problems, so this can then be a
pre-flight check tool as well as a diagnostic tool.
>>
>> Another option I would put on the table is to standardise our build
environment, so that the non-java bits are run in a standard docker image
or something of the sort; that way we can take control of all the
environmental and OS dependent pieces, much as we do right now with the rpm
build sections of the mpack build.
>>
>> The challenge here will be adding the relevant maven support. At the
moment we’re relying on the maven npm and node build plugins, this would
likely need replacing with something custom, which would be a challenge to
support if we go down this route.
>>
>> Perhaps the real answer here is to push people who are just kicking the
tyres towards a binary distribution, or at least rpm artefacts as part of
the Apache release to give them a head start for a happy path on a known
good OS environment.
>>
>> Simon
>>
>>> On 24 Nov 2017, at 16:01, Nick Allen  wrote:
>>>
>>> Yes, it is a problem. I think you've identified a couple important
things
>>> that we could address in parallel. I see these as challenges we need to
>>> solve for the dev community.
>>>
>>> (1) NPM is causing us some major headaches. Which version do we require?

>>> How do I install that version (on Mac, Windows, Linux)? Does YARN help
>>> here at all?
>>>
>>> (2) Can we automate the prerequisite checks that we currently do
manually
>>> with `platform-info.sh`? An automated check could run and fail as part
of
>>> the build or de

Re: [DISCUSS] NPM / Node Problems

2017-11-27 Thread Otto Fowler
OK,
So I have `mvn clean package` working in docker.
I want to try a couple of things, and maybe I can throw a PR together.



On November 27, 2017 at 10:03:31, Otto Fowler (ottobackwa...@gmail.com)
wrote:

First issue is that we need C++11 on CentOS 6.8.



On November 27, 2017 at 09:53:55, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Well, that’s good news on that issue. Reproducing the problem is half way
to solving it, right?

I would still say there are some systemic things going on that have
manifested in a variety of ways on both the users and dev list, so it’s
worth us having a good look at a more robust approach to node dependencies
(both npm ones, and the native ones)

Simon

On 27 Nov 2017, at 13:30, Otto Fowler  wrote:

I can reproduce the failure in our ansible docker build container, which is
also centos.
The issue is building our node on centos in all these cases.



On November 27, 2017 at 07:02:51, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Thinking about this, doesn’t our build plugin explicitly install its own
node? So actually all the node version things may be a red herring, since
this is under our control through the pom. Not sure if we are actually
exercising this control. It seems that some of the errors people report are
more to do with compilation failures for native node modules, which it
doesn’t pin (i.e. things like system library dependencies). I’m not sure
what we have in the dependency tree that requires complex native
dependencies, but this might just be one of those node things we could doc
around.

This scenario would be fixed by standardising the build container.

Yarn’s big thing is that it enables faster dependency resolution and local
caching, right? It does not seem to address any of the problems we see, but
sure, it’s the shiny new dependency system for node modules, which might
make npm less horrible to deal with, so worth looking into.

The other issue that I’ve seen people run into a lot is flat out download
errors. This could be helped by finding our versions, maybe with yarn, but
let’s face it, package-lock.json could also do that with npm, albeit with a
slightly slower algorithm. However, short of bundling and hosting deps
ourselves, I suspect the download errors are beyond our control, and
certainly beyond the scope of this project (fix maven, fix npm, fix all the
node hosting servers…)

Simon


> On 27 Nov 2017, at 07:28, RaghuMitra Kandikonda 
wrote:
>
> Looking at some of the build failure emails and past experience I
> would suggest having a node & npm version check in our build scripts
> and moving dependency management to yarn.
>
> We need not restrict the build to a specific version of node & npm but
> we can surely suggest a min version required to build UI successfully.
>
> -Raghu
>
>
>
> On Fri, Nov 24, 2017 at 10:21 PM, Simon Elliston Ball
>  wrote:
>> Agreeing with Nick, it seems like the main reason people are building
themselves, and hitting all these environmental issues, is that we do not
as a project produce binary release artefacts (the rpms which users could
just install) and instead leave that for the commercial distributors to do.
>>
>> Yarn may help with some of the dependency version issues we’re having,
but not afaik with the core missing library headers / build tools / node
and npm version issue, those would seem to fit a documentation fix and
improvements to platform-info to flag the problems, so this can then be a
pre-flight check tool as well as a diagnostic tool.
>>
>> Another option I would put on the table is to standardise our build
environment, so that the non-java bits are run in a standard docker image
or something of the sort; that way we can take control of all the
environmental and OS dependent pieces, much as we do right now with the rpm
build sections of the mpack build.
>>
>> The challenge here will be adding the relevant maven support. At the
moment we’re relying on the maven npm and node build plugins, this would
likely need replacing with something custom, which would be a challenge to
support if we go down this route.
>>
>> Perhaps the real answer here is to push people who are just kicking the
tyres towards a binary distribution, or at least rpm artefacts as part of
the Apache release to give them a head start for a happy path on a known
good OS environment.
>>
>> Simon
>>
>>> On 24 Nov 2017, at 16:01, Nick Allen  wrote:
>>>
>>> Yes, it is a problem. I think you've identified a couple important
things
>>> that we could address in parallel. I see these as challenges we need to
>>> solve for the dev community.
>>>
>>> (1) NPM is causing us some major headaches. Which version do we require?

>>> How do I install that version (on Mac, Windows, Linux)? Does YARN help
>>> here at all?
>>>
&g

Re: [DISCUSS] NPM / Node Problems

2017-11-27 Thread Otto Fowler
Also, since I changed the profiles a while ago to not run the rpm docker if
you are already in docker (and put the rpm tools into the ansible docker),
we may be able to build world in the ansible image, and point folks having
issues to that….


On November 27, 2017 at 10:57:03, Otto Fowler (ottobackwa...@gmail.com)
wrote:

OK,
So I have `mvn clean package` working in docker.
I want to try a couple of things, and maybe I can throw a PR together.



On November 27, 2017 at 10:03:31, Otto Fowler (ottobackwa...@gmail.com)
wrote:

First issue is that we need C++11 on CentOS 6.8.



On November 27, 2017 at 09:53:55, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Well, that’s good news on that issue. Reproducing the problem is half way
to solving it, right?

I would still say there are some systemic things going on that have
manifested in a variety of ways on both the users and dev list, so it’s
worth us having a good look at a more robust approach to node dependencies
(both npm ones, and the native ones)

Simon

On 27 Nov 2017, at 13:30, Otto Fowler  wrote:

I can reproduce the failure in our ansible docker build container, which is
also centos.
The issue is building our node on centos in all these cases.



On November 27, 2017 at 07:02:51, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Thinking about this, doesn’t our build plugin explicitly install its own
node? So actually all the node version things may be a red herring, since
this is under our control through the pom. Not sure if we are actually
exercising this control. It seems that some of the errors people report are
more to do with compilation failures for native node modules, which it
doesn’t pin (i.e. things like system library dependencies). I’m not sure
what we have in the dependency tree that requires complex native
dependencies, but this might just be one of those node things we could doc
around.

This scenario would be fixed by standardising the build container.

Yarn’s big thing is that it enables faster dependency resolution and local
caching, right? It does not seem to address any of the problems we see, but
sure, it’s the shiny new dependency system for node modules, which might
make npm less horrible to deal with, so worth looking into.

The other issue that I’ve seen people run into a lot is flat out download
errors. This could be helped by finding our versions, maybe with yarn, but
let’s face it, package-lock.json could also do that with npm, albeit with a
slightly slower algorithm. However, short of bundling and hosting deps
ourselves, I suspect the download errors are beyond our control, and
certainly beyond the scope of this project (fix maven, fix npm, fix all the
node hosting servers…)

Simon


> On 27 Nov 2017, at 07:28, RaghuMitra Kandikonda 
wrote:
>
> Looking at some of the build failure emails and past experience I
> would suggest having a node & npm version check in our build scripts
> and moving dependency management to yarn.
>
> We need not restrict the build to a specific version of node & npm but
> we can surely suggest a min version required to build UI successfully.
>
> -Raghu
>
>
>
> On Fri, Nov 24, 2017 at 10:21 PM, Simon Elliston Ball
>  wrote:
>> Agreeing with Nick, it seems like the main reason people are building
themselves, and hitting all these environmental issues, is that we do not
as a project produce binary release artefacts (the rpms which users could
just install) and instead leave that for the commercial distributors to do.
>>
>> Yarn may help with some of the dependency version issues we’re having,
but not afaik with the core missing library headers / build tools / node
and npm version issue, those would seem to fit a documentation fix and
improvements to platform-info to flag the problems, so this can then be a
pre-flight check tool as well as a diagnostic tool.
>>
>> Another option I would put on the table is to standardise our build
environment, so that the non-java bits are run in a standard docker image
or something of the sort; that way we can take control of all the
environmental and OS dependent pieces, much as we do right now with the rpm
build sections of the mpack build.
>>
>> The challenge here will be adding the relevant maven support. At the
moment we’re relying on the maven npm and node build plugins, this would
likely need replacing with something custom, which would be a challenge to
support if we go down this route.
>>
>> Perhaps the real answer here is to push people who are just kicking the
tyres towards a binary distribution, or at least rpm artefacts as part of
the Apache release to give them a head start for a happy path on a known
good OS environment.
>>
>> Simon
>>
>>> On 24 Nov 2017, at 16:01, Nick Allen  wrote:
>>>
>>> Yes, it is a problem. I think you've identified a couple important
things
>>> that we

Re: [DISCUSS] Upcoming Release

2017-11-27 Thread Otto Fowler
n calculation of scores
(justinleet) closes apache/metron#763
METRON-1187 Indexing/Profiler Kafka ACL Groups Not Setup Correctly
(nickwallen) closes apache/metron#759
METRON-1185: Stellar REPL does not work on a kerberized cluster when
calling functions interacting with HBase closes apache/incubator-metron#755
METRON-1186: Profiler Functions use classutils from shaded storm closes
apache/incubator-metron#758
METRON-1173: Fix pointers to old stellar docs closes
apache/incubator-metron#746
METRON-1179: Make STATS_ADD to take a list closes
apache/incubator-metron#750
METRON-1180: Make Stellar Shell accept zookeeper quorum as a CSV list and
not require a port closes apache/incubator-metron#751
METRON-1183 Improve KDC Setup Instructions (nickwallen) closes
apache/metron#753
METRON-1177 Stale running topologies seen post-kerberization and cause
exceptions (nickwallen) closes apache/metron#748
METRON-1158 Build backend for grouping alerts into meta alerts (justinleet)
closes apache/metron#734
METRON-1146: Add ability to parse JSON string into JSONObject for stellar
closes apache/incubator-metron#727
METRON-1176 REST: HDFS Service should support setting permissions on files
when writing (ottobackwards) closes apache/metron#749
METRON-1114 Add group by capabilities to search REST endpoint (merrimanr)
closes apache/metron#702
METRON-1167 Define Session Specific Global Configuration Values in the REPL
(nickwallen) closes apache/metron#740
METRON-1171: Better validation for the SUBSTRING stellar function closes
apache/incubator-metron#745



On 11/17/17, 11:59 AM, "Nick Allen"  wrote:

I just wanted to send an update on where we are at. We've gotten a lot
done here recently as you can see below.

✓ DONE (1) First, METRON-1289 needs to go in. This one was a fairly big
effort and I am hearing that we are pretty close.

✓ DONE (2) METRON-1294 fixes an issue in how field types are looked-up.

✓ DONE (3) METRON-1290 is next. While this may have been fixed in
M-1289, there may be some test cases we want from this PR.

✓ DONE (4) METRON-1301 addresses a problem with the sorting logic.

✓ DONE (5) METRON-1291 fixes an issue with escalation of metaalerts.

(6) That leads us to Raghu's UI work in METRON-1252. This introduces the
UI bits that depend on all the previous backend work.

(7) At this point, we should have our best effort at running Metaalerts
on Elasticsearch 2.x. I propose that we cut a release here.

(8) After we cut the release, we can introduce the work for ES 5.x in
METRON-939. I know we will need lots of help testing and reviewing this
one.



We also have an outstanding question that needs to be resolved BEFORE we
release. We need to come to a consensus on how to release now that our
Bro plugin has moved to a separate repo. I don't think we've heard from
everyone on this. I'd urge everyone to chime in so we can choose a path
forward.

If anyone is totally confused about that discussion, I can try to send an
options summary again as a separate discuss thread. The original chain was
somewhere around here [1].

[1]
https://lists.apache.org/thread.html/54a4474881b97e559df24728b3a0e923a58345a282451085eef832ef@%3Cdev.metron.apache.org%3E



On Wed, Nov 15, 2017 at 10:04 AM, Nick Allen  wrote:

> Hi Guys -
>
> I want to follow up on this discussion. It sounds like most people are in
> agreement with the general approach.
>
> A lot of people have been working hard on Metaalerts and Elasticsearch. I
> have checked-in with those doing the heavy lifting and have compiled a
more
> detailed plan based on where we are at now. To the best of my knowledge
> here is the plan of attack for finishing out this effort.
>
> (1) First, METRON-1289 needs to go in. This one was a fairly big effort
> and I am hearing that we are pretty close.
>
> (2) METRON-1294 fixes an issue in how field types are looked-up.
>
> (3) METRON-1290 is next. While this may have been fixed in M-1289,
> there may be some test cases we want from this PR.
>
> (4) METRON-1301 addresses a problem with the sorting logic.
>
> (5) METRON-1291 fixes an issue with escalation of metaalerts.
>
> (6) That leads us to Raghu's UI work in METRON-1252. This introduces
> the UI bits that depend on all the previous backend work.
>
> (7) At this point, we should have our best effort at running Metaalerts
> on Elasticsearch 2.x. I propose that we cut a release here.
>
> (8) After we cut the release, we can introduce the work for ES 5.x in
> METRON-939. I know we will need lots of help testing and reviewing this
> one.
>
> Please correct me if I am wrong. I will try and send out updates as we
> make progress.
>
>
>
>
>
> On Mon, Nov 6, 2017 at 1:03 PM, zeo...@gmail.com 
wrote:
>
>> I agree, I think it's very reasonable to move in line with Nick's
>> proposal. I would also suggest that we outline w

Re: [MENTORS][DISCUSS] Release Procedure + 'Kafka Plugin for Bro'

2017-11-27 Thread Otto Fowler
I am not sure that our use of the plugin necessarily equates to it being
implicitly coupled to Metron. It seems like the Right Thing To Do™, esp.
for an Apache project, would be to make this available for use by the
greater bro community.
Unless we expect to do extensive iterative work on the plugin, in which
case the decision to spin it out now would be premature.

Then again, I might be wrong ;)


On November 27, 2017 at 19:58:11, Matt Foley (ma...@apache.org) wrote:

[Please pardon me that the below is a little labored. I’m trying to
understand the implications for both release and use, which requires some
explanation as well as the two questions needed. Q1 and Q2 below are
probably the same question, asked in slightly different contexts. Please
consider them together.]

So this made me go back and look at the history that caused us to put the
bro plugin in a separate repo. As best I can see, this was in
https://issues.apache.org/jira/browse/METRON-813 , which cites an email
discussion thread. Also please see
https://issues.apache.org/jira/browse/METRON-883 for background on the
plugin itself.

As best I can assemble the many bits brought up in the threads, the reasons
to put it in a separate repo were:
- The plugin was thought to be useful to multiple clients of bro and kafka,
including Storm and Spark, as well as Metron.
- Originally the bro project was maintaining bro plugins and it was thought
they might adopt this one.
- Bro then formalized their plugin framework BUT dumped all plugins out of
their sphere of maintenance.
- As of 3/31/2017, Nick said that “the [bro] package mechanism requires
that a package live within its own repo”. Jon said “the bro packages model
doesn't allow colocation with anything else.”
- So on 3/31 Jon opened METRON-813, and the metron-bro-plugin-kafka repo
was created a few days later. But Metron wasn’t actually modified to remove
the metron-sensors/bro-plugin-kafka/ subdirectory and start using the
plugin from the metron-bro-plugin-kafka repo until Nov 12 – two weeks ago!
– with https://issues.apache.org/jira/browse/METRON-1309 .
- Presumably the need to have metron-bro-plugin-kafka in a separate repo
remains valid if the bro plugin mechanism is used. But obviously there are
(non-conforming) ways to build the plugin as part of metron, and install it
in a way that works.

Q1. I think that last statement needs some explanation. Nick or Jon, can
you please expand on it, especially wrt how the end user installs the
plugin once the plugin is built the two different ways? And whether it’s
still valuable to have a separate repo for the plugin?

Nick suggests using a submodule approach to managing the bro plugin, for
Metron versioning purposes. As I understand it, this would continue the
existence of the metron-bro-plugin-kafka repo, but copy it into the metron
code tree for building, versioning, and release purposes. Git submodules
are documented here: https://git-scm.com/book/en/v2/Git-Tools-Submodules .
We would use the submodule capability to clone the metron-bro-plugin-kafka
source code into a subdirectory of Metron at the time one clones the metron
repo. It would then be released with Metron as part of the source code
release for a given version of Metron. Part of the way submodules are
managed is that git records the submodule's path and URL in a file named
.gitmodules, and pins the submodule's exact commit (its SHA1) as a gitlink
entry in the superproject's tree, both of which get saved when you push. So
indeed submodules would ensure that everyone cloning a given version of
metron would get the expected “version” (SHA, actually) of
metron-bro-plugin-kafka.

This sounds like a good idea, although it isn’t without cost. Submodules
require additional commands to actually get a copy of the submodule source,
and if the plugin repo advances beyond the version pinned in a metron
clone, it causes some ‘git status’ artifacts that could be confusing to
folks who aren’t familiar with submodules. But these can be documented.
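To make the submodule mechanics concrete, here is a small self-contained demo using two local throwaway repos (the repo names stand in for metron and metron-bro-plugin-kafka; `-c protocol.file.allow=always` is only needed on newer git versions that restrict file-protocol submodules, and is ignored on older ones):

```shell
#!/usr/bin/env bash
# Sketch: how a submodule pins a superproject to an exact commit of a plugin repo.
set -eu
work="${TMPDIR:-/tmp}/submodule-demo"
rm -rf "$work" && mkdir -p "$work" && cd "$work"
G="git -c user.email=dev@example.com -c user.name=dev"

# A stand-in for the plugin repo.
$G init -q plugin
(cd plugin && echo "plugin v1" > README && $G add README && $G commit -qm "v1")

# A stand-in for metron, with the plugin added as a submodule.
$G init -q metron
(cd metron \
  && echo "metron" > README && $G add README && $G commit -qm "init" \
  && git -c protocol.file.allow=always \
         -c user.email=dev@example.com -c user.name=dev \
         submodule add -q "$work/plugin" metron-bro-plugin-kafka \
  && $G commit -qm "pin plugin")

# The pinned SHA lives in metron's tree as a "gitlink" (mode 160000) entry;
# .gitmodules records only the path -> URL mapping.
(cd metron && git ls-tree HEAD metron-bro-plugin-kafka && cat .gitmodules)
```

A fresh `git clone` of the superproject then needs `git submodule update --init` (or `git clone --recursive`) to actually populate the plugin directory, which is the "additional commands" cost mentioned above.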

Q2. Nick, what I’m not clear about is the process by which the
metron-bro-plugin-kafka would be built and “plugged in” by (a) metron
developers, and (b) end users. If it “must” be in a separate repo to be
successfully built and managed by the bro plugin mechanism, does that mean
it can’t be built from the copy in the Metron source tree? Yet until
November, that’s exactly what we were doing. Do we go back to doing that?
What does that mean wrt users installing the plugin?

Thanks for your patience in reading this far.
--Matt


On 11/27/17, 2:58 PM, "James Sirota"  wrote:

I agree with Nick. Since the plugin is tightly coupled with Metron, why not
just pull it into the main repo and version it with the rest of the code?
Do we really need the second repo for the plug-in?

Thanks,
James



16.11.2017, 08:06, "Nick Allen" :
>> I would suggest that we institute a release procedure for the package
>> itself, but I don't think it necessarily has to line up with metron
>> releases (happy to be persuaded otherwise). Then we can just link metron
>> t

Re: [DISCUSS] e2e test infrastructure

2017-11-28 Thread Otto Fowler
As long as there is not a large chunk of custom deployment that has to be
maintained, Docker sounds ideal.
I would like to understand what it would take to create the docker e2e env.



On November 28, 2017 at 17:27:13, Ryan Merriman (merrim...@gmail.com) wrote:

Currently the e2e tests for our Alerts UI depend on full dev being up and
running. This is not a good long-term solution because it forces a
contributor/reviewer to run the tests manually with full dev running. It
would be better if the backend services could be made available to the e2e
tests while running in Travis. This would allow us to add the e2e tests to
our automated build process.

What is the right approach? Here are some options I can think of:

- Use the in-memory components we use for the backend integration tests
- Use a Docker approach
- Use mock components designed for the e2e tests

Mocking the backend would be my least favorite option because it would
introduce a complex module of code that we have to maintain.

The in-memory approach has some shortcomings but we may be able to solve
some of those by moving components to their own process and spinning them
up/down at the beginning/end of tests. Plus we are already using them.

My preference would be Docker because it most closely mimics a real
installation and gives you isolation, networking and dependency management
features OOTB. In many cases Dockerfiles are maintained and published by a
third party and require no work other than some setup like loading data or
templates/schemas. Elasticsearch is a good example.

I believe we could make any of these approaches work in Travis. What does
everyone think?

Ryan
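To make the Docker option concrete, a hypothetical minimal Compose file for just the backing services might look like the following (the image names and tags are illustrative assumptions, not something Metron ships; a real setup would also load ES templates and create Kafka topics):

```shell
# Generate a minimal docker-compose.yml describing the three backing
# services; `docker-compose up -d` would start them on a Docker host.
cat > docker-compose.yml <<'EOF'
version: '2'
services:
  zookeeper:
    image: zookeeper:3.4
    ports: ["2181:2181"]
  kafka:
    image: wurstmeister/kafka:0.10.2.0
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_HOST_NAME: localhost
    ports: ["9092:9092"]
  elasticsearch:
    image: elasticsearch:2.3.3
    ports: ["9200:9200"]
EOF
grep -c 'image:' docker-compose.yml   # prints 3 (one image per service)
```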


Re: [DISCUSS] e2e test infrastructure

2017-11-29 Thread Otto Fowler
ests from the unit tests
> entirely. It would likely improve the build times if we were reusing the
> components between test classes (keep in mind right now, we only reuse
> between test cases in a given class).
>
> In my mind, ideally we have a single infra for integration and e2e tests.
> I'd like to be able to run them from IntelliJ and debug them directly (or
> at least be able to easily, and in a well documented manner, be able to do
> remote debugging of them). Obviously, that's easier said than done, but
> what I'd like to avoid is us having essentially two different ways to do
> the same thing (spin up some of our dependency components and run code
> against them). I'm worried that's quick vs full dev all over again. But
> without us being able to easily kill one because half of tests depend on
> one and half on the other.
>
> On Wed, Nov 29, 2017 at 1:22 AM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > What about just spinning up each of the components in their own
> > process? It's even lighter weight, doesn't have the complications for
> > HDFS (you can use the local FS easily, for example), and doesn't have
> > any issues around ports and port mapping with the containers.
> >
> > On Tue, Nov 28, 2017 at 3:48 PM, Otto Fowler 
> > wrote:
> >
> > > As long as there is not a large chunk of custom deployment that has
> > > to be maintained, Docker sounds ideal.
> > > I would like to understand what it would take to create the docker
> > > e2e env.
> > >
> > >
> > >
> > > On November 28, 2017 at 17:27:13, Ryan Merriman (merrim...@gmail.com)
> > > wrote:
> > >
> > > Currently the e2e tests for our Alerts UI depend on full dev being up
> > > and running. This is not a good long-term solution because it forces a
> > > contributor/reviewer to run the tests manually with full dev running.
> > > It would be better if the backend services could be made available to
> > > the e2e tests while running in Travis. This would allow us to add the
> > > e2e tests to our automated build process.
> > >
> > > What is the right approach? Here are some options I can think of:
> > >
> > > - Use the in-memory components we use for the backend integration tests
> > > - Use a Docker approach
> > > - Use mock components designed for the e2e tests
> > >
> > > Mocking the backend would be my least favorite option because it would
> > > introduce a complex module of code that we have to maintain.
> > >
> > > The in-memory approach has some shortcomings but we may be able to
> > > solve some of those by moving components to their own process and
> > > spinning them up/down at the beginning/end of tests. Plus we are
> > > already using them.
> > >
> > > My preference would be Docker because it most closely mimics a real
> > > installation and gives you isolation, networking and dependency
> > > management features OOTB. In many cases Dockerfiles are maintained and
> > > published by a third party and require no work other than some setup
> > > like loading data or templates/schemas. Elasticsearch is a good
> > > example.
> > >
> > > I believe we could make any of these approaches work in Travis. What
> > > does everyone think?
> > >
> > > Ryan
> > >
> >
>


Re: [DISCUSS] e2e test infrastructure

2017-11-29 Thread Otto Fowler
So we will just have a:

ZK container
Kafka Container
HDFS Container

and not deploy any metron stuff to them in the docker setup; the test
itself will deploy what it needs and clean up?


On November 29, 2017 at 11:53:46, Ryan Merriman (merrim...@gmail.com) wrote:

“I would feel better using docker if each docker container only had the
base services, and did not require a separate but parallel deployment path
to ambari”

This is exactly how it works. There is a container for each base service, just
like we now have an in-memory component for each base service. There is
also no deployment path to Ambari. Ambari is not involved at all.

From a client perspective (our e2e/integration tests in this case) there
really is not much of a difference. At the end of the day services are up
and running and available on various ports.

Also there is going to be maintenance required no matter what approach we
decide on. If we add another ES template that needs to be loaded by the
MPack, our e2e/integration test infrastructure will also have to load that
template. I have had to do this with our current integration tests.

> On Nov 29, 2017, at 9:38 AM, Otto Fowler  wrote:
>
> So the issue with metron-docker is that it is all custom setup for metron
> components, and understanding how to maintain it when you make changes to
> the system is difficult for the developers.
> This is a particular issue for me, because I would have to re-write a big
> chunk of it to accommodate 777.
>
> I would feel better using docker if each docker container only had the
> base services, and did not require a separate but parallel deployment path
> to ambari. That is to say, if the docker components
> were functionally equivalent and limited to the in-memory components'
> functionality and usage. I apologize if that is in fact what you are
> getting at.
>
> Then we could move the integrations and e2e to them.
>
>
>
>> On November 29, 2017 at 10:00:20, Ryan Merriman (merrim...@gmail.com)
wrote:
>>
>> Thanks for the feedback so far everyone. All good points.
>>
>> Otto, if we did decide to go down the Docker route, we could
>> use /master/metron-contrib/metron-docker as a starting point. The reason I
>> initially created that module was to support Management UI testing because
>> full dev was unusable for that purpose at that time. This is the same use
>> case. A lot of the work has already been done but we would need to review
>> it and bring it up to date with the current state of master. Once we get
>> it to a point where we can manually spin up the Docker environment and get
>> the e2e tests to pass, we would then need to add it into our Travis
>> workflow.
>>
>> Mike, yes this is one of the options I listed at the start of the discuss
>> thread, although I'm not sure I agree with the Docker disadvantages you
>> list. We could use a similar approach for HDFS in Docker by setting it to
>> local FS and creating a shared volume that all the containers have access
>> to. I've also found that Docker Compose makes the networking part much
>> easier. What other advantages would in-memory components in separate
>> processes offer us that you can think of? Are there other disadvantages
>> with using Docker?
>>
>> Justin, I think that's a really good point and I would be on board with
>> it. I see this use case (e2e testing infrastructure) as a good way to
>> evaluate our options without making major changes across our codebase. I
>> would agree that standardizing on an approach would be ideal and something
>> we should work towards. The debugging request is also something that would
>> be extremely helpful. The only issue I see is debugging a Storm topology;
>> this would still need to be run locally using LocalCluster because remote
>> debugging does not work well in Storm (per previous comments from Storm
>> committers). At one point I was able to get this to work with Docker
>> containers but we would definitely need to revisit it and create tooling
>> around it.
>>
>> So in summary, I think we agree on these points so far:
>>
>> - no one seems to be in favor of mocking our backend so I'll take that
>> option off the table
>> - everyone seems to be in favor of moving to a strategy where we spin up
>> backend services at the beginning of all tests and spin down at the end,
>> rather than spinning up/down for each class or suite of tests
>> - the ability to debug our code locally is important and something to
>> keep in mind as we evaluate our options
>>
>> I think the next step is to decide whether we pursue in-memory/separate
>> process vs Docker. Having used both, there are a couple disadvantages I
>> see with the in-memory a

Re: DISCUSS: Quick change to parser config

2017-11-30 Thread Otto Fowler
I would suggest that instead of explicitly having “complete”, we have
"operation": "complete"

Such that we can have multiple transformations, each with a different
“operation”.
No operation would be the status quo ante, if we can do it so that we don’t
get errors with old configs and keep the same behavior.

{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "operation": "complete",
      "output": ["ip_src_addr", "ip_dst_addr"],
      "config": {
        "ip_src_addr": "ipSrc",
        "ip_dest_addr": "ipDst"
      }
    },
    {
      "transformation": "STELLAR",
      "operation": "SomeOtherThing",
      "output": ["foo", "bar"],
      "config": {
        "foo": "TO_UPPER(foo)",
        "bar": "TO_LOWER(bar)"
      }
    }
  ]
}


Sorry for the junk examples, but hopefully it makes sense.




On November 30, 2017 at 20:00:06, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

I’m looking at the way parser config works, and the transformation of fields
from their native names in, for example, the ASA or CEF parsers, into a
standard data model.

At the moment I would do something like this:

assuming I have fields [ipSrc, ipDst, pointlessExtraStuff, message] I might
have:

{
"fieldTransformations": [
{
"transformation": "STELLAR",
"output": ["ip_src_addr", "ip_dst_addr", "message"],
"config": {
"ip_src_addr": "ipSrc",
"ip_dest_addr": "ipDst"
}
}
]
}

which leaves me with the field set:
[ipSrc, ipDst, pointlessExtraStuff, message, ip_src_addr, ip_dest_addr]

unless I go with:-

{
"fieldTransformations": [
{
"transformation": "STELLAR",
"output": ["ip_src_addr", "ip_dst_addr", "message"],
"config": {
"ip_src_addr": "ipSrc",
"ip_dest_addr": "ipDst",
"pointlessExtraStuff": null,
"ipSrc": null,
"ipDst": null
}
}
]
}

which seems a little over verbose.

Do you think it would be valuable to add a switch of some sort on the
transformation to make it “complete”, i.e. to only preserve fields which
are explicitly set.

To my mind, this breaks a principle of immutability, but gives us a much
cleaner mapping of the data.

I would propose something like:

{
"fieldTransformations": [
{
"transformation": "STELLAR",
"complete": true,
"output": ["ip_src_addr", "ip_dst_addr", "message"],
"config": {
"ip_src_addr": "ipSrc",
"ip_dest_addr": "ipDst"
}
}
]
}

which would give me the set ["ip_src_addr", "ip_dst_addr", "message"]
effectively making the nulling in my previous example implicit.

Thoughts?

Also, in the second scenario, if 'output' were to be empty would we assume
that the output field set should be ["ip_src_addr", "ip_dst_addr"]?

Simon
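Simon's proposed "complete" flag amounts to a whitelist projection over the message fields. A toy sketch of the intended semantics (illustration only, not Metron code):

```shell
# Keep only the fields named in "output"; everything else is dropped,
# making the explicit nulling unnecessary.
python3 - <<'EOF'
message = {"ipSrc": "10.0.0.1", "ipDst": "10.0.0.2",
           "pointlessExtraStuff": "x", "message": "hello",
           "ip_src_addr": "10.0.0.1", "ip_dest_addr": "10.0.0.2"}
output = ["ip_src_addr", "ip_dest_addr", "message"]
projected = {k: v for k, v in message.items() if k in output}
print(sorted(projected))  # ['ip_dest_addr', 'ip_src_addr', 'message']
EOF
```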


Re: DISCUSS: Quick change to parser config

2017-11-30 Thread Otto Fowler
Or, we can create new transformation types
STELLAR_COMPLETE, which may be more in line with the original design.



On November 30, 2017 at 20:14:46, Otto Fowler (ottobackwa...@gmail.com)
wrote:

I would suggest that instead of explicitly having “complete”, we have
"operation": "complete"

Such that we can have multiple transformations, each with a different
“operation”.
No operation would be the status quo ante, if we can do it so that we don’t
get errors with old configs and keep the same behavior.

{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "operation": "complete",
      "output": ["ip_src_addr", "ip_dst_addr"],
      "config": {
        "ip_src_addr": "ipSrc",
        "ip_dest_addr": "ipDst"
      }
    },
    {
      "transformation": "STELLAR",
      "operation": "SomeOtherThing",
      "output": ["foo", "bar"],
      "config": {
        "foo": "TO_UPPER(foo)",
        "bar": "TO_LOWER(bar)"
      }
    }
  ]
}


Sorry for the junk examples, but hopefully it makes sense.




On November 30, 2017 at 20:00:06, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

I’m looking at the way parser config works, and the transformation of fields
from their native names in, for example, the ASA or CEF parsers, into a
standard data model.

At the moment I would do something like this:

assuming I have fields [ipSrc, ipDst, pointlessExtraStuff, message] I might
have:

{
"fieldTransformations": [
{
"transformation": "STELLAR",
"output": ["ip_src_addr", "ip_dst_addr", "message"],
"config": {
"ip_src_addr": "ipSrc",
"ip_dest_addr": "ipDst"
}
}
]
}

which leaves me with the field set:
[ipSrc, ipDst, pointlessExtraStuff, message, ip_src_addr, ip_dest_addr]

unless I go with:-

{
"fieldTransformations": [
{
"transformation": "STELLAR",
"output": ["ip_src_addr", "ip_dst_addr", "message"],
"config": {
"ip_src_addr": "ipSrc",
"ip_dest_addr": "ipDst",
"pointlessExtraStuff": null,
"ipSrc": null,
"ipDst": null
}
}
]
}

which seems a little over verbose.

Do you think it would be valuable to add a switch of some sort on the
transformation to make it “complete”, i.e. to only preserve fields which
are explicitly set.

To my mind, this breaks a principle of immutability, but gives us a much
cleaner mapping of the data.

I would propose something like:

{
"fieldTransformations": [
{
"transformation": "STELLAR",
"complete": true,
"output": ["ip_src_addr", "ip_dst_addr", "message"],
"config": {
"ip_src_addr": "ipSrc",
"ip_dest_addr": "ipDst"
}
}
]
}

which would give me the set ["ip_src_addr", "ip_dst_addr", "message"]
effectively making the nulling in my previous example implicit.

Thoughts?

Also, in the second scenario, if 'output' were to be empty would we assume
that the output field set should be ["ip_src_addr", "ip_dst_addr"]?

Simon


Re: DISCUSS: Quick change to parser config

2017-12-04 Thread Otto Fowler
Would https://github.com/apache/metron/pull/687 play some role in this?
Or could it be made to?


On December 4, 2017 at 12:21:40, Casey Stella (ceste...@gmail.com) wrote:

So, just chiming in here.  It seems to me that we have a problem with
extraneous fields in a couple of different ways:

* Temporary Variables

I think that the problem of temporary variables is one beyond just the
parser.  What I'd like to see is the Stellar field transformations operate
similar to the enrichment field transformations in that they are no longer
a map (this is useful beyond this case for having multiple assignments for
a variable) and having a special assignment indicator which would indicate
a temporary variable (e.g. ^= instead of :=).  This would clean up some of
the use cases in enrichments as well.  Combine this with the assumption that
all non-temporary fields are included in output for the field
transformation if it is not specified and I think we have something that is
sensible and somewhat backwards compatible.  To wit:
{
  "fieldTransformations": [
{
  "transformation": "STELLAR",
  "config": [
    "ipSrc ^= TRIM(raw_ip_src)",
    "ip_src_addr := ipSrc"
  ]
}
  ]
}

* Extraneous Fields from the Parser

For these, we do currently have a REMOVE field transformation, but I'd be
ok with a PROJECT or COMPLETE field transformation to provide a whitelist.
That might look like:
{
  "fieldTransformations": [
{
  "transformation": "STELLAR",
  "config": [
    "ipSrc ^= TRIM(raw_ip_src)",
    "ip_src_addr := ipSrc"
  ]
},
 {
  "transformation": "COMPLETE",
  "output" : [ "ip_src_addr", "ip_dst_addr", "message"]
}
  ]
}

I think having these two treated separately makes sense because sometimes
you will want COMPLETE and sometimes not.  Also, this fits within the core
abstraction that we already have.

On Thu, Nov 30, 2017 at 8:21 PM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Hmmm… Actually, I kinda like that.
>
> May want a little refactoring in the back for clarity.
>
> My question about whether we could ever imagine this ‘cleanup policy’
> applying to other transforms would sway me to the field rather than
> transformation name approach though.
>
> Simon
>
> > On 1 Dec 2017, at 01:17, Otto Fowler  wrote:
> >
> > Or, we can create new transformation types
> > STELLAR_COMPLETE, which may be more in line with the original design.
> >
> >
> >
> > On November 30, 2017 at 20:14:46, Otto Fowler (ottobackwa...@gmail.com)
> > wrote:
> >
> >> I would suggest that instead of explicitly having “complete”, we have
> “operation”:”complete”
> >>
> >> Such that we can have multiple transformations, each with a different
> “operation”.
> >> No operation would be the status quo ante, if we can do it so that we
> don’t get errors with old configs and keep the same behavior.
> >>
> >> {
> >> "fieldTransformations": [
> >> {
> >> "transformation": "STELLAR",
> >> “operation": “complete",
> >> "output": ["ip_src_addr", "ip_dst_addr"],
> >> "config": {
> >> "ip_src_addr": "ipSrc",
> >> "ip_dest_addr": "ipDst"
> >> } ,
> >> {
> >> "transformation": "STELLAR",
> >> “operation": “SomeOtherThing",
> >> "output": [“foo", “bar"],
> >> "config": {
> >> “foo": “TO_UPPER(foo)",
> >> “bar": “TO_LOWER(bar)"
> >> }
> >> }
> >> ]
> >> }
> >>
> >>
> >> Sorry for the junk examples, but hopefully it makes sense.
> >>
> >>
> >>
> >>
> >>
> >> On November 30, 2017 at 20:00:06, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
> >>
> >>> I’m looking at the way parser config works, and transformation of
> field from their native names in, for example the ASA or CEF parsers, into
> a standard data model.
> >>>
> >>> At the moment I would do something like this:
> >>>
> >>> assuming I have fields [ipSrc, ipDst, pointlessExtraStuff, message] I
> might have:
> >>>
> >>> {
> >>> "fieldTransformations": [
> >>> {
> >>> "transformation": "STELLAR",

Re: DISCUSS: Quick change to parser config

2017-12-04 Thread Otto Fowler
I’m not sure about consensus. I would like to see it summarized.

My point about assignment has to do with how many assignment-like operators
we are going to support.  Whether the assignment is to a temporary variable
or not doesn’t need to be part of the grammar/language; since all variable
management is external in Stellar, that may not be necessary.



On December 4, 2017 at 13:14:23, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Personally I suspect that temporary variables are a different thing, as is
the assignment PR. They might be useful for intermediate steps in a parser,
but then we’re potentially getting more complex than a parser wants to be. I
am warming to the idea of temporary variables though.

In terms of the removal, I like the idea of the COMPLETE transformation to
express a projection. That makes the output interface of the metron object
more explicit in a parser, which makes governance much easier.

Do we think this is a good consensus? Shall I ticket it (I might even code
it!) in the transformation form proposed?

Simon

On 4 Dec 2017, at 17:21, Casey Stella  wrote:

So, just chiming in here.  It seems to me that we have a problem with
extraneous fields in a couple of different ways:

* Temporary Variables

I think that the problem of temporary variables is one beyond just the
parser.  What I'd like to see is the Stellar field transformations operate
similar to the enrichment field transformations in that they are no longer
a map (this is useful beyond this case for having multiple assignments for
a variable) and having a special assignment indicator which would indicate
a temporary variable (e.g. ^= instead of :=).  This would clean up some of
the use cases in enrichments as well.  Combine this with the assumption that
all non-temporary fields are included in output for the field
transformation if it is not specified and I think we have something that is
sensible and somewhat backwards compatible.  To wit:
{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "config": [
        "ipSrc ^= TRIM(raw_ip_src)",
        "ip_src_addr := ipSrc"
      ]
    }
  ]
}

* Extraneous Fields from the Parser

For these, we do currently have a REMOVE field transformation, but I'd be
ok with a PROJECT or COMPLETE field transformation to provide a whitelist.
That might look like:
{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "config": [
        "ipSrc ^= TRIM(raw_ip_src)",
        "ip_src_addr := ipSrc"
      ]
    },
    {
      "transformation": "COMPLETE",
      "output": [ "ip_src_addr", "ip_dst_addr", "message" ]
    }
  ]
}

I think having these two treated separately makes sense because sometimes
you will want COMPLETE and sometimes not.  Also, this fits within the core
abstraction that we already have.

On Thu, Nov 30, 2017 at 8:21 PM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

Hmmm… Actually, I kinda like that.

May want a little refactoring in the back for clarity.

My question about whether we could ever imagine this ‘cleanup policy’
applying to other transforms would sway me to the field rather than
transformation name approach though.

Simon

On 1 Dec 2017, at 01:17, Otto Fowler  wrote:

Or, we can create new transformation types
STELLAR_COMPLETE, which may be more in line with the original design.



On November 30, 2017 at 20:14:46, Otto Fowler (ottobackwa...@gmail.com)
wrote:


I would suggest that instead of explicitly having “complete”, we have

“operation”:”complete”


Such that we can have multiple transformations, each with a different

“operation”.

No operation would be the status quo ante, if we can do it so that we

don’t get errors with old configs and keep the same behavior.


{
"fieldTransformations": [
{
"transformation": "STELLAR",
“operation": “complete",
"output": ["ip_src_addr", "ip_dst_addr"],
"config": {
"ip_src_addr": "ipSrc",
"ip_dest_addr": "ipDst"
} ,
{
"transformation": "STELLAR",
“operation": “SomeOtherThing",
"output": [“foo", “bar"],
"config": {
“foo": “TO_UPPER(foo)",
“bar": “TO_LOWER(bar)"
}
}
]
}


Sorry for the junk examples, but hopefully it makes sense.





On November 30, 2017 at 20:00:06, Simon Elliston Ball (
si...@simonellistonball.com) wrote:


I’m looking at the way parser config works, and transformation of

field from their native names in, for example the ASA or CEF parsers, into
a standard data model.


At the moment I would do something like this:

assuming I have fields [ipSrc, ipDst, pointlessExtra

Re: Heterogeneous indexing batch size for different Metron feeds

2017-12-04 Thread Otto Fowler
My first thought is what are the errors when you get a high error rate?


On December 4, 2017 at 19:34:29, Ali Nazemian (alinazem...@gmail.com) wrote:

Any thoughts?

On Sun, Dec 3, 2017 at 11:27 PM, Ali Nazemian 
wrote:

> Hi,
>
> We have noticed recently that no matter what batch size we use for Metron
> indexing feeds, as soon as we start using different batch sizes for
> different Metron feeds, indexing topology throughput will start dropping
> due to the high error rate! So I was wondering whether, based on the
> current indexing topology design, we have to choose the same batch size
> for all the feeds or not. Otherwise, throughput will drop. Since it is
> acceptable to use different batch sizes for different feeds, I assume it
> is not expected by design.
>
> Moreover, I have noticed in practice that even if we change the batch
> size, it will not affect the messages that are already in enrichments or
> indexing topics, and it will only affect the new messages that are coming
> to the parser. Therefore, we need to let all the messages pass the
> indexing topology so that we can change the batch size!
>
> It would be great if we can have more details regarding the design of
this
> section so we can understand our observations are based on the design or
> some kind of bug.
>
> Regards,
> Ali
>



-- 
A.Nazemian
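For context, per-feed batch sizes live in each sensor's indexing configuration, with one batchSize per writer. A sketch of such a config with deliberately different batch sizes (the sensor name and values are illustrative; in a cluster this JSON is pushed to ZooKeeper rather than written locally):

```shell
# Write a per-sensor indexing config giving the Elasticsearch and HDFS
# writers different batch sizes, then check that the JSON parses.
cat > bro.json <<'EOF'
{
  "elasticsearch": { "index": "bro", "batchSize": 5,   "enabled": true },
  "hdfs":          { "index": "bro", "batchSize": 100, "enabled": true }
}
EOF
python3 -m json.tool bro.json > /dev/null && echo "valid JSON"
```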


Re: [MENTORS][DISCUSS] Release Procedure + 'Kafka Plugin for Bro'

2017-12-04 Thread Otto Fowler
cause of how bro-pkg works. If you'd like
to get an idea of how this would work in application for Bro users, you can
see my test instructions here (specifically step #3). If a 0.1 tag gets
pushed to apache/metron-bro-plugin-kafka, the command could be `bro-pkg
install metron-bro-plugin-kafka --version 0.1` or `bro-pkg install
apache/metron-bro-plugin-kafka --version 0.1` due to this (the --force is
just to remove user interaction, for an ansible spin-up).





1: To clone the Bro git repo, you must run `git clone --recursive
https://github.com/bro/bro` (note the --recursive). Not too big of a deal,
but requires that you remember it and existing instructions/blog posts may
give users inaccurate steps. Let's make this worse and try to checkout
their latest release, v2.5.2, and automatically update the submodules
appropriately via `git checkout v2.5.2 --recurse-submodules`. This fails
because aux/plugins (https://github.com/bro/plugins) was removed since
their latest release. Okay, we can work around this using `git checkout
v2.5.2` and then remember to `git submodule update` every time you checkout
a release or branch. But because they have nested submodules, we actually
need to run `git submodule update --recursive`. I can't imagine opting into
a workflow anything like this. There are other options as well, such as git
subtrees, but those I am less familiar with.



Jon



On Mon, Nov 27, 2017 at 8:59 PM Otto Fowler 
wrote:

I am not sure that our use of the plugin necessarily equates to it being
implicitly coupled to Metron. It seems like the Right Thing To Do,
especially for an Apache project, is to make this available for use by the
greater Bro community.
Unless we expect to do extensive iterative work on the plugin, which would
then make the decision to spin it out now premature.

Then again, I might be wrong ;)


On November 27, 2017 at 19:58:11, Matt Foley (ma...@apache.org) wrote:

[Please pardon me that the below is a little labored. I’m trying to
understand the implications for both release and use, which requires some
explanation as well as the two questions needed. Q1 and Q2 below are
probably the same question, asked in slightly different contexts. Please
consider them together.]

So this made me go back and look at the history that caused us to put the
bro plugin in a separate repo. As best I can see, this was in
https://issues.apache.org/jira/browse/METRON-813 , which cites an email
discussion thread. Also please see
https://issues.apache.org/jira/browse/METRON-883 for background on the
plugin itself.

As best I can assemble the many bits brought up in the threads, the reasons
to put it in a separate repo were:
- The plugin was thought to be useful to multiple clients of bro and kafka,
including Storm and Spark, as well as Metron.
- Originally the bro project was maintaining bro plugins and it was thought
they might adopt this one.
- Bro then formalized their plugin framework BUT dumped all plugins out of
their sphere of maintenance.
- As of 3/31/2017, Nick said that “the [bro] package mechanism requires
that a package live within its own repo”. Jon said “the bro packages model
doesn't allow colocation with anything else.”
- So on 3/31 Jon opened METRON-813, and the metron-bro-plugin-kafka repo
was created a few days later. But Metron wasn’t actually modified to remove
the metron-sensors/bro-plugin-kafka/ subdirectory and start using the
plugin from the metron-bro-plugin-kafka repo until Nov 12 – two weeks ago!
– with https://issues.apache.org/jira/browse/METRON-1309 .
- Presumably the need to have metron-bro-plugin-kafka in a separate repo
remains valid, if the bro plugin mechanism is used. But obviously there are
(non-conforming) ways to build the plugin as part of metron, and install it
in a way that works.

Q1. I think that last statement needs some explanation. Nick or Jon, can
you please expand on it, especially wrt how the end user installs the
plugin once the plugin is built the two different ways? And whether it’s
still valuable to have a separate repo for the plugin?

Nick suggests using a submodule approach to managing the bro plugin, for
Metron versioning purposes. As I understand it, this would continue the
existence of the metron-bro-plugin-kafka repo, but copy it into the metron
code tree for building, versioning, and release purposes. Git submodules
are documented here: https://git-scm.com/book/en/v2/Git-Tools-Submodules .
We would use the submodule capability to clone the metron-bro-plugin-kafka
source code into a subdirectory of Metron at the time one clones the metron
repo. It would then be released with Metron as part of the source code
release for a given version of Metron. Part of the way submodules are
managed is that git records the SHA1 of the pinned submodule commit as a
gitlink entry in the superproject's tree (the submodule's path and URL live
in a file named .gitmodules), and both get saved when you commit and push. So
indeed submodules would ensure that everyone cloning a given version of
metron would get the expected “version” (sha, ac

Re: Heterogeneous indexing batch size for different Metron feeds

2017-12-05 Thread Otto Fowler
Which of the indexing options are you changing the batch size for?  HDFS?
Elasticsearch?  Both?

Can you give an example?



On December 5, 2017 at 02:09:29, Ali Nazemian (alinazem...@gmail.com) wrote:

No specific error in the logs. I haven't enabled debug/trace, though.

On Tue, Dec 5, 2017 at 11:54 AM, Otto Fowler 
wrote:

> My first thought is what are the errors when you get a high error rate?
>
>
> On December 4, 2017 at 19:34:29, Ali Nazemian (alinazem...@gmail.com)
> wrote:
>
> Any thoughts?
>
> On Sun, Dec 3, 2017 at 11:27 PM, Ali Nazemian 
> wrote:
>
> > Hi,
> >
> > We have noticed recently that no matter what batch size we use for Metron
> > indexing feeds, as long as we start using different batch size for
> > different Metron feeds, indexing topology throughput will start dropping
> > due to the high error rate! So I was wondering whether based on the
> current
> > indexing topology design, we have to choose the same batch size for all
> the
> > feeds or not. Otherwise, throughput will be dropped. I assume since it is
> > acceptable to use different batch sizes for different feeds, it is not
> > expected by design.
> >
> > Moreover, I have noticed in practice that even if we change the batch
> > size, it will not affect the messages that are already in enrichments or
> > indexing topics, and it will only affect the new messages that are coming
> > to the parser. Therefore, we need to let all the messages pass the
> indexing
> > topology so that we can change the batch size!
> >
> > It would be great if we can have more details regarding the design of
> this
> > section so we can understand our observations are based on the design or
> > some kind of bug.
> >
> > Regards,
> > Ali
> >
>
>
>
> --
> A.Nazemian
>
>


--
A.Nazemian


Re: Heterogeneous indexing batch size for different Metron feeds

2017-12-05 Thread Otto Fowler
Where are you seeing the errors?  Screenshot?


On December 5, 2017 at 08:03:46, Otto Fowler (ottobackwa...@gmail.com)
wrote:

Which of the indexing options are you changing the batch size for?  HDFS?
Elasticsearch?  Both?

Can you give an example?



On December 5, 2017 at 02:09:29, Ali Nazemian (alinazem...@gmail.com) wrote:

No specific error in the logs. I haven't enabled debug/trace, though.

On Tue, Dec 5, 2017 at 11:54 AM, Otto Fowler 
wrote:

> My first thought is what are the errors when you get a high error rate?
>
>
> On December 4, 2017 at 19:34:29, Ali Nazemian (alinazem...@gmail.com)
> wrote:
>
> Any thoughts?
>
> On Sun, Dec 3, 2017 at 11:27 PM, Ali Nazemian 
> wrote:
>
> > Hi,
> >
> > We have noticed recently that no matter what batch size we use for Metron
> > indexing feeds, as long as we start using different batch size for
> > different Metron feeds, indexing topology throughput will start dropping
> > due to the high error rate! So I was wondering whether based on the
> current
> > indexing topology design, we have to choose the same batch size for all
> the
> > feeds or not. Otherwise, throughput will be dropped. I assume since it is
> > acceptable to use different batch sizes for different feeds, it is not
> > expected by design.
> >
> > Moreover, I have noticed in practice that even if we change the batch
> > size, it will not affect the messages that are already in enrichments or
> > indexing topics, and it will only affect the new messages that are coming
> > to the parser. Therefore, we need to let all the messages pass the
> indexing
> > topology so that we can change the batch size!
> >
> > It would be great if we can have more details regarding the design of
> this
> > section so we can understand our observations are based on the design or
> > some kind of bug.
> >
> > Regards,
> > Ali
> >
>
>
>
> --
> A.Nazemian
>
>


--
A.Nazemian


Re: Heterogeneous indexing batch size for different Metron feeds

2017-12-06 Thread Otto Fowler
What do you see in the storm ui for the indexing topology?


On December 6, 2017 at 07:10:17, Ali Nazemian (alinazem...@gmail.com) wrote:

Both HDFS and Elasticsearch batch sizes. There is no error in the logs. It
impacts the topology error rate and causes an almost 90% error rate on
indexing tuples.

On 6 Dec. 2017 00:20, "Otto Fowler"  wrote:

Where are you seeing the errors?  Screenshot?


On December 5, 2017 at 08:03:46, Otto Fowler (ottobackwa...@gmail.com)
wrote:

Which of the indexing options are you changing the batch size for?  HDFS?
Elasticsearch?  Both?

Can you give an example?



On December 5, 2017 at 02:09:29, Ali Nazemian (alinazem...@gmail.com) wrote:

No specific error in the logs. I haven't enabled debug/trace, though.

On Tue, Dec 5, 2017 at 11:54 AM, Otto Fowler 
wrote:

> My first thought is what are the errors when you get a high error rate?
>
>
> On December 4, 2017 at 19:34:29, Ali Nazemian (alinazem...@gmail.com)
> wrote:
>
> Any thoughts?
>
> On Sun, Dec 3, 2017 at 11:27 PM, Ali Nazemian 
> wrote:
>
> > Hi,
> >
> > We have noticed recently that no matter what batch size we use for Metron
> > indexing feeds, as long as we start using different batch size for
> > different Metron feeds, indexing topology throughput will start dropping
> > due to the high error rate! So I was wondering whether based on the
> current
> > indexing topology design, we have to choose the same batch size for all
> the
> > feeds or not. Otherwise, throughput will be dropped. I assume since it is
> > acceptable to use different batch sizes for different feeds, it is not
> > expected by design.
> >
> > Moreover, I have noticed in practice that even if we change the batch
> > size, it will not affect the messages that are already in enrichments or
> > indexing topics, and it will only affect the new messages that are coming
> > to the parser. Therefore, we need to let all the messages pass the
> indexing
> > topology so that we can change the batch size!
> >
> > It would be great if we can have more details regarding the design of
> this
> > section so we can understand our observations are based on the design or
> > some kind of bug.
> >
> > Regards,
> > Ali
> >
>
>
>
> --
> A.Nazemian
>
>


--
A.Nazemian


Re: Heterogeneous indexing batch size for different Metron feeds

2017-12-06 Thread Otto Fowler
I have looked at it.

We maintain batch lists for each sensor which gather messages to index.
When we get a message that puts it over the batch size the messages are
flushed and written to the target.
There is also a timeout component, where the batch would be flushed based
on timeout.

While batch-size checking occurs per sensor on message receipt, each
message, regardless of sensor, triggers a check of the batch timeout for
all the lists.

At least that is what I think I see.
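For illustration, that flow can be sketched in a few lines of Python (the class name and structure here are invented for the sketch; Metron's actual writer is Java and differs in detail):

```python
import time
from collections import defaultdict

class BatchingWriter:
    """Invented stand-in for a per-sensor batching writer (illustration only)."""

    def __init__(self, batch_sizes, timeout_secs, clock=time.monotonic):
        self.batch_sizes = batch_sizes   # per-sensor batch size config
        self.timeout_secs = timeout_secs
        self.clock = clock
        self.batches = defaultdict(list) # sensor -> messages awaiting a flush
        self.started = {}                # sensor -> time the batch was opened
        self.flushed = []                # log of (sensor, messages) flushes

    def _flush(self, sensor):
        if self.batches.get(sensor):
            self.flushed.append((sensor, self.batches.pop(sensor)))
            self.started.pop(sensor, None)

    def write(self, sensor, message):
        batch = self.batches[sensor]
        if not batch:
            self.started[sensor] = self.clock()
        batch.append(message)
        # size-based flush applies only to this sensor's list
        if len(batch) >= self.batch_sizes.get(sensor, 1):
            self._flush(sensor)
        # the timeout check sweeps every sensor's list
        now = self.clock()
        for s in list(self.batches):
            if now - self.started.get(s, now) >= self.timeout_secs:
                self._flush(s)

writer = BatchingWriter({"bro": 3, "snort": 100}, timeout_secs=5.0)
for i in range(3):
    writer.write("bro", i)
print(writer.flushed)  # the "bro" batch flushes when it reaches 3 messages
```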

Without understanding what the failures are for it is hard to see what the
issue is.

Do we have timing issues where all the lists are timing out all the time
causing some kind of cascading failure for example?
Does the number of sensors matter?  For example if only one sensor topology
is running with batch setup X, is everything fine?  Do failures start after
adding Nth additional sensor?

Hopefully someone else on the list may have an idea.
That code does not have any logging to speak of, at least no debug/trace
logging that would help here.



On December 6, 2017 at 08:18:01, Ali Nazemian (alinazem...@gmail.com) wrote:

Everything looks normal except the high number of failed tuples. Do you
know how the indexing batch size works? Based on our observations it seems
it doesn't update the messages that are in enrichments and indexing topics.

On Thu, Dec 7, 2017 at 12:13 AM, Otto Fowler 
wrote:

> What do you see in the storm ui for the indexing topology?
>
>
> On December 6, 2017 at 07:10:17, Ali Nazemian (alinazem...@gmail.com)
> wrote:
>
> Both hdfs and Elasticsearch batch sizes. There is no error in the logs. It
> impacts the topology error rate and causes an almost 90% error rate on indexing
> tuples.
>
> On 6 Dec. 2017 00:20, "Otto Fowler"  wrote:
>
> Where are you seeing the errors?  Screenshot?
>
>
> On December 5, 2017 at 08:03:46, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
>
> Which of the indexing options are you changing the batch size for?  HDFS?
> Elasticsearch?  Both?
>
> Can you give an example?
>
>
>
> On December 5, 2017 at 02:09:29, Ali Nazemian (alinazem...@gmail.com)
> wrote:
>
> No specific error in the logs. I haven't enabled debug/trace, though.
>
> On Tue, Dec 5, 2017 at 11:54 AM, Otto Fowler 
> wrote:
>
>> My first thought is what are the errors when you get a high error rate?
>>
>>
>> On December 4, 2017 at 19:34:29, Ali Nazemian (alinazem...@gmail.com)
>> wrote:
>>
>> Any thoughts?
>>
>> On Sun, Dec 3, 2017 at 11:27 PM, Ali Nazemian 
>> wrote:
>>
>> > Hi,
>> >
>> > We have noticed recently that no matter what batch size we use for
>> Metron
>> > indexing feeds, as long as we start using different batch size for
>> > different Metron feeds, indexing topology throughput will start dropping
>> > due to the high error rate! So I was wondering whether based on the
>> current
>> > indexing topology design, we have to choose the same batch size for all
>> the
>> > feeds or not. Otherwise, throughput will be dropped. I assume since it
>> is
>> > acceptable to use different batch sizes for different feeds, it is not
>> > expected by design.
>> >
>> > Moreover, I have noticed in practice that even if we change the batch
>> > size, it will not affect the messages that are already in enrichments or
>> > indexing topics, and it will only affect the new messages that are
>> coming
>> > to the parser. Therefore, we need to let all the messages pass the
>> indexing
>> > topology so that we can change the batch size!
>> >
>> > It would be great if we can have more details regarding the design of
>> this
>> > section so we can understand our observations are based on the design or
>> > some kind of bug.
>> >
>> > Regards,
>> > Ali
>> >
>>
>>
>>
>> --
>> A.Nazemian
>>
>>
>
>
> --
> A.Nazemian
>
>
>


--
A.Nazemian


Re: Heterogeneous indexing batch size for different Metron feeds

2017-12-06 Thread Otto Fowler
Sorry,
We flush for timeouts on every storm ‘tick’ message, not on every message.
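Sketched the same way (illustrative Python with invented names; the real bolt is Java), the corrected behavior is that timeout flushes happen only when a Storm tick tuple arrives:

```python
import time

class TickFlushingBolt:
    """Invented sketch: size flushes happen per message; timeout
    flushes happen only when Storm delivers a periodic tick tuple."""

    def __init__(self, batch_size, timeout_secs, clock=time.monotonic):
        self.batch_size = batch_size
        self.timeout_secs = timeout_secs
        self.clock = clock
        self.batch = []
        self.started = None
        self.flushes = []

    def _flush(self):
        if self.batch:
            self.flushes.append(self.batch)
            self.batch = []
            self.started = None

    def execute(self, tuple_):
        if tuple_ == "TICK":                    # stand-in for a Storm tick tuple
            if self.started is not None and \
               self.clock() - self.started >= self.timeout_secs:
                self._flush()
            return
        if not self.batch:
            self.started = self.clock()
        self.batch.append(tuple_)
        if len(self.batch) >= self.batch_size:  # size-based flush
            self._flush()

bolt = TickFlushingBolt(batch_size=10, timeout_secs=0.0)
bolt.execute("msg-1")
bolt.execute("TICK")   # timeout already elapsed, so the tick flushes
print(bolt.flushes)
```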



On December 6, 2017 at 08:29:51, Otto Fowler (ottobackwa...@gmail.com)
wrote:

I have looked at it.

We maintain batch lists for each sensor which gather messages to index.
When we get a message that puts it over the batch size the messages are
flushed and written to the target.
There is also a timeout component, where the batch would be flushed based
on timeout.

While batch size checking occurs on a per sensor-message receipt basis,
each message, regardless of sensor will trigger a check of the batch
timeout for all the lists.

At least that is what I think I see.

Without understanding what the failures are for it is hard to see what the
issue is.

Do we have timing issues where all the lists are timing out all the time
causing some kind of cascading failure for example?
Does the number of sensors matter?  For example if only one sensor topology
is running with batch setup X, is everything fine?  Do failures start after
adding Nth additional sensor?

Hopefully someone else on the list may have an idea.
That code does not have any logging to speak of… well debug / trace logging
that would help here either.



On December 6, 2017 at 08:18:01, Ali Nazemian (alinazem...@gmail.com) wrote:

Everything looks normal except the high number of failed tuples. Do you
know how the indexing batch size works? Based on our observations it seems
it doesn't update the messages that are in enrichments and indexing topics.

On Thu, Dec 7, 2017 at 12:13 AM, Otto Fowler 
wrote:

> What do you see in the storm ui for the indexing topology?
>
>
> On December 6, 2017 at 07:10:17, Ali Nazemian (alinazem...@gmail.com)
> wrote:
>
> Both hdfs and Elasticsearch batch sizes. There is no error in the logs. It
> impacts the topology error rate and causes an almost 90% error rate on indexing
> tuples.
>
> On 6 Dec. 2017 00:20, "Otto Fowler"  wrote:
>
> Where are you seeing the errors?  Screenshot?
>
>
> On December 5, 2017 at 08:03:46, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
>
> Which of the indexing options are you changing the batch size for?  HDFS?
> Elasticsearch?  Both?
>
> Can you give an example?
>
>
>
> On December 5, 2017 at 02:09:29, Ali Nazemian (alinazem...@gmail.com)
> wrote:
>
> No specific error in the logs. I haven't enabled debug/trace, though.
>
> On Tue, Dec 5, 2017 at 11:54 AM, Otto Fowler 
> wrote:
>
>> My first thought is what are the errors when you get a high error rate?
>>
>>
>> On December 4, 2017 at 19:34:29, Ali Nazemian (alinazem...@gmail.com)
>> wrote:
>>
>> Any thoughts?
>>
>> On Sun, Dec 3, 2017 at 11:27 PM, Ali Nazemian 
>> wrote:
>>
>> > Hi,
>> >
>> > We have noticed recently that no matter what batch size we use for
>> Metron
>> > indexing feeds, as long as we start using different batch size for
>> > different Metron feeds, indexing topology throughput will start dropping
>> > due to the high error rate! So I was wondering whether based on the
>> current
>> > indexing topology design, we have to choose the same batch size for all
>> the
>> > feeds or not. Otherwise, throughput will be dropped. I assume since it
>> is
>> > acceptable to use different batch sizes for different feeds, it is not
>> > expected by design.
>> >
>> > Moreover, I have noticed in practice that even if we change the batch
>> > size, it will not affect the messages that are already in enrichments or
>> > indexing topics, and it will only affect the new messages that are
>> coming
>> > to the parser. Therefore, we need to let all the messages pass the
>> indexing
>> > topology so that we can change the batch size!
>> >
>> > It would be great if we can have more details regarding the design of
>> this
>> > section so we can understand our observations are based on the design or
>> > some kind of bug.
>> >
>> > Regards,
>> > Ali
>> >
>>
>>
>>
>> --
>> A.Nazemian
>>
>>
>
>
> --
> A.Nazemian
>
>
>


--
A.Nazemian


Re: Heterogeneous indexing batch size for different Metron feeds

2017-12-07 Thread Otto Fowler
We use TreeCache
<https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/cache/TreeCache.html>
.

When the configuration is updated in zookeeper, the configuration object in
the bolt is updated. This configuration is read on each message, so I think
from what I see new configurations should get picked up for the next
message.

I could be wrong though.
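A rough Python analogy of that pattern (ConfigHolder is invented for the sketch; the real code is Java using Curator's TreeCache): a listener swaps the config reference, and each message reads whatever reference is current, so the next message processed after an update should see the new config.

```python
import threading

class ConfigHolder:
    """Invented analogy for the ZK-backed config: a cache listener
    swaps the reference; the bolt reads it fresh for every message."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._config = initial

    def update(self, new_config):       # called by the cache listener
        with self._lock:
            self._config = new_config

    def get(self):                      # called once per message
        with self._lock:
            return self._config

holder = ConfigHolder({"batchSize": 1})
seen = []

def process(message):
    cfg = holder.get()                  # config is read per message
    seen.append((message, cfg["batchSize"]))

process("m1")
holder.update({"batchSize": 100})       # "zookeeper" pushes a change
process("m2")
print(seen)  # m1 used the old batch size, m2 the new one
```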




On December 7, 2017 at 06:47:15, Ali Nazemian (alinazem...@gmail.com) wrote:

Thank you very much. Unfortunately, reproducing all the situations is very
costly for us at this moment. We are avoiding the issue by using the same
batch size for all the feeds. Hopefully, with the new PR Casey provided for
the segregation of ES and HDFS, it will be much clearer how to tune them.

Do you know how the synchronization of the indexing config with the
topology happens? Does the topology get synchronised by pulling the latest
configs from ZK via some background mechanism, or is it based on an update
trigger? As I mentioned, based on our observation it looks like the
synchronization doesn't take effect until all the old messages in the Kafka
queue have been processed under the old indexing configs.

Regards,
Ali

On Thu, Dec 7, 2017 at 12:33 AM, Otto Fowler 
wrote:

> Sorry,
> We flush for timeouts on every storm ‘tick’ message, not on every message.
>
>
>
> On December 6, 2017 at 08:29:51, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
>
> I have looked at it.
>
> We maintain batch lists for each sensor which gather messages to index.
> When we get a message that puts it over the batch size the messages are
> flushed and written to the target.
> There is also a timeout component, where the batch would be flushed based
> on timeout.
>
> While batch size checking occurs on a per sensor-message receipt basis,
> each message, regardless of sensor will trigger a check of the batch
> timeout for all the lists.
>
> At least that is what I think I see.
>
> Without understanding what the failures are for it is hard to see what the
> issue is.
>
> Do we have timing issues where all the lists are timing out all the time
> causing some kind of cascading failure for example?
> Does the number of sensors matter?  For example if only one sensor
> topology is running with batch setup X, is everything fine?  Do failures
> start after adding Nth additional sensor?
>
> Hopefully someone else on the list may have an idea.
> That code does not have any logging to speak of… well debug / trace
> logging that would help here either.
>
>
>
> On December 6, 2017 at 08:18:01, Ali Nazemian (alinazem...@gmail.com)
> wrote:
>
> Everything looks normal except the high number of failed tuples. Do you
> know how the indexing batch size works? Based on our observations it seems
> it doesn't update the messages that are in enrichments and indexing topics.
>
> On Thu, Dec 7, 2017 at 12:13 AM, Otto Fowler 
> wrote:
>
>> What do you see in the storm ui for the indexing topology?
>>
>>
>> On December 6, 2017 at 07:10:17, Ali Nazemian (alinazem...@gmail.com)
>> wrote:
>>
>> Both hdfs and Elasticsearch batch sizes. There is no error in the logs.
>> It impacts the topology error rate and causes an almost 90% error rate on indexing
>> tuples.
>>
>> On 6 Dec. 2017 00:20, "Otto Fowler"  wrote:
>>
>> Where are you seeing the errors?  Screenshot?
>>
>>
>> On December 5, 2017 at 08:03:46, Otto Fowler (ottobackwa...@gmail.com)
>> wrote:
>>
>> Which of the indexing options are you changing the batch size for?
>> HDFS?  Elasticsearch?  Both?
>>
>> Can you give an example?
>>
>>
>>
>> On December 5, 2017 at 02:09:29, Ali Nazemian (alinazem...@gmail.com)
>> wrote:
>>
>> No specific error in the logs. I haven't enabled debug/trace, though.
>>
>> On Tue, Dec 5, 2017 at 11:54 AM, Otto Fowler 
>> wrote:
>>
>>> My first thought is what are the errors when you get a high error rate?
>>>
>>>
>>> On December 4, 2017 at 19:34:29, Ali Nazemian (alinazem...@gmail.com)
>>> wrote:
>>>
>>> Any thoughts?
>>>
>>> On Sun, Dec 3, 2017 at 11:27 PM, Ali Nazemian 
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > We have noticed recently that no matter what batch size we use for
>>> Metron
>>> > indexing feeds, as long as we start using different batch size for
>>> > different Metron feeds, indexing topology throughput will start
>>> dropping
>>> > due to the high error rate! So I was wondering whether based on the
>>> current
>>> > indexing topology design, we have to

Re: Heterogeneous indexing batch size for different Metron feeds

2017-12-07 Thread Otto Fowler
I’m trying to think of what information would be needed to look at this the
next time it happens, or if someone wanted to reproduce it.
Examples of a set of configurations that do not work maybe.
The metrics on the parser topologies involved ( how many parsers, message
rate ).
The platform_info.sh output for the machines running the indexing topology.
Any load information for those machines as well…..

Anyone think of anything else?



On December 7, 2017 at 07:45:26, Otto Fowler (ottobackwa...@gmail.com)
wrote:

We use TreeCache
<https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/cache/TreeCache.html>
.

When the configuration is updated in zookeeper, the configuration object in
the bolt is updated. This configuration is read on each message, so I think
from what I see new configurations should get picked up for the next
message.

I could be wrong though.



On December 7, 2017 at 06:47:15, Ali Nazemian (alinazem...@gmail.com) wrote:

Thank you very much. Unfortunately, reproducing all the situations are very
costly for us at this moment. We are kind of avoiding to hit that issue by
using the same batch size for all the feeds. Hopefully, with the new PR
Casey provided for the segregation of ES and HDFS, it will be very much
clear to tune them.

Do you know how the synchronization of indexing config will happen with the
topology? Does the topology gets synchronised by pulling the last configs
from ZK based on some background mechanism or it is based on an update
trigger? As I mentioned, based on our observation it looks like the
synchronization doesn't work until all the old messages in Kafka queue get
processed based on the old indexing configs.

Regards,
Ali

On Thu, Dec 7, 2017 at 12:33 AM, Otto Fowler 
wrote:

> Sorry,
> We flush for timeouts on every storm ‘tick’ message, not on every message.
>
>
>
> On December 6, 2017 at 08:29:51, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
>
> I have looked at it.
>
> We maintain batch lists for each sensor which gather messages to index.
> When we get a message that puts it over the batch size the messages are
> flushed and written to the target.
> There is also a timeout component, where the batch would be flushed based
> on timeout.
>
> While batch size checking occurs on a per sensor-message receipt basis,
> each message, regardless of sensor will trigger a check of the batch
> timeout for all the lists.
>
> At least that is what I think I see.
>
> Without understanding what the failures are for it is hard to see what the
> issue is.
>
> Do we have timing issues where all the lists are timing out all the time
> causing some kind of cascading failure for example?
> Does the number of sensors matter?  For example if only one sensor
> topology is running with batch setup X, is everything fine?  Do failures
> start after adding Nth additional sensor?
>
> Hopefully someone else on the list may have an idea.
> That code does not have any logging to speak of… well debug / trace
> logging that would help here either.
>
>
>
> On December 6, 2017 at 08:18:01, Ali Nazemian (alinazem...@gmail.com)
> wrote:
>
> Everything looks normal except the high number of failed tuples. Do you
> know how the indexing batch size works? Based on our observations it seems
> it doesn't update the messages that are in enrichments and indexing topics.
>
> On Thu, Dec 7, 2017 at 12:13 AM, Otto Fowler 
> wrote:
>
>> What do you see in the storm ui for the indexing topology?
>>
>>
>> On December 6, 2017 at 07:10:17, Ali Nazemian (alinazem...@gmail.com)
>> wrote:
>>
>> Both hdfs and Elasticsearch batch sizes. There is no error in the logs.
>> It impacts the topology error rate and causes an almost 90% error rate on indexing
>> tuples.
>>
>> On 6 Dec. 2017 00:20, "Otto Fowler"  wrote:
>>
>> Where are you seeing the errors?  Screenshot?
>>
>>
>> On December 5, 2017 at 08:03:46, Otto Fowler (ottobackwa...@gmail.com)
>> wrote:
>>
>> Which of the indexing options are you changing the batch size for?
>> HDFS?  Elasticsearch?  Both?
>>
>> Can you give an example?
>>
>>
>>
>> On December 5, 2017 at 02:09:29, Ali Nazemian (alinazem...@gmail.com)
>> wrote:
>>
>> No specific error in the logs. I haven't enabled debug/trace, though.
>>
>> On Tue, Dec 5, 2017 at 11:54 AM, Otto Fowler 
>> wrote:
>>
>>> My first thought is what are the errors when you get a high error rate?
>>>
>>>
>>> On December 4, 2017 at 19:34:29, Ali Nazemian (alinazem...@gmail.com)
>>> wrote:
>>>
>>> Any thoughts?
>>>
>>> On Sun, Dec 3, 2017 at 11:27 PM, Ali Nazemian 
>>&g

Re: New PMC members

2017-12-07 Thread Otto Fowler
Boy are those Impala guys in for a surprise


On December 7, 2017 at 10:06:19, Casey Stella (ceste...@gmail.com) wrote:

The Project Management Committee (PMC) for Apache Impala has invited Otto
Fowler, Michael Miklavcic and Justin Leet to become a PMC member and we
are pleased to announce that they have accepted.

Congratulations and welcome!


Re: [MENTORS][DISCUSS] Release Procedure + 'Kafka Plugin for Bro'

2017-12-07 Thread Otto Fowler
You and Matt should coordinate sending mail to the dev list with a
heads-up, one when starting, and one when done.

I think you mean that:

Between X and YYY, if you do a fetch apache && checkout -b foo
apache/master and then do vagrant up with the sensors enabled, it will
fail.  Right?



On December 7, 2017 at 15:09:52, zeo...@gmail.com (zeo...@gmail.com) wrote:

FYI to be uber clear about the effects of what I'm doing, spinning up
full-dev only when including the sensors will be broken on the bro plugin
install step between when I push the changes, and when mattf pushes the 0.1
tag to apache/metron-bro-plugin-kafka.

Jon

On Thu, Dec 7, 2017 at 3:05 PM zeo...@gmail.com  wrote:

> Sounds good. Yes Matt, I will handle my parts now. Thanks everyone
>
> Jon
>
> On Thu, Dec 7, 2017 at 2:32 PM Matt Foley  wrote:
>
>> I can start the release process tonight.
>>
>>
>>
>> Jon, you mentioned you want to commit
>>
>> > https://github.com/apache/metron/pull/847 and
>> > https://github.com/apache/metron-bro-plugin-kafka/pull/4
>>
>> before the release. Is it convenient for you to do so today?
>>
>>
>>
>> Thanks,
>>
>> --Matt
>>
>>
>>
>> From: Nick Allen 
>> Date: Thursday, December 7, 2017 at 10:13 AM
>> To: "dev@metron.apache.org" 
>> Cc: Matt Foley 
>> Subject: Re: [MENTORS][DISCUSS] Release Procedure + 'Kafka Plugin for
Bro'
>>
>>
>>
>> I am more interested in getting a release cut. If me moving to the (a)
>> camp gets us to consensus and cuts a release faster, then I'll do it.
>> Let's get this release train moving.
>>
>>
>>
>> On Thu, Dec 7, 2017 at 11:44 AM, Justin Leet 
>> wrote:
>>
>> Do we have any further discussion on this? Pardon me if I misstate
>> anyone's position, but it seems like we have a couple people (Otto and
Jon
>> and slightly Matt?) in favor of (a), Nick in favor of (b), and
presumably
>> a
>> section of people like myself without a particular horse in the race.
>>
>> It seems like we need to come to some sort of consensus so that we can
get
>> the release bus moving again, and right now it seems like (a) is
gathering
>> more explicit support. Do we have a compelling reason to not do (a)? To
>> be
>> honest, my main worry is more "If we do (a) are we going to be miserable
>> if
>> we need to iterate or adjust?" I'm not seeing anything that suggests
>> anything too terrible, so unless we see some more discussion, I suggest
we
>> move forward with (a).
>>
>>
>> On Mon, Dec 4, 2017 at 9:34 PM, zeo...@gmail.com 
>> wrote:
>>
>> > I would prefer a, but I was initially thinking of doing the plugin
first
>> > and then get in the two PRs out to use this new tag, which are already
>> +1'd
>> > and just waiting on this conversation. For reference,
>> > https://github.com/apache/metron/pull/847 and
>> > https://github.com/apache/metron-bro-plugin-kafka/pull/4
>> >
>> > Jon
>> >
>> > On Mon, Dec 4, 2017, 20:54 Otto Fowler 
wrote:
>> >
>> > > It seems to me, as I believe I have stated before that a) feels like
>> the
>> > > proper way to handle this. It is how I have seen other projects like
>> > NiFi
>> > > handle things as well.
>> > >
>> > >
>> > >
>> > > On December 4, 2017 at 17:14:41, Matt Foley (ma...@apache.org)
wrote:
>> > >
>> > > Okay, looking at this from the perspective of making a release:
>> > >
>> > >
>> > >
>> > > We have two choices:
>> > >
>> > > a) I can simply make a 0.1 (or 1.0 or 0.4.2) release of
>> > > metron-bro-plugin-kafka, at the same time and using the same process
>> > > (modulo the necessary) as Metron. This is dirt simple.
>> > >
>> > > b) I or someone needs to:
>> > >
>> > > - open a jira,
>> > >
>> > > - add the submodule to the Metron code tree,
>> > >
>> > > - possibly (optionally) add build mechanism to the maven poms, and
>> > >
>> > > - document as much as we think appropriate regarding what it is,
>> how
>> > to
>> > > build it, and how to update it,
>> > >
>> > > and commit that before the 0.4.2 release.
>> > >
>> > >
>> > >
>> > > What is the will of the community?
>> > >
>> > > Thanks,
>> > &

Anyone else having problems building rpms?

2017-12-08 Thread Otto Fowler
I’m getting errors building rpms, but not building the rest of the product
( so vagrant up makes it to the rpm phase ).
When I just build the rpms from the cli I get:

+ npm install
--prefix=/root/BUILDROOT/metron-0.4.2-root/usr/metron/0.4.2/web/expressjs
--only=production
npm ERR! Linux 4.9.49-moby
npm ERR! argv "/usr/bin/node" "/usr/bin/npm" "install"
"--prefix=/root/BUILDROOT/metron-0.4.2-root/usr/metron/0.4.2/web/expressjs"
"--only=production"
npm ERR! node v6.11.3
npm ERR! npm  v3.10.10
npm ERR! code ECONNRESET

npm ERR! network tunneling socket could not be established, cause=connect
EINVAL 0.0.246.98:80 - Local (0.0.0.0:0)
npm ERR! network This is most likely not a problem with npm itself
npm ERR! network and is related to network connectivity.
npm ERR! network In most cases you are behind a proxy or have bad network
settings.
npm ERR! network
npm ERR! network If you are behind a proxy, please make sure that the
npm ERR! network 'proxy' config is set properly.  See: 'npm help config'

npm ERR! Please include the following file with any support request:
npm ERR! /root/BUILD/npm-debug.log

error: Bad exit status from /var/tmp/rpm-tmp.lAmezg (%install)


Anyone else see this?

O


Re: Anyone else having problems building rpms?

2017-12-08 Thread Otto Fowler
I seem to be beyond the issue now.  But the question I have is this: do we
require internet connectivity to build rpms now?


On December 8, 2017 at 07:14:41, Otto Fowler (ottobackwa...@gmail.com)
wrote:

I’m getting errors building rpms, but not building the rest of the product
( so vagrant up makes it to the rpm phase ).
When I just build the rpms from the cli I get:

+ npm install
--prefix=/root/BUILDROOT/metron-0.4.2-root/usr/metron/0.4.2/web/expressjs
--only=production
npm ERR! Linux 4.9.49-moby
npm ERR! argv "/usr/bin/node" "/usr/bin/npm" "install"
"--prefix=/root/BUILDROOT/metron-0.4.2-root/usr/metron/0.4.2/web/expressjs"
"--only=production"
npm ERR! node v6.11.3
npm ERR! npm  v3.10.10
npm ERR! code ECONNRESET

npm ERR! network tunneling socket could not be established, cause=connect
EINVAL 0.0.246.98:80 - Local (0.0.0.0:0)
npm ERR! network This is most likely not a problem with npm itself
npm ERR! network and is related to network connectivity.
npm ERR! network In most cases you are behind a proxy or have bad network
settings.
npm ERR! network
npm ERR! network If you are behind a proxy, please make sure that the
npm ERR! network 'proxy' config is set properly.  See: 'npm help config'

npm ERR! Please include the following file with any support request:
npm ERR! /root/BUILD/npm-debug.log

error: Bad exit status from /var/tmp/rpm-tmp.lAmezg (%install)


Anyone else see this?

O


Re: [DISCUSS] Upcoming Release

2017-12-08 Thread Otto Fowler
che/metron#668
> > METRON-1190 Fix Meta Alert Type handling in calculation of
> > scores (justinleet) closes apache/metron#763
> > METRON-1187 Indexing/Profiler Kafka ACL Groups Not Setup
> > Correctly (nickwallen) closes apache/metron#759
> > METRON-1185: Stellar REPL does not work on a kerberized
> > cluster when calling functions interacting with HBase closes
> > apache/incubator-metron#755
> > METRON-1186: Profiler Functions use classutils from shaded
> > storm closes apache/incubator-metron#758
> > METRON-1173: Fix pointers to old stellar docs closes
> > apache/incubator-metron#746
> > METRON-1179: Make STATS_ADD to take a list closes
> > apache/incubator-metron#750
> > METRON-1180: Make Stellar Shell accept zookeeper quorum as a
> > CSV list and not require a port closes apache/incubator-metron#751
> > METRON-1183 Improve KDC Setup Instructions (nickwallen)
> closes
> > apache/metron#753
> > METRON-1177 Stale running topologies seen post-kerberization
> > and cause exceptions (nickwallen) closes apache/metron#748
> > METRON-1158 Build backend for grouping alerts into meta
> alerts
> > (justinleet) closes apache/metron#734
> > METRON-1146: Add ability to parse JSON string into JSONObject
> > for stellar closes apache/incubator-metron#727
> > METRON-1176 REST: HDFS Service should support setting
> > permissions on files when writing (ottobackwards) closes
> apache/metron#749
> > METRON-1114 Add group by capabilities to search REST endpoint
> > (merrimanr) closes apache/metron#702
> > METRON-1167 Define Session Specific Global Configuration
> > Values in the REPL (nickwallen) closes apache/metron#740
> > METRON-1171: Better validation for the SUBSTRING stellar
> > function closes apache/incubator-metron#745
> >
> >
> >
> > On 11/17/17, 11:59 AM, "Nick Allen"  wrote:
> >
> > I just wanted to send an update on where we are at. We've
> > gotten a lot
> > done here recently as you can see below.
> >
> > ✓ DONE (1) First, METRON-1289 needs to go in. This one was
> > a fairly big
> > effort and I am hearing that we are pretty close.
> >
> > ✓ DONE (2) METRON-1294 fixes an issue in how field types
> are
> > looked-up.
> >
> > ✓ DONE (3) METRON-1290 is next. While this may have been
> > fixed in
> > M-1289, there may be some test cases we want from this PR.
> >
> > ✓ DONE (4) METRON-1301 addresses a problem with the sorting
> > logic.
> >
> > ✓ DONE (5) METRON-1291 fixes an issue with escalation of
> > metaalerts.
> >
> > (6) That leads us to Raghu's UI work in METRON-1252. This
> > introduces the
> > UI bits that depend on all the previous backend work.
> >
> > (7) At this point, we should have our best effort at
> running
> > Metaalerts
> > on Elasticsearch 2.x. I propose that we cut a release here.
> >
> > (8) After we cut the release, we can introduce the work for
> > ES 5.x in
> > METRON-939. I know we will need lots of help testing and
> > reviewing this
> > one.
> >
> >
> >
> > We also have an outstanding question that needs resolved
> > BEFORE we
> > release. We need to come to a consensus on how to release
> > having moved our
> > Bro Plugin to a separate repo. I don't think we've heard
> from
> > everyone on
> > this. I'd urge everyone to chime in so we can choose a path
> > forward.
> >
> > If anyone is totally confused in regards to that discussion,
> I
> > can try and
> > send an options summary again as a separate discuss thread.
> > The original
> > chain was somewhere around here [1].
> >
> > [1]
> > https://lists.apache.org/thread.html/
> > 54a4474881b97e559df24728b3a0e923a58345a282451085eef832ef@%
> > 3Cdev.metron.apache.org%3E
> >
> >
> >
> > On Wed, Nov 15, 2017 at 10:04 AM, Nick Allen <
> > n...@nickallen.org> wrote:
> >
> > > Hi Guys -
> > >
> > > I want to follow-up on this discussion. It sounds like
> most
> > people are in
> > > agreement with the general approach.
> > >
> > > A lot of people have been working hard on Metaalerts and
> > Elasticsearch. I
> > > have checked-in with those doing the heavy lifting and have
> > compiled a more
> > > detailed plan based on where we are at now. To the best of
> > my knowledge
> > > here is the plan of attack for finishing out this effort.
> > >
> > > (1) First, METRON-12

Re: [DISCUSS] Upcoming Release

2017-12-09 Thread Otto Fowler
So RC2 then?



On December 8, 2017 at 20:43:21, Matt Foley (mfo...@hortonworks.com) wrote:

Hah, here it is: https://github.com/apache/metron/pull/743
“This problem seems to only reproduce when one unrolls a tarball rather
than cloning from github.”

Heh, the exclusion at
https://github.com/apache/metron/blob/master/pom.xml#L351 is still there,
but the hashcode in the bundle.css file name has changed from
a0b6b99c10d9a13dc67e to f56deed131e58bd7ee04. Sigh. Did the version of Font
Awesome fonts change?
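One way to keep this from recurring on future RCs would be to exclude the generated bundle by wildcard rather than pinning the webpack content hash. A hypothetical sketch of such an apache-rat-plugin exclusion follows — the plugin structure is standard apache-rat-plugin configuration, but the exact path and surrounding pom layout are assumptions, not taken from the Metron pom:

```xml
<!-- Hypothetical sketch: exclude the generated alerts-UI bundle from the
     rat check by wildcard instead of pinning the content hash.
     Path pattern is an assumption for illustration. -->
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <excludes>
      <!-- matches styles.<any-hash>.bundle.css regardless of the hash -->
      <exclude>**/metron-alerts/dist/styles.*.bundle.css</exclude>
    </excludes>
  </configuration>
</plugin>
```

With a wildcard, a dependency bump (Font Awesome or otherwise) that changes the bundle hash would no longer require editing the exclusion before each release.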


On 12/8/17, 5:26 PM, "Matt Foley"  wrote:

I remember having trouble with this bundle.css file on the last release,
but I can’t remember what we did about it. Anybody?

On 12/8/17, 1:41 PM, "Otto Fowler"  wrote:

Steps

- Downloaded tar.gz’s, asc files and KEYS
- Verified signing of both tar.gz’s
- searched for rogue 0.4.1 entries
- verified the main pom.xml
- built :

mvn clean && \
  time mvn -q -T 2C -DskipTests install && \
  time mvn -q -T 2C surefire:test@unit-tests && \
  time mvn -q surefire:test@integration-tests && \
  time mvn -q test --projects metron-interface/metron-config && \
  time build_utils/verify_licenses.sh

Found rat error:


*
Summary
---
Generated at: 2017-12-08T16:33:27-05:00

Notes: 3
Binaries: 193
Archives: 0
Standards: 75

Apache Licensed: 74
Generated Documents: 0

JavaDocs are generated, thus a license header is optional.
Generated files do not require license headers.

1 Unknown Licenses

*

Files with unapproved licenses:

/Users/batman/tmp/release_ver/apache-metron-0.4.2-rc1/metron-interface/metron-alerts/dist/styles.f56deed131e58bd7ee04.bundle.css


*








On December 8, 2017 at 04:34:24, Matt Foley (ma...@apache.org) wrote:

Colleagues,
I’ve posted Metron-0.4.2-RC1 and Metron-bro-plugin-kafka-0.1 to
https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC1/

Given the complexity of this RC, I’d appreciate if a couple people would be
willing to kick the tires before we put it up for a vote.

I will myself be going thru the Verify Build process this weekend, as I
won’t be able to do it Friday.

Thanks,
--Matt


On 12/4/17, 2:05 PM, "zeo...@gmail.com"  wrote:

Can we resolve the conversation regarding the second repo? I was waiting
to get more input/preferences from people There's also a documentation
update that fixes a few broken Stellar docs that already has aa +1, I just
need to merge it.

Jon

On Mon, Dec 4, 2017, 17:01 Casey Stella  wrote:

> I would be in favor of a release at this point.
>
> On Mon, Dec 4, 2017 at 4:57 PM, Matt Foley  wrote:
>
> > Hey all,
> > I see METRON-1252 was resolved over the weekend. Shall I go ahead and
> > start the process with 0.4.2 release?
> > Does anyone have any commits they feel strongly should go in before
0.4.2
> > is done, or are we ready to call it good?
> >
> > I believe there is consensus the 0.4.2 release should include a release
> of
> > the current state of the metron-bro-plugin-kafka. I will continue the
> > discussion in that thread as to the process for accomplishing that, but
> > plan on it happening.
> >
> > Regards,
> > --Matt
> >
> > On 11/26/17, 6:26 PM, "Matt Foley"  wrote:
> >
> > Hope everyone (at least in the U.S.) had a great Thanksgiving
> holiday.
> > Regarding status of the release effort, still pending METRON-1252, so
> > not making the release branch yet.
> >
> > Regards,
> > --Matt
> >
> > On 11/17/17, 1:32 PM, "Matt Foley"  wrote:
> >
> > (With release manager hat on)
> >
> > The community has proposed a release of Metron in the near
> future,
> > focusing on Meta-alerts running in Elasticsearch.
> > Congrats on getting so many of the below already done. At this
> > point, only METRON-1252, and the discussion of how to handle joint
> release
> > of the Metron bro plugin, remain as gating items for the release. I
> > project these will be resolved next week, so let’s propose the
following:
> >
> > Sometime next week, after the last bits are don

script for verification of metron release candidates

2017-12-11 Thread Otto Fowler
I have written a script:
https://github.com/ottobackwards/Metron-and-Nifi-Scripts/blob/master/metron/metron-rc-check.sh
.
I think it might be useful.  If any of you could give it a look over and
perhaps try it, I would appreciate it.


Re: script for verification of metron release candidates

2017-12-11 Thread Otto Fowler
Note:  currently the rat check on RC1 doesn’t work, so the build will fail.

I still have some work to do for catching errors.



On December 11, 2017 at 09:18:00, Otto Fowler (ottobackwa...@gmail.com)
wrote:

I have written a script:
https://github.com/ottobackwards/Metron-and-Nifi-Scripts/blob/master/metron/metron-rc-check.sh
.
I think it might be useful.  If any of you could give it a look over and
perhaps try it, I would appreciate it.


Re: script for verification of metron release candidates

2017-12-11 Thread Otto Fowler
Oops, sorry, I kept working on it ¯\_(ツ)_/¯


On December 11, 2017 at 12:58:28, Laurens Vets (laur...@daemon.be) wrote:

On 2017-12-11 06:18, Otto Fowler wrote:
> I have written a script:
>
https://github.com/ottobackwards/Metron-and-Nifi-Scripts/blob/master/metron/metron-rc-check.sh
> .
> I think it might be useful. If any of you could give it a look over
> and
> perhaps try it, I would appreciate it.

https://github.com/ottobackwards/Metron-and-Nifi-Scripts/blob/master/metron/metron-rc-check
:D


[DISCUSS] Community Meetings

2017-12-11 Thread Otto Fowler
I think that we all want to have regular community meetings.  We may be
better able to keep to a regular schedule with these meetings if we spread
out the responsibility for them from James and Casey, both of whom have a
lot on their plate already.

I would be willing to coordinate and run the meetings, and would welcome
anyone else who wants to help when they can.

The only issue for me is I do not have a web-ex account that I can use to
hold the meeting.  So I’ll need some recommendations for a suitable
alternative.  I have not been able to find an Apache Friendly alternative,
in the same way that Atlassian is apache friendly.


So - from what I can see we need to:

- Talk through who is going to do it
- How are we going to host it
- When are we going to do it

Anything else?

ottO


Re: [DISCUSS] Community Meetings

2017-12-11 Thread Otto Fowler
This looks like it *might* work, but no recording, and 40mins?

https://zoom.us/pricing



On December 11, 2017 at 20:37:00, zeo...@gmail.com (zeo...@gmail.com) wrote:

I think this is a great idea. Hangouts works well but last I checked has a
user # limitation. I don't have any other good suggestions, sorry, but I'm
in to attend.

Jon

On Mon, Dec 11, 2017, 16:42 Otto Fowler  wrote:

> I think that we all want to have regular community meetings. We may be
> better able to keep to a regular schedule with these meetings if we
spread
> out the responsibility for them from James and Casey, both of whom have a
> lot on their plate already.
>
> I would be willing to coordinate and run the meetings, and would welcome
> anyone else who wants to help when they can.
>
> The only issue for me is I do not have a web-ex account that I can use to
> hold the meeting. So I’ll need some recommendations for a suitable
> alternative. I have not been able to find an Apache Friendly alternative,
> in the same way that Atlassian is apache friendly.
>
>
> So - from what I can see we need to:
>
> - Talk through who is going to do it
> - How are we going to host it
> - When are we going to do it
>
> Anything else?
>
> ottO
>
-- 

Jon


Re: [DISCUSS] Community Meetings

2017-12-12 Thread Otto Fowler
Thanks!  I think I’d like something hosted though.


On December 12, 2017 at 11:18:52, Ahmed Shah (ahmeds...@cmail.carleton.ca)
wrote:

Hello,

wrt "- How are we going to host it"...

I've used BigBlueButton as an end user at our University.

It is LGPL open source.

https://bigbluebutton.org/
https://bigbluebutton.org/developers/


-Ahmed

___
Ahmed Shah (PMP, M. Eng.)
Cybersecurity Analyst & Developer
GCR - Cybersecurity Operations Center
Carleton University - cugcr.com<https://cugcr.com/tiki/lce/index.php>


________
From: Otto Fowler 
Sent: December 11, 2017 4:41 PM
To: dev@metron.apache.org
Subject: [DISCUSS] Community Meetings

I think that we all want to have regular community meetings. We may be
better able to keep to a regular schedule with these meetings if we spread
out the responsibility for them from James and Casey, both of whom have a
lot on their plate already.

I would be willing to coordinate and run the meetings, and would welcome
anyone else who wants to help when they can.

The only issue for me is I do not have a web-ex account that I can use to
hold the meeting. So I’ll need some recommendations for a suitable
alternative. I have not been able to find an Apache Friendly alternative,
in the same way that Atlassian is apache friendly.


So - from what I can see we need to:

- Talk through who is going to do it
- How are we going to host it
- When are we going to do it

Anything else?

ottO


Re: [DISCUSS] Community Meetings

2017-12-12 Thread Otto Fowler
Excellent, do you have the > 40 min + record option?


On December 12, 2017 at 13:19:55, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Happy to volunteer a zoom room. That seems to have worked for most in the
past.

Simon

> On 12 Dec 2017, at 18:09, Otto Fowler  wrote:
>
> Thanks! I think I’d like something hosted though.
>
>
> On December 12, 2017 at 11:18:52, Ahmed Shah (ahmeds...@cmail.carleton.ca)

> wrote:
>
> Hello,
>
> wrt "- How are we going to host it"...
>
> I've used BigBlueButton as an end user at our University.
>
> It is LGPL open source.
>
> https://bigbluebutton.org/
> https://bigbluebutton.org/developers/
>
>
> -Ahmed
>
> ___
> Ahmed Shah (PMP, M. Eng.)
> Cybersecurity Analyst & Developer
> GCR - Cybersecurity Operations Center
> Carleton University - cugcr.com<https://cugcr.com/tiki/lce/index.php>
>
>
> 
> From: Otto Fowler 
> Sent: December 11, 2017 4:41 PM
> To: dev@metron.apache.org
> Subject: [DISCUSS] Community Meetings
>
> I think that we all want to have regular community meetings. We may be
> better able to keep to a regular schedule with these meetings if we
spread
> out the responsibility for them from James and Casey, both of whom have a
> lot on their plate already.
>
> I would be willing to coordinate and run the meetings, and would welcome
> anyone else who wants to help when they can.
>
> The only issue for me is I do not have a web-ex account that I can use to
> hold the meeting. So I’ll need some recommendations for a suitable
> alternative. I have not been able to find an Apache Friendly alternative,
> in the same way that Atlassian is apache friendly.
>
>
> So - from what I can see we need to:
>
> - Talk through who is going to do it
> - How are we going to host it
> - When are we going to do it
>
> Anything else?
>
> ottO


“777” Feature Branch Redux

2017-12-12 Thread Otto Fowler
I have created a new feature branch
feature/METRON-1211-extensions-parsers-gradual to track the parser
extension work
and have rebased https://github.com/apache/metron/pull/774 on to that.

I have also updated confluence and jira :
https://cwiki.apache.org/confluence/display/METRON/Metron+Extension+System+and+Parser+Extensions

1. Feature Branch still makes sense
2. Now that we are splitting it up it will work better than the “whole
boat” approach to the original branch
3. I don't want to have to worry about regression for short periods as I
re-implement

Cheers.


New Travis Build Image

2017-12-12 Thread Otto Fowler
The new Trusty image in Travis that just landed breaks the build for one of
my PRs.  I am not sure why, but I verified that going back to the old
image resolves the problem.
If you see a build fail, and it works locally or it is failing in a strange
way, be wary.

ottO


Re: [DISCUSS] Community Meetings

2017-12-13 Thread Otto Fowler
I am ok with just notes and no recording.


On December 13, 2017 at 04:37:20, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Good points Larry, we would need to get consent from everyone on the call
to record to properly comply with regulations in some countries. We would
definitely need someone to step up as note taker.

Something else to think about is intended audience. Previously we’ve had
meeting like this which have been very detailed Dev@ focussed (which is a
great thing) but have rather alienated participants in User@ land. We need
to make it clear what level we’re talking about to be inclusive.

Simon

> On 13 Dec 2017, at 00:44, larry mccay  wrote:
>
> Not sure about posting the recordings - you will need to check and make
> sure that doesn't violate anything.
>
> Just a friendly reminder...
> It is important that meetings have notes and a summary that is sent out
> describing topics to be decided on the mailing list.
> No decisions can be made in the community meeting itself - this gives
> others in other timezones and commitments review and voice in the
decisions.
>
> If it didn't happen on the mailing lists then it didn't happen. :)
>
>
> On Tue, Dec 12, 2017 at 1:39 PM, Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
>> Yes, I do.
>>
>> I suspect the best bet will be to post recordings somewhere on the
>> apache.org <http://apache.org/> metron site.
>>
>> Simon
>>
>>> On 12 Dec 2017, at 18:36, Otto Fowler  wrote:
>>>
>>> Excellent, do you have the > 40 min + record option?
>>>
>>>
>>> On December 12, 2017 at 13:19:55, Simon Elliston Ball (
>>> si...@simonellistonball.com) wrote:
>>>
>>> Happy to volunteer a zoom room. That seems to have worked for most in
the
>>> past.
>>>
>>> Simon
>>>
>>>> On 12 Dec 2017, at 18:09, Otto Fowler  wrote:
>>>>
>>>> Thanks! I think I’d like something hosted though.
>>>>
>>>>
>>>> On December 12, 2017 at 11:18:52, Ahmed Shah (
>> ahmeds...@cmail.carleton.ca)
>>>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> wrt "- How are we going to host it"...
>>>>
>>>> I've used BigBlueButton as an end user at our University.
>>>>
>>>> It is LGPL open source.
>>>>
>>>> https://bigbluebutton.org/
>>>> https://bigbluebutton.org/developers/
>>>>
>>>>
>>>> -Ahmed
>>>>
>>>> ___
>>>> Ahmed Shah (PMP, M. Eng.)
>>>> Cybersecurity Analyst & Developer
>>>> GCR - Cybersecurity Operations Center
>>>> Carleton University - cugcr.com<https://cugcr.com/tiki/lce/index.php>
>>>>
>>>>
>>>> 
>>>> From: Otto Fowler 
>>>> Sent: December 11, 2017 4:41 PM
>>>> To: dev@metron.apache.org
>>>> Subject: [DISCUSS] Community Meetings
>>>>
>>>> I think that we all want to have regular community meetings. We may be
>>>> better able to keep to a regular schedule with these meetings if we
>>> spread
>>>> out the responsibility for them from James and Casey, both of whom
have
>> a
>>>> lot on their plate already.
>>>>
>>>> I would be willing to coordinate and run the meetings, and would
welcome
>>>> anyone else who wants to help when they can.
>>>>
>>>> The only issue for me is I do not have a web-ex account that I can use
>> to
>>>> hold the meeting. So I’ll need some recommendations for a suitable
>>>> alternative. I have not been able to find an Apache Friendly
>> alternative,
>>>> in the same way that Atlassian is apache friendly.
>>>>
>>>>
>>>> So - from what I can see we need to:
>>>>
>>>> - Talk through who is going to do it
>>>> - How are we going to host it
>>>> - When are we going to do it
>>>>
>>>> Anything else?
>>>>
>>>> ottO
>>
>>


Re: [DISCUSS] Integration/e2e test infrastructure requirements

2017-12-13 Thread Otto Fowler
What is the Master Jira going to be?



On December 13, 2017 at 14:36:50, Ryan Merriman (merrim...@gmail.com) wrote:

I am going to start the process of creating Jiras out of these initial
requirements. I agree with them and think they are a good starting point.
Feel free to join in at anytime and add/change/remove requirements as
needed. I will update the thread once I have the initial Jiras created and
we can go from there.

On Mon, Dec 11, 2017 at 4:10 PM, Ryan Merriman  wrote:

> The purpose of this discussion is to map out what is required to get the POC
> started with https://github.com/apache/metron/pull/858 merged into master.
>
> The following features were added in the previously mentioned PR:
>
> - Dockerfile for Metron REST
> - Dockerfile for Metron UIs
> - Docker Compose application including Metron images, Elasticsearch,
> Kafka, Zookeeper
> - Modified travis file that manages the Docker environment and runs
> the e2e tests as part of the build
> - Maven pom.xml that installs all the required assets into the Docker
> e2e module
> - Modified metron-alerts pom.xml that allows e2e tests to be run
> through Maven
> - An example integration test that has been converted to use the new
> infrastructure
>
> Here are the initial features proposed for acceptance into master:
>
> - All e2e and integration tests run on common infrastructure.
> - All e2e and integration tests are run automatically in the Travis
> build.
> - All e2e and integration tests run repeatably and reliably in the
> Travis build.
> - Debugging options are available and documented.
> - The new infra and how to interact with it is documented.
> - Old infrastructure removed (anything unused or commented out is
> deleted, instead of staying).
>
> Are there other requirements people want to add to this list?
>
>
>
>
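The Docker Compose application described in the list above could look roughly like the following. Every image name, service name, and build path here is an assumption for illustration, not the actual contents of PR 858:

```yaml
# Hypothetical sketch of the e2e compose application; images and
# service names are assumptions, not the PR's actual definitions.
version: '2'
services:
  zookeeper:
    image: zookeeper:3.4
  kafka:
    image: wurstmeister/kafka      # assumed Kafka image
    depends_on: [zookeeper]
  elasticsearch:
    image: elasticsearch:2.4
  metron-rest:
    build: ./metron-rest           # Dockerfile for Metron REST
    depends_on: [kafka, elasticsearch]
  metron-ui:
    build: ./metron-ui             # Dockerfile for Metron UIs
    depends_on: [metron-rest]
```

The Travis job would then bring this environment up, run the e2e tests against it, and tear it down as part of the build.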


Re: [DISCUSS] Integration/e2e test infrastructure requirements

2017-12-13 Thread Otto Fowler
Same as the feature branch name?  I just want to find it and set a watch on
it ;)


On December 13, 2017 at 15:29:00, Ryan Merriman (merrim...@gmail.com) wrote:

I'm open to ideas. What do you think the title should be?

On Wed, Dec 13, 2017 at 2:13 PM, Otto Fowler 
wrote:

> What is the Master Jira going to be?
>
>
>
> On December 13, 2017 at 14:36:50, Ryan Merriman (merrim...@gmail.com)
> wrote:
>
> I am going to start the process of creating Jiras out of these initial
> requirements. I agree with them and think they are a good starting point.
> Feel free to join in at anytime and add/change/remove requirements as
> needed. I will update the thread once I have the initial Jiras created
and
> we can go from there.
>
> On Mon, Dec 11, 2017 at 4:10 PM, Ryan Merriman 
> wrote:
>
> > The purpose of this discussion is map out what is required to get the
> POC
> > started with https://github.com/apache/metron/pull/858 into master.
> >
> > The following features were added in the previously mentioned PR:
> >
> > - Dockerfile for Metron REST
> > - Dockerfile for Metron UIs
> > - Docker Compose application including Metron images, Elasticsearch,
> > Kafka, Zookeeper
> > - Modified travis file that manages the Docker environment and runs
> > the e2e tests as part of the build
> > - Maven pom.xml that installs all the required assets into the Docker
> > e2e module
> > - Modified metron-alerts pom.xml that allows e2e tests to be run
> > through Maven
> > - An example integration test that has been converted to use the new
> > infrastructure
> >
> > Here are the initial features proposed for acceptance into master:
> >
> > - All e2e and integration tests run on common infrastructure.
> > - All e2e and integration tests are run automatically in the Travis
> > build.
> > - All e2e and integration tests run repeatably and reliably in the
> > Travis build.
> > - Debugging options are available and documented.
> > - The new infra and how to interact with it is documented.
> > - Old infrastructure removed (anything unused or commented out is
> > deleted, instead of staying).
> >
> > Are there other requirements people want to add to this list?
> >
> >
> >
> >
>
>


Re: [DISCUSS] Community Meetings

2017-12-13 Thread Otto Fowler
+1


On December 13, 2017 at 16:39:52, James Sirota (jsir...@apache.org) wrote:

I can set up a dedicated Zoom room with a recurrent meeting and give PMC
members rights to the room. I think hosting these meetings should not be a
problem. I would vote not to record them, but rather provide the notes
after the meeting. It's a lot easier to skim through the notes than jump
around in a recording. As Simon mentioned, I would also make it explicitly
clear that the meetings are dev meetings. These are not user Q&A and are
not meant to be overviews of how different features of Metron work. If we
want to do feature demos or provide user content I would want that to be in
its own separate meeting.

Thanks,
James

13.12.2017, 05:00, "Otto Fowler" :
> I am ok with just notes and no recording.
>
> On December 13, 2017 at 04:37:20, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> Good points Larry, we would need to get consent from everyone on the call
> to record to properly comply with regulations in some countries. We would
> definitely need someone to step up as note taker.
>
> Something else to think about is intended audience. Previously we’ve had
> meeting like this which have been very detailed Dev@ focussed (which is a
> great thing) but have rather alienated participants in User@ land. We
need
> to make it clear what level we’re talking about to be inclusive.
>
> Simon
>
>>  On 13 Dec 2017, at 00:44, larry mccay  wrote:
>>
>>  Not sure about posting the recordings - you will need to check and make
>>  sure that doesn't violate anything.
>>
>>  Just a friendly reminder...
>>  It is important that meetings have notes and a summary that is sent out
>>  describing topics to be decided on the mailing list.
>>  No decisions can be made in the community meeting itself - this gives
>>  others in other timezones and commitments review and voice in the
>
> decisions.
>>  If it didn't happen on the mailing lists then it didn't happen. :)
>>
>>  On Tue, Dec 12, 2017 at 1:39 PM, Simon Elliston Ball <
>>  si...@simonellistonball.com> wrote:
>>
>>>  Yes, I do.
>>>
>>>  I suspect the best bet will be to post recordings somewhere on the
>>>  apache.org <http://apache.org/> metron site.
>>>
>>>  Simon
>>>
>>>>  On 12 Dec 2017, at 18:36, Otto Fowler 
wrote:
>>>>
>>>>  Excellent, do you have the > 40 min + record option?
>>>>
>>>>  On December 12, 2017 at 13:19:55, Simon Elliston Ball (
>>>>  si...@simonellistonball.com) wrote:
>>>>
>>>>  Happy to volunteer a zoom room. That seems to have worked for most in
>
> the
>>>>  past.
>>>>
>>>>  Simon
>>>>
>>>>>  On 12 Dec 2017, at 18:09, Otto Fowler 
wrote:
>>>>>
>>>>>  Thanks! I think I’d like something hosted though.
>>>>>
>>>>>  On December 12, 2017 at 11:18:52, Ahmed Shah (
>>>  ahmeds...@cmail.carleton.ca)
>>>>>  wrote:
>>>>>
>>>>>  Hello,
>>>>>
>>>>>  wrt "- How are we going to host it"...
>>>>>
>>>>>  I've used BigBlueButton as an end user at our University.
>>>>>
>>>>>  It is LGPL open source.
>>>>>
>>>>>  https://bigbluebutton.org/
>>>>>  https://bigbluebutton.org/developers/
>>>>>
>>>>>  -Ahmed
>>>>>
>>>>>  ___
>>>>>  Ahmed Shah (PMP, M. Eng.)
>>>>>  Cybersecurity Analyst & Developer
>>>>>  GCR - Cybersecurity Operations Center
>>>>>  Carleton University - cugcr.com<https://cugcr.com/tiki/lce/index.php>

>>>>>
>>>>>  
>>>>>  From: Otto Fowler 
>>>>>  Sent: December 11, 2017 4:41 PM
>>>>>  To: dev@metron.apache.org
>>>>>  Subject: [DISCUSS] Community Meetings
>>>>>
>>>>>  I think that we all want to have regular community meetings. We may
be
>>>>>  better able to keep to a regular schedule with these meetings if we
>>>>  spread
>>>>>  out the responsibility for them from James and Casey, both of whom
>
> have
>>>  a
>>>>>  lot on their plate already.
>>>>>
>>>>>  I would be willing to coordinate and run the meetings, and would
>
> welcome
>>>>>  anyone else who wants to help when they can.
>>>>>
>>>>>  The only issue for me is I do not have a web-ex account that I can
use
>>>  to
>>>>>  hold the meeting. So I’ll need some recommendations for a suitable
>>>>>  alternative. I have not been able to find an Apache Friendly
>>>  alternative,
>>>>>  in the same way that Atlassian is apache friendly.
>>>>>
>>>>>  So - from what I can see we need to:
>>>>>
>>>>>  - Talk through who is going to do it
>>>>>  - How are we going to host it
>>>>>  - When are we going to do it
>>>>>
>>>>>  Anything else?
>>>>>
>>>>>  ottO

---
Thank you,

James Sirota
PMC- Apache Metron
jsirota AT apache DOT org


Re: Metron - Emailing Alerts

2017-12-13 Thread Otto Fowler
While summary of _any_ metron data ( perhaps by query etc ) would be good,
let us not lose sight of the OP’s issue.  Ever with summary|digest or one
at a time, they are looking for sending mails to certain people based on
rule.

A pseudo path may be

INDEXING -> New Topology or ?? -> evaluate rules -> bin matches to batches
per destination -> create digest from bin’s and send on batch size or
timeout ( as the bulk writer does )

I’m sure there is something wrong with this, but it is easier to frame it
in the way we do it now, and then work from there for me.
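The "bin matches to batches per destination" step above could be sketched like this. All names here are hypothetical (this is not Metron code), and the `send` hook stands in for whatever actually emails the digest; flushing happens on batch size or on a timeout tick, as the bulk writer does:

```python
import time
from collections import defaultdict

class DigestBinner:
    """Sketch: bin alerts per destination, flush on batch size or timeout.
    Names and the send hook are hypothetical, not Metron internals."""

    def __init__(self, send, batch_size=10, timeout_secs=60.0, clock=time.monotonic):
        self.send = send                  # callable(destination, list_of_alerts)
        self.batch_size = batch_size
        self.timeout_secs = timeout_secs
        self.clock = clock
        self.bins = defaultdict(list)     # destination -> pending alerts
        self.first_seen = {}              # destination -> time of oldest pending alert

    def add(self, destination, alert):
        bin_ = self.bins[destination]
        if not bin_:
            self.first_seen[destination] = self.clock()
        bin_.append(alert)
        if len(bin_) >= self.batch_size:  # flush on batch size
            self._flush(destination)

    def tick(self):
        """Flush any bin whose oldest alert has exceeded the timeout."""
        now = self.clock()
        for dest in list(self.bins):
            if self.bins[dest] and now - self.first_seen[dest] >= self.timeout_secs:
                self._flush(dest)

    def _flush(self, destination):
        batch, self.bins[destination] = self.bins[destination], []
        self.first_seen.pop(destination, None)
        self.send(destination, batch)     # one digest email per batch
```

A driver would call `add` for each rule match and `tick` periodically; the rule-evaluation and email-rendering stages would sit on either side of this.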



On December 13, 2017 at 16:55:35, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

That makes a lot of sense, especially if you wanted the detail in the email
as well. We could definitely use some good "reporting of alerts”
functionality that would make something like that work. What do people
think?

Simon

> On 13 Dec 2017, at 21:52, James Sirota  wrote:
>
> I think there may be gaps in doing it with the profiler. You can record
stats and counts of different alert types, and maybe even alert ids, but
you can't cross-correlate these IDs to the alert body. At least not in the
profiler. I was thinking about emailing something that looks like a
zeppelin report. You would run it in a cron, export to PDF, and send that
out as a summary. It can be a simple list of alerts that match your rule,
or it can have aggregations, graphics, metrics, KPI screens, etc. That
would be the feature that I would want to discuss and flesh out
>
> Thanks,
> James
>
> 13.12.2017, 14:26, "Simon Elliston Ball" :
>> We can already do that with profiles I would have thought. Create a
profile that only picks alerts and then base your emails only from the
alert events produced by that profile. Would that create the right batching
mechanism (at a cost of possible higher latency than you might get with a
more specific alert batcher?)
>>
>> Simon
>>
>>> On 13 Dec 2017, at 21:23, James Sirota  wrote:
>>>
>>> I agree with Simon. If you email each alert individually you will be
overwhelmed. I think a better idea would be to email alert summaries
periodically, which is more manageable. This is probably a feature worthy
of consideration for Metron.
>>>
>>> 13.12.2017, 12:19, "Simon Elliston Ball" :
 Metron generates alerts onto a Kafka queue, which can be used to
integrate with Alert management tools, usually some sort of existing alert
aggregation tool.

 An alternative approach common with this is to have a tool like Apache
NiFi attach to the Metron alert feed and send email.

 The solution here would be to have Metron generate alerts (by adding
the is_alert: true flag in the enrichment process) and possibly other flags
like alert_email for example, and then have NiFi use ConsumeKafka and then
filter out the alert only messages in NiFi to use the PutEmail processor
(probably with a ControlRate before it too).

 Something I would caution is that email is not a great way to manage
or send alerts at the volume likely to occur in network monitoring tools. A
spike in network traffic can lead to a very large number of emails, which
tends to then cause you bigger problems. As such we usually find people
want some sort of buffering or aggregation of alerts, hence the use of an
alert management or ticketing solution in front.

 Simon

> On 13 Dec 2017, at 19:06, Ahmed Shah 
wrote:
>
> Hello,
> Just wondering if Metron has a feature to email alerts based on rules
that a user defines.
>
> Example:
> Rule A: Email the user 1...@1.com whenever ip_src_addr=100.2.10.*
> Rule B: Email the user 1...@1.com whenever payload contains "critical"
>
> If not, does anyone have any recommendations on where to code these
rules in the Metron stack that uses attributes from the GROK parser?
>
> -Ahmed
> ___
> Ahmed Shah (PMP, M. Eng.)
> Cybersecurity Analyst & Developer
> GCR - Cybersecurity Operations Center
> Carleton University - cugcr.com
>>>
>>> ---
>>> Thank you,
>>>
>>> James Sirota
>>> PMC- Apache Metron
>>> jsirota AT apache DOT org
>
> ---
> Thank you,
>
> James Sirota
> PMC- Apache Metron
> jsirota AT apache DOT org


Re: Metron - Emailing Alerts

2017-12-13 Thread Otto Fowler
We could also filter out of enrichment to a different topology based on
field like Simon has said so that the rules are run on a filtered set etc.

also s/Ever/Either/


On December 13, 2017 at 17:03:15, Otto Fowler (ottobackwa...@gmail.com)
wrote:

While summary of _any_ metron data ( perhaps by query etc ) would be good,
let us not lose sight of the OP’s issue.  Ever with summary|digest or one
at a time, they are looking for sending mails to certain people based on
rule.

A pseudo path may be

INDEXING -> New Topology or ?? -> evaluate rules -> bin matches to batches
per destination -> create digest from bin’s and send on batch size or
timeout ( as the bulk writer does )

I’m sure there is something wrong with this, but it is easier to frame it
in the way we do it now, and then work from there for me.



On December 13, 2017 at 16:55:35, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

That makes a lot of sense, especially if you wanted the detail in the email
as well. We could definitely use some good "reporting of alerts”
functionality that would make something like that work. What do people
think?

Simon

> On 13 Dec 2017, at 21:52, James Sirota  wrote:
>
> I think there may be gaps in doing it with the profiler. You can record
stats and counts of different alert types, and maybe even alert ids, but
you can't cross-correlate these IDs to the alert body. At least not in the
profiler. I was thinking about emailing something that looks like a
zeppelin report. You would run it in a cron, export to PDF, and send that
out as a summary. It can be a simple list of alerts that match your rule,
or it can have aggregations, graphics, metrics, KPI screens, etc. That
would be the feature that I would want to discuss and flesh out
>
> Thanks,
> James
>
> 13.12.2017, 14:26, "Simon Elliston Ball" :
>> We can already do that with profiles I would have thought. Create a
profile that only picks alerts and then base your emails only from the
alert events produced by that profile. Would that create the right batching
mechanism (at a cost of possible higher latency than you might get with a
more specific alert batcher?)
>>
>> Simon
>>
>>> On 13 Dec 2017, at 21:23, James Sirota  wrote:
>>>
>>> I agree with Simon. If you email each alert individually you will be
overwhelmed. I think a better idea would be to email alert summaries
periodically, which is more manageable. This is probably a feature worthy
of consideration for Metron.
>>>
>>> 13.12.2017, 12:19, "Simon Elliston Ball" :
>>>> Metron generates alerts onto a Kafka queue, which can be used to
integrate with Alert management tools, usually some sort of existing alert
aggregation tool.
>>>>
>>>> An alternative approach common with this is to have a tool like Apache
NiFi attach to the Metron alert feed and send email.
>>>>
>>>> The solution here would be to have Metron generate alerts (by adding
the is_alert: true flag in the enrichment process) and possibly other flags
like alert_email for example, and then have NiFi use ConsumeKafka and then
filter out the alert only messages in NiFi to use the PutEmail processor
(probably with a ControlRate before it too).
>>>>
>>>> Something I would caution is that email is not a great way to manage
or send alerts at the volume likely to occur in network monitoring tools. A
spike in network traffic can lead to a very large number of emails, which
tends to then cause you bigger problems. As such we usually find people
>>>> want some sort of buffering or aggregation of alerts, hence the use of an
>>>> alert management or ticketing solution in front.
>>>>
>>>> Simon
>>>>
>>>>> On 13 Dec 2017, at 19:06, Ahmed Shah 
wrote:
>>>>>
>>>>> Hello,
>>>>> Just wondering if Metron has a feature to email alerts based on rules
that a user defines.
>>>>>
>>>>> Example:
>>>>> Rule A: Email the user 1...@1.com whenever ip_src_addr=100.2.10.*
>>>>> Rule B: Email the user 1...@1.com whenever payload contains "critical"
>>>>>
>>>>> If not, does anyone have any recommendations on where to code these
rules in the Metron stack that uses attributes from the GROK parser?
>>>>>
>>>>> -Ahmed
>>>>> ___
>>>>> Ahmed Shah (PMP, M. Eng.)
>>>>> Cybersecurity Analyst & Developer
>>>>> GCR - Cybersecurity Operations Center
>>>>> Carleton University - cugcr.com<https://cugcr.com/tiki/lce/index.php>
>>>
>>> ---
>>> Thank you,
>>>
>>> James Sirota
>>> PMC- Apache Metron
>>> jsirota AT apache DOT org
>
> ---
> Thank you,
>
> James Sirota
> PMC- Apache Metron
> jsirota AT apache DOT org
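
As a rough illustration of the rule-to-recipient routing discussed in this thread (Rule A/B style rules over parsed fields, gated on the `is_alert` flag), here is a minimal Python sketch. The rule table, recipient addresses, and function names are hypothetical assumptions; in practice this logic would live in NiFi, a downstream topology, or an alert management tool.

```python
import fnmatch

# Hypothetical rule table modeled on the example in the thread: each rule
# maps a predicate over the parsed message to a recipient address.
RULES = [
    ("analyst@example.com",
     lambda msg: fnmatch.fnmatch(msg.get("ip_src_addr", ""), "100.2.10.*")),
    ("analyst@example.com",
     lambda msg: "critical" in msg.get("payload", "")),
]


def recipients_for(msg):
    """Return the distinct recipients whose rules match this message.

    Only messages flagged as alerts (is_alert: true, as set during
    enrichment) are considered for routing.
    """
    if not msg.get("is_alert"):
        return []
    hits = [rcpt for rcpt, pred in RULES if pred(msg)]
    return sorted(set(hits))
```

The matched recipients would then feed the email (or digest/batching) step rather than sending one message per alert.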


Re: [DISCUSS] Integration/e2e test infrastructure requirements

2017-12-13 Thread Otto Fowler
Awesome Ryan!
Have you thought about Confluence?


On December 13, 2017 at 18:11:39, Ryan Merriman (merrim...@gmail.com) wrote:

I took a first pass at adding tasks and will continue adding more as I
think of them. I will wait for feedback on which modules to include before
I add all those (only added metron-elasticsearch for now). I left all but
a couple unassigned so that anyone can pick up a task if they want.

On Wed, Dec 13, 2017 at 4:41 PM, Ryan Merriman  wrote:

> Jira is here: https://issues.apache.org/jira/browse/METRON-1352. I am
> starting to create sub-tasks based on the requirements outlined above and
> included in that Jira description.
>
> I am compiling a list of modules that we'll need to convert to the
testing
> infrastructure. Based on imports of ComponentRunner, I get these modules:
>
> - metron-elasticsearch
> - metron-enrichment
> - metron-indexing
> - metron-integration-test
> - metron-maas-service
> - metron-management
> - metron-pcap-backend
> - metron-profiler
> - metron-rest
> - metron-solr
>
> I am planning on creating sub-tasks for each of these. I know that
> metron-common should also be converted because it uses the Zookeeper in
> memory server but doesn't use ComponentRunner to manage it. Are there
> other modules like this that you know of?
>
> On Wed, Dec 13, 2017 at 2:44 PM, Otto Fowler 
> wrote:
>
>> Same as the feature branch name? I just want to find it and set a watch
>> on it ;)
>>
>>
>> On December 13, 2017 at 15:29:00, Ryan Merriman (merrim...@gmail.com)
>> wrote:
>>
>> I'm open to ideas. What do you think the title should be?
>>
>> On Wed, Dec 13, 2017 at 2:13 PM, Otto Fowler 
>> wrote:
>>
>> > What is the Master Jira going to be?
>> >
>> >
>> >
>> > On December 13, 2017 at 14:36:50, Ryan Merriman (merrim...@gmail.com)
>> > wrote:
>> >
>> > I am going to start the process of creating Jiras out of these initial
>> > requirements. I agree with them and think they are a good starting
>> point.
>> > Feel free to join in at anytime and add/change/remove requirements as
>> > needed. I will update the thread once I have the initial Jiras created
>> and
>> > we can go from there.
>> >
>> > On Mon, Dec 11, 2017 at 4:10 PM, Ryan Merriman 
>> > wrote:
>> >
>> > > The purpose of this discussion is map out what is required to get
the
>> > POC
>> > > started with https://github.com/apache/metron/pull/858 into master.
>> > >
>> > > The following features were added in the previously mentioned PR:
>> > >
>> > > - Dockerfile for Metron REST
>> > > - Dockerfile for Metron UIs
>> > > - Docker Compose application including Metron images, Elasticsearch,
>> > > Kafka, Zookeeper
>> > > - Modified travis file that manages the Docker environment and runs
>> > > the e2e tests as part of the build
>> > > - Maven pom.xml that installs all the required assets into the
Docker
>> > > e2e module
>> > > - Modified metron-alerts pom.xml that allows e2e tests to be run
>> > > through Maven
>> > > - An example integration test that has been converted to use the new
>> > > infrastructure
>> > >
>> > > Here are the initial features proposed for acceptance into master:
>> > >
>> > > - All e2e and integration tests run on common infrastructure.
>> > > - All e2e and integration tests are run automatically in the Travis
>> > > build.
>> > > - All e2e and integration tests run repeatably and reliably in the
>> > > Travis build.
>> > > - Debugging options are available and documented.
>> > > - The new infra and how to interact with it is documented.
>> > > - Old infrastructure removed (anything unused or commented out is
>> > > deleted, instead of staying).
>> > >
>> > > Are there other requirements people want to add to this list?
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>>
>>
>
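
For reference, the kind of Docker Compose application described above (Metron images plus Elasticsearch, Kafka, Zookeeper) might look roughly like the sketch below. Service names, images, versions, and ports here are illustrative assumptions, not the actual file from PR #858.

```yaml
# Illustrative sketch only -- not the contents of PR #858.
version: '3'
services:
  zookeeper:
    image: zookeeper:3.4
  kafka:
    image: wurstmeister/kafka
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    depends_on:
      - zookeeper
  elasticsearch:
    image: elasticsearch:5.6
    ports:
      - "9200:9200"
  metron-rest:
    build: ./metron-rest        # Dockerfile contributed by the PR
    depends_on:
      - kafka
      - elasticsearch
```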


Re: [DISCUSS] Community Meetings

2017-12-14 Thread Otto Fowler
Ok,

So we will be concerned with two types of meetings.  I’ll take
responsibility for calling the meetings and ‘moderation’.

Dev meetings
 - feedback on how things are going overall
 - discussions on specific technical problems
 - discussion of possible improvements

User meetings
 - demos
 - user content ( how I’m using metron )
 - some unavoidable discussion on problems
 - some requirements gathering triage

ALL
 - I will call
 - I will gather input for agenda
 - I will distribute the agenda
 - I will distribute the notes to the list and on confluence
 - No decisions will be made, only discussed and then put to list
 - besides general notes, breakout messages for topical discussion or
decisions



How does that sound?


On December 13, 2017 at 16:41:29, Otto Fowler (ottobackwa...@gmail.com)
wrote:

+1


On December 13, 2017 at 16:39:52, James Sirota (jsir...@apache.org) wrote:

I can set up a dedicated Zoom room with a recurrent meeting and give PMC
members rights to the room. I think hosting these meetings should not be a
problem. I would vote not to record them, but rather provide the notes
after the meeting. It's a lot easier to skim through the notes than jump
around in a recording. As Simon mentioned, I would also make it explicitly
clear that the meetings are dev meetings. These are not user Q&A and are
not meant to be overviews of how different features of Metron work. If we
want to do feature demos or provide user content I would want that to be in
its own separate meeting.

Thanks,
James

13.12.2017, 05:00, "Otto Fowler" :
> I am ok with just notes and no recording.
>
> On December 13, 2017 at 04:37:20, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> Good points Larry, we would need to get consent from everyone on the call
> to record to properly comply with regulations in some countries. We would
> definitely need someone to step up as note taker.
>
> Something else to think about is intended audience. Previously we’ve had
> meeting like this which have been very detailed Dev@ focussed (which is a
> great thing) but have rather alienated participants in User@ land. We need
> to make it clear what level we’re talking about to be inclusive.
>
> Simon
>
>>  On 13 Dec 2017, at 00:44, larry mccay  wrote:
>>
>>  Not sure about posting the recordings - you will need to check and make
>>  sure that doesn't violate anything.
>>
>>  Just a friendly reminder...
>>  It is important that meetings have notes and a summary that is sent out
>>  describing topics to be decided on the mailing list.
>>  No decisions can be made in the community meeting itself - this gives
>>  others in other timezones and commitments review and voice in the
>
> decisions.
>>  If it didn't happen on the mailing lists then it didn't happen. :)
>>
>>  On Tue, Dec 12, 2017 at 1:39 PM, Simon Elliston Ball <
>>  si...@simonellistonball.com> wrote:
>>
>>>  Yes, I do.
>>>
>>>  I suspect the best bet will be to post recordings somewhere on the
>>>  apache.org <http://apache.org/> metron site.
>>>
>>>  Simon
>>>
>>>>  On 12 Dec 2017, at 18:36, Otto Fowler  wrote:
>>>>
>>>>  Excellent, do you have the > 40 min + record option?
>>>>
>>>>  On December 12, 2017 at 13:19:55, Simon Elliston Ball (
>>>>  si...@simonellistonball.com) wrote:
>>>>
>>>>  Happy to volunteer a zoom room. That seems to have worked for most in
>
> the
>>>>  past.
>>>>
>>>>  Simon
>>>>
>>>>>  On 12 Dec 2017, at 18:09, Otto Fowler 
wrote:
>>>>>
>>>>>  Thanks! I think I’d like something hosted though.
>>>>>
>>>>>  On December 12, 2017 at 11:18:52, Ahmed Shah (
>>>  ahmeds...@cmail.carleton.ca)
>>>>>  wrote:
>>>>>
>>>>>  Hello,
>>>>>
>>>>>  wrt "- How are we going to host it"...
>>>>>
>>>>>  I've used BigBlueButton as an end user at our University.
>>>>>
>>>>>  It is LGPL open source.
>>>>>
>>>>>  https://bigbluebutton.org/
>>>>>  https://bigbluebutton.org/developers/
>>>>>
>>>>>  -Ahmed
>>>>>
>>>>>  ___
>>>>>  Ahmed Shah (PMP, M. Eng.)
>>>>>  Cybersecurity Analyst & Developer
>>>>>  GCR - Cybersecurity Operations Center
>>>>>  Carleton University - cugcr.com<https://cugcr.com/tiki/lce/index.php>

Re: [DISCUSS] Stellar Documentation Autogeneration

2017-12-14 Thread Otto Fowler
I think this is a great idea, and I looked at the POC and it isn’t as bad
as you make it out to be ;)

What I would like to see is documentation for Stellar functions, generated
by namespace. I would also
like the capability to document at the namespace level.

Often we have namespace level concepts that don’t fit into any given
function’s documentation.
Setting aside the how of the namespace documentation for a moment, based on
the POC I would
suggest that we

* find all namespaces
* create a page per namespace
* document each function in its namespace’s page
* include the namespace doc in that page
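
The steps above could be sketched as follows. The function metadata, namespace docs, and page format here are illustrative assumptions standing in for what the @Stellar annotations would actually provide.

```python
from collections import OrderedDict

# Hypothetical metadata, standing in for the fields exposed by the
# @Stellar annotation (namespace, name, description, params, returns).
FUNCTIONS = [
    {"namespace": "STRING", "name": "TO_UPPER",
     "description": "Converts a string to uppercase.",
     "params": ["input - the string to convert"],
     "returns": "the uppercase string"},
    {"namespace": "MATH", "name": "ABS",
     "description": "Returns the absolute value.",
     "params": ["number - the input number"],
     "returns": "the absolute value"},
]

# Namespace-level concepts that don't fit any single function's doc.
NAMESPACE_DOCS = {"STRING": "Functions for string manipulation."}


def render_pages(functions, namespace_docs):
    """Return {namespace: markdown_page}, one page per namespace."""
    by_ns = OrderedDict()
    for fn in sorted(functions, key=lambda f: (f["namespace"], f["name"])):
        by_ns.setdefault(fn["namespace"], []).append(fn)
    pages = {}
    for ns, fns in by_ns.items():
        lines = ["# %s" % ns]
        if ns in namespace_docs:            # include the namespace doc
            lines += ["", namespace_docs[ns]]
        for fn in fns:
            lines += ["", "## %s_%s" % (ns, fn["name"]), "",
                      fn["description"], ""]
            lines += ["* %s" % p for p in fn["params"]]
            lines += ["", "Returns: %s" % fn["returns"]]
        pages[ns] = "\n".join(lines)
    return pages
```

A doclet or annotation processor would build the `FUNCTIONS` list by scanning the codebase; the rendering step stays the same either way.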

Each module that exports Stellar functions should have its own
documentation.  As part of breaking Stellar out to its own module
we should remove Stellar documentation from stellar common that applies to
functions outside that module.



On December 14, 2017 at 14:32:56, Justin Leet (justinjl...@gmail.com) wrote:

I think it would be valuable to have the documentation around Stellar being
autogenerated. We have most of the info we'd want in the @Stellar
annotation, and ideally, we could just pull this info out and produce some
docs similar to what we already manually maintain. This came up a bit in
the context of https://issues.apache.org/jira/browse/METRON-1361

I put together a super, super (super!) rough POC of using the approach of
Javadoc-style doclet processing that reads the annotations and kicks out
something pretty close to the current docs (without any fancy stuff like
the table of contents and so on).

Right now, there'd be a good deal more to do to make it usable. Off
the top of my head, the main things I wanted to look at before really even
taking an actual stab at it are

1) abstracting out the markdown formatting from the annotation parsing
2) Making sure we can integrate this approach without breaking current
Javadocs
3) Managing things across projects (since we put in Stellar functions all
over).
4) Slightly more thought about how we'd manage it.

Otto's alluded to having a couple thoughts, and I'm more than happy to get
a better idea of what we want the end state to look like (either this or
something else, e.g. an annotation processor during compile phase or if
someone knows a tool that takes care of this sort of thing.)

Any thoughts?


Re: [DISCUSS] Community Meetings

2017-12-14 Thread Otto Fowler
Excellent Ahmed, that is just the kind of thing that I would think the
community would like to see.


On December 14, 2017 at 14:56:07, Ahmed Shah (ahmeds...@cmail.carleton.ca)
wrote:

Hello,


For the user meeting we (GCR) could volunteer demoing our dashboard (if
screen sharing is possible) and let everyone one know how we use Metron.


Our project is here:

https://github.com/LTW-GCR-CSOC/csoc-installation-scripts/


-Ahmed
___
Ahmed Shah (PMP, M. Eng.)
Cybersecurity Analyst & Developer
GCR - Cybersecurity Operations Center
Carleton University - cugcr.com<https://cugcr.com/tiki/lce/index.php>



From: Laurens Vets 
Sent: December 14, 2017 11:24 AM
To: dev@metron.apache.org
Cc: James Sirota
Subject: Re: [DISCUSS] Community Meetings


Sounds good to me :)

On 2017-12-14 05:59, Otto Fowler wrote:
> Ok,
>
> So we will be concerned with two types of meetings. I’ll take
> responsibility for calling the meetings and ‘moderation’.
>
> Dev meetings
> - feedback on how things are going overall
> - discussions on specific technical problems
> - discussion of possible improvements
>
> User meetings
> - demos
> - user content ( how I’m using metron )
> - some unavoidable discussion on problems
> - some requirements gathering triage
>
> ALL
> - I will call
> - I will gather input for agenda
> - I will distribute the agenda
> - I will distribute the notes to the list and on confluence
> - No decisions will be made, only discussed and then put to list
> - besides general notes, breakout messages for topical discussion or
> decisions
>
>
>
> How does that sound?
>
>
> On December 13, 2017 at 16:41:29, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
>
> +1
>
>
> On December 13, 2017 at 16:39:52, James Sirota (jsir...@apache.org)
> wrote:
>
> I can set up a dedicated Zoom room with a recurrent meeting and give
> PMC
> members rights to the room. I think hosting these meetings should not
> be a
> problem. I would vote not to record them, but rather provide the notes
> after the meeting. It's a lot easier to skim through the notes than
> jump
> around in a recording. As Simon mentioned, I would also make it
> explicitly
> clear that the meetings are dev meetings. These are not user Q&A and
> are
> not meant to be overviews of how different features of Metron work. If
> we
> want to do feature demos or provide user content I would want that to
> be in
> its own separate meeting.
>
> Thanks,
> James
>
> 13.12.2017, 05:00, "Otto Fowler" :
>> I am ok with just notes and no recording.
>>
>> On December 13, 2017 at 04:37:20, Simon Elliston Ball (
>> si...@simonellistonball.com) wrote:
>>
>> Good points Larry, we would need to get consent from everyone on the
>> call
>> to record to properly comply with regulations in some countries. We
>> would
>> definitely need someone to step up as note taker.
>>
>> Something else to think about is intended audience. Previously we’ve
>> had
>> meeting like this which have been very detailed Dev@ focussed (which
>> is a
>> great thing) but have rather alienated participants in User@ land. We
>> need
>> to make it clear what level we’re talking about to be inclusive.
>>
>> Simon
>>
>>> On 13 Dec 2017, at 00:44, larry mccay  wrote:
>>>
>>> Not sure about posting the recordings - you will need to check and
>>> make
>>> sure that doesn't violate anything.
>>>
>>> Just a friendly reminder...
>>> It is important that meetings have notes and a summary that is sent
>>> out
>>> describing topics to be decided on the mailing list.
>>> No decisions can be made in the community meeting itself - this
>>> gives
>>> others in other timezones and commitments review and voice in the
>>
>> decisions.
>>> If it didn't happen on the mailing lists then it didn't happen. :)
>>>
>>> On Tue, Dec 12, 2017 at 1:39 PM, Simon Elliston Ball <
>>> si...@simonellistonball.com> wrote:
>>>
>>>> Yes, I do.
>>>>
>>>> I suspect the best bet will be to post recordings somewhere on the
>>>> apache.org <http://apache.org/> metron site.
>>>>
>>>> Simon
>>>>
>>>>> On 12 Dec 2017, at 18:36, Otto Fowler 
>>>>> wrote:
>>>>>
>>>>> Excellent, do you have the > 40 min + record option?
>>>>>
>>>>> On December 12, 2017 at 13:19:55, Sim

[DEV COMMUNITY MEETING] Call for Ideas and Schedule

2017-12-14 Thread Otto Fowler
Dev Community Meeting Call

I would like to propose a developer community meeting.

I propose that we set the meeting early next week, and will throw out
Monday, December 18th at 09:30AM PST, 12:30 on the East Coast and 5:30 in
London Towne.

This meeting will be held over a web-ex, the details of which will be
included in the actual meeting notice.

Please reply to this with scheduling concerns and topic suggestions.
Potential Topics

   - Call for reviewers, ideas how to get more involvement, what people can
   do to help
   - Feature branches : we have two now, what are they and how are we going
   to work on them
   - Extension Repository: Default deployment and installation of parsers
   as it relates to ‘777’
   - General ‘777’ discussion

Developer Community Meeting Disclaimers

   - Developer Community meetings are a means for realtime discussion of
   development issues
   - These meetings are not specifically aimed at demonstrations, unless
   one is required or requested as part of such discussion
   - These meetings are geared towards Metron development issues, not user
   issues with deployment or shipped functionality
   - There are *NO* decisions made in these meetings. The mailing list is
   the official communication record of the Apache Metron Project, and as such
   all public decisions are to be made on the list, so as to give the greatest
   opportunity for community involvement.
   - There *ARE* proposals that can be made and discussed in these
   meetings, that will then be discussed on list for decision.
   - Notes will be taken of these meetings, and they will be posted to the
   list
   - There may also be breakout posts to the list per proposal or topic,
   for more detailed discussion


Re: [DISCUSS] Support Ubuntu Installs in the MPack

2017-12-14 Thread Otto Fowler
This sounds awesome.  The Hortonworks article is getting older every day.
This seems like a feature branch candidate.

On December 14, 2017 at 18:22:33, Nick Allen (n...@nickallen.org) wrote:

I've done some work to get the MPack working on Ubuntu. I'd like to get
that work packaged up and contributed back to Apache. I think it would be
genuinely useful to the community.

Here is how I was thinking about tackling that through a series of PRs.

1. Create the DEBs necessary for installing on Ubuntu. See PR #868.

2. Submit 3 or 4 separate PRs that enhance the existing MPack so that it
works on both CentOS and Ubuntu. I honestly am not sure how many will fall
out of the work that I've done, but I will try to chop it up logically so
that it is easy to review.

3. Create a "Full Dev" equivalent for Ubuntu so that we can see the
end-to-end install work for Ubuntu in an automated fashion.


** I do not expect developers to test their PRs on both CentOS and Ubuntu.
I think the existing CentOS "Full Dev" should remain as the gold standard
that we test PRs against. No changes there.

Let me know if you have feedback or thoughts on this.

Chao


Re: [DISCUSS] Support Ubuntu Installs in the MPack

2017-12-15 Thread Otto Fowler
I’m ok if it is not. Suggesting because it is a series of PRs.

The end goal is Ubuntu Ambari + Deb and full-dev-ubuntu right?

On December 15, 2017 at 10:03:23, Nick Allen (n...@nickallen.org) wrote:

> This seems like a feature branch candidate.

Personally, I don't see the need for a feature branch on this one.  It
won't involve big, architectural changes.  The touch points are
constrained.  Everything that we currently have will continue to work as it
always had after each PR.  If you feel strongly the other way, please
provide your reasoning to help me understand.




On Thu, Dec 14, 2017 at 6:28 PM, Otto Fowler 
wrote:

> This sounds awesome.  The Hortonworks article is getting older every day.
> This seems like a feature branch candidate.
>
> On December 14, 2017 at 18:22:33, Nick Allen (n...@nickallen.org) wrote:
>
> I've done some work to get the MPack working on Ubuntu. I'd like to get
> that work packaged up and contributed back to Apache. I think it would be
> genuinely useful to the community.
>
> Here is how I was thinking about tackling that through a series of PRs.
>
> 1. Create the DEBs necessary for installing on Ubuntu. See PR #868.
>
> 2. Submit 3 or 4 separate PRs that enhance the existing MPack so that it
> works on both CentOS and Ubuntu. I honestly am not sure how many will fall
> out of the work that I've done, but I will try to chop it up logically so
> that it is easy to review.
>
> 3. Create a "Full Dev" equivalent for Ubuntu so that we can see the
> end-to-end install work for Ubuntu in an automated fashion.
>
>
> ** I do not expect developers to test their PRs on both CentOS and Ubuntu.
> I think the existing CentOS "Full Dev" should remain as the gold standard
> that we test PRs against. No changes there.
>
> Let me know if you have feedback or thoughts on this.
>
> Chao
>
>


Re: [DISCUSS] Support Ubuntu Installs in the MPack

2017-12-15 Thread Otto Fowler
It would almost seem like this is a contrib or incubating effort then, no?
You didn’t have to write that Ubuntu guide for nothing.

Maybe we should be more explicit in that way with regards to support.
When we have it fully supported it can ‘graduate’ to the main
metron-deployment.



On December 15, 2017 at 10:54:22, Casey Stella (ceste...@gmail.com) wrote:

Nick is right that the ASF does not provide support in an explicit way
(i.e. there are no pathways to get *prioritized* support via SLAs, etc.),
but it is expected that apache projects provide support via mailing lists
and answered by volunteers.  Specifically, this is the crux of the
"community over code" credo.  That philosophical point aside, I think what
Justin may be intending is "support" in the sense of how much do we fold
Ubuntu into our testing cycle.  It could be said that we tacitly "support"
configurations which we test, beyond that caveat emptor.  Which is to say
that questions on the mailing lists for Metron on Centos will likely be
answered whereas Metron on OpenBSD might be met with more skepticism or not
answered.

I would argue that we start with Nick's very generous contribution without
forcing developers to test their code against it.  Eventually, when we have
a full-dev that spins up ubuntu, I'd argue that we could consider folding
it into our testing plans for an RC.

Regarding whether it fits in a feature branch, I think that as long as each
PR stands alone in providing value, we can avoid a feature branch.  It
might be worthwhile constructing a JIRA in apache to capture the follow-on
tasks required to bring Ubuntu into a status where it's more prominent in
our testing cycle.

On Fri, Dec 15, 2017 at 10:45 AM, Nick Allen  wrote:

> > The end goal is Ubuntu Ambari + Deb and full-dev-ubuntu right?
>
> That list sounds good to me.
>
> (Plus, some way of dealing with Justin's point about support.)
>
>
>
> On Fri, Dec 15, 2017 at 10:11 AM Otto Fowler 
> wrote:
>
> > I’m ok if it is not. Suggesting because it is a series of prs.
> >
> > The end goal is Ubuntu Ambari + Deb and full-dev-ubuntu right?
> >
> > On December 15, 2017 at 10:03:23, Nick Allen (n...@nickallen.org) wrote:
> >
> > > This seems like a feature branch candidate.
> >
> > Personally, I don't see the need for a feature branch on this one.  It
> > won't involve big, architectural changes.  The touch points are
> > constrained.  Everything that we currently have will continue to work as
> it
> > always had after each PR.  If you feel strongly the other way, please
> > provide your reasoning to help me understand.
> >
> >
> >
> >
> > On Thu, Dec 14, 2017 at 6:28 PM, Otto Fowler 
> > wrote:
> >
> >> This sounds awesome.  The Hortonworks article is getting older every day.
> >> This seems like a feature branch candidate.
> >>
> >> On December 14, 2017 at 18:22:33, Nick Allen (n...@nickallen.org)
> wrote:
> >>
> >> I've done some work to get the MPack working on Ubuntu. I'd like to get
> >> that work packaged up and contributed back to Apache. I think it would
> be
> >> genuinely useful to the community.
> >>
> >> Here is how I was thinking about tackling that through a series of PRs.
> >>
> >> 1. Create the DEBs necessary for installing on Ubuntu. See PR #868.
> >>
> >> 2. Submit 3 or 4 separate PRs that enhance the existing MPack so that it
> >> works on both CentOS and Ubuntu. I honestly am not sure how many will
> fall
> >> out of the work that I've done, but I will try to chop it up logically
> so
> >> that it is easy to review.
> >>
> >> 3. Create a "Full Dev" equivalent for Ubuntu so that we can see the
> >> end-to-end install work for Ubuntu in an automated fashion.
> >>
> >>
> >> ** I do not expect developers to test their PRs on both CentOS and
> Ubuntu.
> >> I think the existing CentOS "Full Dev" should remain as the gold
> standard
> >> that we test PRs against. No changes there.
> >>
> >> Let me know if you have feedback or thoughts on this.
> >>
> >> Chao
> >>
> >>
> >
>


Re: [DEV COMMUNITY MEETING] Call for Ideas and Schedule

2017-12-15 Thread Otto Fowler
Great guys,

I’m going to leave the call OPEN for ideas, but at this point let’s say we
are going to schedule it for that time.
I will send the announcement when I hear back from James about the room.


On December 15, 2017 at 15:44:59, Michael Miklavcic (
michael.miklav...@gmail.com) wrote:

Sounds good Otto. We probably also want to touch on the ES 5.6 upgrade
along with our current release status and short-term release roadmap that
Nick Allen has been guiding.

On Fri, Dec 15, 2017 at 9:02 AM, Laurens Vets  wrote:

> I'll try to attend :)
>
>
> On 2017-12-14 12:43, Otto Fowler wrote:
>
>> Dev Community Meeting Call
>>
>> I would like to propose a developer community meeting.
>>
>> I propose that we set the meeting early next week, and will throw out
>> Monday, December 18th at 09:30AM PST, 12:30 on the East Coast and 5:30 in
>> London Towne.
>>
>> This meeting will be held over a web-ex, the details of which will be
>> included in the actual meeting notice.
>>
>> Please reply to this with scheduling concerns and topic suggestions.
>> Potential Topics
>>
>>- Call for reviewers, ideas how to get more involvement, what people
>> can
>>do to help
>>- Feature branches : we have two now, what are they and how are we
>> going
>>to work on them
>>- Extension Repository: Default deployment and installation of parsers
>>as it relates to ‘777’
>>- General ‘777’ discussion
>>
>> Developer Community Meeting Disclaimers
>>
>>- Developer Community meetings are a means for realtime discussion of
>>development issues
>>- These meetings are not specifically aimed at demonstrations, unless
>>one is required or requested as part of such discussion
>>- These meetings are geared towards Metron development issues, not user
>>issues with deployment or shipped functionality
>>- There are *NO* decisions made in these meetings. The mailing list is
>>the official communication record of the Apache Metron Project, and as
>> such
>>all public decisions are to be made on the list, as to give the
>> greatest
>>opportunity for community involvement.
>>- There *ARE* proposals that can be made and discussed in these
>>meetings, that will then be discussed on list for decision.
>>- Notes will be taken of these meetings, and they will be posted to the
>>list
>>- There may also be breakout posts to the list per proposal or topic,
>>for more detailed discussion
>>
>


Re: [INTRODUCTIO] Would like to contribute

2017-12-17 Thread Otto Fowler
Hi Pushpitha!

Welcome!

Joining the dev and user mailing lists is a great start.  Also, take a look
through https://metron.apache.org , the community and documentation sites
etc.
I find reading the list archives helpful as well.  You will also find our
freenode IRC channel there.

As to how to contribute, there are many ways to contribute, depending on
your understanding, skills, and interests.

Reviewing PRs and documentation helps even if a person doesn’t code
themselves.  There are newbie JIRA issues in the Apache JIRA as well:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=128&projectKey=METRON&view=planning.nodetail&quickFilter=797&epics=visible,
 although I’m not sure that some haven’t been obsoleted…

There are many ways to get involved, and everyone has probably taken a
different route, but that is ok, there is no ‘one way’ to contribute.





On December 16, 2017 at 00:23:56, Pushpitha Somathilaka (
pushpitha...@cse.mrt.ac.lk) wrote:

Hi all,

I am Pushpitha Dilhan and I am a final year undergraduate majoring in
Computer Science. I would like to contribute to Apache Metron and hope to
have guidance for a start.

Thank you
Pushpitha


Re: [DEV COMMUNITY MEETING] Call for Ideas and Schedule

2017-12-17 Thread Otto Fowler
I am still waiting for info on the zoom room.  I will have an email out by
the end of the day with information, one way or another.



On December 15, 2017 at 18:15:21, Otto Fowler (ottobackwa...@gmail.com)
wrote:

Great guys,

I’m going to leave the call OPEN for ideas, but at this point let’s say we
are going to schedule it for that time.
I will send the announcement when I hear back from James about the room.


On December 15, 2017 at 15:44:59, Michael Miklavcic (
michael.miklav...@gmail.com) wrote:

Sounds good Otto. We probably also want to touch on the ES 5.6 upgrade
along with our current release status and short-term release roadmap that
Nick Allen has been guiding.

On Fri, Dec 15, 2017 at 9:02 AM, Laurens Vets  wrote:

> I'll try to attend :)
>
>
> On 2017-12-14 12:43, Otto Fowler wrote:
>
>> Dev Community Meeting Call
>>
>> I would like to propose a developer community meeting.
>>
>> I propose that we set the meeting early next week, and will throw out
>> Monday, December 18th at 09:30AM PST, 12:30 on the East Coast and 5:30 in
>> London Towne.
>>
>> This meeting will be held over a web-ex, the details of which will be
>> included in the actual meeting notice.
>>
>> Please reply to this with scheduling concerns and topic suggestions.
>> Potential Topics
>>
>>- Call for reviewers, ideas how to get more involvement, what people
>> can
>>do to help
>>- Feature branches : we have two now, what are they and how are we
>> going
>>to work on them
>>- Extension Repository: Default deployment and installation of parsers
>>as it relates to ‘777’
>>- General ‘777’ discussion
>>
>> Developer Community Meeting Disclaimers
>>
>>- Developer Community meetings are a means for realtime discussion of
>>development issues
>>- These meetings are not specifically aimed at demonstrations, unless
>>one is required or requested as part of such discussion
>>- These meetings are geared towards Metron development issues, not user
>>issues with deployment or shipped functionality
>>- There are *NO* decisions made in these meetings. The mailing list is
>>the official communication record of the Apache Metron Project, and as
>> such
>>all public decisions are to be made on the list, as to give the
>> greatest
>>opportunity for community involvement.
>>- There *ARE* proposals that can be made and discussed in these
>>meetings, that will then be discussed on list for decision.
>>- Notes will be taken of these meetings, and they will be posted to the
>>list
>>- There may also be breakout posts to the list per proposal or topic,
>>for more detailed discussion
>>
>


December Developer Community Meeting

2017-12-17 Thread Otto Fowler
The December Community Meeting will be held Monday, December 18th.
These are the topics that are up for discussion

   - Call for reviewers, ideas how to get more involvement, what people can
   do to help (Otto)
   - Feature branches : we have two now, what are they and how are we going
   to work on them (Otto)
   - Release process WRT formalized upgrade and installation instructions
   to be
   included as a part of a release (JZeolla)
   - Any concerns/questions
   with the secondary repo for bro. (JZeolla)
   - ES 5.6 upgrade (michael.miklav...@gmail.com)
   - Release Status(michael.miklav...@gmail.com)
   - Short Term Roadmap(michael.miklav...@gmail.com)


We may only have 40 minutes, so we’ll try to keep things concise, and
follow up with Discuss threads.

*NOTE: IF THE ROOM CHANGES I WILL SEND AN UPDATE*


Topic: Metron Developer Community Meeting
Time: Dec 18, 2017 12:30 PM Eastern Time (US and Canada)

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/4534152036

Or iPhone one-tap :
US: +16468769923,,4534152036#  or +16699006833,,4534152036#
Or Telephone:
Dial(for higher quality, dial a number based on your current location):
US: +1 646 876 9923  or +1 669 900 6833
Meeting ID: 453 415 2036
International numbers available:
https://zoom.us/zoomconference?m=iwzFkc-YD_msf1cfRJL21VDYsExP41jo


Developer Community Meeting Disclaimers

   - Developer Community meetings are a means for realtime discussion of
   development issues
   - These meetings are not specifically aimed at demonstrations, unless
   one is required or requested as part of such discussion
   - These meetings are geared towards Metron development issues, not user
   issues with deployment or shipped functionality
   - There are NO decisions made in these meetings. The mailing list is the
   official communication record of the Apache Metron Project, and as such all
   public decisions are to be made on the list, as to give the greatest
   opportunity for community involvement.
   - There ARE proposals that can be made and discussed in these meetings,
   that will then be discussed on list for decision.
   - Notes will be taken of these meetings, and they will be posted to the
   list
   - There may also be breakout posts to the list per proposal or topic,
   for more detailed discussion


Re: UPDATE MEETING December Developer Community Meeting

2017-12-17 Thread Otto Fowler
We will be using this meeting

Topic: Community zoom meeting
Time: this is a recurring meeting Meet anytime

Join from PC, Mac, Linux, iOS or Android:
https://hortonworks.zoom.us/j/658498271

Or join by phone:

+1 669 900 6833  (US Toll) or +1 646 558 8656  (US Toll)
+1 877 853 5247  (US Toll Free)
+1 877 369 0926  (US Toll Free)
Meeting ID: 658 498 271
International numbers available:
https://hortonworks.zoom.us/zoomconference?m=y7M0gPfv8kRv3WvXHjXrpc3n3DyNqTMe


On December 17, 2017 at 13:05:50, Otto Fowler (ottobackwa...@gmail.com)
wrote:

The December Community Meeting will be held Monday, December 18th.
These are the topics that are up for discussion

   - Call for reviewers, ideas how to get more involvement, what people can
   do to help (Otto)
   - Feature branches : we have two now, what are they and how are we going
   to work on them (Otto)
   - Release process WRT formalized upgrade and installation instructions
   to be
   included as a part of a release (JZeolla)
   - Any concerns/questions
   with the secondary repo for bro. (JZeolla)
   - ES 5.6 upgrade (michael.miklav...@gmail.com)
   - Release Status(michael.miklav...@gmail.com)
   - Short Term Roadmap(michael.miklav...@gmail.com)


We may only have 40 minutes, so we’ll try to keep things concise, and
follow up with Discuss threads.

*NOTE: IF THE ROOM CHANGES I WILL SEND AN UPDATE*


Topic: Metron Developer Community Meeting
Time: Dec 18, 2017 12:30 PM Eastern Time (US and Canada)

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/4534152036

Or iPhone one-tap :
US: +16468769923,,4534152036#  or +16699006833,,4534152036#
Or Telephone:
Dial(for higher quality, dial a number based on your current location):
US: +1 646 876 9923  or +1 669 900 6833
Meeting ID: 453 415 2036
International numbers available:
https://zoom.us/zoomconference?m=iwzFkc-YD_msf1cfRJL21VDYsExP41jo


Developer Community Meeting Disclaimers

   - Developer Community meetings are a means for realtime discussion of
   development issues
   - These meetings are not specifically aimed at demonstrations, unless
   one is required or requested as part of such discussion
   - These meetings are geared towards Metron development issues, not user
   issues with deployment or shipped functionality
   - There are NO decisions made in these meetings. The mailing list is the
   official communication record of the Apache Metron Project, and as such all
   public decisions are to be made on the list, as to give the greatest
   opportunity for community involvement.
   - There ARE proposals that can be made and discussed in these meetings,
   that will then be discussed on list for decision.
   - Notes will be taken of these meetings, and they will be posted to the
   list
   - There may also be breakout posts to the list per proposal or topic,
   for more detailed discussion


Re: [DISCUSS] Upcoming Release

2017-12-18 Thread Otto Fowler
Once we have the area, I can do the same for the RC check script


On December 18, 2017 at 11:11:18, Nick Allen (n...@nickallen.org) wrote:

Sure, I can clean up the script a bit and submit a PR for it.

Jon and Otto asked that I open a PR on the script that I use for merging
PRs too.

On Fri, Dec 15, 2017 at 2:30 PM, Matt Foley  wrote:

> Perhaps under “build_utils” we should add a subdirectory for
> “release_utils”.
>
> From: Casey Stella 
> Date: Friday, December 15, 2017 at 10:50 AM
> To: "dev@metron.apache.org" 
> Cc: Matt Foley 
> Subject: Re: [DISCUSS] Upcoming Release
>
> That script seems great, nick! Perhaps we should adjust the wiki around
> releases to point to it? Thoughts?
>
> On Fri, Dec 15, 2017 at 1:47 PM, Nick Allen mailto:nic
> k...@nickallen.org>> wrote:
> Thanks, Matt.
>
> Maybe you already have something that does this. I wrote a quick script
> that validates each JIRA since the last release tag to make sure they are
> marked "Done" and with the correct fix version. I would expect that for
> the next release, each JIRA should have status="Done",
fix-version="0.4.2".
>
> Unless I am mistaken, we have quite a few that need to be cleaned up. In the
> following output, any line that has a URL indicates that a fix is needed.
> Or at least, **I think** a fix is needed.
>
> To the community: If you see your name with a URL next to it, it would
> be great if you could follow that link and fix the JIRA. Otherwise, I will
> volunteer to help clean some of these up should some not get addressed.
>
>
> *$ ./validate-jira-for-release*
> *Cloning into 'metron-0.4.2'...*
> *remote: Counting objects: 35046, done.*
> *remote: Compressing objects: 100% (13698/13698), done.*
> *remote: Total 35046 (delta 15708), reused 31645 (delta 12822)*
> *Receiving objects: 100% (35046/35046), 53.05 MiB | 6.48 MiB/s, done.*
> *Resolving deltas: 100% (15708/15708), done.*
> *Fetching origin*
> * JIRA STATUS FIX VERSION
> ASSIGNEE FIX*
> * METRON-1345 Done Michael
> Miklavcic https://issues.apache.org/jira/browse/METRON-1345
> <https://issues.apache.org/jira/browse/METRON-1345>*
> * METRON-1349 Done Next + 1 Nick
> Allen https://issues.apache.org/jira/browse/METRON-1349
> <https://issues.apache.org/jira/browse/METRON-1349>*
> * METRON-1343 Done
> Mohan https://issues.apache.org/jira/browse/METRON-1343
> <https://issues.apache.org/jira/browse/METRON-1343>*
> * METRON-1306 To Do
> Unassigned https://issues.apache.org/jira/browse/METRON-1306
> <https://issues.apache.org/jira/browse/METRON-1306>*
> * METRON-1341 Done Simon Elliston
> Ball https://issues.apache.org/jira/browse/METRON-1341
> <https://issues.apache.org/jira/browse/METRON-1341>*
> * METRON-1313 Done Jon
> Zeolla https://issues.apache.org/jira/browse/METRON-1313
> <https://issues.apache.org/jira/browse/METRON-1313>*
> * METRON-1346 Done Otto
> Fowler https://issues.apache.org/jira/browse/METRON-1346
> <https://issues.apache.org/jira/browse/METRON-1346>*
> * METRON-1336 Done 0.4.2 Nick
> Allen*
> * METRON-1335 Done Anand
> Subramanian https://issues.apache.org/jira/browse/METRON-1335
> <https://issues.apache.org/jira/browse/METRON-1335>*
> * METRON-1308 Done Jon
> Zeolla https://issues.apache.org/jira/browse/METRON-1308
> <https://issues.apache.org/jira/browse/METRON-1308>*
> * METRON-1338 Done 0.4.2 Nick
> Allen*
> * METRON-1286 To Do 0.4.2
> Unassigned https://issues.apache.org/jira/browse/METRON-1286
> <https://issues.apache.org/jira/browse/METRON-1286>*
> * METRON-1334 Done 0.4.2 Nick
> Allen*
> * METRON-1277 Done Otto
> Fowler https://issues.apache.org/jira/browse/METRON-1277
> <https://issues.apache.org/jira/browse/METRON-1277>*
> * METRON-1239 To Do
> Unassigned https://issues.apache.org/jira/browse/METRON-1239
> <https://issues.apache.org/jira/browse/METRON-1239>*
> * METRON-1328 Done Anand
> Subramanian https://issues.apache.org/jira/browse/METRON-1328
> <https://issues.apache.org/jira/browse/METRON-1328>*
> * METRON-1333 Done Otto
> Fowler https://issues.apache.org/jira/browse/METRON-1333
> <https://issues.apache.org/jira/browse/METRON-1333>*
> * METRON-1252 Done
> RaghuMitra https://issues.apache.org/jira/browse/METRON-1252
> <https://issues.apache.org/jira/browse/METRON-1252>*
> * METRON-1316 To Do Next + 1
> Unassigned https://issues.apache.org/jira/browse/METRON-1316
> <https://issues.apache.org/jira/browse/METRON-1316>*
> * METRON-1088 Done Jon
> Zeolla https://issues.apache.org/jira/browse/METRON-1088
> <https://issues.apache.org/jira/browse/METRON-1088>*
> * METRON-1319 To Do Ryan
> Merriman 
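The per-JIRA check the script applies can be sketched as a pure function over
issue records. Note that the field names, the expected version constant, and the
sample data below are illustrative assumptions for this sketch, not the JIRA
REST API's real schema:

```python
EXPECTED_FIX_VERSION = "0.4.2"  # the upcoming release

def needs_fix(issue, expected_version=EXPECTED_FIX_VERSION):
    """True if a JIRA committed since the last release tag is not
    marked Done with the expected fix version."""
    return (issue.get("status") != "Done"
            or issue.get("fix_version") != expected_version)

def report(issues, expected_version=EXPECTED_FIX_VERSION):
    """List the JIRA keys whose status or fix version still needs fixing."""
    return [i["key"] for i in issues if needs_fix(i, expected_version)]

# Hypothetical sample of commits found since the last release tag.
commits = [
    {"key": "METRON-1336", "status": "Done", "fix_version": "0.4.2"},
    {"key": "METRON-1345", "status": "Done", "fix_version": None},
    {"key": "METRON-1306", "status": "To Do", "fix_version": None},
]
# report(commits) -> ["METRON-1345", "METRON-1306"]
```

In the real script the issue records would come from the JIRA REST API for each
commit found in `git log` since the release tag; only the decision logic is
shown here.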

Re: UPDATE MEETING December Developer Community Meeting

2017-12-18 Thread Otto Fowler
13 minute warning


On December 17, 2017 at 15:56:21, Nadir Hajiyani (nadir.hajiy...@gmail.com)
wrote:

Aah, just noticed it in the middle of the email - 12.30 pm EST - hit send
too early.

Thanks.

On Sun, Dec 17, 2017 at 2:54 PM, Nadir Hajiyani 
wrote:

> Hi,
> What time is this meeting at?
>
> Thanks.
>
> On Sun, Dec 17, 2017 at 1:32 PM, Otto Fowler 
> wrote:
>
>> We will be using this meeting
>>
>> Topic: Community zoom meeting
>> Time: this is a recurring meeting Meet anytime
>>
>> Join from PC, Mac, Linux, iOS or Android:
>> https://hortonworks.zoom.us/j/658498271
>>
>> Or join by phone:
>>
>> +1 669 900 6833  (US Toll) or +1 646 558 8656  (US Toll)
>> +1 877 853 5247  (US Toll Free)
>> +1 877 369 0926  (US Toll Free)
>> Meeting ID: 658 498 271
>> International numbers available:
>> https://hortonworks.zoom.us/zoomconference?m=y7M0gPfv8kRv3Wv
>> XHjXrpc3n3DyNqTMe
>>
>>
>> On December 17, 2017 at 13:05:50, Otto Fowler (ottobackwa...@gmail.com)
>> wrote:
>>
>> The December Community Meeting will be held Monday, December 18th.
>> These are the topics that are up for discussion
>>
>> - Call for reviewers, ideas how to get more involvement, what people
>> can
>> do to help (Otto)
>> - Feature branches : we have two now, what are they and how are we
>> going
>> to work on them (Otto)
>> - Release process WRT formalized upgrade and installation instructions
>> to be
>> included as a part of a release (JZeolla)
>> - Any concerns/questions
>> with the secondary repo for bro. (JZeolla)
>> - ES 5.6 upgrade (michael.miklav...@gmail.com)
>> - Release Status(michael.miklav...@gmail.com)
>> - Short Term Roadmap(michael.miklav...@gmail.com)
>>
>>
>> We may only have 40 minutes, so we’ll try to keep things concise, and
>> follow up with Discuss threads.
>>
>> *NOTE: IF THE ROOM CHANGES I WILL SEND AN UPDATE*
>>
>>
>> Topic: Metron Developer Community Meeting
>> Time: Dec 18, 2017 12:30 PM Eastern Time (US and Canada)
>>
>> Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/4534152036
>>
>> Or iPhone one-tap :
>> US: +16468769923,,4534152036# or +16699006833,,4534152036#
>> Or Telephone:
>> Dial(for higher quality, dial a number based on your current
>> location):
>> US: +1 646 876 9923 or +1 669 900 6833
>> Meeting ID: 453 415 2036
>> International numbers available:
>> https://zoom.us/zoomconference?m=iwzFkc-YD_msf1cfRJL21VDYsExP41jo
>>
>>
>> Developer Community Meeting Disclaimers
>>
>> - Developer Community meetings are a means for realtime discussion of
>> development issues
>> - These meetings are not specifically aimed at demonstrations, unless
>> one is required or requested as part of such discussion
>> - These meetings are geared towards Metron development issues, not user
>> issues with deployment or shipped functionality
>> - There are NO decisions made in these meetings. The mailing list is
>> the
>> official communication record of the Apache Metron Project, and as
>> such all
>> public decisions are to be made on the list, as to give the greatest
>> opportunity for community involvement.
>> - There ARE proposals that can be made and discussed in these meetings,
>> that will then be discussed on list for decision.
>> - Notes will be taken of these meetings, and they will be posted to the
>> list
>> - There may also be breakout posts to the list per proposal or topic,
>> for more detailed discussion
>>
>
>
>
> --
> Regards,
> Nadir Hajiyani
>



-- 
Regards,
Nadir Hajiyani


[MEETING NOTES] 12/18/17 Developer Community Meeting

2017-12-18 Thread Otto Fowler
2017-12-18 Dev Community Meeting

Agenda

   -

   Call for reviewers, ideas how to get more involvement, what people can
   do to help (Otto Fowler)
   -

   Feature branches : we have two now, what are they and how are we going
   to work on them (Otto Fowler)
   -

   ES 5.6 upgrade (Michael Miklavcic)
   -

   Release Status (Michael Miklavcic)
   -

   Short Term Roadmap (Michael Miklavcic)
   -

   Release process WRT formalized upgrade and installation instructions to
   be included as a part of a release (Jon Zeolla)
   -

   Any concerns/questions with the secondary repo for bro. (Jon Zeolla)


Attendees

Jon Zeolla, James Sirota, Matt Foley, Otto Fowler, Hakan Akansel, Justin
Leet, Michael Miklavcic, Nick Allen, Ryan Merriman, Laurens Vets

Discussion

Call for reviewers, ideas how to get more involvement, what people can do
to help (Otto)

   -

   Nick agrees, suggests potentially simplifying the review process.
   -

   Certain stellar functions that implement certain algos are difficult to
   review properly and rely heavily on the initial implementer.
   -

   Michael suggests heavy focus/scrutiny of the testing/documentation of PRs
   -

   Otto suggests that the bar to spin up Metron may be too high, and that we
   could simplify the full-dev/PoC spin up.  Justin agrees.
   -

   Three suggested DISCUSS threads
   -

  What’s a better way for us to document reviews and contributions in
  Metron?
  -

 How to overcome developer inertia for spinning up new envs such as
 testing ansible changes or similar
 -

  How can we lower the barrier to entry for new users to Metron?
  -

   Need to keep multiple PRs and feature branches top of mind to simplify
   review.


Feature Branches (Otto)

   -

   METRON-777
   
<https://issues.apache.org/jira/projects/METRON/issues/METRON-777?filter=allopenissues>
   and METRON-1344
   
<https://issues.apache.org/jira/projects/METRON/issues/METRON-1344?filter=allissues>
   -

   Are we all comfortable with how we use FBs?
   -

  Ryan, how do we manage follow-on PRs for a FB?
  -

  Try to avoid bugfixes that would be useful to master being put solely
  in a FB.
  -

  All FB processes should align with our policies to commit/review
  against master, with a slightly higher tolerance for instability.
  -

 I.e. Some interim steps may create regressions, but we should
 consider being comfortable with this in order to simplify review
 -

  Still feeling this out, maybe a future DISCUSS on how to determine if
  you should be creating a FB.
  -

  Consider FB-specific documentation to identify what/where/why/how.
  Otto has an example here
  
<https://cwiki.apache.org/confluence/display/METRON/Metron+Extension+System+and+Parser+Extensions>
  .


ES 5.6 upgrade (Michael)

   -

   Michael:  Should be ready for review, looking for testing, etc.  Could
   use help with a multinode instance, performance testing, etc..
   METRON-939
   
<https://issues.apache.org/jira/projects/METRON/issues/METRON-939?filter=allopenissues>
   (#840 <https://github.com/apache/metron/pull/840>)
   -

   Otto:  Unclear on what versions of ES Metron should run on.
   -

  Michael:  Looking to support only ES 5.6.2, unable to currently
  support multiple versions of ES due to the complexity/testing reqs.
  -

   Otto:  Unclear on status of #619
   <https://github.com/apache/metron/pull/619>.
   -

  Michael:  This is a subset of the Xpack work, and that xpack support
  is currently planned to be a follow-on.  Still using the
transport client
  from ES under the hood, which is not recommended (should move to the REST
  API client).
  -

  Otto:  We should keep the people who put together #619
  <https://github.com/apache/metron/pull/619> involved (i.e. understand
  their wants and needs) with the more recent ES 5 changes, and
any follow-on
  PRs.


Release Status and Short Term Roadmap (Michael)

   -

   Matt:  Looking to do in the near term, RC2.
   -

   Otto:  Should we have a skip one branch for bigger changes, instead of
   cherry-picking?  May help get larger changes into a release.


Release process WRT formalized upgrade and installation instructions to be
included as a part of a release (Jon)

   -

   Justin and Jon think that we need to improve our process; Upgrading.md,
   an install guide, and upgrading guides should be included as part of a
   release
   -

   Discuss thread on How to get Upgrade testing feasible or better
   technically


Any concerns/questions with the secondary repo for bro (Jon)

   -

   Jon:  Looking to receive feedback on the split, address any concerns,
   etc.
   -

   Nick:  We can probably plan to continue to align the release process
   with metron, can revisit if this becomes an issue.
   -

   Otto:  Until metron and the bro plugin are managed separately (i.e. if
   y

Re: [MEETING NOTES] 12/18/17 Developer Community Meeting

2017-12-18 Thread Otto Fowler
Please note:

1. Feel free to comment on anything here on the list
2. It is not Jon Zeolla’s fault that the formatting is wrong


On December 18, 2017 at 14:52:36, Otto Fowler (ottobackwa...@gmail.com)
wrote:

2017-12-18 Dev Community Meeting

Agenda

   -

   Call for reviewers, ideas how to get more involvement, what people can
   do to help (Otto Fowler)
   -

   Feature branches : we have two now, what are they and how are we going
   to work on them (Otto Fowler)
   -

   ES 5.6 upgrade (Michael Miklavcic)
   -

   Release Status (Michael Miklavcic)
   -

   Short Term Roadmap (Michael Miklavcic)
   -

   Release process WRT formalized upgrade and installation instructions to
   be included as a part of a release (Jon Zeolla)
   -

   Any concerns/questions with the secondary repo for bro. (Jon Zeolla)


Attendees

Jon Zeolla, James Sirota, Matt Foley, Otto Fowler, Hakan Akansel, Justin
Leet, Michael Miklavcic, Nick Allen, Ryan Merriman, Laurens Vets

Discussion

Call for reviewers, ideas how to get more involvement, what people can do
to help (Otto)

   -

   Nick agrees, suggests potentially simplifying the review process.
   -

   Certain stellar functions that implement certain algos are difficult to
   review properly and rely heavily on the initial implementer.
   -

   Michael suggests heavy focus/scrutiny of the testing/documentation of PRs
   -

   Otto suggests that the bar to spin up Metron may be too high, and that we
   could simplify the full-dev/PoC spin up.  Justin agrees.
   -

   Three suggested DISCUSS threads
   -
  -

  What’s a better way for us to document reviews and contributions in
  Metron?
  -
 -

 How to overcome developer inertia for spinning up new envs such as
 testing ansible changes or similar
 -

  How can we lower the barrier to entry for new users to Metron?
  -

   Need to keep multiple PRs and feature branches top of mind to simplify
   review.


Feature Branches (Otto)

   -

   METRON-777
   
<https://issues.apache.org/jira/projects/METRON/issues/METRON-777?filter=allopenissues>
   and METRON-1344
   
<https://issues.apache.org/jira/projects/METRON/issues/METRON-1344?filter=allissues>
   -

   Are we all comfortable with how we use FBs?
   -
  -

  Ryan, how do we manage follow-on PRs for a FB?
  -

  Try to avoid bugfixes that would be useful to master being put solely
  in a FB.
  -

  All FB processes should align with our policies to commit/review
  against master, with a slightly higher tolerance for instability.
  -
 -

 I.e. Some interim steps may create regressions, but we should
 consider being comfortable with this in order to simplify review
 -

  Still feeling this out, maybe a future DISCUSS on how to determine if
  you should be creating a FB.
  -

  Consider FB-specific documentation to identify what/where/why/how.
   Otto has an example here
  
<https://cwiki.apache.org/confluence/display/METRON/Metron+Extension+System+and+Parser+Extensions>
  .


ES 5.6 upgrade (Michael)

   -

   Michael:  Should be ready for review, looking for testing, etc.  Could
   use help with a multinode instance, performance testing, etc..
   METRON-939
   
<https://issues.apache.org/jira/projects/METRON/issues/METRON-939?filter=allopenissues>
   (#840 <https://github.com/apache/metron/pull/840>)
   -

   Otto:  Unclear on what versions of ES Metron should run on.
   -
  -

  Michael:  Looking to support only ES 5.6.2, unable to currently support
  multiple versions of ES due to the complexity/testing reqs.
  -

   Otto:  Unclear on status of #619
   <https://github.com/apache/metron/pull/619>.
   -
  -

  Michael:  This is a subset of the Xpack work, and that xpack support
  is currently planned to be a follow-on.  Still using the
transport client
  from ES under the hood, which is not recommended (should move to the REST
  API client).
  -

  Otto:  We should keep the people who put together #619
  <https://github.com/apache/metron/pull/619> involved (i.e. understand
  their wants and needs) with the more recent ES 5 changes, and
any follow-on
  PRs.


Release Status and Short Term Roadmap (Michael)

   -

   Matt:  Looking to do in the near term, RC2.
   -

   Otto:  Should we have a skip one branch for bigger changes, instead of
   cherry-picking?  May help get larger changes into a release.


Release process WRT formalized upgrade and installation instructions to be
included as a part of a release (Jon)

   -

   Justin and Jon think that we need to improve our process; Upgrading.md,
   an install guide, and upgrading guides should be included as part of a
   release
   -

   Discuss thread on How to get Upgrade testing feasible or better
   technically


Any concerns/questions with the secondary repo for bro (Jon)

   -

   Jon:  Looking to receive f

Re: [DISCUSS] Overcoming developer inertia when spinning up new environments

2017-12-18 Thread Otto Fowler
A bus factor of > 1?

One main requirement would be that the implementation of the deployment has
to be done and documented
in such a way that it is maintainable.  I *think* if this was done in
chunks, under review, and better yet with more collaboration, that would be
possible.

Another would be some kind of CI testing for regression.


On December 18, 2017 at 18:46:10, Ryan Merriman (merrim...@gmail.com) wrote:

I want to revisit the idea of providing an alternative container-based
approach (Docker, Kubernetes, etc) to spinning up Metron that is faster and
uses fewer resources (a "Metron light"). This would provide a way for
reviewers to more quickly review and test out changes. Full dev with
ansible will still serve its purpose; this would just be another tool for
cases where full dev is not the best fit.

This would be a new non-trivial module that will need to be maintained.
There have been discussions in the past that resulted in the community not
wanting to maintain another installation path. However it has been a while
since we had those discussions and Metron is now more mature. We would
also be able to leverage the work already being done in
https://issues.apache.org/jira/browse/METRON-1352 to unify the integration
testing infrastructure.

There are other potential use cases for this too. It could be expanded to
provide a demo environment for exploring the UIs and Metron API. Providing
container support for Metron could also be the beginning of a broader cloud
deployment strategy.

Is this something we want to explore? What would the requirements be?


Re: [VOTE] Metron Release Candidate 0.4.2-RC2

2017-12-19 Thread Otto Fowler
Ran ->
https://github.com/ottobackwards/Metron-and-Nifi-Scripts/blob/master/metron/metron-rc-check

Download -> Fine
Key, signing check -> Fine
Build, all tests, build rpms -> Fine
Full Dev -> deployed
Verified -> Ambari, Storm-ui, Config-UI, Kibana dashboards
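One piece of the key/signature check above, comparing a downloaded artifact
against its published digest, can be sketched as follows. The choice of SHA-512
and the inline sample bytes are assumptions for illustration; the metron-rc-check
script linked above remains the authoritative procedure:

```python
import hashlib

def sha512_matches(data, expected_hex):
    """Compare the SHA-512 digest of an artifact's bytes to the
    published hex digest (case/whitespace tolerant)."""
    return hashlib.sha512(data).hexdigest() == expected_hex.strip().lower()

# In a real check, `artifact` is the bytes of the .tar.gz and
# `published` is read from the accompanying digest file.
artifact = b"example artifact bytes"
published = hashlib.sha512(artifact).hexdigest()
assert sha512_matches(artifact, published)
```

GPG signature verification (`gpg --verify` against the KEYS file) is a separate
step and is not shown here.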


+1 ( binding )


On December 19, 2017 at 06:23:26, Matt Foley (ma...@apache.org) wrote:

Colleagues,
This is a call to vote on releasing Apache Metron 0.4.2 and its associated
metron-bro-plugin-kafka 0.1.0.
The release candidate is available at
https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/

Full list of changes in this release:
https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/CHANGES and
https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/CHANGES.bro-plugin

The github tags to be voted upon are:
(apache/metron) apache-metron-0.4.2-rc2 and
(apache/metron-bro-plugin-kafka) 0.1

The source archives being voted upon can be found here:
https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/apache-metron-0.4.2-rc2.tar.gz
https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/apache-metron-bro-plugin-kafka_0.1.0.tar.gz

The site-book is at:
https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/site-book/index.html

Other release files, signatures and digests can be found here:
https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/

The release artifacts are signed with the following key:
4169 AA27 ECB3 1663 in
https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/KEYS

Please vote on releasing this package as Apache Metron 0.4.2 and Apache
Metron-bro-plugin-kafka 0.1.0

When voting, please list the actions taken to verify the release.

Recommended build validation and verification instructions are posted here:
https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
or you are encouraged to try the new release verification script that Otto
published via email on 11 Dec, available at
https://github.com/ottobackwards/Metron-and-Nifi-Scripts/blob/master/metron/metron-rc-check

This vote will be open until 9am PST on Friday 22 Dec 2017.

Thank you,
--Matt


[DISCUSS] Resources for how to contribute to Apache Metron

2017-12-19 Thread Otto Fowler
Like any project, Apache Metron needs to maintain and grow its contributor
community. We think that we could be doing a better job of this, and would
like to discuss issues and possible improvements. Issues

What are some of the issues that may inhibit people contributing?

   - Barrier of entry (issues getting Metron running in vagrant or local)
   - Documentation : finding current
   - Documentation : content and quality
   - Source Code navigation/documentation/guides
   - Testing guides
   - Use Case Guides
   - Don’t know how they *can* contribute
   - Others that I’m missing?

Remediation

Barrier of entry

How can we make the local deployment workflow easier ( other discuss thread
touches on this)?
Documentation : Finding Current

When people look for Metron info, where are they looking? What comes up in
search?

   - Hortonworks Community forums ( preview release stuff ? ), old blog posts?
   - Mailing list archives?
   - wiki? (not current)
   - site-book?

How can we reduce the out of date information, and make the relevant
information more prominent?
Documentation : Content and Quality

( this is a little bit of a chicken and egg issue, since documentation is a
wonderful way to contribute…. )

   - Up to date architecture documentation
   - Non-developer focused ‘feature’ documentation
   - Developer focused documentation ( ‘how to add XX’ guides )
Source Code Guides

   - Structure of the code tree
   - What is where, how it is logically setup
   - How to maintain consistency when working in the code
   - Javadoc

Testing Guides

   - Tests that we have are buried in PRs
   - No regression tests

Use case guides

   - more how-to guides

Contributing guide

   - right now, have dev env guide
   - review and submit doc changes
   - review PR guide
   - pr testing guide ( better pr testing steps?)

These are things I can think of, anyone have any comment, additions,
priorities?


Re: [DISCUSS] Stellar in a Zeppelin Notebook

2017-12-19 Thread Otto Fowler
The image is stripped for me, can you post it as a link?

This seems like it would look awesome ;)


On December 19, 2017 at 10:03:26, Nick Allen (n...@nickallen.org) wrote:

(1) I love the REPL, but I hate how inaccessible it is.

(2) I love our use cases and examples, but I hate how difficult it is for a
new user to run them.

(3) I love the extensibility of Metron, but I hate looking at JSON.

(4) I love the Profiler, but I hate not being able to *see* my profiles as
plots.

...

Let me introduce, Stellar running in a Zeppelin Notebook.

(1) Access the REPL from any web browser.

(2) Create executable use cases that can be easily shared between users.

(3) Use the simpler management functions to interact with Metron (less
JSON).

(4) Extract your profiles and create a time series plot.



[image: Inline image 1]
The screenshot above is a very lightweight MVP showing that we can run
Stellar from Zeppelin.  I have a lot more work ahead in refactoring the
existing Stellar Shell/REPL functionality so that we get the same
experience in Zeppelin as we get on the command line.


Re: [DISCUSS] Stellar in a Zeppelin Notebook

2017-12-19 Thread Otto Fowler
That looks great!


On December 19, 2017 at 12:34:47, Nick Allen (n...@nickallen.org) wrote:

Ah, dang.  Hopefully this works...

https://www.dropbox.com/s/44qz3518dn4jtzq/Stellar%20in%20a%20Zeppelin%20Notebook.png?dl=0

On Tue, Dec 19, 2017 at 10:23 AM, Otto Fowler 
wrote:

> The image is stripped for me, can you post it as a link?
>
> This seems like it would look awesome ;)
>
>
> On December 19, 2017 at 10:03:26, Nick Allen (n...@nickallen.org) wrote:
>
> (1) I love the REPL, but I hate how inaccessible it is.
>
> (2) I love our use cases
> <https://github.com/apache/metron/tree/master/use-cases/geographic_login_outliers>
>  and
> examples
> <https://github.com/apache/metron/tree/master/metron-analytics/metron-profiler#creating-profiles>,
> but I hate how difficult it is for a new user to run them.
>
> (3) I love the extensibility of Metron, but I hate looking at JSON.
>
> (4) I love the Profiler, but I hate not being able to *see* my profiles as
> plots.
>
> ...
>
> Let me introduce, Stellar running in a Zeppelin Notebook.
>
> (1) Access the REPL from any web browser.
>
> (2) Create executable use cases that can be easily shared between users.
>
> (3) Use the simpler management functions to interact with Metron (less
> JSON).
>
> (4) Extract your profiles and create a time series plot.
>
>
>
> [image: Inline image 1]
> The screenshot above is a very lightweight MVP showing that we can run
> Stellar from Zeppelin.  I have a lot more work ahead in refactoring the
> existing Stellar Shell/REPL functionality so that we get the same
> experience in Zeppelin as we get on the command line.
>
>
>
>
>
>
>
>


Re: Metron nested object

2017-12-21 Thread Otto Fowler
I believe right now you have to flatten.
The jsonMap parser does this.


On December 21, 2017 at 08:28:13, Ali Nazemian (alinazem...@gmail.com)
wrote:

Hi all,


We have recently faced some data sources that generate data in a nested
format. For example, AWS Cloudtrail generates data in the following JSON
format:

{
  "Records": [
    {
      "eventVersion": "2.0",
      "userIdentity": {
        "type": "IAMUser",
        "principalId": "EX_PRINCIPAL_ID",
        "arn": "arn:aws:iam::123456789012:user/Alice",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "accountId": "123456789012",
        "userName": "Alice"
      },
      "eventTime": "2014-03-07T21:22:54Z",
      "eventSource": "ec2.amazonaws.com",
      "eventName": "StartInstances",
      "awsRegion": "us-east-2",
      "sourceIPAddress": "205.251.233.176",
      "userAgent": "ec2-api-tools 1.6.12.2",
      "requestParameters": {
        "instancesSet": {
          "items": [
            {
              "instanceId": "i-ebeaf9e2"
            }
          ]
        }
      },
      "responseElements": {
        "instancesSet": {
          "items": [
            {
              "instanceId": "i-ebeaf9e2",
              "currentState": {
                "code": 0,
                "name": "pending"
              },
              "previousState": {
                "code": 80,
                "name": "stopped"
              }
            }
          ]
        }
      }
    }
  ]
}


We are able to flatten this into a flat JSON document. However, nested
objects are supported by the data backends Metron uses (ES, ORC, etc.), so I
was wondering whether with the current version of Metron we are able to index
nested documents, or whether we have to flatten them first?



Cheers,

Ali
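For reference, the kind of flattening the jsonMap parser performs can be sketched roughly as follows. This is a simplified illustration, not Metron's actual implementation; the dotted-path separator and the index-based handling of lists are assumptions:

```python
import json

def flatten(obj, prefix="", sep="."):
    """Recursively flatten nested dicts/lists into a single-level dict
    keyed by dotted paths, e.g. 'userIdentity.userName'."""
    flat = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            flat.update(flatten(v, f"{prefix}{sep}{k}" if prefix else k, sep))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            key = f"{prefix}{sep}{i}" if prefix else str(i)
            flat.update(flatten(v, key, sep))
    else:
        flat[prefix] = obj
    return flat

# A trimmed-down version of the CloudTrail record above.
record = {
    "eventVersion": "2.0",
    "userIdentity": {"type": "IAMUser", "userName": "Alice"},
    "requestParameters": {
        "instancesSet": {"items": [{"instanceId": "i-ebeaf9e2"}]}
    },
}
# Produces keys such as 'userIdentity.userName' and
# 'requestParameters.instancesSet.items.0.instanceId'.
print(json.dumps(flatten(record), indent=2))
```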


Re: [DISCUSS] Generating and Interacting with serialized summary objects

2017-12-24 Thread Otto Fowler
1st.  You are not the only one looking at the list on Dec 24th Casey, so don’t
feel bad.


2nd.  Maybe we can separate this into 2 areas of concern.

1. Stellar can load objects into ‘caches’ from some repository and refer to
them.
2. The repositories
3. Some number of strategies to populate and possibly update the
repository, from spark,
to MR jobs to whatever you would classify the flat file stuff as.

wait, separate this into 3, 3 areas of concern!

1. Stellar can load objects into ‘caches’ from some repository and refer to
them.
2. The repositories
3. Some number of strategies to populate and possibly update the
repository, from spark,
to MR jobs to whatever you would classify the flat file stuff as.
4. Let the Stellar API for everything but LOAD() follow after we get usage

4!  4 areas of concern..

wait, I’ll write another reply….




(Nobody expected that!)



On December 24, 2017 at 20:47:17, Casey Stella (ceste...@gmail.com) wrote:

Oh, one more thing, while the example here is around typosquatting, this is
of use outside of that. Pretty much any large existence-style query can be
enabled via this construction (create a summary bloom filter). There are
other use-cases involving other data structures too.

On Sun, Dec 24, 2017 at 8:20 PM, Casey Stella  wrote:

> Hi all,
>
> I wanted to get some feedback on a sensible plan for something. It
> occurred to me the other day when considering the use-case of detecting
> typosquatted domains, that one approach was to generate the set of
> typosquatted domains for some set of reference domains and compare
domains
> as they flow through.
>
> One way we could do this would be to generate this data and import the
> typosquatted domains into HBase. I thought, however, that another
approach
> which may trade-off accuracy to remove the network hop and potential disk
> seek by constructing a bloom filter that includes the set of typosquatted
> domains.
>
> The challenge was that we don't have a way to do this currently. We do,
> however, have a loading infrastructure (e.g. the flatfile_loader) and
> configuration (see https://github.com/apache/metron/tree/master/metron-
> platform/metron-data-management#common-extractor-properties) which
> handles:
>
> - parsing flat files
> - transforming the rows
> - filtering the rows
>
> To enable the new use-case of generating a summary object (e.g. a bloom
> filter), in METRON-1378 (https://github.com/apache/metron/pull/879) I
> propose that we create a new utility that uses the same extractor config
> add the ability to:
>
> - initialize a state object
> - update the object for every row
> - merge the state objects (in the case of multiple threads, in the
> case of one thread it's not needed).
>
> I think this is a sensible decision because:
>
> - It's a minimal movement from the flat file loader
> - Uses the same configs
> - Abstracts and reuses the existing infrastructure
> - Having one extractor config means that it should be easier to
> generate a UI around this to simplify the experience
>
> All that being said, our extractor config is..shall we say...daunting :).
> I am sensitive to the fact that this adds to an existing difficult
config.
> I propose that this is an initial step forward to support the use-case
and
> we can enable something more composable going forward. My concern in
> considering this as the first step was that it felt that the composable
> units for data transformation and manipulation suddenly takes us into a
> place where Stellar starts to look like Pig or Spark RDD API. I wasn't
> ready for that without a lot more discussion.
>
> To summarize, what I'd like to get from the community is, after reviewing
> the entire use-case at https://github.com/cestella/incubator-metron/tree/
> typosquat_merge/use-cases/typosquat_detection:
>
> - Is this so confusing that it does not belong in Metron even as a
> first-step?
> - Is there a way to extend the extractor config in a less confusing
> way to enable this?
>
> I apologize for making the discuss thread *after* the JIRAs, but I felt
> this one might bear having some working code to consider.
>


Re: [DISCUSS] Resources for how to contribute to Apache Metron

2018-01-02 Thread Otto Fowler
Bump.  Anyone have any thoughts?


On December 20, 2017 at 10:37:03, Casey Stella (ceste...@gmail.com) wrote:

That's really good feedback, Jon. I agree that we have a significant
barrier to get to the point of tinkering. Full-dev really wasn't intended
to be that entry point; it's more of a way to test PRs in something
resembling a realistic scenario (and it is still not super realistic). I
would welcome creative ideas around how to accomplish that goal.

On Wed, Dec 20, 2017 at 10:15 AM, zeo...@gmail.com 
wrote:

> For nearly everybody I've talked to about this project that had
complaints,
> I've heard something about the significant barrier to entry, divided into
> two general categories. Category 1 is that a lot of security teams lack
> substantial experience with Hadoop and would like to get a better
> understanding of how the involved components fit together - not
> just kafka goes to storm goes to kafka, or a link to the kafka docs for
> details about kafka, but a little bit more detail as to _why_ those
> components are in use in metron, what properties those components possess
> at a high level _which makes them appealing to us_, and how they're
> _currently used_ in the metron environment. Category 2 is that it is
> generally more difficult than it should be to get a testing/poc
environment
> running - running it on a laptop (especially non-macOS) can be a pain to
> get running, some laptops simply cannot run it, etc. I've heard a few
> times that a company uses Azure (not AWS) and they would like to quickly
> spin it up there.
>
> Just my $0.02
>
> Jon
>
> On Tue, Dec 19, 2017 at 9:02 AM Otto Fowler 
> wrote:
>
> > Like any project, Apache Metron needs to maintain and grow it’s
> contributor
> > community. We think that we could be doing a better job of this, and
> would
> > like to discuss issues and possible improvements. Issues
> >
> > What are some of the issues that may inhibit people contributing?
> >
> > - Barrier of entry (issues getting Metron running in vagrant or local)
> > - Documentation : finding current
> > - Documentation : content and quality
> > - Source Code navigation/documentation/guides
> > - Testing guides
> > - Use Case Guides
> > - Don’t know how they *can* contribute
> > - Others that I’m missing?
> >
> > Remediation Barrier of entry
> >
> > How can we make the local deployment workflow easier ( other discuss
> thread
> > touches on this)?
> > Documentation : Finding Current
> >
> > When people look for Metron info, where are they looking? What comes up
> in
> > search? - Hortonworks Community forums ( preview release stuff ? ), old
> > blog posts? - Mailing list archives? - wiki? (not current) - site-book?
> >
> > How can we reduce the out of data information, and make the relevant
> > information more prominent?
> > Documentation : Content and Quality
> >
> > ( this is a little bit of a chicken and egg issue, since documentation
> is a
> > wonderful way to contribute…. ) - Up to data architecture documentation
-
> > Non-developer focused ‘feature’ documentation - Developer focused
> > documentation ( how to add a XX guides )
> > Source Code Guides
> >
> > - Structure of the code tree
> > - What is where, how it is logically setup
> > - How to maintain concistancy when working in the code
> > - Javadoc
> >
> > Testing Guides
> >
> > - Tests that we have are buried in PR’s
> > - No regression tests
> >
> > Use case guides
> >
> > - more how-to guides
> >
> > Contributing guide
> >
> > - right now, have dev env guide
> > - review and submit doc changes
> > - review PR guide
> > - pr testing guide ( better pr testing steps?)
> >
> > These are things I can think of, anyone have any comment, additions,
> > priorities?
> >
> --
>
> Jon
>


Re: [ANNOUNCE] Apache Metron release 0.4.2 and Apache Metron bro plugin for Kafka release 0.1

2018-01-04 Thread Otto Fowler
Thank you Matt, and congratulations everyone!


On January 4, 2018 at 16:11:50, Matt Foley (ma...@apache.org) wrote:

Metron Community: Happy New Year.

I’m happy to announce the release of Metron 0.4.2. A great deal of work
from across the community went into this, with over 100 enhancements,
improvements, and bug fixes since 0.4.1. Thanks to all contributors, and
may all users enjoy the new features!

This release also includes the first official release of the
apache-metron-bro-plugin-kafka, version 0.1.

Details:
The official release source code tarballs may be obtained at any of the
mirrors listed in
http://www.apache.org/dyn/closer.cgi/metron/0.4.2/

As usual, the secure signatures and confirming hashes may be obtained at
https://dist.apache.org/repos/dist/release/metron/0.4.2/

The release branches in github are
https://github.com/apache/metron/tree/Metron_0.4.2 (tag
apache-metron-0.4.2-release)
https://github.com/apache/metron-bro-plugin-kafka/tree/0.1 (tag 0.1)

The release doc book is at http://metron.apache.org/current-book/index.html
The Apache Metron web site at http://metron.apache.org/ has been updated;
please refresh your web browser cache if the new links do not immediately
appear.

Change lists and Release Notes may be obtained at the same locations as the
tarballs.
For your reading pleasure, the change list is appended to this message.

Best regards,
--Matt Foley
release manager

Metron CHANGES (in reverse chron order):
METRON-1373 RAT failure for metron-interface/metron-alerts (mattf-horton)
closes apache/metron#875
METRON-1313 Update metron-deployment to use bro-pkg to install the kafka
plugin (JonZeolla) closes apache/metron#847
METRON-1346 Add new PMC members to web site (ottobackwards) closes
apache/metron#860
METRON-1336 Patching Can Result in Bad Configuration (nickwallen) closes
apache/metron#851
METRON-1335 Install metron-maas-service RPM as a part of the full-dev
deployment (anandsubbu via ottobackwards) closes apache/metron#850
METRON-1308 Fix Metron Documentation (JonZeolla) closes apache/metron#836
METRON-1338 Rat Check Should Ignore Vagrant Retry Files (nickwallen) closes
apache/metron#855
METRON-1286 Add MIN & MAX Stellar functions (jasper-k via justinleet)
closes apache/metron#823
METRON-1334 Add C++11 Compliance Check to platform-info.sh (nickwallen)
closes apache/metron#849
METRON-1277 Add match statement to Stellar language closes
apache/incubator-metron#814
METRON-1239 Drop extra dev environments (nickwallen) closes
apache/metron#852
METRON-1328 Enhance platform-info.sh script to check if docker daemon is
running (anandsubbu via nickwallen) closes apache/metron#846
METRON-1333 Ansible-Docker can no longer build metron (ottobackwards)
closes apache/metron#848
METRON-1252 Build UI for grouping alerts into meta-alerts (iraghumitra via
nickwallen) closes apache/metron#803
METRON-1316 Fastcapa Fails to Compile in Test Environment (nickwallen)
closes apache/metron#841
METRON-1088 Upgrade bro to 2.5.2 (JonZeolla) closes apache/metron#844
METRON-1319 Column Metadata REST service should use default indices on
empty input (merrimanr) closes apache/metron#843
METRON-1321 Metaalert Threat Score Type Does Not Match Sensor Indices
(nickwallen) closes apache/metron#845
METRON-1301 Alerts UI - Sorting on Triage Score Unexpectedly Filters Some
Records (nickwallen) closes apache/metron#832
METRON-1294 IP addresses are not formatted correctly in facet and group
results (merrimanr) closes apache/metron#827
METRON-1291 Kafka produce REST endpoint does not work in a Kerberized
cluster (merrimanr) closes apache/metron#826
METRON-1290 Only first 10 alerts are update when a MetaAlert status is
changed to inactive (justinleet) closes apache/metron#842
METRON-1311 Service Check Should Check Elasticsearch Index Templates
(nickwallen) closes apache/metron#839
METRON-1289 Alert fields are lost when a MetaAlert is created (merrimanr)
closes apache/metron#824
METRON-1309 Change metron-deployment to pull the plugin from
apache/metron-bro-plugin-kafka (JonZeolla) closes apache/metron#837
METRON-1310 Template Delete Action Deletes Search Indices (nickwallen)
closes apache/metron#838
METRON-1275 Fix Metron Documentation closes apache/incubator-metron#833
METRON-1295 Unable to Configure Logging for REST API (nickwallen) closes
apache/metron#828
METRON-1307 Force install of java8 since java9 does not appear to work with
the scripts (brianhurley via ottobackwards) closes apache/metron#835
METRON-1296 Full Dev Fails to Deploy Index Templates (nickwallen via
cestella) closes apache/incubator-metron#829
METRON-1281 Remove hard-coded indices from the Alerts UI (merrimanr) closes
apache/metron#821
METRON-1287 Full Dev Fails When Installing EPEL Repository (nickwallen)
closes apache/metron#820
METRON-1267 Alerts UI returns a 404 when refreshing the alerts-list page
(iraghumitra via merrimanr) closes apache/metron#819
METRON-1283 Install Elasticsearch template as a part of the mpack startup
scripts (anandsubbu vi

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-05 Thread Otto Fowler
If we separate the concerns as I have stated previously:

1. Stellar can load objects into ‘caches’ from some repository and refer to
them.
2. The repositories
3. Some number of strategies to populate and possibly update the
repository, from spark,
to MR jobs to whatever you would classify the flat file stuff as.
4. Let the Stellar API for everything but LOAD() follow after we get usage

Then the particulars of ‘3’ are less important.



On January 5, 2018 at 09:02:41, Justin Leet (justinjl...@gmail.com) wrote:

I agree with the general sentiment that we can tailor specific use cases
via UI, and I'm worried that the use case specific solution (particularly
in light of the note that it's not even general to the class of bloom
filter problems, let alone an actually general problem) becomes more work
than this as soon as about 2 more uses cases actually get realized.
Pushing that to the UI lets people solve a variety of problems if they
really want to dig in, while still giving flexibility to provide a more
tailored experience for what we discover the 80% cases are in practice.

Keeping in mind I am mostly unfamiliar with the extractor config itself, I
am wondering if it makes sense to split up the config a bit. While a lot
of implementation details are shared, maybe the extractor config itself
should be refactored into a couple parts analogous to ETL (as a follow on
task, I think if this is true, it predates Casey's proposed change). It
doesn't necessarily make it less complex, but it might make it more easily
digestible if it's split up by idea (parsing, transformation, etc.).

Re: Mike's point, I don't think we want the actual processing broken up as
ETL, but the representation to the user in terms of configuration could be
similar (Since we're already doing parsing and transformation). We don't
have to implement it as an ETL pipeline, but it does potentially offer the
user a way to quickly grasp what the JSON blob is actually specifying.
Making it easy to understand, even if it's not the ideal way to interact is
potentially still a win.

On Thu, Jan 4, 2018 at 1:28 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I mentioned this earlier, but I'll reiterate that I think this approach
> gives us the ability to make specific use cases via a UI, or other
> interface should we choose to add one, while keeping the core adaptable
and
> flexible. This is ideal for middle tier as I think this effectively gives
> us the ability to pivot to other use cases very easily while not being so
> generic as to be useless. The fact that you were able to create this as
> quickly as you did seems to me directly related to the fact we made the
> decision to keep the loader somewhat flexible rather than very specific.
> The operation ordering and state carry from one phase of processing to
the
> next would simply have been inscrutable, if not impossible, with a CLI
> option-only approach. Sure, it's not as simple as "put infile.txt
> outfile.txt", but the alternatives are not that clear either. One might
> argue we could split up the processing pieces as in traditional Hadoop,
eg
> ETL: Sqoop ingest -> HDFS -> mapreduce, pig, hive, or spark transform.
But
> quite frankly that's going in the *opposite* direction I think we want
> here. That's more complex in terms of moving parts. The config approach
> with pluggable Stellar insulates users from specific implementations, but
> also gives you the ability to pass lower level constructs, eg Spark SQL
or
> HiveQL, should the need arise.
>
> In summary, my impressions are that at this point the features and level
of
> abstraction feel appropriate to me. I think it buys us 1) learning from a
> starting typosquatting use case, 2) flexibility to change and adapt it
> without affecting users, and 3) enough concrete capability to make more
> specific use cases easy to deliver with a UI.
>
> Cheers,
> Mike
>
> On Jan 4, 2018 9:59 AM, "Casey Stella"  wrote:
>
> > It also occurs to me that even in this situation, it's not a sufficient
> > generalization for just Bloom, but this is a bloom filter of the output
> of
> > the all the typosquatted domains for the domain in each row. If we
> wanted
> > to hard code, we'd have to hard code specifically the bloom filter
*for*
> > typosquatting use-case. Hard coding this would prevent things like
bloom
> > filters containing malicious IPs from a reference source, for instance.
> >
> > On Thu, Jan 4, 2018 at 10:46 AM, Casey Stella 
> wrote:
> >
> > > So, there is value outside of just bloom usage. The most specific
> > example
> > > of this would be in order to configure a bloom filter, we need to
know
> at
> > > least an upper bound of the number of items that are going to be
added
> to
> > > the bloom filter. In order to do that, we need to count the number of
> > > typosquatted domains. Specifically at https://github.com/
> > > cestella/incubator-metron/tree/typosquat_merge/use-
> > > cases/typosquat_detection#configure-the-bloom-filt
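The sizing step referenced above follows the standard bloom filter formulas: for n expected items at target false-positive rate p, the bit count is m = -n·ln(p) / (ln 2)² and the hash-function count is k = (m/n)·ln 2. A quick worked example (the item count is a hypothetical figure, not from the use case):

```python
import math

def bloom_size(n, p):
    """Return (bits m, hash functions k) for n items at
    false-positive rate p, per the standard formulas."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k

# Suppose counting the generated typosquatted variants yields
# ~3 million rows and we want a 1-in-a-million false-positive rate.
m, k = bloom_size(3_000_000, 1e-6)
print(f"{m} bits (~{m // 8 // 1024 // 1024} MB), {k} hash functions")
```

This is exactly why the count has to happen before the filter is built: m and k cannot be chosen without an upper bound on n.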

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-05 Thread Otto Fowler
I would imagine the ‘stellar-object-repo’ would be part of the global
configuration or configuration passed to the command.
why specify in the function itself?




On January 5, 2018 at 11:22:32, Casey Stella (ceste...@gmail.com) wrote:

I like that, specifically the repositories abstraction. Perhaps we can
construct some longer term JIRAs for extensions.
For the current state of affairs (wrt the OBJECT_GET call) I was
imagining the simple default HDFS solution as a first cut and
following on adding a repository name (e.g. OBJECT_GET(path, repo_name)
with repo_name being optional and defaulting to HDFS
for backwards compatibility.

In effect, this would be the next step that I'm proposing OBJECT_GET(paths,
repo_name, repo_config) which would be backwards compatible

- paths - a single path or a list of paths (if a list, then a list of
objects returned)
- repo_name - optional name for repo, defaulted to HDFS if we don't
specify
- repo_config - optional config map


This would open things like:

- OBJECT_GET('key', 'HBASE', { 'hbase.table' : 'table', 'hbase.cf' :
'cf'} ) -- pulling from HBase

Eventually we might also be able to fold ENRICHMENT_GET as just a special
repo instance.
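The backwards-compatible dispatch described here could be modeled like this. This is a hypothetical Python sketch of the proposed Stellar signature; the repo names, config keys, and registry mechanism are assumptions, not Metron code:

```python
# Registry of repository backends keyed by name; HDFS stays the
# default so existing one-argument OBJECT_GET calls keep working.
REPOS = {}

def repo(name):
    def register(cls):
        REPOS[name] = cls
        return cls
    return register

@repo("HDFS")
class HdfsRepo:
    def __init__(self, config=None):
        self.config = config or {}
    def get(self, path):
        # Stand-in for reading and deserializing the object from HDFS.
        return f"object from hdfs:{path}"

@repo("HBASE")
class HbaseRepo:
    def __init__(self, config=None):
        self.table = (config or {}).get("hbase.table", "default")
    def get(self, key):
        # Stand-in for an HBase get against the configured table.
        return f"object from hbase:{self.table}/{key}"

def object_get(paths, repo_name="HDFS", repo_config=None):
    """OBJECT_GET(paths, repo_name, repo_config): accepts a single
    path or a list of paths (a list in returns a list out)."""
    backend = REPOS[repo_name](repo_config)
    if isinstance(paths, list):
        return [backend.get(p) for p in paths]
    return backend.get(paths)

# Backwards compatible: one path, default HDFS repo.
print(object_get("/apps/metron/objects/summary.ser"))
# New form: an HBase-backed lookup with a config map.
print(object_get("key", "HBASE", {"hbase.table": "table", "hbase.cf": "cf"}))
```

ENRICHMENT_GET folding in as "just a special repo instance" would then amount to registering one more backend in the same registry.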

On Fri, Jan 5, 2018 at 10:26 AM, Otto Fowler 
wrote:

> If we separate the concerns as I have state previously :
>
> 1. Stellar can load objects into ‘caches’ from some repository and refer
to
> them.
> 2. The repositories
> 3. Some number of strategies to populate and possibly update the
> repository, from spark,
> to MR jobs to whatever you would classify the flat file stuff as.
> 4. Let the Stellar API for everything but LOAD() follow after we get
usage
>
> Then the particulars of ‘3’ are less important.
>
>
>
> On January 5, 2018 at 09:02:41, Justin Leet (justinjl...@gmail.com)
wrote:
>
> I agree with the general sentiment that we can tailor specific use cases
> via UI, and I'm worried that the use case specific solution (particularly
> in light of the note that it's not even general to the class of bloom
> filter problems, let alone an actually general problem) becomes more work
> than this as soon as about 2 more uses cases actually get realized.
> Pushing that to the UI lets people solve a variety of problems if they
> really want to dig in, while still giving flexibility to provide a more
> tailored experience for what we discover the 80% cases are in practice.
>
> Keeping in mind I am mostly unfamiliar with the extractor config itself,
I
> am wondering if it makes sense to split up the config a bit. While a lot
> of implementation details are shared, maybe the extractor config itself
> should be refactored into a couple parts analogous to ETL (as a follow on
> task, I think if this is true, it predates Casey's proposed change). It
> doesn't necessarily make it less complex, but it might make it more
easily
> digestible if it's split up by idea (parsing, transformation, etc.).
>
> Re: Mike's point, I don't think we want the actual processing broken up
as
> ETL, but the representation to the user in terms of configuration could
be
> similar (Since we're already doing parsing and transformation). We don't
> have to implement it as an ETL pipeline, but it does potentially offer
the
> user a way to quickly grasp what the JSON blob is actually specifying.
> Making it easy to understand, even if it's not the ideal way to interact
is
> potentially still a win.
>
> On Thu, Jan 4, 2018 at 1:28 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > I mentioned this earlier, but I'll reiterate that I think this approach
> > gives us the ability to make specific use cases via a UI, or other
> > interface should we choose to add one, while keeping the core adaptable
> and
> > flexible. This is ideal for middle tier as I think this effectively
gives
> > us the ability to pivot to other use cases very easily while not being
so
> > generic as to be useless. The fact that you were able to create this as
> > quickly as you did seems to me directly related to the fact we made the
> > decision to keep the loader somewhat flexible rather than very
specific.
> > The operation ordering and state carry from one phase of processing to
> the
> > next would simply have been inscrutable, if not impossible, with a CLI
> > option-only approach. Sure, it's not as simple as "put infile.txt
> > outfile.txt", but the alternatives are not that clear either. One might
> > argue we could split up the processing pieces as in traditional Hadoop,
> eg
> > ETL: Sqoop ingest -> HDFS -> mapreduce, pig, hive, or spark transform.
> But
> > quite frankly that's goi

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-05 Thread Otto Fowler
I would say that at the stellar author level, you would just get objects
from the store and the ‘override’ case would be a follow on for edge cases.


On January 5, 2018 at 14:29:16, Casey Stella (ceste...@gmail.com) wrote:

Well, you can pull the default configs from global configs, but you might
want to override them (similar to the profiler).  For instance, you might
want to interact with another hbase table than the one globally configured.

On Fri, Jan 5, 2018 at 12:04 PM, Otto Fowler 
wrote:

> I would imagine the ‘stellar-object-repo’ would be part of the global
> configuration or configuration passed to the command.
> why specify in the function itself?
>
>
>
>
> On January 5, 2018 at 11:22:32, Casey Stella (ceste...@gmail.com) wrote:
>
> I like that, specifically the repositories abstraction. Perhaps we can
> construct some longer term JIRAs for extensions.
> For the current state of affairs (wrt to the OBJECT_GET call) I was
> imagining the simple default HDFS solution as a first cut and
> following on adding a repository name (e.g. OBJECT_GET(path, repo_name)
> with repo_name being optional and defaulting to HDFS
> for backwards compatibility.
>
> In effect, this would be the next step that I'm proposing OBJECT_GET(paths,
> repo_name, repo_config) which would be backwards compatible
>
> - paths - a single path or a list of paths (if a list, then a list of
> objects returned)
> - repo_name - optional name for repo, defaulted to HDFS if we don't
> specify
> - repo_config - optional config map
>
>
> This would open things like:
>
> - OBJECT_GET('key', 'HBASE', { 'hbase.table' : 'table', 'hbase.cf' :
> 'cf'} ) -- pulling from HBase
>
> Eventually we might also be able to fold ENRICHMENT_GET as just a special
> repo instance.
>
> On Fri, Jan 5, 2018 at 10:26 AM, Otto Fowler 
> wrote:
>
> > If we separate the concerns as I have state previously :
> >
> > 1. Stellar can load objects into ‘caches’ from some repository and refer
> to
> > them.
> > 2. The repositories
> > 3. Some number of strategies to populate and possibly update the
> > repository, from spark,
> > to MR jobs to whatever you would classify the flat file stuff as.
> > 4. Let the Stellar API for everything but LOAD() follow after we get
> usage
> >
> > Then the particulars of ‘3’ are less important.
> >
> >
> >
> > On January 5, 2018 at 09:02:41, Justin Leet (justinjl...@gmail.com)
> wrote:
> >
> > I agree with the general sentiment that we can tailor specific use cases
> > via UI, and I'm worried that the use case specific solution (particularly
> > in light of the note that it's not even general to the class of bloom
> > filter problems, let alone an actually general problem) becomes more work
> > than this as soon as about 2 more uses cases actually get realized.
> > Pushing that to the UI lets people solve a variety of problems if they
> > really want to dig in, while still giving flexibility to provide a more
> > tailored experience for what we discover the 80% cases are in practice.
> >
> > Keeping in mind I am mostly unfamiliar with the extractor config itself,
> I
> > am wondering if it makes sense to split up the config a bit. While a lot
> > of implementation details are shared, maybe the extractor config itself
> > should be refactored into a couple parts analogous to ETL (as a follow on
> > task, I think if this is true, it predates Casey's proposed change). It
> > doesn't necessarily make it less complex, but it might make it more
> easily
> > digestible if it's split up by idea (parsing, transformation, etc.).
> >
> > Re: Mike's point, I don't think we want the actual processing broken up
> as
> > ETL, but the representation to the user in terms of configuration could
> be
> > similar (Since we're already doing parsing and transformation). We don't
> > have to implement it as an ETL pipeline, but it does potentially offer
> the
> > user a way to quickly grasp what the JSON blob is actually specifying.
> > Making it easy to understand, even if it's not the ideal way to interact
> is
> > potentially still a win.
> >
> > On Thu, Jan 4, 2018 at 1:28 PM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > I mentioned this earlier, but I'll reiterate that I think this approach
> > > gives us the ability to make specific use cases via a UI, or other
> > > interface should we choose to add one, while keeping the core adaptable
> > and
> > > flexi

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-05 Thread Otto Fowler
Yes, abstracted.

We have an API of Stellar functions that just load things from the store;
they don’t need to bleed through what the store is.
We have a ‘store’, which may be HDFS or HBase or whatever.
We have an API for adding to the store (add, etc.) that doesn’t
presume the store either.
Then we can have whatever long or short term, hard-to-configure thing to
push to the store that we can imagine.




On January 5, 2018 at 14:16:52, Michael Miklavcic (
michael.miklav...@gmail.com) wrote:

I'm not sure I follow what you're saying as it pertains to summary objects.
Repository is a loaded term, and I'm very apprehensive of pushing for
something potentially very complex where a simpler solution would suffice
in the short term. To wit, the items I'm seeing in this use case doc -
https://github.com/cestella/incubator-metron/tree/typosquat_merge/use-cases/typosquat_detection
- don't preclude the 4 capabilities you've enumerated. Am I missing
something, or can you provide more context? My best guess is that rather
than referring to a specific HDFS path for a serialized object, you're
suggesting we provide a more abstract method for serializing/deserializing
objects to/from a variety of sources. Am I in the ballpark? I'd be in favor
of expanding functionality for such a thing provided a sensible default (ie
HDFS) is provided in the short-term.

On Fri, Jan 5, 2018 at 8:26 AM, Otto Fowler 
wrote:

> If we separate the concerns as I have state previously :
>
> 1. Stellar can load objects into ‘caches’ from some repository and refer
to
> them.
> 2. The repositories
> 3. Some number of strategies to populate and possibly update the
> repository, from spark,
> to MR jobs to whatever you would classify the flat file stuff as.
> 4. Let the Stellar API for everything but LOAD() follow after we get
usage
>
> Then the particulars of ‘3’ are less important.
>
>
>
> On January 5, 2018 at 09:02:41, Justin Leet (justinjl...@gmail.com)
wrote:
>
> I agree with the general sentiment that we can tailor specific use cases
> via UI, and I'm worried that the use case specific solution (particularly
> in light of the note that it's not even general to the class of bloom
> filter problems, let alone an actually general problem) becomes more work
> than this as soon as about 2 more uses cases actually get realized.
> Pushing that to the UI lets people solve a variety of problems if they
> really want to dig in, while still giving flexibility to provide a more
> tailored experience for what we discover the 80% cases are in practice.
>
> Keeping in mind I am mostly unfamiliar with the extractor config itself,
I
> am wondering if it makes sense to split up the config a bit. While a lot
> of implementation details are shared, maybe the extractor config itself
> should be refactored into a couple parts analogous to ETL (as a follow on
> task, I think if this is true, it predates Casey's proposed change). It
> doesn't necessarily make it less complex, but it might make it more
easily
> digestible if it's split up by idea (parsing, transformation, etc.).
>
> Re: Mike's point, I don't think we want the actual processing broken up
as
> ETL, but the representation to the user in terms of configuration could
be
> similar (Since we're already doing parsing and transformation). We don't
> have to implement it as an ETL pipeline, but it does potentially offer
the
> user a way to quickly grasp what the JSON blob is actually specifying.
> Making it easy to understand, even if it's not the ideal way to interact
is
> potentially still a win.
>
> On Thu, Jan 4, 2018 at 1:28 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > I mentioned this earlier, but I'll reiterate that I think this approach
> > gives us the ability to make specific use cases via a UI, or other
> > interface should we choose to add one, while keeping the core adaptable
> and
> > flexible. This is ideal for middle tier as I think this effectively
gives
> > us the ability to pivot to other use cases very easily while not being
so
> > generic as to be useless. The fact that you were able to create this as
> > quickly as you did seems to me directly related to the fact we made the
> > decision to keep the loader somewhat flexible rather than very
specific.
> > The operation ordering and state carry from one phase of processing to
> the
> > next would simply have been inscrutable, if not impossible, with a CLI
> > option-only approach. Sure, it's not as simple as "put infile.txt
> > outfile.txt", but the alternatives are not that clear either. One might
> > argue we could split up the processing pieces as in traditional Hadoop,
> eg
> >

Full Dev -> Heartbeat issues

2018-01-08 Thread Otto Fowler
I just started up full dev from the 0.4.2 release tag, and ended up with
failed heartbeats for all my services in ambari.
After investigation, I found the my /etc/hosts ( on node1 ) had multiple
entries for node1 :

[vagrant@node1 ~]$ cat /etc/hosts
127.0.0.1 node1 node1
127.0.0.1   localhost

## vagrant-hostmanager-start
192.168.66.121 node1

## vagrant-hostmanager-end

After removing the 127.0.0.1 node1 node1 line and restarting the machine +
all the services etc my issues are resolved and my board is green.

I am not sure why this may happen.
Hopefully if you are seeing this, this will help.
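For anyone scripting the same cleanup, here is a hedged sketch of the fix described above. The helper is hypothetical (nothing like it exists in the repo); it just drops `127.0.0.1` lines that alias the cluster hostname while leaving plain `localhost` and the vagrant-hostmanager entry untouched:

```python
def strip_loopback_alias(hosts_text, hostname="node1"):
    """Remove loopback lines that alias the given hostname, e.g.
    '127.0.0.1 node1 node1', from /etc/hosts content."""
    kept = []
    for line in hosts_text.splitlines():
        fields = line.split()
        # Drop only loopback entries that mention the cluster hostname.
        if fields[:1] == ["127.0.0.1"] and hostname in fields[1:]:
            continue
        kept.append(line)
    return "\n".join(kept)
```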

Anyone know why this may happen?


ottO


Re: Anand is a new Committer!

2018-01-11 Thread Otto Fowler
Congratulations and welcome Anand!


On January 11, 2018 at 09:29:24, Casey Stella (ceste...@gmail.com) wrote:

The Project Management Committee (PMC) for Apache Metron has invited Anand
Subramanian to become a committer and we are pleased to announce that they
have accepted.

Congratulations and welcome, Anand!


Checkstyle - have we run it?

2018-01-17 Thread Otto Fowler
Where are we with the check style integration?  How are we handling check
style in existing modules?
I seem to remember talk of a script or something to reformat?

It would be nice to get some of the warnings out of the builds, how should
we go about it?

ottO


Re: Checkstyle - have we run it?

2018-01-17 Thread Otto Fowler
Thanks,
I have check style up and integrated, and I have been running it on *new*
files etc.
But now when I work in existing files, I obviously see the issues.

I *think*, in the end, module by module is the only feasible way, is it not?


On January 17, 2018 at 12:15:33, Justin Leet (justinjl...@gmail.com) wrote:

It exists, we have a style that can be imported and setup in IntelliJ with
the Checkstyle plugin

Reformatting can also be done in IntelliJ (which will help a lot, but not
all issues). This can be done on a file mask basis (e.g. just do "*.java"
files to avoid reformatting other things), and could be done module wide or
project wide or whatever. I would turn off autoformatting of Javadocs in
IntelliJ (because a lot of the Apache license headers are Javadocs instead
of comments). Other than that, I don't think there are any other problems.

The main problem is more taking the time to do it, avoiding issues with
existing PRs, and making it manageable to review and take care of. Do we
do it module by module (keeping in mind we have a whole lot)? Things like
that. I'm happy to help out, but I just really haven't put in the effort to
get things moving forward.


On Wed, Jan 17, 2018 at 9:12 AM, Otto Fowler 
wrote:

> Where are we with the check style integration? How are we handling check
> style in existing modules?
> I seem to remember talk of a script or something to reformat?
>
> It would be nice to get some of the warnings out of the builds, how
should
> we go about it?
>
> ottO
>


Travis for Apache/Metron is in trouble

2018-01-18 Thread Otto Fowler
A 24hr-long build is blocking up master’s travis builds.
Who can nuke it?

ottO


Re: Some more upgrade fallout... Can't restart Metron Indexing

2018-01-18 Thread Otto Fowler
JIRAS



On January 18, 2018 at 12:14:11, Casey Stella (ceste...@gmail.com) wrote:

So, the challenge here is that our install script isn't smart enough right
now to skip creating tables that are already created. One thing you could
do is

1. rename the hbase tables for metron (see
https://stackoverflow.com/questions/27966072/how-do-you-rename-a-table-in-hbase
)
2. let the install create them anew
3. stop metron
4. delete the new empty hbase tables
5. swap in the old tables
6. start metron

What we probably should do is not barf if the tables exist, but rather
warn.
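A minimal sketch of that "warn instead of fail" behavior, mirroring the `hbase shell -n` pattern from the traceback below. This is illustrative, not the actual Ambari code: `run_cmd` is a hypothetical hook that executes a shell command and returns `(return_code, output)` (in the real scripts this would be Ambari's `Execute` resource):

```python
def create_hbase_table(table, cf, run_cmd):
    """Create an HBase table via the shell, but warn and skip if it
    already exists instead of raising a hard failure."""
    # `exists` prints "Table <name> does exist" when the table is present.
    rc, out = run_cmd("echo \"exists '%s'\" | hbase shell -n" % table)
    if rc == 0 and "does exist" in out:
        return "WARN: table '%s' already exists, skipping create" % table
    rc, out = run_cmd("echo \"create '%s','%s'\" | hbase shell -n" % (table, cf))
    if rc != 0:
        raise RuntimeError("failed to create '%s': %s" % (table, out))
    return "created '%s'" % table
```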

On Thu, Jan 18, 2018 at 12:02 PM, Laurens Vets  wrote:

> After upgrading from 0.4.1 to 0.4.2, I can't seem to start or restart
> Metron Indexing. I get the following errors:
>
> stderr: /var/lib/ambari-agent/data/errors-2468.txt
>
> Traceback (most recent call last):
> File "/var/lib/ambari-agent/cache/common-services/METRON/0.4.2/pa
> ckage/scripts/indexing_master.py", line 160, in 
> Indexing().execute()
> File
"/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",

> line 280, in execute
> method(env)
> File "/var/lib/ambari-agent/cache/common-services/METRON/0.4.2/pa
> ckage/scripts/indexing_master.py", line 82, in start
> self.configure(env)
> File "/var/lib/ambari-agent/cache/common-services/METRON/0.4.2/pa
> ckage/scripts/indexing_master.py", line 72, in configure
> commands.create_hbase_tables()
> File "/var/lib/ambari-agent/cache/common-services/METRON/0.4.2/pa
> ckage/scripts/indexing_commands.py", line 126, in create_hbase_tables
> user=self.__params.hbase_user
> File "/usr/lib/python2.6/site-packages/resource_management/core/base.py",
> line 155, in __init__
> self.env.run()
> File
"/usr/lib/python2.6/site-packages/resource_management/core/environment.py",
> line 160, in run
> self.run_action(resource, action)
> File
"/usr/lib/python2.6/site-packages/resource_management/core/environment.py",
> line 124, in run_action
> provider_action()
> File
"/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py",

> line 273, in action_run
> tries=self.resource.tries, try_sleep=self.resource.try_sleep)
> File
"/usr/lib/python2.6/site-packages/resource_management/core/shell.py",
> line 70, in inner
> result = function(command, **kwargs)
> File
"/usr/lib/python2.6/site-packages/resource_management/core/shell.py",
> line 92, in checked_call
> tries=tries, try_sleep=try_sleep)
> File
"/usr/lib/python2.6/site-packages/resource_management/core/shell.py",
> line 140, in _call_wrapper
> result = _call(command, **kwargs_copy)
> File
"/usr/lib/python2.6/site-packages/resource_management/core/shell.py",
> line 293, in _call
> raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of 'echo
> "create 'metron_update','t'" | hbase shell -n' returned 1. ERROR
> RuntimeError: Table already exists: metron_update!
>
> stdout: /var/lib/ambari-agent/data/output-2468.txt
>
> 2018-01-18 16:54:30,101 - Using hadoop conf dir:
> /usr/hdp/current/hadoop-client/conf
> 2018-01-18 16:54:30,301 - Using hadoop conf dir:
> /usr/hdp/current/hadoop-client/conf
> 2018-01-18 16:54:30,302 - Group['metron'] {}
> 2018-01-18 16:54:30,303 - Group['livy'] {}
> 2018-01-18 16:54:30,303 - Group['elasticsearch'] {}
> 2018-01-18 16:54:30,303 - Group['spark'] {}
> 2018-01-18 16:54:30,303 - Group['zeppelin'] {}
> 2018-01-18 16:54:30,304 - Group['hadoop'] {}
> 2018-01-18 16:54:30,304 - Group['kibana'] {}
> 2018-01-18 16:54:30,304 - Group['users'] {}
> 2018-01-18 16:54:30,304 - User['hive'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,305 - User['storm'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,306 - User['zookeeper'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,306 - User['infra-solr'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,307 - User['ams'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,307 - User['tez'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['users']}
> 2018-01-18 16:54:30,308 - User['zeppelin'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,309 - User['metron'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,309 - User['livy'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,310 - User['elasticsearch'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,310 - User['spark'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,311 - User['ambari-qa'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['users']}
> 2018-01-18 16:54:30,311 - User['flume'] {'gid': 'hadoop',
> 'fetch_

Re: [DISCUSS] Upgrading Solr

2018-01-18 Thread Otto Fowler
+1 to the feature branch.

Also, there have been some questions about solr support recently, I think
when the feature branch
is ready you should announce it on user@ too, we may get some help from
folks looking for this.



On January 18, 2018 at 14:26:14, Justin Leet (justinjl...@gmail.com) wrote:

Now that we have ES at a modern version, we should consider bringing Solr
to a modern version as well.

The focus of this work would be to get us in a place where Solr is
upgraded, along with the related work of building out the Solr
functionality to parity with Elasticsearch. The goal would not be to add
net new functionality, just to get Solr and ES in the same place for the
alerts UI and REST interface. Additionally, it would include the various
supporting necessities such as ensuring associated DAOs are testable, and
so on.

Given the testing, reviewing, and iteration involved, I'd like to propose
doing this work in a feature branch.

Jiras would be created based on this discussion once it dies down a bit.


Re: Some more upgrade fallout... Can't restart Metron Indexing

2018-01-18 Thread Otto Fowler
I assigned METRON-1410 to myself.
I will take a shot at addressing this in the ambari service code.



On January 18, 2018 at 13:03:18, Laurens Vets (laur...@daemon.be) wrote:

On 2018-01-18 09:14, Casey Stella wrote:
> So, the challenge here is that our install script isn't smart enough
> right
> now to skip creating tables that are already created. One thing you
> could
> do is
>
> 1. rename the hbase tables for metron (see
>
>
https://stackoverflow.com/questions/27966072/how-do-you-rename-a-table-in-hbase
> )
> 2. let the install create them anew
> 3. stop metron
> 4. delete the new empty hbase tables
> 5. swap in the old tables
> 6. start metron

This worked, thanks! I'll update
https://issues.apache.org/jira/browse/METRON-1410 as well.


Master is failed in Travis

2018-01-22 Thread Otto Fowler
https://travis-ci.org/apache/metron/builds/330900667


Question on NOTICE files, copyrights

2018-01-22 Thread Otto Fowler
1. I think we need to update our Metron copyright to 2015-2018.
2. Where are the requirements written up wrt NOTICE files and other things
that need to be in the jar META directories?
I have not added them to any jars I have created, and I think that may be
wrong.  So not all modules have them, and some are out of date.

ottO


Re: Dependency Checks

2018-01-24 Thread Otto Fowler
There isn’t a maven plugin for this?


On January 24, 2018 at 16:44:02, Nick Allen (n...@nickallen.org) wrote:

We should re-jigger `platform-info.sh` (or create a new tool) that very
obviously passes or fails based on what it discovers in the user's
environment. Right now, a user just runs the `platform-info.sh` and it is
not apparent to them what the problem is.

The script could be manually executed by a user. This could also be called
at the start of a deployment so that it fails fast if the user is missing
dependencies.

I wish there was a better way to handle this.
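The pass/fail idea above could be as simple as the following sketch. The function and the dependency names are illustrative only (the real list would come from whatever `platform-info.sh` actually requires):

```python
import shutil


def check_dependencies(required):
    """Print an explicit PASS/FAIL per required executable and return an
    overall verdict, so a deployment can fail fast instead of leaving the
    user to interpret raw platform-info output."""
    results = {name: shutil.which(name) is not None for name in required}
    for name, ok in sorted(results.items()):
        print("%s  %s" % ("PASS" if ok else "FAIL", name))
    return all(results.values())


# A deployment script could then abort early, e.g.:
# if not check_dependencies(("vagrant", "ansible", "mvn", "docker")):
#     raise SystemExit("missing required tools, aborting deployment")
```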


On Wed, Jan 24, 2018 at 1:05 PM Sujay Jaladi  wrote:

> Thanks Otto. Please find the output below
>
> scripts sujay$ ./platform-info.sh
>
> Metron 0.4.2
>
> --
>
> --
>
> fatal: your current branch 'master' does not have any commits yet
>
> --
>
> --
>
> ansible 2.2.2.0
>
> config file =
>
> configured module search path = Default w/o overrides
>
> --
>
> Vagrant 2.0.1
>
> --
>
> Python 2.7.10
>
> --
>
> Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
> 2015-11-10T08:41:47-08:00)
>
> Maven home: /usr/local/Cellar/maven@3.3/3.3.9/libexec
>
> Java version: 1.8.0_131, vendor: Oracle Corporation
>
> Java home:
> /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre
>
> Default locale: en_US, platform encoding: UTF-8
>
> OS name: "mac os x", version: "10.12.6", arch: "x86_64", family: "mac"
>
> --
>
> ./platform-info.sh: line 64: docker: command not found
>
> --
>
> node
>
> v9.4.0
>
> --
>
> npm
>
> 5.6.0
>
> --
>
> Configured with: --prefix=/Library/Developer/CommandLineTools/usr
> --with-gxx-include-dir=/usr/include/c++/4.2.1
>
> Apple LLVM version 9.0.0 (clang-900.0.38)
>
> Target: x86_64-apple-darwin16.7.0
>
> Thread model: posix
>
> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>
> --
>
> Compiler is C++11 compliant
>
> --
>
> Darwin sujay-lm 16.7.0 Darwin Kernel Version 16.7.0: Wed Oct 4 00:17:00
> PDT 2017; root:xnu-3789.71.6~1/RELEASE_X86_64 x86_64
>
> --
>
> Total System Memory = 16384 MB
>
> Processor Model: Intel(R) Core(TM) i7-6567U CPU
>
> Processor Speed: 3.30GHz
>
> Total Physical Processors: 2
>
> Total cores: 2
>
> Disk information:
>
> /dev/disk1 233Gi 83Gi 149Gi 36% 1222172 4293745107 0% /
>
> This CPU appears to support virtualization
>
> On Wed, Jan 24, 2018 at 4:53 AM, Otto Fowler 
> wrote:
>
>> Can you run metron-deployment/scripts/platform_info.sh and send the
>> output?
>>
>>
>> On January 23, 2018 at 21:43:34, Sujay Jaladi (jsu...@gmail.com) wrote:
>>
>> Hello,
>>
>> Everytime I attempt to deploy apache metron on AWS, I get the following
>> error and all the servers are up and running expect Metron or its
>> components are not installed. Please help.
>>
>> fatal: [ec2-52-10-94-22.us-west-2.compute.amazonaws.com -> localhost]:
>> FAILED! => {"changed": true, "cmd": "cd
>>
/Users/sujay/Downloads/apache-metron-0.4.2-rc2/metron-deployment/amazon-ec2/../playbooks/../..

>> && mvn clean package -DskipTests -T 2C -P HDP-2.5.0.0,mpack", "delta":
>> "0:00:04.845260", "end": "2018-01-23 18:28:27.608265", "failed": true,
>> "rc": 1, "start": "2018-01-23 18:28:22.763005", "stderr": "", "stdout":
>> "[INFO] Scanning for projects...\n[INFO]
>>
\n[INFO]

>> Reactor Build Order:\n[INFO] \n[INFO] Metron\n[INFO]
metron-stellar\n[INFO]
>> stellar-common\n[INFO] metron-analytics\n[INFO]
metron-maas-common\n[INFO]
>> metron-platform\n[INFO] metron-zookeeper\n[INFO]
>> metron-test-utilities\n[INFO] metron-integration-test\n[INFO]
>> metron-maas-service\n[INFO] metron-common\n[INFO]
metron-statistics\n[INFO]
>> metron-writer\n[INFO] metron-storm-kafka-override\n[INFO]
>> metron-storm-kafka\n[INFO] metron-hbase\n[INFO]
>> metron-profiler-common\n[INFO] metron-profiler-client\n[INFO]
>> metron-profiler\n[INFO] metron-hbase-client\n[INFO]
>> metron-enrichment\n[INFO] metron-indexing\n[INFO] metron-solr\n[INFO]
>> metron-pcap\n[INFO] metron-parsers\n[INFO] metron-pcap-backend\n[INFO]
>> metron-data-management\n[INFO] metron-api\n[INFO]
metron-management\n[INFO]
>> elasticsearch-shaded\n[INFO] metron-elasticsearch\n[INFO]
>> metron-deployment\n[INFO] Metron Ambari Management Pack\n[INFO]
>> metron-contr

Re: [DISCUSS] Update Metron Elasticsearch index names to metron_

2018-01-24 Thread Otto Fowler
+1


On January 24, 2018 at 16:28:42, Nick Allen (n...@nickallen.org) wrote:

+1 to a standard prefix for all Metron indices. I've had the same thought
myself and you laid out the advantages well.





On Wed, Jan 24, 2018 at 3:47 PM zeo...@gmail.com  wrote:

> I agree with having a metron_ prefix for ES indexes, and the timing.
>
> Jon
>
> On Wed, Jan 24, 2018 at 3:20 PM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > With the completion of https://github.com/apache/metron/pull/840
> > (METRON-939: Upgrade ElasticSearch and Kibana), we have the makings for
a
> > major release rev of Metron in the upcoming release (currently slotted
to
> > 0.4.3, I believe). Since there are non-backwards compatible changes
> > pertaining to ES indexing, it seems like a good opportunity to revisit
> our
> > index naming standards.
> >
> > I propose we add a simple prefix "metron_" to all Metron indexes. There
> are
> > numerous reasons for doing so
> >
> > - removes the likelihood of index name collisions when we perform
> > operations on index wildcard names, e.g. "enrichment_*, indexing_*,
> > etc.".
> > - ie, this allows us to be more friendly in a multi-tenant ES
> > environment for relatively low engineering cost.
> > - simplifies the Kibana dashboard a bit. We currently needed to
> create a
> > special index pattern in order to accommodate multi-index pattern
> > matching
> > across all metron-specific indexes. Using metron_* would be much
> simpler
> > and less prone to error.
> > - easier for customers to debug and identify Metron-specific indexes
> and
> > associated data
> >
> >
> > The reason for making these changes now is that we already have
breaking
> > changes with ES. Leveraging existing indexed data rather than deleting
> > indexes and starting from scratch already requires a
> re-indexing/migration
> > step, so there is no additional effort on the part of users if they
> choose
> > to attempt a migration. It further makes sense with our current work
> > towards upgrading Solr.
> >
> > We already have a battery of integration and manual tests after the ES
> > upgrade work that can be leveraged to validate the changes.
> >
> > Mike Miklavcic
> >
>
>
> --
>
> Jon
>


[DISCUSS] Using JSON Path to support more complex documents with the JSONMap Parser

2018-01-25 Thread Otto Fowler
While it would be preferred if all data streamed into the parsers is
already in ‘stream’ form, as opposed to ‘batched’ form, it may not always
be possible, or possible at every step of system development.

I was wondering if it would be worth adding optional support to the JSONMap
Parser for more complex documents, splitting them in the parser into
multiple messages. This is similar in function to the JSON Splitter
processor in NiFi.

So, a document would come into the JSONMap Parser from Kafka, with some
embedded set of the real message content, such as in this simplified
example:

{
  "messages" : [
    { message1 },
    { message2 },
    ...
    { messageN }
  ]
}

The JSONMap Parser would have a new configuration item for message
selection, a JSON Path expression:

"messageSelector" : "$.messages"

Inside the JSONMap Parser, it would evaluate the expression and do the
same processing on each item in the returned list.

the Parser interface already supports returning multiple message objects
from a single byte[] input.
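The selection-and-split step can be sketched as follows. This is a simplified illustration (in Python, for brevity; the parser itself is Java and would use a full JSONPath library): only the trivial `$.field.subfield` path form is handled here, and the function name is made up for the sketch:

```python
import json


def split_messages(raw_bytes, message_selector="$.messages"):
    """Evaluate a (deliberately simplified) JSON Path against the
    envelope document and return one message dict per element."""
    node = json.loads(raw_bytes.decode("utf-8"))
    # Walk only simple "$.a.b" style paths for illustration.
    for key in message_selector.lstrip("$.").split("."):
        node = node[key]
    # The parser interface already allows returning a list of messages.
    return node if isinstance(node, list) else [node]
```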

There is a performance penalty to be paid here, and it is more than just
the cost of handling multiple messages: the JSONPath evaluation itself adds
overhead.

I can see this being useful in a couple of circumstances:

   - You want to work with some document format with metron but do not have
     NiFi or the equivalent available or setup yet
   - You want to prototype with Metron before you get the ‘preprocessing’
     setup
   - You are not going to be able to use NiFi and are ok with the performance

I have something in github to look at for more detail :
ottobackwards/json-path-play


Thoughts?


Re: [DISCUSS] Using JSON Path to support more complex documents with the JSONMap Parser

2018-01-25 Thread Otto Fowler
JSONPath is indeed what nifi uses.  I used their implementation as a guide.
I believe starting with a path would be a good minimum viable approach, a good start.
We could support multiple paths of course.

Beside the fact that I knew NiFi used this approach, I believe that
JSONPath provides a flexible mechanism for defining
the targets within the document, and would make this more usable across
various document structures.

We already do full document with simple json btw.

On January 25, 2018 at 12:45:12, Matt Foley (ma...@apache.org) wrote:

Hi Otto,
Oddly, I had reason a couple weeks ago to try to figure out a streaming
parser for very large json objects -- altho it was in Python rather than
Java.
Search showed two basic approaches, both unsurprisingly modeled on xml
processing:
- SAX-like parsing
- XPath-like parsing

Both are capable of true streaming interface, that is one doesn't have to
load the whole json into memory first.
The sound-bite comparison of the two, thanks to stackoverflow, is:

> SAX is a top-down parser and allows serial access to a XML document, and
works well for read only [serial, streamed] access.
> XPath is useful when you only need a couple of values from the XML
document, and you know where to find them (you know the path of the data,
/root/item/challange/text).
> [XPath is] certainly easier to use, ... whereas ... SAX will always be a
lot more awkward to program than XPath.

Having used SAX before, I agree it's got an "awkward" api, but it's quite
usable and does the job.
I haven't been hands-on with XPath.

Is XPath (or rather JSONPath) what NiFi uses?
And is it sufficient for our needs to have a fixed path to the message
sequence in any given json bundle?

Thanks,
--Matt


On 1/25/18, 7:57 AM, "Otto Fowler"  wrote:

While it would be preferred if all data streamed into the parsers is
already in ‘stream’ form, as opposed to ‘batched’ form, it may not always
be possible, or possible at every step of system development.

I was wondering if it would be worth adding optional support to the JSONMap
Parser to support more complex documents, and split them in the parser into
multiple messages. This is similar in function to the JSON Splitter
processor in NiFi

So, a document would come into the JSONMap Parser from Kafka, with some
embedded set of the real message content, such as in this simplified
example:

{
“messages" : [
{ message1},
{ message2},
….
{messageN}
]
}

the JSONMap Parser, would have a new configuration item for message
selection, that would be a JSON Path expression

“messageSelector” : “$.messages “

Inside the JSONMap Parser, it would evaluate the expression, and do the
same processing on each item returned by the expression list.

the Parser interface already supports returning multiple message objects
from a single byte[] input.

There is a performance penalty to be paid here, and it is more than just
doing more than one message due to the JSONPath evaluation.

I can see this being useful in a couple of circumstances:

- You want to work with some document format with metron but do not have
NiFi or the equivalent available or setup yet
- You want to prototype with Metron before you get the ‘preprocessing’
setup
- You are not going to be able to use NiFi and are ok with the performance

I have something in github to look at for more detail :
ottobackwards/json-path-play
<https://github.com/ottobackwards/json-path-play>

Thoughts?


Re: [DISCUSS] Using JSON Path to support more complex documents with the JSONMap Parser

2018-01-25 Thread Otto Fowler
In other words, I don’t believe the issue is parsing, but rather searching
and extracting.

I have used SAX with xml as well, can you point me to the json equivalent
you found?


On January 25, 2018 at 13:01:58, Otto Fowler (ottobackwa...@gmail.com)
wrote:

JSONPath is indeed what nifi uses.  I used their implementation as a guide.
I believe starting with a path would be a good minimum viable, a good start.
We could support multiple paths of course.

Beside the fact that I knew NiFi used this approach, I believe that
JSONPath provides a flexible mechanism for defining
the targets within the document, and would make this more usable across
various document structures.

We already do full document with simple json btw.

On January 25, 2018 at 12:45:12, Matt Foley (ma...@apache.org) wrote:

Hi Otto,
Oddly, I had reason a couple weeks ago to try to figure out a streaming
parser for very large json objects -- altho it was in Python rather than
Java.
Search showed two basic approaches, both unsurprisingly modeled on xml
processing:
- SAX-like parsing
- XPath-like parsing

Both are capable of true streaming interface, that is one doesn't have to
load the whole json into memory first.
The sound-bite comparison of the two, thanks to stackoverflow, is:

> SAX is a top-down parser and allows serial access to a XML document, and
works well for read only [serial, streamed] access.
> XPath is useful when you only need a couple of values from the XML
document, and you know where to find them (you know the path of the data,
/root/item/challange/text).
> [XPath is] certainly easier to use, ... whereas ... SAX will always be a
lot more awkward to program than XPath.

Having used SAX before, I agree it's got an "awkward" api, but it's quite
usable and does the job.
I haven't been hands-on with XPath.

Is XPath (or rather JSONPath) what NiFi uses?
And is it sufficient for our needs to have a fixed path to the message
sequence in any given json bundle?

Thanks,
--Matt


On 1/25/18, 7:57 AM, "Otto Fowler"  wrote:

While it would be preferred if all data streamed into the parsers is
already in ‘stream’ form, as opposed to ‘batched’ form, it may not always
be possible, or possible at every step of system development.

I was wondering if it would be worth adding optional support to the JSONMap
Parser to support more complex documents, and split them in the parser into
multiple messages. This is similar in function to the JSON Splitter
processor in NiFi

So, a document would come into the JSONMap Parser from Kafka, with some
embedded set of the real message content, such as in this simplified
example:

{
“messages" : [
{ message1},
{ message2},
….
{messageN}
]
}

the JSONMap Parser, would have a new configuration item for message
selection, that would be a JSON Path expression

“messageSelector” : “$.messages “

Inside the JSONMap Parser, it would evaluate the expression, and do the
same processing on each item returned by the expression list.

the Parser interface already supports returning multiple message objects
from a single byte[] input.

There is a performance penalty to be paid here, and it is more than just
doing more than one message due to the JSONPath evaluation.

I can see this being useful in a couple of circumstances:

- You want to work with some document format with metron but do not have
NiFi or the equivalent available or setup yet
- You want to prototype with Metron before you get the ‘preprocessing’
setup
- You are not going to be able to use NiFi and are ok with the performance

I have something in github to look at for more detail :
ottobackwards/json-path-play
<https://github.com/ottobackwards/json-path-play>

Thoughts?


Metron User Community Meeting Call

2018-01-25 Thread Otto Fowler
I would like to propose a Metron user community meeting. I propose that we
set the meeting next week, and will throw out Wednesday, January 31st at
09:30AM PST, 12:30 on the East Coast and 5:30 in London Towne. This meeting
will be held over a web-ex, the details of which will be included in the
actual meeting notice.
Topics

We have a volunteer for a community member presentation:

Ahmed Shah (PMP, M. Eng.)
Cybersecurity Analyst & Developer
GCR - Cybersecurity Operations Center
Carleton University - cugcr.com

Ahmed would like to talk to the community about

   - Who the GCR group is
   - How they use Metron 0.4.1
   - Walk through their dashboards, UI management screen, nifi
   - Challenges we faced up until now

I would like to thank Ahmed for stepping forward for this meeting.

If you have something you would like to present or talk about please reply
here! Maybe we can have people ask for “A better explanation of feature
X” type things?
Metron User Community Meetings

User Community Meetings are a means for realtime discussion of experiences
with Apache Metron, or demonstration of how the community is using or will
be using Apache Metron.

These meetings are geared towards:

   - Demonstrations and knowledge sharing as opposed to technical discussion
     or implementation details from members of the Apache Metron Community
   - Existing Feature demonstrations
   - Proposed Feature demonstrations
   - Community feedback

These meetings are *not* for :

   - Support discussions. Those are best left to the mailing lists.
   - Development discussions. There is another type of meeting for that.


Re: [DISCUSS] Using JSON Path to support more complex documents with the JSONMap Parser

2018-01-25 Thread Otto Fowler
Sure it helps, but I am not sure I answered __your__ questions?

As I mentioned, we already use

Map<String, Object> rawMap = JSONUtils.INSTANCE.load(originalString,
    new TypeReference<Map<String, Object>>() {});


So, using JSONPath which is using the same object mapper operation under
the covers is not a change.
We were already reading the complete document in.


On January 25, 2018 at 15:06:28, Matt Foley (ma...@apache.org) wrote:

Heh, as I said, I was looking in Python. For SAX-like JSON parsers I found
numerous libraries, most built on top of an underlying Python library named
ijson, which is itself based on a C library called yajl.

The yajl page (http://lloyd.github.io/yajl/ ) lists a double handful of
language bindings but, annoyingly, none for Java; nor does Google seem to
know of any.

In Java, there's a library named json-simple in the Google Code Archive
which claims a SAX-like interface and broad production-level
adoption/robustness: https://code.google.com/archive/p/json-simple/ . I
don't have experience with it.

Of course, the gold standard json library for Java is Jackson. It documents
stream-based parsing, but not "SAX-like".
https://github.com/FasterXML/jackson-docs/wiki/JacksonStreamingApi
indicates that using it is equivalent to writing a parser, which suggests
it is (disappointingly) somewhat lower-level than the SAX API.
http://www.cowtowncoder.com/blog/archives/2009/01/entry_132.html compares
Jackson streaming interface to Stax and SAX, and says it is like Stax
Cursor api, claiming simpler use than SAX (about which I have no opinion).
So I think most people use Jackson for non-streaming consumption of json.

JsonPath implementation uses Jackson under the hood, which seems good to me
-- professionals don't recreate the wheel.
And it has the charm (for this community) of a DSL-like interface. It's
likely a good choice.

Hope this helps,
--Matt
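[For readers unfamiliar with the path style being discussed: a JSONPath
expression like "$.messages" simply names a location in the document. A
minimal sketch of dotted-path extraction follows, in stdlib-only Python for
illustration; the Jayway JsonPath library discussed above also supports
wildcards, filters, and array slices, none of which this toy handles.]

```python
import json

def extract(document, path):
    """Resolve a minimal dotted path like '$.a.b' against a parsed JSON
    document. Only plain object keys are supported in this sketch."""
    node = document
    for key in path.lstrip("$").strip(".").split("."):
        node = node[key]
    return node

doc = json.loads('{"root": {"item": {"text": "hello"}}}')
print(extract(doc, "$.root.item.text"))  # -> hello
```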

On 1/25/18, 10:05 AM, "Otto Fowler"  wrote:

In other words, I don’t believe the issue is parsing, but rather searching
and extracting.

I have used SAX with xml as well, can you point me to the json equivalent
you found?


On January 25, 2018 at 13:01:58, Otto Fowler (ottobackwa...@gmail.com)
wrote:

JSONPath is indeed what nifi uses. I used their implementation as a guide.
I believe starting with a path would be a good minimum viable product, a
good start.
We could support multiple paths of course.

Beside the fact that I knew NiFi used this approach, I believe that
JSONPath provides a flexible mechanism for defining
the targets within the document, and would make this more usable across
various document structures.

We already do full document with simple json btw.

On January 25, 2018 at 12:45:12, Matt Foley (ma...@apache.org) wrote:

Hi Otto,
Oddly, I had reason a couple weeks ago to try to figure out a streaming
parser for very large json objects -- altho it was in Python rather than
Java.
Search showed two basic approaches, both unsurprisingly modeled on xml
processing:
- SAX-like parsing
- XPath-like parsing

Both are capable of true streaming interface, that is one doesn't have to
load the whole json into memory first.
The sound-bite comparison of the two, thanks to stackoverflow, is:

> SAX is a top-down parser and allows serial access to a XML document, and
works well for read only [serial, streamed] access.
> XPath is useful when you only need a couple of values from the XML
document, and you know where to find them (you know the path of the data,
/root/item/challange/text).
> [XPath is] certainly easier to use, ... whereas ... SAX will always be a
lot more awkward to program than XPath.

Having used SAX before, I agree it's got an "awkward" api, but it's quite
usable and does the job.
I haven't been hands-on with XPath.

Is XPath (or rather JSONPath) what NiFi uses?
And is it sufficient for our needs to have a fixed path to the message
sequence in any given json bundle?

Thanks,
--Matt


On 1/25/18, 7:57 AM, "Otto Fowler"  wrote:

While it would be preferred if all data streamed into the parsers is
already in ‘stream’ form, as opposed to ‘batched’ form, it may not always
be possible, or possible at every step of system development.

I was wondering if it would be worth adding optional support to the JSONMap
Parser to support more complex documents, and split them in the parser into
multiple messages. This is similar in function to the JSON Splitter
processor in NiFi

So, a document would come into the JSONMap Parser from Kafka, with some
embedded set of the real message content, such as in this simplified
example:

{
“messages" : [
{ message1},
{ message2},
….
{messageN}
]
}

the JSONMap Parser, would have a new configuration item for message
selection, that would be a JSON Path expression

“messageSelector”: “$.messages”

Inside the JSONMap Parser, it would evaluate the expression and do the same
processing on each item in the list returned by the expression.

the Parser interface alread
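[The splitting behavior proposed above could be sketched roughly as
follows. This is stdlib-only Python for illustration; the real parser is
Java and would evaluate a full JSONPath expression, not the single
hard-coded top-level key assumed here.]

```python
import json

def split_messages(raw, message_selector):
    """Parse the envelope document and emit one map per element of the
    list the selector points at, mimicking the proposed JSONMap behavior.
    Only a trivial '$.key' selector is supported in this sketch."""
    envelope = json.loads(raw)
    key = message_selector.lstrip("$").strip(".")
    return list(envelope[key])

raw = '{"messages": [{"id": 1}, {"id": 2}, {"id": 3}]}'
for msg in split_messages(raw, "$.messages"):
    print(msg)
```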

Re: [GitHub] metron issue #903: METRON-1370 Create Full Dev Equivalent for Ubuntu

2018-01-25 Thread Otto Fowler
Maybe we need an "adding support for a new platform" doc.


On January 25, 2018 at 19:37:47, nickwallen (g...@git.apache.org) wrote:

Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/903

> @lvets: Just for my understanding, but why Ubuntu Trusty? In April that
will be 2 full Ubuntu LTS versions behind the then current one...

Because that's the requirement that I need to support. All the work around
the DEBs, the Mpack, Ansible setup was driven towards that.

If you or anyone else wants to add support for a newer version that can
also be done, but someone will have to put in the effort to do so.


---


When things change in hdfs, how do we know

2018-01-25 Thread Otto Fowler
At the moment, when a grok file or something changes in HDFS, how do we
know?  Do we have to restart the parser topology to pick it up?
Just trying to clarify for myself.

ottO


Re: When things change in hdfs, how do we know

2018-01-26 Thread Otto Fowler
https://github.com/ottobackwards/hdfs-inotify-zookeeper

Working on a poc



On January 26, 2018 at 07:41:44, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Should we consider using the Inotify interface to trigger reconfiguration,
in same way we trigger config changes in curator? We also need to fix
caching and lifecycle in the Grok parser to make the zookeeper changes
propagate pattern changes while we’re at it.

Simon

> On 26 Jan 2018, at 03:16, Casey Stella  wrote:
>
> Right now you have to restart the parser topology.
>
> On Thu, Jan 25, 2018 at 10:15 PM, Otto Fowler 
> wrote:
>
>> At the moment, when a grok file or something changes in HDFS, how do we
>> know? Do we have to restart the parser topology to pick it up?
>> Just trying to clarify for myself.
>>
>> ottO
>>


Re: When things change in hdfs, how do we know

2018-01-26 Thread Otto Fowler
In the end, what I’m thinking is this:

We have an Ambari service that runs the notification -> zookeeper piece.
It reads the ‘registration area’ from zookeeper to get its state and what
to watch.
Post 777, when parsers are installed and registered, it is trivial to have
my installer also register the files to watch.

The notification service also gets a notification from zookeeper for new
registrations.

On a notify event, the ‘notification node’ has its content set to the event
details and time, which the parser would pick up… causing the reload.
???
profit


This would work for the future script parser etc etc.
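[A hedged sketch of the flow being described, with in-process stand-ins for
both the HDFS change event and the ZooKeeper "notification node". The real
implementation would use Hadoop's HDFS INotify stream on the service side
and a Curator watch on the parser side; all names below are hypothetical.]

```python
import json
import time

class NotificationNode:
    """Stand-in for a znode whose data change fires registered watchers."""
    def __init__(self):
        self._watchers = []
        self.data = None

    def watch(self, callback):
        self._watchers.append(callback)

    def set_data(self, data):
        self.data = data
        for cb in self._watchers:
            cb(data)

reloaded = []

def on_change(data):
    # Parser side: pick up the event details and reload its config.
    event = json.loads(data)
    reloaded.append(event["path"])

node = NotificationNode()
node.watch(on_change)

# Notification-service side: an HDFS change event for a watched file is
# translated into a data write on the notification node.
hdfs_event = {"path": "/apps/metron/patterns/squid", "time": time.time()}
node.set_data(json.dumps(hdfs_event))

print(reloaded)  # -> ['/apps/metron/patterns/squid']
```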


On January 26, 2018 at 08:30:32, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Interesting, so you have an INotify listener to filter events, and then on
given changes, propagate a notification to zookeeper, which then triggers
the reconfiguration event via the curator client in Metron. I kinda like it
given our existing zookeeper methods.

Simon

On 26 Jan 2018, at 13:27, Otto Fowler  wrote:

https://github.com/ottobackwards/hdfs-inotify-zookeeper

Working on a poc



On January 26, 2018 at 07:41:44, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

Should we consider using the Inotify interface to trigger reconfiguration,
in same way we trigger config changes in curator? We also need to fix
caching and lifecycle in the Grok parser to make the zookeeper changes
propagate pattern changes while we’re at it.

Simon

> On 26 Jan 2018, at 03:16, Casey Stella  wrote:
>
> Right now you have to restart the parser topology.
>
> On Thu, Jan 25, 2018 at 10:15 PM, Otto Fowler 
> wrote:
>
>> At the moment, when a grok file or something changes in HDFS, how do we
>> know? Do we have to restart the parser topology to pick it up?
>> Just trying to clarify for myself.
>>
>> ottO
>>

