Re: [DISCUSS] Community Meetings

2017-12-13 Thread Simon Elliston Ball
Good points Larry, we would need to get consent from everyone on the call to 
record to properly comply with regulations in some countries. We would 
definitely need someone to step up as note taker. 

Something else to think about is the intended audience. Previously we’ve had 
meetings like this which have been very detailed and Dev@ focussed (which is a great 
thing) but have rather alienated participants in User@ land. We need to make it 
clear what level we’re talking about to be inclusive. 

Simon

> On 13 Dec 2017, at 00:44, larry mccay  wrote:
> 
> Not sure about posting the recordings - you will need to check and make
> sure that doesn't violate anything.
> 
> Just a friendly reminder...
> It is important that meetings have notes and a summary that is sent out
> describing topics to be decided on the mailing list.
> No decisions can be made in the community meeting itself - this gives
> others in other timezones and with other commitments a chance to review
> and have a voice in the decisions.
> 
> If it didn't happen on the mailing lists then it didn't happen. :)
> 
> 
> On Tue, Dec 12, 2017 at 1:39 PM, Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
> 
>> Yes, I do.
>> 
>> I suspect the best bet will be to post recordings somewhere on the
>> apache.org  metron site.
>> 
>> Simon
>> 
>>> On 12 Dec 2017, at 18:36, Otto Fowler  wrote:
>>> 
>>> Excellent, do you have the > 40 min + record option?
>>> 
>>> 
>>> On December 12, 2017 at 13:19:55, Simon Elliston Ball (
>>> si...@simonellistonball.com) wrote:
>>> 
>>> Happy to volunteer a zoom room. That seems to have worked for most in the
>>> past.
>>> 
>>> Simon
>>> 
 On 12 Dec 2017, at 18:09, Otto Fowler  wrote:
 
 Thanks! I think I’d like something hosted though.
 
 
 On December 12, 2017 at 11:18:52, Ahmed Shah (ahmeds...@cmail.carleton.ca)
 wrote:
 
 Hello,
 
 wrt "- How are we going to host it"...
 
 I've used BigBlueButton as an end user at our University.
 
 It is LGPL open source.
 
 https://bigbluebutton.org/
 https://bigbluebutton.org/developers/
 
 
 -Ahmed
 
 ___
 Ahmed Shah (PMP, M. Eng.)
 Cybersecurity Analyst & Developer
 GCR - Cybersecurity Operations Center
 Carleton University - cugcr.com
 
 
 
 From: Otto Fowler 
 Sent: December 11, 2017 4:41 PM
 To: dev@metron.apache.org
 Subject: [DISCUSS] Community Meetings
 
 I think that we all want to have regular community meetings. We may be
 better able to keep to a regular schedule with these meetings if we spread
 out the responsibility for them from James and Casey, both of whom have a
 lot on their plate already.
 
 I would be willing to coordinate and run the meetings, and would welcome
 anyone else who wants to help when they can.
 
 The only issue for me is I do not have a web-ex account that I can use to
 hold the meeting. So I’ll need some recommendations for a suitable
 alternative. I have not been able to find an Apache Friendly alternative,
 in the same way that Atlassian is apache friendly.
 
 
 So - from what I can see we need to:
 
 - Talk through who is going to do it
 - How are we going to host it
 - When are we going to do it
 
 Anything else?
 
 ottO
>> 
>> 


[GitHub] metron pull request #863: METRON-1347: Indexing Topology should fail tuples ...

2017-12-13 Thread merrimanr
Github user merrimanr commented on a diff in the pull request:

https://github.com/apache/metron/pull/863#discussion_r156674159
  
--- Diff: 
metron-platform/metron-writer/src/main/java/org/apache/metron/writer/bolt/BulkMessageWriterBolt.java
 ---
@@ -229,17 +239,30 @@ public void execute(Tuple tuple) {
   LOG.trace("Writing enrichment message: {}", message);
   WriterConfiguration writerConfiguration = 
configurationTransformation.apply(
   new IndexingWriterConfiguration(bulkMessageWriter.getName(), 
getConfigurations()));
-  if(writerConfiguration.isDefault(sensorType)) {
-//want to warn, but not fail the tuple
-collector.reportError(new Exception("WARNING: Default and (likely) 
unoptimized writer config used for " + bulkMessageWriter.getName() + " writer 
and sensor " + sensorType));
+  if(sensorType == null) {
--- End diff --

@ottobackwards which fields should we validate here?


---


[GitHub] metron pull request #863: METRON-1347: Indexing Topology should fail tuples ...

2017-12-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/863#discussion_r156675356
  
--- Diff: 
metron-platform/metron-writer/src/main/java/org/apache/metron/writer/bolt/BulkMessageWriterBolt.java
 ---
@@ -229,17 +239,30 @@ public void execute(Tuple tuple) {
   LOG.trace("Writing enrichment message: {}", message);
   WriterConfiguration writerConfiguration = 
configurationTransformation.apply(
   new IndexingWriterConfiguration(bulkMessageWriter.getName(), 
getConfigurations()));
-  if(writerConfiguration.isDefault(sensorType)) {
-//want to warn, but not fail the tuple
-collector.reportError(new Exception("WARNING: Default and (likely) 
unoptimized writer config used for " + bulkMessageWriter.getName() + " writer 
and sensor " + sensorType));
+  if(sensorType == null) {
--- End diff --

Sure thing.  Really the only two required are `timestamp` and 
`source.type`.  Did I miss any?


---


[GitHub] metron pull request #865: METRON-1212 The bundle System and Maven Plugin (Fe...

2017-12-13 Thread ottobackwards
GitHub user ottobackwards opened a pull request:

https://github.com/apache/metron/pull/865

METRON-1212 The bundle System and Maven Plugin (Feature Branch)

This PR contains the Bundle system and Maven Plugin.

The bundle system and the plugin are adapted from the Apache Nifi project.  

## bundles-maven-plugin
The bundles-maven-plugin is an adapted version of the jar dependency plugin 
whose function is to bundle a jar of jars based on the dependencies for a 
project.  It also creates metadata attributes.
A project's jar and its non-provided dependency jars are placed in a /lib 
entry in the bundle, with the bundle itself being in jar format.

## bundles-lib 
The bundles-lib contains the functionality required to:
- discover bundles
- inspect bundles for exposed extension types
- load the bundles
- create special class loaders for bundles
- deliver instances of extension types for use

NAR exposed the bundles through many classes.  I have created the 
BundleSystem interface to expose a more usable, simplified api for our use 
cases.

### From the original PR for METRON-777:
Metron Bundle Plugin
- adaptation of the nifi plugin
- more configurable wrt file extension/dependency and metadata naming
bundle-lib
- adaptation of nifi-nar-utils to be used outside of the nifi project
- rudimentary extensibility to allow configuration and injection of service 
types and other things that were hard coded to nifi
- refactored from File based to VFS based
- rebranding to Bundle from Nar ( although the lib and the plugin allow 
that to be configured now )
- added capability to the properties class to write to stream, adapted to 
uri from paths
- added integration tests for hdfs
- changed to be ClassIndex based instead of ServiceLoader. Service loader 
is slower, and Casey's ClassIndex work is great. This also removes the NAR's 
required manual maintenance of the service file.
- refactored to use VFS to load the bundle/nar into the classloader AND to 
use VFS to load the dependency jars -> VFS as a composite filesystem. Thus 
going from NAR's 'working directory', exploded NARS to just loading the 
bundle/nar.

## Previous Review
Please see [@mattf_apache's 
review](https://github.com/apache/metron/pull/530/files/c5f8c34e4de8e6d456b97edd6f8a0d33b4819d69)

## changes from that review
I have changed the InitContext operations to have explicit builders, and 
made it so that creating a context can be done separately from initialization.  
Two contexts can then be 'merged'.  This is to allow for the addition of new 
bundles after initialization.
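
As a rough illustration of the builder-then-merge idea described above, here is a minimal sketch. It is not the bundles-lib API: `SketchContext`, `withBundle` and `merge` are hypothetical names invented for this example, and a context is reduced to a set of bundle names.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for an initialization context; NOT the real
// bundles-lib class. It only tracks the names of known bundles.
final class SketchContext {

  private final Set<String> bundleNames;

  private SketchContext(Set<String> bundleNames) {
    // Defensive copy so a built context is immutable.
    this.bundleNames = Collections.unmodifiableSet(new LinkedHashSet<>(bundleNames));
  }

  Set<String> bundleNames() {
    return bundleNames;
  }

  // Returns a new context holding the bundles of both contexts;
  // neither input is modified, and duplicates are kept only once.
  SketchContext merge(SketchContext other) {
    Set<String> merged = new LinkedHashSet<>(bundleNames);
    merged.addAll(other.bundleNames);
    return new SketchContext(merged);
  }

  static Builder builder() {
    return new Builder();
  }

  // Explicit builder: creating a context is separate from using it.
  static final class Builder {
    private final Set<String> names = new LinkedHashSet<>();

    Builder withBundle(String name) {
      names.add(name);
      return this;
    }

    SketchContext build() {
      return new SketchContext(names);
    }
  }

  public static void main(String[] args) {
    // Bundle names below are made up for the example.
    SketchContext initial = SketchContext.builder().withBundle("parser-a").build();
    SketchContext added = SketchContext.builder().withBundle("parser-b").build();
    System.out.println(initial.merge(added).bundleNames());
  }
}
```

Keeping the contexts immutable and having `merge` return a new one is a design choice of this sketch; it means bundles added after initialization never change a context that other code already holds.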

In preparing this PR I have:
- made checkstyle fixes
- fixed several types
- added a requested set of tests loading and executing simple 
interface/implementation from bundle beyond what is already in the bundle-lib 
tests

## Testing

* `cd bundles-maven-plugin && mvn -q install && cd ..` must be run once to 
install the maven plugin
* This review is code review and test code review and running only
* [Test Project](https://github.com/ottobackwards/test-bundles-plugin) can 
be examined as a simple example of how to create bundles.
* The README.md has getting started and quickstart sections with some 
overview of creating by hand

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron \
- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
### For documentation related changes:
- [x] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ottobackwards/metron fifth_bundles

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #865






---


[GitHub] metron pull request #863: METRON-1347: Indexing Topology should fail tuples ...

2017-12-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/863#discussion_r156676155
  
--- Diff: metron-platform/metron-indexing/README.md ---
@@ -15,6 +15,12 @@ Indices are written in batch and the batch size and 
batch timeout are specified
 [Sensor Indexing Configuration](#sensor-indexing-configuration) via the 
`batchSize` and `batchTimeout` parameters.
 These configs are variable by sensor type.
 
--- End diff --

So, strictly speaking messages really only require `source.type` (which I 
typo'd) and `timestamp` (which I should add).  I'll fix that, but did I miss 
anything?


---


[GitHub] metron pull request #774: METRON-1212 The bundle system and maven plugin

2017-12-13 Thread ottobackwards
Github user ottobackwards closed the pull request at:

https://github.com/apache/metron/pull/774


---


Re: [DISCUSS] Community Meetings

2017-12-13 Thread Otto Fowler
I am ok with just notes and no recording.




[GitHub] metron pull request #863: METRON-1347: Indexing Topology should fail tuples ...

2017-12-13 Thread simonellistonball
Github user simonellistonball commented on a diff in the pull request:

https://github.com/apache/metron/pull/863#discussion_r156676868
  
--- Diff: 
metron-platform/metron-writer/src/main/java/org/apache/metron/writer/bolt/BulkMessageWriterBolt.java
 ---
@@ -229,17 +239,30 @@ public void execute(Tuple tuple) {
   LOG.trace("Writing enrichment message: {}", message);
   WriterConfiguration writerConfiguration = 
configurationTransformation.apply(
   new IndexingWriterConfiguration(bulkMessageWriter.getName(), 
getConfigurations()));
-  if(writerConfiguration.isDefault(sensorType)) {
-//want to warn, but not fail the tuple
-collector.reportError(new Exception("WARNING: Default and (likely) 
unoptimized writer config used for " + bulkMessageWriter.getName() + " writer 
and sensor " + sensorType));
+  if(sensorType == null) {
--- End diff --

Strictly speaking that's true, but by convention original_string should be 
required. There is a broader topic about what should be required, but that 
certainly doesn't belong in a comment on a PR.


---


[GitHub] metron issue #863: METRON-1347: Indexing Topology should fail tuples without...

2017-12-13 Thread merrimanr
Github user merrimanr commented on the issue:

https://github.com/apache/metron/pull/863
  
I would like to hear feedback from @ottobackwards on other required fields 
but this looks good to me otherwise.  


---


[GitHub] metron issue #863: METRON-1347: Indexing Topology should fail tuples without...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/863
  
The minimum required fields, as far as I can see right now are source.type, 
original_string and timestamp.  Given the use case for this is something that 
has skipped the parser topology, we should validate those.

If we think the same can be done for indexing, then we should use the same 
classes/technique there.

Again, this is based on the presented use case
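
A minimal sketch of the kind of check being discussed, not the code in this PR. `RequiredFieldValidator` and `missingFields` are hypothetical names; the field names come from this thread. Since it is still being debated whether `original_string` belongs in the list, the required set is passed in rather than hard-coded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Metron: reports which required fields
// are absent (or null) in a message represented as a field -> value map.
final class RequiredFieldValidator {

  private RequiredFieldValidator() {}

  // Returns the required field names that are missing from the message,
  // in the order they were requested. An empty list means nothing is missing.
  static List<String> missingFields(Map<String, Object> message, List<String> required) {
    List<String> missing = new ArrayList<>();
    for (String field : required) {
      if (message == null || message.get(field) == null) {
        missing.add(field);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Example message with a sensor type but no timestamp (values are made up).
    Map<String, Object> message = new HashMap<>();
    message.put("source.type", "bro");

    List<String> required = Arrays.asList("source.type", "timestamp");
    List<String> missing = missingFields(message, required);
    if (!missing.isEmpty()) {
      System.out.println("Message is missing required fields: " + missing);
    }
  }
}
```

A caller could fail the tuple when the returned list is non-empty, and the same helper could be reused wherever the same set of fields needs checking.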


---


[GitHub] metron issue #862: METRON-1343: Swagger UI for User Controller needs request...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/862
  
+1, Thanks for the contribution!


---


Re: [DISCUSS] Integration/e2e test infrastructure requirements

2017-12-13 Thread Ryan Merriman
I took a first pass at adding tasks and will continue adding more as I
think of them.  I will wait for feedback on which modules to include before
I add all those (only added metron-elasticsearch for now).  I left all but
a couple unassigned so that anyone can pick up a task if they want.

On Wed, Dec 13, 2017 at 4:41 PM, Ryan Merriman  wrote:

> Jira is here:  https://issues.apache.org/jira/browse/METRON-1352.  I am
> starting to create sub-tasks based on the requirements outlined above and
> included in that Jira description.
>
> I am compiling a list of modules that we'll need to convert to the testing
> infrastructure.  Based on imports of ComponentRunner, I get these modules:
>
>- metron-elasticsearch
>- metron-enrichment
>- metron-indexing
>- metron-integration-test
>- metron-maas-service
>- metron-management
>- metron-pcap-backend
>- metron-profiler
>- metron-rest
>- metron-solr
>
> I am planning on creating sub-tasks for each of these.  I know that
> metron-common should also be converted because it uses the Zookeeper in
> memory server but doesn't use ComponentRunner to manage it.  Are there
> other modules like this that you know of?
>
> On Wed, Dec 13, 2017 at 2:44 PM, Otto Fowler 
> wrote:
>
>> Same as the feature branch name?  I just want to find it and set a watch
>> on it ;)
>>
>>
>> On December 13, 2017 at 15:29:00, Ryan Merriman (merrim...@gmail.com)
>> wrote:
>>
>> I'm open to ideas. What do you think the title should be?
>>
>> On Wed, Dec 13, 2017 at 2:13 PM, Otto Fowler 
>> wrote:
>>
>> > What is the Master Jira going to be?
>> >
>> >
>> >
>> > On December 13, 2017 at 14:36:50, Ryan Merriman (merrim...@gmail.com)
>> > wrote:
>> >
>> > I am going to start the process of creating Jiras out of these initial
>> > requirements. I agree with them and think they are a good starting point.
>> > Feel free to join in at anytime and add/change/remove requirements as
>> > needed. I will update the thread once I have the initial Jiras created and
>> > we can go from there.
>> >
>> > On Mon, Dec 11, 2017 at 4:10 PM, Ryan Merriman 
>> > wrote:
>> >
>> > > The purpose of this discussion is to map out what is required to get the
>> > > POC started with https://github.com/apache/metron/pull/858 into master.
>> > >
>> > > The following features were added in the previously mentioned PR:
>> > >
>> > > - Dockerfile for Metron REST
>> > > - Dockerfile for Metron UIs
>> > > - Docker Compose application including Metron images, Elasticsearch,
>> > > Kafka, Zookeeper
>> > > - Modified travis file that manages the Docker environment and runs
>> > > the e2e tests as part of the build
>> > > - Maven pom.xml that installs all the required assets into the Docker
>> > > e2e module
>> > > - Modified metron-alerts pom.xml that allows e2e tests to be run
>> > > through Maven
>> > > - An example integration test that has been converted to use the new
>> > > infrastructure
>> > >
>> > > Here are the initial features proposed for acceptance into master:
>> > >
>> > > - All e2e and integration tests run on common infrastructure.
>> > > - All e2e and integration tests are run automatically in the Travis
>> > > build.
>> > > - All e2e and integration tests run repeatably and reliably in the
>> > > Travis build.
>> > > - Debugging options are available and documented.
>> > > - The new infra and how to interact with it is documented.
>> > > - Old infrastructure removed (anything unused or commented out is
>> > > deleted, instead of staying).
>> > >
>> > > Are there other requirements people want to add to this list?
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>>
>>
>


Re: [DISCUSS] Integration/e2e test infrastructure requirements

2017-12-13 Thread Otto Fowler
Awesome Ryan!
Have you thought about confluence?




[GitHub] metron issue #831: METRON-1302: Split up Indexing Topology into batch and ra...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/831
  
The batch v. hdfs stuff still confuses me, I thought we decided on a 
different name?


---


[GitHub] metron pull request #866: METRON-1349 Full Dev Builds Metron Twice

2017-12-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/866#discussion_r156726426
  
--- Diff: metron-deployment/playbooks/metron_install.yml ---
@@ -15,13 +15,6 @@
 #  limitations under the License.
 #
 ---
-- hosts: metron
-  become: true
-  roles:
-- { role: ambari_slave }
-- { role: metron-builder, tags: ['build'] }
--- End diff --

This is the cause of the second build.  

The process was a bit more complex when Quick Dev was around because it 
would launch the Quick Dev image, then rebuild and try to push out new bits to 
Quick Dev.  Since we don't need that any longer, this can be simplified.


---


[GitHub] metron pull request #866: METRON-1349 Full Dev Builds Metron Twice

2017-12-13 Thread nickwallen
GitHub user nickwallen opened a pull request:

https://github.com/apache/metron/pull/866

METRON-1349 Full Dev Builds Metron Twice

Removing the "Quick Dev" environment in #852 had an unintended side effect. 
 It caused Metron to be built twice during the Full Dev deployment process.  
Unless you prefer a double-build for thoroughness, this can be annoying.

## Testing

Deploy Full Dev and ensure that Metron is not built twice.  Once Metron is 
deployed, login to Ambari and run the Metron Service Check.  If the service 
check passes, we've done a solid.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nickwallen/metron METRON-1349

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/866.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #866


commit f152326f4c401129f0b05d03045440e7cf5dda2b
Author: Nick Allen 
Date:   2017-12-13T17:11:59Z

METRON-1349 Full Dev Builds Metron Twice




---


[GitHub] metron issue #857: METRON-1340: Improve e2e tests for metron alerts

2017-12-13 Thread mmiklavc
Github user mmiklavc commented on the issue:

https://github.com/apache/metron/pull/857
  
Follow up from @merrimanr and my work yesterday. We upped the version of 
Node to 9.2.1. Per the doc, >8 is required to work with async/await. For good 
measure, I also set the NPM version to 5.6.0. We didn't touch Jasmine, but the 
Protractor docs also state that it should be > 2.7. Looks like we are currently 
using 2.5.2 per the package.json file. We may want to consider increasing that 
version as well.

We added `SELENIUM_PROMISE_MANAGER: false` to `protractor.conf.js` and 
immediately got failures due to `Promise` use in the Protractor tests and 
configuration. e.g. `var defer = protractor.promise.defer();`. So we removed 
references to promises in the conf file and were able to get past that first 
batch of errors. Now we were into problems with the tests. I started with the 
`login.e2e-spec.ts` spec file and removed `: Promise`. Running the tests 
again, the login tests were able to succeed.

There are still a large number of failures due to disabling the promise 
manager, but still having code throughout the test suite that leverages the 
older style. It's unclear if this will resolve all stability issues, but I 
think this is moving in the right direction.


---


[GitHub] metron pull request #862: METRON-1343: Swagger UI for User Controller needs ...

2017-12-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/862


---


[GitHub] metron issue #862: METRON-1343: Swagger UI for User Controller needs request...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/862
  
Please take care to mark the jira as done


---


[GitHub] metron pull request #866: METRON-1349 Full Dev Builds Metron Twice

2017-12-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/866#discussion_r156725657
  
--- Diff: metron-deployment/roles/ambari_config/tasks/main.yml ---
@@ -26,16 +26,15 @@
   retries: 5
   delay: 10
 
-- name : check if ambari-server is up on {{ ambari_host }}:{{ambari_port}}
+- name : Wait for Ambari to start; http://{{ ambari_host }}:{{ ambari_port 
}}
   wait_for :
 host: "{{ ambari_host }}"
 port: "{{ ambari_port }}"
-delay: 120
-timeout: 300
+timeout: 600
--- End diff --

There is no need to always wait 2 minutes for Ambari to be ready.  Most 
often it is already up and kicking by the time we get here.  Rather than have a 
forced delay, I just added the delay duration to the overall timeout parameter 
in case there is a delay in getting Ambari going.
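
To show the difference between a forced delay and a longer overall timeout, here is a minimal Java sketch of the polling behavior. It is an illustration only, not how Ansible's `wait_for` is implemented; `PortWaiter`, `waitForPort` and the port number in `main` are assumptions made for the example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical illustration, not Ansible's implementation: start polling a
// TCP port immediately and give up only once the overall timeout elapses.
final class PortWaiter {

  private PortWaiter() {}

  // Returns true as soon as a TCP connection to host:port succeeds, or
  // false if timeoutMillis elapses (or the thread is interrupted) first.
  static boolean waitForPort(String host, int port, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      try (Socket socket = new Socket()) {
        // Cap each individual connection attempt at one second.
        socket.connect(new InetSocketAddress(host, port), 1000);
        return true; // the port is accepting connections
      } catch (IOException e) {
        // Not reachable yet; retry until the deadline passes.
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Port 8080 and the 5 second timeout are assumed values for a quick run.
    boolean up = waitForPort("localhost", 8080, 5_000L, 500L);
    System.out.println(up ? "port is up" : "timed out");
  }
}
```

With no initial delay, the call returns on the first successful connection when the service is already up, and the larger timeout only matters when the service is slow to start.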


---


[GitHub] metron issue #863: METRON-1347: Indexing Topology should fail tuples without...

2017-12-13 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/863
  
Actually, I don't think `original_string` is required past the parser 
topology.  For instance, profiler messages into enrichment do not have 
`original_string`.


---


[GitHub] metron pull request #859: METRON-1345: Update EC2 README for custom Ansible ...

2017-12-13 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/859#discussion_r156723676
  
--- Diff: metron-deployment/roles/ambari_config/vars/small_cluster.yml ---
@@ -87,6 +87,8 @@ configurations:
   topology.classpath: '{{ topology_classpath }}'
   - kafka-broker:
   log.dirs: '{{ kafka_log_dirs | default("/kafka-log") }}'
--- End diff --

Just pushed a change. How's that @ottobackwards?


---


[GitHub] metron pull request #866: METRON-1349 Full Dev Builds Metron Twice

2017-12-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/866#discussion_r156724938
  
--- Diff: metron-deployment/roles/epel/tasks/main.yml ---
@@ -16,6 +16,4 @@
 #
 ---
 - name: Install EPEL repository
-  yum: name=epel-release update_cache=yes
-
-
+  yum: name=epel-release
--- End diff --

There is no need to force a cache update here.  This role gets run 
repetitively and forcing a cache update just slows us down.


---


[GitHub] metron pull request #859: METRON-1345: Update EC2 README for custom Ansible ...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/859#discussion_r156751241
  
--- Diff: metron-deployment/amazon-ec2/README.md ---
@@ -126,6 +126,10 @@ To provision only subsets of the entire Metron 
deployment, Ansible tags can be s
 ./run.sh --tags="ec2,sensors"
 ```
 
+### Setting REST API Profile
+
--- End diff --

Just linking to the other doc, without the user knowing even a little of 
why they need to look at it is not much better.  I think my blurb is 
appropriate.


---


[GitHub] metron pull request #859: METRON-1345: Update EC2 README for custom Ansible ...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/859#discussion_r156737230
  
--- Diff: metron-deployment/amazon-ec2/README.md ---
@@ -126,6 +126,10 @@ To provision only subsets of the entire Metron 
deployment, Ansible tags can be s
 ./run.sh --tags="ec2,sensors"
 ```
 
+### Setting REST API Profile
+
--- End diff --

Can we say something for people like me..  along the lines of
"Spring profiles are used for x,y,z.  By default the dev profile is 
selected.  Change the values of  For more information on setting the 
profiles and their use please see 


---


[GitHub] metron pull request #866: METRON-1349 Full Dev Builds Metron Twice

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/866#discussion_r156737918
  
--- Diff: metron-deployment/playbooks/metron_install.yml ---
@@ -15,13 +15,6 @@
 #  limitations under the License.
 #
 ---
-- hosts: metron
-  become: true
-  roles:
-- { role: ambari_slave }
-- { role: metron-builder, tags: ['build'] }
--- End diff --

Will --ansible-skip-tags="build"  still keep it from building at all?


---


[GitHub] metron pull request #859: METRON-1345: Update EC2 README for custom Ansible ...

2017-12-13 Thread merrimanr
Github user merrimanr commented on a diff in the pull request:

https://github.com/apache/metron/pull/859#discussion_r156744095
  
--- Diff: metron-deployment/amazon-ec2/README.md ---
@@ -126,6 +126,10 @@ To provision only subsets of the entire Metron 
deployment, Ansible tags can be s
 ./run.sh --tags="ec2,sensors"
 ```
 
+### Setting REST API Profile
+
--- End diff --

Spring profiles are documented in the REST README:  
https://github.com/apache/metron/tree/master/metron-interface/metron-rest#spring-profiles.
  

Is there something we can do to make the REST README more accessible?  I 
feel like a lot of questions people ask are already answered there but no one 
ever reads it.  What can we do to make it more useful?  Table of contents 
maybe?  I would be happy to take that on in a follow-up PR.


---


[GitHub] metron pull request #859: METRON-1345: Update EC2 README for custom Ansible ...

2017-12-13 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/859#discussion_r156748130
  
--- Diff: metron-deployment/amazon-ec2/README.md ---
@@ -126,6 +126,10 @@ To provision only subsets of the entire Metron 
deployment, Ansible tags can be s
 ./run.sh --tags="ec2,sensors"
 ```
 
+### Setting REST API Profile
+
--- End diff --

That's what I linked to in that README change @merrimanr and 
@ottobackwards. I didn't want to duplicate the REST docs, but agree with Otto 
about having a reference there.


---


Metron - Emailing Alerts

2017-12-13 Thread Ahmed Shah
Hello,
Just wondering if Metron has a feature to email alerts based on rules that a 
user defines.

Example:
Rule A: Email the user 1...@1.com whenever ip_src_addr=100.2.10.*
Rule B: Email the user 1...@1.com whenever payload contains "critical"

If not, does anyone have any recommendations on where to code these rules in 
the Metron stack that uses attributes from the GROK parser?
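
To make the two example rules concrete, here is a minimal sketch of them as predicates over a parsed message, assuming the message is available as a map of field names to values. The class and method names (`EmailRuleSketch`, `Rule`, `matchingRules`) and the recipient address are hypothetical; nothing here is an existing Metron API, and the `100.2.10.*` pattern is read as a simple prefix match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: EmailRuleSketch and Rule are illustration-only names,
// not Metron classes.
class EmailRuleSketch {

  /** A named condition over the parsed message fields, plus who to notify. */
  static final class Rule {
    final String name;
    final String recipient;
    final Predicate<Map<String, Object>> condition;

    Rule(String name, String recipient, Predicate<Map<String, Object>> condition) {
      this.name = name;
      this.recipient = recipient;
      this.condition = condition;
    }
  }

  /** Null-safe string view of a field; a missing field becomes "". */
  static String field(Map<String, Object> message, String key) {
    Object value = message.get(key);
    return value == null ? "" : value.toString();
  }

  // The recipient is a placeholder; the address in the thread is obscured.
  static final List<Rule> RULES = Arrays.asList(
      // Rule A: ip_src_addr=100.2.10.* read as a prefix match on the address.
      new Rule("A", "analyst@example.com",
          m -> field(m, "ip_src_addr").startsWith("100.2.10.")),
      // Rule B: payload contains "critical".
      new Rule("B", "analyst@example.com",
          m -> field(m, "payload").contains("critical")));

  /** Names of every rule whose condition holds for the message. */
  static List<String> matchingRules(Map<String, Object> message) {
    return RULES.stream()
        .filter(rule -> rule.condition.test(message))
        .map(rule -> rule.name)
        .collect(Collectors.toList());
  }
}
```

Where such predicates would run in the stack is exactly the open question; the sketch only shows the shape of the rules.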


-Ahmed
___
Ahmed Shah (PMP, M. Eng.)
Cybersecurity Analyst & Developer
GCR - Cybersecurity Operations Center
Carleton University - cugcr.com


[GitHub] metron pull request #859: METRON-1345: Update EC2 README for custom Ansible ...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/859#discussion_r156755792
  
--- Diff: metron-deployment/amazon-ec2/README.md ---
@@ -126,6 +126,10 @@ To provision only subsets of the entire Metron 
deployment, Ansible tags can be s
 ./run.sh --tags="ec2,sensors"
 ```
 
+### Setting REST API Profile
+
--- End diff --

Reading that back, it seems a little stronger than I intended.  Sorry.  I 
don't think everyone following these deployment steps is necessarily going to 
know why they need to start tripping through readme land without some context.  
We have a lot of people trying deployment and having problems who are not as 
expert


---


Re: [DISCUSS] Integration/e2e test infrastructure requirements

2017-12-13 Thread Otto Fowler
What is the Master Jira going to be?



On December 13, 2017 at 14:36:50, Ryan Merriman (merrim...@gmail.com) wrote:

I am going to start the process of creating Jiras out of these initial
requirements. I agree with them and think they are a good starting point.
Feel free to join in at anytime and add/change/remove requirements as
needed. I will update the thread once I have the initial Jiras created and
we can go from there.

On Mon, Dec 11, 2017 at 4:10 PM, Ryan Merriman  wrote:

> The purpose of this discussion is to map out what is required to get the POC
> started with https://github.com/apache/metron/pull/858 into master.
>
> The following features were added in the previously mentioned PR:
>
> - Dockerfile for Metron REST
> - Dockerfile for Metron UIs
> - Docker Compose application including Metron images, Elasticsearch,
> Kafka, Zookeeper
> - Modified travis file that manages the Docker environment and runs
> the e2e tests as part of the build
> - Maven pom.xml that installs all the required assets into the Docker
> e2e module
> - Modified metron-alerts pom.xml that allows e2e tests to be run
> through Maven
> - An example integration test that has been converted to use the new
> infrastructure
>
> Here are the initial features proposed for acceptance into master:
>
> - All e2e and integration tests run on common infrastructure.
> - All e2e and integration tests are run automatically in the Travis
> build.
> - All e2e and integration tests run repeatably and reliably in the
> Travis build.
> - Debugging options are available and documented.
> - The new infra and how to interact with it is documented.
> - Old infrastructure removed (anything unused or commented out is
> deleted, instead of staying).
>
> Are there other requirements people want to add to this list?
>
>
>
>


Re: [DISCUSS] Integration/e2e test infrastructure requirements

2017-12-13 Thread Ryan Merriman
I'm open to ideas.  What do you think the title should be?

On Wed, Dec 13, 2017 at 2:13 PM, Otto Fowler 
wrote:

> What is the Master Jira going to be?
>
>
>
> On December 13, 2017 at 14:36:50, Ryan Merriman (merrim...@gmail.com)
> wrote:
>
> I am going to start the process of creating Jiras out of these initial
> requirements. I agree with them and think they are a good starting point.
> Feel free to join in at anytime and add/change/remove requirements as
> needed. I will update the thread once I have the initial Jiras created and
> we can go from there.
>
> On Mon, Dec 11, 2017 at 4:10 PM, Ryan Merriman 
> wrote:
>
> > The purpose of this discussion is to map out what is required to get the
> POC
> > started with https://github.com/apache/metron/pull/858 into master.
> >
> > The following features were added in the previously mentioned PR:
> >
> > - Dockerfile for Metron REST
> > - Dockerfile for Metron UIs
> > - Docker Compose application including Metron images, Elasticsearch,
> > Kafka, Zookeeper
> > - Modified travis file that manages the Docker environment and runs
> > the e2e tests as part of the build
> > - Maven pom.xml that installs all the required assets into the Docker
> > e2e module
> > - Modified metron-alerts pom.xml that allows e2e tests to be run
> > through Maven
> > - An example integration test that has been converted to use the new
> > infrastructure
> >
> > Here are the initial features proposed for acceptance into master:
> >
> > - All e2e and integration tests run on common infrastructure.
> > - All e2e and integration tests are run automatically in the Travis
> > build.
> > - All e2e and integration tests run repeatably and reliably in the
> > Travis build.
> > - Debugging options are available and documented.
> > - The new infra and how to interact with it is documented.
> > - Old infrastructure removed (anything unused or commented out is
> > deleted, instead of staying).
> >
> > Are there other requirements people want to add to this list?
> >
> >
> >
> >
>
>


[GitHub] metron pull request #866: METRON-1349 Full Dev Builds Metron Twice

2017-12-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/866#discussion_r156746647
  
--- Diff: metron-deployment/playbooks/metron_install.yml ---
@@ -15,13 +15,6 @@
 #  limitations under the License.
 #
 ---
-- hosts: metron
-  become: true
-  roles:
-- { role: ambari_slave }
-- { role: metron-builder, tags: ['build'] }
--- End diff --

Yes


---


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/867

METRON-1350: Add reservoir sampling functions to Stellar

## Contributor Comments
Sampling capabilities would fit very well with the profiler and enable 
algorithms that do not necessarily support our existing probabilistic sketches. 
We should add a reservoir sampler and utilities to merge and resample.

You can play with `SAMPLE_INIT`, `SAMPLE_ADD`, `SAMPLE_MERGE` and 
`SAMPLE_GET` via the REPL:
```
[Stellar]>>> ?SAMPLE_ADD
SAMPLE_ADD
Description: Add to a sample

Arguments:
sampler - Sampler to use.  If null, then a default Uniform sampler is 
created
o - The value to add.  If o is an Iterable, then each item is added.

Returns:
[Stellar]>>> s_10 := SAMPLE_INIT(10)
[Stellar]>>> sample := REDUCE( [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ], (s, 
x) -> SAMPLE_ADD(s, x), SAMPLE_INIT(5))
[Stellar]>>> SAMPLE_GET(sample)
[6, 8, 11, 4, 5]
[Stellar]>>> SAMPLE_ADD(s_10, [5, 2, 5, 7, 10 ])
org.apache.metron.statistics.sampling.UniformSampler@3d8d06c0
[Stellar]>>> SAMPLE_GET(SAMPLE_ADD(s_10, [5, 2, 5, 7, 10 ]))
[5, 2, 5, 7, 10, 5, 2, 5, 7, 10]
```
## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && build_utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [x] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and the verify changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron sampling

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #867


commit 7e1a19e29f86a23140aa46291f0083409ddac40d
Author: cstella 
Date:   2017-12-13T20:59:40Z

METRON-1350: Add reservoir sampling functions to Stellar




---


Re: Metron - Emailing Alerts

2017-12-13 Thread James Sirota
I agree with Simon.  If you email each alert individually you will be 
overwhelmed.  I think a better idea would be to email alert summaries 
periodically, which is more manageable.  This is probably a feature worthy of 
consideration for Metron. 

13.12.2017, 12:19, "Simon Elliston Ball" :
> Metron generates alerts onto a Kafka queue, which can be used to integrate 
> with Alert management tools, usually some sort of existing alert aggregation 
> tool.
>
> An alternative approach common with this is to have a tool like Apache NiFi 
> attach to the Metron alert feed and send email.
>
> The solution here would be to have Metron generate alerts (by adding the 
> is_alert: true flag in the enrichment process) and possibly other flags like 
> alert_email for example, and then have NiFi use ConsumeKafka and then filter 
> out the alert only messages in NiFi to use the PutEmail processor (probably 
> with a ControlRate before it too).
>
> Something I would caution is that email is not a great way to manage or send 
> alerts at the volume likely to occur in network monitoring tools. A spike in 
> network traffic can lead to a very large number of emails, which tends to 
> then cause you bigger problems. As such we usually find people want some sort 
> of buffering or aggregation of alerts, hence the use of an alert management 
> or ticketing solution in front.
>
> Simon
>
>>  On 13 Dec 2017, at 19:06, Ahmed Shah  wrote:
>>
>>  Hello,
>>  Just wondering if Metron has a feature to email alerts based on rules that 
>> a user defines.
>>
>>  Example:
>>  Rule A: Email the user 1...@1.com whenever ip_src_addr=100.2.10.*
>>  Rule B: Email the user 1...@1.com whenever payload contains "critical"
>>
>>  If not, does anyone have any recommendations on where to code these rules 
>> in the Metron stack that uses attributes from the GROK parser?
>>
>>  -Ahmed
>>  ___
>>  Ahmed Shah (PMP, M. Eng.)
>>  Cybersecurity Analyst & Developer
>>  GCR - Cybersecurity Operations Center
>>  Carleton University - cugcr.com

--- 
Thank you,

James Sirota
PMC- Apache Metron
jsirota AT apache DOT org


Re: [DISCUSS] Integration/e2e test infrastructure requirements

2017-12-13 Thread Ryan Merriman
I am going to start the process of creating Jiras out of these initial
requirements.  I agree with them and think they are a good starting point.
Feel free to join in at anytime and add/change/remove requirements as
needed.  I will update the thread once I have the initial Jiras created and
we can go from there.

On Mon, Dec 11, 2017 at 4:10 PM, Ryan Merriman  wrote:

> The purpose of this discussion is to map out what is required to get the POC
> started with https://github.com/apache/metron/pull/858 into master.
>
> The following features were added in the previously mentioned PR:
>
>- Dockerfile for Metron REST
>- Dockerfile for Metron UIs
>- Docker Compose application including Metron images, Elasticsearch,
>Kafka, Zookeeper
>- Modified travis file that manages the Docker environment and runs
>the e2e tests as part of the build
>- Maven pom.xml that installs all the required assets into the Docker
>e2e module
>- Modified metron-alerts pom.xml that allows e2e tests to be run
>through Maven
>- An example integration test that has been converted to use the new
>infrastructure
>
> Here are the initial features proposed for acceptance into master:
>
>- All e2e and integration tests run on common infrastructure.
>- All e2e and integration tests are run automatically in the Travis
>build.
>- All e2e and integration tests run repeatably and reliably in the
>Travis build.
>- Debugging options are available and documented.
>- The new infra and how to interact with it is documented.
>- Old infrastructure removed (anything unused or commented out is
>deleted, instead of staying).
>
> Are there other requirements people want to add to this list?
>
>
>
>


[GitHub] metron issue #867: METRON-1350: Add reservoir sampling functions to Stellar

2017-12-13 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/867
  
Sorry, I am not sure I understand. This does random replacement once the 
size limit has been reached.  Am I misunderstanding your question?
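
For readers following along, "random replacement after the size limit" can be sketched as the textbook uniform reservoir scheme below. This is an illustration under stated assumptions, not the `UniformSampler` from the pull request: the class name `ReservoirSketch`, the seeded constructor, and the `int` counter are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: this is not the UniformSampler class from the pull request.
class ReservoirSketch<T> {
  private final int capacity;
  private final List<T> reservoir;
  private final Random random;
  private int seen = 0; // items offered so far (an int keeps the sketch simple)

  ReservoirSketch(int capacity, long seed) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
    this.reservoir = new ArrayList<>(capacity);
    this.random = new Random(seed);
  }

  void add(T item) {
    seen++;
    if (reservoir.size() < capacity) {
      // Below the size limit every item is kept.
      reservoir.add(item);
      return;
    }
    // At the limit: draw a slot in [0, seen). The new item replaces an
    // existing one only if the slot lands inside the reservoir, which
    // happens with probability capacity / seen.
    int slot = random.nextInt(seen);
    if (slot < capacity) {
      reservoir.set(slot, item);
    }
  }

  /** Copy of the current sample. */
  List<T> get() {
    return new ArrayList<>(reservoir);
  }
}
```

With this scheme every item offered so far has the same chance of being in the sample, which is what makes the sampler "uniform".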


---


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156790508
  
--- Diff: 
metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/sampling/SamplingInitFunctions.java
 ---
@@ -0,0 +1,89 @@
+/*
+ *
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ */
+package org.apache.metron.statistics.sampling;
+
+import org.apache.metron.stellar.common.utils.ConversionUtils;
+import org.apache.metron.stellar.dsl.Context;
+import org.apache.metron.stellar.dsl.ParseException;
+import org.apache.metron.stellar.dsl.Stellar;
+import org.apache.metron.stellar.dsl.StellarFunction;
+
+import java.util.List;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class SamplingInitFunctions {
+
+  @Stellar(namespace="SAMPLE"
+  ,name="INIT"
+  ,description="Create a uniform reservoir sampler of a specific 
size or, if unspecified, size " + Sampler.DEFAULT_SIZE
+  ,params = {
+"size? - The size of the reservoir sampler.  If unspecified, 
the size is " + Sampler.DEFAULT_SIZE
+  }
+  ,returns="The sampler object."
+  )
+
+  public static class UniformSamplerInit implements StellarFunction {
+@Override
+public Object apply(List<Object> args, Context context) throws 
ParseException {
+  if(args.size() == 0) {
+return new UniformSampler();
+  }
+  else {
+Optional<Integer> sizeArg = get(args, 0, "Size", Integer.class);
+if(sizeArg.isPresent() && sizeArg.get() <= 0) {
+  throw new IllegalStateException("Size must be a positive 
integer");
+}
+else {
+  return new UniformSampler(sizeArg.orElse(Sampler.DEFAULT_SIZE));
+}
+  }
+}
+
+@Override
+public void initialize(Context context) {
+}
+
+@Override
+public boolean isInitialized() {
+  return true;
+}
+  }
+
+
+  public static <T> Optional<T> get(List<Object> args, int offset, String 
argName, Class<T> expectedClazz) {
+Object obj = args.get(offset);
+T ret = ConversionUtils.convert(obj, expectedClazz);
--- End diff --

Couldn't this be simplified to :

```java 
// A null result with a non-null input is treated as a failed conversion.
if (ret == null) {
  if (obj != null) {
    throw new IllegalStateException(argName + " argument (" + obj
        + ") is expected to be an " + expectedClazz.getName()
        + ", but was " + obj);
  }
}
return Optional.ofNullable(ret);
} // closes get()
```


---


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156788055
  
--- Diff: metron-analytics/metron-statistics/README.md ---
@@ -53,6 +53,32 @@ functions can be used from everywhere where Stellar is 
used.
   * bounds - A list of value bounds (excluding min and max) in sorted 
order.
 * Returns: Which bin N the value falls in such that bound(N-1) < value <= 
bound(N).  No min and max bounds are provided, so values smaller than the 0'th 
bound go in the 0'th bin, and values greater than the last bound go in the M'th 
bin.
 
+### Sampling Functions
+
+ `SAMPLE_ADD`
+* Description: Add a value or collection of values to a sampler.
+* Input:
--- End diff --

This makes it seem like Uniform sampler is a 'known' thing.  But it is not, 
either by explanation or reference to where it is explained ( as we have done 
referring to algorithms before ).
Is there another type of sampler?

Somewhere ( I'm not sure where ) we should say:
"A sampler is a x that is | does | acts as  x for the sample 
functions. The default has these properties, but you can override that in init"

Why even mention the Uniform sampler?





---


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156792855
  
--- Diff: metron-analytics/metron-statistics/README.md ---
@@ -53,6 +53,32 @@ functions can be used from everywhere where Stellar is 
used.
   * bounds - A list of value bounds (excluding min and max) in sorted 
order.
 * Returns: Which bin N the value falls in such that bound(N-1) < value <= 
bound(N).  No min and max bounds are provided, so values smaller than the 0'th 
bound go in the 0'th bin, and values greater than the last bound go in the M'th 
bin.
 
+### Sampling Functions
+
+ `SAMPLE_ADD`
+* Description: Add a value or collection of values to a sampler.
+* Input:
--- End diff --

Ok, It seemed like the Uniform implementation was leaking


---


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156794655
  
--- Diff: metron-analytics/metron-statistics/README.md ---
@@ -53,6 +53,32 @@ functions can be used from everywhere where Stellar is 
used.
   * bounds - A list of value bounds (excluding min and max) in sorted 
order.
 * Returns: Which bin N the value falls in such that bound(N-1) < value <= 
bound(N).  No min and max bounds are provided, so values smaller than the 0'th 
bound go in the 0'th bin, and values greater than the last bound go in the M'th 
bin.
 
+### Sampling Functions
+
+ `SAMPLE_ADD`
+* Description: Add a value or collection of values to a sampler.
+* Input:
--- End diff --

There are definitely other types of reservoir samplers which we will 
probably want.  Most specifically a sampler that is biased toward recency (so 
non-uniform in that case).
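
As a rough illustration of what "biased toward recency" could mean, here is one simple scheme, offered as a sketch and not as anything in this pull request: every new item is always admitted, and it evicts a random existing item with probability equal to the current fill fraction. Once the reservoir is full, each stored item survives an insertion with probability 1 - 1/capacity, so older items become progressively less likely to remain. The class name `RecencyBiasedReservoirSketch` and its constructor are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a recency-biased reservoir; not part of the PR.
class RecencyBiasedReservoirSketch<T> {
  private final int capacity;
  private final List<T> items;
  private final Random random;

  RecencyBiasedReservoirSketch(int capacity, long seed) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
    this.items = new ArrayList<>(capacity);
    this.random = new Random(seed);
  }

  void add(T item) {
    // Fraction of the reservoir currently in use, in [0, 1].
    double fill = (double) items.size() / capacity;
    if (random.nextDouble() < fill) {
      // Evict a random existing item to make room for the new one.
      // When the reservoir is full, fill is 1.0 and this branch is always taken.
      items.set(random.nextInt(items.size()), item);
    } else {
      // Otherwise grow the reservoir; this only happens while it is not full.
      items.add(item);
    }
  }

  /** Copy of the current sample. */
  List<T> get() {
    return new ArrayList<>(items);
  }
}
```

Unlike the uniform case, the newest item is always present in the sample, which is the trade-off a recency-biased sampler makes.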


---


Re: Metron - Emailing Alerts

2017-12-13 Thread James Sirota
I think there may be gaps in doing it with the profiler.  You can record stats 
and counts of different alert types, and maybe even alert ids, but you can't 
cross-correlate these IDs to the alert body.  At least not in the profiler.  I 
was thinking about emailing something that looks like a zeppelin report.  You 
would run it in a cron, export to PDF, and send that out as a summary.  It can 
be a simple list of alerts that match your rule, or it can have aggregations, 
graphics, metrics, KPI screens, etc.  That would be the feature that I would 
want to discuss and flesh out

Thanks,
James 

13.12.2017, 14:26, "Simon Elliston Ball" :
> We can already do that with profiles I would have thought. Create a profile 
> that only picks alerts and then base your emails only from the alert events 
> produced by that profile. Would that create the right batching mechanism (at 
> a cost of possible higher latency than you might get with a more specific 
> alert batcher?)
>
> Simon
>
>>  On 13 Dec 2017, at 21:23, James Sirota  wrote:
>>
>>  I agree with Simon. If you email each alert individually you will be 
>> overwhelmed. I think a better idea would be to email alert summaries 
>> periodically, which is more manageable. This is probably a feature worthy of 
>> consideration for Metron.
>>
>>  13.12.2017, 12:19, "Simon Elliston Ball" :
>>>  Metron generates alerts onto a Kafka queue, which can be used to integrate 
>>> with Alert management tools, usually some sort of existing alert 
>>> aggregation tool.
>>>
>>>  An alternative approach common with this is to have a tool like Apache 
>>> NiFi attach to the Metron alert feed and send email.
>>>
>>>  The solution here would be to have Metron generate alerts (by adding the 
>>> is_alert: true flag in the enrichment process) and possibly other flags 
>>> like alert_email for example, and then have NiFi use ConsumeKafka and then 
>>> filter out the alert only messages in NiFi to use the PutEmail processor 
>>> (probably with a ControlRate before it too).
>>>
>>>  Something I would caution is that email is not a great way to manage or 
>>> send alerts at the volume likely to occur in network monitoring tools. A 
>>> spike in network traffic can lead to a very large number of emails, which 
>>> tends to then cause you bigger problems. As such we usually find people 
>>> want some sort of buffering or aggregation of alerts, hence the use of an 
>>> alert management or ticketing solution in front.
>>>
>>>  Simon
>>>
>>>>  On 13 Dec 2017, at 19:06, Ahmed Shah  wrote:
>>>>
>>>>  Hello,
>>>>  Just wondering if Metron has a feature to email alerts based on rules 
>>>> that a user defines.
>>>>
>>>>  Example:
>>>>  Rule A: Email the user 1...@1.com whenever ip_src_addr=100.2.10.*
>>>>  Rule B: Email the user 1...@1.com whenever payload contains "critical"
>>>>
>>>>  If not, does anyone have any recommendations on where to code these 
>>>> rules in the Metron stack that uses attributes from the GROK parser?
>>>>
>>>>  -Ahmed
>>>>  ___
>>>>  Ahmed Shah (PMP, M. Eng.)
>>>>  Cybersecurity Analyst & Developer
>>>>  GCR - Cybersecurity Operations Center
>>>>  Carleton University - cugcr.com
>>
>>  ---
>>  Thank you,
>>
>>  James Sirota
>>  PMC- Apache Metron
>>  jsirota AT apache DOT org

--- 
Thank you,

James Sirota
PMC- Apache Metron
jsirota AT apache DOT org


Re: Metron - Emailing Alerts

2017-12-13 Thread Simon Elliston Ball
That makes a lot of sense, especially if you wanted the detail in the email as 
well. We could definitely use some good "reporting of alerts" functionality 
that would make something like that work. What do people think?

Simon

> On 13 Dec 2017, at 21:52, James Sirota  wrote:
> 
> I think there may be gaps in doing it with the profiler.  You can record 
> stats and counts of different alert types, and maybe even alert ids, but you 
> can't cross-correlate these IDs to the alert body.  At least not in the 
> profiler.  I was thinking about emailing something that looks like a zeppelin 
> report.  You would run it in a cron, export to PDF, and send that out as a 
> summary.  It can be a simple list of alerts that match your rule, or it can 
> have aggregations, graphics, metrics, KPI screens, etc.  That would be the 
> feature that I would want to discuss and flesh out
> 
> Thanks,
> James 
> 
> 13.12.2017, 14:26, "Simon Elliston Ball" :
>> We can already do that with profiles I would have thought. Create a profile 
>> that only picks alerts and then base your emails only from the alert events 
>> produced by that profile. Would that create the right batching mechanism (at 
>> a cost of possible higher latency than you might get with a more specific 
>> alert batcher?)
>> 
>> Simon
>> 
>>>  On 13 Dec 2017, at 21:23, James Sirota  wrote:
>>> 
>>>  I agree with Simon. If you email each alert individually you will be 
>>> overwhelmed. I think a better idea would be to email alert summaries 
>>> periodically, which is more manageable. This is probably a feature worthy 
>>> of consideration for Metron.
>>> 
>>>  13.12.2017, 12:19, "Simon Elliston Ball" :
>>>>  Metron generates alerts onto a Kafka queue, which can be used to 
>>>> integrate with Alert management tools, usually some sort of existing alert 
>>>> aggregation tool.
>>>>
>>>>  An alternative approach common with this is to have a tool like Apache 
>>>> NiFi attach to the Metron alert feed and send email.
>>>>
>>>>  The solution here would be to have Metron generate alerts (by adding the 
>>>> is_alert: true flag in the enrichment process) and possibly other flags 
>>>> like alert_email for example, and then have NiFi use ConsumeKafka and then 
>>>> filter out the alert only messages in NiFi to use the PutEmail processor 
>>>> (probably with a ControlRate before it too).
>>>>
>>>>  Something I would caution is that email is not a great way to manage or 
>>>> send alerts at the volume likely to occur in network monitoring tools. A 
>>>> spike in network traffic can lead to a very large number of emails, which 
>>>> tends to then cause you bigger problems. As such we usually find people 
>>>> want some sort of buffering or aggregation of alerts, hence the use of 
>>>> an alert management or ticketing solution in front.
>>>>
>>>>  Simon
>>>>
>>>>>   On 13 Dec 2017, at 19:06, Ahmed Shah  
>>>>> wrote:
>>>>> 
>>>>>   Hello,
>>>>>   Just wondering if Metron has a feature to email alerts based on rules 
>>>>> that a user defines.
>>>>> 
>>>>>   Example:
>>>>>   Rule A: Email the user 1...@1.com whenever ip_src_addr=100.2.10.*
>>>>>   Rule B: Email the user 1...@1.com whenever payload contains "critical"
>>>>> 
>>>>>   If not, does anyone have any recommendations on where to code these 
>>>>> rules in the Metron stack that uses attributes from the GROK parser?
>>>>> 
>>>>>   -Ahmed
>>>>>   ___
>>>>>   Ahmed Shah (PMP, M. Eng.)
>>>>>   Cybersecurity Analyst & Developer
>>>>>   GCR - Cybersecurity Operations Center
>>>>>   Carleton University - cugcr.com
>>> 
>>>  ---
>>>  Thank you,
>>> 
>>>  James Sirota
>>>  PMC- Apache Metron
>>>  jsirota AT apache DOT org
> 
> --- 
> Thank you,
> 
> James Sirota
> PMC- Apache Metron
> jsirota AT apache DOT org



Re: Metron - Emailing Alerts

2017-12-13 Thread Otto Fowler
While a summary of _any_ Metron data ( perhaps by query etc ) would be good,
let us not lose sight of the OP’s issue.  Even with a summary|digest or one
at a time, they are looking to send mail to certain people based on
rules.

A pseudo path may be

INDEXING -> New Topology or ?? -> evaluate rules -> bin matches to batches
per destination -> create digest from bins and send on batch size or
timeout ( as the bulk writer does )

I’m sure there is something wrong with this, but it is easier to frame it
in the way we do it now, and then work from there for me.
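
To frame the "bin matches per destination, flush on batch size or timeout" part of that path, here is a rough sketch under stated assumptions. All names (`AlertDigestSketch`, `Digest`, `add`, `flushExpired`) are hypothetical, alerts are reduced to plain strings, the clock is passed in by the caller, and actually sending the mail is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these names exist in Metron.
class AlertDigestSketch {

  /** One outgoing digest: a destination and the alerts batched for it. */
  static final class Digest {
    final String recipient;
    final List<String> alerts;

    Digest(String recipient, List<String> alerts) {
      this.recipient = recipient;
      this.alerts = alerts;
    }
  }

  private final int batchSize;
  private final long timeoutMillis;
  // Open bin per destination, and the time each bin received its first alert.
  private final Map<String, List<String>> bins = new LinkedHashMap<>();
  private final Map<String, Long> openedAt = new LinkedHashMap<>();

  AlertDigestSketch(int batchSize, long timeoutMillis) {
    this.batchSize = batchSize;
    this.timeoutMillis = timeoutMillis;
  }

  /** Adds a matched alert to its destination's bin; returns a digest once the bin is full. */
  Optional<Digest> add(String recipient, String alert, long nowMillis) {
    bins.computeIfAbsent(recipient, key -> new ArrayList<>()).add(alert);
    openedAt.putIfAbsent(recipient, nowMillis);
    if (bins.get(recipient).size() >= batchSize) {
      return Optional.of(drain(recipient));
    }
    return Optional.empty();
  }

  /** Meant to be called on a periodic tick: drains every bin open for at least the timeout. */
  List<Digest> flushExpired(long nowMillis) {
    List<Digest> digests = new ArrayList<>();
    // Iterate over a copy because drain() modifies the maps.
    for (String recipient : new ArrayList<>(openedAt.keySet())) {
      if (nowMillis - openedAt.get(recipient) >= timeoutMillis) {
        digests.add(drain(recipient));
      }
    }
    return digests;
  }

  private Digest drain(String recipient) {
    openedAt.remove(recipient);
    return new Digest(recipient, bins.remove(recipient));
  }
}
```

Passing the time in explicitly keeps the sketch deterministic; a real topology would presumably drive `flushExpired` from some timer or tick mechanism.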



On December 13, 2017 at 16:55:35, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

That makes a lot of sense, especially if you wanted the detail in the email
as well. We could definitely use some good "reporting of alerts"
functionality that would make something like that work. What do people
think?

Simon

> On 13 Dec 2017, at 21:52, James Sirota  wrote:
>
> I think there may be gaps in doing it with the profiler. You can record
stats and counts of different alert types, and maybe even alert ids, but
you can't cross-correlate these IDs to the alert body. At least not in the
profiler. I was thinking about emailing something that looks like a
zeppelin report. You would run it in a cron, export to PDF, and send that
out as a summary. It can be a simple list of alerts that match your rule,
or it can have aggregations, graphics, metrics, KPI screens, etc. That
would be the feature that I would want to discuss and flesh out
>
> Thanks,
> James
>
> 13.12.2017, 14:26, "Simon Elliston Ball" :
>> We can already do that with profiles I would have thought. Create a
profile that only picks alerts and then base your emails only from the
alert events produced by that profile. Would that create the right batching
mechanism (at a cost of possible higher latency than you might get with a
more specific alert batcher?)
>>
>> Simon
>>
>>> On 13 Dec 2017, at 21:23, James Sirota  wrote:
>>>
>>> I agree with Simon. If you email each alert individually you will be
overwhelmed. I think a better idea would be to email alert summaries
periodically, which is more manageable. This is probably a feature worthy
of consideration for Metron.
>>>
>>> 13.12.2017, 12:19, "Simon Elliston Ball" :
>>>> Metron generates alerts onto a Kafka queue, which can be used to
>>>> integrate with Alert management tools, usually some sort of existing alert
>>>> aggregation tool.
>>>>
>>>> An alternative approach common with this is to have a tool like Apache
>>>> NiFi attach to the Metron alert feed and send email.
>>>>
>>>> The solution here would be to have Metron generate alerts (by adding
>>>> the is_alert: true flag in the enrichment process) and possibly other flags
>>>> like alert_email for example, and then have NiFi use ConsumeKafka and then
>>>> filter out the alert only messages in NiFi to use the PutEmail processor
>>>> (probably with a ControlRate before it too).
>>>>
>>>> Something I would caution is that email is not a great way to manage
>>>> or send alerts at the volume likely to occur in network monitoring tools. A
>>>> spike in network traffic can lead to a very large number of emails, which
>>>> tends to then cause you bigger problems. As such we usually find people
>>>> want some sort of buffering or aggregation of alerts, hence the use of an
>>>> alert management or ticketing solution in front.
>>>>
>>>> Simon
>>>>
>>>>> On 13 Dec 2017, at 19:06, Ahmed Shah wrote:
>>>>>
>>>>> Hello,
>>>>> Just wondering if Metron has a feature to email alerts based on rules
>>>>> that a user defines.
>>>>>
>>>>> Example:
>>>>> Rule A: Email the user 1...@1.com whenever ip_src_addr=100.2.10.*
>>>>> Rule B: Email the user 1...@1.com whenever payload contains "critical"
>>>>>
>>>>> If not, does anyone have any recommendations on where to code these
>>>>> rules in the Metron stack that uses attributes from the GROK parser?
>>>>>
>>>>> -Ahmed
>>>>> ___
>>>>> Ahmed Shah (PMP, M. Eng.)
>>>>> Cybersecurity Analyst & Developer
>>>>> GCR - Cybersecurity Operations Center
>>>>> Carleton University - cugcr.com
>>>
>>> ---
>>> Thank you,
>>>
>>> James Sirota
>>> PMC- Apache Metron
>>> jsirota AT apache DOT org
>
> ---
> Thank you,
>
> James Sirota
> PMC- Apache Metron
> jsirota AT apache DOT org


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156799548
  
--- Diff: metron-analytics/metron-statistics/README.md ---
@@ -53,6 +53,32 @@ functions can be used from everywhere where Stellar is 
used.
   * bounds - A list of value bounds (excluding min and max) in sorted 
order.
 * Returns: Which bin N the value falls in such that bound(N-1) < value <= 
bound(N).  No min and max bounds are provided, so values smaller than the 0'th 
bound go in the 0'th bin, and values greater than the last bound go in the M'th 
bin.
 
+### Sampling Functions
+
+ `SAMPLE_ADD`
+* Description: Add a value or collection of values to a sampler.
+* Input:
--- End diff --

Then we'll have a get sample types method, like we do with other things 
like this right?


---


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156799950
  
--- Diff: 
metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/sampling/UniformSampler.java
 ---
@@ -0,0 +1,91 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.statistics.sampling;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+
+public class UniformSampler implements Sampler {
+  private List reservoir;
+  private int seen = 0;
+  private int size;
+  private Random rng = new Random(0);
+
+  public UniformSampler() {
+    this(DEFAULT_SIZE);
+  }
+
+  public UniformSampler(int size) {
+    this.size = size;
+    reservoir = new ArrayList<>(size);
+  }
+
+  @Override
+  public Iterable get() {
+    return reservoir;
+  }
+
+  /**
+   * Add an object to the reservoir
+   * @param o
+   */
+  public void add(Object o) {
+    if(o == null) {
+      return;
+    }
+    if (reservoir.size() < size) {
+      reservoir.add(o);
+    } else {
+      int rIndex = rng.nextInt(seen + 1);
--- End diff --

This makes me think that we need "namespace" scoped documentation


---


Re: Metron - Emailing Alerts

2017-12-13 Thread Simon Elliston Ball
We can already do that with profiles I would have thought. Create a profile 
that only picks alerts and then base your emails only from the alert events 
produced by that profile. Would that create the right batching mechanism (at a 
cost of possible higher latency than you might get with a more specific alert 
batcher?)

Simon 

> On 13 Dec 2017, at 21:23, James Sirota  wrote:
> 
> I agree with Simon.  If you email each alert individually you will be 
> overwhelmed.  I think a better idea would be to email alert summaries 
> periodically, which is more manageable.  This is probably a feature worthy of 
> consideration for Metron. 
> 
> 13.12.2017, 12:19, "Simon Elliston Ball" :
>> Metron generates alerts onto a Kafka queue, which can be used to integrate 
>> with Alert management tools, usually some sort of existing alert aggregation 
>> tool.
>> 
>> An alternative approach common with this is to have a tool like Apache NiFi 
>> attach to the Metron alert feed and send email.
>> 
>> The solution here would be to have Metron generate alerts (by adding the 
>> is_alert: true flag in the enrichment process) and possibly other flags like 
>> alert_email for example, and then have NiFi use ConsumeKafka and then filter 
>> out the alert only messages in NiFi to use the PutEmail processor (probably 
>> with a ControlRate before it too).
>> 
>> Something I would caution is that email is not a great way to manage or send 
>> alerts at the volume likely to occur in network monitoring tools. A spike in 
>> network traffic can lead to a very large number of emails, which tends to 
>> then cause you bigger problems. As such we usually find people want some 
>> sort of buffering or aggregation of alerts, hence the use of an alert 
>> management or ticketing solution in front.
>> 
>> Simon
>> 
>>>  On 13 Dec 2017, at 19:06, Ahmed Shah  wrote:
>>> 
>>>  Hello,
>>>  Just wondering if Metron has a feature to email alerts based on rules that 
>>> a user defines.
>>> 
>>>  Example:
>>>  Rule A: Email the user 1...@1.com whenever ip_src_addr=100.2.10.*
>>>  Rule B: Email the user 1...@1.com whenever payload contains "critical"
>>> 
>>>  If not, does anyone have any recommendations on where to code these rules 
>>> in the Metron stack that uses attributes from the GROK parser?
>>> 
>>>  -Ahmed
>>>  ___
>>>  Ahmed Shah (PMP, M. Eng.)
>>>  Cybersecurity Analyst & Developer
>>>  GCR - Cybersecurity Operations Center
>>>  Carleton University - cugcr.com
> 
> --- 
> Thank you,
> 
> James Sirota
> PMC- Apache Metron
> jsirota AT apache DOT org


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156790945
  
--- Diff: 
metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/sampling/UniformSampler.java
 ---
@@ -0,0 +1,91 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.statistics.sampling;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+
+public class UniformSampler implements Sampler {
+  private List reservoir;
+  private int seen = 0;
+  private int size;
+  private Random rng = new Random(0);
+
+  public UniformSampler() {
+    this(DEFAULT_SIZE);
+  }
+
+  public UniformSampler(int size) {
+    this.size = size;
+    reservoir = new ArrayList<>(size);
+  }
+
+  @Override
+  public Iterable get() {
+    return reservoir;
+  }
+
+  /**
+   * Add an object to the reservoir
+   * @param o
+   */
+  public void add(Object o) {
+    if(o == null) {
+      return;
+    }
+    if (reservoir.size() < size) {
+      reservoir.add(o);
+    } else {
+      int rIndex = rng.nextInt(seen + 1);
--- End diff --

Just so I'm clear, up to the reservoir size, we add to the reservoir.  When 
we're past the reservoir, we do a random replacement as per 
https://en.wikipedia.org/wiki/Reservoir_sampling
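
For reference, a minimal sketch of that fill-then-replace behaviour (uniform
reservoir sampling, "Algorithm R" on the Wikipedia page). This is not the code
from the PR: the class name ReservoirSketch and the seed parameter are made up
for illustration, and since the diff excerpt stops at the rIndex line, the
replacement test and the counter increment below are assumptions about how
such a sampler would typically continue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch of uniform reservoir sampling; names are hypothetical.
class ReservoirSketch {
  private final List<Object> reservoir;
  private final int size;
  private final Random rng;
  private int seen = 0; // number of non-null items offered so far

  ReservoirSketch(int size, long seed) {
    this.size = size;
    this.reservoir = new ArrayList<>(size);
    this.rng = new Random(seed);
  }

  void add(Object o) {
    if (o == null) {
      return;
    }
    if (reservoir.size() < size) {
      // Fill phase: keep every item until the reservoir is full.
      reservoir.add(o);
    } else {
      // Replacement phase: this is item number seen + 1. Pick a slot in
      // [0, seen]; the item is kept only if the slot lands inside the
      // reservoir, which happens with probability size / (seen + 1).
      int rIndex = rng.nextInt(seen + 1);
      if (rIndex < size) {
        reservoir.set(rIndex, o);
      }
    }
    seen++;
  }

  List<Object> get() {
    return reservoir;
  }
}
```

The reservoir never grows past its configured size, and every item offered so
far has the same chance of being in it.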


---


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread simonellistonball
Github user simonellistonball commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156791019
  
--- Diff: 
metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/sampling/UniformSampler.java
 ---
@@ -0,0 +1,91 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.statistics.sampling;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+
+public class UniformSampler implements Sampler {
+  private List reservoir;
+  private int seen = 0;
+  private int size;
+  private Random rng = new Random(0);
+
+  public UniformSampler() {
+    this(DEFAULT_SIZE);
+  }
+
+  public UniformSampler(int size) {
+    this.size = size;
+    reservoir = new ArrayList<>(size);
+  }
+
+  @Override
+  public Iterable get() {
+    return reservoir;
+  }
+
+  /**
+   * Add an object to the reservoir
+   * @param o
+   */
+  public void add(Object o) {
+    if(o == null) {
+      return;
+    }
+    if (reservoir.size() < size) {
+      reservoir.add(o);
+    } else {
+      int rIndex = rng.nextInt(seen + 1);
--- End diff --

you are 100% right, that's what I get for skim reading.


---


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156792461
  
--- Diff: metron-analytics/metron-statistics/README.md ---
@@ -53,6 +53,32 @@ functions can be used from everywhere where Stellar is 
used.
   * bounds - A list of value bounds (excluding min and max) in sorted 
order.
 * Returns: Which bin N the value falls in such that bound(N-1) < value <= 
bound(N).  No min and max bounds are provided, so values smaller than the 0'th 
bound go in the 0'th bin, and values greater than the last bound go in the M'th 
bin.
 
+### Sampling Functions
+
+ `SAMPLE_ADD`
+* Description: Add a value or collection of values to a sampler.
+* Input:
--- End diff --

Couldn't this be simplified to

```java
if (ret == null) {
  if (obj != null) {
    throw new IllegalStateException(argName + "argument(" + obj
        + " is expected to be an " + expectedClazz.getName()
        + ", but was " + obj);
  }
}
return Optional.ofNullable(ret);
```


---


Re: [DISCUSS] Community Meetings

2017-12-13 Thread James Sirota
I can set up a dedicated Zoom room with a recurrent meeting and give PMC 
members rights to the room.  I think hosting these meetings should not be a 
problem.  I would vote not to record them, but rather provide the notes after 
the meeting.  It's a lot easier to skim through the notes than jump around in a 
recording.  As Simon mentioned, I would also make it explicitly clear that the 
meetings are dev meetings.  These are not user Q&A and are not meant to be 
overviews of how different features of Metron work.  If we want to do feature 
demos or provide user content I would want that to be in its own separate 
meeting.

Thanks,
James 

13.12.2017, 05:00, "Otto Fowler" :
> I am ok with just notes and no recording.
>
> On December 13, 2017 at 04:37:20, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> Good points Larry, we would need to get consent from everyone on the call
> to record to properly comply with regulations in some countries. We would
> definitely need someone to step up as note taker.
>
> Something else to think about is intended audience. Previously we’ve had
> meetings like this which have been very detailed Dev@ focussed (which is a
> great thing) but have rather alienated participants in User@ land. We need
> to make it clear what level we’re talking about to be inclusive.
>
> Simon
>
>>  On 13 Dec 2017, at 00:44, larry mccay  wrote:
>>
>>  Not sure about posting the recordings - you will need to check and make
>>  sure that doesn't violate anything.
>>
>>  Just a friendly reminder...
>>  It is important that meetings have notes and a summary that is sent out
>>  describing topics to be decided on the mailing list.
>>  No decisions can be made in the community meeting itself - this gives
>>  others in other timezones and commitments review and voice in the
>>  decisions.
>>
>>  If it didn't happen on the mailing lists then it didn't happen. :)
>>
>>  On Tue, Dec 12, 2017 at 1:39 PM, Simon Elliston Ball <
>>  si...@simonellistonball.com> wrote:
>>
>>>  Yes, I do.
>>>
>>>  I suspect the best bet will be to post recordings somewhere on the
>>>  apache.org  metron site.
>>>
>>>  Simon
>>>
>>>>  On 12 Dec 2017, at 18:36, Otto Fowler wrote:
>>>>
>>>>  Excellent, do you have the > 40 min + record option?
>>>>
>>>>  On December 12, 2017 at 13:19:55, Simon Elliston Ball (
>>>>  si...@simonellistonball.com) wrote:
>>>>
>>>>  Happy to volunteer a zoom room. That seems to have worked for most in the
>>>>  past.
>>>>
>>>>  Simon
>>>>
>>>>>  On 12 Dec 2017, at 18:09, Otto Fowler wrote:
>>>>>
>>>>>  Thanks! I think I’d like something hosted though.
>>>>>
>>>>>  On December 12, 2017 at 11:18:52, Ahmed Shah (ahmeds...@cmail.carleton.ca)
>>>>>  wrote:
>>>>>
>>>>>  Hello,
>>>>>
>>>>>  wrt "- How are we going to host it"...
>>>>>
>>>>>  I've used BigBlueButton as an end user at our University.
>>>>>
>>>>>  It is LGPL open source.
>>>>>
>>>>>  https://bigbluebutton.org/
>>>>>  https://bigbluebutton.org/developers/
>>>>>
>>>>>  -Ahmed
>>>>>
>>>>>  ___
>>>>>  Ahmed Shah (PMP, M. Eng.)
>>>>>  Cybersecurity Analyst & Developer
>>>>>  GCR - Cybersecurity Operations Center
>>>>>  Carleton University - cugcr.com
>>>>>
>>>>>
>>>>>  From: Otto Fowler
>>>>>  Sent: December 11, 2017 4:41 PM
>>>>>  To: dev@metron.apache.org
>>>>>  Subject: [DISCUSS] Community Meetings
>>>>>
>>>>>  I think that we all want to have regular community meetings. We may be
>>>>>  better able to keep to a regular schedule with these meetings if we spread
>>>>>  out the responsibility for them from James and Casey, both of whom have a
>>>>>  lot on their plate already.
>>>>>
>>>>>  I would be willing to coordinate and run the meetings, and would welcome
>>>>>  anyone else who wants to help when they can.
>>>>>
>>>>>  The only issue for me is I do not have a web-ex account that I can use to
>>>>>  hold the meeting. So I’ll need some recommendations for a suitable
>>>>>  alternative. I have not been able to find an Apache Friendly alternative,
>>>>>  in the same way that Atlassian is apache friendly.
>>>>>
>>>>>  So - from what I can see we need to:
>>>>>
>>>>>  - Talk through who is going to do it
>>>>>  - How are we going to host it
>>>>>  - When are we going to do it
>>>>>
>>>>>  Anything else?
>>>>>
>>>>>  ottO

--- 
Thank you,

James Sirota
PMC- Apache Metron
jsirota AT apache DOT org


Re: [DISCUSS] Community Meetings

2017-12-13 Thread Otto Fowler
+1


On December 13, 2017 at 16:39:52, James Sirota (jsir...@apache.org) wrote:

I can set up a dedicated Zoom room with a recurrent meeting and give PMC
members rights to the room. I think hosting these meetings should not be a
problem. I would vote not to record them, but rather provide the notes
after the meeting. It's a lot easier to skim through the notes than jump
around in a recording. As Simon mentioned, I would also make it explicitly
clear that the meetings are dev meetings. These are not user Q&A and are
not meant to be overviews of how different features of Metron work. If we
want to do feature demos or provide user content I would want that to be in
its own separate meeting.

Thanks,
James

13.12.2017, 05:00, "Otto Fowler" :
> I am ok with just notes and no recording.
>
> On December 13, 2017 at 04:37:20, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> Good points Larry, we would need to get consent from everyone on the call
> to record to properly comply with regulations in some countries. We would
> definitely need someone to step up as note taker.
>
> Something else to think about is intended audience. Previously we’ve had
> meetings like this which have been very detailed Dev@ focussed (which is a
> great thing) but have rather alienated participants in User@ land. We need
> to make it clear what level we’re talking about to be inclusive.
>
> Simon
>
>>  On 13 Dec 2017, at 00:44, larry mccay  wrote:
>>
>>  Not sure about posting the recordings - you will need to check and make
>>  sure that doesn't violate anything.
>>
>>  Just a friendly reminder...
>>  It is important that meetings have notes and a summary that is sent out
>>  describing topics to be decided on the mailing list.
>>  No decisions can be made in the community meeting itself - this gives
>>  others in other timezones and commitments review and voice in the
>>  decisions.
>>
>>  If it didn't happen on the mailing lists then it didn't happen. :)
>>
>>  On Tue, Dec 12, 2017 at 1:39 PM, Simon Elliston Ball <
>>  si...@simonellistonball.com> wrote:
>>
>>>  Yes, I do.
>>>
>>>  I suspect the best bet will be to post recordings somewhere on the
>>>  apache.org  metron site.
>>>
>>>  Simon
>>>
>>>>  On 12 Dec 2017, at 18:36, Otto Fowler wrote:
>>>>
>>>>  Excellent, do you have the > 40 min + record option?
>>>>
>>>>  On December 12, 2017 at 13:19:55, Simon Elliston Ball (
>>>>  si...@simonellistonball.com) wrote:
>>>>
>>>>  Happy to volunteer a zoom room. That seems to have worked for most in the
>>>>  past.
>>>>
>>>>  Simon
>>>>
>>>>>  On 12 Dec 2017, at 18:09, Otto Fowler wrote:
>>>>>
>>>>>  Thanks! I think I’d like something hosted though.
>>>>>
>>>>>  On December 12, 2017 at 11:18:52, Ahmed Shah (ahmeds...@cmail.carleton.ca)
>>>>>  wrote:
>>>>>
>>>>>  Hello,
>>>>>
>>>>>  wrt "- How are we going to host it"...
>>>>>
>>>>>  I've used BigBlueButton as an end user at our University.
>>>>>
>>>>>  It is LGPL open source.
>>>>>
>>>>>  https://bigbluebutton.org/
>>>>>  https://bigbluebutton.org/developers/
>>>>>
>>>>>  -Ahmed
>>>>>
>>>>>  ___
>>>>>  Ahmed Shah (PMP, M. Eng.)
>>>>>  Cybersecurity Analyst & Developer
>>>>>  GCR - Cybersecurity Operations Center
>>>>>  Carleton University - cugcr.com
>>>>>
>>>>>
>>>>>  From: Otto Fowler
>>>>>  Sent: December 11, 2017 4:41 PM
>>>>>  To: dev@metron.apache.org
>>>>>  Subject: [DISCUSS] Community Meetings
>>>>>
>>>>>  I think that we all want to have regular community meetings. We may be
>>>>>  better able to keep to a regular schedule with these meetings if we spread
>>>>>  out the responsibility for them from James and Casey, both of whom have a
>>>>>  lot on their plate already.
>>>>>
>>>>>  I would be willing to coordinate and run the meetings, and would welcome
>>>>>  anyone else who wants to help when they can.
>>>>>
>>>>>  The only issue for me is I do not have a web-ex account that I can use to
>>>>>  hold the meeting. So I’ll need some recommendations for a suitable
>>>>>  alternative. I have not been able to find an Apache Friendly alternative,
>>>>>  in the same way that Atlassian is apache friendly.
>>>>>
>>>>>  So - from what I can see we need to:
>>>>>
>>>>>  - Talk through who is going to do it
>>>>>  - How are we going to host it
>>>>>  - When are we going to do it
>>>>>
>>>>>  Anything else?
>>>>>
>>>>>  ottO

---
Thank you,

James Sirota
PMC- Apache Metron
jsirota AT apache DOT org


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread simonellistonball
Github user simonellistonball commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156794990
  
--- Diff: metron-analytics/metron-statistics/README.md ---
@@ -53,6 +53,32 @@ functions can be used from everywhere where Stellar is 
used.
   * bounds - A list of value bounds (excluding min and max) in sorted 
order.
 * Returns: Which bin N the value falls in such that bound(N-1) < value <= 
bound(N).  No min and max bounds are provided, so values smaller than the 0'th 
bound go in the 0'th bin, and values greater than the last bound go in the M'th 
bin.
 
+### Sampling Functions
+
+ `SAMPLE_ADD`
+* Description: Add a value or collection of values to a sampler.
+* Input:
--- End diff --

Recency would surely be more relevant for merged resampling in a profile 
context? 


---


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156796690
  
--- Diff: metron-analytics/metron-statistics/README.md ---
@@ -53,6 +53,32 @@ functions can be used from everywhere where Stellar is 
used.
   * bounds - A list of value bounds (excluding min and max) in sorted 
order.
 * Returns: Which bin N the value falls in such that bound(N-1) < value <= 
bound(N).  No min and max bounds are provided, so values smaller than the 0'th 
bound go in the 0'th bin, and values greater than the last bound go in the M'th 
bin.
 
+### Sampling Functions
+
+ `SAMPLE_ADD`
+* Description: Add a value or collection of values to a sampler.
+* Input:
--- End diff --

They're both needed.  Some use-cases would be fine without bias and some 
would be better with bias.  As a follow-on, I was planning on adding a biased 
sampler, but this is a big enough PR without it.  It'd look something like:
```
samples := SAMPLE_MERGE(PROFILE_GET('samples', ...))
biased_sample := SAMPLE_GET_BIASED(samples, 0.015)
```


---


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156799854
  
--- Diff: 
metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/sampling/UniformSampler.java
 ---
@@ -0,0 +1,91 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.statistics.sampling;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+
+public class UniformSampler implements Sampler {
+  private List reservoir;
+  private int seen = 0;
+  private int size;
+  private Random rng = new Random(0);
+
+  public UniformSampler() {
+    this(DEFAULT_SIZE);
+  }
+
+  public UniformSampler(int size) {
+    this.size = size;
+    reservoir = new ArrayList<>(size);
+  }
+
+  @Override
+  public Iterable get() {
+    return reservoir;
+  }
+
+  /**
+   * Add an object to the reservoir
+   * @param o
+   */
+  public void add(Object o) {
+    if(o == null) {
+      return;
+    }
+    if (reservoir.size() < size) {
+      reservoir.add(o);
+    } else {
+      int rIndex = rng.nextInt(seen + 1);
--- End diff --

Shouldn't we reference Reservoir Sampling in the documentation?  Then the 
use of Universal and other terms would be more in context.


---


[GitHub] metron pull request #867: METRON-1350: Add reservoir sampling functions to S...

2017-12-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/867#discussion_r156800428
  
--- Diff: metron-analytics/metron-statistics/README.md ---
@@ -53,6 +53,32 @@ functions can be used from everywhere where Stellar is 
used.
   * bounds - A list of value bounds (excluding min and max) in sorted 
order.
 * Returns: Which bin N the value falls in such that bound(N-1) < value <= 
bound(N).  No min and max bounds are provided, so values smaller than the 0'th 
bound go in the 0'th bin, and values greater than the last bound go in the M'th 
bin.
 
+### Sampling Functions
+
+ `SAMPLE_ADD`
+* Description: Add a value or collection of values to a sampler.
+* Input:
--- End diff --

Actually, more than likely it'd be a separate init since each type is 
going to have different types of parameters depending on the algorithm.  So a 
biased sampler would be `SAMPLE_INIT_BIASED(size, ...)`


---


Re: Metron - Emailing Alerts

2017-12-13 Thread Otto Fowler
We could also filter out of enrichment to a different topology based on a
field, as Simon has said, so that the rules are run on a filtered set etc.
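
As a rough illustration of "rules run on a filtered set", here is a sketch
only. AlertRules and its methods are hypothetical names, not an existing
Metron API, and the destination addresses used with it are placeholders. It
assumes messages arrive as field maps, drops anything without
is_alert = true, and then returns the destinations whose rule matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: evaluate per-destination rules on alert messages only.
class AlertRules {
  // One rule: where to send, and the condition on the message fields.
  private static final class Rule {
    final String destination;
    final Predicate<Map<String, Object>> matches;

    Rule(String destination, Predicate<Map<String, Object>> matches) {
      this.destination = destination;
      this.matches = matches;
    }
  }

  private final List<Rule> rules = new ArrayList<>();

  void add(String destination, Predicate<Map<String, Object>> matches) {
    rules.add(new Rule(destination, matches));
  }

  // Returns each destination with at least one matching rule, in rule order.
  // Messages not flagged is_alert = true are filtered out before any rule runs.
  List<String> destinationsFor(Map<String, Object> message) {
    List<String> out = new ArrayList<>();
    if (!"true".equals(String.valueOf(message.get("is_alert")))) {
      return out;
    }
    for (Rule rule : rules) {
      if (rule.matches.test(message) && !out.contains(rule.destination)) {
        out.add(rule.destination);
      }
    }
    return out;
  }
}
```

The OP's two example rules would then be predicates along the lines of
"ip_src_addr starts with 100.2.10." and "payload contains critical", each
registered against the address that should receive the mail.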

also s/Ever/Either/


On December 13, 2017 at 17:03:15, Otto Fowler (ottobackwa...@gmail.com)
wrote:

While summary of _any_ metron data ( perhaps by query etc ) would be good,
let us not lose sight of the OP’s issue.  Ever with summary|digest or one
at a time, they are looking for sending mails to certain people based on
rule.

A pseudo path may be

INDEXING -> New Topology or ?? -> evaluate rules -> bin matches to batches
per destination -> create digest from bins and send on batch size or
timeout ( as the bulk writer does )

I’m sure there is something wrong with this, but it is easier to frame it
in the way we do it now, and then work from there for me.



On December 13, 2017 at 16:55:35, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

That makes a lot of sense, especially if you wanted the detail in the email
as well. We could definitely use some good "reporting of alerts"
functionality that would make something like that work. What do people
think?

Simon

> On 13 Dec 2017, at 21:52, James Sirota  wrote:
>
> I think there may be gaps in doing it with the profiler. You can record
stats and counts of different alert types, and maybe even alert ids, but
you can't cross-correlate these IDs to the alert body. At least not in the
profiler. I was thinking about emailing something that looks like a
zeppelin report. You would run it in a cron, export to PDF, and send that
out as a summary. It can be a simple list of alerts that match your rule,
or it can have aggregations, graphics, metrics, KPI screens, etc. That
would be the feature that I would want to discuss and flesh out
>
> Thanks,
> James
>
> 13.12.2017, 14:26, "Simon Elliston Ball" :
>> We can already do that with profiles I would have thought. Create a
profile that only picks alerts and then base your emails only from the
alert events produced by that profile. Would that create the right batching
mechanism (at a cost of possible higher latency than you might get with a
more specific alert batcher?)
>>
>> Simon
>>
>>> On 13 Dec 2017, at 21:23, James Sirota  wrote:
>>>
>>> I agree with Simon. If you email each alert individually you will be
overwhelmed. I think a better idea would be to email alert summaries
periodically, which is more manageable. This is probably a feature worthy
of consideration for Metron.
>>>
>>> 13.12.2017, 12:19, "Simon Elliston Ball" :
>>>> Metron generates alerts onto a Kafka queue, which can be used to
>>>> integrate with Alert management tools, usually some sort of existing alert
>>>> aggregation tool.
>>>>
>>>> An alternative approach common with this is to have a tool like Apache
>>>> NiFi attach to the Metron alert feed and send email.
>>>>
>>>> The solution here would be to have Metron generate alerts (by adding
>>>> the is_alert: true flag in the enrichment process) and possibly other flags
>>>> like alert_email for example, and then have NiFi use ConsumeKafka and then
>>>> filter out the alert only messages in NiFi to use the PutEmail processor
>>>> (probably with a ControlRate before it too).
>>>>
>>>> Something I would caution is that email is not a great way to manage
>>>> or send alerts at the volume likely to occur in network monitoring tools. A
>>>> spike in network traffic can lead to a very large number of emails, which
>>>> tends to then cause you bigger problems. As such we usually find people
>>>> want some sort of buffering or aggregation of alerts, hence the use of an
>>>> alert management or ticketing solution in front.
>>>>
>>>> Simon
>>>>
>>>>> On 13 Dec 2017, at 19:06, Ahmed Shah wrote:
>>>>>
>>>>> Hello,
>>>>> Just wondering if Metron has a feature to email alerts based on rules
>>>>> that a user defines.
>>>>>
>>>>> Example:
>>>>> Rule A: Email the user 1...@1.com whenever ip_src_addr=100.2.10.*
>>>>> Rule B: Email the user 1...@1.com whenever payload contains "critical"
>>>>>
>>>>> If not, does anyone have any recommendations on where to code these
>>>>> rules in the Metron stack that uses attributes from the GROK parser?
>>>>>
>>>>> -Ahmed
>>>>> ___
>>>>> Ahmed Shah (PMP, M. Eng.)
>>>>> Cybersecurity Analyst & Developer
>>>>> GCR - Cybersecurity Operations Center
>>>>> Carleton University - cugcr.com
>>>
>>> ---
>>> Thank you,
>>>
>>> James Sirota
>>> PMC- Apache Metron
>>> jsirota AT apache DOT org
>
> ---
> Thank you,
>
> James Sirota
> PMC- Apache Metron
> jsirota AT apache DOT org