Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-27 Thread Otto Fowler
After some discussion with David, I am going to rebase my work off of
PR-436 ( ansible deploys with mpack )




Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-27 Thread Otto Fowler
Sure,

I am not doing the ansible work for any other reason than ansible is what
we have now, and what works.

I realized that deployment of the current parsers needs to work with ambari
etc, and it is next.  I just did ansible first, because that is what works
for quick-dev etc.

As far as how to load a parser into an existing system, sure, we can change
or have multiple ways to do that.
I don’t know that this is too early; the best way to make sure you do
something is to start.

If I do the rpm deployment for the current builds, having the archetype
still be ansible until we have the rest sorted out will not be that
terrible.



Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-27 Thread David Lyle
Hi Otto,

It will have a pretty major effect. We agreed a bit back that we wanted, as
a community, to reduce reliance on Ansible, so I think an Ansible-based
parser loader would be sub-optimal. You may be working on this a bit
early. There are some major changes on the way that could make
this easier for you. You could decide to expose this feature via REST once
the REST layer makes it in, or even use pieces of PR436 to help you.

-D...



Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-27 Thread Otto Fowler
Not sure how https://github.com/apache/incubator-metron/pull/436 is going
to affect this, but it will.




Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-26 Thread Otto Fowler
OK,  Current status

Complete:
* Parsers broken out into common, base ( raw types - csv, json, grok ) and
module for each type
* Ansible modified to install new parsers, adding new parser should just be
adding a new name to the array
* Parsers install into /usr/metron/ver/telemetry/{NAME}/ with contents of
the archive/dir structure
* Maven Archetype for creating metron parsers
* Archetype includes ansible playbook and roles for deploying the produced
parser into an existing metron system
* Archetype has simple ( and not satisfactory ) sample implementation
* Archetype uses input variables to name parser, classes, and configure the
ansible scripts, such that if you stick with the name you don’t have to make
modifications for the parser in the project.
* First pass README documentation of the archetype, playbook, roles,
parsers common, parsers base and each parser ( will need jiras to document
parsers )
* Script to run playbook from archetype produced project and deploy parser
into an existing vagrant instance - such that if you do quick or full-dev
you can deploy your parser into it
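
As a rough illustration of the install layout in the list above, here is a minimal Python sketch. It assumes the `ver` segment of `/usr/metron/ver/telemetry/{NAME}/` stands for the Metron version; the helper name `parser_install_dir` and the example version string are made up for the illustration and are not taken from the branch.

```python
from pathlib import PurePosixPath

# Root taken from the path quoted in the status list above.
METRON_ROOT = PurePosixPath("/usr/metron")


def parser_install_dir(version: str, name: str) -> PurePosixPath:
    """Return /usr/metron/<version>/telemetry/<name> for one parser.

    Assumption: the 'ver' path segment is the Metron version. The function
    and argument names are illustrative only.
    """
    if not version or not name:
        raise ValueError("version and name must be non-empty")
    if "/" in version or "/" in name:
        raise ValueError("version and name must be single path segments")
    return METRON_ROOT / version / "telemetry" / name


if __name__ == "__main__":
    # Adding a parser is just adding a name to this list, mirroring the
    # "new name in the array" idea; the version string is hypothetical.
    for parser in ["asa", "bro", "squid"]:
        print(parser_install_dir("0.3.1", parser))
```

Keeping the layout in one function means the version and parser name are the only inputs a deployment step would need to agree on.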

The bad:
* the archetype version you give must be the same as the metron version,
i’m not sure how I want to plumb that through
* the deployment to monit doesn’t end with the parser started or monitored;
apparently I don’t know how to add the service correctly
* no real data, so parser doesn’t deploy and then show in elasticsearch etc
* This breaks the RPM + ambari deployment - i’m starting to look at that. I
hate to hard code each jar, but I don’t know much about rpm specs ( help! )
and I’d like to do what I do in ansible, then I have to look at ambari and
the commands

The Ugly ( IE why this isn’t a pr with don’t merge yet )
* I cannot build in travis.  Leaving in just the install -DskipTests step
still fails.
Here is a raw log of a branch with only the build enabled, no tests :
https://s3.amazonaws.com/archive.travis-ci.org/jobs/205506160/log.txt.  I
think it has to do with the shading/reporting.  That is where it seems to
stall out (10 minutes without report ). I could honestly use another set or
sets of eyes with some experience on how to get the parser poms correct
dependency-wise.  This is my present concern.  I can, as usual, build
locally.
* I have not been able to deploy the new stuff to a cluster, only local
vagrant.. and resource wise it is never great to start with, so it still
needs some shaking out

So - the big things keeping this away from a pr:

* fixing the travis stuff
* not regressing the rpm / ambari stuff
* design review and feedback / iterations

If anyone has any ideas or time, I’d be happy for them.

https://github.com/ottobackwards/incubator-metron/tree/METRON-258
https://github.com/ottobackwards/incubator-metron/tree/METRON-258/metron-maven-archetypes
https://github.com/ottobackwards/incubator-metron/tree/METRON-258/metron-maven-archetypes/metron-maven-parser-archetype
https://github.com/ottobackwards/incubator-metron/tree/METRON-258/metron-maven-archetypes/metron-maven-parser-archetype/src/main/resources/archetype-resources/metron-parser-deployment
https://github.com/ottobackwards/incubator-metron/tree/METRON-258/metron-maven-archetypes/metron-maven-parser-archetype/src/main/resources/archetype-resources/metron-parser-deployment/roles
https://github.com/ottobackwards/incubator-metron/tree/METRON-258/metron-platform/metron-parsers
https://github.com/ottobackwards/incubator-metron/blob/METRON-258/metron-maven-archetypes/metron-maven-parser-archetype/src/main/resources/archetype-resources/metron-parser-deployment/scripts/deploy_parsers_to_vagrant.sh

etc etc


Once we get this going, we can start talking about some next step ideas




Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-20 Thread Otto Fowler
More thoughts

(1)  We should do a treatment for each area
(2)  We can use the telemetry stuff as an incubator, itself to be replaced
with something better that is developed after
(3)  That is a nice idea - ‘live packaging’ ( i’m getting the TM and a
website as we speak )
(4)  sure, but we may need to think through the idea that an existing
mechanism may provide some of that and we piggyback on it, but that can be
a goal.
Having everything in one package, with a defined deployment and state
system will make that possible.

On February 20, 2017 at 13:58:08, Nick Allen (n...@nickallen.org) wrote:

Your mention of a "package mechanism" sparked some half-baked ideas on my
part.
Be forewarned these are probably tangents from your immediate goal (sorry),
but maybe these ideas might help shape how you want to take this forward.

(1) We should consider that eventually each "function" of Metron should be
extensible. Not just parsers, but enrichment, triage, indexing, profiling,
or maas. Ideally we could cover each of these functional areas with the
same mechanism.

(2) We would want packages to cover different kinds of deployable bits; code
(a new parser class), configuration (a triage rule set), and also external
actions (like deploying an Elasticsearch index template or creating a Kafka
topic).

(3) I'd also love to be able to export "packages" from a live system. For
example, I setup a test Metron environment and validate it. I can then
export a package from the test environment and import it into production.

(4) Could a package mechanism also help us provide a clean, automated
upgrade path?

First, I export a package from my system running Metron version N. Then I
import that package into a separate system running version N+1. Importing
these packages gives us a hook where we can do upgrade-y stuff, like modify
the configs or do whatever needs done to upgrade. If I want a package that's
native to version N+1, then I just export the package from the system
running version N+1.
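
The export-from-N, import-into-N+1 idea in (4) could be sketched roughly as follows in Python. Everything here is hypothetical: the package is modelled as a plain dict, versions are plain integers, and the `upgrade_hook` registry, the `topic` / `sensorTopic` key rename, and `import_package` are invented only to show where an upgrade hook would sit. None of it reflects an existing Metron API.

```python
from typing import Any, Callable, Dict

# A package is modelled as a plain dict; the keys are hypothetical.
Package = Dict[str, Any]

# Hypothetical registry: source version -> function that upgrades a
# package by exactly one version step.
UPGRADE_HOOKS: Dict[int, Callable[[Package], Package]] = {}


def upgrade_hook(from_version: int):
    """Register a function that upgrades packages exported at from_version."""
    def register(func: Callable[[Package], Package]):
        UPGRADE_HOOKS[from_version] = func
        return func
    return register


@upgrade_hook(1)
def upgrade_1_to_2(pkg: Package) -> Package:
    # Made-up migration: rename one config key between versions.
    config = dict(pkg["config"])  # copy so the exported package is untouched
    config["sensorTopic"] = config.pop("topic")
    return {**pkg, "version": 2, "config": config}


def import_package(pkg: Package, system_version: int) -> Package:
    """Apply upgrade hooks until the package matches the importing system."""
    while pkg["version"] < system_version:
        hook = UPGRADE_HOOKS.get(pkg["version"])
        if hook is None:
            raise ValueError(f"no upgrade path from version {pkg['version']}")
        pkg = hook(pkg)
    if pkg["version"] > system_version:
        raise ValueError("package is newer than the importing system")
    return pkg


if __name__ == "__main__":
    exported = {"version": 1, "config": {"topic": "squid"}}
    print(import_package(exported, system_version=2))
```

Chaining one-step hooks keeps each migration small, and a package native to the newer version is then simply whatever that system exports.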





On Feb 17, 2017 4:54 PM, "Otto Fowler"  wrote:

> The ability for implementors and developers building on the project to
> ‘side load’, that is to build, maintain, and install, telemetry sources
> into the system without having to actually develop within METRON itself is
> very important.
>
> If done properly it gives developers an easier and more manageable
> proposition for extending METRON to suit their needs in what may be the
> most common extension case. It also may reduce the necessity to create and
> maintain forks of METRON.
>
> I would like to put forward a proposal on a way to move this forward, and
> ask the community for feedback and assistance in reaching an acceptable
> approach and raising the issues that I have surely missed.
>
> Conceptually what I would like to propose is the following:
>
> * What is currently metron-parsers should be broken apart such that each
> parser is its own individual component
> * Each of these components should be completely self contained ( or produce
> a self contained package )
> * These packages will include the shaded jar for the parser, default
> configurations for the parser and enrichment, default elasticsearch
> template, and a default log-rotate script
> * These packages will be deployed to disk in a new library directory under
> metron
> * Zookeeper should have a new telemetry or source area where all
> ‘installed’ sources exist
> * This area would host the default configurations, rules, templates, and
> scripts and metadata
> * Installed sources can be instantiated as named instances
> * Instantiating an instance will move the default configurations to what is
> currently the enrichment and parser areas for the instance name
> * It will also deploy the elasticsearch template for the instance name
> * It will deploy the log-rotate scripts
> * Installed and instantiated sources can be ‘redeployed’ from disk to
> upgrade
> * Installed sources are available for selection in ambari
> * question on post selection configuration, but we have that problem
> already
> * Instantiation is exposed through REST
> * the UI can install a new package
> * the UI can allow a workflow to edit the configurations and templates
> before finalizing
> * are there three states here? Installed | Edited | Instantiated ?
> * the UI can edit existing and redeploy
> * possibly re-deploy ES template after adding fields or account for fields
> added by enrichment…. manually or automatically?
> * a script can be made to instantiate a ‘base’ parser ( json, grok, csv )
> with only configuration
> * The installation and instantiation should be exposed through the Stellar
> management console
> * Starting a topology will now start the parser’s shaded jar found through
> the parser type ( which may need to be added to the configurations ) and the
> library
> * A Maven Archetype should be created for a parser | telemetry source
> project 
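
The open question in the proposal about three states ( Installed | Edited | Instantiated ) can be made concrete with a small Python sketch. The transition table below is purely an assumption for discussion: it allows editing before or after instantiation (matching "the UI can edit existing and redeploy") and nothing else. The names `SourceState`, `ALLOWED`, and `transition` are hypothetical.

```python
from enum import Enum


class SourceState(Enum):
    INSTALLED = "installed"
    EDITED = "edited"
    INSTANTIATED = "instantiated"


# Assumed transitions; the thread leaves the exact workflow open.
ALLOWED = {
    SourceState.INSTALLED: {SourceState.EDITED, SourceState.INSTANTIATED},
    SourceState.EDITED: {SourceState.EDITED, SourceState.INSTANTIATED},
    SourceState.INSTANTIATED: {SourceState.EDITED},
}


def transition(current: SourceState, target: SourceState) -> SourceState:
    """Return the new state, or raise if the move is not in the assumed table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    state = SourceState.INSTALLED
    for step in (SourceState.EDITED, SourceState.INSTANTIATED, SourceState.EDITED):
        state = transition(state, step)
        print(state.value)
```

Writing the table down this way makes it easy to argue about individual arrows, for example whether an instantiated source should ever return to plain Installed.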

Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-20 Thread Otto Fowler
Update:  https://github.com/ottobackwards/incubator-metron/tree/METRON-258

I have all the parsers broken out, including squid and yaf which are
configuration only.
The idea there being we could and should have tests and other collateral
even for configuration only parsers.

I have created:

metron-parsers : top level
metron-parsers-common : all the common classes
metron-parsers-base : the base parsers for raw types, json, csv, grok ( I’m
not in love with the name though )
metron-parser-XXX : a parser module for a specific type of parser, where
XXX is asa, bro, sourcefire etc etc

I have not worked out the zookeeper ‘installed’ entry yet, but have put
config/zookeeper/telemetry/asa/enrichment|parser to see what it would look
like.
I am also not sure the poms are pruned as much as they could be.
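
To show what the per-parser config area mentioned above might look like for more than one parser, here is a small Python sketch. It only expands the `config/zookeeper/telemetry/asa/enrichment|parser` pattern from this message; the function name `telemetry_config_paths` is hypothetical, and applying the same pattern to other parser names is an assumption.

```python
from pathlib import PurePosixPath
from typing import List

# The two config kinds named in the message above.
CONFIG_KINDS = ("enrichment", "parser")


def telemetry_config_paths(
    parser_name: str, root: str = "config/zookeeper/telemetry"
) -> List[str]:
    """Expand <root>/<parser_name>/enrichment|parser into explicit paths."""
    base = PurePosixPath(root) / parser_name
    return [str(base / kind) for kind in CONFIG_KINDS]


if __name__ == "__main__":
    for name in ("asa", "bro"):
        for path in telemetry_config_paths(name):
            print(path)
```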

My first main focus is getting things split, building and the tests running.

Next steps:
* deployment to new lib area
* starting topologies from new lib area

unless anyone has other ideas

I am trying to track other ideas I can think of for the back end on ‘how a
3rd party would need this to work’ as I go, but much of that stuff will
fall out from actually trying it.


On February 17, 2017 at 14:54:51, Otto Fowler (ottobackwa...@gmail.com)
wrote:

The ability for implementors and developers building on the project to
‘side load’, that is to build, maintain, and install, telemetry sources
into the system without having to actually develop within METRON itself is
very important.

If done properly it gives developers an easier and more manageable
proposition for extending METRON to suit their needs in what may be the
most common extension case.  It also may reduce the necessity to create and
maintain forks of METRON.

I would like to put forward a proposal on a way to move this forward, and
ask the community for feedback and assistance in reaching an acceptable
approach and raising the issues that I have surely missed.

Conceptually what I would like to propose is the following:

* What is currently metron-parsers should be broken apart such that each
parser is its own individual component
* Each of these components should be completely self contained ( or produce
a self contained package )
* These packages will include the shaded jar for the parser, default
configurations for the parser and enrichment, default elasticsearch
template, and a default log-rotate script
* These packages will be deployed to disk in a new library directory under
metron
* Zookeeper should have a new telemetry or source area where all
‘installed’ sources exist
* This area would host the default configurations, rules, templates, and
scripts and metadata
* Installed sources can be instantiated as named instances
* Instantiating an instance will move the default configurations to what is
currently the enrichment and parser areas for the instance name
* It will also deploy the elasticsearch template for the instance name
* It will deploy the log-rotate scripts
* Installed and instantiated sources can be ‘redeployed’ from disk to
upgrade
* Installed sources are available for selection in ambari
* question on post selection configuration, but we have that problem already
* Instantiation is exposed through REST
* the UI can install a new package
* the UI can allow a workflow to edit the configurations and templates
before finalizing
* are there three states here?   Installed | Edited | Instantiated?
* the UI can edit existing and redeploy
* possibly re-deploy ES template after adding fields or account for fields
added by enrichment…. manually or automatically?
* a script can be made to instantiate a ‘base’ parser ( json, grok, csv )
with only configuration
* The installation and instantiation should be exposed through the Stellar
management console
* Starting a topology will now start the parser’s shaded jar found through
the parser type ( which may need to be added to the configurations ) and the
library
* A Maven Archetype should be created for a parser | telemetry source
project that allows the proper setup of a development project outside the
METRON source tree
* should be published
* should have a useful default set
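
To make the Installed | Edited | Instantiated question above a bit more
concrete, here is a minimal sketch of how those three states could relate.
Everything in it is an assumption for illustration: the names (SourceRegistry,
install, edit, instantiate), the config keys, and the in-memory dictionaries
standing in for the Zookeeper area. None of it is existing Metron code.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    INSTALLED = "installed"
    EDITED = "edited"
    INSTANTIATED = "instantiated"


@dataclass
class Source:
    """One installed telemetry source package (hypothetical model)."""
    name: str
    defaults: dict                                # default configs shipped in the package
    state: State = State.INSTALLED
    pending: dict = field(default_factory=dict)   # edits not yet finalized


class SourceRegistry:
    """In-memory stand-in for the proposed 'installed sources' area."""

    def __init__(self):
        self.installed = {}   # source name -> Source
        self.instances = {}   # instance name -> effective config

    def install(self, name, defaults):
        self.installed[name] = Source(name, dict(defaults))

    def edit(self, name, **overrides):
        src = self.installed[name]
        src.pending.update(overrides)
        src.state = State.EDITED

    def instantiate(self, name, instance_name):
        src = self.installed[name]
        # Copy the defaults, apply pending edits, and store under the instance name.
        self.instances[instance_name] = {**src.defaults, **src.pending}
        src.state = State.INSTANTIATED
        return self.instances[instance_name]
```

In this sketch the package defaults are never modified, so the same installed
source could be instantiated again later under a different instance name.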

So the developer’s workflow:

* Create a new project from the archetype outside of the metron tree
* edit the configurations, templates, rules etc in the project
* code or modify the sample
* build
* run the installer script or the ui to upload/deploy the package
* use the console or ui to create an instance

QUESTIONS:
* it seems strange to have this as ‘parsers’ when conceptually parsers are
a part of the whole, should we introduce something like ‘source’ that is
all of it?
* should configurations etc be in ZK or on disk? or HDFS? or All of the
above?
* did you read this far?  good!
* I am sure that after hitting send I will think of 10 things that are
missing from this

I have started a POC of this, and thus far have created
metron-parsers-common and started breaking out metron-parser-asa.
I will continue 

Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-20 Thread Otto Fowler
Yes, this exactly.  Similar to the NAR format for NiFi.  At this early
point, we can start with the tar.gz’s that we produce from assembly.
Now, if someone reads this and thinks “you are just re-inventing FOO, and I
love FOO!”, great, that is an implementation detail anyway.
The important thing to wrap our heads around collectively is some kind of
(hopefully) consistent mechanism for packaging and deployment into metron
that is re-usable across different feature areas.  Unless I’m missing Nick’s
point.



On February 20, 2017 at 13:58:08, Nick Allen (n...@nickallen.org) wrote:

Your mention of a "package mechanism" sparked some half-baked ideas on my
part.
Be forewarned these are probably tangents from your immediate goal (sorry),
but maybe these ideas might help shape how you want to take this forward.

(1) We should consider that eventually each "function" of Metron should be
extensible. Not just parsers, but enrichment, triage, indexing, profiling,
or maas. Ideally we could cover each of these functional areas with the same
mechanism.

(2) We would want packages to cover different kinds of deployable bits; code
(a new parser class), configuration (a triage rule set), and also external
actions (like deploying an Elasticsearch index template or creating a Kafka
topic).

(3) I'd also love to be able to export "packages" from a live system. For
example, I set up a test Metron environment and validate it. I can then
export a package from the test environment and import it into production.

(4) Could a package mechanism also help us provide a clean, automated
upgrade path?

First, I export a package from my system running Metron version N. Then I
import that package into a separate system running version N+1. Importing
these packages gives us a hook where we can do upgrade-y stuff, like modify
the configs or do whatever needs done to upgrade. If I want a package that's
native to version N+1, then I just export the package from the system
running version N+1.





On Feb 17, 2017 4:54 PM, "Otto Fowler"  wrote:

> The ability for implementors and developers building on the project to
> ‘side load’, that is to build, maintain, and install, telemetry sources
> into the system without having to actually develop within METRON itself is
> very important.
>
> If done properly it gives developers an easier and more manageable
> proposition for extending METRON to suit their needs in what may be the
> most common extension case. It also may reduce the necessity to create and
> maintain forks of METRON.
>
> I would like to put forward a proposal on a way to move this forward, and
> ask the community for feedback and assistance in reaching an acceptable
> approach and raising the issues that I have surely missed.
>
> Conceptually what I would like to propose is the following:
>
> * What is currently metron-parsers should be broken apart such that each
> parser is its own individual component
> * Each of these components should be completely self contained ( or produce
> a self contained package )
> * These packages will include the shaded jar for the parser, default
> configurations for the parser and enrichment, default elasticsearch
> template, and a default log-rotate script
> * These packages will be deployed to disk in a new library directory under
> metron
> * Zookeeper should have a new telemetry or source area where all
> ‘installed’ sources exist
> * This area would host the default configurations, rules, templates, and
> scripts and metadata
> * Installed sources can be instantiated as named instances
> * Instantiating an instance will move the default configurations to what is
> currently the enrichment and parser areas for the instance name
> * It will also deploy the elasticsearch template for the instance name
> * It will deploy the log-rotate scripts
> * Installed and instantiated sources can be ‘redeployed’ from disk to
> upgrade
> * Installed sources are available for selection in ambari
> * question on post selection configuration, but we have that problem
> already
> * Instantiation is exposed through REST
> * the UI can install a new package
> * the UI can allow a workflow to edit the configurations and templates
> before finalizing
> * are there three states here? Installed | Edited | Instantiated?
> * the UI can edit existing and redeploy
> * possibly re-deploy ES template after adding fields or account for fields
> added by enrichment…. manually or automatically?
> * a script can be made to instantiate a ‘base’ parser ( json, grok, csv )
> with only configuration
> * The installation and instantiation should be exposed through the Stellar
> management console
> * Starting a topology will now start the parser’s shaded jar found through
> the parser type ( which may need to be added to the configurations ) and the
> library
> * A Maven Archetype should be created for a parser | telemetry source
> project that allows the 

Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-20 Thread Nick Allen
Your mention of a "package mechanism" sparked some half-baked ideas on my
part.
Be forewarned these are probably tangents from your immediate goal (sorry),
but maybe these ideas might help shape how you want to take this forward.

(1) We should consider that eventually each "function" of Metron should be
extensible.  Not just parsers, but enrichment, triage, indexing, profiling,
or maas. Ideally we could cover each of these functional areas with the same
mechanism.

(2) We would want packages to cover different kinds of deployable bits; code
(a new parser class), configuration (a triage rule set), and also external
actions (like deploying an Elasticsearch index template or creating a Kafka
topic).

(3) I'd also love to be able to export "packages" from a live system. For
example, I set up a test Metron environment and validate it. I can then
export a package from the test environment and import it into production.

(4) Could a package mechanism also help us provide a clean, automated
upgrade path?

First, I export a package from my system running Metron version N. Then I
import that package into a separate system running version N+1. Importing
these packages gives us a hook where we can do upgrade-y stuff, like modify
the configs or do whatever needs done to upgrade.  If I want a package that's
native to version N+1, then I just export the package from the system
running version N+1.
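
A rough sketch of the export / import-with-upgrade-hook idea in (3) and (4),
purely as an illustration. It assumes a package is just a version number plus
a dictionary of configs, and that upgrade steps are registered per source
version. The function names, the package shape, and the renamed config key
are all hypothetical.

```python
import copy


def _v1_to_v2(configs):
    # Illustrative upgrade step only: rename one config key between versions.
    upgraded = dict(configs)
    if "index" in upgraded:
        upgraded["indexName"] = upgraded.pop("index")
    return upgraded


# Hypothetical upgrade steps, keyed by the version they upgrade *from*.
UPGRADES = {1: _v1_to_v2}


def export_package(system):
    """Snapshot a system's configs into a package tagged with its version."""
    return {"version": system["version"],
            "configs": copy.deepcopy(system["configs"])}


def import_package(system, package):
    """Import a package, running upgrade hooks until the versions match."""
    version, configs = package["version"], package["configs"]
    if version > system["version"]:
        raise ValueError("cannot import a package from a newer version")
    while version < system["version"]:
        configs = UPGRADES[version](configs)   # the 'upgrade-y stuff' hook
        version += 1
    system["configs"] = configs
    return system
```

Exporting again from the upgraded system then yields a package that is native
to the newer version, which is the round trip described above.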





On Feb 17, 2017 4:54 PM, "Otto Fowler"  wrote:

> The ability for implementors and developers building on the project to
> ‘side load’, that is to build, maintain, and install, telemetry sources
> into the system without having to actually develop within METRON itself is
> very important.
>
> If done properly it gives developers an easier and more manageable
> proposition for extending METRON to suit their needs in what may be the
> most common extension case.  It also may reduce the necessity to create and
> maintain forks of METRON.
>
> I would like to put forward a proposal on a way to move this forward, and
> ask the community for feedback and assistance in reaching an acceptable
> approach and raising the issues that I have surely missed.
>
> Conceptually what I would like to propose is the following:
>
> * What is currently metron-parsers should be broken apart such that each
> parser is its own individual component
> * Each of these components should be completely self contained ( or produce
> a self contained package )
> * These packages will include the shaded jar for the parser, default
> configurations for the parser and enrichment, default elasticsearch
> template, and a default log-rotate script
> * These packages will be deployed to disk in a new library directory under
> metron
> * Zookeeper should have a new telemetry or source area where all
> ‘installed’ sources exist
> * This area would host the default configurations, rules, templates, and
> scripts and metadata
> * Installed sources can be instantiated as named instances
> * Instantiating an instance will move the default configurations to what is
> currently the enrichment and parser areas for the instance name
> * It will also deploy the elasticsearch template for the instance
> name
> * It will deploy the log-rotate scripts
> * Installed and instantiated sources can be ‘redeployed’ from disk to
> upgrade
> * Installed sources are available for selection in ambari
> * question on post selection configuration, but we have that problem
> already
> * Instantiation is exposed through REST
> * the UI can install a new package
> * the UI can allow a workflow to edit the configurations and templates
> before finalizing
> * are there three states here?   Installed | Edited | Instantiated
> ?
> * the UI can edit existing and redeploy
> * possibly re-deploy ES template after adding fields or account for fields
> added by enrichment…. manually or automatically?
> * a script can be made to instantiate a ‘base’ parser ( json, grok, csv )
> with only configuration
> * The installation and instantiation should be exposed through the Stellar
> management console
> * Starting a topology will now start the parser’s shaded jar found through
> the parser type ( which may need to be added to the configurations ) and the
> library
> * A Maven Archetype should be created for a parser | telemetry source
> project that allows the proper setup of a development project outside the
> METRON source tree
> * should be published
> * should have a useful default set
>
> So the developer’s workflow:
>
> * Create a new project from the archetype outside of the metron tree
> * edit the configurations, templates, rules etc in the project
> * code or modify the sample
> * build
> * run the installer script or the ui to upload/deploy the package
> * use the console or ui to create an instance
>
> QUESTIONS:
> * it seems strange to have this as ‘parsers’ when conceptually parsers are
> a 

Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-19 Thread zeo...@gmail.com
Awesome write up and ideas Otto, I also strongly support this idea.  As
someone who has the development of a few parsers quickly approaching the
top of their to do list, I will happily beta test this for you when it's
far enough along for that.  Until then I will attempt to take a look at
your branch and come up to speed more thoroughly.

Regarding the storage of configurations, I'm in favor of ZK for largely the
reasons that mattf mentioned, but also for organizational reasons.  Finding
configurations should be intuitive and to do that I think storing them in a
common area is reasonable and makes them easier to audit.

I'll leave my specific comments regarding management of indexing templates
for the other thread, but I think that getting our arms around a solution
where modifications in one part of the stack accounts for updates to other
places will be key in improving adoption.

Also, James, I have had direct requests regarding the templating parser
assistance that you outlined, so I know that parts of the community are
looking for that exact feature.  It gets a big +1 from me.

Jon

On Sat, Feb 18, 2017, 12:51 PM Otto Fowler  wrote:

I plan on looking at the NiFi archetype to see if there is something there
about this and other things.  I think this is very similar to the nar.



On February 18, 2017 at 12:07:35, James Sirota (jsir...@apache.org) wrote:

I like the idea of having each parser as its own maven module and having an
archetype for it. In my vision when you click to create a maven
"metronParser" archetype what you would get is a module consisting of a
blank parser template with a parse() method that a dev would have to fill
in, the associated test template with some rudimentary tests pre-filled,
and two test resource files to populate with raw data and parsed data. I
think it's clean and extensible.

I think one thing we would have to worry about with this approach is
classpath issues. If there is not a top-level POM anymore then you are
increasing chances of different parser modules pulling in different
versions of the same library.

Thanks,
James



18.02.2017, 07:24, "Otto Fowler" :
> Thanks for taking the time Matt,
>
> It is likely that I am not seeing your point clearly, could you elaborate
> how Spring or Guice would be applicable to this proposal if there is no
> intent to change the parser’s composition or run-time functionality, but
> rather its deployment and external management? I will admit that I am
> starting with the idea that I don’t want to change how the parsers work, so
> I may be limiting my thinking. This is also based on my limited
> understanding of what we need to deliver to storm. Even if the parsers etc
> were using spring or guice at runtime, wouldn’t we still have to deliver
> the right uber jar to storm?
>
> With regards to the configuration, the idea would be that the current
> runtime configurations would stay exactly how they are now, only be
> delivered differently. So ZK->Parser would be the same.
>
> On February 17, 2017 at 19:06:16, Matt Foley (ma...@apache.org) wrote:
>
> Outstanding write-up, Otto! As Casey said, don’t expect this to be a
> coherent response, but some possibly useful thoughts:
>
> 1. It’s clear that because parsers, enrichers, and indexers are all
> specialized per sensor, that “adding a new sensor” is necessarily a complex
> operation. You’ve thrown a lasso around it all, and suggested
> auto-generation of the generic parts. Excellent start.
>
> In my fuzzy computer-sciencey way, your sketch makes me view this as an
> Inversion of Control scenario (
> https://en.wikipedia.org/wiki/Inversion_of_control ). I know I don’t have
> to define this for our readers, but allow me to quote one paragraph, from
> article
> http://www.javaworld.com/article/2071914/excellent-explanation-of-dependency-injection--inversion-of-control-.html
> :
>
> “[IoC (or DI)] delivers a key advantage: loose coupling. Objects can be
> added and tested independently of other objects, because they don't depend
> on anything other than what you pass them. When using traditional
> dependencies, to test an object you have to create an environment where all
> of its dependencies exist and are reachable before you can test it. With
> [IoC or] DI, it's possible to test the object in isolation passing it mock
> objects for the ones you don't want or need to create. Likewise, adding a
> class to a project is facilitated because the class is self-contained, so
> this avoids the ‘big hairball’ that large projects often evolve into.”
>
> Surely part of what we want, no? Does it make sense to use Spring or Guice
> to drive the integration (and design) of this extensibility capability? I
> know this could be viewed as an implementation issue, but you said you’re
> starting to prototype, and these things are best integrated from the
> beginning.
>
> 2. Regarding configuration, consider that some (dynamic config parameters)
> will 

Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-18 Thread James Sirota
I like the idea of having each parser as its own maven module and having an 
archetype for it.  In my vision when you click to create a maven "metronParser" 
archetype what you would get is a module consisting of a blank parser template 
with a parse() method that a dev would have to fill in, the associated test 
template with some rudimentary tests pre-filled, and two test resource files to 
populate with raw data and parsed data.  I think it's clean and extensible. 
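
As a rough picture of what such a generated module might contain, here is a
minimal sketch. It is written in Python only to keep it short (a real Metron
parser would be Java), and the class names, the example CSV layout, and the
check_parser helper are all hypothetical, not what an actual archetype would
emit.

```python
import json


class TemplateParser:
    """Blank parser skeleton; parse() is the one method a developer fills in."""

    def parse(self, raw: bytes) -> list:
        raise NotImplementedError("fill in parse() for your telemetry source")


class ExampleCsvParser(TemplateParser):
    """A filled-in template for made-up 'timestamp,src,dst' lines."""

    def parse(self, raw: bytes) -> list:
        messages = []
        for line in raw.decode("utf-8").splitlines():
            if not line.strip():
                continue
            timestamp, src, dst = line.split(",")
            messages.append({"timestamp": int(timestamp), "src": src, "dst": dst})
        return messages


def check_parser(parser, raw_data: bytes, parsed_data: str) -> bool:
    """Rudimentary pre-filled test: output must equal the expected JSON lines."""
    expected = [json.loads(line) for line in parsed_data.splitlines() if line.strip()]
    return parser.parse(raw_data) == expected
```

The two arguments to check_parser play the role of the two test resource
files: one holding raw data, one holding the expected parsed data.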

I think one thing we would have to worry about with this approach is classpath 
issues.  If there is not a top-level POM anymore then you are increasing 
chances of different parser modules pulling in different versions of the same 
library.  
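
To illustrate the kind of drift I mean, here is a small sketch that flags
libraries pinned at more than one version across modules. The input format
(module name mapped to library/version pairs) and the function name are
assumptions for the example; a real check would read the resolved dependency
trees from the build rather than a hand-written dictionary.

```python
from collections import defaultdict


def find_version_conflicts(modules):
    """Return {library: {version: [modules]}} for libraries seen at >1 version."""
    seen = defaultdict(lambda: defaultdict(list))
    for module, deps in modules.items():
        for library, version in deps.items():
            seen[library][version].append(module)
    return {
        library: {version: sorted(mods) for version, mods in versions.items()}
        for library, versions in seen.items()
        if len(versions) > 1
    }
```

A shared parent POM that manages dependency versions is the usual way to
avoid this in the first place; a check like this only reports it after the
fact.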

Thanks,
James 



18.02.2017, 07:24, "Otto Fowler" :
> Thanks for taking the time Matt,
>
> It is likely that I am not seeing your point clearly, could you elaborate
> how Spring or Guice would be applicable to this proposal if there is no
> intent to change the parser’s composition or run-time functionality, but
> rather its deployment and external management? I will admit that I am
> starting with the idea that I don’t want to change how the parsers work, so
> I may be limiting my thinking. This is also based on my limited
> understanding of what we need to deliver to storm. Even if the parsers etc
> were using spring or guice at runtime, wouldn’t we still have to deliver
> the right uber jar to storm?
>
> With regards to the configuration, the idea would be that the current
> runtime configurations would stay exactly how they are now, only be
> delivered differently. So ZK->Parser would be the same.
>
> On February 17, 2017 at 19:06:16, Matt Foley (ma...@apache.org) wrote:
>
> Outstanding write-up, Otto! As Casey said, don’t expect this to be a
> coherent response, but some possibly useful thoughts:
>
> 1. It’s clear that because parsers, enrichers, and indexers are all
> specialized per sensor, that “adding a new sensor” is necessarily a complex
> operation. You’ve thrown a lasso around it all, and suggested
> auto-generation of the generic parts. Excellent start.
>
> In my fuzzy computer-sciencey way, your sketch makes me view this as an
> Inversion of Control scenario (
> https://en.wikipedia.org/wiki/Inversion_of_control ). I know I don’t have
> to define this for our readers, but allow me to quote one paragraph, from
> article
> http://www.javaworld.com/article/2071914/excellent-explanation-of-dependency-injection--inversion-of-control-.html
> :
>
> “[IoC (or DI)] delivers a key advantage: loose coupling. Objects can be
> added and tested independently of other objects, because they don't depend
> on anything other than what you pass them. When using traditional
> dependencies, to test an object you have to create an environment where all
> of its dependencies exist and are reachable before you can test it. With
> [IoC or] DI, it's possible to test the object in isolation passing it mock
> objects for the ones you don't want or need to create. Likewise, adding a
> class to a project is facilitated because the class is self-contained, so
> this avoids the ‘big hairball’ that large projects often evolve into.”
>
> Surely part of what we want, no? Does it make sense to use Spring or Guice
> to drive the integration (and design) of this extensibility capability? I
> know this could be viewed as an implementation issue, but you said you’re
> starting to prototype, and these things are best integrated from the
> beginning.
>
> 2. Regarding configuration, consider that some (dynamic config parameters)
> will be dynamically read during runtime and some (static config parameters)
> will require restarting (or re-instantiating) the components. Config params
> that want to be read dynamically should definitely go in ZK so they can
> take advantage of Curator notifications. Static config params, that can
> only usefully be set at startup or instantiation, could either go in ZK or
> be handled the traditional way in Ambari as files on all configured hosts.
> If you choose to put static params also in ZK, note that separating static
> and dynamic configs into different znodes makes the process of monitoring
> changes in the dynamic configs more efficient, and this is unrelated to the
> human-readable grouping of params the user sees in a UI.
>
> I am talking with Ambari engineers about implementing an ability for Ambari
> to manage config parameters in ZK, at the option of the component
> implementor, and expect to be opening Apache Ambari jiras soon. At the
> Ambari UI level there should be no difference; at the implementation level
> a json or other config file could be written once to a ZK znode instead of
> to filesystem files on all configured hosts. The usages could be mixed,
> with the component implementation deciding which config files get written
> to which target.
>
> 3. Yes I read that far :-)
>
> Again, great draft.
> Thanks,
> --Matt
>
> On 2/17/17, 1:07 PM, "Otto 

Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-18 Thread Otto Fowler
Thanks for taking the time Matt,

I am probably not seeing your point clearly. Could you elaborate on how
Spring or Guice would apply to this proposal if there is no intent to change
the parser’s composition or run-time functionality, but rather its
deployment and external management? I will admit that I am starting with
the idea that I don’t want to change how the parsers work, so I may be
limiting my thinking.  This is also based on my limited understanding of
what we need to deliver to Storm.  Even if the parsers etc. were using
Spring or Guice at runtime, wouldn’t we still have to deliver the right
uber jar to Storm?

With regards to the configuration, the idea would be that the current
runtime configurations would stay exactly how they are now, only be
delivered differently.   So ZK->Parser would be the same.


On February 17, 2017 at 19:06:16, Matt Foley (ma...@apache.org) wrote:

Outstanding write-up, Otto! As Casey said, don’t expect this to be a
coherent response, but some possibly useful thoughts:

1. It’s clear that because parsers, enrichers, and indexers are all
specialized per sensor, that “adding a new sensor” is necessarily a complex
operation. You’ve thrown a lasso around it all, and suggested
auto-generation of the generic parts. Excellent start.

In my fuzzy computer-sciencey way, your sketch makes me view this as an
Inversion of Control scenario (
https://en.wikipedia.org/wiki/Inversion_of_control ). I know I don’t have
to define this for our readers, but allow me to quote one paragraph, from
article
http://www.javaworld.com/article/2071914/excellent-explanation-of-dependency-injection--inversion-of-control-.html
:

“[IoC (or DI)] delivers a key advantage: loose coupling. Objects can be
added and tested independently of other objects, because they don't depend
on anything other than what you pass them. When using traditional
dependencies, to test an object you have to create an environment where all
of its dependencies exist and are reachable before you can test it. With
[IoC or] DI, it's possible to test the object in isolation passing it mock
objects for the ones you don't want or need to create. Likewise, adding a
class to a project is facilitated because the class is self-contained, so
this avoids the ‘big hairball’ that large projects often evolve into.”

Surely part of what we want, no? Does it make sense to use Spring or Guice
to drive the integration (and design) of this extensibility capability? I
know this could be viewed as an implementation issue, but you said you’re
starting to prototype, and these things are best integrated from the
beginning.


2. Regarding configuration, consider that some (dynamic config parameters)
will be dynamically read during runtime and some (static config parameters)
will require restarting (or re-instantiating) the components. Config params
that want to be read dynamically should definitely go in ZK so they can
take advantage of Curator notifications. Static config params, that can
only usefully be set at startup or instantiation, could either go in ZK or
be handled the traditional way in Ambari as files on all configured hosts.
If you choose to put static params also in ZK, note that separating static
and dynamic configs into different znodes makes the process of monitoring
changes in the dynamic configs more efficient, and this is unrelated to the
human-readable grouping of params the user sees in a UI.

I am talking with Ambari engineers about implementing an ability for Ambari
to manage config parameters in ZK, at the option of the component
implementor, and expect to be opening Apache Ambari jiras soon. At the
Ambari UI level there should be no difference; at the implementation level
a json or other config file could be written once to a ZK znode instead of
to filesystem files on all configured hosts. The usages could be mixed,
with the component implementation deciding which config files get written
to which target.

3. Yes I read that far :-)

Again, great draft.
Thanks,
--Matt

On 2/17/17, 1:07 PM, "Otto Fowler"  wrote:

RE:
* One Module - yes, I think grouping for the base parsers is good, I just
don’t want them to stay in -common, it should ‘live’ in the metron lib. I
think a grouped set of the primitive parsers is correct, still its own.
* ES Templates - they don’t *have* to be there, but if they are they will
be used. The idea that I’m having is “ someone writing a parser should be
able to produce 1 thing, in one place”. We are talking with Simon on a
different thread about the types of indexing templates we could have. I
think we could have from *nothing to es or solr specific to something new

As we discuss we can come up with the mv-pr.

On February 17, 2017 at 15:47:57, Casey Stella (ceste...@gmail.com) wrote:

Ok, This is a long one, so don't expect a coherent response just yet, but I
will give some initial impressions:

- I strongly agree with the premise of this idea. Making Metron

Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-17 Thread Matt Foley
Outstanding write-up, Otto!  As Casey said, don’t expect this to be a coherent 
response, but some possibly useful thoughts:

1. It’s clear that because parsers, enrichers, and indexers are all specialized 
per sensor, that “adding a new sensor” is necessarily a complex operation.  
You’ve thrown a lasso around it all, and suggested auto-generation of the 
generic parts.  Excellent start.

In my fuzzy computer-sciencey way, your sketch makes me view this as an 
Inversion of Control scenario ( 
https://en.wikipedia.org/wiki/Inversion_of_control ).  I know I don’t have to 
define this for our readers, but allow me to quote one paragraph, from article 
http://www.javaworld.com/article/2071914/excellent-explanation-of-dependency-injection--inversion-of-control-.html
 :

“[IoC (or DI)] delivers a key advantage: loose coupling. Objects can be 
added and tested independently of other objects, because they don't depend on 
anything other than what you pass them. When using traditional dependencies, to 
test an object you have to create an environment where all of its dependencies 
exist and are reachable before you can test it. With [IoC or] DI, it's possible 
to test the object in isolation passing it mock objects for the ones you don't 
want or need to create. Likewise, adding a class to a project is facilitated 
because the class is self-contained, so this avoids the ‘big hairball’ that 
large projects often evolve into.”

Surely part of what we want, no?  Does it make sense to use Spring or Guice to 
drive the integration (and design) of this extensibility capability?  I know 
this could be viewed as an implementation issue, but you said you’re starting 
to prototype, and these things are best integrated from the beginning.
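
For anyone who wants the quoted idea in miniature, here is a sketch of plain
constructor injection. It uses neither Spring nor Guice, and the Parser and
FakeConfig classes and the "delimiter" key are invented for the example. The
point is only that the parser is handed its collaborators, so a test can pass
stand-ins instead of a live environment.

```python
class Parser:
    """Parser that receives its collaborators instead of constructing them."""

    def __init__(self, config_source, emit):
        self._config_source = config_source   # injected: anything with get(key)
        self._emit = emit                     # injected: callable taking one message

    def handle(self, raw_line):
        delimiter = self._config_source.get("delimiter")
        fields = raw_line.split(delimiter)
        self._emit({"fields": fields})


class FakeConfig:
    """Test double standing in for a real configuration lookup."""

    def __init__(self, values):
        self._values = values

    def get(self, key):
        return self._values[key]
```

In a test, the emit callable can simply be the append method of a list, so
the parser's output is inspected without any messaging layer being present.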


2. Regarding configuration, consider that some (dynamic config parameters) will 
be dynamically read during runtime and some (static config parameters) will 
require restarting (or re-instantiating) the components.  Config params that 
want to be read dynamically should definitely go in ZK so they can take 
advantage of Curator notifications.  Static config params, that can only 
usefully be set at startup or instantiation, could either go in ZK or be 
handled the traditional way in Ambari as files on all configured hosts.  If you 
choose to put static params also in ZK, note that separating static and dynamic 
configs into different znodes makes the process of monitoring changes in the 
dynamic configs more efficient, and this is unrelated to the human-readable 
grouping of params the user sees in a UI.
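
A minimal sketch of the static/dynamic split, under stated assumptions: FakeZk
is an in-memory stand-in with a simple watch callback, not ZooKeeper or
Curator, and the znode paths and parameter names are made up. It only shows
why putting the two kinds of config in different znodes keeps change
notifications limited to the params that can actually be reloaded.

```python
class FakeZk:
    """In-memory stand-in for znodes with data watches (not a real client)."""

    def __init__(self):
        self._data = {}
        self._watchers = {}

    def watch(self, path, callback):
        self._watchers.setdefault(path, []).append(callback)

    def set(self, path, value):
        self._data[path] = value
        for callback in self._watchers.get(path, []):
            callback(value)

    def get(self, path):
        return self._data[path]


class Component:
    def __init__(self, zk, base):
        # Static params: read once at startup; a change needs a restart.
        self.static = zk.get(base + "/static")
        # Dynamic params: re-read on every change notification.
        self.dynamic = zk.get(base + "/dynamic")
        self.reloads = 0
        zk.watch(base + "/dynamic", self._on_dynamic_change)

    def _on_dynamic_change(self, value):
        self.dynamic = value
        self.reloads += 1
```

Because the component watches only the dynamic znode, a write to the static
znode triggers no callback at all in this sketch.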

I am talking with Ambari engineers about implementing an ability for Ambari to 
manage config parameters in ZK, at the option of the component implementor, and 
expect to be opening Apache Ambari jiras soon.  At the Ambari UI level there 
should be no difference; at the implementation level a json or other config 
file could be written once to a ZK znode instead of to filesystem files on all 
configured hosts.  The usages could be mixed, with the component implementation 
deciding which config files get written to which target.

3. Yes I read that far :-)

Again, great draft.
Thanks,
--Matt

On 2/17/17, 1:07 PM, "Otto Fowler"  wrote:

RE:
* One Module - yes, I think grouping for the base parsers is good,  I just
don’t want them to stay in -common, it should ‘live’ in the metron lib.  I
think a grouped set of the primitive parsers is correct, still its own.
* ES Templates - they don’t *have* to be there, but if they are they will
be used.  The idea that I’m having is “ someone writing a parser should be
able to produce 1 thing, in one place”.  We are talking with Simon on a
different thread about the types of indexing templates we could have.  I
think we could have from *nothing to es or solr specific to something new

As we discuss we can come up with the mv-pr.

On February 17, 2017 at 15:47:57, Casey Stella (ceste...@gmail.com) wrote:

Ok, This is a long one, so don't expect a coherent response just yet, but I
will give some initial impressions:

- I strongly agree with the premise of this idea. Making Metron
extensible is and should be among the top of our priorities and at the
moment, it's painful to develop a new parser.
- One maven module per parser may be overkill here as the shading is
costly and I think it may make some sense to group based on characteristics
in some way (e.g. json and csv may get grouped together).
- The notion of instance vs parser is a good one
- Binding ES templates and parsers may not be a good idea. You can have
non-indexed parsers (e.g. streaming enrichments).

Can we start small here and then iterate toward the complete vision? I'd
recommend

- Splitting the parsers up into some coherent organization with common
bits separated from the parser itself
- Having a maven archetype

As 

Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-17 Thread Otto Fowler
RE:
* One Module - yes, I think grouping for the base parsers is good,  I just
don’t want them to stay in -common, it should ‘live’ in the metron lib.  I
think a grouped set of the primitive parsers is correct, still its own.
* ES Templates - they don’t *have* to be there, but if they are they will
be used.  The idea that I’m having is “ someone writing a parser should be
able to produce 1 thing, in one place”.  We are talking with Simon on a
different thread about the types of indexing templates we could have.  I
think we could have from *nothing to es or solr specific to something new

As we discuss we can come up with the mv-pr.

On February 17, 2017 at 15:47:57, Casey Stella (ceste...@gmail.com) wrote:

Ok, This is a long one, so don't expect a coherent response just yet, but I
will give some initial impressions:

- I strongly agree with the premise of this idea. Making Metron
extensible is and should be among the top of our priorities and at the
moment, it's painful to develop a new parser.
- One maven module per parser may be overkill here as the shading is
costly and I think it may make some sense to group based on characteristics
in some way (e.g. json and csv may get grouped together).
- The notion of instance vs parser is a good one
- Binding ES templates and parsers may not be a good idea. You can have
non-indexed parsers (e.g. streaming enrichments).

Can we start small here and then iterate toward the complete vision? I'd
recommend

- Splitting the parsers up into some coherent organization with common
bits separated from the parser itself
- Having a maven archetype

As the two most valuable and achievable parts of this idea since they are
the bits required to enable users to create parsers without forking Metron.

On Fri, Feb 17, 2017 at 11:54 AM, Otto Fowler 
wrote:

> The ability for implementors and developers building on the project to
> ‘side load’, that is to build, maintain, and install, telemetry sources
> into the system without having to actually develop within METRON itself is
> very important.
>
> If done properly it gives developers an easier and more manageable
> proposition for extending METRON to suit their needs in what may be the
> most common extension case. It also may reduce the necessity to create and
> maintain forks of METRON.
>
> I would like to put forward a proposal on a way to move this forward, and
> ask the community for feedback and assistance in reaching an acceptable
> approach and raising the issues that I have surely missed.
>
> Conceptually what I would like to propose is the following:
>
> * What is currently metron-parsers should be broken apart such that each
> parser is its own individual component
> * Each of these components should be completely self contained ( or produce
> a self contained package )
> * These packages will include the shaded jar for the parser, default
> configurations for the parser and enrichment, default elasticsearch
> template, and a default log-rotate script
> * These packages will be deployed to disk in a new library directory under
> metron
> * Zookeeper should have a new telemetry or source area where all
> ‘installed’ sources exist
> * This area would host the default configurations, rules, templates, and
> scripts and metadata
> * Installed sources can be instantiated as named instances
> * Instantiating an instance will move the default configurations to what is
> currently the enrichment and parser areas for the instance name
> * It will also deploy the elasticsearch template for the instance
> name
> * It will deploy the log-rotate scripts
> * Installed and instantiated sources can be ‘redeployed’ from disk to
> upgrade
> * Installed sources are available for selection in ambari
> * question on post selection configuration, but we have that problem
> already
> * Instantiation is exposed through REST
> * the UI can install a new package
> * the UI can allow a workflow to edit the configurations and templates
> before finalizing
> * are there three states here? Installed | Edited | Instantiated
> ?
> * the UI can edit existing and redeploy
> * possibly re-deploy ES template after adding fields or account for fields
> added by enrichment…. manually or automatically?
> * a script can be made to instantiate a ‘base’ parser ( json, grok, csv )
> with only configuration
> * The installation and instantiation should be exposed through the Stellar
> management console
> * Starting a topology will now start the parser’s shaded jar found through
> the parser type ( which may need to be added to the configurations ) and the
> library
> * A Maven Archetype should be created for a parser | telemetry source
> project that allows the proper setup of a development project outside the
> METRON source tree
> * should be published
> * should have a useful default set
>
> So the developer’s workflow:
>
> * Create a new project from the archetype outside of the metron tree
> * edit the configurations, templates, 

Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-17 Thread Casey Stella
Ok, This is a long one, so don't expect a coherent response just yet, but I
will give some initial impressions:

   - I strongly agree with the premise of this idea.  Making Metron
   extensible is and should be among the top of our priorities and at the
   moment, it's painful to develop a new parser.
   - One maven module per parser may be overkill here as the shading is
   costly and I think it may make some sense to group based on characteristics
   in some way (e.g. json and csv may get grouped together).
   - The notion of instance vs parser is a good one
   - Binding ES templates and parsers may not be a good idea.  You can have
   non-indexed parsers (e.g. streaming enrichments).

Can we start small here and then iterate toward the complete vision?  I'd
recommend

   - Splitting the parsers up into some coherent organization with common
   bits separated from the parser itself
   - Having a maven archetype

As the two most valuable and achievable parts of this idea since they are
the bits required to enable users to create parsers without forking Metron.

On Fri, Feb 17, 2017 at 11:54 AM, Otto Fowler 
wrote:

> The ability for implementors and developers building on the project to
> ‘side load’, that is to build, maintain, and install, telemetry sources
> into the system without having to actually develop within METRON itself is
> very important.
>
> If done properly it gives developers an easier and more manageable
> proposition for extending METRON to suit their needs in what may be the
> most common extension case.  It also may reduce the necessity to create and
> maintain forks of METRON.
>
> I would like to put forward a proposal on a way to move this forward, and
> ask the community for feedback and assistance in reaching an acceptable
> approach and raising the issues that I have surely missed.
>
> Conceptually what I would like to propose is the following:
>
> * What is currently metron-parsers should be broken apart such that each
> parser is its own individual component
> * Each of these components should be completely self contained ( or produce
> a self contained package )
> * These packages will include the shaded jar for the parser, default
> configurations for the parser and enrichment, default elasticsearch
> template, and a default log-rotate script
> * These packages will be deployed to disk in a new library directory under
> metron
> * Zookeeper should have a new telemetry or source area where all
> ‘installed’ sources exist
> * This area would host the default configurations, rules, templates, and
> scripts and metadata
> * Installed sources can be instantiated as named instances
> * Instantiating an instance will move the default configurations to what is
> currently the enrichment and parser areas for the instance name
> * It will also deploy the elasticsearch template for the instance
> name
> * It will deploy the log-rotate scripts
> * Installed and instantiated sources can be ‘redeployed’ from disk to
> upgrade
> * Installed sources are available for selection in ambari
> * question on post selection configuration, but we have that problem
> already
> * Instantiation is exposed through REST
> * the UI can install a new package
> * the UI can allow a workflow to edit the configurations and templates
> before finalizing
> * are there three states here?   Installed | Edited | Instantiated
> ?
> * the UI can edit existing and redeploy
> * possibly re-deploy ES template after adding fields or account for fields
> added by enrichment…. manually or automatically?
> * a script can be made to instantiate a ‘base’ parser ( json, grok, csv )
> with only configuration
> * The installation and instantiation should be exposed through the Stellar
> management console
> * Starting a topology will now start the parser’s shaded jar found through
> the parser type ( which may need to be added to the configurations ) and the
> library
> * A Maven Archetype should be created for a parser | telemetry source
> project that allows the proper setup of a development project outside the
> METRON source tree
> * should be published
> * should have a useful default set
>
> So the developer’s workflow:
>
> * Create a new project from the archetype outside of the metron tree
> * edit the configurations, templates, rules etc in the project
> * code or modify the sample
> * build
> * run the installer script or the ui to upload/deploy the package
> * use the console or ui to create an instance
>
> QUESTIONS:
> * it seems strange to have this as ‘parsers’ when conceptually parsers are
> a part of the whole, should we introduce something like ‘source’ that is
> all of it?
> * should configurations etc be in ZK or on disk? or HDFS? or All of the
> above?
> * did you read this far?  good!
> * I am sure that after hitting send I will think of 10 things that are
> missing from this
>
> I have started a POC of this, and thus far have created
> metron-parsers-common 

[DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

2017-02-17 Thread Otto Fowler
The ability for implementors and developers building on the project to
‘side load’, that is to build, maintain, and install, telemetry sources
into the system without having to actually develop within METRON itself is
very important.

If done properly it gives developers an easier and more manageable
proposition for extending METRON to suit their needs in what may be the
most common extension case.  It also may reduce the necessity to create and
maintain forks of METRON.

I would like to put forward a proposal on a way to move this forward, and
ask the community for feedback and assistance in reaching an acceptable
approach and raising the issues that I have surely missed.

Conceptually what I would like to propose is the following:

* What is currently metron-parsers should be broken apart such that each
parser is its own individual component
* Each of these components should be completely self contained ( or produce
a self contained package )
* These packages will include the shaded jar for the parser, default
configurations for the parser and enrichment, default elasticsearch
template, and a default log-rotate script
* These packages will be deployed to disk in a new library directory under
metron
* Zookeeper should have a new telemetry or source area where all
‘installed’ sources exist
* This area would host the default configurations, rules, templates, and
scripts and metadata
* Installed sources can be instantiated as named instances
* Instantiating an instance will move the default configurations to what is
currently the enrichment and parser areas for the instance name
* It will also deploy the elasticsearch template for the instance
name
* It will deploy the log-rotate scripts
* Installed and instantiated sources can be ‘redeployed’ from disk to
upgrade
* Installed sources are available for selection in ambari
* question on post selection configuration, but we have that problem already
* Instantiation is exposed through REST
* the UI can install a new package
* the UI can allow a workflow to edit the configurations and templates
before finalizing
* are there three states here?   Installed | Edited | Instantiated
?
* the UI can edit existing and redeploy
* possibly re-deploy ES template after adding fields or account for fields
added by enrichment…. manually or automatically?
* a script can be made to instantiate a ‘base’ parser ( json, grok, csv )
with only configuration
* The installation and instantiation should be exposed through the Stellar
management console
* Starting a topology will now start the parser’s shaded jar found through
the parser type ( which may need to be added to the configurations ) and the
library
* A Maven Archetype should be created for a parser | telemetry source
project that allows the proper setup of a development project outside the
METRON source tree
* should be published
* should have a useful default set
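
The install / edit / instantiate flow in the list above can be sketched as a small state machine. This is only one possible reading of the open "three states" question, not a design the thread has settled on. SourceInstance, the config section names, and the sample values are all hypothetical, and redeploying an already instantiated source is deliberately left out.

```python
import copy
from enum import Enum


class State(Enum):
    INSTALLED = "installed"
    EDITED = "edited"
    INSTANTIATED = "instantiated"


# Assumed transitions: edits are optional and may repeat before finalizing.
ALLOWED = {
    State.INSTALLED: {State.EDITED, State.INSTANTIATED},
    State.EDITED: {State.EDITED, State.INSTANTIATED},
    State.INSTANTIATED: set(),
}


class SourceInstance:
    """A named instance created from an installed source's defaults."""

    def __init__(self, source_type, name, defaults):
        self.source_type = source_type
        self.name = name
        self.configs = copy.deepcopy(defaults)  # never touch the installed defaults
        self.state = State.INSTALLED

    def _move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state

    def edit(self, section, key, value):
        self._move_to(State.EDITED)
        self.configs[section][key] = value

    def instantiate(self, live_configs):
        # Copy each section into the live area, keyed by instance name.
        self._move_to(State.INSTANTIATED)
        for section, config in self.configs.items():
            live_configs.setdefault(section, {})[self.name] = copy.deepcopy(config)


defaults = {
    "parsers": {"parser_class": "org.example.ExampleParser"},
    "enrichments": {"index": "example"},
}
live = {}
instance = SourceInstance("asa", "asa_dmz", defaults)
instance.edit("enrichments", "index", "asa_dmz")
instance.instantiate(live)
print(instance.state.value, live["enrichments"]["asa_dmz"])
```

Keeping the installed defaults untouched and copying them per instance is what would allow the same source to be instantiated several times under different names.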

So the developer’s workflow:

* Create a new project from the archetype outside of the metron tree
* edit the configurations, templates, rules etc in the project
* code or modify the sample
* build
* run the installer script or the ui to upload/deploy the package
* use the console or ui to create an instance

QUESTIONS:
* it seems strange to have this as ‘parsers’ when conceptually parsers are
a part of the whole, should we introduce something like ‘source’ that is
all of it?
* should configurations etc be in ZK or on disk? or HDFS? or All of the
above?
* did you read this far?  good!
* I am sure that after hitting send I will think of 10 things that are
missing from this

I have started a POC of this, and thus far have created
metron-parsers-common and started breaking out metron-parser-asa.
I will continue to work through some of this here
https://github.com/ottobackwards/incubator-metron/tree/METRON-258

Again,  thank you for your time and feedback.