
[DISCUSS] Are/how are you using the ES data pruner?

2017-11-22 Thread Michael Miklavcic

From what I can tell, the data pruner isn't documented anywhere, so I'm
curious if anybody is using this, and if so, how are you using it?

   - https://github.com/apache/metron/blob/master/metron-platform/metron-data-management/README.md
   - https://github.com/apache/metron/blob/master/metron-platform/metron-data-management/src/main/java/org/apache/metron/dataloads/bulk/ElasticsearchDataPrunerRunner.java
   - https://github.com/apache/metron/blob/master/metron-platform/metron-data-management/src/main/java/org/apache/metron/dataloads/bulk/DataPruner.java

It looks to me that it allows you to specify the start date and a number of
days for lookback from the start date to purge along with a regex pattern
to match the index name. It also does not look like it has any built-in
scheduling semantics, so I assume this was a cron job. I think that about
covers it. Anything I've missed?
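
To make the behaviour described above concrete, here is a minimal Python sketch of the selection logic only. It is not the DataPruner code. It assumes that index names embed a yyyy.MM.dd day stamp, that the purge window is [start date - lookback days, start date), and the function name and sample index names are hypothetical.

```python
import re
from datetime import date, datetime, timedelta

# Assumption: index names carry a day stamp, e.g. "bro_index_2017.11.20.13".
DAY_STAMP = re.compile(r"(\d{4}\.\d{2}\.\d{2})")


def indices_to_prune(index_names, start_date, lookback_days, name_pattern):
    """Return the names that match name_pattern and fall inside the window.

    The window is [start_date - lookback_days, start_date); the exact
    boundary handling is an assumption made for this sketch.
    """
    window_start = start_date - timedelta(days=lookback_days)
    name_re = re.compile(name_pattern)
    selected = []
    for name in index_names:
        if not name_re.search(name):
            continue  # index belongs to another feed
        stamp = DAY_STAMP.search(name)
        if stamp is None:
            continue  # no recognisable date, leave it alone
        day = datetime.strptime(stamp.group(1), "%Y.%m.%d").date()
        if window_start <= day < start_date:
            selected.append(name)
    return selected


names = [
    "bro_index_2017.11.20.13",
    "bro_index_2017.11.10.01",
    "snort_index_2017.11.20.13",
    "bro_index_2017.11.22.00",
]
to_delete = indices_to_prune(names, date(2017, 11, 22), 5, r"^bro_index_")
```

Since there is no scheduling here either, something like cron would still have to call it once a day.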

I'm adding a quick doc write-up to METRON-939 (
https://github.com/apache/metron/pull/840) for using Curator to prune
indices from Elasticsearch, and I want to make sure I've covered the
existing use cases.

Best,
Mike

Re: [DISCUSS] Are/how are you using the ES data pruner?

2017-11-22 Thread Michael Miklavcic

Thanks Ali, that's good feedback. Would you be willing to share any of your
Curator calls/config and use cases with the community? I'd love to add it
to a document around ES pruning in the short term, and maybe we could look
at how to build this into indexing at some point.

Cheers,
Mike

On Nov 22, 2017 8:53 PM, "Ali Nazemian"  wrote:

> We tried to use it, but we ran into the same problem: it was not
> documented, and we hit some issues when running it. It also was not exactly
> what we wanted, so we decided to build something from scratch using
> Elasticsearch Curator. We wanted the ability to manage a different pruning
> mechanism for each feed, with a hard threshold to delete an index and a
> soft threshold to close it. Maybe this could become a per-feed feature in
> the indexing JSON config file.
>
> Cheers,
> Ali
>
> --
> A.Nazemian
>
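
Ali's idea of a soft threshold to close an index and a hard threshold to delete it, configured per feed, could look roughly like the Python sketch below. Everything here is illustrative: the policy keys, feed names, and day counts are hypothetical and are not existing Metron or Curator settings, and the function only decides on an action without calling Elasticsearch.

```python
from datetime import date

# Hypothetical per-feed policy, expressed as index age in days.
POLICIES = {
    "bro": {"close_after_days": 30, "delete_after_days": 90},
    "snort": {"close_after_days": 7, "delete_after_days": 30},
}


def action_for(feed, index_day, today, policies=POLICIES):
    """Return "delete", "close", or "keep" for one index of a feed."""
    policy = policies.get(feed)
    if policy is None:
        return "keep"  # feeds without a policy are never touched
    age_days = (today - index_day).days
    if age_days >= policy["delete_after_days"]:
        return "delete"  # hard threshold
    if age_days >= policy["close_after_days"]:
        return "close"  # soft threshold
    return "keep"


today = date(2017, 11, 22)
old_bro = action_for("bro", date(2017, 8, 1), today)
recent_snort = action_for("snort", date(2017, 11, 10), today)
```

Checking the hard threshold first means an index that is old enough to delete is never merely closed.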

Re: [DISCUSS] Are/how are you using the ES data pruner?

2017-11-27 Thread Michael Miklavcic

It's a worthy mention. Our existing pruner wouldn't be able to handle Solr
without modification, so we'd either need something native to Solr or
something custom.

Mike

On Mon, Nov 27, 2017 at 3:46 PM, James Sirota  wrote:

> One thing to keep in mind, as we will be introducing Solr shortly, is to
> find out whether something similar to Curator exists for Solr. But we'll
> cross that bridge when we get there.
>
> 22.11.2017, 22:58, "Ali Nazemian" :
> > Sure. I will have a chat internally and come back to you shortly. It was
> > quick-and-dirty work, actually, just to fix this temporarily. However, it
> > might be a good starting point.
> >
> > --
> > A.Nazemian
>
> ---
> Thank you,
>
> James Sirota
> PMC- Apache Metron
> jsirota AT apache DOT org
>

Re: [DISCUSS] e2e test infrastructure

2017-11-28 Thread Michael Miklavcic

What about just spinning up each of the components in their own process?
It's even lighter weight, doesn't have the complications for HDFS (you can
use the local FS easily, for example), and doesn't have any issues around
ports and port mapping with the containers.

On Tue, Nov 28, 2017 at 3:48 PM, Otto Fowler 
wrote:

> As long as there is not a large chunk of custom deployment that has to be
> maintained, Docker sounds ideal.
> I would like to understand what it would take to create the Docker e2e env.
>
>
>
> On November 28, 2017 at 17:27:13, Ryan Merriman (merrim...@gmail.com)
> wrote:
>
> Currently the e2e tests for our Alerts UI depend on full dev being up and
> running. This is not a good long-term solution because it forces a
> contributor/reviewer to run the tests manually with full dev running. It
> would be better if the backend services could be made available to the e2e
> tests while running in Travis. This would allow us to add the e2e tests to
> our automated build process.
>
> What is the right approach? Here are some options I can think of:
>
> - Use the in-memory components we use for the backend integration tests
> - Use a Docker approach
> - Use mock components designed for the e2e tests
>
> Mocking the backend would be my least favorite option because it would
> introduce a complex module of code that we have to maintain.
>
> The in-memory approach has some shortcomings but we may be able to solve
> some of those by moving components to their own process and spinning them
> up/down at the beginning/end of tests. Plus we are already using them.
>
> My preference would be Docker because it most closely mimics a real
> installation and gives you isolation, networking and dependency management
> features OOTB. In many cases Dockerfiles are maintained and published by a
> third party and require no work other than some setup like loading data or
> templates/schemas. Elasticsearch is a good example.
>
> I believe we could make any of these approaches work in Travis. What does
> everyone think?
>
> Ryan
>

Re: [DISCUSS] e2e test infrastructure

2017-11-29 Thread Michael Miklavcic

I'd also be strongly in favor of having 1 approach to running our e2e and
integration tests.

On Wed, Nov 29, 2017 at 5:59 AM, Justin Leet  wrote:

> As an additional consideration, it would be really nice to get our current
> set of integration tests to be able to run on this infrastructure as
> well, or at least be convertible in a known manner. Eventually, we
> could probably split out the integration tests from the unit tests
> entirely. It would likely improve the build times if we were reusing the
> components between test classes (keep in mind that right now we only reuse
> them between test cases in a given class).
>
> In my mind, ideally we have a single infra for integration and e2e tests.
> I'd like to be able to run them from IntelliJ and debug them directly (or
> at least be able to do remote debugging of them easily and in a
> well-documented manner). Obviously, that's easier said than done, but
> what I'd like to avoid is us having essentially two different ways to do the
> same thing (spin up some of our dependency components and run code against
> them). I'm worried that's quick vs. full dev all over again, but without us
> being able to easily kill one, because half of the tests depend on one and
> half on the other.
>

Re: Heterogeneous indexing batch size for different Metron feeds

2017-12-10 Thread Michael Miklavcic

de.md

Sample command without Kerberos enabled (see link [1] for more detail with
Kerberos):

watch -n 10 -d ${KAFKA_HOME}/bin/kafka-consumer-groups.sh \
    --describe \
    --group indexing \
    --bootstrap-server $BROKERLIST \
    --new-consumer

Hope this helps.

Cheers,
Michael Miklavcic


On Sun, Dec 10, 2017 at 5:38 AM, Ali Nazemian  wrote:

> This does not seem to match our observations. Whenever there are
> messages in the indexing or enrichments backlog, the new configuration (at
> least the batch size) is not applied to new messages. The topology
> stays in its previous state until it has processed all the old messages.
> This scenario can be reproduced very easily.
>
> Create a feed with an inefficient batch size to build up a backlog on the
> indexing topic. Then change the batch size to an effective value and wait
> to see how long it takes to process the backlog. Based on our
> observations, it takes a while to process the messages in the backlog even
> after you fix the batch size. It feels like batch size changes are not
> synchronised instantly.
>
> On Thu, Dec 7, 2017 at 11:45 PM, Otto Fowler 
> wrote:
>
> > We use TreeCache
> > <https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/cache/TreeCache.html>.
> >
> > When the configuration is updated in zookeeper, the configuration object
> > in the bolt is updated. This configuration is read on each message, so I
> > think from what I see new configurations should get picked up for the
> > next message.
> >
> > I could be wrong though.
> >
> >
> >
> >
> > On December 7, 2017 at 06:47:15, Ali Nazemian (alinazem...@gmail.com)
> > wrote:
> >
> > Thank you very much. Unfortunately, reproducing all the situations is
> > very costly for us at this moment. We are avoiding that issue for now
> > by using the same batch size for all the feeds. Hopefully, with the
> > new PR Casey provided for the segregation of ES and HDFS, it will be
> > much clearer how to tune them.
> >
> > Do you know how the indexing config is synchronised with the topology?
> > Does the topology get synchronised by pulling the latest configs from
> > ZK through some background mechanism, or is it based on an update
> > trigger? As I mentioned, based on our observation it looks like the
> > synchronization doesn't work until all the old messages in the Kafka
> > queue have been processed with the old indexing configs.
> >
> > Regards,
> > Ali
> >
> > On Thu, Dec 7, 2017 at 12:33 AM, Otto Fowler 
> > wrote:
> >
> >> Sorry,
> >> We flush for timeouts on every storm ‘tick’ message, not on every
> >> message.
> >>
> >>
> >>
> >> On December 6, 2017 at 08:29:51, Otto Fowler (ottobackwa...@gmail.com)
> >> wrote:
> >>
> >> I have looked at it.
> >>
> >> We maintain batch lists for each sensor which gather messages to index.
> >> When we get a message that puts it over the batch size the messages are
> >> flushed and written to the target.
> >> There is also a timeout component, where the batch would be flushed
> >> based on timeout.
> >>
> >> While batch size checking occurs on a per sensor-message receipt basis,
> >> each message, regardless of sensor, will trigger a check of the batch
> >> timeout for all the lists.
> >>
> >> At least that is what I think I see.
> >>
> >> Without understanding what the failures are, it is hard to see what
> >> the issue is.
> >>
> >> Do we have timing issues where all the lists are timing out all the time,
> >> causing some kind of cascading failure, for example?
> >> Does the number of sensors matter?  For example if only one sensor
> >> topology is running with batch setup X, is everything fine?  Do failures
> >> start after adding the Nth additional sensor?
> >>
> >> Hopefully someone else on the list may have an idea.
> >> That code does not have any logging to speak of… well, no debug / trace
> >> logging that would help here, either.
> >>
> >>
> >>
> >> On December 6, 2017 at 08:18:01, Ali Nazemian (alinazem...@gmail.com)
> >> wrote:
> >>
> >> Everything looks normal except the high number of failed tuples. Do you
> >> know how the indexing batch size works? Based on our observations it
> >> seems it doesn't update the messages that are in enrichments and
> >> indexing topics.
> >>
> >> On Thu, Dec 7, 2017 at 12:1
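
The batching behaviour Otto describes in this thread (per-sensor batch lists, a size-triggered flush, and a timeout flush driven by tick messages) might be sketched in Python as below. This is a simplified model for discussion, not the Metron indexing bolt: the class, method names, and default values are hypothetical, and flushing when the batch reaches the configured size is an assumption about the exact threshold.

```python
class SensorBatcher:
    """Toy model of per-sensor batching with size and timeout flushes."""

    def __init__(self, write, default_batch_size=5, timeout_secs=10.0):
        self.write = write  # callable(sensor, messages), stands in for the writer
        self.default_batch_size = default_batch_size
        self.timeout_secs = timeout_secs
        self.batch_sizes = {}  # sensor -> configured batch size
        self.batches = {}      # sensor -> pending messages
        self.started = {}      # sensor -> time the current batch was started

    def set_batch_size(self, sensor, size):
        # Models a config update; the new size is read on the next message.
        self.batch_sizes[sensor] = size

    def on_message(self, sensor, message, now):
        batch = self.batches.setdefault(sensor, [])
        if not batch:
            self.started[sensor] = now
        batch.append(message)
        limit = self.batch_sizes.get(sensor, self.default_batch_size)
        if len(batch) >= limit:
            self._flush(sensor)

    def on_tick(self, now):
        # Timeouts are checked for every sensor's list on each tick.
        for sensor in list(self.batches):
            if now - self.started[sensor] >= self.timeout_secs:
                self._flush(sensor)

    def _flush(self, sensor):
        messages = self.batches.pop(sensor, [])
        self.started.pop(sensor, None)
        if messages:
            self.write(sensor, messages)


written = []
batcher = SensorBatcher(lambda s, m: written.append((s, list(m))),
                        default_batch_size=3, timeout_secs=10.0)
```

In this model a batch size change takes effect on the very next message, so a slow backlog drain like the one Ali reports would have to come from somewhere else, for example from messages already queued in Kafka ahead of the new ones.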

Re: [DISCUSS] Stellar Documentation Autogeneration

2017-12-14 Thread Michael Miklavcic

+1 from me, great idea Justin. I did a bit of digging around also and the
Doclet approach you're already using seems the way to go. I didn't come
across any libraries that would make this easier or better. Not sure if
Swagger has anything along these lines?

On Thu, Dec 14, 2017 at 1:00 PM, Otto Fowler 
wrote:

> I think this is a great idea, and I looked at the POC and it isn’t as bad
> as you make it out to be ;)
>
> What I would like to see is documentation for Stellar functions,
> generated by namespace. I would also
> like the capability to document at the namespace level.
>
> Often we have namespace level concepts that don’t fit into any given
> function’s documentation.
> Setting aside the how of the namespace documentation for a moment, based on
> the POC I would
> suggest that we
>
> * find all namespaces
> * create a page per namespace
> * document each function in its namespace’s page
> * include the namespace doc in that page
>
> Each module that exports Stellar functions should have its own
> documentation.  As part of breaking Stellar out to its own module
> we should remove Stellar documentation from stellar common that applies to
> functions outside that module.
>
>
>
> On December 14, 2017 at 14:32:56, Justin Leet (justinjl...@gmail.com)
> wrote:
>
> I think it would be valuable to have the documentation around Stellar being
> autogenerated. We have most of the info we'd want in the @Stellar
> annotation, and ideally, we could just pull this info out and produce some
> docs similar to what we already manually maintain. This came up a bit in
> the context of https://issues.apache.org/jira/browse/METRON-1361
>
> I put together a super, super (super!) rough POC of using the approach of
> Javadoc-style doclet processing that reads the annotations and kicks out
> something pretty close to the current docs (without any fancy stuff like
> the table of contents and so on).
>
> Right now, there'd be a good deal more to do to make it usable. Off
> the top of my head, the main things I wanted to look at before really even
> taking an actual stab at it are
>
> 1) Abstracting out the markdown formatting from the annotation parsing
> 2) Making sure we can integrate this approach without breaking current
> Javadocs
> 3) Managing things across projects (since we put in Stellar functions all
> over).
> 4) Slightly more thought about how we'd manage it.
>
> Otto's alluded to having a couple thoughts, and I'm more than happy to get
> a better idea of what we want the end state to look like (either this or
> something else, e.g. an annotation processor during compile phase or if
> someone knows a tool that takes care of this sort of thing.)
>
> Any thoughts?
>

Re: [DEV COMMUNITY MEETING] Call for Ideas and Schedule

2017-12-15 Thread Michael Miklavcic

Sounds good Otto. We probably also want to touch on the ES 5.6 upgrade
along with our current release status and short-term release roadmap that
Nick Allen has been guiding.

On Fri, Dec 15, 2017 at 9:02 AM, Laurens Vets  wrote:

> I'll try to attend :)
>
>
> On 2017-12-14 12:43, Otto Fowler wrote:
>
>> Dev Community Meeting Call
>>
>> I would like to propose a developer community meeting.
>>
>> I propose that we set the meeting early next week, and will throw out
>> Monday, December 18th at 09:30AM PST, 12:30 on the East Coast and 5:30 in
>> London Towne.
>>
>> This meeting will be held over a web-ex, the details of which will be
>> included in the actual meeting notice.
>>
>> Please reply to this with scheduling concerns and topic suggestions.
>>
>> Potential Topics
>>
>>    - Call for reviewers, ideas how to get more involvement, what people
>>      can do to help
>>    - Feature branches: we have two now, what are they and how are we
>>      going to work on them
>>    - Extension Repository: Default deployment and installation of parsers
>>      as it relates to ‘777’
>>    - General ‘777’ discussion
>>
>> Developer Community Meeting Disclaimers
>>
>>    - Developer Community meetings are a means for realtime discussion of
>>      development issues
>>    - These meetings are not specifically aimed at demonstrations, unless
>>      one is required or requested as part of such discussion
>>    - These meetings are geared towards Metron development issues, not user
>>      issues with deployment or shipped functionality
>>    - There are *NO* decisions made in these meetings. The mailing list is
>>      the official communication record of the Apache Metron Project, and as
>>      such all public decisions are to be made on the list, as to give the
>>      greatest opportunity for community involvement.
>>    - There *ARE* proposals that can be made and discussed in these
>>      meetings, that will then be discussed on list for decision.
>>    - Notes will be taken of these meetings, and they will be posted to the
>>      list
>>    - There may also be breakout posts to the list per proposal or topic,
>>      for more detailed discussion
>>
>

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-03 Thread Michael Miklavcic

I'm liking this design and growth strategy, Casey. I also think Nick and
Otto have some valid points. I always find there's a natural tension
between too little, just enough, and boiling the ocean, and these discuss
threads really help drive what the short- and long-term visions should look
like.

On the subject of repositories and strategies, I agree that pluggable repos
and strategies for modifying them would be useful. For the first pass, I'd
really like to see HDFS with the proposed set of Stellar functions. This
gives us a lot of bang for our buck - we can capitalize on a set of
powerful features around existence checking earlier without having to worry
about later interface changes impacting users. With the primary interface
coming through the JSON config, we are building a nice facade that protects
users from later implementation abstractions and improvements, all while
providing a stable enough interface on which we can develop UI features as
desired. I'd be interested to hear more about what features could be
provided by a repository as time goes by. Federation, permissions,
governance, metadata management, perhaps?

I also had some concern over duplicating existing Unix features. I think
where I'm at has been largely addressed by Casey's comments on 1) scaling,
2) multiple variables, and 3) portability to Hadoop. Providing 2 approaches,
one config-based and the other a composable set of functions, gives
us the ability to provide a core set of features that can later be easily
expanded by users as the need arises. Here again I think the prescribed
approach provides a strong first pass that we can then expand on without
concern of future improvements becoming a hassle for end users.

Best,
Mike

On Wed, Jan 3, 2018 at 10:25 AM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> There is some really cool stuff happening here, if only I’d been allowed
> to see the lists over Christmas... :)
>
> A few thoughts...
>
> I like Otto’s generalisation of the problem to include specific local
> stellar objects in a cache loaded from a store (HDFS seems a natural, but
> not only place, maybe even a web service / local microservicey object
> provider!?) That said, I suspect that’s a good platform optimisation
> approach. Should we look at this as a separate piece of work given it
> extends beyond the scope of the summarisation concept and ultimately use it
> as a back-end to feed the summarising engine proposed here for the
> enrichment loader?
>
> On the more specific use case, one thing I would comment on is the
> configuration approach. The iteration loop (state_{init|update|merge})
> should be consistent with the way we handle things like the profiler
> config, since it’s the same approach to data handling.
>
> The other thing that seems to have crept in here is the interface to
> something like Spark, which again, I am really very very keen on seeing
> happen. That said, not sure how that would happen in this context, unless
> you’re talking about pushing to something like livy for example (eminently
> sensible for things like cross instance caching and faster RPC-ish access
> to an existing spark context, which seem to be what Casey is driving at with
> the spark piece).
>
> To address the question of text manipulation in Stellar / metron
> enrichment ingest etc, we already have this outside of the context of the
> issues here. I would argue that yes, we don’t want too many paths for this,
> and that maybe our parser approach might be heavily related to text-based
> ingest. I would say the scope worth dealing with here though is not really
> text manipulation, but summarisation, which is not well served by existing
> CLI tools like awk / sed and friends.
>
> Simon
>
> > On 3 Jan 2018, at 15:48, Nick Allen  wrote:
> >
> >> Even with 5 threads, it takes an hour for the full Alexa 1m, so I think
> >> this will impact performance
> >
> > What exactly takes an hour?  Adding 1M entries to a bloom filter?  That
> > seems really high, unless I am not understanding something.
> >
> > On Wed, Jan 3, 2018 at 10:17 AM, Casey Stella 
> wrote:
> >
> >> Thanks for the feedback, Nick.
> >>
> >> Regarding "IMHO, I'd rather not reinvent the wheel for text
> >> manipulation."
> >>
> >> I would argue that we are not reinventing the wheel for text
> >> manipulation as the extractor config exists already and we are doing a
> >> similar thing in the flatfile loader (in fact, the code is reused and
> >> merely extended). Transformation operations are already supported in our
> >> codebase in the extractor config; this PR has just added some hooks for
> >> stateful operations.
> >>
> >> Furthermore, we will need a configuration object to pass to the REST
> >> call if we are ever to create a UI around importing data into hbase or
> >> creating these summary objects.
> >>
> >> Regarding your example:
> >> $ cat top-1m.csv | awk -F, '{print $2}' | sed '/^$/d' | stellar -i
> >> 'DOMAIN_REMOVE_TLD

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-03 Thread Michael Miklavcic

I just finished stepping through the typosquatting use case README in your
merge branch. This is really, really good work Casey. I see most of our
previous documentation issues addressed up front, e.g. special variables
are cited, all new fields explained, side effects documented. The use case
doc brings it all together soup-to-nuts and I think all the pieces make
sense in a mostly self-contained way. I can't think of anything I had to
sit and think about for more than a few seconds. I'll be making my way
through your individual PRs in more detail, but my first impressions are
that this is excellent.

On Wed, Jan 3, 2018 at 12:43 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I'm liking this design and growth strategy, Casey. I also think Nick and
> Otto have some valid points. I always find there's a natural tension
> between too little, just enough, and boiling the ocean and these discuss
> threads really help drive what the short and long term visions should look
> like.
>
> On the subject of repositories and strategies, I agree that pluggable
> repos and strategies for modifying them would be useful. For the first
> pass, I'd really like to see HDFS with the proposed set of Stellar
> functions. This gives us a lot of bang for our buck - we can capitalize on
> a set of powerful features around existence checking earlier without having
> to worry about later interface changes impacting users. With the primary
> interface coming through the JSON config, we are building a nice facade
> that protects users from later implementation abstractions and
> improvements, all while providing a stable enough interface on which we can
> develop UI features as desired. I'd be interested to hear more about what
> features could be provided by a repository as time goes by. Federation,
> permissions, governance, metadata management, perhaps?
>
> I also had some concern over duplicating existing Unix features. I think
> where I'm at has been largely addressed by Casey's comments on 1) scaling,
> 2) multiple variables, and 3) portability to Hadoop. Providing 2 approaches,
> one config-based and the other a composable set of functions, gives
> us the ability to provide a core set of features that can later be easily
> expanded by users as the need arises. Here again I think the prescribed
> approach provides a strong first pass that we can then expand on without
> concern of future improvements becoming a hassle for end users.
>
> Best,
> Mike
>
> On Wed, Jan 3, 2018 at 10:25 AM, Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
>> There is some really cool stuff happening here, if only I’d been allowed
>> to see the lists over Christmas... :)
>>
>> A few thoughts...
>>
>> I like Otto’s generalisation of the problem to include specific local
>> stellar objects in a cache loaded from a store (HDFS seems a natural, but
>> not only place, maybe even a web service / local microservicey object
>> provider!?) That said, I suspect that’s a good platform optimisation
>> approach. Should we look at this as a separate piece of work given it
>> extends beyond the scope of the summarisation concept and ultimately use it
>> as a back-end to feed the summarising engine proposed here for the
>> enrichment loader?
>>
>> On the more specific use case, one thing I would comment on is the
>> configuration approach. The iteration loop (state_{init|update|merge})
>> should be consistent with the way we handle things like the profiler
>> config, since it’s the same approach to data handling.
>>
>> The other thing that seems to have crept in here is the interface to
>> something like Spark, which again, I am really very very keen on seeing
>> happen. That said, not sure how that would happen in this context, unless
>> you’re talking about pushing to something like livy for example (eminently
>> sensible for things like cross instance caching and faster RPC-ish access
>> to an existing spark context which seem to be what Casey is driving at with
>> the spark piece).
>>
>> To address the question of text manipulation in Stellar / metron
>> enrichment ingest etc, we already have this outside of the context of the
>> issues here. I would argue that yes, we don’t want too many paths for this,
>> and that maybe our parser approach might be heavily related to text-based
>> ingest. I would say the scope worth dealing with here though is not really
>> text manipulation, but summarisation, which is not well served by existing
>> CLI tools like awk / sed and friends.
>>
>> Simon
>>
>> > On 3 Jan 2018, at 15:48, Nick Allen  wrote:
>>

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-04 Thread Michael Miklavcic

ciously (see Jon
> Zeolla's
> > concerns in https://issues.apache.org/jira/browse/METRON-517 for a
> > discussion of this).  In order to do that, we could simply execute:
> >
> > $METRON_HOME/bin/flatfile_summarizer.sh -i "select uri from bro" -o
> /tmp/reference/bro_uri_distribution.ser -e ~/uri_length_extractor.json -p
> 5 -om HDFS -m SPARK_SQL
> >
> > with uri_length_extractor.json containing:
> >
> > {
> >   "config" : {
> > "value_filter" : "LENGTH(uri) > 0",
> > "state_init" : "STATS_INIT()",
> > "state_update" : {
> >"state" : "STATS_ADD(state, LENGTH(uri))"
> >  },
> > "state_merge" : "STATS_MERGE(states)",
> > "separator" : ","
> >   },
> >   "extractor" : "SQL_ROW"
> > }
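
For anyone reading along, the init/update/merge loop in the config above can be sketched roughly as below. This is only an illustration in Python, not Metron code: the function names mirror the config keys, the dictionary-based summary is a simplified stand-in for whatever STATS_INIT/STATS_ADD/STATS_MERGE actually produce, and treating the input as a fixed number of independent chunks is an assumption about how the work is parallelized.

```python
def state_init():
    """Stand-in for STATS_INIT(): an empty summary."""
    return {"count": 0, "total": 0}


def state_update(state, uri):
    """Stand-in for STATS_ADD(state, LENGTH(uri))."""
    return {"count": state["count"] + 1, "total": state["total"] + len(uri)}


def state_merge(states):
    """Stand-in for STATS_MERGE(states): combine partial summaries."""
    merged = state_init()
    for s in states:
        merged = {
            "count": merged["count"] + s["count"],
            "total": merged["total"] + s["total"],
        }
    return merged


def summarize(uris, partitions=5):
    """Fold each chunk into its own state, then merge the partial states."""
    # Split the input into independent chunks, as parallel workers would see it.
    chunks = [uris[i::partitions] for i in range(partitions)]
    partials = []
    for chunk in chunks:
        state = state_init()
        for uri in chunk:
            if len(uri) > 0:  # value_filter: LENGTH(uri) > 0
                state = state_update(state, uri)
        partials.append(state)
    return state_merge(partials)
```

The point of keeping merge separate from update is that each chunk can be summarized on its own and the partial results combined afterwards, so the final summary should not depend on how the input was split.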
> >
> >
> > Regarding value filter, that's already around in the extractor config
> > because of the need to transform data in the flatfile loader.  While I
> > definitely see the desire to use unix tools to prep data, there are some
> > things that aren't as easy to do.  For instance, here, removing the TLD
> of
> > a domain is not a trivial task in a shell script and we have existing
> > functions for that in Stellar.  I would see people using both.
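
To illustrate why TLD removal is awkward with plain text tools, here is a small Python sketch. It is not Stellar's implementation: KNOWN_SUFFIXES is a tiny hypothetical subset standing in for a full public suffix list, and both function names are made up for this example.

```python
# Tiny illustrative subset; a real implementation would consult a complete
# public suffix list rather than a hard-coded set.
KNOWN_SUFFIXES = {"com", "org", "co.uk", "com.au"}


def naive_remove_tld(domain):
    """Drop everything after the last dot, roughly what a quick sed would do."""
    return domain.rsplit(".", 1)[0]


def remove_tld(domain, suffixes=KNOWN_SUFFIXES):
    """Strip the longest known suffix; return the input unchanged if none match."""
    labels = domain.lower().split(".")
    # Candidates run from the longest possible suffix down to the last label.
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return ".".join(labels[:i])
    return domain
```

The naive version gets single-label suffixes right but leaves "example.co" for "example.co.uk", which is the kind of case that needs suffix data rather than string splitting.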
> >
> > To address the issue of a more targeted experience to bloom, I think that
> > sort of specialization should best exist in the UI layer.  Having a more
> > complete and expressive backend reused across specific UIs seems to be
> the
> > best of all worlds.  It allows power users to drop down and do more
> complex
> > things and still provides a (mostly) code-free and targeted experience
> for
> > users.  It seems to me that limiting the expressibility in the backend
> > isn't the right way to go since this work just fits in with our existing
> > engine.
> >
> >
> > On Thu, Jan 4, 2018 at 1:40 AM, James Sirota  wrote:
> >
> >> I just went through these pull requests as well and also agree this is
> >> good work.  I think it's a good first pass.  I would be careful with
> trying
> >> to boil the ocean here.  I think for the initial use case I would only
> >> support loading the bloom filters from HDFS.  If people want to
> pre-process
> >> the CSV file of domains using awk or sed this should be out of scope of
> >> this work.  It's easy enough to do out of band and I would not include
> any
> >> of these functions at all.   I also think that the config could be
> >> considerably simplified.  I think value_filter should be removed (since
> I
> >> believe that preprocessing should be done by the user outside of this
> >> process).  I also have a question about the init, update, and merge
> >> configurations.  Would I ever initialize to anything but an empty bloom
> >> filter?  For the state update would I ever do anything other than add to
> >> the bloom filter?  For the state merge would I ever do anything other
> than
> >> merge the states?  If the answer to these is 'no', then this should
> simply
> >> be hard coded and not externalized into config values.
> >>
> >> 03.01.2018, 14:20, "Michael Miklavcic" :
> >> > I just finished stepping through the typosquatting use case README in
> >> your
> >> > merge branch. This is really, really good work Casey. I see most of
> our
> >> > previous documentation issues addressed up front, e.g. special
> variables
> >> > are cited, all new fields explained, side effects documented. The use
> >> case
> >> > doc brings it all together soup-to-nuts and I think all the pieces
> make
> >> > sense in a mostly self-contained way. I can't think of anything I had
> to
> >> > sit and think about for more than a few seconds. I'll be making my way
> >> > through your individual PR's in more detail, but my first impressions
> >> are
> >> > that this is excellent.
> >> >
> >> > On Wed, Jan 3, 2018 at 12:43 PM, Michael Miklavcic <
> >> > michael.miklav...@gmail.com> wrote:
> >> >
> >> >>  I'm liking this design and growth strategy, Casey. I also think Nick
> >> and
> >> >>  Otto have some valid points. I always find there's a natural tension
> >> >>  between too little, just enough, and boiling the ocean and these
> >> discuss

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-05 Thread Michael Miklavcic

I'm not sure I follow what you're saying as it pertains to summary objects.
Repository is a loaded term, and I'm very apprehensive of pushing for
something potentially very complex where a simpler solution would suffice
in the short term. To wit, the items I'm seeing in this use case doc -
https://github.com/cestella/incubator-metron/tree/typosquat_merge/use-cases/typosquat_detection
- don't preclude the 4 capabilities you've enumerated. Am I missing
something, or can you provide more context? My best guess is that rather
than referring to a specific HDFS path for a serialized object, you're
suggesting we provide a more abstract method for serializing/deserializing
objects to/from a variety of sources. Am I in the ballpark? I'd be in favor
of expanding functionality for such a thing provided a sensible default (ie
HDFS) is provided in the short-term.

On Fri, Jan 5, 2018 at 8:26 AM, Otto Fowler  wrote:

> If we separate the concerns as I have stated previously:
>
> 1. Stellar can load objects into ‘caches’ from some repository and refer to
> them.
> 2. The repositories
> 3. Some number of strategies to populate and possibly update the
> repository, from spark,
> to MR jobs to whatever you would classify the flat file stuff as.
> 4. Let the Stellar API for everything but LOAD() follow after we get usage
>
> Then the particulars of ‘3’ are less important.
>
>
>
> On January 5, 2018 at 09:02:41, Justin Leet (justinjl...@gmail.com) wrote:
>
> I agree with the general sentiment that we can tailor specific use cases
> via UI, and I'm worried that the use case specific solution (particularly
> in light of the note that it's not even general to the class of bloom
> filter problems, let alone an actually general problem) becomes more work
> than this as soon as about 2 more use cases actually get realized.
> Pushing that to the UI lets people solve a variety of problems if they
> really want to dig in, while still giving flexibility to provide a more
> tailored experience for what we discover the 80% cases are in practice.
>
> Keeping in mind I am mostly unfamiliar with the extractor config itself, I
> am wondering if it makes sense to split up the config a bit. While a lot
> of implementation details are shared, maybe the extractor config itself
> should be refactored into a couple parts analogous to ETL (as a follow on
> task, I think if this is true, it predates Casey's proposed change). It
> doesn't necessarily make it less complex, but it might make it more easily
> digestible if it's split up by idea (parsing, transformation, etc.).
>
> Re: Mike's point, I don't think we want the actual processing broken up as
> ETL, but the representation to the user in terms of configuration could be
> similar (Since we're already doing parsing and transformation). We don't
> have to implement it as an ETL pipeline, but it does potentially offer the
> user a way to quickly grasp what the JSON blob is actually specifying.
> Making it easy to understand, even if it's not the ideal way to interact is
> potentially still a win.
>
> On Thu, Jan 4, 2018 at 1:28 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > I mentioned this earlier, but I'll reiterate that I think this approach
> > gives us the ability to make specific use cases via a UI, or other
> > interface should we choose to add one, while keeping the core adaptable
> and
> > flexible. This is ideal for middle tier as I think this effectively gives
> > us the ability to pivot to other use cases very easily while not being so
> > generic as to be useless. The fact that you were able to create this as
> > quickly as you did seems to me directly related to the fact we made the
> > decision to keep the loader somewhat flexible rather than very specific.
> > The operation ordering and state carry from one phase of processing to
> the
> > next would simply have been inscrutable, if not impossible, with a CLI
> > option-only approach. Sure, it's not as simple as "put infile.txt
> > outfile.txt", but the alternatives are not that clear either. One might
> > argue we could split up the processing pieces as in traditional Hadoop,
> eg
> > ETL: Sqoop ingest -> HDFS -> mapreduce, pig, hive, or spark transform.
> But
> > quite frankly that's going in the *opposite* direction I think we want
> > here. That's more complex in terms of moving parts. The config approach
> > with pluggable Stellar insulates users from specific implementations, but
> > also gives you the ability to pass lower level constructs, eg Spark SQL
> or
> > HiveQL, should the need arise.
> >
> > I

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-05 Thread Michael Miklavcic

Any volunteers for creating a set of jiras and feature branch for an object
store repository? This sounds like a massive feature.

On Jan 5, 2018 2:06 PM, "Otto Fowler"  wrote:

> I would say that at the stellar author level, you would just get objects
> from the store and the ‘override’ case would be a follow on for edge cases.
>
>
> On January 5, 2018 at 14:29:16, Casey Stella (ceste...@gmail.com) wrote:
>
> Well, you can pull the default configs from global configs, but you might
> want to override them (similar to the profiler).  For instance, you might
> want to interact with another hbase table than the one globally configured.
>
> On Fri, Jan 5, 2018 at 12:04 PM, Otto Fowler 
> wrote:
>
> > I would imagine the ‘stellar-object-repo’ would be part of the global
> > configuration or configuration passed to the command.
> > why specify in the function itself?
> >
> >
> >
> >
> > On January 5, 2018 at 11:22:32, Casey Stella (ceste...@gmail.com) wrote:
> >
> > I like that, specifically the repositories abstraction. Perhaps we can
> > construct some longer term JIRAs for extensions.
> > For the current state of affairs (with respect to the OBJECT_GET call) I was
> > imagining the simple default HDFS solution as a first cut, then
> > following on by adding a repository name (e.g. OBJECT_GET(path, repo_name)),
> > with repo_name being optional and defaulting to HDFS
> > for backwards compatibility.
> >
> > In effect, this would be the next step that I'm proposing:
> > OBJECT_GET(paths, repo_name, repo_config), which would be backwards compatible
> >
> > - paths - a single path or a list of paths (if a list, then a list of
> > objects returned)
> > - repo_name - optional name for repo, defaulted to HDFS if we don't
> > specify
> > - repo_config - optional config map
> >
> >
> > This would open things like:
> >
> > - OBJECT_GET('key', 'HBASE', { 'hbase.table' : 'table', 'hbase.cf' :
> > 'cf'} ) -- pulling from HBase
> >
> > Eventually we might also be able to fold ENRICHMENT_GET as just a special
> > repo instance.
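
A rough Python sketch of the argument handling proposed above, for illustration only. The repository factories and the dictionaries behind them (FAKE_HDFS, FAKE_HBASE) are hypothetical in-memory stand-ins, not real HDFS or HBase clients, and the function is a model of the proposed OBJECT_GET(paths, repo_name, repo_config) semantics rather than the Stellar function itself.

```python
# Hypothetical in-memory stand-ins for real stores.
FAKE_HDFS = {"/tmp/a.ser": "summary-a", "/tmp/b.ser": "summary-b"}
FAKE_HBASE = {("table", "cf", "key"): "summary-from-hbase"}


def hdfs_repo(config):
    """Return a fetch function for the default repository."""
    return lambda path: FAKE_HDFS[path]


def hbase_repo(config):
    """Return a fetch function scoped to the configured table and column family."""
    table, cf = config["hbase.table"], config["hbase.cf"]
    return lambda key: FAKE_HBASE[(table, cf, key)]


REPOSITORIES = {"HDFS": hdfs_repo, "HBASE": hbase_repo}


def object_get(paths, repo_name="HDFS", repo_config=None):
    """Fetch one object for a single path, or a list of objects for a list of paths."""
    fetch = REPOSITORIES[repo_name](repo_config or {})
    if isinstance(paths, (list, tuple)):
        return [fetch(p) for p in paths]
    return fetch(paths)
```

Because repo_name and repo_config are optional, an existing single-argument call keeps working unchanged, which is the backwards compatibility being described.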
> >
> > On Fri, Jan 5, 2018 at 10:26 AM, Otto Fowler 
> > wrote:
> >
> > > If we separate the concerns as I have stated previously:
> > >
> > > 1. Stellar can load objects into ‘caches’ from some repository and
> refer
> > to
> > > them.
> > > 2. The repositories
> > > 3. Some number of strategies to populate and possibly update the
> > > repository, from spark,
> > > to MR jobs to whatever you would classify the flat file stuff as.
> > > 4. Let the Stellar API for everything but LOAD() follow after we get
> > usage
> > >
> > > Then the particulars of ‘3’ are less important.
> > >
> > >
> > >
> > > On January 5, 2018 at 09:02:41, Justin Leet (justinjl...@gmail.com)
> > wrote:
> > >
> > > I agree with the general sentiment that we can tailor specific use
> cases
> > > via UI, and I'm worried that the use case specific solution
> (particularly
> > > in light of the note that it's not even general to the class of bloom
> > > filter problems, let alone an actually general problem) becomes more
> work
> > > than this as soon as about 2 more use cases actually get realized.
> > > Pushing that to the UI lets people solve a variety of problems if they
> > > really want to dig in, while still giving flexibility to provide a more
> > > tailored experience for what we discover the 80% cases are in practice.
> > >
> > > Keeping in mind I am mostly unfamiliar with the extractor config
> itself,
> > I
> > > am wondering if it makes sense to split up the config a bit. While a
> lot
> > > of implementation details are shared, maybe the extractor config itself
> > > should be refactored into a couple parts analogous to ETL (as a follow
> on
> > > task, I think if this is true, it predates Casey's proposed change). It
> > > doesn't necessarily make it less complex, but it might make it more
> > easily
> > > digestible if it's split up by idea (parsing, transformation, etc.).
> > >
> > > Re: Mike's point, I don't think we want the actual processing broken up
> > as
> > > ETL, but the representation to the user in terms of configuration could
> > be
> > > similar (Since we're already doing parsing and transformation). We
> don't
> > > have to impleme

Re: [DISCUSS] Upgrading Solr

2018-01-22 Thread Michael Miklavcic

I'm +1 on feature branch and user@ announcement as well.

On Thu, Jan 18, 2018 at 12:46 PM, Casey Stella  wrote:

> +1 to both the feature branch and user@ announcement.
>
> On Thu, Jan 18, 2018 at 2:45 PM, Otto Fowler 
> wrote:
>
> > +1 to the feature branch.
> >
> > Also, there have been some questions about solr support recently, I think
> > when the feature branch
> > is ready you should announce it on user@ too, we may get some help from
> > folks looking for this.
> >
> >
> >
> > On January 18, 2018 at 14:26:14, Justin Leet (justinjl...@gmail.com)
> > wrote:
> >
> > Now that we have ES at a modern version, we should consider bringing Solr
> > to a modern version as well.
> >
> > The focus of this work would be to get us in a place where Solr is
> > upgraded, along with the related work of building out the Solr
> > functionality to parity with Elasticsearch. The goal would not be to add
> > net new functionality, just to get Solr and ES in the same place for the
> > alerts UI and REST interface. Additionally, it would include the various
> > supporting necessities such as ensuring associated DAOs are testable, and
> > so on.
> >
> > Given the testing, reviewing, and iteration involved, I'd like to propose
> > doing this work in a feature branch.
> >
> > Jiras would be created based on this discussion once it dies down a bit.
> >
>

Re: Master is failed in Travis

2018-01-23 Thread Michael Miklavcic

Yeah, this seems to be breaking in every build at this point. I am going to
look into it tomorrow.

On Mon, Jan 22, 2018 at 8:29 AM, Nick Allen  wrote:

> I had copied the wrong text into the bug.  I fixed that.
>
> On Mon, Jan 22, 2018 at 10:22 AM, Casey Stella  wrote:
>
> > This could be one of those intermittent test failures related to timing.
> > Specifically this:
> >
> > test(org.apache.metron.rest.controller.SensorIndexingConfigController
> > IntegrationTest)
> >  Time elapsed: 0.064 sec  <<< FAILURE!
> > java.lang.AssertionError: Status expected:<404> but was:<200>
> > at org.springframework.test.util.AssertionErrors.fail(
> > AssertionErrors.java:54)
> > at org.springframework.test.util.AssertionErrors.assertEquals(
> > AssertionErrors.java:81)
> > at org.springframework.test.web.servlet.result.
> > StatusResultMatchers$10.match(StatusResultMatchers.java:664)
> > at org.springframework.test.web.servlet.MockMvc$1.andExpect(
> > MockMvc.java:171)
> > at org.apache.metron.rest.controller.
> > SensorIndexingConfigControllerIntegrationTest.test(
> > SensorIndexingConfigControllerIntegrationTest.java:146)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(
> > NativeMethodAccessorImpl.java:62)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> > DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:498)
> > at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
> > FrameworkMethod.java:47)
> > at org.junit.internal.runners.model.ReflectiveCallable.run(
> > ReflectiveCallable.java:12)
> > at org.junit.runners.model.FrameworkMethod.invokeExplosively(
> > FrameworkMethod.java:44)
> > at org.junit.internal.runners.statements.InvokeMethod.
> > evaluate(InvokeMethod.java:17)
> > at org.junit.internal.runners.statements.RunBefores.
> > evaluate(RunBefores.java:26)
> > at org.springframework.test.context.junit4.statements.
> > RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.
> > java:75)
> > at org.springframework.test.context.junit4.statements.
> > RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.
> java:86)
> > at org.springframework.test.context.junit4.statements.
> > SpringRepeat.evaluate(SpringRepeat.java:84)
> > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> > at org.springframework.test.context.junit4.
> > SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:252)
> > at org.springframework.test.context.junit4.
> > SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:94)
> > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> > at org.junit.runners.ParentRunner$1.schedule(
> ParentRunner.java:63)
> > at org.junit.runners.ParentRunner.runChildren(
> > ParentRunner.java:236)
> > at org.junit.runners.ParentRunner.access$000(
> ParentRunner.java:53)
> > at org.junit.runners.ParentRunner$2.evaluate(
> > ParentRunner.java:229)
> > at org.springframework.test.context.junit4.statements.
> > RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.
> java:61)
> > at org.springframework.test.context.junit4.statements.
> > RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
> > at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> > at org.springframework.test.context.junit4.
> > SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:191)
> > at org.apache.maven.surefire.junit4.JUnit4Provider.execute(
> > JUnit4Provider.java:283)
> > at org.apache.maven.surefire.junit4.JUnit4Provider.
> > executeWithRerun(JUnit4Provider.java:173)
> > at org.apache.maven.surefire.junit4.JUnit4Provider.
> > executeTestSet(JUnit4Provider.java:153)
> > at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(
> > JUnit4Provider.java:128)
> > at org.apache.maven.surefire.booter.ForkedBooter.
> > invokeProviderInSameClassLoader(ForkedBooter.java:203)
> > at org.apache.maven.surefire.booter.ForkedBooter.
> > runSuitesInProcess(ForkedBooter.java:155)
> > at org.apache.maven.surefire.booter.ForkedBooter.main(
> > ForkedBooter.java:103)
> >
> >
> >
> > On Mon, Jan 22, 2018 at 10:21 AM, Nick Allen  wrote:
> >
> > > I had created this JIRA for the specific issue earlier this morning.  I
> > > have no idea why it is breaking and I am not currently looking into it.
> > > Definitely nothing to do with the most recent commit.
> > >
> > > https://issues.apache.org/jira/browse/METRON-1414
> > >
> > >
> > > On Mon, Jan 22, 2018 at 10:18 AM, Otto Fowler  >
> > > wrote:
> > >
> > > > https://travis-ci.org/apache/metron/builds/330900667
> > > >
> > >
> >
>

Re: Master is failed in Travis

2018-01-24 Thread Michael Miklavcic

ron/metron-interface/metron-config/node_modules/karma/node_modules/ws/lib/Receiver.js:477:18
[INFO] at Receiver.applyExtensions
(/Users/mmiklavcic/devprojects/metron/metron-interface/metron-config/node_modules/karma/node_modules/ws/lib/Receiver.js:364:5)
[INFO] at
/Users/mmiklavcic/devprojects/metron/metron-interface/metron-config/node_modules/karma/node_modules/ws/lib/Receiver.js:466:14
[INFO] at Receiver.flush
(/Users/mmiklavcic/devprojects/metron/metron-interface/metron-config/node_modules/karma/node_modules/ws/lib/Receiver.js:340:3)
[INFO] at Receiver.opcodes.1.finish
(/Users/mmiklavcic/devprojects/metron/metron-interface/metron-config/node_modules/karma/node_modules/ws/lib/Receiver.js:482:12)
[INFO] at Receiver.expectHandler
(/Users/mmiklavcic/devprojects/metron/metron-interface/metron-config/node_modules/karma/node_modules/ws/lib/Receiver.js:451:33)
[INFO] at Receiver.add
(/Users/mmiklavcic/devprojects/metron/metron-interface/metron-config/node_modules/karma/node_modules/ws/lib/Receiver.js:95:24)
[INFO] at Socket.realHandler
(/Users/mmiklavcic/devprojects/metron/metron-interface/metron-config/node_modules/karma/node_modules/ws/lib/WebSocket.js:800:20)
[INFO] at emitOne (events.js:96:13)
[INFO] at Socket.emit (events.js:188:7)
[INFO] at readableAddChunk (_stream_readable.js:172:18)
[INFO] at Socket.Readable.push (_stream_readable.js:130:10)
[INFO] at TCP.onread (net.js:542:20)
   PhantomJS 2.1.1 (Mac OS X 0.0.0) ERROR
[INFO]   Disconnected, because no message in 1 ms.

This thread appears to shed light on the problem and I'm attempting to lock
the remap-istanbul version now.
https://github.com/jhipster/generator-jhipster/issues/7031

I'd like to re-up my strong support to get this working more repeatably
like our Maven dep management. I'll spend a little bit of time today
revisiting what's available to us via npm. I know Yarn (javascript yarn,
not hadoop) has been mentioned before, but I think npm may have some
features now as well.

Cheers,
Mike

On Tue, Jan 23, 2018 at 2:45 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Yeah, this seems to be breaking in every build at this point. I am going
> to look into it tomorrow.
>
> On Mon, Jan 22, 2018 at 8:29 AM, Nick Allen  wrote:
>
>> I had copied the wrong text into the bug.  I fixed that.
>>
>> On Mon, Jan 22, 2018 at 10:22 AM, Casey Stella 
>> wrote:
>>
>> > This could be one of those intermittent test failures related to timing.
>> > Specifically this:
>> >
>> > test(org.apache.metron.rest.controller.SensorIndexingConfigController
>> > IntegrationTest)
>> >  Time elapsed: 0.064 sec  <<< FAILURE!
>> > java.lang.AssertionError: Status expected:<404> but was:<200>
>> > at org.springframework.test.util.AssertionErrors.fail(
>> > AssertionErrors.java:54)
>> > at org.springframework.test.util.AssertionErrors.assertEquals(
>> > AssertionErrors.java:81)
>> > at org.springframework.test.web.servlet.result.
>> > StatusResultMatchers$10.match(StatusResultMatchers.java:664)
>> > at org.springframework.test.web.servlet.MockMvc$1.andExpect(
>> > MockMvc.java:171)
>> > at org.apache.metron.rest.controller.
>> > SensorIndexingConfigControllerIntegrationTest.test(
>> > SensorIndexingConfigControllerIntegrationTest.java:146)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(
>> > NativeMethodAccessorImpl.java:62)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>> > DelegatingMethodAccessorImpl.java:43)
>> > at java.lang.reflect.Method.invoke(Method.java:498)
>> > at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
>> > FrameworkMethod.java:47)
>> > at org.junit.internal.runners.model.ReflectiveCallable.run(
>> > ReflectiveCallable.java:12)
>> > at org.junit.runners.model.FrameworkMethod.invokeExplosively(
>> > FrameworkMethod.java:44)
>> > at org.junit.internal.runners.statements.InvokeMethod.
>> > evaluate(InvokeMethod.java:17)
>> > at org.junit.internal.runners.statements.RunBefores.
>> > evaluate(RunBefores.java:26)
>> > at org.springframework.test.context.junit4.statements.
>> > RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.
>> > java:75)
>> > at org.springframework.test.context.junit4.statements.
>> > RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallb
>> acks.java:86)
>> > at org.springframewor

Re: Master is failed in Travis

2018-01-24 Thread Michael Miklavcic

Fix is out for the UI build error. Please try it out/review -
https://github.com/apache/metron/pull/908

Between these 2 PR's, this should resolve our current build problems. I'm
looking at what I have to do for metron-config licensing updates right now.

   1. https://github.com/apache/metron/pull/906
   2. https://github.com/apache/metron/pull/908

Also, I'm noticing a ton of logging output again. I haven't looked into
this yet - anyone have any ideas around that? Is it checkstyle, or tests
not adhering to our logging settings?

Best,
Mike


On Wed, Jan 24, 2018 at 9:17 AM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Working with Ryan Merriman on fixes for the build. There are 2 issues
> currently. I have a PR out for one of them, which is an intermittent test
> failure around the search config test -
> https://github.com/apache/metron/pull/906. The other appears to be us
> getting bitten once again by
> npm packages having version incompatibilities. We ran "mvn test --projects
> metron-interface/metron-config" locally and the tests bomb out with an
> error in one of the functions:
>
> [INFO] 24 01 2018 08:55:35.441:INFO [PhantomJS 2.1.1 (Mac OS X 0.0.0)]:
> Connected on socket /#LtO6fvDbvgN7JNHX with id 87167843
>PhantomJS 2.1.1 (Mac OS X 0.0.0): Executed 241 of 241 SUCCESS (1
> min 3.946 secs / 1 min 4.309 secs)
> [INFO] Missing error handler on `socket`.
> *[INFO] TypeError: sourceMap.originalPositionFor is not a function*
> [INFO] at getMapping (/Users/mmiklavcic/devprojects/metron/metron-
> interface/metron-config/node_modules/remap-istanbul/lib/remap.js:76:25)
> [INFO] at /Users/mmiklavcic/devprojects/metron/metron-interface/
> metron-config/node_modules/remap-istanbul/lib/remap.js:245:20
> [INFO] at Array.forEach (native)
> [INFO] at /Users/mmiklavcic/devprojects/metron/metron-interface/
> metron-config/node_modules/remap-istanbul/lib/remap.js:243:37
> [INFO] at Array.forEach (native)
> [INFO] at /Users/mmiklavcic/devprojects/metron/metron-interface/
> metron-config/node_modules/remap-istanbul/lib/remap.js:192:22
> [INFO] at Array.forEach (native)
> [INFO] at remap (/Users/mmiklavcic/devprojects/metron/metron-
> interface/metron-config/node_modules/remap-istanbul/lib/remap.js:191:12)
> [INFO] at onRunComplete (/Users/mmiklavcic/devprojects/metron/metron-
> interface/metron-config/node_modules/karma-remap-istanbul/index.js:55:21)
> [INFO] at . (/Users/mmiklavcic/devprojects/metron/metron-
> interface/metron-config/node_modules/karma/lib/events.js:13:22)
> [INFO] at emitTwo (events.js:111:20)
> [INFO] at emit (events.js:191:7)
> [INFO] at emitRunCompleteIfAllBrowsersDone (/Users/mmiklavcic/
> devprojects/metron/metron-interface/metron-config/node_
> modules/karma/lib/server.js:294:12)
> [INFO] at . (/Users/mmiklavcic/devprojects/metron/metron-
> interface/metron-config/node_modules/karma/lib/server.js:325:7)
> [INFO] at emitOne (events.js:96:13)
> [INFO] at emit (events.js:188:7)
> [INFO] at . (/Users/mmiklavcic/devprojects/metron/metron-
> interface/metron-config/node_modules/karma/lib/server.js:308:12)
> [INFO] at emitTwo (events.js:111:20)
> [INFO] at emit (events.js:191:7)
> [INFO] at onComplete (/Users/mmiklavcic/devprojects/metron/metron-
> interface/metron-config/node_modules/karma/lib/browser.js:143:13)
> [INFO] at Socket. (/Users/mmiklavcic/
> devprojects/metron/metron-interface/metron-config/node_
> modules/karma/lib/events.js:13:22)
> [INFO] at emitTwo (events.js:111:20)
> [INFO] at Socket.emit (events.js:191:7)
> [INFO] at Socket.onevent (/Users/mmiklavcic/devprojects/metron/metron-
> interface/metron-config/node_modules/karma/node_modules/soc
> ket.io/lib/socket.js:335:8)
> [INFO] at Socket.onpacket (/Users/mmiklavcic/
> devprojects/metron/metron-interface/metron-config/node_
> modules/karma/node_modules/socket.io/lib/socket.js:295:12)
> [INFO] at Client.ondecoded (/Users/mmiklavcic/
> devprojects/metron/metron-interface/metron-config/node_
> modules/karma/node_modules/socket.io/lib/client.js:193:14)
> [INFO] at Decoder.Emitter.emit (/Users/mmiklavcic/
> devprojects/metron/metron-interface/metron-config/node_
> modules/component-emitter/index.js:134:20)
> [INFO] at Decoder.add (/Users/mmiklavcic/devprojects/metron/metron-
> interface/metron-config/node_modules/karma/node_modules/
> socket.io-parser/index.js:247:12)
> [INFO] at Client.ondata (/Users/mmiklavcic/devprojects/metron/metron-
> interface/metron-config/node_modules/karma/node_modules/soc
> ket.io/lib/client.js:175:18)
> [INFO] at emitOne (events.js:96:13)
> [INFO] at Socket.emit (events.js:188:7)
> [INFO] at Socket.on

Re: Master is failed in Travis

2018-01-24 Thread Michael Miklavcic

Fix worked on my fork's Travis build -
https://travis-ci.org/mmiklavc/metron/builds/332889008?utm_source=email&utm_medium=notification.
Just waiting on the mainline build to run and we should be good to go.

On Wed, Jan 24, 2018 at 10:36 AM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Fix is out for the UI build error. Please try it out/review -
> https://github.com/apache/metron/pull/908
>
> Between these 2 PR's, this should resolve our current build problems. I'm
> looking at what I have to do for metron-config licensing updates right now.
>
>1. https://github.com/apache/metron/pull/906
>2. https://github.com/apache/metron/pull/908
>
> Also, I'm noticing a ton of logging output again. I haven't looked into
> this yet - anyone have any ideas around that? Is it checkstyle, or tests
> not adhering to our logging settings?
>
> Best,
> Mike
>
>
> On Wed, Jan 24, 2018 at 9:17 AM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
>> Working with Ryan Merriman on fixes for the build. There are 2 issues
>> currently. I have a PR out for one of them, which is an intermittent test
>> failure around the search config test -
>> https://github.com/apache/metron/pull/906. The other appears to be us
>> getting bitten once again by
>> npm packages having version incompatibilities. We ran "mvn test --projects
>> metron-interface/metron-config" locally and the tests bomb out with an
>> error in one of the functions:
>>
>> [INFO] 24 01 2018 08:55:35.441:INFO [PhantomJS 2.1.1 (Mac OS X 0.0.0)]:
>> Connected on socket /#LtO6fvDbvgN7JNHX with id 87167843
>>PhantomJS 2.1.1 (Mac OS X 0.0.0): Executed 241 of 241 SUCCESS (1
>> min 3.946 secs / 1 min 4.309 secs)
>> [INFO] Missing error handler on `socket`.
>> *[INFO] TypeError: sourceMap.originalPositionFor is not a function*
>> [INFO] at getMapping (/Users/mmiklavcic/devprojects
>> /metron/metron-interface/metron-config/node_modules/
>> remap-istanbul/lib/remap.js:76:25)
>> [INFO] at /Users/mmiklavcic/devprojects/
>> metron/metron-interface/metron-config/node_modules/remap-
>> istanbul/lib/remap.js:245:20
>> [INFO] at Array.forEach (native)
>> [INFO] at /Users/mmiklavcic/devprojects/
>> metron/metron-interface/metron-config/node_modules/remap-
>> istanbul/lib/remap.js:243:37
>> [INFO] at Array.forEach (native)
>> [INFO] at /Users/mmiklavcic/devprojects/
>> metron/metron-interface/metron-config/node_modules/remap-
>> istanbul/lib/remap.js:192:22
>> [INFO] at Array.forEach (native)
>> [INFO] at remap (/Users/mmiklavcic/devprojects
>> /metron/metron-interface/metron-config/node_modules/
>> remap-istanbul/lib/remap.js:191:12)
>> [INFO] at onRunComplete (/Users/mmiklavcic/devprojects
>> /metron/metron-interface/metron-config/node_modules/
>> karma-remap-istanbul/index.js:55:21)
>> [INFO] at . (/Users/mmiklavcic/devprojects
>> /metron/metron-interface/metron-config/node_modules/
>> karma/lib/events.js:13:22)
>> [INFO] at emitTwo (events.js:111:20)
>> [INFO] at emit (events.js:191:7)
>> [INFO] at emitRunCompleteIfAllBrowsersDone
>> (/Users/mmiklavcic/devprojects/metron/metron-interface/
>> metron-config/node_modules/karma/lib/server.js:294:12)
>> [INFO] at . (/Users/mmiklavcic/devprojects
>> /metron/metron-interface/metron-config/node_modules/
>> karma/lib/server.js:325:7)
>> [INFO] at emitOne (events.js:96:13)
>> [INFO] at emit (events.js:188:7)
>> [INFO] at . (/Users/mmiklavcic/devprojects
>> /metron/metron-interface/metron-config/node_modules/
>> karma/lib/server.js:308:12)
>> [INFO] at emitTwo (events.js:111:20)
>> [INFO] at emit (events.js:191:7)
>> [INFO] at onComplete (/Users/mmiklavcic/devprojects
>> /metron/metron-interface/metron-config/node_modules/
>> karma/lib/browser.js:143:13)
>> [INFO] at Socket. (/Users/mmiklavcic/devprojects
>> /metron/metron-interface/metron-config/node_modules/
>> karma/lib/events.js:13:22)
>> [INFO] at emitTwo (events.js:111:20)
>> [INFO] at Socket.emit (events.js:191:7)
>> [INFO] at Socket.onevent (/Users/mmiklavcic/devprojects
>> /metron/metron-interface/metron-config/node_modules/karma/node_modules/
>> socket.io/lib/socket.js:335:8)
>> [INFO] at Socket.onpacket (/Users/mmiklavcic/devprojects
>> /metron/metron-interface/metron-config/node_modules/karma/node_modules/
>> socket.io/lib/socket.js:295:12)
>> [INFO] at Client.ondecoded (/Users/mmiklavcic/devprojects
>> /metron/metron-inter

Re: Master is failed in Travis

2018-01-24 Thread Michael Miklavcic

The first build fix has been merged into master. Please pull latest master
and merge into your branches to kick off a Travis build again. The search
config test fix is pending a successful Travis build and will be merged in
imminently as well.

Mike

On Wed, Jan 24, 2018 at 11:43 AM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Fix worked on my fork's Travis build -
> https://travis-ci.org/mmiklavc/metron/builds/332889008?utm_source=email&utm_medium=notification.
> Just waiting on the mainline build to run and we should be good to go.
>
> On Wed, Jan 24, 2018 at 10:36 AM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
>> Fix is out for the UI build error. Please try it out/review -
>> https://github.com/apache/metron/pull/908
>>
>> Between these 2 PR's, this should resolve our current build problems. I'm
>> looking at what I have to do for metron-config licensing updates right now.
>>
>>1. https://github.com/apache/metron/pull/906
>>2. https://github.com/apache/metron/pull/908
>>
>> Also, I'm noticing a ton of logging output again. I haven't looked into
>> this yet - anyone have any ideas around that? Is it checkstyle, or tests
>> not adhering to our logging settings?
>>
>> Best,
>> Mike
>>
>>
>> On Wed, Jan 24, 2018 at 9:17 AM, Michael Miklavcic <
>> michael.miklav...@gmail.com> wrote:
>>
>>> Working with Ryan Merriman on fixes for the build. There are 2 issues
>>> currently. I have a PR out for one of them, which is an intermittent test
>>> failure around the search config test -
>>> https://github.com/apache/metron/pull/906. The other appears to be us
>>> getting bitten once again by
>>> npm packages having version incompatibilities. We ran "mvn test --projects
>>> metron-interface/metron-config" locally and the tests bomb out with an
>>> error in one of the functions:
>>>
>>> [INFO] 24 01 2018 08:55:35.441:INFO [PhantomJS 2.1.1 (Mac OS X 0.0.0)]:
>>> Connected on socket /#LtO6fvDbvgN7JNHX with id 87167843
>>>PhantomJS 2.1.1 (Mac OS X 0.0.0): Executed 241 of 241 SUCCESS (1
>>> min 3.946 secs / 1 min 4.309 secs)
>>> [INFO] Missing error handler on `socket`.
>>> *[INFO] TypeError: sourceMap.originalPositionFor is not a function*
>>> [INFO] at getMapping (/Users/mmiklavcic/devprojects
>>> /metron/metron-interface/metron-config/node_modules/remap-
>>> istanbul/lib/remap.js:76:25)
>>> [INFO] at /Users/mmiklavcic/devprojects/
>>> metron/metron-interface/metron-config/node_modules/remap-ist
>>> anbul/lib/remap.js:245:20
>>> [INFO] at Array.forEach (native)
>>> [INFO] at /Users/mmiklavcic/devprojects/
>>> metron/metron-interface/metron-config/node_modules/remap-ist
>>> anbul/lib/remap.js:243:37
>>> [INFO] at Array.forEach (native)
>>> [INFO] at /Users/mmiklavcic/devprojects/
>>> metron/metron-interface/metron-config/node_modules/remap-ist
>>> anbul/lib/remap.js:192:22
>>> [INFO] at Array.forEach (native)
>>> [INFO] at remap (/Users/mmiklavcic/devprojects
>>> /metron/metron-interface/metron-config/node_modules/remap-
>>> istanbul/lib/remap.js:191:12)
>>> [INFO] at onRunComplete (/Users/mmiklavcic/devprojects
>>> /metron/metron-interface/metron-config/node_modules/karma-
>>> remap-istanbul/index.js:55:21)
>>> [INFO] at . (/Users/mmiklavcic/devprojects
>>> /metron/metron-interface/metron-config/node_modules/karma/
>>> lib/events.js:13:22)
>>> [INFO] at emitTwo (events.js:111:20)
>>> [INFO] at emit (events.js:191:7)
>>> [INFO] at emitRunCompleteIfAllBrowsersDone
>>> (/Users/mmiklavcic/devprojects/metron/metron-interface/metro
>>> n-config/node_modules/karma/lib/server.js:294:12)
>>> [INFO] at . (/Users/mmiklavcic/devprojects
>>> /metron/metron-interface/metron-config/node_modules/karma/
>>> lib/server.js:325:7)
>>> [INFO] at emitOne (events.js:96:13)
>>> [INFO] at emit (events.js:188:7)
>>> [INFO] at . (/Users/mmiklavcic/devprojects
>>> /metron/metron-interface/metron-config/node_modules/karma/
>>> lib/server.js:308:12)
>>> [INFO] at emitTwo (events.js:111:20)
>>> [INFO] at emit (events.js:191:7)
>>> [INFO] at onComplete (/Users/mmiklavcic/devprojects
>>> /metron/metron-interface/metron-config/node_modules/karma/
>>> lib/browser.js:143:13)
>>> [INFO] at Socket. (/Users/mmiklavcic/devproject

[DISCUSS] Update Metron Elasticsearch index names to metron_

2018-01-24 Thread Michael Miklavcic

With the completion of https://github.com/apache/metron/pull/840
(METRON-939: Upgrade ElasticSearch and Kibana), we have the makings for a
major release rev of Metron in the upcoming release (currently slotted to
0.4.3, I believe). Since there are non-backwards compatible changes
pertaining to ES indexing, it seems like a good opportunity to revisit our
index naming standards.

I propose we add a simple prefix "metron_" to all Metron indexes. There are
numerous reasons for doing so:

   - removes the likelihood of index name collisions when we perform
   operations on index wildcard names, e.g. "enrichment_*", "indexing_*", etc.
   - i.e., this allows us to be more friendly in a multi-tenant ES
   environment for relatively low engineering cost.
   - simplifies the Kibana dashboard a bit. We currently need to create a
   special index pattern in order to accommodate multi-index pattern matching
   across all metron-specific indexes. Using metron_* would be much simpler
   and less prone to error.
   - easier for customers to debug and identify Metron-specific indexes and
   associated data
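
To make the collision point concrete, here is a minimal sketch in Python.
The index names are made up for illustration, and fnmatch-style globbing is
only an approximation of how ES expands index wildcards, so treat it as a
picture of the idea rather than of actual ES behavior.

```python
from fnmatch import fnmatchcase

# Hypothetical index names in a shared (multi-tenant) ES cluster.
INDICES = [
    "bro_index_2018.01.24",         # Metron index, current naming (assumed)
    "snort_index_2018.01.24",       # Metron index, current naming (assumed)
    "enrichment_cache_2018.01",     # owned by some other tenant (made up)
    "metron_bro_index_2018.01.24",  # Metron index, proposed naming
]


def matching(pattern, names):
    """Return the names selected by a glob-style index pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# A broad pattern can pick up an index that is not ours...
print(matching("enrichment_*", INDICES))
# ...while the prefixed pattern only selects Metron-owned indexes.
print(matching("metron_*", INDICES))
```

With a single prefix, one pattern covers every Metron index and nothing
else, which is the property the bullets above rely on.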


The reason for making these changes now is that we already have breaking
changes with ES. Leveraging existing indexed data rather than deleting
indexes and starting from scratch already requires a re-indexing/migration
step, so there is no additional effort on the part of users if they choose
to attempt a migration. It further makes sense with our current work
towards upgrading Solr.

We already have a battery of integration and manual tests after the ES
upgrade work that can be leveraged to validate the changes.

Mike Miklavcic

Re: Dependency Checks

2018-01-24 Thread Michael Miklavcic

A big +1 to this. Good suggestion.
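
For what it's worth, the pass/fail behavior Nick describes below could look
roughly like this. It is a sketch in Python rather than a change to
`platform-info.sh`, and the tool list is an assumption based on what the
script output further down reports, not a confirmed list of requirements.

```python
import shutil

# Assumed tool list, based on what the platform-info output reports.
REQUIRED_TOOLS = ["ansible", "vagrant", "python", "mvn", "docker", "node", "npm"]


def missing_tools(required, which=shutil.which):
    """Return the required commands that cannot be found on the PATH."""
    return [tool for tool in required if which(tool) is None]


def main():
    """Print an unambiguous PASS/FAIL line and return a shell exit code."""
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("FAIL: missing dependencies: " + ", ".join(missing))
        return 1
    print("PASS: all required tools were found")
    return 0
```

Calling `main()` at the start of a deployment and exiting with its return
value would give the fail-fast behavior. Version checks are left out here.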

On Jan 24, 2018 2:44 PM, "Nick Allen"  wrote:

> We should re-jigger `platform-info.sh` (or create a new tool) that very
> obviously passes or fails based on what it discovers in the user's
> environment.  Right now, a user just runs the `platform-info.sh` and it is
> not apparent to them what the problem is.
>
> The script could be manually executed by a user.  This could also be called
> at the start of a deployment so that it fails fast if the user is missing
> dependencies.
>
> I wish there was a better way to handle this.
>
>
> On Wed, Jan 24, 2018 at 1:05 PM Sujay Jaladi  wrote:
>
> > Thanks Otto. Please find the output below
> >
> > scripts sujay$ ./platform-info.sh
> >
> > Metron 0.4.2
> >
> > --
> >
> > --
> >
> > fatal: your current branch 'master' does not have any commits yet
> >
> > --
> >
> > --
> >
> > ansible 2.2.2.0
> >
> >   config file =
> >
> >   configured module search path = Default w/o overrides
> >
> > --
> >
> > Vagrant 2.0.1
> >
> > --
> >
> > Python 2.7.10
> >
> > --
> >
> > Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
> > 2015-11-10T08:41:47-08:00)
> >
> > Maven home: /usr/local/Cellar/maven@3.3/3.3.9/libexec
> >
> > Java version: 1.8.0_131, vendor: Oracle Corporation
> >
> > Java home:
> > /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre
> >
> > Default locale: en_US, platform encoding: UTF-8
> >
> > OS name: "mac os x", version: "10.12.6", arch: "x86_64", family: "mac"
> >
> > --
> >
> > ./platform-info.sh: line 64: docker: command not found
> >
> > --
> >
> > node
> >
> > v9.4.0
> >
> > --
> >
> > npm
> >
> > 5.6.0
> >
> > --
> >
> > Configured with: --prefix=/Library/Developer/CommandLineTools/usr
> > --with-gxx-include-dir=/usr/include/c++/4.2.1
> >
> > Apple LLVM version 9.0.0 (clang-900.0.38)
> >
> > Target: x86_64-apple-darwin16.7.0
> >
> > Thread model: posix
> >
> > InstalledDir: /Library/Developer/CommandLineTools/usr/bin
> >
> > --
> >
> > Compiler is C++11 compliant
> >
> > --
> >
> > Darwin sujay-lm 16.7.0 Darwin Kernel Version 16.7.0: Wed Oct  4 00:17:00
> > PDT 2017; root:xnu-3789.71.6~1/RELEASE_X86_64 x86_64
> >
> > --
> >
> > Total System Memory = 16384 MB
> >
> > Processor Model: Intel(R) Core(TM) i7-6567U CPU
> >
> > Processor Speed: 3.30GHz
> >
> > Total Physical Processors: 2
> >
> > Total cores: 2
> >
> > Disk information:
> >
> > /dev/disk1 233Gi 83Gi 149Gi 36% 1222172 4293745107 0% /
> >
> > This CPU appears to support virtualization
> >
> > On Wed, Jan 24, 2018 at 4:53 AM, Otto Fowler 
> > wrote:
> >
> >> Can you run metron-deployment/scripts/platform_info.sh and send the
> >> output?
> >>
> >>
> >> On January 23, 2018 at 21:43:34, Sujay Jaladi (jsu...@gmail.com) wrote:
> >>
> >> Hello,
> >>
> > >> Every time I attempt to deploy Apache Metron on AWS, I get the following
> > >> error. All the servers are up and running, except Metron and its
> > >> components are not installed. Please help.
> >>
> >> fatal: [ec2-52-10-94-22.us-west-2.compute.amazonaws.com -> localhost]:
> >> FAILED! => {"changed": true, "cmd": "cd
> >> /Users/sujay/Downloads/apache-metron-0.4.2-rc2/metron-
> deployment/amazon-ec2/../playbooks/../..
> >> && mvn clean package -DskipTests -T 2C -P HDP-2.5.0.0,mpack", "delta":
> >> "0:00:04.845260", "end": "2018-01-23 18:28:27.608265", "failed": true,
> >> "rc": 1, "start": "2018-01-23 18:28:22.763005", "stderr": "", "stdout":
> >> "[INFO] Scanning for projects...\n[INFO]
> >> 
> \n[INFO]
> >> Reactor Build Order:\n[INFO] \n[INFO] Metron\n[INFO]
> metron-stellar\n[INFO]
> >> stellar-common\n[INFO] metron-analytics\n[INFO]
> metron-maas-common\n[INFO]
> >> metron-platform\n[INFO] metron-zookeeper\n[INFO]
> >> metron-test-utilities\n[INFO] metron-integration-test\n[INFO]
> >> metron-maas-service\n[INFO] metron-common\n[INFO]
> metron-statistics\n[INFO]
> >> metron-writer\n[INFO] metron-storm-kafka-override\n[INFO]
> >> metron-storm-kafka\n[INFO] metron-hbase\n[INFO]
> >> metron-profiler-common\n[INFO] metron-profiler-client\n[INFO]
> >> metron-profiler\n[INFO] metron-hbase-client\n[INFO]
> >> metron-enrichment\n[INFO] metron-indexing\n[INFO] metron-solr\n[INFO]
> >> metron-pcap\n[INFO] metron-parsers\n[INFO] metron-pcap-backend\n[INFO]
> >> metron-data-management\n[INFO] metron-api\n[INFO]
> metron-management\n[INFO]
> >> elasticsearch-shaded\n[INFO] metron-elasticsearch\n[INFO]
> >> metron-deployment\n[INFO] Metron Ambari Management Pack\n[INFO]
> >> metron-contrib\n[INFO] metron-docker\n[INFO] metron-interface\n[INFO]
> >> metron-config\n[INFO] metron-alerts\n[INFO] metron-rest-client\n[INFO]
> >> metron-rest\n[INFO] site-book\n[INFO] 3rd party Functions (just for
> >> tests)\n[INFO] \n[INFO] Using the MultiThreadedBuilder implementation
> with
> >> a thread count of 8\n[INFO]
> >> \n[INFO]
> >> 
>

Re: [DISCUSS] Update Metron Elasticsearch index names to metron_

2018-01-24 Thread Michael Miklavcic

I hear you Ali. I think this type of change would actually ease issues with
downtime because it offers an easy path to migrating existing indices. I'd
have to review the specifics in the ES docs again, but I believe you could
duplicate the old indexes and migrate them to "metron_" in advance of the
upgrade, and then consume new data to the new index pattern/name after the
upgrade. That should be pretty seamless, I think. I guess it depends on how
you're using ES.
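
As a rough sketch of the copy step, assuming the ES `_reindex` API is what
we would use, the request bodies could be generated like this. The old index
names are examples only, and nothing here talks to a cluster.

```python
import json

PREFIX = "metron_"


def reindex_body(old_index, prefix=PREFIX):
    """Build a _reindex request body that copies old_index to prefix + old_index."""
    if old_index.startswith(prefix):
        raise ValueError("index already carries the prefix: " + old_index)
    return {
        "source": {"index": old_index},
        "dest": {"index": prefix + old_index},
    }


# Example index names (assumed); print the requests for review, do not send them.
for name in ["bro_index_2018.01.24", "snort_index_2018.01.24"]:
    print("POST _reindex")
    print(json.dumps(reindex_body(name), indent=2))
```

Running the copies ahead of the upgrade and only switching writers to the
new names afterwards is the part that would need checking against the docs.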

On Wed, Jan 24, 2018 at 4:08 PM, Ali Nazemian  wrote:

> Hi All,
>
> I just wanted to say it would be great if we can be careful with these type
> of changes. From the development point of view, it is just a few lines of
> code which can provide multiple advantages, but for live large-scale Metron
> platforms, some of these changes might be really expensive to address with
> zero-downtime.
>
> Cheers,
> Ali
>
> On Thu, Jan 25, 2018 at 9:29 AM, Otto Fowler 
> wrote:
>
> > +1
> >
> >
> > On January 24, 2018 at 16:28:42, Nick Allen (n...@nickallen.org) wrote:
> >
> > +1 to a standard prefix for all Metron indices. I've had the same thought
> > myself and you laid out the advantages well.
> >
> >
> >
> >
> >
> > On Wed, Jan 24, 2018 at 3:47 PM zeo...@gmail.com 
> wrote:
> >
> > > I agree with having a metron_ prefix for ES indexes, and the timing.
> > >
> > > Jon
> > >
> > > On Wed, Jan 24, 2018 at 3:20 PM Michael Miklavcic <
> > > michael.miklav...@gmail.com> wrote:
> > >
> > > > With the completion of https://github.com/apache/metron/pull/840
> > > > (METRON-939: Upgrade ElasticSearch and Kibana), we have the makings
> for
> > a
> > > > major release rev of Metron in the upcoming release (currently
> slotted
> > to
> > > > 0.4.3, I believe). Since there are non-backwards compatible changes
> > > > pertaining to ES indexing, it seems like a good opportunity to
> revisit
> > > our
> > > > index naming standards.
> > > >
> > > > I propose we add a simple prefix "metron_" to all Metron indexes.
> There
> > > are
> > > > numerous reasons for doing so
> > > >
> > > > - removes the likelihood of index name collisions when we perform
> > > > operations on index wildcard names, e.g. "enrichment_*, indexing_*,
> > > > etc.".
> > > > - ie, this allows us to be more friendly in a multi-tenant ES
> > > > environment for relatively low engineering cost.
> > > > - simplifies the Kibana dashboard a bit. We currently needed to
> > > create a
> > > > special index pattern in order to accommodate multi-index pattern
> > > > matching
> > > > across all metron-specific indexes. Using metron_* would be much
> > > simpler
> > > > and less prone to error.
> > > > - easier for customers to debug and identify Metron-specific indexes
> > > and
> > > > associated data
> > > >
> > > >
> > > > The reason for making these changes now is that we already have
> > breaking
> > > > changes with ES. Leveraging existing indexed data rather than
> deleting
> > > > > indexes and starting from scratch already requires a
> > > re-indexing/migration
> > > > step, so there is no additional effort on the part of users if they
> > > choose
> > > > to attempt a migration. It further makes sense with our current work
> > > > towards upgrading Solr.
> > > >
> > > > We already have a battery of integration and manual tests after the
> ES
> > > > upgrade work that can be leveraged to validate the changes.
> > > >
> > > > Mike Miklavcic
> > > >
> > >
> > >
> > > --
> > >
> > > Jon
> > >
> >
>
>
>
> --
> A.Nazemian
>

Re: [DISCUSS] Update Metron Elasticsearch index names to metron_

2018-01-24 Thread Michael Miklavcic

One other benefit of this revised approach - we can more effectively use
index template patterns to specify our base set of Metron property types.
Call me crazy, but I think we should be able to do something like:



{
  "template": "metron_*",
  "mappings": {
    "metron_doc": {
      "dynamic_templates": [
        {
          "geo_location_point": {
            "match": "enrichments:geo:*:location_point",
            "match_mapping_type": "*",
            "mapping": {
              "type": "geo_point"
            }
          }
        },
        {
          "geo_country": {
            "match": "enrichments:geo:*:country",
            "match_mapping_type": "*",
            "mapping": {
              "type": "keyword"
            }
          }
        },
        {
          "geo_city": {
            "match": "enrichments:geo:*:city",
            "match_mapping_type": "*",
            "mapping": {
              "type": "keyword"
            }
          }
        },
        {
          "geo_location_id": {
            "match": "enrichments:geo:*:locID",
            "match_mapping_type": "*",
            "mapping": {
              "type": "keyword"
            }
          }
        },
        {
          "geo_dma_code": {
            "match": "enrichments:geo:*:dmaCode",
            "match_mapping_type": "*",
            "mapping": {
              "type": "keyword"
            }
          }
        },
        {
          "geo_postal_code": {
            "match": "enrichments:geo:*:postalCode",
            "match_mapping_type": "*",
            "mapping": {
              "type": "keyword"
            }
          }
        },
        {
          "geo_latitude": {
            "match": "enrichments:geo:*:latitude",
            "match_mapping_type": "*",
            "mapping": {
              "type": "float"
            }
          }
        },
        {
          "geo_longitude": {
            "match": "enrichments:geo:*:longitude",
            "match_mapping_type": "*",
            "mapping": {
              "type": "float"
            }
          }
        },
        {
          "timestamps": {
            "match": "*:ts",
            "match_mapping_type": "*",
            "mapping": {
              "type": "date",
              "format": "epoch_millis"
            }
          }
        },
        {
          "threat_triage_score": {
            "mapping": {
              "type": "float"
            },
            "match": "threat:triage:*score",
            "match_mapping_type": "*"
          }
        },
        {
          "threat_triage_reason": {
            "mapping": {
              "type": "text",
              "fielddata": "true"
            },
            "match": "threat:triage:rules:*:reason",
            "match_mapping_type": "*"
          }
        },
        {
          "threat_triage_name": {
            "mapping": {
              "type": "text",
              "fielddata": "true"
            },
            "match": "threat:triage:rules:*:name",
            "match_mapping_type": "*"
          }
        }
      ]
    }
  }
}

That means that for every new sensor we bring on board we can skip
adding that boilerplate mapping config to every new template.
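
To show how a few of these patterns would pick a type for a new sensor's
fields, here is a small Python simulation. It assumes the first matching
dynamic template wins and uses fnmatch-style globbing as a stand-in for the
ES `match` pattern; the field names in the example are hypothetical.

```python
from fnmatch import fnmatchcase

# (template name, match pattern, mapped type), abbreviated from the template above.
DYNAMIC_TEMPLATES = [
    ("geo_location_point", "enrichments:geo:*:location_point", "geo_point"),
    ("geo_country", "enrichments:geo:*:country", "keyword"),
    ("geo_latitude", "enrichments:geo:*:latitude", "float"),
    ("timestamps", "*:ts", "date"),
    ("threat_triage_score", "threat:triage:*score", "float"),
]


def resolve_type(field_name, templates=DYNAMIC_TEMPLATES):
    """Return the type of the first template whose pattern matches, else None."""
    for _name, pattern, es_type in templates:
        if fnmatchcase(field_name, pattern):
            return es_type
    return None  # no base template applies; a per-sensor template would decide


# Hypothetical field names from a newly onboarded sensor.
for field in [
    "enrichments:geo:ip_dst_addr:country",
    "adapter:geoadapter:begin:ts",
    "threat:triage:score",
    "ip_src_addr",
]:
    print(field, "->", resolve_type(field))
```

Anything that falls through to `None` here is what a per-sensor template
would still need to map explicitly.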



On Wed, Jan 24, 2018 at 6:34 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I hear you Ali. I think this type of change would actually ease issues
> with downtime because it offers an easy path to migrating existing indices.
> I'd have to review the specifics in the ES docs again, but I believe you
> could duplicate the old indexes and migrate them to "metron_" in advance of
> the upgrade, and then consume new data to the new index pattern/name after
> the upgrade. That should be pretty seamless, I think. I guess it depends on
> how you're using ES.
>
> On Wed, Jan 24, 2018 at 4:08 PM, Ali Nazemian 
> wrote:
>
>> Hi All,
>>
>> I just wanted to say it would be great if we can be careful with these
>> type
>> of changes. From the development point of view, it is just a few lines of
>> code which can provide

Re: [DISCUSS] Update Metron Elasticsearch index names to metron_

2018-01-26 Thread Michael Miklavcic

Just checked on the length issue - we should be good -
https://github.com/elastic/elasticsearch/issues/8079
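
A quick way to sanity-check a given name, assuming the 255-byte limit that
the ES documentation gives for index names (the limit value and the example
names are my assumptions, not something taken from the issue itself):

```python
MAX_INDEX_NAME_BYTES = 255  # assumption: limit documented by Elasticsearch
PREFIX = "metron_"


def prefixed_length(name, prefix=PREFIX):
    """Length in bytes of the index name once the prefix is added."""
    return len((prefix + name).encode("utf-8"))


def fits(name, prefix=PREFIX):
    """True if the prefixed index name stays within the assumed limit."""
    return prefixed_length(name, prefix) <= MAX_INDEX_NAME_BYTES


# A typical sensor index name (made up) leaves plenty of headroom.
print(prefixed_length("bro_index_2018.01.24.14"), fits("bro_index_2018.01.24.14"))
```

The prefix only adds 7 bytes, so names of the usual sensor-plus-date shape
are nowhere near the limit.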

On Fri, Jan 26, 2018 at 3:37 PM, James Sirota  wrote:

> Seems reasonable to me.  The only thing is that it may make the index
> names too long. Not sure if that matters to ES or not
>
> 26.01.2018, 15:32, "Simon Elliston Ball" :
> > +1 on this. The idea of a default broad matching template should also
> include an order entry to avoid conflicts with more specific templates, and
> we should then document the need for a higher order value in all per-source
> index templates.
> >
> > In terms of production migration, I think we may want to provide some
> detailed documentation in the upgrade guide on this, because there will be
> people with a lot of existing indices that will be difficult to handle. We
> may also need some tooling, but I expect docs would do the job. What do
> people think about migration?
> >
> > Simon
> >
> >>  One other benefit of this revised approach - we can more effectively
> use
> >>  index template patterns to specify our base set of Metron property
> types.
> >>  Call me crazy, but I think we should be able to do something like:
> >>
> >>  
> >>
> >>  {
> >>   *"template": "metron_*",*
> >>   "mappings": {
> >> "metron_doc": {
> >>   "dynamic_templates": [
> >>   {
> >> "geo_location_point": {
> >>   "match": "enrichments:geo:*:location_point",
> >>   "match_mapping_type": "*",
> >>   "mapping": {
> >> "type": "geo_point"
> >>   }
> >> }
> >>   },
> >>   {
> >> "geo_country": {
> >>   "match": "enrichments:geo:*:country",
> >>   "match_mapping_type": "*",
> >>   "mapping": {
> >> "type": "keyword"
> >>   }
> >> }
> >>   },
> >>   {
> >> "geo_city": {
> >>   "match": "enrichments:geo:*:city",
> >>   "match_mapping_type": "*",
> >>   "mapping": {
> >> "type": "keyword"
> >>   }
> >> }
> >>   },
> >>   {
> >> "geo_location_id": {
> >>   "match": "enrichments:geo:*:locID",
> >>   "match_mapping_type": "*",
> >>   "mapping": {
> >> "type": "keyword"
> >>   }
> >> }
> >>   },
> >>   {
> >> "geo_dma_code": {
> >>   "match": "enrichments:geo:*:dmaCode",
> >>   "match_mapping_type": "*",
> >>   "mapping": {
> >> "type": "keyword"
> >>   }
> >> }
> >>   },
> >>   {
> >> "geo_postal_code": {
> >>   "match": "enrichments:geo:*:postalCode",
> >>   "match_mapping_type": "*",
> >>   "mapping": {
> >> "type": "keyword"
> >>   }
> >> }
> >>   },
> >>   {
> >> "geo_latitude": {
> >>   "match": "enrichments:geo:*:latitude",
> >>   "match_mapping_type": "*",
> >>   "mapping": {
> >> "type": "float"
> >>   }
> >> }
> >>   },
> >>   {
> >> "geo_longitude": {
> >>   "match": "enrichments:geo:*:longitude",
> >>   "match_mapping_type": "*",
> >>   "mapping": {
> >> "type": "float"
> >>   }
> >> }
> >>   },
> >>   {
> >> "timestamps": {
> >>   "match": "*:ts",
> >>   "

Re: [DISCUSS] Move SHELL type functions from management to stellar common

2018-01-31 Thread Michael Miklavcic

Agreed

On Jan 31, 2018 7:51 AM, "Justin Leet"  wrote:

> Agreed, I think it makes sense to move them there.
>
> On Wed, Jan 31, 2018 at 9:28 AM, Casey Stella  wrote:
>
> > I'd be in favor of that.  That is general purpose stuff.
> >
> > On Wed, Jan 31, 2018 at 9:12 AM, Otto Fowler 
> > wrote:
> >
> > > Per:  https://issues.apache.org/jira/browse/METRON-876
> > >
> > > I think we should move the shell/console type functions from stellar
> > > management to stellar-common, and guard them with CONSOLE capability.
> > > Thoughts?
> > >
> > > ottO
> > >
> >
>

Re: [DISCUSS] Persistence store for user profile settings

2018-02-01 Thread Michael Miklavcic

Personally, I'd be in favor of something like MariaDB as an open source
repo, or any other ANSI SQL store. On the positive side, it should mesh
seamlessly with ORM tools. And the schema for this should be pretty
vanilla, I'd imagine. I might even consider skipping ORM for straight JDBC
and simple command scripts in Java for something this small. I'm not
worried so much about migrations of this sort. Large scale DBs can get
involved with major schema changes, but that's usually when the datastore is
a massive set of tables with complex relationships, at least in my
experience.

We could also use HBase, which probably wouldn't be that hard either, but
there may be more boilerplate to write for the client as compared to
standard SQL. But I'm assuming we could reuse a fair amount of existing
code from our enrichments. One additional reason in favor of HBase might be
data replication. For a SQL instance we'd probably recommend a RAID store
or backup procedure, but we get that pretty easily with HBase too.
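
To give a feel for how small the "plain SQL, no ORM" version could be, here
is a sketch using Python's built-in sqlite3 as a stand-in for JDBC against
MariaDB. The table name, column names, and setting key are all hypothetical,
and `INSERT OR REPLACE` is SQLite syntax rather than ANSI SQL, so a real
store would need its own upsert.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the settings store and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_settings ("
        " username TEXT NOT NULL,"
        " setting  TEXT NOT NULL,"
        " value    TEXT NOT NULL,"
        " PRIMARY KEY (username, setting))"
    )
    return conn


def save_setting(conn, username, setting, value):
    """Insert or overwrite one setting for one user (SQLite-specific upsert)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO user_settings (username, setting, value)"
            " VALUES (?, ?, ?)",
            (username, setting, value),
        )


def load_setting(conn, username, setting, default=None):
    """Return the stored value, or default if the user never saved one."""
    row = conn.execute(
        "SELECT value FROM user_settings WHERE username = ? AND setting = ?",
        (username, setting),
    ).fetchone()
    return row[0] if row else default
```

One narrow table keyed on user and setting name is about all the facet
field use case would need, which is why I'm not too worried about schema
migrations here.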

On Feb 1, 2018 2:45 PM, "Casey Stella"  wrote:

> So, I'll answer your question with some questions:
>
>- No matter the data store we use upgrading will take some care, right?
>- Do we currently depend on a RDBMS anywhere?  I want to say that we do
>in the REST layer already, right?
>- If we don't use a RDBMs, what's the other option?  What are the pros
>and cons?
>- Have we considered non-server offline persistent solutions (e.g.
>https://www.html5rocks.com/en/features/storage)?
>
>
>
> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman  wrote:
>
> > There is currently a PR up for review that allows a user to configure and
> > save the list of facet fields that appear in the left column of the
> Alerts
> > UI:  https://github.com/apache/metron/pull/853.  The REST layer has ORM
> > support which means we can store those in a relational database.
> >
> > However I'm not 100% sure this is the best place to keep this.  As we add
> > more use cases like this the backing tables in the RDBMS will need to be
> > managed.  This could make upgrading more tedious and error-prone.  Is
> there
> > are a better way to store this, assuming we can leverage a component
> that's
> > already included in our stack?
> >
> > Ryan
> >
>

Re: [DISCUSS] The requested URL returned error: 404 Not Found

2018-02-05 Thread Michael Miklavcic

I had this problem on a machine running Vagrant 1.8.1. I thought it was
only Ubuntu at first, so I removed some additional boxes and found it to be
a problem for all of them. I didn't see any relevant articles other than
some old stuff talking about a bundled curl command being a problem, but
that didn't help anything. Not surprising since the URL being attempted
with curl didn't work via browser either. 404, as Nick mentioned.

{17:08}~ ➭ vagrant box add hashicorp/precise64
The box 'hashicorp/precise64' could not be found or
could not be accessed in the remote catalog. If this is a private
box on HashiCorp's Atlas, please verify you're logged in via
`vagrant login`. Also, please double-check the name. The expanded
URL and error message are shown below:

URL: ["https://atlas.hashicorp.com/hashicorp/precise64"]
Error: The requested URL returned error: 404 Not Found

My last ditch attempt here was to upgrade Vagrant to 2.0.2. That seems to
have fixed it for me, but per their warning message I'm not sure if this
new location is permanent or if OSS Hashicorp Vagrant consumers need to
find an alternative. The new location appears to be https://vagrantcloud.com.

Mike

On Mon, Feb 5, 2018 at 6:45 AM, Nick Allen  wrote:

> When launching either of the development environments, you are probably
> seeing a warning message like this.
>
> Bringing machine 'node1' up with 'virtualbox' provider...
> > ...
> > ==> node1: There was a problem while downloading the metadata for your
> box
> > ==> node1: to check for updates. This is not an error, since it is
> usually
> > due
> > ==> node1: to temporary network problems. This is just a warning. The
> > problem
> > ==> node1: encountered was:
> > ==> node1:
> > ==> node1: The requested URL returned error: 404 Not Found
> > ==> node1:
> > ==> node1: If you want to check for box updates, verify your network
> > connection
> > ==> node1: is valid and try again.
>
>
>
> I believe the problem is that Hashicorp has chosen to no longer host the
> base images that we rely on (outside of paying enterprise customers.)
>
> The Packer, Artifact Registry and Terraform Enterprise (Legacy) features of
> > Atlas will no longer be actively developed or maintained and will be
> fully
> > decommissioned on Friday, March 30, 2018. Please see our guide on
> building
> > immutable infrastructure with Packer on CI/CD for ideas on implementing
> > Packer and Artifact Registry features yourself and the Upgrading From
> > Terraform Enterprise (Legacy) guide to migrate to the new Terraform
> > Enterprise.
>
>
>
> If you already have the base images downloaded, do not delete them!  The
> development environments will continue to work as long as you have those
> images.
>
> The problem is that new users will no longer be able to download these
> images.  And ultimately I am suspect that future updates will occur on
> these base images.
>
> Is everyone experiencing this problem?  Does anyone have any other details
> to share on this?
>
> I am concerned that this is going to force us into some significant work in
> the short-term to ensure that our development environments continue to
> work.  I have some alternative options to discuss, but I want to make sure
> we have all the information on what's happening before discussing those.
>

Re: [DISCUSS] The requested URL returned error: 404 Not Found

2018-02-05 Thread Michael Miklavcic

Hm, now that is interesting. I'm not clear how they manage their URLs, but
it seems like they may have missed a hardcoded value used by their update
functionality.

On Feb 5, 2018 3:01 PM, "Nick Allen"  wrote:

> I was experiencing this issue with both Ansible 1.8.1 and 2.0.2.
>
> I just found that now it seems to only spit out that warning when checking
> for updates; not when downloading new images.  If I remove the older image
> to force it to re-download, it does seem to work.  But after re-downloading
> the image, it gets an error checking to see if the image is up-to-date.
>
> I have no idea what's going on, but at least it doesn't seem to be
> hindering our development environments... yet.
>
>
> $ cd ~/Development/metron/metron-deployment/development/ubuntu14
> $ vagrant box outdated
> /Users/nallen/Development/metron/metron-deployment/development/ubuntu14/
> Vagrantfile:28:
> warning: constant ::TRUE is deprecated
>  Running with ansible-skip-tags: ["sensors"]
> DEPRECATION: The 'sudo' option for the Ansible provisioner is deprecated.
> Please use the 'become' option instead.
> The 'sudo' option will be removed in a future release of Vagrant.
>
> Checking if box 'ubuntu/trusty64' is up to date...
> There was a problem while downloading the metadata for your box
> to check for updates. This is not an error, since it is usually due
> to temporary network problems. This is just a warning. The problem
> encountered was:
>
> The requested URL returned error: 404 Not Found
>
> If you want to check for box updates, verify your network connection
> is valid and try again.
>
>
> $ vagrant box remove ubuntu/trusty64 --all
>
>
> $ vagrant up
> /Users/nallen/Development/metron/metron-deployment/development/ubuntu14/
> Vagrantfile:28:
> warning: constant ::TRUE is deprecated
>  Running with ansible-skip-tags: ["sensors"]
> DEPRECATION: The 'sudo' option for the Ansible provisioner is deprecated.
> Please use the 'become' option instead.
> The 'sudo' option will be removed in a future release of Vagrant.
>
> Bringing machine 'node1' up with 'virtualbox' provider...
> ==> node1: Box 'ubuntu/trusty64' could not be found. Attempting to find and
> install...
> node1: Box Provider: virtualbox
> node1: Box Version: >= 0
> ==> node1: Loading metadata for box 'ubuntu/trusty64'
> node1: URL: https://vagrantcloud.com/ubuntu/trusty64
> ==> node1: Adding box 'ubuntu/trusty64' (v20180125.0.0) for provider:
> virtualbox
> node1: Downloading: https://vagrantcloud.com/ubuntu/boxes/trusty64/
> versions/20180125.0.0/providers/virtualbox.box
> ==> node1: Successfully added box 'ubuntu/trusty64' (v20180125.0.0) for
> 'virtualbox'!
> ==> node1: Importing base box 'ubuntu/trusty64'...
> ==> node1: Matching MAC address for NAT networking...
> ==> node1: Checking if box 'ubuntu/trusty64' is up to date...
> ==> node1: There was a problem while downloading the metadata for your box
> ==> node1: to check for updates. This is not an error, since it is usually
> due
> ==> node1: to temporary network problems. This is just a warning. The
> problem
> ==> node1: encountered was:
> ==> node1:
> ==> node1: The requested URL returned error: 503 Service Unavailable
> ==> node1:
> ==> node1: If you want to check for box updates, verify your network
> connection
> ==> node1: is valid and try again.
> ==> node1: Setting the name of the VM: ubuntu14_node1_1517850259612_86441
> ==> node1: Clearing any previously set forwarded ports...
> ==> node1: Clearing any previously set network interfaces...
> ==> node1: Preparing network interfaces based on configuration...
> node1: Adapter 1: nat
> node1: Adapter 2: hostonly
> ==> node1: Forwarding ports...
> node1: 22 (guest) =>  (host) (adapter 1)
> ==> node1: Running 'pre-boot' VM customizations...
> ==> node1: Booting VM...
> ==> node1: Waiting for machine to boot. This may take a few minutes...
> node1: SSH address: 127.0.0.1:
> node1: SSH username: vagrant
> node1: SSH auth method: private key
>
>
>
>
> On Mon, Feb 5, 2018 at 11:48 AM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > I had this problem on a machine running Vagrant 1.8.1. I thought it was
> > only Ubuntu at first, so I removed some additional boxes and found it to
> be
> > a problem for all of them. I didn't see any relevant articles other than
> > some old stuff talking abou

Re: [DISCUSS] Persistence store for user profile settings

2018-02-12 Thread Michael Miklavcic

 have an RDBMS requirement for Ambari?  That's
> a
> > > > > dependency that we do not control.
> > > > >
> > > > >
> > > > >> ... hbase seems a good option (because we already have it there,
> it
> > > > would
> > > > > be kinda crazy at this scale if we didn’t already have it)
> > > > >
> > > > > (3) In this scenario, the RDBMS would not scale proportionally with
> > the
> > > > > amount of telemetry, it would scale based on usage; primarily the
> > > number
> > > > of
> > > > > users.  This is not "big data" scale.  I don't think we can make
> the
> > > case
> > > > > for HBase based on scale here.
> > > > >
> > > > >
> > > > >> We would also end up with, as Mike points out, a whole new disk
> > > > > deployment patterns and a bunch of additional DBA ops process
> > > > requirements
> > > > > for every install.
> > > > >
> > > > > (4) Most users that need HA/DR (and other 'advanced stuff'), are
> > > > > enterprises and organizations that are already very familiar with
> > RDBMS
> > > > > solutions and have the infrastructure in place to manage those.
> For
> > > > users
> > > > > that don't need HA/DR, just use the DB that gets spun-up with
> Ambari.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 2, 2018 at 7:17 AM Simon Elliston Ball <
> > > > > si...@simonellistonball.com> wrote:
> > > > >
> > > > >> Introducing a RDBMS to the stack seems unnecessary for this.
> > > > >>
> > > > >> If we consider the data access patterns for user profiles, we are
> > > > unlikely
> > > > >> to query into them, or indeed do anything other than look them up,
> > or
> > > > write
> > > > >> them out by a username key. To that end, using an ORM to translate
> > a
> > > > >> nested config object into a load of tables seems to introduce
> > > complexity
> > > > >> and brittleness we then have to take away through relying on
> > > relational
> > > > >> consistency models. We would also end up with, as Mike points
> out, a
> > > > whole
> > > > >> new disk deployment patterns and a bunch of additional DBA ops
> > process
> > > > >> requirements for every install.
> > > > >>
> > > > >> Since the access pattern is almost entirely key => value, hbase
> > seems
> > > a
> > > > >> good option (because we already have it there, it would be kinda
> > crazy
> > > > at
> > > > >> this scale if we didn’t already have it) or arguably zookeeper,
> but
> > > that
> > > > >> might be at the other end of the scale argument. I’d even go as
> far
> > as
> > > > to
> > > > >> suggest files on HDFS to keep it simple.
> > > > >>
> > > > >> Simon
> > > > >>
> > > > >>> On 1 Feb 2018, at 23:24, Michael Miklavcic <
> > > > michael.miklav...@gmail.com>
> > > > >> wrote:
> > > > >>>
> > > > >>> Personally, I'd be in favor of something like Maria DB as an open
> > > > source
> > > > >>> repo. Or any other ansi sql store. On the positive side, it
> should
> > > mesh
> > > > >>> seamlessly with ORM tools. And the schema for this should be
> pretty
> > > > >>> vanilla, I'd imagine. I might even consider skipping ORM for
> > straight
> > > > >> JDBC
> > > > >>> and simple command scripts in Java for something this small. I'm
> > not
> > > > >>> worried so much about migrations of this sort. Large scale DBs
> can
> > > get
> > > > > >>> involved with major schema changes, but that's usually when the
> > > > datastore
> > > > >> is
> > > > >>> a massive set of tables with complex relationships, at least in
> my
> > > > >>> experience.
> > > > >>>
> > > > >>> We could also use hbase, which probably wouldn't be that hard
> > either,
> > > > but
> > > > >>> there may be more boilerplate to write for the client as compared
> > to
> > > > >>> standard SQL. But I'm assuming we could reuse a fair amount of
> > > existing
> > > > >>> code from our enrichments. One additional reason in favor of
> hbase
> > > > might
> > > > >> be
> > > > >>> data replication. For a SQL instance we'd probably recommend a
> RAID
> > > > store
> > > > >>> or backup procedure, but we get that pretty easy with hbase too.
> > > > >>>
> > > > >>> On Feb 1, 2018 2:45 PM, "Casey Stella" 
> wrote:
> > > > >>>
> > > > >>>> So, I'll answer your question with some questions:
> > > > >>>>
> > > > >>>>  - No matter the data store we use upgrading will take some
> care,
> > > > >> right?
> > > > >>>>  - Do we currently depend on a RDBMS anywhere?  I want to say
> that
> > > we
> > > > >> do
> > > > >>>>  in the REST layer already, right?
> > > > > >>>>  - If we don't use an RDBMS, what's the other option?  What are
> the
> > > > pros
> > > > >>>>  and cons?
> > > > >>>>  - Have we considered non-server offline persistent solutions
> > (e.g.
> > > > >>>>  https://www.html5rocks.com/en/features/storage)?
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <
> > merrim...@gmail.com>
> > > > >> wrote:
> > > > >>>>
> > > > >>>>> There is currently a PR up for review that allows a user to
> > > configure
> > > > >> and
> > > > >>>>> save the list of facet fields that appear in the left column of
> > the
> > > > >>>> Alerts
> > > > >>>>> UI:  https://github.com/apache/metron/pull/853.  The REST
> layer
> > > has
> > > > >> ORM
> > > > >>>>> support which means we can store those in a relational
> database.
> > > > >>>>>
> > > > >>>>> However I'm not 100% sure this is the best place to keep this.
> > As
> > > we
> > > > >> add
> > > > >>>>> more use cases like this the backing tables in the RDBMS will
> > need
> > > to
> > > > >> be
> > > > >>>>> managed.  This could make upgrading more tedious and
> error-prone.
> > > Is
> > > > >>>> there
> > > > > >>>>> a better way to store this, assuming we can leverage a
> > > component
> > > > >>>> that's
> > > > >>>>> already included in our stack?
> > > > >>>>>
> > > > >>>>> Ryan
> > > > >>>>>
> > > > >>>>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
>
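
The key => value access pattern discussed in this thread can be pictured with a minimal sketch. Everything in it is hypothetical: the `ProfileStore` class, its method names, and the field names are invented for illustration, and the in-memory dict only stands in for whichever backend (HBase, ZooKeeper, files on HDFS, or an RDBMS) is eventually chosen. It is not Metron code.

```python
import json
from typing import Dict, Optional


class ProfileStore:
    """Hypothetical key => value store for user profile settings."""

    def __init__(self) -> None:
        # A plain dict stands in for the real backend (assumption).
        self._rows: Dict[str, str] = {}

    def put(self, username: str, profile: dict) -> None:
        # The nested settings object is serialized whole under the username key,
        # so no table-per-field mapping is needed.
        self._rows[username] = json.dumps(profile)

    def get(self, username: str) -> Optional[dict]:
        # Lookup is by username only; there is no querying into the profile.
        raw = self._rows.get(username)
        return json.loads(raw) if raw is not None else None


store = ProfileStore()
store.put("alice", {"alerts_ui": {"facet_fields": ["field_a", "field_b"]}})
print(store.get("alice"))
```

The point of the sketch is only that reads and writes go through a single username key, which is the property the backend choice has to support.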

Re: Apache Website Required Links

2018-02-16 Thread Michael Miklavcic

That's awesome Anand, thanks for tackling this!

On Fri, Feb 16, 2018 at 7:41 AM, Anand Subramanian <
asubraman...@hortonworks.com> wrote:

> Btw, here is the output from running site-scan on my local version of the
> changes:
>
> ➜  tools git:(master) ruby site-scan.rb http://127.0.0.1:4000
> 127 http://127.0.0.1:4000 missing
> {
>   "127": {
> "display_name": "127",
> "uri": "http://127.0.0.1:4000",
> "events": "https://www.apache.org/events/current-event",
> "foundation": "ASF HOME",
> "license": "https://www.apache.org/licenses/LICENSE-2.0",
> "sponsorship": "https://www.apache.org/foundation/sponsorship.html",
> "security": "https://www.apache.org/security",
> "trademarks": "Apache Metron and its logo are trademarks of The Apache
> Software Foundation.",
> "copyright": "Copyright © 2018, The Apache Software Foundation.",
> "image": null,
> "thanks": "https://www.apache.org/foundation/thanks.html",
> "copyparent": true
>   }
> }
>
> Regards
> Anand
>
> On 2/16/18, 8:09 PM, "Anand Subramanian" 
> wrote:
>
> Hello All,
>
> Apache Whimsy checks for the site requisites in the main index.html
> and not inside sub-levels. I have created METRON-1457 (
> https://github.com/apache/metron/pull/938) to move the ASF links to the
> main page.
>
> Here is a screenshot of what the new Metron page will look like with
> the ASF links above the page footer:
> https://imgur.com/3Y8ZLWL
>
> Please review and let me know what you think about the new look.
>
> Thanks
> Anand
>
> On 2/16/18, 2:08 AM, "Casey Stella"  wrote:
>
> Just reporting back that Anand's PR METRON-1386 (
> https://github.com/apache/metron/pull/935) has been merged into
> master and
> the asf-site branch.
> Kudos to Anand!
>
> Casey
>
> On Wed, Feb 7, 2018 at 9:11 AM, Anand Subramanian <
> asubraman...@hortonworks.com> wrote:
>
> > I can take a shot at this if there are no other takers.
> >
> > Regards,
> > Anand
> >
> > On 2/5/18, 8:59 PM, "Justin Leet"  wrote:
> >
> > I'd created a Jira awhile ago, but it deserves a callout to
> the
> > community.
> > Especially if someone wants to grab it, it's probably
> something pretty
> > easy
> > (and valuable!) to do.
> >
> > There's a set of required links on Apache web pages, which
> can be seen
> > at Website
> > Navigation Links Policy
> > 
> >
> > Reporting is at Site Check For Project - Metron
> > 
> >
> > This ticket is available at:
> > METRON-1386  jira/browse/METRON-1386>
> >
> >
> >
>
>
>
>
>

[DISCUSS] Split Elasticsearch and Kibana into separate MPack from Metron

2018-02-16 Thread Michael Miklavcic

This came up earlier when discussing work around the ES upgrade:
https://lists.apache.org/thread.html/66280bc061afbba2c353221c3c05fd74b247b970921c009c29edc815@%3Cdev.metron.apache.org%3E
https://lists.apache.org/thread.html/8ec83b6a3ef39057c9466ff72a2f63c9308452f1ebc1804e67cb495b@%3Cdev.metron.apache.org%3E

Looks like Otto made this suggestion and Kyle is on board. I was originally
opposed to this because it did not seem worth the effort to support 2
separate MPacks. However, now that we are working on the Solr upgrade, it
seems like an appropriate solution for enabling us to make the indexing
piece pluggable. I propose that we commence with this solution.

Cheers,
Mike

Re: [DISCUSS] Split Elasticsearch and Kibana into separate MPack from Metron

2018-02-21 Thread Michael Miklavcic

Anyone have any opinions on how we should version the ES/Kibana MPack?

We currently rev the Metron one based on current Metron version and apply
it to both the overall MPack as well as the individual service versions,
e.g. metron_mpack-0.4.3.0/common-services/METRON/0.4.3. For ES we have been
keeping the service version matched to the current ES version because it's
independent of Metron, ie 5.6.2. The upshot is that we can handle this one
of two ways.

   1. Keep the same approach after splitting off ES and Kibana - ES MPack
   version is set to the current Metron version (0.4.3) and the service itself
   is set to the ES version (5.6.2).
   2. Use ES version for both the MPack version and service version (5.6.2).

I personally recommend and prefer the first approach because it allows us
to make changes to the mpack itself without necessarily changing the
version of ES or Kibana, which is something that is likely to happen. It's
also consistent and seamless with our current versioning approach. Lastly,
I don't believe the ES versions provide much contextual sense in the Metron
world for an MPack version - setting the service version definitely makes
sense to indicate what exact ES version we're using, but the MPack is
really our way of providing custom functionality that wraps a specific
version of ES/Kibana. Hope this makes sense.
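
To make the two options concrete, here is a small sketch of the directory layout each would produce. The `metron_mpack-0.4.3.0/common-services/METRON/0.4.3` path comes from the text above; the MPack name `elasticsearch_mpack`, the service name `ELASTICSEARCH`, and the exact version strings used for the ES MPack are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MpackVersioning:
    mpack_name: str       # name of the MPack bundle (hypothetical for ES)
    mpack_version: str    # version of the MPack wrapper itself
    service_name: str     # service directory under common-services
    service_version: str  # version of the wrapped service

    def service_path(self) -> str:
        # Mirrors the <mpack>-<version>/common-services/<SERVICE>/<version> layout.
        return (
            f"{self.mpack_name}-{self.mpack_version}"
            f"/common-services/{self.service_name}/{self.service_version}"
        )


# Current Metron layout, as quoted above.
metron = MpackVersioning("metron_mpack", "0.4.3.0", "METRON", "0.4.3")

# Option 1: MPack tracks the Metron version, service tracks the ES version.
option_1 = MpackVersioning("elasticsearch_mpack", "0.4.3.0", "ELASTICSEARCH", "5.6.2")

# Option 2: both MPack and service track the ES version.
option_2 = MpackVersioning("elasticsearch_mpack", "5.6.2", "ELASTICSEARCH", "5.6.2")

for layout in (metron, option_1, option_2):
    print(layout.service_path())
```

Under option 1 a fix to the MPack scripts bumps only the first version, while under option 2 there is no place to record such a change without also implying a new ES version.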

Mike

On Sat, Feb 17, 2018 at 11:13 AM, Nick Allen  wrote:

> +1 I agree with Otto's idea.  It makes a lot of sense for Elasticsearch and
> Kibana to live in a separate Mpack.
>
> This provides us a path forward to support additional indexers like Solr.
>
> We should also not force our users to install an external component (like
> Elasticsearch) using the Mpack.  There are just too many installation
> configurations for us to reasonably support; especially on larger
> installations.  Supporting the Elasticsearch MPack is a project unto
> itself.  That being said, the functionality will still be there for those
> that want to use it.
>
>
>
> On Fri, Feb 16, 2018 at 4:10 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > This came up earlier when discussing work around the ES upgrade:
> > https://lists.apache.org/thread.html/66280bc061afbba2c353221c3c05fd
> > 74b247b970921c009c29edc815@%3Cdev.metron.apache.org%3E
> > https://lists.apache.org/thread.html/8ec83b6a3ef39057c9466ff72a2f63
> > c9308452f1ebc1804e67cb495b@%3Cdev.metron.apache.org%3E
> >
> > Looks like Otto made this suggestion and Kyle is on board. I was
> originally
> > opposed to this because it did not seem worth the effort to support 2
> > separate MPacks. However, now that we are working on the Solr upgrade, it
> > seems like an appropriate solution for enabling us to make the indexing
> > piece pluggable. I propose that we commence with this solution.
> >
> > Cheers,
> > Mike
> >
>

[DISCUSS] Metron debug info tool

2018-04-11 Thread Michael Miklavcic

Hey guys,

I wanted to bring attention to a tool I created for gathering cluster
details for debugging purposes. There are a number of locations that
properties get materialized, e.g. from Ambari -> properties file -> flux ->
Storm, which means a lot of hunting to guarantee that the changes you've
made are percolating correctly. Furthermore, it's generally useful to get a
sense of how your cluster is configured by gathering all of that info in
one place. I created a Python tool that does just that, and bundles up the
results in a tarball. Here is an overview of the artifacts I'm gathering -
you can see what commands are being used by looking at the script.

Ambari
full cluster config detail

Storm
cluster summary
cluster configuration
topology summary (enrichments and indexing)
topology status summary (enrichments and indexing)

Kafka
broker info
topics list
topic details (enrichments and indexing)

Metron
local file system configuration files
zookeeper configuration
flux files
lib directory file listing
rpm listing

Hadoop
version info

*** Are there any features/details you'd like to see added to this? Any
concerns or suggestions? ***

I am also planning to add log file support along with md5sum of the jar
files deployed in Metron's lib directory.
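
For readers who have not opened the PR, the general shape of such a tool can be sketched as follows. This is not the script from the PR: the function names, the output layout, and the error handling are all assumptions, and the sketch only shows the pattern of running commands, saving their output, checksumming jars, and bundling the result.

```python
import hashlib
import subprocess
import tarfile
from pathlib import Path
from typing import Dict, List


def run_and_save(commands: Dict[str, List[str]], out_dir: Path) -> None:
    """Run each command and write its output to <out_dir>/<name>.txt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, argv in commands.items():
        try:
            result = subprocess.run(argv, capture_output=True, text=True, timeout=60)
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure instead of aborting the whole collection.
            text = f"failed to run {argv}: {exc}\n"
        (out_dir / f"{name}.txt").write_text(text)


def md5_of_jars(lib_dir: Path) -> Dict[str, str]:
    """Map each jar file name in lib_dir to its MD5 hex digest."""
    return {
        jar.name: hashlib.md5(jar.read_bytes()).hexdigest()
        for jar in sorted(lib_dir.glob("*.jar"))
    }


def bundle(out_dir: Path, tarball: Path) -> None:
    """Bundle the collected files into a gzip-compressed tarball."""
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(out_dir, arcname=out_dir.name)
```

A call might look like `run_and_save({"hadoop_version": ["hadoop", "version"], "rpm_listing": ["rpm", "-qa"]}, Path("metron-debug"))` followed by `bundle(Path("metron-debug"), Path("metron-debug.tar.gz"))`; the directory and file names there are placeholders.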

https://github.com/apache/metron/pull/988/files#diff-0eddfa8f1dd67247e0803e405497b6e2

Cheers,
Mike Miklavcic

Re: [DISCUSS] Metron debug info tool

2018-04-11 Thread Michael Miklavcic

Comments below

On Wed, Apr 11, 2018, 10:59 AM Justin Leet  wrote:

> First off, this is super nice, and a great way to let us be able to debug
> and help others debug quickly, easily, and hopefully more consistently.
>
> I super briefly glanced at it, so these might already be there, but I'd
> like to be able to filter what I get back, e.g. if I give the options for
> Storm and Metron, I'd like to limit to just those. Nothing complicated, but
> something quick and simple.
>

I had considered that too - I like it. Adding to the list.

>
> Hand in hand with that, I'd like the option to print to screen (maybe just
> for the non-config stuff or just print out the relevant filenames?).  At
> that point, it'd be really easy to grep or otherwise search through
> things.  Tarball is nice, especially when passing things off to someone
> else, or when you need to dig through a lot of larger config files, but I
> suspect a lot of use cases will be "Hey, real quick what's going on?"
>

Another good suggestion. Currently, the files land in a local directory
that gets tarred up. We could also do a simple dump to the screen, similar
to how the zk config tool works.
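
The two suggestions so far (filter by service, print to screen) could be exposed roughly like this. The option names `--services` and `--stdout` and the `ARTIFACTS` grouping are hypothetical; they mirror the artifact list from the original message but are not taken from the tool in the PR.

```python
import argparse
from typing import Dict, List, Optional

# Hypothetical grouping of the collected artifacts by service.
ARTIFACTS: Dict[str, List[str]] = {
    "ambari": ["full cluster config detail"],
    "storm": ["cluster summary", "cluster configuration",
              "topology summary", "topology status summary"],
    "kafka": ["broker info", "topics list", "topic details"],
    "metron": ["local file system configuration files", "zookeeper configuration",
               "flux files", "lib directory file listing", "rpm listing"],
    "hadoop": ["version info"],
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Gather cluster debug info (sketch)")
    parser.add_argument(
        "--services", nargs="+", choices=sorted(ARTIFACTS), default=sorted(ARTIFACTS),
        help="limit collection to these services (default: all)",
    )
    parser.add_argument(
        "--stdout", action="store_true",
        help="print results to the screen instead of writing a tarball",
    )
    return parser.parse_args(argv)


def selected_artifacts(services: List[str]) -> Dict[str, List[str]]:
    """Return only the artifact groups for the requested services."""
    return {name: ARTIFACTS[name] for name in services}


args = parse_args(["--services", "storm", "metron", "--stdout"])
for service, items in selected_artifacts(args.services).items():
    print(service, items)
```

Defaulting `--services` to every known service keeps the current behavior when no filter is given.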

>
> Other than that, does anyone have any thoughts on putting something like
> this into the management UI (for the non-Ambari managed stuff)?  That seems
> like it would be the natural place to get that stuff, keep it up to date,
> and even build in an export if we wanted to.  Would make it a lot easier
> for end users to be able to get a quick view into what's going on, and
> could let us build in some slightly better filtering and search
> capabilities.
>
>
>
> On Wed, Apr 11, 2018 at 12:10 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Hey guys,
> >
> > I wanted to bring attention to a tool I created for gathering cluster
> > details for debugging purposes. There are a number of locations that
> > properties get materialized, e.g. from Ambari -> properties file -> flux
> ->
> > Storm, which means a lot of hunting to guarantee that the changes you've
> > made are percolating correctly. Furthermore, it's generally useful to
> get a
> > sense of how your cluster is configured by gathering all of that info in
> > one place. I created a Python tool that does just that, and bundles up
> the
> > results in a tarball. Here is an overview of the artifacts I'm gathering
> -
> > you can see what commands are being used by looking at the script.
> >
> > Ambari
> > full cluster config detail
> >
> > Storm
> > cluster summary
> > cluster configuration
> > topology summary (enrichments and indexing)
> > topology status summary (enrichments and indexing)
> >
> > Kafka
> > broker info
> > topics list
> > topic details (enrichments and indexing)
> >
> > Metron
> > local file system configuration files
> > zookeeper configuration
> > flux files
> > lib directory file listing
> > rpm listing
> >
> > Hadoop
> > version info
> >
> > *** Are there any features/details you'd like to see added to this? Any
> > concerns or suggestions? ***
> >
> > I am  also planning to add log file support along with md5sum of the jar
> > files deployed in Metron's lib directory.
> >
> > https://github.com/apache/metron/pull/988/files#diff-
> > 0eddfa8f1dd67247e0803e405497b6e2
> >
> > Cheers,
> > Mike Miklavcic
> >
>

Re: [DISCUSS] Metron debug info tool

2018-04-11 Thread Michael Miklavcic

Agree with you both - one reason Ambari might be preferable is that there
are config variables we can access more easily from Ambari, kind of like
what we use in the MPacks. I haven't looked at what we have in the
management UI but I think that's also a reasonable option.

On Wed, Apr 11, 2018, 11:56 AM Nick Allen  wrote:

> I think this is super helpful, Mike.
>
>
> > Other than that, does anyone have any thoughts on putting something like
> this
> into the management UI (for the non-Ambari managed stuff)?  That seems like
> it would be the natural place to get that stuff...
>
> I agree this would be a great feature to add to a UI.
>
> My first thought was that this would be a good addition to Ambari.  I can't
> really think of compelling justification to go Ambari or the Mgmt UI
> though.  Either would work to make Mike's tool more accessible.
>
>
>
>
>
> On Wed, Apr 11, 2018 at 12:59 PM, Justin Leet 
> wrote:
>
> > First off, this is super nice, and a great way to let us be able to debug
> > and help others debug quickly, easily, and hopefully more consistently.
> >
> > I super briefly glanced at it, so these might already be there, but
> I'd
> > like to be able to filter what I get back, e.g. if I give the options for
> > Storm and Metron, I'd like to limit to just those. Nothing complicated,
> but
> > something quick and simple.
> >
> > Hand in hand with that, I'd like the option to print to screen (maybe
> just
> > for the non-config stuff or just print out the relevant filenames?).  At
> > that point, it'd be really easy to grep or otherwise search through
> > things.  Tarball is nice, especially when passing things off to someone
> > else, or when you need to dig through a lot of larger config files, but I
> > suspect a lot of use cases will be "Hey, real quick what's going on?"
> >
> > Other than that, does anyone have any thoughts on putting something like
> > this into the management UI (for the non-Ambari managed stuff)?  That
> seems
> > like it would be the natural place to get that stuff, keep it up to date,
> > and even build in an export if we wanted to.  Would make it a lot easier
> > for end users to be able to get a quick view into what's going on, and
> > could let us build in some slightly better filtering and search
> > capabilities.
> >
> >
> >
> > On Wed, Apr 11, 2018 at 12:10 PM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > Hey guys,
> > >
> > > I wanted to bring attention to a tool I created for gathering cluster
> > > details for debugging purposes. There are a number of locations that
> > > properties get materialized, e.g. from Ambari -> properties file ->
> flux
> > ->
> > > Storm, which means a lot of hunting to guarantee that the changes
> you've
> > > made are percolating correctly. Furthermore, it's generally useful to
> > get a
> > > sense of how your cluster is configured by gathering all of that info
> in
> > > one place. I created a Python tool that does just that, and bundles up
> > the
> > > results in a tarball. Here is an overview of the artifacts I'm
> gathering
> > -
> > > you can see what commands are being used by looking at the script.
> > >
> > > Ambari
> > > full cluster config detail
> > >
> > > Storm
> > > cluster summary
> > > cluster configuration
> > > topology summary (enrichments and indexing)
> > > topology status summary (enrichments and indexing)
> > >
> > > Kafka
> > > broker info
> > > topics list
> > > topic details (enrichments and indexing)
> > >
> > > Metron
> > > local file system configuration files
> > > zookeeper configuration
> > > flux files
> > > lib directory file listing
> > > rpm listing
> > >
> > > Hadoop
> > > version info
> > >
> > > *** Are there any features/details you'd like to see added to this? Any
> > > concerns or suggestions? ***
> > >
> > > I am  also planning to add log file support along with md5sum of the
> jar
> > > files deployed in Metron's lib directory.
> > >
> > > https://github.com/apache/metron/pull/988/files#diff-
> > > 0eddfa8f1dd67247e0803e405497b6e2
> > >
> > > Cheers,
> > > Mike Miklavcic
> > >
> >
>

Re: [DISCUSS] Inactive PRs

2018-04-13 Thread Michael Miklavcic

I'm for cleaning up the outstanding inactive PR's and putting this in the
dev guidelines. I would actually like to push for a time that is less than
6 weeks. Why not 4? We don't risk much - a submitter can always reopen a
closed PR, and the history is maintained. Closed PR's don't disappear afaik
- they remain in perpetuity.
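
To see what the choice between 6 and 4 weeks means in practice, here is a minimal sketch of the inactivity rule. The function name and the sample dates are made up for illustration; the rule itself (no comments or updates by the contributor within the window) follows the proposed guideline wording quoted below.

```python
from datetime import datetime, timedelta


def is_inactive(last_contributor_activity: datetime, now: datetime, weeks: int = 6) -> bool:
    """True if the contributor's last comment or update is older than `weeks`."""
    return now - last_contributor_activity > timedelta(weeks=weeks)


# Hypothetical PR whose contributor last commented 34 days before the check.
now = datetime(2018, 4, 13)
last_activity = datetime(2018, 3, 10)

print(is_inactive(last_activity, now, weeks=6))  # False: 34 days is within 42
print(is_inactive(last_activity, now, weeks=4))  # True: 34 days exceeds 28
```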

M

On Fri, Apr 13, 2018 at 1:41 PM, Otto Fowler 
wrote:

> I would make sure that each clause is consistent with the distinction.
> Contributor is def. better.
>
>
> On April 13, 2018 at 15:27:39, Nick Allen (n...@nickallen.org) wrote:
>
> Yes, that is a good edit Otto.  If I formally submit this as a change to
> dev guidelines, I will use your edit.
>
> One small thing, instead of "submitter", I'll stick with "contributor"
> because I use that everywhere else.
>
>  A pull request is 'inactive' if no comments or updates have been made by
> the contributor in the previous 6 weeks.
>
>
>
> On Fri, Apr 13, 2018 at 3:06 PM, Otto Fowler 
> wrote:
>
> > I would be more explicit that the inactivity was the inactivity of the
> > submitter.
> > It should be clear that this is not for PRs that have not been reviewed,
> > or PRs where the submitter has asked a question
> > or answered a question and the reviewers have abandoned the effort.  Not
> > that that ever happens.
> >
> > “A pull request where a review has been initiated will be considered
> > inactive if it is waiting on
> > reply or action on the part of the submitter and has had no activity by
> > that submitter in the previous six weeks”
> >
> > etc etc
> >
> >
> >
> >  A pull request is 'inactive' if no comments or updates have been made by
> > the submitter
> > in the previous 6 weeks
> >
> >
> > On April 13, 2018 at 14:44:40, Nick Allen (n...@nickallen.org) wrote:
> >
> > There are a fair number of inactive PRs in our queue that have little to
> no
> > chance of being merged. Tidying up our queue and keeping open only active
> > PRs should help the community better identify which PRs need reviewed and
> > actioned.
> >
> > If the original contributor does not close the PR, the only course of
> > action that we can take is to open an Apache Infra request to close the
> > PR. We have only ever done this after multiple failed attempts to contact
> > the original contributor.
> >
> > I suggest that we add to the Metron development guidelines [1] exactly
> how
> > inactive PRs should be handled.
> >
> > (Q1) Should we add to the development guidelines a process for handling
> > inactive PRs?
> >
> >
> >
> > Assuming there is support for this, I would suggest the following as a
> > first draft. These would serve as an addendum to section 2.6
> >
> > 2.6.1 Inactive Pull Requests
> >
> >
> > Contributions can often take a significant amount of time to complete the
> > code review process. This process requires active participation from the
> > contributor. If the contributor is unable to actively participate, the PR
> > is unlikely to successfully complete this process. Pull Requests that
> have
> > failed to receive active participation for an extended period of time
> risk
> > being treated as abandoned.
> >
> > Any committer can submit a request for Apache Infra to close a pull
> > request that has been abandoned according to the following guidelines.
> >
> >
> > - A pull request is 'inactive' if no comments or updates have been made
> > in the previous 6 weeks.
> >
> >
> > - For any 'inactive' pull request, a committer can request from the
> > contributor justification for keeping the pull request open.
> >
> >
> > - In that request, the committer should refer the contributor to these
> > development guidelines for inactive pull requests.
> >
> >
> > - If the contributor does not respond to the request within 2 additional
> > weeks, the committer should cast a -1 vote on the PR using these
> > development guidelines as justification.
> >
> >
> > - Any committer can then submit a request to Apache Infra to close the
> > PR based on this -1 vote.
> >
> >
> > (Q2) Assuming support for the idea, are these good guidelines? I offer
> > this only to help drive the discussion. I am open to alternatives.
> >
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/METRON/
> Development+Guidelines
> >
> >
>

[DISCUSS] Metron RPM spec changelog

2018-04-18 Thread Michael Miklavcic

We discovered yesterday while reviewing a PR that the RPM changelog hasn't
been maintained since 9/25/17. There are 7 changes to that file that have
not been logged in the changelog itself. The question is if we want to keep
maintaining the changelog and, if so, should we patch the existing log with
the missing commits. Any opinions on this? I myself don't have a strong
opinion either way, but we shouldn't leave it in its current state.

Mike


Quoting the conversation between myself and Justin Leet:

https://github.com/apache/metron/pull/996#issuecomment-382194736
@justinleet Do we still want/need to do this? The last log change was Tue
Sep 25 2017 by @merrimanr in METRON-1207. However, there have been 6
changes to the spec since then that have not made it to the change log. I
believe there was a reason we started doing this (in duplication of source
control), but I don't recall specifically. Do you remember why that was?

https://github.com/apache/metron/pull/996#issuecomment-382199021
I believe, though my memory is pretty fuzzy, that it's best practice to
maintain that changelog because it's useful for auditing and tracking
purposes given that it's available on the rpm itself.

There's probably a couple questions here


   1. Are we going to maintain it going forward? If not, we should just
   dump it entirely.
   2. If we choose to do so, do we want/need to update the changelog for
   the missing commits (and probably to use the dev list as authors, rather
   than individuals)?


Might be worth opening a discuss on it. I could be persuaded either way in
terms of whether we update it for this PR or not, but I have a slight
preference on adding it until there's agreement we aren't doing it.

Re: [DISCUSS] Metron RPM spec changelog

2018-04-18 Thread Michael Miklavcic

I think I like Casey's recommendation here. Would you want to simply say
that a release was cut, or actually list the changes under the release? We
could probably do a couple things to that end.

1. Per Otto's comment, get the existing changelog in order - I think we
should modify it to reflect a per-release formatting, which would mean
grabbing historical changes to that file and enumerating them per release
(or just having a very simple single change note).
e.g., the 0.4.2 items get merged as follows (changing the date accordingly
to reflect the release date)

* Tue Sep 25 2017 Apache Metron  - 0.4.2
- Add Alerts UI
- Updated and renamed metron-rest script

2. Depending on how you guys feel about granularity, we could make changes
in the current release added as a line-item under a CURRENT or
0.4.3-SNAPSHOT version, e.g.
* RELEASE-DATE Apache Metron  - CURRENT
- METRON-1499 Enable Configuration of Unified Enrichment Topology via A
- METRON-1483: Create a tool to monitor performance of the topologies c
- METRON-1397 Support for JSON Path and complex documents in JSONMapPar
- METRON-1460: Create a complementary non-split-join enrichment topology
- METRON-1302: Split up Indexing Topology into batch and random access
- METRON-1378: Create a summarizer

Or have the release manager do it. The first route would leave a dev on the
hook, but the release manager would then simply need to update the date and
version info rather than collect all the changes. I'm unsure off the top of
my head if rpm will blow a gasket over the date and version formatting, but
we can find a way to make that work. The other approach would mean just
doing a git log on the spec file and grabbing the delta since last release.
Side note, I kind of like the idea of having the Jira ticket number in the
comment like that in the second example. What do you guys think?
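
A minimal sketch of the per-release formatting described above, written in
Python purely for illustration. The helper name, the packager address, and
the version string are placeholders, not anything taken from the Metron spec
file. The weekday is derived from the date itself rather than typed by hand,
since rpmbuild may complain when the two disagree.

```python
from datetime import date

# Fixed English abbreviations so the output does not depend on the locale.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_changelog_entry(release_date, packager, version, changes):
    """Render one %changelog entry: a '* date packager - version' header
    followed by one '- ' line per change."""
    header = "* {} {} {:02d} {} {} - {}".format(
        DAYS[release_date.weekday()],      # weekday computed from the date
        MONTHS[release_date.month - 1],
        release_date.day,
        release_date.year,
        packager,
        version,
    )
    return "\n".join([header] + ["- {}".format(c) for c in changes])


# Hypothetical values: the address and version are placeholders.
entry = format_changelog_entry(
    date(2018, 4, 18),
    "Apache Metron <dev@example.org>",
    "0.4.3",
    ["METRON-1378: Create a summarizer",
     "METRON-1302: Split up Indexing Topology into batch and random access"],
)
print(entry)
```

With this shape, the release manager would only supply the date and version;
the change lines could come from whichever source the list settles on (dev
line-items or a git log of the spec file).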

Mike


On Wed, Apr 18, 2018 at 9:23 AM, Otto Fowler 
wrote:

> I think having the spec file updated with the changes per release is fine,
> but is the release manager
> going to do that?
>
> If so then the docs need to be updated.  Also, we *should* true up any
> missing entries from the file now.
>
>
>
> On April 18, 2018 at 11:02:35, Casey Stella (ceste...@gmail.com) wrote:
>
> I think I'd prefer to see the changelog only include the release entries,
> rather than individual entries per dev. We keep the spec file in source
> control to determine the individual changes between releases. I'm happy to
> have my mind changed, though.
>
> On Wed, Apr 18, 2018 at 9:47 AM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > We discovered yesterday while reviewing a PR that the RPM changelog
> hasn't
> > been maintained since 9/25/17. There are 7 changes to that file that have
> > not been logged in the changelog itself. The question is if we want to
> keep
> > maintaining the changelog and, if so, should we patch the existing log
> with
> > the missing commits. Any opinions on this? I myself don't have a strong
> > opinion either way, but we shouldn't leave it in its current state.
> >
> > Mike
> >
> >
> > Quoting the conversation between myself and Justin Leet:
> >
> > https://github.com/apache/metron/pull/996#issuecomment-382194736
> > @justinleet Do we still want/need to do this? The last log change was Tue
> > Sep 25 2017 by @merrimanr in METRON-1207. However, there have been 6
> > changes to the spec since then that have not made it to the change log. I
> > believe there was a reason we started doing this (in duplication of
> source
> > control), but I don't recall specifically. Do you remember why that was?
> >
> > https://github.com/apache/metron/pull/996#issuecomment-382199021
> > I believe, though my memory is pretty fuzzy, that it's best practice to
> > maintain that changelog because it's useful for auditing and tracking
> > purposes given that it's available on the rpm itself.
> >
> > There's probably a couple questions here
> >
> >
> > 1. Are we going to maintain it going forward? If not, we should just
> > dump it entirely.
> > 2. If we choose to do so, do we want/need to update the changelog for
> > the missing commits (and probably to use the dev list as authors, rather
> > than individuals)?
> >
> >
> > Might be worth opening a discuss on it. I could be persuaded either way
> in
> > terms of whether we update it for this PR or not, but I have a slight
> > preference on adding it until there's agreement we aren't doing it.
> >
>

Re: [VOTE] Development Guidelines Addendum on Inactive Pull Requests

2018-04-20 Thread Michael Miklavcic

+1

On Fri, Apr 20, 2018 at 12:54 PM, Casey Stella  wrote:

> +1
>
> On Fri, Apr 20, 2018 at 11:17 AM David Lyle  wrote:
>
> > +1 sounds good to me.
> >
> > -D...
> >
> >
> > On Fri, Apr 20, 2018 at 11:09 AM, zeo...@gmail.com 
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > On Fri, Apr 20, 2018 at 9:42 AM Michel Sumbul 
> > > wrote:
> > >
> > > > +1
> > > >
> > > > 2018-04-20 14:40 GMT+01:00 Otto Fowler :
> > > >
> > > > > +1
> > > > >
> > > > >
> > > > > On April 20, 2018 at 09:30:30, Nick Allen (n...@nickallen.org)
> > wrote:
> > > > >
> > > > > I am proposing the following addition to the project's development
> > > > > guidelines [1]. Based on these guidelines, an abandoned pull
> request
> > > can
> > > > > be closed in roughly 6 weeks time (4 weeks of inactivity plus 2
> weeks
> > > to
> > > > > respond to a committer's request.)
> > > > >
> > > > > Please vote +1, 0, or -1 and also indicate if your vote is binding
> or
> > > > > non-binding. More information on voting can be found in the Apache
> > > Metron
> > > > > By-Laws [2].
> > > > >
> > > > > This vote will remain open for at least 72 hours, excluding this
> > > weekend.
> > > > > I plan to close the vote no sooner than Wednesday, April 25, 2018
> at
> > > 8:00
> > > > > AM EST.
> > > > >
> > > > > > The discuss thread that preceded this vote can be found here [3].
> > > > >
> > > > > --
> > > > >
> > > > > 2.6.1 Inactive Pull Requests
> > > > >
> > > > >
> > > > > Contributions can often take a significant amount of time to
> complete
> > > the
> > > > > code review process. This process requires active participation
> from
> > > the
> > > > > contributor. If the contributor is unable to actively participate,
> > the
> > > > > pull request is unlikely to successfully complete this process.
> > > > >
> > > > > Pull Requests that have failed to receive active participation from
> > the
> > > > > contributor for an extended period of time risk being abandoned.
> Any
> > > > > committer can submit a request for Apache Infra to close a pull
> > request
> > > > > that has been abandoned according to the following guidelines.
> > > > >
> > > > >
> > > > > - A pull request is 'inactive' if no comments or updates have been
> > made
> > > > > by the contributor in the previous 4 weeks.
> > > > >
> > > > >
> > > > > - For any 'inactive' pull request, a committer can request from the
> > > > > contributor justification for keeping the pull request open.
> > > > >
> > > > >
> > > > > - The committer's request should be made as a public comment on the
> > > pull
> > > > > request. The committer should refer the contributor to these
> > > development
> > > > > guidelines for inactive pull requests.
> > > > >
> > > > >
> > > > > > - If the contributor publicly responds to the request, the pull
> > > > > > request is no longer considered 'inactive'.
> > > > >
> > > > >
> > > > > - If the contributor does not respond to the request within 2
> weeks,
> > > the
> > > > > pull request is considered 'abandoned'.
> > > > >
> > > > >
> > > > > - A committer can cast a -1 vote on any 'abandoned' pull request
> > using
> > > > > these development guidelines as justification.
> > > > >
> > > > >
> > > > > - A committer can submit a request to Apache Infra to close the
> > > > > 'abandoned' pull request based on this -1 vote.
> > > > >
> > > > > --
> > > > >
> > > > > [1]
> > > > >
> > > > https://cwiki.apache.org/confluence/display/METRON/
> > > Development+Guidelines
> > > > >
> > > > > [2] https://cwiki.apache.org/confluence/display/METRON/
> > > > > Apache+Metron+Bylaws
> > > > >
> > > > > [3]
> > > > > https://lists.apache.org/thread.html/
> a4e72af67994c8e818f843a9ea8cc2
> > > > > 86d81b5c72002fd011d66111f6@%3Cdev.metron.apache.org%3E
> > > > >
> > > >
> > > --
> > >
> > > Jon
> > >
> >
>
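
The timeline in the proposed section 2.6.1 (4 weeks without contributor
activity, then 2 weeks to answer a committer's request) can be sketched as a
small status function. This is an illustration only: the function name, its
arguments, and the status strings are assumptions, and the guideline itself
leaves the final call to a committer's -1 vote and a request to Apache Infra.

```python
from datetime import datetime, timedelta

INACTIVE_AFTER = timedelta(weeks=4)    # no contributor comments or updates
RESPONSE_WINDOW = timedelta(weeks=2)   # time to answer a committer's request


def pull_request_status(now, last_contributor_activity, committer_request_at=None):
    """Classify a pull request as 'active', 'inactive', or 'abandoned'.

    committer_request_at is when a committer publicly asked the contributor
    to justify keeping the pull request open, or None if nobody has asked.
    """
    if now - last_contributor_activity < INACTIVE_AFTER:
        return "active"
    if committer_request_at is None:
        return "inactive"
    # The request only counts if it was made once the PR was already inactive
    # and the contributor has not been active since it was made.
    asked_while_inactive = (
        committer_request_at - last_contributor_activity >= INACTIVE_AFTER
    )
    unanswered = last_contributor_activity < committer_request_at
    if asked_while_inactive and unanswered and now - committer_request_at >= RESPONSE_WINDOW:
        return "abandoned"
    return "inactive"


now = datetime(2018, 6, 1)
print(pull_request_status(now, datetime(2018, 5, 20)))
print(pull_request_status(now, datetime(2018, 4, 20)))
print(pull_request_status(now, datetime(2018, 4, 1), datetime(2018, 5, 10)))
```

A public response from the contributor moves last_contributor_activity
forward, which is what returns the pull request to 'active'.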

Re: [DISCUSS] Pcap panel architecture

2018-05-03 Thread Michael Miklavcic

Thanks for the write-up, Ryan. A few questions and comments.

   1. metron-api
      1. "It hasn't been used in a while and will need some end to end
      testing to make sure it still functions properly" > I was probably
      one of the last developers to touch this code a year or more ago -
      fwiw, I didn't encounter any major issues the last time I ran it up.
      That aside, end to end testing will be much more critical for a new
      feature.
      2. Mpack work - can you list what you think is needed? Isn't this
      just a jar and some changes to the UI?
      3. This actually might not be a negative, per se. Can you be specific
      about the issues you see in the older code? Personally, I found it
      much quicker to pick up than the Storm topology classes I've been
      working with lately (parser, enrichment, and indexing bolts with 5+
      levels of class hierarchies with enrichment bolts extending configured
      indexing bolts, 2+ types of undocumented initialization routines,
      BulkMessageWriters, BulkMessageComponents, BulkMessageHandlers,
      AbstractWriters, MessageWriters, and a few dozen configuration types
      used for reading/writing to Zookeeper and disc and maintaining an
      in-memory cache).
   2. microservices
      1. "We have experimented with a proof of concept and found it was too
      hard to add this feature into our existing REST services because of
      all the dependencies that must coexist in the same application." > Can
      you share the POC example and/or explain the hurdles along with
      specific dependency errors encountered? With Storm not providing
      classpath isolation and containerization, we have a number of existing
      shaded/relocated modules in our system. This may be a simple fix
      and/or an opportunity to improve our existing architecture rather than
      add more non-standard approaches to the mix.
      2. What aspects of this approach "will require the most effort?" What
      specifically makes this more work than the other strategies?
      3. Again, can you enumerate the MPack work? What is net-new or does
      not fit with the existing deployment strategy?
   3. pcap_query.sh
      1. "We know the pcap_query.sh script works and would require minimal
      changes" > this would be no different from metron-api, no?
      2. How would you manage configuration between the UI and pcap
      topology? Would this go in Ambari, management UI, global config - mpack
      work for this?

My take is that this belongs in the existing REST API, as you mention. I'm
not sure how I feel about calling the pcap_query.sh from Java - it seems a
bit hacky, like we're taking a shortcut to avoid fixing another problem
that will cause us to provide a solution that is inconsistent with the rest
of our REST app. It's like having a Phillips-head screwdriver that plugs
into a Phillips screw that adapts to a flat-head screwdriver end and plugs
into a flat-head screw. Just use a flat-head screwdriver, man :) One way to
possibly mitigate dependency issues that we've been having is to construct
a new module specifically for managing our external dependencies that have
been perennial problem children, like Guava and Jackson, and exclude them
everywhere else in the project. Either way, we should deprecate the older
metron-api that hosts the stand-alone PCAP REST service as part of this
effort. I don't think we should leave both of them there unless someone has
a good reason otherwise. Pending a better understanding of the dependency
issues encountered, I'm interested to hear what others think of calling the
shell script from REST vs leveraging the PCAP query code directly.

Other feature considerations

   1. Agreed that this should be made asynchronous via the UI. A polling or
   callback mechanism would be useful.
   2. Take care that a user doesn't hit refresh or POST multiple times and
   kick off 50 MapReduce jobs.
   3. Options for managing the YARN queue that is used
   4. Should we provide a "cancel" option that kills the MR job, or tell
   the user to go to the CLI to kill their job?
   5. Managing data if multiple users run queries.
   6. Job cleanup/TTL
   7. Date range limits on queries - PCAP data is massive by comparison to
   other sensors
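
To make items 1, 2, 4 and 6 above concrete, here is a rough in-memory sketch
in Python. It is not a proposal for the actual REST implementation: the class
name, the state strings, and the query fields are all hypothetical, and the
place where a real service would submit or kill a MapReduce job is only marked
with comments. It shows repeat submissions of the same query returning the
existing job id, a status call a UI could poll, a cancel call, and TTL-based
cleanup of finished jobs.

```python
import hashlib
import json
import time
import uuid


class PcapJobRegistry:
    """Hypothetical in-memory tracker for pcap query jobs."""

    RUNNING_STATES = ("SUBMITTED", "RUNNING")

    def __init__(self, ttl_seconds=3600, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._jobs = {}     # job_id -> {"key", "state", "finished_at"}
        self._active = {}   # (user, query fingerprint) -> most recent job_id

    @staticmethod
    def _fingerprint(query):
        # Key order must not matter, so serialize with sorted keys.
        payload = json.dumps(query, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def submit(self, user, query):
        key = (user, self._fingerprint(query))
        existing = self._jobs.get(self._active.get(key))
        if existing and existing["state"] in self.RUNNING_STATES:
            return self._active[key]  # refresh or double POST: reuse the job
        job_id = uuid.uuid4().hex
        # A real service would submit the MapReduce job here.
        self._jobs[job_id] = {"key": key, "state": "SUBMITTED", "finished_at": None}
        self._active[key] = job_id
        return job_id

    def status(self, job_id):
        record = self._jobs.get(job_id)
        return record["state"] if record else "UNKNOWN"

    def finish(self, job_id, state="SUCCEEDED"):
        record = self._jobs[job_id]
        record["state"] = state
        record["finished_at"] = self._clock()

    def cancel(self, job_id):
        if self.status(job_id) in self.RUNNING_STATES:
            # A real service would ask YARN to kill the application here.
            self.finish(job_id, state="KILLED")

    def purge_expired(self):
        """Drop finished jobs older than the TTL and return their ids."""
        now = self._clock()
        expired = [
            job_id for job_id, rec in self._jobs.items()
            if rec["finished_at"] is not None and now - rec["finished_at"] >= self._ttl
        ]
        for job_id in expired:
            key = self._jobs.pop(job_id)["key"]
            if self._active.get(key) == job_id:
                del self._active[key]
        return expired


registry = PcapJobRegistry(ttl_seconds=600)
query = {"ip_src_addr": "10.0.0.1", "start_time": 0, "end_time": 1000}
first = registry.submit("analyst1", query)
second = registry.submit("analyst1", query)   # e.g. the user hit refresh
print(first == second, registry.status(first))
```

Keying on the user as well as the query keeps two analysts' identical queries
separate, which also gives a place to hang per-user result management later.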

Cheers,
Mike

On Thu, May 3, 2018 at 12:35 PM, Ryan Merriman  wrote:

> We are planning on adding the pcap query feature to the Alerts UI.  Before
> we start this work, I think it is important to get community buy in on the
> architectural approach.  There are a couple different options.
>
> One option is to leverage the existing metron-api module that exposes pcap
> queries through a REST service.  The upsides are:
>
>- some work has already been done
>- it's part of our build so we know unit and integration tests pass
>
> The downsides are:
>
>- It hasn't been used in a while and will need some end to end testing
>to make sure it still functions properly
>- It is synchronous an

Re: [DISCUSS] Pcap panel architecture

2018-05-03 Thread Michael Miklavcic

Comments inline below.


On Thu, May 3, 2018 at 3:25 PM, Ryan Merriman  wrote:

> Otto,
>
> I'm assuming just adding it to the Alerts UI is less work but I wouldn't be
> strongly opposed to it being its own UI.  What are the reasons for doing
> that?
>

I don't know that we should split them up. It seems like a sub-section or
wizard or some such would be useful here. The use cases I've seen/heard
around PCAP often started with an infosec analyst doing a search on alerts
that followed with them going to query pcap data corresponding to the
threats they're investigating. Maybe we should emphasize streamlining this
experience?


> Mike,
>
> On using metron-api:
>
>1. I'm making an assumption about it not being used much.  Maybe it
>still works without issue.  I agree, we'll have to test anything we
> build
>so this is a minor issue.
>2. Updating metron-api to be asynchronous is a requirement in my opinion
>

Yes, I agree this is reasonable to do.


>3. The MPack work is the major drawback for me.  We're essentially
>creating a brand new Metron component.  There are a lot of examples we
> can
>draw from but it's going to be a large chunk of new MPack code to
> maintain
>and MPack development has been painful in the past.  I think it will
>include:
>   1. Creating a start script
>   2. Creating master.py and commands.py scripts for managing the
>   application lifecycle, service checks, etc
>   3. Creating an -env.xml file for exposing properties in Ambari
>   4. Adding the component to the various MPack files
>   (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
>

Awesome - this is exactly what I/we need to understand your vision for this
and other features, and weigh the pros and cons.


>4. Our Storm topologies are completely different use cases and much more
>complex so I don't understand the comparison.  But if you prefer this
>coding style then I think this is a minor issue as well.
>

I still don't understand what the specific code style is in Pcap that is
problematic. Even if I might disagree with you (haha, it could be like
arguing spaces vs tabs), you called it out specifically, and I want to
understand your position and reasons. I might agree with you, I might not,
but I do want to understand the point that's being made regardless. Is it
formatting? Class, interface, and package structure? Esoteric names?
Documentation? Tabs vs spaces, lol? We have the ability to change any of
these things, and they don't necessarily (probably shouldn't) be done
inside of this feature. The reason I pointed out the Storm topology pieces
is because they are indeed complex, and I think they can be greatly
simplified. The unified enrichment topology is one such example that came
through testing, but there are other improvements obtained simply by taking
a holistic view of what's there and introducing simplifying refactorings.
Same thing we can do here with Pcap, if it seems useful.


> On micro-services:
>
>1. Our REST service already includes a lot of dependencies and is
>difficult to manage in its current state.  I just went through this on
>https://github.com/apache/metron/pull/1008.  It was painful.  When we
>tried to include mapreduce and yarn dependencies it became what seemed
> like
>an endless NoSuchMethod, NoClassDef and similar errors.  Even if we can
> get
>it to work it's going to make managing our REST service that much harder
>than it already is.  I think the shaded jars are the source of all this
>trouble and I agree it would be nice to improve our architecture in this
>area.  However I don't think it's a simple fix and now we're getting
> into
>the "will likely take a long time to plan and implement" concern.  If
>anyone has ideas on how to solve our shaded jar challenge I would be all
>for it.
>

That's why I gave my recommendation about managing external dependencies as
a special module. I do see in the PR you cited that, surprise surprise,
Jackson is in the list. I know you're saying "take my word for it, I did a
POC and tried it. It didn't work. Throw that bathwater OUT." I'm just
asking that you at least share the pom files and any other specifics of the
issue so that the community can A) see what the issues and hurdles are, so
that we can also weigh the trade-offs that you seem to have already decided
on, B) help, and C) have a record that we can refer to the next 10 times we
go through the exact same thing. This is useful information for
everyone for more than just this feature. If we set precedent here by
punting without a strict and clear set of reasons, we'll literally do it
for every other feature that adds new dependencies going forward. I don't
think we should manage our architecture and features in this manner.

>2. All the MPack work listed above would also be required here.  A
>micro-services pattern is a significant shift and can't even give you

Re: [DISCUSS] Pcap panel architecture

2018-05-03 Thread Michael Miklavcic

Otto, what are you and your customers finding useful and/or difficult from
a split management/alerts UI perspective? It might help us to restate the
original scope and intent around maintaining separate management and alert
UIs, to your point about "contrary to previous direction." I personally
don't have a strong position on this other than 1) management is a
different feature set from drilling into threat intel, yet many apps still
have their management UI combined with the end user experience and 2) we
should probably consider pcap in the context of a workflow with alerts.

On Thu, May 3, 2018 at 4:19 PM, Otto Fowler  wrote:

> If that UI becomes the Alerts _and_ the PCAP Query UI, then it isn’t the
> alerts ui anymore.
>
> It is becoming more of a “composite” app, with multiple feature ui’s
> together.  I didn’t think that
> was what we were going for, thus the config ui and the alert ui.
>
> Just adding disparate things as ‘new tabs’ to a ui may be expedient but it
> seems contrary to
> our previous direction.
>
> There are a few things to consider if we are going to start moving
> everything into Alerts Ui aren’t there?
>
> It may be a better road to bring it in on its own like the alerts ui
> effort, so it can be released with ‘qualifiers’ and tested with
> the right expectations without affecting the Alerts UI.
>
>
>
> On May 3, 2018 at 17:25:54, Ryan Merriman (merrim...@gmail.com) wrote:
>
> Otto,
>
> I'm assuming just adding it to the Alerts UI is less work but I wouldn't be
> strongly opposed to it being it's own UI. What are the reasons for doing
> that?
>
> Mike,
>
> On using metron-api:
>
> 1. I'm making an assumption about it not being used much. Maybe it
> still works without issue. I agree, we'll have to test anything we build
> so this is a minor issue.
> 2. Updating metron-api to be asynchronous is a requirement in my opinion
> 3. The MPack work is the major drawback for me. We're essentially
> creating a brand new Metron component. There are a lot of examples we can
> draw from but it's going to be a large chunk of new MPack code to maintain
> and MPack development has been painful in the past. I think it will
> include:
> 1. Creating a start script
> 2. Creating master.py and commands.py scripts for managing the
> application lifecycle, service checks, etc
> 3. Creating an -env.xml file for exposing properties in Ambari
> 4. Adding the component to the various MPack files
> (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
> 4. Our Storm topologies are completely different use cases and much more
> complex so I don't understand the comparison. But if you prefer this
> coding style then I think this is a minor issue as well.
>
> On micro-services:
>
> 1. Our REST service already includes a lot of dependencies and is
> difficult to manage in it's current state. I just went through this on
> https://github.com/apache/metron/pull/1008. It was painful. When we
> tried to include mapreduce and yarn dependencies it became what seemed like
> an endless NoSuchMethod, NoClassDef and similar errors. Even if we can get
> it to work it's going to make managing our REST service that much harder
> than it already is. I think the shaded jars are the source of all this
> trouble and I agree it would be nice to improve our architecture in this
> area. However I don't think it's a simple fix and now we're getting into
> the "will likely take a long time to plan and implement" concern. If
> anyone has ideas on how to solve our shaded jar challenge I would be all
> for it.
> 2. All the MPack work listed above would also be required here. A
> micro-services pattern is a significant shift and can't even give you
> concrete examples of what exactly we would have to do. We would need to go
> through extensive design and planning to even get to that point.
> 3. It would be a brand new component. See above plus any new
> infrastructure we would need (web server/proxy, service discovery, etc)
>
> On pcap-query:
>
> 1. I don't recall any users or customers directly using metron-api but
> if you say so I believe you :)
> 2. As I understand it the pcap topology and pcap query are somewhat
> decoupled. Maybe location of pcap files would be shared? MPack work here
> is likely to include adding a couple properties and moving some around so
> they can be shared. Deciding between Ambari and global config would be
> similar to properties we add to any component.
>
> I think you may be underestimating how difficult it's going to be to solve
> our dependency problem. Or maybe it's me that is overestimating it :) It
> could be something we experiment with before we start on the pcap work.
> There is major upside and it would benefit the whole project. But until
> then we can't fit any more screwdrivers in the toolbox. For me the
> only reasonable options are to use the existing metron-api as its own
> separate service or call out to the pcap_query.sh script from our existing
> REST app. I could go either way rea

Re: [DISCUSS] Pcap panel architecture

2018-05-03 Thread Michael Miklavcic

Yes, completely agreed. We're on the same page.

On Thu, May 3, 2018 at 7:50 PM, Otto Fowler  wrote:

> I think my point is that maybe we should have a discuss about:
>
> * PCAP UI, goals etc
> * Where it would live and why, what that would mean etc
> * Backend ( this original mail )
>
>
>
> On May 3, 2018 at 18:34:00, Michael Miklavcic (michael.miklav...@gmail.com)
> wrote:
>
> Otto, what are you and your customers finding useful and/or difficult from
> a split management/alerts UI perspective? It might help us to restate the
> original scope and intent around maintaining separate management and alert
> UI's, to your point about "contrary to previous direction." I personally
> don't have a strong position on this other than 1) management is a
> different feature set from drilling into threat intel, yet many apps still
> have their management UI combined with the end user experience and 2) we
> should probably consider pcap in context of a workflow with alerts.
>
> On Thu, May 3, 2018 at 4:19 PM, Otto Fowler 
> wrote:
>
> > If that UI becomes the Alerts _and_ the PCAP Query UI, then it isn’t the
> > alerts ui anymore.
> >
> > It is becoming more of a “composite” app, with multiple feature ui’s
> > together. I didn’t think that
> > was what we were going for, thus the config ui and the alert ui.
> >
> > Just adding disparate thing as ‘new tabs’ to a ui may be expedient but
> it
> > seems contrary to
> > our previous direction.
> >
> > There are a few things to consider if we are going to start moving
> > everything into Alerts Ui aren’t there?
> >
> > It may be a better road to bring it in on it’s own like the alerts ui
> > effort, so it can be released with ‘qualifiers’ and tested with
> > the right expectations without effecting the Alerts UI.
> >
> >
> >
> > On May 3, 2018 at 17:25:54, Ryan Merriman (merrim...@gmail.com) wrote:
> >
> > Otto,
> >
> > I'm assuming just adding it to the Alerts UI is less work but I wouldn't
> be
> > strongly opposed to it being it's own UI. What are the reasons for doing
> > that?
> >
> > Mike,
> >
> > On using metron-api:
> >
> > 1. I'm making an assumption about it not being used much. Maybe it
> > still works without issue. I agree, we'll have to test anything we build
> > so this is a minor issue.
> > 2. Updating metron-api to be asynchronous is a requirement in my opinion
> > 3. The MPack work is the major drawback for me. We're essentially
> > creating a brand new Metron component. There are a lot of examples we
> can
> > draw from but it's going to be a large chunk of new MPack code to
> maintain
> > and MPack development has been painful in the past. I think it will
> > include:
> > 1. Creating a start script
> > 2. Creating master.py and commands.py scripts for managing the
> > application lifecycle, service checks, etc
> > 3. Creating an -env.xml file for exposing properties in Ambari
> > 4. Adding the component to the various MPack files
> > (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
> > 4. Our Storm topologies are completely different use cases and much more
> > complex so I don't understand the comparison. But if you prefer this
> > coding style then I think this is a minor issue as well.
> >
> > On micro-services:
> >
> > 1. Our REST service already includes a lot of dependencies and is
> > difficult to manage in it's current state. I just went through this on
> > https://github.com/apache/metron/pull/1008. It was painful. When we
> > tried to include mapreduce and yarn dependencies it became what seemed
> like
> > an endless NoSuchMethod, NoClassDef and similar errors. Even if we can
> get
> > it to work it's going to make managing our REST service that much harder
> > than it already is. I think the shaded jars are the source of all this
> > trouble and I agree it would be nice to improve our architecture in this
> > area. However I don't think it's a simple fix and now we're getting into
> > the "will likely take a long time to plan and implement" concern. If
> > anyone has ideas on how to solve our shaded jar challenge I would be all
> > for it.
> > 2. All the MPack work listed above would also be required here. A
> > micro-services pattern is a significant shift and can't even give you
> > concrete examples of what exactly we would have to do. We would need to
> go
> > through extensive design and planning to even get to that point.
> &

Re: [DISCUSS] Pcap panel architecture

2018-05-03 Thread Michael Miklavcic

Tabs vs spaces was a Silicon Valley joke, man :-)

On Thu, May 3, 2018, 8:42 PM Ryan Merriman  wrote:

> Mike,
>
> I never said there was anything problematic in metron-api, just that it was
> inconsistent with the rest of Metron.  There is work involved in making it
> consistent which is why I listed it as a downside.  I'm less concerned with
> whether we use tabs or spaces but that we use one or the other.
>
> I apologize for not making this clearer in my original message, but I did
> not lead the POC development.  My involvement was helping troubleshoot
> issues they ran into and answering questions about Metron in general.  I've
> shared with you the information that I have which is my observations about
> the types of issues they ran into.  I don't have a branch or pom file you
> can experiment with.  I will reach out to that person and see if they are
> able to share the exact errors they hit.  Also, the "trade-offs that you
> seem to have already decided on" is not based on a specific issue or
> challenge they faced in the POC.  It's based off of the past couple years
> of working on our REST module and the reoccurring challenges and patterns I
> see over a period of time.
>
> Otto,
>
> Makes sense to me.  I will start the other threads.
>
> On Thu, May 3, 2018 at 8:50 PM, Otto Fowler 
> wrote:
>
> > I think my point is that maybe we should have a discuss about:
> >
> > * PCAP UI, goals etc
> > * Where it would live and why, what that would mean etc
> > * Backend ( this original mail )
> >
> >
> >
> > On May 3, 2018 at 18:34:00, Michael Miklavcic (
> michael.miklav...@gmail.com)
> > wrote:
> >
> > Otto, what are you and your customers finding useful and/or difficult
> from
> > a split management/alerts UI perspective? It might help us to restate the
> > original scope and intent around maintaining separate management and
> alert
> > UI's, to your point about "contrary to previous direction." I personally
> > don't have a strong position on this other than 1) management is a
> > different feature set from drilling into threat intel, yet many apps
> still
> > have their management UI combined with the end user experience and 2) we
> > should probably consider pcap in context of a workflow with alerts.
> >
> > On Thu, May 3, 2018 at 4:19 PM, Otto Fowler 
> > wrote:
> >
> > > If that UI becomes the Alerts _and_ the PCAP Query UI, then it isn’t
> the
> > > alerts ui anymore.
> > >
> > > It is becoming more of a “composite” app, with multiple feature ui’s
> > > together. I didn’t think that
> > > was what we were going for, thus the config ui and the alert ui.
> > >
> > > Just adding disparate thing as ‘new tabs’ to a ui may be expedient but
> > it
> > > seems contrary to
> > > our previous direction.
> > >
> > > There are a few things to consider if we are going to start moving
> > > everything into Alerts Ui aren’t there?
> > >
> > > It may be a better road to bring it in on it’s own like the alerts ui
> > > effort, so it can be released with ‘qualifiers’ and tested with
> > > the right expectations without effecting the Alerts UI.
> > >
> > >
> > >
> > > On May 3, 2018 at 17:25:54, Ryan Merriman (merrim...@gmail.com) wrote:
> > >
> > > Otto,
> > >
> > > I'm assuming just adding it to the Alerts UI is less work but I
> wouldn't
> > be
> > > strongly opposed to it being it's own UI. What are the reasons for
> doing
> > > that?
> > >
> > > Mike,
> > >
> > > On using metron-api:
> > >
> > > 1. I'm making an assumption about it not being used much. Maybe it
> > > still works without issue. I agree, we'll have to test anything we
> build
> > > so this is a minor issue.
> > > 2. Updating metron-api to be asynchronous is a requirement in my
> opinion
> > > 3. The MPack work is the major drawback for me. We're essentially
> > > creating a brand new Metron component. There are a lot of examples we
> > can
> > > draw from but it's going to be a large chunk of new MPack code to
> > maintain
> > > and MPack development has been painful in the past. I think it will
> > > include:
> > > 1. Creating a start script
> > > 2. Creating master.py and commands.py scripts for managing the
> > > application lifecycle, service checks, etc
> > > 3. Creating an -env.xml file for exposi

Re: [DISCUSS] Pcap panel architecture

2018-05-07 Thread Michael Miklavcic

That sounds fine - I'd imagine we'd be looking to hit the classpath related
problems asap when merging the modules.

For the module, we just have a pom that supplies external dependencies.
Rather than every metron module depending on Guava or Jackson directly, or
via transitive dependencies, we specify the version ourselves and depend on
the new module. We can then shade and relocate as needed, which should help
eliminate problems. We did something similar with metron-elasticsearch to
get around conflicts between elasticsearch and storm.
https://github.com/apache/metron/blob/master/metron-platform/metron-elasticsearch/pom.xml#L261

Mike


On Mon, May 7, 2018 at 7:39 AM, Ryan Merriman  wrote:

> Otto, your use case makes sense to me.  We'll have to think about how to
> manage the user to job relationships.  I'm assuming YARN jobs will be
> submitted as the metron service user so YARN won't keep track of this for
> us.  Is that assumption correct?  Do you have any ideas for doing that?
>
> Mike, I can start a feature branch and experiment with merging metron-api
> into metron-rest.  That should allow us to collaborate on any issues or
> challenges.   Also, can you expand on your idea to manage external
> dependencies as a special module?  That seems like a very attractive option
> to me.
>
> On Fri, May 4, 2018 at 8:39 AM, Otto Fowler 
> wrote:
>
> > From my response on the other thread, but applicable to the backend
> stuff:
> >
> > "The PCAP Query seems more like PCAP Report to me.  You are generating a
> > report based on parameters.
> > That report is something that takes some time and external process to
> > generate… ie you have to wait for it.
> >
> > I can almost imagine a flow where you:
> >
> > * Are in the AlertUI
> > * Ask to generate a PCAP report based on some selected alerts/meta-alert,
> > possibly picking from one or more report ‘templates’
> > that have query options etc
> > * The report request is ‘queued’, that is, dispatched to be
> > executed/generated
> > * You as a user have a ‘queue’ of your report results, and when the
> report
> > is done it is queued there
> > * We ‘monitor’ the report/queue progress through the yarn rest ( report
> > info/meta has the yarn details )
> > * You can select the report from your queue and view it either in a new
> UI
> > or custom component
> > * You can then apply a different ‘view’ to the report or work with the
> > report data
> > * You can print / save etc
> > * You can associate the report with the alerts ( again in the report info
> > ) with…. a ‘case’ or ‘ticket’ or investigation something or other
> >
> >
> > We can introduce extensibility into the report templates, report views (
> > things that work with the json data of the report )
> >
> > Something like that.”
> >
> > Maybe we can do :
> >
> > template -> query parameters -> script => yarn info
> > yarn info + query info + alert context + yarn status => report info ->
> > stored in a user’s ‘report queue’
> > report persistence added to report info
> > metron-rest -> api to monitor the queue, read results ( page ), etc etc
> >
> >
> > On May 4, 2018 at 09:23:39, Ryan Merriman (merrim...@gmail.com) wrote:
> >
> > I started a separate thread on Pcap UI considerations and user
> > requirements
> > at Otto's request. This should help us keep these two related but
> separate
> > discussions focused.
> >
> > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul 
> > wrote:
> >
> > > Hello,
> > >
> > >
> > >
> > > (Youhouuu my first reply on this kind of mail chain^^)
> > >
> > >
> > >
> > > If I may, I would like to share my view on the following 3 points.
> > >
> > > - Backend:
> > >
> > > The current metron-api is totally separate. To me it would be logical
> > > to have it in the same place as the other REST APIs, especially since
> > > the work would then not need to be done twice as more security is
> > > added.
> > > The current implementation sends back a pcap object which still needs
> > > to be decoded. In OpenSOC, the decoding was done with tshark on the
> > > frontend. It would be good to have this decoding happen directly on
> > > the backend so as not to put load on the frontend. One option would be
> > > to install tshark on the REST server and use it to convert the pcap to
> > > XML and then to JSON that is sent to the frontend.
> > >
> > > I tried to start the map/reduce job directly from the REST server to
> > > search over all the pcap data and, as Ryan mentioned, we had trouble.
> > > I will try to find the error again.
> > >
> > > Then in the POC, we tried using the pcap_query script, and this works
> > > fine. I just modified it so that it returns the YARN job_id
> > > immediately instead of waiting for the job to finish. That lets the UI
> > > and the REST server learn the status of the search by querying the
> > > yarn rest api. This will allow the UI and the rest server to be async
> > > without any blocking phase. What do yo

Re: [DISCUSS] Pcap panel architecture

2018-05-07 Thread Michael Miklavcic

What order did you add the hadoop or yarn classpath? The "shaded" package
stands out to me in this name "org.apache.hadoop.hbase.*shaded*
.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider." Maybe try adding
those packages earlier on the classpath.

I think that find command needs a "jar tvf", otherwise you're looking for a
class name in jar file names.

Have you tried shading the rest jar?

I'd also look at the classpath you get when running "yarn jar" to start the
existing pcap service, per the instructions in metron-api/README.md.
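
As an unambiguous way to check which jars actually contain a class, the jar
entries can be listed directly, which is what "jar tvf" does. The following is
a minimal Python sketch, not something from the Metron code base:
`jars_containing` and `find_jars` are hypothetical helper names, and the only
assumption is that a jar is a zip archive holding `a/b/C.class` for class
`a.b.C`.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_paths):
    """Return the jars, in the order given, that contain the named class."""
    # A class a.b.C is stored in a jar as the zip entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in jar_paths:
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable or corrupt archives instead of aborting the scan
            continue
    return hits


def find_jars(root):
    """Collect every *.jar file under root, sorted for stable output."""
    return sorted(Path(root).rglob("*.jar"))
```

Called as `jars_containing("org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider", find_jars("/usr/hdp"))`,
it returns the matching jars, or an empty list if no jar under that directory
has the class. Because hits come back in input order, passing the jars in
classpath order also shows which one would be seen first.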


On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman  wrote:

> To explore the idea of merging metron-api into metron-rest and running pcap
> queries inside our REST application, I created a simple test here:
> https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.  A
> summary of what's included:
>
>- Added pcap as a dependency in the metron-rest pom.xml
>- Added a pcap query controller endpoint at
>http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
>- Added a pcap query service that runs a simple, hardcoded query
>
> Generate some pcap data using pycapa (
> https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and
> the
> pcap topology (
> https://github.com/apache/metron/tree/master/metron-
> platform/metron-pcap-backend#starting-the-topology).
> After this initial setup there should be data in HDFS at
> "/apps/metron/pcap".  I believe this should be enough to exercise the
> issue.  Just hit the endpoint referenced above.  I tested this in an
> already running full dev by building and deploying the metron-rest jar.  I
> did not rebuild full dev with this change but I would still expect it to
> work.  Let me know if it doesn't.
>
> The first error I see when I hit this endpoint is:
>
> java.lang.NoClassDefFoundError:
> org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
>
> Here are the things I've tried so far:
>
>- Run the REST application with the YARN jar command since this is how
>all our other YARN/MR-related applications are started (metron-api,
> MAAS,
>pcap query, etc).  I wouldn't expect this to work since we have runtime
>dependencies on our shaded elasticsearch and parser jars and I'm not
> aware
>of a way to add additional jars to the classpath with the YARN jar
> command
>(is there a way?).  Either way I get this error:
>
> 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> java.lang.NullPointerException
>
>
>- I tried adding `yarn classpath` and `hadoop classpath` to the
>classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start
> script).  I
>get this error:
>
> java.lang.ClassNotFoundException:
> org.apache.hadoop.hbase.shaded.org.codehaus.jackson.
> jaxrs.JacksonJaxbJsonProvider
>
>
>- I searched for the class in the previous attempt but could not find it
>in full dev:
>
> find / -name "*.jar" 2>/dev/null | xargs grep
> org/apache/hadoop/hbase/shaded/org/codehaus/jackson/
> jaxrs/JacksonJaxbJsonProvider
> 2>/dev/null
>
>
>- Further up in the stack trace I see the error happens when initiating
>the org.apache.hadoop.yarn.util.timeline.TimelineUtils class.  I tried
>setting "yarn.timeline-service.enabled" in Ambari to false and then I
> get
>this error:
>
> Unable to parse
> '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> URI, check the setting for mapreduce.application.framework.path
>
>
>- I've tried adding different hadoop, hbase, yarn and mapreduce Maven
>dependencies without any success
>   - hadoop-yarn-client
>   - hadoop-yarn-common
>   - hadoop-mapreduce-client-core
>   - hadoop-yarn-server-common
>   - hadoop-yarn-api
>   - hbase-server
>
> I will keep exploring other possible solutions.  Let me know if anyone has
> any ideas.
>
> On Mon, May 7, 2018 at 9:02 AM, Otto Fowler 
> wrote:
>
> > I can imagine a new generic service(s) capability whose job ( pun
> intended
> > ) is to
> > abstract the submittal, tracking, and storage of results to yarn.
> >
> > It would be extended with storage providers, queue provider, possibly
> some
> > set of policies or rather strategies.
> >
> > The pcap ‘report’ would be a client to that service, that specializes the
> > service operation for the way we want pcap to work.
> >
> > We can then re-use the generic service for other long running yarn
> > things…..
> >
> >
> > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwa...@gmail.com) wrote:
> >
> > RE: Tracking v. users
> >
> > The submittal and tracking can associate the submitter with the yarn job
> > and track that,
> > regardless of the yarn credentials.
> >
> > IE> if all submittals and monitoring are by the same yarn user ( Metron )
> > from a single or
> > co-operative set of services, that service can maintain the mapping.
> >
> >
> >
> > On May 7, 2018 at 09:39:52, Ryan

Re: [DISCUSS] Pcap panel architecture

2018-05-08 Thread Michael Miklavcic

@Ryan - pulled your branch and experimented with a few things. In doing so,
it dawned on me that by adding the yarn and hadoop classpath, you probably
didn't introduce a new classpath issue, rather you probably just moved onto
the next classpath issue, ie hbase per your exception about hbase jaxb.
Anyhow, I put up a branch with some pom changes worth trying in conjunction
with invoking the rest app startup via "/usr/bin/yarn jar"

https://github.com/mmiklavc/metron/tree/ryan-rest-test

https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9

Mike


On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> That would be a step closer to something more like a micro-service
> architecture. However, I would want to make sure we think about the
> operational complexity, and mpack implications of having another server
> installed and running somewhere on the cluster (also, ssl, kerberos, etc
> etc requirements for that service).
>
> On 8 May 2018 at 14:27, Ryan Merriman  wrote:
>
> > +1 to having metron-api as its own service and using a gateway type
> > pattern.
> >
> > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler 
> > wrote:
> >
> > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > pattern in rest?
> > >
> > >
> > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrim...@gmail.com) wrote:
> > >
> > > Moving the yarn classpath command earlier in the classpath now gives
> this
> > > error:
> > >
> > > Caused by: java.lang.NoSuchMethodError:
> > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > >
> > > I will experiment with other combinations, I suspect we will need
> > > finer-grain control over the order.
> > >
> > > The grep matches class names inside jar files. I use this all the time
> > and
> > > it's really useful.
> > >
> > > The metron-rest jar is already shaded.
> > >
> > > Reverse engineering the yarn jar command was the next thing I was going
> > to
> > > try. Will let you know how it goes.
> > >
> > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > michael.miklav...@gmail.com> wrote:
> > >
> > > > What order did you add the hadoop or yarn classpath? The "shaded"
> > > package
> > > > stands out to me in this name "org.apache.hadoop.hbase.*shaded*
> > > > .org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider." Maybe try
> adding
> > > > those packages earlier on the classpath.
> > > >
> > > > I think that find command needs a "jar tvf", otherwise you're looking
> > > for a
> > > > class name in jar file names.
> > > >
> > > > Have you tried shading the rest jar?
> > > >
> > > > I'd also look at the classpath you get when running "yarn jar" to
> start
> > > the
> > > > existing pcap service, per the instructions in metron-api/README.md.
> > > >
> > > >
> > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman 
> > > wrote:
> > > >
> > > > > To explore the idea of merging metron-api into metron-rest and
> > running
> > > > pcap
> > > > > queries inside our REST application, I created a simple test here:
> > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.
> A
> > > > > summary of what's included:
> > > > >
> > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > - Added a pcap query controller endpoint at
> > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/
> > > > queryUsingGET
> > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > >
> > > > > Generate some pcap data using pycapa (
> > > > > https://github.com/apache/metron/tree/master/metron-sensors/pycapa
> )
> > > and
> > > > > the
> > > > > pcap topology (
> > > > > https://github.com/apache/metron/tree/master/metron-
> > > > > platform/metron-pcap-backend#starting-the-topology).
> > > > > After this initial setup there should be data in HDFS at
> > > > > "/apps/metron/pcap". I believe this should be enough to exercise
> the
> > > > > issue. Just hit the endpoint referenced a

Re: [DISCUSS] Pcap panel architecture

2018-05-08 Thread Michael Miklavcic

Sweet! That's great news. The pom changes are a lot simpler than I
expected. Very nice.

On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman  wrote:

> Finally figured it out.  Commit is here:
> https://github.com/merrimanr/incubator-metron/commit/
> 22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
>
> It came down to figuring out the right combination of maven dependencies
> and passing in the HDP version to REST as a Java system property.  I also
> included some HDFS setup tasks.  I tested this in full dev and can now
> successfully run a pcap query and get results.  All you should have to do
> is generate some pcap data first.
>
> On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > @Ryan - pulled your branch and experimented with a few things. In doing
> so,
> > it dawned on me that by adding the yarn and hadoop classpath, you
> probably
> > didn't introduce a new classpath issue, rather you probably just moved
> onto
> > the next classpath issue, ie hbase per your exception about hbase jaxb.
> > Anyhow, I put up a branch with some pom changes worth trying in
> conjunction
> > with invoking the rest app startup via "/usr/bin/yarn jar"
> >
> > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> >
> > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65
> > b20ca1c0c9
> >
> > Mike
> >
> >
> > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > si...@simonellistonball.com> wrote:
> >
> > > That would be a step closer to something more like a micro-service
> > > architecture. However, I would want to make sure we think about the
> > > operational complexity, and mpack implications of having another server
> > > installed and running somewhere on the cluster (also, ssl, kerberos,
> etc
> > > etc requirements for that service).
> > >
> > > On 8 May 2018 at 14:27, Ryan Merriman  wrote:
> > >
> > > > +1 to having metron-api as its own service and using a gateway type
> > > > pattern.
> > > >
> > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler  >
> > > > wrote:
> > > >
> > > > > Why not have metron-api as its own service and use a ‘gateway’
> type
> > > > > pattern in rest?
> > > > >
> > > > >
> > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrim...@gmail.com)
> > wrote:
> > > > >
> > > > > Moving the yarn classpath command earlier in the classpath now
> gives
> > > this
> > > > > error:
> > > > >
> > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/
> > lang/String;
> > > > >
> > > > > I will experiment with other combinations, I suspect we will need
> > > > > finer-grain control over the order.
> > > > >
> > > > > The grep matches class names inside jar files. I use this all the
> > time
> > > > and
> > > > > it's really useful.
> > > > >
> > > > > The metron-rest jar is already shaded.
> > > > >
> > > > > Reverse engineering the yarn jar command was the next thing I was
> > going
> > > > to
> > > > > try. Will let you know how it goes.
> > > > >
> > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > michael.miklav...@gmail.com> wrote:
> > > > >
> > > > > > What order did you add the hadoop or yarn classpath? The "shaded"
> > > > > package
> > > > > > stands out to me in this name "org.apache.hadoop.hbase.*shaded*
> > > > > > .org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider." Maybe try
> > > adding
> > > > > > those packages earlier on the classpath.
> > > > > >
> > > > > > I think that find command needs a "jar tvf", otherwise you're
> > looking
> > > > > for a
> > > > > > class name in jar file names.
> > > > > >
> > > > > > Have you tried shading the rest jar?
> > > > > >
> > > > > > I'd also look at the classpath you get when running "yarn jar" to
> > > start
> > > > > the
> > > > > > existing pcap service, per the instructions in

Re: [DISCUSS] Release?

2018-05-09 Thread Michael Miklavcic

+1

On Wed, May 9, 2018 at 9:13 AM, Casey Stella  wrote:

> Is it about time for a release?  I know we got some substantial performance
> changes in since the last release.  I think we might have a justification
> for a release.
>
> Casey
>

Re: [DISCUSS] Release?

2018-05-09 Thread Michael Miklavcic

Is this what you mean Otto?
https://github.com/apache/metron/blob/24822dddc68c264f59723f5e17d423cd497f6807/dev-utilities/release-utils/validate-jira-for-release

On Wed, May 9, 2018 at 9:52 AM, Casey Stella  wrote:

> I wasn't aware we had a script for that..is that in
> dev-utilities/release-utils?
>
> On Wed, May 9, 2018 at 11:41 AM Otto Fowler 
> wrote:
>
> > Can you run the issues included script and post that for us to see?
> >
> >
> > On May 9, 2018 at 11:14:11, Casey Stella (ceste...@gmail.com) wrote:
> >
> > Is it about time for a release? I know we got some substantial
> performance
> > changes in since the last release. I think we might have a justification
> > for a release.
> >
> > Casey
> >
> >
>

Re: [DISCUSS] Release?

2018-05-09 Thread Michael Miklavcic

I get the following output (incidentally, I'm not sure if this is ok or
not, but I noticed that this script pulled every tag and branch for any and
all remotes I had defined in my local git repo)

~/devprojects/metron/dev-utilities/release-utils$
./validate-jira-for-release --version=0.5.0
--start=tags/apache-metron-0.4.2-releas
JIRA         STATUS       FIX VERSION  ASSIGNEE           FIX
METRON-1530  To Do                     Unassigned         https://issues.apache.org/jira/browse/METRON-1530
METRON-1545  To Do                     Ryan Merriman      https://issues.apache.org/jira/browse/METRON-1545
METRON-1543  To Do                     Nick Allen         https://issues.apache.org/jira/browse/METRON-1543
METRON-1539  To Do                     Unassigned         https://issues.apache.org/jira/browse/METRON-1539
METRON-1520  To Do                     Unassigned         https://issues.apache.org/jira/browse/METRON-1520
METRON-1529  To Do                     Nick Allen         https://issues.apache.org/jira/browse/METRON-1529
METRON-1511  In Progress               Nick Allen         https://issues.apache.org/jira/browse/METRON-1511
METRON-1528  Done                      Michael Miklavcic  https://issues.apache.org/jira/browse/METRON-1528
METRON-1445  Done                      Michael Miklavcic  https://issues.apache.org/jira/browse/METRON-1445
METRON-1502  To Do                     Justin Leet        https://issues.apache.org/jira/browse/METRON-1502
METRON-1527  Done                      Michael Miklavcic  https://issues.apache.org/jira/browse/METRON-1527
METRON-1499  Done         Next + 1     Nick Allen         https://issues.apache.org/jira/browse/METRON-1499
METRON-1515  To Do                     Unassigned         https://issues.apache.org/jira/browse/METRON-1515
METRON-1522  To Do                     Mohan              https://issues.apache.org/jira/browse/METRON-1522
METRON-1519  Done         Next + 1     Nick Allen         https://issues.apache.org/jira/browse/METRON-1519
METRON-1347  Done                      Unassigned         https://issues.apache.org/jira/browse/METRON-1347
METRON-1521  To Do                     Unassigned         https://issues.apache.org/jira/browse/METRON-1521
METRON-1516  Done                      Otto Fowler        https://issues.apache.org/jira/browse/METRON-1516
METRON-1494  Done         Next + 1     Nick Allen         https://issues.apache.org/jira/browse/METRON-1494
METRON-1510  Done                      Unassigned         https://issues.apache.org/jira/browse/METRON-1510
METRON-1518  Done         Next + 1     Nick Allen         https://issues.apache.org/jira/browse/METRON-1518
METRON-1465  Done         0.4.3        Unassigned         https://issues.apache.org/jira/browse/METRON-1465
METRON-1504  To Do                     Unassigned         https://issues.apache.org/jira/browse/METRON-1504
METRON-1505  Done         Next + 1     Nick Allen         https://issues.apache.org/jira/browse/METRON-1505
METRON-1449  Done         Next + 1     Nick Allen         https://issues.apache.org/jira/browse/METRON-1449
METRON-1462  Done                      Michael Miklavcic  https://issues.apache.org/jira/browse/METRON-1462
METRON-1501  To Do                     Unassigned         https://issues.apache.org/jira/browse/METRON-1501
METRON-1497  To Do                     Mohan              https://issues.apache.org/jira/browse/METRON-1497
METRON-1500  Done         Next + 1     Nick Allen         https://issues.apache.org/jira/browse/METRON-1500
METRON-1491  Done                      Casey Stella       https://issues.apache.org/jira/browse/METRON-1491
METRON-590   Done         Next + 1     Nick Allen         https://issues.apache.org/jira/browse/METRON-590
METRON-1483  To Do                     Unassigned         https://issues.apache.org/jira/browse/METRON-1483
METRON-1487  Done         Next + 1     Nick Allen         https://issues.apache.org/jira/browse/METRON-1487
METRON-1493  Done         Next + 1     Nick Allen         https://issues.apache.org/jira/browse/METRON-1493
METRON-1397  Done                      Otto Fowler        https://issues.apache.org/jira/browse/METRON-1397
METRON-1299  Done                      Otto Fowler        https://issues.apache.org/jira/browse/METRON-1299
METRON-1485  To Do                     Jon Zeolla         https://issues.apache.org/jira/browse/METRON-1485
METRON-1488  To Do                     Unassigned         https://issues.apache.org/jira/browse/METRON-1488
METRON-1490  To Do                     Unassigned         https://issues.apache.org/jira/browse/METRON
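
For cross-checking a list like this against the commit history, the JIRA keys
can be pulled out of commit subject lines. The following is a rough Python
sketch, assuming each subject starts with its METRON-NNNN key as the commits in
this project generally do. `jira_keys` is a hypothetical helper and is not how
validate-jira-for-release is implemented.

```python
import re

# Assumption: the JIRA key leads the commit subject, e.g. "METRON-1545 Upgrade ..."
LEADING_KEY = re.compile(r"\s*(METRON-\d+)\b")


def jira_keys(subjects):
    """Return the distinct leading METRON-NNNN keys, in first-seen order."""
    keys = []
    for subject in subjects:
        match = LEADING_KEY.match(subject)
        # Subjects without a leading key (merge commits, etc.) are ignored
        if match and match.group(1) not in keys:
            keys.append(match.group(1))
    return keys
```

Only the leading key is taken, so a subject that mentions a second ticket in
passing contributes just the ticket it closes.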

Re: [DISCUSS] Release?

2018-05-09 Thread Michael Miklavcic

ON-1429 SearchIntegrationTest refactor (merrimanr)
> > >> closes apache/metron#909
> > >> 3 months ago METRON-1426: SensorIndexingConfigController
> IntegrationTest
> > >> fails intermittently closes apache/metron#906
> > >> 4 months ago METRON-1417: Disable pcap-service by default in Monit
> > >> (mmiklavc via mmiklavc) closes apache/metron#905
> > >> 4 months ago METRON-1400: Elasticsearch service check fails in Ambari
> > >> (mmiklavc via mmiklavc) closes apache/metron#904
> > >> 4 months ago METRON-1428: Travis build failing from metron-config
> > >> (mmiklavc via mmiklavc) closes apache/metron#908
> > >> 4 months ago METRON-1302: Split up Indexing Topology into batch and
> > >> random access sections closes apache/incubator-metron#831
> > >> 4 months ago METRON-1395 Documentation missing for Produce a message
> to
> > a
> > >> Kafka topic Rest API endpoint (MohanDV via nickwallen) closes
> > >> apache/metron#897
> > >> 4 months ago METRON-1411 Fix sed command in Upgrading.md (justinleet)
> > >> closes apache/metron#900
> > >> 4 months ago METRON-1326: Metron deploy with Kerberos fails on Ambari
> > 2.5
> > >> during ES service stop (mmiklavc via mmiklavc) closes
> apache/metron#894
> > >> 4 months ago METRON-1380: Create a typosquatting use-case closes
> > >> apache/incubator-metron#882
> > >> 4 months ago METRON-1230: As a stopgap prior to METRON-777, add more
> > >> simplistic sideloading of custom Parsers closes
> > apache/incubator-metron#785
> > >> 4 months ago METRON-1378: Create a summarizer closes
> > >> apache/incubator-metron#879
> > >> 4 months ago METRON-1231 Separate Sensor name and topic in the
> > Management
> > >> UI (merrimanr) closes apache/metron#786
> > >> 4 months ago METRON-1382 Run Stellar in a Zeppelin Notebook
> (nickwallen)
> > >> closes apache/metron#884
> > >> 4 months ago METRON-1396 Fix .gitignore files to not ignore themselves
> > >> (justinleet) closes apache/metron#896
> > >> 4 months ago METRON-1366: Add an entropy stellar function (cstella via
> > >> mmiklavc) closes apache/metron#872
> > >> 4 months ago METRON-1390: Swagger UI for "Web Security Config"
> > Controller
> > >> needs request method (MohanDV via mmiklavc) closes apache/metron#889
> > >> 4 months ago METRON-1393: Fix bro Elasticsearch template (mmiklavc via
> > >> mmiklavc) closes apache/metron#893
> > >> 4 months ago METRON-1379: Add an OBJECT_GET stellar function closes
> > >> apache/incubator-metron#880
> > >> 4 months ago METRON-939: Upgrade ElasticSearch and Kibana (mmiklavc
> via
> > >> mmiklavc) closes apache/metron#840
> > >> 4 months ago METRON-1377: Stellar function to generate typosquatted
> > >> domains (similar to dnstwist) closes apache/incubator-metron#878
> > >> 4 months ago METRON-1385 Missing "properties" in index
> > template
> > >> causes ElasticsearchColumnMetadataDao.getColumnMetadata to fail
> > >> (merrimanr) closes apache/metron#886
> > >> 4 months ago METRON-1388 update public web site to point at 0.4.2 new
> > >> release (mattf-horton) closes apache/metron#887
> > >> 4 months ago METRON-1362 Improve Metron Deployment README (nickwallen)
> > >> closes apache/metron#869
> > >> 4 months ago METRON-1384 Increment master version number to 0.4.3 for
> > >> on-going development (mattf-horton via nickwallen) closes
> > apache/metron#885
> > >> 4 months ago METRON-1381 Add Apache license to MD files and remove the
> > >> Rat exclusion (justinleet) closes apache/metron#883
> > >> 4 months ago METRON-1071 Create CONTRIBUTING.md (justinleet) closes
> > >> apache/metron#881
> > >> 4 months ago METRON-1373 RAT failure for
> metron-interface/metron-alerts
> > >> (mattf-horton) closes apache/metron#875
> > >> 4 months ago METRON-1351 Create Installable Packages for Ubuntu Trusty
> > >> (nickwallen) closes apache/metron#868
> > >> 5 months ago METRON-1376 RC Check Script should have named parameters
> > >> (ottobackwards via nickwallen) closes apache/metron#877
> > >> 5 months ago METRON-1365: Allow PROFILE_GET to return a default value
> > for
> > >> a profile and entity that does not have a value written. closes
> > >> apache/incubator-metron#871
> > >> 5 months ago METRON-1348 Metron Service Checks Use Wrong Hostname
> > >> (nickwallen) closes apache/metron#864
> > >> 5 months ago METRON-1350: Add reservoir sampling functions to Stellar
> > >> closes apache/incubator-metron#867
> > >> 5 months ago METRON-1374 Script the RC checking process
> (ottobackwards)
> > >> closes apache/metron#876
> > >> 5 months ago METRON-1372 Validate JIRA for Releases (nickwallen)
> closes
> > >> apache/metron#874
> > >> 5 months ago METRON-1345: Update EC2 README for custom Ansible
> (mmiklavc
> > >> via mmiklavc) closes apache/metron#859
> > >> 5 months ago METRON-1349 Full Dev Builds Metron Twice (nickwallen)
> > closes
> > >> apache/metron#866
> > >> 5 months ago METRON-1343 Swagger UI for User Controller needs request
> > >> method (MohanDV via ottobackwards) closes apache/metron#862
> > >> 5 months ago METRON-1306: When index template install fails, we should
> > >> fail the install closes apache/incubator-metron#834
> > >> 5 months ago METRON-1341 Projection FieldTransformation
> > >> (simonellistonball via ottobackwards) closes apache/metron#861
> > >>
> > >>
> > >> On Wed, May 9, 2018 at 11:57 AM, Michael Miklavcic <
> > >> michael.miklav...@gmail.com> wrote:
> > >>
> > >>> Is this what you mean Otto?
> > >>> https://github.com/apache/metron/blob/24822dddc68c264f59723f
> > >>>
> > 5e17d423cd497f6807/dev-utilities/release-utils/validate-jira-for-release
> > >>>
> > >>> On Wed, May 9, 2018 at 9:52 AM, Casey Stella 
> > wrote:
> > >>>
> > >>> > I wasn't aware we had a script for that..is that in
> > >>> > dev-utilities/release-utils?
> > >>> >
> > >>> > On Wed, May 9, 2018 at 11:41 AM Otto Fowler <
> ottobackwa...@gmail.com
> > >
> > >>> > wrote:
> > >>> >
> > >>> > > Can you run the issues included script and post that for us to
> see?
> > >>> > >
> > >>> > >
> > >>> > > On May 9, 2018 at 11:14:11, Casey Stella (ceste...@gmail.com)
> > wrote:
> > >>> > >
> > >>> > > Is it about time for a release? I know we got some substantial
> > >>> > performance
> > >>> > > changes in since the last release. I think we might have a
> > >>> justification
> > >>> > > for a release.
> > >>> > >
> > >>> > > Casey
> > >>> > >
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
> --
>
> Jon
>

Re: [DISCUSS] Release?

2018-05-09 Thread Michael Miklavcic

I'm also a +1 on 0.5.0. This is a fairly big release.

On Wed, May 9, 2018 at 12:05 PM, Nick Allen  wrote:

> +1 to 0.5.0
>
> On Wed, May 9, 2018 at 1:36 PM, zeo...@gmail.com  wrote:
>
> > I agree that it's probably time (more likely, overdue) for a release.
> > Based off of looking at all of those changes I would also suggest going
> to
> > at least 0.5.x.
> >
> > It probably makes sense to take a look at Upgrading.md
> > <https://github.com/apache/metron/blob/master/Upgrading.md> (and related
> > docs) to make sure it's accurate as well.
> >
> > Jon
> >
> > On Wed, May 9, 2018 at 1:30 PM Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > Good call - I thought that made our last release, but this would be the
> > 2nd
> > > follow-on from when Nick originally posed a breakdown
> > > 4 months ago METRON-939: Upgrade ElasticSearch and Kibana (mmiklavc via
> > > mmiklavc) closes apache/metron#840
> > >
> > >
> > > https://lists.apache.org/thread.html/01fb18dd0ee10845588c0c1a4b3f2f
> > 36d7a107c66edd2247f61756c1@%3Cdev.metron.apache.org%3E
> > >
> > > On Wed, May 9, 2018 at 11:18 AM, zeo...@gmail.com 
> > > wrote:
> > >
> > > > We should also mention the Upgrade of ElasticSearch and Kibana
> > > >
> > > > Jon
> > > >
> > > > On Wed, May 9, 2018 at 12:49 PM Nick Allen 
> wrote:
> > > >
> > > > > Oh, and also the Solr work that is currently in a feature branch.
> We
> > > > would
> > > > > have to get the work finished up and merged though.  Sounds like we
> > are
> > > > > real close on that.
> > > > >
> > > > > On Wed, May 9, 2018 at 12:47 PM, Nick Allen 
> > > wrote:
> > > > >
> > > > > > The next part of the conversation would be, what should the
> version
> > > > > number
> > > > > > be?  To help with that, I have tried to summarize the changes in
> > the
> > > > > > release.  Of course, this is going to be heavily biased towards
> my
> > > own
> > > > > > interests, so please feel free to chime in if I have missed
> > anything.
> > > > > >
> > > > > >
> > > > > >- Significant performance improvements in Parsing,
> Enrichments,
> > > and
> > > > > >Indexing
> > > > > >
> > > > > >
> > > > > >- Support for Ubuntu installations
> > > > > >
> > > > > >
> > > > > >- Support for Storm 1.1 and HDP 2.6
> > > > > >
> > > > > >
> > > > > >- Event time processing in the Profiler
> > > > > >
> > > > > >
> > > > > >- Complex document support in the JSONMapParser
> > > > > >
> > > > > >
> > > > > >- Support for the Elasticsearch X-Pack
> > > > > >
> > > > > >
> > > > > >- Run Stellar in a Zeppelin Notebook
> > > > > >
> > > > > >
> > > > > >- Oodles of bug fixes and usability improvements
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, May 9, 2018 at 12:14 PM, Nick Allen 
> > > > wrote:
> > > > > >
> > > > > >> Something like this might be more digestible for these purposes.
> > > > > >>
> > > > > >> $git log --pretty="%cr %s" tags/apache-metron-0.4.2-
> release..HEAD
> > > > > >>
> > > > > >> 88 minutes ago METRON-1530 Default proxy config settings in
> > > > > >> metron-contrib need to be updated (sardell via merrimanr) closes
> > > > > >> apache/metron#998
> > > > > >> 5 days ago METRON-1545 Upgrade Spring and Spring Boot
> (merrimanr)
> > > > closes
> > > > > >> apache/metron#1008
> > > > > >> 7 days ago METRON-1543 Unable to Set Parser Output Topic in
> Sensor
> > > > > Config
> > > > > >> (nickwallen) closes apache/metron#1007
> > > > > >> 2 weeks ago METRON-1539: Specialized RENAME field transformer
> > closes
> > > >

Re: [DISCUSS] Release?

2018-05-09 Thread Michael Miklavcic

One item we haven't gotten around to was redoing the index names to use a
metron_ prefix. I'm the one that pushed the original DISCUSS thread on
this, but haven't had a chance to advance it. Does anyone have any strong
opinions on it? I originally thought it made sense to include alongside the
other major ES and Solr changes.

On Wed, May 9, 2018 at 12:13 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I'm also a +1 on 0.5.0. This is a fairly big release.
>
> On Wed, May 9, 2018 at 12:05 PM, Nick Allen  wrote:
>
>> +1 to 0.5.0
>>
>> On Wed, May 9, 2018 at 1:36 PM, zeo...@gmail.com 
>> wrote:
>>
>> > I agree that it's probably time (more likely, overdue) for a release.
>> > Based off of looking at all of those changes I would also suggest going
>> to
>> > at least 0.5.x.
>> >
>> > It probably makes sense to take a look at Upgrading.md
>> > <https://github.com/apache/metron/blob/master/Upgrading.md> (and
>> related
>> > docs) to make sure it's accurate as well.
>> >
>> > Jon
>> >
>> > On Wed, May 9, 2018 at 1:30 PM Michael Miklavcic <
>> > michael.miklav...@gmail.com> wrote:
>> >
>> > > Good call - I thought that made our last release, but this would be
>> the
>> > 2nd
>> > > follow-on from when Nick originally posed a breakdown
>> > > 4 months ago METRON-939: Upgrade ElasticSearch and Kibana (mmiklavc
>> via
>> > > mmiklavc) closes apache/metron#840
>> > >
>> > >
>> > > https://lists.apache.org/thread.html/01fb18dd0ee10845588c0c1a4b3f2f
>> > 36d7a107c66edd2247f61756c1@%3Cdev.metron.apache.org%3E
>> > >
>> > > On Wed, May 9, 2018 at 11:18 AM, zeo...@gmail.com 
>> > > wrote:
>> > >
>> > > > We should also mention the Upgrade of ElasticSearch and Kibana
>> > > >
>> > > > Jon
>> > > >
>> > > > On Wed, May 9, 2018 at 12:49 PM Nick Allen 
>> wrote:
>> > > >
>> > > > > Oh, and also the Solr work that is currently in a feature
>> branch.  We
>> > > > would
>> > > > > have to get the work finished up and merged though.  Sounds like
>> we
>> > are
>> > > > > real close on that.
>> > > > >
>> > > > > On Wed, May 9, 2018 at 12:47 PM, Nick Allen 
>> > > wrote:
>> > > > >
>> > > > > > The next part of the conversation would be, what should the
>> version
>> > > > > number
>> > > > > > be?  To help with that, I have tried to summarize the changes in
>> > the
>> > > > > > release.  Of course, this is going to be heavily biased towards
>> my
>> > > own
>> > > > > > interests, so please feel free to chime in if I have missed
>> > anything.
>> > > > > >
>> > > > > >
>> > > > > >- Significant performance improvements in Parsing,
>> Enrichments,
>> > > and
>> > > > > >Indexing
>> > > > > >
>> > > > > >
>> > > > > >- Support for Ubuntu installations
>> > > > > >
>> > > > > >
>> > > > > >- Support for Storm 1.1 and HDP 2.6
>> > > > > >
>> > > > > >
>> > > > > >- Event time processing in the Profiler
>> > > > > >
>> > > > > >
>> > > > > >- Complex document support in the JSONMapParser
>> > > > > >
>> > > > > >
>> > > > > >- Support for the Elasticsearch X-Pack
>> > > > > >
>> > > > > >
>> > > > > >- Run Stellar in a Zeppelin Notebook
>> > > > > >
>> > > > > >
>> > > > > >- Oodles of bug fixes and usability improvements
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Wed, May 9, 2018 at 12:14 PM, Nick Allen > >
>> > > > wrote:
>> > > > > >
>> > > > > >> Something like this might be more digestible for these
>> purposes.
>> > > > > >>
>> > > > > >

Re: [DISCUSS] Pcap UI user requirements

2018-05-09 Thread Michael Miklavcic

We are limited by Yarn and MapReduce applications in the case of
pause/resume - I could be wrong, but I don't think that's something that's
supported unless you're talking about multiple MR jobs strung together.

https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#application

I don't see anything suggesting "SUSPENDED" or "PAUSED" as we have
available in workflow engines like Oozie.

"The valid application state can be one of the following:  ALL, NEW,
NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED"

Same goes for MR job commands:
https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#job
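
To make the point concrete, here is a minimal Python sketch. It assumes the
state list quoted above is complete. The names YARN_APPLICATION_STATES,
PAUSE_LIKE_STATES, supports_pause and is_terminal are made up for
illustration and are not part of any Hadoop API.

```python
# States quoted above from the YARN "application" command documentation.
# "ALL" is a filter value rather than a real state; it is kept only because
# it appears in the quoted list.
YARN_APPLICATION_STATES = frozenset({
    "ALL", "NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED",
    "RUNNING", "FINISHED", "FAILED", "KILLED",
})

# Hypothetical state names that a pause/resume feature would need.
PAUSE_LIKE_STATES = frozenset({"SUSPENDED", "PAUSED"})


def supports_pause(states=YARN_APPLICATION_STATES):
    """Return True if any pause-like state appears in the given state set."""
    return bool(PAUSE_LIKE_STATES & set(states))


def is_terminal(state):
    """FINISHED, FAILED and KILLED are the states an application cannot leave."""
    return state in {"FINISHED", "FAILED", "KILLED"}
```

With only these states available, the most a UI can offer on top of a single
job is "cancel", which maps to KILLED.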

Mike

On Mon, May 7, 2018 at 2:04 PM, zeo...@gmail.com  wrote:

> From my perspective PCAP is primarily used as a follow-on to an alert or
> meta-alert - people very rarely use PCAP for initial hunting.  I know this
> has been brought up by Otto, Mike, and Ryan across the two related threads
> and I think it's all spot on.  Going from an alert or meta-alert to pulling
> PCAP would be by far the primary use case for this in every SOC I've ever
> worked in (not necessarily a representative sample).
>
> I also have some additional thoughts on the feature side after doing some
> brainstorming and talking to two of the SOCs I work most with:
>  - Limit the size of the PCAP, not just the date range, and maybe even have
> a configurable cluster-wide admin max for PCAP retrieval, set to 0/infinite
> by default.
>  - Set priority of PCAP queries.  Perhaps there's an automated
> pcap retrieval 'just in case', which should have a lower priority than an
> interactive request via the UI.
>  - Ability to pause/resume (not just cancel) jobs.
>  - Configurable cluster-wide admin max # of current PCAP queries, set to
> 0/infinite by default.
>  - Ability to pull PCAP live off the wire and stream it into a file.
>  - Ability to filter PCAP by providing a BPF filter to apply in server-side
> post-processing (less efficient, but very versatile).
>  - Request what PCAP data exists in the cluster (answering "how far back
> can I go?")
>  - This is obvious and is probably assumed, but queries based on any set of
> the network 5 tuple (IPs, Ports, Protocol) with at least 1 required.
>
> Jon
>
> On Fri, May 4, 2018 at 9:44 AM Otto Fowler 
> wrote:
>
> > That is the ‘views’ part.
> >
> > We can have options on the data output, if you have output full data,
> then
> > we can have different views and interactions for inspection and level of
> > detail.
> >
> >
> >
> > On May 4, 2018 at 09:37:13, Michel Sumbul (michelsum...@gmail.com)
> wrote:
> >
> > It can be like a report, but also a way to investigate cases where the user
> > wants to see the whole packet (all the bits and bytes). Like in Wireshark,
> > something interactive, no?
> >
> > 2018-05-04 14:33 GMT+01:00 Otto Fowler :
> >
> > > The PCAP Query seems more like PCAP Report to me. You are generating a
> > > report based on parameters.
> > > That report is something that takes some time and external process to
> > > generate… ie you have to wait for it.
> > >
> > > I can almost imagine a flow where you:
> > >
> > > * Are in the AlertUI
> > > * Ask to generate a PCAP report based on some selected
> alerts/meta-alert,
> > > possibly picking from one or more report ‘templates’
> > > that have query options etc
> > > * The report request is ‘queued’, that is dispatched to be
> > > executed/generated
> > > * You as a user have a ‘queue’ of your report results, and when the
> > report
> > > is done it is queued there
> > > * We ‘monitor’ the report/queue progress through the yarn rest ( report
> > > info/meta has the yarn details )
> > > * You can select the report from your queue and view it either in a new
> > UI
> > > or custom component
> > > * You can then apply a different ‘view’ to the report or work with the
> > > report data
> > > * You can print / save etc
> > > * You can associate the report with the alerts ( again in the report
> info
> > )
> > > with…. a ‘case’ or ‘ticket’ or investigation something or other
> > >
> > >
> > > We can introduce extensibility into the report templates, report views
> (
> > > things that work with the json data of the report )
> > >
> > > Something like that.
> > >
> > >
> > > On May 4, 2018 at 09:19:15, Ryan Merriman (merrim...@gmail.com) wrote:
> > >
> > > Continuing a discussion that started in a discuss thread about exposing
> > > Pcap query capabilities in the back end. How should we expose this
> > feature
> > > to users? Should it be integrated into the Alerts UI or be separate
> > > standalone UI?
> > >
> > > To summarize the general points made in the other thread:
> > >
> > > - Adding this capability to the Alerts UI will make it more of a
> > > composite app. Is that really what we want since we have separate UIs
> for
> > > Alerts and management?
> > > - Would it be better to bring it in on its own so it can be released
> > > with qualifiers a

Re: [DISCUSS] Pcap panel architecture

2018-05-09 Thread Michael Miklavcic

This looks like a pretty good start Ryan. Does the metadata endpoint cover
this https://github.com/apache/metron/tree/master/
metron-platform/metron-api#the-pcapgettergetpcapsbyidentifiers-endpoint
from the original metron-api? If so, then we would be able to deprecate the
existing metron-api project. If we later go to micro-services, a pcap
module would spin back into the fold, but it would probably look different
from metron-api.

I commented on the UI thread, but to reiterate for the purposes of the backend
functionality here: I don't believe there is a way to "PAUSE" or "SUSPEND"
jobs. That said, I think GET /api/v1/pcap/stop/ is sufficient for
the job management operations.
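
As a rough illustration of that job lifecycle, here is a small in-memory
Python sketch. It is not the Metron REST implementation; the class PcapJobs,
its method names and the "job-N" id format are all hypothetical. It only
models submit, status and stop, with stop being a no-op on a job that has
already completed, as described in the proposal quoted below.

```python
import itertools


class PcapJobs:
    """In-memory stand-in for the proposed job endpoints (hypothetical)."""

    # States a job cannot leave once it has reached them.
    TERMINAL = {"FINISHED", "FAILED", "KILLED"}

    def __init__(self):
        self._ids = itertools.count(1)
        self._status = {}

    def submit(self, query):
        # Stands in for POST /api/v1/pcap/query: returns a job id.
        job_id = "job-%d" % next(self._ids)
        self._status[job_id] = "RUNNING"
        return job_id

    def status(self, job_id):
        # Stands in for the status endpoint.
        return self._status[job_id]

    def finish(self, job_id):
        # Helper for the sketch only: simulate the job completing by itself.
        self._status[job_id] = "FINISHED"

    def stop(self, job_id):
        # Kill a running job; do nothing if it is already in a terminal state.
        if self._status[job_id] not in self.TERMINAL:
            self._status[job_id] = "KILLED"
        return self._status[job_id]
```

There is deliberately no pause or resume method, since nothing in the YARN
state list would back one.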

On Wed, May 9, 2018 at 11:00 AM, Ryan Merriman  wrote:

> Now that we are confident we can submit an MR job from our current REST
> application, is this the desired approach?  Just want to confirm.
>
> Next I think we should map out what the REST interface will look like.
> Here are the endpoints I'm thinking about:
>
> GET /api/v1/pcap/metadata?basePath
>
> This endpoint will return metadata of pcap data stored in HDFS.  This would
> include pcap size, date ranges (how far back can I go), etc.  It would
> accept an optional HDFS basePath parameter for cases where pcap data is
> stored in multiple places and/or different from the default location.
>
> POST /api/v1/pcap/query
>
> This endpoint would accept a pcap request, submit a pcap query job, and
> return a job id.  The request would be an object containing the parameters
> documented here:  https://github.com/apache/metron/tree/master/
> metron-platform/metron-pcap-backend#query-filter-utility.  A query/job
> would be associated with a user that submits it.  An exception will be
> returned for violating constraints like too many queries submitted, query
> parameters out of limits, etc.
>
> GET /api/v1/pcap/status/
>
> This endpoint will return the status of a running job.  I imagine this is
> just a proxy to the YARN REST api.  We can discuss the implementation
> behind these endpoints later.
>
> GET /api/v1/pcap/stop/
>
> This endpoint would kill a running pcap job.  If the job has already
> completed this is a noop.
>
> GET /api/v1/pcap/list
>
> This endpoint will list a user's submitted pcap queries.  Items in the list
> would contain job id, status (is it finished?), start/end time, and number
> of pages.  Maybe there is some overlap with the status endpoint above and
> the status endpoint is not needed?
>
> GET /api/v1/pcap/pdml//
>
> This endpoint will return pcap results for the given page in pdml format (
> https://wiki.wireshark.org/PDML).  Are there other formats we want to
> support?
>
> GET /api/v1/pcap/raw//
>
> This endpoint will allow a user to download raw pcap results for the given
> page.
>
> DELETE /api/v1/pcap/
>
> This endpoint will delete pcap query results.  Not sure yet how this fits
> in with our broader cleanup strategy.
>
> This should get us started.  What did I miss and what would you change
> about these?  I did not include much detail related to security, cleanup
> strategy, or underlying implementation details but these are items we
> should discuss at some point.
>
> On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Sweet! That's great news. The pom changes are a lot simpler than I
> > expected. Very nice.
> >
> > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman 
> wrote:
> >
> > > Finally figured it out.  Commit is here:
> > > https://github.com/merrimanr/incubator-metron/commit/
> > > 22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > >
> > > It came down to figuring out the right combination of maven
> dependencies
> > > and passing in the HDP version to REST as a Java system property.  I
> also
> > > included some HDFS setup tasks.  I tested this in full dev and can now
> > > successfully run a pcap query and get results.  All you should have to
> do
> > > is generate some pcap data first.
> > >
> > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > michael.miklav...@gmail.com> wrote:
> > >
> > > > @Ryan - pulled your branch and experimented with a few things. In
> doing
> > > so,
> > > > it dawned on me that by adding the yarn and hadoop classpath, you
> > > probably
> > > > didn't introduce a new classpath issue, rather you probably just
> moved
> > > onto
> > > > the next classpath issue, ie hbase per your exception about hbase
> jaxb.
> > > > Anyhow, I put up a branch with some pom changes worth

Re: [DISCUSS] Release?

2018-05-09 Thread Michael Miklavcic

I don't have a strong opinion. The ES upgrade alone is a massive feature.
It could make it easier to include the index change I mentioned along with
Solr as a follow-up. I think if we did split, we could arguably start on
the next release with Solr almost immediately.

On Wed, May 9, 2018, 12:40 PM Nick Allen  wrote:

> Simon brought up the idea of including the Solr enhancements (currently in
> a feature branch) for the release.  What are people's opinions on this?  Is
> this something that is a blocker for the release?
>
> IMO, there is so much already in master waiting to be released that I don't
> see a need to include it in the next release.  I'd be happy to see that
> Solr work drive a follow-on release.
>
>
>
>
> On Wed, May 9, 2018 at 2:16 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > One item we haven't gotten around to was redoing the index names to use a
> > metron_ prefix. I'm the one that pushed the original DISCUSS thread on
> > this, but haven't had a chance to advance it. Does anyone have any strong
> > opinions on it? I originally thought it made sense to include alongside
> the
> > other major ES and Solr changes.
> >
> > On Wed, May 9, 2018 at 12:13 PM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > I'm also a +1 on 0.5.0. This is a fairly big release.
> > >
> > > On Wed, May 9, 2018 at 12:05 PM, Nick Allen 
> wrote:
> > >
> > >> +1 to 0.5.0
> > >>
> > >> On Wed, May 9, 2018 at 1:36 PM, zeo...@gmail.com 
> > >> wrote:
> > >>
> > >> > I agree that it's probably time (more likely, overdue) for a
> release.
> > >> > Based off of looking at all of those changes I would also suggest
> > going
> > >> to
> > >> > at least 0.5.x.
> > >> >
> > >> > It probably makes sense to take a look at Upgrading.md
> > >> > <https://github.com/apache/metron/blob/master/Upgrading.md> (and
> > >> related
> > >> > docs) to make sure it's accurate as well.
> > >> >
> > >> > Jon
> > >> >
> > >> > On Wed, May 9, 2018 at 1:30 PM Michael Miklavcic <
> > >> > michael.miklav...@gmail.com> wrote:
> > >> >
> > >> > > Good call - I thought that made our last release, but this would
> be
> > >> the
> > >> > 2nd
> > >> > > follow-on from when Nick originally posed a breakdown
> > >> > > 4 months ago METRON-939: Upgrade ElasticSearch and Kibana
> (mmiklavc
> > >> via
> > >> > > mmiklavc) closes apache/metron#840
> > >> > >
> > >> > >
> > >> > >
> https://lists.apache.org/thread.html/01fb18dd0ee10845588c0c1a4b3f2f
> > >> > 36d7a107c66edd2247f61756c1@%3Cdev.metron.apache.org%3E
> > >> > >
> > >> > > On Wed, May 9, 2018 at 11:18 AM, zeo...@gmail.com <
> zeo...@gmail.com
> > >
> > >> > > wrote:
> > >> > >
> > >> > > > We should also mention the Upgrade of ElasticSearch and Kibana
> > >> > > >
> > >> > > > Jon
> > >> > > >
> > >> > > > On Wed, May 9, 2018 at 12:49 PM Nick Allen 
> > >> wrote:
> > >> > > >
> > >> > > > > Oh, and also the Solr work that is currently in a feature
> > >> branch.  We
> > >> > > > would
> > >> > > > > have to get the work finished up and merged though.  Sounds
> like
> > >> we
> > >> > are
> > >> > > > > real close on that.
> > >> > > > >
> > >> > > > > On Wed, May 9, 2018 at 12:47 PM, Nick Allen <
> n...@nickallen.org
> > >
> > >> > > wrote:
> > >> > > > >
> > >> > > > > > The next part of the conversation would be, what should the
> > >> version
> > >> > > > > number
> > >> > > > > > be?  To help with that, I have tried to summarize the
> changes
> > in
> > >> > the
> > >> > > > > > release.  Of course, this is going to be heavily biased
> > towards
> > >> my
> > >> > > own
> > >> > > > > > interests, so please feel fr

Re: [DISCUSS] Release?

2018-05-10 Thread Michael Miklavcic

If we're going to put Solr in the next release I think the index name
change can wait for that release as well.

On Thu, May 10, 2018 at 7:09 AM, Nick Allen  wrote:

> > I tend to like grouping the es changes into one release (i.e. include
> the index
> name change) and solr into another (next release).
>
> Is anyone willing to volunteer to do the work for the index name change?
>
> If there are no takers, I think we need to move on and cut a release.
>
>
>
>
> On Thu, May 10, 2018 at 8:43 AM, zeo...@gmail.com 
> wrote:
>
> > I tend to like grouping the es changes into one release (i.e. include the
> > index name change) and solr into another (next release).
> >
> > I think we go too long between releases myself and wouldn't be against
> > doing two releases just a couple of months apart.
> >
> > Jon
> >
> > On Wed, May 9, 2018, 14:47 Michael Miklavcic <
> michael.miklav...@gmail.com>
> > wrote:
> >
> > > I don't have a strong opinion. The ES upgrade alone is a massive
> feature.
> > > It could make it easier to include the index change I mentioned along
> > with
> > > Solr as a follow-up. I think if we did split, we could arguably start
> on
> > > the next release with Solr almost immediately.
> > >
> > > On Wed, May 9, 2018, 12:40 PM Nick Allen  wrote:
> > >
> > > > Simon brought up the idea of including the Solr enhancements
> (currently
> > > in
> > > > a feature branch) for the release.  What are people's opinions on
> this?
> > > Is
> > > > this something that is a blocker for the release?
> > > >
> > > > IMO, there is so much already in master waiting to be released that I
> > > don't
> > > > see a need to include it in the next release.  I'd be happy to see
> that
> > > > Solr work drive a follow-on release.
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, May 9, 2018 at 2:16 PM, Michael Miklavcic <
> > > > michael.miklav...@gmail.com> wrote:
> > > >
> > > > > One item we haven't gotten around to was redoing the index names to
> > > use a
> > > > > metron_ prefix. I'm the one that pushed the original DISCUSS thread
> > on
> > > > > this, but haven't had a chance to advance it. Does anyone have any
> > > strong
> > > > > opinions on it? I originally thought it made sense to include
> > alongside
> > > > the
> > > > > other major ES and Solr changes.
> > > > >
> > > > > On Wed, May 9, 2018 at 12:13 PM, Michael Miklavcic <
> > > > > michael.miklav...@gmail.com> wrote:
> > > > >
> > > > > > I'm also a +1 on 0.5.0. This is a fairly big release.
> > > > > >
> > > > > > On Wed, May 9, 2018 at 12:05 PM, Nick Allen 
> > > > wrote:
> > > > > >
> > > > > >> +1 to 0.5.0
> > > > > >>
> > > > > >> On Wed, May 9, 2018 at 1:36 PM, zeo...@gmail.com <
> > zeo...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > I agree that it's probably time (more likely, overdue) for a
> > > > release.
> > > > > >> > Based off of looking at all of those changes I would also
> > suggest
> > > > > going
> > > > > >> to
> > > > > >> > at least 0.5.x.
> > > > > >> >
> > > > > >> > It probably makes sense to take a look at Upgrading.md
> > > > > >> > <https://github.com/apache/metron/blob/master/Upgrading.md>
> > (and
> > > > > >> related
> > > > > >> > docs) to make sure it's accurate as well.
> > > > > >> >
> > > > > >> > Jon
> > > > > >> >
> > > > > >> > On Wed, May 9, 2018 at 1:30 PM Michael Miklavcic <
> > > > > >> > michael.miklav...@gmail.com> wrote:
> > > > > >> >
> > > > > >> > > Good call - I thought that made our last release, but this
> > would
> > > > be
> > > > > >> the
> > > > > >> > 2nd
> > > > > >> > > follow-on from when Nick originally posed a brea

Re: [DISCUSS] Release Manager

2018-05-10 Thread Michael Miklavcic

Thanks Matt for doing this for the community.

Justin Leet as new lord commander of the Night's Watch? Aye, dilly, dilly.

On Thu, May 10, 2018 at 9:07 AM, Justin Leet  wrote:

> I'd be happy to volunteer to take over for a while.
>
> Thanks to Matt for all the help through the last couple releases!
>
> Justin
>
> On Thu, May 10, 2018 at 11:06 AM, Casey Stella  wrote:
>
> > Hi All,
> >
> > Matt Foley, our esteemed Release manager for the last couple releases,
> has
> > asked to be relieved.  So, I'm calling on volunteers for the next release
> > manager.  It should be a committer and there are a few things that
> require
> > a PMC member, I believe, but the release manager can ask for help from a
> > PMC member.
> >
> > So, Matt's watch has ended, who wants to volunteer?
> >
> > Casey
> >
>

Re: [DISCUSS] Release?

2018-05-10 Thread Michael Miklavcic

Nick, I just created a Jira for this and included a permalink to the
original DISCUSS thread.

https://issues.apache.org/jira/browse/METRON-1550


On Wed, May 9, 2018 at 12:36 PM, Nick Allen  wrote:

> IMO, It would be nice to have, but I don't consider it a blocker for the
> release.  Of course, if its something that we can knock out soon (this
> week?), then there would be no reason not to include it.
>
> Did you create a JIRA for this one so we can track it?
>
> On Wed, May 9, 2018 at 2:16 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > One item we haven't gotten around to was redoing the index names to use a
> > metron_ prefix. I'm the one that pushed the original DISCUSS thread on
> > this, but haven't had a chance to advance it. Does anyone have any strong
> > opinions on it? I originally thought it made sense to include alongside
> the
> > other major ES and Solr changes.
> >
> > On Wed, May 9, 2018 at 12:13 PM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > I'm also a +1 on 0.5.0. This is a fairly big release.
> > >
> > > On Wed, May 9, 2018 at 12:05 PM, Nick Allen 
> wrote:
> > >
> > >> +1 to 0.5.0
> > >>
> > >> On Wed, May 9, 2018 at 1:36 PM, zeo...@gmail.com 
> > >> wrote:
> > >>
> > >> > I agree that it's probably time (more likely, overdue) for a
> release.
> > >> > Based off of looking at all of those changes I would also suggest
> > going
> > >> to
> > >> > at least 0.5.x.
> > >> >
> > >> > It probably makes sense to take a look at Upgrading.md
> > >> > <https://github.com/apache/metron/blob/master/Upgrading.md> (and
> > >> related
> > >> > docs) to make sure it's accurate as well.
> > >> >
> > >> > Jon
> > >> >
> > >> > On Wed, May 9, 2018 at 1:30 PM Michael Miklavcic <
> > >> > michael.miklav...@gmail.com> wrote:
> > >> >
> > >> > > Good call - I thought that made our last release, but this would
> be
> > >> the
> > >> > 2nd
> > >> > > follow-on from when Nick originally posed a breakdown
> > >> > > 4 months ago METRON-939: Upgrade ElasticSearch and Kibana
> (mmiklavc
> > >> via
> > >> > > mmiklavc) closes apache/metron#840
> > >> > >
> > >> > >
> > >> > > https://lists.apache.org/thread.html/
> 01fb18dd0ee10845588c0c1a4b3f2f
> > >> > 36d7a107c66edd2247f61756c1@%3Cdev.metron.apache.org%3E
> > >> > >
> > >> > > On Wed, May 9, 2018 at 11:18 AM, zeo...@gmail.com <
> zeo...@gmail.com
> > >
> > >> > > wrote:
> > >> > >
> > >> > > > We should also mention the Upgrade of ElasticSearch and Kibana
> > >> > > >
> > >> > > > Jon
> > >> > > >
> > >> > > > On Wed, May 9, 2018 at 12:49 PM Nick Allen 
> > >> wrote:
> > >> > > >
> > >> > > > > Oh, and also the Solr work that is currently in a feature
> > >> branch.  We
> > >> > > > would
> > >> > > > > have to get the work finished up and merged though.  Sounds
> like
> > >> we
> > >> > are
> > >> > > > > real close on that.
> > >> > > > >
> > >> > > > > On Wed, May 9, 2018 at 12:47 PM, Nick Allen <
> n...@nickallen.org
> > >
> > >> > > wrote:
> > >> > > > >
> > >> > > > > > The next part of the conversation would be, what should the
> > >> version
> > >> > > > > number
> > >> > > > > > be?  To help with that, I have tried to summarize the
> changes
> > in
> > >> > the
> > >> > > > > > release.  Of course, this is going to be heavily biased
> > towards
> > >> my
> > >> > > own
> > >> > > > > > interests, so please feel free to chime in if I have missed
> > >> > anything.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >- Significant performance improvements in Parsing,
> > >> Enrichments,

Re: [DISCUSS] Metron release 0.5.0

2018-05-16 Thread Michael Miklavcic

Agreed Nick - the ES upgrade was pretty extensive on its own.

On Wed, May 16, 2018 at 5:24 AM, Nick Allen  wrote:

> Going to 0.5.0 is well justified without Solr IMO.
>
> On Wed, May 16, 2018, 7:01 AM Otto Fowler  wrote:
>
> > My question is:  Is updating the version a .4->.5 worthy change or would
> > adding Solr be that change?
> > Should we do another, last .4.x release and bump to .5 when solr hits?
> >
> >
> > On May 15, 2018 at 17:31:27, Nick Allen (n...@nickallen.org) wrote:
> >
> > +1 That plan works for me.
> >
> > IMHO, I don't think there are any open PRs or JIRAs that we need to block
> > the release on.
> >
> > I'd be open to cutting a release sooner too, but waiting until Tuesday
> > also
> > works.
> >
> > On Tue, May 15, 2018 at 4:39 PM, Justin Leet 
> > wrote:
> >
> > > I have a minor adjustment to the proposed timeframe. I'd like to move
> > the
> > > tentative date for starting the RC process to Tuesday the 22nd. I have
> a
> > > prior engagement on Monday that will consume the majority of the day. I
> > > was just too excited about releasing. Sorry about the trouble.
> > >
> > > Justin
> > >
> > > On Tue, May 15, 2018 at 4:26 PM, Justin Leet 
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Based on a thread from last week
> > > >  > > 7c9ef9b3d104eab9df6573d9a4@%3Cdev.metron.apache.org%3E>,
> > > > I'll be taking over as the release manager for our next release.
> > Thanks
> > > > again to Matt for his work shepherding our previous releases!
> > > >
> > > > Version Number
> > > > This thread
> > > >  > > b9ec7fe255260ab955b5cdd695@%3Cdev.metron.apache.org%3E> had
> > > > a lot of support for the release bumping to 0.5.0 so I'd like to work
> > off
> > > > that, but we can adjust as needed.
> > > >
> > > > Proposed Timeframe
> > > > I would tentatively like to start work on the RC this coming Monday
> > 21st.
> > > > This can be shifted as needed, either to make time for more PRs to
> > make
> > > it
> > > > in or if we'd like to get things moving faster.
> > > >
> > > > I'm proposing we create this release from the Metron master branch,
> > plus
> > > > any commits the community considers necessary for the release and can
> > get
> > > > in the very near future.
> > > >
> > > > JIRA status
> > > > There are 24 open PRs at https://github.com/apache/metron/pulls. If
> > we
> > > > consider any of these necessary for this release, we should work on
> > > getting
> > > > them closed.
> > > >
> > > > There have been 123 commits since the current release (see the end of
> > the
> > > > message). I'm sure there'll be updates in Jira required to get
> > > everything
> > > > up to date. I will follow up on that, and I'd ask that we work on
> > > > preemptively cleaning our Jira up.
> > > >
> > > > Please respond with any specific PRs we'd like to have in for 0.5.0,
> > > along
> > > > with any Jira issues we feel are worth working on. In particular, in
> > the
> > > > previous discussion, https://issues.apache.org/
> jira/browse/METRON-1550
> > > came
> > > > up as something to potentially pull in, but I didn't see a consensus
> > > form.
> > > >
> > > > Completed PRs as of May 15 as generated by Nick's nifty one liner
> > > > git log --pretty="%cr %s" tags/apache-metron-0.4.2-release..HEAD
> > > >
> > > > Thanks,
> > > > Justin
> > > >
> > > > 3 hours ago METRON-1552: Add gzip file validation check to the geo
> > loader
> > > > (mmiklavc via mmiklavc) closes apache/metron#1011
> > > > 23 hours ago METRON-1551 Profiler Should Not Use Java Serialization
> > > > (nickwallen) closes apache/metron#1012
> > > > 4 days ago METRON-1549: Add empty object test to
> > > WriterBoltIntegrationTest
> > > > implementation (mmiklavc via mmiklavc) closes apache/metron#1009
> > > > 6 days ago METRON-1541 Mvn clean results in git status having deleted
> > > > files. (justinleet via nickwallen) closes apache/metron#1003
> > > > 6 days ago METRON-1461 MIN MAX stellar function should take a stats
> or
> > > > list object and return min/max (MohanDV via nickwallen) closes
> > > > apache/metron#942
> > > > 6 days ago METRON-1184 EC2 Deployment - Updating control_path to
> > > > accommodate for Linux (Ahmed Shah via ottobackwards) closes
> > > > apache/metron#754
> > > > 6 days ago METRON-1530 Default proxy config settings in
> metron-contrib
> > > > need to be updated (sardell via merrimanr) closes apache/metron#998
> > > > 11 days ago METRON-1545 Upgrade Spring and Spring Boot (merrimanr)
> > closes
> > > > apache/metron#1008
> > > > 13 days ago METRON-1543 Unable to Set Parser Output Topic in Sensor
> > > Config
> > > > (nickwallen) closes apache/metron#1007
> > > > 3 weeks ago METRON-1539: Specialized RENAME field transformer closes
> > > > apache/incubator-metron#1002
> > > > 3 weeks ago METRON-1520: Add caching for stellar field
> transformations
> > > > closes apache/in

[DISCUSS] Refactoring

2018-05-29 Thread Michael Miklavcic

I want to bring up the subject of code refactoring and how we should manage
this in PR's as our product evolves. As Metron matures, it's only natural
that we'll have and increasing number of contributors, and subsequently
contributions affecting many hardened parts of the code base. We've
generally not been particular about mixing refactoring changes with other
types of improvements or bug fixes. As a general best practice for software
engineering it is indeed desirable to undergo regular refactoring as a
matter of "scouts' rules" or "fixing broken windows." This helps keep code
readable and has the benefit of a fresh pair of eyes to see code in a new
way that allows the newcomer to introduce clarifying changes that the
original author(s) may not have considered.

While refactoring is generally applauded (because we have unit,
integration, and acceptance tests backing our changes), it does pose some
challenges during the review process. Depending on the type of PR, the
refactoring work can at times be many orders of magnitude larger than the
code pertinent to the desired change in functionality itself, whether bug
fix or feature enhancement. While tests should protect against unintended
side effects (and sometimes they are also refactored), refactoring does
introduce the possibility of new subtle bugs. It also makes a lot of PR's a
conflated mix of comments pertinent to the improvement/fix and opinions
about best practices around coding style.

I propose a simple change - we update our coding style guidelines in
section 2.1 to expand on refactoring. We currently cover whitespace and
comments:

"Don’t combine code changes with lots of edits of whitespace or comments;
it makes code review too difficult. It’s okay to fix an occasional comment
or indenting, but if wholesale comment or whitespace changes are needed,
make them a separate PR."

I propose we expand this to say:

"Don’t combine code changes with lots of edits of whitespace, comments, or
code changes specifically for refactoring purposes; it makes code review
too difficult. It’s okay to fix an occasional comment or indenting, but if
wholesale comment, whitespace or other refactoring changes are needed, make
them a separate PR."


I believe this provides additional clarity. I think it's one thing to
extract a method or introduce changes for code you're specifically
modifying, and another thing to introduce changes that affect surrounding
code. I would also propose we emphasize the Google checkstyle and
auto-formatting tooling when submitting any changes, but dealing with
enforcement is not my focus for this discuss thread.

https://cwiki.apache.org/confluence/display/METRON/Development+Guidelines

Best,
Michael Miklavcic

Re: [DISCUSS] Refactoring

2018-05-30 Thread Michael Miklavcic

Completely agreed on all points. Can we do that here and spin up a vote
thread following with the final proposed changes?

On Wed, May 30, 2018 at 9:46 AM, Casey Stella  wrote:

> I'm torn on this, honestly.  I completely agree that cosmetic refactoring
> gets in the way of review and the risk can be more than the reward,
> especially in a subtle bit of code.
> That being said, I'm a big fan of opportunistically refactoring to
> generalize or correct faulty assumptions.  Often, I can't justify making an
> abstraction until I have seen the need more than once, so I will make the
> abstraction, as long as it's small and well-contained, in the PR
> opportunistically, that motivates the 2nd usage.  I like that kind of
> opportunistic refactoring and I think that shouldn't be dissuaded.
>
> I agree with Otto, we should have a round of discussion on the doc text
> and I'd suggest we clarify to be cosmetic refactoring solely due to
> readability concerns.
>
> Just my $0.02
>
> On Tue, May 29, 2018 at 7:40 PM Otto Fowler 
> wrote:
>
>> On top of this, refactoring under another PR’s goals tends to be less
>> documented as to the intent
>> and effect.
>>
>> +1 for the idea, we should have a vote round or edit round on the doc’s
>> specific text.
>> Although I will say, that some things it doesn’t matter how much you break
>> them up wrt reviews.
>> We should have so many reviewers that this is a problem.
>>
>>
>>
>>
>> On May 29, 2018 at 20:05:49, Michael Miklavcic (
>> michael.miklav...@gmail.com)
>> wrote:
>>
>> I want to bring up the subject of code refactoring and how we should
>> manage
>> this in PR's as our product evolves. As Metron matures, it's only natural
>> that we'll have an increasing number of contributors, and subsequently
>> contributions affecting many hardened parts of the code base. We've
>> generally not been particular about mixing refactoring changes with other
>> types of improvements or bug fixes. As a general best practice for
>> software
>> engineering it is indeed desirable to undergo regular refactoring as a
>> matter of "scouts' rules" or "fixing broken windows." This helps keep code
>> readable and has the benefit of a fresh pair of eyes to see code in a new
>> way that allows the newcomer to introduce clarifying changes that the
>> original author(s) may not have considered.
>>
>> While refactoring is generally applauded (because we have unit,
>> integration, and acceptance tests backing our changes), it does pose some
>> challenges during the review process. Depending on the type of PR, the
>> refactoring work can at times be many orders of magnitude larger than the
>> code pertinent to the desired change in functionality, whether bug fix or
>> feature enhancement, itself. While tests should protect against unintended
>> side effects (and sometimes they are also refactored) it does introduce
>> the
>> possibility of new subtle bugs. It also makes a lot of PR's a conflated
>> mix
>> of comments pertinent to the improvement/fix and opinions about best
>> practices around coding style.
>>
>> I propose a simple change - we update our coding style guidelines in
>> section 2.1 to expand on refactoring. We currently cover whitespace and
>> comments:
>>
>> "Don’t combine code changes with lots of edits of whitespace or comments;
>> it makes code review too difficult. It’s okay to fix an occasional comment
>> or indenting, but if wholesale comment or whitespace changes are needed,
>> make them a separate PR."
>>
>> I propose we expand this to say:
>>
>> "Don’t combine code changes with lots of edits of whitespace, comments, or
>> code changes specifically for refactoring purposes; it makes code review
>> too difficult. It’s okay to fix an occasional comment or indenting, but if
>> wholesale comment, whitespace or other refactoring changes are needed,
>> make
>> them a separate PR."
>>
>>
>> I believe this provides additional clarity. I think it's one thing to
>> extract a method or introduce changes for code you're specifically
>> modifying, and another thing to introduce changes that affect surrounding
>> code. I would also propose we emphasize the Google checkstyle and
>> auto-formatting tooling when submitting any changes, but dealing with
>> enforcement is not my focus for this discuss thread.
>>
>> https://cwiki.apache.org/confluence/display/METRON/Development+Guidelines
>>
>> Best,
>> Michael Miklavcic
>>
>

Re: [DISCUSS] Refactoring

2018-05-30 Thread Michael Miklavcic

I'm fine with all the above. Allowing for notes of justification to assist
reviewers seems like a good way for us to avoid being pedantic about it.
This is fairly subjective, after all.

On Wed, May 30, 2018 at 9:59 AM, Casey Stella  wrote:

> Yeah, that's true.
>
> On Wed, May 30, 2018 at 8:58 AM Otto Fowler 
> wrote:
>
>> We can say that any refactoring that *is* necessary, needs to be written
>> out and justified in the review.
>> So, we don’t recommend it, but if you have to, and you can reasonably
>> defend it, OK.
>>
>>
>> On May 30, 2018 at 11:53:51, Casey Stella (ceste...@gmail.com) wrote:
>>
>> Yep, I think we can, mike.
>>
>> Let me start with an emendation:
>>
>> "Don’t combine code changes with lots of edits of whitespace, comments,
>> or code changes specifically for cosmetic refactoring purposes aimed
>> solely at readability; it makes code review and merging difficult. It’s
>> okay to fix an occasional comment or indentation, but if wholesale
>> comment, whitespace or other refactoring changes are needed, make them
>> a separate PR."
>>
>>
>> On Wed, May 30, 2018 at 8:48 AM Michael Miklavcic <
>> michael.miklav...@gmail.com> wrote:
>>
>> > Completely agreed on all points. Can we do that here and spin up a vote
>> > thread following with the final proposed changes?
>> >
>> > On Wed, May 30, 2018 at 9:46 AM, Casey Stella 
>> wrote:
>> >
>> >> I'm torn on this, honestly. I completely agree that cosmetic
>> refactoring
>> >> gets in the way of review and the risk can be more than the reward,
>> >> especially in a subtle bit of code.
>> >> That being said, I'm a big fan of opportunistically refactoring to
>> >> generalize or correct faulty assumptions. Often, I can't justify
>> making an
>> >> abstraction until I have seen the need more than once, so I will make
>> the
>> >> abstraction, as long as it's small and well-contained, in the PR
>> >> opportunistically, that motivates the 2nd usage. I like that kind of
>> >> opportunistic refactoring and I think that shouldn't be dissuaded.
>> >>
>> >> I agree with Otto, we should have a round of discussion on the doc
>> text
>> >> and I'd suggest we clarify to be cosmetic refactoring solely due to
>> >> readability concerns.
>> >>
>> >> Just my $0.02
>> >>
>> >> On Tue, May 29, 2018 at 7:40 PM Otto Fowler 
>> >> wrote:
>> >>
>> >>> On top of this, refactoring under another PR’s goals tends to be less
>> >>> documented as to the intent
>> >>> and effect.
>> >>>
>> >>> +1 for the idea, we should have a vote round or edit round on the
>> doc’s
>> >>> specific text.
>> >>> Although I will say, that some things it doesn’t matter how much you
>> >>> break
>> >>> them up wrt reviews.
>> >>> We should have so many reviewers that this is a problem.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On May 29, 2018 at 20:05:49, Michael Miklavcic (
>> >>> michael.miklav...@gmail.com)
>> >>> wrote:
>> >>>
>> >>> I want to bring up the subject of code refactoring and how we should
>> >>> manage
>> >>> this in PR's as our product evolves. As Metron matures, it's only
>> natural
>> >>> that we'll have an increasing number of contributors, and
>> subsequently
>> >>> contributions affecting many hardened parts of the code base. We've
>> >>> generally not been particular about mixing refactoring changes with
>> other
>> >>> types of improvements or bug fixes. As a general best practice for
>> >>> software
>> >>> engineering it is indeed desirable to undergo regular refactoring as
>> a
>> >>> matter of "scouts' rules" or "fixing broken windows." This helps keep
>> >>> code
>> >>> readable and has the benefit of a fresh pair of eyes to see code in a
>> new
>> >>> way that allows the newcomer to introduce clarifying changes that the
>> >>> original author(s) may not have considered.
>> >>>
>> >>> While refactoring is generally applauded (because we have unit,
>> >>>

Re: [DISCUSS] Field conversions

2018-06-06 Thread Michael Miklavcic

Yeah Otto, pre-0.5.0 (0.4.2) would be ES 2.3 if users were not using
master. ES upgrade is a big piece of this Apache release. Last release was
0.4.2 on Fri Dec 22 2017.

I'm +1 on the idea of an example referencing the ES docs and keeping this
as simple as possible.

* https://archive.apache.org/dist/metron/

On Tue, Jun 5, 2018 at 11:06 AM, Otto Fowler 
wrote:

> Aren’t people who are on an old version of ES everyone pre 0.5.0?  Like all
> the metron users?
>
>
> On June 5, 2018 at 12:31:30, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> Yes, anything using elastic would need the field names changed. That said,
> people who are on such an old version (eol) will need to bite the bullet
> with ES compatibility at some point.
>
> Simon
>
> > On 5 Jun 2018, at 17:17, Otto Fowler  wrote:
> >
> > Are there consequences with Kibana as well? queries, visualizations,
> > templates they may have?
> >
> >
> > On June 5, 2018 at 12:03:44, Nick Allen (n...@nickallen.org) wrote:
> >
> > I just don't know if telling users to do a bulk upgrade of their indices
> is
> > sufficient enough of an upgrade path. I would expect some to have
> > downstream processes dependent on those field names, which would also
> need
> > to be updated.
> >
> > Although, we could tell users to do any field name conversions that they
> > depend on using parser transformations; rather than the
> > `FieldNameConverter` abstractions. I *think* that would be a valid
> upgrade
> > path where we could just revert #1022.
> >
> >> On Tue, Jun 5, 2018 at 10:34 AM, Nick Allen  wrote:
> >>
> >> I am in favor of removing the `FieldNameConverter` abstraction as an end
> >> state. Although, I don't agree with Simon that we could have just done
> >> that directly without providing a backwards compatible solution as was
> > done
> >> in #1022. There are too many touch points that rely on that conversion
> > and
> >> users who expect fields to land in their indices named a certain way (no
> >> matter what version of ES they are running). If I am wrong and there is
> a
> >> better approach that works, then we should just revert #1022.
> >>
> >> On Tue, Jun 5, 2018 at 9:37 AM, Simon Elliston Ball <
> >> si...@simonellistonball.com> wrote:
> >>
> >>> I would definitely agree that the transformation should be removed. We
> >>> have
> >>> now however added a complex generic solution in the backend, which is
> >>> going
> >>> to be noop for most people. This was done I believe for the sake of
> >>> backward compatibility. I would argue however, that there is no need to
> >>> support ES 2.3, and therefore no need to support de-dotting
> >>> transformations. This does seem somewhat over-engineered to me, though
> > it
> >>> does save people re-indexing on upgrades. I suspect in reality that
> this
> >>> is
> >>> a rare edge case, and that we would do far better to settle on one
> >>> solution
> >>> (the dotted version, not the colons, to my mind)
> >>>
> >>> Simon
> >>>
>  On 5 June 2018 at 06:29, Ryan Merriman  wrote:
> 
>  I agree completely. I will leave this thread open for a day or two to
> >>> give
>  others a chance to weigh in. If no one opposes, I will create Jiras
> >>> for
>  removing field transformations and transforming existing data.
> 
>  On Tue, Jun 5, 2018 at 8:21 AM, Casey Stella 
> >>> wrote:
> 
> > Well, on write it is a transformation, on read it's a translation.
> >>> This
>  is
> > to say that you're providing a mapping on read to translate field
> >>> names
> > given the index you're using. The other approach that I was
> >>> considering
> > last night is a field transformation REST call which translates
> > field
>  names
> > that the UI could call. So, the UI would pass 'source.type' to the
> >>> field
> > translation service and in Solr it'd return source.type and in ES
> > it'd
> > return source:type. Underneath the hood the service would use the
> >>> same
> > transformation as the writer uses. That's another way to skin this
> >>> cat.
> >
> > Ultimately, I think we should just ditch this field transformation
> > business, as Laurens said, as long as we have a utility to transform
> > existing data.
> >
> > On Tue, Jun 5, 2018 at 8:54 AM Ryan Merriman 
>  wrote:
> >
> >> Having 2 different patterns for configuring field name
> >>> transformations
>  on
> >> read vs write is confusing to me. I agree with both of you that
> >> normalizing on '.' and not having to do the translation at all
> >>> would be
> >> ideal. Like you both suggested, we would need some utility or
> >>> script
>  to
> >> convert preexisting data to match this format. There could also be
>  some
> >> adjustments a user would need to make in the UI but I feel like we
>  could
> >> document around that. Are there any objections to doing it this
> >>> way?
> >>
> >>
> >>
> >> On Mon, Jun 4, 20
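
The read-side translation Casey describes above, and the one-off utility for converting existing data, can be sketched roughly as follows. This is a minimal illustration in Python, assuming the only difference between stores is the separator (':' in the older Elasticsearch indices, '.' in Solr). The names `SEPARATORS`, `translate_field` and `normalize_document` are hypothetical and are not Metron's `FieldNameConverter` API.

```python
# Hypothetical sketch: translate logical (dotted) field names per index type.
# The separator table is an assumption based on the examples in this thread.
SEPARATORS = {"solr": ".", "elasticsearch": ":"}


def translate_field(name: str, index_type: str) -> str:
    """Map a logical dotted field name to the form stored in the given index."""
    return name.replace(".", SEPARATORS[index_type])


def normalize_document(doc: dict) -> dict:
    """One-off migration helper: rewrite colon-separated keys to dotted keys."""
    return {key.replace(":", "."): value for key, value in doc.items()}


print(translate_field("source.type", "elasticsearch"))
print(normalize_document({"source:type": "bro", "ip_src_addr": "10.0.0.1"}))
```

A real migration would also have to handle key collisions and field names that legitimately contain a colon, which this sketch ignores.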

Re: [GitHub] metron issue #1045: METRON-1594: KafkaWriter is asynchronous and may lose da...

2018-06-07 Thread Michael Miklavcic

Yes Otto

On Thu, Jun 7, 2018, 9:14 AM ottobackwards  wrote:

> Github user ottobackwards commented on the issue:
>
> https://github.com/apache/metron/pull/1045
>
> I assume you are talking to @nickwallen there @mmiklavc ?
>
>
> ---
>

Re: Using Java Rest Client instead of Transport Client for Elasticsearch

2018-06-13 Thread Michael Miklavcic

I think there is some level of support for auth via their REST api, but I
don't see anything specific to X-Pack as you mentioned. However, the major
reason we did not adopt it at the time of upgrade was because a number of
features were not available to REST yet and the effort to simultaneously
upgrade ES and migrate the API to REST was an effort decidedly too large
for the scope of the PR at the time.

On Wed, Jun 13, 2018 at 8:39 AM, Casey Stella  wrote:

> My understanding was that ES x-pack only supports the transport
> client (e.g.
> https://www.elastic.co/guide/en/x-pack/current/java-clients.html).  I
> think
> that was a major reason why we chose to go that route.  I might be wrong
> though.
>
> On Wed, Jun 13, 2018 at 10:30 AM Ali Nazemian 
> wrote:
>
> > Hi All,
> >
> >
> > I have noticed that the recommendation from Elasticsearch team is changed
> > to use Java Rest Client instead of Transport one. The rationale behind it
> > looks convincing and it can also help Metron to be more decoupled from
> > Elasticsearch roadmap, so Metron users can upgrade Elasticsearch with
> > minimum dependency to Metron support.
> >
> >
> > https://www.elastic.co/blog/state-of-the-official-elasticsearch-java-clients
> >
> > P.S: Transport client will be deprecated in ES 7 and will be removed
> > completely on 8.
> >
> >
> > Regards,
> > Ali
> >
>

Re: [DISCUSS] Merging Solr feature branch (METRON-1416) into master

2018-06-21 Thread Michael Miklavcic

+1 let's do it.

On Thu, Jun 21, 2018, 2:01 PM Nick Allen  wrote:

> +1 I think we should merge ASAP and kill the feature branch.  I think the
> work has well surpassed the level required to get it into master.
>
> On Thu, Jun 21, 2018 at 1:20 PM, Justin Leet 
> wrote:
>
> > Hi All,
> >
> > The Solr branch (/feature/METRON-1416-upgrade-solr) has been
> > progressing for a while now.  I'd like to open up discussion
> > around what it takes to get it into master.
> >
> > The JIRA for tracking this feature branch is METRON-1416.
> >
> > As shown in the JIRA, the majority of tasks are complete, with a few
> > outstanding issues. Of these, I believe these are the main ones of
> > interest to this discussion.
> >
> >    - METRON-1629 - There is an active PR #1072
> >      <https://github.com/apache/metron/pull/1072>
> >    - METRON-1609 - There is an active PR #1056
> >      <https://github.com/apache/metron/pull/1056>
> >    - METRON-1602 - Full dev can run with Solr without this, it would
> >      simply be more convenient.
> >    - METRON-1632 - Causes a metaalert specific issue where UI filtering
> >      on source.type:metaalert fails. More detail is on the Jira.
> >    - Two validation tickets.  It's been run up on multinode, and manual
> >      testing has happened (and will be seen a bit more on the final PR
> >      by various reviewers), so I'm inclined to just leave these open
> >      until we're good to go.  Let me know if we want to handle this
> >      differently.
> >
> > I'm of the opinion both of the active PRs need to be merged before we
> > merge this into master, especially the documentation one.  The other
> > two tickets can be done in the future; one can be worked around and one
> > is a metaalert specific issue that primarily affects the alerts UI.
> >
> > As the branch has grown and diverged from master, it's gotten
> increasingly
> > unwieldy to maintain (and I think it's worth a follow-on discussion about
> > how we manage refactorings that happen in these sorts of branches).  I
> know
> > there's been at least a couple merges from master that have been
> > nontrivially difficult and required careful testing, particularly around
> > the DAO layer, to avoid regressions in both code and tests.
> >
> > The feature set is pretty complete.  The UI works, barring the metaalert
> > issue.  Much of the backend has been refactored and seen improved test
> > coverage benefiting both Solr and Elasticsearch.  The main difference
> > between ES and Solr is the lack of the equivalent visualizations to
> > Kibana.  I don't believe the feature branch needs to wait for this, as
> it's
> > pretty standalone work that can be added as usage and demand dictates.
> >
> > I'm of the opinion that the benefits of getting the branch into master
> > outweighs the issues still present, especially in terms of making
> > refactoring and features available and easing the dev burden.  The
> > remaining tickets are Solr specific, and ES functions as it does in
> master.
> >
> > Are there any must-haves before we bring this branch back?  Are there any
> > other concerns we have before a final PR is opened (pending completion of
> > active PRs and any other must-haves)?
> >
> > Justin
> >
>

Re: [DISCUSS] Treating null as false in boolean expressions in Stellar

2018-06-23 Thread Michael Miklavcic

I'm for both of these changes. +1 on the overall idea.

On Tue, Jun 19, 2018, 5:58 AM Charles Joynt 
wrote:

> I'd welcome both of these on the grounds that they'll make life easier
> writing short(er) Stellar code AND deciphering what someone else has
> written.
>
> -Original Message-
> From: Casey Stella [mailto:ceste...@gmail.com]
> Sent: 16 June 2018 18:33
> To: dev@metron.apache.org
> Subject: Re: [DISCUSS] Treating null as false in boolean expressions in
> Stellar
>
> I created a PR for the empty collection falseyness as well:
> https://github.com/apache/metron/pull/1064 so we can choose either of
> them if we so desire.
>
> On Sat, Jun 16, 2018 at 1:10 PM Casey Stella  wrote:
>
> > I created a PR for this functionality, in case we decided for it:
> > https://github.com/apache/metron/pull/1063
> >
> > Also, while we're talking, perhaps we should treat empty lists as
> > false as well, like javascript and python.
> > So, for instance, if [] then 'blah' else 'foo' would return foo.
> >
> > Thoughts?
> >
> > On Sat, Jun 16, 2018 at 10:17 AM Casey Stella 
> wrote:
> >
> >> Right now, because fields may not exist, users can have an awkward time.
> >> For instance, checking for is_alert, you end up having to preface
> >> checks with exists(is_alert).
> >>
> >> For instance, in one of our use-cases:
> >> https://github.com/apache/metron/tree/master/use-cases/geographic_login_outliers
> >> we use
> >>
> >> "is_alert := exists(is_alert) && is_alert",
> >> "is_alert := is_alert || (geo_outlier != null && geo_outlier == true)",
> >>
> >>  instead of :
> >>
> >> "is_alert := is_alert || geo_outlier == true",
> >>
> >> I suggest that we adopt a convention from javascript whereby we
> >> assume a field not existing or being null should act as false in
> >> boolean expressions.  This will simplify stellar's use and hopefully
> >> result in less awkwardness.
> >>
> >> Thoughts?
> >>
> >
>
> --
> G-RESEARCH believes the information provided herein is reliable. While
> every care has been taken to ensure accuracy, the information is furnished
> to the recipients with no warranty as to the completeness and accuracy of
> its contents and on condition that any errors or omissions shall not be
> made the basis of any claim, demand or cause of action.
> The information in this email is intended only for the named recipient.
> If you are not the intended recipient please notify us immediately and do
> not copy, distribute or take action based on this e-mail.
> All messages sent to and from this e-mail address will be logged by
> G-RESEARCH and are subject to archival storage, monitoring, review and
> disclosure.
> G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
> Whittington House, 19-30 Alfred Place, London WC1E 7EA.
> Trenchant Limited is a company registered in England with company number
> 08127121.
> --
>

Re: [DISCUSS] Batch Profiler

2018-07-28 Thread Michael Miklavcic

+1 on the feature branch, Nick. I'll start reviewing the write-ups shortly.

On Fri, Jul 27, 2018, 9:29 AM Nick Allen  wrote:

> Hi Everyone -
>
> A while back I opened up a discuss thread around the general idea of a
> Batch Profiler [1].  I'd like to start making progress on a first draft of
> that functionality.
>
> I created METRON-1699 [2] which outlines the general approach and ideas.
> If you're interested, review that JIRA and let me know if you have
> feedback.  I will be adding sub-tasks to that JIRA as I make progress and
> can separate it into logical bits for review.
>
> I would like this effort to use a feature branch as it will take a number
> of PRs to get a first cut on the functionality.  Pending no disagreement, I
> will create the feature branch based on METRON-1699.
>
> [1]
>
> https://lists.apache.org/thread.html/d28d18cc9358f5d9c276c7c304ff4ee601041fb47bfc97acb6825083@%3Cdev.
> ..
> <
> https://lists.apache.org/thread.html/d28d18cc9358f5d9c276c7c304ff4ee601041fb47bfc97acb6825083@%3Cdev.metron.apache.org%3E
> >
> [2] https://issues.apache.org/jira/browse/METRON-1699
>

Re: Feature Branch Process

2018-07-30 Thread Michael Miklavcic

Hey Nick, thanks for starting this thread. Some thoughts of mine from recent
work in feature branches for Solr and the Pcap query panel:

   1. As a general rule, feature branches should be started for code
   changes that are not able to be delivered in 1-2 PR's of no more than 1-2k
   lines, but as usual a developer/contributor can make the case for
   exceptions. There should also be a DISCUSS to kick it off with a breakdown
   of what the feature branch will deliver. Your recent DISCUSS around the
   profiler is a good example.

https://lists.apache.org/thread.html/da81c1227ffda3a47eb2e5bb4d0b162dd6d36006241c4ba4b659587b@%3Cdev.metron.apache.org%3E
   2. I think the feature branch should include a parent Epic/Jira that
   describes the end state with accompanying Jiras delivered along the way.
      1. This is not a static list, but rather something that evolves as
      the feature is fleshed out.
      2. Any individual PR's with follow-on work per the comments on that
      PR should be added as a task to the final feature branch epic. The
      point is that all PR's can be committed prior to every check list
      item, but there is always a final accounting and balancing of the
      books. All PR's should continue to have accountability and roll up
      into the final deliverable and the feature branch should not be
      merged into master until either a) all items have been addressed or
      b) a case has been made for making those items follow-on work in
      master after the merge.

Best,
Mike

On Mon, Jul 2, 2018 at 12:08 PM Nick Allen  wrote:

> Maintaining a feature branch (FB) comes with its own overhead.  The longer
> we take to merge back into the main line, the harder it becomes.  I think
> we could have merged the Solr work sooner and avoided some of that
> overhead.  It might help to answer these questions in the Developer
> Guidelines.
>
> (Q) When should a feature branch be started?
>
> (Q) When is a feature branch finished?  When is it ready to be merged into
> master?
>
>
>
>
>
>
>
> On 2018/07/02 17:56:46, Nick Allen  wrote:
> > We recently merged our first feature branch back into master; the
> > enhanced support for Solr.  How did the feature branch "process" go
> > from start to finish?  What were some of the pain points?  What can we
> > learn from the experience?
> >
>

Re: Knox SSO feature branch PRs: a quick demo

2018-08-02 Thread Michael Miklavcic

Nice! Thanks for the walk-through, Simon. Agreed that this should assist
reviewers in understanding the feature branch and PR break-down.

Cheers,
Mike

On Thu, Aug 2, 2018 at 8:51 AM larry mccay  wrote:

> Hi Simon -
>
> I like how you walk through those various PRs and describe what is done at
> each step.
> Please feel free to bring any suggestions that you may have for
> improvements in Knox and KnoxSSO to the community.
> The public cert issue is a pain for sure - we do have a knoxcli command for
> exporting it but you need to be on the Knox machine to do that.
> I've been considering adding an API to the admin API for retrieving it as well
> - but of course it will need to be up and running for that.
>
> thanks,
>
> --larry
>
>
> On Wed, Aug 1, 2018 at 11:34 PM, Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
> > I've recently put in a number of PRs on the Knox feature branch, and
> > thought it might be useful to post a quick 'sprint demo' style
> > explanation of what the various PRs and functionality entail:
> > https://youtu.be/9OJz6hg0N1I
> >
> > Hope this helps with review process. There are a couple of areas that
> > need a little follow-on improvement (Ambari mpack cosmetic oddness
> > mainly). Any thoughts and assistance on that would be very greatly
> > appreciated.
> > Simon
> >
>

Pcap Query Panel feature branch status

2018-08-07 Thread Michael Miklavcic

I'd like to put up a DISCUSS thread in the next day or so regarding getting
the pcap feature branch merged. In preparation, I am going to share an
accounting of completed and outstanding tasks. Can folks that have
contributed update their Jira status in the subtasks? It looks like the
current state of affairs is a bit outdated and I'd like to have this
buttoned up before we officially present this to the community.

https://issues.apache.org/jira/browse/METRON-1554

Also, any Jiras that have been created that are relevant to the feature
branch but have not been made subtasks should be converted.

   - Open the Jira
   - select "More"
   - choose "convert to subtask."
   - Search for METRON-1554 in the search box and select the Pcap epic that
   shows up.

Thanks,
Mike

Re: [DISCUSS] Metron Parsers in Nifi

2018-08-08 Thread Michael Miklavcic

I think it also provides customers greater control over their architecture
by giving them the flexibility to choose where/how to host their parsers.

To Justin's point about the API, my biggest concern about the RecordReader
approach is that it is not stable. We already have a similar problem in
having the TransportClient in ElasticSearch - they are prone to changing it
in minor versions with the advent of their newer REST API, which is
problematic for ensuring a stable installation.

From my own perspective, our goal with NiFi, at least in part, should be
the ability to deploy our core parsing infrastructure, i.e.

   - pre-built parsers
   - custom java parsers
   - Stellar transforms
   - custom stellar transforms

And have the ability to configure it similarly to how we configure parsers
within Storm. Consistent with our recent parser chaining and aggregation
feature, users should be able to construct and deploy similar constructs in
NiFi. The core architectural shift would be that parser code should be
platform agnostic. We provide the plumbing in Storm, NiFi, and  and platform architects and devops teams can choose how
and where to deploy.
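
To make the "platform agnostic" point a little more concrete, here is a minimal sketch in Python of what that separation might look like. Everything in it is hypothetical: `Parser`, `json_parser` and `run_parser` are illustrative names only, and `run_parser` merely stands in for whatever plumbing Storm or NiFi would actually provide.

```python
import json
from typing import Callable, Dict, Iterable, List

# A parser is just a function from raw bytes to zero or more messages.
# It imports nothing from Storm or NiFi.
Parser = Callable[[bytes], List[Dict[str, object]]]


def json_parser(raw: bytes) -> List[Dict[str, object]]:
    """Example parser: one JSON object in, one message out."""
    return [json.loads(raw.decode("utf-8"))]


def run_parser(parser: Parser, records: Iterable[bytes]) -> List[Dict[str, object]]:
    """Stand-in for the platform plumbing that feeds records to a parser."""
    out: List[Dict[str, object]] = []
    for record in records:
        out.extend(parser(record))
    return out


print(run_parser(json_parser, [b'{"a": 1}', b'{"b": 2}']))
```

The design choice being illustrated is only that the parser itself carries no platform dependency, so the same function could in principle be hosted by either runtime.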

Best,
Mike


On Wed, Aug 8, 2018 at 9:57 AM James Sirota  wrote:

> Integration with NiFi would be useful for parsing low-volume telemetries
> at the edge.  This is a much more resource friendly way to do it than
> setting up dedicated storm topologies.  The integration would be that the
> NiFi processor parses the data and pushes it straight into the enrichment
> topic, saving us the resources of having multiple parsers in storm
>
> Thanks,
> James
>
> 07.08.2018, 11:29, "Otto Fowler" :
> > Why do we start over. We are going back and forth on implementation, and
> I
> > don’t think we have the same goals or concerns.
> >
> > What would be the requirements or goals of metron integration with Nifi?
> > How many levels or options for integration do we have?
> > What are the approaches to choose from?
> > Who are the target users?
> >
> > On August 7, 2018 at 12:24:56, Justin Leet (justinjl...@gmail.com)
> wrote:
> >
> > So how does the MetronRecordReader roll into everything? It seems like
> it'd
> > be more useful on the reader per format approach, but otherwise it
> doesn't
> > really seem like we gain much, and it requires getting everything linked
> up
> > properly to be used. Assuming we looked at doing it that way, is the idea
> > that we'd setup a ControllerService with the MetronRecordReader and a
> > MetronRecordWriter and then have the StellarTransformRecord processor
> > configured with those ControllerServices? How do we manage the
> > configurations of the everything that way? How does the ControllerService
> > get configured with whatever parser(s) are needed in the flow? Basically,
> > what's your vision for how everything would tie together?
> >
> > I also forgot to mention this in the original writeup, but there's
> another
> > reason to avoid the RecordReader: It's not considered stable. See
> >
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
> .
> > That alone makes me super hesitant to use it, if it can shift out from
> > under us in even in incremental version.
> >
> > I'm also unclear on why StellarTransformRecord processor matters for
> either
> > approach. With the Processor approach you could simply follow it up with
> > the Stellar processor, the same way you'd would in the RecordReader
> > approach. The Stellar processor should be a parallel improvement, not a
> > conflicting one.
> >
> > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler 
> wrote:
> >
> >>  A Metron Processor itself isn’t really necessary. A MetronRecordReader
> (
> >>  either the megalithic or a reader per format ) would be a good
> approach.
> >>  Then have StellarTransformRecord processor that can do Stellar on _any_
> >>  record, regardless of source.
> >>
> >>  On August 7, 2018 at 11:06:22, Justin Leet (justinjl...@gmail.com)
> wrote:
> >>
> >>  Thanks for the comments, Otto, this is definitely great feedback. I'd
> > >>  love to respond inline, but the email's already starting to lose its
> >>  formatting, so I'll go with the classic "wall of text". Let me know if
> I
> >>  didn't address everything.
> >>
> >>  Loading modules (or jars or whatever) outside of our Processor gives us
> > >>  the benefit of making it incredibly easy for users to create their
> own
> >>  parsers. I would definitely expect our own bundled parsers to be
> included
> >>  in our base NAR, but loading modules enables users to only have to
> learn
> >>  how Metron wants our stuff lined up and just plug it in. Having said
> that,
> >>  I could see having a wrapper for our bundled parsers that makes it
> really
> > >>  easy to just say you want a MetronAsaParser or MetronBroParser, etc.
> That
> > >>  would give us the best of both worlds, where it's easy to set up our
> >>  bundled parsers and also trivial to pul

[ANNOUNCE] - Apache Metron Slack channel

2018-08-15 Thread Michael Miklavcic

The Metron community has a Slack channel available for communication
(similar to the existing IRC channel, only on Slack).

To join:

   1. Go to slack.com.
   2. For organization/group, you'll enter "the-asf"
   3. Use your Apache email for your login
   4. Click "Channels" and look for #metron (Created by ottO June 15, 2018)

Best
Mike Miklavcic

Re: [ANNOUNCE] - Apache Metron Slack channel

2018-08-15 Thread Michael Miklavcic

It's another option with different features. I imagine many people will use
both.

On Wed, Aug 15, 2018, 9:14 AM Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Since this is committers only, would it make more sense to stick to IRC? Or
> is exclusivity the idea?
>
> On 15 August 2018 at 16:09, Nick Allen  wrote:
>
> > Thanks for the instructions!
> >
> > On Wed, Aug 15, 2018 at 10:22 AM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > The Metron community has a Slack channel available for communication
> > > (similar to the existing IRC channel, only on Slack).
> > >
> > > To join:
> > >
> > > >    1. Go to slack.com.
> > > >    2. For organization/group, you'll enter "the-asf"
> > > >    3. Use your Apache email for your login
> > > >    4. Click "Channels" and look for #metron (Created by ottO June 15, 2018)
> > >
> > > Best
> > > Mike Miklavcic
> > >
> >
>
>
>
> --
> --
> simon elliston ball
> @sireb
>

Re: [ANNOUNCE] - Apache Metron Slack channel

2018-08-15 Thread Michael Miklavcic

Turns out we are able to invite folks on an ad-hoc basis. See instructions
here -
https://cwiki.apache.org/confluence/display/METRON/Community+Resources


On Wed, Aug 15, 2018 at 9:23 AM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> It's another option with different features. I imagine many people will
> use both.
>
> On Wed, Aug 15, 2018, 9:14 AM Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
>> Since this is committers only, would it make more sense to stick to IRC?
>> Or
>> is exclusivity the idea?
>>
>> On 15 August 2018 at 16:09, Nick Allen  wrote:
>>
>> > Thanks for the instructions!
>> >
>> > On Wed, Aug 15, 2018 at 10:22 AM, Michael Miklavcic <
>> > michael.miklav...@gmail.com> wrote:
>> >
>> > > The Metron community has a Slack channel available for communication
>> > > (similar to the existing IRC channel, only on Slack).
>> > >
>> > > To join:
>> > >
>> > >    1. Go to slack.com.
>> > >    2. For organization/group, you'll enter "the-asf"
>> > >    3. Use your Apache email for your login
>> > >    4. Click "Channels" and look for #metron (Created by ottO June 15, 2018)
>> > >
>> > > Best
>> > > Mike Miklavcic
>> > >
>> >
>>
>>
>>
>> --
>> --
>> simon elliston ball
>> @sireb
>>
>

Re: [ANNOUNCE] - Apache Metron Slack channel

2018-08-15 Thread Michael Miklavcic

+ Metron user list

On Wed, Aug 15, 2018 at 10:38 AM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Turns out we are able to invite folks on an ad-hoc basis. See instructions
> here -
> https://cwiki.apache.org/confluence/display/METRON/Community+Resources
>
>
> On Wed, Aug 15, 2018 at 9:23 AM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
>> It's another option with different features. I imagine many people will
>> use both.
>>
>> On Wed, Aug 15, 2018, 9:14 AM Simon Elliston Ball <
>> si...@simonellistonball.com> wrote:
>>
>>> Since this is committers only, would it make more sense to stick to IRC?
>>> Or
>>> is exclusivity the idea?
>>>
>>> On 15 August 2018 at 16:09, Nick Allen  wrote:
>>>
>>> > Thanks for the instructions!
>>> >
>>> > On Wed, Aug 15, 2018 at 10:22 AM, Michael Miklavcic <
>>> > michael.miklav...@gmail.com> wrote:
>>> >
>>> > > The Metron community has a Slack channel available for communication
>>> > > (similar to the existing IRC channel, only on Slack).
>>> > >
>>> > > To join:
>>> > >
>>> > >    1. Go to slack.com.
>>> > >    2. For organization/group, you'll enter "the-asf"
>>> > >    3. Use your Apache email for your login
>>> > >    4. Click "Channels" and look for #metron (Created by ottO June 15, 2018)
>>> > >
>>> > > Best
>>> > > Mike Miklavcic
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> --
>>> simon elliston ball
>>> @sireb
>>>
>>

Re: Slack Channel

2018-08-15 Thread Michael Miklavcic

Invite sent

On Wed, Aug 15, 2018 at 10:57 AM Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Hello dev team, may I please join your slack channel :)
>

Re: [DISCUSS] Metron Release 0.6.0?

2018-08-15 Thread Michael Miklavcic

+1 here as well to the proposed releases.

On Wed, Aug 15, 2018 at 11:06 AM Casey Stella  wrote:

> +1 to both releases, this is plenty for an 0.6.0 and a 0.2.0
>
> On Wed, Aug 15, 2018 at 11:04 AM Justin Leet 
> wrote:
>
> > I just sent a thread about release cadence. Jon, I'd recommend starting a
> > thread on a 1.0 roadmap.  I thought about merging the threads, but I
> think
> > that's just going to result in more crosstalk, so I'll let you start that
> > conversation.
> >
> > On Wed, Aug 15, 2018 at 10:37 AM Nick Allen  wrote:
> >
> > > +1 to a 0.6.0 release that includes the Pcap Panel and Solr work.
> > >
> > > +1 to doing a 0.2.0 release for metron-bro-plugin-kafka.  I *think* we
> > need
> > > to do the plugin release first, so that the 0.6.0 Metron release will
> > point
> > > to plugin 0.2.0.
> > >
> > > FWIW, here are the changes since the last release.
> > >
> > > 6 days ago METRON-1730: Update steps to run pycapa on Centos 6
> (mmiklavc
> > > via mmiklavc) closes apache/metron#1152
> > > 2 weeks ago METRON-1701 Update General notes on the installation of
> > Pycapa
> > > on Kerberized cluster (MohanDV via nickwallen) closes
> apache/metron#1136
> > > 3 weeks ago METRON-1650 Packaging docker containers are too large
> > > (jameslamb via merrimanr) closes apache/metron#1091
> > > 3 weeks ago METRON-1604 : Add RHEL 7 power pc to OS family for the HCP
> > > management pack repo info closes apache/incubator-metron#1052
> > > 3 weeks ago METRON-1687: Upgrade the rat plugin to 0.13-SNAPSHOT closes
> > > apache/incubator-metron#1126
> > > 3 weeks ago METRON-1694: Clean up Metron REST docs closes
> > > apache/incubator-metron#1131
> > > 4 weeks ago METRON-1606 Add a 'wrap' to incoming messages in
> > the
> > > metron json parser (ottobackwards) closes apache/metron#1054
> > > 4 weeks ago METRON-1672 Add metron-alerts's UI unit tests to
> travis
> > > build process (justinleet) closes apache/metron#1106
> > > 4 weeks ago METRON-1684 Fix Markdown problems in 3rdPartyParser.md
> > > (justinleet) closes apache/metron#1110
> > > 4 weeks ago METRON-1657 Parser aggregation in storm (justinleet) closes
> > > apache/metron#1099
> > > 4 weeks ago METRON-1651 Fixing failing protractor e2e test (tiborm via
> > > merrimanr) closes apache/metron#1095
> > > 4 weeks ago METRON-1673 Fix Javadoc errors (justinleet) closes
> > > apache/metron#1107
> > > 4 weeks ago METRON-1620: Fixes for forensic clustering use case example
> > > (mmiklavc via mmiklavc) closes apache/metron#1065
> > > 4 weeks ago METRON-1659: The platform-info.sh should check for the
> > vagrant
> > > hostmanager plugin closes apache/incubator-metron#1100
> > > 4 weeks ago METRON-1658: Upgrade bro to 2.5.4 closes
> > > apache/incubator-metron#1101
> > > 4 weeks ago METRON-1236 Add start/stop/restart commands that execute
> > > successfully, when ambari agents run as non-root user closes
> > > apache/incubator-metron#1105
> > > 4 weeks ago METRON-1670: Stellar WEEK_OF_YEAR test is locale sensitive
> > > closes apache/incubator-metron#1104
> > > 5 weeks ago METRON-1660 On Solr, sorting by threat score fails
> > (justinleet)
> > > closes apache/metron#1102
> > > 5 weeks ago METRON-1656 Create KAKFA_SEEK function (nickwallen) closes
> > > apache/metron#1097
> > > 5 weeks ago METRON-1644: Support parser chaining closes
> > > apache/incubator-metron#1084
> > > 5 weeks ago METRON-1655 Make REGEXP_MATCH take multiple regexs in the
> 2nd
> > > arg (ottobackwards) closes apache/metron#1098
> > > 6 weeks ago METRON-1643: Create a REGEX_ROUTING field transformation
> > closes
> > > apache/incubator-metron#1083
> > > 6 weeks ago METRON-1652 Document X-Pack Common Problem (nickwallen)
> > closes
> > > apache/metron#1092
> > > 6 weeks ago METRON-1649 Intermittent Test Failure
> > > ProfileBuilderBoltTest#testFlushExpiredProfiles
> > > (nickwallen) closes apache/metron#1090
> > > 6 weeks ago METRON-1635 Alerts UI status update doesn't
> immediately
> > > show up (merrimanr) closes apache/metron#1080
> > > 6 weeks ago METRON-1642: KafkaWriter should be able choose the topic
> > from a
> > > field in addition to topology construction time closes
> > > apache/incubator-metron#1082
> > > 6 weeks ago METRON-1636: Fix broken unit test setup in metron-alerts
> > closes
> > > apache/incubator-metron#1085
> > > 7 weeks ago METRON-1631 Alerts UI: Dash score does not show if only
> > > filtering by one group (sardell via merrimanr) closes
> apache/metron#1079
> > > 7 weeks ago METRON-1647 Fix logging level score closes
> > > apache/incubator-metron#1089
> > > 7 weeks ago METRON-1621: Sorting alerts table by score closes
> > > apache/incubator-metron#1088
> > > 7 weeks ago METRON-1619: Stellar empty collections should be considered
> > > false in boolean expressions closes apache/incubator-metron#1064
> > > 7 weeks ago METRON-1646 Sensor Stubs should work when kerberized
> > > (nickwallen) closes apache/metron#1087
> > > 7 weeks ago METRON-1645: Check wether the Solr management pac

Re: [DISCUSS] Release cadence

2018-08-15 Thread Michael Miklavcic

I'm also a fan of the 2-3 month time frame for releases. And I agree it
fits nicely with our board report. That said, I think we should minimally
kick off a DISCUSS at least every 2 months per the recommendations above.
If it's warranted, great. If not, then we bring it up at a stated later
time for re-evaluation.

Fwiw, some upcoming features post-0.6.0 that I'm seeing which are also
large-ish and will fit nicely into the next cycle (pending completion, of
course):

   1. NiFi Metron parsers
   2. Profiler enhancements - bootstrapping, etc.
   3. Knox SSO



On Wed, Aug 15, 2018 at 11:10 AM Casey Stella  wrote:

> Strictly selfishly, I'd love for a release to happen quickly enough to have
> something to announce to the board during the reports.  Once every 2 months
> or when a sufficiently complicated change happens sounds like a sensible
> cadence.
>
> I very much support a "how do we get to 1.0" discussion, maybe as a
> separate thread?
>
> On Wed, Aug 15, 2018 at 11:56 AM zeo...@gmail.com 
> wrote:
>
> > I'm a fan of a hybrid time/feature-based cadence.  Something like "When 3
> > months has passed since our last release, or a sufficiently complicated
> > change has been introduced to master (like merging a FB), a discuss
> thread
> > is started".  I'm primarily thinking of what the upgrade path looks like
> > (more on that in a "how do we get to 1.0" discuss).
> >
> > Jon
> >
> > On Wed, Aug 15, 2018 at 11:02 AM Justin Leet 
> > wrote:
> >
> > > Hi all,
> > >
> > > In concert with the discuss thread on a potential 0.6.0 release, I'd
> also
> > > > like to start a discussion about our release cadence.  We've generally
> been
> > > pretty relaxed around doing releases, and I'm curious what people's
> > > thoughts are on adopting a somewhat more regular schedule.
> > >
> > > Couple questions I think are relevant
> > > 1. Is this something we should work towards and, if we do, how do we
> want
> > > to go about it?
> > >
> > >- "Whenever someone feels like pushing out a discuss thread"?
> > >- "Let's just start a discuss thread every X and if we want to
> release
> > >we release"?
> > >- "let's try to get a release out every X and what's on the bus is
> on
> > >the bus"?
> > >- Something else?
> > >
> > > 2. Assuming we do want to do more regular releases, what's the
> timeframe
> > > we'd like to shoot for?
> > >
> > > Personally, I'd like to just start a discuss thread regularly, with the
> > > built-in expectation that not every thread should necessarily lead to a
> > > release. I don't want to be forcing release overhead when there's not
> > > enough to merit a release, but releasing more often than we often do
> now
> > > > would provide a lot of value to users.
> > >
> > > In terms of timeframe, I tend to think a 2-3 month cadence for the
> > threads
> > > is reasonable. It's long enough to potentially accrue enough features
> to
> > > merit a release, but short enough that when we pass on a release we're
> > > probably fine just waiting for another cycle to come around.  The last
> > > release was ~2 months ago and we have a good amount of stuff here, but
> I
> > > also don't expect two feature branches going in to be the norm.
> > >
> > > I'd expect whatever comes out of this thread to also be relatively
> > > informal. At least right now, I don't feel like we need a rigid
> schedule,
> > > and I'd still like people to feel encouraged to propose a release,
> > > particularly when there are a couple major features or critical fixes.
> > > Alternatively, I would expect some of these discuss threads to
> conclude,
> > > "We should do a release, but let's wait a couple waits for these
> tickets
> > to
> > > finish up" (e.g. like the Pcap query panel).
> > >
> > > Justin
> > >
> > --
> >
> > Jon
> >
>

Re: [DISCUSS] Release cadence

2018-08-15 Thread Michael Miklavcic

Works for me, that would be great.

On Wed, Aug 15, 2018 at 12:22 PM Casey Stella  wrote:

> If you like, I can volunteer to kick off a discuss thread when I submit the
> board report.
>
> On Wed, Aug 15, 2018 at 2:21 PM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > I'm also a fan of the 2-3 month time frame for releases. And I agree it
> > fits nicely with our board report. That said, I think we should minimally
> > kick off a DISCUSS at least every 2 months per the recommendations above.
> > If it's warranted, great. If not, then we bring it up at a stated later
> > time for re-evaluation.
> >
> > Fwiw, some upcoming features post-0.6.0 that I'm seeing which are also
> > large-ish and will fit nicely into the next cycle (pending completion, of
> > course):
> >
> > >    1. NiFi Metron parsers
> > >    2. Profiler enhancements - bootstrapping, etc.
> > >    3. Knox SSO
> >
> >
> >
> > On Wed, Aug 15, 2018 at 11:10 AM Casey Stella 
> wrote:
> >
> > > Strictly selfishly, I'd love for a release to happen quickly enough to
> > have
> > > something to announce to the board during the reports.  Once every 2
> > months
> > > or when a sufficiently complicated change happens sounds like a
> sensible
> > > cadence.
> > >
> > > I very much support a "how do we get to 1.0" discussion, maybe as a
> > > separate thread?
> > >
> > > On Wed, Aug 15, 2018 at 11:56 AM zeo...@gmail.com 
> > > wrote:
> > >
> > > > I'm a fan of a hybrid time/feature-based cadence.  Something like
> > "When 3
> > > > months has passed since our last release, or a sufficiently
> complicated
> > > > change has been introduced to master (like merging a FB), a discuss
> > > thread
> > > > is started".  I'm primarily thinking of what the upgrade path looks
> > like
> > > > (more on that in a "how do we get to 1.0" discuss).
> > > >
> > > > Jon
> > > >
> > > > On Wed, Aug 15, 2018 at 11:02 AM Justin Leet 
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > In concert with the discuss thread on a potential 0.6.0 release,
> I'd
> > > also
> > > > > > like to start a discussion about our release cadence.  We've generally
> > > been
> > > > > pretty relaxed around doing releases, and I'm curious what people's
> > > > > thoughts are on adopting a somewhat more regular schedule.
> > > > >
> > > > > Couple questions I think are relevant
> > > > > 1. Is this something we should work towards and, if we do, how do
> we
> > > want
> > > > > to go about it?
> > > > >
> > > > >- "Whenever someone feels like pushing out a discuss thread"?
> > > > >- "Let's just start a discuss thread every X and if we want to
> > > release
> > > > >we release"?
> > > > >- "let's try to get a release out every X and what's on the bus
> is
> > > on
> > > > >the bus"?
> > > > >- Something else?
> > > > >
> > > > > 2. Assuming we do want to do more regular releases, what's the
> > > timeframe
> > > > > we'd like to shoot for?
> > > > >
> > > > > Personally, I'd like to just start a discuss thread regularly, with
> > the
> > > > > built-in expectation that not every thread should necessarily lead
> > to a
> > > > > release. I don't want to be forcing release overhead when there's
> not
> > > > > enough to merit a release, but releasing more often than we often
> do
> > > now
> > > > > > would provide a lot of value to users.
> > > > >
> > > > > In terms of timeframe, I tend to think a 2-3 month cadence for the
> > > > threads
> > > > > is reasonable. It's long enough to potentially accrue enough
> features
> > > to
> > > > > merit a release, but short enough that when we pass on a release
> > we're
> > > > > probably fine just waiting for another cycle to come around.  The
> > last
> > > > > release was ~2 months ago and we have a good amount of stuff here,
> > but
> > > I
> > > > > also don't expect two feature branches going in to be the norm.
> > > > >
> > > > > I'd expect whatever comes out of this thread to also be relatively
> > > > > informal. At least right now, I don't feel like we need a rigid
> > > schedule,
> > > > > and I'd still like people to feel encouraged to propose a release,
> > > > > particularly when there are a couple major features or critical
> > fixes.
> > > > > Alternatively, I would expect some of these discuss threads to
> > > conclude,
> > > > > "We should do a release, but let's wait a couple waits for these
> > > tickets
> > > > to
> > > > > finish up" (e.g. like the Pcap query panel).
> > > > >
> > > > > Justin
> > > > >
> > > > --
> > > >
> > > > Jon
> > > >
> > >
> >
>

Re: [DISCUSS] Pcap query branch completion

2018-08-16 Thread Michael Miklavcic

I'm +1, thanks for adding that fix, Ryan. (Note, for purposes of vote, I
was a contributor in the feature branch).

Mike

On Thu, Aug 16, 2018, 4:17 PM Ryan Merriman  wrote:

> We discovered a bug in our testing and felt it should be fixed before we
> merge.  There is a PR up for review that already has a +1:
> https://github.com/apache/metron/pull/1168.  I don't anticipate this
> changing anyone's vote but wanted to be clear about the state of the
> branch.  If anyone is concerned with this and would like more discussion
> before we merge, let me know.
>
> On Thu, Aug 16, 2018 at 8:25 AM, James Sirota  wrote:
>
> > +1 on the merge as well
> >
> > 16.08.2018, 05:46, "Casey Stella" :
> > > I'm +1 on the merge. This is great work and congrats to those who
> > > contributed to it!
> > >
> > > On Thu, Aug 16, 2018 at 8:27 AM Otto Fowler 
> > wrote:
> > >
> > >>  Looks good, thanks!
> > >>
> > >>  On August 15, 2018 at 19:38:12, Ryan Merriman (merrim...@gmail.com)
> > wrote:
> > >>
> > >>  Otto, I believe the items you requested are in the feature branch
> now.
> > Is
> > >>  there anything outstanding that we missed? The Jiras for the Pcap
> > feature
> > >>  branch should be up to date:
> > >>  https://issues.apache.org/jira/browse/METRON-1554
> > >>
> > >>  On Mon, Aug 13, 2018 at 5:13 PM, Ryan Merriman 
> > >>  wrote:
> > >>
> > >>  > - Date range limits on queries
> > >>  >
> > >>  > I will add a warning in the Job cleanup PR. That seems like an
> > >>  > appropriate place for it (ie. make sure you don't cause health
> > issues in
> > >>  > your cluster).
> > >>  >
> > >>  > - UI should manage a queue/history of jobs
> > >>  >
> > >>  > I can add some documentation around killing jobs manually with the
> > YARN
> > >>  > CLI. However if they haven't set up a YARN queue, I'm not sure how
> > you
> > >>  > would view only Pcap jobs. I'm also not sure how you would get the
> > >>  > application id for the job to kill because it's not displayed
> > anywhere in
> > >>  > the UI. However, I believe we are wired for a job name but REST
> > doesn't
> > >>  > set this. Maybe we could get a proper job name associated with pcap
> > >>  > queries and then this would be possible to document?
> > >>  >
> > >>  > - Documentation/blueprint for YARN configuration
> > >>  >
> > >>  > You make a good point. A YARN tuning guide for Metron does sound
> > useful.
> > >>  > I will add a follow on Jira.
> > >>  >
> > >>  > On Mon, Aug 13, 2018 at 4:53 PM, Otto Fowler <
> > ottobackwa...@gmail.com>
> > >>  > wrote:
> > >>  >
> > >>  >>
> > >>  >> - Date range limits on queries
> > >>  >>
> > >>  >> I took the point the wrong way apparently, sorry, I withdraw. I
> > thought
> > >>  >> you meant allow specifying a limit on the query, not the system
> > imposing
> > >>  a
> > >>  >> limit.
> > >>  >> This should be documented with a warning or something
> > >>  >>
> > >>  >> - UI should manage a queue/history of jobs
> > >>  >>
> > > >>  >> I was thinking that if there were multiple users/jobs, there
> should
> > >>  >> be some thought or documentation + script on how to manage them.
> > >>  >> “To see all the jobs still running on your cluster, across users
> > and ui
> > >>  >> instances do X”
> > >>  >> “If there is an issue with the jobs you can’t resolve in the UI
> for
> > that
> > >>  >> user, or you are an admin and want to do something then X"
> > >>  >>
> > >>  >> - Documentation/blueprint for YARN configuration
> > >>  >>
> > >>  >> I agree with what you are saying. Although, we offer guidance on
> > storm
> > >>  >> tuning, and that is conceptually the same isn’t it? That is why it
> > comes
> > >>  >> to mind.
> > >>  >> Maybe this can be a follow on, in the tuning guide?
> > >>  >>
> > >>  >> On August 13, 2018 at 17:36:41, Ryan Merriman (
> merrim...@gmail.com)
> > >>  >> wrote:
> > >>  >>
> > >>  >> - Date range limits on queries
> > >>  >>
> > >>  >> Can you describe what you think is needed here? Each Metron user
> > could
> > >>  >> have different volumes of pcap data spread out over different time
> > >>  >> periods. Are you saying we should limit the data range to
> something
> > >>  either
> > >>  >>
> > >>  >> constant or configurable? Are we sure all users would want this?
> Am
> > I
> > >>  >> misinterpreting this requirement?
> > >>  >>
> > >>  >> - UI should manage a queue/history of jobs
> > >>  >>
> > >>  >> What should we document here? Reading that bullet point again,
> it's
> > sort
> > > >>  >> of vague and not very descriptive. What I am referring to is a
> > design
> > >>  that
> > >>  >>
> > >>  >> provides users a way to view and manage jobs in the UI. Currently
> > jobs
> > >>  can
> > >>  >>
> > >>  >> only be run 1 at a time and progress is shown with a status bar,
> so
> > it's
> > >>  >> somewhat interactive.
> > >>  >>
> > >>  >> - Documentation/blueprint for YARN configuration
> > >>  >>
> > >>  >>
> > >>  >
> >
> > ---
> > Thank you,
> >
> > James Sirota
> > PMC- Apach

Re: [DISCUSS] Getting to a 1.0 release

2018-08-18 Thread Michael Miklavcic

Apologies for any spelling mishaps as I'm writing from my phone.

I'm for improving our docs. I'd like to see us guide our various profiles
of user towards the specific documentation for the abstraction levels
they'll be most interested in working from. I think we should have platform
docs about how we're a broadly useful, extensible streaming analytics
platform for cyber security as well as docs that emphasize more narrow and
specific use cases.

Personally, I think I see 3 potential tiers or classifications of docs.
These are just observations and ideas I had, not necessarily a prescription
for organizing docs:
- Low-level tool instructions, e.g.
   - how do I run the pcap topology and then query with the CLI and UI?
- Platform docs about building on top of Metron, e.g.
   - writing custom parsers, enrichment, and threat intel (imho we should
     start to take a more opinionated view of leveraging Stellar as this
     extension point rather than implementing new parser classes in Java)
   - using the profiler for constructing outlier analysis use cases
   - using MAAS for building and deploying models for use in enrichment
- Docs around more specific use cases that solve specific as opposed to
  more general problems, similar to those we have in the use-cases folder.

I think one of our challenges currently is that our docs could be better
tailored to the "actors" we've talked about in the past. An individual SOC
analyst will have a very different set of interests than would a reseller
that wants to build on top of our platform to expose new modules and
functionality to those SOC analysts. And we can, and do, currently support
both.


On Sat, Aug 18, 2018, 9:34 AM Nick Allen  wrote:

> Yes, I imagine just a separate top level directory which would contain the
> docs.
>
> We would need someone to survey what doc tools are out there and provide
> some advice.
>
> Maybe we could look around at other open source projects that have done
> their docs well and emulate them.
>
> On Sat, Aug 18, 2018, 10:57 AM Kyle Richardson 
> wrote:
>
> > +1 to separating developer docs and user docs. How should we approach
> that.
> > Have a separate doc book? I haven’t had a ton of time to contribute to
> code
> > lately but I’d be happy to help write some of these.
> >
> > On Sat, Aug 18, 2018 at 9:48 AM Nick Allen  wrote:
> >
> > > Personally, I think the state of our docs and web presence is an
> > inhibitor
> > > to growing the Metron community.  Unless we can offer concise,
> compelling
> > > answers to the basic questions (What can I do with Metron?  Who does it
> > > help? How do I do that?), potential users and contributors are unable
> to
> > > see the value of Metron.
> > >
> > >
> > >
> > > On Sat, Aug 18, 2018 at 9:42 AM, Nick Allen 
> wrote:
> > >
> > > > I'd like to see us focus on improving our docs before a version 1.0.
> > > > Right now we just stitch together a bunch of READMEs, which is a
> great
> > > > stride from where we started, but is not ideal.
> > > >
> > > > > Our docs should be focused on the user and use cases: What can I do with
> > > > Metron?  Who does it help? How do I do that?
> > > >
> > > > The docs should be separate from the code base to allow for an
> > > > organization that is focused on the user rather than the
> > implementation.
> > > > This allows the READMEs to focus on the developer and the
> > implementation,
> > > > which should make them more digestible too.  The docs should be
> version
> > > > controlled and maintained through PRs, just like the code.  We should
> > > take
> > > > just as much pride in our docs as we do in our code.
> > > >
> > > >
> > > >
> > > > On Wed, Aug 15, 2018 at 4:35 PM, Simon Elliston Ball <
> > > > si...@simonellistonball.com> wrote:
> > > >
> > > >> Agreed, should we add TDE by default, and get the ranger policies on
> > by
> > > >> default? That leaves secured in Kafka, which would have to be built
> > into
> > > >> the consumers and producers to encrypt into the on disk Kafka
> topics.
> > > Does
> > > >> that seem necessary to people? It would have performance
> implications
> > > for
> > > >> sure.
> > > >>
> > > >> Simon
> > > >>
> > > >> > On 15 Aug 2018, at 21:26, Otto Fowler 
> > > wrote:
> > > >> >
> > > >> > Well, I look at it like this.
> > > >> >
> > > >> > The Secure Vault was part of the original metron pitch, and many
> may
> > > >> have used that as part of their evaluations.
> > > >> > “Look, it is going to have a security vault type thing, it is on
> the
> > > >> roadmap”.
> > > >> >
> > > >> > Regardless of the implementation, conceptually, security of data
> at
> > > >> rest is important, and is a major outstanding item or the core
> metron
> > > >> proposition.
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> >> On August 15, 2018 at 16:03:19, Simon Elliston Ball (
> > > >> si...@simonellistonball.com) wrote:
> > > >> >>
> > > > >> >> That’s going back a way. I always saw that concept as being about
> > the
> > > >> formats, e.g. Orc, and

Re: package.lock changes during build?

2018-08-25 Thread Michael Miklavcic

Somewhere along the line the dependencies appear to have changed, but the
file never got checked in. I don't like that this part of our build also
seems to be non-deterministic. If I build metron 0.4.x today, for instance,
what will I get? If the answer is "who knows?" that's unacceptable, imo.
I've glanced at the package file and see carets littering the
dependencies, which as I understand it means "get me anything later than
this version." I do not think we should be doing that.
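
For reference, here is a minimal sketch of how caret ranges resolve. It
assumes the node-semver rules as I understand them (a caret allows changes
that do not modify the left-most non-zero version component) and it ignores
pre-release tags. The helper names are made up for illustration and are not
part of npm or Metron:

```python
def parse(version):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints (pre-release tags not handled)."""
    return tuple(int(part) for part in version.split("."))


def caret_upper_bound(base):
    """Exclusive upper bound of a caret range for the given base version."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)   # ^1.2.3 allows anything below 2.0.0
    if minor > 0:
        return (0, minor + 1, 0)   # ^0.2.3 allows anything below 0.3.0
    return (0, 0, patch + 1)       # ^0.0.3 allows only 0.0.3


def satisfies_caret(version, caret_range):
    """True if `version` falls inside a range written as '^X.Y.Z'."""
    base = parse(caret_range.lstrip("^"))
    return base <= parse(version) < caret_upper_bound(base)


if __name__ == "__main__":
    for candidate in ("1.2.2", "1.2.3", "1.9.0", "2.0.0"):
        print(candidate, satisfies_caret(candidate, "^1.2.3"))
```

So a caret is narrower than "anything later", but it still lets the resolved
version drift from one build to the next unless the lock file is honored.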

On Sat, Aug 25, 2018, 9:14 AM Casey Stella  wrote:

> I have looked into this for other reasons and the guidance that I've seen
> is to check in package-lock.json into source control.  I'll leave this
> stack overflow thread here:
>
> https://stackoverflow.com/questions/44206782/do-i-commit-the-package-lock-json-file-created-by-npm-5
>
> I want to point out that I hate that this changes as part of the build.  I
> haven't gotten a complete handle on exactly why package-lock is changing
> seemingly non-deterministically yet.
>
> Casey
>
> On Sat, Aug 25, 2018 at 11:05 AM Nick Allen  wrote:
>
> > Yes, I have noticed that also, but have not looked deeper.
> >
> > On Sat, Aug 25, 2018 at 10:32 AM Otto Fowler 
> > wrote:
> >
> > > I just did a PR, and saw that the package.lock file for alerts-ui was
> > > changed, with updated versions.
> > > I did *not* change the file, nor anything in metron-interface. That
> seems
> > > to imply that this file is changed or updated by
> > > something that happens during building or deploying full dev.
> > >
> > > Is this true?  How does this work?  Is this on purpose?
> > >
> > > ottO
> > >
> >
>

Re: package.lock changes during build?

2018-08-25 Thread Michael Miklavcic

You sir, are a gentleman and a scholar! Thanks for the background info, the
current state of affairs, the controversy, and finally (most of all) the
fix.
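
In case it helps anyone auditing the package file, here is a rough sketch
that lists the dependencies whose version spec is not an exact version, i.e.
the ones that can resolve differently from one `npm install` to the next.
The function name and the sample manifest are hypothetical, and the
exact-version pattern is a simplification rather than npm's own parser:

```python
import json
import re

# An exact version such as "1.2.3" or "1.2.3-beta.1"; anything else
# (^, ~, *, >=, "latest", URLs, ...) is treated as floating in this sketch.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$")


def floating_dependencies(package_json_text):
    """Return {name: spec} for every dependency that is not pinned exactly."""
    manifest = json.loads(package_json_text)
    floating = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                floating[name] = spec
    return floating


# Hypothetical manifest, not taken from the Metron repository.
example = """
{
  "dependencies": {"example-lib": "^1.2.3", "pinned-lib": "1.0.0"},
  "devDependencies": {"example-tool": "~2.1.0"}
}
"""

if __name__ == "__main__":
    for name, spec in floating_dependencies(example).items():
        print(name, spec)
```

Anything this reports is a candidate for drift when the build runs
`npm install`; with `npm ci` the lock file decides the versions instead.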

On Sat, Aug 25, 2018, 12:52 PM Shane Ardell 
wrote:

> NPM's use of lock files has been quite controversial. I won't go into it
> too deep here as there are endless posts criticizing and justifying their
> approach, but `npm install` will install all modules listed as dependencies
> in package.json and update package-lock.json accordingly instead of
> referencing the lock file. This caused a lot of outrage in the community (I
> would argue rightfully so), which led to a compromise in release 5.7.1 with
> `npm ci`. This command installs exactly what is specified in the
> package-lock.json.
>
> https://blog.npmjs.org/post/171556855892/introducing-npm-ci-for-faster-more-reliable
>
> Metron's build currently uses `npm install`, which is why we are seeing the
> package-lock.json update whenever we build locally. Coincidentally, I just
> addressed this by switching to `npm ci` in an open PR of mine because I
> noticed the same happening locally and I was already updating npm commands
> in the pom.xml.
>
> https://github.com/apache/metron/pull/1096/files#diff-e8f55f2d9e4f18085052a36d750e9648L60
>
>
>
> On Sat, Aug 25, 2018 at 7:13 PM Casey Stella  wrote:
>
> > Yeah, that's what I thought too, but I wonder if it triggers a change if
> > there's a dependency that is not version locked (i.e. the most recent
> > version of dependency x moved from y to z).
> >
> > On Sat, Aug 25, 2018 at 11:52 AM Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > Somewhere along the line the dependencies appear to have changed, but
> the
> > > file never got checked in. I don't like that this part of our build
> also
> > > seems to be non-deterministic. If I build metron 0.4.x today, for
> > instance,
> > > what will I get? If the answer is "who knows?" that's unacceptable,
> imo.
> > > > I've glanced at the package file and see carets littering the
> > > dependencies, which as I understand it means "get me anything later
> than
> > > this version." I do not think we should be doing that.
> > >
> > >
> > > On Sat, Aug 25, 2018, 9:14 AM Casey Stella  wrote:
> > >
> > > > I have looked into this for other reasons and the guidance that I've
> > seen
> > > > is to check in package-lock.json into source control.  I'll leave
> this
> > > > stack overflow thread here:
> > > >
> > > >
> > >
> >
> https://stackoverflow.com/questions/44206782/do-i-commit-the-package-lock-json-file-created-by-npm-5
> > > >
> > > > I want to point out that I hate that this changes as part of the
> build.
> > > I
> > > > haven't gotten a complete handle on exactly why package-lock is
> > changing
> > > > seemingly non-deterministically yet.
> > > >
> > > > Casey
> > > >
> > > > On Sat, Aug 25, 2018 at 11:05 AM Nick Allen 
> > wrote:
> > > >
> > > > > Yes, I have noticed that also, but have not looked deeper.
> > > > >
> > > > > On Sat, Aug 25, 2018 at 10:32 AM Otto Fowler <
> > ottobackwa...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > > I just did a PR, and saw that the package.lock file for alerts-ui
> > was
> > > > > > changed, with updated versions.
> > > > > > I did *not* change the file, nor anything in metron-interface.
> That
> > > > seems
> > > > > > to imply that this file is changed or updated by
> > > > > > something that happens during building or deploying full dev.
> > > > > >
> > > > > > Is this true?  How does this work?  Is this on purpose?
> > > > > >
> > > > > > ottO
> > > > > >
> > > > >
> > > >
> > >
> >
>
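
To make the caret discussion in this thread concrete, here is a minimal Python sketch of how a caret range can resolve to different versions over time. It assumes plain `major.minor.patch` versions with no pre-release tags, and the registry contents are hypothetical. Note that a caret is narrower than "anything later than this version": it allows later versions that do not change the left-most non-zero part. Installing exactly what is recorded in package-lock.json, as `npm ci` does, sidesteps this resolution step.

```python
def parse(version):
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def caret_upper_bound(base):
    """Exclusive upper bound for ^base: bump the left-most non-zero part."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def satisfies_caret(version, caret_range):
    """True if version falls inside the caret range, e.g. '^4.17.4'."""
    base = parse(caret_range.lstrip("^"))
    return base <= parse(version) < caret_upper_bound(base)


def resolve(caret_range, published):
    """Pick the highest published version that the range allows."""
    candidates = [v for v in published if satisfies_caret(v, caret_range)]
    return max(candidates, key=parse) if candidates else None


# Hypothetical registry contents at two points in time.
published_then = ["4.17.4", "4.17.5"]
published_now = published_then + ["4.17.10", "5.0.0"]

print(resolve("^4.17.4", published_then))  # 4.17.5
print(resolve("^4.17.4", published_now))   # 4.17.10
```

The same package.json entry yields a different install once a newer compatible version is published, which is one plausible source of the lock file churn described above.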

[DISCUSS] Feature branches post-merge

2018-09-06 Thread Michael Miklavcic

What are we doing with feature branches once they're complete and merged
into master? Is our expectation that we'll keep feature branches in
perpetuity, or should we plan to do some house cleaning once they've been
merged? I did a quick check of NiFi and Kafka and don't see much by way of
feature branches in their repos. I see plenty of RC's in both the branches
and tags listings, but nothing FB related. In previous discussions, we
talked quite a bit about us "trailblazing here," so it may be that this is
simply without much precedent and entirely for us to decide. I can
definitely see value in maintaining them for future reference, as it does
offer a nice bucket in which to collect the commits and discussion nicely,
but I wanted to get others' thoughts.

Best,
Mike

Re: [DISCUSS] Metron Release 0.6.0?

2018-09-07 Thread Michael Miklavcic

> show
> > > if
> > > > > only
> > > > > > > >> > filtering by one group (sardell via merrimanr) closes
> > > > > > > apache/metron#1079
> > > > > > > >> > 10 weeks ago METRON-1647 Fix logging level score closes
> > > > > > > >> > apache/incubator-metron#1089
> > > > > > > >> > 10 weeks ago METRON-1621: Sorting alerts table by score
> > closes
> > > > > > > >> > apache/incubator-metron#1088
> > > > > > > >> > 10 weeks ago METRON-1619: Stellar empty collections should
> > be
> > > > > > > considered
> > > > > > > >> > false in boolean expressions closes
> > > apache/incubator-metron#1064
> > > > > > > >> > 10 weeks ago METRON-1646 Sensor Stubs should work when
> > > > kerberized
> > > > > > > >> > (nickwallen) closes apache/metron#1087
> > > > > > > >> > 10 weeks ago METRON-1645: Check wether the Solr management
> > > pack
> > > > is
> > > > > > > >> > installed before configuring the solr principal name.
> closes
> > > > > > > >> > apache/incubator-metron#1086
> > > > > > > >> > 2 months ago Merge branch 'master' into
> > > > > > > feature/METRON-1416-upgrade-solr
> > > > > > > >> > 2 months ago METRON-1634 Alerts UI add comment
> doesn't
> > > > > > > immediately
> > > > > > > >> > show up. (merrimanr) closes apache/metron#1077
> > > > > > > >> > 2 months ago Merge branch 'master' into
> > > > > > > >> > feature/METRON-1554-pcap-query-panel
> > > > > > > >> > 2 months ago METRON-1555 Update REST to run YARN and MR
> jobs
> > > > > > > (merrimanr)
> > > > > > > >> > closes apache/metron#1019
> > > > > > > >> > 2 months ago METRON-1489 Retrofit UI tests to run reliably
> > > > during
> > > > > > > >> nightly
> > > > > > > >> > QE runs (sardell via nickwallen) closes apache/metron#1004
> > > > > > > >> > 2 months ago METRON-1637 Wrong path to escalate alert REST
> > > > > endpoint
> > > > > > > >> > (merrimanr) closes apache/metron#1078
> > > > > > > >> > 2 months ago METRON-1624 Set Profiler and Enrichment batch
> > > > > > parameters
> > > > > > > in
> > > > > > > >> > Ambari (nickwallen) closes apache/metron#1069
> > > > > > > >> > 2 months ago Merge remote-tracking branch 'origin/master'
> > into
> > > > > > > >> > feature/METRON-1416-upgrade-solr
> > > > > > > >> > 2 months ago Merge branch 'master' into
> > > > > > > feature/METRON-1416-upgrade-solr
> > > > > > > >> > (nickwallen) closes apache/metron#1075
> > > > > > > >> > 2 months ago METRON-1629 Update Solr documentation
> > (merrimanr
> > > > via
> > > > > > > >> > justinleet) closes apache/metron#1072
> > > > > > > >> > 3 months ago METRON-1633 Incorrect instructions when
> merging
> > > PR
> > > > > into
> > > > > > > >> > feature branch (nickwallen) closes apache/metron#1074
> > > > > > > >> > 3 months ago METRON-1630 Add threat.triage.score.field to
> > > > READMEs
> > > > > > > >> > (merrimanr) closes apache/metron#1073
> > > > > > > >> > 3 months ago METRON-1609 Elasticsearch settings in Ambari
> > > should
> > > > > not
> > > > > > > be
> > > > > > > >> > required if Solr is the indexer (nickwallen) closes
> > > > > > apache/metron#1056
> > > > > > > >> > 3 months ago METRON-1627 Alerts UI: Metaalert details
> > missing
> > > in
> > > > > > > details
> > > > > > > >> > panel when trying to add alert to existing metaalert
> > (sardell
> > > > via
> > > > > > > &g

Re: [DISCUSS] Feature branches post-merge

2018-09-07 Thread Michael Miklavcic

Ok, I'm +1 to this as well. Jira and our Git history do everything we need
for posterity (aside from those wanting to re-live old feature branch glory
days - who am I to judge?), so no need for the extra cruft. I've
participated on but not created a FB yet - is it an infra request to deal
with the CUD part of CRUD on a FB?

On Fri, Sep 7, 2018 at 11:20 AM zeo...@gmail.com  wrote:

> Yeah I don't have a good reason to suggest we keep 'em. so +1 to deleting
> old FBs.
>
> Jon
>
> On Fri, Sep 7, 2018 at 12:14 PM Nick Allen  wrote:
>
> > +1 delete old feature branches.
> >
> > BTW, there is a branch out there called METRON-113 that we probably need
> to
> > clean-up.  I'm not sure where that came from or why it's still around.
> > Probably a fat-finger from long ago.
> >
> > On Fri, Sep 7, 2018 at 12:00 PM Otto Fowler 
> > wrote:
> >
> > > I would drop them.
> > > I’ve already cleaned up FBs around dead things.
> > >
> > >
> > >
> > > On September 6, 2018 at 13:42:55, Michael Miklavcic (
> > > michael.miklav...@gmail.com) wrote:
> > >
> > > What are we doing with feature branches once they're complete and
> merged
> > > into master? Is our expectation that we'll keep feature branches in
> > > perpetuity, or should we plan to do some house cleaning once they've
> been
> > > merged? I did a quick check of NiFi and Kafka and don't see much by way
> > of
> > > feature branches in their repos. I see plenty of RC's in both the
> > branches
> > > and tags listings, but nothing FB related. In previous discussions, we
> > > talked quite a bit about us "trailblazing here," so it may be that this
> > is
> > > simply without much precedent and entirely for us to decide. I can
> > > definitely see value in maintaining them for future reference, as it
> does
> > > offer a nice bucket in which to collect the commits and discussion
> > nicely,
> > > but I wanted to get others' thoughts.
> > >
> > > Best,
> > > Mike
> > >
> >
> --
>
> Jon
>
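
For anyone doing the house cleaning discussed in this thread, a rough Python sketch of picking out merged feature branches follows. It assumes the input is the text printed by `git branch -r --merged origin/master` and that feature branches live under a `feature/` prefix; the sample listing is illustrative, not the real state of the repo. Whether deleting a branch on the Apache remote needs an infra request is the open question above, so this only prints candidates.

```python
def merged_feature_branches(branch_listing, prefix="origin/feature/"):
    """Return feature branches found in `git branch -r --merged` output."""
    branches = []
    for line in branch_listing.splitlines():
        name = line.strip()
        # Skip blank lines and symbolic refs such as "origin/HEAD -> origin/master".
        if not name or "->" in name:
            continue
        if name.startswith(prefix):
            branches.append(name)
    return branches


# Illustrative listing; real input would come from the git command above.
sample = """\
  origin/HEAD -> origin/master
  origin/master
  origin/feature/METRON-1416-upgrade-solr
  origin/feature/METRON-1554-pcap-query-panel
"""

for branch in merged_feature_branches(sample):
    # A remote branch is removed with: git push origin --delete <short name>
    print(branch[len("origin/"):])
```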

Re: [GitHub] metron issue #1188: METRON-1769: Script creation of a release candidate

2018-09-07 Thread Michael Miklavcic

Yeah, the Angular upgrade was the other bit that came to mind. Shane's PR
for the Angular upgrade has the necessary +1's, but @nickwallen you had
requested we hold off on that for this release (which I completely agree
with). https://github.com/apache/metron/pull/1096

On Fri, Sep 7, 2018 at 10:24 AM nickwallen  wrote:

> Github user nickwallen commented on the issue:
>
> https://github.com/apache/metron/pull/1188
>
> > I'm assuming this always pulls HEAD from master to cut the release.
> Do we need or desire any support for cutting a release from a non-HEAD
> commit?
>
> It would be very useful to continue to merge PRs into master while a
> release is being voted on.
>
> I had thought that @mattf-horton used to do the releases in such a way
> that this was not a problem, but I could be wrong.
>
> For example, this morning I merged PR #1174 into master that I don't
> necessarily want in the next release.  I didn't think about the potential
> impact to the release if we have to cut a new RC.  Sorry about that
> @justinleet .
>
>
>
>
>
>
>
>
> ---
>

Re: [DISCUSS] Split apart releases for core Metron and the Bro plugin

2018-09-07 Thread Michael Miklavcic

+1 to deferring for this release and having the separation like NiFi. Since
we're bootstrapping from their process, what are they doing? I would assume
we'd want some sort of vote for the plugin version change as well.

On Fri, Sep 7, 2018 at 10:15 AM Nick Allen  wrote:

> +1 for complete separation as you've described.
>
> On Fri, Sep 7, 2018 at 11:31 AM Justin Leet  wrote:
>
> > I would like this to be a complete separation.  Complete with separate
> RCs,
> > separate call to vote, etc. There's a bit more overhead, but plugin
> > releases should be rarer and as the release infra gets improved and
> > scripted out more, I don't think it'll end up being much more than
> bundling
> > it together.
> >
> > On Fri, Sep 7, 2018 at 11:27 AM Nick Allen  wrote:
> >
> > > > Other projects, e.g. NiFi , split
> apart
> > > these releases within their dist directories.
> > >
> > > I prefer the way Nifi organizes it.  Definitely seems more logically
> > > organized.
> > >
> > >
> > > > If we split them apart, we can make the releases independently.  This
> > > fixes the problem of aligning the versions (simply release the plugin
> > > first, update full-dev, release core Metron).
> > >
> > > Does this entail a complete separation; including a separate
> > call-to-vote?
> > > One vote for core Metron and a separate vote for plugin?
> > >
> > >
> > > > Do we want to try to get this separation done after the current release
> > > cycle is over?
> > >
> > > +1 Let's wait for the next release to hash this out.
> > >
> > >
> > >
> > >
> > > On Fri, Sep 7, 2018 at 10:27 AM Justin Leet 
> > wrote:
> > >
> > > > Right now, we tie together our main release and the Bro plugin, as
> seen
> > > in
> > > > our 0.4.2 release
> > > > https://archive.apache.org/dist/metron/0.4.2/ and the current RC.
> > > >
> > > > Other projects, e.g. NiFi , split
> apart
> > > > these
> > > > releases within their dist directories.
> > > >
> > > > In our case this might look something like
> > > > 0.5.0/
> > > > metron-bro-plugin-kafka
> > > > - 0.2.0/
> > > >
> > > > Right now, with the releases tied together, we aren't upgrading
> > full-dev
> > > > with the version of the plugin (because we're releasing
> simultaneously
> > > and
> > > > can't update the version number).
> > > >
> > > > If we split them apart, we can make the releases independently.  This
> > > fixes
> > > > the problem of aligning the versions (simply release the plugin
> first,
> > > > update full-dev, release core Metron).  The plugin also updates
> > > > substantially less often and we can just do those releases at a
> cadence
> > > we
> > > > choose.
> > > >
> > > > Any thoughts on doing this?
> > > > Do we want to try to get this separation done after the current release
> > > cycle
> > > > is over?
> > > > If we do, do we have a preferred layout? I didn't see anything Apache
> > > > preferred in a quick search, but I definitely could have missed
> > something
> > > > (and https://checker.apache.org/projs/nifi.html looks clean for
> NiFi,
> > > so I
> > > > assume it's fine.)
> > > >
> > >
> >
>

Re: [GitHub] metron issue #1188: METRON-1769: Script creation of a release candidate

2018-09-07 Thread Michael Miklavcic

Whoops, yeah. Meant for a GitHub thread, not an email thread.

On Fri, Sep 7, 2018, 1:19 PM Casey Stella  wrote:

> Mike, did you mean to reply to this on the dev list or were you aiming to
> make this comment on the PR?  If you were aiming to make this comment on
> the PR, then I think you need to go through github's UI.
>
> On Fri, Sep 7, 2018 at 1:34 PM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Yeah, the Angular upgrade was the other bit that came to mind. Shane's PR
> > for the Angular upgrade has the necessary +1's, but @nickwallen you had
> > requested we hold off on that for this release (which I completely agree
> > with). https://github.com/apache/metron/pull/1096
> >
> > On Fri, Sep 7, 2018 at 10:24 AM nickwallen  wrote:
> >
> > > Github user nickwallen commented on the issue:
> > >
> > > https://github.com/apache/metron/pull/1188
> > >
> > > > I'm assuming this always pulls HEAD from master to cut the
> release.
> > > Do we need or desire any support for cutting a release from a non-HEAD
> > > commit?
> > >
> > > It would be very useful to continue to merge PRs into master while
> a
> > > release is being voted on.
> > >
> > > I had thought that @mattf-horton used to do the releases in such a
> way
> > > that this was not a problem, but I could be wrong.
> > >
> > > For example, this morning I merged PR #1174 into master that I
> don't
> > > necessarily want in the next release.  I didn't think about the
> potential
> > > impact to the release if we have to cut a new RC.  Sorry about that
> > > @justinleet .
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > ---
> > >
> >
>

Re: [DISCUSS] Internal Metron fields

2018-09-07 Thread Michael Miklavcic

Can you elaborate on what you mean by "convert to internal?" From your
description, it looks like the challenge is from our violations of DRY when
it comes to constants referencing those keys, which would be eliminated by
refactoring.

On Fri, Sep 7, 2018, 3:50 PM Ryan Merriman  wrote:

> I recently worked on a PR that involved changing the default behavior of
> the ElasticsearchWriter to store data using field names with the default
> Metron separator, dots.  One of the unfortunate consequences of this is
> that although dots are allowed in more recent versions of ES, it changes
> how these fields are stored.  Having a dot in a field name causes ES to
> treat it as an object field type.  We're not quite comfortable with this
> because it could introduce unforeseen side effects that may not be
> obvious.  Here's the PR:  https://github.com/apache/metron/pull/1181
>
> As I worked through it I noticed there are a couple fields that include
> separators where it's not actually necessary.  They are not nested by
> nature and are internal to Metron.  The fact that they are internal means
> they show up in constants and are hardcoded in several different places.
> That made the work in the PR above much harder and more tedious than it should
> have been.  There are 2 in particular that I had to deal with:  source:type
> and threat:triage:score in metaalerts.
>
> Is it worth considering converting these to internal Metron fields so that
> they stay constant and this isn't a problem in the future?  I could see
> these fields following the same pattern as 'metron_alert'.  However this
> would cause pain when upgrading because existing data would need to be
> updated with these new fields.
>
> Just an idea.  Curious if others have an opinion on the subject.
>
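
As an illustration of the storage concern raised in this thread, the Python sketch below mimics how a dotted field name is read as a path into an object, while a colon-separated name stays a single flat field. This only models the idea; it is not Elasticsearch or Metron code, and the sample message and helper name are assumptions made for the example.

```python
def expand_dotted(doc):
    """Expand keys like 'a.b' into nested dicts, the way a dotted field
    name is interpreted as a path into an object."""
    nested = {}
    for key, value in doc.items():
        node = nested
        parts = key.split(".")
        for part in parts[:-1]:
            # Each leading path segment becomes (or reuses) an object.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# Hypothetical message using the default Metron separator (dots).
message = {"source.type": "bro", "threat.triage.score": 10.0, "ip_src_addr": "10.0.0.1"}

# Swapping dots for colons keeps every field flat.
with_colons = {key.replace(".", ":"): value for key, value in message.items()}

print(expand_dotted(with_colons))
print(expand_dotted(message))
```

With colons the document keeps three top-level fields; with dots, `source` and `threat` become objects, even though neither field is nested by nature.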

Re: [MENTORS][DISCUSS] LICENSE and NOTICE likely outdated

2018-09-12 Thread Michael Miklavcic

I'm not sure I fully understand what is out of date. I know I have
personally modified our licenses a couple times in the past and used an
automated script that, I believe, Casey Stella had created for doing the
check. I even made some improvements to it a long ways back. It rips
through the maven dependency tree and tells you what isn't in the licenses
file and fails with a non-zero return code. I thought that was part of our
Travis build, or at the very least, the release lifecycle. Is that not the
case, or is there a different context we're talking about here?

I understand that convenience binaries might have some issues with uberjars when
we go that route for 1.0. But is there any issue with the uberjars as
things currently stand? I was under the impression we are OK because we
don't distribute them. It's part of the build, just like tools such as
JUnit, that we don't actually ship.

Justin - These are the links for guidance that I've found. Is there anything
else you've found that we should peruse while figuring this out?

   - https://www.apache.org/dev/licensing-howto.html
   - http://www.apache.org/legal/release-policy.html#artifacts

Mike

On Wed, Sep 12, 2018 at 10:29 AM Justin Leet  wrote:

> Hi all,
>
> As mentioned on the release voting thread, there was a Slack discussion
> around our LICENSE and NOTICE file likely being outdated because they
> haven't been actively kept up to date since graduation.  I suggested on the
> vote thread that we proceed with the current release, but consider it a
> blocker for the next release.
>
> Mentor input on this (and how other projects handle it), would be greatly
> appreciated.
>
> This discussion should result in JIRAs that are brought back to the thread,
> so we can make sure to track this.
>
> For context, in addition to the standard L&N management, when we build
> artifacts we shade a lot of jars into uberjars, thus bundling
> dependencies.  However, our current releases are source only, but
> publishing convenience binaries came up in the 1.0 roadmap thread.
>
> I think there are a few things that need to happen to correct our current
> issue and make this easier in the future.
> 1) Get the LICENSE and NOTICE files up to date
> 2) Document the process we went through getting things up to date and (just
> as importantly) the reasoning behind it.
> 3) Update the PR checklist to include LICENSE and NOTICE files for new (and
> transitive) dependencies.
> 4) Update or add any processes we need to maintain this properly (e.g.
> release auditing)
> 5) Possibly build tooling for making some of this auditing easier (or use
> existing tool if anyone has suggestions)?
>
> Are there any other steps I'm missing that need to go into JIRAs?
> Any other concerns regarding these files that need to be addressed?
> Any other context I'm missing and that belongs in this discussion?
>
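
The automated check described in the reply above (walk the dependency tree, report what is missing from the licenses file, fail with a non-zero return code) could be sketched in Python roughly as follows. This is not the actual Metron script. The input format (`group:artifact:version, License Name` per line), the sample coordinates, and the function names are all assumptions for illustration.

```python
def missing_from_license(dependencies, license_text):
    """Return dependencies that have no entry in the license listing."""
    listed = {
        line.split(",")[0].strip()
        for line in license_text.splitlines()
        if line.strip()
    }
    return sorted(dep for dep in dependencies if dep not in listed)


def check(dependencies, license_text):
    """Print each uncovered dependency and return a shell-style exit code."""
    missing = missing_from_license(dependencies, license_text)
    for dep in missing:
        print(f"not covered by license file: {dep}")
    return 1 if missing else 0


# Hypothetical inputs; a real run would read the Maven dependency tree
# output and the project's dependency license listing from disk.
dependencies = ["com.example:alpha:1.0", "com.example:beta:2.0"]
license_text = "com.example:alpha:1.0, ASLv2\n"

exit_code = check(dependencies, license_text)
print(exit_code)  # a CI wrapper would pass this value to sys.exit()
```

Wiring something like this into the Travis build or the release process would make an uncovered new or transitive dependency fail fast, which is in the spirit of items 3 to 5 in the quoted list.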

[DISCUSS] Knox SSO feature branch review and features

2018-09-14 Thread Michael Miklavcic

Hey all,

I started looking through the Knox SSO feature branch (see here
https://issues.apache.org/jira/browse/METRON-1663). This is some great new
security functionality work and it looks like it will bring some important
new features to the Metron platform. I'm coming at this pretty green, so I
do have some questions regarding the proposed changes from a high level
architectural perspective. There are a few changes within the current FB
PR's that I think could use some further explanation. At first glance, it
seems we could potentially simplify this branch a great deal and get it
completed much sooner if we narrowed the focus a bit. But I could certainly
be wrong here and happy for other opinions. I searched through the mailing
list history to see if there is any additional background and the main
DISCUSS thread I could find was regarding initially setting up the feature
branch, which talked about adding Knox and LDAP.
https://lists.apache.org/thread.html/cac2e6314284015b487121e77abf730abbb7ebec4ace014b19093b4c@%3Cdev.metron.apache.org%3E.
If I've missed any follow-up, please let me know.

Looking at the broader set of Jiras associated with 1663 and the first PR
1665, it looks like there are 4 main thrusts of this branch right now:

1. Knox/SSO
2. Node migrated to Spring Boot
3. JDBC removed completely in favor of LDAP
4. Introduction of Zuul, also microservices?

I strongly urge for the purpose of reviewing this feature branch that we
base much of the discussion off of
https://issues.apache.org/jira/browse/METRON-1755, the architecture
diagram. Minimally, an explanation of the current architecture along with
discussion around the additional proposed changes and rationale would be
useful for evaluation. I don't have a solid enough understanding yet of the
full scope of changes and how they differ from the existing architecture
just from looking at the PR's alone.

1. The first question is a general one regarding the necessity of the 3
additional features alongside Knox - migrating Node to Spring Boot,
removing JDBC altogether, adding dependencies on Netflix's Zuul framework.
Are these necessary for adding Knox/SSO? They seem like potentially
separate features, imo.
2. It looks like LDAP will be a required component for interacting with
Metron via the UI's. I see this PR
https://github.com/apache/metron/pull/1186 which removes JDBC
authentication. Are we ready to remove it completely or would it be better
to leave it as a minimal installation option? What is the proposed
migration path for existing users? Do we feel comfortable requiring that
all installations, including full dev, install and configure LDAP? For
comparison, in the PCAP feature branch we discussed removing the existing
PCAP REST application in the initial discussion, got agreement, and later
removed it in the course of working on the feature branch. The PR is fairly
clear, however I think we're just missing some basic discussion around the
implications, as I've outlined above. Some additional relevant discussion
occurred on this PR https://github.com/apache/metron/pull/1112 which
would be good to summarize for purposes of this overarching architecture
discussion.
3. Migration from Node to Spring Boot. I believe this is already used by
the REST application and if anything brings some cohesion to our server
strategy. Strictly speaking, is there a reason this is required for Knox?
It seems comparable to a component upgrade, such as moving from ES 2.x to
5.6.x and upgrading to Angular 6.
4. Introduction of Netflix's Zuul.
https://issues.apache.org/jira/browse/METRON-1665.
- > "The UIs currently proxy to the REST API to avoid CORS issues,
this will be achieved with Zuul."
- Can we elaborate more on where or how CORS is a problem with our
existing architecture, how Zuul will help solve that, and how it
fits with
Knox? Wouldn't this be handled by Knox? Since Larry McCay chimed in with
interest on the original SSO thread about the FB, I'm hoping he is also
willing to chime in on this as well.
- This looks like it has the potential to be a rather large piece of
fundamental infrastructure (as it's also pertinent to microservices) to
pull into the platform, and I'd like to be sure the community is aware of
and is OK with the implications.
5. > "The proposal is to use a spring boot application, allowing us to
harmonize the security implementation across the UI static servers and the
REST layer, and to provide a routing platform for later microservices." -
https://issues.apache.org/jira/browse/METRON-1665.
- Microservices is a pretty loaded term. I know there had been some
discussion a while back during the PCAP feature branch start, but I don't
recall ever reaching a consensus on it. More detail in this thread -

https://lists.apache.org/thread.html/1db7c6fa

Re: [DISCUSS] Knox SSO feature branch review and features

2018-09-17 Thread Michael Miklavcic

 work just because we want to review these parts separately.
>
> For question 4, I will defer to Simon.  I don't believe we necessarily
> require Zuul so I will let him elaborate on why he choose that library and
> what the potential impact is of adding it to our project.
>
> For question 5 and 6, I will also defer to Simon on this.  The focus of
> this feature as I understand it is a consistent authentication mechanism
> and support for SSO.  I will let him lay out his vision for micro services.
>
> Knox SSO would be a great improvement and is what I think we should focus
> on in this feature branch.  Micro services is something we should certainly
> discuss but it might be a bit of a distraction and I wouldn't want to hold
> up the other useful parts.
>
> On Fri, Sep 14, 2018 at 8:38 PM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Hey all,
> >
> > I started looking through the Knox SSO feature branch (see here
> > https://issues.apache.org/jira/browse/METRON-1663). This is some great
> new
> > security functionality work and it looks like it will bring some
> important
> > new features to the Metron platform. I'm coming at this pretty green, so
> I
> > do have some questions regarding the proposed changes from a high level
> > architectural perspective. There are a few changes within the current FB
> > PR's that I think could use some further explanation. At first glance, it
> > seems we could potentially simplify this branch a great deal and get it
> > completed much sooner if we narrowed the focus a bit. But I could
> certainly
> > be wrong here and happy for other opinions. I searched through the
> mailing
> > list history to see if there is any additional background and the main
> > DISCUSS thread I could find was regarding initially setting up the
> feature
> > branch, which talked about adding Knox and LDAP.
> >
> >
> https://lists.apache.org/thread.html/cac2e6314284015b487121e77abf730abbb7ebec4ace014b19093b4c@%3Cdev.metron.apache.org%3E
> > .
> > If I've missed any follow-up, please let me know.
> >
> > Looking at the broader set of Jiras associated with 1663 and the first PR
> > 1665, it looks like there are 4 main thrusts of this branch right now:
> >
> >1.  Knox/SSO
> >2.  Node migrated to Spring Boot
> >3.  JDBC removed completely in favor of LDAP
> >4.  Introduction of Zuul, also microservices?
> >
> > I strongly urge for the purpose of reviewing this feature branch that we
> > base much of the discussion off of
> > https://issues.apache.org/jira/browse/METRON-1755, the architecture
> > diagram. Minimally, an explanation of the current architecture along with
> > discussion around the additional proposed changes and rationale would be
> > useful for evaluation. I don't have a solid enough understanding yet of
> the
> > full scope of changes and how they differ from the existing architecture
> > just from looking at the PR's alone.
> >
> >1. The first question is a general one regarding the necessity of the
> 3
> >additional features alongside Knox - migrating Node to Spring Boot,
> >removing JDBC altogether, adding dependencies on Netflix's Zuul
> > framework.
> >Are these necessary for adding Knox/SSO? They seem like potentially
> >separate features, imo.
> >2. It looks like LDAP will be a required component for interacting
> with
> >Metron via the UI's. I see this PR
> >https://github.com/apache/metron/pull/1186 which removes JDBC
> >authentication. Are we ready to remove it completely or would it be
> > better
> >to leave it as a minimal installation option? What is the proposed
> >migration path for existing users? Do we feel comfortable requiring
> that
> >all installations, including full dev, install and configure LDAP? For
> >comparison, in the PCAP feature branch we discussed removing the
> > existing
> >PCAP REST application in the initial discussion, got agreement, and
> > later
> >removed it in the course of working on the feature branch. The PR is
> > fairly
> >clear, however I think we're just missing some basic discussion around
> > the
> >implications, as I've outlined above. Some additional relevant
> > discussion
> >occurred on this PR https://github.com/apache/metron/pull/1112 which
> >would be good to summarize for purposes of this overarching
> architecture
> >discussion.
> >3. Migration from Node to Spring Boot. I believe this is

Re: [DISCUSS] Migrate from Protractor to Cypress

2018-09-19 Thread Michael Miklavcic

Shane,

Can you elaborate on the testing model you're proposing? I looked through
the overview and some of the documentation. As far as I can tell, this
would effectively be an e2e test for the UI *only*, so we would be missing
testing the actual integration points with the REST API or any other
potential endpoints.

   1. Are you proposing we migrate all existing e2e tests, including those
   that currently hit Elasticsearch?
   2. Would shifting to Cypress mean that all e2e tests would be isolated
   to only what is rendered via the browser? i.e. our e2e suite is no longer
   testing integration to a backend?

My assumption with the term e2e testing is that you are testing an entire
vertical slice with no substantive mock/stub/fake/spy/dummy [1] in the way
except for maybe some strategic cross-cutting concerns. It sounds like
Cypress does NOT mean full e2e. My initial reaction to this is that there's
a place for both forms of testing. If Cypress would help UI developers work
on incremental changes, similar to how unit tests via JUnit help Java
developers iterate on features, then I think that's great. I'm all for
that! But unit tests are only one form of testing - we also do integration
testing, which can flex multiple classes/components together, as well as
more broad stack integration/functional testing that verifies everything
works when integrated together. Generally speaking, total # of unit tests >
# of integration tests > # functional/acceptance tests. I think we should
carve out and define a testing approach for each. Can you elaborate a bit
on your vision for how to manage the test interactions, or lack thereof,
with the REST API as an integration endpoint? [2]

At the time the write-up James shared was written, it appears that Cypress
was not yet open source. Now, it's MIT license -
https://github.com/cypress-io/cypress/blob/develop/LICENSE.md.

Mike

1.
https://martinfowler.com/articles/mocksArentStubs.html#TheDifferenceBetweenMocksAndStubs
2. https://martinfowler.com/articles/practical-test-pyramid.html#UiTests

On Wed, Sep 19, 2018 at 8:47 AM James Sirota  wrote:

> This article comparing the two is not favorable for Cypress.  Are any of
> these concerns relevant to us?  If not, then I think Cypress is fine
>
>
> https://hackernoon.com/cypress-io-vs-protractor-e2e-testing-battle-d124ece91dc7
>
>

Re: [DISCUSS] Migrate from Protractor to Cypress

2018-09-20 Thread Michael Miklavcic

That's good feedback, thanks Shane!

On Thu, Sep 20, 2018 at 6:23 AM Shane Ardell 
wrote:

> While the Cypress team suggests taking advantage of stubs where you can,
> especially during development, we would definitely be able to test real
> endpoints [1]. It can be used exactly like how Protractor is used, but with
> the many benefits and features it provides [2]. Cypress also offers tools
> for unit testing [3], which I think may be causing confusion as to what
> exactly the library does. Cypress' main focus is e2e tests, but because of
> its architecture, it can be used for all types of tests.
>
> I agree with everything you mentioned, Mike. I think our approach now is
> fine, but in the future I do think it's worth considering the Cypress
> team's suggestions for when and when not to stub, but there are no hard and
> fast rules [4][5].
>
> I currently have a branch available on my fork where I've migrated over
> some e2e tests from Protractor to Cypress. With the exception of a little
> code cleanup, these tests perform the same steps as they do with
> Protractor. I have yet to include instructions in the README or include an
> npm script, but if anyone wants to see it in action they can do the
> following:
>
>- download this branch:
>https://github.com/sardell/metron/tree/METRON-1648,
>- run `npm ci` from metron-alerts,
>- start the e2e test server,
>- run `./node_modules/.bin/cypress open`,
>- start a single test by clicking on a file name in the Cypress user
>interface, or run them all by clicking the play button.
>
> I'll try to send some sort of benchmarks when I get a chance to show the
> speed difference between the two libraries.
>
> [1] https://docs.cypress.io/api/commands/request.html
> [2] https://www.cypress.io/features/
> [3] https://docs.cypress.io/guides/guides/stubs-spies-and-clocks.html
> [4]
>
> https://docs.cypress.io/guides/guides/network-requests.html#Testing-Strategies
> .
> [5]
>
> https://docs.cypress.io/guides/getting-started/testing-your-app.html#Stubbing-the-Server
>
> On Thu, Sep 20, 2018 at 12:09 AM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Shane,
> >
> > Can you elaborate on the testing model you're proposing? I looked through
> > the overview and some of the documentation. As far as I can tell, this
> > would effectively be an e2e test for the UI *only*, so we would be
> missing
> > testing the actual integration points with the REST API or any other
> > potential endpoints.
> >
> >1. Are you proposing we migrate all existing e2e tests, including
> those
> >that currently hit Elasticsearch?
> >2. Would shifting to Cypress mean that all e2e tests would be isolated
> >to only what is rendered via the browser? i.e. our e2e suite is no
> > longer
> >testing integration to a backend?
> >
> > My assumption with the term e2e testing is that you are testing an entire
> > vertical slice with no substantive mock/stub/fake/spy/dummy [1] in the
> way
> > except for maybe some strategic cross-cutting concerns. It sounds like
> > Cypress does NOT mean full e2e. My initial reaction to this is that
> there's
> > a place for both forms of testing. If Cypress would help UI developers
> work
> > on incremental changes, similar to how unit tests via JUnit help Java
> > developers iterate on features, then I think that's great. I'm all for
> > that! But unit tests are only one form of testing - we also do
> integration
> > testing, which can flex multiple classes/components together, as well as
> > more broad stack integration/functional testing that verifies everything
> > works when integrated together. Generally speaking, total # of unit
> tests >
> > # of integration tests > # functional/acceptance tests. I think we should
> > carve out and define a testing approach for each. Can you elaborate a bit
> > on your vision for how to manage the test interactions, or lack thereof,
> > with the REST API as an integration endpoint? [2]
> >
> > At the time the write-up James shared was written, it appears that
> Cypress
> > was not yet open source. Now, it's MIT license -
> > https://github.com/cypress-io/cypress/blob/develop/LICENSE.md.
> >
> > Mike
> >
> > 1.
> >
> >
> https://martinfowler.com/articles/mocksArentStubs.html#TheDifferenceBetweenMocksAndStubs
> > 2. https://martinfowler.com/articles/practical-test-pyramid.html#UiTests
> >
> >
> > On Wed, Sep 19, 2018 at 8:47 AM James Sirota  wrote:
> >
> > > This article comparing the two is not favorable for Cypress.  Are any
> of
> > > these concerns relevant to us?  If not, then I think Cypress is fine
> > >
> > >
> > >
> >
> https://hackernoon.com/cypress-io-vs-protractor-e2e-testing-battle-d124ece91dc7
> > >
> > >
> >
>

Re: [DISCUSS] Batch Profiler Feature Branch

2018-09-20 Thread Michael Miklavcic

I think I'm torn on this, specifically because it's batch and would
generally be run as-needed. Justin, can you elaborate on your concerns
there? This feels functionally very similar to our flat file loaders, which
all have inputs for config from the CLI only. On the other hand, our flat
file loaders are not typically seeding an existing structure. My concern with
a local-file profiler config stems from this stated goal:
> The goal would be to enable “profile seeding” which allows profiles to be
> populated from a time before the profile was created.
So if the config does not correctly match the profiler config held in ZK
and the user runs the batch seeding job, what happens?
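
To make that question concrete, here is a minimal Python sketch of a
pre-flight check that a seeding job could run before writing anything. It is
only an illustration, not something in the feature branch. It assumes the
profiler config is JSON with a top-level `profiles` list whose entries each
carry a `profile` name. The function names are hypothetical, and fetching the
live config from ZK is left out: both configs are passed in as plain dicts.

```python
import json


def index_profiles(config):
    """Map profile name -> full definition (assumes a top-level 'profiles' list)."""
    return {p["profile"]: p for p in config.get("profiles", [])}


def diff_profiler_configs(local, live):
    """Report how a local profiler config differs from the live one.

    Both arguments are already-parsed dicts; this sketch does not talk to ZK.
    """
    local_by_name = index_profiles(local)
    live_by_name = index_profiles(live)
    shared = set(local_by_name) & set(live_by_name)
    return {
        # Profiles the batch job would seed that the live profiler does not know.
        "only_local": sorted(set(local_by_name) - set(live_by_name)),
        # Profiles running live that the batch job would not seed.
        "only_live": sorted(set(live_by_name) - set(local_by_name)),
        # Same name, different definition: seeded data may not match live data.
        "changed": sorted(n for n in shared if local_by_name[n] != live_by_name[n]),
    }


# Hypothetical example configs, for illustration only.
local_config = {
    "profiles": [
        {"profile": "hello-world", "foreach": "ip_src_addr",
         "init": {"count": "0"}, "update": {"count": "count + 1"},
         "result": "count"},
    ]
}
live_config = {
    "profiles": [
        {"profile": "hello-world", "foreach": "ip_dst_addr",
         "init": {"count": "0"}, "update": {"count": "count + 1"},
         "result": "count"},
        {"profile": "other", "foreach": "ip_src_addr",
         "init": {}, "update": {}, "result": "0"},
    ]
}

if __name__ == "__main__":
    print(json.dumps(diff_profiler_configs(local_config, live_config), indent=2))
```

Whether a non-empty "changed" list should abort the job or only warn is
exactly the open question.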

On Thu, Sep 20, 2018 at 10:06 AM Justin Leet  wrote:

> The profiler not being able to read from ZK feels like a fairly substantial,
> if subtle, set of potential problems.  I'd like to see that in either
> before merging or at least pretty soon after merging.  Is it a lot of work
> to add that functionality based on where things are right now?
>
> On Thu, Sep 20, 2018 at 9:59 AM Nick Allen  wrote:
>
> > Here is another limitation that I just thought of. It can only read a
> profile
> > definition from a file.  It probably also makes sense to add an option
> that
> > allows it to read the current Profiler configuration from Zookeeper.
> >
> >
> > > Is it worth setting up a default config that pulls from the main
> indexing
> > output?
> >
> > Yes, I think that makes sense.  We want the Batch Profiler to point to
> the
> > right HDFS URL, no matter where/how Metron is deployed.  When Metron gets
> > spun-up on a cluster, I should be able to just run the Batch Profiler
> > without having to fuss with the input path.
> >
> >
> >
> >
> >
> > On Thu, Sep 20, 2018 at 9:46 AM Justin Leet 
> wrote:
> >
> > > Re:
> > >
> > > >  * You do not configure the Batch Profiler in Ambari.  It is
> configured
> > > > and executed completely from the command-line.
> > > >
> > >
> > > Is it worth setting up a default config that pulls from the main
> indexing
> > > output?  I'm a little on the fence about it, but it seems like making
> the
> > > most common case more or less built-in would be nice.
> > >
> > > Having said that, I do not consider that a requirement for merging the
> > > feature branch.
> > >
> > > On Wed, Sep 19, 2018 at 11:23 AM James Sirota 
> > wrote:
> > >
> > > > I think what you have outlined above is a good initial stab at the
> > > > feature.  Manual install of spark is not a big deal.  Configuring via
> > > > command line while we mature this feature is ok as well.  Doesn't
> look
> > > like
> > > > configuration steps are too hard.  I think you should merge.
> > > >
> > > > James
> > > >
> > > > 19.09.2018, 08:15, "Nick Allen" :
> > > > > I would like to open a discussion to get the Batch Profiler feature
> > > > branch
> > > > > merged into master as part of METRON-1699 [1] Create Batch
> Profiler.
> > > All
> > > > > of the work that I had in mind for our first draft of the Batch
> > > Profiler
> > > > > has been completed. Please take a look through what I have and let
> me
> > > > know
> > > > > if there are other features that you think are required *before* we
> > > > merge.
> > > > >
> > > > > Previous list discussions on this topic include [2] and [3].
> > > > >
> > > > > (Q) What can I do with the feature branch?
> > > > >
> > > > >   * With the Batch Profiler, you can backfill/seed profiles using
> > > > archived
> > > > > telemetry. This enables the following types of use cases.
> > > > >
> > > > >   1. As a Security Data Scientist, I want to understand the
> > > > historical
> > > > > behaviors and trends of a profile that I have created so that I can
> > > > > determine if I have created a feature set that has predictive value
> > for
> > > > > model building.
> > > > >
> > > > >   2. As a Security Data Scientist, I want to understand the
> > > > historical
> > > > > behaviors and trends of a profile that I have created so that I can
> > > > > determine if I have defined the profile correctly and created a
> > feature
> > > > set
> > > > > that matches reality.
> > > > >
> > > > >   3. As a Security Platform Engineer, I want to generate a
> > profile
> > > > > using archived telemetry when I deploy a new model to production so
> > > that
> > > > > models depending on that profile can function on day 1.
> > > > >
> > > > >   * METRON-1699 [1] includes a more detailed description of the
> > > feature.
> > > > >
> > > > > (Q) What work was completed?
> > > > >
> > > > >   * The Batch Profiler runs on Spark and was implemented in Java to
> > > > remain
> > > > > consistent with our current Java-heavy code base.
> > > > >
> > > > >   * The Batch Profiler is executed from the command-line. It can be
> > > > > launched using a script or by calling `spark-submit`, which may be
> > > useful
> > > > > for advanced users.
> > > > >
> > > > >   * Input telemetry can be consumed from multiple sources; for
> > example
> > > >
