Re: [DISCUSS] New Stellar Functions

2017-04-10 Thread Michael Miklavcic
Hey Kyle,

It probably belongs here -
https://github.com/apache/incubator-metron/blob/master/metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/StringFunctions.java
There is an existing JOIN function for strings - might this suit your
needs? I didn't see a unit test for it, so it would probably be good for us
to backfill with a test here as well. I can submit a PR for it, or if
you're already in that code, you're welcome to also -
https://github.com/apache/incubator-metron/blob/master/metron-platform/metron-common/src/test/java/org/apache/metron/common/dsl/functions/StringFunctionsTest.java

e.g.
Object joined = run("JOIN(['A','B','C','D'], ':')",new HashedMap());
System.out.println(joined);
Object joined2 = run("JOIN(['A','B','C','D'], '')",new HashedMap());
System.out.println(joined2);

Output I get is:
A:B:C:D
ABCD
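
For anyone backfilling the test, the expected semantics can be sketched in
plain Java. This is only an illustration of the output above, not the Metron
implementation; the class name JoinSketch and its join method are hypothetical
stand-ins.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; not the Metron JOIN implementation.
class JoinSketch {
    // Concatenates the string form of each element, separated by the delimiter.
    static String join(List<?> items, String delimiter) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                out.append(delimiter);
            }
            out.append(String.valueOf(items.get(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("A", "B", "C", "D");
        System.out.println(join(letters, ":")); // prints A:B:C:D
        System.out.println(join(letters, ""));  // prints ABCD
    }
}
```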

As for where to put these functions, I believe we have a number of options now
that we have the ability to add to the storm topology classpath and
sideload jars.
- https://github.com/apache/incubator-metron/pull/204
- https://github.com/apache/incubator-metron/pull/468

I think that if the functions are unique to a customer, they should
probably be built as a stand-alone Maven project. I believe Otto is working
on this at the moment, if I'm not mistaken. If they are universal (across all
functions in the system, whether parsing, analytics or otherwise) then they
should probably go in with the dsl package in metron-common. At some point
we might want to make Stellar its own module, but there is some work there.

Best,
Mike


On Sun, Apr 9, 2017 at 2:26 PM, Kyle Richardson 
wrote:

> I have the need for a new Stellar function to perform string concatenation.
> I have it implemented but am curious about where new functions should live
> given the new capabilities around 3rd party Stellar function libraries.
>
> So, I guess my question is, should this function live in:
> 1) metron-common with the other string functions
> 2) another metron project
> 3) as a standalone project and not part of the metron source tree
>
> While I'm specifically asking about this case, I think it's also worthwhile
> that we think about where other new functions should live in the long term.
>
> Thanks!
>
> -Kyle
>


Re: [GitHub] incubator-metron issue #507: METRON-819: Document kafka console producer par...

2017-04-07 Thread Michael Miklavcic
Can you try listing and applying acls with the root user instead of metron?

On Fri, Apr 7, 2017 at 10:29 AM, nickwallen  wrote:

> Github user nickwallen commented on the issue:
>
> https://github.com/apache/incubator-metron/pull/507
>
> I went through your instructions and all seemed well with the world.
> But then I tried to use the `kafka-console-producer` to actually write data
> to Kafka and it fails. Any ideas what the problem might be?
>
> ```
> [metron@node1 ~]$ kinit -kt /etc/security/keytabs/metron.headless.keytab
> met...@example.com
> [metron@node1 ~]$ echo "foo" | kafka-console-producer.sh
> --broker-list node1:6667 --topic yaf --security-protocol SASL_PLAINTEXT
> [2017-04-07 16:29:00,639] WARN The TGT cannot be renewed beyond the
> next expiry date: Sat Apr 08 16:28:58 UTC 2017.This process will not be
> able to authenticate new SASL connections after that time (for example, it
> will not be able to authenticate a new connection with a Kafka Broker).
> Ask your system administrator to either increase the 'renew until' time by
> doing : 'modprinc -maxrenewlife null ' within kadmin, or instead, to
> generate a keytab for null. Because the TGT's expiry cannot be further
> extended by refreshing, exiting refresh thread now.
> (org.apache.kafka.common.security.kerberos.KerberosLogin)
> [2017-04-07 16:29:00,897] WARN Error while fetching metadata with
> correlation id 0 : {yaf=TOPIC_AUTHORIZATION_FAILED}
> (org.apache.kafka.clients.NetworkClient)
> [2017-04-07 16:29:00,897] ERROR Error when sending message to topic
> yaf with key: null, value: 3 bytes with error: (org.apache.kafka.clients.
> producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TopicAuthorizationException: Not
> authorized to access topics: [yaf]
> ```
>
> I then tried to go back and check the Kafka ACLs and am now getting an
> error.  I was able to set the ACLs, but now I cannot see them.
>
> ```
> [metron@node1 ~]$ kinit -kt /etc/security/keytabs/metron.headless.keytab
> met...@example.com
> [metron@node1 ~]$ kafka-acls.sh --list --topic yaf
> --authorizer-properties zookeeper.connect=${ZOOKEEPER}:2181
> [2017-04-07 16:24:47,794] WARN Could not login: the client is being
> asked for a password, but the Zookeeper client code does not currently
> support obtaining a password from the user. Make sure that the client is
> configured to use a ticket cache (using the JAAS configuration setting
> 'useTicketCache=true)' and restart the client. If you still get this
> message after that, the TGT in the ticket cache has expired and must be
> manually refreshed. To do so, first determine if you are using a password
> or a keytab. If the former, run kinit in a Unix shell in the environment of
> the user who is running this Zookeeper client using the command 'kinit
> ' (where  is the name of the client's Kerberos principal). If
> the latter, do 'kinit -k -t  ' (where  is the name of
> the Kerberos principal, and  is the location of the keytab file).
> After manually refreshing your cache, restart this client. If you continue
> to see this message after manually refreshing your cache, ensure that
> your KDC host's clock is in sync with this host's
> clock. (org.apache.zookeeper.client.ZooKeeperSaslClient)
> [2017-04-07 16:24:47,796] WARN SASL configuration failed:
> javax.security.auth.login.LoginException: No password provided Will
> continue connection to Zookeeper server without SASL authentication, if
> Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn)
> Error while executing ACL command: Authentication failure
> org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication
> failure
> at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.
> java:946)
>
> ```
>
>


Re: Kerberos changes affected quick-dev and full-dev

2017-04-04 Thread Michael Miklavcic
Awesome, thanks David!

On Tue, Apr 4, 2017 at 7:16 AM, Casey Stella  wrote:

> Thanks David!
>
> On Mon, Apr 3, 2017 at 8:43 PM, David Lyle  wrote:
>
> > I've pushed a new Vagrant image for Quick Dev. You should be asked to
> > update the box the next time you 'vagrant up' Quick Dev.
> >
> > -D...
> >
> >
> > On Mon, Apr 3, 2017 at 2:33 PM, Casey Stella  wrote:
> >
> > > Thanks Justin,  the packer build is started, but this is going to take
> > some
> > > time.  Please use full-dev to validate your PRs in the meantime.  I
> will
> > > update this thread once it's uploaded.
> > >
> > > On Mon, Apr 3, 2017 at 2:13 PM, Justin Leet 
> > wrote:
> > >
> > > > The PR to fix full-dev is in master now.  We still need a new packer
> > > build
> > > > before we have quick-dev available again.
> > > >
> > > > Justin
> > > >
> > > > On Mon, Apr 3, 2017 at 10:53 AM, Justin Leet 
> > > > wrote:
> > > >
> > > > > Btw, here is a workaround for full-dev. In Ambari, add the line
> > > > "topology.worker.childopts="
> > > > > (no argument) to the elasticsearch.properties template, then
> restart
> > > > > > indexing through Ambari to propagate the change out.
> > > > >
> > > > > For example, make the Storm section look like:
> > > > >
> > > > > # Storm #
> > > > > indexing.workers=1
> > > > > indexing.executors=0
> > > > > topology.worker.childopts=
> > > > >
> > > > > Justin
> > > > >
> > > > > On Mon, Apr 3, 2017 at 10:46 AM, Casey Stella 
> > > > wrote:
> > > > >
> > > > >> Hey guys,
> > > > >>
> > > > >> Just a quick heads up, the kerberos related changes (797 and 793)
> > that
> > > > >> went
> > > > >> in last week had mpack changes.  This means that a new packer
> build
> > > > needs
> > > > >> to be updated for quickdev to work.  Unfortunately, that didn't
> > happen
> > > > >> *and* there's a follow-on bug (METRON-818) that also involves
> mpack
> > > > >> changes
> > > > >> (https://github.com/apache/incubator-metron/pull/506).
> > > > >>
> > > > >> Also, this change fixes a bug introduced in 797 with full-dev, so
> > it's
> > > > > >> getting high priority attention right now.  I just wanted to
> > send
> > > > an
> > > > >> update and make sure everyone was aware of what's going on if you
> > try
> > > > >> full-dev or quick-dev and it fails for you.  I expect 818 to get
> in
> > > > >> quickly
> > > > >> (already has 2 +1s, so pretty much as soon as travis returns we'll
> > > > >> commit),
> > > > >> which should fix full-dev.
> > > > >>
> > > > >> Casey
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Metron VP

2017-03-14 Thread Michael Miklavcic
+1 to Casey (nonbinding)

On Sun, Mar 12, 2017 at 10:20 PM, James Sirota  wrote:

> I would like to propose that Casey Stella be our VP upon graduation.  I
> think he has been the most outspoken proponent of the "Apache way" on our
> project and has made very significant contributions to moving it forward.
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>


Re: [DISCUSS][PROPOSAL] Acceptance Tests

2017-03-06 Thread Michael Miklavcic
Ok, yes I agree. In my experience with e2e/acceptance tests, they're best
kept general with an emphasis on verifying that all the plumbing works
together. So yes, there are definite edge cases I think we'll want to test
here, but I say that with the caveat that I think we should ideally cover
as many non-happy-path cases in unit and integration tests as possible. As
an example, I don't think it makes sense to cover most of the profiler
windowing DSL edge cases in acceptance tests instead of or in
addition to unit/integration tests unless there is something specific to
the integration with a given environment that we think could be
problematic.

M

On Mon, Mar 6, 2017 at 11:32 AM, Casey Stella <ceste...@gmail.com> wrote:

> No, I'm saying that they shouldn't be restricted to real-world use-cases.
> The E2E tests I laid out weren't real-world, but they did exercise the
> components similar to real-world use-cases.  They should also be able to
> tread outside of the happy-path for those use-cases.
>
> On Mon, Mar 6, 2017 at 6:30 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > "I don't think acceptance tests should loosely associate with real uses,
> > but they should
> > be free to delve into weird non-happy-pathways."
> >
> > Not following - are you saying they should *tightly* associate with real
> > uses and additionally include non-happy-path?
> >
> > On Fri, Mar 3, 2017 at 12:57 PM, Casey Stella <ceste...@gmail.com>
> wrote:
> >
> > > It is absolutely not a naive question, Matt.  We don't have a lot (or
> > any)
> > > docs about our integration tests; it's more of a "follow the lead" type
> > of
> > > thing at the moment, but that should be rectified.
> > >
> > > The integration tests spin up and down infrastructure in-process, some
> of
> > > which are real and some of which are mock versions of the services.
> > These
> > > are good for catching some types of bugs, but often things sneak
> through,
> > > like:
> > >
> > > > - HBase and Storm can't exist in the same JVM, so HBase is mocked in
> > > >   those cases.
> > > > - The FileSystem that we get for Hadoop is the RawLocalFileSystem, not
> > > >   truly HDFS.  There are differences and we've run into them...
> > > >   hilariously at times. ;)
> > > > - Things done statically in a bolt are shared across all bolts because
> > > >   they all are threads in the same process.
> > >
> > > It's good, it catches bugs, it lets us debug things easily, it runs
> with
> > > every single build automatically via travis.
> > > It's bad because it's awkward to get the dependencies isolated
> > sufficiently
> > > for all of these components to get them to play nice in the same JVM.
> > >
> > > Acceptance tests would be run against a real cluster, so they would:
> > >
> > >- run against real components, not testing or mock components
> > >- run against multiple nodes
> > >
> > > I can imagine a world where we can unify the two to a certain degree in
> > > many cases if we could spin up a docker version of Metron to run as
> part
> > of
> > > the build, but I think in the meantime, we should focus on providing
> > both.
> > >
> > > I suspect the reference application is possibly inspiring my
> suggestions
> > > here, but I think the main difference here is that the reference
> > > > application is intended to be informational from an end-user
> > > > perspective:
> > > it's detailing a use-case that users will understand.  I don't think
> > > acceptance tests should loosely associate with real uses, but they
> should
> > > be free to delve into weird non-happy-pathways.
> > >
> > > On Fri, Mar 3, 2017 at 2:16 PM, Matt Foley <ma...@apache.org> wrote:
> > >
> > > > Automating stuff that now has to be done manually gets a big +1.
> > > >
> > > > But, Casey, could you please clarify the relationship between what
> you
> > > > plan to do and the current “integration test” framework?  Will this
> be
> > in
> > > > the form of additional integration tests? Or a different test
> > framework?
> > > > Can it be done in the integration test framework, rather than
> creating
> > > new
> > > > mechanism?
> > > >
> > > > BTW, if that’s a naïve question, forgive me, but I could find zero
> > > > documentation for the existing in

Re: [DISCUSS] Wiki use and migration of docs

2017-03-06 Thread Michael Miklavcic
Just to clarify the point about migrating from the wiki - you'd like to see
use cases, demos, cookbooks, etc. moved into the git repo? I'm +1 for this
as it allows us to version the examples with each commit and release, a
feature that is currently lacking and more difficult to accomplish on the
wiki. We could always link the wiki and/or Metron website to the latest
master branch for up to date examples. I'd like to give people more avenues
to access information rather than fewer, but definitely do not want to
duplicate docs.

I think it also makes sense to keep the examples closer to the code.
As a compromise between project root level vs package level, keeping an
examples folder within each Maven module seems reasonable to me.
Examples.md seems ok. We might even choose to create a folder for examples
and name the md file by feature. Or we could create sub-folders naming the
feature and drop examples.md files in each folder. I'd also like to include
the concept of "cookbook" examples here as well. I think there is overlap
with demos and use cases, but cookbook examples can often be more specific
to a fine grained task.

Mike


On Mon, Mar 6, 2017 at 7:02 AM, zeo...@gmail.com  wrote:

> bump
>
> On Sat, Feb 11, 2017 at 2:15 PM zeo...@gmail.com  wrote:
>
> > This morning I had an opportunity to watch the video from yesterday's
> > community demo, and there was some really good discussion towards the end
> > about documentation of examples that I wanted to follow up with.  For
> > future reference, here is the recording of what I'm referring to - this is
> > all as a follow-up to Matt's great work via METRON-660.
> >
> > I am looking for feedback on an idea for the future of Metron
> > documentation.  At a high level, I would like to migrate materials from the
> > wiki pages throughout the git repo and modify our documentation generation
> > scripts to key in on tutorials vs readmes.  Once we have agreement on this
> > I would be happy to handle any data migration and manipulation as necessary.
> >
> > More specifically, I would like to establish a convention for the names of
> > example or tutorial md files that we could then use when generating the
> > release documentation.  Say we use "examples.md", we could then generate
> > an examples/tutorials top level area in the site-docs without having to add
> > it into the git repo itself.  In addition, this lets the examples.md
> > files exist more closely to the code they are about, which seems to be the
> > preference of most people currently working on the project.
> >
> > A good example of this would be to break Casey's outlier analysis example
> > (metron-analytics/metron-statistics#outlier-analysis) into a new
> > examples.md in the same directory.  I would think more generalized
> > examples/tutorials would exist in the root of the git repo.  I'm also game
> > for arguments that we take another approach, such as making a new top level
> > folder in the repo for all examples/tutorials, but that would be less
> > preferred in my opinion.
> >
> > We could probably move the overview, architecture, tutorials, and
> > governance wiki materials without much of an issue.  Pages like the tech
> > talks and community information probably fit better in the Metron site
> > area of GitHub, and not as a md.  The items that I wouldn't be sure about
> > migrating are things like the user research or meeting notes.
> > Is there still value in having these materials published?  Maybe we leave
> > them behind in the Wiki and use it as more of an archive store for
> > historical context?
> >
> > If I don't get any strong disagreement with this idea, I'm going to throw
> > together a first attempt.  I also opened a ticket for this - METRON-714.
> >
> > Jon
> > --
> >
> > Jon
> >
> > Sent from my mobile device
> >
> --
>
> Jon
>
> Sent from my mobile device
>


Re: [DISCUSS][PROPOSAL] Acceptance Tests

2017-03-06 Thread Michael Miklavcic
"I don't think acceptance tests should loosely associate with real uses,
but they should
be free to delve into weird non-happy-pathways."

Not following - are you saying they should *tightly* associate with real
uses and additionally include non-happy-path?

On Fri, Mar 3, 2017 at 12:57 PM, Casey Stella  wrote:

> It is absolutely not a naive question, Matt.  We don't have a lot (or any)
> docs about our integration tests; it's more of a "follow the lead" type of
> thing at the moment, but that should be rectified.
>
> The integration tests spin up and down infrastructure in-process, some of
> which are real and some of which are mock versions of the services.  These
> are good for catching some types of bugs, but often things sneak through,
> like:
>
> - HBase and Storm can't exist in the same JVM, so HBase is mocked in
>   those cases.
> - The FileSystem that we get for Hadoop is the RawLocalFileSystem, not
>   truly HDFS.  There are differences and we've run into them...
>   hilariously at times. ;)
> - Things done statically in a bolt are shared across all bolts because
>   they all are threads in the same process.
>
> It's good, it catches bugs, it lets us debug things easily, it runs with
> every single build automatically via travis.
> It's bad because it's awkward to get the dependencies isolated sufficiently
> for all of these components to get them to play nice in the same JVM.
>
> Acceptance tests would be run against a real cluster, so they would:
>
>- run against real components, not testing or mock components
>- run against multiple nodes
>
> I can imagine a world where we can unify the two to a certain degree in
> many cases if we could spin up a docker version of Metron to run as part of
> the build, but I think in the meantime, we should focus on providing both.
>
> I suspect the reference application is possibly inspiring my suggestions
> here, but I think the main difference here is that the reference
> application is intended to be informational from an end-user perspective:
> it's detailing a use-case that users will understand.  I don't think
> acceptance tests should loosely associate with real uses, but they should
> be free to delve into weird non-happy-pathways.
>
> On Fri, Mar 3, 2017 at 2:16 PM, Matt Foley  wrote:
>
> > Automating stuff that now has to be done manually gets a big +1.
> >
> > But, Casey, could you please clarify the relationship between what you
> > plan to do and the current “integration test” framework?  Will this be in
> > the form of additional integration tests? Or a different test framework?
> > Can it be done in the integration test framework, rather than creating a
> > new mechanism?
> >
> > BTW, if that’s a naïve question, forgive me, but I could find zero
> > documentation for the existing integration test capability, neither wiki
> > pages nor READMEs nor Jiras.  If there are any docs, please point me at
> > them.  Or even archived email threads.
> >
> > There is also something called the “Reference Application”
> > https://cwiki.apache.org/confluence/display/METRON/
> > Metron+Reference+Application which sounds remarkably like what you
> > propose to automate.  Is there / can there / should there be a
> > relationship?
> >
> > Thanks,
> > --Matt
> >
> > On 3/3/17, 7:40 AM, "Otto Fowler"  wrote:
> >
> > +1
> >
> > I agree with Justin’s points.
> >
> >
> > On March 3, 2017 at 08:41:37, Justin Leet (justinjl...@gmail.com)
> > wrote:
> >
> > +1 to both. Having this would especially ease a lot of testing that
> > hits
> > multiple areas (which there is a fair amount of, given that we're
> > building
> > pretty quickly).
> >
> > I do want to point out that adding this type of thing makes the speed
> > of
> > our builds and tests more important, because they already take up a
> > good
> > amount of time. There are obviously tickets to optimize these things,
> > but
> > I would like to make sure we don't pile too much on to every testing
> > cycle
> > before a PR. Having said that, I think the testing proposed is
> > absolutely
> > valuable enough to go forward with.
> >
> > Justin
> >
> > On Fri, Mar 3, 2017 at 8:33 AM, Casey Stella 
> > wrote:
> >
> > > I also propose, once this is done, that we modify the developer
> > bylaws
> > and
> > > the github PR script to ensure that PR authors:
> > >
> > > - Update the acceptance tests where appropriate
> > > - Run the tests as a smoketest
> > >
> > >
> > >
> > > On Fri, Mar 3, 2017 at 8:21 AM, Casey Stella 
> > wrote:
> > >
> > > > Hi All,
> > > >
> > > > After doing METRON-744, where I had to walk through a manual test
> > of
> > > every
> > > > place that Stellar touched, it occurred to me that we should
> script
> > this.
> > > > It also occurred to me that some 

Re: Metron Travis Build parameters

2017-02-28 Thread Michael Miklavcic
I can't think of any. I am +1 for adding it.

On Tue, Feb 28, 2017 at 1:38 PM, Otto Fowler 
wrote:

> 1395.36s$ time mvn -q -T 2C -DskipTests install && time mvn -q -T 2C
> surefire:test@unit-tests && mvn -q surefire:test@integration-tests && time
> build_utils/verify_licenses.sh
>
> Is there a reason we don’t ‘time’ the integration tests?
>


Re: [DISCUSS] Making adding new 3rd-party Stellar functions easier

2017-02-28 Thread Michael Miklavcic
Just so I'm clear Otto, when you discuss the UBER jar, this is a basic
parser jar with the system libraries from Metron. And based on METRON-744,
we would now be able to load parser jars that are created with the Maven
archetypes developed in METRON-258 by loading them into a specific location
in HDFS and simply restarting the topology? METRON-744 addresses Stellar
functions, so it sounds like we would just need to have METRON-258 expand
this capability to allow more than just custom classloading for Stellar
functions. I like the sound of this. Have I missed anything?
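
To make the side-loading idea concrete, here is a minimal sketch of building a
class loader over a directory of jars. It assumes a local directory standing in
for the HDFS location, and the names SideloadSketch and loaderFor are
hypothetical; it is not how METRON-744 or METRON-258 actually do it.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a local directory stands in for the HDFS location.
class SideloadSketch {
    // Collects every *.jar in the directory and exposes them through a
    // child class loader that delegates to the given parent first.
    static URLClassLoader loaderFor(Path jarDir, ClassLoader parent) throws IOException {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(jarDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sideload");
        // Empty placeholder file; nothing is ever loaded from it in this demo.
        Files.createFile(dir.resolve("custom-parser.jar"));
        try (URLClassLoader loader = loaderFor(dir, SideloadSketch.class.getClassLoader())) {
            System.out.println(loader.getURLs().length + " jar(s) on the sideload path");
        }
    }
}
```

Restarting the topology would then amount to rebuilding this loader so that
newly dropped jars are picked up.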

On Tue, Feb 28, 2017 at 10:36 AM, Otto Fowler 
wrote:

> Casey and I had some discussion on this today, and this is what we
> roughed out with regards to custom class loaders and parsers. We'd like to
> throw it out there as a way forward.
>
> * See how METRON-744 goes, getting the class loading base framework going.
> * Look at METRON-258, which is the initial side loading work for parsers (
> which itself is dependent on the METRON-671 ).
> * We change the bolt’s parser loading to use the new class loading
> mechanism, but with the same jars, as well as other areas where it would
> make sense to do so.
>
> * Evolve METRON-258’s ‘archive’ deployment to a more custom deployment
> similar to NiFi’s nars
> ** This would include changing the parser’s dependencies on ‘metron’ jars
> to be provided
> ** This would include a /repo deployment  for the jars the parser depends
> on ( outside of system jars )
> ** This would include archetype changes
> ** Expand the class loading to support loading the repos
> ** This would include deployment of parsers to hdfs
>
> The idea here being, there is only one UBER jar submitted to storm for
> parsers.  It contains the metron system frameworks.  By configuration, it
> creates a custom class loader for the parser/repo parser/lib directories
> and loads and runs the parser then.
>
>
> On February 28, 2017 at 09:28:31, Casey Stella (ceste...@gmail.com) wrote:
>
> I started tinkering with the idea of the classloader to see about how hard
> it would be and if it would even be feasible and realized it pretty much
> writes itself for a MVP, so I submitted a PR (sans testing plan, which I'll
> get to today): METRON-744 (
> https://github.com/apache/incubator-metron/pull/468)
>
> This would essentially conform to the second step above.
>
> On Mon, Feb 27, 2017 at 2:52 PM, Matt Foley  wrote:
>
> > Couple thoughts:
> >
> > 1. I see the Accumulo class loader allows multiple clients with
> > potentially conflicting loads, via the “context” mechanism. That’s good.
> > NiFi also used a multi-classloader mechanism to support potentially
> > conflicting side-loads of their Processor bundles (“nars”), but I don’t
> > think they supported re-loading (altho it’s been a few months since I
> > looked at it).
> >
> > 2. I like the idea of loading from a configured location in HDFS. This
> > gives a far smaller scope of filesystem to be watched and/or searched,
> and
> > of course obviates the deploy-to-many-servers problem. Altho it costs
> > another upload/maintenance tool for the admin to fiddle with.
> >
> > Thanks,
> > --Matt
> >
> > On 2/27/17, 11:22 AM, "Casey Stella"  wrote:
> >
> > Hi All,
> >
> > The benefit of Stellar is that adding new functionality is as simple as
> > providing a Jar. This enables people who want to integrate with Metron
> > to easily add enrichments or other functionality. The snag currently with
> > this is that we provide a single jar, so all stellar functions that we have
> > available must be dependencies of the main jar that drives the topology
> > plus what local directories we can configure via the storm configs.
> > This
> > makes the process of adding 3rd party jars not as easy as it could be.
> >
> > What I'm proposing is the following and I'd like to get some community
> > feedback on it:
> >
> > - Split the stellar lang into its own project which does not shade
> > its
> > dependencies from metron-common
> > - this makes creating your own stellar functions easier as you
> > only
> > need depend on a small project
> > - Adjust the following to additionally load classes from a
> > location
> > in HDFS /apps/metron/stellar using something like accumulo (
> > https://accumulo.apache.org/blog/2014/05/03/accumulo-
> > classloader.html)
> > - Profiler topology
> > - Parser topology
> > - Enrichment topology
> > - Enrichment Flat file loader
> > - Enrichment MR loader
> > - Make the classloader reload upon new files
> > - This would necessitate a new Stellar FunctionResolver
> >
> > I'd like to propose starting with the first two and attempting the
> > third
> > after we get something stable with the first 2.
> >
> > What this will give us is the following workflow to enable new stellar
> > functions:
> >
> > - Build your function depending on stellar-lang into a Jar
> > - 

Re: [DISCUSS] Top domains enrichment config/extractor management

2017-02-24 Thread Michael Miklavcic
The reason I posed this question to the community is because I started to
recognize some of the shortcomings of doing this solely through Ambari, as
you and Nick have pointed out. I think an Ambari view over the management
UI is a great idea. And I'd love to see us provide a more robust mechanism
for loading these enrichments via the management UI. As you said, perhaps
Ambari could be used to manage the ZK config around active
enrichments/locations (the "USE" part of it) while the management UI is
used for actually loading and managing the enrichments themselves?


On Fri, Feb 24, 2017 at 8:12 AM, Casey Stella <ceste...@gmail.com> wrote:

> Late to chime in here, but I feel that we have discussed Ambari's role
> before and I think we should probably clarify, as a community, a few things
> with regard to Ambari vs a management UI built around the REST PR currently
> under review.  (I promise, I will get to the topic at hand eventually ;) :
>
>- Where functionality should live
>- Who is responsible for what
>
> I will now make a couple (possibly controversial) statements (some of
> which) we have actually discussed prior to this on the dev list:
>
>
>- I view Ambari as managing the install and the static configuration for
>Metron.  For us, this would include zookeeper configs as well as
> topology
>configuration.  This would be the persistent store of truth.
>- I view Zookeeper to be our runtime configuration store for the
>topologies.
>
>
>- I view a management UI (and the Stellar Shell) as managing
>functionality for interacting with the system.  Where it changes
>configuration, it must go through Ambari.
>- I believe the management UI should be exposed as an ambari view
>
> As such, I see the importation and management of enrichments, which is a
> data task, to be squarely in the purview of the management UI, whose job is
> the care and feeding of the data.  That being said, any configuration
> changes to USE the enrichment should at least be routed through ambari, but
> should be managed in the UI.
>
> Now the question becomes, should we have enrichment collateral (I'm
> including both hbase as well as geo or anything else we have) loaded at
> install-time.  I would argue that we should not.  Rather, we should design
> the management UI so that the enrichments can be added easily, with a
> wizard to enable the use of the enrichment via stellar for a sensor
>
> On that topic, I think we are doing too much as part of our install.  I
> would argue that we shouldn't pre-load even the geo data or depend on it
> for the default parsers.
>
> Casey
>
>
>
> On Tue, Feb 21, 2017 at 6:31 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > With the work committed in
> > https://github.com/apache/incubator-metron/pull/445 and
> > https://github.com/apache/incubator-metron/pull/432, we now have a robust
> > and flexible means to import enrichment sources and transform their
> > contents as they are inserted into HBase. One of the main motivators for
> > this new functionality was to add the ability to load top domain rankings
> > from sources such as Alexa. The proposal is to make this type of enrichment
> > a top-level feature in Metron by introducing it to the Ambari management UI
> > as a configurable set of properties in the MPack install. This comes with
> > some options and challenges in how we want to manage the configurations,
> > which I will outline below.
> >
> > *Use cases:*
> >
> >    - Single load of top domains file
> >    - Re-loading top domains file - need to be able to cleanup properly
> >    - Cleaning up/deleting old enrichment data (this is a general feature
> >    that we currently lack - I think it is worth a separate Jira/PR for
> >    creating a MapReduce job that enables cleanup to occur).
> >    - Modifying default top domains file source - there are other options
> >    besides Alexa. And users may want to load a file from local URI since many
> >    data centers do not have direct access to the internet.
> >    - Ability to modify the default extractor config JSON and tune the
> >    Stellar transformations for both the value and indicator transforms. Allows
> >    more flexible handling of data based on other sources.
> >    - Loading multiple top domains source enrichments. (Maybe a separate PR
> >    for this if we even think it would be useful)
> >    - Updating the top domain enrichment - This needs to be an atomic
> >    operation in order to prevent incorrect data.
> >    - Rolling back to an older version of the top domains enrichment.

Re: [DISCUSS] Top domains enrichment config/extractor management

2017-02-24 Thread Michael Miklavcic
(1) Agreed on supporting n data sources and their lifecycle. I don't
believe we are currently managing updating the Geo enrichments via Ambari,
but I definitely think this solution should handle that in a
datastore-agnostic way as well. The file is loaded into HDFS, which is a
bit different.
(2) Again agreed on not wanting every enrichment in all environments. And
for supporting multiple enrichment types, I do not believe a dropdown in
Ambari is the appropriate choice. That will most definitely not scale, imho.
(3) Yes
(4) Also yes. I'm leaning towards us providing the ability to load
enrichments via a zip bundle. I had recommended something similar a long
while back for Apache Falcon. It's clean, simple, and allows us to provide
some sort of manifest for defining the import. This also has the advantage
of us being able to potentially version the manifest format and options.
MPack is not an equivalent mechanism for this.


On Fri, Feb 24, 2017 at 7:30 AM, Nick Allen <n...@nickallen.org> wrote:

> >
> >
> > we now have a robust
> > and flexible means to import enrichment sources and transform their
> > contents as they are inserted into HBase. One of the main motivators for
> > this new functionality was to add the ability to load top domain rankings
> > from sources such as Alexa. The proposal is to make this type of enrichment
> > a top-level feature in Metron by introducing it to the Ambari management UI
>
>
> (1) In thinking through how the UI should work here, we should consider
> data sources beyond just those that would be loaded in HBase.  I would
> think the UI should be a single view of all data sources, no matter whether
> they load into HBase or not.
>
> It would also be good to think through how the solution might handle
> updating other types of data source, like the geo data, for instance. The
> geo data is something that needs to be updated on a regular basis.  Could
> this solution also manage that?
>
> I know Maxmind has a bit of code to manage updating their data, but I am
> not familiar with what it does or how it works.  Researching that might
> help inform this conversation.
>
>
> > How do folks feel about adding a set of dropdown options in the Ambari UI
> > for loading, updating, and deleting the top domains enrichment?
>
>
> (2) I think if this functionality is truly useful, there are likely going to
> be lots of different data sources that would be made available, many of
> which will NOT be applicable or desirable in every environment.
>
> This would be akin to packages or RPMs that are available to install on
> CentOS.  There are many to choose from, but in my specific environment
> there are many that I do not care about.
>
> Is an Ambari drop down scalable considering this usage pattern?
>
> > Do we want Ambari to handle only the
> > initial install/load and have end users be responsible on an ongoing basis
> > for updates (users would be responsible for copying or distributing the
> > extractor_config.json for instance), or do we want to enable Ambari to
> > manage the configuration ongoing and enable functionality for reloading,
> > updating, and rollback?
>
>
> (3) Whatever solution we land on, it should handle refreshing/reloading the
> data on a regular basis.  This is something that has to be done for almost
> every useful data source and so should be baked into the solution. I don't
> think the functionality is that useful otherwise.
>
> (4) Another thing to consider is extensibility and ease of use.  If we can
> make it really easy to provide a means for loading a data source into
> Metron, then it is more likely that we will have community members willing
> to do that work.
>
> For example, think about the Homebrew project.  They make it stupid simple
> to add a new installable package.  You don't have to know how Homebrew
> works to contribute a package.  The result is they have tons of packages
> available.
>
> Does the Ambari MPack provide the right level of ease of use for that?
>
>
>
>
>
> On Tue, Feb 21, 2017 at 6:31 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > With the work committed in
> > https://github.com/apache/incubator-metron/pull/445 and
> > https://github.com/apache/incubator-metron/pull/432, we now have a
> robust
> > and flexible means to import enrichment sources and transform their
> > contents as they are inserted into HBase. One of the main motivators for
> > this new functionality was to add the ability to load top domain rankings
> > from sources such as Alexa. The proposal is to make this type of
> enrichment
> > a top-level feature in Metron by introducing it to the Am

Re: [DISCUSS] Sketch Libraries

2017-02-22 Thread Michael Miklavcic
@Casey That's correct - the stream-lib library has an HLL+ implementation
that not only works better for small data set sizes, but also scales to
much larger cardinalities than the previous HLL algorithm. The DataSketches
library appears to be a vanilla HLL implementation.

On Wed, Feb 22, 2017 at 7:11 AM, Casey Stella  wrote:

> So looking at it, it seems to fit the bill, with a couple of comments:
>
>    - The quantiles stuff provides a CDF and PMF function, which is
>    sufficient for our purposes.  I haven't seen any real comparison between
>    t-digests and their approach.  A cursory glance at the source code leads me
>    to believe that it's not tree-based, so I'd have to dig into it a bit more
>    to understand the tradeoffs of their approach vs a tree-based approach like
>    in t-digest
>    - The HLL stuff seems to be pure HLL, rather than HLL+, which is what we
>    support.  HLL+ has better accuracy characteristics for small sets, as I
>    recall.  I'll defer to Mike Miklavcic on that as I haven't read the paper
>    in a while.
>
> On the whole, I'd love to integrate with it and maybe swap out the t-digest
> approach for this since it has an active community around it.
>
> Anyway, thanks for bringing it to our attention and if anyone wants to take
> that on, I'd be on board with a +1 ;)
>
> Casey
>
> On Tue, Feb 21, 2017 at 10:22 PM, Matt Foley  wrote:
>
> > Looks interesting.  Any indication whether it supports MAD (median
> > absolute deviation) for outlier detection?
> >
> >
> > On 2/21/17, 8:08 AM, "Nick Allen"  wrote:
> >
> > We currently use the tdunning/t-digest library for generating our STATS_*
> > sketches and then a separate library addthis/stream-lib for doing the HLL
> > distinct count.
> >
> > I ran across another library originating from Yahoo that looks quite
> > featureful, well documented and quite active.  On the surface it *seems* to
> > be able to do what we need for both the STATS_* sketches and HLL.
> >
> > https://datasketches.github.io/
> >
> >
> > Has anyone evaluated this library before?  Are there deficiencies as
> > compared to the libraries that we currently use?
> >
> >
> >
> >
>


Re: [DISCUSS] 0.3.1 Release situation

2017-02-22 Thread Michael Miklavcic
+1 on RC5 from master/HEAD

On Wed, Feb 22, 2017 at 10:16 AM, Nick Allen  wrote:

> +1 on re-cutting RC5 from head.  We're going to have to go through the
> same level of effort either way.  Might as well get more value out of it.
>
>
> On Wed, Feb 22, 2017 at 11:43 AM, Casey Stella  wrote:
>
> > I'm in favor of moving 0.3.1 RC5 concurrent with master.  I see a number of
> > things there that will make the release better:
> >
> >    - Better docs in the doc-book
> >    - The CEF parser
> >
> >
> > Casey
> >
> > On Wed, Feb 22, 2017 at 7:46 AM, Kyle Richardson <
> > kylerichards...@gmail.com>
> > wrote:
> >
> > > +1 on pulling and cutting a new RC. Would we simply patch rc4 with this
> > one
> > > change or include all of the master commits too?
> > >
> > > -Kyle
> > >
> > > On Wed, Feb 22, 2017 at 10:29 AM, Nick Allen 
> wrote:
> > >
> > > > +1 I agree with you Casey.  I think we should re-cut the release.
> > > >
> > > > On Wed, Feb 22, 2017 at 10:27 AM, Casey Stella 
> > > wrote:
> > > >
> > > > > > As you are all aware by now, we have an issue with our maven build.
> > > > > > In short, we tripped on
> > > > > > https://github.com/maxmind/GeoIP2-java/issues/77
> > > > > >
> > > > > > As such, our build no longer works, but also our RC for 0.3.1 no longer
> > > > > > builds.  I am inclined to pull the release candidate from voting on
> > > > > > incubator general and re-cut a new candidate after the fix METRON-734
> > > > > > (https://github.com/apache/incubator-metron/pull/462) gets in later today.
> > > > > > My reasoning is that the current situation makes the release candidate
> > > > > > un-releasable due to it not being able to be built.
> > > > > >
> > > > > > I would like to bring that decision to the community and get some feedback,
> > > > > > though, before I summarily retract the candidate on incubator general.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Best,
> > > > >
> > > > > Casey
> > > > >
> > > >
> > >
> >
>


[DISCUSS] Top domains enrichment config/extractor management

2017-02-21 Thread Michael Miklavcic
With the work committed in
https://github.com/apache/incubator-metron/pull/445 and
https://github.com/apache/incubator-metron/pull/432, we now have a robust
and flexible means to import enrichment sources and transform their
contents as they are inserted into HBase. One of the main motivators for
this new functionality was to add the ability to load top domain rankings
from sources such as Alexa. The proposal is to make this type of enrichment
a top-level feature in Metron by introducing it to the Ambari management UI
as a configurable set of properties in the MPack install. This comes with
some options and challenges in how we want to manage the configurations,
which I will outline below.

*Use cases:*

   - Single load of top domains file
   - Re-loading top domains file - need to be able to cleanup properly
   - Cleaning up/deleting old enrichment data (this is a general feature
   that we currently lack - I think it is worth a separate Jira/PR for
   creating a MapReduce job that enables cleanup to occur).
   - Modifying default top domains file source - there are other options
   besides Alexa. And users may want to load a file from local URI since many
   data centers do not have direct access to the internet.
   - Ability to modify the default extractor config JSON and tune the
   Stellar transformations for both the value and indicator transforms. Allows
   more flexible handling of data based on other sources.
   - Loading multiple top domains source enrichments. (Maybe a separate PR
   for this if we even think it would be useful)
   - Updating the top domain enrichment - This needs to be an atomic
   operation in order to prevent incorrect data.
   - Rolling back to an older version of the top domains enrichment. Also
   needs to be atomic.
   - Ability to run an enrichment load on a schedule - we would like to
   defer this to an external scheduling mechanism, e.g. cron or Control M. The
   enrichment loading system should have the necessary features to enable this
   type of automation without data integrity issues.

*Considerations:*

   - As mentioned above, we want to add this feature to the Ambari MPack.
   This requires at least 2 parameters to work. We need the ability to specify
   a URI as well as an extractor config.
   - How do we want to manage the extractor config? The most obvious
   solution is to provide a text field in Ambari with a default JSON config.
   When a load is initiated, Ambari would place a fresh copy of the extractor
   config in the /tmp/ directory. This is an ephemeral file that isn't needed
   other than during a load.
   - It seems easy enough to have the load occur during the initial
   install, however subsequent loads would require a different workflow. How
   do folks feel about adding a set of dropdown options in the Ambari UI for
   loading, updating, and deleting the top domains enrichment? I believe we
   are doing something similar for the ElasticSearch templates currently.
   - In the case of atomic operations for updates and rollbacks, I propose
   we add a property to Zookeeper that is reference-able in the enrichment
   itself. The idea would be to create a "top-domains" property in ZK that
   points to an enrichment key with a load timestamp associated with it, e.g.
   top-domains_20170221042000. This would also allow a mapreduce job to be
   written that cleans up old enrichments. Another option is to create a new
   table in HBase if/when you update the enrichment and change the enrichment
   config manually. Deleting an old enrichment would simply be a matter of
   dropping the table in HBase. A relevant discussion of the tradeoffs of
   having many small tables versus 1 large table can be found here -
   http://grokbase.com/t/hbase/user/11bjbdw94q/multiple-tables-vs-big-fat-table
   - In order to update or rollback an enrichment as mentioned above, we
   would also ideally provide a mechanism for changing the rowkey pointed to
   by the enrichment.
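
The pointer idea in the last two considerations can be sketched in a few
lines. This is only an in-memory illustration, not Metron or ZooKeeper code:
the class name EnrichmentPointer and its methods are hypothetical, and a
synchronized deque stands in for the "top-domains" property that would live
in ZK. The point it tries to show is that an update or rollback only changes
which timestamped key readers resolve, so a half-loaded enrichment is never
visible.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical in-memory stand-in for the "top-domains" property proposed for ZK.
class EnrichmentPointer {
  private static final DateTimeFormatter STAMP =
      DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

  private final String name;
  // Most recent version first; the head is the key that readers resolve.
  private final Deque<String> versions = new ArrayDeque<>();

  EnrichmentPointer(String name) {
    this.name = name;
  }

  // Builds a versioned enrichment key such as top-domains_20170221042000.
  String keyFor(LocalDateTime loadTime) {
    return name + "_" + STAMP.format(loadTime);
  }

  // Flip the pointer only after the data has been fully loaded under this key.
  synchronized void publish(String key) {
    versions.push(key);
  }

  // The key that enrichment lookups should currently use, or null if none.
  synchronized String current() {
    return versions.peek();
  }

  // Drop the newest version and return the one that becomes current again.
  synchronized String rollback() {
    if (versions.size() < 2) {
      throw new IllegalStateException("no older version to roll back to");
    }
    versions.pop();
    return versions.peek();
  }

  public static void main(String[] args) {
    EnrichmentPointer pointer = new EnrichmentPointer("top-domains");
    String first = pointer.keyFor(LocalDateTime.of(2017, 2, 21, 4, 20, 0));
    String second = pointer.keyFor(LocalDateTime.of(2017, 2, 22, 4, 20, 0));
    pointer.publish(first);
    pointer.publish(second);
    System.out.println(pointer.current());
    System.out.println(pointer.rollback());
  }
}
```

Keys that are no longer referenced by the pointer are the ones a cleanup job
could safely delete. A real version would need the pointer flip to be a
single write to ZK rather than a JVM lock.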

In summary of the use cases and considerations above, this boils down to
how we'd like to leverage Ambari here. Do we want Ambari to handle only the
initial install/load and have end users be responsible on an ongoing basis
for updates (users would be responsible for copying or distributing the
extractor_config.json for instance), or do we want to enable Ambari to
manage the configuration ongoing and enable functionality for reloading,
updating, and rollback?

Best,
Mike


Re: [DISCUSS] Coding style via checkstyle

2017-02-21 Thread Michael Miklavcic
Justin, just to clarify, the default Sun conventions don't fully abide by
our style guide, correct? As you mention above, we would need to create a
custom checkstyle.xml to handle "extended the character limit of a line
past 80 and made it two space indents"

On Tue, Feb 21, 2017 at 3:22 PM, Justin Leet <justinjl...@gmail.com> wrote:

> I'm also +1 to blanket formatting.
>
> As a note, we'll also need to set most of the sun checkstyle violations to
> warnings and only fail the build on error.  Even the quick and dirty
> reformat I tried only got rid of several thousand violations, leaving well
> over 10k violations.  We'll need to set it to only fail on code styling
> violations. If we're clever, we could probably even set it up so that test
> classes don't even throw warnings on things like magic numbers, but that's
> not at all necessary for right now.
>
> The current PR also does not include a custom checkstyle.xml. It's just
> reporting on the default Sun conventions as a start.  We can pretty easily
> take that and modify it, though.  I'd also say we should include the
> IntelliJ codestyle template along with instructions for installing in that
> portion of effort. I could be pretty easily persuaded to include it in this
> PR also though, if we want.
>
> That checking is not part of the current PR (checkstyle is only in the
> reporting element, not the build element).  It's easy enough to set up
> though.  That setup is discussed at maven-checkstyle-plugin
> <https://maven.apache.org/plugins/maven-checkstyle-plugin/usage.html>.
>
>
> On Tue, Feb 21, 2017 at 2:13 PM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > I would also like the IDEA configuration.
> >
> >
> > On February 21, 2017 at 17:06:09, Casey Stella (ceste...@gmail.com)
> wrote:
> >
> > +1 to blanket reformat as well.
> >
> > On Tue, Feb 21, 2017 at 1:25 PM, Otto Fowler <ottobackwa...@gmail.com>
> > wrote:
> >
> > > +1. I agree with Michael’s points.
> > >
> > >
> > > On February 21, 2017 at 16:23:21, Michael Miklavcic (
> > > michael.miklav...@gmail.com) wrote:
> > >
> > > +1 to a blanket reformat, failed build for improper formatting, and
> > > automated formatting. I strongly prefer to remove "thinking" from my
> code
> > > formatting and it has worked very well for me on large projects in the
> > > past. There is capability now in IntelliJ to work with Checkstyle as
> > well.
> > > https://youtrack.jetbrains.com/issue/IDEA-61520#comment=27-1292600
> > > https://plugins.jetbrains.com/idea/plugin/1065-checkstyle-idea
> > >
> > > A quick search didn't yield any obviously robust tools for automating
> the
> > > formatting other than an older non-maintained project named Jalopy. I
> > think
> > > the checkstyle integration with IntelliJ and Eclipse should suffice
> since
> > > the Maven plugin would give devs the ability to run checks locally and
> in
> > > Github via Travis.
> > >
> > >
> > > On Tue, Feb 21, 2017 at 12:32 PM, Nick Allen <n...@nickallen.org>
> wrote:
> > >
> > > > I would be in favor of a blanket, reformat. Whether that is for the
> > > entire
> > > > code base or one project at a time. Might be able to conquer and
> divide
> > > > some of the heavy-lifting of testing, if we do a project at a time.
> But
> > > > whichever way you think is easier. I'd be glad to help.
> > > >
> > > > On Tue, Feb 21, 2017 at 1:57 PM, Justin Leet <justinjl...@gmail.com>
> > > > wrote:
> > > >
> > > > > I already tried a blanket, manual reformat the other day, through
> > > > > IntelliJ. I did every file matching *.java in the project and it
> was
> > > > > pretty quick. I didn't validate everything looked perfect
> afterwards,
> > > > but I
> > > > > did click into a few files and things looked fine. I'm not quite
> sure
> > > > what
> > > > > the lifecycle of our autogenerated stuff is, so we'd want to regen
> > > > > afterwards, but it's a pretty trivial thing to do.
> > > > >
> > > > > I'm sure there's more nuance (and definitely more testing) than
> that,
> > > but
> > > > > off the top of my head I'm not sure what it would be. Either way, I
> > > don't
> > > > > think there's a huge amount of effort to just do the reformat, but
> > we'd
> > > > > still want to spin everything up an

Re: [DISCUSS] Coding style via checkstyle

2017-02-21 Thread Michael Miklavcic
+1 to a blanket reformat, failed build for improper formatting, and
automated formatting. I strongly prefer to remove "thinking" from my code
formatting and it has worked very well for me on large projects in the
past. There is capability now in IntelliJ to work with Checkstyle as well.
https://youtrack.jetbrains.com/issue/IDEA-61520#comment=27-1292600
https://plugins.jetbrains.com/idea/plugin/1065-checkstyle-idea

A quick search didn't yield any obviously robust tools for automating the
formatting other than an older non-maintained project named Jalopy. I think
the checkstyle integration with IntelliJ and Eclipse should suffice since
the Maven plugin would give devs the ability to run checks locally and in
Github via Travis.


On Tue, Feb 21, 2017 at 12:32 PM, Nick Allen  wrote:

> I would be in favor of a blanket, reformat.  Whether that is for the entire
> code base or one project at a time.  Might be able to conquer and divide
> some of the heavy-lifting of testing, if we do a project at a time.  But
> whichever way you think is easier.  I'd be glad to help.
>
> On Tue, Feb 21, 2017 at 1:57 PM, Justin Leet 
> wrote:
>
> > I already tried a blanket, manual reformat the other day, through
> > IntelliJ.  I did every file matching *.java in the project and it was
> > pretty quick. I didn't validate everything looked perfect afterwards,
> but I
> > did click into a few files and things looked fine. I'm not quite sure
> what
> > the lifecycle of our autogenerated stuff is, so we'd want to regen
> > afterwards, but it's a pretty trivial thing to do.
> >
> > I'm sure there's more nuance (and definitely more testing) than that, but
> > off the top of my head I'm not sure what it would be. Either way, I don't
> > think there's a huge amount of effort to just do the reformat, but we'd
> > still want to spin everything up and test it and so on.  It's probably
> more
> > work for everybody to rebase onto the (vastly) reformatted code than
> > anything else, which will vary pretty significantly.
> >
> > For (slight) context, the changes are enough to eliminate ~5k checkstyle
> > warnings (and there might be more if we have to tweak anything in the
> code
> > formatting).
> >
> > On Tue, Feb 21, 2017 at 10:34 AM, Casey Stella 
> wrote:
> >
> > > Any idea, with those modifications to checkstyle, how much effort it
> will
> > > take to reformat the code to conform?
> > >
> > > On Tue, Feb 21, 2017 at 8:23 AM, Justin Leet 
> > > wrote:
> > >
> > > > As part of:
> > > > https://issues.apache.org/jira/browse/METRON-726
> > > > https://github.com/apache/incubator-metron/pull/459
> > > >
> > > > I integrated checkstyle into the mvn:site command, and have
> checkstyle
> > > > reports being run as part of the mvn:site reporting. I expect to be
> > > > celebrating hitting 25k checkstyle warnings soon.
> > > >
> > > > I tested out creating a code formatting setup in IntelliJ, with a
> > couple
> > > > slight modifications of the default Sun conventions (extended the
> > > character
> > > > limit of a line past 80 and made it two space indents). Given that
> > > > checkstyle includes it as a default option, it's probably reasonably
> > > close
> > > > to the Sun conventions. I'm thinking we probably also at least create
> > an
> > > > Eclipse profile, to open up ease of development.
> > > >
> > > > There's probably also a discussion about how exactly we want to
> enforce
> > > it.
> > > > Is it just something we add to the PR checklist and have reviewers
> > give a
> > > > glance, do we setup a hook to autoformat code, etc?
> > > >
> > > > Justin
> > > >
> > >
> >
>


Re: custom date format required for snort, but not working

2017-02-21 Thread Michael Miklavcic
We decided at some point that, since Snort has an option to enable years in
the timestamp, turning that on was the best way to handle the dates. This
should already be the default for Vagrant.

On Tue, Feb 21, 2017 at 5:59 AM, Otto Fowler 
wrote:

> ok -
>
> # Configure Snort to show year in timestamps
> config show_year
>
> looks like it fixed it for him.
> I created a jira to make sure this is in our default
>
> On February 20, 2017 at 16:47:29, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
>
> There is someone on the user list getting errors from snort, and I sent him
> this reply:
>
> -
> 2017-02-20 16:00:14 ERROR BasicSnortParser:179 - Unable to parse message:
> 02/18-16:24:46.262884 ,1,999158,0,"'snort test alert'",TCP,192.168.1.85,58472,192.168.1.216,22,34:68:95:01:D1:BB,52:54:00:E0:8F:0D,0x42,***A,0x6756B8AF,0xA5EF764E,,0x5A4,64,16,57034,52,53248
> java.time.format.DateTimeParseException: Text '02/18-16:24:46.262884'
> could not be parsed at index 5
>
> We expect a date more like 01/27/16-16:01:04.877970
> So the year is missing.
>
>
> Our default date formatter for snort is defined as
> MM/dd/yy-HH:mm:ss.SS
>
> You can change this by adding "dateFormat":"your format" to your parser
> configuration
> ——
>
> The issue is, I can’t get this to work.  I don’t think that the
> ZonedDateTime will work if the year is missing.
>
> I tried the following test:
>
> import java.time.ZoneId;
> import java.time.ZonedDateTime;
> import java.time.format.DateTimeFormatter;
>
> class Untitled {
>     public static void main(String[] args) {
>         String fmt = "MM/dd-HH:mm:ss.SS";
>         String old = "MM/dd/yy-HH:mm:ss.SS";
>         String dateString = "02/18-16:24:46.262900";
>         String oldString = "02/18/17-16:24:46.262900";
>         DateTimeFormatter df = DateTimeFormatter.ofPattern(fmt);
>         df = df.withZone(ZoneId.systemDefault());
>         ZonedDateTime zdt = ZonedDateTime.parse(dateString,df);
>         System.out.println(String.format("%d",zdt.toInstant().toEpochMilli()));
>     }
> }
>
>
> Old and oldString work.
>
>
> fmt and dateString don’t, failing with this exception:
>
>
> Exception in thread "main" java.time.format.DateTimeParseException: Text
> '02/18-16:24:46.262900' could not be parsed: Unable to obtain ZonedDateTime
> from TemporalAccessor: {MonthOfYear=2, DayOfMonth=18},ISO,America/New_York
> resolved to 16:24:46.262900 of type java.time.format.Parsed
>     at java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1920)
>     at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1855)
>     at java.time.ZonedDateTime.parse(ZonedDateTime.java:597)
>     at Untitled.main(Untitled 2.java:13)
> Caused by: java.time.DateTimeException: Unable to obtain ZonedDateTime from
> TemporalAccessor: {MonthOfYear=2, DayOfMonth=18},ISO,America/New_York
> resolved to 16:24:46.262900 of type java.time.format.Parsed
>     at java.time.ZonedDateTime.from(ZonedDateTime.java:565)
>     at java.time.format.Parsed.query(Parsed.java:226)
>     at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
>     ... 2 more
> Caused by: java.time.DateTimeException: Unable to obtain LocalDate from
> TemporalAccessor: {MonthOfYear=2, DayOfMonth=18},ISO,America/New_York
> resolved to 16:24:46.262900 of type java.time.format.Parsed
>     at java.time.LocalDate.from(LocalDate.java:368)
>     at java.time.ZonedDateTime.from(ZonedDateTime.java:559)
>     ... 4 more
>
>
> The snort parser doesn’t document the dateFormat override ( METRON-729 ).
> I don’t know of, and have not found, a way to modify how snort outputs the
> date string.
>
> Any ideas?
>
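
For environments where the show_year option cannot be turned on, one possible
workaround for the failure in the quoted test is to give the formatter a
default year. This is a minimal sketch under stated assumptions, not what the
Snort parser does: the class name, the assumedYear parameter, and the
six-digit fraction pattern "MM/dd-HH:mm:ss.SSSSSS" are all illustrative
choices. It relies on DateTimeFormatterBuilder.parseDefaulting, which fills in
a field the text does not supply so that a full date can be resolved.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical helper: parses a Snort-style timestamp that has no year.
class YearlessSnortTimestamp {

  static ZonedDateTime parse(String text, int assumedYear, ZoneId zone) {
    DateTimeFormatter df = new DateTimeFormatterBuilder()
        // Assumed pattern: month/day, then time with six fractional digits.
        .appendPattern("MM/dd-HH:mm:ss.SSSSSS")
        // Supply the year the text is missing; must come after the pattern.
        .parseDefaulting(ChronoField.YEAR, assumedYear)
        .toFormatter()
        .withZone(zone);
    return ZonedDateTime.parse(text, df);
  }

  public static void main(String[] args) {
    ZonedDateTime zdt = parse("02/18-16:24:46.262900", 2017, ZoneId.of("UTC"));
    System.out.println(zdt);
    System.out.println(zdt.toInstant().toEpochMilli());
  }
}
```

The caller has to choose the year, which gets awkward around New Year when
events logged in late December are processed in January, so enabling years in
Snort itself remains the simpler option.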


Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC4

2017-02-10 Thread Michael Miklavcic
Verified steps 0 and 1. Will work on the remaining steps as time permits.

M

On Fri, Feb 10, 2017 at 1:22 PM, Casey Stella  wrote:

> This is a call to vote on releasing Apache Metron 0.3.1-RC4 incubating
>
> Full list of changes in this release:
> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/CHANGES
>
> The tag/commit to be voted upon is apache-metron-0.3.1-rc4-incubating:
> https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc4-incubating
>
> The source archive being voted upon can be found here:
> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/apache-metron-0.3.1-rc4-incubating.tar.gz
>
> Other release files, signatures and digests can be found here:
> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/
>
> The release artifacts are signed with the following key:
> https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d9c260ba55e;hb=refs/tags/apache-metron-0.3.1-rc4-incubating
>
> The book associated with this RC is located at
> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC4-incubating/book-site/index.html
>
> Please vote on releasing this package as Apache Metron 0.3.1-RC4 incubating
>
> When voting, please list the actions taken to verify the release.
>
> Recommended build validation and verification instructions are posted here:
> https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
>
>
> This vote will be open for at least 72 hours.
>
> [ ] +1 Release this package as Apache Metron 0.3.1-RC4 incubating
>
> [ ]  0 No opinion
>
> [ ] -1 Do not release this package because...
>


Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Michael Miklavcic
FYI, found this for Docker - https://docs.travis-ci.com/user/docker/

On Tue, Feb 7, 2017 at 9:09 AM, David Lyle <dlyle65...@gmail.com> wrote:

> Absolutely agree. I also think we'd want both once we've done that. Travis
> is good for smoke testing PRs and Commits. Jenkins is good for nightly runs
> of medium duration tests and would be great for automating our distributed
> testing if we found infrastructure to support it. I've seen them used in
> concert to provide a good solution.
>
> But, initially, I'd like to see us get our in-process stuff replaced with
> docker where (if) it makes sense, refactored to run in parallel, the poms
> refactored to handle our dependencies better and our uber jars removed
> where they can be and minimized where they cannot be.
>
> Which, I think, is a long-winded way of saying "I'd like to see us do what
> Casey suggested." :)
>
> -D...
>
>
> On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > I agree with this. I don't think we should switch to an alternate system
> > until we find that we are absolutely incapable of eking out any further
> > efficiency from the current setup.
> >
> > On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella <ceste...@gmail.com> wrote:
> >
> > > I believe that some people use travis and some people request Jenkins
> > from
> > > Apache Infra.  That being said, personally, I think we should take the
> > > opportunity to correct the underlying issues.  50 minutes for a build
> > seems
> > > excessive to me.
> > >
> > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler <ottobackwa...@gmail.com>
> > > wrote:
> > >
> > > > Is there an alternative to Travis?  Do other like sized apache
> projects
> > > > have these problems?  Do they use travis?
> > > >
> > > >
> > > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> > > wrote:
> > > >
> > > > For those with pending/building pull requests, it will come as no
> > > surprise
> > > > that our build times are increasing at a pace that is worrisome. In
> > fact,
> > > > we have hit a fundamental limit associated with Travis over the
> > weekend.
> > > > > We have crept up into the 40+ minute build territory and travis
> seems
> > > to
> > > > error out at around 49 minutes.
> > > >
> > > > Taking the current build (
> > > > https://travis-ci.org/apache/incubator-metron/jobs/198929446),
> looking
> > > at
> > > > just job times, we're spending about 19 - 20 minutes (1176.53
> seconds)
> > in
> > > > tests out of 44 minutes and 42 seconds to do the build. This places
> the
> > > > unit tests at around 43% of the build time. I say all of this to
> point
> > > out
> > > > that while unit tests are a portion of the build, they are not even
> the
> > > > majority of the build time. We need an approach that addresses the
> > whole
> > > > build performance holistically and we need it soonest.
> > > >
> > > > To seed the discussion, I will point to a few things that come to
> mind
> > > > that
> > > > fit into three broad categories:
> > > >
> > > > *Tests are Slow*
> > > >
> > > >
> > > > - *Tactical*: We have around 13 tests that take more than 30 seconds
> > and
> > > > make up 14 minutes of the build. Considering what we can do to speed
> > > those
> > > > tests as a tactical approach may be worth considering
> > > > - We are spinning up the same services (e.g. kafka, storm) for
> multiple
> > > > tests, instead use the docker infrastructure to spin them up once and
> > > then
> > > > use them throughout the tests.
> > > >
> > > >
> > > > *Tests aren't parallel*
> > > >
> > > > Currently we cannot run the build in parallel due to the integration
> > test
> > > > infrastructure spinning up its own services that bind to the same
> > ports.
> > > > If we correct this, we can run the builds in parallel with mvn -T
> > > >
> > > > - Correct this by decoupling the infrastructure from the tests and
> > > > refactoring the tests to run in parallel.
> > > > - Make the integration testing infrastructure bind intelligently to
> > > > whatever port is available.
> > > > - M

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Michael Miklavcic
I agree with this. I don't think we should switch to an alternate system
until we find that we are absolutely incapable of eking out any further
efficiency from the current setup.

On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella  wrote:

> I believe that some people use travis and some people request Jenkins from
> Apache Infra.  That being said, personally, I think we should take the
> opportunity to correct the underlying issues.  50 minutes for a build seems
> excessive to me.
>
> On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler 
> wrote:
>
> > Is there an alternative to Travis?  Do other like sized apache projects
> > have these problems?  Do they use travis?
> >
> >
> > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> wrote:
> >
> > For those with pending/building pull requests, it will come as no
> surprise
> > that our build times are increasing at a pace that is worrisome. In fact,
> > we have hit a fundamental limit associated with Travis over the weekend.
> > We have crept up into the 40+ minute build territory and travis seems
> to
> > error out at around 49 minutes.
> >
> > Taking the current build (
> > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> at
> > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
> > tests out of 44 minutes and 42 seconds to do the build. This places the
> > unit tests at around 43% of the build time. I say all of this to point
> out
> > that while unit tests are a portion of the build, they are not even the
> > majority of the build time. We need an approach that addresses the whole
> > build performance holistically and we need it soonest.
> >
> > To seed the discussion, I will point to a few things that come to mind
> > that
> > fit into three broad categories:
> >
> > *Tests are Slow*
> >
> >
> > - *Tactical*: We have around 13 tests that take more than 30 seconds and
> > make up 14 minutes of the build. Considering what we can do to speed
> those
> > tests as a tactical approach may be worth considering
> > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > tests, instead use the docker infrastructure to spin them up once and
> then
> > use them throughout the tests.
> >
> >
> > *Tests aren't parallel*
> >
> > Currently we cannot run the build in parallel due to the integration test
> > infrastructure spinning up its own services that bind to the same ports.
> > If we correct this, we can run the builds in parallel with mvn -T
> >
> > - Correct this by decoupling the infrastructure from the tests and
> > refactoring the tests to run in parallel.
> > - Make the integration testing infrastructure bind intelligently to
> > whatever port is available.
> > - Move the integration tests to their own project. This will let us run
> > the build in parallel since an individual project's test will be run
> > serially.
> >
> > *Packaging is Painful*
> >
> > We have a sensitive environment in terms of dependencies. As such, we are
> > careful to shade and relocate dependencies that we want to isolate from
> > our
> > transitive dependencies. The consequence of this is that we spend a lot
> > of time in the build shading and relocating maven module output.
> >
> > - Do the hard work to walk our transitive dependencies and ensure that
> > we are including only one copy of every library by using exclusions
> > effectively. This will not only bring down build times, it will make sure
> > we know what we're including.
> > - Try to devise a strategy where we only shade once at the end. This
> > could look like some combination of
> > - standardizing on the lowest common denominator of a troublesome
> > library
> > - We shade in dependencies so they can use different versions of
> > libraries (e.g. metron-common with a modern version of guava) than the
> > final jars.
> > - exclusions
> > - externalizing infrastructure out to not necessitate spinning up
> > hadoop components in-process for integration tests (i.e. hbase server
> > conflicts with storm in a few dependencies)
> >
> > *Final Thoughts*
> >
> > If I had three to pick, I'd pick
> >
> > - moving off of the in-memory component infrastructure to docker images
> > - fixing the maven poms to exclude correctly
> > - ensuring the resulting tests are parallelizable
> >
> > I will point out that fixing the maven poms to exclude correctly (i.e. we
> > choose the version of every jar that we depend on transitively) ticks
> > multiple boxes, not just making things faster.
> >
> > What are your thoughts? What did I miss? We need a plan and we need to
> > execute on it soon, otherwise travis is going to keep smacking us hard.
> It
> > may be worthwhile constructing a tactical plan and then a more strategic
> > plan that we can work toward. I was heartened at how much some of these
> > suggestions dovetail with the discussion around the future of the docker
> > infrastructure.
> >
> > Best,
> >
> > 
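
As a rough illustration of the "bind intelligently to whatever port is
available" suggestion quoted above, here is a minimal Java sketch. It is not
taken from the Metron integration test code; the class and method names
(FreePortSketch, findFreePort) are hypothetical. It relies only on the
standard behavior that binding a ServerSocket to port 0 lets the operating
system pick an unused ephemeral port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper, for illustration only; not part of Metron.
final class FreePortSketch {

    /**
     * Binds to port 0 so the OS assigns a currently unused port,
     * records that port number, and releases the socket again.
     */
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Each in-process test service would ask for its own port
        // instead of hard-coding one that other tests also use.
        int firstServicePort = findFreePort();
        int secondServicePort = findFreePort();
        System.out.println("first service port:  " + firstServicePort);
        System.out.println("second service port: " + secondServicePort);
    }
}
```

One caveat with this approach: the port is released before the real service
binds to it, so another process could grab it in between. Where the service
under test accepts port 0 directly and reports the port it was given, that
avoids the race.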

Re: [DISCUSS] Next Release (0.3.1) Content

2017-02-03 Thread Michael Miklavcic
Hi Casey,

My understanding from legal and our mentors is that as it currently stands,
we would not have to wait.

http://mail-archives.apache.org/mod_mbox/incubator-metron-dev/201701.mbox/%3CCAF1jEfCGVGd9tVDJfRwoxFjEmwHD59DuXTdMu-_OJQwcLvryog%40mail.gmail.com%3E

and

http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201702.mbox/%3CCAOqetn-kB_ekCxB4z%2Bb0J%3DG0qJJKFepkfmMGa8OvBhxmPcvjvw%40mail.gmail.com%3E


On Thu, Feb 2, 2017 at 3:03 PM, Casey Stella  wrote:

> Just a quick release update.  As of now, we are waiting on
>
>- METRON-660 to get reviewed and make it in
>- METRON-692 to get our upgrade.md completed for this release
>
> Mike Miklavcic, you sent an email to the legal-discuss about our kraken
> dependency and it looked like we didn't have to change, but could you
> comment on this thread in the dev list so I know if we need to wait for
> METRON-650.
>
> Casey
>
> On Thu, Feb 2, 2017 at 4:58 PM, Casey Stella  wrote:
>
> > Ok, I've created the upgrading document for 0.3.0 to 0.3.1 and included
> > the things that I know about and the things Jon mentioned here.  Please,
> if
> > you have knowledge of other breaking/non-compatible changes between the
> > 0.3.0 release and master, comment on this PR (https://github.com/apache/
> > incubator-metron/pull/437) and I will incorporate them.
> >
> > On Fri, Jan 27, 2017 at 10:04 AM, zeo...@gmail.com 
> > wrote:
> >
> >> To start I was mostly concerned with having a per-version list of
> >> non-backwards-compatible changes, so upgrades that may skip a version or
> >> two can look at what all may be impacted.  We should also probably
> >> document
> >> any sort of upgrade flaws as well, such as METRON-447, METRON-448, etc.
> >> I do think that
> >> we should have a more rigorous document, but I wouldn't push that for
> the
> >> 0.3.1 release.  I see that (along with the Management UI, API, etc.) as
> >> key
> >> (required) components of a 1.0 release.  I'd just like to see the
> >> foundation begin to be laid and iterated on.
> >>
> >> That said, this probably constitutes a mention in the development
> >> guidelines
> >>  >> pageId=61332235>
> >> once
> >> it's in master.
> >>
> >> Jon
> >>
> >> On Fri, Jan 27, 2017 at 9:05 AM Casey Stella 
> wrote:
> >>
> >> > I should add, you may be thinking something more rigorous and
> >> > step-by-step.  If so, you think you might be interested in
> volunteering
> >> to
> >> > do a first draft as a PR that we can adjust?
> >> >
> >> > On Fri, Jan 27, 2017 at 9:01 AM, Casey Stella 
> >> wrote:
> >> >
> >> > > So, I agree with the Upgrading.md and I was going to submit a PR at
> >> least
> >> > > to describe the changes to indexing configurations that I made
> >> during
> >> > > the 0.3.1 release.
> >> > >
> >> > >
> >> > > On Thu, Jan 26, 2017 at 10:50 PM, zeo...@gmail.com <
> zeo...@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> I haven't had a chance to look through the unresolved JIRAs but I
> did
> >> > want
> >> > >> to mention a few quick things.
> >> > >>
> >> > >> First, when we released 0.3.0 and dropped the BETA flag, one of the
> >> > things
> >> > >> that was discussed was putting together a method of documenting
> >> upgrades
> >> > >> from one version to the next.  As one of the first steps toward
> >> making
> >> > >> that
> >> > >> a reasonable process, I think we should assemble more detailed
> >> release
> >> > >> notes, especially outlining non-backwards compatible changes.  In
> the
> >> > >> "[DISCUSS] Next Release Name" email thread Kyle Richardson
> suggested
> >> we
> >> > >> use
> >> > >> "UPGRADING.md" to do this, and I still agree with that thought, but
> >> I'm
> >> > >> open to alternatives.
> >> > >>
> >> > >> Separately, a *nice to have* would be *METRON-660*, which was
> >> discussed
> >> > in
> >> > >> the "[PROPOSAL] up-to-date versioned documentation" thread, to give
> >> us
> >> > >> some
> >> > >> cleaner documentation using the existing READMEs.  I'd be happy to
> >> help
> >> > >> with this one, I'm just not sure what the next steps are, aside
> from
> >> the
> >> > >> start that Matt has here
> >> > >>  >.
> >> > >>
> >> > >> My last *nice to have* is *METRON-635*, which I have a *PR open for
> >> here
> >> > >> *.  If I
> could
> >> get
> >> > >> someone else to reproduce the error that I'm seeing I would be
> happy
> >> to
> >> > >> pursue additional testing, troubleshooting, etc.  I've seen others
> >> > report
> >> > >> the same issue
> >> > >>  >> > >> question=search%2Fsearch=relevance=scp_if_ssh>
> >> > >> on the HCC 

Re: [DISCUSS] Hosting Kraken maven artifacts in incubator-metron git repo

2017-01-31 Thread Michael Miklavcic
All, unless there are any objections, for now it might be best to keep the
status quo and circle back around to modifying this dependency at a later
time. While not ideal, it does not appear to currently violate any
licensing rules.

Best,
Mike

On Tue, Jan 31, 2017 at 11:04 AM, Billie Rinaldi <bil...@apache.org> wrote:

> On Tue, Jan 31, 2017 at 9:41 AM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Hi Billie,
> >
> > Thanks for the feedback and info. I'm working on resolving this over the
> > next couple of days and could use some additional guidance when you have
> a
> > moment.
> >
> > You mention building the Kraken code as part of Metron. So I could
> > literally pull down the full source, plop it in a new Maven submodule
> > within the Metron project structure and be good to go? Seems like this
> > might actually be easiest. Plus we'd have the source code.
> >
>
> There is an IP clearance process for bringing externally developed code
> into an ASF project: http://incubator.apache.org/ip-clearance/
> It says a software grant form is required. I am not sure what the options
> are in the case where a software grant can't be obtained and the code is
> ASLv2 licensed. This would probably have to be discussed on the incubator
> general list or on legal.
>
>
> >
> > Alternatively, you mention publishing to Maven Central. It looks like
> Maven
> > Central has some requirements for publishing artifacts that might prove
> > hairy - http://central.sonatype.org/pages/requirements.html. Let's say I
> > fork the project (not via Metron) in my own github fork and modify the
> > Kraken poms to provide the necessary info. I'm supposed to provide the
> scm
> > location (my github repo), javadoc, signed jars, etc. I'd also need to
> > modify the groupId. Should that then be something personal (e.g.
> > com.michaelmiklavcic) or would it be ok to use org.apache.metron as a
> > groupId? I prefer to use Metron's groupId. I believe there is also a
> review
> > process involved with getting artifacts published to the central repo
> which
> > might take some time.
> >
>
> You should not use Apache's groupId. I heard from Josh that Sonatype
> encourages com.github..
>
>
> >
> > I think the submodule sounds like the best approach, but want to be sure
> > I've understood the recommendations correctly. We need to resolve this as
> > part of our move out of incubation to TLP status.
> >
> > Thanks,
> > Mike
> >
> >
> > On Tue, Jan 17, 2017 at 2:49 PM, Billie Rinaldi <bil...@apache.org>
> wrote:
> >
> > > On Fri, Jan 13, 2017 at 3:35 PM, Matt Foley <ma...@apache.org> wrote:
> > >
> > > > Perhaps it would be more appropriate to put it under
> > > > https://dist.apache.org/repos/dist/release/incubator/metron/ ,
> perhaps
> > > as
> > > > https://dist.apache.org/repos/dist/release/incubator/metron/mvn-repo
> ?
> > > >
> > >
> > > No, we could only do that if it were a release artifact for an official
> > > release. There is some more information about releases here:
> > > http://www.apache.org/dev/release.html#what. Specifically, anything
> that
> > > is
> > > published is considered a release, and that would definitely include
> > > anything on dist.apache.org. We can only release source code and
> binary
> > > artifacts resulting from compiling that source code.
> > >
> > >
> > > >
> > > > We should not host anything with a license that isn’t compatible with
> > > > inclusion in an Apache project.  If we post only non-source
> artifacts,
> > > then
> > > > that would include packages with “Category B List” licenses (that is,
> > > > ‘"WEAK COPYLEFT" LICENSES’) as well as “Category A List” licenses
> > (those
> > > > “SIMILAR IN TERMS TO THE APACHE LICENSE 2.0”) -- per
> > > > https://www.apache.org/legal/resolved .  For versioning, we could
> > simply
> > > > structure as a maven repo, and in fact that’s what I think we should
> > do.
> > > >
> > > > Hosting the source code is not, I think, something we are supposed to
> > do
> > > > for non-Apache projects: https://www.apache.org/legal/resolved
> again,
> > > > this time the very first question:
> > > >
> > > > CAN ASF PMCS HOST PROJECTS THAT ARE NOT UNDER THE APACHE LICENSE?
> > > > No. See the Apache Software Foundation licenses page for more
> > >

Re: [DISCUSS] Hosting Kraken maven artifacts in incubator-metron git repo

2017-01-31 Thread Michael Miklavcic
My reservation with this approach is that we're still depending on an
unreleased version of files from a github repo that we do not control. So
if Kraken or OpenSOC pull their repos, then this no longer works. The 2
approaches outlined above mitigate this risk as follows:
1. Once published to Maven Central, the artifacts are there forever. The
source code is also required to publish to Maven Central, so even though
it's not in a repo structure, it could still be referenced if absolutely
needed.
2. Bringing in the source code allows us to avoid a separate review process
with Sonatype/Maven for getting into Maven Central. Kraken pcap source
hasn't been modified in quite a while, so we wouldn't really be missing
much by forking the project internally. And we'd have the full source
building in case we need to address any bugs or security issues. Much
easier than the other option.

On Tue, Jan 31, 2017 at 10:53 AM, JJ Meyer <jjmey...@gmail.com> wrote:

> Mike,
>
> You wouldn't necessarily need to download the code and host it in the
> metron repo. Git sub-modules are sometimes used in cases like this. It is
> more like a pointer to an external repo. Below is a short read on how we
> could potentially do this with git sub-modules. I have used these in the
> past. I will say sometimes it becomes a bit confusing as to what version of
> the submodule is being used. It could just be me though.
>
> http://alex.nederlof.com/blog/2013/07/08/using-git-submodules-for-maven-
> artifacts-not-in-central/
>
> Thanks,
>
> On Tue, Jan 31, 2017 at 11:41 AM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Hi Billie,
> >
> > Thanks for the feedback and info. I'm working on resolving this over the
> > next couple of days and could use some additional guidance when you have
> a
> > moment.
> >
> > You mention building the Kraken code as part of Metron. So I could
> > literally pull down the full source, plop it in a new Maven submodule
> > within the Metron project structure and be good to go? Seems like this
> > might actually be easiest. Plus we'd have the source code.
> >
> > Alternatively, you mention publishing to Maven Central. It looks like
> Maven
> > Central has some requirements for publishing artifacts that might prove
> > hairy - http://central.sonatype.org/pages/requirements.html. Let's say I
> > fork the project (not via Metron) in my own github fork and modify the
> > Kraken poms to provide the necessary info. I'm supposed to provide the
> scm
> > location (my github repo), javadoc, signed jars, etc. I'd also need to
> > modify the groupId. Should that then be something personal (e.g.
> > com.michaelmiklavcic) or would it be ok to use org.apache.metron as a
> > groupId? I prefer to use Metron's groupId. I believe there is also a
> review
> > process involved with getting artifacts published to the central repo
> which
> > might take some time.
> >
> > I think the submodule sounds like the best approach, but want to be sure
> > I've understood the recommendations correctly. We need to resolve this as
> > part of our move out of incubation to TLP status.
> >
> > Thanks,
> > Mike
> >
> >
> > On Tue, Jan 17, 2017 at 2:49 PM, Billie Rinaldi <bil...@apache.org>
> wrote:
> >
> > > On Fri, Jan 13, 2017 at 3:35 PM, Matt Foley <ma...@apache.org> wrote:
> > >
> > > > Perhaps it would be more appropriate to put it under
> > > > https://dist.apache.org/repos/dist/release/incubator/metron/ ,
> perhaps
> > > as
> > > > https://dist.apache.org/repos/dist/release/incubator/metron/mvn-repo
> ?
> > > >
> > >
> > > No, we could only do that if it were a release artifact for an official
> > > release. There is some more information about releases here:
> > > http://www.apache.org/dev/release.html#what. Specifically, anything
> that
> > > is
> > > published is considered a release, and that would definitely include
> > > anything on dist.apache.org. We can only release source code and
> binary
> > > artifacts resulting from compiling that source code.
> > >
> > >
> > > >
> > > > We should not host anything with a license that isn’t compatible with
> > > > inclusion in an Apache project.  If we post only non-source
> artifacts,
> > > then
> > > > that would include packages with “Category B List” licenses (that is,
> > > > ‘"WEAK COPYLEFT" LICENSES’) as well as “Category A List” licenses
> > (those
> > > > “SIMILAR IN TERMS TO THE APACHE LICENSE 2.0”

Re: [DISCUSS] Hosting Kraken maven artifacts in incubator-metron git repo

2017-01-31 Thread Michael Miklavcic
Hi Billie,

Thanks for the feedback and info. I'm working on resolving this over the
next couple of days and could use some additional guidance when you have a
moment.

You mention building the Kraken code as part of Metron. So I could
literally pull down the full source, plop it in a new Maven submodule
within the Metron project structure and be good to go? Seems like this
might actually be easiest. Plus we'd have the source code.

Alternatively, you mention publishing to Maven Central. It looks like Maven
Central has some requirements for publishing artifacts that might prove
hairy - http://central.sonatype.org/pages/requirements.html. Let's say I
fork the project (not via Metron) in my own github fork and modify the
Kraken poms to provide the necessary info. I'm supposed to provide the scm
location (my github repo), javadoc, signed jars, etc. I'd also need to
modify the groupId. Should that then be something personal (e.g.
com.michaelmiklavcic) or would it be ok to use org.apache.metron as a
groupId? I prefer to use Metron's groupId. I believe there is also a review
process involved with getting artifacts published to the central repo which
might take some time.

I think the submodule sounds like the best approach, but want to be sure
I've understood the recommendations correctly. We need to resolve this as
part of our move out of incubation to TLP status.

Thanks,
Mike


On Tue, Jan 17, 2017 at 2:49 PM, Billie Rinaldi <bil...@apache.org> wrote:

> On Fri, Jan 13, 2017 at 3:35 PM, Matt Foley <ma...@apache.org> wrote:
>
> > Perhaps it would be more appropriate to put it under
> > https://dist.apache.org/repos/dist/release/incubator/metron/ , perhaps
> as
> > https://dist.apache.org/repos/dist/release/incubator/metron/mvn-repo ?
> >
>
> No, we could only do that if it were a release artifact for an official
> release. There is some more information about releases here:
> http://www.apache.org/dev/release.html#what. Specifically, anything that
> is
> published is considered a release, and that would definitely include
> anything on dist.apache.org. We can only release source code and binary
> artifacts resulting from compiling that source code.
>
>
> >
> > We should not host anything with a license that isn’t compatible with
> > inclusion in an Apache project.  If we post only non-source artifacts,
> then
> > that would include packages with “Category B List” licenses (that is,
> > ‘"WEAK COPYLEFT" LICENSES’) as well as “Category A List” licenses (those
> > “SIMILAR IN TERMS TO THE APACHE LICENSE 2.0”) -- per
> > https://www.apache.org/legal/resolved .  For versioning, we could simply
> > structure as a maven repo, and in fact that’s what I think we should do.
> >
> > Hosting the source code is not, I think, something we are supposed to do
> > for non-Apache projects: https://www.apache.org/legal/resolved again,
> > this time the very first question:
> >
> > CAN ASF PMCS HOST PROJECTS THAT ARE NOT UNDER THE APACHE LICENSE?
> > No. See the Apache Software Foundation licenses page for more
> details,
> > and the Apache Software Foundation page for additional background.
> >
>
> Kraken does appear to be licensed under ASLv2. Based on that, it might be
> possible to use the kraken code as the basis of a submodule of the Metron
> project, so that the necessary kraken jars would be built as part of the
> Metron build.
>
> Alternatively, someone could just push the kraken jars to Maven central
> under a new group id. Here's an example of a personal GitHub repo project
> configured to publish to Maven central via Sonatype:
> https://github.com/joshelser/dropwizard-hadoop-metrics2/
> blob/master/pom.xml.
>
>
> >
> > On 1/13/17, 8:11 AM, "Billie Rinaldi" <bil...@apache.org> wrote:
> >
> > No, we can't host artifacts in a git repo, or on a website. It would
> be
> > like distributing a release that hasn't been voted upon.
> >
> > Regarding message threading, in Gmail adding a [tag] to the subject
> > does
> > not create a new thread. So the change is not visible in my mailbox
> > unless
> > the rest of the subject is changed as well.
> >
> > On Mon, Jan 9, 2017 at 1:00 PM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > This is a question primarily for the mentors.
> > >
> > > *Background*
> > > metron-common is currently depending on the openSOC github repo for
> > hosting
> > > kraken artifacts. The original reason for this was that these jars
> > are not
> > > hosted in Maven Central, and they were not reliably available i

Re: [DISCUSS] How to do Sliding Windows in Profiler

2017-01-25 Thread Michael Miklavcic
Hey guys,

1. I'm going to Google that brandy analogy later. It sounds tasty :)
2. The option for reading/writing multiple profiles makes sense to me and
provides much greater flexibility and efficiency.
3. My gut reaction to reading through this is "wow." This seems really
complicated. I'm all for the flexibility provided by these composable
functions and think this would be much better for users with either some
templates or other form of abstraction to simplify common patterns.

Mike


On Wed, Jan 25, 2017 at 8:02 AM, Casey Stella  wrote:

> So, I think the key takeaway, at least for me, and my key feedback is that:
>
>    - We should separate the wrong behavior of MAD from the ability to
>improve windowing.  We can fix one without the other and, in my opinion,
>the key fix for MAD is keeping the value distribution independent of the
>MAD state, as Matt noted at the end of this email.
>- Because the above necessitates tracking two things, it'd be nice to
>    adopt Matt's suggestion to be able to sort of merge profiles and track two
>results.
>
> One more thing, in general, if you ever want to merge outputs on read, then
> you should never write out the merged data (unless the data you're writing
> has a way to untangle the overlap).  They are two separate semantics.
>
> On merge on read, you choose the window at read time.  An example of this,
> for a MAD hypothetically adjusted to have an OUTLIER_MAD_INIT function which
> takes the value distribution, would be:
> {
>   "profiles": [
>     {
>       "profile": "[ 'sketchy_values', 'sketchy_mad' ]",
>       "foreach": "'global'",
>       "init": {
>         "s": "OUTLIER_MAD_INIT(STATS_MERGE(PROFILE_GET('sketchy_values', 'global', 5, 'MINUTES')))",
>         "val_dist": "STATS_INIT()"
>       },
>       "update": {
>         "val_dist": "STATS_ADD(val_dist, value)",
>         "s": "OUTLIER_MAD_ADD(s, value)"
>       },
>       "result": {
>         "sketchy_values": "val_dist",
>         "sketchy_mad": "s"
>       }
>     }
>   ]
> }
>
> You would then read this in the enrichment topology or wherever
> via OUTLIER_MAD_SCORE(OUTLIER_MAD_STATE_MERGE(PROFILE_GET('sketchy_mad',
> 'global', 5, 'MINUTES')), value)
>
>
> On merge on write, you choose the window size on write time and should read
> only the latest profile entry on read, so that would look like:
> {
>   "profiles": [
>     {
>       "profile": "[ 'sketchy_values', 'sketchy_mad', 'windowed_mad' ]",
>       "foreach": "'global'",
>       "init": {
>         "s": "OUTLIER_MAD_INIT(STATS_MERGE(PROFILE_GET('sketchy_values', 'global', 5, 'MINUTES')))",
>         "val_dist": "STATS_INIT()"
>       },
>       "update": {
>         "val_dist": "STATS_ADD(val_dist, value)",
>         "s": "OUTLIER_MAD_ADD(s, value)"
>       },
>       "result": {
>         "sketchy_values": "val_dist",
>         "sketchy_mad": "s",
>         "windowed_mad": "OUTLIER_MAD_STATE_MERGE(s, PROFILE_GET('sketchy_mad', 'global', 4, 'MINUTES'))"
>       }
>     }
>   ]
> }
>
>
> You would then score pulling only the most recent MAD state.  Merging
> across them would double count entries, as Matt mentioned above:
> OUTLIER_MAD_SCORE(GET_LAST(PROFILE_GET('sketchy_mad', 'global', 1,
> 'MINUTES')), value)
>
> So, tl;dr, I'm in favor of Matt's syntax to merge profiles and give
> multiple outputs.  Also, different but coupled problem, MAD needs fixing to
> decouple tracking value distributions from the distribution of the
> deviation from the median.
>
> Casey
>
> On Wed, Jan 25, 2017 at 9:28 AM, Nick Allen  wrote:
>
> > That was a lot to digest so I apologize if I have missed some of your
> > thought process.  I probably need to read through this a couple more
> times.
> >
> >
> > Can anyone specify a better way to do Sliding Window profiles correctly
> > > with current functionality?
> >
> >
> > Could we not change the underlying implementation of the profiler to
> > support sliding windows?  Ideally the same profile definition could be
> > applied to either a tumbling or sliding window, rather than requiring a
> > change in the profile definition itself.
> >
> > The work that I had done for METRON-590, made it possible to configure
> > either a tumbling or sliding window.  The downside of that work is that
> it
> > was an all-or-nothing change, all profiles (in the same topology) were
> > either tumbling or sliding.  But perhaps we can enhance it from there.
> >
> >
> > A much more elegant solution would be to allow “result” to write two
> > > objects.
> >
> >
> > Yes, I like this! I have had the same thought myself.  This would be a
> > simple change and would provide the user with some flexibility.  Does
> this
> > change solve the entire problem though?
> >
> >
> > Obviously, I do need to sit down and think on this a bit more.  I really
> > wish we could handle these use cases with a profile definition that
> remains
> > simple and easy to grok.  Thanks 
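
To make the merge-on-read versus merge-on-write distinction in Casey's
message concrete, here is a small Java sketch. It is only an illustration
under simplifying assumptions: the Summary class (a count and a sum) is a
hypothetical stand-in for a mergeable statistical sketch, not the Profiler's
or Stellar's actual data structures, and one summary is assumed to be written
per minute. It shows why entries that were already merged at write time
should not be merged again at read time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Metron Profiler code.
final class WindowMergeSketch {

    /** Stand-in for a mergeable summary: just a count and a sum. */
    static final class Summary {
        final long count;
        final double sum;

        Summary(long count, double sum) {
            this.count = count;
            this.sum = sum;
        }

        Summary merge(Summary other) {
            return new Summary(count + other.count, sum + other.sum);
        }
    }

    static Summary mergeAll(List<Summary> parts) {
        Summary acc = new Summary(0, 0.0);
        for (Summary part : parts) {
            acc = acc.merge(part);
        }
        return acc;
    }

    /** Merge on read: stored entries are disjoint; the reader picks the window. */
    static Summary mergeOnRead(List<Summary> stored, int window) {
        int from = Math.max(0, stored.size() - window);
        return mergeAll(stored.subList(from, stored.size()));
    }

    /** Merge on write: entry i already covers the `window` minutes ending at i. */
    static List<Summary> mergeOnWrite(List<Summary> perMinute, int window) {
        List<Summary> written = new ArrayList<>();
        for (int i = 0; i < perMinute.size(); i++) {
            int from = Math.max(0, i - window + 1);
            written.add(mergeAll(perMinute.subList(from, i + 1)));
        }
        return written;
    }

    public static void main(String[] args) {
        // Ten minutes of data, 100 values per minute.
        List<Summary> perMinute = new ArrayList<>();
        for (int minute = 0; minute < 10; minute++) {
            perMinute.add(new Summary(100, 100.0 * minute));
        }

        // Disjoint entries merged at read time: each value counted once.
        Summary onRead = mergeOnRead(perMinute, 5);

        // Entries merged at write time: read only the latest one.
        List<Summary> written = mergeOnWrite(perMinute, 5);
        Summary latest = written.get(written.size() - 1);

        // Merging already-merged entries counts overlapping minutes repeatedly.
        Summary doubleCounted = mergeOnRead(written, 5);

        System.out.println("merge on read count:        " + onRead.count);
        System.out.println("latest written count:       " + latest.count);
        System.out.println("merged written (too large): " + doubleCounted.count);
    }
}
```

With these inputs the first two counts agree, while the third is inflated
because the five most recent written entries overlap one another.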

Re: [PROPOSAL] up-to-date versioned documentation

2017-01-19 Thread Michael Miklavcic
aring issues.
> > > • Doxia-markdown doesn’t process the triple back-tick (```) the
> > same
> > > way as
> > > Github Markdown. It seems to color-code it as , but
> doesn’t
> > > preserve
> > > line breaks, which is really bad.
> > > • Similarly, it only processes bullet lists in isolation, and
> it
> > > doesn’t
> > > correctly combine bullet lists subordinate to a numbered list.
> > >
> > > The upshot is that
> > > • both code and bullet lists often lose their linebreaks, and
> get
> > > mushed
> > > into run-on paragraphs, usually combined with the preceding
> > paragraph,
> > > and
> > > • bullet lists interrupt numbered lists and make them start
> over
> > at #1.
> > >
> > > Perhaps 80-90% of these issues can be fixed by editing the
> > markdown
> > > files
> > > to put blank lines around the list formats. I started doing
> > this, but I
> > > didn’t want to obscure the proto by editing tons of .md files.
> > As of
> > > this
> > > proto, only the half dozen actually broken files (that caused
> > maven
> > > site
> > > build errors) have been fixed.
> > > The other 10-20% will just require simplification of the
> > markdown used,
> > > unless we can get an updated version of the plugins.
> > >
> > > Anyway, please take a look and share your thoughts.
> > >
> > > Thanks,
> > > --Matt
> > >
> > >
> > > On 1/16/17, 1:02 PM, "Michael Miklavcic" <
> > michael.miklav...@gmail.com>
> > > wrote:
> > >
> > > Hey Matt, feel free to ping me.
> > >
> > > On Mon, Jan 16, 2017 at 1:39 PM, Matt Foley <ma...@apache.org>
> > wrote:
> > >
> > > > I looked into the Falcon website and doxia over the weekend,
> > and I’m
> > > > convinced that using the doxia-markdown plugin should make it
> > dirt
> > > simple
> > > > to do what’s been discussed in this thread, with no overhead
> > on the
> > > part
> > > of
> > > > people writing the README.md files.
> > > >
> > > > I fiddled with trying to do a POC, and unfortunately
> concluded
> > > (again)
> > > > that I don’t really know maven very well :-)
> > > > Are there any maven experts out there who would be willing to
> > give me
> > > some
> > > > pointers (offline) on how to make use of this apparently
> > simple maven
> > > > plug-in?
> > > >
> > > > I can do the bit of scripting needed to gather the docs. I’ve
> > opened
> > > > https://issues.apache.org/jira/browse/METRON-660 with some
> > > sub-tasks for
> > > > this work.
> > > > --Matt
> > > >
> > > > On 1/13/17, 12:04 PM, "zeo...@gmail.com" <zeo...@gmail.com>
> > wrote:
> > > >
> > > > +1 on any improvement to documentation and more consistency.
> > At this
> > > > point, I think getting rid of or hiding some of the pages on
> > the wiki
> > > > (at
> > > > least for the short term) would be better than leaving them
> > around
> > > > because
> > > > there's a lot of misinformation.
> > > >
> > > > Jon
> > > >
> > > > On Fri, Jan 13, 2017 at 10:13 AM Nick Allen <
> > n...@nickallen.org>
> > > > wrote:
> > > >
> > > > > +1 I think it is sorely needed.
> > > > >
> > > > > If we can come up with a really slick solution like Spark,
> > then
> > > > great. I am
> > > > > also not against a half-baked solution that can later
> evolve
> > into
> > > > something
> > > > > else. For example, create an index README.md that links
> > together
> > > > all the
> > > > > exis

Re: [GitHub] incubator-metron issue #418: METRON-666 Fix javadoc doclint errors

2017-01-17 Thread Michael Miklavcic
How does Maven split up these tasks? My only concern there is that we
have some robust integration test infrastructure data setup that may not be
thread-safe.

On Tue, Jan 17, 2017 at 7:55 AM, jjmeyer0  wrote:

> Github user jjmeyer0 commented on the issue:
>
> https://github.com/apache/incubator-metron/pull/418
>
> If timeouts are going to continue being an issue, maybe we should
> start looking into using Maven's parallel builds. For example, when I build
> locally I always do: `mvn -T 2C clean package`. This means build using 2
> threads per core. It usually cuts the build time in half for me.
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
> with INFRA.
> ---
>
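
For reference on the `-T 2C` flag quoted above: Maven treats a plain integer as an absolute thread count and a number followed by `C` as a per-core multiplier, and it schedules work per module according to the reactor's dependency graph. A minimal Python sketch of the arithmetic only, where `maven_thread_count` is a hypothetical helper name and not part of Maven:

```python
import os


def maven_thread_count(t_value, cores=None):
    """Resolve a Maven -T value such as "4" or "2C" to a thread count.

    Hypothetical helper for illustration; it mirrors the documented
    meaning of the flag, not Maven's actual source code.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    if t_value.upper().endswith("C"):
        # "2C" means 2 threads per available core; fractions such as
        # "1.5C" are truncated to a whole number of threads.
        return int(float(t_value[:-1]) * cores)
    # A bare number is an absolute thread count.
    return int(t_value)


if __name__ == "__main__":
    print(maven_thread_count("2C", cores=4))
```

On a 4-core machine `-T 2C` would therefore ask for 8 threads. Whether the integration test data setup tolerates modules building concurrently is a separate question that this arithmetic does not answer.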


Re: [DISCUSS] Moving GeoIP management away from MySQL

2017-01-16 Thread Michael Miklavcic
I'm also in agreement on this.

On Mon, Jan 16, 2017 at 2:11 PM, Nick Allen  wrote:

> +1 to using the Java API with the MMDB file provided by Maxmind.  This is
> what I had thought we were doing when we discussed this a few months back.
> I'd rather use the Maxmind tools as-provided instead of engineering
> something on top of it.
>
> On Mon, Jan 16, 2017 at 3:59 PM, JJ Meyer  wrote:
>
> > Matt, I agree with your points on why we shouldn't just get rid of the
> > database just to get rid of a database. But IMO, I think we may be
> > reinventing the wheel a little bit by even putting the maxmind data into
> > MySQL. Right now we are already downloading a maxmind file. To me it
> seems
> > simpler to push the file to HDFS where we can pick it up and have the
> > maxmind client use that instead of importing data into a DB and then
> > running a query. Also, I believe the data gets updated weekly. So syncing
> > may become easier too.
> >
> > James, I believe it works with the paid and free versions of geoip. I
> know
> > NiFi uses this client library in their Geo enrichment processor.
> >
> > Also, if it is decided that using a SQL database is still the best
> > solution, I think there is a benefit to using their library. We would
> just
> > have to implement a `DatabaseProvider` that hits some SQL db instead of
> > using their standard implementation.
> >
> > Thanks,
> > JJ
> >
> > On Mon, Jan 16, 2017 at 2:27 PM, James Sirota 
> wrote:
> >
> > > Hi Guys, I just wanted to clarify one point that I think is lost in
> this
> > > > thread.  Geo enrichment is NOT a key-value enrichment.  It requires a
> > range
> > > scan and a join (which is why it's implemented via mySql and not
> Hbase).
> > > To account for this access pattern via a key-value store you would
> > > inevitably have to do something funky or in case of Hbase I don't think
> > > there is a way to avoid doing a range scan.
> > >
> > > With respect to mapdb it only has support for Maps, Sets, Lists,
> Queues.
> > > Are we sure it provides enough functionality for us to do this
> > enrichment?
> > >
> > > With respect to the Maxmind client, are we sure we can use it on the
> > > mySql-backed version of their DB?  I thought the Maxmind database
> itself
> > is
> > > proprietary and is something you have to pay for.  My understanding is
> > that
> > > the client is designed for that proprietary version.
> > >
> > > I somewhat agree with Matt's point.  If mySql is a problem because of
> > > licensing, the path of least resistance to remove mySql dependencies
> > would
> > > be to simply switch to postgresql.  We will always have conventional
> sql
> > > databases in our stack because other big data tools use them. Why not
> > take
> > > advantage of them too?
> > >
> > > Thanks,
> > > James
> > >
> > > 16.01.2017, 12:27, "Matt Foley" :
> > > > Hi Justin, and team,
> > > > Several components of the Hadoop Stack utilize a SQL database,
> usually
> > > for metadata of some sort. Ambari knows this and arranges for them to
> > share
> > > a single database installation (on or off the cluster), unless they
> > > explicitly configure use of different databases (which is allowed for
> > sites
> > > that desire it). Ambari defaults to using PostgreSQL, altho it’s happy
> to
> > > use MySQL, Oracle, or Microsoft, along with whatever each component
> > > historically defined as their default (such as Derby).
> > > >
> > > > If we want to start with a replacement of current functionality, I
> > would
> > > suggest switching the default database to PostgreSQL. Replacing fast,
> > > efficient, and proven db services with a file-based api library (but no
> > > standard way to propagate the underlying storage files) seems to me to
> be
> > > taking a step backwards.
> > > >
> > > > Sticking with a SQL-based service will surely minimize the amount of
> > > code changes needed. And making the SQL either dialect-independent or
> > > capable of switching among dialects, then enables us to do what the
> rest
> > of
> > > the Hadoop stack does: allow enterprise customers to substitute Oracle
> or
> > > Microsoft enterprise-class databases where they wish. Regarding the
> > > drivers, we should study what the other Stack components do; I’m not an
> > > expert in those areas.
> > > >
> > > > Using the same db as the rest of the stack also means administrators
> > can
> > > be confident they’ve set up adequate backup and recovery processes.
> > > > All these are valuable reasons not to roll our own storage system for
> > > this enrichment data. IMO, of course.
> > > >
> > > > Cheers,
> > > > --Matt
> > > >
> > > > On 1/16/17, 9:52 AM, "Kyle Richardson" 
> > > wrote:
> > > >
> > > > +1 Agree with David's order
> > > >
> > > > -Kyle
> > > >
> > > > On Mon, Jan 16, 2017 at 12:41 PM, David Lyle <
> dlyle65...@gmail.com
> > >
> > > wrote:
> > > >
> > > > > Def 
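
To illustrate James's point above that geo enrichment is a range lookup rather than an exact-key lookup, here is a minimal Python sketch. The address ranges and location labels are made-up sample data (documentation-reserved addresses), and `lookup` is a hypothetical helper. It is not how Metron or the MaxMind client implements the lookup.

```python
import bisect
import ipaddress


def _ip(text):
    """Convert a dotted-quad string to an integer for comparison."""
    return int(ipaddress.ip_address(text))


# Made-up sample data: (first address, last address, location label).
RANGES = sorted([
    (_ip("192.0.2.0"), _ip("192.0.2.255"), "loc-A"),
    (_ip("198.51.100.0"), _ip("198.51.100.255"), "loc-B"),
])
STARTS = [r[0] for r in RANGES]


def lookup(ip):
    """Return the label of the range containing ip, or None."""
    addr = _ip(ip)
    # Find the last range whose start is <= addr. An exact-key get()
    # cannot answer this, which is why a plain key-value store needs
    # a scan or a specially designed key.
    i = bisect.bisect_right(STARTS, addr) - 1
    if i >= 0 and RANGES[i][0] <= addr <= RANGES[i][1]:
        return RANGES[i][2]
    return None


if __name__ == "__main__":
    print(lookup("192.0.2.77"))
```

The address being enriched is almost never itself a key in the table, so the store has to support "greatest start not exceeding X" in some form.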

Re: [PROPOSAL] up-to-date versioned documentation

2017-01-16 Thread Michael Miklavcic
Hey Matt, feel free to ping me.

On Mon, Jan 16, 2017 at 1:39 PM, Matt Foley  wrote:

> I looked into the Falcon website and doxia over the weekend, and I’m
> convinced that using the doxia-markdown plugin should make it dirt simple
> to do what’s been discussed in this thread, with no overhead on the part of
> people writing the README.md files.
>
> I fiddled with trying to do a POC, and unfortunately concluded (again)
> that I don’t really know maven very well :-)
> Are there any maven experts out there who would be willing to give me some
> pointers (offline) on how to make use of this apparently simple maven
> plug-in?
>
> I can do the bit of scripting needed to gather the docs.  I’ve opened
> https://issues.apache.org/jira/browse/METRON-660 with some sub-tasks for
> this work.
> --Matt
>
> On 1/13/17, 12:04 PM, "zeo...@gmail.com"  wrote:
>
> +1 on any improvement to documentation and more consistency.  At this
> point, I think getting rid of or hiding some of the pages on the wiki
> (at
> least for the short term) would be better than leaving them around
> because
> there's a lot of misinformation.
>
> Jon
>
> On Fri, Jan 13, 2017 at 10:13 AM Nick Allen 
> wrote:
>
> > +1 I think it is sorely needed.
> >
> > If we can come up with a really slick solution like Spark, then
> great. I am
> > also not against a half-baked solution that can later evolve into
> something
> > else.  For example, create an index README.md that links together
> all the
> > existing READMEs and run Pandoc on it.  Not ideal, but way better
> than what
> > we have.
> >
> >
> >
> > On Fri, Jan 13, 2017 at 9:53 AM, Otto Fowler <
> ottobackwa...@gmail.com>
> > wrote:
> >
> > > I think something that does what you have laid out here, no matter
> the
> > > implementation details would be ideal
> > >
> > >
> > > On January 12, 2017 at 18:05:24, Matt Foley (ma...@apache.org)
> wrote:
> > >
> > > We currently have three forms of documentation, with the following
> > > advantages and disadvantages:
> > >
> > > || Docs || Pro || Con ||
> > > | CWiki |
> > > Easy to edit, no special tools required, don't have to be a
> developer to
> > > contribute, google and wiki search |
> > > Not versioned, no review process, distant from the code, obsolete
> content
> > > tends to accumulate |
> > > | Site |
> > > Versioned and reviewed, only committers can edit, google search |
> > > Yet another arcane toolset must be learned, only web programmers
> feel
> > > comfortable contributing, "asf-site" branch not related to code
> versions,
> > > distant from the code, tends to go obsolete due to non-maintenance
> |
> > > | README.md |
> > > Versioned and reviewed, only committers can edit, tied to code
> versions,
> > > highly local to the code being documented |
> > > Non-developers don't know about them, may be scared by github, poor
> > scoring
> > > in google search, no high-level presentation |
> > >
> > > Various discussion threads indicate the developer community likes
> > > README-based docs, and it's easy to see why from the above. I
> propose
> > this
> > > extension to the README-based documentation, to address their
> > > disadvantages:
> > >
> > > 1. Produce a script that gathers the README.md files from all code
> > > subdirectories into a hierarchical list. The script would have an
> > exclusion
> > > list for non-user-content, which at this point would consist of
> [site/*,
> > > build_utils/*]. The hierarchy would be sorted depth-first. The
> resulting
> > > hierarchical list at this time (with six added README files to
> complete
> > the
> > > hierarchy) would be:
> > >
> > > ./README.md
> > > ./metron-analytics/README.md <== (need file here)
> > > ./metron-analytics/metron-maas-service/README.md
> > > ./metron-analytics/metron-profiler/README.md
> > > ./metron-analytics/metron-profiler-client/README.md
> > > ./metron-analytics/metron-statistics/README.md
> > > ./metron-deployment/README.md
> > > ./metron-deployment/amazon-ec2/README.md
> > > ./metron-deployment/packaging/README.md <== (need file here)
> > > ./metron-deployment/packaging/ambari/README.md <== (need file
> here)
> > > ./metron-deployment/packaging/docker/ansible-docker/README.md
> > > ./metron-deployment/packaging/docker/rpm-docker/README.md
> > > ./metron-deployment/packer-build/README.md
> > > ./metron-deployment/roles/ <== (need file here)
> > > ./metron-deployment/roles/kibana/README.md
> > > ./metron-deployment/roles/monit/README.md
> > > ./metron-deployment/roles/opentaxii/README.md
> > > ./metron-deployment/roles/pcap_replay/README.md
> > > ./metron-deployment/roles/sensor-test-mode/README.md
>
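
The gathering script described in step 1 could look roughly like the following. This is a minimal Python sketch under stated assumptions, not the script Matt intends to write: `gather_readmes` and `EXCLUDES` are hypothetical names, the exclusion patterns are taken from the proposal, and the patterns are matched with shell-style globbing against paths relative to the repository root.

```python
import fnmatch
import os

# Exclusion list from the proposal: non-user-content directories.
EXCLUDES = ["site/*", "build_utils/*"]


def gather_readmes(root, excludes=EXCLUDES):
    """Return README.md paths under root, parents before children."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place makes os.walk visit subdirectories in a
        # stable order, which yields a depth-first hierarchical list.
        dirnames.sort()
        if "README.md" not in filenames:
            continue
        rel = os.path.relpath(os.path.join(dirpath, "README.md"), root)
        rel = rel.replace(os.sep, "/")
        if any(fnmatch.fnmatch(rel, pattern) for pattern in excludes):
            continue
        found.append("./" + rel)
    return found


if __name__ == "__main__":
    for path in gather_readmes("."):
        print(path)
```

Directories without a README.md are simply skipped here. Reporting them as "need file here" would be a small extension.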

Re: [DISCUSS] Ambari Metron Configuration Management consequences and call to action

2017-01-15 Thread Michael Miklavcic
hack the pyc files (on each
> node) to force the data to be reloaded from Ambari-server.  Best solution
> is don’t cheat.
>
> Also, there may be circumstances under which the Ambari-agent will detect
> changes and re-write the latest version it knows of the config files, even
> without a Save or Start action at the Ambari-server.  I’m not sure of this
> and need to check with Ambari developers.  It may no longer happen, altho
> I’m pretty sure change detection/reversion was a feature of early versions
> of Ambari.
>
> Hope this helps,
> --Matt
>
> 
> From: Michael Miklavcic <michael.miklav...@gmail.com>
> Reply-To: "dev@metron.incubator.apache.org" <dev@metron.incubator.apache.
> org>
> Date: Thursday, January 12, 2017 at 3:59 PM
> To: "dev@metron.incubator.apache.org" <dev@metron.incubator.apache.org>
> Subject: Re: [DISCUSS] Ambari Metron Configuration Management consequences
> and call to action
>
> Hi Casey,
>
> Thanks for starting this thread. I believe you are correct in your
> assessment of the 4 options for updating configs in Metron. When using more
> than one of these options we can get into a split-brain scenario. A basic
> example is updating the global config on disk and using the
> zk_load_configs.sh. Later, if a user decides to restart Ambari, the cached
> version stored by Ambari (it's in the MySQL or other database backing
> Ambari) will be written out to disk in the defined config directory, and
> subsequently loaded using the zk_load_configs.sh under the hood. Any global
> configuration modified outside of Ambari will be lost at this point. This
> is obviously undesirable, but I also like the purpose and utility exposed
> by the multiple config management interfaces we currently have available. I
> also agree that a service would be best.
>
> For reference, here's my understanding of the current configuration
> loading mechanisms and their deps.
>
> 
>
> Mike
>
>
> On Thu, Jan 12, 2017 at 3:08 PM, Casey Stella <ceste...@gmail.com> wrote:
>
> In the course of discussion on the PR for METRON-652
> <https://github.com/apache/incubator-metron/pull/415> something that I
> should definitely have understood better came to light and I thought that
> it was worth bringing to the attention of the community to get
> clarification/discussion: just how we manage configs.
>
> Currently (assuming the management UI that Ryan Merriman submitted) configs
> are managed/adjusted via a couple of different mechanisms.
>
>- zk_load_configs.sh: pushed and pulled from disk to zookeeper
>- Stellar REPL: pushed and pulled via the CONFIG_GET/CONFIG_PUT
> functions
>- Ambari: initialized via the zk_load_configs.sh script and then some of them
>are managed directly (global config) and some indirectly
> (sensor-specific
>configs).
>   - NOTE: Upon service restart, it may or may not overwrite changes on
>   disk or on zookeeper.  *Can someone more knowledgeable than me about
>   this describe precisely the semantics that we can expect on
> service restart
>   for Ambari? What gets overwritten on disk and what gets updated
> in ambari?*
>- The Management UI: manages some of the configs. *RYAN: Which configs
>do we support here and which don't we support here?*
>
> As you can see, we have a mishmash of mechanisms to update and manage the
> configuration for Metron in zookeeper.  In the beginning the approach was
> just to edit configs on disk and push/pull them via zk_load_configs.sh.  Configs
> could be historically managed using source control, etc.  As we got more
> and more components managing the configs, we haven't taken care that
> they all work with each other in an expected way (I believe these are
> true; correct me if I'm wrong):
>
>- If configs are modified in the management UI or the Stellar REPL and
>someone forgets to pull the configs from zookeeper to disk, before they
> do
>a push via zk_load_configs.sh, they will clobber the configs in zookeeper
> with
>old configs.
>- If the global config is changed on disk and the ambari service
>restarts, it'll get reset with the original global config.
>- *Ryan, in the management UI, if someone changes the zookeeper configs
>from outside, are those configs reflected immediately in the UI?*
>
>
> It seems to me that we have a couple of options here:
>
>- A service to intermediate and handle config update/retrieval and
>tracking historical changes so these different mechanisms can use a
> common
>component for config management/tracking and refactor the existing
>mechanisms to use that service
> 

Re: [PROPOSAL] up-to-date versioned documentation

2017-01-12 Thread Michael Miklavcic
Casey, Matt - These guys are using doxia
https://github.com/apache/falcon/tree/master/docs

Honestly, I kind of like Spark's approach -
https://github.com/apache/spark/tree/master/docs

Mike

On Thu, Jan 12, 2017 at 4:48 PM, Matt Foley  wrote:

> I’m ambivalent; I think we’d end up tied to the doxia processing pipeline,
> which is “yet another arcane toolset” to learn.  Using .md as the input
> format decreases the dependency, but we’d still be dependent on it.
>
> I had anticipated that the web page would be a write-once thing that would
> be only a couple days for an experienced Web developer. But I was going to
> get an estimate from some co-workers before actually trying to get it
> implemented. And the script is a few hours of work with find and awk.
>
> On the other hand, doxia is certainly an expectable solution.  Is setting
> up that infrastructure less work than developing the web page?  Or is it
> actually just a matter of a few lines in pom.xml?
>
>
> On 1/12/17, 3:24 PM, "Casey Stella"  wrote:
>
> Just a followup thought that's a bit more constructive, maybe we could
> migrate the README.md's into a site directory and use doxia markdown
> (example here ) to
> generate the site as part of the build to resolve 1 through 3?
>
> On Thu, Jan 12, 2017 at 6:19 PM, Casey Stella 
> wrote:
>
> > So, I do think this would be better than what we currently do.  I
> like a
> > few things in particular:
> >
> >- I don't like the wiki one bit.
> >- We have a LOT of documentation in the README.md's and it's
> sometimes
> >poorly organized
> >- I like a documentation preprocessing pipeline to be present.
> For
> >instance, a major ask is all of the stellar functions in one
> place.  That's
> >solved by updating an index manually in the READMEs and keeping
> it in sync
> >with the annotation.  I'd like to make a stellar annotation ->
> markdown
> >generator as part of the build and this would be nice for such a
> task.
> >
> > My only concern is that the html generation/viewer seems like a fair
> > amount of engineering.  Are you sure there isn't something easier
> that we
> > could conform to?  I'm sure we aren't the only project in the world
> that
> > has this particular issue.  Is there something like a maven site
> plugin or
> > something?  Just a thought.  I'll come back with more :)
> >
> > Great ideas!  Keep them coming!
> >
> > Casey
> >
> > On Thu, Jan 12, 2017 at 6:05 PM, Matt Foley 
> wrote:
> >
> >> We currently have three forms of documentation, with the following
> >> advantages and disadvantages:
> >>
> >> || Docs || Pro || Con ||
> >> | CWiki |
> >>   Easy to edit, no special tools required, don't have to be a
> >> developer to contribute, google and wiki search |
> >> Not versioned, no review process, distant from the code, obsolete
> content
> >> tends to accumulate |
> >> | Site |
> >>   Versioned and reviewed, only committers can edit, google
> search |
> >>   Yet another arcane toolset must be learned, only web
> programmers
> >> feel comfortable contributing, "asf-site" branch not related to code
> >> versions, distant from the code, tends to go obsolete due to
> >> non-maintenance |
> >> | README.md |
> >>   Versioned and reviewed, only committers can edit, tied to code
> >> versions, highly local to the code being documented |
> >>   Non-developers don't know about them, may be scared by
> github, poor
> >> scoring in google search, no high-level presentation |
> >>
> >> Various discussion threads indicate the developer community likes
> >> README-based docs, and it's easy to see why from the above.  I
> propose this
> >> extension to the README-based documentation, to address their
> disadvantages:
> >>
> >> 1. Produce a script that gathers the README.md files from all code
> >> subdirectories into a hierarchical list.  The script would have an
> >> exclusion list for non-user-content, which at this point would
> consist of
> >> [site/*, build_utils/*].  The hierarchy would be sorted
> depth-first.  The
> >> resulting hierarchical list at this time (with six added README
> files to
> >> complete the hierarchy) would be:
> >>
> >> ./README.md
> >> ./metron-analytics/README.md  <== (need file here)
> >> ./metron-analytics/metron-maas-service/README.md
> >> ./metron-analytics/metron-profiler/README.md
> >> ./metron-analytics/metron-profiler-client/README.md
> >> ./metron-analytics/metron-statistics/README.md
> >> ./metron-deployment/README.md
> >> ./metron-deployment/amazon-ec2/README.md
> >> ./metron-deployment/packaging/README.md  <== 

Re: [DISCUSS] Dev Guide and Committer Review Guide additions?

2017-01-12 Thread Michael Miklavcic
"Also, what would people think of dropping Ansible in favor of Ambari and
Docker as the preferred deployment management approaches?"

Agreed about publishing via Ambari. I'm not sure about fully replacing
Vagrant just yet, but we could move that direction. Docker would allow us
to more easily test a realistic multi-node setup on a single machine. In
the meantime, maybe a quick win could be to use Ansible to deploy and
install the MPack to the quickdev environment? This way we're leveraging
the rpm's as well as the MPack code and installing in nearly the same
manner as most users.

On Thu, Jan 12, 2017 at 3:49 PM, Matt Foley <ma...@apache.org> wrote:

> I think I hear 3 major areas not adequately covered by our usual “code
> review”:
> 1. Documentation
> 2. Deployment Builds
> 3. Management of config parameters
>
> The other areas mentioned by Otto (testing, perf test, Stellar impact, and
> REST api impact), are entirely valid, but fall under existing code and
> architecture that seems generally adequate.
>
> Regarding #1, Documentation, I’d like to branch a discussion thread for a
> proposal I’m about to make, to enhance our use of README files as usable
> and up-to-date end-user documentation, linked from the Metron site.
> Implicit in that is the idea that we’d deprecate using the cwiki for
> anything but long-lived demonstrations/tutorials that are unlikely to go
> obsolete.
>
> For #2, Deployment Builds:  This is difficult, and unfortunately I’m not
> an expert with these things, but we need to automate this as much as
> possible.  Config params will always interact heavily with deployment
> issues, but let’s leave that for #3 :0)
>
> As far as RPMs, Ansible playbooks, or Docker images go, we’d like to
> automate so that developers never have to do anything when they are
> committing modifications of existing components, and even when new
> components are added (like the Profiler is being added now), it should
> insofar as possible be automated via maven declarations.  But that takes
> input from the experts in each of the areas.
>
> Also, what would people think of dropping Ansible in favor of Ambari and
> Docker as the preferred deployment management approaches?
>
> #3, Management of config parameters:  I’ve been thinking about this
> lately, but haven’t written up a proposal yet.  I’m bothered by the wide
> ranging variability in the way Metron configs are managed: files,
> zookeeper, environment variables, traditional Hadoop-style configs, and
> roll-your-own json configs, sometimes shared, sometimes duplicated, not to
> mention Ambari over it all.  This has been encouraged by the huge number of
> Stack components that Metron depends on, and the relative independence of
> the components Metron itself is composed of.
>
> But I think as Otto points out, as we grow the number of components and
> mature out of the incubator, we have to get this under control.  We need an
> architecture for management of configuration parameters of the Metron
> topologies.  (We can’t do much about the Stack components, but Ambari is
> establishing a culture around managing those.)  The architecture needs to
> include update methodology for semantic changes in parameter sets.
>
> I’m mulling such an architecture, but what do other people think?  Is this
> a valid need?
>
> Thanks,
> --Matt
>
> On 1/12/17, 8:23 AM, "Michael Miklavcic" <michael.miklav...@gmail.com>
> wrote:
>
> Hi Otto,
>
> You make a great point.
>
> AFA RPM/MPack, we do have some work in the pipeline for streamlining
> things
> a bit with the RPM's and MPack code such that they will be used for
> performing the Metron install in the sandbox VM's rather than Ansible.
> (I'd
> search for the public Jiras and post them here, but Jira is down for
> maintenance currently.) This should help make it obvious that a change
> or
> new feature requires modifications because they will be in the critical
> path to testing.
>
> Documentation is still tricky because we have README files, javadoc,
> and
> the wiki. But in general I think the current approach is to put
> concrete
> functionality docs in the READMEs as much as possible because they can
> be
> tracked and versioned with Git. I think the community has actually been
> doing a pretty good job here. The wiki is a little more tricky because
> there is typically only one version, which tracks master, not
> necessarily
> the latest stable release.
>
> Mike
>
>
> On Thu, Jan 12, 2017 at 8:42 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > As Metron evolves to include new deployment options, features, and
> 

Re: [DISCUSS] Ambari Metron Configuration Management consequences and call to action

2017-01-12 Thread Michael Miklavcic
Hi Casey,

Thanks for starting this thread. I believe you are correct in your
assessment of the 4 options for updating configs in Metron. When using more
than one of these options we can get into a split-brain scenario. A basic
example is updating the global config on disk and using the
zk_load_configs.sh. Later, if a user decides to restart Ambari, the cached
version stored by Ambari (it's in the MySQL or other database backing
Ambari) will be written out to disk in the defined config directory, and
subsequently loaded using the zk_load_configs.sh under the hood. Any global
configuration modified outside of Ambari will be lost at this point. This
is obviously undesirable, but I also like the purpose and utility exposed
by the multiple config management interfaces we currently have available. I
also agree that a service would be best.

For reference, here's my understanding of the current configuration loading
mechanisms and their deps.

[image: Inline image 1]

Mike


On Thu, Jan 12, 2017 at 3:08 PM, Casey Stella  wrote:

> In the course of discussion on the PR for METRON-652
>  something that I
> should definitely have understood better came to light and I thought that
> it was worth bringing to the attention of the community to get
> clarification/discussion: just how we manage configs.
>
> Currently (assuming the management UI that Ryan Merriman submitted) configs
> are managed/adjusted via a couple of different mechanisms.
>
>- zk_load_configs.sh: pushed and pulled from disk to zookeeper
>- Stellar REPL: pushed and pulled via the CONFIG_GET/CONFIG_PUT
> functions
>- Ambari: initialized via the zk_load_configs.sh script and then some of them
>are managed directly (global config) and some indirectly
> (sensor-specific
>configs).
>   - NOTE: Upon service restart, it may or may not overwrite changes on
>   disk or on zookeeper.  *Can someone more knowledgeable than me about
>   this describe precisely the semantics that we can expect on
> service restart
>   for Ambari? What gets overwritten on disk and what gets updated
> in ambari?*
>- The Management UI: manages some of the configs. *RYAN: Which configs
>do we support here and which don't we support here?*
>
> As you can see, we have a mishmash of mechanisms to update and manage the
> configuration for Metron in zookeeper.  In the beginning the approach was
> just to edit configs on disk and push/pull them via zk_load_configs.sh.  Configs
> could be historically managed using source control, etc.  As we got more
> and more components managing the configs, we haven't taken care that
> they all work with each other in an expected way (I believe these are
> true; correct me if I'm wrong):
>
>- If configs are modified in the management UI or the Stellar REPL and
>someone forgets to pull the configs from zookeeper to disk, before they
> do
>a push via zk_load_configs.sh, they will clobber the configs in zookeeper
> with
>old configs.
>- If the global config is changed on disk and the ambari service
>restarts, it'll get reset with the original global config.
>- *Ryan, in the management UI, if someone changes the zookeeper configs
>from outside, are those configs reflected immediately in the UI?*
>
>
> It seems to me that we have a couple of options here:
>
>- A service to intermediate and handle config update/retrieval and
>tracking historical changes so these different mechanisms can use a
> common
>component for config management/tracking and refactor the existing
>mechanisms to use that service
>- Standardize on exactly one component to manage the configs and regress
>the others (that's a verb, right?   nicer than delete.)
>
> I happen to like the service approach, myself, but I wanted to put it up
> for discussion and hopefully someone will volunteer to design such a thing.
>
> To frame the debate, I want us to keep in mind a couple of things that may
> or may not be relevant to the discussion:
>
>- We will eventually be moving to support kerberos so there should at
>least be a path to use kerberos for any solution IMO
>- There is value in each of the different mechanisms in place now.  If
>there weren't, then they wouldn't have been created.  Before we try to
> make
>this a "there can be only one" argument, I'd like to hear very good
>arguments.
>
> Finally, I'd appreciate if some people might answer the questions I have in
> bold there.  Hopefully this discussion, if nothing else happens, will
> result in fodder for proper documentation of the ins and outs of each of
> the components bulleted above.
>
> Best,
>
> Casey
>
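
The first failure mode Casey lists (pushing stale on-disk configs over newer ones in ZooKeeper) can be modeled in a few lines. This is a minimal Python sketch in which plain dictionaries stand in for the config directory and ZooKeeper. `push`, `pull`, `simulate`, and the `es.ip` key are hypothetical illustrations and do not reflect the real script's options or behavior.

```python
def push(disk, zookeeper):
    """Overwrite ZooKeeper with whatever is currently on disk."""
    zookeeper.clear()
    zookeeper.update(disk)


def pull(disk, zookeeper):
    """Overwrite the on-disk copy with what is in ZooKeeper."""
    disk.clear()
    disk.update(zookeeper)


def simulate(pull_first):
    """Return the value left in ZooKeeper after an out-of-band edit."""
    disk = {"es.ip": "node1"}        # made-up global config entry
    zookeeper = {}
    push(disk, zookeeper)            # initial load from disk

    zookeeper["es.ip"] = "node2"     # edit made via the REPL or the UI

    if pull_first:
        pull(disk, zookeeper)        # sync disk before touching it
    push(disk, zookeeper)            # the later push from disk
    return zookeeper["es.ip"]


if __name__ == "__main__":
    print(simulate(pull_first=False))  # stale value wins
    print(simulate(pull_first=True))   # edit survives
```

Nothing in this model records who wrote last, which is the gap a shared config service with change tracking would close.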


Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Michael Miklavcic
I like the flexibility and expressibility of the first option with Stellar
filters.

M

On Thu, Jan 12, 2017 at 1:51 PM, Casey Stella  wrote:

> As of METRON-652, we
> will have decoupled the indexing configuration from the enrichment
> configuration.  As an immediate follow-up to that, I'd like to provide the
> ability to turn off and on writers via the configs.  I'd like to get some
> community feedback on how the functionality should work, if y'all are
> amenable. :)
>
>
> As of now, we have 3 possible writers which can be used in the indexing
> topology:
>
>- Solr
>- Elasticsearch
>- HDFS
>
> HDFS is always used; Elasticsearch or Solr is used depending on how you
> start the indexing topology.
>
> A couple of proposals come to mind immediately:
>
> *Index Filtering*
>
> You would be able to specify a filter as defined by a stellar statement
> (likely a reuse of the StellarFilter that exists in the Parsers) which
> would allow you to indicate on a message-by-message basis whether or not to
> write the message.
>
> The semantics of this would be as follows:
>
>- Default (i.e. unspecified) is to pass everything through (hence
>backwards compatible with the current default config).
>- Messages which have the associated stellar statement evaluate to true
>for the writer type will be written, otherwise not.
>
>
> Sample indexing config which would write out no messages to HDFS and write
> out to Elasticsearch only messages containing a field called "field1":
> {
>"index" : "squid"
>   ,"batchSize" : 100
>   ,"filters" : {
>   "HDFS" : "false"
>  ,"ES" : "exists(field1)"
>  }
> }
>
> *Index On/Off Switch*
>
> A simpler solution would be to just provide a list of writers to write
> messages.  The semantics would be as follows:
>
>- If the list is unspecified, then the default is to write all messages
>for every writer in the indexing topology
>- If the list is specified, then a writer will write all messages if and
>only if it is named in the list.
>
> Sample indexing config which turns off HDFS and keeps on Elasticsearch:
> {
>"index" : "squid"
>   ,"batchSize" : 100
>   ,"writers" : [ "ES" ]
> }
>
> Thanks in advance for the feedback!  Also, if you have any other, better
> ideas than the ones presented here, let me know too.
>
> Best,
>
> Casey
>
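
The semantics of the two proposals quoted above can be compared side by side. This is a minimal Python sketch, assuming the config is already parsed into a dictionary and using Python callables as stand-ins for Stellar statements (evaluating real Stellar is out of scope here). `writer_enabled` and `should_write` are hypothetical names, not Metron APIs.

```python
def writer_enabled(config, writer):
    """On/off switch: an unspecified list means every writer is on."""
    writers = config.get("writers")
    if writers is None:
        return True
    return writer in writers


def should_write(filters, writer, message):
    """Filtering: a writer with no filter passes every message."""
    predicate = filters.get(writer)
    if predicate is None:
        return True
    return bool(predicate(message))


if __name__ == "__main__":
    switch_config = {"index": "squid", "batchSize": 100, "writers": ["ES"]}
    print(writer_enabled(switch_config, "HDFS"))

    # Callables standing in for the "false" and "exists(field1)" filters.
    filters = {
        "HDFS": lambda message: False,
        "ES": lambda message: "field1" in message,
    }
    print(should_write(filters, "ES", {"field1": "x"}))
```

The on/off switch is the special case of filtering in which every predicate is a constant, so the filter option subsumes it at the cost of evaluating a statement per message.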


Re: [DISCUSS] Dev Guide and Committer Review Guide additions?

2017-01-12 Thread Michael Miklavcic
Hi Otto,

You make a great point.

AFA RPM/MPack, we do have some work in the pipeline for streamlining things
a bit with the RPM's and MPack code such that they will be used for
performing the Metron install in the sandbox VM's rather than Ansible. (I'd
search for the public Jiras and post them here, but Jira is down for
maintenance currently.) This should help make it obvious that a change or
new feature requires modifications because they will be in the critical
path to testing.

Documentation is still tricky because we have README files, javadoc, and
the wiki. But in general I think the current approach is to put concrete
functionality docs in the READMEs as much as possible because they can be
tracked and versioned with Git. I think the community has actually been
doing a pretty good job here. The wiki is a little more tricky because
there is typically only one version, which tracks master, not necessarily
the latest stable release.

Mike


On Thu, Jan 12, 2017 at 8:42 AM, Otto Fowler 
wrote:

> As Metron evolves to include new deployment options, features, and
> configurations it is hard and only getting harder for contributors,
> committers, and reviewers to understand what the required changes are
> across the different areas of the system to correctly and completely
> introduce a change or new feature in the system.
>
> We have talked some about the requirements or expectations for submitters
> with regard to tests and coverage, coding style, and documentation, but I
> don’t think we have enough guidance on deployment or other changes that
> need to be considered.  For committers it is pretty much the same, with the
> extra stuff around that process.
>
> Right now it seems as a committer I’m counting on others like Nick or Casey
> to understand anything that may be missing from a submission when I review
> it.  Should there be an Ambari/RPM change?   Does this change the REST API?
> Does this affect STELLAR Lang/SHELL?  Does it need custom Docker Compose
> work?  etc etc.
>
> I think as we grow the community and try to get out of incubation it will
> be impractical for us to count on this, and we are even now increasing the
> risk of regression or functional gaps ( unrealized ) that will have an
> adverse effect on having a stable master.
>
> I think we should discuss if and how we can improve this or the issue of my
> sanity ;).
>
> What are the criteria that we need to have submitters and reviewers have in
> mind?
> * Test
> * Doc
> ** Obsoleting of existing documentation/how-to’s ( even hortonworks posts )
> * Performance
> ** How do we test for performance?
> *** Standards
> *** Tools and processes
> * Deployment
> ** RPM
> ** Docker
> ** Ansible
> ** Ambari
> ** AWS Script
> * Functional
> ** STELLAR/Shell
> ** REST APIs
> * Dev/review guide
> ** Does the review / submit guide need to account for it?
>
> Any thoughts?
>


[DISCUSS][MENTORS] Hosting Kraken maven artifacts in incubator-metron git repo

2017-01-11 Thread Michael Miklavcic
Restarting this thread w/new subject

This is a question primarily for the mentors.

*Background*
metron-common is currently depending on the openSOC github repo for hosting
kraken artifacts. The original reason for this was that these jars are not
hosted in Maven Central, and they were not reliably available in the Kraken
repo. https://issues.apache.org/jira/browse/METRON-650 is tracking work
around copying these artifacts to the Metron repo.

Kraken source on openSOC - https://github.com/OpenSOC/kraken
Kraken maven repo on openSOC -
https://github.com/OpenSOC/kraken/tree/mvn-repo

*Ask*
Create a new branch in incubator-metron to host any necessary maven
artifacts. This branch would simply be incubator-metron/mvn-repo. This is
similar to how we've hosted the asf-site.

*Concerns/Questions*

   1. Can we host these jars/artifacts in this manner?
   2. Concerns regarding licensing?
   3. Do we need to also grab and host the source code?


Re: [DISCUSS] Hosting Kraken maven artifacts in incubator-metron git repo

2017-01-11 Thread Michael Miklavcic
Ok, that was funny

On Wed, Jan 11, 2017 at 7:43 AM, Casey Stella <ceste...@gmail.com> wrote:

> I'd recommend restarting this discuss thread with the subject
> "[DISCUSS][MENTORS] Hosting Kraken maven artifacts in incubator-metron git
> repo".  That's the Mentors bat-symbol. The other option is to repeat
> "mentors" 3 times and either the mentors will appear or you will be eaten
> by a grue. ;)
>
> On Wed, Jan 11, 2017 at 9:32 AM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Any comment from the mentors on this?
> >
> > On Mon, Jan 9, 2017 at 2:00 PM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > This is a question primarily for the mentors.
> > >
> > > *Background*
> > > metron-common is currently depending on the openSOC github repo for
> > > hosting kraken artifacts. The original reason for this was that these
> > jars
> > > are not hosted in Maven Central, and they were not reliably available
> in
> > > the Kraken repo. https://issues.apache.org/jira/browse/METRON-650 is
> > > tracking work around copying these artifacts to the Metron repo.
> > >
> > > Kraken source on openSOC - https://github.com/OpenSOC/kraken
> > > Kraken maven repo on openSOC -
> > > https://github.com/OpenSOC/kraken/tree/mvn-repo
> > >
> > > *Ask*
> > > Create a new branch in incubator-metron to host any necessary maven
> > > artifacts. This branch would simply be incubator-metron/mvn-repo. This
> is
> > > similar to how we've hosted the asf-site.
> > >
> > > *Concerns/Questions*
> > >
> > >1. Can we host these jars/artifacts in this manner?
> > >2. Concerns regarding licensing?
> > >3. Do we need to also grab and host the source code?
> > >
> > >
> >
>


Re: [DISCUSS] Hosting Kraken maven artifacts in incubator-metron git repo

2017-01-11 Thread Michael Miklavcic
Any comment from the mentors on this?

On Mon, Jan 9, 2017 at 2:00 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> This is a question primarily for the mentors.
>
> *Background*
> metron-common is currently depending on the openSOC github repo for
> hosting kraken artifacts. The original reason for this was that these jars
> are not hosted in Maven Central, and they were not reliably available in
> the Kraken repo. https://issues.apache.org/jira/browse/METRON-650 is
> tracking work around copying these artifacts to the Metron repo.
>
> Kraken source on openSOC - https://github.com/OpenSOC/kraken
> Kraken maven repo on openSOC -
> https://github.com/OpenSOC/kraken/tree/mvn-repo
>
> *Ask*
> Create a new branch in incubator-metron to host any necessary maven
> artifacts. This branch would simply be incubator-metron/mvn-repo. This is
> similar to how we've hosted the asf-site.
>
> *Concerns/Questions*
>
>1. Can we host these jars/artifacts in this manner?
>2. Concerns regarding licensing?
>3. Do we need to also grab and host the source code?
>
>


Re: Kafka error when upgrading to Metron 0.3.0

2017-01-09 Thread Michael Miklavcic
Hi Tyler, You shouldn't have to skip the unit tests, as we aim for every
commit to pass 100% of the tests. There are rare instances where this does
not hold, but in general all tests should pass. I just pulled down the
latest master and re-ran the following successfully.

mvn clean install -PHDP-2.5.0.0

Which branch or commit are you working from, and what command did you run
for the build? Be aware that Maven does not fail the build on a misspelled
profile name; it falls back to the default profile, and the only hint is a
warning that is easy to miss in the output.

On Mon, Jan 9, 2017 at 10:32 AM, Tyler Moore <tmo...@goflyball.com> wrote:

> Seems like it's just the unit tests that are failing for pcap backend. Was
> able to build successfully using new came with -DskipTests option along
> with the profile argument.
>
> Regards,
>
> Tyler Moore
> IT Specialist
> tyler.math...@yahoo.com
> 248-909-2769
>
> > On Jan 9, 2017, at 02:29, Michael Miklavcic <michael.miklav...@gmail.com>
> wrote:
> >
> > Hi Tyler, I don't recall seeing any failures the last time I ran this. I
> > will take a look.
> >
> > On Jan 8, 2017 9:56 PM, "Tyler Moore" <tmo...@goflyball.com> wrote:
> >
> > Michael,
> >
> > I am receiving an error when trying to build with the HDP profile. Have
> > you had any problems with pcap-backend tests failing when building with
> > the HDP profile?
> > Errors and log files provided below:
> >
> > Results :
> >
> > Failed tests:
> >  PcapTopologyIntegrationTest.testTimestampInKey:152->testTopology:388->assertInOrder:542 null
> >  PcapTopologyIntegrationTest.testTimestampInPacket:135->testTopology:388->assertInOrder:542 null
> >
> >
> >
> > Tests run: 2, Failures: 2, Errors: 0, Skipped: 0
> >
> > [INFO] 
> > 
> > [INFO] Reactor Summary:
> > [INFO]
> > [INFO] Metron . SUCCESS [  0.639 s]
> > [INFO] metron-analytics ... SUCCESS [  0.029 s]
> > [INFO] metron-maas-common . SUCCESS [  8.771 s]
> > [INFO] metron-platform  SUCCESS [  0.057 s]
> > [INFO] metron-test-utilities .. SUCCESS [  0.976 s]
> > [INFO] metron-integration-test  SUCCESS [  3.888 s]
> > [INFO] metron-maas-service  SUCCESS [01:19 min]
> > [INFO] metron-common .. SUCCESS [ 41.718 s]
> > [INFO] metron-statistics .. SUCCESS [ 30.713 s]
> > [INFO] metron-hbase ... SUCCESS [ 37.286 s]
> > [INFO] metron-profiler-common . SUCCESS [  7.967 s]
> > [INFO] metron-profiler-client . SUCCESS [ 54.088 s]
> > [INFO] metron-profiler  SUCCESS [02:28 min]
> > [INFO] metron-writer .. SUCCESS [  7.151 s]
> > [INFO] metron-enrichment .. SUCCESS [ 57.076 s]
> > [INFO] metron-indexing  SUCCESS [  7.028 s]
> > [INFO] metron-solr  SUCCESS [ 36.695 s]
> > [INFO] metron-pcap  SUCCESS [  1.123 s]
> > [INFO] metron-parsers . SUCCESS [01:58 min]
> > [INFO] metron-pcap-backend  FAILURE [ 49.928 s]
> > [INFO] metron-data-management . SKIPPED
> > [INFO] metron-api . SKIPPED
> > [INFO] metron-management .. SKIPPED
> > [INFO] elasticsearch-shaded ... SKIPPED
> > [INFO] metron-elasticsearch ... SKIPPED
> > [INFO] metron-deployment .. SKIPPED
> > [INFO] Metron Ambari Management Pack .. SKIPPED
> > [INFO] 
> > 
> > [INFO] BUILD FAILURE
> > [INFO] 
> > ----
> > [INFO] Total time: 11:31 min
> > [INFO] Finished at: 2017-01-08T22:23:59-05:00
> > [INFO] Final Memory: 287M/6202M
> > [INFO] -

[DISCUSS] Hosting Kraken maven artifacts in incubator-metron git repo

2017-01-09 Thread Michael Miklavcic
This is a question primarily for the mentors.

*Background*
metron-common is currently depending on the openSOC github repo for hosting
kraken artifacts. The original reason for this was that these jars are not
hosted in Maven Central, and they were not reliably available in the Kraken
repo. https://issues.apache.org/jira/browse/METRON-650 is tracking work
around copying these artifacts to the Metron repo.

Kraken source on openSOC - https://github.com/OpenSOC/kraken
Kraken maven repo on openSOC -
https://github.com/OpenSOC/kraken/tree/mvn-repo

*Ask*
Create a new branch in incubator-metron to host any necessary maven
artifacts. This branch would simply be incubator-metron/mvn-repo. This is
similar to how we've hosted the asf-site.

*Concerns/Questions*

   1. Can we host these jars/artifacts in this manner?
   2. Concerns regarding licensing?
   3. Do we need to also grab and host the source code?


Re: Kafka error when upgrading to Metron 0.3.0

2017-01-08 Thread Michael Miklavcic
Hi Tyler, I don't recall seeing any failures the last time I ran this. I
will take a look.

On Jan 8, 2017 9:56 PM, "Tyler Moore" <tmo...@goflyball.com> wrote:

Michael,

I am receiving an error when trying to build with the HDP profile. Have you
had any problems with pcap-backend tests failing when building with the HDP
profile?
Errors and log files provided below:

Results :

Failed tests:
  
PcapTopologyIntegrationTest.testTimestampInKey:152->testTopology:388->assertInOrder:542 null
PcapTopologyIntegrationTest.testTimestampInPacket:135->testTopology:388->assertInOrder:542 null



Tests run: 2, Failures: 2, Errors: 0, Skipped: 0

[INFO] 

[INFO] Reactor Summary:
[INFO]
[INFO] Metron . SUCCESS [  0.639 s]
[INFO] metron-analytics ... SUCCESS [  0.029 s]
[INFO] metron-maas-common . SUCCESS [  8.771 s]
[INFO] metron-platform  SUCCESS [  0.057 s]
[INFO] metron-test-utilities .. SUCCESS [  0.976 s]
[INFO] metron-integration-test  SUCCESS [  3.888 s]
[INFO] metron-maas-service  SUCCESS [01:19 min]
[INFO] metron-common .. SUCCESS [ 41.718 s]
[INFO] metron-statistics .. SUCCESS [ 30.713 s]
[INFO] metron-hbase ... SUCCESS [ 37.286 s]
[INFO] metron-profiler-common . SUCCESS [  7.967 s]
[INFO] metron-profiler-client . SUCCESS [ 54.088 s]
[INFO] metron-profiler  SUCCESS [02:28 min]
[INFO] metron-writer .. SUCCESS [  7.151 s]
[INFO] metron-enrichment .. SUCCESS [ 57.076 s]
[INFO] metron-indexing  SUCCESS [  7.028 s]
[INFO] metron-solr  SUCCESS [ 36.695 s]
[INFO] metron-pcap  SUCCESS [  1.123 s]
[INFO] metron-parsers . SUCCESS [01:58 min]
[INFO] metron-pcap-backend  FAILURE [ 49.928 s]
[INFO] metron-data-management . SKIPPED
[INFO] metron-api . SKIPPED
[INFO] metron-management .. SKIPPED
[INFO] elasticsearch-shaded ... SKIPPED
[INFO] metron-elasticsearch ... SKIPPED
[INFO] metron-deployment .. SKIPPED
[INFO] Metron Ambari Management Pack .. SKIPPED
[INFO] 

[INFO] BUILD FAILURE
[INFO] 

[INFO] Total time: 11:31 min
[INFO] Finished at: 2017-01-08T22:23:59-05:00
[INFO] Final Memory: 287M/6202M
[INFO] 

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18:test (integration-tests) on project metron-pcap-backend: There are test failures.


Regards,

Tyler Moore
Software Engineer
Phone: 248-909-2769 <(248)%20909-2769>
Email: moore.ty...@goflyball.com


On Thu, Jan 5, 2017 at 5:21 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Hey Tyler,
>
> Build Metron with the HDP profile to get the proper deps for this
> -PHDP-2.5.0.0
>
> Hopefully that works for you.
>
> Best,
> Mike
>
>
> On Thu, Jan 5, 2017 at 3:17 PM, Tyler Moore <tmo...@goflyball.com> wrote:
>
> > Hey all,
> >
> > Wondering if there is a solution to the "Offset lags for kafka not
> > supported for older versions. Please update kafka spout to latest
> version."
> > error seen when upgrading to metron 0.3.0?
> >
> > I know it is due to the kafka-storm dependency that needs to be updated,
> > but what is the best way to go about that? Is it as simple as changing the
> > version in the metron pom.xml and provisioning?
> >
> > Regards,
> >
> > Tyler Moore
> > Software Engineer
> > Phone: 248-909-2769
> > Email: moore.ty...@goflyball.com
> >
>


Re: METRON-648 GrokWebSphereParserTest and BasicAsaParserTest are not 2017-safe

2017-01-04 Thread Michael Miklavcic
+1

On Wed, Jan 4, 2017 at 8:04 AM, Justin Leet <justinjl...@gmail.com> wrote:

> In the short term, we could just generate the timestamp with the current
> year in the test and spin off another JIRA for
> actually addressing the question of what we do with this data (Keep in mind
> we can eventually have replay use cases, so assuming the past year might
> not be totally sufficient either.)
>
> At that point it'll at least be year agnostic, but probably not the actual
> output we want. Normally, I'd rather it be handled correctly, but given
> that our builds fail, I'd rather have something less broken until we get a
> more correct solution.
>
> I can take care of doing that today.  Any objections to that solution?
>
> Justin
>
> On Wed, Jan 4, 2017 at 9:34 AM, Kyle Richardson <kylerichards...@gmail.com
> >
> wrote:
>
> > Unfortunately, it's not going to be quite as simple as just adding the
> > year into the test strings, at least for GrokWebSphereParserTest. (For
> > BasicAsaParserTest, updating the test string worked just fine.)
> >
> > It turns out that the grok pattern being used only expects the month and
> > day in the timestamp of the syslog messages. I'm happy to take a stab at
> > making it year-safe by reusing some of the code from the BasicAsaParser;
> > however, I have limited time today and it will likely take me until
> > Friday to get a PR submitted given the new scope of changes required.
> >
> > -Kyle
> >
> > On Wed, Jan 4, 2017 at 12:50 AM, Matt Foley <ma...@apache.org> wrote:
> >
> > > Yes, this is an endemic problem with log processing.  And I agree
> > > adding the year to the testString is the best fix for our short-term
> > > problem.
> > >
> > > For future consideration, we should consider if there should be an
> > > assumption/preference in the parser that the logs are in the “past”.
> > > Granted, if the timezone is also unspecified, there is still a 24 hr
> > > period of uncertainty, but it does seem that on Jan 3 2017 the preferred
> > > interpretation of “Apr 15” would be Apr 15 2016, not 2017.
> > >
> > > Cheers,
> > > --Matt
> > >
> > > On 1/3/17, 5:14 PM, "Michael Miklavcic" <michael.miklav...@gmail.com>
> > > wrote:
> > >
> > > I also introduced a Clock object and testing mechanism back in
> > > METRON-235 -
> > > https://github.com/apache/incubator-metron/pull/156
> > > Sample test utilizing the Clock object here -
> > > https://github.com/apache/incubator-metron/blob/master/metron-platform/metron-pcap-backend/src/test/java/org/apache/metron/pcap/query/PcapCliTest.java
> > >
> > > That being said, it's probably better to use the new java.time fixed
> > > clock implementation in all places, as referenced by Matt. I agree with
> > > everyone on a quick fix for the build and a follow-on PR to introduce
> > > appropriate dep injection for testing.
> > >
> > > AFA string dates with no year, we had something similar show up in the
> > > Snort parser. There ended up being a configuration option in Snort to
> > > enable a year to be printed, but we may want to offer alternatives for
> > > other parsers. Regardless of how we approach this it gets messy when you
> > > start thinking about potentially different src/dest timezones across a
> > > new year boundary in addition to data replay. I would urge our main goal
> > > here to be idempotency.
> > >
> > > Best,
> > > Mike
> > >
> > > On Tue, Jan 3, 2017 at 5:05 PM, Kyle Richardson <
> > > kylerichards...@gmail.com>
> > > wrote:
> > >
> > > > Agreed. I prefer the quick win to get us back to successful builds.
> > > >
> > > > I do think it's worth a general discussion around how we want to
> > > > handle the parsing of string dates with no year. In the long run,
> > > > Matt's suggestion of incorporating the Clock object is probably the
> > > > route to go; albeit as a separate enhancement PR.
> > > >
> > > > I'll start a new discuss thread for that and submit a PR for the
> > > > quick fix.
> > > >
> 

Re: METRON-648 GrokWebSphereParserTest and BasicAsaParserTest are not 2017-safe

2017-01-03 Thread Michael Miklavcic
I also introduced a Clock object and testing mechanism back in METRON-235 -
https://github.com/apache/incubator-metron/pull/156
Sample test utilizing the Clock object here -
https://github.com/apache/incubator-metron/blob/master/metron-platform/metron-pcap-backend/src/test/java/org/apache/metron/pcap/query/PcapCliTest.java

That being said, it's probably better to use the new java.time fixed clock
implementation in all places, as referenced by Matt. I agree with
everyone on a quick fix for the build and a follow-on PR to introduce
appropriate dependency injection for testing.
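
To make the dependency injection idea concrete, here is a minimal sketch
using java.time.Clock. The class name YearSource is hypothetical and only for
illustration; it is not the existing Metron Clock object or parser code. The
point is just that the component takes a Clock instead of reading the system
time directly, so a test can hand it Clock.fixed(...).

```java
import java.time.Clock;
import java.time.Year;

/** Hypothetical helper; illustrates injecting a java.time.Clock. */
final class YearSource {

  private final Clock clock;

  /** Production code would pass Clock.systemUTC(); tests pass Clock.fixed(...). */
  YearSource(Clock clock) {
    this.clock = clock;
  }

  /** Current year according to the injected clock, not the wall clock. */
  int currentYear() {
    return Year.now(clock).getValue();
  }
}
```

A test could then build
Clock.fixed(Instant.parse("2016-04-15T00:00:00Z"), ZoneOffset.UTC) and get
the same year no matter when the build runs.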

AFA string dates with no year, we had something similar show up in the
Snort parser. There ended up being a configuration option in Snort to
enable a year to be printed, but we may want to offer alternatives for
other parsers. Regardless of how we approach this it gets messy when you
start thinking about potentially different src/dest timezones across a new
year boundary in addition to data replay. I would urge our main goal here
to be idempotency.
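
As a rough sketch of the "prefer the past" interpretation suggested earlier
in this thread, a year-less month/day could be resolved against an injected
clock like this. YearlessDates and resolveInPast are hypothetical names, and
the sketch deliberately ignores the timezone uncertainty mentioned above; it
only uses the clock's own zone.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.MonthDay;

/** Hypothetical helper (not Metron code) for timestamps that lack a year. */
final class YearlessDates {

  private YearlessDates() {}

  /**
   * Resolves a month/day to the most recent date that is not after "today"
   * as seen by the given clock: this year if the date has already occurred,
   * otherwise last year.
   */
  static LocalDate resolveInPast(MonthDay monthDay, Clock clock) {
    LocalDate today = LocalDate.now(clock);
    LocalDate candidate = monthDay.atYear(today.getYear());
    return candidate.isAfter(today)
        ? monthDay.atYear(today.getYear() - 1)
        : candidate;
  }
}
```

With a clock fixed at Jan 3 2017, "Apr 15" resolves to Apr 15 2016 and
"Jan 2" resolves to Jan 2 2017. For replay to be idempotent, the clock would
presumably need to come from something recorded with the data rather than
from the time the replay happens to run.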

Best,
Mike

On Tue, Jan 3, 2017 at 5:05 PM, Kyle Richardson 
wrote:

> Agreed. I prefer the quick win to get us back to successful builds.
>
> I do think it's worth a general discussion around how we want to handle
> the parsing of string dates with no year. In the long run, Matt's
> suggestion of incorporating the Clock object is probably the route to go;
> albeit as a separate enhancement PR.
>
> I'll start a new discuss thread for that and submit a PR for the quick fix.
>
> -Kyle
>
> > On Jan 3, 2017, at 5:20 PM, David Lyle  wrote:
> >
> > I'm not sure I'm an owner, but I have an opinion. :)
> >
> > I'd just add "2016". Easy and targeted.
> >
> > -D...
> >
> >
> >> On Tue, Jan 3, 2017 at 5:08 PM, Matt Foley  wrote:
> >>
> >> I’ll subordinate this to METRON-647 since it was evidently filed while I
> >> was writing METRON-648 (I did check before!)
> >>
> >> The question below remains valid, however…
> >>
> >>
> >> On 1/3/17, 1:59 PM, "Matt Foley"  wrote:
> >>
> >>Hi all,
> >>As described in https://issues.apache.org/jira/browse/METRON-648 ,
> >> these two test modules are not year-safe, and are suddenly (as of 2017)
> >> giving false Travis errors.
> >>
> >>    I can fix it quickly, but a question for the “owners” of GrokParser:
> >> Do you have an opinion as to whether the fix should be done by adding
> >> "2016" to the testString values in the GrokWebSphereParserTest test
> >> module (easy, and only affects the test module), vs making GrokParser use
> >> a Clock object set to 2016 (more involved, and affecting core code, but
> >> allowing for more interesting testing)?
> >>
> >>    For those interested, BasicAsaParserTest::testShortTimestamp()
> >> illustrates the use of Clock object in the Asa Parser and its test
> >> module.
> >>
> >>Thanks,
> >>--Matt
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>


Re: [DISCUSS] Coding Guidelines

2016-12-21 Thread Michael Miklavcic
Works for me also.

On Wed, Dec 21, 2016 at 12:38 PM, Matt Foley <ma...@apache.org> wrote:

> Works for me, thanks.
>
> On 12/21/16, 11:21 AM, "Casey Stella" <ceste...@gmail.com> wrote:
>
> Sure, how about making it generic to "a deployed cluster"?
>
> On Wed, Dec 21, 2016 at 2:20 PM, Matt Foley <ma...@apache.org> wrote:
>
> > +1 on Casey’s first edit.  However, wrt the second, can we please not
> > require vagrant?  Any of our single-node test deployments, including
> > vagrant, ansible, mpack, or (soon :-) docker, should be acceptable.
> >
> > Thanks,
> > --Matt (who can’t run vagrant workably on the systems available to
> me)
> >
> >
> > On 12/21/16, 8:52 AM, "Michael Miklavcic" <
> michael.miklav...@gmail.com>
> > wrote:
> >
> > Agreed on Casey's addition to 2.5. What do you think about saying the
> > plan should be stated on the PR, since that will be replicated to Jira
> > automatically?
> >
> > On Wed, Dec 21, 2016 at 7:49 AM, Casey Stella <
> ceste...@gmail.com>
> > wrote:
> >
> > > Oh, one more, I propose the following addition to 2.5:
> > > >
> > > > JIRAs will have a description of how to exercise the
> functionality
> > in a
> > > > step-by-step manner on a Quickdev vagrant instance to aid
> review
> > and
> > > > validation.
> > >
> > >
> > > When Mike, Otto and I moved the system to the current version
> of
> > Storm, we
> > > needed a broader smoke test than just running data through that
> > exercised a
> > > variety of the features. We pulled those smoke tests from the
> various
> > > discussions in the JIRAs.
> > >
> > >
> > >
> > > On Wed, Dec 21, 2016 at 9:38 AM, Casey Stella <
> ceste...@gmail.com>
> > wrote:
> > >
> > > > We have been having a lively discussion on METRON-590 (see
> > > > https://github.com/apache/incubator-metron/pull/395) around
> > creating
> > > > multiple abstractions to do the same (or very nearly the
> same)
> > thing.
> > > >
> > > > I'd like to propose an addition to section 2.3 which reads:
> > > >
> > > >> Contributions which provide abstractions which are either
> very
> > similar
> > > to
> > > >> or a subset of existing abstractions should use and extend
> > existing
> > > >> abstractions rather than provide competing abstractions
> unless
> > > engineering
> > > >> exigencies (e.g. performance ) make such an operation
> impossible
> > without
> > > >> compromising core functionality of the platform.
> > > >
> > > >
> > > > I'd like to suggest the following anecdote from the early
> years of
> > the
> > > > codebase to justify the above:
> > > >
> > > > Stellar started as a predicate language only for threat
> triage
> > rules. As
> > > > such, when the task of creating Field Transformations came
> to me, I
> > > needed
> > > > something like Stellar except I needed it to return arbitrary
> > objects,
> > > > rather than just booleans. In my infinite wisdom, I chose to
> fork
> > the
> > > > language, create a second, more specific DSL for field
> > transformations,
> > > > thereby creating "Metron Query Language" and "Metron
> Transformation
> > > > Language."
> > > >
> > > > I felt a nagging feeling at the time that I should just
> expand the
> > query
> > > > language, but I convinced myself that it would require too
> much
> > testing
> > > and
> > > > it would be a change that was too broad in scope. It took 3
> months
> > for me
> > > > to get around to unifying those languages and if we had more
> > people using
> > > > it, it would have bee

Re: [DISCUSS] Coding Guidelines

2016-12-21 Thread Michael Miklavcic
> > 2.2 Code Style
> >>> > > Follow the Sun Code Conventions outlined here:
> >>> > > http://www.oracle.com/technetwork/java/codeconvtoc-136057.html
> >>> > > except that indents are 2 spaces instead of 4
> >>> > > 2.3 Coding Standards
> >>> > > Implementation matches what the documentation says
> >>> > > Logger name is effectively the result of Class.getName()
> >>> > > Class & member access - as restricted as it can be (subject
> >>> to testing
> >>> > > requirements)
> >>> > > Appropriate NullPointerException and
> >>> IllegalArgumentException argument
> >>> > > checks
> >>> > > Asserts - verify they should always be true
> >>> > > Look for accidental propagation of exceptions
> >>> > > Look for unanticipated runtime exceptions
> >>> > > Try-finally used as necessary to restore consistent state
> >>> > > Logging levels conform to Log4j levels
> >>> > > Possible deadlocks - look for inconsistent locking order
> >>> > > Race conditions - look for missing or inadequate
> >>> synchronization
> >>> > > Consistent synchronization - always locking the same
> >>> object(s)
> >>> > > Look for synchronization or documentation saying there's no
> >>> synchronization
> >>> > > Look for possible performance problems
> >>> > > Look at boundary conditions for problems
> >>> > > Configuration entries are retrieved/set via setter/getter
> >>> methods
> >>> > > Implementation details do NOT leak into interfaces
> >>> > > Variables and arguments should be interfaces where possible
> >>> > > If equals is overridden then hashCode is overridden (and
> >>> vice versa)
> >>> > > Objects are checked (instanceof) for appropriate type
> before
> >>> casting (use
> >>> > > generics if possible)
> >>> > > Public API changes have been publicly discussed
> >>> > > Use of static member variables should be used with caution
> >>> especially in
> >>> > > Map/reduce tasks due to the JVM reuse feature
> >>> > > 2.4 Documentation
> >>> > >
> >>> > > Code-Level Documentation
> >>> > > Self-documenting code (variable, method, class) has a clear
> >>> semantic name
> >>> > > Accurate, sufficient for developers to code against
> >>> > > Follows standard Javadoc conventions
> >>> > > Loggers and logging levels covered if they do not follow
> our
> >>> conventions
> >>> > > (see below)
> >>> > > System properties, configuration options, and resources
> >>> covered
> >>> > > Illegal arguments are properly documented as appropriate
> >>> > > Package and overview Javadoc are updated as appropriate
> >>> > > Javadoc comments are mandatory for all public APIs
> >>> > > Generate Javadocs for release builds
> >>> > >
> >>> > > Feature-level documentation - should be version controlled
> >>> in github in
> >>> > > README files.
> >>> > > Accurate description of the feature
> >>> > > Sample configuration and deployment options
> >>> > > Sample usage scenarios
> >>> > >
> >>> > > High-Level Design documentation - architecture description
> >>> and diagrams
> >>> > > should be a part of a wiki entry.
> >>> > > Provide diagrams/charts where appropriate. Visuals are
> >>> always welcome
> >>> > > Provide purpose of the feature/module and why it exists
> >>> within the project
> >>> > > Describe system flows through the feature/module where
> >>> appropriate
> >>> > > Describe how th

Re: [DISCUSS] Coding Guidelines

2016-12-20 Thread Michael Miklavcic
Were you thinking javadoc or something more? I wouldn't mind seeing us
produce a javadoc site, if we aren't already doing so.

On Dec 20, 2016 9:25 AM, "zeo...@gmail.com"  wrote:

> Regarding documentation - while I'm not a huge fan of that approach (I
> would prefer to see documentation generated from the code), I think it
> could work in the short term.  Having that outlined both in the coding
> guidelines and on the wiki would be important.
>
> I agree with the comments about author != committer, and 100% code
> coverage.
>
> Jon
>
> On Tue, Dec 20, 2016 at 11:10 AM James Sirota  wrote:
>
> > In my view the lower-level documentation that should be source controlled
> > with the code belongs on github and then use case documentation and
> > top-level architecture diagrams belong on the wiki.  What do you think?
> >
> > I think if the author is not a committer and can't merge then the
> reviewer
> > should probably merge or the PR originator should ping the dev board to
> get
> > someone to merge the PR in.  Does that seem reasonable to everyone?
> >
> > 18.12.2016, 13:10, "Kyle Richardson" :
> > > Couple of questions/comments:
> > >
> > > In 2.4, we talk about Javadoc and code comments but not too much about
> > the
> > > user documentation. Should we, possibly in a section 4, give some
> > > recommendations on what should go into the README files versus on the
> > wiki?
> > > This could also help the reviewer know if the change is documented
> > > sufficiently.
> > >
> > > In 2.6, we say that 1 qualified reviewer (Apache committer or PPMC
> > member)
> > > other than the author of the PR must have given it a +1. In the case
> > where
> > > the author is not a committer (who could merge their own PR), should we
> > > state that the reviewer will be responsible for the merge?
> > >
> > > -Kyle
> > >
> > > On Fri, Dec 16, 2016 at 6:39 PM, James Sirota 
> > wrote:
> > >
> > >>  Let's move this back to the discuss thread since it's still generating
> > >>  that many comments. Please post all your feedback and I will
> > >>  incorporate it and put it back to a vote.
> > >>
> > >>  Thanks,
> > >>  James
> > >>
> > >>  16.12.2016, 16:12, "Matt Foley" :
> > >>  > +1
> > >>  >
> > >>  > In 2.2 (follow Sun guidelines), do you want to add the notation
> > “except
> > >>  that indents are 2 spaces instead of 4”, as Hadoop does? Or does the
> > Metron
> > >>  community like 4-space indents? I see both in the Metron code.
> > >>  >
> > >>  > My +1 holds in either case.
> > >>  > --Matt
> > >>  >
> > >>  > On 12/16/16, 9:34 AM, "James Sirota"  wrote:
> > >>  >
> > >>  > I incorporated the changes to the coding guidelines from our
> discuss
> > >>  thread. I'd like to get them voted on to make them official.
> > >>  >
> > >>  > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235
> > >>  >
> > >>  > Please vote +1, -1, 0
> > >>  >
> > >>  > The vote will be open for 72 hours.
> > >>  >
> > >>  > ---
> > >>  > Thank you,
> > >>  >
> > >>  > James Sirota
> > >>  > PPMC- Apache Metron (Incubating)
> > >>  > jsirota AT apache DOT org
> > >>
> > >>  ---
> > >>  Thank you,
> > >>
> > >>  James Sirota
> > >>  PPMC- Apache Metron (Incubating)
> > >>  jsirota AT apache DOT org
> >
> > ---
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
> --
>
> Jon
>
> Sent from my mobile device
>


Re: [DISCUSS] Metron IRC channel

2016-12-16 Thread Michael Miklavcic
Jira search

On Fri, Dec 16, 2016 at 11:50 AM, Otto Fowler 
wrote:

> Start with jira and git?
>
>
>
> On December 16, 2016 at 13:17:06, Casey Stella (ceste...@gmail.com) wrote:
>
> Hi all,
>
> Any ideas of what features we would like to add? The options are at:
> http://wilderness.apache.org/manual.html
>
> On Wed, Nov 16, 2016 at 3:14 PM, Casey Stella  wrote:
>
> > Done
> >
> > https://issues.apache.org/jira/browse/INFRA-12931
> >
> >
> > On Wed, Nov 16, 2016 at 12:34 PM, Yohann Lepage 
> > wrote:
> >
> >> Could an official member of the project file an issue on ASF INFRA as
> >> described on https://reference.apache.org/pmc/github to get the bot on
> >> #apache-metron?
> >>
> >>
> >> 2016-11-04 18:18 GMT+01:00 James Sirota :
> >>
> >> > We tried using slack during the early days of the project and it was
> >> > frowned upon by Apache. So we abandoned it in favor of IRC and message
> >> > lists.
> >> >
> >> > 04.11.2016, 04:54, "zeo...@gmail.com" :
> >> > > Is anybody interested in migrating this to slack? I'm personally a
> >> fan of
> >> > > the benefits this provides - just wanted to bring it up and see if
> >> anyone
> >> > > else was thinking the same thing. If not, no biggie.
> >> > >
> >> > > Jon
> >> > >
> >> > > On Thu, Sep 29, 2016 at 1:52 PM zeo...@gmail.com 
> >> > wrote:
> >> > >
> >> > >> +1 #apache-metron
> >> > >>
> >> > >> On Thu, Sep 29, 2016 at 1:45 PM David Lyle 
> >> > wrote:
> >> > >>
> >> > >> ditto.
> >> > >>
> >> > >> On Thu, Sep 29, 2016 at 1:29 PM, Casey Stella 
> >> > wrote:
> >> > >>
> >> > >> > I'd agree; let's focus on #apache-metron
> >> > >> >
> >> > >> > On Thu, Sep 29, 2016 at 11:55 AM, James Sirota <
> >> jsir...@apache.org>
> >> > >> wrote:
> >> > >> >
> >> > >> > > I would just keep #apache-metron and open it up to general
> >> public
> >> > >> > >
> >> > >> > > 29.09.2016, 08:54, "Yohann Lepage" :
> >> > >> > > > Hi everyone,
> >> > >> > > >
> >> > >> > > > There are currently two IRC channels on FreeNode for Metron:
> >> > >> > > > - #apache-metron
> >> > >> > > > - #apache-metron-dev
> >> > >> > > >
> >> > >> > > > One channel is maybe enough as we are less than 5 users.
> >> > >> > > >
> >> > >> > > > What do you think? Which one to keep ?
> >> > >> > > >
> >> > >> > > > Related issue: https://issues.apache.org/jira
> >> /browse/METRON-337
> >> > -
> >> > >> > > > Invite ASFBot to #apache-metron-dev IRC channel
> >> > >> > > > --
> >> > >> > > > Yohann L.
> >> > >> > >
> >> > >> > > ---
> >> > >> > > Thank you,
> >> > >> > >
> >> > >> > > James Sirota
> >> > >> > > PPMC- Apache Metron (Incubating)
> >> > >> > > jsirota AT apache DOT org
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >> --
> >> > >>
> >> > >> Jon
> >> > > --
> >> > >
> >> > > Jon
> >> >
> >> > ---
> >> > Thank you,
> >> >
> >> > James Sirota
> >> > PPMC- Apache Metron (Incubating)
> >> > jsirota AT apache DOT org
> >> >
> >>
> >>
> >>
> >> --
> >> Yohann L.
> >>
> >
> >
>


Re: [VOTE] Coding Guidelines

2016-12-16 Thread Michael Miklavcic
I like that, Otto. I'd like to tweak it just a bit to say automated tests. I
also tend to manually/smoke test things, but I'm on the fence if we want to
specify that as well. Just for example, with the new HyperLogLogPlus
Stellar functions I added, there are unit and integration tests, but I also
spun up the REPL to make sure everything works as expected.

“All merged patches will be reviewed with the expectation that automated
tests exist and are consistent with project testing methodology and
practices, and cover the appropriate cases (see reviewers guide)”

On Fri, Dec 16, 2016 at 12:36 PM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> Also, this doesn’t specify the acceptable way to measure that.
>
> The question is how to properly phrase it.
>
> “All merged patches will be reviewed with the expectation that tests exist
> and are consistent with project testing methodology and practices, and
> cover the appropriate cases ( see reviewers guide )"
>
>
> On December 16, 2016 at 13:58:14, Casey Stella (ceste...@gmail.com) wrote:
>
> Yeah I don't like that requirement at all. Sensible unit and integration
> test representation is a decent goal, but I don't like a code coverage req.
> On Fri, Dec 16, 2016 at 13:48 Michael Miklavcic <
> michael.miklav...@gmail.com>
>
> wrote:
>
> > Can someone clarify this point in merge requirements?
> >
> > "All merged patches must have 100% test coverage."
> >
> >
> > On Fri, Dec 16, 2016 at 10:34 AM, James Sirota <jsir...@apache.org>
> wrote:
> >
> > > I incorporated the changes to the coding guidelines from our discuss
> > > thread. I'd like to get them voted on to make them official.
> > >
> > >
> > https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=61332235
> > >
> > > Please vote +1, -1, 0
> > >
> > > The vote will be open for 72 hours.
> > >
> > > ---
> > > Thank you,
> > >
> > > James Sirota
> > > PPMC- Apache Metron (Incubating)
> > > jsirota AT apache DOT org
> > >
> >
>


Re: [VOTE] Coding Guidelines

2016-12-16 Thread Michael Miklavcic
Can someone clarify this point in merge requirements?

"All merged patches must have 100% test coverage."


On Fri, Dec 16, 2016 at 10:34 AM, James Sirota  wrote:

> I incorporated the changes to the coding guidelines from our discuss
> thread.  I'd like to get them voted on to make them official.
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235
>
> Please vote +1, -1, 0
>
> The vote will be open for 72 hours.
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>


Re: Process for closing JIRAs

2016-12-05 Thread Michael Miklavcic
To the first question, mark the Jira as DONE. I don't believe we can set a
fix version until the community agrees on the next version increment, which we
normally won't know until we're getting close to cutting a release and can
make an assessment of it being major or minor.

On Mon, Dec 5, 2016 at 11:48 AM, Kyle Richardson 
wrote:

> What's our current process for closing JIRAs once the github PR has been
> merged? Are we setting it to Done with a particular fix version?
>
> If a JIRA is a duplicate, are we marking it any particular way?
>
> Thanks,
> Kyle
>


Re: [DISCUSS] Metron next version rev

2016-11-15 Thread Michael Miklavcic
The question is if we actually need to back-port at all at this point. I
think the assertion here is that pretty much everyone using Metron right
now is currently getting patches, etc. by upgrading to the latest release.
If/when we find a need to fork release branches we can certainly do it and
have a more involved discussion pertinent to the circumstances at hand.

On Tue, Nov 15, 2016 at 10:17 AM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> Would the back ports also have to go through a full ‘apache release’
> process and be planned out as well?
> I don’t think that should all be worked out as we go.
>
>
> On November 15, 2016 at 12:13:55, Michael Miklavcic (
> michael.miklav...@gmail.com) wrote:
>
> I'm a +1 on David and Nick's suggestions. 1 and 2 now, and let 3 happen
> organically when the community has a need.
>
> On Tue, Nov 15, 2016 at 9:29 AM, David Lyle <dlyle65...@gmail.com> wrote:
>
> > I think that's an excellent understanding and suggestion on #3.
> >
> > Fwiw, the norm I've seen is to allow the requester and the dev to work
> that
> > out.
> >
> > Thanks,
> >
> > -D...
> >
> >
> > On Tue, Nov 15, 2016 at 11:22 AM, Nick Allen <n...@nickallen.org>
> wrote:
> >
> > > I broke down what I am understanding of your suggestion into bullet
> > > points. Please correct me if I am wrong.
> > >
> > > (1) Bump the rev immediately following a release
> > > (2) Update the current version in master to 0.4.0
> > > (3) Maintain and back port bug fixes to a 0.3.x branch
> > >
> > >
> > > I would agree with you on items (1) and (2); +1 on those.
> > >
> > > Item (3) is what drove my questions. I feel this needs a little more
> > > discussion to outline what gets back ported, how it is back ported and
> > when
> > > that might occur. I am not concerned about the technicalities of
> > > maintaining multiple branches, more the process side of things.
> > >
> > > Maybe we could sit on (3) until there is a community member with an
> > > interest in back porting a fix? Right now, I don't know of any, but
> maybe
> > > I've missed a conversation.
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Nov 15, 2016 at 10:42 AM, David Lyle <dlyle65...@gmail.com>
> > wrote:
> > >
> > > > So, the notion is- we're going to have a 0.4.0 release at some
> future
> > > > point. If, during that release cycle, we found critical bug fix type
> > > issues
> > > > that we wanted to release out of cycle, we could patch the 0.3.0
> branch
> > > and
> > > > cut a release from there. You're correct that we'd have to commit
> them
> > to
> > > > both branches. I don't know of a way to avoid that with that type of
> > > stuff,
> > > > so I think the best we can do is to minimize them.
> > > >
> > > > Alternatively, we can decide that the next release will be 0.3.1 and
> > > either
> > > > abandon the notion of semantic versioning [1] (i.e. put features and
> > bug
> > > > fixes in a x.x.1 release) or only release bug fixes.
> > > >
> > > > I don't really have a strong preference excepting that I know for
> sure
> > > that
> > > > master will no longer be 0.3.0 once 0.3.0 is released, so we should
> > bump
> > > > the rev immediately following the 0.3.0 release (if not sooner).
> > > >
> > > > -D...
> > > >
> > > > [1] http://semver.org/
> > > >
> > > >
> > > > On Tue, Nov 15, 2016 at 9:52 AM, Nick Allen <n...@nickallen.org>
> > wrote:
> > > >
> > > > > What kind of PRs would qualify as 0.3.x fixes? How would we decide
> > > that?
> > > > > For those we would then have to commit them against both the 0.3.x
> > > branch
> > > > > and master (0.4.0), right?
> > > > >
> > > > > Off the top of your head, can you think of a few recent PRs that
> > would
> > > > > qualify as patches? I'd just like to get a feel for how many of
> > those
> > > > > might exist.
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Nov 14, 2016 at 6:58 PM, David Lyle <dlyle65...@gmail.com>
>
> > > > wrote:
> > > > >
> > > > > > Hi Mike,
> > > > > >
> > > > > > I'd like to se

Re: [DISCUSS] Metron next version rev

2016-11-15 Thread Michael Miklavcic
I'm a +1 on David and Nick's suggestions. 1 and 2 now, and let 3 happen
organically when the community has a need.

On Tue, Nov 15, 2016 at 9:29 AM, David Lyle <dlyle65...@gmail.com> wrote:

> I think that's an excellent understanding and suggestion on #3.
>
> Fwiw, the norm I've seen is to allow the requester and the dev to work that
> out.
>
> Thanks,
>
> -D...
>
>
> On Tue, Nov 15, 2016 at 11:22 AM, Nick Allen <n...@nickallen.org> wrote:
>
> > I broke down what I am understanding of your suggestion into bullet
> > points.  Please correct me if I am wrong.
> >
> > (1) Bump the rev immediately following a release
> > (2) Update the current version in master to 0.4.0
> > (3) Maintain and back port bug fixes to a 0.3.x branch
> >
> >
> > I would agree with you on items (1) and (2); +1 on those.
> >
> > Item (3) is what drove my questions.  I feel this needs a little more
> > discussion to outline what gets back ported, how it is back ported and
> when
> > that might occur.  I am not concerned about the technicalities of
> > maintaining multiple branches, more the process side of things.
> >
> > Maybe we could sit on (3) until there is a community member with an
> > interest in back porting a fix? Right now, I don't know of any, but maybe
> > I've missed a conversation.
> >
> >
> >
> >
> >
> > On Tue, Nov 15, 2016 at 10:42 AM, David Lyle <dlyle65...@gmail.com>
> wrote:
> >
> > > So, the notion is- we're going to have a 0.4.0 release at some future
> > > point. If, during that release cycle, we found critical bug fix type
> > issues
> > > that we wanted to release out of cycle, we could patch the 0.3.0 branch
> > and
> > > cut a release from there. You're correct that we'd have to commit them
> to
> > > both branches. I don't know of a way to avoid that with that type of
> > stuff,
> > > so I think the best we can do is to minimize them.
> > >
> > > Alternatively, we can decide that the next release will be 0.3.1 and
> > either
> > > abandon the notion of semantic versioning [1] (i.e. put features and
> bug
> > > fixes in a x.x.1 release) or only release bug fixes.
> > >
> > > I don't really have a strong preference excepting that I know for sure
> > that
> > > master will no longer be 0.3.0 once 0.3.0 is released, so we should
> bump
> > > the rev immediately following the 0.3.0 release (if not sooner).
> > >
> > > -D...
> > >
> > > [1] http://semver.org/
> > >
> > >
> > > On Tue, Nov 15, 2016 at 9:52 AM, Nick Allen <n...@nickallen.org>
> wrote:
> > >
> > > > What kind of PRs would qualify as 0.3.x fixes?  How would we decide
> > that?
> > > > For those we would then have to commit them against both the 0.3.x
> > branch
> > > > and master (0.4.0), right?
> > > >
> > > > Off the top of your head, can you think of a few recent PRs that
> would
> > > > qualify as patches?  I'd just like to get a feel for how many of
> those
> > > > might exist.
> > > >
> > > >
> > > >
> > > > On Mon, Nov 14, 2016 at 6:58 PM, David Lyle <dlyle65...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Mike,
> > > > >
> > > > > I'd like to see us increment the version on master ASAP. Once 0.3.0
> > is
> > > > > released, master is no longer the 0.3.0 branch.
> > > > >
> > > > > I recommend that we run 0.3.x patches off the 0.3.0 release branch
> > and
> > > > > rename master to 0.4.0.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > -D...
> > > > >
> > > > >
> > > > > On Mon, Nov 14, 2016 at 4:40 PM, Michael Miklavcic <
> > > > > michael.miklav...@gmail.com> wrote:
> > > > >
> > > > > > 1 - What the next version should be.
> > > > > > 2 - When we should increment the version
> > > > > >
> > > > > > On Mon, Nov 14, 2016 at 2:35 PM, zeo...@gmail.com <
> > zeo...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Sorry, but I don't exactly follow.  Are you looking to discuss
> > what
> > > > the
> > > > > > > version number should be next time around (1.0 vs 0.4 vs

Re: [DISCUSS] Metron next version rev

2016-11-15 Thread Michael Miklavcic
We'd cut a release from master - we'd initially increment to 0.4.0-SNAPSHOT
with the approach David is recommending. And any patches in 0.3.x would
need to also be applied to master.
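
As a rough illustration of that bump (a sketch only; `next_dev_version` is a hypothetical helper, not part of Metron's build or release tooling, and it assumes a plain MAJOR.MINOR.PATCH version string):

```python
def next_dev_version(released: str, bump: str = "minor") -> str:
    """Return the -SNAPSHOT version a branch would carry after `released` ships.

    Hypothetical helper for illustration; assumes a MAJOR.MINOR.PATCH string.
    """
    major, minor, patch = (int(part) for part in released.split("."))
    if bump == "major":
        major, minor, patch = major + 1, 0, 0
    elif bump == "minor":
        minor, patch = minor + 1, 0
    elif bump == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown bump type: {bump}")
    return f"{major}.{minor}.{patch}-SNAPSHOT"


# master after the 0.3.0 release, using the minor bump discussed in this thread
print(next_dev_version("0.3.0"))
# a 0.3.x maintenance branch would take a patch bump instead
print(next_dev_version("0.3.0", bump="patch"))
```

With this scheme a fix destined for 0.3.x lands on the maintenance branch and is then applied to master as well, since master has already moved on to the next minor version.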

On Tue, Nov 15, 2016 at 7:57 AM, Nick Allen <n...@nickallen.org> wrote:

> And where would the next release get cut from; master or the 0.3.x branch?
> Or is that something we decide when we cut a release based on what we want
> to include?
>
>
>
> On Tue, Nov 15, 2016 at 9:52 AM, Nick Allen <n...@nickallen.org> wrote:
>
> > What kind of PRs would qualify as 0.3.x fixes?  How would we decide that?
> > For those we would then have to commit them against both the 0.3.x branch
> > and master (0.4.0), right?
> >
> > Off the top of your head, can you think of a few recent PRs that would
> > qualify as patches?  I'd just like to get a feel for how many of those
> > might exist.
> >
> >
> >
> > On Mon, Nov 14, 2016 at 6:58 PM, David Lyle <dlyle65...@gmail.com>
> wrote:
> >
> >> Hi Mike,
> >>
> >> I'd like to see us increment the version on master ASAP. Once 0.3.0 is
> >> released, master is no longer the 0.3.0 branch.
> >>
> >> I recommend that we run 0.3.x patches off the 0.3.0 release branch and
> >> rename master to 0.4.0.
> >>
> >> Thanks,
> >>
> >> -D...
> >>
> >>
> >> On Mon, Nov 14, 2016 at 4:40 PM, Michael Miklavcic <
> >> michael.miklav...@gmail.com> wrote:
> >>
> >> > 1 - What the next version should be.
> >> > 2 - When we should increment the version
> >> >
> >> > On Mon, Nov 14, 2016 at 2:35 PM, zeo...@gmail.com <zeo...@gmail.com>
> >> > wrote:
> >> >
> >> > > Sorry, but I don't exactly follow.  Are you looking to discuss what
> >> the
> >> > > version number should be next time around (1.0 vs 0.4 vs 0.3.1?) or
> >> what
> >> > > tasks need to be accomplished before the next version of metron is
> >> > > considered ready?  Thanks,
> >> > >
> >> > > Jon
> >> > >
> >> > > On Mon, Nov 14, 2016, 16:29 Michael Miklavcic <
> >> > michael.miklav...@gmail.com
> >> > > >
> >> > > wrote:
> >> > >
> >> > > > This is a thread to discuss what the next version of Metron should
> >> be
> >> > > > after Apache
> >> > > > Metron 0.3.0-RC1 incubating is released, e.g. 0.3.1?
> >> > > >
> >> > > > I'd like to up the rev asap, however one thing to consider is that
> >> we
> >> > > might
> >> > > > change the version again at a later point in time prior to the
> next
> >> > > > release. This current release candidate being a case in point. On
> >> the
> >> > > other
> >> > > > hand, I am also currently working on simplifying the version
> change
> >> > > process
> >> > > > so it might not be a big deal either way.
> >> > > >
> >> > > > Thanks,
> >> > > > Mike Miklavcic
> >> > > >
> >> > > --
> >> > >
> >> > > Jon
> >> > >
> >> > > Sent from my mobile device
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> > Nick Allen <n...@nickallen.org>
> >
>
>
>
> --
> Nick Allen <n...@nickallen.org>
>


Re: Help with custom enrichment / parser

2016-11-04 Thread Michael Miklavcic
Can you check for any exceptions in the enrichment logs using the following
grep?
grep --color=auto -C 3 -R -iE "exception" /var/log/storm

It would also be good to know where the data is getting hung up. Can you
check if you're getting tuples transferring and acking through the indexing
Kafka spout?

On Thu, Nov 3, 2016 at 3:41 PM, Tyler Moore <tmo...@goflyball.com> wrote:

> The sample logs I'm sending contain about 40,000 records, so I don't think
> that is the issue.
>
> My batch size is 5, and this is what it looks like when I dump it from
> ZooKeeper:
> ENRICHMENT Config: bro
> {
>   "index" : "bro",
>   "batchSize" : 5,
>   "enrichment" : {
>     "fieldMap" : {
>       "geo" : [ "ip_dst_addr", "ip_src_addr" ],
>       "host" : [ "ip_src_addr", "ip_dst_addr" ],
>       "hbaseEnrichment" : [ "ip_src_addr", "ip_dst_addr" ]
>     },
>     "fieldToTypeMap" : {
>       "ip_dst_addr" : [ "hostname", "asset" ],
>       "ip_src_addr" : [ "hostname", "asset" ]
>     },
>     "config" : { }
>   },
>   "threatIntel" : {
>     "fieldMap" : {
>       "hbaseThreatIntel" : [ "ip_src_addr", "ip_dst_addr" ]
>     },
>     "fieldToTypeMap" : {
>       "ip_src_addr" : [ "malicious_ip" ],
>       "ip_dst_addr" : [ "malicious_ip" ]
>     },
>     "config" : { },
>     "triageConfig" : {
>       "riskLevelRules" : { },
>       "aggregator" : "MAX",
>       "aggregationConfig" : { }
>     }
>   },
>   "configuration" : { }
> }
>
> I loaded an extractor config file with it so I'm wondering if that should
> have populated the config fields here or maybe I need to add mappings to
> the column families in there?
>
> Regards,
>
> Tyler
>
> Regards,
>
> Tyler Moore
> Software Engineer
> Flyball Labs
>
> On Thu, Nov 3, 2016 at 3:55 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Not sure about the python-kafka lib issues. Regarding enrichment data
> > getting written to ES, how many records have you processed and what is
> your
> > batch size? You might need to write more records or adjust this for the
> > values to propagate through. See the "Sensor Enrichment Configuration"
> > section -
> > https://github.com/apache/incubator-metron/tree/master/
> > metron-platform/metron-enrichment
> >
> >
> > On Thu, Nov 3, 2016 at 1:03 PM, Tyler Moore <tmo...@goflyball.com>
> wrote:
> >
> > > Mike,
> > >
> > > I am using quick-dev vagrant deployment and at the moment testing
> locally
> > > but we plan on having data from remote locations streaming in to be
> > parsed.
> > > I was able to get the parsers running, thanks to casey, looks like i
> > missed
> > > an update to the Hbase enrichment writer naming convention.
> > > Still working on the enrichment configs though, they aren't throwing
> any
> > > errors and storm says they are emitting data, but not being written to
> > > elastic.
> > > As well with the python-kafka library, can't figure out why the json
> > > serializer isn't working, as long as I have a parser implemented I
> could
> > > forego serializing the data
> > > prior to sending to a kafka topic correct??
> > >
> > > Thanks for all your help thus far!
> > >
> > > Regards,
> > >
> > > Tyler
> > >
> > > Regards,
> > >
> > > Tyler Moore
> > > Software Engineer
> > > Flyball Labs
> > >
> > > On Thu, Nov 3, 2016 at 2:42 PM, Michael Miklavcic <
> > > michael.miklav...@gmail.com> wrote:
> > >
> > > > Tyler,
> > > >
> > > > Thanks for the interest in Metron and welcome to the community! :)
> > > >
> > > > Just curious, what type of environment are you running in? Full
> cluster
> > > or
> > > > are you using the full-dev or quick-dev vagrant deployment vagrant
> > > scripts?
> > > >
> > > > Best,
> > > > Mike Miklavcic
> > > >
> > > >
> > > > On Thu, Nov 3, 2016 at 10:34 AM, Tyler Moore <tmo...@goflyball.com>
> > > wrote:
> > > >
> > > > > Haven't heard of the acrony

Re: [DISCUSS] Intentional processing delays

2016-11-04 Thread Michael Miklavcic
So, you want to queue up the data awaiting a key match on the enrichment
data, up to a max timeout and/or buffer size? Seems like this should belong
at the spout level to avoid buffer overflows, depending on how big the data
sets are and how far apart the matching records/elements are spaced in time.
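
To make the idea concrete, here is a minimal sketch of that kind of bounded wait buffer. It is not Metron or Storm code: `PendingBuffer` and its methods are hypothetical names, and it assumes at most one pending record per key and a `max_size` of at least 1.

```python
import time
from collections import OrderedDict


class PendingBuffer:
    """Holds records until their enrichment key arrives, a timeout passes,
    or the buffer fills up. Hypothetical illustration only."""

    def __init__(self, max_size, timeout_s, clock=time.monotonic):
        self.max_size = max_size
        self.timeout_s = timeout_s
        self.clock = clock              # injectable so the timeout can be tested
        self.pending = OrderedDict()    # key -> (arrival time, record), oldest first

    def add(self, key, record):
        """Queue a record; return any records evicted because the buffer was full."""
        evicted = []
        while len(self.pending) >= self.max_size:
            _, (_, old_record) = self.pending.popitem(last=False)
            evicted.append(old_record)
        self.pending[key] = (self.clock(), record)
        return evicted

    def match(self, key):
        """Enrichment data for `key` arrived; release the waiting record, if any."""
        entry = self.pending.pop(key, None)
        return entry[1] if entry else None

    def expire(self):
        """Release records that waited at least `timeout_s`, without enrichment."""
        now = self.clock()
        expired = [k for k, (t, _) in self.pending.items() if now - t >= self.timeout_s]
        return [self.pending.pop(k)[1] for k in expired]


buf = PendingBuffer(max_size=1000, timeout_s=30)
buf.add("10.0.0.1", {"ip_src_addr": "10.0.0.1"})
print(buf.match("10.0.0.1"))
```

The eviction and timeout paths are where the buffer overflow concern shows up: the further apart the matching records are in time, the larger `max_size` and `timeout_s` have to be.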

On Fri, Nov 4, 2016 at 7:28 AM, zeo...@gmail.com  wrote:

> Is there a good method (i.e. something using Stellar/ZK) to implement an
> intentional processing delay to all tuples in a specific topology?  I plan
> to do some custom enrichments, but the data used to do the enrichment
> *may* be
> ingested at roughly the same time the data to be enriched is (it also may
> not ever be sent).  So I'd like to add a delay in my cluster that applies
> to certain parser topologies.
>
> I took a look around in the documentation and in JIRA and didn't find
> anything available or being worked on, but I did see that this may conflict
> with METRON-322.  Essentially what I'm considering is a {sleep,delay,wait}
> stellar function, but it could also be a delay in a parser's kafka spout
> (much less of a fan of the second option).
>
> I'm looking for feedback on the best way to approach this, and I'd be happy
> to do the work myself (if necessary) when it gets to that point.  I did
> consider implementing this delay upstream (in the sensor itself), but after
> looking in more detail it doesn't seem as feasible.
>
> Jon
> --
>
> Jon
>


Re: Help with custom enrichment / parser

2016-11-03 Thread Michael Miklavcic
Not sure about the python-kafka lib issues. Regarding enrichment data
getting written to ES, how many records have you processed and what is your
batch size? You might need to write more records or adjust this for the
values to propagate through. See the "Sensor Enrichment Configuration"
section -
https://github.com/apache/incubator-metron/tree/master/metron-platform/metron-enrichment
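
To illustrate why a small sample can appear to write nothing, here is a generic sketch of batch-style writing. `BatchingWriter` is a hypothetical class for illustration, not Metron's actual writer, and it assumes records are only flushed once the batch is full:

```python
class BatchingWriter:
    """Buffers records and only hands them to `sink` once `batch_size` is reached.

    Hypothetical illustration of batching, not Metron's implementation.
    """

    def __init__(self, batch_size, sink):
        self.batch_size = batch_size
        self.sink = sink        # callable that receives a list of records
        self.pending = []

    def write(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.sink(self.pending)
            self.pending = []


flushed = []
writer = BatchingWriter(batch_size=5, sink=flushed.append)

for i in range(4):
    writer.write({"id": i})
print(len(flushed))    # batches written so far; 4 records are still pending

writer.write({"id": 4})
print(len(flushed))    # the fifth record fills the batch and triggers a write
```

If the number of records sent is smaller than the batch size, nothing reaches the index in this model, which is why sending more records or lowering the batch size is worth trying first.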


On Thu, Nov 3, 2016 at 1:03 PM, Tyler Moore <tmo...@goflyball.com> wrote:

> Mike,
>
> I am using the quick-dev Vagrant deployment and testing locally at the
> moment, but we plan on having data from remote locations streaming in to be
> parsed. I was able to get the parsers running, thanks to Casey; it looks
> like I missed an update to the HBase enrichment writer naming convention.
> I'm still working on the enrichment configs, though. They aren't throwing
> any errors, and Storm says they are emitting data, but nothing is being
> written to Elasticsearch.
> As for the python-kafka library, I can't figure out why the JSON serializer
> isn't working. As long as I have a parser implemented, I could forgo
> serializing the data prior to sending it to a Kafka topic, correct?
>
> Thanks for all your help thus far!
>
> Regards,
>
> Tyler
>
> Regards,
>
> Tyler Moore
> Software Engineer
> Flyball Labs
>
> On Thu, Nov 3, 2016 at 2:42 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Tyler,
> >
> > Thanks for the interest in Metron and welcome to the community! :)
> >
> > Just curious, what type of environment are you running in? Full cluster
> or
> > are you using the full-dev or quick-dev vagrant deployment vagrant
> scripts?
> >
> > Best,
> > Mike Miklavcic
> >
> >
> > On Thu, Nov 3, 2016 at 10:34 AM, Tyler Moore <tmo...@goflyball.com>
> wrote:
> >
> > > Haven't heard of the acronym before, i'm kinda new to the dev game :D
> > >
> > > Do you have any idea why my the enriched data isn't being written to
> > > elasticsearch?
> > >
> > > Regards,
> > >
> > > Tyler Moore
> > > Software Engineer
> > > Flyball Labs
> > >
> > > On Thu, Nov 3, 2016 at 12:15 PM, Casey Stella <ceste...@gmail.com>
> > wrote:
> > >
> > > > Thanks for finding that; I fixed it in the wiki.  Isn't OSS awesome?
> ;)
> > > >
> > > > On Thu, Nov 3, 2016 at 12:11 PM, Tyler Moore <tmo...@goflyball.com>
> > > wrote:
> > > >
> > > > > No problem,
> > > > >
> > > > > I was following the Metron application tutorials in the Metron
> wiki:
> > > > > https://cwiki.apache.org/confluence/display/METRON/
> > > > > 2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+
> > > Streaming+Enrichment
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Tyler Moore
> > > > > Software Engineer
> > > > > Flyball Labs
> > > > >
> > > > > On Thu, Nov 3, 2016 at 11:59 AM, Casey Stella <ceste...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Ah, so quick feedback here, that class path has changed from
> > > > > > org.apache.metron.writer.hbase.SimpleHbaseEnrichmentWriter to
> > > > > > org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter
> > > > > >
> > > > > > There is probably some outdated documentation somewhere, would
> you
> > > mind
> > > > > > pointing out where you got that one?
> > > > > >
> > > > > > Casey
> > > > > >
> > > > > > On Thu, Nov 3, 2016 at 11:56 AM, Tyler Moore <
> tmo...@goflyball.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Casey,
> > > > > > >
> > > > > > > Thanks for the quick reply, love your work by the way!
> > > > > > >
> > > > > > > When I try to upload the parser I am getting a stack trace like
> > > this:
> > > > > > > 15:43:33.182 [main-EventThread] INFO  o.a.c.f.s.
> > > > ConnectionStateManager
> > > > > -
> > > > > > > State change: CONNECTED
> > > > > > > java.lang.IllegalStateException: Unable to instantiate
> > connector:
> > > > > class
> > > > >

Re: Help with custom enrichment / parser

2016-11-03 Thread Michael Miklavcic
Tyler,

Thanks for the interest in Metron and welcome to the community! :)

Just curious, what type of environment are you running in? Full cluster, or
are you using the full-dev or quick-dev Vagrant deployment scripts?

Best,
Mike Miklavcic


On Thu, Nov 3, 2016 at 10:34 AM, Tyler Moore  wrote:

> Haven't heard of the acronym before, i'm kinda new to the dev game :D
>
> Do you have any idea why the enriched data isn't being written to
> Elasticsearch?
>
> Regards,
>
> Tyler Moore
> Software Engineer
> Flyball Labs
>
> On Thu, Nov 3, 2016 at 12:15 PM, Casey Stella  wrote:
>
> > Thanks for finding that; I fixed it in the wiki.  Isn't OSS awesome? ;)
> >
> > On Thu, Nov 3, 2016 at 12:11 PM, Tyler Moore 
> wrote:
> >
> > > No problem,
> > >
> > > I was following the Metron application tutorials in the Metron wiki:
> > > https://cwiki.apache.org/confluence/display/METRON/
> > > 2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+
> Streaming+Enrichment
> > >
> > >
> > >
> > > Regards,
> > >
> > > Tyler Moore
> > > Software Engineer
> > > Flyball Labs
> > >
> > > On Thu, Nov 3, 2016 at 11:59 AM, Casey Stella 
> > wrote:
> > >
> > > > Ah, so quick feedback here, that class path has changed from
> > > > org.apache.metron.writer.hbase.SimpleHbaseEnrichmentWriter to
> > > > org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter
> > > >
> > > > There is probably some outdated documentation somewhere, would you
> mind
> > > > pointing out where you got that one?
> > > >
> > > > Casey
> > > >
> > > > On Thu, Nov 3, 2016 at 11:56 AM, Tyler Moore 
> > > wrote:
> > > >
> > > > > Casey,
> > > > >
> > > > > Thanks for the quick reply, love your work by the way!
> > > > >
> > > > > When I try to upload the parser I am getting a stack trace like
> this:
> > > > > 15:43:33.182 [main-EventThread] INFO  o.a.c.f.s.
> > ConnectionStateManager
> > > -
> > > > > State change: CONNECTED
> > > > > java.lang.IllegalStateException: Unable to instantiate connector:
> > > class
> > > > > not
> > > > > found
> > > > > at
> > > > > org.apache.metron.common.utils.ReflectionUtils.createInstance(
> > > > > ReflectionUtils.java:56)
> > > > > at
> > > > > org.apache.metron.parsers.topology.ParserTopologyBuilder.
> > > > createParserBolt(
> > > > > ParserTopologyBuilder.java:155)
> > > > > at
> > > > > org.apache.metron.parsers.topology.ParserTopologyBuilder.build(
> > > > > ParserTopologyBuilder.java:94)
> > > > > at
> > > > > org.apache.metron.parsers.topology.ParserTopologyCLI.
> > > > > main(ParserTopologyCLI.java:298)
> > > > > Caused by: java.lang.ClassNotFoundException:
> > > > > org.apache.metron.writer.hbase.SimpleHbaseEnrichmentWriter
> > > > > at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> > > > > at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> > > > > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> > > > > at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> > > > > at java.lang.Class.forName0(Native Method)
> > > > > at java.lang.Class.forName(Class.java:264)
> > > > > at
> > > > > org.apache.metron.common.utils.ReflectionUtils.createInstance(
> > > > > ReflectionUtils.java:53)
> > > > > ... 3 more
> > > > >
> > > > > > The Storm supervisor log says that some of the processes aren't
> > > > > > starting:
> > > > > > 2016-11-03 15:32:25.730 b.s.d.supervisor [INFO]
> > > > > > 9b0734b2-5e5f-4109-aabd-cf343f54e3a4 still hasn't started
> > > > > > It is also throwing TimeoutExceptions, which I believe is due to
> > > > > > the parser.
> > > > > >
> > > > > > Without the parser, though (when troubleshooting the enrichment
> > > > > > config from #1), I don't receive any errors from Storm, and the
> > > > > > enrichment bolts seem to be splitting the data, but the writer
> > > > > > bolt emits 0 every time.
> > > > > > We are able to use the built-in hostname enrichment, but the custom
> > > > > > one I built (which will eventually be converted into an asset
> > > > > > discovery enrichment) doesn't seem to be writing to Elasticsearch.
> > > > > > Do I need to set up a new index template to receive the data from
> > > > > > the new enrichment config? Or should I be looking at creating a new
> > > > > > spout / bolt to transfer the data?
> > > > >
> > > > > Regards,
> > > > >
> > > > > Tyler
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Tyler Moore
> > > > > Software Engineer
> > > > > Flyball Labs
> > > > >
> > > > > On Thu, Nov 3, 2016 at 9:26 AM, Casey Stella 
> > > wrote:
> > > > >
> > > > > > First off Tyler, thanks for using Metron.
> > > > > >
> > > > > > Do you have any errors or stack traces that are being thrown
> > (keeping
> > > > in
> > > > > > mind that in storm, they may be in the storm logs (/var/log/storm
> > on
> > > > the
> > > > > > supervisor nodes)?
> > > > > >
> > > > > > On Wed, Nov 2, 2016 at 10:47 PM, Tyler Moore <
> tmo...@goflyball.com
> 

Re: [DISCUSS] Next Release Name

2016-11-02 Thread Michael Miklavcic
Hi Jon, I have commented on 370 -
https://issues.apache.org/jira/browse/METRON-370

Best,
Mike

On Wed, Nov 2, 2016 at 3:11 PM, zeo...@gmail.com  wrote:

> I personally would like to see the following things done before things
> leave BETA:
> (1) Address data integrity concerns (Specifically thinking of METRON-370,
> METRON-517)
> (2) Make cluster tuning easier and more consistent (METRON-485, METRON-470,
> and the "[DISCUSS] moving parsers back to flux" which I can't find a JIRA
> for).
>
> I would also want to see the upgrade path (as opposed to rebuild) be more
> thoroughly and regularly tested once things leave BETA.  From my
> perspective I think the project is very close but not yet ready.
>
> Jon
>
> On Wed, Nov 2, 2016 at 4:44 PM Casey Stella  wrote:
>
> Hello Everyone,
>
> Now that the discussion around the next release has started, it has been
> proposed and I think it's a good time to discuss what to name this next
> release.  Before, we have adopted the BETA suffix.  I think it might be
> time to drop it and call the next release 0.2.2
>
> Thoughts?
>
> Best,
>
> Casey
>
> --
>
> Jon
>


Re: [DISCUSS] Improving quick-dev

2016-10-13 Thread Michael Miklavcic
I think this may have come up in another PR already (have to look for it).
But maybe we could maintain our flexibility in quick-dev by installing the
sensors and not starting them until we need them. I think it's useful to
have a quick "genuine" e2e testing environment that doesn't require running
through a full install. I'm also not opposed to extracting the integration
test functionality into general purpose data generators.
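
A rough sketch of what such a generator might look like, assuming the sample
data is just a list of message strings. SampleDataReplayer and the sink
parameter are made-up names for illustration, not anything in Metron today;
a real version would presumably hand each line to a Kafka producer rather
than an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: replays sample telemetry lines in a loop to a pluggable sink. */
public class SampleDataReplayer {

    /**
     * Emits every sample line to the sink, repeating the whole list `passes` times.
     * Returns the number of messages emitted.
     */
    public static int replay(List<String> samples, int passes, Consumer<String> sink) {
        int emitted = 0;
        for (int pass = 0; pass < passes; pass++) {
            for (String line : samples) {
                // Assumption: a real generator would send this line to a Kafka topic.
                sink.accept(line);
                emitted++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Placeholder sample records; real input would come from the sample data files.
        List<String> samples = Arrays.asList("sample-record-1", "sample-record-2");
        List<String> captured = new ArrayList<>();
        int count = replay(samples, 2, captured::add);
        System.out.println(count + " messages: " + captured);
    }
}
```

Keeping the sink as a plain Consumer means the same loop could feed a test
harness or a replay of a troublesome edge case without touching the sensors.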

On Thu, Oct 13, 2016 at 8:31 PM, Nick Allen  wrote:

> To Jon's point, I think it would be useful to have a Demo box that uses
> generators to produce 3 or 4 types of telemetry that shows up in the Metron
> Dashboard.  This box would be different from Quick-Dev in that everything
> starts automatically, so that a user just has to launch it and they should
> start seeing data in the Metron Dashboard right away.  In fact, we could
> even pre-load the Elasticsearch indices so that the user has more of a
> history to mine when using the Demo box.
>
> On Thu, Oct 13, 2016 at 2:04 PM, zeo...@gmail.com 
> wrote:
>
> > +1 Ryan and Otto's comments.
> >
> > I also strongly think we need to make a demo environment easier, but that
> > should be different than quick-dev.
> >
> > Jon
> >
> > On Thu, Oct 13, 2016 at 1:15 PM Otto Fowler 
> > wrote:
> >
> > > - create scripts/utilities to easily run a topology locally in an IDE
> > > instead of in the VM
> > >
> > >
> > >  THIS.
> > >
> > >
> > > On October 13, 2016 at 12:36:45, Ryan Merriman (merrim...@gmail.com)
> > > wrote:
> > >
> > > Working with the quick-dev vagrant VM recently left a lot to be
> desired.
> > > All forthcoming comments are made under the assumption that this VM is
> > > intended for development purposes. If that is not true, I think we
> should
> > > consider adding a VM for this purpose (or Docker containers?). Here are
> > > the issues I ran into that I think can be improved:
> > >
> > > - had to upgrade VirtualBox from 5.0.16 to 5.0.20
> > > - had to update to the latest metron/hdp-base Vagrant box
> > > - takes forever to spin up
> > > - VM is constrained for resources making it unstable
> > > - spent a large amount of time troubleshooting sensors (no raw messages
> > > in Kafka)
> > > - no easy way to debug topologies
> > >
> > > Fortunately I think we can make this a much better experience without a
> > > major effort. Here are my ideas to do this:
> > >
> > > - update the prereqs for VirtualBox
> > > - add a check for the appropriate base box version (Jira has already
> > > been created https://issues.apache.org/jira/browse/METRON-497)
> > > - don't install any sensors and replace them with a data generator that
> > > just loops through sample data and emits to Kafka (could also be used
> to
> > > replay and troubleshoot edge cases)
> > > - everything in monit is off by default except for ES or other critical
> > > services
> > > - create scripts/utilities to easily run a topology locally in an IDE
> > > instead of in the VM
> > > - improved documentation with examples of how to run and troubleshoot
> > > topologies
> > >
> > > Is this a worthwhile effort? I think this would also give users an
> easier
> > > path to demonstrate or tour Metron's capabilities. Are there any other
> > > improvements people would like to see? Should we wait for Docker?
> > > Thoughts?
> > >
> > > Ryan Merriman
> > >
> > --
> >
> > Jon
> >
>
>
>
> --
> Nick Allen 
>


Re: [DISCUSS] Dockerize Metron

2016-10-01 Thread Michael Miklavcic
I really like this

On Oct 1, 2016 5:03 PM, "Ryan Merriman" <merrim...@gmail.com> wrote:

> I don't think this will be an issue.  The remote debugging strategy should
> work if you want to go that route but I would just run the topology in
> local mode (that's how the integration tests work now).  The Storm topology
> continues to run locally in your IDE while all the other components that
> were in-memory components (HBase, Kafka, etc) now run in Docker
> containers.  We can continue to leverage
> https://github.com/apache/incubator-metron/blob/master/metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/integration/components/ParserTopologyComponent.java
> for parser topologies and the --local option for flux-based topologies.
>
> On Sat, Oct 1, 2016 at 12:28 PM, Nick Allen <n...@nickallen.org> wrote:
>
> > It's definitely possible to debug in a Docker environment using a remote
> > debugger.  I remember reading something along these lines a few weeks
> ago.
> >
> > https://blog.docker.com/2016/09/java-development-using-docker/
> >
> > On Sat, Oct 1, 2016 at 1:15 PM, Casey Stella <ceste...@gmail.com> wrote:
> >
> > > I agree with that concern. Being able to debug a running topology has
> > > helped in so many circumstances. Not sure how to accomplish this in a
> > > dockerized environment.
> > >
> > > On Sat, Oct 1, 2016 at 13:11 Michael Miklavcic <
> > > michael.miklav...@gmail.com>
> > > wrote:
> > >
> > > > Any ideas on how debugging works when leveraging Docker? In spite of
> > the
> > > > classpath troubles, one of the benefits of the current single-JVM
> > > approach
> > > > is that you can easily debug, set break points, etc within an IDE. Is
> > > that
> > > > still doable? Seems like there might need to be some remote debugging
> > > magic
> > > > done to accomplish this.
> > > >
> > > > On Fri, Sep 30, 2016 at 6:39 PM, James Sirota <jsir...@apache.org>
> > > wrote:
> > > >
> > > > > I agree with making an effort to create containers.  I suggest
> doing
> > it
> > > > on
> > > > > a feature branch until we are feature complete and are able to
> > migrate
> > > > our
> > > > > integration tests into a dockerized environment.
> > > > >
> > > > > 30.09.2016, 14:52, "Nick Allen" <n...@nickallen.org>:
> > > > > >  Relieve dependency version conflict issues? Umm, yes please.
> I'll
> > > > take a
> > > > > >  second helping of that too.
> > > > > >
> > > > > >  On Fri, Sep 30, 2016 at 5:00 PM, Ryan Merriman <
> > merrim...@gmail.com
> > > >
> > > > > wrote:
> > > > > >
> > > > > >>   I would like to open up a discussion around creating Docker
> > images
> > > > for
> > > > > >>   Metron. Having this available would provide a leaner
> alternative
> > > to
> > > > > the
> > > > > > >>   ansible/vagrant environment for development testing (and even
> > > demoing
> > > > > or
> > > > > >>   exploring features). It could also relieve some of the
> > dependency
> > > > > version
> > > > > >>   conflict issues that we've been experiencing when running
> > > > integration
> > > > > tests
> > > > > >>   in a single JVM.
> > > > > >>
> > > > > >>   I would suggest the initial version be intended only for
> > > development
> > > > > and
> > > > > >>   testing purposes. The general approach could be to create an
> > image
> > > > for
> > > > > >>   each service we depend on and use something like Docker
> compose
> > to
> > > > > package
> > > > > >>   them together. A Dockerfile would either install the service
> > from
> > > > > scratch
> > > > > >>   or extend a community image then add any Metron related
> > > dependencies
> > > > > or
> > > > > >>   configurations on top. The metron-deployment project code
> could
> > be
> > > > > used as
> > > > > >>   a guide.
> > > > > >>
> > > > > >>   I would like to see th

Re: [DISCUSS] Dockerize Metron

2016-10-01 Thread Michael Miklavcic
Any ideas on how debugging works when leveraging Docker? In spite of the
classpath troubles, one of the benefits of the current single-JVM approach
is that you can easily debug, set break points, etc within an IDE. Is that
still doable? Seems like there might need to be some remote debugging magic
done to accomplish this.
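
For what it's worth, the "remote debugging magic" usually comes down to
starting the JVM with the standard JDWP agent option and attaching the IDE
to the published port. Below is a small sketch, assuming port 5005 and a
hypothetical helper class RemoteDebugCheck; how the option actually reaches
the worker JVM (for example through Storm's topology.worker.childopts) and
how the container port gets published are left as assumptions. On Java 9
and later the address may need to be written as *:5005 to accept
connections from outside the container.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: builds and detects the JDWP remote-debug JVM option. */
public class RemoteDebugCheck {

    /** Builds the JDWP agent option for a JVM that should listen for a debugger on `port`. */
    public static String jdwpOption(int port, boolean suspend) {
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend="
                + (suspend ? "y" : "n") + ",address=" + port;
    }

    /** Returns true if any of the given JVM arguments enables the JDWP agent. */
    public static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            // -Xrunjdwp is the older spelling of the same agent.
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Inspect the arguments this JVM was actually started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Debug agent enabled: " + hasJdwpAgent(jvmArgs));
        System.out.println("Option to add to the containerized JVM: " + jdwpOption(5005, false));
    }
}
```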

On Fri, Sep 30, 2016 at 6:39 PM, James Sirota  wrote:

> I agree with making an effort to create containers.  I suggest doing it on
> a feature branch until we are feature complete and are able to migrate our
> integration tests into a dockerized environment.
>
> 30.09.2016, 14:52, "Nick Allen" :
> >  Relieve dependency version conflict issues? Umm, yes please. I'll take a
> >  second helping of that too.
> >
> >  On Fri, Sep 30, 2016 at 5:00 PM, Ryan Merriman 
> wrote:
> >
> >>   I would like to open up a discussion around creating Docker images for
> >>   Metron. Having this available would provide a leaner alternative to
> the
> >>   ansible/vagrant environment for development testing (and even demoing
> or
> >>   exploring features). It could also relieve some of the dependency
> version
> >>   conflict issues that we've been experiencing when running integration
> tests
> >>   in a single JVM.
> >>
> >>   I would suggest the initial version be intended only for development
> and
> >>   testing purposes. The general approach could be to create an image for
> >>   each service we depend on and use something like Docker compose to
> package
> >>   them together. A Dockerfile would either install the service from
> scratch
> >>   or extend a community image then add any Metron related dependencies
> or
> >>   configurations on top. The metron-deployment project code could be
> used as
> >>   a guide.
> >>
> >>   I would like to see these images added initially to support
> development and
> >>   testing:
> >>
> >>  - Kafka with topics preconfigured
> >>  - Storm with metron topology assets installed
> >>  - Zookeeper with paths created and sample configs loaded
> >>  - HBase with sample enrichments and threat intel loaded
> >>  - Elasticsearch configured for Metron
> >>  - MySQL with databases/tables/users created and geo data loaded
> >>
> >>   Other images that could also be useful:
> >>
> >>  - Images for each sensor
> >>  - Ambari?
> >>  - Solr
> >>
> >>   Looking forward to hearing what everyone thinks.
> >>
> >>   Ryan Merriman
> >
> >  --
> >  Nick Allen 
>
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>


Re: new PPMC Member: Nick Allen

2016-09-30 Thread Michael Miklavcic
Congrats Nick!

On Fri, Sep 30, 2016 at 1:43 PM, James Sirota  wrote:

> The Podling Project Management Committee (PPMC) for Apache Metron
> (Incubating)
> has asked Nick Allen to become a PPMC member and we are pleased
> to announce that they have accepted.
>
> Being a committer enables easier contribution to the
> project since there is no need to go via the patch
> submission process. This should enable better productivity.
> Being a PPMC member enables assistance with the management
> and to guide the direction of the project.
>
> Nick,
>
> Please update the project status page
> (http://incubator.apache.org/projects/metron.html)
> and the Metron community page
> (http://metron.incubator.apache.org/community/) to verify
> your commit access has been granted.
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>


Re: new PPMC Member: David Lyle

2016-09-30 Thread Michael Miklavcic
Congrats David!

On Fri, Sep 30, 2016 at 1:42 PM, James Sirota  wrote:

> The Podling Project Management Committee (PPMC) for Apache Metron
> (Incubating)
> has asked David Lyle to become a PPMC member and we are pleased
> to announce that they have accepted.
>
> Being a committer enables easier contribution to the
> project since there is no need to go via the patch
> submission process. This should enable better productivity.
> Being a PPMC member enables assistance with the management
> and to guide the direction of the project.
>
> David,
>
> Please update the project status page
> (http://incubator.apache.org/projects/metron.html)
> and the Metron community page
> (http://metron.incubator.apache.org/community/) to verify
> your commit access has been granted.
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>


Re: [CANCELLED][VOTE] Releasing Apache Metron 0.2.1BETA-RC1

2016-09-29 Thread Michael Miklavcic
https://issues.apache.org/jira/browse/METRON-476 is committed to master

On Thu, Sep 29, 2016 at 6:09 PM, James Sirota <jsir...@apache.org> wrote:

> This vote is cancelled due to 2 binding -1s.  We will cut RC2 and re-post
> a thread for a re-vote
>
> 29.09.2016, 13:58, "Casey Stella" <ceste...@gmail.com>:
> > Ok, looks like PR is out. For the record, I'm -1 (binding).
> >
> > On Thu, Sep 29, 2016 at 4:32 PM, David Lyle <dlyle65...@gmail.com>
> wrote:
> >
> >>  Hi Mike,
> >>
> >>  Yeah, that's all it is. We just crossed is all.
> >>
> >>  -D...
> >>
> >>  On Thu, Sep 29, 2016 at 4:29 PM, Michael Miklavcic <
> >>  michael.miklav...@gmail.com> wrote:
> >>
> >>  > Sweet! Thanks David.
> >>  >
> >>  > On Thu, Sep 29, 2016 at 4:18 PM, David Lyle <dlyle65...@gmail.com>
> >>  wrote:
> >>  >
> >>  > > Sounds good to me. I can push up a pr straightaway. Testing the fix
> >>  now.
> >>  > >
> >>  > > -D...
> >>  > >
> >>  > >
> >>  > > On Thu, Sep 29, 2016 at 4:17 PM, Casey Stella <ceste...@gmail.com>
> >>  > wrote:
> >>  > >
> >>  > > > Let's cancel the vote, someone put up a PR to correct and cut a
> rc2.
> >>  > How
> >>  > > > about that?
> >>  > > >
> >>  > > > On Thu, Sep 29, 2016 at 4:16 PM, David Lyle <
> dlyle65...@gmail.com>
> >>  > > wrote:
> >>  > > >
> >>  > > > > Okay, so it looks like METRON-466 and METRON-389 crossed paths
> and
> >>  > the
> >>  > > > > metron_version in the quick dev is still 0.2.0BETA.
> >>  > > > >
> >>  > > > > Easy to fix, how do you want to proceed?
> >>  > > > >
> >>  > > > > -D...
> >>  > > > >
> >>  > > > >
> >>  > > > > On Thu, Sep 29, 2016 at 4:13 PM, David Lyle <
> dlyle65...@gmail.com>
> >>  > > > wrote:
> >>  > > > >
> >>  > > > > > Correction, the solr bundle didn't deploy - let me retract
> that
> >>  +1
> >>  > > for
> >>  > > > a
> >>  > > > > > sec.
> >>  > > > > >
> >>  > > > > > On Thu, Sep 29, 2016 at 4:09 PM, David Lyle <
> >>  dlyle65...@gmail.com>
> >>  > > > > wrote:
> >>  > > > > >
> >>  > > > > >> +1 (binding)
> >>  > > > > >>
> >>  > > > > >> checksums/gpg - checked
> >>  > > > > >> Rat Check - passed
> >>  > > > > >> Integration Tests - passed
> >>  > > > > >> Package Builds - success
> >>  > > > > >> Quick Dev - worked as expected
> >>  > > > > >>
> >>  > > > > >> On Thu, Sep 29, 2016 at 2:32 PM, James Sirota <
> >>  jsir...@apache.org
> >>  > >
> >>  > > > > wrote:
> >>  > > > > >>
> >>  > > > > >>> +1, binding
> >>  > > > > >>>
> >>  > > > > >>> 29.09.2016, 11:31, "James Sirota" <jsir...@apache.org>:
> >>  > > > > >>> > This is a call to vote on releasing Apache Metron
> >>  0.2.1BETA-RC1
> >>  > > > > >>> incubating
> >>  > > > > >>> >
> >>  > > > > >>> > Full list of changes in this release:
> >>  > > > > >>> >
> >>  > > > > >>> > https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.1BETA-RC1-incubating/CHANGES
> >>  > > > > >>> >
> >>  > > > > >>> > The tag/commit to be voted upon is Metron_0.2.1BETA_rc1:
> >>  > > > > >>> >
> >>  > > > > >>> > https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=commit;h=823cd2a83063bf232d25d3f58c08ab1fa8e06319
> >>  > > > > >>> >
> >>  > > > > >>> > The s

Re: [VOTE] Releasing Apache Metron 0.2.1BETA-RC1

2016-09-29 Thread Michael Miklavcic
Sweet! Thanks David.

On Thu, Sep 29, 2016 at 4:18 PM, David Lyle  wrote:

> Sounds good to me. I can push up a pr straightaway. Testing the fix now.
>
> -D...
>
>
> On Thu, Sep 29, 2016 at 4:17 PM, Casey Stella  wrote:
>
> > Let's cancel the vote, someone put up a PR to correct and cut a rc2.  How
> > about that?
> >
> > On Thu, Sep 29, 2016 at 4:16 PM, David Lyle 
> wrote:
> >
> > > Okay, so it looks like METRON-466 and METRON-389 crossed paths and the
> > > metron_version in the quick dev is still 0.2.0BETA.
> > >
> > > Easy to fix, how do you want to proceed?
> > >
> > > -D...
> > >
> > >
> > > On Thu, Sep 29, 2016 at 4:13 PM, David Lyle 
> > wrote:
> > >
> > > > Correction, the solr bundle didn't deploy - let me retract that +1
> for
> > a
> > > > sec.
> > > >
> > > > On Thu, Sep 29, 2016 at 4:09 PM, David Lyle 
> > > wrote:
> > > >
> > > >> +1 (binding)
> > > >>
> > > >> checksums/gpg - checked
> > > >> Rat Check - passed
> > > >> Integration Tests - passed
> > > >> Package Builds - success
> > > >> Quick Dev - worked as expected
> > > >>
> > > >> On Thu, Sep 29, 2016 at 2:32 PM, James Sirota 
> > > wrote:
> > > >>
> > > >>> +1, binding
> > > >>>
> > > >>> 29.09.2016, 11:31, "James Sirota" :
> > > >>> > This is a call to vote on releasing Apache Metron 0.2.1BETA-RC1
> > > >>> incubating
> > > >>> >
> > > >>> > Full list of changes in this release:
> > > >>> >
> > > > >>> > https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.1BETA-RC1-incubating/CHANGES
> > > >>> >
> > > >>> > The tag/commit to be voted upon is Metron_0.2.1BETA_rc1:
> > > >>> >
> > > > >>> > https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=commit;h=823cd2a83063bf232d25d3f58c08ab1fa8e06319
> > > >>> >
> > > >>> > The source archive being voted upon can be found here:
> > > >>> >
> > > > >>> > https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.1BETA-RC1-incubating/apache-metron-0.2.1BETA-RC1-incubating.tar.gz
> > > >>> >
> > > >>> > Other release files, signatures and digests can be found here:
> > > > >>> > https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.1BETA-RC1-incubating/
> > > >>> >
> > > >>> > The release artifacts are signed with the following key:
> > > >>> >
> > > > >>> > https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=c11bcb9b7385b4d155501aa097afd890f1070a18;hb=refs/tags/Metron_0.2.1BETA_rc1
> > > >>> >
> > > >>> > Please vote on releasing this package as Apache Metron
> > 0.2.1BETA-RC1
> > > >>> incubating
> > > >>> >
> > > >>> > When voting, please list the actions taken to verify the release.
> > > >>> > Recommended build validation and verification instructions are
> > posted
> > > >>> here:
> > > > >>> > https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
> > > >>> >
> > > >>> > This vote will be open for at least 72 hours.
> > > >>> >
> > > >>> > [ ] +1 Release this package as Apache Metron 0.2.0BETA-RC2
> > incubating
> > > >>> > [ ] 0 No opinion
> > > >>> > [ ] -1 Do not release this package because...
> > > >>> >
> > > >>> > ---
> > > >>> > Thank you,
> > > >>> >
> > > >>> > James Sirota
> > > >>> > PPMC- Apache Metron (Incubating)
> > > >>> > jsirota AT apache DOT org
> > > >>>
> > > >>> ---
> > > >>> Thank you,
> > > >>>
> > > >>> James Sirota
> > > >>> PPMC- Apache Metron (Incubating)
> > > >>> jsirota AT apache DOT org
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>


Re: [VOTE] Releasing Apache Metron 0.2.1BETA-RC1

2016-09-29 Thread Michael Miklavcic
David, is line 42 in
incubator-metron/metron-deployment/inventory/quick-dev-platform/group_vars/all
the only issue you see?

On Thu, Sep 29, 2016 at 4:17 PM, Casey Stella  wrote:

> Let's cancel the vote, someone put up a PR to correct and cut a rc2.  How
> about that?
>
> On Thu, Sep 29, 2016 at 4:16 PM, David Lyle  wrote:
>
> > Okay, so it looks like METRON-466 and METRON-389 crossed paths and the
> > metron_version in the quick dev is still 0.2.0BETA.
> >
> > Easy to fix, how do you want to proceed?
> >
> > -D...
> >
> >
> > On Thu, Sep 29, 2016 at 4:13 PM, David Lyle 
> wrote:
> >
> > > Correction, the solr bundle didn't deploy - let me retract that +1 for
> a
> > > sec.
> > >
> > > On Thu, Sep 29, 2016 at 4:09 PM, David Lyle 
> > wrote:
> > >
> > >> +1 (binding)
> > >>
> > >> checksums/gpg - checked
> > >> Rat Check - passed
> > >> Integration Tests - passed
> > >> Package Builds - success
> > >> Quick Dev - worked as expected
> > >>
> > >> On Thu, Sep 29, 2016 at 2:32 PM, James Sirota 
> > wrote:
> > >>
> > >>> +1, binding
> > >>>
> > >>> 29.09.2016, 11:31, "James Sirota" :
> > >>> > This is a call to vote on releasing Apache Metron 0.2.1BETA-RC1
> > >>> incubating
> > >>> >
> > >>> > Full list of changes in this release:
> > >>> >
> > >>> > https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.1BETA-RC1-incubating/CHANGES
> > >>> >
> > >>> > The tag/commit to be voted upon is Metron_0.2.1BETA_rc1:
> > >>> >
> > >>> > https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=commit;h=823cd2a83063bf232d25d3f58c08ab1fa8e06319
> > >>> >
> > >>> > The source archive being voted upon can be found here:
> > >>> >
> > >>> > https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.1BETA-RC1-incubating/apache-metron-0.2.1BETA-RC1-incubating.tar.gz
> > >>> >
> > >>> > Other release files, signatures and digests can be found here:
> > >>> > https://dist.apache.org/repos/dist/dev/incubator/metron/0.2.1BETA-RC1-incubating/
> > >>> >
> > >>> > The release artifacts are signed with the following key:
> > >>> >
> > >>> > https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=c11bcb9b7385b4d155501aa097afd890f1070a18;hb=refs/tags/Metron_0.2.1BETA_rc1
> > >>> >
> > >>> > Please vote on releasing this package as Apache Metron
> 0.2.1BETA-RC1
> > >>> incubating
> > >>> >
> > >>> > When voting, please list the actions taken to verify the release.
> > >>> > Recommended build validation and verification instructions are
> posted
> > >>> here:
> > >>> > https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
> > >>> >
> > >>> > This vote will be open for at least 72 hours.
> > >>> >
> > >>> > [ ] +1 Release this package as Apache Metron 0.2.0BETA-RC2
> incubating
> > >>> > [ ] 0 No opinion
> > >>> > [ ] -1 Do not release this package because...
> > >>> >
> > >>> > ---
> > >>> > Thank you,
> > >>> >
> > >>> > James Sirota
> > >>> > PPMC- Apache Metron (Incubating)
> > >>> > jsirota AT apache DOT org
> > >>>
> > >>> ---
> > >>> Thank you,
> > >>>
> > >>> James Sirota
> > >>> PPMC- Apache Metron (Incubating)
> > >>> jsirota AT apache DOT org
> > >>>
> > >>
> > >>
> > >
> >
>


Re: [DISCUSS] Upcoming Metron Build

2016-09-29 Thread Michael Miklavcic
https://issues.apache.org/jira/browse/METRON-398 (version bump) is now
committed


On Wed, Sep 28, 2016 at 5:41 PM, James Sirota <jsir...@apache.org> wrote:

> Sounds like we are still debating METRON-466.  I want to make sure we get
> consensus on it before we wend up the build.  Let's wait a day until we get
> resolution
>
> 28.09.2016, 12:34, "David Lyle" <dlyle65...@gmail.com>:
> > No. METRON-466 is in review.
> >
> > On Wednesday, September 28, 2016, James Sirota <jsir...@apache.org>
> wrote:
> >
> >>  Do we now have everything we wanted in there?
> >>
> >>  27.09.2016, 08:59, "David Lyle" <dlyle65...@gmail.com <javascript:;>>:
> >>  > I'd like to hold out for
> >>  > https://issues.apache.org/jira/browse/METRON-466.
> >>  > I'm testing this now, so am a few hours out.
> >>  >
> >>  > -D...
> >>  >
> >>  > On Tue, Sep 27, 2016 at 11:41 AM, Michael Miklavcic <
> >>  > michael.miklav...@gmail.com <javascript:;>> wrote:
> >>  >
> >>  >> Also need this Jira to bump the version.
> >>  >>
> >>  >> https://issues.apache.org/jira/browse/METRON-398
> >>  >>
> >>  >> On Tue, Sep 27, 2016 at 11:38 AM, James Sirota <jsir...@apache.org
> >>  <javascript:;>> wrote:
> >>  >>
> >>  >> > We are preparing the next release (will be put up for a vote
> within
> >>  the
> >>  >> > next few days) with the following list of Jiras. Do you feel there
> >>  are
> >>  >> any
> >>  >> > other Jiras that should go into this release or are there any
> >>  critical
> >>  >> bugs
> >>  >> > that anyone knows of that we should address before releasing?
> Please
> >>  >> > comment on this thread. Otherwise, we will put up the release for
> a
> >>  vote
> >>  >> > shortly.
> >>  >> >
> >>  >> > METRON-457 Correct GrokParser logging spelling error (mmiklavc via
> >>  >> > cestella) closes apache/incubator-metron#274
> >>  >> > METRON-449 JSONMapParser should unfold maps to arbitrary depths
> >>  >> closes
> >>  >> > apache/incubator-metron#271
> >>  >> > METRON-453: Add a stellar shell function to open an external
> editor
> >>  >> > and return the editor's contents closes
> apache/incubator-metron#272
> >>  >> > METRON-452: Add rudimentary configuration management functions to
> >>  >> > Stellar closes apache/incubator-metron#269
> >>  >> > METRON-374: Add appropriate bundled 3rd party licenses to NOTICE
> and
> >>  >> > LICENSE where appropriate closes apache/incubator-metron#229
> >>  >> > METRON-427 Create Ambari Management Pack for Metron Installation
> >>  >> > closes apache/incubator-metron#266
> >>  >> > METRON-434: JSON Parser closes apache/incubator-metron#261
> >>  >> > METRON-437 Profile Definition's 'inputTopic' field is Extraneous
> >>  >> > (nickwallen) closes apache/incubator-metron#264
> >>  >> > METRON-445 Fix typos in metron-deployment roles (JonZeolla via
> >>  >> > nickwallen) closes apache/incubator-metron#267
> >>  >> > METRON-438: Back the Stellar REPL with a readline implementation
> >>  >> > closes apache/incubator-metron#265
> >>  >> > METRON-426: Stellar does not support scientific notation as a
> literal
> >>  >> > closes apache/incubator-metron#257
> >>  >> > METRON-435: Create Stellar REPL (nickwallen via cestella) closes
> >>  >> > apache/incubator-metron#262
> >>  >> > METRON-436: Updated architecture diagrams for Metron READMEs
> >>  >> > (anandsubbu via cestella) closes apache/incubator-metron#263
> >>  >> > METRON-433: Documentation update closes
> apache/incubator-metron#260
> >>  >> > METRON-428: Allow a kafka offset to be passed to the
> ParserTopology
> >>  >> > CLI closes apache/incubator-metron#258
> >>  >> > METRON-429 Profiler Missing Dependencies When Uber Jar Deployed
> >>  >> > (nickwallen) closes apache/incubator-metron#259
> >>  >> > METRON-384 Deployment fails at task Wait for Elasticsearch
> >>  >> Host
> >>  >> > to Start (2xyo via dl

Re: [DISCUSS] Upcoming Metron Build

2016-09-27 Thread Michael Miklavcic
Also need this Jira to bump the version.

https://issues.apache.org/jira/browse/METRON-398

On Tue, Sep 27, 2016 at 11:38 AM, James Sirota  wrote:

> We are preparing the next release (will be put up for a vote within the
> next few days) with the following list of Jiras.  Do you feel there are any
> other Jiras that should go into this release or are there any critical bugs
> that anyone knows of that we should address before releasing?  Please
> comment on this thread.  Otherwise, we will put up the release for a vote
> shortly.
>
> METRON-457 Correct GrokParser logging spelling error (mmiklavc via
> cestella) closes apache/incubator-metron#274
> METRON-449 JSONMapParser should unfold maps to arbitrary depths closes
> apache/incubator-metron#271
> METRON-453: Add a stellar shell function to open an external editor
> and return the editor's contents closes apache/incubator-metron#272
> METRON-452: Add rudimentary configuration management functions to
> Stellar closes apache/incubator-metron#269
> METRON-374: Add appropriate bundled 3rd party licenses to NOTICE and
> LICENSE where appropriate closes apache/incubator-metron#229
> METRON-427 Create Ambari Management Pack for Metron Installation
> closes apache/incubator-metron#266
> METRON-434: JSON Parser closes apache/incubator-metron#261
> METRON-437 Profile Definition's 'inputTopic' field is Extraneous
> (nickwallen) closes apache/incubator-metron#264
> METRON-445 Fix typos in metron-deployment roles (JonZeolla via
> nickwallen) closes apache/incubator-metron#267
> METRON-438: Back the Stellar REPL with a readline implementation
> closes apache/incubator-metron#265
> METRON-426: Stellar does not support scientific notation as a literal
> closes apache/incubator-metron#257
> METRON-435: Create Stellar REPL (nickwallen via cestella) closes
> apache/incubator-metron#262
> METRON-436: Updated architecture diagrams for Metron READMEs
> (anandsubbu via cestella) closes apache/incubator-metron#263
> METRON-433: Documentation update closes apache/incubator-metron#260
> METRON-428: Allow a kafka offset to be passed to the ParserTopology
> CLI closes apache/incubator-metron#258
> METRON-429 Profiler Missing Dependencies When Uber Jar Deployed
> (nickwallen) closes apache/incubator-metron#259
> METRON-384  Deployment fails at task Wait for Elasticsearch Host
> to Start (2xyo via dlyle65535) closes apache/incubator-metron#221
> METRON-257 Enable pcap result pagination from the Pcap CLI (mmiklavc
> via cestella) closes apache/incubator-metron#256
> METRON-420 Add Expiration to a Profile Definition (nickwallen) closes
> apache/incubator-metron#254
> METRON-413 Allow Start/End Time Range Search in Profiler Client API
> (nickwallen) closes apache/incubator-metron#249
> METRON-419 Update Tuple to HBase Mapper/Bolt to Set TTL (nickwallen)
> closes apache/incubator-metron#252
> METRON-422 Remove bluecoat.json (cestella via nickwallen) closes
> apache/incubator-metron#255
> METRON-418 Set TTL on HBase Puts (nickwallen) closes
> apache/incubator-metron#251
> METRON-415: Allow a Profile to Store Any Type as its Value closes
> apache/incubator-metron#253
> METRON-411 Support Greater Range of Profile Periods  (nickwallen)
> closes apache/incubator-metron#246
> METRON-416: Provide the ability to store mergeable data structures for
> summarizing data on-line closes apache/incubator-metron#250
> METRON-397: Add a stellar function to interact with the HBase
> enrichment table closes apache/incubator-metron#234
> METRON-391 Create Stellar Function to Read Profile Data for Model
> Scoring (nickwallen via cestella) closes apache/incubator-metron#242
> METRON-414 Kibana Ansible Install Fails with SSL Error closes
> apache/incubator-metron#248
> METRON-407: We currently do not provide defaults if the Stix
> Observable does not specify a condition closes apache/incubator-metron#244
> METRON-406: Stellar variable resolution does not resolve variables
> with ':' in them closes apache/incubator-metron#243
> METRON-399 Stellar Date Functions Should Default to Current Time
> (nickwallen) closes apache/incubator-metron#237
> METRON-389 Create Java API to Read Profile Data During Model Scoring
> (nickwallen) closes apache/incubator-metron#236
> METRON-385 Create Ambari Service Definition for Indexing (justinleet
> via cestella) closes apache/incubator-metron#222
> METRON-395 Fix Metron Bro parser not parsing some timestamp values
> (mmiklavc via cestella) closes apache/incubator-metron#232
> METRON-400 Deploy Probes to running Docker Container closes
> apache/incubator-metron#241
> METRON-408 Intermittent Failures of Profile Integration Tests
> (nickwallen via cestella) closes apache/incubator-metron#245
> METRON-381: Add support for multiple reducers in pcap_query.sh closes
> apache/incubator-metron#217
> METRON-404 

[DISCUSS] Storm topology sideloading jars

2016-09-19 Thread Michael Miklavcic
As part of https://issues.apache.org/jira/browse/METRON-356 it is now
possible to add hbase and hadoop conf to the Storm topology classpath. It
is also desirable to expand this functionality to sideloading jars for
Storm topologies. That way, users can add additional dependencies without
having to recompile/repackage existing jars. One suggestion is to leverage
HDFS to store custom jars and add them to the topology.classpath. I want to
open this discussion to the community.
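
To make the idea a bit more concrete, here is a minimal sketch of the
classpath-building half of it, assuming the custom jars have already been
copied from HDFS into a local directory (the HDFS fetch itself is not
shown). SideloadClasspath, the conf directory paths, and the jar names are
all hypothetical; the resulting string is the kind of value that could be
set as topology.classpath.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: assembles a classpath from conf directories plus sideloaded jars. */
public class SideloadClasspath {

    /** Lists the *.jar files directly under `dir`, sorted for a stable order. */
    public static List<String> findJars(Path dir) throws IOException {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toAbsolutePath().toString());
            }
        }
        Collections.sort(jars);
        return jars;
    }

    /** Joins conf directories and jar paths into one classpath string. */
    public static String buildClasspath(List<String> confDirs, List<String> jars) {
        List<String> entries = new ArrayList<>(confDirs);
        entries.addAll(jars);
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a directory populated from HDFS.
        Path dir = Files.createTempDirectory("sideload");
        Files.createFile(dir.resolve("custom-enrichment.jar"));
        Files.createFile(dir.resolve("notes.txt")); // ignored: not a jar
        // Assumed conf locations, for illustration only.
        List<String> confDirs = Arrays.asList("/etc/hbase/conf", "/etc/hadoop/conf");
        System.out.println(buildClasspath(confDirs, findJars(dir)));
    }
}
```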

Best,

Mike