Re: [DISCUSS] Release?

2018-05-09 Thread Casey Stella
; > > >> apache/metron#898
> > > > > >> 3 months ago METRON-1392 Fix a test case to expect an Exception
> > when
> > > > > >> replication factor more than number of brokers while creating
> > topic
> > > > > >> (MohanDV via merrimanr) closes apache/metron#892
> > > > > >> 3 months ago METRON-1413 Add Metron Commit Tool (nickwallen)
> > closes
> > > > > >> apache/metron#902
> > > > > >> 3 months ago METRON-1429 SearchIntegrationTest refactor
> > (merrimanr)
> > > > > >> closes apache/metron#909
> > > > > >> 3 months ago METRON-1426: SensorIndexingConfigController
> > > > IntegrationTest
> > > > > >> fails intermittently closes apache/metron#906
> > > > > >> 4 months ago METRON-1417: Disable pcap-service by default in
> Monit
> > > > > >> (mmiklavc via mmiklavc) closes apache/metron#905
> > > > > >> 4 months ago METRON-1400: Elasticsearch service check fails in
> > > Ambari
> > > > > >> (mmiklavc via mmiklavc) closes apache/metron#904
> > > > > >> 4 months ago METRON-1428: Travis build failing from
> metron-config
> > > > > >> (mmiklavc via mmiklavc) closes apache/metron#908
> > > > > >> 4 months ago METRON-1302: Split up Indexing Topology into batch
> > and
> > > > > >> random access sections closes apache/incubator-metron#831
> > > > > >> 4 months ago METRON-1395 Documentation missing for Produce a
> > message
> > > > to
> > > > > a
> > > > > >> Kafka topic Rest API endpoint (MohanDV via nickwallen) closes
> > > > > >> apache/metron#897
> > > > > >> 4 months ago METRON-1411 Fix sed command in Upgrading.md
> > > (justinleet)
> > > > > >> closes apache/metron#900
> > > > > >> 4 months ago METRON-1326: Metron deploy with Kerberos fails on
> > > Ambari
> > > > > 2.5
> > > > > >> during ES service stop (mmiklavc via mmiklavc) closes
> > > > apache/metron#894
> > > > > >> 4 months ago METRON-1380: Create a typosquatting use-case closes
> > > > > >> apache/incubator-metron#882
> > > > > >> 4 months ago METRON-1230: As a stopgap prior to METRON-777, add
> > more
> > > > > >> simplistic sideloading of custom Parsers closes
> > > > > apache/incubator-metron#785
> > > > > >> 4 months ago METRON-1378: Create a summarizer closes
> > > > > >> apache/incubator-metron#879
> > > > > >> 4 months ago METRON-1231 Separate Sensor name and topic in the
> > > > > Management
> > > > > >> UI (merrimanr) closes apache/metron#786
> > > > > >> 4 months ago METRON-1382 Run Stellar in a Zeppelin Notebook
> > > > (nickwallen)
> > > > > >> closes apache/metron#884
> > > > > >> 4 months ago METRON-1396 Fix .gitignore files to not ignore
> > > themselves
> > > > > >> (justinleet) closes apache/metron#896
> > > > > >> 4 months ago METRON-1366: Add an entropy stellar function
> (cstella
> > > via
> > > > > >> mmiklavc) closes apache/metron#872
> > > > > >> 4 months ago METRON-1390: Swagger UI for "Web Security Config"
> > > > > Controller
> > > > > >> needs request method (MohanDV via mmiklavc) closes
> > apache/metron#889
> > > > > >> 4 months ago METRON-1393: Fix bro Elasticsearch template
> (mmiklavc
> > > via
> > > > > >> mmiklavc) closes apache/metron#893
> > > > > >> 4 months ago METRON-1379: Add an OBJECT_GET stellar function
> > closes
> > > > > >> apache/incubator-metron#880
> > > > > >> 4 months ago METRON-939: Upgrade ElasticSearch and Kibana
> > (mmiklavc
> > > > via
> > > > > >> mmiklavc) closes apache/metron#840
> > > > > >> 4 months ago METRON-1377: Stellar function to generate
> > typosquatted
> > > > > >> domains (similar to dnstwist) closes apache/incubator-metron#878
> > > > > >> 4 months ago METRON-1385 Missing properties in index
> > > > > template
> > > > > >> causes ElasticsearchColumnMetadataDao.getColumnMetadata to fail
> > > > > >> (merrimanr) closes apache/metron#886
> > > > > >> 4 months ago METRON-1388 update public web site to point at
> 0.4.2
> > > new
> > > > > >> release (mattf-horton) closes apache/metron#887
> > > > > >> 4 months ago METRON-1362 Improve Metron Deployment README
> > > (nickwallen)
> > > > > >> closes apache/metron#869
> > > > > >> 4 months ago METRON-1384 Increment master version number to
> 0.4.3
> > > for
> > > > > >> on-going development (mattf-horton via nickwallen) closes
> > > > > apache/metron#885
> > > > > >> 4 months ago METRON-1381 Add Apache license to MD files and
> remove
> > > the
> > > > > >> Rat exclusion (justinleet) closes apache/metron#883
> > > > > >> 4 months ago METRON-1071 Create CONTRIBUTING.md (justinleet)
> > closes
> > > > > >> apache/metron#881
> > > > > >> 4 months ago METRON-1373 RAT failure for
> > > > metron-interface/metron-alerts
> > > > > >> (mattf-horton) closes apache/metron#875
> > > > > >> 4 months ago METRON-1351 Create Installable Packages for Ubuntu
> > > Trusty
> > > > > >> (nickwallen) closes apache/metron#868
> > > > > >> 5 months ago METRON-1376 RC Check Script should have named
> > > parameters
> > > > > >> (ottobackwards via nickwallen) closes apache/metron#877
> > > > > >> 5 months ago METRON-1365: Allow PROFILE_GET to return a default
> > > value
> > > > > for
> > > > > >> a profile and entity that does not have a value written. closes
> > > > > >> apache/incubator-metron#871
> > > > > >> 5 months ago METRON-1348 Metron Service Checks Use Wrong
> Hostname
> > > > > >> (nickwallen) closes apache/metron#864
> > > > > >> 5 months ago METRON-1350: Add reservoir sampling functions to
> > > Stellar
> > > > > >> closes apache/incubator-metron#867
> > > > > >> 5 months ago METRON-1374 Script the RC checking process
> > > > (ottobackwards)
> > > > > >> closes apache/metron#876
> > > > > >> 5 months ago METRON-1372 Validate JIRA for Releases (nickwallen)
> > > > closes
> > > > > >> apache/metron#874
> > > > > >> 5 months ago METRON-1345: Update EC2 README for custom Ansible
> > > > (mmiklavc
> > > > > >> via mmiklavc) closes apache/metron#859
> > > > > >> 5 months ago METRON-1349 Full Dev Builds Metron Twice
> (nickwallen)
> > > > > closes
> > > > > >> apache/metron#866
> > > > > >> 5 months ago METRON-1343 Swagger UI for User Controller needs
> > > request
> > > > > >> method (MohanDV via ottobackwards) closes apache/metron#862
> > > > > >> 5 months ago METRON-1306: When index template install fails, we
> > > should
> > > > > >> fail the install closes apache/incubator-metron#834
> > > > > >> 5 months ago METRON-1341 Projection FieldTransformation
> > > > > >> (simonellistonball via ottobackwards) closes apache/metron#861
> > > > > >>
> > > > > >>
> > > > > >> On Wed, May 9, 2018 at 11:57 AM, Michael Miklavcic <
> > > > > >> michael.miklav...@gmail.com> wrote:
> > > > > >>
> > > > > >>> Is this what you mean Otto?
> > > > > > >>> https://github.com/apache/metron/blob/24822dddc68c264f59723f5e17d423cd497f6807/dev-utilities/release-utils/validate-jira-for-release
> > > > > >>>
> > > > > >>> On Wed, May 9, 2018 at 9:52 AM, Casey Stella <
> ceste...@gmail.com
> > >
> > > > > wrote:
> > > > > >>>
> > > > > >>> > I wasn't aware we had a script for that..is that in
> > > > > >>> > dev-utilities/release-utils?
> > > > > >>> >
> > > > > >>> > On Wed, May 9, 2018 at 11:41 AM Otto Fowler <
> > > > ottobackwa...@gmail.com
> > > > > >
> > > > > >>> > wrote:
> > > > > >>> >
> > > > > >>> > > Can you run the issues included script and post that for us
> > to
> > > > see?
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > > On May 9, 2018 at 11:14:11, Casey Stella (
> ceste...@gmail.com
> > )
> > > > > wrote:
> > > > > >>> > >
> > > > > >>> > > Is it about time for a release? I know we got some
> > substantial
> > > > > >>> > performance
> > > > > >>> > > changes in since the last release. I think we might have a
> > > > > >>> justification
> > > > > >>> > > for a release.
> > > > > >>> > >
> > > > > >>> > > Casey
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > > --
> > > >
> > > > Jon
> > > >
> > >
> > --
> >
> > Jon
> >
>


Re: [DISCUSS] Release Manager

2018-05-10 Thread Casey Stella
I'm +1 to Justin being RM; he's going to have big shoes to fill with Matt
gone. ;) Also, if it wasn't obvious, deep and hearty thanks to Matt again
for being our RM.

On Thu, May 10, 2018 at 12:06 PM Ryan Merriman <merrim...@gmail.com> wrote:

> Thanks for all your help Matt.
>
> On Thu, May 10, 2018 at 10:53 AM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Thanks Matt for doing this for the community.
> >
> > Justin Leet as new lord commander of the Night's Watch? Aye, dilly,
> dilly.
> >
> > On Thu, May 10, 2018 at 9:07 AM, Justin Leet <justinjl...@gmail.com>
> > wrote:
> >
> > > I'd be happy to volunteer to take over for a while.
> > >
> > > Thanks to Matt for all the help through the last couple releases!
> > >
> > > Justin
> > >
> > > On Thu, May 10, 2018 at 11:06 AM, Casey Stella <ceste...@gmail.com>
> > wrote:
> > >
> > > > Hi All,
> > > >
> > > > Matt Foley, our esteemed Release manager for the last couple
> releases,
> > > has
> > > > asked to be relieved.  So, I'm calling on volunteers for the next
> > release
> > > > manager.  It should be a committer and there are a few things that
> > > require
> > > > a PMC member, I believe, but the release manager can ask for help
> from
> > a
> > > > PMC member.
> > > >
> > > > So, Matt's watch has ended, who wants to volunteer?
> > > >
> > > > Casey
> > > >
> > >
> >
>


[DISCUSS] Release Manager

2018-05-10 Thread Casey Stella
Hi All,

Matt Foley, our esteemed Release manager for the last couple releases, has
asked to be relieved.  So, I'm calling on volunteers for the next release
manager.  It should be a committer and there are a few things that require
a PMC member, I believe, but the release manager can ask for help from a
PMC member.

So, Matt's watch has ended, who wants to volunteer?

Casey


Re: [DISCUSS] Treating null as false in boolean expressions in Stellar

2018-06-16 Thread Casey Stella
I created a PR for the empty collection falseyness as well:
https://github.com/apache/metron/pull/1064 so we can choose either of them
if we so desire.

On Sat, Jun 16, 2018 at 1:10 PM Casey Stella  wrote:

> I created a PR for this functionality, in case we decided for it:
> https://github.com/apache/metron/pull/1063
>
> Also, while we're talking, perhaps we should treat empty lists as false as
> well, like javascript and python.
> So, for instance, if [] then 'blah' else 'foo' would return foo.
>
> Thoughts?
>
> On Sat, Jun 16, 2018 at 10:17 AM Casey Stella  wrote:
>
>> Right now, because fields may not exist, users can have an awkward time.
>> For instance, checking for is_alert, you end up having to preface checks
>> with exists(is_alert).
>>
>> For instance, in one of our use-cases:
>> https://github.com/apache/metron/tree/master/use-cases/geographic_login_outliers
>> we use
>>
>> "is_alert := exists(is_alert) && is_alert",
>> "is_alert := is_alert || (geo_outlier != null && geo_outlier == true)",
>>
>>  instead of :
>>
>> "is_alert := is_alert || geo_outlier == true",
>>
>> I suggest that we adopt a convention from javascript whereby we assume a
>> field not existing or being null should act as false in boolean
>> expressions.  This will simplify stellar's use and hopefully result in less
>> awkwardness.
>>
>> Thoughts?
>>
>
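
A minimal sketch of the semantics proposed in this thread, written in Python rather than Stellar. The function names (as_boolean, is_alert) are hypothetical and are not part of Stellar or of either PR; the sketch only models "null or missing acts as false" and "empty collection acts as false", and deliberately rejects every other non-boolean value because the thread does not say how those should behave.

```python
def as_boolean(value):
    """Coerce a field value to a boolean under the proposed convention."""
    if value is None:
        # A missing or null field acts as false.
        return False
    if isinstance(value, bool):
        return value
    if isinstance(value, (list, tuple, set, dict)):
        # An empty collection acts as false, a non-empty one as true.
        return len(value) > 0
    # Anything else is outside what the thread proposes.
    raise TypeError(f"no boolean coercion proposed for {type(value).__name__}")


def is_alert(message):
    """Models: is_alert := is_alert || geo_outlier == true"""
    # dict.get returns None for a missing field, which as_boolean treats as false.
    return as_boolean(message.get("is_alert")) or message.get("geo_outlier") is True


if __name__ == "__main__":
    print(is_alert({}))                        # neither field exists
    print(is_alert({"geo_outlier": True}))     # only geo_outlier is set
    print("blah" if as_boolean([]) else "foo")
```

With this convention the exists(is_alert) guard and the explicit null check on geo_outlier are no longer needed in the expression.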


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-13 Thread Casey Stella
 June 2018 at 18:00, Charles Joynt 
> >>wrote:
> >>
> >>> Thanks for the responses. I appreciate the willingness to look at
> >>> creating a NiFi processor. That would be great!
> >>>
> >>> Just to follow up on this (after a week looking after the "ops" side
> >>> of
> >>> dev-ops): I really don't want to have to use the flatfile loader
> >>> script, and I'm not going to be able to write a Metron-style HBase
> >>> key generator any time soon, but I have had some success with a
> different approach.
> >>>
> >>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
> >>> 2. Send this to a HTTP listener in NiFi
> >>> 3. Write to a kafka topic
> >>>
> >>> I then followed your instructions in this blog:
> >>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
> >>>
> >>> 4. Create a new "dns" sensor in Metron
> >>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig
> >>>    settings to push this into HBase:
> >>>
> >>> {
> >>>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
> >>>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
> >>>   "sensorTopic": "dns",
> >>>   "parserConfig": {
> >>>     "shew.table": "dns",
> >>>     "shew.cf": "dns",
> >>>     "shew.keyColumns": "name",
> >>>     "shew.enrichmentType": "dns",
> >>>     "columns": {
> >>>       "name": 0,
> >>>       "type": 1,
> >>>       "data": 2
> >>>     }
> >>>   }
> >>> }
> >>>
> >>> And... it seems to be working. At least, I have data in HBase which
> >>> looks more like the output of the flatfile loader.
> >>>
> >>> Charlie
> >>>
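
To illustrate what the parserConfig above expresses, here is a rough Python sketch of how the "columns" mapping and "shew.keyColumns" could turn one CSV line into an enrichment record. This is not Metron's implementation: the helper names (parse_row, to_enrichment) and the shape of the returned dictionary are assumptions made for illustration, and only a single key column is handled.

```python
import csv
import io

# The relevant subset of the parserConfig shown above.
PARSER_CONFIG = {
    "shew.keyColumns": "name",
    "shew.enrichmentType": "dns",
    "columns": {"name": 0, "type": 1, "data": 2},
}


def parse_row(line, config):
    """Map positional CSV values to named fields using the 'columns' setting."""
    row = next(csv.reader(io.StringIO(line)))
    return {field: row[index] for field, index in config["columns"].items()}


def to_enrichment(record, config):
    """Split a parsed record into enrichment type, indicator, and value.

    Assumption: the key column supplies the indicator and the remaining
    fields form the value. The real writer may differ.
    """
    key_column = config["shew.keyColumns"]
    value = {k: v for k, v in record.items() if k != key_column}
    return {
        "type": config["shew.enrichmentType"],
        "indicator": record[key_column],
        "value": value,
    }


if __name__ == "__main__":
    record = parse_row('"server.domain.local","A","192.168.0.198"', PARSER_CONFIG)
    print(to_enrichment(record, PARSER_CONFIG))
```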
> >>> -Original Message-
> >>> From: Casey Stella [mailto:ceste...@gmail.com]
> >>> Sent: 05 June 2018 14:56
> >>> To: dev@metron.apache.org
> >>> Subject: Re: Writing enrichment data directly from NiFi with
> >>> PutHBaseJSON
> >>>
> >>> The problem, as you correctly diagnosed, is the key in HBase.  We
> >>> construct the key very specifically in Metron, so it's unlikely to
> >>> work out of the box with the NiFi processor unfortunately.  The key
> >>> that we use is formed here in the codebase:
> >>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> >>>
> >>> To put that in English, consider the following:
> >>>
> >>>    - type - the enrichment type
> >>>    - indicator - the indicator to use
> >>>    - hash(*) - a Murmur3 128-bit hash function
> >>>
> >>> the key is hash(indicator) + type + indicator
> >>>
> >>> This hash prefixing is a standard practice in HBase key design that
> >>> allows the keys to be uniformly distributed among the regions and
> >>> prevents hotspotting.  Depending on how the PutHBaseJSON processor
> >>> works, if you can construct the key and pass it in, then you might be
> >>> able to either construct the key in NiFi or write a processor to
> construct the key.
> >>> Ultimately though, what Carolyn said is true..the easiest approach is
> >>> probably using the flatfile loader.
> >>> If you do get this working in NiFi, however, do please let us know
> >>> and/or consider contributing it back to the project as a PR :)
> >>>
> >>>
> >>>
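
The key layout described above, hash(indicator) + type + indicator, can be sketched in Python as follows. Two things here are assumptions and not Metron's code: the function name enrichment_key is hypothetical, and BLAKE2b with a 16-byte digest stands in for the Murmur3 128-bit hash only because it ships with the standard library. The exact byte layout is defined by EnrichmentKey.java and is not reproduced here, so keys built this way would not match the ones Metron writes.

```python
import hashlib


def enrichment_key(enrichment_type: str, indicator: str) -> bytes:
    """Build a row key shaped like hash(indicator) + type + indicator."""
    indicator_bytes = indicator.encode("utf-8")
    # Stand-in 128-bit hash (NOT Murmur3): gives a fixed-width, well-spread
    # prefix so that rows do not cluster in one region.
    prefix = hashlib.blake2b(indicator_bytes, digest_size=16).digest()
    return prefix + enrichment_type.encode("utf-8") + indicator_bytes


if __name__ == "__main__":
    key = enrichment_key("dns", "server.domain.local")
    print(len(key), key.hex())
```

Because the prefix depends only on the indicator, lookups can recompute the full key from the type and indicator alone, while similar indicators still land in different parts of the key space.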
> >>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> >>> charles.jo...@gresearch.co.uk>
> >>> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> > I work as a Dev/Ops Data Engineer within

Re: [DISCUSS] Refactoring

2018-05-30 Thread Casey Stella
Yep, I think we can, mike.

Let me start with an emendation:

"Don’t combine code changes with lots of edits of whitespace, comments, or
code changes specifically for cosmetic refactoring purposes aimed solely at
readability; it makes code review
and merging difficult. It’s okay to fix an occasional comment or
indentation, but if
wholesale comment, whitespace or other refactoring changes are needed, make
them a separate PR."


On Wed, May 30, 2018 at 8:48 AM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Completely agreed on all points. Can we do that here and spin up a vote
> thread following with the final proposed changes?
>
> On Wed, May 30, 2018 at 9:46 AM, Casey Stella  wrote:
>
>> I'm torn on this, honestly.  I completely agree that cosmetic refactoring
>> gets in the way of review and the risk can be more than the reward,
>> especially in a subtle bit of code.
>> That being said, I'm a big fan of opportunistically refactoring to
>> generalize or correct faulty assumptions.  Often, I can't justify making an
>> abstraction until I have seen the need more than once, so I will make the
>> abstraction, as long as it's small and well-contained, in the PR
>> opportunistically, that motivates the 2nd usage.  I like that kind of
>> opportunistic refactoring and I think that shouldn't be dissuaded.
>>
>> I agree with Otto, we should have a round of discussion on the doc text
>> and I'd suggest we clarify to be cosmetic refactoring solely due to
>> readability concerns.
>>
>> Just my $0.02
>>
>> On Tue, May 29, 2018 at 7:40 PM Otto Fowler 
>> wrote:
>>
>>> On top of this, refactoring under another PR’s goals tends to be less
>>> documented as to the intent
>>> and effect.
>>>
>>> +1 for the idea, we should have a vote round or edit round on the doc’s
>>> specific text.
>>> Although I will say, that some things it doesn’t matter how much you
>>> break
>>> them up wrt reviews.
>>> We should have so many reviewers that this is a problem.
>>>
>>>
>>>
>>>
>>> On May 29, 2018 at 20:05:49, Michael Miklavcic (
>>> michael.miklav...@gmail.com)
>>> wrote:
>>>
>>> I want to bring up the subject of code refactoring and how we should
>>> manage
>>> this in PR's as our product evolves. As Metron matures, it's only natural
>>> that we'll have an increasing number of contributors, and subsequently
>>> contributions affecting many hardened parts of the code base. We've
>>> generally not been particular about mixing refactoring changes with other
>>> types of improvements or bug fixes. As a general best practice for
>>> software
>>> engineering it is indeed desirable to undergo regular refactoring as a
>>> matter of "scouts' rules" or "fixing broken windows." This helps keep
>>> code
>>> readable and has the benefit of a fresh pair of eyes to see code in a new
>>> way that allows the newcomer to introduce clarifying changes that the
>>> original author(s) may not have considered.
>>>
>>> While refactoring is generally applauded (because we have unit,
>>> integration, and acceptance tests backing our changes), it does pose some
>>> challenges during the review process. Depending on the type of PR, the
>>> refactoring work can at times be many orders of magnitude larger than the
>>> code pertinent to the desired change in functionality, whether bug fix or
>>> feature enhancement, itself. While tests should protect against
>>> unintended
>>> side effects (and sometimes they are also refactored) it does introduce
>>> the
>>> possibility of new subtle bugs. It also makes a lot of PR's a conflated
>>> mix
>>> of comments pertinent to the improvement/fix and opinions about best
>>> practices around coding style.
>>>
>>> I propose a simple change - we update our coding style guidelines in
>>> section 2.1 to expand on refactoring. We currently cover whitespace and
>>> comments:
>>>
>>> "Don’t combine code changes with lots of edits of whitespace or comments;
>>> it makes code review too difficult. It’s okay to fix an occasional
>>> comment
>>> or indenting, but if wholesale comment or whitespace changes are needed,
>>> make them a separate PR."
>>>
>>> I propose we expand this to say:
>>>
>>> "Don’t combine code changes with lots of edits of whitespace, comments,
>>> or
>>> code changes spec

Re: [DISCUSS] Refactoring

2018-05-30 Thread Casey Stella
I'm torn on this, honestly.  I completely agree that cosmetic refactoring
gets in the way of review and the risk can be more than the reward,
especially in a subtle bit of code.
That being said, I'm a big fan of opportunistically refactoring to
generalize or correct faulty assumptions.  Often, I can't justify making an
abstraction until I have seen the need more than once, so I will make the
abstraction, as long as it's small and well-contained, in the PR
opportunistically, that motivates the 2nd usage.  I like that kind of
opportunistic refactoring and I think that shouldn't be dissuaded.

I agree with Otto, we should have a round of discussion on the doc text and
I'd suggest we clarify to be cosmetic refactoring solely due to readability
concerns.

Just my $0.02

On Tue, May 29, 2018 at 7:40 PM Otto Fowler  wrote:

> On top of this, refactoring under another PR’s goals tends to be less
> documented as to the intent
> and effect.
>
> +1 for the idea, we should have a vote round or edit round on the doc’s
> specific text.
> Although I will say, that some things it doesn’t matter how much you break
> them up wrt reviews.
> We should have so many reviewers that this is a problem.
>
>
>
>
> On May 29, 2018 at 20:05:49, Michael Miklavcic (
> michael.miklav...@gmail.com)
> wrote:
>
> I want to bring up the subject of code refactoring and how we should manage
> this in PR's as our product evolves. As Metron matures, it's only natural
> that we'll have an increasing number of contributors, and subsequently
> contributions affecting many hardened parts of the code base. We've
> generally not been particular about mixing refactoring changes with other
> types of improvements or bug fixes. As a general best practice for software
> engineering it is indeed desirable to undergo regular refactoring as a
> matter of "scouts' rules" or "fixing broken windows." This helps keep code
> readable and has the benefit of a fresh pair of eyes to see code in a new
> way that allows the newcomer to introduce clarifying changes that the
> original author(s) may not have considered.
>
> While refactoring is generally applauded (because we have unit,
> integration, and acceptance tests backing our changes), it does pose some
> challenges during the review process. Depending on the type of PR, the
> refactoring work can at times be many orders of magnitude larger than the
> code pertinent to the desired change in functionality, whether bug fix or
> feature enhancement, itself. While tests should protect against unintended
> side effects (and sometimes they are also refactored) it does introduce the
> possibility of new subtle bugs. It also makes a lot of PR's a conflated mix
> of comments pertinent to the improvement/fix and opinions about best
> practices around coding style.
>
> I propose a simple change - we update our coding style guidelines in
> section 2.1 to expand on refactoring. We currently cover whitespace and
> comments:
>
> "Don’t combine code changes with lots of edits of whitespace or comments;
> it makes code review too difficult. It’s okay to fix an occasional comment
> or indenting, but if wholesale comment or whitespace changes are needed,
> make them a separate PR."
>
> I propose we expand this to say:
>
> "Don’t combine code changes with lots of edits of whitespace, comments, or
> code changes specifically for refactoring purposes; it makes code review
> too difficult. It’s okay to fix an occasional comment or indenting, but if
> wholesale comment, whitespace or other refactoring changes are needed, make
> them a separate PR."
>
>
> I believe this provides additional clarity. I think it's one thing to
> extract a method or introduce changes for code you're specifically
> modifying, and another thing to introduce changes that affect surrounding
> code. I would also propose we emphasize the Google checkstyle and
> auto-formatting tooling when submitting any changes, but dealing with
> enforcement is not my focus for this discuss thread.
>
> https://cwiki.apache.org/confluence/display/METRON/Development+Guidelines
>
> Best,
> Michael Miklavcic
>


Re: [DISCUSS] Refactoring

2018-05-30 Thread Casey Stella
Yeah, that's true.

On Wed, May 30, 2018 at 8:58 AM Otto Fowler  wrote:

> We can say that any refactoring that *is* necessary, needs to be written
> out and justified in the review.
> So, we don’t recommend it, but if you have to, and you can reasonably
> defend it, OK.
>
>
> On May 30, 2018 at 11:53:51, Casey Stella (ceste...@gmail.com) wrote:
>
> Yep, I think we can, mike.
>
> Let me start with an emendation:
>
> "Don’t combine code changes with lots of edits of whitespace, comments, or
> code changes specifically for cosmetic refactoring purposes aimed solely at
> readability; it makes code review
> and merging difficult. It’s okay to fix an occasional comment or
> indentation, but if
> wholesale comment, whitespace or other refactoring changes are needed,
> make
> them a separate PR."
>
>
> On Wed, May 30, 2018 at 8:48 AM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > Completely agreed on all points. Can we do that here and spin up a vote
> > thread following with the final proposed changes?
> >
> > On Wed, May 30, 2018 at 9:46 AM, Casey Stella 
> wrote:
> >
> >> I'm torn on this, honestly. I completely agree that cosmetic
> refactoring
> >> gets in the way of review and the risk can be more than the reward,
> >> especially in a subtle bit of code.
> >> That being said, I'm a big fan of opportunistically refactoring to
> >> generalize or correct faulty assumptions. Often, I can't justify making
> an
> >> abstraction until I have seen the need more than once, so I will make
> the
> >> abstraction, as long as it's small and well-contained, in the PR
> >> opportunistically, that motivates the 2nd usage. I like that kind of
> >> opportunistic refactoring and I think that shouldn't be dissuaded.
> >>
> >> I agree with Otto, we should have a round of discussion on the doc text
> >> and I'd suggest we clarify to be cosmetic refactoring solely due to
> >> readability concerns.
> >>
> >> Just my $0.02
> >>
> >> On Tue, May 29, 2018 at 7:40 PM Otto Fowler 
> >> wrote:
> >>
> >>> On top of this, refactoring under another PR’s goals tends to be less
> >>> documented as to the intent
> >>> and effect.
> >>>
> >>> +1 for the idea, we should have a vote round or edit round on the
> doc’s
> >>> specific text.
> >>> Although I will say, that some things it doesn’t matter how much you
> >>> break
> >>> them up wrt reviews.
> >>> We should have so many reviewers that this is a problem.
> >>>
> >>>
> >>>
> >>>
> >>> On May 29, 2018 at 20:05:49, Michael Miklavcic (
> >>> michael.miklav...@gmail.com)
> >>> wrote:
> >>>
> >>> I want to bring up the subject of code refactoring and how we should
> >>> manage
> >>> this in PR's as our product evolves. As Metron matures, it's only
> natural
> >>> that we'll have an increasing number of contributors, and
> subsequently
> >>> contributions affecting many hardened parts of the code base. We've
> >>> generally not been particular about mixing refactoring changes with
> other
> >>> types of improvements or bug fixes. As a general best practice for
> >>> software
> >>> engineering it is indeed desirable to undergo regular refactoring as a
> >>> matter of "scouts' rules" or "fixing broken windows." This helps keep
> >>> code
> >>> readable and has the benefit of a fresh pair of eyes to see code in a
> new
> >>> way that allows the newcomer to introduce clarifying changes that the
> >>> original author(s) may not have considered.
> >>>
> >>> While refactoring is generally applauded (because we have unit,
> >>> integration, and acceptance tests backing our changes), it does pose
> some
> >>> challenges during the review process. Depending on the type of PR, the
> >>> refactoring work can at times be many orders of magnitude larger than
> the
> >>> code pertinent to the desired change in functionality, whether bug fix
> or
> >>> feature enhancement, itself. While tests should protect against
> >>> unintended
> >>> side effects (and sometimes they are also refactored) it does
> introduce
> >>> the
> >>> possibility of new subtle bugs. It also makes a lot of PR's a
> conflated
> >>> mix
> >>> of

Re: [VOTE] Metron Release Candidate 0.5.0-RC1

2018-05-29 Thread Casey Stella
Just a question, do we need anything new in the Upgrading.md for this
release?  Any migration that we expect people to do?

On Tue, May 29, 2018 at 11:30 AM Nick Allen  wrote:

> METRON-1544 was just merged into master.
>
>
> On Tue, May 29, 2018 at 2:16 PM, Justin Leet 
> wrote:
>
> > I'm going to go ahead and cancel RC1, since METRON-1544 looks pretty set.
> >
> > A new release candidate will be cut.
> >
> > Results (including my own vote):
> > +1
> > Nick Allen
> >
> >
> > -1
> > Otto Fowler
> > Justin Leet
> >
> > On Tue, May 29, 2018 at 10:39 AM, Justin Leet 
> > wrote:
> >
> >> I didn't realize METRON-1544 wasn't in.  I'm definitely okay with
> >> cancelling the vote, and kicking out a new RC.
> >>
> >> On Tue, May 29, 2018 at 7:11 AM, Otto Fowler 
> >> wrote:
> >>
> >>> -1 (binding)
> >>>
> >>> My yield for building this is terrible.  1 in 3.
> >>>
> >>> I propose https://github.com/apache/metron/pull/1015 inclusion.
> >>>
> >>>
> >>> On May 27, 2018 at 17:50:43, Nick Allen (n...@nickallen.org) wrote:
> >>>
> >>> No, the PR for this transient issue is still under review.
> >>>
> >>> On Sun, May 27, 2018 at 10:53 AM, Otto Fowler  >
> >>> wrote:
> >>>
> >>> >
> >>> > Failed tests:
> >>> >   CachingStellarProcessorTest.testCaching:73 expected:<6> but was:<5>
> >>> >
> >>> > I thought we landed a fix for this?
> >>> >
> >>> >
> >>> > On May 27, 2018 at 08:24:19, zeo...@gmail.com (zeo...@gmail.com)
> >>> wrote:
> >>> >
> >>> > We did discuss doing a release since there were two new commits, but
> I
> >>> > don't think it was included in this round.
> >>> >
> >>> > Jon
> >>> >
> >>> > On Sat, May 26, 2018, 10:22 Otto Fowler 
> >>> wrote:
> >>> >
> >>> > > Is there a BRO RC # for this?
> >>> > >
> >>> > >
> >>> > > On May 25, 2018 at 14:53:25, Nick Allen (n...@nickallen.org)
> wrote:
> >>> > >
> >>> > > +1 Release this package as Apache Metron 0.5.0-RC1
> >>> > >
> >>> > > Ran through all validation steps using the `metron-rc-check`
> script,
> >>> > which
> >>> > > included running all the tests, license checks, and spun-up the
> >>> CentOS
> >>> > dev
> >>> > > environment successfully.
> >>> > >
> >>> > >
> >>> > >
> >>> > > On Fri, May 25, 2018 at 1:40 PM, Nick Allen 
> >>> wrote:
> >>> > >
> >>> > > > This release does not contain an updated Bro plugin so the RC
> check
> >>> > > script
> >>> > > > does not currently work. Try using the patch at
> >>> > > > https://github.com/apache/metron/pull/1034.
> >>> > > >
> >>> > > >
> >>> > > > On Thu, May 24, 2018 at 3:23 PM, Justin Leet 
> >>> wrote:
> >>> > > >
> >>> > > >> Hi all,
> >>> > > >>
> >>> > > >> This is a call to vote on releasing Apache Metron 0.5.0
> >>> > > >>
> >>> > > >> Full list of changes in this release:
> >>> > > >> https://dist.apache.org/repos/dist/dev/metron/0.5.0-RC1/CHANGES
> >>> > > >>
> >>> > > >> The tag/commit to be voted upon is:
> >>> > > >>
> >>> > > >> (apache/metron) apache-metron-0.5.0-rc1
> >>> > > >>
> >>> > > >> The source archive being voted upon can be found here:
> >>> > > >> https://dist.apache.org/repos/dist/dev/metron/0.5.0-RC1/
> >>> > > >> apache-metron-0.5.0-rc1.tar.gz
> >>> > > >>
> >>> > > >> Other release files, signatures and digests can be found here:
> >>> > > >> https://dist.apache.org/repos/dist/dev/metron/0.5.0-RC1/
> >>> > > >>
> >>> > > >> The release artifacts are signed with the following key:
> >>> > > >> https://dist.apache.org/repos/dist/dev/metron/0.5.0-RC1/KEYS
> >>> > > >>
> >>> > > >> Please vote on releasing this package as Apache Metron 0.5.0-RC1
> >>> > > >>
> >>> > > >> When voting, please list the actions taken to verify the
> release.
> >>> > > >>
> >>> > > >> Recommended build validation and verification instructions are
> >>> posted
> >>> > > >> here:
> >>> > > >> https://cwiki.apache.org/confluence/display/METRON/Verifying
> >>> +Builds
> >>> > > >>
> >>> > > >> This vote will be open until 4pm EDT on Tuesday May 29 2018, to
> >>> > account
> >>> > > >> for
> >>> > > >> the weekend.
> >>> > > >>
> >>> > > >> [ ] +1 Release this package as Apache Metron 0.5.0-RC1
> >>> > > >>
> >>> > > >> [ ] 0 No opinion
> >>> > > >>
> >>> > > >> [ ] -1 Do not release this package because...
> >>> > > >>
> >>> > > >
> >>> > > >
> >>> > >
> >>> > --
> >>> >
> >>> > Jon
> >>> >
> >>> >
> >>>
> >>
> >>
> >
>


Re: [DISCUSS] Field conversions

2018-06-05 Thread Casey Stella
Well, on write it is a transformation, on read it's a translation.  This is
to say that you're providing a mapping on read to translate field names
given the index you're using.  The other approach that I was considering
last night is a field transformation REST call which translates field names
that the UI could call.  So, the UI would pass 'source.type' to the field
translation service and in Solr it'd return source.type and in ES it'd
return source:type.  Underneath the hood the service would use the same
transformation as the writer uses.  That's another way to skin this cat.

Ultimately, I think we should just ditch this field transformation
business, as Laurens said, as long as we have a utility to transform
existing data.
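
For what it's worth, the translation-service idea above could be sketched
roughly as follows. This is illustrative Python only; the names
`translate_field` and `DELIMITERS` and the index type strings are made up for
this example and are not part of Metron's REST API.

```python
# Hypothetical sketch of a field-name translation helper.
# Canonical field names use '.'; each index type maps '.' to its own delimiter.
DELIMITERS = {
    "solr": ".",           # Solr accepts dotted names as-is
    "elasticsearch": ":",  # ES 2.3-era indices store ':' instead of '.'
}


def translate_field(name: str, index_type: str) -> str:
    """Translate a canonical (dotted) field name for the given index type."""
    try:
        delimiter = DELIMITERS[index_type]
    except KeyError:
        raise ValueError(f"unknown index type: {index_type}")
    return name.replace(".", delimiter)
```

The UI would call this once per field it wants to query, so it never has to
know which delimiter the backing index uses.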

On Tue, Jun 5, 2018 at 8:54 AM Ryan Merriman  wrote:

> Having 2 different patterns for configuring field name transformations on
> read vs write is confusing to me.  I agree with both of you that
> normalizing on '.' and not having to do the translation at all would be
> ideal.  Like you both suggested, we would need some utility or script to
> convert preexisting data to match this format.  There could also be some
> adjustments a user would need to make in the UI but I feel like we could
> document around that.  Are there any objections to doing it this way?
>
>
>
> On Mon, Jun 4, 2018 at 4:30 PM, Laurens Vets  wrote:
>
> > ES 2.x support officially ended 4 months ago (
> > https://www.elastic.co/support/eol), so why still support ':' at all? :)
> > Additionally, 2.x isn't even supported at all on the last 2 Ubuntu LTS
> > releases (16.04 & 18.04).
> >
> > Therefore, move everything to use '.' and provide a conversion/upgrade
> > script to change ':' to '.'?
> >
> >
> > On 2018-06-04 13:55, Ryan Merriman wrote:
> >
> >> We've been dealing with a recurring challenge in Metron.  It is common
> >> for various fields to contain '.' characters for the purpose of making
> >> them more readable, namespacing, etc.  At one point we only supported
> >> Elasticsearch 2.3 which did not allow dots and forced us to use ':'
> >> instead.  This limitation does not exist in later versions of
> >> Elasticsearch or Solr.
> >>
> >> Now we're in a situation where we need to allow a user to use either one
> >> because they may still be using ES 2.3 or have data with ':' characters
> >> in field names.  We've attempted to make this configurable in a couple
> >> different PRs:
> >>
> >> https://github.com/apache/metron/pull/1022
> >> https://github.com/apache/metron/pull/1010
> >> https://github.com/apache/metron/pull/1038
> >>
> >> The approaches taken in these are not consistent and fall short in
> >> different ways.  The first (METRON-1569 Allow user to change field name
> >> conversion when indexing) only applies to indexing and not querying.
> >> The others only apply to a single field which does not scale well.  Now
> >> we have an issue with another field in
> >> https://issues.apache.org/jira/browse/METRON-1600.  Rather than
> >> continuing with a patchwork of different fixes I want to attempt to
> >> design a system-wide solution.
> >>
> >> My first thought is to expand https://github.com/apache/metron/pull/1022
> >> to apply globally.  However this is not trivial and would require
> >> significant changes.  It would also make
> >> https://github.com/apache/metron/pull/1010 obsolete and we might end up
> >> having to revert all of it.
> >>
> >> Does anyone have any ideas or opinions?  I am still researching
> >> solutions but would love some guidance from the community.
> >>
> >
>


Re: [DISCUSS] Field conversions

2018-06-05 Thread Casey Stella
To be clear, I'm not even suggesting that we create any tooling here.  I'd
say just a reference to the ES docs and a call-out in Upgrading.md would
suffice as long as we have some strong reason to believe it'll work.  As
far as I'm concerned, the sooner we're out of the business of transforming
fields, the better.

On Tue, Jun 5, 2018 at 9:49 AM Justin Leet  wrote:

> ES does have some docs around how this gets handled in upgrades:
>
> https://www.elastic.co/guide/en/elasticsearch/reference/2.4/dots-in-names.html
>
> Might be worth taking a look to see what conflicts we'd have going from 2.x
> to 5.x and figuring out where to go from there.
>
> On Tue, Jun 5, 2018 at 9:46 AM, Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
> > I guess in principle you could use
> > https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-change-name
> > to reindex with the new fields. It wouldn't be hard to script up a bit of
> > python to help users out with that, or of course to leave that as an
> > exercise to the reader. It would be nice to have a script that read and
> > transformed fields for templates and indices to replace the colons with
> > dots in ES.
> >
> > Simon
> >
> > On 5 June 2018 at 06:40, Casey Stella  wrote:
> >
> > > +1 to that, Simon.  Do we have a sense of if there are utilities
> > > provided by ES to do this kind of migration transformation easily?
> > >
> > > On Tue, Jun 5, 2018 at 9:37 AM Simon Elliston Ball <
> > > si...@simonellistonball.com> wrote:
> > >
> > > > I would definitely agree that the transformation should be removed.
> > > > We have now however added a complex generic solution in the backend,
> > > > which is going to be noop for most people. This was done I believe for
> > > > the sake of backward compatibility. I would argue however, that there
> > > > is no need to support ES 2.3, and therefore no need to support
> > > > de-dotting transformations. This does seem somewhat over-engineered to
> > > > me, though it does save people re-indexing on upgrades. I suspect in
> > > > reality that this is a rare edge case, and that we would do far better
> > > > to settle on one solution (the dotted version, not the colons, to my
> > > > mind)
> > > >
> > > > Simon
> > > >
> > > > On 5 June 2018 at 06:29, Ryan Merriman  wrote:
> > > >
> > > > > I agree completely.  I will leave this thread open for a day or two
> > > > > to give others a chance to weigh in.  If no one opposes, I will
> > > > > create Jiras for removing field transformations and transforming
> > > > > existing data.
> > > > >
> > > > > On Tue, Jun 5, 2018 at 8:21 AM, Casey Stella 
> > > wrote:
> > > > >
> > > > > > Well, on write it is a transformation, on read it's a translation.
> > > > > > This is to say that you're providing a mapping on read to translate
> > > > > > field names given the index you're using.  The other approach that
> > > > > > I was considering last night is a field transformation REST call
> > > > > > which translates field names that the UI could call.  So, the UI
> > > > > > would pass 'source.type' to the field translation service and in
> > > > > > Solr it'd return source.type and in ES it'd return source:type.
> > > > > > Underneath the hood the service would use the same transformation
> > > > > > as the writer uses.  That's another way to skin this cat.
> > > > > >
> > > > > > Ultimately, I think we should just ditch this field transformation
> > > > > > business, as Laurens said, as long as we have a utility to
> > > > > > transform existing data.
> > > > > >
> > > > > > On Tue, Jun 5, 2018 at 8:54 AM Ryan Merriman <
> merrim...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Having 2 different patterns for configuring field name
> > > > transformations
> > > > > on
> > > > > > > read vs write is 

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Casey Stella
The problem, as you correctly diagnosed, is the key in HBase.  We construct
the key very specifically in Metron, so it's unlikely to work out of the
box with the NiFi processor unfortunately.  The key that we use is formed
here in the codebase:
https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51

To put that in English, consider the following:

   - type - The enrichment type
   - indicator - the indicator to use
   - hash(*) - A murmur 3 128bit hash function

the key is hash(indicator) + type + indicator

This hash prefixing is a standard practice in HBase key design that allows
the keys to be uniformly distributed among the regions and prevents
hotspotting.  Depending on how the PutHBaseJSON processor works, if you can
construct the key and pass it in, then you might be able to either
construct the key in NiFi or write a processor to construct the key.
Ultimately though, what Carolyn said is true: the easiest approach is
probably using the flatfile loader.
If you do get this working in NiFi, however, do please let us know and/or
consider contributing it back to the project as a PR :)
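
To make the key layout a bit more concrete, here is a rough Python sketch of
hash(indicator) + type + indicator. Two loud caveats: `placeholder_hash` is an
MD5 stand-in that merely produces 16 bytes and is NOT the murmur 3 128bit hash
Metron uses, and the 2-byte length prefix in front of the type and indicator is
only my reading of the `\x00\x05whois` bytes in the scan output quoted below.
Check EnrichmentKey.java before relying on either detail; all function names
here are invented for illustration.

```python
import hashlib
import struct
from typing import Callable


def placeholder_hash(data: bytes) -> bytes:
    """16-byte stand-in for the murmur3 128-bit hash (NOT the real hash)."""
    return hashlib.md5(data).digest()


def length_prefixed(text: str) -> bytes:
    """Assumed encoding: 2-byte big-endian length, then the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def enrichment_row_key(
    enrichment_type: str,
    indicator: str,
    hash_fn: Callable[[bytes], bytes] = placeholder_hash,
) -> bytes:
    """Build hash(indicator) + type + indicator as a single byte string."""
    prefix = hash_fn(indicator.encode("utf-8"))
    return prefix + length_prefixed(enrichment_type) + length_prefixed(indicator)
```

Passing a real murmur3 implementation as `hash_fn` would be required before
the output could match what flatfile_loader.sh writes.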



On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt 
wrote:

> Hello,
>
> I work as a Dev/Ops Data Engineer within the security team at a company in
> London where we are in the process of implementing Metron. I have been
> tasked with implementing feeds of network environment data into HBase so
> that this data can be used as enrichment sources for our security events.
> First-off I wanted to pull in DNS data for an internal domain.
>
> I am assuming that I need to write data into HBase in such a way that it
> exactly matches what I would get from the flatfile_loader.sh script. A
> colleague of mine has already loaded some DNS data using that script, so I
> am using that as a reference.
>
> I have implemented a flow in NiFi which takes JSON data from a HTTP
> listener and routes it to a PutHBaseJSON processor. The flow is working, in
> the sense that data is successfully written to HBase, but despite (naively)
> specifying "Row Identifier Encoding Strategy = Binary", the results in
> HBase don't look correct. Comparing the output from HBase scan commands I
> see:
>
> flatfile_loader.sh produced:
>
> ROW:
> \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> CELL: column=data:v, timestamp=1516896203840,
> value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>
> PutHBaseJSON produced:
>
> ROW:  server.domain.local
> CELL: column=dns:v, timestamp=1527778603783,
> value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>
> From source JSON:
>
>
> {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>
> I know that there are some differences in column family / field names, but
> my worry is the ROW id. Presumably I need to encode my row key, "k" in the
> JSON data, in a way that matches how the flatfile_loader.sh script did it.
>
> Can anyone explain how I might convert my Id to the correct format?
> -or-
> Does this matter? Can Metron use the human-readable ROW ids?
>
> Charlie Joynt
>
> --
> G-RESEARCH believes the information provided herein is reliable. While
> every care has been taken to ensure accuracy, the information is furnished
> to the recipients with no warranty as to the completeness and accuracy of
> its contents and on condition that any errors or omissions shall not be
> made the basis of any claim, demand or cause of action.
> The information in this email is intended only for the named recipient.
> If you are not the intended recipient please notify us immediately and do
> not copy, distribute or take action based on this e-mail.
> All messages sent to and from this e-mail address will be logged by
> G-RESEARCH and are subject to archival storage, monitoring, review and
> disclosure.
> G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
> Whittington House, 19-30 Alfred Place, London WC1E 7EA.
> Trenchant Limited is a company registered in England with company number
> 08127121.
> --
>


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Casey Stella
I'd be in strong support of that, Simon.  I think we should have some other
NiFi components in Metron to enable users to interact with our
infrastructure from NiFi (e.g. being able to transform via stellar, etc).

On Tue, Jun 5, 2018 at 10:32 AM Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Do we, the community, think it would be a good idea to create a
> PutMetronEnrichment NiFi processor for this use case? It seems a number of
> people want to use NiFi to manage and schedule loading of enrichments for
> example.
>
> Simon
>
> On 5 June 2018 at 06:56, Casey Stella  wrote:
>
> > The problem, as you correctly diagnosed, is the key in HBase.  We
> > construct the key very specifically in Metron, so it's unlikely to work
> > out of the box with the NiFi processor unfortunately.  The key that we
> > use is formed here in the codebase:
> > https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> >
> > To put that in English, consider the following:
> >
> >    - type - The enrichment type
> >    - indicator - the indicator to use
> >    - hash(*) - A murmur 3 128bit hash function
> >
> > the key is hash(indicator) + type + indicator
> >
> > This hash prefixing is a standard practice in HBase key design that
> > allows the keys to be uniformly distributed among the regions and
> > prevents hotspotting.  Depending on how the PutHBaseJSON processor works,
> > if you can construct the key and pass it in, then you might be able to
> > either construct the key in NiFi or write a processor to construct the
> > key.  Ultimately though, what Carolyn said is true: the easiest approach
> > is probably using the flatfile loader.
> > If you do get this working in NiFi, however, do please let us know and/or
> > consider contributing it back to the project as a PR :)
> >
> >
> >
> > On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> > charles.jo...@gresearch.co.uk>
> > wrote:
> >
> > > Hello,
> > >
> > > I work as a Dev/Ops Data Engineer within the security team at a company
> > > in London where we are in the process of implementing Metron. I have
> > > been tasked with implementing feeds of network environment data into
> > > HBase so that this data can be used as enrichment sources for our
> > > security events. First-off I wanted to pull in DNS data for an internal
> > > domain.
> > >
> > > I am assuming that I need to write data into HBase in such a way that it
> > > exactly matches what I would get from the flatfile_loader.sh script. A
> > > colleague of mine has already loaded some DNS data using that script, so
> > > I am using that as a reference.
> > >
> > > I have implemented a flow in NiFi which takes JSON data from a HTTP
> > > listener and routes it to a PutHBaseJSON processor. The flow is working,
> > > in the sense that data is successfully written to HBase, but despite
> > > (naively) specifying "Row Identifier Encoding Strategy = Binary", the
> > > results in HBase don't look correct. Comparing the output from HBase
> > > scan commands I see:
> > >
> > > flatfile_loader.sh produced:
> > >
> > > ROW:
> > > \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> > > CELL: column=data:v, timestamp=1516896203840,
> > > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> > >
> > > PutHBaseJSON produced:
> > >
> > > ROW:  server.domain.local
> > > CELL: column=dns:v, timestamp=1527778603783,
> > > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> > >
> > > From source JSON:
> > >
> > >
> > > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
> > >
> > > I know that there are some differences in column family / field names,
> > > but my worry is the ROW id. Presumably I need to encode my row key, "k"
> > > in the JSON data, in a way that matches how the flatfile_loader.sh
> > > script did it.
> > >
> > > Can anyone explain how 

Re: [DISCUSS] Field conversions

2018-06-05 Thread Casey Stella
Agreed, we should definitely have a clear picture about how to do that,
maybe even a worked example in the use-cases that we can reference.  I'm
just saying we don't need to migrate ES docs into Metron, but rather
reference them as much as we possibly can.
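
As a starting point for such a worked example, here is a rough Python sketch
of the two checks a user would want before reindexing: rename ':' to '.' in
field names, and spot names that would clash once dots are treated as object
paths (a field cannot be both a leaf value and the parent of another field).
The helper names `rename_fields` and `find_conflicts` are invented for this
illustration and nothing here talks to Elasticsearch; the actual reindexing
would still go through the ES docs referenced in this thread.

```python
from typing import Any, Dict, List


def rename_fields(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of doc with ':' in top-level field names replaced by '.'."""
    renamed: Dict[str, Any] = {}
    for name, value in doc.items():
        new_name = name.replace(":", ".")
        if new_name in renamed:
            raise ValueError(f"field name collision after renaming: {new_name}")
        renamed[new_name] = value
    return renamed


def find_conflicts(field_names: List[str]) -> List[str]:
    """List names that are both a field and a dotted prefix of another field."""
    names = set(field_names)
    conflicts = set()
    for name in names:
        parts = name.split(".")
        # Check every proper dotted prefix, e.g. 'a' and 'a.b' for 'a.b.c'.
        for i in range(1, len(parts)):
            parent = ".".join(parts[:i])
            if parent in names:
                conflicts.add(parent)
    return sorted(conflicts)
```

Running `find_conflicts` over the renamed field names of an index template
would flag the cases that need manual attention before the upgrade.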

On Tue, Jun 5, 2018 at 11:38 AM Otto Fowler  wrote:

> It is still our user list and dev list that will have the burden of
> talking folks through that.
>
>
> On June 5, 2018 at 09:58:32, Casey Stella (ceste...@gmail.com) wrote:
>
> To be clear, I'm not even suggesting that we create any tooling here. I'd
> say just a reference to the ES docs and a call-out in Upgrading.md would
> suffice as long as we have some strong reason to believe it'll work. As
> far as I'm concerned, the sooner we're out of the business of transforming
> fields, the better.
>
> On Tue, Jun 5, 2018 at 9:49 AM Justin Leet  wrote:
>
> > ES does have some docs around how this gets handled in upgrades:
> >
> > https://www.elastic.co/guide/en/elasticsearch/reference/2.4/dots-in-names.html
> >
> > Might be worth taking a look to see what conflicts we'd have going from
> > 2.x to 5.x and figuring out where to go from there.
> >
> > On Tue, Jun 5, 2018 at 9:46 AM, Simon Elliston Ball <
> > si...@simonellistonball.com> wrote:
> >
> > > I guess in principle you could use
> > > https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-change-name
> > > to reindex with the new fields. It wouldn't be hard to script up a bit
> > > of python to help users out with that, or of course to leave that as an
> > > exercise to the reader. It would be nice to have a script that read and
> > > transformed fields for templates and indices to replace the colons with
> > > dots in ES.
> > >
> > > Simon
> > >
> > > On 5 June 2018 at 06:40, Casey Stella  wrote:
> > >
> > > > +1 to that, Simon. Do we have a sense of if there are utilities
> > > > provided by ES to do this kind of migration transformation easily?
> > > >
> > > > On Tue, Jun 5, 2018 at 9:37 AM Simon Elliston Ball <
> > > > si...@simonellistonball.com> wrote:
> > > >
> > > > > I would definitely agree that the transformation should be removed.
> > > > > We have now however added a complex generic solution in the backend,
> > > > > which is going to be noop for most people. This was done I believe
> > > > > for the sake of backward compatibility. I would argue however, that
> > > > > there is no need to support ES 2.3, and therefore no need to support
> > > > > de-dotting transformations. This does seem somewhat over-engineered
> > > > > to me, though it does save people re-indexing on upgrades. I suspect
> > > > > in reality that this is a rare edge case, and that we would do far
> > > > > better to settle on one solution (the dotted version, not the colons,
> > > > > to my mind)
> > > > >
> > > > > Simon
> > > > >
> > > > > On 5 June 2018 at 06:29, Ryan Merriman 
> wrote:
> > > > >
> > > > > > I agree completely. I will leave this thread open for a day or two
> > > > > > to give others a chance to weigh in. If no one opposes, I will
> > > > > > create Jiras for removing field transformations and transforming
> > > > > > existing data.
> > > > > >
> > > > > > On Tue, Jun 5, 2018 at 8:21 AM, Casey Stella 
>
> > > > wrote:
> > > > > >
> > > > > > > Well, on write it is a transformation, on read it's a translation.
> > > > > > > This is to say that you're providing a mapping on read to
> > > > > > > translate field names given the index you're using. The other
> > > > > > > approach that I was considering last night is a field
> > > > > > > transformation REST call which translates field names that the UI
> > > > > > > could call. So, the UI would pass 'source.type' to the field
> > > > > > > translation service and in 

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

2018-06-05 Thread Casey Stella
I agree with Simon here, the benefit of providing NiFi tooling is to enable
NiFi to use our infrastructure (e.g. our parsers, MaaS, stellar
enrichments, etc).  This would tie it to Metron pretty closely.

On Tue, Jun 5, 2018 at 3:12 PM Otto Fowler  wrote:

> NiFi releases more often than Metron does, that might be an issue.
>
>
> On June 5, 2018 at 14:07:22, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> To be honest, I would expect this to be heavily linked to the Metron
> releases, since it's going to use other metron classes and dependencies to
> ensure compatibility. For example, a Stellar NiFi processor will be linked
> to Metron's stellar-common, the enrichment loader will depend on key
> construction code from metron-enrichment (and should align to it). I was
> also considering an opinionated PublishMetron which linked to the Metron
> kafka, and hid some of the dances you have to do to make the readMetadata
> functions to work (i.e. some sugar around our mild abuse of kafka keys,
> which prevents people hurting their kafka by choosing the wrong
> partitioner).
>
> To that extent, I think the releases belong with Metron releases, though of
> course that does increase our release and test burden.
>
> On 5 June 2018 at 10:55, Otto Fowler  wrote:
>
> > Similar to Bro, we may need to release out of cycle.
> >
> >
> >
> > On June 5, 2018 at 13:17:55, Simon Elliston Ball (
> > si...@simonellistonball.com) wrote:
> >
> > Do you mean in the sense of a separate module, or are you suggesting we
> > go as far as a sub-project?
> >
> > On 5 June 2018 at 10:08, Otto Fowler  wrote:
> >
> > > If we do that, we should have it as a separate component maybe.
> > >
> > >
> > > On June 5, 2018 at 12:42:57, Simon Elliston Ball (
> > > si...@simonellistonball.com) wrote:
> > >
> > > @otto, well, of course we would use the record api... it's great.
> > >
> > > @casey, I have actually written a stellar processor, which applies
> > > stellar to all FlowFile attributes outputting the resulting stellar
> > > variable space to either attributes or as json in the content.
> > >
> > > Is it worth us creating a nifi-metron-bundle? Happy to kick that off,
> > > since I'm half way there.
> > >
> > > Simon
> > >
> > >
> > >
> > > On 5 June 2018 at 08:41, Otto Fowler  wrote:
> > >
> > > > We have jiras about ‘diverting’ and reading from nifi flows already
> > > >
> > > >
> > > > On June 5, 2018 at 11:11:45, Casey Stella (ceste...@gmail.com)
> wrote:
> > > >
> > > > I'd be in strong support of that, Simon. I think we should have some
> > > > other NiFi components in Metron to enable users to interact with our
> > > > infrastructure from NiFi (e.g. being able to transform via stellar,
> > > > etc).
> > > >
> > > > On Tue, Jun 5, 2018 at 10:32 AM Simon Elliston Ball <
> > > > si...@simonellistonball.com> wrote:
> > > >
> > > > > Do we, the community, think it would be a good idea to create a
> > > > > PutMetronEnrichment NiFi processor for this use case? It seems a
> > > > > number of people want to use NiFi to manage and schedule loading of
> > > > > enrichments for example.
> > > > >
> > > > > Simon
> > > > >
> > > > > On 5 June 2018 at 06:56, Casey Stella  wrote:
> > > > >
> > > > > > The problem, as you correctly diagnosed, is the key in HBase. We
> > > > > > construct the key very specifically in Metron, so it's unlikely to
> > > > > > work out of the box with the NiFi processor unfortunately. The key
> > > > > > that we use is formed here in the codebase:
> > > > > > https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> > > > > >
> > > > > > To put that in English, consider the following:
> > > > > >
> > > > > > - type - The enrichment type
> > > > > > - indicator - the indicator to use
> > > > > > - hash(*) - A murmur 3 128bit hash function
> > >

Re: [DISCUSS] Deprecating metron-api

2018-06-29 Thread Casey Stella
I have no objection and would consider it to be a prerequisite to bringing
in the PR unless there's someone depending on it out there.  You might want
to cc user@ as well, to get a broader set of input for the "are people
using it?" question.

On Fri, Jun 29, 2018 at 5:21 PM Ryan Merriman  wrote:

> We are currently working on adding pcap query capabilities to the Alerts UI
> as part of https://issues.apache.org/jira/browse/METRON-1554.  This
> involves exposing pcap endpoints in our REST application which will make
> metron-api obsolete.
>
> Is anyone currently using this module?  Are there any objections to
> deprecating it and removing it from our codebase once this feature branch
> is complete?
>


Re: Architectural reason to split in 4 topologies / impact on the kafka ressources

2018-06-22 Thread Casey Stella
Hey Michel,

Those are good questions and there were some reasons surrounding that.  In
fact, historically, we had fewer topologies (e.g. indexing and enrichment
were merged). Even earlier on, we had just one giant topology per parser
that enriched and indexed.  The long story short is that we moved this way
because we saw how people were using metron and we gained more insight
tuning Metron.  That led us down this architectural path.

Some of the reasons that we went this way:

   - Fewer large topologies were a nightmare to tune
      - Enrichment would have different memory requirements than, say,
      parsers or indexing
      - You can adjust the kafka topic params per topology to adjust the
      number of partitions, etc.
   - Having the separate topologies gives a natural set of extension points
   for customization and enhancement (e.g. you want a phase between parsing
   and enrichment).
   - Decoupling the topologies lets us spin up and down parts of Metron
   without affecting others (e.g. you don't have to take down enrichments to
   add a parser, even for a moment)
   - The movement to Flux meant we were limited in how much we could adjust
   the topology at runtime (e.g. colocating parsers and enrichment would mean
   moving away from flux essentially as the topology changes its structure)

Best,

Casey


On Fri, Jun 22, 2018 at 5:25 PM Michel Sumbul 
wrote:

> Hi Everyone,
>
> I was asking myself what was the architectural reason to split the
> ingestion in Metron into 4 different topologies that all read/write to
> Kafka?
>
> For example, why have the parsing and enrichment topologies not been
> merged? Would it not be possible to enrich the message directly when you
> parse it?
>
> I'm asking because splitting into several topologies means that all of
> the topologies read/write to Kafka, which produces a bigger load on the
> Kafka cluster and then a need for way more infrastructure/servers. The cost
> is especially high when we speak about TBs of data ingested every day.
>
> I'm sure there was a very good reason, I was just curious.
>
> Thanks,
> Michel
>


Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-05 Thread Casey Stella
should the need arise.
> >
> > In summary, my impressions are that at this point the features and level
> > of abstraction feel appropriate to me. I think it buys us 1) learning
> > from a starting typosquatting use case, 2) flexibility to change and
> > adapt it without affecting users, and 3) enough concrete capability to
> > make more specific use cases easy to deliver with a UI.
> >
> > Cheers,
> > Mike
> >
> > On Jan 4, 2018 9:59 AM, "Casey Stella" <ceste...@gmail.com> wrote:
> >
> > > It also occurs to me that even in this situation, it's not a sufficient
> > > generalization for just Bloom, but this is a bloom filter of the output
> > > of all the typosquatted domains for the domain in each row. If we
> > > wanted to hard code, we'd have to hard code specifically the bloom
> > > filter *for* the typosquatting use-case. Hard coding this would prevent
> > > things like bloom filters containing malicious IPs from a reference
> > > source, for instance.
> > >
> > > On Thu, Jan 4, 2018 at 10:46 AM, Casey Stella <ceste...@gmail.com>
> > wrote:
> > >
> > > > So, there is value outside of just bloom usage. The most specific
> > > > example of this would be in order to configure a bloom filter, we need
> > > > to know at least an upper bound of the number of items that are going
> > > > to be added to the bloom filter. In order to do that, we need to count
> > > > the number of typosquatted domains. Specifically at
> > > > https://github.com/cestella/incubator-metron/tree/typosquat_merge/use-cases/typosquat_detection#configure-the-bloom-filter
> > > > you can see how we use the CONSOLE writer with an extractor config to
> > > > count the number of typosquatted domains in the alexa top 10k dataset
> > > > so we can size the filter appropriately.
> > > >
> > > > I'd argue that other types of probabilistic data structures could
> > > > also make sense here as well, like statistical sketches. Consider,
> > > > for instance, a cheap and dirty DGA indicator where we take the Alexa
> > > > top 1M and look at the distribution of shannon entropy in the
> > > > domains. If the shannon entropy of a domain going across metron is
> > > > more than 5 std devs from the mean, that could be circumstantial
> > > > evidence of a malicious attack. This would yield a lot of false
> > > > positives, but used in conjunction with other indicators it could be
> > > > valuable.
> > > >
> > > > Computing that would be as follows:
> > > >
> > > > {
> > > >   "config" : {
> > > >     "columns" : {
> > > >       "rank" : 0,
> > > >       "domain" : 1
> > > >     },
> > > >     "value_transform" : {
> > > >       "domain" : "DOMAIN_REMOVE_TLD(domain)"
> > > >     },
> > > >     "value_filter" : "LENGTH(domain) > 0",
> > > >     "state_init" : "STATS_INIT()",
> > > >     "state_update" : {
> > > >       "state" : "STATS_ADD(state, STRING_ENTROPY(domain))"
> > > >     },
> > > >     "state_merge" : "STATS_MERGE(states)",
> > > >     "separator" : ","
> > > >   },
> > > >   "extractor" : "CSV"
> > > > }
> > > >
> > > > Also, for another example, imagine a situation where we have a
> > > > SPARK_SQL engine rather than just LOCAL for summarizing. We could
> > > > create a general summary of URL lengths in bro data which could be
> > > > used for determining if someone is trying to send in very large URLs
> > > > maliciously (see Jon Zeolla's concerns in
> > > > https://issues.apache.org/jira/browse/METRON-517 for a discussion of
> > > > this). In order to do that, we could simply execute:
> > > >
> > > > $METRON_HOME/bin/flatfile_summarizer.sh -i "select uri from bro" \
> > > >   -o /tmp/reference/bro_uri_distribution.ser \
> > > >   -e ~/uri_length_extractor.json -p 5 -om HDFS -m SPARK_SQL
> > > >
> > > > with uri_length_extractor.json containing:
> >

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-05 Thread Casey Stella
Well, you can pull the default configs from global configs, but you might
want to override them (similar to the profiler).  For instance, you might
want to interact with another hbase table than the one globally configured.
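
Roughly, the override behaviour I have in mind looks like the Python sketch
below. It is only an illustration: `effective_repo_config` is a made-up
helper, and the table and column family values in the usage note are invented.
Only the 'hbase.table' and 'hbase.cf' keys come from the proposal quoted
below.

```python
from typing import Any, Dict, Optional


def effective_repo_config(
    global_config: Dict[str, Any],
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Start from the global defaults and let per-call settings win."""
    merged = dict(global_config)    # copy so the global config is not mutated
    merged.update(overrides or {})  # per-call repo_config takes precedence
    return merged
```

So a call that passes { 'hbase.table' : 'other_table' } would read from
'other_table' while still inheriting the globally configured column family.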

On Fri, Jan 5, 2018 at 12:04 PM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> I would imagine the ‘stellar-object-repo’ would be part of the global
> configuration or configuration passed to the command.
> why specify in the function itself?
>
>
>
>
> On January 5, 2018 at 11:22:32, Casey Stella (ceste...@gmail.com) wrote:
>
> I like that, specifically the repositories abstraction. Perhaps we can
> construct some longer term JIRAs for extensions.
> For the current state of affairs (wrt the OBJECT_GET call) I was
> imagining the simple default HDFS solution as a first cut and
> following on adding a repository name (e.g. OBJECT_GET(path, repo_name))
> with repo_name being optional and defaulting to HDFS
> for backwards compatibility.
>
> In effect, this would be the next step that I'm proposing:
> OBJECT_GET(paths, repo_name, repo_config), which would be backwards
> compatible
>
> - paths - a single path or a list of paths (if a list, then a list of
> objects returned)
> - repo_name - optional name for repo, defaulted to HDFS if we don't
> specify
> - repo_config - optional config map
>
>
> This would open things like:
>
> - OBJECT_GET('key', 'HBASE', { 'hbase.table' : 'table', 'hbase.cf' :
> 'cf'} ) -- pulling from HBase
>
> Eventually we might also be able to fold ENRICHMENT_GET as just a special
> repo instance.
>
> On Fri, Jan 5, 2018 at 10:26 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > If we separate the concerns as I have stated previously:
> >
> > 1. Stellar can load objects into ‘caches’ from some repository and refer to
> > them.
> > 2. The repositories
> > 3. Some number of strategies to populate and possibly update the
> > repository, from spark,
> > to MR jobs to whatever you would classify the flat file stuff as.
> > 4. Let the Stellar API for everything but LOAD() follow after we get usage
> >
> > Then the particulars of ‘3’ are less important.
> >
> >
> >
> > On January 5, 2018 at 09:02:41, Justin Leet (justinjl...@gmail.com) wrote:
> >
> > I agree with the general sentiment that we can tailor specific use cases
> > via UI, and I'm worried that the use case specific solution (particularly
> > in light of the note that it's not even general to the class of bloom
> > filter problems, let alone an actually general problem) becomes more work
> > than this as soon as about 2 more use cases actually get realized.
> > Pushing that to the UI lets people solve a variety of problems if they
> > really want to dig in, while still giving flexibility to provide a more
> > tailored experience for what we discover the 80% cases are in practice.
> >
> > Keeping in mind I am mostly unfamiliar with the extractor config itself, I
> > am wondering if it makes sense to split up the config a bit. While a lot
> > of implementation details are shared, maybe the extractor config itself
> > should be refactored into a couple parts analogous to ETL (as a follow on
> > task, I think if this is true, it predates Casey's proposed change). It
> > doesn't necessarily make it less complex, but it might make it more easily
> > digestible if it's split up by idea (parsing, transformation, etc.).
> >
> > Re: Mike's point, I don't think we want the actual processing broken up as
> > ETL, but the representation to the user in terms of configuration could be
> > similar (since we're already doing parsing and transformation). We don't
> > have to implement it as an ETL pipeline, but it does potentially offer the
> > user a way to quickly grasp what the JSON blob is actually specifying.
> > Making it easy to understand, even if it's not the ideal way to interact, is
> > potentially still a win.
> >
> > On Thu, Jan 4, 2018 at 1:28 PM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > I mentioned this earlier, but I'll reiterate that I think this approach
> > > gives us the ability to make specific use cases via a UI, or other
> > > interface should we choose to add one, while keeping the core adaptable and
> > > flexible. This is ideal for middle tier as I think this effectively gives
> > > us the ability to pivot to other use cases very easily while not being so
> > > generic as to be useless. The fact that you were able to create this

[DISCUSS] Generating and Interacting with serialized summary objects

2017-12-24 Thread Casey Stella
Hi all,

I wanted to get some feedback on a sensible plan for something.  It
occurred to me the other day when considering the use-case of detecting
typosquatted domains, that one approach was to generate the set of
typosquatted domains for some set of reference domains and compare domains
as they flow through.

One way we could do this would be to generate this data and import the
typosquatted domains into HBase.  I thought, however, that another approach
which may trade-off accuracy to remove the network hop and potential disk
seek by constructing a bloom filter that includes the set of typosquatted
domains.
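
To make the accuracy trade-off concrete, here is a minimal bloom filter sketch
in Python. It is an illustration under stated assumptions, not the bloom
filter that Stellar or Metron uses: the class name, the SHA-256 double-hashing
scheme, and the sample domains are all made up for the example. Only the
sizing formulas are the standard textbook ones.

```python
import hashlib
import math


class BloomFilter:
    """Tiny illustrative bloom filter (assumption: not Metron's implementation)."""

    def __init__(self, expected_items, false_positive_rate):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so steps vary
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical typosquatted variants of a reference domain.
typosquat_filter = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for variant in ["gooogle", "goggle", "googel"]:
    typosquat_filter.add(variant)
```

A lookup is a handful of in-memory bit tests with no network hop or disk seek.
The cost is that might_contain can return a false positive at roughly the
configured rate, though it never returns a false negative for an added item.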

The challenge was that we don't have a way to do this currently.  We do,
however, have a loading infrastructure (e.g. the flatfile_loader) and
configuration (see
https://github.com/apache/metron/tree/master/metron-platform/metron-data-management#common-extractor-properties)
which handles:

   - parsing flat files
   - transforming the rows
   - filtering the rows

To enable the new use-case of generating a summary object (e.g. a bloom
filter), in METRON-1378 (https://github.com/apache/metron/pull/879) I
propose that we create a new utility that uses the same extractor config
and adds the ability to:

   - initialize a state object
   - update the object for every row
   - merge the state objects (in the case of multiple threads, in the case
   of one thread it's not needed).
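
The three hooks above can be sketched as follows. This is a Python
illustration of the init/update/merge pattern only, not the summarizer in the
PR: the function name summarize, the round-robin split, and the sample rows
are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize(rows, state_init, state_update, state_merge, num_threads=1):
    """Fold rows into one state: init per worker, update per row, merge at the end."""
    def run_worker(chunk):
        state = state_init()                  # initialize a state object
        for row in chunk:
            state = state_update(state, row)  # update the object for every row
        return state

    # Round-robin split so each worker folds its own slice of the rows.
    chunks = [rows[i::num_threads] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        states = list(pool.map(run_worker, chunks))
    # With a single thread there is only one state, so no merge is needed.
    return states[0] if len(states) == 1 else state_merge(states)


# Hypothetical rows; the "summary object" here is just the set of distinct values.
rows = ["a.com", "b.com", "a.com", "c.org"]
distinct = summarize(
    rows,
    state_init=set,
    state_update=lambda state, row: state | {row},
    state_merge=lambda states: set().union(*states),
    num_threads=2,
)
```

The merge step is what makes multiple threads safe: each worker owns its state
and nothing is shared until the final merge.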

I think this is a sensible decision because:

   - It's a minimal movement from the flat file loader
      - Uses the same configs
      - Abstracts and reuses the existing infrastructure
   - Having one extractor config means that it should be easier to generate
   a UI around this to simplify the experience

All that being said, our extractor config is... shall we say... daunting :).
I am sensitive to the fact that this adds to an already difficult config.
I propose that this is an initial step forward to support the use-case and
we can enable something more composable going forward.  My concern in
considering this as the first step was that it felt that the composable
units for data transformation and manipulation suddenly take us into a
place where Stellar starts to look like Pig or the Spark RDD API.  I wasn't
ready for that without a lot more discussion.

To summarize, what I'd like to get from the community is, after reviewing
the entire use-case at
https://github.com/cestella/incubator-metron/tree/typosquat_merge/use-cases/typosquat_detection
:

   - Is this so confusing that it does not belong in Metron even as a
   first-step?
   - Is there a way to extend the extractor config in a less confusing way
   to enable this?

I apologize for making the discuss thread *after* the JIRAs, but I felt
this one might bear having some working code to consider.


Re: [DISCUSS] Generating and Interacting with serialized summary objects

2017-12-24 Thread Casey Stella
I'll start this discussion off with my idea around a 2nd step that is more
adaptable.  I propose the following set of stellar functions backed by
Spark in the metron-management project:

   - CSV_PARSE(location, separator?, columns?) : Constructs a Spark
   Dataframe for reading the flatfile
   - SQL_TRANSFORM(dataframe, spark sql statement): Transforms the dataframe
   - SUMMARIZE(state_init, state_update, state_merge): Summarize the
   dataframe using the lambda functions:
      - state_init - executed once per worker to initialize the state
      - state_update - executed once per row
      - state_merge - Merge the worker states into one worker state
   - OBJECT_SAVE(obj, output_path) : Save the object obj to the path
   output_path on HDFS.

This would enable more flexibility and composability than the
configuration-based approach that we have in the flatfile loader.
My concern with this approach, and the reason I didn't do it initially, was
that I think that users will want at least 2 ways to summarize data (or
load data):

   - A configuration based approach, which enables a UI
   - A set of stellar functions via the scriptable REPL

I would argue that both have a place and I started with the configuration
based approach as it was a more natural extension of what we already had.
I'd love to hear thoughts about this idea too.


On Sun, Dec 24, 2017 at 8:20 PM, Casey Stella <ceste...@gmail.com> wrote:

> Hi all,
>
> I wanted to get some feedback on a sensible plan for something.  It
> occurred to me the other day when considering the use-case of detecting
> typosquatted domains, that one approach was to generate the set of
> typosquatted domains for some set of reference domains and compare domains
> as they flow through.
>
> One way we could do this would be to generate this data and import the
> typosquatted domains into HBase.  I thought, however, that another approach,
> which may trade off accuracy to remove the network hop and potential disk
> seek, would be to construct a bloom filter that includes the set of typosquatted
> domains.
>
> The challenge was that we don't have a way to do this currently.  We do,
> however, have a loading infrastructure (e.g. the flatfile_loader) and
> configuration (see
> https://github.com/apache/metron/tree/master/metron-platform/metron-data-management#common-extractor-properties)
> which handles:
>
>- parsing flat files
>- transforming the rows
>- filtering the rows
>
> To enable the new use-case of generating a summary object (e.g. a bloom
> filter), in METRON-1378 (https://github.com/apache/metron/pull/879) I
> propose that we create a new utility that uses the same extractor config
> and adds the ability to:
>
>- initialize a state object
>- update the object for every row
>- merge the state objects (in the case of multiple threads, in the
>case of one thread it's not needed).
>
> I think this is a sensible decision because:
>
>- It's a minimal movement from the flat file loader
>   - Uses the same configs
>   - Abstracts and reuses the existing infrastructure
>- Having one extractor config means that it should be easier to
>generate a UI around this to simplify the experience
>
> All that being said, our extractor config is..shall we say...daunting :).
> I am sensitive to the fact that this adds to an existing difficult config.
> I propose that this is an initial step forward to support the use-case and
> we can enable something more composable going forward.  My concern in
> considering this as the first step was that it felt that the composable
> units for data transformation and manipulation suddenly takes us into a
> place where Stellar starts to look like Pig or Spark RDD API.  I wasn't
> ready for that without a lot more discussion.
>
> To summarize, what I'd like to get from the community is, after reviewing
> the entire use-case at
> https://github.com/cestella/incubator-metron/tree/typosquat_merge/use-cases/typosquat_detection
> :
>
>- Is this so confusing that it does not belong in Metron even as a
>first-step?
>- Is there a way to extend the extractor config in a less confusing
>way to enable this?
>
> I apologize for making the discuss thread *after* the JIRAs, but I felt
> this one might bear having some working code to consider.
>


Re: [DISCUSS] Generating and Interacting with serialized summary objects

2017-12-24 Thread Casey Stella
Oh, one more thing, while the example here is around typosquatting, this is
of use outside of that.  Pretty much any large existence-style query can be
enabled via this construction (create a summary bloom filter).  There are
other use-cases involving other data structures too.

On Sun, Dec 24, 2017 at 8:20 PM, Casey Stella <ceste...@gmail.com> wrote:

> Hi all,
>
> I wanted to get some feedback on a sensible plan for something.  It
> occurred to me the other day when considering the use-case of detecting
> typosquatted domains, that one approach was to generate the set of
> typosquatted domains for some set of reference domains and compare domains
> as they flow through.
>
> One way we could do this would be to generate this data and import the
> typosquatted domains into HBase.  I thought, however, that another approach,
> which may trade off accuracy to remove the network hop and potential disk
> seek, would be to construct a bloom filter that includes the set of typosquatted
> domains.
>
> The challenge was that we don't have a way to do this currently.  We do,
> however, have a loading infrastructure (e.g. the flatfile_loader) and
> configuration (see
> https://github.com/apache/metron/tree/master/metron-platform/metron-data-management#common-extractor-properties)
> which handles:
>
>- parsing flat files
>- transforming the rows
>- filtering the rows
>
> To enable the new use-case of generating a summary object (e.g. a bloom
> filter), in METRON-1378 (https://github.com/apache/metron/pull/879) I
> propose that we create a new utility that uses the same extractor config
> and adds the ability to:
>
>- initialize a state object
>- update the object for every row
>- merge the state objects (in the case of multiple threads, in the
>case of one thread it's not needed).
>
> I think this is a sensible decision because:
>
>- It's a minimal movement from the flat file loader
>   - Uses the same configs
>   - Abstracts and reuses the existing infrastructure
>- Having one extractor config means that it should be easier to
>generate a UI around this to simplify the experience
>
> All that being said, our extractor config is..shall we say...daunting :).
> I am sensitive to the fact that this adds to an existing difficult config.
> I propose that this is an initial step forward to support the use-case and
> we can enable something more composable going forward.  My concern in
> considering this as the first step was that it felt that the composable
> units for data transformation and manipulation suddenly takes us into a
> place where Stellar starts to look like Pig or Spark RDD API.  I wasn't
> ready for that without a lot more discussion.
>
> To summarize, what I'd like to get from the community is, after reviewing
> the entire use-case at
> https://github.com/cestella/incubator-metron/tree/typosquat_merge/use-cases/typosquat_detection
> :
>
>- Is this so confusing that it does not belong in Metron even as a
>first-step?
>- Is there a way to extend the extractor config in a less confusing
>way to enable this?
>
> I apologize for making the discuss thread *after* the JIRAs, but I felt
> this one might bear having some working code to consider.
>


Anand is a new Committer!

2018-01-11 Thread Casey Stella
The Project Management Committee (PMC) for Apache Metron has invited Anand
Subramanian to become a committer and we are pleased to announce that they
have accepted.

Congratulations and welcome, Anand!


Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-04 Thread Casey Stella
 happen in this context, unless
> >>>  you’re talking about pushing to something like livy for example (eminently
> >>>  sensible for things like cross instance caching and faster RPC-ish access
> >>>  to an existing spark context which seem to be what Casey is driving at with
> >>>  the spark piece.
> >>>
> >>>  To address the question of text manipulation in Stellar / metron
> >>>  enrichment ingest etc, we already have this outside of the context of the
> >>>  issues here. I would argue that yes, we don’t want too many paths for this,
> >>>  and that maybe our parser approach might be heavily related to text-based
> >>>  ingest. I would say the scope worth dealing with here though is not really
> >>>  text manipulation, but summarisation, which is not well served by existing
> >>>  CLI tools like awk / sed and friends.
> >>>
> >>>  Simon
> >>>
> >>>  > On 3 Jan 2018, at 15:48, Nick Allen <n...@nickallen.org> wrote:
> >>>  >
> >>>  >> Even with 5 threads, it takes an hour for the full Alexa 1m, so I think
> >>>  >> this will impact performance
> >>>  >
> >>>  > What exactly takes an hour? Adding 1M entries to a bloom filter? That
> >>>  > seems really high, unless I am not understanding something.
> >>>  >
> >>>  >
> >>>  >
> >>>  >
> >>>  >
> >>>  >
> >>>  > On Wed, Jan 3, 2018 at 10:17 AM, Casey Stella <ceste...@gmail.com> wrote:
> >>>  >
> >>>  >> Thanks for the feedback, Nick.
> >>>  >>
> >>>  >> Regarding "IMHO, I'd rather not reinvent the wheel for text manipulation."
> >>>  >>
> >>>  >> I would argue that we are not reinventing the wheel for text manipulation
> >>>  >> as the extractor config exists already and we are doing a similar thing in
> >>>  >> the flatfile loader (in fact, the code is reused and merely extended).
> >>>  >> Transformation operations are already supported in our codebase in the
> >>>  >> extractor config, this PR has just added some hooks for stateful
> >>>  >> operations.
> >>>  >>
> >>>  >> Furthermore, we will need a configuration object to pass to the REST call
> >>>  >> if we are ever to create a UI around importing data into hbase or creating
> >>>  >> these summary objects.
> >>>  >>
> >>>  >> Regarding your example:
> >>>  >> $ cat top-1m.csv | awk -F, '{print $2}' | sed '/^$/d' | stellar -i
> >>>  >> 'DOMAIN_REMOVE_TLD(_)' | stellar -i 'BLOOM_ADD(_)'
> >>>  >>
> >>>  >> I'm very sympathetic to this type of extension, but it has some issues:
> >>>  >>
> >>>  >> 1. This implies a single-threaded addition to the bloom filter.
> >>>  >>    1. Even with 5 threads, it takes an hour for the full alexa 1m, so I
> >>>  >>    think this will impact performance
> >>>  >>    2. There's not a way to specify how to merge across threads if we do
> >>>  >>    make a multithread command line option
> >>>  >> 2. This restricts these kinds of operations to roles with heavy unix CLI
> >>>  >> knowledge, which isn't often the types of people who would be doing this
> >>>  >> type of operation
> >>>  >> 3. What if we need two variables passed to stellar?
> >>>  >> 4. This approach will be harder to move to Hadoop. Eventually we will
> >>>  >> want to support data on HDFS being processed by Hadoop (similar to
> >>>  >> flatfile loader), so instead of -m LOCAL being passed for the flatfile
> >>>  >> summarizer you'd pass -m SPARK and the processing would happen on the cluster
> >>>  >>    1. This is particularly relevant in this case as it's an
> >>>  >>    embarrassingly parallel problem in general
> >>>  >>
> >>>  >> In summary, while this CLI approach is attractive, I prefer th

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-04 Thread Casey Stella
It also occurs to me that even in this situation, hard coding for just Bloom
is not a sufficient generalization: this is a bloom filter of the output of
all the typosquatted domains for the domain in each row.  If we wanted
to hard code, we'd have to hard code specifically the bloom filter *for* the
typosquatting use-case.  Hard coding this would prevent things like bloom
filters containing malicious IPs from a reference source, for instance.

On Thu, Jan 4, 2018 at 10:46 AM, Casey Stella <ceste...@gmail.com> wrote:

> So, there is value outside of just bloom usage.  The most specific example
> of this: in order to configure a bloom filter, we need to know at
> least an upper bound of the number of items that are going to be added to
> the bloom filter.  In order to do that, we need to count the number of
> typosquatted domains.  Specifically at
> https://github.com/cestella/incubator-metron/tree/typosquat_merge/use-cases/typosquat_detection#configure-the-bloom-filter
> you can see how we
> use the CONSOLE writer with an extractor config to count the number of
> typosquatted domains in the alexa top 10k dataset so we can size the filter
> appropriately.
>
> I'd argue that other types of probabilistic data structures could also
> make sense here, like statistical sketches. Consider, for instance,
> a cheap and dirty DGA indicator where we take the Alexa top 1M and look at
> the distribution of Shannon entropy in the domains.  If the Shannon entropy
> of a domain going across metron is more than 5 std devs from the mean, that
> could be circumstantial evidence of a malicious attack.  This would yield a
> lot of false positives, but used in conjunction with other indicators it
> could be valuable.
>
> Computing that would be as follows:
>
> {
>   "config" : {
>     "columns" : {
>       "rank" : 0,
>       "domain" : 1
>     },
>     "value_transform" : {
>       "domain" : "DOMAIN_REMOVE_TLD(domain)"
>     },
>     "value_filter" : "LENGTH(domain) > 0",
>     "state_init" : "STATS_INIT()",
>     "state_update" : {
>       "state" : "STATS_ADD(state, STRING_ENTROPY(domain))"
>     },
>     "state_merge" : "STATS_MERGE(states)",
>     "separator" : ","
>   },
>   "extractor" : "CSV"
> }
>
> Also, for another example, imagine a situation where we have a SPARK_SQL
> engine rather than just LOCAL for summarizing.  We could create a general
> summary of URL lengths in bro data which could be used for determining if
> someone is trying to send in very large URLs maliciously (see Jon Zeolla's
> concerns in https://issues.apache.org/jira/browse/METRON-517 for a
> discussion of this).  In order to do that, we could simply execute:
>
> $METRON_HOME/bin/flatfile_summarizer.sh -i "select uri from bro" -o 
> /tmp/reference/bro_uri_distribution.ser -e ~/uri_length_extractor.json -p 5 
> -om HDFS -m SPARK_SQL
>
> with uri_length_extractor.json containing:
>
> {
>   "config" : {
>     "value_filter" : "LENGTH(uri) > 0",
>     "state_init" : "STATS_INIT()",
>     "state_update" : {
>       "state" : "STATS_ADD(state, LENGTH(uri))"
>     },
>     "state_merge" : "STATS_MERGE(states)",
>     "separator" : ","
>   },
>   "extractor" : "SQL_ROW"
> }
>
>
> Regarding value filter, that's already around in the extractor config
> because of the need to transform data in the flatfile loader.  While I
> definitely see the desire to use unix tools to prep data, there are some
> things that aren't as easy to do.  For instance, here, removing the TLD of
> a domain is not a trivial task in a shell script and we have existing
> functions for that in Stellar.  I would see people using both.
>
> To address the issue of a more targeted experience to bloom, I think that
> sort of specialization should best exist in the UI layer.  Having a more
> complete and expressive backend reused across specific UIs seems to be the
> best of all worlds.  It allows power users to drop down and do more complex
> things and still provides a (mostly) code-free and targeted experience for
> users.  It seems to me that limiting the expressibility in the backend
> isn't the right way to go since this work just fits in with our existing
> engine.
>
>
> On Thu, Jan 4, 2018 at 1:40 AM, James Sirota <jsir...@apache.org> wrote:
>
>> I just went through these pull requests as well and also a

Re: Full Dev -> Heartbeat issues

2018-01-08 Thread Casey Stella
I haven't seen that one.  I spun one up from master on Friday and it seemed
ok.  Sorry, "works for me!" isn't super helpful, but it may be relevant
since master is close to 0.4.2 :)

On Mon, Jan 8, 2018 at 11:11 AM, Otto Fowler 
wrote:

> I just started up full dev from the 0.4.2 release tag, and ended up with
> failed heartbeats for all my services in ambari.
> After investigation, I found that my /etc/hosts (on node1) had multiple
> entries for node1 :
>
> [vagrant@node1 ~]$ cat /etc/hosts
> 127.0.0.1 node1 node1
> 127.0.0.1   localhost
>
> ## vagrant-hostmanager-start
> 192.168.66.121 node1
>
> ## vagrant-hostmanager-end
>
> After removing the 127.0.0.1 node1 node1 line and restarting the machine +
> all the services etc my issues are resolved and my board is green.
>
> I am not sure why this may happen.
> Hopefully if you are seeing this, this will help.
>
> Anyone know why this may happen?
>
>
> ottO
>
>
>


Re: Travis for Apache/Metron is in trouble

2018-01-18 Thread Casey Stella
I made an infra ticket: https://issues.apache.org/jira/browse/INFRA-15865

On Thu, Jan 18, 2018 at 11:42 AM, Otto Fowler 
wrote:

> 24hr long build is blocking up master’s travis build.
> Who can nuke it?
>
> ottO
>


Re: Some more upgrade fallout... Can't restart Metron Indexing

2018-01-18 Thread Casey Stella
So, the challenge here is that our install script isn't smart enough right
now to skip creating tables that are already created.  One thing you could
do is

   1. rename the hbase tables for metron (see
   https://stackoverflow.com/questions/27966072/how-do-you-rename-a-table-in-hbase)
   2. let the install create them anew
   3. stop metron
   4. delete the new empty hbase tables
   5. swap in the old tables
   6. start metron

What we probably should do is not barf if the tables exist, but rather warn.
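
A minimal sketch of that "warn, don't fail" behaviour, in Python since the
Ambari service scripts are Python. Everything named here is hypothetical:
TableExistsError, create_tables, and the fake create function stand in for
whatever the real install script would call, and this is not the actual
indexing_commands.py code.

```python
import logging

logger = logging.getLogger("metron_install_sketch")


class TableExistsError(Exception):
    """Hypothetical error raised when a table is already present."""


def create_tables(table_names, create_table):
    """Create each table, warning instead of failing on ones that already exist.

    create_table is a caller-supplied callable; returns the names actually created.
    """
    created = []
    for name in table_names:
        try:
            create_table(name)
            created.append(name)
        except TableExistsError:
            logger.warning("Table %s already exists; skipping creation", name)
    return created


# Stand-in for the real backend: one table survives from the previous install.
existing_tables = {"metron_update"}


def fake_create_table(name):
    if name in existing_tables:
        raise TableExistsError(name)
    existing_tables.add(name)


created_now = create_tables(["metron_update", "some_new_table"], fake_create_table)
```

With this shape, a restart after an upgrade would log a warning for
metron_update and carry on, rather than aborting the whole start.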

On Thu, Jan 18, 2018 at 12:02 PM, Laurens Vets  wrote:

> After upgrading from 0.4.1 to 0.4.2, I can't seem to start or restart
> Metron Indexing. I get the following errors:
>
> stderr:   /var/lib/ambari-agent/data/errors-2468.txt
>
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/METRON/0.4.2/package/scripts/indexing_master.py", line 160, in <module>
>     Indexing().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/common-services/METRON/0.4.2/package/scripts/indexing_master.py", line 82, in start
>     self.configure(env)
>   File "/var/lib/ambari-agent/cache/common-services/METRON/0.4.2/package/scripts/indexing_master.py", line 72, in configure
>     commands.create_hbase_tables()
>   File "/var/lib/ambari-agent/cache/common-services/METRON/0.4.2/package/scripts/indexing_commands.py", line 126, in create_hbase_tables
>     user=self.__params.hbase_user
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
>     tries=self.resource.tries, try_sleep=self.resource.try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
>     tries=tries, try_sleep=try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 293, in _call
>     raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of 'echo "create 'metron_update','t'" | hbase shell -n' returned 1. ERROR RuntimeError: Table already exists: metron_update!
>
> stdout:   /var/lib/ambari-agent/data/output-2468.txt
>
> 2018-01-18 16:54:30,101 - Using hadoop conf dir:
> /usr/hdp/current/hadoop-client/conf
> 2018-01-18 16:54:30,301 - Using hadoop conf dir:
> /usr/hdp/current/hadoop-client/conf
> 2018-01-18 16:54:30,302 - Group['metron'] {}
> 2018-01-18 16:54:30,303 - Group['livy'] {}
> 2018-01-18 16:54:30,303 - Group['elasticsearch'] {}
> 2018-01-18 16:54:30,303 - Group['spark'] {}
> 2018-01-18 16:54:30,303 - Group['zeppelin'] {}
> 2018-01-18 16:54:30,304 - Group['hadoop'] {}
> 2018-01-18 16:54:30,304 - Group['kibana'] {}
> 2018-01-18 16:54:30,304 - Group['users'] {}
> 2018-01-18 16:54:30,304 - User['hive'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,305 - User['storm'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,306 - User['zookeeper'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,306 - User['infra-solr'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,307 - User['ams'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,307 - User['tez'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['users']}
> 2018-01-18 16:54:30,308 - User['zeppelin'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,309 - User['metron'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,309 - User['livy'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,310 - User['elasticsearch'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,310 - User['spark'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
> 2018-01-18 16:54:30,311 - User['ambari-qa'] {'gid': 'hadoop',
> 'fetch_nonlocal_groups': True, 'groups': ['users']}
> 2018-01-18 

Re: [DISCUSS] Upgrading Solr

2018-01-18 Thread Casey Stella
+1 to both the feature branch and user@ announcement.

On Thu, Jan 18, 2018 at 2:45 PM, Otto Fowler 
wrote:

> +1 to the feature branch.
>
> Also, there have been some questions about solr support recently, I think
> when the feature branch
> is ready you should announce it on user@ too, we may get some help from
> folks looking for this.
>
>
>
> On January 18, 2018 at 14:26:14, Justin Leet (justinjl...@gmail.com)
> wrote:
>
> Now that we have ES at a modern version, we should consider bringing Solr
> to a modern version as well.
>
> The focus of this work would be to get us in a place where Solr is
> upgraded, along with the related work of building out the Solr
> functionality to parity with Elasticsearch. The goal would not be to add
> net new functionality, just to get Solr and ES in the same place for the
> alerts UI and REST interface. Additionally, it would include the various
> supporting necessities such as ensuring associated DAOs are testable, and
> so on.
>
> Given the testing, reviewing, and iteration involved, I'd like to propose
> doing this work in a feature branch.
>
> Jiras would be created based on this discussion once it dies down a bit.
>


Re: [DISCUSS] Time to remove github updates from dev?

2018-01-19 Thread Casey Stella
I could get behind that.

On Fri, Jan 19, 2018 at 3:31 PM, Andre  wrote:

> Folks,
>
> May I suggest Metron follows the NiFi mailing list strategy (we got
> inspired by another project but I don't recall the name) and remove the
> github comments from the dev list?
>
> Within NiFi we have both the dev and the issues lists. dev is for humans,
> issues is for JIRA and github commits.[1]
>
> This allows the list thread list to be cleaner and is particularly helpful
> for those reading the list from a list aggregation service.
>
> Cheers
>
>
> [1] https://lists.apache.org/list.html?iss...@nifi.apache.org
>


Re: Metron User Community Meeting Call

2018-01-26 Thread Casey Stella
I can't wait!  This is going to be really cool :)

On Fri, Jan 26, 2018 at 5:25 PM, James Sirota  wrote:

> Yeah very interested in the presentation as well
>
> 26.01.2018, 15:15, "Simon Elliston Ball" :
> > This is going to be a really exciting call. Looking forward to seeing
> > how the GCR Canary sings :)
> >
> > I’m going to volunteer https://hortonworks.zoom.us/my/simonellistonball
> > as a location for the meeting.
> >
> > I would also support the idea of a quick poll on what people are doing
> > with Metron, and maybe if anyone wants to volunteer at the end of the
> > meeting it would be great to have an open mic of use cases.
> >
> > Talk to you all Wednesday.
> >
> > Simon
> >
> >>  On 26 Jan 2018, at 22:10, Seal, Steve  wrote:
> >>
> >>  Hi all,
> >>
> >>  I have several people on my team that are looking forward to hearing
> >>  about Ahmed’s work.
> >>
> >>  Steve
> >>
> >>  From: Daniel Schafer [mailto:daniel.scha...@sstech.us]
> >>  Sent: Friday, January 26, 2018 5:05 PM
> >>  To: u...@metron.apache.org; dev@metron.apache.org
> >>  Subject: Re: Metron User Community Meeting Call
> >>
> >>  My team members and I would like to join as well.
> >>  We can provide Zoom Meeting login if necessary.
> >>
> >>  Thanks
> >>
> >>  Daniel
> >>  7134806608
> >>
> >>  From: Ahmed Shah  carleton.ca>>
> >>  Reply-To: "u...@metron.apache.org " <
> u...@metron.apache.org >
> >>  Date: Friday, January 26, 2018 at 2:06 PM
> >>  To: "dev@metron.apache.org " <
> dev@metron.apache.org >, "
> u...@metron.apache.org " <
> u...@metron.apache.org >
> >>  Subject: Re: Metron User Community Meeting Call
> >>
> >>  Looking forward to presenting!
> >>
> >>  Just a thought...
> >>  In advance, should we create a Google Form to collect survey data on
> who is using Metron, how they are using it, etc., and present the results
> to the group?
> >>
> >>  -Ahmed
> >>  ___
> >>  Ahmed Shah (PMP, M. Eng.)
> >>  Cybersecurity Analyst & Developer
> >>  GCR - Cybersecurity Operations Center
> >>  Carleton University - cugcr.com
> >>
> >>  From: Andrew Psaltis >
> >>  Sent: January 26, 2018 1:53 PM
> >>  To: dev@metron.apache.org 
> >>  Subject: Re: Metron User Community Meeting Call
> >>
> >>  Count me in. Very interested to hear about Ahmed's journey.
> >>
> >>  On Fri, Jan 26, 2018 at 8:58 AM, Kyle Richardson <
> kylerichards...@gmail.com >
> >>  wrote:
> >>
> >>  > Thanks! I'll be there. Excited to hear Ahmed's successes and
> challenges.
> >>  >
> >>  > -Kyle
> >>  >
> >>  > On Thu, Jan 25, 2018 at 7:44 PM zeo...@gmail.com  zeo...@gmail.com> > wrote:
> >>  >
> >>  > > Thanks Otto, I'm in to attend at that time/place.
> >>  > >
> >>  > > Jon
> >>  > >
> >>  > > On Thu, Jan 25, 2018, 14:45 Otto Fowler  > wrote:
> >>  > >
> >>  > >> I would like to propose a Metron user community meeting. I
> propose that
> >>  > >> we set the meeting next week, and will throw out Wednesday,
> January
> >>  > 31st at
> >>  > >> 09:30AM PST, 12:30 on the East Coast and 5:30 in London Towne.
> This
> >>  > meeting
> >>  > >> will be held over a web-ex, the details of which will be included
> in the
> >>  > >> actual meeting notice.
> >>  > >> Topics
> >>  > >>
> >>  > >> We have a volunteer for a community member presentation:
> >>  > >>
> >>  > >> Ahmed Shah (PMP, M. Eng.) Cybersecurity Analyst & Developer GCR -
> >>  > >> Cybersecurity Operations Center Carleton University - cugcr.com
> >>  > >>
> >>  > >> Ahmed would like to talk to the community about
> >>  > >>
> >>  > >> -
> >>  > >>
> >>  > >> Who the GCR group is
> >>  > >> -
> >>  > >>
> >>  > >> How they use Metron 0.4.1
> >>  > >> -
> >>  > >>
> >>  > >> Walk through their dashboards, UI management screen, nifi
> >>  > >> -
> >>  > >>
> >>  > >> Challenges we faced up until now
> >>  > >>
> >>  > >> I would like to thank Ahmed for stepping forward for this meeting.
> >>  > >>
> >>  > >> If you have something you would like to present or 

Re: [DISCUSS] Move SHELL type functions from management to stellar common

2018-01-31 Thread Casey Stella
I'd be in favor of that.  That is general purpose stuff.

On Wed, Jan 31, 2018 at 9:12 AM, Otto Fowler 
wrote:

> Per:  https://issues.apache.org/jira/browse/METRON-876
>
> I think we should move the shell/console type functions from stellar
> management to stellar-common, and guard them with CONSOLE capability.
> Thoughts?
>
> ottO
>


Re: [DISCUSS] Move SHELL type functions from management to stellar common

2018-01-31 Thread Casey Stella
I assumed he was talking about the SHELL_EDIT stuff and maybe the file
loading bits.  The config stuff is Metron-specific.

On Wed, Jan 31, 2018 at 10:06 AM, Nick Allen  wrote:

> > I think we should move the shell/console type functions from stellar
>
> What functions specifically?  Are you talking about
> `metron-platform/metron-management`?
>
> If you are talking about the functions like 'CONFIG_GET' and 'CONFIG_PUT',
> those seem specific to interacting with Metron.  They don't seem very
> stellar-common-ish to me.
>
>
> > ... and guard them with CONSOLE capability.
>
> I think some sort of guard is appropriate. However it is done you need to
> make sure they still work in Zeppelin.  There is no CONSOLE capability in
> Zeppelin, currently.
>
>
>
> On Wed, Jan 31, 2018 at 9:12 AM, Otto Fowler 
> wrote:
>
> > Per:  https://issues.apache.org/jira/browse/METRON-876
> >
> > I think we should move the shell/console type functions from stellar
> > management to stellar-common, and guard them with CONSOLE capability.
> > Thoughts?
> >
> > ottO
> >
>


Re: When things change in hdfs, how do we know

2018-01-31 Thread Casey Stella
Hmm, I have heard this feedback before.  Perhaps a more low-key approach
would be either a static timer that checks periodically, or a timer bolt that
sends a periodic tick so that the parser bolt reconfigures the parser (or,
indeed, we could add a Reloadable interface with a 'reload' method).  We could
also be smart and only set up the topology with the timer bolt if the parser
actually implements the Reloadable interface.  Just some thoughts that
might be easy and avoid instability.
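
To make the Reloadable idea a bit more concrete, here is a minimal sketch in
Python rather than in Metron's Java/Storm code.  Everything in it is
hypothetical: the Reloadable class, the toy parsers, and start_reload_timer
are illustration names only, and a plain thread stands in for a timer bolt.
The point is just that the timer is only started when the parser opts in.

```python
import threading
from abc import ABC, abstractmethod


class Reloadable(ABC):
    """Hypothetical marker interface: parsers that can re-read their configuration."""

    @abstractmethod
    def reload(self) -> None:
        ...


class ConfiguredParser(Reloadable):
    """Toy parser whose settings come from a caller-supplied loader function."""

    def __init__(self, load_config):
        self._load_config = load_config
        self.config = load_config()

    def reload(self) -> None:
        # Re-read the settings; a real parser might re-read a file from HDFS here.
        self.config = self._load_config()


class StaticParser:
    """Parser that does not implement Reloadable, so it never gets a timer."""


def start_reload_timer(parser, interval_seconds):
    """Start a periodic reload only for Reloadable parsers.

    Returns a threading.Event that stops the timer when set, or None when
    the parser is not Reloadable and no timer was started.
    """
    if not isinstance(parser, Reloadable):
        return None
    stop = threading.Event()

    def tick():
        # Event.wait returns False on timeout, so this loops once per interval.
        while not stop.wait(interval_seconds):
            parser.reload()

    threading.Thread(target=tick, daemon=True).start()
    return stop
```

Polling like this trades some staleness (up to one interval) for not needing
a listener per worker.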

On Tue, Jan 30, 2018 at 3:42 PM, Otto Fowler 
wrote:

> It is still @unstable, but the jiras :
> https://issues.apache.org/jira/browse/HDFS-8940?jql=
> project%20%3D%20HDFS%20AND%20status%20in%20(Open%2C%20%
> 22In%20Progress%22)%20AND%20text%20~%20%22INotify%22
> that I see are stale from over the summer.
>
> They also seem geared to scale or changing the filter object not the api.
>
>
>
> On January 30, 2018 at 14:19:56, JJ Meyer (jjmey...@gmail.com) wrote:
>
> Hello all,
>
> I had created a NiFi processor a long time back that used the inotify API.
> One thing I noticed while working with it is that it is marked with the
> `Unstable` annotation. It may be worth checking if anymore work is going on
> with it and if it will impact this (if it hasn't already been looked into).
>
> Thanks,
> JJ
>
> On Mon, Jan 29, 2018 at 7:27 AM, Otto Fowler 
> wrote:
>
> > I have updated the jira as well
> >
> >
> > On January 29, 2018 at 08:22:34, Otto Fowler (ottobackwa...@gmail.com)
> > wrote:
> >
> > https://github.com/ottobackwards/hdfs-inotify-zookeeper
> >
>


Re: When things change in hdfs, how do we know

2018-01-31 Thread Casey Stella
Well, it'll be one listener per worker and if you have a lot of workers,
it's going to be a bad time probably.

On Wed, Jan 31, 2018 at 11:50 AM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> I don’t think the Unstable means the implementation will crash.  I think
> it means
> it is a newish-api, and there should be 1 listeners.
>
> Having 1 listener shouldn’t be an issue.
>
>
>
> On January 31, 2018 at 11:45:54, Casey Stella (ceste...@gmail.com) wrote:
>
> Hmm, I have heard this feedback before. Perhaps a more low-key approach
> would be either a static timer that checked or a timer bolt that sent a
> periodic timer and the parser bolt reconfigured the parser (or indeed we
> added a Reloadable interface with a 'reload' method). We could be smart
> also and only set up the topology with the timer bolt if the parser
> actually implemented the Reloadable interface. Just some thoughts that
> might be easy and avoid instability.
>
> On Tue, Jan 30, 2018 at 3:42 PM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > It is still @unstable, but the jiras :
> > https://issues.apache.org/jira/browse/HDFS-8940?jql=
> > project%20%3D%20HDFS%20AND%20status%20in%20(Open%2C%20%
> > 22In%20Progress%22)%20AND%20text%20~%20%22INotify%22
> > that I see are stale from over the summer.
> >
> > They also seem geared to scale or changing the filter object not the
> api.
> >
> >
> >
> > On January 30, 2018 at 14:19:56, JJ Meyer (jjmey...@gmail.com) wrote:
> >
> > Hello all,
> >
> > I had created a NiFi processor a long time back that used the inotify
> API.
> > One thing I noticed while working with it is that it is marked with the
> > `Unstable` annotation. It may be worth checking if anymore work is going
> on
> > with it and if it will impact this (if it hasn't already been looked
> into).
> >
> > Thanks,
> > JJ
> >
> > On Mon, Jan 29, 2018 at 7:27 AM, Otto Fowler <ottobackwa...@gmail.com>
> > wrote:
> >
> > > I have updated the jira as well
> > >
> > >
> > > On January 29, 2018 at 08:22:34, Otto Fowler (ottobackwa...@gmail.com)
>
> > > wrote:
> > >
> > > https://github.com/ottobackwards/hdfs-inotify-zookeeper
> > >
> >
>
>


Re: [DISCUSS] Persistence store for user profile settings

2018-02-01 Thread Casey Stella
So, I'll answer your question with some questions:

   - No matter the data store we use upgrading will take some care, right?
   - Do we currently depend on a RDBMS anywhere?  I want to say that we do
   in the REST layer already, right?
   - If we don't use a RDBMs, what's the other option?  What are the pros
   and cons?
   - Have we considered non-server offline persistent solutions (e.g.
   https://www.html5rocks.com/en/features/storage)?



On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman  wrote:

> There is currently a PR up for review that allows a user to configure and
> save the list of facet fields that appear in the left column of the Alerts
> UI:  https://github.com/apache/metron/pull/853.  The REST layer has ORM
> support which means we can store those in a relational database.
>
> However I'm not 100% sure this is the best place to keep this.  As we add
> more use cases like this the backing tables in the RDBMS will need to be
> managed.  This could make upgrading more tedious and error-prone.  Is there
> are a better way to store this, assuming we can leverage a component that's
> already included in our stack?
>
> Ryan
>


Re: [DISCUSS] Alternatives to split/join enrichment

2018-02-22 Thread Casey Stella
So, these are good questions, as usual Otto :)

> how does this effect the distribution of work through the cluster, and
resiliency of the topologies?

This moves us to a data parallelism scheme rather than a task parallelism
scheme.  This, in effect, means that we will not be distributing the
partial enrichments across the network for a given message, but rather
distributing the messages across the network for *full* enrichment.  So,
the bundle of work is the same, but we're not concentrating capabilities in
specific workers.  Then again, as soon as we moved to stellar enrichments
and sub-groups where you can interact with hbase or geo from within
stellar, we sorta abandoned specialization.  Resiliency shouldn't be
affected and, indeed, it should be easier to reason about.  We ack after
every bolt in the new scheme rather than avoid acking until we join and ack
the original tuple.  In fact, I'm still not convinced there's not a bug
somewhere in that join bolt that makes it so we don't ack the right tuple.

> Is anyone else doing it like this?

The stormy way of doing this is to specialize in the bolts and join, no
doubt, in a fan-out/fan-in pattern.  I do not think it's unheard of,
though, to use a threadpool.  It's slightly peculiar inasmuch as storm has
its own threading model, but it is an embarrassingly parallel task and the
main shift is trading the unit of parallelism from enrichment task to
message, to the gain of fewer network hops.  That being said, as long as
you're not emitting from a different thread than the one you are receiving
from, there's no technical limitation.

> Can we have multiple thread pools and group tasks together ( or separate
them ) wrt hbase?

We could, but I think we might consider starting with just a simple static
threadpool that we configure at the topology level (e.g. multiple worker
threads can share the same threadpool that we can configure).  I think as
the trend of moving everything to stellar continues, we may end up in a
situation where we don't have a coherent or clear way to differentiate
between thread pools like we do now.

> Also, how are we to measure the effect?

Well, some of the benefits here are at an architectural/feature level, the
most exciting of which is that this approach opens up avenues for stellar
subgroups to depend on each other.  Slightly less exciting, but still nice
is the fact that this normalizes us with *other* streaming technologies and
the decoupling work done as part of the PR (soon to be released) will make
it easy to transition if we so desire.  Beyond that, for performance,
someone will have to run some performance tests or try it out in a
situation where they're having some enrichment performance issues.  Until
we do that, I think we should probably just keep it as a parallel approach
that you can swap out if you so desire.

On Thu, Feb 22, 2018 at 11:48 AM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> This sounds worth exploring.  A couple of questions:
>
> * how does this effect the distribution of work through the cluster, and
> resiliency of the topologies?
> * Is anyone else doing it like this?
> * Can we have multiple thread pools and group tasks together ( or separate
> them ) wrt hbase?
>
>
>
> On February 22, 2018 at 11:32:39, Casey Stella (ceste...@gmail.com) wrote:
>
> Hi all,
>
> I've been thinking and working on something that I wanted to get some
> feedback on. The way that we do our enrichments, the split/join
> architecture was created to effectively to parallel enrichments in a
> storm-like way in contrast to OpenSoc.
>
> There are some good parts to this architecture:
>
> - It works, enrichments are done in parallel
> - You can tune individual enrichments differently
> - It's very storm-like
>
> There are also some deficiencies:
>
> - It's hard to reason about
> - Understanding the latency of enriching a message requires looking
> at multiple bolts that each give summary statistics
> - The join bolt's cache is really hard to reason about when performance
> tuning
> - During spikes in traffic, you can overload the join bolt's cache
> and drop messages if you aren't careful
> - In general, it's hard to associate a cache size and a duration kept
> in cache with throughput and latency
> - There are a lot of network hops per message
> - Right now we are stuck at 2 stages of transformations being done
> (enrichment and threat intel). It's very possible that you might want
> stellar enrichments to depend on the output of other stellar enrichments.
> In order to implement this in split/join you'd have to create a cycle in
> the storm topology
>
> I propose a change. I propose that we move to a model where we do
> enrichments in a single bolt in parallel using a static threadpool (e.g.
> multiple workers in the same process would share the threadpool). IN all
> other

[DISCUSS] Alternatives to split/join enrichment

2018-02-22 Thread Casey Stella
Hi all,

I've been thinking and working on something that I wanted to get some
feedback on.  The way that we do our enrichments, the split/join
architecture, was created to effectively parallelize enrichments in a
storm-like way, in contrast to OpenSoc.

There are some good parts to this architecture:

   - It works, enrichments are done in parallel
   - You can tune individual enrichments differently
   - It's very storm-like

There are also some deficiencies:

   - It's hard to reason about
      - Understanding the latency of enriching a message requires looking
      at multiple bolts that each give summary statistics
   - The join bolt's cache is really hard to reason about when performance
   tuning
      - During spikes in traffic, you can overload the join bolt's cache
      and drop messages if you aren't careful
      - In general, it's hard to associate a cache size and a duration kept
      in cache with throughput and latency
   - There are a lot of network hops per message
   - Right now we are stuck at 2 stages of transformations being done
   (enrichment and threat intel).  It's very possible that you might want
   stellar enrichments to depend on the output of other stellar enrichments.
   In order to implement this in split/join you'd have to create a cycle in
   the storm topology

I propose a change.  I propose that we move to a model where we do
enrichments in a single bolt in parallel using a static threadpool (e.g.
multiple workers in the same process would share the threadpool).  In all
other ways, this would be backwards compatible: a transparent drop-in for
the existing enrichment topology.

There are some pros/cons about this too:

   - Pro
      - Easier to reason about from an individual message perspective
      - Architecturally decoupled from Storm
         - This sets us up if we want to consider other streaming
         technologies
      - Fewer bolts
         - spout -> enrichment bolt -> threatintel bolt -> output bolt
      - Way fewer network hops per message
         - currently 2n+1 where n is the number of enrichments used (if
         using stellar subgroups, each subgroup is a hop)
      - Easier to reason about from a performance perspective
         - We trade cache size and eviction timeout for threadpool size
      - We set ourselves up to have stellar subgroups with dependencies
         - i.e. stellar subgroups that depend on the output of other
         subgroups
         - If we do this, we can shrink the topology to just spout ->
         enrichment/threat intel -> output
   - Con
      - We can no longer tune stellar enrichments independent from HBase
      enrichments
         - To be fair, with enrichments moving to stellar, this is the case
         in the split/join approach too
      - No idea about performance
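
For anyone who wants a picture of the single-bolt model before the PR lands,
the following is a rough sketch in Python, not the actual Storm
implementation.  The enrichment functions, lookup tables, and field names are
made up for illustration.  It shows one process-wide pool running every
enrichment for a message in parallel, with the results merged on the calling
thread (so nothing is emitted from a pool thread), plus the 2n+1 hop count
quoted above for split/join.

```python
from concurrent.futures import ThreadPoolExecutor

# One pool per process, shared by every caller: the "static threadpool" idea.
SHARED_POOL = ThreadPoolExecutor(max_workers=4)

# Hypothetical lookup tables standing in for HBase/geo enrichment sources.
GEO_BY_IP = {"10.0.0.1": "US"}
HOST_BY_IP = {"10.0.0.1": "build-server"}


def geo_enrichment(message):
    """Hypothetical enrichment: returns only the fields it adds."""
    return {"geo.country": GEO_BY_IP.get(message["ip_src_addr"], "unknown")}


def host_enrichment(message):
    """Hypothetical enrichment: returns only the fields it adds."""
    return {"host.name": HOST_BY_IP.get(message["ip_src_addr"], "unknown")}


def enrich(message, enrichments, pool=SHARED_POOL):
    """Fully enrich one message: the unit of parallelism is the message."""
    # Fan out every enrichment for this message onto the shared pool.
    futures = [pool.submit(fn, message) for fn in enrichments]
    # Join on the calling thread; no cache or eviction timeout is involved.
    enriched = dict(message)
    for future in futures:
        enriched.update(future.result())
    return enriched


def split_join_hops(n):
    """Network hops per message under split/join, per the 2n+1 figure above."""
    return 2 * n + 1
```

With two enrichments, split_join_hops(2) gives 5 hops, whereas in this sketch
a message never leaves the process while it is being enriched.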


What I propose is to submit a PR that will deliver an alternative,
completely backwards compatible topology for enrichment that you can use by
adjusting the start_enrichment_topology.sh script to use
remote-unified.yaml instead of remote.yaml.  If we live with it for a while
and have some good experiences with it, maybe we can consider retiring the
old enrichment topology.

Thoughts?  Keep me honest; if I have over or understated the issues for
split/join or missed some important architectural issue let me know.  I'm
going to submit a PR to this effect by the EOD today so things will be more
obvious.


Re: [DISCUSS] Alternatives to split/join enrichment

2018-02-22 Thread Casey Stella
FYI, the PR for this is up at https://github.com/apache/metron/pull/940
For those interested, please comment on the actual implementation there.

On Thu, Feb 22, 2018 at 12:43 PM, Casey Stella <ceste...@gmail.com> wrote:

> So, these are good questions, as usual Otto :)
>
> > how does this effect the distribution of work through the cluster, and
> resiliency of the topologies?
>
> This moves us to a data parallelism scheme rather than a task parallelism
> scheme.  This, in effect means, that we will not be distributing the
> partial enrichments across the network for a given message, but rather
> distributing the messages across the network for *full* enrichment.  So,
> the bundle of work is the same, but we're not concentrating capabilities in
> specific workers.  Then again, as soon as we moved to stellar enrichments
> and sub-groups where you can interact with hbase or geo from within
> stellar, we sorta abandoned specialization.  Resiliency shouldn't be
> effected and, indeed, it should be easier to reason about.  We ack after
> every bolt in the new scheme rather than avoid acking until we join and ack
> the original tuple.  In fact, I'm still not convinced there's not a bug
> somewhere in that join bolt that makes it so we don't ack the right tuple.
>
> > Is anyone else doing it like this?
>
> The stormy way of doing this is to specialize in the bolts and join, no
> doubt, in a fan-out/fan-in pattern.  I do not think it's unheard of,
> though, to use a threadpool.  It's slightly peculiar inasmuch as storm has
> its own threading model, but it is an embarassingly parallel task and the
> main shift is trading the unit of parallelism from enrichment task to
> message to the gain of fewer network hops.  That being said, as long as
> you're not emitting from a different thread that you are receiving from,
> there's no technical limitation.
>
> > Can we have multiple thread pools and group tasks together ( or separate
> them ) wrt hbase?
>
> We could, but I think we might consider starting with just a simple static
> threadpool that we configure at the topology level (e.g. multiple worker
> threads can share the same threadpool that we can configure).  I think as
> the trend of moving everything to stellar continues, we may end up in a
> situation where we don't have a coherent or clear way to differentiate
> between thread pools like we do now.
>
> > Also, how are we to measure the effect?
>
> Well, some of the benefits here are at an architectural/feature level, the
> most exciting of which is that this approach opens up avenues for stellar
> subgroups to depend on each other.  Slightly less exciting, but still nice
> is the fact that this normalizes us with *other* streaming technologies and
> the decoupling work done as part of the PR (soon to be released) will make
> it easy to transition if we so desire.  Beyond that, for performance,
> someone will have to run some performance tests or try it out in a
> situation where they're having some enrichment performance issues.  Until
> we do that, I think we should probably just keep it as a parallel approach
> that you can swap out if you so desire.
>
> On Thu, Feb 22, 2018 at 11:48 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
>> This sounds worth exploring.  A couple of questions:
>>
>> * how does this effect the distribution of work through the cluster, and
>> resiliency of the topologies?
>> * Is anyone else doing it like this?
>> * Can we have multiple thread pools and group tasks together ( or
>> separate them ) wrt hbase?
>>
>>
>>
>> On February 22, 2018 at 11:32:39, Casey Stella (ceste...@gmail.com)
>> wrote:
>>
>> Hi all,
>>
>> I've been thinking and working on something that I wanted to get some
>> feedback on. The way that we do our enrichments, the split/join
>> architecture was created to effectively to parallel enrichments in a
>> storm-like way in contrast to OpenSoc.
>>
>> There are some good parts to this architecture:
>>
>> - It works, enrichments are done in parallel
>> - You can tune individual enrichments differently
>> - It's very storm-like
>>
>> There are also some deficiencies:
>>
>> - It's hard to reason about
>> - Understanding the latency of enriching a message requires looking
>> at multiple bolts that each give summary statistics
>> - The join bolt's cache is really hard to reason about when performance
>> tuning
>> - During spikes in traffic, you can overload the join bolt's cache
>> and drop messages if you aren't careful
>> - In general, it's hard to associate a cache size and a duration kept
>> in cache with thr

Re: Apache Website Required Links

2018-02-15 Thread Casey Stella
Just reporting back that Anand's PR METRON-1386 (
https://github.com/apache/metron/pull/935) has been merged into master and
the asf-site branch.
Kudos to Anand!

Casey

On Wed, Feb 7, 2018 at 9:11 AM, Anand Subramanian <
asubraman...@hortonworks.com> wrote:

> I can take a shot at this if there are no other takers.
>
> Regards,
> Anand
>
> On 2/5/18, 8:59 PM, "Justin Leet"  wrote:
>
> I'd created a Jira awhile ago, but it deserves a callout to the
> community.
> Especially if someone wants to grab it, it's probably something pretty
> easy
> (and valuable!) to do.
>
> There's a set of required links on Apache web pages, which can be seen
> at Website
> Navigation Links Policy
> 
>
> Reporting is at Site Check For Project - Metron
> 
>
> This ticket is available at:
> METRON-1386 
>
>
>


Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-03 Thread Casey Stella
Thanks for the feedback, Nick.

Regarding "IMHO, I'd rather not reinvent the wheel for text manipulation."

I would argue that we are not reinventing the wheel for text manipulation
as the extractor config exists already and we are doing a similar thing in
the flatfile loader (in fact, the code is reused and merely extended).
Transformation operations are already supported in our codebase in the
extractor config, this PR has just added some hooks for stateful operations.

Furthermore, we will need a configuration object to pass to the REST call
if we are ever to create a UI around importing data into hbase or creating
these summary objects.

Regarding your example:
$ cat top-1m.csv | awk -F, '{print $2}' | sed '/^$/d' | stellar -i
'DOMAIN_REMOVE_TLD(_)' | stellar -i 'BLOOM_ADD(_)'

I'm very sympathetic to this type of extension, but it has some issues:

   1. This implies a single-threaded addition to the bloom filter.
      1. Even with 5 threads, it takes an hour for the full alexa 1m, so I
      think this will impact performance
      2. There's not a way to specify how to merge across threads if we do
      make a multithread command line option
   2. This restricts these kinds of operations to roles with heavy unix CLI
   knowledge, which isn't often the types of people who would be doing this
   type of operation
   3. What if we need two variables passed to stellar?
   4. This approach will be harder to move to Hadoop.  Eventually we will
   want to support data on HDFS being processed by Hadoop (similar to flatfile
   loader), so instead of -m LOCAL being passed for the flatfile summarizer
   you'd pass -m SPARK and the processing would happen on the cluster
      1. This is particularly relevant in this case as it's an
      embarrassingly parallel problem in general
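
On the "merge across threads" point, here is a small Python sketch of what a
merge step can look like.  This is not the Metron bloom filter or the
summarizer code; the BloomFilter class, its sizes, and the helper names are
assumptions for illustration only.  It relies on the general property that
two bloom filters with the same size and hash functions can be combined by
OR-ing their bits, so each thread can build a partial filter and the partials
can be merged at the end.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class BloomFilter:
    """Toy bloom filter; a Python int serves as the bit set."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        """Union of two filters; only valid when size and hash count match."""
        if (self.size_bits, self.num_hashes) != (other.size_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter(self.size_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def build_partial(items):
    """Build one filter from one slice of the input."""
    partial = BloomFilter()
    for item in items:
        partial.add(item)
    return partial


def build_parallel(items, workers=4):
    """Split the input, build one partial filter per worker, then merge."""
    chunks = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(build_partial, chunks))
    merged = BloomFilter()
    for partial in partials:
        merged = merged.merge(partial)
    return merged
```

Python threads will not actually speed up this hashing; the sketch is only
meant to show that the merge is well defined, which is the piece a plain
Unix pipeline has no obvious place to express.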

In summary, while a CLI approach is attractive, I prefer the extractor
config solution because it is the solution with the smallest iteration that:

   1. Reuses existing metron extraction infrastructure
   2. Provides the most solid base for the extensions that will be sorely
   needed soon (and will keep it in parity with the flatfile loader)
   3. Provides the most solid base for a future UI extension in the
   management UI to support both summarization and loading




On Tue, Dec 26, 2017 at 11:27 AM, Nick Allen <n...@nickallen.org> wrote:

> First off, I really do like the typosquatting use case and a lot of what
> you have described.
>
> > We need a way to generate the summary sketches from flat data for this to
> > work.
> > ​..​
> >
>
> I took this quote directly from your use case.  Above is the point that I'd
> like to discuss and what your proposed solutions center on.  This is what I
> think you are trying to do, at least with PR #879
> <https://github.com/apache/metron/pull/879>...
>
> (Q) Can we repurpose Stellar functions so that they can operate on text
> stored in a file system?
>
>
> Whether we use the (1) Configuration or the (2) Function-based approach
> that you described, fundamentally we are introducing new ways to perform
> text manipulation inside of Stellar.
>
> IMHO, I'd rather not reinvent the wheel for text manipulation.  It would be
> painful to implement and maintain a bunch of Stellar functions for text
> manipulation.  People already have a large number of tools available to do
> this and everyone has their favorites.  People are resistant to learning
> something new when they already are familiar with another way to do the
> same thing.
>
> So then the question is, how else can we do this?  My suggestion is that
> rather than introducing text manipulation tools inside of Stellar, we allow
> people to use the text manipulation tools they already know, but with the
> Stellar functions that we already have.  And the obvious way to tie those
> two things together is the Unix pipeline.
>
> A quick, albeit horribly incomplete, example to flesh this out a bit more
> based on the example you have in PR #879
> <https://github.com/apache/metron/pull/879>.  This would allow me to
> integrate Stellar with whatever external tools that I want.
>
> $ cat top-1m.csv | awk -F, '{print $2}' | sed '/^$/d' | stellar -i
> 'DOMAIN_REMOVE_TLD(_)' | stellar -i 'BLOOM_ADD(_)'
>
>
>
>
>
>
>
>
> On Sun, Dec 24, 2017 at 8:28 PM, Casey Stella <ceste...@gmail.com> wrote:
>
> > I'll start this discussion off with my idea around a 2nd step that is
> more
> > adaptable.  I propose the following set of stellar functions backed by
> > Spark in the metron-management project:
> >
> >- CSV_PARSE(location, separator?, columns?) : Constructs a Spark
> >Dataframe for reading the flatfile
> >- SQL_TRANSFORM(dataframe, spark sql statement): Transforms the
> > 

Re: [DISCUSS] Generating and Interacting with serialized summary objects

2018-01-03 Thread Casey Stella
It's actually many more than 1M.  There are 1M domains, each domain could
have upwards of 300 - 1000 possible typosquatted domains.

You will notice from
https://github.com/cestella/incubator-metron/tree/typosquat_merge/use-cases/typosquat_detection#generate-the-bloom-filter
that we are not adding the domain to the bloom filter, we're adding each
domain generated from DOMAIN_TYPOSQUAT to the bloom filter.  In fact, we
would very specifically NOT want the base domain as that would not be an
indication of typosquatting (going to google.com would be legit, going to
goggle.com would not).
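
To illustrate why the count is so much larger than 1M, here is a toy Python
sketch.  The generator below is a simplified stand-in and is not the
algorithm behind DOMAIN_TYPOSQUAT; the function names are hypothetical.  It
only covers single-character omission, duplication, substitution, and
adjacent transposition, and it deliberately drops the base name from the
result.

```python
import string


def typosquat_variants(name):
    """Simplified, hypothetical typosquat generator for a domain label."""
    variants = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])           # omit one character
        variants.add(name[:i] + name[i] + name[i:])     # duplicate one character
        for ch in string.ascii_lowercase:
            variants.add(name[:i] + ch + name[i + 1:])  # substitute one character
    for i in range(len(name) - 1):
        # Swap two adjacent characters.
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # The legitimate name itself must never end up in the filter.
    variants.discard(name)
    variants.discard("")
    return variants


def estimated_entries(num_domains, variants_per_domain):
    """Back-of-the-envelope count of bloom filter insertions."""
    return num_domains * variants_per_domain
```

With the 300 - 1000 range above, estimated_entries(1_000_000, 300) is 300
million insertions and estimated_entries(1_000_000, 1000) is a billion, which
is where the time goes.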



On Wed, Jan 3, 2018 at 10:48 AM, Nick Allen <n...@nickallen.org> wrote:

> > Even with 5 threads, it takes an hour for the full Alexa 1m, so I  think
> this will impact performance
>
> What exactly takes an hour?  Adding 1M entries to a bloom filter?  That
> seems really high, unless I am not understanding something.
>
>
>
>
>
>
> On Wed, Jan 3, 2018 at 10:17 AM, Casey Stella <ceste...@gmail.com> wrote:
>
> > Thanks for the feedback, Nick.
> >
> > Regarding "IMHO, I'd rather not reinvent the wheel for text
> manipulation."
> >
> > I would argue that we are not reinventing the wheel for text manipulation
> > as the extractor config exists already and we are doing a similar thing
> in
> > the flatfile loader (in fact, the code is reused and merely extended).
> > Transformation operations are already supported in our codebase in the
> > extractor config, this PR has just added some hooks for stateful
> > operations.
> >
> > Furthermore, we will need a configuration object to pass to the REST call
> > if we are ever to create a UI around importing data into hbase or
> creating
> > these summary objects.
> >
> > Regarding your example:
> > $ cat top-1m.csv | awk -F, '{print $2}' | sed '/^$/d' | stellar -i
> > 'DOMAIN_REMOVE_TLD(_)' | stellar -i 'BLOOM_ADD(_)'
> >
> > I'm very sympathetic to this type of extension, but it has some issues:
> >
> >1. This implies a single-threaded addition to the bloom filter.
> >   1. Even with 5 threads, it takes an hour for the full alexa 1m, so
> I
> >   think this will impact performance
> >   2. There's not a way to specify how to merge across threads if we
> do
> >   make a multithread command line option
> >2. This restricts these kinds of operations to roles with heavy unix
> CLI
> >knowledge, which isn't often the types of people who would be doing
> this
> >type of operation
> >3. What if we need two variables passed to stellar?
> >4. This approach will be harder to move to Hadoop.  Eventually we will
> >want to support data on HDFS being processed by Hadoop (similar to
> > flatfile
> >loader), so instead of -m LOCAL being passed for the flatfile
> summarizer
> >you'd pass -m SPARK and the processing would happen on the cluster
> >   1. This is particularly relevant in this case as it's a
> >   embarrassingly parallel problem in general
> >
> > In summary, while this a CLI approach is attractive, I prefer the
> extractor
> > config solution because it is the solution with the smallest iteration
> > that:
> >
> >1. Reuses existing metron extraction infrastructure
> >2. Provides the most solid base for the extensions that will be sorely
> >needed soon (and will keep it in parity with the flatfile loader)
> >3. Provides the most solid base for a future UI extension in the
> >management UI to support both summarization and loading
> >
> >
> >
> >
> > On Tue, Dec 26, 2017 at 11:27 AM, Nick Allen <n...@nickallen.org> wrote:
> >
> > > First off, I really do like the typosquatting use case and a lot of
> what
> > > you have described.
> > >
> > > > We need a way to generate the summary sketches from flat data for
> this
> > to
> > > > work.
> > > > ..
> > > >
> > >
> > > I took this quote directly from your use case.  Above is the point that
> > I'd
> > > like to discuss and what your proposed solutions center on.  This is
> > what I
> > > think you are trying to do, at least with PR #879
> > > <https://github.com/apache/metron/pull/879>...
> > >
> > > (Q) Can we repurpose Stellar functions so that they can operate on text
> > > stored in a file system?
> > >
> > >
> > > Whether we use the (1) Configuration or the (2) Function-based approach
> > > that you described, fundamentally we are intr
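
On the concern in point 1.2 above (no way to specify how to merge across threads): Bloom filters built with the same size and the same hash functions can be combined with a bitwise OR, so each worker can fill its own filter and the partial results can be merged at the end. A minimal Python sketch of that idea follows. It is illustrative only and is not Metron's or Stellar's implementation; the `BloomFilter` class, `build_filter`, `build_parallel`, and the sizing parameters are hypothetical names chosen for this example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class BloomFilter:
    """Toy Bloom filter; a Python int serves as the bit set (hypothetical helper)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        # Merging is only meaningful when both filters share their parameters.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def build_filter(domains):
    """Build one filter from a list of domains (single worker)."""
    bloom = BloomFilter()
    for domain in domains:
        bloom.add(domain)
    return bloom


def build_parallel(domains, workers=4):
    """Split the input, build one filter per worker, then OR them together."""
    chunks = [domains[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(build_filter, chunks))
    merged = partials[0]
    for partial in partials[1:]:
        merged = merged.merge(partial)
    return merged
```

The merge is order-independent, so the same step would apply whether the partial filters come from local threads or from tasks on a cluster. Threads are used here only to show the merge; they are not expected to speed up CPU-bound hashing in Python.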

Re: [DISCUSS] Merging Solr feature branch (METRON-1416) into master

2018-06-21 Thread Casey Stella
I think that we should merge now, but I’m perhaps biased since I did one of
the hard merges. I think that since the major outstanding bug is being
worked and we are otherwise feature complete, the feature branch did its
job and we are ready to merge.
On Thu, Jun 21, 2018 at 10:21 Justin Leet  wrote:

> Hi All,
>
> The Solr branch (/feature/METRON-1416-upgrade-solr) has been progressing
> for a while now.  I'd like to open up discussion around what it takes to
> get it into master.
>
> The JIRA for tracking this feature branch is METRON-1416.
>
> As shown in the JIRA, the majority of tasks are complete, with a few
> outstanding issues. Of these, I believe these are the main ones of interest
> to this discussion.
>
>- METRON-1629 - There is an active PR #1072.
>- METRON-1609 - There is an active PR #1056.
>- METRON-1602 - Full dev can run with Solr without this, it would simply
>be more convenient.
>- METRON-1632 - Causes a metaalert specific issue where UI filtering on
>source.type:metaalert fails. More detail is on the Jira.
>- Two validation tickets.  It's been run up on multinode, and manual
>testing has happened (and I'm sure will be seen a bit more on the final PR
>by various reviewers), so I'm inclined to just leave these open until
>we're good to go.  Let me know if we want to handle this differently.
>
> I'm of the opinion both of the active PRs need to be merged before we merge
> this into master, especially the documentation one.  The other two tickets
> can be done in the future; one can be worked around and one is a metaalert
> specific issue that primarily affects the alerts UI.
>
> As the branch has grown and diverged from master, it's gotten increasingly
> unwieldy to maintain (and I think it's worth a follow-on discussion about
> how we manage refactorings that happen in these sorts of branches).  I know
> there's been at least a couple merges from master that have been
> nontrivially difficult and required careful testing, particularly around
> the DAO layer, to avoid regressions in both code and tests.
>
> The feature set is pretty complete.  The UI works, barring the metaalert
> issue.  Much of the backend has been refactored and seen improved test
> coverage benefiting both Solr and Elasticsearch.  The main difference
> between ES and Solr is the lack of the equivalent visualizations to
> Kibana.  I don't believe the feature branch needs to wait for this, as it's
> pretty standalone work that can be added as usage and demand dictates.
>
> I'm of the opinion that the benefits of getting the branch into master
> outweigh the issues still present, especially in terms of making
> refactoring and features available and easing the dev burden.  The
> remaining tickets are Solr specific, and ES functions as it does in master.
>
> Are there any must-haves before we bring this branch back?  Are there any
> other concerns we have before a final PR is opened (pending completion of
> active PRs and any other must-haves)?
>
> Justin
>


Re: [DISCUSS] Metron Release 0.6.0?

2018-08-15 Thread Casey Stella
+1 to both releases, this is plenty for an 0.6.0 and a 0.2.0

On Wed, Aug 15, 2018 at 11:04 AM Justin Leet  wrote:

> I just sent a thread about release cadence. Jon, I'd recommend starting a
> thread on a 1.0 roadmap.  I thought about merging the threads, but I think
> that's just going to result in more crosstalk, so I'll let you start that
> conversation.
>
> On Wed, Aug 15, 2018 at 10:37 AM Nick Allen  wrote:
>
> > +1 to a 0.6.0 release that includes the Pcap Panel and Solr work.
> >
> > +1 to doing a 0.2.0 release for metron-bro-plugin-kafka.  I *think* we
> need
> > to do the plugin release first, so that the 0.6.0 Metron release will
> point
> > to plugin 0.2.0.
> >
> > FWIW, here are the changes since the last release.
> >
> > 6 days ago METRON-1730: Update steps to run pycapa on Centos 6 (mmiklavc
> > via mmiklavc) closes apache/metron#1152
> > 2 weeks ago METRON-1701 Update General notes on the installation of
> Pycapa
> > on Kerberized cluster (MohanDV via nickwallen) closes apache/metron#1136
> > 3 weeks ago METRON-1650 Packaging docker containers are too large
> > (jameslamb via merrimanr) closes apache/metron#1091
> > 3 weeks ago METRON-1604 : Add RHEL 7 power pc to OS family for the HCP
> > management pack repo info closes apache/incubator-metron#1052
> > 3 weeks ago METRON-1687: Upgrade the rat plugin to 0.13-SNAPSHOT closes
> > apache/incubator-metron#1126
> > 3 weeks ago METRON-1694: Clean up Metron REST docs closes
> > apache/incubator-metron#1131
> > 4 weeks ago METRON-1606 Add a wrap to incoming messages in
> the
> > metron json parser (ottobackwards) closes apache/metron#1054
> > 4 weeks ago METRON-1672 Add metron-alertss UI unit tests to travis
> > build process (justinleet) closes apache/metron#1106
> > 4 weeks ago METRON-1684 Fix Markdown problems in 3rdPartyParser.md
> > (justinleet) closes apache/metron#1110
> > 4 weeks ago METRON-1657 Parser aggregation in storm (justinleet) closes
> > apache/metron#1099
> > 4 weeks ago METRON-1651 Fixing failing protractor e2e test (tiborm via
> > merrimanr) closes apache/metron#1095
> > 4 weeks ago METRON-1673 Fix Javadoc errors (justinleet) closes
> > apache/metron#1107
> > 4 weeks ago METRON-1620: Fixes for forensic clustering use case example
> > (mmiklavc via mmiklavc) closes apache/metron#1065
> > 4 weeks ago METRON-1659: The platform-info.sh should check for the
> vagrant
> > hostmanager plugin closes apache/incubator-metron#1100
> > 4 weeks ago METRON-1658: Upgrade bro to 2.5.4 closes
> > apache/incubator-metron#1101
> > 4 weeks ago METRON-1236 Add start/stop/restart commands that execute
> > successfully, when ambari agents run as non-root user closes
> > apache/incubator-metron#1105
> > 4 weeks ago METRON-1670: Stellar WEEK_OF_YEAR test is locale sensitive
> > closes apache/incubator-metron#1104
> > 5 weeks ago METRON-1660 On Solr, sorting by threat score fails
> (justinleet)
> > closes apache/metron#1102
> > 5 weeks ago METRON-1656 Create KAKFA_SEEK function (nickwallen) closes
> > apache/metron#1097
> > 5 weeks ago METRON-1644: Support parser chaining closes
> > apache/incubator-metron#1084
> > 5 weeks ago METRON-1655 Make REGEXP_MATCH take multiple regexs in the 2nd
> > arg (ottobackwards) closes apache/metron#1098
> > 6 weeks ago METRON-1643: Create a REGEX_ROUTING field transformation
> closes
> > apache/incubator-metron#1083
> > 6 weeks ago METRON-1652 Document X-Pack Common Problem (nickwallen)
> closes
> > apache/metron#1092
> > 6 weeks ago METRON-1649 Intermittent Test Failure
> > ProfileBuilderBoltTest#testFlushExpiredProfiles
> > (nickwallen) closes apache/metron#1090
> > 6 weeks ago METRON-1635 Alerts UI status update doesn't immediately
> > show up (merrimanr) closes apache/metron#1080
> > 6 weeks ago METRON-1642: KafkaWriter should be able choose the topic
> from a
> > field in addition to topology construction time closes
> > apache/incubator-metron#1082
> > 6 weeks ago METRON-1636: Fix broken unit test setup in metron-alerts
> closes
> > apache/incubator-metron#1085
> > 7 weeks ago METRON-1631 Alerts UI: Dash score does not show if only
> > filtering by one group (sardell via merrimanr) closes apache/metron#1079
> > 7 weeks ago METRON-1647 Fix logging level score closes
> > apache/incubator-metron#1089
> > 7 weeks ago METRON-1621: Sorting alerts table by score closes
> > apache/incubator-metron#1088
> > 7 weeks ago METRON-1619: Stellar empty collections should be considered
> > false in boolean expressions closes apache/incubator-metron#1064
> > 7 weeks ago METRON-1646 Sensor Stubs should work when kerberized
> > (nickwallen) closes apache/metron#1087
> > 7 weeks ago METRON-1645: Check wether the Solr management pack is
> installed
> > before configuring the solr principal name. closes
> > apache/incubator-metron#1086
> > 7 weeks ago Merge branch 'master' into feature/METRON-1416-upgrade-solr
> > 7 weeks ago METRON-1634 Alerts UI add comment doesn't immediately
> show
> > up. (merrimanr) closes apache/metron#1077
> > 7 

Re: [DISCUSS] Release cadence

2018-08-15 Thread Casey Stella
If you like, I can volunteer to kick off a discuss thread when I submit the
board report.

On Wed, Aug 15, 2018 at 2:21 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I'm also a fan of the 2-3 month time frame for releases. And I agree it
> fits nicely with our board report. That said, I think we should minimally
> kick off a DISCUSS at least every 2 months per the recommendations above.
> If it's warranted, great. If not, then we bring it up at a stated later
> time for re-evaluation.
>
> Fwiw, some upcoming features post-0.6.0 that I'm seeing which are also
> large-ish and will fit nicely into the next cycle (pending completion, of
> course):
>
>1. NiFi Metron parsers
>2. Profiler enhancements - bootstrapping, etc.
>3. Knox SSO
>
>
>
> On Wed, Aug 15, 2018 at 11:10 AM Casey Stella  wrote:
>
> > Strictly selfishly, I'd love for a release to happen quickly enough to
> have
> > something to announce to the board during the reports.  Once every 2
> months
> > or when a sufficiently complicated change happens sounds like a sensible
> > cadence.
> >
> > I very much support a "how do we get to 1.0" discussion, maybe as a
> > separate thread?
> >
> > On Wed, Aug 15, 2018 at 11:56 AM zeo...@gmail.com 
> > wrote:
> >
> > > I'm a fan of a hybrid time/feature-based cadence.  Something like
> "When 3
> > > months have passed since our last release, or a sufficiently complicated
> > > change has been introduced to master (like merging a FB), a discuss
> > thread
> > > is started".  I'm primarily thinking of what the upgrade path looks
> like
> > > (more on that in a "how do we get to 1.0" discuss).
> > >
> > > Jon
> > >
> > > On Wed, Aug 15, 2018 at 11:02 AM Justin Leet 
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > In concert with the discuss thread on a potential 0.6.0 release, I'd
> > also
> > > > like to start a discussion about our release cadence.  We've generally
> > been
> > > > pretty relaxed around doing releases, and I'm curious what people's
> > > > thoughts are on adopting a somewhat more regular schedule.
> > > >
> > > > Couple questions I think are relevant
> > > > 1. Is this something we should work towards and, if we do, how do we
> > want
> > > > to go about it?
> > > >
> > > >- "Whenever someone feels like pushing out a discuss thread"?
> > > >- "Let's just start a discuss thread every X and if we want to
> > release
> > > >we release"?
> > > >- "let's try to get a release out every X and what's on the bus is
> > on
> > > >the bus"?
> > > >- Something else?
> > > >
> > > > 2. Assuming we do want to do more regular releases, what's the
> > timeframe
> > > > we'd like to shoot for?
> > > >
> > > > Personally, I'd like to just start a discuss thread regularly, with
> the
> > > > built-in expectation that not every thread should necessarily lead
> to a
> > > > release. I don't want to be forcing release overhead when there's not
> > > > enough to merit a release, but releasing more often than we often do
> > now
> > > > would provide a lot of value to users.
> > > >
> > > > In terms of timeframe, I tend to think a 2-3 month cadence for the
> > > threads
> > > > is reasonable. It's long enough to potentially accrue enough features
> > to
> > > > merit a release, but short enough that when we pass on a release
> we're
> > > > probably fine just waiting for another cycle to come around.  The
> last
> > > > release was ~2 months ago and we have a good amount of stuff here,
> but
> > I
> > > > also don't expect two feature branches going in to be the norm.
> > > >
> > > > I'd expect whatever comes out of this thread to also be relatively
> > > > informal. At least right now, I don't feel like we need a rigid
> > schedule,
> > > > and I'd still like people to feel encouraged to propose a release,
> > > > particularly when there are a couple major features or critical
> fixes.
> > > > Alternatively, I would expect some of these discuss threads to
> > conclude,
> > > > "We should do a release, but let's wait a couple weeks for these
> > tickets
> > > to
> > > > finish up" (e.g. like the Pcap query panel).
> > > >
> > > > Justin
> > > >
> > > --
> > >
> > > Jon
> > >
> >
>


Re: [DISCUSS] Release cadence

2018-08-15 Thread Casey Stella
Strictly selfishly, I'd love for a release to happen quickly enough to have
something to announce to the board during the reports.  Once every 2 months
or when a sufficiently complicated change happens sounds like a sensible
cadence.

I very much support a "how do we get to 1.0" discussion, maybe as a
separate thread?

On Wed, Aug 15, 2018 at 11:56 AM zeo...@gmail.com  wrote:

> I'm a fan of a hybrid time/feature-based cadence.  Something like "When 3
> months have passed since our last release, or a sufficiently complicated
> change has been introduced to master (like merging a FB), a discuss thread
> is started".  I'm primarily thinking of what the upgrade path looks like
> (more on that in a "how do we get to 1.0" discuss).
>
> Jon
>
> On Wed, Aug 15, 2018 at 11:02 AM Justin Leet 
> wrote:
>
> > Hi all,
> >
> > In concert with the discuss thread on a potential 0.6.0 release, I'd also
> > like to start a discussion about our release cadence.  We've generally been
> > pretty relaxed around doing releases, and I'm curious what people's
> > thoughts are on adopting a somewhat more regular schedule.
> >
> > Couple questions I think are relevant
> > 1. Is this something we should work towards and, if we do, how do we want
> > to go about it?
> >
> >- "Whenever someone feels like pushing out a discuss thread"?
> >- "Let's just start a discuss thread every X and if we want to release
> >we release"?
> >- "let's try to get a release out every X and what's on the bus is on
> >the bus"?
> >- Something else?
> >
> > 2. Assuming we do want to do more regular releases, what's the timeframe
> > we'd like to shoot for?
> >
> > Personally, I'd like to just start a discuss thread regularly, with the
> > built-in expectation that not every thread should necessarily lead to a
> > release. I don't want to be forcing release overhead when there's not
> > enough to merit a release, but releasing more often than we often do now
> > would provide a lot of value to users.
> >
> > In terms of timeframe, I tend to think a 2-3 month cadence for the
> threads
> > is reasonable. It's long enough to potentially accrue enough features to
> > merit a release, but short enough that when we pass on a release we're
> > probably fine just waiting for another cycle to come around.  The last
> > release was ~2 months ago and we have a good amount of stuff here, but I
> > also don't expect two feature branches going in to be the norm.
> >
> > I'd expect whatever comes out of this thread to also be relatively
> > informal. At least right now, I don't feel like we need a rigid schedule,
> > and I'd still like people to feel encouraged to propose a release,
> > particularly when there are a couple major features or critical fixes.
> > Alternatively, I would expect some of these discuss threads to conclude,
> > "We should do a release, but let's wait a couple weeks for these tickets
> to
> > finish up" (e.g. like the Pcap query panel).
> >
> > Justin
> >
> --
>
> Jon
>


Re: Slack Channel

2018-08-15 Thread Casey Stella
Sorry Simon, I retract the comment!  I didn't realize it was possible, but
it is possible to invite.

On Wed, Aug 15, 2018 at 1:01 PM Casey Stella  wrote:

> Sadly, it's the ASF slack and I believe it requires an @apache.org email
> address.
>
> On Wed, Aug 15, 2018 at 12:57 PM Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
>> Hello dev team, may I please join your slack channel :)
>>
>


Re: [DISCUSS] Pcap query branch completion

2018-08-16 Thread Casey Stella
I'm +1 on the merge. This is great work and congrats to those who
contributed to it!

On Thu, Aug 16, 2018 at 8:27 AM Otto Fowler  wrote:

> Looks good, thanks!
>
>
> On August 15, 2018 at 19:38:12, Ryan Merriman (merrim...@gmail.com) wrote:
>
> Otto, I believe the items you requested are in the feature branch now. Is
> there anything outstanding that we missed? The Jiras for the Pcap feature
> branch should be up to date:
> https://issues.apache.org/jira/browse/METRON-1554
>
> On Mon, Aug 13, 2018 at 5:13 PM, Ryan Merriman 
> wrote:
>
> > - Date range limits on queries
> >
> > I will add a warning in the Job cleanup PR. That seems like an
> > appropriate place for it (ie. make sure you don't cause health issues in
> > your cluster).
> >
> > - UI should manage a queue/history of jobs
> >
> > I can add some documentation around killing jobs manually with the YARN
> > CLI. However if they haven't set up a YARN queue, I'm not sure how you
> > would view only Pcap jobs. I'm also not sure how you would get the
> > application id for the job to kill because it's not displayed anywhere in
> > the UI. However, I believe we are wired for a job name but REST doesn't
> > set this. Maybe we could get a proper job name associated with pcap
> > queries and then this would be possible to document?
> >
> > - Documentation/blueprint for YARN configuration
> >
> > You make a good point. A YARN tuning guide for Metron does sound useful.
> > I will add a follow on Jira.
> >
> > On Mon, Aug 13, 2018 at 4:53 PM, Otto Fowler 
> > wrote:
> >
> >>
> >> - Date range limits on queries
> >>
> >> I took the point the wrong way apparently, sorry, I withdraw. I thought
> >> you meant allow specifying a limit on the query, not the system imposing
> a
> >> limit.
> >> This should be documented with a warning or something
> >>
> >> - UI should manage a queue/history of jobs
> >>
> >> I was thinking that if there were multiple users/jobs, there should
> >> be some thought or documentation + script on how to manage them.
> >> “To see all the jobs still running on your cluster, across users and ui
> >> instances do X”
> >> “If there is an issue with the jobs you can’t resolve in the UI for that
> >> user, or you are an admin and want to do something then X"
> >>
> >> - Documentation/blueprint for YARN configuration
> >>
> >> I agree with what you are saying. Although, we offer guidance on storm
> >> tuning, and that is conceptually the same isn’t it? That is why it comes
> >> to mind.
> >> Maybe this can be a follow on, in the tuning guide?
> >>
> >> On August 13, 2018 at 17:36:41, Ryan Merriman (merrim...@gmail.com)
> >> wrote:
> >>
> >> - Date range limits on queries
> >>
> >> Can you describe what you think is needed here? Each Metron user could
> >> have different volumes of pcap data spread out over different time
> >> periods. Are you saying we should limit the data range to something
> either
> >>
> >> constant or configurable? Are we sure all users would want this? Am I
> >> misinterpreting this requirement?
> >>
> >> - UI should manage a queue/history of jobs
> >>
> >> What should we document here? Reading that bullet point again, it's sort
> >> of vague and not very descriptive. What I am referring to is a design
> that
> >>
> >> provides users a way to view and manage jobs in the UI. Currently jobs
> can
> >>
> >> only be run 1 at a time and progress is shown with a status bar, so it's
> >> somewhat interactive.
> >>
> >> - Documentation/blueprint for YARN configuration
> >>
> >>
> >
>


Re: package.lock changes during build?

2018-08-25 Thread Casey Stella
Yeah, that's what I thought too, but I wonder if it triggers a change if
there's a dependency that is not version locked (i.e. the most recent
version of dependency x moved from y to z).

On Sat, Aug 25, 2018 at 11:52 AM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Somewhere along the line the dependencies appear to have changed, but the
> file never got checked in. I don't like that this part of our build also
> seems to be non-deterministic. If I build metron 0.4.x today, for instance,
> what will I get? If the answer is "who knows?" that's unacceptable, imo.
> I've glanced at the package file and see carets littering the
> dependencies, which as I understand it means "get me anything later than
> this version." I do not think we should be doing that.
>
>
> On Sat, Aug 25, 2018, 9:14 AM Casey Stella  wrote:
>
> > I have looked into this for other reasons and the guidance that I've seen
> > is to check in package-lock.json into source control.  I'll leave this
> > stack overflow thread here:
> >
> >
> https://stackoverflow.com/questions/44206782/do-i-commit-the-package-lock-json-file-created-by-npm-5
> >
> > I want to point out that I hate that this changes as part of the build.
> I
> > haven't gotten a complete handle on exactly why package-lock is changing
> > seemingly non-deterministically yet.
> >
> > Casey
> >
> > On Sat, Aug 25, 2018 at 11:05 AM Nick Allen  wrote:
> >
> > > Yes, I have noticed that also, but have not looked deeper.
> > >
> > > On Sat, Aug 25, 2018 at 10:32 AM Otto Fowler 
> > > wrote:
> > >
> > > > I just did a PR, and saw that the package.lock file for alerts-ui was
> > > > changed, with updated versions.
> > > > I did *not* change the file, nor anything in metron-interface. That
> > seems
> > > > to imply that this file is changed or updated by
> > > > something that happens during building or deploying full dev.
> > > >
> > > > Is this true?  How does this work?  Is this on purpose?
> > > >
> > > > ottO
> > > >
> > >
> >
>


Re: package.lock changes during build?

2018-08-25 Thread Casey Stella
I have looked into this for other reasons and the guidance that I've seen
is to check in package-lock.json into source control.  I'll leave this
stack overflow thread here:
https://stackoverflow.com/questions/44206782/do-i-commit-the-package-lock-json-file-created-by-npm-5

I want to point out that I hate that this changes as part of the build.  I
haven't gotten a complete handle on exactly why package-lock is changing
seemingly non-deterministically yet.

Casey

On Sat, Aug 25, 2018 at 11:05 AM Nick Allen  wrote:

> Yes, I have noticed that also, but have not looked deeper.
>
> On Sat, Aug 25, 2018 at 10:32 AM Otto Fowler 
> wrote:
>
> > I just did a PR, and saw that the package.lock file for alerts-ui was
> > changed, with updated versions.
> > I did *not* change the file, nor anything in metron-interface. That seems
> > to imply that this file is changed or updated by
> > something that happens during building or deploying full dev.
> >
> > Is this true?  How does this work?  Is this on purpose?
> >
> > ottO
> >
>


Re: package.lock changes during build?

2018-08-25 Thread Casey Stella
Agreed! Great insight Shane!
On Sat, Aug 25, 2018 at 16:00 Michael Miklavcic 
wrote:

> You sir, are a gentleman and a scholar! Thanks for the background info, the
> current state of affairs, the controversy, and finally (most of all) the
> fix.
>
> On Sat, Aug 25, 2018, 12:52 PM Shane Ardell 
> wrote:
>
> > NPM's use of lock files has been quite controversial. I won't go into it
> > too deep here as there are endless posts criticizing and justifying their
> > approach, but `npm install` will install all modules listed as
> dependencies
> > in package.json and update package-lock.json accordingly instead of
> > referencing the lock file. This caused a lot of outrage in the community
> (I
> > would argue rightfully so), which led to a compromise in release 5.7.1
> with
> > `npm ci`. This command installs exactly what is specified in the
> > package-lock.json.
> >
> >
> https://blog.npmjs.org/post/171556855892/introducing-npm-ci-for-faster-more-reliable
> >
> > Metron's build currently uses `npm install`, which is why we are seeing
> the
> > package-lock.json update whenever we build locally. Coincidentally, I
> just
> > addressed this by switching to `npm ci` in an open PR of mine because I
> > noticed the same happening locally and I was already updating npm
> commands
> > in the pom.xml.
> >
> >
> https://github.com/apache/metron/pull/1096/files#diff-e8f55f2d9e4f18085052a36d750e9648L60
> >
> >
> >
> > On Sat, Aug 25, 2018 at 7:13 PM Casey Stella  wrote:
> >
> > > Yeah, that's what I thought too, but I wonder if it triggers a change
> if
> > > there's a dependency that is not version locked (i.e. the most recent
> > > version of dependency x moved from y to z).
> > >
> > > On Sat, Aug 25, 2018 at 11:52 AM Michael Miklavcic <
> > > michael.miklav...@gmail.com> wrote:
> > >
> > > > Somewhere along the line the dependencies appear to have changed, but
> > the
> > > > file never got checked in. I don't like that this part of our build
> > also
> > > > seems to be non-deterministic. If I build metron 0.4.x today, for
> > > instance,
> > > > what will I get? If the answer is "who knows?" that's unacceptable,
> > imo.
> > > > I've glanced at the package file and see carets littering the
> > > > dependencies, which as I understand it means "get me anything later
> > than
> > > > this version." I do not think we should be doing that.
> > > >
> > > >
> > > > On Sat, Aug 25, 2018, 9:14 AM Casey Stella 
> wrote:
> > > >
> > > > > I have looked into this for other reasons and the guidance that
> I've
> > > seen
> > > > > is to check in package-lock.json into source control.  I'll leave
> > this
> > > > > stack overflow thread here:
> > > > >
> > > > >
> > > >
> > >
> >
> https://stackoverflow.com/questions/44206782/do-i-commit-the-package-lock-json-file-created-by-npm-5
> > > > >
> > > > > I want to point out that I hate that this changes as part of the
> > build.
> > > > I
> > > > > haven't gotten a complete handle on exactly why package-lock is
> > > changing
> > > > > seemingly non-deterministically yet.
> > > > >
> > > > > Casey
> > > > >
> > > > > On Sat, Aug 25, 2018 at 11:05 AM Nick Allen 
> > > wrote:
> > > > >
> > > > > > Yes, I have noticed that also, but have not looked deeper.
> > > > > >
> > > > > > On Sat, Aug 25, 2018 at 10:32 AM Otto Fowler <
> > > ottobackwa...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I just did a PR, and saw that the package.lock file for
> alerts-ui
> > > was
> > > > > > > changed, with updated versions.
> > > > > > > I did *not* change the file, nor anything in metron-interface.
> > That
> > > > > seems
> > > > > > > to imply that this file is changed or updated by
> > > > > > > something that happens during building or deploying full dev.
> > > > > > >
> > > > > > > Is this true?  How does this work?  Is this on purpose?
> > > > > > >
> > > > > > > ottO
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
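
For background on the carets mentioned in this thread: in npm's semver range syntax, a caret range such as `^1.2.3` does not accept arbitrary later versions. It accepts versions at or above the stated one that do not change the left-most non-zero component. The Python sketch below illustrates that rule for plain `MAJOR.MINOR.PATCH` versions. It is a simplified illustration, not npm's implementation: the function names are hypothetical, and pre-release tags and partial versions are not handled.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_upper_bound(base):
    """Exclusive upper bound: bump the left-most non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def satisfies_caret(version, caret_range):
    """Return True if `version` falls inside a range written as '^X.Y.Z'."""
    base = parse(caret_range.lstrip("^"))
    candidate = parse(version)
    return base <= candidate < caret_upper_bound(base)


if __name__ == "__main__":
    for candidate in ("1.2.2", "1.2.3", "1.9.0", "2.0.0"):
        print(candidate, satisfies_caret(candidate, "^1.2.3"))
```

Under this rule a build that resolves ranges at install time can pick up a newer 1.x release than the one last recorded in package-lock.json, whereas installing strictly from the lock file avoids that drift.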


Re: [DISCUSS] Getting to a 1.0 release

2018-08-20 Thread Casey Stella
I completely agree, Mike.  Our docs are either very high level or very low
level (and possibly stale) and, worse, aren't aimed at the actors that
you've stated.
I think that the HBase project does a good job of providing coherent and
useable documentation in their "HBase Book" (see
https://hbase.apache.org/book.html).
It's not actor-specific, but it is coherent advice for the practical
practitioner of HBase (both admin and developer) and speaks with one
voice.  I think Metron's need
is a bit different, but at the minimum some coherent docs that speak with
one voice and have a coherent pitch about what Metron is used for and what
it isn't used for
are much needed.

On Sat, Aug 18, 2018 at 1:00 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Apologies for any spelling mishaps as I'm writing from my phone.
>
> I'm for improving our docs. I'd like to see us guide our various profiles
> of user towards the specific documentation for the abstraction levels
> they'll be most interested in working from. I think we should have platform
> docs about how we're a broadly useful, extensible streaming analytics
> platform for cyber security as well as docs that emphasize more narrow and
> specific use cases.
>
> Personally, I think I see 3 potential tiers or classifications of docs.
> These are just observations and ideas I had, not necessarily a prescription
> for organizing docs:
> - Low level tool instructions, eg
> - how do I run the pcap topology and then query with the CLI and UI?
> - Platform docs about building on top of Metron, e.g.
> - writing custom parsers, enrichment, and threat Intel (imho we should
> start to take a more opinionated view of leveraging Stellar as this
> extension point rather than implementing new parser classes in Java)
> - using the profiler for constructing outlier analysis use cases
> - using MAAS for building and deploying models for use in enrichment
> - Docs around more specific use cases that solve specific as opposed to
> more general problems, similar to those we have in the use-cases folder.
>
> I think one of our challenges currently is that our docs could be better
> tailored to the "actors" we've talked about in the past. An individual SOC
> analyst will have a very different set of interests than would a reseller
> that wants to build on top of our platform to expose new modules and
> functionality to those SOC analysts. And we can, and do, currently support
> both.
>
>
> On Sat, Aug 18, 2018, 9:34 AM Nick Allen  wrote:
>
> > Yes, I imagine just a separate top level directory which would contain
> the
> > docs.
> >
> > We would need someone to survey what doc tools are out there and provide
> > some advice.
> >
> > Maybe we could look around at other open source projects that have done
> > their docs well and emulate them.
> >
> > On Sat, Aug 18, 2018, 10:57 AM Kyle Richardson <
> kylerichards...@gmail.com>
> > wrote:
> >
> > > +1 to separating developer docs and user docs. How should we approach
> > that.
> > > Have a separate doc book? I haven’t had a ton of time to contribute to
> > code
> > > lately but I’d be happy to help write some of these.
> > >
> > > On Sat, Aug 18, 2018 at 9:48 AM Nick Allen  wrote:
> > >
> > > > Personally, I think the state of our docs and web presence is an
> > > inhibitor
> > > > to growing the Metron community.  Unless we can offer concise,
> > compelling
> > > > answers to the basic questions (What can I do with Metron?  Who does
> it
> > > > help? How do I do that?), potential users and contributors are unable
> > to
> > > > see the value of Metron.
> > > >
> > > >
> > > >
> > > > On Sat, Aug 18, 2018 at 9:42 AM, Nick Allen 
> > wrote:
> > > >
> > > > > I'd like to see us focus on improving our docs before a version
> 1.0.
> > > > > Right now we just stitch together a bunch of READMEs, which is a
> > great
> > > > > stride from where we started, but is not ideal.
> > > > >
> > > > > Our docs should be focused on the user and use cases: What can I do
> with
> > > > > Metron?  Who does it help? How do I do that?
> > > > >
> > > > > The docs should be separate from the code base to allow for an
> > > > > organization that is focused on the user rather than the
> > > implementation.
> > > > > This allows the READMEs to focus on the developer and the
> > > implementation,
> > > > > which should make them more digestible too.  The docs should be
> > version
> > > > > controlled and maintained through PRs, just like the code.  We
> should
> > > > take
> > > > > just as much pride in our docs as we do in our code.
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Aug 15, 2018 at 4:35 PM, Simon Elliston Ball <
> > > > > si...@simonellistonball.com> wrote:
> > > > >
> > > > >> Agreed, should we add TDE by default, and get the ranger policies
> on
> > > by
> > > > >> default? That leaves secured in Kafka, which would have to be
> built
> > > into
> > > > >> the consumers and producers to encrypt into the on disk Kafka
> > 

Re: [DISCUSS] Contributing a General Purpose Regex Parser

2018-08-29 Thread Casey Stella
+1, I look forward to the PR.

On Tue, Aug 28, 2018 at 8:37 AM Nick Allen  wrote:

> I'd love to see a PR for this.  I know there are others in the community
> looking for something similar.
>
> On Sun, Aug 26, 2018 at 7:28 PM  wrote:
>
> > Hello,
> >
> >
> >
> > We have implemented a general purpose regex parser for Metron that we are
> > interested in contributing back to the community.
> >
> >
> >
> > While the Metron Grok parser provides some regex based capability today,
> > the intention of this general purpose regex parser is to:
> >
> >1. Allow for more advanced parsing scenarios (specifically, dealing
> with
> >multiple regex lines for devices that contain several log formats
> within
> >them)
> >2. Give users and developers of Metron additional options for parsing
> >3. With the new parser chaining and regex routing feature available in
> >Metron, this gives some additional flexibility to logically separate a
> > flow
> >by:
> >   1. Regex routing to segregate logs at a device level and handle
> >   envelope unwrapping
> >   2. This general purpose regex parser to parse an entire device type
> >   that contains multiple log formats within the single device (for
> > example,
> >   RHEL logs)
> >
> >
> >
> > At a high level, the control flow is like this:
> >
> > 1. Identify the record type of the incoming raw message.
> >
> > 2. Find and apply the regular expression of the corresponding record type to
> > extract the fields (using named groups).
> >
> > 3. Apply the message header regex to extract the fields in the header
> part
> > of the message (using named groups).
> >
> >
> > The parser config uses the following structure:
> >
> >"recordTypeRegex":
> "(?(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"
> >
> >"messageHeaderRegex": "(?(?<=^<)
> >
> >
> \\d{1,4}(?=>)).*?(?(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?(?<=\\s).*?(?=\\s))
> > ",
> >
> >"fields": [
> >
> >   {
> >
> > "recordType": "kernel",
> >
> > "regex": ".*(?(?<=\\]|\\w\\:).*?(?=$))"
> >
> >   },
> >
> >   {
> >
> > "recordType": "syslog",
> >
> > "regex":
> >
> >
> ".*(?(?<=PID\\s=\\s).*?(?=\\sLine)).*(?(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?.*?(?=\")).*(?(?<=\").*?(?=$))"
> >
> >   }
> >
> > ]
> >
> >
> >
> > Where:
> >
> >    - recordTypeRegex is used to uniquely identify a record type. It
> >    takes a valid regular expression and may also contain named groups,
> which
> >would be extracted into fields.
> >- messageHeaderRegex is used to specify a regular expression to
> extract
> >fields from a message part which is common across all the messages
> (i.e,
> >syslog fields, standard headers)
> >- fields: json list of objects containing recordType and regex. The
> >expression that is evaluated is based on the output of the
> > recordTypeRegex
> >- Note: recordTypeRegex and messageHeaderRegex could be specified as
> >lists also (as a JSON array), where the list will be evaluated in
> order
> >until a matching regular expression is found.
> >
> >
> >
> >
> >
> > If there are no objections to having this type of Parser within Metron,
> we
> > will open a JIRA/PR for code review.
> >
> > *Jagdeep Singh*
> >
>
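
The three-step control flow described in the proposal above can be sketched roughly as follows. This is a minimal illustration in Python, not the contributed parser: the patterns, the group names (record_type, priority, timestamp, host, message), the sample log line, and the helper names first_match and parse are all assumptions made for the example. Python spells named groups as (?P<name>...), whereas Java uses (?<name>...).

```python
import re
from typing import Dict, List, Optional, Union

Patterns = Union[str, List[str]]

# Hypothetical configuration following the structure described in the thread.
# The patterns and group names are invented for this sketch.
CONFIG = {
    "recordTypeRegex": r"\s(?P<record_type>kernel|syslog)(?=\[|:)",
    "messageHeaderRegex": (
        r"^<(?P<priority>\d{1,4})>"
        r"(?P<timestamp>[A-Za-z]{3}\s{1,2}\d{1,2}\s\d{1,2}:\d{1,2}:\d{1,2})"
        r"\s(?P<host>\S+)"
    ),
    "fields": [
        {"recordType": "kernel", "regex": r"kernel(?:\[\d+\])?:\s(?P<message>.*)$"},
        {"recordType": "syslog", "regex": r"syslog(?:\[\d+\])?:\s(?P<message>.*)$"},
    ],
}


def first_match(patterns: Patterns, text: str) -> Optional[re.Match]:
    """Accept a single pattern or a list; try in order, return the first match."""
    if isinstance(patterns, str):
        patterns = [patterns]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match
    return None


def parse(raw: str, config: dict) -> Optional[Dict[str, Optional[str]]]:
    """Return the extracted fields, or None if no record type is recognized."""
    # Step 1: identify the record type of the incoming raw message.
    type_match = first_match(config["recordTypeRegex"], raw)
    if type_match is None:
        return None
    parsed = dict(type_match.groupdict())
    record_type = parsed["record_type"]

    # Step 2: apply the regex registered for that record type.
    for entry in config["fields"]:
        if entry["recordType"] == record_type:
            body = re.search(entry["regex"], raw)
            if body:
                parsed.update(body.groupdict())
            break

    # Step 3: apply the header regex shared by all record types.
    header = first_match(config["messageHeaderRegex"], raw)
    if header:
        parsed.update(header.groupdict())
    return parsed
```

Accepting either a string or a list in first_match mirrors the note that recordTypeRegex and messageHeaderRegex may be given as JSON arrays evaluated in order until one matches.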


Re: [DISCUSS] Feature branches post-merge

2018-09-06 Thread Casey Stella
I’d get rid of them.
On Thu, Sep 6, 2018 at 13:42 Michael Miklavcic 
wrote:

> What are we doing with feature branches once they're complete and merged
> into master? Is our expectation that we'll keep feature branches in
> perpetuity, or should we plan to do some house cleaning once they've been
> merged? I did a quick check of NiFi and Kafka and don't see much by way of
> feature branches in their repos. I see plenty of RC's in both the branches
> and tags listings, but nothing FB related. In previous discussions, we
> talked quite a bit about us "trailblazing here," so it may be that this is
> simply without much precedent and entirely for us to decide. I can
> definitely see value in maintaining them for future reference, as it does
> offer a nice bucket in which to collect the commits and discussion nicely,
> but I wanted to get others' thoughts.
>
> Best,
> Mike
>


Re: IRC Channel -> OPS?

2018-08-31 Thread Casey Stella
wait, I'm an op?  Coming up in the world!  Do we need this still?  I'm
currently afk, but will get to it tomorrow.

On Wed, Aug 29, 2018 at 4:23 PM Otto Fowler  wrote:

> Damn, I was hoping not.  It will never happen now
>
>
> On August 29, 2018 at 15:49:26, zeo...@gmail.com (zeo...@gmail.com) wrote:
>
> Isn't it Casey?
>
> Jon
>
> On Wed, Aug 29, 2018, 08:41 Otto Fowler  wrote:
>
> > Who has ops in the irc channel?
> > Can you pop in and set the topic to something like:
> > “There is an ASF slack with an active metron channel, please email
> > dev@metron.apache.org and request an invite”
> >
> --
>
> Jon
>


Re: [GitHub] metron issue #1188: METRON-1769: Script creation of a release candidate

2018-09-07 Thread Casey Stella
Mike, did you mean to reply to this on the dev list or were you aiming to
make this comment on the PR?  If you were aiming to make this comment on
the PR, then I think you need to go through github's UI.

On Fri, Sep 7, 2018 at 1:34 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Yeah, the Angular upgrade was the other bit that came to mind. Shane's PR
> for the Angular upgrade has the necessary +1's, but @nickwallen you had
> requested we hold off on that for this release (which I completely agree
> with). https://github.com/apache/metron/pull/1096
>
> On Fri, Sep 7, 2018 at 10:24 AM nickwallen  wrote:
>
> > Github user nickwallen commented on the issue:
> >
> > https://github.com/apache/metron/pull/1188
> >
> > > I'm assuming this always pulls HEAD from master to cut the release.
> > Do we need or desire any support for cutting a release from a non-HEAD
> > commit?
> >
> > It would be very useful to continue to merge PRs into master while a
> > release is being voted on.
> >
> > I had thought that @mattf-horton used to do the releases in such a way
> > that this was not a problem, but I could be wrong.
> >
> > For example, this morning I merged PR #1174 into master that I don't
> > necessarily want in the next release.  I didn't think about the potential
> > impact to the release if we have to cut a new RC.  Sorry about that
> > @justinleet .
> >
> >
> >
> >
> >
> >
> >
> >
> > ---
> >
>


Re: Security Feature Branch?

2018-07-12 Thread Casey Stella
I would support this being a feature branch.  It sounds like a valuable but
large contribution.

On Thu, Jul 12, 2018 at 10:51 AM Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> I've been doing some work on getting the Metron UIs and REST layers to work
> with Apache KnoxSSO, and LDAP authentication, to remove the need to store
> passwords in MySQL, allow AD integration, secure up our authentication
> points. I'm also working in a Knox service to allow the gateway to provide
> full SSL for the interfaces and avoid all the proxying and CORS things we
> have to worry about.
>
> This has ended up being a pretty chunky piece of work which involves very
> significant changes to the UIs, REST layer, and introduces Knox to the
> blueprint, as well as messing with the full-dev build scripts, and adding
> ansible roles.
>
> As such, in order to make it a bit more reviewable, would it be better to
> contribute it to a feature branch? It could arguably be broken into a
> series of PRs, but at least some parts of full dev would be broken between
> most of the logical steps, since it's all kinda co-dependent, so it's
> easier to look at as a unit.
>
> Simon
>


Re: Security Feature Branch?

2018-07-12 Thread Casey Stella
I added the feature branch: feature/METRON-1663-knoxsso

https://git-wip-us.apache.org/repos/asf?p=metron.git;a=shortlog;h=refs/heads/feature/METRON-1663-knoxsso

On Thu, Jul 12, 2018 at 11:13 AM Otto Fowler 
wrote:

> I think I understand what you are saying very very very well Simon.  I am
> not sure what would be different about your submittal from other submittals
> where that argument failed.
>
> On July 12, 2018 at 11:07:02, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> Agreed Otto, the challenge is that essentially each change cuts across
> dependencies in every component. I could break it down into the changes for
> making SSO work, and the changes for making it install, and the changes for
> making full-dev work, but that would mean violating our policy that testing
> should be done for each PR on full dev, hence the one PR one unit approach.
> Does that work, or do we want to review on the basis of a series of
> untestable bits, and then a final working build PR that pulls it together?
>
> Simon
>
> On 12 July 2018 at 16:00, Otto Fowler  wrote:
>
> > Our policy in the past on such things is to require that they are broken
> > into small reviewable chunks on a feature branch, even if the end to end
> > working version was more ‘usable’.
> >
> >
> >
> > On July 12, 2018 at 10:51:30, Simon Elliston Ball (
> > si...@simonellistonball.com) wrote:
> >
> > I've been doing some work on getting the Metron UIs and REST layers to
> work
> > with Apache KnoxSSO, and LDAP authentication, to remove the need to store
> > passwords in MySQL, allow AD integration, secure up our authentication
> > points. I'm also working in a Knox service to allow the gateway to
> provide
> > full SSL for the interfaces and avoid all the proxying and CORS things we
> > have to worry about.
> >
> > This has ended up being a pretty chunky piece of work which involves very
> > significant changes to the UIs, REST layer, and introduces Knox to the
> > blueprint, as well as messing with the full-dev build scripts, and adding
> > ansible roles.
> >
> > As such, in order to make it a bit more reviewable, would it be better to
> > contribute it to a feature branch? It could arguably be broken into a
> > series of PRs, but at least some parts of full dev would be broken
> between
> > most of the logical steps, since it's all kinda co-dependent, so it's
> > easier to look at as a unit.
> >
> > Simon
> >
> >
>
>
> --
> --
> simon elliston ball
> @sireb
>


Re: Master is failed in Travis

2018-01-22 Thread Casey Stella
This could be one of those intermittent test failures related to timing.
Specifically this:

test(org.apache.metron.rest.controller.SensorIndexingConfigControllerIntegrationTest)  Time elapsed: 0.064 sec  <<< FAILURE!
java.lang.AssertionError: Status expected:<404> but was:<200>
    at org.springframework.test.util.AssertionErrors.fail(AssertionErrors.java:54)
    at org.springframework.test.util.AssertionErrors.assertEquals(AssertionErrors.java:81)
    at org.springframework.test.web.servlet.result.StatusResultMatchers$10.match(StatusResultMatchers.java:664)
    at org.springframework.test.web.servlet.MockMvc$1.andExpect(MockMvc.java:171)
    at org.apache.metron.rest.controller.SensorIndexingConfigControllerIntegrationTest.test(SensorIndexingConfigControllerIntegrationTest.java:146)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
    at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
    at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:252)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:94)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
    at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:191)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)



On Mon, Jan 22, 2018 at 10:21 AM, Nick Allen  wrote:

> I had created this JIRA for the specific issue earlier this morning.  I
> have no idea why it is breaking and I am not currently looking into it.
> Definitely nothing to do with the most recent commit.
>
> https://issues.apache.org/jira/browse/METRON-1414
>
>
> On Mon, Jan 22, 2018 at 10:18 AM, Otto Fowler 
> wrote:
>
> > https://travis-ci.org/apache/metron/builds/330900667
> >
>


Re: [VOTE] Metron Release Candidate 0.4.2-RC2

2018-04-10 Thread Casey Stella
It seems that 0.4.2 never got released.  Is there a reason for this?

On Fri, Dec 22, 2017 at 9:40 PM Justin Leet <justinjl...@gmail.com> wrote:

> +1 validated with Otto's script
>
> * Checksums
> * Signatures
> * Build
> * Tests
>
> On Tue, Dec 19, 2017 at 11:47 PM, Anand Subramanian <
> asubraman...@hortonworks.com> wrote:
>
> >
> > * mvn clean package at root level
> > * mvn clean package -Pbuild-rpms at metron-deployment level and generate
> > RPMs
> > * Brought up Metron stack on 12-node CentOS 7 openstack cluster using the
> > generated RPMs
> > * Bro, YAF and snort - ingest into kafka topics and validated indices
> > * Add squid telemetry, ingest into kafka topic and validated indices
> > * Management UI, Alerts UI and Swagger UI sanity check
> >
> > +1 (non-binding)
> >
> >
> > Regards,
> > Anand
> >
> >
> >
> >
> > On 12/20/17, 3:11 AM, "Casey Stella" <ceste...@gmail.com> wrote:
> >
> > >+1 validated via Otto's script
> > >* Checksums
> > >* Sigs
> > >* Build
> > >* Full dev validation
> > >
> > >On Tue, Dec 19, 2017 at 2:45 PM, Nick Allen <n...@nickallen.org> wrote:
> > >
> > >> +1  I validated using Otto's great script.
> > >>
> > >> * Validated the list of changes
> > >> * Checksums
> > >> * Sigs
> > >> * Build
> > >> * Tests
> > >> * Full Dev
> > >>
> > >> On Tue, Dec 19, 2017 at 6:23 AM, Matt Foley <ma...@apache.org> wrote:
> > >>
> > >> > Colleagues,
> > >> > This is a call to vote on releasing Apache Metron 0.4.2 and its
> > >> associated
> > >> > metron-bro-plugin-kafka 0.1.0.
> > >> > The release candidate is available at
> https://dist.apache.org/repos/
> > >> > dist/dev/metron/0.4.2-RC2/
> > >> >
> > >> > Full list of changes in this release:
> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/CHANGES and
> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/
> > >> CHANGES.bro-plugin
> > >> >
> > >> > The github tags to be voted upon are:
> > >> > (apache/metron) apache-metron-0.4.2-rc2 and
> (apache/metron-bro-plugin-
> > >> kafka)
> > >> > 0.1
> > >> >
> > >> > The source archives being voted upon can be found here:
> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/
> > >> > apache-metron-0.4.2-rc2.tar.gz
> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/
> > >> > apache-metron-bro-plugin-kafka_0.1.0.tar.gz
> > >> >
> > >> > The site-book is at:
> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/
> > >> > site-book/index.html
> > >> >
> > >> > Other release files, signatures and digests can be found here:
> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/
> > >> >
> > >> > The release artifacts are signed with the following key:
> > >> > 4169 AA27 ECB3 1663 in https://dist.apache.org/repos/
> > >> > dist/dev/metron/0.4.2-RC2/KEYS
> > >> >
> > >> > Please vote on releasing this package as Apache Metron 0.4.2 and
> > Apache
> > >> > Metron-bro-plugin-kafka 0.1.0
> > >> >
> > >> > When voting, please list the actions taken to verify the release.
> > >> >
> > >> > Recommended build validation and verification instructions are
> posted
> > >> here:
> > >> > https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
> > >> > or you are encouraged to try the new release verification script
> that
> > >> Otto
> > >> > published via email on 11 Dec, available at
> > >> > https://github.com/ottobackwards/Metron-and-Nifi-
> > >> > Scripts/blob/master/metron/metron-rc-check
> > >> >
> > >> > This vote will be open until 9am PST on Friday 22 Dec 2017.
> > >> >
> > >> > Thank you,
> > >> > --Matt
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> >
>
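
Several voters above list "Checksums" among their verification steps. As a rough illustration of what such a check involves, the sketch below hashes a downloaded archive and compares the result with a published digest. The choice of SHA-512 and the function names file_digest and checksum_matches are assumptions for the example; the thread does not say which digest the release uses or how the verification script performs the check.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha512", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str, algorithm: str = "sha512") -> bool:
    """Compare against a published digest, ignoring whitespace and letter case."""
    normalized = "".join(expected_hex.split()).lower()
    return file_digest(path, algorithm) == normalized
```

A checksum only shows the download is intact; the signature check that voters list separately is what ties the artifact to the release manager's key.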


Re: [VOTE] Metron Release Candidate 0.4.2-RC2

2018-04-10 Thread Casey Stella
Never mind, it's just that the internal Apache release status wasn't updated.
Sorry, I updated it as part of the board report.  Let me make sure I update
the docs for releasing.

On Tue, Apr 10, 2018 at 10:35 AM Casey Stella <ceste...@gmail.com> wrote:

> It seems that 0.4.2 never got released.  Is there a reason for this?
>
> On Fri, Dec 22, 2017 at 9:40 PM Justin Leet <justinjl...@gmail.com> wrote:
>
>> +1 validated with Otto's script
>>
>> * Checksums
>> * Signatures
>> * Build
>> * Tests
>>
>> On Tue, Dec 19, 2017 at 11:47 PM, Anand Subramanian <
>> asubraman...@hortonworks.com> wrote:
>>
>> >
>> > * mvn clean package at root level
>> > * mvn clean package -Pbuild-rpms at metron-deployment level and generate
>> > RPMs
>> > * Brought up Metron stack on 12-node CentOS 7 openstack cluster using
>> the
>> > generated RPMs
>> > * Bro, YAF and snort - ingest into kafka topics and validated indices
>> > * Add squid telemetry, ingest into kafka topic and validated indices
>> > * Management UI, Alerts UI and Swagger UI sanity check
>> >
>> > +1 (non-binding)
>> >
>> >
>> > Regards,
>> > Anand
>> >
>> >
>> >
>> >
>> > On 12/20/17, 3:11 AM, "Casey Stella" <ceste...@gmail.com> wrote:
>> >
>> > >+1 validated via Otto's script
>> > >* Checksums
>> > >* Sigs
>> > >* Build
>> > >* Full dev validation
>> > >
>> > >On Tue, Dec 19, 2017 at 2:45 PM, Nick Allen <n...@nickallen.org>
>> wrote:
>> > >
>> > >> +1  I validated using Otto's great script.
>> > >>
>> > >> * Validated the list of changes
>> > >> * Checksums
>> > >> * Sigs
>> > >> * Build
>> > >> * Tests
>> > >> * Full Dev
>> > >>
>> > >> On Tue, Dec 19, 2017 at 6:23 AM, Matt Foley <ma...@apache.org>
>> wrote:
>> > >>
>> > >> > Colleagues,
>> > >> > This is a call to vote on releasing Apache Metron 0.4.2 and its
>> > >> associated
>> > >> > metron-bro-plugin-kafka 0.1.0.
>> > >> > The release candidate is available at
>> https://dist.apache.org/repos/
>> > >> > dist/dev/metron/0.4.2-RC2/
>> > >> >
>> > >> > Full list of changes in this release:
>> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/CHANGES
>> and
>> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/
>> > >> CHANGES.bro-plugin
>> > >> >
>> > >> > The github tags to be voted upon are:
>> > >> > (apache/metron) apache-metron-0.4.2-rc2 and
>> (apache/metron-bro-plugin-
>> > >> kafka)
>> > >> > 0.1
>> > >> >
>> > >> > The source archives being voted upon can be found here:
>> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/
>> > >> > apache-metron-0.4.2-rc2.tar.gz
>> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/
>> > >> > apache-metron-bro-plugin-kafka_0.1.0.tar.gz
>> > >> >
>> > >> > The site-book is at:
>> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/
>> > >> > site-book/index.html
>> > >> >
>> > >> > Other release files, signatures and digests can be found here:
>> > >> > https://dist.apache.org/repos/dist/dev/metron/0.4.2-RC2/
>> > >> >
>> > >> > The release artifacts are signed with the following key:
>> > >> > 4169 AA27 ECB3 1663 in https://dist.apache.org/repos/
>> > >> > dist/dev/metron/0.4.2-RC2/KEYS
>> > >> >
>> > >> > Please vote on releasing this package as Apache Metron 0.4.2 and
>> > Apache
>> > >> > Metron-bro-plugin-kafka 0.1.0
>> > >> >
>> > >> > When voting, please list the actions taken to verify the release.
>> > >> >
>> > >> > Recommended build validation and verification instructions are
>> posted
>> > >> here:
>> > >> >
>> https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
>> > >> > or you are encouraged to try the new release verification script
>> that
>> > >> Otto
>> > >> > published via email on 11 Dec, available at
>> > >> > https://github.com/ottobackwards/Metron-and-Nifi-
>> > >> > Scripts/blob/master/metron/metron-rc-check
>> > >> >
>> > >> > This vote will be open until 9am PST on Friday 22 Dec 2017.
>> > >> >
>> > >> > Thank you,
>> > >> > --Matt
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >>
>> >
>>
>


Re: Another intermittent build failure?

2018-04-11 Thread Casey Stella
I have not personally seen that one yet, but I will not deny that it
exists.  It could be very intermittent or triggered under load in travis
too.  Either way, we should probably investigate and fix.

On Wed, Apr 11, 2018 at 3:57 PM Otto Fowler  wrote:

> I had a PR build fail with an issue with the Zookeeper cache.
>
> https://travis-ci.org/apache/metron/builds/365122993
>
>
> Failed tests:
>
> ZKConfigurationsCacheIntegrationTest.validateUpdate:230->lambda$validateUpdate$9:230
> expected:<{hdfs={index=yaf, batchSize=1, enabled=true},
> elasticsearch={index=yaf, batchSize=25, batchTimeout=7,
> enabled=false}, solr={index=yaf, batchSize=5, enabled=false}}> but
> was:<{}>
>
>
> It passed when I ran it again.
>


Re: [DISCUSS] Metron RPM spec changelog

2018-04-18 Thread Casey Stella
I think I'd prefer to see the changelog only include the release entries,
rather than individual entries per dev.  We keep the spec file in source
control to determine the individual changes between releases.  I'm happy to
have my mind changed, though.

On Wed, Apr 18, 2018 at 9:47 AM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> We discovered yesterday while reviewing a PR that the RPM changelog hasn't
> been maintained since 9/25/17. There are 7 changes to that file that have
> not been logged in the changelog itself. The question is if we want to keep
> maintaining the changelog and, if so, should we patch the existing log with
> the missing commits. Any opinions on this? I myself don't have a strong
> opinion either way, but we shouldn't leave it in its current state.
>
> Mike
>
>
> Quoting the conversation between myself and Justin Leet:
>
> https://github.com/apache/metron/pull/996#issuecomment-382194736
> @justinleet Do we still want/need to do this? The last log change was Tue
> Sep 25 2017 by @merrimanr in METRON-1207. However, there have been 6
> changes to the spec since then that have not made it to the change log. I
> believe there was a reason we started doing this (in duplication of source
> control), but I don't recall specifically. Do you remember why that was?
>
> https://github.com/apache/metron/pull/996#issuecomment-382199021
> I believe, though my memory is pretty fuzzy, that it's best practice to
> maintain that changelog because it's useful for auditing and tracking
> purposes given that it's available on the rpm itself.
>
> There's probably a couple questions here
>
>
>1. Are we going to maintain it going forward? If not, we should just
>dump it entirely.
>2. If we choose to do so, do we want/need to update the changelog for
>the missing commits (and probably to use the dev list as authors, rather
>than individuals)?
>
>
> Might be worth opening a discuss on it. I could be persuaded either way in
> terms of whether we update it for this PR or not, but I have a slight
> preference on adding it until there's agreement we aren't doing it.
>


Re: [VOTE] Development Guidelines Addendum on Inactive Pull Requests

2018-04-20 Thread Casey Stella
+1

On Fri, Apr 20, 2018 at 11:17 AM David Lyle  wrote:

> +1 sounds good to me.
>
> -D...
>
>
> On Fri, Apr 20, 2018 at 11:09 AM, zeo...@gmail.com 
> wrote:
>
> > +1 (non-binding)
> >
> > On Fri, Apr 20, 2018 at 9:42 AM Michel Sumbul 
> > wrote:
> >
> > > +1
> > >
> > > 2018-04-20 14:40 GMT+01:00 Otto Fowler :
> > >
> > > > +1
> > > >
> > > >
> > > > On April 20, 2018 at 09:30:30, Nick Allen (n...@nickallen.org)
> wrote:
> > > >
> > > > I am proposing the following addition to the project's development
> > > > guidelines [1]. Based on these guidelines, an abandoned pull request
> > can
> > > > be closed in roughly 6 weeks time (4 weeks of inactivity plus 2 weeks
> > to
> > > > respond to a committer's request.)
> > > >
> > > > Please vote +1, 0, or -1 and also indicate if your vote is binding or
> > > > non-binding. More information on voting can be found in the Apache
> > Metron
> > > > By-Laws [2].
> > > >
> > > > This vote will remain open for at least 72 hours, excluding this
> > weekend.
> > > > I plan to close the vote no sooner than Wednesday, April 25, 2018 at
> > 8:00
> > > > AM EST.
> > > >
> > > > > The discuss thread that preceded this vote can be found here [3].
> > > >
> > > > --
> > > >
> > > > 2.6.1 Inactive Pull Requests
> > > >
> > > >
> > > > Contributions can often take a significant amount of time to complete
> > the
> > > > code review process. This process requires active participation from
> > the
> > > > contributor. If the contributor is unable to actively participate,
> the
> > > > pull request is unlikely to successfully complete this process.
> > > >
> > > > Pull Requests that have failed to receive active participation from
> the
> > > > contributor for an extended period of time risk being abandoned. Any
> > > > committer can submit a request for Apache Infra to close a pull
> request
> > > > that has been abandoned according to the following guidelines.
> > > >
> > > >
> > > > - A pull request is 'inactive' if no comments or updates have been
> made
> > > > by the contributor in the previous 4 weeks.
> > > >
> > > >
> > > > - For any 'inactive' pull request, a committer can request from the
> > > > contributor justification for keeping the pull request open.
> > > >
> > > >
> > > > - The committer's request should be made as a public comment on the
> > pull
> > > > request. The committer should refer the contributor to these
> > development
> > > > guidelines for inactive pull requests.
> > > >
> > > >
> > > > > - If the contributor publicly responds to the request, the pull
> > > > > request is no longer considered 'inactive'.
> > > >
> > > >
> > > > - If the contributor does not respond to the request within 2 weeks,
> > the
> > > > pull request is considered 'abandoned'.
> > > >
> > > >
> > > > - A committer can cast a -1 vote on any 'abandoned' pull request
> using
> > > > these development guidelines as justification.
> > > >
> > > >
> > > > - A committer can submit a request to Apache Infra to close the
> > > > 'abandoned' pull request based on this -1 vote.
> > > >
> > > > --
> > > >
> > > > [1]
> > > >
> > > https://cwiki.apache.org/confluence/display/METRON/
> > Development+Guidelines
> > > >
> > > > [2] https://cwiki.apache.org/confluence/display/METRON/
> > > > Apache+Metron+Bylaws
> > > >
> > > > [3]
> > > > https://lists.apache.org/thread.html/a4e72af67994c8e818f843a9ea8cc2
> > > > 86d81b5c72002fd011d66111f6@%3Cdev.metron.apache.org%3E
> > > >
> > >
> > --
> >
> > Jon
> >
>
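
The timing rules in the proposed section 2.6.1 above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name classify_pull_request, the status strings, and the treatment of the exact 4-week and 2-week boundaries are choices made for the example, not part of the guidelines.

```python
from datetime import date, timedelta
from typing import Optional

INACTIVITY_WINDOW = timedelta(weeks=4)  # no contributor activity for this long
RESPONSE_WINDOW = timedelta(weeks=2)    # time allowed to answer a committer


def classify_pull_request(
    today: date,
    last_contributor_activity: date,
    committer_request: Optional[date] = None,
) -> str:
    """Return 'active', 'inactive', or 'abandoned' for a pull request."""
    # Any contributor comment or update within the window keeps the PR active.
    # A public reply to a committer's request counts as such activity.
    if today - last_contributor_activity < INACTIVITY_WINDOW:
        return "active"

    # A request only counts if it was made once the PR was already inactive.
    request_is_valid = (
        committer_request is not None
        and committer_request - last_contributor_activity >= INACTIVITY_WINDOW
    )
    if request_is_valid and today - committer_request >= RESPONSE_WINDOW:
        return "abandoned"
    return "inactive"
```

With these windows the earliest a pull request can be classified as abandoned is six weeks after the contributor's last activity, which matches the "roughly 6 weeks" stated in the vote email.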


Re: [DISCUSS] Time to remove github updates from dev?

2018-03-19 Thread Casey Stella
+1


On Mon, Mar 19, 2018 at 8:16 AM Andre <andre-li...@fucs.org> wrote:

> Folks,
>
> All rejoice. This has been finally implemented.
>
> Cheers
>
> On 7 Feb 2018 08:33, "Andre" <andre-li...@fucs.org> wrote:
>
> > All,
> >
> > Turns out the process is simpler:
> >
> > A PMC member must create the lists using the self-management portal:
> >
> >
> > selfserve.apache.org
> >
> >
> > Once this is done someone can update the INFRA-15988 ticket and the folks
> > will execute the changes.
> >
> >
> >
> > On Wed, Jan 31, 2018 at 12:15 AM, Otto Fowler <ottobackwa...@gmail.com>
> > wrote:
> >
> >> We could also just skip ‘b’ and go directly to ‘c’ like apache-commons
> >> and have
> >> commits@ issues@.
> >>
> >>
> >>
> >>
> >> On January 30, 2018 at 08:03:37, Andre (andre-li...@fucs.org) wrote:
> >>
> >> James,
> >>
> >> Given nobody opposed, I would suggest one of the PMCs contact the INFRA
> to
> >> get this actioned.
> >>
> >> They would need to assist with:
> >>
> >> 1. Creation of the new "issues" list
> >> 2. redirect both GitHub and JIRA integrations to the new list
> >>
> >> Cheers
> >>
> >> On Sat, Jan 27, 2018 at 9:40 AM, James Sirota <jsir...@apache.org>
> >> wrote:
> >>
> >> > Should we file an infra ticket on this?
> >> >
> >> > 19.01.2018, 13:56, "zeo...@gmail.com" <zeo...@gmail.com>:
> >> > > I would give that +1 as well.
> >> > >
> >> > > Jon
> >> > >
> >> > > On Fri, Jan 19, 2018 at 3:32 PM Casey Stella <ceste...@gmail.com>
> >> wrote:
> >> > >
> >> > >> I could get behind that.
> >> > >>
> >> > >> On Fri, Jan 19, 2018 at 3:31 PM, Andre <andre-li...@fucs.org>
> >> wrote:
> >> > >>
> >> > >> > Folks,
> >> > >> >
> >> > >> > May I suggest Metron follows the NiFi mailing list strategy (we
> >> got
> >> > >> > inspired by another project but I don't recall the name) and
> >> remove
> >> > the
> >> > >> > github comments from the dev list?
> >> > >> >
> >> > >> > Within NiFi we have both the dev and the issues lists. dev is for
> >> > humans,
> >> > >> > issues is for JIRA and github commits.[1]
> >> > >> >
> >> > >> > This allows the thread list to be cleaner and is
> particularly
> >> > >> helpful
> >> > >> > for those reading the list from a list aggregation service.
> >> > >> >
> >> > >> > Cheers
> >> > >> >
> >> > >> >
> >> > >> > [1] https://lists.apache.org/list.html?iss...@nifi.apache.org
> >> > >> >
> >> > >
> >> > > --
> >> > >
> >> > > Jon
> >> >
> >> > ---
> >> > Thank you,
> >> >
> >> > James Sirota
> >> > PMC- Apache Metron
> >> > jsirota AT apache DOT org
> >> >
> >> >
> >>
> >>
> >
>


Re: [DISCUSS] Knox SSO feature branch review and features

2018-09-27 Thread Casey Stella
I'm coming in late to the game here, but to my mind a feature branch
should involve the minimum architectural change to accomplish a given
feature.
The feature in question is SSO integration.  It seems to me that the
operative question is can we do the feature without making the OTHER
architectural change
(e.g. migrating from expressjs to spring boot + zuul).  I would argue that
if we WANT to do that, then it should be a separate feature branch.

Thus, I leave with a question: is there a way to accomplish this feature
without ripping out expressjs?

   - If so and it is feasible, I would argue that we should decouple this
   into a separate feature branch.
   - If so and it is infeasible, I'd like to hear an argument as to the
   infeasibility and let's decide given that.
   - If it is not possible, then I'd argue that we should keep them coupled
   and move this through as-is.

On a side-note, it feels a bit weird that we're narrowing to a bundled
proxy, rather than having that be a pluggable thing.  I'm not super
knowledgeable in this space, so I apologize
in advance if this is naive, but isn't this a pluggable, external component
(e.g. nginx)?
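
To make the side-note concrete, here is a rough Java sketch of the forwarding
step that any proxy in this position performs, whether that is expressjs, Zuul,
or an external one such as nginx: decide whether a browser request belongs to
REST, rewrite it to the backend location, and attach the session cookie. The
class name, the /api/ prefix, and the rest.example host are all hypothetical;
this is not Metron's or Zuul's actual code.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; names, prefix, and host are assumptions.
final class RestProxySketch {
  private final URI restBase;         // where the REST service is assumed to live
  private final String proxiedPrefix; // request paths under this prefix are forwarded

  RestProxySketch(URI restBase, String proxiedPrefix) {
    this.restBase = restBase;
    this.proxiedPrefix = proxiedPrefix;
  }

  // Maps a browser request path to the REST backend, or returns empty
  // when the path should be served by the UI application itself.
  Optional<URI> route(String requestPath) {
    if (!requestPath.startsWith(proxiedPrefix)) {
      return Optional.empty();
    }
    return Optional.of(restBase.resolve(requestPath));
  }

  // Copies the incoming headers and adds the session cookie for the backend.
  Map<String, String> forwardHeaders(Map<String, String> incoming, String sessionId) {
    Map<String, String> outgoing = new LinkedHashMap<>(incoming);
    outgoing.put("Cookie", "JSESSIONID=" + sessionId);
    return outgoing;
  }
}
```

Since the browser only ever talks to the UI's own origin in this arrangement,
the question I'm raising is only about which component plays this role, not
whether the role is needed.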

On Thu, Sep 27, 2018 at 5:05 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I've spent some more time reading through Simon's response and the added
> sequence diagram. This is definitely helpful - thank you Simon.
>
> I need to revise my initial list:
>
>    1. Node migrated to Spring Boot, expressjs migrated to a
>    non-JS/non-NodeJs proxying mechanism (i.e. Zuul in this case)
>    2. JDBC removed completely in favor of LDAP
>    3. Knox/SSO
>
> I'm a bit conflicted on the best way to move forward and would like some
> thoughts from other community members on this. I think an argument can be
> made that 1 and 2 are independent of 3, and should/could really be
> independent PR's against master.
>
> The need for a replacement for expressjs (Zuul in this case) is an artifact
> of the fact that our request/response cycle for REST calls is a simple matter of
> forwarding with some additional headers for authentication. There's a
> JSESSIONID managed by the client browser in our current architecture, for
> example. You log in to the alerts or the management UI, which forwards a
> request to REST, which looks up credentials in a backend database, and
> passes the results back up the chain. All browser requests go directly to
> the specific UI you're working with - this is the CORS problem. You can't,
> without some effort with headers for adding other domains to the safe list
> or disabling the security check for CORS, make remote calls directly to
> REST. That's why we proxy. Switching over to Spring Boot leaves a gap with
> expressjs having handled the proxying and filtering, since it's only
> available to a NodeJs application (it's server-side javascript vs the
> client side javascript deployed via our Angular applications). Enter Zuul,
> which now effectively handles that. At runtime, Zuul is a part of the
> Spring app that serves up our UI's. It handles the requests via filtering,
> forwards them to REST, manages the response back to the client. Very
> similar to what expressjs was doing, per my current understanding. The
> sequence diagrams Simon added are useful, and I think some of what was less
> clear was what we currently do vs what the new changes are doing to the
> architecture. This is no fault of Simon's - there simply weren't any
> architecture diagrams/documents around this before. Here's my impression of
> the very very basic current state - someone more familiar with this
> architecture please advise if I'm incorrect about anything (probably Ryan).
>
> https://imgur.com/f8GtSmh
>
> Zuul would be replacing the bit about expressjs in the diagram, and instead
> of node we have spring boot. This covers item 1; items 2 and 3 are separate issues. I'd
> like to see similar exposition of those server processes with knox
> involved. I imagine in that case we bump up from 3 to 4 server instances
> for the additional knox endpoint.
>
> Mike
>
>
>
>
>
> On Wed, Sep 19, 2018 at 11:28 AM James Sirota  wrote:
>
> > Thank you, Simon.  The diagrams help a lot
> >
> > 19.09.2018, 21:27, "Simon Elliston Ball" :
> > > To clarify some of this I've put some documentation into
> > > https://github.com/apache/metron/pull/1203 under METRON-1755 (
> > > https://issues.apache.org/jira/browse/METRON-1755). Hopefully the
> > diagrams
> > > there should make it clearer.
> > >
> > > Simon
> > >
> > > On Tue, 18 Sep 2018 at 14:17, Simon Elliston Ball <
> > > si...@simonellistonball.com> wrote:
> > >
> > >>  Hi Mike,
> > >>
> > >>  Some good points here which could do with some clarification. I
> suspect
> > >>  the architecture documentation could be clearer and fill in some of
> > these
> > >>  gaps, and I'll have a look at working on that and providing some
> > diagrams.
> > >>
> > >>  The short version is that the Zuul proxy gateway has been added to
> > replace
> > >>  the Nodejs express 

Re: [DISCUSS] Slack Channel Use

2018-10-22 Thread Casey Stella
I am of 2 minds, but I tend to agree.  On the one hand, it's definitely the
preference that we use the mailing lists for the reasons you stated (and
also because not everyone has access to slack generally).  On the other
hand, I think an interactive medium like Slack has a lot of advantages in
terms of user satisfaction.  Ultimately, though, we may satisfy 1 user at
the cost of not persisting the discussion and satisfying many users.

I'll go along with a specific preference to drive more discussion to the
mailing list.

Casey

On Mon, Oct 22, 2018 at 12:18 PM Nick Allen  wrote:

> It seems that we are seeing a lot of Metron usage and support questions on
> the Slack Channel.
> These are questions that previously would have been directed to the User or
> Dev mailing lists.  Since this is occurring in the Slack Channel, the
> conversations are not archived.
>
> In my opinion, this is not good for the Metron community.  Having this
> persisted in a discoverable form (like a mailing list archive) not only
> helps support current users, but also helps *potential* users understand
> how Metron is being used.
>
> Does anyone else agree or disagree?  At a minimum, I feel we need to do
> something to direct these conversations back to the mailing list.
>


Re: [DISCUSS] Slack Channel Use

2018-10-22 Thread Casey Stella
Agreed, the benefit of the mailing list is that it’s searchable by ponymail
and the major search engines.
On Mon, Oct 22, 2018 at 17:18 Nick Allen  wrote:

> I don't know that it is the same kind of searchable.  Is it being indexed
> by the major search engines?  I have never used a search engine and
> uncovered the answer to my problem in a Slack archive.
>
> On Mon, Oct 22, 2018 at 5:05 PM Otto Fowler 
> wrote:
>
> > According to Greg Stein, an infra admin on the NiFi slack, the ASF slack
> > that metron is in IS the standard plan, not the free one and is
> searchable
> > past 10,000 messages.
> >
> >
> >
> > On October 22, 2018 at 15:35:51, Michael Miklavcic (
> > michael.miklav...@gmail.com) wrote:
> >
> > ...From an archival and broader reach point of view, I do think there's
> > something to be said about using the mailing list. It's also easier to
> link
> > to Q/A threads from the mailing list archives and do searches...
> >
> >
> https://lists.apache.org/thread.html/1aa85bc13d41e04a1f85c3100c2b803abe35d79b54062bbeaab83ace@%3Cdev.metron.apache.org%3E
> >
> > How very Inception.
> >
> >
> > On Mon, Oct 22, 2018 at 1:32 PM Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > I just want to point out that we currently have 32 members in the
> Metron
> > > Slack channel which I personally think is a great sign. This is good
> from
> > a
> > > community perspective and helps foster interactive sessions where
> > required.
> > > From an archival and broader reach point of view, I do think there's
> > > something to be said about using the mailing list. It's also easier to
> > link
> > > to Q/A threads from the mailing list archives and do searches. As
> such, I
> > > would also go along with Nick's suggestion and urge members to prefer
> the
> > > user/dev list where possible.
> > >
> > > On Mon, Oct 22, 2018 at 10:51 AM Justin Leet 
> > > wrote:
> > >
> > >> If we want to push more discussion to the dev list, my obvious follow
> up
> > >> question then is "What are we hoping to get out of Slack/irc/other
> > >> interactive medium?". What discussion would we even want on there, if
> we
> > >> can't have decisions and don't want usage/support?
> > >>
> > >> On Mon, Oct 22, 2018 at 12:44 PM Casey Stella 
> > wrote:
> > >>
> > >> > I am of 2 minds, but I tend to agree. On the one hand, it's
> definitely
> > >> the
> > >> > preference that we use the mailing lists for the reasons you stated
> > (and
> > >> > also because not everyone has access to slack generally). On the
> other
> > >> > hand, I think an interactive medium like Slack has a lot of
> advantages
> > >> in
> > >> > terms of user satisfaction. Ultimately, though, we may satisfy 1
> user
> > >> at
> > >> > the cost of not persisting the discussion and satisfying many users.
> > >> >
> > >> > I'll go along with a specific preference to drive more discussion to
> > the
> > >> > mailing list.
> > >> >
> > >> > Casey
> > >> >
> > >> > On Mon, Oct 22, 2018 at 12:18 PM Nick Allen 
> > wrote:
> > >> >
> > >> > > It seems that we are seeing a lot of Metron usage and support
> > >> questions
> > >> > on
> > >> > > the Slack Channel.
> > >> > > These are questions that previously would have been directed to
> the
> > >> User
> > >> > or
> > >> > > Dev mailing lists. Since this is occurring in the Slack Channel,
> the
> > >> > > conversations are not archived.
> > >> > >
> > >> > > In my opinion, this is not good for the Metron community. Having
> > this
> > >> > > persisted in a discoverable form (like a mailing list archive) not
> > >> only
> > >> > > helps support current users, but also helps *potential* users
> > >> understand
> > >> > > how Metron is being used.
> > >> > >
> > >> > > Does anyone else agree or disagree? At a minimum, I feel we need
> to
> > >> do
> > >> > > something to direct these conversations back to the mailing list.
> > >> > >
> > >> >
> > >>
> > >
> >
>


Re: [DISCUSS] Stellar REST client

2018-10-19 Thread Casey Stella
I think it makes a lot of sense.  A couple of questions:

   - What actions do you see the REST verbs corresponding to?  I would
   understand GET (which is in effect "evaluate an expression", right?), but
   I'm not sure about the others.
   - We should probably be careful about caching stellar expressions.  Not
   all stellar expressions are deterministic (e.g. PROFILE_GET may not be as
   the lookback window is bound to current time).  Ultimately, I think we
   should probably bake whether a function is deterministic into stellar so
   that *stellar* can cache where appropriate (e.g. if every part of an
   expression is deterministic, then pull from cache otherwise recompute).
   All of this to say, if you're going to make it configurable, IMO we should
   make it a configuration that the user passes in at request time so they
   have the control over whether the expression is safe to cache or otherwise.

Without more compelling reasons to not do so, I'd suggest we use HTTP
Components as it's another apache project and under active
development/support.  I'd also be ok with OkHttp if it's actively
maintained.
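
To sketch what I mean by request-time control over caching: the caller passes
a flag saying whether the expression is safe to cache, and the cache is only
consulted in that case. This is purely illustrative. The class and method names
are made up, the evaluator is a stand-in function, and none of this is an
existing Stellar API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: cache results only when the caller says it is safe.
final class ExpressionCacheSketch {
  private final Map<String, Object> cache = new ConcurrentHashMap<>();
  private final Function<String, Object> evaluator; // stand-in for real evaluation

  ExpressionCacheSketch(Function<String, Object> evaluator) {
    this.evaluator = evaluator;
  }

  // safeToCache is supplied per request, e.g. false for an expression
  // whose result depends on the current time.
  Object evaluate(String expression, boolean safeToCache) {
    if (!safeToCache) {
      return evaluator.apply(expression); // always recompute
    }
    // Evaluate once per distinct expression, then reuse the stored result.
    return cache.computeIfAbsent(expression, evaluator);
  }
}
```

If stellar eventually knows which functions are deterministic, the flag could
be derived from the expression itself rather than supplied by the user.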

On Fri, Oct 19, 2018 at 11:46 AM Ryan Merriman  wrote:

> I want to open up discussion around adding a Stellar REST client function.
> There are services available to enrich security telemetry and they are
> commonly exposed through a REST interface.  The primary purpose of this
> discuss thread is to collect requirements from the community and agree on a
> general architectural approach.
>
> At a minimum I see a Stellar REST client supporting:
>
>    - Common HTTP verbs including GET, POST, DELETE, etc.
>    - Option to provide headers and request parameters as needed
>    - Support for basic authentication
>    - Proper request and error handling (we can discuss further how this
>    should work)
>    - SSL support
>    - Option to use a proxy server (including authentication)
>    - JSON format
>
> In addition to these functional requirements, I would also propose we
> include these performance requirements:
>
>    - Provide a configurable caching layer
>    - Provide a mechanism for pooling connections
>    - Provide clear documentation and guidance on how to properly use this
>    feature since there is a significant risk of introducing latency issues
>
> What else would you like to see included?
>
> I think the primary architectural decision we need to make (based on the
> agreed upon requirements of course) is an appropriate Java HTTP/REST client
> library.  Ideally we choose a library that supports everything we need
> OOTB.  I think the majority of the work for this feature will involve
> wrapping this library in a Stellar function and exposing the configuration
> knobs through Metron's configuration interface (Ambari, Zookeeper, etc).  I
> have done some very light research and here is my initial list:
>
>    - Apache HttpComponents - https://hc.apache.org/
>       - Has support for all of the features listed above as far as I can tell
>       - Doesn't introduce a large number of new dependencies (am I wrong
>       here?)
>       - Is sort of included already (we will need to upgrade from
>       httpclient)
>       - Lower level
>    - Google HTTP Client Library for Java -
>    https://developers.google.com/api-client-library/java/google-http-java-client/
>       - Higher level API with pluggable components
>       - Introduces dependencies (we've had issues with Guava in the past)
>    - Netflix Ribbon - https://github.com/Netflix/ribbon
>       - Has a lot of nice features that may be useful in the future
>       - Introduces dependencies (including guava)
>       - Hasn't been committed to in the last 5-6 months
>    - Unirest - https://github.com/Kong/unirest-java
>       - Lightweight API built on top of HttpComponents
>       - Pluggable serialization library (jackson is an issue for us so this
>       is nice)
>       - Also has not received a commit in a while
>    - OkHttp - http://square.github.io/okhttp/
>       - Good documentation and looks easy to use
>       - Actively maintained
>
> Obviously we have a lot of choices.  I think it comes down to balancing the
> tradeoff between ease of use (HttpComponents will likely require the most
> work since it is lower level) and capability.  Introducing additional
> dependencies is something we should also be mindful of because of our shading
> practices.
>
> This should get us started.  Let me know what you think!
>


Re: [DISCUSS] Slack Channel Use

2018-10-24 Thread Casey Stella
Not for nothing, but at least according to the last board report that I
submitted, the user@ traffic is up 100% and the dev list traffic is flat as
compared to last quarter.  That's not to say that we couldn't stand more
discussion on the lists, but a lot of the dev discussion happens on github
and JIRA and I'm happy to see an uptick in user traffic.

On Wed, Oct 24, 2018 at 10:05 AM Otto Fowler 
wrote:

> I wouldn’t be so quick to relate the slack discussion to perceived
> activity on the list.
> That is more due to the other things that are bigger issues.
>
>
> On October 24, 2018 at 07:15:30, Nick Allen (n...@nickallen.org) wrote:
>
> > I have heard recently people thought Metron is sort of dead just because
> the mailing list is not so active anymore!
>
> That is exactly my concern.
>
>
> On Wed, Oct 24, 2018, 2:49 AM Ali Nazemian  wrote:
>
> > I kind of expect to have Slack for more dev related discussions rather
> than
> > user QA. I guess it is quite common to expect mailing list to be used for
> > the purpose of knowledge sharing to make sure it will be accessible by
> > other users as well. Of course, it is a trade-off that most of the other
> > Apache projects decided to accept the risk of keeping user related
> > discussions out of Slack/IRC. However, it sometimes happens to see the
> > mixture of questions coming to Slack. I have heard recently people
> thought
> > Metron is sort of dead just because the mailing list is not so active
> > anymore!
> >
> > Cheers,
> > Ali
> >
> > On Tue, Oct 23, 2018 at 8:23 AM Casey Stella  wrote:
> >
> > > Agreed, the benefit of the mailing list is that it’s searchable by
> > ponymail
> > > and the major search engines.
> > > On Mon, Oct 22, 2018 at 17:18 Nick Allen  wrote:
> > >
> > > > I don't know that it is the same kind of searchable. Is it being
> > indexed
> > > > by the major search engines? I have never used a search engine and
> > > > uncovered the answer to my problem in a Slack archive.
> > > >
> > > > > On Mon, Oct 22, 2018 at 5:05 PM Otto Fowler
> > > > wrote:
> > > >
> > > > > According to Greg Stein, an infra admin on the NiFi slack, the ASF
> > > slack
> > > > > that metron is in IS the standard plan, not the free one and is
> > > > searchable
> > > > > past 10,000 messages.
> > > > >
> > > > >
> > > > >
> > > > > On October 22, 2018 at 15:35:51, Michael Miklavcic (
> > > > > michael.miklav...@gmail.com) wrote:
> > > > >
> > > > > ...From an archival and broader reach point of view, I do think
> > there's
> > > > > something to be said about using the mailing list. It's also easier
> > to
> > > > link
> > > > > to Q/A threads from the mailing list archives and do searches...
> > > > >
> > > > >
> > > >
> > >
> >
>
> https://lists.apache.org/thread.html/1aa85bc13d41e04a1f85c3100c2b803abe35d79b54062bbeaab83ace@%3Cdev.metron.apache.org%3E
> > > > >
> > > > > How very Inception.
> > > > >
> > > > >
> > > > > On Mon, Oct 22, 2018 at 1:32 PM Michael Miklavcic <
> > > > > michael.miklav...@gmail.com> wrote:
> > > > >
> > > > > > I just want to point out that we currently have 32 members in the
> > > > Metron
> > > > > > Slack channel which I personally think is a great sign. This is
> > good
> > > > from
> > > > > a
> > > > > > community perspective and helps foster interactive sessions where
> > > > > required.
> > > > > > From an archival and broader reach point of view, I do think
> > there's
> > > > > > something to be said about using the mailing list. It's also
> easier
> > > to
> > > > > link
> > > > > > to Q/A threads from the mailing list archives and do searches. As
> > > > such, I
> > > > > > would also go along with Nick's suggestion and urge members to
> > prefer
> > > > the
> > > > > > user/dev list where possible.
> > > > > >
> > > > > > On Mon, Oct 22, 2018 at 10:51 AM Justin Leet <
> > justinjl...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> If we want to push more discussion t

Re: [DISCUSS] Slack Channel Use

2018-10-24 Thread Casey Stella
Quick clarification: I said "a lot of dev discussion happens on github and
JIRA".  I want to be clear that I didn't mean to imply that larger decisions
were being made outside of the appropriate place, the dev list.

On Wed, Oct 24, 2018 at 10:08 AM Casey Stella  wrote:

> Not for nothing, but at least according to the last board report that I
> submitted, the user@ traffic is up 100% and the dev list traffic is flat
> as compared to last quarter.  That's not to say that we couldn't stand more
> discussion on the lists, but a lot of the dev discussion happens on github
> and JIRA and I'm happy to see an uptick in user traffic.
>
> On Wed, Oct 24, 2018 at 10:05 AM Otto Fowler 
> wrote:
>
>> I wouldn’t be so quick to relate the slack discussion to perceived
>> activity on the list.
>> That is more due to the other things that are bigger issues.
>>
>>
>> On October 24, 2018 at 07:15:30, Nick Allen (n...@nickallen.org) wrote:
>>
>> > I have heard recently people thought Metron is sort of dead just because
>> the mailing list is not so active anymore!
>>
>> That is exactly my concern.
>>
>>
>> On Wed, Oct 24, 2018, 2:49 AM Ali Nazemian  wrote:
>>
>> > I kind of expect to have Slack for more dev related discussions rather
>> than
>> > user QA. I guess it is quite common to expect mailing list to be used
>> for
>> > the purpose of knowledge sharing to make sure it will be accessible by
>> > other users as well. Of course, it is a trade-off that most of the other
>> > Apache projects decided to accept the risk of keeping user related
>> > discussions out of Slack/IRC. However, it sometimes happens to see the
>> > mixture of questions coming to Slack. I have heard recently people
>> thought
>> > Metron is sort of dead just because the mailing list is not so active
>> > anymore!
>> >
>> > Cheers,
>> > Ali
>> >
>> > On Tue, Oct 23, 2018 at 8:23 AM Casey Stella 
>> wrote:
>> >
>> > > Agreed, the benefit of the mailing list is that it’s searchable by
>> > ponymail
>> > > and the major search engines.
>> > > On Mon, Oct 22, 2018 at 17:18 Nick Allen  wrote:
>> > >
>> > > > I don't know that it is the same kind of searchable. Is it being
>> > indexed
>> > > > by the major search engines? I have never used a search engine and
>> > > > uncovered the answer to my problem in a Slack archive.
>> > > >
>> > > > On Mon, Oct 22, 2018 at 5:05 PM Otto Fowler <
>> ottobackwa...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > According to Greg Stein, an infra admin on the NiFi slack, the ASF
>> > > slack
>> > > > > that metron is in IS the standard plan, not the free one and is
>> > > > searchable
>> > > > > past 10,000 messages.
>> > > > >
>> > > > >
>> > > > >
>> > > > > On October 22, 2018 at 15:35:51, Michael Miklavcic (
>> > > > > michael.miklav...@gmail.com) wrote:
>> > > > >
>> > > > > ...From an archival and broader reach point of view, I do think
>> > there's
>> > > > > something to be said about using the mailing list. It's also
>> easier
>> > to
>> > > > link
>> > > > > to Q/A threads from the mailing list archives and do searches...
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>> https://lists.apache.org/thread.html/1aa85bc13d41e04a1f85c3100c2b803abe35d79b54062bbeaab83ace@%3Cdev.metron.apache.org%3E
>> > > > >
>> > > > > How very Inception.
>> > > > >
>> > > > >
>> > > > > On Mon, Oct 22, 2018 at 1:32 PM Michael Miklavcic <
>> > > > > michael.miklav...@gmail.com> wrote:
>> > > > >
>> > > > > > I just want to point out that we currently have 32 members in
>> the
>> > > > Metron
>> > > > > > Slack channel which I personally think is a great sign. This is
>> > good
>> > > > from
>> > > > > a
>> > > > > > community perspective and helps foster interactive sessions
>> where
>> > > > > required.
>> > > > > > From an archival and broader reach point of view, I do think
>> > there's
>> > >

Re: [DISCUSS] Deprecate split-join enrichment topology in favor of unified enrichment topology

2018-11-01 Thread Casey Stella
+1
On Thu, Nov 1, 2018 at 18:34 Nick Allen  wrote:

> +1
>
> On Thu, Nov 1, 2018, 6:27 PM Justin Leet  wrote:
>
> > +1, I haven't seen any case where the split-join topology isn't made
> > obsolete by the unified topology.
> >
> > On Thu, Nov 1, 2018 at 6:17 PM Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > Fellow Metronians,
> > >
> > > We've had the unified enrichment topology around for a number of months
> > > now, it has proved itself stable, and there is yet to be a time that I
> > have
> > > seen the split-join topology outperform the unified one. Here are some
> > > simple reasons to deprecate the split-join topology.
> > >
> > > >    1. Unified topology performs better.
> > > >    2. The configuration, especially for performance tuning, is much, much
> > > >    simpler in the unified model.
> > > >    3. The footprint within the cluster is smaller.
> > > >    4. One of the first activities for any install is that we spend time
> > > >    instructing users to switch to the unified topology.
> > > >    5. One less moving part to maintain.
> > >
> > > I'd like to recommend that we deprecate the split-join topology and
> make
> > > the unified enrichment topology the new default.
> > >
> > > Best,
> > > Mike
> > >
> >
>


Re: [DISCUSS] Attribution and merging the Elasticsearch client migration

2018-11-15 Thread Casey Stella
Can you at least rename your commits to have METRON-1834 prefixing them?
On Thu, Nov 15, 2018 at 15:19 Michael Miklavcic 
wrote:

> https://github.com/apache/metron/pull/1242
>
> TL;DR
> I'd like to discuss the best option to merge METRON-1834 into master. I
> want to propose handling this like a feature branch and merging it as-is.
> ---
>
> I'm sure most folks' initial reaction will be some skepticism akin to "have
> you tried turning it off and on again," as this was my initial reaction as well.
> It does not seem like this should be difficult. And I'm hoping that this
> may be some esoteric thing on my system, though I believe this is a real
> problem. A rather tedious explanation follows of what I've tried and the
> problems encountered along the way. What seemed like a really simple
> problem instead appears to be a bit much for Git to handle without
> requiring redoing merges and another full round of testing. I'd much prefer
> to avoid that in this instance.
>
> This PR is ready to be merged into master. It's recent and very close to
> fully up to date in the branch. Latest master merges cleanly. There is an
> attribution to Casey Stella for the base point of this PR that I need to
> include when getting this into master. When I created my branch, I
> collapsed his initial set of commits into a single squashed commit on
> master at the time, and I started to work from there. Over time, I made a
> number of additional commits and merges from master. Now for the issues.
>
> Originally, my expectation was that I could have 2 commits - the original
> squashed commit from Casey along with all my additional commits (and the
> merges with master) right on top. Nice clean history on master. Turns out,
> this doesn't work as cleanly as expected because of a combination of the
> multiple merges and the need to keep the original commit with attribution
> to Casey's work. A normal git pull --squash works fine, as expected, but we
> lose the base commit, and therefore the requisite attribution. Here are
> some other things I've tried, to no avail.
>
>    1. Git pull --squash after a merge with master. This will squash the
>    entire tree back to the branch point. No good.
>    2. Git rebase -i master. Allows you to cleanly apply changes, but then
>    it ends up having problems with a clean rebase and shows conflicts. I
>    expect this is because of the merge history being necessary.
>    3. Checking out a branch from the base point squashed commit from Casey,
>    and attempting to apply my changes on top. Numerous methods for
>    squashing/rebasing my changes on top apply nicely in the branch. But then
>    it once again causes merge conflicts when I attempt to get this onto
>    master. Things I attempted include: manually copying files, rebasing all my
>    commits plus merges on top of the base commit, git merge --squash,
>    intimidation.
>
> For one example of the result I'm talking about, this looks "good" but it's
> missing a ton of recent commits because they get caught up in the rebase
> and get squashed in with my commit. When you attempt to merge this onto
> master, it is just plain wrong (see example below with merge conflicts).
> * 22c3b3bc 2018-11-15 | METRON-1834: Migrate Elasticsearch from
> TransportClient to new Java REST API (mmiklavc via mmiklavc) closes
> apache/metron#1242 (HEAD -> stella-es-base2) [mmiklavc]
> * 84232e90 2018-10-08 | METRON-1834: Elasticsearch rest client migration
> base work starting point for apache/metron#1242 (cstella via mmiklavc)
> [cstella]
> * 5bfc08c5 2018-10-08 | METRON-1792 Simplify Profile Definitions in
> Integration Tests (nickwallen) closes apache/metron#1211 [nickwallen]
>
> Here's 1 merge conflict (say what??)
> CONFLICT (rename/rename): Rename
> "metron-interface/metron-config/src/app/rxjs-operators.ts"->"metron-platform/metron-parsers/src/main/java/org/apache/metron/parsers/ParserRunnerResults.java"
> in branch "HEAD" rename
> "metron-interface/metron-config/src/app/rxjs-operators.ts"->"metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/utils/FieldMapping.java"
> in "stella-es-base2"
>
> If I attempt to use rebase on master instead of merge, it really seems to
> mess up the files. Again, another example where I have TODO's that are most
> definitely removed by a commit in my branch and also do not exist in
> master. I'm not sure what's happening here, but I don't trust it.
> {
>   //TODO: It shouldn't require an assertEventually() here as it should
> be synchronous.
>   // Before merging, please figure out why.
>   TestUtils.assertEventually(()

Re: [DISCUSS] Add e2e step to PR checklist

2018-10-05 Thread Casey Stella
This is really good feedback, Nick. I agree, we need them to be reliable
enough to not be a source of constant false positives prior to putting them
into the checklist.
On Thu, Oct 4, 2018 at 15:34 Nick Allen  wrote:

> I think we still have an issue of reliability.  I can never reliably get
> them all to pass.  I have no idea which failures are real.  Am I the only
> one that experiences this?
>
> We need a reliable pass/fail on these before we talk about adding them to
> the checklist.  For example, I just tried to run them on METRON-1771.  I
> don't think we have a problem with these changes, but I have not been able
> to get one run to fully pass.  See the attached output of those runs.
>
>
>
> On Wed, Oct 3, 2018 at 7:36 AM Shane Ardell 
> wrote:
>
>> I ran them locally a handful of times just now, and on average they took
>> approximately 15 minutes to complete.
>>
>> On Tue, Oct 2, 2018, 18:22 Michael Miklavcic
>> wrote:
>>
>> > @Shane Just how much time are we talking about, on average? I don't
>> think
>> > many in the community have had much exposure to running the e2e tests in
>> > their current form. It might still be worth it in the short term.
>> >
>> > On Tue, Oct 2, 2018 at 10:20 AM Shane Ardell 
>> > wrote:
>> >
>> > > The protractor-flake package should catch and re-run false failures,
>> so
>> > > people shouldn't get failing tests when they are done running. I just
>> > meant
>> > > that we often re-run flaky tests with protractor-flake, so it can
>> take a
>> > > while to run and could increase the build time considerably.
>> > >
>> > > On Tue, Oct 2, 2018, 18:00 Casey Stella  wrote:
>> > >
>> > > > Are the tests so brittle that, even with protractor-flake, people will run
>> > > > into false failures as part of contributing a PR?  If so, do we have a list of
>> the
>> > > > brittle ones (and the things that would disambiguate a true failure
>> > from
>> > > a
>> > > > false failure) that we can add to the documentation?
>> > > >
>> > > > On Tue, Oct 2, 2018 at 11:58 AM Shane Ardell <
>> shane.m.ard...@gmail.com
>> > >
>> > > > wrote:
>> > > >
>> > > > > I also would like to eventually have these tests automated. There
>> > are a
>> > > > > couple hurdles to setting up our e2e tests to run with our build.
>> I
>> > > think
>> > > > > the biggest hurdle is setting up a dedicated server with data for
>> the
>> > > e2e
>> > > > > tests to use. I would assume this requires funding, engineering
>> > > support,
>> > > > > obfuscated data, etc. I also think we should migrate our e2e
>> tests to
>> > > > > Cypress first because Protractor lacks debugging tools that would
>> > make
>> > > > our
>> > > > > life much easier if, for example, we had a failure in our CI build
>> > but
>> > > > > could not reproduce locally. In addition, our current Protractor
>> > tests
>> > > > are
>> > > > > brittle and extremely slow.
>> > > > >
>> > > > > All that said, it seems we agree that we could add another PR
>> > checklist
>> > > > > item in the meantime. Clarifying those e2e test instructions
>> should
>> > be
>> > > > part
>> > > > > of that task.
>> > > > >
>> > > > > On Mon, Oct 1, 2018 at 2:36 PM Casey Stella 
>> > > wrote:
>> > > > >
>> > > > > > I'd also like to make sure that clear instructions are provided
>> (or
>> > > > > linked
>> > > > > > to) about how to run them.  Also, we need to make sure the
>> > > instructions
>> > > > > are
>> > > > > > rock-solid for running them.
>> > > > > > Looking at
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/metron/tree/master/metron-interface/metron-alerts#e2e-tests
>> > > > > > ,
>> > > > > > would someone who doesn't have much or any knowledge of the UI
>> be
>> > > able
>> > > > to
>> > >

Re: [DISCUSS] Feature Branch guidance

2018-09-29 Thread Casey Stella
subsequent feature branches have gotten much better. We should take
> the
> > lessons learned along the way and formalize them as Casey is recommending
> > in our bylaws. I'll be following up with more specific thoughts on
> > language.
> >
> > Best,
> > Mike
> >
> >
> > On Fri, Sep 28, 2018 at 10:13 AM Justin Leet 
> > wrote:
> >
> > > Ticket created: https://issues.apache.org/jira/browse/METRON-1799
> > >
> > > I think that whole '/develop' is orphaned and can be dropped.
> > >
> > > On Fri, Sep 28, 2018 at 12:12 PM Casey Stella 
> > wrote:
> > >
> > > > I just noticed this, but googling "metron bylaws" yields
> > > > http://metron.apache.org/develop/bylaws.html which is not our
> bylaws.
> > > Our
> > > > bylaws are on
> > > >
> > https://cwiki.apache.org/confluence/display/METRON/Apache+Metron+Bylaws
> > > >
> > > > We should fix that.
> > > >
> > > > On Fri, Sep 28, 2018 at 12:02 PM Casey Stella 
> > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > Given discussions about the current high-profile feature branch
> (Knox
> > > > > SSO), I thought it might be appropriate to have a conversation
> about
> > > what
> > > > > constitutes a feature branch and get some of this encoded in the
> > > > community
> > > > > guidelines.
> > > > >
> > > > > Specifically, there was the request made that we split up the Knox
> > SSO
> > > > > feature branch due to the current implementation including a
> > distinct,
> > > > new
> > > > > architectural change (specifically, in my mind, the migration from
> > > > > expressjs to Spring Boot + Zuul). In that discussion, I made the
> > > > assertion
> > > > > that "for my mind a feature branch should involve the minimum
> > > > > architectural change to accomplish a given feature." (I apologize
> for
> > > > > quoting myself here ;) . I realize, however, that we have not
> encoded
> > > > > this in our bylaws and I might not be speaking with the authority
> of
> > > the
> > > > > community at my back.
> > > > >
> > > > > Ultimately, I feel very sad that we didn't get around to clarifying
> > > this
> > > > > and we're in a situation where, had we made this clarification
> > earlier,
> > > > > we'd not be in a situation where a contributor has to make a major
> > > > > refactoring to a substantial feature. To that end, I think we
> should
> > > > > hammer this out once and for all so it's clear.
> > > > >
> > > > > So, I'd like to separate out this discussion here. My justification
> > > for
> > > > > my belief (beyond that I think it's the correct thing to do) was
> the
> > > > > precedent set with 777, admittedly pre-feature branches, wherein we
> > > > > requested a series of splits to get to smaller units of
> functionality
> > > and
> > > > > isolate large architectural changes. I think that it is good
> > practice
> > > to
> > > > > isolate major feature changes to separate feature branches to
> ensure
> > > that
> > > > > we get sufficient discussion around each of those changes. To that
> > > end,
> > > > > I'd like some language encoded into the bylaws describing this.
> > > > >
> > > > > So, I'm opening up the discussion. Most specifically what I'd like
> > to
> > > > > know is:
> > > > >
> > > > > - What do you think about the idea that feature branches should
> > > > > contain the minimal architectural changes to accomplish a feature
> > > > (unless
> > > > > the full scope of architectural changes are mentioned in the
> > discuss
> > > > thread
> > > > > about the feature)?
> > > > > - If you agree (or disagree), do you think a change to the bylaws
> > is
> > > > > merited to encode this policy? If you do think so, then what
> > > wording
> > > > would
> > > > > you suggest?
> > > > >
> > > > >
> > > > > Hopefully that all makes sense.
> > > > >
> > > > > Casey
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Add e2e step to PR checklist

2018-10-01 Thread Casey Stella
I'd also like to make sure that clear instructions are provided (or linked
to) about how to run them.  Also, we need to make sure the instructions are
rock-solid for running them.
Looking at
https://github.com/apache/metron/tree/master/metron-interface/metron-alerts#e2e-tests,
would someone who doesn't have much or any knowledge of the UI be able to
run that without assistance?

For instance, we use full-dev; do we need to stop data from being played
into full-dev for the tests to work?

Casey

On Mon, Oct 1, 2018 at 8:29 AM Casey Stella  wrote:

> I'm not super keen on expanding the steps to contribute, especially in an
> avenue that should be automated.
> That being said, I think that until we get to the point of automating the
> e2e tests, it's sensible to add them to the checklist.
> So, I would support it, but I would also urge us to move forward the
> efforts of running these tests as part of the CI build.
>
> What is the current gap there?
>
> Casey
>
> On Mon, Oct 1, 2018 at 7:41 AM Shane Ardell 
> wrote:
>
>> Hello everyone,
>>
>> In another discussion thread from July, I briefly mentioned the idea of
>> adding a step to the pull request checklist asking contributors to run the
>> UI end-to-end tests. Since we aren't running e2e tests as part of the CI
>> build, it's easy for contributors to unintentionally break these tests.
>> Reminding contributors to run these tests will hopefully help catch
>> situations like this before opening a pull request.
>>
>> Does this make sense to everyone?
>>
>> Regards,
>> Shane
>>
>


Re: [DISCUSS] Add e2e step to PR checklist

2018-10-01 Thread Casey Stella
I'm not super keen on expanding the steps to contribute, especially in an
avenue that should be automated.
That being said, I think that until we get to the point of automating the
e2e tests, it's sensible to add them to the checklist.
So, I would support it, but I would also urge us to move forward the
efforts of running these tests as part of the CI build.

What is the current gap there?

Casey

On Mon, Oct 1, 2018 at 7:41 AM Shane Ardell 
wrote:

> Hello everyone,
>
> In another discussion thread from July, I briefly mentioned the idea of
> adding a step to the pull request checklist asking contributors to run the
> UI end-to-end tests. Since we aren't running e2e tests as part of the CI
> build, it's easy for contributors to unintentionally break these tests.
> Reminding contributors to run these tests will hopefully help catch
> situations like this before opening a pull request.
>
> Does this make sense to everyone?
>
> Regards,
> Shane
>


Re: [DISCUSS] Replacing Moment.js with date-fns or native functions

2018-09-26 Thread Casey Stella
I think it's fine.  My only concern would be that we aren't accidentally
using moment.js somewhere for something that date-fns doesn't do.  I
suspect whoever picks up the ticket will figure that out pretty quick
though. ;) I'm +1 on the move; you convinced me.

On Wed, Sep 26, 2018 at 8:36 AM Tamás Fodor  wrote:

> I'd like to open a discussion about replacing Moment.js in the Alerts UI.
> date-fns has almost the same functionality to manipulate date/time in the
> application.
> Moment.js is a great library but it has a huge footprint in the bundle and
> it's significant when it comes to performance optimization in order to
> decrease the initial load time.
>
> Here you can find a brief introduction of the problem and a few ideas how
> to replace it with date-fns or native functions.
> https://github.com/you-dont-need/You-Dont-Need-Momentjs
>
> *The problem:*
> In order to use Moment.js, we have to import the whole library, which has
> a significant impact on the size of the production bundle of our JavaScript
> code.
>
> *A few examples:*
>
> https://github.com/apache/metron/blob/e66cfc80e6a6fa53110c3f2fa8ee0d31ea997bf6/metron-interface/metron-alerts/src/app/utils/utils.ts#L18
>
> https://github.com/apache/metron/blob/ccdbeff5076553382091d4b9423ed48ccdba10ee/metron-interface/metron-alerts/src/app/shared/pipes/time-lapse.pipe.ts#L20
>
> Even though we have tree-shaking, which is a great feature of
> JavaScript bundlers (in our case Webpack), the way Moment.js is
> structured prevents the bundler from doing tree-shaking properly
> <https://github.com/you-dont-need/You-Dont-Need-Momentjs/blob/master/README.md>.
> date-fns is just fine with that.
> For example, to use the `format` function we don't have to import the
> whole library; we can import only the format function individually.
>
> What do you think?
>
> Tamas
>


[DISCUSS] Feature Branch guidance

2018-09-28 Thread Casey Stella
Hi All,

Given discussions about the current high-profile feature branch (Knox SSO),
I thought it might be appropriate to have a conversation about what
constitutes a feature branch and get some of this encoded in the community
guidelines.

Specifically, there was the request made that we split up the Knox SSO
feature branch due to the current implementation including a distinct, new
architectural change (specifically, in my mind, the migration from
expressjs to Spring Boot + Zuul).  In that discussion, I made the assertion
that "for my mind a feature branch should involve the minimum architectural
change to accomplish a given feature." (I apologize for quoting myself here
;) . I realize, however, that we have not encoded this in our bylaws and I
might not be speaking with the authority of the community at my back.

Ultimately, I feel very sad that we didn't get around to clarifying this
and we're in a situation where, had we made this clarification earlier,
we'd not be in a situation where a contributor has to make a major
refactoring to a substantial feature.  To that end, I think we should
hammer this out once and for all so it's clear.

So, I'd like to separate out this discussion here.  My justification for my
belief (beyond that I think it's the correct thing to do) was the precedent
set with 777, admittedly pre-feature branches, wherein we requested a
series of splits to get to smaller units of functionality and isolate large
architectural changes.  I think that it is good practice to isolate major
feature changes to separate feature branches to ensure that we get
sufficient discussion around each of those changes.  To that end, I'd like
some language encoded into the bylaws describing this.

So, I'm opening up the discussion.  Most specifically what I'd like to know
is:

   - What do you think about the idea that feature branches should contain
   the minimal architectural changes to accomplish a feature (unless the full
   scope of architectural changes is mentioned in the discuss thread about
   the feature)?
   - If you agree (or disagree), do you think a change to the bylaws is
   merited to encode this policy?  If you do think so, then what wording would
   you suggest?


Hopefully that all makes sense.

Casey


Re: [DISCUSS] Feature Branch guidance

2018-09-28 Thread Casey Stella
I just noticed this, but googling "metron bylaws" yields
http://metron.apache.org/develop/bylaws.html which is not our bylaws.  Our
bylaws are on
https://cwiki.apache.org/confluence/display/METRON/Apache+Metron+Bylaws

We should fix that.

On Fri, Sep 28, 2018 at 12:02 PM Casey Stella  wrote:

> Hi All,
>
> Given discussions about the current high-profile feature branch (Knox
> SSO), I thought it might be appropriate to have a conversation about what
> constitutes a feature branch and get some of this encoded in the community
> guidelines.
>
> Specifically, there was the request made that we split up the Knox SSO
> feature branch due to the current implementation including a distinct, new
> architectural change (specifically, in my mind, the migration from
> expressjs to Spring Boot + Zuul).  In that discussion, I made the assertion
> that "for my mind a feature branch should involve the minimum
> architectural change to accomplish a given feature." (I apologize for
> quoting myself here ;) . I realize, however, that we have not encoded
> this in our bylaws and I might not be speaking with the authority of the
> community at my back.
>
> Ultimately, I feel very sad that we didn't get around to clarifying this
> and we're in a situation where, had we made this clarification earlier,
> we'd not be in a situation where a contributor has to make a major
> refactoring to a substantial feature.  To that end, I think we should
> hammer this out once and for all so it's clear.
>
> So, I'd like to separate out this discussion here.  My justification for
> my belief (beyond that I think it's the correct thing to do) was the
> precedent set with 777, admittedly pre-feature branches, wherein we
> requested a series of splits to get to smaller units of functionality and
> isolate large architectural changes.  I think that it is good practice to
> isolate major feature changes to separate feature branches to ensure that
> we get sufficient discussion around each of those changes.  To that end,
> I'd like some language encoded into the bylaws describing this.
>
> So, I'm opening up the discussion.  Most specifically what I'd like to
> know is:
>
>    - What do you think about the idea that feature branches should
>    contain the minimal architectural changes to accomplish a feature (unless
>    the full scope of architectural changes is mentioned in the discuss thread
>    about the feature)?
>    - If you agree (or disagree), do you think a change to the bylaws is
>    merited to encode this policy?  If you do think so, then what wording would
>    you suggest?
>
>
> Hopefully that all makes sense.
>
> Casey
>


Re: [ANNOUNCE] Apache Metron release 0.7.0

2018-12-17 Thread Casey Stella
+1 to that!!
On Mon, Dec 17, 2018 at 13:16 Michael Miklavcic 
wrote:

> And a big thanks to Justin Leet for being our release manager. Great work
> Justin!
>
> On Mon, Dec 17, 2018 at 10:07 AM Justin Leet  wrote:
>
>> Hi all,
>>
>> I’m pleased to announce the release of Metron 0.7.0! There's been a lot
>> of work on improvements, upgrades, discussion, and more. Thanks to everyone
>> who's contributed, and thank you to our users.
>>
>> Details:
>> The official release source code tarballs may be obtained at any of the
>> mirrors listed in
>> http://www.apache.org/dyn/closer.cgi/metron/0.7.0
>>
>> As usual, the secure signatures and confirming hashes may be obtained at
>> https://dist.apache.org/repos/dist/release/metron/0.7.0
>>
>> The release branch in GitHub is
>> https://github.com/apache/metron/tree/Metron_0.7.0 (tag
>> apache-metron_0.7.0-release)
>>
>> The release doc book is at
>> http://metron.apache.org/current-book/index.html
>> The Apache Metron web site at http://metron.apache.org/ has been
>> updated; please refresh your web browser cache if the new links do not
>> immediately appear.
>>
>> Change lists and Release Notes may be obtained at the same locations as
>> the tarballs.
>> For your reading pleasure, the change list is appended to this message.
>>
>> CHANGES (in reverse chronological order):
>>
>> METRON-1928 Bump Metron version to 0.7.0 for release. (justinleet) 
>> closes apache/metron#1293
>> METRON-1931 Update dev utilities to support new repo location 
>> (rlenferink via justinleet) closes apache/metron#1295
>> METRON-1922 Escaping incorrectly handled in current aesh version 
>> (justinleet) closes apache/metron#1291
>> METRON-1867 Remove `/api/v1/update/replace` endpoint (nickwallen) closes 
>> apache/metron#1284
>> METRON-1810 Storm Profiler Intermittent Test Failure (nickwallen) closes 
>> apache/metron#1289
>> METRON-1909 Remove http filter from release utils changelog generation 
>> (justinleet) closes apache/metron#1283
>> METRON-1869 Unable to Sort an Escalated Meta Alert (nickwallen) closes 
>> apache/metron#1280
>> METRON-1889: Add any missing timestamp fields to unified enrichment 
>> topology (mmiklavc via mmiklavc) closes apache/metron#1286
>> METRON-1913 metron-alert UI - Build broken by missing transitive 
>> dependency (tiborm via sardell) closes apache/metron#1285
>> METRON-1845 Correct Test Data Load in Elasticsearch Integration Tests 
>> (nickwallen) closes apache/metron#1247
>> METRON-1888 Default Topology Settings in MPack Cause Profiler to Stall 
>> (nickwallen) closes apache/metron#1276
>> METRON-1887: Add logging to the ClasspathFunctionResolver (mmiklavc via 
>> mmiklavc) closes apache/metron#1274
>> METRON-1873 Update Bootstrap version in Management UI (sardell) closes 
>> apache/metron#1267
>> METRON-1825 Upgrade bro to 2.5.5 (JonZeolla via nickwallen) closes 
>> apache/metron#1237
>> METRON-1890 Metron Vagrant should disable audio (ottobackwards) closes 
>> apache/metron#1277
>> METRON-1874 Create a Parser Debugger (nickwallen) closes 
>> apache/metron#1265
>> METRON-1880 Use Caffeine for Profiler Caching (nickwallen) closes 
>> apache/metron#1270
>> METRON-1877 Nested IF ELSE statements can cause parse errors in Stellar 
>> (justinleet) closes apache/metron#1268
>> METRON-1872 Move rat plugin away from snapshot version (justinleet) 
>> closes apache/metron#1264
>> METRON-1875 Expose configurable global settings in the Alerts UI 
>> (merrimanr) closes apache/metron#1266
>> METRON-1834: Migrate Elasticsearch from TransportClient to new Java REST 
>> API (mmiklavc via mmiklavc) closes apache/metron#1242
>> METRON-1834: Migrate Elasticsearch from TransportClient to new Java REST 
>> API (cstella via mmiklavc)
>> METRON-1749 Update Angular to latest release in Management UI (sardell 
>> via nickwallen) closes apache/metron#1217
>> METRON-1870 Intermittent Stellar REST test failures (merrimanr via 
>> nickwallen) closes apache/metron#1263
>> METRON-1868 metron-committer-common incorrectly checking REPO_NAME 
>> (JonZeolla via jonzeolla) closes apache/metron#1260
>> METRON-1740 Improve Palo Alto parser to handle CONFIG and SYSTEM syslog 
>> messages (liuy-tnz via nickwallen) closes apache/metron#1171
>> METRON-1847 Create reusable script with functions from prepare-commit 
>> (ottobackwards) closes apache/metron#1248
>> METRON-1850 Stellar REST function (merrimanr) closes apache/metron#1250
>> METRON-1858 BasicFireEyeParser check style cleanup and optimization 
>> (ottobackwards) closes apache/metron#1255
>> METRON-1864 Stellar date format test fails after daylight saving 
>> (ottobackwards) closes apache/metron#1258
>> METRON-1861 METRON-1861: REST fails to start when LDAP enabled and 
>> 'Active Spring profiles' config is empty (anandsubbu via justinleet) closes 
>> apache/metron#1256
>> METRON-1853: Add shutdown hook to Stellar 

Re: [DISCUSS] Metron documentation improvements

2018-12-20 Thread Casey Stella
I definitely agree with option 3; that's a no-brainer IMO.  I thought for
sure this was already happening, honestly.

As for 2, we could even script the broken link check by:

   - Serving up the site locally via python with `python -m http.server`
   from the site-book output directory
   - Looking at the output of `wget --spider -e robots=off -w 1 -r -p
   http://localhost:8000` for 404s
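
For what it's worth, the same check could be done without wget. What
follows is a minimal sketch, not something that exists in the Metron repo:
it assumes the site-book is already being served on localhost, and the
names `find_broken_links` and `_LinkCollector` are made up for
illustration. It follows href/src links on the same host only and reports
any URL that fails to load.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects href/src attribute values from a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_broken_links(start_url):
    """Crawl same-host pages from start_url; return (url, status) for failures."""
    # Ignore proxy environment variables; the site is served locally.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    host = urlparse(start_url).netloc
    seen, queue, broken = set(), [start_url], []
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        try:
            with opener.open(url, timeout=10) as resp:
                content_type = resp.headers.get("Content-Type", "")
                # Only HTML pages are parsed for further links.
                body = resp.read() if "html" in content_type else b""
        except urllib.error.HTTPError as err:
            broken.append((url, err.code))
            continue
        except urllib.error.URLError:
            broken.append((url, None))
            continue
        collector = _LinkCollector()
        collector.feed(body.decode("utf-8", errors="replace"))
        for link in collector.links:
            target = urldefrag(urljoin(url, link)).url
            # Stay on the local server so external domains are never checked.
            if urlparse(target).netloc == host:
                queue.append(target)
    return sorted(broken, key=lambda item: item[0])
```

Calling `find_broken_links("http://localhost:8000/")` against the locally
served site-book would return a list of (url, status) pairs; an empty list
means no broken internal links were found. A build step could fail when the
list is non-empty.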


As for 1, I'm fine with it if that's where we want to go.  I'm a +0

On Thu, Dec 20, 2018 at 2:47 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> We recently had our site-book doc generation break due to our not including
> it in the Travis build. The fix for a broken build seems simple enough -
> add it to our build process and assuming it doesn't cause build timeout
> issues, we should be good to go.
>
> Beyond that, there are additional issues with the existing process. We have
> a step in our PR review for validating that the docs are rendering
> properly. I know I've gone back and corrected issues with broken images or
> incorrectly rendering pages at least a few times now. On one hand, we might
> say this is simply a matter of being better about validating documentation
> during the review process. That may be true, but rather than fight upstream
> like a salmon, I would prefer to simplify things, automate what we can, and
> use technology to work with us. Based on this conversation on METRON-1950 -
> (
>
> https://lists.apache.org/thread.html/e2acf91efc5f51ba0e26d76b00ca02415d3c6ee0adee74a037ab2beb@%3Cdev.metron.apache.org%3E
> ),
> I'd like to open up a general convo about improvements to our documentation
> generation.
>
> *Current Issues:*
>
>    1. Duplicated effort - have to check pages render in Github and the
>    Doxia-generated site-book
>    2. Inconsistent model - what works for Github markdown may not work for
>    Doxia, and vice versa
>    3. Github is part of our workflow and easy to check, Doxia requires an
>    extra separate step - suffers unintentional bugs due to #2.
>    4. Images have to be manually added to the site rendering code for
>    copying to the "images" folder, and explicit src ref replacements have to
>    be included for all affected pages/links as well.
>    5. Page links and images are not validated - this currently requires
>    manual review and intervention during PR review and whenever we create a
>    new Metron release.
>    6. Failed site-book build is not validated. Broken build does not fail
>    Travis
>
> *Options and Solutions:*
>
>    1. Otto has already brought up using Ascii doc as one option for solving
>    a number of these issues.
>    2. For issue #5, we can write a scraper that validates links or use
>    tooling like Cypress for this.
>    3. For issue #6, we can add site-book building to our Travis runs. It's
>    pretty quick to generate and will catch the more egregious rendering bugs.
>    I plan to look at this presently.
>
> Mike
>


Re: [DISCUSS] Metron documentation improvements

2018-12-20 Thread Casey Stella
You will want to add a -Dlocalhost to the wget to ensure you're not
checking domains linked from our docs and turning Travis into Google. ;)


On Thu, Dec 20, 2018 at 3:19 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Well golly, I love this. I'll give that a whirl!
>
> On Thu, Dec 20, 2018 at 1:08 PM Casey Stella  wrote:
>
> > I definitely agree with option 3; that's a no-brainer IMO.  I thought for
> > sure this was already happening, honestly.
> >
> > As for 2, we could even script the broken link check by:
> >
> >- Serving up the site locally via python with `python -m http.server`
> >from the site-book output directory
> >- Looking at the output of wget --spider -e robots=off -w 1 -r -p
> >http://localhost:8000 for 404s
> >
> >
> > As for 1, I'm fine with it if that's where we want to go.  I'm a +0
> >
> > On Thu, Dec 20, 2018 at 2:47 PM Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > We recently had our site-book doc generation break due to our not
> > including
> > > it in the Travis build. The fix for a broken build seems simple enough
> -
> > > add it to our build process and assuming it doesn't cause build timeout
> > > issues, we should be good to go.
> > >
> > > Beyond that, there are additional issues with the existing process. We
> > have
> > > a step in our PR review for validating that the docs are rendering
> > > properly. I know I've gone back and corrected issues with broken images
> > or
> > > incorrectly rendering pages at least a few times now. On one hand, we
> > might
> > > say this is simply a matter of being better about validating
> > documentation
> > > during the review process. That may be true, but rather than fight
> > upstream
> > > like a salmon, I would prefer to simplify things, automate what we can,
> > and
> > > use technology to work with us. Based on this conversation on
> > METRON-1950 -
> > > (
> > >
> > >
> >
> https://lists.apache.org/thread.html/e2acf91efc5f51ba0e26d76b00ca02415d3c6ee0adee74a037ab2beb@%3Cdev.metron.apache.org%3E
> > > ),
> > > I'd like to open up a general convo about improvements to our
> > documentation
> > > generation.
> > >
> > > *Current Issues:*
> > >
> > >1. Duplicated effort - have to check pages render in Github and the
> > >Doxia-generated site-book
> > >2. Inconsistent model - what works for Github markdown may not work
> > for
> > >Doxia, and vice versa
> > >3. Github is part of our workflow and easy to check, Doxia requires
> an
> > >extra separate step - suffers unintentional bugs due to #2.
> > >4. Images have to be manually added to the site rendering code for
> > >copying to the "images" folder, and explicit src ref replacements
> have
> > > to
> > >be included for all affected pages/links as well.
> > >5. Page links and images are not validated - this currently requires
> > >manual review and intervention during PR review and whenever we
> > create a
> > >new Metron release.
> > >6. Failed site-book build is not validated. Broken build does not
> fail
> > >Travis
> > >
> > > *Options and Solutions:*
> > >
> > >1. Otto has already brought up using Ascii doc as one option for
> > solving
> > >a number of these issues.
> > >2. For issue #5, we can write a scraper that validates links or use
> > >tooling like Cypress for this.
> > >3. For issue #6, we can add site-book building to our Travis runs.
> > It's
> > >pretty quick to generate and will catch the more egregious rendering
> > > bugs.
> > >I plan to look at this presently.
> > >
> > > Mike
> > >
> >
>


Re: [DISCUSS] Handling dropped messages in REGEX_SELECT with Kafka topic routing

2018-12-19 Thread Casey Stella
We absolutely should be acking the dropped messages; otherwise they'll be in
a replay loop.  Not acking is a flat-out bug IMO.
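
To make the expected behaviour concrete, here is a small sketch in plain
Python. It is not Metron or Storm code: `regex_select`, `handle`, `emit`
and `ack` are hypothetical stand-ins, and the only point is that the ack
happens whether or not a pattern matched.

```python
import re


def regex_select(value, patterns):
    """Return the first key whose regex is found in value, or None."""
    for key, pattern in patterns.items():
        if re.search(pattern, value):
            return key
    return None


def handle(message, patterns, emit, ack):
    """Route a message by topic, acking it whether or not it is forwarded."""
    topic = regex_select(message, patterns)
    if topic is not None:
        emit(topic, message)
    # Ack unconditionally: a dropped message that is never acked would
    # eventually be treated as failed and replayed.
    ack(message)
    return topic
```

With the pattern `{"world": "^Hello "}`, a message starting with "Hello "
is emitted to "world" and acked, while any other message is dropped but
still acked.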

On Wed, Dec 19, 2018 at 2:37 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> When a message is filtered by the message filtering mechanism, we
> explicitly drop the message (and presumably ack it in Storm), as explained
> here -
>
> https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#filtered
> .
> When using the REGEX_SELECT field transformation (see here -
>
> https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#fieldtransformation-configuration
> )
> with the kafka.topicField option for parser-chaining, it's unclear to me
> whether we expect the same behavior (drop message, ack it). The
> interpretation I get from this example in the parser-chaining doc
>
> https://github.com/apache/metron/tree/master/use-cases/parser_chaining#the-pix_syslog_router-parser
> suggests to me that the approach we take for messages with message
> filtering is the correct one, however in testing an example with dropped
> messages, we appear not to ack those dropped messages.
>
> Before I go creating a fix I thought it best to summarize and confirm my
> expectations on this functionality. Messages from a REGEX_SELECT that don't
> match a pattern, and therefore don't get a value assigned to their output
> topic value, should be dropped and acked.
>
> *Example:*
> {
>   "parserClassName": "org.apache.metron.parsers.GrokParser",
>   "sensorTopic": "myInTopic",
>   ...
>   "parserConfig": {
>     ...,
>     "kafka.topicField": "output_topic"
>   },
>   "fieldTransformations": [
>     {
>       "input": [
>         "message"
>       ],
>       "output": [
>         "output_topic"
>       ],
>       "transformation": "REGEX_SELECT",
>       "config": {
>         "world": "^Hello "
>       }
>     },
>     ...
>   ]
> }
>
> *Input Records:*
> "...sshd[32469]: Hello..."
> "...sshd[30432]: Bye..."
>
> *Output:*
> Kafka topic = "world" (as determined by the REGEX_SELECT pattern match that
> sets the "output_topic" property used by kafka.topicField)
> 1 record present
> contents of that record = our record with "Hello" in it
> 1 record is dropped ("Bye" record) and will not be forwarded any further
> through the pipeline.
>


Re: [DISCUSS] Managing intermittent test failures

2018-11-29 Thread Casey Stella
+1, I'd say mention it on the dev list and slack channel.

On Thu, Nov 29, 2018 at 10:26 AM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Every now and then we see intermittent test failures, and rather than
> sweeping them under the rug, we should have a simple method to track and
> handle them. I started creating Jiras for tests that I've seen fail, but
> that don't fail consistently, or even fail more than once. For example,
> https://issues.apache.org/jira/browse/METRON-1851.
>
> I think we're all taking steps to varying degrees already, but I want to
> call it out formally. I propose we create a ticket and add the label
> "test-failure." It might also make sense to send a quick note to the dev
> list or Slack channel, so attention can be brought to it and anyone else
> that may have run into an issue with the test can chime in. We can clean
> them out every few months - maybe do a review going into a release and
> close any that have not been reproduced for some time. What do you all
> think?
>
> Mike
>


Re: [ANNOUNCE] Shane Ardell is a committer

2018-11-27 Thread Casey Stella
Congrats Shane!  Well deserved!

On Tue, Nov 27, 2018 at 7:24 AM Tamás Fodor  wrote:

> Congratulations Shane! 
>
> On Mon, Nov 19, 2018 at 5:58 PM Mohan Venkateshaiah <
> mvenkatesha...@hortonworks.com> wrote:
>
> > Congrats Shane !!
> >
> > Thanks
> > Mohan DV
> >
> > On 11/19/18, 10:26 PM, "Michael Miklavcic" wrote:
> >
> > Congrats!
> >
> > On Mon, Nov 19, 2018 at 8:56 AM Shane Ardell <
> shane.m.ard...@gmail.com
> > >
> > wrote:
> >
> > > I want to extend a huge thank you to everyone part of Apache
> > Metron's PMC
> > > for offering me this opportunity!
> > >
> > > Cheers,
> > > Shane
> > >
> > > On Mon, Nov 19, 2018 at 4:53 PM zeo...@gmail.com wrote:
> > >
> > > > Congrats Shane!
> > > >
> > > > Jon
> > > >
> > > > On Mon, Nov 19, 2018 at 10:43 AM Anand Subramanian <
> > > > asubraman...@hortonworks.com> wrote:
> > > >
> > > > > Many congratulations, Shane!
> > > > >
> > > > > Cheers,
> > > > > Anand
> > > > >
> > > > > On 11/19/18, 8:36 PM, "James Sirota" wrote:
> > > > >
> > > > >
> > > > > The Project Management Committee (PMC) for Apache Metron
> has
> > > invited
> > > > > Shane Ardell to become a committer and we are pleased to
> > announce that
> > > he
> > > > > has accepted.  I wanted to congratulate Shane on this
> > achievement.
> > > > >
> > > > >
> > > > > Being a committer enables easier contribution to the
> project
> > since
> > > > > there is no need to go via the patch submission process. This
> > should
> > > > enable
> > > > > better productivity. Being a PMC member enables assistance with
> > the
> > > > > management and to guide the direction of the project.
> > > > > ---
> > > > > Thank you,
> > > > >
> > > > > James Sirota
> > > > > PMC- Apache Metron
> > > > > jsirota AT apache DOT org
> > > > >
> > > > >
> > > > >
> > > > > --
> > > >
> > > > Jon Zeolla
> > > >
> > >
> >
> >
> >
>


Re: [MENTORS][DISCUSS] LICENSE and NOTICE likely outdated

2018-09-12 Thread Casey Stella
> I understand that convenience binaries might have some issues with uberjars
> when we go that route for 1.0. But is there any issue with the uberjars as
> things currently stand? I was under the impression we are OK because we
> don't distribute them.

My impression is that this is incorrect. The guidance of the mentors was
that we do bundle dependencies, because our build creates uberjars, and
thus we have to ensure that the jars contain the appropriate notices and
license.  This was done as part of METRON-531 (see
https://github.com/apache/metron/pull/335 for the discussion between Josh
Elsner and myself).  It is my understanding that, independent of whether we
release binaries, because our build system builds uberjars, we must ensure
that the license and notices are correct within those bundled artifacts of
our build.

Mentors, if I'm wrong in this, please let me know, but if I'm right, then
it is likely that we need a change in process to keep these license and
notices files updated in the individual projects *or* we could choose to
stop creating uberjars.
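
If we go the process route, the check could be something along these
lines. This is only a sketch under assumptions: it takes the text output of
`mvn dependency:list`, assumes a hypothetical file of declared
`group:artifact` entries (one per line), and the function names are
invented here. It is not the existing script.

```python
def parse_coordinates(dependency_list_output):
    """Extract group:artifact pairs from `mvn dependency:list` text output."""
    coords = set()
    for line in dependency_list_output.splitlines():
        text = line.replace("[INFO]", "", 1).strip()
        parts = text.split(":")
        # Dependency lines look like group:artifact:type[:classifier]:version:scope.
        if len(parts) >= 5 and " " not in parts[0]:
            coords.add(parts[0] + ":" + parts[1])
    return coords


def undeclared(dependency_list_output, declared_text):
    """Return sorted group:artifact pairs that are bundled but not declared."""
    # Assumed format: one group:artifact entry per line (hypothetical).
    declared = {l.strip() for l in declared_text.splitlines() if l.strip()}
    return sorted(parse_coordinates(dependency_list_output) - declared)


def exit_code(missing):
    """Non-zero when anything is undeclared, so a CI step would fail."""
    return 1 if missing else 0
```

Run per project as part of the build, a non-zero exit code would flag a
bundled dependency that the license and notice files do not yet cover.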

Casey

On Wed, Sep 12, 2018 at 1:09 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I'm not sure I fully understand what is out of date. I know I have
> personally modified our licenses a couple times in the past and used an
> automated script that, I believe, Casey Stella had created for doing the
> check. I even made some improvements to it a long ways back. It rips
> through the maven dependency tree and tells you what isn't in the licenses
> file and fails with a non-zero return code. I thought that was part of our
> Travis build, or at the very least, the release lifecycle. Is that not the
> case, or is there a different context we're talking about here?
>
> I understand that convenience binaries might have some issues with uberjars when
> we go that route for 1.0. But is there any issue with the uberjars as
> things currently stand? I was under the impression we are OK because we
> don't distribute them. It's part of the build, just like tools such as
> JUnit, that we don't actually ship.
>
> Justin - These are the links for guidance that I've found. Is there anything
> else you've found that we should peruse while figuring this out?
>
>    - https://www.apache.org/dev/licensing-howto.html
>    - http://www.apache.org/legal/release-policy.html#artifacts
>
> Mike
>
>
> On Wed, Sep 12, 2018 at 10:29 AM Justin Leet 
> wrote:
>
> > Hi all,
> >
> > As mentioned on the release voting thread, there was a Slack discussion
> > around our LICENSE and NOTICE files likely being outdated because they
> > haven't been actively kept up to date since graduation.  I suggested on
> > the vote thread that we proceed with the current release, but consider it
> > a blocker for the next release.
> >
> > Mentor input on this (and how other projects handle it), would be greatly
> > appreciated.
> >
> > This discussion should result in JIRAs that are brought back to the
> > thread, so we can make sure to track this.
> >
> > For context, in addition to the standard LICENSE and NOTICE management,
> > when we build artifacts we shade a lot of jars into uberjars, thus bundling
> > dependencies.  However, our current releases are source only, but
> > publishing convenience binaries came up in the 1.0 roadmap thread.
> >
> > I think there are a few things that need to happen to correct our current
> > issue and make this easier in the future.
> > 1) Get the LICENSE and NOTICE files up to date
> > 2) Document the process we went through getting things up to date and
> > (just as importantly) the reasoning behind it.
> > 3) Update the PR checklist to include LICENSE and NOTICE files for new
> > (and transitive) dependencies.
> > 4) Update or add any processes we need to maintain this properly (e.g.
> > release auditing)
> > 5) Possibly build tooling for making some of this auditing easier (or use
> > existing tool if anyone has suggestions)?
> >
> > Are there any other steps I'm missing that need to go into JIRAs?
> > Any other concerns regarding these files that need to be addressed?
> > Any other context I'm missing and that belongs in this discussion?
> >
>
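
For illustration, the kind of check Mike describes above (walk the Maven
dependency tree, report anything missing from the licenses file, and fail
with a non-zero return code) could be sketched roughly as follows. This is
not the actual Metron script. It assumes the output of `mvn dependency:tree`
has been saved to a text file, and the licenses file format used here (one
`group:artifact:version,License` entry per line) is hypothetical, as are the
function names.

```python
import re

# Matches "group:artifact:packaging[:classifier]:version:scope" as printed by
# `mvn dependency:tree`. The module's own root line has no scope and is skipped.
COORD = re.compile(
    r"([\w.\-]+):([\w.\-]+):([\w.\-]+):(?:[\w.\-]+:)?([\w.\-]+)"
    r":(compile|runtime|provided|test|system)"
)


def parse_dependency_tree(text):
    """Return the set of 'group:artifact:version' found in the tree output."""
    deps = set()
    for line in text.splitlines():
        match = COORD.search(line)
        if match:
            group, artifact, _packaging, version, _scope = match.groups()
            deps.add(f"{group}:{artifact}:{version}")
    return deps


def parse_license_file(text):
    """Return the coordinates listed in the (hypothetical) licenses file."""
    listed = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # Everything before the first comma is the coordinate.
            listed.add(line.split(",", 1)[0].strip())
    return listed


def missing_licenses(tree_text, license_text):
    """Dependencies in the tree that have no entry in the licenses file."""
    return sorted(parse_dependency_tree(tree_text) - parse_license_file(license_text))


def check(tree_text, license_text):
    """Print missing entries and return a shell-style exit code (0 = OK)."""
    missing = missing_licenses(tree_text, license_text)
    for coord in missing:
        print(f"missing license entry: {coord}")
    return 1 if missing else 0
```

A CI job could read the two files, call `check(...)`, and pass the result to
`sys.exit` so that a missing entry fails the build. Whether test-scoped
dependencies should be included depends on what actually ends up in the
uberjars; this sketch does not filter by scope.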


Re: [DISCUSS] Shaded jar classifiers

2019-06-03 Thread Casey Stella
This looks good to me, honestly.  Anything to make the build more
understandable and make classpath issues easier to find is a good idea IMO.

Just curious, did you test that PR in both solr and ES (you added an
exclude in the ES portion of the code) and did you spin it up in full-dev
(to ensure ambari doesn't have any dependencies on the jar names)?

Other than that, I'm +1 to the effort!

On Mon, Jun 3, 2019 at 8:55 AM Ryan Merriman  wrote:

> I recently opened a PR that
> has potential to significantly change (for the better in my opinion) the
> way our Maven build process works.  I want to highlight this and get any
> feedback on potential issues that may come with this change.
>
> I frequently run into the classpath version issues (especially with the
> recent module reorganization work) and find them extremely challenging to
> troubleshoot.  I believe we have found the root cause (from the PR
> description):
>
> "When a module that uses the shaded plugin without a classifier is added to
> another module as a dependency:
>
> 1. Any Maven excludes added to that dependency are ignored
> 2. The Maven dependency:tree tool does not accurately report the transitive
> dependencies pulled in by that dependency"
>
> After making this change, a number of classpath version problems popped up
> as expected.  However they are now easy to track down and resolve.
>
> Does anyone have any concerns with making this change?  Are there things
> I'm not thinking of?
>
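
As a rough aid for the root cause quoted above, the sketch below scans a
pom.xml and reports whether it configures the maven-shade-plugin without
attaching the shaded jar under a classifier. It assumes that "with a
classifier" corresponds to the shade plugin's `shadedArtifactAttached`
parameter (optionally combined with `shadedClassifierName`); the function
names are hypothetical, and this is not taken from the PR itself.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace: '{ns}plugin' -> 'plugin'."""
    return tag.rsplit("}", 1)[-1]


def _child_text(elem, name):
    """Text of the first direct child called `name`, or '' if absent."""
    for child in elem:
        if _local(child.tag) == name:
            return (child.text or "").strip()
    return ""


def shades_without_classifier(pom_xml):
    """True if any maven-shade-plugin entry never attaches a classified jar.

    Simplifications: properties such as ${...} are not resolved, parent POMs
    are not consulted, and pluginManagement entries are treated like plugins.
    """
    root = ET.fromstring(pom_xml)
    for plugin in root.iter():
        if _local(plugin.tag) != "plugin":
            continue
        if _child_text(plugin, "artifactId") != "maven-shade-plugin":
            continue
        # Configuration may sit on the plugin or on an individual execution.
        attached = any(
            _local(cfg.tag) == "configuration"
            and _child_text(cfg, "shadedArtifactAttached").lower() == "true"
            for cfg in plugin.iter()
        )
        if not attached:
            return True
    return False
```

Running this over each module's pom.xml would give a quick list of modules
whose shaded jar replaces the main artifact, which are the ones the quoted
description says cause excludes to be ignored.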


Re: [DISCUSS] Metron Release - 0.7.1 next steps

2019-05-02 Thread Casey Stella
FWIW, I'm in favor of 2.  I think it's a relatively minor bug and the
impact is limited.  I do agree that it should be a blocker for 0.8.0 though.

On Thu, May 2, 2019 at 9:31 AM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I am still in favor of option 2. I will volunteer and submit the doc PR. I
> agree we should not rush through a review process for a maintenance
> release. The implications to the UI, as Otto asked, are that aggregated
> parsers will not show up in the UI. You cannot create them there. Actually,
> any parser not created through the UI (eg CLI) will not show up in the UI,
> aggregated or not.
>
> As a separate issue, I will also volunteer to see if I can help Tamas find
> the discuss thread mentioned. It should be linked to the PR or feature
> branch for reference. That may also be a gap in dev guidelines that should
> be spelled out.
>
> On Thu, May 2, 2019, 7:17 AM Nick Allen  wrote:
>
> > To echo Justin's comments, I am in favor of #2, which provides a clear,
> > well-defined path to a release.
> >
> >    - Why hold back a release, especially a point release containing 89
> >    improvements, for one issue that will not affect most users?
> >
> >    - It is one thing to stall a release to address a bug of limited
> >    scope, where a fix is well understood and ready for review, but it is
> >    completely another issue to delay for this.
> >
> >    - I don't see a set of reviewable PRs yet that will push this over the
> >    finish line.  As has been noted, there were fundamental problems with
> >    #1360 (which has now been closed) that would have prevented adequate
> >    review by the community.
> >
> >    - Why drive this issue with the pressure of a stalled release, instead
> >    of just releasing the fix when it is ready and has been adequately
> >    reviewed?  Swarming on an issue does not often produce quality results.
> >
> > For those in favor of #1, can someone please provide a clear outline of
> > what the fix looks like?  How many PRs will this require?  When are these
> > PRs likely to be ready?  Who is driving this?  Tamás has already commented
> > that this is not a quick fix. This path is very murky to me, but maybe I am
> > just ignorant on this.
> >
> > I would also urge other committers and users who don't have a binding vote
> > on the release to share their opinion on the path forward.
> >
> >
> >
> >
> > On Thu, May 2, 2019 at 7:17 AM Otto Fowler 
> > wrote:
> >
> > > If you can find a link in the archives for that thread, it would really
> > > help.
> > >
> > > I don’t think sending them up as one sensor would work…. as something
> > > quick.  I think it is an interesting idea from a higher level that would
> > > need some more thought though ( IE: what if every sensor in the ui was a
> > > sensor group, and the existing entries were just groups of 1 ).
> > >
> > > As far as I can see, we have brought up the idea of a release ourselves,
> > > I don’t see why we don’t just swarm this issue and get it right then
> > > release.
> > >
> > >
> > >
> > > On May 2, 2019 at 04:16:31, Tamás Fodor (ftamas.m...@gmail.com) wrote:
> > >
> > > In PR#1360 we introduced a new state management strategy involving a new
> > > module called Ngrx. We had a discussion thread on this a few months ago
> > > and we successfully convinced you about the benefits. This is one of the
> > > reasons why this PR is going to be still huge after cleaning up the
> > > commit history. After you have a look at the changes and the feature
> > > itself, you'll likely have questions about why certain parts work as they
> > > do. The thing I'd like to point out is that, yes, it will probably take
> > > more time to get it in.
> > >
> > > In order to be able to release the RC, wouldn't it be an easy and quick
> > > fix on the backend if it sent the aggregated parsers to the client as if
> > > they were one sensor? It's just an idea, it might be wrong, but at least we
> > > wouldn't have to wait until the aforementioned PR gets ready to be merged
> > > to the master.
> > >
> > > On Wed, May 1, 2019 at 4:16 PM Justin Leet 
> > wrote:
> > >
> > > > Short version: I'm in favor of #2 for 0.7.1 and #1 as a blocker for
> > > > 0.8.0. #3 seems like a total waste of time and effort.
> > > >
> > > > The wall of text version:
> > > > I agree this isn't "just the wrong thing shown", but for completely
> > > > different reasons.
> > > >
> > > > To be extremely clear about what the problem is: Our "dev" environment
> > > > (whose very name implies the audience is developers) uses a
> > > > performance-based advanced feature to ensure that all of our sample
> > > > flows are regularly run and produce data. This feature has a bare
> > > > minimal implementation to be enabled via Ambari, which it currently is
> > > > by default. This is because of
> > > > the limited resources available that previously resulted in us
> 

Re: [DISCUSS] Full-dev role in PR testign

2019-05-03 Thread Casey Stella
I just want to chime in and say I'm STRONGLY in favor of a docker-based
approach to testing (I specifically like the JUnit 5 extensions
suggestion).  I think that forcing a full-dev evaluation for every small PR
is a barrier to entry that I'd like to overcome.  I also think that this is
not going to be trivial.

There will be weirdness/drama with:

   - cleanup
   - setup in situations where multi-components are used
   - debuggability (right now we run the tests in the same JVM and setting
   breakpoints is trivial, even in the innards of Hadoop.  This is very
   valuable for figuring out what's going wrong and we'll need SOME solution
   for it)
   - possible resource limitations in travis for running tests with
   multiple components

Even so, with ALL of that being said, I still think the value outweighs the
difficulty by a factor of 10.  Being able to be confident after a travis
run that people aren't introducing subtle classpath or cross-component
interaction issues would open up 80% of the class of PRs that don't require
full-dev review.  That being said, I still don't think it's 100%.
Specifically, PRs which can credibly be argued that they touch installation
pathways would still need to be verified in full-dev as it's the only path
to validating that (otherwise we would be regressing in test coverage).

On Wed, May 1, 2019 at 9:33 PM Justin Leet  wrote:

> >
> > My impression is that this is already the status quo. But, if we think we
> > need to be more clear on this, let's put up a vote to change the coding
> > guidelines and PR checklist. I've done this many times in the past, the
> > most obvious instances are when I've made doc changes or unit test
> > modifications because those will not impact full dev. I will own this
> > item. I think it can probably get rolled in with my dev guideline changes
> > for architecture diagrams.
>
>
> For completeness in our PR checklist: "- [ ] Have you verified the basic
> functionality of the build by building and running locally with Vagrant
> full-dev environment or the equivalent?"  In practice, you're right, but
> any newer contributors aren't necessarily going to know this.
>
> > 1. I think we should create Jiras with the end deliverable being that our
> > private vs public API endpoints are clearly delineated. From there, we
> > create another round of javadoc - for the public APIs let the javadocs
> > rain from the heavens to your heart's content. It's for public consumption
> > and should assist end users. See Mockito, for example -
> > https://static.javadoc.io/org.mockito/mockito-core/2.27.0/org/mockito/Mockito.html
> > For developer docs, I'm of the *extremely strong* opinion that this should
> > be limited. Emphasize module, package, class, and method naming
> > conventions over all else. If it doesn't make sense just reading the code,
> > take a minute to summarize what you're doing and consider refactoring. For
> > legitimately more complex and necessary code passages, add a note. For
> > multi-class interactions that provide a larger story arc, add dev docs to
> > the relevant READMEs. Our use of Zookeeper Curator and its interaction
> > with our topology config loading is a perfect example of a feature that
> > would fit this need.
> > 2. I'm an immediate -1 on any documentation that looks like " /** Open the
> > car door **/ public void openCarDoor() {...}" :-). The code speaks for
> > itself.
> > 3. Publish javadocs for public APIs, not our internal dev APIs. Let your
> > fav IDE fill in the gaps for devs.
>
>
> I'm +1 on delineating public vs private APIs like you've outlined there.  I
> think our dev stuff is *better* than our general usage guides, but there's
> room for improvement. I'm fairly agnostic on the dev docs because to be
> honest, a ton of our older code is not at all self explanatory, and to be
> blunt refactoring a lot of it is a substantial lift (as we've all seen
> multiple times trying to refactor it).  If this were greenfield, I'd be in
> much stronger agreement with you, but I suspect in practice there's a lot
> of stuff nobody's going to refactor for awhile.
>
>
> > Full dev until we vote to replace the existing setup and can be confident
> > that the new approach 1. is stable, 2. takes <= the amount of time to
> > complete as full dev. I am +1 for migrating towards this approach and
> > think we should do so when it's dialed in.
>
>
> Great, I look forward to that getting in.
>
> > Justin, what are your thoughts on leveraging this approach along with
> > long-lived Docker containers?
> >
>
> Apparently, there's actually an extension for running Docker containers,
> see  https://faustxvi.github.io/junit5-docker/.  My main hesitation there
> is more around how much effort to migrate it is. I think that's almost
> certainly a cleaner long term solution, but I suspect the 80% solution of
> migrating what we have *might* be easier.  There might also be ways of just
> leveraging this by moving stuff 

Re: [DISCUSS] Deprecate Least Recently Used Pruner

2019-08-13 Thread Casey Stella
Ah, that feature.  Yes, it never seemed to catch on.  It actually wasn't
from OpenSOC, but a very early feature of Metron.  The use-case was that
enrichments may go stale and removing them based on TTL was easy to do, but
not ideal.  The LeastRecentlyUsedPruner was a MR job which would allow
enrichments to be pruned which had not been *read* in x amount of time.  It
did this by capturing bloom filters with enrichment keys used for a
time-range and the MR job would use those bloom filters to determine which
keys to remove.

I'd be ok with it either being used or removed.  It's unclear to me whether
the use-case that hbase needs to be pruned based on usage was as valid as
we thought.  I guess that makes me +0 on the request to deprecate.

On Tue, Aug 13, 2019 at 6:28 PM Nick Allen  wrote:

> Sure.  I should have provided some more context.  I can tell you what I do
> know about it.  Perhaps others can provide some more color.
>
>    - This is functionality accessed by a user by running the script
>    ${METRON_HOME}/bin/threatintel_bulk_prune.sh
>
>
>    - If you are using access trackers with your HBase enrichments, it runs
>    as an MR job that counts the number of times each Enrichment is used.
>    I am assuming that it then prunes those that are less frequently accessed.
>
>
>    - It was originally created here:
>    https://github.com/apache/metron/pull/22
>
>
> On Tue, Aug 13, 2019 at 6:11 PM Otto Fowler 
> wrote:
>
> > Can you summarize what it does? Is it from OpenSOC?
> >
> >
> >
> >
> > On August 13, 2019 at 17:53:52, Nick Allen (n...@nickallen.org) wrote:
> >
> > As part of https://github.com/apache/metron/pull/1470, I found it
> > difficult to update the "Least Recently Used Pruner" to work with HBase
> > 2.0.2. I am sure that given more time and effort, I could make it work,
> > but is it worth it?
> > This is a feature that I myself am not familiar with. I do not know of
> > anyone using this. I also did not find much documentation on how to use
> > this feature. I certainly don't know the entire user community, so please
> > let me know if anyone is using this functionality or believes that it
> > should be maintained going forward.
> >
> > Would you support deprecating this feature?
> >
> > Thanks
> >
>
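
To make the mechanism described at the top of this message concrete, here is
a minimal in-memory sketch of pruning by bloom filter: one filter records the
enrichment keys read during a time range, and a pruning pass removes only the
keys that no recent filter reports as read. The class and function names are
hypothetical, and the real feature ran as an MR job against HBase, which this
sketch does not attempt to reproduce.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter backed by a Python int used as a bit set."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        """Record that `key` was read during this filter's time range."""
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        """False means definitely not read; True means possibly read."""
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def keys_to_prune(stored_keys, recent_filters):
    """Keys that none of the recent access filters report as read."""
    return [
        key
        for key in stored_keys
        if not any(f.might_contain(key) for f in recent_filters)
    ]
```

Because a bloom filter can return false positives but never false negatives,
this approach may keep a few stale keys but should not remove a key that was
actually read in the tracked time ranges.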


Dev list/commit statistics

2020-01-10 Thread Casey Stella
Hi all,

First off, thank you to every one of our amazing committers and
contributors.  You guys are awesome!  That being said, we seem to have a
decline in development activity recently both in terms of dev list
discussions (this has been sorta historic) as well as commits.  I saw this
when going through the board report preparations.  I'll excerpt the relevant
statistics from it here:

> Community health is mixed.  We are seeing an up-tick in user@ mailing list
> discussion (~50% increase as compared to the last quarter).
> At the same time, we are seeing decreased activity in both the
> last month as well as compared to the last quarter.  Specifically:
> * 40% drop in traffic on the dev list
> * Perhaps most worrying, a 43% drop in commits as compared to last quarter.



I think part of this is due to the holidays, but I wanted to open up the
discussion and see if anyone had ideas about how to get some fresh blood in
the project and see if anyone else had any thoughts.

Best,

Casey


Re: [DISCUSS] Next Release - Life After 0.7.1

2020-01-15 Thread Casey Stella
I'd recommend pulling this into a separate thread and tagging the question
with [MENTORS].  FWIW, I'm of the opinion that you should just denote in
the commit that it was a dependabot contribution, squash like we normally
do and not rewrite the user for attribution.  dependabot does not appear to
have a username, so I think that's ok.

On Wed, Jan 15, 2020 at 6:18 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Ok, I've tested and +1'ed https://github.com/apache/metron/pull/1552.
> Anyone have any idea how to properly merge and attribute a PR like this?
> Did a quick search on "apache attribution dependabot" and nothing useful
> showed up on that pass.
>
> M
>
> On Tue, Jan 14, 2020 at 11:36 AM Otto Fowler 
> wrote:
>
> > yes
> >
> >
> >
> >
> > On January 14, 2020 at 13:05:17, Michael Miklavcic (
> > michael.miklav...@gmail.com) wrote:
> >
> > We should probably also get this resolved for this release.
> > https://issues.apache.org/jira/browse/METRON-2340. Thoughts?
> >
> > On Mon, Dec 16, 2019 at 2:19 PM Shane Ardell 
> > wrote:
> >
> > > Both PR #1527 and #1533 are now merged into master.
> > > On Fri, Dec 13, 2019 at 3:57 PM Justin Leet 
> > wrote:
> > >
> > > > I also brought up https://github.com/apache/metron/pull/1282 and
> > > > https://github.com/apache/metron/pull/1552 if anyone has any
> thoughts
> > > on
> > > > them.
> > > >
> > > > On Fri, Dec 13, 2019 at 11:58 AM Shane Ardell <
> > shane.m.ard...@gmail.com>
> >
> > > > wrote:
> > > >
> > > > > Quick update from my end: I just left a +1 on
> > > > > https://github.com/apache/metron/pull/1527 and will merge it into
> > > master
> > > > > shortly. We are actively looking into the cause of the bug I
> > > encountered
> > > > in
> > > > > https://github.com/apache/metron/pull/1533. It would be nice to
> have
> > > > this
> > > > > in the release, but I would not categorize it as critical like
> #1527.
> > > I'm
> > > > > optimistic we will have this resolved and merged into master by
> this
> > > > > weekend, but I'm fine reducing scope to not include it.
> > > > >
> > > > > On Fri, Dec 13, 2019 at 11:24 AM Nick Allen 
> > > wrote:
> > > > >
> > > > > > Are we just waiting on the following PRs as release blockers? Any
> > > > > others?
> > > > > >
> > > > > > - https://github.com/apache/metron/pull/1533
> > > > > > - https://github.com/apache/metron/pull/1527
> > > > > >
> > > > > > Being towards the end of the year, people are going to be on
> > holiday.
> > > > It
> > > > > > would be great if we could focus on reducing scope and getting a
> > > > release
> > > > > > cut.
> > > > > >
> > > > > >
> > > > > > On Sat, Dec 7, 2019 at 10:04 AM Justin Leet <
> justinjl...@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > https://github.com/apache/metron/pull/1568 and
> > > > > > > https://github.com/apache/metron/pull/1554 are in master now.
> > > > > > >
> > > > > > > On Fri, Dec 6, 2019 at 7:16 PM Justin Leet <
> > justinjl...@gmail.com>
> >
> > > > > > wrote:
> > > > > > >
> > > > > > > > I'd like to throw https://github.com/apache/metron/pull/1552
> > on
> > > > the
> > > > > > > pile.
> > > > > > > > Per https://issues.apache.org/jira/browse/LEGAL-491, we
> should
> > > > just
> > > > > > note
> > > > > > > > the contribution comes from dependabot. Would someone more
> > > familiar
> > > > > > with
> > > > > > > > the implications of upgrading that be able to review it, or
> > give
> > > > some
> > > > > > > > advice on what we should be looking for in the review?
> > > > > > > >
> > > > > > > > On Thu, Dec 5, 2019 at 12:06 PM Shane Ardell <
> > > > > shane.m.ard...@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Speaking on the UI-related PRs that Justin mentioned, I also
> > > would
> > > > > > like
> > > > > > > to
> > > > > > > >> see both of them merged before a release. At the moment,
> #1527
> > > > does
> > > > > > not
> > > > > > > >> address a few "stale data state" message inconsistencies
> that
> > > > become
> > > > > > > >> apparent as a result of that PR's work (you can read more
> > about
> > > it
> > > > > in
> > > > > > > the
> > > > > > > >> PR comments). That said, I think those inconsistencies can
> be
> > > > > tracked
> > > > > > > and
> > > > > > > >> addressed separately from the current PR.
> > > > > > > >>
> > > > > > > >> On Thu, Dec 5, 2019 at 11:51 AM Michael Miklavcic <
> > > > > > > >> michael.miklav...@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> > I think the junit upgrade should go in also. I'm almost
> > > finished
> > > > > > > >> reviewing
> > > > > > > >> > that.
> > > > > > > >> >
> > > > > > > >> > On Thu, Dec 5, 2019, 8:50 AM Justin Leet <
> > > justinjl...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >> >
> > > > > > > >> > > If we're going to do a bug fix release, I'd like to see
> > > some
> > > > of
> > > > > > the
> > > > > 

Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-08 Thread Casey Stella
As far as I know there is no minimum bar of development activity to keep a
project open.  I think we would all be grateful for any investment that you
or your organization would want to make.

It also occurs to me that your observation is absolutely spot on: we have a
LOT of moving parts.
I see some deficiencies here:

   - We depend on a lot of the various hadoop ecosystem projects and they
   have to work together very precisely:
      - This makes for a system that is hard to install.
      - This also makes for a system which is hard to tune/manage
   - We have a large surface area of coverage
      - We have an installer, backend system and front-end UI, which stretches
      our developers a bit thin, especially since there isn't even interest in
      those systems

Perhaps a reconsideration of the scope and technologies that we use would
be merited?  If we were to decide to, for instance:

   - Consolidate scope: focus on a viable backend/API rather than a UI
   - Consolidate technology: reposition ourselves on top of Spark as a
   consolidated streaming/batch system
   - Make SQL our external interface: write out to parquet + the Hive
   metastore and let users pin up presto tables or hive tables as they see fit

This might reduce some of our surface area and make it more viable to get
started?

Anyway, just some thoughts.

Casey

On Wed, Apr 8, 2020 at 6:20 PM Yerex, Tom  wrote:

> Hi Casey,
>
> I'm new here and new to contributing to an open source project. Thus far
> my contribution has been questions, however the steep learning curve has
> had me working to understand all the moving parts for the last 18 months
> and I see that as a big investment by my organization.
>
> What is a level that would be viable?
>
> If my organization were to contribute I don't know that it would be soon
> enough or at the volume that is recognized as viable, which is why I ask
> the question.
>
>
> On 2020-04-08 15:05:51-07:00 Casey Stella wrote:
>
> Hi all,
>
> When composing the board report today, I realized that we have effectively
> had no development in the last quarter on this project.  Please be aware
> that I say this without a shred of blame or judgement (especially so
> considering I have not contributed in a long time).  That being said, I
> would like to pose the question to the community:
>
> Do we feel that this project is viable?  If so, how are we going to spur
> new contributions?  If not, then should we begin the process to fold the
> project?
>
>
> Best,
>
> Casey
>
>


Development Activity has dropped to effectively 0, what should we do?

2020-04-08 Thread Casey Stella
Hi all,

When composing the board report today, I realized that we have effectively
had no development in the last quarter on this project.  Please be aware
that I say this without a shred of blame or judgement (especially so
considering I have not contributed in a long time).  That being said, I
would like to pose the question to the community:

Do we feel that this project is viable?  If so, how are we going to spur
new contributions?  If not, then should we begin the process to fold the
project?


Best,

Casey


Re: Metron-2340 - Geo Database

2020-05-20 Thread Casey Stella
Yeah, that Perl utility appears to be GPL, but the thing that it creates
need not be GPL, I presume.  Am I reading that correctly?

On Wed, May 20, 2020 at 21:55 Otto Fowler  wrote:

>  I “believe” that those files apply to the database that they ship. Looking
> at the Perl, it doesn’t seem to mention any claim or restrictions on the
> databases produced.
>
> I don’t think you are talking about distributing the perl module, but
> rather making it a build dependency. Is that right?
>
> On May 17, 2020 at 10:49:11, Yerex, Tom (tom.ye...@ubc.ca) wrote:
>
> Good morning,
>
> I have been working on a solution for Metron-2340 (
> https://issues.apache.org/jira/projects/METRON/issues/METRON-2340), and
> need some direction.
>
> Using Maxmind's database writer (
> https://github.com/maxmind/MaxMind-DB-Writer-perl), I have been
> creating the mmdb files required for the build tests to succeed. The
> Perl code I was going to place in dev-utilities/build-utils with the
> name generate_geoip.pl, and I noticed there are some additional files
> to consider.
>
> As an example, in GeoLite2-ASN.tar.gz there is the following:
>
> GeoLite2-ASN_20181120/
> GeoLite2-ASN_20181120/COPYRIGHT.txt
> GeoLite2-ASN_20181120/LICENSE.txt
> GeoLite2-ASN_20181120/GeoLite2-ASN.mmdb
>
> I am not using the original mmdb data from Maxmind, instead generating
> data using the Perl library. Are the copyright and license files
> required? Is there a copyright and license document from the Apache
> foundation I should be using here?
>
> Thank you,
>
> Tom.
>


Re: [VOTE] Move Apache Metron to the Apache Attic and Dissolve PMC

2020-11-16 Thread Casey Stella
+1

On Mon, Nov 16, 2020 at 09:01 Justin Leet  wrote:

> Hi all,
>
> This is a vote thread to retire Metron to the Attic, and dissolve the PMC.
> This follows a discussion thread on the dev list ([DISCUSS] Retire Metron
> to the Attic
> <
> https://lists.apache.org/thread.html/reb31f643fac20d3ad09521fd702b19922412b7a4e8e08062968268c5%40%3Cdev.metron.apache.org%3E
> >).
> More details can be found in that discussion, but the most relevant link is
> the specific process at Moving a project to the Attic
> <http://attic.apache.org/process.html>.
>
> As noted in the process page, this is a PMC vote. As usual, feel encouraged
> to contribute non-binding votes.
>
> The vote will run 72 hours, until Nov 19th at 9:00 am EST.
>
> Thank you,
> Justin
>


Fwd: [VOTE] Move Apache Metron to the Apache Attic and Dissolve PMC

2020-11-19 Thread Casey Stella
By this vote thread[1] the Apache Metron community has voted to move Apache
Metron to the attic.
We would therefore like to add the following resolution to the next board
report:

Terminate the Apache Metron Project

   WHEREAS, the Project Management Committee of the Apache Metron
   project has chosen by vote to recommend moving the project to the
   Attic; and

   WHEREAS, the Board of Directors deems it no longer in the best
   interest of the Foundation to continue the Apache Metron project
   due to inactivity;

   NOW, THEREFORE, BE IT RESOLVED, that the Apache Metron
   project is hereby terminated; and be it further

   RESOLVED, that the Attic PMC be and hereby is tasked with
   oversight over the software developed by the Apache Metron
   Project; and be it further

   RESOLVED, that the office of "Vice President, Apache Metron" is
   hereby terminated; and be it further

   RESOLVED, that the Apache Metron PMC is hereby terminated.

Best,

Casey Stella
VP Apache Metron
[1]
https://lists.apache.org/thread.html/r81c4b8ee3f4075938883728c2bd5f7f8897d4ef797699ff1a85998be%40%3Cdev.metron.apache.org%3E

---------- Forwarded message ---------
From: Justin Leet 
Date: Thu, Nov 19, 2020 at 3:18 PM
Subject: Re: [VOTE] Move Apache Metron to the Apache Attic and Dissolve PMC
To: 


The result of the vote is:

7 binding +1s
1 non-binding +1

The vote passes to move Apache Metron to the Attic, and the next step is to
inform the board of the vote and add a resolution to the next board meeting.

Thanks,
Justin

On Mon, Nov 16, 2020 at 6:26 PM Ryan Merriman  wrote:

> +1
>
> > On Nov 16, 2020, at 5:19 PM, David Lyle  wrote:
> >
> > +1
> >
> >> On Mon, Nov 16, 2020 at 3:10 PM Michael Miklavcic <
> >> michael.miklav...@gmail.com> wrote:
> >>
> >> +1
> >>
> >>> On Mon, Nov 16, 2020 at 7:01 AM Justin Leet 
> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> This is a vote thread to retire Metron to the Attic, and dissolve the
> >>> PMC.
> >>> This follows a discussion thread on the dev list ([DISCUSS] Retire
> >>> Metron to the Attic
> >>> <
> >>> https://lists.apache.org/thread.html/reb31f643fac20d3ad09521fd702b19922412b7a4e8e08062968268c5%40%3Cdev.metron.apache.org%3E
> >>> >).
> >>> More details can be found in that discussion, but the most relevant
> >>> link is the specific process at Moving a project to the Attic
> >>> <http://attic.apache.org/process.html>.
> >>>
> >>> As noted in the process page, this is a PMC vote. As usual, feel
> >>> encouraged to contribute non-binding votes.
> >>>
> >>> The vote will run 72 hours, until Nov 19th at 9:00 am EST.
> >>>
> >>> Thank you,
> >>> Justin
> >>>
> >>
>


Re: [DISCUSS] Retire Metron to the Attic

2020-11-09 Thread Casey Stella
Hi all,
I'm in complete support of this. Given the current level of interest, I
believe that we should move this project to the attic.

Best,

Casey Stella

On Mon, Nov 2, 2020 at 7:28 PM Justin Leet  wrote:

> Hi all,
>
> I want to start a discussion in the community to consider retiring Metron
> to the Apache Attic and dissolve the project management committee. For
> anyone unaware, an overview is available at Apache Attic
> <https://attic.apache.org/>. In short, it's a way to provide a wind-down
> process for projects reaching their end of life. This process, and
> potential next steps, can be found here:
> http://attic.apache.org/process.html.
>
> We've seen a substantial decrease in contributions (both code and mailing
> lists) over the past year or so (see:
> https://github.com/apache/metron/commits/master). There's been some
> interest in the project, but in my personal opinion, not enough to build or
> sustain real momentum. Additionally, there's been an inability to close the
> loop on paring down the project's scope and building a solid foundation for
> future work (see: Development Activity has dropped to effectively 0, what
> should we do?
> <
> https://lists.apache.org/thread.html/rf7ea1c1afb347e352efff50f58fbd58779f71e6c814d2a5563e381d7%40%3Cdev.metron.apache.org%3E
> >).
> Finally, the last release was over a year ago and I don't see a release
> naturally building anytime soon. We haven't seen much active development or
> participation for a substantial period of time, which to me implies we've
> reached a natural end of life.
>
> The most obvious practical effects would be no new releases, setting source
> read-only, mailing lists closed down, automated processes ended, etc. In
> addition, the PMC would also be dissolved as part of this process.
> Community members may still fork all or part of Metron, but should keep in
> mind that the Metron trademark remains with the ASF.
>
> What opinions are there either in favor of or against moving the project to
> the Attic? Any questions about the process (or comments/corrections anyone
> would like to add)?
>
> Thanks,
> Justin
>

