[ANNOUNCE] Metron Apache Community Demo for Metron_0.3.1

2017-02-07 Thread James Sirota
Let's hold the demo of the latest Metron build from 9 AM to 10 AM PST on Feb. 10, 
2017.  Please respond to this thread with the set of features that you would 
like to demo.

Topic: Metron Community Demo


Join from PC, Mac, Linux, iOS or Android: 
https://hortonworks.zoom.us/j/263433204
 
Or join by phone:

+1 408 638 0968 (US Toll) or +1 646 558 8656 (US Toll)
+1 855 880 1246 (US Toll Free)
+1 877 369 0926 (US Toll Free)
Meeting ID: 263 433 204 
International numbers available: 
https://hortonworks.zoom.us/zoomconference?m=TCITFbHy7jY9C0o_Ylpbpx6gzd_9L9W7 

--- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org


Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC2

2017-02-07 Thread Matt Foley
Casey, the below vote call message has several inconsistencies that invalidate 
it.  Please search for “RC1” or “rc1”.  I count three, starting with the first 
line :-)  There is also an instance of “0.3.0”.
Thanks,
--Matt

On 2/7/17, 8:18 AM, "Casey Stella"  wrote:

This is a call to vote on releasing Apache Metron 0.3.1-RC1 incubating


Full list of changes in this release:


https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/CHANGES


The tag/commit to be voted upon is apache-metron-0.3.0-rc1-incubating:


https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc2-incubating

The source archive being voted upon can be found here:


https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz

Other release files, signatures and digests can be found here:


https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/

The release artifacts are signed with the following key:


https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d9c260ba55e;hb=refs/tags/apache-metron-0.3.1-rc2-incubating


Please vote on releasing this package as Apache Metron 0.3.1-RC2 incubating


When voting, please list the actions taken to verify the release.

Recommended build validation and verification instructions are posted here:

https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds


This vote will be open for at least 72 hours.


[ ] +1 Release this package as Apache Metron 0.3.1-RC2 incubating

[ ]  0 No opinion

[ ] -1 Do not release this package because...





[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...

2017-02-07 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/444
  
 #14.1 passed

 Elapsed time 23 min 28 sec
On my second try around 5pm


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC2

2017-02-07 Thread Nick Allen
+1 binding

Valid checksums
Build successful
Integration tests successful
Deploy of "Quick Dev" successful
Deploy of "Full Dev" successful
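
For readers repeating the "valid checksums" step, a minimal shell sketch follows. It assumes GNU coreutils' `sha512sum` is available and that the published digest is SHA-512; the helper name `digest_matches` and the file names in the comments are illustrative, not taken from the release instructions.

```shell
#!/usr/bin/env bash
# Sketch only: the helper name and the choice of SHA-512 are assumptions.
digest_matches() {
  # $1: file to check, $2: expected hex digest
  local actual
  actual=$(sha512sum "$1" | awk '{ print $1 }')
  [ "$actual" = "$2" ]
}

# Signature check against the project's KEYS file (needs gpg and the downloaded files):
#   gpg --import KEYS
#   gpg --verify <artifact>.tar.gz.asc <artifact>.tar.gz
```

A call such as `digest_matches <artifact>.tar.gz "<published digest>"` returns success only when the computed digest equals the published one.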


On Tue, Feb 7, 2017 at 11:18 AM, Casey Stella  wrote:

> This is a call to vote on releasing Apache Metron 0.3.1-RC1 incubating
>
>
> Full list of changes in this release:
>
> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/CHANGES
>
>
> The tag/commit to be voted upon is apache-metron-0.3.0-rc1-incubating:
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc2-incubating
>
> The source archive being voted upon can be found here:
>
> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz
>
> Other release files, signatures and digests can be found here:
>
> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/
>
> The release artifacts are signed with the following key:
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d9c260ba55e;hb=refs/tags/apache-metron-0.3.1-rc2-incubating
>
>
> Please vote on releasing this package as Apache Metron 0.3.1-RC2 incubating
>
>
> When voting, please list the actions taken to verify the release.
>
> Recommended build validation and verification instructions are posted here:
>
> https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
>
>
> This vote will be open for at least 72 hours.
>
>
> [ ] +1 Release this package as Apache Metron 0.3.1-RC2 incubating
>
> [ ]  0 No opinion
>
> [ ] -1 Do not release this package because...
>


Re: [GitHub] incubator-metron issue #439: add stellar external functions feature (work in...

2017-02-07 Thread Tyler Moore
Sorry for the late reply; work has been crazy around here too.
I changed the name and added a few comments. I am not sure if I mentioned it,
but devopsec is just my dev handle, in case that wasn't apparent.
I will push up another revision later this week that addresses your comments,
and I'll revise as we go. Thanks.

Regards,

Tyler Moore
Software Engineer
Phone: 248-909-2769
Email: tmo...@goflyball.com 


On Sat, Feb 4, 2017 at 9:01 PM, JonZeolla  wrote:

> Github user JonZeolla commented on the issue:
>
> https://github.com/apache/incubator-metron/pull/439
>
> The title of this PR should be "METRON-571: Add stellar keywords for
> executing local commands".
>
> This was originally my JIRA, I just haven't been able to work on it
> and Tyler had a similar need so he's taking a stab at it.  That said, I
> will try to provide some context in response to your comments, but
> @devopsec please correct me if I have misunderstood.
>
>
>


[GitHub] incubator-metron pull request #445: METRON-706: Add Stellar transformations ...

2017-02-07 Thread mmiklavc
GitHub user mmiklavc opened a pull request:

https://github.com/apache/incubator-metron/pull/445

METRON-706: Add Stellar transformations and filters to enrichment and 
threat intel loaders

This PR completes work in https://issues.apache.org/jira/browse/METRON-706

(Note: there are commits from @cestella that I had merged in the process of 
working on this. They are squashed in master but show up here. They only show 
in the commit history, not the diff)

Motivation for this PR is to expand where we expose Stellar capabilities. 
This work enables transformations and filtering on enrichment and threatintel 
extractors. The user is now able to specify transformation expressions on the 
column values and separately filter records based on a provided predicate. The 
same can also be done independently for the key indicator value used as part of 
the HBase key. In addition, a new property has been added to the configuration 
that allows a user to specify a Zookeeper quorum and reference global 
properties specified in the global config.

See the updated README for documentation details on the new properties.

**Testing**

Testing follows closely with the methods defined in 
[#432](https://github.com/apache/incubator-metron/pull/432#issuecomment-276733075)

* Download the Alexa top 1m data set
```
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
unzip top-1m.csv.zip
```

* Stage import file
```
head -n 10000 top-1m.csv > top-10k.csv
head -n 10 top-1m.csv > top-10.csv
```

* Create an extractor.json for the CSV data by editing extractor.json and 
pasting in these contents. (Set your zk_quorum to your own value if different 
from the default Vagrant quick-dev environment):
```
{
  "config" : {
    "zk_quorum" : "node1:2181",
    "columns" : {
      "rank" : 0,
      "domain" : 1
    },
    "value_transform" : {
      "domain" : "DOMAIN_REMOVE_TLD(domain)",
      "port" : "es.port"
    },
    "value_filter" : "LENGTH(domain) > 0",
    "indicator_column" : "domain",
    "indicator_transform" : {
      "indicator" : "DOMAIN_REMOVE_TLD(indicator)"
    },
    "indicator_filter" : "LENGTH(indicator) > 0",
    "type" : "top_domains",
    "separator" : ","
  },
  "extractor" : "CSV"
}
```

The "port" property/variable here is referencing "es.port" from the global 
config.

* Run the import (parallelism of 5, batch size of 128)
```
echo "truncate 'enrichment'" | hbase shell && \
  /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv -t enrichment -c t \
    -e ./extractor.json -p 5 -b 128 && \
  echo "count 'enrichment'" | hbase shell
```

You should see 9275 records in HBase (fewer than the 10k you might expect).

* Now run it again on the top-10 set.
```
echo "truncate 'enrichment'" | hbase shell && \
  /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10.csv -t enrichment -c t \
    -e ./extractor.json -p 5 -b 128 && \
  echo "count 'enrichment'" | hbase shell
```

You should get 9 values as below:
```
scan 'enrichment'
ROW 
COLUMN+CELL
 \x09\x00\x0F,\x10\xE5\xD1\xDE_\xBF\x9E\xA7d\xF2\xA8\x94\x00\x0Btop_dom 
column=t:v, timestamp=1486513090953, 
value={"port":"9300","domain":"yahoo","rank":"5"}
 ains\x00\x05yahoo
 \x11\xCA\xCF\x01\xB4\xC5\x11@\x0C\xA1A,\xE9j~O\x00\x0Btop_domains\x00\ 
column=t:v, timestamp=1486513090979, 
value={"port":"9300","domain":"tmall","rank":"10"}
 x05tmall
 \x13)`\xFC\xF2\xBF\xF9\xC1a\xC8a\xF1h\x0E\xB5\x11\x00\x0Btop_domains\x 
column=t:v, timestamp=1486513090930, 
value={"port":"9300","domain":"youtube","rank":"2"}
 00\x07youtube
 1\xC2I\x05k\xEA\x0EY\xE1\xAD\xA0$U\xA9kc\x00\x0Btop_domains\x00\x06goo 
column=t:v, timestamp=1486513090964, 
value={"port":"9300","domain":"google","rank":"7"}
 gle
 =\xDD\xDFH\x95\xC0\xB9\xD9\xBAKX\x8B\x9B2T\x9F\x00\x0Btop_domains\x00\ 
column=t:v, timestamp=1486513090942, 
value={"port":"9300","domain":"facebook","rank":"3"}
 x08facebook
 D\xDE\x1C\x9A\xCF\x07S\x9A\xDEB\xDB\x87D\x1F\x1D\xF4\x00\x0Btop_domain 
column=t:v, timestamp=1486513090974, 
value={"port":"9300","domain":"qq","rank":"9"}
 s\x00\x02qq
 u\xBC\xFC\xC9\x09\x9Af\xE1\xC8\xA5\x9A\x93\xCB0c\x01\x00\x0Btop_domain 
column=t:v, timestamp=1486513090970, 
value={"port":"9300","domain":"amazon","rank":"8"}
 s\x00\x06amazon
 \xC7\xA5.l\xC21\xFAQ8\x1E\x5C\x99p\x93_\x9A\x00\x0Btop_domains\x00\x09 
column=t:v, timestamp=1486513090958, 
value={"port":"9300","domain":"wikipedia","rank":"6"}
 wikipedia
 \xCC\xCA\xBF;\x92\xA1\xA0k\xE4\x83i\xBD\xC3\xA8\xE8p\x00\x0Btop_domain 
column=t:v, timestamp=1486513090948, 
value={"port":"9300","domain":"baidu","rank":"4"}
 s\x00\x05baidu
```
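
The scan shows ranks 2 through 10 but no rank 1, and `google` carries rank 7. A likely explanation (an inference from the output, not something stated above) is that two input domains reduce to the same indicator once the TLD is removed, so the later row overwrites the earlier one under the same HBase key. The sketch below illustrates that collapse with plain shell tools. The sample rows are hypothetical, and cutting at the first dot only approximates `DOMAIN_REMOVE_TLD`; it is not how Stellar implements it.

```shell
#!/usr/bin/env bash
# Illustration only: approximates DOMAIN_REMOVE_TLD by cutting at the first dot
# and applies a LENGTH(...) > 0 style filter before counting distinct indicators.
count_unique_indicators() {
  # $1: CSV file of "rank,domain" rows
  cut -d, -f2 "$1" \
    | sed 's/\..*$//' \
    | awk 'length($0) > 0' \
    | sort -u \
    | awk 'END { print NR }'
}

# Hypothetical sample: two rows share the indicator "google" after the transform.
sample=$(mktemp)
printf '1,google.com\n2,youtube.com\n7,google.co.in\n' > "$sample"
count_unique_indicators "$sample"   # prints 2
rm -f "$sample"
```

Three input rows yield two distinct indicators here, which is the same kind of shrinkage as 10 rows producing 9 records.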

Once again, 

[GitHub] incubator-metron pull request #439: METRON-571 add stellar external function...

2017-02-07 Thread devopsec
Github user devopsec commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/439#discussion_r99967075
  
--- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/ExternalFunctions.java ---
@@ -0,0 +1,292 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.common.dsl.functions;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.OutputStream;
+import java.io.PrintWriter;
+import java.util.List;
+import java.lang.ProcessBuilder;
+import java.lang.ClassLoader;
+import java.lang.reflect.Method;
+import java.util.Map;
+import java.util.regex.Pattern;
+import com.google.common.base.Joiner;
+import com.google.common.base.Splitter;
+import com.google.common.collect.Iterables;
+import org.apache.metron.common.dsl.Context;
+import org.apache.metron.common.dsl.StellarFunction;
+import org.apache.metron.common.dsl.ParseException;
+import org.apache.metron.common.dsl.Stellar;
+
+/**
+ * Executes external script on server via stellar process
+ */
+public class ExternalFunctions {
+
+   public static class ExecuteScript implements StellarFunction {
+
+private ThreadedStreamHandler inStream;
+private ThreadedStreamHandler errStream;
+private boolean isOnTheList = false;
+
+@Stellar(name="EXEC_SCRIPT",
+description = "Executes an external shell function via stellar.",
+params = {
+"exec - the executing cmd (ie. bash, sh, python)",
+"name - name of the script, located in /scripts " +
+"Do NOT include any special chars except(_), Do include file extension"
+},
+returns = "the return value of the function"
+)
+
+   @Override
+public Object apply(List args, Context context) throws ParseException {
+String exec = "";
+String name = "";
+String path = "";
+
+// if args are provided, get args, only if in whitelist
+if (args.size() >= 1) {
+Object execObj = args.get(0);
+if (!(execObj instanceof String)) { //check if string
+return null;
+}
+else if (((String) execObj).length() > 0) {
+exec = (String) execObj;
+}
+else {
+return null;
+}
+
+Object nameObj = args.get(1);
+if (!(nameObj instanceof String)) { //check if string
+return null;
+}
+else if (((String) nameObj).length() > 0) {
+name = (String) nameObj;
+}
+else {
+return null;
+}
+
+if (!Pattern.matches("[0-9A-Za-z.]+", name)) {
+return null; //if not on whitelist
+}
+
+path = "/scripts" + name;
+try {
+File script = new File(path);
+if (!script.exists() || script.isDirectory()) {
+return null;
+}
+}
+catch (NullPointerException e)  {
+System.err.println("Error: " + e.toString());
+return null;
+}
--- End diff --

I did plan on adding real error handling, are there any exceptions that 
should be thrown up the chain? I am thinking we should allow some of these to 
be logged in the storm logs or log them all to some location depending 

[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...

2017-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/444
  
Also, with only 30 parallel builds available across all apache projects, 
asking for 2 separate containers per build might be greedy. ;)




[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...

2017-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/444
  
Ok, this was just too volatile.  I also got 43 minutes for the integration 
test run.  It's probably worth investigating in the future, but for now I'm 
going to revert to just one container:
* parallelize the package phase
* parallelize the unit test phase
* run the integration tests in serial
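
As a rough illustration of that single-container plan, the three phases could be scripted as below. The Maven goals and the `-T 2C` thread count are assumptions for the sketch, not the contents of the PR's `.travis.yml`; `DRY_RUN` and the `run` helper are hypothetical conveniences for printing the commands without executing them.

```shell
#!/usr/bin/env bash
# Sketch only: goals and flags are assumptions, not the PR's actual configuration.

# Print a command, then execute it unless DRY_RUN is set.
run() {
  echo "+ $*"
  [ -n "$DRY_RUN" ] || "$@"
}

build_phases() {
  # 1. Package in parallel with tests skipped.
  run mvn -q -T 2C -DskipTests install &&
  # 2. Unit tests in parallel.
  run mvn -q -T 2C surefire:test &&
  # 3. Integration tests in serial (no -T flag); this lifecycle phase
  #    also runs the phases that precede it.
  run mvn -q integration-test
}
```

Running `DRY_RUN=1 build_phases` prints the three commands in order, which is a cheap way to review the plan before spending build minutes on it.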




[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...

2017-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/444
  
@ottobackwards Yeah, we have to use the apache [job 
queue](https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci) 
(which has 30 parallel builds) on travis rather than the big one for all open 
source projects.  I suspect it gets slammed during the day.




[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...

2017-02-07 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/444
  
You bet. By the way, the build on this PR isn't even running at all.




[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...

2017-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/444
  
Looking at those logs, it appears that the first phase, the build (not the 
actual integration tests) is 4 minutes 39 seconds in my integration-test phase 
[build](https://api.travis-ci.org/jobs/199327334/log.txt?deansi=true) on travis 
vs 21 minutes 53 seconds on 
[yours](https://api.travis-ci.org/jobs/199370358/log.txt?deansi=true)




[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...

2017-02-07 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/444
  
integration tests took 43 minutes




[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...

2017-02-07 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/444
  
 #14 passed

 Elapsed time 43 min 10 sec
 Total time 1 hr 6 min 21 sec




[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...

2017-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/444
  
Yep, that's how I did it.  You can see a build of it on my account 
[here](https://travis-ci.org/cestella/incubator-metron/builds/199349340)




[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...

2017-02-07 Thread mmiklavc
Github user mmiklavc commented on the issue:

https://github.com/apache/incubator-metron/pull/444
  
@ottobackwards Yes, that should do it.




[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...

2017-02-07 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/444
  
So to test this - if you have travis set up for your git account, I think 
you just push it to a branch in your repo and let travis build it?




Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Casey Stella
haha there was some desperation there, I'll admit. ;)

On Tue, Feb 7, 2017 at 3:12 PM, Otto Fowler  wrote:

> This PR gets a star just for the commit messages, it isn’t even Friday
> Casey
>
>
> On February 7, 2017 at 14:49:22, Casey Stella (ceste...@gmail.com) wrote:
>
> I spent a minute or two looking at how we might use travis
> configuration-alone to drop the wall-clock time of the build and put it up
> for review at https://github.com/apache/incubator-metron/pull/444
>
> It does 3 things:
>
> - Separates the build, the unit tests and the integration tests
> - Parallelizes the unit tests and the build and runs the integration
> tests within the travis container
> - Runs the unit tests and integration tests in separate travis
> containers using travis' build matrix
>
> This ultimately cuts the wallclock time down to 24 minutes for me on
> travis
> and should give us some time where we're not constantly bouncing builds to
> act on the suggestions here.
>
>
> On Tue, Feb 7, 2017 at 1:03 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > FYI, found this for Docker - https://docs.travis-ci.com/user/docker/
> >
> > On Tue, Feb 7, 2017 at 9:09 AM, David Lyle 
> wrote:
> >
> > > Absolutely agree. I also think we'd want both once we've done that.
> > Travis
> > > is good for smoke testing PRs and Commits. Jenkins is good for nightly
> > runs
> > > of medium duration tests and would be great for automating our
> > distributed
> > > testing if we found infrastructure to support it. I've seen them used
> in
> > > concert to provide a good solution.
> > >
> > > But, initially, I'd like to see us get our in-process stuff replaced
> with
> > > docker where (if) it makes sense, refactored to run in parallel, the
> poms
> > > refactored to handle our dependencies better and our uber jars removed
> > > where they can be and minimized where they cannot be.
> > >
> > > Which, I think, is a long-winded way of saying "I'd like to see us do
> > what
> > > Casey suggested." :)
> > >
> > > -D...
> > >
> > >
> > > On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <
> > > michael.miklav...@gmail.com> wrote:
> > >
> > > > I agree with this. I don't think we should switch to an alternate
> > system
> > > > until we find that we are absolutely incapable of eking out any
> further
> > > > efficiency from the current setup.
> > > >
> > > > On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella 
> > wrote:
> > > >
> > > > > I believe that some people use travis and some people request
> Jenkins
> > > > from
> > > > > Apache Infra. That being said, personally, I think we should take
> > the
> > > > > opportunity to correct the underlying issues. 50 minutes for a
> build
> > > > seems
> > > > > excessive to me.
> > > > >
> > > > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler <
> > ottobackwa...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Is there an alternative to Travis? Do other like sized apache
> > > projects
> > > > > > have these problems? Do they use travis?
> > > > > >
> > > > > >
> > > > > > On February 6, 2017 at 17:02:37, Casey Stella (
> ceste...@gmail.com)
> > > > > wrote:
> > > > > >
> > > > > > For those with pending/building pull requests, it will come as
> no
> > > > > surprise
> > > > > > that our build times are increasing at a pace that is worrisome.
> In
> > > > fact,
> > > > > > we have hit a fundamental limit associated with Travis over the
> > > > weekend.
> > > > > > We have creeped up into the 40+ minute build territory and
> travis
> > > seems
> > > > > to
> > > > > > error out at around 49 minutes.
> > > > > >
> > > > > > Taking the current build (
> > > > > > https://travis-ci.org/apache/incubator-metron/jobs/198929446),
> > > looking
> > > > > at
> > > > > > just job times, we're spending about 19 - 20 minutes (1176.53
> > > seconds)
> > > > in
> > > > > > tests out of 44 minutes and 42 seconds to do the build. This
> places
> > > the
> > > > > > unit tests at around 43% of the build time. I say all of this to
> > > point
> > > > > out
> > > > > > that while unit tests are a portion of the build, they are not
> even
> > > the
> > > > > > majority of the build time. We need an approach that addresses
> the
> > > > whole
> > > > > > build performance holistically and we need it soonest.
> > > > > >
> > > > > > To seed the discussion, I will point to a few things that come
> to
> > > mind
> > > > > > that
> > > > > > fit into three broad categories:
> > > > > >
> > > > > > *Tests are Slow*
> > > > > >
> > > > > >
> > > > > > - *Tactical*: We have around 13 tests that take more than 30
> > seconds
> > > > and
> > > > > > make up 14 minutes of the build. Considering what we can do to
> > speed
> > > > > those
> > > > > > tests as a tactical approach may be worth considering
> > > > > > - We are spinning up the same services (e.g. kafka, storm) for
> > > multiple
> > > > > > tests, instead use the docker infrastructure to spin them 

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Otto Fowler
This PR gets a star just for the commit messages, it isn’t even Friday Casey


On February 7, 2017 at 14:49:22, Casey Stella (ceste...@gmail.com) wrote:

I spent a minute or two looking at how we might use travis
configuration-alone to drop the wall-clock time of the build and put it up
for review at https://github.com/apache/incubator-metron/pull/444

It does 3 things:

- Separates the build, the unit tests and the integration tests
- Parallelizes the unit tests and the build and runs the integration
tests within the travis container
- Runs the unit tests and integration tests in separate travis
containers using travis' build matrix

This ultimately cuts the wallclock time down to 24 minutes for me on travis
and should give us some time where we're not constantly bouncing builds to
act on the suggestions here.


On Tue, Feb 7, 2017 at 1:03 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> FYI, found this for Docker - https://docs.travis-ci.com/user/docker/
>
> On Tue, Feb 7, 2017 at 9:09 AM, David Lyle  wrote:
>
> > Absolutely agree. I also think we'd want both once we've done that.
> Travis
> > is good for smoke testing PRs and Commits. Jenkins is good for nightly
> runs
> > of medium duration tests and would be great for automating our
> distributed
> > testing if we found infrastructure to support it. I've seen them used
in
> > concert to provide a good solution.
> >
> > But, initially, I'd like to see us get our in-process stuff replaced
with
> > docker where (if) it makes sense, refactored to run in parallel, the
poms
> > refactored to handle our dependencies better and our uber jars removed
> > where they can be and minimized where they cannot be.
> >
> > Which, I think, is a long-winded way of saying "I'd like to see us do
> what
> > Casey suggested." :)
> >
> > -D...
> >
> >
> > On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > I agree with this. I don't think we should switch to an alternate
> system
> > > until we find that we are absolutely incapable of eking out any
further
> > > efficiency from the current setup.
> > >
> > > On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella 
> wrote:
> > >
> > > > I believe that some people use travis and some people request
Jenkins
> > > from
> > > > Apache Infra. That being said, personally, I think we should take
> the
> > > > opportunity to correct the underlying issues. 50 minutes for a
build
> > > seems
> > > > excessive to me.
> > > >
> > > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler <
> ottobackwa...@gmail.com>
> > > > wrote:
> > > >
> > > > > Is there an alternative to Travis? Do other like sized apache
> > projects
> > > > > have these problems? Do they use travis?
> > > > >
> > > > >
> > > > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)

> > > > wrote:
> > > > >
> > > > > For those with pending/building pull requests, it will come as no
> > > > surprise
> > > > > that our build times are increasing at a pace that is worrisome.
In
> > > fact,
> > > > > we have hit a fundamental limit associated with Travis over the
> > > weekend.
> > > > > We have creeped up into the 40+ minute build territory and travis
> > seems
> > > > to
> > > > > error out at around 49 minutes.
> > > > >
> > > > > Taking the current build (
> > > > > https://travis-ci.org/apache/incubator-metron/jobs/198929446),
> > looking
> > > > at
> > > > > just job times, we're spending about 19 - 20 minutes (1176.53
> > seconds)
> > > in
> > > > > tests out of 44 minutes and 42 seconds to do the build. This
places
> > the
> > > > > unit tests at around 43% of the build time. I say all of this to
> > point
> > > > out
> > > > > that while unit tests are a portion of the build, they are not
even
> > the
> > > > > majority of the build time. We need an approach that addresses
the
> > > whole
> > > > > build performance holistically and we need it soonest.
> > > > >
> > > > > To seed the discussion, I will point to a few things that come to
> > mind
> > > > > that
> > > > > fit into three broad categories:
> > > > >
> > > > > *Tests are Slow*
> > > > >
> > > > >
> > > > > - *Tactical*: We have around 13 tests that take more than 30
> seconds
> > > and
> > > > > make up 14 minutes of the build. Considering what we can do to
> speed
> > > > those
> > > > > tests as a tactical approach may be worth considering
> > > > > - We are spinning up the same services (e.g. kafka, storm) for
> > multiple
> > > > > tests, instead use the docker infrastructure to spin them up once
> and
> > > > then
> > > > > use them throughout the tests.
> > > > >
> > > > >
> > > > > *Tests aren't parallel*
> > > > >
> > > > > Currently we cannot run the build in parallel due to the
> integration
> > > test
> > > > > infrastructure spinning up its own services that bind to the same
> > > ports.
> > > > > If we correct this, we can run the builds in parallel with mvn -T
> > > > >
> > > > > - 

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Ryan Merriman
Down to 24 minutes?  Nice job.

On Tue, Feb 7, 2017 at 1:49 PM, Casey Stella  wrote:

> I spent a minute or two looking at how we might use travis
> configuration-alone to drop the wall-clock time of the build and put it up
> for review at https://github.com/apache/incubator-metron/pull/444
>
> It does 3 things:
>
>- Separates the build, the unit tests and the integration tests
>- Parallelizes the unit tests and the build and runs the integration
>tests within the travis container
>- Runs the unit tests and integration tests in separate travis
>containers using travis' build matrix
>
> This ultimately cuts the wallclock time down to 24 minutes for me on travis
> and should give us some time where we're not constantly bouncing builds to
> act on the suggestions here.
>
>
> On Tue, Feb 7, 2017 at 1:03 PM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > FYI, found this for Docker - https://docs.travis-ci.com/user/docker/
> >
> > On Tue, Feb 7, 2017 at 9:09 AM, David Lyle  wrote:
> >
> > > Absolutely agree. I also think we'd want both once we've done that.
> > Travis
> > > is good for smoke testing PRs and Commits. Jenkins is good for nightly
> > runs
> > > of medium duration tests and would be great for automating our
> > distributed
> > > testing if we found infrastructure to support it. I've seen them used
> in
> > > concert to provide a good solution.
> > >
> > > But, initially, I'd like to see us get our in-process stuff replaced
> with
> > > docker where (if) it makes sense, refactored to run in parallel, the
> poms
> > > refactored to handle our dependencies better and our uber jars removed
> > > where they can be and minimized where they cannot be.
> > >
> > > Which, I think, is a long-winded way of saying "I'd like to see us do
> > what
> > > Casey suggested." :)
> > >
> > > -D...
> > >
> > >
> > > On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <
> > > michael.miklav...@gmail.com> wrote:
> > >
> > > > I agree with this. I don't think we should switch to an alternate
> > system
> > > > until we find that we are absolutely incapable of eking out any
> further
> > > > efficiency from the current setup.
> > > >
> > > > On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella 
> > wrote:
> > > >
> > > > > I believe that some people use travis and some people request
> Jenkins
> > > > from
> > > > > Apache Infra.  That being said, personally, I think we should take
> > the
> > > > > opportunity to correct the underlying issues.  50 minutes for a
> build
> > > > seems
> > > > > excessive to me.
> > > > >
> > > > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler <
> > ottobackwa...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Is there an alternative to Travis?  Do other like sized apache
> > > projects
> > > > > > have these problems?  Do they use travis?
> > > > > >
> > > > > >
> > > > > > On February 6, 2017 at 17:02:37, Casey Stella (
> ceste...@gmail.com)
> > > > > wrote:
> > > > > >
> > > > > > For those with pending/building pull requests, it will come as no
> > > > > surprise
> > > > > > that our build times are increasing at a pace that is worrisome.
> In
> > > > fact,
> > > > > > we have hit a fundamental limit associated with Travis over the
> > > > weekend.
> > > > > > We have creeped up into the 40+ minute build territory and travis
> > > seems
> > > > > to
> > > > > > error out at around 49 minutes.
> > > > > >
> > > > > > Taking the current build (
> > > > > > https://travis-ci.org/apache/incubator-metron/jobs/198929446),
> > > looking
> > > > > at
> > > > > > just job times, we're spending about 19 - 20 minutes (1176.53
> > > seconds)
> > > > in
> > > > > > tests out of 44 minutes and 42 seconds to do the build. This
> places
> > > the
> > > > > > unit tests at around 43% of the build time. I say all of this to
> > > point
> > > > > out
> > > > > > that while unit tests are a portion of the build, they are not
> even
> > > the
> > > > > > majority of the build time. We need an approach that addresses
> the
> > > > whole
> > > > > > build performance holistically and we need it soonest.
> > > > > >
> > > > > > To seed the discussion, I will point to a few things that come to
> > > mind
> > > > > > that
> > > > > > fit into three broad categories:
> > > > > >
> > > > > > *Tests are Slow*
> > > > > >
> > > > > >
> > > > > > - *Tactical*: We have around 13 tests that take more than 30
> > seconds
> > > > and
> > > > > > make up 14 minutes of the build. Considering what we can do to
> > speed
> > > > > those
> > > > > > tests as a tactical approach may be worth considering
> > > > > > - We are spinning up the same services (e.g. kafka, storm) for
> > > multiple
> > > > > > tests, instead use the docker infrastructure to spin them up once
> > and
> > > > > then
> > > > > > use them throughout the tests.
> > > > > >
> > > > > >
> > > > > > *Tests aren't parallel*
> > > > > >
> > > > > > Currently we cannot run 

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Casey Stella
I spent a minute or two looking at how we might use travis
configuration-alone to drop the wall-clock time of the build and put it up
for review at https://github.com/apache/incubator-metron/pull/444

It does 3 things:

   - Separates the build, the unit tests and the integration tests
   - Parallelizes the unit tests and the build and runs the integration
   tests within the travis container
   - Runs the unit tests and integration tests in separate travis
   containers using travis' build matrix

This ultimately cuts the wallclock time down to 24 minutes for me on travis
and should give us some time where we're not constantly bouncing builds to
act on the suggestions here.
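
As a rough sanity check on the figures quoted further down in this thread, the arithmetic can be sketched in Python. The inputs are the numbers reported for the Travis job plus the 24 minute result above; the variable names are illustrative only, and the 24 minutes is treated as exact, which it is not.

```python
# Figures quoted in this thread for the Travis job before the change.
total_build_s = 44 * 60 + 42      # 44 minutes 42 seconds for the whole build
test_s = 1176.53                  # seconds spent in tests

# Wall-clock time reported after splitting the work across containers.
parallel_wall_clock_s = 24 * 60   # assumption: "24 minutes" taken as exact

test_share = test_s / total_build_s            # fraction of the build spent in tests
non_test_s = total_build_s - test_s            # everything that is not test time
saving = 1 - parallel_wall_clock_s / total_build_s

print(f"tests: {test_share:.1%} of the build")
print(f"everything else: {non_test_s / 60:.1f} minutes")
print(f"wall-clock reduction: {saving:.1%}")
```

The non-test portion alone is about 25 minutes, so a result near 24 minutes is roughly what one would expect once tests no longer run serially behind the build. Getting meaningfully lower would need the packaging and test changes discussed in the rest of the thread.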


On Tue, Feb 7, 2017 at 1:03 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> FYI, found this for Docker - https://docs.travis-ci.com/user/docker/
>
> On Tue, Feb 7, 2017 at 9:09 AM, David Lyle  wrote:
>
> > Absolutely agree. I also think we'd want both once we've done that.
> Travis
> > is good for smoke testing PRs and Commits. Jenkins is good for nightly
> runs
> > of medium duration tests and would be great for automating our
> distributed
> > testing if we found infrastructure to support it. I've seen them used in
> > concert to provide a good solution.
> >
> > But, initially, I'd like to see us get our in-process stuff replaced with
> > docker where (if) it makes sense, refactored to run in parallel, the poms
> > refactored to handle our dependencies better and our uber jars removed
> > where they can be and minimized where they cannot be.
> >
> > Which, I think, is a long-winded way of saying "I'd like to see us do
> what
> > Casey suggested." :)
> >
> > -D...
> >
> >
> > On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > I agree with this. I don't think we should switch to an alternate
> system
> > > until we find that we are absolutely incapable of eking out any further
> > > efficiency from the current setup.
> > >
> > > On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella 
> wrote:
> > >
> > > > I believe that some people use travis and some people request Jenkins
> > > from
> > > > Apache Infra.  That being said, personally, I think we should take
> the
> > > > opportunity to correct the underlying issues.  50 minutes for a build
> > > seems
> > > > excessive to me.
> > > >
> > > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler <
> ottobackwa...@gmail.com>
> > > > wrote:
> > > >
> > > > > Is there an alternative to Travis?  Do other like sized apache
> > projects
> > > > > have these problems?  Do they use travis?
> > > > >
> > > > >
> > > > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> > > > wrote:
> > > > >
> > > > > For those with pending/building pull requests, it will come as no
> > > > surprise
> > > > > that our build times are increasing at a pace that is worrisome. In
> > > fact,
> > > > > we have hit a fundamental limit associated with Travis over the
> > > weekend.
> > > > > We have creeped up into the 40+ minute build territory and travis
> > seems
> > > > to
> > > > > error out at around 49 minutes.
> > > > >
> > > > > Taking the current build (
> > > > > https://travis-ci.org/apache/incubator-metron/jobs/198929446),
> > looking
> > > > at
> > > > > just job times, we're spending about 19 - 20 minutes (1176.53
> > seconds)
> > > in
> > > > > tests out of 44 minutes and 42 seconds to do the build. This places
> > the
> > > > > unit tests at around 43% of the build time. I say all of this to
> > point
> > > > out
> > > > > that while unit tests are a portion of the build, they are not even
> > the
> > > > > majority of the build time. We need an approach that addresses the
> > > whole
> > > > > build performance holistically and we need it soonest.
> > > > >
> > > > > To seed the discussion, I will point to a few things that come to
> > mind
> > > > > that
> > > > > fit into three broad categories:
> > > > >
> > > > > *Tests are Slow*
> > > > >
> > > > >
> > > > > - *Tactical*: We have around 13 tests that take more than 30
> seconds
> > > and
> > > > > make up 14 minutes of the build. Considering what we can do to
> speed
> > > > those
> > > > > tests as a tactical approach may be worth considering
> > > > > - We are spinning up the same services (e.g. kafka, storm) for
> > multiple
> > > > > tests, instead use the docker infrastructure to spin them up once
> and
> > > > then
> > > > > use them throughout the tests.
> > > > >
> > > > >
> > > > > *Tests aren't parallel*
> > > > >
> > > > > Currently we cannot run the build in parallel due to the
> integration
> > > test
> > > > > infrastructure spinning up its own services that bind to the same
> > > ports.
> > > > > If we correct this, we can run the builds in parallel with mvn -T
> > > > >
> > > > > - Correct this by decoupling the infrastructure from the tests and
> > > > > refactoring the tests to run in parallel.
> > > > > - Make the 

[GitHub] incubator-metron pull request #444: METRON-705: Parallelize the build in tra...

2017-02-07 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/incubator-metron/pull/444

METRON-705: Parallelize the build in travis to the extent that is obvious

Travis suggests 
[here](https://blog.travis-ci.com/2012-11-28-speeding-up-your-tests-by-parallelizing-them/)
 that for situations where the integration tests get chunky, one can parallelize them 
using their build matrix functionality. Also, if we can separate those out, we 
can also process-parallelize the unit tests and the build.

Currently the build time is cut roughly in half to 24 minutes wall-clock.

**NOTE: This is just a stopgap that requires no code changes to lower build 
wall-clock times.  This is not intended to replace work parallelizing the 
integration tests or making the build take less time.**

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron parallel_build

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-metron/pull/444.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #444


commit b0e56a1b0faea33118f5fde5a23dd1982cee8c77
Author: cstella 
Date:   2017-02-07T15:25:36Z

Trying out parallelizing the unit and build but not integration tests.

commit 02d9dc1e211f5a11ed0a7172d76c7cca97590989
Author: cstella 
Date:   2017-02-07T15:29:44Z

Empty push

commit a15daf72c411aed7b05c298eac6816cc5163cc0d
Author: cstella 
Date:   2017-02-07T15:48:31Z

Updating.

commit 4da00b23d26bada058306317a969449a3bd87108
Author: cstella 
Date:   2017-02-07T16:30:20Z

make sure to run rat.

commit f4605994b1db018e60496d02abae369887ef5d78
Author: cstella 
Date:   2017-02-07T16:33:30Z

quiet down rat.

commit 939fb394ff34128fe704aeace8d7b4ce9f4daf41
Author: cstella 
Date:   2017-02-07T16:57:58Z

Updating.

commit 896e9b90aefef63572b0d4f69bd745506d10ebc8
Author: cstella 
Date:   2017-02-07T17:06:24Z

Adding.

commit c06862e87679140079c9250b170b0a9bee25e094
Author: cstella 
Date:   2017-02-07T17:10:02Z

making rat happy.

commit 6299cd23f1498b91630a7c118b8ca4366a7aa4be
Author: cstella 
Date:   2017-02-07T17:12:16Z

skipping other things.

commit 11ec82d1edd6c0e6f158da37b2026b4dc07c466a
Author: cstella 
Date:   2017-02-07T17:26:03Z

Updating to actually run unit tests.

commit 29296e35352d61ffe0f97bf1e9c78158808f7621
Author: cstella 
Date:   2017-02-07T18:06:20Z

Update to build matrix

commit 3eaa6428c3d42e555de472cf3502612350d513df
Author: cstella 
Date:   2017-02-07T18:09:02Z

Whoops.

commit 2e0c4521d7fd8e86ae0e18cfdf487617008bb378
Author: cstella 
Date:   2017-02-07T18:12:43Z

mised the echo

commit 424dd1bf50b84275e0382be50354fceca558659e
Author: cstella 
Date:   2017-02-07T18:38:06Z

putting time statements.

commit ab2688dc2ee772da479ae24862a0b184e8e39379
Author: cstella 
Date:   2017-02-07T19:41:01Z

Commenting the exclude.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: BulkMessageWriterBolt and MessageGetters

2017-02-07 Thread Ryan Merriman
You are correct, the BulkMessageWriterBolt/MessageGetters combination is
not flexible enough.  You would have to modify BulkMessageWriterBolt.  I
have addressed this in METRON-695 which will be submitted as a PR shortly.
It will be easy to do what you want after that is merged in.

Ryan

On Tue, Feb 7, 2017 at 1:24 PM, Nick Allen  wrote:

> I am trying to use the `BulkMessageWriterBolt` to write a specific tuple
> field named "measurement" to a Kafka topic.
>
> -   id: "kafkaBolt"
>     className: "org.apache.metron.writer.bolt.BulkMessageWriterBolt"
>     constructorArgs:
>         - "${kafka.zk}"
>     configMethods:
>         -   name: "withMessageWriter"
>             args:
>                 - ref: "kafkaWriter"
>         -   name: "withMessageGetter"
>             args:
>                 - "measurement"
>
> Rather than wanting the name of a field, it wants the name of a valid
> `MessageGetters` enum; either RAW or NAMED.  It seems like there is no way
> for me to plugin a `NamedMessageGetter` with a custom field name like
> "measurement".
>
> Am I missing something?  Is there a way to do this out-of-the-box?
>


BulkMessageWriterBolt and MessageGetters

2017-02-07 Thread Nick Allen
I am trying to use the `BulkMessageWriterBolt` to write a specific tuple
field named "measurement" to a Kafka topic.

-   id: "kafkaBolt"
    className: "org.apache.metron.writer.bolt.BulkMessageWriterBolt"
    constructorArgs:
        - "${kafka.zk}"
    configMethods:
        -   name: "withMessageWriter"
            args:
                - ref: "kafkaWriter"
        -   name: "withMessageGetter"
            args:
                - "measurement"

Rather than wanting the name of a field, it wants the name of a valid
`MessageGetters` enum; either RAW or NAMED.  It seems like there is no way
for me to plugin a `NamedMessageGetter` with a custom field name like
"measurement".

Am I missing something?  Is there a way to do this out-of-the-box?


Re: [Discuss] Direction of metron-docker

2017-02-07 Thread Nick Allen
Is having a goal of replacing Vagrant/Virtualbox with Docker in "Quick Dev"
and "Full Dev" mutually exclusive with the goals you outlined above?  We
could have both, no?  I am unsure if you are objecting to this specific
goal or not.

On Mon, Feb 6, 2017 at 1:03 PM, Ryan Merriman  wrote:

> From the README:
>
> "Metron Docker is a Docker Compose application that is intended for
> development and integration testing of Metron. Use this instead of Vagrant
> when:
>
> - You want an environment that can be built and spun up quickly
> - You need to frequently rebuild and restart services
> - You only need to test, troubleshoot or develop against a subset of
> services"
>
> The "Quick Dev" environment actually serves 2 purposes:  a development
> environment and an end-to-end testing environment.  This module was
> intended to supplement or provide an alternative to the development
> environment part of "Quick Dev", not the end-to-end testing part.  It does
> have "Docker" in the name of the module so I can see how that might suggest
> a fully supported deployment option.  It shouldn't be used for that though
> because it doesn't include Ambari or MPack and isn't a true representation
> of a production Metron cluster.
>
> What is the direction?  I could see this evolving into a collection of
> profiles or recipes.  Need to develop a custom parser?  Spin up an
> application that only includes the Storm, Kafka and Zookeeper images.  Want
> to develop a custom Kibana dashboard?  Spin up Elasticsearch and Kibana
> images preloaded with data.  Maybe an analytics profile could be created
> that only includes the tools you need for that?  The application that
> exists now in metron-docker could be considered a "rest" profile or a
> collection of containers that support all the functions of the rest API.
> It's very general purpose and supports a lot of use cases so I considered
> it a good starting point.  It's very useful if you're developing a UI and
> have limited knowledge of Ambari or big data platform services.  That was
> the initial motivation.
>
> I think you should view this as more of a toolbox and not a turnkey
> installation solution.  Maintaining and building development environments
> is something Docker is a really good fit for and I have found this works
> much better than our Ansible/Vagrant environment.  It's really fast and
> stays up all the time.
>
> But it's completely optional.  Use it if you think it will help you.  Or
> don't if "Quick Dev" is good enough and you've figured out how to tune it
> so that it's not completely unusable.  If everybody thinks it's confusing
> and no one uses it then we can take it out and I'll just go back to
> maintaining it privately.  But then I would miss out on Kyle's awesome
> contribution :)
>
> Ryan
>
> On Mon, Feb 6, 2017 at 10:12 AM, Nick Allen  wrote:
>
> > So what is the direction then, Ryan?  Can you describe what this is
> > supposed to be used for?
> >
> > I had thought people wanted this to replace the existing Vagrant-based
> > "Quick Dev"?  But apparently this is the assumption that you think I am
> > wrong on.
> >
> >
> >
> > On Mon, Feb 6, 2017 at 10:46 AM, Ryan Merriman 
> > wrote:
> >
> > > I agree with everything Kyle said and I think some of Nick's
> assumptions
> > > are false.  I don't see this as a third deployment option.
> > >
> > > I can understand people not wanting to maintain another deployment path
> > > with Metron already being as big as it is.  Ensuring that you've tested
> > and
> > > updated all the appropriate components is already tedious.  But in the
> > case
> > > of this module, is it something that needs to be updated anytime someone
> > makes
> > > a deployment related change?  I don't think so and I've never had that
> > > expectation.  The build won't fail and nothing from this project is
> ever
> > > deployed or shipped.  For me, maintaining this tool as needed is good
> > > enough.  What happens if a change is introduced that breaks
> something?  I
> > > discover it as I'm using the tool, fix it, contribute it back and move
> > on.
> > > No big deal.  I had been maintaining this privately for a while before
> > the
> > > PR was submitted and the work to keep it current with master was pretty
> > > minimal.  Does that mean it should live somewhere else besides the
> master
> > > branch in Metron?  I'm not sure what the answer is but there should be
> a
> > > way to share and collaborate with the community on tools like this that
> > > aren't necessarily deployed to production.  Kyle's contribution is
> > valuable
> > > and something I would definitely use.
> > >
> > > Ryan
> > >
> >
>


Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Michael Miklavcic
FYI, found this for Docker - https://docs.travis-ci.com/user/docker/

On Tue, Feb 7, 2017 at 9:09 AM, David Lyle  wrote:

> Absolutely agree. I also think we'd want both once we've done that. Travis
> is good for smoke testing PRs and Commits. Jenkins is good for nightly runs
> of medium duration tests and would be great for automating our distributed
> testing if we found infrastructure to support it. I've seen them used in
> concert to provide a good solution.
>
> But, initially, I'd like to see us get our in-process stuff replaced with
> docker where (if) it makes sense, refactored to run in parallel, the poms
> refactored to handle our dependencies better and our uber jars removed
> where they can be and minimized where they cannot be.
>
> Which, I think, is a long-winded way of saying "I'd like to see us do what
> Casey suggested." :)
>
> -D...
>
>
> On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > I agree with this. I don't think we should switch to an alternate system
> > until we find that we are absolutely incapable of eking out any further
> > efficiency from the current setup.
> >
> > On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella  wrote:
> >
> > > I believe that some people use travis and some people request Jenkins
> > from
> > > Apache Infra.  That being said, personally, I think we should take the
> > > opportunity to correct the underlying issues.  50 minutes for a build
> > seems
> > > excessive to me.
> > >
> > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler 
> > > wrote:
> > >
> > > > Is there an alternative to Travis?  Do other like sized apache
> projects
> > > > have these problems?  Do they use travis?
> > > >
> > > >
> > > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> > > wrote:
> > > >
> > > > For those with pending/building pull requests, it will come as no
> > > surprise
> > > > that our build times are increasing at a pace that is worrisome. In
> > fact,
> > > > we have hit a fundamental limit associated with Travis over the
> > weekend.
> > > > We have creeped up into the 40+ minute build territory and travis
> seems
> > > to
> > > > error out at around 49 minutes.
> > > >
> > > > Taking the current build (
> > > > https://travis-ci.org/apache/incubator-metron/jobs/198929446),
> looking
> > > at
> > > > just job times, we're spending about 19 - 20 minutes (1176.53
> seconds)
> > in
> > > > tests out of 44 minutes and 42 seconds to do the build. This places
> the
> > > > unit tests at around 43% of the build time. I say all of this to
> point
> > > out
> > > > that while unit tests are a portion of the build, they are not even
> the
> > > > majority of the build time. We need an approach that addresses the
> > whole
> > > > build performance holistically and we need it soonest.
> > > >
> > > > To seed the discussion, I will point to a few things that come to
> mind
> > > > that
> > > > fit into three broad categories:
> > > >
> > > > *Tests are Slow*
> > > >
> > > >
> > > > - *Tactical*: We have around 13 tests that take more than 30 seconds
> > and
> > > > make up 14 minutes of the build. Considering what we can do to speed
> > > those
> > > > tests as a tactical approach may be worth considering
> > > > - We are spinning up the same services (e.g. kafka, storm) for
> multiple
> > > > tests, instead use the docker infrastructure to spin them up once and
> > > then
> > > > use them throughout the tests.
> > > >
> > > >
> > > > *Tests aren't parallel*
> > > >
> > > > Currently we cannot run the build in parallel due to the integration
> > test
> > > > infrastructure spinning up its own services that bind to the same
> > ports.
> > > > If we correct this, we can run the builds in parallel with mvn -T
> > > >
> > > > - Correct this by decoupling the infrastructure from the tests and
> > > > refactoring the tests to run in parallel.
> > > > - Make the integration testing infrastructure bind intelligently to
> > > > whatever port is available.
> > > > - Move the integration tests to their own project. This will let us
> run
> > > > the build in parallel since an individual project's test will be run
> > > > serially.
> > > >
> > > > *Packaging is Painful*
> > > >
> > > > We have a sensitive environment in terms of dependencies. As such, we
> > are
> > > > careful to shade and relocate dependencies that we want to isolate
> from
> > > > our
> > > > transitive dependencies. The consequences of this is that we spend a
> > lot
> > > > of time in the build shading and relocating maven module output.
> > > >
> > > > - Do the hard work to walk our transitive dependencies and ensure
> that
> > > > we are including only one copy of every library by using exclusions
> > > > effectively. This will not only bring down build times, it will make
> > sure
> > > > we know what we're including.
> > > > - Try to devise a strategy where we only shade once at the end. 

[VOTE] Releasing Apache Metron (incubating) 0.3.1-RC2

2017-02-07 Thread Casey Stella
This is a call to vote on releasing Apache Metron 0.3.1-RC1 incubating


Full list of changes in this release:

https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/CHANGES


The tag/commit to be voted upon is apache-metron-0.3.0-rc1-incubating:

https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc2-incubating

The source archive being voted upon can be found here:

https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz

Other release files, signatures and digests can be found here:

https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/

The release artifacts are signed with the following key:

https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d9c260ba55e;hb=refs/tags/apache-metron-0.3.1-rc2-incubating


Please vote on releasing this package as Apache Metron 0.3.1-RC2 incubating


When voting, please list the actions taken to verify the release.

Recommended build validation and verification instructions are posted here:

https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds


This vote will be open for at least 72 hours.


[ ] +1 Release this package as Apache Metron 0.3.1-RC2 incubating

[ ]  0 No opinion

[ ] -1 Do not release this package because...


Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread David Lyle
Absolutely agree. I also think we'd want both once we've done that. Travis
is good for smoke testing PRs and Commits. Jenkins is good for nightly runs
of medium duration tests and would be great for automating our distributed
testing if we found infrastructure to support it. I've seen them used in
concert to provide a good solution.

But, initially, I'd like to see us get our in-process stuff replaced with
docker where (if) it makes sense, refactored to run in parallel, the poms
refactored to handle our dependencies better and our uber jars removed
where they can be and minimized where they cannot be.

Which, I think, is a long-winded way of saying "I'd like to see us do what
Casey suggested." :)
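
One of the blockers quoted below is that the integration test infrastructure binds to the same ports, which is what prevents a parallel build. A minimal sketch of the "bind to whatever port is available" idea follows, written in Python for brevity. It is not Metron code: `bind_to_free_port` is a hypothetical helper, and the sketch assumes the service under test can be started on an OS-assigned port and can report which port it got. In Java the same effect comes from `new ServerSocket(0)`.

```python
import socket


def bind_to_free_port(host="127.0.0.1"):
    """Bind a listening socket to an OS-assigned port; return (socket, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any currently free port
    sock.listen(1)
    return sock, sock.getsockname()[1]


# Two stand-in "services" started side by side cannot collide on a port,
# because each one receives its own port from the OS while both are open.
first, first_port = bind_to_free_port()
second, second_port = bind_to_free_port()
print(first_port, second_port)

first.close()
second.close()
```

The design point is that the test, not a hard-coded constant, learns the port after the bind, so several test modules can run at once without stepping on each other.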

-D...


On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I agree with this. I don't think we should switch to an alternate system
> until we find that we are absolutely incapable of eking out any further
> efficiency from the current setup.
>
> On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella  wrote:
>
> > I believe that some people use travis and some people request Jenkins
> from
> > Apache Infra.  That being said, personally, I think we should take the
> > opportunity to correct the underlying issues.  50 minutes for a build
> seems
> > excessive to me.
> >
> > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler 
> > wrote:
> >
> > > Is there an alternative to Travis?  Do other like sized apache projects
> > > have these problems?  Do they use travis?
> > >
> > >
> > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> > wrote:
> > >
> > > For those with pending/building pull requests, it will come as no
> > surprise
> > > that our build times are increasing at a pace that is worrisome. In
> fact,
> > > we have hit a fundamental limit associated with Travis over the
> weekend.
> > > We have creeped up into the 40+ minute build territory and travis seems
> > to
> > > error out at around 49 minutes.
> > >
> > > Taking the current build (
> > > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> > at
> > > just job times, we're spending about 19 - 20 minutes (1176.53 seconds)
> in
> > > tests out of 44 minutes and 42 seconds to do the build. This places the
> > > unit tests at around 43% of the build time. I say all of this to point
> > out
> > > that while unit tests are a portion of the build, they are not even the
> > > majority of the build time. We need an approach that addresses the
> whole
> > > build performance holistically and we need it soonest.
> > >
> > > To seed the discussion, I will point to a few things that come to mind
> > > that
> > > fit into three broad categories:
> > >
> > > *Tests are Slow*
> > >
> > >
> > > - *Tactical*: We have around 13 tests that take more than 30 seconds
> and
> > > make up 14 minutes of the build. Considering what we can do to speed
> > those
> > > tests as a tactical approach may be worth considering
> > > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > > tests, instead use the docker infrastructure to spin them up once and
> > then
> > > use them throughout the tests.
> > >
> > >
> > > *Tests aren't parallel*
> > >
> > > Currently we cannot run the build in parallel due to the integration
> test
> > > infrastructure spinning up its own services that bind to the same
> ports.
> > > If we correct this, we can run the builds in parallel with mvn -T
> > >
> > > - Correct this by decoupling the infrastructure from the tests and
> > > refactoring the tests to run in parallel.
> > > - Make the integration testing infrastructure bind intelligently to
> > > whatever port is available.
> > > - Move the integration tests to their own project. This will let us run
> > > the build in parallel since an individual project's test will be run
> > > serially.
> > >
> > > *Packaging is Painful*
> > >
> > > We have a sensitive environment in terms of dependencies. As such, we
> are
> > > careful to shade and relocate dependencies that we want to isolate from
> > > our
> > > transitive dependencies. The consequences of this is that we spend a
> lot
> > > of time in the build shading and relocating maven module output.
> > >
> > > - Do the hard work to walk our transitive dependencies and ensure that
> > > we are including only one copy of every library by using exclusions
> > > effectively. This will not only bring down build times, it will make
> sure
> > > we know what we're including.
> > > - Try to devise a strategy where we only shade once at the end. This
> > > could look like some combination of
> > > - standardizing on the lowest common denominator of a troublesome
> > > library
> > > - We shade in dependencies so they can use different versions of
> > > libraries (e.g. metron-common with a modern version of guava) than the
> > > final jars.
> > > - exclusions
> > > - externalizing infrastructure out to not necessitate spinning up
> > 

[GitHub] incubator-metron pull request #443: METRON-703: Rev the version from 0.3.0 t...

2017-02-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-metron/pull/443




[GitHub] incubator-metron issue #443: METRON-703: Rev the version from 0.3.0 to 0.3.1

2017-02-07 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/443
  
Built and installed the mpack.  Versioning looks good where it shows up, 
and everything installed and started up correctly.




Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Michael Miklavcic
I agree with this. I don't think we should switch to an alternate system
until we find that we are absolutely incapable of eking out any further
efficiency from the current setup.

On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella  wrote:

> I believe that some people use travis and some people request Jenkins from
> Apache Infra.  That being said, personally, I think we should take the
> opportunity to correct the underlying issues.  50 minutes for a build seems
> excessive to me.
>
> On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler 
> wrote:
>
> > Is there an alternative to Travis?  Do other like sized apache projects
> > have these problems?  Do they use travis?
> >
> >
> > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> wrote:
> >
> > For those with pending/building pull requests, it will come as no
> surprise
> > that our build times are increasing at a pace that is worrisome. In fact,
> > we have hit a fundamental limit associated with Travis over the weekend.
> > We have creeped up into the 40+ minute build territory and travis seems
> to
> > error out at around 49 minutes.
> >
> > Taking the current build (
> > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> at
> > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
> > tests out of 44 minutes and 42 seconds to do the build. This places the
> > unit tests at around 43% of the build time. I say all of this to point
> out
> > that while unit tests are a portion of the build, they are not even the
> > majority of the build time. We need an approach that addresses the whole
> > build performance holistically and we need it soonest.
> >
> > To seed the discussion, I will point to a few things that come to mind
> > that
> > fit into three broad categories:
> >
> > *Tests are Slow*
> >
> >
> > - *Tactical*: We have around 13 tests that take more than 30 seconds and
> > make up 14 minutes of the build. Considering what we can do to speed
> those
> > tests as a tactical approach may be worth considering
> > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > tests, instead use the docker infrastructure to spin them up once and
> then
> > use them throughout the tests.
> >
> >
> > *Tests aren't parallel*
> >
> > Currently we cannot run the build in parallel due to the integration test
> > infrastructure spinning up its own services that bind to the same ports.
> > If we correct this, we can run the builds in parallel with mvn -T
> >
> > - Correct this by decoupling the infrastructure from the tests and
> > refactoring the tests to run in parallel.
> > - Make the integration testing infrastructure bind intelligently to
> > whatever port is available.
> > - Move the integration tests to their own project. This will let us run
> > the build in parallel since an individual project's test will be run
> > serially.
> >
> > *Packaging is Painful*
> >
> > We have a sensitive environment in terms of dependencies. As such, we are
> > careful to shade and relocate dependencies that we want to isolate from
> > our
> > transitive dependencies. The consequences of this is that we spend a lot
> > of time in the build shading and relocating maven module output.
> >
> > - Do the hard work to walk our transitive dependencies and ensure that
> > we are including only one copy of every library by using exclusions
> > effectively. This will not only bring down build times, it will make sure
> > we know what we're including.
> > - Try to devise a strategy where we only shade once at the end. This
> > could look like some combination of
> > - standardizing on the lowest common denominator of a troublesome
> > library
> > - We shade in dependencies so they can use different versions of
> > libraries (e.g. metron-common with a modern version of guava) than the
> > final jars.
> > - exclusions
> > - externalizing infrastructure out to not necessitate spinning up
> > hadoop components in-process for integration tests (i.e. hbase server
> > conflicts with storm in a few dependencies)
> >
> > *Final Thoughts*
> >
> > If I had three to pick, I'd pick
> >
> > - moving off of the in-memory component infrastructure to docker images
> > - fixing the maven poms to exclude correctly
> > - ensuring the resulting tests are parallelizable
> >
> > I will point out that fixing the maven poms to exclude correctly (i.e. we
> > choose the version of every jar that we depend on transitively) ticks
> > multiple boxes, not just making things faster.
> >
> > What are your thoughts? What did I miss? We need a plan and we need to
> > execute on it soon, otherwise travis is going to keep smacking us hard.
> It
> > may be worth while constructing a tactical plan and then a more strategic
> > plan that we can work toward. I was heartened at how much some of these
> > suggestions dovetail with the discussion around the future of the docker
> > infrastructure.
> >
> > Best,
> >
> > 

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread JJ Meyer
Mike, unfortunately something changed recently, and I can't run `mvn clean
install -T 2C` locally anymore.

I'd like to echo that I think working on fixing the dependency issue is a
very good idea. We've actually faced issues with this on the REST API PR.
Working to fix this and having a standard way of including/excluding
dependencies will be helpful to all, and to Ryan's point will benefit us
outside of this context.

On Tue, Feb 7, 2017 at 9:36 AM, Ryan Merriman  wrote:

> Debugging integration tests in an IDE uses the same approach with our
> current infrastructure or with docker:  start up the topology with
> LocalRunner.  I've had mixed success with our current infrastructure.  As
> Mike alluded to, some tests work fine (most of the parser topologies and
> enrichment topology) while others fail when run in my IDE but work on the
> command line (ES integration test due to guava issues and Squid topology
> due to some issue with the remove subdomains Stellar function).  Of course
> with Docker infrastructure you will need a test runner to launch topologies
> in LocalRunner.  They are short and simple though and I have one written
> for each topology that I can share when appropriate.
>
> There are some advantages and disadvantages to switching the integration
> tests to use Docker.  The infrastructure we have now works and could be
> adjusted to overcome its primary weaknesses (single classloader and start
> up/shutdown after each test).  With Docker the classloader issue goes away
> for the most part (or is much better than it is now) without any extra
> work.  For spinning services up/down once instead of with each test, we
> will need to adjust our tests to clean up after themselves or (even better)
> namespace all testing objects so that tests don't step on each other.  That
> work would have to be done no matter which infrastructure approach we
> take.  Probably the biggest downside to using Docker is that all
> integration tests will need to be adjusted and we'll likely hit some issues
> that we'll need to resolve.  I was bitten several times by services that
> broadcast their host address (Kafka for example) and I bet we hit more of
> those.  We'll also need to add a few more containers (HDFS for sure) but
> those are easy to create as long as you don't hit the issue I just
> mentioned.
>
> I think all of the suggestions so far are good ideas.  I think it goes
> without saying that we should do one at a time and maybe even reassess
> after we see the impact of each change.  I would vote for doing the
> Maven/shading one first because it is all around beneficial, even outside
> of this context.
>
> On Tue, Feb 7, 2017 at 9:04 AM, Casey Stella  wrote:
>
> > I believe that some people use travis and some people request Jenkins
> from
> > Apache Infra.  That being said, personally, I think we should take the
> > opportunity to correct the underlying issues.  50 minutes for a build
> seems
> > excessive to me.
> >
> > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler 
> > wrote:
> >
> > > Is there an alternative to Travis?  Do other like sized apache projects
> > > have these problems?  Do they use travis?
> > >
> > >
> > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> > wrote:
> > >
> > > For those with pending/building pull requests, it will come as no
> > surprise
> > > that our build times are increasing at a pace that is worrisome. In
> fact,
> > > we have hit a fundamental limit associated with Travis over the
> weekend.
> > > We have creeped up into the 40+ minute build territory and travis seems
> > to
> > > error out at around 49 minutes.
> > >
> > > Taking the current build (
> > > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> > at
> > > just job times, we're spending about 19 - 20 minutes (1176.53 seconds)
> in
> > > tests out of 44 minutes and 42 seconds to do the build. This places the
> > > unit tests at around 43% of the build time. I say all of this to point
> > out
> > > that while unit tests are a portion of the build, they are not even the
> > > majority of the build time. We need an approach that addresses the
> whole
> > > build performance holistically and we need it soonest.
> > >
> > > To seed the discussion, I will point to a few things that come to mind
> > > that
> > > fit into three broad categories:
> > >
> > > *Tests are Slow*
> > >
> > >
> > > - *Tactical*: We have around 13 tests that take more than 30 seconds
> and
> > > make up 14 minutes of the build. Considering what we can do to speed
> > those
> > > tests as a tactical approach may be worth considering
> > > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > > tests, instead use the docker infrastructure to spin them up once and
> > then
> > > use them throughout the tests.
> > >
> > >
> > > *Tests aren't parallel*
> > >
> > > Currently we cannot run the build in parallel due 

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Ryan Merriman
Debugging integration tests in an IDE uses the same approach with our
current infrastructure or with docker:  start up the topology with
LocalRunner.  I've had mixed success with our current infrastructure.  As
Mike alluded to, some tests work fine (most of the parser topologies and
enrichment topology) while others fail when run in my IDE but work on the
command line (ES integration test due to guava issues and Squid topology
due to some issue with the remove subdomains Stellar function).  Of course
with Docker infrastructure you will need a test runner to launch topologies
in LocalRunner.  They are short and simple though and I have one written
for each topology that I can share when appropriate.

There are some advantages and disadvantages to switching the integration
tests to use Docker.  The infrastructure we have now works and could be
adjusted to overcome its primary weaknesses (single classloader and start
up/shutdown after each test).  With Docker the classloader issue goes away
for the most part (or is much better than it is now) without any extra
work.  For spinning services up/down once instead of with each test, we
will need to adjust our tests to clean up after themselves or (even better)
namespace all testing objects so that tests don't step on each other.  That
work would have to be done no matter which infrastructure approach we
take.  Probably the biggest downside to using Docker is that all
integration tests will need to be adjusted and we'll likely hit some issues
that we'll need to resolve.  I was bitten several times by services that
broadcast their host address (Kafka for example) and I bet we hit more of
those.  We'll also need to add a few more containers (HDFS for sure) but
those are easy to create as long as you don't hit the issue I just
mentioned.
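
To make the namespacing idea above concrete, here is a minimal sketch. It assumes only the JDK; `TestNamespace` and its `name` method are hypothetical helpers for illustration and are not part of the existing Metron test infrastructure. Each test instance gets a random prefix, so tables, topics, or indices created by one test cannot collide with those of another test sharing the same long-running services.

```java
import java.util.UUID;

/** Hypothetical helper: gives each test its own prefix for shared-service objects. */
public class TestNamespace {
    private final String prefix;

    public TestNamespace(String testName) {
        // The random suffix keeps concurrent runs of the same test apart.
        // Dashes are stripped so the result contains only letters, digits and '_'.
        this.prefix = testName + "_" + UUID.randomUUID().toString().replace("-", "");
    }

    /** Returns a name that is unique to this namespace, e.g. for a table or topic. */
    public String name(String base) {
        return prefix + "_" + base;
    }

    public static void main(String[] args) {
        TestNamespace ns = new TestNamespace("ParserIT");
        System.out.println(ns.name("enrichments"));
    }
}
```

A test would then create and assert against `ns.name("enrichments")` instead of a fixed name, which also means counts on a table only ever see that test's own data.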

I think all of the suggestions so far are good ideas.  I think it goes
without saying that we should do one at a time and maybe even reassess
after we see the impact of each change.  I would vote for doing the
Maven/shading one first because it is all around beneficial, even outside
of this context.

On Tue, Feb 7, 2017 at 9:04 AM, Casey Stella  wrote:

> I believe that some people use travis and some people request Jenkins from
> Apache Infra.  That being said, personally, I think we should take the
> opportunity to correct the underlying issues.  50 minutes for a build seems
> excessive to me.
>
> On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler 
> wrote:
>
> > Is there an alternative to Travis?  Do other like sized apache projects
> > have these problems?  Do they use travis?
> >
> >
> > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> wrote:
> >
> > For those with pending/building pull requests, it will come as no
> surprise
> > that our build times are increasing at a pace that is worrisome. In fact,
> > we have hit a fundamental limit associated with Travis over the weekend.
> > We have creeped up into the 40+ minute build territory and travis seems
> to
> > error out at around 49 minutes.
> >
> > Taking the current build (
> > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> at
> > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
> > tests out of 44 minutes and 42 seconds to do the build. This places the
> > unit tests at around 43% of the build time. I say all of this to point
> out
> > that while unit tests are a portion of the build, they are not even the
> > majority of the build time. We need an approach that addresses the whole
> > build performance holistically and we need it soonest.
> >
> > To seed the discussion, I will point to a few things that come to mind
> > that
> > fit into three broad categories:
> >
> > *Tests are Slow*
> >
> >
> > - *Tactical*: We have around 13 tests that take more than 30 seconds and
> > make up 14 minutes of the build. Considering what we can do to speed
> those
> > tests as a tactical approach may be worth considering
> > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > tests, instead use the docker infrastructure to spin them up once and
> then
> > use them throughout the tests.
> >
> >
> > *Tests aren't parallel*
> >
> > Currently we cannot run the build in parallel due to the integration test
> > infrastructure spinning up its own services that bind to the same ports.
> > If we correct this, we can run the builds in parallel with mvn -T
> >
> > - Correct this by decoupling the infrastructure from the tests and
> > refactoring the tests to run in parallel.
> > - Make the integration testing infrastructure bind intelligently to
> > whatever port is available.
> > - Move the integration tests to their own project. This will let us run
> > the build in parallel since an individual project's test will be run
> > serially.
> >
> > *Packaging is Painful*
> >
> > We have a sensitive environment in terms of dependencies. As such, we are
> > careful to 

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Casey Stella
I believe that some people use travis and some people request Jenkins from
Apache Infra.  That being said, personally, I think we should take the
opportunity to correct the underlying issues.  50 minutes for a build seems
excessive to me.

On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler 
wrote:

> Is there an alternative to Travis?  Do other like sized apache projects
> have these problems?  Do they use travis?
>
>
> On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com) wrote:
>
> For those with pending/building pull requests, it will come as no surprise
> that our build times are increasing at a pace that is worrisome. In fact,
> we have hit a fundamental limit associated with Travis over the weekend.
> We have crept up into the 40+ minute build territory, and Travis seems to
> error out at around 49 minutes.
>
> Taking the current build (
> https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking at
> just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
> tests out of 44 minutes and 42 seconds to do the build. This places the
> unit tests at around 43% of the build time. I say all of this to point out
> that while unit tests are a portion of the build, they are not even the
> majority of the build time. We need an approach that addresses the whole
> build performance holistically and we need it soonest.
>
> To seed the discussion, I will point to a few things that come to mind
> that
> fit into three broad categories:
>
> *Tests are Slow*
>
>
> - *Tactical*: We have around 13 tests that take more than 30 seconds and
> make up 14 minutes of the build. Looking at what we can do to speed up
> those tests may be a worthwhile tactical approach.
> - We are spinning up the same services (e.g. kafka, storm) for multiple
> tests, instead use the docker infrastructure to spin them up once and then
> use them throughout the tests.
>
>
> *Tests aren't parallel*
>
> Currently we cannot run the build in parallel due to the integration test
> infrastructure spinning up its own services that bind to the same ports.
> If we correct this, we can run the builds in parallel with mvn -T
>
> - Correct this by decoupling the infrastructure from the tests and
> refactoring the tests to run in parallel.
> - Make the integration testing infrastructure bind intelligently to
> whatever port is available.
> - Move the integration tests to their own project. This will let us run
> the build in parallel since an individual project's test will be run
> serially.
>
> *Packaging is Painful*
>
> We have a sensitive environment in terms of dependencies. As such, we are
> careful to shade and relocate dependencies that we want to isolate from
> our
> transitive dependencies. The consequence of this is that we spend a lot
> of time in the build shading and relocating maven module output.
>
> - Do the hard work to walk our transitive dependencies and ensure that
> we are including only one copy of every library by using exclusions
> effectively. This will not only bring down build times, it will make sure
> we know what we're including.
> - Try to devise a strategy where we only shade once at the end. This
> could look like some combination of
> - standardizing on the lowest common denominator of a troublesome
> library
> - We shade in dependencies so they can use different versions of
> libraries (e.g. metron-common with a modern version of guava) than the
> final jars.
> - exclusions
> - externalizing infrastructure out to not necessitate spinning up
> hadoop components in-process for integration tests (i.e. hbase server
> conflicts with storm in a few dependencies)
>
> *Final Thoughts*
>
> If I had three to pick, I'd pick
>
> - moving off of the in-memory component infrastructure to docker images
> - fixing the maven poms to exclude correctly
> - ensuring the resulting tests are parallelizable
>
> I will point out that fixing the maven poms to exclude correctly (i.e. we
> choose the version of every jar that we depend on transitively) ticks
> multiple boxes, not just making things faster.
>
> What are your thoughts? What did I miss? We need a plan and we need to
> execute on it soon, otherwise travis is going to keep smacking us hard. It
> may be worthwhile constructing a tactical plan and then a more strategic
> plan that we can work toward. I was heartened at how much some of these
> suggestions dovetail with the discussion around the future of the docker
> infrastructure.
>
> Best,
>
> Casey
>
>


Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Casey Stella
Mike, I can verify that the integration tests do not run in parallel via
mvn -T 1C clean install

At a minimum the integration test infrastructure will need to hunt for an
open port to bind to rather than assuming one.
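
As a rough illustration of hunting for an open port, here is a minimal sketch that assumes only plain JDK sockets. `FreePortFinder` and `findFreePort` are hypothetical names, not existing Metron classes. Binding to port 0 asks the operating system for any free ephemeral port, which the test infrastructure could then hand to the service it starts.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper: asks the OS for a currently unused TCP port. */
public class FreePortFinder {

    public static int findFreePort() throws IOException {
        // Port 0 means "pick any free port"; the socket is closed again on exit
        // from the try block so that the caller can bind the port itself.
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Free port: " + findFreePort());
    }
}
```

Note that there is a small window between closing the probe socket and the real service binding the port, so another process could still grab it. Where the component allows it, letting the service itself bind to port 0 and then reading back the port it was given avoids that race.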

On Tue, Feb 7, 2017 at 9:26 AM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I can't recall, did we have a good solution around Docker and remote
> debugging integration tests from the IDE? On the topic of test refactoring
> and running in parallel, I'm all for it. I know JJ had been doing this on
> his local machine at one point, but we'd need to be sure all tests are
> truly independent. E.g. counts on hbase tables would need to be very
> specific or every test should use unique tables. Also, can we spin up
> something like Docker in Travis? How many cores do we get? I'll look into
> that and see what we get.
>
> I'm all for simplifying our dependencies. Shading the jars takes an
> incredible amount of time and has bitten us repeatedly.
> Another bummer about the jar shading has been that the build runs
> differently in IntelliJ than it does from the Maven command line. I don't
> think we'll get away from it entirely, but we may be able to make this
> better as well.
>
> From my most recent local build, these are the biggest offending modules:
> metron-profiler  SUCCESS [05:56 min]
> metron-parsers . SUCCESS [09:38 min]
> metron-data-management . SUCCESS [09:15 min]
> elasticsearch-shaded ... SUCCESS [08:05 min]
>
> I'm going to take a look at Travis and also see what pom dependencies I can
> start excluding.
>
>
> On Mon, Feb 6, 2017 at 3:02 PM, Casey Stella  wrote:
>
> > For those with pending/building pull requests, it will come as no
> surprise
> > that our build times are increasing at a pace that is worrisome.  In
> fact,
> > we have hit a fundamental limit associated with Travis over the weekend.
> > We have creeped up into the 40+ minute build territory and travis seems
> to
> > error out at around 49 minutes.
> >
> > Taking the current build (
> > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> at
> > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
> > tests out of 44 minutes and 42 seconds to do the build.  This places the
> > unit tests at around 43% of the build time.  I say all of this to point
> out
> > that while unit tests are a portion of the build, they are not even the
> > majority of the build time.  We need an approach that addresses the whole
> > build performance holistically and we need it soonest.
> >
> > To seed the discussion, I will point to a few things that come to mind
> that
> > fit into three broad categories:
> >
> > *Tests are Slow*
> >
> >
> >- *Tactical*: We have around 13 tests that take more than 30 seconds
> and
> >make up 14 minutes of the build.  Considering what we can do to speed
> > those
> >tests as a tactical approach may be worth considering
> >- We are spinning up the same services (e.g. kafka, storm) for
> multiple
> >tests, instead use the docker infrastructure to spin them up once and
> > then
> >use them throughout the tests.
> >
> >
> > *Tests aren't parallel*
> >
> > Currently we cannot run the build in parallel due to the integration test
> > infrastructure spinning up its own services that bind to the same ports.
> > If we correct this, we can run the builds in parallel with mvn -T
> >
> >- Correct this by decoupling the infrastructure from the tests and
> >refactoring the tests to run in parallel.
> >- Make the integration testing infrastructure bind intelligently to
> >whatever port is available.
> >- Move the integration tests to their own project.  This will let us
> run
> >the build in parallel since an individual project's test will be run
> >serially.
> >
> > *Packaging is Painful*
> >
> > We have a sensitive environment in terms of dependencies.  As such, we
> are
> > careful to shade and relocate dependencies that we want to isolate from
> our
> > transitive dependencies.  The consequences of this is that we spend a lot
> > of time in the build shading and relocating maven module output.
> >
> >- Do the hard work to walk our transitive dependencies and ensure that
> >we are including only one copy of every library by using exclusions
> >effectively.  This will not only bring down build times, it will make
> > sure
> >we know what we're including.
> >- Try to devise a strategy where we only shade once at the end.  This
> >could look like some combination of
> >   - standardizing on the lowest common denominator of a troublesome
> >   library
> >  - We shade in dependencies so they can use different versions of
> >  libraries (e.g. metron-common with a modern version of guava)
> > than the
> >  

[GitHub] incubator-metron issue #443: METRON-703: Rev the version from 0.3.0 to 0.3.1

2017-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/443
  
I verified that this works in vagrant.




[GitHub] incubator-metron issue #443: METRON-703: Rev the version from 0.3.0 to 0.3.1

2017-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/443
  
I'm in the process of spinning this up in vagrant and I believe @justinleet 
will be testing out the mpack just to make sure nothing is borked.  Please hold 
off until we report in to commit.




[GitHub] incubator-metron issue #443: METRON-703: Rev the version from 0.3.0 to 0.3.1

2017-02-07 Thread nickwallen
Github user nickwallen commented on the issue:

https://github.com/apache/incubator-metron/pull/443
  
+1 Did a quick find-search and did not find any out-of-place 0.3.0 tags 
left.




[RESULT][VOTE] Releasing Apache Metron (incubating) 0.3.1-RC1

2017-02-07 Thread Casey Stella
The vote fails; a new release candidate will be cut when METRON-703 is
accepted.

Results:
+1
Nick Allen
James Sirota
Casey Stella

-1
David Lyle


[GitHub] incubator-metron pull request #443: METRON-703: Rev the version from 0.3.0 t...

2017-02-07 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/incubator-metron/pull/443

METRON-703: Rev the version from 0.3.0 to 0.3.1

In order to release, we need to up the version to 0.3.1 so that the 
artifacts produced continue to function.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron METRON-703

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-metron/pull/443.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #443


commit 3d22fc5d9abdc5c5edacb7e8545c3f18e916624f
Author: cstella 
Date:   2017-02-07T14:14:14Z

METRON-703: Upping the version to 0.3.1






Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC1

2017-02-07 Thread Casey Stella
Whoops, you're absolutely right.  We forgot to rev the version in the
artifacts.  I'm going to cancel the vote and rerelease when that JIRA gets
in.

On Tue, Feb 7, 2017 at 7:56 AM, David Lyle  wrote:

> -1 Unless I'm mistaken, the artifacts are versioned 0.3.0.
>
> -D...
>
> On Mon, Feb 6, 2017 at 10:46 PM, James Sirota  wrote:
>
> > +1 deployed on AWS
> >
> > 06.02.2017, 15:39, "Nick Allen" :
> > > +1
> > >
> > > Valid checksums
> > > Build successful
> > > Integration tests successful
> > > Deploy of "Full Dev" successful
> > > Deploy of "Quick Dev" successful
> > >
> > > On Mon, Feb 6, 2017 at 3:43 PM, Casey Stella 
> wrote:
> > >
> > >>  This is a call to vote on releasing Apache Metron 0.3.1-RC1
> incubating
> > >>
> > >>  Full list of changes in this release:
> > >>
> > >>  https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
> > >>  1-RC1-incubating/CHANGES
> > >>
> > >>  The tag/commit to be voted upon is apache-metron-0.3.0-rc1-
> incubating:
> > >>
> > >>  https://git-wip-us.apache.org/repos/asf?p=incubator-metron.
> > >>  git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc1-incubating
> > >>
> > >>  The source archive being voted upon can be found here:
> > >>
> > >>  https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
> > >>  1-RC1-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz
> > >>
> > >>  Other release files, signatures and digests can be found here:
> > >>
> > >>  https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
> > >>  1-RC1-incubating/
> > >>
> > >>  The release artifacts are signed with the following key:
> > >>
> > >>  https://git-wip-us.apache.org/repos/asf?p=incubator-metron.
> > >>  git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d
> > 9c260ba55e;hb=refs/tags/
> > >>  apache-metron-0.3.1-rc1-incubating
> > >>
> > >>  Please vote on releasing this package as Apache Metron 0.3.1-RC1
> > incubating
> > >>
> > >>  When voting, please list the actions taken to verify the release.
> > >>
> > >>  Recommended build validation and verification instructions are posted
> > here:
> > >>
> > >>  https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
> > >>
> > >>  This vote will be open for at least 72 hours.
> > >>
> > >>  [ ] +1 Release this package as Apache Metron 0.3.1-RC1 incubating
> > >>
> > >>  [ ] 0 No opinion
> > >>
> > >>  [ ] -1 Do not release this package because...
> >
> > ---
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
>


Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC1

2017-02-07 Thread David Lyle
-1 Unless I'm mistaken, the artifacts are versioned 0.3.0.

-D...

On Mon, Feb 6, 2017 at 10:46 PM, James Sirota  wrote:

> +1 deployed on AWS
>
> 06.02.2017, 15:39, "Nick Allen" :
> > +1
> >
> > Valid checksums
> > Build successful
> > Integration tests successful
> > Deploy of "Full Dev" successful
> > Deploy of "Quick Dev" successful
> >
> > On Mon, Feb 6, 2017 at 3:43 PM, Casey Stella  wrote:
> >
> >>  This is a call to vote on releasing Apache Metron 0.3.1-RC1 incubating
> >>
> >>  Full list of changes in this release:
> >>
> >>  https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.
> >>  1-RC1-incubating/CHANGES
> >>
> >>  The tag/commit to be voted upon is apache-metron-0.3.0-rc1-incubating:
> >>
> >>  https://git-wip-us.apache.org/repos/asf?p=incubator-metron.
> >>  git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc1-incubating
> >>
> >>  The source archive being voted upon can be found here:
> >>
> >>  https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC1-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz
> >>
> >>  Other release files, signatures and digests can be found here:
> >>
> >>  https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC1-incubating/
> >>
> >>  The release artifacts are signed with the following key:
> >>
> >>  https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d9c260ba55e;hb=refs/tags/apache-metron-0.3.1-rc1-incubating
> >>
> >>  Please vote on releasing this package as Apache Metron 0.3.1-RC1 incubating
> >>
> >>  When voting, please list the actions taken to verify the release.
> >>
> >>  Recommended build validation and verification instructions are posted here:
> >>
> >>  https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
> >>
> >>  This vote will be open for at least 72 hours.
> >>
> >>  [ ] +1 Release this package as Apache Metron 0.3.1-RC1 incubating
> >>
> >>  [ ] 0 No opinion
> >>
> >>  [ ] -1 Do not release this package because...
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>


Re: [Discuss] Direction of metron-docker

2017-02-07 Thread Dima Kovalyov
From a user perspective,

We used Vagrant when we first encountered Metron, to see and learn about
it. It was quick and easy to deploy, which was handy. We have since
switched to the Ambari Mpack for internal use. We mostly write and deploy
parsers for Metron, so having just the Ambari Mpack is enough for us. I
haven't used Vagrant since we started using the Ambari Mpack (which was
harder to grasp than Vagrant, though).

I like what Ryan suggests using Docker for. If it is as easy to spin up
(basically, if setup is documented) and allows seamless development for
the core team, we would kill two birds with one stone.

Casey,
> *Is the docker infrastructure sufficient to replace vagrant at the moment?*
>
> I do not consider it to be a sufficient environment to acceptance test
> features because it does not install Metron in a realistic manner that
> mimics a user.  Vagrant isn't currently where it should be in that regard
> and that is the reason that it is currently getting an overhaul to get
> closer to that ideal.
Considering that we are moving towards the Ambari Mpack (something that
was not originally considered the main deployment solution), can we really
call it a realistic manner once (and if) Ansible is deprecated? What would
be the difference between Vagrant and just spinning up HDP + Metron inside
VirtualBox?

- Dima


On 02/07/2017 05:12 AM, Kyle Richardson wrote:
> I like the idea of porting some of the integration tests to metron-docker.
> I believe the maven plugin used in the rpm-docker project could be used to
> support that goal.
>
> I agree with Ryan in that I see this as more of a toolbox for developers
> than a supported deployment method. That is the vein in which I originally
> created this PR, actually. I could continue to load the elasticsearch templates
> manually when working with metron-docker but thought it would be worthwhile
> to automate with a few lines of code.
>
> I have another PR just about ready to go to include a hadoop/hdfs container
> in metron-docker. Would folks see value in including this? The idea was to
> provide an easier way to iterate on HDFS indexing options for cold
> storage/archive data.
>
> As for maintainability, the minimum would be to keep consistent versions of
> storm, hbase, etc between the docker containers and the current supported
> HDP stack. The automation pieces are nice to haves (not blockers in my
> mind) and will continue to simplify as we move more configs into zookeeper
> from the filesystem. I can't think of anything too onerous here but I may
> be missing something obvious.
>
> -Kyle
>
> On Mon, Feb 6, 2017 at 2:30 PM, Otto Fowler  wrote:
>
>> Beyond the utility, there is the cost of maintaining the docker path.  It is
>> just another thing that reviewers and committers have to keep in mind or know
>> about when looking at PRs.  Maybe if there were a better and more widespread
>> understanding of the work that is done and how to continue it, it would not
>> seem so onerous.  It can’t be something that will be OK only as long as one
>> or two specific people keep up with it, or rather it should not be.  Even
>> if, or perhaps because, it won’t break the build.
>>
>> There is a lot of utility and value to metron-docker; maybe we just need to
>> think through the sustainability and maintenance issues, so the question
>> becomes how we can make it work to the project’s satisfaction.
>>
>> On February 6, 2017 at 14:11:04, Casey Stella (ceste...@gmail.com) wrote:
>>
>> So, I'm late chiming in here, but I'll go ahead anyway. :)
>>
>> There are a couple of questions here that stand out:
>>
>> *Is the docker infrastructure sufficient to replace vagrant at the moment?*
>>
>> I do not consider it to be a sufficient environment to acceptance test
>> features because it does not install Metron in a realistic manner that
>> mimics a user. Vagrant isn't currently where it should be in that regard
>> and that is the reason that it is currently getting an overhaul to get
>> closer to that ideal.
>>
>> *Does it scratch an itch?*
>>
>> Yes, it does, I think. For those who want a limited portion of metron spun
>> up to smoke-test features in a targeted way, this works well. That being
>> said, in my opinion, you still need to test in vagrant or a cluster. Matt
>> brings up a good point as well about integration test infrastructure. I
>> think there could be an even bigger itch to scratch there as the cost of
>> spinning up and down integration testing components per-test can be time
>> consuming and lead to long build times.
>>
>> *Can we unify them?*
>>
>> I don't know; I'd like to, honestly. I think that it'd be a good
>> discussion to have and it'd be nice to have a path to victory there,
>> because I'm not thrilled about having so many avenues to install. If we
>> don't unify them, I feel that docker will eventually get so far out of date
>> that it will become unusable, frankly.
>>
>>
>> Ultimately, I don't care about the tech stack that we use, docker vs
>> vagrant vs vagrant on docker