[ANNOUNCE] Metron Apache Community Demo for Metron_0.3.1
Let's hold the demo of the latest Metron build from 9 AM to 10 AM PST on Feb. 10, 2017. Please respond to this thread with the set of features that you would like to demo.

Topic: Metron Community Demo

Join from PC, Mac, Linux, iOS or Android: https://hortonworks.zoom.us/j/263433204

Or join by phone:
+1 408 638 0968 (US Toll) or +1 646 558 8656 (US Toll)
+1 855 880 1246 (US Toll Free)
+1 877 369 0926 (US Toll Free)
Meeting ID: 263 433 204
International numbers available: https://hortonworks.zoom.us/zoomconference?m=TCITFbHy7jY9C0o_Ylpbpx6gzd_9L9W7

---
Thank you,
James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org
Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC2
Casey, the below vote call message has several inconsistencies that invalidate it. Please search for “RC1” or “rc1”. I count three, starting with the first line :-) There is also an instance of “0.3.0”.

Thanks,
--Matt

On 2/7/17, 8:18 AM, "Casey Stella" wrote:

This is a call to vote on releasing Apache Metron 0.3.1-RC1 incubating

Full list of changes in this release:
https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/CHANGES

The tag/commit to be voted upon is apache-metron-0.3.0-rc1-incubating:
https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc2-incubating

The source archive being voted upon can be found here:
https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz

Other release files, signatures and digests can be found here:
https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/

The release artifacts are signed with the following key:
https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d9c260ba55e;hb=refs/tags/apache-metron-0.3.1-rc2-incubating

Please vote on releasing this package as Apache Metron 0.3.1-RC2 incubating

When voting, please list the actions taken to verify the release.

Recommended build validation and verification instructions are posted here:
https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds

This vote will be open for at least 72 hours.

[ ] +1 Release this package as Apache Metron 0.3.1-RC2 incubating
[ ] 0 No opinion
[ ] -1 Do not release this package because...
[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/444

#14.1 passed
Elapsed time 23 min 28 sec
On my second try around 5pm

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC2
+1 binding

Valid checksums
Build successful
Integration tests successful
Deploy of "Quick Dev" successful
Deploy of "Full Dev" successful

On Tue, Feb 7, 2017 at 11:18 AM, Casey Stella wrote:

> This is a call to vote on releasing Apache Metron 0.3.1-RC1 incubating
>
> Full list of changes in this release:
> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/CHANGES
>
> The tag/commit to be voted upon is apache-metron-0.3.0-rc1-incubating:
> https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc2-incubating
>
> The source archive being voted upon can be found here:
> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz
>
> Other release files, signatures and digests can be found here:
> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/
>
> The release artifacts are signed with the following key:
> https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d9c260ba55e;hb=refs/tags/apache-metron-0.3.1-rc2-incubating
>
> Please vote on releasing this package as Apache Metron 0.3.1-RC2 incubating
>
> When voting, please list the actions taken to verify the release.
>
> Recommended build validation and verification instructions are posted here:
> https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds
>
> This vote will be open for at least 72 hours.
>
> [ ] +1 Release this package as Apache Metron 0.3.1-RC2 incubating
> [ ] 0 No opinion
> [ ] -1 Do not release this package because...
Re: [GitHub] incubator-metron issue #439: add stellar external functions feature (work in...
Sorry for the late reply; work has been crazy around here too. I changed the name and added a few comments. I am not sure if I mentioned it, but devopsec is just my dev handle / dev name; I wasn't sure if that was apparent. I will push up another revision later this week, add in your comments, and revise as we go. Thanks.

Regards,

Tyler Moore
Software Engineer
Phone: 248-909-2769
Email: tmo...@goflyball.com

On Sat, Feb 4, 2017 at 9:01 PM, JonZeolla wrote:

> Github user JonZeolla commented on the issue:
>
> https://github.com/apache/incubator-metron/pull/439
>
> The title of this PR should be "METRON-571: Add stellar keywords for
> executing local commands".
>
> This was originally my JIRA, I just haven't been able to work on it
> and Tyler had a similar need so he's taking a stab at it. That said, I
> will try to provide some context in response to your comments, but
> @devopsec please correct me if I have misunderstood.
[GitHub] incubator-metron pull request #445: METRON-706: Add Stellar transformations ...
GitHub user mmiklavc opened a pull request:

https://github.com/apache/incubator-metron/pull/445

METRON-706: Add Stellar transformations and filters to enrichment and threat intel loaders

This PR completes work in https://issues.apache.org/jira/browse/METRON-706

(Note: there are commits from @cestella that I had merged in the process of working on this. They are squashed in master but show up here. They only show in the commit history, not the diff.)

The motivation for this PR is to expand where we expose Stellar capabilities. This work enables transformations and filtering on enrichment and threatintel extractors. The user is now able to specify transformation expressions on the column values and separately filter records based on a provided predicate. The same can also be done independently for the key indicator value used as part of the HBase key. In addition, a new property has been added to the configuration that allows a user to specify a Zookeeper quorum and reference global properties specified in the global config. See the updated README for documentation details on the new properties.

**Testing**

Testing follows closely with the methods defined in [#432](https://github.com/apache/incubator-metron/pull/432#issuecomment-276733075)

* Download the Alexa top 1m data set
```
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
unzip top-1m.csv.zip
```
* Stage import file
```
head -n 10000 top-1m.csv > top-10k.csv
head -n 10 top-1m.csv > top-10.csv
```
* Create an extractor.json for the CSV data by editing extractor.json and pasting in these contents.
(Set your zk_quorum to your own value if different from the default Vagrant quick-dev environment):
```
{
  "config" : {
    "zk_quorum" : "node1:2181",
    "columns" : {
      "rank" : 0,
      "domain" : 1
    },
    "value_transform" : {
      "domain" : "DOMAIN_REMOVE_TLD(domain)",
      "port" : "es.port"
    },
    "value_filter" : "LENGTH(domain) > 0",
    "indicator_column" : "domain",
    "indicator_transform" : {
      "indicator" : "DOMAIN_REMOVE_TLD(indicator)"
    },
    "indicator_filter" : "LENGTH(indicator) > 0",
    "type" : "top_domains",
    "separator" : ","
  },
  "extractor" : "CSV"
}
```
The "port" property/variable here is referencing "es.port" from the global config.

* Run the import (parallelism of 5, batch size of 128)
```
echo "truncate 'enrichment'" | hbase shell && /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv -t enrichment -c t -e ./extractor.json -p 5 -b 128 && echo "count 'enrichment'" | hbase shell
```
You should see 9275 records in HBase. (Less than the perhaps expected 10k)

* Now run it again on the top-10 set.
```
echo "truncate 'enrichment'" | hbase shell && /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10.csv -t enrichment -c t -e ./extractor.json -p 5 -b 128 && echo "count 'enrichment'" | hbase shell
```
You should get 9 values as below:
```
scan 'enrichment'
ROW                                                                      COLUMN+CELL
 \x09\x00\x0F,\x10\xE5\xD1\xDE_\xBF\x9E\xA7d\xF2\xA8\x94\x00\x0Btop_dom  column=t:v, timestamp=1486513090953, value={"port":"9300","domain":"yahoo","rank":"5"}
 ains\x00\x05yahoo
 \x11\xCA\xCF\x01\xB4\xC5\x11@\x0C\xA1A,\xE9j~O\x00\x0Btop_domains\x00\  column=t:v, timestamp=1486513090979, value={"port":"9300","domain":"tmall","rank":"10"}
 x05tmall
 \x13)`\xFC\xF2\xBF\xF9\xC1a\xC8a\xF1h\x0E\xB5\x11\x00\x0Btop_domains\x  column=t:v, timestamp=1486513090930, value={"port":"9300","domain":"youtube","rank":"2"}
 00\x07youtube
 1\xC2I\x05k\xEA\x0EY\xE1\xAD\xA0$U\xA9kc\x00\x0Btop_domains\x00\x06goo  column=t:v, timestamp=1486513090964, value={"port":"9300","domain":"google","rank":"7"}
 gle
 =\xDD\xDFH\x95\xC0\xB9\xD9\xBAKX\x8B\x9B2T\x9F\x00\x0Btop_domains\x00\  column=t:v, timestamp=1486513090942, value={"port":"9300","domain":"facebook","rank":"3"}
 x08facebook
 D\xDE\x1C\x9A\xCF\x07S\x9A\xDEB\xDB\x87D\x1F\x1D\xF4\x00\x0Btop_domain  column=t:v, timestamp=1486513090974, value={"port":"9300","domain":"qq","rank":"9"}
 s\x00\x02qq
 u\xBC\xFC\xC9\x09\x9Af\xE1\xC8\xA5\x9A\x93\xCB0c\x01\x00\x0Btop_domain  column=t:v, timestamp=1486513090970, value={"port":"9300","domain":"amazon","rank":"8"}
 s\x00\x06amazon
 \xC7\xA5.l\xC21\xFAQ8\x1E\x5C\x99p\x93_\x9A\x00\x0Btop_domains\x00\x09  column=t:v, timestamp=1486513090958, value={"port":"9300","domain":"wikipedia","rank":"6"}
 wikipedia
 \xCC\xCA\xBF;\x92\xA1\xA0k\xE4\x83i\xBD\xC3\xA8\xE8p\x00\x0Btop_domain  column=t:v, timestamp=1486513090948, value={"port":"9300","domain":"baidu","rank":"4"}
 s\x00\x05baidu
```
Once again,
[GitHub] incubator-metron pull request #439: METRON-571 add stellar external function...
Github user devopsec commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/439#discussion_r99967075

--- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/ExternalFunctions.java ---
@@ -0,0 +1,292 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.common.dsl.functions;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.OutputStream;
+import java.io.PrintWriter;
+import java.util.List;
+import java.lang.ProcessBuilder;
+import java.lang.ClassLoader;
+import java.lang.reflect.Method;
+import java.util.Map;
+import java.util.regex.Pattern;
+import com.google.common.base.Joiner;
+import com.google.common.base.Splitter;
+import com.google.common.collect.Iterables;
+import org.apache.metron.common.dsl.Context;
+import org.apache.metron.common.dsl.StellarFunction;
+import org.apache.metron.common.dsl.ParseException;
+import org.apache.metron.common.dsl.Stellar;
+
+/**
+ * Executes external script on server via stellar process
+ */
+public class ExternalFunctions {
+
+    public static class ExecuteScript implements StellarFunction {
+
+        private ThreadedStreamHandler inStream;
+        private ThreadedStreamHandler errStream;
+        private boolean isOnTheList = false;
+
+        @Stellar(name="EXEC_SCRIPT",
+                description = "Executes an external shell function via stellar.",
+                params = {
+                        "exec - the executing cmd (ie. bash, sh, python)",
+                        "name - name of the script, located in /scripts " +
+                                "Do NOT include any special chars except(_), Do include file extension"
+                },
+                returns = "the return value of the function"
+        )
+
+        @Override
+        public Object apply(List args, Context context) throws ParseException {
+            String exec = "";
+            String name = "";
+            String path = "";
+
+            // if args are provided, get args, only if in whitelist
+            if (args.size() >= 1) {
+                Object execObj = args.get(0);
+                if (!(execObj instanceof String)) { //check if string
+                    return null;
+                }
+                else if (((String) execObj).length() > 0) {
+                    exec = (String) execObj;
+                }
+                else {
+                    return null;
+                }
+
+                Object nameObj = args.get(1);
+                if (!(nameObj instanceof String)) { //check if string
+                    return null;
+                }
+                else if (((String) nameObj).length() > 0) {
+                    name = (String) nameObj;
+                }
+                else {
+                    return null;
+                }
+
+                if (!Pattern.matches("[0-9A-Za-z.]+", name)) {
+                    return null; //if not on whitelist
+                }
+
+                path = "/scripts" + name;
+                try {
+                    File script = new File(path);
+                    if (!script.exists() || script.isDirectory()) {
+                        return null;
+                    }
+                }
+                catch (NullPointerException e) {
+                    System.err.println("Error: " + e.toString());
+                    return null;
+                }
--- End diff --

I did plan on adding real error handling, are there any exceptions that should be thrown up the chain? I am thinking we should allow some of these to be logged in the storm logs or log them all to some location depending
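
For readers following the review of the diff above, here is a minimal sketch of how the argument checks might be tightened. It is not code from the PR: the class name `ScriptPathSketch`, the `resolve` method and the choice to return an `Optional` are all hypothetical, and it only uses the standard library. It assumes three things that the quoted diff appears to leave open: both arguments should be present before the second one is read, the underscore mentioned in the parameter description should be accepted by the whitelist, and a separator is needed between `/scripts` and the script name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

public class ScriptPathSketch {
    // Hypothetical whitelist: letters, digits, underscore and dot only.
    private static final Pattern SAFE_NAME = Pattern.compile("[0-9A-Za-z_.]+");
    // Directory named in the PR's parameter description.
    private static final String SCRIPT_DIR = "/scripts";

    /** Returns the script path, or empty if the arguments fail validation. */
    public static Optional<String> resolve(List<?> args) {
        // Both the interpreter and the script name are required.
        if (args == null || args.size() < 2) {
            return Optional.empty();
        }
        Object exec = args.get(0);
        Object name = args.get(1);
        if (!(exec instanceof String) || ((String) exec).isEmpty()) {
            return Optional.empty();
        }
        if (!(name instanceof String)) {
            return Optional.empty();
        }
        String scriptName = (String) name;
        // Reject anything outside the whitelist, and ".." as an extra guard.
        if (!SAFE_NAME.matcher(scriptName).matches() || scriptName.contains("..")) {
            return Optional.empty();
        }
        return Optional.of(SCRIPT_DIR + "/" + scriptName);
    }

    public static void main(String[] unused) {
        System.out.println(resolve(Arrays.asList("bash", "clean_up.sh")));   // Optional[/scripts/clean_up.sh]
        System.out.println(resolve(Arrays.asList("bash")));                  // Optional.empty
        System.out.println(resolve(Arrays.asList("bash", "../etc/passwd"))); // Optional.empty
    }
}
```

Returning an empty `Optional` keeps the sketch close to the `return null` style of the diff; whether failures should instead be thrown or logged is the question raised in the comment above and is left open here.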
[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/444

Also, with only 30 parallel builds available across all apache projects, asking for 2 separate containers per build might be greedy. ;)
[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/444

Ok, this was just too volatile. I also got 43 minutes for the integration test run. It's probably worth investigating in the future, but for now I'm going to revert to just one container:
* parallelize the package phase
* parallelize the unit test phase
* run the integration tests in serial
[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/444

@ottobackwards Yeah, we have to use the apache [job queue](https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci) (which has 30 parallel builds) on travis rather than the big one for all open source projects. I suspect it gets slammed during the day.
[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/444

You bet. BTW, the build on this PR isn't even going at all.
[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/444

Looking at those logs, it appears that the first phase, the build (not the actual integration tests), takes 4 minutes 39 seconds in my integration-test phase [build](https://api.travis-ci.org/jobs/199327334/log.txt?deansi=true) on travis vs 21 minutes 53 seconds on [yours](https://api.travis-ci.org/jobs/199370358/log.txt?deansi=true).
[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/444

integration tests took 43 minutes
[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/444

#14 passed
Elapsed time 43 min 10 sec
Total time 1 hr 6 min 21 sec
[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/444

Yep, that's how I did it. You can see a build of it on my account [here](https://travis-ci.org/cestella/incubator-metron/builds/199349340)
[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...
Github user mmiklavc commented on the issue:

https://github.com/apache/incubator-metron/pull/444

@ottobackwards Yes, that should do it.
[GitHub] incubator-metron issue #444: METRON-705: Parallelize the build in travis to ...
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/444

So to test this - if you have travis set up for your git account, I think you just push it to a branch in your repo and let travis build it?
Re: [DISCUSS] Build Times are getting out of hand
haha there was some desperation there, I'll admit. ;)

On Tue, Feb 7, 2017 at 3:12 PM, Otto Fowler wrote:

> This PR gets a star just for the commit messages, it isn’t even Friday
> Casey
Re: [DISCUSS] Build Times are getting out of hand
This PR gets a star just for the commit messages, it isn’t even Friday Casey

On February 7, 2017 at 14:49:22, Casey Stella (ceste...@gmail.com) wrote:

I spent a minute or two looking at how we might use travis configuration-alone to drop the wall-clock time of the build and put it up for review at https://github.com/apache/incubator-metron/pull/444

It does 2 things:

- Separates the build, the unit tests and the integration tests
- Parallelizes the unit tests and the build and runs the integration tests within the travis container
- Runs the unit tests and integration tests in separate travis containers using travis' build matrix

This ultimately cuts the wallclock time down to 24 minutes for me on travis and should give us some time where we're not constantly bouncing builds to act on the suggestions here.

On Tue, Feb 7, 2017 at 1:03 PM, Michael Miklavcic <michael.miklav...@gmail.com> wrote:

> FYI, found this for Docker - https://docs.travis-ci.com/user/docker/
>
> On Tue, Feb 7, 2017 at 9:09 AM, David Lyle wrote:
>
> > Absolutely agree. I also think we'd want both once we've done that. Travis
> > is good for smoke testing PRs and Commits. Jenkins is good for nightly runs
> > of medium duration tests and would be great for automating our distributed
> > testing if we found infrastructure to support it. I've seen them used in
> > concert to provide a good solution.
> >
> > But, initially, I'd like to see us get our in-process stuff replaced with
> > docker where (if) it makes sense, refactored to run in parallel, the poms
> > refactored to handle our dependencies better and our uber jars removed
> > where they can be and minimized where they cannot be.
> >
> > Which, I think, is a long-winded way of saying "I'd like to see us do what
> > Casey suggested." :)
> >
> > -D...
> >
> > On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <michael.miklav...@gmail.com> wrote:
> >
> > > I agree with this. I don't think we should switch to an alternate system
> > > until we find that we are absolutely incapable of eking out any further
> > > efficiency from the current setup.
> > >
> > > On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella wrote:
> > >
> > > > I believe that some people use travis and some people request Jenkins from
> > > > Apache Infra. That being said, personally, I think we should take the
> > > > opportunity to correct the underlying issues. 50 minutes for a build seems
> > > > excessive to me.
> > > >
> > > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler <ottobackwa...@gmail.com> wrote:
> > > >
> > > > > Is there an alternative to Travis? Do other like sized apache projects
> > > > > have these problems? Do they use travis?
> > > > >
> > > > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com) wrote:
> > > > >
> > > > > For those with pending/building pull requests, it will come as no surprise
> > > > > that our build times are increasing at a pace that is worrisome. In fact,
> > > > > we have hit a fundamental limit associated with Travis over the weekend.
> > > > > We have creeped up into the 40+ minute build territory and travis seems to
> > > > > error out at around 49 minutes.
> > > > >
> > > > > Taking the current build (
> > > > > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking at
> > > > > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
> > > > > tests out of 44 minutes and 42 seconds to do the build. This places the
> > > > > unit tests at around 43% of the build time. I say all of this to point out
> > > > > that while unit tests are a portion of the build, they are not even the
> > > > > majority of the build time. We need an approach that addresses the whole
> > > > > build performance holistically and we need it soonest.
> > > > >
> > > > > To seed the discussion, I will point to a few things that come to mind that
> > > > > fit into three broad categories:
> > > > >
> > > > > *Tests are Slow*
> > > > >
> > > > > - *Tactical*: We have around 13 tests that take more than 30 seconds and
> > > > > make up 14 minutes of the build. Considering what we can do to speed those
> > > > > tests as a tactical approach may be worth considering
> > > > > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > > > > tests, instead use the docker infrastructure to spin them up once and then
> > > > > use them throughout the tests.
> > > > >
> > > > > *Tests aren't parallel*
> > > > >
> > > > > Currently we cannot run the build in parallel due to the integration test
> > > > > infrastructure spinning up its own services that bind to the same ports.
> > > > > If we correct this, we can run the builds in parallel with mvn -T
> > > > >
> > > > > -
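
The build-time figures quoted in this thread can be checked with a short calculation. The sketch below is only an illustration of that arithmetic: the class name `BuildTimeShare` and the `share` helper are hypothetical, and the only inputs are the numbers stated in the message (1176.53 seconds of tests, a 44 minute 42 second build).

```java
import java.util.Locale;

public class BuildTimeShare {
    /** Percentage of the total build time spent in tests. */
    public static double share(double testSeconds, int buildMinutes, int buildSeconds) {
        double totalSeconds = buildMinutes * 60 + buildSeconds; // 44 min 42 s = 2682 s
        return 100.0 * testSeconds / totalSeconds;
    }

    public static void main(String[] unused) {
        // Figures taken from the quoted message.
        double percent = share(1176.53, 44, 42);
        System.out.println(String.format(Locale.ROOT, "tests: %.1f%% of the build", percent));
    }
}
```

The result is a little under 44%, which agrees with the "around 43%" in the message and with its point that the tests are less than half of the wall-clock time.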
Re: [DISCUSS] Build Times are getting out of hand
Down to 24 minutes? Nice job.

On Tue, Feb 7, 2017 at 1:49 PM, Casey Stella wrote:

> I spent a minute or two looking at how we might use travis configuration alone to drop the wall-clock time of the build and put it up for review at https://github.com/apache/incubator-metron/pull/444
>
> It does 2 things:
>
> - Separates the build, the unit tests and the integration tests
> - Parallelizes the unit tests and the build and runs the integration tests within the travis container
> - Runs the unit tests and integration tests in separate travis containers using travis' build matrix
>
> This ultimately cuts the wall-clock time down to 24 minutes for me on travis and should give us some time where we're not constantly bouncing builds to act on the suggestions here.
Re: [DISCUSS] Build Times are getting out of hand
I spent a minute or two looking at how we might use travis configuration alone to drop the wall-clock time of the build and put it up for review at https://github.com/apache/incubator-metron/pull/444

It does 2 things:

- Separates the build, the unit tests and the integration tests
- Parallelizes the unit tests and the build and runs the integration tests within the travis container
- Runs the unit tests and integration tests in separate travis containers using travis' build matrix

This ultimately cuts the wall-clock time down to 24 minutes for me on travis and should give us some time where we're not constantly bouncing builds to act on the suggestions here.

On Tue, Feb 7, 2017 at 1:03 PM, Michael Miklavcic <michael.miklav...@gmail.com> wrote:

> FYI, found this for Docker - https://docs.travis-ci.com/user/docker/
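To make the effect of splitting the build into separate jobs concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the two figures reported at the start of this thread (1176.53 seconds of tests within a build of 44 minutes and 42 seconds). It assumes, as a simplification, that the tests and the rest of the build could run fully in parallel, with no container start-up cost and no compile step repeated in each job. The function names are invented for illustration, and real Travis timings will differ.

```python
# Figures reported in this thread for the Travis job under discussion.
TOTAL_BUILD_SECONDS = 44 * 60 + 42   # 44 minutes 42 seconds
TEST_SECONDS = 1176.53               # time spent in tests


def share_of_build(part_seconds, total_seconds):
    """Return the fraction of the total build taken by one part."""
    return part_seconds / total_seconds


def ideal_parallel_wall_clock(job_seconds):
    """Wall-clock time when every job runs concurrently: the slowest job wins.

    Assumption: jobs are independent and start at the same time, which
    ignores container start-up and any work repeated in each job.
    """
    return max(job_seconds)


test_share = share_of_build(TEST_SECONDS, TOTAL_BUILD_SECONDS)
rest_seconds = TOTAL_BUILD_SECONDS - TEST_SECONDS
wall_clock = ideal_parallel_wall_clock([TEST_SECONDS, rest_seconds])

print(f"tests are {test_share:.1%} of the serial build")
print(f"idealized two-job wall clock: {wall_clock / 60:.1f} minutes")
```

Under those assumptions the model lands at roughly 25 minutes, in the same ballpark as the 24 minutes reported for the PR, although that agreement may be partly coincidental given how much the model leaves out.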
[GitHub] incubator-metron pull request #444: METRON-705: Parallelize the build in tra...
GitHub user cestella opened a pull request:

    https://github.com/apache/incubator-metron/pull/444

    METRON-705: Parallelize the build in travis to the extent that is obvious

    Travis suggests [here](https://blog.travis-ci.com/2012-11-28-speeding-up-your-tests-by-parallelizing-them/) that for situations where the integration tests get chunky, one can parallelize them using their build matrix functionality. Also, if we can separate those out, we can also process-parallelize the unit and build.

    Currently the build time is cut roughly in half to 24 minutes wall-clock.

    **NOTE: This is just a stopgap that requires no code changes to lower build wall-clock times. This is not intended to replace work parallelizing the integration tests or making the build take less time.**

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron parallel_build

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/444.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #444

commit b0e56a1b0faea33118f5fde5a23dd1982cee8c77
Author: cstella
Date: 2017-02-07T15:25:36Z

    Trying out parallelizing the unit and build but not integration tests.

commit 02d9dc1e211f5a11ed0a7172d76c7cca97590989
Author: cstella
Date: 2017-02-07T15:29:44Z

    Empty push

commit a15daf72c411aed7b05c298eac6816cc5163cc0d
Author: cstella
Date: 2017-02-07T15:48:31Z

    Updating.

commit 4da00b23d26bada058306317a969449a3bd87108
Author: cstella
Date: 2017-02-07T16:30:20Z

    make sure to run rat.

commit f4605994b1db018e60496d02abae369887ef5d78
Author: cstella
Date: 2017-02-07T16:33:30Z

    quiet down rat.

commit 939fb394ff34128fe704aeace8d7b4ce9f4daf41
Author: cstella
Date: 2017-02-07T16:57:58Z

    Updating.

commit 896e9b90aefef63572b0d4f69bd745506d10ebc8
Author: cstella
Date: 2017-02-07T17:06:24Z

    Adding.

commit c06862e87679140079c9250b170b0a9bee25e094
Author: cstella
Date: 2017-02-07T17:10:02Z

    making rat happy.

commit 6299cd23f1498b91630a7c118b8ca4366a7aa4be
Author: cstella
Date: 2017-02-07T17:12:16Z

    skipping other things.

commit 11ec82d1edd6c0e6f158da37b2026b4dc07c466a
Author: cstella
Date: 2017-02-07T17:26:03Z

    Updating to actually run unit tests.

commit 29296e35352d61ffe0f97bf1e9c78158808f7621
Author: cstella
Date: 2017-02-07T18:06:20Z

    Update to build matrix

commit 3eaa6428c3d42e555de472cf3502612350d513df
Author: cstella
Date: 2017-02-07T18:09:02Z

    Whoops.

commit 2e0c4521d7fd8e86ae0e18cfdf487617008bb378
Author: cstella
Date: 2017-02-07T18:12:43Z

    mised the echo

commit 424dd1bf50b84275e0382be50354fceca558659e
Author: cstella
Date: 2017-02-07T18:38:06Z

    putting time statements.

commit ab2688dc2ee772da479ae24862a0b184e8e39379
Author: cstella
Date: 2017-02-07T19:41:01Z

    Commenting the exclude.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
Re: BulkMessageWriterBolt and MessageGetters
You are correct, the BulkMessageWriterBolt/MessageGetters combination is not flexible enough. You would have to modify BulkMessageWriterBolt. I have addressed this in METRON-695, which will be submitted as a PR shortly. It will be easy to do what you want after that is merged in.

Ryan

On Tue, Feb 7, 2017 at 1:24 PM, Nick Allen wrote:

> I am trying to use the `BulkMessageWriterBolt` to write a specific tuple field named "measurement" to a Kafka topic.
>
>     - id: "kafkaBolt"
>       className: "org.apache.metron.writer.bolt.BulkMessageWriterBolt"
>       constructorArgs:
>         - "${kafka.zk}"
>       configMethods:
>         - name: "withMessageWriter"
>           args:
>             - ref: "kafkaWriter"
>         - name: "withMessageGetter"
>           args:
>             - "measurement"
>
> Rather than wanting the name of a field, it wants the name of a valid `MessageGetters` enum; either RAW or NAMED. It seems like there is no way for me to plug in a `NamedMessageGetter` with a custom field name like "measurement".
>
> Am I missing something? Is there a way to do this out-of-the-box?
BulkMessageWriterBolt and MessageGetters
I am trying to use the `BulkMessageWriterBolt` to write a specific tuple field named "measurement" to a Kafka topic.

    - id: "kafkaBolt"
      className: "org.apache.metron.writer.bolt.BulkMessageWriterBolt"
      constructorArgs:
        - "${kafka.zk}"
      configMethods:
        - name: "withMessageWriter"
          args:
            - ref: "kafkaWriter"
        - name: "withMessageGetter"
          args:
            - "measurement"

Rather than wanting the name of a field, it wants the name of a valid `MessageGetters` enum; either RAW or NAMED. It seems like there is no way for me to plug in a `NamedMessageGetter` with a custom field name like "measurement".

Am I missing something? Is there a way to do this out-of-the-box?
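The limitation described in this message can be sketched in a language-neutral way. The Python below is not the Metron API: `MessageGetters`, `RAW` and `NAMED` are names taken from the message, while every function, the default field name "message" and the dictionary standing in for a tuple are assumptions made up for illustration. It only shows why a configuration hook that accepts an enum name cannot carry a custom field name, and what a getter parameterized by field name might look like.

```python
from enum import Enum


def raw_getter(tuple_fields):
    """Hypothetical: treat the first value of the tuple as the message."""
    return next(iter(tuple_fields.values()))


def named_getter(field_name):
    """Hypothetical: build a getter that reads one named tuple field."""
    def getter(tuple_fields):
        return tuple_fields[field_name]
    return getter


class MessageGetters(Enum):
    """Enum-only lookup: each member is fixed when the class is defined."""
    RAW = "raw"
    NAMED = "named"

    def build(self):
        if self is MessageGetters.RAW:
            return raw_getter
        # Assumed default field name; a member has no slot for a custom one.
        return named_getter("message")


storm_tuple = {"message": "default payload", "measurement": 42}

# Looking up by enum name works only for the predefined members.
print(MessageGetters["NAMED"].build()(storm_tuple))

# "measurement" is a field name, not an enum member, so the lookup fails.
try:
    MessageGetters["measurement"]
except KeyError:
    print("no MessageGetters member called 'measurement'")

# A getter parameterized by field name handles the custom field directly.
print(named_getter("measurement")(storm_tuple))
```

The design point is simply that an enum selects among behaviours fixed in advance, whereas selecting a tuple field needs a value supplied at configuration time.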
Re: [Discuss] Direction of metron-docker
Is having a goal of replacing Vagrant/Virtualbox with Docker in "Quick Dev" and "Full Dev" mutually exclusive of the goals you outlined above? We could have both, no? I am unsure if you are objecting to this specific goal or not.

On Mon, Feb 6, 2017 at 1:03 PM, Ryan Merriman wrote:

> From the README:
>
> "Metron Docker is a Docker Compose application that is intended for development and integration testing of Metron. Use this instead of Vagrant when:
>
> - You want an environment that can be built and spun up quickly
> - You need to frequently rebuild and restart services
> - You only need to test, troubleshoot or develop against a subset of services"
>
> The "Quick Dev" environment actually serves 2 purposes: a development environment and an end-to-end testing environment. This module was intended to supplement or provide an alternative to the development environment part of "Quick Dev", not the end-to-end testing part. It does have "Docker" in the name of the module so I can see how that might suggest a fully supported deployment option. It shouldn't be used for that though because it doesn't include Ambari or MPack and isn't a true representation of a production Metron cluster.
>
> What is the direction? I could see this evolving into a collection of profiles or recipes. Need to develop a custom parser? Spin up an application that only includes the Storm, Kafka and Zookeeper images. Want to develop a custom Kibana dashboard? Spin up Elasticsearch and Kibana images preloaded with data. Maybe an analytics profile could be created that only includes the tools you need for that? The application that exists now in metron-docker could be considered a "rest" profile or a collection of containers that support all the functions of the rest API. It's very general purpose and supports a lot of use cases so I considered it a good starting point. It's very useful if you're developing a UI and have limited knowledge of Ambari or big data platform services. That was the initial motivation.
>
> I think you should view this as more of a toolbox and not a turnkey installation solution. Maintaining and building development environments is something Docker is a really good fit for and I have found this works much better than our Ansible/Vagrant environment. It's really fast and stays up all the time.
>
> But it's completely optional. Use it if you think it will help you. Or don't if "Quick Dev" is good enough and you've figured out how to tune it so that it's not completely unusable. If everybody thinks it's confusing and no one uses it then we can take it out and I'll just go back to maintaining it privately. But then I would miss out on Kyle's awesome contribution :)
>
> Ryan
>
> On Mon, Feb 6, 2017 at 10:12 AM, Nick Allen wrote:
>
> > So what is the direction then, Ryan? Can you describe what this is supposed to be used for?
> >
> > I had thought people wanted this to replace the existing Vagrant-based "Quick Dev"? But apparently this is the assumption that you think I am wrong on.
> >
> > On Mon, Feb 6, 2017 at 10:46 AM, Ryan Merriman wrote:
> >
> > > I agree with everything Kyle said and I think some of Nick's assumptions are false. I don't see this as a third deployment option.
> > >
> > > I can understand people not wanting to maintain another deployment path with Metron already being as big as it is. Ensuring that you've tested and updated all the appropriate components is already tedious. But in the case of this module, is it something that needs to be updated anytime someone makes a deployment related change? I don't think so and I've never had that expectation. The build won't fail and nothing from this project is ever deployed or shipped. For me, maintaining this tool as needed is good enough. What happens if a change is introduced that breaks something? I discover it as I'm using the tool, fix it, contribute it back and move on. No big deal. I had been maintaining this privately for a while before the PR was submitted and the work to keep it current with master was pretty minimal. Does that mean it should live somewhere else besides the master branch in Metron? I'm not sure what the answer is but there should be a way to share and collaborate with the community on tools like this that aren't necessarily deployed to production. Kyle's contribution is valuable and something I would definitely use.
> > >
> > > Ryan
Re: [DISCUSS] Build Times are getting out of hand
FYI, found this for Docker - https://docs.travis-ci.com/user/docker/

On Tue, Feb 7, 2017 at 9:09 AM, David Lyle wrote:

> Absolutely agree. I also think we'd want both once we've done that. Travis is good for smoke testing PRs and Commits. Jenkins is good for nightly runs of medium duration tests and would be great for automating our distributed testing if we found infrastructure to support it. I've seen them used in concert to provide a good solution.
>
> But, initially, I'd like to see us get our in-process stuff replaced with docker where (if) it makes sense, refactored to run in parallel, the poms refactored to handle our dependencies better and our uber jars removed where they can be and minimized where they cannot be.
>
> Which, I think, is a long-winded way of saying "I'd like to see us do what Casey suggested." :)
>
> -D...
[VOTE] Releasing Apache Metron (incubating) 0.3.1-RC2
This is a call to vote on releasing Apache Metron 0.3.1-RC1 incubating

Full list of changes in this release:
https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/CHANGES

The tag/commit to be voted upon is apache-metron-0.3.0-rc1-incubating:
https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc2-incubating

The source archive being voted upon can be found here:
https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz

Other release files, signatures and digests can be found here:
https://dist.apache.org/repos/dist/dev/incubator/metron/0.3.1-RC2-incubating/

The release artifacts are signed with the following key:
https://git-wip-us.apache.org/repos/asf?p=incubator-metron.git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d9c260ba55e;hb=refs/tags/apache-metron-0.3.1-rc2-incubating

Please vote on releasing this package as Apache Metron 0.3.1-RC2 incubating

When voting, please list the actions taken to verify the release.

Recommended build validation and verification instructions are posted here:
https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds

This vote will be open for at least 72 hours.

[ ] +1 Release this package as Apache Metron 0.3.1-RC2 incubating
[ ] 0 No opinion
[ ] -1 Do not release this package because...
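One of the verification actions a voter might list is a digest check on the source archive. The sketch below is a generic Python illustration of that step, not the project's documented procedure: the choice of SHA-512, the helper names and the stand-in file are all assumptions, and the Verifying Builds page linked above remains the reference for what to actually run.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path, published_hex, algorithm="sha512"):
    """Compare a locally computed digest with the published one."""
    return file_digest(path, algorithm) == published_hex.strip().lower()


# Stand-in for a downloaded archive so the sketch runs on its own.
payload = b"pretend this is a release tarball"
with tempfile.NamedTemporaryFile(delete=False) as archive:
    archive.write(payload)
    archive_path = archive.name

# Stand-in for the digest published next to the archive.
published = hashlib.sha512(payload).hexdigest()

ok = digest_matches(archive_path, published)
print("digest ok:", ok)
os.remove(archive_path)
```

A digest check only shows that the download is intact; checking the signature against the KEYS file is a separate step.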
Re: [DISCUSS] Build Times are getting out of hand
Absolutely agree. I also think we'd want both once we've done that. Travis is good for smoke testing PRs and Commits. Jenkins is good for nightly runs of medium duration tests and would be great for automating our distributed testing if we found infrastructure to support it. I've seen them used in concert to provide a good solution.

But, initially, I'd like to see us get our in-process stuff replaced with docker where (if) it makes sense, refactored to run in parallel, the poms refactored to handle our dependencies better and our uber jars removed where they can be and minimized where they cannot be.

Which, I think, is a long-winded way of saying "I'd like to see us do what Casey suggested." :)

-D...

On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <michael.miklav...@gmail.com> wrote:

> I agree with this. I don't think we should switch to an alternate system until we find that we are absolutely incapable of eking out any further efficiency from the current setup.
[GitHub] incubator-metron pull request #443: METRON-703: Rev the version from 0.3.0 t...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-metron/pull/443
[GitHub] incubator-metron issue #443: METRON-703: Rev the version from 0.3.0 to 0.3.1
Github user justinleet commented on the issue: https://github.com/apache/incubator-metron/pull/443

Built and installed the mpack. Versioning looks good where it shows up, and everything installed and started up correctly.
Re: [DISCUSS] Build Times are getting out of hand
I agree with this. I don't think we should switch to an alternate system until we find that we are absolutely incapable of eking out any further efficiency from the current setup.

On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella wrote:

> I believe that some people use travis and some people request Jenkins from Apache Infra. That being said, personally, I think we should take the opportunity to correct the underlying issues. 50 minutes for a build seems excessive to me.
>
> On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler wrote:
>
> > Is there an alternative to Travis? Do other like sized apache projects have these problems? Do they use travis?
> >
> > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com) wrote:
> >
> > For those with pending/building pull requests, it will come as no surprise that our build times are increasing at a pace that is worrisome. In fact, we have hit a fundamental limit associated with Travis over the weekend. We have crept up into the 40+ minute build territory and travis seems to error out at around 49 minutes.
> >
> > Taking the current build (https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking at just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in tests out of 44 minutes and 42 seconds to do the build. This places the unit tests at around 43% of the build time. I say all of this to point out that while unit tests are a portion of the build, they are not even the majority of the build time. We need an approach that addresses the whole build performance holistically and we need it soonest.
> >
> > To seed the discussion, I will point to a few things that come to mind that fit into three broad categories:
> >
> > *Tests are Slow*
> >
> > - *Tactical*: We have around 13 tests that take more than 30 seconds and make up 14 minutes of the build. Looking at what we can do to speed up those tests may be a worthwhile tactical approach.
> > - We are spinning up the same services (e.g. kafka, storm) for multiple tests; instead, use the docker infrastructure to spin them up once and then use them throughout the tests.
> >
> > *Tests aren't parallel*
> >
> > Currently we cannot run the build in parallel due to the integration test infrastructure spinning up its own services that bind to the same ports. If we correct this, we can run the builds in parallel with mvn -T
> >
> > - Correct this by decoupling the infrastructure from the tests and refactoring the tests to run in parallel.
> > - Make the integration testing infrastructure bind intelligently to whatever port is available.
> > - Move the integration tests to their own project. This will let us run the build in parallel since an individual project's test will be run serially.
> >
> > *Packaging is Painful*
> >
> > We have a sensitive environment in terms of dependencies. As such, we are careful to shade and relocate dependencies that we want to isolate from our transitive dependencies. The consequence of this is that we spend a lot of time in the build shading and relocating maven module output.
> >
> > - Do the hard work to walk our transitive dependencies and ensure that we are including only one copy of every library by using exclusions effectively. This will not only bring down build times, it will make sure we know what we're including.
> > - Try to devise a strategy where we only shade once at the end. This could look like some combination of
> >   - standardizing on the lowest common denominator of a troublesome library
> >   - We shade in dependencies so they can use different versions of libraries (e.g. metron-common with a modern version of guava) than the final jars.
> >   - exclusions
> >   - externalizing infrastructure out to not necessitate spinning up hadoop components in-process for integration tests (i.e. hbase server conflicts with storm in a few dependencies)
> >
> > *Final Thoughts*
> >
> > If I had three to pick, I'd pick
> >
> > - moving off of the in-memory component infrastructure to docker images
> > - fixing the maven poms to exclude correctly
> > - ensuring the resulting tests are parallelizable
> >
> > I will point out that fixing the maven poms to exclude correctly (i.e. we choose the version of every jar that we depend on transitively) ticks multiple boxes, not just making things faster.
> >
> > What are your thoughts? What did I miss? We need a plan and we need to execute on it soon, otherwise travis is going to keep smacking us hard. It may be worthwhile constructing a tactical plan and then a more strategic plan that we can work toward. I was heartened at how much some of these suggestions dovetail with the discussion around the future of the docker infrastructure.
> >
> > Best,
Re: [DISCUSS] Build Times are getting out of hand
Mike, unfortunately something changed recently, and I can't run `mvn clean install -T 2C` locally anymore. I'd like to echo that I think working on fixing the dependency issue is a very good idea. We've actually faced issues with this on the REST API PR. Working to fix this and having a standard way of including/excluding dependencies will be helpful to all, and to Ryan's point will benefit us outside of this context. On Tue, Feb 7, 2017 at 9:36 AM, Ryan Merrimanwrote: > Debugging integration tests in an IDE uses the same approach with our > current infrastructure or with docker: start up the topology with > LocalRunner. I've had mixed success with our current infrastructure. As > Mike alluded to, some tests work fine (most of the parser topologies and > enrichment topology) while others fail when run in my IDE but work on the > command line (ES integration test due to guava issues and Squid topology > due to some issue with the remove subdomains Stellar function). Of course > with Docker infrastructure you will need a test runner to launch topologies > in LocalRunner. They are short and simple though and I have one written > for each topology that I can share when appropriate. > > There are some advantages and disadvantages to switching the integration > tests to use Docker. The infrastructure we have now works and could be > adjusted to overcome it's primary weaknesses (single classloader and start > up/shutdown after each test). With Docker the classloader issue goes away > for the most part (or is much better than it is now) without any extra > work. For spinning services up/down once instead of with each test, we > will need to adjust our tests to clean up after themselves or (even better) > namespace all testing objects so that tests don't step on each other. That > work would have to be done no matter which infrastructure approach we > take. 
Probably the biggest downside to using Docker is that all > integration tests will need to be adjusted and we'll likely hit some issues > that we'll need to resolve. I was bitten several times by services that > broadcast their host address (Kafka for example) and I bet we hit more of > those. We'll also need to add a few more containers (HDFS for sure) but > those are easy to create as long as you don't hit the issue I just > mentioned. > > I think all of the suggestions so far are good ideas. I think it goes > without saying that we should do one at a time and maybe even reassess > after we see the impact of each change. I would vote for doing the > Maven/shading one first because it is all around beneficial, even outside > of this context. > > On Tue, Feb 7, 2017 at 9:04 AM, Casey Stella wrote: > > > I believe that some people use travis and some people request Jenkins > from > > Apache Infra. That being said, personally, I think we should take the > > opportunity to correct the underlying issues. 50 minutes for a build > seems > > excessive to me. > > > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler > > wrote: > > > > > Is there an alternative to Travis? Do other like sized apache projects > > > have these problems? Do they use travis? > > > > > > > > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com) > > wrote: > > > > > > For those with pending/building pull requests, it will come as no > > surprise > > > that our build times are increasing at a pace that is worrisome. In > fact, > > > we have hit a fundamental limit associated with Travis over the > weekend. > > > We have creeped up into the 40+ minute build territory and travis seems > > to > > > error out at around 49 minutes. > > > > > > Taking the current build ( > > > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking > > at > > > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) > in > > > tests out of 44 minutes and 42 seconds to do the build. 
This places the > > > unit tests at around 43% of the build time. I say all of this to point > > out > > > that while unit tests are a portion of the build, they are not even the > > > majority of the build time. We need an approach that addresses the > whole > > > build performance holistically and we need it soonest. > > > > > > To seed the discussion, I will point to a few things that come to mind > > > that > > > fit into three broad categories: > > > > > > *Tests are Slow* > > > > > > > > > - *Tactical*: We have around 13 tests that take more than 30 seconds > and > > > make up 14 minutes of the build. Considering what we can do to speed > > those > > > tests as a tactical approach may be worth considering > > > - We are spinning up the same services (e.g. kafka, storm) for multiple > > > tests, instead use the docker infrastructure to spin them up once and > > then > > > use them throughout the tests. > > > > > > > > > *Tests aren't parallel* > > > > > > Currently we cannot run the build in parallel due
Re: [DISCUSS] Build Times are getting out of hand
Debugging integration tests in an IDE uses the same approach with our current infrastructure or with docker: start up the topology with LocalRunner. I've had mixed success with our current infrastructure. As Mike alluded to, some tests work fine (most of the parser topologies and enrichment topology) while others fail when run in my IDE but work on the command line (ES integration test due to guava issues and Squid topology due to some issue with the remove subdomains Stellar function). Of course with Docker infrastructure you will need a test runner to launch topologies in LocalRunner. They are short and simple, though, and I have one written for each topology that I can share when appropriate. There are some advantages and disadvantages to switching the integration tests to use Docker. The infrastructure we have now works and could be adjusted to overcome its primary weaknesses (single classloader and start up/shutdown after each test). With Docker the classloader issue goes away for the most part (or is much better than it is now) without any extra work. For spinning services up/down once instead of with each test, we will need to adjust our tests to clean up after themselves or (even better) namespace all testing objects so that tests don't step on each other. That work would have to be done no matter which infrastructure approach we take. Probably the biggest downside to using Docker is that all integration tests will need to be adjusted and we'll likely hit some issues that we'll need to resolve. I was bitten several times by services that broadcast their host address (Kafka for example) and I bet we hit more of those. We'll also need to add a few more containers (HDFS for sure) but those are easy to create as long as you don't hit the issue I just mentioned. I think all of the suggestions so far are good ideas. I think it goes without saying that we should do one at a time and maybe even reassess after we see the impact of each change. 
I would vote for doing the Maven/shading one first because it is all around beneficial, even outside of this context. On Tue, Feb 7, 2017 at 9:04 AM, Casey Stellawrote: > I believe that some people use travis and some people request Jenkins from > Apache Infra. That being said, personally, I think we should take the > opportunity to correct the underlying issues. 50 minutes for a build seems > excessive to me. > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler > wrote: > > > Is there an alternative to Travis? Do other like sized apache projects > > have these problems? Do they use travis? > > > > > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com) > wrote: > > > > For those with pending/building pull requests, it will come as no > surprise > > that our build times are increasing at a pace that is worrisome. In fact, > > we have hit a fundamental limit associated with Travis over the weekend. > > We have creeped up into the 40+ minute build territory and travis seems > to > > error out at around 49 minutes. > > > > Taking the current build ( > > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking > at > > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in > > tests out of 44 minutes and 42 seconds to do the build. This places the > > unit tests at around 43% of the build time. I say all of this to point > out > > that while unit tests are a portion of the build, they are not even the > > majority of the build time. We need an approach that addresses the whole > > build performance holistically and we need it soonest. > > > > To seed the discussion, I will point to a few things that come to mind > > that > > fit into three broad categories: > > > > *Tests are Slow* > > > > > > - *Tactical*: We have around 13 tests that take more than 30 seconds and > > make up 14 minutes of the build. 
Considering what we can do to speed > those > > tests as a tactical approach may be worth considering > > - We are spinning up the same services (e.g. kafka, storm) for multiple > > tests, instead use the docker infrastructure to spin them up once and > then > > use them throughout the tests. > > > > > > *Tests aren't parallel* > > > > Currently we cannot run the build in parallel due to the integration test > > infrastructure spinning up its own services that bind to the same ports. > > If we correct this, we can run the builds in parallel with mvn -T > > > > - Correct this by decoupling the infrastructure from the tests and > > refactoring the tests to run in parallel. > > - Make the integration testing infrastructure bind intelligently to > > whatever port is available. > > - Move the integration tests to their own project. This will let us run > > the build in parallel since an individual project's test will be run > > serially. > > > > *Packaging is Painful* > > > > We have a sensitive environment in terms of dependencies. As such, we are > > careful to
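[Editor's sketch] Ryan's suggestion above to "namespace all testing objects so that tests don't step on each other" could look roughly like the following. This is a minimal illustration, not Metron code: the class `TestNamespace` and the resource names are hypothetical, and it assumes that a per-test prefix plus a short random suffix is enough to keep Kafka topics, HBase tables, and similar objects apart.

```java
import java.util.UUID;

// Hypothetical helper, not part of Metron: gives each test its own names for shared resources.
class TestNamespace {
    private final String prefix;

    TestNamespace(String testName) {
        // The first 8 characters of a random UUID are hex digits (no hyphen);
        // they keep two runs of the same test from sharing names.
        String suffix = UUID.randomUUID().toString().substring(0, 8);
        this.prefix = testName + "_" + suffix;
    }

    // Name for a topic, table, or similar object owned by this test only.
    String name(String resource) {
        return prefix + "_" + resource;
    }

    public static void main(String[] args) {
        TestNamespace a = new TestNamespace("enrichment");
        TestNamespace b = new TestNamespace("enrichment");
        // The two lines differ in the random part of the name.
        System.out.println(a.name("input_topic"));
        System.out.println(b.name("input_topic"));
    }
}
```

With names like these, services could stay up across tests without one test's data leaking into another's counts, which is the concern Mike raises about HBase tables elsewhere in the thread.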
Re: [DISCUSS] Build Times are getting out of hand
I believe that some people use travis and some people request Jenkins from Apache Infra. That being said, personally, I think we should take the opportunity to correct the underlying issues. 50 minutes for a build seems excessive to me. On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowlerwrote: > Is there an alternative to Travis? Do other like sized apache projects > have these problems? Do they use travis? > > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com) wrote: > > For those with pending/building pull requests, it will come as no surprise > that our build times are increasing at a pace that is worrisome. In fact, > we have hit a fundamental limit associated with Travis over the weekend. > We have creeped up into the 40+ minute build territory and travis seems to > error out at around 49 minutes. > > Taking the current build ( > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking at > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in > tests out of 44 minutes and 42 seconds to do the build. This places the > unit tests at around 43% of the build time. I say all of this to point out > that while unit tests are a portion of the build, they are not even the > majority of the build time. We need an approach that addresses the whole > build performance holistically and we need it soonest. > > To seed the discussion, I will point to a few things that come to mind > that > fit into three broad categories: > > *Tests are Slow* > > > - *Tactical*: We have around 13 tests that take more than 30 seconds and > make up 14 minutes of the build. Considering what we can do to speed those > tests as a tactical approach may be worth considering > - We are spinning up the same services (e.g. kafka, storm) for multiple > tests, instead use the docker infrastructure to spin them up once and then > use them throughout the tests. 
> > > *Tests aren't parallel* > > Currently we cannot run the build in parallel due to the integration test > infrastructure spinning up its own services that bind to the same ports. > If we correct this, we can run the builds in parallel with mvn -T > > - Correct this by decoupling the infrastructure from the tests and > refactoring the tests to run in parallel. > - Make the integration testing infrastructure bind intelligently to > whatever port is available. > - Move the integration tests to their own project. This will let us run > the build in parallel since an individual project's test will be run > serially. > > *Packaging is Painful* > > We have a sensitive environment in terms of dependencies. As such, we are > careful to shade and relocate dependencies that we want to isolate from > our > transitive dependencies. The consequences of this is that we spend a lot > of time in the build shading and relocating maven module output. > > - Do the hard work to walk our transitive dependencies and ensure that > we are including only one copy of every library by using exclusions > effectively. This will not only bring down build times, it will make sure > we know what we're including. > - Try to devise a strategy where we only shade once at the end. This > could look like some combination of > - standardizing on the lowest common denominator of a troublesome > library > - We shade in dependencies so they can use different versions of > libraries (e.g. metron-common with a modern version of guava) than the > final jars. > - exclusions > - externalizing infrastructure out to not necessitate spinning up > hadoop components in-process for integration tests (i.e. 
hbase server > conflicts with storm in a few dependencies) > > *Final Thoughts* > > If I had three to pick, I'd pick > > - moving off of the in-memory component infrastructure to docker images > - fixing the maven poms to exclude correctly > - ensuring the resulting tests are parallelizable > > I will point out that fixing the maven poms to exclude correctly (i.e. we > choose the version of every jar that we depend on transitively) ticks > multiple boxes, not just making things faster. > > What are your thoughts? What did I miss? We need a plan and we need to > execute on it soon, otherwise travis is going to keep smacking us hard. It > may be worth while constructing a tactical plan and then a more strategic > plan that we can work toward. I was heartened at how much some of these > suggestions dovetail with the discussion around the future of the docker > infrastructure. > > Best, > > Casey > >
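[Editor's sketch] The idea in the quoted proposal of spinning services up once and reusing them throughout the tests can be illustrated with a lazily created shared instance. This is only a sketch under assumptions: `SharedService` is a hypothetical stand-in for whatever would actually start Kafka, Storm, or a Docker container, and the counter exists purely to show that startup happens once.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Metron code: start an expensive service once and share it across tests.
class SharedService {
    // Counts how many times the expensive startup actually ran.
    static final AtomicInteger STARTS = new AtomicInteger();
    private static SharedService instance;

    private SharedService() {
        // Stand-in for the slow part, e.g. launching a Kafka or Storm component.
        STARTS.incrementAndGet();
    }

    // Synchronized so that tests running in parallel still trigger only one startup.
    static synchronized SharedService get() {
        if (instance == null) {
            instance = new SharedService();
        }
        return instance;
    }

    public static void main(String[] args) {
        SharedService first = get();
        SharedService second = get();
        System.out.println("same instance: " + (first == second));
        System.out.println("starts: " + STARTS.get());
    }
}
```

Sharing a service this way only works if tests clean up after themselves or use separate names for their data, a point made elsewhere in this thread.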
Re: [DISCUSS] Build Times are getting out of hand
Mike, I can verify that the integration tests do not run in parallel via mvn -T 1C clean install At a minimum the integration test infrastructure will need to hunt for an open port to bind to rather than assuming one. On Tue, Feb 7, 2017 at 9:26 AM, Michael Miklavcic < michael.miklav...@gmail.com> wrote: > I can't recall, did we have a good solution around Docker and remote > debugging integration tests from the IDE? On the topic of test refactoring > and running in parallel, I'm all for it. I know JJ had been doing this on > his local machine at one point, but we'd need to be sure all tests are > truly independent. E.g. counts on hbase tables would need to be very > specific or every test should use unique tables. Also, can we spin up > something like Docker in Travis? How many cores do we get? I'll look into > that and see what we get. > > I'm all for simplifying our dependencies. Shading the jars takes an > incredible amount of time and has consistently bitten us repeatedly. > Another bummer about the jar shading has been that the build runs > differently in IntelliJ than it does from the Maven command line. I don't > think we'll get away from it entirely, but we may be able to make this > better as well. > > From my most recent local build, these are the biggest offending modules: > metron-profiler SUCCESS [05:56 min] > metron-parsers . SUCCESS [09:38 min] > metron-data-management . SUCCESS [09:15 min] > elasticsearch-shaded ... SUCCESS [08:05 min] > > I'm going to take a look at Travis and also see what pom dependencies I can > start excluding. > > > On Mon, Feb 6, 2017 at 3:02 PM, Casey Stellawrote: > > > For those with pending/building pull requests, it will come as no > surprise > > that our build times are increasing at a pace that is worrisome. In > fact, > > we have hit a fundamental limit associated with Travis over the weekend. > > We have creeped up into the 40+ minute build territory and travis seems > to > > error out at around 49 minutes. 
> > > > Taking the current build ( > > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking > at > > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in > > tests out of 44 minutes and 42 seconds to do the build. This places the > > unit tests at around 43% of the build time. I say all of this to point > out > > that while unit tests are a portion of the build, they are not even the > > majority of the build time. We need an approach that addresses the whole > > build performance holistically and we need it soonest. > > > > To seed the discussion, I will point to a few things that come to mind > that > > fit into three broad categories: > > > > *Tests are Slow* > > > > > >- *Tactical*: We have around 13 tests that take more than 30 seconds > and > >make up 14 minutes of the build. Considering what we can do to speed > > those > >tests as a tactical approach may be worth considering > >- We are spinning up the same services (e.g. kafka, storm) for > multiple > >tests, instead use the docker infrastructure to spin them up once and > > then > >use them throughout the tests. > > > > > > *Tests aren't parallel* > > > > Currently we cannot run the build in parallel due to the integration test > > infrastructure spinning up its own services that bind to the same ports. > > If we correct this, we can run the builds in parallel with mvn -T > > > >- Correct this by decoupling the infrastructure from the tests and > >refactoring the tests to run in parallel. > >- Make the integration testing infrastructure bind intelligently to > >whatever port is available. > >- Move the integration tests to their own project. This will let us > run > >the build in parallel since an individual project's test will be run > >serially. > > > > *Packaging is Painful* > > > > We have a sensitive environment in terms of dependencies. 
As such, we > are > > careful to shade and relocate dependencies that we want to isolate from > our > > transitive dependencies. The consequences of this is that we spend a lot > > of time in the build shading and relocating maven module output. > > > >- Do the hard work to walk our transitive dependencies and ensure that > >we are including only one copy of every library by using exclusions > >effectively. This will not only bring down build times, it will make > > sure > >we know what we're including. > >- Try to devise a strategy where we only shade once at the end. This > >could look like some combination of > > - standardizing on the lowest common denominator of a troublesome > > library > > - We shade in dependencies so they can use different versions of > > libraries (e.g. metron-common with a modern version of guava) > > than the > >
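[Editor's sketch] Casey's point at the top of this message, that the integration test infrastructure should hunt for an open port rather than assume one, can be sketched with the standard `java.net.ServerSocket` trick of binding to port 0. The class `FreePortFinder` is hypothetical and not part of Metron; how the chosen port would be handed to the in-process Kafka or ZooKeeper components is left open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of Metron: asks the OS for a currently free TCP port.
class FreePortFinder {
    // Binding to port 0 lets the operating system choose an unused ephemeral port.
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int kafkaPort = findFreePort();
        int zookeeperPort = findFreePort();
        System.out.println("kafka=" + kafkaPort + " zookeeper=" + zookeeperPort);
    }
}
```

One caveat: the socket is closed before the port is used, so another process could grab the port in between. Letting the service itself bind to port 0 and then reading back its actual port avoids that race where the service supports it.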
[GitHub] incubator-metron issue #443: METRON-703: Rev the version from 0.3.0 to 0.3.1
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/443 I verified that this works in vagrant.
[GitHub] incubator-metron issue #443: METRON-703: Rev the version from 0.3.0 to 0.3.1
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/443 I'm in the process of spinning this up in vagrant and I believe @justinleet will be testing out the mpack just to make sure nothing is borked. Please hold off until we report in to commit.
[GitHub] incubator-metron issue #443: METRON-703: Rev the version from 0.3.0 to 0.3.1
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/443 +1 Did a quick find-search and did not find any out-of-place 0.3.0 tags left.
[RESULT][VOTE] Releasing Apache Metron (incubating) 0.3.1-RC1
The vote fails; a new release candidate will be cut when METRON-703 is accepted. Results: +1: Nick Allen, James Sirota, Casey Stella. -1: David Lyle.
[GitHub] incubator-metron pull request #443: METRON-703: Rev the version from 0.3.0 t...
GitHub user cestella opened a pull request: https://github.com/apache/incubator-metron/pull/443 METRON-703: Rev the version from 0.3.0 to 0.3.1 In order to release, we need to up the version to 0.3.1 so that the artifacts produced continue to function. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cestella/incubator-metron METRON-703 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-metron/pull/443.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #443 commit 3d22fc5d9abdc5c5edacb7e8545c3f18e916624f Author: cstella Date: 2017-02-07T14:14:14Z METRON-703: Upping the version to 0.3.1
Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC1
Whoops, you're absolutely right. We forgot to rev the version in the artifacts. I'm going to cancel the vote and rerelease when that JIRA gets in. On Tue, Feb 7, 2017 at 7:56 AM, David Lylewrote: > -1 Unless I'm mistaken, the artifacts are versioned 0.3.0. > > -D... > > On Mon, Feb 6, 2017 at 10:46 PM, James Sirota wrote: > > > +1 deployed on AWS > > > > 06.02.2017, 15:39, "Nick Allen" : > > > +1 > > > > > > Valid checksums > > > Build successful > > > Integration tests successful > > > Deploy of "Full Dev" successful > > > Deploy of "Quick Dev" successful > > > > > > On Mon, Feb 6, 2017 at 3:43 PM, Casey Stella > wrote: > > > > > >> This is a call to vote on releasing Apache Metron 0.3.1-RC1 > incubating > > >> > > >> Full list of changes in this release: > > >> > > >> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3. > > >> 1-RC1-incubating/CHANGES > > >> > > >> The tag/commit to be voted upon is apache-metron-0.3.0-rc1- > incubating: > > >> > > >> https://git-wip-us.apache.org/repos/asf?p=incubator-metron. > > >> git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc1-incubating > > >> > > >> The source archive being voted upon can be found here: > > >> > > >> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3. > > >> 1-RC1-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz > > >> > > >> Other release files, signatures and digests can be found here: > > >> > > >> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3. > > >> 1-RC1-incubating/ > > >> > > >> The release artifacts are signed with the following key: > > >> > > >> https://git-wip-us.apache.org/repos/asf?p=incubator-metron. > > >> git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d > > 9c260ba55e;hb=refs/tags/ > > >> apache-metron-0.3.1-rc1-incubating > > >> > > >> Please vote on releasing this package as Apache Metron 0.3.1-RC1 > > incubating > > >> > > >> When voting, please list the actions taken to verify the release. 
> > >> > > >> Recommended build validation and verification instructions are posted > > here: > > >> > > >> https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds > > >> > > >> This vote will be open for at least 72 hours. > > >> > > >> [ ] +1 Release this package as Apache Metron 0.3.1-RC1 incubating > > >> > > >> [ ] 0 No opinion > > >> > > >> [ ] -1 Do not release this package because... > > > > --- > > Thank you, > > > > James Sirota > > PPMC- Apache Metron (Incubating) > > jsirota AT apache DOT org > > >
Re: [VOTE] Releasing Apache Metron (incubating) 0.3.1-RC1
-1 Unless I'm mistaken, the artifacts are versioned 0.3.0. -D... On Mon, Feb 6, 2017 at 10:46 PM, James Sirotawrote: > +1 deployed on AWS > > 06.02.2017, 15:39, "Nick Allen" : > > +1 > > > > Valid checksums > > Build successful > > Integration tests successful > > Deploy of "Full Dev" successful > > Deploy of "Quick Dev" successful > > > > On Mon, Feb 6, 2017 at 3:43 PM, Casey Stella wrote: > > > >> This is a call to vote on releasing Apache Metron 0.3.1-RC1 incubating > >> > >> Full list of changes in this release: > >> > >> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3. > >> 1-RC1-incubating/CHANGES > >> > >> The tag/commit to be voted upon is apache-metron-0.3.0-rc1-incubating: > >> > >> https://git-wip-us.apache.org/repos/asf?p=incubator-metron. > >> git;a=shortlog;h=refs/tags/apache-metron-0.3.1-rc1-incubating > >> > >> The source archive being voted upon can be found here: > >> > >> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3. > >> 1-RC1-incubating/apache-metron-0.3.1-rc1-incubating.tar.gz > >> > >> Other release files, signatures and digests can be found here: > >> > >> https://dist.apache.org/repos/dist/dev/incubator/metron/0.3. > >> 1-RC1-incubating/ > >> > >> The release artifacts are signed with the following key: > >> > >> https://git-wip-us.apache.org/repos/asf?p=incubator-metron. > >> git;a=blob;f=KEYS;h=8381e96d64c249a0c1b489bc0c234d > 9c260ba55e;hb=refs/tags/ > >> apache-metron-0.3.1-rc1-incubating > >> > >> Please vote on releasing this package as Apache Metron 0.3.1-RC1 > incubating > >> > >> When voting, please list the actions taken to verify the release. > >> > >> Recommended build validation and verification instructions are posted > here: > >> > >> https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds > >> > >> This vote will be open for at least 72 hours. 
> >> > >> [ ] +1 Release this package as Apache Metron 0.3.1-RC1 incubating > >> > >> [ ] 0 No opinion > >> > >> [ ] -1 Do not release this package because... > > --- > Thank you, > > James Sirota > PPMC- Apache Metron (Incubating) > jsirota AT apache DOT org >
Re: [Discuss] Direction of metron-docker
From a user perspective, we used Vagrant when we first encountered Metron to see and learn about it; it was quick and easy to deploy, which was handy. Now we have switched to Ambari Mpack for internal use. We mostly write and deploy Parsers for Metron, so having just Ambari Mpack is enough for us. I haven't used Vagrant ever since we started using Ambari Mpack (which was harder to grasp than Vagrant though). I like what Ryan suggests using Docker for; if it is as easy to spin up (basically, if setup is documented) and allows seamless development for the core team, we would kill two birds with one stone. Casey, > *Is the docker infrastructure sufficient to replace vagrant at the moment?* > > I do not consider it to be a sufficient environment to acceptance test > features because it does not install Metron in a realistic manner that > mimics a user. Vagrant isn't currently where it should be in that regard > and that is the reason that it is currently getting an overhaul to get > closer to that ideal. Considering that we are moving towards Ambari Mpack (something that was not originally considered the main deployment solution), can we really call it a realistic manner once (and if) Ansible is deprecated? What will be the difference between Vagrant and just spinning up HDP + Metron inside Virtualbox? - Dima On 02/07/2017 05:12 AM, Kyle Richardson wrote: > I like the idea of porting some of the integration tests to metron-docker. > I believe the maven plugin used in the rpm-docker project could be used to > support that goal. > > I agree with Ryan in that I see this as more of a toolbox for developers > than a supported deployment method. That is the vein I originally created > this PR in actually. I could continue to load the elasticsearch templates > manually when working with metron-docker but thought it would be worthwhile > to automate with a few lines of code. > > I have another PR just about ready to go to include a hadoop/hdfs container > in metron-docker. 
> Would folks see value in including this? The idea was to provide an easier way to iterate on HDFS indexing options for cold storage/archive data.
>
> As for maintainability, the minimum would be to keep consistent versions of storm, hbase, etc. between the docker containers and the current supported HDP stack. The automation pieces are nice to haves (not blockers in my mind) and will continue to simplify as we move more configs into zookeeper from the filesystem. I can't think of anything too onerous here but I may be missing something obvious.
>
> -Kyle
>
> On Mon, Feb 6, 2017 at 2:30 PM, Otto Fowler wrote:
>
>> Beyond the utility, there is the cost of maintaining the docker path. It is just another thing that reviewers and committers have to keep in mind or know about when looking at PR’s. Maybe if there was a better and wider spread understanding of the work that is done and how to continue it, it would not seem so onerous. It can’t be something that will be OK only as long as one or two specific people keep up with it, or rather it should not be. Even if, or perhaps because, it won’t break the build.
>>
>> There is a lot of utility and value to metron-docker; maybe we just need to think through the sustainability and maintenance issues, so the question becomes how we can make it work to the project’s satisfaction.
>>
>> On February 6, 2017 at 14:11:04, Casey Stella (ceste...@gmail.com) wrote:
>>
>> So, I'm late chiming in here, but I'll go ahead anyway. :)
>>
>> There are a couple of questions here that stand out:
>>
>> *Is the docker infrastructure sufficient to replace vagrant at the moment?*
>>
>> I do not consider it to be a sufficient environment to acceptance test features because it does not install Metron in a realistic manner that mimics a user. Vagrant isn't currently where it should be in that regard and that is the reason that it is currently getting an overhaul to get closer to that ideal.
>>
>> *Does it scratch an itch?*
>>
>> Yes, it does, I think. For those who want a limited portion of metron spun up to smoke-test features in a targeted way, this works well. That being said, in my opinion, you still need to test in vagrant or a cluster. Matt brings up a good point as well about integration test infrastructure. I think there could be an even bigger itch to scratch there as the cost of spinning up and down integration testing components per-test can be time consuming and lead to long build times.
>>
>> *Can we unify them?*
>>
>> I don't know; I'd like to, honestly. I think that it'd be a good discussion to have and it'd be nice to have a path to victory there, because I'm not thrilled about having so many avenues to install. If we don't unify them, I feel that docker will eventually get so far out of date that it will become unusable, frankly.
>>
>>
>> Ultimately, I don't care about the tech stack that we use, docker vs vagrant vs vagrant on docker