[GitHub] metron issue #1293: METRON-1928: Bump Metron version to 0.7.0 for release
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1293 +1 Thanks! ---
[GitHub] metron issue #1293: METRON-1928: Bump Metron version to 0.7.0 for release
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1293 @justinleet The only issue I noticed is that `Upgrading.md` has a section header that needs to be updated. ``` ## 0.6.0 to 0.6.1 ``` should be ``` ## 0.6.0 to 0.7.0 ``` ---
[GitHub] metron-bro-plugin-kafka pull request #21: METRON-1911 Docker setup for testi...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/21#discussion_r240209807 --- Diff: docker/scripts/download_sample_pcaps.sh --- @@ -0,0 +1,105 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +shopt -s nocasematch + +# +# Downloads sample pcap files to the data directory +# + +function help { + echo " " + echo "usage: ${0}" + echo "--data-path[REQURIED] The pcap data path" + echo "-h/--help Usage information." + echo " " + echo " " +} + +DATA_PATH= + +# handle command line options +for i in "$@"; do + case $i in + # + # DATA_PATH + # + # +--data-path=*) + DATA_PATH="${i#*=}" + shift # past argument=value +;; + + # + # -h/--help + # +-h | --help) + help + exit 0 + shift # past argument with no value +;; + + # + # Unknown option + # +*) + UNKNOWN_OPTION="${i#*=}" + echo "Error: unknown option: $UNKNOWN_OPTION" + help +;; + esac +done + +if [[ -z "$DATA_PATH" ]]; then + echo "DATA_PATH must be passed" + exit 1 +fi + +echo "Running download_sample_pcaps with " +echo "DATA_PATH = $DATA_PATH" +echo "===" + +for folder in nitroba example-traffic ssh ftp radius rfb; do + if [[ ! 
-d "${DATA_PATH}"/${folder} ]]; then +mkdir -p "${DATA_PATH}"/${folder} + fi +done + +if [[ ! -f "${DATA_PATH}"/example-traffic/exercise-traffic.pcap ]]; then + wget https://www.bro.org/static/traces/exercise-traffic.pcap -O "${DATA_PATH}"/example-traffic/exercise-traffic.pcap --- End diff -- Why not include this in the Dockerfile? That way it can be cached in the image. ---
[GitHub] metron-bro-plugin-kafka pull request #21: METRON-1911 Docker setup for testi...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/21#discussion_r240208847 --- Diff: docker/in_docker_scripts/build_bro_plugin.sh --- @@ -0,0 +1,48 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +shopt -s nocasematch + +# +# Runs bro-package to build and install the plugin +# + +cd /root/code || exit 1 + + +make clean + +rc=$?; if [[ ${rc} != 0 ]]; then + echo "ERROR cleaning project ${rc}" >>"${RUN_LOG_PATH}" + exit ${rc} +fi + +cd /root || exit 1 + +echo "" >>"${RUN_LOG_PATH}" 2>&1 +bro-pkg install code --force | tee "${RUN_LOG_PATH}" --- End diff -- Why not install the plugin in the Dockerfile? That way an image with the plugin installed and ready to go can be cached. ---
[GitHub] metron pull request #1292: METRON-1925 Provide Verbose View of Profile Resul...
GitHub user nickwallen reopened a pull request: https://github.com/apache/metron/pull/1292

METRON-1925 Provide Verbose View of Profile Results in REPL

## Motivation

When viewing profile measurements in the REPL using PROFILE_GET, you simply get back a list of values. It is not easy to determine from which time period the measurements were taken. For example, are the following values all sequential? Are there any gaps in the measurements taken over the past 30 minutes? When was the first measurement taken?

```
[Stellar]>>> PROFILE_GET("hello-world","global",PROFILE_FIXED(30, "MINUTES"))
[2655, 1170, 1185, 1170, 1185, 1215, 1200, 1170]
```

The `PROFILE_GET` function was designed to return values that serve as input to other functions. It was not designed to return values in a human-readable form that can be easily understood. We need another way to query for profile measurements in the REPL that provides a user with a better understanding of the profile measurements.

## Solution

This PR provides a new function called `PROFILE_VIEW`. It is effectively a "verbose mode" for `PROFILE_GET`. For lack of a better name, I just called it `PROFILE_VIEW`. I would be open to alternatives. I did not want to add additional options to the already complex `PROFILE_GET`.

* Description: Retrieves a series of measurements from a stored profile. Provides a more verbose view of each measurement than PROFILE_GET. Returns a map containing the profile name, entity, period id, period start, period end for each profile measurement.
* Arguments:
    * profile - The name of the profile.
    * entity - The name of the entity.
    * periods - The list of profile periods to fetch. Use PROFILE_WINDOW or PROFILE_FIXED.
    * groups - Optional, The groups to retrieve. Must correspond to the 'groupBy' list used during profile creation. Defaults to an empty list, meaning no groups.
* Returns: A map for each profile measurement containing the profile name, entity, period, and value.

## Test Drive

1. Spin-up Full Dev and create a profile. Follow the Profiler README. Reduce the profile period if you are impatient.

1. Open up the REPL and retrieve the values using `PROFILE_GET`. Notice that I have no idea when the first measurement was taken, if the values are sequential, if there are gaps in the values and how big.

    ```
    [Stellar]>>> PROFILE_GET("hello-world","global",PROFILE_FIXED(30, "MINUTES"))
    [1185, 1170, 1185, 1215, 1200, 1170, 5425, 1155, 1215, 1200]
    ```

1. Now use `PROFILE_VIEW` to retrieve the same results.

    ```
    [Stellar]>>> results := PROFILE_VIEW("hello-world","global",PROFILE_FIXED(30, "MINUTES"))
    [{period.start=154411956, period=12867663, profile=hello-world, period.end=154411968, groups=[], value=1185, entity=global},
     {period.start=154411968, period=12867664, profile=hello-world, period.end=154411980, groups=[], value=1170, entity=global},
     {period.start=154411980, period=12867665, profile=hello-world, period.end=154411992, groups=[], value=1185, entity=global},
     {period.start=154411992, period=12867666, profile=hello-world, period.end=154412004, groups=[], value=1215, entity=global},
     {period.start=154412004, period=12867667, profile=hello-world, period.end=154412016, groups=[], value=1200, entity=global},
     {period.start=154412016, period=12867668, profile=hello-world, period.end=154412028, groups=[], value=1170, entity=global},
     {period.start=154412088, period=12867674, profile=hello-world, period.end=154412100, groups=[], value=5425, entity=global},
     {period.start=154412100, period=12867675, profile=hello-world, period.end=154412112, groups=[], value=1155, entity=global},
     {period.start=154412112, period=12867676, profile=hello-world, period.end=154412124, groups=[], value=1215, entity=global},
     {period.start=154412124, period=12867677, profile=hello-world, period.end=154412136, groups=[], value=1200, entity=global}]
    ```

1. For each measurement, I have a map containing the period, period start, period end, profile name, entity, groups, and value. With this I can better answer some of the questions above.

    ```
    {
      profile=hello-world,
      entity=global,
      period=12867663,
      period.start=154411956,
      period.end=154411968,
      groups=[],
      value=1185
    }
    ```

1. When was the first measurement taken? ``` [Stellar]>>> GET(results, 0) {period.start=154411956, period=12867663, profile=hello-world, period.end=1544119
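
The questions raised in the motivation (are the values sequential, are there gaps) come down to comparing consecutive period ids. The following is a minimal Java sketch of that idea, assuming each `PROFILE_VIEW` entry is available as a `Map` with a numeric `period` key, as in the output above. The class and method names (`ProfileGapFinder`, `findGaps`, `view`) are hypothetical and are not part of Metron.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Metron: finds gaps between consecutive period ids.
class ProfileGapFinder {

  // Returns one entry per gap as {lastPeriodBeforeGap, firstPeriodAfterGap}.
  // Assumes the entries are already sorted by period id, as in the REPL output.
  static List<long[]> findGaps(List<Map<String, Object>> views) {
    List<long[]> gaps = new ArrayList<>();
    for (int i = 1; i < views.size(); i++) {
      long previous = ((Number) views.get(i - 1).get("period")).longValue();
      long current = ((Number) views.get(i).get("period")).longValue();
      if (current - previous > 1) {
        gaps.add(new long[] {previous, current});
      }
    }
    return gaps;
  }

  // Builds a map shaped like one PROFILE_VIEW entry (only the keys used here).
  static Map<String, Object> view(long period, long value) {
    Map<String, Object> entry = new HashMap<>();
    entry.put("period", period);
    entry.put("value", value);
    return entry;
  }

  public static void main(String[] args) {
    // Four of the entries from the sample output above.
    List<Map<String, Object>> views = Arrays.asList(
        view(12867667L, 1200), view(12867668L, 1170),
        view(12867674L, 5425), view(12867675L, 1155));
    for (long[] gap : findGaps(views)) {
      System.out.println("gap between period " + gap[0] + " and " + gap[1]
          + " (" + (gap[1] - gap[0] - 1) + " missing)");
    }
  }
}
```

Run against the sample data, this reports a single gap between periods 12867668 and 12867674; the entry immediately after that gap is the one with the value 5425.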
[GitHub] metron issue #1292: METRON-1925 Provide Verbose View of Profile Results in R...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1292 Not sure where this is coming from: ``` Failed tests: RestFunctionsTest.restGetShouldTimeout:516 expected null, but was:<{get=success}> ``` ---
[GitHub] metron pull request #1292: METRON-1925 Provide Verbose View of Profile Resul...
Github user nickwallen closed the pull request at: https://github.com/apache/metron/pull/1292 ---
[GitHub] metron pull request #1278: METRON-1892 Parser Debugger Should Load Config Fr...
GitHub user nickwallen reopened a pull request: https://github.com/apache/metron/pull/1278

METRON-1892 Parser Debugger Should Load Config From Zookeeper

When using the parser debugger functions created in #1265, the user has to manually specify the parser configuration. This is useful when testing out a new parser before it goes live. In other cases, it would be simpler to use the sensor configuration values that are 'live' and already loaded in Zookeeper. For example, I might want to test why a particular message fails to parse in my environment.

## Try It Out

Try out the following examples in the Stellar REPL.

1. Launch a development environment.

1. Launch the REPL.

    ```
    source /etc/default/metron
    cd $METRON_HOME
    bin/stellar -z $ZOOKEEPER
    ```

### Parse a Message

1. Grab a message from the input topic to parse. You could also just mock-up a message that you would like to test.

    ```
    [Stellar]>>> input := KAFKA_GET('bro')
    [{"http": {"ts":1542313125.807068,"uid":"CUrRne3iLIxXavQtci","id.orig_h"...
    ```

1. Initialize the parser. The parser configuration for 'bro' will be loaded automatically from Zookeeper.

    ```
    [Stellar]>>> parser := PARSER_INIT("bro")
    Parser{0 successful, 0 error(s)}
    ```

1. Parse the message.

    ```
    [Stellar]>>> msgs := PARSER_PARSE(parser, input)
    [{"bro_timestamp":"1542313125.807068","method":"GET","ip_dst_port":8080,...
    ```

1. The parser will tally the success.

    ```
    [Stellar]>>> parser
    Parser{1 successful, 0 error(s)}
    ```

1. Review the successfully parsed message.

    ```
    [Stellar]>>> LENGTH(msgs)
    1
    ```

    ```
    [Stellar]>>> msg := GET(msgs, 0)
    [Stellar]>>> MAP_GET("guid", msg)
    7f2e0c77-c58c-488e-b1ad-fbec10fb8182
    ```

    ```
    [Stellar]>>> MAP_GET("timestamp", msg)
    1542313125807
    ```

    ```
    [Stellar]>>> MAP_GET("source.type", msg)
    bro
    ```

### Missing Configuration

1. If the configuration does not exist in Zookeeper, you should see something like this. I have not configured a parser named 'tuna' in my environment (but I could go for a tuna sandwich right about now).
```
[Stellar]>>> bad := PARSER_INIT('tuna')
[!] Unable to parse: PARSER_INIT('tuna') due to: Unable to read configuration from Zookeeper; sensorType = tuna
org.apache.metron.stellar.dsl.ParseException: Unable to parse: PARSER_INIT('tuna') due to: Unable to read configuration from Zookeeper; sensorType = tuna
	at org.apache.metron.stellar.common.BaseStellarProcessor.createException(BaseStellarProcessor.java:166)
	at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:154)
	at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.executeStellar(DefaultStellarShellExecutor.java:405)
	at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.execute(DefaultStellarShellExecutor.java:257)
	at org.apache.metron.stellar.common.shell.specials.AssignmentCommand.execute(AssignmentCommand.java:66)
	at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.execute(DefaultStellarShellExecutor.java:252)
	at org.apache.metron.stellar.common.shell.cli.StellarShell.execute(StellarShell.java:359)
	at org.jboss.aesh.console.AeshProcess.run(AeshProcess.java:53)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Unable to read configuration from Zookeeper; sensorType = tuna
	at org.apache.metron.management.ParserFunctions$InitializeFunction.readFromZookeeper(ParserFunctions.java:103)
	at org.apache.metron.management.ParserFunctions$InitializeFunction.apply(ParserFunctions.java:66)
	at org.apache.metron.stellar.common.StellarCompiler.lambda$exitTransformationFunc$13(StellarCompiler.java:661)
	at org.apache.metron.stellar.common.StellarCompiler$Expression.apply(StellarCompiler.java:259)
	at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:151)
	... 9 more
```
[GitHub] metron pull request #1278: METRON-1892 Parser Debugger Should Load Config Fr...
Github user nickwallen closed the pull request at: https://github.com/apache/metron/pull/1278 ---
[GitHub] metron-bro-plugin-kafka issue #21: METRON-1911 Docker setup for testing bro ...
Github user nickwallen commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/21

I ran this up, but seemed to get a couple of errors.

```
./scripts/docker_run_bro_container.sh: line 152: DOCKER_CMD_BASE: command not found
```

```
./scripts/docker_run_bro_container.sh: line 162: metron-bro-docker-container:latest: command not found
```

This is the command that I ran along with what I saw when the errors popped up.

```
$ ./scripts/download_sample_pcaps.sh --data-path=~/tmp/data && ./example_script.sh --leave-running --data-path=/Users/me/tmp/pcap_data && ./scripts/docker_execute_process_data_dir.sh && ./scripts/docker_run_consume_bro_kafka.sh
...
make[1]: Leaving directory `/root/bro-2.5.5/librdkafka-0.11.5/src-cpp'
Removing intermediate container c4b01dee8e27
 ---> 4301bf29af13
Step 23/23 : WORKDIR /root
 ---> Running in 3fac0790cfdc
Removing intermediate container 3fac0790cfdc
 ---> 02b3e630af02
Successfully built 02b3e630af02
Successfully tagged metron-bro-docker-container:latest
Running docker_run_bro_container with
CONTAINER_NAME = bro
NETWORK_NAME = bro-network
SCRIPT_PATH =
LOG_PATH = /Users/nallen/tmp/metron-bro-plugin-kafka-pr21/docker/logs
DATA_PATH = /Users/me/tmp/pcap_data
DOCKER_PARAMETERS =
===
Log will be found on host at /Users/nallen/tmp/metron-bro-plugin-kafka-pr21/docker/logs/bro-test-Fri_Dec__7_16:51:06_EST_2018.log
./scripts/docker_run_bro_container.sh: line 143: DOCKER_CMD_BASE: command not found
./scripts/docker_run_bro_container.sh: line 144: DOCKER_CMD_BASE: command not found
./scripts/docker_run_bro_container.sh: line 145: DOCKER_CMD_BASE: command not found
./scripts/docker_run_bro_container.sh: line 146: DOCKER_CMD_BASE: command not found
./scripts/docker_run_bro_container.sh: line 147: DOCKER_CMD_BASE: command not found
./scripts/docker_run_bro_container.sh: line 152: DOCKER_CMD_BASE: command not found
===Running Docker===
eval command is: bro bash
./scripts/docker_run_bro_container.sh: line 162: metron-bro-docker-container:latest: command not found
Running stop_container with
CONTAINER_NAME= kafka
===
kafka
kafka
Running stop_container with
CONTAINER_NAME= zookeeper
===
zookeeper
zookeeper
Running destroy_docker_network with
NETWORK_NAME = bro-network
===
bro-network
```

And I see no containers running.

```
$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
```

Docker Engine 18.09.0

---
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r239855809 --- Diff: metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.metron.parsers.regex; + +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.junit.Before; +import org.junit.Test; + +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.junit.Assert.assertTrue; + +public class RegularExpressionsParserTest { + + private RegularExpressionsParser regularExpressionsParser; + private JSONObject parserConfig; + + @Before + public void setUp() throws Exception { +regularExpressionsParser = new RegularExpressionsParser(); + } + + @Test + public void testSSHDParse() throws Exception { +String message = +"<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2"; + +parserConfig = getJsonConfig( + Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString()); --- End diff -- Yes, good point @ottobackwards . @jagdeepsingh2 - He is referring specifically to the class `Syslog3164ParserIntegrationTest` in that PR. Should be fairly simple to put together with what you already have. ---
[GitHub] metron-bro-plugin-kafka issue #21: METRON-1911 Docker setup for testing bro ...
Github user nickwallen commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/21 Thanks @ottobackwards . I'll give it a run through. ---
[GitHub] metron issue #1289: METRON-1810 Storm Profiler Intermittent Test Failure
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1289 I have run it up in Full Dev successfully. I ran all the tests in `ProfilerIntegrationTest` in a continuous loop on my desktop for maybe 3 or 4 hours yesterday. Doing the same on master fails within 20 minutes or so. I have manually triggered it in Travis 10 times. ---
[GitHub] metron issue #1292: METRON-1925 Provide Verbose View of Profile Results in R...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1292 > The underlying logic seems like it should be nearly identical. Is there any common functionality that could be pulled out and shared between the two? All of the heavy lifting was already done by the HBaseProfilerClient. So they already share a common implementation through that. And that is why you don't see a ton of change needed in `PROFILE_GET`. ---
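
One way to picture the sharing described in this comment: both functions delegate the lookup to a single client and differ only in how they shape the result. The sketch below illustrates that arrangement under stated assumptions; `MeasurementClient`, `Measurement`, `getValues`, and `view` are hypothetical names chosen for illustration and do not reproduce the API of Metron's `HBaseProfilerClient`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a stored profile measurement.
class Measurement {
  final String profile;
  final String entity;
  final long period;
  final Object value;

  Measurement(String profile, String entity, long period, Object value) {
    this.profile = profile;
    this.entity = entity;
    this.period = period;
    this.value = value;
  }
}

// Hypothetical stand-in for the client that does the heavy lifting.
interface MeasurementClient {
  List<Measurement> fetch(String profile, String entity);
}

class ProfileFunctionsSketch {
  private final MeasurementClient client;

  ProfileFunctionsSketch(MeasurementClient client) {
    this.client = client;
  }

  // PROFILE_GET-style result: only the values, suitable as input to other functions.
  List<Object> getValues(String profile, String entity) {
    List<Object> values = new ArrayList<>();
    for (Measurement m : client.fetch(profile, entity)) {
      values.add(m.value);
    }
    return values;
  }

  // PROFILE_VIEW-style result: one map per measurement, including its period.
  List<Map<String, Object>> view(String profile, String entity) {
    List<Map<String, Object>> views = new ArrayList<>();
    for (Measurement m : client.fetch(profile, entity)) {
      Map<String, Object> entry = new LinkedHashMap<>();
      entry.put("profile", m.profile);
      entry.put("entity", m.entity);
      entry.put("period", m.period);
      entry.put("value", m.value);
      views.add(entry);
    }
    return views;
  }

  public static void main(String[] args) {
    // A fake client returning two fixed measurements.
    MeasurementClient fake = (profile, entity) -> Arrays.asList(
        new Measurement(profile, entity, 12867663L, 1185),
        new Measurement(profile, entity, 12867664L, 1170));
    ProfileFunctionsSketch functions = new ProfileFunctionsSketch(fake);
    System.out.println(functions.getValues("hello-world", "global"));
    System.out.println(functions.view("hello-world", "global"));
  }
}
```

Because the fetch logic lives in one place, the two result shapes cannot drift apart in what they retrieve, only in how they present it.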
[GitHub] metron issue #1292: METRON-1925 Provide Verbose View of Profile Results in R...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1292 > Could the return be a full json document, that includes the query parameters? What do you mean by query parameters? ---
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r239611104 --- Diff: metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.metron.parsers.regex; + +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.junit.Before; +import org.junit.Test; + +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.junit.Assert.assertTrue; + +public class RegularExpressionsParserTest { + + private RegularExpressionsParser regularExpressionsParser; + private JSONObject parserConfig; + + @Before + public void setUp() throws Exception { +regularExpressionsParser = new RegularExpressionsParser(); + } + + @Test + public void testSSHDParse() throws Exception { +String message = +"<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2"; + +parserConfig = getJsonConfig( + Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString()); --- End diff -- I [opened this JIRA](https://issues.apache.org/jira/browse/METRON-1926) to fix the parsing infrastructure. The error message produced should have made it clear that the message failed because it was missing a timestamp, but it does not. ---
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r239608145 --- Diff: metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.metron.parsers.regex; + +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.junit.Before; +import org.junit.Test; + +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.junit.Assert.assertTrue; + +public class RegularExpressionsParserTest { + + private RegularExpressionsParser regularExpressionsParser; + private JSONObject parserConfig; + + @Before + public void setUp() throws Exception { +regularExpressionsParser = new RegularExpressionsParser(); + } + + @Test + public void testSSHDParse() throws Exception { +String message = +"<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2"; + +parserConfig = getJsonConfig( + Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString()); --- End diff -- Hi @jagdeepsingh2 - I was able to get this up and running in a debugger. Your parser will not parse messages successfully after the changes made in #1213. You are likely using this on an older version of Metron. The parser must produce a JSONObject that contains both a `timestamp` and `original_string` field based on the [validation performed here.](https://github.com/apache/metron/blob/2ee6cc7e0b448d8d27f56f873e2c15a603c53917/metron-platform/metron-parsers/src/main/java/org/apache/metron/parsers/BasicParser.java#L34-L46) If you add the timestamp like you mentioned it should work. ---
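
The requirement stated in this comment can be sketched as a small check. This is an illustration only: it uses a plain `Map` in place of `JSONObject`, and the class and method names (`RequiredFieldsCheck`, `missingFields`) are hypothetical rather than the linked `BasicParser` code. The timestamp value is an arbitrary example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check, not Metron's BasicParser: reports which required fields are absent.
class RequiredFieldsCheck {

  // The two fields the comment says a parsed message must contain.
  static final List<String> REQUIRED = Arrays.asList("original_string", "timestamp");

  // Returns the names of the required fields missing from the message (empty if valid).
  static List<String> missingFields(Map<String, Object> message) {
    List<String> missing = new ArrayList<>();
    for (String field : REQUIRED) {
      if (!message.containsKey(field)) {
        missing.add(field);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    Map<String, Object> message = new HashMap<>();
    message.put("original_string",
        "<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2");

    // Without a timestamp the message is rejected.
    System.out.println("missing: " + missingFields(message));

    // Adding the timestamp (arbitrary epoch-millis value) makes it pass.
    message.put("timestamp", 1542313125807L);
    System.out.println("missing: " + missingFields(message));
  }
}
```

Returning the missing field names instead of a bare boolean makes it straightforward to report exactly why a message was rejected.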
[GitHub] metron pull request #1289: METRON-1810 Storm Profiler Intermittent Test Fail...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1289#discussion_r239577883 --- Diff: metron-analytics/metron-profiler-common/src/main/java/org/apache/metron/profiler/ProfilePeriod.java --- @@ -151,6 +152,8 @@ public String toString() { return "ProfilePeriod{" + "period=" + period + ", durationMillis=" + durationMillis + +", startTime=" + Instant.ofEpochMilli(getStartTimeMillis()).toString() + +", endTime=" + Instant.ofEpochMilli(getEndTimeMillis()).toString() + --- End diff -- I will remove this as it is not necessary. ---
[GitHub] metron pull request #1292: METRON-1925 Provide Verbose View of Profile Resul...
GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1292 METRON-1925 Provide Verbose View of Profile Results in REPL ---
[GitHub] metron issue #1288: METRON-1916 Stellar Classpath Function Resolver Should H...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1288 It was when I was working on #1289. After changing the POM, I found that multiple versions of httpclient were getting pulled in, which broke all the integration tests because a Stellar executor could not load. It would throw the exception that I put in the JIRA. This is one specific scenario, but I can imagine that classpath issues like this are going to crop up in different environments. It seems like we should have a way to handle this. ---
[GitHub] metron pull request #1289: METRON-1810 Storm Profiler Intermittent Test Fail...
GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1289 METRON-1810 Storm Profiler Intermittent Test Failure This PR hopefully resolves some of the more frequent, intermittent `ProfilerIntegrationTest` failures. ### Testing Run the integration tests and they should not fail. I have repeatedly run the integration tests on my laptop and have yet to see a failure. I am also repeatedly triggering Travis CI builds on this branch to see if any failures occur there. I can't prove the problem is solved, but I am hoping this helps. ### Changes * This change uses Caffeine's CacheWriter in place of the RemovalListener for profile expiration. This is explained more in the next section. * Removed the ability to define the cache maintenance executor for the `DefaultMessageDistributor`. This was only needed for testing RemovalListeners, which no longer exist. * I re-jiggered some of the integration tests to provide more information when they do fail. I removed the use of `waitOrTimeout` and replaced it with `assertEventually`. I am also using a different style of assertion; Hamcrest-y. These changes should provide additional information and also be more consistent with the rest of the code base. * After removing the dependency that provided `waitOrTimeout`, I had to rework some of the project dependencies because multiple versions of`httpclient` lib was being pulled in. This caused the new REST function in `stellar-common` to blow up when a Stellar execution environment is loaded during the integration test. * Added additional debug logging for the caches including an estimate of the cache size. RemovalListener to CacheWriter Profiles are designed to expire and flush after a fixed period of time; the TTL. The integration tests rely on some of the profile values being flushed by this TTL mechanism. The TTL mechanism is driven by cache expiration. There is a cache of "active" profiles. 
This cache is configured to have values time out after they have not been accessed for a fixed period of time. Once they time out from the active cache, they are placed in an "expired" cache. Messages are not applied to expired profiles, but these expired profiles hang around for a fixed period of time to allow them to be flushed. Previously, a Caffeine [RemovalListener](https://github.com/ben-manes/caffeine/wiki/Removal#removal-listeners) was used so that when a profile expires from the active cache, it is placed into the expired cache. When the tests fail, it seems that the `RemovalListener` is not notified in a timely fashion, so the profile doesn't make it to the expired cache and so never flushes to be read by the integration test. In Caffeine, these listeners are notified asynchronously, on a separate thread (via ForkJoinPool.commonPool()), not inline with cache reads or writes. For running tests that depend on `RemovalListener`s, it is recommended to set the cache maintenance executor to something like `MoreExecutors.sameThreadExecutor()` so that cache maintenance is executed on the main execution thread when `cleanUp` is called. This was done for the unit tests, but was not done for the integration tests. The Caffeine Wiki says to use a [CacheWriter](https://github.com/ben-manes/caffeine/wiki/Writer) instead when notification should be performed synchronously, which logically works for this use case. > Removal listener operations are executed asynchronously using an Executor. The default executor is ForkJoinPool.commonPool() and can be overridden via Caffeine.executor(Executor). When the operation must be performed synchronously with the removal, use [CacheWriter](https://github.com/ben-manes/caffeine/wiki/Writer) instead. This does not negatively impact production performance because the `ActiveCacheWriter` only does work on a delete, which occurs only rarely when a profile stops receiving messages, not on a write. 
In addition, these caches are very read-heavy as in most cases the cache is only written to when a new profile is defined or on topology start-up. ## Pull Request Checklist - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [ ] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)? - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [ ] Have you included steps or a guide
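The timing difference between an asynchronous removal listener and a synchronous writer, as described in this pull request, can be modeled with a small sketch. This is a toy model under stated assumptions and not Caffeine's API: `ToyActiveCache`, `RemovalDemo`, and `visibleImmediately` are hypothetical names, and expiration is triggered by hand rather than by a TTL. It only shows why a callback dispatched through an executor may not have run by the time a test checks the "expired" cache, while an inline callback always has.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.BiConsumer;

// Hypothetical toy cache; NOT Caffeine's API. It only models where the removal callback runs.
class ToyActiveCache {
  private final Map<String, String> active = new ConcurrentHashMap<>();
  private final Executor callbackExecutor;               // decides when the callback runs
  private final BiConsumer<String, String> onRemoval;    // receives (key, profile) on removal

  ToyActiveCache(Executor callbackExecutor, BiConsumer<String, String> onRemoval) {
    this.callbackExecutor = callbackExecutor;
    this.onRemoval = onRemoval;
  }

  void put(String key, String profile) {
    active.put(key, profile);
  }

  // Simulates an entry expiring from the active cache.
  void expire(String key) {
    String profile = active.remove(key);
    if (profile != null) {
      // An inline executor behaves like a synchronous writer;
      // any other executor behaves like an asynchronous listener.
      callbackExecutor.execute(() -> onRemoval.accept(key, profile));
    }
  }
}

class RemovalDemo {
  // Returns true if the profile is already in the expired cache right after expiration.
  static boolean visibleImmediately(Executor callbackExecutor) {
    Map<String, String> expired = new ConcurrentHashMap<>();
    ToyActiveCache cache = new ToyActiveCache(callbackExecutor, expired::put);
    cache.put("profile-1", "state");
    cache.expire("profile-1");
    return expired.containsKey("profile-1");
  }
}
```

Passing `Runnable::run` as the executor runs the callback on the calling thread, so the profile is visible at once. Passing an executor that queues or hands off the task leaves a window in which a reader of the expired cache sees nothing, which is the kind of race the intermittent test failure suggests.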
[GitHub] metron issue #1269: METRON-1879 Allow Elasticsearch to Auto-Generate the Doc...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1269 Prior to this change we used the Metron-generated GUID as the document ID. They were always the same. There were various places in the code where the name `guid`, `docID`, or `id` was used to implicitly mean the Metron GUID. With this change, the Metron-generated GUID will never be equal to the Elasticsearch-generated document ID. You can view and use either the GUID or the document ID in the UI. Both are available as separate fields in the index. This PR has pulled on those strings and tried to make clear what is a Metron GUID and what is a document ID. In the Alerts UI you can see and/or use either the GUID or the document ID. It is up to the user which one they care to see, although most users will not care what the document ID is. From my PR description: > This change is backwards compatible. The Alerts UI should continue to work no matter if some of the underlying indices were written with the Metron GUID as the document ID and some are written with the auto-generated document ID. There is no option to continue to use the GUID as the document ID. I can't think of a use case worthy of the additional effort and testing needed to support that. It is backwards compatible in that the Alerts UI will work when searching over both "legacy" indices where guid = docID and "new" indices where guid != docID. In all places in the code where a docID is needed, that docID is first retrieved from Elasticsearch, rather than making an assumption about what it is. ---
[GitHub] metron pull request #1254: METRON-1849 Elasticsearch Index Write Functionali...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1254#discussion_r239114286 --- Diff: metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/bulk/BulkDocumentWriter.java --- @@ -0,0 +1,79 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.elasticsearch.bulk; + +import org.apache.metron.indexing.dao.update.Document; + +import java.util.List; + +/** + * Writes documents to an index in bulk. + * + * Partial failures within a batch can be handled individually by registering + * a {@link FailureListener}. + * + * @param The type of document to write. + */ +public interface BulkDocumentWriter { + +/** + * A listener that is notified when a set of documents have been + * written successfully. + * @param The type of document to write. + */ +interface SuccessListener { --- End diff -- Done. See latest commit. ---
[GitHub] metron pull request #1254: METRON-1849 Elasticsearch Index Write Functionali...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1254#discussion_r239065568 --- Diff: metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/dao/ElasticsearchDao.java --- @@ -196,7 +196,7 @@ public ElasticsearchDao withRefreshPolicy(WriteRequest.RefreshPolicy refreshPoli } protected Optional getIndexName(String guid, String sensorType) throws IOException { -return updateDao.getIndexName(guid, sensorType); +return updateDao.findIndexNameByGUID(guid, sensorType); --- End diff -- > Also, would we want any parity between the updateDao's find method name vs the ElasticsearchDao's getIndexName method name? I found the [code here confusing](https://github.com/apache/metron/blob/89a2beda4f07911c8b3cd7dee8a2c3426838d161/metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/dao/ElasticsearchUpdateDao.java#L195-L197) and it had me stuck on an issue for quite some time. When both use `getIndexName` I have no idea what the logic is doing. It tries one approach, then falls back to another, but since the methods are named the same, the names don't tell me how the two attempts to find the index name differ. With the rename, I feel it is easier to understand at a glance [what this is doing now](https://github.com/apache/metron/blob/260ccc366b79ef53595dbfd097066040444b4eda/metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/dao/ElasticsearchUpdateDao.java#L179) and the differences between the primary approach and the fallback. > Is sensorType not a component to retrieving the index name? So you prefer the original function name? Or do you prefer `lookupIndexName` or `findIndexNameByGUIDAndSensor`? ---
[GitHub] metron pull request #1254: METRON-1849 Elasticsearch Index Write Functionali...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1254#discussion_r239054925 --- Diff: metron-platform/metron-elasticsearch/src/test/java/org/apache/metron/elasticsearch/writer/ElasticsearchWriterTest.java --- @@ -18,170 +18,241 @@ package org.apache.metron.elasticsearch.writer; -import static org.junit.Assert.assertEquals; -import static org.mockito.Mockito.mock; -import static org.mockito.Mockito.when; +import org.apache.metron.common.Constants; +import org.apache.metron.common.configuration.writer.WriterConfiguration; +import org.apache.metron.common.writer.BulkWriterResponse; +import org.apache.storm.task.TopologyContext; +import org.apache.storm.tuple.Tuple; +import org.json.simple.JSONObject; +import org.junit.Before; +import org.junit.Test; -import com.google.common.collect.ImmutableList; +import java.util.ArrayList; import java.util.Collection; import java.util.HashMap; +import java.util.List; import java.util.Map; -import org.apache.metron.common.writer.BulkWriterResponse; -import org.apache.storm.tuple.Tuple; -import org.elasticsearch.action.bulk.BulkItemResponse; -import org.elasticsearch.action.bulk.BulkResponse; -import org.junit.Test; +import java.util.UUID; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertTrue; +import static org.junit.Assert.fail; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; public class ElasticsearchWriterTest { -@Test -public void testSingleSuccesses() throws Exception { -Tuple tuple1 = mock(Tuple.class); -BulkResponse response = mock(BulkResponse.class); -when(response.hasFailures()).thenReturn(false); +Map stormConf; +TopologyContext topologyContext; +WriterConfiguration writerConfiguration; -BulkWriterResponse expected = new BulkWriterResponse(); -expected.addSuccess(tuple1); +@Before +public void setup() { 
+topologyContext = mock(TopologyContext.class); -ElasticsearchWriter esWriter = new ElasticsearchWriter(); -BulkWriterResponse actual = esWriter.buildWriteReponse(ImmutableList.of(tuple1), response); +writerConfiguration = mock(WriterConfiguration.class); +when(writerConfiguration.getGlobalConfig()).thenReturn(globals()); -assertEquals("Response should have no errors and single success", expected, actual); +stormConf = new HashMap(); } @Test -public void testMultipleSuccesses() throws Exception { -Tuple tuple1 = mock(Tuple.class); -Tuple tuple2 = mock(Tuple.class); - -BulkResponse response = mock(BulkResponse.class); -when(response.hasFailures()).thenReturn(false); +public void shouldWriteSuccessfully() { +// create a writer where all writes will be successful +float probabilityOfSuccess = 1.0F; +ElasticsearchWriter esWriter = new ElasticsearchWriter(); +esWriter.setDocumentWriter( new BulkDocumentWriterStub<>(probabilityOfSuccess)); +esWriter.init(stormConf, topologyContext, writerConfiguration); -BulkWriterResponse expected = new BulkWriterResponse(); -expected.addSuccess(tuple1); -expected.addSuccess(tuple2); +// create a tuple and a message associated with that tuple +List tuples = createTuples(1); +List messages = createMessages(1); -ElasticsearchWriter esWriter = new ElasticsearchWriter(); -BulkWriterResponse actual = esWriter.buildWriteReponse(ImmutableList.of(tuple1, tuple2), response); +BulkWriterResponse response = esWriter.write("bro", writerConfiguration, tuples, messages); -assertEquals("Response should have no errors and two successes", expected, actual); +// response should only contain successes +assertFalse(response.hasErrors()); +assertTrue(response.getSuccesses().contains(tuples.get(0))); } @Test -public void testSingleFailure() throws Exception { -Tuple tuple1 = mock(Tuple.class); - -BulkResponse response = mock(BulkResponse.class); -when(response.hasFailures()).thenReturn(true); - -Exception e = new IllegalStateException(); -BulkItemResponse 
itemResponse = buildBulkItemFailure(e); - when(response.iterator()).thenReturn(ImmutableList.of(itemRespons
[GitHub] metron pull request #1254: METRON-1849 Elasticsearch Index Write Functionali...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1254#discussion_r239054070 --- Diff: metron-platform/metron-indexing/src/main/java/org/apache/metron/indexing/dao/update/Document.java --- @@ -89,46 +91,29 @@ public void setGuid(String guid) { this.guid = guid; } - @Override - public String toString() { -return "Document{" + -"timestamp=" + timestamp + -", document=" + document + -", guid='" + guid + '\'' + -", sensorType='" + sensorType + '\'' + -'}'; - } - @Override public boolean equals(Object o) { -if (this == o) { - return true; -} -if (o == null || getClass() != o.getClass()) { - return false; -} - +if (this == o) return true; +if (!(o instanceof Document)) return false; Document document1 = (Document) o; - -if (timestamp != null ? !timestamp.equals(document1.timestamp) : document1.timestamp != null) { - return false; -} -if (document != null ? !document.equals(document1.document) : document1.document != null) { - return false; -} -if (guid != null ? !guid.equals(document1.guid) : document1.guid != null) { - return false; -} -return sensorType != null ? sensorType.equals(document1.sensorType) -: document1.sensorType == null; +return Objects.equals(timestamp, document1.timestamp) && --- End diff -- It is auto-generated by IntelliJ. It changed when, on a previous iteration, I added a field to the Document class. I have since backed that out and went with another approach, so there really is no need for this to change now. I will back this out. ---
[GitHub] metron issue #1284: METRON-1867 Remove `/api/v1/update/replace` endpoint
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1284 > Is this endpoint superseded by another implementation, or just removed altogether? The endpoint has been removed completely. It is not being used by anything in Metron currently. > Any users of the REST API using this directly for any reason? Not sure if we have to deprecate this - i.e. I'm unclear of whether or not we consider the REST API client facing or not, or if it's just middleware to us for the UI. I just sent out a notice to the dev list to see if anyone has a strong opinion. ---
[GitHub] metron-bro-plugin-kafka issue #20: METRON-1910: bro plugin segfaults on src/...
Github user nickwallen commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/20 No problem @JonZeolla. I can help track it down too when I get some free time. ---
[GitHub] metron pull request #1280: METRON-1869 Unable to Sort an Escalated Meta Aler...
Github user nickwallen closed the pull request at: https://github.com/apache/metron/pull/1280 ---
[GitHub] metron issue #1280: METRON-1869 Unable to Sort an Escalated Meta Alert
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1280 ``` Tests in error: ProfilerIntegrationTest.testEventTime:281 » Timeout ``` Unrelated test failure, which is tracked as METRON-1810 (and I am trying to fix on the side.) Kicking Travis. ---
[GitHub] metron issue #1288: METRON-1916 Stellar Classpath Function Resolver Should H...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1288 > In my mind we don't have a current state where Stellar is running but not all the functions in the class path are loaded. The same thing would happen today if an `Exception` is thrown in any of the functions being loaded. If an Exception is thrown, instead of N functions ready-to-rock, we would have N-1 functions. This just expands that behavior to `ClassNotFoundException` which gets thrown when there is a missing dependency. > Before we would have crashed starting up. Now we will run and crash get an error later. I don't see it that way. Let's look at both scenarios. * If I am NOT using the REST function, then things just work as they should. Yay! I don't want to worry about the missing dependency for a function that I have no knowledge about. * If I am using the REST function, an exception would still be thrown when the function definition could not be found. So the user still gets an exception when there is a problem. (Q) Any thoughts on alternative approaches? I prefer not to have to worry about classpath dependencies (like those required by the REST function) that I am not using. In this example, I was only trying to use the `STATS_*` functions, but it was the REST function that blew things up for me. It is difficult for a user to track this down, because they have no knowledge of the REST function and its dependencies. (Alternative 1) An alternative approach would be to be more selective about what functions we add to `stellar-common`. Anything added to `stellar-common` forces a required dependency on all Stellar users. If the REST function had been a separate project, then a user could choose to use or not use that project and not be burdened by the additional dependency. (Alternative 2) Could the `stellar.function.resolver.includes` and `stellar.function.resolver.excludes` be enhanced to allow users to exclude functions they don't want to load? ---
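The "N functions become N-1 functions" behavior argued for above can be sketched roughly as follows. This is a minimal illustration and not the actual Stellar classpath function resolver: `TolerantLoader` and `loadAll` are hypothetical names, and the sketch assumes candidate functions are identified by class name. A class that cannot be loaded because it, or one of its dependencies, is missing is logged and skipped instead of aborting the whole load.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Metron's resolver implementation.
class TolerantLoader {

  // Loads every class that can be resolved and skips the ones that cannot.
  static List<Class<?>> loadAll(List<String> classNames) {
    List<Class<?>> loaded = new ArrayList<>();
    for (String name : classNames) {
      try {
        loaded.add(Class.forName(name));
      } catch (ClassNotFoundException | LinkageError e) {
        // LinkageError covers NoClassDefFoundError, which is what a missing
        // transitive dependency typically surfaces as.
        System.err.println("Skipping " + name + ": " + e);
      }
    }
    return loaded;
  }
}
```

With this shape, a user who never calls the skipped function is unaffected, and a user who does call it still gets an error at the point of use because the function was never registered.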
[GitHub] metron issue #1288: METRON-1916 Stellar Classpath Function Resolver Should H...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1288 There is just one less function available on the close when this happens. This is not some new, unknown state. ---
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237875330 --- Diff: metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.metron.parsers.regex; + +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.junit.Before; +import org.junit.Test; + +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.junit.Assert.assertTrue; + +public class RegularExpressionsParserTest { + + private RegularExpressionsParser regularExpressionsParser; + private JSONObject parserConfig; + + @Before + public void setUp() throws Exception { +regularExpressionsParser = new RegularExpressionsParser(); + } + + @Test + public void testSSHDParse() throws Exception { +String message = +"<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2"; + +parserConfig = getJsonConfig( + Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString()); --- End diff -- If the parser does fail to parse a given message, we need to make sure that the error message kicked out to the error topic has a helpful message, stack trace, etc. Otherwise, it will be impossible for a user to determine why the parser failed to parse the message. While adding the timestamp is probably a good addition, I don't know that it really solves the problem here. Right now, I don't really know if the problem is in your parser or in the parser infrastructure, but it is something that I want to make sure we track down. ---
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237870087 --- Diff: metron-platform/metron-parsers/src/test/resources/config/RegularExpressionsInvalidParserConfig.json --- @@ -0,0 +1,208 @@ +{ + "convertCamelCaseToUnderScore": true, + "messageHeaderRegex": "(?(?<=^<)\\d{1,4}(?=>)).*?(?(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?(?<=\\s).*?(?=\\s))", + "recordTypeRegex": "(?(?<=\\s)\\b(tch-replicant|audispd|syslog|ntpd|sendmail|pure-ftpd|usermod|useradd|anacron|unix_chkpwd|sudo|dovecot|postfix\\/smtpd|postfix\\/smtp|postfix\\/qmgr|klnagent|systemd|(?i)crond(?-i)|clamd|kesl|sshd|run-parts|automount|suexec|freshclam|kernel|vsftpd|ftpd|su)\\b(?=\\[|:))", + "fields": [ +{ + "recordType": "syslog", + "regex": ".*(?(?<=PID\\s=\\s).*?(?=\\sLine)).*(?(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?.*?(?=\")).*(?(?<=\").*?(?=$))" +}, +{ + "recordType": "pure-ftpd", + "regex": ".*(?(?<=\\:\\s\\().*?(?=\\)\\s)).*?(?(?<=\\s\\[).*?(?=\\]\\s)).*?(?(?<=\\]\\s).*?(?=$))" +}, +{ + "recordType": "systemd", + "regex": [ + ".*(?(?<=\\ssystemd\\:\\s).*?(?=\\d+)).*?(?(?<=\\sSession\\s).*?(?=\\sof)).*?(?(?<=\\suser\\s).*?(?=\\.)).*$", + ".*(?(?<=\\ssystemd\\:\\s).*?(?=\\sof)).*?(?(?<=\\sof\\s).*?(?=\\.)).*$" + ] +}, +{ + "recordType": "kesl", + "regex": ".*(?(?<=\\:).*?(?=$))" +}, +{ + "recordType": "dovecot", + "regex": [ + ".*(?(?<=\\sdovecot:\\s).*?(?=\\:)).*?(?(?<=\\:).*?(?=\\:\\suser)).*?(?(?<=user\\=\\<).*?(?=\\>)).*?(?(?<=rip\\=).*?(?=,)).*?(?(?<=lip\\=).*?(?=,)).*?(?(?<=,\\s).*?(?=,)).*?(?(?<=session\\=\\<).*?(?=\\>)).*$", + ".*(?(?<=\\sdovecot:\\s).*?(?=\\:)).*?(?(?<=\\:).*?(?=\\:\\srip)).*?(?(?<=rip\\=).*?(?=,)).*?(?(?<=lip\\=).*?(?=,)).*?(?(?<=,\\s).*?(?=$))", + ".*(?(?<=\\sdovecot:\\s).*?(?=\\:)).*?(?(?<=\\:).*?(?=$))" + ] +}, +{ + "recordType": "postfix/smtpd", + "regex": [ + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\:).*?(?=$))", + 
".*(?(?<=\\[).*?(?=\\]:)).*?(?(?<=\\:\\s)disconnect(?=\\sfrom)).*?(?(?<=from).*(?=\\[)).*?(?(?<=\\[).*(?=\\])).*$" + ] +}, +{ + "recordType": "postfix/smtp", + "regex": [ + ".*(?(?<=smtp\\[).*?(?=\\]:)).*(?(?<=to=#\\<).*?(?=>,)).*(?(?<=relay=).*?(?=,)).*(?(?<=delay=).*?(?=,)).*(?(?<=delays=).*?(?=,)).*(?(?<=dsn=).*?(?=,)).*(?(?<=status=).*?(?=\\()).*?(?(?<=connect to).*?(?=\\[)).*?(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:).*?(?=:\\s)).*?(?(?<=:\\s).*?(?=$))", + ".*(?(?<=smtp\\[).*?(?=\\]:)).*?(?(?<=connect to).*?(?=\\[)).*?(?(?<=\\[).*?(?=\\])).*(?(?<=:).*?(?=\\s)).*(?(?<=\\s).*?(?=$))", + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\:).*?(?=$))" + ] +}, +{ + "recordType": "crond", + "regex": [ + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:\\s\\().*?(?=\\)\\s)).*?(?(?<=CMD\\s\\().*?(?=\\))).*$", + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:\\s\\().*?(?=\\)\\s)).*?(?(?<=\\().*?(?=\\))).*$", + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:\\s\\().*?(?=\\)\\s)).*?(?(?<=CMD\\s\\().*?(?=\\))).*$", + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\:).*?(?=$))" + ] +}, +{ + "recordType": "clamd", + "regex": [ + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\:\\s).*?(?=\\:)).*?(?(?<=\\:).*?(?=$))", + ".*(?(?<=\\:\\s).*?(?=\\:)).*?(?(?<=\\:).*?(?=$))" + ] +}, +{ + "recordType": "run-parts", + "regex": ".*(?(?<=\\sparts).*?(?=$))" +}, +{ + "recordType": "sshd", + "regex": [ + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:\\s).*?(?=\\sfor)).*?(?(?<=\\sfor\\s).*?(?=\\sfrom)).*?(?(?<=\\sfrom\\s).*?(?=\\sport)).*?(?(?<=\\sport\\s).*?(?=\\s)).*?(?(?<=port\\s\\d{1,5}\\s).*(?=:\\s)).
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237869538 --- Diff: metron-platform/metron-parsers/README.md --- @@ -52,6 +52,62 @@ There are two general types types of parsers: This is using the default value for `wrapEntityName` if that property is not set. * `wrapEntityName` : Sets the name to use when wrapping JSON using `wrapInEntityArray`. The `jsonpQuery` should reference this name. * A field called `timestamp` is expected to exist and, if it does not, then current time is inserted. + * Regular Expressions Parser + * `recordTypeRegex` : A regular expression to uniquely identify a record type. + * `messageHeaderRegex` : A regular expression used to extract fields from a message part which is common across all the messages. + * `convertCamelCaseToUnderScore` : If this property is set to true, this parser will automatically convert all the camel case property names to underscore seperated. + For example, following convertions will automatically happen: + + ``` + ipSrcAddr -> ip_src_addr + ipDstAddr -> ip_dst_addr + ipSrcPort -> ip_src_port + ``` + Note this property may be necessary, because java does not support underscores in the named group names. So in case your property naming conventions requires underscores in property names, use this property. + + * `fields` : A json list of maps contaning a record type to regular expression mapping. + + A complete configuration example would look like: + + ```json + "convertCamelCaseToUnderScore": true, + "recordTypeRegex": "kernel|syslog", + "messageHeaderRegex": "((<=^)\\d{1,4}(?=>)).*?((<=>)[A-Za-z] {3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?((<=\\s).*?(?=\\s))", --- End diff -- Thanks for the explanation. Can you add these details to the README? ---
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237869285 --- Diff: metron-platform/metron-parsers/README.md --- @@ -52,6 +52,62 @@ There are two general types types of parsers: This is using the default value for `wrapEntityName` if that property is not set. * `wrapEntityName` : Sets the name to use when wrapping JSON using `wrapInEntityArray`. The `jsonpQuery` should reference this name. * A field called `timestamp` is expected to exist and, if it does not, then current time is inserted. + * Regular Expressions Parser + * `recordTypeRegex` : A regular expression to uniquely identify a record type. + * `messageHeaderRegex` : A regular expression used to extract fields from a message part which is common across all the messages. + * `convertCamelCaseToUnderScore` : If this property is set to true, this parser will automatically convert all the camel case property names to underscore seperated. + For example, following convertions will automatically happen: + + ``` + ipSrcAddr -> ip_src_addr + ipDstAddr -> ip_dst_addr + ipSrcPort -> ip_src_port + ``` + Note this property may be necessary, because java does not support underscores in the named group names. So in case your property naming conventions requires underscores in property names, use this property. + + * `fields` : A json list of maps contaning a record type to regular expression mapping. 
+ + A complete configuration example would look like: + + ```json + "convertCamelCaseToUnderScore": true, + "recordTypeRegex": "kernel|syslog", + "messageHeaderRegex": "((<=^)\\d{1,4}(?=>)).*?((<=>)[A-Za-z] {3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?((<=\\s).*?(?=\\s))", + "fields": [ +{ + "recordType": "kernel", + "regex": ".*((<=\\]|\\w\\:).*?(?=$))" +}, +{ + "recordType": "syslog", + "regex": ".*((<=PID\\s=\\s).*?(?=\\sLine)).*((<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w)) (.*?(?=\")).*((<=\").*?(?=$))" +} + ] + ``` + **Note**: messageHeaderRegex and regex (withing fields) could be specified as lists also e.g. + ```json + "messageHeaderRegex": [ + "regular expression 1", + "regular expression 2" + ] + ``` + Where **regular expression 1** are valid regular expressions and may have named + groups, which would be extracted into fields. This list will be evaluated in order until a + matching regular expression is found. + + **recordTypeRegex** can be a more advanced regular expression containing named goups. For example --- End diff -- Good description. Can you add this advice to the documentation? ---
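The `convertCamelCaseToUnderScore` behavior described in the README diff above can be illustrated with a short sketch. It is not the parser's actual implementation: `CamelCaseFields`, `toUnderscore`, and `extract` are hypothetical names, and the regular expression is a simplified stand-in for the ones in the configuration. Java named groups may only contain letters and digits, so a group is named `ipSrcAddr` in the pattern and renamed to `ip_src_addr` afterwards.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the RegularExpressionsParser implementation.
class CamelCaseFields {

  // Converts a camel case name such as ipSrcAddr to ip_src_addr.
  static String toUnderscore(String camel) {
    return camel.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase();
  }

  // Extracts the given named groups from the message, keyed by the underscore form of each name.
  // Group names are passed explicitly because older Java versions cannot list them from a Pattern.
  static Map<String, String> extract(Pattern pattern, String message, String... groupNames) {
    Map<String, String> fields = new LinkedHashMap<>();
    Matcher matcher = pattern.matcher(message);
    if (matcher.find()) {
      for (String name : groupNames) {
        fields.put(toUnderscore(name), matcher.group(name));
      }
    }
    return fields;
  }
}
```

For example, applying the pattern `from (?<ipSrcAddr>\S+) port (?<ipSrcPort>\d+)` to the sshd message used in the tests would yield the fields `ip_src_addr` and `ip_src_port`.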
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237868794 --- Diff: metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.metron.parsers.regex; + +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.junit.Before; +import org.junit.Test; + +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.junit.Assert.assertTrue; + +public class RegularExpressionsParserTest { + + private RegularExpressionsParser regularExpressionsParser; + private JSONObject parserConfig; + + @Before + public void setUp() throws Exception { +regularExpressionsParser = new RegularExpressionsParser(); + } + + @Test + public void testSSHDParse() throws Exception { +String message = +"<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2"; + +parserConfig = getJsonConfig( + Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString()); +regularExpressionsParser.configure(parserConfig); +JSONObject parsed = parse(message); +// Expected +Map expectedJson = new HashMap<>(); +expectedJson.put("device_name", "deviceName"); +expectedJson.put("dst_process_name", "sshd"); +expectedJson.put("dst_process_id", "11672"); +expectedJson.put("dst_user_id", "prod"); +expectedJson.put("ip_src_addr", "22.22.22.22"); +expectedJson.put("ip_src_port", "5"); +expectedJson.put("app_protocol", "ssh2"); +assertTrue(validate(expectedJson, parsed)); + + } + + @Test + public void testNoMessageHeaderRegex() throws Exception { +String message = +"<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2"; +parserConfig = getJsonConfig( + Paths.get("src/test/resources/config/RegularExpressionsNoMessageHeaderParserConfig.json") +.toString()); +regularExpressionsParser.configure(parserConfig); +JSONObject parsed = parse(message); +// Expected +Map expectedJson = new HashMap<>(); +expectedJson.put("dst_process_name", "sshd"); +expectedJson.put("dst_process_id", "11672"); 
+expectedJson.put("dst_user_id", "prod"); +expectedJson.put("ip_src_addr", "22.22.22.22"); +expectedJson.put("ip_src_port", "5"); +expectedJson.put("app_protocol", "ssh2"); +assertTrue(validate(expectedJson, parsed)); --- End diff -- > Junit best practices state that maximum one assertion per test case. I have never heard that, nor ever, ever followed that. :) I think every test in Metron has multiple assertions, which are necessary. I think best practice may be to test one "thing" at a time, but you may require multiple assertions when testing that one "thing". I think it is much simpler the way I suggested, but we could probably spend the time on other more important things. ---
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237866676 --- Diff: metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.metron.parsers.regex; + +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.junit.Before; +import org.junit.Test; + +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.junit.Assert.assertTrue; + +public class RegularExpressionsParserTest { + + private RegularExpressionsParser regularExpressionsParser; + private JSONObject parserConfig; + + @Before + public void setUp() throws Exception { +regularExpressionsParser = new RegularExpressionsParser(); + } + + @Test + public void testSSHDParse() throws Exception { +String message = +"<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2"; + +parserConfig = getJsonConfig( + Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString()); --- End diff -- Thanks @jagdeepsingh2 . I will try and debug a little further myself too. I want to make sure there are no incompatibilities between your parser and the newer changes introduced by the `ParserRunner`. Glad there isn't something obviously stupid that I am doing. :) ---
[GitHub] metron pull request #1269: METRON-1879 Allow Elasticsearch to Auto-Generate ...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1269#discussion_r237852978 --- Diff: metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/writer/ElasticsearchWriter.java --- @@ -56,90 +58,107 @@ */ private transient ElasticsearchClient client; + /** + * Responsible for writing documents. + * + * Uses a {@link TupleBasedDocument} to maintain the relationship between + * a {@link Tuple} and the document created from the contents of that tuple. If + * a document cannot be written, the associated tuple needs to be failed. + */ + private transient BulkDocumentWriter documentWriter; + /** * A simple data formatter used to build the appropriate Elasticsearch index name. */ private SimpleDateFormat dateFormat; - @Override public void init(Map stormConf, TopologyContext topologyContext, WriterConfiguration configurations) { - Map globalConfiguration = configurations.getGlobalConfig(); -client = ElasticsearchClientFactory.create(globalConfiguration); dateFormat = ElasticsearchUtils.getIndexFormat(globalConfiguration); + +// only create the document writer, if one does not already exist. useful for testing. 
+if(documentWriter == null) { + client = ElasticsearchClientFactory.create(globalConfiguration); + documentWriter = new ElasticsearchBulkDocumentWriter<>(client); +} } @Override - public BulkWriterResponse write(String sensorType, WriterConfiguration configurations, Iterable tuples, List messages) throws Exception { + public BulkWriterResponse write(String sensorType, + WriterConfiguration configurations, + Iterable tuplesIter, + List messages) { // fetch the field name converter for this sensor type FieldNameConverter fieldNameConverter = FieldNameConverters.create(sensorType, configurations); +String indexPostfix = dateFormat.format(new Date()); +String indexName = ElasticsearchUtils.getIndexName(sensorType, indexPostfix, configurations); + +// the number of tuples must match the number of messages +List tuples = Lists.newArrayList(tuplesIter); +int batchSize = tuples.size(); +if(messages.size() != batchSize) { + throw new IllegalStateException(format("Expect same number of tuples and messages; |tuples|=%d, |messages|=%d", + tuples.size(), messages.size())); +} -final String indexPostfix = dateFormat.format(new Date()); -BulkRequest bulkRequest = new BulkRequest(); -for(JSONObject message: messages) { +// create a document from each message +List documents = new ArrayList<>(); +for(int i=0; i
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237651771 --- Diff: metron-platform/metron-parsers/src/test/resources/config/RegularExpressionsInvalidParserConfig.json --- @@ -0,0 +1,208 @@ +{ + "convertCamelCaseToUnderScore": true, + "messageHeaderRegex": "(?(?<=^<)\\d{1,4}(?=>)).*?(?(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?(?<=\\s).*?(?=\\s))", + "recordTypeRegex": "(?(?<=\\s)\\b(tch-replicant|audispd|syslog|ntpd|sendmail|pure-ftpd|usermod|useradd|anacron|unix_chkpwd|sudo|dovecot|postfix\\/smtpd|postfix\\/smtp|postfix\\/qmgr|klnagent|systemd|(?i)crond(?-i)|clamd|kesl|sshd|run-parts|automount|suexec|freshclam|kernel|vsftpd|ftpd|su)\\b(?=\\[|:))", + "fields": [ +{ + "recordType": "syslog", + "regex": ".*(?(?<=PID\\s=\\s).*?(?=\\sLine)).*(?(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?.*?(?=\")).*(?(?<=\").*?(?=$))" +}, +{ + "recordType": "pure-ftpd", + "regex": ".*(?(?<=\\:\\s\\().*?(?=\\)\\s)).*?(?(?<=\\s\\[).*?(?=\\]\\s)).*?(?(?<=\\]\\s).*?(?=$))" +}, +{ + "recordType": "systemd", + "regex": [ + ".*(?(?<=\\ssystemd\\:\\s).*?(?=\\d+)).*?(?(?<=\\sSession\\s).*?(?=\\sof)).*?(?(?<=\\suser\\s).*?(?=\\.)).*$", + ".*(?(?<=\\ssystemd\\:\\s).*?(?=\\sof)).*?(?(?<=\\sof\\s).*?(?=\\.)).*$" + ] +}, +{ + "recordType": "kesl", + "regex": ".*(?(?<=\\:).*?(?=$))" +}, +{ + "recordType": "dovecot", + "regex": [ + ".*(?(?<=\\sdovecot:\\s).*?(?=\\:)).*?(?(?<=\\:).*?(?=\\:\\suser)).*?(?(?<=user\\=\\<).*?(?=\\>)).*?(?(?<=rip\\=).*?(?=,)).*?(?(?<=lip\\=).*?(?=,)).*?(?(?<=,\\s).*?(?=,)).*?(?(?<=session\\=\\<).*?(?=\\>)).*$", + ".*(?(?<=\\sdovecot:\\s).*?(?=\\:)).*?(?(?<=\\:).*?(?=\\:\\srip)).*?(?(?<=rip\\=).*?(?=,)).*?(?(?<=lip\\=).*?(?=,)).*?(?(?<=,\\s).*?(?=$))", + ".*(?(?<=\\sdovecot:\\s).*?(?=\\:)).*?(?(?<=\\:).*?(?=$))" + ] +}, +{ + "recordType": "postfix/smtpd", + "regex": [ + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\:).*?(?=$))", + 
".*(?(?<=\\[).*?(?=\\]:)).*?(?(?<=\\:\\s)disconnect(?=\\sfrom)).*?(?(?<=from).*(?=\\[)).*?(?(?<=\\[).*(?=\\])).*$" + ] +}, +{ + "recordType": "postfix/smtp", + "regex": [ + ".*(?(?<=smtp\\[).*?(?=\\]:)).*(?(?<=to=#\\<).*?(?=>,)).*(?(?<=relay=).*?(?=,)).*(?(?<=delay=).*?(?=,)).*(?(?<=delays=).*?(?=,)).*(?(?<=dsn=).*?(?=,)).*(?(?<=status=).*?(?=\\()).*?(?(?<=connect to).*?(?=\\[)).*?(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:).*?(?=:\\s)).*?(?(?<=:\\s).*?(?=$))", + ".*(?(?<=smtp\\[).*?(?=\\]:)).*?(?(?<=connect to).*?(?=\\[)).*?(?(?<=\\[).*?(?=\\])).*(?(?<=:).*?(?=\\s)).*(?(?<=\\s).*?(?=$))", + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\:).*?(?=$))" + ] +}, +{ + "recordType": "crond", + "regex": [ + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:\\s\\().*?(?=\\)\\s)).*?(?(?<=CMD\\s\\().*?(?=\\))).*$", + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:\\s\\().*?(?=\\)\\s)).*?(?(?<=\\().*?(?=\\))).*$", + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:\\s\\().*?(?=\\)\\s)).*?(?(?<=CMD\\s\\().*?(?=\\))).*$", + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\:).*?(?=$))" + ] +}, +{ + "recordType": "clamd", + "regex": [ + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\:\\s).*?(?=\\:)).*?(?(?<=\\:).*?(?=$))", + ".*(?(?<=\\:\\s).*?(?=\\:)).*?(?(?<=\\:).*?(?=$))" + ] +}, +{ + "recordType": "run-parts", + "regex": ".*(?(?<=\\sparts).*?(?=$))" +}, +{ + "recordType": "sshd", + "regex": [ + ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:\\s).*?(?=\\sfor)).*?(?(?<=\\sfor\\s).*?(?=\\sfrom)).*?(?(?<=\\sfrom\\s).*?(?=\\sport)).*?(?(?<=\\sport\\s).*?(?=\\s)).*?(?(?<=port\\s\\d{1,5}\\s).*(?=:\\s)).
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237650548 --- Diff: metron-platform/metron-parsers/README.md --- @@ -52,6 +52,62 @@ There are two general types types of parsers: This is using the default value for `wrapEntityName` if that property is not set. * `wrapEntityName` : Sets the name to use when wrapping JSON using `wrapInEntityArray`. The `jsonpQuery` should reference this name. * A field called `timestamp` is expected to exist and, if it does not, then current time is inserted. + * Regular Expressions Parser + * `recordTypeRegex` : A regular expression to uniquely identify a record type. + * `messageHeaderRegex` : A regular expression used to extract fields from a message part which is common across all the messages. + * `convertCamelCaseToUnderScore` : If this property is set to true, this parser will automatically convert all the camel case property names to underscore seperated. + For example, following convertions will automatically happen: + + ``` + ipSrcAddr -> ip_src_addr + ipDstAddr -> ip_dst_addr + ipSrcPort -> ip_src_port + ``` + Note this property may be necessary, because java does not support underscores in the named group names. So in case your property naming conventions requires underscores in property names, use this property. + + * `fields` : A json list of maps contaning a record type to regular expression mapping. 
+ + A complete configuration example would look like: + + ```json + "convertCamelCaseToUnderScore": true, + "recordTypeRegex": "kernel|syslog", + "messageHeaderRegex": "((<=^)\\d{1,4}(?=>)).*?((<=>)[A-Za-z] {3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?((<=\\s).*?(?=\\s))", + "fields": [ +{ + "recordType": "kernel", + "regex": ".*((<=\\]|\\w\\:).*?(?=$))" +}, +{ + "recordType": "syslog", + "regex": ".*((<=PID\\s=\\s).*?(?=\\sLine)).*((<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w)) (.*?(?=\")).*((<=\").*?(?=$))" +} + ] + ``` + **Note**: messageHeaderRegex and regex (withing fields) could be specified as lists also e.g. --- End diff -- Can you show me what the examples above would look like as lists? Why would I choose to use a list versus not use a list? ---
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237649967 --- Diff: metron-platform/metron-parsers/README.md --- @@ -52,6 +52,62 @@ There are two general types types of parsers: This is using the default value for `wrapEntityName` if that property is not set. * `wrapEntityName` : Sets the name to use when wrapping JSON using `wrapInEntityArray`. The `jsonpQuery` should reference this name. * A field called `timestamp` is expected to exist and, if it does not, then current time is inserted. + * Regular Expressions Parser + * `recordTypeRegex` : A regular expression to uniquely identify a record type. + * `messageHeaderRegex` : A regular expression used to extract fields from a message part which is common across all the messages. + * `convertCamelCaseToUnderScore` : If this property is set to true, this parser will automatically convert all the camel case property names to underscore seperated. + For example, following convertions will automatically happen: + + ``` + ipSrcAddr -> ip_src_addr + ipDstAddr -> ip_dst_addr + ipSrcPort -> ip_src_port + ``` + Note this property may be necessary, because java does not support underscores in the named group names. So in case your property naming conventions requires underscores in property names, use this property. + + * `fields` : A json list of maps contaning a record type to regular expression mapping. 
+ + A complete configuration example would look like: + + ```json + "convertCamelCaseToUnderScore": true, + "recordTypeRegex": "kernel|syslog", + "messageHeaderRegex": "((<=^)\\d{1,4}(?=>)).*?((<=>)[A-Za-z] {3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?((<=\\s).*?(?=\\s))", + "fields": [ +{ + "recordType": "kernel", + "regex": ".*((<=\\]|\\w\\:).*?(?=$))" +}, +{ + "recordType": "syslog", + "regex": ".*((<=PID\\s=\\s).*?(?=\\sLine)).*((<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w)) (.*?(?=\")).*((<=\").*?(?=$))" --- End diff -- What is the expected output here? Should I expect that for any 'syslog' message, there will be 3 fields added; processid, filePath, and fileName? ---
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237649469 --- Diff: metron-platform/metron-parsers/README.md --- @@ -52,6 +52,62 @@ There are two general types types of parsers: This is using the default value for `wrapEntityName` if that property is not set. * `wrapEntityName` : Sets the name to use when wrapping JSON using `wrapInEntityArray`. The `jsonpQuery` should reference this name. * A field called `timestamp` is expected to exist and, if it does not, then current time is inserted. + * Regular Expressions Parser + * `recordTypeRegex` : A regular expression to uniquely identify a record type. + * `messageHeaderRegex` : A regular expression used to extract fields from a message part which is common across all the messages. + * `convertCamelCaseToUnderScore` : If this property is set to true, this parser will automatically convert all the camel case property names to underscore seperated. + For example, following convertions will automatically happen: + + ``` + ipSrcAddr -> ip_src_addr + ipDstAddr -> ip_dst_addr + ipSrcPort -> ip_src_port + ``` + Note this property may be necessary, because java does not support underscores in the named group names. So in case your property naming conventions requires underscores in property names, use this property. + + * `fields` : A json list of maps contaning a record type to regular expression mapping. + + A complete configuration example would look like: + + ```json + "convertCamelCaseToUnderScore": true, + "recordTypeRegex": "kernel|syslog", + "messageHeaderRegex": "((<=^)\\d{1,4}(?=>)).*?((<=>)[A-Za-z] {3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?((<=\\s).*?(?=\\s))", --- End diff -- What is the expected outcome with this `messageHeaderRegex` example? * I should expect this to be run on all record types (both kernel and syslog), right? * I should expect each output message to contain 3 fields; syslogPriority, timestamp, syslogHost? ---
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237648142 --- Diff: metron-platform/metron-parsers/README.md --- @@ -52,6 +52,62 @@ There are two general types types of parsers: This is using the default value for `wrapEntityName` if that property is not set. * `wrapEntityName` : Sets the name to use when wrapping JSON using `wrapInEntityArray`. The `jsonpQuery` should reference this name. * A field called `timestamp` is expected to exist and, if it does not, then current time is inserted. + * Regular Expressions Parser + * `recordTypeRegex` : A regular expression to uniquely identify a record type. + * `messageHeaderRegex` : A regular expression used to extract fields from a message part which is common across all the messages. + * `convertCamelCaseToUnderScore` : If this property is set to true, this parser will automatically convert all the camel case property names to underscore seperated. + For example, following convertions will automatically happen: + + ``` + ipSrcAddr -> ip_src_addr + ipDstAddr -> ip_dst_addr + ipSrcPort -> ip_src_port + ``` + Note this property may be necessary, because java does not support underscores in the named group names. So in case your property naming conventions requires underscores in property names, use this property. + + * `fields` : A json list of maps contaning a record type to regular expression mapping. 
+ + A complete configuration example would look like: + + ```json + "convertCamelCaseToUnderScore": true, + "recordTypeRegex": "kernel|syslog", + "messageHeaderRegex": "((<=^)\\d{1,4}(?=>)).*?((<=>)[A-Za-z] {3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?((<=\\s).*?(?=\\s))", + "fields": [ +{ + "recordType": "kernel", + "regex": ".*((<=\\]|\\w\\:).*?(?=$))" +}, +{ + "recordType": "syslog", + "regex": ".*((<=PID\\s=\\s).*?(?=\\sLine)).*((<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w)) (.*?(?=\")).*((<=\").*?(?=$))" +} + ] + ``` + **Note**: messageHeaderRegex and regex (withing fields) could be specified as lists also e.g. + ```json + "messageHeaderRegex": [ + "regular expression 1", + "regular expression 2" + ] + ``` + Where **regular expression 1** are valid regular expressions and may have named + groups, which would be extracted into fields. This list will be evaluated in order until a + matching regular expression is found. + + **recordTypeRegex** can be a more advanced regular expression containing named goups. For example --- End diff -- Why would I want to use named groups in the `recordTypeRegex`? I thought the purpose was to return a record type? If I want to add fields, wouldn't I just add a regex to the `fields` parameter? ---
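The README text under review says that a list of regular expressions is evaluated in order until one matches, and that camel-case group names get converted because Java named groups cannot contain underscores. A minimal sketch of that behavior follows. It assumes nothing about the internals of the actual `RegularExpressionsParser`; the class `OrderedRegexSketch` and its methods `toUnderscore` and `extract` are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the parser's real implementation.
final class OrderedRegexSketch {

  // Finds named-group declarations such as (?<ipSrcAddr>...) in a pattern string.
  // Lookbehinds like (?<=...) are skipped because a group name must start with a letter.
  private static final Pattern GROUP_NAME =
      Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

  // Converts a camel-case name to underscores, e.g. ipSrcAddr -> ip_src_addr.
  static String toUnderscore(String camelCase) {
    return camelCase.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase();
  }

  // Tries each regex in order and returns the named groups of the first one that matches.
  // Returns an empty map when no regex matches.
  static Map<String, String> extract(List<String> regexes, String message) {
    for (String regex : regexes) {
      Matcher matcher = Pattern.compile(regex).matcher(message);
      if (!matcher.find()) {
        continue;
      }
      Map<String, String> fields = new LinkedHashMap<>();
      Matcher names = GROUP_NAME.matcher(regex);
      while (names.find()) {
        String name = names.group(1);
        String value = matcher.group(name);
        if (value != null) {
          fields.put(toUnderscore(name), value);
        }
      }
      return fields;
    }
    return new LinkedHashMap<>();
  }
}
```

Because the first matching expression wins, a list like this would normally be ordered from the most specific pattern to the most general one.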
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237574064 --- Diff: metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.metron.parsers.regex; + +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.junit.Before; +import org.junit.Test; + +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.junit.Assert.assertTrue; + +public class RegularExpressionsParserTest { + + private RegularExpressionsParser regularExpressionsParser; + private JSONObject parserConfig; + + @Before + public void setUp() throws Exception { +regularExpressionsParser = new RegularExpressionsParser(); + } + + @Test + public void testSSHDParse() throws Exception { +String message = +"<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2"; + +parserConfig = getJsonConfig( + Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString()); --- End diff -- The configuration contained in `src/test/resources/config/RegularExpressionsParserConfig.json` is hard to grok because it covers so many record types. I would get rid of this JSON file completely. Actually all of the JSONs that you added in `src/test/resources/config`. Instead use the @Multiline annotation along with a more focused configuration that precedes each test case. You don't need 30 different record types defined to test SSHD parsing. Each test case would be preceded with a @Multiline annotated field containing the configuration for that test case. For example your SSHD test might look-like this. 
``` /** * { *"convertCamelCaseToUnderScore": true, *"messageHeaderRegex": "(?(?<=^<)\\d{1,4}(?=>)).*?(?(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?(?<=\\s).*?(?=\\s))", *"recordTypeRegex": "(?(?<=\\s)\\b(tch-replicant|audispd|syslog|ntpd|sendmail|pure-ftpd|usermod|useradd|anacron|unix_chkpwd|sudo|dovecot|postfix\\/smtpd|postfix\\/smtp|postfix\\/qmgr|klnagent|systemd|(?i)crond(?-i)|clamd|kesl|sshd|run-parts|automount|suexec|freshclam|kernel|vsftpd|ftpd|su)\\b(?=\\[|:))", *"fields": [ *{ *"recordType": "sshd", *"regex": [ * ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:\\s).*?(?=\\sfor)).*?(?(?<=\\sfor\\s).*?(?=\\sfrom)).*?(?(?<=\\sfrom\\s).*?(?=\\sport)).*?(?(?<=\\sport\\s).*?(?=\\s)).*?(?(?<=port\\s\\d{1,5}\\s).*(?=:\\s)).*?(?(?<=:\\s).+?(?=\\s)).*(?(?<=\\s).+?(?=$))", * ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:\\s).*?(?=\\sfor)).*?(?(?<=\\sfor\\s).*?(?=\\sfrom)).*?(?(?<=\\sfrom\\s).*?(?=\\sport)).*?(?(?<=\\sport\\s).*?(?=\\s)).*?(?(?<=port\\s\\d{1,5}\\s).*?(?=$))", * ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=Remote:).*?(?=\\-)).*?(?(?<=\\-).*?(?=;)).*?(?(?<=Protocol:).*?(?=;)).*?(?(?<=Client:).*?(?=$))", * ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=\\]:).*?(?=:)).*?(?(?<=Remote:).*?(?=\\-)).*?(?(?<=\\-).*?(?=;)).*?(?(?<=Enc:\\s).*?(?=$))", * ".*(?(?<=\\[).*?(?=\\])).*?(?(?<=Remote:).*?(?=\\-)).*?(?(?<=\\-).*?(?=;)).*?(?(?<=Enc:\\s).*?(?=$))", *
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237584581 --- Diff: metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.metron.parsers.regex; + +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.junit.Before; +import org.junit.Test; + +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.junit.Assert.assertTrue; + +public class RegularExpressionsParserTest { + + private RegularExpressionsParser regularExpressionsParser; + private JSONObject parserConfig; + + @Before + public void setUp() throws Exception { +regularExpressionsParser = new RegularExpressionsParser(); + } + + @Test + public void testSSHDParse() throws Exception { +String message = +"<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2"; + +parserConfig = getJsonConfig( + Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString()); +regularExpressionsParser.configure(parserConfig); +JSONObject parsed = parse(message); +// Expected +Map expectedJson = new HashMap<>(); +expectedJson.put("device_name", "deviceName"); +expectedJson.put("dst_process_name", "sshd"); +expectedJson.put("dst_process_id", "11672"); +expectedJson.put("dst_user_id", "prod"); +expectedJson.put("ip_src_addr", "22.22.22.22"); +expectedJson.put("ip_src_port", "5"); +expectedJson.put("app_protocol", "ssh2"); --- End diff -- Can you also ensure that "timestamp" and "original_string" are correctly added to each message? ---
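The check requested above could be factored into a small helper along these lines. This is only a sketch: `StandardFieldsCheck` is a hypothetical name, a plain `Map` stands in for the parsed `JSONObject`, and it assumes that `original_string` should equal the raw input message, which the thread does not spell out.

```java
import java.util.Map;

// Hypothetical helper; a plain Map stands in for the parsed JSONObject.
final class StandardFieldsCheck {

  // Fails if the standard fields are missing or, for original_string, unexpected.
  static void check(Map<String, Object> parsed, String rawMessage) {
    if (parsed.get("timestamp") == null) {
      throw new AssertionError("missing 'timestamp'");
    }
    // Assumption: original_string carries the unmodified input message.
    if (!rawMessage.equals(parsed.get("original_string"))) {
      throw new AssertionError(
          "'original_string' should be the raw message, was: " + parsed.get("original_string"));
    }
  }
}
```

Each test case could call the helper once after parsing, in addition to its record-specific assertions.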
[GitHub] metron pull request #1245: METRON-1795: Initial Commit for Regular Expressio...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1245#discussion_r237585161 --- Diff: metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.metron.parsers.regex; + +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.junit.Before; +import org.junit.Test; + +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.junit.Assert.assertTrue; + +public class RegularExpressionsParserTest { + + private RegularExpressionsParser regularExpressionsParser; + private JSONObject parserConfig; + + @Before + public void setUp() throws Exception { +regularExpressionsParser = new RegularExpressionsParser(); + } + + @Test + public void testSSHDParse() throws Exception { +String message = +"<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2"; + +parserConfig = getJsonConfig( + Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString()); +regularExpressionsParser.configure(parserConfig); +JSONObject parsed = parse(message); +// Expected +Map expectedJson = new HashMap<>(); +expectedJson.put("device_name", "deviceName"); +expectedJson.put("dst_process_name", "sshd"); +expectedJson.put("dst_process_id", "11672"); +expectedJson.put("dst_user_id", "prod"); +expectedJson.put("ip_src_addr", "22.22.22.22"); +expectedJson.put("ip_src_port", "5"); +expectedJson.put("app_protocol", "ssh2"); +assertTrue(validate(expectedJson, parsed)); + + } + + @Test + public void testNoMessageHeaderRegex() throws Exception { +String message = +"<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 5 ssh2"; +parserConfig = getJsonConfig( + Paths.get("src/test/resources/config/RegularExpressionsNoMessageHeaderParserConfig.json") +.toString()); +regularExpressionsParser.configure(parserConfig); +JSONObject parsed = parse(message); +// Expected +Map expectedJson = new HashMap<>(); +expectedJson.put("dst_process_name", "sshd"); +expectedJson.put("dst_process_id", "11672"); 
+expectedJson.put("dst_user_id", "prod"); +expectedJson.put("ip_src_addr", "22.22.22.22"); +expectedJson.put("ip_src_port", "5"); +expectedJson.put("app_protocol", "ssh2"); +assertTrue(validate(expectedJson, parsed)); --- End diff -- I don't get why we need this method 'validate' which seems rather complex. Can't we just let Junit do this? Instead of building your expected message and then calling validate, you would just do this... ``` assertEquals("5", parsed.get("ip_src_port")); ``` ---
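Expanded to every field in the test, the suggestion above might look like the sketch below. To keep it runnable without JUnit or json-simple on the classpath, it makes two labeled assumptions: the local `assertEquals` is a stand-in for `org.junit.Assert.assertEquals`, and `parsedSshdMessage` is a hypothetical stand-in that returns the values the parser is expected to produce.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration of per-field assertions replacing a custom validate() method.
final class DirectAssertionsSketch {

  // Stand-in for org.junit.Assert.assertEquals, so the sketch runs without JUnit.
  static void assertEquals(Object expected, Object actual) {
    if (!Objects.equals(expected, actual)) {
      throw new AssertionError("expected <" + expected + "> but was <" + actual + ">");
    }
  }

  // Stand-in for the parser output; a real test would call the parser instead.
  static Map<String, Object> parsedSshdMessage() {
    Map<String, Object> parsed = new HashMap<>();
    parsed.put("device_name", "deviceName");
    parsed.put("dst_process_name", "sshd");
    parsed.put("dst_process_id", "11672");
    parsed.put("dst_user_id", "prod");
    parsed.put("ip_src_addr", "22.22.22.22");
    parsed.put("ip_src_port", "5");
    parsed.put("app_protocol", "ssh2");
    return parsed;
  }

  // One test, one "thing" (SSHD parsing), several assertions.
  static void testSSHDParse() {
    Map<String, Object> parsed = parsedSshdMessage();
    assertEquals("deviceName", parsed.get("device_name"));
    assertEquals("sshd", parsed.get("dst_process_name"));
    assertEquals("11672", parsed.get("dst_process_id"));
    assertEquals("prod", parsed.get("dst_user_id"));
    assertEquals("22.22.22.22", parsed.get("ip_src_addr"));
    assertEquals("5", parsed.get("ip_src_port"));
    assertEquals("ssh2", parsed.get("app_protocol"));
  }
}
```

With one assertion per field, a failure names the exact field and both values, which a single `assertTrue(validate(...))` cannot do.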
[GitHub] metron issue #1259: METRON-1867 Remove `/api/v1/update/replace` endpoint
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1259 Closing. This is replaced by #1284. ---
[GitHub] metron pull request #1284: METRON-1867 Remove `/api/v1/update/replace` endpo...
GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1284 METRON-1867 Remove `/api/v1/update/replace` endpoint The `/api/v1/update/replace` endpoint is no longer used. This is dead code and should be removed. I have tried to isolate the changes from #1259 onto master to help reviewers. This replaces #1259. ## Testing 1. Spin up Full Dev. 1. Open the Alerts UI. * View alerts. * Escalate an alert. * Comment on an alert. * Create a meta-alert. * Escalate a meta-alert. 1. Run the UI e2e tests. ## Pull Request Checklist - [x] Is there a JIRA ticket associated with this PR? If not, one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [x] Has your PR been rebased against the latest commit within the target branch (typically master)? - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [x] Have you included steps or a guide to how the change may be verified and tested manually? - [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: - [x] Have you written or updated unit tests and/or integration tests to verify your changes? - [x] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [x] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/nickwallen/metron METRON-1867-v2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/1284.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1284 commit 89d1d5742005d7869728ae7a67c96acc26ce651b Author: Nick Allen Date: 2018-11-12T17:24:40Z METRON-1867 Remove `/api/v1/update/replace` endpoint commit 428a12f5ad883c0aa19e1c9040dee2070a97a6a4 Author: Nick Allen Date: 2018-11-29T15:51:12Z Trying to keep the changes as focused on remove the ReplaceRequest as possible ---
[GitHub] metron-bro-plugin-kafka issue #20: METRON-1910: bro plugin segfaults on src/...
Github user nickwallen commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/20 Can you provide some commentary on the root cause and your solution? ---
[GitHub] metron pull request #1254: METRON-1849 Elasticsearch Index Write Functionali...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1254#discussion_r236844166 --- Diff: metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/writer/ElasticsearchWriter.java --- @@ -56,90 +58,107 @@ */ private transient ElasticsearchClient client; + /** + * Responsible for writing documents. + * + * Uses a {@link TupleBasedDocument} to maintain the relationship between + * a {@link Tuple} and the document created from the contents of that tuple. If + * a document cannot be written, the associated tuple needs to be failed. + */ + private transient BulkDocumentWriter documentWriter; + /** * A simple data formatter used to build the appropriate Elasticsearch index name. */ private SimpleDateFormat dateFormat; - @Override public void init(Map stormConf, TopologyContext topologyContext, WriterConfiguration configurations) { - Map globalConfiguration = configurations.getGlobalConfig(); -client = ElasticsearchClientFactory.create(globalConfiguration); dateFormat = ElasticsearchUtils.getIndexFormat(globalConfiguration); + +// only create the document writer, if one does not already exist. useful for testing. 
+if(documentWriter == null) { + client = ElasticsearchClientFactory.create(globalConfiguration); + documentWriter = new ElasticsearchBulkDocumentWriter<>(client); +} } @Override - public BulkWriterResponse write(String sensorType, WriterConfiguration configurations, Iterable tuples, List messages) throws Exception { + public BulkWriterResponse write(String sensorType, + WriterConfiguration configurations, + Iterable tuplesIter, + List messages) { // fetch the field name converter for this sensor type FieldNameConverter fieldNameConverter = FieldNameConverters.create(sensorType, configurations); +String indexPostfix = dateFormat.format(new Date()); +String indexName = ElasticsearchUtils.getIndexName(sensorType, indexPostfix, configurations); + +// the number of tuples must match the number of messages +List tuples = Lists.newArrayList(tuplesIter); +int batchSize = tuples.size(); +if(messages.size() != batchSize) { + throw new IllegalStateException(format("Expect same number of tuples and messages; |tuples|=%d, |messages|=%d", + tuples.size(), messages.size())); +} -final String indexPostfix = dateFormat.format(new Date()); -BulkRequest bulkRequest = new BulkRequest(); -for(JSONObject message: messages) { +// create a document from each message +List documents = new ArrayList<>(); +for(int i=0; i { + List successfulTuples = docs.stream().map(doc -> doc.getTuple()).collect(Collectors.toList()); --- End diff -- Done too. Thanks. ---
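For readers following the diff above, the size check and the tuple-to-document pairing can be sketched in isolation. This is a minimal illustration only: `PairedDocument` and `BatchPairing` are hypothetical names standing in for the real classes in the pull request, and plain generics replace the Storm and JSON types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a document that remembers the tuple it was built from. */
final class PairedDocument<T, M> {
    final T tuple;
    final M message;

    PairedDocument(T tuple, M message) {
        this.tuple = tuple;
        this.message = message;
    }
}

final class BatchPairing {

    /**
     * Pairs the i-th tuple with the i-th message. Fails fast when the two
     * collections differ in size, since the pairing would be meaningless.
     */
    static <T, M> List<PairedDocument<T, M>> pair(Iterable<T> tuplesIter, List<M> messages) {
        // copy the iterable so that it can be sized and indexed
        List<T> tuples = new ArrayList<>();
        for (T tuple : tuplesIter) {
            tuples.add(tuple);
        }

        // the number of tuples must match the number of messages
        if (tuples.size() != messages.size()) {
            throw new IllegalStateException(String.format(
                "Expect same number of tuples and messages; |tuples|=%d, |messages|=%d",
                tuples.size(), messages.size()));
        }

        // create one paired document per message
        List<PairedDocument<T, M>> documents = new ArrayList<>(tuples.size());
        for (int i = 0; i < tuples.size(); i++) {
            documents.add(new PairedDocument<>(tuples.get(i), messages.get(i)));
        }
        return documents;
    }
}
```

Keeping the tuple next to its document is what lets a failed write be traced back to the tuple that has to be failed.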
[GitHub] metron pull request #1254: METRON-1849 Elasticsearch Index Write Functionali...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1254#discussion_r236843942 --- Diff: metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/bulk/ElasticsearchBulkDocumentWriter.java --- @@ -0,0 +1,184 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.elasticsearch.bulk; + +import org.apache.commons.lang3.exception.ExceptionUtils; +import org.apache.metron.elasticsearch.client.ElasticsearchClient; +import org.apache.metron.indexing.dao.update.Document; +import org.elasticsearch.action.DocWriteRequest; +import org.elasticsearch.action.bulk.BulkItemResponse; +import org.elasticsearch.action.bulk.BulkRequest; +import org.elasticsearch.action.bulk.BulkResponse; +import org.elasticsearch.action.index.IndexRequest; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; +import java.util.Optional; +import java.util.stream.Collectors; + +/** + * Writes documents to an Elasticsearch index in bulk. + * + * @param The type of document to write. 
+ */ +public class ElasticsearchBulkDocumentWriter implements BulkDocumentWriter { + +/** + * A {@link Document} along with the index it will be written to. + */ +private class Indexable { +D document; +String index; + +public Indexable(D document, String index) { +this.document = document; +this.index = index; +} +} + +private static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); +private Optional onSuccess; +private Optional onFailure; +private ElasticsearchClient client; +private List documents; + +public ElasticsearchBulkDocumentWriter(ElasticsearchClient client) { +this.client = client; +this.onSuccess = Optional.empty(); +this.onFailure = Optional.empty(); +this.documents = new ArrayList<>(); +} + +@Override +public void onSuccess(SuccessListener onSuccess) { +this.onSuccess = Optional.of(onSuccess); +} + +@Override +public void onFailure(FailureListener onFailure) { +this.onFailure = Optional.of(onFailure); +} + +@Override +public void addDocument(D document, String index) { +documents.add(new Indexable(document, index)); +LOG.debug("Adding document to batch; document={}, index={}", document, index); +} + +@Override +public void write() { +try { +// create an index request for each document +List requests = documents --- End diff -- It turned out to be much cleaner to get rid of the streams. So double-win. Thanks. ---
[GitHub] metron issue #1276: METRON-1888 Default Topology Settings in MPack Cause Pro...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1276 Thanks @justinleet . I added https://issues.apache.org/jira/browse/METRON-1897 for this. ---
[GitHub] metron pull request #1247: METRON-1845 Correct Test Data Load in Elasticsear...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1247#discussion_r236771158 --- Diff: metron-platform/metron-elasticsearch/src/test/java/org/apache/metron/elasticsearch/integration/components/ElasticSearchComponent.java --- @@ -194,35 +215,41 @@ public Client getClient() { return client; } - public BulkResponse add(String indexName, String sensorType, String... docs) throws IOException { + public void add(UpdateDao updateDao, String indexName, String sensorType, String... docs) + throws IOException, ParseException { List d = new ArrayList<>(); Collections.addAll(d, docs); -return add(indexName, sensorType, d); +add(updateDao, indexName, sensorType, d); } - public BulkResponse add(String indexName, String sensorType, Iterable docs) - throws IOException { -BulkRequestBuilder bulkRequest = getClient().prepareBulk(); -for (String doc : docs) { - IndexRequestBuilder indexRequestBuilder = getClient() - .prepareIndex(indexName, sensorType + "_doc"); - - indexRequestBuilder = indexRequestBuilder.setSource(doc); - Map esDoc = JSONUtils.INSTANCE - .load(doc, JSONUtils.MAP_SUPPLIER); - indexRequestBuilder.setId((String) esDoc.get(Constants.GUID)); - Object ts = esDoc.get("timestamp"); - if (ts != null) { -indexRequestBuilder = indexRequestBuilder.setTimestamp(ts.toString()); - } - bulkRequest.add(indexRequestBuilder); -} + public void add(UpdateDao updateDao, String indexName, String sensorType, Iterable docs) --- End diff -- > To that end, if we're looking to route all of this through the ES component in that fashion, it might make sense to simply replace the internal private Client client; and instead use the new desired IndexDao for the proxied calls to ES. I don't think we're ready to do all that quite yet. There is still some legacy functionality in `ElasticSearchComponent` that uses the underlying client for the old Admin API. 
See [close](https://github.com/apache/metron/blob/fcd644ca77394d48d460c460b672a23d6594f49b/metron-platform/metron-elasticsearch/src/test/java/org/apache/metron/elasticsearch/integration/components/ElasticSearchComponent.java#L292), [createIndexWithMapping](https://github.com/apache/metron/blob/fcd644ca77394d48d460c460b672a23d6594f49b/metron-platform/metron-elasticsearch/src/test/java/org/apache/metron/elasticsearch/integration/components/ElasticSearchComponent.java#L228), and [start](https://github.com/apache/metron/blob/fcd644ca77394d48d460c460b672a23d6594f49b/metron-platform/metron-elasticsearch/src/test/java/org/apache/metron/elasticsearch/integration/components/ElasticSearchComponent.java#L153). ---
[GitHub] metron pull request #1247: METRON-1845 Correct Test Data Load in Elasticsear...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1247#discussion_r236767632 --- Diff: metron-platform/metron-elasticsearch/src/test/java/org/apache/metron/elasticsearch/integration/ElasticsearchSearchIntegrationTest.java --- @@ -97,48 +118,81 @@ protected static InMemoryComponent startIndex() throws Exception { return es; } - protected static void loadTestData() throws ParseException, IOException { + protected static void loadTestData() throws Exception { ElasticSearchComponent es = (ElasticSearchComponent) indexComponent; +// define the bro index template +String broIndex = "bro_index_2017.01.01.01"; JSONObject broTemplate = JSONUtils.INSTANCE.load(new File(broTemplatePath), JSONObject.class); addTestFieldMappings(broTemplate, "bro_doc"); - es.getClient().admin().indices().prepareCreate("bro_index_2017.01.01.01") -.addMapping("bro_doc", JSONUtils.INSTANCE.toJSON(broTemplate.get("mappings"), false)).get(); +es.getClient().admin().indices().prepareCreate(broIndex) +.addMapping("bro_doc", JSONUtils.INSTANCE.toJSON(broTemplate.get("mappings"), false)).get(); + +// define the snort index template +String snortIndex = "snort_index_2017.01.01.02"; JSONObject snortTemplate = JSONUtils.INSTANCE.load(new File(snortTemplatePath), JSONObject.class); addTestFieldMappings(snortTemplate, "snort_doc"); - es.getClient().admin().indices().prepareCreate("snort_index_2017.01.01.02") -.addMapping("snort_doc", JSONUtils.INSTANCE.toJSON(snortTemplate.get("mappings"), false)).get(); - -BulkRequestBuilder bulkRequest = es.getClient().prepareBulk() -.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL); -JSONArray broArray = (JSONArray) new JSONParser().parse(broData); -for (Object o : broArray) { - JSONObject jsonObject = (JSONObject) o; - IndexRequestBuilder indexRequestBuilder = es.getClient() - .prepareIndex("bro_index_2017.01.01.01", "bro_doc"); - indexRequestBuilder = indexRequestBuilder.setId((String) jsonObject.get("guid")); - 
indexRequestBuilder = indexRequestBuilder.setSource(jsonObject.toJSONString()); - indexRequestBuilder = indexRequestBuilder - .setTimestamp(jsonObject.get("timestamp").toString()); - bulkRequest.add(indexRequestBuilder); +es.getClient().admin().indices().prepareCreate(snortIndex) +.addMapping("snort_doc", JSONUtils.INSTANCE.toJSON(snortTemplate.get("mappings"), false)).get(); + +// setup the classes required to write the test data +AccessConfig accessConfig = createAccessConfig(); +ElasticsearchClient client = ElasticsearchUtils.getClient(createGlobalConfig()); +ElasticsearchRetrieveLatestDao retrieveLatestDao = new ElasticsearchRetrieveLatestDao(client); +ElasticsearchColumnMetadataDao columnMetadataDao = new ElasticsearchColumnMetadataDao(client); +ElasticsearchRequestSubmitter requestSubmitter = new ElasticsearchRequestSubmitter(client); +ElasticsearchUpdateDao updateDao = new ElasticsearchUpdateDao(client, accessConfig, retrieveLatestDao); +ElasticsearchSearchDao searchDao = new ElasticsearchSearchDao(client, accessConfig, columnMetadataDao, requestSubmitter); --- End diff -- Yes! That worked. Much cleaner. Thanks ---
[GitHub] metron pull request #1247: METRON-1845 Correct Test Data Load in Elasticsear...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1247#discussion_r236765451 --- Diff: metron-platform/metron-solr/src/test/java/org/apache/metron/solr/integration/SolrUpdateIntegrationTest.java --- @@ -186,4 +195,114 @@ public void testHugeErrorFields() throws Exception { exception.expectMessage("Document contains at least one immense term in field=\"error_hash\""); getDao().update(errorDoc, Optional.of("error")); } + + @Test + @Override + public void test() throws Exception { +List> inputData = new ArrayList<>(); +for(int i = 0; i < 10;++i) { + final String name = "message" + i; + inputData.add( + new HashMap() {{ +put("source.type", SENSOR_NAME); +put("name" , name); +put("timestamp", System.currentTimeMillis()); +put(Constants.GUID, name); + }} + ); +} +addTestData(getIndexName(), SENSOR_NAME, inputData); +List> docs = null; +for(int t = 0;t < MAX_RETRIES;++t, Thread.sleep(SLEEP_MS)) { + docs = getIndexedTestData(getIndexName(), SENSOR_NAME); + if(docs.size() >= 10) { +break; + } +} +Assert.assertEquals(10, docs.size()); +//modify the first message and add a new field +{ + Map message0 = new HashMap(inputData.get(0)) {{ +put("new-field", "metron"); + }}; + String guid = "" + message0.get(Constants.GUID); + Document update = getDao().replace(new ReplaceRequest(){{ +setReplacement(message0); +setGuid(guid); +setSensorType(SENSOR_NAME); +setIndex(getIndexName()); + }}, Optional.empty()); + + Assert.assertEquals(message0, update.getDocument()); + Assert.assertEquals(1, getMockHTable().size()); + findUpdatedDoc(message0, guid, SENSOR_NAME); + { +//ensure hbase is up to date +Get g = new Get(HBaseDao.Key.toBytes(new HBaseDao.Key(guid, SENSOR_NAME))); +Result r = getMockHTable().get(g); +NavigableMap columns = r.getFamilyMap(CF.getBytes()); +Assert.assertEquals(1, columns.size()); +Assert.assertEquals(message0 +, JSONUtils.INSTANCE.load(new String(columns.lastEntry().getValue()) +, JSONUtils.MAP_SUPPLIER) +); + } + { 
+//ensure ES is up-to-date --- End diff -- No longer a problem. ---
[GitHub] metron pull request #1247: METRON-1845 Correct Test Data Load in Elasticsear...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1247#discussion_r236765044 --- Diff: metron-platform/metron-solr/src/test/java/org/apache/metron/solr/integration/SolrUpdateIntegrationTest.java --- @@ -186,4 +195,114 @@ public void testHugeErrorFields() throws Exception { exception.expectMessage("Document contains at least one immense term in field=\"error_hash\""); getDao().update(errorDoc, Optional.of("error")); } + + @Test + @Override + public void test() throws Exception { --- End diff -- Ok, I found the problem. With my changes, the test data was getting loaded into both the ElasticsearchDao and the HBaseDao because I was using a MultiIndexDao to load the test data. In master, only Elasticsearch gets loaded with the test data. This was causing some of the test assumptions to fail. I corrected how the data is loaded so it only loads Elasticsearch and matches the existing behavior. Thanks for pointing this out. ---
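The effect described above, where loading through a composite DAO populates every backing store, can be reproduced with a small sketch. All types here (`SketchDao`, `InMemorySketchDao`, `FanOutSketchDao`) are hypothetical and only mimic the fan-out behavior; they are not Metron's `MultiIndexDao`, `ElasticsearchDao`, or `HBaseDao`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical write-only DAO; not Metron's actual interface. */
interface SketchDao {
    void write(String doc);
}

/** Keeps written documents in memory so that a test can inspect them. */
final class InMemorySketchDao implements SketchDao {
    final List<String> docs = new ArrayList<>();

    @Override
    public void write(String doc) {
        docs.add(doc);
    }
}

/** Forwards every write to all delegates, as a composite DAO would. */
final class FanOutSketchDao implements SketchDao {
    private final List<SketchDao> delegates;

    FanOutSketchDao(SketchDao... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public void write(String doc) {
        for (SketchDao dao : delegates) {
            dao.write(doc);
        }
    }
}

final class TestDataLoadSketch {
    public static void main(String[] args) {
        InMemorySketchDao searchStore = new InMemorySketchDao();
        InMemorySketchDao keyValueStore = new InMemorySketchDao();

        // loading through the composite touches both stores
        new FanOutSketchDao(searchStore, keyValueStore).write("message0");

        // loading through a single DAO touches only that store
        searchStore.write("message1");

        System.out.println("search store:    " + searchStore.docs);
        System.out.println("key-value store: " + keyValueStore.docs);
    }
}
```

A test that assumes the second store starts empty will see different counts depending on which of the two loading paths was used.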
[GitHub] metron pull request #1254: METRON-1849 Elasticsearch Index Write Functionali...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1254#discussion_r236738173 --- Diff: metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/bulk/ElasticsearchBulkDocumentWriter.java --- @@ -0,0 +1,184 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.elasticsearch.bulk; + +import org.apache.commons.lang3.exception.ExceptionUtils; +import org.apache.metron.elasticsearch.client.ElasticsearchClient; +import org.apache.metron.indexing.dao.update.Document; +import org.elasticsearch.action.DocWriteRequest; +import org.elasticsearch.action.bulk.BulkItemResponse; +import org.elasticsearch.action.bulk.BulkRequest; +import org.elasticsearch.action.bulk.BulkResponse; +import org.elasticsearch.action.index.IndexRequest; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; +import java.util.Optional; +import java.util.stream.Collectors; + +/** + * Writes documents to an Elasticsearch index in bulk. + * + * @param The type of document to write. 
+ */ +public class ElasticsearchBulkDocumentWriter implements BulkDocumentWriter { + +/** + * A {@link Document} along with the index it will be written to. + */ +private class Indexable { +D document; +String index; + +public Indexable(D document, String index) { +this.document = document; +this.index = index; +} +} + +private static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); +private Optional onSuccess; +private Optional onFailure; +private ElasticsearchClient client; +private List documents; + +public ElasticsearchBulkDocumentWriter(ElasticsearchClient client) { +this.client = client; +this.onSuccess = Optional.empty(); +this.onFailure = Optional.empty(); +this.documents = new ArrayList<>(); +} + +@Override +public void onSuccess(SuccessListener onSuccess) { +this.onSuccess = Optional.of(onSuccess); +} + +@Override +public void onFailure(FailureListener onFailure) { +this.onFailure = Optional.of(onFailure); +} + +@Override +public void addDocument(D document, String index) { +documents.add(new Indexable(document, index)); +LOG.debug("Adding document to batch; document={}, index={}", document, index); +} + +@Override +public void write() { +try { +// create an index request for each document +List requests = documents --- End diff -- I usually think of us as IO bound, rather than compute, in Indexing. But I don't mind refactoring out the use of streams, just to avoid any potential problems. ---
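As a rough illustration of the refactoring discussed here, the sketch below builds the same list once with a stream and once with a plain loop. `IndexableSketch` and `toRequest` are hypothetical simplifications; the real code builds Elasticsearch index requests, which are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a document plus the index it will be written to. */
final class IndexableSketch {
    final String document;
    final String index;

    IndexableSketch(String document, String index) {
        this.document = document;
        this.index = index;
    }
}

final class RequestBuilding {

    /** Placeholder for creating an index request; returns a string for illustration. */
    static String toRequest(IndexableSketch indexable) {
        return indexable.index + "/" + indexable.document;
    }

    /** Stream-based version. */
    static List<String> withStream(List<IndexableSketch> documents) {
        return documents.stream()
            .map(RequestBuilding::toRequest)
            .collect(Collectors.toList());
    }

    /** Loop-based version; same result without the stream pipeline. */
    static List<String> withLoop(List<IndexableSketch> documents) {
        List<String> requests = new ArrayList<>(documents.size());
        for (IndexableSketch indexable : documents) {
            requests.add(toRequest(indexable));
        }
        return requests;
    }
}
```

Both methods return the same requests in the same order, so the choice between them is about readability and per-batch overhead rather than behavior.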
[GitHub] metron pull request #1280: METRON-1869 Unable to Sort an Escalated Meta Aler...
GitHub user nickwallen reopened a pull request: https://github.com/apache/metron/pull/1280

METRON-1869 Unable to Sort an Escalated Meta Alert

This fixes a bug that causes meta-alerts to not be visible in the Alerts UI when the UI is sorted by 'alert_status' and a meta-alert has been escalated. This is only a problem when indexing into Elasticsearch.

The root cause is that the 'alert_status' field needs to be defined as a keyword in the metaalert index template. This field only exists in the meta-alert index after a meta-alert has changed status, like when an alert is escalated.

## Changes

* Fixed the Elasticsearch meta-alerts template that is deployed with the MPack.
* Added an integration test for this bug.

## Testing

Follow the "steps to reproduce" outlined in the JIRA to ensure that the bug has been squashed.

1. Create a meta-alert.
2. Escalate the meta-alert.
3. Submit another search that filters the results to only show meta-alerts.
4. Sort the results by the 'alert_status' field.
5. The meta-alert should still be visible.

## Pull Request Checklist

- [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
- [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
- [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
- [x] Have you written or updated unit tests and or integration tests to verify your changes?
- [x] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1869

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1280.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1280

commit d2f56b4f41bead7b3490cb81bdce553359ce09fc
Author: Nick Allen
Date: 2018-11-14T13:34:17Z

    Attempting to create test that highlights the problem

commit 70d1dc4bea3992f8e94eb855c36c99fde7f6cad9
Author: Nick Allen
Date: 2018-11-26T19:41:21Z

    METRON-1869 Fix missing alert_status field in meta-alert template

commit 09558fb5fa5eb24cebfc974170fd14abf047e2e5
Author: Nick Allen
Date: 2018-11-26T20:51:43Z

    Completed integration test that highlights the problem

commit afb87ba0b4efbc74913698e123c5661a42ac7f45
Author: Nick Allen
Date: 2018-11-26T20:58:47Z

    Testing sorted search results under 3 scenarios

commit 60c3af90061ce82a89696c96b9f07b3e4fb35324
Author: Nick Allen
Date: 2018-11-26T20:59:49Z

    Reset ordering

---
[GitHub] metron pull request #1280: METRON-1869 Unable to Sort an Escalated Meta Aler...
Github user nickwallen closed the pull request at: https://github.com/apache/metron/pull/1280 ---
[GitHub] metron issue #1259: METRON-1867 Remove `/api/v1/update/replace` endpoint
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1259

> Why does this depend on 3 other PRs? Shouldn't this come first, or at least be independent of the others?

It was just easier to do it this way based on when I ran into the issue that necessitated this. If I go through the effort to rebase this on master and retest, what problem does that solve? Sorry, I just fail to see the problem, but maybe I am missing something here.

---
[GitHub] metron pull request #1247: METRON-1845 Correct Test Data Load in Elasticsear...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1247#discussion_r236652983 --- Diff: metron-platform/metron-solr/src/test/java/org/apache/metron/solr/integration/SolrUpdateIntegrationTest.java --- @@ -186,4 +195,114 @@ public void testHugeErrorFields() throws Exception { exception.expectMessage("Document contains at least one immense term in field=\"error_hash\""); getDao().update(errorDoc, Optional.of("error")); } + + @Test + @Override + public void test() throws Exception { --- End diff -- This test was previously shared in UpdateIntegrationTest between ES and Solr. With these changes, the tests don't behave exactly the same anymore. That being said, what I did here doesn't look right. I'll dig into this and see what is going on. ---
[GitHub] metron pull request #1247: METRON-1845 Correct Test Data Load in Elasticsear...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1247#discussion_r236644723 --- Diff: metron-platform/metron-elasticsearch/src/test/java/org/apache/metron/elasticsearch/integration/components/ElasticSearchComponent.java --- @@ -194,35 +215,41 @@ public Client getClient() { return client; } - public BulkResponse add(String indexName, String sensorType, String... docs) throws IOException { + public void add(UpdateDao updateDao, String indexName, String sensorType, String... docs) + throws IOException, ParseException { List d = new ArrayList<>(); Collections.addAll(d, docs); -return add(indexName, sensorType, d); +add(updateDao, indexName, sensorType, d); } - public BulkResponse add(String indexName, String sensorType, Iterable docs) - throws IOException { -BulkRequestBuilder bulkRequest = getClient().prepareBulk(); -for (String doc : docs) { - IndexRequestBuilder indexRequestBuilder = getClient() - .prepareIndex(indexName, sensorType + "_doc"); - - indexRequestBuilder = indexRequestBuilder.setSource(doc); - Map esDoc = JSONUtils.INSTANCE - .load(doc, JSONUtils.MAP_SUPPLIER); - indexRequestBuilder.setId((String) esDoc.get(Constants.GUID)); - Object ts = esDoc.get("timestamp"); - if (ts != null) { -indexRequestBuilder = indexRequestBuilder.setTimestamp(ts.toString()); - } - bulkRequest.add(indexRequestBuilder); -} + public void add(UpdateDao updateDao, String indexName, String sensorType, Iterable docs) --- End diff -- > Might it be better to just use IndexDao, which extends UpdateDao, SearchDao, RetrieveLatestDao, ColumnMetadataDao? Why would that be better? It needs to do updates, so it needs an `UpdateDao`. Seemed logical to me. ---
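The argument for accepting only the narrow interface can be shown with a small sketch. The interfaces below (`UpdaterSketch`, `SearcherSketch`, `CombinedSketch`) are hypothetical and much simpler than Metron's `UpdateDao` and `IndexDao`; they only illustrate that a helper which writes can ask for the write-only type and still accept a combined implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical write-side interface. */
interface UpdaterSketch {
    void update(String guid, String doc);
}

/** Hypothetical read-side interface. */
interface SearcherSketch {
    List<String> search(String term);
}

/** Combined interface, in the spirit of a DAO that extends several narrower ones. */
interface CombinedSketch extends UpdaterSketch, SearcherSketch {
}

final class InMemoryCombined implements CombinedSketch {
    private final List<String> docs = new ArrayList<>();

    @Override
    public void update(String guid, String doc) {
        docs.add(guid + ":" + doc);
    }

    @Override
    public List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}

final class TestLoaderSketch {

    /** Only writes, so it asks for the narrow interface rather than the combined one. */
    static void add(UpdaterSketch updater, Iterable<String> docs) {
        int i = 0;
        for (String doc : docs) {
            updater.update("guid-" + i++, doc);
        }
    }
}
```

A caller holding the combined type can still pass it to `add`, while a caller that only has an updater is not forced to supply search behavior it never uses.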
[GitHub] metron pull request #1247: METRON-1845 Correct Test Data Load in Elasticsear...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1247#discussion_r236643962 --- Diff: metron-platform/metron-elasticsearch/src/test/java/org/apache/metron/elasticsearch/integration/ElasticsearchSearchIntegrationTest.java --- @@ -97,45 +118,63 @@ protected static InMemoryComponent startIndex() throws Exception { return es; } - protected static void loadTestData() throws ParseException, IOException { + protected static void loadTestData() throws Exception { ElasticSearchComponent es = (ElasticSearchComponent) indexComponent; +// define the bro index template +String broIndex = "bro_index_2017.01.01.01"; JSONObject broTemplate = JSONUtils.INSTANCE.load(new File(broTemplatePath), JSONObject.class); addTestFieldMappings(broTemplate, "bro_doc"); - es.getClient().admin().indices().prepareCreate("bro_index_2017.01.01.01") -.addMapping("bro_doc", JSONUtils.INSTANCE.toJSON(broTemplate.get("mappings"), false)).get(); +es.getClient().admin().indices().prepareCreate(broIndex) +.addMapping("bro_doc", JSONUtils.INSTANCE.toJSON(broTemplate.get("mappings"), false)).get(); + +// define the snort index template +String snortIndex = "snort_index_2017.01.01.02"; JSONObject snortTemplate = JSONUtils.INSTANCE.load(new File(snortTemplatePath), JSONObject.class); addTestFieldMappings(snortTemplate, "snort_doc"); - es.getClient().admin().indices().prepareCreate("snort_index_2017.01.01.02") -.addMapping("snort_doc", JSONUtils.INSTANCE.toJSON(snortTemplate.get("mappings"), false)).get(); - -BulkRequestBuilder bulkRequest = es.getClient().prepareBulk() -.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL); -JSONArray broArray = (JSONArray) new JSONParser().parse(broData); -for (Object o : broArray) { - JSONObject jsonObject = (JSONObject) o; - IndexRequestBuilder indexRequestBuilder = es.getClient() - .prepareIndex("bro_index_2017.01.01.01", "bro_doc"); - indexRequestBuilder = indexRequestBuilder.setId((String) jsonObject.get("guid")); - 
indexRequestBuilder = indexRequestBuilder.setSource(jsonObject.toJSONString()); - indexRequestBuilder = indexRequestBuilder - .setTimestamp(jsonObject.get("timestamp").toString()); - bulkRequest.add(indexRequestBuilder); +es.getClient().admin().indices().prepareCreate(snortIndex) +.addMapping("snort_doc", JSONUtils.INSTANCE.toJSON(snortTemplate.get("mappings"), false)).get(); + +// setup the classes required to write the test data +AccessConfig accessConfig = createAccessConfig(); +ElasticsearchClient client = ElasticsearchUtils.getClient(createGlobalConfig()); +ElasticsearchRetrieveLatestDao retrieveLatestDao = new ElasticsearchRetrieveLatestDao(client); +ElasticsearchColumnMetadataDao columnMetadataDao = new ElasticsearchColumnMetadataDao(client); +ElasticsearchRequestSubmitter requestSubmitter = new ElasticsearchRequestSubmitter(client); +ElasticsearchUpdateDao updateDao = new ElasticsearchUpdateDao(client, accessConfig, retrieveLatestDao); +ElasticsearchSearchDao searchDao = new ElasticsearchSearchDao(client, accessConfig, columnMetadataDao, requestSubmitter); + +// write the test documents for Bro +List broDocuments = new ArrayList<>(); +for (Object broObject: (JSONArray) new JSONParser().parse(broData)) { + broDocuments.add(((JSONObject) broObject).toJSONString()); } -JSONArray snortArray = (JSONArray) new JSONParser().parse(snortData); -for (Object o : snortArray) { - JSONObject jsonObject = (JSONObject) o; - IndexRequestBuilder indexRequestBuilder = es.getClient() - .prepareIndex("snort_index_2017.01.01.02", "snort_doc"); - indexRequestBuilder = indexRequestBuilder.setId((String) jsonObject.get("guid")); - indexRequestBuilder = indexRequestBuilder.setSource(jsonObject.toJSONString()); - indexRequestBuilder = indexRequestBuilder - .setTimestamp(jsonObject.get("timestamp").toString()); - bulkRequest.add(indexRequestBuilder); +es.add(updateDao, broIndex, "bro", broDocuments); + +// write the test documents for Snort +List snortDocuments = new ArrayList<>(); 
+for (Object snortObject: (JSONArray) new JSONParser().parse(snortData)) { + snortDocuments.add(((JSONObject) snortObject).toJSONString()
[GitHub] metron pull request #1247: METRON-1845 Correct Test Data Load in Elasticsear...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1247#discussion_r236461564 --- Diff: metron-platform/metron-elasticsearch/src/test/java/org/apache/metron/elasticsearch/integration/ElasticsearchSearchIntegrationTest.java --- @@ -97,45 +118,63 @@ protected static InMemoryComponent startIndex() throws Exception { return es; } - protected static void loadTestData() throws ParseException, IOException { + protected static void loadTestData() throws Exception { ElasticSearchComponent es = (ElasticSearchComponent) indexComponent; +// define the bro index template +String broIndex = "bro_index_2017.01.01.01"; JSONObject broTemplate = JSONUtils.INSTANCE.load(new File(broTemplatePath), JSONObject.class); addTestFieldMappings(broTemplate, "bro_doc"); - es.getClient().admin().indices().prepareCreate("bro_index_2017.01.01.01") -.addMapping("bro_doc", JSONUtils.INSTANCE.toJSON(broTemplate.get("mappings"), false)).get(); +es.getClient().admin().indices().prepareCreate(broIndex) +.addMapping("bro_doc", JSONUtils.INSTANCE.toJSON(broTemplate.get("mappings"), false)).get(); + +// define the snort index template +String snortIndex = "snort_index_2017.01.01.02"; JSONObject snortTemplate = JSONUtils.INSTANCE.load(new File(snortTemplatePath), JSONObject.class); addTestFieldMappings(snortTemplate, "snort_doc"); - es.getClient().admin().indices().prepareCreate("snort_index_2017.01.01.02") -.addMapping("snort_doc", JSONUtils.INSTANCE.toJSON(snortTemplate.get("mappings"), false)).get(); - -BulkRequestBuilder bulkRequest = es.getClient().prepareBulk() -.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL); -JSONArray broArray = (JSONArray) new JSONParser().parse(broData); -for (Object o : broArray) { - JSONObject jsonObject = (JSONObject) o; - IndexRequestBuilder indexRequestBuilder = es.getClient() - .prepareIndex("bro_index_2017.01.01.01", "bro_doc"); - indexRequestBuilder = indexRequestBuilder.setId((String) jsonObject.get("guid")); - 
indexRequestBuilder = indexRequestBuilder.setSource(jsonObject.toJSONString()); - indexRequestBuilder = indexRequestBuilder - .setTimestamp(jsonObject.get("timestamp").toString()); - bulkRequest.add(indexRequestBuilder); +es.getClient().admin().indices().prepareCreate(snortIndex) +.addMapping("snort_doc", JSONUtils.INSTANCE.toJSON(snortTemplate.get("mappings"), false)).get(); + +// setup the classes required to write the test data +AccessConfig accessConfig = createAccessConfig(); +ElasticsearchClient client = ElasticsearchUtils.getClient(createGlobalConfig()); +ElasticsearchRetrieveLatestDao retrieveLatestDao = new ElasticsearchRetrieveLatestDao(client); +ElasticsearchColumnMetadataDao columnMetadataDao = new ElasticsearchColumnMetadataDao(client); +ElasticsearchRequestSubmitter requestSubmitter = new ElasticsearchRequestSubmitter(client); +ElasticsearchUpdateDao updateDao = new ElasticsearchUpdateDao(client, accessConfig, retrieveLatestDao); +ElasticsearchSearchDao searchDao = new ElasticsearchSearchDao(client, accessConfig, columnMetadataDao, requestSubmitter); + +// write the test documents for Bro +List broDocuments = new ArrayList<>(); +for (Object broObject: (JSONArray) new JSONParser().parse(broData)) { + broDocuments.add(((JSONObject) broObject).toJSONString()); } -JSONArray snortArray = (JSONArray) new JSONParser().parse(snortData); -for (Object o : snortArray) { - JSONObject jsonObject = (JSONObject) o; - IndexRequestBuilder indexRequestBuilder = es.getClient() - .prepareIndex("snort_index_2017.01.01.02", "snort_doc"); - indexRequestBuilder = indexRequestBuilder.setId((String) jsonObject.get("guid")); - indexRequestBuilder = indexRequestBuilder.setSource(jsonObject.toJSONString()); - indexRequestBuilder = indexRequestBuilder - .setTimestamp(jsonObject.get("timestamp").toString()); - bulkRequest.add(indexRequestBuilder); +es.add(updateDao, broIndex, "bro", broDocuments); + +// write the test documents for Snort +List snortDocuments = new ArrayList<>(); 
+for (Object snortObject: (JSONArray) new JSONParser().parse(snortData)) { + snortDocuments.add(((JSONObject) snortObject).toJSONString()
[GitHub] metron pull request #1280: METRON-1869 Unable to Sort an Escalated Meta Aler...
GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1280 METRON-1869 Unable to Sort an Escalated Meta Alert This fixes a bug that causes meta-alerts to not be visible in the Alerts UI when the UI is sorted by 'alert_status' and a meta-alert has been escalated. This is only a problem when indexing into Elasticsearch. The root cause is that the 'alert_status' field needs to be defined as a keyword in the metaalert index template. This field only exists in the meta-alert index after a meta-alert has changed status, like when an alert is escalated. ## Changes * Fixed the Elasticsearch meta-alerts template that is deployed with the MPack. * Added an integration test for this bug. ## Testing Follow the "steps to reproduce" outlined in the JIRA to ensure that the bug has been squashed. 1. Create a meta-alert. 2. Escalate the meta-alert. 3. Submit another search that filters the results to only show meta-alerts. 4. Sort the results by 'alert_status' field. 5. The meta-alert should still be visible. ## Pull Request Checklist - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [ ] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)? - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [ ] Have you included steps or a guide to how the change may be verified and tested manually? - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: - [ ] Have you written or updated unit tests and or integration tests to verify your changes? 
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? You can merge this pull request into a Git repository by running: $ git pull https://github.com/nickwallen/metron METRON-1869 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/1280.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1280 commit d2f56b4f41bead7b3490cb81bdce553359ce09fc Author: Nick Allen Date: 2018-11-14T13:34:17Z Attempting to create test that highlights the problem commit 70d1dc4bea3992f8e94eb855c36c99fde7f6cad9 Author: Nick Allen Date: 2018-11-26T19:41:21Z METRON-1869 Fix missing alert_status field in meta-alert template commit 09558fb5fa5eb24cebfc974170fd14abf047e2e5 Author: Nick Allen Date: 2018-11-26T20:51:43Z Completed integration test that highlights the problem commit afb87ba0b4efbc74913698e123c5661a42ac7f45 Author: Nick Allen Date: 2018-11-26T20:58:47Z Testing sorted search results under 3 scenarios commit 60c3af90061ce82a89696c96b9f07b3e4fb35324 Author: Nick Allen Date: 2018-11-26T20:59:49Z Reset ordering ---
[GitHub] metron issue #1261: METRON-1860 [WIP] new developer option for ansible in do...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1261 hey @ottobackwards - I am still wrapping my head around this, but one small nit is that I don't like all the confirmation prompts. With the prompts I have to constantly check back to ensure it is not stuck on another prompt waiting for me to do something. As much as possible it should just be fire-and-forget, so I can run it, work on something else, and hopefully come back some time later with a functioning Metron install. ---
[GitHub] metron-bro-plugin-kafka issue #19: METRON-1885: Remove version from bro plug...
Github user nickwallen commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/19 +1 Looks good. Thanks. FYI - Tested with this [Dockerfile](https://gist.github.com/nickwallen/3bf2bd4907ae55484b7767066359ffca). ``` docker build . --tag=bro-testing docker run -it bro-testing /bin/bash ``` ``` [root@0998ab0f716a ~]# bro-pkg install http://github.com/JonZeolla/metron-bro-plugin-kafka --version METRON-1885 The following packages will be INSTALLED: http://github.com/JonZeolla/metron-bro-plugin-kafka (METRON-1885) Verify the following REQUIRED external dependencies: (Ensure their installation on all relevant systems before proceeding): from http://github.com/JonZeolla/metron-bro-plugin-kafka (METRON-1885): librdkafka ~0.11.5 Proceed? [Y/n] Y http://github.com/JonZeolla/metron-bro-plugin-kafka asks for LIBRDKAFKA_ROOT (Path to librdkafka installation tree) ? [/usr/local/lib] Saved answers to config file: /root/.bro-pkg/config Running unit tests for "http://github.com/JonZeolla/metron-bro-plugin-kafka" all 10 tests successful Installing "http://github.com/JonZeolla/metron-bro-plugin-kafka" Installed "http://github.com/JonZeolla/metron-bro-plugin-kafka" (METRON-1885) Loaded "http://github.com/JonZeolla/metron-bro-plugin-kafka" [root@0998ab0f716a ~]# bro -N Apache::Kafka Apache::Kafka - Writes logs to Kafka (dynamic, version 0.3) ``` ---
[GitHub] metron-bro-plugin-kafka issue #19: METRON-1885: Remove version from bro plug...
Github user nickwallen commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/19 I am looking at this one now. Just trying to get a Dockerfile put together that I can use to test changes like this. I have manually built up a test environment one too many times now. :) ---
[GitHub] metron pull request #1278: METRON-1892 Parser Debugger Should Load Config Fr...
GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1278 METRON-1892 Parser Debugger Should Load Config From Zookeeper When using the parser debugger functions created in #1265, the user has to manually specify the parser configuration. This is useful when testing out a new parser before it goes live. In other cases, it would be simpler to use the sensor configuration values that are 'live' and already loaded in Zookeeper. For example, I might want to test why a particular message fails to parse in my environment. ## Try It Out Try out the following examples in the Stellar REPL. 1. Launch a development environment. 1. Launch the REPL. ``` source /etc/default/metron cd $METRON_HOME bin/stellar -z $ZOOKEEPER ``` ### Parse a Message 1. Grab a message from the input topic to parse. You could also just mock-up a message that you would like to test. ``` [Stellar]>>> input := KAFKA_GET('bro') [{"http": {"ts":1542313125.807068,"uid":"CUrRne3iLIxXavQtci","id.orig_h"... ``` 1. Initialize the parser. The parser configuration for 'bro' will be loaded automatically from Zookeeper. ``` [Stellar]>>> parser := PARSER_INIT("bro") Parser{0 successful, 0 error(s)} ``` 1. Parse the message. ``` [Stellar]>>> msgs := PARSER_PARSE(parser, input) [{"bro_timestamp":"1542313125.807068","method":"GET","ip_dst_port":8080,... ``` 1. The parser will tally the success. ``` [Stellar]>>> parser Parser{1 successful, 0 error(s)} ``` 1. Review the successfully parsed message. ``` [Stellar]>>> LENGTH(msgs) 1 ``` ``` [Stellar]>>> msg := GET(msgs, 0) [Stellar]>>> MAP_GET("guid", msg) 7f2e0c77-c58c-488e-b1ad-fbec10fb8182 ``` ``` [Stellar]>>> MAP_GET("timestamp", msg) 1542313125807 ``` ``` [Stellar]>>> MAP_GET("source.type", msg) bro ``` ### Missing Configuration 1. If the configuration does not exist in Zookeeper, you should see something like this. I have not configured a parser named 'tuna' in my environment (but I could go for a tuna sandwich right about now). 
``` [Stellar]>>> bad := PARSER_INIT('tuna') [!] Unable to parse: PARSER_INIT('tuna') due to: Unable to read configuration from Zookeeper; sensorType = tuna org.apache.metron.stellar.dsl.ParseException: Unable to parse: PARSER_INIT('tuna') due to: Unable to read configuration from Zookeeper; sensorType = tuna at org.apache.metron.stellar.common.BaseStellarProcessor.createException(BaseStellarProcessor.java:166) at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:154) at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.executeStellar(DefaultStellarShellExecutor.java:405) at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.execute(DefaultStellarShellExecutor.java:257) at org.apache.metron.stellar.common.shell.specials.AssignmentCommand.execute(AssignmentCommand.java:66) at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.execute(DefaultStellarShellExecutor.java:252) at org.apache.metron.stellar.common.shell.cli.StellarShell.execute(StellarShell.java:359) at org.jboss.aesh.console.AeshProcess.run(AeshProcess.java:53) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IllegalArgumentException: Unable to read configuration from Zookeeper; sensorType = tuna at org.apache.metron.management.ParserFunctions$InitializeFunction.readFromZookeeper(ParserFunctions.java:103) at org.apache.metron.management.ParserFunctions$InitializeFunction.apply(ParserFunctions.java:66) at org.apache.metron.stellar.common.StellarCompiler.lambda$exitTransformationFunc$13(StellarCompiler.java:661) at org.apache.metron.stellar.common.StellarCompiler$Expression.apply(StellarCompiler.java:259) at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:151) ... 9 more
[GitHub] metron issue #1267: METRON-1873: Update Bootstrap version in Management UI
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1267 +1 Looks good to me, pending @justinleet's review. Thanks @sardell! ---
[GitHub] metron pull request #1267: METRON-1873: Update Bootstrap version in Manageme...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1267#discussion_r235437430 --- Diff: metron-interface/metron-config/package.json --- @@ -40,14 +40,14 @@ "@types/node": "^10.9.4", "@types/tether": "^1.1.27", "ace-builds": "^1.2.5", -"bootstrap": "4.0.0-alpha.5", +"bootstrap": "^4.1.3", "core-js": "^2.5.7", "font-awesome": "^4.6.3", "jquery": "^3.3.1", "karma-phantomjs-launcher": "^1.0.4", +"popper.js": "^1.14.4", --- End diff -- Oops. Way ahead of me! Looks reasonable. ---
[GitHub] metron pull request #1267: METRON-1873: Update Bootstrap version in Manageme...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1267#discussion_r235436109 --- Diff: metron-interface/metron-config/package.json --- @@ -40,14 +40,14 @@ "@types/node": "^10.9.4", "@types/tether": "^1.1.27", "ace-builds": "^1.2.5", -"bootstrap": "4.0.0-alpha.5", +"bootstrap": "^4.1.3", "core-js": "^2.5.7", "font-awesome": "^4.6.3", "jquery": "^3.3.1", "karma-phantomjs-launcher": "^1.0.4", +"popper.js": "^1.14.4", --- End diff -- This is a new dependency for us, correct? Can you comment on the project license, size of the community backing this, etc? I am sure you have already considered these points, but we should have this discussion in the community for new dependencies. ---
[GitHub] metron issue #1274: METRON-1887: Add logging to the ClasspathFunctionResolve...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1274 One nit on a null check, but that is a pre-existing condition and does not seem to be directly related with the problem at hand. Dealer's choice on that one. Either way, +1 looks good. ---
[GitHub] metron pull request #1274: METRON-1887: Add logging to the ClasspathFunction...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1274#discussion_r235121753 --- Diff: metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/common/utils/VFSClassloaderUtil.java --- @@ -112,14 +112,18 @@ public static FileSystemManager generateVfs() throws FileSystemException { * @throws FileSystemException */ public static Optional configureClassloader(String paths) throws FileSystemException { +LOG.debug("Configuring class loader with paths = {}", paths); if(paths.trim().isEmpty()) { --- End diff -- Should we add a null check here, just in case? Or use StringUtils.isBlank? ---
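The null-check suggestion in the review comment above can be sketched in plain Java. This is only an illustration: `ClasspathPaths`, `isBlank`, and `normalize` are hypothetical names, not part of `VFSClassloaderUtil`. The helper behaves much like `StringUtils.isBlank` from Apache Commons Lang, though that method tests `Character.isWhitespace` rather than relying on `trim()`.

```java
import java.util.Optional;

class ClasspathPaths {

  // Hypothetical helper: a null-safe version of paths.trim().isEmpty().
  // Calling paths.trim() directly would throw a NullPointerException on null.
  static boolean isBlank(String paths) {
    return paths == null || paths.trim().isEmpty();
  }

  // Hypothetical caller: returns an empty Optional for null or blank input,
  // otherwise the trimmed value.
  static Optional<String> normalize(String paths) {
    return isBlank(paths) ? Optional.empty() : Optional.of(paths.trim());
  }

  public static void main(String[] args) {
    System.out.println(normalize(null).isPresent());
    System.out.println(normalize("   ").isPresent());
    System.out.println(normalize(" /opt/jars ").isPresent());
  }
}
```

Either form keeps the early-return branch from failing when the caller passes `null`.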
[GitHub] metron issue #1249: METRON-1815: Separate metron-parsers into metron-parsers...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1249 Sorry to beat a dead horse, but I was giving this a once over and saw `metron-parsing-framework-storm`. Could we go with the more concise `metron-parsing-storm`? I'd prefer this because it matches what we already have in the Profiler: `metron-profiler-common`, `metron-profiler-spark`, `metron-profiler-storm`, `metron-profiler-repl`. I'd like to keep the same convention for naming projects. We've already decided that the words `parsing` and `parsers` are different enough to denote the differences between the projects. I don't see any new information being conveyed by the word 'framework'. IMHO. ---
[GitHub] metron pull request #1265: METRON-1874 Create a Parser Debugger
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1265#discussion_r235108449 --- Diff: metron-platform/metron-parsers/src/main/java/org/apache/metron/parsers/ParserRunnerImpl.java --- @@ -87,13 +87,13 @@ public boolean isError() { protected transient Consumer onSuccess; protected transient Consumer onError; - private HashSet sensorTypes; + private Set sensorTypes; --- End diff -- I backed this out. Good catch. ---
[GitHub] metron pull request #1276: METRON-1888 Default Topology Settings in MPack Ca...
GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1276 METRON-1888 Default Topology Settings in MPack Cause Profiler to Stall The default max spout pending set by the Mpack for the Profiler topology is 300. This value is too low and causes the topology to stall and stop consuming messages. The Profiler topology is the only Metron topology that uses Storm's windowing support. Storm will queue up messages for the window (by default 30 seconds) and process each of those windows as one unit. This causes Storm to need to "hang on" to tuples longer than the other topologies and this requires a much greater max spout pending value. This PR restores the original default value prior to #1221 that changed this value. Oops. ## Pull Request Checklist - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [ ] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)? - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [ ] Have you included steps or a guide to how the change may be verified and tested manually? - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: - [ ] Have you written or updated unit tests and or integration tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/nickwallen/metron METRON-1888 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/1276.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1276 commit 561232f88b1f6ac9e03178bfe0ea6111c112a717 Author: Nick Allen Date: 2018-11-20T17:39:59Z METRON-1888 Default Topology Settings in MPack Cause Profiler to Stall ---
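A back-of-the-envelope sketch of the reasoning in the #1276 description above, under a deliberately simplified assumption: every tuple stays un-acked for the full window length, so the max spout pending value caps how many tuples can be in flight during one window. `SpoutPendingEstimate` and `maxTuplesPerSecond` are hypothetical names, and the model ignores batching, acking latency, and parallelism.

```java
class SpoutPendingEstimate {

  // Rough upper bound on sustained tuples per second for one spout task,
  // assuming each tuple is held (un-acked) for the whole window.
  static double maxTuplesPerSecond(int maxSpoutPending, double windowSeconds) {
    if (windowSeconds <= 0) {
      throw new IllegalArgumentException("windowSeconds must be positive");
    }
    return maxSpoutPending / windowSeconds;
  }

  public static void main(String[] args) {
    // Values taken from the description: max spout pending of 300 and a
    // default window of 30 seconds.
    System.out.println(maxTuplesPerSecond(300, 30.0));
  }
}
```

Under this assumption a limit of 300 with a 30 second window allows only about 10 tuples per second per spout task, which is consistent with the description's point that a windowed topology needs a much larger value than the other topologies.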
[GitHub] metron pull request #1265: METRON-1874 Create a Parser Debugger
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1265#discussion_r235049017 --- Diff: metron-platform/metron-parsers/src/main/java/org/apache/metron/parsers/ParserRunnerImpl.java --- @@ -87,13 +87,13 @@ public boolean isError() { protected transient Consumer onSuccess; protected transient Consumer onError; - private HashSet sensorTypes; + private Set sensorTypes; --- End diff -- I did not intend for this change to be included in this PR. Let me back this out. ---
[GitHub] metron issue #1269: METRON-1879 Allow Elasticsearch to Auto-Generate the Doc...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1269 This issue has been fixed. ---
[GitHub] metron issue #1269: METRON-1879 Allow Elasticsearch to Auto-Generate the Doc...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1269 I was doing some final testing and noticed that the 'guid' field for metaalerts is showing the document ID, rather than the guid. I need to fix the presentation of metaalerts in the UI. ![screen shot 2018-11-19 at 10 56 19 am](https://user-images.githubusercontent.com/2475409/48718694-e6fc4580-ebe9-11e8-9eeb-2650767b1f55.png) ---
[GitHub] metron issue #1270: METRON-1880 Use Caffeine for Profiler Caching
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1270 It is 'drop-in' in my opinion. I just suggested sanity checks on the Profiler in the 3 different execution environments; Storm, REPL, and Spark. ---
[GitHub] metron issue #1268: METRON-1877: Nested IF ELSE statements can cause parse e...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1268 > @justinleet: Is it possible that the tests are more stable if we get in #1247? I am not sure. There might be an issue with ES's Bulk Writer refresh policy (or the way that we use it) that we are relying on in those tests. I ended up not relying on it for #1247 which may help, but may not solve the entire issue if that is the root cause. That being said, this is all speculation on my part. ---
[GitHub] metron issue #1272: METRON-1884 Updated to copy the new enrichment-unified.p...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1272 @aurelienduarte Thanks for the contribution. We need a few things to be able to review this. 1. Can you better describe the problem that you are solving with this patch? In [METRON-1884](https://issues.apache.org/jira/browse/METRON-1884) you say that "the enrichment.properties is missing from the Storm image." What error message do you see that tells you it is missing? What commands do you run and what error messages result? What are you trying to do that you cannot do? 2. How do we test this change? What commands should we run and what should we expect to see? ---
[GitHub] metron pull request #1270: METRON-1880 Use Caffeine for Profiler Caching
GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1270 METRON-1880 Use Caffeine for Profiler Caching Caffeine is a more performant cache when cache sizes are larger. We've already seen a significant improvement elsewhere in Metron based on the work done in #947. The Profiler should use Caffeine for caching to ensure that the caches are not a performance bottleneck. ## Testing 1. Spin-up Full Dev. 1. Create a profile using the Stellar REPL. 1. Deploy the profile for the Storm-based Profiler. 1. Ensure valid profile data is persisted in HBase. ## Pull Request Checklist - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [ ] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)? - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [ ] Have you included steps or a guide to how the change may be verified and tested manually? - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: - [ ] Have you written or updated unit tests and or integration tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/nickwallen/metron METRON-1880 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/1270.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1270 commit 1860b064d6da7d97148507e8235ea825903ac57c Author: Nick Allen Date: 2018-11-17T21:20:55Z METRON-1880 Use Caffeine for Profiler Caching ---
[GitHub] metron pull request #1269: METRON-1677 UUIDv4 GUID is not Lucene friendly
GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1269 METRON-1677 UUIDv4 GUID is not Lucene friendly With this change, when documents are written to Elasticsearch the document ID is no longer set as the Metron GUID, but instead left unset so that Elasticsearch can auto-generate it. Doing this improves write performance into Elasticsearch. This will also be the case for any Lucene based Indexer, including Solr. This work only covers Elasticsearch, but the same should be done for Solr as part of a separate effort. While the default table view looks the same, in the following screenshot I customized the table to show both the document ID and the GUID. This change is dependent on the following open pull requests. - [ ] #1247 - [ ] #1254 - [ ] #1259 ## Changes * The `ElasticsearchRetrieveLatestDao` was updated since the GUID is no longer the document ID. It now does a terms query on the GUID field instead of an ID query. * The `Document` class now contains an optional documentID field. If the `Document` is retrieved from one of the DAOs this field will be populated. When creating a new document, this field will be empty. * Many of the integration tests had to be updated because the GUID and document ID are now different. * The Alert UI was updated so that it visually looks the same. By default, the Metron GUID is still shown as one of the first columns in the table. * The table is actually showing the document's GUID field instead of the document ID as it was before. The ID field remains, which contains the document ID generated by Elasticsearch. The user can choose to add this to the table, if they like. ## Testing 1. Spin-up Full Dev. 1. Open up the Alerts UI and perform the following basic actions. * Search for alerts * Escalate an alert * Comment on an alert * Delete a comment from an alert * Create a meta-alert * Escalate a meta-alert 1. Click on the configure wheel and add the 'id' field to the table view. 
This will now display both the GUID and document ID in the table. They of course will be different. 1. Click on the 'guid' field in any row to filter the search results by the guid. ![screen shot 2018-11-14 at 2 44 21 pm](https://user-images.githubusercontent.com/2475409/48646597-53433300-e9b7-11e8-870b-f061af8cca47.png) 1. Click on the 'id' field to filter the search results by the document ID. ![screen shot 2018-11-14 at 2 44 08 pm](https://user-images.githubusercontent.com/2475409/48646566-3a3a8200-e9b7-11e8-8370-f596346d4a62.png) 1. Group by some fields to drill into the data. In the tree view, click on the 'guid' column and ensure the data sorts correctly. Do the same for the 'id' column that was added. ![screen shot 2018-11-14 at 2 46 43 pm](https://user-images.githubusercontent.com/2475409/48646519-1414e200-e9b7-11e8-96a2-50c568d909b2.png) ## Pull Request Checklist - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [ ] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)? - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [ ] Have you included steps or a guide to how the change may be verified and tested manually? - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: - [ ] Have you written or updated unit tests and or integration tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? You can merge this pull request into a Git repository by running: $ git pull https://github.com/nickwallen/metron METRON-1677 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/1269.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1269 commit a7c7dc287b4f9c99c6780b934a0b6f433a03aa04 Author: cstella Date: 2018-10-09T00:06:52Z Casey Stella - elasticsearch r
[GitHub] metron issue #1265: METRON-1874 Create a Parser Debugger
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1265 I was also wanting to do a follow-on that would load the parser configuration from Zk instead of requiring you to define that. In some cases, you just want to work with whatever the current 'live' configuration is. ---
[GitHub] metron issue #1265: METRON-1874 Create a Parser Debugger
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1265 > @ottobackwards maybe that can be a follow on, it would be much better to load from disk -> split lines than to open an editor and cut and paste from your sample log. Yep, agreed. It could just be a general purpose function that reads the contents of a file into a String. (Assuming we don't already have something like that.) ---
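The "load from disk -> split lines" idea discussed above could look roughly like the following in plain Java. This is a sketch only, not an existing Stellar function: `FileToString`, `readFile`, and `splitLines` are hypothetical names, and UTF-8 encoding is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FileToString {

  // Reads the whole file into a single String (assumes UTF-8 content).
  static String readFile(Path path) throws IOException {
    return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
  }

  // Splits the contents on any line break and drops blank lines, so each
  // remaining entry can be handed to a parser as one raw message.
  static List<String> splitLines(String contents) {
    return Arrays.stream(contents.split("\\R"))
        .filter(line -> !line.trim().isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) throws IOException {
    // Write a small sample log to a temporary file, then read it back.
    Path sample = Files.createTempFile("sample-log", ".txt");
    Files.write(sample, "first message\nsecond message\n".getBytes(StandardCharsets.UTF_8));
    List<String> messages = splitLines(readFile(sample));
    System.out.println(messages.size());
    Files.delete(sample);
  }
}
```

Keeping the read and the split as separate steps mirrors the comment: the read is general purpose, and the split is only needed when a sample log holds one message per line.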
[GitHub] metron issue #1265: METRON-1874 Create a Parser Debugger
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1265 > @ottobackwards: Can we test parser chains? I don't think so at the moment. Aggregation/chaining involves multiple parsers, each reading/writing data from Kafka. These functions work at the level of 1 parser and 1 `ParserRunner`. Thanks to @justinleet for my quick education on the topic. ---
[GitHub] metron issue #1265: METRON-1874 Create a Parser Debugger
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1265 > @ottobackwards: Can we load files from disk? It would be nice to not have had to set up kafka etc. There is no requirement to set up Kafka. All you have to do is create a String that contains your message. Whether you get that message from Kafka, create it yourself using `SHELL_EDIT`, or copy-paste it, it works all the same. In this PR you cannot load from disk. ---
[GitHub] metron issue #1262: METRON-1871 Cannot Run Elasticsearch Integration Tests i...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1262 This was taken care of in #1242 . No need. ---
[GitHub] metron pull request #1262: METRON-1871 Cannot Run Elasticsearch Integration ...
Github user nickwallen closed the pull request at: https://github.com/apache/metron/pull/1262 ---
[GitHub] metron issue #1247: METRON-1845 Correct Test Data Load in Elasticsearch Inte...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1247 I have merged this with master to pull in the changes from #1242. This is ready for review. ---
[GitHub] metron pull request #1265: METRON-1874 Create a Parser Debugger
GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1265 METRON-1874 Create a Parser Debugger Users should be able to parse messages in a controlled environment like the Stellar REPL. This can help troubleshoot issues that are occurring at high velocity in a live Parser topology. * This could help a user understand why a deployed parser is not working as they would expect. * This can help a user to test their parser configuration before deploying to their live Metron cluster. * This can help a user understand why a particular message is failing to parse successfully. ## Try It Out ### Parse a Message 1. Define the parser configuration. ``` [Stellar]>>> config := SHELL_EDIT() { "parserClassName":"org.apache.metron.parsers.bro.BasicBroParser", "filterClassName":"org.apache.metron.parsers.filters.StellarFilter", "sensorTopic":"bro" } ``` 1. Grab a message from the input topic to parse. You could also just mock-up a message that you would like to test. ``` [Stellar]>>> bro := KAFKA_GET('bro') [{"http": {"ts":1542313125.807068,"uid":"CUrRne3iLIxXavQtci","id.orig_h"... ``` 1. Initialize the parser. The parser keeps track of the number of successes and failures which can be useful when parsing a batch of messages. ``` [Stellar]>>> parser := PARSER_INIT("bro", config) Parser{0 successful, 0 error(s)} ``` 1. Parse the message. ``` [Stellar]>>> msgs := PARSER_PARSE(parser, bro) [{"bro_timestamp":"1542313125.807068","method":"GET","ip_dst_port":8080,... ``` 1. Review the successfully parsed message. ``` [Stellar]>>> LENGTH(msgs) 1 ``` ``` [Stellar]>>> msg := GET(msgs, 0) [Stellar]>>> MAP_GET("guid", msg) 7f2e0c77-c58c-488e-b1ad-fbec10fb8182 ``` ``` [Stellar]>>> MAP_GET("timestamp", msg) 1542313125807 ``` ``` [Stellar]>>> MAP_GET("source.type", msg) bro ``` 1. The parser will tally the success. ``` [Stellar]>>> parser Parser{1 successful, 0 error(s)} ``` ### Parse Multiple Messages 1. Grab 5 raw input messages from Kafka. 
``` [Stellar]>>> input := KAFKA_GET("bro", 5) [{"dns": {"ts":1542313125.342913,"uid":"CmJWpN3Ynwsggof57e", ... ``` 1. Parse the messages. ``` [Stellar]>>> msgs := PARSER_PARSE(parser, input) [{"TTLs":[13888.0],"qclass_name":"C_INTERNET", ... ``` 1. Review the parsed messages. There were 5 messages returned and each has a valid GUID as you would expect. ``` [Stellar]>>> LENGTH(msgs) 5 ``` ``` [Stellar]>>> MAP(msgs, m -> MAP_GET("guid", m)) [3b40ab62-ab6a-4dff-86c5-f35cdb2b01ea, 3b5826a7-f2d4-4df2-a28a-ab1f66037b4b, 9fc5f794-26f6-464f-bb99-05fb649ea465, c7162bee-01f9-4cc2-8e26-13101bc22ac1, b86dbb50-cb1d-4889-87ee-3919bcce6fdb] ``` ### Parse an Invalid Message 1. Mock-up a message that will fail to parse. ``` [Stellar]>>> invalid := "{invalid>" {invalid> ``` 1. Parse the invalid message. This will show you the actual exception that occurred and return the error message that is pushed onto the error topic. ``` [Stellar]>>> errors := PARSER_PARSE(parser, invalid) 2018-11-15 20:29:01 ERROR BasicBroParser:144 - Unable to parse Message: {invalid> Unexpected character (i) at position 1. at org.json.simple.parser.Yylex.yylex(Yylex.java:610) at org.json.simple.parser.JSONParser.nextToken(JSONParser.java:269) at org.json.simple.parser.JSONParser.parse(JSONParser.java:118) at org.json.simple.parser.JSONParser.parse(JSONParser.java:81) at org.json.simple.parser.JSONParser.parse(JSONParser.java:75) at org.apache.metron.parsers.bro.JSONCleaner.clean(JSONCleaner.java:49) at org.apache.metron.parsers.bro.BasicBroParser.parse(BasicBroParser.java:68) at org.apache.metron.parsers.interfaces.MessageParser.parseOptional(MessageParser.java:54) at org.apache
[GitHub] metron issue #1237: METRON-1825: Upgrade bro to 2.5.5
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1237 +1 by inspection. Thanks. ---
[GitHub] metron issue #857: METRON-1340: Improve e2e tests for metron alerts
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/857 I also requested that this be closed in this ticket; https://issues.apache.org/jira/browse/INFRA-17251 ---
[GitHub] metron issue #526: Metron-846: Add E2E tests for metron management ui
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/526 I opened a ticket to get this closed; https://issues.apache.org/jira/browse/INFRA-17251 ---
[GitHub] metron issue #1197: METRON-1778 Out-of-order timestamps may delay flush in S...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1197 @JonZeolla That error is a pre-existing condition that is tracked here: [METRON-1810](https://issues.apache.org/jira/browse/METRON-1810). I am not sure what the root cause is. ---
[GitHub] metron issue #1237: METRON-1825: Upgrade bro to 2.5.5
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1237 Have you spun this up in Full Dev and tested? ---
[GitHub] metron issue #1264: METRON-1872: Move rat plugin away from snapshot version
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1264 +1 Thanks @justinleet ---