[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ottobackwards
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/940 Maybe the issue has to do with our keys, and their distribution as the size gets larger? Maybe when we get larger sizes we get more collisions and end up calling equals() more or something.
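
The hypothesis above (more hash collisions leading to more `equals()` calls) can be illustrated with a minimal sketch. It uses a plain `java.util.HashMap` rather than Guava, and the `Key` class and its `constantHash` switch are hypothetical names invented for the illustration; it shows only the general effect, not Metron's cache keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class CollisionSketch {
  /** Hypothetical key type: counts equals() calls; hash quality is switchable. */
  static final class Key {
    static final AtomicLong EQUALS_CALLS = new AtomicLong();
    final int id;
    final boolean constantHash;

    Key(int id, boolean constantHash) {
      this.id = id;
      this.constantHash = constantHash;
    }

    @Override
    public int hashCode() {
      // A constant hash forces every key into the same bucket.
      return constantHash ? 1 : Integer.hashCode(id);
    }

    @Override
    public boolean equals(Object o) {
      EQUALS_CALLS.incrementAndGet();
      return o instanceof Key && ((Key) o).id == id;
    }
  }

  /** Inserts n keys, looks each one up with a fresh equal instance,
   *  and returns the number of equals() calls made by the lookups. */
  static long equalsCallsForLookups(int n, boolean constantHash) {
    Map<Key, Integer> map = new HashMap<>();
    for (int i = 0; i < n; i++) {
      map.put(new Key(i, constantHash), i);
    }
    Key.EQUALS_CALLS.set(0);
    for (int i = 0; i < n; i++) {
      map.get(new Key(i, constantHash));
    }
    return Key.EQUALS_CALLS.get();
  }

  public static void main(String[] args) {
    System.out.println("well-distributed hash: " + equalsCallsForLookups(1000, false));
    System.out.println("colliding hash:        " + equalsCallsForLookups(1000, true));
  }
}
```

With a well-distributed hash each lookup needs roughly one `equals()` call; with colliding hashes every lookup has to compare against several entries in the same bucket, so the count grows with the number of keys.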

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ottobackwards
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/940 This should have the equivalent diagram and documentation (I believe as shown above) to the original split join strategy. ---

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172383791 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java --- @@ -0,0 +1,47 @@ +/** + *

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172383810 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java --- @@ -0,0 +1,415 @@ +/**

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 @nickwallen Ok, I refactored the abstraction to separate some concerns, name things a bit, and collapse some of the more onerous abstractions. Also updated javadocs. Can you give it

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172383754 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java --- @@ -0,0 +1,79 @@ +/**

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172377136 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java --- @@ -0,0 +1,79 @@ +/**

[GitHub] metron issue #933: METRON-1452 Rebase Dev Environment on Latest CentOS 6

2018-03-05 Thread mmiklavc
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/933 Oh, I guess we need to reaffirm. Yes, +1 still stands. ---

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172373203 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java --- @@ -0,0 +1,415 @@ +/**

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172369461 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java --- @@ -0,0 +1,79 @@ +/**

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172369480 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java --- @@ -0,0 +1,47 @@ +/** + *

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172359339 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java --- @@ -0,0 +1,47 @@ +/** + *

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172353404 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java --- @@ -0,0 +1,415 @@ +/**

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172363362 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java --- @@ -0,0 +1,79 @@ +/**

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 Ahhh, that makes sense. I bet we were getting killed by small allocations in the caching layer. ---

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Caffeine doesn't allocate on read, so that would make sense. I saw a [25x boost](https://github.com/google/guava/issues/2063#issuecomment-107169736) (compared to

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 I actually suspect GC as well. We adjusted the garbage collector to the G1GC and saw throughput gains, but not nearly the kinds of gains as we got with a drop-in of Caffeine to replace Guava.

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Interesting. Then I guess the size must trigger the read bottleneck as larger than writes. Perhaps it is incurring a lot more GC overhead that causes more collections? The CLQ additions requires

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 In this case, the loader isn't doing anything terribly expensive, though it may in the future (incur an HBase get or some more expensive computation). ---

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Internally Guava uses a `ConcurrentLinkedQueue` and an `AtomicInteger` to record its size, per segment. When a read occurs, it records that in the queue and then drains it under the segment's lock
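
The read-recording pattern described above (a `ConcurrentLinkedQueue` plus an `AtomicInteger` counter per segment, drained under the segment's lock) can be sketched roughly as follows. This is a simplified illustration built from JDK classes, not Guava's actual code: the class name `ReadBufferSketch`, the `DRAIN_THRESHOLD` value, and the list used as a stand-in for the eviction policy are all assumptions. The comment is truncated in this listing, so when exactly the drain is triggered is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class ReadBufferSketch<K> {
  // Hypothetical threshold, chosen for illustration only.
  private static final int DRAIN_THRESHOLD = 64;

  private final Queue<K> recencyQueue = new ConcurrentLinkedQueue<>();
  private final AtomicInteger pendingReads = new AtomicInteger();
  private final ReentrantLock segmentLock = new ReentrantLock();
  // Stand-in for the eviction policy's access order, guarded by segmentLock.
  private final List<K> accessOrder = new ArrayList<>();

  /** Called on every cache hit; the queue allocates one node per recorded read. */
  public void recordRead(K key) {
    recencyQueue.add(key);
    if (pendingReads.incrementAndGet() >= DRAIN_THRESHOLD) {
      tryDrain();
    }
  }

  /** Replays buffered reads against the policy if the segment lock is free. */
  public void tryDrain() {
    if (segmentLock.tryLock()) {
      try {
        K key;
        while ((key = recencyQueue.poll()) != null) {
          pendingReads.decrementAndGet();
          accessOrder.remove(key);
          accessOrder.add(key); // move to the most-recently-used position
        }
      } finally {
        segmentLock.unlock();
      }
    }
  }

  /** Snapshot of the access order, least recently used first. */
  public List<K> accessOrderSnapshot() {
    segmentLock.lock();
    try {
      return new ArrayList<>(accessOrder);
    } finally {
      segmentLock.unlock();
    }
  }

  /** Number of recorded reads that have not been drained yet. */
  public int pending() {
    return pendingReads.get();
  }
}
```

The point of the sketch is that every read enqueues an element, which means a small allocation per cache hit, and that the buffered reads are only applied to the policy while holding a lock.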

[GitHub] metron issue #924: METRON-1299 In MetronError tests, don't test for HostName...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/924 +1, sorry! ---

[GitHub] metron issue #924: METRON-1299 In MetronError tests, don't test for HostName...

2018-03-05 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/924 @cestella Bump ---

[GitHub] metron issue #933: METRON-1452 Rebase Dev Environment on Latest CentOS 6

2018-03-05 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/933 @mmiklavc @cestella Bump ---

[GitHub] metron pull request #946: METRON-1465:Support for Elasticsearch X-pack

2018-03-05 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/946#discussion_r172351786 --- Diff: pom.xml --- @@ -97,7 +97,7 @@ ${base_hadoop_version} ${base_hbase_version} ${base_flume_version} -

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 We actually did increase the concurrency level for guava to 64; that is what confused us as well. The hash code is mostly standard, should be evenly distributed (the key is pretty much a POJO).

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Guava defaults to a `concurrencyLevel` of 4, given its age and a desire to not abuse memory in low concurrent situations. You probably want to increase it to 64 in a heavy workload, which has a

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 The interesting thing that we found was that guava seems to be doing poorly when the # of items in the cache gets large. When we scaled the test down (830 distinct IP addresses chosen randomly and

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 That makes sense. A uniform distribution will, of course, degrade all policies to random replacement so the test is then about how well the implementations handle concurrency. Most often caches

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 We were being purposefully unkind to the cache in the tests. The load simulation chose an IP address at random to present, so each IP had an equal probability of being selected. Whereas, in real
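
A minimal sketch of why uniformly random keys are unkind to a cache, assuming a simple LRU cache built on `LinkedHashMap` (not Guava or Caffeine) and hypothetical sizes that are not taken from the thread. When every key is equally likely, the hit rate settles near the ratio of cache size to distinct keys, whatever the eviction policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class UniformLoadSketch {
  /** Minimal LRU cache built on LinkedHashMap's access-order mode. */
  static final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
      super(16, 0.75f, true); // true = iterate in access order
      this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
      return size() > capacity;
    }
  }

  /** Requests keys uniformly at random and returns the observed hit rate. */
  static double hitRate(int distinctKeys, int cacheSize, int requests, long seed) {
    Random random = new Random(seed);
    LruCache<Integer, Boolean> cache = new LruCache<>(cacheSize);
    int hits = 0;
    for (int i = 0; i < requests; i++) {
      int key = random.nextInt(distinctKeys); // every key equally likely
      if (cache.get(key) != null) {
        hits++;
      } else {
        cache.put(key, Boolean.TRUE);
      }
    }
    return (double) hits / requests;
  }

  public static void main(String[] args) {
    // Hypothetical sizes: 10,000 distinct keys, room for 1,000 of them.
    System.out.printf("hit rate: %.3f%n", hitRate(10_000, 1_000, 1_000_000, 42L));
  }
}
```

With these assumed sizes the cache holds a tenth of the key space, so about one request in ten is a hit; a skewed, more realistic key distribution would give a much higher hit rate for the same cache size.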

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Do you know what the hit rates were, for the same data set, between Guava and Caffeine? The caches use different policies so it is always interesting to see how they handle given workloads. As we

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread nickwallen
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/940 I completed some fairly extensive performance testing comparing this new Unified topology against the existing Split-Join implementation. The difference was dramatic. - The Unified

[GitHub] metron pull request #941: METRON-1355: Convert metron-elasticsearch to new i...

2018-03-05 Thread justinleet
Github user justinleet commented on a diff in the pull request: https://github.com/apache/metron/pull/941#discussion_r172318991 --- Diff: metron-contrib/metron-docker-e2e/README.md --- @@ -0,0 +1,94 @@ + +# Metron Docker + +Metron Docker E2E is a [Docker

[GitHub] metron issue #941: METRON-1355: Convert metron-elasticsearch to new infrastr...

2018-03-05 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/941 I'm unable to get the integration tests running locally. I've been able to get the docker containers up and running, but ES isn't exposed at localhost, only through the explicit docker-machine

[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...

2018-03-05 Thread dcode
Github user dcode commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/6 That'd be great if you wouldn't mind to create a ticket for this. ---

[GitHub] metron-bro-plugin-kafka issue #7: METRON-1324: Increment metron-bro-plugin-k...

2018-03-05 Thread JonZeolla
Github user JonZeolla commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/7 # Testing ## Build and install manually Some guideline commands to test: ``` mkdir tmp cd tmp git clone https://github.com/bro/bro cd bro git

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 I ran this up with vagrant and ensured: * Normal stellar works still in field transformations as well as enrichments * swapped in and out new enrichments live * swapped in and out new

[GitHub] metron issue #944: METRON-1463: Adjust the groupings and shuffles in enrichm...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/944 I ran this up with vagrant and ensured: * Normal stellar works still in field transformations as well as enrichments * swapped in and out new enrichments live * swapped in and out new

[GitHub] metron issue #947: METRON-1467: Replace guava caches in places where the key...

2018-03-05 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/947 I ran this up with vagrant and ensured: * Normal stellar works still in field transformations as well as enrichments * swapped in and out new enrichments live * swapped in and out new

[GitHub] metron pull request #948: METRON-1468: Add support for apache/metron-bro-plu...

2018-03-05 Thread JonZeolla
GitHub user JonZeolla opened a pull request: https://github.com/apache/metron/pull/948 METRON-1468: Add support for apache/metron-bro-plugin-kafka to prepare-commit ## Contributor Comments This updates the prepare-commit script to work with `apache/metron-bro-plugin-kafka`.

[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...

2018-03-05 Thread JonZeolla
Github user JonZeolla commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/6 It's a part of the `apache/metron` project (of which this is considered a component) and uses the open apache JIRA that I linked above. In order to accept PRs we need to have a

[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...

2018-03-05 Thread dcode
Github user dcode commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/6 I haven't created a JIRA ticket. Not sure if that's something internal. ---

[GitHub] metron-bro-plugin-kafka pull request #7: METRON-1324: Increment metron-bro-p...

2018-03-05 Thread JonZeolla
GitHub user JonZeolla opened a pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/7 METRON-1324: Increment metron-bro-plugin-kafka version We have some changes staged to upgrade the plugin, so we should increment the version. You can merge this pull request into

[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...

2018-03-05 Thread JonZeolla
Github user JonZeolla commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/6 This is really coming together. Is there a

[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread dcode
Github user dcode commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172248492 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for }

[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172243410 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for

[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172240860 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for

[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread dcode
Github user dcode commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172229125 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for }

[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread dcode
Github user dcode commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172229215 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for }

[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread dcode
Github user dcode commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172228973 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for }

[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread dcode
Github user dcode commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172225023 --- Diff: README.md --- @@ -37,10 +37,11 @@ The following examples highlight different ways that the plugin can be used. Si ###

[GitHub] metron pull request #947: METRON-1467: Replace guava caches in places where ...

2018-03-05 Thread cestella
GitHub user cestella reopened a pull request: https://github.com/apache/metron/pull/947 METRON-1467: Replace guava caches in places where the keyspace might be large (NOTE: Review after METRON-1460) ## Contributor Comments Based on the performance tuning exercise as part of

[GitHub] metron pull request #947: METRON-1467: Replace guava caches in places where ...

2018-03-05 Thread cestella
Github user cestella closed the pull request at: https://github.com/apache/metron/pull/947 ---

[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172192869 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for

[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172193204 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for