[GitHub] metron pull request #1289: METRON-1810 Storm Profiler Intermittent Test Fail...
Github user asfgit closed the pull request at: https://github.com/apache/metron/pull/1289 ---
[GitHub] metron pull request #1289: METRON-1810 Storm Profiler Intermittent Test Fail...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1289#discussion_r239577883 --- Diff: metron-analytics/metron-profiler-common/src/main/java/org/apache/metron/profiler/ProfilePeriod.java --- @@ -151,6 +152,8 @@ public String toString() { return "ProfilePeriod{" + "period=" + period + ", durationMillis=" + durationMillis + +", startTime=" + Instant.ofEpochMilli(getStartTimeMillis()).toString() + +", endTime=" + Instant.ofEpochMilli(getEndTimeMillis()).toString() + --- End diff -- I will remove this as it is not necessary. ---
[GitHub] metron pull request #1289: METRON-1810 Storm Profiler Intermittent Test Fail...
Github user justinleet commented on a diff in the pull request: https://github.com/apache/metron/pull/1289#discussion_r239568177 --- Diff: metron-analytics/metron-profiler-common/src/main/java/org/apache/metron/profiler/DefaultMessageDistributor.java --- @@ -315,41 +308,51 @@ public DefaultMessageDistributor withPeriodDuration(int duration, TimeUnit units } /** - * A listener that is notified when profiles expire from the active cache. + * Notified synchronously when the active cache is modified. */ - private class ActiveCacheRemovalListener implements RemovalListener, Serializable { + private class ActiveCacheWriter implements CacheWriter, Serializable { + +@Override +public void write(@Nonnull Integer key, @Nonnull ProfileBuilder value) { + // do nothing +} @Override -public void onRemoval(@Nullable Integer key, @Nullable ProfileBuilder expired, @Nonnull RemovalCause cause) { - LOG.warn("Profile expired from active cache; profile={}, entity={}", - expired.getDefinition().getProfile(), - expired.getEntity()); +public void delete(@Nonnull Integer key, @Nullable ProfileBuilder value, @Nonnull RemovalCause cause) { + if(cause.wasEvicted()) { +// add the profile to the expired cache +expiredCache.put(key, value); +LOG.info("Profile expired from active cache due to inactivity; profile={}, entity={}, cause={}", --- End diff -- Will this LOG.info cause a lot of churn in the logs? In a production scenario, there could be a lot of entities per profile, right? And these could be expiring pretty often. It seems like you could easily hit scenarios where thousands of entities are expiring and then clogging the logger up. This seems like more a debugging log than anything else. ---
[GitHub] metron pull request #1289: METRON-1810 Storm Profiler Intermittent Test Fail...
GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1289 METRON-1810 Storm Profiler Intermittent Test Failure This PR hopefully resolves some of the more frequent, intermittent `ProfilerIntegrationTest` failures. ### Testing Run the integration tests and they should not fail. I have repeatedly run the integration tests on my laptop and have yet to see a failure. I am also repeatedly triggering Travis CI builds on this branch to see if any failures occur there. I can't prove the problem is solved, but I am hoping this helps. ### Changes * This change uses Caffeine's CacheWriter in place of the RemovalListener for profile expiration. This is explained more in the next section. * Removed the ability to define the cache maintenance executor for the `DefaultMessageDistributor`. This was only needed for testing RemovalListeners, which no longer exist. * I re-jiggered some of the integration tests to provide more information when they do fail. I removed the use of `waitOrTimeout` and replaced it with `assertEventually`. I am also using a different style of assertion; Hamcrest-y. These changes should provide additional information and also be more consistent with the rest of the code base. * After removing the dependency that provided `waitOrTimeout`, I had to rework some of the project dependencies because multiple versions of`httpclient` lib was being pulled in. This caused the new REST function in `stellar-common` to blow up when a Stellar execution environment is loaded during the integration test. * Added additional debug logging for the caches including an estimate of the cache size. RemovalListener to CacheWriter Profiles are designed to expire and flush after a fixed period of time; the TTL. The integration tests rely on some of the profile values being flushed by this TTL mechanism. The TTL mechanism is driven by cache expiration. There is a cache of "active" profiles. This cache is configured to have values timeout after they have not been accessed for a fixed period of time. Once they timeout from the active cache, they are placed on an "expired" cache. Messages are not applied to expired profiles, but these expired profiles hang around for a fixed period of time to allow them to be flushed. Previously, a Caffeine [RemovalListener](https://github.com/ben-manes/caffeine/wiki/Removal#removal-listeners) was used so that when a profile expires from the active cache, it is placed into the expired cache. When the tests fail, it seems that the `RemovalListener` is not notified in a timely fashion, so the profile doesn't make it to the expired cache and so never flushes to be read by the integration test. In Caffeine, these listeners are notified asynchronously, on a separate thread (via ForkJoinPool.commonPool()), not inline with cache reads or writes. For running tests that depend on RemovalListener's it is recommended to set the cache maintenance executor to something like `MoreExecutors.sameThreadExecutor()` so that cache maintenance is executed on the main execution thread when `cleanUp` is called. This was done for the unit tests, but was not done for the integration tests. The Caffeine Wiki mentions that when notification should be performed synchronously, which logically works for this use case, to use a [CacheWriter](https://github.com/ben-manes/caffeine/wiki/Writer) instead. > Removal listener operations are executed asynchronously using an Executor. The default executor is ForkJoinPool.commonPool() and can be overridden via Caffeine.executor(Executor). When the operation must be performed synchronously with the removal, use [CacheWriter](https://github.com/ben-manes/caffeine/wiki/Writer) instead. This does not negatively impact production performance because the `ActiveCacheWriter` only does work on a delete which occurs only rarely when a profile stops receiving messages, not on a write. In addition, these caches are very read-heavy as in most cases the cache is only written to when a new profile is defined or on topology start-up. ## Pull Request Checklist - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [ ] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)? - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [ ] Have you included steps or a guide to how the change may be verified