[GitHub] metron pull request #1289: METRON-1810 Storm Profiler Intermittent Test Fail...

2018-12-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/1289


---


[GitHub] metron pull request #1289: METRON-1810 Storm Profiler Intermittent Test Fail...

2018-12-06 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1289#discussion_r239577883
  
--- Diff: 
metron-analytics/metron-profiler-common/src/main/java/org/apache/metron/profiler/ProfilePeriod.java
 ---
@@ -151,6 +152,8 @@ public String toString() {
 return "ProfilePeriod{" +
 "period=" + period +
 ", durationMillis=" + durationMillis +
+", startTime=" + 
Instant.ofEpochMilli(getStartTimeMillis()).toString() +
+", endTime=" + 
Instant.ofEpochMilli(getEndTimeMillis()).toString() +
--- End diff --

I will remove this as it is not necessary.


---


[GitHub] metron pull request #1289: METRON-1810 Storm Profiler Intermittent Test Fail...

2018-12-06 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/1289#discussion_r239568177
  
--- Diff: 
metron-analytics/metron-profiler-common/src/main/java/org/apache/metron/profiler/DefaultMessageDistributor.java
 ---
@@ -315,41 +308,51 @@ public DefaultMessageDistributor 
withPeriodDuration(int duration, TimeUnit units
   }
 
   /**
-   * A listener that is notified when profiles expire from the active 
cache.
+   * Notified synchronously when the active cache is modified.
*/
-  private class ActiveCacheRemovalListener implements 
RemovalListener, Serializable {
+  private class ActiveCacheWriter implements CacheWriter, Serializable {
+
+@Override
+public void write(@Nonnull Integer key, @Nonnull ProfileBuilder value) 
{
+  // do nothing
+}
 
 @Override
-public void onRemoval(@Nullable Integer key, @Nullable ProfileBuilder 
expired, @Nonnull RemovalCause cause) {
-  LOG.warn("Profile expired from active cache; profile={}, entity={}",
-  expired.getDefinition().getProfile(),
-  expired.getEntity());
+public void delete(@Nonnull Integer key, @Nullable ProfileBuilder 
value, @Nonnull RemovalCause cause) {
+  if(cause.wasEvicted()) {
+// add the profile to the expired cache
+expiredCache.put(key, value);
+LOG.info("Profile expired from active cache due to inactivity; 
profile={}, entity={}, cause={}",
--- End diff --

Will this LOG.info cause a lot of churn in the logs? In a production 
scenario, there could be a lot of entities per profile, right? And these could 
be expiring pretty often.  It seems like you could easily hit scenarios where 
thousands of entities are expiring and then clogging the logger up.  This seems 
like more a debugging log than anything else.


---


[GitHub] metron pull request #1289: METRON-1810 Storm Profiler Intermittent Test Fail...

2018-12-05 Thread nickwallen
GitHub user nickwallen opened a pull request:

https://github.com/apache/metron/pull/1289

METRON-1810 Storm Profiler Intermittent Test Failure

This PR hopefully resolves some of the more frequent, intermittent 
`ProfilerIntegrationTest` failures.  

### Testing

Run the integration tests and they should not fail.

I have repeatedly run the integration tests on my laptop and have yet to 
see a failure.  I am also repeatedly triggering Travis CI builds on this branch 
to see if any failures occur there.  I can't prove the problem is solved, but I 
am hoping this helps.

### Changes

* This change uses Caffeine's CacheWriter in place of the RemovalListener 
for profile expiration.  This is explained more in the next section.

* Removed the ability to define the cache maintenance executor for the 
`DefaultMessageDistributor`.  This was only needed for testing 
RemovalListeners, which no longer exist.

* I re-jiggered some of the integration tests to provide more information 
when they do fail. I removed the use of `waitOrTimeout` and replaced it with 
`assertEventually`.  I am also using a different style of assertion; 
Hamcrest-y.  These changes should provide additional information and also be 
more consistent with the rest of the code base.

* After removing the dependency that provided `waitOrTimeout`, I had to 
rework some of the project dependencies because multiple versions 
of`httpclient` lib was being pulled in.  This caused the new REST function in 
`stellar-common` to blow up when a Stellar execution environment is loaded 
during the integration test.

* Added additional debug logging for the caches including an estimate of 
the cache size.

 RemovalListener to CacheWriter

Profiles are designed to expire and flush after a fixed period of time; the 
TTL.  The integration tests rely on some of the profile values being flushed by 
this TTL mechanism.  

The TTL mechanism is driven by cache expiration. There is a cache of 
"active" profiles.  This cache is configured to have values timeout after they 
have not been accessed for a fixed period of time.  Once they timeout from the 
active cache, they are placed on an "expired" cache.  Messages are not applied 
to expired profiles, but these expired profiles hang around for a fixed period 
of time to allow them to be flushed.

Previously, a Caffeine 
[RemovalListener](https://github.com/ben-manes/caffeine/wiki/Removal#removal-listeners)
 was used so that when a profile expires from the active cache, it is placed 
into the expired cache. When the tests fail, it seems that the 
`RemovalListener` is not notified in a timely fashion, so the profile doesn't 
make it to the expired cache and so never flushes to be read by the integration 
test.

 In Caffeine, these listeners are notified asynchronously, on a separate 
thread (via ForkJoinPool.commonPool()), not inline with cache reads or writes.  
For running tests that depend on RemovalListener's it is recommended to set the 
cache maintenance executor to something like 
`MoreExecutors.sameThreadExecutor()` so that cache maintenance is executed on 
the main execution thread when `cleanUp` is called.  This was done for the unit 
tests, but was not done for the integration tests. 

The Caffeine Wiki mentions that when notification should be performed 
synchronously, which logically works for this use case, to use a 
[CacheWriter](https://github.com/ben-manes/caffeine/wiki/Writer) instead.

> Removal listener operations are executed asynchronously using an 
Executor. The default executor is ForkJoinPool.commonPool() and can be 
overridden via Caffeine.executor(Executor). When the operation must be 
performed synchronously with the removal, use 
[CacheWriter](https://github.com/ben-manes/caffeine/wiki/Writer) instead.

This does not negatively impact production performance because the 
`ActiveCacheWriter` only does work on a delete which occurs only rarely when a 
profile stops receiving messages, not on a write.  In addition, these caches 
are very read-heavy as in most cases the cache is only written to when a new 
profile is defined or on topology start-up.

## Pull Request Checklist
- [ ] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [ ] Does your PR title start with METRON- where  is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
- [ ] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [ ] Have you included steps or a guide to how the change may be verified