[GitHub] storm pull request #2218: STORM-2614: Enhance stateful windowing to persist ...

2017-07-24 Thread ben-manes
Github user ben-manes commented on a diff in the pull request:

https://github.com/apache/storm/pull/2218#discussion_r129211231
  
--- Diff: storm-client/src/jvm/org/apache/storm/topology/SimpleWindowPartitionCache.java ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.storm.topology;
+
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ConcurrentSkipListMap;
+import java.util.concurrent.locks.ReentrantLock;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A simple implementation that evicts the largest un-pinned entry from the cache. This works well
+ * for caching window partitions since the access pattern is mostly sequential scans.
+ */
+public class SimpleWindowPartitionCache<K, V> implements WindowPartitionCache<K, V> {
+    private static final Logger LOG = LoggerFactory.getLogger(SimpleWindowPartitionCache.class);
+
+    private final ConcurrentSkipListMap<K, V> map = new ConcurrentSkipListMap<>();
+    private final Map<K, Integer> pinned = new HashMap<>();
+    private final long maximumSize;
+    private final RemovalListener<K, V> removalListener;
+    private final CacheLoader<K, V> cacheLoader;
+    private final ReentrantLock lock = new ReentrantLock(true);
+    private int size;
+
+    @Override
+    public V get(K key) {
+        return getOrLoad(key, false);
+    }
+
+    @Override
+    public V getPinned(K key) {
+        return getOrLoad(key, true);
+    }
+
+    @Override
+    public boolean unpin(K key) {
+        LOG.debug("unpin '{}'", key);
+        boolean res = false;
+        try {
+            lock.lock();
+            if (pinned.compute(key, (k, v) -> v == null ? 0 : v - 1) <= 0) {
+                pinned.remove(key);
+                res = true;
+            }
+        } finally {
+            lock.unlock();
+        }
+        LOG.debug("pinned '{}'", pinned);
+        return res;
+    }
+
+    @Override
+    public ConcurrentMap<K, V> asMap() {
+        return map;
+    }
+
+    @Override
+    public void invalidate(K key) {
+        if (isPinned(key)) {
+            LOG.debug("Entry '{}' is pinned, skipping invalidation", key);
+        } else {
+            LOG.debug("Invalidating entry '{}'", key);
+            V val = map.remove(key);
+            try {
+                lock.lock();
+                --size;
--- End diff --

if the key isn't in the cache (e.g. due to concurrent invalidations) then this 
will decrement `size` incorrectly. I think you need to check `val` and skip the 
locking code if the entry is not present.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm issue #2218: STORM-2614: Enhance stateful windowing to persist the win...

2017-07-24 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/storm/pull/2218
  
fyi, you can emulate pinning by setting the entry's weight to zero. Then 
the entry will not be evicted, but is still eligible for expiration, etc. Of 
course, zero-weight entries mean the cache allows more entries overall, whereas 
pinning might still restrict overall capacity. But it's not clear how a cache 
should handle evictions when not enough space can be freed from unpinned 
entries.
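The trade-off described above can be seen in a toy sketch (a `LinkedHashMap`-based stand-in with invented names, not Caffeine's actual API): entries weighed zero are simply never selected by size-based eviction, so they accumulate beyond the nominal capacity.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntBiFunction;

// Toy illustration of pinning-via-zero-weight. Eviction walks entries in
// least-recently-used order and removes only positive-weight ones.
class ZeroWeightCache<K, V> {
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true); // access order
    private final ToIntBiFunction<K, V> weigher;
    private final long maximumWeight;

    ZeroWeightCache(long maximumWeight, ToIntBiFunction<K, V> weigher) {
        this.maximumWeight = maximumWeight;
        this.weigher = weigher;
    }

    private long totalWeight() {
        long w = 0;
        for (Map.Entry<K, V> e : map.entrySet()) {
            w += weigher.applyAsInt(e.getKey(), e.getValue());
        }
        return w;   // O(n) recomputation; a real cache would maintain this incrementally
    }

    void put(K key, V value) {
        map.put(key, value);
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (totalWeight() > maximumWeight && it.hasNext()) {
            Map.Entry<K, V> e = it.next();
            if (weigher.applyAsInt(e.getKey(), e.getValue()) > 0) {
                it.remove();    // zero-weight ("pinned") entries are skipped
            }
        }
    }

    V get(K key) {
        return map.get(key);
    }
}
```

Note that the zero-weight entry survives eviction even though the cache now holds more entries than its weight bound suggests, which is exactly the capacity ambiguity mentioned above.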

The other approach is less elegant. A `CacheWriter` can intercept an 
eviction and a `CacheLoader` intercepts gets to fetch a missing entry. Since 
recursive computations are disallowed, the writer could re-put the entry 
asynchronously. This races with a `get`, so the loader could fetch from the map 
prior to some other data store. In some ways this is not too dissimilar to your 
own pinned cache.

In general, pinning breaks the caching model of recomputable transient state, 
and the eviction policy will try to avoid pollution by removing low-value 
entries early. It can also lead to O(n) evictions unless pinned entries are 
maintained in a dedicated LRU queue so that they are never evaluated as 
potential victims. Since pinning vs. capacity semantics are confusing, right 
now Caffeine favors zero weights since the use-cases are unclear.
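The dedicated-queue idea can be sketched minimally (a toy FIFO variant with invented names, not Caffeine or Storm code): pinned entries never enter the victim queue, so eviction never has to scan past them.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy sketch: keeping unpinned keys in their own queue makes eviction O(1)
// per removal, instead of an O(n) scan that skips over pinned entries.
class PinnedAwareCache<K, V> {
    private final Map<K, V> data = new HashMap<>();
    private final Set<K> pinned = new HashSet<>();
    private final ArrayDeque<K> victims = new ArrayDeque<>(); // unpinned keys, FIFO order
    private final int maximumSize;

    PinnedAwareCache(int maximumSize) {
        this.maximumSize = maximumSize;
    }

    void put(K key, V value, boolean pin) {
        data.put(key, value);
        if (pin) {
            pinned.add(key);
        } else {
            victims.addLast(key);       // only unpinned keys become candidates
        }
        while (data.size() > maximumSize && !victims.isEmpty()) {
            data.remove(victims.pollFirst());   // never inspects pinned keys
        }
    }

    V get(K key) {
        return data.get(key);
    }
}
```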

Hope that helps. I agree your own logic may be more straightforward and 
preferable.




[GitHub] storm issue #1783: STORM-2204 Adding caching capabilities in HBaseLookupBolt

2016-11-30 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/storm/pull/1783
  
It should fail, e.g. with a class version error.




[GitHub] storm issue #1783: STORM-2204 Adding caching capabilities in HBaseLookupBolt

2016-11-22 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/storm/pull/1783
  
Nope. Sorry, since your compilation target is 1.8 I hadn't thought you'd 
need that.




[GitHub] storm issue #1783: STORM-2204 Adding caching capabilities in HBaseLookupBolt

2016-11-22 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/storm/pull/1783
  
I was a co-author of Guava's cache, too. 

Guava had originally considered soft references an ideal caching scheme, 
since they offer great concurrency and leave memory management to the GC. That 
evolved from `ReferenceMap` to `MapMaker` to optimize space, especially with 
regard to computations (no need for a `Future` wrapper). Unfortunately, soft 
references result in awful performance outside of a micro-benchmark because 
they cause full GCs. In parallel, I had been experimenting with approaches for 
a concurrent LRU cache 
([CLHM](https://github.com/ben-manes/concurrentlinkedhashmap)), and when I 
joined Google I helped retrofit its ideas onto Guava. A lot of good came out 
of that, but I left before working on performance optimizations.

Java 8 provided an excuse to start from scratch. Caffeine is much faster 
and packs in even more features. I also spent time exploring eviction policies, 
which led to co-authoring a paper on a new technique called TinyLFU. It has a 
near-optimal hit rate, low memory footprint, and amortized O(1) overhead. This 
is done by tracking frequency in a popularity sketch. The same concurrency 
model as in CLHM and Guava is used (inspired by a write-ahead log), which 
allows for concurrent O(1) reads and writes.
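The popularity-sketch idea is roughly the following (an illustrative count-min style sketch with invented names, not Caffeine's actual implementation): each access increments one counter per hash row, and the frequency estimate is the minimum across rows, which may over-count on collisions but never under-counts.

```java
import java.util.Random;

// Minimal count-min style popularity sketch, illustrating the idea behind
// TinyLFU's frequency tracking.
class PopularitySketch {
    private final int[][] counts;
    private final int[] seeds;

    PopularitySketch(int rows, int width) {
        counts = new int[rows][width];
        seeds = new int[rows];
        Random r = new Random(42);      // fixed seed for reproducibility
        for (int i = 0; i < rows; i++) {
            seeds[i] = r.nextInt();
        }
    }

    private int index(Object key, int row) {
        int h = key.hashCode() ^ seeds[row];
        h ^= h >>> 16;                  // spread the hash bits
        return Math.abs(h % counts[row].length);
    }

    void increment(Object key) {
        for (int i = 0; i < counts.length; i++) {
            counts[i][index(key, i)]++;
        }
    }

    int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < counts.length; i++) {
            min = Math.min(min, counts[i][index(key, i)]);
        }
        return min;     // may over-estimate on collisions, never under-estimates
    }
}
```

TinyLFU uses estimates like these for admission: on a size-based eviction, the new candidate is admitted only if its estimated frequency exceeds the would-be victim's, which is what keeps low-value entries from polluting the cache.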

The [HighScalability 
article](http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html)
 provides an overview of the algorithms that I use.




[GitHub] storm issue #1783: STORM-2204 Adding caching capabilities in HBaseLookupBolt

2016-11-21 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/storm/pull/1783
  
@revans2 perhaps [Caffeine](https://github.com/ben-manes/caffeine)? 

