[GitHub] storm pull request #2218: STORM-2614: Enhance stateful windowing to persist ...
Github user ben-manes commented on a diff in the pull request:

https://github.com/apache/storm/pull/2218#discussion_r129211231

--- Diff: storm-client/src/jvm/org/apache/storm/topology/SimpleWindowPartitionCache.java ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.storm.topology;
+
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ConcurrentSkipListMap;
+import java.util.concurrent.locks.ReentrantLock;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A simple implementation that evicts the largest un-pinned entry from the cache. This works well
+ * for caching window partitions since the access pattern is mostly sequential scans.
+ */
+public class SimpleWindowPartitionCache<K, V> implements WindowPartitionCache<K, V> {
+    private static final Logger LOG = LoggerFactory.getLogger(SimpleWindowPartitionCache.class);
+
+    private final ConcurrentSkipListMap<K, V> map = new ConcurrentSkipListMap<>();
+    private final Map<K, Integer> pinned = new HashMap<>();
+    private final long maximumSize;
+    private final RemovalListener<K, V> removalListener;
+    private final CacheLoader<K, V> cacheLoader;
+    private final ReentrantLock lock = new ReentrantLock(true);
+    private int size;
+
+    @Override
+    public V get(K key) {
+        return getOrLoad(key, false);
+    }
+
+    @Override
+    public V getPinned(K key) {
+        return getOrLoad(key, true);
+    }
+
+    @Override
+    public boolean unpin(K key) {
+        LOG.debug("unpin '{}'", key);
+        boolean res = false;
+        try {
+            lock.lock();
+            if (pinned.compute(key, (k, v) -> v == null ? 0 : v - 1) <= 0) {
+                pinned.remove(key);
+                res = true;
+            }
+        } finally {
+            lock.unlock();
+        }
+        LOG.debug("pinned '{}'", pinned);
+        return res;
+    }
+
+    @Override
+    public ConcurrentMap<K, V> asMap() {
+        return map;
+    }
+
+    @Override
+    public void invalidate(K key) {
+        if (isPinned(key)) {
+            LOG.debug("Entry '{}' is pinned, skipping invalidation", key);
+        } else {
+            LOG.debug("Invalidating entry '{}'", key);
+            V val = map.remove(key);
+            try {
+                lock.lock();
+                --size;
--- End diff --

if the key isn't in the cache (e.g. concurrent invalidations) then this will decrease incorrectly. I think you need to check `val` and skip the locking code if not present.

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
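A minimal sketch of the fix being suggested (the class and method names are illustrative, not the actual Storm patch): when `map.remove` returns null the key was already gone, e.g. a concurrent invalidation won the race, so the size bookkeeping must be skipped rather than decremented a second time.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative stand-in for the invalidate() bookkeeping discussed in the review.
class InvalidateSketch<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();
    private final ReentrantLock lock = new ReentrantLock(true);
    private int size; // guarded by lock

    void put(K key, V value) {
        if (map.put(key, value) == null) {
            lock.lock();
            try {
                ++size;
            } finally {
                lock.unlock();
            }
        }
    }

    void invalidate(K key) {
        V val = map.remove(key);
        if (val == null) {
            // Key was not present (e.g. a concurrent invalidation won the race):
            // skip the bookkeeping so size is not decremented twice.
            return;
        }
        lock.lock();
        try {
            --size;
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return size;
        } finally {
            lock.unlock();
        }
    }
}
```

Note the idiomatic `lock.lock()` before the `try` block, so the `finally` never tries to unlock a lock that was not acquired.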
[GitHub] storm issue #2218: STORM-2614: Enhance stateful windowing to persist the win...
Github user ben-manes commented on the issue:

https://github.com/apache/storm/pull/2218

fyi, you can emulate pinning by setting the entry's weight to zero. The entry will then not be evicted, but remains eligible for expiration, etc. Of course, zero-weight entries mean the cache allows more entries overall, whereas pinning might still restrict overall capacity. But it's not clear how a cache should handle evictions when not enough space can be freed from unpinned entries.

The other approach is less elegant. A `CacheWriter` can intercept an eviction, and a `CacheLoader` intercepts gets to fetch a missing entry. Since recursive computations are disallowed, the writer could re-put the entry asynchronously. This races with a `get`, so the loader could fetch from the map prior to some other data store. In some ways this is not too dissimilar to your own pinned cache.

In general, pinning breaks the caching model of recomputable transient state, and the eviction policy will try to avoid pollution by removing low-value entries early. Pinning can lead to O(n) evictions unless pinned entries are maintained in a dedicated queue so that they are not evaluated as potential victims. Since pinning vs. capacity is confusing and the use-cases are unclear, Caffeine currently favors zero weights. Hope that helps. I agree your own logic may be more straightforward and preferable.
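A minimal sketch of the zero-weight approach described above (the `pinnedKeys` set and class name are illustrative, not part of the Caffeine API): entries whose weigher returns zero are never selected for size-based eviction, so they behave as pinned while the rest of the cache stays bounded.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class PinnedWeightExample {
    // Returns the cached value for the "pinned" key after flooding the cache
    // far past its weight capacity, showing that the zero-weight entry survives.
    static String pinnedValueAfterOverflow() {
        Set<String> pinnedKeys = ConcurrentHashMap.newKeySet();
        pinnedKeys.add("pinned");

        Cache<String, String> cache = Caffeine.newBuilder()
                .maximumWeight(2)
                // Pinned keys weigh nothing, so size eviction never selects them.
                .weigher((String k, String v) -> pinnedKeys.contains(k) ? 0 : 1)
                .executor(Runnable::run) // run maintenance on the caller thread for determinism
                .build();

        cache.put("pinned", "v");
        for (int i = 0; i < 100; i++) {
            cache.put("k" + i, "v" + i);
        }
        cache.cleanUp();
        return cache.getIfPresent("pinned");
    }

    public static void main(String[] args) {
        System.out.println(pinnedValueAfterOverflow());
    }
}
```

As the comment notes, the trade-off is that zero-weight entries do not count toward `maximumWeight`, so the cache can hold more entries overall than a true pinned-plus-capacity design would allow.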
[GitHub] storm issue #1783: STORM-2204 Adding caching capabilities in HBaseLookupBolt
Github user ben-manes commented on the issue:

https://github.com/apache/storm/pull/1783

Should fail, e.g. Class version error.
[GitHub] storm issue #1783: STORM-2204 Adding caching capabilities in HBaseLookupBolt
Github user ben-manes commented on the issue:

https://github.com/apache/storm/pull/1783

Nope. Sorry, since your compilation target is 1.8 I hadn't thought you'd need that.
[GitHub] storm issue #1783: STORM-2204 Adding caching capabilities in HBaseLookupBolt
Github user ben-manes commented on the issue:

https://github.com/apache/storm/pull/1783

I was a co-author of Guava's cache, too. Guava had originally considered soft references an ideal caching scheme, since they offer great concurrency and leave memory management to the GC. That evolved from `ReferenceMap` to `MapMaker` to optimize space, especially in regard to computations (no need for a `Future` wrapper). Unfortunately, soft references result in awful performance outside of a micro-benchmark because they cause full GCs.

In parallel, I had been experimenting with approaches for a concurrent LRU cache ([CLHM](https://github.com/ben-manes/concurrentlinkedhashmap)), and when I joined Google I helped retrofit its ideas onto Guava. A lot of good came out of that, but I left before working on optimizing for performance. Java 8 provided an excuse to start from scratch; Caffeine is much faster and packs in even more features.

I also spent time exploring eviction policies, which led to co-authoring a paper on a new technique called TinyLFU. It has a near-optimal hit rate, a low memory footprint, and amortized O(1) overhead, achieved by tracking frequency in a popularity sketch. The same concurrency model as in CLHM and Guava is used (inspired by a write-ahead log), which allows concurrent O(1) reads and writes. The [HighScalability article](http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html) provides an overview of the algorithms that I use.
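The popularity sketch mentioned above can be illustrated with a toy count-min sketch. This is a deliberately simplified stand-in for illustration, not Caffeine's actual `FrequencySketch` (which packs 4-bit counters and periodically halves them to age out stale popularity):

```java
// Toy count-min sketch: approximate per-key frequencies in fixed space with
// O(1) increment and estimate. Hash collisions can only inflate a counter,
// never deflate it, so taking the minimum across rows bounds the error.
class ToyCountMinSketch {
    private static final int[] SEEDS = {0x9E3779B9, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F};
    private final int[][] table;
    private final int width; // counters per row, rounded down to a power of two

    ToyCountMinSketch(int width) {
        this.width = Integer.highestOneBit(Math.max(2, width));
        this.table = new int[SEEDS.length][this.width];
    }

    private int index(Object key, int row) {
        int h = key.hashCode() * SEEDS[row];
        h ^= h >>> 16; // spread the high bits before masking
        return h & (width - 1);
    }

    void increment(Object key) {
        for (int row = 0; row < SEEDS.length; row++) {
            table[row][index(key, row)]++;
        }
    }

    int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < SEEDS.length; row++) {
            min = Math.min(min, table[row][index(key, row)]);
        }
        return min;
    }
}
```

A TinyLFU admission filter would compare `estimate(candidate)` against `estimate(victim)` on each eviction and admit the candidate only if it is the more popular of the two, which is what yields the near-optimal hit rate at O(1) cost.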
[GitHub] storm issue #1783: STORM-2204 Adding caching capabilities in HBaseLookupBolt
Github user ben-manes commented on the issue:

https://github.com/apache/storm/pull/1783

@revans2 perhaps [Caffeine](https://github.com/ben-manes/caffeine)?