yifan-c commented on code in PR #91:
URL: https://github.com/apache/cassandra-sidecar/pull/91#discussion_r1462639265


##########
src/main/java/org/apache/cassandra/sidecar/cache/ToggleableCache.java:
##########
@@ -0,0 +1,227 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.sidecar.cache;
+
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.ConcurrentMap;
+import java.util.function.BooleanSupplier;
+import java.util.function.Function;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Policy;
+import com.github.benmanes.caffeine.cache.stats.CacheStats;
+import org.checkerframework.checker.index.qual.NonNegative;
+import org.checkerframework.checker.nullness.qual.NonNull;
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+/**
+ * A {@link Cache} where methods that take a mapping {@link Function} as a 
parameter can be toggled
+ * (enabled / disabled) such that when the cache is disabled, the {@link 
Function} will be called
+ * directly without retrieving cached values for any {@code key}, and when the 
cache is enabled the
+ * behavior remains the same as the provided behavior by the {@code delegate}.
+ *
+ * <p>When the cache is toggled to disabled, existing entries in the cache 
will <b>NOT</b> be
+ * removed, and if the cache is re-enabled and entries have not expired or 
have not been explicitly
+ * removed; the cache will serve the existing data from the {@code delegate} 
cache.
+ *
+ * @param <K> the type of keys maintained by this cache
+ * @param <V> the type of mapped values
+ */
+public class ToggleableCache<K, V> implements Cache<K, V>
+{
+    private final Cache<K, V> delegate;
+    private final BooleanSupplier cacheEnabledSupplier;
+
+    public ToggleableCache(Cache<K, V> delegate, BooleanSupplier 
cacheEnabledSupplier)
+    {
+        this.delegate = Objects.requireNonNull(delegate, "delegate is 
required");
+        this.cacheEnabledSupplier = 
Objects.requireNonNull(cacheEnabledSupplier, "toggleSupplier is required");
+    }
+
+    /**
+     * {@inheritDoc}
+     */
+    @Override
+    @Nullable
+    public V getIfPresent(@NonNull Object key)
+    {
+        return delegate.getIfPresent(key);
+    }
+
+    /**
+     * When the cache is enabled, returns the value associated with the {@code 
key} in this cache,
+     * obtaining that value from the {@code mappingFunction} if necessary. 
When the cache is disabled,
+     * it will always obtain the value from the {@code mappingFunction}.
+     *
+     * @param key             the key with which the specified value is to be 
associated
+     * @param mappingFunction the function to compute a value

Review Comment:
   never a fan of the name `mappingFunction`, although it is named the same as 
in the guava lib. The name does not tell anything unless looking at the 
description. I would rename it to `valueLoader`. 
   But it is just a nit and feel free to ignore. 



##########
src/main/java/org/apache/cassandra/sidecar/config/yaml/ServiceConfigurationImpl.java:
##########
@@ -458,6 +501,18 @@ public Builder 
ssTableImportConfiguration(SSTableImportConfiguration ssTableImpo
             return update(b -> b.ssTableImportConfiguration = 
ssTableImportConfiguration);
         }
 
+        /**
+         * Sets the {@code ssTableSnapshotConfiguration} and returns a 
reference to this Builder enabling
+         * method chaining.
+         *
+         * @param ssTableSnapshotConfiguration the {@code 
ssTableSnapshotConfiguration} to set
+         * @return a reference to this Builder
+         */
+        public Builder 
ssTableSnapshotConfiguration(SSTableSnapshotConfiguration 
ssTableSnapshotConfiguration)

Review Comment:
   nit: treat `sstable` as a single word. `ssTable` does not look right.



##########
src/main/java/org/apache/cassandra/sidecar/routes/FileStreamHandler.java:
##########
@@ -74,40 +103,114 @@ protected String extractParamsOrThrow(RoutingContext 
context)
         return context.get(FILE_PATH_CONTEXT_KEY);
     }
 
+    @Override
+    protected void processFailure(Throwable cause,
+                                  RoutingContext context,
+                                  String host,
+                                  SocketAddress remoteAddress,
+                                  String localFile)
+    {
+        IOException fileNotFoundException = ThrowableUtils.getCause(cause, 
NoSuchFileException.class);
+
+        if (fileNotFoundException == null)
+        {
+            // FileNotFoundException comes from the stream method
+            fileNotFoundException = ThrowableUtils.getCause(cause, 
FileNotFoundException.class);
+        }
+
+        if (fileNotFoundException != null)
+        {
+            logger.error("The requested file '{}' does not exist", localFile);
+            context.fail(wrapHttpException(NOT_FOUND, "The requested file does 
not exist"));
+            return;
+        }
+
+        super.processFailure(cause, context, host, remoteAddress, localFile);
+    }
+
     /**
-     * Ensures that the file exists and is a non-empty regular file
+     * Retrieves the Future for the validation operation. If the cache was 
initialized, it will retrieve the
+     * Future from the cache.
      *
      * @param fs        The underlying filesystem
      * @param localFile The path the file in the filesystem
-     * @param exists    Whether the file exists or not
      * @return a succeeded future with the {@link FileProps}, or a failed 
future if the file does not exist;
      * is not a regular file; or if the file is empty
      */
-    private Future<FileProps> ensureValidFile(FileSystem fs, String localFile, 
Boolean exists)
+    protected Future<FileProps> ensureValidFile(FileSystem fs, String 
localFile)
     {
-        if (!exists)
+        if (filePropsCache != null)
         {
-            logger.error("The requested file '{}' does not exist", localFile);
-            return Future.failedFuture(wrapHttpException(NOT_FOUND, "The 
requested file does not exist"));
+            FileProps fileProps = filePropsCache.getIfPresent(localFile);
+
+            if (fileProps != null)
+            {
+                return Future.succeededFuture(fileProps);
+            }
+
+            return ensureValidFileNonCached(fs).apply(localFile)

Review Comment:
   I am not convinced on the usefulness of the cache.
   
   1. When the cache is update to date, the cache helps to save 1 look up of 
the file attributes. However, during `sendFile`, there are more I/O operations. 
   2. When the cache is stale, the file does not exist. It is going to acquire 
permit and try to `sendFile` falsely, which could undesirably increase CPU 
usage. 



##########
src/main/java/org/apache/cassandra/sidecar/snapshots/CachedSnapshotPathBuilder.java:
##########
@@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.sidecar.snapshots;
+
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import io.vertx.core.Future;
+import io.vertx.core.Vertx;
+import org.apache.cassandra.sidecar.cache.ToggleableCache;
+import org.apache.cassandra.sidecar.cluster.InstancesConfig;
+import org.apache.cassandra.sidecar.common.data.QualifiedTableName;
+import org.apache.cassandra.sidecar.concurrent.ExecutorPools;
+import org.apache.cassandra.sidecar.config.CacheConfiguration;
+import org.apache.cassandra.sidecar.config.ServiceConfiguration;
+import org.apache.cassandra.sidecar.data.SnapshotRequest;
+import org.apache.cassandra.sidecar.data.StreamSSTableComponentRequest;
+import org.apache.cassandra.sidecar.utils.CassandraInputValidator;
+import org.jetbrains.annotations.VisibleForTesting;
+
+
+/**
+ * A {@link SnapshotPathBuilder} implementation that uses caches to optimize 
filesystem access patterns
+ * while a bulk reader client is using Sidecar to stream SSTable components 
allowing for faster lookups.
+ */
+public class CachedSnapshotPathBuilder extends SnapshotPathBuilder
+{
+    /**
+     * The table directory cache maintains a cache of recently used table 
directories. This cache
+     * avoids having to traverse the filesystem searching for the table 
directory while running
+     * bulk reads for example. Since bulk reads can stream hundreds of 
SSTables from the table directory,
+     * this cache helps avoid having to resolve the table directory for every 
SSTable.
+     */
+    @VisibleForTesting
+    final Cache<String, Future<String>> tableDirCache;
+
+    /**
+     * The snapshot list cache maintains a cache of recently listed snapshot 
files under a snapshot directory.
+     * This cache avoids having to access the filesystem every time a bulk 
reader client lists the snapshot
+     * directory.
+     */
+    @VisibleForTesting
+    final Cache<String, Future<List<SnapshotFile>>> snapshotListCache;
+
+    /**
+     * The snapshot path cache maintains a cache of recently streamed snapshot 
SSTable components. This cache
+     * avoids having to resolve the snapshot SSTable component file during 
bulk reads. Since bulk reads stream
+     * sub-ranges of an SSTable component, the resolution can happen multiple 
times during bulk reads for a single
+     * SSTable component. This cache aims to reduce filesystem access to 
resolve the SSTable component path
+     * during bulk reads.
+     */
+    @VisibleForTesting
+    final Cache<String, Future<String>> snapshotPathCache;

Review Comment:
   Similar question as for `fileProps` cache, what if the value is stable?



##########
src/main/java/org/apache/cassandra/sidecar/cache/ToggleableCache.java:
##########
@@ -0,0 +1,227 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.sidecar.cache;
+
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.ConcurrentMap;
+import java.util.function.BooleanSupplier;
+import java.util.function.Function;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Policy;
+import com.github.benmanes.caffeine.cache.stats.CacheStats;
+import org.checkerframework.checker.index.qual.NonNegative;
+import org.checkerframework.checker.nullness.qual.NonNull;
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+/**
+ * A {@link Cache} where methods that take a mapping {@link Function} as a 
parameter can be toggled
+ * (enabled / disabled) such that when the cache is disabled, the {@link 
Function} will be called
+ * directly without retrieving cached values for any {@code key}, and when the 
cache is enabled the
+ * behavior remains the same as the provided behavior by the {@code delegate}.
+ *
+ * <p>When the cache is toggled to disabled, existing entries in the cache 
will <b>NOT</b> be
+ * removed, and if the cache is re-enabled and entries have not expired or 
have not been explicitly
+ * removed; the cache will serve the existing data from the {@code delegate} 
cache.

Review Comment:
   > if the cache is re-enabled and entries have not expired or have not been 
explicitly removed
   
   I do not think it is a complete sentence. Do you meant to say "if the cache 
is re-enabled, the entries have not expired or have not been explicitly removed 
will become accessible again"?



##########
src/main/java/org/apache/cassandra/sidecar/models/HttpResponse.java:
##########
@@ -97,17 +97,23 @@ public void setInternalErrorStatus(String msg)
      */
     public Future<Void> sendFile(String fileName, long fileLength, HttpRange 
range)
     {
-        // notify client we support range requests
-        response.putHeader(HttpHeaderNames.ACCEPT_RANGES, "bytes");
-
-        if (range.length() != fileLength)
-        {
-            setPartialContentStatus(range);
-        }
-
-        return response.putHeader(HttpHeaderNames.CONTENT_TYPE, 
HttpHeaderValues.APPLICATION_OCTET_STREAM)
-                       .putHeader(HttpHeaderNames.CONTENT_LENGTH, 
Long.toString(range.length()))
-                       .sendFile(fileName, range.start(), range.length());
+        // Defer setting headers in case of an error before sending the file
+        response.headersEndHandler(v -> {
+            // notify client we support range requests
+            response.putHeader(HttpHeaderNames.ACCEPT_RANGES, "bytes");
+
+            if (range.length() != fileLength)
+            {
+                setPartialContentStatus(range);
+            }
+            response.putHeader(HttpHeaderNames.CONTENT_TYPE, 
HttpHeaderValues.APPLICATION_OCTET_STREAM)
+                    .putHeader(HttpHeaderNames.CONTENT_LENGTH, 
Long.toString(range.length()));

Review Comment:
   Those 2 headers are put as part of `sendFile`. See 
`io.vertx.core.http.impl.Http2ServerResponse#sendFile:line#619-627`. And then 
at line#628 `checkSendHeaders(false);` the `headersEndHandler` is called. 
   
   Other than that, the `CONTENT_LENGTH` might not always be the 
`range.length()`. It is the minimum of `Math.min(length, file_.length() - 
offset);` as implemented in `io.vertx.core.http.impl.HttpUtils#resolveFile`



##########
src/main/java/org/apache/cassandra/sidecar/cache/ToggleableCache.java:
##########
@@ -0,0 +1,227 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.sidecar.cache;
+
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.ConcurrentMap;
+import java.util.function.BooleanSupplier;
+import java.util.function.Function;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Policy;
+import com.github.benmanes.caffeine.cache.stats.CacheStats;
+import org.checkerframework.checker.index.qual.NonNegative;
+import org.checkerframework.checker.nullness.qual.NonNull;
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+/**
+ * A {@link Cache} where methods that take a mapping {@link Function} as a 
parameter can be toggled
+ * (enabled / disabled) such that when the cache is disabled, the {@link 
Function} will be called
+ * directly without retrieving cached values for any {@code key}, and when the 
cache is enabled the
+ * behavior remains the same as the provided behavior by the {@code delegate}.
+ *
+ * <p>When the cache is toggled to disabled, existing entries in the cache 
will <b>NOT</b> be
+ * removed, and if the cache is re-enabled and entries have not expired or 
have not been explicitly
+ * removed; the cache will serve the existing data from the {@code delegate} 
cache.
+ *
+ * @param <K> the type of keys maintained by this cache
+ * @param <V> the type of mapped values
+ */
+public class ToggleableCache<K, V> implements Cache<K, V>

Review Comment:
   The semantics of `Toggleable` implemented for this class is not clear to me. 
   1. when the cache is disabled, the cache still serves read. Then, what 
exactly is "disabled"?
   2. `getIfPresent` and `get` have different behavior. The former evaluate 
from cache regardlessly. The later, however, honor the flag value.
   3. put, mutate and the other meta methods does not honor the value of the 
flag at all. Again the question, what does "disabled" mean? 
   
   In the current implementation, I do not see how disabling a cache has any 
benefit, since it allows most of the operations anyway. IMO, if enabled, the 
cache can memorize values; if disabled, no operations are permitted at all. 
   
   Let's have a clear definition of enabled and disabled, then refine the 
implementation.  Or maybe you did not mean "toggleable". 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to