yifan-c commented on code in PR #91: URL: https://github.com/apache/cassandra-sidecar/pull/91#discussion_r1462639265
########## src/main/java/org/apache/cassandra/sidecar/cache/ToggleableCache.java: ########## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.cassandra.sidecar.cache; + +import java.util.Map; +import java.util.Objects; +import java.util.concurrent.ConcurrentMap; +import java.util.function.BooleanSupplier; +import java.util.function.Function; + +import com.github.benmanes.caffeine.cache.Cache; +import com.github.benmanes.caffeine.cache.Policy; +import com.github.benmanes.caffeine.cache.stats.CacheStats; +import org.checkerframework.checker.index.qual.NonNegative; +import org.checkerframework.checker.nullness.qual.NonNull; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * A {@link Cache} where methods that take a mapping {@link Function} as a parameter can be toggled + * (enabled / disabled) such that when the cache is disabled, the {@link Function} will be called + * directly without retrieving cached values for any {@code key}, and when the cache is enabled the + * behavior remains the same as the provided behavior by the {@code delegate}. + * + * <p>When the cache is toggled to disabled, existing entries in the cache will <b>NOT</b> be + * removed, and if the cache is re-enabled and entries have not expired or have not been explicitly + * removed; the cache will serve the existing data from the {@code delegate} cache. + * + * @param <K> the type of keys maintained by this cache + * @param <V> the type of mapped values + */ +public class ToggleableCache<K, V> implements Cache<K, V> +{ + private final Cache<K, V> delegate; + private final BooleanSupplier cacheEnabledSupplier; + + public ToggleableCache(Cache<K, V> delegate, BooleanSupplier cacheEnabledSupplier) + { + this.delegate = Objects.requireNonNull(delegate, "delegate is required"); + this.cacheEnabledSupplier = Objects.requireNonNull(cacheEnabledSupplier, "toggleSupplier is required"); + } + + /** + * {@inheritDoc} + */ + @Override + @Nullable + public V getIfPresent(@NonNull Object key) + { + return delegate.getIfPresent(key); + } + + /** + * When the cache is enabled, returns the value associated with the {@code key} in this cache, + * obtaining that value from the {@code mappingFunction} if necessary. When the cache is disabled, + * it will always obtain the value from the {@code mappingFunction}. + * + * @param key the key with which the specified value is to be associated + * @param mappingFunction the function to compute a value Review Comment: never a fan of the name `mappingFunction`, although it is named the same as in the guava lib. The name does not tell anything unless looking at the description. I would rename it to `valueLoader`. But it is just a nit and feel free to ignore. ########## src/main/java/org/apache/cassandra/sidecar/config/yaml/ServiceConfigurationImpl.java: ########## @@ -458,6 +501,18 @@ public Builder ssTableImportConfiguration(SSTableImportConfiguration ssTableImpo return update(b -> b.ssTableImportConfiguration = ssTableImportConfiguration); } + /** + * Sets the {@code ssTableSnapshotConfiguration} and returns a reference to this Builder enabling + * method chaining. + * + * @param ssTableSnapshotConfiguration the {@code ssTableSnapshotConfiguration} to set + * @return a reference to this Builder + */ + public Builder ssTableSnapshotConfiguration(SSTableSnapshotConfiguration ssTableSnapshotConfiguration) Review Comment: nit: treat `sstable` as a single word. `ssTable` does not look right. ########## src/main/java/org/apache/cassandra/sidecar/routes/FileStreamHandler.java: ########## @@ -74,40 +103,114 @@ protected String extractParamsOrThrow(RoutingContext context) return context.get(FILE_PATH_CONTEXT_KEY); } + @Override + protected void processFailure(Throwable cause, + RoutingContext context, + String host, + SocketAddress remoteAddress, + String localFile) + { + IOException fileNotFoundException = ThrowableUtils.getCause(cause, NoSuchFileException.class); + + if (fileNotFoundException == null) + { + // FileNotFoundException comes from the stream method + fileNotFoundException = ThrowableUtils.getCause(cause, FileNotFoundException.class); + } + + if (fileNotFoundException != null) + { + logger.error("The requested file '{}' does not exist", localFile); + context.fail(wrapHttpException(NOT_FOUND, "The requested file does not exist")); + return; + } + + super.processFailure(cause, context, host, remoteAddress, localFile); + } + /** - * Ensures that the file exists and is a non-empty regular file + * Retrieves the Future for the validation operation. If the cache was initialized, it will retrieve the + * Future from the cache. * * @param fs The underlying filesystem * @param localFile The path the file in the filesystem - * @param exists Whether the file exists or not * @return a succeeded future with the {@link FileProps}, or a failed future if the file does not exist; * is not a regular file; or if the file is empty */ - private Future<FileProps> ensureValidFile(FileSystem fs, String localFile, Boolean exists) + protected Future<FileProps> ensureValidFile(FileSystem fs, String localFile) { - if (!exists) + if (filePropsCache != null) { - logger.error("The requested file '{}' does not exist", localFile); - return Future.failedFuture(wrapHttpException(NOT_FOUND, "The requested file does not exist")); + FileProps fileProps = filePropsCache.getIfPresent(localFile); + + if (fileProps != null) + { + return Future.succeededFuture(fileProps); + } + + return ensureValidFileNonCached(fs).apply(localFile) Review Comment: I am not convinced on the usefulness of the cache. 1. When the cache is update to date, the cache helps to save 1 look up of the file attributes. However, during `sendFile`, there are more I/O operations. 2. When the cache is stale, the file does not exist. It is going to acquire permit and try to `sendFile` falsely, which could undesirably increase CPU usage. ########## src/main/java/org/apache/cassandra/sidecar/snapshots/CachedSnapshotPathBuilder.java: ########## @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.cassandra.sidecar.snapshots; + +import java.util.List; +import java.util.concurrent.TimeUnit; + +import com.github.benmanes.caffeine.cache.Cache; +import com.github.benmanes.caffeine.cache.Caffeine; +import io.vertx.core.Future; +import io.vertx.core.Vertx; +import org.apache.cassandra.sidecar.cache.ToggleableCache; +import org.apache.cassandra.sidecar.cluster.InstancesConfig; +import org.apache.cassandra.sidecar.common.data.QualifiedTableName; +import org.apache.cassandra.sidecar.concurrent.ExecutorPools; +import org.apache.cassandra.sidecar.config.CacheConfiguration; +import org.apache.cassandra.sidecar.config.ServiceConfiguration; +import org.apache.cassandra.sidecar.data.SnapshotRequest; +import org.apache.cassandra.sidecar.data.StreamSSTableComponentRequest; +import org.apache.cassandra.sidecar.utils.CassandraInputValidator; +import org.jetbrains.annotations.VisibleForTesting; + + +/** + * A {@link SnapshotPathBuilder} implementation that uses caches to optimize filesystem access patterns + * while a bulk reader client is using Sidecar to stream SSTable components allowing for faster lookups. + */ +public class CachedSnapshotPathBuilder extends SnapshotPathBuilder +{ + /** + * The table directory cache maintains a cache of recently used table directories. This cache + * avoids having to traverse the filesystem searching for the table directory while running + * bulk reads for example. Since bulk reads can stream hundreds of SSTables from the table directory, + * this cache helps avoid having to resolve the table directory for every SSTable. + */ + @VisibleForTesting + final Cache<String, Future<String>> tableDirCache; + + /** + * The snapshot list cache maintains a cache of recently listed snapshot files under a snapshot directory. + * This cache avoids having to access the filesystem every time a bulk reader client lists the snapshot + * directory. + */ + @VisibleForTesting + final Cache<String, Future<List<SnapshotFile>>> snapshotListCache; + + /** + * The snapshot path cache maintains a cache of recently streamed snapshot SSTable components. This cache + * avoids having to resolve the snapshot SSTable component file during bulk reads. Since bulk reads stream + * sub-ranges of an SSTable component, the resolution can happen multiple times during bulk reads for a single + * SSTable component. This cache aims to reduce filesystem access to resolve the SSTable component path + * during bulk reads. + */ + @VisibleForTesting + final Cache<String, Future<String>> snapshotPathCache; Review Comment: Similar question as for `fileProps` cache, what if the value is stable? ########## src/main/java/org/apache/cassandra/sidecar/cache/ToggleableCache.java: ########## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.cassandra.sidecar.cache; + +import java.util.Map; +import java.util.Objects; +import java.util.concurrent.ConcurrentMap; +import java.util.function.BooleanSupplier; +import java.util.function.Function; + +import com.github.benmanes.caffeine.cache.Cache; +import com.github.benmanes.caffeine.cache.Policy; +import com.github.benmanes.caffeine.cache.stats.CacheStats; +import org.checkerframework.checker.index.qual.NonNegative; +import org.checkerframework.checker.nullness.qual.NonNull; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * A {@link Cache} where methods that take a mapping {@link Function} as a parameter can be toggled + * (enabled / disabled) such that when the cache is disabled, the {@link Function} will be called + * directly without retrieving cached values for any {@code key}, and when the cache is enabled the + * behavior remains the same as the provided behavior by the {@code delegate}. + * + * <p>When the cache is toggled to disabled, existing entries in the cache will <b>NOT</b> be + * removed, and if the cache is re-enabled and entries have not expired or have not been explicitly + * removed; the cache will serve the existing data from the {@code delegate} cache. Review Comment: > if the cache is re-enabled and entries have not expired or have not been explicitly removed I do not think it is a complete sentence. Do you meant to say "if the cache is re-enabled, the entries have not expired or have not been explicitly removed will become accessible again"? ########## src/main/java/org/apache/cassandra/sidecar/models/HttpResponse.java: ########## @@ -97,17 +97,23 @@ public void setInternalErrorStatus(String msg) */ public Future<Void> sendFile(String fileName, long fileLength, HttpRange range) { - // notify client we support range requests - response.putHeader(HttpHeaderNames.ACCEPT_RANGES, "bytes"); - - if (range.length() != fileLength) - { - setPartialContentStatus(range); - } - - return response.putHeader(HttpHeaderNames.CONTENT_TYPE, HttpHeaderValues.APPLICATION_OCTET_STREAM) - .putHeader(HttpHeaderNames.CONTENT_LENGTH, Long.toString(range.length())) - .sendFile(fileName, range.start(), range.length()); + // Defer setting headers in case of an error before sending the file + response.headersEndHandler(v -> { + // notify client we support range requests + response.putHeader(HttpHeaderNames.ACCEPT_RANGES, "bytes"); + + if (range.length() != fileLength) + { + setPartialContentStatus(range); + } + response.putHeader(HttpHeaderNames.CONTENT_TYPE, HttpHeaderValues.APPLICATION_OCTET_STREAM) + .putHeader(HttpHeaderNames.CONTENT_LENGTH, Long.toString(range.length())); Review Comment: Those 2 headers are put as part of `sendFile`. See `io.vertx.core.http.impl.Http2ServerResponse#sendFile:line#619-627`. And then at line#628 `checkSendHeaders(false);` the `headersEndHandler` is called. Other than that, the `CONTENT_LENGTH` might not always be the `range.length()`. It is the minimum of `Math.min(length, file_.length() - offset);` as implemented in `io.vertx.core.http.impl.HttpUtils#resolveFile` ########## src/main/java/org/apache/cassandra/sidecar/cache/ToggleableCache.java: ########## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.cassandra.sidecar.cache; + +import java.util.Map; +import java.util.Objects; +import java.util.concurrent.ConcurrentMap; +import java.util.function.BooleanSupplier; +import java.util.function.Function; + +import com.github.benmanes.caffeine.cache.Cache; +import com.github.benmanes.caffeine.cache.Policy; +import com.github.benmanes.caffeine.cache.stats.CacheStats; +import org.checkerframework.checker.index.qual.NonNegative; +import org.checkerframework.checker.nullness.qual.NonNull; +import org.checkerframework.checker.nullness.qual.Nullable; + +/** + * A {@link Cache} where methods that take a mapping {@link Function} as a parameter can be toggled + * (enabled / disabled) such that when the cache is disabled, the {@link Function} will be called + * directly without retrieving cached values for any {@code key}, and when the cache is enabled the + * behavior remains the same as the provided behavior by the {@code delegate}. + * + * <p>When the cache is toggled to disabled, existing entries in the cache will <b>NOT</b> be + * removed, and if the cache is re-enabled and entries have not expired or have not been explicitly + * removed; the cache will serve the existing data from the {@code delegate} cache. + * + * @param <K> the type of keys maintained by this cache + * @param <V> the type of mapped values + */ +public class ToggleableCache<K, V> implements Cache<K, V> Review Comment: The semantics of `Toggleable` implemented for this class is not clear to me. 1. when the cache is disabled, the cache still serves read. Then, what exactly is "disabled"? 2. `getIfPresent` and `get` have different behavior. The former evaluate from cache regardlessly. The later, however, honor the flag value. 3. put, mutate and the other meta methods does not honor the value of the flag at all. Again the question, what does "disabled" mean? In the current implementation, I do not see how disabling a cache has any benefit, since it allows most of the operations anyway. IMO, if enabled, the cache can memorize values; if disabled, no operations are permitted at all. Let's have a clear definition of enabled and disabled, then refine the implementation. Or maybe you did not mean "toggleable". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]

