Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
danielcweeks commented on PR #15428: URL: https://github.com/apache/iceberg/pull/15428#issuecomment-4014002209 > > The only two requests that should be cached are HEAD and GET. > > But this rule isn't _enforced_. Nothing in the specs prevents a server from sending a `Cache-Control: private` header for other methods. And by doing so, it would break the client. I'm not saying it makes sense to do so, I'm saying that it's not fair for a server to break the client so easily. If the client's cache is designed to only handle these 2 methods and nothing else, I think the client should make sure to filter out other methods. It seems a bit pointless to me to require from servers to send the `Cache-Control` header when the client already knows what requests it can and cannot cache. I believe the default is that the client doesn't cache unless told to do so, which makes caching a server responsibility. While it might make sense to limit to just the two methods we expect, it should be the client's responsibility to fix a bad server implementation. Yes, the client would break, but it's really the server that needs to be fixed. > Yes but in fact, the most problematic scenario for me is a `GET` request with a `range` header. If a server decides to sign the `range` header (which is imho totally valid), the client would break. The prevailing philosophy is that "the server decides what to to sign," but in reality, the server's control appears limited due to potential client-side cache issues. Again, it appears to me that, if the client already knows that it would break if the server signs some header, it's best for the client to proactively remove that header from the request to sign. I think that's putting to much control in the clients hand and limits what functionality the server has in deciding what to sign for. If a client "hides" the range header, a server would only have the option to sign for everything or nothing. While in practice, I don't know of any implementation is protecting ranges of files, it is entirely feasible and since it's the servers responsibility to protect the data, it should have the final say on what it allows to be read. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
steveloughran commented on PR #15428: URL: https://github.com/apache/iceberg/pull/15428#issuecomment-4012759866 @adutra @danielcweeks I've gone back to the servlet-only decision about what to sign, but with an expanded set of headers: more SDK, more "aws-" service and some of the classic http protocol fluff. That's all I care about, as it's enough to reduce the risk that slightly different GET requests will match in the cache but be inconsistently signed. Alex's tests in TestS3V4RestSignerClient are retained, as are my ones in TestS3RestSigner, which expands validation of multipart upload completion and a few more opeations. It doesn't make any assertions on cache hit/miss though, due to the roll-back of `S3V4RestSignerClient`, instead it is just making sure that the operations work at all. I'm personally happy with just the expanded set of stripped headers, which this iteration provides. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
steveloughran commented on code in PR #15428: URL: https://github.com/apache/iceberg/pull/15428#discussion_r2895338139 ## aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java: ## @@ -72,11 +78,57 @@ public abstract class S3V4RestSignerClient static final String CACHE_CONTROL_PRIVATE = "private"; static final String CACHE_CONTROL_NO_CACHE = "no-cache"; - private static final Cache SIGNED_COMPONENT_CACHE = + @VisibleForTesting + static final Cache SIGNED_COMPONENT_CACHE = Caffeine.newBuilder().expireAfterWrite(30, TimeUnit.SECONDS).maximumSize(100).build(); + private static final AtomicInteger CACHE_HITS = new AtomicInteger(); Review Comment: happily. though I was thinking of another stat "entry in cache but header mismatch" as that's also of interest. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
steveloughran commented on code in PR #15428: URL: https://github.com/apache/iceberg/pull/15428#discussion_r2895338139 ## aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java: ## @@ -72,11 +78,57 @@ public abstract class S3V4RestSignerClient static final String CACHE_CONTROL_PRIVATE = "private"; static final String CACHE_CONTROL_NO_CACHE = "no-cache"; - private static final Cache SIGNED_COMPONENT_CACHE = + @VisibleForTesting + static final Cache SIGNED_COMPONENT_CACHE = Caffeine.newBuilder().expireAfterWrite(30, TimeUnit.SECONDS).maximumSize(100).build(); + private static final AtomicInteger CACHE_HITS = new AtomicInteger(); Review Comment: happily -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
adutra commented on PR #15428: URL: https://github.com/apache/iceberg/pull/15428#issuecomment-4011230555 > The only two requests that should be cached are HEAD and GET. But this rule isn't _enforced_. Nothing in the specs prevents a server from sending a `Cache-Control: private` header for other methods. And by doing so, it would break the client. I'm not saying it makes sense to do so, I'm saying that it's not fair for a server to break the client so easily. If the client's cache is designed to only handle these 2 methods and nothing else, I think the client should make sure to filter out other methods. It seems a bit pointless to me to require from servers to send the `Cache-Control` header when the client already knows what requests it can and cannot cache. > What led to this line of exploration appears to stem from trying to implement PUT or other methods, which should not be cached. Yes but in fact, the most problematic scenario for me is a `GET` request with a `range` header. If a server decides to sign the `range` header (which is imho totally valid), the client would break. The prevailing philosophy is that "the server decides what to to sign," but in reality, the server's control appears limited due to potential client-side cache issues. Again, it appears to me that, if the client already knows that it would break if the server signs some header, it's best for the client to proactively remove that header from the request to sign. But as I said, having now a good understanding of the limitations, I've already adapted Polaris to adhere to the implicit constraints. I'm leaving the ongoing work here for @steveloughran to continue, and will close #15171. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
adutra commented on code in PR #15428: URL: https://github.com/apache/iceberg/pull/15428#discussion_r2895311948 ## aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java: ## @@ -241,23 +319,31 @@ public SdkHttpFullRequest sign( AwsS3V4SignerParams signerParams = extractSignerParams(AwsS3V4SignerParams.builder(), executionAttributes).build(); +// Strip-off headers that should be signed Review Comment: ```suggestion // Strip-off headers that shouldn't be signed ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
adutra commented on code in PR #15428:
URL: https://github.com/apache/iceberg/pull/15428#discussion_r2895212304
##
aws/src/integration/java/org/apache/iceberg/aws/s3/signer/TestS3RestSigner.java:
##
@@ -202,24 +209,65 @@ private static Server initHttpServer() throws Exception {
return server;
}
+ /**
+ * Assert the cache hits and misses match the values.
+ *
+ * @param hits expected hits
+ * @param misses expected misses
+ */
+ private void assertCacheHitsAndMisses(int hits, int misses) {
+assertThat(S3V4RestSignerClient.cacheHits()).describedAs("Cache
hits").isEqualTo(hits);
+assertThat(S3V4RestSignerClient.cacheMisses()).describedAs("Cache
misses").isEqualTo(misses);
+ }
+
@Test
public void validateGetObject() {
s3.getObject(GetObjectRequest.builder().bucket(BUCKET).key("random/key").build());
-// signer caching should kick in when repeating the same request
-
s3.getObject(GetObjectRequest.builder().bucket(BUCKET).key("random/key").build());
+assertCacheHitsAndMisses(0, 1);
+
+// signer caching should kick in when repeating the same request with a
range
+
s3.getObject(GetObjectRequest.builder().bucket(BUCKET).key("random/key").range("0-10").build());
+assertCacheHitsAndMisses(1, 1);
+ }
+
+ @Test
+ public void validateHeadObjectUnsignedHeaders() {
+final HeadObjectResponse response =
+
s3.headObject(HeadObjectRequest.builder().bucket(BUCKET).key("random/key").build());
+assertCacheHitsAndMisses(0, 1);
+
+// the etag is passed in: the same object is returned and the same cached
signature is retained.
+// if the ifMatch header was cached, this would have resulted in a failure
as there would
+// be a signature mismatch.
+s3.headObject(
+HeadObjectRequest.builder()
+.bucket(BUCKET)
+.key("random/key")
+.ifMatch(response.eTag())
+.build());
+assertCacheHitsAndMisses(1, 1);
}
@Test
public void validatePutObject() {
+int hits = S3V4RestSignerClient.cacheHits();
Review Comment:
```suggestion
```
##
aws/src/integration/java/org/apache/iceberg/aws/s3/signer/TestS3RestSigner.java:
##
@@ -202,24 +209,65 @@ private static Server initHttpServer() throws Exception {
return server;
}
+ /**
+ * Assert the cache hits and misses match the values.
+ *
+ * @param hits expected hits
+ * @param misses expected misses
+ */
+ private void assertCacheHitsAndMisses(int hits, int misses) {
+assertThat(S3V4RestSignerClient.cacheHits()).describedAs("Cache
hits").isEqualTo(hits);
+assertThat(S3V4RestSignerClient.cacheMisses()).describedAs("Cache
misses").isEqualTo(misses);
+ }
+
@Test
public void validateGetObject() {
s3.getObject(GetObjectRequest.builder().bucket(BUCKET).key("random/key").build());
-// signer caching should kick in when repeating the same request
-
s3.getObject(GetObjectRequest.builder().bucket(BUCKET).key("random/key").build());
+assertCacheHitsAndMisses(0, 1);
+
+// signer caching should kick in when repeating the same request with a
range
+
s3.getObject(GetObjectRequest.builder().bucket(BUCKET).key("random/key").range("0-10").build());
+assertCacheHitsAndMisses(1, 1);
+ }
+
+ @Test
+ public void validateHeadObjectUnsignedHeaders() {
+final HeadObjectResponse response =
+
s3.headObject(HeadObjectRequest.builder().bucket(BUCKET).key("random/key").build());
+assertCacheHitsAndMisses(0, 1);
+
+// the etag is passed in: the same object is returned and the same cached
signature is retained.
+// if the ifMatch header was cached, this would have resulted in a failure
as there would
+// be a signature mismatch.
+s3.headObject(
+HeadObjectRequest.builder()
+.bucket(BUCKET)
+.key("random/key")
+.ifMatch(response.eTag())
+.build());
+assertCacheHitsAndMisses(1, 1);
}
@Test
public void validatePutObject() {
+int hits = S3V4RestSignerClient.cacheHits();
s3.putObject(
PutObjectRequest.builder().bucket(BUCKET).key("some/key").build(),
Paths.get("/etc/hosts"));
+assertCacheHitsAndMisses(0, 1);
+s3.putObject(
+PutObjectRequest.builder().bucket(BUCKET).key("some/key").build(),
+RequestBody.fromString("update"));
+assertCacheHitsAndMisses(0, 2);
}
@Test
public void validateDeleteObjects() {
+int hits = S3V4RestSignerClient.cacheHits();
+int misses = S3V4RestSignerClient.cacheMisses();
Review Comment:
```suggestion
```
##
aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java:
##
@@ -72,11 +78,57 @@ public abstract class S3V4RestSignerClient
static final String CACHE_CONTROL_PRIVATE = "private";
static final String CACHE_CONTROL_NO_CACHE = "no-cache";
- private static final Cache SIGNED_COMPONENT_CACHE =
+ @Visible
Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
danielcweeks commented on PR #15428: URL: https://github.com/apache/iceberg/pull/15428#issuecomment-4007739084 > Does this make sense? Full control in server, with the client adapting to its choices. Polaris can choose what to sign, clients can choose various checksum options, and the only consequence is an increase in cache misses if the signing service signs more headers than before. This just feels like we're overcomplicating the actual use case and how this is beneficial to the client and the service. The only two requests that should be cached are `HEAD` and `GET`. `HEAD` requests shouldn't really have any difference in the request. `GET` requests are reused to reduce the sign request rate for range based queries when reading parquet. Each request is isolated to a single path and there's very little reuse beyond what happens within the context of a single file read action. What led to this line of exploration appears to stem from trying to implement `PUT` or other methods, which should not be cached. A server shouldn't do that because it's not secure for it to omit critical headers like checksum or size from the signature when creating/overwriting objects. That also renders those types of operations un-cacheable, so they shouldn't be considered here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
steveloughran commented on code in PR #15428:
URL: https://github.com/apache/iceberg/pull/15428#discussion_r2892094989
##
aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java:
##
@@ -309,6 +325,40 @@ public SdkHttpFullRequest sign(
@Override
public void close() throws Exception {}
+ /**
+ * Given a signing response from a server, extract the signed headers from
the query parameters.
+ *
+ * @param response The signing response which contains the URI with signed
headers.
+ * @return A list of signed header names.
+ */
+ @VisibleForTesting
+ static List enumerateSignedHeaders(S3SignResponse response) {
+String headerEnum = null;
+final String query = response.uri().getQuery();
+if (isNotEmpty(query)) {
+
+ Map queryParams =
Splitter.on('&').withKeyValueSeparator('=').split(query);
+ String signedHeaders = queryParams.get(SIGNED_HEADERS_QUERY_PARAMETER);
+
+ if (isNotEmpty(signedHeaders)) {
+headerEnum = signedHeaders;
+ }
+}
+if (headerEnum == null) {
+ // nothing in the ? headers, look for it in response headers
+ final List headers = response.headers().get(SIGNED_HEADERS);
+ if (headers != null) {
+// there's an assumption here that there is only one.
+headerEnum = headers.get(0);
+ }
+}
+if (headerEnum == null) {
+ LOG.warn("No signed headers found in response: {}", response);
Review Comment:
let's not print the signature
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
steveloughran commented on PR #15428: URL: https://github.com/apache/iceberg/pull/15428#issuecomment-4007528040 @danielcweeks look at the top of my alternative proposal; I've discussed this with @adutra and will implement it; if you look at https://github.com/apache/iceberg/pull/15428/changes/777cbcf9a512b43a063861073d409fb50d7c3048 you can see that this was part of my original design: the auth header spitting exists, absent any tests. 1. server does all filtering, makes cache/no-cache decision 2. client uses key for the request, but caches only the signed headers. 3. next request, client does lookup as present, but compares signed headers and considers it a cache miss if those headers don't match what it wants to send. 4. If a hit, it adds its unsigned headers to the request. Outcome 1. server is in charge of what to sign 2. client never sends requests with invalid signature 3. there's never unintentional retention of an optional header which affects the new request (example, a range header in cache when the new request doesn't have one). Does this make sense? Full control in server, with the client adapting to its choices. Polaris can choose what to sign, clients can choose various checksum options, and the only consequence is an increase in cache misses if the signing service signs more headers than before. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
danielcweeks commented on PR #15428: URL: https://github.com/apache/iceberg/pull/15428#issuecomment-4007201750 Just cross-referencing my [comment from the 15171](https://github.com/apache/iceberg/pull/15171#issuecomment-4007018390). I think we're leaning too much on the client to handle the complexity. This should be catalog responsibility. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching [iceberg]
steveloughran commented on PR #15428: URL: https://github.com/apache/iceberg/pull/15428#issuecomment-4005766198 And a put to S3 express with s3 sessions enabled adds the x-amz-token, which is the key mechanism for permission performance with s3 express. If set, that MUST be retained for performance. It does not change from request to request. ``` "PUT /oneline HTTP/1.1[\r][\n]" "Host: stevel--usw2-az1--x-s3.s3express-usw2-az1.us-west-2.amazonaws.com[\r][\n]" "amz-sdk-invocation-id: 805794ee-7ae6-b968-d406-7288c2c9f9e0[\r][\n]" "amz-sdk-request: attempt=1; max=3[\r][\n]" "Authorization: AWS4-HMAC-SHA256 Credential=3CQPBE3VTCSR3XYY7GTRGQNSYI/20260305/us-west-2/s3express/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;referer;x-amz-content-sha256;x-amz-date;x-amz-s3session-token, Signature=9a5c2c8769447555b232f388b77c9d0e70e2133ab9e7da6c8ef47957b783d020[\r][\n]" "Content-Type: application/octet-stream[\r][\n]" "Expect: 100-continue[\r][\n]" "Referer: https://audit.example.org/hadoop/1/op_createfile/8dcd2c60-f463-4b76-9910-a5af09f3454a-0005/?op=op_createfile&p1=oneline&pr=stevel&ps=bac2e4e3-7d13-4b40-8b16-bfe1cb0ffbb9&cm=Put&id=8dcd2c60-f463-4b76-9910-a5af09f3454a-0005&t0=1&fs=8dcd2c60-f463-4b76-9910-a5af09f3454a&t1=1&ts=1772723079703[\r][\n]"; "User-Agent: Hadoop 3.5.0 aws-sdk-java/2.35.4 md/io#sync md/http#Apache ua/2.1 api/S3#2.35.x os/Mac_OS_X#26.3 lang/java#17.0.17 md/OpenJDK_64-Bit_Server_VM#17.0.17+10-LTS md/vendor#Amazon.com_Inc. md/en_GB md/rb#u m/F,G hll/cross-region[\r][\n]" "x-amz-content-sha256: UNSIGNED-PAYLOAD[\r][\n]" "X-Amz-Date: 20260305T150441Z[\r][\n]" "x-amz-s3session-token: AwMTx5El1XCwcU+jKOJF+FJAL/5VuUPb08Ah4fHUBACKPc+fkgMAAADFX8QAa/tUHq6h5W/p1/pdbCq0rdsCIj84DVh8gByA65vdFvMC6heyI2oLJ23QUX0UVAo9udlJuBLOoeMwr4DGV6MkRsVNk0sozghkZHE2DR/QMXTqszqvWmqEISeJLrfwJfWzJllIoLussagjAySaJ+refJcliXoiUYhYPr1xNdWVw8TkrMUS+7k0Vb8xbkYowDRWX1esRjrKBT/v820SGpLGg0wvPzi3kmwYmxa0ycGbtScdhQ5PPDipZxK/OfRVapiCNtVOvfZVZ0zmuEIpmdO3BxpJsTvPEfBRWsrPWt65kBD6vE9VQYUBpTiPZQmHhU2l+b+U4n2AxrwhpTNFtzJqVm7OvpDf9qx0Sf/IJul3/cYU8EsPUzao3IkHZ9V4jyLDw3VzSmABJ9rnFgXIz9l7cZF7TN/gI0pIZukUPLz0H/DyP86iUnMPAOLh20I4pBYOT4qPn4VdBEjvdLK4LhppBEKk93/9+yrAi4brUDR/oFUvlAUknQ3CbNZOc7ALyj6hNe9p8AzLf3FQX8EJ+z2axhTavSvaHV0MkrfJSdAZx9CQfxIWMLIeqYNDeKHH8Mrk9Dliz2Parr0uuDp6itww0NbdHh2xiAEKRFku4YY4b8/HoD82kgimLNrU5mH8Qzs1ktHWMOvniu1/lHsgGzfhg0ixllv6Y2cMq7f3o2nsyvYLZs+5nkTPa7TTBuKC/9nif70CQ2aJDWWuBaL6apNfoZI1cQsrvtdUI5355ikF+r54Xd/3GHjKBTj0ad63Lq/scNaaSja+E+n8ZlKcZWnKYdvKn7rcnKtBYSEg9h1K+uhUJJUkhmtqxHQAKvh4w1BMZULeiDEC67IKS1hBBg7Eq3Z7wopyiNLfwn8QZ7qA1VCLRl+DPAdVxV/Qw+m 8g6bNefsvixOoJw4KC/INLpk1q3Vu5SFvXz6GbJgNazWIXk56CaJiWRxDA2U3+7w3Y/Lx06pskf3yoVkW9+zURrbbx6ka5Q5Zksvf84C9H7LPbhGowGbZ24oNnvtR/cCQJsIQQGyrCLVpKyBPG3loXmVlxSDoDZtNTcnCGNRKhxuzt9dHNeZj8VpQGnpDsBR13VKw8bOkIl6ZKHJEvCB+ZfLgBdIddqgC9LKmt8i3QWfaXtePKVCfcPnOg8VJerAmv/7/XRecTmAi6DHWp/AhZJvXI06HcNOgnvp9vjnCFRbIG1dn26tz8KNq55QLdOBsRA==[\r][\n]" "Content-Length: 5[\r][\n]" "Connection: Keep-Alive[\r][\n]" "[\r][\n]" ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
