[GitHub] [maven-resolver] cstamas commented on a diff in pull request #281: Document Expected Checksums in Resolver

2023-04-24 Thread via GitHub


cstamas commented on code in PR #281:
URL: https://github.com/apache/maven-resolver/pull/281#discussion_r1174930750


##
src/site/markdown/expected-checksums.md:
##
@@ -45,7 +45,7 @@ so the three expected checksum kinds in transport are: "Provided", "Remote Inclu
 but it differs **how** Resolver obtains these.
 
 The new **Provided** kind of expected checksums are "provided" to resolver by some alternative
-means, possibly ahead of any transport operation. There is an SPI interfacce that users may
+means, possibly ahead of any transport operation. There is an SPI interface that users may

Review Comment:
   changed, "SPI interface" -> "SPI extension point"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@maven.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [maven-resolver] cstamas commented on a diff in pull request #281: Document Expected Checksums in Resolver

2023-04-24 Thread via GitHub


cstamas commented on code in PR #281:
URL: https://github.com/apache/maven-resolver/pull/281#discussion_r1174897359


##
src/site/markdown/expected-checksums.md:
##
@@ -0,0 +1,140 @@
+# Expected Checksums
+
+
+Checksums in Resolver were historically used during transport, 
+to ensure Artifact integrity. In addition, latest Resolver may 
+use checksums in various other ways too, for example to ensure 
+Artifact integrity during resolution. 
+
+The bare essence of all checksum uses in Resolver is 
+"integrity validation": Resolver calculates by various
+means the "calculated" checksum (for given payload), 
+then obtains somehow the "expected" checksum (for same payload)
+and compares the two.
+
+This page covers all the "expected" checksum varieties.
+
+
+## Transport Checksum Strategies
+
+Historically, the "obtain expected checksum" was implemented as simple HTTP GET
+request against Artifact checksum URL (Artifact URL appended by ".sha1"). This logic
+is still present in current Resolver, but is "decorated" and extended in multiple
+ways.
+
+Resolver has broadened the "obtain" step for "expected" checksum with two new strategies,
+so the three expected checksum kinds in transport are: "Provided", "Remote Included" and
+"Remote External". All these strategies provide the source of "expected" checksum,
+but it differs **how** Resolver obtains these.
+
+The new **Provided** kind of expected checksums are "provided" to resolver by some alternative
+means, possibly ahead of any transport operation. There is an SPI interfacce that users may
+implement, to have own ways to provide checksums to resolver, or, may use out of the
+box implementation, that simply delegates to "trusted checksums" (more about them later).
+
+The new **Remote Included** checksums are in some way "included" by remote party, typically
+in their response. Since advent of modern Repository Managers, most of
+them already sends checksums (usually the "standard" SHA-1 and MD5)
+in their response headers. Moreover, Maven Central, and even Google Mirror of Maven Central
+sends them as well. By extracting these checksums from response, we can get hashes
+that were provided by remote repository along with its content.
+
+Finally, the **Remote External** checksums are the classic checksums we all know: They are laid down
+next to Artifact files (hence "external") on remote repository (hence "remote"), according
+to remote repository layout. To obtain Remote External checksum, an HTTP GET request is
+required. This strategy will follow the order given in `aether.checksums.algorithms`, so
+will ask for checksums in same order as the parameter contains algorithm names.
+
+During single artifact retrieval, these strategies are executed in above specified order,
+and only if current strategy has "no answer", the next strategy is attempted. Hence, if
+resolver is able to get "expected" checksum from Provided Checksum Source, the Remote Included
+and Remote External sources will not be consulted. Important implication: given that almost
+all MRMs and remote repositories (Maven Central, Google Mirror of Maven Central) send "standard"
+checksums in their response, if any of the standard (SHA-1, MD5) checksum is enabled, validation will
+be probably satisfied by "Remote Included" strategy.
+
+The big win here is that by obtaining hashes using "Remote Included" and not by "Remote External"
+strategy, we can halve the count of HTTP requests to download an Artifact.
+
+### Remote Included Strategies
+
+**Note: Remote Included checksums work only with transport-http, they do NOT work with transport-wagon!**
+
+By using "Remote Included" checksum feature, we are able to halve the issued HTTP request
+count, since many repository services along Maven Central emits the reference checksums in
+the artifact response itself (as HTTP headers). Hence, we are able to get the
+artifact and reference "expected" checksum using only one HTTP round-trip.
+
+
+ Sonatype Nexus 2
+
+Sonatype Nexus 2 uses SHA-1 hash to generate `ETag` header in "shielded" (à la Plexus Cipher)
+way. Naturally, this means only SHA-1 is available in artifact response header.
+
+Emitted by: Sonatype Nexus2 only.
+
+
+ Non-standard `X-` headers
+
+Maven Central emits headers `x-checksum-sha1` and `x-checksum-md5` along with artifact response.
+Google GCS on the other hand uses `x-goog-meta-checksum-sha1` and `x-goog-meta-checksum-md5`
+headers. Resolver will detect these and use their value.
+
+Emitted by: Maven Central, GCS, some CDNs and probably more.
+
+
+## Trusted Checksums
+
+All the "expected" checksums discussed above are trasport bound, they are all
+about URLs, HTTP requests and responses, or require Transport related API elements.
+
+Trusted checksums is a SPI component that is able to deliver "expected" checksums
+for given Artifact, without use of any transport API element. In other words, this
+API is not 
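
(The quoted diff is cut off here in the archive.)

As a rough illustration of the strategy ordering described in the quoted page, here is a minimal Java sketch. It is not Resolver's actual SPI or implementation: the class `ExpectedChecksumChain`, its methods, and the placeholder checksum values are all hypothetical. Only the three strategy names, the "first strategy with an answer wins" rule, and the four header names come from the quoted text. An empty map stands in for "no answer".

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch, not part of maven-resolver.
final class ExpectedChecksumChain {

    /**
     * Consults strategies in iteration order and returns the first non-empty answer
     * as (strategy name, algorithm -> checksum). An empty map models "no answer";
     * strategies after the first answering one are never invoked.
     */
    static Map.Entry<String, Map<String, String>> firstAnswer(
            Map<String, Supplier<Map<String, String>>> strategies) {
        for (Map.Entry<String, Supplier<Map<String, String>>> strategy : strategies.entrySet()) {
            Map<String, String> checksums = strategy.getValue().get();
            if (!checksums.isEmpty()) {
                return Map.entry(strategy.getKey(), checksums);
            }
        }
        return Map.entry("None", Map.<String, String>of());
    }

    /** Picks the checksum headers named in the quoted page out of a response header map. */
    static Map<String, String> fromHeaders(Map<String, String> responseHeaders) {
        Map<String, String> found = new LinkedHashMap<>();
        for (Map.Entry<String, String> header : responseHeaders.entrySet()) {
            // Header names are compared case-insensitively.
            switch (header.getKey().toLowerCase(Locale.ROOT)) {
                case "x-checksum-sha1":
                case "x-goog-meta-checksum-sha1":
                    found.put("SHA-1", header.getValue());
                    break;
                case "x-checksum-md5":
                case "x-goog-meta-checksum-md5":
                    found.put("MD5", header.getValue());
                    break;
                default:
                    break;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Insertion order of the LinkedHashMap is the consultation order.
        Map<String, Supplier<Map<String, String>>> strategies = new LinkedHashMap<>();
        strategies.put("Provided", () -> Map.<String, String>of());
        strategies.put("Remote Included",
                () -> fromHeaders(Map.of("X-Checksum-SHA1", "placeholder-sha1")));
        strategies.put("Remote External",
                () -> Map.of("SHA-1", "placeholder-from-sidecar-file"));

        Map.Entry<String, Map<String, String>> answer = firstAnswer(strategies);
        System.out.println(answer.getKey() + " -> " + answer.getValue());
    }
}
```

In this sketch the "Remote External" supplier is never called once the headers yield a checksum, which is the request-saving effect the page describes.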

[GitHub] [maven-resolver] cstamas commented on a diff in pull request #281: Document Expected Checksums in Resolver

2023-04-24 Thread via GitHub


cstamas commented on code in PR #281:
URL: https://github.com/apache/maven-resolver/pull/281#discussion_r1174897005


##
src/site/markdown/expected-checksums.md:
##
@@ -0,0 +1,140 @@
+# Expected Checksums
+
+
+Checksums in Resolver were historically used during transport, 
+to ensure Artifact integrity. In addition, latest Resolver may 
+use checksums in various other ways too, for example to ensure 
+Artifact integrity during resolution. 
+
+The bare essence of all checksum uses in Resolver is 
+"integrity validation": Resolver calculates by various
+means the "calculated" checksum (for given payload), 
+then obtains somehow the "expected" checksum (for same payload)
+and compares the two.
+
+This page covers all the "expected" checksum varieties.
+
+
+## Transport Checksum Strategies
+
+Historically, the "obtain expected checksum" was implemented as simple HTTP GET
+request against Artifact checksum URL (Artifact URL appended by ".sha1"). This logic
+is still present in current Resolver, but is "decorated" and extended in multiple
+ways.
+
+Resolver has broadened the "obtain" step for "expected" checksum with two new strategies,
+so the three expected checksum kinds in transport are: "Provided", "Remote Included" and
+"Remote External". All these strategies provide the source of "expected" checksum,
+but it differs **how** Resolver obtains these.
+
+The new **Provided** kind of expected checksums are "provided" to resolver by some alternative
+means, possibly ahead of any transport operation. There is an SPI interfacce that users may

Review Comment:
   tx, fixed




[GitHub] [maven-resolver] cstamas commented on a diff in pull request #281: Document Expected Checksums in Resolver

2023-04-24 Thread via GitHub


cstamas commented on code in PR #281:
URL: https://github.com/apache/maven-resolver/pull/281#discussion_r1174896698


##
src/site/markdown/expected-checksums.md:
##
@@ -0,0 +1,140 @@
+# Expected Checksums
+
+
+Checksums in Resolver were historically used during transport, 
+to ensure Artifact integrity. In addition, latest Resolver may 
+use checksums in various other ways too, for example to ensure 
+Artifact integrity during resolution. 
+
+The bare essence of all checksum uses in Resolver is 
+"integrity validation": Resolver calculates by various
+means the "calculated" checksum (for given payload), 
+then obtains somehow the "expected" checksum (for same payload)
+and compares the two.
+
+This page covers all the "expected" checksum varieties.
+
+
+## Transport Checksum Strategies
+
+Historically, the "obtain expected checksum" was implemented as simple HTTP GET
+request against Artifact checksum URL (Artifact URL appended by ".sha1"). This logic

Review Comment:
   removed mention of HTTP GET
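
For context on the "Remote External" step this comment refers to, a small Java sketch of the two pieces the quoted text describes: deriving checksum locations by appending an extension to the Artifact URL, in the order the algorithms are listed, and comparing a "calculated" checksum with an "expected" one. The class `RemoteExternalSketch`, its methods, and the repository URL are hypothetical. The rule used to turn an algorithm name into an extension (lowercase, drop the hyphen) is an assumption of this sketch, not taken from Resolver's code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not part of maven-resolver.
final class RemoteExternalSketch {

    /** Builds candidate checksum URLs in the same order as the algorithm list. */
    static List<String> checksumUrls(String artifactUrl, List<String> algorithms) {
        List<String> urls = new ArrayList<>();
        for (String algorithm : algorithms) {
            // Assumed mapping: "SHA-1" -> ".sha1", "MD5" -> ".md5".
            String extension = algorithm.toLowerCase(Locale.ROOT).replace("-", "");
            urls.add(artifactUrl + "." + extension);
        }
        return urls;
    }

    /** Computes the "calculated" checksum of a payload as lowercase hex. */
    static String calculate(String algorithm, byte[] payload) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(payload);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(Character.forDigit((b >> 4) & 0xF, 16));
                hex.append(Character.forDigit(b & 0xF, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    /** Integrity validation: compares "calculated" against "expected", ignoring case and padding. */
    static boolean matches(String algorithm, byte[] payload, String expected) {
        return calculate(algorithm, payload).equalsIgnoreCase(expected.trim());
    }

    public static void main(String[] args) {
        // Placeholder URL; the algorithm list stands in for the configured order.
        String artifactUrl = "https://repo.example.org/org/example/demo/1.0/demo-1.0.jar";
        System.out.println(checksumUrls(artifactUrl, List.of("SHA-1", "MD5")));

        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(matches("SHA-1", payload, calculate("SHA-1", payload)));
    }
}
```

Each URL in the returned list would cost one extra request, which is why answering from response headers instead halves the request count for an Artifact.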


