Canonical Hash Subgroup folks,

Attached is a diagram illustrating the canonicalization process. As you know, our goal is to be able to take an SPDX document in any format and produce a hash value that is independent of that format. If the same SPDX information is serialized in RDF and in JSON, then the hashes of those two documents must be the same. If the SPDX information in two documents is different, then their hashes must be different.
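To make the goal concrete, here is a minimal Python sketch of the two hashing strategies the subgroup has weighed: hashing a canonical serialization directly, and building a Merkle hash tree over the parsed tree. It assumes that each format-specific parser (JSON, RDF, etc.) has already produced the same in-memory tree. Defining that shared tree is the job of an SPDX AST, and no RDF parsing is shown here. `canonical_bytes` only sorts keys and strips whitespace, which is loosely in the spirit of RFC 8785 but is not a conforming implementation. The function names and the abbreviated example document are hypothetical.

```python
import hashlib
import json
from typing import Any


def canonical_bytes(doc: Any) -> bytes:
    """Hypothetical helper: deterministic UTF-8 serialization (sorted keys, no whitespace)."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def direct_hash(doc: Any) -> str:
    """Approach 1: hash the whole canonical serialization as one byte string."""
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()


def merkle_hash(node: Any) -> str:
    """Approach 2: hash each node from its children's hashes (a Merkle tree over the parsed tree)."""
    h = hashlib.sha256()
    if isinstance(node, dict):
        h.update(b"obj")  # type tag, so an object and a list with the same children hash differently
        for key in sorted(node):  # key order in the source file must not matter
            h.update(canonical_bytes(key))
            h.update(bytes.fromhex(merkle_hash(node[key])))
    elif isinstance(node, list):
        h.update(b"arr")
        for child in node:  # list order is preserved as-is in this sketch
            h.update(bytes.fromhex(merkle_hash(child)))
    else:
        h.update(b"val")  # leaf: string, number, boolean, or null
        h.update(canonical_bytes(node))
    return h.hexdigest()


# Two serializations of the same (hypothetical, heavily abbreviated) SPDX content.
a = json.loads('{"spdxVersion": "SPDX-2.3", "name": "example", "dataLicense": "CC0-1.0"}')
b = json.loads('{\n  "name": "example",\n  "dataLicense": "CC0-1.0",\n  "spdxVersion": "SPDX-2.3"\n}')
assert direct_hash(a) == direct_hash(b)
assert merkle_hash(a) == merkle_hash(b)
```

This sketch preserves list order, so two documents that list the same items in a different order would hash differently. Whether a given SPDX list is ordered is one of the questions a stricter AST definition would need to settle before either approach can meet the "same information, same hash" requirement.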
Thus far we have discussed:

* canonical data format (agreed to consider JSON and CBOR)
* programming languages for the canonicalization tool
* directly hashing a canonical data format vs. constructing a Merkle hash tree from an AST of that data format
* normalizing URL strings

We have not yet discussed defining the SPDX Abstract Syntax Tree (AST), which is similar to, but more strictly defined than, the SPDX logical model. Although we discussed JSON ASTs in the context of producing hash trees, a JSON AST has no knowledge of SPDX and thus doesn't help when processing SPDX documents in other formats.

This diagram illustrates some of the topics still to be addressed, and hopefully it can guide and focus future discussion.

v/r,
David
