There are two components of canonicalization:
1) being able to serialize a document as individual elements
2) defining a canonical serialized value for an element regardless of data
format

To enable canonicalization, a checksum must be computed over individual
elements so that it doesn't depend on payload-specific information like
prefixes and document creation information.  All SPDXIDs must be full and
all element creation fields must have values.  The procedure to hash a
group of individual elements must also be specified for a particular data
format:
a) concatenate the serialized bytes of each individual element and hash the
concatenated bytes
b) create a structure (array) of individual elements and hash the
serialized structure
c) hash the serialized bytes of each individual element, then hash the
concatenated hashes or array of hashes (Merkle)
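
A rough sketch of options a), b), and c) in Python, to make the difference
concrete.  Everything format-specific here is an assumption for
illustration: sorted-key compact JSON stands in for a canonical element
serialization, elements are ordered by their full SPDXID, the field name
"spdxId" is hypothetical, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import json


def serialize(value) -> bytes:
    # Assumption: sorted-key compact JSON stands in for whatever canonical
    # serialization the data format would actually define.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def ordered(elements):
    # Assumption: order by full SPDXID so the result does not depend on the
    # order in which a payload happens to list its elements.
    return sorted(elements, key=lambda e: e["spdxId"])


def hash_concatenated(elements) -> str:
    # a) hash the concatenated serialized bytes of each element
    digest = hashlib.sha256()
    for element in ordered(elements):
        digest.update(serialize(element))
    return digest.hexdigest()


def hash_array(elements) -> str:
    # b) hash the serialized array of elements
    return hashlib.sha256(serialize(ordered(elements))).hexdigest()


def hash_of_hashes(elements) -> str:
    # c) hash each element, then hash the concatenated element hashes
    leaves = [hashlib.sha256(serialize(e)).digest() for e in ordered(elements)]
    return hashlib.sha256(b"".join(leaves)).hexdigest()


elements = [
    {"spdxId": "https://example.org/spdx/element-2", "type": "File"},
    {"spdxId": "https://example.org/spdx/element-1", "type": "Package"},
]
print(hash_concatenated(elements))
print(hash_array(elements))
print(hash_of_hashes(elements))
```

The three procedures give different values for the same elements, which is
why one of them has to be picked and specified.  Only c) yields a
per-element hash that can be checked without the other elements.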

If canonicalization is a goal, producers and consumers can't compute a hash
directly over the compressed payload. The hashing process must disentangle
the individual elements before hashing. But you can do #1 without going to
#2. The individual elements are in the payload's data format and no other
data formats need to be supported.

I can think of a couple possible solutions for verification:
> A. Allow the ambiguity and let the consumer of the payload containing the
> external reference to determine which of the many possible external
> payloads they would like to verify
> B. Remove the ambiguity by having the serialized format of the payload
> containing the external reference specify which external payload is
> associated with each element (e.g. in the external map)


I was suggesting B, and observing that in the payload the external map
must be separate from the elements (as with prefixes), not included in the
value of any element as the logical model currently shows.  I wasn't
clear about what is mandatory: the payload must have the external document
reference to enable B.  It's not strictly necessary to include references
in SpdxDocument Elements, but if there is a use case for making payloads
visible in the element store by defining an SpdxDocument type, then
references between payloads should be included, which in turn enables
SpdxDocument to replace a special External Reference data type in the
payload.

You can do B without computing hashes over independent elements, but that
precludes the path to canonicalization.
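
To illustrate B, here is a minimal sketch of a payload whose external map
sits beside the elements rather than inside any of them.  The key names
("prefixes", "externalMap", "elements", "locator", "sha256"), the URLs, and
the fetch callback are all hypothetical, and the hash here is over the whole
referenced payload, i.e. the non-canonical variant.

```python
import hashlib


def verify_external_element(payload, element_id, fetch) -> bool:
    # Look up which external payload the producer associated with this
    # element, retrieve it, and compare its hash with the recorded one.
    entry = payload["externalMap"][element_id]
    data = fetch(entry["locator"])
    return hashlib.sha256(data).hexdigest() == entry["sha256"]


# Hypothetical referenced payload and a stand-in for document retrieval.
referenced = b'{"elements": ["element-2", "element-4", "element-5"]}'
store = {"https://example.org/payloads/2-4-5.json": referenced}

payload = {
    "prefixes": {"ex": "https://example.org/spdx/"},
    # Payload-level map: external element -> payload it was taken from.
    "externalMap": {
        "https://example.org/spdx/element-2": {
            "locator": "https://example.org/payloads/2-4-5.json",
            "sha256": hashlib.sha256(referenced).hexdigest(),
        }
    },
    "elements": [{"spdxId": "https://example.org/spdx/element-1"}],
}

print(verify_external_element(
    payload, "https://example.org/spdx/element-2", store.__getitem__))
```

Because the map is payload-level data, like prefixes, the value of element 1
is the same no matter which payload it is serialized in.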

I'm not sure about A, but I think it means "try all the payloads that match
the locator information and see if any of them match the expected hash".
Only one of them can match (the one the producer used when computing the
hash), so that seems like a lot of work if the locators include more than
one payload containing an element, not just copies of the same payload.
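
If that reading of A is right, the consumer's work looks roughly like the
sketch below.  The candidate locators and payload bytes are made up, and
the hash is assumed to be SHA-256 over the payload bytes.

```python
import hashlib


def find_matching_payload(candidates, expected_sha256):
    # Try every payload that matches the locator information and return the
    # locator of the first one whose hash equals the expected value.
    for locator, payload_bytes in candidates.items():
        if hashlib.sha256(payload_bytes).hexdigest() == expected_sha256:
            return locator
    return None


# Hypothetical candidates: two different payloads that both contain element 5.
candidates = {
    "https://example.org/payloads/2-4-5.json": b'{"elements": [2, 4, 5]}',
    "https://example.org/payloads/1-5.json": b'{"elements": [1, 5]}',
}
expected = hashlib.sha256(b'{"elements": [1, 5]}').hexdigest()
print(find_matching_payload(candidates, expected))
```

The cost grows with the number of candidate payloads, each of which has to
be retrieved and hashed before the consumer knows whether it is the one the
producer used.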

Regards,
David


On Sat, Aug 27, 2022 at 1:53 PM Gary O'Neall <[email protected]> wrote:

> If we exclusively use canonicalization for verification of external
> elements, I would agree with the conclusion.
>
>
>
> I would like to retain the ability to verify by checksumming the payload
> which is supported in the current SPDX spec. Would we be able to support
> this approach if we remove the reference to the payload in the
> externalReference?
>
>
>
> This issue of ambiguous originating payloads for external elements is a
> real concern which I hadn’t considered until now.
>
>
>
> I can think of a couple possible solutions for verification:
> A. Allow the ambiguity and let the consumer of the payload containing the
> external reference to determine which of the many possible external
> payloads they would like to verify
>
> B. Remove the ambiguity by having the serialized format of the payload
> containing the external reference specify which external payload is
> associated with each element (e.g. in the external map)
>
>
>
> Regards,
>
> Gary
>
>
>
> *From:* [email protected] <[email protected]> *On Behalf Of
> *David Kemp
> *Sent:* Saturday, August 27, 2022 6:56 AM
> *To:* SPDX-list <[email protected]>
> *Subject:* [spdx-tech] Payload externalReference considered harmful
>
>
>
> Last week we discussed ExternalReference, drew some combinations of
> elements on the model, and made the "elements" field of ExternalReference
> plural so that all of the elements in a document are included in a single
> ExternalReference.
>
> Later, Sebastian and I discussed canonicalization, particularly whether
> the element store is complete without document retrieval, and the role of
> detached signatures.  Those discussions led to the conclusion that whatever
> other types of data an Element's externalReference property may refer to,
> it should NOT refer to a payload.
>
> Consider the element drawing:
>
>
>
> There are 5 elements of *any* type that have some kind of connection (not
> "relationship"):
>
>    - Element 1 could be an Annotation whose subject is 2 and was created
>    by identities 3, 4, and 5.
>    - Element 1 could be a Relationship created by 2, from 3, to 4 and 5
>    - Element 1 could be an SBOM with Files 2, 3, and 4, created by 5
>
> Those elements can be put in Payloads (serialized SPDX documents) in *any*
> combination, for example:
>
>    - A single payload with elements 1,2,3,4,5
>    - Two or more payloads where one payload may reference elements in
>    other payloads
>    - Five payloads, each containing a single element and zero or more
>    references to other payloads
>
> Remember that the reason for serializing more than one element into a
> payload is to allow information within a payload to be shared (reducing its
> size) and to allow a single payload integrity to provide integrity for each
> of its elements.  The picture shows "H" on each reference to a payload,
> indicating a hash or signature over the element(s) in the payload.
>
> But the value (and hash/signature) of an element cannot depend on which of
> many payloads that element may be serialized in.  Therefore the
> externalReference property of an element cannot refer to a payload (the
> externalReferenceType (currently TBD) cannot be "PAYLOAD" or
> "SPDX_DOCUMENT" or whatever type would have been used for payloads).
>
> The drawing shows three ways of serializing five elements.  The first has
> no payload references, the second and third have a single reference.
>
>
>
> The SpdxDocument Element is the single element type that describes a
> payload.  An SpdxDocument MAY be created to describe any payload, but it
> MUST be created only when needed to support references from one payload
> to another.  In the diagram, only two SpdxDocument elements MUST exist
> (for payloads [2,4,5] and [1,5]).
>
> A decision to not create an SpdxDocument element for a payload does not
> "make a commitment to future use cases".  If the creator of payload [2,4,5]
> did not create SpdxDocument [2,4,5], then the creator of payload [1,3,(6)]
> can create SpdxDocument(6) that describes the referenced payload.  The
> consequence of not having the creator's SpdxDocument included in payload
> [2,4,5] is that there is no creator's signature to allow original source
> verification.  The proxy signature by the creator of payload [1,3,(6)] can
> still be verified by anyone who references or uses that payload.
>
> In summary, the payload external reference information (element list,
> payload download/query information, and integrity information) belongs
> exclusively in the SpdxDocument element and must not be included in any
> other element type.
>
> Regards,
> David
>
> 
>

