[spdx-tech] Document discussion

David Kemp Wed, 06 Jul 2022 11:01:00 -0700

We had a good discussion of Identities at the last meeting but did not have time
for the Document topic. There are a few points to consider for that
discussion. It's not just a matter of what to call a box in the logical
model; the important point is to decide what the box means and then give it
a name that communicates that meaning.

William identifies the central issue below: data is deserialized into the
logical model and then serialized back into data. Our task is to identify
the purpose, meaning, and properties of the box currently called "Document"
in the logical model, without regard to serialization format.

There are two options:
1) Document is not an Element, as currently shown; it is a logical model
representation of serialized data, i.e., a TransferUnit.
2) Document is an Element, i.e., metadata that describes a transfer unit of
serialized Elements.

The first contains the serialized Element values transferred as a unit.
The second is metadata about the unit, i.e., a set of *references* (IRIs)
to the Elements serialized in the unit.

*Question*: Are we talking about #1 (the value of a collection) or #2
(metadata about a collection)?

Assuming #1, there are two ways to represent the serialized value of a
collection:

*1a) an unoptimized set* of serialized Element values such that each value
stands on its own:

[element1, element2, element3, ...]

*1b) an optimized set* of serialized Element values that reduces the size
of the set and allows the integrity of each Element in the set to be derived
from the integrity of the collection as a whole:

{ collection_info: [a, b, c, d, e, ...],
  optimized_element_values: [element1, element2, element3, ...] }

The critical thing to remember is that in the logical model, every Element
stands alone: it has its own IRI and its own set of property values that
depend on nothing else. Option 1a is both a visualization of logical
element values and an unoptimized serialization of those values. Option 1b
models only an optimized serialized collection.
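Options 1a and 1b can be illustrated in Python. The contents of collection_info are not specified above, so this sketch assumes one possible optimization: property values shared by every Element (a hypothetical `createdBy`) are hoisted out of the individual values, and `@id` is a made-up key for the IRI. Expanding the 1b payload recovers exactly the stand-alone 1a values, which is all the logical model sees, and a single hash over the canonicalized payload covers every Element in it.

```python
import copy
import hashlib
import json

# Option 1a: unoptimized set; each serialized value stands on its own.
# "@id" (the IRI) and "createdBy" are hypothetical property names.
unoptimized = [
    {"@id": "urn:example:pkg1", "createdBy": "urn:example:alice", "name": "foo"},
    {"@id": "urn:example:file1", "createdBy": "urn:example:alice", "name": "foo.c"},
]


def compact(values: list[dict]) -> dict:
    """Build a 1b payload by hoisting property values shared by every element."""
    if not values:
        return {"collection_info": {}, "optimized_element_values": []}
    shared = dict(values[0])
    for v in values[1:]:
        # Keep only keys present in every element with the same value.
        shared = {k: x for k, x in shared.items() if k in v and v[k] == x}
    shared.pop("@id", None)  # the IRI always stays with its element
    optimized = [{k: x for k, x in v.items() if k not in shared} for v in values]
    return {"collection_info": shared, "optimized_element_values": optimized}


def expand(payload: dict) -> list[dict]:
    """Recover the stand-alone (1a) element values from a 1b payload."""
    info = payload["collection_info"]
    return [{**info, **v} for v in payload["optimized_element_values"]]


def digest(obj) -> str:
    """SHA-256 over a simplified canonical JSON serialization."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


optimized = compact(unoptimized)
print(optimized["collection_info"])  # {'createdBy': 'urn:example:alice'}
assert expand(optimized) == unoptimized

# One hash over the whole payload: a change to a shared or per-element
# value would change it, so verifying it also covers every expanded Element.
collection_hash = digest(optimized)
tampered = copy.deepcopy(optimized)
tampered["collection_info"]["createdBy"] = "urn:example:mallory"
assert digest(tampered) != collection_hash
```

The design point is that `expand()` is the only way the 1b form reaches the logical model, so 1b never needs a logical type of its own beyond "serialized collection".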

If the answer is #2, then Collection is an Element and the properties of
Collection are IRIs, not Element values. The Collection properties should
be named elementRefs and rootElementRefs to make it clear that they are
metadata about a thing, not values of that thing. (The logical model does
not distinguish between references and values, so property names are the
only way to signal the distinction.)
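The reference/value distinction can be made concrete with a small Python sketch. The assumptions are not from the thread: Elements are plain dicts with a hypothetical `@id` key, and hashing uses a simplified canonical JSON form. A Collection whose elementRefs and rootElementRefs hold IRIs is metadata, so its hash is unaffected when a referenced Element's values change, while a unit that carries the values changes with them.

```python
import hashlib
import json


def digest(obj) -> str:
    """SHA-256 over a simplified canonical JSON serialization."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# A referenced Element ("@id" holds its IRI; property names are hypothetical).
pkg = {"@id": "urn:example:pkg1", "name": "foo", "version": "1.0"}

# Option 2: a Collection Element whose properties are references (IRIs).
collection = {
    "@id": "urn:example:doc1",
    "elementRefs": [pkg["@id"]],
    "rootElementRefs": [pkg["@id"]],
}

# Option 1: a unit that carries the Element values themselves.
unit = {"elements": [pkg]}

before = (digest(collection), digest(unit))
pkg["version"] = "1.1"  # change a value of the referenced Element
after = (digest(collection), digest(unit))

print(before[0] == after[0])  # True: references are unchanged
print(before[1] == after[1])  # False: the carried value changed
```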

If we are modeling a serialized collection of Elements, there can be two
logical model types: ElementSet (unoptimized, for transfer or illustrative
purposes) and TransferUnit (optimized for transfer). If we are modeling an
Element that is metadata about a serialized transfer unit, it can be called
Document, and its properties include the IRIs of the elements in the transfer
unit.
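The three types can be sketched as Python dataclasses. This is a hypothetical illustration, not the SPDX model: the field names `element_refs` and `root_element_refs` mirror the proposed elementRefs and rootElementRefs, `describe()` and `resolve()` are invented helpers, and TransferUnit's optimized encoding is left unspecified. Only Document is an Element; getting values back from its references requires the Elements themselves to be available from somewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Logical-model Element: its own IRI and property values that depend on nothing else."""
    iri: str
    properties: dict = field(default_factory=dict)


@dataclass
class ElementSet:
    """Unoptimized serialized collection: Element values, each standing alone."""
    elements: list[Element]


@dataclass
class TransferUnit:
    """Optimized serialized collection; the optimization itself is not specified here."""
    collection_info: dict
    optimized_element_values: list[dict]


@dataclass
class Document(Element):
    """An Element that is metadata about a transfer unit: references, not values."""
    element_refs: list[str] = field(default_factory=list)
    root_element_refs: list[str] = field(default_factory=list)


def describe(doc_iri: str, unit: ElementSet, roots: list[str]) -> Document:
    """Hypothetical helper: record the IRIs of the Elements in a unit."""
    return Document(
        iri=doc_iri,
        element_refs=[e.iri for e in unit.elements],
        root_element_refs=list(roots),
    )


def resolve(refs: list[str], unit: ElementSet) -> list[Element]:
    """Hypothetical helper: references yield values only given the Elements themselves."""
    by_iri = {e.iri: e for e in unit.elements}
    return [by_iri[iri] for iri in refs]


pkg = Element("urn:example:pkg1", {"name": "foo"})
src = Element("urn:example:file1", {"name": "foo.c"})
unit = ElementSet([pkg, src])

doc = describe("urn:example:doc1", unit, roots=[pkg.iri])
print(doc.element_refs)  # ['urn:example:pkg1', 'urn:example:file1']
assert isinstance(doc, Element) and not isinstance(unit, Element)
assert resolve(doc.root_element_refs, unit) == [pkg]
```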

Regards,
David

On Mon, May 16, 2022 at 1:22 PM William Bartholomew (CELA) via
lists.spdx.org wrote:

> I was thinking about this last week as I was putting together some SPDX
> 3.0 samples. Do we think canonicalization, whose purpose is to be able to
> hash the content, needs to understand the semantics at all? Or, *can we
> say that once deserialized to a logical model and then strictly serialized
> to a defined format and hashed then that is sufficient?* This
> serialization would only need to understand fundamental data types and,
> since it is per element, doesn't need to understand the relationships
> between elements.
>
>
> Regards,
>
> William Bartholomew (he/him) - Let's chat
> Principal Security Strategist
> Global Cybersecurity Policy - Microsoft
>
>

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#4638): https://lists.spdx.org/g/Spdx-tech/message/4638
-=-=-=-=-=-=-=-=-=-=-=-
