With one modification, I agree that "Once deserialized to a logical model
and then strictly serialized to a defined format and hashed then that is
sufficient."

I use "semantics" to mean the semantics of fundamental data types, not
relationships between independent elements.  For example, an IPv4 address
is semantically a 32-bit integer and is exactly 32 bits when serialized
into an IPv4 packet transmitted over the network.  Putting a string like
"192.48.240.119" into JSON or XML serialized data is semantically identical
to that 32-bit value in the packet.  Adding leading zeros ("192.048.240.119")
does not change the semantics of the IP address, and the surest way to
ensure that the hashes are canonical is to convert the strings to 4 bytes
before hashing.
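
A minimal Python sketch of that conversion follows. It assumes dotted-decimal
input and SHA-256 as the hash; neither is specified above, and the function
names are hypothetical. Each octet is read as base 10, so leading zeros are
dropped. Some parsers instead treat a leading zero as octal, and others reject
it outright, so an information model would have to pin this choice down too.

```python
import hashlib


def ipv4_text_to_bytes(text: str) -> bytes:
    """Convert dotted-decimal IPv4 text to its 4-byte network-order value.

    Assumption: each octet is read as base 10, so "048" == "48".
    This sketch neither treats a leading zero as octal nor rejects it.
    """
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 octets, got {text!r}")
    octets = []
    for part in parts:
        # Accept only ASCII digits: rejects signs, spaces, and empty octets.
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"invalid octet {part!r} in {text!r}")
        value = int(part, 10)
        if value > 255:
            raise ValueError(f"octet {part!r} out of range in {text!r}")
        octets.append(value)
    return bytes(octets)


def hash_ipv4(text: str) -> str:
    """Hash the 4-byte form rather than the text, so equivalent spellings match."""
    return hashlib.sha256(ipv4_text_to_bytes(text)).hexdigest()


print(ipv4_text_to_bytes("192.048.240.119").hex())  # c030f077
print(hash_ipv4("192.48.240.119") == hash_ipv4("192.048.240.119"))  # True
```
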

The fundamental types include Integer and String, which leads to the
modification: the data is deserialized to an "information" model, not a
"logical" model.  The difference is that information models are
prescriptive, whereas logical models are descriptive.  A logical model
would say that a property "address" has type "IPv4-Address", but an
information model defines type IPv4-Address as having both a fundamental
type (Integer) and a text representation usable for serializations that
either do not support Integer fundamental type or choose not to serialize
IP addresses, MAC addresses, UUIDs, timestamps, etc., as integers.
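
To make the distinction concrete, the sketch below shows one way an
information model might pair a fundamental type with a text representation.
It uses a UUID, one of the types listed above. `InfoType`, its fields, and
`UUID_TYPE` are hypothetical names and are not part of SPDX. The fixed-width,
big-endian byte form is an assumed canonical encoding. Text parsing comes from
Python's standard `uuid` module, which accepts several spellings (upper or
lower case, with or without braces) that all map to the same integer.

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InfoType:
    """Hypothetical information-model type: a fundamental type plus a text form."""
    name: str
    bit_width: int                     # fundamental type: unsigned Integer of this width
    parse_text: Callable[[str], int]   # text representation -> fundamental value
    format_text: Callable[[int], str]  # fundamental value -> preferred text form

    def canonical_bytes(self, text: str) -> bytes:
        """Bytes to hash: the fundamental Integer, big-endian, fixed width."""
        return self.parse_text(text).to_bytes(self.bit_width // 8, "big")

    def normalize_text(self, text: str) -> str:
        """Preferred text form, for serializations that don't use Integer."""
        return self.format_text(self.parse_text(text))


UUID_TYPE = InfoType(
    name="UUID",
    bit_width=128,
    parse_text=lambda s: uuid.UUID(s).int,
    format_text=lambda n: str(uuid.UUID(int=n)),
)

upper = "6FA459EA-EE8A-3CA4-894E-DB77E160355E"
braced = "{6fa459ea-ee8a-3ca4-894e-db77e160355e}"
print(UUID_TYPE.canonical_bytes(upper) == UUID_TYPE.canonical_bytes(braced))  # True
print(UUID_TYPE.normalize_text(upper))  # 6fa459ea-ee8a-3ca4-894e-db77e160355e
```

The hash is taken over the fundamental value, so every text spelling that
parses to the same integer yields the same digest. The same pattern would
apply to IPv4 addresses (32 bits) or MAC addresses (48 bits).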

The SPDX 3 logical model says that a Collection has 0..* Elements, but it
doesn't define how a Collection is serialized, not even to the point of
distinguishing between Element values and Element pointers.  An information
model defines the fundamental data types of
ordered and unordered sets, how to serialize each of those types, and
whether collection members are serialized as ordered or unordered sets of
property values or as IRI string pointers.

Those details are abstracted away in logical models; they must be precisely
defined in information models in order to support canonicalization.
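
The sketch below shows why the ordered/unordered distinction matters for
hashing. It serializes a collection of Element pointers (IRI strings) in two
ways. The example IRIs, the compact-JSON encoding, SHA-256, and sorting
unordered members by Unicode code point are all assumptions made for the
illustration. None of them is anything SPDX has defined. An unordered set is
deduplicated and sorted before serialization, so the order in which a producer
happened to list its members cannot change the hash. An ordered list keeps its
order, so reordering it does change the hash.

```python
import hashlib
import json
from typing import Iterable


def canonical_members(iris: Iterable[str], ordered: bool) -> bytes:
    """Serialize Element pointers (IRI strings) to bytes for hashing.

    Assumption: an unordered set is deduplicated and sorted by code point;
    an ordered set keeps the producer's order exactly.
    """
    items = list(iris) if ordered else sorted(set(iris))
    # Compact separators and UTF-8, so whitespace choices can't affect the hash.
    return json.dumps(items, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def members_hash(iris: Iterable[str], ordered: bool) -> str:
    return hashlib.sha256(canonical_members(iris, ordered)).hexdigest()


a = ["https://example.org/spdx/B", "https://example.org/spdx/A"]
b = ["https://example.org/spdx/A", "https://example.org/spdx/B"]

print(members_hash(a, ordered=False) == members_hash(b, ordered=False))  # True
print(members_hash(a, ordered=True) == members_hash(b, ordered=True))    # False
```

Suppose members were serialized inline as property values rather than as IRI
pointers. Then each member's own canonical bytes would have to be computed
before sorting. That is exactly the kind of choice an information model would
need to pin down.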

Dave


On Mon, May 16, 2022 at 1:22 PM William Bartholomew (CELA) via
lists.spdx.org <[email protected]> wrote:

> I was thinking about this last week as I was putting together some SPDX
> 3.0 samples. Do we think canonicalization, whose purpose is to be able to
> hash the content, needs to understand the semantics at all? Or, can we say
> that once deserialized to a logical model and then strictly serialized to a
> defined format and hashed then that is sufficient? This serialization would
> only need to understand fundamental data types and, since it is per element,
> doesn't need to understand the relationships between elements.
>
>
> Regards,
>
> William Bartholomew (he/him)
> Principal Security Strategist
> Global Cybersecurity Policy - Microsoft
>
> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of
> Sebastian Crane via lists.spdx.org
> Sent: Monday, May 16, 2022 10:12 AM
> To: SPDX Technical Mailing List <[email protected]>
> Subject: [EXTERNAL] [spdx-tech] Agenda topic for tomorrow's meeting:
> SpecVersion property
>
> Dear all,
>
> I would like to propose an agenda topic for tomorrow's SPDX Tech Team
> meeting:
> the precise format of the SpecVersion property on each Element.
>
> During last week's Canonicalisation Committee meeting we discussed the
> factors for a canonical representation of this property's data type.
> However, it became apparent that many aspects of the discussion were out of
> scope for the Canonicalisation Committee, and should be brought up with the
> wider Tech Team.
>
> On the draft SPDX 3.0 model diagram, each Element has a SpecVersion
> property indicating which version of the SPDX Specification the Element is
> conformant to. The data type of SpecVersion refers to the 'Semantic
> Versioning' scheme described at
> https://semver.org/
> - known simply as 'SemVer'.
>
> At the Canonicalisation Committee meeting we came up with three options for
> the SpecVersion property's data type:
>
> 1: A structured data type composed of integers corresponding to the major,
> minor and patch levels
>
> 2: A plain string
>
> 3: Enumeration (enum) type of specification versions published by SPDX
>
> It's worth noting that No.1 can only express a subset of the full Semantic
> Versioning specification, which supports extra tags like 'release
> candidate' and 'alpha'.
>
> The main criterion that came up in the meeting was the ability for tooling
> to ignore or reject SPDX data that is of a version not supported by that
> tool.
>
> If the types of changes represented by the major, minor and patch levels
> are rigorously defined and consistent, tools would be able to determine
> compatibility automatically with No.1. This is more nuanced when considering
> the Canonical Serialisation, since this could make feature releases
> (usually minor changes) breaking, major changes. Clearly, a tool cannot
> implement the canonical representation of a value introduced in the future!
> Yet, a tool merely performing analysis on the data fields it understands
> can just ignore the newly added fields.
>
> No.3 would also allow for automated compatibility determination, but only
> for SPDX specification versions that it is hard-coded to understand, due to
> the 'semantic' elements of the version specifier being opaque to the tool.
>
> Looking forward to hearing everyone's views on this!
>
> Best wishes,
>
> Sebastian


-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#4524): https://lists.spdx.org/g/Spdx-tech/message/4524
-=-=-=-=-=-=-=-=-=-=-=-