I was thinking about this last week as I was putting together some SPDX 3.0 samples. Do we think canonicalization, whose purpose is to make the content hashable, needs to understand the semantics at all? Or can we say that deserializing to a logical model, strictly serializing that model to a defined format, and hashing the result is sufficient? This serialization would only need to understand fundamental data types and, since it is per element, would not need to understand the relationships between elements.
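To make the question concrete, here is a minimal sketch of the "deserialize, strictly serialize, hash" idea. It assumes JSON with sorted keys and no optional whitespace as the defined format and SHA-256 as the hash; neither is something the committee has decided, and the element fields and the `canonical_bytes` / `element_hash` names are hypothetical.

```python
import hashlib
import json


def canonical_bytes(element: dict) -> bytes:
    """Strictly serialize one element: sorted keys, no optional whitespace, UTF-8.

    Assumption: the element only contains fundamental JSON types
    (strings, numbers, booleans, lists, nested objects).
    """
    return json.dumps(
        element,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")


def element_hash(element: dict) -> str:
    """Hash the strict serialization of a single element."""
    return hashlib.sha256(canonical_bytes(element)).hexdigest()


# Two serializations of the same (hypothetical) element that differ only in
# key order and whitespace deserialize to the same logical model.
a = json.loads('{"id": "urn:example:pkg-1", "name": "demo", "specVersion": "3.0.0"}')
b = json.loads('{"specVersion": "3.0.0",   "name": "demo", "id": "urn:example:pkg-1"}')

assert element_hash(a) == element_hash(b)
```

Nothing here looks at what the fields mean or how elements relate to each other; it only relies on the data types JSON already defines.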
Regards,

William Bartholomew (he/him)
Principal Security Strategist
Global Cybersecurity Policy - Microsoft

My working day may not be your working day. Please don't feel obliged to reply to this e-mail outside of your normal working hours.

-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of Sebastian Crane via lists.spdx.org
Sent: Monday, May 16, 2022 10:12 AM
To: SPDX Technical Mailing List <[email protected]>
Subject: [EXTERNAL] [spdx-tech] Agenda topic for tomorrow's meeting: SpecVersion property

Dear all,

I would like to propose an agenda topic for tomorrow's SPDX Tech Team meeting: the precise format of the SpecVersion property on each Element.

During last week's Canonicalisation Committee meeting we discussed the factors for a canonical representation of this property's data type. However, it became apparent that many aspects of the discussion were out of scope for the Canonicalisation Committee, and should be brought up with the wider Tech Team.

On the draft SPDX 3.0 model diagram, each Element has a SpecVersion property indicating which version of the SPDX Specification the Element is conformant to. The data type of SpecVersion refers to the 'Semantic Versioning' scheme described at https://semver.org/ - known simply as 'SemVer'.

At the Canonicalisation Committee meeting we came up with three options for the SpecVersion property's data type:

1: A structured data type composed of integers corresponding to the major, minor and patch levels
2: A plain string
3: Enumeration (enum) type of specification versions published by SPDX

It's worth noting that No.1 can only express a subset of the full Semantic Versioning specification, which supports extra tags like 'release candidate' and 'alpha'.

The main criterion that came up in the meeting was the ability for tooling to ignore or reject SPDX data that is of a version not supported by that tool. If the types of changes represented by the major, minor and patch levels are rigorously defined and consistent, tools would be able to determine compatibility automatically with No.1. This is more nuanced when considering the Canonical Serialisation, since this could turn feature releases (usually minor changes) into breaking, major changes. Clearly, a tool cannot implement the canonical representation of a value introduced in the future! Yet a tool merely performing analysis on the data fields it understands can just ignore the newly added fields.

No.3 would also allow for automated compatibility determination, but only for SPDX specification versions that the tool is hard-coded to understand, due to the 'semantic' elements of the version specifier being opaque to the tool.

Looking forward to hearing everyone's views on this!

Best wishes,

Sebastian

View/Reply Online (#4514): https://lists.spdx.org/g/Spdx-tech/message/4514
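
For discussion, option No.1 and the two compatibility questions raised above could be sketched roughly as follows. This is only an illustration: the `SpecVersion` class, `can_analyse` and `can_canonicalise` are hypothetical names, and the rules assume that major, minor and patch carry their usual SemVer meaning, which is exactly what the Tech Team would need to define.

```python
from typing import NamedTuple


class SpecVersion(NamedTuple):
    """Option No.1: a structured type of three integers."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "SpecVersion":
        parts = text.split(".")
        # Pre-release tags such as '-rc1' or '-alpha' are rejected: option
        # No.1 can only express a subset of Semantic Versioning.
        if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
            raise ValueError(f"not a plain major.minor.patch version: {text!r}")
        return cls(*(int(p) for p in parts))


def can_analyse(tool: SpecVersion, data: SpecVersion) -> bool:
    """Assumed rule: within one major version, an analysis tool can ignore
    fields added by later minor releases."""
    return data.major == tool.major


def can_canonicalise(tool: SpecVersion, data: SpecVersion) -> bool:
    """Assumed rule: a tool cannot canonicalise values introduced in a later
    minor release, so a newer minor version is treated as unsupported."""
    return data.major == tool.major and data.minor <= tool.minor


tool = SpecVersion(3, 0, 0)
data = SpecVersion.parse("3.1.0")
print(can_analyse(tool, data))       # True
print(can_canonicalise(tool, data))  # False
```

With a plain string (No.2) the tool would have to do this parsing itself, and with an enum (No.3) both functions would reduce to a lookup in a hard-coded table of known versions.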
