With one modification, I agree that "Once deserialized to a logical model and then strictly serialized to a defined format and hashed then that is sufficient."
I use "semantics" to mean the semantics of fundamental data types, not relationships between independent elements. For example, an IPv4 address is semantically a 32-bit integer and is exactly 32 bits when serialized into an IPv4 packet transmitted over the network. A string like "192.48.240.119" in JSON- or XML-serialized data is semantically identical to that 32-bit value in the packet. Adding leading zeros ("192.048.240.119") does not change the semantics of the IP address, and the surest way to ensure that the hashes are canonical is to convert the strings to 4 bytes before hashing.

The "fundamental types" include Integer and String, which leads to the modification: the data is deserialized to an "information" model, not a "logical" model. The difference is that information models are prescriptive, whereas logical models are descriptive. A logical model would say that a property "address" has type "IPv4-Address", but an information model defines type IPv4-Address as having both a fundamental type (Integer) and a text representation usable for serializations that either do not support the Integer fundamental type or choose not to serialize IP addresses, MAC addresses, UUIDs, timestamps, etc. as integers.

The SPDX3 logical model says that a Collection has 0..* Elements, but it doesn't define how a Collection is serialized, not even to the point of distinguishing between Element values and Element pointers. An information model defines the fundamental data types of ordered and unordered sets, how to serialize each of those types, and whether collection members are serialized as ordered or unordered sets of property values or as IRI string pointers. Those details are abstracted away in logical models; they must be precisely defined in information models in order to support canonicalization.
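
As a rough illustration of the IPv4 example above, here is a minimal Python sketch of converting the text representation to its 4-byte value before hashing. The function names and the choice of SHA-256 are assumptions made for this illustration, not anything defined by SPDX. The sketch also assumes that octets with leading zeros are read as decimal, as in the example above. It parses by hand because some parsers treat leading zeros as octal, and recent versions of Python's ipaddress module reject them as ambiguous.

```python
import hashlib


def ipv4_text_to_bytes(text: str) -> bytes:
    """Convert dotted-decimal text to the 4-byte value of the address.

    Assumption for this sketch: every octet is decimal, so "048" means 48.
    """
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 octets, got {len(parts)}: {text!r}")
    octets = []
    for part in parts:
        # Reject empty fields, signs, whitespace and non-ASCII digits.
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"invalid octet {part!r} in {text!r}")
        value = int(part, 10)
        if value > 255:
            raise ValueError(f"octet out of range: {part!r} in {text!r}")
        octets.append(value)
    return bytes(octets)


def ipv4_bytes_to_text(raw: bytes) -> str:
    """Produce one fixed text representation (no leading zeros)."""
    if len(raw) != 4:
        raise ValueError("an IPv4 address is exactly 4 bytes")
    return ".".join(str(b) for b in raw)


def ipv4_hash(text: str) -> str:
    """Hash the 4-byte value, not the text, so spelling variants agree."""
    return hashlib.sha256(ipv4_text_to_bytes(text)).hexdigest()


if __name__ == "__main__":
    a = "192.48.240.119"
    b = "192.048.240.119"
    print(ipv4_text_to_bytes(a).hex())
    print(ipv4_hash(a) == ipv4_hash(b))
    # Hashing the raw strings instead would give two different digests.
    print(hashlib.sha256(a.encode()).hexdigest()
          == hashlib.sha256(b.encode()).hexdigest())
```

Both spellings reduce to the same 4 bytes, so their hashes agree, whereas hashing the strings directly does not. The same pattern (one fundamental type plus one fixed text representation) would apply to MAC addresses, UUIDs and timestamps.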

Dave

On Mon, May 16, 2022 at 1:22 PM William Bartholomew (CELA) via lists.spdx.org <[email protected]> wrote:

> I was thinking about this last week as I was putting together some SPDX
> 3.0 samples. Do we think canonicalization, whose purpose is to be able to
> hash the content, needs to understand the semantics at all? Or, can we say
> that once deserialized to a logical model and then strictly serialized to a
> defined format and hashed then that is sufficient? This serialization would
> only need to understand fundamental data types and since it is per element
> doesn't need to understand the relationships between elements.
>
> Regards,
>
> William Bartholomew (he/him) - Let's chat
> Principal Security Strategist
> Global Cybersecurity Policy - Microsoft
>
> My working day may not be your working day. Please don't feel obliged to
> reply to this e-mail outside of your normal working hours.
>
> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of
> Sebastian Crane via lists.spdx.org
> Sent: Monday, May 16, 2022 10:12 AM
> To: SPDX Technical Mailing List <[email protected]>
> Subject: [EXTERNAL] [spdx-tech] Agenda topic for tomorrow's meeting:
> SpecVersion property
>
> Dear all,
>
> I would like to propose an agenda topic for tomorrow's SPDX Tech Team
> meeting:
> the precise format of the SpecVersion property on each Element.
>
> During last week's Canonicalisation Committee meeting we discussed the
> factors for a canonical representation of this property's data type.
> However, it became apparent that many aspects of the discussion were out of
> scope for the Canonicalisation Committee, and should be brought up with the
> wider Tech Team.
>
> On the draft SPDX 3.0 model diagram, each Element has a SpecVersion
> property indicating which version of the SPDX Specification the Element is
> conformant to.
> The data type of SpecVersion refers to the 'Semantic
> Versioning' scheme described at https://semver.org/
> - known simply as 'SemVer'.
>
> At the Canonicalisation Committee meeting we came up with three options for
> the SpecVersion property's data type:
>
> 1: A structured data type composed of integers corresponding to the major,
> minor and patch levels
>
> 2: A plain string
>
> 3: Enumeration (enum) type of specification versions published by SPDX
>
> It's worth noting that No.1 can only express a subset of the full Semantic
> Versioning specification, which supports extra tags like 'release
> candidate' and 'alpha'.
>
> The main criterion that came up in the meeting was the ability for tooling
> to ignore or reject SPDX data that is of a version not supported by that
> tool.
>
> If the types of changes represented by the major, minor and patch levels
> are rigorously defined and consistent, tools would be able to determine
> compatibility automatically with No.1. This is more nuanced when considering
> the Canonical Serialisation, since this could make feature releases
> (usually minor changes) breaking, major changes. Clearly, a tool cannot
> implement the canonical representation of a value introduced in the future!
> Yet, a tool merely performing analysis on the data fields it understands
> can just ignore the newly added fields.
>
> No.3 would also allow for automated compatibility determination, but only
> for SPDX specification versions that it is hard-coded to understand, due to
> the 'semantic' elements of the version specifier being opaque to the tool.
>
> Looking forward to hearing everyone's views on this!
>
> Best wishes,
>
> Sebastian
>
> View/Reply Online (#4524): https://lists.spdx.org/g/Spdx-tech/message/4524
