CIL

________________________________
From: [email protected] <[email protected]> on behalf of David 
Kemp via lists.spdx.org <[email protected]>
Sent: Monday, July 18, 2022 6:18 PM
To: SPDX-list <[email protected]>
Subject: [EXTERNAL] Re: [spdx-tech] V3 serialization

One principle is that the goal of serialization is to put Elements into 
physical format, NOT to create new elements that didn't exist prior to 
serialization.  If you have 6 elements going into serialization, you should 
have 6 elements coming out, not 7.

[William] Agreed, does my example violate that? It would be difficult for a 
serialization to "generate" elements because of the id and other required 
properties so I had not considered this a possibility.

The second principle is that logical elements should be independent: the value 
of one element does not depend on the value of any other element.

[William] I think it depends on your definition of "depends on" (pun intended). 
Elements may have properties that are references to other elements and 
serializers may choose to use that information for more compact serialization 
but since this would get unwound on deserialization that's immaterial.

I believe that those two principles are worth adopting as design requirements.

It is ugly to put something into serialization and get something else back out,

[William] Agreed, though a lot of serializers/deserializers end up making minor 
changes as a result of normalization and other processes. Not ideal but that's 
an implementation detail within each serializer/deserializer.

and it's really ugly to stuff one element's value inside another

[William] I don't agree with this, at least for "collection" elements. Also, 
the serialization model for collection elements could support either element 
references or the element itself so if you think it's ugly then you would have 
the option of not doing nesting.

not least because you can wind up with infinite recursion with documents inside 
documents inside documents inside documents

[William] This is avoidable, and using references instead of nesting doesn't 
prevent this problem. In fact, if you only use nesting then it's impossible to 
have infinite recursion; it's only when you use references that it becomes 
possible.
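
A minimal sketch of that point, assuming collections are held as a plain mapping from a collection id to the ids it references (the function name and the sample ids are hypothetical, not part of the SPDX model). A purely nested serialization is a finite tree, so a cycle can only arise through id references, and a depth-first walk can detect it:

```python
from typing import Dict, List

def find_cycle(collections: Dict[str, List[str]]) -> List[str]:
    """Return one reference cycle as a list of ids, or [] if there is none.

    `collections` maps a collection id to the ids it references. Ids that
    are not keys are treated as leaf (non-collection) elements.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {cid: WHITE for cid in collections}
    path: List[str] = []

    def visit(cid: str) -> List[str]:
        state[cid] = GREY
        path.append(cid)
        for ref in collections[cid]:
            if ref not in collections:
                continue  # leaf element: it cannot contain anything
            if state[ref] == GREY:
                # The reference points back into the current path: a cycle.
                return path[path.index(ref):] + [ref]
            if state[ref] == WHITE:
                cycle = visit(ref)
                if cycle:
                    return cycle
        path.pop()
        state[cid] = BLACK
        return []

    for cid in collections:
        if state[cid] == WHITE:
            cycle = visit(cid)
            if cycle:
                return cycle
    return []

# Hypothetical data: documents referencing documents by id.
acyclic = {"doc:a": ["doc:b", "file:x"], "doc:b": ["file:y"]}
cyclic = {"doc:a": ["doc:b"], "doc:b": ["doc:a"]}
```

A deserializer that accepts references could run a check like this before expanding collections.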

 Even two levels of element nesting makes things quite difficult to disentangle.

[William] I don't agree, for collections the nesting makes it obvious which 
collection an element is part of without having to follow the id references. 
Since the serialization model could support either approach I don't see this 
being a blocker.

The fundamental principle is that a file containing data is not an element.  A 
Transfer Unit is defined by a data schema, just like the content of any XML 
file or JSON file or ASN.1 file.  If the logical model has a Document element 
that describes an X.509 certificate, that element has interesting facts about 
the certificate but does not define its content.  It is essential to remember 
the difference between the bytes in a file and the properties of a File or 
Document element - the difference between a thing and metadata about that thing.

[William] We've had this discussion a number of times, the Collection element 
(and its subclasses) aren't metadata about collections, document, SBOM, etc. 
they are the collection, document, SBOM, etc. There is no "physical" thing 
outside of the SPDX document that is the collection, document, SBOM, etc., they 
only exist in the SPDX graph. You could take that SBOM, serialize it to disk, 
and then have a File element that talks about the physical serialization of the 
SBOM, but that's different to the SBOM SPDX element.

* defaults:
I created a separate defaults property to hold the five defaultable properties 
in order to distinguish them from non-defaultable properties.  Gary and I like 
the idea, but I'm not wedded to it.  The transfer unit schema could have 
"defaultCreatedBy", "defaultCreated", etc properties at the top level, to 
highlight that they are defaults, unlike name, description, comments, etc.  
Whatever the mechanism, there must be a way to ensure that "name" doesn't take 
an inappropriate default value if it isn't populated, while the default for 
"profiles" is appropriate.

[William] I'm struggling with multiple properties that have the same definition 
having different names and different locations on the objects, it feels like a 
lot to explain. We could flag certain properties as inheritable in the schema, 
and this only applies to collection elements so I think the scope is quite 
narrow.
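
One way the "inheritable" flag could work, as a rough sketch only. The set of inheritable names is taken from the five properties in the example "defaults" object; the `_parent` key and the `resolve` function are hypothetical and exist only for this illustration:

```python
from typing import Any, Dict, Optional

# Assumption: these are the five defaultable properties from the example
# transfer unit. How the schema would flag them is not decided here.
INHERITABLE = {"createdBy", "created", "specVersion", "profiles", "dataLicense"}

def resolve(elements: Dict[str, Dict[str, Any]],
            element_id: str, prop: str) -> Optional[Any]:
    """Look up `prop` on an element, walking up through enclosing
    collections only when the property is flagged as inheritable."""
    current: Optional[str] = element_id
    visited = set()
    while current is not None and current not in visited:
        visited.add(current)  # guards against a malformed parent loop
        element = elements[current]
        if prop in element:
            return element[prop]
        if prop not in INHERITABLE:
            return None  # e.g. "name" never takes a value from a parent
        current = element.get("_parent")  # hypothetical link to the collection
    return None

# Hypothetical data: a file inside an SBOM collection.
elements = {
    "sbom:1": {
        "name": "Example SBOM",
        "created": "2022-04-05T22:00:00Z",
        "profiles": ["Core", "Software"],
    },
    "file:du.c": {"_parent": "sbom:1"},
}
```

Here `resolve(elements, "file:du.c", "created")` falls back to the collection's value, while `resolve(elements, "file:du.c", "name")` returns `None` because "name" is not inheritable.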

* array vs map
I used map as a conversation starter, because it fits the "unique" semantics of 
element ids, and because mapping types are ubiquitous now: XML schema had it 
in 2005 (https://www.w3.org/2005/07/xml-schema-patterns.html#Maps), 
and it's a built-in part of JSON.

[William] While it is built-in to XML and JSON my experience is that it's not 
been supported well by schema languages and serializers/deserializers. I know 
I've had situations where I had to duplicate the id property in the class to 
ensure that other things work correctly (and to maintain the independence of 
the class). Also, in most object oriented languages there is not a way to get 
the key from the object so you end up having to track the key independently of 
the object which is a pain.
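
To make the trade-off concrete, here is a small sketch of converting between the two shapes. The function names are hypothetical and "id" is assumed to be the name of the id property. The map form gets uniqueness for free; the array form keeps each object self-contained, so uniqueness has to be checked separately:

```python
from typing import Any, Dict, List

def map_to_array(element_map: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Move each map key into the object as an "id" property."""
    return [{"id": eid, **value} for eid, value in element_map.items()]

def array_to_map(element_array: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Hoist the "id" property out as the map key, rejecting duplicates."""
    result: Dict[str, Dict[str, Any]] = {}
    for element in element_array:
        eid = element["id"]
        if eid in result:
            raise ValueError(f"duplicate element id: {eid}")
        result[eid] = {k: v for k, v in element.items() if k != "id"}
    return result

# Hypothetical data in the map shape.
as_map = {
    "identities:fred": {"type": "actor"},
    "artifacts:gnu-coreutils/v9.1": {"type": "package", "name": "GNU Coreutils"},
}
as_array = map_to_array(as_map)
```

The conversion is lossless in both directions as long as ids are unique and no element already carries its own "id" property.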

JSON-LD even treats ID differently from other properties by giving it a 
reserved @id keyword, and SQL databases have primary keys with the special 
characteristic that they uniquely identify the record rather than being just 
another column.  Autogenerated ids are often hidden because they are ubiquitous.

[William] In both JSON-LD and SQL the properties are still normal properties, 
in JSON-LD it's still a property on the object it just has a special name, in 
SQL it's still a column in the table it just has special metadata attached to 
it. Even autogenerated ids are typically normal columns they're just system 
generated and you can't change their definition.

  And finally, you introduced Map to the logical model for Extensions.  If it's 
OK for extensions, it's OK for Elements :-).

[William] Not the same 🙂. The map for extensions is a map of "extension type" 
to value, not of "id" to value. It is a consequence of us deciding that each 
type can only be assigned once that it can also be used as an id, but it is 
primarily a type, not an id. If we changed that design decision it would no 
longer function as an id.

Seriously though, I'm not wedded to Map.  Treating Id as any other property but 
having some prose saying that it can be used as a primary key / unique 
identifier is OK, it's just kind of loose given that references from foreign to 
primary keys is a universal concept.

[William] In SQL Server (others are similar) a foreign key takes the form 
FOREIGN KEY (ChildCol1, ChildCol2) REFERENCES parentTable (ParentCol1, 
ParentCol2, ...), they're still just columns, nothing magical about them, not 
even their names.

* type property
Since JSON does not have types it's good practice to ensure that "type: 
identity" cannot collide with a property named "identity".  At the core profile 
all type and property names are defined and don't collide, but if "type" goes 
away we'll need to ensure that properties defined in any profile cannot collide 
with types defined in any profile.  Again JSON-LD treats @type as a reserved 
property: 
https://w3c.github.io/json-ld-syntax/#typed-values.

[William] Agreed, and type isn't in the logical model, a JSON-LD serializer 
would use @type, an XML one would use XML namespaces and element names, a 
ProtoBuf one would use message types. Since my examples were "plain" JSON which 
does not have a built-in way of declaring types I used a "plain" property to 
capture the type, I agree that the name of this property should avoid potential 
conflict (e.g. by prefixing with an _).
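
As a rough illustration of the two encodings, the sketch below turns the nested form used in the example transfer unit ("type": {"file": {...}}) into a flat object with a prefixed type property. The `_type` name follows the underscore suggestion above; the function name is hypothetical, and the collision check is one possible policy, not an agreed rule:

```python
from typing import Any, Dict

def flatten_type(element: Dict[str, Any], type_key: str = "_type") -> Dict[str, Any]:
    """Convert {"type": {"<name>": {props}}, ...} into a flat object that
    carries the type name under `type_key`."""
    nested = element["type"]
    if len(nested) != 1:
        raise ValueError("expected exactly one type name")
    type_name, type_props = next(iter(nested.items()))
    common = {k: v for k, v in element.items() if k != "type"}
    # Refuse to merge if a type-specific property would shadow a common
    # one, or if any property is already using the reserved type key.
    clashes = (set(common) & set(type_props)) | (
        {type_key} & (set(common) | set(type_props))
    )
    if clashes:
        raise ValueError(f"property name collision: {sorted(clashes)}")
    return {type_key: type_name, **common, **type_props}

# Data taken from the package element in the example transfer unit (trimmed).
nested_package = {
    "type": {"package": {"packagePurpose": ["APPLICATION", "SOURCE"]}},
    "name": "GNU Coreutils",
}
flat_package = flatten_type(nested_package)
```

The flat form has one less level of nesting, at the cost of needing a reserved property name that no profile may reuse.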

* document root
A transfer unit file is not an Element and not a logical type or a class. The 
bytes in SPDX documents are not defined by the logical model, they just have to 
be able to be de-serialized into element instances.

[William] Same disagreement as above.

Data schemas (for JSON, XML, ASN.1, ...) explicitly do not define classes, they 
define only data types.

[William] I'm not sure what definition of "class" you're using here, but the 
boxes on the diagram could be represented in an OO language as classes or 
interfaces; for our purposes I don't think the distinction between class and 
data type is meaningful.

Regards,
Dave


On Mon, Jul 18, 2022 at 7:08 PM William Bartholomew (CELA) via 
lists.spdx.org <[email protected]> wrote:

There are some “proposed” examples at the bottom of the model diagram (note 
that I intended these to be representative until we define the exact 
serialization for each data format):

https://github.com/spdx/spdx-3-model/blob/main/model.png



Some of the key differences (with no implied support for either choice, I have 
included my reasoning for reference only):

  *   Defaults being represented as the original properties on a collection 
element vs being in their own “defaults” property.
     *   I was thinking about this as a traditional inheritance/overrides 
structure. If a property doesn’t have a value you can walk the tree up looking 
for the same property.
  *   Array of elements vs map of elements.
     *   In the past I have found schema languages don’t have good support for 
one of the properties of an object being outside of the object (i.e. a key on 
the collection outside). Having a completely contained object makes 
canonicalization etc. easier at the risk of the array having multiple instances 
of the same element (which can be solved in other ways).
  *   Type being a string property vs an object property containing the type.
     *   I mainly followed the JSON-LD style and it has one less level of 
nesting.
  *   Document root being an element vs a custom class.
     *   Tried to minimize custom classes by having everything as either an 
element or a value type.





Regards,



William Bartholomew (he/him)

Principal Security Strategist

Global Cybersecurity Policy – Microsoft



My working day may not be your working day. Please don’t feel obliged to reply 
to this e-mail outside of your normal working hours.



From: [email protected] <[email protected]> On Behalf Of David 
Kemp via lists.spdx.org
Sent: Monday, July 18, 2022 1:56 PM
To: SPDX-list <[email protected]>
Subject: [EXTERNAL] [spdx-tech] V3 serialization



Last week I took an action item to describe what serialized data for the v3 
logical model could look like, in order to clarify discussion of the types 
shown in the model.

The thing to remember about v3 is that it is knowledge graph centric, not 
document centric. Element instances from the knowledge graph can be serialized 
into data instances, but the data definition is controlled by the logical 
model, not vice versa.  Data examples in various formats can illustrate the 
logical model for readers of the v3 spec, but they do not define it as they do 
in SPDX v2.

A collection of independent element values is shown in "logical-elements".  
JSON data is used to visualize the element values, but it is important to 
remember that the logical value itself is the ability to answer questions:
* what is the id of this element?
* what is the type of this element?
* who created this element?
etc.  The element is a class with getters that allow each property of an 
instance to be retrieved, and those property values are independent of 
serialization format.

That collection of elements can be serialized into a transfer unit file as 
shown in "transfer units"

A Document element describes the contents of a transfer unit, but does not need 
to be present in the transfer unit.  The example transfer unit containing six 
elements (an SBOM, a Package, two Files, a Relationship, and an Actor that 
created them) is:

{
  "namespace": "urn:acme.dev:",
  "defaults": {
    "createdBy": ["identities:fred"],
    "created": "2022-04-05T22:00:00Z",
    "specVersion": "3.0",
    "profiles": ["Core", "Software"],
    "dataLicense": "CC0-1.0"
  },
  "elementValues": {
    "artifacts:gnu-coreutils/v9.1/src/du.c": {
      "type": {
        "file": {
          "filePurpose": ["APPLICATION", "SOURCE"]
        }
      }
    },
    "artifacts:gnu-coreutils/v9.1/src/echo.c": {
      "type": {
        "file": {
          "filePurpose": ["APPLICATION", "SOURCE"]
        }
      }
    },
    "artifacts:gnu-coreutils/v9.1": {
      "type": {
        "package": {
          "packagePurpose": ["APPLICATION", "SOURCE"],
          "downloadLocation": "http://mirror.rit.edu/gnu/coreutils/coreutils-9.1.tar.gz",
          "homePage": "https://www.gnu.org/software/coreutils/"
        }
      },
      "name": "GNU Coreutils"
    },
    "relationships:gnu-coreutils/v9.1": {
      "type": {
        "relationship": {
          "relationshipType": "CONTAINS",
          "from": "urn:acme.dev:artifacts:gnu-coreutils/v9.1",
          "to": [
            "artifacts:gnu-coreutils/v9.1/src/du.c",
            "artifacts:gnu-coreutils/v9.1/src/echo.c"
          ]
        }
      }
    },
    "identities:fred": {
      "type": {
        "actor": {}
      },
      "identifiedBy": [{"email": "[email protected]"}]
    },
    "sboms:gnu-coreutils/v9.1": {
      "type": {
        "sbom": {
          "elements": [
            "artifacts:gnu-coreutils/v9.1/src/du.c",
            "artifacts:gnu-coreutils/v9.1/src/echo.c",
            "artifacts:gnu-coreutils/v9.1",
            "relationships:gnu-coreutils/v9.1",
            "identities:fred"
          ]
        }
      }
    }
  }
}
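
A minimal sketch of how a consumer might unpack such a transfer unit back into independent element values. It assumes that a full element id is the namespace concatenated with the local key (consistent with the "from" value in the Relationship above) and that a value set on an element overrides the default. The `expand` function is hypothetical, and it deliberately leaves id references inside property values (such as "createdBy" or "to") untouched:

```python
from typing import Any, Dict, List

def expand(transfer_unit: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a transfer unit into a list of self-contained element values."""
    namespace = transfer_unit.get("namespace", "")
    defaults = transfer_unit.get("defaults", {})
    elements = []
    for local_id, value in transfer_unit["elementValues"].items():
        # Defaults go in first so that explicit element values win.
        elements.append({"id": namespace + local_id, **defaults, **value})
    return elements

# Trimmed version of the transfer unit above. The overriding "created"
# value on the package is hypothetical, added to show the precedence.
unit = {
    "namespace": "urn:acme.dev:",
    "defaults": {"created": "2022-04-05T22:00:00Z", "specVersion": "3.0"},
    "elementValues": {
        "identities:fred": {"type": {"actor": {}}},
        "artifacts:gnu-coreutils/v9.1": {
            "type": {"package": {}},
            "name": "GNU Coreutils",
            "created": "2022-05-01T00:00:00Z",
        },
    },
}
elements = expand(unit)
```

The number of elements coming out equals the number of keys in "elementValues", and "name" is never filled in from the defaults because it is not one of them.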

The element examples, transfer unit examples, and the SPDX v3 schema derived 
from the logical model are available in 
https://github.com/davaya/spdx-3-elements.

The intent is for these to assist in refining the logical model and its 
serializations together.

Regards,
Dave





