Re: [spdx-tech] Thoughts on the issues of NamespaceMap and SpdxDocument

Gary O'Neall Thu, 17 Aug 2023 14:49:56 -0700

I thought I would update this email thread with some context – the results of 2 
meetings on the topic (SPDX Tech Call on 15 Aug and Serialization call on 17 
Aug) and the planned next steps.  I would encourage anyone interested in the 
issue to read through the context before next Tuesday’s tech call.


 

The original issue is #390 <https://github.com/spdx/spdx-3-model/issues/390> .

 

On the SPDX tech call, we agreed that for namespaces “we need to preserve 
roundtripping and conversion between formats, as well as linkage between 
elements located in different collections.”  This led to a discussion on how we 
preserve the namespaces for round tripping.  The next step was to create 
criteria and compare different solutions and choose one.

 

On the serialization call, we identified 4 potential solutions:

*       The original proposal to have a class with a “originalNamespaceMap” 
property to preserve the namespace from the original serialization.  This is 
represented in pull request 403 
<https://github.com/spdx/spdx-3-model/pull/403/files> .
*       Sean’s recent proposal at the start of this thread
*       David’s proposal to handle this in the serialization spec having 
specific requirements for handling and translating namespaces
*       Max’s proposal to add a Deserialization profile and class in pull 
request #479 <https://github.com/spdx/spdx-3-model/pull/479/files> 

 

During the discussion, we learned the original proposal and Sean’s proposal 
were more similar than different.  They both propose adding a class to the 
model to represent the namespaces and a purely optional “hint” type property.  
The class is created by the creator of the SPDX serialization and populated by 
the producer.

 

We also found similarities between Max’s proposal and David’s proposal in that 
the namespace information is not created and stored in the model objects 
themselves, but rather the information is gathered when deserializing the data.  
In Max’s proposal, that information is stored in a model object but that object 
is not part of the original serialization.

 

We agreed in the serialization call to not try to name these classes (we used 
the term “X” during the call) since calling something (or not calling 
something) an SPDX Document carries a lot of implications that may or may not 
apply.

 

The plan is to continue the discussion on Tuesday’s tech call.

 

We agreed we will NOT try to name the class during the call.

 

The first decision would be between having the namespace information part of 
the model objects serialized by the producer of the SPDX data (Sean and Gary’s 
approach) or whether we would use the native namespace mapping from the 
serialization format (David and Max’s proposal).

 

Once that’s decided, if there is a model object involved we can flesh out the 
properties and definition / semantics.

 

We’ll decide on the name in the following tech call.

 

Note: I will be hiking Monday through Thursday, so I won’t be online and I 
won’t be available for the meetings – but we should have good representation 
from other participants in the discussions.

 

That’s it for the context – I’m going to add a couple inputs into the 
discussion for consideration below.

 

Gary

 

A couple of inputs into the thread and discussion.

 

*       I’m leaning more towards the David / Max approach since it avoids 
ambiguity and possible inconsistencies between the actual serialization 
namespaces used and the namespaces represented in the model object.  I think it 
is also easier to understand.  I do have a couple questions and potential 
issues with this – I left them as comments in PR #479.

*       I do like having the model class a subclass of Artifact and not a 
subclass of Collection.  This does, however, require that the model object is 
NOT in the original Payload since it would be a recursive definition making it 
very hard to create a checksum.
*       If we go with Max’s proposal, is there any relationship between the 
ExternalMap and the Deserialized artifact?

*       I have the same view as David that this is just related to 
serialization.  It sounds like Sean has some additional use cases to consider, 
but I don’t know what they are and I have not considered the solution 
implications on those use cases.
*       It would be nice, but not necessary, if the solution also supported the 
licensing of copyright data.  The reason these may be related is that the data 
license is placed on a “copy” or artifact, not on the general model, so it is 
almost by definition serialization related (unless I’m misunderstanding 
copyright law, which is quite possible because IANAL).  Reference: DATA 
MANAGEMENT: INTELLECTUAL PROPERTY AND COPYRIGHT 
<https://libguides.library.kent.edu/data-management/copyright#:~:text=Data%20are%20considered%20%22facts%22%20under,work%20and%20ensure%20proper%20attribution.>
 

 

I’ll catch up next week.

 

Gary

 

 

From: [email protected] <[email protected]> On Behalf Of David 
Kemp
Sent: Thursday, August 17, 2023 12:15 PM
To: [email protected]
Subject: Re: [spdx-tech] Thoughts on the issues of NamespaceMap and SpdxDocument

 

The net-net of my thoughts on these matters is the following:

*       The current SpdxDocument class should be removed from the model

 

+1 to Gary's proposal to defer naming the class currently called SpdxDocument.
 

*       Bills of material should be represented with Bom (or its subclasses) 
instances

 

Definitely, and other element collections such as Bundle are subclasses of 
ElementCollection

*       Metadata for specific serialization instances should be represented 
with File class instances


Name not chosen yet.
 

*       A new special direct subclass of ElementCollection should be defined (a 
couple of name suggestions above but not "SpdxDocument") to be an "outer layer" 
collection enforced with a constraint on the 'element' property of 
ElementCollection that it cannot contain this new "outer layer" type of 
collection thus preventing layering


Definitely not.  A pile (rhymes with "file") of serialized element values is 
not an ElementCollection, nor is the unnamed "X" element that describes it.  A 
set of elements can be parsed from one, two, or more serialized piles; there is 
no association of a single element to a particular pile in the model.  A single 
element can be in and be parsed from many piles, which can have the same or 
different data formats.

*       The NamespaceMap should be placed on the new "outer layer" type of 
collection class thus avoiding the potential conflicts and complexities of 
prefix layering


A NamespaceMap exists in the pile to enable compact serialized data to 
reconstruct full element instances. An "X" element class does not need to exist 
at all to parse a pile into full element instances.  If an X element does 
exist, its namespace map has nothing to do with element instances, only how 
much compaction is applied to other piles - a given SpdxId instance is 
serialized into: full IRI, short namespace + long local id, or long namespace + 
short local id.
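
The three serialized forms described above can be sketched in a few lines of 
Python. This is only an illustration of the idea, not anything taken from the 
SPDX spec or tooling: the function names, the example prefixes, and the 
example.com namespaces are all assumptions made up for the sketch. It shows 
that two different namespace maps compact the same id differently, yet both 
expand back to the same full IRI, so the map affects only the pile and not the 
element.

```python
from typing import Dict

# Hypothetical namespace maps; neither the prefixes nor the IRIs come from SPDX.
LONG_NS_MAP: Dict[str, str] = {"acme": "https://example.com/spdx/acme/"}
SHORT_NS_MAP: Dict[str, str] = {"ex": "https://example.com/"}


def expand(spdx_id: str, namespace_map: Dict[str, str]) -> str:
    """Return the full IRI for an id that may be in prefix:local form."""
    prefix, sep, local = spdx_id.partition(":")
    # Text before ':' counts as a prefix only if the map declares it;
    # otherwise the id is assumed to be a full IRI already.
    if sep and prefix in namespace_map:
        return namespace_map[prefix] + local
    return spdx_id


def compact(iri: str, namespace_map: Dict[str, str]) -> str:
    """Return prefix:local using the longest matching namespace, if any."""
    matches = [(p, ns) for p, ns in namespace_map.items() if iri.startswith(ns)]
    if not matches:
        return iri  # no namespace applies, keep the full IRI
    prefix, namespace = max(matches, key=lambda item: len(item[1]))
    return f"{prefix}:{iri[len(namespace):]}"


full_iri = "https://example.com/spdx/acme/pkg-1"
long_form = compact(full_iri, LONG_NS_MAP)    # long namespace + short local id
short_form = compact(full_iri, SHORT_NS_MAP)  # short namespace + long local id
print(long_form, short_form)
print(expand(long_form, LONG_NS_MAP) == expand(short_form, SHORT_NS_MAP))
```

Under these assumptions, a consumer needs only the map found in the pile it is 
parsing; no model element has to carry the map for expansion to work.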

*       NamespaceMap does not exist at all in element instances (the logical 
model).
*       I agree with Max's proposal that a consumer can create an "X" element 
for a particular pile that it parses some elements from. The consumer can then 
include that "X" element in piles that it produces, to allow external 
references from the produced pile to the consumed pile.
*       There is no such thing as "external" in the element graph; an instance 
with an id exists or it doesn't.  "External" is a concept that applies only to 
serialized data piles, it doesn't exist in the logical model
*       The use case of Payload A -> Logical Element Store -> Payload A' where 
A is destroyed but A' must use the same namespace map as A would require that 
namespace map be included in an element.  But I question the validity of 
requiring that Payload A be destroyed. If that were true, source integrity from 
the Producer of all elements in A would be lost.  Not to mention that it's as 
illogical as saying that a File element must be able to reconstruct the 
original content in file A' from the element store if original file A is 
destroyed.

 

On Thu, Aug 17, 2023 at 10:45 AM Sean Barnum <[email protected]> wrote:

All,

 

After our discussions around NamespaceMap and SpdxDocument in our Aug 8th tech 
meeting I put a little more thought into these challenges. I recognize the 
implementation complexities around prefix layering that Gary and others 
expressed deep concern about, and I respect their perspective. Given these 
concerns and the issues that still remain in the current approach, I believe I 
have a compromise to propose that seems to address all of these issues cleanly.

 

A summary of my thoughts:

 

*       There have been previous discussions that putting NamespaceMap on 
ElementCollection gets very complicated because collections can contain 
collections (to an undefined depth) and there is a potential for prefix 
conflict. I would concur that this is a challenge of significant complexity to 
fully address.
*       There is existing confusion even from people in the tech working group 
on the relationship/difference between SpdxDocument and Bom. This includes 
questions on whether SpdxDocument is needed in the model. It was conveyed on 
the call that the purpose of SpdxDocument was to convey metadata for a specific 
serialization instance of SPDX content.
*       Having SpdxDocument in the model, if it is intended to convey metadata 
for a specific serialization instance, creates a messy situation because it 
conflates the model with a specific serialization. Keeping the two separate is 
very important for simplicity, flexibility, consistency, etc.

*       Details about how to serialize and specific instances of serialization 
should be specified and managed outside of the model.
*       If we want to convey verifiedUsing details of a specific instance of 
serialization then we should use a File Element to represent the serialized 
file and assert the verifiedUsing details on it. We could also then relate that 
File to the content Elements it contains (hopefully typically a single wrapping 
collection) with a "contains" Relationship. This is an appropriate way to 
handle metadata for a specific serialization instance and not confuse the model.
*       There are cases where instances of serialization may not involve a 
file, and this would call for a ContentData (chunk-o-bits) Element, which we 
have scoped in prior discussions to after 3.0. So for 3.0, instances of 
serialization would only be Files.

*       To simplify a solution for NamespaceMap in layered collections it does 
make sense to specify a special subclass (directly) of ElementCollection that 
is intended to be a collection that no other collection can reference as an 
Element (thus preventing layering). This special subclass should not be thought 
of as specific to serialization but rather just a special kind of collection in 
the model. This layer prevention would need explicit assertion in the RDFS/OWL 
and in the SHACL such that the range of the 'element' property on 
ElementCollection would be Element but NOT the outer-shell collection. I would 
propose not using "SpdxDocument" as the name of this class as it has a lot of 
history and would have the potential for a lot of confusion. We should choose 
another name for this class that conveys that it is an outer-shell-only 
collection and namespaceMap can be a property on it. We could go with something 
simple like "EnclosingCollection" though that name does not inherently convey 
lack of layering. Another more esoteric but explicit possibility would be to 
borrow from chemistry and call it "ValenceShellCollection" given 
the universal usage of the term 'valence shell' to be the outermost layer of 
electrons in any atomic element and the layer that reacts with content outside 
the atom. This seems pretty close to what we are trying to convey.
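
As a rough illustration of the layering constraint in the last bullet, here is 
a minimal Python sketch. It is not the RDFS/OWL or SHACL assertion itself, 
which is where the real enforcement would live; it only mimics the rule in 
application code. The class name "OuterShellCollection" is a placeholder 
invented for this sketch (no name has been agreed), and the add() helper and 
the example ids are likewise assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Element:
    spdx_id: str


@dataclass
class ElementCollection(Element):
    element: List[Element] = field(default_factory=list)

    def add(self, member: Element) -> None:
        # The constraint: any Element may be a member EXCEPT the outer-shell
        # collection, so outer shells can never be nested inside a collection.
        if isinstance(member, OuterShellCollection):
            raise ValueError(
                f"{member.spdx_id} is an outer-shell collection and cannot "
                "be an element of another collection"
            )
        self.element.append(member)


@dataclass
class OuterShellCollection(ElementCollection):
    # Placeholder name. The namespace map lives only on this outermost layer,
    # so prefixes cannot conflict across nested collections.
    namespace_map: Dict[str, str] = field(default_factory=dict)


outer = OuterShellCollection(
    "urn:example:outer", namespace_map={"ex": "https://example.com/"}
)
bom = ElementCollection("urn:example:bom")
outer.add(bom)  # allowed: an ordinary collection inside the outer shell

try:
    bom.add(outer)  # rejected: the outer shell may not be layered
except ValueError as err:
    print(err)
```

Appending to the element list directly would bypass the check, which is one 
reason the constraint would need to be asserted in the schema rather than left 
to implementations.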

 

The net-net of my thoughts on these matters is the following:

*       The current SpdxDocument class should be removed from the model

*       Bills of material should be represented with Bom (or its subclasses) 
instances
*       Metadata for specific serialization instances should be represented 
with File class instances

*       A new special direct subclass of ElementCollection should be defined (a 
couple of name suggestions above but not "SpdxDocument") to be an "outer layer" 
collection enforced with a constraint on the 'element' property of 
ElementCollection that it cannot contain this new "outer layer" type of 
collection thus preventing layering
*       The NamespaceMap should be placed on the new "outer layer" type of 
collection class thus avoiding the potential conflicts and complexities of 
prefix layering

 

 

Thank you for your consideration.

 

Sean

 




