[spdx-tech] Element Identifier Proposal - spdxNamespace

Gary O'Neall Tue, 17 Aug 2021 11:44:11 -0700

As suggested on today's tech call, I'm reposting a slightly modified
proposal for Element identifiers originally emailed on August 4.


 

Proposal Summary:

*       All Element ID's and LicenseRef ID's are URI's (same as is the case
for SPDX 2.2). 
*       Document has a property documentNamespace which is a string prefix
for all ID's local to the document (similar to SPDX 2.2, but with a few
differences outlined below)

*       documentNamespace would be optional - if it is not present, the non
linked-data formats would use the full URI for the ID's within the document
(SPDX 2.2 has this as a required field)
*       We would remove the requirement for all element ID's to be fragments
of the Document (e.g. no more special treatment of "#" - the
documentNamespace would be a simple string prefix).  This would create a
minor incompatibility with SPDX 2.2.
*       documentNamespace would place a restriction on the non-namespace
portion of the ID to allow for parsing of external document references (e.g.
externalDocumentRef-12:SPDXRef-14).  Currently, the separator is a colon
(':'); however, we may want to choose a different character which is less
common in URI's.  We would remove the required prefix of "SPDXRef-".

*       Elements have a property document which references the Document
containing the documentNamespace property

*       an alternative would be to include the documentNamespace property in
the Element - not my preferred approach, but it would still solve the
translation problem

 

Below are the criteria from today's call and my evaluation of how the
proposal meets them:

*       Independently unique defined - All element ID's are URI's (or IRI's
if we change the spec) and meet this requirement
*       Independently unique referenced - All element ID's are URI's (or
IRI's if we change the spec) and meet this requirement
*       Not requiring intermediate steps

*       For linked data, the URI would be used for reference and can be done
directly
*       For non-linked data going to linked data, the URI can be used to
access directly
*       For non-linked data going to non-linked data, the target document
would need to be deserialized for access (I think this would be true for any
of the proposals)

*       Support non-linked data serializations - Storing the
documentNamespace as a property allows for lossless translation to/from
linked and non-linked data serialization formats

 

Original emailed proposal from August 4:

 

To support non-linked data, we need a way to translate the ID's back to and
from non-linked-data serialization formats.

 

The easiest approach would be to just include the entire ID String in
formats like tag/value.

 

This would end up with something like:

 

               .

               SPDXID: http://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301#SPDXRef-File

               .

 

Rather than the current format of:

 

               DocumentNamespace: http://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301

               .

               SPDXID: SPDXRef-File

               .

 

Similar issues for the Spreadsheet and YAML formats.  We also have a
non-linked-data JSON format which would also have the same ID issues.
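
To make the comparison concrete, here is a minimal Python sketch that reads
either of the two tag/value forms above and yields the same full URI.  The
function name element_uris is hypothetical, the parsing is deliberately
naive (one "Tag: value" pair per line), and it assumes a document uses one
form or the other, not a mix.

```python
def element_uris(tag_value_text):
    """Yield the full URI for every SPDXID line in a tag/value snippet."""
    namespace = None
    for line in tag_value_text.splitlines():
        # Split only on the first ':' so URIs in the value stay intact.
        tag, sep, value = line.strip().partition(":")
        if not sep:
            continue
        value = value.strip()
        if tag == "DocumentNamespace":
            namespace = value
        elif tag == "SPDXID":
            # Short form: SPDX 2.x rule, namespace + '#' + ID.
            # Long form (no DocumentNamespace seen): value is already the URI.
            yield f"{namespace}#{value}" if namespace else value
```

Both forms carry the same information; the long form simply repeats the
prefix on every element.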

 

If the above change is acceptable to those using the non-linked-data
serialization formats, I would definitely go with the simpler approach.

 

If we want the ID's to be short, however, we'll need to introduce something
like namespaces, which are really a string prefix, and have pre-defined rules
to make it possible to reliably translate between the linked-data formats
(which will always use URI's) and the non-linked-data formats.

 

Here are the rules for SPDX-2.0:

*       The full URI is formed by concatenating the documentNamespace + '#'
+ SPDXID in non-linked-data formats
*       Linked Data formats must include a default namespace in their
serialization - this is the documentNamespace property used in the
non-linked-data format with '#' appended
*       SPDX ID's are restricted to the format SPDXRef-[idString] where
idString is a unique string containing letters, numbers, ., and/or -.
*       Any ID's not defined within the SPDX document use the format
DocumentRef-[idString]:SPDXRef-[idString] for non-linked-data formats and
use the external map to form the full URI
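
As a rough illustration of the rules above, the following Python sketch
expands a non-linked-data ID to its full URI.  The function name to_uri_2x
is hypothetical, and I am assuming the external map is a simple dictionary
from DocumentRef-[idString] to the namespace of the external document.

```python
import re

# Local ID: "SPDXRef-" followed by letters, numbers, "." and/or "-".
LOCAL_ID = re.compile(r"SPDXRef-[A-Za-z0-9.\-]+")
# External reference: "DocumentRef-<idString>:SPDXRef-<idString>".
EXTERNAL_ID = re.compile(
    r"(DocumentRef-[A-Za-z0-9.\-]+):(SPDXRef-[A-Za-z0-9.\-]+)"
)


def to_uri_2x(spdx_id, document_namespace, external_map):
    """Expand a non-linked-data ID to a full URI using the rules above."""
    match = EXTERNAL_ID.fullmatch(spdx_id)
    if match:
        doc_ref, local_id = match.groups()
        # Assumption: external_map maps DocumentRef-... to a namespace.
        return external_map[doc_ref] + "#" + local_id
    if LOCAL_ID.fullmatch(spdx_id):
        return document_namespace + "#" + spdx_id
    raise ValueError(f"not a valid identifier: {spdx_id!r}")
```

Note that the ':' split only works because ':' is not an allowed character
in the idString.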

 

Note - there are similar rules for LicenseRef's.

 

Sean raised a valid issue regarding the required use of '#'.

 

I have a proposed solution below:

 

In thinking about this, since we have the documentNamespace and XMLNS
properties (for RDF/XML), we could relax this requirement and allow any
valid URI namespace prefix.  This creates a minor incompatibility since we
would need to append a '#' to the documentNamespace property for any pre-3.0
SPDX documents.
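
The conversion for pre-3.0 documents could be as small as the sketch below.
The function name upgrade_namespace is hypothetical, and the check for an
existing trailing '#' is only a defensive guard on my part.

```python
def upgrade_namespace(document_namespace_2x):
    """Turn a pre-3.0 documentNamespace into a plain string prefix."""
    # Pre-3.0 rules insert '#' implicitly between namespace and ID;
    # with a plain prefix it has to be part of the namespace itself.
    if document_namespace_2x.endswith("#"):
        return document_namespace_2x
    return document_namespace_2x + "#"
```

With that, upgraded namespace + SPDXID gives the same URI that the pre-3.0
rules would have produced.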

 

I would still suggest restricting the characters available for SPDXRef's to
make it possible to parse the ID's in the non-linked-data formats.  We
could, however, extend some of the characters (e.g. add "/" as an allowed
character).  As per previous discussions, we could also remove the
requirements for the SPDXRef- prefix.

 

This would solve some of the issues raised previously yet still allow
support for both linked-data and non-linked data.

 

Here's a proposed set of rules for 3.0:

*       The full URI is formed by concatenating the documentNamespace +
SPDXID in non-linked-data formats.  The documentNamespace property would be
optional.  If the documentNamespace is not included, the SPDXID must be the
full URI.
*       Linked Data formats may include a default namespace in their
serialization - this is the same namespace as the documentNamespace
property used in the non-linked-data format
*       SPDX ID's are restricted to be a unique (within the document) string
containing only letters, numbers, ., /, and/or -.
*       Any ID's not defined within the SPDX document use the format
DocumentRef-[idString]:[idString] for non-linked-data formats (NOTE: the ':'
must not be an allowed character in the idString)
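
A minimal Python sketch of the proposed rules, to show that the translation
is lossless in both directions.  The names expand, compact and
split_external are hypothetical, and raising an error for a URI outside the
namespace is my assumption, not part of the proposal.

```python
import re

# Proposed local ID: letters, numbers, ".", "/" and/or "-" (no ":").
ID_STRING = re.compile(r"[A-Za-z0-9./\-]+")


def expand(spdx_id, document_namespace=None):
    """Non-linked-data ID -> full URI."""
    if document_namespace is None:
        return spdx_id  # no namespace: the SPDXID already is the full URI
    if not ID_STRING.fullmatch(spdx_id):
        raise ValueError(f"invalid local ID: {spdx_id!r}")
    return document_namespace + spdx_id  # plain prefix, no implicit '#'


def compact(uri, document_namespace=None):
    """Full URI -> non-linked-data ID."""
    if document_namespace is None:
        return uri
    if not uri.startswith(document_namespace):
        raise ValueError("URI is outside the document namespace")
    local_id = uri[len(document_namespace):]
    if not ID_STRING.fullmatch(local_id):
        raise ValueError(f"invalid local ID: {local_id!r}")
    return local_id


def split_external(ref):
    """Split 'DocumentRef-<id>:<id>' into its two parts, else return None."""
    doc_ref, sep, local_id = ref.partition(":")
    prefix = "DocumentRef-"
    if (sep and doc_ref.startswith(prefix)
            and ID_STRING.fullmatch(doc_ref[len(prefix):])
            and ID_STRING.fullmatch(local_id)):
        return doc_ref, local_id
    return None
```

Since ':' is excluded from the idString, the first ':' in an external
reference is always the separator, and a full URI such as "http://..." is
never mistaken for one because it does not start with "DocumentRef-".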

 

I would further propose some recommended practices:

*       Namespaces are used and must be unique
*       SPDX ID's have a format that conveys information about the type (per
previous conversations)
*       Namespaces do not include '#' to make the URI's more HTTP addressable
(per Sean's concern)

 

Variations on a theme:

*       We could introduce a separator character for the namespace that
would be appended to the documentNamespace.  This would relax the
requirement for an XMLNS property in the RDF serializations since we could
then parse - although I'm not sure how reliable the parsing would be.
*       Require a namespace - this would make the tag/value more readable
at the expense of flexibility

 

Let me know if this sounds reasonable.

 

Gary

 

 

 

 

-------------------------------------------------

Gary O'Neall

Principal Consultant

Source Auditor Inc.

Mobile: 408.805.0586

