Re: [spdx-tech] Element Identifier Proposal - spdxNamespace

David Kemp Wed, 18 Aug 2021 06:38:16 -0700

Gary, I nearly agree, and I think we are making this harder than it is. The
model currently uses made-up typenames, not actual types from
https://www.w3.org/TR/rdf11-concepts/#xsd-datatypes.  Fixing the model
would fix the problem.


*Proposal Summary:*
>
>    - All Element ID’s and LicenseRef ID’s are URI’s (same as is the case
>    for SPDX 2.2).
>
>
Yes, the Element/id property is type xsd:anyURI


>    - Document has a property *documentNamespace* which is a string prefix
>    for all ID’s local to the document (similar to SPDX 2.2, but with a few
>    differences outlined below)
>
>
Yes, the Document/namespace property is type xsd:anyURI


>
>    - documentNamespace would be optional – if it is not present, the non
>       linked-data formats would use the full URI for the ID’s within the
>       document (SPDX 2.2 has this as a required field)
>
>
This assumes that what you call "non-linked data" is a document.  That's
one viable solution; another would be that an Element does not need to be
enclosed in a Document, and a bare Element is what you call linked data.


>    - We would remove the requirement for all element ID’s to be fragments
>       of the Document (e.g. no more special treatment of “#” – the
>       documentNamespace would be a simple string prefix).  This would create a
>       minor incompatibility with SPDX 2.2.
>
>
Yes.


>    - documentNamespace would have a restriction on the non-namespace
>       portion of the ID to allow for parsing of external document references
>       (e.g. externalDocumentRef-12:SPDXRef-14).  Currently, this is a colon
>       (‘:’), however, we may want to choose a different character which is
>       less common in URI’s.  We would remove the required prefix of “SPDXRef-”.
>
>
I don't think we should define any special syntax for the non-namespace
portion of the URI.  The last slash in *hier-part* separates the namespace
from the non-namespace part.  If the URI ends with a slash, the non-namespace
part is empty, and the non-namespace part cannot contain a slash.  That is
simple and unambiguous for both concatenating and splitting.
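
A minimal sketch of that split/join rule, for illustration only.  The
function names split_uri and join_uri and the example.org URI are
hypothetical, and the sketch assumes the URI has a path and that any query
or fragment contains no slash:

```python
def split_uri(uri: str) -> tuple[str, str]:
    """Split a URI at its last slash into (namespace, non-namespace part)."""
    head, sep, tail = uri.rpartition("/")
    if not sep:
        raise ValueError(f"no slash to split on: {uri!r}")
    # The namespace keeps its trailing slash; the tail may be empty.
    return head + sep, tail


def join_uri(namespace: str, local: str) -> str:
    """Concatenate a namespace and a non-namespace part back into a URI."""
    if not namespace.endswith("/"):
        raise ValueError("namespace must end with a slash")
    if "/" in local:
        raise ValueError("non-namespace part cannot contain a slash")
    return namespace + local


if __name__ == "__main__":
    ns, local = split_uri("https://example.org/spdxdocs/doc1/File-1")
    print(ns, local)
    print(join_uri(ns, local))
```

Because the non-namespace part can never contain a slash, splitting and
then joining returns the original URI unchanged.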


>    - Elements have a property *document* which references the Document
>    containing the documentNamespace property
>       - an alternative would be to include the documentNamespace property
>       in the Element – not my preferred approach, but it would still solve the
>       translation problem
>
>
This assumes that 1) a Document can contain other Documents, and 2) any
contained Documents have their elements stripped out and moved into the
top-level Document.  If you embed one document into another, wouldn't it be
preferable to leave its contents intact, with the namespace of an Element
always specified in the Document containing that Element?

-----------------

I agree with Sean that we should consider not requiring Document for
Elements whose id is an absolute URI.  I started a drawing showing:
1) a Document containing Elements with absolute URI ids
2) a Document containing Elements with relative URI ids  (what we are using
today, and must continue to allow)
3) an Element not contained in a Document with an absolute URI id

The slide might be more confusing than enlightening, but it seems clear to
me.  It is intended to show that all references to an Element are always an
absolute URI even when the element id is relative.
https://docs.google.com/presentation/d/1v62mftkzWvH8WwdQgtJwWM6FovHoS2VRTWkZxjbaG00
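
The three cases can also be sketched in code.  This is only an
illustration under stated assumptions: the function name absolute_id and
the example.org URIs are hypothetical, an id counts as absolute when it
has a URI scheme, and a relative id is resolved by plain concatenation
with the Document namespace:

```python
from typing import Optional
from urllib.parse import urlsplit


def absolute_id(element_id: str, document_namespace: Optional[str] = None) -> str:
    """Return the absolute URI by which an Element is referenced."""
    if urlsplit(element_id).scheme:
        # Cases 1 and 3: the id is already an absolute URI, with or
        # without an enclosing Document.
        return element_id
    if document_namespace is None:
        raise ValueError("a relative id needs a Document namespace")
    # Case 2: relative id, resolved against the Document namespace.
    return document_namespace + element_id


if __name__ == "__main__":
    print(absolute_id("https://example.org/pkgs/File-1"))
    print(absolute_id("File-1", "https://example.org/spdxdocs/doc1/"))
```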

Dave


On Tue, Aug 17, 2021 at 2:44 PM Gary O'Neall <[email protected]> wrote:

> As suggested on today’s tech call, I’m reposting a slightly modified
> proposal for Element identifiers originally emailed on August 4.
>
>
>
> *Proposal Summary:*
>
>    - All Element ID’s and LicenseRef ID’s are URI’s (same as is the case
>    for SPDX 2.2).
>    - Document has a property *documentNamespace* which is a string prefix
>    for all ID’s local to the document (similar to SPDX 2.2, but with a few
>    differences outlined below)
>       - documentNamespace would be optional – if it is not present, the
>       non linked-data formats would use the full URI for the ID’s within the
>       document (SPDX 2.2 has this as a required field)
>       - We would remove the requirement for all element ID’s to be
>       fragments of the Document (e.g. no more special treatment of “#” – the
>       documentNamespace would be a simple string prefix).  This would create a
>       minor incompatibility with SPDX 2.2.
>       - documentNamespace would have a restriction on the non-namespace
>       portion of the ID to allow for parsing of external document references
>       (e.g. externalDocumentRef-12:SPDXRef-14).  Currently, this is a colon
>       (‘:’), however, we may want to choose a different character which is
>       less common in URI’s.  We would remove the required prefix of “SPDXRef-”.
>    - Elements have a property *document* which references the Document
>    containing the documentNamespace property
>       - an alternative would be to include the documentNamespace property
>       in the Element – not my preferred approach, but it would still solve the
>       translation problem
>
>
>
> *Below are the criteria from today’s call and my evaluation of meeting those
> criteria:*
>
>    - Independently unique defined – All element ID’s are URI’s (or IRI’s
>    if we change the spec) and meet this requirement
>    - Independently unique referenced – All element ID’s are URI’s (or
>    IRI’s if we change the spec) and meet this requirement
>    - Not requiring intermediate steps
>       - For linked data, the URI would be used for reference and can be
>       done directly
>       - For non-linked data going to linked data, the URI can be used to
>       access directly
>       - For non-linked data going to non-linked data, the target document
>       would need to be deserialized for access (I think this would be true for
>       any of the proposals)
>    - Support non-linked data serializations – Storing the
>    documentNamespace as a property allows for lossless translation to/from
>    linked and non-linked data serialization formats
>
>
>
> *Original emailed proposal from August 4:*
>
>
>
> To support non-linked data, we need a way to translate the ID’s back to
> and from non-linked-data serialization formats.
>
>
>
> The easiest approach would be to just include the entire ID String in
> formats like tag/value.
>
>
>
> This would end up with something like:
>
>
>
>                …
>
>                SPDXID: http://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301#SPDXRef-File
>
>                …
>
>
>
> Rather than the current format of:
>
>
>
>                DocumentNamespace: http://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301
>
>                …
>
>                SPDXID: SPDXRef-File
>
>                …
>
>
>
> Similar issues for the Spreadsheet and YAML formats.  We also have a
> non-linked-data JSON format which would also have the same ID issues.
>
>
>
> If the above change is acceptable to those using the non-linked-data
> serialization formats, I would definitely go with the simpler approach.
>
>
>
> If we want the ID’s to be short, however, we’ll need to introduce
> something like namespaces which are really a string prefix and have
> pre-defined rules to make it possible to reliably translate between the
> linked-data formats (which will always use URI’s) and the non-linked-data
> formats.
>
>
>
> Here are the rules for SPDX-2.0:
>
>    - The full URI is formed by concatenating the documentNamespace + ‘#’
>    + SPDXID in non-linked-data formats
>    - Linked Data formats must include a default namespace in their
>    serialization – this is the same namespace used as the documentNamespace
>    property used in the non-linked-data format appended by ‘#’
>    - SPDX ID’s are restricted to the format SPDXRef-[idString] where
>    idString is a unique string containing letters, numbers, ., and/or -.
>    - Any ID’s not defined within the SPDX document use the format
>    DocumentRef-[idString]:SPDXRef-[idString] for non-linked-data formats and
>    use the external map to form the full URI
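
A minimal sketch of the SPDX 2.x rules listed above, for illustration
only.  The function name to_uri_2x, the "DocumentRef-tools" entry and its
namespace are hypothetical; the external map is modeled as a plain dict
from DocumentRef id to that document's namespace:

```python
def to_uri_2x(spdx_id: str, document_namespace: str,
              external_map: dict[str, str]) -> str:
    """Expand a non-linked-data SPDX 2.x ID into its full URI."""
    if ":" in spdx_id:
        # External reference: DocumentRef-[idString]:SPDXRef-[idString]
        doc_ref, local_id = spdx_id.split(":", 1)
        namespace = external_map[doc_ref]
    else:
        namespace, local_id = document_namespace, spdx_id
    # SPDX 2.x always joins namespace and ID with '#'.
    return f"{namespace}#{local_id}"


if __name__ == "__main__":
    ns = "http://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301"
    # Hypothetical external document entry, not taken from a real document.
    ext = {"DocumentRef-tools": "http://spdx.org/spdxdocs/hypothetical-tools-doc"}
    print(to_uri_2x("SPDXRef-File", ns, ext))
    print(to_uri_2x("DocumentRef-tools:SPDXRef-14", ns, ext))
```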
>
>
>
> Note – there are similar rules for LicenseRef’s.
>
>
>
> Sean raised a valid issue regarding the required use of ‘#’.
>
>
>
> I have a proposed solution below:
>
>
>
> In thinking about this, since we have the documentNamespace and XMLNS
> properties (for RDF/XML), we could relax this requirement and allow any
> valid URI namespace prefix.  This creates a minor incompatibility since we
> would need to append a ‘#’ to the documentNamespace property for any
> pre-3.0 SPDX documents.
>
>
>
> I would still suggest restricting the characters available for SPDXRef’s
> to make it possible to parse the ID’s in the non-linked-data formats.  We
> could, however, extend some of the characters (e.g. add “/” as an allowed
> character).  As per previous discussions, we could also remove the
> requirements for the SPDXRef- prefix.
>
>
>
> This would solve some of the issues raised previously yet still allow
> support for both linked-data and non-linked data.
>
>
>
> Here’s a proposed set of rules for 3.0:
>
>    - The full URI is formed by concatenating the documentNamespace +
>    SPDXID in non-linked-data formats.  The documentNamespace property would be
>    optional.  If the documentNamespace is not included, the SPDXID must be the
>    full URI.
>    - Linked Data formats may include a default namespace in their
>    serialization – this is the same namespace used as the documentNamespace
>    property used in the non-linked-data format
>    - SPDX ID’s are restricted to be a unique (within the document) string
>    containing only letters, numbers, ., /, and/or -.
>    - Any ID’s not defined within the SPDX document use the format
>    DocumentRef-[idString]:[idString] for non-linked-data formats (NOTE: the
>    ‘:’ must not be an allowed character in the idString)
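
The proposed 3.0 rules above could be sketched as follows.  This is an
illustration, not part of the proposal: the names ID_PATTERN, to_uri_3 and
parse_external_ref and the example.org namespace are hypothetical, and
"letters" is assumed to mean ASCII letters:

```python
import re
from typing import Optional

# Allowed ID characters per the proposed rules: letters, numbers, '.', '/', '-'.
ID_PATTERN = re.compile(r"[A-Za-z0-9./-]+")


def to_uri_3(spdx_id: str, document_namespace: Optional[str] = None) -> str:
    """Form the full URI; with no namespace the ID must already be the full URI."""
    if document_namespace is None:
        return spdx_id
    if not ID_PATTERN.fullmatch(spdx_id):
        raise ValueError(f"invalid characters in ID: {spdx_id!r}")
    # Plain string-prefix concatenation, with no '#' inserted.
    return document_namespace + spdx_id


def parse_external_ref(ref: str) -> tuple[str, str]:
    """Split DocumentRef-[idString]:[idString] at the first colon."""
    doc_ref, sep, local_id = ref.partition(":")
    if not sep or not doc_ref.startswith("DocumentRef-"):
        raise ValueError(f"not an external reference: {ref!r}")
    if not ID_PATTERN.fullmatch(local_id):
        # Also rejects a second ':' because ':' is not an allowed ID character.
        raise ValueError(f"invalid local ID: {local_id!r}")
    return doc_ref, local_id


if __name__ == "__main__":
    print(to_uri_3("File/1", "https://example.org/spdxdocs/doc1/"))
    print(parse_external_ref("DocumentRef-12:File-14"))
```

Keeping ':' out of the ID character set is what lets the external
reference be split at the first colon without ambiguity.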
>
>
>
> I would further propose some *recommended* practices:
>
>    - Namespaces are used and must be unique
>    - SPDX ID’s have a format that conveys information about the type (per
>    previous conversations)
>    - Namespaces do not include ‘#’ to make the URI’s more HTTP addressable
>    (per Sean’s concern)
>
>
>
> Variations on a theme:
>
>    - We could introduce a separator character for the namespace that
>    would be appended to the documentNamespace.  This would relax the
>    requirement for an XMLNS property in the RDF serializations since we could
>    then parse – although I’m not sure how reliable the parsing would be.
>    - Require a namespace – this would make the tag/value more readable
>    at the expense of flexibility
>
>
>
> Let me know if this sounds reasonable.
>
>
>
> Gary
>
>
> -------------------------------------------------
>
> Gary O'Neall
>
> Principal Consultant
>
> Source Auditor Inc.
>
> Mobile: 408.805.0586
>
> Email: [email protected]


View/Reply Online (#4162): https://lists.spdx.org/g/Spdx-tech/message/4162
