Sean,

Would it be helpful to discuss the distinction between:
  *  "graph" - a knowledge instance that conforms to the SPDXv3 knowledge
model, used within an environment
  *  "document" - a unit of interchange between environments.

A collection of documents must be able to losslessly communicate an entire
graph from one environment to another. Wolfram Alpha is an example of an
environment.  Not all of its knowledge must have existed in documents in
the past, but in the future it must be possible to clone the entire
knowledge graph from one environment into another environment using one or
more documents.  Is this a correct statement (in principle, as a design
goal)?

I agree with your bullets 2, 3, and partially 4.

Taking your last bullet first:

   - *I don’t believe we ever fully decided on how we would explicitly
   denote which ids are using such a shorthand so a deserializer would know
   when to construct the full id*

The UCO / STIX pattern does not allow that to be solved.  Making the
non-namespaced id part of the URI path makes it impossible to tell where
the namespace ends.  I suppose a heuristic could say that the last path
component MUST BE the non-namespaced id and that the latter MUST NOT
contain the path separator "/", but that is unsatisfying.  It would be more
standard to treat the full URI path as the namespace and define id as a
fragment within that namespace.  I'll use the <namespace>#<id> convention
below; substitute "/" if you wish.
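To make the difference concrete, here is a minimal Python sketch of the two splitting strategies. The function names and the example.org namespace are hypothetical, invented for illustration; nothing here is defined by SPDX.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_fragment_id(full_id: str) -> Tuple[str, str]:
    """Split "<namespace>#<id>"; the "#" marks the boundary unambiguously."""
    namespace, local_id = urldefrag(full_id)
    if not local_id:
        raise ValueError(f"no fragment in {full_id!r}")
    return namespace, local_id


def split_path_id(full_id: str) -> Tuple[str, str]:
    """Split "<namespace>/<id>" with the last-path-component heuristic.

    Only correct if the id is guaranteed never to contain "/".
    """
    namespace, sep, local_id = full_id.rpartition("/")
    if not sep or not local_id:
        raise ValueError(f"no path separator in {full_id!r}")
    return namespace, local_id


# Hypothetical namespace, for illustration only.
print(split_fragment_id("https://example.org/spdx/doc1#File-1394"))
print(split_path_id("https://example.org/spdx/doc1/File-1394"))
```

With the fragment form, the id may itself contain "/" without moving the namespace boundary; with the path form, the heuristic would silently put the boundary in the wrong place.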

I would say:

   - the non-namespace portion of the identifier must uniquely identify an
      element within the namespace.  (The namespace is globally unique and the
      authority for a namespace is fully authoritative for the IDs within that
      namespace.)  At the strongest I'd say an authority MAY choose to use UUID
      v4 or v5 for its IDs, but there is no reason to prohibit other big IDs
      (such as vehicle VINs) or small IDs (such as one-up serial numbers). The
      following would all be valid full IDs if so designated by the namespace
      authority:
         - <namespace>#File-154c8aa4-98ba-4642-b84e-fe8e283299bc
         - <namespace>#1394
         - <namespace>#File-1394
         - <namespace>#1394-File
         - <namespace>#1394-BEERWARE-4.2
         - <namespace>#License-BEERWARE-4.2

These are all opaque IDs, meaning that the only property that can be
counted on is that namespace is separated from non-namespace ID by "#".
I'm suggesting that we consider adding a third defined level:

   - "<namespace>#<uid>/<label>" where uid is guaranteed to be unique
      within the namespace and any "/<label>" MUST be ignored when matching
      Element IDs.

<uid>/<label> is a syntactically-valid RFC 3986 fragment, so the entire ID
including namespace, uid, and label is a valid URI.
All of the examples above would be valid in the 3-part ID, in addition to
the following:

   - <namespace>#154c8aa4-98ba-4642-b84e-fe8e283299bc/File
   - <namespace>#154c8aa4-98ba-4642-b84e-fe8e283299bc/File-ld.so.4
   - <namespace>#1394/File-ld.so.4
   - <namespace>#1394/License-BEERWARE-4.2
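As a rough sketch of the matching rule for the three-part form, assuming "#" separates the namespace from the rest and the first "/" in the fragment separates uid from label (the ElementId type, the function names, and the example.org namespace are hypothetical):

```python
from typing import NamedTuple, Optional


class ElementId(NamedTuple):
    namespace: str
    uid: str
    label: Optional[str]  # informational only; never used for matching


def parse_element_id(full_id: str) -> ElementId:
    """Parse "<namespace>#<uid>" or "<namespace>#<uid>/<label>"."""
    namespace, sep, fragment = full_id.partition("#")
    if not sep or not fragment:
        raise ValueError(f"missing '#<uid>' in {full_id!r}")
    uid, _, label = fragment.partition("/")
    if not uid:
        raise ValueError(f"empty uid in {full_id!r}")
    return ElementId(namespace, uid, label or None)


def same_element(a: str, b: str) -> bool:
    """Two IDs match when namespace and uid match; any label is ignored."""
    pa, pb = parse_element_id(a), parse_element_id(b)
    return (pa.namespace, pa.uid) == (pb.namespace, pb.uid)


ns = "https://example.org/ns"  # hypothetical namespace
print(same_element(ns + "#1394/File-ld.so.4", ns + "#1394"))
print(same_element(ns + "#1394", ns + "#File-1394"))
```

Under this rule the two-part examples parse with an empty label, so an ID such as "<namespace>#File-1394" is treated as a uid of "File-1394" and nothing about its internal structure is interpreted.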

*In his email William stated that IDs should be opaque, which means that
processors MUST NOT examine the content of the ID and make decisions based
on its content.  Element type (File, License, Identity, etc) must exist in
a property other than ID in order to be visible to applications.*

I agree with your serialization bullets, except:

   - Documents must specify the content of all elements that they define
      (inline) in the element property.  I don't know what it would mean for an
      "element" property value to be an ID reference.  References are used in
      properties other than element; element contains only the document's
      inline definitions.
   - I don't disagree that Elements specified in a document MUST use
      only the non-namespace portion of the ID.  But it seems potentially
      useful for a Document to contain a copy of element definitions from
      another namespace, as a convenience.

*If the two or three part ID structure is defined by SPDXv3 as normative,
serializers would use "#" and "/" as the separators between <namespace>,
<uid>, and <label>.*
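A minimal sketch of how a serializer and deserializer might move between shorthand and full IDs with these separators. It assumes, purely for illustration, that an id containing no "#" is shorthand relative to the Document namespace; how shorthand should actually be denoted is still undecided, and the function names and namespaces are hypothetical.

```python
def expand_id(doc_namespace: str, element_id: str) -> str:
    """Deserializer side: build the full ID from a shorthand id.

    Assumption (not decided by the group): an id without "#" is shorthand.
    """
    if "#" in element_id:
        return element_id  # already a full ID, possibly from another namespace
    return f"{doc_namespace}#{element_id}"


def shorten_id(doc_namespace: str, full_id: str) -> str:
    """Serializer side: drop the Document namespace when it applies."""
    prefix = doc_namespace + "#"
    if full_id.startswith(prefix):
        return full_id[len(prefix):]
    return full_id  # element from another namespace keeps its full ID


ns = "https://example.org/spdx/doc1"  # hypothetical Document namespace
print(expand_id(ns, "1394/File-ld.so.4"))
print(shorten_id(ns, "https://example.org/other#42"))
```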

Dave


On Tue, Jul 20, 2021 at 11:58 AM Sean Barnum <[email protected]> wrote:

> All,
>
>
>
> Trying to get caught up from my absence and looking forward to getting
> back in the game.
>
>
>
> I am concerned by some of the id discussions I am seeing here as it looks
> like they may not be based on the consensus and results of the very
> extensive 3T-SBOM conversations that occurred regarding scoping of Elements
> and IDs.
>
>
>
> Here is a very quick stab at outlining the results of our previous
> discussions:
>
>    - Elements can be defined and referenced independent of the
>    scope/context of any Document.
>    - At the model level, Element ids uniquely identify an Element within
>    the universe across all contexts and therefore MUST be globally unique, not
>    simply unique within a given Document.
>    - There is significant value in full Element ids being IRIs
>    - Element ids should consist of a namespace (globally unique to some
>    specific authority (e.g., a single producer/definer)) combined with an
>    identifier that is globally unique within the id namespace.
>       - The namespace could be a general namespace for a given
>       producer/definer, could be tied to a specific Document, or could be
>       something else
>       - The non-namespace portion of the identifier would ideally contain
>       a UUID
>          - The UUID portion could be random (v4) or could be
>          deterministic (v5)
>       - *I do not recall any discussion of intent that the identifier
>       should be opaque and I do not see the value in having it so. I would
>       concur with William that there is value in readable ids.*
>       - *In UCO, STIX and other efforts I have been part of we have found
>       the following structural pattern to be most effective:
>       “<namespace>/<Element type>-<UUID>”*
>    - In serialization:
>       - Elements can be expressed as a simple flat set
>       - Any Element expressed outside of a Document MUST use its full
>       Element ID
>       - Documents, in their “element” property can either reference
>       relevant Elements via their Element ID or can specify/express Elements
>       inline
>       - Any references inside the Document to Elements defined outside
>       the Document MUST have an entry in the ExternalMap for the Document
>          - All ExternalMap entries MUST use full Element Ids
>       - Any Elements specified/expressed inline within the Document MAY
>       use only the non-namespace portion of their Element ID as their id
>          - When such Elements are deserialized, their full Element ID is
>          constructed by combining the Document namespace with the
>          non-namespace portion of the Element ID used as shorthand within
>          the serialization
>          - *I don’t believe we ever fully decided on how we would
>          explicitly denote which ids are using such a shorthand so a
>          deserializer would know when to construct the full id*
>
>
>
> I would consider these characteristics of Elements and Ids to be among the
> most important considerations for the practical feasibility of our efforts.
>
>
>
> I am sure we can discuss further but wanted to make sure to get this
> concern out there.
>
>
>
> sean
>
>
>

