Re: [EXT] [spdx-tech] Element IDs

David Kemp Wed, 04 Aug 2021 11:28:11 -0700

SPDX 2.2 has license expressions, which represent multiple elements
(license IDs and operators) in a single string.  If RDF is going to reason
about license expressions, it must be able to know which license IDs and
which operators are present.  We don't call license expressions
"serialization" even though they define an equivalence between the AST and
a string.
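
To make the point concrete, here is a minimal sketch of pulling the
individual elements back out of a license expression string.  It is
illustrative only: the function name `elements` and the token pattern are
assumptions, not part of any SPDX tooling, and it covers only the AND, OR,
WITH and "+" operators without validating IDs against the license list.

```python
import re

# Operator keywords in SPDX license expressions (matched case-sensitively here).
OPERATORS = {"AND", "OR", "WITH"}

# A token is either a parenthesis or an id-like word with an optional "+" suffix.
# The character class is a simplification chosen for this sketch.
TOKEN = re.compile(r"\s*(?:(?P<paren>[()])|(?P<word>[A-Za-z0-9.\-:]+)(?P<plus>\+)?)")


def elements(expression):
    """Return (ids, operators) found in a license expression string."""
    ids, ops = [], []
    text = expression.strip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        pos = m.end()
        if m.group("paren"):
            continue  # grouping only; carries no ID or operator
        word = m.group("word")
        if word in OPERATORS:
            ops.append(word)
        else:
            ids.append(word)  # license or exception ID
            if m.group("plus"):
                ops.append("+")  # "or later" suffix operator
    return ids, ops


ids, ops = elements("(MIT OR Apache-2.0) AND GPL-2.0-only WITH Classpath-exception-2.0")
print(ids)
print(ops)
```

The string remains the single stored value, yet the IDs and operators are
recoverable from it on demand.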


Similarly, RDF must be able to reason about an Element's "namespace" and
its "inNsid", not just some idString blob as a property value.  Standard
ways of representing individual elements in compound strings are ABNF and
regular expression named groups.  The model can call it an idString
property, but it must also define the "virtual properties" within it in a
manner that allows different serializations to treat the virtual properties as
real ones - every element has a "namespace" property and getter and setter
methods for that property.
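
As a rough sketch of the named-group approach, assuming a placeholder
separator: the class name `Element`, the method `from_idString`, and the
"%1F" separator below are all hypothetical choices for illustration, since
no separator has been agreed.  The namespace and inNsid are stored as real
properties, and idString is derived from them.

```python
import re
from dataclasses import dataclass

# Placeholder separator for this sketch only; not an agreed SPDX choice.
SEP = "%1F"

# Named groups expose the two parts of the compound identifier.
# The greedy first group means the split happens at the LAST separator.
ID_PATTERN = re.compile(
    r"(?P<namespace>.+)" + re.escape(SEP) + r"(?P<inNsid>.+)"
)


@dataclass
class Element:
    namespace: str  # plain attributes, so they can be read and set directly
    inNsid: str

    @property
    def idString(self):
        """Compose the single compound identifier from its two parts."""
        return f"{self.namespace}{SEP}{self.inNsid}"

    @classmethod
    def from_idString(cls, id_string):
        """Recover namespace and inNsid from a compound identifier."""
        m = ID_PATTERN.fullmatch(id_string)
        if m is None:
            raise ValueError(f"not a compound identifier: {id_string!r}")
        return cls(m.group("namespace"), m.group("inNsid"))


e = Element.from_idString("https://example.org/ns/doc1%1Felem-7")
print(e.namespace, e.inNsid)
```

A serialization that prefers a single string can emit idString, while one
that prefers separate fields can emit namespace and inNsid; both round-trip
to the same Element.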

Dave

On Wed, Aug 4, 2021 at 1:02 PM Alexios Zavras <[email protected]>
wrote:

> I am actually very conflicted about this…
>
> On one hand, splitting an idString into two things (namespace and
> in-namespace-id, if you'll excuse the awful wording) sounds natural and
> appeals to my design aesthetics, especially since we all agree that an id
> will use such a combination.
>
> On the other hand, the worry expressed by Sean about complexity in the
> model is real and I am not sure that introducing such complexity is
> justifiable.
>
>
>
> In case it wasn’t clearly understood: if we split, and an Element, instead
> of a single property “id”, has two properties (say “namespace” and
> “inNsId”) we lose the easy way of referring to an element. That means
> that **every** other class that points to an Element (or anything else,
> really), will have to specify both properties to refer to something.
>
> As an example, we will not have a Relationship of type CONTAINS from
> Element-1 to Element-2; we should have a Relationship of type CONTAINS
> from-namespace Ns1 and from-innsid I1 to-namespace Ns2 and to-innsid I2.
>
> Or a Document will have a rootElement-namespace and rootElement-innsid in
> order to point to something…
>
>
>
> … which of course also contradicts the most basic principle of linked
> data: each thing should have a single URI by which it can be addressed.
>
>
>
> Judging pros and cons, I therefore vote for a single property “id”.
>
>
>
> It will be a string, a URI, and it’s up to us to define how we join the
> namespace and in-namespace-id parts. We’ve already said ‘#’ and ‘/’ are not
> suitable.
>
> A very simple encoding (a single “-” or “_”, for example) may not be
> sufficient, because we want to be able to also do the reverse
> transformation: from a single string understand the namespace and
> in-namespace-id parts.
>
> Back in the old days we were using non-printable characters to separate
> strings; “Unit Separator” (dec 31, hex 1F) is still in ASCII tables; we can
> use the three printable characters “%1F”. Or go for the section sign § and
> use “%A7”. Or a sequence of one or more “~” signs. Or…
>
>
>
>
>
> -- zvr
>
>
>
> *From:* [email protected] <[email protected]> *On Behalf Of
> *David Kemp
> *Sent:* Wednesday, 4 August, 2021 05:28
> *To:* Sean Barnum <[email protected]>
> *Cc:* SPDX-list <[email protected]>
> *Subject:* Re: [EXT] [spdx-tech] Element IDs
>
>
>
> My assertion on the call is that any presumption of “globally unique”
> based solely on the probability space of possible values is a poor general
> approach because it does not explicitly take into account the instantial
> value space where the number of objects may be very large and increase the
> probability of collisions. It does not deterministically prevent
> collisions. While extremely unlikely, it is possible to have a conflict
> with only two objects.
>
>
>
> You should speak with a cryptographer.  For a 256-bit hash value, the
> chance of a birthday collision is 1 / 2^128, or 1 / (3.4*10^38).  That's a
> 39-digit number.  For comparison, the chance of winning the Powerball
> lottery jackpot is 1 in 292 million, or about 1 / (3*10^8), so a collision
> is about 1,000,000,000,000,000,000,000,000,000,000 (10^30) times less
> likely than winning a Powerball jackpot. The age of the
> universe is 436,117,076,640,000,000 seconds, so you'd have to be running
> those lotteries at 1,000,000,000,000 times a second for the whole age of
> the universe before getting a 50% chance of a collision.
>
> Compare that to the reliability of trying to deconflict namespaces using a
> global registration system.  "Extremely unlikely" is easy to say, but it
> doesn't come close to doing the mathematics justice. The chance of
> collision due to an error in a global managed system is infinitely greater
> (yes, that's hyperbole) than in a cryptographic system.
>
>
>
> So, to support use cases such as linked data we need namespaces to be URIs
> themselves.
>
>
>
> Yes, that goes without saying, just as UUIDs are included in URIs.
>
> I am avoiding using “local id” as it may imply that that portion of the
> identifier is only local to that namespace
>
>
> That's OK.  The identifier is local to the namespace, and since it can be
> anything within a namespace, nothing prevents many namespaces from using
> the same id.  The full "namespace:id" is different, meaning the Elements
> are different regardless of whether the ids are the same.  I think
> "component" is misleading because it implies that several namespaces using
> the same id are using it to refer to the same "thing"/component, which
> clearly is not required.  The id has no semantics, it is opaque, but I'm
> not going to quibble over what to call it.
>
>
>
> I think it is important that we realize that the identifier (idString) is
> a valid URI that is composed of the namespace and the component id. It is
> not adequate to split these properties and store them in separate
> properties.
>
>
>
> On the contrary, it is essential to recognize that the model represents
> semantics.  By using the words namespace and id we are assigning meaning
> within the compound identifier.  Pretending that that meaning doesn't
> exist, wishing it weren't real, and modeling an Element identifier as a
> lump without two components is the root cause of the discussion going
> around in circles for months.  Sebastian observed that the "#" character
> (or whatever other character we use) is not part of the semantics at all,
> it is part of the syntax.  Taking a namespace and an id and forming a
> single Element identifier (and putting that identifier in URI format) is by
> definition syntax, whenever and wherever it is done in any kind of
> application.  The Element identifier always has a namespace and an id,
> that's its semantic meaning across all applications, period.
>
>
>
> Each serialization of Elements MUST maintain integrity and consistency of
> the fully composed identifier string during serialization and
> deserialization.
>
>
>
> I fully agree.  You say that Elements don't have to be associated with any
> document.  If those Elements have integrity and consistency of their
> identifiers, and they aren't associated with a document, then wherever they
> come from they must have a proper namespace and id.   Gary suggested
> Elements could come from single-element documents that provided the
> namespace for that one Element; that's certainly possible.  I haven't seen
> your non-Document serialization, but I agree that it too must provide a
> proper compound identifier for its Element.
>
> Dave
>


View/Reply Online (#4151): https://lists.spdx.org/g/Spdx-tech/message/4151