> Today's discussion of the Identity model was very interesting.  A few
> thoughts:
>
> * Entity resolution (e.g.,
> https://www.ibm.com/docs/en/iii/9.0.0?topic=insight-entity-resolution) is
> an enormously complex process.  Casinos have a business interest in knowing
> that a person is not acting under multiple identities for fraudulent
> purposes; linking voter registration rolls to strongly identity-proofed
> RealID driver's licenses is another example.
>
Agreed; this would be a non-goal for SPDX.


> * SPDX has no use case (that I'm aware of) for doing any kind of entity
> resolution - ensuring that the same person is not using more than one SPDX
> Identity.  Artifacts are scanned for whatever person identifiers (name and
> email) are present, and the results recorded in the SPDX Document. If one
> person has 4 different email addresses, it is not SPDX's role to discover
> that they are actually a single person.  And it's a given that with 330
> million U.S. persons, name collisions (multiple meatbags having the same
> first and last name) are inevitable.  SPDX doesn't care, it's just
> harvesting what the scanners see.
>
Partially agree. While we don't need to ensure that one person has exactly
one SPDX identity, we do want to enable identities to be re-used within an
SPDX document and, if desired, across SPDX documents. In some cases this is
just for efficiency (I have 10,000 components created by one organization, so
I want to define the organization once and re-use that identity to keep the
file compact), and in other cases it's so you can see which components in an
SBOM or group of SBOMs originated from a particular identity. Also, SPDX
isn't just about scanners. That's the world we live in today because SBOMs
are rarely produced/provided by the software producer and are instead
reverse-engineered by the consumer, but that's not the world we want to live
in.
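
To make the re-use case concrete, here is a rough sketch. The class and
field names (Identity, Component, originator_id) and the SPDXRef values are
made up for illustration and are not taken from the spec or the model under
discussion. The point is only that the organization is defined once and
referenced by ID, which also makes "what came from this identity?" a simple
lookup.

```python
from dataclasses import dataclass


# Hypothetical, simplified types for illustration only; not the SPDX model.
@dataclass(frozen=True)
class Identity:
    spdx_id: str
    name: str


@dataclass(frozen=True)
class Component:
    spdx_id: str
    name: str
    originator_id: str  # reference to an Identity by ID, not an inline copy


def components_from(identity, components):
    """Return the components whose originator is the given identity."""
    return [c for c in components if c.originator_id == identity.spdx_id]


# The organization is defined once...
acme = Identity("SPDXRef-Org-Acme", "Acme Corp")
other = Identity("SPDXRef-Org-Other", "Other Inc")

# ...and 10,000 components refer to it by ID.
components = [
    Component(f"SPDXRef-Pkg-{i}", f"pkg-{i}", acme.spdx_id) for i in range(10_000)
]
components.append(Component("SPDXRef-Pkg-x", "pkg-x", other.spdx_id))

acme_components = components_from(acme, components)
```

Nothing here requires knowing whether "Acme Corp" is the same real-world
entity as some other identity elsewhere; it only requires a stable way to
refer to the identity that was declared.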


> * With regard to inheritance, I'm in the database graph camp - the
> simplest way for SPDX to both perform its own functions and integrate into
> the larger world is to be as simple as possible - Person is a type all by
> itself, not a subclass of Identity which is in turn a subclass of Element.
> An SPDX document can list all the Person instances it sees, with whatever
> fields are interesting about a person - so far, just email address (primary
> key) and name.  Adding an SPDX-Ref or a comment or a creation date to
> Person is just cruft that makes the spec more complex with no benefit.  If
> it turns out that scanners can validate additional information about a
> person (such as public key certificates for that email address), then
> either the Person type can add a list of certificates (clumsy) or the
> Certificate type can have a subject email address field (easy).
>
Having one way to reference "things" within and across SPDX documents
(Document Namespace + SPDXID, which is effectively the main capability that
Element brings) establishes a "contract" that any element in SPDX
(identities, artifacts, defects, etc.) can use to refer to other elements
consistently. Having each entity define its own method of referencing
elements will lead to a lot of inconsistency and subsequent confusion. Your
point above, that we don't need to ensure that one person has one
identifier, also frees us from needing to identify the perfect primary key
for an identity. For example, in your scanner scenario, if we decide that
email address is the primary key, what would the scanner do if it knows the
name of an entity but not its email address? Email address is optional in
our model and not a good candidate for a primary key. On top of that, I'd
say most of us have more than one email address and aren't necessarily
consistent about which ones we use where.
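
A minimal sketch of what I mean, under stated assumptions: the class names,
the example namespace URL, and joining namespace and SPDXID with "#" are all
illustrative choices on my part, not a statement of what the spec will say.
It shows that a Person the scanner knows only by name is still referenceable,
because the reference does not depend on the email address.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified types for illustration only; not the SPDX model.
@dataclass(frozen=True)
class Element:
    document_namespace: str
    spdx_id: str

    @property
    def ref(self) -> str:
        # Assumption: namespace and SPDXID are joined with "#".
        return f"{self.document_namespace}#{self.spdx_id}"


@dataclass(frozen=True)
class Person(Element):
    name: str = ""
    email: Optional[str] = None  # optional, so it cannot serve as the key


# The scanner found a name but no email address.
jane = Person("https://example.org/spdx/doc-1", "SPDXRef-Person-1", name="Jane Doe")

# Any element can be indexed and looked up the same way, by its reference.
index = {jane.ref: jane}
found = index["https://example.org/spdx/doc-1#SPDXRef-Person-1"]
```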

>
> * I disagree that making everything an Element makes everything easier to
> integrate with other activities.  If SPDX and Application B want to talk
> about the same Person, it's far easier for them both to use email address
> as the common identifier than for Application B to learn everything about
> SPDX Elements and SPDXIDs.
>
I disagree with your disagreement :). The specification provides all
elements with certain capabilities, for example the ability to describe
relationships between elements, the ability to capture verification
information for an element, etc. If an entity doesn't inherit from Element,
it will need to duplicate those capabilities whenever it needs them. For
example, on the call there was the scenario of wanting to link a person to
an organization; this can be done simply with a relationship between the two
elements. If Person and Organization didn't inherit from Element, we would
have to model that link explicitly, which would increase the complexity of
the model to support a secondary use case.
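
Roughly, and again only as a sketch: the types below are simplified
stand-ins, and the "MEMBER_OF" relationship label is a hypothetical name I
picked for the example, not an existing SPDX relationship type. Because
Person and Organization are both Elements, one generic Relationship covers
the person-to-organization link without adding anything to either type.

```python
from dataclasses import dataclass


# Hypothetical, simplified types for illustration only; not the SPDX model.
@dataclass(frozen=True)
class Element:
    spdx_id: str


@dataclass(frozen=True)
class Person(Element):
    name: str


@dataclass(frozen=True)
class Organization(Element):
    name: str


@dataclass(frozen=True)
class Relationship:
    source: Element
    kind: str  # hypothetical relationship type label
    target: Element


def related(relationships, source, kind):
    """Return the targets related to `source` by relationships of `kind`."""
    return [r.target for r in relationships if r.source == source and r.kind == kind]


jane = Person("SPDXRef-Person-1", "Jane Doe")
acme = Organization("SPDXRef-Org-Acme", "Acme Corp")

# No organization field on Person; the link is just another relationship.
relationships = [Relationship(jane, "MEMBER_OF", acme)]
orgs = related(relationships, jane, "MEMBER_OF")
```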


> * Identity is not a base class of Person.  Identity is a composition of
> Person, Organization and Tool that does something (creates a piece of
> software, creates an SPDX document, creates an annotation).  An Element can
> reference an Identity and associate a creation date to the act of creating
> that Element.  The Identity can reference the Person, Organization, and
> Tool involved in the act of creation.
>
> Of course decisions will be made by working through specifics of use
> cases.  But I'm pretty confident that making Person (and Organization and
> Tool) independent types will make every use case easier to understand and
> integrate with.
>
> v/r,
> Dave
> 
>
>


View/Reply Online (#4043): https://lists.spdx.org/g/Spdx-tech/message/4043