> Today's discussion of the Identity model was very interesting. A few
> thoughts:
>
> * Entity resolution (e.g.,
> https://www.ibm.com/docs/en/iii/9.0.0?topic=insight-entity-resolution) is
> an enormously complex process. Casinos have a business interest in knowing
> that a person is not acting under multiple identities for fraudulent
> purposes; linking voter registration rolls to strongly identity-proofed
> RealID drivers licenses is another example.

Agree, this would be a non-goal for SPDX.

> * SPDX has no use case (that I'm aware of) for doing any kind of entity
> resolution - ensuring that the same person is not using more than one SPDX
> Identity. Artifacts are scanned for whatever person identifiers (name and
> email) are present, and the results recorded in the SPDX Document. If one
> person has 4 different email addresses, it is not SPDX's role to discover
> that they are actually a single person. And it's a given that with 330
> million U.S. persons, name collisions (multiple meatbags having the same
> first and last name) are inevitable. SPDX doesn't care, it's just
> harvesting what the scanners see.

Partially agree. While we don't need to guarantee that one person has exactly one SPDX identity, we do want to enable identities to be re-used within an SPDX document and, if desired, across SPDX documents. In some cases this is just for efficiency (I have 10,000 components created by one organization, so I want to define the organization once and re-use that identity to keep the file compact); in other cases it's so you can see which components in an SBOM or group of SBOMs originated from a particular identity. Also, SPDX isn't just about scanners. That's the world we live in today, because SBOMs are rarely produced or provided by the software producer and are instead reverse-engineered by the consumer, but that's not the world we want to live in.

> * With regard to inheritance, I'm in the database graph camp - the
> simplest way for SPDX to both perform its own functions and integrate into
> the larger world is to be as simple as possible - Person is a type all by
> itself, not a subclass of Identity which is in turn a subclass of Element.
> An SPDX document can list all the Person instances it sees, with whatever
> fields are interesting about a person - so far, just email address (primary
> key) and name. Adding an SPDX-Ref or a comment or a creation date to
> Person is just cruft that makes the spec more complex with no benefit.
> If it turns out that scanners can validate additional information about a
> person (such as public key certificates for that email address), then
> either the Person type can add a list of certificates (clumsy) or the
> Certificate type can have a subject email address field (easy).

Having one way to reference "things" within and across SPDX documents (Document Namespace + SPDXID, which is effectively the main capability that Element brings) establishes a "contract" that any element in SPDX (identities, artifacts, defects, etc.) can use to refer to other elements consistently. Having each entity define its own method of referencing elements will lead to a lot of inconsistency and subsequent confusion. Your point above, that we don't need to ensure one person has one identifier, also frees us from needing to find the perfect primary key for an identity. For example, in your scanner scenario, if we decide that email address is the primary key, what would the scanner do if it knows the name of an entity but not its email address? Email address is optional in our model and not a good candidate for a primary key, not to mention that most of us have more than one email address and aren't necessarily consistent about which ones we use where.

> * I disagree that making everything an Element makes everything easier to
> integrate with other activities. If SPDX and Application B want to talk
> about the same Person, it's far easier for them both to use email address
> as the common identifier than for Application B to learn everything about
> SPDX Elements and SPDXIDs.

I disagree with your disagreement :). The specification provides all elements with certain capabilities, for example the ability to describe relationships between elements, the ability to capture verification information for an element, etc. If each entity doesn't inherit from Element, then it will need to duplicate those capabilities if it needs them.
For example, in the call there was the scenario of wanting to link a person to an organization; this can be done simply with a relationship between the two elements. If Person and Organization didn't inherit from Element, we would have to model that link explicitly, which would increase the complexity of the model to support a secondary use case.

> * Identity is not a base class of Person. Identity is a composition of
> Person, Organization and Tool that does something (creates a piece of
> software, creates an SPDX document, creates an annotation). An Element can
> reference an Identity and associate a creation date to the act of creating
> that Element. The Identity can reference the Person, Organization, and
> Tool involved in the act of creation.
>
> Of course decisions will be made by working through specifics of use
> cases. But I'm pretty confident that making Person (and Organization and
> Tool) independent types will make every use case easier to understand and
> integrate with.
>
> v/r,
> Dave
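
To make the Element argument a bit more concrete, here is a rough Python sketch of what I have in mind. It is only an illustration, not anything taken from the spec: the class layout, the `ref` property (namespace and SPDXID joined with `#`), the `MEMBER_OF` relationship kind and the example.org namespace are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """Hypothetical base type: anything addressable by namespace + SPDXID."""
    namespace: str
    spdx_id: str

    @property
    def ref(self) -> str:
        # Assumed reference form for this sketch: "<namespace>#<SPDXID>".
        return f"{self.namespace}#{self.spdx_id}"


@dataclass(frozen=True)
class Person(Element):
    name: str = ""
    email: Optional[str] = None  # optional, so it cannot act as a primary key


@dataclass(frozen=True)
class Organization(Element):
    name: str = ""


@dataclass(frozen=True)
class Relationship(Element):
    source: str = ""  # ref of the element the relationship starts from
    kind: str = ""    # hypothetical relationship type name
    target: str = ""  # ref of the element the relationship points to


ns = "https://example.org/spdx/doc-1"  # made-up document namespace

# Define the organization once, then re-use its ref everywhere it is needed.
acme = Organization(ns, "SPDXRef-Org-Acme", name="Acme")

# A person whose email is unknown is still referenceable.
alice = Person(ns, "SPDXRef-Person-Alice", name="Alice")

# Linking the two needs no Person- or Organization-specific modeling.
link = Relationship(ns, "SPDXRef-Rel-1",
                    source=alice.ref, kind="MEMBER_OF", target=acme.ref)

# Many components can point at the same identity by reference.
components = [{"name": f"pkg-{i}", "supplier": acme.ref} for i in range(3)]
```

The point of the sketch is that Person never needs an email address to be referenceable, and that the person-to-organization link and the "define once, re-use many times" case both fall out of the shared reference mechanism instead of needing their own modeling.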
