Let me split this question into a few questions for the community:

  1.  Do we agree that Set, Bag, etc. are specialized forms of Collection?
  2.  Are all subclasses of Collection, now and forever, going to have the same 
specialized semantics or could some subclasses be sets and others be bags?
     *   If all the subclasses are the same now and we can't imagine any 
possible scenario where that won't be true or desired, then Collection should 
be the specialized form (Set or Bag or ...).
     *   If we know some will be different or we don't want to commit to this 
always being true, then Collection should be the generalized form (Collection).
  3.  Even if Collection is a specialized form of collection, is the general 
term Collection more approachable to the broader community (less technical 
readers and non-native English speakers)? The specification text would need to 
be specific either way.


Regards,

William Bartholomew (he/him)
Principal Security Strategist
Cybersecurity Policy - Digital Diplomacy

From: David Kemp <[email protected]>
Sent: Tuesday, November 23, 2021 5:59 PM
To: William Bartholomew (CELA) <[email protected]>
Cc: [email protected]
Subject: Re: [EXTERNAL] Re: [spdx-tech] ContextualCollection and CONTAINS 
Relationship

William,

I agree that:

  *   Collection is a grouping of Elements
  *   Package is a grouping of artifacts
  *   Contains describes one of two physical relationships between artifacts 
(the other is "references")
Elements are metadata about artifacts.  Artifacts are data, and data can 
contain or reference other data.  (A paper can have references, an HTML file 
can have links.)

The question raised today is: Is a Collection a Set or a Bag 
(https://en.wikipedia.org/wiki/Multiset)?
  In other words, does a Collection of Elements count the number of copies of a 
file that exist in an artifact, or just the fact that the file exists?  The 
members of a grouping can be ordered or unordered, and unique or non-unique.  
I'm assuming a Collection is unordered, but unordered Collection members are 
either unique (Set) or non-unique (Bag).  I'm also assuming that Collection of 
Elements is unique - the Collection is a Set.  Is that correct?
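
To make the Set / Bag distinction concrete, here is a minimal Python sketch. 
The element identifiers are hypothetical, and Python's built-in set and 
collections.Counter merely stand in for the two semantics; nothing here is 
taken from the SPDX model itself.

```python
from collections import Counter

# Hypothetical element identifiers; "file-a" occurs twice in the artifact.
observed = ["file-a", "file-b", "file-a"]

as_set = set(observed)      # unique members: records only that a file exists
as_bag = Counter(observed)  # non-unique members: records how many copies exist


def member_count(grouping):
    """Total number of members, counting duplicates where the grouping keeps them."""
    if isinstance(grouping, Counter):
        return sum(grouping.values())
    return len(grouping)
```

Under Set semantics the duplicate copy of "file-a" is not recorded; under Bag 
semantics it is, so the two groupings report different member counts for the 
same input.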

Then you get the benefits of grouping of elements (being able to refer to a set 
of elements so you can re-use them) but you avoid the multiple methods of 
describing artifacts contained within another artifact.

That is one use case.  Another use case is an anonymous grouping of elements 
that can't be referred to or re-used.  That is the "ferry" example from 
physical artifacts - the cars on a ferry are an ephemeral grouping, once they 
leave the ferry they are no longer a grouping that can be referred to.  That is 
also the non-Collection example - a Collection of Elements (not artifacts) can 
be referred to, but a non-Collection of Elements is an anonymous ephemeral 
grouping of Elements that exists only in the serialized data containing that 
grouping.

There are two reasons for non-Collection groupings:
1) Applications that need N random Elements to fully perform their function 
don't need an artificial N+1th element - the grouping is meaningless and 
doesn't need to be referred to or re-used, it's used only to import N Elements 
into the Application.
2) Gödel's incompleteness theorem 
(https://plato.stanford.edu/entries/goedel-incompleteness/)
 talks about formal systems and proofs within them, but an analogy would treat 
the Universe of Elements as a system, in which case that Universe cannot be 
described as an Element, because as soon as you did so, the Universe would now 
have N+1 Elements, a Collection of N+1 Elements would become a Universe of N+2, 
and so on.  The use case is again to serialize N Elements and wind up with the 
same N Elements after deserialization.

Persistent groupings (Collections) are absolutely a requirement.  Ephemeral 
groupings (non-Collections / Bundles / Sets) are also a requirement.  Both are 
supported in the information model, and as I noted, a Bundle / Set is not a 
"non-contextual Collection" because it is not a Collection at all.  The N+1th 
Element does not exist in a Bundle / Set.
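
A rough sketch of the difference, in Python. The Element and Collection 
classes, their fields, and the JSON layout are all assumptions made for 
illustration, not the SPDX serialization: an ephemeral grouping round-trips 
exactly N Elements, while a persistent Collection is itself an additional 
Element.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Element:
    id: str
    name: str


@dataclass(frozen=True)
class Collection(Element):
    # Persistent grouping: itself an Element, referencing members by id.
    members: tuple = ()


def serialize_bundle(elements):
    """Ephemeral grouping: the payload carries the N Elements and nothing else."""
    return json.dumps([asdict(e) for e in elements])


def deserialize_bundle(payload):
    """Rebuild the same N Elements; no grouping Element is created."""
    return [Element(**fields) for fields in json.loads(payload)]


elements = [Element("e1", "FileA"), Element("e2", "FileB")]
round_tripped = deserialize_bundle(serialize_bundle(elements))

# Describing the grouping as a Collection adds an N+1th Element.
collection = Collection("c1", "Grouping", members=tuple(e.id for e in elements))
with_collection = elements + [collection]
```

The bundle leaves the Element count unchanged after deserialization, whereas 
naming the grouping as a Collection grows the set of Elements by one.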

Dave

P.S.: I agree with you and Sean that composition 
(https://www.uml-diagrams.org/composition.html)
 is the wrong relationship between a logical Collection and its members, 
because the members don't existentially depend on the Collection.  Normal 
association (filled arrow) is the appropriate relationship in the logical 
model.  This reinforces that while an Artifact (data) can contain other 
Artifacts (data), a Collection Element describing a grouping of Artifacts does 
not "contain" other Elements.  Destruction of the Collection does not destroy 
the Elements it references.
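
As a small illustration of association rather than composition, the following 
Python sketch keeps Elements and Collections in separate stores, with a 
Collection holding only the ids of its members. The store layout and the 
delete_collection helper are hypothetical.

```python
# Hypothetical stores: Elements live independently of any Collection.
elements = {
    "file-a": {"name": "FileA"},
    "file-b": {"name": "FileB"},
}
collections = {
    "coll-1": {"members": ["file-a", "file-b"]},  # references, not ownership
}


def delete_collection(collections, collection_id):
    """Remove the Collection only; the Elements it references are untouched."""
    collections.pop(collection_id)


delete_collection(collections, "coll-1")
```

After the call, the Collection is gone but both Elements remain, which is the 
behaviour composition would not give you.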

On Tue, Nov 23, 2021 at 11:56 AM William Bartholomew (CELA) via 
lists.spdx.org <[email protected]> wrote:

The "ah ha" moment for me out of the last meeting was that ContextualCollection 
and Package were trying to do double duty, representing both a grouping of 
elements (metadata about artifacts) and describing the artifacts contained 
within another artifact. This also overlapped with the purpose of the CONTAINS 
relationship which is used to describe the artifacts contained within another 
artifact.

If we split these purposes and say that:

  1.  ContextualCollection is a grouping of elements
  2.  Package is a grouping of artifacts
  3.  CONTAINS relationship is the only method to describe the artifacts 
contained within another artifact

Then you get the benefits of grouping of elements (being able to refer to a set 
of elements so you can re-use them) but you avoid the multiple methods of 
describing artifacts contained within another artifact.

A couple of examples:


  *   These are logically equivalent:

     *   PackageA (artifact) CONTAINS (relationship) FileA (artifact) and FileB 
(artifact)
     *   PackageA (artifact) CONTAINS (relationship) PackageAContents 
(contextualcollection) which includes FileA (artifact) and FileB (artifact)

  *   So are these:

     *   PackageA (artifact) DEPENDS_ON (relationship) PackageB (artifact) and 
PackageC (artifact)
     *   PackageA (artifact) DEPENDS_ON (relationship) PackageADependencies 
(contextualcollection) which includes PackageB (artifact) and PackageC 
(artifact)

Another way of thinking about it is that ContextualCollection has meaning 
inside the SPDX realm whereas Relationships have meaning in the "real world".
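
The equivalence in the examples above can be sketched in Python as follows. 
The tuple layout for a relationship, the collections dictionary, and the 
helper names are assumptions for illustration only; the sketch also assumes 
Set semantics for targets and expands only one level of collection.

```python
# Hypothetical model: a relationship is (source, type, targets), and a
# ContextualCollection is a named grouping of element ids.
collections = {
    "PackageAContents": ["FileA", "FileB"],
}


def expand_targets(targets, collections):
    """Replace any collection id in targets with the ids of its members."""
    expanded = []
    for target in targets:
        expanded.extend(collections.get(target, [target]))
    return expanded


def normalize(relationship, collections):
    """Return the relationship with collections expanded and targets as a set."""
    source, rel_type, targets = relationship
    return (source, rel_type, frozenset(expand_targets(targets, collections)))


direct = ("PackageA", "CONTAINS", ["FileA", "FileB"])
via_collection = ("PackageA", "CONTAINS", ["PackageAContents"])
```

Normalizing both forms yields the same relationship, which is what "logically 
equivalent" means here: the collection only names the grouping, it does not 
change what PackageA contains.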

Regards,

William Bartholomew (he/him)
Principal Security Strategist
Cybersecurity Policy - Digital Diplomacy


View/Reply Online (#4267): https://lists.spdx.org/g/Spdx-tech/message/4267