I would like to get some feedback from the community on some changes I'm
making to the SPDX Java tools related to the hasFiles property in JSON and
the CONTAINS relationship.

 

If you're a user of the SPDX Java tools, please review the following since
it may introduce an incompatibility with prior versions.

 

If you're an implementer of tools that read or write SPDX, you may also want
to review this and let us know if you agree with the approach.

 

If you're working on the SPDX 3.0 spec, you may find this issue relevant to
some upcoming topics related to serialization/deserialization.

 

I'd like to get feedback over the next week or two before I update the tools
with the changes.

 

Problem statement: The SPDX Java tools are currently representing the
relationships between the Package and the files contained in the Package in
two possibly inconsistent ways - using a hasFile property and using the
CONTAINS relationship between the Package and the File.  This could lead to
inconsistent results depending on how the SPDX file was serialized.

 

Current state of the SPDX Spec:

*       The relationship CONTAINS is documented and can be used to describe
a package CONTAINing a file in all supported serialization formats
*       Section 5.2.3
<https://spdx.github.io/spdx-spec/composition-of-an-SPDX-document/#523-file-information-section>
describes how the position of file and package declarations is used to
denote which files belong to which package
*       Section 5.2.3
<https://spdx.github.io/spdx-spec/composition-of-an-SPDX-document/#523-file-information-section>
states "When implementing file information in RDF, the spdx:hasFile
property is used to associate the package with the file."
*       The RDF OWL property hasFile is defined as "Indicates that a
particular file belongs to a package."
*       The RDF OWL documentation for the CONTAINS relationship includes the
comment "A Relationship of relationshipType_contains expresses that an
SPDXElement contains the relatedSPDXElement. For example, a Package contains
a File. (relationshipType_contains introduced in SPDX 2.0 deprecates
property 'hasFile' from SPDX 1.2)"

*       Note that the comment in parentheses is inconsistent with the
hasFile documentation in the OWL document (where hasFile is not marked as
deprecated) and also inconsistent with section 5.2.3

*       The JSON Schema file defines a hasFiles property with the same
definition as the RDF hasFile property

 

Current state of the Tools-Java version 1.0.3:

*       The Model object SpdxPackage has a property "files" which is a
collection based on a hasFile property in the underlying object store.
*       When deserializing Tag/Value, JSON, YAML, XML, and spreadsheets, the
tools store any files contained by a package as a hasFile property in the
underlying store and not as a CONTAINS relationship
*       If a document states an explicit CONTAINS relationship between a
package and a file, it is stored as a relationship (possibly duplicating
information in hasFile)

 

I would assert that a Package with a File listed in the hasFiles property is
semantically the same as a Package that has a CONTAINS relationship with
that File.  Because the tools store the two forms separately, this leads to
the inconsistency described in the problem statement.
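
To make the inconsistency concrete, here is a minimal Python sketch.  It is
not code from the Java tools; the document fragment and the two helper
functions are hypothetical, and the fragment is deliberately incomplete.  It
only shows that a reader looking at hasFiles and a reader looking at CONTAINS
relationships can get different answers for the same package.

```python
# Hypothetical, simplified SPDX-like JSON fragment (not a complete document).
doc = {
    "packages": [
        {"SPDXID": "SPDXRef-Package", "hasFiles": ["SPDXRef-FileA"]}
    ],
    "relationships": [
        {
            "spdxElementId": "SPDXRef-Package",
            "relationshipType": "CONTAINS",
            "relatedSpdxElement": "SPDXRef-FileB",
        }
    ],
}


def files_via_has_files(doc, package_id):
    """Files found by a reader that only looks at the hasFiles property."""
    for package in doc.get("packages", []):
        if package["SPDXID"] == package_id:
            return set(package.get("hasFiles", []))
    return set()


def files_via_contains(doc, package_id):
    """Files found by a reader that only looks at CONTAINS relationships."""
    return {
        rel["relatedSpdxElement"]
        for rel in doc.get("relationships", [])
        if rel["spdxElementId"] == package_id
        and rel["relationshipType"] == "CONTAINS"
    }


by_property = files_via_has_files(doc, "SPDXRef-Package")
by_relationship = files_via_contains(doc, "SPDXRef-Package")
print(by_property == by_relationship)  # False: the two views disagree
```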

 

There are 3 alternatives I've looked at to resolve the inconsistency:

A.      Leave the tools as is and live with the inconsistency.
B.      Translate all CONTAINS relationships to a hasFiles property in the
model store when deserializing.
C.      Translate all hasFiles properties into CONTAINS relationships when
deserializing, and translate them back to the hasFiles property when
serializing to the JSON/YAML/XML formats (not to the Tag/Value or RDF
formats)
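
As a rough illustration of alternative C for the JSON format, the following
Python sketch folds hasFiles into CONTAINS relationships on the way in and
turns package-to-file CONTAINS relationships back into hasFiles on the way
out.  It is a sketch under assumptions, not the Java tools implementation:
the in-memory "model" dictionary and the function names are hypothetical, and
only the fields needed for the example are handled.

```python
CONTAINS = "CONTAINS"


def deserialize(doc):
    """Build a toy model in which hasFiles entries become CONTAINS relationships."""
    model = {
        "packages": [p["SPDXID"] for p in doc.get("packages", [])],
        "files": {f["SPDXID"] for f in doc.get("files", [])},
        "relationships": [],  # (source_id, type, target_id) tuples
    }

    def add(rel):
        # Skip relationships that are already present.
        if rel not in model["relationships"]:
            model["relationships"].append(rel)

    for rel in doc.get("relationships", []):
        add((rel["spdxElementId"], rel["relationshipType"], rel["relatedSpdxElement"]))
    for package in doc.get("packages", []):
        for file_id in package.get("hasFiles", []):
            add((package["SPDXID"], CONTAINS, file_id))
    return model


def serialize(model):
    """Write package-to-file CONTAINS as hasFiles; keep everything else as is."""
    packages = {pid: {"SPDXID": pid} for pid in model["packages"]}
    relationships = []
    for source, rel_type, target in model["relationships"]:
        if rel_type == CONTAINS and source in packages and target in model["files"]:
            packages[source].setdefault("hasFiles", []).append(target)
        else:
            relationships.append(
                {
                    "spdxElementId": source,
                    "relationshipType": rel_type,
                    "relatedSpdxElement": target,
                }
            )
    return {
        "packages": list(packages.values()),
        "files": [{"SPDXID": fid} for fid in sorted(model["files"])],
        "relationships": relationships,
    }


doc = {
    "packages": [{"SPDXID": "SPDXRef-Package", "hasFiles": ["SPDXRef-FileA"]}],
    "files": [{"SPDXID": "SPDXRef-FileA"}, {"SPDXID": "SPDXRef-FileB"}],
    "relationships": [
        {"spdxElementId": "SPDXRef-Package", "relationshipType": "CONTAINS",
         "relatedSpdxElement": "SPDXRef-FileB"},
        {"spdxElementId": "SPDXRef-Package", "relationshipType": "CONTAINS",
         "relatedSpdxElement": "SPDXRef-FileA"},
        {"spdxElementId": "SPDXRef-DOCUMENT", "relationshipType": "DESCRIBES",
         "relatedSpdxElement": "SPDXRef-Package"},
    ],
}
model = deserialize(doc)
out = serialize(model)
```

In this example the input states the Package/FileA containment twice (once
in hasFiles, once as a relationship); the model holds it once, and the output
lists both files under hasFiles with only the DESCRIBES relationship left in
the relationships list.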

 

I've taken approach C in large part because of the SPDX 3.0 discussions,
where we plan to allow more compact serializations and convert them to
Relationships when deserializing.  If nothing else, this would be a good
experiment to see how this approach works in practice.

 

Approach C has the following implications for the Java tools:

*       Runtime model:

*       In the runtime model, any addition to the files collection for a
package will automatically create a CONTAINS relationship between the
package and the file
*       In the runtime model, any modification to the CONTAINS relationships
between a package and file will be reflected in the files collection
*       There is no longer any possibility of duplication or inconsistencies
between the CONTAINS relationship and the files collection for a package.

*       Tag/Value:

*       When deserializing, a CONTAINS relationship between the package and
the file will be created based on the position of the files and packages per
the spec

*       A check will be made to make sure we don't add any duplicate
CONTAINS relationships

*       Serialized files will always include the CONTAINS relationships in
addition to maintaining the proper relative positions of the packages and
files

*       Note: I could remove these relationships in the serialization since
they are redundant with the position; however, I personally think the
resultant tag/value is clearer with the additional relationships.
Feedback is welcome on this point.

*       JSON/XML/YAML:

*       When deserializing, a CONTAINS relationship between the package and
the file will be created for every element of the hasFiles list.
*       Serialized files will always use the hasFiles property for any
CONTAINS relationship between a package and a file, and will not also list
those CONTAINS relationships.

*       RDF/XML:

*       When deserializing, a CONTAINS relationship between the package and
the file will be created for every <Package,hasFile,File> triple
*       When serializing, the CONTAINS relationships will be serialized as
relationships and not translated back to hasFile.

*       Note: I'm quite interested in feedback on whether this translation
to a Relationship makes it harder for semantic reasoners or other
implementations using RDF.
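
The runtime model and Tag/Value behavior described above can be sketched
roughly as follows.  This is a Python illustration under assumptions, not
the Java tools code: the Model class, its method names, and the very small
parser are hypothetical, and the parser only understands the PackageName,
FileName, SPDXID and Relationship tags.  The point is that the files
collection is computed from CONTAINS relationships, so the two cannot drift
apart, and that a relationship implied by position is not added twice when
it is also stated explicitly.

```python
class Model:
    """Toy store in which CONTAINS relationships are the only source of truth."""

    def __init__(self):
        self.file_ids = set()
        self.relationships = []  # (source_id, type, target_id) tuples

    def add_relationship(self, source, rel_type, target):
        rel = (source, rel_type, target)
        if rel not in self.relationships:  # avoid duplicate relationships
            self.relationships.append(rel)

    def add_file(self, package_id, file_id):
        """Adding a file to a package creates a CONTAINS relationship."""
        self.file_ids.add(file_id)
        self.add_relationship(package_id, "CONTAINS", file_id)

    def files(self, package_id):
        """The files collection is a view over the CONTAINS relationships."""
        return [
            target
            for source, rel_type, target in self.relationships
            if source == package_id
            and rel_type == "CONTAINS"
            and target in self.file_ids
        ]


def parse_tag_value(text):
    """Assign each file to the most recently declared package, by position."""
    model = Model()
    current_package = None
    pending = None  # kind of element still waiting for its SPDXID
    for line in text.splitlines():
        tag, sep, value = line.partition(":")
        if not sep:
            continue
        tag, value = tag.strip(), value.strip()
        if tag == "PackageName":
            pending = "package"
        elif tag == "FileName":
            pending = "file"
        elif tag == "SPDXID" and pending == "package":
            current_package, pending = value, None
        elif tag == "SPDXID" and pending == "file":
            model.file_ids.add(value)
            if current_package is not None:
                model.add_file(current_package, value)
            pending = None
        elif tag == "Relationship":
            source, rel_type, target = value.split()
            model.add_relationship(source, rel_type, target)
    return model


SAMPLE = """
SPDXID: SPDXRef-DOCUMENT
PackageName: example
SPDXID: SPDXRef-Package
FileName: ./src/a.c
SPDXID: SPDXRef-FileA
FileName: ./src/b.c
SPDXID: SPDXRef-FileB
Relationship: SPDXRef-Package CONTAINS SPDXRef-FileA
"""
model = parse_tag_value(SAMPLE)
print(model.files("SPDXRef-Package"))
```

Here the explicit Relationship line repeats what the position of FileA
already implies, so the model ends up with two CONTAINS relationships, not
three.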

 

Thanks for reading through all this!  Let me know any concerns, thoughts,
questions.

 

Gary

 

-------------------------------------------------

Gary O'Neall

Principal Consultant

Source Auditor Inc.

Mobile: 408.805.0586

Email:  <mailto:[email protected]> [email protected]


 



View/Reply Online (#4285): https://lists.spdx.org/g/Spdx-tech/message/4285