SPDX 2.0 database schema

Thomas T Gurney Fri, 22 May 2015 19:03:31 -0700

Hey all,

Tom Gurney here, an undergrad student in the open source research lab at the
University of Nebraska Omaha.


I have been digging into SPDX 2.0 since its official release. In trying to
build a relational database that will store SPDX 2.0 documents, I've realized
it's a lot tougher to store 2.0 data in a relational form than 1.2 data. A
_lot_ tougher. (At least, from my limited perspective, it is.)

Here's my attempt at a schema (beware, I threw it together in an evening):
https://github.com/ttgurney/spdx2.0-schema/blob/master/spdx2_schema.sql
It's essentially SQL pseudocode: no actual DBMS will accept it as-is, but the
structure should make sense.

So here's what has thrown me for a loop and led to some odd design choices:

- SPDX identifiers that can be associated with a file, document or package, but
  must be unique within a document
- An SPDX document can describe files that are not part of any package (and
  it can contain multiple packages too? Not sure I'm reading the spec right)
- Relationships between identifiers
- License expression syntax (I don't see a way to sensibly accommodate this
  in a relational DB)
- Multiple checksum types supported (I stuck to just SHA1 for the above schema)
- What can we say about a file from its checksum? If two files have the same
  checksum, can we say that they are the same file in every aspect, and thereby
  carry with them all the same SPDX metadata, regardless of what package each
  is in? I'm not sure.
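For discussion, here is a minimal sketch of one way the first few points could be modelled. It uses Python's built-in sqlite3 module so that it actually runs. All table and column names are hypothetical: they are not taken from the spec or from the schema linked above. The sketch also makes several simplifying assumptions:

- Every document, package and file gets a row in one shared "elements" table, so a composite UNIQUE (document_id, spdx_id) constraint enforces per-document uniqueness.
- Package membership is an optional link table, so a file with no link is simply not in any package.
- Checksums are keyed by (element, algorithm).
- Relationships only point at elements stored in this database. References into external documents are not handled.
- License expressions are kept as raw text rather than parsed.

```python
import sqlite3

# Hypothetical, simplified schema for discussion only. Names are illustrative,
# not taken from the SPDX 2.0 spec.
SCHEMA = """
CREATE TABLE documents (
    document_id INTEGER PRIMARY KEY,
    namespace   TEXT NOT NULL UNIQUE
);
-- Documents, packages and files are all "elements"; the composite UNIQUE
-- constraint makes SPDX identifiers unique within (not across) documents.
CREATE TABLE elements (
    element_id   INTEGER PRIMARY KEY,
    document_id  INTEGER NOT NULL REFERENCES documents(document_id),
    spdx_id      TEXT NOT NULL,
    element_type TEXT NOT NULL CHECK (element_type IN ('DOCUMENT', 'PACKAGE', 'FILE')),
    UNIQUE (document_id, spdx_id)
);
-- License expressions are stored as raw text, not parsed.
CREATE TABLE packages (
    element_id       INTEGER PRIMARY KEY REFERENCES elements(element_id),
    name             TEXT NOT NULL,
    declared_license TEXT
);
CREATE TABLE files (
    element_id        INTEGER PRIMARY KEY REFERENCES elements(element_id),
    file_name         TEXT NOT NULL,
    concluded_license TEXT
);
-- Optional membership: a file with no row here belongs to no package.
CREATE TABLE package_files (
    package_id INTEGER NOT NULL REFERENCES packages(element_id),
    file_id    INTEGER NOT NULL REFERENCES files(element_id),
    PRIMARY KEY (package_id, file_id)
);
-- Any number of checksum algorithms per element, not just SHA1.
CREATE TABLE checksums (
    element_id INTEGER NOT NULL REFERENCES elements(element_id),
    algorithm  TEXT NOT NULL,
    value      TEXT NOT NULL,
    PRIMARY KEY (element_id, algorithm)
);
-- Directed, typed relationships between elements stored in this database.
CREATE TABLE relationships (
    from_element INTEGER NOT NULL REFERENCES elements(element_id),
    to_element   INTEGER NOT NULL REFERENCES elements(element_id),
    rel_type     TEXT NOT NULL,
    PRIMARY KEY (from_element, to_element, rel_type)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database holding one small example document."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO documents VALUES (1, 'http://example.org/doc1')")
    conn.executemany("INSERT INTO elements VALUES (?, ?, ?, ?)", [
        (1, 1, "SPDXRef-DOCUMENT", "DOCUMENT"),
        (2, 1, "SPDXRef-Package", "PACKAGE"),
        (3, 1, "SPDXRef-File1", "FILE"),
        (4, 1, "SPDXRef-LooseFile", "FILE"),
    ])
    conn.execute("INSERT INTO packages VALUES (2, 'example', 'MIT OR Apache-2.0')")
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", [
        (3, "./src/a.c", "MIT"),
        (4, "./README", "NOASSERTION"),
    ])
    conn.execute("INSERT INTO package_files VALUES (2, 3)")  # element 4 stays loose
    # Placeholder digests, not real file hashes.
    conn.executemany("INSERT INTO checksums VALUES (?, ?, ?)", [
        (3, "SHA1", "0" * 40),
        (3, "SHA256", "0" * 64),
    ])
    conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", [
        (1, 2, "DESCRIBES"),
        (1, 4, "DESCRIBES"),
    ])
    conn.commit()
    return conn


def files_outside_packages(conn: sqlite3.Connection) -> list[str]:
    """Names of files that are not linked to any package."""
    rows = conn.execute(
        "SELECT f.file_name FROM files AS f WHERE NOT EXISTS ("
        " SELECT 1 FROM package_files AS pf WHERE pf.file_id = f.element_id)"
        " ORDER BY f.file_name"
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = build_demo()
    print(files_outside_packages(db))  # ['./README']
```

Keying checksums per element, rather than making the checksum the file's identity, sidesteps the last question for now. Two files with identical SHA1 values can still carry different metadata in different documents, and you can join on checksums.value when you do want to treat them as the same content.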

Has anyone run into similar difficulties? Ideas on how to overcome them? Or
is the idea of using a relational database to store this type of data
absolutely silly? Many thanks in advance.

Tom
_______________________________________________
Spdx-tech mailing list
https://lists.spdx.org/mailman/listinfo/spdx-tech
