Hey all, Tom Gurney here, an undergrad student from the open source research lab at the University of Nebraska Omaha.
I have been digging into SPDX 2.0 since its official release. In trying to build a relational database that will store SPDX 2.0 documents, I've realized it's a lot tougher to store 2.0 data in a relational form than 1.2 data. A _lot_ tougher. (At least, from my limited perspective, it is.)

Here's my attempt at a schema (beware, I threw it together in an evening):

https://github.com/ttgurney/spdx2.0-schema/blob/master/spdx2_schema.sql

It's like SQL pseudocode in that no actual DBMS will accept it, but it should make sense.

So here's what's thrown me for a loop, and resulted in some odd design choices:

- SPDX identifiers that can be associated with a file, document or package, but must be unique within a document
- An SPDX document can describe files that are not part of any package (and it can contain multiple packages too? Not sure I'm reading the spec right)
- Relationships between identifiers
- License expression syntax (I don't see a way to sensibly accommodate this in a relational DB)
- Multiple checksum types supported (I stuck to just SHA1 for the above schema)
- What can we say about a file from its checksum? If two files have the same checksum, can we say that they are the same file in every aspect, and thereby carry with them all the same SPDX metadata, regardless of what package each is in? I'm not sure.

Has anyone run into similar difficulties? Ideas on how to overcome them? Or is the idea of using a relational database to store this type of data absolutely silly?

Many thanks in advance.

Tom
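
P.S. To make the first three points concrete, here is a rough sketch of one possible approach, using Python's sqlite3 module so that it actually runs. It keeps every SPDX identifier in a single table with a uniqueness constraint scoped to the document, and stores relationships as typed edges between identifier rows. None of the table, column, or function names come from the schema linked above; they are made up for illustration. It does not attempt license expressions or checksums at all.

```python
import sqlite3

# Hypothetical table and column names, chosen only for this sketch.
SCHEMA = """
CREATE TABLE document (
    document_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);

-- One row per SPDX identifier, whatever kind of element it names.
CREATE TABLE identifier (
    identifier_id INTEGER PRIMARY KEY,
    document_id   INTEGER NOT NULL REFERENCES document (document_id),
    spdx_id       TEXT NOT NULL,
    element_type  TEXT NOT NULL
                  CHECK (element_type IN ('document', 'package', 'file')),
    UNIQUE (document_id, spdx_id)  -- unique within a document only
);

-- A relationship is a typed edge between two identifier rows.
CREATE TABLE relationship (
    relationship_id INTEGER PRIMARY KEY,
    left_id  INTEGER NOT NULL REFERENCES identifier (identifier_id),
    rel_type TEXT NOT NULL,
    right_id INTEGER NOT NULL REFERENCES identifier (identifier_id)
);
"""

# Files that no package row points at with a CONTAINS relationship.
LOOSE_FILES = """
SELECT f.spdx_id
FROM identifier AS f
WHERE f.document_id = ?
  AND f.element_type = 'file'
  AND NOT EXISTS (
      SELECT 1
      FROM relationship AS r
      JOIN identifier AS p ON p.identifier_id = r.left_id
      WHERE r.right_id = f.identifier_id
        AND r.rel_type = 'CONTAINS'
        AND p.element_type = 'package'
  )
ORDER BY f.spdx_id
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, name):
    cur = conn.execute("INSERT INTO document (name) VALUES (?)", (name,))
    return cur.lastrowid


def add_identifier(conn, document_id, spdx_id, element_type):
    cur = conn.execute(
        "INSERT INTO identifier (document_id, spdx_id, element_type) "
        "VALUES (?, ?, ?)",
        (document_id, spdx_id, element_type),
    )
    return cur.lastrowid


def relate(conn, left_id, rel_type, right_id):
    conn.execute(
        "INSERT INTO relationship (left_id, rel_type, right_id) VALUES (?, ?, ?)",
        (left_id, rel_type, right_id),
    )


def files_outside_packages(conn, document_id):
    rows = conn.execute(LOOSE_FILES, (document_id,)).fetchall()
    return [row[0] for row in rows]


def build_demo():
    """One document, one package with one file, plus one loose file."""
    conn = open_db()
    doc = add_document(conn, "example")
    root = add_identifier(conn, doc, "SPDXRef-DOCUMENT", "document")
    pkg = add_identifier(conn, doc, "SPDXRef-Package", "package")
    in_pkg = add_identifier(conn, doc, "SPDXRef-File1", "file")
    loose = add_identifier(conn, doc, "SPDXRef-LooseFile", "file")
    relate(conn, root, "DESCRIBES", pkg)
    relate(conn, pkg, "CONTAINS", in_pkg)
    relate(conn, root, "DESCRIBES", loose)
    return conn, doc
```

With the demo data from `build_demo()`, `files_outside_packages(conn, doc)` returns only `SPDXRef-LooseFile`, and trying to add a second `SPDXRef-File1` to the same document raises `sqlite3.IntegrityError`, while the same identifier in a different document is accepted. Whether a single identifier table is the right trade-off is exactly the kind of thing I'm unsure about.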
