Try here: https://github.com/ttgurney/spdx2.0-schema
I have an ER diagram, but it's on a whiteboard :) Getting a proper one put
together is next on my list as far as documentation is concerned.

Tom

On Thu, May 28, 2015 at 02:07:19PM +0000, Manbeck, Jack wrote:
> Tom,
>
> I tried the link but it doesn't seem to work. Do you have an ER diagram for
> the database? It would be helpful if it showed primary and secondary keys as
> well but I suspect that's in your schema?
>
> Jack
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Thomas T Gurney
> Sent: Thursday, May 28, 2015 9:49 AM
> To: Gary O'Neall
> Cc: [email protected]
> Subject: Re: SPDX 2.0 database schema
>
> Thanks, Gary, for the helpful commentary! Looks like I will be making several
> changes based on your suggestions.
>
> I have indeed seen the class diagram; I found it useful. But again, not a
> one-to-one correspondence to how a relational DB would look (classic problem,
> I know...) hence my questions.
>
> I did look briefly at triplestores. I'll admit, I put off looking into them
> in detail since I'm very familiar with relational DBs and very unfamiliar
> with the technology surrounding RDF :) Certainly I'll have to get familiar
> with it, if only to properly support SPDX document generation in RDF format.
>
> For the record: the link I provided below was broken for a time as I was
> moving things around; it has since been corrected. Not to mention, it
> actually works with a specific DBMS now. (I went with Postgres specifically
> for the CHECK constraint support, and overall unsurprising behavior compared
> to MySQL :)
>
> I have also added (+ am adding) some additional documentation on some of the
> quirks of this schema, in case anyone finds it useful.
>
> Tom
>
> On Wed, May 27, 2015 at 09:07:48PM -0700, Gary O'Neall wrote:
> > Hi Tom,
> >
> > I agree it would be a challenge to store the SPDX data in a relational DB.
> > The spec was designed in an object oriented fashion and it can be a
> > challenge to map objects to relations (or at least I find it to be a
> > challenge).
> >
> > For me, it is easier to understand the spec with a visual. If you
> > haven't already, take a look at the class diagram:
> > http://wiki.spdx.org/view/Technical_Team/Model_2_0
> >
> > Some responses inline below.
> >
> > > -----Original Message-----
> > > From: [email protected] [mailto:spdx-tech-[email protected]]
> > > On Behalf Of Thomas T Gurney
> > > Sent: Friday, May 22, 2015 6:33 PM
> > > To: [email protected]
> > > Subject: SPDX 2.0 database schema
> > >
> > > Hey all,
> > >
> > > Tom Gurney here, undergrad student from the open source research lab
> > > at University of Nebraska Omaha.
> > >
> > > I have been digging into SPDX 2.0 since its official release. In
> > > trying to build a relational database that will store SPDX 2.0
> > > documents, I've realized it's a lot tougher to store 2.0 data in a
> > > relational form than 1.2 data. A _lot_ tougher. (At least, from my
> > > limited perspective, it is.)
> > >
> > > Here's my attempt at a schema (beware, I threw it together in an
> > > evening):
> > > https://github.com/ttgurney/spdx2.0-schema/blob/master/spdx2_schema.sql
> > > It's like SQL pseudocode in that no actual DBMS will accept it,
> > > but it should make sense.
> > >
> > > So here's what's thrown me for a loop, and resulted in some odd
> > > design choices:
> > >
> > > - SPDX identifiers that can be associated with a file, document or
> > >   package, but must be unique within a document
> > [Gary] Correct
> > > - An SPDX document can describe files that are not part of any
> > >   package (and it can contain multiple packages too? Not sure I'm
> > >   reading the spec right)
> > [Gary] Correct
> > >
> > > - Relationships between identifiers
> > [Gary] I think of it as relationships between SpdxElements which have
> > identifiers as a property, but having the relationship between IDs
> > makes sense to me for a relational DB.
> >
> > You can have external references to identifiers as well - they are
> > made unique by the use of the SPDX Document Namespace, so including
> > the document namespace or the document ID in the relationships table
> > for the left and right relationships would allow the database to
> > properly map external references and hold multiple SPDX documents.
> >
> > > - License expression syntax (I don't see a way to sensibly
> > >   accommodate this in a relational DB)
> > [Gary] It wasn't easy to write in Java ;) You could implement them as
> > sets and operators (similar to the object model), but it would be
> > rather complex
> > > - Multiple checksum types supported (I stuck to just SHA1 for the
> > >   above schema)
> > [Gary] If you want it highly normalized, you could create a separate
> > table for checksums and have a reference (foreign key) to the
> > checksum table from the file. The checksum table would have value
> > and algorithm columns
> > > - What can we say about a file from its checksum? If two files have
> > >   the same checksum, can we say that they are the same file in every
> > >   aspect, and thereby carry with them all the same SPDX metadata,
> > >   regardless of what package each is in? I'm not sure.
> > [Gary] This has been debated and there are different opinions on this.
> > As far as the spec goes, we include the file name along with the
> > checksum when calculating the validation. My personal view is that
> > the checksum states the content is extremely likely to be the same
> > (depending on the checksum algorithm, I may even say the content is
> > the same), but the placement of the file itself may be relevant to how it
> > is used and may impact the metadata.
> > >
> > > Has anyone run into similar difficulties? Ideas on how to overcome
> > > them? Or is the idea of using a relational database to store this
> > > type of data absolutely silly? Many thanks in advance.
> > [Gary] Not silly, but difficult. Our commercial application stores
> > license, package, and file data in an RDBMS and translates to/from SPDX
> > without storing any data outside the DB. That being said, we don't
> > have to worry about all possible SPDX documents - only the ones likely
> > to be used in our application.
> >
> > An interesting thing to research would be using a storage facility for
> > RDF (e.g. triplestore) since the RDF schema has already been created.
> >
> > >
> > > Tom
> > > _______________________________________________
> > > Spdx-tech mailing list
> > > [email protected]
> > > https://lists.spdx.org/mailman/listinfo/spdx-tech
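
A rough sketch of how a few of the points in this thread might look in a relational schema: SPDX identifiers unique within a document, a separate checksum table with value and algorithm columns, and a relationships table that carries the document namespace on both sides. This is only an illustration under stated assumptions, not the schema in the repository linked above. Every table and column name is made up, the checksum values are placeholders, and it uses SQLite through Python's built-in sqlite3 module rather than Postgres so that it runs without a server. It also points the foreign key from the checksum row to the element, the reverse of the direction Gary describes, so that one file can carry several checksum types.

```python
import sqlite3

# Hypothetical schema sketch. All table and column names are illustrative
# and are NOT taken from the spdx2.0-schema repository.
SCHEMA = """
CREATE TABLE document (
    document_id INTEGER PRIMARY KEY,
    namespace   TEXT NOT NULL UNIQUE       -- SPDX Document Namespace
);
CREATE TABLE element (
    element_id  INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(document_id),
    spdx_id     TEXT NOT NULL,
    kind        TEXT NOT NULL CHECK (kind IN ('document', 'package', 'file')),
    UNIQUE (document_id, spdx_id)          -- identifiers unique within a document
);
CREATE TABLE checksum (
    checksum_id INTEGER PRIMARY KEY,
    element_id  INTEGER NOT NULL REFERENCES element(element_id),
    algorithm   TEXT NOT NULL,
    value       TEXT NOT NULL,
    UNIQUE (element_id, algorithm)         -- one value per algorithm per element
);
CREATE TABLE relationship (
    relationship_id INTEGER PRIMARY KEY,
    left_namespace  TEXT NOT NULL,         -- namespace qualifies each side, so
    left_spdx_id    TEXT NOT NULL,         -- a side may live in another document
    rel_type        TEXT NOT NULL,
    right_namespace TEXT NOT NULL,
    right_spdx_id   TEXT NOT NULL
);
"""


def build_demo():
    """Create an in-memory database holding two small made-up documents."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO document (document_id, namespace) VALUES (?, ?)",
        [(1, "http://example.org/doc-a"), (2, "http://example.org/doc-b")],
    )
    # The same identifier may appear in two different documents.
    conn.executemany(
        "INSERT INTO element (element_id, document_id, spdx_id, kind) "
        "VALUES (?, ?, ?, ?)",
        [(1, 1, "SPDXRef-File1", "file"), (2, 2, "SPDXRef-File1", "file")],
    )
    # Two checksum types for one file (placeholder values).
    conn.executemany(
        "INSERT INTO checksum (element_id, algorithm, value) VALUES (?, ?, ?)",
        [(1, "SHA1", "aaaa"), (1, "SHA256", "bbbb")],
    )
    # A relationship whose right side lives in the other document.
    conn.execute(
        "INSERT INTO relationship (left_namespace, left_spdx_id, rel_type, "
        "right_namespace, right_spdx_id) VALUES (?, ?, ?, ?, ?)",
        ("http://example.org/doc-a", "SPDXRef-File1", "CONTAINS",
         "http://example.org/doc-b", "SPDXRef-File1"),
    )
    return conn


def duplicate_id_rejected(conn):
    """Return True if reusing an identifier inside document 1 is refused."""
    try:
        conn.execute(
            "INSERT INTO element (document_id, spdx_id, kind) "
            "VALUES (1, 'SPDXRef-File1', 'package')"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling build_demo() and then duplicate_id_rejected(conn) exercises the UNIQUE (document_id, spdx_id) constraint. The same idea should carry over to Postgres, though the type names and constraint details there would need checking against a real server.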
