Tom,

I tried the link but it doesn't seem to work. Do you have an ER diagram for the 
database? It would be helpful it showed primary and secondary keys as well but 
I suspect that's in your schema?

Jack


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Thomas T Gurney
Sent: Thursday, May 28, 2015 9:49 AM
To: Gary O'Neall
Cc: [email protected]
Subject: Re: SPDX 2.0 database schema

Thanks, Gary, for the helpful commentary! Looks like I will be making several 
changes based on your suggestions.

I have indeed seen the class diagram; I found it useful. But again, not a 
one-to-one correspondence to how a relational DB would look (classic problem, I 
know...) hence my questions.

I did look briefly at triplestores. I'll admit, I put off looking into them in 
detail since I'm very familiar with relational DBs and very unfamiliar with the 
technology surrounding RDF :) Certainly I'll have to get familiar with it, if 
only to properly support SPDX document generation in RDF format.

For the record: the link I provided below was broken for a time as I was moving 
things around; it has since been corrected. Not to mention, it actually works 
with a specific DBMS now. (I went with Postgres specifically for the CHECK 
constraint support, and overall unsurprising behavior compared to MySQL :)

I have also added (+ am adding) some additional documentation on some of the 
quirks of this schema, in case anyone finds it useful.

Tom

On Wed, May 27, 2015 at 09:07:48PM -0700, Gary O'Neall wrote:
> Hi Tom,
> 
> I agree it would be a challenge to store the SPDX data in a relational DB.
> The spec was designed in an object oriented fashion and it can be a 
> challenge to map objects to relations (or at least I find it to be a 
> challenge).
> 
> For me, it is easier to understand the spec with a visual.  If you 
> haven't already, take a look at the class diagram:
> http://wiki.spdx.org/view/Technical_Team/Model_2_0
> 
> Some responses inline below.
> 
> > -----Original Message-----
> > From: [email protected] [mailto:spdx-tech- 
> > [email protected]] On Behalf Of Thomas T Gurney
> > Sent: Friday, May 22, 2015 6:33 PM
> > To: [email protected]
> > Subject: SPDX 2.0 database schema
> > 
> > Hey all,
> > 
> > Tom Gurney here, undergrad student from the open source research lab 
> > at University of Nebraska Omaha.
> > 
> > I have been digging into SPDX 2.0 since its official release. In 
> > trying to build a relational database that will store SPDX 2.0 
> > documents, I've realized it's a lot tougher to store 2.0 data in a 
> > relational form than
> > 1.2 data. A _lot_ tougher. (At least, from my limited perspective, 
> > it
> > is.)
> > 
> > Here's my attempt at a schema (beware, I threw it together in an
> > evening):
> > https://github.com/ttgurney/spdx2.0-schema/blob/master/spdx2_schema.
> > sql It's like SQL pseudocode in that no actual DBMS will accept it, 
> > but it should make sense.
> > 
> > So here's what's thrown me for a loop, and resulted in some odd 
> > design
> > choices:
> > 
> > - SPDX identifiers that can be associated with a file, document or 
> > package, but
> >   must be unique within a document
> [Gary]
> [Gary] Correct
> > - An SPDX document can describe files that are not part of any 
> > package (and
> >   it can contain multiple packages too? Not sure I'm reading the 
> > spec
> > right)
> [Gary] Correct
> 
> > - Relationships between identifiers
> [Gary] I think of it as relationships between SpdxElements which have 
> identifiers as a property, but having the relationship between ID's 
> makes sense to me for a relational DB.
> 
> You can have external references to identifiers as well - they are 
> made unique by the use of the SPDX Document Namespace, so including 
> the document namespace or the document ID in the relationships table 
> for the left and right relationships would allow the database to 
> properly map external references and hold multiple SPDX documents.
> 
> > - License expression syntax (I don't see a way to sensibly 
> > accomodate this
> >   in a relational DB)
> [Gary] It wasn't easy to write in Java ;) You could implement them as 
> sets and operators (similar to the object model), but it would be 
> rather complex
> > - Multiple checksum types supported (I stuck to just SHA1 for the 
> > above
> > schema)
> [Gary] If you want it highly normalized, you could create a separate 
> table which checksums and have a reference (foreign key) to the 
> checksum table from the file.  The checksum table would have a value 
> and algorithm columns
> > - What can we say about a file from its checksum? If two files have 
> > the same
> >   checksum, can we say that they are the same file in every aspect, 
> > and thereby
> >   carry with them all the same SPDX metadata, regardless of what 
> > package each
> >   is in? I'm not sure.
> [Gary] This has been debated and there are different opinions on this.  
> As far as the spec goes, we include the file name along with the 
> checksum when calculating the validation.  My personal view is that 
> the checksum states the content is extremely likely to be the same 
> (depending on the checksum algorithm, I may even say the content is 
> the same), but the placement of the file itself may be relevant to how it is 
> used and may impact the metadata.
> > 
> > Has anyone run into similar difficulties? Ideas on how to overcome 
> > them? Or is the idea of using a relational database to store this 
> > type of data absolutely silly? Many thanks in advance.
> [Gary] Not silly, but difficult. Our commercial application store 
> license, package, and file data in a RDMS and translates to/from SPDX 
> without storing any data outside the DB.  That being said, we don't 
> have to worry about all possible SPDX documents - only the ones likely 
> to be used in our application.
> 
> An interesting thing to research would be using a storage facility for 
> RDF (e.g. triplestore) since the RDF schema has already been created.
> > 
> > Tom
> > _______________________________________________
> > Spdx-tech mailing list
> > [email protected]
> > https://lists.spdx.org/mailman/listinfo/spdx-tech
> 
> 
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech

Reply via email to