Try here: https://github.com/ttgurney/spdx2.0-schema

I have an ER diagram, but it's on a whiteboard :) Getting a proper one put
together is next on my list as far as documentation is concerned.

Tom

On Thu, May 28, 2015 at 02:07:19PM +0000, Manbeck, Jack wrote:
> Tom,
> 
> I tried the link but it doesn't seem to work. Do you have an ER diagram for 
> the database? It would be helpful if it showed primary and secondary keys as 
> well but I suspect that's in your schema?
> 
> Jack
> 
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Thomas T Gurney
> Sent: Thursday, May 28, 2015 9:49 AM
> To: Gary O'Neall
> Cc: [email protected]
> Subject: Re: SPDX 2.0 database schema
> 
> Thanks, Gary, for the helpful commentary! Looks like I will be making several 
> changes based on your suggestions.
> 
> I have indeed seen the class diagram; I found it useful. But again, not a 
> one-to-one correspondence to how a relational DB would look (classic problem, 
> I know...) hence my questions.
> 
> I did look briefly at triplestores. I'll admit, I put off looking into them 
> in detail since I'm very familiar with relational DBs and very unfamiliar 
> with the technology surrounding RDF :) Certainly I'll have to get familiar 
> with it, if only to properly support SPDX document generation in RDF format.
> 
> For the record: the link I provided below was broken for a time as I was 
> moving things around; it has since been corrected. Not to mention, it 
> actually works with a specific DBMS now. (I went with Postgres specifically 
> for the CHECK constraint support, and overall unsurprising behavior compared 
> to MySQL :)
> 
> I have also added (+ am adding) some additional documentation on some of the 
> quirks of this schema, in case anyone finds it useful.
> 
> Tom
> 
> On Wed, May 27, 2015 at 09:07:48PM -0700, Gary O'Neall wrote:
> > Hi Tom,
> > 
> > I agree it would be a challenge to store the SPDX data in a relational DB.
> > The spec was designed in an object oriented fashion and it can be a 
> > challenge to map objects to relations (or at least I find it to be a 
> > challenge).
> > 
> > For me, it is easier to understand the spec with a visual.  If you 
> > haven't already, take a look at the class diagram:
> > http://wiki.spdx.org/view/Technical_Team/Model_2_0
> > 
> > Some responses inline below.
> > 
> > > -----Original Message-----
> > > From: [email protected] [mailto:spdx-tech- 
> > > [email protected]] On Behalf Of Thomas T Gurney
> > > Sent: Friday, May 22, 2015 6:33 PM
> > > To: [email protected]
> > > Subject: SPDX 2.0 database schema
> > > 
> > > Hey all,
> > > 
> > > Tom Gurney here, undergrad student from the open source research lab 
> > > at University of Nebraska Omaha.
> > > 
> > > I have been digging into SPDX 2.0 since its official release. In 
> > > trying to build a relational database that will store SPDX 2.0 
> > > documents, I've realized it's a lot tougher to store 2.0 data in a 
> > > relational form than 1.2 data. A _lot_ tougher. (At least, from my 
> > > limited perspective, it is.)
> > > 
> > > Here's my attempt at a schema (beware, I threw it together in an
> > > evening):
> > > https://github.com/ttgurney/spdx2.0-schema/blob/master/spdx2_schema.sql
> > > It's like SQL pseudocode in that no actual DBMS will accept it, but it 
> > > should make sense.
> > > 
> > > So here's what's thrown me for a loop, and resulted in some odd design 
> > > choices:
> > > 
> > > - SPDX identifiers that can be associated with a file, document or 
> > >   package, but must be unique within a document
> > [Gary] Correct
> > > - An SPDX document can describe files that are not part of any package 
> > >   (and it can contain multiple packages too? Not sure I'm reading the 
> > >   spec right)
> > [Gary] Correct
> > 
> > > - Relationships between identifiers
> > [Gary] I think of it as relationships between SpdxElements which have 
> > identifiers as a property, but having the relationship between ID's 
> > makes sense to me for a relational DB.
> > 
> > You can have external references to identifiers as well - they are 
> > made unique by the use of the SPDX Document Namespace, so including 
> > the document namespace or the document ID in the relationships table 
> > for the left and right relationships would allow the database to 
> > properly map external references and hold multiple SPDX documents.
> > 
> > > - License expression syntax (I don't see a way to sensibly accommodate 
> > >   this in a relational DB)
> > [Gary] It wasn't easy to write in Java ;) You could implement them as 
> > sets and operators (similar to the object model), but it would be 
> > rather complex.
> > > - Multiple checksum types supported (I stuck to just SHA1 for the above 
> > >   schema)
> > [Gary] If you want it highly normalized, you could create a separate 
> > table for checksums and have a reference (foreign key) to the 
> > checksum table from the file.  The checksum table would have value 
> > and algorithm columns.
> > > - What can we say about a file from its checksum? If two files have the 
> > >   same checksum, can we say that they are the same file in every 
> > >   aspect, and thereby carry with them all the same SPDX metadata, 
> > >   regardless of what package each is in? I'm not sure.
> > [Gary] This has been debated and there are different opinions on this.  
> > As far as the spec goes, we include the file name along with the 
> > checksum when calculating the validation.  My personal view is that 
> > the checksum states the content is extremely likely to be the same 
> > (depending on the checksum algorithm, I may even say the content is 
> > the same), but the placement of the file itself may be relevant to how it 
> > is used and may impact the metadata.
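
A minimal sketch of how the points above might fit together, written in
Python against SQLite only so that it runs without a server. All table and
column names here are hypothetical and are not taken from spdx2_schema.sql.
It assumes identifiers are keyed by (document namespace, SPDX ID), that both
sides of a relationship carry a namespace, and that checksums live in their
own table. Note one variation on the suggestion: the foreign key points from
the checksum row to the file, so that one file can carry several algorithms.

```python
import hashlib
import sqlite3

# Hypothetical names throughout; illustrative only.
DOC_A = "http://example.org/spdx/doc-a"
DOC_B = "http://example.org/spdx/doc-b"

SCHEMA = """
CREATE TABLE documents (
    namespace TEXT PRIMARY KEY
);
CREATE TABLE elements (
    namespace TEXT NOT NULL REFERENCES documents(namespace),
    spdx_id   TEXT NOT NULL,
    kind      TEXT NOT NULL CHECK (kind IN ('document', 'package', 'file')),
    name      TEXT,
    -- An SPDX ID only has to be unique within its own document.
    PRIMARY KEY (namespace, spdx_id)
);
CREATE TABLE relationships (
    left_namespace  TEXT NOT NULL,
    left_id         TEXT NOT NULL,
    type            TEXT NOT NULL,
    right_namespace TEXT NOT NULL,
    right_id        TEXT NOT NULL,
    FOREIGN KEY (left_namespace, left_id)
        REFERENCES elements(namespace, spdx_id),
    FOREIGN KEY (right_namespace, right_id)
        REFERENCES elements(namespace, spdx_id)
);
CREATE TABLE checksums (
    namespace TEXT NOT NULL,
    spdx_id   TEXT NOT NULL,
    algorithm TEXT NOT NULL,
    value     TEXT NOT NULL,
    -- At most one value per algorithm per file.
    PRIMARY KEY (namespace, spdx_id, algorithm),
    FOREIGN KEY (namespace, spdx_id)
        REFERENCES elements(namespace, spdx_id)
);
"""


def add_element(conn, namespace, spdx_id, kind, name):
    """Insert one element; raises sqlite3.IntegrityError on a duplicate ID."""
    conn.execute(
        "INSERT INTO elements VALUES (?, ?, ?, ?)",
        (namespace, spdx_id, kind, name),
    )


def build_demo():
    """Create an in-memory database holding two small example documents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    content = b"hello"
    with conn:  # commit all inserts together
        conn.executemany("INSERT INTO documents VALUES (?)", [(DOC_A,), (DOC_B,)])
        add_element(conn, DOC_A, "SPDXRef-Package", "package", "demo")
        add_element(conn, DOC_A, "SPDXRef-File1", "file", "a.c")
        # Same SPDX ID, different document: allowed by the composite key.
        add_element(conn, DOC_B, "SPDXRef-File1", "file", "b.c")
        conn.executemany(
            "INSERT INTO relationships VALUES (?, ?, ?, ?, ?)",
            [
                (DOC_A, "SPDXRef-Package", "CONTAINS", DOC_A, "SPDXRef-File1"),
                # Cross-document reference, resolved through the namespace.
                (DOC_A, "SPDXRef-File1", "COPY_OF", DOC_B, "SPDXRef-File1"),
            ],
        )
        conn.executemany(
            "INSERT INTO checksums VALUES (?, ?, ?, ?)",
            [
                (DOC_A, "SPDXRef-File1", "SHA1", hashlib.sha1(content).hexdigest()),
                (DOC_A, "SPDXRef-File1", "SHA256", hashlib.sha256(content).hexdigest()),
            ],
        )
    return conn


if __name__ == "__main__":
    db = build_demo()
    for row in db.execute("SELECT algorithm, value FROM checksums ORDER BY algorithm"):
        print(row)
```

Keying everything on the namespace is what lets one database hold several
SPDX documents at once; the cost is that every join carries two columns.
License expressions are deliberately left out of this sketch.
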
> > > 
> > > Has anyone run into similar difficulties? Ideas on how to overcome 
> > > them? Or is the idea of using a relational database to store this 
> > > type of data absolutely silly? Many thanks in advance.
> > [Gary] Not silly, but difficult. Our commercial application stores 
> > license, package, and file data in an RDBMS and translates to/from SPDX 
> > without storing any data outside the DB.  That being said, we don't 
> > have to worry about all possible SPDX documents - only the ones likely 
> > to be used in our application.
> > 
> > An interesting thing to research would be using a storage facility for 
> > RDF (e.g. triplestore) since the RDF schema has already been created.
> > > 
> > > Tom
> > > _______________________________________________
> > > Spdx-tech mailing list
> > > [email protected]
> > > https://lists.spdx.org/mailman/listinfo/spdx-tech
> > 
> > 