I completed a simple program today that iterates over the license 
templates/data, lets me mark out sections, and outputs the result as XML. The 
license headers from the spreadsheet are included, and the SPDX markup should 
convert with no loss of information. I marked the files by hand; it only took 
a couple of hours. The exercise was enlightening and brought a few questions 
to mind:


- Many licenses have "exhibit A" type sections that explain how to apply the 
license (essentially, providing what we call the "header" and saying how to 
use it). Should these be flagged as optional? If so, what should be done about 
references to them in the license proper?

- Similarly, the matching guidelines allow different "kinds" of bullets 
(numbered, lettered, unordered) to match each other, but in some licenses 
specific sections are referred to elsewhere in the text. What should be done 
here? Such bullets are presumably unlikely to be changed or to appear in 
another format, so it may be that the matching guidelines shouldn't be applied 
to bullets in these cases: applying them would allow licenses with broken 
"references", and there would be no way to validate those references. (This is 
simple to do with the XML format: just don't mark the bullets.)

- Many licenses are basically the MIT or BSD licenses with the names changed. 
These could be targets for consolidation in the future, if someone's into 
that sort of thing.

- A few licenses have copyright notices mixed in at inconvenient places, 
weird formatting (ASCII control characters below 32), a repeated or 
messed-up title, etc.

- The template files do not have consistent line terminators; some are CRLF, 
some are LF. This would probably be worth making consistent, though the point 
may be moot with XML, since whitespace is ignored to a certain extent.

- Some of the bullets got missed, e.g. in the BSD-3-Clause-Clear license. I 
didn't think it was worth another pass at this stage to correct them.

- The program I wrote allows for a 'review' flag, and I applied it to various 
licenses for various reasons: in some, I didn't have the granularity to 
separate the license and copyright notice when they were on the same line; in 
some, I was uncertain about a decision I made; and in some, the structure of 
the license created an unintuitive XML document (for example, multiple title 
sections; at least one license template actually contained two wholly 
separate licenses concatenated).
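
The line-terminator and control-character issues above are easy to check 
mechanically. Here is a minimal sketch in Python of how that might be done. It 
is not the tool described in this message; the function names are 
hypothetical, and it assumes the templates are read as raw bytes and that tab, 
LF and CR are the only acceptable characters below 32.

```python
from collections import Counter


def classify_line_endings(data: bytes) -> Counter:
    """Count CRLF, bare LF, and bare CR terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF contains one LF and one CR, so subtract them out
    # to get the terminators that stand alone.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return Counter({"CRLF": crlf, "LF": lf, "CR": cr})


def find_control_chars(data: bytes) -> list:
    """Return (offset, byte value) pairs for bytes below 32.

    Tab, LF and CR are treated as acceptable (an assumption of this sketch).
    """
    allowed = {0x09, 0x0A, 0x0D}
    return [(i, b) for i, b in enumerate(data) if b < 32 and b not in allowed]


def normalize_to_lf(data: bytes) -> bytes:
    """Rewrite CRLF and bare CR terminators as LF."""
    # CRLF must be replaced first so it does not become two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

A file whose counts show more than one non-zero terminator type has mixed 
line endings, and any result from find_control_chars points at a spot worth 
a manual look before the text goes into XML.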

In light of all these things, it might be worthwhile for someone other than me 
to perform this process (someone with more legal knowledge, or more familiarity 
with SPDX policies); the tool I wrote is more or less usable, if spare, and 
anyone interested is welcome to have a look. Alternatively, a thorough reading 
of the XML source files (which we decided to call 'master' files, though that 
is confusing when we're talking about git repositories!), with attention to 
which content goes in which section and so on, is certainly warranted before 
any kind of release.

The generated files are here: 
https://github.com/myndzi/license-list/tree/xml-test/src

And an example of a generated file which includes most of the discussed 
features: https://github.com/myndzi/license-list/blob/xml-test/src/GPL-1.0.xml

-Kris
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech