I completed a simple program today that iterates over the license templates/data, lets me mark out sections, and outputs the result as XML. The license headers from the spreadsheet are included, and SPDX markup should be converted with no loss of information. I marked the files by hand; it only took a couple of hours. The exercise was enlightening, and it brought a few questions to mind:
- There are a lot of licenses with "exhibit A" type sections that talk about how to apply the license (essentially, providing what we call the "header" and saying how to use it); should these be flagged as optional? What would be done about references in the license proper that refer to them?
- Similarly, the matching guidelines allow for matching different "kinds" of bullets (numbered, lettered, unordered), but in some cases specific sections are referred to elsewhere in a license. What should be done here? Presumably these bullets are unlikely to be changed or seen in another format, so it may be that the matching guidelines shouldn't be applied to bullets in these cases, since doing so would allow licenses with broken "references" and/or leave no way to validate those references. (This is simple to apply with the XML format: simply don't mark the bullets.)
- There are a lot of licenses that look basically like the MIT or BSD licenses but with the names changed. These could be targets for consolidation in the future, if someone's into that sort of thing.
- There are a few licenses with the copyright notices mixed in at inconvenient places, or where the formatting is weird (ASCII control characters below 32), or where the title is repeated or messed up, etc.
- The template files do not have consistent line terminators; some are CRLF, some are LF. This would probably be worth making consistent, though the point may be moot with XML, since whitespace is ignored to a certain extent.
- Some of the bullets got missed, e.g. in the BSD-3-Clause-Clear license. I didn't think it was worth another pass at this stage to correct them.
- The program I wrote allows for a 'review' flag, and I applied it to various licenses for various reasons: some where I didn't have the granularity to separate the license and copyright notice when they were on the same line, some where I was uncertain about a decision I made, and some where the structure of the license created an unintuitive XML document (for example, multiple title sections; at least one license template actually contained two wholly separate licenses concatenated).

In light of all these things, it might be worthwhile for someone other than me to perform this process (someone with more legal knowledge, or more familiarity with SPDX policies); the tool I wrote is more or less usable, if spare, and anyone interested is welcome to have a look. Alternatively, a thorough reading of the XML source files (which we decided to call 'master' files, though that is confusing when we're talking about git repositories!), paying attention to which contents go in which section and so on, is certainly warranted before any kind of release.

The generated files are here:
https://github.com/myndzi/license-list/tree/xml-test/src

And an example of a generated file which includes most of the discussed features:
https://github.com/myndzi/license-list/blob/xml-test/src/GPL-1.0.xml

-Kris
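
For the line-terminator and control-character points in the list above, a check along these lines could flag the affected template files. This is only a sketch and not part of the tool described in this message: the function names are hypothetical, and treating tab, LF, and CR as the only acceptable bytes below 32 is an assumption.

```python
import re

# Bytes below 0x20, excluding tab (0x09), LF (0x0A), and CR (0x0D).
# Which control bytes count as "weird" is an assumption of this sketch.
CONTROL_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def line_ending_style(data: bytes) -> str:
    """Classify line terminators as 'CRLF', 'LF', 'mixed', or 'none'."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf  # LF bytes not preceded by CR
    if crlf and bare_lf:
        return "mixed"
    if crlf:
        return "CRLF"
    if bare_lf:
        return "LF"
    return "none"


def normalize_to_lf(data: bytes) -> bytes:
    """Rewrite CRLF terminators as LF, leaving all other bytes untouched."""
    return data.replace(b"\r\n", b"\n")


def control_byte_offsets(data: bytes) -> list[int]:
    """Return the byte offsets of unexpected control characters."""
    return [m.start() for m in CONTROL_BYTES.finditer(data)]
```

Reading each template in binary mode (for example with `pathlib.Path.read_bytes()`) and passing the result to these functions would show which files mix terminators or carry stray control bytes before anything is rewritten.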
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech
