J Lovejoy <[email protected]>:
> Hi Eric,
> 
> Gary O’Neall wrote a paper about the various ways one can access the SPDX 
> License List, which is available in a variety of ways (beside scraping). That 
> paper is here: http://wiki.spdx.org/images/SPDX-TR-2014-2.v1.0.pdf
> 
> I’m also copying your email to the SPDX tech team, as that is a better forum 
> for discussing this kind of thing.
> 
> Thanks,
> Jilayne
> 
> SPDX Legal Team co-lead
> [email protected]

I've enclosed a copy of a proof-of-concept program in Python that walks a
code tree replacing inline license headers with SPDX tags.  It can be 
tested as a filter - feed a source file to its stdin, get back the
SPDXified version on stdout.

This program uses a static list of headers because I was unaware of
the RDFa querying machinery when I wrote it. Presently it recognizes
the GPL 2.0+, MIT, and BSD-2-Clause license headers.  It has
been tested on a real project, cvs-fast-export, and performed correctly.

It should not be very difficult to extend this to work with your database.
It needs two things:

(1) A way to get a map from license first lines to SPDX license IDs.

(2) A way to get a map from SPDX license IDs to canonical license header
texts.
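
As a rough illustration of the two maps, here is a minimal Python sketch.
It is not taken from the enclosed program: the names FIRST_LINE_TO_ID,
ID_TO_HEADER, normalize and lookup are hypothetical, and the header texts
are abbreviated to their opening lines. A real table would carry the full
canonical text, presumably pulled from the SPDX License List.

```python
# Hypothetical data layout; names and abbreviated texts are illustrative only.

# (2) SPDX license ID -> canonical header text, one list entry per line.
# Only the opening lines are shown; a real table needs the complete header.
ID_TO_HEADER = {
    "GPL-2.0+": [
        "This program is free software; you can redistribute it and/or modify",
        "it under the terms of the GNU General Public License as published by",
    ],
    "MIT": [
        "Permission is hereby granted, free of charge, to any person obtaining a copy",
        'of this software and associated documentation files (the "Software"), to deal',
    ],
}

# (1) Triggering first line -> SPDX license ID, derived from the table above
# so the two maps cannot drift apart.
FIRST_LINE_TO_ID = {text[0]: spdx_id for spdx_id, text in ID_TO_HEADER.items()}


def normalize(line):
    """Strip surrounding whitespace and common leading comment punctuation."""
    return line.strip().lstrip("#/*;-! ").strip()


def lookup(line):
    """Return the SPDX ID whose header starts with this line, or None."""
    return FIRST_LINE_TO_ID.get(normalize(line))
```

With this layout, lookup() on a commented MIT first line yields "MIT",
and any ordinary source line yields None.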

The strategy it uses is to walk through each file looking for a
triggering first line, then grab a copy of the canonical header text
corresponding to it, then consume lines until either a match has been
recognized or the match has failed. In the former case the span of
lines is replaced with an SPDX license tag. There's also a bit of trickery
used to deduce the right comment leader for the tag.
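
The strategy above can be sketched in Python roughly as follows. This is
an assumption-laden illustration, not the enclosed program: HEADERS,
TRIGGERS, LEADER, split_leader and spdxify are hypothetical names, the
single header is abbreviated to its first paragraph, and the comment
leader is guessed with a simple regular expression that may differ from
the trickery the real tool uses.

```python
import re

# Hypothetical table, abbreviated to the first paragraph of one header.
HEADERS = {
    "GPL-2.0+": [
        "This program is free software; you can redistribute it and/or modify",
        "it under the terms of the GNU General Public License as published by",
        "the Free Software Foundation; either version 2 of the License, or",
        "(at your option) any later version.",
    ],
}
# Triggering first line -> SPDX license ID.
TRIGGERS = {text[0]: spdx_id for spdx_id, text in HEADERS.items()}

# Leading whitespace plus one common comment marker (an assumed set).
LEADER = re.compile(r"^\s*(?:#|//|/\*|\*|;|--)?\s*")


def split_leader(line):
    """Return (comment leader, remaining text) for one source line."""
    body = line.rstrip("\n")
    end = LEADER.match(body).end()
    return body[:end], body[end:].rstrip()


def spdxify(lines):
    """Replace each recognized header span with a one-line SPDX tag."""
    out = []
    i = 0
    while i < len(lines):
        leader, text = split_leader(lines[i])
        spdx_id = TRIGGERS.get(text)
        if spdx_id is None:
            out.append(lines[i])
            i += 1
            continue
        canonical = HEADERS[spdx_id]
        span = lines[i:i + len(canonical)]
        # Compare line by line; all() stops at the first mismatch.
        matched = len(span) == len(canonical) and all(
            split_leader(got)[1] == want for got, want in zip(span, canonical)
        )
        if matched:
            # Reuse the leader of the triggering line for the tag.
            out.append(leader + "SPDX-License-Identifier: " + spdx_id + "\n")
            i += len(canonical)
        else:
            # Match failed: leave the triggering line as it was.
            out.append(lines[i])
            i += 1
    return out
```

To run this as a filter, one could feed it sys.stdin.readlines() and
pass the result to sys.stdout.writelines().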

Can we cooperate on making this a production-quality tool?
-- 
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech
