J Lovejoy <[email protected]>:
> Hi Eric,
>
> Gary O’Neall wrote a paper about the various ways one can access the SPDX
> License List, which is available in a variety of ways (beside scraping). That
> paper is here: http://wiki.spdx.org/images/SPDX-TR-2014-2.v1.0.pdf
>
> I’m also copying your email to the SPDX tech team, as that is a better forum
> for discussing this kind of thing.
>
> Thanks,
> Jilayne
>
> SPDX Legal Team co-lead
> [email protected]

I've enclosed a copy of a proof-of-concept program in Python that walks a
code tree replacing inline license headers with SPDX tags. It can be
tested as a filter: feed a source file to its stdin, get back the
SPDXified version on stdout.

This program uses a static list of headers because I was unaware of
the RDFa querying machinery when I wrote it. Presently it recognizes
the headers of the GPL 2.0+, MIT, and BSD-2-Clause licenses. It has
been tested on a real project, cvs-fast-export, and performed correctly.

It should not be very difficult to extend this to work with your database.
It needs two things:

(1) A way to get a map from license first lines to SPDX license IDs.

(2) A way to get a map from SPDX license IDs to canonical license header
texts.
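
To illustrate how the two maps relate, here is a minimal sketch (not taken
from the enclosed program) in which map (1) is derived from map (2) by
taking the first line of each canonical header. The names HEADERS_BY_ID,
normalize, and build_first_line_map are hypothetical, and the header texts
are abbreviated placeholders rather than data from the SPDX License List.

```python
# Hypothetical stand-in for map (2): SPDX license ID -> canonical header text.
# The texts are abbreviated placeholders, not official SPDX header texts.
HEADERS_BY_ID = {
    "MIT": (
        "Permission is hereby granted, free of charge, to any person\n"
        "(remaining text omitted in this sketch)"
    ),
    "BSD-2-Clause": (
        "Redistribution and use in source and binary forms, with or without\n"
        "(remaining text omitted in this sketch)"
    ),
}


def normalize(line):
    """Collapse runs of whitespace so lookups ignore spacing differences."""
    return " ".join(line.split())


def build_first_line_map(headers_by_id):
    """Derive map (1): normalized first header line -> SPDX license ID."""
    first_lines = {}
    for spdx_id, text in headers_by_id.items():
        first = normalize(text.splitlines()[0])
        if first in first_lines:
            # Two licenses sharing a first line would make the trigger ambiguous.
            raise ValueError(
                f"first line shared by {first_lines[first]} and {spdx_id}"
            )
        first_lines[first] = spdx_id
    return first_lines
```

Deriving one map from the other keeps the two from drifting apart. The
duplicate check matters because a first line shared by two licenses could
not serve as an unambiguous trigger.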

The strategy it uses is to walk through each file looking for a
triggering first line, then grab a copy of the canonical header text
corresponding to it, then consume lines until either a match has been
recognized or the match has failed. In the former case the span of
lines is replaced with an SPDX license tag. There's also a bit of trickery
used to deduce the right comment leader for the tag.
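
The strategy above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the enclosed program: it assumes
each header is wrapped exactly like the canonical text, handles only
line-style comment leaders, and uses a hypothetical one-entry table with
abbreviated placeholder text. The names CANONICAL, TRIGGERS, split_leader,
and spdxify are made up for this example.

```python
import re

# Hypothetical table: SPDX ID -> canonical header lines (abbreviated placeholder).
CANONICAL = {
    "MIT": [
        "Permission is hereby granted, free of charge, to any person",
        "obtaining a copy of this software (text abbreviated for this sketch).",
    ],
}
# Triggering first line -> SPDX ID.
TRIGGERS = {lines[0]: spdx_id for spdx_id, lines in CANONICAL.items()}

# Indentation plus an optional line-comment leader ("#", "//", "*", ";", "--").
LEADER = re.compile(r"^([ \t]*(?:#|//|\*|;|--)?[ \t]*)(.*?)[ \t]*$")


def split_leader(line):
    """Split a line into (comment leader, remaining text without line ending)."""
    m = LEADER.match(line.rstrip("\r\n"))
    return m.group(1), m.group(2)


def spdxify(lines):
    """Return a copy of lines with recognized headers replaced by SPDX tags."""
    out = []
    i = 0
    while i < len(lines):
        leader, body = split_leader(lines[i])
        spdx_id = TRIGGERS.get(body)
        if spdx_id is not None:
            expected = CANONICAL[spdx_id]
            # Consume as many lines as the canonical header has and compare.
            span = lines[i:i + len(expected)]
            if [split_leader(l)[1] for l in span] == expected:
                # Reuse the leader of the triggering line for the tag.
                out.append(f"{leader}SPDX-License-Identifier: {spdx_id}\n")
                i += len(expected)
                continue
        # No trigger, or the match failed: pass the line through unchanged.
        out.append(lines[i])
        i += 1
    return out

# As a filter, one could write:
#   import sys; sys.stdout.writelines(spdxify(sys.stdin.readlines()))
```

Taking the leader from the triggering line is the simplest way to pick the
comment style for the tag. Block comments opened and closed on the header's
own lines would need extra handling that this sketch leaves out.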

Can we cooperate on making this a production-quality tool?
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech