On Thu, Dec 28, 2017 at 4:53 PM, Philippe Ombredanne <[email protected]> wrote:
> Hi Thanh,
>
> On Thu, Dec 28, 2017 at 2:18 AM, Thanh Ha <[email protected]>
> wrote:
> > I am developing a license header scanner in order to quickly scan local
> > files for license headers at the top of code files.
>
> You may want to check out ScanCode [1]. Since I use it with top Linux
> maintainers to clarify the kernel licensing and set SPDX ids, it must
> not be too shabby as a license detection engine. It detects headers
> alright and much more, including EPL headers.
>
> PS: ScanCode is written in Python, not Go and I am the maintainer.
>
> [1] https://github.com/nexB/scancode-toolkit
> --
> Cordially
> Philippe Ombredanne
>

Hi Philippe,

Thanks for the pointer. I had a look and unfortunately it isn't the tool
we need for our use case.

The tool we need (and what I'm prototyping) is one that we can run in CI
systems. We pass it a list of valid licenses, for example "EPL-1.0,
Apache-2.0", and it searches all the code files in a project repo to make
sure that the top of every code file contains the correct license header
text (and optionally an SPDX identifier). Any file with a missing license
header or incorrect license header text automatically fails the build in
CI and reports a -1 vote (or blocking vote) in a code review system such
as Gerrit or GitHub code reviews.

The intention is to block developers from merging code with missing
license headers in the first place, rather than finding out after the
fact that this has happened. We've successfully done this for a few of
our Java projects using checkstyle, but it's Java specific and runs quite
a bit slower than we'd like. The new tool we've been working on scans
significantly faster because it reads only the first few bytes of every
file and does all the scanning locally without generating anything (it
scans tens of thousands of files in seconds).

I have a work in progress here [0] in case anyone is interested, but it
currently requires us to provide an example license header.
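
To make the check described above concrete, here is a minimal sketch in
Python. It is an illustration only and not the code from [0]: the function
names, the 2048-byte cutoff, the list of file suffixes, and the way comment
markers are stripped are all assumptions made for this example. The accepted
header texts are supplied by the caller, keyed by license id.

```python
import sys
from pathlib import Path

# All values below are assumptions for this sketch, not taken from the real tool.
HEADER_BYTES = 2048                       # only this much of each file is read
COMMENT_CHARS = set("/*#;-!<>")           # tokens made only of these are dropped
CODE_SUFFIXES = {".py", ".go", ".java", ".c", ".sh"}


def normalize(text):
    """Drop bare comment markers, collapse whitespace, and lowercase, so the
    same header matches regardless of comment style or line wrapping."""
    words = [w for w in text.split() if not set(w) <= COMMENT_CHARS]
    return " ".join(words).lower()


def has_accepted_header(path, accepted_headers):
    """Return True if the first HEADER_BYTES of the file contain any of the
    accepted header texts (a dict of license id -> header text)."""
    with open(path, "rb") as fh:
        head = fh.read(HEADER_BYTES).decode("utf-8", errors="replace")
    head = normalize(head)
    return any(normalize(h) in head for h in accepted_headers.values())


def find_violations(root, accepted_headers):
    """Walk the tree under root and list code files lacking an accepted header."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file()
        and p.suffix in CODE_SUFFIXES
        and not has_accepted_header(p, accepted_headers)
    )


def run_check(root, accepted_headers):
    """Print offending files and return a process exit code: 0 if every code
    file has an accepted header, 1 otherwise."""
    violations = find_violations(root, accepted_headers)
    for path in violations:
        print(f"missing or incorrect license header: {path}", file=sys.stderr)
    return 1 if violations else 0
```

A CI job could end with something like sys.exit(run_check(".", headers)), so
that a non-zero exit code fails the build and the review system records the
blocking vote. Note that this sketch accepts the header anywhere in the first
2048 bytes rather than strictly at byte 0, which leaves room for things like
a shebang line.
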
I'd like to pull in SPDX data so that the header text can be sourced
automatically, rather than depending on each project to provide the
correct header examples.

Hope this explains things more clearly.

Thanks,
Thanh

[0] https://github.com/zxiiro/license-header-checker
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech
