On Thu, Dec 28, 2017 at 4:53 PM, Philippe Ombredanne <[email protected]>
wrote:

> Hi Thanh,
>
> On Thu, Dec 28, 2017 at 2:18 AM, Thanh Ha <[email protected]>
> wrote:
> > I am developing a license header scanner in order to quickly scan local
> > files for license headers at the top of code files.
>
> You may want to check out ScanCode [1]. Since I use it with top Linux
> maintainers to clarify the kernel licensing and set SPDX ids, it must
> not be too shabby as a license detection engine.  It detects headers
> alright and much more, including EPL headers.
>
> PS: ScanCode is written in Python, not Go and I am the maintainer.
>
> [1] https://github.com/nexB/scancode-toolkit
> --
> Cordially
> Philippe Ombredanne
>

Hi Philippe,

Thanks for the pointer. I had a look and unfortunately it isn't the tool we
need for our use case.

The tool we need (and what I'm prototyping) is one that we can use in CI
systems: we pass it a list of valid licenses, for example "EPL-1.0,
Apache-2.0", and it searches all the code files in a project repo to make
sure that the top of every code file contains the correct license header
text (and optionally an SPDX identifier). Any file that is missing a
license header or has incorrect license header text automatically fails
the build in CI, which reports a -1 vote (or blocking vote) in a code
review system like Gerrit or GitHub code reviews. The intention here is to
block developers from merging code with missing license headers in the
first place rather than find out after the fact that this has happened.
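
To make the CI behaviour above concrete, here is a minimal sketch in
Python. It is not the code of the prototype; the names ALLOWED,
find_violations and main are hypothetical, and it only checks the optional
SPDX identifier rather than the full header text. It assumes the identifier
appears within the first 20 lines of each file.

```python
import re
from itertools import islice

# Hypothetical allow-list, as a CI job might pass it in.
ALLOWED = {"EPL-1.0", "Apache-2.0"}

# Matches the identifier in a line such as
# "// SPDX-License-Identifier: EPL-1.0".
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def find_violations(paths, allowed=ALLOWED, head_lines=20):
    """Return the paths whose first lines carry no allowed SPDX identifier."""
    violations = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            # Only the top of the file is read; the rest is never touched.
            head = "".join(islice(fh, head_lines))
        match = SPDX_RE.search(head)
        if match is None or match.group(1) not in allowed:
            violations.append(path)
    return violations


def main(paths):
    """Print each offending file and return a process exit code."""
    bad = find_violations(paths)
    for path in bad:
        print(f"missing or invalid license header: {path}")
    return 1 if bad else 0
```

A CI job could call main() on the list of changed files and pass the result
to sys.exit(); the non-zero exit code is what would fail the build and, in
turn, produce the -1 vote in the review system.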

We've successfully done this for a few of our Java projects using
checkstyle, but it's Java-specific and runs quite a bit slower than we'd like.
The new tool we've been working on scans significantly more quickly as it
only reads the first few bytes of every file and all the scanning is done
locally without generating anything (it scans tens of thousands of files in
seconds). I have a work in progress here [0] in case anyone is interested
but it currently requires us to provide an example license header. I'd like
to pull in SPDX data so that this data can be automatically sourced from
somewhere rather than depending on the projects to provide the correct
header examples.
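
As a rough illustration of the "read only the first few bytes" approach,
the sketch below compares the top of a file against a hand-supplied example
header, which is the input the prototype currently needs. This is an
assumption-laden Python sketch, not the tool's actual implementation:
HEAD_BYTES, COMMENT_PREFIX, normalize and header_matches are hypothetical
names, and the 1024-byte window and the set of comment markers are guesses.

```python
import re

# Assumed number of bytes to read from the top of each file.
HEAD_BYTES = 1024

# Leading comment markers to ignore: "//", "/*", "*", "*/", "#".
COMMENT_PREFIX = re.compile(r"^\s*(//+|/\*+|\*+/?|#+)\s?")


def normalize(text):
    """Strip leading comment markers per line and collapse whitespace."""
    lines = (COMMENT_PREFIX.sub("", line) for line in text.splitlines())
    return " ".join(" ".join(lines).split())


def header_matches(path, example_header, head_bytes=HEAD_BYTES):
    """Check whether the example header text occurs at the top of the file."""
    with open(path, "rb") as fh:
        # Read a fixed-size prefix only, regardless of the file's size.
        head = fh.read(head_bytes).decode("utf-8", errors="replace")
    return normalize(example_header) in normalize(head)
```

Normalizing both sides means the same example header can match whether it
is wrapped in "/* ... */", "//" or "#" comments. If the header text came
from SPDX data instead, only the example_header argument would need to
change.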

Hope this explains things more clearly.

Thanks,
Thanh

[0] https://github.com/zxiiro/license-header-checker
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech
