On Mon, Sep 19, 2011 at 4:32 PM, Dennis E. Hamilton
<dennis.hamil...@acm.org> wrote:
> I agree that there is no escape from managing down to the individual file.  
> It is a question of organization now, where the entire base is involved.
>

RAT or something RAT-like.

> Later, if the svn:property is to be trusted, the problem is quite different, 
> it seems to me.  Plus the rules are understood and provenance and IP are 
> likely handled as anything needing clearance enters the code base.  What is 
> done to ensure a previously-vetted code base has not become tainted strikes 
> me as a kind of regression/smoke test.
>

Here is how I see SVN properties and RAT relating.   Any use of a
grep-like RAT-like tool will need to deal with exceptions.  We're
going to have stuff like binary files, say ODF files that are used for
testing, that don't have a "header".  Or files that are used only as a
build tool, checked in for convenience, but are not part of the
release.  Or 3rd party code that does not have a header, but we know
its origin, like the ICU breakiterator data files.

How do we deal with those types of files, in the context of an
automated audit tool?  One solution is to record in a big config file
or script a list of all of these exceptions.  Essentially, a list of
files to ignore in the RAT scan.

That approach would certainly work, but would be fragile.  Moving or
renaming the files would break our script.  Not the end of the world,
since this could be designed to be "fail safe" and give us errors on
the files that moved.
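
A rough sketch of what "fail safe" could look like for the config-file
approach.  The file format (one relative path per line, "#" for
comments) and the function names are hypothetical, not anything RAT
itself defines:

```python
from pathlib import Path


def load_exclusions(list_file, root):
    """Read a hypothetical exclusion list: one relative path per line.

    Returns (excluded, missing), where 'missing' holds entries that no
    longer exist under 'root', e.g. because the file was moved or renamed.
    """
    root = Path(root)
    excluded, missing = [], []
    for line in Path(list_file).read_text().splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        if (root / entry).exists():
            excluded.append(entry)
        else:
            missing.append(entry)
    return excluded, missing


def check_exclusions(list_file, root):
    """Fail loudly on stale entries instead of silently ignoring them."""
    excluded, missing = load_exclusions(list_file, root)
    if missing:
        raise FileNotFoundError(
            "exclusion list names files that no longer exist: "
            + ", ".join(missing)
        )
    return excluded
```

A stale entry then stops the audit rather than letting a moved file
slip through unexamined.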

But if we track this info in SVN, then we could generate the exclusion
list from SVN, so it automatically adjusts as files are moved or
renamed.  It also avoids the problem -- and this might just be my own
engineering esthetic -- of tracking metadata for files in two
different places, which seems rather untidy to me.
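
To make the idea concrete, here is a minimal sketch of generating the
exclusion list from SVN.  The property name "aoo:ip-status" and its
values are made up for illustration, and which values belong on the
exclusion list is also an assumption.  The parser assumes the
one-line-per-file "path - value" output that "svn propget -R" prints
for single-line property values:

```python
import subprocess

# Hypothetical property name and values; nothing like this is agreed yet.
PROP = "aoo:ip-status"
EXCLUDED_VALUES = {"third-party-compatible", "not-in-release"}


def parse_propget(output):
    """Parse 'path - value' lines into a {path: value} dict.

    Splits on the last ' - ' so that paths containing ' - ' still work;
    this assumes the property values themselves never contain ' - '.
    """
    statuses = {}
    for line in output.splitlines():
        path, sep, value = line.rpartition(" - ")
        if sep:
            statuses[path] = value.strip()
    return statuses


def exclusion_list(statuses):
    """Paths whose recorded status means the header scan should skip them."""
    return sorted(p for p, v in statuses.items() if v in EXCLUDED_VALUES)


def read_statuses(working_copy):
    """Ask svn for the property on every file under the working copy."""
    result = subprocess.run(
        ["svn", "propget", "-R", PROP, working_copy],
        check=True, capture_output=True, text=True,
    )
    return parse_propget(result.stdout)
```

Since the property travels with the file through "svn move", the
generated list needs no manual upkeep when files are renamed.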

From a regression standpoint, you could treat all files as being in
one of several states:

1) Unexamined (no property set)

2) Apache 2.0 (included in the Oracle SGA or new code contributed by
committer or other person under iCLA)

3) Compatible 3rd party license

4) Incompatible 3rd party license

5) Not part of release

The goal would be to iterate until every file is in category 2, 3 or 5.
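
As a sketch of that regression check, the five states and the "done"
condition might be modeled like this.  The enum and function names
are illustrative only, and how each file's state gets read (SVN
property or otherwise) is left out:

```python
from enum import Enum


class IPState(Enum):
    UNEXAMINED = 1                # no property set
    APACHE_2 = 2                  # covered by the SGA or an iCLA
    COMPATIBLE_THIRD_PARTY = 3
    INCOMPATIBLE_THIRD_PARTY = 4
    NOT_IN_RELEASE = 5


# Categories 2, 3 and 5 are the acceptable end states.
CLEARED = {
    IPState.APACHE_2,
    IPState.COMPATIBLE_THIRD_PARTY,
    IPState.NOT_IN_RELEASE,
}


def outstanding(files):
    """Return {path: state} for every file that still blocks a release.

    'files' maps a path to an IPState, or to None when nothing has been
    recorded for it yet, which is treated as UNEXAMINED.
    """
    blocked = {}
    for path, state in files.items():
        effective = state if state is not None else IPState.UNEXAMINED
        if effective not in CLEARED:
            blocked[path] = effective
    return blocked


def is_clean(files):
    """True once every file is in category 2, 3 or 5."""
    return not outstanding(files)
```

Run before each release, an empty result from outstanding() is the
smoke test; anything it returns is either unexamined or incompatible.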

> It is in that regard that I am concerned the tools for this one-time case 
> need not be the same as for future cases.
>

There are two kinds of future cases:

1) Code contributed in small chunks by committers or patches, where we
can expect CTR to work.  There will be errors, but we can catch those
before we do subsequent releases via RAT.

2) Larger contributions made by SGA.  For example, the IBM Lotus
Symphony contribution, or other similar corporate contributions.  When
an Apache project receives a large code contribution like this, it
needs to do an IP clearance process on that contribution as well.  I
think that the RAT/SVN combination could work well here also.  The
goal would be to clear the IP on the new contribution before we start
copying or merging it into the core AOOo code.


> And, since I am not doing the work in the present case, I am offering this as 
> something to think about, not a position.
>
>  - Dennis
>
