On Mon, Sep 19, 2011 at 4:32 PM, Dennis E. Hamilton <dennis.hamil...@acm.org> wrote:

> I agree that there is no escape from managing down to the individual file.
> It is a question of organization now, where the entire base is involved.
> RAT or something RAT-like.
>
> Later, if the svn:property is to be trusted, the problem is quite different,
> it seems to me. Plus the rules are understood and provenance and IP are
> likely handled as anything needing clearance enters the code base. What is
> done to ensure a previously-vetted code base has not become tainted strikes
> me as a kind of regression/smoke test.

Here is how I see SVN properties and RAT relating.

Any use of a grep-like or RAT-like tool will need to deal with
exceptions. We're going to have stuff like binary files, say ODF files
that are used for testing, that don't have a "header". Or files that
are used only as a build tool, checked in for convenience, but are not
part of the release. Or 3rd party code that does not have a header,
but whose origin we know, like the ICU breakiterator data files. How
do we deal with those types of files, in the context of an automated
audit tool?

One solution is to record all of these exceptions in a big config file
or script -- essentially, a list of files to ignore in the RAT scan.
That approach would certainly work, but it would be fragile. Moving or
renaming the files would break our script. Not the end of the world,
since this could be designed to be "fail safe" and give us errors on
the files that moved.

But if we track this info in SVN, then we could generate the exclusion
list from SVN, so it automatically adjusts as files are moved or
renamed. It also avoids the problem -- and this might just be my own
engineering esthetic -- of tracking metadata for files in two
different places. It seems rather untidy to me.
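To show what I mean by generating the exclusion list from SVN, here is
a rough Python sketch. The property name "aoo:license" and the state
values are hypothetical -- just placeholders for whatever we would
agree on. It assumes the input is the output of
"svn propget -R aoo:license ." (one "path - value" line per file):

```python
# Sketch: derive a RAT exclusion list from SVN properties instead of a
# hand-maintained config file. Property name and values are hypothetical.

# States whose files should be skipped by the RAT scan: binaries,
# build-only tools, known 3rd-party files without headers, etc.
EXCLUDED_STATES = {"binary", "build-tool", "3rd-party-no-header"}

def rat_exclusions(propget_output):
    """Return paths to exclude, parsed from 'svn propget -R' output."""
    exclusions = []
    for line in propget_output.splitlines():
        # Each line is assumed to look like: "main/test/sample.odt - binary"
        path, sep, state = line.rpartition(" - ")
        if sep and state.strip() in EXCLUDED_STATES:
            exclusions.append(path)
    return exclusions

sample = """\
main/test/sample.odt - binary
main/icu/brkiter.dat - 3rd-party-no-header
main/source/core/doc.cxx - apache-2.0"""

print(rat_exclusions(sample))
# -> ['main/test/sample.odt', 'main/icu/brkiter.dat']
```

Since the paths come straight out of SVN on every run, a rename or move
just changes the generated list; nothing breaks.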
From a regression standpoint, you could treat all files as being in
one of several states:

1) Unexamined (no property set)
2) Apache 2.0 (included in the Oracle SGA, or new code contributed by
   a committer or other person under an iCLA)
3) Compatible 3rd party license
4) Incompatible 3rd party license
5) Not part of release

The goal would be to iterate until every file is in category 2, 3 or 5.

> It is in that regard that I am concerned the tools for this one-time case
> need not be the same as for future cases.

There are two kinds of future cases:

1) Code contributed in small chunks by committers or patches, where we
   can expect CTR to work. There will be errors, but we can catch
   those before we do subsequent releases via RAT.

2) Larger contributions made by SGA. For example, the IBM Lotus
   Symphony contribution, or other similar corporate contributions.
   When an Apache project receives a large code contribution like
   this, it needs to do an IP clearance process on that contribution
   as well. I think that the RAT/SVN combination could work well here
   also. The goal would be to clear the IP on the new contributions
   before we start copying or merging them into the core AOOo code.

> And, since I am not doing the work in the present case, I am offering
> this as something to think about, not a position.
>
> - Dennis
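P.S. To make the regression check concrete: if the per-file state
lives in an SVN property, then "every file is in category 2, 3 or 5"
becomes a small scripted check. A rough Python sketch, with
hypothetical state names (again, nothing here is agreed naming):

```python
# Sketch: tally files by license state and list what still blocks a
# release. State names mirror the categories above and are hypothetical.
from collections import Counter

# Categories considered resolved: Apache 2.0, compatible 3rd party,
# or not part of the release.
CLEARED = {"apache-2.0", "compatible-3rd-party", "not-in-release"}

def clearance_report(file_states):
    """file_states maps path -> state; returns (counts, blocking paths)."""
    counts = Counter(file_states.values())
    blocking = sorted(p for p, s in file_states.items() if s not in CLEARED)
    return counts, blocking

states = {
    "main/doc.cxx": "apache-2.0",
    "main/old.c": "unexamined",
    "tools/gen.py": "not-in-release",
}
counts, blocking = clearance_report(states)
print(blocking)  # files still to resolve before release
# -> ['main/old.c']
```

Run after each pass over the tree, it tells us how far from "clear"
we are, and the same check doubles as the smoke test Dennis describes
for catching regressions in an already-vetted base.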