Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 10/27/11 8:28 PM, Pedro Giffuni wrote:
> --- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
>>> In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those components in favor of the system libraries when those are available.
>> Me too, but we should move forward, and we can change it at any time when we have a better solution.
> I am OK with that, but let me attempt to dump what I think:
> 1) You are not bringing in *anything* copyleft; that directory will only be for the non-restrictive stuff that we need: ICU, Boost, etc.
> 2) This will all have to be registered in the NOTICE file, but since this is transitory and not really stuff we use in base, we should start a new section there to separate it from the stuff we do use in the core system.
> 3) We should probably move some of the stuff in soltools there too (mkdepend).
> 4) I know you want ucpp there too, but since that stuff is used in idlc, I think I'd prefer it in idlc/source/preproc/ as it was before. No idea whether we can use the system cpp for the rest, but that would probably make sense.

mmh, I would prefer to put it under ext-sources to make clear that it comes from external sources.

Juergen

> All just IMHO; I am pretty sure whatever you do is better than what we have now :).
> Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
--- On Fri, 10/28/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
> [snip mental dump]
>> 4) I know you want ucpp there too, but since that stuff is used in idlc, I think I'd prefer it in idlc/source/preproc/ as it was before. No idea whether we can use the system cpp for the rest, but that would probably make sense.
> mmh, I would prefer to put it under ext-sources to make clear that it comes from external sources.

That is pretty well covered by SVN and the NOTICE file, but I was only brainstorming. Just have fun :).

Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 9/22/11 1:19 PM, Jürgen Schmidt wrote:
> OK, we have several arguments for and against, but no decision on how we want to move forward. Let us take another look at it:
> 1. We have a working mechanism to get the externals from somewhere, check the MD5 sum, unpack, patch, and build.
> 1.1 "Somewhere" is configurable during the configure step; initially the externals are downloaded from http://hg.services.openoffice.org/binaries
> 2. Having the externals in the repository (SVN) won't be a big issue, because a checkout always downloads only the tip version.
> 2.1 The SCM can be used to track the version of the externals used for a specific OO version - simply check out the version tag and everything is in place...
> 3. In a DSCM it would become a real problem over time, because of the increasing space taken by all versions.
> 4. We need a replacement for http://hg.services.openoffice.org/binaries asap (who knows how long the server will be available).
> 5. Many developers probably work with a local clone of the repository, using for example git-svn or something else - the disadvantage of the increasing space is probably acceptable if a clean local trunk is kept and updated.
>
> Proposed way to move forward:
> 1. Put the externals under .../trunk/ext_sources:
>    .../trunk/ext_sources
>    .../trunk/main
>    .../trunk/extras
> 2. Adapt configure to use this as the default and disable the download (maybe reactivate it later if we move to a DSCM).
> 3. Keep the process of checking the MD5 sum as it is (for potential later use).
>
> Any opinions or suggestions?

I think we still haven't finished on this topic, but it is somewhat important for moving forward with our IP clearance and the whole development work. So if nobody has real objections, I would like to move forward with this proposal, but would also like to change the proposed directory name from ext_sources to 3rdparty. Keep in mind that we use this directory to keep the current state working; with our ongoing work we will remove more and more stuff from there.

The adapted bootstrap mechanism will download the libraries from this new place.

Juergen
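The fetch / verify / unpack mechanism described in point 1 can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual bootstrap code; the URL handling, function names, and directory layout are my assumptions:

```python
import hashlib
import tarfile
import urllib.request
from pathlib import Path

def md5_of(data: bytes) -> str:
    """Hex MD5 digest, as used to verify downloaded external tarballs."""
    return hashlib.md5(data).hexdigest()

def fetch_external(base_url: str, name: str, expected_md5: str,
                   dest: str = "ext_sources") -> Path:
    """Download one external tarball, verify its MD5 sum, and unpack it.

    Mirrors the steps in the proposal: fetch from a configurable
    location, check the checksum, then unpack so patches can be
    applied before the build.
    """
    data = urllib.request.urlopen(f"{base_url}/{name}").read()
    if md5_of(data) != expected_md5:
        raise ValueError(f"MD5 mismatch for {name}")
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    tarball = target / name
    tarball.write_bytes(data)
    with tarfile.open(tarball) as tar:
        tar.extractall(target)  # patching and building happen afterwards
    return tarball
```

Keeping the checksum step even after the sources move into SVN (point 3 of the proposal) means the same verification code works unchanged if downloading is ever reactivated.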
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/10/27 Jürgen Schmidt jogischm...@googlemail.com:
> On 9/22/11 1:19 PM, Jürgen Schmidt wrote:
>> OK, we have several arguments for and against, but no decision on how we want to move forward. Let us take another look at it:
>> 1. We have a working mechanism to get the externals from somewhere, check the MD5 sum, unpack, patch, and build.
>> 1.1 "Somewhere" is configurable during the configure step; initially the externals are downloaded from http://hg.services.openoffice.org/binaries
>> 2. Having the externals in the repository (SVN) won't be a big issue, because a checkout always downloads only the tip version.
>> 2.1 The SCM can be used to track the version of the externals used for a specific OO version - simply check out the version tag and everything is in place...
>> 3. In a DSCM it would become a real problem over time, because of the increasing space taken by all versions.
>> 4. We need a replacement for http://hg.services.openoffice.org/binaries asap (who knows how long the server will be available).
>> 5. Many developers probably work with a local clone of the repository, using for example git-svn or something else - the disadvantage of the increasing space is probably acceptable if a clean local trunk is kept and updated.
>> Proposed way to move forward:
>> 1. Put the externals under .../trunk/ext_sources:
>>    .../trunk/ext_sources
>>    .../trunk/main
>>    .../trunk/extras
>> 2. Adapt configure to use this as the default and disable the download (maybe reactivate it later if we move to a DSCM).
>> 3. Keep the process of checking the MD5 sum as it is (for potential later use).
>> Any opinions or suggestions?
> I think we still haven't finished on this topic, but it is somewhat important for moving forward with our IP clearance and the whole development work. So if nobody has real objections, I would like to move forward with this proposal, but would also like to change the proposed directory name from ext_sources to 3rdparty. Keep in mind that we use this directory to keep the current state working; with our ongoing work we will remove more and more stuff from there.

So keep the current approach - tarballs with MD5 hashnames, etc., just as before, but on Apache servers? That sounds good to me.

> The adapted bootstrap mechanism will download the libraries from this new place.
> Juergen
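The "MD5 hashnames" mentioned here refer to the convention of storing each external tarball under a name prefixed with the hex digest of its own contents, so the checksum can be re-verified from the filename alone. A minimal sketch of building such a name; the Boost filename below is just a hypothetical example:

```python
import hashlib

def hashname(data: bytes, basename: str) -> str:
    """Build a '<md5>-<name>' style filename for an external tarball.

    Storing the digest in the filename lets the bootstrap script
    re-check the payload without a separate checksum file.
    """
    return f"{hashlib.md5(data).hexdigest()}-{basename}"

# Hypothetical example, with an empty payload just to show the shape:
# hashname(b"", "boost_1_39_0.tar.gz")
# -> "d41d8cd98f00b204e9800998ecf8427e-boost_1_39_0.tar.gz"
```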
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
--- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
> ... I think we still haven't finished on this topic, but it is somewhat important for moving forward with our IP clearance and the whole development work. So if nobody has real objections, I would like to move forward with this proposal, but would also like to change the proposed directory name from ext_sources to 3rdparty. Keep in mind that we use this directory to keep the current state working; with our ongoing work we will remove more and more stuff from there.

I was about to bring in support for FreeBSD's fetch command (somewhat like curl) in fetch-tarballs.sh, and it looks like now you are obsoleting it :-P.

In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those components in favor of the system libraries when those are available.

Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 10/27/11 6:13 PM, Pedro Giffuni wrote:
> --- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
>> ... I think we still haven't finished on this topic, but it is somewhat important for moving forward with our IP clearance and the whole development work. So if nobody has real objections, I would like to move forward with this proposal, but would also like to change the proposed directory name from ext_sources to 3rdparty. Keep in mind that we use this directory to keep the current state working; with our ongoing work we will remove more and more stuff from there.
> I was about to bring in support for FreeBSD's fetch command (somewhat like curl) in fetch-tarballs.sh, and it looks like now you are obsoleting it :-P.
> In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those components in favor of the system libraries when those are available.

Me too, but we should move forward, and we can change it at any time when we have a better solution.

Juergen

> Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
--- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
>> In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those components in favor of the system libraries when those are available.
> Me too, but we should move forward, and we can change it at any time when we have a better solution.

I am OK with that, but let me attempt to dump what I think:

1) You are not bringing in *anything* copyleft; that directory will only be for the non-restrictive stuff that we need: ICU, Boost, etc.
2) This will all have to be registered in the NOTICE file, but since this is transitory and not really stuff we use in base, we should start a new section there to separate it from the stuff we do use in the core system.
3) We should probably move some of the stuff in soltools there too (mkdepend).
4) I know you want ucpp there too, but since that stuff is used in idlc, I think I'd prefer it in idlc/source/preproc/ as it was before. No idea whether we can use the system cpp for the rest, but that would probably make sense.

All just IMHO; I am pretty sure whatever you do is better than what we have now :).

Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Thu, Oct 27, 2011 at 2:28 PM, Pedro Giffuni p...@apache.org wrote:
> --- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
>>> In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those components in favor of the system libraries when those are available.
>> Me too, but we should move forward, and we can change it at any time when we have a better solution.
> I am OK with that, but let me attempt to dump what I think:
> 1) You are not bringing in *anything* copyleft; that directory will only be for the non-restrictive stuff that we need: ICU, Boost, etc.

I think it is like the SVN trunk: we initially bring it all in, and then remove the copyleft parts. Of course, if we can remove them beforehand, that is good as well. But whatever order we do the work in, we cannot release until we've done the IP review.

The files are currently hosted here: http://hg.services.openoffice.org/binaries/

Since the build currently depends on that, I think we want to move those files to Apache now, rather than wait too long.

-Rob

> 2) This will all have to be registered in the NOTICE file, but since this is transitory and not really stuff we use in base, we should start a new section there to separate it from the stuff we do use in the core system.
> 3) We should probably move some of the stuff in soltools there too (mkdepend).
> 4) I know you want ucpp there too, but since that stuff is used in idlc, I think I'd prefer it in idlc/source/preproc/ as it was before. No idea whether we can use the system cpp for the rest, but that would probably make sense.
> All just IMHO; I am pretty sure whatever you do is better than what we have now :).
> Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Hi Mathias;

--- On Thu, 10/27/11, Mathias Bauer mathias_ba...@gmx.net wrote:
> ...
>>> In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those
>> I am OK with that, but let me attempt to dump what I think:
>> 1) You are not bringing in *anything* copyleft; that directory will only be for the non-restrictive stuff that we need: ICU, Boost, etc.
> That should be doable. OTOH, I'm wondering whether we should keep the copyleft tarballs at Apache Extras - it would still allow building with them (something that can be done outside the ASF infrastructure and is still appreciated, if I understood correctly).

I don't like that, but we will have to do it as a temporary solution to avoid breaking the build until we replace everything. I think in the long run this is only interesting for Windows binaries, due to the difficulties of getting those packages from different places. On Linux/BSD distributions it makes sense to use the prepackaged Mozilla, etc.

>> 3) We should probably move some of the stuff in soltools there too (mkdepend).
> That's something for later; ATM we should move the ext_src stuff to a secure place.

Yes. Also for later: the simpleICC library is used to generate a color profile required for PDF. I think we should just generate the color profile somewhere outside the main build and use it, avoiding the extra build cycles.

Another thing: we are excluding by default, with extreme prejudice, both LGPL and MPL, but it will be convenient to reevaluate that, since we will have to use the prepackaged hunspell.

> If nobody else wants to do it, I can invest some time into that, but it might take some days.

I won't do it because of principles... I want them to just go away ;-).

FWIW, Rob and I are trying to use an ooo- prefix on Apache Extras. ooo-external-sources?

> It seems that the consensus is that we check in the binary tarballs into trunk/ext_sources?!

I am not sure about that; I think lazy consensus by whoever does it first will win :).

Pedro.
Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)
--- On Fri, 10/14/11, Robert Burrell Donkin wrote:
> ...
>> A branch would save us from having, say... 1000 commits with header changes in the history.
> Apache uses version control as the canonical record. It's therefore essential to know why a header was changed and by whom. And of course the branch would be on SVN, so the history for the legal changes wouldn't be lost.

Of course, I meant this only for the SGA, but ultimately it depends on the people applying it, and from what I understand now, *I* won't be touching any headers :).

thanks for all these explanations,

Pedro.
Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)
On 10/14/2011 8:58 AM, Pedro Giffuni wrote:
> --- On Fri, 10/14/11, Robert Burrell Donkin wrote:
>> ...
>>> A branch would save us from having, say... 1000 commits with header changes in the history.
>> Apache uses version control as the canonical record. It's therefore essential to know why a header was changed and by whom. And of course the branch would be on SVN, so the history for the legal changes wouldn't be lost.
>> Robert
> Of course, I meant this only for the SGA, but ultimately it depends on the people applying it, and from what I understand now, *I* won't be touching any headers :).
> thanks for all these explanations,
> Pedro.

Pedro,

I intend to get started on the headers in the very near future. My intention is to do a series of check-ins by project/directory in the source tree, matching the changes to the grant(s). I have a bit of sequencing of activities before I start, but this is next up on the list.

Andrew

--
Andrew Rist | Interoperability Architect
Oracle Corporate Architecture Group
Redwood Shores, CA | 650.506.9847
ICC generated profiles are copylefted (was Re: A systematic approach to IP review?)
Hi;

When I saw this thread about machine-generated files, I never imagined we would be talking about code in OpenOffice.org, but I found that this file:

icc/source/create_sRGB_profile/create_sRGB_profile.cpp

indeed generates virally-licensed code! I am proposing an obvious patch, but I wanted the issue documented, so I created bug 118512.

enjoy ;)

Pedro.

--- On Thu, 9/29/11, Rob Weir robw...@apache.org wrote:
> On Thu, Sep 29, 2011 at 1:53 AM, Dennis E. Hamilton wrote:
>> Let me recall the bidding a little here. What I said was "It is unlikely that machine-generated files of any kind are copyrightable subject matter." You point out that computer-generated files might incorporate copyrightable subject matter. I hadn't considered a hybrid case where copyrightable subject matter would subsist in such a work, and I have no idea how and to what extent the output qualifies as a work of authorship, but it is certainly a case to be reckoned with.
>> Then there is the issue of macro expansion, template parameter substitution, etc., and the cases become blurrier and blurrier. For example, if I wrote a program and then put it through the C language pre-processor, in how much of the expanded result does the copyright declared on the original subsist? (I am willing to concede, for purposes of argument, that the second is a derivative work of the former, even though the derivation occurred dynamically.) I fancy this example because it is commonplace that the pre-processor incorporates files that have their own copyright and license notices too. Also, the original might include macro calls, with parameters using macros defined in one or more of those incorporated files.
> Under US law: "Copyright protection subsists, in accordance with this title, in original works of authorship fixed in any tangible medium of expression, now known or later developed, from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device."
> IANAL, but I believe Dennis is correct that a machine cannot be an author, in terms of copyright. But the author of that program might. It comes down to who exactly fixed the work in a tangible medium of expression. When I use an ordinary code editor, the machine acts as a tool that I use to create an original work. It is a tool, like a paintbrush.
> In other cases, a tool can be used to transform a work. If there is an original work in fixed form that I transform, then I may have a copyright interest in the transformed work. That is how copyright law protects software binaries as well as source code. As for the GNU Bison example, if I created the BNF, then I have a copyright interest in the generated code. That does not mean that I have exclusive ownership of all the generated code. It might be a mashup of original template code from the Bison authors, along with code that is a transformation of my original grammar definition. It isn't an either/or situation. A work can have mixed authorship.
> -Rob
>> I concede that copyrightable matter can survive into a machine-generated file. And I maintain that there can be other conditions on the use of such a file other than by virtue of it containing portions in which copyright subsists. For example, I don't think the Copyright Office is going to accept registration of compiled binaries any time soon, even though there may be conditions on the license of the source code that carry over onto those binaries. And, yes, it is murky all the way down.
>> - Dennis
>>
>> -----Original Message-----
>> From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org]
>> Sent: Wednesday, September 28, 2011 22:32
>> To: 'ooo-dev@incubator.apache.org'
>> Subject: RE: A systematic approach to IP review?
>>
>> Not to put too fine a point on this, but it sounds like you are talking about boilerplate (and authored) template code that Bison incorporates in its output. It is also tricky because the Bison output is computer source code. That is an interesting case. In the US, "original work of authorship" is pretty specific in the case of literary works, which is where software copyright falls, the last time I checked (too long ago, though). I suspect that a license (in the contractual sense) can deal with more than copyright. And, if Bison spits out copyright notices, they still only apply to that part of the output, if any, that qualifies as copyrightable subject matter.
>> Has the Bison claim ever been tested in court? Has anyone been pursued or challenged for infringement? I'm just curious.
>> - Dennis
>>
>> -----Original Message-----
>> From: Norbert Thiebaud [mailto:nthieb...@gmail.com]
>> Sent: Wednesday, September 28, 2011 22:11
>> To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org
>> Subject: Re: A systematic approach to IP review?
>>
>> On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton
Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)
On Sun, Oct 9, 2011 at 7:42 PM, Pedro Giffuni p...@apache.org wrote:
> Hi;
> Looking at how big, and mostly cosmetic but necessary, a change it will be to bring in all the SGA license changes, and given that it requires manual intervention and is not something that can be done in one huge mega-commit... I think we should create a branch for these changes and merge them in two steps, corresponding to both SGAs. This way, merging CWSs and bugzilla patches can go on without pain, and people can get started on the header changes.

I recommend separating review from (automated) execution. If this is done, a branch shouldn't be necessary...

Robert
Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)
--- On Thu, 10/13/11, Robert Burrell Donkin wrote:
> I recommend separating review from (automated) execution. If this is done, a branch shouldn't be necessary...

Uhm.. can you elaborate a bit more? A branch would save us from having, say... 1000 commits with header changes in the history.

regards,

Pedro.
How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)
Hi;

Looking at how big, and mostly cosmetic but necessary, a change it will be to bring in all the SGA license changes, and given that it requires manual intervention and is not something that can be done in one huge mega-commit... I think we should create a branch for these changes and merge them in two steps, corresponding to both SGAs. This way, merging CWSs and bugzilla patches can go on without pain, and people can get started on the header changes.

cheers,

Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 01.10.2011 00:17, Michael Stahl wrote:
> On 30.09.2011 21:24, Mathias Bauer wrote:
>> On 28.09.2011 17:32, Pedro F. Giffuni wrote:
>> Another advantage of unpacking the tarballs: the patches will become *real* patches that just contain changes to the original source code. Often the patches nowadays contain additional files that we just need to build the stuff in OOo (e.g. dmake makefiles) - those could be checked in as regular files. Currently, keeping them as regular files is awkward, because then they need to be copied to the place the tarballs are unpacked to.
> but this is just because dmake can only build source files in the same directory; imagine a more flexible gbuild external build target where the makefiles are in the source tree while the tarball gets unpacked in the workdir...

Sure, but until we are there... I didn't talk about the dmake makefiles that are used to unpack and patch; I was talking about using dmake for building the external modules that come with their own build system. The makefile.mk in the root directory of the external modules is not part of the patch, but some patches contain makefile.mk files that are necessary to build the stuff, either on all or only on some platforms.

Regards,
Mathias
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 28.09.2011 17:32, Pedro F. Giffuni wrote:
> FWIW, I don't like the patches because I can't really examine the code well; besides, this is something the VCS handles acceptably: commit the original source code and then apply the patches in a different commit. If we start with up-to-date versions, there would not be much trouble.

I'm not against unpacking the tarballs and applying the patches, but we should keep the patches somewhere so that updates can be done with the same effort as today.

Another advantage of unpacking the tarballs: the patches will become *real* patches that just contain changes to the original source code. Often the patches nowadays contain additional files that we just need to build the stuff in OOo (e.g. dmake makefiles) - those could be checked in as regular files. Currently, keeping them as regular files is awkward, because then they need to be copied to the place the tarballs are unpacked to.

Regards,
Mathias
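The distinction between *real* patches and patches that smuggle in whole new build files can even be checked mechanically: in unified diff format, a newly added file carries `/dev/null` as its old side. A hedged sketch of such a classifier; the function name is my own invention, not an existing OOo tool:

```python
def classify_patch(patch_text: str):
    """Split a unified diff into modified vs newly added files.

    A brand-new file introduced by a patch (e.g. a dmake makefile
    added only for the OOo build) shows up with "/dev/null" on the
    "---" (old) side, while a genuine modification of upstream
    source names a real old path.
    """
    modified, added = [], []
    old_path = None
    for line in patch_text.splitlines():
        if line.startswith("--- ") and len(line.split()) > 1:
            old_path = line.split()[1]
        elif line.startswith("+++ ") and old_path is not None:
            new_path = line.split()[1]
            (added if old_path == "/dev/null" else modified).append(new_path)
            old_path = None
    return modified, added
```

Running this over a module's patch set would show at a glance which files could instead be checked in as regular files once the tarballs are unpacked.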
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 30.09.2011 21:24, Mathias Bauer wrote:
> On 28.09.2011 17:32, Pedro F. Giffuni wrote:
> Another advantage of unpacking the tarballs: the patches will become *real* patches that just contain changes to the original source code. Often the patches nowadays contain additional files that we just need to build the stuff in OOo (e.g. dmake makefiles) - those could be checked in as regular files. Currently, keeping them as regular files is awkward, because then they need to be copied to the place the tarballs are unpacked to.

but this is just because dmake can only build source files in the same directory; imagine a more flexible gbuild external build target where the makefiles are in the source tree while the tarball gets unpacked in the workdir...

Regards,
Mathias
Re: A systematic approach to IP review?
On Thu, Sep 29, 2011 at 1:53 AM, Dennis E. Hamilton dennis.hamil...@acm.org wrote:
> Let me recall the bidding a little here. What I said was "It is unlikely that machine-generated files of any kind are copyrightable subject matter." You point out that computer-generated files might incorporate copyrightable subject matter. I hadn't considered a hybrid case where copyrightable subject matter would subsist in such a work, and I have no idea how and to what extent the output qualifies as a work of authorship, but it is certainly a case to be reckoned with.
> Then there is the issue of macro expansion, template parameter substitution, etc., and the cases become blurrier and blurrier. For example, if I wrote a program and then put it through the C language pre-processor, in how much of the expanded result does the copyright declared on the original subsist? (I am willing to concede, for purposes of argument, that the second is a derivative work of the former, even though the derivation occurred dynamically.) I fancy this example because it is commonplace that the pre-processor incorporates files that have their own copyright and license notices too. Also, the original might include macro calls, with parameters using macros defined in one or more of those incorporated files.

Under US law: "Copyright protection subsists, in accordance with this title, in original works of authorship fixed in any tangible medium of expression, now known or later developed, from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device."

IANAL, but I believe Dennis is correct that a machine cannot be an author, in terms of copyright. But the author of that program might. It comes down to who exactly fixed the work in a tangible medium of expression. When I use an ordinary code editor, the machine acts as a tool that I use to create an original work. It is a tool, like a paintbrush.

In other cases, a tool can be used to transform a work. If there is an original work in fixed form that I transform, then I may have a copyright interest in the transformed work. That is how copyright law protects software binaries as well as source code. As for the GNU Bison example, if I created the BNF, then I have a copyright interest in the generated code. That does not mean that I have exclusive ownership of all the generated code. It might be a mashup of original template code from the Bison authors, along with code that is a transformation of my original grammar definition. It isn't an either/or situation. A work can have mixed authorship.

-Rob

> I concede that copyrightable matter can survive into a machine-generated file. And I maintain that there can be other conditions on the use of such a file other than by virtue of it containing portions in which copyright subsists. For example, I don't think the Copyright Office is going to accept registration of compiled binaries any time soon, even though there may be conditions on the license of the source code that carry over onto those binaries. And, yes, it is murky all the way down.
> - Dennis
>
> -----Original Message-----
> From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org]
> Sent: Wednesday, September 28, 2011 22:32
> To: 'ooo-dev@incubator.apache.org'
> Subject: RE: A systematic approach to IP review?
>
> Not to put too fine a point on this, but it sounds like you are talking about boilerplate (and authored) template code that Bison incorporates in its output. It is also tricky because the Bison output is computer source code. That is an interesting case. In the US, "original work of authorship" is pretty specific in the case of literary works, which is where software copyright falls, the last time I checked (too long ago, though). I suspect that a license (in the contractual sense) can deal with more than copyright. And, if Bison spits out copyright notices, they still only apply to that part of the output, if any, that qualifies as copyrightable subject matter.
> Has the Bison claim ever been tested in court? Has anyone been pursued or challenged for infringement? I'm just curious.
> - Dennis
>
> -----Original Message-----
> From: Norbert Thiebaud [mailto:nthieb...@gmail.com]
> Sent: Wednesday, September 28, 2011 22:11
> To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org
> Subject: Re: A systematic approach to IP review?
>
> On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: I'll stand by my original statement. I'm not going to get into the Pixar case since it doesn't apply here.
> I did not say it applied to the Visual Studio generated cruft... I merely commented on the blanket assertion that 'computer generated = no copyright'. The Bison manual may have license conditions on what can be done with the generated artifact, but I suggest that is not about copyrightable subject matter
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 20.09.2011 16:36, Pavel Janík wrote:
>> Have we ever considered using version control to... uh... manage file versions? Just an idea.
> Maybe Heiner will say more, but in the past we had the external tarballs in the VCS; then we moved them out, and it worked very well. There never was a reason to track external .tar.gz files in VCS, because we do not change them.

What might be the best way to handle 3rd-party code in AOOo will probably depend on the needs of the developers as well as on legal requirements. We had these tarballs plus patches, IIRC, because Sun Legal required that all 3rd-party stuff used should be preserved in our repos in its original form.

As a developer, I always preferred to have 3rd-party code treated in the *build* like the internal source code. So if there wasn't a requirement to have unpatched sources in the repository, the most natural way to keep 3rd-party stuff would be to have a third sub-repo, 3rdparty, next to main and extras, with the 3rd-party stuff checked in - not the tarballs, just the unpacked content. I wouldn't give up the patches, as they allow updates to be handled better.

This would cause a problem, as direct changes to the 3rd-party stuff without additional authorization must be prevented (means: changing the source code must not happen accidentally, only when the 3rd-party code gets an update from upstream), while patch files must still be allowed to be added, removed, or changed - just not the original source code. If that wasn't possible, or was too cumbersome, checking the tarballs in to 3rdparty would be better. As SVN users never download the complete history as DSCM users do, the pain of binary files in the repo isn't that bad. In case AOOo moves to a DSCM again later, the tarballs could be moved out again easily.

Regards,
Mathias
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
FWIW, I don't like the patches because I can't really examine the code well; besides, this is something the VCS handles acceptably: commit the original source code and then apply the patches in a different commit. If we start with up-to-date versions there would not be much trouble. just my $0.02, not an objection. Pedro. --- On Wed, 9/28/11, Jürgen Schmidt jogischm...@googlemail.com wrote: ... I wouldn't give up the patches, as they allow updates to be handled better. This would cause a problem, as direct changes to the 3rd party stuff without additional authorization must be prevented (meaning: the source code must not change accidentally, only when the 3rd party code gets an update from upstream), while patch files must still be allowed to be added, removed, or changed, just not the original source code. If that wasn't possible or was too cumbersome, checking in the tarballs in 3rdparty would be better. i also wouldn't give up the patches and for that reason i would like to move forward for now with keeping the tarballs as proposed. But i like the name 3rdparty for the directory and we can later change it from the tarballs to the unpacked code if we see demand for it. At the moment it's just easier to keep the tarballs and focus on other work. As svn users never download the complete history as DSCM users do, the pain of binary files in the repo isn't that hard. In case AOOo moved to a DSCM again later, the tarballs could be moved out again easily. agree, we don't really lose anything, can change if necessary and can continue with our work Juergen
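The workflow Pedro sketches (commit the pristine upstream source first, then apply the project's patches as a separate commit, so the diff between the two commits *is* the patch set) could look roughly like the following; all names (3rdparty/libfoo, foo.patch, the commit messages) are hypothetical, and the svn calls are left as comments so the sketch stands alone:

```shell
# Sketch of "pristine import first, patches second". Paths, file contents,
# and commit messages are illustrative placeholders only.
set -e
mkdir -p 3rdparty/libfoo
printf 'int x = 1;\n' > 3rdparty/libfoo/foo.c   # stand-in for unpacked upstream source
# svn add 3rdparty/libfoo
# svn commit -m "import libfoo 1.0, pristine upstream"

# Local modifications go in as a separate commit, so the VCS history keeps
# upstream and local changes cleanly apart.
cat > foo.patch <<'EOF'
--- foo.c
+++ foo.c
@@ -1 +1 @@
-int x = 1;
+int x = 2;
EOF
patch 3rdparty/libfoo/foo.c foo.patch
# svn commit -m "apply local patches to libfoo"
cat 3rdparty/libfoo/foo.c
```

With this layout, "examining the code" is just an ordinary VCS diff between the import commit and the patch commit, which is exactly the property Pedro is after.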
RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
The problem with bringing the 3rd party software completely into the SVN tree and modifying it in the tree has to do with the license the updated software is under. In that case, there *is* a code provenance issue and I believe it crosses a line that the Apache Software Foundation is unwilling to cross with regard to the integrity of its code bases. The current patches to Boost, for example, do not change the license on the code and preserve the Boost license. But since this is ephemeral and the source is never in the SVN tree (is that correct?) the derivative use disappears at the end of a build. It is sufficient then to include the dependency in the NOTICE for the release and not worry further. Also, the current dependency is several releases behind the current Boost release. This might not matter - the specific Boost libraries that are used might not be affected. But there is a release synchronization issue. A fork would have to be maintained. Also, the dependencies are managed better now, rather than having the entire Boost library installed for cherry picking. (This will all change at some point, since Boost is being incorporated into ISO C++. It is probably best to wait for that to ripple out into the compiler distributions.) - Dennis -Original Message- From: Pedro F. Giffuni [mailto:giffu...@tutopia.com] Sent: Wednesday, September 28, 2011 08:32 To: ooo-dev@incubator.apache.org Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?] FWIW, I don't like the patches because I can't really examine the code well; besides, this is something the VCS handles acceptably: commit the original source code and then apply the patches in a different commit. If we start with up-to-date versions there would not be much trouble. just my $0.02, not an objection. Pedro. --- On Wed, 9/28/11, Jürgen Schmidt jogischm...@googlemail.com wrote: ... I wouldn't give up the patches, as they allow updates to be handled better. 
This would cause a problem, as direct changes to the 3rd party stuff without additional authorization must be prevented (meaning: the source code must not change accidentally, only when the 3rd party code gets an update from upstream), while patch files must still be allowed to be added, removed, or changed, just not the original source code. If that wasn't possible or was too cumbersome, checking in the tarballs in 3rdparty would be better. i also wouldn't give up the patches and for that reason i would like to move forward for now with keeping the tarballs as proposed. But i like the name 3rdparty for the directory and we can later change it from the tarballs to the unpacked code if we see demand for it. At the moment it's just easier to keep the tarballs and focus on other work. As svn users never download the complete history as DSCM users do, the pain of binary files in the repo isn't that hard. In case AOOo moved to a DSCM again later, the tarballs could be moved out again easily. agree, we don't really lose anything, can change if necessary and can continue with our work Juergen
Re: A systematic approach to IP review?
On 19.09.2011 02:27, Rob Weir wrote: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. If you want svn to be the place for the IP review, we have to do it in two steps. There are some cws for post-3.4 that bring in new files. Setting up a branch now to bring them to svn will create additional work now; IMHO this should better be done later. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. see above e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. I assume that you are talking about header files with an MS copyright, not header files generated from e.g. Visual Studio. In my understanding these files should be considered as contributed under the rules of the OOo project, and so now their copyright owner is Oracle. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. IMHO this is the best solution. svn is the place of truth if it comes down to files. The second best solution would be to have one text file per build unit (that would be a gbuild makefile in the new build system) or per module (that would be a sub folder of the sub-repos). The file should be checked into svn. Everything else (spreadsheets or whatsoever) could be generated from that, in case anyone had a need for a spreadsheet with 6 rows containing license information. ;-) Regards, Mathias
RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
The idea (not originally mine) is to keep only compatibly licensed code under an isolated (3rdparty) directory. I think in the long run we should try to use the system versions of such software when available, and every linux/bsd distribution is probably doing that for LO already. Pedro. --- On Wed, 9/28/11, Dennis E. Hamilton dennis.hamil...@acm.org wrote: The problem with bringing the 3rd party software completely into the SVN tree and modifying it in the tree has to do with the license the updated software is under. In that case, there *is* a code provenance issue and I believe it crosses a line that the Apache Software Foundation is unwilling to cross with regard to the integrity of its code bases. The current patches to Boost, for example, do not change the license on the code and preserve the Boost license. But since this is ephemeral and the source is never in the SVN tree (is that correct?) the derivative use disappears at the end of a build. It is sufficient then to include the dependency in the NOTICE for the release and not worry further. Also, the current dependency is several releases behind the current Boost release. This might not matter - the specific Boost libraries that are used might not be affected. But there is a release synchronization issue. A fork would have to be maintained. Also, the dependencies are managed better now, rather than having the entire Boost library installed for cherry picking. (This will all change at some point, since Boost is being incorporated into ISO C++. It is probably best to wait for that to ripple out into the compiler distributions.) - Dennis -Original Message- From: Pedro F. Giffuni [mailto:giffu...@tutopia.com] Sent: Wednesday, September 28, 2011 08:32 To: ooo-dev@incubator.apache.org Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?] 
FWIW, I don't like the patches because I can't really examine the code well; besides, this is something the VCS handles acceptably: commit the original source code and then apply the patches in a different commit. If we start with up-to-date versions there would not be much trouble. just my $0.02, not an objection. Pedro. --- On Wed, 9/28/11, Jürgen Schmidt jogischm...@googlemail.com wrote: ... I wouldn't give up the patches, as they allow updates to be handled better. This would cause a problem, as direct changes to the 3rd party stuff without additional authorization must be prevented (meaning: the source code must not change accidentally, only when the 3rd party code gets an update from upstream), while patch files must still be allowed to be added, removed, or changed, just not the original source code. If that wasn't possible or was too cumbersome, checking in the tarballs in 3rdparty would be better. i also wouldn't give up the patches and for that reason i would like to move forward for now with keeping the tarballs as proposed. But i like the name 3rdparty for the directory and we can later change it from the tarballs to the unpacked code if we see demand for it. At the moment it's just easier to keep the tarballs and focus on other work. As svn users never download the complete history as DSCM users do, the pain of binary files in the repo isn't that hard. In case AOOo moved to a DSCM again later, the tarballs could be moved out again easily. agree, we don't really lose anything, can change if necessary and can continue with our work Juergen
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 28.09.2011 17:32, Pedro F. Giffuni wrote: FWIW; I don't like the patches because I can't really examine well the code, besides this is something the VCS handles acceptably: commit the original sourcecode and then apply the patches in a different commit. If we start with up to date versions there would not be much trouble. if we didn't have many thousands of lines of patches to rebase, then upgrading to less outdated versions wouldn't be such a PITA. sadly in many cases upstreaming patches was never sufficiently high on the priority list to actually get done... -- Dealing with failure is easy: Work hard to improve. Success is also easy to handle: You've solved the wrong problem. Work hard to improve. -- Alan Perlis
RE: A systematic approach to IP review?
It is unlikely that machine-generated files of any kind are copyrightable subject matter. I would think that files generated by Visual Studio should just be regenerated, especially if this has to do with preprocessor pre-compilation, project boiler-plate (and even build/make) files, MIDL-compiled files, resource-compiler output, and the like. (I assume there are no MFC dependencies unless MFC has somehow shown up under VC++ 2008 Express Edition or the corresponding SDK -- I am behind the times. I thought the big issue was ATL.) Meanwhile, I favor what you say about having a file at the folder level of the buildable components. It strikes me as a visible way to ensure that the IP review has been completed and is current. It also has great transparency and accountability since the document is in the SVN itself. It also survives being extracted from the SVN, included in a tar-ball, etc. In short: nice! - Dennis -Original Message- From: Mathias Bauer [mailto:mathias_ba...@gmx.net] Sent: Wednesday, September 28, 2011 04:25 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? On 19.09.2011 02:27, Rob Weir wrote: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. If you want svn to be the place for the IP review, we have to do it in two steps. There are some cws for post-3.4 that bring in new files. Setting up a branch now to bring them to svn will create additional work now that IMHO should better be done later. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. see above e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. 
I assume that you are talking about header files with an MS copyright, not header files generated from e.g. Visual Studio. In my understanding these files should be considered as contributed under the rules of the OOo project, and so now their copyright owner is Oracle. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. IMHO this is the best solution. svn is the place of truth if it comes down to files. The second best solution would be to have one text file per build unit (that would be a gbuild makefile in the new build system) or per module (that would be a sub folder of the sub-repos). The file should be checked into svn. Everything else (spreadsheets or whatsoever) could be generated from that, in case anyone had a need for a spreadsheet with 6 rows containing license information. ;-) Regards, Mathias
Re: A systematic approach to IP review?
On Wed, Sep 28, 2011 at 6:42 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: It is unlikely that machine-generated files of any kind are copyrightable subject matter. I would think that files generated by Visual Studio should just be regenerated, especially if this has to do with preprocessor pre-compilation, project boiler-plate (and even build/make) files, MIDL-compiled files, resource-compiler output, and the like. That is my understanding as well, wrt computer-generated files. However the lack of copyright does not mean lack of concern. For example, some code generation applications have a license that puts additional restrictions on the generated code. Some versions of GNU Bison, the YACC variant, did that. (I assume there are no MFC dependencies unless MFC has somehow shown up under VC++ 2008 Express Edition or the corresponding SDK -- I am behind the times. I thought the big issue was ATL.) Meanwhile, I favor what you say about having a file at the folder level of the buildable components. It strikes me as a visible way to ensure that the IP review has been completed and is current. It also has great transparency and accountability since the document is in the SVN itself. It also survives being extracted from the SVN, included in a tar-ball, etc. In short: nice! - Dennis -Original Message- From: Mathias Bauer [mailto:mathias_ba...@gmx.net] Sent: Wednesday, September 28, 2011 04:25 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? On 19.09.2011 02:27, Rob Weir wrote: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. If you want svn to be the place for the IP review, we have to do it in two steps. There are some cws for post-3.4 that bring in new files. 
Setting up a branch now to bring them to svn will create additional work now; IMHO this should better be done later. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. see above e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. I assume that you are talking about header files with an MS copyright, not header files generated from e.g. Visual Studio. In my understanding these files should be considered as contributed under the rules of the OOo project, and so now their copyright owner is Oracle. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. IMHO this is the best solution. svn is the place of truth if it comes down to files. The second best solution would be to have one text file per build unit (that would be a gbuild makefile in the new build system) or per module (that would be a sub folder of the sub-repos). The file should be checked into svn. Everything else (spreadsheets or whatsoever) could be generated from that, in case anyone had a need for a spreadsheet with 6 rows containing license information. ;-) Regards, Mathias
Re: A systematic approach to IP review?
On Wed, Sep 28, 2011 at 5:42 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: It is unlikely that machine-generated files of any kind are copyrightable subject matter. I'd imagine that Pixar, for instance, would have a problem with that blanket statement... The very existence of this paragraph in the Bison manual: http://www.gnu.org/s/bison/manual/bison.html#Conditions also raises doubt as to the validity of the premise. Norbert
RE: A systematic approach to IP review?
I'll stand by my original statement. I'm not going to get into the Pixar case since it doesn't apply here. The Bison manual may have license conditions on what can be done with the generated artifact, but I suggest that is not about copyrightable subject matter in the artifact. A similar condition would be one in, let's say for a hypothetical case, Visual C++ 2008 Express Edition requiring that generated code be run on Windows. It's not about copyright. And I agree, one must understand license conditions that apply to the tool used to make the generated artifacts. I did neglect to consider that. - Dennis -Original Message- From: Norbert Thiebaud [mailto:nthieb...@gmail.com] Sent: Wednesday, September 28, 2011 16:41 To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org Subject: Re: A systematic approach to IP review? On Wed, Sep 28, 2011 at 5:42 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: It is unlikely that machine-generated files of any kind are copyrightable subject matter. I'd imagine that Pixar, for instance, would have a problem with that blanket statement... The very existence of this paragraph in the Bison manual: http://www.gnu.org/s/bison/manual/bison.html#Conditions also raises doubt as to the validity of the premise. Norbert
Re: A systematic approach to IP review?
--- On Wed, 9/28/11, Norbert Thiebaud wrote: ... On Wed, Sep 28, 2011 at 5:42 PM, Dennis E. Hamilton wrote: It is unlikely that machine-generated files of any kind are copyrightable subject matter. I'd imagine that Pixar, for instance, would have a problem with that blanket statement... The very existence of this paragraph in the Bison manual: http://www.gnu.org/s/bison/manual/bison.html#Conditions also raises doubt as to the validity of the premise. Ugh... I am not a lawyer and I normally prefer not to have to read all that, but OOo requires bison to build, so if that paragraph still applies we should be using yacc instead. Pedro.
Re: A systematic approach to IP review?
On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: I'll stand by my original statement. I'm not going to get into the Pixar case since it doesn't apply here. I did not say it applied to the Visual Studio generated cruft... I merely commented on the blanket assertion that 'computer generated = no copyright'. The Bison manual may have license conditions on what can be done with the generated artifact, but I suggest that is not about copyrightable subject matter in the artifact. Actually it is. The only claim they could legally have _is_ on the generated bits that are substantial pieces of code copied from templates they provide, namely, in the case of a Bison-generated parser, the whole parser skeleton needed to exploit the generated state graph. The whole paragraph is about the copyright disposition of these bits, and in the case of Bison they explicitly grant you a license to use these bits in the 'normal' use case... My point being that the existence of that paragraph also disproves the assertion that 'computer generated = no copyright'. You could write a program that prints itself... the mere fact that it prints itself does not mean you lose the copyright on your program... That being said, I do think you are in the clear with the Visual Studio generated cruft... but not merely because there is 'computer generation' involved. Norbert
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Thu, Sep 22, 2011 at 12:40 AM, Jens-Heiner Rechtien jhrecht...@web.de wrote: On 09/20/2011 05:26 PM, Rob Weir wrote: 2011/9/20 Pavel Janík pa...@janik.cz: Have we ever considered using version control to...uh...manage file versions? Just an idea. Maybe Heiner will say more, but in the past, we have had the external tarballs in the VCS, but then we moved them out and it worked very well. There never was a reason to track external.tar.gz files in VCS, because we do not change them. -- That's fine. If they don't change, then doing an svn update will not bring them down each time. Aside from version control, SVN is also very useful as an audit trail. So in the rare occasions when one of these files does change, we know who changed it and why. This is important for ensuring the IP cleanliness of the project. Is your main concern performance? Even as individual tarballs, ext-sources is 86 files, 250MB. ooo/extras is 243 files and 822 MB. And ooo/main is 76,295 files for over 900MB. So ext-sources is not a huge contributor to download time. Placing all the external tarballs in the VCS is a real killer if using a distributed SCM like git or Mercurial; that's why we had moved them out. As Pavel said, it worked quite nicely. As for the audit possibility, we referenced the external tarballs in the source tree by file name and an md5 checksum, which works just as reliably as putting them directly into the repository. Nowadays the DSCMs have some alternative methods which deal with such blobs, but in essence they also keep them separate. If AOOo ever plans to go back to a DSCM I would keep the source tree and the external blobs strictly separated. All in all, the general SCM tooling community opinion trend seems to be that a S(ource)CM system is for, well, source, and external dependencies are better handled with other mechanisms, like Maven or so. With SVN all this is less of a concern, naturally. 
ok, we have several arguments for and against but no decision on how we want to move forward. Let us take another look at it:

1. we have a working mechanism to get the externals from somewhere, check the md5 sum, unpack, patch, build
1.1 "somewhere" is configurable during the configure step; initially the externals are downloaded from http://hg.services.openoffice.org/binaries
2. having the externals in the repository (SVN) won't be a big issue because on checkout only the tip version is downloaded
2.1 the SCM can be used to track the used version of the externals for a specific OO version - simply check out the version tag and everything is in place ...
3. in a DSCM it would be a real problem over time because of the increasing space taken by all versions
4. we need a replacement for http://hg.services.openoffice.org/binaries asap (who knows how long the server will be available)
5. many developers probably work with a local clone of the repository, using for example git svn or something else - a disadvantage of increasing space, but probably acceptable if a clean local trunk is kept and updated

Proposed way to move forward:

1. put the externals under .../trunk/ext_sources:
   .../trunk/ext_sources
   .../trunk/main
   .../trunk/extras
2. adapt configure to use this as the default and disable the download (maybe reactivate it later if we move to a DSCM)
3. keep the process of checking the md5 sum as it is (for potential later use)

Any opinions or suggestions? Juergen
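Point 3 of the proposal, verifying the md5 sum before unpacking, amounts to something like the following sketch; the tarball name is a placeholder and the expected checksum is computed on the spot here, whereas in the real build it is recorded alongside the tarball reference:

```shell
# Sketch of the md5-before-unpack step. TARBALL and the way EXPECTED_MD5
# is obtained are illustrative assumptions, not the project's actual config.
TARBALL=libfoo-1.0.tar.gz
printf 'stand-in tarball contents\n' > "$TARBALL"
EXPECTED_MD5=$(md5sum "$TARBALL" | cut -d' ' -f1)   # normally read from the build configuration

ACTUAL_MD5=$(md5sum "$TARBALL" | cut -d' ' -f1)
if [ "$ACTUAL_MD5" = "$EXPECTED_MD5" ]; then
    echo "md5 OK, unpacking $TARBALL"
    # tar xzf "$TARBALL"  # then apply patches and build
else
    echo "md5 mismatch for $TARBALL" >&2
    exit 1
fi
```

Keeping this check even after the tarballs move into SVN costs nothing and guards against a corrupted checkout or download, which is why point 3 retains it "for potential later use".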
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Proposed way to move forward 1. put the externals under .../trunk/ext_sources .../trunk/ext_sources .../trunk/main .../trunk/extras 2. adapt configure to use this as default, disable the download (maybe reactivate it later if we move to a DSCM) 3. keep the process with checking the md5 sum as it is (for potential later use) Any opinions or suggestions? +1. And one more question: if we put something into SVN under .../trunk/ext_sources, do we have some URL that can replace http://hg so users don't have to check out everything? I.e., do we have a URL where we have a real checkout of the SVN? Some SVN web interface? Don't know Apache infra well yet... That would be a real killer solution! -- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 22.09.2011 13:19, Jürgen Schmidt wrote: On Thu, Sep 22, 2011 at 12:40 AM, Jens-Heiner Rechtien jhrecht...@web.de wrote: On 09/20/2011 05:26 PM, Rob Weir wrote: ... Placing all the external tarballs in the VCS is a real killer if using a distributed SCM like git or Mercurial; that's why we had moved them out. As Pavel said, it worked quite nicely. As for the audit possibility, we referenced the external tarballs in the source tree by file name and an md5 checksum, which works just as reliably as putting them directly into the repository. Nowadays the DSCMs have some alternative methods which deal with such blobs, but in essence they also keep them separate. If AOOo ever plans to go back to a DSCM I would keep the source tree and the external blobs strictly separated. All in all, the general SCM tooling community opinion trend seems to be that a S(ource)CM system is for, well, source, and external dependencies are better handled with other mechanisms, like Maven or so. With SVN all this is less of a concern, naturally. ok, we have several arguments for and against but no decision on how we want to move forward. Let us take another look at it 1. we have a working mechanism to get the externals from somewhere, check the md5 sum, unpack, patch, build 1.1 somewhere is configurable during the configure step, initially the externals are downloaded from http://hg.services.openoffice.org/binaries 2. having the externals in the repository (SVN) won't be a big issue because in case of a checkout always the tip version is downloaded 2.1 the SCM can be used to track the used version of the externals for a specific OO version - simply checkout the version tag and everything is in place ... 3. in a DSCM it would be a real problem over time because of the increasing space of all versions 4. we need a replacement for http://hg.services.openoffice.org/binaries asap (who knows how long the server will be available) 5. 
many developers probably work with a local clone of the repository using for example git svn or something else - disadvantage of the increasing space but probably acceptable if a clean local trunk will be kept and updated Proposed way to move forward 1. put the externals under .../trunk/ext_sources .../trunk/ext_sources .../trunk/main .../trunk/extras 2. adapt configure to use this as default, disable the download (maybe reactivate it later if we move to a DSCM) 3. keep the process with checking the md5 sum as it is (for potential later use) Any opinions or suggestions? +1 Best current solution: Added to SVN where it does not really matter, and a way to get back when we may change to a DSCM in the future. Juergen sincerely, Armin -- ALG
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/9/22 Pavel Janík pa...@janik.cz Proposed way to move forward 1. put the externals under .../trunk/ext_sources .../trunk/ext_sources .../trunk/main .../trunk/extras 2. adapt configure to use this as default, disable the download (maybe reactivate it later if we move to a DSCM) 3. keep the process with checking the md5 sum as it is (for potential later use) Any opinions or suggestions? +1. And one more question: If we put something into SVN into .../trunk/ext_sources, do we have some URL that can replace http://hg so users don't have to check out everything? Ie. do we have a URL where we have real checkout of the SVN? Some SVN web interface? Don't know Apache infra well yet... That would be real killer solution! don't know if it is what you are looking for, but wget http://svn.apache.org/viewvc/incubator/ooo/trunk/main/<filename>?view=co should download the head version. Juergen -- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
don't know if it is what you are looking for, but wget http://svn.apache.org/viewvc/incubator/ooo/trunk/main/<filename>?view=co should download the head version. Then we should be able to have both things solved - files in SVN and, with a relatively small change in the download script, also the remote fetching of the files if we do not have an ext_sources local checkout. -- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/9/22 Pavel Janík pa...@janik.cz: Proposed way to move forward 1. put the externals under .../trunk/ext_sources .../trunk/ext_sources .../trunk/main .../trunk/extras 2. adapt configure to use this as default, disable the download (maybe reactivate it later if we move to a DSCM) 3. keep the process with checking the md5 sum as it is (for potential later use) Any opinions or suggestions? +1. And one more question: If we put something into SVN into .../trunk/ext_sources, do we have some URL that can replace http://hg so users don't have to check out everything? Ie. do we have a URL where we have real checkout of the SVN? Some SVN web interface? Don't know Apache infra well yet... That would be real killer solution! -- I was thinking something similar. We only need to use the SVN interface to the files when we're adding or updating. But we can have bootstrap continue to download via http. The location, using Juergen's proposed location, would be http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources This would save having a duplicate local SVN working copy of the file, right? -Rob
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir robw...@apache.org wrote: I was thinking something similar. We only need to use the SVN interface to the files when we're adding or updating. But we can have bootstrap continue to download via http. The location, using Juergen's proposed location, would be http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources yes, this is the correct URL, the URL that i have posted wouldn't work Juergen This would save having a duplicate local SVN working copy of the file, right? -Rob
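The download-script change Pavel and Rob describe (prefer a local ext_sources checkout, fall back to fetching over http from the SVN server) might look like this sketch. The base URL is the one Rob quoted; the fetch_external helper, the example tarball name, and the ext_sources layout are assumptions, and the actual wget call is left as a comment so the sketch runs offline:

```shell
# Sketch: use the local checkout if present, otherwise fall back to http.
# fetch_external and the tarball name below are invented for illustration.
EXT_SOURCES_URL="http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources"

fetch_external() {
    name=$1
    if [ -f "ext_sources/$name" ]; then
        echo "using local copy: ext_sources/$name"
    else
        echo "would fetch: $EXT_SOURCES_URL/$name"
        # wget -P ext_sources "$EXT_SOURCES_URL/$name"
    fi
}

fetch_external sample-ext.tar.gz
```

This keeps the existing bootstrap flow intact for developers who skip the ext_sources checkout, while developers with a full trunk checkout never touch the network.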
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/9/22 Jürgen Schmidt jogischm...@googlemail.com On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir robw...@apache.org wrote: I was thinking something similar. We only need to use the SVN interface to the files when we're adding or updating. But we can have bootstrap continue to download via http. The location, using Juergen's proposed location, would be http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources yes, this is the correct URL, the URL that i have posted wouldn't work Juergen This would save having a duplicate local SVN working copy of the file, right? mmh, no or i understand something wrong. People checkout .../trunk and would get ext_sources, main and extras. To benefit from the modified script we have to put ext_sources besides trunk .../ooo/ext_sources .../ooo/trunk/main .../ooo/trunk/extras Means back to my initial proposal, right? Juergen
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/9/22 Jürgen Schmidt jogischm...@googlemail.com: 2011/9/22 Jürgen Schmidt jogischm...@googlemail.com On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir robw...@apache.org wrote: I was thinking something similar. We only need to use the SVN interface to the files when we're adding or updating. But we can have bootstrap continue to download via http. The location, using Juergen's proposed location, would be http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources

yes, this is the correct URL, the URL that I have posted wouldn't work Juergen

This would save having a duplicate local SVN working copy of the file, right?

mmh, no, or I understand something wrong. People check out .../trunk and would get ext_sources, main and extras. To benefit from the modified script we have to put ext_sources beside trunk:
.../ooo/ext_sources
.../ooo/trunk/main
.../ooo/trunk/extras
Means back to my initial proposal, right?

I think the idea is this: Everything under ooo represents what goes into a release. It can be tagged and branched. trunk/ is a peer to a tags/ and branches/ directory. It is possible that we have this wrong. Adding in site/ and ooo-site/ brings in a different convention. They are set up to have trunk/tags/branches underneath them. That is fine, because the website does not release in sync with an OOo release. It makes sense for them to be able to tag and branch independently. We should also consider how the project grows going forward. We know that other code bases will be checked in, like Symphony. And there are other small but disjoint contributions that I'm working on as well. So it might make sense to move trunk down one level:
/ooo/ooo-src/trunk/main
/ooo/ooo-src/trunk/extras
/ooo/ooo-src/trunk/ext-sources
/ooo/ooo-src/tags
/ooo/ooo-src/branches
That would make more sense as a unit, since we would want to tag across all of /ooo/ooo-src/ to define a release. I assume a developer still just checks out ooo/ooo-src/trunk/main. If they need the additional extras then they check that out separately. I don't think most users will want to check out the entire trunk all the time. We should consider also how we want this tree to grow over time, as other related

In the end, I think we want to preserve the ability to:
1) Preserve an audit trail of all changes that went into a release
2) Be able to tag and branch a release and everything that is in the release
3) Restore the exact state of a previous tagged release, including the exact ext-sources used in that release

I'm certain that my proposal will enable this. There may be other approaches that do as well. Another thing to keep in mind is the SVN support for externals: http://svnbook.red-bean.com/en/1.0/ch07s03.html This might make some things easier.

-Rob

Juergen
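For reference, the mechanism in the linked SVN book chapter works via an svn:externals property set on a directory. A hypothetical entry (the revision number and paths are illustrative only, not a decided layout) that pins ext-sources so a tagged release can recreate the exact externals it was built with:

```
ext-sources -r1170000 http://svn.apache.org/repos/asf/incubator/ooo/ext_sources
```

Checking out the directory carrying that property would then also fetch ext-sources at the pinned revision, which addresses the reproducibility point in 3) above.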
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
hi,

Based on this result, another trunk will look like the following if IBM Symphony is checked in:
/ooo/symphony-src/trunk/main
/ooo/symphony-src/trunk/extras
/ooo/symphony-src/tags
/ooo/symphony-src/branches

Thus it introduces a problem: how do we merge the two trunks of symphony-src and ooo-src?

thanks

mail: zhaos...@cn.ibm.com
Address: 2/F, Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No.8, Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193, P.R.China

Rob Weir robw...@apache.org, 2011-09-22 21:18, to ooo-dev@incubator.apache.org, Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Thu, Sep 22, 2011 at 3:18 PM, Rob Weir robw...@apache.org wrote: It is possible that we have this wrong. Adding in site/ and ooo-site/ brings in a different convention. They are set up to have trunk/tags/branches underneath them. That is fine, because the website does not release in sync with an OOo release. It makes sense for them to be able to tag and branch independently.

agree

We should also consider how the project grows going forward. We know that other code bases will be checked in, like Symphony. And there are other small but disjoint contributions that I'm working on as well. So it might make sense to move trunk down one level:
/ooo/ooo-src/trunk/main
/ooo/ooo-src/trunk/extras
/ooo/ooo-src/trunk/ext-sources
/ooo/ooo-src/tags
/ooo/ooo-src/branches
That would make more sense as a unit, since we would want to tag across all of /ooo/ooo-src/ to define a release.

agree, from this perspective it makes sense. The question then is when do we want to introduce this further level?

I assume a developer still just checks out ooo/ooo-src/trunk/main. If they need the additional extras then they check that out separately. I don't think most users will want to check out the entire trunk all the time. We should consider also how we want this tree to grow over time, as other related

I assumed that a developer will check out trunk, maybe a wrong assumption.

In the end, I think we want to preserve the ability to: 1) Preserve an audit trail of all changes that went into a release 2) Be able to tag and branch a release and everything that is in the release 3) Restore the exact state of a previous tagged release, including the exact ext-sources used in that release. I'm certain that my proposal will enable this. There may be other approaches that do as well.

I think so too. And with my changed mindset to not always check out trunk completely, I am fine with this approach.

Another thing to keep in mind is the SVN support for externals: http://svnbook.red-bean.com/en/1.0/ch07s03.html

interesting, I didn't know that before

Juergen
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Thu, Sep 22, 2011 at 9:40 AM, Shao Zhi Zhao zhaos...@cn.ibm.com wrote: hi, Based on this result, another trunk will look like the following if IBM Symphony is checked in:
/ooo/symphony-src/trunk/main
/ooo/symphony-src/trunk/extras
/ooo/symphony-src/tags
/ooo/symphony-src/branches
Thus it introduces a problem: how do we merge the two trunks of symphony-src and ooo-src?

I don't think moving the tree down one level introduces any new problems for Symphony, so long as the directories within */main remain the same. Of course, merging code from Symphony into AOOo will be difficult in general. The problem is how do we establish a common ancestor revision to do a 3-way merge with? This will really depend on whether Symphony has a good record of what the corresponding OOo revision was for each of its initial files. If not, then you can do a text diff and do some merging without trouble. But dealing with renamed files, or moved files, or deleted files, these are trickier to process automatically. If you don't have that history, then in theory it could be reestablished by taking the initial revision of each file in Symphony and comparing it to each revision of the same file in OOo Mercurial, and finding which revision matches. It might be possible to establish enough context for a 3-way merge that way.

-Rob
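The fallback Rob sketches — comparing a Symphony file against each historical revision of the same OOo file and picking the closest match as the likely common ancestor — could be roughed out with a plain similarity ratio. This is an illustration only: the revision texts here are toy strings, and a real tool would pull them from the Mercurial history.

```python
import difflib

def best_matching_revision(target_text, revisions):
    """revisions: mapping of revision id -> file text at that revision.
    Return the revision whose text is most similar to target_text."""
    def similarity(text):
        return difflib.SequenceMatcher(None, target_text, text).ratio()
    return max(revisions, key=lambda rev: similarity(revisions[rev]))

# Toy example: three fake revisions of one file, and a Symphony copy
# that was branched from r2 and then locally modified.
revisions = {
    "r1": "alpha\nbeta\n",
    "r2": "alpha\nbeta\ngamma\n",
    "r3": "alpha\nbeta\ngamma\ndelta\nepsilon\n",
}
symphony_copy = "alpha\nbeta\ngamma\nsymphony-tweak\n"
print(best_matching_revision(symphony_copy, revisions))  # -> "r2"
```

The best match would then serve as the base of the 3-way merge; in practice one would also need rename detection, which a raw similarity score does not give you.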
RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
You can get anything off of the web interface of SVN at the individual-file level without it being in a working copy, though of course it has to be somewhere local while it is being processed in a build. But if you check out the trunk, you get everything that is in the trunk HEAD (or a specified version). As far as I know, you can do a checkout anywhere deeper in the tree and avoid everything not at that node [and below]. For example, just check out trunk/main. It takes some consideration of SVN organization to have the desired flavors in convenient chunks that people can work with without having to eat the whole thing (with regard to SVN checkout, SVN update and, of course, SVN commits). I can testify that an SVN UPDATE of the working copy of the entire incubator/ooo/ subtree is a painful experience, even when there is nothing to update.

- Dennis

PS: I find it an interesting characteristic of SVN that trunk, tags, and branches are just names of folders and don't mean anything special to SVN. The nomenclature and its use is a matter of custom, like code indentation rules for { ... }.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 09/20/2011 05:26 PM, Rob Weir wrote: 2011/9/20 Pavel Janík pa...@janik.cz: Have we ever considered using version control to...uh...manage file versions? Just an idea. Maybe Heiner will say more, but in the past, we have had the external tarballs in the VCS, but then we moved them out and it worked very well. There never was a reason to track external .tar.gz files in VCS, because we do not change them. --

That's fine. If they don't change, then doing an svn update will not bring them down each time. Aside from being useful for version control, SVN is also very useful as an audit trail. So on the rare occasions when one of these files does change, we know who changed it and why. This is important for ensuring the IP cleanliness of the project. Is your main concern performance? Even as individual tarballs, ext-sources is 86 files, 250 MB. ooo/extras is 243 files and 822 MB. And ooo/main is 76,295 files for over 900 MB. So ext-sources is not a huge contributor to download time.

Placing all the external tarballs in the VCS is a real killer if using a distributed SCM like git or Mercurial; that's why we had moved them out. As Pavel said, it worked quite nicely. As for the audit possibility, we referenced the external tarballs in the source tree by file name and an md5 checksum, which works just as reliably as putting them directly into the repository. Nowadays the DSCMs have some alternative methods which deal with such blobs, but in essence they also keep them separate. If AOOo ever plans to go back to a DSCM I would keep the source tree and the external blobs strictly separated. All in all, the general SCM tooling community opinion seems to be that a S(ource)CM system is for, well, source, and external dependencies are better handled with other mechanisms, like Maven or so. With SVN all this is less of a concern, naturally.

Heiner
--
Jens-Heiner Rechtien
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 7:05 PM, Rob Weir robw...@apache.org wrote: On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 04:47 PM, schrieb Rob Weir: On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)marcus.m...@wtnet.de wrote: Am 09/19/2011 01:59 PM, schrieb Rob Weir: 2011/9/19 Jürgen Schmidtjogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.org wrote: If you haven't looked it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. 
do you mean to check in the files under ext_source into svn and remove them later on when we have cleaned up the code? Or do you mean to put them somewhere on Apache Extras? I would prefer to save these binary files under Apache Extras if possible. Why not just keep it in SVN? Moving things to Apache-Extras does not help us with the IP review. In other words, if we have a dependency on an OSS module that has an incompatible license, then moving that module to Apache Extras does not make that dependency go away. We still need to understand the nature of the dependency: a build tool, a dynamic runtime dependency, a statically linked library, an optional extension, a necessary core module. If we find out, for example, that something in ext-sources is only used as a build tool, and is not part of the release, then there is nothing that prevents us from hosting it in SVN. But if something is a necessary library and it is under GPL, then this is a problem even if we store it on Apache-Extras. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 3) Files that Oracle include in their SGA need to have the Apache license header inserted and the Sun/Oracle copyright migrated to the NOTICE file. Apache RAT (Release Audit Tool) [2] can be used to automate parts of this. 4) Once the SGA files have the Apache headers, then we can make regular use of RAT to report on files that are lacking an Apache header. Such files might be in one of the following categories:
a) Files that Oracle owns the copyright on and which should be included in an amended SGA
b) Files that have a compatible OSS license which we are permitted to use. This might require that we add a mention of it to the NOTICE file.
c) Files that have an incompatible OSS license. These need to be removed/replaced.
d) Files that have an OSS license that has not yet been reviewed/categorized by Apache legal affairs. In that case we need to bring it to their attention.
e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed.
5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be:
a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. This should be reflected in headers as well, but this is not always possible. For example, we might have binary files where we cannot add headers, or cases where the OSS files do not have headers, but where we can prove their provenance via other means.
b) Track this in a spreadsheet, one row per file.
c) Track this in a text log file checked into SVN
d) Track this in an annotated script
Re: A systematic approach to IP review?
So... has anyone actually run Apache RAT yet? It has a scan only mode which I'd think would be the simplest place to start. Personally, I'd recommend working on basic RAT scans, with the scripts to run them and any exception rules (for known files, etc.) all checked into SVN with the build tools for the code. But hey, it's easy for me to suggest we do stuff, when I only currently have time to be a mentor and thus can get away with just making suggestions. 8-) I like the general concept of storing the IP type for files in SVN properties; although properties are easy to change, Apache does have a strong history of being able to provide oversight for commit logs throughout a project's history. - Shane
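A basic scan of the kind Shane describes can be approximated with a trivial header check. This is only a rough stand-in for RAT, which recognizes many license families, binary formats, and exclusion rules; the marker string is the opening phrase of the standard Apache header.

```python
import os

APACHE_MARKER = "Licensed to the Apache Software Foundation (ASF)"

def scan_tree(root):
    """Return the files under root that lack an Apache license header."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    head = f.read(2048)  # the header appears near the top
            except OSError:
                continue  # unreadable files are skipped in this sketch
            if APACHE_MARKER not in head:
                missing.append(path)
    return missing
```

Files such a scan flags would then fall into the categories Rob enumerated earlier (amended SGA, compatible OSS license, incompatible license, unreviewed license, or non-OSS).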
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 1:59 PM, Rob Weir robw...@apache.org wrote: 2011/9/19 Jürgen Schmidt jogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: If you haven't looked it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. do you mean to check in the files under ext_source into svn and remove it later on when we have cleaned up the code. Or do you mean to put it somehwere on apache extras? I would prefer to save these binary files under apache extra if possible. Why not just keep in in SVN? 
Moving things to Apache-Extras does not help us with the IP review. In other words, if we have a dependency on an OSS module that has an incompatible license, then moving that module to Apache Extras does not make that dependency go away. We still need to understand the nature of the dependency: a build tool, a dynamic runtime dependency, a statically linked library, an optional extension, a necessary core module. If we find out, for example, that something in ext-sources is only used as a build tool, and is not part of the release, then there is nothing that prevents us from hosting it in SVN. But if something is a necessary library and it is under GPL, then this is a problem even if we store it on Apache-Extras.

I am not really happy with all the binaries in the trunk tree because of the large binary blobs, and I don't expect too many changes to these dependencies. And I would like to avoid checking them out every time. What do others think about a structure where we have ext_sources beside trunk?
incubator/ooo/trunk
incubator/ooo/ext_source
...
If we can agree on such a structure I would move forward to bring in some new external sources: the proposed ucpp preprocessor - BSD license, used in idlc and of course part of the SDK later on. I made some tests with it and was able to build the sources on Windows in our Cygwin environment with a new GNU make file. I was also able to build udkapi and offapi with this new and adapted idlc/ucpp without any problems - the generated type library is equal to the old one. I have to run some more tests on other platforms as soon as I have other platforms available for testing. I decided to replace the preprocessor instead of removing it for compatibility reasons, and it was of course the easier change. The next step is to check how the process with ext_sources works in detail in our build process and adapt the new ucpp module. If anybody is familiar with ext_sources and can point me to potential hurdles, please let me know (on a new thread) ;-)

Juergen
Re: A systematic approach to IP review?
On Tue, Sep 20, 2011 at 2:34 PM, Shane Curcuru a...@shanecurcuru.org wrote: So... has anyone actually run Apache RAT yet? It has a scan only mode which I'd think would be the simplest place to start.

It's on my todo list to take a look at it; probably I will come back with questions.

Juergen

Personally, I'd recommend working on basic RAT scans, with the scripts to run them and any exception rules (for known files, etc.) all checked into SVN with the build tools for the code. But hey, it's easy for me to suggest we do stuff, when I only currently have time to be a mentor and thus can get away with just making suggestions. 8-) I like the general concept of storing the IP type for files in SVN properties; although properties are easy to change, Apache does have a strong history of being able to provide oversight for commit logs throughout a project's history.

- Shane
handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Hi, On 20.09.2011 14:37, Jürgen Schmidt wrote: On Mon, Sep 19, 2011 at 1:59 PM, Rob Weirrobw...@apache.org wrote: 2011/9/19 Jürgen Schmidtjogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.org wrote: ... Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. do you mean to check in the files under ext_source into svn and remove it later on when we have cleaned up the code. Or do you mean to put it somehwere on apache extras? I would prefer to save these binary files under apache extra if possible. Why not just keep in in SVN? Moving things to Apache-Extras does not help us with the IP review. In other words, if we have a dependency on a OSS module that has an incompatible license, then moving that module to Apache Extras does not make that dependency go away. We still need to understand the nature of the dependency: a build tool, a dynamic runtime dependency, a statically linked library, an optional extensions, a necessary core module. If we find out, for example, that something in ext-sources is only used as a build tool, and is not part of the release, then there is nothing that prevents us from hosting it in SVN. But if something is a necessary library and it is under GPL, then this is a problem even if we store it on Apache-Extras, i am not really happy with all the binaries in the trunk tree because of the large binary blobs and i don't expect too many changes of these dependencies. And i would like to avoid to check them out every time. What do others think about a structure where we have ext_sources besides trunk. incubator/ooo/trunk incubator/ooo/ext_source ... I like this idea. 
From a developer point of view I only have to check out ext_sources once and reference it from all my trunks using the already existing configure switch '--with-external-tar=<path to ext_sources>'.

Best regards, Oliver.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Hi, I like this idea. From a developer point of view I only have to checkout ext_sources once and reference it from all my trunks using the already existing configure-switch 'with-external-tar=path to ext_sources'

When we have such a repository, we will surely modify the current sources so that you don't have to add such a switch, because ../ext_sources will be checked automatically.

BTW - welcome! :-)

-- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Tue, Sep 20, 2011 at 9:48 AM, Armin Le Grand armin.le.gr...@me.com wrote: On 20.09.2011 15:33, Oliver-Rainer Wittmann wrote: Hi, On 20.09.2011 14:37, Jürgen Schmidt wrote: ... What do others think about a structure where we have ext_sources besides trunk. incubator/ooo/trunk incubator/ooo/ext_source ... So are we saying we would never need to branch or tag these files? For example, suppose we release AOOo 3.4.0, and then later we release AOOo 4.0. Then someone finds a serious security flaw in AOOo 3.4.0, and we decide to release an AOOo 3.4.1 as well as an AOOo 4.0.1. Would we be able to do this? What if the flaw was related to code in ext_sources? And if not us, in the project, what if some downstream consumer of AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever. But we've already updated ext_sources for AOOo 4.0? In other words, how do we track, in SVN, a compatible set of matching trunk/ and ext_source/ revisions, so we (or someone else) can recreate any released version of AOOo? -Rob I like this idea. From a developer point of view I only have to check out ext_sources once and reference it from all my trunks using the already existing configure switch 'with-external-tar=path to ext_sources' +1 Also, hopefully ext_sources will not change too much (after a consolidation phase) and it's mostly binaries, thus not too well suited for a repository. Let's not extend our main repository with those binaries, please. Best regards, Oliver. Regards, Armin -- ALG
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Would we be able to do this? What if the flaw was related to code in ext_sources? Then we patch it. Patch will be in the trunk/main, as always. And if not us, in the project, what if some downstream consumer of AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever. But we've already updated ext_sources for AOOo 4.0? Versions - we can and will have more tarballs of one external source. This all is already solved. -- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 20.09.2011 15:58, Rob Weir wrote: On Tue, Sep 20, 2011 at 9:48 AM, Armin Le Grand armin.le.gr...@me.com wrote: On 20.09.2011 15:33, Oliver-Rainer Wittmann wrote: Hi, On 20.09.2011 14:37, Jürgen Schmidt wrote: ... What do others think about a structure where we have ext_sources besides trunk. incubator/ooo/trunk incubator/ooo/ext_source ... So are we saying we would never need to branch or tag these files? For example, suppose we release AOOo 3.4.0, and then later we release AOOo 4.0. Then someone finds a serious security flaw in AOOo 3.4.0, and we decide to release an AOOo 3.4.1 as well as an AOOo 4.0.1. Would we be able to do this? What if the flaw was related to code in ext_sources? And if not us, in the project, what if some downstream consumer of AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever. But we've already updated ext_sources for AOOo 4.0? In other words, how do we track, in SVN, a compatible set of matching trunk/ and ext_source/ revisions, so we (or someone else) can recreate any released version of AOOo? Good point. Thus, it should be part of incubator/ooo/trunk, something like: incubator/ooo/trunk/main incubator/ooo/trunk/extras incubator/ooo/trunk/ext_sources It could be in its own repo, but this would just bring up the risk of not using the same tags in both (on purpose or by error). Indeed, looks as if it has to be a part of trunk somehow. Not very nice for binaries. Maybe we could find an intermediate place for them as long as we will need to do changes pretty often. Currently we will have to do some add/remove/changes to it. It could be good to add them to trunk after it has stabilized a little more. -Rob I like this idea. 
From a developer point of view I only have to checkout ext_sources once and reference it from all my trunks using the already existing configure-switch 'with-external-tar=path to ext_sources' +1 Also, hopefully ext_sources will not change too much (after a consolidation phase) and it's mostly binaries, thus not too well suited for a repository. Let's not extend our main repository with those binaries, please. Best regards, Oliver. Regards, Armin -- ALG
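Armin's layout also answers Rob's tagging question: if ext_sources sits under trunk, a single tag captures main, extras, and the external tarballs at matching revisions. A sketch, with an illustrative tag name and repository URLs:

```shell
# tag main, extras and ext_sources in one atomic copy, so a release
# (e.g. AOOo 3.4.0) can be rebuilt later from one coherent tag
# (tag name and URLs are illustrative, not confirmed project conventions)
svn copy https://svn.apache.org/repos/asf/incubator/ooo/trunk \
         https://svn.apache.org/repos/asf/incubator/ooo/tags/AOO340 \
         -m "Tag AOOo 3.4.0 with matching ext_sources"
```

Because `svn copy` is cheap (copy-on-write), tagging the binaries along with the code costs almost nothing in repository space.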
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
+1 - This will make it easier to update the BSD/MIT unrestricted stuff. - Hopefully it also means we will eventually stop depending on GNU patch for the build. Welcome Oliver! Great job Juergen: it's the first code replacement and a very necessary one for OO forks too (unless they want to carry lcc's copyright;) ). cheers, Pedro. On Tue, 20 Sep 2011 15:44:59 +0200, Pavel Janík pa...@janik.cz wrote: Hi, I like this idea. From a developer point of view I only have to checkout ext_sources once and reference it from all my trunks using the already existing configure-switch 'with-external-tar=path to ext_sources' when we will have such repository, we will surely modify the current sources so you don't have to add such switch because ../ext_sources will be auto-checked. BTW - welcome! :-)
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Have we ever considered using version control to...uh...manage file versions? Just an idea. Maybe Heiner will say more, but in the past, we have had the external tarballs in the VCS, but then we moved them out and it worked very well. There never was a reason to track external.tar.gz files in VCS, because we do not change them. -- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/9/20 Pavel Janík pa...@janik.cz: Have we ever considered using version control to...uh...manage file versions? Just an idea. Maybe Heiner will say more, but in the past, we have had the external tarballs in the VCS, but then we moved them out and it worked very well. There never was a reason to track external.tar.gz files in VCS, because we do not change them. -- That's fine. If they don't change, then doing an svn update will not bring them down each time. Aside from being useful for version control, SVN is also very useful as an audit trail. So on the rare occasions when one of these files does change, we know who changed it and why. This is important for ensuring the IP cleanliness of the project. Is your main concern performance? Even as individual tarballs, ext-sources is 86 files, 250MB. ooo/extras is 243 files and 822 MB. And ooo/main is 76,295 files for over 900MB. So ext-sources is not a huge contributor to download time.
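Rob's audit-trail point is easy to see in practice; a sketch (the repository path and tarball name are illustrative):

```shell
# the audit trail for a changed tarball: who changed it, when, and why
# (repository path and file name are illustrative)
svn log -v https://svn.apache.org/repos/asf/incubator/ooo/ext_sources/icu-4.0.1.tar.gz
```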
Re: A systematic approach to IP review?
2011/9/20 Jürgen Schmidt jogischm...@googlemail.com: On Tue, Sep 20, 2011 at 2:34 PM, Shane Curcuru a...@shanecurcuru.org wrote: So... has anyone actually run Apache RAT yet? It has a scan only mode which I'd think would be the simplest place to start. it's on my todo list to take a look at it, probably I will come back with questions I did a run earlier today. Good news is we have 4 files with Apache license. Bad news is we have 52,876 files with unknown license. In most cases that should just be the standard OOo header. These scans will be much more useful after we've replaced the OOo headers with Apache headers. But we can't just do a global change. We should only make that change for files that are in the official Oracle SGA. After that is done, then the RAT report will be more useful. Juergen Personally, I'd recommend working on basic RAT scans, with the scripts to run them and any exception rules (for known files, etc.) all checked into SVN with the build tools for the code. But hey, it's easy for me to suggest we do stuff, when I only currently have time to be a mentor and thus can get away with just making suggestions. 8-) I like the general concept of storing the IP type for files in SVN properties; although properties are easy to change, Apache does have a strong history of being able to provide oversight for commit logs throughout a project's history. - Shane
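RAT itself is a Java tool, but the heart of the scan Juergen and Rob describe is just "which files lack the Apache header". A minimal self-contained sketch of that check, with sample files made up for illustration:

```shell
# rough sketch of what a header scan checks (RAT itself is a Java tool);
# build a tiny sample tree, then list files missing the Apache header
mkdir -p demo/src
printf '/* Licensed to the Apache Software Foundation (ASF) ... */\n' > demo/src/with_header.c
printf '/* no license header here */\n' > demo/src/without_header.c

# grep -L prints the names of files that do NOT contain the pattern
grep -rL 'Licensed to the Apache Software Foundation' demo/src
# -> demo/src/without_header.c
```

A real run would of course point at the source tree and feed the result into the exception rules Shane suggests checking into SVN.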
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 3:34 AM, Pedro Giffuni giffu...@tutopia.com wrote: Hi; Is there an updated SGA already? good question and where can we find it? Juergen I think there will likely be a set of files of uncertain license that we should move to apache-extras. I am referring specifically to the dictionaries: Oracle might have ownership of some but not all. I propose we rescue myspell in apache-extras and put the dictionaries there to keep it as an alternative. I have no idea where to get MySpell though. While we're here, if there's still interest in maintaining the Hg history, bitbucket.org seems to be a nice alternative: it's rather specialized in Mercurial. Cheers, Pedro. On Sun, 18 Sep 2011 20:27:05 -0400, Rob Weir robw...@apache.org wrote: If you haven't looked it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. 
Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 3) Files that Oracle include in their SGA need to have the Apache license header inserted and the Sun/Oracle copyright migrated to the NOTICE file. Apache RAT (Release Audit Tool) [2] can be used to automate parts of this. 4) Once the SGA files have the Apache headers, then we can make regular use of RAT to report on files that are lacking an Apache header. Such files might be in one of the following categories: a) Files that Oracle owns the copyright on and which should be included in an amended SGA b) Files that have a compatible OSS license which we are permitted to use. This might require that we add a mention of it to the NOTICE file. c) Files that have an incompatible OSS license. These need to be removed/replaced. d) Files that have an OSS license that has not yet been reviewed/categorized by Apache legal affairs. In that case we need to bring it to their attention. e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. This should be reflected in headers as well, but this is not always possible. For example, we might have binary files where we cannot add headers, or cases where the OSS files do not have headers, but where we can prove their provenance via other means. b) Track this in a spreadsheet, one row per file. 
c) Track this in a text log file checked into SVN d) Track this in an annotated script that runs RAT, where the annotations document the reason for cases where we tell it to ignore a file or directory. 6) Iterate until we have a clean RAT report. 7) Goal should be for anyone today to be able to see what work remains for IP clearance, as well as for someone 5 years from now to be able to tell what we did. Tracking this on the community wiki is probably not good enough, since we've previously talked about dropping that wiki and going to MWiki. -Rob [1] http://incubator.apache.org/projects/openofficeorg.html [2] http://incubator.apache.org/rat/
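Option (c) above can be as simple as a delimited text file kept under version control; a sketch, with file names and categories made up purely for illustration:

```shell
# sketch of option (c): a tab-separated review log checked into SVN
# (file names and category values below are made up for illustration)
printf '%s\t%s\t%s\n' 'main/sw/source/core/doc.cxx' 'sga' 'Oracle SGA; Apache header added'  > ip-review.log
printf '%s\t%s\t%s\n' 'ext_sources/boost.tar.gz'    'bsd' 'Boost license; noted in NOTICE'  >> ip-review.log
printf '%s\t%s\t%s\n' 'main/foo/patch.c'            'gpl' 'incompatible; must be replaced'  >> ip-review.log

# list the files still needing action, e.g. everything marked incompatible
awk -F'\t' '$2 == "gpl" { print $1 }' ip-review.log
# -> main/foo/patch.c
```

Because the log itself lives in SVN, every change to a file's status carries a commit message, which gives exactly the public audit trail item 5) asks for.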
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: [snip] Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. do you mean to check in the files under ext_source into svn and remove it later on when we have cleaned up the code. Or do you mean to put it somewhere on Apache Extras? I would prefer to save these binary files under Apache Extras if possible. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 
[snip]
Tracking this on the community wiki is probably not good enough, since we've previously talked about dropping that wiki and going to MWiki. talked about it yes, but did we reach a final decision? The migrated wiki is available under http://ooo-wiki.apache.org/wiki and can be used. Do we want to continue with this wiki now? It's still not clear to me at the moment. But we need a place to document the IP clearance and under http://ooo-wiki.apache.org/wiki/ApacheMigration we have already some information. Juergen -Rob [1] http://incubator.apache.org/projects/openofficeorg.html [2] http://incubator.apache.org/rat/
Re: A systematic approach to IP review?
On Sun, Sep 18, 2011 at 9:34 PM, Pedro Giffuni giffu...@tutopia.com wrote: Hi; Is there an updated SGA already? Not that I know of. But we can and should go ahead with IP clearance using the SGA we already have. In fact, starting that process will help us identify exactly which files need to be added to the updated SGA. -Rob [snip: remainder of Pedro's message, quoting the original proposal]
Re: A systematic approach to IP review?
2011/9/19 Jürgen Schmidt jogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: [snip] Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. do you mean to check in the files under ext_source into svn and remove it later on when we have cleaned up the code. Or do you mean to put it somewhere on Apache Extras? I would prefer to save these binary files under Apache Extras if possible. Why not just keep it in SVN? Moving things to Apache-Extras does not help us with the IP review. 
In other words, if we have a dependency on an OSS module that has an incompatible license, then moving that module to Apache Extras does not make that dependency go away. We still need to understand the nature of the dependency: a build tool, a dynamic runtime dependency, a statically linked library, an optional extension, a necessary core module. If we find out, for example, that something in ext-sources is only used as a build tool, and is not part of the release, then there is nothing that prevents us from hosting it in SVN. But if something is a necessary library and it is under GPL, then this is a problem even if we store it on Apache-Extras. [snip: quoted remainder of the original proposal]
Re: A systematic approach to IP review?
Am 09/19/2011 01:59 PM, schrieb Rob Weir: [snip: quoted text of Rob's reply to Jürgen and the original proposal]
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 01:59 PM, schrieb Rob Weir: 2011/9/19 Jürgen Schmidt jogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: If you haven't looked at it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including:
-- Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project.
-- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright.
-- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute.
-- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms.
Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making, and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. Do you mean to check in the files under ext_source into SVN and remove them later on when we have cleaned up the code, or do you mean to put them somewhere on Apache Extras?
I would prefer to save these binary files under Apache Extras if possible. Why not just keep them in SVN? Moving things to Apache-Extras does not help us with the IP review. [ ... ]
Re: A systematic approach to IP review?
--- On Mon, 9/19/11, Rob Weir robw...@apache.org wrote: ... 2011/9/19 Jürgen Schmidt jogischm...@googlemail.com: ... Do you mean to check in the files under ext_source into SVN and remove them later on when we have cleaned up the code, or do you mean to put them somewhere on Apache Extras? I would prefer to save these binary files under Apache Extras if possible. Why not just keep them in SVN? Moving things to Apache-Extras does not help us with the IP review. [ ... ] But adding in stuff that we have to remove immediately (nss, seamonkey, ...) doesn't help either. I also think a lot of that stuff has to be updated before being brought in: ICU apparently would be trouble, but the Apache Commons, ICC, and other stuff can/should be updated. snip a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. I thought we had delayed updating the copyrights in the headers to ease the CWS integration. I still hope to see more of those, especially anything related to gnumake (I don't know when, but dmake has to go!). Using the SVN properties is a good idea. And we do have to start the NOTICE file. All just IMHO, of course. Pedro.
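Since the thread keeps coming back to starting the NOTICE file, here is a rough sketch of what an entry for a bundled third-party component might look like. The wording, the copyright line, and the choice of ICU as the example are illustrative assumptions, not the project's actual NOTICE text:

```
Apache OpenOffice
Copyright 2011 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

Portions Copyright Oracle and/or its affiliates.

-- Third-party components (illustrative section) --

This product includes ICU (International Components for Unicode),
Copyright International Business Machines Corporation and others,
used under the ICU License (MIT-style).
```

A separate section for transitory ext_sources components, as suggested earlier in the thread, would follow the same pattern.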
RE: A systematic approach to IP review?
Rob, I was reading Marcus's suggestion this way: since the code base is in a folder structure (modularized), and the wiki can map folder structures and their status nicely, it is not necessary to manage this from a single table; instead, any tables can sit at some appropriate granularity toward the leaves of the hierarchy (on the wiki). I can see some brittle cases, especially in the face of refactoring. The use of the wiki might have to be an ephemeral activity, handled this way entirely for our initial scrubbing. Ideally, additional and sustained review would live in SVN with the artifacts so reviewed, and be coalesced somehow. The use of SVN properties is interesting, but they are rather invisible, and I have a question about what happens to them when a commit happens against the particular artifact. It seems there is some need to balance an immediate requirement, and what would be sufficient for it, against what would assist us in the longer term. It would be interesting to know what the additional-review work has become for other projects that have a substantial code base (e.g., SVN itself, httpd, ...). I have no idea. - Dennis -Original Message- From: Rob Weir [mailto:robw...@apache.org] Sent: Monday, September 19, 2011 07:47 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? [ ... ]
RE: A systematic approach to IP review?
On the wiki question, I think OOOUSERS should continue to be used for transition work. Or OOODEV could be used if it needs to be limited to committers (perhaps the case for this activity), although that means power observers can't contribute there and would have to do so by some other means. This is transition work, and the Confluence wiki seems like a good place for it. The MWiki may be interrupted or disrupted, and it is probably a good idea to *not* put such development-transition-intensive content there. Also, the migrated wiki is not the live wiki at OpenOffice.org, so doing anything there will create collisions. It is also not fully migrated, in that it is not operating in place of what folks see via OpenOffice.org, as far as I know. The current Confluence wikis avoid confusion and are stable for this particular purpose. - Dennis -Original Message- From: Jürgen Schmidt [mailto:jogischm...@googlemail.com] Sent: Monday, September 19, 2011 01:45 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: [ ... ] 7) Goal should be for anyone today to be able to see what work remains for IP clearance, as well as for someone 5 years from now to be able to tell what we did. Tracking this on the community wiki is probably not good enough, since we've previously talked about dropping that wiki and going to MWiki. We talked about it, yes, but did we reach a final decision? The migrated wiki is available under http://ooo-wiki.apache.org/wiki and can be used. Do we want to continue with this wiki now? It's still not clear to me at the moment. [ ... ]
Re: A systematic approach to IP review?
Am 09/19/2011 04:47 PM, schrieb Rob Weir: [ ... ]
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: Rob, I was reading Marcus's suggestion this way: since the code base is in a folder structure (modularized), and the wiki can map folder structures and their status nicely, it is not necessary to manage this from a single table; instead, any tables can sit at some appropriate granularity toward the leaves of the hierarchy (on the wiki). Using the wiki for this might be useful for tracking the status of modules we already know we need to replace. Bugzilla would be another way to track the status. But it is not really a sufficient solution. Why? Because it is not tied to the code and is not reproducible. How was the list of components listed in the wiki generated? Based on what script? Where is the script? How do we know it is accurate and current? How do we know that integrating a CWS does not make that list become outdated? How do we prove to ourselves that we did this right? And how do we record that proof? And how do we repeat this proof every time we do a new release? A list of components of unknown derivation sitting on a community wiki that anyone can edit is not really a suitable basis for an IP review. The granularity we need to worry about is the file. That is the finest-grained unit that can carry a license header. That is the unit of tracking in SVN. That is the unit whose content someone could have changed in SVN. Again, it is fine if someone wants to outline this at the module level. But that does not eliminate the requirement for us to do this at the file level as well. I can see some brittle cases, especially in the face of refactoring. The use of the wiki might have to be an ephemeral activity, handled this way entirely for our initial scrubbing. Ideally, additional and sustained review would live in SVN with the artifacts so reviewed, and be coalesced somehow.
The use of SVN properties is interesting, but they are rather invisible, and I have a question about what happens to them when a commit happens against the particular artifact. Properties stick with the file unless changed. Think of the svn:eol-style property: it is not wiped out with a new revision of the file. It seems there is some need to balance an immediate requirement, and what would be sufficient for it, against what would assist us in the longer term. It would be interesting to know what the additional-review work has become for other projects that have a substantial code base (e.g., SVN itself, httpd, ...). I have no idea. The IP review needs to occur with every release, so the work we do to automate this, and make it data-driven, will repay itself with every release. I invite you to investigate what other projects do. When you do, I think you will agree. [ ... ]
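Rob's claim that versioned properties survive content commits can be checked directly. The sketch below uses a throwaway local repository and the hypothetical `ip:mit` property name from suggestion 5a); the property name and value are illustrative, not a project decision, and it assumes the `svn`/`svnadmin` command-line tools are available.

```shell
# Create a scratch repository and working copy.
svnadmin create /tmp/ip-demo-repo
svn checkout "file:///tmp/ip-demo-repo" /tmp/ip-demo-wc
cd /tmp/ip-demo-wc

# Add a file and tag it with a hypothetical IP-review property.
echo 'int main(void){return 0;}' > util.c
svn add util.c
svn propset ip:mit 'MIT-licensed; provenance: upstream tarball' util.c
svn commit -m 'add util.c with ip:mit property'

# Commit a content-only change; the property survives the new
# revision, just like svn:eol-style does.
echo '/* content edit */' >> util.c
svn commit -m 'content-only edit'
svn propget ip:mit util.c
```

The final `propget` still prints the value set before the content edit, which is the persistence behavior Rob describes.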
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 04:47 PM, schrieb Rob Weir: [ ... ]
Re: A systematic approach to IP review?
Am 09/19/2011 06:54 PM, schrieb Rob Weir: On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: Rob, I was reading Marcus's suggestion this way: since the code base is in a folder structure (modularized), and the wiki can map folder structures and their status nicely, it is not necessary to manage this from a single table; instead, any tables can sit at some appropriate granularity toward the leaves of the hierarchy (on the wiki). Using the wiki for this might be useful for tracking the status of modules we already know we need to replace. Bugzilla would be another way to track the status. How do you want to use Bugzilla to track thousands of files? But it is not really a sufficient solution. Why? Because it is not tied to the code and is not reproducible. How was the list of components listed in the wiki generated? Based on what script? Where is the script? How do we know it is accurate and current? How do we know that integrating a CWS does not make that list become outdated? How do we prove to ourselves that we did this right? And how do we record that proof? And how do we repeat this proof every time we do a new release? Questions over questions, but not helpful. ;-) A list of components of unknown derivation sitting on a community wiki that anyone can edit is not really a suitable basis for an IP review. Then restrict the write access. The granularity we need to worry about is the file. That is the finest-grained unit that can carry a license header. That is the unit of tracking in SVN. That is the unit whose content someone could have changed in SVN. Again, it is fine if someone wants to outline this at the module level. But that does not eliminate the requirement for us to do this at the file level as well. IMHO you haven't understood what I wanted to tell you. Sure, it makes no sense to create one flat list of every file in SVN to see if the license is good or bad. So, do it module by module.
And when a module is marked as done, then of course every file in the module has been checked; otherwise it's not working. And how do we make sure that there is no change when source is added/moved/improved? Simply Commit Then Review (CTR). A change in the license header at the beginning of a file should be noticeable, right? However, we also need to have trust in everybody's work. BTW: What is your plan to track every file to make sure the license is OK? Marcus [ ... ]
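To Marcus's question about how every file could be tracked, a minimal module-by-module check can be scripted. This is only a sketch under assumptions (the mock directory names, file contents, and the standard ASF header marker line are illustrative), not the project's actual tooling; it builds a tiny source tree so the check has something concrete to run against.

```shell
# Mock source tree: one file carrying the standard ASF header marker
# line, one file without any license header. Names are illustrative.
mkdir -p demo/sw demo/icu
cat > demo/sw/ok.cxx <<'EOF'
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.
EOF
cat > demo/icu/missing.cxx <<'EOF'
/* no license header here */
EOF

# grep -L lists files that do NOT contain the marker; run per module
# (or over the whole tree) to get the remaining backlog.
grep -rL 'Licensed to the Apache Software Foundation' demo --include='*.cxx'
```

Run against a module directory instead of `demo`, this gives the "done when the list is empty" criterion Marcus describes, and the output can be checked into SVN as the audit trail Rob asks for.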
Re: A systematic approach to IP review?
Am 09/19/2011 07:05 PM, schrieb Rob Weir: On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 04:47 PM, schrieb Rob Weir: On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 01:59 PM, schrieb Rob Weir: 2011/9/19 Jürgen Schmidt jogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: If you haven't looked at it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. 
do you mean to check in the files under ext_source into svn and remove them later on when we have cleaned up the code? Or do you mean to put them somewhere on apache extras? I would prefer to save these binary files under apache extras if possible. Why not just keep it in SVN? Moving things to Apache-Extras does not help us with the IP review. In other words, if we have a dependency on an OSS module that has an incompatible license, then moving that module to Apache Extras does not make that dependency go away. We still need to understand the nature of the dependency: a build tool, a dynamic runtime dependency, a statically linked library, an optional extension, a necessary core module. If we find out, for example, that something in ext-sources is only used as a build tool, and is not part of the release, then there is nothing that prevents us from hosting it in SVN. But if something is a necessary library and it is under GPL, then this is a problem even if we store it on Apache-Extras. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 3) Files that Oracle include in their SGA need to have the Apache license header inserted and the Sun/Oracle copyright migrated to the NOTICE file. Apache RAT (Release Audit Tool) [2] can be used to automate parts of this. 4) Once the SGA files have the Apache headers, then we can make regular use of RAT to report on files that are lacking an Apache header. Such files might be in one of the following categories: a) Files that Oracle owns the copyright on and which should be included in an amended SGA b) Files that have a compatible OSS license which we are permitted to use. This might require that we add a mention of it to the NOTICE file. c) Files that have an incompatible OSS license. These need to be removed/replaced. d) Files that have an OSS license that has not yet been reviewed/categorized by Apache legal affairs. In that case we need to bring it to their attention. 
e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. This should be reflected in headers as well, but this is not always possible. For example, we might have binary files where we cannot add headers, or cases where the OSS files do not have headers, but where we can prove their provenance via other means. b) Track this in a spreadsheet, one row per file. c) Track this in a text log file checked into SVN d) Track this in an annotated script that runs RAT, where the annotations document the reason for cases where we tell it to ignore a file or directory. 6) Iterate until we have a clean RAT report. 7) Goal should be for anyone today to be able to see what work
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 1:19 PM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 06:54 PM, schrieb Rob Weir: On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: Rob, I was reading Marcus's suggestion as being that since the code base is in a folder structure (modularized) and the wiki can map folder structures and their status nicely, it is not necessary to have a single table to manage this from, but have any tables be at some appropriate granularity toward the leaves of the hierarchy (on the wiki). Using the wiki for this might be useful for tracking the status of modules we already know we need to replace. Bugzilla would be another way to track the status. How do you want to use Bugzilla to track thousands of files? No. But for tracking module review, Bugzilla might be better than the wiki. It allows us to have a conversation on each module via comments. But it is not really a sufficient solution. Why? Because it is not tied to the code and is not reproducible. How was the list of components listed in the wiki generated? Based on what script? Where is the script? How do we know it is accurate and current? How do we know that integrating a CWS does not make that list become outdated? How do we prove to ourselves that we did this right? And how do we record that proof? And how do we repeat this proof every time we do a new release? Questions over questions but not helpful. ;-) A list of components of unknown derivation sitting on a community wiki that anyone can edit is not really a suitable basis for an IP review. Then restrict the write access. The granularity we need to worry about is the file. That is the finest-grained level at which a license header applies. That is the unit of tracking in SVN. That is the unit whose content someone could have changed in SVN. Again, it is fine if someone wants to outline this at the module level. 
But that does not eliminate the requirement for us to do this at the file level as well. IMHO you haven't understood what I wanted to tell you. I understand what you are saying. I just don't agree with you. Sure it makes no sense to create a list of every file in SVN to see if the license is good or bad. So, do it module by module. And when a module is marked as done, then of course every file in the module was checked. Otherwise it's not working. That is not a consistent approach. Every developer applies their own criteria. It is not reproducible. It leaves no audit trail. And it doesn't help us with the next release. If you use the Apache Release Audit Tool (RAT) then it will check all the files automatically. And how do we make sure that there was no change when source was added/moved/improved? Simply Commit Then Review (CTR). A change in the license header at the beginning should be noticeable, right? However, we also need to have trust in everybody's work. We would run RAT before every release and with every significant code contribution. You can think of this as a form of CTR, but one that is automated, with a consistent rule set. Obviously, good CTR plus the work on the wiki will all help. But we need the RAT scans as well, to show that we're clean. BTW: What is your plan to track every file to make sure the license is OK? Run RAT. That is what it does. Marcus I can see some brittle cases, especially in the face of refactoring. The use of the wiki might have to be an ephemeral activity that is handled this way entirely for our initial scrubbing. Ideally, additional and sustained review would be in the SVN with the artifacts so reviewed, and coalesced somehow. The use of SVN properties is interesting, but they are rather invisible and I have a question about what happens with them when a commit happens against the particular artifact. Properties stick with the file, unless changed. Think of the svn:eol-style property. 
It is not wiped out with a new revision of the file. It seems that there is some need to balance an immediate requirement and what would be sufficient for it versus what would assist us in the longer term. It would be interesting to know what the additional-review work has become for other projects that have a substantial code base (e.g., SVN itself, httpd, ...). I have no idea. The IP review needs to occur with every release. So the work we do to automate this, and make it data-driven, will repay itself with every release. I invite you to investigate what other projects do. When you do I think you will agree. - Dennis -Original Message- From: Rob Weir [mailto:robw...@apache.org] Sent: Monday, September 19, 2011 07:47 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 01:59 PM, schrieb Rob Weir: 2011/9/19 Jürgen Schmidtjogischm
RE: A systematic approach to IP review?
I agree running RAT is important ... I haven't heard any suggestion that such an important tool not be used. -Original Message- From: Rob Weir [mailto:robw...@apache.org] Sent: Monday, September 19, 2011 10:05 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? [ ... ] I think the wiki is fine as a collaboration tool, to list tasks and who is working on them. But that is not a substitute for running scans with the Apache Release Audit Tool (RAT) and working toward a clean report. Think of it this way: 1) We have a list of modules on the wiki that we need to replace. Great. Developers can work on that list. 2) But how do we know that the list on the wiki is complete? How do we know that it is not missing anything? 3) Running RAT against the source is how we ensure that the code is clean. In other words, the criterion should be that we have a clean RAT record, not that we have a clean wiki. The list of modules on the wiki is not traceable to a scan of the source code. It is not reproducible. It might be useful. But it is not sufficient. -Rob [ ... ]
RE: A systematic approach to IP review?
I hope that RAT can produce a list of OK files and exclude the not-OK ones on the first use, since the list of not-OK files would overwhelm everything else about the current repository. - Dennis -Original Message- From: Marcus (OOo) [mailto:marcus.m...@wtnet.de] Sent: Monday, September 19, 2011 10:27 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? Am 09/19/2011 07:05 PM, schrieb Rob Weir: [ ... ] 3) Running RAT against the source is how we ensure that the code is clean OK, I don't know what this can do for us. Maybe it's the solution for the problem. How do you know that it is not skipping anything? I guess you simply would trust RAT that it is doing fine, right? ;-) BTW: Does RAT produce a log file, so that we have a list of every file that was checked? This could be very helpful. Marcus [ ... ]
RE: A systematic approach to IP review?
I agree that there is no escape from managing down to the individual file. It is a question of organization now, where the entire base is involved. Later, if the svn:property is to be trusted, the problem is quite different, it seems to me. Plus the rules are understood and provenance and IP are likely handled as anything needing clearance enters the code base. What is done to ensure a previously-vetted code base has not become tainted strikes me as a kind of regression/smoke test. It is in that regard that I am concerned the tools for this one-time case need not be the same as for future cases. And, since I am not doing the work in the present case, I am offering this as something to think about, not a position. - Dennis -Original Message- From: Rob Weir [mailto:robw...@apache.org] Sent: Monday, September 19, 2011 09:55 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? [ ... ] The granularity we need to worry about is the file. That is the finest-grained level at which a license header applies. That is the unit of tracking in SVN. That is the unit whose content someone could have changed in SVN. Again, it is fine if someone wants to outline this at the module level. But that does not eliminate the requirement for us to do this at the file level as well. [ ... ]
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 4:32 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: I agree that there is no escape from managing down to the individual file. It is a question of organization now, where the entire base is involved. RAT or something RAT-like. Later, if the svn:property is to be trusted, the problem is quite different, it seems to me. Plus the rules are understood and provenance and IP are likely handled as anything needing clearance enters the code base. What is done to ensure a previously-vetted code base has not become tainted strikes me as a kind of regression/smoke test. Here is how I see SVN properties and RAT relating. Any use of a grep-like RAT-like tool will need to deal with exceptions. We're going to have stuff like binary files, say ODF files that are used for testing, that don't have a header. Or files that are used only as a build tool, checked in for convenience, but are not part of the release. Or 3rd party code that does not have a header, but we know its origin, like the ICU breakiterator data files. How do we deal with those types of files, in the context of an automated audit tool? One solution is to record in a big config file or script a list of all of these exceptions. Essentially, a list of files to ignore in the RAT scan. That approach would certainly work, but would be fragile. Moving or renaming the files would break our script. Not the end of the world, since this could be designed to be fail-safe and give us errors on the files that moved. But if we track this info in SVN, then we could generate the exclusion list from SVN, so it automatically adjusts as files are moved or renamed. It also avoids the problem -- and this might just be my own engineering esthetic -- of tracking metadata for files in two different places. It seems rather untidy to me. 
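[Editor's note] As a sketch of that idea: the `ip:*` property values below are hypothetical, and the input mapping stands in for the parsed output of a recursive `svn propget`. Because the property travels with the file through SVN renames and moves, the generated exclusion list cannot go stale the way a hand-maintained one can:

```python
def rat_exclusions(props):
    """props: {path: value of a hypothetical ip:license SVN property}.
    Return the paths a RAT run should skip: files whose non-Apache
    header (or lack of one) is already accounted for, such as
    compatibly-licensed third-party files and files that are not
    part of the release."""
    accounted_for = {"ip:mit", "ip:bsd", "ip:w3c", "ip:not-released"}
    return sorted(path for path, value in props.items()
                  if value in accounted_for)
```

The returned list would then be fed to the scan (RAT or the grep-like stand-in) as its ignore set, so the only remaining failures are genuinely unexamined files.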
From a regression standpoint, you could treat all files as being in one of several states: 1) Unexamined (no property set) 2) Apache 2.0 (included in the Oracle SGA or new code contributed by committer or other person under iCLA) 3) Compatible 3rd party license 4) Incompatible 3rd party license 5) Not part of release The goal would be to iterate until every file is in category 2, 3 or 5. It is in that regard that I am concerned the tools for this one-time case need not be the same as for future cases. There are two kinds of future cases: 1) Code contributed in small chunks by committers or patches, where we can expect CTR to work. There will be errors, but we can catch those before we do subsequent releases via RAT. 2) Larger contributions made by SGA. For example, the IBM Lotus Symphony contribution, or other similar corporate contributions. When an Apache project receives a large code contribution like this they need to do an IP clearance process on that contribution as well. I think that the RAT/SVN combination could work well here also. The goal would be to clear the IP on the new contributions before we start copying or merging it into the core AOOo code. And, since I am not doing the work in the present case, I am offering this as something to think about, not a position. - Dennis
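[Editor's note] The five states above could be driven directly off the same SVN property. A toy summary, where the state names and property values are invented for illustration and `None` stands for a file whose property has not been set yet:

```python
from collections import Counter

# The five states, keyed by a hypothetical SVN property value.
STATES = {
    None: "unexamined",
    "ip:sga": "apache-2.0",
    "ip:compatible": "compatible-3rd-party",
    "ip:incompatible": "incompatible-3rd-party",
    "ip:not-released": "not-part-of-release",
}

def ip_status(props):
    """props: {path: property value or None}. The tree is ready for
    release only when no file remains in state 1 (unexamined) or
    state 4 (incompatible license)."""
    tally = Counter(STATES[value] for value in props.values())
    ready = (tally["unexamined"] == 0
             and tally["incompatible-3rd-party"] == 0)
    return ready, dict(tally)
```

Run before each release, this reduces "did we clear the IP?" to a reproducible yes/no plus a per-state count, which is exactly the audit trail the thread is asking for.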
RE: A systematic approach to IP review?
+1 -Original Message- From: Rob Weir [mailto:robw...@apache.org] Sent: Sunday, September 18, 2011 17:27 To: ooo-dev@incubator.apache.org Subject: A systematic approach to IP review? If you haven't looked at it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 3) Files that Oracle include in their SGA need to have the Apache license header inserted and the Sun/Oracle copyright migrated to the NOTICE file. 
Apache RAT (Release Audit Tool) [2] can be used to automate parts of this. 4) Once the SGA files have the Apache headers, then we can make regular use of RAT to report on files that are lacking an Apache header. Such files might be in one of the following categories: a) Files that Oracle owns the copyright on and which should be included in an amended SGA b) Files that have a compatible OSS license which we are permitted to use. This might require that we add a mention of it to the NOTICE file. c) Files that have an incompatible OSS license. These need to be removed/replaced. d) Files that have an OSS license that has not yet been reviewed/categorized by Apache legal affairs. In that case we need to bring it to their attention. e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. This should be reflected in headers as well, but this is not always possible. For example, we might have binary files where we cannot add headers, or cases where the OSS files do not have headers, but where we can prove their provenance via other means. b) Track this in a spreadsheet, one row per file. c) Track this in a text log file checked into SVN d) Track this in an annotated script that runs RAT, where the annotations document the reason for cases where we tell it to ignore a file or directory. 6) Iterate until we have a clean RAT report. 7) Goal should be for anyone today to be able to see what work remains for IP clearance, as well as for someone 5 years from now to be able to tell what we did. Tracking this on the community wiki is probably not good enough, since we've previously talked about dropping that wiki and going to MWiki. 
-Rob [1] http://incubator.apache.org/projects/openofficeorg.html [2] http://incubator.apache.org/rat/
Re: A systematic approach to IP review?
Hi; Is there an updated SGA already? I think there will likely be a set of files of uncertain license that we should move to apache-extras. I am referring specifically to the dictionaries: Oracle may hold rights over some but not all. I propose we rescue myspell in apache-extras and put the dictionaries there to keep it as an alternative. I have no idea where to get MySpell though. While here, if there's still interest in maintaining the Hg history, bitbucket.org seems to be a nice alternative: it rather specializes in Mercurial. Cheers, Pedro. On Sun, 18 Sep 2011 20:27:05 -0400, Rob Weir robw...@apache.org wrote: If you haven't looked at it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. 
Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 3) Files that Oracle include in their SGA need to have the Apache license header inserted and the Sun/Oracle copyright migrated to the NOTICE file. Apache RAT (Release Audit Tool) [2] can be used to automate parts of this. 4) Once the SGA files have the Apache headers, then we can make regular use of RAT to report on files that are lacking an Apache header. Such files might be in one of the following categories: a) Files that Oracle owns the copyright on and which should be included in an amended SGA b) Files that have a compatible OSS license which we are permitted to use. This might require that we add a mention of it to the NOTICE file. c) Files that have an incompatible OSS license. These need to be removed/replaced. d) Files that have an OSS license that has not yet been reviewed/categorized by Apache legal affairs. In that case we need to bring it to their attention. e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. This should be reflected in headers as well, but this is not always possible. For example, we might have binary files where we cannot add headers, or cases where the OSS files do not have headers, but where we can prove their provenance via other means. b) Track this in a spreadsheet, one row per file. 
c) Track this in a text log file checked into SVN d) Track this in an annotated script that runs RAT, where the annotations document the reason for cases where we tell it to ignore a file or directory. 6) Iterate until we have a clean RAT report. 7) Goal should be for anyone today to be able to see what work remains for IP clearance, as well as for someone 5 years from now to be able to tell what we did. Tracking this on the community wiki is probably not good enough, since we've previously talked about dropping that wiki and going to MWiki. -Rob [1] http://incubator.apache.org/projects/openofficeorg.html [2] http://incubator.apache.org/rat/