Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 10/27/11 8:28 PM, Pedro Giffuni wrote:
> --- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
>>> In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those components in favor of the system libraries when those are available.
>> Me too, but we should move forward, and we can change it at any time when we have a better solution.
> I am OK with that, but let me attempt to dump what I think:
> 1) You are not bringing in *anything* copyleft; that directory will only be for the non-restrictive stuff that we need: ICU, Boost, etc.
> 2) This will all have to be registered in the NOTICE file, but since this is transitory and not really stuff we use in base, we should start a new section there to separate it from the stuff we do use in the core system.
> 3) We should probably move some of the stuff in soltools there too (mkdepend).
> 4) I know you want ucpp there too, but since that stuff is used in idlc, I think I'd prefer it in idlc/source/preproc/ as it was before. No idea whether we can use the system cpp for the rest, but that would probably make sense.

mmh, I would prefer to put it under ext-sources to make clear that it comes from external sources.

Juergen

> All just IMHO; I am pretty sure whatever you do is better than what we have now :).
> Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
--- On Fri, 10/28/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
> [snip mental dump]
>> 4) I know you want ucpp there too, but since that stuff is used in idlc, I think I'd prefer it in idlc/source/preproc/ as it was before. No idea whether we can use the system cpp for the rest, but that would probably make sense.
> mmh, I would prefer to put it under ext-sources to make clear that it comes from external sources.

That is pretty well covered by SVN and the NOTICE file, but I was only brainstorming. Just have fun :).

Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 9/22/11 1:19 PM, Jürgen Schmidt wrote:
> OK, we have several arguments for and against, but no decision on how we want to move forward. Let us take another look at it:
> 1. We have a working mechanism to get the externals from somewhere, check the MD5 sum, unpack, patch, and build.
> 1.1 "Somewhere" is configurable during the configure step; initially the externals are downloaded from http://hg.services.openoffice.org/binaries
> 2. Having the externals in the repository (SVN) won't be a big issue, because a checkout always downloads only the tip version.
> 2.1 The SCM can be used to track the version of the externals used for a specific OO version - simply check out the version tag and everything is in place...
> 3. In a DSCM it would become a real problem over time, because of the increasing space taken by all versions.
> 4. We need a replacement for http://hg.services.openoffice.org/binaries asap (who knows how long the server will be available).
> 5. Many developers probably work with a local clone of the repository, using for example git-svn or something else - the disadvantage of the increasing space is probably acceptable if a clean local trunk is kept and updated.
>
> Proposed way to move forward:
> 1. Put the externals under .../trunk/ext_sources:
>    .../trunk/ext_sources
>    .../trunk/main
>    .../trunk/extras
> 2. Adapt configure to use this as the default and disable the download (maybe reactivate it later if we move to a DSCM).
> 3. Keep the process of checking the MD5 sum as it is (for potential later use).
>
> Any opinions or suggestions?

I think we still haven't finished on this topic, but it is somewhat important for moving forward with our IP clearance and the whole development work. So if nobody has real objections, I would like to move forward with this proposal, but would also like to change the proposed directory name from ext_sources to 3rdparty. Keep in mind that we use this directory to keep the current state working; with our ongoing work we will remove more and more stuff from there.

The adapted bootstrap mechanism will download the libraries from this new place.

Juergen
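The fetch / verify / unpack mechanism described in point 1 can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual bootstrap code; the URL handling, function names, and directory layout are my assumptions:

```python
import hashlib
import tarfile
import urllib.request
from pathlib import Path

def md5_of(data: bytes) -> str:
    """Hex MD5 digest, as used to verify downloaded external tarballs."""
    return hashlib.md5(data).hexdigest()

def fetch_external(base_url: str, name: str, expected_md5: str,
                   dest: str = "ext_sources") -> Path:
    """Download one external tarball, verify its MD5 sum, and unpack it.

    Mirrors the steps in the proposal: fetch from a configurable
    location, check the checksum, then unpack so patches can be
    applied before the build.
    """
    data = urllib.request.urlopen(f"{base_url}/{name}").read()
    if md5_of(data) != expected_md5:
        raise ValueError(f"MD5 mismatch for {name}")
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    tarball = target / name
    tarball.write_bytes(data)
    with tarfile.open(tarball) as tar:
        tar.extractall(target)  # patching and building happen afterwards
    return tarball
```

Keeping the checksum step even after the sources move into SVN (point 3 of the proposal) means the same verification code works unchanged if downloading is ever reactivated.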
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/10/27 Jürgen Schmidt jogischm...@googlemail.com:
> On 9/22/11 1:19 PM, Jürgen Schmidt wrote:
>> OK, we have several arguments for and against, but no decision on how we want to move forward. Let us take another look at it:
>> 1. We have a working mechanism to get the externals from somewhere, check the MD5 sum, unpack, patch, and build.
>> 1.1 "Somewhere" is configurable during the configure step; initially the externals are downloaded from http://hg.services.openoffice.org/binaries
>> 2. Having the externals in the repository (SVN) won't be a big issue, because a checkout always downloads only the tip version.
>> 2.1 The SCM can be used to track the version of the externals used for a specific OO version - simply check out the version tag and everything is in place...
>> 3. In a DSCM it would become a real problem over time, because of the increasing space taken by all versions.
>> 4. We need a replacement for http://hg.services.openoffice.org/binaries asap (who knows how long the server will be available).
>> 5. Many developers probably work with a local clone of the repository, using for example git-svn or something else - the disadvantage of the increasing space is probably acceptable if a clean local trunk is kept and updated.
>> Proposed way to move forward:
>> 1. Put the externals under .../trunk/ext_sources:
>>    .../trunk/ext_sources
>>    .../trunk/main
>>    .../trunk/extras
>> 2. Adapt configure to use this as the default and disable the download (maybe reactivate it later if we move to a DSCM).
>> 3. Keep the process of checking the MD5 sum as it is (for potential later use).
>> Any opinions or suggestions?
> I think we still haven't finished on this topic, but it is somewhat important for moving forward with our IP clearance and the whole development work. So if nobody has real objections, I would like to move forward with this proposal, but would also like to change the proposed directory name from ext_sources to 3rdparty. Keep in mind that we use this directory to keep the current state working; with our ongoing work we will remove more and more stuff from there.

So keep the current approach - tarballs with MD5 hashnames, etc., just as before, but on Apache servers? That sounds good to me.

> The adapted bootstrap mechanism will download the libraries from this new place.
> Juergen
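The "MD5 hashnames" mentioned here refer to the convention of storing each external tarball under a name prefixed with the hex digest of its own contents, so the checksum can be re-verified from the filename alone. A minimal sketch of building such a name; the Boost filename below is just a hypothetical example:

```python
import hashlib

def hashname(data: bytes, basename: str) -> str:
    """Build a '<md5>-<name>' style filename for an external tarball.

    Storing the digest in the filename lets the bootstrap script
    re-check the payload without a separate checksum file.
    """
    return f"{hashlib.md5(data).hexdigest()}-{basename}"

# Hypothetical example, with an empty payload just to show the shape:
# hashname(b"", "boost_1_39_0.tar.gz")
# -> "d41d8cd98f00b204e9800998ecf8427e-boost_1_39_0.tar.gz"
```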
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
--- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
> ... I think we still haven't finished on this topic, but it is somewhat important for moving forward with our IP clearance and the whole development work. So if nobody has real objections, I would like to move forward with this proposal, but would also like to change the proposed directory name from ext_sources to 3rdparty. Keep in mind that we use this directory to keep the current state working; with our ongoing work we will remove more and more stuff from there.

I was about to bring in support for FreeBSD's fetch command (somewhat like curl) in fetch-tarballs.sh, and it looks like now you are obsoleting it :-P.

In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those components in favor of the system libraries when those are available.

Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 10/27/11 6:13 PM, Pedro Giffuni wrote:
> --- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
>> ... I think we still haven't finished on this topic, but it is somewhat important for moving forward with our IP clearance and the whole development work. So if nobody has real objections, I would like to move forward with this proposal, but would also like to change the proposed directory name from ext_sources to 3rdparty. Keep in mind that we use this directory to keep the current state working; with our ongoing work we will remove more and more stuff from there.
> I was about to bring in support for FreeBSD's fetch command (somewhat like curl) in fetch-tarballs.sh, and it looks like now you are obsoleting it :-P.
> In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those components in favor of the system libraries when those are available.

Me too, but we should move forward, and we can change it at any time when we have a better solution.

Juergen

> Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
--- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
>> In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those components in favor of the system libraries when those are available.
> Me too, but we should move forward, and we can change it at any time when we have a better solution.

I am OK with that, but let me attempt to dump what I think:

1) You are not bringing in *anything* copyleft; that directory will only be for the non-restrictive stuff that we need: ICU, Boost, etc.
2) This will all have to be registered in the NOTICE file, but since this is transitory and not really stuff we use in base, we should start a new section there to separate it from the stuff we do use in the core system.
3) We should probably move some of the stuff in soltools there too (mkdepend).
4) I know you want ucpp there too, but since that stuff is used in idlc, I think I'd prefer it in idlc/source/preproc/ as it was before. No idea whether we can use the system cpp for the rest, but that would probably make sense.

All just IMHO; I am pretty sure whatever you do is better than what we have now :).

Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Thu, Oct 27, 2011 at 2:28 PM, Pedro Giffuni p...@apache.org wrote:
> --- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
>>> In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those components in favor of the system libraries when those are available.
>> Me too, but we should move forward, and we can change it at any time when we have a better solution.
> I am OK with that, but let me attempt to dump what I think:
> 1) You are not bringing in *anything* copyleft; that directory will only be for the non-restrictive stuff that we need: ICU, Boost, etc.

I think it is like the SVN trunk: we initially bring it all in, and then remove the copyleft parts. Of course, if we can remove them beforehand, that is good as well. But whatever order we do the work in, we cannot release until we've done the IP review.

The files are currently hosted here: http://hg.services.openoffice.org/binaries/

Since the build currently depends on that, I think we want to move those files to Apache now, rather than wait too long.

-Rob

> 2) This will all have to be registered in the NOTICE file, but since this is transitory and not really stuff we use in base, we should start a new section there to separate it from the stuff we do use in the core system.
> 3) We should probably move some of the stuff in soltools there too (mkdepend).
> 4) I know you want ucpp there too, but since that stuff is used in idlc, I think I'd prefer it in idlc/source/preproc/ as it was before. No idea whether we can use the system cpp for the rest, but that would probably make sense.
> All just IMHO; I am pretty sure whatever you do is better than what we have now :).
> Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Hi Mathias;

--- On Thu, 10/27/11, Mathias Bauer mathias_ba...@gmx.net wrote:
> ...
>>> In any case, yes.. I think this is the way to go. I am just hoping there will be a way to opt out of those
>> I am OK with that, but let me attempt to dump what I think:
>> 1) You are not bringing in *anything* copyleft; that directory will only be for the non-restrictive stuff that we need: ICU, Boost, etc.
> That should be doable. OTOH, I'm wondering whether we should keep the copyleft tarballs at Apache Extras - it would still allow building with them (something that can be done outside the ASF infrastructure and is still appreciated, if I understood correctly).

I don't like that, but we will have to do it as a temporary solution to avoid breaking the build until we replace everything. I think in the long run this is only interesting for Windows binaries, due to the difficulties of getting those packages from different places. On Linux/BSD distributions it makes sense to use the prepackaged Mozilla, etc.

>> 3) We should probably move some of the stuff in soltools there too (mkdepend).
> That's something for later; ATM we should move the ext_src stuff to a secure place.

Yes. Also for later: the simpleICC library is used to generate a color profile required for PDF. I think we should just generate the color profile somewhere outside the main build and use it, avoiding the extra build cycles.

Another thing: we are excluding by default, with extreme prejudice, both LGPL and MPL, but it will be convenient to reevaluate that, since we will have to use the prepackaged hunspell.

> If nobody else wants to do it, I can invest some time into that, but it might take some days.

I won't do it because of principles... I want them to just go away ;-).

FWIW, Rob and I are trying to use an ooo- prefix on Apache Extras. ooo-external-sources?

> It seems that the consensus is that we check in the binary tarballs into trunk/ext_sources?!

I am not sure about that; I think lazy consensus by whoever does it first will win :).

Pedro.
Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)
--- On Fri, 10/14/11, Robert Burrell Donkin wrote:
> ...
>> A branch would save us from having, say... 1000 commits with header changes in the history.
> Apache uses version control as the canonical record. It's therefore essential to know why a header was changed and by whom. And of course the branch would be on SVN, so the history for the legal changes wouldn't be lost.

Of course, I meant this only for the SGA, but ultimately it depends on the people applying it, and from what I understand now, *I* won't be touching any headers :).

thanks for all these explanations,

Pedro.
Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)
On 10/14/2011 8:58 AM, Pedro Giffuni wrote:
> --- On Fri, 10/14/11, Robert Burrell Donkin wrote:
>> ...
>>> A branch would save us from having, say... 1000 commits with header changes in the history.
>> Apache uses version control as the canonical record. It's therefore essential to know why a header was changed and by whom. And of course the branch would be on SVN, so the history for the legal changes wouldn't be lost.
>> Robert
> Of course, I meant this only for the SGA, but ultimately it depends on the people applying it, and from what I understand now, *I* won't be touching any headers :).
> thanks for all these explanations,
> Pedro.

Pedro,

I intend to get started on the headers in the very near future. My intention is to do a series of check-ins by project/directory in the source tree, matching the changes to the grant(s). I have a bit of sequencing of activities before I start, but this is next up on the list.

Andrew

--
Andrew Rist | Interoperability Architect
Oracle Corporate Architecture Group
Redwood Shores, CA | 650.506.9847
ICC generated profiles are copylefted (was Re: A systematic approach to IP review?)
Hi;

When I saw this thread about machine-generated files, I never imagined we would be talking about code in OpenOffice.org, but I found that this file:

icc/source/create_sRGB_profile/create_sRGB_profile.cpp

indeed generates virally-licensed code! I am proposing an obvious patch, but I wanted the issue documented, so I created bug 118512.

enjoy ;)

Pedro.

--- On Thu, 9/29/11, Rob Weir robw...@apache.org wrote:
> On Thu, Sep 29, 2011 at 1:53 AM, Dennis E. Hamilton wrote:
>> Let me recall the bidding a little here. What I said was "It is unlikely that machine-generated files of any kind are copyrightable subject matter." You point out that computer-generated files might incorporate copyrightable subject matter. I hadn't considered a hybrid case where copyrightable subject matter would subsist in such a work, and I have no idea how and to what extent the output qualifies as a work of authorship, but it is certainly a case to be reckoned with.
>> Then there is the issue of macro expansion, template parameter substitution, etc., and the cases become blurrier and blurrier. For example, if I wrote a program and then put it through the C language pre-processor, in how much of the expanded result does the copyright declared on the original subsist? (I am willing to concede, for purposes of argument, that the second is a derivative work of the former, even though the derivation occurred dynamically.) I fancy this example because it is commonplace that the pre-processor incorporates files that have their own copyright and license notices too. Also, the original might include macro calls, with parameters using macros defined in one or more of those incorporated files.
> Under US law: "Copyright protection subsists, in accordance with this title, in original works of authorship fixed in any tangible medium of expression, now known or later developed, from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device."
> IANAL, but I believe Dennis is correct that a machine cannot be an author, in terms of copyright. But the author of that program might. It comes down to who exactly fixed the work in a tangible medium of expression. When I use an ordinary code editor, the machine acts as a tool that I use to create an original work. It is a tool, like a paintbrush.
> In other cases, a tool can be used to transform a work. If there is an original work in fixed form that I transform, then I may have a copyright interest in the transformed work. That is how copyright law protects software binaries as well as source code. As for the GNU Bison example, if I created the BNF, then I have a copyright interest in the generated code. That does not mean that I have exclusive ownership of all the generated code. It might be a mashup of original template code from the Bison authors, along with code that is a transformation of my original grammar definition. It isn't an either/or situation. A work can have mixed authorship.
> -Rob
>> I concede that copyrightable matter can survive into a machine-generated file. And I maintain that there can be other conditions on the use of such a file other than by virtue of it containing portions in which copyright subsists. For example, I don't think the Copyright Office is going to accept registration of compiled binaries any time soon, even though there may be conditions on the license of the source code that carry over onto those binaries. And, yes, it is murky all the way down.
>> - Dennis
>>
>> -----Original Message-----
>> From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org]
>> Sent: Wednesday, September 28, 2011 22:32
>> To: 'ooo-dev@incubator.apache.org'
>> Subject: RE: A systematic approach to IP review?
>>
>> Not to put too fine a point on this, but it sounds like you are talking about boilerplate (and authored) template code that Bison incorporates in its output. It is also tricky because the Bison output is computer source code. That is an interesting case. In the US, "original work of authorship" is pretty specific in the case of literary works, which is where software copyright falls, the last time I checked (too long ago, though). I suspect that a license (in the contractual sense) can deal with more than copyright. And, if Bison spits out copyright notices, they still only apply to that part of the output, if any, that qualifies as copyrightable subject matter.
>> Has the Bison claim ever been tested in court? Has anyone been pursued or challenged for infringement? I'm just curious.
>> - Dennis
>>
>> -----Original Message-----
>> From: Norbert Thiebaud [mailto:nthieb...@gmail.com]
>> Sent: Wednesday, September 28, 2011 22:11
>> To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org
>> Subject: Re: A systematic approach to IP review?
>>
>> On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton
Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)
On Sun, Oct 9, 2011 at 7:42 PM, Pedro Giffuni p...@apache.org wrote:
> Hi;
> Looking at how big, and mostly cosmetic but necessary, a change it will be to bring in all the SGA license changes, and given that it requires manual intervention and is not something that can be done in one huge mega-commit... I think we should create a branch for these changes and merge them in two steps, corresponding to both SGAs. This way, merging CWSs and bugzilla patches can go on without pain, and people can get started on the header changes.

I recommend separating review from (automated) execution. If this is done, a branch shouldn't be necessary...

Robert
Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)
--- On Thu, 10/13/11, Robert Burrell Donkin wrote:
> I recommend separating review from (automated) execution. If this is done, a branch shouldn't be necessary...

Uhm.. can you elaborate a bit more? A branch would save us from having, say... 1000 commits with header changes in the history.

regards,

Pedro.
How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)
Hi;

Looking at how big, and mostly cosmetic but necessary, a change it will be to bring in all the SGA license changes, and given that it requires manual intervention and is not something that can be done in one huge mega-commit... I think we should create a branch for these changes and merge them in two steps, corresponding to both SGAs. This way, merging CWSs and bugzilla patches can go on without pain, and people can get started on the header changes.

cheers,

Pedro.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 01.10.2011 00:17, Michael Stahl wrote:
> On 30.09.2011 21:24, Mathias Bauer wrote:
>> On 28.09.2011 17:32, Pedro F. Giffuni wrote:
>> Another advantage of unpacking the tarballs: the patches will become *real* patches that just contain changes to the original source code. Often the patches nowadays contain additional files that we just need to build the stuff in OOo (e.g. dmake makefiles) - those could be checked in as regular files. Currently, keeping them as regular files is awkward, because then they need to be copied to the place the tarballs are unpacked to.
> but this is just because dmake can only build source files in the same directory; imagine a more flexible gbuild external build target where the makefiles are in the source tree while the tarball gets unpacked in the workdir...

Sure, but until we are there... I didn't talk about the dmake makefiles that are used to unpack and patch; I was talking about using dmake for building the external modules that come with their own build system. The makefile.mk in the root directory of the external modules is not part of the patch, but some patches contain makefile.mk files that are necessary to build the stuff, either on all or only on some platforms.

Regards,
Mathias
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 28.09.2011 17:32, Pedro F. Giffuni wrote:
> FWIW, I don't like the patches because I can't really examine the code well; besides, this is something the VCS handles acceptably: commit the original source code and then apply the patches in a different commit. If we start with up-to-date versions, there would not be much trouble.

I'm not against unpacking the tarballs and applying the patches, but we should keep the patches somewhere so that updates can be done with the same effort as today.

Another advantage of unpacking the tarballs: the patches will become *real* patches that just contain changes to the original source code. Often the patches nowadays contain additional files that we just need to build the stuff in OOo (e.g. dmake makefiles) - those could be checked in as regular files. Currently, keeping them as regular files is awkward, because then they need to be copied to the place the tarballs are unpacked to.

Regards,
Mathias
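The distinction between *real* patches and patches that smuggle in whole new build files can even be checked mechanically: in unified diff format, a newly added file carries `/dev/null` as its old side. A hedged sketch of such a classifier; the function name is my own invention, not an existing OOo tool:

```python
def classify_patch(patch_text: str):
    """Split a unified diff into modified vs newly added files.

    A brand-new file introduced by a patch (e.g. a dmake makefile
    added only for the OOo build) shows up with "/dev/null" on the
    "---" (old) side, while a genuine modification of upstream
    source names a real old path.
    """
    modified, added = [], []
    old_path = None
    for line in patch_text.splitlines():
        if line.startswith("--- ") and len(line.split()) > 1:
            old_path = line.split()[1]
        elif line.startswith("+++ ") and old_path is not None:
            new_path = line.split()[1]
            (added if old_path == "/dev/null" else modified).append(new_path)
            old_path = None
    return modified, added
```

Running this over a module's patch set would show at a glance which files could instead be checked in as regular files once the tarballs are unpacked.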
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 30.09.2011 21:24, Mathias Bauer wrote:
> On 28.09.2011 17:32, Pedro F. Giffuni wrote:
> Another advantage of unpacking the tarballs: the patches will become *real* patches that just contain changes to the original source code. Often the patches nowadays contain additional files that we just need to build the stuff in OOo (e.g. dmake makefiles) - those could be checked in as regular files. Currently, keeping them as regular files is awkward, because then they need to be copied to the place the tarballs are unpacked to.

but this is just because dmake can only build source files in the same directory; imagine a more flexible gbuild external build target where the makefiles are in the source tree while the tarball gets unpacked in the workdir...

Regards,
Mathias
Re: A systematic approach to IP review?
On Thu, Sep 29, 2011 at 1:53 AM, Dennis E. Hamilton dennis.hamil...@acm.org wrote:
> Let me recall the bidding a little here. What I said was "It is unlikely that machine-generated files of any kind are copyrightable subject matter." You point out that computer-generated files might incorporate copyrightable subject matter. I hadn't considered a hybrid case where copyrightable subject matter would subsist in such a work, and I have no idea how and to what extent the output qualifies as a work of authorship, but it is certainly a case to be reckoned with.
> Then there is the issue of macro expansion, template parameter substitution, etc., and the cases become blurrier and blurrier. For example, if I wrote a program and then put it through the C language pre-processor, in how much of the expanded result does the copyright declared on the original subsist? (I am willing to concede, for purposes of argument, that the second is a derivative work of the former, even though the derivation occurred dynamically.) I fancy this example because it is commonplace that the pre-processor incorporates files that have their own copyright and license notices too. Also, the original might include macro calls, with parameters using macros defined in one or more of those incorporated files.

Under US law: "Copyright protection subsists, in accordance with this title, in original works of authorship fixed in any tangible medium of expression, now known or later developed, from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device."

IANAL, but I believe Dennis is correct that a machine cannot be an author, in terms of copyright. But the author of that program might. It comes down to who exactly fixed the work in a tangible medium of expression. When I use an ordinary code editor, the machine acts as a tool that I use to create an original work. It is a tool, like a paintbrush.

In other cases, a tool can be used to transform a work. If there is an original work in fixed form that I transform, then I may have a copyright interest in the transformed work. That is how copyright law protects software binaries as well as source code. As for the GNU Bison example, if I created the BNF, then I have a copyright interest in the generated code. That does not mean that I have exclusive ownership of all the generated code. It might be a mashup of original template code from the Bison authors, along with code that is a transformation of my original grammar definition. It isn't an either/or situation. A work can have mixed authorship.

-Rob

> I concede that copyrightable matter can survive into a machine-generated file. And I maintain that there can be other conditions on the use of such a file other than by virtue of it containing portions in which copyright subsists. For example, I don't think the Copyright Office is going to accept registration of compiled binaries any time soon, even though there may be conditions on the license of the source code that carry over onto those binaries. And, yes, it is murky all the way down.
> - Dennis
>
> -----Original Message-----
> From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org]
> Sent: Wednesday, September 28, 2011 22:32
> To: 'ooo-dev@incubator.apache.org'
> Subject: RE: A systematic approach to IP review?
>
> Not to put too fine a point on this, but it sounds like you are talking about boilerplate (and authored) template code that Bison incorporates in its output. It is also tricky because the Bison output is computer source code. That is an interesting case. In the US, "original work of authorship" is pretty specific in the case of literary works, which is where software copyright falls, the last time I checked (too long ago, though). I suspect that a license (in the contractual sense) can deal with more than copyright. And, if Bison spits out copyright notices, they still only apply to that part of the output, if any, that qualifies as copyrightable subject matter.
> Has the Bison claim ever been tested in court? Has anyone been pursued or challenged for infringement? I'm just curious.
> - Dennis
>
> -----Original Message-----
> From: Norbert Thiebaud [mailto:nthieb...@gmail.com]
> Sent: Wednesday, September 28, 2011 22:11
> To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org
> Subject: Re: A systematic approach to IP review?
>
> On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: I'll stand by my original statement. I'm not going to get into the Pixar case since it doesn't apply here.
> I did not say it applied to the Visual Studio generated cruft... I merely commented on the blanket assertion that 'computer generated = no copyright'. The Bison manual may have license conditions on what can be done with the generated artifact, but I suggest that is not about copyrightable subject matter
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 20.09.2011 16:36, Pavel Janík wrote:
>> Have we ever considered using version control to... uh... manage file versions? Just an idea.
> Maybe Heiner will say more, but in the past we had the external tarballs in the VCS; then we moved them out, and it worked very well. There never was a reason to track external .tar.gz files in VCS, because we do not change them.

What might be the best way to handle 3rd-party code in AOOo will probably depend on the needs of the developers as well as on legal requirements. We had these tarballs plus patches, IIRC, because Sun Legal required that all 3rd-party stuff used should be preserved in our repos in its original form.

As a developer, I always preferred to have 3rd-party code treated in the *build* like the internal source code. So if there wasn't a requirement to have unpatched sources in the repository, the most natural way to keep 3rd-party stuff would be to have a third sub-repo, 3rdparty, next to main and extras, with the 3rd-party stuff checked in - not the tarballs, just the unpacked content. I wouldn't give up the patches, as they allow updates to be handled better.

This would cause a problem, as direct changes to the 3rd-party stuff without additional authorization must be prevented (means: changing the source code must not happen accidentally, only when the 3rd-party code gets an update from upstream), while patch files must still be allowed to be added, removed, or changed - just not the original source code. If that wasn't possible, or was too cumbersome, checking the tarballs in to 3rdparty would be better. As SVN users never download the complete history as DSCM users do, the pain of binary files in the repo isn't that bad. In case AOOo moves to a DSCM again later, the tarballs could be moved out again easily.

Regards,
Mathias
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
FWIW, I don't like the patches because I can't really examine the code well; besides, this is something the VCS handles acceptably: commit the original source code and then apply the patches in a different commit. If we start with up-to-date versions there would not be much trouble. just my $0.02, not an objection. Pedro. --- On Wed, 9/28/11, Jürgen Schmidt jogischm...@googlemail.com wrote: ... I wouldn't give up the patches, as they allow updates to be handled better. This would cause a problem, as direct changes to the 3rd party stuff without additional authorization must be prevented (meaning: the source code must not change accidentally, only when the 3rd party code gets an update from upstream), while patch files must still be allowed to be added, removed, or changed, just not the original source code. If that wasn't possible or was too cumbersome, checking in the tarballs in 3rdparty would be better. i also wouldn't give up the patches and for that reason i would like to move forward for now with keeping the tarballs as proposed. But i like the name 3rdparty for the directory and we can later change it from the tarballs to the unpacked code if we see demand for it. At the moment it's just easier to keep the tarballs and focus on other work. As svn users never download the complete history as DSCM users do, the pain of binary files in the repo isn't that hard. In case AOOo moved to a DSCM again later, the tarballs could be moved out again easily. agree, we don't really lose anything, can change if necessary and can continue with our work Juergen
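The workflow Pedro sketches (commit the pristine upstream source first, then apply the project's patches as a separate commit, so the diff between the two commits *is* the patch set) could look roughly like the following; all names (3rdparty/libfoo, foo.patch, the commit messages) are hypothetical, and the svn calls are left as comments so the sketch stands alone:

```shell
# Sketch of "pristine import first, patches second". Paths, file contents,
# and commit messages are illustrative placeholders only.
set -e
mkdir -p 3rdparty/libfoo
printf 'int x = 1;\n' > 3rdparty/libfoo/foo.c   # stand-in for unpacked upstream source
# svn add 3rdparty/libfoo
# svn commit -m "import libfoo 1.0, pristine upstream"

# Local modifications go in as a separate commit, so the VCS history keeps
# upstream and local changes cleanly apart.
cat > foo.patch <<'EOF'
--- foo.c
+++ foo.c
@@ -1 +1 @@
-int x = 1;
+int x = 2;
EOF
patch 3rdparty/libfoo/foo.c foo.patch
# svn commit -m "apply local patches to libfoo"
cat 3rdparty/libfoo/foo.c
```

With this layout, "examining the code" is just an ordinary VCS diff between the import commit and the patch commit, which is exactly the property Pedro is after.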
RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
The problem with bringing the 3rd party software completely into the SVN tree and modifying it in the tree has to do with the license the updated software is under. In that case, there *is* a code provenance issue and I believe it crosses a line that the Apache Software Foundation is unwilling to cross with regard to the integrity of its code bases. The current patches to Boost, for example, do not change the license on the code and preserve the Boost license. But since this is ephemeral and the source is never in the SVN tree (is that correct?) the derivative use disappears at the end of a build. It is sufficient then to include the dependency in the NOTICE for the release and not worry further. Also, the current dependency is several releases behind the current Boost release. This might not matter - the specific Boost libraries that are used might not be affected. But there is a release synchronization issue. A fork would have to be maintained. Also, the dependencies are managed better now, rather than having the entire Boost library installed for cherry picking. (This will all change at some point, since Boost is being incorporated into ISO C++. It is probably best to wait for that to ripple out into the compiler distributions.) - Dennis -Original Message- From: Pedro F. Giffuni [mailto:giffu...@tutopia.com] Sent: Wednesday, September 28, 2011 08:32 To: ooo-dev@incubator.apache.org Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?] FWIW, I don't like the patches because I can't really examine the code well; besides, this is something the VCS handles acceptably: commit the original source code and then apply the patches in a different commit. If we start with up-to-date versions there would not be much trouble. just my $0.02, not an objection. Pedro. --- On Wed, 9/28/11, Jürgen Schmidt jogischm...@googlemail.com wrote: ... I wouldn't give up the patches, as they allow updates to be handled better. 
This would cause a problem, as direct changes to the 3rd party stuff without additional authorization must be prevented (meaning: the source code must not change accidentally, only when the 3rd party code gets an update from upstream), while patch files must still be allowed to be added, removed, or changed, just not the original source code. If that wasn't possible or was too cumbersome, checking in the tarballs in 3rdparty would be better. i also wouldn't give up the patches and for that reason i would like to move forward for now with keeping the tarballs as proposed. But i like the name 3rdparty for the directory and we can later change it from the tarballs to the unpacked code if we see demand for it. At the moment it's just easier to keep the tarballs and focus on other work. As svn users never download the complete history as DSCM users do, the pain of binary files in the repo isn't that hard. In case AOOo moved to a DSCM again later, the tarballs could be moved out again easily. agree, we don't really lose anything, can change if necessary and can continue with our work Juergen
Re: A systematic approach to IP review?
On 19.09.2011 02:27, Rob Weir wrote: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. If you want svn to be the place for the IP review, we have to do it in two steps. There are some cws for post-3.4 that bring in new files. Setting up a branch now to bring them to svn will create additional work now; IMHO this should better be done later. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. see above e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. I assume that you are talking about header files with an MS copyright, not header files generated from e.g. Visual Studio. In my understanding these files should be considered as contributed under the rules of the OOo project, and so now their copyright owner is Oracle. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. IMHO this is the best solution. svn is the place of truth if it comes down to files. The second best solution would be to have one text file per build unit (that would be a gbuild makefile in the new build system) or per module (that would be a sub folder of the sub-repos). The file should be checked into svn. Everything else (spreadsheets or whatsoever) could be generated from that, in case anyone had a need for a spreadsheet with 6 rows containing license information. ;-) Regards, Mathias
RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
The idea (not originally mine) is to keep only compatibly licensed code under an isolated (3rdparty) directory. I think in the long run we should try to use the system versions of such software when available, and every linux/bsd distribution is probably doing that for LO already. Pedro. --- On Wed, 9/28/11, Dennis E. Hamilton dennis.hamil...@acm.org wrote: The problem with bringing the 3rd party software completely into the SVN tree and modifying it in the tree has to do with the license the updated software is under. In that case, there *is* a code provenance issue and I believe it crosses a line that the Apache Software Foundation is unwilling to cross with regard to the integrity of its code bases. The current patches to Boost, for example, do not change the license on the code and preserve the Boost license. But since this is ephemeral and the source is never in the SVN tree (is that correct?) the derivative use disappears at the end of a build. It is sufficient then to include the dependency in the NOTICE for the release and not worry further. Also, the current dependency is several releases behind the current Boost release. This might not matter - the specific Boost libraries that are used might not be affected. But there is a release synchronization issue. A fork would have to be maintained. Also, the dependencies are managed better now, rather than having the entire Boost library installed for cherry picking. (This will all change at some point, since Boost is being incorporated into ISO C++. It is probably best to wait for that to ripple out into the compiler distributions.) - Dennis -Original Message- From: Pedro F. Giffuni [mailto:giffu...@tutopia.com] Sent: Wednesday, September 28, 2011 08:32 To: ooo-dev@incubator.apache.org Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?] 
FWIW, I don't like the patches because I can't really examine the code well; besides, this is something the VCS handles acceptably: commit the original source code and then apply the patches in a different commit. If we start with up-to-date versions there would not be much trouble. just my $0.02, not an objection. Pedro. --- On Wed, 9/28/11, Jürgen Schmidt jogischm...@googlemail.com wrote: ... I wouldn't give up the patches, as they allow updates to be handled better. This would cause a problem, as direct changes to the 3rd party stuff without additional authorization must be prevented (meaning: the source code must not change accidentally, only when the 3rd party code gets an update from upstream), while patch files must still be allowed to be added, removed, or changed, just not the original source code. If that wasn't possible or was too cumbersome, checking in the tarballs in 3rdparty would be better. i also wouldn't give up the patches and for that reason i would like to move forward for now with keeping the tarballs as proposed. But i like the name 3rdparty for the directory and we can later change it from the tarballs to the unpacked code if we see demand for it. At the moment it's just easier to keep the tarballs and focus on other work. As svn users never download the complete history as DSCM users do, the pain of binary files in the repo isn't that hard. In case AOOo moved to a DSCM again later, the tarballs could be moved out again easily. agree, we don't really lose anything, can change if necessary and can continue with our work Juergen
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 28.09.2011 17:32, Pedro F. Giffuni wrote: FWIW; I don't like the patches because I can't really examine well the code, besides this is something the VCS handles acceptably: commit the original sourcecode and then apply the patches in a different commit. If we start with up to date versions there would not be much trouble. if we didn't have many thousands of lines of patches to rebase, then upgrading to less outdated versions wouldn't be such a PITA. sadly in many cases upstreaming patches was never sufficiently high on the priority list to actually get done... -- Dealing with failure is easy: Work hard to improve. Success is also easy to handle: You've solved the wrong problem. Work hard to improve. -- Alan Perlis
RE: A systematic approach to IP review?
It is unlikely that machine-generated files of any kind are copyrightable subject matter. I would think that files generated by Visual Studio should just be regenerated, especially if this has to do with preprocessor pre-compilation, project boiler-plate (and even build/make) files, MIDL-compiled files, resource-compiler output, and the like. (I assume there are no MFC dependencies unless MFC has somehow shown up under VC++ 2008 Express Edition or the corresponding SDK -- I am behind the times. I thought the big issue was ATL.) Meanwhile, I favor what you say about having a file at the folder level of the buildable components. It strikes me as a visible way to ensure that the IP review has been completed and is current. It also has great transparency and accountability since the document is in the SVN itself. It also survives being extracted from the SVN, included in a tar-ball, etc. In short: nice! - Dennis -Original Message- From: Mathias Bauer [mailto:mathias_ba...@gmx.net] Sent: Wednesday, September 28, 2011 04:25 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? On 19.09.2011 02:27, Rob Weir wrote: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. If you want svn to be the place for the IP review, we have to do it in two steps. There are some cws for post-3.4 that bring in new files. Setting up a branch now to bring them to svn will create additional work now that IMHO should better be done later. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. see above e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. 
I assume that you are talking about header files with an MS copyright, not header files generated from e.g. Visual Studio. In my understanding these files should be considered as contributed under the rules of the OOo project, and so now their copyright owner is Oracle. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. IMHO this is the best solution. svn is the place of truth if it comes down to files. The second best solution would be to have one text file per build unit (that would be a gbuild makefile in the new build system) or per module (that would be a sub folder of the sub-repos). The file should be checked into svn. Everything else (spreadsheets or whatsoever) could be generated from that, in case anyone had a need for a spreadsheet with 6 rows containing license information. ;-) Regards, Mathias
Re: A systematic approach to IP review?
On Wed, Sep 28, 2011 at 6:42 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: It is unlikely that machine-generated files of any kind are copyrightable subject matter. I would think that files generated by Visual Studio should just be regenerated, especially if this has to do with preprocessor pre-compilation, project boiler-plate (and even build/make) files, MIDL-compiled files, resource-compiler output, and the like. That is my understanding as well, wrt computer-generated files. However the lack of copyright does not mean lack of concern. For example, some code generation applications have a license that puts additional restrictions on the generated code. Some versions of GNU Bison, the YACC variant, did that. (I assume there are no MFC dependencies unless MFC has somehow shown up under VC++ 2008 Express Edition or the corresponding SDK -- I am behind the times. I thought the big issue was ATL.) Meanwhile, I favor what you say about having a file at the folder level of the buildable components. It strikes me as a visible way to ensure that the IP review has been completed and is current. It also has great transparency and accountability since the document is in the SVN itself. It also survives being extracted from the SVN, included in a tar-ball, etc. In short: nice! - Dennis -Original Message- From: Mathias Bauer [mailto:mathias_ba...@gmx.net] Sent: Wednesday, September 28, 2011 04:25 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? On 19.09.2011 02:27, Rob Weir wrote: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. If you want svn to be the place for the IP review, we have to do it in two steps. There are some cws for post-3.4 that bring in new files. 
Setting up a branch now to bring them to svn will create additional work now; IMHO this should better be done later. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. see above e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. I assume that you are talking about header files with an MS copyright, not header files generated from e.g. Visual Studio. In my understanding these files should be considered as contributed under the rules of the OOo project, and so now their copyright owner is Oracle. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. IMHO this is the best solution. svn is the place of truth if it comes down to files. The second best solution would be to have one text file per build unit (that would be a gbuild makefile in the new build system) or per module (that would be a sub folder of the sub-repos). The file should be checked into svn. Everything else (spreadsheets or whatsoever) could be generated from that, in case anyone had a need for a spreadsheet with 6 rows containing license information. ;-) Regards, Mathias
Re: A systematic approach to IP review?
On Wed, Sep 28, 2011 at 5:42 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: It is unlikely that machine-generated files of any kind are copyrightable subject matter. I'd imagine that Pixar, for instance, would have a problem with that blanket statement... The very existence of this paragraph in the Bison manual: http://www.gnu.org/s/bison/manual/bison.html#Conditions also raises doubt as to the validity of the premise. Norbert
RE: A systematic approach to IP review?
I'll stand by my original statement. I'm not going to get into the Pixar case since it doesn't apply here. The Bison manual may have license conditions on what can be done with the generated artifact, but I suggest that is not about copyrightable subject matter in the artifact. A similar condition would be one in, let's say for a hypothetical case, Visual C++ 2008 Express Edition requiring that generated code be run on Windows. It's not about copyright. And I agree, one must understand license conditions that apply to the tool used to make the generated artifacts. I did neglect to consider that. - Dennis -Original Message- From: Norbert Thiebaud [mailto:nthieb...@gmail.com] Sent: Wednesday, September 28, 2011 16:41 To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org Subject: Re: A systematic approach to IP review? On Wed, Sep 28, 2011 at 5:42 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: It is unlikely that machine-generated files of any kind are copyrightable subject matter. I'd imagine that Pixar, for instance, would have a problem with that blanket statement... The very existence of this paragraph in the Bison manual: http://www.gnu.org/s/bison/manual/bison.html#Conditions also raises doubt as to the validity of the premise. Norbert
Re: A systematic approach to IP review?
--- On Wed, 9/28/11, Norbert Thiebaud wrote: ... On Wed, Sep 28, 2011 at 5:42 PM, Dennis E. Hamilton wrote: It is unlikely that machine-generated files of any kind are copyrightable subject matter. I'd imagine that Pixar, for instance, would have a problem with that blanket statement... The very existence of this paragraph in the Bison manual: http://www.gnu.org/s/bison/manual/bison.html#Conditions also raises doubt as to the validity of the premise. Ugh... I am not a lawyer and I normally prefer not to have to read all that, but OOo requires bison to build, so if that paragraph still applies we should be using yacc instead. Pedro.
Re: A systematic approach to IP review?
On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: I'll stand by my original statement. I'm not going to get into the Pixar case since it doesn't apply here. I did not say it applied to the Visual Studio generated cruft... I merely commented on the blanket assertion that 'computer generated = no copyright'. The Bison manual may have license conditions on what can be done with the generated artifact, but I suggest that is not about copyrightable subject matter in the artifact. Actually it is. The only claim they could legally have _is_ on the generated bits that are substantial pieces of code copied from templates they provide, namely, in the case of a Bison-generated parser, the whole parser skeleton needed to exploit the generated state graph. The whole paragraph is about the copyright disposition of these bits, and in the case of Bison they explicitly grant you a license to use these bits in the 'normal' use case... My point being that the existence of that paragraph also disproves the assertion that 'computer generated = no copyright'. You could write a program that prints itself... the mere fact that it prints itself does not mean you lose the copyright on your program... That being said, I do think you are in the clear with the Visual Studio generated cruft... but not merely because there is 'computer generation' involved. Norbert
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Thu, Sep 22, 2011 at 12:40 AM, Jens-Heiner Rechtien jhrecht...@web.de wrote: On 09/20/2011 05:26 PM, Rob Weir wrote: 2011/9/20 Pavel Janík pa...@janik.cz: Have we ever considered using version control to...uh...manage file versions? Just an idea. Maybe Heiner will say more, but in the past, we have had the external tarballs in the VCS, but then we moved them out and it worked very well. There never was a reason to track external.tar.gz files in VCS, because we do not change them. -- That's fine. If they don't change, then doing an svn update will not bring them down each time. Aside from version control, SVN is also very useful as an audit trail. So in the rare occasions when one of these files does change, we know who changed it and why. This is important for ensuring the IP cleanliness of the project. Is your main concern performance? Even as individual tarballs, ext-sources is 86 files, 250MB. ooo/extras is 243 files and 822 MB. And ooo/main is 76,295 files for over 900MB. So ext-sources is not a huge contributor to download time. Placing all the external tarballs in the VCS is a real killer if using a distributed SCM like git or Mercurial; that's why we had moved them out. As Pavel said, it worked quite nicely. As for the audit possibility, we referenced the external tarballs in the source tree by file name and an md5 checksum, which works just as reliably as putting them directly into the repository. Nowadays the DSCMs have some alternative methods which deal with such blobs, but in essence they also keep them separate. If AOOo ever plans to go back to a DSCM I would keep the source tree and the external blobs strictly separated. All in all, the general SCM tooling community opinion trend seems to be that a S(ource)CM system is for, well, source, and external dependencies are better handled with other mechanisms, like Maven or so. With SVN all this is less of a concern, naturally. 
ok, we have several arguments for and against but no decision on how we want to move forward. Let us take another look at it:

1. we have a working mechanism to get the externals from somewhere, check the md5 sum, unpack, patch, build
1.1 "somewhere" is configurable during the configure step; initially the externals are downloaded from http://hg.services.openoffice.org/binaries
2. having the externals in the repository (SVN) won't be a big issue because on checkout only the tip version is downloaded
2.1 the SCM can be used to track the used version of the externals for a specific OO version - simply check out the version tag and everything is in place ...
3. in a DSCM it would be a real problem over time because of the increasing space taken by all versions
4. we need a replacement for http://hg.services.openoffice.org/binaries asap (who knows how long the server will be available)
5. many developers probably work with a local clone of the repository, using for example git svn or something else - a disadvantage of increasing space, but probably acceptable if a clean local trunk is kept and updated

Proposed way to move forward:

1. put the externals under .../trunk/ext_sources:
   .../trunk/ext_sources
   .../trunk/main
   .../trunk/extras
2. adapt configure to use this as the default and disable the download (maybe reactivate it later if we move to a DSCM)
3. keep the process of checking the md5 sum as it is (for potential later use)

Any opinions or suggestions? Juergen
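Point 3 of the proposal, verifying the md5 sum before unpacking, amounts to something like the following sketch; the tarball name is a placeholder and the expected checksum is computed on the spot here, whereas in the real build it is recorded alongside the tarball reference:

```shell
# Sketch of the md5-before-unpack step. TARBALL and the way EXPECTED_MD5
# is obtained are illustrative assumptions, not the project's actual config.
TARBALL=libfoo-1.0.tar.gz
printf 'stand-in tarball contents\n' > "$TARBALL"
EXPECTED_MD5=$(md5sum "$TARBALL" | cut -d' ' -f1)   # normally read from the build configuration

ACTUAL_MD5=$(md5sum "$TARBALL" | cut -d' ' -f1)
if [ "$ACTUAL_MD5" = "$EXPECTED_MD5" ]; then
    echo "md5 OK, unpacking $TARBALL"
    # tar xzf "$TARBALL"  # then apply patches and build
else
    echo "md5 mismatch for $TARBALL" >&2
    exit 1
fi
```

Keeping this check even after the tarballs move into SVN costs nothing and guards against a corrupted checkout or download, which is why point 3 retains it "for potential later use".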
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Proposed way to move forward 1. put the externals under .../trunk/ext_sources .../trunk/ext_sources .../trunk/main .../trunk/extras 2. adapt configure to use this as default, disable the download (maybe reactivate it later if we move to a DSCM) 3. keep the process with checking the md5 sum as it is (for potential later use) Any opinions or suggestions? +1. And one more question: if we put something into SVN under .../trunk/ext_sources, do we have some URL that can replace http://hg so users don't have to check out everything? I.e., do we have a URL where we have a real checkout of the SVN? Some SVN web interface? Don't know Apache infra well yet... That would be a real killer solution! -- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 22.09.2011 13:19, Jürgen Schmidt wrote: On Thu, Sep 22, 2011 at 12:40 AM, Jens-Heiner Rechtien jhrecht...@web.de wrote: On 09/20/2011 05:26 PM, Rob Weir wrote: ... Placing all the external tarballs in the VCS is a real killer if using a distributed SCM like git or Mercurial; that's why we had moved them out. As Pavel said, it worked quite nicely. As for the audit possibility, we referenced the external tarballs in the source tree by file name and an md5 checksum, which works just as reliably as putting them directly into the repository. Nowadays the DSCMs have some alternative methods which deal with such blobs, but in essence they also keep them separate. If AOOo ever plans to go back to a DSCM I would keep the source tree and the external blobs strictly separated. All in all, the general SCM tooling community opinion trend seems to be that a S(ource)CM system is for, well, source, and external dependencies are better handled with other mechanisms, like Maven or so. With SVN all this is less of a concern, naturally. ok, we have several arguments for and against but no decision on how we want to move forward. Let us take another look at it 1. we have a working mechanism to get the externals from somewhere, check the md5 sum, unpack, patch, build 1.1 somewhere is configurable during the configure step, initially the externals are downloaded from http://hg.services.openoffice.org/binaries 2. having the externals in the repository (SVN) won't be a big issue because in case of a checkout always the tip version is downloaded 2.1 the SCM can be used to track the used version of the externals for a specific OO version - simply checkout the version tag and everything is in place ... 3. in a DSCM it would be a real problem over time because of the increasing space of all versions 4. we need a replacement for http://hg.services.openoffice.org/binaries asap (who knows how long the server will be available) 5. 
many developers probably work with a local clone of the repository using for example git svn or something else - disadvantage of the increasing space but probably acceptable if a clean local trunk will be kept and updated Proposed way to move forward 1. put the externals under .../trunk/ext_sources .../trunk/ext_sources .../trunk/main .../trunk/extras 2. adapt configure to use this as default, disable the download (maybe reactivate it later if we move to a DSCM) 3. keep the process with checking the md5 sum as it is (for potential later use) Any opinions or suggestions? +1 Best current solution: Added to SVN where it does not really matter, and a way to get back when we may change to a DSCM in the future. Juergen sincerely, Armin -- ALG
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/9/22 Pavel Janík pa...@janik.cz Proposed way to move forward 1. put the externals under .../trunk/ext_sources .../trunk/ext_sources .../trunk/main .../trunk/extras 2. adapt configure to use this as default, disable the download (maybe reactivate it later if we move to a DSCM) 3. keep the process with checking the md5 sum as it is (for potential later use) Any opinions or suggestions? +1. And one more question: If we put something into SVN into .../trunk/ext_sources, do we have some URL that can replace http://hg so users don't have to check out everything? Ie. do we have a URL where we have real checkout of the SVN? Some SVN web interface? Don't know Apache infra well yet... That would be real killer solution! don't know if it is what you are looking for, but wget http://svn.apache.org/viewvc/incubator/ooo/trunk/main/<filename>?view=co should download the head version. Juergen -- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
don't know if it is what you are looking for, but wget http://svn.apache.org/viewvc/incubator/ooo/trunk/main/<filename>?view=co should download the head version. Then we should be able to have both things solved - files in SVN and, with a relatively small change in the download script, also the remote fetching of the files if we do not have an ext_sources local checkout. -- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/9/22 Pavel Janík pa...@janik.cz: Proposed way to move forward 1. put the externals under .../trunk/ext_sources .../trunk/ext_sources .../trunk/main .../trunk/extras 2. adapt configure to use this as default, disable the download (maybe reactivate it later if we move to a DSCM) 3. keep the process with checking the md5 sum as it is (for potential later use) Any opinions or suggestions? +1. And one more question: If we put something into SVN into .../trunk/ext_sources, do we have some URL that can replace http://hg so users don't have to check out everything? Ie. do we have a URL where we have real checkout of the SVN? Some SVN web interface? Don't know Apache infra well yet... That would be real killer solution! -- I was thinking something similar. We only need to use the SVN interface to the files when we're adding or updating. But we can have bootstrap continue to download via http. The location, using Juergen's proposed location, would be http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources This would save having a duplicate local SVN working copy of the file, right? -Rob
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir robw...@apache.org wrote: I was thinking something similar. We only need to use the SVN interface to the files when we're adding or updating. But we can have bootstrap continue to download via http. The location, using Juergen's proposed location, would be http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources yes, this is the correct URL, the URL that i have posted wouldn't work Juergen This would save having a duplicate local SVN working copy of the file, right? -Rob
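The download-script change Pavel and Rob describe (prefer a local ext_sources checkout, fall back to fetching over http from the SVN server) might look like this sketch. The base URL is the one Rob quoted; the fetch_external helper, the example tarball name, and the ext_sources layout are assumptions, and the actual wget call is left as a comment so the sketch runs offline:

```shell
# Sketch: use the local checkout if present, otherwise fall back to http.
# fetch_external and the tarball name below are invented for illustration.
EXT_SOURCES_URL="http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources"

fetch_external() {
    name=$1
    if [ -f "ext_sources/$name" ]; then
        echo "using local copy: ext_sources/$name"
    else
        echo "would fetch: $EXT_SOURCES_URL/$name"
        # wget -P ext_sources "$EXT_SOURCES_URL/$name"
    fi
}

fetch_external sample-ext.tar.gz
```

This keeps the existing bootstrap flow intact for developers who skip the ext_sources checkout, while developers with a full trunk checkout never touch the network.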
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/9/22 Jürgen Schmidt jogischm...@googlemail.com On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir robw...@apache.org wrote: I was thinking something similar. We only need to use the SVN interface to the files when we're adding or updating. But we can have bootstrap continue to download via http. The location, using Juergen's proposed location, would be http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources yes, this is the correct URL, the URL that i have posted wouldn't work Juergen This would save having a duplicate local SVN working copy of the file, right? mmh, no or i understand something wrong. People checkout .../trunk and would get ext_sources, main and extras. To benefit from the modified script we have to put ext_sources besides trunk .../ooo/ext_sources .../ooo/trunk/main .../ooo/trunk/extras Means back to my initial proposal, right? Juergen
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/9/22 Jürgen Schmidt jogischm...@googlemail.com: 2011/9/22 Jürgen Schmidt jogischm...@googlemail.com On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir robw...@apache.org wrote: I was thinking something similar. We only need to use the SVN interface to the files when we're adding or updating. But we can have bootstrap continue to download via http. The location, using Juergen's proposed location, would be http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources

yes, this is the correct URL, the URL that I have posted wouldn't work Juergen

This would save having a duplicate local SVN working copy of the file, right?

mmh, no, or I understand something wrong. People check out .../trunk and would get ext_sources, main and extras. To benefit from the modified script we have to put ext_sources beside trunk:
.../ooo/ext_sources
.../ooo/trunk/main
.../ooo/trunk/extras
Means back to my initial proposal, right?

I think the idea is this: Everything under ooo represents what goes into a release. It can be tagged and branched. trunk/ is a peer to a tags/ and branches/ directory. It is possible that we have this wrong. Adding in site/ and ooo-site/ brings in a different convention. They are set up to have trunk/tags/branches underneath them. That is fine, because the website does not release in sync with an OOo release. It makes sense for them to be able to tag and branch independently. We should also consider how the project grows going forward. We know that other code bases will be checked in, like Symphony. And there are other small but disjoint contributions that I'm working on as well. So it might make sense to move trunk down one level:
/ooo/ooo-src/trunk/main
/ooo/ooo-src/trunk/extras
/ooo/ooo-src/trunk/ext-sources
/ooo/ooo-src/tags
/ooo/ooo-src/branches
That would make more sense as a unit, since we would want to tag across all of /ooo/ooo-src/ to define a release. I assume a developer still just checks out ooo/ooo-src/trunk/main. If they need the additional extras then they check that out separately. I don't think most users will want to check out the entire trunk all the time. We should consider also how we want this tree to grow over time, as other related

In the end, I think we want to preserve the ability to:
1) Preserve an audit trail of all changes that went into a release
2) Be able to tag and branch a release and everything that is in the release
3) Restore the exact state of a previous tagged release, including the exact ext-sources used in that release

I'm certain that my proposal will enable this. There may be other approaches that do as well. Another thing to keep in mind is the SVN support for externals: http://svnbook.red-bean.com/en/1.0/ch07s03.html This might make some things easier.

-Rob

Juergen
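For reference, the mechanism in the linked SVN book chapter works via an svn:externals property set on a directory. A hypothetical entry (the revision number and paths are illustrative only, not a decided layout) that pins ext-sources so a tagged release can recreate the exact externals it was built with:

```
ext-sources -r1170000 http://svn.apache.org/repos/asf/incubator/ooo/ext_sources
```

Checking out the directory carrying that property would then also fetch ext-sources at the pinned revision, which addresses the reproducibility point in 3) above.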
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
hi,

Based on this result, another trunk will look like the following if IBM Symphony is checked in:
/ooo/symphony-src/trunk/main
/ooo/symphony-src/trunk/extras
/ooo/symphony-src/tags
/ooo/symphony-src/branches

Thus it introduces a problem: how do we merge the two trunks of symphony-src and ooo-src?

thanks

mail: zhaos...@cn.ibm.com
Address: 2/F, Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No.8, Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193, P.R.China

Rob Weir robw...@apache.org, 2011-09-22 21:18, to ooo-dev@incubator.apache.org, Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Thu, Sep 22, 2011 at 3:18 PM, Rob Weir robw...@apache.org wrote: It is possible that we have this wrong. Adding in site/ and ooo-site/ brings in a different convention. They are set up to have trunk/tags/branches underneath them. That is fine, because the website does not release in sync with an OOo release. It makes sense for them to be able to tag and branch independently.

agree

We should also consider how the project grows going forward. We know that other code bases will be checked in, like Symphony. And there are other small but disjoint contributions that I'm working on as well. So it might make sense to move trunk down one level:
/ooo/ooo-src/trunk/main
/ooo/ooo-src/trunk/extras
/ooo/ooo-src/trunk/ext-sources
/ooo/ooo-src/tags
/ooo/ooo-src/branches
That would make more sense as a unit, since we would want to tag across all of /ooo/ooo-src/ to define a release.

agree, from this perspective it makes sense. The question then is when do we want to introduce this further level?

I assume a developer still just checks out ooo/ooo-src/trunk/main. If they need the additional extras then they check that out separately. I don't think most users will want to check out the entire trunk all the time. We should consider also how we want this tree to grow over time, as other related

I assumed that a developer will check out trunk, maybe a wrong assumption.

In the end, I think we want to preserve the ability to: 1) Preserve an audit trail of all changes that went into a release 2) Be able to tag and branch a release and everything that is in the release 3) Restore the exact state of a previous tagged release, including the exact ext-sources used in that release. I'm certain that my proposal will enable this. There may be other approaches that do as well.

I think so too. And with my changed mindset to not always check out trunk completely, I am fine with this approach.

Another thing to keep in mind is the SVN support for externals: http://svnbook.red-bean.com/en/1.0/ch07s03.html

interesting, I didn't know that before

Juergen
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Thu, Sep 22, 2011 at 9:40 AM, Shao Zhi Zhao zhaos...@cn.ibm.com wrote: hi, Based on this result, another trunk will look like the following if IBM Symphony is checked in:
/ooo/symphony-src/trunk/main
/ooo/symphony-src/trunk/extras
/ooo/symphony-src/tags
/ooo/symphony-src/branches
Thus it introduces a problem: how do we merge the two trunks of symphony-src and ooo-src?

I don't think moving the tree down one level introduces any new problems for Symphony, so long as the directories within */main remain the same. Of course, merging code from Symphony into AOOo will be difficult in general. The problem is how do we establish a common ancestor revision to do a 3-way merge with? This will really depend on whether Symphony has a good record of what the corresponding OOo revision was for each of its initial files. If not, then you can do a text diff and do some merging without trouble. But dealing with renamed files, or moved files, or deleted files, these are trickier to process automatically. If you don't have that history, then in theory it could be reestablished by taking the initial revision of each file in Symphony and comparing it to each revision of the same file in OOo Mercurial, and finding which revision matches. It might be possible to establish enough context for a 3-way merge that way.

-Rob
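The fallback Rob sketches — comparing a Symphony file against each historical revision of the same OOo file and picking the closest match as the likely common ancestor — could be roughed out with a plain similarity ratio. This is an illustration only: the revision texts here are toy strings, and a real tool would pull them from the Mercurial history.

```python
import difflib

def best_matching_revision(target_text, revisions):
    """revisions: mapping of revision id -> file text at that revision.
    Return the revision whose text is most similar to target_text."""
    def similarity(text):
        return difflib.SequenceMatcher(None, target_text, text).ratio()
    return max(revisions, key=lambda rev: similarity(revisions[rev]))

# Toy example: three fake revisions of one file, and a Symphony copy
# that was branched from r2 and then locally modified.
revisions = {
    "r1": "alpha\nbeta\n",
    "r2": "alpha\nbeta\ngamma\n",
    "r3": "alpha\nbeta\ngamma\ndelta\nepsilon\n",
}
symphony_copy = "alpha\nbeta\ngamma\nsymphony-tweak\n"
print(best_matching_revision(symphony_copy, revisions))  # -> "r2"
```

The best match would then serve as the base of the 3-way merge; in practice one would also need rename detection, which a raw similarity score does not give you.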
RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
You can get anything off of the web interface of SVN at the individual-file level without it being in a working copy, though of course it has to be somewhere local while it is being processed in a build. But if you check out the trunk, you get everything that is in the trunk HEAD (or a specified version). As far as I know, you can do a checkout anywhere deeper in the tree and avoid everything not at that node [and below]. For example, just check out trunk/main. It takes some consideration of SVN organization to have the desired flavors in convenient chunks that people can work with without having to eat the whole thing (with regard to SVN checkout, SVN update and, of course, SVN commits). I can testify that an SVN UPDATE of the working copy of the entire incubator/ooo/ subtree is a painful experience, even when there is nothing to update.

- Dennis

PS: I find it an interesting characteristic of SVN that trunk, tags, and branches are just names of folders and don't mean anything special to SVN. The nomenclature and its use is a matter of custom, like code indentation rules for { ... }.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 09/20/2011 05:26 PM, Rob Weir wrote: 2011/9/20 Pavel Janík pa...@janik.cz: Have we ever considered using version control to...uh...manage file versions? Just an idea. Maybe Heiner will say more, but in the past, we have had the external tarballs in the VCS, but then we moved them out and it worked very well. There never was a reason to track external .tar.gz files in VCS, because we do not change them. --

That's fine. If they don't change, then doing an svn update will not bring them down each time. Aside from being useful for version control, SVN is also very useful as an audit trail. So on the rare occasions when one of these files does change, we know who changed it and why. This is important for ensuring the IP cleanliness of the project. Is your main concern performance? Even as individual tarballs, ext-sources is 86 files, 250 MB. ooo/extras is 243 files and 822 MB. And ooo/main is 76,295 files for over 900 MB. So ext-sources is not a huge contributor to download time.

Placing all the external tarballs in the VCS is a real killer if using a distributed SCM like git or Mercurial; that's why we had moved them out. As Pavel said, it worked quite nicely. As for the audit possibility, we referenced the external tarballs in the source tree by file name and an md5 checksum, which works just as reliably as putting them directly into the repository. Nowadays the DSCMs have some alternative methods which deal with such blobs, but in essence they also keep them separate. If AOOo ever plans to go back to a DSCM I would keep the source tree and the external blobs strictly separated. All in all, the general SCM tooling community opinion seems to be that a S(ource)CM system is for, well, source, and external dependencies are better handled with other mechanisms, like Maven or so. With SVN all this is less of a concern, naturally.

Heiner
--
Jens-Heiner Rechtien
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 7:05 PM, Rob Weir robw...@apache.org wrote: On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 04:47 PM, schrieb Rob Weir: On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)marcus.m...@wtnet.de wrote: Am 09/19/2011 01:59 PM, schrieb Rob Weir: 2011/9/19 Jürgen Schmidtjogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.org wrote: If you haven't looked it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. 
do you mean to check in the files under ext_source into svn and remove them later on when we have cleaned up the code? Or do you mean to put them somewhere on Apache Extras? I would prefer to save these binary files under Apache Extras if possible. Why not just keep it in SVN? Moving things to Apache-Extras does not help us with the IP review. In other words, if we have a dependency on an OSS module that has an incompatible license, then moving that module to Apache Extras does not make that dependency go away. We still need to understand the nature of the dependency: a build tool, a dynamic runtime dependency, a statically linked library, an optional extension, a necessary core module. If we find out, for example, that something in ext-sources is only used as a build tool, and is not part of the release, then there is nothing that prevents us from hosting it in SVN. But if something is a necessary library and it is under GPL, then this is a problem even if we store it on Apache-Extras. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 3) Files that Oracle include in their SGA need to have the Apache license header inserted and the Sun/Oracle copyright migrated to the NOTICE file. Apache RAT (Release Audit Tool) [2] can be used to automate parts of this. 4) Once the SGA files have the Apache headers, then we can make regular use of RAT to report on files that are lacking an Apache header. Such files might be in one of the following categories:
a) Files that Oracle owns the copyright on and which should be included in an amended SGA
b) Files that have a compatible OSS license which we are permitted to use. This might require that we add a mention of it to the NOTICE file.
c) Files that have an incompatible OSS license. These need to be removed/replaced.
d) Files that have an OSS license that has not yet been reviewed/categorized by Apache legal affairs. In that case we need to bring it to their attention.
e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed.
5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be:
a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. This should be reflected in headers as well, but this is not always possible. For example, we might have binary files where we cannot add headers, or cases where the OSS files do not have headers, but where we can prove their provenance via other means.
b) Track this in a spreadsheet, one row per file.
c) Track this in a text log file checked into SVN
d) Track this in an annotated script
Re: A systematic approach to IP review?
So... has anyone actually run Apache RAT yet? It has a scan only mode which I'd think would be the simplest place to start. Personally, I'd recommend working on basic RAT scans, with the scripts to run them and any exception rules (for known files, etc.) all checked into SVN with the build tools for the code. But hey, it's easy for me to suggest we do stuff, when I only currently have time to be a mentor and thus can get away with just making suggestions. 8-) I like the general concept of storing the IP type for files in SVN properties; although properties are easy to change, Apache does have a strong history of being able to provide oversight for commit logs throughout a project's history. - Shane
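A basic scan of the kind Shane describes can be approximated with a trivial header check. This is only a rough stand-in for RAT, which recognizes many license families, binary formats, and exclusion rules; the marker string is the opening phrase of the standard Apache header.

```python
import os

APACHE_MARKER = "Licensed to the Apache Software Foundation (ASF)"

def scan_tree(root):
    """Return the files under root that lack an Apache license header."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    head = f.read(2048)  # the header appears near the top
            except OSError:
                continue  # unreadable files are skipped in this sketch
            if APACHE_MARKER not in head:
                missing.append(path)
    return missing
```

Files such a scan flags would then fall into the categories Rob enumerated earlier (amended SGA, compatible OSS license, incompatible license, unreviewed license, or non-OSS).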
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 1:59 PM, Rob Weir robw...@apache.org wrote: 2011/9/19 Jürgen Schmidt jogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: If you haven't looked it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. do you mean to check in the files under ext_source into svn and remove it later on when we have cleaned up the code. Or do you mean to put it somehwere on apache extras? I would prefer to save these binary files under apache extra if possible. Why not just keep in in SVN? 
Moving things to Apache-Extras does not help us with the IP review. In other words, if we have a dependency on an OSS module that has an incompatible license, then moving that module to Apache Extras does not make that dependency go away. We still need to understand the nature of the dependency: a build tool, a dynamic runtime dependency, a statically linked library, an optional extension, a necessary core module. If we find out, for example, that something in ext-sources is only used as a build tool, and is not part of the release, then there is nothing that prevents us from hosting it in SVN. But if something is a necessary library and it is under GPL, then this is a problem even if we store it on Apache-Extras.

I am not really happy with all the binaries in the trunk tree because of the large binary blobs, and I don't expect too many changes to these dependencies. And I would like to avoid checking them out every time. What do others think about a structure where we have ext_sources beside trunk?
incubator/ooo/trunk
incubator/ooo/ext_source
...
If we can agree on such a structure I would move forward to bring in some new external sources: the proposed ucpp preprocessor - BSD license, used in idlc and of course part of the SDK later on. I made some tests with it and was able to build the sources on Windows in our Cygwin environment with a new GNU make file. I was also able to build udkapi and offapi with this new and adapted idlc/ucpp without any problems - the generated type library is equal to the old one. I have to run some more tests on other platforms as soon as I have other platforms available for testing. I decided to replace the preprocessor instead of removing it for compatibility reasons, and it was of course the easier change. The next step is to check how the process with ext_sources works in detail in our build process and adapt the new ucpp module. If anybody is familiar with ext_sources and can point me to potential hurdles, please let me know (on a new thread) ;-)

Juergen
Re: A systematic approach to IP review?
On Tue, Sep 20, 2011 at 2:34 PM, Shane Curcuru a...@shanecurcuru.org wrote: So... has anyone actually run Apache RAT yet? It has a scan only mode which I'd think would be the simplest place to start.

It's on my todo list to take a look at it; probably I will come back with questions.

Juergen

Personally, I'd recommend working on basic RAT scans, with the scripts to run them and any exception rules (for known files, etc.) all checked into SVN with the build tools for the code. But hey, it's easy for me to suggest we do stuff, when I only currently have time to be a mentor and thus can get away with just making suggestions. 8-) I like the general concept of storing the IP type for files in SVN properties; although properties are easy to change, Apache does have a strong history of being able to provide oversight for commit logs throughout a project's history.

- Shane
handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Hi, On 20.09.2011 14:37, Jürgen Schmidt wrote: On Mon, Sep 19, 2011 at 1:59 PM, Rob Weirrobw...@apache.org wrote: 2011/9/19 Jürgen Schmidtjogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.org wrote: ... Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. do you mean to check in the files under ext_source into svn and remove it later on when we have cleaned up the code. Or do you mean to put it somehwere on apache extras? I would prefer to save these binary files under apache extra if possible. Why not just keep in in SVN? Moving things to Apache-Extras does not help us with the IP review. In other words, if we have a dependency on a OSS module that has an incompatible license, then moving that module to Apache Extras does not make that dependency go away. We still need to understand the nature of the dependency: a build tool, a dynamic runtime dependency, a statically linked library, an optional extensions, a necessary core module. If we find out, for example, that something in ext-sources is only used as a build tool, and is not part of the release, then there is nothing that prevents us from hosting it in SVN. But if something is a necessary library and it is under GPL, then this is a problem even if we store it on Apache-Extras, i am not really happy with all the binaries in the trunk tree because of the large binary blobs and i don't expect too many changes of these dependencies. And i would like to avoid to check them out every time. What do others think about a structure where we have ext_sources besides trunk. incubator/ooo/trunk incubator/ooo/ext_source ... I like this idea. 
From a developer point of view I only have to check out ext_sources once and reference it from all my trunks using the already existing configure switch '--with-external-tar=<path to ext_sources>'.

Best regards, Oliver.
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Hi, I like this idea. From a developer point of view I only have to checkout ext_sources once and reference it from all my trunks using the already existing configure-switch 'with-external-tar=path to ext_sources'

When we have such a repository, we will surely modify the current sources so that you don't have to add such a switch, because ../ext_sources will be checked automatically.

BTW - welcome! :-)

-- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On Tue, Sep 20, 2011 at 9:48 AM, Armin Le Grand armin.le.gr...@me.com wrote: On 20.09.2011 15:33, Oliver-Rainer Wittmann wrote: Hi, On 20.09.2011 14:37, Jürgen Schmidt wrote: ... What do others think about a structure where we have ext_sources besides trunk. incubator/ooo/trunk incubator/ooo/ext_source ... So are we saying we would never need to branch or tag these files? For example, suppose we release AOOo 3.4.0, and then later we release AOOo 4.0. Then someone finds a serious security flaw in AOOo 3.4.0, and we decide to release an AOOo 3.4.1 as well as an AOOo 4.0.1. Would we be able to do this? What if the flaw was related to code in ext_sources? And if not us, in the project, what if some downstream consumer of AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever. But we've already updated ext_sources for AOOo 4.0? In other words, how do we track, in SVN, a compatible set of matching trunk/ and ext_source/ revisions, so we (or someone else) can recreate any released version of AOOo? -Rob I like this idea. From a developer point of view I only have to check out ext_sources once and reference it from all my trunks using the already existing configure switch 'with-external-tar=path to ext_sources' +1 Also, hopefully ext_sources will not change too much (after a consolidation phase) and it's mostly binaries, thus not too well suited for a repository. Let's not extend our main repository with those binaries, please. Best regards, Oliver. Regards, Armin -- ALG
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Would we be able to do this? What if the flaw was related to code in ext_sources? Then we patch it. Patch will be in the trunk/main, as always. And if not us, in the project, what if some downstream consumer of AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever. But we've already updated ext_sources for AOOo 4.0? Versions - we can and will have more tarballs of one external source. This all is already solved. -- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
On 20.09.2011 15:58, Rob Weir wrote: On Tue, Sep 20, 2011 at 9:48 AM, Armin Le Grand armin.le.gr...@me.com wrote: On 20.09.2011 15:33, Oliver-Rainer Wittmann wrote: Hi, On 20.09.2011 14:37, Jürgen Schmidt wrote: ... What do others think about a structure where we have ext_sources besides trunk. incubator/ooo/trunk incubator/ooo/ext_source ... So are we saying we would never need to branch or tag these files? For example, suppose we release AOOo 3.4.0, and then later we release AOOo 4.0. Then someone finds a serious security flaw in AOOo 3.4.0, and we decide to release an AOOo 3.4.1 as well as an AOOo 4.0.1. Would we be able to do this? What if the flaw was related to code in ext_sources? And if not us, in the project, what if some downstream consumer of AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever. But we've already updated ext_sources for AOOo 4.0? In other words, how do we track, in SVN, a compatible set of matching trunk/ and ext_source/ revisions, so we (or someone else) can recreate any released version of AOOo? Good point. Thus, it should be part of incubator/ooo/trunk, something like: incubator/ooo/trunk/main incubator/ooo/trunk/extras incubator/ooo/trunk/ext_sources It could be in its own repo, but this would just bring up the risk of not using the same tags in both (on purpose or by error). Indeed, looks as if it has to be a part of trunk somehow. Not very nice for binaries. Maybe we could find an intermediate place for them as long as we will need to do changes pretty often. Currently we will have to do some add/remove/changes to it. It could be good to add them to trunk after it has stabilized a little more. -Rob I like this idea. 
From a developer point of view I only have to checkout ext_sources once and reference it from all my trunks using the already existing configure-switch 'with-external-tar=path to ext_sources' +1 Also, hopefully ext_sources will not change too much (after a consolidation phase) and it's mostly binaries, thus not too well suited for a repository. Let's not extend our main repository with those binaries, please. Best regards, Oliver. Regards, Armin -- ALG
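Armin's layout also answers Rob's tagging question: if ext_sources sits under trunk, a single tag captures main, extras, and the external tarballs at matching revisions. A sketch, with an illustrative tag name and repository URLs:

```shell
# tag main, extras and ext_sources in one atomic copy, so a release
# (e.g. AOOo 3.4.0) can be rebuilt later from one coherent tag
# (tag name and URLs are illustrative, not confirmed project conventions)
svn copy https://svn.apache.org/repos/asf/incubator/ooo/trunk \
         https://svn.apache.org/repos/asf/incubator/ooo/tags/AOO340 \
         -m "Tag AOOo 3.4.0 with matching ext_sources"
```

Because `svn copy` is cheap (copy-on-write), tagging the binaries along with the code costs almost nothing in repository space.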
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
+1 - This will make it easier to update the BSD/MIT unrestricted stuff. - Hopefully it also means we will eventually stop depending on GNU patch for the build. Welcome Oliver! Great job Juergen: it's the first code replacement and a very necessary one for OO forks too (unless they want to carry lcc's copyright;) ). cheers, Pedro. On Tue, 20 Sep 2011 15:44:59 +0200, Pavel Janík pa...@janik.cz wrote: Hi, I like this idea. From a developer point of view I only have to checkout ext_sources once and reference it from all my trunks using the already existing configure-switch 'with-external-tar=path to ext_sources' when we will have such repository, we will surely modify the current sources so you don't have to add such switch because ../ext_sources will be auto-checked. BTW - welcome! :-)
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
Have we ever considered using version control to...uh...manage file versions? Just an idea. Maybe Heiner will say more, but in the past, we have had the external tarballs in the VCS, but then we moved them out and it worked very well. There never was a reason to track external.tar.gz files in VCS, because we do not change them. -- Pavel Janík
Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]
2011/9/20 Pavel Janík pa...@janik.cz: Have we ever considered using version control to...uh...manage file versions? Just an idea. Maybe Heiner will say more, but in the past, we have had the external tarballs in the VCS, but then we moved them out and it worked very well. There never was a reason to track external.tar.gz files in VCS, because we do not change them. -- That's fine. If they don't change, then doing an svn update will not bring them down each time. Aside from being useful for version control, SVN is also very useful as an audit trail. So on the rare occasions when one of these files does change, we know who changed it and why. This is important for ensuring the IP cleanliness of the project. Is your main concern performance? Even as individual tarballs, ext-sources is 86 files, 250MB. ooo/extras is 243 files and 822 MB. And ooo/main is 76,295 files for over 900MB. So ext-sources is not a huge contributor to download time.
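Rob's audit-trail point is easy to see in practice; a sketch (the repository path and tarball name are illustrative):

```shell
# the audit trail for a changed tarball: who changed it, when, and why
# (repository path and file name are illustrative)
svn log -v https://svn.apache.org/repos/asf/incubator/ooo/ext_sources/icu-4.0.1.tar.gz
```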
Re: A systematic approach to IP review?
2011/9/20 Jürgen Schmidt jogischm...@googlemail.com: On Tue, Sep 20, 2011 at 2:34 PM, Shane Curcuru a...@shanecurcuru.org wrote: So... has anyone actually run Apache RAT yet? It has a scan only mode which I'd think would be the simplest place to start. it's on my todo list to take a look at it, probably I will come back with questions I did a run earlier today. Good news is we have 4 files with Apache license. Bad news is we have 52,876 files with unknown license. In most cases that should just be the standard OOo header. These scans will be much more useful after we've replaced the OOo headers with Apache headers. But we can't just do a global change. We should only make that change for files that are in the official Oracle SGA. After that is done, then the RAT report will be more useful. Juergen Personally, I'd recommend working on basic RAT scans, with the scripts to run them and any exception rules (for known files, etc.) all checked into SVN with the build tools for the code. But hey, it's easy for me to suggest we do stuff, when I only currently have time to be a mentor and thus can get away with just making suggestions. 8-) I like the general concept of storing the IP type for files in SVN properties; although properties are easy to change, Apache does have a strong history of being able to provide oversight for commit logs throughout a project's history. - Shane
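RAT itself is a Java tool, but the heart of the scan Juergen and Rob describe is just "which files lack the Apache header". A minimal self-contained sketch of that check, with sample files made up for illustration:

```shell
# rough sketch of what a header scan checks (RAT itself is a Java tool);
# build a tiny sample tree, then list files missing the Apache header
mkdir -p demo/src
printf '/* Licensed to the Apache Software Foundation (ASF) ... */\n' > demo/src/with_header.c
printf '/* no license header here */\n' > demo/src/without_header.c

# grep -L prints the names of files that do NOT contain the pattern
grep -rL 'Licensed to the Apache Software Foundation' demo/src
# -> demo/src/without_header.c
```

A real run would of course point at the source tree and feed the result into the exception rules Shane suggests checking into SVN.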
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 3:34 AM, Pedro Giffuni giffu...@tutopia.com wrote: Hi; Is there an updated SGA already? good question and where can we find it? Juergen I think there will likely be a set of files of uncertain license that we should move to apache-extras. I am referring specifically to the dictionaries: Oracle might have ownership of some but not all. I propose we rescue myspell in apache-extras and put the dictionaries there to keep it as an alternative. I have no idea where to get MySpell though. While we're here, if there's still interest in maintaining the Hg history, bitbucket.org seems to be a nice alternative: it's rather specialized in Mercurial. Cheers, Pedro. On Sun, 18 Sep 2011 20:27:05 -0400, Rob Weir robw...@apache.org wrote: If you haven't looked it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. 
Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 3) Files that Oracle include in their SGA need to have the Apache license header inserted and the Sun/Oracle copyright migrated to the NOTICE file. Apache RAT (Release Audit Tool) [2] can be used to automate parts of this. 4) Once the SGA files have the Apache headers, then we can make regular use of RAT to report on files that are lacking an Apache header. Such files might be in one of the following categories: a) Files that Oracle owns the copyright on and which should be included in an amended SGA b) Files that have a compatible OSS license which we are permitted to use. This might require that we add a mention of it to the NOTICE file. c) Files that have an incompatible OSS license. These need to be removed/replaced. d) Files that have an OSS license that has not yet been reviewed/categorized by Apache legal affairs. In that case we need to bring it to their attention. e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. This should be reflected in headers as well, but this is not always possible. For example, we might have binary files where we cannot add headers, or cases where the OSS files do not have headers, but where we can prove their provenance via other means. b) Track this in a spreadsheet, one row per file. 
c) Track this in a text log file checked into SVN d) Track this in an annotated script that runs RAT, where the annotations document the reason for cases where we tell it to ignore a file or directory. 6) Iterate until we have a clean RAT report. 7) Goal should be for anyone today to be able to see what work remains for IP clearance, as well as for someone 5 years from now to be able to tell what we did. Tracking this on the community wiki is probably not good enough, since we've previously talked about dropping that wiki and going to MWiki. -Rob [1] http://incubator.apache.org/projects/openofficeorg.html [2] http://incubator.apache.org/rat/
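Option (c) above can be as simple as a delimited text file kept under version control; a sketch, with file names and categories made up purely for illustration:

```shell
# sketch of option (c): a tab-separated review log checked into SVN
# (file names and category values below are made up for illustration)
printf '%s\t%s\t%s\n' 'main/sw/source/core/doc.cxx' 'sga' 'Oracle SGA; Apache header added'  > ip-review.log
printf '%s\t%s\t%s\n' 'ext_sources/boost.tar.gz'    'bsd' 'Boost license; noted in NOTICE'  >> ip-review.log
printf '%s\t%s\t%s\n' 'main/foo/patch.c'            'gpl' 'incompatible; must be replaced'  >> ip-review.log

# list the files still needing action, e.g. everything marked incompatible
awk -F'\t' '$2 == "gpl" { print $1 }' ip-review.log
# -> main/foo/patch.c
```

Because the log itself lives in SVN, every change to a file's status carries a commit message, which gives exactly the public audit trail item 5) asks for.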
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: [snip] Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. do you mean to check in the files under ext_source into svn and remove it later on when we have cleaned up the code. Or do you mean to put it somewhere on Apache Extras? I would prefer to save these binary files under Apache Extras if possible. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 
[snip]
Tracking this on the community wiki is probably not good enough, since we've previously talked about dropping that wiki and going to MWiki. talked about it yes, but did we reach a final decision? The migrated wiki is available under http://ooo-wiki.apache.org/wiki and can be used. Do we want to continue with this wiki now? It's still not clear to me at the moment. But we need a place to document the IP clearance and under http://ooo-wiki.apache.org/wiki/ApacheMigration we have already some information. Juergen -Rob [1] http://incubator.apache.org/projects/openofficeorg.html [2] http://incubator.apache.org/rat/
Re: A systematic approach to IP review?
On Sun, Sep 18, 2011 at 9:34 PM, Pedro Giffuni giffu...@tutopia.com wrote: Hi; Is there an updated SGA already? Not that I know of. But we can and should go ahead with IP clearance using the SGA we already have. In fact, starting that process will help us identify exactly which files need to be added to the updated SGA. -Rob [snip: remainder of Pedro's message, quoting the original proposal]
Re: A systematic approach to IP review?
2011/9/19 Jürgen Schmidt jogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: [snip] Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. do you mean to check in the files under ext_source into svn and remove it later on when we have cleaned up the code. Or do you mean to put it somewhere on Apache Extras? I would prefer to save these binary files under Apache Extras if possible. Why not just keep it in SVN? Moving things to Apache-Extras does not help us with the IP review. 
In other words, if we have a dependency on an OSS module that has an incompatible license, then moving that module to Apache Extras does not make that dependency go away. We still need to understand the nature of the dependency: a build tool, a dynamic runtime dependency, a statically linked library, an optional extension, a necessary core module. If we find out, for example, that something in ext-sources is only used as a build tool, and is not part of the release, then there is nothing that prevents us from hosting it in SVN. But if something is a necessary library and it is under GPL, then this is a problem even if we store it on Apache-Extras. [snip: quoted remainder of the original proposal]
Re: A systematic approach to IP review?
Am 09/19/2011 01:59 PM, schrieb Rob Weir: [snip: quoted text of Rob's reply to Jürgen and the original proposal]
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 01:59 PM, schrieb Rob Weir: 2011/9/19 Jürgen Schmidt jogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: If you haven't looked at it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including:
-- Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project.
-- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright.
-- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute.
-- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms.
Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making, and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. Do you mean to check in the files under ext_source into SVN and remove them later on when we have cleaned up the code, or do you mean to put them somewhere on Apache Extras?
I would prefer to save these binary files under Apache Extras if possible. Why not just keep them in SVN? Moving things to Apache-Extras does not help us with the IP review. [ ... ]
Re: A systematic approach to IP review?
--- On Mon, 9/19/11, Rob Weir robw...@apache.org wrote: ... 2011/9/19 Jürgen Schmidt jogischm...@googlemail.com: ... Do you mean to check in the files under ext_source into SVN and remove them later on when we have cleaned up the code, or do you mean to put them somewhere on Apache Extras? I would prefer to save these binary files under Apache Extras if possible. Why not just keep them in SVN? Moving things to Apache-Extras does not help us with the IP review. [ ... ] But adding in stuff that we have to remove immediately (nss, seamonkey, ...) doesn't help either. I also think a lot of that stuff has to be updated before being brought in: ICU apparently would be trouble, but the Apache Commons, ICC, and other stuff can/should be updated. snip a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. I thought we had delayed updating the copyrights in the headers to ease the CWS integration. I still hope to see more of those, especially anything related to gnumake (I don't know when, but dmake has to go!). Using the SVN properties is a good idea. And we do have to start the NOTICE file. All just IMHO, of course. Pedro.
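Since the thread keeps coming back to starting the NOTICE file, here is a rough sketch of what an entry for a bundled third-party component might look like. The wording, the copyright line, and the choice of ICU as the example are illustrative assumptions, not the project's actual NOTICE text:

```
Apache OpenOffice
Copyright 2011 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

Portions Copyright Oracle and/or its affiliates.

-- Third-party components (illustrative section) --

This product includes ICU (International Components for Unicode),
Copyright International Business Machines Corporation and others,
used under the ICU License (MIT-style).
```

A separate section for transitory ext_sources components, as suggested earlier in the thread, would follow the same pattern.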
RE: A systematic approach to IP review?
Rob, I was reading Marcus's suggestion this way: since the code base is in a folder structure (modularized), and the wiki can map folder structures and their status nicely, it is not necessary to manage this from a single table; instead, any tables can sit at some appropriate granularity toward the leaves of the hierarchy (on the wiki). I can see some brittle cases, especially in the face of refactoring. The use of the wiki might have to be an ephemeral activity, handled this way entirely for our initial scrubbing. Ideally, additional and sustained review would live in SVN with the artifacts so reviewed, and be coalesced somehow. The use of SVN properties is interesting, but they are rather invisible, and I have a question about what happens to them when a commit happens against the particular artifact. It seems there is some need to balance an immediate requirement, and what would be sufficient for it, against what would assist us in the longer term. It would be interesting to know what the additional-review work has become for other projects that have a substantial code base (e.g., SVN itself, httpd, ...). I have no idea. - Dennis -Original Message- From: Rob Weir [mailto:robw...@apache.org] Sent: Monday, September 19, 2011 07:47 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? [ ... ]
RE: A systematic approach to IP review?
On the wiki question, I think OOOUSERS should continue to be used for transition work. Or OOODEV could be used if it needs to be limited to committers (perhaps the case for this activity), although that means power observers can't contribute there and would have to do so by some other means. This is transition work, and the Confluence wiki seems like a good place for it. The MWiki may be interrupted or disrupted, and it is probably a good idea to *not* put such development-transition-intensive content there. Also, the migrated wiki is not the live wiki at OpenOffice.org, so doing anything there will create collisions. It is also not fully migrated, in that it is not operating in place of what folks see via OpenOffice.org, as far as I know. The current Confluence wikis avoid confusion and are stable for this particular purpose. - Dennis -Original Message- From: Jürgen Schmidt [mailto:jogischm...@googlemail.com] Sent: Monday, September 19, 2011 01:45 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: [ ... ] 7) Goal should be for anyone today to be able to see what work remains for IP clearance, as well as for someone 5 years from now to be able to tell what we did. Tracking this on the community wiki is probably not good enough, since we've previously talked about dropping that wiki and going to MWiki. We talked about it, yes, but did we reach a final decision? The migrated wiki is available under http://ooo-wiki.apache.org/wiki and can be used. Do we want to continue with this wiki now? It's still not clear to me at the moment. [ ... ]
Re: A systematic approach to IP review?
Am 09/19/2011 04:47 PM, schrieb Rob Weir: [ ... ]
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: Rob, I was reading Marcus's suggestion this way: since the code base is in a folder structure (modularized), and the wiki can map folder structures and their status nicely, it is not necessary to manage this from a single table; instead, any tables can sit at some appropriate granularity toward the leaves of the hierarchy (on the wiki). Using the wiki for this might be useful for tracking the status of modules we already know we need to replace. Bugzilla would be another way to track the status. But it is not really a sufficient solution. Why? Because it is not tied to the code and is not reproducible. How was the list of components listed in the wiki generated? Based on what script? Where is the script? How do we know it is accurate and current? How do we know that integrating a CWS does not make that list become outdated? How do we prove to ourselves that we did this right? And how do we record that proof? And how do we repeat this proof every time we do a new release? A list of components of unknown derivation sitting on a community wiki that anyone can edit is not really a suitable basis for an IP review. The granularity we need to worry about is the file. That is the finest-grained unit that can carry a license header. That is the unit of tracking in SVN. That is the unit whose content someone could have changed in SVN. Again, it is fine if someone wants to outline this at the module level. But that does not eliminate the requirement for us to do this at the file level as well. I can see some brittle cases, especially in the face of refactoring. The use of the wiki might have to be an ephemeral activity, handled this way entirely for our initial scrubbing. Ideally, additional and sustained review would live in SVN with the artifacts so reviewed, and be coalesced somehow.
The use of SVN properties is interesting, but they are rather invisible, and I have a question about what happens to them when a commit happens against the particular artifact. Properties stick with the file unless changed. Think of the svn:eol-style property: it is not wiped out with a new revision of the file. It seems there is some need to balance an immediate requirement, and what would be sufficient for it, against what would assist us in the longer term. It would be interesting to know what the additional-review work has become for other projects that have a substantial code base (e.g., SVN itself, httpd, ...). I have no idea. The IP review needs to occur with every release, so the work we do to automate this, and make it data-driven, will repay itself with every release. I invite you to investigate what other projects do. When you do, I think you will agree. [ ... ]
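Rob's claim that versioned properties survive content commits can be checked directly. The sketch below uses a throwaway local repository and the hypothetical `ip:mit` property name from suggestion 5a); the property name and value are illustrative, not a project decision, and it assumes the `svn`/`svnadmin` command-line tools are available.

```shell
# Create a scratch repository and working copy.
svnadmin create /tmp/ip-demo-repo
svn checkout "file:///tmp/ip-demo-repo" /tmp/ip-demo-wc
cd /tmp/ip-demo-wc

# Add a file and tag it with a hypothetical IP-review property.
echo 'int main(void){return 0;}' > util.c
svn add util.c
svn propset ip:mit 'MIT-licensed; provenance: upstream tarball' util.c
svn commit -m 'add util.c with ip:mit property'

# Commit a content-only change; the property survives the new
# revision, just like svn:eol-style does.
echo '/* content edit */' >> util.c
svn commit -m 'content-only edit'
svn propget ip:mit util.c
```

The final `propget` still prints the value set before the content edit, which is the persistence behavior Rob describes.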
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 04:47 PM, schrieb Rob Weir: [ ... ]
Re: A systematic approach to IP review?
Am 09/19/2011 06:54 PM, schrieb Rob Weir: On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: Rob, I was reading Marcus's suggestion this way: since the code base is in a folder structure (modularized), and the wiki can map folder structures and their status nicely, it is not necessary to manage this from a single table; instead, any tables can sit at some appropriate granularity toward the leaves of the hierarchy (on the wiki). Using the wiki for this might be useful for tracking the status of modules we already know we need to replace. Bugzilla would be another way to track the status. How do you want to use Bugzilla to track thousands of files? But it is not really a sufficient solution. Why? Because it is not tied to the code and is not reproducible. How was the list of components listed in the wiki generated? Based on what script? Where is the script? How do we know it is accurate and current? How do we know that integrating a CWS does not make that list become outdated? How do we prove to ourselves that we did this right? And how do we record that proof? And how do we repeat this proof every time we do a new release? Questions over questions, but not helpful. ;-) A list of components of unknown derivation sitting on a community wiki that anyone can edit is not really a suitable basis for an IP review. Then restrict the write access. The granularity we need to worry about is the file. That is the finest-grained unit that can carry a license header. That is the unit of tracking in SVN. That is the unit whose content someone could have changed in SVN. Again, it is fine if someone wants to outline this at the module level. But that does not eliminate the requirement for us to do this at the file level as well. IMHO you haven't understood what I wanted to tell you. Sure, it makes no sense to create one flat list of every file in SVN to see if the license is good or bad. So, do it module by module.
And when a module is marked as done, then of course every file in the module has been checked; otherwise it's not working. And how do we make sure that there is no change when source is added/moved/improved? Simply Commit Then Review (CTR). A change in the license header at the beginning of a file should be noticeable, right? However, we also need to have trust in everybody's work. BTW: What is your plan to track every file to make sure the license is OK? Marcus [ ... ]
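To Marcus's question about how every file could be tracked, a minimal module-by-module check can be scripted. This is only a sketch under assumptions (the mock directory names, file contents, and the standard ASF header marker line are illustrative), not the project's actual tooling; it builds a tiny source tree so the check has something concrete to run against.

```shell
# Mock source tree: one file carrying the standard ASF header marker
# line, one file without any license header. Names are illustrative.
mkdir -p demo/sw demo/icu
cat > demo/sw/ok.cxx <<'EOF'
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.
EOF
cat > demo/icu/missing.cxx <<'EOF'
/* no license header here */
EOF

# grep -L lists files that do NOT contain the marker; run per module
# (or over the whole tree) to get the remaining backlog.
grep -rL 'Licensed to the Apache Software Foundation' demo --include='*.cxx'
```

Run against a module directory instead of `demo`, this gives the "done when the list is empty" criterion Marcus describes, and the output can be checked into SVN as the audit trail Rob asks for.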
Re: A systematic approach to IP review?
Am 09/19/2011 07:05 PM, schrieb Rob Weir: On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 04:47 PM, schrieb Rob Weir: On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 01:59 PM, schrieb Rob Weir: 2011/9/19 Jürgen Schmidt jogischm...@googlemail.com: On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote: If you haven't looked at it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. 
do you mean to check in the files under ext_source into svn and remove them later on when we have cleaned up the code? Or do you mean to put them somewhere on apache extras? I would prefer to save these binary files under apache extras if possible. Why not just keep it in SVN? Moving things to Apache-Extras does not help us with the IP review. In other words, if we have a dependency on an OSS module that has an incompatible license, then moving that module to Apache Extras does not make that dependency go away. We still need to understand the nature of the dependency: a build tool, a dynamic runtime dependency, a statically linked library, an optional extension, a necessary core module. If we find out, for example, that something in ext-sources is only used as a build tool, and is not part of the release, then there is nothing that prevents us from hosting it in SVN. But if something is a necessary library and it is under GPL, then this is a problem even if we store it on Apache-Extras. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 3) Files that Oracle include in their SGA need to have the Apache license header inserted and the Sun/Oracle copyright migrated to the NOTICE file. Apache RAT (Release Audit Tool) [2] can be used to automate parts of this. 4) Once the SGA files have the Apache headers, then we can make regular use of RAT to report on files that are lacking an Apache header. Such files might be in one of the following categories: a) Files that Oracle owns the copyright on and which should be included in an amended SGA b) Files that have a compatible OSS license which we are permitted to use. This might require that we add a mention of it to the NOTICE file. c) Files that have an incompatible OSS license. These need to be removed/replaced. d) Files that have an OSS license that has not yet been reviewed/categorized by Apache legal affairs. In that case we need to bring it to their attention. 
e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. This should be reflected in headers as well, but this is not always possible. For example, we might have binary files where we cannot add headers, or cases where the OSS files do not have headers, but where we can prove their provenance via other means. b) Track this in a spreadsheet, one row per file. c) Track this in a text log file checked into SVN d) Track this in an annotated script that runs RAT, where the annotations document the reason for cases where we tell it to ignore a file or directory. 6) Iterate until we have a clean RAT report. 7) Goal should be for anyone today to be able to see what work
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 1:19 PM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 06:54 PM, schrieb Rob Weir: On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: Rob, I was reading Marcus's suggestion as being that since the code base is in a folder structure (modularized) and the wiki can map folder structures and their status nicely, it is not necessary to have a single table to manage this from, but have any tables be at some appropriate granularity toward the leaves of the hierarchy (on the wiki). Using the wiki for this might be useful for tracking the status of modules we already know we need to replace. Bugzilla would be another way to track the status. How do you want to use Bugzilla to track thousands of files? No. But for tracking module review, Bugzilla might be better than the wiki. It allows us to have a conversation on each module via comments. But it is not really a sufficient solution. Why? Because it is not tied to the code and is not reproducible. How was the list of components listed in the wiki generated? Based on what script? Where is the script? How do we know it is accurate and current? How do we know that integrating a CWS does not make that list become outdated? How do we prove to ourselves that we did this right? And how do we record that proof? And how do we repeat this proof every time we do a new release? Questions over questions but not helpful. ;-) A list of components of unknown derivation sitting on a community wiki that anyone can edit is not really a suitable basis for an IP review. Then restrict the write access. The granularity we need to worry about is the file. That is the finest-grained level at which a license header applies. That is the unit of tracking in SVN. That is the unit whose content someone could have changed in SVN. Again, it is fine if someone wants to outline this at the module level. 
But that does not eliminate the requirement for us to do this at the file level as well. IMHO you haven't understood what I wanted to tell you. I understand what you are saying. I just don't agree with you. Sure it makes no sense to create a list of every file in SVN to see if the license is good or bad. So, do it module by module. And when a module is marked as done, then of course every file in the module was checked. Otherwise it's not working. That is not a consistent approach. Every developer applies their own criteria. It is not reproducible. It leaves no audit trail. And it doesn't help us with the next release. If you use the Apache Release Audit Tool (RAT) then it will check all the files automatically. And how do we make sure that there was no change when source was added/moved/improved? Simply Commit Then Review (CTR). A change in the license header at the beginning should be noticeable, right? However, we also need to have trust in everybody's work. We would run RAT before every release and with every significant code contribution. You can think of this as a form of CTR, but one that is automated, with a consistent rule set. Obviously, good CTR plus the work on the wiki will all help. But we need the RAT scans as well, to show that we're clean. BTW: What is your plan to track every file to make sure the license is OK? Run RAT. That is what it does. Marcus I can see some brittle cases, especially in the face of refactoring. The use of the wiki might have to be an ephemeral activity that is handled this way entirely for our initial scrubbing. Ideally, additional and sustained review would be in the SVN with the artifacts so reviewed, and coalesced somehow. The use of SVN properties is interesting, but they are rather invisible and I have a question about what happens with them when a commit happens against the particular artifact. Properties stick with the file, unless changed. Think of the svn:eol-style property. 
It is not wiped out with a new revision of the file. It seems that there is some need to balance an immediate requirement and what would be sufficient for it versus what would assist us in the longer term. It would be interesting to know what the additional-review work has become for other projects that have a substantial code base (e.g., SVN itself, httpd, ...). I have no idea. The IP review needs to occur with every release. So the work we do to automate this, and make it data-driven, will repay itself with every release. I invite you to investigate what other projects do. When you do I think you will agree. - Dennis -Original Message- From: Rob Weir [mailto:robw...@apache.org] Sent: Monday, September 19, 2011 07:47 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) marcus.m...@wtnet.de wrote: Am 09/19/2011 01:59 PM, schrieb Rob Weir: 2011/9/19 Jürgen Schmidtjogischm
RE: A systematic approach to IP review?
I agree running RAT is important ... I haven't heard any suggestion that such an important tool not be used. -Original Message- From: Rob Weir [mailto:robw...@apache.org] Sent: Monday, September 19, 2011 10:05 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? [ ... ] I think the wiki is fine as a collaboration tool, to list tasks and who is working on them. But that is not a substitute for running scans with the Apache Release Audit Tool (RAT) and working toward a clean report. Think of it this way: 1) We have a list of modules on the wiki that we need to replace. Great. Developers can work on that list. 2) But how do we know that the list on the wiki is complete? How do we know that it is not missing anything? 3) Running RAT against the source is how we ensure that the code is clean. In other words, the criterion should be that we have a clean RAT record, not that we have a clean wiki. The list of modules on the wiki is not traceable to a scan of the source code. It is not reproducible. It might be useful. But it is not sufficient. -Rob [ ... ]
RE: A systematic approach to IP review?
I hope that RAT can produce a list of OK files and exclude the not-OK ones on the first use, since the list of not-OK files would overwhelm everything else about the current repository. - Dennis -Original Message- From: Marcus (OOo) [mailto:marcus.m...@wtnet.de] Sent: Monday, September 19, 2011 10:27 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? Am 09/19/2011 07:05 PM, schrieb Rob Weir: [ ... ] 3) Running RAT against the source is how we ensure that the code is clean OK, I don't know what this can do for us. Maybe it's the solution for the problem. How do you know that it is not skipping anything? I guess you simply would trust RAT that it is doing fine, right? ;-) BTW: Does RAT produce a log file, so that we have a list of every file that was checked? This could be very helpful. Marcus [ ... ]
RE: A systematic approach to IP review?
I agree that there is no escape from managing down to the individual file. It is a question of organization now, where the entire base is involved. Later, if the svn:property is to be trusted, the problem is quite different, it seems to me. Plus the rules are understood and provenance and IP are likely handled as anything needing clearance enters the code base. What is done to ensure a previously-vetted code base has not become tainted strikes me as a kind of regression/smoke test. It is in that regard that I am concerned the tools for this one-time case need not be the same as for future cases. And, since I am not doing the work in the present case, I am offering this as something to think about, not a position. - Dennis -Original Message- From: Rob Weir [mailto:robw...@apache.org] Sent: Monday, September 19, 2011 09:55 To: ooo-dev@incubator.apache.org Subject: Re: A systematic approach to IP review? [ ... ] The granularity we need to worry about is the file. That is the finest-grained level at which a license header applies. That is the unit of tracking in SVN. That is the unit whose content someone could have changed in SVN. Again, it is fine if someone wants to outline this at the module level. But that does not eliminate the requirement for us to do this at the file level as well. [ ... ]
Re: A systematic approach to IP review?
On Mon, Sep 19, 2011 at 4:32 PM, Dennis E. Hamilton dennis.hamil...@acm.org wrote: I agree that there is no escape from managing down to the individual file. It is a question of organization now, where the entire base is involved. RAT or something RAT-like. Later, if the svn:property is to be trusted, the problem is quite different, it seems to me. Plus the rules are understood and provenance and IP are likely handled as anything needing clearance enters the code base. What is done to ensure a previously-vetted code base has not become tainted strikes me as a kind of regression/smoke test. Here is how I see SVN properties and RAT relating. Any use of a grep-like RAT-like tool will need to deal with exceptions. We're going to have stuff like binary files, say ODF files that are used for testing, that don't have a header. Or files that are used only as a build tool, checked in for convenience, but are not part of the release. Or 3rd party code that does not have a header, but we know its origin, like the ICU breakiterator data files. How do we deal with those types of files, in the context of an automated audit tool? One solution is to record in a big config file or script a list of all of these exceptions. Essentially, a list of files to ignore in the RAT scan. That approach would certainly work, but would be fragile. Moving or renaming the files would break our script. Not the end of the world, since this could be designed to be fail-safe and give us errors on the files that moved. But if we track this info in SVN, then we could generate the exclusion list from SVN, so it automatically adjusts as files are moved or renamed. It also avoids the problem -- and this might just be my own engineering esthetic -- of tracking metadata for files in two different places. It seems rather untidy to me. 
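[Editor's note] As a sketch of that idea: the `ip:*` property values below are hypothetical, and the input mapping stands in for the parsed output of a recursive `svn propget`. Because the property travels with the file through SVN renames and moves, the generated exclusion list cannot go stale the way a hand-maintained one can:

```python
def rat_exclusions(props):
    """props: {path: value of a hypothetical ip:license SVN property}.
    Return the paths a RAT run should skip: files whose non-Apache
    header (or lack of one) is already accounted for, such as
    compatibly-licensed third-party files and files that are not
    part of the release."""
    accounted_for = {"ip:mit", "ip:bsd", "ip:w3c", "ip:not-released"}
    return sorted(path for path, value in props.items()
                  if value in accounted_for)
```

The returned list would then be fed to the scan (RAT or the grep-like stand-in) as its ignore set, so the only remaining failures are genuinely unexamined files.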
From a regression standpoint, you could treat all files as being in one of several states: 1) Unexamined (no property set) 2) Apache 2.0 (included in the Oracle SGA or new code contributed by committer or other person under iCLA) 3) Compatible 3rd party license 4) Incompatible 3rd party license 5) Not part of release The goal would be to iterate until every file is in category 2, 3 or 5. It is in that regard that I am concerned the tools for this one-time case need not be the same as for future cases. There are two kinds of future cases: 1) Code contributed in small chunks by committers or patches, where we can expect CTR to work. There will be errors, but we can catch those before we do subsequent releases via RAT. 2) Larger contributions made by SGA. For example, the IBM Lotus Symphony contribution, or other similar corporate contributions. When an Apache project receives a large code contribution like this they need to do an IP clearance process on that contribution as well. I think that the RAT/SVN combination could work well here also. The goal would be to clear the IP on the new contributions before we start copying or merging it into the core AOOo code. And, since I am not doing the work in the present case, I am offering this as something to think about, not a position. - Dennis
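[Editor's note] The five states above could be driven directly off the same SVN property. A toy summary, where the state names and property values are invented for illustration and `None` stands for a file whose property has not been set yet:

```python
from collections import Counter

# The five states, keyed by a hypothetical SVN property value.
STATES = {
    None: "unexamined",
    "ip:sga": "apache-2.0",
    "ip:compatible": "compatible-3rd-party",
    "ip:incompatible": "incompatible-3rd-party",
    "ip:not-released": "not-part-of-release",
}

def ip_status(props):
    """props: {path: property value or None}. The tree is ready for
    release only when no file remains in state 1 (unexamined) or
    state 4 (incompatible license)."""
    tally = Counter(STATES[value] for value in props.values())
    ready = (tally["unexamined"] == 0
             and tally["incompatible-3rd-party"] == 0)
    return ready, dict(tally)
```

Run before each release, this reduces "did we clear the IP?" to a reproducible yes/no plus a per-state count, which is exactly the audit trail the thread is asking for.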
RE: A systematic approach to IP review?
+1 -Original Message- From: Rob Weir [mailto:robw...@apache.org] Sent: Sunday, September 18, 2011 17:27 To: ooo-dev@incubator.apache.org Subject: A systematic approach to IP review? If you haven't looked at it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 3) Files that Oracle include in their SGA need to have the Apache license header inserted and the Sun/Oracle copyright migrated to the NOTICE file. 
Apache RAT (Release Audit Tool) [2] can be used to automate parts of this. 4) Once the SGA files have the Apache headers, then we can make regular use of RAT to report on files that are lacking an Apache header. Such files might be in one of the following categories: a) Files that Oracle owns the copyright on and which should be included in an amended SGA b) Files that have a compatible OSS license which we are permitted to use. This might require that we add a mention of it to the NOTICE file. c) Files that have an incompatible OSS license. These need to be removed/replaced. d) Files that have an OSS license that has not yet been reviewed/categorized by Apache legal affairs. In that case we need to bring it to their attention. e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. This should be reflected in headers as well, but this is not always possible. For example, we might have binary files where we cannot add headers, or cases where the OSS files do not have headers, but where we can prove their provenance via other means. b) Track this in a spreadsheet, one row per file. c) Track this in a text log file checked into SVN d) Track this in an annotated script that runs RAT, where the annotations document the reason for cases where we tell it to ignore a file or directory. 6) Iterate until we have a clean RAT report. 7) Goal should be for anyone today to be able to see what work remains for IP clearance, as well as for someone 5 years from now to be able to tell what we did. Tracking this on the community wiki is probably not good enough, since we've previously talked about dropping that wiki and going to MWiki. 
-Rob [1] http://incubator.apache.org/projects/openofficeorg.html [2] http://incubator.apache.org/rat/
Re: A systematic approach to IP review?
Hi; Is there an updated SGA already? I think there will likely be a set of files of uncertain license that we should move to apache-extras. I am referring specifically to the dictionaries: Oracle may hold rights over some but not all. I propose we rescue myspell in apache-extras and put the dictionaries there to keep it as an alternative. I have no idea where to get MySpell though. While here, if there's still interest in maintaining the Hg history, bitbucket.org seems to be a nice alternative: it rather specializes in Mercurial. Cheers, Pedro. On Sun, 18 Sep 2011 20:27:05 -0400, Rob Weir robw...@apache.org wrote: If you haven't looked at it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under Copyright and Verify Distribution Rights. It lists the things we need to do, including: -- Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute. -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help. Suggestions: 1) We need to get all files needed for the build into SVN. 
Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies. 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN. 3) Files that Oracle include in their SGA need to have the Apache license header inserted and the Sun/Oracle copyright migrated to the NOTICE file. Apache RAT (Release Audit Tool) [2] can be used to automate parts of this. 4) Once the SGA files have the Apache headers, then we can make regular use of RAT to report on files that are lacking an Apache header. Such files might be in one of the following categories: a) Files that Oracle owns the copyright on and which should be included in an amended SGA b) Files that have a compatible OSS license which we are permitted to use. This might require that we add a mention of it to the NOTICE file. c) Files that have an incompatible OSS license. These need to be removed/replaced. d) Files that have an OSS license that has not yet been reviewed/categorized by Apache legal affairs. In that case we need to bring it to their attention. e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed. 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be: a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. This should be reflected in headers as well, but this is not always possible. For example, we might have binary files where we cannot add headers, or cases where the OSS files do not have headers, but where we can prove their provenance via other means. b) Track this in a spreadsheet, one row per file. 
c) Track this in a text log file checked into SVN d) Track this in an annotated script that runs RAT, where the annotations document the reason for cases where we tell it to ignore a file or directory. 6) Iterate until we have a clean RAT report. 7) Goal should be for anyone today to be able to see what work remains for IP clearance, as well as for someone 5 years from now to be able to tell what we did. Tracking this on the community wiki is probably not good enough, since we've previously talked about dropping that wiki and going to MWiki. -Rob [1] http://incubator.apache.org/projects/openofficeorg.html [2] http://incubator.apache.org/rat/