Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-28 Thread Jürgen Schmidt

On 10/27/11 8:28 PM, Pedro Giffuni wrote:



--- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:



In any case, yes... I think this is the way to go. I am
just hoping there will be a way to opt out of those
components in favor of the system libraries when those
are available.


Me too, but we should move forward; we can change it at
any time once we have a better solution.



I am OK with that, but let me attempt to dump what I think:

1) You are not bringing in *anything* copyleft; that directory
will only be for the non-restrictive stuff that we need: ICU,
Boost, etc.

2) This will all have to be registered in the NOTICE file,
but since this is transitory and not really stuff we use in
base, we should start a new section there to separate it from
the stuff we do use in the core system.
3) We should probably move some of the stuff in soltools
there too (mkdepend).
4) I know you want ucpp there too, but since that stuff is
used in idlc, I think I'd prefer it in idlc/source/preproc/
as it was before. No idea if we can use the system cpp for the
rest, but that would probably make sense.
Mmh, I would prefer to put it under ext_sources to make clear
that it comes from an external source.


Juergen



All just IMHO; I am pretty sure whatever you do is better than
what we have now :).

Pedro.




Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-28 Thread Pedro Giffuni

--- On Fri, 10/28/11, Jürgen Schmidt jogischm...@googlemail.com wrote:

snip mental dump

  4) I know you want ucpp there too, but since that
  stuff is used in idlc, I think I'd prefer it in
  idlc/source/preproc/
  as it was before. No idea if we can use the system cpp
  for the rest but that would probably make sense.

 Mmh, I would prefer to put it under ext_sources to make
 clear that it comes from an external source.
 

That is pretty well covered by SVN and the NOTICE file,
but I was only brainstorming.

Just have fun :).

Pedro.



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Jürgen Schmidt

On 9/22/11 1:19 PM, Jürgen Schmidt wrote:


OK, we have several arguments for and against, but no decision on how
we want to move forward. Let us take another look at it:

1. We have a working mechanism to get the externals from somewhere,
check the MD5 sum, unpack, patch, and build.
1.1 "Somewhere" is configurable during the configure step; initially the
externals are downloaded from http://hg.services.openoffice.org/binaries

2. Having the externals in the repository (SVN) won't be a big issue,
because a checkout only downloads the tip version.
2.1 The SCM can be used to track the version of the externals used for a
specific OO version - simply check out the version tag and everything is
in place ...

3. In a DSCM it would become a real problem over time because of the
increasing space taken by all versions.

4. We need a replacement for http://hg.services.openoffice.org/binaries
ASAP (who knows how long the server will be available).

5. Many developers probably work with a local clone of the repository,
using for example git-svn or something else - the increasing space is a
disadvantage, but probably acceptable if a clean local trunk is kept
and updated.

Proposed way to move forward:

1. Put the externals under .../trunk/ext_sources
.../trunk/ext_sources
.../trunk/main
.../trunk/extras
2. Adapt configure to use this location as the default and disable the
download (maybe reactivate it later if we move to a DSCM)
3. Keep the MD5 checking process as it is (for potential later use)
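
[Editor's illustration: a minimal shell sketch of what the adapted
bootstrap step (items 2 and 3 above) could do. The variable names and
the example manifest entry are made up; this is not the actual
configure/bootstrap code.]

    #!/bin/sh
    # Sketch: copy each external from the checked-in trunk/ext_sources
    # instead of downloading it, keep the existing MD5 check, unpack.
    EXT_SOURCES=../ext_sources      # new default set by configure
    WORKDIR=external/unpacked

    for entry in "0123456789abcdef0123456789abcdef#libfoo-1.0.tar.gz"; do
        md5=${entry%%#*}; name=${entry#*#}
        tarball="$EXT_SOURCES/$md5-$name"
        echo "$md5  $tarball" | md5sum -c - || exit 1   # proposal item 3
        mkdir -p "$WORKDIR" && tar -xzf "$tarball" -C "$WORKDIR"
    done
    # the OOo patches are applied afterwards, as today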

Any opinions or suggestions?



I think we still haven't finished this topic, but it is somewhat
important to move forward with our IP clearance and the whole
development work.


So if nobody has real objections I would like to move forward with this
proposal, but would also like to change the proposed directory name
from ext_sources to 3rdparty.


Keep in mind that we use this directory to keep the current state
working; with our ongoing work we will remove more and more stuff
from there.


The adapted bootstrap mechanism will download the libraries from this 
new place.


Juergen







Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Rob Weir
2011/10/27 Jürgen Schmidt jogischm...@googlemail.com:
 On 9/22/11 1:19 PM, Jürgen Schmidt wrote:

 OK, we have several arguments for and against, but no decision on how
 we want to move forward. Let us take another look at it:

 1. We have a working mechanism to get the externals from somewhere,
 check the MD5 sum, unpack, patch, and build.
 1.1 "Somewhere" is configurable during the configure step; initially the
 externals are downloaded from http://hg.services.openoffice.org/binaries

 2. Having the externals in the repository (SVN) won't be a big issue,
 because a checkout only downloads the tip version.
 2.1 The SCM can be used to track the version of the externals used for a
 specific OO version - simply check out the version tag and everything is
 in place ...

 3. In a DSCM it would become a real problem over time because of the
 increasing space taken by all versions.

 4. We need a replacement for http://hg.services.openoffice.org/binaries
 ASAP (who knows how long the server will be available).

 5. Many developers probably work with a local clone of the repository,
 using for example git-svn or something else - the increasing space is a
 disadvantage, but probably acceptable if a clean local trunk is kept
 and updated.

 Proposed way to move forward:

 1. Put the externals under .../trunk/ext_sources
 .../trunk/ext_sources
 .../trunk/main
 .../trunk/extras
 2. Adapt configure to use this location as the default and disable the
 download (maybe reactivate it later if we move to a DSCM)
 3. Keep the MD5 checking process as it is (for potential later use)

 Any opinions or suggestions?


 I think we still haven't finished this topic, but it is somewhat
 important to move forward with our IP clearance and the whole
 development work.

 So if nobody has real objections I would like to move forward with this
 proposal, but would also like to change the proposed directory name from
 ext_sources to 3rdparty.

 Keep in mind that we use this directory to keep the current state
 working; with our ongoing work we will remove more and more stuff
 from there.


So keep the current approach with tarballs with MD5 hashnames, etc.,
just as before but on Apache servers?

That sounds good to me.
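
[Editor's illustration: a quick sanity check of such a mirror, assuming
the tarballs keep the <md5>-<name>.tar.gz naming convention Rob
mentions above. This is a sketch, not project tooling.]

    # Verify that each file's content still matches the MD5 encoded
    # in its own name, wherever the files end up being hosted.
    for f in ext_sources/*-*.tar.gz; do
        expected=$(basename "$f" | cut -d- -f1)
        actual=$(md5sum "$f" | cut -d' ' -f1)
        [ "$expected" = "$actual" ] || echo "MISMATCH: $f"
    done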

 The adapted bootstrap mechanism will download the libraries from this new
 place.

 Juergen








Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Pedro Giffuni


--- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
...
 
 I think we still haven't finished this topic, but it is
 somewhat important to move forward with our IP clearance
 and the whole development work.

 So if nobody has real objections I would like to move
 forward with this proposal, but would also like to change
 the proposed directory name from ext_sources to 3rdparty.

 Keep in mind that we use this directory to keep the current
 state working; with our ongoing work we will remove more
 and more stuff from there.
 

I was about to bring in support for FreeBSD's fetch command
(somewhat like curl) in fetch-tarballs.sh, and it looks like
you are now obsoleting it :-P .
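
[Editor's note: the snippet below is only a generic sketch of the kind
of downloader fallback Pedro describes, not the actual contents of
fetch-tarballs.sh.]

    # Try the downloaders commonly available, including FreeBSD's fetch.
    download() {
        url=$1; out=$2
        if   command -v wget  >/dev/null; then wget -O "$out" "$url"
        elif command -v curl  >/dev/null; then curl -L -o "$out" "$url"
        elif command -v fetch >/dev/null; then fetch -o "$out" "$url"  # FreeBSD
        else echo "no downloader found" >&2; return 1
        fi
    }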

In any case, yes... I think this is the way to go. I am just
hoping there will be a way to opt out of those components in
favor of the system libraries when those are available.

Pedro.



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Jürgen Schmidt

On 10/27/11 6:13 PM, Pedro Giffuni wrote:



--- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:
...


I think we still haven't finished this topic, but it is somewhat
important to move forward with our IP clearance and the whole
development work.

So if nobody has real objections I would like to move forward
with this proposal, but would also like to change the proposed
directory name from ext_sources to 3rdparty.

Keep in mind that we use this directory to keep the current
state working; with our ongoing work we will remove more and
more stuff from there.



I was about to bring in support for FreeBSD's fetch command
(somewhat like curl) in fetch-tarballs.sh, and it looks like
you are now obsoleting it :-P .

In any case, yes... I think this is the way to go. I am just
hoping there will be a way to opt out of those components in
favor of the system libraries when those are available.


Me too, but we should move forward; we can change it at any time
once we have a better solution.


Juergen



Pedro.





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Pedro Giffuni


--- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:

 
  In any case, yes... I think this is the way to go. I am
  just hoping there will be a way to opt out of those
  components in favor of the system libraries when those
  are available.
 
 Me too, but we should move forward; we can change it at
 any time once we have a better solution.
 

I am OK with that, but let me attempt to dump what I think:

1) You are not bringing in *anything* copyleft; that directory
will only be for the non-restrictive stuff that we need: ICU,
Boost, etc.

2) This will all have to be registered in the NOTICE file,
but since this is transitory and not really stuff we use in
base, we should start a new section there to separate it from
the stuff we do use in the core system.
3) We should probably move some of the stuff in soltools
there too (mkdepend).
4) I know you want ucpp there too, but since that stuff is
used in idlc, I think I'd prefer it in idlc/source/preproc/
as it was before. No idea if we can use the system cpp for the
rest, but that would probably make sense.

All just IMHO; I am pretty sure whatever you do is better than
what we have now :).

Pedro.


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Rob Weir
On Thu, Oct 27, 2011 at 2:28 PM, Pedro Giffuni p...@apache.org wrote:


 --- On Thu, 10/27/11, Jürgen Schmidt jogischm...@googlemail.com wrote:

 
  In any case, yes... I think this is the way to go. I am
  just hoping there will be a way to opt out of those
  components in favor of the system libraries when those
  are available.

 Me too, but we should move forward; we can change it at
 any time once we have a better solution.


 I am OK with that, but let me attempt to dump what I think:

 1) You are not bringing in *anything* copyleft; that directory
 will only be for the non-restrictive stuff that we need: ICU,
 Boost, etc.


I think it is like the SVN trunk.  We initially bring it all in, and
then remove the copyleft parts.  Of course, if we can remove them
beforehand, that is good as well.  But whatever order we do the work in,
we cannot release until we've done the IP review.

The files are currently hosted here:

http://hg.services.openoffice.org/binaries/

Since the build currently depends on that, I think we want to move
those files to Apache now, rather than wait too long.

-Rob

 2) This will all have to be registered in the NOTICE file,
 but since this is transitory and not really stuff we use in
 base, we should start a new section there to separate it from
 the stuff we do use in the core system.
 3) We should probably move some of the stuff in soltools
 there too (mkdepend).
 4) I know you want ucpp there too, but since that stuff is
 used in idlc, I think I'd prefer it in idlc/source/preproc/
 as it was before. No idea if we can use the system cpp for the
 rest, but that would probably make sense.

 All just IMHO; I am pretty sure whatever you do is better than
 what we have now :).

 Pedro.



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Pedro Giffuni
Hi Mathias;

--- On Thu, 10/27/11, Mathias Bauer mathias_ba...@gmx.net wrote:
...

   In any case, yes... I think this is the way to go. I am
   just hoping there will be a way to opt out of those

  I am OK with that, but let me attempt to dump what I think:

  1) You are not bringing in *anything* copyleft; that directory
  will only be for the non-restrictive stuff that we need: ICU,
  Boost, etc.

 That should be doable. OTOH I'm wondering whether we should keep
 the copyleft tarballs at Apache Extras - it would allow us to
 still build with them, something that can be done outside the
 ASF infrastructure and is still appreciated (if I understood
 correctly).

I don't like that, but we will have to do it as a temporary
solution to avoid breaking the build until we replace
everything.

I think in the long run this is only interesting for Windows
binaries, due to the difficulties of getting those packages
from different places. On Linux/BSD distributions it makes
sense to use the prepackaged mozilla, etc.
 
  3) We should probably move some of the stuff in soltools
  there too (mkdepend).
 
 That's something for later; ATM we should move the ext_src
 stuff into a secure place.
 

Yes. Also for later: the simpleICC library is used to generate
a color profile required for PDF export. I think we should just
generate the color profile once, somewhere outside the main build,
and use it, avoiding the extra build cycles.
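
[Editor's illustration: a sketch of what that could look like. The tool
path is from the source tree, but its exact build requirements, CLI,
and output file name are assumptions.]

    # Build the generator once, run it, and check in the resulting
    # profile instead of rebuilding it every cycle. (In reality the
    # tool also links against the bundled ICC sample code.)
    g++ -o create_sRGB_profile \
        icc/source/create_sRGB_profile/create_sRGB_profile.cpp
    ./create_sRGB_profile        # writes the sRGB .icc profile
    svn add sRGB.icc             # commit the profile, not its generator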

Another thing: we are excluding both LGPL and MPL by default,
with extreme prejudice, but it will be convenient to reevaluate
that, since we will have to use the prepackaged hunspell.

 If nobody else wants to do it, I can invest some time into
 that, but it might take some days.
 

I won't do it, on principle... I want them to
just go away ;-).

FWIW, Rob and I are trying to use an ooo- prefix on
Apache Extras. ooo-external-sources?

 It seems that the consensus is that we check in the binary
 tarballs into trunk/ext_sources?!
 

I am not sure about that; I think lazy consensus by whoever does
it first will win :).

Pedro.



Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)

2011-10-14 Thread Pedro Giffuni

--- On Fri, 10/14/11, Robert Burrell Donkin wrote:
...
 
  A branch would save us from having say... 1000 commits
  with header changes in the history.
 
 Apache uses version control as the canonical record. It's
 therefore essential to know why a header was changed and
 by whom.


And of course the branch would be on SVN, so the history for
the legal changes wouldn't be lost. Of course I meant this
only for the SGA, but ultimately it depends on the people
applying it, and from what I understand now, *I* won't be
touching any headers :).

Thanks for all these explanations,

Pedro.


Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)

2011-10-14 Thread Andrew Rist



On 10/14/2011 8:58 AM, Pedro Giffuni wrote:

--- On Fri, 10/14/11, Robert Burrell Donkin wrote:
...

A branch would save us from having say... 1000 commits
with header changes in the history.

Apache uses version control as the canonical record. It's
therefore essential to know why a header was changed and
by whom.


And of course the branch would be on SVN, so the history for
the legal changes wouldn't be lost. Of course I meant this
only for the SGA, but ultimately it depends on the people
applying it, and from what I understand now, *I* won't be
touching any headers :).

Thanks for all these explanations,

Pedro.


Robert & Pedro,

I intend to get started on the headers in the very near future.
My intention is to do a series of checkins by project/directory in the 
source tree, matching the changes to the grant(s).
I have a bit of sequencing of activities before I start, but this is 
next up on the list.


Andrew

--


Andrew Rist | Interoperability Architect
Oracle Corporate Architecture Group
Redwood Shores, CA | 650.506.9847


ICC generated profiles are copylefted (was Re: A systematic approach to IP review?)

2011-10-14 Thread Pedro Giffuni
Hi;

When I saw this thread about machine-generated files, I never
imagined we would be talking about code in OpenOffice.org, but
I found that this file:
icc/source/create_sRGB_profile/create_sRGB_profile.cpp

indeed generates virally licensed code!

I am proposing an obvious patch, but I wanted the issue
documented, so I created bug 118512.

enjoy ;)

Pedro.

--- On Thu, 9/29/11, Rob Weir robw...@apache.org wrote:

 On Thu, Sep 29, 2011 at 1:53 AM,
 Dennis E. Hamilton wrote:
  Let me recall the bidding a little here.  What I said
 was
 
   It is unlikely that machine-generated files of any
 kind are copyrightable subject matter.
 
  You point out that computer-generated files might
 incorporate copyrightable subject matter.  I hadn't
 considered a hybrid case where copyrightable subject matter
 would subsist in such a work, and I have no idea how and to
 what extend the output qualifies as a work of authorship,
 but it is certainly a case to be reckoned with.
 
  Then there is the issue of macro expansion, template
 parameter substitution, etc., and the cases become blurrier
 and blurrier.  For example, if I wrote a program and then
 put it through the C Language pre-processor, in how much of
 the expanded result does the copyright declared on the
 original subsist?  (I am willing to concede, for purposes
 of argument, that the second is a derivative work of the
 former, even though the derivation occurred dynamically.)
 
  I fancy this example because it is commonplace that
 the pre-processor incorporates files that have their own
 copyright and license notices too.  Also, the original
 might include macro calls, with
  parameters using macros defined in one or more of
 those incorporated files.
 
 
 Under US law: "Copyright protection subsists, in accordance
 with this title, in original works of authorship fixed in any
 tangible medium of expression, now known or later developed,
 from which they can be perceived, reproduced, or otherwise
 communicated, either directly or with the aid of a machine or
 device."
 
 IANAL, but I believe Dennis is correct that a machine cannot
 be an author, in terms of copyright.  But the author of that
 program might be.  It comes down to who exactly put the work
 into a form "fixed in any tangible medium of expression."
 
 When I use an ordinary code editor, the machine acts as a
 tool that I use to create an original work. It is a tool,
 like a paintbrush.  In other cases, a tool can be used to
 transform a work.
 
 If there is an original work in fixed form that I
 transform, then I
 may have copyright interest in the transformed work. That
 is how
 copyright law protects software binaries as well as source
 code.
 
 As for the GNU Bison example, if I created the BNF, then I
 have
 copyright interest in the generated code.  That does
 not mean that I
 have exclusive ownership of all the generated code. 
 It might be a
 mashup of original template code from the Bison authors,
 along with
 code that is a transformation of my original grammar
 definition.  It
 isn't an either/or situation.  A work can have mixed
 authorship.
 
 -Rob
 
 
  I concede that copyrightable matter can survive into a
 machine-generated file.  And I maintain that there can be
 other conditions on the use of such a file other than by
 virtue of it containing portions in which copyright
 subsists.  For example, I don't think the Copyright office
 is going to accept registration of compiled binaries any
 time soon, even though there may be conditions on the
 license of the source code that carries over onto those
 binaries.
 
  And, yes, it is murky all the way down.
 
   - Dennis
 
  -Original Message-
  From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org]
  Sent: Wednesday, September 28, 2011 22:32
  To: 'ooo-dev@incubator.apache.org'
  Subject: RE: A systematic approach to IP review?
 
  Not to put too fine a point on this, but it sounds
 like you are talking about boilerplate (and authored)
 template code that Bison incorporates in its output.  It is
 also tricky because the Bison output is computer source
 code.  That is an interesting case.
 
  In the US, "original work of authorship" is pretty
 specific in the case of literary works, which is where
 software copyright falls the last time I checked (too long
 ago, though).  I suspect that a license (in the contractual
 sense) can deal with more than copyright.  And, if Bison
 spits out copyright notices, they still only apply to that
 part of the output, if any, that qualifies as copyrightable
 subject matter.
 
  Has the Bison claim ever been tested in court?  Has
 anyone been pursued or challenged for infringement? I'm just
 curious.
 
   - Dennis
 
  -Original Message-
  From: Norbert Thiebaud [mailto:nthieb...@gmail.com]
  Sent: Wednesday, September 28, 2011 22:11
  To: ooo-dev@incubator.apache.org;
 dennis.hamil...@acm.org
  Subject: Re: A systematic approach to IP review?
 
  On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton

Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)

2011-10-13 Thread Robert Burrell Donkin
On Sun, Oct 9, 2011 at 7:42 PM, Pedro Giffuni p...@apache.org wrote:
 Hi;

 Looking at how big, and mostly cosmetic but necessary, a
 change it will be to bring in all the SGA license changes,
 and given that it requires manual intervention and is not
 something that can be done in one huge mega commit ...

 I think we should create a branch for these changes and merge
 them in two steps, corresponding to both SGAs. This way
 merging CWSs and bugzilla patches can go on without pain and
 people can get started on the header changes.

I recommend separating review from (automated) execution. If this is
done, a branch shouldn't be necessary...

Robert


Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)

2011-10-13 Thread Pedro Giffuni


--- On Thu, 10/13/11, Robert Burrell Donkin wrote:

 I recommend separating review from (automated) execution.
 If this is done, a branch shouldn't be necessary...
 

Uhm.. can you elaborate a bit more?

A branch would save us from having say... 1000 commits with
header changes in the history.

regards,

Pedro.
 


How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)

2011-10-09 Thread Pedro Giffuni
Hi;

Looking at how big, and mostly cosmetic but necessary, a
change it will be to bring in all the SGA license changes,
and given that it requires manual intervention and is not
something that can be done in one huge mega commit ...

I think we should create a branch for these changes and merge
them in two steps, corresponding to both SGAs. This way
merging CWSs and bugzilla patches can go on without pain and
people can get started on the header changes.

cheers,

Pedro.


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-01 Thread Mathias Bauer
On 01.10.2011 00:17, Michael Stahl wrote:

 On 30.09.2011 21:24, Mathias Bauer wrote:
 On 28.09.2011 17:32, Pedro F. Giffuni wrote:
 
 Another advantage of unpacking the tarballs: the patches will become
 *real* patches that just contain changes of the original source code.
 Often the patches nowadays contain additional files that we just need to
 build the stuff in OOo (e.g. dmake makefiles) - they could be checked in
 as regular files.
 
 Currently keeping them as regular files is awkward because then they
 need to be copied to the place the tarballs are unpacked to.
 
 but this is just because dmake can only build source files in the same
 directory; imagine a more flexible gbuild external build target where the
 makefiles are in the source tree while the tarball gets unpacked in the
 workdir...

Sure, but we aren't there yet...

I wasn't talking about the dmake makefiles that are used to unpack and
patch; I was talking about using dmake for building the external modules
that come with their own build system. The makefile.mk files in the root
directories of the external modules are not part of the patch, but some
patches contain makefile.mk files that are necessary to build the stuff,
either on all or only on some platforms.

Regards,
Mathias


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-30 Thread Mathias Bauer
On 28.09.2011 17:32, Pedro F. Giffuni wrote:
 FWIW;

 I don't like the patches because I can't really examine well
 the code, besides this is something the VCS handles acceptably:
 commit the original sourcecode and then apply the patches in a
 different commit. If we start with up to date versions there
 would not be much trouble.

I'm not against unpacking the tarballs and applying the patches, but we
should keep the patches somewhere so that updates can be done with the
same effort as today.

Another advantage of unpacking the tarballs: the patches will become
*real* patches that just contain changes of the original source code.
Often the patches nowadays contain additional files that we just need to
build the stuff in OOo (e.g. dmake makefiles) - they could be checked in
as regular files.

Currently keeping them as regular files is awkward because then they
need to be copied to the place the tarballs are unpacked to.

Regards,
Mathias


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-30 Thread Michael Stahl
On 30.09.2011 21:24, Mathias Bauer wrote:
 On 28.09.2011 17:32, Pedro F. Giffuni wrote:

 Another advantage of unpacking the tarballs: the patches will become
 *real* patches that just contain changes of the original source code.
 Often the patches nowadays contain additional files that we just need to
 build the stuff in OOo (e.g. dmake makefiles) - they could be checked in
 as regular files.
 
 Currently keeping them as regular files is awkward because then they
 need to be copied to the place the tarballs are unpacked to.

but this is just because dmake can only build source files in the same
directory; imagine a more flexible gbuild external build target where the
makefiles are in the source tree while the tarball gets unpacked in the
workdir...

 Regards,
 Mathias
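
[Editor's illustration: a rough shell sketch of the behaviour Michael
describes, with made-up module name and paths. The patches and build
files stay versioned in the source tree, while the tarball is unpacked
and patched only in a throwaway work directory.]

    SRC=3rdparty/libfoo                  # checked in: patches, makefiles
    WORK=workdir/unpacked/libfoo         # not versioned

    mkdir -p "$WORK"
    tar -xzf ext_sources/libfoo-1.0.tar.gz -C "$WORK" --strip-components=1
    for p in "$SRC"/*.patch; do
        patch -d "$WORK" -p1 < "$p"      # real patches: source changes only
    done
    cp "$SRC"/makefile.mk "$WORK"/       # build file kept as a regular file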
 




Re: A systematic approach to IP review?

2011-09-29 Thread Rob Weir
On Thu, Sep 29, 2011 at 1:53 AM, Dennis E. Hamilton
dennis.hamil...@acm.org wrote:
 Let me recall the bidding a little here.  What I said was

  It is unlikely that machine-generated files of any kind are copyrightable 
 subject matter.

 You point out that computer-generated files might incorporate copyrightable 
 subject matter.  I hadn't considered a hybrid case where copyrightable 
 subject matter would subsist in such a work, and I have no idea how and to 
 what extent the output qualifies as a work of authorship, but it is certainly 
 a case to be reckoned with.

 Then there is the issue of macro expansion, template parameter substitution, 
 etc., and the cases become blurrier and blurrier.  For example, if I wrote a 
 program and then put it through the C Language pre-processor, in how much of 
 the expanded result does the copyright declared on the original subsist?  (I 
 am willing to concede, for purposes of argument, that the second is a 
 derivative work of the former, even though the derivation occurred 
 dynamically.)

 I fancy this example because it is commonplace that the pre-processor 
 incorporates files that have their own copyright and license notices too.  
 Also, the original might include macro calls, with
 parameters using macros defined in one or more of those incorporated files.


Under US law: "Copyright protection subsists, in accordance with this
title, in original works of authorship fixed in any tangible medium of
expression, now known or later developed, from which they can be
perceived, reproduced, or otherwise communicated, either directly or
with the aid of a machine or device."

IANAL, but I believe Dennis is correct that a machine cannot be an
author, in terms of copyright.  But the author of that program might be.
It comes down to who exactly put the work into a form "fixed in any
tangible medium of expression."

When I use an ordinary code editor, the machine acts as a tool that
I use to create an original work. It is a tool, like a paintbrush.  In
other cases, a tool can be used to transform a work.

If there is an original work in fixed form that I transform, then I
may have copyright interest in the transformed work. That is how
copyright law protects software binaries as well as source code.

As for the GNU Bison example, if I created the BNF, then I have
copyright interest in the generated code.  That does not mean that I
have exclusive ownership of all the generated code.  It might be a
mashup of original template code from the Bison authors, along with
code that is a transformation of my original grammar definition.  It
isn't an either/or situation.  A work can have mixed authorship.

-Rob


 I concede that copyrightable matter can survive into a machine-generated 
 file.  And I maintain that there can be other conditions on the use of such a 
 file other than by virtue of it containing portions in which copyright 
 subsists.  For example, I don't think the Copyright office is going to accept 
 registration of compiled binaries any time soon, even though there may be 
 conditions on the license of the source code that carries over onto those 
 binaries.

 And, yes, it is murky all the way down.

  - Dennis

 -Original Message-
 From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org]
 Sent: Wednesday, September 28, 2011 22:32
 To: 'ooo-dev@incubator.apache.org'
 Subject: RE: A systematic approach to IP review?

 Not to put too fine a point on this, but it sounds like you are talking about 
 boilerplate (and authored) template code that Bison incorporates in its 
 output.  It is also tricky because the Bison output is computer source code.  
 That is an interesting case.

 In the US, original work of authorship is pretty specific in the case of 
 literary works, which is where software copyright falls the last time I 
 checked (too long ago, though).  I suspect that a license (in the contractual 
 sense) can deal with more than copyright.  And, if Bison spits out copyright 
 notices, they still only apply to that part of the output, if any, that 
 qualifies as copyrightable subject matter.

 Has the Bison claim ever been tested in court?  Has anyone been pursued or 
 challenged for infringement? I'm just curious.

  - Dennis

 -Original Message-
 From: Norbert Thiebaud [mailto:nthieb...@gmail.com]
 Sent: Wednesday, September 28, 2011 22:11
 To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org
 Subject: Re: A systematic approach to IP review?

 On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton
 dennis.hamil...@acm.org wrote:
 I'll stand by my original statement.

 I'm not going to get into the Pixar case since it doesn't apply here.

 I did not say it applied to the Visual studio generated cruft... I
 merely commented on the blanket assertion that 'computer generated =
 no copyright'

 The Bison manual may have license conditions on what can be done with the 
 generated artifact, but I suggest that is not about copyrightable subject 
 matter

Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-28 Thread Mathias Bauer

On 20.09.2011 16:36, Pavel Janík wrote:

Have we ever considered using version control to...uh...manage file
versions?

Just an idea.



Maybe Heiner will say more, but in the past, we have had the external
tarballs in the VCS, but then we moved them out and it worked very
well. There never was a reason to track external.tar.gz files in VCS,
because we do not change them.
The best way to handle 3rd party code in AOOo will probably
depend on the needs of the developers as well as on legal requirements.


We had these tarballs plus patches, IIRC, because Sun Legal required
that all 3rd party stuff in use should be preserved in our repos in its
original form.


As a developer I always preferred to have 3rd party code treated in
the *build* like the internal source code.


So if there wasn't a requirement to have unpatched sources in the 
repository, the most natural way to keep 3rd party stuff would be to 
have a third sub-repo 3rdparty next to main and extras with the 
3rd party stuff checked in. Not the tarballs, just the unpacked content.


I wouldn't give up the patches, as they allow updates to be handled
better. This would cause a problem: direct changes to the 3rd party
stuff without additional authorization must be prevented (meaning the
source code must not be changed accidentally, only when the 3rd party
code gets an update from upstream), while patch files must still be
allowed to be added, removed, or changed - just not the original source
code. If that isn't possible, or is too cumbersome, checking in the
tarballs in 3rdparty would be better.
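
[Editor's illustration: one way to enforce that rule in Subversion
would be a pre-commit hook. A minimal sketch; the repository layout and
the allowed file patterns are assumptions.]

    #!/bin/sh
    # Reject changes under 3rdparty/ unless they only touch patch
    # files, the checked-in build files, or directories.
    REPOS=$1 TXN=$2
    BAD=$(svnlook changed -t "$TXN" "$REPOS" | awk '{print $2}' \
          | grep '^trunk/3rdparty/' \
          | grep -v -e '\.patch$' -e '/makefile\.mk$' -e '/$')
    if [ -n "$BAD" ]; then
        echo "Direct 3rd-party source changes need authorization:" >&2
        echo "$BAD" >&2
        exit 1
    fi
    exit 0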


Since svn users never download the complete history the way DSCM users
do, the pain of binary files in the repo isn't that bad. In case AOOo
moves to a DSCM again later, the tarballs can be moved out again easily.


Regards,
Mathias


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-28 Thread Pedro F. Giffuni
FWIW;

I don't like the patches because I can't really examine the code
well; besides, this is something the VCS handles acceptably:
commit the original source code and then apply the patches in a
different commit. If we start with up-to-date versions there
would not be much trouble.

just my $0.02, not an objection.

Pedro.

--- On Wed, 9/28/11, Jürgen Schmidt jogischm...@googlemail.com wrote:

...

  I wouldn't give up the patches, as they allow updates to be
  handled better. This would cause a problem: direct changes to
  the 3rd party stuff without additional authorization must be
  prevented (meaning the source code must not be changed
  accidentally, only when the 3rd party code gets an update from
  upstream), while patch files must still be allowed to be added,
  removed, or changed - just not the original source code. If
  that isn't possible, or is too cumbersome, checking in the
  tarballs in 3rdparty would be better.
 
 
 I also wouldn't give up the patches, and for that reason I would
 like to move forward for now with keeping the tarballs as
 proposed. But I like the name 3rdparty for the directory, and we
 can later change from the tarballs to the unpacked code if we
 see demand for it. At the moment it's just easier to keep the
 tarballs and focus on other work.
 
 
 
  As svn users never download the complete history as DSCM users
  do, the pain of binary files in the repo isn't that hard. In
  case AOOo moved to a DSCM again later, the tarballs could be
  moved out again easily.
 
 
 Agreed - we don't really lose anything, can change if
 necessary, and can continue with our work.
 
 Juergen



RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-28 Thread Dennis E. Hamilton
The problem with bringing the 3rd party software completely into the SVN tree 
and modifying it in the tree has to do with the license the updated software is 
under.  In that case, there *is* a code provenance issue and I believe it 
crosses a line that the Apache Software Foundation is unwilling to cross with 
regard to the integrity of its code bases.

The current patches to Boost, for example, do not change the license on the 
code and preserve the Boost license.  But since this is ephemeral and the 
source is never in the SVN tree (is that correct?) the derivative use 
disappears at the end of a build.  It is sufficient then to include the 
dependency in the NOTICE for the release and not worry further.

Also, the current dependency is several releases behind the current Boost 
release.  This might not matter - the specific Boost libraries that are used 
might not be affected.  But there is a release synchronization issue.  A fork 
would have to be maintained.  Also, the dependencies are managed better now, 
rather than having the entire Boost library installed for cherry picking.

(This will all change at some point, since Boost is being incorporated into ISO 
C++.  It is probably best to wait for that to ripple out into the compiler 
distributions.)

 - Dennis

-Original Message-
From: Pedro F. Giffuni [mailto:giffu...@tutopia.com] 
Sent: Wednesday, September 28, 2011 08:32
To: ooo-dev@incubator.apache.org
Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A 
systematic approach to IP review?]

FWIW;

I don't like the patches because I can't really examine the code
well; besides, this is something the VCS handles acceptably:
commit the original source code and then apply the patches in a
different commit. If we start with up-to-date versions there
would not be much trouble.

just my $0.02, not an objection.

Pedro.

--- On Wed, 9/28/11, Jürgen Schmidt jogischm...@googlemail.com wrote:

...

  I wouldn't give up the patches, as they allow updates to be
  handled better. This would cause a problem: direct changes to
  the 3rd party stuff without additional authorization must be
  prevented (meaning the source code must not be changed
  accidentally, only when the 3rd party code gets an update from
  upstream), while patch files must still be allowed to be added,
  removed, or changed - just not the original source code. If
  that isn't possible, or is too cumbersome, checking in the
  tarballs in 3rdparty would be better.
 
 
 I also wouldn't give up the patches, and for that reason I would
 like to move forward for now with keeping the tarballs as
 proposed. But I like the name 3rdparty for the directory, and we
 can later change from the tarballs to the unpacked code if we
 see demand for it. At the moment it's just easier to keep the
 tarballs and focus on other work.
 
 
 
  As svn users never download the complete history as DSCM users
  do, the pain of binary files in the repo isn't that hard. In
  case AOOo moved to a DSCM again later, the tarballs could be
  moved out again easily.
 
 
 Agreed - we don't really lose anything, can change if
 necessary, and can continue with our work.
 
 Juergen
 



Re: A systematic approach to IP review?

2011-09-28 Thread Mathias Bauer

On 19.09.2011 02:27, Rob Weir wrote:


1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.


If you want svn to be the place for the IP review, we have to do it in 
two steps. There are some CWSs for post-3.4 that bring in new files. 
Setting up a branch now to bring them to svn would create additional 
work that IMHO should better be done later.




2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.


see above


e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.


I assume that you are talking about header files with an MS copyright, 
not header files generated by e.g. Visual Studio. In my understanding 
these files should be considered as contributed under the rules of the 
OOo project, and so their copyright owner is now Oracle.



 5) We should track the resolution of each file, and do this
 publicly.  The audit trail is important.  Some ways we could do this
 might be:

 a) Track this in SVN properties.

IMHO this is the best solution. SVN is the place of truth when it comes
down to files.
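
[Editor's illustration: tracking could look like the sketch below. The
property name, value format, and file path are made up; the project
would have to agree on a convention.]

    # Record the review result on a file (hypothetical path/property):
    svn propset aoo:ip-status "reviewed; license=ALv2; grant=SGA" main/foo/bar.cxx
    svn commit -m "IP review: main/foo/bar.cxx" main/foo/bar.cxx

    # Report every recorded resolution in a module:
    svn propget -R aoo:ip-status main/foo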


The second best solution would be to have one text file per build unit 
(that would be a gbuild makefile in the new build system) or per module 
(that would be a sub folder of the sub-repos). The file should be 
checked in to svn.


Everything else (spreadsheets or whatever) could be generated from 
that, in case anyone had a need for a spreadsheet with 6 rows 
containing license information. ;-)


Regards,
Mathias


RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-28 Thread Pedro F. Giffuni
The idea (not originally mine) is to keep only compatibly
licensed code under an isolated (3rdparty) directory.

I think in the long run we should try to use the system versions
of such software when available; every Linux/BSD distribution
is probably doing that for LO already.

Pedro.

--- On Wed, 9/28/11, Dennis E. Hamilton dennis.hamil...@acm.org wrote:

 The problem with bringing the 3rd
 party software completely into the SVN tree and modifying it
 in the tree has to do with the license the updated software
 is under.  In that case, there *is* a code provenance
 issue and I believe it crosses a line that the Apache
 Software Foundation is unwilling to cross with regard to the
 integrity of its code bases.
 
 The current patches to Boost, for example, do not change
 the license on the code and preserve the Boost
 license.  But since this is ephemeral and the source is
 never in the SVN tree (is that correct?) the derivative use
 disappears at the end of a build.  It is sufficient
 then to include the dependency in the NOTICE for the release
 and not worry further.
 
 Also, the current dependency is several releases behind the
 current Boost release.  This might not matter - the
 specific Boost libraries that are used might not be
 affected.  But there is a release synchronization
 issue.  A fork would have to be maintained.  Also,
 the dependencies are managed better now, rather than having
 the entire Boost library installed for cherry picking.
 
 (This will all change at some point, since Boost is being
 incorporated into ISO C++.  It is probably best to wait
 for that to ripple out into the compiler distributions.)
 
  - Dennis
 
 -Original Message-
 From: Pedro F. Giffuni [mailto:giffu...@tutopia.com]
 
 Sent: Wednesday, September 28, 2011 08:32
 To: ooo-dev@incubator.apache.org
 Subject: Re: handling of ext_sources - Juergen's suggestion
 [was: Re: A systematic approach to IP review?]
 
 FWIW;
 
 I don't like the patches because I can't really examine the
 code well; besides, this is something the VCS handles
 acceptably: commit the original source code and then apply
 the patches in a different commit. If we start with
 up-to-date versions there would not be much trouble.
 
 just my $0.02, not an objection.
 
 Pedro.
 
 --- On Wed, 9/28/11, Jürgen Schmidt jogischm...@googlemail.com
 wrote:
 
 ...
 
   I wouldn't give up the patches, as they allow updates to be
   handled better. This would cause a problem: direct changes
   to the 3rd party stuff without additional authorization must
   be prevented (meaning the source code must not be changed
   accidentally, only when the 3rd party code gets an update
   from upstream), while patch files must still be allowed to
   be added, removed, or changed - just not the original source
   code. If that isn't possible, or is too cumbersome, checking
   in the tarballs in 3rdparty would be better.
  
  
  I also wouldn't give up the patches, and for that reason I
  would like to move forward for now with keeping the tarballs
  as proposed. But I like the name 3rdparty for the directory,
  and we can later change from the tarballs to the unpacked
  code if we see demand for it. At the moment it's just easier
  to keep the tarballs and focus on other work.
  
  
  
   As svn users never download the complete history as DSCM
   users do, the pain of binary files in the repo isn't that
   hard. In case AOOo moved to a DSCM again later, the tarballs
   could be moved out again easily.
  
  
  Agreed - we don't really lose anything, can change if
  necessary, and can continue with our work.
  
  Juergen
  
 
 



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-28 Thread Michael Stahl
On 28.09.2011 17:32, Pedro F. Giffuni wrote:
 FWIW;
 
 I don't like the patches because I can't really examine well
 the code, besides this is something the VCS handles acceptably:
 commit the original sourcecode and then apply the patches in a
 different commit. If we start with up to date versions there
 would not be much trouble.

If we didn't have many thousands of lines of patches to rebase, then
upgrading to less outdated versions wouldn't be such a PITA.

Sadly, in many cases upstreaming patches was never sufficiently high on
the priority list to actually get done...

-- 
Dealing with failure is easy: Work hard to improve.
 Success is also easy to handle: You've solved the wrong problem.
 Work hard to improve. -- Alan Perlis



RE: A systematic approach to IP review?

2011-09-28 Thread Dennis E. Hamilton
It is unlikely that machine-generated files of any kind are copyrightable 
subject matter.  I would think that files generated by Visual Studio should 
just be regenerated, especially if this has to do with preprocessor 
pre-compilation, project boiler-plate (and even build/make) files, 
MIDL-compiled files, resource-compiler output, and the like.  

(I assume there are no MFC dependencies unless MFC has somehow shown up under 
VC++ 2008 Express Edition or the corresponding SDK -- I am behind the times.  I 
thought the big issue was ATL.)

Meanwhile, I favor what you say about having a file at the folder level of the 
buildable components.  It strikes me as a visible way to ensure that the IP 
review has been completed and is current.  It also has great transparency and 
accountability since the document is in the SVN itself.  It also survives being 
extracted from the SVN, included in a tar-ball, etc.  In short: nice!

 - Dennis

-Original Message-
From: Mathias Bauer [mailto:mathias_ba...@gmx.net] 
Sent: Wednesday, September 28, 2011 04:25
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

On 19.09.2011 02:27, Rob Weir wrote:

 1) We need to get all files needed for the build into SVN.  Right now
 there are some that are copied down from the OpenOffice.org website
 during the build's bootstrap process.   Until we get the files all in
 one place it is hard to get a comprehensive view of our dependencies.

If you want svn to be the place for the IP review, we have to do it in 
two steps. There are some CWSs for post-3.4 that bring in new files. 
Setting up a branch now to bring them to svn would create additional 
work that IMHO should better be done later.


 2) Continue the CWS integrations.  Along with 1) this ensures that all
 the code we need for the release is in SVN.

see above

 e) (Hypothetically) files that are not under an OSS license at all.
 E.g., a Microsoft header file.  These must be removed.

I assume that you are talking about header files with an MS copyright, 
not header files generated by e.g. Visual Studio. In my understanding 
these files should be considered as contributed under the rules of the 
OOo project, and so their copyright owner is now Oracle.

 5) We should track the resolution of each file, and do this
 publicly.  The audit trail is important.  Some ways we could do this
 might be:

 a) Track this in SVN properties.

IMHO this is the best solution. SVN is the place of truth when it comes
down to files.

The second best solution would be to have one text file per build unit 
(that would be a gbuild makefile in the new build system) or per module 
(that would be a sub folder of the sub-repos). The file should be 
checked in to svn.

Everything else (spreadsheets or whatever) could be generated from 
that, in case anyone had a need for a spreadsheet with 6 rows 
containing license information. ;-)

Regards,
Mathias



Re: A systematic approach to IP review?

2011-09-28 Thread Rob Weir
On Wed, Sep 28, 2011 at 6:42 PM, Dennis E. Hamilton
dennis.hamil...@acm.org wrote:
 It is unlikely that machine-generated files of any kind are copyrightable 
 subject matter.  I would think that files generated by Visual Studio should 
 just be regenerated, especially if this has to do with preprocessor 
 pre-compilation, project boiler-plate (and even build/make) files, 
 MIDL-compiled files, resource-compiler output, and the like.


That is my understanding as well, wrt computer-generated files.
However the lack of copyright does not mean lack of concern.  For
example, some code generation applications have a license that puts
additional restrictions on the generated code.  Some versions of GNU
Bison, the YACC variant, did that.


 (I assume there are no MFC dependencies unless MFC has somehow shown up under 
 VC++ 2008 Express Edition or the corresponding SDK -- I am behind the times.  
 I thought the big issue was ATL.)

 Meanwhile, I favor what you say about having a file at the folder level of 
 the buildable components.  It strikes me as a visible way to ensure that the 
 IP review has been completed and is current.  It also has great transparency 
 and accountability since the document is in the SVN itself.  It also survives 
 being extracted from the SVN, included in a tar-ball, etc.  In short: nice!

  - Dennis

 -Original Message-
 From: Mathias Bauer [mailto:mathias_ba...@gmx.net]
 Sent: Wednesday, September 28, 2011 04:25
 To: ooo-dev@incubator.apache.org
 Subject: Re: A systematic approach to IP review?

 On 19.09.2011 02:27, Rob Weir wrote:

 1) We need to get all files needed for the build into SVN.  Right now
 there are some that are copied down from the OpenOffice.org website
 during the build's bootstrap process.   Until we get the files all in
 one place it is hard to get a comprehensive view of our dependencies.

 If you want svn to be the place for the IP review, we have to do it in
 two steps. There are some CWSs for post-3.4 that bring in new files.
 Setting up a branch now to bring them to svn would create additional
 work that IMHO should better be done later.


 2) Continue the CWS integrations.  Along with 1) this ensures that all
 the code we need for the release is in SVN.

 see above

 e) (Hypothetically) files that are not under an OSS license at all.
 E.g., a Microsoft header file.  These must be removed.

 I assume that you are talking about header files with an MS copyright,
 not header files generated by e.g. Visual Studio. In my understanding
 these files should be considered as contributed under the rules of the
 OOo project, and so their copyright owner is now Oracle.

 5) We should track the resolution of each file, and do this
 publicly.  The audit trail is important.  Some ways we could do this
 might be:

 a) Track this in SVN properties.

 IMHO this is the best solution. SVN is the place of truth when it comes
 down to files.

 The second best solution would be to have one text file per build unit
 (that would be a gbuild makefile in the new build system) or per module
 (that would be a sub folder of the sub-repos). The file should be
 checked in to svn.

 Everything else (spreadsheets or whatever) could be generated from
 that, in case anyone had a need for a spreadsheet with 6 rows
 containing license information. ;-)

 Regards,
 Mathias




Re: A systematic approach to IP review?

2011-09-28 Thread Norbert Thiebaud
On Wed, Sep 28, 2011 at 5:42 PM, Dennis E. Hamilton
dennis.hamil...@acm.org wrote:
 It is unlikely that machine-generated files of any kind are copyrightable 
 subject matter.

I'd imagine that Pixar, for instance, would have a problem with that
blanket statement...

The very existence of this paragraph in the Bison manual:
http://www.gnu.org/s/bison/manual/bison.html#Conditions
also raises doubt as to the validity of the premise.

Norbert


RE: A systematic approach to IP review?

2011-09-28 Thread Dennis E. Hamilton
I'll stand by my original statement.

I'm not going to get into the Pixar case since it doesn't apply here.

The Bison manual may have license conditions on what can be done with the 
generated artifact, but I suggest that is not about copyrightable subject 
matter in the artifact.  A similar condition would be one in, let's say for a 
hypothetical case, Visual C++ 2008 Express Edition requiring that generated 
code be run on Windows.  It's not about copyright.  

And I agree, one must understand license conditions that apply to the tool used 
to make the generated artifacts.  I did neglect to consider that.

 - Dennis

-Original Message-
From: Norbert Thiebaud [mailto:nthieb...@gmail.com] 
Sent: Wednesday, September 28, 2011 16:41
To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org
Subject: Re: A systematic approach to IP review?

On Wed, Sep 28, 2011 at 5:42 PM, Dennis E. Hamilton
dennis.hamil...@acm.org wrote:
 It is unlikely that machine-generated files of any kind are copyrightable 
 subject matter.

I'd imagine that Pixar, for instance, would have a problem with that
blanket statement...

The very existence of this paragraph in the Bison manual:
http://www.gnu.org/s/bison/manual/bison.html#Conditions
also raises doubt as to the validity of the premise.

Norbert



Re: A systematic approach to IP review?

2011-09-28 Thread Pedro F. Giffuni

--- On Wed, 9/28/11, Norbert Thiebaud wrote:
...
 On Wed, Sep 28, 2011 at 5:42 PM,
 Dennis E. Hamilton wrote:
  It is unlikely that machine-generated files of any
 kind are copyrightable subject matter.
 
 I'd imagine that Pixar, for instance, would have a problem
 with that
 blanket statement...
 
 The very existence of this paragraph in the Bison manual:
 http://www.gnu.org/s/bison/manual/bison.html#Conditions
 also raises doubt as to the validity of the premise.
 

Ugh... I am not a lawyer and I normally prefer not to have
to read all that, but OOo requires bison to build, so if that
paragraph still applies we should be using yacc instead.

Pedro.



Re: A systematic approach to IP review?

2011-09-28 Thread Norbert Thiebaud
On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton
dennis.hamil...@acm.org wrote:
 I'll stand by my original statement.

 I'm not going to get into the Pixar case since it doesn't apply here.

I did not say it applied to the Visual Studio generated cruft... I
merely commented on the blanket assertion that 'computer generated =
no copyright'.

 The Bison manual may have license conditions on what can be done with the 
 generated artifact, but I suggest that is not about copyrightable subject 
 matter in the artifact.
Actually, it is. The only claim they could legally have _is_ on the
generated bits that are substantial pieces of code copied from templates
they provide - namely, in the case of a Bison-generated parser, the whole
parser skeleton needed to exploit the generated state graph. The whole
paragraph is about the copyright disposition of these bits, and in the
case of Bison they explicitly grant you a license to use these bits in
the 'normal' use case... my point being that the existence of that
paragraph also disproves the assertion that 'computer generated = no
copyright'.

You could write a program that prints itself... the mere fact that it
prints itself does not mean you lose the copyright on your program...

That being said, I do think you are in the clear with the Visual
Studio generated cruft... but not merely because there is 'computer
generation' involved.


Norbert


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Jürgen Schmidt
On Thu, Sep 22, 2011 at 12:40 AM, Jens-Heiner Rechtien jhrecht...@web.de wrote:

 On 09/20/2011 05:26 PM, Rob Weir wrote:

 2011/9/20 Pavel Janík pa...@janik.cz:

 Have we ever considered using version control to...uh...manage file
 versions?

 Just an idea.



 Maybe Heiner will say more, but in the past, we have had the external
 tarballs in the VCS, but then we moved them out and it worked very well.
 There never was a reason to track external.tar.gz files in VCS, because we
 do not change them.
 --


 That's fine.  If they don't change, then doing a svn update will not
 bring them down each time.

 Aside from being useful for version control, SVN is useful also very
 useful as an audit trail.  So in the rare occasions when one of these
 files does change, we know who changed it and why.  This is important
 for ensuring the IP cleanliness of the project.

 Is your main concern performance?  Even as individual tarballs,
 ext-sources is 86 files, 250MB.  ooo/extras is 243 files and 822 MB.
 And ooo/main is 76,295 files for over 900MB.  So ext-sources is not a
 huge contributor to download time.


 Placing all the external tarballs in the VCS is a real killer if using a
 distributed SCM like git or Mercurial; that's why we had moved them out. As
 Pavel said, it worked quite nicely. As for the audit possibility, we
 referenced the external tarballs in the source tree by file name and an md5
 checksum, which works just as reliably as putting them directly into the
 repository.

 Nowadays the DSCMs have some alternative methods to deal with such blobs,
 but in essence they also keep them separate.

 If AOOo ever plans to go back to a DSCM I would keep the source tree and
 the external blobs strictly separated.

 All in all, the general SCM tooling community opinion seems to be that
 a S(ource)CM system is for, well, source, and external dependencies are
 better handled with other mechanisms, like Maven or so.

 With SVN all this is less of a concern, naturally.

 ok, we have several arguments for and against, but no decision on how we
want to move forward. Let us take another look at it:

1. we have a working mechanism to get the externals from somewhere, check the
md5 sum, unpack, patch, build
1.1 somewhere is configurable during the configure step; initially the
externals are downloaded from http://hg.services.openoffice.org/binaries

2. having the externals in the repository (SVN) won't be a big issue because
a checkout always downloads only the tip version
2.1 the SCM can be used to track the version of the externals used for a
specific OO version - simply check out the version tag and everything is in
place ...

3. in a DSCM it would be a real problem over time because the space consumed
by all versions keeps increasing

4. we need a replacement for http://hg.services.openoffice.org/binaries asap
(who knows how long the server will be available)

5. many developers probably work with a local clone of the repository using,
for example, git svn or something else (see the sketch below) - disadvantage
of the increasing space, but probably acceptable if a clean local trunk is
kept and updated
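For illustration, a minimal sketch of such a local clone via git svn (the
repository URL follows the Apache layout discussed below; the target
directory name is a placeholder):

git svn clone http://svn.apache.org/repos/asf/incubator/ooo/trunk/main ooo-main
cd ooo-main
git svn rebase    # later: pull new SVN revisions on top of the local trunk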

Proposed way to move forward

1. put the externals under .../trunk/ext_sources
.../trunk/ext_sources
.../trunk/main
.../trunk/extras
2. adapt configure to use this as default, disable the download (maybe
reactivate it later if we move to a DSCM)
3. keep the process of checking the md5 sum as it is (for potential later
use)
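As a rough sketch of that existing check-and-unpack step (the tarball name
and checksum are placeholders, not real values):

echo "1f4e0123456789abcdef0123456789ab  libxyz-1.0.tar.gz" | md5sum -c -
tar xzf libxyz-1.0.tar.gz    # then patch and build as today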

Any opinions or suggestions?

Juergen


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Pavel Janík
 Proposed way to move forward
 
 1. put the externals under .../trunk/ext_sources
 .../trunk/ext_sources
 .../trunk/main
 .../trunk/extras
 2. adapt configure to use this as default, disable the download (maybe
 reactivate it later if we move to a DSCM)
 3. keep the process with checking the md5 sum as it is (for potential later
 use)
 
 Any opinions or suggestions?


+1.

And one more question:

If we put something into SVN under .../trunk/ext_sources, do we have some URL
that can replace http://hg so users don't have to check out everything? I.e.
do we have a URL where we have a real checkout of the SVN? Some SVN web
interface? I don't know the Apache infra well yet... That would be a real
killer solution!
-- 
Pavel Janík





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Armin Le Grand

On 22.09.2011 13:19, Jürgen Schmidt wrote:

On Thu, Sep 22, 2011 at 12:40 AM, Jens-Heiner Rechtien jhrecht...@web.de wrote:


On 09/20/2011 05:26 PM, Rob Weir wrote:

...


Placing all the external tarballs in the VCS is a real killer if using a
distributed SCM like git or Mercurial; that's why we had moved them out. As
Pavel said, it worked quite nicely. As for the audit possibility, we
referenced the external tarballs in the source tree by file name and an md5
checksum, which works just as reliably as putting them directly into the
repository.

Nowadays the DSCMs have some alternative methods to deal with such blobs,
but in essence they also keep them separate.

If AOOo ever plans to go back to a DSCM I would keep the source tree and
the external blobs strictly separated.

All in all, the general SCM tooling community opinion seems to be that
a S(ource)CM system is for, well, source, and external dependencies are
better handled with other mechanisms, like Maven or so.

With SVN all this is less of a concern, naturally.

ok, we have several arguments for and against, but no decision on how we want
to move forward. Let us take another look at it:

1. we have a working mechanism to get the externals from somewhere, check the
md5 sum, unpack, patch, build
1.1 somewhere is configurable during the configure step; initially the
externals are downloaded from http://hg.services.openoffice.org/binaries

2. having the externals in the repository (SVN) won't be a big issue because
a checkout always downloads only the tip version
2.1 the SCM can be used to track the version of the externals used for a
specific OO version - simply check out the version tag and everything is in
place ...

3. in a DSCM it would be a real problem over time because the space consumed
by all versions keeps increasing

4. we need a replacement for http://hg.services.openoffice.org/binaries asap
(who knows how long the server will be available)

5. many developers probably work with a local clone of the repository using,
for example, git svn or something else - disadvantage of the increasing
space, but probably acceptable if a clean local trunk is kept and updated

Proposed way to move forward

1. put the externals under .../trunk/ext_sources
.../trunk/ext_sources
.../trunk/main
.../trunk/extras
2. adapt configure to use this as default, disable the download (maybe
reactivate it later if we move to a DSCM)
3. keep the process of checking the md5 sum as it is (for potential later
use)

Any opinions or suggestions?


+1

Best current solution: added to SVN, where it does not really matter, and
a way back if we change to a DSCM in the future.



Juergen



sincerely,
Armin
--
ALG



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Jürgen Schmidt
2011/9/22 Pavel Janík pa...@janik.cz

  Proposed way to move forward
 
  1. put the externals under .../trunk/ext_sources
  .../trunk/ext_sources
  .../trunk/main
  .../trunk/extras
  2. adapt configure to use this as default, disable the download (maybe
  reactivate it later if we move to a DSCM)
  3. keep the process with checking the md5 sum as it is (for potential
 later
  use)
 
  Any opinions or suggestions?


 +1.

 And one more question:

 If we put something into SVN under .../trunk/ext_sources, do we have some
 URL that can replace http://hg so users don't have to check out
 everything? I.e. do we have a URL where we have a real checkout of the SVN?
 Some SVN web interface? I don't know the Apache infra well yet... That
 would be a real killer solution!


don't know if it is what you are looking for but

wget http://svn.apache.org/viewvc/incubator/ooo/trunk/main/filename?view=co

should download the head version.

Juergen



 --
 Pavel Janík






Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Pavel Janík
 don't know if it is what you are looking for but
 
 wget http://svn.apache.org/viewvc/incubator/ooo/trunk/main/filename?view=co
 
 should download the head version.

Then we should be able to have both things solved: files in SVN, and, with a
relatively small change in the download script, also remote fetching of the
files if we do not have a local ext_sources checkout.
-- 
Pavel Janík





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Rob Weir
2011/9/22 Pavel Janík pa...@janik.cz:
 Proposed way to move forward

 1. put the externals under .../trunk/ext_sources
 .../trunk/ext_sources
 .../trunk/main
 .../trunk/extras
 2. adapt configure to use this as default, disable the download (maybe
 reactivate it later if we move to a DSCM)
 3. keep the process with checking the md5 sum as it is (for potential later
 use)

 Any opinions or suggestions?


 +1.

 And one more question:

 If we put something into SVN into .../trunk/ext_sources, do we have some URL 
 that can replace http://hg so users don't have to check out everything? 
 Ie. do we have a URL where we have real checkout of the SVN? Some SVN web 
 interface? Don't know Apache infra well yet... That would be real killer 
 solution!
 --

I was thinking something similar.  We only need to use the SVN
interface to the files when we're adding or updating.  But we can have
bootstrap continue to download via http.  The URL, using
Juergen's proposed location, would be
http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources

This would save having a duplicate local SVN working copy of the file, right?
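For illustration, the bootstrap fetch could then look like this (the tarball
name is a placeholder):

wget http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources/libxyz-1.0.tar.gz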

-Rob


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Jürgen Schmidt
On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir robw...@apache.org wrote:


 I was thinking something similar.  We only need to use the SVN
 interface to the files when we're adding or updating.  But we can have
 bootstrap continue to download via http.  The location, using
 Juergen's proposed location, would be
 http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources

 yes, this is the correct URL; the URL that I posted wouldn't work

Juergen


 This would save having a duplicate local SVN working copy of the file,
 right?

 -Rob



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Jürgen Schmidt
2011/9/22 Jürgen Schmidt jogischm...@googlemail.com

 On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir robw...@apache.org wrote:


 I was thinking something similar.  We only need to use the SVN
 interface to the files when we're adding or updating.  But we can have
 bootstrap continue to download via http.  The location, using
 Juergen's proposed location, would be
 http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources

 yes, this is the correct URL, the URL that i have posted wouldn't work

 Juergen


 This would save having a duplicate local SVN working copy of the file,
 right?


mmh, no, or I am misunderstanding something. People check out .../trunk and
would get ext_sources, main and extras. To benefit from the modified script
we would have to put ext_sources beside trunk:

.../ooo/ext_sources
.../ooo/trunk/main
.../ooo/trunk/extras

That means we are back to my initial proposal, right?

Juergen


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Rob Weir
2011/9/22 Jürgen Schmidt jogischm...@googlemail.com:
 2011/9/22 Jürgen Schmidt jogischm...@googlemail.com

 On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir robw...@apache.org wrote:


 I was thinking something similar.  We only need to use the SVN
 interface to the files when we're adding or updating.  But we can have
 bootstrap continue to download via http.  The location, using
 Juergen's proposed location, would be
 http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources

 yes, this is the correct URL, the URL that i have posted wouldn't work

 Juergen


 This would save having a duplicate local SVN working copy of the file,
 right?


 mmh, no, or I am misunderstanding something. People check out .../trunk and
 would get ext_sources, main and extras. To benefit from the modified script
 we would have to put ext_sources beside trunk:

 .../ooo/ext_sources
 .../ooo/trunk/main
 .../ooo/trunk/extras

 That means we are back to my initial proposal, right?


I think the idea is this:  Everything under ooo represents what goes
into a release.  It can be tagged and branched.  trunk/ is a peer to a
tags/ and branches/ directory.

It is possible that we have this wrong.  Adding in site/ and ooo-site/
brings in a different convention.  They are set up to have
trunk/tags/branches underneath them.  That is fine, because the
website does not release in sync with an OOo release.  It makes
sense for them to be able to tag and branch independently.

We should also consider how the project grows going forward.  We know
that other code bases will be checked in, like Symphony.  And there
are other, small, but disjoint contributions that I'm working on as
well.

So it might make sense to move trunk down one level:

/ooo/ooo-src/trunk/main
/ooo/ooo-src/trunk/extras
/ooo/ooo-src/trunk/ext-sources
/ooo/ooo-src/tags
/ooo/ooo-src/branches

That would make more sense then, as a unit, since we would want to tag
across all of /ooo/ooo-src/ to define a release.

I assume a developer still just checks out ooo/ooo-src/trunk/main.  If
they need the additional extras then they check that out separately.
I don't think most users will want to check out the entire trunk all
the time.  We should also consider how we want this tree to grow over
time, as other related contributions arrive.

In the end, I think we want to preserve the ability to:

1) Preserve an audit trail of all changes that went into a release

2) Be able to tag and branch a release and everything that is in the release

3) Restore the exact state of a previous tagged release, including the
exact ext-sources used in that release

I'm certain that my proposal will enable this.  There may be other
approaches that do as well.

Another thing to keep in mind is the SVN support for externals:

http://svnbook.red-bean.com/en/1.0/ch07s03.html

This might make some things easier.
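For illustration, a minimal sketch of wiring that up with svn:externals,
using the 1.0-era 'dir URL' property format from the book linked above (run
in a working copy of trunk; names are placeholders):

svn propset svn:externals "ext-sources http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources" .
svn commit -m "pull ext-sources in via svn:externals"
svn update    # fetches the external into ext-sources/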

-Rob

 Juergen



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Shao Zhi Zhao


hi,

Based on this result, another trunk would look like the following if IBM
Symphony is checked in:
/ooo/symphony-src/trunk/main
/ooo/symphony-src/trunk/extras
/ooo/symphony-src/tags
/ooo/symphony-src/branches

thus it introduces a problem:
how do we merge the two trunks of symphony-src and ooo-src?



thanks

mail:zhaos...@cn.ibm.com
Address:2/F,Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No.8,
Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193,
P.R.China


   
Rob Weir robweir@apache.org
2011-09-22 21:18
To: ooo-dev@incubator.apache.org
Please respond to: ooo-dev@incubator.apache.org
Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A
systematic approach to IP review?]




snip Rob's message, quoted in full above



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Jürgen Schmidt
On Thu, Sep 22, 2011 at 3:18 PM, Rob Weir robw...@apache.org wrote:

 It is possible that we have this wrong.  Adding in site/ and ooo-site/
  brings in a different convention.  They are set up to have
  trunk/tags/branches underneath them.  That is fine, because the
  website does not release in sync with an OOo release.  It makes
 sense for them to be able to tag and branch independently.


agree


 We should also consider how the project grows going forward.  We know
 that other code bases will be checked in, like Symphony.  And there
 are other, small, but disjoint contributions that I'm working on as
 well.

 So it might make sense to move trunk down one level:

 /ooo/ooo-src/trunk/main
 /ooo/ooo-src/trunk/extras
 /ooo/ooo-src/trunk/ext-sources
 /ooo/ooo-src/tags
 /ooo/ooo-src/branches

 That would make more sense then, as a unit, since we would want to tag
  across all of /ooo/ooo-src/ to define a release.


agree, from this perspective it makes sense. The question then is: when do
we want to introduce this further level?


 I assume a developer still just checks out ooo/ooo-src/trunk/main.  If
 they need the additional extras then they check that out separately.
  I don't think most users will want to check out the entire trunk all
 the time.   We should consider also how we want this tree to grow over
  time, as other related contributions arrive.


I assumed that a developer would check out trunk; maybe a wrong assumption.



 In the end, I think we want to preserve the ability to:

 1) Preserve an audit trail of all changes that went into a release

  2) Be able to tag and branch a release and everything that is in the
 release

 3) Restore the exact state of a previous tagged release, including the
 exact ext-sources used in that release

 I'm certain that my proposal will enable this.  There may be other
 approaches that do as well.


I think so too. And with my changed mindset of not always checking out trunk
completely, I am fine with this approach.



 Another thing to keep in mind is the SVN support for externals:

 http://svnbook.red-bean.com/en/1.0/ch07s03.html


interesting, I didn't know that before

Juergen


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Rob Weir
On Thu, Sep 22, 2011 at 9:40 AM, Shao Zhi Zhao zhaos...@cn.ibm.com wrote:

 hi,

 Based on this result, an other trunk will be like the following if IBM
 symphony checked in:
 /ooo/symphony-src/trunk/main
 /ooo/symphony-src/trunk/extras
 /ooo/symphony-src/tags
 /ooo/symphony-src/branches

 thus it introduces a problem:
 How to merge the two trunks of symphony-src and ooo-src?

I don't think moving the tree down one level introduces any new problems
for Symphony, so long as the directories within */main remain the same.

Of course, merging code from Symphony into AOOo will be difficult in
general.  The problem is how do we establish a common ancestor revision to
do a 3-way merge with?  This will really depend on whether Symphony has a
good record of what the corresponding OOo revision was for each of its
initial files.

If not, then you can do a text diff and do some merging without trouble.
But dealing with renamed files, or moved files, or deleted files, these are
trickier to process automatically.

If you don't have that history, then in theory it could be reestablished by
taking the initial revision of each file in Symphony and comparing it to
each revision of the same file in OOo Mercurial, finding which revision
matches.  It might be possible to establish enough context for a 3-way merge
that way.  A rough sketch of that matching step follows below.
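As an illustration only, that per-file matching could look roughly like this
(the file path is a placeholder, not a real mapping):

# list every OOo revision that touched the file, then diff each one
# against the Symphony copy until one matches
for rev in $(hg log --template '{rev}\n' sw/source/core/doc/docnew.cxx); do
    hg cat -r $rev sw/source/core/doc/docnew.cxx \
        | diff -q - /path/to/symphony/sw/source/core/doc/docnew.cxx >/dev/null \
        && echo "file matches OOo revision $rev"
done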

-Rob




 thanks

 mail:zhaos...@cn.ibm.com
 Address:2/F,Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No.8,
 Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193,
 P.R.China

*Rob Weir robw...@apache.org*, 2011-09-22 21:18, to ooo-dev@incubator.apache.org:

snip Rob's message, quoted in full above




RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Dennis E. Hamilton
You can get anything off the web interface of SVN at the individual-file
level without it being in a working copy, though of course it has to be
somewhere local while it is being processed in a build.

But if you check-out the trunk, you get everything that is in the trunk HEAD 
(or a specified) version.

As far as I know, you can do a checkout anywhere deeper in the tree and avoid
everything not at that node [and below].  For example, just check out
trunk/main.
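For illustration, such a partial checkout (repository URL as discussed
earlier in the thread):

svn checkout http://svn.apache.org/repos/asf/incubator/ooo/trunk/main ooo-main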

It takes some consideration of SVN organization to have the desired flavors in
convenient chunks that people can work with, without having to eat the whole
thing (with regard to SVN checkout, SVN update and, of course, SVN commits).  I
can testify that an SVN UPDATE of the working copy of the entire incubator/ooo/
subtree is a painful experience, even when there is nothing to update.

 - Dennis

PS: I find it an interesting characteristic of SVN that trunk, tags, and
branches are just names of folders and don't mean anything special to SVN.  The
nomenclature and its use is a matter of custom, like code indentation rules
for { ... }.


-Original Message-
From: Rob Weir [mailto:robw...@apache.org] 
Sent: Thursday, September 22, 2011 05:24
To: ooo-dev@incubator.apache.org
Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A 
systematic approach to IP review?]

snip Rob's message, quoted in full above



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-21 Thread Jens-Heiner Rechtien

On 09/20/2011 05:26 PM, Rob Weir wrote:

2011/9/20 Pavel Janík pa...@janik.cz:

Have we ever considered using version control to...uh...manage file versions?

Just an idea.



Maybe Heiner will say more, but in the past, we have had the external tarballs 
in the VCS, but then we moved them out and it worked very well. There never was 
a reason to track external.tar.gz files in VCS, because we do not change them.
--


That's fine.  If they don't change, then doing a svn update will not
bring them down each time.

Aside from being useful for version control, SVN is also very useful
as an audit trail.  So on the rare occasions when one of these
files does change, we know who changed it and why.  This is important
for ensuring the IP cleanliness of the project.

Is your main concern performance?  Even as individual tarballs,
ext-sources is 86 files, 250MB.  ooo/extras is 243 files and 822 MB.
And ooo/main is 76,295 files for over 900MB.  So ext-sources is not a
huge contributor to download time.


Placing all the external tarballs in the VCS is a real killer if using a
distributed SCM like git or Mercurial; that's why we had moved them out.
As Pavel said, it worked quite nicely. As for the audit possibility, we
referenced the external tarballs in the source tree by file name and an
md5 checksum, which works just as reliably as putting them directly
into the repository.


Nowadays the DSCMs have some alternative methods to deal with such
blobs, but in essence they also keep them separate.


If AOOo ever plans to go back to a DSCM I would keep the source tree and 
the external blobs strictly separated.


All in all, the general SCM tooling community opinion seems to be
that a S(ource)CM system is for, well, source, and external dependencies
are better handled with other mechanisms, like Maven or so.


With SVN all this is less of a concern, naturally.

Heiner

--
Jens-Heiner Rechtien


Re: A systematic approach to IP review?

2011-09-20 Thread Jürgen Schmidt
On Mon, Sep 19, 2011 at 7:05 PM, Rob Weir robw...@apache.org wrote:

 On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo) marcus.m...@wtnet.de
 wrote:
  On 09/19/2011 04:47 PM, Rob Weir wrote:

  On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) marcus.m...@wtnet.de
   wrote:

  On 09/19/2011 01:59 PM, Rob Weir wrote:

  2011/9/19 Jürgen Schmidt jogischm...@googlemail.com:
 
  On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.org
  wrote:
 
  If you haven't looked it closely, it is probably worth a few minutes
  of your time to review our incubation status page, especially the
  items under Copyright and Verify Distribution Rights.  It lists
  the things we need to do, including:
 
   -- Check and make sure that the papers that transfer rights to the
  ASF been received. It is only necessary to transfer rights for the
  package, the core code, and any new code produced by the project.
 
  -- Check and make sure that the files that have been donated have
 been
  updated to reflect the new ASF copyright.
 
  -- Check and make sure that for all code included with the
  distribution that is not under the Apache license, we have the right
  to combine with Apache-licensed code and redistribute.
 
  -- Check and make sure that all source code distributed by the
 project
  is covered by one or more of the following approved licenses:
 Apache,
  BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with
 essentially
  the same terms.
 
  Some of this is already going on, but it is hard to get a sense of
 who
  is doing what and how much progress we have made.  I wonder if we
 can
  agree to a more systematic approach?  This will make it easier to
 see
  the progress we're making and it will also make it easier for others
  to help.
 
  Suggestions:
 
  1) We need to get all files needed for the build into SVN.  Right
 now
  there are some that are copied down from the OpenOffice.org website
  during the build's bootstrap process.   Until we get the files all
 in
  one place it is hard to get a comprehensive view of our
 dependencies.
 
 
  do you mean to check in the files under ext_source into svn and remove
  it later on when we have cleaned up the code? Or do you mean to put it
  somewhere on Apache Extras?
  I would prefer to save these binary files under Apache Extras if
  possible.
 
 
 
  Why not just keep it in SVN?   Moving things to Apache-Extras does not
  help us with the IP review.   In other words, if we have a dependency
  on a OSS module that has an incompatible license, then moving that
  module to Apache Extras does not make that dependency go away.  We
  still need to understand the nature of the dependency: a build tool, a
  dynamic runtime dependency, a statically linked library, an optional
  extensions, a necessary core module.
 
  If we find out, for example, that something in ext-sources is only
  used as a build tool, and is not part of the release, then there is
  nothing that prevents us from hosting it in SVN.   But if something is
  a necessary library and it is under GPL, then this is a problem even
  if we store it on Apache-Extras,
 
 
 
 
  2) Continue the CWS integrations.  Along with 1) this ensures that
 all
  the code we need for the release is in SVN.
 
  3)  Files that Oracle include in their SGA need to have the Apache
  license header inserted and the Sun/Oracle copyright migrated to the
  NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
  automate parts of this.
 
  4) Once the SGA files have the Apache headers, then we can make
  regular use of RAT to report on files that are lacking an Apache
  header.  Such files might be in one of the following categories:
 
  a) Files that Oracle owns the copyright on and which should be
  included in an amended SGA
 
  b) Files that have a compatible OSS license which we are permitted
 to
  use.  This might require that we add a mention of it to the NOTICE
  file.
 
  c) Files that have an incompatible OSS license.  These need to be
  removed/replaced.
 
  d) Files that have an OSS license that has not yet been
  reviewed/categorized by Apache legal affairs.  In that case we need
 to
  bring it to their attention.
 
  e) (Hypothetically) files that are not under an OSS license at all.
  E.g., a Microsoft header file.  These must be removed.
 
  5) We should track the resolution of each file, and do this
  publicly.  The audit trail is important.  Some ways we could do this
  might be:
 
  a) Track this in SVN properties.  So set ip:sga for the SGA files,
  ip:mit for files that are MIT licensed, etc.  This should be
 reflected
  in headers as well, but this is not always possible.  For example,
 we
  might have binary files where we cannot add headers, or cases where
  the OSS files do not have headers, but where we can prove their
  provenance via other means.
 
  b) Track this is a spreadsheet, one row per file.
 
  c) Track this is an text log file checked in SVN
 
  d) Track this in an annotated script that runs RAT, where the
  annotations document the reason for cases where we tell it to ignore a
  file or directory.

Re: A systematic approach to IP review?

2011-09-20 Thread Shane Curcuru
So... has anyone actually run Apache RAT yet?  It has a scan only mode 
which I'd think would be the simplest place to start.


Personally, I'd recommend working on basic RAT scans, with the scripts 
to run them and any exception rules (for known files, etc.) all checked 
into SVN with the build tools for the code.  But hey, it's easy for me 
to suggest we do stuff, when I only currently have time to be a mentor 
and thus can get away with just making suggestions.  8-)


I like the general concept of storing the IP type for files in SVN 
properties; although properties are easy to change, Apache does have a 
strong history of being able to provide oversight for commit logs 
throughout a project's history.
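For illustration, recording the IP type as a property might look like this
(the ip:sga property name comes from Rob's suggestion; the file path and
value are placeholders):

svn propset ip:sga "Oracle SGA" main/sw/source/core/doc/docnew.cxx
svn commit -m "record IP provenance" main/sw/source/core/doc/docnew.cxx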


- Shane


Re: A systematic approach to IP review?

2011-09-20 Thread Jürgen Schmidt
On Mon, Sep 19, 2011 at 1:59 PM, Rob Weir robw...@apache.org wrote:

 2011/9/19 Jürgen Schmidt jogischm...@googlemail.com:
  On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote:
 
  If you haven't looked it closely, it is probably worth a few minutes
  of your time to review our incubation status page, especially the
  items under Copyright and Verify Distribution Rights.  It lists
  the things we need to do, including:
 
   -- Check and make sure that the papers that transfer rights to the
  ASF been received. It is only necessary to transfer rights for the
  package, the core code, and any new code produced by the project.
 
  -- Check and make sure that the files that have been donated have been
  updated to reflect the new ASF copyright.
 
  -- Check and make sure that for all code included with the
  distribution that is not under the Apache license, we have the right
  to combine with Apache-licensed code and redistribute.
 
  -- Check and make sure that all source code distributed by the project
  is covered by one or more of the following approved licenses: Apache,
  BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
  the same terms.
 
  Some of this is already going on, but it is hard to get a sense of who
  is doing what and how much progress we have made.  I wonder if we can
  agree to a more systematic approach?  This will make it easier to see
  the progress we're making and it will also make it easier for others
  to help.
 
  Suggestions:
 
  1) We need to get all files needed for the build into SVN.  Right now
  there are some that are copied down from the OpenOffice.org website
  during the build's bootstrap process.   Until we get the files all in
  one place it is hard to get a comprehensive view of our dependencies.
 
 
  do you mean to check in the files under ext_source into svn and remove it
  later on when we have cleaned up the code? Or do you mean to put it
  somewhere on Apache Extras?
  I would prefer to save these binary files under Apache Extras if possible.
 


 Why not just keep it in SVN?   Moving things to Apache-Extras does not
 help us with the IP review.   In other words, if we have a dependency
 on a OSS module that has an incompatible license, then moving that
 module to Apache Extras does not make that dependency go away.  We
 still need to understand the nature of the dependency: a build tool, a
 dynamic runtime dependency, a statically linked library, an optional
 extensions, a necessary core module.

 If we find out, for example, that something in ext-sources is only
 used as a build tool, and is not part of the release, then there is
 nothing that prevents us from hosting it in SVN.   But if something is
 a necessary library and it is under GPL, then this is a problem even
 if we store it on Apache-Extras,

 I am not really happy with all the binaries in the trunk tree because of
the large binary blobs, and I don't expect too many changes of these
dependencies. And I would like to avoid checking them out every time.

What do others think about a structure where we have ext_sources beside
trunk?

incubator/ooo/trunk
incubator/ooo/ext_source
...

If we can agree on such a structure I would move forward to bring in some
new external sources: the proposed ucpp preprocessor (BSD license, used in
idlc and of course part of the SDK later on). I made some tests with it
and was able to build the sources on Windows in our cygwin environment with
a new GNU make file. I was also able to build udkapi and offapi with this
new and adapted idlc/ucpp without any problems; the generated type library
is equal to the old one.

I have to run some more tests on other platforms as soon as I have other
platforms available for testing. I decided to replace the preprocessor
instead of removing it for compatibility reasons, and it was of course
the easier change. The next step is to check how the process with
ext_sources works in detail in our build process and adapt the new ucpp
module. If anybody is familiar with ext_sources and can point me to
potential hurdles, please let me know (on a new thread) ;-)

Juergen



 
 
  2) Continue the CWS integrations.  Along with 1) this ensures that all
  the code we need for the release is in SVN.
 
  3)  Files that Oracle include in their SGA need to have the Apache
  license header inserted and the Sun/Oracle copyright migrated to the
  NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
  automate parts of this.
 
  4) Once the SGA files have the Apache headers, then we can make
  regular use of RAT to report on files that are lacking an Apache
  header.  Such files might be in one of the following categories:
 
  a) Files that Oracle owns the copyright on and which should be
  included in an amended SGA
 
  b) Files that have a compatible OSS license which we are permitted to
  use.  This might require that we add a mention of it to the NOTICE
  file.
 
  c) Files that have an incompatible OSS license.  These need to be
  removed/replaced.

Re: A systematic approach to IP review?

2011-09-20 Thread Jürgen Schmidt
On Tue, Sep 20, 2011 at 2:34 PM, Shane Curcuru a...@shanecurcuru.org wrote:

 So... has anyone actually run Apache RAT yet?  It has a scan only mode
 which I'd think would be the simplest place to start.

 it's on my todo list to take a look at it; probably I will come back with
questions

Juergen


 Personally, I'd recommend working on basic RAT scans, with the scripts to
 run them and any exception rules (for known files, etc.) all checked into
 SVN with the build tools for the code.  But hey, it's easy for me to suggest
 we do stuff, when I only currently have time to be a mentor and thus can
 get away with just making suggestions.  8-)

 I like the general concept of storing the IP type for files in SVN
 properties; although properties are easy to change, Apache does have a
 strong history of being able to provide oversight for commit logs throughout
 a project's history.

 - Shane



handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Oliver-Rainer Wittmann

Hi,

On 20.09.2011 14:37, Jürgen Schmidt wrote:

On Mon, Sep 19, 2011 at 1:59 PM, Rob Weir robw...@apache.org wrote:


2011/9/19 Jürgen Schmidt jogischm...@googlemail.com:

On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote:


...

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.



do you mean to check in the files under ext_source into svn and remove it
later on when we have cleaned up the code? Or do you mean to put it
somewhere on Apache Extras?
I would prefer to save these binary files under Apache Extras if possible.




Why not just keep it in SVN?   Moving things to Apache-Extras does not
help us with the IP review.   In other words, if we have a dependency
on a OSS module that has an incompatible license, then moving that
module to Apache Extras does not make that dependency go away.  We
still need to understand the nature of the dependency: a build tool, a
dynamic runtime dependency, a statically linked library, an optional
extensions, a necessary core module.

If we find out, for example, that something in ext-sources is only
used as a build tool, and is not part of the release, then there is
nothing that prevents us from hosting it in SVN.   But if something is
a necessary library and it is under GPL, then this is a problem even
if we store it on Apache-Extras,

i am not really happy with all the binaries in the trunk tree because of

the large binary blobs and i don't expect too many changes of these
dependencies. And i would like to avoid to check them out every time.

What do others think about a structure where we have ext_sources besides
trunk.

incubator/ooo/trunk
incubator/ooo/ext_source
...



I like this idea.

 From a developer point of view I only have to check out ext_sources
once and reference it from all my trunks using the already existing
configure switch 'with-external-tar=path to ext_sources'.
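For illustration, that switch would be used roughly like this (assuming the
option keeps the spelling quoted above; the path is a placeholder):

./configure --with-external-tar=/home/dev/ooo/ext_sources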


Best regards, Oliver.


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Pavel Janík
Hi,

 I like this idea.
 
 From a developer point of view I only have to checkout ext_sources once and 
 reference it from all my trunks using the already existing configure-switch 
 'with-external-tar=path to ext_sources'

once we have such a repository, we will surely modify the current sources so
that you don't have to add such a switch, because ../ext_sources will be
picked up automatically.

BTW - welcome! :-)
-- 
Pavel Janík





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Rob Weir
On Tue, Sep 20, 2011 at 9:48 AM, Armin Le Grand armin.le.gr...@me.com wrote:
 On 20.09.2011 15:33, Oliver-Rainer Wittmann wrote:

 Hi,

 On 20.09.2011 14:37, Jürgen Schmidt wrote:

 ...

 What do others think about a structure where we have ext_sources
 besides
 trunk.

 incubator/ooo/trunk
 incubator/ooo/ext_source
 ...

So are we saying we would never need to branch or tag these files?

For example, suppose we release AOOo 3.4.0, and then later we release AOOo 4.0.

Then someone finds a serious security flaw in AOOo 3.4.0, and we
decide to release an AOOo 3.4.1 as well as a AOOo 4.0.1.

Would we be able to do this?  What if the flaw was related to code in
ext_sources?

And if not us, in the project, what if some downstream consumer of
AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever.  But
we've already updated ext_sources for AOOo 4.0?

In other words, how do we track, in SVN, a compatible set of matching
trunk/ and ext_source/ revisions, so we (or someone else) can recreate
any released version of AOOo?

-Rob



 I like this idea.

  From a developer point of view I only have to checkout ext_sources
 once and reference it from all my trunks using the already existing
 configure-switch 'with-external-tar=path to ext_sources'

 +1

 Also, hopefully ext_sources will not change too much (after a consolidation
 phase) and it's mostly binaries, thus not too well suited for a repository.
 Let's not extend our main repository with those binaries, please.

 Best regards, Oliver.


 Regards,
        Armin
 --
 ALG




Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Pavel Janík
 Would we be able to do this?  What if the flaw was related to code in
 ext_sources?

Then we patch it. The patch will be in trunk/main, as always.

 And if not us, in the project, what if some downstream consumer of
 AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever.  But
 we've already updated ext_sources for AOOo 4.0?

Versions - we can and will have more tarballs of one external source.

This all is already solved.
-- 
Pavel Janík





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Armin Le Grand

On 20.09.2011 15:58, Rob Weir wrote:

On Tue, Sep 20, 2011 at 9:48 AM, Armin Le Grand armin.le.gr...@me.com wrote:

On 20.09.2011 15:33, Oliver-Rainer Wittmann wrote:


Hi,

On 20.09.2011 14:37, Jürgen Schmidt wrote:


...


What do others think about a structure where we have ext_sources
besides
trunk.

incubator/ooo/trunk
incubator/ooo/ext_source
...


So are we saying we would never need to branch or tag these files?

For example, suppose we release AOOo 3.4.0, and then later we release AOOo 4.0.

Then someone finds a serious security flaw in AOOo 3.4.0, and we
decide to release an AOOo 3.4.1 as well as a AOOo 4.0.1.

Would we be able to do this?  What if the flaw was related to code in
ext_sources?

And if not us, in the project, what if some downstream consumer of
AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever.  But
we've already updated ext_sources for AOOo 4.0?

In other words, how do we track, in SVN, a compatible set of matching
trunk/ and ext_source/ revisions, so we (or someone else) can recreate
any released version of AOOo?


Good point. Thus, it should be part of incubator/ooo/trunk, something like:

incubator/ooo/trunk/main
incubator/ooo/trunk/extras
incubator/ooo/trunk/ext_sources

It could be in its own repo, but this would just bring up the risk that
the same tags are not used in both (on purpose or by error).
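For illustration, with ext_sources inside trunk a release tag captures
everything in one atomic copy (the tag name is hypothetical):

svn copy http://svn.apache.org/repos/asf/incubator/ooo/trunk \
         http://svn.apache.org/repos/asf/incubator/ooo/tags/AOO-3.4.0 \
         -m "tag 3.4.0: main, extras and ext_sources together"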


Indeed, it looks as if it has to be a part of trunk somehow. Not very nice
for binaries.


Maybe we could find an intermediate place for them as long as we still
need to make changes pretty often. Currently we will have to do some
adds/removes/changes to it. It could be good to add them to trunk after it
has stabilized a little more.



-Rob





I like this idea.

  From a developer point of view I only have to checkout ext_sources
once and reference it from all my trunks using the already existing
configure-switch 'with-external-tar=path to ext_sources'


+1

Also, hopefully ext_sources will not change too much (after a consolidation
phase) and it's mostly binaries, thus not too well suited for a repository.
Let's not extend our main repository with those binaries, please.


Best regards, Oliver.



Regards,
Armin
--
ALG









Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Pedro Giffuni

+1
- This will make it easier to update the BSD/MIT unrestricted stuff.
- Hopefully it also means we will eventually stop depending on GNU
  patch for the build.

Welcome Oliver!
Great job Juergen: it's the first code replacement and a very
necessary one for OO forks too (unless they want to carry
lcc's copyright;) ).

cheers,

Pedro.

On Tue, 20 Sep 2011 15:44:59 +0200, Pavel Janík pa...@janik.cz wrote:

Hi,


I like this idea.

From a developer point of view I only have to checkout ext_sources 
once and reference it from all my trunks using the already existing 
configure-switch 'with-external-tar=path to ext_sources'


when we will have such repository, we will surely modify the current
sources so you don't have to add such switch because ../ext_sources
will be auto-checked.

BTW - welcome! :-)




Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Pavel Janík
 Have we ever considered using version control to...uh...manage file versions?
 
 Just an idea.


Maybe Heiner will say more, but in the past, we have had the external tarballs 
in the VCS, but then we moved them out and it worked very well. There never was 
a reason to track external.tar.gz files in VCS, because we do not change them.
-- 
Pavel Janík





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Rob Weir
2011/9/20 Pavel Janík pa...@janik.cz:
 Have we ever considered using version control to...uh...manage file versions?

 Just an idea.


 Maybe Heiner will say more, but in the past, we have had the external 
 tarballs in the VCS, but then we moved them out and it worked very well. 
 There never was a reason to track external.tar.gz files in VCS, because we do 
 not change them.
 --

That's fine.  If they don't change, then doing a svn update will not
bring them down each time.

Aside from being useful for version control, SVN is also very useful
as an audit trail.  So on the rare occasions when one of these
files does change, we know who changed it and why.  This is important
for ensuring the IP cleanliness of the project.

Is your main concern performance?  Even as individual tarballs,
ext-sources is 86 files, 250MB.  ooo/extras is 243 files and 822 MB.
And ooo/main is 76,295 files for over 900MB.  So ext-sources is not a
huge contributor to download time.

 Pavel Janík






Re: A systematic approach to IP review?

2011-09-20 Thread Rob Weir
2011/9/20 Jürgen Schmidt jogischm...@googlemail.com:
 On Tue, Sep 20, 2011 at 2:34 PM, Shane Curcuru a...@shanecurcuru.org wrote:

 So... has anyone actually run Apache RAT yet?  It has a scan only mode
 which I'd think would be the simplest place to start.

 it's on my todo list to take a look on it, probably i will come back with
 questions


I did a run earlier today.  Good news is we have 4 files with Apache
license.  Bad news is we have 52,876 files with unknown license.  In
most cases that should just be the standard OOo header.

These scans will be much more useful after we've replaced the OOo
headers with Apache headers.  But we can't just do a global change.
We should only make that change for files that are in the official
Oracle SGA.  After that is done, then the RAT report will be more
useful.
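For reference, a scan-only run looks roughly like this (the jar name and
version are an assumption; adjust to the RAT release actually used):

java -jar apache-rat-0.8.jar /path/to/ooo/trunk/main > rat-report.txt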

 Juergen


 Personally, I'd recommend working on basic RAT scans, with the scripts to
 run them and any exception rules (for known files, etc.) all checked into
 SVN with the build tools for the code.  But hey, it's easy for me to suggest
 we do stuff, when I only currently have time to be a mentor and thus can
 get away with just making suggestions.  8-)

 I like the general concept of storing the IP type for files in SVN
 properties; although properties are easy to change, Apache does have a
 strong history of being able to provide oversight for commit logs throughout
 a project's history.

 - Shane




Re: A systematic approach to IP review?

2011-09-19 Thread Jürgen Schmidt
On Mon, Sep 19, 2011 at 3:34 AM, Pedro Giffuni giffu...@tutopia.com wrote:

 Hi;

 Is there an updated SGA already?


good question and where can we find it?

Juergen



 I think there will likely be a set of files of uncertain license
 that we should move to apache-extras. I am referring specifically
 to the dictionaries: Oracle might have ownership of some but not
 all. I propose we rescue myspell in apache-extras and put the
 dictionaries there to keep it as an alternative. I have no idea
 where to get MySpell though.

 While here, if there's still interest in maintaining the Hg
 history, bitbucket.org seems to be a nice alternative: it's
 rather specialized in Mercurial.

 Cheers,

 Pedro.


 On Sun, 18 Sep 2011 20:27:05 -0400, Rob Weir robw...@apache.org wrote:

 If you haven't looked it closely, it is probably worth a few minutes
 of your time to review our incubation status page, especially the
 items under Copyright and Verify Distribution Rights.  It lists
 the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
 ASF been received. It is only necessary to transfer rights for the
 package, the core code, and any new code produced by the project.

 -- Check and make sure that the files that have been donated have been
 updated to reflect the new ASF copyright.

 -- Check and make sure that for all code included with the
 distribution that is not under the Apache license, we have the right
 to combine with Apache-licensed code and redistribute.

 -- Check and make sure that all source code distributed by the project
 is covered by one or more of the following approved licenses: Apache,
 BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
 the same terms.

 Some of this is already going on, but it is hard to get a sense of who
 is doing what and how much progress we have made.  I wonder if we can
 agree to a more systematic approach?  This will make it easier to see
 the progress we're making and it will also make it easier for others
 to help.

 Suggestions:

 1) We need to get all files needed for the build into SVN.  Right now
 there are some that are copied down from the OpenOffice.org website
 during the build's bootstrap process.   Until we get the files all in
 one place it is hard to get a comprehensive view of our dependencies.

 2) Continue the CWS integrations.  Along with 1) this ensures that all
 the code we need for the release is in SVN.

 3)  Files that Oracle include in their SGA need to have the Apache
 license header inserted and the Sun/Oracle copyright migrated to the
 NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
 automate parts of this.

 4) Once the SGA files have the Apache headers, then we can make
 regular use of RAT to report on files that are lacking an Apache
 header.  Such files might be in one of the following categories:

 a) Files that Oracle owns the copyright on and which should be
 included in an amended SGA

 b) Files that have a compatible OSS license which we are permitted to
 use.  This might require that we add a mention of it to the NOTICE
 file.

 c) Files that have an incompatible OSS license.  These need to be
 removed/replaced.

 d) Files that have an OSS license that has not yet been
 reviewed/categorized by Apache legal affairs.  In that case we need to
 bring it to their attention.

 e) (Hypothetically) files that are not under an OSS license at all.
 E.g., a Microsoft header file.  These must be removed.

 5) We should track the resolution of each file, and do this
 publicly.  The audit trail is important.  Some ways we could do this
 might be:

 a) Track this in SVN properties.  So set ip:sga for the SGA files,
 ip:mit for files that are MIT licensed, etc.  This should be reflected
 in headers as well, but this is not always possible.  For example, we
 might have binary files where we cannot add headers, or cases where
 the OSS files do not have headers, but where we can prove their
 provenance via other means.  (A small sketch of option (a) follows this list.)

 b) Track this in a spreadsheet, one row per file.

 c) Track this in a text log file checked into SVN

 d) Track this in an annotated script that runs RAT, where the
 annotations document the reason for cases where we tell it to ignore a
 file or directory.

 6) Iterate until we have a clean RAT report.

 7) Goal should be for anyone today to be able to see what work remains
 for IP clearance, as well as for someone 5 years from now to be able
 to tell what we did.  Tracking this on the community wiki is probably
 not good enough, since we've previously talked about dropping that
 wiki and going to MWiki.
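
As a minimal sketch of option (a), assuming a checked-out SVN working copy.
Whether the proposal means a property named ip:sga or a property named ip
with the value sga is not settled, so the manifest format and naming below
are just one illustrative reading:

  # tag_ip_status.py -- illustrative sketch: bulk-set the proposed ip:*
  # SVN properties from a manifest of TAB-separated "path<TAB>status"
  # lines, e.g.
  #   main/sw/source/core/doc/doc.cxx<TAB>sga
  #   ext_sources/icu-readme.txt<TAB>mit
  import subprocess
  import sys

  def tag_files(manifest_path):
      with open(manifest_path) as manifest:
          for line in manifest:
              line = line.strip()
              if not line or line.startswith("#"):
                  continue  # skip blanks and comment lines
              path, status = line.split("\t")
              # equivalent to: svn propset ip:<status> <status> <path>
              subprocess.check_call(
                  ["svn", "propset", "ip:" + status, status, path])

  if __name__ == "__main__":
      tag_files(sys.argv[1])

Afterwards a plain "svn propget ip:sga <path>" (or a recursive propget
sweep) can regenerate the per-status file lists on demand.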


 -Rob


 [1] http://incubator.apache.org/projects/openofficeorg.html

 [2] http://incubator.apache.org/rat/





Re: A systematic approach to IP review?

2011-09-19 Thread Jürgen Schmidt
On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote:

 If you haven't looked at it closely, it is probably worth a few minutes
 of your time to review our incubation status page, especially the
 items under Copyright and Verify Distribution Rights.  It lists
 the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
 ASF have been received. It is only necessary to transfer rights for the
 package, the core code, and any new code produced by the project.

 -- Check and make sure that the files that have been donated have been
 updated to reflect the new ASF copyright.

 -- Check and make sure that for all code included with the
 distribution that is not under the Apache license, we have the right
 to combine with Apache-licensed code and redistribute.

 -- Check and make sure that all source code distributed by the project
 is covered by one or more of the following approved licenses: Apache,
 BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
 the same terms.

 Some of this is already going on, but it is hard to get a sense of who
 is doing what and how much progress we have made.  I wonder if we can
 agree to a more systematic approach?  This will make it easier to see
 the progress we're making and it will also make it easier for others
 to help.

 Suggestions:

 1) We need to get all files needed for the build into SVN.  Right now
 there are some that are copied down from the OpenOffice.org website
 during the build's bootstrap process.   Until we get the files all in
 one place it is hard to get a comprehensive view of our dependencies.


Do you mean to check the files under ext_source into SVN and remove them
later, once we have cleaned up the code? Or do you mean to put them
somewhere on Apache Extras?
I would prefer to keep these binary files on Apache Extras if possible.



 2) Continue the CWS integrations.  Along with 1) this ensures that all
 the code we need for the release is in SVN.

 3)  Files that Oracle include in their SGA need to have the Apache
 license header inserted and the Sun/Oracle copyright migrated to the
 NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
 automate parts of this.

 4) Once the SGA files have the Apache headers, then we can make
 regular use of RAT to report on files that are lacking an Apache
 header.  Such files might be in one of the following categories:

 a) Files that Oracle owns the copyright on and which should be
 included in an amended SGA

 b) Files that have a compatible OSS license which we are permitted to
 use.  This might require that we add a mention of it to the NOTICE
 file.

 c) Files that have an incompatible OSS license.  These need to be
 removed/replaced.

 d) Files that have an OSS license that has not yet been
 reviewed/categorized by Apache legal affairs.  In that case we need to
 bring it to their attention.

 e) (Hypothetically) files that are not under an OSS license at all.
 E.g., a Microsoft header file.  These must be removed.

 5) We should track the resolution of each file, and do this
 publicly.  The audit trail is important.  Some ways we could do this
 might be:

 a) Track this in SVN properties.  So set ip:sga for the SGA files,
 ip:mit for files that are MIT licensed, etc.  This should be reflected
 in headers as well, but this is not always possible.  For example, we
 might have binary files where we cannot add headers, or cases where
 the OSS files do not have headers, but where we can prove their
 provenance via other means.

 b) Track this in a spreadsheet, one row per file.

 c) Track this in a text log file checked into SVN

 d) Track this in an annotated script that runs RAT, where the
 annotations document the reason for cases where we tell it to ignore a
 file or directory.

 6) Iterate until we have a clean RAT report.

 7) Goal should be for anyone today to be able to see what work remains
 for IP clearance, as well as for someone 5 years from now to be able
 to tell what we did.  Tracking this on the community wiki is probably
 not good enough, since we've previously talked about dropping that
 wiki and going to MWiki.


We talked about it, yes, but did we reach a final decision?

The migrated wiki is available under http://ooo-wiki.apache.org/wiki and can
be used. Do we want to continue with this wiki now? It's still not clear to
me at the moment.

But we need a place to document the IP clearance and under
http://ooo-wiki.apache.org/wiki/ApacheMigration we have already some
information.

Juergen




 -Rob


 [1] http://incubator.apache.org/projects/openofficeorg.html

 [2] http://incubator.apache.org/rat/



Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Sun, Sep 18, 2011 at 9:34 PM, Pedro Giffuni giffu...@tutopia.com wrote:
 Hi;

 Is there an updated SGA already?


Not that I know of.   But we can and should go ahead with IP clearance
using the SGA we already have.   In fact, starting that process will
help us identify exactly which files need to be added to the updated
SGA.

-Rob


 I think there will likely be a set of files of uncertain license
 that we should move to apache-extras. I am referring specifically
 to the dictionaries: Oracle might hold rights over some but not
 all. I propose we rescue myspell in apache-extras and put the
 dictionaries there to keep it as an alternative. I have no idea
 where to get MySpell though.

 While here, if there's still interest in maintaining the Hg
 history, bitbucket.org seems to be a nice alternative: it's
 rather specialized in Mercurial.

 Cheers,

 Pedro.

 On Sun, 18 Sep 2011 20:27:05 -0400, Rob Weir robw...@apache.org wrote:

 If you haven't looked at it closely, it is probably worth a few minutes
 of your time to review our incubation status page, especially the
 items under Copyright and Verify Distribution Rights.  It lists
 the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
 ASF have been received. It is only necessary to transfer rights for the
 package, the core code, and any new code produced by the project.

 -- Check and make sure that the files that have been donated have been
 updated to reflect the new ASF copyright.

 -- Check and make sure that for all code included with the
 distribution that is not under the Apache license, we have the right
 to combine with Apache-licensed code and redistribute.

 -- Check and make sure that all source code distributed by the project
 is covered by one or more of the following approved licenses: Apache,
 BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
 the same terms.

 Some of this is already going on, but it is hard to get a sense of who
 is doing what and how much progress we have made.  I wonder if we can
 agree to a more systematic approach?  This will make it easier to see
 the progress we're making and it will also make it easier for others
 to help.

 Suggestions:

 1) We need to get all files needed for the build into SVN.  Right now
 there are some that are copied down from the OpenOffice.org website
 during the build's bootstrap process.   Until we get the files all in
 one place it is hard to get a comprehensive view of our dependencies.

 2) Continue the CWS integrations.  Along with 1) this ensures that all
 the code we need for the release is in SVN.

 3)  Files that Oracle include in their SGA need to have the Apache
 license header inserted and the Sun/Oracle copyright migrated to the
 NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
 automate parts of this.

 4) Once the SGA files have the Apache headers, then we can make
 regular use of RAT to report on files that are lacking an Apache
 header.  Such files might be in one of the following categories:

 a) Files that Oracle owns the copyright on and which should be
 included in an amended SGA

 b) Files that have a compatible OSS license which we are permitted to
 use.  This might require that we add a mention of it to the NOTICE
 file.

 c) Files that have an incompatible OSS license.  These need to be
 removed/replaced.

 d) Files that have an OSS license that has not yet been
 reviewed/categorized by Apache legal affairs.  In that case we need to
 bring it to their attention.

 e) (Hypothetically) files that are not under an OSS license at all.
 E.g., a Microsoft header file.  These must be removed.

 5) We should track the resolution of each file, and do this
 publicly.  The audit trail is important.  Some ways we could do this
 might be:

 a) Track this in SVN properties.  So set ip:sga for the SGA files,
 ip:mit for files that are MIT licensed, etc.  This should be reflected
 in headers as well, but this is not always possible.  For example, we
 might have binary files where we cannot add headers, or cases where
 the OSS files do not have headers, but where we can prove their
 provenance via other means.

 b) Track this in a spreadsheet, one row per file.

 c) Track this in a text log file checked into SVN

 d) Track this in an annotated script that runs RAT, where the
 annotations document the reason for cases where we tell it to ignore a
 file or directory.

 6) Iterate until we have a clean RAT report.

 7) Goal should be for anyone today to be able to see what work remains
 for IP clearance, as well as for someone 5 years from now to be able
 to tell what we did.  Tracking this on the community wiki is probably
 not good enough, since we've previously talked about dropping that
 wiki and going to MWiki.


 -Rob


 [1] http://incubator.apache.org/projects/openofficeorg.html

 [2] http://incubator.apache.org/rat/




Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
2011/9/19 Jürgen Schmidt jogischm...@googlemail.com:
 On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote:

 If you haven't looked at it closely, it is probably worth a few minutes
 of your time to review our incubation status page, especially the
 items under Copyright and Verify Distribution Rights.  It lists
 the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
 ASF have been received. It is only necessary to transfer rights for the
 package, the core code, and any new code produced by the project.

 -- Check and make sure that the files that have been donated have been
 updated to reflect the new ASF copyright.

 -- Check and make sure that for all code included with the
 distribution that is not under the Apache license, we have the right
 to combine with Apache-licensed code and redistribute.

 -- Check and make sure that all source code distributed by the project
 is covered by one or more of the following approved licenses: Apache,
 BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
 the same terms.

 Some of this is already going on, but it is hard to get a sense of who
 is doing what and how much progress we have made.  I wonder if we can
 agree to a more systematic approach?  This will make it easier to see
 the progress we're making and it will also make it easier for others
 to help.

 Suggestions:

 1) We need to get all files needed for the build into SVN.  Right now
 there are some that are copied down from the OpenOffice.org website
 during the build's bootstrap process.   Until we get the files all in
 one place it is hard to get a comprehensive view of our dependencies.


 Do you mean to check the files under ext_source into SVN and remove them
 later, once we have cleaned up the code? Or do you mean to put them
 somewhere on Apache Extras?
 I would prefer to keep these binary files on Apache Extras if possible.



Why not just keep it in SVN?   Moving things to Apache-Extras does not
help us with the IP review.   In other words, if we have a dependency
on an OSS module that has an incompatible license, then moving that
module to Apache Extras does not make that dependency go away.  We
still need to understand the nature of the dependency: a build tool, a
dynamic runtime dependency, a statically linked library, an optional
extension, or a necessary core module.

If we find out, for example, that something in ext-sources is only
used as a build tool, and is not part of the release, then there is
nothing that prevents us from hosting it in SVN.   But if something is
a necessary library and it is under GPL, then this is a problem even
if we store it on Apache-Extras.




 2) Continue the CWS integrations.  Along with 1) this ensures that all
 the code we need for the release is in SVN.

 3)  Files that Oracle include in their SGA need to have the Apache
 license header inserted and the Sun/Oracle copyright migrated to the
 NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
 automate parts of this.

 4) Once the SGA files have the Apache headers, then we can make
 regular use of RAT to report on files that are lacking an Apache
 header.  Such files might be in one of the following categories:

 a) Files that Oracle owns the copyright on and which should be
 included in an amended SGA

 b) Files that have a compatible OSS license which we are permitted to
 use.  This might require that we add a mention of it to the NOTICE
 file.

 c) Files that have an incompatible OSS license.  These need to be
 removed/replaced.

 d) Files that have an OSS license that has not yet been
 reviewed/categorized by Apache legal affairs.  In that case we need to
 bring it to their attention.

 e) (Hypothetically) files that are not under an OSS license at all.
 E.g., a Microsoft header file.  These must be removed.

 5) We should track the resolution of each file, and do this
 publicly.  The audit trail is important.  Some ways we could do this
 might be:

 a) Track this in SVN properties.  So set ip:sga for the SGA files,
 ip:mit for files that are MIT licensed, etc.  This should be reflected
 in headers as well, but this is not always possible.  For example, we
 might have binary files where we cannot add headers, or cases where
 the OSS files do not have headers, but where we can prove their
 provenance via other means.

 b) Track this in a spreadsheet, one row per file.

 c) Track this in a text log file checked into SVN

 d) Track this in an annotated script that runs RAT, where the
 annotations document the reason for cases where we tell it to ignore a
 file or directory.

 6) Iterate until we have a clean RAT report.

 7) Goal should be for anyone today to be able to see what work remains
 for IP clearance, as well as for someone 5 years from now to be able
 to tell what we did.  Tracking this on the community wiki is probably
 not good enough, since we've previously talked about dropping that
 wiki and going 

Re: A systematic approach to IP review?

2011-09-19 Thread Marcus (OOo)

Am 09/19/2011 01:59 PM, schrieb Rob Weir:

2011/9/19 Jürgen Schmidtjogischm...@googlemail.com:

On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.org  wrote:


If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under Copyright and Verify Distribution Rights.  It lists
the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more of the following approved licenses: Apache,
BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
the same terms.

Some of this is already going on, but it is hard to get a sense of who
is doing what and how much progress we have made.  I wonder if we can
agree to a more systematic approach?  This will make it easier to see
the progress we're making and it will also make it easier for others
to help.

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.



Do you mean to check the files under ext_source into SVN and remove them
later, once we have cleaned up the code? Or do you mean to put them
somewhere on Apache Extras?
I would prefer to keep these binary files on Apache Extras if possible.




Why not just keep it in SVN?   Moving things to Apache-Extras does not
help us with the IP review.   In other words, if we have a dependency
on an OSS module that has an incompatible license, then moving that
module to Apache Extras does not make that dependency go away.  We
still need to understand the nature of the dependency: a build tool, a
dynamic runtime dependency, a statically linked library, an optional
extension, or a necessary core module.

If we find out, for example, that something in ext-sources is only
used as a build tool, and is not part of the release, then there is
nothing that prevents us from hosting it in SVN.   But if something is
a necessary library and it is under GPL, then this is a problem even
if we store it on Apache-Extras.






2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.

3)  Files that Oracle include in their SGA need to have the Apache
license header inserted and the Sun/Oracle copyright migrated to the
NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
automate parts of this.

4) Once the SGA files have the Apache headers, then we can make
regular use of RAT to report on files that are lacking an Apache
header.  Such files might be in one of the following categories:

a) Files that Oracle owns the copyright on and which should be
included in an amended SGA

b) Files that have a compatible OSS license which we are permitted to
use.  This might require that we add a mention of it to the NOTICE
file.

c) Files that have an incompatible OSS license.  These need to be
removed/replaced.

d) Files that have an OSS license that has not yet been
reviewed/categorized by Apache legal affairs.  In that case we need to
bring it to their attention.

e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.

5) We should track the resolution of each file, and do this
publicly.  The audit trail is important.  Some ways we could do this
might be:

a) Track this in SVN properties.  So set ip:sga for the SGA files,
ip:mit for files that are MIT licensed, etc.  This should be reflected
in headers as well, but this is not always possible.  For example, we
might have binary files where we cannot add headers, or cases where
the OSS files do not have headers, but where we can prove their
provenance via other means.

b) Track this in a spreadsheet, one row per file.

c) Track this in a text log file checked into SVN

d) Track this in an annotated script that runs RAT, where the
annotations document the reason for cases where we tell it to ignore a
file or directory.

6) Iterate until we have a clean RAT report.

7) Goal should be for anyone today to be able to see what work remains
for IP clearance, as well as for someone 5 years from now to be able
to tell what we did.  Tracking this on the community wiki is probably
not good enough, since we've previously talked about dropping that
wiki and going to MWiki.



talked 

Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) marcus.m...@wtnet.de wrote:
 Am 09/19/2011 01:59 PM, schrieb Rob Weir:

 2011/9/19 Jürgen Schmidtjogischm...@googlemail.com:

 On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.org  wrote:

 If you haven't looked at it closely, it is probably worth a few minutes
 of your time to review our incubation status page, especially the
 items under Copyright and Verify Distribution Rights.  It lists
 the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
 ASF have been received. It is only necessary to transfer rights for the
 package, the core code, and any new code produced by the project.

 -- Check and make sure that the files that have been donated have been
 updated to reflect the new ASF copyright.

 -- Check and make sure that for all code included with the
 distribution that is not under the Apache license, we have the right
 to combine with Apache-licensed code and redistribute.

 -- Check and make sure that all source code distributed by the project
 is covered by one or more of the following approved licenses: Apache,
 BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
 the same terms.

 Some of this is already going on, but it is hard to get a sense of who
 is doing what and how much progress we have made.  I wonder if we can
 agree to a more systematic approach?  This will make it easier to see
 the progress we're making and it will also make it easier for others
 to help.

 Suggestions:

 1) We need to get all files needed for the build into SVN.  Right now
 there are some that are copied down from the OpenOffice.org website
 during the build's bootstrap process.   Until we get the files all in
 one place it is hard to get a comprehensive view of our dependencies.


 Do you mean to check the files under ext_source into SVN and remove them
 later, once we have cleaned up the code? Or do you mean to put them
 somewhere on Apache Extras?
 I would prefer to keep these binary files on Apache Extras if possible.



 Why not just keep it in SVN?   Moving things to Apache-Extras does not
 help us with the IP review.   In other words, if we have a dependency
 on an OSS module that has an incompatible license, then moving that
 module to Apache Extras does not make that dependency go away.  We
 still need to understand the nature of the dependency: a build tool, a
 dynamic runtime dependency, a statically linked library, an optional
 extension, or a necessary core module.

 If we find out, for example, that something in ext-sources is only
 used as a build tool, and is not part of the release, then there is
 nothing that prevents us from hosting it in SVN.   But if something is
 a necessary library and it is under GPL, then this is a problem even
 if we store it on Apache-Extras.




 2) Continue the CWS integrations.  Along with 1) this ensures that all
 the code we need for the release is in SVN.

 3)  Files that Oracle include in their SGA need to have the Apache
 license header inserted and the Sun/Oracle copyright migrated to the
 NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
 automate parts of this.

 4) Once the SGA files have the Apache headers, then we can make
 regular use of RAT to report on files that are lacking an Apache
 header.  Such files might be in one of the following categories:

 a) Files that Oracle owns the copyright on and which should be
 included in an amended SGA

 b) Files that have a compatible OSS license which we are permitted to
 use.  This might require that we add a mention of it to the NOTICE
 file.

 c) Files that have an incompatible OSS license.  These need to be
 removed/replaced.

 d) Files that have an OSS license that has not yet been
 reviewed/categorized by Apache legal affairs.  In that case we need to
 bring it to their attention.

 e) (Hypothetically) files that are not under an OSS license at all.
 E.g., a Microsoft header file.  These must be removed.

 5) We should track the resolution of each file, and do this
 publicly.  The audit trail is important.  Some ways we could do this
 might be:

 a) Track this in SVN properties.  So set ip:sga for the SGA files,
 ip:mit for files that are MIT licensed, etc.  This should be reflected
 in headers as well, but this is not always possible.  For example, we
 might have binary files where we cannot add headers, or cases where
 the OSS files do not have headers, but where we can prove their
 provenance via other means.

 b) Track this in a spreadsheet, one row per file.

 c) Track this in a text log file checked into SVN

 d) Track this in an annotated script that runs RAT, where the
 annotations document the reason for cases where we tell it to ignore a
 file or directory.

 6) Iterate until we have a clean RAT report.

 7) Goal should be for anyone today to be able to see what work remains
 for IP clearance, as well as for someone 5 years from now to be able
 to tell what we did.  

Re: A systematic approach to IP review?

2011-09-19 Thread Pedro F. Giffuni


--- On Mon, 9/19/11, Rob Weir robw...@apache.org wrote:
...
 2011/9/19 Jürgen Schmidt jogischm...@googlemail.com:
...
 
  Do you mean to check the files under ext_source into SVN
  and remove them later, once we have cleaned up the code?
  Or do you mean to put them somewhere on Apache Extras?
  I would prefer to keep these binary files on Apache
  Extras if possible.
 
 
 
 Why not just keep it in SVN?  Moving things to Apache-Extras
 does not help us with the IP review.  In other words, if we
 have a dependency on an OSS module that has an incompatible
 license, then moving that module to Apache Extras does not
 make that dependency go away.  We still need to understand
 the nature of the dependency: a build tool, a dynamic runtime
 dependency, a statically linked library, an optional
 extension, or a necessary core module.


But adding in stuff that we have to remove immediately (nss,
seamonkey, ...) doesn't help either. I also think a lot of
that stuff has to be updated before being brought in: ICU
apparently would be trouble, but Apache Commons, ICC,
and other pieces can and should be updated.

snip

 a) Track this in SVN properties.  So set ip:sga
 for the SGA files,
  ip:mit for files that are MIT licensed, etc.


I thought we had delayed updating the copyrights in the
headers to ease the CWS integration. I still hope to see
more of those, especially anything related to gnumake
(I don't know when, but dmake has to go!).

Using the SVN properties is a good idea. And we do have
to start the NOTICE file.

All just IMHO, of course.

Pedro.
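
A minimal sketch of the kind of NOTICE bookkeeping mentioned above,
assuming we keep a small registry of third-party attributions; the
component entries and header text below are placeholders, not the real
inventory:

  # notice_draft.py -- illustrative sketch: emit a draft NOTICE file
  # from a registry of third-party attributions. All entries here are
  # placeholder data, not the project's actual third-party list.
  THIRD_PARTY = [
      # (component, license, attribution) -- placeholders only
      ("example-lib", "MIT",
       "This product includes example-lib, Copyright (c) Example Authors."),
  ]

  def write_notice_draft(path="NOTICE.draft"):
      with open(path, "w") as out:
          # placeholder header lines; the real NOTICE wording is a
          # separate legal question
          out.write("Apache OpenOffice (incubating)\n")
          out.write("Copyright 2011 The Apache Software Foundation\n\n")
          for name, license_id, attribution in THIRD_PARTY:
              out.write("%s (%s):\n  %s\n\n"
                        % (name, license_id, attribution))

  if __name__ == "__main__":
      write_notice_draft()

Keeping the registry as data means the NOTICE file can be regenerated
from one place instead of being hand-edited.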


RE: A systematic approach to IP review?

2011-09-19 Thread Dennis E. Hamilton
Rob,

I read Marcus's suggestion as saying that, since the code base is in a
folder structure (modularized) and the wiki can map folder structures and
their status nicely, it is not necessary to have a single table to manage
this from; instead, tables can sit at an appropriate granularity toward the
leaves of the hierarchy (on the wiki).

I can see some brittle cases, especially in the face of refactoring.  The use 
of the wiki might have to be an ephemeral activity that is handled this way 
entirely for our initial scrubbing.

Ideally, additional and sustained review would be in the SVN with the artifacts 
so reviewed, and coalesced somehow.  The use of SVN properties is interesting, 
but they are rather invisible and I have a question about what happens with 
them when a commit happens against the particular artifact.

It seems that there is some need to balance an immediate requirement and what 
would be sufficient for it versus what would assist us in the longer term.  It 
would be interesting to know what the additional-review work has become for 
other projects that have a substantial code base (e.g., SVN itself, httpd, 
...).  I have no idea.

 - Dennis

-Original Message-
From: Rob Weir [mailto:robw...@apache.org] 
Sent: Monday, September 19, 2011 07:47
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) marcus.m...@wtnet.de wrote:
 Am 09/19/2011 01:59 PM, schrieb Rob Weir:

 2011/9/19 Jürgen Schmidtjogischm...@googlemail.com:

 On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.org  wrote:

 If you haven't looked at it closely, it is probably worth a few minutes
 of your time to review our incubation status page, especially the
 items under Copyright and Verify Distribution Rights.  It lists
 the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
 ASF have been received. It is only necessary to transfer rights for the
 package, the core code, and any new code produced by the project.

 -- Check and make sure that the files that have been donated have been
 updated to reflect the new ASF copyright.

 -- Check and make sure that for all code included with the
 distribution that is not under the Apache license, we have the right
 to combine with Apache-licensed code and redistribute.

 -- Check and make sure that all source code distributed by the project
 is covered by one or more of the following approved licenses: Apache,
 BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
 the same terms.

 Some of this is already going on, but it is hard to get a sense of who
 is doing what and how much progress we have made.  I wonder if we can
 agree to a more systematic approach?  This will make it easier to see
 the progress we're making and it will also make it easier for others
 to help.

 Suggestions:

 1) We need to get all files needed for the build into SVN.  Right now
 there are some that are copied down from the OpenOffice.org website
 during the build's bootstrap process.   Until we get the files all in
 one place it is hard to get a comprehensive view of our dependencies.


 Do you mean to check the files under ext_source into SVN and remove them
 later, once we have cleaned up the code? Or do you mean to put them
 somewhere on Apache Extras?
 I would prefer to keep these binary files on Apache Extras if possible.



 Why not just keep it in SVN?   Moving things to Apache-Extras does not
 help us with the IP review.   In other words, if we have a dependency
 on an OSS module that has an incompatible license, then moving that
 module to Apache Extras does not make that dependency go away.  We
 still need to understand the nature of the dependency: a build tool, a
 dynamic runtime dependency, a statically linked library, an optional
 extension, or a necessary core module.

 If we find out, for example, that something in ext-sources is only
 used as a build tool, and is not part of the release, then there is
 nothing that prevents us from hosting it in SVN.   But if something is
 a necessary library and it is under GPL, then this is a problem even
 if we store it on Apache-Extras.




 2) Continue the CWS integrations.  Along with 1) this ensures that all
 the code we need for the release is in SVN.

 3)  Files that Oracle include in their SGA need to have the Apache
 license header inserted and the Sun/Oracle copyright migrated to the
 NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
 automate parts of this.

 4) Once the SGA files have the Apache headers, then we can make
 regular use of RAT to report on files that are lacking an Apache
 header.  Such files might be in one of the following categories:

 a) Files that Oracle owns the copyright on and which should be
 included in an amended SGA

 b) Files that have a compatible OSS license which we are permitted to
 use.  This might require that we add a mention

RE: A systematic approach to IP review?

2011-09-19 Thread Dennis E. Hamilton
On the wiki question, I think OOOUSERS should continue to be used for 
transition work.  Or OOODEV could be used if it needs to be limited to 
committers (perhaps the case for this activity), although it means power 
observers can't contribute there and have to do so by some other means.

This is transition work and the Confluence wiki seems like a good place for it.

The MWiki may be interrupted or disrupted, and it is probably a good idea to *not*
put such development-transition-intensive content there.

Also, the migrated wiki is not the live wiki at OpenOffice.org.  So doing 
anything there will create collisions.  It is also not fully migrated in that 
it is not operating in place of what folks see via OpenOffice.org as far as I 
know.  The current Confluence wikis avoid confusion and are stable for this 
particular purpose.

 - Dennis



-Original Message-
From: Jürgen Schmidt [mailto:jogischm...@googlemail.com] 
Sent: Monday, September 19, 2011 01:45
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir robw...@apache.org wrote:

[ ... ]

 7) Goal should be for anyone today to be able to see what work remains
 for IP clearance, as well as for someone 5 years from now to be able
 to tell what we did.  Tracking this on the community wiki is probably
 not good enough, since we've previously talked about dropping that
 wiki and going to MWiki.


We talked about it, yes, but did we reach a final decision?

The migrated wiki is available under http://ooo-wiki.apache.org/wiki and can
be used. Do we want to continue with this wiki now? It's still not clear to
me at the moment.

[ ... ]



Re: A systematic approach to IP review?

2011-09-19 Thread Marcus (OOo)

Am 09/19/2011 04:47 PM, schrieb Rob Weir:

On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)marcus.m...@wtnet.de  wrote:

Am 09/19/2011 01:59 PM, schrieb Rob Weir:


2011/9/19 Jürgen Schmidtjogischm...@googlemail.com:


On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.orgwrote:


If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under Copyright and Verify Distribution Rights.  It lists
the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more of the following approved licenses: Apache,
BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
the same terms.

Some of this is already going on, but it is hard to get a sense of who
is doing what and how much progress we have made.  I wonder if we can
agree to a more systematic approach?  This will make it easier to see
the progress we're making and it will also make it easier for others
to help.

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.



Do you mean to check the files under ext_source into SVN and remove them
later, once we have cleaned up the code? Or do you mean to put them
somewhere on Apache Extras?
I would prefer to keep these binary files on Apache Extras if possible.




Why not just keep it in SVN?   Moving things to Apache-Extras does not
help us with the IP review.   In other words, if we have a dependency
on an OSS module that has an incompatible license, then moving that
module to Apache Extras does not make that dependency go away.  We
still need to understand the nature of the dependency: a build tool, a
dynamic runtime dependency, a statically linked library, an optional
extension, or a necessary core module.

If we find out, for example, that something in ext-sources is only
used as a build tool, and is not part of the release, then there is
nothing that prevents us from hosting it in SVN.   But if something is
a necessary library and it is under GPL, then this is a problem even
if we store it on Apache-Extras.






2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.

3)  Files that Oracle include in their SGA need to have the Apache
license header inserted and the Sun/Oracle copyright migrated to the
NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
automate parts of this.

4) Once the SGA files have the Apache headers, then we can make
regular use of RAT to report on files that are lacking an Apache
header.  Such files might be in one of the following categories:

a) Files that Oracle owns the copyright on and which should be
included in an amended SGA

b) Files that have a compatible OSS license which we are permitted to
use.  This might require that we add a mention of it to the NOTICE
file.

c) Files that have an incompatible OSS license.  These need to be
removed/replaced.

d) Files that have an OSS license that has not yet been
reviewed/categorized by Apache legal affairs.  In that case we need to
bring it to their attention.

e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.

5) We should track the resolution of each file, and do this
publicly.  The audit trail is important.  Some ways we could do this
might be:

a) Track this in SVN properties.  So set ip:sga for the SGA files,
ip:mit for files that are MIT licensed, etc.  This should be reflected
in headers as well, but this is not always possible.  For example, we
might have binary files where we cannot add headers, or cases where
the OSS files do not have headers, but where we can prove their
provenance via other means.

b) Track this in a spreadsheet, one row per file.

c) Track this in a text log file checked into SVN

d) Track this in an annotated script that runs RAT, where the
annotations document the reason for cases where we tell it to ignore a
file or directory.

6) Iterate until we have a clean RAT report.

7) Goal should be for anyone today to be able to see what work remains
for IP clearance, as well as for someone 5 years from now to be able
to tell what we did.  Tracking this on the community 

Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton
dennis.hamil...@acm.org wrote:
 Rob,

 I read Marcus's suggestion as saying that, since the code base is in a
 folder structure (modularized) and the wiki can map folder structures and
 their status nicely, it is not necessary to have a single table to manage
 this from; instead, tables can sit at an appropriate granularity toward the
 leaves of the hierarchy (on the wiki).


Using the wiki for this might be useful for tracking the status of
modules we already know we need to replace.  Bugzilla would be another
way to track the status.

But it is not really a sufficient solution.  Why?  Because it is not
tied to the code and is not reproducible.  How was the list of
components listed in the wiki generated?  Based on what script?  Where
is the script?  How do we know it is accurate and current?  How do we
know that integrating a CWS does not make that list become outdated?
How do we prove to ourselves that we did this right?  And how do we
record that proof?  And how do we repeat this proof every
time we do a new release?

A list of components of unknown derivation sitting on a community wiki
that anyone can edit is not really a suitable basis for an IP review.

The granularity we need to worry about is the file.  That is the
finest-grained level at which a license header applies.  That is the unit
of tracking in SVN.  That is the unit whose content someone could have
changed in SVN.

Again, it is fine if someone wants to outline this at the module
level.  But that does not eliminate the requirement for us to do this
at the file level as well.
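
For instance, the file-level list could come straight out of a small script
checked into SVN, so anyone can regenerate it and audit it later; a rough
sketch, where the header test and the annotated ignore file are
simplifications rather than a replacement for RAT:

  # ip_inventory.py -- illustrative sketch: walk the tree and report
  # files that lack an Apache license header, honoring an annotated
  # ignore list. The ignore file holds "path-prefix  # reason" lines,
  # so the reasons become part of the audit trail and the report stays
  # reproducible.
  import os
  import sys

  APACHE_MARKER = "Licensed to the Apache Software Foundation"

  def load_ignores(path="ip_ignore.txt"):
      prefixes = []
      if os.path.exists(path):
          with open(path) as f:
              for line in f:
                  entry = line.split("#", 1)[0].strip()  # drop the reason
                  if entry:
                      prefixes.append(entry)
      return prefixes

  def report(root="."):
      ignores = load_ignores()
      for dirpath, _dirnames, filenames in os.walk(root):
          for name in filenames:
              path = os.path.normpath(os.path.join(dirpath, name))
              if any(path.startswith(p) for p in ignores):
                  continue  # documented exception
              try:
                  with open(path, "r", errors="ignore") as f:
                      head = f.read(2048)  # headers sit near the top
              except OSError:
                  continue  # unreadable files are skipped in this sketch
              if APACHE_MARKER not in head:
                  print(path)

  if __name__ == "__main__":
      report(sys.argv[1] if len(sys.argv) > 1 else ".")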

 I can see some brittle cases, especially in the face of refactoring.  The use 
 of the wiki might have to be an ephemeral activity that is handled this way 
 entirely for our initial scrubbing.

 Ideally, additional and sustained review would be in the SVN with the 
 artifacts so reviewed, and coalesced somehow.  The use of SVN properties is 
 interesting, but they are rather invisible and I have a question about what 
 happens with them when a commit happens against the particular artifact.


Properties stick with the file, unless changed.  Think of the
svn:eol-style property.  It is not wiped out with a new revision of
the file.
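
Concretely, assuming a working copy and an arbitrary file name, the
sequence looks like this; the property name follows the ip:* proposal:

  # prop_demo.py -- illustrative sketch: a versioned property survives
  # later content commits unless someone explicitly changes it.
  import subprocess

  def run(*cmd):
      print("$ " + " ".join(cmd))
      subprocess.check_call(cmd)

  run("svn", "propset", "ip:sga", "sga", "somefile.cxx")
  run("svn", "commit", "-m", "tag IP status", "somefile.cxx")
  # ...edit somefile.cxx and commit the content change...
  # afterwards the property is still attached to the file:
  run("svn", "propget", "ip:sga", "somefile.cxx")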

 It seems that there is some need to balance an immediate requirement and what 
 would be sufficient for it versus what would assist us in the longer term.  
 It would be interesting to know what the additional-review work has become 
 for other projects that have a substantial code base (e.g., SVN itself, 
 httpd, ...).  I have no idea.


The IP review needs to occur with every release.  So the work we do to
automate this, and make it data-driven, will repay itself with every
release.

I invite you to investigate what other projects do.  When you do I
think you will agree.

  - Dennis

 -Original Message-
 From: Rob Weir [mailto:robw...@apache.org]
 Sent: Monday, September 19, 2011 07:47
 To: ooo-dev@incubator.apache.org
 Subject: Re: A systematic approach to IP review?

 On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) marcus.m...@wtnet.de wrote:
 Am 09/19/2011 01:59 PM, schrieb Rob Weir:

 2011/9/19 Jürgen Schmidtjogischm...@googlemail.com:

 On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.org  wrote:

 If you haven't looked at it closely, it is probably worth a few minutes
 of your time to review our incubation status page, especially the
 items under Copyright and Verify Distribution Rights.  It lists
 the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
 ASF have been received. It is only necessary to transfer rights for the
 package, the core code, and any new code produced by the project.

 -- Check and make sure that the files that have been donated have been
 updated to reflect the new ASF copyright.

 -- Check and make sure that for all code included with the
 distribution that is not under the Apache license, we have the right
 to combine with Apache-licensed code and redistribute.

 -- Check and make sure that all source code distributed by the project
 is covered by one or more of the following approved licenses: Apache,
 BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
 the same terms.

 Some of this is already going on, but it is hard to get a sense of who
 is doing what and how much progress we have made.  I wonder if we can
 agree to a more systematic approach?  This will make it easier to see
 the progress we're making and it will also make it easier for others
 to help.

 Suggestions:

 1) We need to get all files needed for the build into SVN.  Right now
 there are some that are copied down from the OpenOffice.org website
 during the build's bootstrap process.   Until we get the files all in
 one place it is hard to get a comprehensive view of our dependencies.


 Do you mean to check the files under ext_source into SVN and remove them
 later, once we have

Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo) marcus.m...@wtnet.de wrote:
 Am 09/19/2011 04:47 PM, schrieb Rob Weir:

 On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)marcus.m...@wtnet.de
  wrote:

 Am 09/19/2011 01:59 PM, schrieb Rob Weir:

 2011/9/19 Jürgen Schmidtjogischm...@googlemail.com:

 On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.org    wrote:

 If you haven't looked at it closely, it is probably worth a few minutes
 of your time to review our incubation status page, especially the
 items under Copyright and Verify Distribution Rights.  It lists
 the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
 ASF have been received. It is only necessary to transfer rights for the
 package, the core code, and any new code produced by the project.

 -- Check and make sure that the files that have been donated have been
 updated to reflect the new ASF copyright.

 -- Check and make sure that for all code included with the
 distribution that is not under the Apache license, we have the right
 to combine with Apache-licensed code and redistribute.

 -- Check and make sure that all source code distributed by the project
 is covered by one or more of the following approved licenses: Apache,
 BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
 the same terms.

 Some of this is already going on, but it is hard to get a sense of who
 is doing what and how much progress we have made.  I wonder if we can
 agree to a more systematic approach?  This will make it easier to see
 the progress we're making and it will also make it easier for others
 to help.

 Suggestions:

 1) We need to get all files needed for the build into SVN.  Right now
 there are some that are copied down from the OpenOffice.org website
 during the build's bootstrap process.   Until we get the files all in
 one place it is hard to get a comprehensive view of our dependencies.


 Do you mean to check the files under ext_source into SVN and remove
 them later, once we have cleaned up the code? Or do you mean to put
 them somewhere on Apache Extras?
 I would prefer to keep these binary files on Apache Extras if
 possible.



 Why not just keep it in SVN?   Moving things to Apache-Extras does not
 help us with the IP review.   In other words, if we have a dependency
 on an OSS module that has an incompatible license, then moving that
 module to Apache Extras does not make that dependency go away.  We
 still need to understand the nature of the dependency: a build tool, a
 dynamic runtime dependency, a statically linked library, an optional
 extension, or a necessary core module.

 If we find out, for example, that something in ext-sources is only
 used as a build tool, and is not part of the release, then there is
 nothing that prevents us from hosting it in SVN.   But if something is
 a necessary library and it is under GPL, then this is a problem even
 if we store it on Apache-Extras.




 2) Continue the CWS integrations.  Along with 1) this ensures that all
 the code we need for the release is in SVN.

 3)  Files that Oracle include in their SGA need to have the Apache
 license header inserted and the Sun/Oracle copyright migrated to the
 NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
 automate parts of this.

 4) Once the SGA files have the Apache headers, then we can make
 regular use of RAT to report on files that are lacking an Apache
 header.  Such files might be in one of the following categories:

 a) Files that Oracle owns the copyright on and which should be
 included in an amended SGA

 b) Files that have a compatible OSS license which we are permitted to
 use.  This might require that we add a mention of it to the NOTICE
 file.

 c) Files that have an incompatible OSS license.  These need to be
 removed/replaced.

 d) Files that have an OSS license that has not yet been
 reviewed/categorized by Apache legal affairs.  In that case we need to
 bring it to their attention.

 e) (Hypothetically) files that are not under an OSS license at all.
 E.g., a Microsoft header file.  These must be removed.

 5) We should track the resolution of each file, and do this
 publicly.  The audit trail is important.  Some ways we could do this
 might be:

 a) Track this in SVN properties.  So set ip:sga for the SGA files,
 ip:mit for files that are MIT licensed, etc.  This should be reflected
 in headers as well, but this is not always possible.  For example, we
 might have binary files where we cannot add headers, or cases where
 the OSS files do not have headers, but where we can prove their
 provenance via other means.

 b) Track this in a spreadsheet, one row per file.

 c) Track this in a text log file checked into SVN

 d) Track this in an annotated script that runs RAT, where the
 annotations document the reason for cases where we tell it to ignore a
 file or directory.

 6) Iterate until we have a clean RAT report.

 7) Goal should be for anyone today to be 

Re: A systematic approach to IP review?

2011-09-19 Thread Marcus (OOo)

Am 09/19/2011 06:54 PM, schrieb Rob Weir:

On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton
dennis.hamil...@acm.org  wrote:

Rob,

I read Marcus's suggestion as saying that, since the code base is in a
folder structure (modularized) and the wiki can map folder structures and
their status nicely, it is not necessary to have a single table to manage
this from; instead, tables can sit at an appropriate granularity toward the
leaves of the hierarchy (on the wiki).



Using the wiki for this might be useful for tracking the status of
modules we already know we need to replace.  Bugzilla would be another
way to track the status.


How do you want to use Bugzilla to track thousands of files?


But it is not really a sufficient solution.  Why?  Because it is not
tied to the code and is not reproducible.  How was the list of
components listed in the wiki generated?  Based on what script?  Where
is the script?  How do we know it is accurate and current?  How do we
know that integrating a CWS does not make that list become outdated?
How do we prove to ourselves that we did this right?  And how do we
record that proof?  And how do we repeat this proof every
time we do a new release?


Questions over questions but not helpful. ;-)


A list of components of unknown derivation sitting on a community wiki
that anyone can edit is not really a suitable basis for an IP review.


Then restrict the write access.


The granularity we need to worry about is the file.  That is the
finest-grained level at which a license header applies.  That is the unit
of tracking in SVN.  That is the unit whose content someone could have
changed in SVN.

Again, it is fine if someone wants to outline this at the module
level.  But that does not eliminate the requirement for us to do this
at the file level as well.


IMHO you haven't understood what I wanted to tell you.

Sure, it makes no sense to create a list of every file in SVN to see if
the license is good or bad. So do it module by module. When a module is
marked as done, then of course every file in the module has been checked.
Otherwise it's not working.


And how do we make sure that there was no change when source was
added/moved/improved? Simply Commit Then Review (CTR). A change in the
license header at the top of a file should be noticeable, right? However,
we also need to have trust in everybody's work.
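
One cheap way to back that CTR habit up would be a script that
fingerprints the header block of every file and flags fingerprints that
changed between two checkouts; a rough sketch, where the 30-line header
window is an assumption:

  # header_watch.py -- illustrative sketch: hash the first lines of
  # each file so a review pass can flag files whose header changed.
  import hashlib
  import os
  import sys

  HEADER_LINES = 30  # assume the license header sits in the first 30 lines

  def header_digest(path):
      with open(path, "rb") as f:
          head = b"".join(f.readline() for _ in range(HEADER_LINES))
      return hashlib.md5(head).hexdigest()

  def snapshot(root):
      digests = {}
      for dirpath, _dirs, files in os.walk(root):
          for name in files:
              path = os.path.join(dirpath, name)
              digests[os.path.relpath(path, root)] = header_digest(path)
      return digests

  if __name__ == "__main__":
      # compare two checkouts, e.g. the last release tag and trunk
      old, new = snapshot(sys.argv[1]), snapshot(sys.argv[2])
      for rel in sorted(set(old) & set(new)):
          if old[rel] != new[rel]:
              print("header changed:", rel)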


BTW:
What is your plan to track every file to make sure the license is OK?

Marcus




I can see some brittle cases, especially in the face of refactoring.  The use 
of the wiki might have to be an ephemeral activity that is handled this way 
entirely for our initial scrubbing.

Ideally, additional and sustained review would be in the SVN with the artifacts 
so reviewed, and coalesced somehow.  The use of SVN properties is interesting, 
but they are rather invisible and I have a question about what happens with 
them when a commit happens against the particular artifact.



Properties stick with the file, unless changed.  Think of the
svn:eol-style property.  It is not wiped out with a new revision of
the file.


It seems that there is some need to balance an immediate requirement and what 
would be sufficient for it versus what would assist us in the longer term.  It 
would be interesting to know what the additional-review work has become for 
other projects that have a substantial code base (e.g., SVN itself, httpd, 
...).  I have no idea.



The IP review needs to occur with every release.  So the work we do to
automate this, and make it data-driven, will repay itself with every
release.

I invite you to investigate what other projects do.  When you do I
think you will agree.


  - Dennis

-Original Message-
From: Rob Weir [mailto:robw...@apache.org]
Sent: Monday, September 19, 2011 07:47
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)marcus.m...@wtnet.de  wrote:

Am 09/19/2011 01:59 PM, schrieb Rob Weir:


2011/9/19 Jürgen Schmidtjogischm...@googlemail.com:


On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.orgwrote:


If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under Copyright and Verify Distribution Rights.  It lists
the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more

Re: A systematic approach to IP review?

2011-09-19 Thread Marcus (OOo)

Am 09/19/2011 07:05 PM, schrieb Rob Weir:

On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo)marcus.m...@wtnet.de  wrote:

Am 09/19/2011 04:47 PM, schrieb Rob Weir:


On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)marcus.m...@wtnet.de
  wrote:


Am 09/19/2011 01:59 PM, schrieb Rob Weir:


2011/9/19 Jürgen Schmidtjogischm...@googlemail.com:


On Mon, Sep 19, 2011 at 2:27 AM, Rob Weirrobw...@apache.org  wrote:


If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under Copyright and Verify Distribution Rights.  It lists
the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more of the following approved licenses: Apache,
BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
the same terms.

Some of this is already going on, but it is hard to get a sense of who
is doing what and how much progress we have made.  I wonder if we can
agree to a more systematic approach?  This will make it easier to see
the progress we're making and it will also make it easier for others
to help.

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.



Do you mean to check the files under ext_source into SVN and remove
them later, once we have cleaned up the code? Or do you mean to put
them somewhere on Apache Extras?
I would prefer to keep these binary files on Apache Extras if
possible.




Why not just keep it in SVN?   Moving things to Apache-Extras does not
help us with the IP review.   In other words, if we have a dependency
on an OSS module that has an incompatible license, then moving that
module to Apache Extras does not make that dependency go away.  We
still need to understand the nature of the dependency: a build tool, a
dynamic runtime dependency, a statically linked library, an optional
extension, a necessary core module.

If we find out, for example, that something in ext-sources is only
used as a build tool, and is not part of the release, then there is
nothing that prevents us from hosting it in SVN.   But if something is
a necessary library and it is under GPL, then this is a problem even
if we store it on Apache-Extras.






2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.

3)  Files that Oracle include in their SGA need to have the Apache
license header inserted and the Sun/Oracle copyright migrated to the
NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
automate parts of this.

4) Once the SGA files have the Apache headers, then we can make
regular use of RAT to report on files that are lacking an Apache
header.  Such files might be in one of the following categories:

a) Files that Oracle owns the copyright on and which should be
included in an amended SGA

b) Files that have a compatible OSS license which we are permitted to
use.  This might require that we add a mention of it to the NOTICE
file.

c) Files that have an incompatible OSS license.  These need to be
removed/replaced.

d) Files that have an OSS license that has not yet been
reviewed/categorized by Apache legal affairs.  In that case we need to
bring it to their attention.

e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.

5) We should track the resolution of each file, and do this
publicly.  The audit trail is important.  Some ways we could do this
might be:

a) Track this in SVN properties.  So set ip:sga for the SGA files,
ip:mit for files that are MIT licensed, etc.  This should be reflected
in headers as well, but this is not always possible.  For example, we
might have binary files where we cannot add headers, or cases where
the OSS files do not have headers, but where we can prove their
provenance via other means.

b) Track this in a spreadsheet, one row per file.

c) Track this in a text log file checked into SVN

d) Track this in an annotated script that runs RAT, where the
annotations document the reason for cases where we tell it to ignore a
file or directory.

6) Iterate until we have a clean RAT report.

7) Goal should be for anyone today to be able to see what work remains
for IP clearance, as well as for someone 5 years from now to be able
to tell what we did. [ ... ]

Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Mon, Sep 19, 2011 at 1:19 PM, Marcus (OOo) marcus.m...@wtnet.de wrote:
 On 09/19/2011 06:54 PM, Rob Weir wrote:

 On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton
 dennis.hamil...@acm.org  wrote:

 Rob,

 I was reading the suggestion from Marcus as saying that since the code
 base is in a folder structure (modularized) and the wiki can map folder
 structures and their status nicely, it is not necessary to have a single
 table to manage this from; any tables can sit at an appropriate
 granularity toward the leaves of the hierarchy (on the wiki).


 Using the wiki for this might be useful for tracking the status of
 modules we already know we need to replace.  Bugzilla would be another
 way to track the status.

 How do you want to use Bugzilla to track thousands of files?


No.  But for tracking module review, Bugzilla might be better than the
wiki.  It allows us to have a conversation on each module via
comments.

 But it is not really a sufficient solution.  Why?  Because it is not
 tied to the code and is not reproducible.  How was the list of
 components listed in the wiki generated?  Based on what script?  Where
 is the script?  How do we know it is accurate and current?  How do we
 know that integrating a CWS does not make that list become outdated?
 How do we prove to ourselves that we did this right?  And how do we
 record that proof as a record?  And how do we repeat this proof every
 time we do a new release?

 Question after question, but not helpful. ;-)

 A list of components of unknown derivation sitting on a community wiki
 that anyone can edit is not really a suitable basis for an IP review.

 Then restrict the write access.

 The granularity we need to worry about is the file.  That is the
 finest-grained level at which a license header applies.  That is the unit
 of tracking in SVN.  That is the unit whose content someone could have
 changed in SVN.

 Again, it is fine if someone wants to outline this at the module
 level.  But that does not eliminate the requirement for us to do this
 at the file level as well.

 IMHO you haven't understood what I wanted to tell you.


I understand what you are saying.  I just don't agree with you.

 Sure, it makes no sense to create a list of every file in SVN to see if the
 license is good or bad. So, do it module by module. And when a module is
 marked as done, then of course every file in the module has been checked.
 Otherwise it doesn't work.


That is not a consistent approach. Every developer applies their own
criteria.   It is not reproducible. It leaves no audit trail.  And it
doesn't help us with the next release.

If you use the Apache Release Audit Tool (RAT) then it will check all
the files automatically.

 And how do we make sure that there was no change when source was
 added/moved/improved? Simply Commit Then Review (CTR). A change in the
 license header at the beginning should be noticeable, right? However, we
 also need to have trust in everybody's work.


We would run RAT before every release and with every significant code
contribution.

You can think of this as a form of CTR, but one that is automated,
with a consistent rule set.

Obviously, good CTR plus the work on the wiki will all help.  But we
need the RAT scans as well, to show that we're clean.

 BTW:
 What is your plan to track every file to make sure the license is OK?


Run RAT.  That is what it does.
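
To make that concrete, here is a minimal sketch of "run RAT and keep the
report", assuming the RAT command-line jar has been downloaded (basic
usage is java -jar apache-rat-*.jar DIR with the report written to
stdout; confirm the exact options for your RAT version with --help):

    import subprocess

    RAT_JAR = "apache-rat.jar"  # path to the downloaded jar (assumption)

    def run_rat(src_dir, report_path):
        # RAT writes a plain-text report to stdout that names every file
        # it examined and the license it detected; saving stdout to a
        # file doubles as the per-file audit log asked about elsewhere
        # in this thread.
        with open(report_path, "w") as report:
            subprocess.check_call(["java", "-jar", RAT_JAR, src_dir],
                                  stdout=report)

    run_rat("trunk/main", "rat-report.txt")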

 Marcus



 I can see some brittle cases, especially in the face of refactoring.  The
 use of the wiki might have to be an ephemeral activity that is handled this
 way entirely for our initial scrubbing.

 Ideally, additional and sustained review would be in the SVN with the
 artifacts so reviewed, and coalesced somehow.  The use of SVN properties is
 interesting, but they are rather invisible and I have a question about what
 happens with them when a commit happens against the particular artifact.


 Properties stick with the file, unless changed.  Think of the
 svn:eol-style property.  It is not wiped out with a new revision of
 the file.

 It seems that there is some need to balance an immediate requirement and
 what would be sufficient for it versus what would assist us in the longer
 term.  It would be interesting to know what the additional-review work has
 become for other projects that have a substantial code base (e.g., SVN
 itself, httpd, ...).  I have no idea.


 The IP review needs to occur with every release.  So the work we do to
 automate this, and make it data-driven, will repay itself with every
 release.

 I invite you to investigate what other projects do.  When you do I
 think you will agree.

  - Dennis

 -Original Message-
 From: Rob Weir [mailto:robw...@apache.org]
 Sent: Monday, September 19, 2011 07:47
 To: ooo-dev@incubator.apache.org
 Subject: Re: A systematic approach to IP review?

 On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)marcus.m...@wtnet.de
  wrote:

 On 09/19/2011 01:59 PM, Rob Weir wrote:

 2011/9/19 Jürgen Schmidtjogischm

RE: A systematic approach to IP review?

2011-09-19 Thread Dennis E. Hamilton
I agree running RAT is important ...

I haven't heard any suggestion that such an important tool not be used.

-Original Message-
From: Rob Weir [mailto:robw...@apache.org] 
Sent: Monday, September 19, 2011 10:05
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

[ ... ]

I think the wiki is fine as a collaboration tool, to list tasks and
who is working on them.  But that is not a substitute for running
scans with the Apache Release Audit Tool (RAT) and working toward a
clean report.

Think of it this way:

1) We have a list of modules on the wiki that we need to replace.
Great.  Developers can work on that list.

2) But how do we know that the list on the wiki is complete?  How do
we know that it is not missing anything?

3) Running RAT against the source is how we ensure that the code is clean

In other words, the criterion should be that we have a clean RAT
record, not that we have a clean wiki.  The list of modules on the
wiki is not traceable to a scan of the source code.  It is not
reproducible.  It might be useful.  But it is not sufficient.

-Rob

[ ... ]



RE: A systematic approach to IP review?

2011-09-19 Thread Dennis E. Hamilton
I hope that RAT can produce a list of the OK files and exclude the not-OK
ones on first use, since the list of not-OK files would overwhelm everything
else about the current repository.

 - Dennis

-Original Message-
From: Marcus (OOo) [mailto:marcus.m...@wtnet.de] 
Sent: Monday, September 19, 2011 10:27
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

On 09/19/2011 07:05 PM, Rob Weir wrote:
[ ... ]

 3) Running RAT against the source is how we ensure that the code is clean

OK, I don't know what this can do for us. Maybe it's the solution to 
the problem.

How do you know that it is not skipping anything? I guess you simply 
would trust RAT that it is doing fine, right? ;-)

BTW:
Does RAT produce a log file, so that we have a list of every file that 
was checked? This could be very helpful.

Marcus
[ ... ]



RE: A systematic approach to IP review?

2011-09-19 Thread Dennis E. Hamilton
I agree that there is no escape from managing down to the individual file.  It 
is a question of organization now, where the entire base is involved.

Later, if the svn:property is to be trusted, the problem is quite different, it 
seems to me.  Plus the rules are understood and provenance and IP are likely 
handled as anything needing clearance enters the code base.  What is done to 
ensure a previously-vetted code base has not become tainted strikes me as a 
kind of regression/smoke test.

It is in that regard that I am concerned the tools for this one-time case need 
not be the same as for future cases.

And, since I am not doing the work in the present case, I am offering this as 
something to think about, not a position.

 - Dennis

-Original Message-
From: Rob Weir [mailto:robw...@apache.org] 
Sent: Monday, September 19, 2011 09:55
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

[ ... ]

The granularity we need to worry about is the file.  That is the
finest-grained level at which a license header applies.  That is the unit
of tracking in SVN.  That is the unit whose content someone could have
changed in SVN.

Again, it is fine if someone wants to outline this at the module
level.  But that does not eliminate the requirement for us to do this
at the file level as well.

[ ... ]



Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Mon, Sep 19, 2011 at 4:32 PM, Dennis E. Hamilton
dennis.hamil...@acm.org wrote:
 I agree that there is no escape from managing down to the individual file.  
 It is a question of organization now, where the entire base is involved.


RAT or something RAT-like.

 Later, if the svn:property is to be trusted, the problem is quite different, 
 it seems to me.  Plus the rules are understood and provenance and IP are 
 likely handled as anything needing clearance enters the code base.  What is 
 done to ensure a previously-vetted code base has not become tainted strikes 
 me as a kind of regression/smoke test.


Here is how I see SVN properties and RAT relating.   Any use of a
grep-like RAT-like tool will need to deal with exceptions.  We're
going to have stuff like binary files, say ODF files that are used for
testing, that don't have a header.  Or files that are used only as a
build tool, checked in for convenience, but are not part of the
release.  Or 3rd party code that does not have a header, but we know
its origin, like the ICU breakiterator data files.

How do we deal with those types of files, in the context of an
automated audit tool?  One solution is to record in a big config file
or script a list of all of these exceptions.  Essentially, a list of
files to ignore in the RAT scan.

That approach would certainly work, but would be fragile.  Moving or
renaming the files would break our script.  Not the end of the world,
since this could be designed to be fail-safe and give us errors on
the files that moved.

But if we track this info in SVN, then we could generate the exclusion
list from SVN, so it automatically adjusts as files are moved or
renamed.  It also avoids the problem -- and this might just be my own
engineering esthetic -- of tracking metadata for files in two
different places.  It seems rather untidy to me.
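
As a sketch of generating that exclusion list from SVN: the ip:state
property name and the not-in-release value below are assumed conventions
we would still have to agree on, and the exclude-file option should be
verified against the RAT version in use:

    import subprocess

    def excluded_paths(wc_root):
        # "svn propget -R" prints one "path - value" line for every
        # versioned item that has the property set.
        out = subprocess.check_output(
            ["svn", "propget", "ip:state", "-R", wc_root])
        for line in out.decode().splitlines():
            path, sep, value = line.partition(" - ")
            if sep and value.strip() == "not-in-release":
                yield path

    with open("rat-excludes.txt", "w") as f:
        for path in excluded_paths("trunk"):
            f.write(path + "\n")

    # then, e.g.: java -jar apache-rat.jar -E rat-excludes.txt --dir trunk
    # (-E/--exclude-file and --dir per the RAT docs; verify with --help)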

From a regression standpoint, you could treat all files as being in
one of several states:

1) Unexamined (no property set)

2) Apache 2.0 (included in the Oracle SGA or new code contributed by
committer or other person under iCLA)

3) Compatible 3rd party license

4) Incompatible 3rd party license

5) Not part of release

The goal would be to iterate until every file is in category 2, 3 or 5.
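
Reusing the same propget listing, the regression check then reduces to
counting states. The state values here are placeholders mirroring the
categories above; note that unexamined files (category 1) carry no
property at all, so they never appear in the propget output and would
have to be found by diffing against a full file listing:

    import subprocess
    from collections import Counter

    # placeholder values for categories 2-5 above (assumed convention)
    RELEASABLE = {"apache-2.0", "compatible-3rd-party", "not-in-release"}

    def state_counts(wc_root):
        out = subprocess.check_output(
            ["svn", "propget", "ip:state", "-R", wc_root])
        counts = Counter()
        for line in out.decode().splitlines():
            path, sep, value = line.partition(" - ")
            if sep:
                counts[value.strip()] += 1
        return counts

    counts = state_counts("trunk")
    cleared = sum(n for s, n in counts.items() if s in RELEASABLE)
    blockers = {s: n for s, n in counts.items() if s not in RELEASABLE}
    print("cleared:", cleared)
    print("still blocking a release:", blockers)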

 It is in that regard that I am concerned the tools for this one-time case 
 need not be the same as for future cases.


There are two kinds of future cases:

1) Code contributed in small chunks by committers or patches, where we
can expect CTR to work.  There will be errors, but we can catch those
before we do subsequent releases via RAT.

2) Larger contributions made by SGA.  For example, the IBM Lotus
Symphony contribution, or other similar corporate contributions.  When
an Apache project receives a large code contribution like this they
need to do an IP clearance process on that contribution as well.   I
think that the RAT/SVN combination could work well here also.  The
goal would be to clear the IP on the new contributions before we start
copying or merging them into the core AOOo code.


 And, since I am not doing the work in the present case, I am offering this as 
 something to think about, not a position.

  - Dennis



RE: A systematic approach to IP review?

2011-09-18 Thread Dennis E. Hamilton
+1

-Original Message-
From: Rob Weir [mailto:robw...@apache.org] 
Sent: Sunday, September 18, 2011 17:27
To: ooo-dev@incubator.apache.org
Subject: A systematic approach to IP review?

If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under Copyright and Verify Distribution Rights.  It lists
the things we need to do, including:

 -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more of the following approved licenses: Apache,
BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
the same terms.

Some of this is already going on, but it is hard to get a sense of who
is doing what and how much progress we have made.  I wonder if we can
agree to a more systematic approach?  This will make it easier to see
the progress we're making and it will also make it easier for others
to help.

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.

2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.

3)  Files that Oracle include in their SGA need to have the Apache
license header inserted and the Sun/Oracle copyright migrated to the
NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
automate parts of this.

4) Once the SGA files have the Apache headers, then we can make
regular use of RAT to report on files that are lacking an Apache
header.  Such files might be in one of the following categories:

a) Files that Oracle owns the copyright on and which should be
included in an amended SGA

b) Files that have a compatible OSS license which we are permitted to
use.  This might require that we add a mention of it to the NOTICE
file.

c) Files that have an incompatible OSS license.  These need to be
removed/replaced.

d) Files that have an OSS license that has not yet been
reviewed/categorized by Apache legal affairs.  In that case we need to
bring it to their attention.

e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.

5) We should track the resolution of each file, and do this
publicly.  The audit trail is important.  Some ways we could do this
might be:

a) Track this in SVN properties.  So set ip:sga for the SGA files,
ip:mit for files that are MIT licensed, etc.  This should be reflected
in headers as well, but this is not always possible.  For example, we
might have binary files where we cannot add headers, or cases where
the OSS files do not have headers, but where we can prove their
provenance via other means.

b) Track this in a spreadsheet, one row per file.

c) Track this in a text log file checked into SVN

d) Track this in an annotated script that runs RAT, where the
annotations document the reason for cases where we tell it to ignore a
file or directory.

6) Iterate until we have a clean RAT report.

7) Goal should be for anyone today to be able to see what work remains
for IP clearance, as well as for someone 5 years from now to be able
to tell what we did.  Tracking this on the community wiki is probably
not good enough, since we've previously talked about dropping that
wiki and going to MWiki.


-Rob


[1] http://incubator.apache.org/projects/openofficeorg.html

[2] http://incubator.apache.org/rat/
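
For suggestion 5a, the tagging itself is just svn property commands. A
minimal sketch follows; note the mail proposes per-license property names
(ip:sga, ip:mit), while this sketch assumes a single ip:state property
whose value names the state, which keeps later reporting simpler; either
way, the names and values are conventions the project would have to
agree on, and the example path in the usage comment is hypothetical:

    import subprocess
    import sys

    def set_ip_state(path, state):
        # the property travels with the file through moves and renames
        subprocess.check_call(["svn", "propset", "ip:state", state, path])

    def get_ip_state(path):
        # older svn clients print nothing (exit 0) when the property is
        # unset; treat empty output as "unexamined"
        out = subprocess.check_output(["svn", "propget", "ip:state", path])
        return out.decode().strip() or None

    if __name__ == "__main__":
        # usage sketch: python ip_state.py sga main/sw/inc/doc.hxx ...
        state = sys.argv[1]
        for path in sys.argv[2:]:
            set_ip_state(path, state)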



Re: A systematic approach to IP review?

2011-09-18 Thread Pedro Giffuni

Hi;

Is there an updated SGA already?

I think there will likely be a set of files of uncertain license
that we should move to apache-extras. I am referring specifically
to the dictionaries: Oracle might have rights over some but not
all. I propose we rescue MySpell in apache-extras and put the
dictionaries there to keep it as an alternative. I have no idea
where to get MySpell though.

While we're here: if there's still interest in maintaining the Hg
history, bitbucket.org seems to be a nice alternative, as it
specializes in Mercurial.

Cheers,

Pedro.

On Sun, 18 Sep 2011 20:27:05 -0400, Rob Weir robw...@apache.org 
wrote:

If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under Copyright and Verify Distribution Rights.  It lists
the things we need to do, including:

 -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more of the following approved licenses: Apache,
BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
the same terms.

Some of this is already going on, but it is hard to get a sense of who
is doing what and how much progress we have made.  I wonder if we can
agree to a more systematic approach?  This will make it easier to see
the progress we're making and it will also make it easier for others
to help.

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.

2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.

3)  Files that Oracle include in their SGA need to have the Apache
license header inserted and the Sun/Oracle copyright migrated to the
NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
automate parts of this.

4) Once the SGA files have the Apache headers, then we can make
regular use of RAT to report on files that are lacking an Apache
header.  Such files might be in one of the following categories:

a) Files that Oracle owns the copyright on and which should be
included in an amended SGA

b) Files that have a compatible OSS license which we are permitted to
use.  This might require that we add a mention of it to the NOTICE
file.

c) Files that have an incompatible OSS license.  These need to be
removed/replaced.

d) Files that have an OSS license that has not yet been
reviewed/categorized by Apache legal affairs.  In that case we need to
bring it to their attention.

e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.

5) We should track the resolution of each file, and do this
publicly.  The audit trail is important.  Some ways we could do this
might be:

a) Track this in SVN properties.  So set ip:sga for the SGA files,
ip:mit for files that are MIT licensed, etc.  This should be reflected
in headers as well, but this is not always possible.  For example, we
might have binary files where we cannot add headers, or cases where
the OSS files do not have headers, but where we can prove their
provenance via other means.

b) Track this in a spreadsheet, one row per file.

c) Track this in a text log file checked into SVN

d) Track this in an annotated script that runs RAT, where the
annotations document the reason for cases where we tell it to ignore a
file or directory.

6) Iterate until we have a clean RAT report.

7) Goal should be for anyone today to be able to see what work remains
for IP clearance, as well as for someone 5 years from now to be able
to tell what we did.  Tracking this on the community wiki is probably
not good enough, since we've previously talked about dropping that
wiki and going to MWiki.


-Rob


[1] http://incubator.apache.org/projects/openofficeorg.html

[2] http://incubator.apache.org/rat/