Re: Brainstorming on Improving the Release Process

2011-03-31 Thread Upayavira


On Wed, 30 Mar 2011 12:00 -0400, Grant Ingersoll gsing...@apache.org
wrote:
 
 Yeah, our build is a bit messy, lots of recursion.  I'm still not totally
 happy w/ how license checking is hooked in.

Are you willing to say more? I have a little time, and have done a lot
of work with Ant. Maybe I could help.

Upayavira
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source





Re: Brainstorming on Improving the Release Process

2011-03-31 Thread Robert Muir
On Thu, Mar 31, 2011 at 9:40 AM, Upayavira u...@odoko.co.uk wrote:

 Are you willing to say more? I have a little time, and have done a lot
 of work with Ant. Maybe I could help.

 Upayavira

Thanks, there is some followup discussion on this JIRA issue:
https://issues.apache.org/jira/browse/SOLR-2002

The prototype patch I refer to in the comments, where the Solr build
system is changed to extend Lucene's, is the latest _merged.patch on
the issue:
https://issues.apache.org/jira/secure/attachment/12456811/SOLR-2002_merged.patch

(Additionally, as a sort of follow-up, there are more comments/ideas
there about things we could do beyond just refactoring the build
system to be faster and simpler.)

As a first step I think the patch needs to be brought up to trunk (it
gets out of date fast). I mentioned on the issue that we could simply
create a branch to make coordination easier. A branch might seem silly
for a thing like this, but it would at least allow us to work together,
and people could contribute parts (e.g. PMD integration or something)
without having to juggle huge out-of-sync patches.




Re: Brainstorming on Improving the Release Process

2011-03-31 Thread Grant Ingersoll
Other things to add:

1. Managing our website is a big pain in the butt.  Why do we need to publish 
PDFs again?  We really need to get on the new CMS.
2. Copying/moving the artifacts to the release area could be automated, too (a rough sketch follows).

At the end of the day, #1 from my original post (defining the Minimum Effective 
Dose) is what strikes me as the biggest impediment to releases.
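
For #2, the copy could be a one-target Ant job. This is an untested sketch; the 
host path and property names are placeholders, not our actual setup:

<target name="publish-artifacts">
  <!-- Push the built release artifacts to the dist area via rsync. -->
  <exec executable="rsync" failonerror="true">
    <arg value="-av"/>
    <arg value="${dist.dir}/"/>
    <arg value="people.apache.org:/www/www.apache.org/dist/lucene/java/${version}/"/>
  </exec>
</target>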

 
 

Re: Brainstorming on Improving the Release Process

2011-03-31 Thread Grant Ingersoll

On Mar 31, 2011, at 11:51 AM, Marvin Humphrey wrote:

 On Thu, Mar 31, 2011 at 11:45:53AM -0400, Grant Ingersoll wrote:
 Why do we need to publish PDFs again?  
 
 IIRC, publishing PDFs is the default in Forrest.  It might have been a passive
 choice.

Yeah, it is.  I know.  Just one more thing to worry about when it is broken.

I think we need to simplify across a lot of our processes and get back to what 
I said earlier about the "Minimum Effective Dose" when it comes to builds, 
releases, etc.



Re: Brainstorming on Improving the Release Process

2011-03-31 Thread Upayavira


On Thu, 31 Mar 2011 09:51 -0400, Robert Muir rcm...@gmail.com wrote:
 Thanks, there is some followup discussion on this JIRA issue:
 https://issues.apache.org/jira/browse/SOLR-2002

Thx. I'll take a look in the (UK) morning.

Upayavira
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source





Re: Brainstorming on Improving the Release Process

2011-03-30 Thread Grant Ingersoll
I'll also add:

We need to figure out a better approach for CHANGES.txt.  Step 4 of the 
Publishing process is a PITA.

On Mar 30, 2011, at 8:22 AM, Grant Ingersoll wrote:

 (Long post, please bear with me and please read!)
 
 Now that we have the release done (I'm working through the publication 
 process now), I want to start thinking about how we can improve the release 
 process.  As I see it, building the artifacts and checking the legal items 
 are now almost completely automated and testable at earlier stages in the 
 game. 
 
 We have kept saying we want to release more often, but we have never defined 
 actionable steps with which we can get there.  Goals without actionable steps 
 are useless.
 
 So, with that in mind, I'd like to brainstorm on how we can improve things a 
 bit more.  Several of us acted as RM this time around, so I think we have some 
 common, shared knowledge to take advantage of this time, as opposed to the 
 past, where one person mostly just did the release in the background and then 
 we all voted.
 
 So, let's start with what we have right:
 
 1. The Ant process for building a release candidate for both Lucene and Solr 
 is almost identical now and fairly straightforward.
 2. I think the feature freeze is a good thing, although it is perhaps a bit 
 too long.
 3. Pretty good documentation on the steps involved to branch, etc.
 4. The new license validation stuff is a start for enforcing licensing up 
 front more effectively.  What else can we validate up front in terms of 
 packaging?  (One idea is sketched just after this list.)
 5. We have an awesome test infrastructure now.  I think it is safe to say 
 that this version of Lucene is easily the most tested version we have ever 
 shipped.
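 
 For #4, a packaging check could be as simple as this untested sketch (the 
 property and target names are made up, not what's in our build today):
 
 <target name="validate-packaging">
   <!-- Fail if the unpacked release candidate lacks the required legal files. -->
   <condition property="packaging.ok">
     <and>
       <available file="${rc.dir}/LICENSE.txt"/>
       <available file="${rc.dir}/NOTICE.txt"/>
       <available file="${rc.dir}/CHANGES.txt"/>
     </and>
   </condition>
   <fail unless="packaging.ok"
         message="Release package is missing LICENSE/NOTICE/CHANGES!"/>
 </target>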
 
 Things I see that can be improved, and these are only suggestions:
 
 1.  We need to define the Minimum Effective Dose (MED - 
 http://gizmodo.com/#!5709902/4+hour-body-the-principle-of-the-minimum-effective-dose)
  for producing a quality release.  Nothing more, nothing less.  I think one 
 of our biggest problems is we don't know when we are done.  It's this 
 loosey-goosey "we all agree" notion, but that's silly.  It's software; we 
 should be able to test almost all of the artifacts for certain attributes and 
 then release when they pass.  If we get something wrong, put in a test for it 
 in the next release.  The old saying about perfect being the enemy of great 
 applies here.
 
 In other words, we don't have well-defined things that we all are looking for 
 when vetting a release candidate, other than what the ASF requires.  Look at 
 the last few vote threads, or any of the previous ones.  It's obvious 
 that we have a large variety of people doing a large variety of things when 
 it comes to testing the candidates.  For instance, I do the following:
  a. check sigs., md5 hashes, etc. (see the sketch after this list)
  b. run the demos, 
  c. run the Solr example and index some content, 
  d. check over the LICENSE, NOTICE, CHANGES files
  e. Check the overall packaging, etc. is reasonable
  f. I run them through my training code
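 
 Item (a), at least, could live in the build itself.  An untested sketch, where 
 ${artifact} is a placeholder for each file we ship:
 
 <target name="check-artifact">
   <!-- Verify the md5 checksum that ships next to the artifact. -->
   <checksum file="${artifact}" algorithm="md5" verifyProperty="md5.ok"/>
   <fail message="md5 mismatch for ${artifact}">
     <condition><isfalse value="${md5.ok}"/></condition>
   </fail>
   <!-- Verify the detached GPG signature. -->
   <exec executable="gpg" failonerror="true">
     <arg value="--verify"/>
     <arg value="${artifact}.asc"/>
     <arg value="${artifact}"/>
   </exec>
 </target>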
 
 Others clearly do many other things.  Many of you have your own benchmark 
 tests you run; others read over every last bit of documentation; others still 
 put the RC into their own application and test it.  All of this is good, but 
 the problem is that it is not _shared_ until the actual RC is up, and it is 
 not repeatable (not that all of it can be).  If you have benchmark code/tests 
 that you run on an RC that doesn't involve proprietary code, why isn't it 
 donated to the project so that we can all use it?  That way we don't have to 
 wait until your -1 at the 11th hour to realize the RC is not good.  I 
 personally don't care whether it's python or perl or whatever.  Something 
 that works is better than nothing.  For instance, right now some of the 
 committers have an Apache Extras project going for benchmarking.  Can we get 
 this running on ASF resources on a regular basis?  If it's a computing 
 resource issue, let's go to Infrastructure and ask for resources.  
 Infrastructure has repeatedly said that if a project needs resources, it 
 should put together a proposal of what it wants.  I bet we could get budget 
 to spin up an EC2 instance once a week, run the long-running tests (Test2B 
 and other benchmarks), and then report back.  All of that can be automated.
 
 Also, please think hard about whether the things you test can be automated 
 and built into our test suite, or at least run nightly or something on 
 Jenkins, and then donate them.  I know reading documentation can't be, but 
 what else?  For instance, could we auto-generate the file formats 
 documentation?
 
 2. We should be running and testing the release packaging process more 
 regularly.
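 
 Concretely, that could just be a nightly Jenkins job chaining existing 
 targets; the names here are illustrative, not our real targets:
 
 <target name="nightly-package-check"
         depends="clean, package-release, validate-packaging"
         description="Build the full release packaging, then validate it."/>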
 
 3.  I had an epiphany this release, and it came via Hoss on a 
 non-release-related issue where, likely unbeknownst to him, he called me out 
 for not being focused on the release 
 

RE: Brainstorming on Improving the Release Process

2011-03-30 Thread karl.wright
Hi Grant,

This is a great post.

I'm not a committer for Lucene or Solr, but I'm seriously thinking that much of 
what Lucene/Solr does right should be considered by the project I AM a 
committer for: ManifoldCF.

Key things I would add based on experience with commercial software development:

(A) Left to their own devices, releases almost always get too big.  The 
temptation is to just keep adding stuff, which winds up causing a delay, which 
adds more pressure for more features to be added to the release, etc.  The only 
way I've found to address this that works is the "train leaving the station" 
model: the date gets set in advance, and feature-inclusion decisions are made 
based on that date.  Practically speaking, this means extended periods of time 
where development is happening in trunk and only selected changes are being 
pulled up to the release branch.

(B) A corollary to the "train leaving the station" model is that any massive 
global changes must occur only towards the beginning of the cycle.  Changes 
added to the release later in the cycle must be less and less destabilizing.  
This often involves significant tradeoffs between the proper way to do things 
and the least risky way to do things.

(C) Finally, the larger the release, the LONGER the release branch must be 
active.  If you intend to release 4.0 this year, you should probably create the 
release branch no later than May/June, given the size of the 4.0 release 
already.

I'm sure all of this is well known, but I thought I'd state it nonetheless.

Karl



Re: Brainstorming on Improving the Release Process

2011-03-30 Thread Robert Muir
On Wed, Mar 30, 2011 at 8:22 AM, Grant Ingersoll gsing...@apache.org wrote:
 (Long post, please bear with me and please read!)

 Now that we have the release done (I'm working through the publication 
 process now), I want to start the process of thinking about how we can 
 improve the release process.  As I see it, building the artifacts and 
 checking the legal items are now almost completely automated and testable at 
 earlier stages in the game.


Thanks for writing this up. Here is my major beef, along with two concrete suggestions:

It seems the current process is that we all develop and develop, and at
some point we agree we want to try to release. At that point it's the
RM's job to polish a turd, and no serious community participation
takes place until an RC is actually produced: so it's a chicken-and-egg
thing, perhaps with the RM even declaring publicly 'I don't expect this
to actually pass, I'm just building this to make you guys look at it'.

I think it's probably hard/impossible to force people to review this
stuff before an RC; for some reason a VOTE seems to be the only thing
that makes people take it seriously.

But what we can do is ask ourselves: how did the codebase become a
turd in the first place? Because at one point we released off this code,
and the packaging was correct, there weren't javadocs warnings, and
there weren't licensing issues, etc.

So I think an important step would be to try to make more of this
continuous. In other words, we did all the work to fix up the
codebase to make it releasable; let's implement things to enforce that
it stays this way. It seems we did this for some things (e.g. code
correctness with the unit tests and licensing with the license
checker), but there is more to do.

A. Implement the hudson-patch capability to vote -1 on patches that
break things as soon as they go on the JIRA issues. This is really
early feedback and I think it will go a long way.
B. Increase the scope of our 'ant test'/hudson runs to check more
things. For example, it would be nice if they failed on javadocs
warnings. It's insane if you think about it: we go to a ton of effort
to implement really cruel and picky unit tests to verify the
correctness of our code, but you can almost break the packaging and
documentation completely and the build still passes.
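
For the javadocs case, one way (an untested sketch; the log path and target
names are made up) is to record the build log around the javadoc run and fail
if anything at warning level shows up:

<target name="javadocs-lint">
  <!-- Capture only warning-and-above output while javadocs run. -->
  <record name="${build.dir}/javadoc-warnings.log" action="start" loglevel="warn"/>
  <antcall target="javadocs"/>
  <record name="${build.dir}/javadoc-warnings.log" action="stop"/>
  <!-- loadfile leaves the property unset if the file is empty. -->
  <loadfile srcFile="${build.dir}/javadoc-warnings.log"
            property="javadoc.warnings" failonerror="false"/>
  <fail if="javadoc.warnings" message="Javadocs produced warnings!"/>
</target>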

Anyway, we spend a lot of time trying to make our code correct, but
our build is a bit messy. I know that if we took even 1% of the effort
we spend on search performance and correctness and applied it to our
build system to make it fast, picky, and clean, we would be in much
better shape as a development team, with a faster compile/test/debug
cycle to boot... I think there is a lot of low-hanging fruit here, and
this thread has encouraged me to revisit the build and try to
straighten some of this out.




Re: Brainstorming on Improving the Release Process

2011-03-30 Thread Grant Ingersoll

On Mar 30, 2011, at 9:19 AM, Robert Muir wrote:

 
 A. Implement the hudson-patch capability to vote -1 on patches that
 break things as soon as they go on the JIRA issues. This is really
 early feedback and I think it will go a long way.

+1.  I asked on builds@a.o if there was any standard way of doing this, or if 
there is a place someone can point me at to get this going.


 B. Increase the scope of our 'ant test'/hudson runs to check more
 things. For example, it would be nice if they failed on javadocs
 warnings. It's insane if you think about it: we go to a ton of effort
 to implement really cruel and picky unit tests to verify the
 correctness of our code, but you can almost break the packaging and
 documentation completely and the build still passes.

+1 on failing on javadocs.

Also, what about code coverage?  We run all this Clover stuff, but how do we 
incorporate that into our dev. cycle?
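
If we wanted to make that actionable, Clover's Ant tasks can gate the build on
a coverage threshold. A minimal sketch, assuming clover.jar is on the path and
the instrumented test run/report targets already exist (the names are made up):

<taskdef resource="cloverlib.xml" classpath="${clover.jar}"/>

<target name="coverage-check" depends="generate-clover-reports">
  <!-- Fail the build if overall coverage drops below the threshold. -->
  <clover-check target="70%" haltOnFailure="true"/>
</target>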

 
 Anyway, we spend a lot of time trying to make our code correct, but
 our build is a bit messy.

Yeah, our build is a bit messy, lots of recursion.  I'm still not totally happy 
w/ how license checking is hooked in.


