On Thu, Aug 9, 2012 at 10:39 AM, Ariel Constenla-Haile <[email protected]> wrote:
> Hi Jürgen,
>
> On Thu, Aug 09, 2012 at 01:33:29PM +0200, Jürgen Schmidt wrote:
>> I would like to propose now a new snapshot build based revision
>> 1371068.
>
> -1
>
> Didn't the last build show us that it is really a bad idea to propose
> one build just because there is a fix for a release blocker? Browse the
> archive looking for the rev. number to get a timeline idea:
>
> 1367440
> 1367911
> 1368799
> 1368968
> 1369110
> 1369843
>
> A small resume: Rob's finding the missing update setting, Josef finding
> two issues on Sunday, even before the RC was announced on Monday; a new
> RC for those two fixes on Tuesday; now there is a fix, so yet another
> RC... What if another release blockers are found tomorrow? Yet another
> RC on Friday if the fix is available?
>
If the newly found bugs were caused by fixes to previous release blocker bugs, then that would be a big problem. But is this the case here? Recent bugs we've had are:

-- password protection not working

-- update notifications not set

-- hang first time launching Impress due to threading issue with Presenter Console

I don't think any of these caused issues with printing. Certainly the browser plugin dialog issue goes back to March, before AOO 3.4.0.

So we're finding new bugs. The frustrating thing is that we're not finding them sooner. For example, at least two of the bugs (the missing update notification setting and the browser plugin dialog issue) existed in 3.4.0.

It is certainly possible that waiting another week will uncover more such bugs. Waiting 2 weeks will find more. Waiting 2 months even more. But when do we know it is ready to ship?

IMHO, there are three things we should be concerned with:

1) When a last minute fix is made, be sure that we're taking steps to reduce the risk of introducing new bugs. The risk is high because the last minute fix is made after the main test pass has completed. So you want to reduce the risk of new errors via code reviews, targeted testing, etc.

2) Ensuring that our main test pass for each release is able to complete before we vote on a release. Our confidence in the quality of a release relies on our test coverage.

3) Reducing unnecessary churn on those who are building the binaries.

The process I'm familiar with aims to reduce the rate of code changes, so less testing is invalidated by further code changes. It goes something like this:

1. A "feature freeze" date. All new feature work for a release is done. If a feature is not testable by that date it may be dropped from the release plan. Bugs might still exist, but testing is not blocked.

2. This then allows a full test pass on the release, maybe a two-week pass for a project like this.
Bugs are still being fixed, and the QA team will be sure to retest areas as they are fixed. But we rarely have the luxury to do a 2nd complete test pass. So this requires excellent communications between coders and testers.

3. After all critical bugs are fixed and verified you then have a "code freeze". Any changes after this point must meet a high threshold to be fixed. Further fixes might require code review, for example. The goal is to avoid risk. Risk comes from code changes. Any code change has risk. So we avoid even trivial fixes unless the impact is severe. There are no risk-free trivial fixes.

So we're at step #3 now. The decision on whether to vote on the release should be based on our confidence that there are no further show stopper bugs. How do we know whether this is true? One formal method is to look at bug find rates, e.g., the number of bugs you find per hour of testing. As the product improves in quality this rate will go down. There will always be more bugs to be found, of course, but if the rate is coming down that is a healthy sign.

So doing another few days of testing is fine with me. But shouldn't we be testing a new build with all the latest fixes in it?

> In the meantime, I propose
>
> https://issues.apache.org/ooo/show_bug.cgi?id=120518

I don't see how that one is critical. What is the user impact? Is there data loss? A crash? Or is that dialog option just a no-op now?

> https://issues.apache.org/ooo/show_bug.cgi?id=120389
>
> Crash bug 120389 has been reported on 2012-07-27, nobody notice it until
> the user made some noise. This shows that something is really not
> working with the way RC are proposed right after a fix is found for
> a release blocker, IMHO there should be enough time (a week, for
> example) to test the RC, even if a release blocker is detected, because
> nothing prevent this from finding another release blocker.
>

Certainly the RC build did not create the bug.
If this is a regression and we're finding it this late, and from a user, it means a few things:

1) the bug was introduced in error in a release that is supposed to have only a handful of maintenance fixes

2) the bug was not detected in any previous formal test passes for this release

3) we're probably not paying enough attention to printing in our testing in general, since we've seen two defects that would have been obvious if printing was tested

4) we're getting useful feedback from users testing dev snapshots outside of our formal testing work. But the reports are being missed, perhaps because users are not aware of our conventions for setting release blocker flags.

There are several solutions here. But the existence or non-existence of a defect like this is independent of the pace of builds. Frequent builds and frequent testing is a good thing, IMHO. We just don't want that to translate into frequent voting ;-)

Regards,

-Rob

>
> Regards
> --
> Ariel Constenla-Haile
> La Plata, Argentina
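
P.S. To make the bug find rate idea above a bit more concrete, here is a minimal sketch in Python. It only illustrates the arithmetic (bugs found divided by hours of testing, tracked per test pass). The function names and the sample numbers are hypothetical, not taken from any AOO tooling or real test data, and it assumes every pass has a non-zero number of testing hours.

```python
def find_rates(passes):
    """Return bugs found per hour of testing for each test pass.

    `passes` is a list of (bugs_found, hours_tested) tuples, oldest first.
    Assumes hours_tested is greater than zero for every pass.
    """
    return [bugs / hours for bugs, hours in passes]


def is_declining(rates):
    """True if every rate is strictly lower than the one before it."""
    return all(later < earlier for earlier, later in zip(rates, rates[1:]))


# Made-up numbers for illustration: (bugs found, hours of testing) per week.
weekly = [(12, 40), (6, 40), (2, 40)]
rates = find_rates(weekly)
print(rates)
print(is_declining(rates))
```

A steadily declining rate would be the "healthy sign" mentioned above. A rate that is flat or rising after a late fix would be a reason to keep testing rather than vote.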
