There is one more test category that we could add to the list that is used by Hudson: "renewalmanager". All the other categories have one or more issues (I have run all these tests myself many, many times), mostly because of missing infrastructure, but some also fail unexpectedly.
2010/8/24 Patricia Shanahan <[email protected]>

> I'm not sure how much that would tell us, done on a bulk basis, because
> some of the tests will be specific to bugs that were found and fixed after
> then.
>
> I will be doing something similar for individual tests, but taking into
> account what their comments tell me about which versions are expected to
> pass.
>
> Patricia
>
> On 8/24/2010 1:02 PM, Patrick Wright wrote:
>
>> Hi Patricia
>>
>> Is there perhaps a solid baseline to test against, for example Jini
>> 2.1 to see how many pass/fails we get?
>>
>> Thanks for all the hard work
>> Patrick
>>
>> On Tue, Aug 24, 2010 at 9:58 PM, Patricia Shanahan <[email protected]> wrote:
>>
>>> I ran a batch of the previously ignored QA tests overnight. I got 156
>>> passes and 64 failures. This is nowhere near as bad as it sounds,
>>> because many of the failures were clusters of related tests failing in
>>> similar ways, suggesting a single problem affecting the base
>>> infrastructure for the test category. Some of the failures may relate
>>> to the known regression that Peter is going to look at this week.
>>>
>>> Also, it is important to remember that the bugs may be in the tests,
>>> not in the code under test. A test may be obsolete, depending on
>>> behavior that is no longer supported.
>>>
>>> I do think there is a good enough chance that at least one of the
>>> failures represents a real problem, and an opportunity to improve
>>> River, that I plan to start a background activity looking at failed
>>> tests to see what is going on. The objective is to do one of three
>>> things for each cluster of failures:
>>>
>>> 1. Fix River.
>>>
>>> 2. Fix the test.
>>>
>>> 3. Decide the test is unfixable, and delete it. There is no point
>>> spending disk space, file transfer time, and test load time on tests
>>> we are never going to run.
>>>
>>> Running the subset I did last night took about 15 hours, but that
>>> included a lot of timeouts.
>>>
>>> Patricia
