Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
> On 17 Oct 2018, at 15:54, Vincent Massol wrote:
>
> Hi,
>
>> On 17 Oct 2018, at 11:20, Vincent Massol wrote:
>>
>> Hi,
>>
>> [snip]
>>
>>> Process to run DSpot:
>>> 1) Pick a module. Measure coverage and mutation score (or take the value
>>> there already if they’re in the pom.xml). Same as for Descartes testing.
>>> 2) Run DSpot on the module, see
>>> https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot for
>>> explanations
>>
>> One important detail that I had missed. We need to run Dspot with
>> "--descartes" on the command line so that it uses Descartes for computing the
>> mutation score for mutations and only keep tests that increase the mutation
>> score as reported by Descartes.
>
> So actually, after speaking with Benjamin, I’ve realized a few things:
>
> * By default DSpot runs with the PIT selector (PitMutantScoreSelector) which
> is configured to use the default PIT mutations. This is why we need to run
> with the PIT selector but configured to use the Descartes mutation, and this
> is done by specifying --descartes.
> * Now this will optimize the generation of new tests for their increased
> mutation score. Right now we got 0% all the time on our tests (see
> https://docs.google.com/spreadsheets/d/1LULpGpsJirmFyvHNstLGv-Gv5DVBdpLTM2hm0jgCKUw/edit#gid=2061481816)
> and it’s because we didn’t use --descartes. We need to try again or run on
> new modules with --descartes and see what it gives us. It’s possible it’ll
> generate even less tests…
> * For the coverage part, there are 2 other selectors that can be used with
> DSpot to generate tests that all increase the coverage:
> ** "--test-criterion JacocoCoverageSelector": uses jacoco and keep tests that
> increase the instruction coverage
> ** "--test-criterion CloverCoverageSelector": uses openclover and keep tests
> that increase the branch coverage
>
> So we need to test with the various selectors and see what we get.

I’ve retested on xwiki-commons-component-default:

1) With --descartes: failure, see
   https://github.com/STAMP-project/dspot/issues/584
2) With the jacoco selector: failure, see
   https://github.com/STAMP-project/dspot/issues/586. I manually fixed the
   generated tests and removed those that didn’t pass. I got only a +0.18%
   jacoco coverage increase and a -2% descartes mutation score… That’s the
   problem: we would need a selector that optimizes for both. I’ve created
   https://github.com/STAMP-project/dspot/issues/587
3) With the clover selector: no tests generated! Opened
   https://github.com/STAMP-project/dspot/issues/588

So my recommendation is to wait for
https://github.com/STAMP-project/dspot/issues/584 to be fixed and then to use
--descartes for our measures FTM.

Thanks
-Vincent

PS: Command lines used for reference:
- java -jar /Users/vmassol/dev/dspot/dspot/target/dspot-1.1.1-SNAPSHOT-jar-with-dependencies.jar --path-to-properties dspot.properties --descartes --verbose --generate-new-test-class --with-comment
- java -jar /Users/vmassol/dev/dspot/dspot/target/dspot-1.1.1-SNAPSHOT-jar-with-dependencies.jar --path-to-properties dspot.properties --test-criterion JacocoCoverageSelector --verbose --generate-new-test-class --with-comment
- java -jar /Users/vmassol/dev/dspot/dspot/target/dspot-1.1.1-SNAPSHOT-jar-with-dependencies.jar --path-to-properties dspot.properties --test-criterion CloverCoverageSelector --verbose --generate-new-test-class --with-comment
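
The three DSpot runs discussed in this thread differ only in the selector arguments, so the command lines can be assembled from one small table. Below is a minimal Python sketch of that idea. The flags are the ones quoted in the thread; the function name, the dictionary and the jar path used in the demo loop are hypothetical names chosen for illustration, not part of DSpot.

```python
from typing import Dict, List

# Selector arguments exactly as quoted in the thread.
# The dictionary keys are labels invented for this sketch.
SELECTOR_ARGS: Dict[str, List[str]] = {
    "descartes": ["--descartes"],
    "jacoco": ["--test-criterion", "JacocoCoverageSelector"],
    "clover": ["--test-criterion", "CloverCoverageSelector"],
}


def dspot_command(jar_path: str, selector: str,
                  properties: str = "dspot.properties") -> List[str]:
    """Build the argument list for one DSpot run with the given selector."""
    if selector not in SELECTOR_ARGS:
        raise ValueError(f"unknown selector: {selector!r}")
    return [
        "java", "-jar", jar_path,
        "--path-to-properties", properties,
        *SELECTOR_ARGS[selector],
        "--verbose",
        "--generate-new-test-class",
        "--with-comment",
    ]


if __name__ == "__main__":
    # Placeholder jar path (an assumption); point it at your own DSpot build.
    for name in SELECTOR_ARGS:
        print(" ".join(dspot_command("path/to/dspot-jar-with-dependencies.jar", name)))
```

The returned list can be passed as-is to subprocess.run if you want to script one run per selector on a module.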
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
Hi,

> On 17 Oct 2018, at 11:20, Vincent Massol wrote:
>
> Hi,
>
> [snip]
>
>> Process to run DSpot:
>> 1) Pick a module. Measure coverage and mutation score (or take the value
>> there already if they’re in the pom.xml). Same as for Descartes testing.
>> 2) Run DSpot on the module, see
>> https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot for
>> explanations
>
> One important detail that I had missed. We need to run Dspot with
> "--descartes" on the command line so that it uses Descartes for computing the
> mutation score for mutations and only keep tests that increase the mutation
> score as reported by Descartes.

So actually, after speaking with Benjamin, I’ve realized a few things:

* By default DSpot runs with the PIT selector (PitMutantScoreSelector), which
  is configured to use the default PIT mutations. This is why we need to keep
  the PIT selector but configure it to use the Descartes mutations, which is
  done by specifying --descartes.
* This will optimize the generation of new tests for an increased mutation
  score. Right now we got 0% all the time on our tests (see
  https://docs.google.com/spreadsheets/d/1LULpGpsJirmFyvHNstLGv-Gv5DVBdpLTM2hm0jgCKUw/edit#gid=2061481816)
  and it’s because we didn’t use --descartes. We need to try again, or run on
  new modules, with --descartes and see what it gives us. It’s possible it’ll
  generate even fewer tests…
* For the coverage part, there are 2 other selectors that can be used with
  DSpot to generate tests that all increase the coverage:
** "--test-criterion JacocoCoverageSelector": uses jacoco and keeps tests that
   increase the instruction coverage
** "--test-criterion CloverCoverageSelector": uses openclover and keeps tests
   that increase the branch coverage

So we need to test with the various selectors and see what we get.

If we want to get the best values, we should use --descartes for K03 and
either the jacoco or the clover selector for K01. Now we need to see what
tests we get.

Thanks
-Vincent
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
Hi,

[snip]

> Process to run DSpot:
> 1) Pick a module. Measure coverage and mutation score (or take the value
> there already if they’re in the pom.xml). Same as for Descartes testing.
> 2) Run DSpot on the module, see
> https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot for
> explanations

One important detail that I had missed. We need to run DSpot with
"--descartes" on the command line so that it uses Descartes for computing the
mutation score and only keeps tests that increase the mutation score as
reported by Descartes.

> 3) If DSpot has generated tests, add them to XWiki’s source code in
> src/test/dspot and add the following to the pom of that module:
>
> org.codehaus.mojo
> build-helper-maven-plugin
>
> Example:
> https://github.com/xwiki/xwiki-commons/tree/244ee07976c691c335b7f54c48e6308004ba3d82/xwiki-commons-core/xwiki-commons-crypto/xwiki-commons-crypto-cipher
>
> Note: The generated tests sometimes need to be modified a bit to pass.
> Personally I’ve only committed tests that were passing and I reported issues
> for those that were not passing.
>
> 4) File the various reports:
> a) https://github.com/STAMP-project/dspot-usecases-output/tree/master/xwiki
> both for success and failures
> b)
> https://docs.google.com/spreadsheets/d/1LULpGpsJirmFyvHNstLGv-Gv5DVBdpLTM2hm0jgCKUw/edit#gid=2061481816
> c) for failures, file a github issue at
> https://github.com/STAMP-project/dspot/issues and link to the place on
> https://github.com/STAMP-project/dspot-usecases-output/tree/master/xwiki
> where we put the failing result.
>
> Note: The reason we need to report failures too is because DSpot fails a lot
> so we need to show what we have tested
>
> Thanks
> -Vincent
>

[snip]

Thanks
-Vincent
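
The pom snippet quoted in step 3 lost its XML markup in the archive; only the groupId and artifactId values survive. As a rough aid, here is a Python sketch that checks whether a pom declares src/test/dspot as an extra test source through build-helper-maven-plugin. The embedded pom fragment is an assumption about what such a configuration typically looks like (the plugin's add-test-source goal), not a copy of what was committed; the linked xwiki-commons-crypto-cipher example remains the reference. The function and constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: a typical build-helper-maven-plugin setup that registers an
# additional test source directory. The execution id is made up for this sketch.
EXAMPLE_POM = """\
<project>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <executions>
          <execution>
            <id>add-dspot-test-source</id>
            <phase>generate-test-sources</phase>
            <goals>
              <goal>add-test-source</goal>
            </goals>
            <configuration>
              <sources>
                <source>src/test/dspot</source>
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
"""


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def declares_test_source(pom_xml: str, directory: str = "src/test/dspot") -> bool:
    """Return True if build-helper-maven-plugin lists `directory` as a source."""
    root = ET.fromstring(pom_xml)
    for plugin in root.iter():
        if local_name(plugin.tag) != "plugin":
            continue
        # Only look at direct children for the plugin's own artifactId.
        artifact_ids = [child.text for child in plugin
                        if local_name(child.tag) == "artifactId"]
        if "build-helper-maven-plugin" not in artifact_ids:
            continue
        for element in plugin.iter():
            if local_name(element.tag) == "source" and (element.text or "").strip() == directory:
                return True
    return False
```

Calling declares_test_source(EXAMPLE_POM) returns True for the fragment above; running it against a module's real pom.xml content is a quick way to confirm that the generated tests will be picked up by the build.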
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
Hi there,

We need some more DSpot results. Would be great if you could help out. See
below for instructions.

> On 29 Aug 2018, at 11:20, Vincent Massol wrote:
>
> Hi devs (and anyone else interested to improve the tests of XWiki),
>
> History
> ==
>
> It all started when I analyzed our global TPC and found that it was going
> down globally even though we have the fail-build-on-jacoco-threshold strategy.
>
> I sent several email threads:
>
> - Loss of TPC: http://markmail.org/message/hqumkdiz7jm76ya6
> - TPC evolution: http://markmail.org/message/up2gc2zzbbe4uqgn
> - Improve our TPC strategy: http://markmail.org/message/grphwta63pp5p4l7
>
> Note: As a consequence of this last thread, I implemented a Jenkins Pipeline
> to send us a mail when the global TPC of an XWiki module goes down so that we
> fix it ASAP. This is still a development in progress. A first version is done
> and running at https://ci.xwiki.org/view/Tools/job/Clover/ but I need to
> debug it and fix it (it’s not working ATM).
>
> As a result of the global TPC going down/stagnating, I have proposed to have
> 10.7 focused on Tests + BFD.
> - Initially I proposed to focus on increasing the global TPC by looking at
> the reports from 1) above (http://markmail.org/message/qjemnip7hjva2rjd). See
> the last report at https://up1.xwikisas.com/#mJ0loeB6nBrAgYeKA7MGGw (we need
> to fix the red parts).
> - Then with the STAMP mid-term review, a bigger urgency surfaced and I asked
> if we could instead focus on fixing tests as reported by Descartes to
> increase both coverage and mutation score (ie test quality), since those are
> 2 metrics/KPIs measured by STAMP and since XWiki participates to STAMP we
> need to work on them and increase them substantially. See
> http://markmail.org/message/ejmdkf3hx7drkj52
>
> The results of XWiki 10.7 has been quite poor on test improvements (more
> focus on BFD than tests, lots of devs on holidays, etc). This forces us to
> have a different strategy.
>
> Full Strategy proposal
> =
>
> 1) As many XWiki SAS devs as possible (and anyone else from the community
> who’s interested ofc! :)) should spend 1 day per week working on improving
> STAMP metrics
> * Currently the agreement is that Thomas and myself will do this for the
> foreseeable future till we get some good-enough metric progress
> * Some other devs from XWiki SAS will help out for XWiki 10.8 only FTM
> (Marius, Adel if he can, Simon in the future). The idea is to see where that
> could get us by using substantial manpower.
>
> 2) All committers: More generally the global TPC failure is also already
> active and dev need to modify modules that see their global TPC go down.
>
> 3) All committers: Of course, the jacoco strategy is also active at each
> module level.
>
> STAMP tools
> ==
>
> There are 4 tools developed by STAMP:
> * Descartes: Improves quality of tests by increasing their mutation scores.
> See http://markmail.org/message/bonb5f7f37omnnog and also
> https://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
> * DSpot: Automatically generate new tests, based on existing tests. See
> https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot

Process to run DSpot:
1) Pick a module. Measure coverage and mutation score (or take the value there
already if they’re in the pom.xml). Same as for Descartes testing.
2) Run DSpot on the module, see
https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot for
explanations
3) If DSpot has generated tests, add them to XWiki’s source code in
src/test/dspot and add the following to the pom of that module:

org.codehaus.mojo
build-helper-maven-plugin

Example:
https://github.com/xwiki/xwiki-commons/tree/244ee07976c691c335b7f54c48e6308004ba3d82/xwiki-commons-core/xwiki-commons-crypto/xwiki-commons-crypto-cipher

Note: The generated tests sometimes need to be modified a bit to pass.
Personally I’ve only committed tests that were passing and I reported issues
for those that were not passing.

4) File the various reports:
a) https://github.com/STAMP-project/dspot-usecases-output/tree/master/xwiki
both for success and failures
b)
https://docs.google.com/spreadsheets/d/1LULpGpsJirmFyvHNstLGv-Gv5DVBdpLTM2hm0jgCKUw/edit#gid=2061481816
c) for failures, file a github issue at
https://github.com/STAMP-project/dspot/issues and link to the place on
https://github.com/STAMP-project/dspot-usecases-output/tree/master/xwiki where
we put the failing result.

Note: The reason we need to report failures too is that DSpot fails a lot, so
we need to show what we have tested.

Thanks
-Vincent

> * CAMP: Takes a Dockerfile and generates mutations of it, then deploys and
> execute tests on the software to see if the mutation works or not. Note this
> is currently not fitting the need of XWiki and thus I’ve been developing
> another tool as an experiment (which may go back in CAMP one
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
> On 29 Aug 2018, at 11:20, Vincent Massol wrote:

[snip]

> Objectives/KPIs/Metrics for STAMP
> ===
>
> The STAMP project has defined 9 KPIs that all partners (and thus XWiki) need
> to work on:
>
> 1) K01: Increase test coverage
> * Global increase by reducing by 40% the non-covered code. For XWiki since
> we’re at about 70%, this means reaching about 80% before the end of STAMP
> (ie. before end of 2019)
> * Increase the coverage contributions of each tool developed by STAMP.
>
> Strategy:
> * Primary goal:
> ** Increase coverage by executing Descartes and improving our tests. This is
> http://markmail.org/message/ejmdkf3hx7drkj52
> ** Don’t do anything with DSpot. I’ll do that part. Note that the goal is to
> write a Jenkins pipeline to automatically execute DSpot from time to time and
> commit the generated tests in a separate test source and have our build
> execute both src/test/java and this new test source.

Contrary to what was proposed initially, it would be nice to run DSpot too.
FTR a good command line to use for DSpot is:

java -jar /dspot-1.1.1-SNAPSHOT-jar-with-dependencies.jar --path-to-properties dspot.properties --verbose --generate-new-test-class --with-comment

The --generate-new-test-class option tells DSpot to generate in its output dir
only the new tests added and not include existing tests.
The --with-comment option tells DSpot to keep the comments and thus the
license header too.

I did a session today and committed the results in
https://github.com/STAMP-project/dspot-usecases-output/commit/113726c0aac3af3df30334d14115d89227eaebdc

What I did:
* For each module tested with DSpot, created a folder in
  https://github.com/STAMP-project/dspot-usecases-output/tree/master/xwiki
* For cases where DSpot could generate some tests, committed them and modified
  the pom.xml so that they are executed
* Note: tests need to have their license headers adjusted so that they don’t
  fail the build
* Computed coverage + mutation scores before and after and reported them in
  the README.md in each folder

Thanks
-Vincent

> ** Don’t do anything with TestContainers FTM since I need to finish a first
> working version. I may need help in the future to implement docker images for
> more configurations (on Oracle, in a cluster, with LibreOffice, with an
> external SOLR server, etc).
> ** For EvoCrash: We’ll count contributions of EvoCrash to coverage in K08.
> * Secondary goal:
> ** Increase our global TPC as mentioned above by fixing the modules in red.
>
> 2) K02: Reduce flaky tests.
> * Objective: reduce the number of flaky tests by 20%
>
> Strategy:
> * Record flaky tests in jira
> * Fix the max number of them
>
> 3) K03: Better test quality
> * Objective: increase mutation score by 20%
>
> Strategy:
> * Same strategy as K01.

[snip]

Thanks
-Vincent
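
For reference, the K01 target quoted above ("reducing by 40% the non-covered code" starting from about 70%) can be worked out explicitly: 30% of the code is uncovered, removing 40% of that leaves 18% uncovered, i.e. 82% coverage, which matches the "about 80%" in the mail. A minimal Python sketch of that arithmetic follows; the function name is hypothetical.

```python
def target_coverage(current: float, uncovered_reduction: float) -> float:
    """Coverage reached after shrinking the uncovered share by a given fraction.

    Both arguments are fractions in [0, 1], e.g. 0.70 for 70% coverage and
    0.40 for a 40% reduction of the non-covered code.
    """
    uncovered = 1.0 - current
    return 1.0 - uncovered * (1.0 - uncovered_reduction)


if __name__ == "__main__":
    # K01 figures from the thread: about 70% coverage, 40% reduction.
    print(f"{target_coverage(0.70, 0.40):.0%}")
```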
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
So we had a conf call this morning and we agreed to have TFD (Test Fixing Day)
on Tuesdays for the XWiki 10.8 timeframe. Those who cannot attend on Tuesday
will work on the tests during the other days to catch up.

This means starting today! :)

Thanks
-Vincent

> On 30 Aug 2018, at 12:27, Adel Atallah wrote:
>
> Just to be clear, when I proposed "having a whole day dedicated on
> using these tools", I didn't meant having to have it every week but
> only once, so we can properly start improving the tests. It would be
> some kind of training.
> On my side I don't think I'll be able to have on a week one day
> dedicated to tests and one for bug fixing, I won't have time left for
> the roadmap as I will only work on the product 50% of the time.
>
>
> On Thu, Aug 30, 2018 at 12:18 PM, Vincent Massol wrote:
>> Hi,
>>
>> I don’t remember discussing this with you Thomas. Actually I’m not convinced
>> to have a fixed day:
>> * we already have a fixed BFD and having a second one doesn’t leave much
>> flexibility for working on roadmap items when it’s the best
>> * test sessions can be short (0.5-1 hours) and it’s easy to do them between
>> other tasks
>> * it can be boring to spend a full day on them
>>
>> Now, I agree that not having a fixed day will make it hard to make sure that
>> we work 20% on that topic.
>>
>> So if you prefer we can define a day, knowing that some won’t be able to
>> always attend during that day and in this case they should do it on another
>> day. What’s important is to have 20% done each week (i.e. enough work done
>> on it).
>>
>> In term of day, if we have to choose one, I’d say Tuesday. That’s the most
>> logical to me.
>>
>> WDYT? What do you prefer?
>>
>> Thanks
>> -Vincent
>>
>>> On 30 Aug 2018, at 10:38, Thomas Mortagne wrote:
>>>
>>> Indeed we discussed this but I don't see it in your mail Vincent.
>>>
>>> On Thu, Aug 30, 2018 at 10:33 AM, Adel Atallah wrote:
>>>> Hello,
>>>>
>>>> Maybe we should agree on having a whole day dedicated on using these
>>>> tools with a maximum number of developers.
>>>> That way we will be able to help each other and maybe it will make the
>>>> process easier to carry out in the future.
>>>>
>>>> WDYT?
>>>>
>>>> Thanks,
>>>> Adel
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
OK for me too.

Simon

On 9/3/18 10:31 AM, Thomas Mortagne wrote:
> Sounds good.
>
> On Mon, Sep 3, 2018 at 9:55 AM, Vincent Massol wrote:
>>
>>> On 3 Sep 2018, at 09:55, Vincent Massol wrote:
>>>
>>> I propose to do this tomorrow Tuesday, starting with an intro from me, using
>>> youtube live.
>>
>> Say, 10AM Paris time.
>>
>> Thanks
>> -Vincent
>>
>>> WDYT?
>>>
>>> Thanks
>>> -Vincent
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
Sounds good.

On Mon, Sep 3, 2018 at 9:55 AM, Vincent Massol wrote:
>
>> On 3 Sep 2018, at 09:55, Vincent Massol wrote:
>>
>> I propose to do this tomorrow Tuesday, starting with an intro from me, using
>> youtube live.
>
> Say, 10AM Paris time.
>
> Thanks
> -Vincent
>
>> WDYT?
>>
>> Thanks
>> -Vincent
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
+1

On Mon, Sep 3, 2018 at 9:55 AM, Vincent Massol wrote:
>
>> On 3 Sep 2018, at 09:55, Vincent Massol wrote:
>>
>> I propose to do this tomorrow, Tuesday, starting with an intro from me,
>> using YouTube Live.
>
> Say, 10 AM Paris time.
>
> Thanks
> -Vincent
>
>> WDYT?
>>
>> Thanks
>> -Vincent
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
> On 3 Sep 2018, at 09:55, Vincent Massol wrote:
>
> I propose to do this tomorrow, Tuesday, starting with an intro from me,
> using YouTube Live.

Say, 10 AM Paris time.

Thanks
-Vincent

> WDYT?
>
> Thanks
> -Vincent
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
I propose to do this tomorrow, Tuesday, starting with an intro from me, using YouTube Live.

WDYT?

Thanks
-Vincent

> On 30 Aug 2018, at 12:27, Adel Atallah wrote:
>
> Just to be clear, when I proposed "having a whole day dedicated to
> using these tools", I didn't mean having it every week, but only once,
> so we can properly start improving the tests. It would be some kind of
> training.
> On my side, I don't think I'll be able to dedicate one day a week to
> tests and another to bug fixing: I won't have time left for the roadmap,
> as I will only work on the product 50% of the time.
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
Just to be clear, when I proposed "having a whole day dedicated to using these tools", I didn't mean having it every week, but only once, so we can properly start improving the tests. It would be some kind of training.

On my side, I don't think I'll be able to dedicate one day a week to tests and another to bug fixing: I won't have time left for the roadmap, as I will only work on the product 50% of the time.

On Thu, Aug 30, 2018 at 12:18 PM, Vincent Massol wrote:
> Hi,
>
> I don’t remember discussing this with you, Thomas. Actually, I’m not
> convinced we should have a fixed day:
> * we already have a fixed BFD, and having a second one doesn’t leave much
> flexibility for working on roadmap items when it’s best to do so
> * test sessions can be short (0.5-1 hour) and it’s easy to do them between
> other tasks
> * it can be boring to spend a full day on them
>
> Now, I agree that not having a fixed day will make it hard to make sure
> that we work 20% on that topic.
>
> So if you prefer we can define a day, knowing that some won’t always be
> able to attend on that day, in which case they should do it on another
> day. What’s important is to have 20% done each week (i.e. enough work done
> on it).
>
> In terms of the day, if we have to choose one, I’d say Tuesday. That’s the
> most logical to me.
>
> WDYT? What do you prefer?
>
> Thanks
> -Vincent
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
Hi,

I don’t remember discussing this with you, Thomas. Actually, I’m not convinced we should have a fixed day:
* we already have a fixed BFD, and having a second one doesn’t leave much flexibility for working on roadmap items when it’s best to do so
* test sessions can be short (0.5-1 hour) and it’s easy to do them between other tasks
* it can be boring to spend a full day on them

Now, I agree that not having a fixed day will make it hard to make sure that we work 20% on that topic.

So if you prefer we can define a day, knowing that some won’t always be able to attend on that day, in which case they should do it on another day. What’s important is to have 20% done each week (i.e. enough work done on it).

In terms of the day, if we have to choose one, I’d say Tuesday. That’s the most logical to me.

WDYT? What do you prefer?

Thanks
-Vincent

> On 30 Aug 2018, at 10:38, Thomas Mortagne wrote:
>
> Indeed, we discussed this, but I don't see it in your mail, Vincent.
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
Indeed, we discussed this, but I don't see it in your mail, Vincent.

On Thu, Aug 30, 2018 at 10:33 AM, Adel Atallah wrote:
> Hello,
>
> Maybe we should agree on having a whole day dedicated to using these
> tools with a maximum number of developers.
> That way we will be able to help each other and maybe it will make the
> process easier to carry out in the future.
>
> WDYT?
>
> Thanks,
> Adel
Re: [xwiki-devs] [STAMP/Test] Metrics we need to improve + strategy
Hello,

Maybe we should agree on having a whole day dedicated to using these tools with a maximum number of developers.
That way we will be able to help each other and maybe it will make the process easier to carry out in the future.

WDYT?

Thanks,
Adel

On Wed, Aug 29, 2018 at 11:20 AM, Vincent Massol wrote:
> Hi devs (and anyone else interested in improving the tests of XWiki),
>
> History
> ==
>
> It all started when I analyzed our global TPC and found that it was going
> down globally even though we have the fail-build-on-jacoco-threshold
> strategy.
>
> I sent several email threads:
>
> - Loss of TPC: http://markmail.org/message/hqumkdiz7jm76ya6
> - TPC evolution: http://markmail.org/message/up2gc2zzbbe4uqgn
> - Improve our TPC strategy: http://markmail.org/message/grphwta63pp5p4l7
>
> Note: As a consequence of this last thread, I implemented a Jenkins
> Pipeline to send us a mail when the global TPC of an XWiki module goes down
> so that we fix it ASAP. This is still work in progress. A first version is
> done and running at https://ci.xwiki.org/view/Tools/job/Clover/ but I need
> to debug it and fix it (it’s not working ATM).
>
> As a result of the global TPC going down/stagnating, I have proposed to
> have 10.7 focused on Tests + BFD.
> - Initially I proposed to focus on increasing the global TPC by looking at
> the reports from 1) above (http://markmail.org/message/qjemnip7hjva2rjd).
> See the last report at https://up1.xwikisas.com/#mJ0loeB6nBrAgYeKA7MGGw
> (we need to fix the red parts).
> - Then with the STAMP mid-term review, a bigger urgency surfaced and I
> asked if we could instead focus on fixing tests as reported by Descartes
> to increase both coverage and mutation score (i.e. test quality), since
> those are 2 metrics/KPIs measured by STAMP, and since XWiki participates
> in STAMP we need to work on them and increase them substantially. See
> http://markmail.org/message/ejmdkf3hx7drkj52
>
> The results of XWiki 10.7 have been quite poor on test improvements (more
> focus on BFD than tests, lots of devs on holidays, etc). This forces us to
> have a different strategy.
>
> Full Strategy proposal
> =
>
> 1) As many XWiki SAS devs as possible (and anyone else from the community
> who’s interested ofc! :)) should spend 1 day per week working on improving
> STAMP metrics
> * Currently the agreement is that Thomas and myself will do this for the
> foreseeable future till we get some good-enough metric progress
> * Some other devs from XWiki SAS will help out for XWiki 10.8 only FTM
> (Marius, Adel if he can, Simon in the future). The idea is to see where
> that could get us by using substantial manpower.
>
> 2) All committers: More generally the global TPC failure is also already
> active and devs need to modify modules that see their global TPC go down.
>
> 3) All committers: Of course, the jacoco strategy is also active at each
> module level.
>
> STAMP tools
> ==
>
> There are 4 tools developed by STAMP:
> * Descartes: Improves quality of tests by increasing their mutation scores.
> See http://markmail.org/message/bonb5f7f37omnnog and also
> https://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
> * DSpot: Automatically generates new tests, based on existing tests. See
> https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot
> * CAMP: Takes a Dockerfile and generates mutations of it, then deploys and
> executes tests on the software to see if the mutation works or not. Note
> this currently does not fit the needs of XWiki and thus I’ve been
> developing another tool as an experiment (which may go back in CAMP one
> day), based on TestContainers, see
> https://massol.myxwiki.org/xwiki/bin/view/Blog/EnvironmentTestingExperimentations
> * EvoCrash: Takes a stack trace from production logs and generates a test
> that, when executed, reproduces the crash. See
> https://markmail.org/message/v74g3tsmflquqwra. See also
> https://github.com/SERG-Delft/EvoCrash
>
> Since XWiki is part of the STAMP research project, we need to use those 4
> tools to increase the KPIs associated with the tools. See below.
>
> Objectives/KPIs/Metrics for STAMP
> ===
>
> The STAMP project has defined 9 KPIs that all partners (and thus XWiki)
> need to work on:
>
> 1) K01: Increase test coverage
> * Global increase by reducing by 40% the non-covered code. For XWiki since
> we’re at about 70%, this means reaching about 80% before the end of STAMP
> (i.e. before end of 2019)
> * Increase the coverage contributions of each tool developed by STAMP.
>
> Strategy:
> * Primary goal:
> ** Increase coverage by executing Descartes and improving our tests. This
> is http://markmail.org/message/ejmdkf3hx7drkj52
> ** Don’t do anything with DSpot. I’ll do that part. Note that the goal is
> to write a Jenkins pipeline to automatically execute DSpot from time to
> time and commit the generated tests in a separate