Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?
> I have now done this in Isabelle/705d4c4003ea. > > It means that ISABELLE_FULL_TEST no longer occurs in the main Isabelle > repository. The extra slow sessions are in ~~/src/Benchmarks, and only > tested by special background jobs that take care of that. (Presently none!) Thanks. I will add a job to the Jenkins during the next maintenance window (some time next week). With the current state of the instrumentation, I need to shut down Jenkins for most kinds of changes. Unfortunately that means I can't deploy a benchmark job right now. (This will also change during the next maintenance window.) Cheers Lars ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?
On Wed, 3 Feb 2016, Jasmin Blanchette wrote: I have a slight worry concerning the characterization and naming as "benchmarks". From what I understand, a "benchmark" is a theory file that tests the performance of the system. I have introduced this terminology many years ago. The idea is that we always do a proper test of all tools, libraries, examples, either via old-school "isabelle build -a" or some immediate test service (formerly "testboard"). The things that are run in addition as "nightly build" are slow (and thus getting in the way), but are not relevant for actual testing. Since the main purpose of nightly builds is to collect reliably performance figures, the outcome of these sessions is characterized as "benchmarks". Since the measurements of isatest have become very inaccurate in recent years, this key purpose the background tests has been forgotten. We are flying blind, especially for AFP. Good performance records are important, especially due to the trend to shrink consumer devices more and more. I don't mind if there is a more fitting name for ~~/src/Additional_Examples_That_Are_Often_Slow_But_Not_Really_Relevant instead of ~~/src/Benchmarks. Getting good performance figures back -- stored over in the long term -- is really important, though. Special setup: Sudoku What is special about Sudoku? The relevant changeset is this: changeset: 58331:054e9a9fccad user:blanchet date:Fri Sep 12 17:51:31 2014 +0200 files: src/HOL/ROOT description: enabled 'Sudoku' only with 'ISABELLE_FULL_TEST' -- Sudoku is fast enough on modern hardware (within seconds on my MacBook), but it seems to fail on older test machines SMT_Tests (requires Z3) The relevant changeset is this: changeset: 50666:6f48853f08d5 user:blanchet date:Wed Jan 02 09:31:25 2013 +0100 files: src/HOL/ROOT src/HOL/SMT_Examples/SMT_Examples.thy src/HOL/SMT_Examples/SMT_Tests.certs src/HOL/SMT_Examples/SMT_Tests.thy description: actually run Z3 for "SMT_Tests" when "ISABELLE_FULL_TEST" is enabled In 2013 we did not have Z3 as default component, usable for everybody. Now the condition on it is always true -- z3 is always enabled. The theory takes 40s elapsed time on my machine. That could qualify it as optional "benchmark", but I don't mind to have it by default. We have other much bigger things all the time. Slow: Brackin IsaFoR Misc_N2M Probably also in the slow category (the last three might have a benchmarking component): Find_Unused_Assms_Examples Needham_Schroeder_No_Attacker_Example Needham_Schroeder_Guided_Attacker_Example Needham_Schroeder_Unguided_Attacker_Example Slow and benchmarking: Record_Benchmark These are the genuine examples where I understand the arrangement as "test this continuously in the background, and record good performance figures for it". Makarius ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?
> What is special about Sudoku? You're right, probably nothing. >> SMT_Tests (requires Z3) > ... > In 2013 we did not have Z3 as default component, usable for everybody. Now > the condition on it is always true -- z3 is always enabled. We could indeed enable it by default. >> Slow: >> Brackin >> IsaFoR >> Misc_N2M >> >> Probably also in the slow category (the last three might have a benchmarking >> component): >> Find_Unused_Assms_Examples >> Needham_Schroeder_No_Attacker_Example >> Needham_Schroeder_Guided_Attacker_Example >> Needham_Schroeder_Unguided_Attacker_Example >> >> Slow and benchmarking: >> Record_Benchmark > > These are the genuine examples where I understand the arrangement as "test > this continuously in the background, and record good performance figures for > it". I guess in that sense they are "Benchmarks". For "Brackin" etc., we were just interested in checking that they work at all -- and took them out when we saw that they were slow. But they do serve a dual purpose, esp. that "working at all" implies "not being *too* slow". So I won't object anymore to their characterization as "benchmarks". Jasmin ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?
>> What remains are various benchmarks. We could move these sessions to >> ~~/src/Benchmarks and omit that directory in ~~/ROOTS. Thus it can be >> included on demand like this: >> >> isabelle build -d '~~/src/Benchmarks' -a >> >> or like this: >> >> isabelle build -D '~~/src/Benchmarks' >> >> This avoids more sophisticated Boolean algebra on session groups. It >> also makes clear from the source structure, what is normally not tested. >> >> I basically do the same with -d '$AFP' all the time. > > Sounds reasonable. There's no pressure to do it just yet; I'd be happy > to wait for post-release mode. I also like that. FYI in my test setup there is currently a particular isabelle tool with the following content: > #!/usr/bin/env bash > # > # DESCRIPTION: pragmatic combined build of distribution and afp > > ISABELLE_FULL_TEST=true "${ISABELLE_TOOL}" build -d "${AFP}" -x Flyspeck-Tame > "$@" > ISABELLE_FULL_TEST= "${ISABELLE_TOOL}" build -d "${AFP}" Flyspeck-Tame Maybe that distinction could also be tackled by an appropriate session structure and / or grouping. Florian -- PGP available: http://isabelle.in.tum.de/~haftmann/pgp/florian_haftmann_at_informatik_tu_muenchen_de signature.asc Description: OpenPGP digital signature ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?
> What remains are various benchmarks. We could move these sessions to > ~~/src/Benchmarks and omit that directory in ~~/ROOTS. Thus it can be > included on demand like this: > > isabelle build -d '~~/src/Benchmarks' -a > > or like this: > > isabelle build -D '~~/src/Benchmarks' > > This avoids more sophisticated Boolean algebra on session groups. It > also makes clear from the source structure, what is normally not tested. > > I basically do the same with -d '$AFP' all the time. Sounds reasonable. There's no pressure to do it just yet; I'd be happy to wait for post-release mode. Cheers Lars ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?
On Sun, 31 Jan 2016, Lars Hupel wrote: The "ISABELLE_FULL_TEST" variable and "slow" group are somehow adhoc measures to get meaningful interactive tests, without wasting too much time. It probably makes sense to imitate that for "testboard" setups. The idea is to distribute the build load over multiple machines, since that isn't supported by "isabelle build". This is generally helpful – even if we get better hardware – because distributing jobs is trivial with Jenkins and there's no reason we shouldn't. The shorter the build time, the better. Some years ago we were in a similar situation of outdated hardware, and Isabelle + AFP sessions growing beyond feasibility. So we started to speculate towards distributed "make". When Isabelle "build" took over in 2012, the multicore support of Poly/ML + Isabelle/ML was already so advanced that the question was irrelevant. A full test of Isabelle + AFP required less than 1h on a plain and basic 2 * 4 core machine (macbroy2). Now we have much bigger Isabelle + AFP sessions, and still the same old hardware. Clustering weak nodes always requires more administrative efforts than just one or two fat nodes without special tricks. Building with SML/NJ configuration is trivial. It would be "just another job" in Jenkins. I'd just need to know about what to build precisely. Can you explain to me what the ML_SYSTEM_POLYML variable means? I full hypersearch over Isabelle + AFP confirms that the documentation in the "system" manual is still correct: \<^descr>[@{setting_def ML_SYSTEM_POLYML}\\<^sup>*\] is \<^verbatim>\true\ for @{setting ML_SYSTEM} values derived from Poly/ML, as opposed to SML/NJ where it is empty. This is particularly useful with the build option @{system_option condition} (\secref{sec:system-options}) to restrict big sessions to something that SML/NJ can still handle. Concerning what "SML/NJ can still handle", see also https://bitbucket.org/isabelle_project/isabelle-release/commits/a4e6ea45f416 That commit is only on the isabelle-release repository. It will come back via merge in something like 2 weeks. Makarius ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?
Maybe it’s time. They’ve not made it better, and it seems pretty hopeless now. Larry > On 31 Jan 2016, at 13:41, Makariuswrote: > > We could discontinue ML_SYSTEM_POLYML and merely test the "HOL" session for > SML/NJ. > > Or we could discontinue SML/NJ altogether. I proposed that the last time 10 > years ago. ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?
> The "ISABELLE_FULL_TEST" variable and "slow" group are somehow adhoc > measures to get meaningful interactive tests, without wasting too much > time. It probably makes sense to imitate that for "testboard" setups. The idea is to distribute the build load over multiple machines, since that isn't supported by "isabelle build". This is generally helpful – even if we get better hardware – because distributing jobs is trivial with Jenkins and there's no reason we shouldn't. The shorter the build time, the better. > For regular batch tests in the style of "isatest", the main purpose is > to accumulate accurate timing information, and preserve that in a simple > format over a long time. This has been forgotten in recent years, since > measurements have become more and more messed up. So performance-wise, we > are flying blind, especially for AFP. I can't yet say with confidence that we now have a setup with reliable performance numbers. In a couple of days I will look at the numbers to get an idea of their variance. > A related question is what to do with SML/NJ and the ML_SYSTEM_POLYML > variable. For the Isabelle2016 release, I've tried to build everything > with SML/NJ as usual, but many sessions failed, even just HOL-Library. > > We could discontinue ML_SYSTEM_POLYML and merely test the "HOL" session > for SML/NJ. > > Or we could discontinue SML/NJ altogether. I proposed that the last > time 10 years ago. Building with SML/NJ configuration is trivial. It would be "just another job" in Jenkins. I'd just need to know about what to build precisely. Can you explain to me what the ML_SYSTEM_POLYML variable means? Cheers Lars ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?
On Sun, 31 Jan 2016, Lars Hupel wrote: unexpected exception (bug?) in SML/NJ: Io [Io: openIn failed on "/build/smlnj-tOOpSy/smlnj-110.76/sml.boot.x86-unix/smlnj/basis/.cm/x86-unix/basis.cm", No such file or directory] raised at: Basis/Implementation/IO/bin-io-fn.sml:617.25-617.71 ../cm/util/safeio.sml:30.11 ../compiler/TopLevel/interact/evalloop.sml:44.55 Settings are: ML_SYSTEM=smlnj-110 ML_HOME="/usr/lib/smlnj/bin" /usr/lib probably means that this is a "package" on the target system, and thus it is usually broken by default. I've never used anything else than a manual build of SML/NJ from the official sources. It usually works out of the box, unchanged for many decades. Interestingly enough, "isabelle build" doesn't complain. The error only becomes apparent in the log file. Total existence failure of the outermost ML executable can lead to problems with the Isabelle wrapper scripts. The same can happen for Poly/ML under certain circumstances. We are normally testing Isabelle applications, not ML installations. Nonetheless, I hope to make this more robust at some point, by replacing the shell scripts with Isabelle/Scala. Not having SML/NJ anymore could accelerate that move. Makarius ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?
> Building with SML/NJ configuration is trivial. It would be "just another > job" in Jenkins. I'd just need to know about what to build precisely. > Can you explain to me what the ML_SYSTEM_POLYML variable means? I was speaking too early. Here's what I get when trying to build Pure with SML/NJ: Standard ML of New Jersey v110.76 [built: Sun Jun 29 03:29:51 2014] !* unable to process `' (unknown extension `') - [autoloading] unexpected exception (bug?) in SML/NJ: Io [Io: openIn failed on "/build/smlnj-tOOpSy/smlnj-110.76/sml.boot.x86-unix/smlnj/basis/.cm/x86-unix/basis.cm", No such file or directory] raised at: Basis/Implementation/IO/bin-io-fn.sml:617.25-617.71 ../cm/util/safeio.sml:30.11 ../compiler/TopLevel/interact/evalloop.sml:44.55 Settings are: ML_SYSTEM=smlnj-110 ML_HOME="/usr/lib/smlnj/bin" ML_OPTIONS="@SMLdebug=/dev/null @SMLalloc=1024" ML_PLATFORM=$(eval $("$ML_HOME/.arch-n-opsys" 2>/dev/null); echo "$HEAP_SUFFIX") Interestingly enough, "isabelle build" doesn't complain. The error only becomes apparent in the log file. Cheers Lars ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
[isabelle-dev] Grouping ISABELLE_FULL_TEST?
As of 527488dc8b90, there are five sessions which contain theories that are only processed under ISABELLE_FULL_TEST=true: HOL-ex Sudoku HOL-Datatype_Examples Brackin IsaFoR Misc_N2M HOL-Word-SMT_Examples SMT_Tests HOL-Quickcheck_Benchmark Find_Unused_Assms_Examples Needham_Schroeder_No_Attacker_Example Needham_Schroeder_Guided_Attacker_Example Needham_Schroeder_Unguided_Attacker_Example HOL-Record_Benchmark Record_Benchmark Most of these appear to be benchmarks of some sort. Currently, the new CI setup doesn't set the ISABELLE_FULL_TEST flag. I think it would be useful to run those, but I'd like to split them up into a separate run so that they can be executed in parallel to the regular makeall. To that end, I would suggest grouping these sessions in 'full'. I could then run export ISABELLE_FULL_TEST=true isabelle build -v -g full (In a similar fashion, I run 'slow' sessions separately from the others.) If nobody complains, I'll add the group. Cheers Lars ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?
On Sun, 31 Jan 2016, Lars Hupel wrote: As of 527488dc8b90, there are five sessions which contain theories that are only processed under ISABELLE_FULL_TEST=true: HOL-ex Sudoku HOL-Datatype_Examples Brackin IsaFoR Misc_N2M HOL-Word-SMT_Examples SMT_Tests HOL-Quickcheck_Benchmark Find_Unused_Assms_Examples Needham_Schroeder_No_Attacker_Example Needham_Schroeder_Guided_Attacker_Example Needham_Schroeder_Unguided_Attacker_Example HOL-Record_Benchmark Record_Benchmark Most of these appear to be benchmarks of some sort. Currently, the new CI setup doesn't set the ISABELLE_FULL_TEST flag. I think it would be useful to run those, but I'd like to split them up into a separate run so that they can be executed in parallel to the regular makeall. To that end, I would suggest grouping these sessions in 'full'. I could then run export ISABELLE_FULL_TEST=true isabelle build -v -g full (In a similar fashion, I run 'slow' sessions separately from the others.) We have accumulated a fair amount of complication here. There was a time when it was even more complication, but we managed to recover from it by substantial improvements of Poly/ML (multithreading and parallel GC), elimination of "make", and introduction of up-to-date hardware. The "ISABELLE_FULL_TEST" variable and "slow" group are somehow adhoc measures to get meaningful interactive tests, without wasting too much time. It probably makes sense to imitate that for "testboard" setups. For regular batch tests in the style of "isatest", the main purpose is to accumulate accurate timing information, and preserve that in a simple format over a long time. This has been forgotten in recent years, since measurements have become more and more messed up. So performance-wise, we are flying blind, especially for AFP. A related question is what to do with SML/NJ and the ML_SYSTEM_POLYML variable. For the Isabelle2016 release, I've tried to build everything with SML/NJ as usual, but many sessions failed, even just HOL-Library. We could discontinue ML_SYSTEM_POLYML and merely test the "HOL" session for SML/NJ. Or we could discontinue SML/NJ altogether. I proposed that the last time 10 years ago. Here are also the canonical meta questions: * What is really required? * What is the simplest way to achieve that? * What can be discontinued? Makarius ___ isabelle-dev mailing list isabelle-...@in.tum.de https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev