Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-02-13 Thread Lars Hupel
> I have now done this in Isabelle/705d4c4003ea.
> 
> It means that ISABELLE_FULL_TEST no longer occurs in the main Isabelle
> repository. The extra slow sessions are in ~~/src/Benchmarks, and only
> tested by special background jobs that take care of that. (Presently none!)

Thanks. I will add a job to the Jenkins during the next maintenance
window (some time next week).

With the current state of the instrumentation, I need to shut down
Jenkins for most kinds of changes. Unfortunately that means I can't
deploy a benchmark job right now.

(This will also change during the next maintenance window.)

Cheers
Lars
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-02-03 Thread Makarius

On Wed, 3 Feb 2016, Jasmin Blanchette wrote:

I have a slight worry concerning the characterization and naming as 
"benchmarks". From what I understand, a "benchmark" is a theory file 
that tests the performance of the system.


I have introduced this terminology many years ago.  The idea is that we 
always do a proper test of all tools, libraries, examples, either via 
old-school "isabelle build -a" or some immediate test service (formerly 
"testboard").


The things that are run in addition as "nightly build" are slow (and thus 
getting in the way), but are not relevant for actual testing.  Since the 
main purpose of nightly builds is to collect reliably performance figures, 
the outcome of these sessions is characterized as "benchmarks".


Since the measurements of isatest have become very inaccurate in recent 
years, this key purpose the background tests has been forgotten.  We are 
flying blind, especially for AFP.  Good performance records are important, 
especially due to the trend to shrink consumer devices more and more.



I don't mind if there is a more fitting name for
~~/src/Additional_Examples_That_Are_Often_Slow_But_Not_Really_Relevant
instead of ~~/src/Benchmarks.

Getting good performance figures back -- stored over in the long term -- 
is really important, though.




Special setup:
Sudoku


What is special about Sudoku?  The relevant changeset is this:

changeset:   58331:054e9a9fccad
user:blanchet
date:Fri Sep 12 17:51:31 2014 +0200
files:   src/HOL/ROOT
description:
enabled 'Sudoku' only with 'ISABELLE_FULL_TEST' -- Sudoku is fast enough 
on modern hardware (within seconds on my MacBook), but it seems to fail on 
older test machines




SMT_Tests (requires Z3)


The relevant changeset is this:

changeset:   50666:6f48853f08d5
user:blanchet
date:Wed Jan 02 09:31:25 2013 +0100
files:   src/HOL/ROOT src/HOL/SMT_Examples/SMT_Examples.thy 
src/HOL/SMT_Examples/SMT_Tests.certs src/HOL/SMT_Examples/SMT_Tests.thy

description:
actually run Z3 for "SMT_Tests" when "ISABELLE_FULL_TEST" is enabled


In 2013 we did not have Z3 as default component, usable for everybody. Now 
the condition on it is always true -- z3 is always enabled.


The theory takes 40s elapsed time on my machine.  That could qualify it as 
optional "benchmark", but I don't mind to have it by default.  We have 
other much bigger things all the time.




Slow:
 Brackin
 IsaFoR
 Misc_N2M

Probably also in the slow category (the last three might have a benchmarking 
component):
 Find_Unused_Assms_Examples
 Needham_Schroeder_No_Attacker_Example
 Needham_Schroeder_Guided_Attacker_Example
 Needham_Schroeder_Unguided_Attacker_Example

Slow and benchmarking:
Record_Benchmark


These are the genuine examples where I understand the arrangement as "test 
this continuously in the background, and record good performance figures 
for it".



Makarius

___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-02-03 Thread Jasmin Blanchette
> What is special about Sudoku?

You're right, probably nothing.

>> SMT_Tests (requires Z3)
> ...
> In 2013 we did not have Z3 as default component, usable for everybody. Now 
> the condition on it is always true -- z3 is always enabled.

We could indeed enable it by default.

>> Slow:
>> Brackin
>> IsaFoR
>> Misc_N2M
>> 
>> Probably also in the slow category (the last three might have a benchmarking 
>> component):
>> Find_Unused_Assms_Examples
>> Needham_Schroeder_No_Attacker_Example
>> Needham_Schroeder_Guided_Attacker_Example
>> Needham_Schroeder_Unguided_Attacker_Example
>> 
>> Slow and benchmarking:
>> Record_Benchmark
> 
> These are the genuine examples where I understand the arrangement as "test 
> this continuously in the background, and record good performance figures for 
> it".

I guess in that sense they are "Benchmarks". For "Brackin" etc., we were just 
interested in checking that they work at all -- and took them out when we saw 
that they were slow. But they do serve a dual purpose, esp. that "working at 
all" implies "not being *too* slow". So I won't object anymore to their 
characterization as "benchmarks".

Jasmin

___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-02-02 Thread Florian Haftmann
>> What remains are various benchmarks.  We could move these sessions to
>> ~~/src/Benchmarks and omit that directory in ~~/ROOTS.  Thus it can be
>> included on demand like this:
>>
>>   isabelle build -d '~~/src/Benchmarks' -a
>>
>> or like this:
>>
>>   isabelle build -D '~~/src/Benchmarks'
>>
>> This avoids more sophisticated Boolean algebra on session groups.  It
>> also makes clear from the source structure, what is normally not tested.
>>
>> I basically do the same with -d '$AFP' all the time.
> 
> Sounds reasonable. There's no pressure to do it just yet; I'd be happy
> to wait for post-release mode.

I also like that.

FYI in my test setup there is currently a particular isabelle tool with
the following content:

> #!/usr/bin/env bash
> #
> # DESCRIPTION: pragmatic combined build of distribution and afp
> 
> ISABELLE_FULL_TEST=true "${ISABELLE_TOOL}" build -d "${AFP}" -x Flyspeck-Tame 
> "$@"
> ISABELLE_FULL_TEST= "${ISABELLE_TOOL}" build -d "${AFP}" Flyspeck-Tame

Maybe that distinction could also be tackled by an appropriate session
structure and / or grouping.

Florian

-- 

PGP available:
http://isabelle.in.tum.de/~haftmann/pgp/florian_haftmann_at_informatik_tu_muenchen_de



signature.asc
Description: OpenPGP digital signature
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-02-02 Thread Lars Hupel
> What remains are various benchmarks.  We could move these sessions to
> ~~/src/Benchmarks and omit that directory in ~~/ROOTS.  Thus it can be
> included on demand like this:
> 
>   isabelle build -d '~~/src/Benchmarks' -a
> 
> or like this:
> 
>   isabelle build -D '~~/src/Benchmarks'
> 
> This avoids more sophisticated Boolean algebra on session groups.  It
> also makes clear from the source structure, what is normally not tested.
> 
> I basically do the same with -d '$AFP' all the time.

Sounds reasonable. There's no pressure to do it just yet; I'd be happy
to wait for post-release mode.

Cheers
Lars
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-01-31 Thread Makarius

On Sun, 31 Jan 2016, Lars Hupel wrote:


The "ISABELLE_FULL_TEST" variable and "slow" group are somehow adhoc
measures to get meaningful interactive tests, without wasting too much
time.  It probably makes sense to imitate that for "testboard" setups.


The idea is to distribute the build load over multiple machines, since 
that isn't supported by "isabelle build". This is generally helpful – 
even if we get better hardware – because distributing jobs is trivial 
with Jenkins and there's no reason we shouldn't. The shorter the build 
time, the better.


Some years ago we were in a similar situation of outdated hardware, and 
Isabelle + AFP sessions growing beyond feasibility.  So we started to 
speculate towards distributed "make".


When Isabelle "build" took over in 2012, the multicore support of Poly/ML
+ Isabelle/ML was already so advanced that the question was irrelevant.
A full test of Isabelle + AFP required less than 1h on a plain and basic
2 * 4 core machine (macbroy2).

Now we have much bigger Isabelle + AFP sessions, and still the same old 
hardware.



Clustering weak nodes always requires more administrative efforts than 
just one or two fat nodes without special tricks.




Building with SML/NJ configuration is trivial. It would be "just another 
job" in Jenkins. I'd just need to know about what to build precisely. 
Can you explain to me what the ML_SYSTEM_POLYML variable means?


I full hypersearch over Isabelle + AFP confirms that the documentation in 
the "system" manual is still correct:


  \<^descr>[@{setting_def ML_SYSTEM_POLYML}\\<^sup>*\] is
  \<^verbatim>\true\ for @{setting ML_SYSTEM} values derived
  from Poly/ML, as opposed to SML/NJ where it is empty. This is
  particularly useful with the build option @{system_option condition}
  (\secref{sec:system-options}) to restrict big sessions to something that
  SML/NJ can still handle.


Concerning what "SML/NJ can still handle", see also 
https://bitbucket.org/isabelle_project/isabelle-release/commits/a4e6ea45f416


That commit is only on the isabelle-release repository.  It will come back 
via merge in something like 2 weeks.



Makarius
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-01-31 Thread Lawrence Paulson
Maybe it’s time. They’ve not made it better, and it seems pretty hopeless now.
Larry

> On 31 Jan 2016, at 13:41, Makarius  wrote:
> 
> We could discontinue ML_SYSTEM_POLYML and merely test the "HOL" session for 
> SML/NJ.
> 
> Or we could discontinue SML/NJ altogether.  I proposed that the last time 10 
> years ago.

___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-01-31 Thread Lars Hupel
> The "ISABELLE_FULL_TEST" variable and "slow" group are somehow adhoc
> measures to get meaningful interactive tests, without wasting too much
> time.  It probably makes sense to imitate that for "testboard" setups.

The idea is to distribute the build load over multiple machines, since
that isn't supported by "isabelle build". This is generally helpful –
even if we get better hardware – because distributing jobs is trivial
with Jenkins and there's no reason we shouldn't. The shorter the build
time, the better.

> For regular batch tests in the style of "isatest", the main purpose is
> to accumulate accurate timing information, and preserve that in a simple
> format over a long time.  This has been forgotten in recent years, since
> measurements have become more and more messed up.  So performance-wise, we
> are flying blind, especially for AFP.

I can't yet say with confidence that we now have a setup with reliable
performance numbers. In a couple of days I will look at the numbers to
get an idea of their variance.

> A related question is what to do with SML/NJ and the ML_SYSTEM_POLYML
> variable.  For the Isabelle2016 release, I've tried to build everything
> with SML/NJ as usual, but many sessions failed, even just HOL-Library.
> 
> We could discontinue ML_SYSTEM_POLYML and merely test the "HOL" session
> for SML/NJ.
> 
> Or we could discontinue SML/NJ altogether.  I proposed that the last
> time 10 years ago.

Building with SML/NJ configuration is trivial. It would be "just another
job" in Jenkins. I'd just need to know about what to build precisely.
Can you explain to me what the ML_SYSTEM_POLYML variable means?

Cheers
Lars
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-01-31 Thread Makarius

On Sun, 31 Jan 2016, Lars Hupel wrote:


unexpected exception (bug?) in SML/NJ: Io [Io: openIn failed on
"/build/smlnj-tOOpSy/smlnj-110.76/sml.boot.x86-unix/smlnj/basis/.cm/x86-unix/basis.cm",
No such file or directory]
 raised at: Basis/Implementation/IO/bin-io-fn.sml:617.25-617.71
../cm/util/safeio.sml:30.11
../compiler/TopLevel/interact/evalloop.sml:44.55

Settings are:

ML_SYSTEM=smlnj-110
ML_HOME="/usr/lib/smlnj/bin"


/usr/lib probably means that this is a "package" on the target system, and 
thus it is usually broken by default.


I've never used anything else than a manual build of SML/NJ from the 
official sources.  It usually works out of the box, unchanged for many 
decades.




Interestingly enough, "isabelle build" doesn't complain. The error only
becomes apparent in the log file.


Total existence failure of the outermost ML executable can lead to 
problems with the Isabelle wrapper scripts.  The same can happen for 
Poly/ML under certain circumstances.


We are normally testing Isabelle applications, not ML installations.


Nonetheless, I hope to make this more robust at some point, by replacing 
the shell scripts with Isabelle/Scala.  Not having SML/NJ anymore could 
accelerate that move.



Makarius

___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-01-31 Thread Lars Hupel
> Building with SML/NJ configuration is trivial. It would be "just another
> job" in Jenkins. I'd just need to know about what to build precisely.
> Can you explain to me what the ML_SYSTEM_POLYML variable means?

I was speaking too early. Here's what I get when trying to build Pure
with SML/NJ:


Standard ML of New Jersey v110.76 [built: Sun Jun 29 03:29:51 2014]
!* unable to process `' (unknown extension `')
- [autoloading]

unexpected exception (bug?) in SML/NJ: Io [Io: openIn failed on
"/build/smlnj-tOOpSy/smlnj-110.76/sml.boot.x86-unix/smlnj/basis/.cm/x86-unix/basis.cm",
No such file or directory]
  raised at: Basis/Implementation/IO/bin-io-fn.sml:617.25-617.71
 ../cm/util/safeio.sml:30.11
 ../compiler/TopLevel/interact/evalloop.sml:44.55



Settings are:

ML_SYSTEM=smlnj-110
ML_HOME="/usr/lib/smlnj/bin"
ML_OPTIONS="@SMLdebug=/dev/null @SMLalloc=1024"
ML_PLATFORM=$(eval $("$ML_HOME/.arch-n-opsys" 2>/dev/null); echo
"$HEAP_SUFFIX")


Interestingly enough, "isabelle build" doesn't complain. The error only
becomes apparent in the log file.

Cheers
Lars
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


[isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-01-31 Thread Lars Hupel
As of 527488dc8b90, there are five sessions which contain theories that
are only processed under ISABELLE_FULL_TEST=true:

HOL-ex
Sudoku
HOL-Datatype_Examples
Brackin
IsaFoR
Misc_N2M
HOL-Word-SMT_Examples
SMT_Tests
HOL-Quickcheck_Benchmark
Find_Unused_Assms_Examples
Needham_Schroeder_No_Attacker_Example
Needham_Schroeder_Guided_Attacker_Example
Needham_Schroeder_Unguided_Attacker_Example
HOL-Record_Benchmark
Record_Benchmark

Most of these appear to be benchmarks of some sort. Currently, the new
CI setup doesn't set the ISABELLE_FULL_TEST flag. I think it would be
useful to run those, but I'd like to split them up into a separate run
so that they can be executed in parallel to the regular makeall. To that
end, I would suggest grouping these sessions in 'full'. I could then run

  export ISABELLE_FULL_TEST=true
  isabelle build -v -g full

(In a similar fashion, I run 'slow' sessions separately from the others.)

If nobody complains, I'll add the group.

Cheers
Lars
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Grouping ISABELLE_FULL_TEST?

2016-01-31 Thread Makarius

On Sun, 31 Jan 2016, Lars Hupel wrote:


As of 527488dc8b90, there are five sessions which contain theories that
are only processed under ISABELLE_FULL_TEST=true:

HOL-ex
   Sudoku
HOL-Datatype_Examples
   Brackin
   IsaFoR
   Misc_N2M
HOL-Word-SMT_Examples
   SMT_Tests
HOL-Quickcheck_Benchmark
   Find_Unused_Assms_Examples
   Needham_Schroeder_No_Attacker_Example
   Needham_Schroeder_Guided_Attacker_Example
   Needham_Schroeder_Unguided_Attacker_Example
HOL-Record_Benchmark
   Record_Benchmark

Most of these appear to be benchmarks of some sort. Currently, the new
CI setup doesn't set the ISABELLE_FULL_TEST flag. I think it would be
useful to run those, but I'd like to split them up into a separate run
so that they can be executed in parallel to the regular makeall. To that
end, I would suggest grouping these sessions in 'full'. I could then run

 export ISABELLE_FULL_TEST=true
 isabelle build -v -g full

(In a similar fashion, I run 'slow' sessions separately from the others.)


We have accumulated a fair amount of complication here. There was a time 
when it was even more complication, but we managed to recover from it by 
substantial improvements of Poly/ML (multithreading and parallel GC), 
elimination of "make", and introduction of up-to-date hardware.



The "ISABELLE_FULL_TEST" variable and "slow" group are somehow adhoc 
measures to get meaningful interactive tests, without wasting too much 
time.  It probably makes sense to imitate that for "testboard" setups.


For regular batch tests in the style of "isatest", the main purpose is to 
accumulate accurate timing information, and preserve that in a simple 
format over a long time.  This has been forgotten in recent years, since 
measurements have become more and more messed up.  So performance-wise, we

are flying blind, especially for AFP.


A related question is what to do with SML/NJ and the ML_SYSTEM_POLYML 
variable.  For the Isabelle2016 release, I've tried to build everything 
with SML/NJ as usual, but many sessions failed, even just HOL-Library.


We could discontinue ML_SYSTEM_POLYML and merely test the "HOL" session 
for SML/NJ.


Or we could discontinue SML/NJ altogether.  I proposed that the last time 
10 years ago.



Here are also the canonical meta questions:

  * What is really required?

  * What is the simplest way to achieve that?

  * What can be discontinued?


Makarius
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev