Re: [isabelle-dev] More context on Isabelle memory consumption

2011-10-19 Thread Makarius

On Tue, 18 Oct 2011, Thomas Sewell wrote:

My email about PolyML and memory usage has generated some discussion, 
perhaps I should give some more context.


The continuing improvements of Poly/ML statistics add important new 
aspects to the overall performance monitoring of Isabelle performance, 
although this is an ongoing effort for many years already.



The L4.verified project has been stuck at Isabelle-2009-2 for years. Our 
proofs seem to be compatible with newer Isabelle versions, but we're 
running out of (virtual) memory. We've been running PolyML in 32 bit 
mode on most of our machines. As images grow beyond 4GB, we must switch 
to 64 bit mode, which roughly doubles memory use to  8GB, which means 
replacing most of our laptops. Soon 16GB will be standard, I suppose.


Probably yes.  Moore's Law for memory is still working.  5 years ago 4 GB 
was really huge, but now it is not so much, something you can assume for a 
small Laptop.  A normal technical workstation of 2011 could be expected 
to have 16-32 GB.


Isabelle has traditionally targeted well-equipped workstations.  In the 
past few years there was an unusual situation that the difference between 
laptops and servers was dimishied, but the gap is opening again due to 
hardware and marketing reasons (think of Smart phones, iPads, light 
Netbooks etc. opposed to huge cloud facilities).


I don't think it is realistic to follow the path of such mobile devices 
for local execution of the proof develpment environment.  It is imaginable 
to do it remotely, though, say via some proof cloud service.  We have 
had this essentially in the past already: In the classic times of the Bali 
project, for example, it was commonplace to run the actual proof process 
in the background on a big server.



In the meanwhile I've been having a look at Isabelle memory use. I've 
never really understood why Isabelle images consume gigabytes of memory. 
With full proofs off, the only real inputs should be the statements of 
hypotheses and the definitions of types and terms. Isabelle's output is 
many orders of magnitude bigger than these inputs.


You probably mean heap size 1 GB, not image size -- the latter is fully 
shared an usually quite compact.  Taking current hardware, 1 GB is not so 
much.  Empirically, a software system always grows to fill up the 
available space.  The question is what can be done with that, i.e. if it 
is put into proper use or just wasted.  (I am also the one who 
periodically reminds people of Isabelle/HOL bloat, even though it builds 
faster on my machine than ever before in history: 2min 16s on 4 cores with 
maybe 4-8 GB.



Understanding the full details of memory usage is hard, often infeasible. 
After years of Isabelle maintanence and optimization, I have myself 
developed a certain mental model that approximates this complex situation.



Counting mere types and terms of the core logic is not realistic.  The 
system is much more than the logic, even though everything happens to be 
reduced to some core calculus.  And even the logic itself has a lot of not 
immediately visible overhead compared to rather compact source 
representation.  As a rule of thumb the explicit type information is 10 
times bigger than the overall term structure.  You have already 
rediscovered some aspects of that in your recent profiling 4 measurement.



The initial thought was that the record package would be representative. 
The HOL image has grown because it has more in it, but the theorems 
generated in defining a large record stay roughly constant.

...
The record package uses multiple typedefs to simplify the types of some 
terms. In 2009-2 a typedef caused a theory to be checkpointed 8 times, 
in 2011-2 30 times.


This gives us a large collection of hypotheses, all of which are tedious 
to investigate. Please appreciate how valuable empirical data from 
PolyML is. Up until now we've been poking at these issues in the dark.


One more performance figure is good, but many other things have been 
measured over the years, with an ever improving tendency.  So profiling 4 
is just one more step.


Since a couple of years, my morning ritual is to look first at the charts 
that are generated from the isatest runs, e.g. see 
http://isabelle.in.tum.de/devel/stats/at-poly.html


Every tiny unexplained change in performance usually leads to some 
investigation of changesets, often with a private mail exchange with the 
people involved.  Thus we have managed to keep an ever growing behemoth 
manageable, and actually devliver some performance to end-users, who in 
turn produce larger and larger applications to fill up the available space 
and time.



Here is a list of things that can be easily contributed externally to 
continue the continious efforts to manage the system, without postulating 
arbitrary quick changes to the incredibly complex Isabelle/Pure and main 
HOL/Tools:


  * Improve isatest to include additional measurements, say from profiling

Re: [isabelle-dev] More context on Isabelle memory consumption

2011-10-19 Thread Makarius

On Tue, 18 Oct 2011, Thomas Sewell wrote:

We've been running PolyML in 32 bit mode on most of our machines. As 
images grow beyond 4GB, we must switch to 64 bit mode, which roughly 
doubles memory use to  8GB, which means replacing most of our laptops.


The jump from 32bit to 64bit has to be made at some point, and it will 
cost a bit of memory, but not runtime.  64bit is already faster for usual 
practical situations, I always use it by default for 1 year or so.


Another aspect is the ongoing replacement of TTY / Proof General 
interaction with the new Prover IDE.  This will cost further resources, 
and it is unlikely that the result of two big processes (ML and JVM) can 
support a big application on a small 4 GB laptop.  (For me the bottom line 
for small/medium Isabelle/PIDE applications is 2 cores, 4 GB, i.e. typical 
Mac Book Pro from 2 years ago.)



Makarius
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev