Re: [Haskell-cafe] Estimating the time to garbage collect

2009-05-04 Thread Duncan Coutts
On Fri, 2009-05-01 at 09:14 +0100, Neil Davies wrote:
 Hi
 
 With the discussion on threads and priority, and given that (in  
 Stats.c) there are lots of useful pieces of information that the run  
 time system is collecting, some of which is already visible (like the  
 total amount of memory mutated) and it is easy to make other measures  
 available - it has raised this question in my mind:
 
 Given that you have access to that information (the stuff that comes  
 out at the end of a run if you use +RTS -S), is it possible to estimate
 the time a GC will take before asking for one?
 
 Ignoring, at least for the moment, all the issues of paging, processor  
 cache occupancy etc, what are the complexity drivers for the time to GC?
 
 I realise that it is going to depend on things like volume of data
 mutated, count of objects mutated, what fraction of them are live etc  
 - and even if it turns out that these things are very program specific  
 then I have a follow-on question - what properties do you need from  
 your program to be able to construct a viable estimate of GC time from  
 a past history of such garbage collections?

Would looking at statistics suffice? Treat it mostly as a black box.
Measure all the info you can before and after each GC and then use
statistical methods to look for correlations to see if any set of
variables predicts GC time.
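
To make that concrete, here is a minimal sketch of such a measurement
loop. It is an illustration, not 2009-era code: it uses the modern
GHC.Stats API (GHC 8.2 or later, with the program run under +RTS -T),
and the Sample type and correlation helper are invented for the example.

import GHC.Stats (RTSStats (..), GCDetails (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)
import Data.Int (Int64)
import Data.Word (Word64)

-- One observation: predictor variables sampled just before a forced GC,
-- paired with the elapsed time of that GC.
data Sample = Sample
  { liveBytes  :: Word64  -- live data reported at the previous GC
  , allocBytes :: Word64  -- total allocation so far
  , pauseNs    :: Int64   -- elapsed time of the GC we forced
  } deriving Show

-- Force a GC and record (predictors, observed pause).
samplePause :: IO Sample
samplePause = do
  before <- getRTSStats
  performGC
  after  <- getRTSStats
  pure Sample
    { liveBytes  = gcdetails_live_bytes (gc before)
    , allocBytes = allocated_bytes before
    , pauseNs    = gcdetails_elapsed_ns (gc after)
    }

-- Pearson correlation between one predictor and the observed pauses;
-- values near +/-1 suggest the predictor is worth keeping in a model.
correlation :: [Double] -> [Double] -> Double
correlation xs ys = cov / (sdx * sdy)
  where
    n   = fromIntegral (length xs)
    mx  = sum xs / n
    my  = sum ys / n
    cov = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys) / n
    sdx = sqrt (sum [(x - mx) ^ (2 :: Int) | x <- xs] / n)
    sdy = sqrt (sum [(y - my) ^ (2 :: Int) | y <- ys] / n)

main :: IO ()
main = do
  ok <- getRTSStatsEnabled
  if ok
    then samplePause >>= print
    else putStrLn "rerun with +RTS -T to enable RTS statistics"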

Duncan



Re: [Haskell-cafe] Estimating the time to garbage collect

2009-05-04 Thread Neil Davies

Duncan

That was my first thought - but what I'm looking for is some
confirmation from those who know better that treating the GC as a
'statistical source' is a valid hypothesis. If its timing is 'random' in
the sense of having a stable, characterisable distribution, that's fine;
if it is non-deterministic with no such distribution, that's not fine.


So, GC experts: are there any hints you can give me? Are there any
papers that cover this timing aspect? And are there any corner cases
that might make the statistical approach risky (or, at worst, invalid)?


I don't want to have to build a stochastic model of the GC, if I can  
help it!


Neil



On 4 May 2009, at 12:51, Duncan Coutts wrote:


On Fri, 2009-05-01 at 09:14 +0100, Neil Davies wrote:

Hi

With the discussion on threads and priority, and given that (in
Stats.c) there are lots of useful pieces of information that the run
time system is collecting, some of which is already visible (like the
total amount of memory mutated) and it is easy to make other measures
available - it has raised this question in my mind:

Given that you have access to that information (the stuff that comes
out at the end of a run if you use +RTS -S), is it possible to
estimate the time a GC will take before asking for one?

Ignoring, at least for the moment, all the issues of paging, processor
cache occupancy etc, what are the complexity drivers for the time to
GC?

I realise that it is going to depend on things like volume of data
mutated, count of objects mutated, what fraction of them are live etc
- and even if it turns out that these things are very program specific
then I have a follow-on question - what properties do you need from
your program to be able to construct a viable estimate of GC time from
a past history of such garbage collections?

Would looking at statistics suffice? Treat it mostly as a black box.
Measure all the info you can before and after each GC and then use
statistical methods to look for correlations to see if any set of
variables predicts GC time.

Duncan





Re: [Haskell-cafe] Estimating the time to garbage collect

2009-05-04 Thread Duncan Coutts
On Mon, 2009-05-04 at 15:05 +0100, Neil Davies wrote:
 Duncan
 
 That was my first thought - but what I'm looking for is some
 confirmation from those who know better that treating the GC as a
 'statistical source' is a valid hypothesis. If its timing is 'random'
 in the sense of having a stable, characterisable distribution, that's
 fine; if it is non-deterministic with no such distribution, that's not
 fine.

 So, GC experts: are there any hints you can give me? Are there any
 papers that cover this timing aspect? And are there any corner cases
 that might make the statistical approach risky (or, at worst, invalid)?

I suggest you repost this question on the ghc users mailing list as
-cafe is a bit high volume sometimes for the ghc hackers to keep up
with.

Duncan



Re: [Haskell-cafe] Estimating the time to garbage collect

2009-05-02 Thread Neil Davies
Yes, you've got the problem domain. I don't have to deliver responses
to stimuli within a bound all the time, but I do need to be able to
attach a probability to meeting that bound.
That problem domain is everywhere - all that varies is the bound on the
time and the probability of meeting it.

'Hard real time' systems are very expensive to build and, typically,
make very low utilisation of resources, and they have interesting
failure modes when timing constraints stop being met. Meeting strict
timing constraints is becoming more difficult as processors become more
complex (think multi-level caching, clock rates that vary with
temperature and/or load) and when those systems use packet-based
multiplexing as their interconnect (a time-slotted shared bus being too
expensive).

Yes, the proof obligations are more challenging: no more ability to
enumerate the complete state space and prove that the schedule can
always be met, no more 'certainty' that events and communications will
occur within a fixed time. Interestingly, giving up that constraint may
well have its up-side: it was being used as a design 'crutch', and
possibly being leaned on too heavily. Having to explicitly consider a
probability distribution appears to create more robust overall systems.

On the flip side, this more stochastic approach has to work - the
commercial trends in wide area networking mean things are getting more
stochastic, and deterministic timings for wide area communications will
be a thing of the past in 10-15 years (or prohibitively expensive). This
is already worrying people like the electricity distribution folks -
their control systems are looking vulnerable to such changes, and the
issue of co-ordinating electricity grids is only going to get more
difficult as the number of generation sources increases, as is
inevitable.

Perhaps this is too much for a Saturday morning, a sunny one at that.

Neil

2009/5/1 John Van Enk vane...@gmail.com

 I think the problem becomes slightly easier if you can provide an upper
 bound on the time GC will take. If I understand your problem domain,
 Neil, you're most concerned with holding up other processes/partitions
 that are expecting to have a certain amount of processing time per
 frame. If we can give an upper bound to the GC time, then we can plan
 for it in the schedule without upsetting the other processes.

 I don't have an answer (though I'd love one), but I do think that
 asking for an upper bound substantially simplifies the problem (though,
 I could be wrong) and still gives you the characteristics you need to
 give a 'time to complete'.
  /jve
 On Fri, May 1, 2009 at 4:14 AM, Neil Davies 
 semanticphilosop...@googlemail.com wrote:

 Hi

 With the discussion on threads and priority, and given that (in Stats.c)
 there are lots of useful pieces of information that the run time system is
 collecting, some of which is already visible (like the total amount of
 memory mutated) and it is easy to make other measures available - it has
 raised this question in my mind:

 Given that you have access to that information (the stuff that comes out
 at the end of a run if you use +RTS -S), is it possible to estimate the time
 a GC will take before asking for one?

 Ignoring, at least for the moment, all the issues of paging, processor
 cache occupancy etc, what are the complexity drivers for the time to GC?

 I realise that it is going to depend on things like volume of data
 mutated, count of objects mutated, what fraction of them are live etc - and
 even if it turns out that these things are very program specific then I have
 a follow-on question - what properties do you need from your program to be
 able to construct a viable estimate of GC time from a past history of such
 garbage collections?

 Why am I interested? There are all manner of 'real time' requirements
 in systems; there is a vast class where a statistical bound (i.e. some
 sort of 'time to complete' CDF) is more than adequate for production
 use. If this is possible then it opens up areas where all the lovely
 properties of Haskell can be exploited, if only you had confidence in
 the timing behaviour.

 Cheers
 Neil




 --
 /jve



[Haskell-cafe] Estimating the time to garbage collect

2009-05-01 Thread Neil Davies

Hi

With the discussion on threads and priority, and given that (in  
Stats.c) there are lots of useful pieces of information that the run  
time system is collecting, some of which is already visible (like the  
total amount of memory mutated) and it is easy to make other measures  
available - it has raised this question in my mind:


Given that you have access to that information (the stuff that comes  
out at the end of a run if you use +RTS -S), is it possible to estimate
the time a GC will take before asking for one?


Ignoring, at least for the moment, all the issues of paging, processor  
cache occupancy etc, what are the complexity drivers for the time to GC?


I realise that it is going to depend on things like volume of data
mutated, count of objects mutated, what fraction of them are live etc  
- and even if it turns out that these things are very program specific  
then I have a follow-on question - what properties do you need from  
your program to be able to construct a viable estimate of GC time from  
a past history of such garbage collections?


Why am I interested? There are all manner of 'real time' requirements
in systems; there is a vast class where a statistical bound (i.e. some
sort of 'time to complete' CDF) is more than adequate for production
use. If this is possible then it opens up areas where all the lovely
properties of Haskell can be exploited, if only you had confidence in
the timing behaviour.
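
As a sketch of the kind of bound I mean (illustrative only: quantile is
not from any library, and it assumes a non-empty history of measured
pauses in a consistent unit):

import Data.List (sort)

-- Empirical quantile of past GC pause times: the smallest observed
-- pause p such that at least a fraction q of the samples were <= p.
-- Reading this off for each q is the 'time to complete' CDF.
quantile :: Double -> [Double] -> Double
quantile q history = sorted !! idx
  where
    sorted = sort history
    n      = length sorted
    idx    = min (n - 1) (max 0 (ceiling (q * fromIntegral n) - 1))

-- e.g. quantile 0.99 pauses gives a bound that 99% of past GCs met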


Cheers
Neil


Re: [Haskell-cafe] Estimating the time to garbage collect

2009-05-01 Thread John Van Enk
I think the problem becomes slightly easier if you can provide an upper
bound on the time GC will take. If I understand your problem domain,
Neil, you're most concerned with holding up other processes/partitions
that are expecting to have a certain amount of processing time per
frame. If we can give an upper bound to the GC time, then we can plan
for it in the schedule without upsetting the other processes.

I don't have an answer (though I'd love one), but I do think that asking
for an upper bound substantially simplifies the problem (though, I could
be wrong) and still gives you the characteristics you need to give a
'time to complete'.
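
A sketch of how such a bound would be used in a frame schedule
(illustrative only: the Frame type and the 5 ms figure are invented for
the example; performGC is the ordinary forced collection from
System.Mem):

import System.Mem (performGC)

-- Time already consumed in the current frame and the total allotted.
data Frame = Frame
  { frameBudgetMs :: Double
  , frameUsedMs   :: Double
  }

-- Assumed worst-case GC pause, e.g. established by prior measurement.
gcUpperBoundMs :: Double
gcUpperBoundMs = 5.0

-- Collect eagerly, but only when the pause provably fits in the slack
-- remaining in this frame; otherwise defer and try again next frame.
maybeCollect :: Frame -> IO ()
maybeCollect f
  | frameBudgetMs f - frameUsedMs f >= gcUpperBoundMs = performGC
  | otherwise = pure ()
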
/jve
On Fri, May 1, 2009 at 4:14 AM, Neil Davies 
semanticphilosop...@googlemail.com wrote:

 Hi

 With the discussion on threads and priority, and given that (in Stats.c)
 there are lots of useful pieces of information that the run time system is
 collecting, some of which is already visible (like the total amount of
 memory mutated) and it is easy to make other measures available - it has
 raised this question in my mind:

 Given that you have access to that information (the stuff that comes out at
 the end of a run if you use +RTS -S), is it possible to estimate the time a
 GC will take before asking for one?

 Ignoring, at least for the moment, all the issues of paging, processor
 cache occupancy etc, what are the complexity drivers for the time to GC?

 I realise that it is going to depend on things like volume of data
 mutated, count of objects mutated, what fraction of them are live etc - and
 even if it turns out that these things are very program specific then I have
 a follow-on question - what properties do you need from your program to be
 able to construct a viable estimate of GC time from a past history of such
 garbage collections?

 Why am I interested? There are all manner of 'real time' requirements
 in systems; there is a vast class where a statistical bound (i.e. some
 sort of 'time to complete' CDF) is more than adequate for production
 use. If this is possible then it opens up areas where all the lovely
 properties of Haskell can be exploited, if only you had confidence in
 the timing behaviour.

 Cheers
 Neil




-- 
/jve
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe