Re: [Haskell-cafe] Estimating the time to garbage collect
On Fri, 2009-05-01 at 09:14 +0100, Neil Davies wrote:

> Given that you have access to that information (the stuff that comes
> out at the end of a run if you use +RTS -S), is it possible to
> estimate the time a GC will take before asking for one? [snip]

Would looking at statistics suffice? Treat it mostly as a black box.
Measure all the info you can before and after each GC, and then use
statistical methods to look for correlations - to see if any set of
variables predicts GC time.

Duncan

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
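Duncan's black-box recipe can be sketched in a few lines. Note that this
uses the modern GHC.Stats API (base >= 4.10), which postdates this
thread, and the program must be started with +RTS -T for the RTS to
collect statistics at all; the record fields below are from today's
RTSStats/GCDetails, not the 2009 Stats.c internals.

```haskell
import Control.Monad (forM_, replicateM)
import Data.Int (Int64)
import Data.Word (Word64)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- Force a collection and record one (predictor, response) pair:
-- live bytes found by the GC, and the elapsed GC time it cost us.
sampleGC :: IO (Word64, Int64)
sampleGC = do
  before <- getRTSStats
  performGC
  after  <- getRTSStats
  pure ( gcdetails_live_bytes (gc after)               -- live data after this GC
       , gc_elapsed_ns after - gc_elapsed_ns before )  -- pause cost, in ns

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "re-run with +RTS -T to enable RTS statistics"
    else do
      samples <- replicateM 5 sampleGC
      forM_ samples $ \(live, ns) ->
        putStrLn (show live ++ " live bytes, " ++ show ns ++ " ns pause")
```

Each run of this yields (live bytes, pause) pairs - exactly the kind of
history one would feed into a regression or correlation analysis.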
Re: [Haskell-cafe] Estimating the time to garbage collect
Duncan

That was my first thought - but what I'm looking for is some
confirmation from those who know better that treating the GC as a
'statistical source' is a valid hypothesis. If the thing is 'random',
that's fine - if its timing is non-deterministic, that's not fine.

So, GC experts: are there any hints you can give me? Are there any
papers that cover this timing aspect? And are there any corner cases
that might make the statistical approach risky (or at worst invalid)?
I don't want to have to build a stochastic model of the GC if I can
help it!

Neil

On 4 May 2009, at 12:51, Duncan Coutts wrote:

> [snip]
>
> Would looking at statistics suffice? Treat it mostly as a black box.
> Measure all the info you can before and after each GC, and then use
> statistical methods to look for correlations - to see if any set of
> variables predicts GC time.
> Duncan
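The "look for correlations" step Duncan describes could start as simply
as an ordinary least-squares fit of pause time against one candidate
predictor (live bytes, say). A minimal sketch - the function names and
the choice of predictor are illustrative, not from any existing library:

```haskell
-- Ordinary least squares for one predictor: fit pause = a + b * x over
-- past (predictor, pause) samples, e.g. x = live bytes, pause in ms.
-- Returns (intercept, slope); assumes at least two distinct x values.
fitLine :: [(Double, Double)] -> (Double, Double)
fitLine samples = (a, b)
  where
    n  = fromIntegral (length samples)
    xs = map fst samples
    ys = map snd samples
    mx = sum xs / n
    my = sum ys / n
    b  = sum [ (x - mx) * (y - my) | (x, y) <- samples ]
       / sum [ (x - mx) ^ (2 :: Int) | x <- xs ]
    a  = my - b * mx

-- Predicted pause for the next collection, given the fitted line.
predictPause :: (Double, Double) -> Double -> Double
predictPause (a, b) x = a + b * x
```

Whether a single predictor suffices is exactly the open question in this
thread; the same shape extends to multiple regression over all the
variables the RTS exposes.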
Re: [Haskell-cafe] Estimating the time to garbage collect
On Mon, 2009-05-04 at 15:05 +0100, Neil Davies wrote:

> That was my first thought - but what I'm looking for is some
> confirmation from those who know better that treating the GC as a
> 'statistical source' is a valid hypothesis. [snip]

I suggest you repost this question on the ghc users mailing list, as
-cafe is a bit high volume sometimes for the ghc hackers to keep up
with.

Duncan
Re: [Haskell-cafe] Estimating the time to garbage collect
Yes, you've got the problem domain. I don't have to deliver responses
to stimuli within a bound every time, but I do need to supply some
probability of meeting that bound. That problem domain is everywhere -
all that varies is the bound on the time and the probability of meeting
it.

'Hard real time' systems are very expensive to build, typically make
very low utilisation of resources, and have interesting failure modes
when timing constraints stop being met. Meeting strict timing
constraints is becoming more difficult as processors become more
complex (think multi-level caching, clock rates that vary with
temperature and/or load) and as those systems use packet-based
multiplexing as their interconnect (a time-slotted shared bus being too
expensive). Yes, the proof obligations are more challenging: no more
ability to enumerate the complete state space and prove that the
schedule can always be met, no more 'certainty' that events and
communications will occur within a fixed time.

Interestingly, giving up that constraint may well have its up-side: it
was being used as a design 'crutch', and possibly being leaned on too
heavily. Having to explicitly consider a probability distribution
appears to create more robust overall systems.

On the flip side, this more stochastic approach has to work - the
commercial trends in wide-area networking mean things are getting more
stochastic, and deterministic timings for wide-area communications will
be a thing of the past in 10-15 years (or prohibitively expensive).
This is already worrying people like the electricity distribution folks
- their control systems are looking vulnerable to such changes, and the
issue of co-ordinating electricity grids is only going to get more
difficult as the number of generation sources increases, as is
inevitable.
Perhaps this is too much for a Saturday morning - a sunny one at that.

Neil

2009/5/1 John Van Enk vane...@gmail.com:

> I think the problem becomes slightly easier if you can provide an
> upper bound on the time GC will take. [snip]
>
> I don't have an answer (though I'd love one), but I do think that
> asking for an upper bound substantially simplifies the problem
> (though, I could be wrong) and still gives you the characteristics
> you need to give a 'time to complete'.
>
> /jve
[Haskell-cafe] Estimating the time to garbage collect
Hi

With the discussion on threads and priority, and given that (in
Stats.c) there are lots of useful pieces of information that the
runtime system is collecting - some of which is already visible (like
the total amount of memory mutated), and it is easy to make other
measures available - a question has been raised in my mind:

Given that you have access to that information (the stuff that comes
out at the end of a run if you use +RTS -S), is it possible to estimate
the time a GC will take before asking for one? Ignoring, at least for
the moment, all the issues of paging, processor cache occupancy etc,
what are the complexity drivers for the time to GC?

I realise that it is going to depend on things like the volume of data
mutated, the count of objects mutated, what fraction of them are live,
etc - and even if it turns out that these things are very program
specific, I have a follow-on question: what properties do you need from
your program to be able to construct a viable estimate of GC time from
a past history of such garbage collections?

Why am I interested? There are all manner of 'real time' requirements
in systems, and there is a vast class where a statistical bound (i.e.
some sort of 'time to complete' CDF) is more than adequate for
production use. If this is possible then it opens up areas where all
the lovely properties of Haskell can be exploited - if only you had
confidence in the timing behaviour.

Cheers
Neil
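The 'time to complete' CDF mentioned above can be read straight off a
history of observed pauses, with no model of the GC at all. A minimal
empirical-quantile sketch (the name pauseQuantile is illustrative):

```haskell
import Data.List (sort)

-- Empirical q-quantile of observed GC pauses: the smallest observed
-- pause p such that at least a fraction q of the samples were <= p.
-- E.g. pauseQuantile 0.95 history reads as "95% of past collections
-- finished within this time" - the statistical bound discussed above.
-- Assumes a non-empty sample list.
pauseQuantile :: Double -> [Double] -> Double
pauseQuantile q pauses = sorted !! idx
  where
    sorted = sort pauses
    n      = length pauses
    idx    = min (n - 1) (max 0 (ceiling (q * fromIntegral n) - 1))
```

Whether the past history actually predicts the future here is exactly
the validity question raised later in the thread - the quantile is only
meaningful if the pause distribution is stationary for the program.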
Re: [Haskell-cafe] Estimating the time to garbage collect
I think the problem becomes slightly easier if you can provide an upper
bound on the time GC will take.

If I understand your problem domain, Neil, you're most concerned with
holding up other processes/partitions who are expecting to have a
certain amount of processing time per frame. If we can give an upper
bound to the GC time, then we can plan for it in the schedule without
upsetting the other processes.

I don't have an answer (though I'd love one), but I do think that
asking for an upper bound substantially simplifies the problem (though,
I could be wrong) and still gives you the characteristics you need to
give a 'time to complete'.

/jve

On Fri, May 1, 2009 at 4:14 AM, Neil Davies
semanticphilosop...@googlemail.com wrote:

> Given that you have access to that information (the stuff that comes
> out at the end of a run if you use +RTS -S), is it possible to
> estimate the time a GC will take before asking for one? [snip]

--
/jve
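If such an upper bound existed, the per-frame planning John describes
reduces to a budget check. A toy sketch - the name, parameters, and the
margin are all illustrative, not from any real scheduler:

```haskell
-- Decide whether to force a collection now: only when the slack left
-- in the current frame covers the worst-case pause plus a safety
-- margin, so the other partitions still get their processing time.
canCollectNow :: Double  -- slack left in this frame (ms)
              -> Double  -- upper bound on a GC pause (ms)
              -> Double  -- safety margin (ms)
              -> Bool
canCollectNow slack bound margin = slack >= bound + margin
```

With only a statistical bound (a quantile rather than a hard maximum),
the same check still works, but the schedule is met with a probability
rather than with certainty - which is the trade-off this thread is
circling around.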