Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Ganesh Sittampalam wrote:

You see this as a requirement that can be discharged by adding the ACIO concept; I see it as a requirement that should be communicated in the type. Another way of looking at it is that Data.Unique has associated with it some context in which Unique values are safely comparable. You want that context to always be the top-level/RTS scope; I would like defining that context to be part of the API.

But why pick on Data.Unique as special? Because I just happened to have pointed out it uses a global variable? If you didn't know this I suspect this issue just wouldn't be an issue at all. Why haven't you raised a ticket complaining about its API having the wrong type sigs? :-)

There's shed loads of information and semantic subtleties about pretty much any operation you care to think of in the IO monad that isn't communicated by its type. All you know for sure is that it's weird, because if it wasn't it wouldn't be in the IO monad. So I think you're applying double standards.

We have to have something concrete to discuss and this is the simplest. Like I said there are a dozen or so other examples in the base package last time I counted

Would you mind listing them? It might help provide some clarity to the discussion.

Here's what you can find in the libs distributed with ghc. Note this does not include all uses of unsafePerformIO. It only includes uses to make a global variable.

Control.Concurrent 1
Control.OldException 1
Data.HashTable 1
Data.Typeable 1
Data.Unique 1
GHC.Conc 8
GHC.Handle 3
System.Random 1
Language.Haskell.Syntax 1
System.Posix.Signals 2
System.Win32.Types 1
Network.BSD 1
System.Posix.User 1
Total: 23

In the ghc source you can find 16 uses of the GLOBAL_VAR macro (can't imagine what that does :-). I didn't even attempt to figure out how global variables might be used in the rts source. Anyone care to hazard a guess? You can find a few more in the extra libs..

Graphics.UI.GLUT.Menu 1
Graphics.UI.GLUT.Callbacks.Registration 3
Graphics.Rendering.OpenGL.GLU.ErrorsInternal 1
Total: 5

A few more:

wxHaskell 6
c2hs 1
GTK2HS 1
SDL 0 !!

However, I happen to know that SDL suffers from the initialisation issue and IIRC it needs at least 1 global to stop users using an unsafe (possibly segfault inducing) calling sequence.

Anyway, that's all from me because I'm bored with this thread now.

Regards -- Adrian Hey
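For readers counting along, the idiom tallied above is the classic unsafePerformIO global-variable hack. A minimal sketch looks like the following (the counter here is hypothetical, not taken from any of the modules listed):

import Data.IORef (IORef, newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- NOINLINE is essential: if this definition were inlined, each use
-- site could end up with its own fresh IORef instead of one shared one.
{-# NOINLINE counter #-}
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)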
Re: [Haskell-cafe] Top Level <-
Sittampalam, Ganesh wrote:

Can't you write two recursive modules with <- that depend on each other, so that there's no valid initialisation order? Contrived example follows:

module Module1 where
glob1 :: IORef Int
glob1 <- mod2 >>= newIORef
mod1 :: IO Int
mod1 = readIORef glob1

module Module2 where
glob2 :: IORef Int
glob2 <- mod1 >>= newIORef
mod2 :: IO Int
mod2 = readIORef glob2

Immediately breaking my promise to shut up.. This is illegal because you're only allowed to use ACIO in top level <- bindings, and readIORef isn't (and clearly could not be) ACIO.

Regards -- Adrian Hey
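For contrast, a sketch of what a legal top level <- binding looks like under the proposal. This uses the proposed syntax, which no released compiler accepts; treating newIORef as an ACIO action is an assumption of the proposal:

glob :: IORef Int
glob <- newIORef 0  -- newIORef is ACIO: no effects observable from IO,
                    -- so initialisation order cannot matter.
                    -- readIORef, by contrast, is not ACIO, which is why
                    -- the mutually recursive example above is rejected.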
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Ganesh Sittampalam wrote:

On Sun, 31 Aug 2008, Adrian Hey wrote: Eh? Please illustrate your point with Data.Unique. What requirements does it place on its context? (whatever that might mean :-)

It requires that its context initialises it precisely once.

Its context being main? If so this is true, but I don't see why this is a problem. It's a happy accident with the unsafePerformIO hack as it is, and part of the defined semantics for *all* hypothetical top level <- bindings. Though to be more precise, the requirement is that it may be initialised at any time prior to first use, but never again (there's no requirement to initialise it at all if it isn't used). Also ACIO monad properties guarantee that it's always initialised to the same value regardless of when this occurs. So I don't see the problem.

Data.Unique is actually a poor example, as it is actually fine to initialise it multiple times as long as the resulting Unique values aren't treated as coming from the same datatype.

I just don't see what you're getting at. There's no problem here and Data.Unique is not special. We don't even have to consider whether or not it's OK to reinitialise these things unless the programmer explicitly allows this in the API (which Data.Unique doesn't). This is true for all top level <- bindings.

myCount :: MVar Int
myCount <- newMVar 0

In a hypothetical second initialisation, do you mean..

1. myCount somehow gets rebound to a different/new MVar
2. The binding stays the same but the MVar gets reset to 0 without this being explicitly done in the code.

I assume you mean the latter (2). But either case seems like an absurdity to me. No top level bindings randomly change halfway through a program, and MVars (I hope) are not prone to random corruption (no need to suppose things are any different if they occur at the top level).

But equally it can be implemented with IORefs,

Actually it couldn't, as IORefs are not an Ord instance.

so it's not a good advert for the need for global variables.

Oh please! We have to have something concrete to discuss and this is the simplest. Like I said there are a dozen or so other examples in the base package last time I counted, and plenty of people have found that other libs/ffi bindings need them for safety reasons. Or at least they need something that has global main/process scope, and so far the unsafePerformIO hack is the only known way to get that and still keep APIs stable, sane and modular.

Also, AFAICS going the way that seems to be suggested of having all this stuff reflected in the arguments/types of the API is going to make it practically impossible to have platform independent APIs if all platform specific implementation detail has to be accounted for in this way. The real irony of your remark is that making APIs this robust is practically impossible *without* using global variables, and you're now saying that because they've done this work to eliminate these constraints they now have to be held to account for this with an absurd API.

I think there are two cases to consider here. A Data.Unique style library, which requires genuinely *internal* state, and which is agnostic to having multiple copies of itself loaded simultaneously. In that case, there is no requirement for a process-level scope for <-, just that each instance of the library is only initialised once - the RTS can do this, as can any dynamic loader. The other is some library that really cannot be safely loaded multiple times, because it depends on some lower-level shared resource. Such a library simply cannot be made safe without cooperation from the thing that controls that shared resource, because you cannot prevent a second copy of it being loaded by something you have no control over. If the <- proposal were only about supporting the first of these applications, I would have far fewer objections to it. But it would have nothing to do with process-level scope, then.

The <- proposal introduces no new problems that aren't already with us. It solves 1 problem in that at least there's no room for the compiler to get it wrong or for people to use dangerous things when using the unsafePerformIO hack. I think that is really the only problem that can be solved at the level of Haskell language definition.

I also think we need to be careful about the use of the term process. IMO when we say the process defined by main, we are talking about an abstract process that is essentially defined by Haskell and may have nothing in common with a process as defined by various OS's (assuming there's an OS involved at all). Perhaps we should try to be more clear and say Haskell process or OS process as appropriate. In particular when we say an MVar or IORef has global process scope (whether or not it occurs at top level) we are talking about a Haskell process, not an OS process. The issues you raise seem to me to be more to do with correct implementation on various platforms using various tools
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Adrian Hey wrote:

We have to have something concrete to discuss and this is the simplest. Like I said there are a dozen or so other examples in the base package last time I counted, and plenty of people have found that other libs/ffi bindings need them for safety reasons. Or at least they need something that has global main/process scope, and so far the unsafePerformIO hack is the only known way to get that and still keep APIs stable, sane and modular.

Actually all this use of the tainted and derogatory term global variable is causing me to be imprecise. All MVars/IORefs have global main/process scope whether or not they're bound to something at the top level. The purpose of the top level static binding is to prevent accidental or malicious state spoofing if it's important that the *same* IORef/MVar is always used for some purpose.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Dan Doel wrote:

Here's a first pass:

-- snip --

{-# LANGUAGE Rank2Types, GeneralizedNewtypeDeriving #-}
module Unique where

import Control.Monad.Reader
import Control.Monad.Trans
import Control.Concurrent.MVar

-- Give Uniques a phantom region parameter, so that you can't accidentally
-- compare Uniques from two different uniqueness sources.
newtype Unique r = Unique Integer deriving Eq

newtype U r a = U { unU :: ReaderT (MVar Integer) IO a }
  deriving (Functor, Monad, MonadIO)

-- Higher rank type for region consistency
runU :: (forall r. U r a) -> IO a
runU m = newMVar 0 >>= runReaderT (unU m)

newUnique :: U r (Unique r)
newUnique = U (do source <- ask
                  val <- lift $ takeMVar source
                  let next = val + 1
                  lift $ putMVar source next
                  return $ Unique next)

-- hashUnique omitted

-- snip --

It's possible that multiple unique sources can exist in a program with this implementation, but because of the region parameter, the fact that a Unique may not be globally unique shouldn't be a problem. If your whole program needs arbitrary access to unique values, then I suppose something like:

main = runU realMain

realMain :: U r ()
realMain = ...

is in order. Insert standard complaints about this implementation requiring liftIO all over the place if you actually want to do other I/O stuff inside the U monad.

Well that wouldn't be my main complaint :-)

Thanks for taking the time to do this Dan. I think the safety requirement has been met, but I think it fails on the improved API. The main complaint would be what I see as loss of modularity, in that somehow what should be a small irrelevant detail of the implementation of some obscure module somewhere has propagated its way all the way up to main. This is something it seems to have in common with all other attempts I've seen to solve the global variable problem without actually using a.. you know what :-) It doesn't matter whether it's explicit state handle args, withWhateverDo wrappers, novel monads or what. They all have this effect. To me this seems completely at odds with what I thought was generally accepted wisdom of how to write good maintainable, modular software. Namely hiding as much implementation detail as possible and keeping APIs as simple and stable as they can be. I don't know if I'm alone in that view nowadays.

I'm also not sure I understand why so many people seem to feel that stateful effects must be accounted for somehow in the args and/or types of the effecting function. Like if I had..

getThing :: IO Thing

..as an FFI binding, nobody would give it a moment's thought. They'd see from its type that it had some mysterious world state dependent/effecting behaviour, but would be quite happy to just accept that they didn't really need to worry about all that magic... instead they'd accept that it just works. Why then, if I want to implement precisely the same thing in Haskell (using a global variable) does it suddenly become so important for this stateful magic to be accounted for? Like the presence of that global variable must be made so very painfully apparent in main (and everywhere else on the dependency path too I guess). In short, I just don't get it :-)

Purists aren't going to like it, but I think folk *will* be using real global variables in I/O libs for the foreseeable future. Seems a shame that they'll have to do this with the unsafePerformIO hack though :-(

Regards -- Adrian Hey
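To make the region trick concrete, here is a sketch of the misuse that the rank-2 type of runU rejects. This snippet is expected *not* to compile, which is precisely the point (leak is a hypothetical name):

-- Rejected by the type checker: runU demands (forall r. U r a), so the
-- result type a may not mention r, and a Unique cannot escape its region.
leak :: IO (Unique r)
leak = runU newUnique  -- type error: r would escape its scope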
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Ganesh Sittampalam wrote:

On Sun, 31 Aug 2008, Adrian Hey wrote: Thanks for taking the time to do this Dan. I think the safety requirement has been met, but I think it fails on the improved API. The main complaint would be what I see as loss of modularity, in that somehow what should be a small irrelevant detail of the implementation of some obscure module somewhere has propagated its way all the way up to main.

That's the key point, as I see it - they aren't irrelevant details of the implementation, they are requirements the implementation places on its context in order for that implementation to be correct. So they should be communicated appropriately.

Eh? Please illustrate your point with Data.Unique. What requirements does it place on its context? (whatever that might mean :-) It just does what it says on the tin AFAICS. There are no requirements it places on clients (to use an OO term), as should any halfway decent API IMO.

To me this seems completely at odds with what I thought was generally accepted wisdom of how to write good maintainable, modular software. Namely hiding as much implementation detail as possible and keeping APIs as simple and stable as they can be. I don't know if I'm alone in that view nowadays.

It's no problem to hide implementation detail, but I don't think you should hide the *requirement* of the implementation that it has constraints on how it is called, namely that it requires once-only initialisation or whatever.

No decent API should require this. Data.Unique certainly doesn't. In fact it's debatable whether any API should require an initialisation call at all before other stuff can be called (the other stuff should check initialisation has occurred and do it itself if necessary). The real irony of your remark is that making APIs this robust is practically impossible *without* using global variables, and you're now saying that because they've done this work to eliminate these constraints they now have to be held to account for this with an absurd API. Seems like a clear case of no good deed going unpunished :-)

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Ganesh Sittampalam wrote:

How do the implementers of Data.Unique know that they mustn't let them be serialised/deserialised?

Because if you could take a String and convert it to a Unique there would be no guarantee that the result was *unique*.

What stops the same rule from applying to Data.Random?

Well the only data type defined by this is StdGen, which is a Read/Show instance. I guess there's no semantic problem with that (can't think of one off hand myself).

Also what if I want a thread-local variable? Well actually I would say that threads are a bad concurrency model so I'm not keen on thread local state at all. Mainly because I'd like to get rid of threads, but also a few other doubts even if we keep threads.

Even if you don't like them, people still use them.

AFAICS this is irrelevant for the present discussions as Haskell doesn't support thread local variable thingies. If it ever does, being precise about that is someone else's problem.

The fact that your proposal isn't general enough to handle them is a mark against it; standardised language features should be widely applicable, and as orthogonal as possible to other considerations.

I think the whole thread local state thing is a complete red herring. I've never seen a convincing use case for it and I suspect the only reason these two issues have become linked is that some folk are so convinced that global variables are evil, they mistakenly think thread local variables must be less evil (because they are less global). Anyway, if you understand the reasons why all the real world libraries that do currently use global variables do this, it's not hard to see why they don't want this to be thread local (it would break all the safety properties they're trying to ensure). So whatever problem thread local variables might solve, it isn't this one.

For the time being the scope of IORefs/MVars/Chans is (and should remain) whatever process is described by main (whether or not they appear at top level).

And if main isn't the entry point? This comes back to my questions about dynamic loading.

Well you're talking about some non-standard Haskell, so with this and other non-standard stuff (like plugins etc) I guess the answer is it's up to whoever's doing this to make sure they do it right. I can't comment further as I don't know what it is they're trying to do, but AFAICS it's not a language design issue at present. If plugins breaks it's down to plugins to fix itself, at least until such time as a suitable formal theory of plugins has been developed so it can become standard Haskell :-)

(I.E. Just making existing practice *safe*, at least in the sense that the compiler ain't gonna fcuk it up with INLINING or CSE and everyone understands what is and isn't safe in ACIO)

Creating new language features means defining their semantics rather more clearly than just no inlining or cse, IMO.

I wouldn't even know how to go about that to the satisfaction of purists. But global variables *are* being used whether or not the top level <- bindings are implemented. They're in the standard libraries! So if this stuff matters someone had better figure it out :-)

It's a hack that isn't robust in many situations. We should find better ways to do it, not standardise it.

Nobody's talking about standardising the current hack. This is the whole point of the top level <- proposal, which JM seems to think is sound enough for incorporation into JHC (correctly IMO). Nobody's found fault with it, other than the usual global variables are evil mantra :-)

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Ganesh Sittampalam wrote:

On Sat, 30 Aug 2008, Adrian Hey wrote: Because if you could take a String and convert it to a Unique there would be no guarantee that the result was *unique*.

Well, yes, but if I implemented a library in standard Haskell it would always be safely serialisable/deserialisable (I think). So the global variables hack somehow destroys that property - how do I work out why it does in some cases but not others?

This has nothing to do with the use of global variables. If you have a set of values that are guaranteed to be distinct (unique) and you add another random/arbitrary value to that set you have no way of knowing that it is different from any current member (other than searching the entire set, assuming it's available).

Well, I've never seen a convincing use case for global variables :-)

Well, apart from all the libs that couldn't be implemented without them...

reason these two issues have become linked is that some folk are so convinced that global variables are evil, they mistakenly think thread local variables must be less evil (because they are less global).

I don't think they're less evil, just that you might want them for the same sorts of reasons you might want global variables.

Global variables are needed to ensure important safety properties, but the only reason I've seen people give for thread local variables is that explicit state threading is just so tiresome and ugly. Well that may be (wouldn't disagree), but I'm not aware of any library that simply couldn't be implemented without them.

If plugins breaks it's down to plugins to fix itself, at least until such time as a suitable formal theory of plugins has been developed so it can become standard Haskell :-)

Dynamic loading and plugins work fine with standard Haskell now, because nothing in standard Haskell breaks them. The <- proposal might well break them, which is a significant downside for it.

I don't see how, but if so <- bindings are not the cause of the brokenness. They'd still be broken using the unsafePerformIO hack.

In general, the smaller the world that the Haskell standard lives in, the less it can interfere with other concerns. <- massively increases that world, by introducing the concept of a process scope.

All IORefs, MVars, Chans scope across the entire process defined by main. Or at least they *should*; if they don't then something is already badly wrong somewhere. This has nothing to do with whether or not they appear at top level. This is what an IORef/MVar whatever is defined to be.

It's a hack that isn't robust in many situations. We should find better ways to do it, not standardise it. Nobody's talking about standardising the current hack. This is the whole point of the top level <- proposal,

It just amounts to giving the current hack some nicer syntax and stating some rules under which it can be used.

No, the unsafePerformIO hack is a hack because it's *unsound*. The compiler doesn't know how to translate this into code that does what the programmer intended. Fortunately ghc at least does have a couple of flags that give the intended result (we hope). The new binding syntax is nicer, but its real purpose is to leave the compiler no wriggle room when interpreting the programmer's intent. But then again, I'm sure there are some that will be adamant that any way of making global variables is a hack. But they'll still be happy to go on using file IO, sockets etc regardless, blissfully unaware of the hacks they are dependent on :-)

Those rules aren't actually strong enough to provide a guarantee of process level scope.

The rules for <- bindings shouldn't have to guarantee this. This should be guaranteed by newMVar returning a new *MVar*, wherever it's used (for example).

which JM seems to think is sound enough for incorporation into JHC (correctly IMO). Nobody's found fault with it, other than the usual global variables are evil mantra :-)

Several people have found faults with it, you've just ignored or dismissed them. No doubt from your perspective the faults are irrelevant or untrue, but that's not my perspective.

I mean semantic faults, as in the proposal just doesn't do what it promises for some subtle reason. If you consider not giving you thread local variables a fault I guess you're entitled to that view, but this was never the intent of the proposal in the first place (that's not what people are trying to do when they use the unsafePerformIO hack).

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Adrian Hey wrote:

Global variables are needed to ensure important safety properties, but the only reason I've seen people give for thread local variables is that explicit state threading is just so tiresome and ugly. Well that may be (wouldn't disagree), but I'm not aware of any library that simply couldn't be implemented without them.

I thought I ought to say a bit more about my unkind and hasty words re. thread local variables. This is discussed from time to time and there's a wiki page here summarising proposals...

http://www.haskell.org/haskellwiki/Thread-local_storage

One thing that worries me is that nobody seems to know what problem thread local storage is solving, hence the diversity of proposals. I'm also struggling to see why we need it, but I don't have any passionate objections to it either. Unfortunately for those of us that want a solution to the global variables problem the two issues seem to have been linked as being part of the same problem, so while there's all this uncertainty about what thread local variables are actually going to be used for and what they should look like, the (IMO) much simpler global variables problem/solution is in limbo. This has been going on for 4 or 5 years now IIRC.

But the global variables problem is really much simpler. All we want is something that does exactly what the unsafePerformIO hack currently does (assuming flag/pragma hackery does the trick), but does it reliably. (IMO, YMMV..)

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Brandon S. Allbery KF8NH wrote:

On 2008 Aug 28, at 20:45, Adrian Hey wrote: Lennart Augustsson wrote: If Haskell had always taken the pragmatic path of adding what seems easiest and most in line with imperative practice it would not be the language it is today. It would be Perl, ML, or Java. The Haskell philosophy has always been to stick it out until someone comes up with the right solution to a problem rather than picking some easy way out.

BTW, unsafePerformIO seems quite pragmatic and easy to me, so let's not get too snobby about this. (Sorry, I couldn't resist.)

It's anything but easy; there are specific rules you need to follow, including use of certain compiler pragmas, to insure it works properly.

Yes, of course. The worst thing about all this is that the single most common use case AFAICS (the one under discussion) isn't even a safe use. Just pointing out that this pseudo function is certainly not something one would expect from an organisation as dedicated to the pursuit of perfection as Lennart would have us believe. It's an expedient hack. Not that I wish to seem ungrateful or anything :-)

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Brandon S. Allbery KF8NH wrote:

On 2008 Aug 29, at 4:22, Adrian Hey wrote: Brandon S. Allbery KF8NH wrote: On 2008 Aug 28, at 20:45, Adrian Hey wrote: Lennart Augustsson wrote: If Haskell had always taken the pragmatic path of adding what seems easiest and most in line with imperative practice it would not be the language it is today. It would be Perl, ML, or Java. The Haskell philosophy has always been to stick it out until someone comes up with the right solution to a problem rather than picking some easy way out.

BTW, unsafePerformIO seems quite pragmatic and easy to me, so let's not get too snobby about this. (Sorry, I couldn't resist.)

It's anything but easy; there are specific rules you need to follow, including use of certain compiler pragmas, to insure it works properly.

Yes, of course. The worst thing about all this is that the single most common use case AFAICS (the one under discussion) isn't even a safe use. Just pointing out that this pseudo function is certainly not something one would expect from an organisation as dedicated to the pursuit of perfection as Lennart would have us believe. It's an expedient hack. Not that I wish to seem ungrateful or anything :-)

...but, as he noted, we *do* that until we find the right way to do it.

So what's the problem with doing it *safely*, that is, at least until someone has found the mythic right way to do it? Not that anybody has ever been able to offer any rational explanation of what's *wrong* with the current proposed solution AFAICS.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Ganesh Sittampalam wrote:

Will Data.Unique still work properly if a value is sent across a RPC interface?

A value of type Unique you mean? This isn't possible. Data.Unique has been designed so it cannot be Shown/Read or otherwise serialised/deserialised (for obvious reasons I guess).

Also what if I want a thread-local variable?

Well actually I would say that threads are a bad concurrency model so I'm not keen on thread local state at all. Mainly because I'd like to get rid of threads, but also a few other doubts even if we keep threads.

Even if you don't like them, people still use them.

AFAICS this is irrelevant for the present discussions as Haskell doesn't support thread local variable thingies. If it ever does, being precise about that is someone else's problem. For the time being the scope of IORefs/MVars/Chans is (and should remain) whatever process is described by main (whether or not they appear at top level).

(I.E. Just making existing practice *safe*, at least in the sense that the compiler ain't gonna fcuk it up with INLINING or CSE and everyone understands what is and isn't safe in ACIO)

Creating new language features means defining their semantics rather more clearly than just no inlining or cse, IMO.

I wouldn't even know how to go about that to the satisfaction of purists. But global variables *are* being used whether or not the top level <- bindings are implemented. They're in the standard libraries! So if this stuff matters someone had better figure it out :-)

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Bryan O'Sullivan wrote:

I haven't seen a coherent description of what the semantics of top-level <- should be, but avoidance of widespread swearing would be at the top of my list of requirements.

Don't the ACIO monad properties satisfy you? Anyway, as I pointed out in my last post, if this is a problem with top level <- ACIO monad bindings it's still going to be a problem (probably much worse) with unsafePerformIO hack IO monad bindings. This problem isn't just going to go away, no matter how long it's ignored :-)

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Jonathan Cast wrote:

On Wed, 2008-08-27 at 11:53 +0100, Adrian Hey wrote: John Meacham wrote: As with all design decisions, it is sometimes the right thing and sometimes the wrong one. And sometimes the most expedient. (which, occasionally, is a perfectly valid driving force behind a certain bit of coding). However, I am fully convinced it is necessary. You don't even have to look further than Haskell 98 to find a use in the Random module, and Data.Unique _depends_ on the state being global for correctness.

..and of course there's stdin, stdout. That takes some explaining.

Not really. If you don't have buffered IO, then you just say

stdin = 0
stdout = 1
stderr = 2

nonStdout = 42?

I'm afraid I have no idea what your point is :-( I tried it anyway and it doesn't seem to work, but that ain't so surprising as Handles aren't Nums. What needs explaining IMO is that we appear to have global Handles exported at the top level from System.IO, but no way for users to write their own modules that do the same for nonStdout, or even to implement getNonStdout. I think that's pretty weird and inconsistent. But perhaps you could show me how to do it with some real Haskell :-)

If you need buffered IO, you just change your IO monad* to look like:

newtype NewIO alpha = NewIO (ReaderT (Map Fd Buffer) OldIO alpha)

Of course, if you do this, you can't go mixing IO with unique values with RNG with mutable state with everything else under the sun anymore. You might actually have to declare exactly what effects you need when you give your function's type, now. Clearly, a horror we must avoid at all costs.

Indeed. If anyone thinks that's the way to go maybe Clean would be of some interest. IMHO Clean's treatment of IO and concurrency is just about the worst thing in an otherwise pretty decent language :-(

Regards -- Adrian Hey
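As an aside, the top level Handles exported by System.IO are themselves built with the unsafePerformIO idiom inside GHC's libraries. Very roughly - this is a heavily simplified sketch, not GHC's actual source, and mkStdHandle is a hypothetical stand-in for the real handle-construction machinery:

{-# NOINLINE stdout #-}
stdout :: Handle
stdout = unsafePerformIO (mkStdHandle 1 "<stdout>")
-- hypothetical: wraps file descriptor 1 in a buffered Handle, once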
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Lennart Augustsson wrote:

I don't think global variables should be banned, I just think they should be severely discouraged.

If you're saying a language should not provide a sound way to do this (as I believe you are), then AFAICT for all practical purposes you *are* saying you think global variables should be banned. Where are we going to be if the unsafePerformIO hack ever becomes *really* unsafe?

and..

I'm certain you can write a kernel in Haskell where the only use of global variables is those that hardware interfacing forces you to use.

But what you haven't explained is why this is even desirable? I don't doubt it's true in an academic sense if you don't mind sacrificing safety and modularity. Why wasn't this done in the (presumably) much simpler case of the Haskell base libs? No hardware constraints there.

There are plenty of situations where it makes no semantic sense to allow 2 or more of some thing. A list of all active processes for example. Why would I ever want 2 or more lists of all active processes? I think I'd just be setting myself up for trouble and heartache by even allowing such a possibility. Now I could get the safety I need by wrapping all this stuff up in my own custom augmented IO monad right at the start of main. But this solution still lacks modularity. The top level <- bindings are just a modular and extensible way to achieve the same thing AFAICS (augmenting real world state with my own custom state).

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Johannes Waldmann wrote:

Adrian Hey wrote: There are plenty of situations where it makes no semantic sense to allow 2 or more of some thing. A list of all active processes for example.

all referring to what scope? perhaps there occurs a situation with several process (thread) pools, several cores etc.

Seeing as we're talking about an OS kernel I guess the scope would be all processes active on the (possibly virtual) machine being managed by the OS. But it really doesn't matter what the scope is. All is the key word here.

See also singleton considered harmful, there are similar arguments: http://www.oreillynet.com/cs/user/view/cs_msg/23417

Following the arguments made against the singleton pattern over the years leads me to conclude there are 2 distinct camps. Applications programmers who consider it bad because it's a way of making global variables, and we all know how bad they are, right? Typically these folk appear to have no clue about how the underlying IO library, framework and OS infrastructure they are dependent on *actually works*. System programmers who recognise the need for singletons but regard being forced to use the singleton pattern hack as a language design defect.

The situation seems similar with us. The unsafePerformIO hack is just terrible (especially for a language like Haskell), but why is it being used so often? Is it incompetence of library writers or a language design defect?

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Jonathan Cast wrote:

On Thu, 2008-08-28 at 10:00 +0100, Adrian Hey wrote: Lennart Augustsson wrote: I don't think global variables should be banned, I just think they should be severely discouraged.

If you're saying a language should not provide a sound way to do this (as I believe you are), then AFAICT for all practical purposes you *are* saying you think global variables should be banned. Where are we going to be if the unsafePerformIO hack ever becomes *really* unsafe?

and..

I'm certain you can write a kernel in Haskell where the only use of global variables is those that hardware interfacing forces you to use.

But what you haven't explained is why this is even desirable? I don't doubt it's true in an academic sense if you don't mind sacrificing safety

What `safety' is being sacrificed?

and modularity.

What modularity?

As I've pointed out several times already you can find simple examples in the standard haskell libs. So far nobody has accepted my challenge to re-implement any of these competently (I.E. avoiding the use of global variables). Why don't you try it with Data.Unique and find out :-)

Regards -- Adrian Hey
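For concreteness, the challenge is to reimplement something close to the following - a sketch near the base library implementation of the era; details such as the real hashUnique differ:

module Data.Unique (Unique, newUnique, hashUnique) where

import Control.Concurrent.MVar
import System.IO.Unsafe (unsafePerformIO)

newtype Unique = Unique Integer deriving (Eq, Ord)

-- The global source of uniqueness: one MVar for the whole process.
{-# NOINLINE uniqSource #-}
uniqSource :: MVar Integer
uniqSource = unsafePerformIO (newMVar 0)

newUnique :: IO Unique
newUnique = do
  val <- takeMVar uniqSource
  let next = val + 1
  putMVar uniqSource next
  return (Unique next)

hashUnique :: Unique -> Int
hashUnique (Unique i) = fromInteger i  -- simplified; the real version hashes

The correctness argument is exactly the point under dispute: every call to newUnique, from any module, goes through the *same* MVar, so no two Uniques can ever compare equal. Any reimplementation that creates a fresh counter per client loses that guarantee.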
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Lennart Augustsson wrote:

I don't think anyone has claimed that any interface can be implemented without globals. Of course some can't (just pick an interface that is the specification of a global variable).

I said in the original challenge even I'd let you (anyone) change the interface if you could provide a sensible explanation of why the new interface was better, safer, more convenient or whatever.

What I (and others) claim is that such interfaces are bad. Using a global variable makes an assumption that there's only ever going to be one of something,

It's not an assumption, any more than I always want 1*N to yield N is an assumption. It's a fundamental property I absolutely want to guarantee. By far the simplest way to do this is simply not to expose a newWhatever constructor in my API. If I expose anything it should be Whatever itself or getWhatever, neither of which is possible if Whatever contains MVars, Chans and the like. What's more, there seems to be no good *semantic* reason why this should not be possible. The only objections seem dogmatic to me.

and that's just an inflexible assumption to make. You think global variables are essential, I think they are a sign of bad design. So we have different opinions and neither one of us is going to convince the other.

You might stand some chance of convincing me by showing a better design :-) Dan seems to have had a reasonable go at 1 of them. I'm not sure it passes the improved interface test but I'll think about it. But there are quite a few left. There's the Hughes paper too of course, using implicit parameters (a highly dubious language feature IMO). But even if someone does produce an entirely unsafePerformIO hack free set of standard libs, I have to ask why jump through all these hoops? There's no semantic difficulty with the proposed language extension, and it should be very simple to implement (John seems to have done it already).

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Ganesh Sittampalam wrote:

On Thu, 28 Aug 2008, Adrian Hey wrote: implicit parameters (a highly dubious language feature IMO).

How can you say that with a straight face at the same time as advocating global variables? :-)

Quite easily, what's the problem? IORefs, Chans etc are perfectly ordinary values. Why should they not exist at the top level? The global variable lives in the world, not the IORef. The IORef is just a reference, no different from filepaths in principle (and if having them at the top level is also evil then making this so easy and not screaming about it seems a little inconsistent to me).

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Jonathan Cast wrote:

This has been answered repeatedly, at least implicitly. Unless you insist that getWhatever should live in the IO monad and have no functional arguments (why?), there is no reason why this should be impossible.

What's more, there seems to be no good *semantic* reason why this should not be possible. The only objections seem dogmatic to me.

I haven't seen you give a non-dogmatic reason for wanting global variables yet, either.

You consider real examples from real *standard* libs that we're all using (and presumably not written by clueless hackers such as myself) to be dogmatic? I would call that pragmatic myself. These are the standard libs after all. Shouldn't we expect them to be the perfect examples of how to do things right?

But even if someone does produce an entirely unsafePerformIO hack free set of standard libs, I have to ask why jump through all these hoops?

To improve the APIs available?

There's nothing wrong with the APIs as they are as far as I'm concerned. It's their implementation that's the problem.

You're advocating an extension to a *purely functional programming language*.

So? What's being proposed doesn't compromise referential transparency (at least no more than the IO monad already does, as some might argue).

There's no semantic difficulty with the proposed language extension,

Although I've noticed it's grossly under-powered compared to what's needed to implement stdin the way you want to.

Can't recall expressing any opinion about how stdin should be implemented, so I don't know what you're on about here.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Ganesh Sittampalam wrote:

On Thu, 28 Aug 2008, Adrian Hey wrote: There's no semantic difficulty with the proposed language extension,

How does it behave in the presence of dynamic loading?

To answer this you need to be precise about the semantics of what is being dynamically loaded. But this is too complex an issue for me to get into right now. Actually as far as things like hs-plugins are concerned I'd always meant to figure out one day what exactly a plugin is, semantically. But as I've never had cause to use them I never got round to it. Like is it a value, or does it have state and identity or what?

What about remote procedure calls?

Dunno, what problem do you anticipate?

Also what if I want a thread-local variable?

Well actually I would say that threads are a bad concurrency model so I'm not keen on thread local state at all. Mainly because I'd like to get rid of threads, but also a few other doubts even if we keep threads. Yes, I'm no big fan of the IO monad (or any other monad in fact) and IORefs and all that (all smacks of putting a purely functional veneer on good ol' fashioned procedural programming to me). But we are where we are and this isn't going to change any time soon. Just trying to fix what seem like obvious problems with Haskell's current IO without doing anything too radical and unproven.

(I.E. Just making existing practice *safe*, at least in the sense that the compiler ain't gonna fcuk it up with INLINING or CSE and everyone understands what is and isn't safe in ACIO)

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Lennart Augustsson wrote:

The Haskell philosophy has always been to stick it out until someone comes up with the right solution to a problem rather than picking some easy way out.

I understood from your previous remarks that you regarded this as a non-problem even in C. There's no justification for using them, at least if you have clean slate privileges (no legacy issues). That kind of implies to me that we (or at least you) already have the right solution. What is it and why can't we use it right now in Haskell? (Again assuming we have a clean slate and no legacy issues). Or can we..

So I'd rather keep global variables being eye sores (as they are now) to remind us to keep looking for a nice way.

Are you looking? I can't even figure out from your posts if you're even prepared to admit that there *is* a problem, other than there being so many people in the world who can't write proper code, in Haskell or C :-)

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Lennart Augustsson wrote:

If Haskell had always taken the pragmatic path of adding what seems easiest and most in line with imperative practice it would not be the language it is today. It would be Perl, ML, or Java. The Haskell philosophy has always been to stick it out until someone comes up with the right solution to a problem rather than picking some easy way out.

BTW, unsafePerformIO seems quite pragmatic and easy to me, so let's not get too snobby about this. (Sorry, I couldn't resist.)

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Lennart Augustsson wrote:

BTW, I'm not contradicting that the use of global variables can be necessary when interfacing with legacy code, I just don't think it's the right design when doing something new.

AFAICS the use of top level mutable state in the base libs has nothing at all to do with interfacing with legacy code; it's a semantic necessity and there's no legacy code involved. If you want to dispute that then please show some real Haskell code that does as good or better a job without it (or point me to the relevant legacy code that makes it necessary).

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Lennart Augustsson wrote:

I told you where to look at code. It's C code, mind you, but written in a decent way. No well written device driver ever accesses memory or IO ports directly, doing so would seriously hamper portability.

Well something must be accessing both. Dunno what you mean by directly. I take it you must mean that the driver does not make use of global variables or baked-in port addresses in its source code.

Instead you use an abstraction layer to access the hardware, and the driver gets passed a bus (whatever that might be) access token (akin to a capability). I know you're not going to be convinced, so I won't even try. :)

I have actually spent the last 20 years or so writing both non-hosted and hosted device drivers for a few OS's. I'm perfectly convinced about the truth of what you say, but not at all convinced about the relevance. It's a red herring IMO, as you've introduced a very complex and mysterious black box that itself cannot be implemented without making use of global variables. You can find them easily enough in the Linux kernel source. I'm sure they'll be there in NetBSD too (never looked though).

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Lennart Augustsson wrote:

I've also written quite a few hosted and non-hosted device drivers (in C). None of them have any global variables.

The point is to be able to properly model, understand and if necessary implement *entire systems* without using global variables (allegedly). You can always define sub-system boundaries in such a way that you can claim that this/that or the other sub-system, device driver or whatever does not use global variables, if you put the global variables somewhere else and pass a reference to the sub-system concerned. We could play that game with Data.Unique, for example.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
John Meacham wrote:

As with all design decisions, it is sometimes the right thing and sometimes the wrong one. And sometimes the most expedient. (which, occasionally, is a perfectly valid driving force behind a certain bit of coding). However, I am fully convinced it is necessary. You don't even have to look further than Haskell 98 to find a use in the Random module, and Data.Unique _depends_ on the state being global for correctness.

..and of course there's stdin, stdout. That takes some explaining. Even with the proposed ACIO and top level <- bindings I still couldn't implement a lib that exported a top level nonStdout handle. It'd have to be a getNonStdout IO action.

Regarding the necessity of global variables, despite what I've been saying it is of course possible to implement entire systems (programs/processes or whatever main corresponds to) without them, if you don't mind explicitly creating all those micro states immediately on entry to main and passing the references around. But this is highly unmodular, inconvenient, unsafe (because you must expose and allow potentially unconstrained use of newWhateverMicroState constructors) and a general maintenance nightmare. Definitely not the way to go IMO.

So it would be more accurate to say that IMO it's impossible to implement many sane and inherently safe IO lib APIs without using global variables. But people who prefer insane and inherently unsafe APIs could live without them quite happily I guess :-)

Regards -- Adrian Hey
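A sketch of the getNonStdout shape just described, written in the proposed top level <- syntax. Here openNonStdout is a hypothetical real-IO action, and only the empty cache can be created by ACIO:

nonStdoutCache :: MVar (Maybe Handle)
nonStdoutCache <- newMVar Nothing  -- newMVar assumed ACIO

getNonStdout :: IO Handle
getNonStdout = modifyMVar nonStdoutCache $ \mh -> case mh of
  Just h  -> return (Just h, h)       -- already opened: reuse it
  Nothing -> do h <- openNonStdout    -- hypothetical: real IO, so it
                return (Just h, h)    -- cannot happen at the top level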
[Haskell-cafe] Re: [Haskell] Top Level <-
Ashley Yakeley wrote:

Adrian Hey wrote: Maybe it would be safest to just say anything with a finaliser can't be created at the top level.

Do you have an example of something that is correctly ACIO to create, but has a problematic finaliser?

Sorry for the delay in this getting my attention. I've been looking at my old ACIO code (which I've largely forgotten the details of) and dug up the following snippet.

-- * Weak pointers
-- | All these functions require that finalisers (if any) are in the
-- ACIO Monad.
-- I'm not sure what such a finaliser can usefully do, but it can't be
-- in the IO Monad and preserve ACIO Monad properties AFAICS.
mkWeak, mkWeakPtr, mkWeakPair, addFinalizer, mkWeakIORef,

I must admit I can't remember much about any of this, I just worked my way through the IO libs trying to figure out which actions might plausibly (if not usefully :-) be regarded as ACIO. There seemed to be quite a lot at the end of the day. I also added a few forkIO variants which force the forked thread to block on certain things (empty MVar, Chan, QSem..) waiting for some IO monad action to kick them into life.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Judah Jacobson wrote:

I've been wondering: is there any benefit to having top-level ACIO'd <- instead of just using runOnce (or perhaps oneshot) as the primitive for everything? For example:

oneshot uniqueRef :: IO (MVar Integer)
uniqueRef = newMVar 0

I've been wondering about something like this too (in some way just have a oneShot or runOnce as the *only* thing in ACIO, as a magic primitive).

runOnce :: IO a -> ACIO (IO a)

It would certainly simplify the ACIO monad :-), but I'm not sure it's really as flexible. Provided newMVar can be ACIO then this can be implemented directly (doesn't need to be a primitive). But we can't go the other way round (use runOnce to create genuine top level MVars or channels, say). Does that matter? Probably not for monadic IO code. It's not a huge inconvenience to write..

do ...
   thing <- getThing
   foo thing

vs..

do ...
   foo thing -- thing is at top level

But for top level non-monadic code/expressions/data structures I can see a certain convenience in having thing as a top level identifier if possible, which it often won't be anyway I guess for other reasons (like its creation and initialisation requiring real IO). So I don't have any particularly strong opinion either way. In practice if thing (or getThing) is to be exported then it would probably be prudent to assume creation and initialisation might require real IO at some point in the future even if they don't right now, so you'd export getThing (= return thing) anyway, rather than have an exported thing disappear from the API at some point.

My 2p..

Regards -- Adrian Hey
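A minimal sketch of that direct implementation, assuming a hypothetical acioNewMVar (i.e. newMVar admitted into the ACIO monad):

runOnce :: IO a -> ACIO (IO a)
runOnce act = do
  cache <- acioNewMVar Nothing       -- assumption: MVar creation is ACIO
  return $ modifyMVar cache $ \mv -> case mv of
    Just v  -> return (Just v, v)    -- already run: hand back cached result
    Nothing -> do v <- act           -- first caller runs the action, once
                  return (Just v, v)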
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
Lennart Augustsson wrote:

Since at some point you have to interface with the hardware you are forced to obey whatever convention is used for interrupts etc. At that point you may very well have to use global variables. But you can quickly abstract away from that and stay in the safe land without globals.

BTW, did you notice that the non-hosted driver API example I put on the wiki page doesn't use globals either :-) The two device handles are at the top level (are globals as you would say), but the driver code takes these as an argument so doesn't care if they're top level or not.

If I don't have the device handles at the top level or some means to get them from the top level then I have no alternative but to export createDeviceHandle, either directly to the user level API (really bad!) or to some other bit of higher level system code that can be trusted not to create 2 or more handles for the same device (and no handles for non-existent devices). Let's assume it's the latter.

In the simple case where it is known at compile time that there are exactly 2 devices in the system at known base addresses, can you explain how the higher level bit of the system guarantees that user level application code will only ever have access to 2 different device handles in total (one for each device), no matter what? How does it solve this problem in the more complex case where hardware must be probed somehow to discover how many devices there are and where they are? All without using any global variables please. Also, assuming you succeed, please explain what advantages or extra safety your solution provides over a solution which does use global variables.

The obvious solution seems to be to have this code present this function to the user level API..

getAvailableDeviceHandles :: IO [DeviceHandle]

Which shouldn't be hard to implement in either case, except for the fact that IO state getters can't be implemented at all without top level mutable state or runOnce primitives or something like that (AFAICS). newAvailableDeviceHandles perhaps? I guess that could come in handy if the user code decides it doesn't like the old ones for some reason :-)

Regards -- Adrian Hey
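Combined with a runOnce primitive like the one sketched earlier in the thread, the getter could be written as below. probeForDevices and DeviceHandle are hypothetical names, and the top level <- syntax is the proposed extension:

getAvailableDeviceHandles :: IO [DeviceHandle]
getAvailableDeviceHandles <- runOnce probeForDevices
-- probing happens at most once, on first demand; every caller thereafter
-- sees the same fixed set of handles, so no second handle for any
-- device can ever be constructed.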
Re: [Haskell-cafe] Re: [Haskell] Top Level <-
John Meacham wrote:

I forgot who came up with the original ACIO idea, but I'd give them props in the manual if they wish.

I think this is based on Ian Stark's message..

http://www.haskell.org/pipermail/haskell-cafe/2004-November/007664.html

Yeah, this sounds like a great idea. there were a whole lot of issues dealing with finalizers and concurrency, and restricting them in some way similar to ACIO might be good... however, you want something a little weaker than ACIO I think. it must satisfy the ACIO conditions, but _may_ assume its argument (the item being collected) is never referenced again. hence something like 'free' is okay which wouldn't be if other references to the object exist. do you think that is 'formal' enough of a description? seems clear enough if ACIO is well defined which I think it is.

Yes, now I cast my mind back that was something that was troubling me. Clearly the one thing you're most likely to want to do in a finaliser is free some external resource, which certainly wouldn't be ACIO ordinarily. But as you say, giving sane semantics and type safety to finalisers is very tricky indeed. I can't help thinking that semantically finaliser execution should be treated like some kind of external event handling, like an interrupt. Not sure what that should be properly, but I think finalisers should be the same.

But from a top level aThing <- someACIO point of view, if we're going to say that it doesn't matter if someACIO is executed before main is entered (possibly even at compile time) or on demand, then we clearly don't want to observe any difference between the latter case and the former (if aThing becomes garbage without ever being demanded). Maybe it would be safest to just say anything with a finaliser can't be created at the top level. We can always define an appropriate top level get IO action using runOnce or whatever.

Regards -- Adrian Hey
[Haskell-cafe] Re: [Haskell] Top Level <-
I have a feeling this is going to be a very long thread so I'm trying to go to Haskell cafe again (without mucking it up again).

Derek Elkins wrote:

Haskell should be moving -toward- a capability-like model, not away from it.

Could you show how to implement Data.Random or Data.Unique using such a model, or show how any (preferably all) of the use cases identified can be implemented? Like what about implementing the socket API starting with nothing but primitives to peek/poke ethernet mac and dma controller registers?

Why should Haskell be moving -toward- a capability-like model, and why do top level <- declarations take us away from it?

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: [Haskell] Top Level -
Lennart Augustsson wrote: Making a network stack from peek and poke is easy in a well structured OS. The boot loader (or whatever) hands you the capability (call it something else if you want) to do raw hardware access, and you build from there. If you look at well structured OSs like NetBSD, this is pretty much how they work. No hardware drivers use global variables.

So? We all know this is possible outside Haskell. But I don't want to rely on mysterious black box OS's to hand me the capability any more than I want to rely on mysterious extant but unimplementable libs like Data.Unique. Most real world computing infrastructure uses no OS at all. How could I use Haskell to implement such systems?

Also (to mis-quote Linus Torvalds) could you or anyone else who agrees with you please SHOW ME THE CODE in *Haskell*! If scripture is all that's on offer I'm just not going to take any of you seriously. Frankly I'm tired of the patronising lectures that always accompany these threads. It'd be good if someone who knows global variables are evil could put their code where their mouth is for a change. Fixing up the base libs to eliminate the dozen or so uses of the unsafePerformIO hack might be a good place to start. I'll even let you change the API of these libs if you must, provided you can give a sensible explanation why the revised API is better, safer, more convenient or whatever.

Regards -- Adrian Hey
[Haskell-cafe] Re: [Haskell] Top Level -
(Moving to Haskell cafe)

Edward Kmett wrote: On Sun, Aug 24, 2008 at 7:12 PM, Ashley Yakeley [EMAIL PROTECTED] wrote: Is there any interest in implementing a top level <- to run monadic code? This is actually implemented in jhc. See the 'top level actions' section of http://repetae.net/computer/jhc/manual.html

Gosh! I was always quite impressed by John's determination to write a Haskell compiler and by his self discipline in resisting the temptation to fix everything that was wrong with Haskell and keep to standards :-)

I implemented my own ACIO monad a while ago (which is of course quite useless without top level <- bindings) and it turned out that there was quite a lot that could go in here. The only problem seemed to be that some things that seemed perfectly reasonable to create via ACIO had *IO* finalisers associated with them, which didn't feel right to me. But if you think about how finalisers get run I'm inclined to think we should insist that they are ACIO too.

Regards -- Adrian Hey
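[A minimal sketch of what such an ACIO monad might look like. This is an assumption about the shape of the idea, not Adrian's actual library: the constructor stays private, and only actions that are affected by nothing and affect nothing outside the value they create (e.g. allocating fresh mutable cells) are exported:]

  {-# LANGUAGE GeneralizedNewtypeDeriving #-}
  module ACIO (ACIO, newIORefACIO, newEmptyMVarACIO, runACIO) where

  import Data.IORef
  import Control.Concurrent.MVar

  -- ACIO = affine central IO; constructor deliberately not exported
  newtype ACIO a = ACIO (IO a)
    deriving (Functor, Applicative, Monad)

  newIORefACIO :: a -> ACIO (IORef a)
  newIORefACIO = ACIO . newIORef

  newEmptyMVarACIO :: ACIO (MVar a)
  newEmptyMVarACIO = ACIO newEmptyMVar

  -- only the RTS / top level binding mechanism should ever run one:
  runACIO :: ACIO a -> IO a
  runACIO (ACIO io) = io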
Re: [Haskell-cafe] Re: poll: how can we help you contribute to darcs?
Brandon S. Allbery KF8NH wrote: (Hm, I feel a blog rant coming on.)

I take it you mean as the perfect example of how to kill off interest in your own project :-) I can't help thinking this was so obviously going to happen that it must have been David's intent at the time he wrote that announcement. The only doubt I have is whether he was just in a grumpy mood at the time and now regrets this, or whether he still feels this way.

To answer the OP's question, when I first looked at darcs I was quite enthusiastic about it and did indeed consider hacking on it. But a few hours browsing the source code made me realise I would never be able to do this easily because of the lack of decent documentation of the source. It was literate Haskell (which I dislike anyway) and the literate commentary that I could see told me practically nothing about the code I was actually looking at. Instead it was just the markup that would eventually become the user guide (presumably). So as well as having no useful documentation (for a would-be contributor) I had to make the extra mental effort of deliberately avoiding reading the distracting and utterly irrelevant literate commentary clutter. It would have been easier if there were no comments at all. The net result was that I couldn't even figure out what the various functions I was looking at were trying to do (and hence whether or not they might be the source of the bug or performance bottleneck or whatever). Put simply, (as someone else observed earlier) it seemed to me like it was written and organised so as to be unmaintainable by anyone other than David himself.

That said, I think by the normal standards of OS projects the darcs *user* documentation is very good, but for would-be hackers this is not enough (decent source documentation is needed too IMO). I also think Neil's idea of breaking darcs up from 1 monolithic prog to a darcs lib suite is a good idea. This would give decent haddock documentation for most of the code base and an easy way to have multiple user interfaces (gui/web/command line based).

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: Haskell on ARM (was Re: ANN: Topkata)
Hello Jeremy,

Jeremy Apthorp wrote: Next year I'll be working on a project for my undergraduate computing course at UNSW that will involve getting GHC to target the Nintendo DS. It'll require cross-compilation, because the DS isn't powerful enough to actually run GHC (4M main ram and a 66MHz processor). It'll also require that I significantly strip down the runtime system, as the current RTS won't fit in 4M and leave any left over for the main application.

Maybe one of these would help (running Linux) .. http://www.iyonix.com/

It'd be good to have native code generator support for ARM (not via C). Many years ago I tinkered with implementing a lazy FPL on ARM (on my Acorn Risc PC). I never even started the compiler but got a basic single threaded RTS and mark-sweep-compact garbage collector up and running (written in ARM assembler). It wasn't really usable for real programs though as I had to write my function definitions as comments and actually implement the graph reduction code by hand in assembler :-) But I remember the ARM instruction set had some really useful features that made things like checking for stack-heap collision cheap, and if you got the register allocation right (which is not hard on the ARM) you could construct the overwhelming majority of heap records using a single STMIA instruction. Almost seemed like its instruction set was designed for efficient FPL implementation :-)

Regards -- Adrian Hey
Re: [Haskell-cafe] Implementing ParseChart with Data.Map
Hello Krasimir,

Krasimir Angelov wrote: Hi, I have to write a ParseChart implementation with Data.Map/Set. The chart is a type like this:

  type Chart k v = Map k (Set v)

now I need an operation like:

  insert :: k -> v -> Chart k v -> Maybe (Chart k v)

where the result is (Just _) if the (k,v) is actually added to the chart or Nothing if it was already there and nothing has to be done. The straightforward implementation is:

  case Map.lookup k chart of
    Nothing  -> Just (Map.insert k (Set.singleton v) chart)
    Just set | Set.member v set -> Nothing
             | otherwise        -> Just (Map.insert k (Set.insert v set) chart)

You can do this quite easily with the AVL library, something like this (untested code)

  import Data.COrdering
  import Data.Tree.AVL

  type Chart k v = AVL (k, AVL v)

  insert :: (Ord k, Ord v) => k -> v -> Chart k v -> Maybe (Chart k v)
  insert k v tk = case genOpenPathWith cmpk tk of
    EmptyBP pthk    -> Just $! insertPath pthk (k, singleton v) tk
    FullBP  pthk tv -> case genOpenPath (compare v) tv of
      EmptyBP pthv -> let tv' = insertPath pthv v tv
                      in  tv' `seq` (Just $! writePath pthk (k, tv') tk)
      FullBP  _ _  -> Nothing
    where cmpk (k',tv) = case compare k k' of
            LT -> Lt
            EQ -> Eq tv
            GT -> Gt

..or something like that (maybe you don't want all that strictness). The insertPath/writePath functions do involve a second traversal but do not repeat all the comparisons. Also, provided not too much has happened in between, they should be very fast as the nodes on the path are probably still in cache. The important thing is that in the case where Nothing is returned you'll have burned very little heap.

Regards -- Adrian Hey
Re: [Haskell-cafe] Trouble compiling collections-0.3 (from Hackage)
ccing Haskell Cafe in case anyone else is interested in my answer..

Bryan O'Sullivan wrote: Adrian Hey wrote: I don't think anyone is interested in working on this or maintaining it, so it's probably best not to use it for new stuff. If nobody has stepped up yet, I'll take it over. It would be a shame to see it bitrot.

I think the main problem is that the collections package contained a number of different sub-projects that should really be separated out (and indeed have been to some extent), as they were progressing at rather different rates and it was never clear when would be a good time to do a new release of collections as a whole, so it was never done.

There's the abstract collections class APIs which were really what Jean-Philippe Bernardy was working on. I'm not sure if he's still interested in that so you'd better ask him. I think he's more interested in Yi these days :-)

Then there's the AVL tree lib which has been put in hackage as a separate package already. I kinda washed my hands of this too, but I guess I'd better change my mind about that, as if I don't maintain it I don't think anyone else will.

Then there's the relatively trivial AVL based Data.Set/Map clones which aren't in hackage yet but I guess could be put there without too much work.

Then there's all the generalised trie stuff which has stalled for the time being, at least until type families are available. You'll probably see some more action on this in the summer as it's the subject of Jamie Brandon's GSoC project.

Regards -- Adrian Hey
Re: [Haskell-cafe] lookup tables style guidelines
Jan-Willem Maessen wrote: Just to clarify: divide and conquer splits one tree on the root value of the other (possibly avoiding enforcing the balance metric until after joining trees, though not obvious how / if that's useful)? The definition of divide and conquer on trees without a fixed structure is rather unclear, which is why the question comes up in the first place.

The divide and conquer algorithm presented in the Adams paper treats the two trees differently, depending on size. The algorithm used in the AVL lib doesn't do this, it treats them both the same. You split each tree by the root of the other tree, then do the sub-unions on the three resulting ranges, then join these using the 2 original roots as bridging elements. This seems to give better results, and also aesthetically (an important consideration) seems more natural. With AVL I don't think there's really anything to be gained by not balancing intermediate trees, as balancing costs practically nothing and has obvious advantages.

Still it's not perfect. If the two trees are about the same size this still seemed to cost about 20% more comparisons than a noddy list merge union would. It's a big win over lists if there's a substantial difference in tree sizes though (and a big win over Hedge in all cases I think). It would be nice if someone could do a good theoretical analysis of the performance of these algorithms. I didn't because my maths wasn't good enough. I just chose algorithms empirically to minimise comparison counts (not execution times), which is the right thing to do for polymorphic implementations.

Regards -- Adrian Hey
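[For readers who haven't seen split-based divide and conquer union, here is its flavour in the simplest asymmetric form, on plain unbalanced BSTs. The symmetric both-trees-treated-alike variant described above splits each tree by the other's root and bridges with both roots; this sketch also ignores rebalancing entirely, which a real AVL implementation obviously must not:]

  data Tree a = Tip | Node (Tree a) a (Tree a)

  -- split t at pivot k: everything < k, everything > k (k itself dropped)
  split :: Ord a => a -> Tree a -> (Tree a, Tree a)
  split _ Tip = (Tip, Tip)
  split k (Node l x r) = case compare k x of
    LT -> let (ll, lr) = split k l in (ll, Node lr x r)
    GT -> let (rl, rr) = split k r in (Node l x rl, rr)
    EQ -> (l, r)

  union :: Ord a => Tree a -> Tree a -> Tree a
  union Tip t = t
  union t Tip = t
  union (Node l1 x r1) t2 =
    let (l2, r2) = split x t2
    in  Node (union l1 l2) x (union r1 r2)   -- x bridges the two sub-unions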
Re: [Haskell-cafe] Trouble compiling collections-0.3 (from Hackage)
David F. Place wrote: Thanks for the tip. I edited the collections.cabal file to add bytestring to the Build-depends. It also needed containers and array! The build gave streams of warnings and finally failed, so I guess this library is not ready for use.

I don't think anyone is interested in working on this or maintaining it, so it's probably best not to use it for new stuff.

Regards -- Adrian Hey
Re: [Haskell-cafe] lookup tables style guidelines
Jan-Willem Maessen wrote: On Apr 24, 2008, at 11:33 AM, Adrian Hey wrote: Also, if you're likely to be using union/intersection a lot you should know that Data.Map/Set are very slow for this because they use the not efficient hedge algorithm :-) OK, I'm going to bite here: What's the efficient algorithm for union on balanced trees, given that hedge union was chosen as being more efficient than naive alternatives (split and merge, repeated insertion)? My going hypothesis has been hedge union is an inefficient algorithm, except that it's better than all those other inefficient algorithms.

Divide and conquer seems to be the most efficient, though not the algorithm presented in the Adams paper. The hedge algorithm performs many more comparisons than are needed, which is obviously bad if you don't know how expensive those comparisons are going to be. IIRC it was something like 4..5 times as many for 2 sets of a million or so random Ints. But even in favourable circumstances (tree elements are boxed Ints) divide and conquer on AVL trees seemed much faster than hedge on Data.Set. Of course ideally we would want implementations of hedge for AVL and divide and conquer for Data.Set too, but I didn't feel inclined to write them.

Regards -- Adrian Hey
Re: [Haskell-cafe] lookup tables style guidelines
Ketil Malde wrote: Don Stewart [EMAIL PROTECTED] writes: 1) what is the most performant lookup table/hashtable/dictionary solution for Haskell? Data.IntMap is awfully good. Is it benchmarked anywhere? Compared to the Judy bindings, or Adrian Hey's AVL trees, or Data.Hashtable?

I must get around to putting the AVL clones of Data.Map/Set in Hackage sometime. Meanwhile anyone wanting to roll their own maps with an API of their choosing could do a lot worse than the raw AVL lib.. http://hackage.haskell.org/cgi-bin/hackage-scripts/package/AvlTree-3.1

Also, if you're likely to be using union/intersection a lot you should know that Data.Map/Set are very slow for this because they use the not efficient hedge algorithm :-)

There's a really cool demo I found from wikipedia that shows why it is that AVL trees perform so well in the pathological situation of sorted insertions.. http://www.csi.uottawa.ca/~stan/csi2514/applets/avl/BT.html If you try it you can see that after 2^n-1 sorted insertions you always get a perfectly balanced tree. This explains these benchmark results.. http://groups.google.co.uk/group/comp.lang.functional/msg/74a422ea04ff1217 DData is what became Data.Map/Set and it would appear to be the worst performing of the four tree types tested there :-(

Don is right about Data.IntMap/IntSet. For Ints it will be much faster than Data.Map/Set or even a (polymorphic) AVL tree. But an AVL tree of unboxed Ints gives faster lookup than IntMap/Set and about the same insert/delete times.. http://www.haskell.org/pipermail/libraries/2005-October/004518.html Union and intersection times for AVL aren't so great, but I think I know what to do about that (especially intersection).

But the real way ahead has to be tries for non-trivial keys and (I suspect) AVL trees of unboxed Ints for simple keys (serialisable as 1 machine word). This is what that GSoC project is all about. At the moment we have the exact opposite: tries for Ints and balanced trees for non-trivial keys. Oh well..

Regards -- Adrian Hey
[Haskell-cafe] Re: [Haskell] ANNOUNCE: Generic Haskell 1.80 (Emerald)
Thomas van Noort wrote: As you already noticed, there is no Windows binary available for the Emerald release. However, there is one for the Coral release, available from: http://www.generic-haskell.org Although this is an old release of Generic Haskell, this release already supports generic types, which is what you need for your project probably. In the user's guide, there is a small example available which defines a generic type to represent tries.

Thanks, but this won't register itself as a package with current ghc, and even if I fixed that problem I guess it probably still wouldn't work :-( It's entirely up to you folks of course. I don't know if anyone at GHHQ cares enough to do anything about the buildability problem. But if not I'm afraid I can't see this being used or accepted as part of standard Haskell infrastructure. It'll just be something people look at, admire the coolness, maybe even tinker with for a while, but never really use for anything serious. Meanwhile I think the GSoC project I mentioned will have to make other arrangements :-)

To be honest though, I'm not sure we'd use it anyway as I understand it generates essentially pure H98, and there are ghc extensions we'd probably want to use for performance reasons (like unboxed Ints and unboxed tuples).

Regards -- Adrian Hey
[Haskell-cafe] Re: Libraries need a new owner
Don Stewart wrote: That said, I know that type families are provisionally available, so maybe doing something with generalised tries might be possible. I don't mind mentoring anyone who wants to do something with any of this. Great! Would you like to revise the Soc ticket, with this information? Getting some usable generalised tries available would be a great result.

Hello Don, I created a new Generalised Tries specific ticket.. http://hackage.haskell.org/trac/summer-of-code/ticket/1560 Do I need to do more? (not really sure how the whole process works). Who decides if proposals are good/ok/bad? Not me I assume :-)

Regards -- Adrian Hey
[Haskell-cafe] Re: Libraries need a new owner
Don Stewart wrote: ahey: Hello Folks, As some of you will be aware, I have been working on various Map implementations that currently live here.. http://code.haskell.org/collections/collections-ghc6.8 The libs in question being Data.Tree.AVL, Data.Trie.General and a few other bits like Data.COrdering and the AVL based Data.Map/Set clones. Well, I have decided to stop work on these. So they need a new owner if they're going to go anywhere. If anyone is interested in the job then I suggest they contact myself or Jean-Philippe Bernardy. Of course I will be happy to provide any help or advice anyone who takes over these libs may feel they need from me. I might even contribute a few patches from time to time myself :-) Thanks

How about we propose this work be done for Summer of Code. I've created a ticket here: http://hackage.haskell.org/trac/summer-of-code/ticket/1549 Add comments, or if you can mentor, add that information too! Let's get a new faster Data.Map and other containers ready to go by the end of the northern summer?

Hello Don, I'm not sure what you're proposing as the SOC project, but I don't think getting AVL based Data.Map/Set clones into Hackage is particularly suitable or challenging. This work is 99% done and also quite boring. It could be finished by the end of today if I set my mind to it :-)

There are 3 significant things that really need doing IMO.

1- Try to reconcile the apparent differences between the collections package and Edison class APIs. I don't really understand either myself, but having multiple classes for the same things makes both implementors' and test suite writers' lives harder. The generalised trie class GT should still be kept separate as it needs some weird class methods in order to make new instances from old, and can't really be finalised until this type families stuff is available anyway.

2- Write a polymorphic test and benchmarking suite for sets, maps, sequences etc. This would be really useful and is something that could reasonably be done as a SOC project. But it may also be a little boring :-( This could be based on classes defined as a result of 1 (above), or failing that the author would have to define yet another set of class APIs for test/benchmarking only. This may be the simpler approach as it doesn't assume anything about Edison or collections abstractions (it's just a way of testing concrete data structure implementations).

3- Produce some way of automatically deriving (efficient) generalised trie (GT) instances for user defined types. The API is huge and complex (and likely to get bigger still), so hand writing instances really isn't realistic in the long run. But this may be a bit premature for SOC as the GT class API itself is not yet complete or stable, and probably won't be until type families are available (and tested and documented and someone takes the trouble to finish it). So maybe this is something for next year's SOC?

That said, I know that type families are provisionally available, so maybe doing something with generalised tries might be possible. I don't mind mentoring anyone who wants to do something with any of this.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Dan Weston wrote: 6.3.2 (The Ord Class): The Ord class is used for totally ordered datatypes. This *requires* that it be absolutely impossible in valid code to distinguish equivalent (in the EQ sense, not the == sense) things via the functions of Ord. The intended interpretation of these functions is clear and can be taken as normative:

  forall f . (compare x y == EQ and (f x or f y is defined)) ==> f x == f y

Thanks Dan. I didn't grasp the significance of this at first, but I believe you are correct. But maybe it should be = not == in the last line?

  forall f . (compare x y == EQ and (f x or f y is defined)) ==> f x = f y

So assuming your (and my) logic is correct, the existing report text does indeed settle the original dispute that sparked this thread. Essentially you can't have 2 distinct values that compare equal, so if they do then they must be indistinguishable? Is that right?

So there is no need for the sort on a list of elements whose type is an instance of Ord to be stable, as the difference between the results of a stable and unstable sort cannot be observable for any (correct) Ord instance (assuming the instance's compare method was used to perform the sort). So if we have a compare method on this type we can establish the == method:

  x == y = case compare x y of
             EQ -> True
             _  -> False

and from this it follows that (x == y) = True implies x and y are indistinguishable. So I believe for types that are instances of both Eq and Ord, this settles the question of what (x == y) = True implies.

So now I'm wondering what about types that are instances of Eq but not of Ord? Well from para. 6.3.1: The Eq class provides equality (==) and inequality (/=) methods. Well I guess assuming that saying two values are equal is another way of saying they are indistinguishable, then I think it's pretty clear what the report is saying. This interpretation also ensures consistency between Eq/Ord instances and Eq only instances.

Assuming this is all correct, I think I can sleep easier now I can forget about all this things being equal and not equal at the same time craziness, at least for Eq/Ord instances that are compliant with the standard (which are the only ones I care about). I think anyone wanting standard classes with different mathematical properties should define them, stick them in Hackage and propose them for Haskell-prime (if that's still happening?)

Regards -- Adrian Hey
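[The full law quantifies over all functions f, so it isn't directly testable, but the compare/(==) agreement it relies on is. A quick sketch of the checkable fragment, assuming QuickCheck:]

  import Test.QuickCheck

  -- For any purported Ord instance, compare and (==) must agree on equality:
  prop_compareAgreesWithEq :: Ord a => a -> a -> Bool
  prop_compareAgreesWithEq x y = (compare x y == EQ) == (x == y)

  -- e.g. quickCheck (prop_compareAgreesWithEq :: Int -> Int -> Bool)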
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
[EMAIL PROTECTED] wrote: G'day all. Adrian Hey wrote: This might be a reasonable thing to say about *sortBy*, but not sort as the ordering of equal elements should not be observable (for any correct instance of Ord). It should be impossible to implement a function which can discriminate between [a,a],[a,b],[b,a],[b,b] if compare a b = EQ. Nonsense. Consider a Schwartzian transform wrapper:

  data OrdWrap k v = OrdWrap k v

  instance (Ord k) => Ord (OrdWrap k v) where
    compare (OrdWrap k1 v1) (OrdWrap k2 v2) = OrdWrap k1 k2

I take it you mean something like ..

  instance Ord k => Ord (OrdWrap k v) where
    compare (OrdWrap k1 v1) (OrdWrap k2 v2) = compare k1 k2

Where's the Eq instance for OrdWrap? This may or may not satisfy the law: (compare a b) = EQ implies (a == b) = True. I think everybody agrees about that, but I can't tell from the code you've posted if it does in this case.

What's disputed is whether or not this law should hold: (a == b) = True implies a = b. Again, I can't tell if it does or not in this case, but I assume the point of your post is that it doesn't. AFAICT the report is ambiguous about this, or at least the non-intuitive equality semantics are not at all clear to me from what I can see in the Eq class definition (para 6.3.1). I think in the absence of any clear and *explicit* statement to the contrary people are entitled to assume this law is mandatory for all (correct) Eq instances.

It would be incorrect (and not sane) for sort [a,b] to return [a,a] in this case, though a case could be made that either [a,b] or [b,a] make sense.

Quoting Jules Bean [EMAIL PROTECTED]: Stability is a nice property. I don't understand why you are arguing against this so aggressively. Stability is an occasionally very useful property. However, if there is a tradeoff between stability and performance, I'd prefer it if the library didn't choose for me.

Well I hope you or anyone else hasn't used Data.Map with OrdWrap keys, because if so it's likely that the code has either been broken in the past, or is broken now (not sure which). But the equality semantics some people seem to want seem to me like a very good way to guarantee that similar bugs and ambiguities will occur all over the place, now and forever.

Regards -- Adrian Hey
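[For concreteness, the missing Eq instance the example presumably intends (an assumption, since the original post omitted it). With it, two OrdWraps can be == while still being distinguishable via the v field, which is exactly the disputed point:]

  instance Eq k => Eq (OrdWrap k v) where
    OrdWrap k1 _ == OrdWrap k2 _ = k1 == k2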
Some clarity please! (was Re: [Haskell-cafe] Re: (flawed?) benchmark : sort)
I wouldn't dispute that the default definition is reasonable, but it's certainly not clear to me from the report that it's something that I can safely assume for all Ord instances. In fact AFAICS the report is quite clearly telling me *not* to assume this. But I have to assume *something* for maximum to make sense, so I guess that must be: (x==y) = True implies x=y. IOW I have no idea if it's the first or last maximum that is returned, but who cares?

Again, the report doesn't make it clear that the (==) law above holds (at least not on page 82). But I think in the absence of any explicit statement to the contrary most programmers would assume that it does apply. I think this is quite reasonable and I have no intention of changing my programming habits to cope with weird instances for which (x == y) = True does not imply x=y, or max x y is not safely interchangeable with max y x.

I'm not saying some people are not right to want classes with more mathematically inspired laws, but I see nothing in the report to indicate to me that Eq/Ord are those classes and consequently that the naive programmer's interpretation of (==) is incorrect. Rather the contrary in fact.

Regards -- Adrian Hey

Aaron Denney wrote: On 2008-03-12, Adrian Hey [EMAIL PROTECTED] wrote: Aaron Denney wrote: On 2008-03-11, Adrian Hey [EMAIL PROTECTED] wrote: Having tried this approach myself too (with the clone) I can confirm that *this way lies madness*, so in future I will not be making any effort to define or respect sane, unambiguous and stable behaviour for insane Eq/Ord instances for any lib I produce or hack on. Instead I will be aiming for correctness and optimal efficiency on the assumption that Eq and Ord instances are sensible.

Good. But sensible only means that the Eq and Ord instances agree, not that x == y => f x == f y.

So can I assume that max x y = max y x?

No. You can, however, assume that max x y == max y x. (Okay, this fails on Doubles, because of NaN. I'll agree that the Eq and Ord instances for Double are not sane.)

If not, how can I tell if I've made the correct choice of argument order.

When calling, or when defining max? It depends on what types you're using, and which equivalence and ordering relations are being used. When calling, and when it might matter which representative of an equivalence class comes back out (such as in sorts), you have no choice but to look at the documentation or implementation of max. The Haskell report guarantees that x == y => max x y = y (and hence max y x = x), and the opposite choice for min. This is to ensure that (min x y, max x y) = (x,y) or (y,x). IOW, the report notices that choice of representatives for equivalence classes matters in some circumstances, and makes it easy to do the right thing. This supports the reading that Eq a is not an absolute equality relation, but an equivalence relation.

If I can't tell then I guess I have no alternative but to document my arbitrary choice in the Haddock, and probably (for the sake of completeness) provide 2 or more alternative definitions of the same function which use a different argument order.

When defining max, yes, you must take care to make sure it is usable for cases when Eq is an equivalence relation, rather than equality. If you're writing library code, then it won't generally know whether Eq means true equality rather than equivalence. If this would let you optimize things, you need some other way to communicate this. The common typeclasses are for generic, parameterizable polymorphism.
Equivalence is a more generally useful notion than equality, so that's what I want captured by the Eq typeclass. And no, an overloaded sort doesn't belong in Ord, either. If the implementation is truly dependent on the types in non-trivial, non-substitutable ways (i.e. beyond a substitution of what = means), then they should be /different/ algorithms. It would be possible to write an Equal a typeclass, which does guarantee actual observable equality (but has no methods). Then you can write one equalSort (or whatever) of type

  equalSort :: (Eq a, Ord a, Equal a) => [a] -> [a]

that will work on any type willing to guarantee this, but rightly fail on types that only define an equivalence relation. A stable sort is more generally useful than an unstable one. It's composable for radix sorting on compound structures, etc. Hence we want to keep this guarantee.
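[A sketch of that Equal suggestion, to make it concrete; the names are assumed from the post, not from any actual library:]

  import Data.List (sortBy)

  -- A promise, not an interface: instances assert that (==) is observable
  -- equality, i.e. x == y really does mean x and y are interchangeable.
  class Eq a => Equal a

  instance Equal Int
  instance Equal Char

  -- Free to use any algorithm, stable or not, since equal really means equal:
  equalSort :: (Ord a, Equal a) => [a] -> [a]
  equalSort = sortBy compare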
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Luke Palmer wrote: On Thu, Mar 13, 2008 at 1:00 AM, Adrian Hey [EMAIL PROTECTED] wrote: AFAICT the report is ambiguous about this, or at least the non-intuitive equality semantics are not at all clear to me from what I can see in the Eq class definition (para 6.3.1). I think in the absence of any clear and *explicit* statement to the contrary people are entitled to assume this law is mandatory for all (correct) Eq instances.

In mathematics we usually *don't* assume things that aren't stated assumptions.

But the trouble is the report says practically *nothing* about the Eq class or what the (==) operator means. It all seems to be assumed, and even when it does talk about it informally it talks about equality, not equivalence or some other word. The report doesn't state that for all Ints, (x==y) = True implies that x=y. There's no reason to suppose the Int instance is in any way special, so do you really seriously consider the possibility that this might not hold in your Int related code?

  if (x==y) then f x else g x y

might not mean the same as..

  if (x==y) then f y else g x y

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Luke Palmer wrote: On Thu, Mar 13, 2008 at 3:02 AM, Adrian Hey [EMAIL PROTECTED] wrote: The report doesn't state that for all Ints, (x==y) = True implies that x=y. There's no reason to suppose the Int instance is in any way special, so do you really seriously consider the possibility that this might not hold in your Int related code? if (x==y) then f x else g x y might not mean the same as.. if (x==y) then f y else g x y Of course not :-). However, on what grounds am I to assume that these two will be semantically equivalent for instances other than Int?

Umm.. Maybe the fact that you're using the == method from the Eq class, not some Int specific isIntEqual function? :-)

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Aaron Denney wrote: so do you really seriously consider the possibility that this might not hold in your Int related code? if (x==y) then f x else g x y might not mean the same as.. if (x==y) then f y else g x y In Int code, of course not, because I know the types, and I know the behaviour of (==) on Ints. But f is specialized to work on Ints, isn't it, so it's reasonable to know what semantics (==) has for Ints, and depend on them?

Why are Ints special in this way? Couldn't you say exactly the same about any type (just substitute type X of your choice for Int)? IMO if you're going to define a type X which is intended to be an Eq instance you should always ensure, one way or another, that all exposed primitives that operate on that type respect equality, as defined by the instance's == method (and hence more complex functions built on those primitives do too). Just MO, the report doesn't make this clear one way or another AFAICS.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
[EMAIL PROTECTED] wrote: What's disputed is whether or not this law should hold: (a == b) = True implies a = b. Apart from possibly your good self, I don't think this is disputed.

If that's supposed to imply you think I'm in a minority of one, I don't think you've been following this thread very well. Even the report uses the word equality in the prose. And as I pointed out in another post, even the standard library maximum function appears to be ambiguous if the law doesn't hold. It can be disambiguated if Aaron's max law holds: (a == b) = True implies max a b = b. But this is only true for the *default* max implementation. One of the few explicit things the report does say on these matters is that the default methods should *not* be regarded as definitive.

Besides, there are good pragmatic safety and performance reasons why Haskell should provide at least one class that offers strong guarantees regarding equality and the (==) operator. If that class isn't Eq, then where is it?

The (==) law holds for:
1- All standard Eq instances
2- All wholly derived Eq instances
3- Most hand defined instances I suspect.

..and has almost certainly been implicitly assumed by heaven knows what extant code (some of it in the standard libraries I suspect). So I think that we should recognise that this was the original intent for the Eq class and this should be made official, albeit retrospectively. If there's a need for a similar class where the (==) law doesn't hold that's fine. But please don't insist that class must be Eq.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Denis Bueno wrote: On Tue, Mar 11, 2008 at 4:01 AM, Adrian Hey [EMAIL PROTECTED] wrote: and sorting is meant to be a permutation, so we happily have the situation where this has a correct answer: 2. Anything else is incorrect. Isn't 3 also a permutation? Why is it incorrect? Because it is not stable. The documentation for Data.List.sort says the sort is stable: The sort function implements a stable sorting algorithm. A stable sort respects the order of equal elements as they occur in the input list.

This might be a reasonable thing to say about *sortBy*, but not sort, as the ordering of equal elements should not be observable (for any correct instance of Ord). It should be impossible to implement a function which can discriminate between [a,a],[a,b],[b,a],[b,b] if compare a b = EQ.

So really I think the docs have this backwards. It's sortBy that implements a stable sort (assuming a suitably sane comparison function I guess) and apparently sort is whatever you get from (sortBy compare). But this is unduly restrictive on possible correct sort implementations IMO.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Ketil Malde wrote: Adrian Hey [EMAIL PROTECTED] writes: So really I think the docs have this backwards. It's sortBy that implements a stable sort (assuming a suitably sane comparison function I guess) and apparently sort is whatever you get from (sortBy compare). But this is unduly restrictive on possible correct sort implementations IMO. Somebody (maybe you?) suggested that 'sort' should be a function in class Ord, giving the implementer freedom to decide exactly what is optimal for that particular data type. Could this also be used to solve the Data.Map issue? I mean, could Data.Map.insert use 'sort' instead of 'compare' to place new elements?

I don't really think so. To be consistent people would have to do this all over the place, not just in Data.Map/Set. Anywhere where you have code like this (for Ord instances)

  if (x==y) then f x   -- f y should be just as good!
            else g x y

you'd now have to have something like..

  if (x==y) then f (head (sort [x,y]))
            else g x y

Also, since the problem is with the concept of equality, in that we're now admitting that things can be equal but also not equal at the same time, then choice should really be a method of the Eq class.. Something like..

  -- Returns Nothing if args are not equal,
  -- Just p if args are equal, where p is the preferred equal value
  choose :: Eq a => a -> a -> Maybe a

Like I said, this way lies madness!!

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Jules Bean wrote: Adrian Hey wrote: This might be a reasonable thing to say about *sortBy*, but not sort as the ordering of equal elements should not be observable (for any correct instance of Ord). It should be impossible to implement a function which can discriminate between [a,a],[a,b],[b,a],[b,b] if compare a b = EQ. The fact that you can't implement a function to differentiate does not mean the difference is not important. [b,a] might cause 500G of file IO which [a,b] will not cause.

I can't imagine why, unless this is some weird side effect of lazy IO (which I thought was generally agreed to be a bad thing). What if it's the [a,b] ordering that causes this but the [b,a] ordering that doesn't? The choice is arbitrary (for sort), but neither is obviously correct.

This is not observable within haskell, but is very observable to the user.

Yes, there are plenty of things like space and time behaviour that are not observable in the semantic sense, but have real observable consequences in the practical sense of real executables running on real machines. But constraints like this and Data.Set/Map left biasing often mean that implementations have to be made unnecessarily time and space *inefficient* for no good semantic reason.

Stability is a nice property. I don't understand why you are arguing against this so aggressively.

I'm not arguing against it for sortBy. I wouldn't even object to the existence of an overloaded..

  stableSort = sortBy compare

by definition. I am arguing against it for the default sort that is applied to all types, because for many types there will be more efficient alternatives which are perfectly correct in the semantic sense, but don't respect the (semantically meaningless IMO for Ord instances) stability property. Of course the proper place for this hypothetical sort (and several other variations) is probably as an Ord class method, not a single overloaded function in Data.List. I would also regard any use of stableSort (in preference to the hypothetical unstable overloaded sort) with about the same degree of suspicion as any use of unsafePerformIO.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Remi Turk wrote: I wouldn't bet on it either:

  Prelude> 0.0 == -0.0
  True
  Prelude> isNegativeZero 0.0 == isNegativeZero (-0.0)
  False

Although isNegativeZero might be considered a "private, internal interface that exposes implementation details."

Interesting example. So is the correct conclusion from this that all (polymorphic) code that assumes (x == y) = True implies x=y is inherently broken, or is it just this particular Eq instance that's broken?

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Aaron Denney wrote: On 2008-03-11, Adrian Hey [EMAIL PROTECTED] wrote: Having tried this approach myself too (with the clone) I can confirm that *this way lies madness*, so in future I will not be making any effort to define or respect sane, unambiguous and stable behaviour for insane Eq/Ord instances for any lib I produce or hack on. Instead I will be aiming for correctness and optimal efficiency on the assumption that Eq and Ord instances are sensible. Good. But sensible only means that the Eq and Ord instances agree, not that x == y => f x == f y.

So can I assume that max x y = max y x? If not, how can I tell if I've made the correct choice of argument order. If I can't tell then I guess I have no alternative but to document my arbitrary choice in the Haddock, and probably (for the sake of completeness) provide 2 or more alternative definitions of the same function which use a different argument order.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Jonathan Cast wrote: On 10 Mar 2008, at 4:00 AM, Adrian Hey wrote: Neil Mitchell wrote: 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. That would be wrong. Consider:

  data Foo = Foo Int Int

  instance Ord Foo where
    compare (Foo a _) (Foo b _) = compare a b

I would consider such an Ord instance to be hopelessly broken, and I don't think it's the responsibility of authors of functions with Ord constraints in their sigs (such as sort) to consider such possibilities or specify and control the behaviour of their behaviour for such instances. Trying to do this is what leads to horrors such as the left biasing of Data.Map (for example).

Data.Map is implicitly using such an Ord instance behind the scenes, and I think it has to to maintain its own invariants. If I take the `union' of two maps that take the same key to different values, I have to decide which value to use, even if every Ord instance supplied by my clients is 100% Adrian-compliant.

The biasing policy for Data.Map/Set is referring to (Set) elements, or (Map) keys, not the associated values (in a Map). So during an insertion op, if an equal element/key is found in the Set/Map, the biasing policy tells me which of the two equal elements/keys is incorporated in the resulting Set/Map. So there's an implied acceptance of the possibility that the choice is significant and that the two elements/keys can be both equal and not equal at the same time. This is crazy IMO. Implementors should not have to offer any guarantees about this and should be perfectly free to make their own unspecified choice regarding which of two equal values is used in any expression (on space efficiency grounds say). For example, the left biasing of insertion on Data.Set forces the implementation to burn O(log n) heap space creating a new equal set, even if the set already contains an old element that is equal to the new element. I would argue that in this situation it should be perfectly correct to simply return the old set instead.

Regards -- Adrian Hey
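[To see why biasing is observable at all, here is a small sketch using exactly the kind of Ord instance being complained about: compare ignores the payload field. Which payload survives depends entirely on the library's bias policy, not on anything in the Ord instance, so treat the printed result as implementation-defined:]

  import qualified Data.Set as Set

  data P = P Int String deriving Show

  instance Eq P where
    P a _ == P b _ = a == b

  instance Ord P where
    compare (P a _) (P b _) = compare a b

  main :: IO ()
  main = do
    let s = Set.insert (P 1 "new") (Set.fromList [P 1 "old"])
    -- prints either [P 1 "old"] or [P 1 "new"], depending on the bias policy
    print (Set.toList s)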
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Neil Mitchell wrote: 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. That would be wrong. Consider:

  data Foo = Foo Int Int

  instance Ord Foo where
    compare (Foo a _) (Foo b _) = compare a b

I would consider such an Ord instance to be hopelessly broken, and I don't think it's the responsibility of authors of functions with Ord constraints in their sigs (such as sort) to consider such possibilities or specify and control the behaviour of their behaviour for such instances. Trying to do this is what leads to horrors such as the left biasing of Data.Map (for example).

Unfortunately the Haskell standards don't currently specify sane laws for Eq and Ord class instances, but they should. Otherwise knowing a type is an instance of Ord tells me nothing that I can rely on.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Adrian Hey wrote: or specify and control the behaviour of their behaviour for such instances.

Urk, sorry for the gibberish. I guess I should get into the habit of reading what I write before posting :-)

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Neil Mitchell wrote: Hi instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b I would consider such an Ord instance to be hopelessly broken, and I don't think it's the responsibility of authors of functions with Ord constraints in their sigs (such as sort) to consider such possibilities or specify and control the behaviour of their behaviour for such instances. Trying to do this is what leads to horrors such as the left biasing of Data.Map (for example). The sort in Haskell is specified to be stable. What that means is that the above ord relation is fine. The above ordering observes all the necessary mathematical definitions of ordering, assuming an Eq of:

  instance Eq Foo where
    (==) (Foo a _) (Foo b _) = (==) a b

Unfortunately the Haskell standards don't currently specify sane laws for Eq and Ord class instances, but they should. Otherwise knowing a type is an instance of Ord tells me nothing that I can rely on. Please give the sane law that this ordering violates. I can't see any!

The Eq instance you've given violates the law that (x == y) = True implies x = y. Of course the Haskell standard doesn't specify this law, but it should. The Haskell standard doesn't even specify that compare x y = EQ implies (x == y) = True, but again it should (what's the purpose of the Eq constraint on the Ord class otherwise).

What if I had made the definition of Foo:

  data Foo = Foo Int (Int -> Int)

Now, is the only possible answer that any Ord instance for Foo is wrong?

Yes, if the Foo constructor is exported. If its scope is confined to one module then you could maintain the invariant that the same function is always associated with a given Int. However, if this is the case then the issue you raise wrt sort behaviour is irrelevant.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Neil Mitchell wrote: Hi The Eq instance you've given violates the law that (x == y) = True implies x = y. Of course the Haskell standard doesn't specify this law, but it should. Wrong. It shouldn't,

Should too

it doesn't,

indeed

and I don't think it even can!

Well we need to be precise about exactly what = means, but informally I guess we're talking about observational equivalence. But seriously, once you admit the possibility that even if x == y it still matters which of x or y is used in expressions, then all hell breaks loose. I shudder to think just how much Haskell code there must be out there that is (at best) ambiguous or just plain broken if this is a serious possibility. Again, I have to cite Data.Map as an obvious example. It's unclear to me exactly what the proper interpretation of left biasing is for all functions in the API. Furthermore, until quite recently some function implementations in Data.Map were actually broken wrt the stated biasing policy (though few actually noticed this for obvious reasons). Perhaps some still are? Who knows..

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Bulat Ziganshin wrote: Hello Adrian, Monday, March 10, 2008, 2:00:18 PM, you wrote: instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b I would consider such an Ord instance to be hopelessly broken, and I h. for example, imagine files in a file manager sorted by size or date

In such cases you should be using sortBy, not the overloaded sort (you have several reasonable orderings for the same record type, say).

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Ketil Malde wrote: Adrian Hey [EMAIL PROTECTED] writes: But seriously, once you admit the possibility that even if x == y it still matters which of x or y is used in expressions then all hell breaks loose. I shudder to think just how much Haskell code there must be out there that is (at best) ambiguous or just plain broken if this is a serious possibility. Just search for copy (on ByteStrings). One program of mine was extracting substrings from a large file. Since the file was pretty huge, I used lazy bytestrings for this purpose. Unfortunately, the short substrings I retained pulled with them rather large chunks from the file -- since a bytestring is essentially a pointer to a chunk, an offset, and a length. The solution is copy, which creates a new string, indistinguishable from within Haskell, but with very different effects on the program.

We're talking about *semantic correctness*, not efficiency. If you want to fine tune your code like this you shouldn't be relying on overloaded general purpose function implementations. E.g. the COrdering type used by the AVL lib is one way to do this. This lets a combining comparison choose which of two equal values is used (and other things). Indeed, one of my main objections to having things like biasing policy as part of a function's specification is that it seriously inhibits you when producing more refined and efficient implementations.

BTW, I noticed this when I was writing my Data.Map clone. Respecting the stated biasing policy resulted in less efficient implementations. It broke my heart to knowingly write code that would slow down 99% of users' code just to keep the 1% who'd defined broken Ord instances happy, so I defined biasing policy differently for the clone. On reflection I think even that was a mistake and is something I intend to drop if I ever do a Hackage release (the lib should not specify any biasing policy whatsoever).

Regards -- Adrian Hey
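[A sketch of the combining comparison idea, modelled on the COrdering shapes visible in the AVL example earlier in this digest (Lt / Eq a / Gt); treat the names and argument order as illustrative rather than the lib's exact API:]

  data COrdering a = Lt | Eq a | Gt

  -- A combining comparison for key/value pairs that keeps the *old* value
  -- when keys collide, so a union need not allocate a replacement:
  keepOld :: Ord k => (k, v) -> (k, v) -> COrdering (k, v)
  keepOld old (k2, _) = case compare (fst old) k2 of
    LT -> Lt
    EQ -> Eq old   -- the caller uses exactly this representative
    GT -> Gt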
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Denis Bueno wrote: On Mon, Mar 10, 2008 at 10:10 AM, Adrian Hey [EMAIL PROTECTED] wrote: The Eq instance you've given violates the law that (x == y) = True implies x = y. Of course the Haskell standard doesn't specify this law, but it should. Unless I'm missing something obvious, the example Neil gave earlier should make it clear how impossible this requirement is: What if I had made the definition of Foo: data Foo = Foo Int (Int -> Int) There is no way in general to decide the observational equivalence of two values of this data type (by reduction to the halting problem). Therefore it is impossible to write any function implementing such an equality test.

Did you read my original response to this example? http://www.haskell.org/pipermail/haskell-cafe/2008-March/040356.html

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Neil Mitchell wrote: Hi We're talking about *semantic correctness*, not efficiency. If you want to fine tune your code like this you shouldn't be relying on overloaded general purpose function implementations. E.g. the COrdering type used by the AVL lib is one way to do this. This lets a combining comparison choose which of two equal values is used (and other things). I would have thought: sort x == sortBy compare x was a reasonable property to have.

Certainly, but this is part of (but not the complete) specification for sortBy, not sort. But given sane Ord/Eq instances and a sane sortBy implementation, then this is indeed also one of many possible correct implementations of sort.

But you are proposing that sort should make assumptions on the compare function,

Not just sort, but any function with an Ord constraint is entitled to assume sane behaviour wrt compare. Without this the Ord class just becomes quite useless, other than saving a few keystrokes for folk who can't be bothered to pass any old compare function as an explicit arg. Surely type classes are supposed to be more than that?

which you can't even state in Haskell,

There are plenty of formal statements about things that can't be written in Haskell. That doesn't mean they aren't true or should not be respected or relied upon. It just means Haskell is an imperfect language for expressing such things.

but sortBy should not.

sortBy should not assume anything about the function of type x -> x -> Ordering. Rather, what sortBy actually does with that function should be specified.

I also know of a data type: data Set xs = Set [xs] where the Set constructor is all nicely hidden. If I define Set ab to be equal to Set ba, should the compiler burst into flames?

??

If we _require_ all Eq definitions to follow our very narrowly defined notion of equality, then the question comes up why we permit people to write Eq at all - why doesn't the compiler just do it for us, if there is only one right answer.

You provided one example yourself..

  data Foo = Foo Int (Int -> Int)

It's perfectly possible for Foo to be an abstract type exported from a module that preserves the invariant that the same function is always associated with a given Int (value). If this is the case there's no reason why Foo should not be an instance of Ord or Eq. If this isn't the case then Foo should certainly not be an instance of either class IMO. If this was intended to be the case but in fact isn't the case, then that's a bug.

Regards -- Adrian Hey
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Krzysztof Skrzętnicki wrote: Ok, my turn now. Let's think about an algorithm that takes an equivalence relation EQ, an ordering relation ORD on the abstraction classes generated by this equivalence (i.e. equivalence classes) and divides given input elements XS into appropriate classes and then prints them out according to the given ordering ORD. If we pose the restriction (let's call it (*)) that the input XS should have at most one element from every abstraction class, we get sorting in the way that you desire. However, if we don't pose that restriction the algorithm still makes perfect sense and is given by the standard library sortBy. I see no reason for restriction (*).

I don't understand the above paragraph. Let's consider a simple example: (sort [a,b]) in the case we have: (compare a b = EQ). Which of the following 4 possible results are correct/incorrect?

  1- [a,a]
  2- [a,b]
  3- [b,a]
  4- [b,b]

I would say they are all correct and equivalent for any sane Ord instance, though from the point of view of space efficiency 1 or 4 are preferable to 2 or 3.

For efficiency reasons there could be an additional class StrictEq. If the type is in that class then we can assume (*) and use a more space-efficient algorithm. Now, the problem with

  treeSort = concatMap (reverse . snd) . Map.toAscList
           . Map.fromListWith (++) . map (\x -> (x,[x]))

is that I can't tell any compact way of implementing treeSortBy in a nice manner. This is because Data.Map does not allow us to provide our own comparison test instead of compare.

As a practical matter, for benchmarking you should also count the actual number of comparisons needed, not just execution times for simple types (Ints presumably). Also, I think you'll find that the AVL lib gives better performance than Data.Map, particularly for sorted inputs. Unfortunately you can't use this implementation in the standard libs without making the AVL lib a standard lib (the same happens to be true of Data.Map too, though this is widely perceived as being standard because of ghc library bundling :-) But actually I would say that if either (both) of these is faster than the standard sort then this is some kind of performance bug with the current ghc release. They weren't faster last time I tested (with Ints).

I also happen to think that sort should be made an Ord class method, so that trie based sorts are possible (which should be faster for complex data types). We should only use sort = sortBy compare as the default.

Regards -- Adrian Hey
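[A sketch of that last suggestion. Haskell's real Ord class can't be modified here, so this uses a standalone class purely to show the shape of the idea; the names are made up:]

  import Data.List (sortBy)

  -- hypothetical: each type may override sort with something smarter
  class Ord a => OrdSort a where
    osort :: [a] -> [a]
    osort = sortBy compare   -- the proposed default: sort = sortBy compare

  -- a type with cheap keys could override this with a radix/trie based sort;
  -- for Int we just accept the default here:
  instance OrdSort Int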
[Haskell-cafe] Gobbler Benchmarks
Hello Folks,

There's been some discussion recently about the pros and cons of various coding styles, particularly whether stack greedy or heap greedy is best, and how (if) ghc's stack management in particular should affect all this. In particular, the problem of implementing an eager take function.

Here are some real numbers measured with ghc 6.8.2 under Windows XP using an AMD Athlon 64 1.8 GHz. The source code can be found here..

http://homepages.nildram.co.uk/~ahey/Test1.zip

There are 4 possible implementations that have been tested:

    -- Uses O(n) stack
    stackGobbler :: Int -> [x] -> [x]
    stackGobbler 0 _      = []
    stackGobbler _ []     = []
    stackGobbler n (x:xs) = let xs' = stackGobbler (n-1) xs
                            in  xs' `seq` (x:xs')

    -- Uses O(n) heap instead, O(1) stack
    heapGobbler :: Int -> [x] -> [x]
    heapGobbler = heapGobbler' []
     where heapGobbler' rxs 0 _      = reverse rxs
           heapGobbler' rxs _ []     = reverse rxs
           heapGobbler' rxs n (x:xs) = heapGobbler' (x:rxs) (n-1) xs

    -- Neil's O(n) heap version, O(1) stack
    neilGobbler :: Int -> [x] -> [x]
    neilGobbler n xs = length res `seq` res
     where res = take n xs

    -- Continuation passing O(n) heap version, O(1) stack
    cpGobbler :: Int -> [x] -> [x]
    cpGobbler = f id
     where f c 0 _      = c []
           f c _ []     = c []
           f c n (x:xs) = f (\xs' -> c (x:xs')) (n-1) xs

There are 16 tests for each, parameterised by p=0..15. Each test takes 63*(2^p) elements from a test list of the same length, and is repeated 2^(25-p) times. So in total, 63*(2^25) elements are processed in each case (independent of p).

Here are the results in myCpuTimePrecision = 1562500 units (the figure exported by System.CPUTime is wrong for me). To convert these to actual time per element figures you need to multiply each by 7.4 pS (I think :-). All tests were run with fixed stack and heap sizes of 16 and 256 MiB respectively.

     p   stack  heap  neil  cp
     0 - 1793   2684  4937  2593
     1 - 1860   2673  4897  2584
     2 - 1910   2673  4825  2578
     3 - 1927   2659  4819  2575
     4 - 1946   2657  4813  2574
     5 - 1950   2656  5048  2578
     6 - 1960   2711  5036  2627
     7 - 1976   2730  5126  2643
     8 - 2072   2900  5197  2813
     9 - 2439   3044  5153  2974
    10 - 2685   3275  5371  3199
    11 - 2760   3384  5466  3321
    12 - 2930   3487  5525  3444
    13 - 3181   3648  5813  3698
    14 - 3698   3973  6417  4031
    15 - 4727   4987  7964  5224

So pretty much as I expected. For smallish lists, stackGobbler is easily the fastest, heapGobbler and cpGobbler are pretty similar, and neilGobbler is the slowest (sorry Neil :-). The performance of all is degraded as p increases. I guess this is not too surprising, but stackGobbler seems to degrade faster so that at p=15 there's not much difference between it and heapGobbler/cpGobbler. I'm not sure what it is that causes stackGobbler to be unfairly penalised this way, but I'm reminded of this post from John Meacham..

http://haskell.org/pipermail/glasgow-haskell-users/2007-May/012470.html

The other big problem with stackGobbler in practice is the risk of stack overflow. For p=15 it would not work at all with the ghc default stack limit.

Regards
--
Adrian Hey
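[In case it helps anyone reproduce something similar, here is a rough sketch of a test driver along the lines described above. This is my reconstruction, not the Test1.zip source; benchGobbler is a made-up name.]

    import Control.Monad (forM_)
    import System.CPUTime (getCPUTime)

    -- Time one gobbler at parameter p: take 63*2^p elements, repeat
    -- 2^(25-p) times, so each test processes 63*2^25 elements in total.
    benchGobbler :: (Int -> [Int] -> [Int]) -> Int -> IO Integer
    benchGobbler gobbler p = do
      let n    = 63 * 2^p
          reps = 2^(25-p) :: Int
      t0 <- getCPUTime
      forM_ [1..reps] $ \i ->
        -- vary the input slightly so results can't be shared between runs
        length (gobbler n [i .. i+n]) `seq` return ()
      t1 <- getCPUTime
      return (t1 - t0)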
Re: [Haskell-cafe] Stack overflow
Philip Armstrong wrote:

On Mon, Feb 18, 2008 at 05:56:41PM +, Adrian Hey wrote:

Philip Armstrong wrote:

On Sun, Feb 17, 2008 at 10:01:14PM +, Adrian Hey wrote:

BTW, I find this especially ironic as fromDistinctAscList is the perfect example of what I was talking about in another thread (continuation passing madness caused by an irrational fear of stack use).

In *some* cases, continuation passing can be quite a bit faster than other approaches IIRC. I haven't looked at this bit of code though.

Can you or anyone else give an example?

I think I was thinking of this: http://r6.ca/blog/20071028T162529Z.html but can't be entirely sure. Sorry!

Perhaps I should say: can someone provide an example and explain why they believe it is faster. That isn't to say I doubt the measurements taken with that particular code, but I must say that if transforming code that uses the hardware accelerated continuation passing (by that I mean the machine stack of course) into explicit continuations on the heap is generally faster, there must be something very wrong in stack land. But perhaps there is?

Interestingly, if you allow sufficient stack space for Grzegorz's version (+64M) it does terminate, but it's still much slower than the version you provided (about 30 times slower). I don't know if this slowdown is due only to the stack use, or whether both the stack use and the slowdown are just different artefacts of the problem Bertram mentioned.

Regards
--
Adrian Hey
[Haskell-cafe] Re: Haskell maximum stack depth
Simon Marlow wrote:

The point is, GHC has no such thing as the overall program memory limit, unless by that you mean the total amount of memory + swap in your machine. You can set a limit with +RTS -M, but there isn't one by default. So what happens when you write a program with a space leak is that it gobbles up all the memory, and turns on the hard disk light for a few minutes until the OS gives up and kills it. Or at least, you hope it kills your program and not something else important you were doing at the time.

Well, if having unbounded (by default) memory use in the form of heap is OK with most users then I can't see why the same should not be OK for stack. An errant program is just as likely to explode the heap as it is the stack AFAICS.

We used to have a default heap limit, for the reason above, but I was persuaded to remove it because it was too much of a pain to keep having to say +RTS -Mwhatever when your program ran out of memory. So we could do the same with the stack limit - the only concern is whether it will be highly inconvenient when we accidentally write programs with infinite loops, which in my experience seem to occur more than space-leaky programs.

To be honest, in all my years of Haskelling I can't think of a single occasion where I've had a program get stuck in an infinite loop. I've had plenty of stack overflows, and they're reported on the mailing lists pretty regularly, but on all such occasions it's been caused by deep but very definitely finite recursion.

We could also set the limit a lot higher than it currently is. Or, we could try to figure out how much memory is in the machine and set it based on that. Basically, I don't care that much, unless it means I have to do a lot of work to implement it :)

I don't think keeping the implementation as it is and just changing (or removing) the limit is really an option. Unfortunately, as things are at present, using a lot of stack with a program compiled by ghc really is a bug, and the limit does provide users with some protection against this. But IMO the bug is in the ghc rts, not in the user's source code, most of the time :-( I think at the minimum the stack shrinking mod you suggested should be implemented before the limit is removed.

Regards
--
Adrian Hey
Re: [Haskell-cafe] Stack overflow
Philip Armstrong wrote:

On Sun, Feb 17, 2008 at 10:01:14PM +, Adrian Hey wrote:

BTW, I find this especially ironic as fromDistinctAscList is the perfect example of what I was talking about in another thread (continuation passing madness caused by an irrational fear of stack use).

In *some* cases, continuation passing can be quite a bit faster than other approaches IIRC. I haven't looked at this bit of code though.

Can you or anyone else give an example?

Thanks
--
Adrian Hey
Re: [Haskell-cafe] Stack overflow
Philip Armstrong wrote:

Since no-one else has answered, I'll take a stab. Obviously, you have a stack leak due to laziness somewhere.

I wouldn't say that was obvious, though it is certainly a possibility. I'm never exactly clear what people mean by a stack leak. It seems some folk regard any algorithm that makes use of O(n) or worse stack space as a stack leak. My opinion is that using O(n) or worse working memory when the job could be done in O(log n) or O(1) memory is certainly bad, but using O(n) stack is no worse in principle than using O(n) heap. But at the moment it is worse in practice with ghc, regrettably :-(

In fact, a little experimentation has revealed that this:

    do [path] <- getArgs
       m <- liftM decode (BS.readFile path) :: IO [((Int, Maybe String), Int)]
       putStrLn . show . findMax . fromAscList $ m

will work just fine. No extra evaluation needed at all! I'll leave it to the Haskell gurus to explain what's going on...

That's very interesting. Strangely, if you use fromDistinctAscList instead (as used by the Binary instance get method) the problem re-appears. This is despite fromAscList making use of fromDistinctAscList.

BTW, I find this especially ironic as fromDistinctAscList is the perfect example of what I was talking about in another thread (continuation passing madness caused by an irrational fear of stack use).

As to what's really going on here, I haven't figured it out and I'm not really inclined to try :-)

Regards
--
Adrian Hey
Re: [Haskell-cafe] Stack overflow
Bertram Felgenhauer wrote:

I'm fairly certain that the stack overflow is (indirectly) caused by Data.Binary, not Data.Map.

Yes, I think you are right. At least it seems clear that the stack overflow is not directly caused by fromDistinctAscList.

The result of 'decode' is a list of known length with unevaluated elements (because the Get monad is lazy) [*], where each element depends on the previous one (because reading is sequential, and the elements of the list may have nonuniform size). Now when you evaluate the last element, that will demand the last but one element, which in turn will demand the previous element, and so on. If the list is long enough, you'll get a stack overflow. Using fromAscList instead of fromDistinctAscList cures the problem, because it inspects every key in the list, starting from the beginning, in order to eliminate duplicates. fromDistinctAscList, on the other hand, does not inspect the keys at all.

    do [path] <- getArgs
       m <- liftM decode (BS.readFile path) :: IO [((Int, Maybe String), Int)]
       putStrLn . show . last $ m

should also exhibit the same stack overflow.

Indeed it does. So it seems, if I understand correctly, this is yet another example of what I would call the lazily applied strict thunk chain problem (I'm sure you understand what I mean by that).

The standard party line advice in such circumstances is to prevent their formation by evaluating eagerly as they are applied. However, this is often not possible for users of libraries they don't own, if the API does not provide the necessary strictness control. I would argue that even when it is possible it's sometimes undesirable too (if evaluation is expensive, saves nothing in heap use, and the final value may never be needed).

So I still think the stack management system should be designed so that, as far as is practical (given a finite amount of total memory), any expression that can be built on the heap can also be evaluated without causing a stack overflow. But I guess this rant is not much help to the OP :-)

Regards
--
Adrian Hey
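[The effect Bertram describes is easy to reproduce without Data.Binary at all. A minimal illustration, mine and purely for demonstration:]

    -- Each list element is left as a thunk depending on the previous
    -- element. Forcing only the last one winds the whole chain of
    -- pending (+1) applications onto the stack.
    main :: IO ()
    main = print (last (take 10000000 (iterate (+1) (0 :: Int))))
    -- With a small stack limit this should overflow; forcing elements as
    -- the list is consumed (e.g. a strict left fold) keeps the stack flat.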
Re: [Haskell-cafe] Haskell maximum stack depth
Adrian Hey wrote:

AFAICT neilGobbler isn't even entirely safe as an implementation of an eager take. There's nothing in the Haskell standard to stop it being transformed into..

    neilGobbler :: Int -> [x] -> [x]
    neilGobbler n xs = length (take n xs) `seq` take n xs

Whoops, I see stackGobbler has the same problem..

    -- Strict version of take
    stackGobbler :: Int -> [x] -> [x]
    stackGobbler 0 _      = []
    stackGobbler _ []     = []
    stackGobbler n (x:xs) = let xs' = stackGobbler (n-1) xs
                            in  xs' `seq` (x:xs')

I guess this is an example of the Haskell standard needing to be tightened up a bit, but that is another story..

Regards
--
Adrian Hey
Re: [Haskell-cafe] Haskell maximum stack depth
Neil Mitchell wrote:

Hi

But the point is that *both* heapGobbler and neilGobbler are likely to be slower and chew through at least twice as much heap as stackGobbler, which would be the implementation of choice for both simplicity and performance if it wasn't for this stack management problem.

Sure?

Yes, though testing stackGobbler with a large enough data set could be problematic for the very reason we've been discussing. But let's say your hypothesis was correct. If so then presumably *all* Haskell programs could give better performance than they currently do if we nuked the stack completely and had ghc generate CPS style code. This too would be fine with me.

The problem with the current situation is that we have perfectly sound and correct programs that crash quite unnecessarily (and even if they don't get quite that far, they can still cause considerable per-thread memory wastage if what SPJ says is true). Why their authors chose to use a stack greedy implementation, and whether that was by design or a mistake, really *isn't* the point. As I said before (this is the third time I think), the fact that these programs use a lot of stack at all is just a peculiarity of *ghc* implementation, so it really is a ghc responsibility to do a decent job of stack management IMO. It's not a programmer responsibility to code in such a way that minimal stack is used (with ghc).

That sounds like the thing that people can conjecture, but benchmarks can prove. And I'd claim that neilGobbler is the simplest function by a large margin.

AFAICT neilGobbler isn't even entirely safe as an implementation of an eager take. There's nothing in the Haskell standard to stop it being transformed into..

    neilGobbler :: Int -> [x] -> [x]
    neilGobbler n xs = length (take n xs) `seq` take n xs

Regards
--
Adrian Hey
Re: [Haskell-cafe] Haskell maximum stack depth
Neil Mitchell wrote:

Hi

I have already partially scanned the list looking for the first element that satisfies some condition, using a tail recursive search. If such an element is found I want to split the list at that point.

span/break? I really can't believe the memory overhead of span is that terrible, you are only duplicating the (:)'s and it's only one traversal. As an aside, my version of this function would be:

    neilGobbler :: Int -> [x] -> [x]
    neilGobbler n xs = length res `seq` res
     where res = take n xs

My guess is it will use O(1) stack and burn O(n) heap (in addition to that actually used by the result), so asymptotic complexity wise the same as heapGobbler, but probably higher constant factors with ghc due to lazy building of take thunks and subsequent reduction and indirection costs.

Sure? Guessing constant factors related to strictness and laziness is incredibly hard! My guess based on gut feeling is that the program will take less time, and use half the memory. But given my initial comment, that guess is not very reliable.

But the point is that *both* heapGobbler and neilGobbler are likely to be slower and chew through at least twice as much heap as stackGobbler, which would be the implementation of choice for both simplicity and performance if it wasn't for this stack management problem.

Regards
--
Adrian Hey
Re: [Haskell-cafe] Haskell maximum stack depth
Neil Mitchell wrote:

Hi

If you mean an example of coding style and choice of stack vs. heap, I already have..

http://www.haskell.org/pipermail/haskell-cafe/2008-January/038832.html

I'm at a loss as to why you want a strict version of take. It's clearly not for performance, as it performs slower. I'd say both the gobbler programs have a bug, namely that they are not sufficiently lazy.

I have already partially scanned the list looking for the first element that satisfies some condition, using a tail recursive search. If such an element is found I want to split the list at that point. If such an element is not found the entire list has been scanned without using any extra stack or heap (other than that used by the list itself and the condition test). I could build the reversed list accumulator on the heap as I did the search, but I don't because this will be completely wasted effort in the case where such an element is not found. So instead I just use an unboxed Int to count how far I get, and have the search return this and the unsearched suffix (in the case where a matching element is found). But the lifetimes of the list prefix and suffix from this point on are completely unrelated, so I don't want the prefix thunk to be hanging on to the unknown sized suffix. As I already know that the list has been evaluated at least up to the point that it gets chopped off, I choose to use a strict (eager) take.

As an aside, my version of this function would be:

    neilGobbler :: Int -> [x] -> [x]
    neilGobbler n xs = length res `seq` res
     where res = take n xs

I have no idea if it takes the heap or the stack, or if it performs faster or slower. If you still have whatever test you used on the gobbler, perhaps you could tell us.

My guess is it will use O(1) stack and burn O(n) heap (in addition to that actually used by the result), so asymptotic complexity wise the same as heapGobbler, but probably higher constant factors with ghc due to lazy building of take thunks and subsequent reduction and indirection costs.

If you mean an example of it biting with lazy code, this is discussed so often you'd be spoiled for choice if you search the mailing list archives. Here's a good one..

http://www.haskell.org/pipermail/haskell-cafe/2005-December/013521.html

This example actually shows the problem twice. In one case it's solvable by using foldl' instead of foldl. Which reduces the memory from O(n) to O(1).

Are you sure about that? Using foldl' here eliminates one of the two possible sources of stack overflow, but it doesn't eliminate a space leak. It's O(n) either way. Using strict Map insertion will eliminate a space leak (in this case) and also a possible source of stack overflow.

Surely that's a good thing,

Would be if it was true :-)

and the code before had a space leak. Space leak is bad, therefore telling people about it is good.

There are plenty of space leaks that won't cause stack overflows, and plenty of stack overflows that aren't caused by space leaks (see above for one example). Again I have to say that even if true, I think this is a pretty lame justification for the current implementation. The *default* behaviour of any useful program should surely be to make best effort to carry on working (and in due course deliver an answer or whatever), even if there is unexpectedly high stack use for some reason (that may or may not be a bug).

I think it's sensible to let people set their own stack bound (which is possible),

I have no objection to people bounding their stack if that's their choice. I can't imagine why anybody who stopped to think about this would actually want this feature, but it's a free world. What I object to is it being bounded by default to something other than the overall program memory limit. I know that I could probably achieve this effect myself with +RTS options, but I also want to be able to write libraries that other people are going to use safely without having to add an appropriate warning in the documentation to the effect that some parts use O(n) stack space deliberately, by design. But of course this all assumes that the underlying implementation is sufficiently robust to make unbounded stacks safe (at least as safe as any other unbounded data structure). Unfortunately it seems this isn't the case at present, if what various folk have told me is true.

but clearly, just from taking an informal poll of respondents to this thread, the current default should indeed be the default. You seem to be the only person clamouring for an unlimited stack,

Yes, this is strange. The same thing happened in the global variables debate, despite it being obvious to any thinking person that I was correct. Denial of the reality of some very simple examples of the problem was typical of that debate too. :-)

Regards
--
Adrian Hey
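[To make the foldl'-versus-strict-insertion distinction above concrete, here is a small illustrative sketch. This is my example, not the code from the cited thread: foldl' forces the Map itself but not the counts inside it; the strict insert is what stops the counts accumulating as thunks.]

    import Data.List (foldl')
    import qualified Data.Map as Map

    histogram :: [Int] -> Map.Map Int Int
    histogram = foldl' ins Map.empty
      where
        -- insertWith' (the strict insert in that era's Data.Map) forces
        -- the combined count; with plain insertWith the map would fill
        -- up with (1+(1+(1+...))) thunks even under foldl'.
        ins m k = Map.insertWith' (+) k 1 m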
Re: [Haskell-cafe] Haskell maximum stack depth
Stefan O'Rear wrote:

On Mon, Feb 04, 2008 at 10:13:12PM +, Adrian Hey wrote:

Also remember that this behaviour never wastes more than 50% of the stack, which is a relatively small amount.

Only if the stack is relatively small. Would you say the same about heap, or about a stack that only needed 50% of heap space but ended up using all of it? Or money? Using twice as much as you need of anything is bad IMO.

Apparently you don't realize that GHC normally uses twice as much heap as is needed, due to the decision to use a two-space copying collector by default for the oldest generation. :)

Yikes again! It gets worse :-) Perhaps I should have said *live* heap.

Regards
--
Adrian Hey
Re: [Haskell-cafe] Haskell maximum stack depth
Simon Peyton-Jones wrote:

| First bad thing:
| Stack size (memory consumed) doubles each time it overflows.
|
| Second bad thing:
| Arbitrary limit on stack size unrelated to overall (heap) memory
| available.
|
| Third bad thing (the really bad thing):
| If a stack has temporarily grown (to 64M say), it will never shrink
| back down again to something more typical (< 4K say). If I understand
| correctly, it will continue to take 64M from the heap regardless.
|
| What I would like is to be able to set an upper limit on total memory
| usage and allow the program to freely use this memory as either stack
| or heap. At least that should be the default behaviour, but maybe
| also allow +RTS restrictions for debugging (though I don't think this
| is a very good way of investigating a program's stack use).
|
| I would also like stack memory allocation to increase (and decrease :-)
| in some sane sized linear increment, not double each time. With the
| current scheme, as I understand it, if 65M is needed then 128M will be
| allocated.

Would you like to create a ticket for this?

OK

I don't know how many people it bites, and how often,

The problem is that the fact that it *might* bite often affects your whole coding style (well mine actually :-) for some problems. It also seems to have given rise to the POV that ghc's current behaviour is good because stack use is bad. MHO is that it's only ghc's current behaviour that *makes* stack use bad.

I think it bites a lot less often than it otherwise would because most people will deliberately choose to use heap in preference to stack (at least when writing eager code) just to avoid the problem. But it still bites pretty often anyway with lazy code, for unexpected reasons. Arguably such situations are indeed a bug more often than not, but I still think terminating the program unnecessarily (at 8M stack) is bad default policy.

Yes, this is the standard solution, and it's a good one because it has a robust cost model (no quadratic costs). However, it's tricky to get right; copying is simpler. If a significant fraction of runtime (for some interesting program(s)) turned out to be consumed by copying stacks then we could consider this.

Do you really need such evidence? If we agree that allowing the stack to grow to arbitrary size (limited only by memory availability) is reasonable, then surely we already know that there will be some stack size for which the quadratic copying cost is going to get stupid :-)

Of course there are other possible, more radical solutions that come to mind, like not using a (C style) stack at all. But I guess we'd be talking about a complete re-write of pretty much all the rts and much of the compiler to do this :-(

Regards
--
Adrian Hey
Re: [Haskell-cafe] Haskell maximum stack depth
Bulat Ziganshin wrote:

Hello Matthew,

Monday, February 4, 2008, 11:45:51 PM, you wrote:

That would be nice. But it's only beneficial if there are programs which take large amounts of stack at some point, but then shrink down to very little stack and continue for a reasonable amount of time.

From the 'when I was a lad' department... Thinking back to when Java transitioned to a garbage collector that could give memory back to the OS, we got some unexpected benefits. Consider a machine ...

i would be also happy if ghc will return unused *heap* memory back to OS - it's immediately required for my GUI program where users may open huge files and then close them. but i personally don't have the same need for *stack*

How do you know you don't or won't have the same need for stack? Given that most real programs are going to pull in library code written by all sorts of people, don't you want your program to be robust and memory efficient even if it makes use of libraries whose authors chose stack gobbling in preference to heap gobbling, or who used a (currently non-existent AFAIK) CPS based implementation for development?

I just don't get this idea that the current implementation (8M limit IIRC in the absence of +RTS options) is good. 8M is still a pretty big stack, and (8M - 4K) per thread seems like an awful lot of memory to waste to me. If we're all so sure that big stacks are a bug then why bother allowing them to grow at all? Why not just limit them to 4K?

Actually I think the latter option above might be a good way to discover how many bug free Haskell progs there really are out there. Precious few I suspect :-(

Regards
--
Adrian Hey
Re: [Haskell-cafe] Haskell maximum stack depth
Simon Peyton-Jones wrote:

| Yes, using lots of stack is clearly bad with ghc, but this is a ghc
| bug. In fact the only reason these programs do use lots of stack
| (vs. heap) is just a peculiarity of ghc rts implementation, so it
| really should be ghc that fixes the problem, or at least admits
| responsibility :-)

I don't think there's anything fundamental here. GHC allocates the stack in the heap, and it can grow as big as you like. The size limit is simply to catch infinite recursion with a more helpful message than heap overflow. I think. There is one peculiarity though: I don't think we ever shrink the stack, so once it gets big it stays big. This could be fixed, though.

Yikes! Sorry, but if what you say is true then things are even worse than I thought :-( This behaviour seems really bad to me, especially for concurrent programs.

Also, I can't help thinking that the common justification for the current limit (that it helps find alleged bugs) is a little lame. It only helps find bugs if one expects one's program to use less than 8M of stack (hence if it's using more, it's a bug by one's *own* definition). But if a program or library is deliberately designed to make use of stack (in preference to heap) for efficiency reasons (or even just to avoid the awkwardness of using explicit CPS style), then this is a source of bugs in otherwise perfectly correct and reasonable programs.

If we want some way of investigating a program's stack use there must be a better way of doing it than deliberately inducing a crash in any program that exceeds 8M of stack.

Thanks for the answer though. I think I'll write a ticket about this :-)

Regards
--
Adrian Hey
Re: [Haskell-cafe] Haskell maximum stack depth
Hello Simon,

Simon Peyton-Jones wrote:

| Sorry, but if what you say is true then things are even worse than I
| thought :-( This behaviour seems really bad to me, especially for
| concurrent programs.

Which behaviour precisely? Can you say what is wrong and what behaviour you expect?

Roughly..

First bad thing:
Stack size (memory consumed) doubles each time it overflows.

Second bad thing:
Arbitrary limit on stack size unrelated to overall (heap) memory available.

Third bad thing (the really bad thing):
If a stack has temporarily grown (to 64M say), it will never shrink back down again to something more typical (< 4K say). If I understand correctly, it will continue to take 64M from the heap regardless.

What I would like is to be able to set an upper limit on total memory usage and allow the program to freely use this memory as either stack or heap. At least that should be the default behaviour, but maybe also allow +RTS restrictions for debugging (though I don't think this is a very good way of investigating a program's stack use).

I would also like stack memory allocation to increase (and decrease :-) in some sane sized linear increment, not double each time. With the current scheme, as I understand it, if 65M is needed then 128M will be allocated.

Stefan O'Rear suggested an alternative. I don't know how hard it would be to implement though (I don't really know anything about the ghc rts).

http://haskell.org/pipermail/glasgow-haskell-users/2007-May/012472.html

Regards
--
Adrian Hey
Re: [Haskell-cafe] Haskell maximum stack depth
Neil Mitchell wrote:

Hi

First bad thing:
Stack size (memory consumed) doubles each time it overflows.

Bad thing? Assume that allocating memory takes some constant amount of time, such as invoking overflow behaviour etc. To get the size of the stack to n bytes with doubling takes O(log n), to get it there with a constant increase takes O(n).

But whatever the program did to get a given stack size must have been at least O(n) anyway, so overall it's still going to be O(n) even if the stack allocation part is O(log n). We're just talking about a very tiny increase in constant factors, at least if Stefan O'Rear's hypothesis is correct :-) I'm inclined to agree with him.

If you store the stack in a linear block, then allocation costs O(n) and you can even risk O(n^2) behaviour unless you double each time. I think it's fairly well understood that things like hash tables should double in size when they overflow, rather than increasing by some small increment.

It is? Well obviously if the entire thing is copied each time this will be bad, but that's not what we're talking about. See Stefan's proposal.

Also remember that this behaviour never wastes more than 50% of the stack, which is a relatively small amount.

Only if the stack is relatively small. Would you say the same about heap, or about a stack that only needed 50% of heap space but ended up using all of it? Or money? Using twice as much as you need of anything is bad IMO.

Third bad thing (the really bad thing):
If a stack has temporarily grown (to 64M say), it will never shrink back down again to something more typical (< 4K say). If I understand correctly, it will continue to take 64M from the heap regardless.

That would be nice. But it's only beneficial if there are programs which take large amounts of stack at some point, but then shrink down to very little stack and continue for a reasonable amount of time. Console programs probably don't fit this pattern (since they tend to be batch style and exit quickly). GUI programs probably do, so perhaps stack reduction will be more important as the GUI toolkits mature and Haskell starts getting used for UI type things.

The nature of the app has nothing to do with it AFAICS; this problem can affect any program that evaluates expressions.

That said, unless there is a real user with a real problem (rather than a theoretical concern), priority may go to other bugs.

The point is that writing a stack greedy function definition (rather than a heap greedy alternative) is almost always the simpler option, and would probably be more efficient too. It would also be perfectly OK in *most* situations. But being OK in most situations isn't good enough. You also (as far as is possible given a finite amount of total memory) want it to be OK in pathological situations, or at least no worse than the heap greedy version. Why should the decision to use a stack greedy definition cause a crash at 8M, whereas a heap greedy definition can happily use much more without crashing?

I (like everyone else) tend to avoid knowingly writing stack greedy definitions because of this. But I do this as a workaround for ghc's currently (IMO) poor stack management, not because I consider code that uses the stack to be inherently buggy. Furthermore, as I said earlier, using a lot of stack is purely a ghc rts implementation detail. Other possible Haskell implementations may not use a lot of stack for the same function (may not use a stack at all). So you can't say a program has bugs just because it happens to cause a stack overflow with ghc. You might reasonably argue that it has a bug if it uses a lot of memory with any plausible Haskell implementation (one way or another) *and* you can show that there is an alternative implementation which uses asymptotically less memory.

Regards
--
Adrian Hey
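[A quick back-of-envelope version of the copying-cost argument above, assuming, purely for the sake of the sketch, that the whole stack is copied at each grow and the initial size is 4096 cells:]

    -- Cells copied growing a stack to n cells, if every grow copies all
    -- of the current stack:
    copiedDoubling, copiedLinear :: Integer -> Integer
    copiedDoubling n = sum (takeWhile (<= n) (iterate (*2) 4096))
                       -- geometric series: at most ~2n, over O(log n) grows
    copiedLinear   n = sum [4096, 8192 .. n]
                       -- arithmetic series: ~n^2/8192, the quadratic risk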
Re: [Haskell-cafe] Haskell maximum stack depth
Derek Elkins wrote:

While perhaps for a simple throw-away program it may be beneficial to write code that allocates unnecessary stack, I personally consider unnecessary stack use a bug. A stack overflow, to me, is always indicative of a bug.

The bug is in ghc stack management. Why is it so important that the stack size is arbitrarily limited? It's just an intermediate data structure, no different from any other intermediate data structure you may build on the heap (well, apart from its efficiency). But I guess we would be in danger of having our programs run too fast if folk were silly enough to make use of the stack. So perhaps the current ghc defaults are too generous. What limit do you think should be placed on the stack size that a non-buggy program can use?

Regards
--
Adrian Hey
Re: [Haskell-cafe] Haskell maximum stack depth
Jonathan Cast wrote:

http://www.cs.princeton.edu/~appel/papers/45.ps is the traditional cite here, no?

Can be is not the same as is. A lot depends on exactly what you call a stack and the relative efficiencies of stack vs. heap implementations. Certainly my experience of library tuning tells me that (with ghc at least) designing your code and data structures to keep heap allocation down to an absolute minimum is very important. So I'm very sceptical about claims that burning heap is just as efficient. Heap allocation may be just as cheap, but reclaiming costs more.

A lot also depends on the compiler (and associated rts), such as whether or not it translates to CPS, thereby in effect building a stack (in all but name) on the heap. So you could have exactly the same correct and *bug free* program giving a stack overflow on ghc, but not on a CPS based compiler (the CPS implementation just uses a shed load of heap instead). Other implementations (Chicken Scheme?) effectively build their heap on the stack, which never shrinks until it overflows. Is that inherently buggy? Surely the alleged bugginess of programs should not be dependent on choice of compiler/rts?

As nobody has provided any sensible justification for the assertion that using lots of stack (vs. heap) is inherently bad (and that programs which do have bugs, by definition), it seems to me this is one of those quasi-religious beliefs, like (so called) global variables or cyclic module dependencies being a bug (by definition).

Yes, using lots of stack is clearly bad with ghc, but this is a ghc bug. In fact the only reason these programs do use lots of stack (vs. heap) is just a peculiarity of ghc rts implementation, so it really should be ghc that fixes the problem, or at least admits responsibility :-)

Regards
--
Adrian Hey
Re: [Haskell-cafe] Haskell maximum stack depth
Neil Mitchell wrote:

My claim is that any program which needs to adjust the stack size has a laziness leak - since I've made a universally quantified claim, a couple of real examples should blow it out of the water.

But people often deliberately introduce laziness leaks for improved efficiency and in order to avoid space leaks.

http://haskell.org/pipermail/glasgow-haskell-users/2007-May/012467.html

Here there is essentially no difference between stackGobbler and heapGobbler (they both use a stack), but in order to avoid a stack overflow heapGobbler is tail recursive and explicitly implements the stack as a reversed list accumulator, which then has to be reversed at the end. So it will burn twice as much heap to get a result as stackGobbler (at least if we already know the list has been evaluated at least up to the point where its tail gets chopped off).

Regards
--
Adrian Hey
Re: [Haskell-cafe] Haskell maximum stack depth
Neil Mitchell wrote:

Hi Istarex,

Does Haskell have a maximum stack depth restriction like Java or Python, or is it only limited by the maximum stack available on the system (i.e. what would be available to a C program)?

You are probably thinking that recursive functions use up program stack, and hence the stack depth bounds the amount of recursion. In Haskell, that isn't the case. Haskell is lazily evaluated, and has tail recursion, which means that you rarely run into a problem with exceeding the stack depth. In GHC the stack is stored in the heap area of memory, so is not limited to the C stack, but can be set at runtime with a flag (+RTS ... something ...) - but you won't need to.

Sorry, but I think that's a very misleading answer to give to someone (who's presumably a noob). The answer is that no such limit is defined in the standard, for the obvious reason that the standard does not presume anything about the runtime implementation, not even the presence of a stack. ghc uses a pretty conventional stack AFAIK, and it is arbitrarily limited, but you can change the limit with +RTS options.

Also, stack overflows are a pretty common cause of program failure IME, not at all rare. At least, far more common than whatever error message you get from heap exhaustion (I can't even remember the last time I saw one of those).

Regards
--
Adrian Hey
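[For reference, the RTS flag being alluded to is -K, which sets the maximum stack size, e.g.

    ./myprog +RTS -K64m -RTS

raises the limit to 64M. This example invocation is mine; check the GHC users guide of your version for the exact syntax.]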
Re: [Haskell-cafe] ANNOUNCE: Haddock version 2.0.0.0
Hello,

I've just tried this with cabal:

    runghc Setup haddock --hyperlink-source --hscolour-css=..\hscolour.css

and I get this error..

    Setup: haddock --hyperlink-source requires Haddock version 0.8 or later

This looks like a bug in cabal rather than the new Haddock to me. Can anyone think of an easy workaround?

Regards
--
Adrian Hey
[Haskell-cafe] HsColour version confusion
Hello,

I'm confused about which HsColour version I should be using with Haddock/Cabal (on Windows). According to this page..

http://www.cs.york.ac.uk/fp/darcs/hscolour/

..the latest version is 1.9. But the latest version in Hackage is 1.6, the latest version in the ftp downloads dir is 1.8, unless you want a pre-compiled windows version in which case you're stuck with 1.3 :-)

Anyone know what's going on?

Thanks
--
Adrian Hey
Re: [Haskell-cafe] Collections library
Ketil Malde wrote:

Ben Franksen [EMAIL PROTECTED] writes:

PS (completely off-topic, sorry): I've been using the collections library throughout the project and I must say it is a lot nicer to work with.

I tried to Google for this, and ended up at

http://hackage.haskell.org/trac/ghc/wiki/CollectionClassFramework

The only link that seems to work is the one that is marked as outdated. I've replaced them with a link to Hackage, but somebody who knows more about this might want to recheck the facts on the page.

It currently lives here as a darcs repo..

http://code.haskell.org/collections/collections-ghc6.8

..and is in the process of being 6.8ified and split up into separate smaller packages for hackage. I think one of the problems with it as one package (apart from its size) is that different bits of it were in different states of real world readiness. Some of it was quite stable (e.g. all the AVL tree stuff and the Data.Map/Set clones) and some was still actively being worked on (the Data.Trie.General stuff), and this kinda stopped a stable hackage package for everything.

I recently withdrew from this project and offered up the libs I'd been working on as they are for a new owner. Didn't get any takers though (no surprises there!). I've always found the lack of apparent interest in all this somewhat puzzling myself. It's not as if there's no latent demand for efficient collections. (Data.Map is probably the most regularly whined about of all the standard libs.)

Regards
--
Adrian Hey
Re: [Haskell-cafe] A tale of three shootout entries
Simon Peyton-Jones wrote:

| Something I found with Dons version on my machine was that if I removed
| all the exclamation marks and the -fbang-patterns bit at the top it went
| about 20% faster as well as being much cleaner code, but with my very
| rudimentary understanding of Haskell I wasn't entirely sure it would
| produce the same results if I did this and didn't get round to checking.

If, after investigation (and perhaps checking with Don) you find that adding bangs makes your program go slower, even though the function is in fact strict (otherwise it might go slower because it's just doing more work!) then I'd love to see a test case.

I wonder if this could be related to what I observed with AVL trees and mentioned a while back (using a strict data type is slower than using explicit seqs to get the same strictness).

Regards
--
Adrian Hey
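[For anyone who hasn't seen the two forms being compared, a minimal sketch. This is mine and purely illustrative, not the AVL tree code:]

    -- Strictness via a strict constructor field...
    data T1 = T1 !Int

    -- ...versus a lazy field made strict with an explicit seq at each
    -- construction site. Both end up with the Int evaluated before the
    -- constructor is applied; the observation above was that ghc
    -- sometimes generated slower code for the strict-field version.
    data T2 = T2 Int

    mkT2 :: Int -> T2
    mkT2 n = n `seq` T2 n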
[Haskell-cafe] Re: Libraries need a new owner
Hello Folks,

Adrian Hey wrote:

If anyone is interested in the job then I suggest they contact myself or Jean-Philippe Bernardy.

Sigh.. no sooner do I go and write something like that than the IEE (or I should say IET) go and break my mail alias. So sorry if anyone did actually try to contact me and got their mail bounced. It should be working again now.

Regards
--
Adrian Hey
[Haskell-cafe] Libraries need a new owner
Hello Folks,

As some of you will be aware, I have been working on various Map implementations that currently live here..

http://code.haskell.org/collections/collections-ghc6.8

The libs in question being Data.Tree.AVL, Data.Trie.General and a few other bits like Data.COrdering and the AVL based Data.Map/Set clones. Well, I have decided to stop work on these. So they need a new owner if they're going to go anywhere. If anyone is interested in the job then I suggest they contact myself or Jean-Philippe Bernardy.

Of course I will be happy to provide any help or advice anyone who takes over these libs may feel they need from me. I might even contribute a few patches from time to time myself :-)

Thanks
--
Adrian Hey
Re: [Haskell-cafe] Why does GHC limit stack size?
Hello,

Why is there a limitation on the stack size in GHC? Like heap, where we can limit the size by the -M RTS option but the default is unlimited, why not let the program use as big a stack as required? If not by default, then by a separate option? Some of the functions that we write in recursive fashion will usually cause a stack overflow, but will work fine if there is more stack (suppose we are not worried about efficiency). And these functions generally look nicer and more compact than their tail recursive versions. Is this a technical hurdle, or just a checkpoint for runaway programs?

This was discussed a while ago on the ghc users mailing list. I think there was general agreement that this was bad, but that doing something better meant a lot of work for someone (who could be trusted to get it right :-)

http://www.haskell.org/pipermail/glasgow-haskell-users/2007-May/012467.html

Regards
--
Adrian Hey
Re: [Haskell-cafe] Why does GHC limit stack size?
Bulat Ziganshin wrote:

because a program that requires an 8mb stack will probably require 8gb when processing more data :)

So.. what? You could say the same about heap, which was rather the point of the earlier thread.

Regards
--
Adrian Hey
Re: [Haskell-cafe] Why can't Haskell be faster?
Paulo J. Matos wrote:

Hello all,

I, along with some friends, have been looking at Haskell lately. I'm very happy with Haskell as a language, however, a friend sent me the link: http://shootout.alioth.debian.org/gp4/ which enables you to compare several language implementations. Haskell seems to lag behind Clean. From what I've seen of Clean it seems almost like Haskell. It even distributes a Haskell to Clean translator, so the obvious question is: why is Haskell slower? Being similar languages, and GHC being a very good compiler, can't it get at least as fast as Clean? What am I missing here? (I wrote this mail assuming the results from the URL are trustworthy).

I don't know for certain that this is still the case (and if so why). But I do remember that when I was a Clean user a few years ago both the Clean compiler and the resulting executables were amazingly fast (certainly by FPL standards). I've often thought it's a real shame that two different but very similar languages exist. I think that the Clean compiler would be one of the best, if not *the* best, Haskell implementations available, apart from the minor snag that it isn't Haskell at all :-) As things are at the moment ghc has no serious competition, so we don't really know how fast it should be. Maybe this will change in future.

BTW, the reason I still jumped ship in the end and became a Haskell user instead had nothing to do with performance. The reason was that if I was going to invest a lot of time in progs/libs I wanted to have some confidence I'd made the right choice long term, and I had issues with the Clean approach to concurrency (what the Clean folk call deterministic concurrency). I didn't (and still don't) see this as viable, but during a long and heated flame war on the Clean mailing list it became clear that the Clean team did not agree with my point of view, so things were not likely to change any time soon :-(

Regards
--
Adrian Hey
Re: [Haskell-cafe] unsafePerformIO: are we safe?
Chaddaï Fouché wrote:

2007/9/26, Adrian Hey [EMAIL PROTECTED]:

Chaddaï Fouché wrote:

There can't be alternatives; unsafePerformIO throws out of the window most guarantees that Haskell can give you, and you have to provide them yourself (with a proof of this part of your program). But it's inherent to the nature of the beast, it's what it does!

What about ..

http://www.haskell.org/haskellwiki/Top_level_mutable_state

This is as unsafe a use of unsafePerformIO as you'll ever find, but necessary for real IO libs.

I'm not arguing that there aren't specific current usages of unsafePerformIO that could be better formalized (there are), I'm saying that you can't design an alternative to unsafePerformIO which covers all its applications but stays safer. What we can do with unsafePerformIO is inherently unsafe, even though you can obtain safe results with it (and in some of those use-cases a restricted version of unsafePerformIO could be used and would be safer).

In your original post you appear to be making the usual argument about proof obligations and the possibility that unsafePerformIO can be used safely, despite the name (and should only be used this way). I was simply pointing out that there is at least one common use of unsafePerformIO for which this isn't possible (and there is currently no safe alternative).

Regards
--
Adrian Hey
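[For readers who haven't followed the link, the idiom on that wiki page is essentially the following. Note the NOINLINE pragma, without which the hack is even less safe: inlining could give each use site its own IORef.]

    import Data.IORef
    import System.IO.Unsafe (unsafePerformIO)

    {-# NOINLINE counter #-}
    counter :: IORef Int
    counter = unsafePerformIO (newIORef 0)

    -- Every user of 'counter' must see the *same* IORef, which is
    -- exactly the property the type system can't express here.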
[Haskell-cafe] Does Haddock respect hiding?
Hello folks,

Using Haddock 0.8, if I use haddock to document a module like this..

    module MyModule ( module MyOtherModule ) where
    import MyOtherModule hiding (foo)

I still get foo documented in the API for MyModule, despite the fact that foo is not really made available by importing MyModule. This looks like a bug to me, or maybe there's something else I should be doing. Anyone care to comment?

Thanks
--
Adrian Hey
Re: [Haskell-cafe] unsafePerformIO: are we safe?
Sebastian Sylvan wrote:

Rule of thumb: If your name isn't Simon*, you shouldn't use unsafePerformIO.

If this is so, maybe it's time someone (who may or may not be called Simon) gave us a realistic alternative. :-)

Regards
--
Adrian Hey
Re: [Haskell-cafe] unsafePerformIO: are we safe?
Chaddaï Fouché wrote:

There can't be alternatives; unsafePerformIO throws out of the window most guarantees that Haskell can give you, and you have to provide them yourself (with a proof of this part of your program). But it's inherent to the nature of the beast, it's what it does!

What about ..

http://www.haskell.org/haskellwiki/Top_level_mutable_state

This is as unsafe a use of unsafePerformIO as you'll ever find, but necessary for real IO libs.

Regards
--
Adrian Hey