Re: Proposal: EPHEMERAL pragma
On Thu, Oct 25, 2012 at 9:56 AM, José Pedro Magalhães j...@cs.uu.nl wrote: Hi all, Following up on a chat with Simon Peyton Jones at ICFP, I would like to discuss the possible introduction of a EPHEMERAL pragma. For example: {-# EPHEMERAL Rep #-} data Rep = ... This pragma would indicate that the programmer intends the Rep datatype not to be present in the final generated core code. Its proposed semantics are the following: 1. Make the compiler very keen to inline any functions that produce or consume Rep. 2. If Rep is exported, make all functions that operate on Rep INLINABLE (that is, make their code available for inlining in other modules). 3. Emit a warning if the generated core code still contains uses of Rep. Won't all of this require GHC to fix its handling of inlining in the presence of recursion (am I behind times? Or does Rep also end up with the restriction that it should never be an argument to or result from a recursive function? -Jan ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: EPHEMERAL pragma
On Thu, Oct 25, 2012 at 9:56 AM, José Pedro Magalhães j...@cs.uu.nl wrote: Hi all, Following up on a chat with Simon Peyton Jones at ICFP, I would like to discuss the possible introduction of a EPHEMERAL pragma. For example: {-# EPHEMERAL Rep #-} data Rep = ... This pragma would indicate that the programmer intends the Rep datatype not to be present in the final generated core code. Its proposed semantics are the following: 1. Make the compiler very keen to inline any functions that produce or consume Rep. 2. If Rep is exported, make all functions that operate on Rep INLINABLE (that is, make their code available for inlining in other modules). 3. Emit a warning if the generated core code still contains uses of Rep. Are you restricting Rep to non-recursive uses? Or has GHC's inliner finally learned how to behave well in the presence of recursion? -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Stealing ideas from the latest GCC release
On Thu, Mar 22, 2012 at 5:09 PM, Johan Tibell johan.tib...@gmail.comwrote: Hi all, While looking at the GCC 4.7 [1] release notes I saw something that's perhaps worth stealing. Taken from the release notes: The inter-procedural constant propagation pass has been rewritten. It now performs generic function specialization. For example when compiling the following: void foo(bool flag) { if (flag) ... do something ... else ... do something else ... } void bar (void) { foo (false); foo (true); foo (false); foo (true); foo (false); foo (true); } GCC will now produce two copies of foo. One with flag being true, while other with flag being false. This leads to performance improvements previously possibly only by inlining all calls. Cloning causes a lot less code size growth. Wait, I thought this is essentially what constructor specialization does? I suppose we might then keep around the old body. Or will these behave differently in the presence of, say, different constant Int arguments? -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Records in Haskell
On Fri, Jan 13, 2012 at 6:52 PM, Simon Peyton-Jones simo...@microsoft.com wrote: [... good summary of the issues...] But note what has happened: we have simply re-invented SORF. So the conclusion is this: the only sensible way to implement FDR is using SORF. An obvious question at this point: can records have unboxed fields? I'm worried a bit about the kinds that can appear in a has constraint: A feature of SORF is that you can write functions like this k :: Has r f Int = r - Int k r = r.f + 1 I'm thinking out loud about the implementation implications here. -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Unwanted eta-expansion
On Sun, Oct 9, 2011 at 10:54 AM, Roman Cheplyaka r...@ro-che.info wrote: * Jan-Willem Maessen jmaes...@alum.mit.edu [2011-10-08 12:32:18-0400] It seems to be a common misconception that eta-abstracting your functions in this way will speed up or otherwise improve your code. Simon PJ has already provided a good explanation of why GHC eta expands. Let me take another tack and describe why the code you wrote without eta expansion probably doesn't provide you with any actual benefit. Roughly speaking, you're creating a chain of closures whose contents exactly describe the contents of your list (ie you've created something that's isomorphic to your original list structure), and so you should expect no benefit at all. Thanks for the analysis. I used myFoldl as a minimal example to ask the question. Here's an example of real code where this does make a difference: https://github.com/feuerbach/regex-applicative/tree/03ca9c852f06bf9a9d51505640b8b72f07291c7d Ah, now things get more complicated! :-) I suspect here you're actually entering the regexp closures, and compiling it down is actually saving you a teensy bit of interpretive overhead. ... What's even more interesting (and puzzling!), if remove -fno-do-lambda-eta-expansion for Text/Regex/Applicative/Types.hs, the benchmark takes 2.82 seconds. That *Is* odd. The only obvious code generated here would be the newtype instances, for which this should surely be irrelevant? Can someone at GHC HQ explain this one? -Jan ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Two Proposals
2011/10/5 Simon Peyton-Jones simo...@microsoft.com: | In the spirit of don't let the perfect be the enemy of the good | though, I'm solidly in favor of the original proposal as it is. This is my thought too. George is proposing to extend Haskell's existing mechanism for numeric literals (namely, replace 4 by (fromInteger (4::Integer))), so that it works for lists, just as Lennart did for Strings. One could do more, as Yitz has suggested, but that would be an altogether bigger deal, involving Template Haskell and quite a bit of new design; and if done should apply uniformly to numeric and string literals too. So personally I favour George's general approach as a first step. But here is one thought. In the spirit of monad comprehensions, should we not treat [a,b,c] as short for return a `mappend` return b `mappend` return c so that [a,b,c] syntax is, like [ e | x - xs ] syntax, just short for monadic goop. Then we would not need a new class at all, which would be nice. No, you should not. Most of the types of interest (Sets, Maps, arrays) are not monads. Conflating list comprehensions with monads is a huge mistake, and this would repeat it. -Jan ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Unwanted eta-expansion
On Tue, Oct 4, 2011 at 2:39 AM, Roman Cheplyaka r...@ro-che.info wrote: Suppose I want a foldl which is evaluated many times on the same list but with different folding functions. I would write something like this, to perform pattern-matching on the list only once: module F where myFoldl :: [a] - (b - a - b) - b - b myFoldl [] = \f a - a myFoldl (x:xs) = let y = myFoldl xs in \f a - y f (f a x) However, for some reason GHC eta-expands it back. It seems to be a common misconception that eta-abstracting your functions in this way will speed up or otherwise improve your code. Simon PJ has already provided a good explanation of why GHC eta expands. Let me take another tack and describe why the code you wrote without eta expansion probably doesn't provide you with any actual benefit. Roughly speaking, you're creating a chain of closures whose contents exactly describe the contents of your list (ie you've created something that's isomorphic to your original list structure), and so you should expect no benefit at all. Let's consider a specific concrete call to myFoldl: myFoldl (1:2:3:4:[]) = r1 where rt = \ f a - a x4 = 4 r4 = \ f a - rt f (f a x4) x3 = 3 r3 = \ f a - r4 f (f a x3) x2 = 2 r2 = \ f a - r3 f (f a x2) x1 = 1 r1 = \ f a - r2 f (f a x1) Each of the bindings above is a separate heap-allocated object. Now consider the data representation for the function closures r1..r4. Each such closure has two free variables: the previous closure (r2..r4 or rt) and the value of the current list element (x1..x4). If you write that out schematically in box-and-pointer notation, you'll quickly see that the result has the exact same memory structure as your original list. Now, you'll probably want to point out that transforming your list into a string of closures like this eliminates the need to pattern match on the list structure. This is true, but that's because you've replaced that pattern match with a call to an unknown closure. That's because in most circumstances we'll get just one copy of the code for r1..r4. GHC will actually generate code a bit like the following [*]: myFoldl (1:2:3:4:[]) = r1 where closure x r = \ f a - r f (f a x) rt = \ f a - a x4 = 4 r4 = closure x4 rt x3 = 3 r3 = closure x3 r4 x2 = 2 r2 = closure x2 r3 x1 = 1 r1 = closure x1 r2 So there are two different kinds of closure that get passed to r in closure x r: 1) rt 2) a call to closure xn r(n+1) Distinguishing these two cases (even if just by branching to a closure code pointer) leads to overheads comparable to (and generally larger than) those of pattern matching. GHC used to distinguish (:) and [] by branching to an unknown function pointer (exactly as is happening here) and switched to pointer tagging instead because it was faster. All this assumes everything has been completely evaluated already. Laziness drowns out most of the relative advantages and disadvantages here (and there's an even-more-involved explanation of why you might lose strictness information in your eta-abstracted function, making things worse still). It also assumes you are not able to specialize your code to a particular higher-order function. Any time you can do that, it's potentially very beneficial. For example, the following *might* be a worthwhile definition of foldl: {-# INLINE foldl #-} foldl f = loop where loop a [] = a loop a (x:xs) = loop (f a x) xs -Jan-Willem Maessen [*] Note that GHC actually treats the free variables of the closure (x and r in this case) a little bit specially, so the code isn't necessarily *literally* equivalent to what I've shown here, but it's pretty close. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Reducing the cost of garbage collecting small arrays
On Thu, Jun 23, 2011 at 4:54 AM, Johan Tibell johan.tib...@gmail.com wrote: On Thu, Jun 23, 2011 at 8:27 AM, Johan Tibell johan.tib...@gmail.com wrote: Is 5 the optimal number of bits to slice off at a time (ie the best fanout)? It sounds like node copy cost on insert argues for a slightly narrower fanout. You'll be evacuating / scanning more words total, but new nodes may equate to less scanning overall (especially if this is running long enough to have some nodes get tenure). I'm experimenting with this. 6 is far too much, making inserts 4-5x slower. 4 doesn't seem to improve things much (which is a bit counter-intuitive given that 6 made things so much work), but I need to experiment some more. After some more testing I can improve the performance of insert by 30% by reducing the array size from 32 to 16. GC still dominates though. Yes, I'd still expect that; internal node churn with fat nodes exhausts heap more quickly than usual. If large nodes become the norm, cranking up GC nursery size might be in order. It's great to see that fat nodes finally work well in GHC, though. When I first tried this in the 90's it was so bad (due to the extra level of indirection write barrier problems) I gave up in frustration. -Jan Johan ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Reducing the cost of garbage collecting small arrays
On Wed, Jun 22, 2011 at 12:33 PM, Johan Tibell johan.tib...@gmail.com wrote: Hi, Now when we can efficiently copy small arrays I've gone back to benchmarking/optimizing my hash array mapped trie data structure. The data structure's performance depends on how efficiently we can allocate/collect/copy Array#s and MutableArrays#s of size =32. Speeding up copying helped quite a bit, but the HAMT is still slower than a Patricia tree for inserts. Is 5 the optimal number of bits to slice off at a time (ie the best fanout)? It sounds like node copy cost on insert argues for a slightly narrower fanout. You'll be evacuating / scanning more words total, but new nodes may equate to less scanning overall (especially if this is running long enough to have some nodes get tenure). -Jan ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Class constraints for associated type synonyms
On Thu, Mar 24, 2011 at 11:36 AM, Simon Peyton-Jones simo...@microsoft.com wrote: | class Monoid (GeneratorOf a) = Generable a where | type GeneratorOf a :: * - * | construct :: GeneratorOf a - a | | Now, it seems I need FlexibleInstances to do this when I'm using an | associated type synonym, but I don't need the flexibility when using a | multiparameter type class. Suppose you have these wierd instances: type instance GeneratorOf (Tree a) = Tree (Tree a) instance Generable a = Monoid (Tree a) instance Generable (Tree a) Now, in the last of these we need to cough up an instance of Generable (Tree a)'s superclasses. Ah, that's Monoid (GeneratorOf (Tree a)) Ah, that's Monoid (Tree (Tree a)) We have an instance of Monoid, but it needs, well Generable (Tree a), which is where we started. If I'd nested things a bit more deeply you can see I'd get into an infinite regress. So you have to take responsibility that instance solving will terminate, hence FlexibleInstances. As you say, the same thing can happen with fundeps. The fact that the thing is allowed is probably a bug in the Fundep stuff. Thanks, it's good to know that I was, in fact, being naughty in both instances (and not merely being constrained from doing Good Things by the typing rules for associated types). Back to the drawing board. -Jan ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Class constraints for associated type synonyms
Hi all - I've been trying to construct a class declaration with an associated type synonym, but I'd like to constrain that type to belong to a particular class. Consider the following class: class Monoid m = Constructs c m | c - m where construct :: m - c This captures the idea that the collection c ought to be constructed using the monoid m (say if we're doing the construction using the Writer monad)--the functional dependency indicates the desire for the type c to injectively determine the choice of monoid m. For example: newtype ListBuilder a = Builder ([a] - [a]) deriving (Monoid) instance Constructs [a] (ListBuilder a) where construct (Builder f) = f [] instance (Ord a) = Constructs (Set a) (Set a) where construct = id Now I'd like to be able to do the same thing using an associated type synonym, something like this: class Monoid (GeneratorOf a) = Generable a where type GeneratorOf a :: * - * construct :: GeneratorOf a - a Now, it seems I need FlexibleInstances to do this when I'm using an associated type synonym, but I don't need the flexibility when using a multiparameter type class. In both cases the instance constraint involves types that can be injectively inferred (if I have my terminology straight; work with me here) from a single type mentioned in the class head. In particular, I can imagine storing the dictionary for Monoid (GeneratorOf a) in the dictionary for Generable a, and thus allowing context reduction of (Monoid (GeneratorOf tyvar)) to (Generable tyvar). Meanwhile, I think there are things that are permitted by FlexibleInstances that I'd rather *not* have creeping into my programs. Is there a fundamental constraint I'm missing here that requires the full generality of FlexibleInstances? Do I need to use FlexibleInstances whenever I use associated types in my programs? -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: compilation of pattern-matching?
On Mar 25, 2009, at 5:18 AM, Simon Peyton-Jones wrote: Indeed GHC does not attempt to retain the order of alternatives, although a) it might be possible to do so by paying more attention in numerous places b) GHC may do so already, by accident, in certain cases ... * Which plan performs best is tremendously architecture dependent, and may well vary a lot between different chips implementing the same instruction set. It's a losing battle to fix the strategy in source code. * More promising might be to say this is the hot branch. That information about frequency could in principle be used by the back end to generate better code. However, I am unsure how a) to express this info in source code b) retain it throughout optimisation The usual compiler heuristic is backward branches or loop edges, which I would re-interpret in Haskell as contains a call (any call) to a function in the same SCC binding group. But I expect for hot code the effect would indeed be small. Claus, if you think this thread is worth capturing, then do write a Commentary page, and I'll check its veracity. Thanks Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Strictness in data declaration not matched in assembler?
On Oct 15, 2008, at 11:08 AM, Lennart Augustsson wrote: I totally agree. Getting the value of the field should just evaluate x and then use a pointer indirection; there should be no conditional jumps involved in getting the value. GHC is just doing the wrong thing. Can indirection nodes occur in this context? [I'd think not, but it depends on what pointer we're storing when we force the thunk in the constructor.] I could see the need for the test if indirection handling is required. -Jan -- Lennart On Wed, Oct 15, 2008 at 3:58 PM, Tyson Whitehead [EMAIL PROTECTED] wrote: On Wednesday 15 October 2008 10:48:26 you wrote: Strictness does not imply unboxing. To see why not, think about the fact that unboxing breaks sharing. By keeping the pointer-indirection in place, we can share even strict fields between related values. I believe I realize that. What I was wondering about was the fact that it seemed to think the pointer might be to a thunk (instead of constructor closure). Doesn't the strictness flag mean the following assembler would work sni_info: movq 7(%rbx),%rbx movq $snj_info,(%rbp) jmp snj_info (which could be cleaned up further by combining it with snj_info) instead of sni_info: movq 7(%rbx),%rbx movq $snj_info,(%rbp) testq $7,%rbx jne snj_info jmp *(%rbx) (i.e., the whole test if it is a thunk and conditionally evaluate it bit is unnecessary due to constructor the strictness flag). Cheers! -Tyson ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: fundeps help
On Dec 3, 2007, at 4:02 AM, Simon Peyton-Jones wrote: GHC's new intermediate language, System FC, is specifically designed to do this. Currently we're in transition: equality constraints are starting to work, but fundeps are implemented as they always were. I hope we can eventually switch over to implementing fundeps using equality constraints, and then the above program will work. Meanwhile, in the HEAD you can write conv :: (a~b) = a - b conv = id Which, IHMO, is a much clearer way to say it! Is it really a good idea to permit a type signature to include equality constraints among unifiable types? Does the above type signature mean something different from a -a? Does the type signature: foo :: (a~Bar b) = a - Bar b mean something different from: foo :: Bar b - Bar b ? I know that System FC is designed to let us write stuff like: foo :: (Bar a ~ Baz b) = Bar a - Baz b Which is of course what we need for relating type functions. But I'm wondering if there's a subtlety of using an equality constraint vs just substitution that I've missed---and if not why there are so many ways of writing the same type, many of them arguably unreadable! Hoping this will give me a bit of insight into SystemFC, -Jan-Willem Maessen You may also like to try the paper that Martin and I and others wrote about fundeps: http://research.microsoft.com/%7Esimonpj/papers/fd-chr Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Annotation for unfolding wanted
On Jul 31, 2007, at 10:20 AM, Simon Peyton-Jones wrote: | However my point was more on a semantic point of view: If I write a function | in a recursive way, but actually do nothing else than a loop, I would like | a) that the compiler unrolls it to a loop and | b) that I can specify such a requirement, while violating it emits an error. What does it mean to say the compiler unrolls it to a loop. If GHC sees a tail recursive function, it certainly compiles it to a loop! (But that's not called unrolling.) I think what's meant here is translating something like this: {-# INLINE f #-} f x y z = ... f x' y' z' ... into this: {-# INLINE f #-} f x y z = f' x y z where f' x y z = ... f' x' y' z' ... That is, shoving (all of) the recursion in a level. Then inlining f results in a fresh loop, which presumably can be specialized or optimized in various ways. I did this in pH, mostly because it was less irritating than performing the same transformation by hand on key bits of the Prelude. Whether it's actually beneficial on a large scale probably depends on the effectiveness of transformations that can specialise f' to the site where it was inlined; invariant argument elimination comes to mind here, and I never did a good job with that. It does remove one irritant from performance tuning, though, by letting us write a simple top-level recursive function and have it inlined. -Jan Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: UTF-8 decoding error
On Sep 27, 2006, at 6:05 AM, Matthew Pocock wrote: Fortress (sun's possibly-not-vaporware hpc language) supports arbitrary unicode chars in code, and has an escape syntax for commonly used things. I have spent the past week writing Fortress code (which runs in parallel, even). But I'm perhaps a special case. :-) Similarly, proof-general/isabelle supports tex-style escapes for symbols greek. It seems to me that a pre-processor that turns human- friendly escapes (e.g. \{lambda} rather than some magic number) into unicode and a slightly intelligent IDE (or emacs mode?) would go most of the way to letting us use up-side-down ys and curly as with all the visual beauty and editor niceness that we have now with ascii. In Fortress we spent a *lot* of effort making the TWiki syntax as painless as possible for stuff which we planned to use often (for example, - and = turn into Unicode arrows, and the language syntax is defined in terms of them). One source of both encouragement and frustration is the fact that every unicode code point has an associated description. We support using these descriptions---and various shortenings of them, since they are too verbose for day-to- day use. The frustration is that the names or their shortenings are not necessarily unique. For characters which only occur in strings this is less critical, but a little effort will go a long way. One heuristic we've used is: if I do a diff on the ASCII representation provided by my version control system, will I be able to read the result? We of course have a little program which processes an official unicode character table (downloaded from the web) plus some information about our special cases and uses it to generate the appropriate conversion functions. This is important because Unicode is constantly changing (mostly getting bigger). -Jan-Willem Maessen Fortress developer, Haskell hacker Matthew On Wednesday 20 September 2006 21:42, Duncan Coutts wrote: On Wed, 2006-09-20 at 18:14 +0200, Christian Maeder wrote: How can I convince ghc version 6.5.20060919 to accept latin1 characters in literals? I wish to keep source files (containing umlauts in strings) that can be compiled by either ghc-6.4.2 and ghc-6.6. You can use numeric escapes like \222. Duncan ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users smime.p7s Description: S/MIME cryptographic signature ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Re[2]: [GHC] #876: stack overflow on 'length . filter odd $ [0.. 999999999]'
On Aug 31, 2006, at 3:03 PM, Simon Peyton-Jones wrote: (replying to me) | Actually, it's sufficient to do good arity-raising transformations, | and use the definition: |foldl f z xs = foldr (\x k a - k (f a x)) id xs z | | This yields code which looks a bit like this: | ... Absolutely right. This particular thing has been on my to-do list for at least 10 years (see Andy Gill's thesis). In the interests of fairness I should point out that this was where I first saw this formulation (though it might have cropped up in some of Richard Bird's work). The pH pass could do some other fancy footwork, like changing the handedness of a commutative fold and re-associating a nested fold of an associative operator. That's all rather harder in the RULES framework (but I bet it's doable). All that footwork also relied on getting the arity analysis right in the end. I'm pretty convinced that arity munging was a critical enabling optimization. -Jan smime.p7s Description: S/MIME cryptographic signature ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Floating point problems
On Aug 30, 2006, at 6:04 PM, Lennart Augustsson wrote: On Aug 30, 2006, at 14:58 , David Roundy wrote: On Wed, Aug 30, 2006 at 07:38:35PM +0100, Jamie Brandon wrote: I recently defied my supervisor and used Haskell to write my coursework instead of C. All went well until I needed floating point and started having odd results. As far as I can tell it isn't substantially affecting my results but it is rather embarrassing after slagging off C so much. Here are some examples: *Main 0.2 + 0.1 0.30004 *Main 0.200 + 0.10 0.30004 *Main 0.3 + 0.1 0.4 *Main 0.2 + 0.1 0.30004 *Main it + 0.1 0.4 I assume this is a result of the discrepancy between binary and decimal representations of the numbers. Is there any way around? For a start, it would be nice to have a simple way to get 0.1 + 0.2 == 0.3 = True This is with GHC 6.4.1 and GCC 4.0.3 The trouble here is that ghci is printing more digits than it really ought to be printing. No, I don't think it is. Ghci is printing the number that is closest of all numbers in decimal notation to the Double in question (i.e., 0.1+0.2). Printing it with fewer decimals would yield a different number if it was read back. I always wondered why we didn't instead ask for the number that has the fewest digits of significand which converts to the Double in question. Of course, for doubles with a single ulp of difference, that's still an awfully long decimal. I feel like I looked into this once when I was trying to understand the bignum-heavy Read instance for Double in the report, and ended up with a nasty headache and some fixed-point code which used cute hacks and seemed to work with limited testing and the vagaries of gcc as a back end. -Jan-Willem Maessen -- Lennart ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users smime.p7s Description: S/MIME cryptographic signature ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: IORefs and garbage collection
On Apr 18, 2006, at 4:05 PM, Ravi Nanavati wrote: I recently discovered that I'm running into the IORef / garbage collection issue described in the ghc user guide (at the bottom of the following page): http://www.haskell.org/ghc/docs/6.4/html/users_guide/faster.html I did a bit of back-and-forth with Simon M. on this when I was fiddling with Data.HashTable. It's especially bad for IOArrays. Increasing the heap and allocation area size (with -H, -M and -A) helped improved my runtimes considerably (by preventing all of my execution time from being sucked up by the garbage collector), but I have some questions: 1. What goes wrong with mutable data and generational garbage collection to make this sort of workaround necessary? The issue surprised me because I thought there was plenty of satisfied experience with generational gc for languages with many more destructive updates (e.g. Java) than Haskell. Is there some optimization that the ghc runtime is not implementing? There are a number of known optimizations. Basically, for an efficient generational GC, you need to make use of either a read or a write barrier; since the write barrier only applies if we actually mutate something, it's the usual solution. This allows us to keep track of IORefs in old-space which point into new-space. GHC is naive and just assumes all IORefs point into new space. But after 1 GC pass, if you don't touch an IORef this is certainly false (if I understand the promotion policy correctly). Note that a read or write barrier is necessary for efficient implementation of almost *any* GC algorithm except naive stop and collect the entire world. 2. If there is a known fix for this issue, what would it involve (and, if there are any guesses, how much work might it be)? I had thought this was on the list of things fixed in 6.4. Simon? 3. What is the best workaround for this issue if you *don't* know the maximum amount of memory available for your program? Would be it be best to fall back copying collection if you want your program to consume and release memory as needed or is there a cleverer trick? Hmm, it'd be nice if the allocation area dynamically scaled up based on GC time. But those calculations are awfully hard to get right (you need to build a pretty-good cost model of GC, and the cost must include things like chasing down all those IORefs). -Jan Thanks, - Ravi ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: storing highly shared data structures
On Jan 10, 2006, at 4:26 AM, Simon Marlow wrote: Christian Maeder wrote: Bulat Ziganshin wrote: CM My old version is faster, because the version with makeStableName does CM very much GC. CMMUT time 27.28s ( 28.91s elapsed) CMGCtime 133.98s (140.08s elapsed) try to add infamous +RTS -A10m switch ;) You saved my day, thank you Bulat! Without that flag: MUT time 24.30s ( 24.76s elapsed) GCtime 131.25s (140.01s elapsed) EXIT time0.00s ( 0.00s elapsed) Total time 155.55s (164.77s elapsed) And with it: MUT time 23.86s ( 24.86s elapsed) GCtime 11.03s ( 11.68s elapsed) EXIT time0.00s ( 0.00s elapsed) Total time 34.89s ( 36.54s elapsed) The real problem seems to be that minor GCs are taking too long. Having said that, you can usually reduce GC overhead with large -A or -H options. Is the full table of stable names being traversed at every minor GC? If so, it might be worth somehow segregating the nursery stable names. I bet this complicates management a bunch, though, since right now there's only one table to look things up in. -Jan Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Optimizations for mutable structures?
On Dec 8, 2005, at 5:15 AM, Seth Kurtzberg wrote: [Discussion of the appropriate role of fairness.] The fundamental requirement is the same for all languages, I believe; the concurrently executing threads must produce a system state that is identical to _one_ system state which would be produced by running the threads sequentially. It is easy to show that to even enumerate all the possible sequences is NP-complete. Beyond the requirement of serializability, there is no practical alternative to a dose of human intelligence. I'd point out that we're lucky if we even get serializability on most machines. That's a pretty strong guarantee. -Jan-Willem Maessen At least people coding in Haskell have an understanding of the underlying issues. Alas, this is far from true for even experiences coders of imperative languages. Seth ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Optimizations for mutable structures?
Talk of uniqueness and so forth on the various Haskell mailing lists causes me to wonder: with so much imperative code being written in Haskell these days, to what extent is / should GHC perform standard imperative optimizations? A few things come to mind: - Fetch elimination for imperative reads: writeIORef r e acts readIORef r === writeIORef r e acts return e readIORef r = \e - acts readIORef r === readIORef r = \e - acts return e And that sort of thing, generalized for other imperative monadic types... My feeling is this doesn't come up much in code as written on the page, but might well be relevant after inlining. - Some way to turn the following idiom into memcpy (for any array type): do a - newArray writeArray a 0 e0 writeArray a 1 e1 writeArray a 2 e2 ...etc... What say others? Is there a need yet? (I don't honestly know the answer!) -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Optimizations for mutable structures?
Oh, dear, a brief mail with a high-level view of optimization seems to have touched off some confusion about concurrency. I wasn't thinking about concurrency at all when I wrote my original message, but there seem to be some major misconceptions in the ensuing discussion, which I'd like to clear up. On Dec 7, 2005, at 9:15 AM, Simon Marlow wrote: On 07 December 2005 13:38, Malcolm Wallace wrote: Jan-Willem Maessen [EMAIL PROTECTED] writes: - Fetch elimination for imperative reads: writeIORef r e acts readIORef r === writeIORef r e acts return e This transformation is valid only on single-threaded systems. If there is any possibility of an IORef being shared across threads, you are out of luck. (assuming 'acts' doesn't modify 'r'). This was my intended meaning---I dashed off the above lines, and trusted they'd be understood. Apparently I should have been clearer. No, Jan's transformation is correct even in a multithreaded setting. It might eliminate some possible outcomes from a non-deterministic program, but that's ok. I agree with Simon here. Eliminate some possible outcomes is indeed possible to define in a meaningful and technically rigorous way. Others (I'll single out Malcolm and Claus here) have railed against this view, arguing that every program behavior should be preserved. Claus even raises the (difficult technical) issue of fairness. Sadly, I'm afraid any realistic implementation *will* eliminate possible outcomes. In its cooperative multithreading, GHC constantly makes decisions about where thread switching may occur---and I suspect that inserting every possible thread switch would result in dramatic performance drops. But worse than that, even if we insert a check between every single IORef operation, the processor may well decide to execute successive IORef operations in parallel, or even choose to reorder two IORef operations which refer to different locations. It may even choose to eliminate the read in the above example before the write has become visible to any other thread! In effect we get behavior indistinguishable from the suggested optimizations. Now, such behavior isn't always acceptable, so there are ways to get back to sanity. However, those ways (memory barriers and/or atomic memory operations) are extremely expensive. I'm betting one or both of you regularly use an x86 machine---for which there is not even a rigorous specification of where these operations must be inserted! Nonetheless, we get by. We do so by defining idioms based on synchronization---MVars and TMVars are entirely appropriate places to be enforcing memory ordering. Much of the early pain of the Java Memory Model revision (Claus referred to the mess which was made of the original spec, now fixed) was to avoid the need to insert barriers in most code. A consensus was reached on an acceptable programming style: Use monitor synchronization and avoid data races, or failing that use volatile variables in particular well-defined ways. If you break those rules, all bets are off; there is a lengthy spec defining exactly what that means (mostly to rule out the behavior then the program creates a valid password out of thin air), but this is everything you, the programmer, need to understand. Similar consensus opinions have formed in other communities, usually without rigorous specifications to back them up. Voices in this thread have suggested that the right idiom for Haskell is to synchronize using TMVars and MVars. I agree (actually I hope that TMVars are enough, though I'd love to have a TMArray), and I think we can even guarantee reasonable things about IORefs that get passed from thread to thread in this way. Want fairness? Put the stuff you want to observe in an MVar or a TMVar. Where does this leave us? Roughly where Simon noted---it's easy to do shallow things, like fetch elimination in the absence of intervening calls to unknown functions, and hard to do deep optimizations. I suspect that shallow things are all that is called for. We love to criticize C for all the things it has done badly. But let's not assume that everything C compilers do is stupid or a bad idea. [Oddly, I find myself telling C programmers the same thing about Fortran all the time.] -Jan-Willem Maessen There's no requirement that all interleavings according to the semantics have to be implemented. This is a hard property to state precisely, indeed we gave up trying to in the concurrency/FFI paper: http://www.haskell.org/~simonmar/papers/conc-ffi.pdf, see Section 6.1. Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell
Checking for WHNF (a horrible naughty thing)
I would like to do a horrible naughty thing (which I promise never to expose to the world). I would like to tell whether a term is in WHNF, without forcing evaluation of that term. Something like: isWHNF :: a - Bool Is there a way of doing this? I can fake it with an IORef and much unsafeness, but I'm wondering if there's a safe-but-ugly way of doing the test in GHC. If you're curious, I'm trying to compact exactly the evaluated spine of a list without changing the list's laziness in any way. It remains to be seen whether this is even vaguely a good idea. :-) -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Rules for use of unsafeThaw...
I've recently been experimenting with unsafeFreeze/unsafeThaw in GHC. Judicious use of these functions vastly reduces GC overhead in Data.HashTable. However, a slightly mis-timed GC will cause the whole mess to crash. I am attempting to understand the invariants required to safely use unsafeFreeze/unsafeThaw. I believe the following usage ought to be 100% safe: 1) Take the last and only reference to a mutable array, and call unsafeFreeze to obtain an immutable array. 2) Take the last and only reference to an immutable array, and call unsafeThaw to obtain a mutable array. Are there any other usage modes which are GC-safe? In particular, is it possible for a reference to the frozen array to exist somewhere on the heap when a call to unsafeThaw occurs? Is there a particular set of circumstances under which we might get away with this (eg, no allocation during the interval where the ambiguous references occur)? -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: jhc vs ghc and the surprising result involving ghc generated assembly.
Nice analysis. I indeed found with phc that shadow stack references absolutely killed performance, and I aggressively cached stack locations in locals, spilling to stack only when GC information needed to be accurate. [There was a giant infrastructure to save only live data to stack, but we won't go into that now as it was the source of almost all the codegen bugs...] On Oct 26, 2005, at 5:43 AM, John Meacham wrote: here is the C code that jhc generates. (As an aside, I am very proud of how readable and how much structure the jhc generated C code preserves of the original haskell. it's a small thing, perhaps only implementors appreciate it, but I am glad I spent the time needed to do so.) This makes a big difference. The phc compiler even put comments in the code so that I could figure out what came from where. v99 = fWXAXDfMainXDfac(v97, v98); return v99; ... notice that besides being a bit verbose and using a tailcall, I'm impressed that gcc found this. It's definitely living a bit dangerously, and your suggestions below for self tail call handling are the ones I found most effective. (They also allowed me to bypass some prologue garbage, since phc used a one-C-function-per-Haskell- function model with internal resumption points.) Non-self tail calls I was careful to compile to: return f(...); I expect from the above that gcc does better at spotting tail calls now. furthermore gotos and labels are very problematic for gcc to optimize around. for various tiresome reasons gcc cannot perform (most) code motion optimizations across explicit labels and gotos, especially when they deal with the global register variables and memory stores and loads. ... there are a couple of things we can do to mitigate these problems: get rid of indirect jumps whenever possible. use C control constructs rather than gotos. for loop introduction would be especially nice, but a bit tricky in practice I fear (requiring a game of spot the induction variable). A couple simple rules seem to help greatly. * turn anything of the form JMP_((W_)self) where self is oneself into a goto that gotos a label at the beginning of the function. Or better yet, wrap the whole function in do { } while (1); and replace JMP_ by continue. This avoids the troubles with goto which John mentioned above. It made a difference for phc, at least. Of course, if you can introduce loops elsewhere you might get yourself into trouble with this solution. * do simple pattern matthing on the basic blocks to recognize where C control constructs can be placed. for instance turn if (x) { goto y; } blah.. baz.. JMP_(foo) into if (x) { goto y; } else { blah.. baz.. JMP_(foo) } extending the else to after the next jump or goto. I'm surprised this actually helps, I must admit. * getting stack dereferences out of your loops. I recommend caching stack references in C locals where possible, but it's tricky to get this right if loop bodies include embedded function calls. Last I checked this wasn't an issue for GHC, since function calls were CPS-converted and only tight call-free loops ended up in a single function anyway. in order to get rid of the unessesary memory accesess, we need to either 1. fully convert it to use C control constructs, so gcc will do it for us. (code motion and loop invarient being inhibited again by gotos) As I recall, the right solution here is to compute dominator trees, and coalesce functions which are only tail called from their dominator into a single function. Alas, I've forgotten where I saw this written up, but there are indeed papers on it. Because it takes a bunch of effort on the part of the implementor, it'd be nice to see if its benefits are quantified. These should be straightforward to implement in the C code generator. it also suggests we might want to try to use the native C calling convention on leaf nodes that deal with unboxed values (so we get register passing and return values for free) or taking groups of mutually recursive functions and turning them all into one function with explicit jumps between them. Making sure things are marked static and occur in an appropriate dependency order helps a bit here. It might even be worth marking some stuff inline in the code generator, though that's shaky ground. I actually considered making everything static and putting outwardly- visible functionality in an extern wrapper---effectively carrying worker-wrapper down to the C level. some random notes: the 3x-7x factor was tested on an i386, on x86_64 the margin is much much greater for reasons that are still unclear. Does x86-64 use a register-based calling convention by default? If you compiled the i386 code using __regparm(2), would you see the same speed difference? -Jan-Willem Maessen
Re: Profiling and Data.HashTable
On Oct 14, 2005, at 10:17 AM, Ketil Malde wrote: Hi all, I have a program that uses hash tables to store word counts. It can use few, large hash tables, or many small ones. The problem is that it uses an inordinate amount of time in the latter case, and profiling/-sstderr shows it is GC that is causing it (accounting for up to 99% of the time(!)) Is there any reason to expect this behavior? Heap profiling shows that each hash table seems to incur a memory overhead of approx 5K, but apart from that, I'm not able to find any leaks or unexpected space consumption. That 5K number made me immediately suspicious, so I took a look at the source code to Data.HashTable. Sure enough, it's allocating a number of large IOArrays, which are filled with pointers. The practical upshot is that, for a hash table with (say) 24 entries, the GC must scan an additional 1000 pointers and discover that each one is []. I've seen other implementations of this two-level technique which use a smallish sEGMENT_SIZE in order to avoid excessive GC overhead for less-than-gigantic hash tables. This might be worth doing in the Data.HashTable implementation. [Curious: what (if anything) is being used to test Data.HashTable? I'd be willing to undertake very small amounts of fiddling if I could be sure I wasn't slowing down something which mattered.] -Jan-Willem Maessen Suggestions? -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: addListToFM
On Jun 3, 2005, at 7:51 AM, Christian Maeder wrote: Serge D. Mechveliani wrote: I used to apply FiniteMap.addListToFM. Now, what is its best expression in ghc-6.4 ? Is this \ mp pairs - Map.union (Map.fromList pairs) mp ? (the order of the arguments in Map.union is essential). This looks fine to me. best is hard to meet. I'ld suggest: foldr (uncurry Map.insert) The trouble is, neither of these approaches is demonstrably better for *all* finitemap implementations. The latter is probably best for standard binary trees with a balance metric, where union is notoriously difficult. But for other structures the union approach may work better (a randomly-balanced tree I'm fiddling with has this property, I think). That argues for presence in the library. -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Map library
On Jun 2, 2005, at 10:29 AM, Mario Blazevic wrote: ... 3. The Data.Map looks much better than the FiniteMap library, and its export list is very complete. There are, however, two (or four) more functions that would be really nice to have in there, as they are impossible to write efficiently with the functions currently provided: mapFilter :: (a - Maybe b) - Map k a - Map k b mapFilter f = map Maybe.fromJust . filter Maybe.isJust . map f mapPartition :: (a - Either b c) - Map k a - (Map k b, Map k c) mapPartition f = removeTags . partition isLeft . map f where isLeft (Either.Left _) = True isLeft (Either.Right _) = False removeTags (leftMap, rightMap) = (map (\ (Left x) - x) leftMap, map (\ (Right x) - x) rightMap) Having worked on implementing, eg. IntSet in terms of FiniteMap-type code + UInt32 bitmaps at the nodes, I eventually hit on a basic set of functions which seem to let you do everything. Here in a descriptive (but possibly buggy) notation: data Found a = Keep | Modify a | Delete data Missing a = Omit | Insert a lookupLike :: (Ord k) = (r, Missing a) - -- What to do to map when the key is not found (a - (r, Found a)) --- What to do to map when the key is found k - Map k a - (r, Map k a) lookupLike can be used to do arbitrarily nasty update, insertion, and deletion actions. The Keep and Omit actions allow fast return without re-allocation if nothing changes. It can also be used to perform pure lookup of all kinds, though it'll generally be unacceptably clunky. We really should add a k argument to both the found and not found actions to be completely general---but this only matters if equality throws away information that is actually important, for example if we're being perverse and stashing our value in the key. type MapOrFilter k a b = k - a - Maybe b mapFilterLike :: (Ord k) = MapOrFilter k a b - Map k a - Map k b This is basically the mapFilter from above, with the key argument thrown in. From this we can get arbitrary generalizations of map and filter. mapPartition from above, again with an additional key argument, should be the only splitter you ever need. data UpdateAction k a b where Discard :: UpdateAction k a b NoUpdate :: UpdateAction k a a UpdateWith :: MapOrFilter k a b - UpdateAction k a b joinLike :: (Ord k) = UpdateAction k a c --- What to do with elements only in first map UpdateAction k b c --- What to do with elements only in second map (k - a - b - Maybe c) - -- How to combine elements in both maps Map k a - Map k b - Map k c This can be used to implement union, intersection, difference, union with combine, and so forth. Alas, the complexity of an UpdateAction is required, since NoUpdate and Discard can change the asymptotic complexity of the join relative to doing everything using MapOrFilter. Technically, of course, we can replace the type UpdateAction k a b with an equivalent function Map k a - Map k b, but that doesn't clearly convey the intention. So, anyone bold enough to put these most general possible functions into a Map implementation? I'm guessing there won't be general agreement on the names / types of the actions, but the general idea is tested and sound. -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Can't compile GHC
On May 9, 2005, at 4:55 PM, Bennett Todd wrote: 2005-05-09T16:09:16 Daniel Carrera: I think it should be if you want Haskell to grow in acceptance. The first barrier that new potential users hit is the one that causes most people to give up and move on to a different project. That first barrier should be made as low as possible. This is a first barrier only for users on platforms that don't currently have ghc maintainers. Given how tricksy it is getting a working ghc build, even on a supported platform, doing so is relegated to the few and devoted; new users are only welcome on the platforms for which pre-built ghc binaries are available. I'll echo this. The Darwinports GHC build is the first time I've ever gotten GHC to build out of the box. Previous attempts have required multiple hours of hacking, even with a same-version binary installed on the same platform. Part of that was being burdened with variously non-standard machine configurations. But mostly it's because the fptools build structure was historically pretty fragile, and stuff broke badly if it didn't get built in just the right order with just the right versions of things. On many occasions I just gave up and kept the sources around for reference only (eg to look things up in the library code when the docs were fuzzy...). My perception is that this is improving with time. But it's still a big time investment. Certainly I don't expect to ever have the time to coax GHC to build from source on Solaris nowadays. -Jan-Willem Maessen I agree with you, Daniel, that this is a terribly unfortunate state of affairs, but as has already been said, the folks who are doing the actual work (I ain't one of 'em) have decided that making ghc easier to bootstrap onto a supported platform is not a priority for them, they're content that virtually all ghc users should have to download prebuilt binaries, or source packages on platforms where devoted souls have built working automated builds. -Bennett ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
RE: fast IO in ghc
Simon Peyton-Jones [EMAIL PROTECTED] writes: And, with a lot of help from Koen, I'm about to fold in a much more efficient implementation of Read, which may help. Any of the guilty parties want to give a quick overview of how it's going to work? -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Straightforward conversion from Int - Word
Simon Marlow notes: But fromIntegral *does* do the right thing, doesn't it? Numeric.showHex (fromIntegral (-1 :: Int32) :: Word32) 0x Interesting. The fact that this fooled Julian as well as me suggests the behavior needs to be better documented. Interestingly, hugs yields an error. I'm going to follow up in more detail to haskell, as this is really about the behavior of fromInteger, and has nothing to do with Word32 or Int32 per se at all. -Jan ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
RE: The size of things
Yes, I see. Would it be possible to have a standard strict list, i.e. something equivalent of data SList a = SNil | SCons !a SList (which could be a member of the same class as the normal lists, and have the usual functions (length, ++, isPrefixOf...) overloaded)? Unfortuantely, as I understand it the -funbox-strict-fields optimization doesn't apply to polymorphic fields. That is, you get the optimization for: data SIntList = SINil | SICons !Int SIntList but not for type SListOfInt = SList Int This is because SList a must have a uniform representation for every possible a, and the Ints must therefore be boxed. You could verify my instincts by trying this with your example and looking at the space usage... If strict cons is what's desired, than it shouldn't be hard to add it as a function and keep lists the way they are. It sounds like what's really wanted is an unboxed list class structure similar to the hierarchy of unboxed arrays. That's quite a bit more complicated. -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: State Transformer
Theodore Norvell [EMAIL PROTECTED] asks: Jorge's question raised a question in my mind. The IOExts module has many of the same features as the ST module, why are there two ways to do the same thing? Is the ST module only there for legacy purposes? The ST monad provides safer encapsulation of mutable references. We can prove that references which escape a particular instance of ST are never side effected. See the paper Lazy Functional State Threads: http://www.cse.ogi.edu/~jl/Papers/stateThreads.ps This allows us to construct functions which are certain to present a functional face to the world, but use mutation internally. In this respect, ST is actually better than IO, albeit less well-supported. -Jan-Willem Maessen [Note that the above paper presents the version of ST contained in the LazyST library these days (if my memory serves me right). The arguments about encapsulation apply in either case.] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Questions about sharing
On Fri, 7 Dec 2001, Adrian Hey replied: On Friday 07 December 2001 2:27 pm, D. Tweed wrote: On Fri, 7 Dec 2001, Adrian Hey wrote: The first is.. Does the compiler keep a unique copy of expressions which consist of just a single zero arity constructor (eg. [],True,Nothing..) as a CAF which is referenced each time the constructor appears in an expression, or does it duplicate the constructor (expression) each time it's used. Well I suppose if it's necessary to create a new indirection heap record for each reference, then there's not really any point in having a single copy of the value itself. But I don't see why that should be so. Even if it is indirected for some reason it should still be possible to share the indirection record I think. Maybe CAF is the wrong word to use here since there's no application involved. What I mean is.. are zero arity constructors referenced the same way as a top level constant would be? (Not that I know how that's done, but I presume there's only 1 copy of top level constants in memory at any one time.) My understanding is that GHC tries to have it both ways. Here's my understanding of how it works (implementors should probably chime in if my memory is faulty or out of date): 1) There is a single, canonical copy of every nullary constructor. This canonical copy is used wherever possible---just like a top-level constant. 2) Updating a thunk with an indirection makes it expensive to obtain the thunk's value. The extra indirection does not require allocation---the only reason we need indirections at all is to overwrite memory that previously held a thunk. The real problem is that it takes time to chase down indirections once they exist. Therefore when a thunk evaluates to a nullary constructor, it is overwritten directly. This effectively creates another copy of the nullary constructor. 3) When the GC runs, instead of copying these newly-created nullary constructors, it replaces them with the canonical copy. [The GC also eliminates indirections, and thus helps us no matter what we do in 2) above] Thus we get the best of both worlds---efficient thunk update and reasonable space consumption. -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
RE: ghc --make feature request
Simon Marlow [EMAIL PROTECTED] writes: GHC actually has rather sophisticated recompilation checking which goes beyond just checking whether the interface changed - it keeps version information for each entity exported by a module and only recompiles if any of the entities actually used by the module have changed (this is described in the user's guide under the section on recompilation checking). So the upshot is that what you're describing shouldn't happen, and it may be a bug. Could you send us more info? Note that in GHC, the version number of a function can often change for hard-to-spot reasons. You just need to change (for example) the strictness properties of the function, which can be very easy to do when making changes to your code. The compiler cares about (much) more than just the types of imported objects. I tend to expect recompiles whenever something depends on a function I've changed, even if I don't think the changes were very significant. -Jan-Willem Maessen ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
RE: Working character by character in Haskell
Simon Marlow [EMAIL PROTECTED] writes: To really match the C program, you need to use IOExts.hGetBuf and IOExts.hPutBuf, and do the operations on raw characters in memory. Using a UArray of Word8 would be better, but there aren't any operations to do IO to/from a UArray yet (actually I've written these, but they aren't in the tree yet). So why don't getContents / putStr / etc. deforest cleanly to calls to hGetBuf and hPutBuf? I'm genuinely curious; my own experience in this direction is The engineering is challenging. These functions are so commonly used, though---and they're vastly easier to use, actually portable, etc. The effort would surely be repaid. Plus, I'm curious to hear about any challenges which may be involved in pulling this off. :-) If it's genuinely tricky, the tricks are likely to be generally useful for other stream-ish things. -Jan-Willem Maessen Eager Haskell Project [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
RE: Lightningspeed haskell
Josef Svenningsson [EMAIL PROTECTED] wrote: One benchmark turned out to give pretty remarkable results. It's the producer/consumer benchmark. I suggest you all take a look at it. The haskell version is six (SIX!!!) times faster than the c version. Hey, what's going on here? I would really like to hear some comments from our dear implementors. And Simon Peyton-Jones [EMAIL PROTECTED] replied: I guess GHC is fast because it's implementing lightweight threads inside a single OS thread. Absolutely. Good high-level thread support trumps anything provided by the operating system. I'd second Simon's guess that this was on Linux or some other system where pthreads map one-to-one to OS threads. Similar dramatic performance disparities have cropped up in the Java community. There are Java benchmarks which create thousands of threads; most implementations slow to a crawl when this happens, as the operating system collapses under the crushing load. Those which don't (IBM's Jalapeno, a research JVM, comes to mind) use this as a *big* bragging point. Unless you're mucking with thread priorities, mapping language-level threads one-to-one to operating-system threads is a terrible idea. -Jan-Willem Maessen Eager Haskell project [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Minimal typing derivations and free theorems...
Josef Sveningsson [EMAIL PROTECTED] writes: It's interesting reading. However, it seems to me that it would interact badly with minimal typing derivations (mtd). Mtd is an algorithm for type checking which computes the least general type instead of the most general. This is used by some compilers to guide other optimisations. My instinct is that any analysis which makes use of free theorems is going to require a most general type. This is because the free theorems are based upon the most general type---and of course the more polymorphism there is, the stronger the free theorems that can be derived in general. That being said, I don't see any difficulty with using both kinds of type information---the information we obtain with the two analyses is different, and both types constrain program behavior in different ways. Mtd is unlikely to be useful for extracting bottoms, but there is no reason bottom extraction should stop us from assigning a more specific minimal type to an error-handling expression. Actually, though, I wonder if mtd are useful at all for expressions known to be bottom---this knowledge is an assertion that we need not represent the value at all, and consequently seems strictly more flexible than any representation we might choose by eliminating polymorphism. The most important observation, really, is that we can still assign a minimal type (like "Int") to a bottom expression, and thus that we will not contaminate representation analysis or other mtd-based optimizations. And, in fact, its bottom-ness means we can assign such a minimal type to each occurrence, so that the sharing of the bottom expression need not introduce spurious polymorphism. -Jan-Willem Maessen
Inlining errors...
George Russell writes: Well that's a big question. As I understand it, the errant value, lvl20, is a string representing an error message (which I suppose is to be thrown in the event of a matching failure). Since it should only be used on error, surely there is no speed benefit to inlining it, and probably there is a space penalty (since you store the string multiple times). In this case there is a very real disadvantage, since inlining forces everything that imports that module to recompile. So my suggestion would be that values required only for errors should never be inlined or put in a .hi file. I agree so wholeheartedly that I just write a (very!) short paper on the subject for ICFP (having discovered to my surprise that no such write-up existed). It describes how to identify such expressions and hoist them out so they don't end up getting inlined. It's still being refereed is thus likely to be revised, but if you're interested in a pre-print take a look (there are no pointers to it from elsewhere at the moment): @unpublished{bottomExtract, AUTHOR = {Jan-Willem Maessen}, YEAR = {1999}, month= Mar, TITLE= {Bottom Extraction: Factoring error handling out of functional programs}, documentURL = "http://www.csg.lcs.mit.edu/~earwig/extraction.ps" note = {Submitted to ICFP 2000} } -Jan-Willem Maessen
MVar behavior...
I've been following the MVar debate with some interest. As I've mentioned in other mail, M structures were an Id thing, and so we here at MIT are in some sense responsible for their peculiar non-uniformity. It should be noted that M-structures were defined by analogy to I-structures---which are write-once, read-many. Thus, putMVar signals an error on a full location primarily because the I-structure write did as well. This non-uniformity stems from the fact that we can easily cook up an MVar implementation which stores queued reads in the same storage which it uses for the data (which is what I-structures do). It's an implementation hack. I have no idea whether this is true in GHC (I don't have the source handy right now), but I suspect it isn't quite. Really, it does seem to me that MVar behavior _ought_ to be symmetric. In this case, there are two options: * M-structure operations block. This shouldn't be a problem in practice, as M-structures mostly get used in a producer-consumer fashion, where this is what we want, or they get incrementally updated, in which case we should be able to prove that suspension on write is impossible (though the necessary compiler logic may not be worth the bother). * M-structure operations don't block. This breaks one of the important early M-structure ideas: That the presence state of the M-structure was invisible and enforced by the underlying synchronization. On the other hand, when we're using M-structures for e.g. caching we end up wasting a lot of effort coding up presence state explicitly in another MVar. Alas, my instinct is that Claus Reinke's "tryButDoNotSuspend" probably isn't the right way to go here. Effectively what it does is "change the mode" of the IO monad, altering the behavior of M-structures. It seems to me that blocking and non-blocking operations really are fundamentally different and have fundamentally different underlying implementations, and there should simply be a separate set of operations for each. I don't see an obvious way to code up the suspensive operations using the non-suspensive ones, since you want to associate the suspension state with the MVar (and do so atomically, in case someone else is trying to change the state). My conclusions: The suspensive behaviors are necessary. At least one, and preferably both of takeMVar/putMVar must be suspensive. The non-suspensive behaviors are nice, but should be captured using separate operations. Cooking them up using multiple MVars will work, but is arguably only necessary because the implementation isn't flexible enough. -Jan-Willem Maessen