RE: Planning for the 7.12 release
Lennart always says he’s going to work on it, but he’s a busy man and nothing has actually happened. It’s a pretty easy feature to implement I think. Simon From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Sean Leather Sent: 28 August 2015 06:40 To: Ben Gamari Cc: GHC developers Subject: Re: Planning for the 7.12 release On Thu, Aug 27, 2015 at 5:38 PM, Ben Gamari wrote: These items are a bit less certain but may make it in if the authors push forward quickly enough, * Support for Type Signature Sections, allowing you to write (:: ty) as a shorthand for (\x - x :: ty). Once Lennart convinced me of the usefulness of this [1], I've been finding plenty of places where I wish I had it. Who's working on it? What's the status? Is there a ticket for it? Regards, Sean [1] http://augustss.blogspot.com/2014/04/a-small-haskell-extension.html ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Planning for the 7.12 release
Ben Gamari b...@well-typed.com writes: These items are a bit less certain but may make it in if the authors push forward quickly enough, [..] * A (possible) overhaul of GHC's build system to use Shake instead of Make. Is there a breakdown of what remains to be done on this front? -- с уважениeм / respectfully, Косырев Серёга ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Fwd: Thread-safe Hint
Hello, I am wondering if GHC is now thread-safe. I am using Hint, and it reports that GHC is not thread-safe, and that I can't safely run two instances of the interpreter simultaneously. Is that still the case? Thanks! Corentin -- Forwarded message -- From: Daniel Gorín jcpetru...@gmail.com Date: Thu, Aug 27, 2015 at 5:09 PM Subject: Re: Thread-safe Hint To: Corentin Dupont corentin.dup...@gmail.com Hi Corentin, sorry for the late reply. Until relatively recently, the problem was still on. But I too remember seeing something related to this issue being fixed (iirc, the problem was the runtime linker, which used global state), so perhaps it is already fixed in 7.10. If you can verify this, it shouldn’t be hard to show the error message only on old versions of ghc. I’ll be away for a couple of weeks, but if you want to look into this and send a patch, I’ll merge it when I return. Cheers, Daniel On 24 Aug 2015, at 10:43 am, Corentin Dupont corentin.dup...@gmail.com wrote: Hello Daniel, I noticed the following message in Hint: This version of GHC is not thread-safe,can't safely run two instances of the interpreter simultaneously. Is it still the case with recent versions of GHC? It would be neat to be able to launch several instances of the interpreter. In my game Nomyx I have several match-up going on and having one instance of the interpreter would be nicer. Otherwise I am obliged to reset the interpret each time I want to interpret something, which is time consuming (2-3 seconds). Thanks, C ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
RE: Planning for the 7.12 release
Andrey Mokhov is the man to ask. I'm copying him. Simon | -Original Message- | From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of | Kosyrev Serge | Sent: 28 August 2015 11:31 | To: Ben Gamari | Cc: GHC developers | Subject: Re: Planning for the 7.12 release | | Ben Gamari b...@well-typed.com writes: | These items are a bit less certain but may make it in if the authors | push forward quickly enough, | | [..] | | * A (possible) overhaul of GHC's build system to use Shake | instead | of Make. | | Is there a breakdown of what remains to be done on this front? | | -- | с уважениeм / respectfully, | Косырев Серёга | ___ | ghc-devs mailing list | ghc-devs@haskell.org | http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Planning for the 7.12 release
Matthew Pickering matthewtpicker...@gmail.com writes: Hi Ben, I think that D1152 (Record Pattern Synonyms) will be ready for 7.12. https://phabricator.haskell.org/D1152 Ahh yes. Thanks for pointing this out. I've added it to the list. Cheers, - Ben signature.asc Description: PGP signature ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
RE: ArrayArrays
At the very least I'll take this email and turn it into a short article. Yes, please do make it into a wiki page on the GHC Trac, and maybe make a ticket for it. Thanks Simon From: Edward Kmett [mailto:ekm...@gmail.com] Sent: 27 August 2015 16:54 To: Simon Peyton Jones Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs Subject: Re: ArrayArrays An ArrayArray# is just an Array# with a modified invariant. It points directly to other unlifted ArrayArray#'s or ByteArray#'s. While those live in #, they are garbage collected objects, so this all lives on the heap. They were added to make some of the DPH stuff fast when it has to deal with nested arrays. I'm currently abusing them as a placeholder for a better thing. The Problem - Consider the scenario where you write a classic doubly-linked list in Haskell. data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) Chasing from one DLL to the next requires following 3 pointers on the heap. DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~ DLL That is 3 levels of indirection. We can trim one by simply unpacking the IORef with -funbox-strict-fields or UNPACK We can trim another by adding a 'Nil' constructor for DLL and worsening our representation. data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil but now we're still stuck with a level of indirection DLL ~ MutVar# RealWorld DLL ~ DLL This means that every operation we perform on this structure will be about half of the speed of an implementation in most other languages assuming we're memory bound on loading things into cache! Making Progress -- I have been working on a number of data structures where the indirection of going from something in * out to an object in # which contains the real pointer to my target and coming back effectively doubles my runtime. We go out to the MutVar# because we are allowed to put the MutVar# onto the mutable list when we dirty it. There is a well defined write-barrier. I could change out the representation to use data DLL = DLL (MutableArray# RealWorld DLL) | Nil I can just store two pointers in the MutableArray# every time, but this doesn't help _much_ directly. It has reduced the amount of distinct addresses in memory I touch on a walk of the DLL from 3 per object to 2. I still have to go out to the heap from my DLL and get to the array object and then chase it to the next DLL and chase that to the next array. I do get my two pointers together in memory though. I'm paying for a card marking table as well, which I don't particularly need with just two pointers, but we can shed that with the SmallMutableArray# machinery added back in 7.10, which is just the old array code a a new data type, which can speed things up a bit when you don't have very big arrays: data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil But what if I wanted my object itself to live in # and have two mutable fields and be able to share the sme write barrier? An ArrayArray# points directly to other unlifted array types. What if we have one # - * wrapper on the outside to deal with the impedence mismatch between the imperative world and Haskell, and then just let the ArrayArray#'s hold other arrayarrays. data DLL = DLL (MutableArrayArray# RealWorld) now I need to make up a new Nil, which I can just make be a special MutableArrayArray# I allocate on program startup. I can even abuse pattern synonyms. Alternately I can exploit the internals further to make this cheaper. Then I can use the readMutableArrayArray# and writeMutableArrayArray# calls to directly access the preceding and next entry in the linked list. So now we have one DLL wrapper which just 'bootstraps me' into a strict world, and everything there lives in #. next :: DLL - IO DLL next (DLL m) = IO $ \s - case readMutableArrayArray# s of (# s', n #) - (# s', DLL n #) It turns out GHC is quite happy to optimize all of that code to keep things unboxed. The 'DLL' wrappers get removed pretty easily when they are known strict and you chain operations of this sort! Cleaning it Up -- Now I have one outermost indirection pointing to an array that points directly to other arrays. I'm stuck paying for a card marking table per object, but I can fix that by duplicating the code for MutableArrayArray# and using a SmallMutableArray#. I can hack up primops that let me store a mixture of SmallMutableArray# fields and normal ones in the data structure. Operationally, I can even do so by just unsafeCoercing the existing SmallMutableArray# primitives to change the kind of one of the arguments it takes. This is almost ideal, but not quite. I often have fields that would be best left unboxed. data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil was able to unpack the Int, but we lost that. We can currently at best point one of the entries of the SmallMutableArray# at a boxed or at a MutableByteArray# for all of our misc. data
Re: Planning for the 7.12 release
Elliot Cameron elliot.came...@covenanteyes.com writes: I can't seem to find the exact trac ticket, but the ability to swap out Integer implementations at link time would be a huge relief on Windows, which suffers from various problems with dynamic linking. I believe it was originally slated for 7.12. Can someone find it? Here's what I did find: https://ghc.haskell.org/trac/ghc/wiki/ReplacingGMPNotes A related discussion: https://github.com/commercialhaskell/stack/issues/399 As it stands, we've been trying hard to find good ways to provide custom GHC variants more easily to end users. Hmm, interesting. I'm not sure how realistic it is to make this a link-time option, however, considering that we may inline bindings from whatever integer package we compile against into the user's program. Herbert, do you have any thoughts on this? Cheers, - Ben signature.asc Description: PGP signature ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Planning for the 7.12 release
For the record, I wanted to implement this feature a couple of months ago but then got side-tracked. If somebody wants to pick it up (which would be great), please lemme know. In the meantime I've created https://ghc.haskell.org/trac/ghc/ticket/10803 and write up the little information I have on this topic already On 2015-08-28 at 09:20:54 +0200, Simon Peyton Jones wrote: Lennart always says he’s going to work on it, but he’s a busy man and nothing has actually happened. It’s a pretty easy feature to implement I think. Simon From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Sean Leather Sent: 28 August 2015 06:40 To: Ben Gamari Cc: GHC developers Subject: Re: Planning for the 7.12 release On Thu, Aug 27, 2015 at 5:38 PM, Ben Gamari wrote: These items are a bit less certain but may make it in if the authors push forward quickly enough, * Support for Type Signature Sections, allowing you to write (:: ty) as a shorthand for (\x - x :: ty). Once Lennart convinced me of the usefulness of this [1], I've been finding plenty of places where I wish I had it. Who's working on it? What's the status? Is there a ticket for it? Regards, Sean [1] http://augustss.blogspot.com/2014/04/a-small-haskell-extension.html ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs -- Elegance is not optional -- Richard O'Keefe ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Planning for the 7.12 release: MonadFail
David Luposchainsky dluposchain...@googlemail.com writes: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hey Ben, my summer was pretty busy, but we recently fixed our MonadFail implementation to work as desired, so that should make it in as well. We'll have to survive a heroic rebase/squash that we'll probably do in September when we're back from our holidays. Great, I've added it to the list. Keep us in the loop as things progress. Cheers, - Ben signature.asc Description: PGP signature ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: ArrayArrays
On 27/08/2015 19:36, Edward Kmett wrote: On Thu, Aug 27, 2015 at 1:24 PM, Edward Z. Yang ezy...@mit.edu mailto:ezy...@mit.edu wrote: It seems to me that we should take a page from OCaml's playbook and add support for native mutable fields in objects, because this is essentially what a mix of words and pointers is. That actually doesn't work as well as one might hope. We currently treat data constructor closures as so much tissue paper around a present. We tear them open, rip out all their contents, scatter them throughout our code and then we build a whole new data constructor closure when we're done, or we just leave them suspended in closures awaiting someone to demand we finally make a new data constructor. Half the time we don't even give back the data constructor closure and push it into update g frames and we just give back the items on the stack. With the machinery I mentioned above I get a world where every time I access an object I can know it is evaluated for real, so this means I'm not stuck 'entering an unknown closure', and getting it to give me back a slab of memory that we know is a real data constructor that i can bang away on mutable entries in. In a world where things in * could hold mutable pointers we have to care a lot more about object identity in deeply uncomfortable ways. With what I've implemented I only care about object identity between things in # that are gcptrs. The garbage collector may move them around, but it doesn't put in thunks anywhere. Yeah, I've actually thought about whether we could have mutable fields in constructors a couple of times, and it's far from easy for the reasons you describe. A constructor with mutable fields would need to be an object with identity, with precise control over when it is created. This is nothing like an ordinary constructor. I like the alternative approach in this thread, which is to attack the problem from the other end: start with a primitive object and make it more like a constructor. I don't see any reason why we shouldn't add primops to read/write SmallArray# and other primitive objects in an ArrayArray#. Will someone make a patch? It should be pretty straightforward. Cheers, Simon ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Planning for the 7.12 release
On 2015-08-28 at 12:31:13 +0200, Kosyrev Serge wrote: [...] * A (possible) overhaul of GHC's build system to use Shake instead of Make. Is there a breakdown of what remains to be done on this front? Btw, here's the GitHub repo were you can track the progress: https://github.com/snowleopard/shaking-up-ghc ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Planning for the 7.12 release
Kosyrev Serge _deepf...@feelingofgreen.ru writes: Ben Gamari b...@well-typed.com writes: These items are a bit less certain but may make it in if the authors push forward quickly enough, [..] * A (possible) overhaul of GHC's build system to use Shake instead of Make. Is there a breakdown of what remains to be done on this front? I'm actually not entirely clear on the general plan regarding the Shake-up and perhaps this is a good time to discuss it. How gradual of a transition do we envision this will be? Specifically, for how long do we want the two build systems to coexist (if at all)? If this is intended to be an immediate wholesale switch to Shake I would be very skeptical of merging for 7.12 as three months is, in my opinion, very little time to test such a sweeping change. However, a long transition time is also not terribly desirable as maintaining two largely independent build systems potentially carries significant cost. Can we in principle expect the Shake build system to build all of the configurations GHC currently supports from day-one (including, for instance, cross compilation)? The batch files in the repository suggest that it has been used on Windows but has it been tested on our other Tier 1 platforms? I just attempted to use the current state of the repository and sadly found that things fell apart pretty quickly [1]. I would love to see this happen, but obviously we need to tread carefully when performing such a major overhaul to code so central to the project. Moreover, I would really like to minimize the probability that we increase the maintenance burden of our build infrastructure. I think if there is clear communication regarding what remains to be done and motivation to quickly finish these items then Shakification is still an possibility for 7.12. Otherwise I personally think we may want to be a bit conservative. End users can always check out the shaking-up-ghc repository into their GHC trees themselves if they want to try using it. I'm certainly willing to consider other opinions, however. Cheers, - Ben [1] After I manually ran ./boot, $ _shake/build --lint --directory .. $@ ... # various output from ./configure Reading shake/cfg/system.config... Reading package dependencies... Error when running Shake build system: * OracleQ (PackageDataKey (libraries/bin-package-db/dist-boot/package-data.mk,libraries_bin-package-db_dist-boot_LIB_NAME)) * libraries/bin-package-db/dist-boot/package-data.mk * libraries/bin-package-db/dist-boot/package-data.mk libraries/bin-package-db/dist-boot/haddock-prologue.txt libraries/bin-package-db/dist-boot/inplace-pkg-config libraries/bin-package-db/dist-boot/setup-config libraries/bin-package-db/dist-boot/build/autogen/cabal_macros.h * libraries/binary/dist-boot/package-data.mk * libraries/binary/dist-boot/package-data.mk libraries/binary/dist-boot/haddock-prologue.txt libraries/binary/dist-boot/inplace-pkg-config libraries/binary/dist-boot/setup-config libraries/binary/dist-boot/build/autogen/cabal_macros.h * /mnt/work/ghc/ghc-shake/inplace/bin/ghc-cabal Error, file does not exist and no rule available: /mnt/work/ghc/ghc-shake/inplace/bin/ghc-cabal signature.asc Description: PGP signature ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
RE: Planning for the 7.12 release
Actually that’s a good idea. Simon From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Greg Weber Sent: 28 August 2015 16:43 To: Ben Gamari Cc: GHC developers Subject: Re: Planning for the 7.12 release Can we call this GHC 8.0 instead of 7.12 ? Overloaded record fields and backtraces are a huge missing piece to Haskell. It would be nice to have the bump to celebrate this occasion and say that Haskell 8 is ready. I have had a hard time seriously recommending Haskell due to those last missing features. Now I should be able to say without reservation: use Haskell 8; it is great! On Thu, Aug 27, 2015 at 8:38 AM, Ben Gamari b...@well-typed.commailto:b...@well-typed.com wrote: Hello everyone! With the 7.10.1 release nearly six months behind us and 7.10.2 out of the way, now is a good time to begin looking forward to 7.12. In keeping with the typical release pace, we are aiming to have a release candidate ready in mid-December 2015 and a final release in January 2016. The items that that we currently believe have a good chance of making it in to 7.12 are listed on the release status page [1], which I've summarized below (in no particular order), * Support for implicit parameters providing callstacks and source locations * Support for wildcards in data and type family instances * A new, type-indexed type representation, data TTypeRep (a :: k). * Introduction of visible type application * Support for reasoning about kind equalities * Support for Injective Type Families * Support for the Strict language extension * Support for Overloaded Record Fields, allowing multiple uses of the same field name and a form of type-directed name resolution. * A huge improvement to pattern matching (including much better coverage of GADTs) * Backpack is chugging along; we have a new user-facing syntax which allows multiple modules to be defined a single file, and are hoping to release at least the ability to publish multiple units in a single Cabal file. * Support for Applicative Do, allowing GHC to desugar do-notation to Applicative where possible. * Improved DWARF based debugging support including backtraces from Haskell code * An Improved LLVM Backend that ships with every major Tier 1 platform. These items are a bit less certain but may make it in if the authors push forward quickly enough, * Support for Type Signature Sections, allowing you to write (:: ty) as a shorthand for (\x - x :: ty). * A (possible) overhaul of GHC's build system to use Shake instead of Make. * A DEPRECATED pragma for exports Is your pet project missing from this list? If you have a patch that you believe is on-track to make it in for 7.12, please let us know. Moreover, if you have an issue that you urgently need fixed in 7.12, please express you interest on the appropriate ticket. User feedback helps us immensely in figuring out how to best place our priorities. Cheers, - Ben [1] https://ghc.haskell.org/trac/ghc/wiki/Status/GHC-7.12.1 ___ ghc-devs mailing list ghc-devs@haskell.orgmailto:ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Planning for the 7.12 release
Can we call this GHC 8.0 instead of 7.12 ? Overloaded record fields and backtraces are a huge missing piece to Haskell. It would be nice to have the bump to celebrate this occasion and say that Haskell 8 is ready. I have had a hard time seriously recommending Haskell due to those last missing features. Now I should be able to say without reservation: use Haskell 8; it is great! On Thu, Aug 27, 2015 at 8:38 AM, Ben Gamari b...@well-typed.com wrote: Hello everyone! With the 7.10.1 release nearly six months behind us and 7.10.2 out of the way, now is a good time to begin looking forward to 7.12. In keeping with the typical release pace, we are aiming to have a release candidate ready in mid-December 2015 and a final release in January 2016. The items that that we currently believe have a good chance of making it in to 7.12 are listed on the release status page [1], which I've summarized below (in no particular order), * Support for implicit parameters providing callstacks and source locations * Support for wildcards in data and type family instances * A new, type-indexed type representation, data TTypeRep (a :: k). * Introduction of visible type application * Support for reasoning about kind equalities * Support for Injective Type Families * Support for the Strict language extension * Support for Overloaded Record Fields, allowing multiple uses of the same field name and a form of type-directed name resolution. * A huge improvement to pattern matching (including much better coverage of GADTs) * Backpack is chugging along; we have a new user-facing syntax which allows multiple modules to be defined a single file, and are hoping to release at least the ability to publish multiple units in a single Cabal file. * Support for Applicative Do, allowing GHC to desugar do-notation to Applicative where possible. * Improved DWARF based debugging support including backtraces from Haskell code * An Improved LLVM Backend that ships with every major Tier 1 platform. These items are a bit less certain but may make it in if the authors push forward quickly enough, * Support for Type Signature Sections, allowing you to write (:: ty) as a shorthand for (\x - x :: ty). * A (possible) overhaul of GHC's build system to use Shake instead of Make. * A DEPRECATED pragma for exports Is your pet project missing from this list? If you have a patch that you believe is on-track to make it in for 7.12, please let us know. Moreover, if you have an issue that you urgently need fixed in 7.12, please express you interest on the appropriate ticket. User feedback helps us immensely in figuring out how to best place our priorities. Cheers, - Ben [1] https://ghc.haskell.org/trac/ghc/wiki/Status/GHC-7.12.1 ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: ArrayArrays
I'd love to have that last 10%, but its a lot of work to get there and more importantly I don't know quite what it should look like. On the other hand, I do have a pretty good idea of how the primitives above could be banged out and tested in a long evening, well in time for 7.12. And as noted earlier, those remain useful even if a nicer typed version with an extra level of indirection to the sizes is built up after. The rest sounds like a good graduate student project for someone who has graduate students lying around. Maybe somebody at Indiana University who has an interest in type theory and parallelism can find us one. =) -Edward On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates fryguy...@gmail.com wrote: I think from my perspective, the motivation for getting the type checker involved is primarily bringing this to the level where users could be expected to build these structures. it is reasonable to think that there are people who want to use STM (a context with mutation already) to implement a straight forward data structure that avoids extra indirection penalty. There should be some places where knowing that things are field accesses rather then array indexing could be helpful, but I think GHC is good right now about handling constant offsets. In my code I don't do any bounds checking as I know I will only be accessing my arrays with constant indexes. I make wrappers for each field access and leave all the unsafe stuff in there. When things go wrong though, the compiler is no help. Maybe template Haskell that generates the appropriate wrappers is the right direction to go. There is another benefit for me when working with these as arrays in that it is quite simple and direct (given the hoops already jumped through) to play with alignment. I can ensure two pointers are never on the same cache-line by just spacing things out in the array. On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett ekm...@gmail.com wrote: They just segfault at this level. ;) Sent from my iPhone On Aug 28, 2015, at 7:25 PM, Ryan Newton rrnew...@gmail.com wrote: You presumably also save a bounds check on reads by hard-coding the sizes? On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett ekm...@gmail.com wrote: Also there are 4 different things here, basically depending on two independent questions: a.) if you want to shove the sizes into the info table, and b.) if you want cardmarking. Versions with/without cardmarking for different sizes can be done pretty easily, but as noted, the infotable variants are pretty invasive. -Edward On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett ekm...@gmail.com wrote: Well, on the plus side you'd save 16 bytes per object, which adds up if they were small enough and there are enough of them. You get a bit better locality of reference in terms of what fits in the first cache line of them. -Edward On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton rrnew...@gmail.com wrote: Yes. And for the short term I can imagine places we will settle with arrays even if it means tracking lengths unnecessarily and unsafeCoercing pointers whose types don't actually match their siblings. Is there anything to recommend the hacks mentioned for fixed sized array objects *other* than using them to fake structs? (Much to derecommend, as you mentioned!) On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett ekm...@gmail.com wrote: I think both are useful, but the one you suggest requires a lot more plumbing and doesn't subsume all of the usecases of the other. -Edward On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com wrote: So that primitive is an array like thing (Same pointed type, unbounded length) with extra payload. I can see how we can do without structs if we have arrays, especially with the extra payload at front. But wouldn't the general solution for structs be one that that allows new user data type defs for # types? On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote: Some form of MutableStruct# with a known number of words and a known number of pointers is basically what Ryan Yates was suggesting above, but where the word counts were stored in the objects themselves. Given that it'd have a couple of words for those counts it'd likely want to be something we build in addition to MutVar# rather than a replacement. On the other hand, if we had to fix those numbers and build info tables that knew them, and typechecker support, for instance, it'd get rather invasive. Also, a number of things that we can do with the 'sized' versions above, like working with evil unsized c-style arrays directly inline at the end of the structure cease to be possible, so it isn't even a pure win if we did the engineering effort. I think 90% of the needs I have are covered just by adding the one primitive. The last 10% gets pretty
Re: ArrayArrays
I completely agree. I would love to spend some time during ICFP and friends talking about what it could look like. My small array for STM changes for the RTS can be seen here [1]. It is on a branch somewhere between 7.8 and 7.10 and includes irrelevant STM bits and some confusing naming choices (sorry), but should cover all the details needed to implement it for a non-STM context. The biggest surprise for me was following small array too closely and having a word/byte offset miss-match [2]. [1]: https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut [2]: https://ghc.haskell.org/trac/ghc/ticket/10413 Ryan On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett ekm...@gmail.com wrote: I'd love to have that last 10%, but its a lot of work to get there and more importantly I don't know quite what it should look like. On the other hand, I do have a pretty good idea of how the primitives above could be banged out and tested in a long evening, well in time for 7.12. And as noted earlier, those remain useful even if a nicer typed version with an extra level of indirection to the sizes is built up after. The rest sounds like a good graduate student project for someone who has graduate students lying around. Maybe somebody at Indiana University who has an interest in type theory and parallelism can find us one. =) -Edward On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates fryguy...@gmail.com wrote: I think from my perspective, the motivation for getting the type checker involved is primarily bringing this to the level where users could be expected to build these structures. it is reasonable to think that there are people who want to use STM (a context with mutation already) to implement a straight forward data structure that avoids extra indirection penalty. There should be some places where knowing that things are field accesses rather then array indexing could be helpful, but I think GHC is good right now about handling constant offsets. In my code I don't do any bounds checking as I know I will only be accessing my arrays with constant indexes. I make wrappers for each field access and leave all the unsafe stuff in there. When things go wrong though, the compiler is no help. Maybe template Haskell that generates the appropriate wrappers is the right direction to go. There is another benefit for me when working with these as arrays in that it is quite simple and direct (given the hoops already jumped through) to play with alignment. I can ensure two pointers are never on the same cache-line by just spacing things out in the array. On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett ekm...@gmail.com wrote: They just segfault at this level. ;) Sent from my iPhone On Aug 28, 2015, at 7:25 PM, Ryan Newton rrnew...@gmail.com wrote: You presumably also save a bounds check on reads by hard-coding the sizes? On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett ekm...@gmail.com wrote: Also there are 4 different things here, basically depending on two independent questions: a.) if you want to shove the sizes into the info table, and b.) if you want cardmarking. Versions with/without cardmarking for different sizes can be done pretty easily, but as noted, the infotable variants are pretty invasive. -Edward On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett ekm...@gmail.com wrote: Well, on the plus side you'd save 16 bytes per object, which adds up if they were small enough and there are enough of them. You get a bit better locality of reference in terms of what fits in the first cache line of them. -Edward On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton rrnew...@gmail.com wrote: Yes. And for the short term I can imagine places we will settle with arrays even if it means tracking lengths unnecessarily and unsafeCoercing pointers whose types don't actually match their siblings. Is there anything to recommend the hacks mentioned for fixed sized array objects *other* than using them to fake structs? (Much to derecommend, as you mentioned!) On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett ekm...@gmail.com wrote: I think both are useful, but the one you suggest requires a lot more plumbing and doesn't subsume all of the usecases of the other. -Edward On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com wrote: So that primitive is an array like thing (Same pointed type, unbounded length) with extra payload. I can see how we can do without structs if we have arrays, especially with the extra payload at front. But wouldn't the general solution for structs be one that that allows new user data type defs for # types? On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote: Some form of MutableStruct# with a known number of words and a known number of pointers is basically what Ryan Yates was suggesting above, but where
RE: Planning for the 7.12 release
Simon Peyton Jones simo...@microsoft.com writes: Actually that’s a good idea. Simon From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Greg Weber Sent: 28 August 2015 16:43 To: Ben Gamari Cc: GHC developers Subject: Re: Planning for the 7.12 release Can we call this GHC 8.0 instead of 7.12 ? Overloaded record fields and backtraces are a huge missing piece to Haskell. It would be nice to have the bump to celebrate this occasion and say that Haskell 8 is ready. I have had a hard time seriously recommending Haskell due to those last missing features. Now I should be able to say without reservation: use Haskell 8; it is great! I was discussing this very matter yesterday with a few folks. I think we certainly have enough features in this release to do a major bump. I half-jokingly suggested that 8.0 should only come with Phase 2 of Richard's Dependent Haskell work, but I'm willing to settle for merely kind equality. I think doing a major bump would be a great idea. Cheers, - Ben signature.asc Description: PGP signature ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
RE: Planning for the 7.12 release
Simon Peyton Jones simo...@microsoft.com writes: | If this is intended to be an immediate wholesale switch to Shake I | would be very skeptical of merging for 7.12 as three months is, in my | opinion, very little time to test such a sweeping change. No there is no chance that we switch over to Shake for 7.12. At the moment it's nowhere near ready for prime time. Thanks for clarifying! But I do hope that we'll have material progress on a side-by-side build system before 7.12 comes out. It probably won't do everything but it should do some things well. I hope. Andrey, I look forward to seeing an update on things. Cheers, - Ben signature.asc Description: PGP signature ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: ArrayArrays
I posted a summary article on what this lets you do to https://www.fpcomplete.com/user/edwardk/unlifted-structures I can see about making a more proposal/feature-oriented summary for the Haskell Wiki. It may have to wait until after ICFP though. -Edward On Fri, Aug 28, 2015 at 5:42 AM, Simon Peyton Jones simo...@microsoft.com wrote: At the very least I'll take this email and turn it into a short article. Yes, please do make it into a wiki page on the GHC Trac, and maybe make a ticket for it. Thanks Simon *From:* Edward Kmett [mailto:ekm...@gmail.com] *Sent:* 27 August 2015 16:54 *To:* Simon Peyton Jones *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs *Subject:* Re: ArrayArrays An ArrayArray# is just an Array# with a modified invariant. It points directly to other unlifted ArrayArray#'s or ByteArray#'s. While those live in #, they are garbage collected objects, so this all lives on the heap. They were added to make some of the DPH stuff fast when it has to deal with nested arrays. I'm currently abusing them as a placeholder for a better thing. The Problem - Consider the scenario where you write a classic doubly-linked list in Haskell. data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) Chasing from one DLL to the next requires following 3 pointers on the heap. DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~ DLL That is 3 levels of indirection. We can trim one by simply unpacking the IORef with -funbox-strict-fields or UNPACK We can trim another by adding a 'Nil' constructor for DLL and worsening our representation. data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil but now we're still stuck with a level of indirection DLL ~ MutVar# RealWorld DLL ~ DLL This means that every operation we perform on this structure will be about half of the speed of an implementation in most other languages assuming we're memory bound on loading things into cache! Making Progress -- I have been working on a number of data structures where the indirection of going from something in * out to an object in # which contains the real pointer to my target and coming back effectively doubles my runtime. We go out to the MutVar# because we are allowed to put the MutVar# onto the mutable list when we dirty it. There is a well defined write-barrier. I could change out the representation to use data DLL = DLL (MutableArray# RealWorld DLL) | Nil I can just store two pointers in the MutableArray# every time, but this doesn't help _much_ directly. It has reduced the amount of distinct addresses in memory I touch on a walk of the DLL from 3 per object to 2. I still have to go out to the heap from my DLL and get to the array object and then chase it to the next DLL and chase that to the next array. I do get my two pointers together in memory though. I'm paying for a card marking table as well, which I don't particularly need with just two pointers, but we can shed that with the SmallMutableArray# machinery added back in 7.10, which is just the old array code a a new data type, which can speed things up a bit when you don't have very big arrays: data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil But what if I wanted my object itself to live in # and have two mutable fields and be able to share the sme write barrier? An ArrayArray# points directly to other unlifted array types. What if we have one # - * wrapper on the outside to deal with the impedence mismatch between the imperative world and Haskell, and then just let the ArrayArray#'s hold other arrayarrays. data DLL = DLL (MutableArrayArray# RealWorld) now I need to make up a new Nil, which I can just make be a special MutableArrayArray# I allocate on program startup. I can even abuse pattern synonyms. Alternately I can exploit the internals further to make this cheaper. Then I can use the readMutableArrayArray# and writeMutableArrayArray# calls to directly access the preceding and next entry in the linked list. So now we have one DLL wrapper which just 'bootstraps me' into a strict world, and everything there lives in #. next :: DLL - IO DLL next (DLL m) = IO $ \s - case readMutableArrayArray# s of (# s', n #) - (# s', DLL n #) It turns out GHC is quite happy to optimize all of that code to keep things unboxed. The 'DLL' wrappers get removed pretty easily when they are known strict and you chain operations of this sort! Cleaning it Up -- Now I have one outermost indirection pointing to an array that points directly to other arrays. I'm stuck paying for a card marking table per object, but I can fix that by duplicating the code for MutableArrayArray# and using a SmallMutableArray#. I can hack up primops that let me store a mixture of SmallMutableArray# fields and normal ones in the data structure.
Re: ArrayArrays
You presumably also save a bounds check on reads by hard-coding the sizes? On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett ekm...@gmail.com wrote: Also there are 4 different things here, basically depending on two independent questions: a.) if you want to shove the sizes into the info table, and b.) if you want cardmarking. Versions with/without cardmarking for different sizes can be done pretty easily, but as noted, the infotable variants are pretty invasive. -Edward On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett ekm...@gmail.com wrote: Well, on the plus side you'd save 16 bytes per object, which adds up if they were small enough and there are enough of them. You get a bit better locality of reference in terms of what fits in the first cache line of them. -Edward On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton rrnew...@gmail.com wrote: Yes. And for the short term I can imagine places we will settle with arrays even if it means tracking lengths unnecessarily and unsafeCoercing pointers whose types don't actually match their siblings. Is there anything to recommend the hacks mentioned for fixed sized array objects *other* than using them to fake structs? (Much to derecommend, as you mentioned!) On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett ekm...@gmail.com wrote: I think both are useful, but the one you suggest requires a lot more plumbing and doesn't subsume all of the usecases of the other. -Edward On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com wrote: So that primitive is an array like thing (Same pointed type, unbounded length) with extra payload. I can see how we can do without structs if we have arrays, especially with the extra payload at front. But wouldn't the general solution for structs be one that that allows new user data type defs for # types? On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote: Some form of MutableStruct# with a known number of words and a known number of pointers is basically what Ryan Yates was suggesting above, but where the word counts were stored in the objects themselves. Given that it'd have a couple of words for those counts it'd likely want to be something we build in addition to MutVar# rather than a replacement. On the other hand, if we had to fix those numbers and build info tables that knew them, and typechecker support, for instance, it'd get rather invasive. Also, a number of things that we can do with the 'sized' versions above, like working with evil unsized c-style arrays directly inline at the end of the structure cease to be possible, so it isn't even a pure win if we did the engineering effort. I think 90% of the needs I have are covered just by adding the one primitive. The last 10% gets pretty invasive. -Edward On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com wrote: I like the possibility of a general solution for mutable structs (like Ed said), and I'm trying to fully understand why it's hard. So, we can't unpack MutVar into constructors because of object identity problems. But what about directly supporting an extensible set of unlifted MutStruct# objects, generalizing (and even replacing) MutVar#? That may be too much work, but is it problematic otherwise? Needless to say, this is also critical if we ever want best in class lockfree mutable structures, just like their Stm and sequential counterparts. On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones simo...@microsoft.com wrote: At the very least I'll take this email and turn it into a short article. Yes, please do make it into a wiki page on the GHC Trac, and maybe make a ticket for it. Thanks Simon *From:* Edward Kmett [mailto:ekm...@gmail.com] *Sent:* 27 August 2015 16:54 *To:* Simon Peyton Jones *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs *Subject:* Re: ArrayArrays An ArrayArray# is just an Array# with a modified invariant. It points directly to other unlifted ArrayArray#'s or ByteArray#'s. While those live in #, they are garbage collected objects, so this all lives on the heap. They were added to make some of the DPH stuff fast when it has to deal with nested arrays. I'm currently abusing them as a placeholder for a better thing. The Problem - Consider the scenario where you write a classic doubly-linked list in Haskell. data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) Chasing from one DLL to the next requires following 3 pointers on the heap. DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~ DLL That is 3 levels of indirection. We can trim one by simply unpacking the IORef with -funbox-strict-fields or UNPACK We can trim another by adding a 'Nil' constructor for DLL and worsening our representation. data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil but now we're still stuck with a level of indirection DLL ~ MutVar# RealWorld
Re: ArrayArrays
They just segfault at this level. ;) Sent from my iPhone On Aug 28, 2015, at 7:25 PM, Ryan Newton rrnew...@gmail.com wrote: You presumably also save a bounds check on reads by hard-coding the sizes? On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett ekm...@gmail.com wrote: Also there are 4 different things here, basically depending on two independent questions: a.) if you want to shove the sizes into the info table, and b.) if you want cardmarking. Versions with/without cardmarking for different sizes can be done pretty easily, but as noted, the infotable variants are pretty invasive. -Edward On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett ekm...@gmail.com wrote: Well, on the plus side you'd save 16 bytes per object, which adds up if they were small enough and there are enough of them. You get a bit better locality of reference in terms of what fits in the first cache line of them. -Edward On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton rrnew...@gmail.com wrote: Yes. And for the short term I can imagine places we will settle with arrays even if it means tracking lengths unnecessarily and unsafeCoercing pointers whose types don't actually match their siblings. Is there anything to recommend the hacks mentioned for fixed sized array objects *other* than using them to fake structs? (Much to derecommend, as you mentioned!) On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett ekm...@gmail.com wrote: I think both are useful, but the one you suggest requires a lot more plumbing and doesn't subsume all of the usecases of the other. -Edward On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com wrote: So that primitive is an array like thing (Same pointed type, unbounded length) with extra payload. I can see how we can do without structs if we have arrays, especially with the extra payload at front. But wouldn't the general solution for structs be one that that allows new user data type defs for # types? On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote: Some form of MutableStruct# with a known number of words and a known number of pointers is basically what Ryan Yates was suggesting above, but where the word counts were stored in the objects themselves. Given that it'd have a couple of words for those counts it'd likely want to be something we build in addition to MutVar# rather than a replacement. On the other hand, if we had to fix those numbers and build info tables that knew them, and typechecker support, for instance, it'd get rather invasive. Also, a number of things that we can do with the 'sized' versions above, like working with evil unsized c-style arrays directly inline at the end of the structure cease to be possible, so it isn't even a pure win if we did the engineering effort. I think 90% of the needs I have are covered just by adding the one primitive. The last 10% gets pretty invasive. -Edward On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com wrote: I like the possibility of a general solution for mutable structs (like Ed said), and I'm trying to fully understand why it's hard. So, we can't unpack MutVar into constructors because of object identity problems. But what about directly supporting an extensible set of unlifted MutStruct# objects, generalizing (and even replacing) MutVar#? That may be too much work, but is it problematic otherwise? Needless to say, this is also critical if we ever want best in class lockfree mutable structures, just like their Stm and sequential counterparts. On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones simo...@microsoft.com wrote: At the very least I'll take this email and turn it into a short article. Yes, please do make it into a wiki page on the GHC Trac, and maybe make a ticket for it. Thanks Simon From: Edward Kmett [mailto:ekm...@gmail.com] Sent: 27 August 2015 16:54 To: Simon Peyton Jones Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs Subject: Re: ArrayArrays An ArrayArray# is just an Array# with a modified invariant. It points directly to other unlifted ArrayArray#'s or ByteArray#'s. While those live in #, they are garbage collected objects, so this all lives on the heap. They were added to make some of the DPH stuff fast when it has to deal with nested arrays. I'm currently abusing them as a placeholder for a better thing. The Problem - Consider the scenario where you write a classic doubly-linked list in Haskell. data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) Chasing from one DLL to the next requires following 3 pointers on the heap. DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~ DLL That is 3 levels of indirection. We can trim one by simply unpacking the IORef with -funbox-strict-fields or UNPACK We can trim another by adding a 'Nil'
Re: ArrayArrays
I think both are useful, but the one you suggest requires a lot more plumbing and doesn't subsume all of the usecases of the other. -Edward On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com wrote: So that primitive is an array like thing (Same pointed type, unbounded length) with extra payload. I can see how we can do without structs if we have arrays, especially with the extra payload at front. But wouldn't the general solution for structs be one that that allows new user data type defs for # types? On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote: Some form of MutableStruct# with a known number of words and a known number of pointers is basically what Ryan Yates was suggesting above, but where the word counts were stored in the objects themselves. Given that it'd have a couple of words for those counts it'd likely want to be something we build in addition to MutVar# rather than a replacement. On the other hand, if we had to fix those numbers and build info tables that knew them, and typechecker support, for instance, it'd get rather invasive. Also, a number of things that we can do with the 'sized' versions above, like working with evil unsized c-style arrays directly inline at the end of the structure cease to be possible, so it isn't even a pure win if we did the engineering effort. I think 90% of the needs I have are covered just by adding the one primitive. The last 10% gets pretty invasive. -Edward On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com wrote: I like the possibility of a general solution for mutable structs (like Ed said), and I'm trying to fully understand why it's hard. So, we can't unpack MutVar into constructors because of object identity problems. But what about directly supporting an extensible set of unlifted MutStruct# objects, generalizing (and even replacing) MutVar#? That may be too much work, but is it problematic otherwise? Needless to say, this is also critical if we ever want best in class lockfree mutable structures, just like their Stm and sequential counterparts. On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones simo...@microsoft.com wrote: At the very least I'll take this email and turn it into a short article. Yes, please do make it into a wiki page on the GHC Trac, and maybe make a ticket for it. Thanks Simon *From:* Edward Kmett [mailto:ekm...@gmail.com] *Sent:* 27 August 2015 16:54 *To:* Simon Peyton Jones *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs *Subject:* Re: ArrayArrays An ArrayArray# is just an Array# with a modified invariant. It points directly to other unlifted ArrayArray#'s or ByteArray#'s. While those live in #, they are garbage collected objects, so this all lives on the heap. They were added to make some of the DPH stuff fast when it has to deal with nested arrays. I'm currently abusing them as a placeholder for a better thing. The Problem - Consider the scenario where you write a classic doubly-linked list in Haskell. data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) Chasing from one DLL to the next requires following 3 pointers on the heap. DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~ DLL That is 3 levels of indirection. We can trim one by simply unpacking the IORef with -funbox-strict-fields or UNPACK We can trim another by adding a 'Nil' constructor for DLL and worsening our representation. data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil but now we're still stuck with a level of indirection DLL ~ MutVar# RealWorld DLL ~ DLL This means that every operation we perform on this structure will be about half of the speed of an implementation in most other languages assuming we're memory bound on loading things into cache! Making Progress -- I have been working on a number of data structures where the indirection of going from something in * out to an object in # which contains the real pointer to my target and coming back effectively doubles my runtime. We go out to the MutVar# because we are allowed to put the MutVar# onto the mutable list when we dirty it. There is a well defined write-barrier. I could change out the representation to use data DLL = DLL (MutableArray# RealWorld DLL) | Nil I can just store two pointers in the MutableArray# every time, but this doesn't help _much_ directly. It has reduced the amount of distinct addresses in memory I touch on a walk of the DLL from 3 per object to 2. I still have to go out to the heap from my DLL and get to the array object and then chase it to the next DLL and chase that to the next array. I do get my two pointers together in memory though. I'm paying for a card marking table as well, which I don't particularly need with just two pointers, but we can shed that with the SmallMutableArray# machinery
Re: ArrayArrays
Yes. And for the short term I can imagine places we will settle with arrays even if it means tracking lengths unnecessarily and unsafeCoercing pointers whose types don't actually match their siblings. Is there anything to recommend the hacks mentioned for fixed sized array objects *other* than using them to fake structs? (Much to derecommend, as you mentioned!) On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett ekm...@gmail.com wrote: I think both are useful, but the one you suggest requires a lot more plumbing and doesn't subsume all of the usecases of the other. -Edward On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com wrote: So that primitive is an array like thing (Same pointed type, unbounded length) with extra payload. I can see how we can do without structs if we have arrays, especially with the extra payload at front. But wouldn't the general solution for structs be one that that allows new user data type defs for # types? On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote: Some form of MutableStruct# with a known number of words and a known number of pointers is basically what Ryan Yates was suggesting above, but where the word counts were stored in the objects themselves. Given that it'd have a couple of words for those counts it'd likely want to be something we build in addition to MutVar# rather than a replacement. On the other hand, if we had to fix those numbers and build info tables that knew them, and typechecker support, for instance, it'd get rather invasive. Also, a number of things that we can do with the 'sized' versions above, like working with evil unsized c-style arrays directly inline at the end of the structure cease to be possible, so it isn't even a pure win if we did the engineering effort. I think 90% of the needs I have are covered just by adding the one primitive. The last 10% gets pretty invasive. -Edward On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com wrote: I like the possibility of a general solution for mutable structs (like Ed said), and I'm trying to fully understand why it's hard. So, we can't unpack MutVar into constructors because of object identity problems. But what about directly supporting an extensible set of unlifted MutStruct# objects, generalizing (and even replacing) MutVar#? That may be too much work, but is it problematic otherwise? Needless to say, this is also critical if we ever want best in class lockfree mutable structures, just like their Stm and sequential counterparts. On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones simo...@microsoft.com wrote: At the very least I'll take this email and turn it into a short article. Yes, please do make it into a wiki page on the GHC Trac, and maybe make a ticket for it. Thanks Simon *From:* Edward Kmett [mailto:ekm...@gmail.com] *Sent:* 27 August 2015 16:54 *To:* Simon Peyton Jones *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs *Subject:* Re: ArrayArrays An ArrayArray# is just an Array# with a modified invariant. It points directly to other unlifted ArrayArray#'s or ByteArray#'s. While those live in #, they are garbage collected objects, so this all lives on the heap. They were added to make some of the DPH stuff fast when it has to deal with nested arrays. I'm currently abusing them as a placeholder for a better thing. The Problem - Consider the scenario where you write a classic doubly-linked list in Haskell. data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) Chasing from one DLL to the next requires following 3 pointers on the heap. DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~ DLL That is 3 levels of indirection. We can trim one by simply unpacking the IORef with -funbox-strict-fields or UNPACK We can trim another by adding a 'Nil' constructor for DLL and worsening our representation. data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil but now we're still stuck with a level of indirection DLL ~ MutVar# RealWorld DLL ~ DLL This means that every operation we perform on this structure will be about half of the speed of an implementation in most other languages assuming we're memory bound on loading things into cache! Making Progress -- I have been working on a number of data structures where the indirection of going from something in * out to an object in # which contains the real pointer to my target and coming back effectively doubles my runtime. We go out to the MutVar# because we are allowed to put the MutVar# onto the mutable list when we dirty it. There is a well defined write-barrier. I could change out the representation to use data DLL = DLL (MutableArray# RealWorld DLL) | Nil I can just store two pointers in the MutableArray# every time, but this doesn't help _much_ directly. It has reduced the amount of distinct
Re: GHC 7.10 complie time regression
Simon Peyton Jones wrote: no it's not expected to take much longer. Can you make a ticket with a reproducible test case? An make sure you are using ghc 7.10.2 and not 7.10.1 because 7.10.2 had some signifcant fixes for these kinds of issues. Erik -- -- Erik de Castro Lopo http://www.mega-nerd.com/ ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: ArrayArrays
I like the possibility of a general solution for mutable structs (like Ed said), and I'm trying to fully understand why it's hard. So, we can't unpack MutVar into constructors because of object identity problems. But what about directly supporting an extensible set of unlifted MutStruct# objects, generalizing (and even replacing) MutVar#? That may be too much work, but is it problematic otherwise? Needless to say, this is also critical if we ever want best in class lockfree mutable structures, just like their Stm and sequential counterparts. On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones simo...@microsoft.com wrote: At the very least I'll take this email and turn it into a short article. Yes, please do make it into a wiki page on the GHC Trac, and maybe make a ticket for it. Thanks Simon *From:* Edward Kmett [mailto:ekm...@gmail.com] *Sent:* 27 August 2015 16:54 *To:* Simon Peyton Jones *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs *Subject:* Re: ArrayArrays An ArrayArray# is just an Array# with a modified invariant. It points directly to other unlifted ArrayArray#'s or ByteArray#'s. While those live in #, they are garbage collected objects, so this all lives on the heap. They were added to make some of the DPH stuff fast when it has to deal with nested arrays. I'm currently abusing them as a placeholder for a better thing. The Problem - Consider the scenario where you write a classic doubly-linked list in Haskell. data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) Chasing from one DLL to the next requires following 3 pointers on the heap. DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~ DLL That is 3 levels of indirection. We can trim one by simply unpacking the IORef with -funbox-strict-fields or UNPACK We can trim another by adding a 'Nil' constructor for DLL and worsening our representation. data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil but now we're still stuck with a level of indirection DLL ~ MutVar# RealWorld DLL ~ DLL This means that every operation we perform on this structure will be about half of the speed of an implementation in most other languages assuming we're memory bound on loading things into cache! Making Progress -- I have been working on a number of data structures where the indirection of going from something in * out to an object in # which contains the real pointer to my target and coming back effectively doubles my runtime. We go out to the MutVar# because we are allowed to put the MutVar# onto the mutable list when we dirty it. There is a well defined write-barrier. I could change out the representation to use data DLL = DLL (MutableArray# RealWorld DLL) | Nil I can just store two pointers in the MutableArray# every time, but this doesn't help _much_ directly. It has reduced the amount of distinct addresses in memory I touch on a walk of the DLL from 3 per object to 2. I still have to go out to the heap from my DLL and get to the array object and then chase it to the next DLL and chase that to the next array. I do get my two pointers together in memory though. I'm paying for a card marking table as well, which I don't particularly need with just two pointers, but we can shed that with the SmallMutableArray# machinery added back in 7.10, which is just the old array code a a new data type, which can speed things up a bit when you don't have very big arrays: data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil But what if I wanted my object itself to live in # and have two mutable fields and be able to share the sme write barrier? An ArrayArray# points directly to other unlifted array types. What if we have one # - * wrapper on the outside to deal with the impedence mismatch between the imperative world and Haskell, and then just let the ArrayArray#'s hold other arrayarrays. data DLL = DLL (MutableArrayArray# RealWorld) now I need to make up a new Nil, which I can just make be a special MutableArrayArray# I allocate on program startup. I can even abuse pattern synonyms. Alternately I can exploit the internals further to make this cheaper. Then I can use the readMutableArrayArray# and writeMutableArrayArray# calls to directly access the preceding and next entry in the linked list. So now we have one DLL wrapper which just 'bootstraps me' into a strict world, and everything there lives in #. next :: DLL - IO DLL next (DLL m) = IO $ \s - case readMutableArrayArray# s of (# s', n #) - (# s', DLL n #) It turns out GHC is quite happy to optimize all of that code to keep things unboxed. The 'DLL' wrappers get removed pretty easily when they are known strict and you chain operations of this sort! Cleaning it Up -- Now I have one outermost indirection pointing to an array that points directly to other
Re: ArrayArrays
Some form of MutableStruct# with a known number of words and a known number of pointers is basically what Ryan Yates was suggesting above, but where the word counts were stored in the objects themselves. Given that it'd have a couple of words for those counts it'd likely want to be something we build in addition to MutVar# rather than a replacement. On the other hand, if we had to fix those numbers and build info tables that knew them, and typechecker support, for instance, it'd get rather invasive. Also, a number of things that we can do with the 'sized' versions above, like working with evil unsized c-style arrays directly inline at the end of the structure cease to be possible, so it isn't even a pure win if we did the engineering effort. I think 90% of the needs I have are covered just by adding the one primitive. The last 10% gets pretty invasive. -Edward On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com wrote: I like the possibility of a general solution for mutable structs (like Ed said), and I'm trying to fully understand why it's hard. So, we can't unpack MutVar into constructors because of object identity problems. But what about directly supporting an extensible set of unlifted MutStruct# objects, generalizing (and even replacing) MutVar#? That may be too much work, but is it problematic otherwise? Needless to say, this is also critical if we ever want best in class lockfree mutable structures, just like their Stm and sequential counterparts. On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones simo...@microsoft.com wrote: At the very least I'll take this email and turn it into a short article. Yes, please do make it into a wiki page on the GHC Trac, and maybe make a ticket for it. Thanks Simon *From:* Edward Kmett [mailto:ekm...@gmail.com] *Sent:* 27 August 2015 16:54 *To:* Simon Peyton Jones *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs *Subject:* Re: ArrayArrays An ArrayArray# is just an Array# with a modified invariant. It points directly to other unlifted ArrayArray#'s or ByteArray#'s. While those live in #, they are garbage collected objects, so this all lives on the heap. They were added to make some of the DPH stuff fast when it has to deal with nested arrays. I'm currently abusing them as a placeholder for a better thing. The Problem - Consider the scenario where you write a classic doubly-linked list in Haskell. data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) Chasing from one DLL to the next requires following 3 pointers on the heap. DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~ DLL That is 3 levels of indirection. We can trim one by simply unpacking the IORef with -funbox-strict-fields or UNPACK We can trim another by adding a 'Nil' constructor for DLL and worsening our representation. data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil but now we're still stuck with a level of indirection DLL ~ MutVar# RealWorld DLL ~ DLL This means that every operation we perform on this structure will be about half of the speed of an implementation in most other languages assuming we're memory bound on loading things into cache! Making Progress -- I have been working on a number of data structures where the indirection of going from something in * out to an object in # which contains the real pointer to my target and coming back effectively doubles my runtime. We go out to the MutVar# because we are allowed to put the MutVar# onto the mutable list when we dirty it. There is a well defined write-barrier. I could change out the representation to use data DLL = DLL (MutableArray# RealWorld DLL) | Nil I can just store two pointers in the MutableArray# every time, but this doesn't help _much_ directly. It has reduced the amount of distinct addresses in memory I touch on a walk of the DLL from 3 per object to 2. I still have to go out to the heap from my DLL and get to the array object and then chase it to the next DLL and chase that to the next array. I do get my two pointers together in memory though. I'm paying for a card marking table as well, which I don't particularly need with just two pointers, but we can shed that with the SmallMutableArray# machinery added back in 7.10, which is just the old array code a a new data type, which can speed things up a bit when you don't have very big arrays: data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil But what if I wanted my object itself to live in # and have two mutable fields and be able to share the sme write barrier? An ArrayArray# points directly to other unlifted array types. What if we have one # - * wrapper on the outside to deal with the impedence mismatch between the imperative world and Haskell, and then just let the ArrayArray#'s hold other arrayarrays. data DLL = DLL
Re: ArrayArrays
So that primitive is an array like thing (Same pointed type, unbounded length) with extra payload. I can see how we can do without structs if we have arrays, especially with the extra payload at front. But wouldn't the general solution for structs be one that that allows new user data type defs for # types? On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote: Some form of MutableStruct# with a known number of words and a known number of pointers is basically what Ryan Yates was suggesting above, but where the word counts were stored in the objects themselves. Given that it'd have a couple of words for those counts it'd likely want to be something we build in addition to MutVar# rather than a replacement. On the other hand, if we had to fix those numbers and build info tables that knew them, and typechecker support, for instance, it'd get rather invasive. Also, a number of things that we can do with the 'sized' versions above, like working with evil unsized c-style arrays directly inline at the end of the structure cease to be possible, so it isn't even a pure win if we did the engineering effort. I think 90% of the needs I have are covered just by adding the one primitive. The last 10% gets pretty invasive. -Edward On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com wrote: I like the possibility of a general solution for mutable structs (like Ed said), and I'm trying to fully understand why it's hard. So, we can't unpack MutVar into constructors because of object identity problems. But what about directly supporting an extensible set of unlifted MutStruct# objects, generalizing (and even replacing) MutVar#? That may be too much work, but is it problematic otherwise? Needless to say, this is also critical if we ever want best in class lockfree mutable structures, just like their Stm and sequential counterparts. On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones simo...@microsoft.com wrote: At the very least I'll take this email and turn it into a short article. Yes, please do make it into a wiki page on the GHC Trac, and maybe make a ticket for it. Thanks Simon *From:* Edward Kmett [mailto:ekm...@gmail.com] *Sent:* 27 August 2015 16:54 *To:* Simon Peyton Jones *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs *Subject:* Re: ArrayArrays An ArrayArray# is just an Array# with a modified invariant. It points directly to other unlifted ArrayArray#'s or ByteArray#'s. While those live in #, they are garbage collected objects, so this all lives on the heap. They were added to make some of the DPH stuff fast when it has to deal with nested arrays. I'm currently abusing them as a placeholder for a better thing. The Problem - Consider the scenario where you write a classic doubly-linked list in Haskell. data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) Chasing from one DLL to the next requires following 3 pointers on the heap. DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~ DLL That is 3 levels of indirection. We can trim one by simply unpacking the IORef with -funbox-strict-fields or UNPACK We can trim another by adding a 'Nil' constructor for DLL and worsening our representation. data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil but now we're still stuck with a level of indirection DLL ~ MutVar# RealWorld DLL ~ DLL This means that every operation we perform on this structure will be about half of the speed of an implementation in most other languages assuming we're memory bound on loading things into cache! Making Progress -- I have been working on a number of data structures where the indirection of going from something in * out to an object in # which contains the real pointer to my target and coming back effectively doubles my runtime. We go out to the MutVar# because we are allowed to put the MutVar# onto the mutable list when we dirty it. There is a well defined write-barrier. I could change out the representation to use data DLL = DLL (MutableArray# RealWorld DLL) | Nil I can just store two pointers in the MutableArray# every time, but this doesn't help _much_ directly. It has reduced the amount of distinct addresses in memory I touch on a walk of the DLL from 3 per object to 2. I still have to go out to the heap from my DLL and get to the array object and then chase it to the next DLL and chase that to the next array. I do get my two pointers together in memory though. I'm paying for a card marking table as well, which I don't particularly need with just two pointers, but we can shed that with the SmallMutableArray# machinery added back in 7.10, which is just the old array code a a new data type, which can speed things up a bit when you don't have very big arrays: data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil But what if I
Re: Planning for the 7.12 release
On 28/08/15 20:33, Ben Gamari wrote: I half-jokingly suggested that 8.0 should only come with Phase 2 of Richard's Dependent Haskell work Ah, that would be perfect. signature.asc Description: OpenPGP digital signature ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Planning for the 7.12 release
I'm going by my rather poor memory for this. Frankly, I don't really care where the option sits, as long as I don't need a separate build of GHC to avoid LGPL. From: Ben Gamari b...@well-typed.com Sent: Friday, August 28, 2015 5:48 AM To: Elliot Cameron; GHC developers Cc: Herbert Valerio Riedel Subject: Re: Planning for the 7.12 release Elliot Cameron elliot.came...@covenanteyes.com writes: I can't seem to find the exact trac ticket, but the ability to swap out Integer implementations at link time would be a huge relief on Windows, which suffers from various problems with dynamic linking. I believe it was originally slated for 7.12. Can someone find it? Here's what I did find: https://ghc.haskell.org/trac/ghc/wiki/ReplacingGMPNotes A related discussion: https://github.com/commercialhaskell/stack/issues/399 As it stands, we've been trying hard to find good ways to provide custom GHC variants more easily to end users. Hmm, interesting. I'm not sure how realistic it is to make this a link-time option, however, considering that we may inline bindings from whatever integer package we compile against into the user's program. Herbert, do you have any thoughts on this? Cheers, - Ben ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: ArrayArrays
I think from my perspective, the motivation for getting the type checker involved is primarily bringing this to the level where users could be expected to build these structures. it is reasonable to think that there are people who want to use STM (a context with mutation already) to implement a straight forward data structure that avoids extra indirection penalty. There should be some places where knowing that things are field accesses rather then array indexing could be helpful, but I think GHC is good right now about handling constant offsets. In my code I don't do any bounds checking as I know I will only be accessing my arrays with constant indexes. I make wrappers for each field access and leave all the unsafe stuff in there. When things go wrong though, the compiler is no help. Maybe template Haskell that generates the appropriate wrappers is the right direction to go. There is another benefit for me when working with these as arrays in that it is quite simple and direct (given the hoops already jumped through) to play with alignment. I can ensure two pointers are never on the same cache-line by just spacing things out in the array. On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett ekm...@gmail.com wrote: They just segfault at this level. ;) Sent from my iPhone On Aug 28, 2015, at 7:25 PM, Ryan Newton rrnew...@gmail.com wrote: You presumably also save a bounds check on reads by hard-coding the sizes? On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett ekm...@gmail.com wrote: Also there are 4 different things here, basically depending on two independent questions: a.) if you want to shove the sizes into the info table, and b.) if you want cardmarking. Versions with/without cardmarking for different sizes can be done pretty easily, but as noted, the infotable variants are pretty invasive. -Edward On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett ekm...@gmail.com wrote: Well, on the plus side you'd save 16 bytes per object, which adds up if they were small enough and there are enough of them. You get a bit better locality of reference in terms of what fits in the first cache line of them. -Edward On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton rrnew...@gmail.com wrote: Yes. And for the short term I can imagine places we will settle with arrays even if it means tracking lengths unnecessarily and unsafeCoercing pointers whose types don't actually match their siblings. Is there anything to recommend the hacks mentioned for fixed sized array objects *other* than using them to fake structs? (Much to derecommend, as you mentioned!) On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett ekm...@gmail.com wrote: I think both are useful, but the one you suggest requires a lot more plumbing and doesn't subsume all of the usecases of the other. -Edward On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com wrote: So that primitive is an array like thing (Same pointed type, unbounded length) with extra payload. I can see how we can do without structs if we have arrays, especially with the extra payload at front. But wouldn't the general solution for structs be one that that allows new user data type defs for # types? On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote: Some form of MutableStruct# with a known number of words and a known number of pointers is basically what Ryan Yates was suggesting above, but where the word counts were stored in the objects themselves. Given that it'd have a couple of words for those counts it'd likely want to be something we build in addition to MutVar# rather than a replacement. On the other hand, if we had to fix those numbers and build info tables that knew them, and typechecker support, for instance, it'd get rather invasive. Also, a number of things that we can do with the 'sized' versions above, like working with evil unsized c-style arrays directly inline at the end of the structure cease to be possible, so it isn't even a pure win if we did the engineering effort. I think 90% of the needs I have are covered just by adding the one primitive. The last 10% gets pretty invasive. -Edward On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com wrote: I like the possibility of a general solution for mutable structs (like Ed said), and I'm trying to fully understand why it's hard. So, we can't unpack MutVar into constructors because of object identity problems. But what about directly supporting an extensible set of unlifted MutStruct# objects, generalizing (and even replacing) MutVar#? That may be too much work, but is it problematic otherwise? Needless to say, this is also critical if we ever want best in class lockfree mutable structures, just like their Stm and sequential counterparts. On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones simo...@microsoft.com wrote: At the very least I'll take this email and turn it into a short article. Yes, please do