date:20150828

Lennart always says he’s going to work on it, but he’s a busy man and nothing 
has actually happened.

It’s a pretty easy feature to implement I think.

Simon

From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Sean Leather
Sent: 28 August 2015 06:40
To: Ben Gamari
Cc: GHC developers
Subject: Re: Planning for the 7.12 release

On Thu, Aug 27, 2015 at 5:38 PM, Ben Gamari wrote:
These items are a bit less certain but may make it in if the authors
push forward quickly enough,

* Support for Type Signature Sections, allowing you to write (:: ty)
  as a shorthand for (\x - x :: ty).

Once Lennart convinced me of the usefulness of this [1], I've been finding 
plenty of places where I wish I had it. Who's working on it? What's the status? 
Is there a ticket for it?

Regards,
Sean

[1] http://augustss.blogspot.com/2014/04/a-small-haskell-extension.html
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Planning for the 7.12 release

2015-08-28 Thread Kosyrev Serge

Ben Gamari b...@well-typed.com writes:
 These items are a bit less certain but may make it in if the authors
 push forward quickly enough,

[..]

 * A (possible) overhaul of GHC's build system to use Shake instead
   of Make.

Is there a breakdown of what remains to be done on this front?

-- 
с уважениeм / respectfully,
Косырев Серёга
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Fwd: Thread-safe Hint

2015-08-28 Thread Corentin Dupont

Hello,
I am wondering if GHC is now thread-safe.
I am using Hint, and it reports that GHC is not thread-safe, and that I
can't safely run two instances of the interpreter simultaneously.
Is that still the case?
Thanks!
Corentin


-- Forwarded message --
From: Daniel Gorín jcpetru...@gmail.com
Date: Thu, Aug 27, 2015 at 5:09 PM
Subject: Re: Thread-safe Hint
To: Corentin Dupont corentin.dup...@gmail.com


Hi Corentin,

sorry for the late reply. Until relatively recently, the problem was still
on. But I too remember seeing something related to this issue being fixed
(iirc, the problem was the runtime linker, which used global state), so
perhaps it is already fixed in 7.10. If you can verify this, it shouldn’t
be hard to show the error message only on old versions of ghc. I’ll be away
for a couple of weeks, but if you want to look into this and send a patch,
I’ll merge it when I return.

Cheers,
Daniel

 On 24 Aug 2015, at 10:43 am, Corentin Dupont corentin.dup...@gmail.com
wrote:

 Hello Daniel,
 I noticed the following message in Hint:
 This version of GHC is not thread-safe,can't safely run two instances of
the interpreter simultaneously.

 Is it still the case with recent versions of GHC?
 It would be neat to be able to launch several instances of the
interpreter. In my game Nomyx I have several match-up going on and having
one instance of the interpreter would be nicer. Otherwise I am obliged to
reset the interpret each time I want to interpret something, which is time
consuming (2-3 seconds).

 Thanks,
 C
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

RE: Planning for the 7.12 release

Andrey Mokhov is the man to ask.  I'm copying him.

Simon

|  -Original Message-
|  From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of
|  Kosyrev Serge
|  Sent: 28 August 2015 11:31
|  To: Ben Gamari
|  Cc: GHC developers
|  Subject: Re: Planning for the 7.12 release
|  
|  Ben Gamari b...@well-typed.com writes:
|   These items are a bit less certain but may make it in if the authors
|   push forward quickly enough,
|  
|  [..]
|  
|   * A (possible) overhaul of GHC's build system to use Shake
|  instead
| of Make.
|  
|  Is there a breakdown of what remains to be done on this front?
|  
|  --
|  с уважениeм / respectfully,
|  Косырев Серёга
|  ___
|  ghc-devs mailing list
|  ghc-devs@haskell.org
|  http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Planning for the 7.12 release

Matthew Pickering matthewtpicker...@gmail.com writes:

 Hi Ben,

 I think that D1152 (Record Pattern Synonyms) will be ready for 7.12.
 https://phabricator.haskell.org/D1152

Ahh yes. Thanks for pointing this out. I've added it to the list.

Cheers,

- Ben


signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

RE: ArrayArrays

At the very least I'll take this email and turn it into a short article.
Yes, please do make it into a wiki page on the GHC Trac, and maybe make a 
ticket for it.

Thanks

Simon

From: Edward Kmett [mailto:ekm...@gmail.com]
Sent: 27 August 2015 16:54
To: Simon Peyton Jones
Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs
Subject: Re: ArrayArrays

An ArrayArray# is just an Array# with a modified invariant. It points directly 
to other unlifted ArrayArray#'s or ByteArray#'s.

While those live in #, they are garbage collected objects, so this all lives on 
the heap.

They were added to make some of the DPH stuff fast when it has to deal with 
nested arrays.

I'm currently abusing them as a placeholder for a better thing.

The Problem
-

Consider the scenario where you write a classic doubly-linked list in Haskell.

data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL)

Chasing from one DLL to the next requires following 3 pointers on the heap.

DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~ DLL

That is 3 levels of indirection.

We can trim one by simply unpacking the IORef with -funbox-strict-fields or 
UNPACK

We can trim another by adding a 'Nil' constructor for DLL and worsening our 
representation.

data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil

but now we're still stuck with a level of indirection

DLL ~ MutVar# RealWorld DLL ~ DLL

This means that every operation we perform on this structure will be about half 
of the speed of an implementation in most other languages assuming we're memory 
bound on loading things into cache!

Making Progress
--

I have been working on a number of data structures where the indirection of 
going from something in * out to an object in # which contains the real pointer 
to my target and coming back effectively doubles my runtime.

We go out to the MutVar# because we are allowed to put the MutVar# onto the 
mutable list when we dirty it. There is a well defined write-barrier.

I could change out the representation to use

data DLL = DLL (MutableArray# RealWorld DLL) | Nil

I can just store two pointers in the MutableArray# every time, but this doesn't 
help _much_ directly. It has reduced the amount of distinct addresses in memory 
I touch on a walk of the DLL from 3 per object to 2.

I still have to go out to the heap from my DLL and get to the array object and 
then chase it to the next DLL and chase that to the next array. I do get my two 
pointers together in memory though. I'm paying for a card marking table as 
well, which I don't particularly need with just two pointers, but we can shed 
that with the SmallMutableArray# machinery added back in 7.10, which is just 
the old array code a a new data type, which can speed things up a bit when you 
don't have very big arrays:

data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil

But what if I wanted my object itself to live in # and have two mutable fields 
and be able to share the sme write barrier?

An ArrayArray# points directly to other unlifted array types. What if we have 
one # - * wrapper on the outside to deal with the impedence mismatch between 
the imperative world and Haskell, and then just let the ArrayArray#'s hold 
other arrayarrays.

data DLL = DLL (MutableArrayArray# RealWorld)

now I need to make up a new Nil, which I can just make be a special 
MutableArrayArray# I allocate on program startup. I can even abuse pattern 
synonyms. Alternately I can exploit the internals further to make this cheaper.

Then I can use the readMutableArrayArray# and writeMutableArrayArray# calls to 
directly access the preceding and next entry in the linked list.

So now we have one DLL wrapper which just 'bootstraps me' into a strict world, 
and everything there lives in #.

next :: DLL - IO DLL
next (DLL m) = IO $ \s - case readMutableArrayArray# s of
   (# s', n #) - (# s', DLL n #)

It turns out GHC is quite happy to optimize all of that code to keep things 
unboxed. The 'DLL' wrappers get removed pretty easily when they are known 
strict and you chain operations of this sort!

Cleaning it Up
--

Now I have one outermost indirection pointing to an array that points directly 
to other arrays.

I'm stuck paying for a card marking table per object, but I can fix that by 
duplicating the code for MutableArrayArray# and using a SmallMutableArray#. I 
can hack up primops that let me store a mixture of SmallMutableArray# fields 
and normal ones in the data structure. Operationally, I can even do so by just 
unsafeCoercing the existing SmallMutableArray# primitives to change the kind of 
one of the arguments it takes.

This is almost ideal, but not quite. I often have fields that would be best 
left unboxed.

data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil

was able to unpack the Int, but we lost that. We can currently at best point 
one of the entries of the SmallMutableArray# at a boxed or at a 
MutableByteArray# for all of our misc. data

Re: Planning for the 7.12 release

Elliot Cameron elliot.came...@covenanteyes.com writes:

 I can't seem to find the exact trac ticket, but the ability to swap
 out Integer implementations at link time would be a huge relief on
 Windows, which suffers from various problems with dynamic linking. I
 believe it was originally slated for 7.12. Can someone find it? Here's
 what I did find:
 https://ghc.haskell.org/trac/ghc/wiki/ReplacingGMPNotes

 A related discussion:
 https://github.com/commercialhaskell/stack/issues/399

 As it stands, we've been trying hard to find good ways to provide
 custom GHC variants more easily to end users.

Hmm, interesting. I'm not sure how realistic it is to make this a
link-time option, however, considering that we may inline bindings from
whatever integer package we compile against into the user's program.

Herbert, do you have any thoughts on this?

Cheers,

- Ben



signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Planning for the 7.12 release

2015-08-28 Thread Herbert Valerio Riedel

For the record,

I wanted to implement this feature a couple of months ago but then got
side-tracked. If somebody wants to pick it up (which would be great),
please lemme know.

In the meantime I've created

  https://ghc.haskell.org/trac/ghc/ticket/10803

and write up the little information I have on this topic already


On 2015-08-28 at 09:20:54 +0200, Simon Peyton Jones wrote:
 Lennart always says he’s going to work on it, but he’s a busy man and nothing 
 has actually happened.

 It’s a pretty easy feature to implement I think.

 Simon

 From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Sean Leather
 Sent: 28 August 2015 06:40
 To: Ben Gamari
 Cc: GHC developers
 Subject: Re: Planning for the 7.12 release

 On Thu, Aug 27, 2015 at 5:38 PM, Ben Gamari wrote:
 These items are a bit less certain but may make it in if the authors
 push forward quickly enough,


 * Support for Type Signature Sections, allowing you to write (:: ty)
   as a shorthand for (\x - x :: ty).

 Once Lennart convinced me of the usefulness of this [1], I've been finding 
 plenty of places where I wish I had it. Who's working on it? What's the 
 status? Is there a ticket for it?

 Regards,
 Sean

 [1] http://augustss.blogspot.com/2014/04/a-small-haskell-extension.html
 ___
 ghc-devs mailing list
 ghc-devs@haskell.org
 http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

-- 
Elegance is not optional -- Richard O'Keefe
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Planning for the 7.12 release: MonadFail

David Luposchainsky dluposchain...@googlemail.com writes:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hey Ben,

 my summer was pretty busy, but we recently fixed our MonadFail implementation 
 to
 work as desired, so that should make it in as well. We'll have to survive a
 heroic rebase/squash that we'll probably do in September when we're back from 
 our
 holidays.

Great, I've added it to the list. Keep us in the loop as things progress.

Cheers,

- Ben


signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: ArrayArrays

2015-08-28 Thread Simon Marlow


On 27/08/2015 19:36, Edward Kmett wrote:

On Thu, Aug 27, 2015 at 1:24 PM, Edward Z. Yang ezy...@mit.edu
mailto:ezy...@mit.edu wrote:

It seems to me that we should take a page from OCaml's playbook
and add support for native mutable fields in objects, because
this is essentially what a mix of words and pointers is.


That actually doesn't work as well as one might hope.

We currently treat data constructor closures as so much tissue paper
around a present. We tear them open, rip out all their contents, scatter
them throughout our code and then we build a whole new data constructor
closure when we're done, or we just leave them suspended in closures
awaiting someone to demand we finally make a new data constructor.

Half the time we don't even give back the data constructor closure and
push it into update g frames and we just give back the items on the stack.

With the machinery I mentioned above I get a world where every time I
access an object I can know it is evaluated for real, so this means I'm
not stuck 'entering an unknown closure', and getting it to give me back
a slab of memory that we know is a real data constructor that i can bang
away on mutable entries in.

In a world where things in * could hold mutable pointers we have to care
a lot more about object identity in deeply uncomfortable ways.

With what I've implemented I only care about object identity between
things in # that are gcptrs. The garbage collector may move them around,
but it doesn't put in thunks anywhere.


Yeah, I've actually thought about whether we could have mutable fields 
in constructors a couple of times, and it's far from easy for the 
reasons you describe.  A constructor with mutable fields would need to 
be an object with identity, with precise control over when it is 
created.  This is nothing like an ordinary constructor.


I like the alternative approach in this thread, which is to attack the 
problem from the other end: start with a primitive object and make it 
more like a constructor.


I don't see any reason why we shouldn't add primops to read/write 
SmallArray# and other primitive objects in an ArrayArray#.  Will someone 
make a patch?  It should be pretty straightforward.


Cheers,
Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Planning for the 7.12 release

2015-08-28 Thread Herbert Valerio Riedel

On 2015-08-28 at 12:31:13 +0200, Kosyrev Serge wrote:

[...]

 * A (possible) overhaul of GHC's build system to use Shake instead
   of Make.

 Is there a breakdown of what remains to be done on this front?

Btw, here's the GitHub repo were you can track the progress:

https://github.com/snowleopard/shaking-up-ghc
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Planning for the 7.12 release

Kosyrev Serge _deepf...@feelingofgreen.ru writes:

Ben Gamari b...@well-typed.com writes:
These items are a bit less certain but may make it in if the authors
push forward quickly enough,

[..]

* A (possible) overhaul of GHC's build system to use Shake instead
of Make.

Is there a breakdown of what remains to be done on this front?

I'm actually not entirely clear on the general plan regarding the
Shake-up and perhaps this is a good time to discuss it.

How gradual of a transition do we envision this will be? Specifically,
for how long do we want the two build systems to coexist (if at all)?

If this is intended to be an immediate wholesale switch to Shake I would
be very skeptical of merging for 7.12 as three months is, in my
opinion, very little time to test such a sweeping change.

However, a long transition time is also not terribly desirable as
maintaining two largely independent build systems potentially carries
significant cost.

Can we in principle expect the Shake build system to build all of
the configurations GHC currently supports from day-one (including, for
instance, cross compilation)? The batch files in the repository suggest
that it has been used on Windows but has it been tested on our other
Tier 1 platforms?

I just attempted to use the current state of the repository and sadly
found that things fell apart pretty quickly [1].

I would love to see this happen, but obviously we need to tread
carefully when performing such a major overhaul to code so central to
the project. Moreover, I would really like to minimize the probability
that we increase the maintenance burden of our build infrastructure.

I think if there is clear communication regarding what remains to be
done and motivation to quickly finish these items then Shakification is
still an possibility for 7.12.

Otherwise I personally think we may want to be a bit conservative. End
users can always check out the shaking-up-ghc repository into their GHC
trees themselves if they want to try using it. I'm certainly willing to
consider other opinions, however.

Cheers,

- Ben

[1] After I manually ran ./boot,

$ _shake/build --lint --directory .. $@
... # various output from ./configure
Reading shake/cfg/system.config...
Reading package dependencies...
Error when running Shake build system:
* OracleQ (PackageDataKey
(libraries/bin-package-db/dist-boot/package-data.mk,libraries_bin-package-db_dist-boot_LIB_NAME))
* libraries/bin-package-db/dist-boot/package-data.mk
* libraries/bin-package-db/dist-boot/package-data.mk
libraries/bin-package-db/dist-boot/haddock-prologue.txt
libraries/bin-package-db/dist-boot/inplace-pkg-config
libraries/bin-package-db/dist-boot/setup-config
libraries/bin-package-db/dist-boot/build/autogen/cabal_macros.h
* libraries/binary/dist-boot/package-data.mk
* libraries/binary/dist-boot/package-data.mk
libraries/binary/dist-boot/haddock-prologue.txt
libraries/binary/dist-boot/inplace-pkg-config
libraries/binary/dist-boot/setup-config
libraries/binary/dist-boot/build/autogen/cabal_macros.h
* /mnt/work/ghc/ghc-shake/inplace/bin/ghc-cabal
Error, file does not exist and no rule available:
/mnt/work/ghc/ghc-shake/inplace/bin/ghc-cabal

signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

RE: Planning for the 7.12 release

Actually that’s a good idea.

Simon

From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Greg Weber
Sent: 28 August 2015 16:43
To: Ben Gamari
Cc: GHC developers
Subject: Re: Planning for the 7.12 release

Can we call this GHC 8.0 instead of 7.12 ?
Overloaded record fields and backtraces are a huge missing piece to Haskell. It 
would be nice to have the bump to celebrate this occasion and say that Haskell 
8 is ready. I have had a hard time seriously recommending Haskell due to 
those last missing features. Now I should be able to say without reservation: 
use Haskell  8; it is great!

On Thu, Aug 27, 2015 at 8:38 AM, Ben Gamari 
b...@well-typed.commailto:b...@well-typed.com wrote:

Hello everyone!

With the 7.10.1 release nearly six months behind us and 7.10.2 out of the
way, now is a good time to begin looking forward to 7.12. In keeping
with the typical release pace, we are aiming to have a release
candidate ready in mid-December 2015 and a final release in January
2016.

The items that that we currently believe have a good chance of making it
in to 7.12 are listed on the release status page [1], which I've
summarized below (in no particular order),

* Support for implicit parameters providing callstacks and source
  locations

* Support for wildcards in data and type family instances

* A new, type-indexed type representation, data TTypeRep (a :: k).

* Introduction of visible type application

* Support for reasoning about kind equalities

* Support for Injective Type Families

* Support for the Strict language extension

* Support for Overloaded Record Fields, allowing multiple uses of
  the same field name and a form of type-directed name resolution.

* A huge improvement to pattern matching (including much better
  coverage of GADTs)

* Backpack is chugging along; we have a new user-facing syntax which
  allows multiple modules to be defined a single file, and are
  hoping to release at least the ability to publish multiple units
  in a single Cabal file.

* Support for Applicative Do, allowing GHC to desugar do-notation to
  Applicative where possible.

* Improved DWARF based debugging support including backtraces from
  Haskell code

* An Improved LLVM Backend that ships with every major Tier 1 platform.

These items are a bit less certain but may make it in if the authors
push forward quickly enough,

* Support for Type Signature Sections, allowing you to write (:: ty)
  as a shorthand for (\x - x :: ty).

* A (possible) overhaul of GHC's build system to use Shake instead
  of Make.

* A DEPRECATED pragma for exports

Is your pet project missing from this list? If you have a patch that you
believe is on-track to make it in for 7.12, please let us know.

Moreover, if you have an issue that you urgently need fixed in 7.12,
please express you interest on the appropriate ticket. User feedback
helps us immensely in figuring out how to best place our priorities.

Cheers,

- Ben

[1] https://ghc.haskell.org/trac/ghc/wiki/Status/GHC-7.12.1

___
ghc-devs mailing list
ghc-devs@haskell.orgmailto:ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Planning for the 7.12 release

2015-08-28 Thread Greg Weber

Can we call this GHC 8.0 instead of 7.12 ?
Overloaded record fields and backtraces are a huge missing piece to
Haskell. It would be nice to have the bump to celebrate this occasion and
say that Haskell 8 is ready. I have had a hard time seriously
recommending Haskell due to those last missing features. Now I should be
able to say without reservation: use Haskell  8; it is great!

On Thu, Aug 27, 2015 at 8:38 AM, Ben Gamari b...@well-typed.com wrote:


 Hello everyone!

 With the 7.10.1 release nearly six months behind us and 7.10.2 out of the
 way, now is a good time to begin looking forward to 7.12. In keeping
 with the typical release pace, we are aiming to have a release
 candidate ready in mid-December 2015 and a final release in January
 2016.

 The items that that we currently believe have a good chance of making it
 in to 7.12 are listed on the release status page [1], which I've
 summarized below (in no particular order),


 * Support for implicit parameters providing callstacks and source
   locations

 * Support for wildcards in data and type family instances

 * A new, type-indexed type representation, data TTypeRep (a :: k).

 * Introduction of visible type application

 * Support for reasoning about kind equalities

 * Support for Injective Type Families

 * Support for the Strict language extension

 * Support for Overloaded Record Fields, allowing multiple uses of
   the same field name and a form of type-directed name resolution.

 * A huge improvement to pattern matching (including much better
   coverage of GADTs)

 * Backpack is chugging along; we have a new user-facing syntax which
   allows multiple modules to be defined a single file, and are
   hoping to release at least the ability to publish multiple units
   in a single Cabal file.

 * Support for Applicative Do, allowing GHC to desugar do-notation to
   Applicative where possible.

 * Improved DWARF based debugging support including backtraces from
   Haskell code

 * An Improved LLVM Backend that ships with every major Tier 1 platform.


 These items are a bit less certain but may make it in if the authors
 push forward quickly enough,


 * Support for Type Signature Sections, allowing you to write (:: ty)
   as a shorthand for (\x - x :: ty).

 * A (possible) overhaul of GHC's build system to use Shake instead
   of Make.

 * A DEPRECATED pragma for exports


 Is your pet project missing from this list? If you have a patch that you
 believe is on-track to make it in for 7.12, please let us know.

 Moreover, if you have an issue that you urgently need fixed in 7.12,
 please express you interest on the appropriate ticket. User feedback
 helps us immensely in figuring out how to best place our priorities.

 Cheers,

 - Ben


 [1] https://ghc.haskell.org/trac/ghc/wiki/Status/GHC-7.12.1

 ___
 ghc-devs mailing list
 ghc-devs@haskell.org
 http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: ArrayArrays

I'd love to have that last 10%, but its a lot of work to get there and more
importantly I don't know quite what it should look like.

On the other hand, I do have a pretty good idea of how the primitives above
could be banged out and tested in a long evening, well in time for 7.12.
And as noted earlier, those remain useful even if a nicer typed version
with an extra level of indirection to the sizes is built up after.

The rest sounds like a good graduate student project for someone who has
graduate students lying around. Maybe somebody at Indiana University who
has an interest in type theory and parallelism can find us one. =)

-Edward

On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates fryguy...@gmail.com wrote:

 I think from my perspective, the motivation for getting the type
 checker involved is primarily bringing this to the level where users
 could be expected to build these structures.  it is reasonable to
 think that there are people who want to use STM (a context with
 mutation already) to implement a straight forward data structure that
 avoids extra indirection penalty.  There should be some places where
 knowing that things are field accesses rather then array indexing
 could be helpful, but I think GHC is good right now about handling
 constant offsets.  In my code I don't do any bounds checking as I know
 I will only be accessing my arrays with constant indexes.  I make
 wrappers for each field access and leave all the unsafe stuff in
 there.  When things go wrong though, the compiler is no help.  Maybe
 template Haskell that generates the appropriate wrappers is the right
 direction to go.
 There is another benefit for me when working with these as arrays in
 that it is quite simple and direct (given the hoops already jumped
 through) to play with alignment.  I can ensure two pointers are never
 on the same cache-line by just spacing things out in the array.

 On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett ekm...@gmail.com wrote:
  They just segfault at this level. ;)
 
  Sent from my iPhone
 
  On Aug 28, 2015, at 7:25 PM, Ryan Newton rrnew...@gmail.com wrote:
 
  You presumably also save a bounds check on reads by hard-coding the
 sizes?
 
  On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett ekm...@gmail.com wrote:
 
  Also there are 4 different things here, basically depending on two
  independent questions:
 
  a.) if you want to shove the sizes into the info table, and
  b.) if you want cardmarking.
 
  Versions with/without cardmarking for different sizes can be done pretty
  easily, but as noted, the infotable variants are pretty invasive.
 
  -Edward
 
  On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett ekm...@gmail.com wrote:
 
  Well, on the plus side you'd save 16 bytes per object, which adds up if
  they were small enough and there are enough of them. You get a bit
 better
  locality of reference in terms of what fits in the first cache line of
 them.
 
  -Edward
 
  On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton rrnew...@gmail.com
 wrote:
 
  Yes. And for the short term I can imagine places we will settle with
  arrays even if it means tracking lengths unnecessarily and
 unsafeCoercing
  pointers whose types don't actually match their siblings.
 
  Is there anything to recommend the hacks mentioned for fixed sized
 array
  objects *other* than using them to fake structs? (Much to
 derecommend, as
  you mentioned!)
 
  On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett ekm...@gmail.com
 wrote:
 
  I think both are useful, but the one you suggest requires a lot more
  plumbing and doesn't subsume all of the usecases of the other.
 
  -Edward
 
  On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com
  wrote:
 
  So that primitive is an array like thing (Same pointed type,
 unbounded
  length) with extra payload.
 
  I can see how we can do without structs if we have arrays,
 especially
  with the extra payload at front. But wouldn't the general solution
 for
  structs be one that that allows new user data type defs for # types?
 
 
 
  On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com
 wrote:
 
  Some form of MutableStruct# with a known number of words and a
 known
  number of pointers is basically what Ryan Yates was suggesting
 above, but
  where the word counts were stored in the objects themselves.
 
  Given that it'd have a couple of words for those counts it'd likely
  want to be something we build in addition to MutVar# rather than a
  replacement.
 
  On the other hand, if we had to fix those numbers and build info
  tables that knew them, and typechecker support, for instance, it'd
 get
  rather invasive.
 
  Also, a number of things that we can do with the 'sized' versions
  above, like working with evil unsized c-style arrays directly
 inline at the
  end of the structure cease to be possible, so it isn't even a pure
 win if we
  did the engineering effort.
 
  I think 90% of the needs I have are covered just by adding the one
  primitive. The last 10% gets pretty

Re: ArrayArrays

2015-08-28 Thread Ryan Yates

I completely agree.  I would love to spend some time during ICFP and
friends talking about what it could look like.  My small array for STM
changes for the RTS can be seen here [1].  It is on a branch somewhere
between 7.8 and 7.10 and includes irrelevant STM bits and some
confusing naming choices (sorry), but should cover all the details
needed to implement it for a non-STM context.  The biggest surprise
for me was following small array too closely and having a word/byte
offset miss-match [2].

[1]: 
https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut
[2]: https://ghc.haskell.org/trac/ghc/ticket/10413

Ryan

On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett ekm...@gmail.com wrote:
 I'd love to have that last 10%, but its a lot of work to get there and more
 importantly I don't know quite what it should look like.

 On the other hand, I do have a pretty good idea of how the primitives above
 could be banged out and tested in a long evening, well in time for 7.12. And
 as noted earlier, those remain useful even if a nicer typed version with an
 extra level of indirection to the sizes is built up after.

 The rest sounds like a good graduate student project for someone who has
 graduate students lying around. Maybe somebody at Indiana University who has
 an interest in type theory and parallelism can find us one. =)

 -Edward

 On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates fryguy...@gmail.com wrote:

 I think from my perspective, the motivation for getting the type
 checker involved is primarily bringing this to the level where users
 could be expected to build these structures.  it is reasonable to
 think that there are people who want to use STM (a context with
 mutation already) to implement a straight forward data structure that
 avoids extra indirection penalty.  There should be some places where
 knowing that things are field accesses rather then array indexing
 could be helpful, but I think GHC is good right now about handling
 constant offsets.  In my code I don't do any bounds checking as I know
 I will only be accessing my arrays with constant indexes.  I make
 wrappers for each field access and leave all the unsafe stuff in
 there.  When things go wrong though, the compiler is no help.  Maybe
 template Haskell that generates the appropriate wrappers is the right
 direction to go.
 There is another benefit for me when working with these as arrays in
 that it is quite simple and direct (given the hoops already jumped
 through) to play with alignment.  I can ensure two pointers are never
 on the same cache-line by just spacing things out in the array.

 On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett ekm...@gmail.com wrote:
  They just segfault at this level. ;)
 
  Sent from my iPhone
 
  On Aug 28, 2015, at 7:25 PM, Ryan Newton rrnew...@gmail.com wrote:
 
  You presumably also save a bounds check on reads by hard-coding the
  sizes?
 
  On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett ekm...@gmail.com wrote:
 
  Also there are 4 different things here, basically depending on two
  independent questions:
 
  a.) if you want to shove the sizes into the info table, and
  b.) if you want cardmarking.
 
  Versions with/without cardmarking for different sizes can be done
  pretty
  easily, but as noted, the infotable variants are pretty invasive.
 
  -Edward
 
  On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett ekm...@gmail.com wrote:
 
  Well, on the plus side you'd save 16 bytes per object, which adds up
  if
  they were small enough and there are enough of them. You get a bit
  better
  locality of reference in terms of what fits in the first cache line of
  them.
 
  -Edward
 
  On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton rrnew...@gmail.com
  wrote:
 
  Yes. And for the short term I can imagine places we will settle with
  arrays even if it means tracking lengths unnecessarily and
  unsafeCoercing
  pointers whose types don't actually match their siblings.
 
  Is there anything to recommend the hacks mentioned for fixed sized
  array
  objects *other* than using them to fake structs? (Much to
  derecommend, as
  you mentioned!)
 
  On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett ekm...@gmail.com
  wrote:
 
  I think both are useful, but the one you suggest requires a lot more
  plumbing and doesn't subsume all of the usecases of the other.
 
  -Edward
 
  On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com
  wrote:
 
  So that primitive is an array like thing (Same pointed type,
  unbounded
  length) with extra payload.
 
  I can see how we can do without structs if we have arrays,
  especially
  with the extra payload at front. But wouldn't the general solution
  for
  structs be one that that allows new user data type defs for #
  types?
 
 
 
  On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com
  wrote:
 
  Some form of MutableStruct# with a known number of words and a
  known
  number of pointers is basically what Ryan Yates was suggesting
  above, but
  where

RE: Planning for the 7.12 release

Simon Peyton Jones simo...@microsoft.com writes:

 Actually that’s a good idea.

 Simon

 From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Greg Weber
 Sent: 28 August 2015 16:43
 To: Ben Gamari
 Cc: GHC developers
 Subject: Re: Planning for the 7.12 release

 Can we call this GHC 8.0 instead of 7.12 ?
 Overloaded record fields and backtraces are a huge missing piece to
 Haskell. It would be nice to have the bump to celebrate this occasion
 and say that Haskell 8 is ready. I have had a hard time seriously
 recommending Haskell due to those last missing features. Now I should
 be able to say without reservation: use Haskell  8; it is great!

I was discussing this very matter yesterday with a few folks. I think we
certainly have enough features in this release to do a major bump. I
half-jokingly suggested that 8.0 should only come with Phase 2 of
Richard's Dependent Haskell work, but I'm willing to settle for
merely kind equality.

I think doing a major bump would be a great idea.

Cheers,

- Ben

signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

RE: Planning for the 7.12 release

Simon Peyton Jones simo...@microsoft.com writes:

 |  If this is intended to be an immediate wholesale switch to Shake I
 |  would be very skeptical of merging for 7.12 as three months is, in my
 |  opinion, very little time to test such a sweeping change.

 No there is no chance that we switch over to Shake for 7.12. At the
 moment it's nowhere near ready for prime time.

Thanks for clarifying!

 But I do hope that we'll have material progress on a side-by-side
 build system before 7.12 comes out. It probably won't do everything
 but it should do some things well. I hope.

Andrey, I look forward to seeing an update on things.

Cheers,

- Ben



signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: ArrayArrays

I posted a summary article on what this lets you do to

https://www.fpcomplete.com/user/edwardk/unlifted-structures

I can see about making a more proposal/feature-oriented summary for the
Haskell Wiki. It may have to wait until after ICFP though.

-Edward

On Fri, Aug 28, 2015 at 5:42 AM, Simon Peyton Jones simo...@microsoft.com
wrote:

 At the very least I'll take this email and turn it into a short article.

 Yes, please do make it into a wiki page on the GHC Trac, and maybe make a
 ticket for it.


 Thanks



 Simon



 *From:* Edward Kmett [mailto:ekm...@gmail.com]
 *Sent:* 27 August 2015 16:54
 *To:* Simon Peyton Jones
 *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs
 *Subject:* Re: ArrayArrays



 An ArrayArray# is just an Array# with a modified invariant. It points
 directly to other unlifted ArrayArray#'s or ByteArray#'s.



 While those live in #, they are garbage collected objects, so this all
 lives on the heap.



 They were added to make some of the DPH stuff fast when it has to deal
 with nested arrays.



 I'm currently abusing them as a placeholder for a better thing.



 The Problem

 -



 Consider the scenario where you write a classic doubly-linked list in
 Haskell.



 data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL)



 Chasing from one DLL to the next requires following 3 pointers on the heap.



 DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~
 DLL



 That is 3 levels of indirection.



 We can trim one by simply unpacking the IORef with -funbox-strict-fields
 or UNPACK



 We can trim another by adding a 'Nil' constructor for DLL and worsening
 our representation.



 data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil



 but now we're still stuck with a level of indirection



 DLL ~ MutVar# RealWorld DLL ~ DLL



 This means that every operation we perform on this structure will be about
 half of the speed of an implementation in most other languages assuming
 we're memory bound on loading things into cache!



 Making Progress

 --



 I have been working on a number of data structures where the indirection
 of going from something in * out to an object in # which contains the real
 pointer to my target and coming back effectively doubles my runtime.



 We go out to the MutVar# because we are allowed to put the MutVar# onto
 the mutable list when we dirty it. There is a well defined write-barrier.



 I could change out the representation to use



 data DLL = DLL (MutableArray# RealWorld DLL) | Nil



 I can just store two pointers in the MutableArray# every time, but this
 doesn't help _much_ directly. It has reduced the amount of distinct
 addresses in memory I touch on a walk of the DLL from 3 per object to 2.



 I still have to go out to the heap from my DLL and get to the array object
 and then chase it to the next DLL and chase that to the next array. I do
 get my two pointers together in memory though. I'm paying for a card
 marking table as well, which I don't particularly need with just two
 pointers, but we can shed that with the SmallMutableArray# machinery
 added back in 7.10, which is just the old array code a a new data type,
 which can speed things up a bit when you don't have very big arrays:



 data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil



 But what if I wanted my object itself to live in # and have two mutable
 fields and be able to share the sme write barrier?



 An ArrayArray# points directly to other unlifted array types. What if we
 have one # - * wrapper on the outside to deal with the impedence mismatch
 between the imperative world and Haskell, and then just let the
 ArrayArray#'s hold other arrayarrays.



 data DLL = DLL (MutableArrayArray# RealWorld)



 now I need to make up a new Nil, which I can just make be a special
 MutableArrayArray# I allocate on program startup. I can even abuse pattern
 synonyms. Alternately I can exploit the internals further to make this
 cheaper.



 Then I can use the readMutableArrayArray# and writeMutableArrayArray#
 calls to directly access the preceding and next entry in the linked list.



 So now we have one DLL wrapper which just 'bootstraps me' into a strict
 world, and everything there lives in #.



 next :: DLL - IO DLL

 next (DLL m) = IO $ \s - case readMutableArrayArray# s of

(# s', n #) - (# s', DLL n #)



 It turns out GHC is quite happy to optimize all of that code to keep
 things unboxed. The 'DLL' wrappers get removed pretty easily when they are
 known strict and you chain operations of this sort!



 Cleaning it Up

 --



 Now I have one outermost indirection pointing to an array that points
 directly to other arrays.



 I'm stuck paying for a card marking table per object, but I can fix that
 by duplicating the code for MutableArrayArray# and using a
 SmallMutableArray#. I can hack up primops that let me store a mixture of
 SmallMutableArray# fields and normal ones in the data structure.

Re: ArrayArrays

You presumably also save a bounds check on reads by hard-coding the sizes?

On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett ekm...@gmail.com wrote:

 Also there are 4 different things here, basically depending on two
 independent questions:

 a.) if you want to shove the sizes into the info table, and
 b.) if you want cardmarking.

 Versions with/without cardmarking for different sizes can be done pretty
 easily, but as noted, the infotable variants are pretty invasive.

 -Edward

 On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett ekm...@gmail.com wrote:

 Well, on the plus side you'd save 16 bytes per object, which adds up if
 they were small enough and there are enough of them. You get a bit better
 locality of reference in terms of what fits in the first cache line of them.

 -Edward

 On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton rrnew...@gmail.com wrote:

 Yes. And for the short term I can imagine places we will settle with
 arrays even if it means tracking lengths unnecessarily and unsafeCoercing
 pointers whose types don't actually match their siblings.

 Is there anything to recommend the hacks mentioned for fixed sized array
 objects *other* than using them to fake structs? (Much to derecommend, as
 you mentioned!)

 On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett ekm...@gmail.com wrote:

 I think both are useful, but the one you suggest requires a lot more
 plumbing and doesn't subsume all of the usecases of the other.

 -Edward

 On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com
 wrote:

 So that primitive is an array like thing (Same pointed type, unbounded
 length) with extra payload.

 I can see how we can do without structs if we have arrays, especially
 with the extra payload at front. But wouldn't the general solution for
 structs be one that that allows new user data type defs for # types?



 On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote:

 Some form of MutableStruct# with a known number of words and a known
 number of pointers is basically what Ryan Yates was suggesting above, but
 where the word counts were stored in the objects themselves.

 Given that it'd have a couple of words for those counts it'd likely
 want to be something we build in addition to MutVar# rather than a
 replacement.

 On the other hand, if we had to fix those numbers and build info
 tables that knew them, and typechecker support, for instance, it'd get
 rather invasive.

 Also, a number of things that we can do with the 'sized' versions
 above, like working with evil unsized c-style arrays directly inline at 
 the
 end of the structure cease to be possible, so it isn't even a pure win if
 we did the engineering effort.

 I think 90% of the needs I have are covered just by adding the one
 primitive. The last 10% gets pretty invasive.

 -Edward

 On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com
 wrote:

 I like the possibility of a general solution for mutable structs
 (like Ed said), and I'm trying to fully understand why it's hard.

 So, we can't unpack MutVar into constructors because of object
 identity problems. But what about directly supporting an extensible set 
 of
 unlifted MutStruct# objects, generalizing (and even replacing) MutVar#?
 That may be too much work, but is it problematic otherwise?

 Needless to say, this is also critical if we ever want best in class
 lockfree mutable structures, just like their Stm and sequential
 counterparts.

 On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones 
 simo...@microsoft.com wrote:

 At the very least I'll take this email and turn it into a short
 article.

 Yes, please do make it into a wiki page on the GHC Trac, and maybe
 make a ticket for it.


 Thanks



 Simon



 *From:* Edward Kmett [mailto:ekm...@gmail.com]
 *Sent:* 27 August 2015 16:54
 *To:* Simon Peyton Jones
 *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs
 *Subject:* Re: ArrayArrays



 An ArrayArray# is just an Array# with a modified invariant. It
 points directly to other unlifted ArrayArray#'s or ByteArray#'s.



 While those live in #, they are garbage collected objects, so this
 all lives on the heap.



 They were added to make some of the DPH stuff fast when it has to
 deal with nested arrays.



 I'm currently abusing them as a placeholder for a better thing.



 The Problem

 -



 Consider the scenario where you write a classic doubly-linked list
 in Haskell.



 data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL)



 Chasing from one DLL to the next requires following 3 pointers on
 the heap.



 DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe
 DLL ~ DLL



 That is 3 levels of indirection.



 We can trim one by simply unpacking the IORef with
 -funbox-strict-fields or UNPACK



 We can trim another by adding a 'Nil' constructor for DLL and
 worsening our representation.



 data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil



 but now we're still stuck with a level of indirection



 DLL ~ MutVar# RealWorld

Re: ArrayArrays

They just segfault at this level. ;)

Sent from my iPhone

 On Aug 28, 2015, at 7:25 PM, Ryan Newton rrnew...@gmail.com wrote:
 
 You presumably also save a bounds check on reads by hard-coding the sizes?
 
 On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett ekm...@gmail.com wrote:
 Also there are 4 different things here, basically depending on two 
 independent questions: 
 
 a.) if you want to shove the sizes into the info table, and 
 b.) if you want cardmarking.
 
 Versions with/without cardmarking for different sizes can be done pretty 
 easily, but as noted, the infotable variants are pretty invasive.
 
 -Edward
 
 On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett ekm...@gmail.com wrote:
 Well, on the plus side you'd save 16 bytes per object, which adds up if 
 they were small enough and there are enough of them. You get a bit better 
 locality of reference in terms of what fits in the first cache line of them.
 
 -Edward
 
 On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton rrnew...@gmail.com wrote:
 Yes. And for the short term I can imagine places we will settle with 
 arrays even if it means tracking lengths unnecessarily and unsafeCoercing 
 pointers whose types don't actually match their siblings. 
 
 Is there anything to recommend the hacks mentioned for fixed sized array 
 objects *other* than using them to fake structs? (Much to derecommend, as 
 you mentioned!)
 
 On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett ekm...@gmail.com wrote:
 I think both are useful, but the one you suggest requires a lot more 
 plumbing and doesn't subsume all of the usecases of the other.
 
 -Edward
 
 On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com wrote:
 So that primitive is an array like thing (Same pointed type, unbounded 
 length) with extra payload. 
 
 I can see how we can do without structs if we have arrays, especially 
 with the extra payload at front. But wouldn't the general solution for 
 structs be one that that allows new user data type defs for # types?
 
 
 
 On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote:
 Some form of MutableStruct# with a known number of words and a known 
 number of pointers is basically what Ryan Yates was suggesting above, 
 but where the word counts were stored in the objects themselves.
 
 Given that it'd have a couple of words for those counts it'd likely 
 want to be something we build in addition to MutVar# rather than a 
 replacement.
 
 On the other hand, if we had to fix those numbers and build info tables 
 that knew them, and typechecker support, for instance, it'd get rather 
 invasive.
 
 Also, a number of things that we can do with the 'sized' versions 
 above, like working with evil unsized c-style arrays directly inline at 
 the end of the structure cease to be possible, so it isn't even a pure 
 win if we did the engineering effort.
 
 I think 90% of the needs I have are covered just by adding the one 
 primitive. The last 10% gets pretty invasive.
 
 -Edward
 
 On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com 
 wrote:
 I like the possibility of a general solution for mutable structs (like 
 Ed said), and I'm trying to fully understand why it's hard. 
 
 So, we can't unpack MutVar into constructors because of object 
 identity problems. But what about directly supporting an extensible 
 set of unlifted MutStruct# objects, generalizing (and even replacing) 
 MutVar#? That may be too much work, but is it problematic otherwise?
 
 Needless to say, this is also critical if we ever want best in class 
 lockfree mutable structures, just like their Stm and sequential 
 counterparts. 
 
 On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones 
 simo...@microsoft.com wrote:
 At the very least I'll take this email and turn it into a short 
 article.
 Yes, please do make it into a wiki page on the GHC Trac, and maybe 
 make a ticket for it.
 
 
 Thanks
 
  
 
 Simon
 
  
 
 From: Edward Kmett [mailto:ekm...@gmail.com] 
 Sent: 27 August 2015 16:54
 To: Simon Peyton Jones
 Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs
 Subject: Re: ArrayArrays
 
  
 
 An ArrayArray# is just an Array# with a modified invariant. It points 
 directly to other unlifted ArrayArray#'s or ByteArray#'s.
  
 While those live in #, they are garbage collected objects, so this 
 all lives on the heap.
  
 They were added to make some of the DPH stuff fast when it has to 
 deal with nested arrays.
  
 I'm currently abusing them as a placeholder for a better thing.
  
 The Problem
 -
  
 Consider the scenario where you write a classic doubly-linked list in 
 Haskell.
  
 data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL)
  
 Chasing from one DLL to the next requires following 3 pointers on the 
 heap.
  
 DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe 
 DLL ~ DLL
  
 That is 3 levels of indirection.
  
 We can trim one by simply unpacking the IORef with 
 -funbox-strict-fields or UNPACK
  
 We can trim another by adding a 'Nil'

Re: ArrayArrays

I think both are useful, but the one you suggest requires a lot more
plumbing and doesn't subsume all of the usecases of the other.

-Edward

On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com wrote:

 So that primitive is an array like thing (Same pointed type, unbounded
 length) with extra payload.

 I can see how we can do without structs if we have arrays, especially with
 the extra payload at front. But wouldn't the general solution for structs
 be one that that allows new user data type defs for # types?



 On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote:

 Some form of MutableStruct# with a known number of words and a known
 number of pointers is basically what Ryan Yates was suggesting above, but
 where the word counts were stored in the objects themselves.

 Given that it'd have a couple of words for those counts it'd likely want
 to be something we build in addition to MutVar# rather than a replacement.

 On the other hand, if we had to fix those numbers and build info tables
 that knew them, and typechecker support, for instance, it'd get rather
 invasive.

 Also, a number of things that we can do with the 'sized' versions above,
 like working with evil unsized c-style arrays directly inline at the end of
 the structure cease to be possible, so it isn't even a pure win if we did
 the engineering effort.

 I think 90% of the needs I have are covered just by adding the one
 primitive. The last 10% gets pretty invasive.

 -Edward

 On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com wrote:

 I like the possibility of a general solution for mutable structs (like
 Ed said), and I'm trying to fully understand why it's hard.

 So, we can't unpack MutVar into constructors because of object identity
 problems. But what about directly supporting an extensible set of unlifted
 MutStruct# objects, generalizing (and even replacing) MutVar#? That may be
 too much work, but is it problematic otherwise?

 Needless to say, this is also critical if we ever want best in class
 lockfree mutable structures, just like their Stm and sequential
 counterparts.

 On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones 
 simo...@microsoft.com wrote:

 At the very least I'll take this email and turn it into a short article.

 Yes, please do make it into a wiki page on the GHC Trac, and maybe make
 a ticket for it.


 Thanks



 Simon



 *From:* Edward Kmett [mailto:ekm...@gmail.com]
 *Sent:* 27 August 2015 16:54
 *To:* Simon Peyton Jones
 *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs
 *Subject:* Re: ArrayArrays



 An ArrayArray# is just an Array# with a modified invariant. It points
 directly to other unlifted ArrayArray#'s or ByteArray#'s.



 While those live in #, they are garbage collected objects, so this all
 lives on the heap.



 They were added to make some of the DPH stuff fast when it has to deal
 with nested arrays.



 I'm currently abusing them as a placeholder for a better thing.



 The Problem

 -



 Consider the scenario where you write a classic doubly-linked list in
 Haskell.



 data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL)



 Chasing from one DLL to the next requires following 3 pointers on the
 heap.



 DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL
 ~ DLL



 That is 3 levels of indirection.



 We can trim one by simply unpacking the IORef with
 -funbox-strict-fields or UNPACK



 We can trim another by adding a 'Nil' constructor for DLL and worsening
 our representation.



 data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil



 but now we're still stuck with a level of indirection



 DLL ~ MutVar# RealWorld DLL ~ DLL



 This means that every operation we perform on this structure will be
 about half of the speed of an implementation in most other languages
 assuming we're memory bound on loading things into cache!



 Making Progress

 --



 I have been working on a number of data structures where the
 indirection of going from something in * out to an object in # which
 contains the real pointer to my target and coming back effectively doubles
 my runtime.



 We go out to the MutVar# because we are allowed to put the MutVar# onto
 the mutable list when we dirty it. There is a well defined write-barrier.



 I could change out the representation to use



 data DLL = DLL (MutableArray# RealWorld DLL) | Nil



 I can just store two pointers in the MutableArray# every time, but this
 doesn't help _much_ directly. It has reduced the amount of distinct
 addresses in memory I touch on a walk of the DLL from 3 per object to 2.



 I still have to go out to the heap from my DLL and get to the array
 object and then chase it to the next DLL and chase that to the next array.
 I do get my two pointers together in memory though. I'm paying for a card
 marking table as well, which I don't particularly need with just two
 pointers, but we can shed that with the SmallMutableArray# machinery

Re: ArrayArrays

Yes. And for the short term I can imagine places we will settle with arrays
even if it means tracking lengths unnecessarily and unsafeCoercing pointers
whose types don't actually match their siblings.

Is there anything to recommend the hacks mentioned for fixed sized array
objects *other* than using them to fake structs? (Much to derecommend, as
you mentioned!)
On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett ekm...@gmail.com wrote:

 I think both are useful, but the one you suggest requires a lot more
 plumbing and doesn't subsume all of the usecases of the other.

 -Edward

 On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton rrnew...@gmail.com wrote:

 So that primitive is an array like thing (Same pointed type, unbounded
 length) with extra payload.

 I can see how we can do without structs if we have arrays, especially
 with the extra payload at front. But wouldn't the general solution for
 structs be one that that allows new user data type defs for # types?



 On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett ekm...@gmail.com wrote:

 Some form of MutableStruct# with a known number of words and a known
 number of pointers is basically what Ryan Yates was suggesting above, but
 where the word counts were stored in the objects themselves.

 Given that it'd have a couple of words for those counts it'd likely want
 to be something we build in addition to MutVar# rather than a replacement.

 On the other hand, if we had to fix those numbers and build info tables
 that knew them, and typechecker support, for instance, it'd get rather
 invasive.

 Also, a number of things that we can do with the 'sized' versions above,
 like working with evil unsized c-style arrays directly inline at the end of
 the structure cease to be possible, so it isn't even a pure win if we did
 the engineering effort.

 I think 90% of the needs I have are covered just by adding the one
 primitive. The last 10% gets pretty invasive.

 -Edward

 On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com wrote:

 I like the possibility of a general solution for mutable structs (like
 Ed said), and I'm trying to fully understand why it's hard.

 So, we can't unpack MutVar into constructors because of object identity
 problems. But what about directly supporting an extensible set of unlifted
 MutStruct# objects, generalizing (and even replacing) MutVar#? That may be
 too much work, but is it problematic otherwise?

 Needless to say, this is also critical if we ever want best in class
 lockfree mutable structures, just like their Stm and sequential
 counterparts.

 On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones 
 simo...@microsoft.com wrote:

 At the very least I'll take this email and turn it into a short
 article.

 Yes, please do make it into a wiki page on the GHC Trac, and maybe
 make a ticket for it.


 Thanks



 Simon



 *From:* Edward Kmett [mailto:ekm...@gmail.com]
 *Sent:* 27 August 2015 16:54
 *To:* Simon Peyton Jones
 *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs
 *Subject:* Re: ArrayArrays



 An ArrayArray# is just an Array# with a modified invariant. It points
 directly to other unlifted ArrayArray#'s or ByteArray#'s.



 While those live in #, they are garbage collected objects, so this all
 lives on the heap.



 They were added to make some of the DPH stuff fast when it has to deal
 with nested arrays.



 I'm currently abusing them as a placeholder for a better thing.



 The Problem

 -



 Consider the scenario where you write a classic doubly-linked list in
 Haskell.



 data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL)



 Chasing from one DLL to the next requires following 3 pointers on the
 heap.



 DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL
 ~ DLL



 That is 3 levels of indirection.



 We can trim one by simply unpacking the IORef with
 -funbox-strict-fields or UNPACK



 We can trim another by adding a 'Nil' constructor for DLL and
 worsening our representation.



 data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil



 but now we're still stuck with a level of indirection



 DLL ~ MutVar# RealWorld DLL ~ DLL



 This means that every operation we perform on this structure will be
 about half of the speed of an implementation in most other languages
 assuming we're memory bound on loading things into cache!



 Making Progress

 --



 I have been working on a number of data structures where the
 indirection of going from something in * out to an object in # which
 contains the real pointer to my target and coming back effectively doubles
 my runtime.



 We go out to the MutVar# because we are allowed to put the MutVar#
 onto the mutable list when we dirty it. There is a well defined
 write-barrier.



 I could change out the representation to use



 data DLL = DLL (MutableArray# RealWorld DLL) | Nil



 I can just store two pointers in the MutableArray# every time, but
 this doesn't help _much_ directly. It has reduced the amount of distinct

Re: GHC 7.10 complie time regression

2015-08-28 Thread Erik de Castro Lopo

Simon Peyton Jones wrote:

 no it's not expected to take much longer.  Can you make a ticket with
 a reproducible test case?

An make sure you are using ghc 7.10.2 and not 7.10.1 because 7.10.2
had some signifcant fixes for these kinds of issues.

Erik
-- 
--
Erik de Castro Lopo
http://www.mega-nerd.com/
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: ArrayArrays

I like the possibility of a general solution for mutable structs (like Ed
said), and I'm trying to fully understand why it's hard.

So, we can't unpack MutVar into constructors because of object identity
problems. But what about directly supporting an extensible set of unlifted
MutStruct# objects, generalizing (and even replacing) MutVar#? That may be
too much work, but is it problematic otherwise?

Needless to say, this is also critical if we ever want best in class
lockfree mutable structures, just like their Stm and sequential
counterparts.

On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones simo...@microsoft.com
wrote:

 At the very least I'll take this email and turn it into a short article.

 Yes, please do make it into a wiki page on the GHC Trac, and maybe make a
 ticket for it.


 Thanks



 Simon



 *From:* Edward Kmett [mailto:ekm...@gmail.com]
 *Sent:* 27 August 2015 16:54
 *To:* Simon Peyton Jones
 *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs
 *Subject:* Re: ArrayArrays



 An ArrayArray# is just an Array# with a modified invariant. It points
 directly to other unlifted ArrayArray#'s or ByteArray#'s.



 While those live in #, they are garbage collected objects, so this all
 lives on the heap.



 They were added to make some of the DPH stuff fast when it has to deal
 with nested arrays.



 I'm currently abusing them as a placeholder for a better thing.



 The Problem

 -



 Consider the scenario where you write a classic doubly-linked list in
 Haskell.



 data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL)



 Chasing from one DLL to the next requires following 3 pointers on the heap.



 DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~
 DLL



 That is 3 levels of indirection.



 We can trim one by simply unpacking the IORef with -funbox-strict-fields
 or UNPACK



 We can trim another by adding a 'Nil' constructor for DLL and worsening
 our representation.



 data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil



 but now we're still stuck with a level of indirection



 DLL ~ MutVar# RealWorld DLL ~ DLL



 This means that every operation we perform on this structure will be about
 half of the speed of an implementation in most other languages assuming
 we're memory bound on loading things into cache!



 Making Progress

 --



 I have been working on a number of data structures where the indirection
 of going from something in * out to an object in # which contains the real
 pointer to my target and coming back effectively doubles my runtime.



 We go out to the MutVar# because we are allowed to put the MutVar# onto
 the mutable list when we dirty it. There is a well defined write-barrier.



 I could change out the representation to use



 data DLL = DLL (MutableArray# RealWorld DLL) | Nil



 I can just store two pointers in the MutableArray# every time, but this
 doesn't help _much_ directly. It has reduced the amount of distinct
 addresses in memory I touch on a walk of the DLL from 3 per object to 2.



 I still have to go out to the heap from my DLL and get to the array object
 and then chase it to the next DLL and chase that to the next array. I do
 get my two pointers together in memory though. I'm paying for a card
 marking table as well, which I don't particularly need with just two
 pointers, but we can shed that with the SmallMutableArray# machinery
 added back in 7.10, which is just the old array code a a new data type,
 which can speed things up a bit when you don't have very big arrays:



 data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil



 But what if I wanted my object itself to live in # and have two mutable
 fields and be able to share the sme write barrier?



 An ArrayArray# points directly to other unlifted array types. What if we
 have one # - * wrapper on the outside to deal with the impedence mismatch
 between the imperative world and Haskell, and then just let the
 ArrayArray#'s hold other arrayarrays.



 data DLL = DLL (MutableArrayArray# RealWorld)



 now I need to make up a new Nil, which I can just make be a special
 MutableArrayArray# I allocate on program startup. I can even abuse pattern
 synonyms. Alternately I can exploit the internals further to make this
 cheaper.



 Then I can use the readMutableArrayArray# and writeMutableArrayArray#
 calls to directly access the preceding and next entry in the linked list.



 So now we have one DLL wrapper which just 'bootstraps me' into a strict
 world, and everything there lives in #.



 next :: DLL - IO DLL

 next (DLL m) = IO $ \s - case readMutableArrayArray# s of

(# s', n #) - (# s', DLL n #)



 It turns out GHC is quite happy to optimize all of that code to keep
 things unboxed. The 'DLL' wrappers get removed pretty easily when they are
 known strict and you chain operations of this sort!



 Cleaning it Up

 --



 Now I have one outermost indirection pointing to an array that points
 directly to other

Re: ArrayArrays

Some form of MutableStruct# with a known number of words and a known number
of pointers is basically what Ryan Yates was suggesting above, but where
the word counts were stored in the objects themselves.

Given that it'd have a couple of words for those counts it'd likely want to
be something we build in addition to MutVar# rather than a replacement.

On the other hand, if we had to fix those numbers and build info tables
that knew them, and typechecker support, for instance, it'd get rather
invasive.

Also, a number of things that we can do with the 'sized' versions above,
like working with evil unsized c-style arrays directly inline at the end of
the structure cease to be possible, so it isn't even a pure win if we did
the engineering effort.

I think 90% of the needs I have are covered just by adding the one
primitive. The last 10% gets pretty invasive.

-Edward

On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton rrnew...@gmail.com wrote:

 I like the possibility of a general solution for mutable structs (like Ed
 said), and I'm trying to fully understand why it's hard.

 So, we can't unpack MutVar into constructors because of object identity
 problems. But what about directly supporting an extensible set of unlifted
 MutStruct# objects, generalizing (and even replacing) MutVar#? That may be
 too much work, but is it problematic otherwise?

 Needless to say, this is also critical if we ever want best in class
 lockfree mutable structures, just like their Stm and sequential
 counterparts.

 On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones simo...@microsoft.com
 wrote:

 At the very least I'll take this email and turn it into a short article.

 Yes, please do make it into a wiki page on the GHC Trac, and maybe make a
 ticket for it.


 Thanks



 Simon



 *From:* Edward Kmett [mailto:ekm...@gmail.com]
 *Sent:* 27 August 2015 16:54
 *To:* Simon Peyton Jones
 *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs
 *Subject:* Re: ArrayArrays



 An ArrayArray# is just an Array# with a modified invariant. It points
 directly to other unlifted ArrayArray#'s or ByteArray#'s.



 While those live in #, they are garbage collected objects, so this all
 lives on the heap.



 They were added to make some of the DPH stuff fast when it has to deal
 with nested arrays.



 I'm currently abusing them as a placeholder for a better thing.



 The Problem

 -



 Consider the scenario where you write a classic doubly-linked list in
 Haskell.



 data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL)



 Chasing from one DLL to the next requires following 3 pointers on the
 heap.



 DLL ~ IORef (Maybe DLL) ~ MutVar# RealWorld (Maybe DLL) ~ Maybe DLL ~
 DLL



 That is 3 levels of indirection.



 We can trim one by simply unpacking the IORef with -funbox-strict-fields
 or UNPACK



 We can trim another by adding a 'Nil' constructor for DLL and worsening
 our representation.



 data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil



 but now we're still stuck with a level of indirection



 DLL ~ MutVar# RealWorld DLL ~ DLL



 This means that every operation we perform on this structure will be
 about half of the speed of an implementation in most other languages
 assuming we're memory bound on loading things into cache!



 Making Progress

 --



 I have been working on a number of data structures where the indirection
 of going from something in * out to an object in # which contains the real
 pointer to my target and coming back effectively doubles my runtime.



 We go out to the MutVar# because we are allowed to put the MutVar# onto
 the mutable list when we dirty it. There is a well defined write-barrier.



 I could change out the representation to use



 data DLL = DLL (MutableArray# RealWorld DLL) | Nil



 I can just store two pointers in the MutableArray# every time, but this
 doesn't help _much_ directly. It has reduced the amount of distinct
 addresses in memory I touch on a walk of the DLL from 3 per object to 2.



 I still have to go out to the heap from my DLL and get to the array
 object and then chase it to the next DLL and chase that to the next array.
 I do get my two pointers together in memory though. I'm paying for a card
 marking table as well, which I don't particularly need with just two
 pointers, but we can shed that with the SmallMutableArray# machinery
 added back in 7.10, which is just the old array code a a new data type,
 which can speed things up a bit when you don't have very big arrays:



 data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil



 But what if I wanted my object itself to live in # and have two mutable
 fields and be able to share the sme write barrier?



 An ArrayArray# points directly to other unlifted array types. What if we
 have one # - * wrapper on the outside to deal with the impedence mismatch
 between the imperative world and Haskell, and then just let the
 ArrayArray#'s hold other arrayarrays.



 data DLL = DLL

Re: ArrayArrays