Re: Improving GHC GC for latency-sensitive networked services

2016-10-19 Thread Alexander Kjeldaas
On Tue, Oct 18, 2016 at 4:46 PM, Niklas Hambüchen  wrote:

> I'll be lazy and answer the simplest question in this thread :)
>
> On 18/10/16 16:32, Simon Marlow wrote:
> > If not, are you willing to recompile GHC and all your libraries?
>
> Yes.
>

I'll add that managing this is probably a lot easier now than it was back
then.  Today you would just add a flag in stack.yaml, get a cup of coffee,
and the tooling would guarantee that there's no breakage.
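
For illustration, a minimal sketch of what "add a flag in stack.yaml" might
look like, assuming Stack's "$everything" key for ghc-options (the flag
itself is a placeholder, not a real GHC option):

    # stack.yaml
    resolver: lts-7.4            # whatever snapshot the project already uses
    ghc-options:
      # apply the option to every package in the build plan, forcing a
      # coherent rebuild of all dependencies against it
      "$everything": -fsome-new-gc-flag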





Re: Improving GHC GC for latency-sensitive networked services

2016-10-18 Thread Niklas Hambüchen
I'll be lazy and answer the simplest question in this thread :)

On 18/10/16 16:32, Simon Marlow wrote:
> If not, are you willing to recompile GHC and all your libraries?

Yes.


Re: Improving GHC GC for latency-sensitive networked services

2016-10-18 Thread Simon Marlow
Chris,

There are a few things here.

- There are different levels of latency-sensitivity.  The system I work on
at Facebook is latency-sensitive and we have no problem with the GC (after
we implemented a few optimisations and did some tuning; an illustrative
sketch of the usual knobs follows this list).  But we're OK with pauses up
to 100ms or so, and our average pause time is <50ms with 100MB live data
on large multicore machines.  There's probably still scope to reduce that
some more.

- Thread-local heaps don't fix the pause-time issue.  They reduce the pause
time for a local collection but have no impact on the global collection,
which is still unbounded in size.

- I absolutely agree we should have incremental or concurrent collection.
It's a big project though.  Most of the technology is fairly well
understood (just read The Garbage Collection Handbook,
https://www.amazon.co.uk/gp/product/1420082795) and I have some vague
plans for what direction to take.

- The issue is not so much maintaining multiple GCs.  We already have 3 GCs
(one of which is experimental and unsupported).  The issue is more that a
new kind of GC has non-local implications because it affects read- and
write-barriers, and making a bad tradeoff can penalize the performance of
all code.  Perhaps you're willing to give up 10% of performance to get
guaranteed 10ms pause times, but can we impose that 10% on everyone?  If
not, are you willing to recompile GHC and all your libraries?
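
The tuning mentioned in the first point isn't spelled out here, but for
anyone who wants to experiment with pause times, the usual starting points
are ordinary GHC and RTS flags along these lines (program name and sizes
purely illustrative):

    # a threaded build with RTS options enabled
    ghc -O2 -threaded -rtsopts MyService.hs

    # run on all cores, with a larger allocation area per capability and
    # idle-time GC disabled
    ./MyService +RTS -N -A64m -I0 -RTS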

Cheers
Simon




Re: Improving GHC GC for latency-sensitive networked services

2016-10-18 Thread Boespflug, Mathieu
Hi Chris,

the GC pauses when using GHC have seldom been a serious issue in most of
our projects at Tweag I/O. We do, however, have some projects with special
requirements (strong synchrony between many machines that block
frequently), and for those the GC pauses are indeed a problem. Like most
non-trivial problems, it will take a combination of solutions to reduce or
eliminate these long pauses.

The first line of work involves hacks to the GC. Making the GC incremental
would certainly be nice. Local heaps might help for some workloads, but
they are no silver bullet, as Simon PJ writes below. I think it would be
very illuminating if Simon M, or whoever else worked on the early local
heaps experiments, could post a detailed writeup of what made the
"complexity of the implementation extremely daunting" and the tradeoffs
involved. Or a link, if there already is one. :)

As Ben alluded to earlier, and as Reddit discovered some weeks ago, as part
of another line of work we are donating some ongoing effort to help with
the problem by simply taking some objects out of the GC-managed heap.
Objects that the GC just doesn't have to deal with at all (either because
they are allocated elsewhere, or not at all thanks to fusion) can relieve
the pressure on the GC. But quite apart from our effort here, which does
involve an extension to the type system to enable the programmer to make
more of her/his intent clear to the compiler, I think the compact regions
work that will be part of 8.2 is already a great step forward. It requires
some programmer assistance, but if it's GC pause times you're wrestling
with, chances are you have a very hard use case indeed, so providing that
assistance is likely easier than most other things you'll have to deal
with.

Best,

--
Mathieu Boespflug
Founder at http://tweag.io.



Re: Improving GHC GC for latency-sensitive networked services

2016-10-18 Thread Harendra Kumar
It would be awesome if we could spread out the GC work instead of stopping
the world for too long. I am a new entrant to the Haskell world, but
something similar to this was the first real problem (other than lazy IO)
that I faced with GHC. While debugging it I had to learn how the GC works
to really understand what was going on. I then learnt to always strive to
keep the retained heap to the minimum possible, but sometimes the minimum
possible can be a lot. This blog article was something of a déjà vu for
me; it seems this is not a rare problem.
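
For other newcomers hitting the same wall: both the retained heap and the
pause times can be observed without a profiled build, assuming the binary
was built with -rtsopts (the program name below is just a placeholder):

    ./my-service +RTS -s -RTS    # GC summary at exit, including average
                                 # and maximum pause per generation
    ./my-service +RTS -hT -RTS   # heap profile by closure type, written
                                 # to my-service.hp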

I guess the compact regions technique suggested by Ben can be used to work
around the problem, but it sounds like it is application-aware and users
will have to discover that solution for themselves (I might be mistaken,
though). If we want GHC to work smoothly for performance-critical
applications then we should perhaps find a cost-effective way to solve
this in an application-transparent manner.

-harendra



RE: Improving GHC GC for latency-sensitive networked services

2016-10-18 Thread Simon Peyton Jones via ghc-devs
|  I understand Marlow's thread-local heaps experiment circa 7.2/7.4 was
|  abandoned because it penalized performance too much. Does the
|  impression that there isn't the labor to maintain two GCs still hold?
|  It seems like thread-local heaps would be pervasive.

I was optimistic about thread-local heaps, but while perf did improve a bit, 
the complexity of the implementation was extremely daunting.   So we decided 
that the pain didn't justify the gain.

I'm not sure it'd help much here, since the data is long-lived and might 
migrate into the global heap anyway.

Most GCs rely on traversing live data. Here the live data is big. So really the 
only solution is to traverse it incrementally.  You can still stop-the-world, 
but you have to be able to resume normal execution before GC is complete, thus 
smearing GC out into a series of slices, interleaved with (but not necessarily 
in parallel with) the main application.

I believe that the OCaml runtime now has such a GC.  It'd be lovely to have
one for GHC.

But I defer to Simon M

Simon



Re: Improving GHC GC for latency-sensitive networked services

2016-10-17 Thread Ben Gamari
Christopher Allen  writes:

> It'd be unfortunate if more companies trying out Haskell came to the
> same result: 
> https://blog.pusher.com/latency-working-set-ghc-gc-pick-two/#comment-2866985345
> (They gave up and rewrote the service in Golang)
>
Aside: Go strikes me as an odd choice here; I would have thought they
would just move to something like Rust or C++ to avoid GC entirely and
still benefit from a reasonably expressive type system. Anyways, moving
along...

> Most of the state of the art I'm aware of (such as from Azul Systems)
> is from when I was using a JVM language, which isn't necessarily
> applicable for GHC.
>
> I understand Marlow's thread-local heaps experiment circa 7.2/7.4 was
> abandoned because it penalized performance too much. Does the
> impression that there isn't the labor to maintain two GCs still hold?
> It seems like thread-local heaps would be pervasive.
>
Yes, I believe that this indeed still holds. In general the RTS lacks
hands, and garbage collectors (especially parallel implementations)
require a fair bit of background knowledge to maintain.

> Does anyone know what could be done in GHC itself to improve this
> situation? Stop-the-world is pretty painful when the otherwise
> excellent concurrency primitives are much of why you're using Haskell.
>
Indeed it is quite painful. However, I suspect that compact regions
(coming in 8.2) could help in many workloads.

In the case of Pusher's workload (which isn't very precisely described,
so I'm guessing here) I suspect you could take batches of N messages and
add them to a compact region, essentially reducing the number of live
heap objects (and hence work that the GC must perform) by a factor of N.
Of course, in doing this you give up the ability to "retire" messages
individually. To recover this ability one could introduce a Haskell
"garbage collector" task to scan the active regions and copy messages
that should be kept into a new region, dropping those that
should be retired. Here you benefit from the fact that copying into a
compact region can be done in parallel (IIRC), allowing us to
essentially implement a copying, non-stop-the-world GC in our Haskell
program. This allows the runtime's GC to handle a large, static heap as
though it were a constant factor smaller, hopefully reducing pause
duration. That being said, this is all just wild speculation; I could be
wrong, YMMV, etc.
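
As a concrete (if entirely speculative) sketch of that batching idea, here
is roughly what it could look like against the GHC.Compact interface
expected in 8.2; the Msg type and the batching policy are made up, and the
API details may still change:

    import GHC.Compact (Compact, compact, compactAdd, getCompact)

    data Msg = Msg { msgId :: Int, msgBody :: String }

    -- Start a region from an initial batch of messages; the GC no longer
    -- traverses the individual messages inside the region.
    newBatch :: [Msg] -> IO (Compact [Msg])
    newBatch = compact

    -- Append a message to an existing batch.  Data already living in the
    -- region is shared rather than copied again.
    addToBatch :: Compact [Msg] -> Msg -> IO (Compact [Msg])
    addToBatch batch m = compactAdd batch (m : getCompact batch)

    -- The Haskell-level "garbage collector" task: copy the messages worth
    -- keeping into a fresh region and drop the old one.
    retire :: (Msg -> Bool) -> Compact [Msg] -> IO (Compact [Msg])
    retire keep batch = compact (filter keep (getCompact batch))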

Of course, another option is splitting your workload across multiple
runtime systems. Cloud Haskell is a very nice tool for this which I've
used on client projects with very good results. Obviously it isn't
always possible to segment your heap as required by this approach, but
it is quite effective when possible.

While clearly neither of these are as convenient as a more scalable
garbage collector, they are both things we can (nearly) do today.

Looking farther into the future, know that there is a group looking to add
linear types to GHC/Haskell with a separate linear heap (which needn't
be garbage collected). I'll let them elaborate if they so desire.

Cheers,

- Ben




Improving GHC GC for latency-sensitive networked services

2016-10-17 Thread Christopher Allen
It'd be unfortunate if more companies trying out Haskell came to the
same result: 
https://blog.pusher.com/latency-working-set-ghc-gc-pick-two/#comment-2866985345
(They gave up and rewrote the service in Golang)

Most of the state of the art I'm aware of (such as from Azul Systems)
is from when I was using a JVM language, which isn't necessarily
applicable for GHC.

I understand Marlow's thread-local heaps experiment circa 7.2/7.4 was
abandoned because it penalized performance too much. Does the
impression that there isn't the labor to maintain two GCs still hold?
It seems like thread-local heaps would be pervasive.

Does anyone know what could be done in GHC itself to improve this
situation? Stop-the-world is pretty painful when the otherwise
excellent concurrency primitives are much of why you're using Haskell.

--- Chris Allen