Re: CMM-to-ASM: Register allocation weirdness

2016-06-16 Thread Ben Gamari

Ccing David Spitzenberg, who has thought about proc-point splitting, which
is relevant for reasons that we will see below.


Harendra Kumar  writes:

> On 16 June 2016 at 13:59, Ben Gamari  wrote:
>>
>> It actually came to my attention while researching this that the
>> -fregs-graph flag is currently silently ignored [2]. Unfortunately this
>> means you'll need to build a new compiler if you want to try using it.
>
> Yes, I did try both -fregs-graph and -fregs-iterative. To debug why nothing
> changed I had to compare the executables produced with and without the
> flags and found them identical. A note in the manual could have saved me
> some time, since that's the first place to go for help. I was wondering
> whether I was making a mistake in the build and whether it was not being
> rebuilt properly. Your note confirms my observation: it indeed does not
> change anything.
>
Indeed; I've opened D2335 [1] to reenable -fregs-graph and add an
appropriate note to the users guide.

>> All-in-all, the graph coloring allocator is in great need of some love;
>> Harendra, perhaps you'd like to have a try at dusting it off and perhaps
>> look into why it regresses in compiler performance? It would be great if
>> we could use it by default.
>
> Yes, I can try that. In fact I was going in that direction and then stopped
> to look at what LLVM does. LLVM gave me impressive results in some cases,
> though not so great in others. I compared the code generated by LLVM and it
> perhaps did a better job in theory (it used fewer instructions), but due to
> more spilling the end result was pretty similar.
>
For the record, I have also struggled with register spilling issues in
the past. See, for instance, #10012, which describes a behavior that
arises from the C-- sinking pass's unwillingness to duplicate code
across branches. While in general it's good to avoid the code bloat that
such duplication implies, in the case shown in that ticket duplicating
the computation would cost significantly less code than the bloat from
spilling the needed results.
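
A rough, source-level caricature of the trade-off (a hypothetical
example, not the code from #10012): a cheap value is computed before a
branch and then used in every alternative. Keeping it live across the
branch can force a spill, whereas recomputing it in each arm would add
only a couple of instructions.

    -- Hypothetical illustration only; the real issue plays out at the
    -- Cmm level, where the sinking pass declines to duplicate the
    -- computation into the branches.
    classify :: Int -> Int -> Int
    classify tag x =
      let y = x * 2 + 1   -- cheap, and a candidate for per-branch duplication
      in case tag of
           0 -> y + 10
           1 -> y * 3
           _ -> y - 7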

> But I found a few interesting optimizations that LLVM did. For example,
> there was a heap adjustment and check on the looping path which was
> redundant: it was readjusted in the loop itself without being used. LLVM
> either removed the redundant _adjustments_ in the loop or moved them out
> of the loop. But it did not remove the corresponding heap _checks_. That
> makes me wonder whether the redundant heap checks can also be moved or
> removed. If we could do some sort of loop analysis at the CMM level
> itself, we could avoid or remove the redundant heap adjustments as well
> as the checks, or at least float them out of the cycle wherever possible.
> That sort of optimization can make a significant difference, at least in
> my case. Since we are explicitly aware of the heap at the CMM level,
> there may be an opportunity to do better than LLVM if we optimize the
> generated CMM or the generation of the CMM itself.
>
Very interesting, thanks for writing this down! Indeed if these checks
really are redundant then we should try to avoid them. Do you have any
code you could share that demonstrates this?

It would be great to open Trac tickets to track some of the optimization
opportunities that you noted we may be missing. Trac tickets are far
easier to follow over longer durations than mailing list conversations,
which tend to get lost in the noise after a few weeks pass.
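
Returning to the heap-check observation above: the kind of code where
this tends to matter is a tight loop whose common path allocates
nothing, with allocation only on a rarely taken branch. A minimal
hypothetical sketch (not the reporter's actual code); the question is
whether the heap adjustment/check that lands on the looping path could
be dropped or floated out of the hot path:

    {-# LANGUAGE BangPatterns #-}
    module LoopSketch (collectMultiples) where

    -- Only the rare branch (every 1000th iteration) builds a cons cell;
    -- the common path is a plain counting step.
    collectMultiples :: Int -> Int -> [Int] -> [Int]
    collectMultiples !i n acc
      | i > n             = acc
      | i `mod` 1000 == 0 = collectMultiples (i + 1) n (i : acc)
      | otherwise         = collectMultiples (i + 1) n acc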

> A thought that came to my mind was whether we should focus on getting
> better code out of the LLVM backend or the native code generator. LLVM
> seems pretty good at the specialized task of code generation and low-level
> optimization; it is well funded, widely used and has big community
> support. That allows us to leverage that huge effort and take advantage of
> new developments. Does it make sense to outsource the code generation and
> low-level optimization tasks to LLVM, with GHC focusing on higher-level
> optimizations which are harder to do at the LLVM level? What are the
> downsides of using LLVM exclusively in future?
>

There is indeed a question of where we wish to focus our optimization
efforts. However, I think using LLVM exclusively would be a mistake.
LLVM is a rather large dependency that has in the past been difficult to
track (this is why we now only target one LLVM release in a given GHC
release). Moreover, it's significantly slower than our existing native
code generator. There are a number of reasons for this, some of which
are fixable. For instance, we currently make no effort to tell LLVM
which passes are worth running and which we've already handled; this
should be fixed, but it will require a significant investment by someone
to determine how GHC's and LLVM's passes overlap, how they interact, and
generally which are helpful (see GHC #11295).

Furthermore, there are a few annoying impedance mismatches between Cmm
and LLVM's representation. This can be seen 

RE: CMM-to-ASM: Register allocation weirdness

2016-06-16 Thread Simon Peyton Jones
| All-in-all, the graph coloring allocator is in great need of some love;
| Harendra, perhaps you'd like to have a try at dusting it off and perhaps
| look into why it regresses in compiler performance? It would be great if
| we could use it by default.

I second this. Volunteers are sorely needed.

Simon


Re: CMM-to-ASM: Register allocation weirdness

2016-06-16 Thread Harendra Kumar
That's a nice read, thanks for the pointer. I agree with the solution
presented there. If we can do that, it will be awesome. If help is needed,
I can spend some time on it.

One of the things I noticed is that the code can be optimized
significantly if we know the common case, so that we can optimize that
path at the expense of the less common path. At times I saw wild
differences in performance from just a very small change in the source. I
could attribute the difference to code blocks having moved, differently
placed jump instructions, or changes in register allocation impacting the
common case more. This could be avoided if we knew the common case.

The common case is not visible or obvious to low-level tools. It is easier
to write code in a low-level language like C such that it is closer to how
it will run on the processor, and we can also easily influence gcc from the
source level. It is harder to do the same in a high-level language like
Haskell, and perhaps there is no point in doing so. What we can do instead
is use the LLVM toolchain to perform feedback-directed optimization, which
will adjust the low-level code based on the feedback runs. That would come
essentially for free, since it can be done at the LLVM level.

My point is that investing in better LLVM integration will pay off in
areas like this.

-harendra

On 16 June 2016 at 16:48, Karel Gardas  wrote:

> On 06/16/16 12:53 PM, Harendra Kumar wrote:
>
>> A thought that came to my mind was whether we should focus on getting
>> better code out of the LLVM backend or the native code generator. LLVM
>> seems pretty good at the specialized task of code generation and
>> low-level optimization; it is well funded, widely used and has big
>> community support. That allows us to leverage that huge effort and take
>> advantage of new developments. Does it make sense to outsource the code
>> generation and low-level optimization tasks to LLVM, with GHC focusing
>> on higher-level optimizations which are harder to do at the LLVM level?
>> What are the downsides of using LLVM exclusively in future?
>>
>
> Good reading IMHO about the topic is here:
> https://ghc.haskell.org/trac/ghc/wiki/ImprovedLLVMBackend
>
> Cheers,
> Karel
>
>


Re: CMM-to-ASM: Register allocation weirdness

2016-06-16 Thread Karel Gardas

On 06/16/16 12:53 PM, Harendra Kumar wrote:

A thought that came to my mind was whether we should focus on getting
better code out of the LLVM backend or the native code generator. LLVM
seems pretty good at the specialized task of code generation and
low-level optimization; it is well funded, widely used and has big
community support. That allows us to leverage that huge effort and take
advantage of new developments. Does it make sense to outsource the code
generation and low-level optimization tasks to LLVM, with GHC focusing
on higher-level optimizations which are harder to do at the LLVM level?
What are the downsides of using LLVM exclusively in future?


Good reading IMHO about the topic is here: 
https://ghc.haskell.org/trac/ghc/wiki/ImprovedLLVMBackend


Cheers,
Karel



Re: CMM-to-ASM: Register allocation weirdness

2016-06-16 Thread Harendra Kumar
On 16 June 2016 at 13:59, Ben Gamari  wrote:
>
> It actually came to my attention while researching this that the
> -fregs-graph flag is currently silently ignored [2]. Unfortunately this
> means you'll need to build a new compiler if you want to try using it.

Yes, I did try both -fregs-graph and -fregs-iterative. To debug why nothing
changed I had to compare the executables produced with and without the
flags and found them identical. A note in the manual could have saved me
some time, since that's the first place to go for help. I was wondering
whether I was making a mistake in the build and whether it was not being
rebuilt properly. Your note confirms my observation: it indeed does not
change anything.

> All-in-all, the graph coloring allocator is in great need of some love;
> Harendra, perhaps you'd like to have a try at dusting it off and perhaps
> look into why it regresses in compiler performance? It would be great if
> we could use it by default.

Yes, I can try that. In fact I was going in that direction and then stopped
to look at what LLVM does. LLVM gave me impressive results in some cases,
though not so great in others. I compared the code generated by LLVM and it
perhaps did a better job in theory (it used fewer instructions), but due to
more spilling the end result was pretty similar.

But I found a few interesting optimizations that LLVM did. For example,
there was a heap adjustment and check on the looping path which was
redundant: it was readjusted in the loop itself without being used. LLVM
either removed the redundant _adjustments_ in the loop or moved them out
of the loop. But it did not remove the corresponding heap _checks_. That
makes me wonder whether the redundant heap checks can also be moved or
removed. If we could do some sort of loop analysis at the CMM level
itself, we could avoid or remove the redundant heap adjustments as well
as the checks, or at least float them out of the cycle wherever possible.
That sort of optimization can make a significant difference, at least in
my case. Since we are explicitly aware of the heap at the CMM level,
there may be an opportunity to do better than LLVM if we optimize the
generated CMM or the generation of the CMM itself.

A thought that came to my mind was whether we should focus on getting
better code out of the LLVM backend or the native code generator. LLVM
seems pretty good at the specialized task of code generation and low-level
optimization; it is well funded, widely used and has big community
support. That allows us to leverage that huge effort and take advantage of
new developments. Does it make sense to outsource the code generation and
low-level optimization tasks to LLVM, with GHC focusing on higher-level
optimizations which are harder to do at the LLVM level? What are the
downsides of using LLVM exclusively in future?

-harendra


Re: ghc-pkg, package database path containing a trailing slash, and ${pkgroot}

2016-06-16 Thread Ben Gamari
Nicolas Dudebout  writes:

> When passing a package database to ghc-pkg via GHC_PACKAGE_PATH or
> --package-db, ${pkgroot} does not get computed properly if the input path
> contains a trailing slash.
>
Thanks for the report, Nicolas. I've opened #12196 to track this and
proposed a fix in D2336.
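
For context, ${pkgroot} is (roughly speaking) the directory containing
the package database, so a trailing slash throws the computation off by
one level. A sketch of the kind of normalisation involved, as an
illustration only and not necessarily what D2336 actually does:

    import System.FilePath (dropTrailingPathSeparator, takeDirectory)

    -- Strip any trailing separator before taking the parent directory;
    -- without this, a path like "some/dir/package.conf.d/" would yield
    -- the database directory itself rather than its parent.
    pkgroot :: FilePath -> FilePath
    pkgroot dbPath = takeDirectory (dropTrailingPathSeparator dbPath)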

Cheers,

- Ben





Re: CMM-to-ASM: Register allocation weirdness

2016-06-16 Thread Ben Gamari
Harendra Kumar  writes:

> My earlier experiment was on GHC-7.10.3. I repeated it on GHC-8.0.1 and
> the assembly trace was exactly the same except for a marginal improvement:
> the 8.0.1 code generator removed the r14/r11 swap, but the rest of the
> register ring shift remains the same. I have updated the GitHub gist with
> the 8.0.1 trace:
>

Have you tried compiling with -fregs-graph [1] (the graph-coloring
allocator)?

By default GHC uses a very naive linear register allocator which I'd
imagine may produce these sorts of results. At some point there was an
effort to make -fregs-graph the default (see #2790) but it is
unfortunately quite slow despite having a relatively small impact on
produced-code quality in most cases. However, in your case it may be
worth enabling. Note, however, that the graph coloring allocator has a
few quirks of its own (see #8657 and #7697).

It actually came to my attention while researching this that the
-fregs-graph flag is currently silently ignored [2]. Unfortunately this
means you'll need to build a new compiler if you want to try using it.
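
For reference, once the flag is honoured again, the usual way to request
the graph coloring allocator for a hot module would be an OPTIONS_GHC
pragma (or the same flags on the command line). A minimal sketch,
assuming a GHC in which -fregs-graph is not ignored; the module and
function are hypothetical:

    -- Request the graph coloring allocator (with iterative coalescing)
    -- for this module only.
    {-# OPTIONS_GHC -O2 -fregs-graph -fregs-iterative #-}
    module HotLoop (sumTo) where

    sumTo :: Int -> Int
    sumTo n = go 0 1
      where
        go acc i
          | i > n     = acc
          | otherwise = go (acc + i) (i + 1)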

Simon Marlow: If we really want to disable this option we should at the
very least issue an error when the user requests it. However, it really
seems to me that we shouldn't disable it at all; why not just allow the
user to use it and add a note to the documentation stating that the graph
coloring allocator may fail for some programs, and if it breaks, the user
gets to keep both pieces?

All-in-all, the graph coloring allocator is in great need of some love;
Harendra, perhaps you'd like to have a try at dusting it off and perhaps
look into why it regresses in compiler performance? It would be great if
we could use it by default.

Cheers,

- Ben


[1] 
http://downloads.haskell.org/~ghc/master/users-guide//using-optimisation.html?highlight=register%20graph#ghc-flag--fregs-graph
[2] 
https://git.haskell.org/ghc.git/commitdiff/f0a7261a39bd1a8c5217fecba56c593c353f198c

