Re: Faster Array#/MutableArray# copies

2011-02-18 Thread Max Bolingbroke
On 18 February 2011 01:18, Johan Tibell johan.tib...@gmail.com wrote:
 C compilers, like gcc, go to great lengths making memcpy fast and I
 was thinking that we might be able to steal a trick or two from them.
 I'd like some feedback on these ideas:

It seems like a sufficient solution for your needs would be for us to
use the LTO support in LLVM to inline across module boundaries - in
particular to inline primop implementations into their call sites.
LLVM would then probably deal with unrolling small loops with
statically known bounds.
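
As a rough C analogue of the effect described above (a sketch only, not what GHC or its primops actually do): once a generic copy loop is inlined into a call site where the length is a compile-time constant, the optimiser is free to unroll it fully. The names `copy_words` and `copy4` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic word-by-word copy. The bound n is only known at run time
   unless the call is inlined with a constant argument. */
static inline void copy_words(uintptr_t *dst, const uintptr_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical call site with a statically known length of 4.
   After copy_words is inlined here the loop bound is a constant,
   so the compiler may unroll the loop completely. */
void copy4(uintptr_t *dst, const uintptr_t *src)
{
    copy_words(dst, src, 4);
}
```

Whether the loop really is unrolled depends on the compiler and optimisation level; inspecting the generated assembly is the only way to be sure.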

I don't think this would require a major change to GHC, though LTO
would only work with the Gold linker (which only supports ELF) at the
moment :-(

Cheers,
Max

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Behavior of the -H RTS option, possible doc/impl mismatch

2011-02-18 Thread Akio Takano
Hi Simon,

Thank you for the explanation. I think I now understand why -H behaves that way.

2011/2/17 Simon Marlow marlo...@gmail.com:
 Anyway, with -N2 and above I don't recommend using -H, generally I've found
 it results in lower performance.  -A1m might be good if your CPUs have
 larger L2 caches.  I have some local patches that implement an option like
 -H but which applies to the old generation sizing rather than the nursery,
 which tends to work better with -N2 and above.

An experiment shows that my program benefits from a larger -H value, at least
with a fixed -A. Also, -A256M is much better than -A1M in my case,
perhaps because reducing the number of minor GCs is very important
for performance.

-- Takano Akio


 Cheers,
        Simon




Re: Faster Array#/MutableArray# copies

2011-02-18 Thread Roman Leshchinskiy
Max Bolingbroke wrote:
 On 18 February 2011 01:18, Johan Tibell johan.tib...@gmail.com wrote:

 It seems like a sufficient solution for your needs would be for us to
 use the LTO support in LLVM to inline across module boundaries - in
 particular to inline primop implementations into their call sites. LLVM
 would then probably deal with unrolling small loops with statically known
 bounds.

Could we simply use this?

http://llvm.org/docs/LangRef.html#int_memcpy
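
For illustration only, here is the C-level shape that usually ends up as this intrinsic: a `memcpy` call whose size is a compile-time constant. Clang normally lowers such a call to `llvm.memcpy`, which LLVM can then expand inline for small sizes. The names `N_WORDS` and `copy_small` are hypothetical, and nothing here is GHC code.

```c
#include <stdint.h>
#include <string.h>

enum { N_WORDS = 8 };  /* hypothetical small, statically known array size */

/* Copy a fixed number of machine words. Because the byte count is a
   constant expression, the compiler can see the exact size at the call. */
void copy_small(uintptr_t *dst, const uintptr_t *src)
{
    memcpy(dst, src, N_WORDS * sizeof *dst);
}
```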

Roman






Re: Faster Array#/MutableArray# copies

2011-02-18 Thread Roman Leshchinskiy
Johan Tibell wrote:

 * Could we use built-in compiler rules to catch array copies of known
 length and replace them with e.g. unrolled loops? My particular use case
 involves copying small arrays (size: 1-32). Ideally this should be as fast
 as copying a tuple of the corresponding size but I'm pretty sure we're far
 off that goal.

Out of idle curiosity, couldn't you use tuples instead of arrays?

FWIW, I agree that doing something cleverer than just calling memcpy could
be very worthwhile. As Max points out, you could perhaps try to do
something with the LLVM backend.

Roman






Recompile with -fPIC (was building a patched ghc)

2011-02-18 Thread José Pedro Magalhães
Hi all,

I'm getting the same error as Alexy below on a 64-bit Linux system. What
can I do? Adding -fPIC and also -dynamic does not seem to solve the problem.
Also, this only happens with a perf build; devel1 works fine.


Thanks,
Pedro

On Sat, Jun 26, 2010 at 05:56, braver delivera...@gmail.com wrote:

 An attempt to build the trunk gets me this:

 /opt/portage/usr/lib/gcc/x86_64-pc-linux-gnu/4.2.4/../../../../x86_64-
 pc-linux-gnu/bin/ld: rts/dist/build/RtsStartup.dyn_o: relocation
 R_X86_64_PC32 against symbol `StgRun' can not be used when making a
 shared object; recompile with -fPIC

 -- I use prefix portage on a CentOS box, admittedly a non-standard
 setup.  Its gcc is found first and it wants -fPIC...  Should I just
 add it to CFLAGS or what?

 -- Alexy
 ___
 Haskell-Cafe mailing list
 haskell-c...@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: Recompile with -fPIC (was building a patched ghc)

2011-02-18 Thread Wolfram Kahl
I am not certain, but this may be the same problem that I once had,
and that was solved by updating to binutils-2.20.

 ld --version
GNU ld (GNU Binutils) 2.20.1.20100303


Wolfram




Re: Recompile with -fPIC (was building a patched ghc)

2011-02-18 Thread José Pedro Magalhães
Thanks for the tip. I cannot easily update binutils. I am on version
2.17.50.0.6-14.el5 20061020. On another machine (32-bit) it works fine...


Cheers,
Pedro



Re: Faster Array#/MutableArray# copies

2011-02-18 Thread Tyson Whitehead
On February 17, 2011 20:18:11 Johan Tibell wrote:
  * Can we use SSE instructions?
 
  * Can we get the C memcpy code inlined into the C-- source (crazy, I
 know). If so we could perhaps benefit directly from optimizations in
 libc.

From the standpoint of numerical code, it would be very nice to get some
vector support.  Perhaps this would also give a natural expression of memcpy.

In the spirit of the common infinite register file assumption, I've always 
imagined idealized arbitrary-length vector types/operations at the C-- level 
(mapped to reality via chunking it over fixed length vector instructions).
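
The chunking idea can be sketched in C, purely as an illustration: an arbitrary-length element-wise add is split into fixed-width chunks, each standing in for one fixed-length vector instruction, with a scalar loop for the leftover elements. The width `CHUNK` and the name `vec_add` are assumptions made for this example and do not come from C-- or GHC.

```c
#include <stddef.h>

#define CHUNK 4  /* assumed hardware vector width, in elements */

/* Element-wise out[i] = a[i] + b[i] for an arbitrary length n. */
void vec_add(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Full chunks: the inner loop has a constant bound, so a compiler
       can map it onto a single fixed-width vector add. */
    for (; i + CHUNK <= n; i += CHUNK)
        for (size_t j = 0; j < CHUNK; j++)
            out[i + j] = a[i + j] + b[i + j];

    /* Remainder: fewer than CHUNK elements are left, handled one by one. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

With n = 6 and CHUNK = 4, for example, one full chunk covers elements 0 to 3 and the remainder loop handles elements 4 and 5.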

I see that LLVM supports arbitrary-length vectors as a first-class type:

http://llvm.org/docs/LangRef.html#t_vector
http://llvm.org/docs/LangRef.html#i_add

Looking at the C-- specification (section 2.4 -- page 10)

http://www.cminusminus.org/extern/man2.pdf

it seems this may not be such a good fit, as it treats everything as fixed-size
bit collections (bits8, bits16, etc.) and booleans (bool).  Presumably
memory-oriented primitive instructions are also out, as they break the strict
load/store architecture.  Has anyone thought about how to do this?

Some half-baked suggestions:

 - perhaps it should be fixed-size bit collections with repetition, or
 - built-in primitives for instruction composition (i.e., a map operation)?

Cheers!  -Tyson

