Re: Faster Array#/MutableArray# copies
On 18 February 2011 01:18, Johan Tibell johan.tib...@gmail.com wrote:
> C compilers, like gcc, go to great lengths making memcpy fast and I was thinking that we might be able to steal a trick or two from them. I'd like some feedback on these ideas:

It seems like a sufficient solution for your needs would be for us to use the LTO support in LLVM to inline across module boundaries - in particular to inline primop implementations into their call sites. LLVM would then probably deal with unrolling small loops with statically known bounds.

I don't think this would require a major change to GHC, though LTO would only work with the Gold linker (which only supports ELF) at the moment :-(

Cheers,
Max

___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Behavior of the -H RTS option, possible doc/impl mismatch
Hi Simon,

Thank you for the explanation. I think I now understand why -H behaves that way.

2011/2/17 Simon Marlow marlo...@gmail.com:
> Anyway, with -N2 and above I don't recommend using -H, generally I've found it results in lower performance. -A1m might be good if your CPUs have larger L2 caches. I have some local patches that implement an option like -H but which applies to the old generation sizing rather than the nursery, which tends to work better with -N2 and above.
>
> Cheers,
> Simon

An experiment shows my program benefits from a larger -H value, at least with a fixed -A. Also, -A256M is much better than -A1M in my case, perhaps because decreasing the number of minor GCs is very important to the performance.

-- Takano Akio
Re: Faster Array#/MutableArray# copies
Max Bolingbroke wrote:
> On 18 February 2011 01:18, Johan Tibell johan.tib...@gmail.com wrote:
>
> It seems like a sufficient solution for your needs would be for us to use the LTO support in LLVM to inline across module boundaries - in particular to inline primop implementations into their call sites. LLVM would then probably deal with unrolling small loops with statically known bounds.

Could we simply use this?

http://llvm.org/docs/LangRef.html#int_memcpy

Roman
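For readers less familiar with what a compiler can do with a memcpy of statically known size, here is a minimal C sketch. It is not GHC or RTS code: `SMALL_LEN`, `small_array`, and `small_array_copy` are hypothetical names invented for illustration. Because the byte count passed to memcpy is a compile-time constant, an optimizing compiler is free to expand the call inline rather than call into libc (Clang, for instance, generally lowers such calls to the memcpy intrinsic linked above); whether it actually does so depends on the compiler, target, and flags.

```c
#include <assert.h>
#include <string.h>

enum { SMALL_LEN = 8 };  /* hypothetical fixed array length */

/* Hypothetical stand-in for a small boxed array: SMALL_LEN pointer slots. */
typedef struct {
    void *slots[SMALL_LEN];
} small_array;

/* Copy all slots from src to dst. The size argument is the constant
   sizeof src->slots, so the compiler sees a fixed-size copy.
   dst and src must not overlap (a requirement of memcpy). */
static void small_array_copy(small_array *dst, const small_array *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst->slots, src->slots, sizeof src->slots);
}
```

Comparing the generated assembly at different optimization levels (for example with `-S`) is one way to see whether the call was expanded inline on a given toolchain.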
Re: Faster Array#/MutableArray# copies
Johan Tibell wrote:
> * Could we use built-in compiler rules to catch array copies of known length and replace them with e.g. unrolled loops? My particular use case involves copying small arrays (size: 1-32). Ideally this should be as fast as copying a tuple of the corresponding size but I'm pretty sure we're far off that goal.

Out of idle curiosity, couldn't you use tuples instead of arrays? FWIW, I agree that doing something cleverer than just calling memcpy could be very worthwhile. As Max points out, you could perhaps try to do something with the LLVM backend.

Roman
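As a rough illustration of the "known length becomes an unrolled loop" idea, the following C sketch contrasts a general small-array copy with the straight-line code a rewrite rule might produce for length 4. This is only an analogy in C, not how GHC primops are implemented; `copy_n`, `copy4_unrolled`, and `MAX_SMALL` are hypothetical names, and the bound of 32 is taken from the use case quoted above.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_SMALL = 32 };  /* upper bound from the quoted use case (1-32) */

/* General case: the length is only known at run time, so this is a loop. */
static void copy_n(void **dst, void *const *src, size_t n)
{
    assert(n <= MAX_SMALL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Known length 4: the loop is replaced by straight-line stores,
   which is what a "copy of known length" rule would aim for. */
static void copy4_unrolled(void **dst, void *const *src)
{
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}
```

Both functions produce the same result for n = 4; the second simply has no loop counter or branch, which is roughly the cost profile of copying a 4-tuple field by field.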
Recompile with -fPIC (was building a patched ghc)
Hi all,

I'm getting the same error as Alexy below on a 64-bit Linux system. What can I do? Adding -fPIC and also -dynamic does not seem to solve the problem. Also, this only happens with a perf build; devel1 works fine.

Thanks,
Pedro

On Sat, Jun 26, 2010 at 05:56, braver delivera...@gmail.com wrote:
> An attempt to build the trunk gets me this:
>
> /opt/portage/usr/lib/gcc/x86_64-pc-linux-gnu/4.2.4/../../../../x86_64-pc-linux-gnu/bin/ld: rts/dist/build/RtsStartup.dyn_o: relocation R_X86_64_PC32 against symbol `StgRun' can not be used when making a shared object; recompile with -fPIC
>
> -- I use prefix portage on a CentOS box, admittedly a non-standard setup. Its gcc is found first and it wants -fPIC... Should I just add it to CFLAGS or what?
>
> -- Alexy
> ___ Haskell-Cafe mailing list haskell-c...@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Recompile with -fPIC (was building a patched ghc)
I am not certain, but this may be the same problem that I once had, and that was solved by updating to binutils-2.20.

ld --version
GNU ld (GNU Binutils) 2.20.1.20100303

Wolfram

On Fri, Feb 18, 2011 at 11:34:03AM +0100, José Pedro Magalhães wrote:
> Hi all, I'm getting the same error as Alexy below in some 64bit linux system. What can I do? Adding -fPIC and also -dynamic does not seem to solve the problem. Also, this only happens with a perf build; devel1 works fine.
>
> Thanks, Pedro
Re: Recompile with -fPIC (was building a patched ghc)
Thanks for the tip. I cannot easily update binutils. I am on version 2.17.50.0.6-14.el5 20061020. On another machine (32-bit) it works fine...

Cheers,
Pedro

2011/2/18 Wolfram Kahl k...@cas.mcmaster.ca
> I am not certain, but this may be the same problem that I once had, and that was solved by updating to binutils-2.20.
>
> ld --version
> GNU ld (GNU Binutils) 2.20.1.20100303
>
> Wolfram
Re: Faster Array#/MutableArray# copies
On February 17, 2011 20:18:11 Johan Tibell wrote:
> * Can we use SSE instructions?
>
> * Can we get the C memcpy code inlined into the C-- source (crazy, I know). If so we could perhaps benefit directly from optimizations in libc.

From the standpoint of numerical code, it would be very nice to get some vector . Perhaps this would also give a natural expression of memcpy.

In the spirit of the common infinite register file assumption, I've always imagined idealized arbitrary-length vector types/operations at the C-- level (mapped to reality via chunking them over fixed-length vector instructions).

I see that LLVM supports arbitrary-length vectors as a first-class type:

http://llvm.org/docs/LangRef.html#t_vector
http://llvm.org/docs/LangRef.html#i_add

Looking at the C-- specification (section 2.4, page 10)

http://www.cminusminus.org/extern/man2.pdf

it seems this may not be such a good fit, as it considers everything as fixed-size bit collections (bits8, bits16, etc.) and booleans (bool). Presumably memory-oriented primitive instructions are also out, as they break the strict load/store architecture.

Has anyone thought of how to do this? Some half-baked suggestions:

- perhaps it should be fixed-size bit collections with repetition, or
- built-in primitives for instruction composition (i.e., a map operation)?

Cheers!
-Tyson
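The "chunking over fixed-length vector instructions" idea can be sketched in plain C as follows. This is an illustration only, not C-- or LLVM IR: `vec_add` and `LANES` are hypothetical names, and a width of 4 floats is an assumption chosen to match a 128-bit SSE register. The arbitrary-length operation is split into full chunks of `LANES` elements, each of which could map onto one fixed-width vector instruction, plus a scalar loop for the leftover elements.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 };  /* assumed hardware vector width, in floats */

/* Element-wise add of two arbitrary-length vectors, written as
   fixed-width chunks plus a scalar remainder. */
static void vec_add(float *out, const float *a, const float *b, size_t n)
{
    assert(n == 0 || (out != NULL && a != NULL && b != NULL));

    size_t i = 0;

    /* Full chunks: the inner loop has a constant trip count of LANES,
       so each iteration of the outer loop corresponds to one
       fixed-length vector operation. */
    for (; i + LANES <= n; i += LANES)
        for (size_t j = 0; j < LANES; j++)
            out[i + j] = a[i + j] + b[i + j];

    /* Remainder: fewer than LANES elements are left over. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

With n = 6 this performs one full chunk of 4 and then 2 scalar additions. A memcpy expressed the same way would just replace the addition with a plain element move.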