Re: darcs switchover
John Goerzen [EMAIL PROTECTED] writes: Meanwhile, I noted that the HaXml repo on darcs.haskell.org seems to be a verbatim copy of the darcs repo at York. Ahh. You are correct. Re-converting now, since you've presumably committed patches to the darcs side, is probably not going to be practical. Actually, the way I have been working is to commit changes to CVS first, then to (forget to) propagate them into darcs. AFAIK, the two repositories are in synch right now, but if there is ever any discrepancy, I always treat the CVS one as correct. Thus, now would be an excellent time to re-convert, and I would then throw away my own darcs repo and switch to treating the darcs.haskell.org repo as the master. I can't remember where the rest of the ghc conversion process has reached - is it also about ready to switch over to darcs-as-master yet? Regards, Malcolm ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
GHC vs. GCC on raw vector addition
Hi List, I'm running GHC and GCC head-to-head on the task of adding a bunch of long IOUArray-Vectors really fast. My machine is a Linux-ppc PowerBook and gets a runtime for the GHC-compiled binary that's about 10x as long as for GCC. Simon M. tells me this should be much better. Here are the precise command lines: $ gcc -O3 -o b-addvec-gcc b-addvec-c.c $ ghc -O2 -fasm --make -o b-addvec-ghc b-addvec-hs.hs $ time ./b-addvec-gcc 10 elements 1000 real0m5.130s user0m4.466s sys 0m0.061s $ time ./b-addvec-ghc 10 elements 1000 real0m49.701s user0m43.466s sys 0m0.586s (compiling with -fvia-C -optc-O3 runs only about 1 second longer) Can somebody shed some light on this? Regards, Sven Moritz #include stdio.h #include stdlib.h int main(int argc, char **argv) { int nelems, niterations; int i; double *x, *v; if(strcmp(argv[2], elements) == 0) { nelems = atoi(argv[1]); niterations = atoi(argv[3]); } else if(strcmp(argv[2], iterations) == 0) { nelems = atoi(argv[3]); niterations = atoi(argv[1]); } else { exit(-1); } x = malloc(nelems * sizeof(double)); v = malloc(nelems * sizeof(double)); srand(time()); for(i=0; inelems; i++) { x[i] = 0.0; v[i] = ((double)rand()/RAND_MAX)*2 - 1.0; } for(i=0; initerations; i++) { int j; for(j=0; jnelems; j++) x[j] = x[j] + v[j]; } /* for(i=0; inelems; i++) printf(%f , x[i]); printf(\n); */ return 0; } import Data.Array.IO import System.Environment (getArgs) import System.Random type Vector = IOUArray Int Double main = do (n:f:m:_) - getArgs let (nelems,niterations) = case f of elements- (read n, read m) iterations - (read m, read n) x - newArray (0,nelems-1) 0 :: IO Vector v - newArray_ (0,nelems-1) :: IO Vector for 0 nelems $ \i - do r - randomRIO (-1,1) writeArray v i r for 0 niterations $ \_ - for 0 nelems $ \i - do xi - readArray x i vi - readArray v i writeArray x i (xi+vi) --for 0 nelems $ \i - -- do xi - readArray x i -- putStr (show xi) -- putChar ' ' --putChar '\n' for :: Int - Int - (Int - IO a) - IO () --for i n f -- | i n = do f i -- for (i+1) n f -- | otherwise = return () for i n f = mapM_ f [i..n-1] signature.asc Description: OpenPGP digital signature ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: GHC vs. GCC on raw vector addition
Sven Moritz Hallberg [EMAIL PROTECTED] writes: I'm running GHC and GCC head-to-head on the task of adding a bunch of long IOUArray-Vectors really fast. My machine is a Linux-ppc PowerBook and gets a runtime for the GHC-compiled binary that's about 10x as long as for GCC. Is it possible that gcc is making use of the ppc AltiVec instructions, and ghc is not? Regards, Malcolm ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: GHC vs. GCC on raw vector addition
Hello Sven, Wednesday, January 18, 2006, 3:33:40 PM, you wrote: SMH and gets a runtime for the GHC-compiled binary that's about 10x as long SMH as for GCC. Simon M. tells me this should be much better. Here are the attached version is only 5 times slower :) please note that 1) unsafeRead/Write indexes from 0 and don't checks bounds (just C-like :) 2) generating random values takes about 1.5-2 seconds by itself. Haskell's RNG is very different from C's one -- Best regards, Bulatmailto:[EMAIL PROTECTED] new.hs Description: Binary data ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: GHC vs. GCC on raw vector addition
Bulat Ziganshin wrote: Wednesday, January 18, 2006, 3:33:40 PM, you wrote: SMH and gets a runtime for the GHC-compiled binary that's about 10x as long SMH as for GCC. Simon M. tells me this should be much better. Here are the attached version is only 5 times slower :) please note that 1) unsafeRead/Write indexes from 0 and don't checks bounds (just C-like :) 2) generating random values takes about 1.5-2 seconds by itself. Haskell's RNG is very different from C's one I squeezed a bit more out (see attached). I think the main bottleneck is now the random number generator, in particular it is supplying boxed Doubles which have to be unboxed again before storing in the array. Cheers, Simon import Data.Array.IO import Data.Array.Base import System.Environment (getArgs) import System.Random type Vector = IOUArray Int Double main = do (n:f:m:_) - getArgs let (nelems,niterations) = case f of elements- (read n, read m) iterations - (read m, read n) x - newArray (0,nelems-1) 0 :: IO Vector v - newArray_ (0,nelems-1) :: IO Vector x `seq` v `seq` return () for 0 nelems $ \i - do r - randomRIO (-1,1) unsafeWrite v i r for 0 niterations $ \_ - for 0 nelems $ \i - do xi - unsafeRead x i vi - unsafeRead v i unsafeWrite x i (xi+vi) --for 0 nelems $ \i - -- do xi - unsafeRead x i -- putStr (show xi) -- putChar ' ' --putChar '\n' for :: Int - Int - (Int - IO a) - IO () -- Faster equivalent of mapM_ action [from..to-1] for from to action | from `seq` to `seq` False = undefined for from to action = go from where go i | i=to = return () | otherwise = do action i go $! (i+1) ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re[2]: GHC vs. GCC on raw vector addition
Hello Malcolm, Wednesday, January 18, 2006, 4:22:23 PM, you wrote: I'm running GHC and GCC head-to-head on the task of adding a bunch of long IOUArray-Vectors really fast. My machine is a Linux-ppc PowerBook and gets a runtime for the GHC-compiled binary that's about 10x as long as for GCC. MW Is it possible that gcc is making use of the ppc AltiVec instructions, MW and ghc is not? :) even C version performs only 20 millions of additions in one second because this program is most limited by memory throughput - it access to 24 memory bytes per each addition. GHC just can't produce simple loops even for imperative code. JHC can be much better in that area, i strongly recommend Sven to try it -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Lexically scoped type variables
Simon Peyton-Jones wrote: I'm very interested to know whether you like it or hate it. In the latter case, I'd also like to know whether you also have programs that will be broken by the change. I don't use GADTs yet and I assume this change will not (seriously) break our code, but let me/us know which compiler we should use for testing. Cheers Christian ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re[2]: GHC vs. GCC on raw vector addition
Hello Simon, Wednesday, January 18, 2006, 5:31:25 PM, you wrote: 2) generating random values takes about 1.5-2 seconds by itself. Haskell's RNG is very different from C's one SM I squeezed a bit more out (see attached). x `seq` v `seq` return () it's new trick for me :) now the difference is less than 3x btw, i also use return $! length xs trick to ensure that all xs elements will be evaluated for from to action | from `seq` to `seq` False = undefined and this changes nothing, at least with 6.4.1/mingw32 btw, using mapM_ action [n..m] is very common operation. can it be automatically substituted with my code by using some RULE pragmas in ghc libraries? that will automatically improve many ghc-compiled programs too, i use the following code instead of replicateM: myReplicateM n action = if (n=5*10^4) then sequence (replicate n action) else goLarge n [] = return.reverse where goLarge 0 xs = return xs goLarge n xs = do x - action (goLarge $! n-1) $! x:xs it doesn't overflow stack and works much faster for the large n. that is my testbed for this function: import Control.Monad main = do a - replicateM 1 $ myReplicateM (1*10^6) (return 1) return $! sum (map last a) and also, how about adding to GHC strictness annotations? x - newArray (0,nelems-1) 0 :: IO !Vector v - newArray_ (0,nelems-1) :: IO !Vector for :: !Int - !Int - (!Int - IO a) - IO () it's SO common source of performance problems... SM I think the main bottleneck SM is now the random number generator, in particular it is supplying boxed SM Doubles which have to be unboxed again before storing in the array. as i say, it uses 1.5-2 seconds, i.e. only 10% of time when you run 1000 iterations (may be you not noticed that it used only in initialization?). so, while RNG itself runs 150 times slower (!), it doesn't make so much difference when you run 1000 iterations after initial filling the array and about using Altivec instructions. the code produced for new.hs contains only one `fadd` operation, so it is easy to find entire cycle as it is compiled by GHC. that is one: movl(%ebp), %eax cmpl12(%esi), %eax jge L81 movl8(%esi), %edx leal8(%edx,%eax,8), %eax movl(%eax), %edx movl%edx, 16(%esp) movl4(%eax), %eax movl%eax, 20(%esp) fldl16(%esp) fstpl 24(%esp) fldl24(%esp) fstpl 48(%esp) movl(%ebp), %eax movl4(%esi), %edx leal8(%edx,%eax,8), %eax movl(%eax), %edx movl%edx, 8(%esp) movl4(%eax), %eax movl%eax, 12(%esp) fldl8(%esp) fstpl 24(%esp) fldl24(%esp) fstpl 40(%esp) fldl48(%esp) faddl 40(%esp) fstpl 32(%esp) movl(%ebp), %ecx movl8(%esi), %eax leal8(%eax,%ecx,8), %ecx fldl32(%esp) fstpl 24(%esp) movl24(%esp), %eax movl28(%esp), %edx movl%eax, (%ecx) movl%edx, 4(%ecx) incl(%ebp) movl$_s3IY_info, %eax L85: jmp *%eax L81: good work, yes? ;-) the C source is also amateur :) IF_(s3IY_entry) { W_ _c3MF; StgDouble _s3IP; StgDouble _s3IQ; StgDouble _s3IS; W_ _s3IW; FB_ _c3MF = (I_)(*Sp) = (I_)(R1.p[3]); if (_c3MF = 0x1U) goto _c3MI; _s3IP = PK_DBL((P_)(((R1.p[2]) + 0x8U) + ((*Sp) 0x3U))); _s3IQ = PK_DBL((P_)(((R1.p[1]) + 0x8U) + ((*Sp) 0x3U))); _s3IS = _s3IP + _s3IQ; ASSIGN_DBL((P_)(((R1.p[2]) + 0x8U) + ((*Sp) 0x3U)),_s3IS); _s3IW = (*Sp) + 0x1U; *Sp = _s3IW; JMP_((W_)s3IY_info); _c3MI: R1.p = (P_)(W_)GHCziBase_Z0T_closure; Sp=Sp+1; JMP_(*Sp); FE_ } the only cause that this code is only 3 times slower is that C version is really limited by memory speed. when tested on 1000-element arrays, it is 20 times slower. i'm not yet tried SSE optimization for gcc ;) -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Error in GHC
I'm trying to run the following sequence on ghc 6.4: ghc -fglasgow-exts --make Main ghc -o exec Main.o Exemplo1.oBut I always get this error message after the second command:/usr/lib/ghc-6.4/libHSrts.a(Main.o)(.text+0xe): In function `main':: undefined reference to `__stginit_ZCMain'/usr/lib/ghc-6.4/libHSrts.a(Main.o)(.text+0x28): In function `main':: undefined reference to `ZCMain_main_closure'collect2: ld returned 1 exit statusPlease, does anybody know what can I do to fix it?ThanksTays Cristina do Amaral Pales Soares Yahoo! doce lar. Faça do Yahoo! sua homepage.___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re[3]: GHC vs. GCC on raw vector addition
Hello Bulat, Wednesday, January 18, 2006, 8:34:54 PM, you wrote: BZ the only cause that this code is only 3 times slower is that C version BZ is really limited by memory speed. when tested on 1000-element BZ arrays, it is 20 times slower. i'm not yet tried SSE optimization for BZ gcc ;) sorry, with the gcc -O3 -ffast-math -fstrict-aliasing -funroll-loops the C version is 50 times faster than best Haskell one... it's the loop from C version: L18: fldl (%edx) faddl (%ecx) fstpl (%edx) fldl 8(%edx) faddl 8(%ecx) fstpl 8(%edx) fldl 16(%edx) faddl 16(%ecx) fstpl 16(%edx) fldl 24(%edx) faddl 24(%ecx) addl $4,%ebx addl $32,%ecx fstpl 24(%edx) addl $32,%edx cmpl -4(%ebp),%ebx jl L18 -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Error in GHC
On 1/18/06, Tays Soares [EMAIL PROTECTED] wrote: I'm trying to run the following sequence on ghc 6.4: ghc -fglasgow-exts --make Main ghc -o exec Main.o Exemplo1.o But I always get this error message after the second command: /usr/lib/ghc-6.4/libHSrts.a(Main.o)(.text+0xe): In function `main': : undefined reference to `__stginit_ZCMain' /usr/lib/ghc-6.4/libHSrts.a(Main.o)(.text+0x28): In function `main': : undefined reference to `ZCMain_main_closure' collect2: ld returned 1 exit status Please, does anybody know what can I do to fix it? Does your main module contain a function called main? Does ghc -fglasgow-exts --make Main -o exec work? -- Friendly, Lemmih ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: GHC vs. GCC on raw vector addition
On Wed, Jan 18, 2006 at 06:18:29PM +0300, Bulat Ziganshin wrote: :) even C version performs only 20 millions of additions in one second because this program is most limited by memory throughput - it access to 24 memory bytes per each addition. GHC just can't produce simple loops even for imperative code. JHC can be much better in that area, i strongly recommend Sven to try it Jhc doesn't have 'true' arrays yet, partially because I have not decided how points-to analysis should work for them. (I will probably just union all their points-to information since they most likely will be filled by the same routine). GHCs indirect calls are really killing its performance in tight loops. I think there is room for collaboration between the various compilers there, since we are all moving to a c-- back end (in theory) we could work on a common c-- - C translator that searches out such uneeded indirections and zaps them before they get to gcc which doesn't handle them well at all. John -- John Meacham - ⑆repetae.net⑆john⑈ ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: GHC vs. GCC on raw vector addition
On Wed, Jan 18, 2006 at 08:54:43PM +0300, Bulat Ziganshin wrote: sorry, with the gcc -O3 -ffast-math -fstrict-aliasing -funroll-loops the C version is 50 times faster than best Haskell one... it's the loop from C version: I believe something similar to what I noted here is the culprit: http://www.haskell.org//pipermail/glasgow-haskell-users/2005-October/009174.html it is fixable, but not without modifying ghc. John -- John Meacham - ⑆repetae.net⑆john⑈ ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users