Re: [Haskell-cafe] Why does this blow the stack?
On Dec 22, 2007 12:06 AM, Stefan O'Rear [EMAIL PROTECTED] The explicit loop you're talking about is: enumDeltaInteger :: Integer - Integer - [Integer] enumDeltaInteger x d = x : enumDeltaInteger (x+d) d That code isn't very complicated, and I would hope to be able to write code like that in my own programs without having to worry about strictness. Given that the compiler even has an explicit signature, why can't it transform that code to enumDeltaInteger x d = let s = x + d in x : (seq s $ enumDeltaInteger s d) since it knows that (Integer+Integer) is strict? Of course, improving the strictness analysis is harder, but it pays off more, too. Because they simply aren't the same. Try applying your functions to undefined undefined. This took a little work for me to see. Here it is for the interested: Prelude let edi :: Integer - Integer - [Integer]; edi x d = x : edi (x+d) d Prelude let edi' :: Integer - Integer - [Integer]; edi' x d = let s = x + d in x : (seq s $ edi s d) Prelude _:_:_ - return $ edi undefined undefined Prelude _:_:_ - return $ edi' undefined undefined *** Exception: Prelude.undefined Luke ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Why does this blow the stack?
On 12/21/07, Stefan O'Rear [EMAIL PROTECTED] wrote: Because they simply aren't the same. Good point; thanks. That means that Don's patch could theoretically break some existing Haskell program: Prelude length $ take 10 ([undefined ..] :: [Integer]) 10 With his patch, that will throw. Some other types have the same issue (lazy Enum when it could be strict, leading to stack overflow): Prelude length $ take 10 ([undefined ..] :: [Double]) 10 Prelude length $ take 10 ([undefined ..] :: [Float]) 10 Prelude Foreign.C.Types length $ take 10 ([undefined ..] :: [CFloat]) 10 Prelude Foreign.C.Types length $ take 10 ([undefined ..] :: [CDouble]) 10 Prelude Foreign.C.Types length $ take 10 ([undefined ..] :: [CLDouble]) 10 Prelude Data.Ratio length $ take 10 ([undefined ..] :: [Ratio Int]) 10 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] [1/16] SBM: The Haskell and C benchmarks
Here are the 48 Haskell and C benchmarks. Don Stewart contributed three (although I had to fight a bit to make one of them compile). Jules Bean (quicksilver) contributed one. Bertram Felgenhauer (int-e) contributed three (in the form of a single file, which I untangled). Spencer Jannsen (sjannsen) contributed one. wli (William Lee Irwin III) inspired me to add the getwchar benchmarks. I used the following shell code to gather all the benchmarks: (for F in hs/*.hs c/*.c; \ do echo --; \ echo $F:;\ echo ; \ cat $F; \ done; \ echo == \ ) xx.txt They are not in the same order as in the Makefile or in the reports, unfortunately. -Peter -- hs/byte-bsacc.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString as B cnt :: Int - B.ByteString - Int cnt !acc !bs = if B.null bs then acc else cnt (acc+1) (B.tail bs) main = do s - B.getContents print (cnt 0 s) -- hs/byte-bsfoldlx.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString as B cnt :: B.ByteString - Int cnt !bs = B.foldl' (\sum _ - sum+1) 0 bs main = do s - B.getContents print (cnt s) -- hs/byte-bsfoldrx.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString as B cnt :: B.ByteString - Int cnt !bs = B.foldr' (\_ sum - sum+1) 0 bs main = do s - B.getContents print (cnt s) -- hs/byte-bsl---acc.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy as B cnt :: Int - B.ByteString - Int cnt !acc !bs = if B.null bs then acc else cnt (acc+1) (B.tail bs) main = do s - B.getContents print (cnt 0 s) -- hs/byte-x-acc-1.hs: {-# LANGUAGE BangPatterns #-} cnt :: Int - String - Int cnt !acc bs = if null bs then acc else cnt (acc+1) (tail bs) main = do s - getContents print (cnt 0 s) -- hs/byte-x-acc-2.hs: {-# LANGUAGE BangPatterns #-} cnt :: Int - String - Int cnt !acc !bs = if null bs then acc else cnt (acc+1) (tail bs) main = do s - getContents print (cnt 0 s) -- hs/byte-x-foldl.hs: {-# LANGUAGE BangPatterns #-} cnt :: String - Int cnt !bs = foldl (\sum _ - sum+1) 0 bs main = do s - getContents print (cnt s) -- hs/byte-x-foldr-1.hs: {-# LANGUAGE BangPatterns #-} cnt :: String - Int cnt bs = foldr (\_ sum - sum+1) 0 bs main = do s - getContents print (cnt s) -- hs/byte-x-foldr-2.hs: {-# LANGUAGE BangPatterns #-} cnt :: String - Int cnt !bs = foldr (\_ sum - sum+1) 0 bs main = do s - getContents print (cnt s) -- hs/space-bs-c8-acc-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B cnt :: Int - B.ByteString - Int cnt !acc bs = if B.null bs then acc else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs) main = do s - B.getContents print (cnt 0 s) -- hs/space-bs-c8-count.hs: -- Don Stewart import qualified Data.ByteString.Char8 as B main = print . B.count ' ' = B.getContents -- hs/space-bs-c8-foldlx-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B cnt :: B.ByteString - Int cnt bs = B.foldl' (\sum c - if c == ' ' then sum+1 else sum) 0 bs main = do s - B.getContents print (cnt s) -- hs/space-bs-c8-foldlx-2.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B main = do s - B.getContents print $ B.foldl' (\v c - if c == ' ' then v+1 else v :: Int) 0 s -- hs/space-bs-c8-foldrx.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B cnt :: B.ByteString - Int cnt bs = B.foldr' (\c sum - if c == ' ' then sum+1 else sum) 0 bs main = do s - B.getContents print (cnt s) -- hs/space-bs-c8-lenfil.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B cnt :: B.ByteString - Int cnt bs = B.length (B.filter (== ' ') bs) main = do s - B.getContents print (cnt s) -- hs/space-bslc8-acc-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as B cnt :: Int - B.ByteString - Int cnt !acc bs = if B.null bs then acc else cnt (if B.head bs == ' ' then acc+1
[Haskell-cafe] [0/16] SBM: Simple Bytestring Microbenchmarks, Overview and Introduction
Table of contents: 0/16This email 1/16The Haskell and C benchmarks 2/16Inner loops of the hand-tweaked assembly benchmarks 3/16The Makefile 4/16How to use the Makefile (how to run benchmarks etc.) 5/16Support scripts and scriptlets 6/166.9.20071124 Athlon Duron 7/166.9.20071208 Core Duo 8/166.9.20071119 Pentium III 9/166.9.20071119 Athlon64 10/16 Graphs for 6.9.x across four cpus 11/16 Graphs for hand-tweaked assembly benchmarks 12/16 Graphs for 7 ghc/bytestring combinations on a 2GHz Athlon64 300+ 13/16 Graphs that show the infidelity of -sstderr 14/16 Behind the measurements (rationale) 15/16 Predictions compared to the measurements 16/16 Discussion and Conclusion Simple Bytestring Microbenchmarks - Introduction I love parsers. I have been writing parsers for fun for over twenty years. The nicest way to construct a parser used to be to write a recursive descent parser by hand. If you had to work with people who'd had the misfortune of a university education, you would resort to lex and yacc (flex and bison), despite their many shortcomings. Combinator parsers is the only real improvement over hand-written recursive descent parsers that I know of. They do tend to require features that not all languages provide. I don't know how to write a good one in C, for example. They do work very well in Haskell, though. So, I've started writing a parser in Haskell (ghc, really) for the programming language X++. X++ is not a nice language but that's beside the point. The challenge for me is to write an efficient compiler + provide good analysis tools for X++. I think I stand a better chance of doing that in Haskell (ghc) than in practically any other language. There are a few drawbacks, though. I love speed. And efficiency. String handling in Haskell -- Native strings are simple and generally work well, but they are slow, take up too much memory and there's the whole encoding mess that still needs to be sorted out. People have worked on other string representations and libraries for quite a while. I think packedstrings (as used in Darcs) was one of the first ones. Bytestrings is the current incarnation. It seems to be just the right thing, especially when combined with improved automatic fusion in the compiler so higher-order functions don't have to be expensive. I use Parsec as my parser combinator library at the moment, which uses native strings. I would dearly love Parsec to be faster and use less memory. I think bytestrings will be part of any substantial improvement in Parsec's resource consumption. Other performance concerns -- File I/O is also interesting for a compiler writer. I would like to have a program that is as fast as possible both when the source files are already cached by the operating system and when they are not. The former situation is best handled with mmap() and the latter is best handled by read(), preferably in combination with multi-threading so the compiler doesn't have to waste too much time waiting for disk seeks. Haskell seems to be very close to ideal for me because it has very good threading support and very accessible raw access to the operating system. File I/O is not my current bottleneck, though. I'll probably take a closer look at file I/O when the other performance problems have been solved. Then there's the general quality of the generated code. Having read just about every paper on ghc that was available back in the late nineties (when I first looked at Haskell), I'd thought that the quality was good and that the compiler also had extremely good high-level optimizations, in other words, that abstraction was free. I also read the C-- papers and thought that it was a very interesting and promising approach. I'd expected the C-- path to have matured and be well- optimized by now. Unfortunately, the backend is /the/ weak spot in ghc. The frontend is heroic, the typesystems are (too) abundant and rich, the language itself is nice -- but the backend is not. Looking at the generated code I'd say that it is slightly better than Turbo Pascal 3.x and about on par with Turbo Pascal 4.0, a compiler that didn't use any intermediate code at all, compiled each statement in isolation, was single-pass, and had a compilation speed of about 27000 lines per minute on an 8 MHz IBM PC AT. Ecosystem and culture - Haskell has a very good ecosystem. Probably the second best one amongst the modern functional languages. Ten years ago, I'd thought that Standard ML would win but the only MLish language with a good ecosystem and culture is OCaml, which unfortunately isn't really Standard ML. By ecosystem I mean things like access to raw operating system calls, access to libraries written in other languages, readily available libraries for graphical user interfaces, databases, XML processing, network
[Haskell-cafe] [2/16] SBM: Inner loops of the hand-tweaked assembly benchmarks
I've taken the two benchmarks byte-bsacc and space-bs-c8-acc-1 and gradually tweaked their inner loops from something that used memory all the time to something that used registers more and more efficiently. I've done this gradually, pretty much one register at a time. Along the way, I've also done a simple common subexpression/loop hoisting thing in which I combined the pointer to the start of the string and the index into the string into a single pointer. Doing this in real life may cause bad problems with the garbage collector. At the end, I go a bit mad and start doing heroic optimizations (reading four bytes at a time, using MMX registers to read 8 bytes at a time, twisted MMX math to keep 8 space counters in an MMX register + a bit of loop unrolling). Here follows first the two original inner loops and then the 23 hand-tweaked versions. I used the following shell code to isolate the inner loops: (for F in hs/byte-bsacc.s hs/space-bs-c8-acc-1.s hand/*.s ; \ do echo --; \ echo $F:;\ echo ; \ cat $F | perl -e 'while(){ if (/Main_zdwcnt_info:/ .. /.section .data/) { print; }}' | head -n-1; \ done; \ echo ==; \ ) xx.txt -Peter -- hs/byte-bsacc.s: Main_zdwcnt_info: .LcYL: cmpl $0,16(%ebp) jle .LcYO movl 12(%ebp),%eax incl %eax movl (%ebp),%ecx incl %ecx subl $1,16(%ebp) movl %eax,12(%ebp) movl %ecx,(%ebp) jmp Main_zdwcnt_info .LcYO: movl (%ebp),%esi addl $20,%ebp jmp *(%ebp) -- hs/space-bs-c8-acc-1.s: Main_zdwcnt_info: .Lc16u: cmpl $0,16(%ebp) jle .Lc16x movl 4(%ebp),%eax movl 12(%ebp),%ecx movzbl (%eax,%ecx,1),%eax cmpl $32,%eax jne .Lc16F movl 12(%ebp),%eax incl %eax movl (%ebp),%ecx incl %ecx subl $1,16(%ebp) movl %eax,12(%ebp) movl %ecx,(%ebp) jmp Main_zdwcnt_info .Lc16x: movl (%ebp),%esi addl $20,%ebp jmp *(%ebp) .Lc16F: movl 12(%ebp),%eax incl %eax subl $1,16(%ebp) movl %eax,12(%ebp) jmp Main_zdwcnt_info -- hand/byte-bsacc-a.s: Main_zdwcnt_info: .LcYN: cmpl $0,16(%ebp) jle .LcYQ movl 00(%ebp),%ecx movl 12(%ebp),%eax movl 16(%ebp),%edx incl %ecx incl %eax decl %edx movl %ecx,00(%ebp) movl %eax,12(%ebp) movl %edx,16(%ebp) jmp Main_zdwcnt_info .LcYQ: movl (%ebp),%esi addl $20,%ebp jmp *(%ebp) -- hand/byte-bsacc-b.s: Main_zdwcnt_info: .LcYN: cmpl $0,16(%ebp) jle .LcYQ movl 00(%ebp),%ecx movl 12(%ebp),%eax movl 16(%ebp),%edx .L_again: cmpl $0,%edx jle .L_out incl %ecx incl %eax decl %edx jmp .L_again .L_out: movl %ecx,00(%ebp) movl %eax,12(%ebp) movl %edx,16(%ebp) jmp Main_zdwcnt_info .LcYQ: movl (%ebp),%esi addl $20,%ebp jmp *(%ebp) -- hand/byte-bsacc-c.s: Main_zdwcnt_info: .LcYN: cmpl $0,16(%ebp) jle .LcYQ movl 00(%ebp),%ecx movl 12(%ebp),%eax movl 16(%ebp),%edx cmpl $0,%edx jle .L_out .L_again: incl %ecx incl %eax decl %edx cmpl $0,%edx jg .L_again .L_out: movl %ecx,00(%ebp) movl %eax,12(%ebp) movl %edx,16(%ebp) jmp Main_zdwcnt_info .LcYQ: movl (%ebp),%esi addl $20,%ebp jmp *(%ebp) -- hand/byte-bsacc-d.s: Main_zdwcnt_info: .LcYN: cmpl $0,16(%ebp) jle .LcYQ movl 00(%ebp),%ecx movl 12(%ebp),%eax movl 16(%ebp),%edx cmpl $0,%edx jle .L_out .align 16 .L_again: incl %ecx incl %eax decl %edx cmpl $0,%edx jg .L_again .L_out: movl %ecx,00(%ebp) movl %eax,12(%ebp) movl %edx,16(%ebp) jmp Main_zdwcnt_info .LcYQ: movl (%ebp),%esi addl $20,%ebp jmp *(%ebp) -- hand/space-bs-c8-acc-1-a.s: Main_zdwcnt_info: .Lc16w: cmpl $0,16(%ebp) jle .Lc16z movl 4(%ebp),%eax movl 12(%ebp),%ecx movzbl (%eax,%ecx,1),%eax cmpl $32,%eax jne .Lc16H movl 12(%ebp),%eax incl %eax movl (%ebp),%ecx incl %ecx subl $1,16(%ebp) movl %eax,12(%ebp) movl %ecx,(%ebp) jmp Main_zdwcnt_info .Lc16H: movl 12(%ebp),%eax
[Haskell-cafe] [3/16] SBM: The Makefile
This is the entire Makefile. It perhaps ought to be sent as an attachment but my hacky mailer script wouldn't like it. A few of the lines are wider than 80 columns, unfortunately. -Peter # GHC benchmarks of parsing (bytestring, basic code generation, I/O). # # Copyright 2007 Peter Lund [EMAIL PROTECTED], licensed under GPLv2. # # # You will need the following tools: # perl, strace, /usr/bin/time, bash, a gcc that uses shared libraries and libc # (not dietlibc, klibc, uclibc), objdump (usually found in the binutils # package). # # A benchmark run on a new platform can be split into two phases: # # phase1: Compiles, dices, and slices the code in various ways. If this # completes you can be pretty sure that everything works all right. # Performs non-timing sensitive measurements. # # phase2: Performs timing sensitive measurements. It is a good idea to run # this phase on an idle machine, preferably without using X. # For example, you can log out of X and run telinit 1 to get to # single-user mode. If you do so, please remember to set the correct # path to your ghc compiler. # Make a few bits of the makefile less noisy. Q:=@ # # Ask ghc to optimize and warn GHCFLAGS= -O2 -W # Some newer versions of gcc prefer -Wextra -Wall GCCWARNFLAGS=-W -Wall # Default compilers CC=gcc GHC=ghc GHCPKG=ghc-pkg # HSPROGS=hs/byte-bsacc \ hs/byte-bsfoldlx\ hs/byte-bsfoldrx\ hs/byte-bsl---acc \ hs/byte-x-acc-1 \ hs/byte-x-acc-2 \ hs/byte-x-foldl \ \ hs/space-bs-c8-acc-1\ hs/space-bs-c8-count\ hs/space-bs-c8-foldlx-1 \ hs/space-bs-c8-foldlx-2 \ hs/space-bs-c8-foldrx \ hs/space-bs-c8-lenfil \ hs/space-bslc8-acc-1\ hs/space-bslc8-acc-2\ hs/space-bslc8-acc-3\ hs/space-bslc8-chunk-1 \ hs/space-bslc8-chunk-2 \ hs/space-bslc8-chunk-3 \ hs/space-bslc8-chunk-4 \ hs/space-bslc8-count\ hs/space-bslc8-foldl\ hs/space-bslc8-foldlx-1 \ hs/space-bslc8-foldlx-2 \ hs/space-bslc8-foldr-1 \ hs/space-bslc8-foldr-2 \ hs/space-bslc8-lenfil-1 \ hs/space-bslc8-lenfil-2 \ hs/space-bsl---foldlx \ hs/space-x-acc-1\ hs/space-x-acc-2\ hs/space-x-foldl\ hs/space-x-lenfil # RMPROGS keeps track of programs that are not always included in the tests. # We do want 'make clean' to delete them even when they are not currently # part of the build (they may be left over from a previous build). # stack overflow with long4. #HSPROGS:=$(HSPROGS) hs/byte-x-foldr-1 RMPROGS:=$(RMPROGS) hs/byte-x-foldr-1 # stack overflow with long4. #HSPROGS:=$(HSPROGS) hs/byte-x-foldr-2 RMPROGS:=$(RMPROGS) hs/byte-x-foldr-2 # stack overflow with long4. #HSPROGS:=$(HSPROGS) hs/space-x-foldr-1 RMPROGS:=$(RMPROGS) hs/space-x-foldr-1 # stack overflow with long4. #HSPROGS:=$(HSPROGS) hs/space-x-foldr-2 RMPROGS:=$(RMPROGS) hs/space-x-foldr-2 HANDPROGS= hand/byte-bsacc-a\ hand/byte-bsacc-b\ hand/byte-bsacc-c\ hand/byte-bsacc-d\ \ hand/space-bs-c8-acc-1-a \ hand/space-bs-c8-acc-1-b \ hand/space-bs-c8-acc-1-c \ hand/space-bs-c8-acc-1-d \ hand/space-bs-c8-acc-1-e \ hand/space-bs-c8-acc-1-f \ hand/space-bs-c8-acc-1-g \ hand/space-bs-c8-acc-1-h \ hand/space-bs-c8-acc-1-i \ hand/space-bs-c8-acc-1-j \ hand/space-bs-c8-acc-1-k \ hand/space-bs-c8-acc-1-l \ hand/space-bs-c8-acc-1-m \ hand/space-bs-c8-acc-1-n \ hand/space-bs-c8-acc-1-o \ hand/space-bs-c8-acc-1-p \ hand/space-bs-c8-acc-1-q \ hand/space-bs-c8-acc-1-r \ hand/space-bs-c8-acc-1-s RMPROGS:=$(RMPROGS) $(HANDPROGS) ifeq ($(shell $(GHCPKG) list | grep bytestring),) # ghc 6.6.1 with an old version of bytestring in 'base' but without its own # module name HSPROGS:=$(shell printf %s\n $(HSPROGS) | grep -v '.*-chunk-.*') endif HANDTEXT:=including hand-tweaked assembly ifeq ($(shell $(GHC) --version | grep 6.9.20071119),) HANDPROGS:= HANDTEXT:=no hand-tweaked assembly endif ifneq ($(SUFFIX),) HANDPROGS:= HANDTEXT:=no hand-tweaked assembly endif CPROGS= c/byte-getchar c/byte-getchar-u c/byte-4k \ \ c/space-getchar c/space-getchar-u c/space-4k\ c/space-megabuf c/space-getwchar c/space-getwchar-u \ c/space-32k
[Haskell-cafe] [4/16] SBM: How to use the Makefile (how to run benchmarks etc.)
Introduction Most of the smarts of the benchmark harness is in the Makefile. If you want to rerun the benchmarks (or a single benchmark) or look at the intermediate code for a benchmark or the I/O trace or the memory consumption or the time spent or ... then you use the makefile. There are some support scripts in shell and Perl (and two C programs) that the Makefile uses to do its job. And there are some that you, the user, will want to interact directly with. The benchmarks are only expected to work on Linux. They have been tested on SuSE 8.2 (from 2003, with a 2.4 kernel), Ubuntu 7.04, and Ubuntu 7.10. Quick howto --- make phase1-- compiles, generates test files, measures memory use. Safe to run on a busy machine if there's no active memory pressure. make phase2-- timing runs. NOT safe to run on a busy machine. Should be run in runlevel 1 (= no X, no daemons, single-user mode) for best measurements. Outputs report at end. This is where you check the quality of the measurements. If you don't like them, run 'make redophase2' (or delete the .time and .stat files with low quality and run 'make phase2' again.) make zipdata -- make a tarball with all the measurements, suitable for emailing or putting on a website. The Makefile will beep after phase 1 and 2. The above will run a NORMAL run, which is fine during development if you want to see if you nailed a performance bug. It runs reasonably fast (about 43 seconds on my Athlon64). If you want better measurements, you should use: make TESTKIND=THOROUGH phase1 phase2 zipdata This will use a 150MB data file instead of a 15MB one and it will run the timing measurements 6 times (before throwing the first away) instead of 4 times (before throwing the first away). If you don't want to use single-user mode, you can improve the measurements by piping the output to a file (or run the test from the console) instead of involving a terminal and an X server (the screen update may kick in in the middle of a timing run and disturb things if for no other reason than their polluting the CPU caches). Filesystem layout - The benchmarks are in: hs/*.hs c/*.c hand/*.s hand/*.hs and hand/*.c are not compiled. The two *.hs files are the originals from which the tweaked assembly code has been derived. The two *.c files are sketches of how the MMX tweaks work (because MMX code by itself can be a bit off-putting). These are the support scripts: tools/genfiles.pl -- generate the test input files. tools/cutmem.pl tools/cutpid.pl -- both are used to disentangle the outputs of strace and pause-at-end (see below). I combine strace, memory info, and +RTS -sstderr into a single run to save time. This means that things end up in fewer files than I'd like. tools/cut.pl -- cut out main loop from disassembly ('make discut') tools/stat.pl -- looks at all timings for a single benchmark and calculates average and standard deviation and time slack, that is the discrepancy between user+sys and real. It optionally throws away the first run. tools/eatmem.c-- allocates a chunk of memory and makes damn sure it really is in RAM! tools/pause-at-end.c -- part of a hack that copies /proc/self/maps and /proc/self/status to stderr just before a benchmark exits. tools/iosummary.pl-- takes an strace and sums up the I/O tools/genreport.pl-- generate a nice report with bar charts. Takes way too many options in the form of environment variables. tools/regenreport.sh -- regenerates the report from ANY measurement tarball. tools/merge.pl-- merge data from many measurement tarballs, with or without rescaling. Generated files: hs/*.core hs/*.stg hs/*.cmm hs/*.s-- intermediate code hs/*.hi -- Haskell Interface */*.o -- object code */* (the files in $(HSPROGS) $(CPROGS) $(HANDPROGS)) -- programs */*.dis */*.discut-- disassembled programs (and inner loops) */*.doc -- source + intermediate code + inner loops + timings */*.mem -- output from '+RTS -sstderr' + /proc/self/status + /proc/self/maps + output from /usr/bin/time (where the number of minor page faults is most interesting datum) */*.strace-- complete strace, taken together with */*.mem */*.iotrace
[Haskell-cafe] [5/16] SBM: Support scripts and scriptlets
Some of the scripts warrant a closer look. 'make zipdata' creates a nice tarball with all the data necessary to recreate a report AND to merge that report together with other reports, possible with rescaled bar charts. Very handy. All the files in the tarball are inside the 'ghc-measurements/' directory so the risk of things going wrong when unpacking the tarball is less. The names of the benchmarks are put in ghc-measurements/progs, mainly to ensure they end up in the right order when regenerating and merging reports. tools/genreport.pl [list of benchmarks to put in the report] It doesn't parse the command-line in any way because life is too short for command-line parsing. Instead, it is controlled via (too many) environment variables. ASCII - set to avoid using UTF-8 for bar charts and per mille character. NOSRC - the tool normally creates */*.srctimemem files containing the source code for each benchmark with bar charts for time/mem appended to the end. Setting this variable switches that off (necessary when regenerating and merging reports). EXCLUDE - disregard some of the benchmarks on the command line. Why is this necessary? Because it makes regenerating and merging reports easier. And because I was too lazy to filter the command line in tools/regenreport.sh and tools/merge.pl. FINDMAX - used by tools/merge.pl when rescaling. Outputs max time and max mem to stdout instead of the normal report. MAX_FILEWIDTH - used by tools/merge.pl to make merged reports look nice MAX_TIME, MAX_PEAKMEM - used by tools/merge.pl when rescaling Note that strictly speaking, there is a bug in the script(s) because it conflates the width of time/mem measurement represented as numbers (which you always want to take into account when merging) and MAX_TIME/MAX_PEAKMEM (which you only care about when rescaling). [FIXED now - 2007-12-21] tools/regenreport.sh unpacks a measurement tarball into a tmp directory and runs tools/genreport.pl to generate the report. Takes care not to disturb the normal files. tools/merge.pl [tarballs] Uses tools/regenreport.sh on each tarball in turn to generate a report which it reads in and stores on a benchmark-by-benchmark basis. At the end, synthetically combine all the pieces it cut out of the original report(s) into a brand-spanking new, merged report. Even the headers and the platforminfo at the top of each report is cut out and stored in data structures until they get spit out again at the end. The reading magic is in the state machine in gather(). It is not as bad as it looks. Some of the complications arise from marking repeated benchmark names as ' -- ', which improves the readability of the merged reports immensely. Another part of the complications arise due to the fact that not all tarballs contain the exact same benchmarks! Those that don't get a nice 'n/a' instead of numbers and a bar. And finally, the benchmarks should be in the right order. That is trickier than it sounds... When rescaling, tools/regenreport.sh is first run once for each tarball with the FINDMAX environment variable set. This results in tools/regenreport.sh outputting the maximum filename width, time, and peakmem for each tarball. ASCII - use ASCII instead of UTF-8 RESCALE - sometimes you want to rescale and sometimes you don't MAX_FILEWIDTH - if you want to force a specific width MAX_TIME, MAX_PEAKMEM - if you want to force a specific max -Peter ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] [6/16] SBM: 6.9.20071124 Athlon Duron
This set of measurements was captured by Daniel Fischer on one of his older machines, running SuSE 8.2, which has a Linux 2.4 kernel! The benchmarks were run on 2007-12-16 using ghc 6.9.20071124. Unfortunately, the ghc version is not quite the same as the one I've used for most measurements (6.9.20071119) so things may be a little different just for that reason. The results seemed a bit off at first, but now that I have graphs of all the runs on all the machines they don't seem strange at all. First of all, the memory use is about the same as on the other machines. Secondly, the timing differences for the C getchar/getwchar might be partly due to different versions of the C library. The remaining differences (a steeper profile than on the core duo and the Athlon64) may be due to different microarchitectures. -Peter Fischer's machine ghc 6.9.20071124 AMD Duron(tm) processor 1200.089 MHz TESTKIND=THOROUGH SUFFIX= Time (byte counting)std avg dev slack hs/byte-bsacc:1.892 21â° 0.4 âââ | hs/byte-bsfoldlx: 2.258 3â° 0.1 ââââ | hs/byte-bsfoldrx: 2.933 0â° 0.1 âââââ | hs/byte-bsl---acc: 14.319 45â° 0.1 âââââââââââââââââââââ| hs/byte-x-acc-1: 20.915 17â° 4.0 ââââââââââââââââââââââââââââââ | hs/byte-x-acc-2: 20.691 8â° 0.1 ââââââââââââââââââââââââââââââ | hs/byte-x-foldl: 20.610 5â° 1.4 ââââââââââââââââââââââââââââââ | c/byte-getchar: 9.042 0â° 0.1 âââââââââââââ| c/byte-getchar-u: 1.314 3â° 0.2 ââ | c/byte-4k:0.419 5â° 0.5 â | Memory:Peak --- KB hs/byte-bsacc: 147492 ââââââââââââââââââââââââââââââââââââââââ | hs/byte-bsfoldlx:147492 ââââââââââââââââââââââââââââââââââââââââ | hs/byte-bsfoldrx:147488 ââââââââââââââââââââââââââââââââââââââââ | hs/byte-bsl---acc: 2896 â | hs/byte-x-acc-1: 1612 â | hs/byte-x-acc-2: 1612 â | hs/byte-x-foldl: 1612 â | c/byte-getchar: 384 â | c/byte-getchar-u: 384 â | c/byte-4k: 380 â | Time (space counting) std - avg dev slack hs/space-bs-c8-acc-1: 2.467 1â° 0.3 ââââ | hs/space-bs-c8-foldlx-1: 2.585 2â° 0.1 ââââ | hs/space-bs-c8-foldlx-2: 2.576 2â° 0.3 ââââ | hs/space-bs-c8-foldrx:2.982 8â° 2.3 âââââ | hs/space-bs-c8-lenfil:2.599 1â° 0.2 ââââ | hs/space-bslc8-acc-1:15.228 8â° 0.1 ââââââââââââââââââââââ | hs/space-bslc8-acc-2:15.855 38â° 0.0 âââââââââââââââââââââââ | hs/space-bslc8-acc-3:14.980 14â° 0.0 ââââââââââââââââââââââ | hs/space-bslc8-chunk-1: 2.443 2â° 0.2 ââââ | hs/space-bslc8-chunk-2: 2.449 1â° 0.3 ââââ | hs/space-bslc8-chunk-3: 2.534 3â° 0.3 ââââ | hs/space-bslc8-foldl: 2.938 1â° 0.2 âââââ | hs/space-bslc8-foldlx-1: 2.928 1â° 0.0 âââââ | hs/space-bslc8-foldlx-2: 2.937 2â° 0.2 âââââ | hs/space-bslc8-foldr-1: 4.043 6â° 0.1 ââââââ | hs/space-bslc8-foldr-2: 4.007 4â° 0.1 ââââââ | hs/space-bslc8-lenfil-1: 3.240 1â° 0.2 âââââ
[Haskell-cafe] [8/16] SBM: 6.9.20071119 Pentium III
This set of measurements was captured by me on my old Compaq Armada E500 from around the year 2000. It has an Intel Coppermine Pentium III running at 600 MHz with 384 MB RAM and running Ubuntu 7.10. The benchmarks were run today (2007-12-21) using ghc 6.9.20071119 (freshly downloaded and reinstalled) in runlevel 1 (single-user mode). I did leave the wireless card in, which might have produced some interrupts. Still, the quality of the timings proved to be good, with a standard deviation of at most 1.2% of the average run-time. This run includes the hand-tweaked assembly benchmarks (as does the Athlon64 run in the next email). Note how much less the assembler tweaks help here (until one gets down to the wicked MMX tweaks) compared to the situation on the Athlon64 in the next mail. They still help, though. -Peter ls-search ghc 6.9.20071119 Pentium III (Coppermine) 596.932 MHz TESTKIND=THOROUGH SUFFIX= Time (byte counting) std avg dev slack hs/byte-bsacc: 3.274 1â° 0.1 âââ | hs/byte-bsfoldlx: 4.027 0â° 0.0 ââââ | hs/byte-bsfoldrx: 4.184 1â° 0.0 ââââ | hs/byte-bsl---acc:28.005 10â° 0.0 ââââââââââââââââââââââââââ | hs/byte-x-acc-1: 25.852 4â° 0.0 ââââââââââââââââââââââââ | hs/byte-x-acc-2: 25.622 5â° 0.0 ââââââââââââââââââââââââ | hs/byte-x-foldl: 25.803 5â° 0.0 ââââââââââââââââââââââââ | hand/byte-bsacc-a: 3.511 1â° 0.0 ââââ | hand/byte-bsacc-b: 1.998 2â° 0.1 ââ | hand/byte-bsacc-c: 1.876 2â° 0.1 ââ | hand/byte-bsacc-d: 1.876 1â° 0.1 ââ | c/byte-getchar: 13.016 0â° 0.0 ââââââââââââ| c/byte-getchar-u: 1.662 1â° 0.1 ââ | c/byte-4k: 0.543 2â° 0.2 â | Memory: Peak --- KB hs/byte-bsacc:147752 ââââââââââââââââââââââââââââââââââââââââ | hs/byte-bsfoldlx: 147756 ââââââââââââââââââââââââââââââââââââââââ | hs/byte-bsfoldrx: 147760 ââââââââââââââââââââââââââââââââââââââââ | hs/byte-bsl---acc: 3180 â | hs/byte-x-acc-1:1916 â | hs/byte-x-acc-2:1912 â | hs/byte-x-foldl:1912 â | hand/byte-bsacc-a:147772 ââââââââââââââââââââââââââââââââââââââââ | hand/byte-bsacc-b:147776 ââââââââââââââââââââââââââââââââââââââââ | hand/byte-bsacc-c:147772 ââââââââââââââââââââââââââââââââââââââââ | hand/byte-bsacc-d:147776 ââââââââââââââââââââââââââââââââââââââââ | c/byte-getchar: 436 â | c/byte-getchar-u:432 â | c/byte-4k: 436 â | Time (space counting)std -avg dev slack hs/space-bs-c8-acc-1: 4.318 1â° 0.0 ââââ | hs/space-bs-c8-count: 3.118 1â° 0.1 âââ | hs/space-bs-c8-foldlx-1: 4.631 1â° 0.0 âââââ | hs/space-bs-c8-foldlx-2: 4.632 1â° 0.0 âââââ | hs/space-bs-c8-foldrx: 4.678 0â° 0.0 âââââ | hs/space-bs-c8-lenfil: 4.634 1â° 0.1 âââââ | hs/space-bslc8-acc-1: 32.733 7â° 0.0 ââââââââââââââââââââââââââââââ |
[Haskell-cafe] [9/16] SBM: 6.9.20071119 Athlon64
This set of measurements was captured by me on my slightly old noname 2GHz Athlon64 3000+ (in 32-bit mode on a 32-bit kernel). It has 1GB RAM and runs Ubuntu 7.04. The benchmarks were run two days ago (2007-12-19) using ghc 6.9.20071119 in runlevel 1 (single-user mode). I did leave the network cable in, which might have produced some interrupts (not very likely -- 1) why would it run in promiscuous mode and 2) who broadcasts apart from my wired/wireless access point which broadcasts a packet per second?). Still, the quality of the timings proved to be good, with a standard deviation of at most 2.7% of the average run-time. This run includes the hand-tweaked assembly benchmarks (as does the Pentium III run in the previous email). Note how much the assembler tweaks help! -Peter charybdis ghc 6.9.20071119 AMD Athlon(tm) 64 Processor 3000+ 2009.160 MHz TESTKIND=THOROUGH SUFFIX= Time (byte counting)std avg dev slack hs/byte-bsacc:0.705 7â° 0.1 ââââ | hs/byte-bsfoldlx: 1.002 1â° 0.5 ââââââ | hs/byte-bsfoldrx: 1.112 2â° 0.1 ââââââ | hs/byte-bsl---acc:2.595 14â° 0.1 ââââââââââââââ | hs/byte-x-acc-1: 4.436 5â° 0.0 ââââââââââââââââââââââââ | hs/byte-x-acc-2: 4.473 8â° 0.0 ââââââââââââââââââââââââ | hs/byte-x-foldl: 4.412 10â° 0.0 âââââââââââââââââââââââ | hand/byte-bsacc-a:0.639 2â° 0.2 ââââ | hand/byte-bsacc-b:0.414 2â° 0.5 âââ | hand/byte-bsacc-c:0.414 3â° 0.2 âââ | hand/byte-bsacc-d:0.415 3â° 0.2 âââ | c/byte-getchar: 2.422 27â° 0.1 âââââââââââââ| c/byte-getchar-u: 0.632 3â° 0.3 ââââ | c/byte-4k:0.094 26â° 2.2 â | Memory: Peak --- KB hs/byte-bsacc:147752 ââââââââââââââââââââââââââââââââââââââââ | hs/byte-bsfoldlx: 147748 ââââââââââââââââââââââââââââââââââââââââ | hs/byte-bsfoldrx: 147744 ââââââââââââââââââââââââââââââââââââââââ | hs/byte-bsl---acc: 3172 â | hs/byte-x-acc-1:1904 â | hs/byte-x-acc-2:1904 â | hs/byte-x-foldl:1900 â | hand/byte-bsacc-a:147764 ââââââââââââââââââââââââââââââââââââââââ | hand/byte-bsacc-b:147764 ââââââââââââââââââââââââââââââââââââââââ | hand/byte-bsacc-c:147760 ââââââââââââââââââââââââââââââââââââââââ | hand/byte-bsacc-d:147760 ââââââââââââââââââââââââââââââââââââââââ | c/byte-getchar: 440 â | c/byte-getchar-u:440 â | c/byte-4k: 440 â | Time (space counting) std - avg dev slack hs/space-bs-c8-acc-1: 1.145 1â° 0.2 ââââââ | hs/space-bs-c8-count: 0.521 1â° 0.2 âââ | hs/space-bs-c8-foldlx-1: 1.221 1â° 0.1 âââââââ | hs/space-bs-c8-foldlx-2: 1.219 2â° 0.2 âââââââ | hs/space-bs-c8-foldrx:1.172 2â° 0.3 âââââââ | hs/space-bs-c8-lenfil:1.223 1â° 0.2 âââââââ | hs/space-bslc8-acc-1: 3.388 10â° 0.1 ââââââââââââââââââ | hs/space-bslc8-acc-2: 3.386 5â° 0.1
[Haskell-cafe] [10/16] SBM: Graphs for 6.9.x across four cpus
This is what you get if you merge the previous four reports (and filter out the hand-tweaked assembly benchmarks). I generated the report with the following command: EXCLUDE='^hand/' \ tools/merge.pl \ ghc-armada-thorough-6.9.tgz\ ghc-fischer-thorough-6.9.tgz \ ghc-albatross-thorough-6.9.tgz \ ghc-thorough-6.9.tgz \ xx This graph shows the memory usage to be (almost) exactly the same. The difference can easily be explained with slightly different versions of the C library (there was also a security update or two from Ubuntu in the last few days - the kernel was definitely updated and I think the C library was too). We are also talking about two versions of Ubuntu and one (old) version of SuSE. And we are talking about three not quite identical versions of ghc. The fischer and albatross runs were made before the inclusion of the three benchmarks from Don Stewart (hs/space-bs-c8-count, hs/space-bslc8-count, and hs/space-bslc8-chunk-4) so they are a couple of holes in the graphs. The speed pattern is more fun. It makes no sense to compare absolute times here so the graphs were not rescaled. One would naïvely expect the bars to be about the same when taken in groups of four but that really turns out to be far from the case! The explanation is most likely that we are looking at four very different microarchitectures. ** This should hammer home the point that benchmarking on any single machine isn't enough! ** Two probably aren't enough either... -Peter ls-search ghc 6.9.20071119 Pentium III (Coppermine) 596.932 MHz TESTKIND=THOROUGH SUFFIX= Fischer's machine ghc 6.9.20071124 AMD Duron(tm) processor 1200.089 MHz TESTKIND=THOROUGH SUFFIX= albatross ghc 6.9.20071208 (or thereabouts) Genuine Intel(R) CPU T2300 @ 1.66GHz (Core Duo) 1667.000 MHz TESTKIND=THOROUGH SUFFIX= charybdis ghc 6.9.20071119 AMD Athlon(tm) 64 Processor 3000+ 2009.160 MHz TESTKIND=THOROUGH SUFFIX= Time (byte counting)std avg dev slack hs/byte-bsacc:3.274 1â° 0.1 ââââ | -- 1.892 21â° 0.4 âââ | -- 0.918 4â° 0.1 ââââââ | -- 0.705 7â° 0.1 ââââ | hs/byte-bsfoldlx: 4.027 0â° 0.0 ââââ | -- 2.258 3â° 0.1 ââââ | -- 1.014 2â° 0.2 ââââââ | -- 1.002 1â° 0.5 ââââââ | hs/byte-bsfoldrx: 4.184 1â° 0.0 ââââ | -- 2.933 0â° 0.1 âââââ | -- 0.999 2â° 0.1 ââââââ | -- 1.112 2â° 0.1 ââââââ | hs/byte-bsl---acc: 28.005 10â° 0.0 âââââââââââââââââââââââââââ | -- 14.319 45â° 0.1 âââââââââââââââââââââ| -- 1.957 1â° 0.1 ââââââââââââ | -- 2.595 14â° 0.1 ââââââââââââââ | hs/byte-x-acc-1: 25.852 4â° 0.0 âââââââââââââââââââââââââ | -- 20.915 17â° 4.0 ââââââââââââââââââââââââââââââ | -- 3.591 2â° 0.1 ââââââââââââââââââââââ | -- 4.436 5â° 0.0 ââââââââââââââââââââââââ | hs/byte-x-acc-2: 25.622 5â° 0.0 âââââââââââââââââââââââââ | -- 20.691 8â° 0.1 ââââââââââââââââââââââââââââââ | -- 3.598 4â° 0.1 ââââââââââââââââââââââ | -- 4.473 8â° 0.0 ââââââââââââââââââââââââ | hs/byte-x-foldl: 25.803 5â° 0.0 âââââââââââââââââââââââââ | -- 20.610 5â° 1.4
[Haskell-cafe] [11/16] SBM: Graphs for hand-tweaked assembly benchmarks
This report compares the hand-tweaked assembly programs with the original untweaked programs on two vastly different microarchitectures. This is the command I ran to generate the report: EXCLUDE='(|-bsl|chunk|count|acc-[23]|fold|lenfil|^c/)' \ tools/merge.pl \ ghc-armada-thorough-6.9.tgz \ ghc-thorough-6.9.tgz\ xx I cut out the memory sections manually since we've already seen them and inserted a few newlines for grouping purposes. The first one should note is that not all tweaks are better than the originals! The second is that the sequence of tweaks is not quite monotonically decreasing in run-time. The improvements don't really start until -e on the Athlon64 and -f on both. Not until then have the load pressure been sufficiently relieved on the L1 cache that the code actually runs faster. Note also how the two microarchitectures seem to have plateaus in different places. The Athlon64 seems to have the number 3 built into its silicon (efg, jkl, mno) which fits very well with what we know about it from AMD's documentation (the front end splits the instructions up into smaller pieces which then get distributed to three different pipelines, each with its own out-of-order execution engine). The Pentium III seems to have trouble with the simple MMX code but does very well with the more advanced MMX code that keeps 8 space counters in a single MMX register for many iterations. The code I used to add those counters horizontally is the same in both -q and -r. Perhaps operations on both MMX and normal registers are slow? Loop unrolling (-s) doesn't seem to matter, in this case. -Peter ls-search ghc 6.9.20071119 Pentium III (Coppermine) 596.932 MHz TESTKIND=THOROUGH SUFFIX= charybdis ghc 6.9.20071119 AMD Athlon(tm) 64 Processor 3000+ 2009.160 MHz TESTKIND=THOROUGH SUFFIX= Time (byte counting)std avg dev slack hs/byte-bsacc:3.274 1â° 0.1 âââââââââââââââââââââââââââ | -- 0.705 7â° 0.1 ââââââââââââââââââââââ | hand/byte-bsacc-a:3.511 1â° 0.0 âââââââââââââââââââââââââââââ | -- 0.639 2â° 0.2 ââââââââââââââââââââ | hand/byte-bsacc-b:1.998 2â° 0.1 âââââââââââââââââ| -- 0.414 2â° 0.5 âââââââââââââ| hand/byte-bsacc-c:1.876 2â° 0.1 ââââââââââââââââ | -- 0.414 3â° 0.2 âââââââââââââ| hand/byte-bsacc-d:1.876 1â° 0.1 ââââââââââââââââ | -- 0.415 3â° 0.2 âââââââââââââ| Time (space counting) std - avg dev slack hs/space-bs-c8-acc-1: 4.318 1â° 0.0 ââââââââââââââââââââââââââââââââââââ | -- 1.145 1â° 0.2 ââââââââââââââââââââââââââââââââââââ | hand/space-bs-c8-acc-1-a: 4.318 1â° 0.0 ââââââââââââââââââââââââââââââââââââ | -- 1.177 2â° 0.3 âââââââââââââââââââââââââââââââââââââ| hand/space-bs-c8-acc-1-b: 4.331 1â° 0.0 ââââââââââââââââââââââââââââââââââââ | -- 1.104 1â° 0.2 ââââââââââââââââââââââââââââââââââ | hand/space-bs-c8-acc-1-c: 4.492 1â° 0.1 âââââââââââââââââââââââââââââââââââââ| -- 1.207 1â° 0.3 âââââââââââââââââââââââââââââââââââââ| hand/space-bs-c8-acc-1-d: 4.354 1â° 0.0 ââââââââââââââââââââââââââââââââââââ | -- 1.191 1â° 0.2 âââââââââââââââââââââââââââââââââââââ| hand/space-bs-c8-acc-1-e: 4.424 0â° 0.1 âââââââââââââââââââââââââââââââââââââ| -- 0.937 1â° 0.2
[Haskell-cafe] [14/16] SBM: Behind the measurements (rationale)
// I am getting sick and tired of working on this project and it's probably // better to get it fired off than polishing it any further. // // This email could benefit from being rewritten from a rough draft into a // well-crafted letter but that would take a couple of hours. // // So here it is, a lot rougher than I'd like -- but it *IS* :) why so big input files? easiest way to spot non-linearity and bad memory behaviour. Anyway, files should be big enough to overflow caches and kick the gc in. (short files interesting, too, but the big ones cause more complex behaviour of run-time system and CPU. If complex behaviour is behaved then simple behaviour probably is too -- but can still get its constant factors improved. If complex behaviour bad, then shouldn't that be fixed in any case?) waitpid4() has a struct w/ info about the child program's resource usage. unfortunately, the peakrss field is not filled in. Seems to be a general Unix problem. I've seen complaints on the net that Solaris doesn't fill it in, either. Other solution needed. pause-at-end, /proc/self/maps + /proc/self/status. VmmHWM = peak of VmmRSS, which is Resident (working) Set Size. It doesn't say what is shared with other processes or the operating system, though. In our case, we don't expect to share anything but some libraries -- which nobody else wants to share with us anyway (except for the C library). We are the only user of them. Discovered about a week ago that I could probably have used waitid() w/ WNOWAIT flag but didn't know. Was quick to write pause-at-end, anyway. It took about 15 minutes from the desire to know the peak memory use to having written and tested the first cut of it. Pause-at-end not completely bullet-proof in case of dyn libraries that get unloaded before the end of the program has been reached. On the other hand, plenty good enough for these tests + can conceivably allow more intricate poking around than waitid() solution. getting good measurements - eatmem, dd, probably should also dd library. good to have a sacrificial run. Good to measure how good the measurements are (rel. std.dev + user/sys/real check). why average -- disturbances are mostly interrupts, daemons that everybody have anyway, slightly luckier/unluckier physical pages. These are real effects that nobody can control anyway. I'm not interested in the best possible times on an ideal, undisturbed machine with a helpful kernel. I'm interested in clean times under realistic circumstances. Therefore average instead of minimum. why I use real and not user/sys -- handling of blocking reads vs. mmap vs. madvise/fadvise vs. reading in separate thread in the future. User+sys would probably give me better numbers at the moment and I could change to real later. Still, I choose to stay with real (and the difference is marginal, anyway). Funny that the exact distribution of time between sys and user fluctuates a lot. In space-bslc8-lenfil-2 sys varies between 0.160s and 0.244s. Real is completely stable with 5x 1.396s and 1x 1.397s. look at /proc/interrupts, perhaps copy before/after to .intr? Warn if more than 100 (or 1000) Hz + 10%? write date/time + runlevel to platforminfo and/or sysinfo. barcharts why barcharts. should the time/mem barcharts be equal length? don't think so (hard to colour them in a text file. Would work with less -r and the console but not in an email or a text editor. Visual difference is good). But should perhaps not be /that/ different. visible markers if measurements bad. (5% real/user/sys check, typically within 0.1% on old laptop when doing a quick or thorough benchmark. Occasionally up to 1% - and 3% on c/byte-4k because it only takes 56ms in total.) prints out how tight the user/sys/real thing is. microarchitecture -- performance counters. Would be interesting to look at once the obvious performance problems have been handled. Let's fix the memory usage of bytestrings, the performance of lazy bytestrings, and start using registers in the machine code first. regularity of input file probably means that branch predictor on all three CPUs can remember pattern of spaces vs. non-spaces (or at least part of the pattern). Branch predictors not only use two-bit saturating counter for strongly non-taken/weakly non-taken/weakly taken/strongly taken. They also try to remember the pattern of jumps/non-jumps. A more realistic test would have less regular input file. This effect is very small given the current performance limiters, though. cache -- turned out to be pretty regular (by eyeballing cachegrind reports). Go up a factor of 10 in filesize and the number of access also went up a factor of 10. The miss ratios stayed the same. The miss ratios differed a bit between the benchmarks but I don't think it's
[Haskell-cafe] [12/16] SBM: Graphs for 7 ghc/bytestring combinations on a 2GHz Athlon64 300+
This report combines the measurements from 7 ghc/bytestring combinations on a single machine, namely: ghc 6.6.1 ghc 6.8.2 ghc 6.8.2 + bytestring 0.9.0.2 ghc 6.9.20071119 ghc 6.9.20071119 + bytestring 0.9.0.2 ghc head-as-of-noon-2007-12-19 ghc head-as-of-noon-2007-12-19 + bytestring 0.9.0.2 This is the command I ran to generate the report: RESCALE=1 EXCLUDE='(^hand/|^c/)' \ tools/merge.pl \ ghc-thorough-6.6.1.tgz \ ghc-thorough-6.8.2.tgz ghc-thorough-6.8.2-bs0902.tgz \ ghc-thorough-6.9.tgz ghc-thorough-6.9-bs0902.tgz \ ghc-thorough-head.tgz ghc-thorough-head-bs0902.tgz \ mergerep-amdy.txt We can readily see that there are some memory leaks with ghc 6.9.x. We can also see that Don Stewart's fix in bytestring 0.9.0.2 actually fixed things for hs/space-bslc8-lenfil-1 on ghc 6.8.2 but only improved things slightly for 6.9.x. It also makes things worse for hs/space-bs-c8-lenfil on 6.8.2 and both the 6.9.x's. Lazy bytestrings are slow, when used manually, but quite acceptable, memory- wise. On the other hand, they cost dearly in memory on 6.9.x unless one does it all by hand. Good old native strings seem to be the most predictable of the bunch but they are a bit slow. Surprisingly, they are not always /the/ slowest option measured! hs/space-bslc8-acc-1 and hs/space-bslc8-acc-2 are actually slower on ghc 6.6.1 than the native strings are in all combinations except hs/space-x-lenfil. -Peter PS: Note that this is the same report that I sent yesterday, except that it had the order of some of the benchmarks slightly wrong. charybdis ghc 6.6.1 AMD Athlon(tm) 64 Processor 3000+ 2009.160 MHz TESTKIND=THOROUGH SUFFIX= charybdis ghc 6.8.2 AMD Athlon(tm) 64 Processor 3000+ 2009.160 MHz TESTKIND=THOROUGH SUFFIX= charybdis ghc 6.8.2 AMD Athlon(tm) 64 Processor 3000+ 2009.160 MHz TESTKIND=THOROUGH SUFFIX=-bs0.9.0.2 charybdis ghc 6.9.20071119 AMD Athlon(tm) 64 Processor 3000+ 2009.160 MHz TESTKIND=THOROUGH SUFFIX= charybdis ghc 6.9.20071119 AMD Athlon(tm) 64 Processor 3000+ 2009.160 MHz TESTKIND=THOROUGH SUFFIX=-bs0.9.0.2 charybdis ghc 6.9.20071217 AMD Athlon(tm) 64 Processor 3000+ 2009.160 MHz TESTKIND=THOROUGH SUFFIX= charybdis ghc 6.9.20071217 AMD Athlon(tm) 64 Processor 3000+ 2009.160 MHz TESTKIND=THOROUGH SUFFIX=-bs0.9.0.2 Time (byte counting) std avg dev slack hs/byte-bsacc: 1.020 40â° 0.0 ââââââ | -- 0.703 5â° 0.4 ââââ | -- 0.702 7â° 0.3 ââââ | -- 0.705 7â° 0.1 ââââ | -- 0.712 3â° 0.1 ââââ | -- 0.706 7â° 0.1 ââââ | -- 0.707 7â° 0.9 ââââ | hs/byte-bsfoldlx:0.789 3â° 0.3 ââââ | -- 0.993 2â° 0.2 âââââ | -- 1.102 1â° 0.2 ââââââ | -- 1.002 1â° 0.5 ââââââ | -- 1.112 1â° 0.3 ââââââ | -- 1.024 2â° 0.2 ââââââ | -- 1.111 1â° 0.1 ââââââ | hs/byte-bsfoldrx:0.813 3â° 0.1 âââââ | -- 1.102 2â° 0.3 ââââââ | -- 1.100 1â° 0.1 ââââââ | -- 1.112 2â° 0.1 ââââââ | -- 1.114 1â° 0.4 ââââââ | -- 1.113 2â° 0.5 ââââââ | -- 1.112 1â° 0.2 ââââââ | hs/byte-bsl---acc: 3.599 13â° 0.0 âââââââââââââââââââ | -- 2.609 17â° 0.0 ââââââââââââââ| -- 2.560 15â° 0.1 âââââââââââââ | -- 2.595 14â° 0.1 ââââââââââââââ| -- 2.574 16â° 0.1 âââââââââââââ | -- 2.613 12â° 0.2 ââââââââââââââ
[Haskell-cafe] [13/16] SBM: Graphs that show the infidelity of -sstderr
Everybody seems to use +RTS -sstderr to measure the memory performance of their Haskell (ghc) code. That's what I started out doing, too. However, top seemed to occasionally disagree with the run-time system. I decided to dig deeper and ask the operating system at process exit time. I use LD_PRELOAD to get my code injected into each benchmarked process where it gets activated when the process is about to exit and copies /proc/self/maps and /proc/self/status to stderr where the test harness captures them. Unfortunately, -sstderr really *IS* unreliable! tools/regensstderr.sh and tools/sstderr.pl are two scripts that together generate bar graphs for an existing measurement tarball. I used the following shell code to generate graphs for 7 different ghc/bytestring combinations: (for F in ghc-thorough-6.6.1.tgz \ ghc-thorough-6.8.2.tgz ghc-thorough-6.8.2-bs0902.tgz \ ghc-thorough-6.9.tgz ghc-thorough-6.9-bs0902.tgz \ ghc-thorough-head.tgz ghc-thorough-head-bs0902.tgz; \ do \ printf %s\n---\n $F; \ tools/regensstderr.sh $F; \ printf \n\n; \ done \ ) xx The graphs clearly show that -sstderr sometimes severely underestimates the memory use and sometimes somewhat overestimates it. Moreover, the errors are not always for the same programs. This holds for all ghc versions I've tested: 6.6.1, 6.8.2, 6.9.20071119, and head as of noon 2007-12-19. -Peter ghc-thorough-6.6.1.tgz --- hs/byte-bsacc: 1 147448 â |âââââââââââââââââââ | hs/byte-bsfoldlx:145 147468 âââââââââââ |âââââââââââââââââââ | hs/byte-bsfoldrx: 1 147452 â |âââââââââââââââââââ | hs/byte-bsl---acc: 2 2872 â|â | hs/byte-x-acc-1: 1 1572 â|â | hs/byte-x-acc-2: 1 1568 â|â | hs/byte-x-foldl: 1 1572 â|â | hs/space-bs-c8-acc-1: 1 147448 â |âââââââââââââââââââ | hs/space-bs-c8-count: 1 147448 â |âââââââââââââââââââ | hs/space-bs-c8-foldlx-1: 145 147468 âââââââââââ |âââââââââââââââââââ | hs/space-bs-c8-foldlx-2: 145 147468 âââââââââââ |âââââââââââââââââââ | hs/space-bs-c8-foldrx: 1 147448 â |âââââââââââââââââââ | hs/space-bs-c8-lenfil: 145 147464 âââââââââââ |âââââââââââââââââââ | hs/space-bslc8-acc-1: 2 2872 â|â | hs/space-bslc8-acc-2: 2 2872 â|â | hs/space-bslc8-acc-3: 2 2868 â|â | hs/space-bslc8-count: 2 1476 â|â | hs/space-bslc8-foldl: 2 1552 â|â | hs/space-bslc8-foldlx-1: 2 1556 â|â | hs/space-bslc8-foldlx-2: 2 1552 â|â | hs/space-bslc8-foldr-1: 297 169480 âââââââââââââââââââââ|âââââââââââââââââââââ| hs/space-bslc8-foldr-2: 297 169480 âââââââââââââââââââââ|âââââââââââââââââââââ| hs/space-bslc8-lenfil-1: 38 38632 âââ |âââââ | hs/space-bslc8-lenfil-2: 38 38628 âââ |âââââ | hs/space-bsl---foldlx: 2 1560 â|â | hs/space-x-acc-1: 1 1568 â|â | hs/space-x-acc-2: 1 1572 â|â | hs/space-x-foldl: 1 1572 â|â | hs/space-x-lenfil: 1 1556 â|â | c/byte-getchar:- - c/byte-getchar-u: - - c/byte-4k: - - c/space-getchar: - - c/space-getchar-u: - - c/space-4k:- - c/space-megabuf: - - c/space-getwchar: - - c/space-getwchar-u:- - c/space-32k: - - c/space-32k-8: - -
[Haskell-cafe] [16/16] SBM: Discussion and Conclusion
General --- Bytestrings are faster than haskell strings. This is good and expected. But their memory use, ouch! There seems to be a grave memory bug in 6.9. Lazy bytestrings are slower than strict bytestrings -- this was completely unexpected for me. The strictness analyzer works *very* well. It rarely made a difference to mark arguments as strict (and when it did, it was very easy to use). I/O --- The I/O patterns seen in the various programs are: 1) strict bytestrings = read increasingly large chunks (they double every time). I don't know if the already-read data have to be copied around or if the memory allocator can handle extending existing blocks (if they are at the front of the heap, say) like realloc() can. Even if the blocks do get copied around, that's not where the performance limiter is (speed-wise). It does seem to be a bit unwise in terms of memory pressure. Does the gc kick in every time the heap has to expand? Does it do a full collection then? If no other allocations have happened since the last collection than the allocation that caused the heap to expand, then perhaps skip the collection? (or some criteria similar to that) 2) lazy bytestrings = read blocks of 32K-8 bytes, as expected. The C benchmarks show that there's no penalty for reading 32K-8 vs. 32K. 3) native strings = read blocks 8K. The C benchmarks show that it barely matters if the block size is 4K, 32K, or 32K-8 bytes. In any case, the small differences due to block size are completely in the noise. Reading very large blocks, though, as the strict bytestrings do, actually has a cost (38-70% slower, depending on the CPU and kernel and RAM and busspeeds, etc -- see email 10 in the series). It is still peanets compared for almost all the benchmarks. Backend --- The backend turns out to be rather bad. It is generally very easy to improve the generated code by hand without doing anything in terms of sophisticated analysis. One can rewrite inner loops using heroic code (load four characters at a time together into a 32-bit register or even use MMX to handle eight characters in parallel) but it isn't really necessary to do that to gain a good improvement and start being competitive with C. The backend is probably what is costly on some of the lazy bytestring and native string code because there are too many tests to see if the buffer has been exhausted (I think). Some simple hoisting and common subexpression elimination would go far here. For the simpler strict bytestring benchmarks, heroic backend optimizations are not necessary because they give less of an improvement over merely competent register use than sensible I/O and buffer management (cache effects start to be important once the compiler generates sensible code). I would have liked to examine backend performance further by playing with the generated C-- code but unfortunately, the C-- that ghc emits won't compile. Dissecting the generated machine code for the more complicated benchmarks was so hard that I skipped it -- I can't even recognize the labels! It's one thing if the backend generates lots of extra labels but another if the expected labels simply are not there! (this is why tools/cut.pl complains that it can't find the main loop for some of the programs -- if the */*.discut file is unexpectedly short, then that's the reason.) Quality of measurements --- The speed measurements are of extremely high quality, much more than it turns out is needed at this moment. I probably spent a bit too much effort there. The memory measurements are unusual. I don't recall having seen anything like them elsewhere because one seldom has to work with uncooperative programs when benchmarking (the preloading trick works with any program, as long as it is dynamically linked on Linux and uses the standard loader). They are probably the biggest single contribution from this work, together with the visualization provided by the bar charts. Thoroughness It really is necessary to benchmark more than one ghc/library combination, otherwise the bad memory bugs wouldn't stand out so clearly or I would have believed, as Don Stewart did, that he had fixed the memory allocation bug. It also matters what microarchitectures one use, as pointed out by email 10 in this series. And you gotta have many benchmarks, otherwise you might not really be measuring what you think you measure. I could probably have gotten away with fewer benchmarks in this case but that would have left a nagging feeling at the back of my head that there was a benchmark I /could/ have included that would have showed up a spectacular performance bug. Using so big input files (150MB) made the programs run slow enough that the timer granularity (1ms) was small compared to their runtime. It also made the memory performance problems really stand out (and be unignorable). I think that was a good idea.
Re: [Haskell-cafe] Why does this blow the stack?
On 12/22/07, David Benbennick [EMAIL PROTECTED] wrote: On 12/21/07, Stefan O'Rear [EMAIL PROTECTED] wrote: Because they simply aren't the same. Good point; thanks. That means that Don's patch could theoretically break some existing Haskell program: In fact, it's possible to get strictness to avoid stack overflow, and still have laziness to allow undefined: myEnumFrom :: Integer - [Integer] myEnumFrom a = map (a+) $ enumDeltaIntegerStrict 0 1 where enumDeltaIntegerStrict x d = x `seq` x : enumDeltaIntegerStrict (x+d) d then *Main (myEnumFrom 42) !! (10^6) 142 *Main length $ take 10 $ myEnumFrom undefined 10 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Applying a Dynamic function to a container of Dynamics
Alfonso Acosta wrote: dynApp allows to apply a Dynamic function to a Dynamic argument: dynApp :: Dynamic - Dynamic - Dynamic I don't seem to find a way (without modifying Data.Dynamic itself) to code this function This is not very difficult if we have a well-delineated (and still infinite) type universe: to be precise, if we know the signature of our type terms. That is usually a reasonable assumption. The trick is to build a function that is an inverse of typeOf: given a TypeRep, we wish to find its representative, a value whose type has the given TypeRep. We use the result of this inverse function (to be called reflect) when applying fromDynamic. We know that writing of reflect is possible because we know the syntax of the type language (by assumption) and because Haskell luckily is inconsistent as a logic and so every type is populated. The code below borrows most of the machinery from the type-checker/typed-compiler http://okmij.org/ftp/Haskell/staged/IncopeTypecheck.hs which may be entitled 'How to program hypothetical proofs', as the long title comments explain. The above code essentially implements dynApp, which is called there typecheck_app. We adapt that code below. We assume that our type universe is described by the following syntax data Type = Bool | Int | Type - Type Regarding the previous problem: the following bit -- it is important to give the signature to (,) below: we pack the cons -- function of the right type! cons :: forall a b. (Typeable a, Typeable b) = Signal a - Signal b - Signal (a,b) cons (Signal sig1) (Signal sig2) = Signal (PrimSignal (Cons (toDyn ((,)::a-b-(a,b))) sig1 sig2)) was indeed essential. One of the main lessons of our APLAS paper was the realization that things become significantly simpler if we do more work at the value production site. We could not have implemented typed Partial Evaluation or typed CPS so simply. The latter usually considered to require significantly complex type systems, beyond the reach of the mainstream languages. Incidentally, that paper promotes a different way of building typed DSL (the final tagless way). http://okmij.org/ftp/papers/tagless-final-APLAS.pdf http://www.cs.rutgers.edu/~ccshan/tagless/talk.pdf http://www.cs.rutgers.edu/~ccshan/quote/language.pdf I was thinking that perhaps that method might be beneficial to your application. I don't know your DSL language well enough to be able to tell though... {-# OPTIONS -fglasgow-exts -W #-} module DA where import Data.Typeable import Data.Dynamic data Dyn = forall a. Typeable a = Dyn a -- Check to see if a term represents a function. If so, -- return terms that witness the type of the argument and of the body reflect_fn :: TypeRep - Maybe (Dyn, Dyn) reflect_fn tfun | (con,[arg1,arg2]) - splitTyConApp tfun, con == arrowTyCon = Just (reflect arg1, reflect arg2) reflect_fn _ = Nothing arrowTyCon = typeRepTyCon (typeOf (undefined::Int-Int)) -- reflect typerep to a type (witness). The inverse of typeOf. reflect :: TypeRep - Dyn reflect x | x == typeOf (undefined::Int) = Dyn (undefined::Int) reflect x | x == typeOf (undefined::Bool) = Dyn (undefined::Bool) reflect x | Just (Dyn e1, Dyn e2) - reflect_fn x = let mkfun :: a - b - (a-b); mkfun = undefined in Dyn (mkfun e1 e2) -- the re-implementation of dynApply mydynApply :: Dynamic - Dynamic - Maybe Dynamic mydynApply e1 e2 | let tfun = dynTypeRep e1, let targ = dynTypeRep e2, Just tres - funResultTy tfun targ, Dyn a - reflect targ, Dyn b - reflect tres, Just e1' - fromDynamic e1, Just e2' - fromDynamic e2 `asTypeOf` (Just a) = return $ toDyn (e1' e2' `asTypeOf` b) mydynApply e1 e2 = fail $ unwords [Bad App, of types ,show (dynTypeRep e1),and, show (dynTypeRep e2)] test1 :: Maybe Bool test1 = mydynApply (toDyn not) (toDyn False) = fromDynamic -- Just True test2 :: Maybe Bool test2 = mydynApply (toDyn not) (toDyn (1::Int)) = fromDynamic -- Nothing test3 :: Maybe Int test3 = mydynApply (toDyn ((+)::Int-Int-Int)) (toDyn (1::Int)) = (\f - mydynApply f (toDyn (2::Int))) = fromDynamic -- Just 3 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Functions are first class values in C
Let me show you an example to prove it. The example is limited to composition of unary functions defined on int u::Int-Int v::Int-Int o ::(Int-Int)-(Int-Int)-(Int-Int) o u v= \x-u(v(x)) #include stdio.h #include conio.h #include functional.h int f1(int x){ return x+x; } int f2(int x){ return 2*x; } int g1(int x){ return x+1; } int g2(int x){ return x-1; } #define P1 P0 int main(){ #define P2 P1 printf(%d,%d,%d\n,2, #define P3 O(f1,f2,P2)(2), #define P4 O(g1,g2,P3)(3)); #define P5 P4 getch(); #define P6 P5 } MAIN P6 Here is the file functional.h #define FUNC2(x,y) x##y #define FUNC1(x,y) FUNC2(x,y) #define FUNC(x) FUNC1(x,__COUNTER__) #define COMP(c,f,g,p) \ int c (int x) { return f(g(x)); }; \ p \ c #define O(f,g,p) COMP( FUNC(a), f, g, p) #define P0 #define MAIN ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Haskell performance
G'day all. Quoting Jon Harrop [EMAIL PROTECTED]: I would recommend adding: 1. FFT. 2. Graph traversal, e.g. nth-nearest neighbor. I'd like to put in a request for Pseudoknot. Does anyone still have it? Cheers, Andrew Bromage ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
Cristian Baboi wrote: Let me show you an example to prove it. That's not C. That's the C preprocessor, which is a textual substitution macro language. Macros certainly aren't first class (you can't pass a macro to a function, only its expansion). C does support function pointers, which are something like first class functions. The main things C lacks which people associate with true first-class function is: The ability to construct anonymous/local functions. The ability to capture local variables and return a function with some variables bound. The ability to write type-safe functions with polymorphic arguments. Jules ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
Cristian Baboi wrote: Let me show you an example to prove it. This reminds me of arguments in the late 80s and early 90s that C could do OO programming via function pointers, so there was no need for OO languages. Paul. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Functions are first class values in C
Actually, the C/C++ programmers I know think nothing can be done with functional programming languages because they are so used the concept of a C function, that they think that Haskell functions are just that, functions like in C. C# does not have function pointers, only the concept of a delegates (which was copied from Delphi's closures, the official name?), and that is much closer to FP's functions If people would see that Haskell functions are *not* that silly kind of C functions, but actually little objects that *can* carry immutable state, they would look differently at functional programming languages I guess. So I think the word function means different things depending to which community you talk... Before I knew Haskell, the OO community started to embrace the concept of interfaces more and more (aka purely abstract classes). Furthermore in OO, many bugs are caused IMO by keeping track of mutable state and caches (which is often premature optimization). So for complicated tasks, I tended to use more and more immutable objects, e.g. objects that could be constructed once, but not mutated. And then I noticed that it was often not needed to precompute all the values that got passed to the constructor, so I added C# properties that computed the inner cached values once, lazily. IMO Haskell embraces the above ideas and much more, with the difference it encapsulates these ideas nicely and concisely, so you need only a fraction of the lines of code :) So I guess if an OO programmer gets at this level, he should *not* look at Haskell, because he will not want to go back to his messy imperative programming languages :) Peter -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jules Bean Sent: Saturday, December 22, 2007 1:09 PM To: Cristian Baboi Cc: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Functions are first class values in C Cristian Baboi wrote: Let me show you an example to prove it. That's not C. That's the C preprocessor, which is a textual substitution macro language. Macros certainly aren't first class (you can't pass a macro to a function, only its expansion). C does support function pointers, which are something like first class functions. The main things C lacks which people associate with true first-class function is: The ability to construct anonymous/local functions. The ability to capture local variables and return a function with some variables bound. The ability to write type-safe functions with polymorphic arguments. Jules ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] GtkMathView wrapper for Haskell / Gtk2HS?
Did anybody already wrap http://helm.cs.unibo.it/mml-widget so it can be used with Haskell / Gtk2HS? If not, would it be good project for me to learn Haskell's FFI? It involves C++, so I guess that would be harded to wrap? Thanks, Peter ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
On Sat, 22 Dec 2007 14:28:56 +0200, Paul Johnson [EMAIL PROTECTED] wrote: Cristian Baboi wrote: Let me show you an example to prove it. This reminds me of arguments in the late 80s and early 90s that C could do OO programming via function pointers, so there was no need for OO languages. What is Objective C ,if not just C with some syntactic sugar ? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Fwd: Re: [Haskell-cafe] Functions are first class values in C
--- Forwarded message --- From: Cristian Baboi [EMAIL PROTECTED] To: Jules Bean [EMAIL PROTECTED] Cc: Subject: Re: [Haskell-cafe] Functions are first class values in C Date: Sat, 22 Dec 2007 15:58:50 +0200 On Sat, 22 Dec 2007 14:09:13 +0200, Jules Bean [EMAIL PROTECTED] wrote: Cristian Baboi wrote: Let me show you an example to prove it. That's not C. That's the C preprocessor, which is a textual substitution macro language. Well, the preprocessor is part of the language in a way. These two come together. Macros certainly aren't first class (you can't pass a macro to a function, only its expansion). In Haskell I cannot pass a function to a function, only its expansion. C does support function pointers, which are something like first class functions. The main things C lacks which people associate with true first-class function is: The ability to construct anonymous/local functions. If you look at the example you will see I've done that. The ability to capture local variables and return a function with some variables bound. If I can construct anonymous functions and constants, I can construct functions with some variables bound. The ability to write type-safe functions with polymorphic arguments. I didn't know this must be a property of first-class functions. C is staticaly typed, so type errors will be detected. Haskell is just C with some syntactic sugar :-) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
On Sat, 22 Dec 2007 14:55:44 +0200, Peter Verswyvelen [EMAIL PROTECTED] wrote: Before I knew Haskell, the OO community started to embrace the concept of interfaces more and more (aka purely abstract classes). Furthermore in OO, many bugs are caused IMO by keeping track of mutable state and caches (which is often premature optimization). So for complicated tasks, I tended to use more and more immutable objects, e.g. objects that could be constructed once, but not mutated. And then I noticed that it was often not needed to precompute all the values that got passed to the constructor, so I added C# properties that computed the inner cached values once, lazily. Lazy constant in C: int C1 (){ return 7; } C1 is computed only when you apply the operator () to it. IMO Haskell embraces the above ideas and much more, with the difference it encapsulates these ideas nicely and concisely, so you need only a fraction of the lines of code :) This is why we have syntactic sugar :) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
Lazy constant in C: int C1 (){ return 7; } C1 is computed only when you apply the operator () to it. But that's not the point of lazyness. Lazy value is computed only ONCE. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
That's not C. That's the C preprocessor, which is a textual substitution macro language. Well, the preprocessor is part of the language in a way. These two come together. No. In fact, these are even two different programs, see man cpp. Macros certainly aren't first class (you can't pass a macro to a function, only its expansion). In Haskell I cannot pass a function to a function, only its expansion. What do you mean by expansion? Can you clarify this? C does support function pointers, which are something like first class functions. The main things C lacks which people associate with true first-class function is: The ability to construct anonymous/local functions. If you look at the example you will see I've done that. No. Your compose macro is not a function; for example, you can't use it as an argument to itself (which is easy in Haskell: (.)(.)) The ability to capture local variables and return a function with some variables bound. If I can construct anonymous functions and constants, I can construct functions with some variables bound. See above. You can't. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
On Sat, 22 Dec 2007 16:25:26 +0200, Miguel Mitrofanov [EMAIL PROTECTED] wrote: That's not C. That's the C preprocessor, which is a textual substitution macro language. Well, the preprocessor is part of the language in a way. These two come together. No. In fact, these are even two different programs, see man cpp. These two different programs come together. Macros certainly aren't first class (you can't pass a macro to a function, only its expansion). In Haskell I cannot pass a function to a function, only its expansion. What do you mean by expansion? Can you clarify this? f1=\x-x+1 f2=\x-2*x g=\x-x.f1 h=\x-x.(\x-x+1) h is g C does support function pointers, which are something like first class functions. The main things C lacks which people associate with true first-class function is: The ability to construct anonymous/local functions. If you look at the example you will see I've done that. No. Your compose macro is not a function; for example, you can't use it as an argument to itself (which is easy in Haskell: (.)(.)) Ok. The ability to capture local variables and return a function with some variables bound. If I can construct anonymous functions and constants, I can construct functions with some variables bound. See above. You can't. Ok. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Functions are first class values in C
Cristian Baboi wrote Lazy constant in C: int C1 (){ return 7; } Not really, this is not lazy, since it always recomputes the value 7. To have lazy values in C you would have to do something like: struct ValueInt { int IsComputed; union { int Value; struct { int (*ComputeValue)(void *args); void* Args; }; }; }; int GetLazyInt (ValueInt* v) { if( !v-IsComputed ) { v-Value = v-ComputeValue(v-Args); v-IsComputed = true; } return v-Value; } But this of course, is totally useless in C and very bulky. It's also impossible to know when to call freemem on the Args (hence garbage collection in FP), when *not* to use lazy values but strict values instead (hence strictness analysis in FP), etc... I must say I had the same opinion as you had for many many years. I always thought functions as first class values where just function pointers, so what is it these Haskell/FP guys are so excited about? But if you dig deeper, you'll see the magic... Notice you will have to give yourself some time; it is very hard to get out of the imperative blob. E.g. I'm still being sucked into the imperative blob after my first year of Haskell hacking :) PS: As I'm relatively new to Haskell, don't take the above C code too seriously; it certainly will not reflect the way a real Haskell system works. Peter ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
On 23/12/2007, Peter Verswyvelen [EMAIL PROTECTED] wrote: Cristian Baboi wrote Lazy constant in C: int C1 (){ return 7; } Not really, this is not lazy, since it always recomputes the value 7. Actually GCC will happily optimise this away in almost all cases. -- - Jeremy Apthorp ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
On Sat, 22 Dec 2007 16:26:18 +0200, Miguel Mitrofanov [EMAIL PROTECTED] wrote: Lazy constant in C: int C1 (){ return 7; } C1 is computed only when you apply the operator () to it. But that's not the point of lazyness. Lazy value is computed only ONCE. Ok. I guess I cannot be sure I'll call C1 only once. How about this int C1(){ static c1=-; if(c1==-){ c1=7 } return c1 } ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
On Sat, 22 Dec 2007 16:55:08 +0200, Miguel Mitrofanov [EMAIL PROTECTED] wrote: In Haskell I cannot pass a function to a function, only its expansion. What do you mean by expansion? Can you clarify this? f1=\x-x+1 f2=\x-2*x g=\x-x.f1 h=\x-x.(\x-x+1) h is g Not clear. Try starting with function's expansion is... function's expansion is ... just like macro expansion. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
On Sat, 22 Dec 2007, Cristian Baboi wrote: On Sat, 22 Dec 2007 16:55:08 +0200, Miguel Mitrofanov [EMAIL PROTECTED] wrote: In Haskell I cannot pass a function to a function, only its expansion. What do you mean by expansion? Can you clarify this? f1=\x-x+1 f2=\x-2*x g=\x-x.f1 h=\x-x.(\x-x+1) h is g Not clear. Try starting with function's expansion is... function's expansion is ... just like macro expansion. No, it's not. Expanding variables (swapping f1 for \x-x+1) isn't the evaluation mechanism in haskell, g and h really are semantically equivalent values and we can't do actual computation just by expanding variables. Whereas expanding a macro is equivalent to beta-reduction (evaluating a function application), which isn't required before passing something in Haskell. We pass functions, not just their results. Here's a trivial example that does so: (\x - x) (\x - x) A lambda calculus classic that doesn't typecheck in Haskell: (\x - x x) (\x - x x) Feel free to try evaluating it! -- [EMAIL PROTECTED] I think you mean Philippa. I believe Phillipa is the one from an alternate universe, who has a beard and programs in BASIC, using only gotos for control flow. -- Anton van Straaten on Lambda the Ultimate ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Fwd: Re: [Haskell-cafe] Functions are first class values in C
On Sat, 22 Dec 2007 16:48:51 +0200, Peter Verswyvelen [EMAIL PROTECTED] wrote: Cristian Baboi wrote Lazy constant in C: int C1 (){ return 7; } Not really, this is not lazy, since it always recomputes the value 7. To have lazy values in C you would have to do something like: struct ValueInt { int IsComputed; union { int Value; struct { int (*ComputeValue)(void *args); void* Args; }; }; }; int GetLazyInt (ValueInt* v) { if( !v-IsComputed ) { v-Value = v-ComputeValue(v-Args); v-IsComputed = true; } return v-Value; } But this of course, is totally useless in C and very bulky. It's also impossible to know when to call freemem on the Args (hence garbage collection in FP), when *not* to use lazy values but strict values instead (hence strictness analysis in FP), etc... I know FP have automatic garbage collection. I know FP compilers use strictness analysis. In C++ one can isolate memory management in constructors and destructors. There are C compilers that are also able to do some optimizations. I must say I had the same opinion as you had for many many years. I always thought functions as first class values where just function pointers, so what is it these Haskell/FP guys are so excited about? But if you dig deeper, you'll see the magic... Notice you will have to give yourself some time; it is very hard to get out of the imperative blob. E.g. I'm still being sucked into the imperative blob after my first year of Haskell hacking :) PS: As I'm relatively new to Haskell, don't take the above C code too seriously; it certainly will not reflect the way a real Haskell system works. I am new to Haskell, but not new to declarative programming. I programmed in Prolog for several years, and I tryed LISP, but I don't liked the LISP syntax. I don't take my C example seriously either. The thing is I think that for a language to have first-class functions, it must be homoiconic if I understand the terms correctly. Have you tryed to write a Haskell program that manipulate Haskell programs ? Please don't tell me that Haskell compiler is written in Haskell, because there are C compilers written in C. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
On Sat, 22 Dec 2007 17:13:55 +0200, Philippa Cowderoy [EMAIL PROTECTED] wrote: function's expansion is ... just like macro expansion. No, it's not. Expanding variables (swapping f1 for \x-x+1) isn't the evaluation mechanism in haskell, g and h really are semantically equivalent values and we can't do actual computation just by expanding variables. Whereas expanding a macro is equivalent to beta-reduction (evaluating a function application), which isn't required before passing something in Haskell. We pass functions, not just their results. Here's a trivial example that does so: (\x - x) (\x - x) A lambda calculus classic that doesn't typecheck in Haskell: (\x - x x) (\x - x x) Feel free to try evaluating it! Thank you for your message. I tryed and this is what I've got: ERROR - cannot find show function for: *** Expression : (\x - x) (\x - x) *** Of type: a - a ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Fwd: Re: [Haskell-cafe] Functions are first class values in C
On Sat, 22 Dec 2007 17:18:18 +0200, Philippa Cowderoy [EMAIL PROTECTED] wrote: The thing is I think that for a language to have first-class functions, it must be homoiconic if I understand the terms correctly. You're confusing functions with the terms that are used to define them. The terms aren't first-class, the functions are. This is intentional: the only way you can tell functions apart is if they give you different results for the same parameter. Otherwise, what you have isn't a function but a combination of a function and some extra structure. I also confuse numbers with the terms that are used to define them (like 1.2) I guess I have to study more about this. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
function's expansion is ... just like macro expansion. Then you CAN pass a function to another function. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
On Sat, 22 Dec 2007, Cristian Baboi wrote: On Sat, 22 Dec 2007 17:13:55 +0200, Philippa Cowderoy [EMAIL PROTECTED] wrote: Here's a trivial example that does so: (\x - x) (\x - x) A lambda calculus classic that doesn't typecheck in Haskell: (\x - x x) (\x - x x) Feel free to try evaluating it! Thank you for your message. I tryed and this is what I've got: ERROR - cannot find show function for: *** Expression : (\x - x) (\x - x) *** Of type: a - a Yep, that's because while it can evaluate it down to (\x - x) your interpreter doesn't know how to print the result. You can demonstrate that it works by then passing in something to that result though: ((\x -x) (\x - x)) 1 You'll have to evaluate the other one by hand. Don't spend too long with it though! -- [EMAIL PROTECTED] The reason for this is simple yet profound. Equations of the form x = x are completely useless. All interesting equations are of the form x = y. -- John C. Baez ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Functions are first class values in C
On Sat, 22 Dec 2007 17:25:34 +0200, Philippa Cowderoy [EMAIL PROTECTED] wrote: On Sat, 22 Dec 2007, Cristian Baboi wrote: On Sat, 22 Dec 2007 17:13:55 +0200, Philippa Cowderoy [EMAIL PROTECTED] wrote: Here's a trivial example that does so: (\x - x) (\x - x) A lambda calculus classic that doesn't typecheck in Haskell: (\x - x x) (\x - x x) Feel free to try evaluating it! Thank you for your message. I tryed and this is what I've got: ERROR - cannot find show function for: *** Expression : (\x - x) (\x - x) *** Of type: a - a Yep, that's because while it can evaluate it down to (\x - x) your interpreter doesn't know how to print the result. You can demonstrate that it works by then passing in something to that result though: ((\x -x) (\x - x)) 1 I know that. The reason the interpreter doesn't know how to print the result is because converting functions to strings doesn't make sense. Thank you. You'll have to evaluate the other one by hand. Don't spend too long with it though! Don't worry, I'm lazy too :-) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Importing Data.Char speeds up ghc around 70%
On Sat, Dec 22, 2007 at 06:00:29PM +, Joost Behrends wrote: Hi, while still working on optimizing (naively programmed) primefactors i watched a very strange behavior of ghc. The last version below takes 2.34 minutes on my system for computing 2^61+1 = 3*768614,336404,564651. Importing Data.Char without anywhere using it reduces this time to 1.34 minute - a remarkable speed up. System is WindowsXP on 2.2GHZ Intel, 512MB Ram. I give the complete code here - hopefully all tabs are (4) blanks. Can this be reproduced ? I compile just with --make -O2. If you can reproduce it on your machine (rm executable *.o *.hi between tests for maximum reliability), it's definitely a bug. http://hackage.haskell.org/trac/ghc/wiki/ReportABug Stefan signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Hoogle search scope question -- was: [HXT] Simple question
I have a hoogle question. While I was reading the HXT discussion (below), I tried to search runX and readString in Hoogle (since I am new to HXT and Arrows). But neither search yielded any result and I had to use google to find the Haskell docs. So I am wondering what is the scope of Hoogle that people can search? Only things in the standard library? How much does it cover in the Hackage? I tried to find the scope description in the Tutorial but in vain. It is fair to say if someone needs to use Hoogle, he/she probably has no idea what he is looking for, needless to say, what is in scope of Hoogle or what is not. Thanks, Steve On Dec 19, 2007 7:48 AM, Miguel Mitrofanov [EMAIL PROTECTED] wrote: Seems rather strange for me, I've just installed HXT and got this: Prelude Text.XML.HXT.Arrow runX $ readString [(a_validate,0)] this /this writeDocumentToString [] [?xml version=\1.0\ encoding=\UTF-8\?\nthis /this] Everything works fine except for the fact that all the nodes this /this (that is, a space (an XML text node whose contents are a single space character) within a this element node) get transformed to a this/ element node, ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
Stefan O'Rear stefanor at cox.net writes: If you can reproduce it on your machine (rm executable *.o *.hi between tests for maximum reliability), it's definitely a bug. http://hackage.haskell.org/trac/ghc/wiki/ReportABug Stefan Yes, it was the same as before. Had i reboot meanwhile, because i was out for meal. 1.34 vs. 2.34 minutes are still the same times now, WITH deletion of the object files and the .exe between both compiles and runs. What is happening here ? Does importing Data.Char shadow (in a hidden way) some timeworn types or methods in the Prelude ? If so, ghc might be still faster than it looks now. Cheers, Joost ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Hoogle search scope question -- was: [HXT] Simple question
Hi I have a hoogle question. While I was reading the HXT discussion (below), I tried to search runX and readString in Hoogle (since I am new to HXT and Arrows). But neither search yielded any result and I had to use google to find the Haskell docs. So I am wondering what is the scope of Hoogle that people can search? http://www.haskell.org/haskellwiki/Hoogle#Scope_of_Searches (new section just added) Only things in the standard library? Mainly those things distributed with GHC. How much does it cover in the Hackage? Virtually none. This will change at some point in the future. I tried to find the scope description in the Tutorial but in vain. It is fair to say if someone needs to use Hoogle, he/she probably has no idea what he is looking for, needless to say, what is in scope of Hoogle or what is not. In general, Hoogle is designed if you want to find a generally applicable function. If you are using HXT, then you probably know whether you want a general function or one that works over the HXT data types/classes/monads etc. I do want to expand Hoogle to search everything on hackage, and to have package specific searches as well, but it requires more coding first. Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Why does this blow the stack?
dbenbenn: On 12/21/07, Stefan O'Rear [EMAIL PROTECTED] wrote: Because they simply aren't the same. Good point; thanks. That means that Don's patch could theoretically break some existing Haskell program: Prelude length $ take 10 ([undefined ..] :: [Integer]) 10 That's right. It makes Integer behave like the Int instance. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
Daniel Fischer daniel.is.fischer at web.de writes: I can't reproduce it, both run in 130s here (SuSE 8.2, 1200MHz Duron). However, it's running over 30 minutes now trying to factorise 2^88+1 without any sign of approaching success, which suggests your code has a bug (the factorization is [257,229153,119782433,43872038849], so even a naive approach shouldn't take much longer than a minute). I have found the problem: We must possibly work recursive on a found factor. This was done in former versions, but got lost when isolating the function found. Here is a corrected version - complete again for reproducing easily the strange behavior with Data.Char. It decomposes 2^88+1 in 13 seconds. module Main where import IO import System.Exit --import Data.Char main = do hSetBuffering stdin LineBuffering putStrLn Number to decompose ? s - getLine if s == [] then exitWith ExitSuccess else do putStrLn (show$primefactors$read s) main data DivIter = DivIter {dividend :: Integer, divisor :: Integer, bound:: Integer, result :: [Integer]} intsqrt m = floor (sqrt $ fromInteger m) primefactors :: Integer - [Integer] primefactors n | n2 = [] | even n= o2 ++ (primefactors o1) | otherwise = if z/=1 then result res ++[z] else result res where res = divisions (DivIter {dividend = o1, divisor = 3, bound = intsqrt(o1), result = o2}) z = dividend res -- is 1 sometimes (o1,o2) = twosect (n,[]) twosect :: (Integer,[Integer]) - (Integer,[Integer]) twosect m |odd (fst m) = m |even (fst m) = twosect (div (fst m) 2, snd m ++ [2]) found :: DivIter - DivIter found x = x {dividend = xidiv, bound = intsqrt(xidiv), result = result x ++ [divisor x]} where xidiv = (dividend x) `div` (divisor x) d2 :: DivIter - DivIter d2 x |dividend x `mod` divisor x 0 = x {divisor = divisor x + 2} |otherwise = d2$found x d4 :: DivIter - DivIter d4 x |dividend x `mod` divisor x 0 = x {divisor = divisor x + 4} |otherwise = d4$found x d6 :: DivIter - DivIter d6 x |dividend x `mod` divisor x 0 = x {divisor = divisor x + 6} |otherwise = d6$found x divisions :: DivIter - DivIter divisions y |or[divisor y == 3, divisor y == 5] = divisions (d2 y) |divisor y = bound y = divisions (d6$d2$d6$d4$d2$d4$d2$d4 y) |otherwise= y And now it uses also 1.34 minutes for 2^61+1 without importing Data.Char. Hmmm ... Cheers, Joost ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Why does this blow the stack?
dons: dbenbenn: On 12/21/07, Stefan O'Rear [EMAIL PROTECTED] wrote: Because they simply aren't the same. Good point; thanks. That means that Don's patch could theoretically break some existing Haskell program: Prelude length $ take 10 ([undefined ..] :: [Integer]) 10 That's right. It makes Integer behave like the Int instance. And we see again that strictness properties are very ill-defined, here, the Enum types in base: Prelude length $ take 10 ([undefined ..] :: [Int]) *** Exception: Prelude.undefined Prelude length $ take 10 ([undefined ..] :: [()]) *** Exception: Prelude.undefined Prelude length $ take 10 ([undefined ..] :: [Ordering]) *** Exception: Prelude.undefined Prelude length $ take 10 ([undefined ..] :: [Bool]) *** Exception: Prelude.undefined But, Prelude length $ take 10 ([undefined ..] :: [Float]) 10 Prelude length $ take 10 ([undefined ..] :: [Double]) 10 And, Prelude length $ take 10 ([undefined ..] :: [Integer]) 10 Now, Prelude length $ take 10 ([undefined ..] :: [Integer]) *** Exception: Prelude.undefined So we see that Float and Double also have this problem, Prelude head (drop 1000 [1 .. ]) :: Float *** Exception: stack overflow Prelude head (drop 1000 [1 .. ]) :: Double *** Exception: stack overflow People shouldn't be writing code that depends on this! -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
Am Samstag, 22. Dezember 2007 20:48 schrieb Joost Behrends: Daniel Fischer daniel.is.fischer at web.de writes: I can't reproduce it, both run in 130s here (SuSE 8.2, 1200MHz Duron). However, it's running over 30 minutes now trying to factorise 2^88+1 without any sign of approaching success, which suggests your code has a bug (the factorization is [257,229153,119782433,43872038849], so even a naive approach shouldn't take much longer than a minute). Yes, Daniel the code is not completely correct. There is a known bug. Don't know the connection to your number, but it doesn't decompose 29*29*31*31 correctly (gives [29,29,961] neither 59*59*61*61 (gives [59*59*3721]). Perhaps Prelude Main primefactors $ 7*17*29 [7,493] Prelude Main primefactors $ 7*23*29 [7,667] Prelude Main primefactors $ 7*11*23 [7,253] help you find the bug? If not, ask and ye shall be answered. Difficult to find it, but that has nothing to do with the strange differences of speed. True, and I have no idea how that could come. Cheers, Joost Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
Am Samstag, 22. Dezember 2007 21:28 schrieb Joost Behrends: Of course, one minute after I sent my previous mail, I receive this one :( However, one point, it might be faster to factor out all factors p in found and only then compute the intsqrt, like found x = x{dividend = xstop, bound = intsqrt xstop, result = result x ++ replicate k p} where p = divisor x (xstop,k) = go (dividend x) 0 go n m | r == 0= go q (m+1) | otherwise = (n,m) where (q,r) = n `divMod` p and then leaving out the recursive call in d2 etc. For a measurable difference, you'd need a number with some high prime powers as factors, but still, saves some work even for squares. I have found the problem: We must possibly work recursive on a found factor. This was done in former versions, but got lost when isolating the function found. Here is a corrected version - complete again for reproducing easily the strange behavior with Data.Char. It decomposes 2^88+1 in 13 seconds. module Main where import IO import System.Exit --import Data.Char main = do hSetBuffering stdin LineBuffering putStrLn Number to decompose ? s - getLine if s == [] then exitWith ExitSuccess else do putStrLn (show$primefactors$read s) main data DivIter = DivIter {dividend :: Integer, divisor :: Integer, bound:: Integer, result :: [Integer]} intsqrt m = floor (sqrt $ fromInteger m) primefactors :: Integer - [Integer] primefactors n | n2 = [] | even n= o2 ++ (primefactors o1) | otherwise = if z/=1 then result res ++[z] else result res where res = divisions (DivIter {dividend = o1, divisor = 3, bound = intsqrt(o1), result = o2}) z = dividend res -- is 1 sometimes (o1,o2) = twosect (n,[]) twosect :: (Integer,[Integer]) - (Integer,[Integer]) twosect m |odd (fst m) = m |even (fst m) = twosect (div (fst m) 2, snd m ++ [2]) found :: DivIter - DivIter found x = x {dividend = xidiv, bound = intsqrt(xidiv), result = result x ++ [divisor x]} where xidiv = (dividend x) `div` (divisor x) d2 :: DivIter - DivIter d2 x |dividend x `mod` divisor x 0 = x {divisor = divisor x + 2} |otherwise = d2$found x d4 :: DivIter - DivIter d4 x |dividend x `mod` divisor x 0 = x {divisor = divisor x + 4} |otherwise = d4$found x d6 :: DivIter - DivIter d6 x |dividend x `mod` divisor x 0 = x {divisor = divisor x + 6} |otherwise = d6$found x divisions :: DivIter - DivIter divisions y |or[divisor y == 3, divisor y == 5] = divisions (d2 y) |divisor y = bound y = divisions (d6$d2$d6$d4$d2$d4$d2$d4 y) |otherwise= y And now it uses also 1.34 minutes for 2^61+1 without importing Data.Char. Hmmm ... Cheers, Joost ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Is a type synonym declaration really a synonym ?
Can someone confirm me that: type TA = A :+: B type TB = C :+: D type T = TA :+: TB is not equivalent to type T = A :+: B :+: C :+: D where I have defined infixr 6 :+: data (f :+: g) data A data B data C data D I have a computation at type level which is working with the later definition of T but not with the former (ghc 6.8.1) Thanks. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Is a type synonym declaration really a synonym ?
On Sat, Dec 22, 2007 at 10:22:46PM +0100, alpheccar wrote: Can someone confirm me that: type TA = A :+: B type TB = C :+: D type T = TA :+: TB is not equivalent to type T = A :+: B :+: C :+: D where I have defined infixr 6 :+: data (f :+: g) data A data B data C data D I have a computation at type level which is working with the later definition of T but not with the former (ghc 6.8.1) Type synonyms are implicitly parenthetized, and your :+: is non-associative. Compare: s +:+ t = concat[(,s,,,t,)] foo = a +:+ b bar = c +:+ d baz = (foo +:+ bar) == (a +:+ b +:+ c +:+ d) Stefan signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Is a type synonym declaration really a synonym ?
alpheccar wrote: Can someone confirm me that: type TA = A :+: B type TB = C :+: D type T = TA :+: TB This is type T = (A :+: B) :+: (C :+: D) is not equivalent to type T = A :+: B :+: C :+: D is type T = A :+: (B :+: (C :+: D)) So these types are indeed not the same. Twan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
On Sat, Dec 22, 2007 at 04:40:00PM -0500, Sterling Clover wrote: I'm curious if you get the same performance difference importing GHC.Listinstead of Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr, which in turn imports GHC.List, which provides a bunch of fusion rules pragmas that would probably optimize your (++) usage. If this is the case, not sure if its a bug or not, but all this will have to be thought through as more stream fusion is rolled out anyway, I suspect? --S The Prelude imports GHC.List, iirc. Stefan signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: MonadFix
Joost Behrends wrote: apfelmus writes: Huh? p intsqrt n is evaluated just as often as p*p n , with changing n . Why would that be less expensive? Btw, the code above test for r==0 first, which means that the following p*p n is tested exactly once for every prime candidate p . No. One point in the introduction of DivIter is, that intsqrt dividend is stored there and only recomputed, when a new factor is found. Yes, I'm sorry, it didn't occur to me that recomputing per actual prime factor (with multiplicities, i.e. p^5 counted 5 times) is better than recomputing per candidate factor (without multiplicities, i.e. p^5 counted only once). And concerning my cycled lists of summands as [2,4] or [2,4,2,4,2,4,2,6,2,6]: an easily computable function stepping through all primes can only be a function, which yields primes plus some more odd numbers. This is, what i tried. Yes, this scheme was my intention, too. The list primes' doesn't need to (and indeed shouldn't) be a list of actual primes, just a good guess like primes' = 2:[3,5] primes' = 2:3:scanl (+) 5 (cycle [2,4]) or something with [2,4,2,4,2,4,2,6,2,6]. So, it's an infinite list of numbers that qualify as candidates for being a prime factor of n (which I called prime candidates. Not a good name, since they don't need to be actual prime numbers.) What I want to say is that using such an infinite is a nice way to separate the generate-prime-factor-candidates-logic from the test-all-those-candidates-loop. It's not necessary to hard-code it with a predicate like iterdivisors x | x == 0 = 3 | x == 1 = 5 | otherwise x = iterdivisors (x-1) + ((cycle [2,4]) !! x) (which, in this particular form, is hopelessly inefficient) or special step functions like d2 :: DivIter - DivIter d2 x |dividend x `mod` divisor x 0 = x { divisor = divisor x + 2} |otherwise = found x d4 :: DivIter - DivIter d4 x |dividend x `mod` divisor x 0 = x { divisor = divisor x + 4} |otherwise = found x d6 :: DivIter - DivIter d6 x |dividend x `mod` divisor x 0 = x { divisor = divisor x + 6} |otherwise = found x divisions :: DivIter - DivIter divisions y |or[divisor y == 3, divisor y == 5] = divisions (d2 y) |divisor y = bound y = divisions (d6$d2$d6$d4$d2$d4$d2$d4 y) |otherwise= y It's not even necessary to treat 2 in a special way like twosect :: (Integer,[Integer]) - (Integer,[Integer]) twosect m |odd (fst m) = m |even (fst m) = twosect (div (fst m) 2, snd m ++ [2]) does, the primes' list handles it all. Regards apfelmus ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
Daniel Fischer daniel.is.fischer at web.de writes: Of course, one minute after I sent my previous mail, I receive this one :( However, one point, it might be faster to factor out all factors p in found and only then compute the intsqrt, like found x = x{dividend = xstop, bound = intsqrt xstop, result = result x ++ replicate k p} where p = divisor x (xstop,k) = go (dividend x) 0 go n m | r == 0= go q (m+1) | otherwise = (n,m) where (q,r) = n `divMod` p True - but be aware, that this will slightly slow down the computation for not multiple factors. And - as you recently noted - the really expensive part are all the tried factors, which do not divide the queried number. All this is just a first approach to the problem. When i talk of naively programmed, then i want to say, that number theorists might have much better numerical orders marching through all primes plus some more odd numbers. I didn't search for that on the net. The last version was some kind of resign from tries like this: firstPrimes = [3,5,7,11,13,17] start = last firstPrimes pac = product firstPrimes slen = length lsumds lsumds = drop 1 (fst$getSummands (singleton start, start)) where getSummands :: (Seq Int, Int) - (Seq Int, Int) getSummands r |snd r bnd= getSummands ((fst r)|k, snd r + k) |otherwise = r where bnd = 2*pac + start k = getNext (snd r) getNext n |and [(n+2)`mod`x0 | x-firstPrimes] = 2 |otherwise= 2 + getNext (n+2) smallmod :: Int - Int - Int smallmod n m | nm = n | otherwise = 0 divstep :: (DivIter,Int) - (DivIter, Int) divstep (x,n) | and [(fromInteger $ divisor x)start, ximod0] = (x {divisor = divisor x + 2}, n) | (fromInteger$divisor x) start = (x {dividend = xidiv, bound = intsqrt(xidiv), result = result x ++ [divisor x]}, n) | ximod0 = (x {divisor = divisor x + toInteger (index lsumds n)}, smallmod (n+1) slen) | otherwise = (x {dividend = xidiv, bound= intsqrt(xidiv), result = result x ++ [divisor x]}, n) where (xidiv, ximod) = divMod (dividend x) (divisor x) divisions :: (DivIter, Int) - (DivIter, Int) divisions (y,n) | divisor y = bound y = divisions (divstep (y,n)) | otherwise= (y,0) Here the additions to divisor are taken from the sequence lsmnds (List of SuMaNDS) - the type Seq from Data.Sequence is faster with the function index than Data.List with !!. getSummands is a kind of reduced sieve of Eratosthenes. The main improvement is the longest line: |ximod0 = (x {divisor = divisor x + toInteger (index lsumds n)}, smallmod (n+1) slen) I even considered converting lsmnds to ByteString and storing them - the build of lsmnds for firstPrimes = [3,5,7,11,13,17,19,23,29] (which already has some MB footprint) takes several minutes. But we have to track the number of iteration we are in. And that eats up much more than the reduction of divisions for failing factors. The code works (called slightly modificated by primefactors), but needs 5.41 minutes for 2^61+1 :((. Also expensive might be the lookup in lsumds - the code gets even slower with longer lists for firstPrimes. divisions (d6$d2$d6$d4$d2$d4$d2$d4 y) is derived from lsmnds [3,5] = [4,2,4,2,4,6,2,6]. For me the whole matter is closed for now - the 1.34 minutes are no bad result. Amd anyway the code might represent a not too bad lower bound for efficiency of decomposing algorithms. Auf Wiedersehen, Joost ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Reducing the executable output filesize with GHC (on Windows)
I remember having read how to do this a while ago, but I can't google it back. How can I force the linker to exclude all the sections that are not referenced? I thought it had to do something with -split-objs? I've written a short consolemode program, but it compiles to 1.3MB. Since I will be creating a lot of these small apps, surely this filesize can be reduced? Thanks, Peter PS: The program I wrote is 250 lines of code. It is a simple vectormath exercise generator for my students which I wrote in 5 hours, something I would never been able to do in C++/C#, even though I have much more experience in the latter languages. Haskell's sooo nice, and I'm just scratching the surface of it. Thank to all of you guys for the hard work J ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
Here's the Prelude imports I see at the moment. Didn't chase down the dependencies in all the code initially and now I see that GHC.Show does import GHC.List. Still, I suspect this has something to do with fusion nonetheless. #ifdef __GLASGOW_HASKELL__ import GHC.Base import GHC.IOBase import GHC.Exception import GHC.Read import GHC.Enum import GHC.Num import GHC.Real import GHC.Float import GHC.Show import GHC.Err ( error, undefined ) #endif (from http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/Prelude.html) --s On Dec 22, 2007 4:44 PM, Stefan O'Rear [EMAIL PROTECTED] wrote: On Sat, Dec 22, 2007 at 04:40:00PM -0500, Sterling Clover wrote: I'm curious if you get the same performance difference importing GHC.Listinstead of Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr, which in turn imports GHC.List, which provides a bunch of fusion rules pragmas that would probably optimize your (++) usage. If this is the case, not sure if its a bug or not, but all this will have to be thought through as more stream fusion is rolled out anyway, I suspect? --S The Prelude imports GHC.List, iirc. Stefan -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFHbYVTFBz7OZ2P+dIRAi02AJ41CyIVwCRLH2MU51Sc8Rjrtgxy+ACeL1m8 F2a0Id2PErsKgjOyggkT8Ig= =T0A3 -END PGP SIGNATURE- ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
I'm curious if you get the same performance difference importing GHC.Listinstead of Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr, which in turn imports GHC.List, which provides a bunch of fusion rules pragmas that would probably optimize your (++) usage. If this is the case, not sure if its a bug or not, but all this will have to be thought through as more stream fusion is rolled out anyway, I suspect? --S ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
Sterling Clover s.clover at gmail.com writes: I'm curious if you get the same performance difference importing GHC.List instead of Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr, which in turn imports GHC.List, which provides a bunch of fusion rules pragmas that would probably optimize your (++) usage. If this is the case, not sure if its a bug or not, but all this will have to be thought through as more stream fusion is rolled out anyway, I suspect? Yes - the same difference: 1.33 minutes vs. 2.30 now. I was near at reporting this as a bug, but rejected that idea. What does bug mean here ? I am really a rookie at Haskell - this working on primefactors is nothing but an excercise (however i casually try number theoretic problems and often missed a program like this). But - as it appears to me - the dramatic advance in speed ghc has made recently is to a great extent due to improved types and their methods. What i see at haskell.org/ghc/docs/latest/html/libraries looks like huge road works to me. And then it is not a bug, if elder libraries as the Prelude perhaps are not completely up to date. Another problem is, that my program was not completely correct. But it didn't crash and got most numbers decomposed correct. But the strange behavior disappeared with a correct version. Perhaps the Prelude will get better now. Cheers, Joost ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
Hi Yes - the same difference: 1.33 minutes vs. 2.30 now. I was near at reporting this as a bug, but rejected that idea. What does bug mean here ? If it can be reproduced on anyones machine, it is a bug. If you can bundle up two programs which don't read from stdin (i.e. no getLine calls) or the standard arguments (i.e. getArgs) which differ only by the Data.Char import, and have massive speed difiference, then report a bug. You should probably also give your GHC versions and platforms etc. Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Reducing the executable output filesize with GHC (on Windows)
Hello Peter, Sunday, December 23, 2007, 1:12:05 AM, you wrote: How can I force the linker to exclude all the sections that are not referenced? ghc ... -optl -s I thought it had to do something with -split-objs? no, it's just `strip` I▓ve written a short consolemode program, but it compiles to 1.3MB. you can use upx --lzma to further decrease EXE size -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
Am Samstag, 22. Dezember 2007 22:57 schrieb Joost Behrends: Daniel Fischer daniel.is.fischer at web.de writes: Of course, one minute after I sent my previous mail, I receive this one :( However, one point, it might be faster to factor out all factors p in found and only then compute the intsqrt, like found x = x{dividend = xstop, bound = intsqrt xstop, result = result x ++ replicate k p} where p = divisor x (xstop,k) = go (dividend x) 0 go n m | r == 0= go q (m+1) | otherwise = (n,m) where (q,r) = n `divMod` p True - but be aware, that this will slightly slow down the computation for not multiple factors. Why? You have to do one more division than the power of the prime dividing anyway. Or are you thinking about 'replicate 1 p' for simple factors? That will indeed be a bit slower than [p], but I dare say factorise a number with at least two prime squares or one prime cube dividing it, and you gain. And - as you recently noted - the really expensive part are all the tried factors, which do not divide the queried number. Yes, for anything with sufficiently many small primes not dividing but one or more large, anything you do with the prime factors is peanuts. Okay, make it (xstop,k) = go (dividend x `quot` p) 1, and replace divMod with quotRem. then you still have one superfluous division, but that's your making, not mine: d2 x | dividend x `mod` divisor x 0 = | otherwise = make it d2 x | r == 0 = x{divisor = divisor x + 2} | otherwise = x{dividend = xstop, bound = intsqrt xstop, result = result x ++ replicate k p} where p = divisor x (q,r) = dividend x `quotRem` p (xstop,k) = go q 1 go n m | r' == 0 = go q' (m+1) | otherwise = (n,m) where (q',r') = n `quotRem` p and you have no superfluous division, exactly one intsqrt per prime dividing your number. All this is just a first approach to the problem. When i talk of naively programmed, then i want to say, that number theorists might have much better numerical orders marching through all primes plus some more odd numbers. I didn't search for that on the net. A simple fast method, although not guaranteed to succeed is Pollard's rho-method, preceded by trial division by small primes (up to 1,2,10 million or so, depending on what you have) and a good pseudo-primality test (google for Rabin-Miller, if you're interested). I haven't but skimmed your code below, but I think the idea is something like factor :: Integer - [Integer] factor 0 = error 0 has no decomposition factor 1 = [] factor n | n 0 = (-1):factor (-n) | n 4 = [n] | otherwise = go n srn tests where srn = isqrt n go m sr pps@(p:ps) | r == 0= p:go q (isqrt q) pps | p sr = if m == 1 then [] else [m] | otherwise = go m sr ps where (q,r) = m `quotRem` p tests = 2:3:5:7:11:scanl (+) 13 (tail $ cycle dlist) isqrt :: Integer - Integer isqrt n | n 10^60 = floor (sqrt $ fromInteger n) | otherwise = 10^20*isqrt (n `quot` 10^40) dlist :: [Integer] dlist = zipWith (-) (tail wheel) wheel smallPrimes :: [Integer] smallPrimes = [2,3,5,7,11] wheelMod :: Integer wheelMod = product smallPrimes wheel :: [Integer] wheel = [r | r - [1,3 .. wheelMod+1] , all ((/= 0) . (mod r)) (tail smallPrimes)] Now, the construction of the wheel and the list of differences, dlist, is relatively time-consuming, but can be sped up by using a good sieving method (my choice would be using an array (STUArray s Int Bool)). Of course, when you go much beyond 13 with smallPrimes, you'll end up with a rather large dlist in your memory, but smallPrimes = [2,3,5,7,11,13,17] should still be okay, no indexing means it's probably faster than Seq. Cheers, Daniel The last version was some kind of resign from tries like this: firstPrimes = [3,5,7,11,13,17] start = last firstPrimes pac = product firstPrimes slen = length lsumds lsumds = drop 1 (fst$getSummands (singleton start, start)) where getSummands :: (Seq Int, Int) - (Seq Int, Int) getSummands r |snd r bnd= getSummands ((fst r)|k, snd r + k) |otherwise = r where bnd = 2*pac + start k = getNext (snd r) getNext n |and [(n+2)`mod`x0 | x-firstPrimes] = 2 |otherwise= 2 + getNext | (n+2) smallmod :: Int - Int - Int smallmod n m | nm = n | otherwise = 0 divstep :: (DivIter,Int) - (DivIter, Int) divstep (x,n) | and [(fromInteger $ divisor x)start, ximod0] = (x {divisor = divisor x + 2}, n) | (fromInteger$divisor x) start = (x
Re: [Haskell-cafe] GtkMathView wrapper for Haskell / Gtk2HS?
bf3: Did anybody already wrap [1]http://helm.cs.unibo.it/mml-widget so it can be used with Haskell / Gtk2HS? If not, would it be good project for me to learn Haskell's FFI? It involves C++, so I guess that would be harded to wrap? I'm not aware of any wrapper. C++ can be wrapped by going via C (its not done often though). -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [15/16] SBM: Predictions compared to the measurements
firefly: From Don Stewart: Summary: the suspicious lazy bytestring program works now. (constant space, and fastest overall, as expected originally) Program 1, lazy bytestring length . filter Yesterday: ./A +RTS -sstderr 150M 1.01s user 0.10s system 98% cpu 1.123 total 40M allocated * Today (fixed!): ./A +RTS -sstderr 150M 0.26s user 0.06s system 96% cpu 0.332 total 2M allocated Reason, deprecated array fusion mucking up the optimiser. I think we can close this regression. Nope. Look at the memory graphs: hs/space-bslc8-lenfil-1: 38632 ██▌ | -- 38644 ██▌ | --1940 ▌ | -- 109404 █▋ | -- 82324 ██▍ | -- 109388 █▋ | -- 82304 ██▎ | It is fixed for ghc 6.8.2 running bytestring 0.9.0.2 but not for ghc ^ 6.9.20071119 and head (as of noon 2007-12-19), no matter the bytestring version. There are lots of memory performance bugs in ghc 6.9. Please package this up as a bug report against GHC head. Any regression wrt. 6.8.2, using the same bytestring version, is going to be a ghc issue (not a bytestring library issue). -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [16/16] SBM: Discussion and Conclusion
firefly: General --- Bytestrings are faster than haskell strings. This is good and expected. But their memory use, ouch! There seems to be a grave memory bug in 6.9. Bug report -- since the bytestring library hasn't changed between 6.8.2 and head, that sounds like a bug report. Lazy bytestrings are slower than strict bytestrings -- this was completely unexpected for me. Which programs should I look at? Is there a specific program or two where lazy bytestring performance is entirely out of whack (against 6.8.2?)? If so, please forward it to me and Duncan. I/O --- The I/O patterns seen in the various programs are: 1) strict bytestrings = read increasingly large chunks (they double every time). I don't know if the already-read data have to be copied around or if the memory allocator can handle extending existing blocks (if they are at the front of the heap, say) like realloc() can. if you're reading from stdin, and we don't know the size, bytestring getContents uses realloc. Even if the blocks do get copied around, that's not where the performance limiter is (speed-wise). It does seem to be a bit unwise in terms of memory pressure. Does the gc kick in every time the heap has to expand? Does it do a full collection then? If no other allocations have happened since the last collection than the allocation that caused the heap to expand, then perhaps skip the collection? (or some criteria similar to that) 2) lazy bytestrings = read blocks of 32K-8 bytes, as expected. The C benchmarks show that there's no penalty for reading 32K-8 vs. 32K. 3) native strings = read blocks 8K. The C benchmarks show that it barely matters if the block size is 4K, 32K, or 32K-8 bytes. In any case, the small differences due to block size are completely in the noise. Reading very large blocks, though, as the strict bytestrings do, actually has a cost (38-70% slower, depending on the CPU and kernel and RAM and busspeeds, etc -- see email 10 in the series). It is still peanets compared for almost all the benchmarks. Useful information, thanks. Backend --- The backend turns out to be rather bad. It is generally very easy to improve the generated code by hand without doing anything in terms of sophisticated analysis. One can rewrite inner loops using heroic code (load four characters at a time together into a 32-bit register or even use MMX to handle eight characters in parallel) but it isn't really necessary to do that to gain a good improvement and start being competitive with C. The backend is probably what is costly on some of the lazy bytestring and native string code because there are too many tests to see if the buffer has been exhausted (I think). Some simple hoisting and common subexpression elimination would go far here. There may be things we can do at the library level too. We changed the lazy bytestring representation in the last release cycle to remove a number of tests. If you've a specific example would help though. Thoroughness It really is necessary to benchmark more than one ghc/library combination, otherwise the bad memory bugs wouldn't stand out so clearly or I would have believed, as Don Stewart did, that he had fixed the memory allocation bug. I fixed a particular issue with the library. But please make a bug report if there are other issues wrt. ghc head in particularly. Action Items Suggested short-term actions: o Can the allocator expand blocks? If not, see if it can be implemented. o Find out why lazy bytestrings by hand are that slow - is it due to lack of hoisting in the backend? Example please! o Improve backend, mainly by not hiding information from register allocator. Alternatively, by using an assembly-to-assembly transformator that improves the register allocation. Some nice examples that could help motivate the native gen hackers would be useful. o Fix -sstderr. o perhaps even let the RTS print out /proc/self/maps or /proc/self/status or peak RSS returned by getrusage() in the ru_maxrss field? o Find memory leaks in ghc 6.9 code. Bug report time. o ghc should allow round-tripping of the C-- code it generates. o would be awfully nice if the backend didn't sometimes throw the recognizable labels away. o can the core/stg code be made easier to read? Can you ensure these points are summarised and directed to the ghc bug tracker, http://hackage.haskell.org/trac/ghc/newticket?type=bug to this work isn't lost on [EMAIL PROTECTED] -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
Neil Mitchell ndmitchell at gmail.com writes: If it can be reproduced on anyones machine, it is a bug. If you can bundle up two programs which don't read from stdin (i.e. no getLine calls) or the standard arguments (i.e. getArgs) which differ only by the Data.Char import, and have massive speed difiference, then report a bug. You should probably also give your GHC versions and platforms etc. Thanks for your attention too ! Now i tried a version without input (just computing the primefactors of the constant 2^61+1, what it did correctly in spite of its bugginess). And it didn't have the Data.Char bug (and Data.List bug) too. As my original code hasn't on Linux also it seems. Thus it happens only in an exotic configuration. Windows will stay exotic in the Haskell community. Before should noone has reproduced it at least on Windows (XPpro SP2 is my version), i will do nothing more. The hardware is Intel Celeron 2.2GHZ, 512 MB Ram. ghc 6.8.1 lives on D:\\Programme (not on the system drive C:, which poses problems to Cabal, told aside). I just made the standard installation (do not remember, whether by unzipping alone or there was an MSI) not touching anything, of course. I was happy about no autoconf (which i see in the category of desasters like EMM386 for MS-DOS). But something is strange: ghci doesn't accept qualified imports. It produces a parse error. That seems a bug to me, because my ghci accepts import (any module) nevertheless. But i see it as a minor bug - the compiler is much more important. Happy days, Joost ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%
Hi again, Daniel cannot sleep tonight - perhaps from the feeling to loose too much time with these things. Yes - it's the wheel. And a dlist made from [3,5,7,11,13,17] was optimal in some of my experiments too. You will probably know it - but perhaps there are third-party readers: A last try to improve the final version was to replace Integer by Int64 (importing Prelude qualified). There was no difference ! That is good news and bad - the good news (and that's much more important), that Integer is absoulutely cleanly implemented. However, if this is so, i have little sympathy for being forced to use fromInteger and toInteger - for this special case i would prefer automatic coercion. And - sorry - cannot do other than seeing anything else as ill-advised dogmatism. But perhaps i am spoiled from Python :). I did this, because for secure decision about divisibility the program isn't useful beyond that. Thanks for that entry point: I took google Rabin-Miller in a comment of this last version. Happy days, Joost ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe