Re: [Haskell-cafe] Why does this blow the stack?

2007-12-22 Thread Luke Palmer
On Dec 22, 2007 12:06 AM, Stefan O'Rear [EMAIL PROTECTED]   The
explicit loop you're talking about is:
  enumDeltaInteger :: Integer - Integer - [Integer]
  enumDeltaInteger x d = x : enumDeltaInteger (x+d) d
  That code isn't very complicated, and I would hope to be able to write
  code like that in my own programs without having to worry about
  strictness.  Given that the compiler even has an explicit signature,
  why can't it transform that code to
  enumDeltaInteger x d = let s = x + d in x : (seq s $ enumDeltaInteger s 
  d)
  since it knows that (Integer+Integer) is strict?  Of course, improving
  the strictness analysis is harder, but it pays off more, too.

 Because they simply aren't the same.

 Try applying your functions to undefined undefined.

This took a little work for me to see.  Here it is for the interested:

Prelude let edi :: Integer - Integer - [Integer]; edi x d = x : edi (x+d) d
Prelude let edi' :: Integer - Integer - [Integer]; edi' x d = let s
= x + d in x : (seq s $ edi s d)
Prelude _:_:_ - return $ edi undefined undefined
Prelude _:_:_ - return $ edi' undefined undefined
*** Exception: Prelude.undefined

Luke
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Why does this blow the stack?

2007-12-22 Thread David Benbennick
On 12/21/07, Stefan O'Rear [EMAIL PROTECTED] wrote:
 Because they simply aren't the same.

Good point; thanks.  That means that Don's patch could theoretically
break some existing Haskell program:

Prelude length $ take 10 ([undefined ..] :: [Integer])
10

With his patch, that will throw.

Some other types have the same issue (lazy Enum when it could be
strict, leading to stack overflow):

Prelude length $ take 10 ([undefined ..] :: [Double])
10
Prelude length $ take 10 ([undefined ..] :: [Float])
10
Prelude Foreign.C.Types length $ take 10 ([undefined ..] :: [CFloat])
10
Prelude Foreign.C.Types length $ take 10 ([undefined ..] :: [CDouble])
10
Prelude Foreign.C.Types length $ take 10 ([undefined ..] :: [CLDouble])
10
Prelude Data.Ratio length $ take 10 ([undefined ..] :: [Ratio Int])
10
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] [1/16] SBM: The Haskell and C benchmarks

2007-12-22 Thread Peter Firefly Brodersen Lund
Here are the 48 Haskell and C benchmarks.

Don Stewart contributed three (although I had to fight a bit to make one of
them compile).  
Jules Bean (quicksilver) contributed one.
Bertram Felgenhauer (int-e) contributed three (in the form of a single file,
which I untangled).
Spencer Jannsen (sjannsen) contributed one.

wli (William Lee Irwin III) inspired me to add the getwchar benchmarks.

I used the following shell code to gather all the benchmarks:

 (for F in hs/*.hs c/*.c; \
do echo --; \
   echo $F:;\
   echo ; \
   cat $F;  \
done; \
   echo ==  \
 )  xx.txt

They are not in the same order as in the Makefile or in the reports,
unfortunately.

-Peter

--
hs/byte-bsacc.hs:

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString as B

cnt :: Int - B.ByteString - Int
cnt !acc !bs = if B.null bs
 then acc
 else cnt (acc+1) (B.tail bs)

main = do s - B.getContents
  print (cnt 0 s)
--
hs/byte-bsfoldlx.hs:

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString as B

cnt :: B.ByteString - Int
cnt !bs = B.foldl' (\sum _ - sum+1) 0 bs

main = do s - B.getContents
  print (cnt s)
--
hs/byte-bsfoldrx.hs:

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString as B

cnt :: B.ByteString - Int
cnt !bs = B.foldr' (\_ sum - sum+1) 0 bs

main = do s - B.getContents
  print (cnt s)
--
hs/byte-bsl---acc.hs:

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Lazy as B

cnt :: Int - B.ByteString - Int
cnt !acc !bs = if B.null bs
 then acc
 else cnt (acc+1) (B.tail bs)

main = do s - B.getContents
  print (cnt 0 s)
--
hs/byte-x-acc-1.hs:

{-# LANGUAGE BangPatterns #-}

cnt :: Int - String - Int
cnt !acc bs = if null bs
 then acc
 else cnt (acc+1) (tail bs)

main = do s - getContents
  print (cnt 0 s)
--
hs/byte-x-acc-2.hs:

{-# LANGUAGE BangPatterns #-}

cnt :: Int - String - Int
cnt !acc !bs = if null bs
 then acc
 else cnt (acc+1) (tail bs)

main = do s - getContents
  print (cnt 0 s)
--
hs/byte-x-foldl.hs:

{-# LANGUAGE BangPatterns #-}

cnt :: String - Int
cnt !bs = foldl (\sum _ - sum+1) 0 bs

main = do s - getContents
  print (cnt s)
--
hs/byte-x-foldr-1.hs:

{-# LANGUAGE BangPatterns #-}

cnt :: String - Int
cnt bs  = foldr (\_ sum - sum+1) 0 bs

main = do s - getContents
  print (cnt s)
--
hs/byte-x-foldr-2.hs:

{-# LANGUAGE BangPatterns #-}

cnt :: String - Int
cnt !bs = foldr (\_ sum - sum+1) 0 bs

main = do s - getContents
  print (cnt s)
--
hs/space-bs-c8-acc-1.hs:

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Char8 as B

cnt :: Int - B.ByteString - Int
cnt !acc bs = if B.null bs
then acc
else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs)

main = do s - B.getContents
  print (cnt 0 s)
--
hs/space-bs-c8-count.hs:

-- Don Stewart
import qualified Data.ByteString.Char8 as B
main = print . B.count ' ' = B.getContents

--
hs/space-bs-c8-foldlx-1.hs:

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Char8 as B

cnt :: B.ByteString - Int
cnt bs  = B.foldl' (\sum c - if c == ' ' then sum+1 else sum) 0 bs

main = do s - B.getContents
  print (cnt s)
--
hs/space-bs-c8-foldlx-2.hs:

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Char8 as B

main = do s - B.getContents
  print $ B.foldl' (\v c - if c == ' ' then v+1 else v :: Int) 0 s

--
hs/space-bs-c8-foldrx.hs:

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Char8 as B

cnt :: B.ByteString - Int
cnt bs  = B.foldr' (\c sum - if c == ' ' then sum+1 else sum) 0 bs

main = do s - B.getContents
  print (cnt s)
--
hs/space-bs-c8-lenfil.hs:

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Char8 as B

cnt :: B.ByteString - Int
cnt bs  = B.length (B.filter (== ' ') bs)

main = do s - B.getContents
  print (cnt s)
--
hs/space-bslc8-acc-1.hs:

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Lazy.Char8 as B

cnt :: Int - B.ByteString - Int
cnt !acc bs = if B.null bs
then acc
else cnt (if B.head bs == ' ' then acc+1 

[Haskell-cafe] [0/16] SBM: Simple Bytestring Microbenchmarks, Overview and Introduction

2007-12-22 Thread Peter Firefly Brodersen Lund
Table of contents:
0/16This email
1/16The Haskell and C benchmarks
2/16Inner loops of the hand-tweaked assembly benchmarks
3/16The Makefile
4/16How to use the Makefile (how to run benchmarks etc.)
5/16Support scripts and scriptlets
6/166.9.20071124 Athlon Duron
7/166.9.20071208 Core Duo
8/166.9.20071119 Pentium III
9/166.9.20071119 Athlon64
10/16   Graphs for 6.9.x across four cpus
11/16   Graphs for hand-tweaked assembly benchmarks
12/16   Graphs for 7 ghc/bytestring combinations on a 2GHz Athlon64 300+
13/16   Graphs that show the infidelity of -sstderr
14/16   Behind the measurements (rationale)
15/16   Predictions compared to the measurements
16/16   Discussion and Conclusion



Simple Bytestring Microbenchmarks
-

Introduction

I love parsers.  I have been writing parsers for fun for over twenty years.
The nicest way to construct a parser used to be to write a recursive descent
parser by hand.  If you had to work with people who'd had the misfortune of
a university education, you would resort to lex and yacc (flex and bison),
despite their many shortcomings.

Combinator parsers is the only real improvement over hand-written recursive
descent parsers that I know of.  They do tend to require features that not all
languages provide.  I don't know how to write a good one in C, for example.
They do work very well in Haskell, though.

So, I've started writing a parser in Haskell (ghc, really) for the programming
language X++.  X++ is not a nice language but that's beside the point.  The
challenge for me is to write an efficient compiler + provide good analysis
tools for X++.  I think I stand a better chance of doing that in Haskell (ghc)
than in practically any other language.

There are a few drawbacks, though.

I love speed.  And efficiency.


String handling in Haskell
--
Native strings are simple and generally work well, but they are slow, take up
too much memory and there's the whole encoding mess that still needs to be
sorted out.

People have worked on other string representations and libraries for quite a
while.  I think packedstrings (as used in Darcs) was one of the first ones.
Bytestrings is the current incarnation.  It seems to be just the right thing,
especially when combined with improved automatic fusion in the compiler so
higher-order functions don't have to be expensive.

I use Parsec as my parser combinator library at the moment, which
uses native strings.  I would dearly love Parsec to be faster and use less
memory.  I think bytestrings will be part of any substantial improvement in
Parsec's resource consumption.


Other performance concerns
--
File I/O is also interesting for a compiler writer.  I would like to have a
program that is as fast as possible both when the source files are already
cached by the operating system and when they are not.  The former situation is
best handled with mmap() and the latter is best handled by read(), preferably
in combination with multi-threading so the compiler doesn't have to waste too
much time waiting for disk seeks.  Haskell seems to be very close to ideal for
me because it has very good threading support and very accessible raw access
to the operating system.

File I/O is not my current bottleneck, though.  I'll probably take a closer
look at file I/O when the other performance problems have been solved.

Then there's the general quality of the generated code.  Having read just about
every paper on ghc that was available back in the late nineties (when I first
looked at Haskell), I'd thought that the quality was good and that the compiler
also had extremely good high-level optimizations, in other words, that
abstraction was free.

I also read the C-- papers and thought that it was a very interesting and
promising approach.  I'd expected the C-- path to have matured and be well-
optimized by now.

Unfortunately, the backend is /the/ weak spot in ghc.  The frontend is heroic,
the typesystems are (too) abundant and rich, the language itself is nice -- but
the backend is not.  Looking at the generated code I'd say that it is slightly
better than Turbo Pascal 3.x and about on par with Turbo Pascal 4.0, a compiler
that didn't use any intermediate code at all, compiled each statement in
isolation, was single-pass, and had a compilation speed of about 27000 lines
per minute on an 8 MHz IBM PC AT.


Ecosystem and culture
-
Haskell has a very good ecosystem.  Probably the second best one amongst the
modern functional languages.  Ten years ago, I'd thought that Standard ML would
win but the only MLish language with a good ecosystem and culture is OCaml,
which unfortunately isn't really Standard ML.

By ecosystem I mean things like access to raw operating system calls, access
to libraries written in other languages, readily available libraries for
graphical user interfaces, databases, XML processing, network 

[Haskell-cafe] [2/16] SBM: Inner loops of the hand-tweaked assembly benchmarks

2007-12-22 Thread Peter Firefly Brodersen Lund
I've taken the two benchmarks byte-bsacc and space-bs-c8-acc-1 and
gradually tweaked their inner loops from something that used memory all the
time to something that used registers more and more efficiently.  I've done
this gradually, pretty much one register at a time.  Along the way, I've also
done a simple common subexpression/loop hoisting thing in which I combined the
pointer to the start of the string and the index into the string into a single
pointer.  Doing this in real life may cause bad problems with the garbage
collector.

At the end, I go a bit mad and start doing heroic optimizations (reading four
bytes at a time, using MMX registers to read 8 bytes at a time, twisted MMX
math to keep 8 space counters in an MMX register + a bit of loop unrolling).

Here follows first the two original inner loops and then the 23 hand-tweaked
versions.

I used the following shell code to isolate the inner loops:

 (for F in hs/byte-bsacc.s hs/space-bs-c8-acc-1.s hand/*.s ; \
do echo --; \
   echo $F:;\
   echo ; \
   cat $F | perl -e 'while(){ if (/Main_zdwcnt_info:/ .. /.section 
.data/) { print; }}' | head -n-1; \
done; \
   echo ==; \
 )  xx.txt


-Peter

--
hs/byte-bsacc.s:

Main_zdwcnt_info:
.LcYL:
cmpl $0,16(%ebp)
jle .LcYO
movl 12(%ebp),%eax
incl %eax
movl (%ebp),%ecx
incl %ecx
subl $1,16(%ebp)
movl %eax,12(%ebp)
movl %ecx,(%ebp)
jmp Main_zdwcnt_info
.LcYO:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
--
hs/space-bs-c8-acc-1.s:

Main_zdwcnt_info:
.Lc16u:
cmpl $0,16(%ebp)
jle .Lc16x
movl 4(%ebp),%eax
movl 12(%ebp),%ecx
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
jne .Lc16F
movl 12(%ebp),%eax
incl %eax
movl (%ebp),%ecx
incl %ecx
subl $1,16(%ebp)
movl %eax,12(%ebp)
movl %ecx,(%ebp)
jmp Main_zdwcnt_info
.Lc16x:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
.Lc16F:
movl 12(%ebp),%eax
incl %eax
subl $1,16(%ebp)
movl %eax,12(%ebp)
jmp Main_zdwcnt_info
--
hand/byte-bsacc-a.s:

Main_zdwcnt_info:
.LcYN:
cmpl $0,16(%ebp)
jle .LcYQ

movl 00(%ebp),%ecx
movl 12(%ebp),%eax
movl 16(%ebp),%edx

incl %ecx
incl %eax
decl %edx

movl %ecx,00(%ebp)
movl %eax,12(%ebp)
movl %edx,16(%ebp)
jmp Main_zdwcnt_info

.LcYQ:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
--
hand/byte-bsacc-b.s:

Main_zdwcnt_info:
.LcYN:
cmpl $0,16(%ebp)
jle .LcYQ

movl 00(%ebp),%ecx
movl 12(%ebp),%eax
movl 16(%ebp),%edx

.L_again:
cmpl $0,%edx
jle  .L_out
incl %ecx
incl %eax
decl %edx
jmp  .L_again
.L_out:
movl %ecx,00(%ebp)
movl %eax,12(%ebp)
movl %edx,16(%ebp)
jmp Main_zdwcnt_info

.LcYQ:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
--
hand/byte-bsacc-c.s:

Main_zdwcnt_info:
.LcYN:
cmpl $0,16(%ebp)
jle .LcYQ

movl 00(%ebp),%ecx
movl 12(%ebp),%eax
movl 16(%ebp),%edx

cmpl $0,%edx
jle  .L_out
.L_again:
incl %ecx
incl %eax
decl %edx
cmpl $0,%edx
jg   .L_again

.L_out:
movl %ecx,00(%ebp)
movl %eax,12(%ebp)
movl %edx,16(%ebp)
jmp Main_zdwcnt_info

.LcYQ:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
--
hand/byte-bsacc-d.s:

Main_zdwcnt_info:
.LcYN:
cmpl $0,16(%ebp)
jle .LcYQ

movl 00(%ebp),%ecx
movl 12(%ebp),%eax
movl 16(%ebp),%edx

cmpl $0,%edx
jle  .L_out
.align 16
.L_again:
incl %ecx
incl %eax
decl %edx
cmpl $0,%edx
jg   .L_again

.L_out:
movl %ecx,00(%ebp)
movl %eax,12(%ebp)
movl %edx,16(%ebp)
jmp Main_zdwcnt_info

.LcYQ:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
--
hand/space-bs-c8-acc-1-a.s:

Main_zdwcnt_info:

.Lc16w:
cmpl $0,16(%ebp)
jle .Lc16z
movl 4(%ebp),%eax
movl 12(%ebp),%ecx
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
jne .Lc16H
movl 12(%ebp),%eax
incl %eax
movl (%ebp),%ecx
incl %ecx
subl $1,16(%ebp)
movl %eax,12(%ebp)
movl %ecx,(%ebp)
jmp Main_zdwcnt_info
.Lc16H:
movl 12(%ebp),%eax
   

[Haskell-cafe] [3/16] SBM: The Makefile

2007-12-22 Thread Peter Firefly Brodersen Lund
This is the entire Makefile.  It perhaps ought to be sent as an attachment
but my hacky mailer script wouldn't like it.

A few of the lines are wider than 80 columns, unfortunately.

-Peter

# GHC benchmarks of parsing (bytestring, basic code generation, I/O).
#
# Copyright 2007 Peter Lund [EMAIL PROTECTED], licensed under GPLv2.
#
#
# You will need the following tools:
#  perl, strace, /usr/bin/time, bash, a gcc that uses shared libraries and libc
#  (not dietlibc, klibc, uclibc), objdump (usually found in the binutils
#  package).
#
# A benchmark run on a new platform can be split into two phases:
#
#  phase1: Compiles, dices, and slices the code in various ways.  If this
#  completes you can be pretty sure that everything works all right.
#  Performs non-timing sensitive measurements.
#
#  phase2: Performs timing sensitive measurements.  It is a good idea to run
#  this phase on an idle machine, preferably without using X.
#  For example, you can log out of X and run telinit 1 to get to
#  single-user mode.  If you do so, please remember to set the correct
#  path to your ghc compiler.


# Make a few bits of the makefile less noisy.
Q:=@

#

# Ask ghc to optimize and warn
GHCFLAGS= -O2 -W

# Some newer versions of gcc prefer -Wextra -Wall
GCCWARNFLAGS=-W -Wall

# Default compilers
CC=gcc
GHC=ghc
GHCPKG=ghc-pkg

#

HSPROGS=hs/byte-bsacc   \
hs/byte-bsfoldlx\
hs/byte-bsfoldrx\
hs/byte-bsl---acc   \
hs/byte-x-acc-1 \
hs/byte-x-acc-2 \
hs/byte-x-foldl \
\
hs/space-bs-c8-acc-1\
hs/space-bs-c8-count\
hs/space-bs-c8-foldlx-1 \
hs/space-bs-c8-foldlx-2 \
hs/space-bs-c8-foldrx   \
hs/space-bs-c8-lenfil   \
hs/space-bslc8-acc-1\
hs/space-bslc8-acc-2\
hs/space-bslc8-acc-3\
hs/space-bslc8-chunk-1  \
hs/space-bslc8-chunk-2  \
hs/space-bslc8-chunk-3  \
hs/space-bslc8-chunk-4  \
hs/space-bslc8-count\
hs/space-bslc8-foldl\
hs/space-bslc8-foldlx-1 \
hs/space-bslc8-foldlx-2 \
hs/space-bslc8-foldr-1  \
hs/space-bslc8-foldr-2  \
hs/space-bslc8-lenfil-1 \
hs/space-bslc8-lenfil-2 \
hs/space-bsl---foldlx   \
hs/space-x-acc-1\
hs/space-x-acc-2\
hs/space-x-foldl\
hs/space-x-lenfil


# RMPROGS keeps track of programs that are not always included in the tests.
# We do want 'make clean' to delete them even when they are not currently
# part of the build (they may be left over from a previous build).

# stack overflow with long4.
#HSPROGS:=$(HSPROGS) hs/byte-x-foldr-1
 RMPROGS:=$(RMPROGS) hs/byte-x-foldr-1

# stack overflow with long4.
#HSPROGS:=$(HSPROGS) hs/byte-x-foldr-2
 RMPROGS:=$(RMPROGS) hs/byte-x-foldr-2

# stack overflow with long4.
#HSPROGS:=$(HSPROGS) hs/space-x-foldr-1
 RMPROGS:=$(RMPROGS) hs/space-x-foldr-1

# stack overflow with long4.
#HSPROGS:=$(HSPROGS) hs/space-x-foldr-2
 RMPROGS:=$(RMPROGS) hs/space-x-foldr-2


HANDPROGS= hand/byte-bsacc-a\
   hand/byte-bsacc-b\
   hand/byte-bsacc-c\
   hand/byte-bsacc-d\
   \
   hand/space-bs-c8-acc-1-a \
   hand/space-bs-c8-acc-1-b \
   hand/space-bs-c8-acc-1-c \
   hand/space-bs-c8-acc-1-d \
   hand/space-bs-c8-acc-1-e \
   hand/space-bs-c8-acc-1-f \
   hand/space-bs-c8-acc-1-g \
   hand/space-bs-c8-acc-1-h \
   hand/space-bs-c8-acc-1-i \
   hand/space-bs-c8-acc-1-j \
   hand/space-bs-c8-acc-1-k \
   hand/space-bs-c8-acc-1-l \
   hand/space-bs-c8-acc-1-m \
   hand/space-bs-c8-acc-1-n \
   hand/space-bs-c8-acc-1-o \
   hand/space-bs-c8-acc-1-p \
   hand/space-bs-c8-acc-1-q \
   hand/space-bs-c8-acc-1-r \
   hand/space-bs-c8-acc-1-s

RMPROGS:=$(RMPROGS) $(HANDPROGS)


ifeq ($(shell $(GHCPKG) list | grep bytestring),)
 # ghc 6.6.1 with an old version of bytestring in 'base' but without its own
 # module name
HSPROGS:=$(shell printf %s\n $(HSPROGS) | grep -v '.*-chunk-.*')
endif

HANDTEXT:=including hand-tweaked assembly
ifeq ($(shell $(GHC) --version | grep 6.9.20071119),)
HANDPROGS:=
HANDTEXT:=no hand-tweaked assembly
endif
ifneq ($(SUFFIX),)
HANDPROGS:=
HANDTEXT:=no hand-tweaked assembly
endif

CPROGS= c/byte-getchar c/byte-getchar-u c/byte-4k   \
\
c/space-getchar c/space-getchar-u c/space-4k\
c/space-megabuf c/space-getwchar c/space-getwchar-u \
c/space-32k 

[Haskell-cafe] [4/16] SBM: How to use the Makefile (how to run benchmarks etc.)

2007-12-22 Thread Peter Firefly Brodersen Lund
Introduction

Most of the smarts of the benchmark harness is in the Makefile.
If you want to rerun the benchmarks (or a single benchmark) or look at the
intermediate code for a benchmark or the I/O trace or the memory consumption or
the time spent or ... then you use the makefile.

There are some support scripts in shell and Perl (and two C programs) that the
Makefile uses to do its job.  And there are some that you, the user, will want
to interact directly with.

The benchmarks are only expected to work on Linux.  They have been tested on
SuSE 8.2 (from 2003, with a 2.4 kernel), Ubuntu 7.04, and Ubuntu 7.10.


Quick howto
---

 make phase1-- compiles, generates test files, measures memory use.
   Safe to run on a busy machine if there's no active memory
   pressure.
 make phase2-- timing runs.  NOT safe to run on a busy machine.
   Should be run in runlevel 1 (= no X, no daemons, single-user
   mode) for best measurements.
   Outputs report at end.  This is where you check the quality
   of the measurements.  If you don't like them, run 'make
   redophase2' (or delete the .time and .stat files with low
   quality and run 'make phase2' again.)
 make zipdata   -- make a tarball with all the measurements, suitable for
   emailing or putting on a website.

The Makefile will beep after phase 1 and 2.

The above will run a NORMAL run, which is fine during development if you
want to see if you nailed a performance bug.  It runs reasonably fast (about
43 seconds on my Athlon64).

If you want better measurements, you should use:

 make TESTKIND=THOROUGH phase1 phase2 zipdata

This will use a 150MB data file instead of a 15MB one and it will run the
timing measurements 6 times (before throwing the first away) instead of 4
times (before throwing the first away).

If you don't want to use single-user mode, you can improve the measurements
by piping the output to a file (or run the test from the console) instead of
involving a terminal and an X server (the screen update may kick in in the
middle of a timing run and disturb things if for no other reason than their
polluting the CPU caches).


Filesystem layout
-
The benchmarks are in:
  hs/*.hs
  c/*.c
  hand/*.s

  hand/*.hs and hand/*.c are not compiled.  The two *.hs files are the
  originals from which the tweaked assembly code has been derived.  The two
  *.c files are sketches of how the MMX tweaks work (because MMX code by itself
  can be a bit off-putting).

These are the support scripts:
  tools/genfiles.pl -- generate the test input files.

  tools/cutmem.pl
  tools/cutpid.pl   -- both are used to disentangle the outputs of strace
   and pause-at-end (see below).  I combine strace,
   memory info, and +RTS -sstderr into a single run to
   save time.  This means that things end up in fewer
   files than I'd like.

  tools/cut.pl  -- cut out main loop from disassembly ('make discut')

  tools/stat.pl -- looks at all timings for a single benchmark and
   calculates average and standard deviation and time
   slack, that is the discrepancy between user+sys and
   real.  It optionally throws away the first run.

  tools/eatmem.c-- allocates a chunk of memory and makes damn sure
   it really is in RAM!
  tools/pause-at-end.c  -- part of a hack that copies /proc/self/maps and
   /proc/self/status to stderr just before a benchmark
   exits.

  tools/iosummary.pl-- takes an strace and sums up the I/O
  tools/genreport.pl-- generate a nice report with bar charts.
   Takes way too many options in the form of
   environment variables.

  tools/regenreport.sh  -- regenerates the report from ANY measurement tarball.

  tools/merge.pl-- merge data from many measurement tarballs, with or
   without rescaling.


Generated files:
  hs/*.core hs/*.stg hs/*.cmm hs/*.s-- intermediate code
  hs/*.hi   -- Haskell Interface
  */*.o -- object code
  */*  (the files in $(HSPROGS) $(CPROGS) $(HANDPROGS)) -- programs
  */*.dis */*.discut-- disassembled programs (and inner loops)
  */*.doc   -- source + intermediate code + inner loops + timings

  */*.mem   -- output from '+RTS -sstderr' + /proc/self/status +
   /proc/self/maps + output from /usr/bin/time (where
   the number of minor page faults is most interesting
   datum)
  */*.strace-- complete strace, taken together with */*.mem
  */*.iotrace   

[Haskell-cafe] [5/16] SBM: Support scripts and scriptlets

2007-12-22 Thread Peter Firefly Brodersen Lund
Some of the scripts warrant a closer look.

'make zipdata'
  creates a nice tarball with all the data necessary to recreate a report AND
  to merge that report together with other reports, possible with rescaled
  bar charts.  Very handy.

  All the files in the tarball are inside the 'ghc-measurements/' directory so
  the risk of things going wrong when unpacking the tarball is less.

  The names of the benchmarks are put in ghc-measurements/progs, mainly to
  ensure they end up in the right order when regenerating and merging reports.

tools/genreport.pl [list of benchmarks to put in the report]
  It doesn't parse the command-line in any way because life is too short for
  command-line parsing.  Instead, it is controlled via (too many) environment
  variables.

  ASCII - set to avoid using UTF-8 for bar charts and per mille character.
  NOSRC - the tool normally creates */*.srctimemem files containing the source
  code for each benchmark with bar charts for time/mem appended to the
  end.  Setting this variable switches that off (necessary when
  regenerating and merging reports).
  EXCLUDE - disregard some of the benchmarks on the command line.  Why is this
  necessary?  Because it makes regenerating and merging reports easier.
  And because I was too lazy to filter the command line in
  tools/regenreport.sh and tools/merge.pl.

  FINDMAX - used by tools/merge.pl when rescaling.  Outputs max time and max
mem to stdout instead of the normal report.
  MAX_FILEWIDTH - used by tools/merge.pl to make merged reports look nice
  MAX_TIME,
  MAX_PEAKMEM - used by tools/merge.pl when rescaling

  Note that strictly speaking, there is a bug in the script(s) because it
  conflates the width of time/mem measurement represented as numbers (which
  you always want to take into account when merging) and MAX_TIME/MAX_PEAKMEM
  (which you only care about when rescaling).
  [FIXED now - 2007-12-21]

tools/regenreport.sh
  unpacks a measurement tarball into a tmp directory and runs
  tools/genreport.pl to generate the report.
  Takes care not to disturb the normal files.

tools/merge.pl [tarballs]
  Uses tools/regenreport.sh on each tarball in turn to generate a report which
  it reads in and stores on a benchmark-by-benchmark basis.   At the end,
  synthetically combine all the pieces it cut out of the original report(s)
  into a brand-spanking new, merged report.

  Even the headers and the platforminfo at the top of each report is cut out
  and stored in data structures until they get spit out again at the end.

  The reading magic is in the state machine in gather().  It is not as bad as
  it looks.  Some of the complications arise from marking repeated benchmark
  names as ' -- ', which improves the readability of the merged reports
  immensely.  Another part of the complications arise due to the fact that not
  all tarballs contain the exact same benchmarks!  Those that don't get a nice
  'n/a' instead of numbers and a bar.  And finally, the benchmarks should be
  in the right order.  That is trickier than it sounds...

  When rescaling, tools/regenreport.sh is first run once for each tarball with
  the FINDMAX environment variable set.  This results in tools/regenreport.sh
  outputting the maximum filename width, time, and peakmem for each tarball.

  ASCII - use ASCII instead of UTF-8
  RESCALE - sometimes you want to rescale and sometimes you don't
  MAX_FILEWIDTH - if you want to force a specific width
  MAX_TIME,
  MAX_PEAKMEM   - if you want to force a specific max

-Peter
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] [6/16] SBM: 6.9.20071124 Athlon Duron

2007-12-22 Thread Peter Firefly Brodersen Lund
This set of measurements was captured by Daniel Fischer on one of his older
machines, running SuSE 8.2, which has a Linux 2.4 kernel!

The benchmarks were run on 2007-12-16 using ghc 6.9.20071124.

Unfortunately, the ghc version is not quite the same as the one I've used for
most measurements (6.9.20071119) so things may be a little different just for
that reason.

The results seemed a bit off at first, but now that I have graphs of all the
runs on all the machines they don't seem strange at all.  First of all, the
memory use is about the same as on the other machines.  Secondly, the timing
differences for the C getchar/getwchar might be partly due to different
versions of the C library.  The remaining differences (a steeper profile
than on the core duo and the Athlon64) may be due to different
microarchitectures.

-Peter

Fischer's machine
ghc 6.9.20071124
AMD Duron(tm) processor
1200.089 MHz
TESTKIND=THOROUGH
SUFFIX=



Time (byte counting)std
avg dev slack
hs/byte-bsacc:1.892 21‰ 0.4  ██▊
  |
hs/byte-bsfoldlx: 2.258  3‰ 0.1  ███▎ 
|
hs/byte-bsfoldrx: 2.933  0‰ 0.1  ████▎  
  |
hs/byte-bsl---acc:   14.319 45‰ 0.1  
████████████████████▌|
hs/byte-x-acc-1: 20.915 17‰ 4.0  
█████████████████████████████▉
   |
hs/byte-x-acc-2: 20.691  8‰ 0.1  
█████████████████████████████▌
   |
hs/byte-x-foldl: 20.610  5‰ 1.4  
█████████████████████████████▍
   |
c/byte-getchar:   9.042  0‰ 0.1  
████████████▉|
c/byte-getchar-u: 1.314  3‰ 0.2  █▉   
|
c/byte-4k:0.419  5‰ 0.5  ▋  
  |

Memory:Peak
--- KB 
hs/byte-bsacc:   147492 
████████████████████████████████████████
  |
hs/byte-bsfoldlx:147492 
████████████████████████████████████████
  |
hs/byte-bsfoldrx:147488 
████████████████████████████████████████
  |
hs/byte-bsl---acc: 2896 ▊ 
|
hs/byte-x-acc-1:   1612 ▌ 
|
hs/byte-x-acc-2:   1612 ▌ 
|
hs/byte-x-foldl:   1612 ▌ 
|
c/byte-getchar: 384 ▏ 
|
c/byte-getchar-u:   384 ▏ 
|
c/byte-4k:  380 ▏ 
|

Time (space counting)   std
-   avg dev slack
hs/space-bs-c8-acc-1: 2.467  1‰ 0.3  ███▌ 
|
hs/space-bs-c8-foldlx-1:  2.585  2‰ 0.1  ███▊ 
|
hs/space-bs-c8-foldlx-2:  2.576  2‰ 0.3  ███▋ 
|
hs/space-bs-c8-foldrx:2.982  8‰ 2.3  ████▎  
  |
hs/space-bs-c8-lenfil:2.599  1‰ 0.2  ███▊ 
|
hs/space-bslc8-acc-1:15.228  8‰ 0.1  
█████████████████████▊  
 |
hs/space-bslc8-acc-2:15.855 38‰ 0.0  
██████████████████████▋   
   |
hs/space-bslc8-acc-3:14.980 14‰ 0.0  
█████████████████████▍  
 |
hs/space-bslc8-chunk-1:   2.443  2‰ 0.2  ███▌ 
|
hs/space-bslc8-chunk-2:   2.449  1‰ 0.3  ███▌ 
|
hs/space-bslc8-chunk-3:   2.534  3‰ 0.3  ███▋ 
|
hs/space-bslc8-foldl: 2.938  1‰ 0.2  ████▎  
  |
hs/space-bslc8-foldlx-1:  2.928  1‰ 0.0  ████▏  
  |
hs/space-bslc8-foldlx-2:  2.937  2‰ 0.2  ████▎  
  |
hs/space-bslc8-foldr-1:   4.043  6‰ 0.1  █████▊   
|
hs/space-bslc8-foldr-2:   4.007  4‰ 0.1  █████▊   
|
hs/space-bslc8-lenfil-1:  3.240  1‰ 0.2  ████▋

[Haskell-cafe] [8/16] SBM: 6.9.20071119 Pentium III

2007-12-22 Thread Peter Firefly Brodersen Lund
This set of measurements was captured by me on my old Compaq Armada E500 from
around the year 2000.  It has an Intel Coppermine Pentium III running at 600
MHz with 384 MB RAM and running Ubuntu 7.10.

The benchmarks were run today (2007-12-21) using ghc 6.9.20071119 (freshly
downloaded and reinstalled) in runlevel 1 (single-user mode).  I did leave
the wireless card in, which might have produced some interrupts.  Still, the
quality of the timings proved to be good, with a standard deviation of at most
1.2% of the average run-time.

This run includes the hand-tweaked assembly benchmarks (as does the Athlon64
run in the next email).

Note how much less the assembler tweaks help here (until one gets down to the
wicked MMX tweaks) compared to the situation on the Athlon64 in the next mail.
They still help, though.

-Peter

ls-search
ghc 6.9.20071119
Pentium III (Coppermine)
596.932 MHz
TESTKIND=THOROUGH
SUFFIX=


Time (byte counting) std
 avg dev slack
hs/byte-bsacc: 3.274  1‰ 0.1  ███   
  |
hs/byte-bsfoldlx:  4.027  0‰ 0.0  ███▋
|
hs/byte-bsfoldrx:  4.184  1‰ 0.0  ███▉
|
hs/byte-bsl---acc:28.005 10‰ 0.0  
█████████████████████████▋  
|
hs/byte-x-acc-1:  25.852  4‰ 0.0  
███████████████████████▋
|
hs/byte-x-acc-2:  25.622  5‰ 0.0  
███████████████████████▌
|
hs/byte-x-foldl:  25.803  5‰ 0.0  
███████████████████████▋
|
hand/byte-bsacc-a: 3.511  1‰ 0.0  ███▎
|
hand/byte-bsacc-b: 1.998  2‰ 0.1  █▉  
|
hand/byte-bsacc-c: 1.876  2‰ 0.1  █▊  
|
hand/byte-bsacc-d: 1.876  1‰ 0.1  █▊  
|
c/byte-getchar:   13.016  0‰ 0.0  
███████████▉|
c/byte-getchar-u:  1.662  1‰ 0.1  █▌  
|
c/byte-4k: 0.543  2‰ 0.2  ▌ 
  |

Memory: Peak
---  KB 
hs/byte-bsacc:147752 
███████████████████████████████████████▏
 |
hs/byte-bsfoldlx: 147756 
███████████████████████████████████████▏
 |
hs/byte-bsfoldrx: 147760 
███████████████████████████████████████▏
 |
hs/byte-bsl---acc:  3180 ▉
|
hs/byte-x-acc-1:1916 ▌
|
hs/byte-x-acc-2:1912 ▌
|
hs/byte-x-foldl:1912 ▌
|
hand/byte-bsacc-a:147772 
███████████████████████████████████████▏
 |
hand/byte-bsacc-b:147776 
███████████████████████████████████████▏
 |
hand/byte-bsacc-c:147772 
███████████████████████████████████████▏
 |
hand/byte-bsacc-d:147776 
███████████████████████████████████████▏
 |
c/byte-getchar:  436 ▏
|
c/byte-getchar-u:432 ▏
|
c/byte-4k:   436 ▏
|

Time (space counting)std
-avg dev slack
hs/space-bs-c8-acc-1:  4.318  1‰ 0.0  ████
|
hs/space-bs-c8-count:  3.118  1‰ 0.1  ██▉   
  |
hs/space-bs-c8-foldlx-1:   4.631  1‰ 0.0  ████▎ 
  |
hs/space-bs-c8-foldlx-2:   4.632  1‰ 0.0  ████▎ 
  |
hs/space-bs-c8-foldrx: 4.678  0‰ 0.0  ████▎ 
  |
hs/space-bs-c8-lenfil: 4.634  1‰ 0.1  ████▎ 
  |
hs/space-bslc8-acc-1: 32.733  7‰ 0.0  
██████████████████████████████
  |

[Haskell-cafe] [9/16] SBM: 6.9.20071119 Athlon64

2007-12-22 Thread Peter Firefly Brodersen Lund
This set of measurements was captured by me on my slightly old noname 2GHz
Athlon64 3000+ (in 32-bit mode on a 32-bit kernel).  It has 1GB RAM and
runs Ubuntu 7.04.

The benchmarks were run two days ago (2007-12-19) using ghc 6.9.20071119 in
runlevel 1 (single-user mode).  I did leave the network cable in, which might
have produced some interrupts (not very likely -- 1) why would it run in
promiscuous mode and 2) who broadcasts apart from my wired/wireless access
point which broadcasts a packet per second?).  Still, the quality of the
timings proved to be good, with a standard deviation of at most 2.7% of the
average run-time.

This run includes the hand-tweaked assembly benchmarks (as does the Pentium III
run in the previous email).

Note how much the assembler tweaks help!

-Peter

charybdis
ghc 6.9.20071119
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=


Time (byte counting)std
avg dev slack
hs/byte-bsacc:0.705  7‰ 0.1  ███▋ 
|
hs/byte-bsfoldlx: 1.002  1‰ 0.5  █████▎   
|
hs/byte-bsfoldrx: 1.112  2‰ 0.1  █████▊   
|
hs/byte-bsl---acc:2.595 14‰ 0.1  
█████████████▌   |
hs/byte-x-acc-1:  4.436  5‰ 0.0  
███████████████████████▏
 |
hs/byte-x-acc-2:  4.473  8‰ 0.0  
███████████████████████▎
 |
hs/byte-x-foldl:  4.412 10‰ 0.0  
███████████████████████   
   |
hand/byte-bsacc-a:0.639  2‰ 0.2  ███▍ 
|
hand/byte-bsacc-b:0.414  2‰ 0.5  ██▏
  |
hand/byte-bsacc-c:0.414  3‰ 0.2  ██▏
  |
hand/byte-bsacc-d:0.415  3‰ 0.2  ██▏
  |
c/byte-getchar:   2.422 27‰ 0.1  
████████████▋|
c/byte-getchar-u: 0.632  3‰ 0.3  ███▎ 
|
c/byte-4k:0.094 26‰ 2.2  ▌  
  |

Memory: Peak
---  KB 
hs/byte-bsacc:147752 
███████████████████████████████████████▏
 |
hs/byte-bsfoldlx: 147748 
███████████████████████████████████████▏
 |
hs/byte-bsfoldrx: 147744 
███████████████████████████████████████▏
 |
hs/byte-bsl---acc:  3172 ▉
|
hs/byte-x-acc-1:1904 ▌
|
hs/byte-x-acc-2:1904 ▌
|
hs/byte-x-foldl:1900 ▌
|
hand/byte-bsacc-a:147764 
███████████████████████████████████████▏
 |
hand/byte-bsacc-b:147764 
███████████████████████████████████████▏
 |
hand/byte-bsacc-c:147760 
███████████████████████████████████████▏
 |
hand/byte-bsacc-d:147760 
███████████████████████████████████████▏
 |
c/byte-getchar:  440 ▏
|
c/byte-getchar-u:440 ▏
|
c/byte-4k:   440 ▏
|

Time (space counting)   std
-   avg dev slack
hs/space-bs-c8-acc-1: 1.145  1‰ 0.2  ██████   
|
hs/space-bs-c8-count: 0.521  1‰ 0.2  ██▊
  |
hs/space-bs-c8-foldlx-1:  1.221  1‰ 0.1  ██████▍
  |
hs/space-bs-c8-foldlx-2:  1.219  2‰ 0.2  ██████▍
  |
hs/space-bs-c8-foldrx:1.172  2‰ 0.3  ██████▏
  |
hs/space-bs-c8-lenfil:1.223  1‰ 0.2  ██████▍
  |
hs/space-bslc8-acc-1: 3.388 10‰ 0.1  
█████████████████▋   |
hs/space-bslc8-acc-2: 3.386  5‰ 0.1  

[Haskell-cafe] [10/16] SBM: Graphs for 6.9.x across four cpus

2007-12-22 Thread Peter Firefly Brodersen Lund
This is what you get if you merge the previous four reports (and filter out the
hand-tweaked assembly benchmarks).

I generated the report with the following command:

  EXCLUDE='^hand/' \
  tools/merge.pl   \
 ghc-armada-thorough-6.9.tgz\
 ghc-fischer-thorough-6.9.tgz   \
 ghc-albatross-thorough-6.9.tgz \
 ghc-thorough-6.9.tgz   \
xx

This graph shows the memory usage to be (almost) exactly the same.  The
difference can easily be explained with slightly different versions of the
C library (there was also a security update or two from Ubuntu in the last
few days - the kernel was definitely updated and I think the C library was
too).  We are also talking about two versions of Ubuntu and one (old) version
of SuSE.  And we are talking about three not quite identical versions of ghc.

The fischer and albatross runs were made before the inclusion of the three
benchmarks from Don Stewart (hs/space-bs-c8-count, hs/space-bslc8-count, and
hs/space-bslc8-chunk-4) so they are a couple of holes in the graphs.

The speed pattern is more fun.  It makes no sense to compare absolute times
here so the graphs were not rescaled.  One would naïvely expect the bars to
be about the same when taken in groups of four but that really turns out to be
far from the case!

The explanation is most likely that we are looking at four very different
microarchitectures.

 **
This should hammer home the point that benchmarking on any single machine
isn't enough!
 **

Two probably aren't enough either...

-Peter

ls-search
ghc 6.9.20071119
Pentium III (Coppermine)
596.932 MHz
TESTKIND=THOROUGH
SUFFIX=


Fischer's machine
ghc 6.9.20071124
AMD Duron(tm) processor
1200.089 MHz
TESTKIND=THOROUGH
SUFFIX=



albatross
ghc 6.9.20071208 (or thereabouts)
Genuine Intel(R) CPU   T2300  @ 1.66GHz (Core Duo)
1667.000 MHz
TESTKIND=THOROUGH
SUFFIX=



charybdis
ghc 6.9.20071119
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=


Time (byte counting)std
avg dev slack
hs/byte-bsacc:3.274  1‰ 0.1  ███▏ 
|
 --   1.892 21‰ 0.4  ██▊
  |
 --   0.918  4‰ 0.1  █████▍   
|
 --   0.705  7‰ 0.1  ███▋ 
|
hs/byte-bsfoldlx: 4.027  0‰ 0.0  ███▊ 
|
 --   2.258  3‰ 0.1  ███▎ 
|
 --   1.014  2‰ 0.2  ██████   
|
 --   1.002  1‰ 0.5  █████▎   
|
hs/byte-bsfoldrx: 4.184  1‰ 0.0  ███▉ 
|
 --   2.933  0‰ 0.1  ████▎  
  |
 --   0.999  2‰ 0.1  █████▉   
|
 --   1.112  2‰ 0.1  █████▊   
|
hs/byte-bsl---acc:   28.005 10‰ 0.0  
██████████████████████████▍
  |
 --  14.319 45‰ 0.1  
████████████████████▌|
 --   1.957  1‰ 0.1  ███████████▌ 
|
 --   2.595 14‰ 0.1  
█████████████▌   |
hs/byte-x-acc-1: 25.852  4‰ 0.0  
████████████████████████▎ 
   |
 --  20.915 17‰ 4.0  
█████████████████████████████▉
   |
 --   3.591  2‰ 0.1  
█████████████████████▏  
 |
 --   4.436  5‰ 0.0  
███████████████████████▏
 |
hs/byte-x-acc-2: 25.622  5‰ 0.0  
████████████████████████▏ 
   |
 --  20.691  8‰ 0.1  
█████████████████████████████▌
   |
 --   3.598  4‰ 0.1  
█████████████████████▏  
 |
 --   4.473  8‰ 0.0  
███████████████████████▎
 |
hs/byte-x-foldl: 25.803  5‰ 0.0  
████████████████████████▎ 
   |
 --  20.610  5‰ 1.4  

[Haskell-cafe] [11/16] SBM: Graphs for hand-tweaked assembly benchmarks

2007-12-22 Thread Peter Firefly Brodersen Lund
This report compares the hand-tweaked assembly programs with the original
untweaked programs on two vastly different microarchitectures.

This is the command I ran to generate the report:
 EXCLUDE='(|-bsl|chunk|count|acc-[23]|fold|lenfil|^c/)' \
 tools/merge.pl \
   ghc-armada-thorough-6.9.tgz \
   ghc-thorough-6.9.tgz\
   xx

I cut out the memory sections manually since we've already seen them and
inserted a few newlines for grouping purposes.

The first one should note is that not all tweaks are better than the originals!
The second is that the sequence of tweaks is not quite monotonically decreasing
in run-time.

The improvements don't really start until -e on the Athlon64 and -f on both.
Not until then have the load pressure been sufficiently relieved on the L1
cache that the code actually runs faster.

Note also how the two microarchitectures seem to have plateaus in different
places.  The Athlon64 seems to have the number 3 built into its silicon (efg,
jkl, mno) which fits very well with what we know about it from AMD's
documentation (the front end splits the instructions up into smaller pieces
which then get distributed to three different pipelines, each with its own
out-of-order execution engine).

The Pentium III seems to have trouble with the simple MMX code but does very
well with the more advanced MMX code that keeps 8 space counters in a single
MMX register for many iterations.  The code I used to add those counters
horizontally is the same in both -q and -r.  Perhaps operations on both MMX
and normal registers are slow?

Loop unrolling (-s) doesn't seem to matter, in this case.

-Peter

ls-search
ghc 6.9.20071119
Pentium III (Coppermine)
596.932 MHz
TESTKIND=THOROUGH
SUFFIX=


charybdis
ghc 6.9.20071119
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=


Time (byte counting)std
avg dev slack
hs/byte-bsacc:3.274  1‰ 0.1  
███████████████████████████
  |
 --   0.705  7‰ 0.1  
█████████████████████▋  
 |
hand/byte-bsacc-a:3.511  1‰ 0.0  
████████████████████████████▉
|
 --   0.639  2‰ 0.2  
███████████████████▋ |
hand/byte-bsacc-b:1.998  2‰ 0.1  
████████████████▌|
 --   0.414  2‰ 0.5  
████████████▊|
hand/byte-bsacc-c:1.876  2‰ 0.1  
███████████████▌ |
 --   0.414  3‰ 0.2  
████████████▊|
hand/byte-bsacc-d:1.876  1‰ 0.1  
███████████████▌ |
 --   0.415  3‰ 0.2  
████████████▊|


Time (space counting)   std
-   avg dev slack
hs/space-bs-c8-acc-1: 4.318  1‰ 0.0  
███████████████████████████████████▋
 |
 --   1.145  1‰ 0.2  
███████████████████████████████████▏
 |
hand/space-bs-c8-acc-1-a: 4.318  1‰ 0.0  
███████████████████████████████████▋
 |
 --   1.177  2‰ 0.3  
████████████████████████████████████▏|
hand/space-bs-c8-acc-1-b: 4.331  1‰ 0.0  
███████████████████████████████████▋
 |
 --   1.104  1‰ 0.2  
█████████████████████████████████▉
   |
hand/space-bs-c8-acc-1-c: 4.492  1‰ 0.1  
█████████████████████████████████████|
 --   1.207  1‰ 0.3  
█████████████████████████████████████|
hand/space-bs-c8-acc-1-d: 4.354  1‰ 0.0  
███████████████████████████████████▉
 |
 --   1.191  1‰ 0.2  
████████████████████████████████████▌|

hand/space-bs-c8-acc-1-e: 4.424  0‰ 0.1  
████████████████████████████████████▌|
 --   0.937  1‰ 0.2  

[Haskell-cafe] [14/16] SBM: Behind the measurements (rationale)

2007-12-22 Thread Peter Firefly Brodersen Lund
 // I am getting sick and tired of working on this project and it's probably
 // better to get it fired off than polishing it any further.
 //
 // This email could benefit from being rewritten from a rough draft into a
 // well-crafted letter but that would take a couple of hours.
 //
 // So here it is, a lot rougher than I'd like -- but it *IS* :)

why so big input files?
  easiest way to spot non-linearity and bad memory behaviour.  Anyway, files
  should be big enough to overflow caches and kick the gc in.
  (short files interesting, too, but the big ones cause more complex behaviour
  of run-time system and CPU.  If complex behaviour is behaved then simple
  behaviour probably is too -- but can still get its constant factors
  improved.  If complex behaviour bad, then shouldn't that be fixed in any
  case?)

waitpid4() has a struct w/ info about the child program's resource usage.
  unfortunately, the peakrss field is not filled in.  Seems to be a general
  Unix problem.  I've seen complaints on the net that Solaris doesn't fill it
  in, either.  Other solution needed.

pause-at-end, /proc/self/maps + /proc/self/status.  VmmHWM = peak of VmmRSS,
  which is Resident (working) Set Size.  It doesn't say what is shared with
  other processes or the operating system, though.  In our case, we don't
  expect to share anything but some libraries -- which nobody else wants to
  share with us anyway (except for the C library).  We are the only user of
  them.
  Discovered about a week ago that I could probably have used waitid()
  w/ WNOWAIT flag but didn't know.  Was quick to write pause-at-end, anyway.
  It took about 15 minutes from the desire to know the peak memory use to
  having written and tested the first cut of it.  Pause-at-end not completely
  bullet-proof in case of dyn libraries that get unloaded before the end of
  the program has been reached.  On the other hand, plenty good enough for
  these tests + can conceivably allow more intricate poking around than
  waitid() solution.

getting good measurements - eatmem, dd, probably should also dd library.
  good to have a sacrificial run.  Good to measure how good the measurements
  are (rel. std.dev + user/sys/real check).
  why average -- disturbances are mostly interrupts, daemons that everybody
have anyway, slightly luckier/unluckier physical pages.  These are real
effects that nobody can control anyway.  I'm not interested in the best
possible times on an ideal, undisturbed machine with a helpful kernel.
I'm interested in clean times under realistic circumstances.  Therefore
average instead of minimum.
  why I use real and not user/sys -- handling of blocking reads vs. mmap vs.
   madvise/fadvise vs. reading in separate thread in the future.  User+sys
   would probably give me better numbers at the moment and I could change to
   real later.  Still, I choose to stay with real (and the difference is 
marginal,
   anyway).
   
   Funny that the exact distribution of time between sys and user fluctuates a
   lot.  In space-bslc8-lenfil-2 sys varies between 0.160s and 0.244s.  Real is
   completely stable with 5x 1.396s and 1x 1.397s.

  look at /proc/interrupts, perhaps copy before/after to .intr?  Warn if more
  than 100 (or 1000) Hz + 10%?

write date/time + runlevel to platforminfo and/or sysinfo.

barcharts
  why barcharts.
  should the time/mem barcharts be equal length?  don't think so (hard to
colour them in a text file.  Would work with less -r and the console but
not in an email or a text editor.  Visual difference is good).
But should perhaps not be /that/ different.
  visible markers if measurements bad.
   (5% real/user/sys check, typically within 0.1% on old laptop when doing a
   quick or thorough benchmark.  Occasionally up to 1% - and 3% on c/byte-4k
   because it only takes 56ms in total.)
  prints out how tight the user/sys/real thing is.

microarchitecture -- performance counters.  Would be interesting to look at
once the obvious performance problems have been handled.  Let's fix the
memory usage of bytestrings, the performance of lazy bytestrings, and 
start using registers in the machine code first.
regularity of input file probably means that branch predictor on all three
CPUs can remember pattern of spaces vs. non-spaces (or at least part of the
pattern).  Branch predictors not only use two-bit saturating counter for
strongly non-taken/weakly non-taken/weakly taken/strongly taken.  They also
try to remember the pattern of jumps/non-jumps.  A more realistic test
would have less regular input file.  This effect is very small given the
current performance limiters, though.

cache -- turned out to be pretty regular (by eyeballing cachegrind reports).
Go up a factor of 10 in filesize and the number of access also went up a
factor of 10.  The miss ratios stayed the same.  The miss ratios differed
a bit between the benchmarks but I don't think it's 

[Haskell-cafe] [12/16] SBM: Graphs for 7 ghc/bytestring combinations on a 2GHz Athlon64 300+

2007-12-22 Thread Peter Firefly Brodersen Lund
This report combines the measurements from 7 ghc/bytestring combinations on a
single machine, namely:
 ghc 6.6.1
 ghc 6.8.2
 ghc 6.8.2 + bytestring 0.9.0.2
 ghc 6.9.20071119
 ghc 6.9.20071119 + bytestring 0.9.0.2
 ghc head-as-of-noon-2007-12-19
 ghc head-as-of-noon-2007-12-19 + bytestring 0.9.0.2

This is the command I ran to generate the report:
 RESCALE=1 EXCLUDE='(^hand/|^c/)' \
 tools/merge.pl \
   ghc-thorough-6.6.1.tgz \
   ghc-thorough-6.8.2.tgz ghc-thorough-6.8.2-bs0902.tgz \
   ghc-thorough-6.9.tgz   ghc-thorough-6.9-bs0902.tgz   \
   ghc-thorough-head.tgz  ghc-thorough-head-bs0902.tgz  \
   mergerep-amdy.txt

We can readily see that there are some memory leaks with ghc 6.9.x.  We can
also see that Don Stewart's fix in bytestring 0.9.0.2 actually fixed things for
hs/space-bslc8-lenfil-1 on ghc 6.8.2 but only improved things slightly for
6.9.x.  It also makes things worse for hs/space-bs-c8-lenfil on 6.8.2 and both
the 6.9.x's.

Lazy bytestrings are slow, when used manually, but quite acceptable, memory-
wise.  On the other hand, they cost dearly in memory on 6.9.x unless one does
it all by hand.

Good old native strings seem to be the most predictable of the bunch but they
are a bit slow.  Surprisingly, they are not always /the/ slowest option
measured!
hs/space-bslc8-acc-1 and hs/space-bslc8-acc-2 are actually slower on ghc 6.6.1
than the native strings are in all combinations except hs/space-x-lenfil.

-Peter

PS: Note that this is the same report that I sent yesterday, except that it
had the order of some of the benchmarks slightly wrong.

charybdis
ghc 6.6.1
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=


charybdis
ghc 6.8.2
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=


charybdis
ghc 6.8.2
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=-bs0.9.0.2


charybdis
ghc 6.9.20071119
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=


charybdis
ghc 6.9.20071119
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=-bs0.9.0.2


charybdis
ghc 6.9.20071217
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=


charybdis
ghc 6.9.20071217
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=-bs0.9.0.2


Time (byte counting)   std
   avg dev slack
hs/byte-bsacc:   1.020 40‰ 0.0  █████▏
|
 --  0.703  5‰ 0.4  ███▌  
|
 --  0.702  7‰ 0.3  ███▌  
|
 --  0.705  7‰ 0.1  ███▋  
|
 --  0.712  3‰ 0.1  ███▋  
|
 --  0.706  7‰ 0.1  ███▋  
|
 --  0.707  7‰ 0.9  ███▋  
|
hs/byte-bsfoldlx:0.789  3‰ 0.3  ████  
|
 --  0.993  2‰ 0.2  █████   
  |
 --  1.102  1‰ 0.2  █████▋
|
 --  1.002  1‰ 0.5  █████▏
|
 --  1.112  1‰ 0.3  █████▋
|
 --  1.024  2‰ 0.2  █████▏
|
 --  1.111  1‰ 0.1  █████▋
|
hs/byte-bsfoldrx:0.813  3‰ 0.1  ████▏   
  |
 --  1.102  2‰ 0.3  █████▋
|
 --  1.100  1‰ 0.1  █████▋
|
 --  1.112  2‰ 0.1  █████▋
|
 --  1.114  1‰ 0.4  █████▋
|
 --  1.113  2‰ 0.5  █████▋
|
 --  1.112  1‰ 0.2  █████▋
|
hs/byte-bsl---acc:   3.599 13‰ 0.0  
██████████████████▎   |
 --  2.609 17‰ 0.0  
█████████████▎|
 --  2.560 15‰ 0.1  
█████████████ |
 --  2.595 14‰ 0.1  
█████████████▏|
 --  2.574 16‰ 0.1  
█████████████ |
 --  2.613 12‰ 0.2  
█████████████▎ 

[Haskell-cafe] [13/16] SBM: Graphs that show the infidelity of -sstderr

2007-12-22 Thread Peter Firefly Brodersen Lund
Everybody seems to use +RTS -sstderr to measure the memory performance of their
Haskell (ghc) code.  That's what I started out doing, too.

However, top seemed to occasionally disagree with the run-time system.

I decided to dig deeper and ask the operating system at process exit time.

I use LD_PRELOAD to get my code injected into each benchmarked process where
it gets activated when the process is about to exit and copies /proc/self/maps
and /proc/self/status to stderr where the test harness captures them.

Unfortunately, -sstderr really *IS* unreliable!

tools/regensstderr.sh and tools/sstderr.pl are two scripts that together
generate bar graphs for an existing measurement tarball.  I used the following
shell code to generate graphs for 7 different ghc/bytestring combinations:

(for F in ghc-thorough-6.6.1.tgz \
  ghc-thorough-6.8.2.tgz ghc-thorough-6.8.2-bs0902.tgz \
  ghc-thorough-6.9.tgz   ghc-thorough-6.9-bs0902.tgz   \
  ghc-thorough-head.tgz  ghc-thorough-head-bs0902.tgz; \
do   \
  printf %s\n---\n $F; \
  tools/regensstderr.sh $F;  \
  printf \n\n; \
done \
)  xx

The graphs clearly show that -sstderr sometimes severely underestimates the
memory use and sometimes somewhat overestimates it.  Moreover, the errors are
not always for the same programs.

This holds for all ghc versions I've tested: 6.6.1, 6.8.2, 6.9.20071119, and
head as of noon 2007-12-19.

-Peter

ghc-thorough-6.6.1.tgz
---
hs/byte-bsacc: 1 147448 ▏
|██████████████████▎  |
hs/byte-bsfoldlx:145 147468 ██████████▎  
|██████████████████▎  |
hs/byte-bsfoldrx:  1 147452 ▏
|██████████████████▎  |
hs/byte-bsl---acc: 2   2872 ▏|▍ 
   |
hs/byte-x-acc-1:   1   1572 ▏|▎ 
   |
hs/byte-x-acc-2:   1   1568 ▏|▎ 
   |
hs/byte-x-foldl:   1   1572 ▏|▎ 
   |
hs/space-bs-c8-acc-1:  1 147448 ▏
|██████████████████▎  |
hs/space-bs-c8-count:  1 147448 ▏
|██████████████████▎  |
hs/space-bs-c8-foldlx-1: 145 147468 ██████████▎  
|██████████████████▎  |
hs/space-bs-c8-foldlx-2: 145 147468 ██████████▎  
|██████████████████▎  |
hs/space-bs-c8-foldrx: 1 147448 ▏
|██████████████████▎  |
hs/space-bs-c8-lenfil:   145 147464 ██████████▎  
|██████████████████▎  |
hs/space-bslc8-acc-1:  2   2872 ▏|▍ 
   |
hs/space-bslc8-acc-2:  2   2872 ▏|▍ 
   |
hs/space-bslc8-acc-3:  2   2868 ▏|▍ 
   |
hs/space-bslc8-count:  2   1476 ▏|▏ 
   |
hs/space-bslc8-foldl:  2   1552 ▏|▎ 
   |
hs/space-bslc8-foldlx-1:   2   1556 ▏|▎ 
   |
hs/space-bslc8-foldlx-2:   2   1552 ▏|▎ 
   |
hs/space-bslc8-foldr-1:  297 169480 
█████████████████████|█████████████████████|
hs/space-bslc8-foldr-2:  297 169480 
█████████████████████|█████████████████████|
hs/space-bslc8-lenfil-1:  38  38632 ██▋  |████▊ 
   |
hs/space-bslc8-lenfil-2:  38  38628 ██▋  |████▊ 
   |
hs/space-bsl---foldlx: 2   1560 ▏|▎ 
   |
hs/space-x-acc-1:  1   1568 ▏|▎ 
   |
hs/space-x-acc-2:  1   1572 ▏|▎ 
   |
hs/space-x-foldl:  1   1572 ▏|▎ 
   |
hs/space-x-lenfil: 1   1556 ▏|▎ 
   |
c/byte-getchar:-  - 
c/byte-getchar-u:  -  - 
c/byte-4k: -  - 
c/space-getchar:   -  - 
c/space-getchar-u: -  - 
c/space-4k:-  - 
c/space-megabuf:   -  - 
c/space-getwchar:  -  - 
c/space-getwchar-u:-  - 
c/space-32k:   -  - 
c/space-32k-8: -  - 



[Haskell-cafe] [16/16] SBM: Discussion and Conclusion

2007-12-22 Thread Peter Firefly Brodersen Lund
General
---
Bytestrings are faster than haskell strings.  This is good and expected.
But their memory use, ouch!  There seems to be a grave memory bug in 6.9.

Lazy bytestrings are slower than strict bytestrings -- this was completely
unexpected for me.

The strictness analyzer works *very* well.  It rarely made a difference to mark
arguments as strict (and when it did, it was very easy to use).


I/O
---
The I/O patterns seen in the various programs are:
 1) strict bytestrings = read increasingly large chunks (they double every
time).  I don't know if the already-read data have to be copied around or
if the memory allocator can handle extending existing blocks (if they are
at the front of the heap, say) like realloc() can.

Even if the blocks do get copied around, that's not where the performance
limiter is (speed-wise).  It does seem to be a bit unwise in terms of
memory pressure.

Does the gc kick in every time the heap has to expand?  Does it do a full
collection then?  If no other allocations have happened since the last
collection than the allocation that caused the heap to expand, then perhaps
skip the collection?  (or some criteria similar to that)

 2) lazy bytestrings = read blocks of 32K-8 bytes, as expected.  The C
benchmarks show that there's no penalty for reading 32K-8 vs. 32K.

 3) native strings = read blocks 8K.

The C benchmarks show that it barely matters if the block size is 4K, 32K, or
32K-8 bytes.  In any case, the small differences due to block size are
completely in the noise.  Reading very large blocks, though, as the strict
bytestrings do, actually has a cost (38-70% slower, depending on the CPU and
kernel and RAM and busspeeds, etc -- see email 10 in the series).  It is still
peanets compared for almost all the benchmarks.


Backend
---
The backend turns out to be rather bad.  It is generally very easy to improve
the generated code by hand without doing anything in terms of sophisticated
analysis.  One can rewrite inner loops using heroic code (load four characters
at a time together into a 32-bit register or even use MMX to handle eight
characters in parallel) but it isn't really necessary to do that to gain a
good improvement and start being competitive with C.

The backend is probably what is costly on some of the lazy bytestring and
native string code because there are too many tests to see if the buffer has
been exhausted (I think).  Some simple hoisting and common subexpression
elimination would go far here.

For the simpler strict bytestring benchmarks, heroic backend optimizations are
not necessary because they give less of an improvement over merely competent
register use than sensible I/O and buffer management (cache effects start to
be important once the compiler generates sensible code).

I would have liked to examine backend performance further by playing with the
generated C-- code but unfortunately, the C-- that ghc emits won't compile.

Dissecting the generated machine code for the more complicated benchmarks was
so hard that I skipped it -- I can't even recognize the labels!  It's one thing
if the backend generates lots of extra labels but another if the expected
labels simply are not there!  (this is why tools/cut.pl complains that it can't
find the main loop for some of the programs -- if the */*.discut file is
unexpectedly short, then that's the reason.)


Quality of measurements
---
The speed measurements are of extremely high quality, much more than it turns
out is needed at this moment.  I probably spent a bit too much effort there.

The memory measurements are unusual.  I don't recall having seen anything
like them elsewhere because one seldom has to work with uncooperative programs
when benchmarking (the preloading trick works with any program, as long as it
is dynamically linked on Linux and uses the standard loader).  They are
probably the biggest single contribution from this work, together with the
visualization provided by the bar charts.


Thoroughness

It really is necessary to benchmark more than one ghc/library combination,
otherwise the bad memory bugs wouldn't stand out so clearly or I would have
believed, as Don Stewart did, that he had fixed the memory allocation bug.

It also matters what microarchitectures one use, as pointed out by email 10
in this series.

And you gotta have many benchmarks, otherwise you might not really be measuring
what you think you measure.  I could probably have gotten away with fewer
benchmarks in this case but that would have left a nagging feeling at the back
of my head that there was a benchmark I /could/ have included that would have
showed up a spectacular performance bug.

Using so big input files (150MB) made the programs run slow enough that the
timer granularity (1ms) was small compared to their runtime.  It also made the
memory performance problems really stand out (and be unignorable).  I think
that was a good idea.


Re: [Haskell-cafe] Why does this blow the stack?

2007-12-22 Thread David Benbennick
On 12/22/07, David Benbennick [EMAIL PROTECTED] wrote:
 On 12/21/07, Stefan O'Rear [EMAIL PROTECTED] wrote:
  Because they simply aren't the same.

 Good point; thanks.  That means that Don's patch could theoretically
 break some existing Haskell program:

In fact, it's possible to get strictness to avoid stack overflow, and
still have laziness to allow undefined:

myEnumFrom :: Integer - [Integer]
myEnumFrom a = map (a+) $ enumDeltaIntegerStrict 0 1 where
  enumDeltaIntegerStrict x d = x `seq` x : enumDeltaIntegerStrict (x+d) d

then

*Main (myEnumFrom 42) !! (10^6)
142
*Main length $ take 10 $ myEnumFrom undefined
10
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Applying a Dynamic function to a container of Dynamics

2007-12-22 Thread oleg

Alfonso Acosta wrote:
 dynApp allows to apply a Dynamic function to a Dynamic argument:

 dynApp :: Dynamic - Dynamic - Dynamic

 I don't seem to find a way (without modifying Data.Dynamic itself) to
 code this function

This is not very difficult if we have a well-delineated (and still
infinite) type universe: to be precise, if we know the signature of
our type terms. That is usually a reasonable assumption. The trick is
to build a function that is an inverse of typeOf: given a TypeRep, we
wish to find its representative, a value whose type has the given
TypeRep. We use the result of this inverse function (to be called
reflect) when applying fromDynamic. We know that writing of reflect
is possible because we know the syntax of the type language (by
assumption) and because Haskell luckily is inconsistent as a logic
and so every type is populated. 

The code below borrows most of the machinery from the 
type-checker/typed-compiler
http://okmij.org/ftp/Haskell/staged/IncopeTypecheck.hs
which may be entitled 'How to program hypothetical proofs', as the
long title comments explain. The above code essentially implements dynApp,
which is called there typecheck_app. We adapt that code below. We
assume that our type universe is described by the following syntax
 data Type = Bool | Int | Type - Type


Regarding the previous problem: the following bit

 -- it is important to give the signature to (,) below: we pack the cons
 -- function of the right type!
 cons :: forall a b. (Typeable a, Typeable b) =
 Signal a - Signal b - Signal (a,b)
 cons (Signal sig1) (Signal sig2) =
 Signal (PrimSignal (Cons (toDyn ((,)::a-b-(a,b))) sig1 sig2))

was indeed essential. One of the main lessons of our APLAS paper was
the realization that things become significantly simpler if we do more
work at the value production site. We could not have implemented typed
Partial Evaluation or typed CPS so simply. The latter usually
considered to require significantly complex type systems, beyond the
reach of the mainstream languages. Incidentally, that paper promotes a
different way of building typed DSL (the final tagless way). 

http://okmij.org/ftp/papers/tagless-final-APLAS.pdf
http://www.cs.rutgers.edu/~ccshan/tagless/talk.pdf
http://www.cs.rutgers.edu/~ccshan/quote/language.pdf

I was thinking that perhaps that method might be beneficial to your
application. I don't know your DSL language well enough to be able to
tell though...

{-# OPTIONS -fglasgow-exts -W #-}

module DA where

import Data.Typeable
import Data.Dynamic


data Dyn = forall a. Typeable a = Dyn a

-- Check to see if a term represents a function. If so,
-- return terms that witness the type of the argument and of the body
reflect_fn :: TypeRep - Maybe (Dyn, Dyn)
reflect_fn tfun
   | (con,[arg1,arg2]) - splitTyConApp tfun, con == arrowTyCon
   = Just (reflect arg1, reflect arg2)
reflect_fn _ = Nothing

arrowTyCon = typeRepTyCon (typeOf (undefined::Int-Int))

-- reflect typerep to a type (witness). The inverse of typeOf.
reflect :: TypeRep - Dyn
reflect x | x == typeOf (undefined::Int)  = Dyn (undefined::Int)
reflect x | x == typeOf (undefined::Bool) = Dyn (undefined::Bool)
reflect x | Just (Dyn e1, Dyn e2) - reflect_fn x
   = let mkfun :: a - b - (a-b); mkfun = undefined
 in Dyn (mkfun e1 e2)

-- the re-implementation of dynApply
mydynApply :: Dynamic - Dynamic - Maybe Dynamic
mydynApply e1 e2 |
   let tfun = dynTypeRep e1,
   let targ = dynTypeRep e2,
   Just tres - funResultTy tfun targ,
   Dyn a - reflect targ,
   Dyn b - reflect tres,
   Just e1' - fromDynamic e1,
   Just e2' - fromDynamic e2 `asTypeOf` (Just a)
 = return $ toDyn (e1' e2' `asTypeOf` b)
mydynApply e1 e2 =
fail $ unwords [Bad App, of types ,show (dynTypeRep e1),and,
show (dynTypeRep e2)]


test1 :: Maybe Bool
test1 = mydynApply (toDyn not) (toDyn False) = fromDynamic
-- Just True

test2 :: Maybe Bool
test2 = mydynApply (toDyn not) (toDyn (1::Int)) = fromDynamic
-- Nothing

test3 :: Maybe Int
test3 = mydynApply (toDyn ((+)::Int-Int-Int)) (toDyn (1::Int)) = 
(\f - mydynApply f (toDyn (2::Int))) =
fromDynamic
-- Just 3

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Cristian Baboi

Let me show you an example to prove it.
The example is limited to composition of unary functions defined on int
   u::Int-Int
   v::Int-Int
   o ::(Int-Int)-(Int-Int)-(Int-Int)
   o u v= \x-u(v(x))


#include stdio.h
#include conio.h

#include functional.h

int f1(int x){
return x+x;
}
int f2(int x){
return 2*x;
}
int g1(int x){
return x+1;
}
int g2(int x){
return x-1;
}

#define P1 P0 int main(){
#define P2 P1  printf(%d,%d,%d\n,2,
#define P3   O(f1,f2,P2)(2),
#define P4   O(g1,g2,P3)(3));
#define P5 P4  getch();
#define P6 P5 }

MAIN P6


Here is the file functional.h

#define FUNC2(x,y) x##y
#define FUNC1(x,y) FUNC2(x,y)
#define FUNC(x) FUNC1(x,__COUNTER__)

#define COMP(c,f,g,p) \
 int c (int x) { return f(g(x)); }; \
 p \
 c

#define O(f,g,p)  COMP( FUNC(a), f, g, p)

#define P0
#define MAIN

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Haskell performance

2007-12-22 Thread ajb

G'day all.

Quoting Jon Harrop [EMAIL PROTECTED]:


I would recommend adding:

1. FFT.

2. Graph traversal, e.g. nth-nearest neighbor.


I'd like to put in a request for Pseudoknot.  Does anyone still have it?

Cheers,
Andrew Bromage
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Jules Bean

Cristian Baboi wrote:

Let me show you an example to prove it.


That's not C.

That's the C preprocessor, which is a textual substitution macro 
language. Macros certainly aren't first class (you can't pass a macro to 
a function, only its expansion).


C does support function pointers, which are something like first class 
functions. The main things C lacks which people associate with true 
first-class function is:


The ability to construct anonymous/local functions.

The ability to capture local variables and return a function with some 
variables bound.


The ability to write type-safe functions with polymorphic arguments.

Jules
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Paul Johnson

Cristian Baboi wrote:

Let me show you an example to prove it.
This reminds me of arguments in the late 80s and early 90s that C could 
do OO programming via function pointers, so there was no need for OO 
languages.


Paul.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Peter Verswyvelen
Actually, the C/C++ programmers I know think nothing can be done with 
functional programming languages because they are so used the concept of a 
C function, that they think that Haskell functions are just that, functions 
like in C.

C# does not have function pointers, only the concept of a delegates (which was 
copied from Delphi's closures, the official name?), and that is much closer 
to FP's functions

If people would see that Haskell functions are *not* that silly kind of C 
functions, but actually little objects that *can* carry immutable state, they 
would look differently at functional programming languages I guess. 

So I think the word function means different things depending to which 
community you talk...

Before I knew Haskell, the OO community started to embrace the concept of 
interfaces more and more (aka purely abstract classes). Furthermore in OO, 
many bugs are caused IMO by keeping track of mutable state and caches (which is 
often premature optimization). So for complicated tasks, I tended to use more 
and more immutable objects, e.g. objects that could be constructed once, but 
not mutated. And then I noticed that it was often not needed to precompute all 
the values that got passed to the constructor, so I added C# properties that 
computed the inner cached values once, lazily.  

IMO Haskell embraces the above ideas and much more, with the difference it 
encapsulates these ideas nicely and concisely, so you need only a fraction of 
the lines of code :)

So I guess if an OO programmer gets at this level, he should *not* look at 
Haskell, because he will not want to go back to his messy imperative 
programming languages :)

Peter

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jules Bean
Sent: Saturday, December 22, 2007 1:09 PM
To: Cristian Baboi
Cc: haskell-cafe@haskell.org
Subject: Re: [Haskell-cafe] Functions are first class values in C

Cristian Baboi wrote:
 Let me show you an example to prove it.

That's not C.

That's the C preprocessor, which is a textual substitution macro 
language. Macros certainly aren't first class (you can't pass a macro to 
a function, only its expansion).

C does support function pointers, which are something like first class 
functions. The main things C lacks which people associate with true 
first-class function is:

The ability to construct anonymous/local functions.

The ability to capture local variables and return a function with some 
variables bound.

The ability to write type-safe functions with polymorphic arguments.

Jules
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] GtkMathView wrapper for Haskell / Gtk2HS?

2007-12-22 Thread Peter Verswyvelen
Did anybody already wrap http://helm.cs.unibo.it/mml-widget so it can be
used with Haskell / Gtk2HS?

 

If not, would it be good project for me to learn Haskell's FFI? It involves
C++, so I guess that would be harded to wrap?

 

Thanks,

Peter

 

 

 

 

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Cristian Baboi
On Sat, 22 Dec 2007 14:28:56 +0200, Paul Johnson [EMAIL PROTECTED]  
wrote:



Cristian Baboi wrote:

Let me show you an example to prove it.
This reminds me of arguments in the late 80s and early 90s that C could  
do OO programming via function pointers, so there was no need for OO  
languages.


What is Objective C ,if not just C with some syntactic sugar ?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Fwd: Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Cristian Baboi



--- Forwarded message ---
From: Cristian Baboi [EMAIL PROTECTED]
To: Jules Bean [EMAIL PROTECTED]
Cc:
Subject: Re: [Haskell-cafe] Functions are first class values in C
Date: Sat, 22 Dec 2007 15:58:50 +0200

On Sat, 22 Dec 2007 14:09:13 +0200, Jules Bean [EMAIL PROTECTED]
wrote:


Cristian Baboi wrote:

Let me show you an example to prove it.



That's not C.
That's the C preprocessor, which is a textual substitution macro  
language.


Well, the preprocessor is part of the language in a way. These two come
together.

Macros certainly aren't first class (you can't pass a macro to a  
function, only its expansion).


In Haskell I cannot pass a function to a function, only its expansion.

C does support function pointers, which are something like first class  
functions. The main things C lacks which people associate with true  
first-class function is:



The ability to construct anonymous/local functions.


If you look at the example you will see I've done that.

The ability to capture local variables and return a function with some  
variables bound.


If I can construct anonymous functions and constants, I can construct
functions with some variables bound.



The ability to write type-safe functions with polymorphic arguments.


I didn't know this must be a property of first-class functions.
C is staticaly typed, so type errors will be detected.


Haskell is just C with some syntactic sugar :-)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Cristian Baboi
On Sat, 22 Dec 2007 14:55:44 +0200, Peter Verswyvelen [EMAIL PROTECTED]  
wrote:


Before I knew Haskell, the OO community started to embrace the concept  
of interfaces more and more (aka purely abstract classes). Furthermore  
in OO, many bugs are caused IMO by keeping track of mutable state and  
caches (which is often premature optimization). So for complicated  
tasks, I tended to use more and more immutable objects, e.g. objects  
that could be constructed once, but not mutated. And then I noticed that  
it was often not needed to precompute all the values that got passed to  
the constructor, so I added C# properties that computed the inner  
cached values once, lazily.


Lazy constant in C:

int C1 (){
return 7;
}
C1 is computed only when you apply the operator () to it.

IMO Haskell embraces the above ideas and much more, with the difference  
it encapsulates these ideas nicely and concisely, so you need only a  
fraction of the lines of code :)


This is why we have syntactic sugar :)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Miguel Mitrofanov

Lazy constant in C:

int C1 (){
return 7;
}
C1 is computed only when you apply the operator () to it.


But that's not the point of lazyness. Lazy value is computed only ONCE.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Miguel Mitrofanov

That's not C.
That's the C preprocessor, which is a textual substitution macro  
language.


Well, the preprocessor is part of the language in a way. These two  
come

together.


No. In fact, these are even two different programs, see man cpp.

Macros certainly aren't first class (you can't pass a macro to a  
function, only its expansion).


In Haskell I cannot pass a function to a function, only its expansion.


What do you mean by expansion? Can you clarify this?

C does support function pointers, which are something like first  
class functions. The main things C lacks which people associate  
with true first-class function is:



The ability to construct anonymous/local functions.


If you look at the example you will see I've done that.


No. Your compose macro is not a function; for example, you can't  
use it as an argument to itself (which is easy in Haskell: (.)(.))


The ability to capture local variables and return a function with  
some variables bound.


If I can construct anonymous functions and constants, I can  
construct

functions with some variables bound.


See above. You can't.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Cristian Baboi
On Sat, 22 Dec 2007 16:25:26 +0200, Miguel Mitrofanov  
[EMAIL PROTECTED] wrote:



That's not C.
That's the C preprocessor, which is a textual substitution macro  
language.


Well, the preprocessor is part of the language in a way. These two come
together.



No. In fact, these are even two different programs, see man cpp.


These two different programs come together.

Macros certainly aren't first class (you can't pass a macro to a  
function, only its expansion).


In Haskell I cannot pass a function to a function, only its expansion.



What do you mean by expansion? Can you clarify this?


f1=\x-x+1
f2=\x-2*x
g=\x-x.f1
h=\x-x.(\x-x+1)

h is g

C does support function pointers, which are something like first class  
functions. The main things C lacks which people associate with true  
first-class function is:



The ability to construct anonymous/local functions.


If you look at the example you will see I've done that.


No. Your compose macro is not a function; for example, you can't use  
it as an argument to itself (which is easy in Haskell: (.)(.))


Ok.

The ability to capture local variables and return a function with some  
variables bound.


If I can construct anonymous functions and constants, I can  
construct

functions with some variables bound.


See above. You can't.


Ok.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Peter Verswyvelen
Cristian Baboi wrote
 Lazy constant in C:
 int C1 (){ return 7; }

Not really, this is not lazy, since it always recomputes the value 7. 

To have lazy values in C you would have to do something like:

struct ValueInt
{  
  int IsComputed;
  union
  {
int Value;
struct
{
   int (*ComputeValue)(void *args);
   void* Args; 
};
  };
};

int GetLazyInt (ValueInt* v)
{
  if( !v-IsComputed )
  {
v-Value = v-ComputeValue(v-Args);
v-IsComputed = true;
  }
  return v-Value;
}

But this of course, is totally useless in C and very bulky. It's also 
impossible to know when to call freemem on the Args (hence garbage collection 
in FP), when *not* to use lazy values but strict values instead (hence 
strictness analysis in FP), etc...

I must say I had the same opinion as you had for many many years. I always 
thought functions as first class values where just function pointers, so what 
is it these Haskell/FP guys are so excited about? But if you dig deeper, you'll 
see the magic... Notice you will have to give yourself some time; it is very 
hard to get out of the imperative blob. E.g. I'm still being sucked into the 
imperative blob after my first year of Haskell hacking :)

PS: As I'm relatively new to Haskell, don't take the above C code too 
seriously; it certainly will not reflect the way a real Haskell system works.

Peter

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Jeremy Apthorp
On 23/12/2007, Peter Verswyvelen [EMAIL PROTECTED] wrote:
 Cristian Baboi wrote
  Lazy constant in C:
  int C1 (){ return 7; }

 Not really, this is not lazy, since it always recomputes the value 7.

Actually GCC will happily optimise this away in almost all cases.

--
- Jeremy Apthorp
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Cristian Baboi
On Sat, 22 Dec 2007 16:26:18 +0200, Miguel Mitrofanov  
[EMAIL PROTECTED] wrote:



Lazy constant in C:

int C1 (){
return 7;
}
C1 is computed only when you apply the operator () to it.



But that's not the point of lazyness. Lazy value is computed only ONCE.


Ok. I guess I cannot be sure I'll call C1 only once.


How about this

int C1(){
static c1=-;
if(c1==-){
c1=7
}
return c1
}
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Cristian Baboi
On Sat, 22 Dec 2007 16:55:08 +0200, Miguel Mitrofanov  
[EMAIL PROTECTED] wrote:



In Haskell I cannot pass a function to a function, only its expansion.



What do you mean by expansion? Can you clarify this?


f1=\x-x+1
f2=\x-2*x
g=\x-x.f1
h=\x-x.(\x-x+1)

h is g


Not clear. Try starting with function's expansion is...



function's expansion is ... just like macro expansion.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Philippa Cowderoy
On Sat, 22 Dec 2007, Cristian Baboi wrote:

 On Sat, 22 Dec 2007 16:55:08 +0200, Miguel Mitrofanov
 [EMAIL PROTECTED] wrote:
 
 In Haskell I cannot pass a function to a function, only its
 expansion.
   
What do you mean by expansion? Can you clarify this?
   
   f1=\x-x+1
   f2=\x-2*x
   g=\x-x.f1
   h=\x-x.(\x-x+1)
   
   h is g
  
  Not clear. Try starting with function's expansion is...
 
 
 function's expansion is ... just like macro expansion.

No, it's not. Expanding variables (swapping f1 for \x-x+1) isn't the 
evaluation mechanism in haskell, g and h really are semantically 
equivalent values and we can't do actual computation just by expanding 
variables. Whereas expanding a macro is equivalent to beta-reduction 
(evaluating a function application), which isn't required before passing 
something in Haskell. We pass functions, not just their results.

Here's a trivial example that does so:

(\x - x) (\x - x)

A lambda calculus classic that doesn't typecheck in Haskell:

(\x - x x) (\x - x x)

Feel free to try evaluating it!

-- 
[EMAIL PROTECTED]

I think you mean Philippa. I believe Phillipa is the one from an
alternate universe, who has a beard and programs in BASIC, using only
gotos for control flow. -- Anton van Straaten on Lambda the Ultimate
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Fwd: Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Cristian Baboi



On Sat, 22 Dec 2007 16:48:51 +0200, Peter Verswyvelen [EMAIL PROTECTED]
wrote:


Cristian Baboi wrote

Lazy constant in C:
int C1 (){ return 7; }


Not really, this is not lazy, since it always recomputes the value 7.

To have lazy values in C you would have to do something like:

struct ValueInt
{
  int IsComputed;
  union
  {
int Value;
struct
{
   int (*ComputeValue)(void *args);
   void* Args;
};
  };
};

int GetLazyInt (ValueInt* v)
{
  if( !v-IsComputed )
  {
v-Value = v-ComputeValue(v-Args);
v-IsComputed = true;
  }
  return v-Value;
}

But this of course, is totally useless in C and very bulky. It's also  
impossible to know when to call freemem on the Args (hence garbage  
collection in FP), when *not* to use lazy values but strict values  
instead (hence strictness analysis in FP), etc...


I know FP have automatic garbage collection.
I know FP compilers use strictness analysis.

In C++ one can isolate memory management in constructors and destructors.
There are C compilers that are also able to do some optimizations.

I must say I had the same opinion as you had for many many years. I  
always thought functions as first class values where just function  
pointers, so what is it these Haskell/FP guys are so excited about? But  
if you dig deeper, you'll see the magic... Notice you will have to give  
yourself some time; it is very hard to get out of the imperative blob.  
E.g. I'm still being sucked into the imperative blob after my first year  
of Haskell hacking :)


PS: As I'm relatively new to Haskell, don't take the above C code too  
seriously; it certainly will not reflect the way a real Haskell system  
works.


I am new to Haskell, but not new to declarative programming. I programmed
in Prolog for several years, and I tryed LISP, but I don't liked the LISP
syntax.

I don't take my C example seriously either.
The thing is I think that for a language to have first-class functions,
it must be homoiconic if I understand the terms correctly.

Have you tryed to write a Haskell program that manipulate Haskell programs
?
Please don't tell me that Haskell compiler is written in Haskell, because
there are C compilers written in C.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Cristian Baboi
On Sat, 22 Dec 2007 17:13:55 +0200, Philippa Cowderoy [EMAIL PROTECTED]  
wrote:




function's expansion is ... just like macro expansion.


No, it's not. Expanding variables (swapping f1 for \x-x+1) isn't the
evaluation mechanism in haskell, g and h really are semantically
equivalent values and we can't do actual computation just by expanding
variables. Whereas expanding a macro is equivalent to beta-reduction
(evaluating a function application), which isn't required before passing
something in Haskell. We pass functions, not just their results.

Here's a trivial example that does so:

(\x - x) (\x - x)

A lambda calculus classic that doesn't typecheck in Haskell:

(\x - x x) (\x - x x)



Feel free to try evaluating it!


Thank you for your message.

I tryed and this is what I've got:
ERROR - cannot find show function for:
*** Expression : (\x - x) (\x - x)
*** Of type: a - a


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Fwd: Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Cristian Baboi
On Sat, 22 Dec 2007 17:18:18 +0200, Philippa Cowderoy [EMAIL PROTECTED]  
wrote:


The thing is I think that for a language to have first-class  
functions,

it must be homoiconic if I understand the terms correctly.



You're confusing functions with the terms that are used to define them.



The terms aren't first-class, the functions are. This is intentional: the
only way you can tell functions apart is if they give you different
results for the same parameter. Otherwise, what you have isn't a function
but a combination of a function and some extra structure.


I also confuse numbers with the terms that are used to define them (like  
1.2)


I guess I have to study more about this.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Miguel Mitrofanov



function's expansion is ... just like macro expansion.


Then you CAN pass a function to another function.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Philippa Cowderoy
On Sat, 22 Dec 2007, Cristian Baboi wrote:

 On Sat, 22 Dec 2007 17:13:55 +0200, Philippa Cowderoy [EMAIL PROTECTED]
 wrote:
 
  Here's a trivial example that does so:
  
  (\x - x) (\x - x)
  
  A lambda calculus classic that doesn't typecheck in Haskell:
  
  (\x - x x) (\x - x x)
 
  Feel free to try evaluating it!
 
 Thank you for your message.
 
 I tryed and this is what I've got:
 ERROR - cannot find show function for:
 *** Expression : (\x - x) (\x - x)
 *** Of type: a - a
 

Yep, that's because while it can evaluate it down to (\x - x) your 
interpreter doesn't know how to print the result. You can demonstrate that 
it works by then passing in something to that result though:

((\x -x) (\x - x)) 1

You'll have to evaluate the other one by hand. Don't spend too long with 
it though!

-- 
[EMAIL PROTECTED]

The reason for this is simple yet profound. Equations of the form
x = x are completely useless. All interesting equations are of the
form x = y. -- John C. Baez
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Functions are first class values in C

2007-12-22 Thread Cristian Baboi
On Sat, 22 Dec 2007 17:25:34 +0200, Philippa Cowderoy [EMAIL PROTECTED]  
wrote:



On Sat, 22 Dec 2007, Cristian Baboi wrote:

On Sat, 22 Dec 2007 17:13:55 +0200, Philippa Cowderoy  
[EMAIL PROTECTED]

wrote:

 Here's a trivial example that does so:

 (\x - x) (\x - x)

 A lambda calculus classic that doesn't typecheck in Haskell:

 (\x - x x) (\x - x x)

 Feel free to try evaluating it!

Thank you for your message.

I tryed and this is what I've got:
ERROR - cannot find show function for:
*** Expression : (\x - x) (\x - x)
*** Of type: a - a




Yep, that's because while it can evaluate it down to (\x - x) your
interpreter doesn't know how to print the result. You can demonstrate  
that

it works by then passing in something to that result though:

((\x -x) (\x - x)) 1


I know that.  The reason the interpreter doesn't know how to print the  
result is because converting functions to strings doesn't make sense.


Thank you.


You'll have to evaluate the other one by hand. Don't spend too long with
it though!


Don't worry, I'm lazy too :-)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Stefan O'Rear
On Sat, Dec 22, 2007 at 06:00:29PM +, Joost Behrends wrote:
 Hi,
 
 while still working on optimizing (naively programmed) primefactors i watched 
 a
 very strange behavior of ghc. The last version below takes 2.34 minutes on my
 system for computing 2^61+1 = 3*768614,336404,564651. Importing Data.Char
 without anywhere using it reduces this time to 1.34 minute - a remarkable 
 speed
 up. System is WindowsXP on 2.2GHZ Intel, 512MB Ram.
 
 I give the complete code here - hopefully all tabs are (4) blanks. Can this be
 reproduced ? I compile just with --make -O2.

If you can reproduce it on your machine (rm executable *.o *.hi
between tests for maximum reliability), it's definitely a bug.

http://hackage.haskell.org/trac/ghc/wiki/ReportABug

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Hoogle search scope question -- was: [HXT] Simple question

2007-12-22 Thread Steve Lihn
I have a hoogle question. While I was reading the HXT discussion
(below), I tried to search runX and readString in Hoogle (since I
am new to HXT and Arrows). But neither search yielded any result and I
had to use google to find the Haskell docs.

So I am wondering what is the scope of Hoogle that people can search?
Only things in the standard library? How much does it cover in the
Hackage? I tried to find the scope description in the Tutorial but in
vain. It is fair to say if someone needs to use Hoogle, he/she
probably has no idea what he is looking for, needless to say, what is
in scope of Hoogle or what is not.

Thanks,
Steve


On Dec 19, 2007 7:48 AM, Miguel Mitrofanov [EMAIL PROTECTED] wrote:
 Seems rather strange for me, I've just installed HXT and got this:

 Prelude Text.XML.HXT.Arrow runX $ readString [(a_validate,0)] this 
 /this  writeDocumentToString []
 [?xml version=\1.0\ encoding=\UTF-8\?\nthis /this]

  Everything works fine except for the fact that all the nodes  this
  /this
  (that is, a space (an XML text node whose contents are a single space
  character)
  within a this element node) get transformed to a  this/  element node,

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Joost Behrends
Stefan O'Rear stefanor at cox.net writes:


 If you can reproduce it on your machine (rm executable *.o *.hi
 between tests for maximum reliability), it's definitely a bug.
 
 http://hackage.haskell.org/trac/ghc/wiki/ReportABug
 
 Stefan

Yes, it was the same as before. Had i reboot meanwhile, because i was out for 
meal. 1.34 vs. 2.34 minutes are still the same times now, WITH deletion of the 
object files and the .exe between both compiles and runs.

What is happening here ? Does importing Data.Char shadow (in a hidden way) some
timeworn types or methods in the Prelude ? If so, ghc might be still faster
than it looks now.

Cheers, Joost




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Hoogle search scope question -- was: [HXT] Simple question

2007-12-22 Thread Neil Mitchell
Hi

 I have a hoogle question. While I was reading the HXT discussion
 (below), I tried to search runX and readString in Hoogle (since I
 am new to HXT and Arrows). But neither search yielded any result and I
 had to use google to find the Haskell docs.

 So I am wondering what is the scope of Hoogle that people can search?

http://www.haskell.org/haskellwiki/Hoogle#Scope_of_Searches (new
section just added)

 Only things in the standard library?

Mainly those things distributed with GHC.

 How much does it cover in the Hackage?

Virtually none. This will change at some point in the future.

 I tried to find the scope description in the Tutorial but in
 vain. It is fair to say if someone needs to use Hoogle, he/she
 probably has no idea what he is looking for, needless to say, what is
 in scope of Hoogle or what is not.

In general, Hoogle is designed if you want to find a generally
applicable function. If you are using HXT, then you probably know
whether you want a general function or one that works over the HXT
data types/classes/monads etc. I do want to expand Hoogle to search
everything on hackage, and to have package specific searches as well,
but it requires more coding first.

Thanks

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Why does this blow the stack?

2007-12-22 Thread Don Stewart
dbenbenn:
 On 12/21/07, Stefan O'Rear [EMAIL PROTECTED] wrote:
  Because they simply aren't the same.
 
 Good point; thanks.  That means that Don's patch could theoretically
 break some existing Haskell program:
 
 Prelude length $ take 10 ([undefined ..] :: [Integer])
 10

That's right. It makes Integer behave like the Int instance.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Joost Behrends
Daniel Fischer daniel.is.fischer at web.de writes:

 I can't reproduce it, both run in 130s here (SuSE 8.2, 1200MHz Duron).
 However, it's running over 30 minutes now trying to factorise 2^88+1 without 
 any sign of approaching success, which suggests your code has a bug (the 
 factorization is [257,229153,119782433,43872038849], so even a naive approach 
 shouldn't take much longer than a minute).


I have found the problem: We must possibly work recursive on a found factor.
This was done in former versions, but got lost when isolating the function 
found. Here is a corrected version - complete again for reproducing easily
the strange behavior with Data.Char. It decomposes 2^88+1 in 13 seconds.


module Main
where

import IO
import System.Exit
--import Data.Char

main = do
hSetBuffering stdin LineBuffering
putStrLn Number to decompose ?
s - getLine
if s == [] then
exitWith ExitSuccess
else do
putStrLn (show$primefactors$read s)
main

data DivIter = DivIter {dividend :: Integer, 
divisor  :: Integer,
bound:: Integer, 
result   :: [Integer]}

intsqrt m = floor (sqrt $ fromInteger m)

primefactors :: Integer - [Integer]
primefactors n | n2   = []
   | even n= o2 ++ (primefactors o1)
   | otherwise = if z/=1 then result res ++[z] else result res
   where 
   res = divisions (DivIter {dividend = o1, 
 divisor = 3, 
 bound = intsqrt(o1),
 result = o2})
   z = dividend res  -- is 1 sometimes
   (o1,o2) = twosect (n,[])

twosect :: (Integer,[Integer]) - (Integer,[Integer])
twosect m |odd  (fst m) = m
  |even (fst m) = twosect (div (fst m) 2, snd m ++ [2])

found :: DivIter - DivIter
found x = x {dividend = xidiv,
bound = intsqrt(xidiv), 
   result = result x ++ [divisor x]}
where xidiv = (dividend x) `div` (divisor x)

d2 :: DivIter - DivIter
d2 x |dividend x `mod` divisor x  0  = x {divisor = divisor x + 2}
 |otherwise   = d2$found x
d4 :: DivIter - DivIter
d4 x |dividend x `mod` divisor x  0  = x {divisor = divisor x + 4}
 |otherwise   = d4$found x
d6 :: DivIter - DivIter
d6 x |dividend x `mod` divisor x  0  = x {divisor = divisor x + 6}
 |otherwise   = d6$found x

divisions :: DivIter - DivIter
divisions y |or[divisor y == 3, 
divisor y == 5]   = divisions (d2 y)
|divisor y = bound y = divisions (d6$d2$d6$d4$d2$d4$d2$d4 y)
|otherwise= y


And now it uses also 1.34 minutes for 2^61+1 without importing Data.Char.
Hmmm ...

Cheers, Joost

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Why does this blow the stack?

2007-12-22 Thread Don Stewart
dons:
 dbenbenn:
  On 12/21/07, Stefan O'Rear [EMAIL PROTECTED] wrote:
   Because they simply aren't the same.
  
  Good point; thanks.  That means that Don's patch could theoretically
  break some existing Haskell program:
  
  Prelude length $ take 10 ([undefined ..] :: [Integer])
  10
 
 That's right. It makes Integer behave like the Int instance.

And we see again that strictness properties are very ill-defined,
here, the Enum types in base:

Prelude length $ take 10 ([undefined ..] :: [Int])
*** Exception: Prelude.undefined

Prelude length $ take 10 ([undefined ..] :: [()])
*** Exception: Prelude.undefined

Prelude length $ take 10 ([undefined ..] :: [Ordering])
*** Exception: Prelude.undefined

Prelude length $ take 10 ([undefined ..] :: [Bool])
*** Exception: Prelude.undefined

But,

Prelude length $ take 10 ([undefined ..] :: [Float])
10

Prelude length $ take 10 ([undefined ..] :: [Double])
10

And,
Prelude length $ take 10 ([undefined ..] :: [Integer])
10

Now,
Prelude length $ take 10 ([undefined ..] :: [Integer])
*** Exception: Prelude.undefined

So we see that Float and Double also have this problem,

Prelude head (drop 1000 [1 .. ]) :: Float
*** Exception: stack overflow

Prelude head (drop 1000 [1 .. ]) :: Double
*** Exception: stack overflow

People shouldn't be writing code that depends on this!

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Daniel Fischer
Am Samstag, 22. Dezember 2007 20:48 schrieb Joost Behrends:
 Daniel Fischer daniel.is.fischer at web.de writes:
  I can't reproduce it, both run in 130s here (SuSE 8.2, 1200MHz Duron).
  However, it's running over 30 minutes now trying to factorise 2^88+1
  without any sign of approaching success, which suggests your code has a
  bug (the factorization is [257,229153,119782433,43872038849], so even a
  naive approach shouldn't take much longer than a minute).

 Yes, Daniel

 the code is not completely correct. There is a known bug. Don't know the
 connection to your number, but it doesn't decompose 29*29*31*31 correctly
 (gives [29,29,961] neither 59*59*61*61 (gives [59*59*3721]).

Perhaps
 
Prelude Main primefactors $ 7*17*29
[7,493]
Prelude Main primefactors $ 7*23*29
[7,667]
Prelude Main primefactors $ 7*11*23
[7,253]

help you find the bug? If not, ask and ye shall be answered.

 Difficult to find it, but that has nothing to do with the strange
 differences of speed.

True, and I have no idea how that could come.

 Cheers, Joost

Cheers,
Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Daniel Fischer
Am Samstag, 22. Dezember 2007 21:28 schrieb Joost Behrends:

Of course, one minute after I sent my previous mail, I receive this one :(
However, one point, it might be faster to factor out all factors p in found 
and only then compute the intsqrt, like

found x = x{dividend = xstop, bound = intsqrt xstop, result = result x ++ 
replicate k p}
  where
p = divisor x
(xstop,k) = go (dividend x) 0
go n m
| r == 0= go q (m+1)
| otherwise = (n,m)
  where
(q,r) = n `divMod` p

and then leaving out the recursive call in d2 etc.
For a measurable difference, you'd need a number with some high prime powers 
as factors, but still, saves some work even for squares.

 I have found the problem: We must possibly work recursive on a found
 factor. This was done in former versions, but got lost when isolating the
 function found. Here is a corrected version - complete again for
 reproducing easily the strange behavior with Data.Char. It decomposes
 2^88+1 in 13 seconds.


 module Main
 where

 import IO
 import System.Exit
 --import Data.Char

 main = do
 hSetBuffering stdin LineBuffering
 putStrLn Number to decompose ?
 s - getLine
 if s == [] then
 exitWith ExitSuccess
 else do
 putStrLn (show$primefactors$read s)
 main

 data DivIter = DivIter {dividend :: Integer,
 divisor  :: Integer,
 bound:: Integer,
 result   :: [Integer]}

 intsqrt m = floor (sqrt $ fromInteger m)

 primefactors :: Integer - [Integer]
 primefactors n | n2   = []

| even n= o2 ++ (primefactors o1)
| otherwise = if z/=1 then result res ++[z] else result res

where
res = divisions (DivIter {dividend = o1,
  divisor = 3,
  bound = intsqrt(o1),
  result = o2})
z = dividend res  -- is 1 sometimes
(o1,o2) = twosect (n,[])

 twosect :: (Integer,[Integer]) - (Integer,[Integer])
 twosect m |odd  (fst m) = m

   |even (fst m) = twosect (div (fst m) 2, snd m ++ [2])

 found :: DivIter - DivIter
 found x = x {dividend = xidiv,
   bound = intsqrt(xidiv),
  result = result x ++ [divisor x]}
 where xidiv = (dividend x) `div` (divisor x)

 d2 :: DivIter - DivIter
 d2 x |dividend x `mod` divisor x  0  = x {divisor = divisor x + 2}

  |otherwise   = d2$found x

 d4 :: DivIter - DivIter
 d4 x |dividend x `mod` divisor x  0  = x {divisor = divisor x + 4}

  |otherwise   = d4$found x

 d6 :: DivIter - DivIter
 d6 x |dividend x `mod` divisor x  0  = x {divisor = divisor x + 6}

  |otherwise   = d6$found x

 divisions :: DivIter - DivIter
 divisions y |or[divisor y == 3,
 divisor y == 5]   = divisions (d2 y)

   |divisor y = bound y = divisions (d6$d2$d6$d4$d2$d4$d2$d4 y)
   |otherwise= y

 And now it uses also 1.34 minutes for 2^61+1 without importing Data.Char.
 Hmmm ...

 Cheers, Joost

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Is a type synonym declaration really a synonym ?

2007-12-22 Thread alpheccar

Can someone confirm me that:

type TA = A :+: B
type TB = C :+: D
type T = TA :+: TB

is not equivalent to

type T = A :+: B :+: C :+: D

where I have defined

infixr 6 :+:
data (f :+: g)
data A
data B
data C
data D

I have a computation at type level which is working with the later  
definition of T but not with the former (ghc 6.8.1)


Thanks.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Is a type synonym declaration really a synonym ?

2007-12-22 Thread Stefan O'Rear
On Sat, Dec 22, 2007 at 10:22:46PM +0100, alpheccar wrote:
 Can someone confirm me that:

 type TA = A :+: B
 type TB = C :+: D
 type T = TA :+: TB

 is not equivalent to

 type T = A :+: B :+: C :+: D

 where I have defined

 infixr 6 :+:
 data (f :+: g)
 data A
 data B
 data C
 data D

 I have a computation at type level which is working with the later 
 definition of T but not with the former (ghc 6.8.1)

Type synonyms are implicitly parenthetized, and your :+: is
non-associative.  Compare:

s +:+ t = concat[(,s,,,t,)]

foo = a +:+ b
bar = c +:+ d
baz = (foo +:+ bar) == (a +:+ b +:+ c +:+ d)

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Is a type synonym declaration really a synonym ?

2007-12-22 Thread Twan van Laarhoven

alpheccar wrote:

Can someone confirm me that:

type TA = A :+: B
type TB = C :+: D
type T = TA :+: TB


This is

  type T = (A :+: B) :+: (C :+: D)


is not equivalent to

type T = A :+: B :+: C :+: D


is

  type T = A :+: (B :+: (C :+: D))

So these types are indeed not the same.

Twan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Stefan O'Rear
On Sat, Dec 22, 2007 at 04:40:00PM -0500, Sterling Clover wrote:
 I'm curious if you get the same performance difference importing
 GHC.Listinstead of
 Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr, which
 in turn imports GHC.List, which provides a bunch of fusion rules pragmas
 that would probably optimize your (++) usage. If this is the case, not sure
 if its a bug or not, but all this will have to be thought through as more
 stream fusion is rolled out anyway, I suspect?
 --S

The Prelude imports GHC.List, iirc.

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: MonadFix

2007-12-22 Thread apfelmus

Joost Behrends wrote:

apfelmus writes:

Huh?  p  intsqrt n  is evaluated just as often as  p*p  n , with 
changing  n  . Why would that be less expensive? Btw, the code above 
test for  r==0  first, which means that the following  p*p  n  is 
tested exactly once for every prime candidate  p .


No. One point in the introduction of DivIter is, that intsqrt dividend is stored
there and only recomputed, when a new factor is found.


Yes, I'm sorry, it didn't occur to me that recomputing per actual prime 
factor (with multiplicities, i.e. p^5 counted 5 times) is better than 
recomputing per candidate factor (without multiplicities, i.e. p^5 
counted only once).



And concerning my cycled lists of summands as [2,4] or [2,4,2,4,2,4,2,6,2,6]:

an easily computable function stepping through all primes can only be
a function, which yields primes plus some more odd numbers. This is, what i
tried.


Yes, this scheme was my intention, too. The list  primes'  doesn't need 
to (and indeed shouldn't) be a list of actual primes, just a good guess like


  primes' = 2:[3,5]
  primes' = 2:3:scanl (+) 5 (cycle [2,4])

or something with [2,4,2,4,2,4,2,6,2,6]. So, it's an infinite list of 
numbers that qualify as candidates for being a prime factor of  n 
(which I called prime candidates. Not a good name, since they don't 
need to be actual prime numbers.)



What I want to say is that using such an infinite is a nice way to 
separate the generate-prime-factor-candidates-logic from the 
test-all-those-candidates-loop. It's not necessary to hard-code it with 
a predicate like



iterdivisors x | x == 0 = 3
   | x == 1 = 5
   | otherwise x = iterdivisors (x-1) + ((cycle [2,4]) !! x)


(which, in this particular form, is hopelessly inefficient) or special 
step functions like



d2 :: DivIter - DivIter
d2 x |dividend x `mod` divisor x  0  = x { divisor = divisor x + 2}
 |otherwise   = found x
d4 :: DivIter - DivIter
d4 x |dividend x `mod` divisor x  0  = x { divisor = divisor x + 4}
 |otherwise   = found x
d6 :: DivIter - DivIter
d6 x |dividend x `mod` divisor x  0  = x { divisor = divisor x + 6}
 |otherwise   = found x

divisions :: DivIter - DivIter
divisions y |or[divisor y == 3, 
divisor y == 5]   = divisions (d2 y)

|divisor y = bound y = divisions (d6$d2$d6$d4$d2$d4$d2$d4 y)
|otherwise= y


It's not even necessary to treat 2 in a special way like


twosect :: (Integer,[Integer]) - (Integer,[Integer])
twosect m |odd  (fst m) = m
  |even (fst m) = twosect (div (fst m) 2, snd m ++ [2])


does, the  primes'  list handles it all.


Regards
apfelmus

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Joost Behrends
Daniel Fischer daniel.is.fischer at web.de writes:
 
 Of course, one minute after I sent my previous mail, I receive this one :(
 However, one point, it might be faster to factor out all factors p in found 
 and only then compute the intsqrt, like
 
 found x = x{dividend = xstop, bound = intsqrt xstop, result = result x ++ 
 replicate k p}
 where
   p = divisor x
   (xstop,k) = go (dividend x) 0
   go n m
   | r == 0= go q (m+1)
   | otherwise = (n,m)
 where
   (q,r) = n `divMod` p

True - but be aware, that this will slightly slow down the computation for 
not multiple factors. And - as you recently noted - the really expensive
part are all the tried factors, which do not divide the queried number.

All this is just a first approach to the problem. When i talk of naively
programmed, then i want to say, that number theorists might have much better 
numerical orders marching through all primes plus some more odd numbers.
I didn't search for that on the net.

The last version was some kind of resign from tries like this:

firstPrimes = [3,5,7,11,13,17]
start = last firstPrimes
pac = product firstPrimes
slen = length lsumds

lsumds = drop 1 (fst$getSummands (singleton start, start)) where
getSummands :: (Seq Int, Int) - (Seq Int, Int)
getSummands r |snd r  bnd= getSummands ((fst r)|k, snd r + k) 
  |otherwise  = r
where
bnd = 2*pac + start
k = getNext (snd r)
getNext n |and [(n+2)`mod`x0 | x-firstPrimes] = 2 
  |otherwise= 2 + getNext (n+2)

smallmod :: Int - Int - Int
smallmod n m | nm = n | otherwise = 0

divstep :: (DivIter,Int) - (DivIter, Int)
divstep (x,n) | and [(fromInteger $ divisor x)start, ximod0] = 
  (x {divisor = divisor x + 2}, n)
  | (fromInteger$divisor x)  start = 
  (x {dividend = xidiv, 
  bound = intsqrt(xidiv), 
  result = result x ++ [divisor x]}, n) 
  | ximod0 = 
(x {divisor = divisor x + toInteger (index lsumds n)}, smallmod (n+1) slen)
  | otherwise = (x {dividend = xidiv, 
bound= intsqrt(xidiv), 
result   = result x ++ [divisor x]}, n) 
where 
(xidiv, ximod) = divMod (dividend x) (divisor x)

divisions :: (DivIter, Int) - (DivIter, Int)
divisions (y,n) | divisor y = bound y = divisions (divstep (y,n))
| otherwise= (y,0)

Here the additions to divisor are taken from the sequence lsmnds (List of
SuMaNDS) - the type Seq from Data.Sequence is faster with the function index 
than Data.List with !!. getSummands is a kind of reduced sieve of 
Eratosthenes. The main improvement is the longest line:

|ximod0 = (x {divisor = divisor x + toInteger (index lsumds n)}, 
   smallmod (n+1) slen)

I even considered converting lsmnds to ByteString and storing them - the
build of lsmnds for firstPrimes = [3,5,7,11,13,17,19,23,29] (which already
has some MB footprint) takes several minutes.  

But we have to track the number of iteration we are in. And that eats up
much more than the reduction of divisions for failing factors. The code works
(called slightly modificated by primefactors), but needs 5.41 minutes
for 2^61+1 :((. Also expensive might be the lookup in lsumds - the code gets
even slower with longer lists for firstPrimes.

divisions (d6$d2$d6$d4$d2$d4$d2$d4 y) is derived from

lsmnds [3,5] = [4,2,4,2,4,6,2,6].

For me the whole matter is closed for now - the 1.34 minutes are no bad result.
Amd anyway the code might represent a not too bad lower bound for efficiency of
decomposing algorithms. 

Auf Wiedersehen, Joost

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Reducing the executable output filesize with GHC (on Windows)

2007-12-22 Thread Peter Verswyvelen
I remember having read how to do this a while ago, but I can't google it
back.

 

How can I force the linker to exclude all the sections that are not
referenced? I thought it had to do something with -split-objs?

 

I've written a short consolemode program, but it compiles to 1.3MB. Since I
will be creating a lot of these small apps, surely this filesize can be
reduced?

 

Thanks,

Peter

 

PS: The program I wrote is 250 lines of code. It is a simple vectormath
exercise generator for my students which I wrote in 5 hours,  something I
would never been able to do in C++/C#, even though I have much more
experience in the latter languages.  Haskell's sooo nice, and I'm just
scratching the surface of it. Thank to all of you guys for the hard work J

 

 

 

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Sterling Clover
Here's the Prelude imports I see at the moment. Didn't chase down the
dependencies in all the code initially and now I see that GHC.Show does
import GHC.List. Still, I suspect this has something to do with fusion
nonetheless.
#ifdef __GLASGOW_HASKELL__
import GHC.Base
import GHC.IOBase
import GHC.Exception
import GHC.Read
import GHC.Enum
import GHC.Num
import GHC.Real
import GHC.Float
import GHC.Show
import GHC.Err   ( error, undefined )
#endif

(from
http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/Prelude.html)

--s


On Dec 22, 2007 4:44 PM, Stefan O'Rear [EMAIL PROTECTED] wrote:

 On Sat, Dec 22, 2007 at 04:40:00PM -0500, Sterling Clover wrote:
  I'm curious if you get the same performance difference importing
  GHC.Listinstead of
  Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr,
 which
  in turn imports GHC.List, which provides a bunch of fusion rules pragmas
  that would probably optimize your (++) usage. If this is the case, not
 sure
  if its a bug or not, but all this will have to be thought through as
 more
  stream fusion is rolled out anyway, I suspect?
  --S

 The Prelude imports GHC.List, iirc.

 Stefan

 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.6 (GNU/Linux)

 iD8DBQFHbYVTFBz7OZ2P+dIRAi02AJ41CyIVwCRLH2MU51Sc8Rjrtgxy+ACeL1m8
 F2a0Id2PErsKgjOyggkT8Ig=
 =T0A3
 -END PGP SIGNATURE-


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Sterling Clover
I'm curious if you get the same performance difference importing
GHC.Listinstead of
Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr, which
in turn imports GHC.List, which provides a bunch of fusion rules pragmas
that would probably optimize your (++) usage. If this is the case, not sure
if its a bug or not, but all this will have to be thought through as more
stream fusion is rolled out anyway, I suspect?
--S
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Joost Behrends
Sterling Clover s.clover at gmail.com writes:
 
 I'm curious if you get the same performance difference importing GHC.List
instead of Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr,
which in turn imports GHC.List, which provides a bunch of fusion rules pragmas
that would probably optimize your (++) usage. If this is the case, not sure if
its a bug or not, but all this will have to be thought through as more stream
fusion is rolled out anyway, I suspect?
 
 
Yes - the same difference: 1.33 minutes vs. 2.30 now.

I was near at reporting this as a bug, but rejected that idea. What does bug
mean here ? I am really a rookie at Haskell - this working on primefactors
is nothing but an excercise (however i casually try number theoretic
problems and often missed a program like this). But - as it appears to me -
the dramatic advance in speed ghc has made recently is to a great extent
due to improved types and their methods. What i see at 
haskell.org/ghc/docs/latest/html/libraries looks like huge road works to me.

And then it is not a bug, if elder libraries as the Prelude perhaps are not
completely up to date.

Another problem is, that my program was not completely correct. But it didn't
crash and got most numbers decomposed correct. But the strange
behavior disappeared with a correct version. 

Perhaps the Prelude will get better now. 

Cheers, Joost



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Neil Mitchell
Hi

 Yes - the same difference: 1.33 minutes vs. 2.30 now.

 I was near at reporting this as a bug, but rejected that idea. What does bug
 mean here ?

If it can be reproduced on anyones machine, it is a bug. If you can
bundle up two programs which don't read from stdin (i.e. no getLine
calls) or the standard arguments (i.e. getArgs) which differ only by
the Data.Char import, and have massive speed difiference, then report
a bug.

You should probably also give your GHC versions and platforms etc.

Thanks

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Reducing the executable output filesize with GHC (on Windows)

2007-12-22 Thread Bulat Ziganshin
Hello Peter,

Sunday, December 23, 2007, 1:12:05 AM, you wrote:
 How can I force the linker to exclude all the sections that are not
 referenced?

ghc ... -optl -s

 I thought it had to do something with -split-objs?

no, it's just `strip`

 I▓ve written a short consolemode program, but it compiles to 1.3MB.

you can use upx --lzma to further decrease EXE size


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Daniel Fischer
Am Samstag, 22. Dezember 2007 22:57 schrieb Joost Behrends:
 Daniel Fischer daniel.is.fischer at web.de writes:
  Of course, one minute after I sent my previous mail, I receive this one
  :( However, one point, it might be faster to factor out all factors p in
  found and only then compute the intsqrt, like
 
  found x = x{dividend = xstop, bound = intsqrt xstop, result = result x ++
  replicate k p}
where
  p = divisor x
  (xstop,k) = go (dividend x) 0
  go n m
 
  | r == 0= go q (m+1)
  | otherwise = (n,m)
 
where
  (q,r) = n `divMod` p

 True - but be aware, that this will slightly slow down the computation for
 not multiple factors.

Why? You have to do one more division than the power of the prime dividing 
anyway. Or are you thinking about 'replicate 1 p' for simple factors? That 
will indeed be a bit slower than [p], but I dare say factorise a number with 
at least two prime squares or one prime cube dividing it, and you gain.

 And - as you recently noted - the really expensive
 part are all the tried factors, which do not divide the queried number.

Yes, for anything with sufficiently many small primes not dividing but one or 
more large, anything you do with the prime factors is peanuts.

Okay, make it (xstop,k) = go (dividend x `quot` p) 1, and replace divMod with 
quotRem. then you still have one superfluous division, but that's your 
making, not mine:

d2 x | dividend x `mod` divisor x  0 =
 | otherwise = 

make it 

d2 x | r == 0   = x{divisor = divisor x + 2}
 | otherwise = x{dividend = xstop, bound = intsqrt xstop, result = result 
x ++ replicate k p}
where
  p = divisor x
  (q,r) = dividend x `quotRem` p
  (xstop,k) = go q 1
  go n m
| r' == 0 = go q' (m+1)
| otherwise = (n,m)
  where
(q',r') = n `quotRem` p

and you have no superfluous division, exactly one intsqrt per prime dividing 
your number.

 All this is just a first approach to the problem. When i talk of naively
 programmed, then i want to say, that number theorists might have much
 better numerical orders marching through all primes plus some more odd
 numbers. I didn't search for that on the net.

A simple fast method, although not guaranteed to succeed is Pollard's 
rho-method, preceded by trial division by small primes (up to 1,2,10 million 
or so, depending on what you have) and a good pseudo-primality test (google 
for Rabin-Miller, if you're interested).

I haven't but skimmed your code below, but I think the idea is something like

factor :: Integer - [Integer]
factor 0 = error 0 has no decomposition
factor 1 = []
factor n
| n  0 = (-1):factor (-n)
| n  4 = [n]
| otherwise = go n srn tests
  where
srn = isqrt n
go m sr pps@(p:ps)
| r == 0= p:go q (isqrt q) pps
| p  sr   = if m == 1 then [] else [m]
| otherwise = go m sr ps
  where
(q,r) = m `quotRem` p
tests = 2:3:5:7:11:scanl (+) 13 (tail $ cycle dlist)

isqrt :: Integer - Integer
isqrt n
| n  10^60 = floor (sqrt $ fromInteger n)
| otherwise = 10^20*isqrt (n `quot` 10^40)

dlist :: [Integer]
dlist = zipWith (-) (tail wheel) wheel

smallPrimes :: [Integer]
smallPrimes = [2,3,5,7,11]

wheelMod :: Integer
wheelMod = product smallPrimes

wheel :: [Integer]
wheel = [r | r - [1,3 .. wheelMod+1]
, all ((/= 0) . (mod r)) (tail smallPrimes)]


Now, the construction of the wheel and the list of differences, dlist, is 
relatively time-consuming, but can be sped up by using a good sieving method 
(my choice would be using an array (STUArray s Int Bool)). Of course, when 
you go much beyond 13 with smallPrimes, you'll end up with a rather large 
dlist in your memory, but smallPrimes = [2,3,5,7,11,13,17] should still be 
okay, no indexing means it's probably faster than Seq.

Cheers,
Daniel


 The last version was some kind of resign from tries like this:

 firstPrimes = [3,5,7,11,13,17]
 start = last firstPrimes
 pac = product firstPrimes
 slen = length lsumds

 lsumds = drop 1 (fst$getSummands (singleton start, start)) where
 getSummands :: (Seq Int, Int) - (Seq Int, Int)
 getSummands r |snd r  bnd= getSummands ((fst r)|k, snd r + k)

 |otherwise  = r

 where
 bnd = 2*pac + start
 k = getNext (snd r)
 getNext n |and [(n+2)`mod`x0 | x-firstPrimes] = 2

   |otherwise= 2 + getNext
   | (n+2)

 smallmod :: Int - Int - Int
 smallmod n m | nm = n | otherwise = 0

 divstep :: (DivIter,Int) - (DivIter, Int)
 divstep (x,n) | and [(fromInteger $ divisor x)start, ximod0] =
   (x {divisor = divisor x + 2}, n)

   | (fromInteger$divisor x)  start =

   (x 

Re: [Haskell-cafe] GtkMathView wrapper for Haskell / Gtk2HS?

2007-12-22 Thread Don Stewart
bf3:
Did anybody already wrap [1]http://helm.cs.unibo.it/mml-widget so it can
be used with Haskell / Gtk2HS?
 
If not, would it be good project for me to learn Haskell's FFI? It
involves C++, so I guess that would be harded to wrap?

I'm not aware of any wrapper. C++ can be wrapped by going via C (its not
done often though).

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] [15/16] SBM: Predictions compared to the measurements

2007-12-22 Thread Don Stewart
firefly:
 From Don Stewart:
 Summary: the suspicious lazy bytestring program works now. (constant
 space, and fastest overall, as expected originally)
 
 Program 1, lazy bytestring length . filter
 
 Yesterday:
 ./A +RTS -sstderr  150M  1.01s user 0.10s system 98% cpu 1.123 total
 40M allocated
 
   * Today (fixed!):
 ./A +RTS -sstderr  150M  0.26s user 0.06s system 96% cpu 0.332 total
 2M allocated
 
 Reason, deprecated array fusion mucking up the optimiser.
 
 I think we can close this regression.
 
 Nope.  Look at the memory graphs:
 
 hs/space-bslc8-lenfil-1:  38632 ██▌   
 |
  --   38644 ██▌   
 |
  --1940 ▌ 
 |
  --  109404 █▋
 |
  --   82324 ██▍   
 |
  --  109388 █▋
 |
  --   82304 ██▎   
 |
 
 It is fixed for ghc 6.8.2 running bytestring 0.9.0.2 but not for ghc
 ^
 6.9.20071119 and head (as of noon 2007-12-19), no matter the bytestring
  
 version.  There are lots of memory performance bugs in ghc 6.9.

Please package this up as a bug report against GHC head. 

Any regression wrt. 6.8.2, using the same bytestring version, is going
to be a ghc issue (not a bytestring library issue).

-- Don

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] [16/16] SBM: Discussion and Conclusion

2007-12-22 Thread Don Stewart
firefly:
 General
 ---
 Bytestrings are faster than haskell strings.  This is good and expected.
 But their memory use, ouch!  There seems to be a grave memory bug in 6.9.

Bug report -- since the bytestring library hasn't changed between 6.8.2
and head, that sounds like a bug report.

 Lazy bytestrings are slower than strict bytestrings -- this was completely
 unexpected for me.

Which programs should I look at? Is there a specific program or two
where lazy bytestring performance is entirely out of whack (against
6.8.2?)? If so, please forward it to me and Duncan.

 
 I/O
 ---
 The I/O patterns seen in the various programs are:
  1) strict bytestrings = read increasingly large chunks (they double every
 time).  I don't know if the already-read data have to be copied around or
 if the memory allocator can handle extending existing blocks (if they are
 at the front of the heap, say) like realloc() can.

if you're reading from stdin, and we don't know the size, bytestring
getContents uses realloc.

 Even if the blocks do get copied around, that's not where the performance
 limiter is (speed-wise).  It does seem to be a bit unwise in terms of
 memory pressure.
 
 Does the gc kick in every time the heap has to expand?  Does it do a full
 collection then?  If no other allocations have happened since the last
 collection than the allocation that caused the heap to expand, then 
 perhaps
 skip the collection?  (or some criteria similar to that)
 
  2) lazy bytestrings = read blocks of 32K-8 bytes, as expected.  The C
 benchmarks show that there's no penalty for reading 32K-8 vs. 32K.
 
  3) native strings = read blocks 8K.
 
 The C benchmarks show that it barely matters if the block size is 4K, 32K, or
 32K-8 bytes.  In any case, the small differences due to block size are
 completely in the noise.  Reading very large blocks, though, as the strict
 bytestrings do, actually has a cost (38-70% slower, depending on the CPU and
 kernel and RAM and busspeeds, etc -- see email 10 in the series).  It is still
 peanets compared for almost all the benchmarks.

Useful information, thanks.

 Backend
 ---
 The backend turns out to be rather bad.  It is generally very easy to improve
 the generated code by hand without doing anything in terms of sophisticated
 analysis.  One can rewrite inner loops using heroic code (load four characters
 at a time together into a 32-bit register or even use MMX to handle eight
 characters in parallel) but it isn't really necessary to do that to gain a
 good improvement and start being competitive with C.
 
 The backend is probably what is costly on some of the lazy bytestring and
 native string code because there are too many tests to see if the buffer has
 been exhausted (I think).  Some simple hoisting and common subexpression
 elimination would go far here.

There may be things we can do at the library level too. We changed the
lazy bytestring representation in the last release cycle to remove a
number of tests. If you've a specific example would help though.

 Thoroughness
 
 It really is necessary to benchmark more than one ghc/library combination,
 otherwise the bad memory bugs wouldn't stand out so clearly or I would have
 believed, as Don Stewart did, that he had fixed the memory allocation bug.

I fixed a particular issue with the library. But please make a bug
report if there are other issues wrt. ghc head in particularly.

 Action Items
 
 Suggested short-term actions:
  o Can the allocator expand blocks?  If not, see if it can be implemented.
  o Find out why lazy bytestrings by hand are that slow - is it due to
lack of hoisting in the backend?

Example please!

  o Improve backend, mainly by not hiding information from register allocator.
Alternatively, by using an assembly-to-assembly transformator that
improves the register allocation.

Some nice examples that could help motivate the native gen hackers would
be useful.

  o Fix -sstderr.
  o perhaps even let the RTS print out /proc/self/maps or /proc/self/status
or peak RSS returned by getrusage() in the ru_maxrss field?
  o Find memory leaks in ghc 6.9 code.

Bug report time.

  o ghc should allow round-tripping of the C-- code it generates.
  o would be awfully nice if the backend didn't sometimes throw the 
 recognizable
labels away.
  o can the core/stg code be made easier to read?

Can you ensure these points are summarised and directed to the ghc bug
tracker,

http://hackage.haskell.org/trac/ghc/newticket?type=bug

to this work isn't lost on [EMAIL PROTECTED]

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Joost Behrends
Neil Mitchell ndmitchell at gmail.com writes:
 
 If it can be reproduced on anyones machine, it is a bug. If you can
 bundle up two programs which don't read from stdin (i.e. no getLine
 calls) or the standard arguments (i.e. getArgs) which differ only by
 the Data.Char import, and have massive speed difiference, then report
 a bug.
 
 You should probably also give your GHC versions and platforms etc.

Thanks for your attention too !

Now i tried a version without input (just computing the primefactors of the
constant 2^61+1, what it did correctly in spite of its bugginess). And it
didn't have the Data.Char bug (and Data.List bug) too.

As my original code hasn't on Linux also it seems.

Thus it  happens only in an exotic configuration. Windows will stay exotic
in the Haskell community. Before should noone
has reproduced it at least on Windows (XPpro SP2 is my version), i will do 
nothing more.

The hardware is Intel Celeron 2.2GHZ, 512 MB Ram. ghc 6.8.1 lives on
D:\\Programme (not on the system drive C:, which poses problems to Cabal, 
told aside). I just made the standard installation (do not remember, whether
by unzipping alone or there was an MSI) not touching anything, of course. 
I was happy about no autoconf (which i see in the category of desasters like
EMM386 for MS-DOS).

But something is strange: ghci doesn't accept
qualified imports. It produces a parse error. That seems a bug to me, because
my ghci accepts import (any module) nevertheless. But i see it as a minor
bug - the compiler is much more important.

Happy days, Joost

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

2007-12-22 Thread Joost Behrends
Hi again, Daniel

cannot sleep tonight - perhaps from the feeling to loose too much time with
these things.

Yes - it's the wheel. And a dlist made from [3,5,7,11,13,17] was optimal in some
of my experiments too.

You will probably know it - but perhaps there are third-party readers:
A last try to improve the final version was to replace Integer by Int64
(importing Prelude qualified). There was no difference ! That is good news
and bad - the good news (and that's much more important), that Integer is
absoulutely cleanly implemented. 

However, if this is so, i have little sympathy for
being forced to use fromInteger and toInteger - for this special case i would
prefer automatic coercion. And - sorry - cannot do other than seeing anything
else as ill-advised dogmatism. But perhaps i am spoiled from Python :).

I did this, because for secure decision about divisibility the program 
isn't useful beyond that.

Thanks for that entry point: I took google Rabin-Miller in a comment of this 
last version.

Happy days, Joost  

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe