Re: [Haskell-cafe] Re: Write Haskell as fast as C.

2008-05-18 Thread Ketil Malde
Ketil Malde [EMAIL PROTECTED] writes:

 data EvidenceCode = IAC | IUG | IFR | NAC | NR | ... deriving Show

 Could it be that this derived read instance is somehow very inefficient?

To answer my own question: this is exactly it, ghc derives less than
optimal code in this case.  Rather than reiterate the case here, I did
a quick writeup at http://blog.malde.org/, and would be quite happy
about any feedback or suggestions.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C.

2008-05-18 Thread Andrew Coppin

Ketil Malde wrote:

Ketil Malde [EMAIL PROTECTED] writes:

  

data EvidenceCode = IAC | IUG | IFR | NAC | NR | ... deriving Show
  


  

Could it be that this derived read instance is somehow very inefficient?



To answer my own question: this is exactly it, ghc derives less than
optimal code in this case.  Rather than reiterate the case here, I did
a quick writeup at http://blog.malde.org/, and would be quite happy
about any feedback or suggestions.
  


I think you'll find the code that GHC derives for a Read instance 
handles extra whitespace and all kinds of other whatnot that you don't 
actually need in this specific case. I suspect this is what's up - but I 
don't have any hard proof for that. ;-)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C.

2008-05-18 Thread Andrew Coppin

Andrew Coppin wrote:

Ketil Malde wrote:

Ketil Malde [EMAIL PROTECTED] writes:

 

data EvidenceCode = IAC | IUG | IFR | NAC | NR | ... deriving Show
  


 
Could it be that this derived read instance is somehow very 
inefficient?



To answer my own question: this is exactly it, ghc derives less than
optimal code in this case.  Rather than reiterate the case here, I did
a quick writeup at http://blog.malde.org/, and would be quite happy
about any feedback or suggestions.
  


I think you'll find the code that GHC derives for a Read instance 
handles extra whitespace and all kinds of other whatnot that you don't 
actually need in this specific case. I suspect this is what's up - but 
I don't have any hard proof for that. ;-)


I wrote three programs:

One does

data Tag =
 Orange |
 Lemon  |
 Lime   |
 Apple  |
 Banana |
 Pear   |
 Peach
 deriving Read

The other two use

get_tag :: String - Tag
get_tag Orange = Orange
get_tag Lemon  = Lemon
get_tag Lime   = Lime
get_tag Apple  = Apple
get_tag Banana = Banana
get_tag Pear   = Pear
get_tag Peach  = Peach
get_tag _= error not a tag

and

get_tag :: String - Tag
get_tag xs = case xs of
 [] - bad
 (x:xs1) - case x of
   'A' - case xs1 of
 pple - Apple
 _  - bad
   'B' - case xs1 of
 anana - Banana
 _   - bad
   'L' - case xs1 of
 emon - Lemon
 ime  - Lime
 _  - bad
   'O' - case xs1 of
 range - Orange
 _   - bad
   'P' - case xs1 of
 ('e':'r':xs2) - case xs2 of
   r  - Pear
   ch - Peach
   _- bad
 _ - bad
   _   - bad

bad = error not a tag

I wrote a driver program that reads a file of 1,000,000 tag values. 
Using the first version (GHC-derived Read instance) it took about 32 
seconds to process. Using the second version (just a bunch of strings to 
match, no cleaverness at all) took approximately 1 second. The 3rd 
version was so fast I didn't have time to see the window open before it 
closed again.


Note that all of this was using plain ordinary Strings, not ByteString 
or anything fancy like that.


Note also that the actual documentation for the Prelude even says that 
Read is very slow. [Although it says it's slow for reading large 
structures, not large numbers of trivial structures.]


None of this is really all that surprising; in the general case, a Read 
instance might have to skip over spaces or parse deeply nested brackets 
or any number of other things. If you know you don't need to handle all 
those cases, write your own parser. It shouldn't be hard to come up with 
something faster. [Altough obviously it's kinda tedious.]


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C.

2008-05-18 Thread Bertram Felgenhauer
Andrew Coppin wrote:
[...]
 I think you'll find the code that GHC derives for a Read instance handles 
 extra whitespace and all kinds of other whatnot that you don't actually 
 need in this specific case. I suspect this is what's up - but I don't have 
 any hard proof for that. ;-)

Parentheses are handled as well.

It's worse than that - derived read instances are defined in terms of
'lex' (via lexP in GHC.Read) which, among other things, recognizes
numerical and string constants. The latter has the odd effect that
with the following declaration,

 data A = A deriving (Read, Show)

the expression   read ('' : repeat ' ') :: A   is the wrong kind of
bottom - instead of a parse error, you get an infinite loop.

Bertram
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C.

2008-05-17 Thread Ketil Malde
Don Stewart [EMAIL PROTECTED] writes:

 mkAnn :: ByteString - Annotation
 mkAnn = pick . B.words
 where pick (_db:up:rest) = pick' up $ getGo rest
   pick' up' (go:_:ev:_) = Ann (B.copy up') (read $ B.unpack go) 
 (read $ B.unpack ev)
   getGo = dropWhile (not . B.isPrefixOf (pack GO:))

 read $ B.unpack go

 Looks suspicious. You're unpacking to lists.

 ByteString performance rule 1: don't unpack to lists.

I tend to use this idiom a bit when I want to loop over the
characters.  The strings being unpacked is an Int and a short (two or
three letter) identifier.  Doing a 'go' loop would probably be faster,
but a lot more work, and I was hoping the String would be deforested
or fused or otherwise optimized to the bone.

I wonder if the culprit is the last 'read', it reads one from a set of
keywords/identifiers, and since they're upper case, I just made a data
type with a matching set of nullary constructors, and derived Read
for it.

I.e:

 data EvidenceCode = IAC | IUG | IFR | NAC | NR | ... deriving Show

Could it be that this derived read instance is somehow very inefficient?

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Write Haskell as fast as C.

2008-05-17 Thread Martin Geisler
Andrew Coppin [EMAIL PROTECTED] writes:

 Look closer: it's hardER to read.

  mean xs = sum xs / fromIntegral (length xs)

  mean = go 0 0 n
where
  go s l x
| x  m = s / fromIntegra l
| otherwise = go (s+x) (l+1) (x+1

 One version makes it instantly clear, at a glance, what is happening.
 The other requires you to mentally walk round a look, imperative
 style, to figure out what's happening. It's not a *big* deal, but it's
 unfortunate.

I am new to Haskell and when I first saw the two versions side by side I
too was disappointed with the second version.

But after reading the great blog post by Don, I realized that the whole
problem comes from the fact that lists in Haskell are not like arrays or
vectors in other languages: you don't know how long they are before you
have found the end.

In other words: they behave like a linked lists -- lists that are
generated lazily on demand. Because they are generated on demand you
*cannot* really know the length beforehand, and thus you *must* traverse
the list to its end to count the length.

When the list is too big to fit in memory then it's clear that the code
*must* let go of the beginning to allow the garbage collector to do its
job. You wouldn't be able to work with a 7.5 GiB linked list otherwise.

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multi-Party Computation) to Python. See: http://viff.dk/.


pgpgElb902o1M.pgp
Description: PGP signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread Andrew Coppin

Don Stewart wrote:

I've written an extended post on how to understand and reliably optimise
code like this, looking at it all the way down to the assembly.

The result are some simple rules to follow for generated code as good
as gcc -O2.

Enjoy,

http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast
  


A well-written piece, as always.

My feelings are ambivilent. On the one hand, it's reassuring that such 
good performance can be obtained without resorting to calling C, 
explicit unboxed types, GHC-specific hacks, strictness annotations, 
manual seq calls, strange case expressions, or really anything remotely 
odd. It's fairly plain Haskell '98 that most beginners would be able to 
read through and eventually understand. And yet it's fast.


On the other hand, this is the anti-theisis of Haskell. We start with a 
high-level, declarative program, which performs horribly, and end up 
with a manually hand-optimised blob that's much harder to read but goes 
way faster. Obviously most people would prefer to write declarative code 
and feel secure that the compiler is going to produce something efficient.


If the muse takes me, maybe I'll see if I can't find a less ugly way to 
do this...


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread Don Stewart
andrewcoppin:
 Don Stewart wrote:
 I've written an extended post on how to understand and reliably optimise
 code like this, looking at it all the way down to the assembly.
 
 The result are some simple rules to follow for generated code as good
 as gcc -O2.
 
 Enjoy,
 
 http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast
   
 
 A well-written piece, as always.
 
 My feelings are ambivilent. On the one hand, it's reassuring that such 
 good performance can be obtained without resorting to calling C, 
 explicit unboxed types, GHC-specific hacks, strictness annotations, 
 manual seq calls, strange case expressions, or really anything remotely 
 odd. It's fairly plain Haskell '98 that most beginners would be able to 
 read through and eventually understand. And yet it's fast.
 
 On the other hand, this is the anti-theisis of Haskell. We start with a 
 high-level, declarative program, which performs horribly, and end up 
 with a manually hand-optimised blob that's much harder to read but goes 
 way faster. Obviously most people would prefer to write declarative code 
 and feel secure that the compiler is going to produce something efficient.
 
 If the muse takes me, maybe I'll see if I can't find a less ugly way to 
 do this...
 

I don't understand what's ugly about:

go s l x | x  m  = s / fromIntegral l
 | otherwise  = go (s+x) (l+1) (x+1)

And the point is that it is *reliable*. If you make your money day in, day out
writing Haskell, and you don't want to rely on radical transformations for
correctness, this is a sensible idiom to follow.

Nothing beats understanding what you're writing at all levels of abstraction.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread Bulat Ziganshin
Hello Andrew,

Friday, May 16, 2008, 10:56:36 PM, you wrote:

 On the other hand, this is the anti-theisis of Haskell. We start with a
 high-level, declarative program, which performs horribly, and end up 
 with a manually hand-optimised blob that's much harder to read but goes
 way faster. Obviously most people would prefer to write declarative code
 and feel secure that the compiler is going to produce something efficient.

if i understood correctly, fusion system about which Don plan to told
next time, is just about translating high-level code into lower-level
one behind the scenes. but it works only on limited subset of
programs. it's what we have now - haskell is very inefficient language


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread Bryan O'Sullivan
Andrew Coppin wrote:

 On the other hand, this is the anti-theisis of Haskell. We start with a
 high-level, declarative program, which performs horribly, and end up
 with a manually hand-optimised blob that's much harder to read but goes
 way faster.

Buh?  This is hard to read?

mean n m = go 0 0 n
  where go s l x | x  m  = (s::Double) / fromIntegral (l::Int)
 | otherwise  = go (s+x) (l+1) (x+1)

One can in fact imagine a world in which the compiler does this
transformation for you, though it takes a bit of squinting.

http://reddit.com/r/programming/info/6jjhg/comments/c040ybt

b
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread Andrew Coppin

Bryan O'Sullivan wrote:

Andrew Coppin wrote:

  

On the other hand, this is the anti-theisis of Haskell. We start with a
high-level, declarative program, which performs horribly, and end up
with a manually hand-optimised blob that's much harder to read but goes
way faster.



Buh?  This is hard to read?
  


Look closer: it's hardER to read.

 mean xs = sum xs / fromIntegral (length xs)

 mean = go 0 0 n
   where
 go s l x
   | x  m = s / fromIntegra l
   | otherwise = go (s+x) (l+1) (x+1

One version makes it instantly clear, at a glance, what is happening. 
The other requires you to mentally walk round a look, imperative style, 
to figure out what's happening. It's not a *big* deal, but it's unfortunate.


I'm more worried about what happens in less trivial examples. [Let's 
face it, who wants to compute the sum of the numbers from 1 to N?]


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread apfelmus

Andrew Coppin wrote:

Bryan O'Sullivan wrote:

Andrew Coppin wrote:
 

On the other hand, this is the anti-theisis of Haskell. We start with a
high-level, declarative program, which performs horribly, and end up
with a manually hand-optimised blob that's much harder to read but goes
way faster.



Buh?  This is hard to read?
  


Look closer: it's hardER to read.

 mean xs = sum xs / fromIntegral (length xs)

 mean = go 0 0 n
   where
 go s l x
   | x  m = s / fromIntegral l
   | otherwise = go (s+x) (l+1) (x+1

One version makes it instantly clear, at a glance, what is happening. 
The other requires you to mentally walk round a look, imperative style, 
to figure out what's happening. It's not a *big* deal, but it's 
unfortunate.


I'm more worried about what happens in less trivial examples. [Let's 
face it, who wants to compute the sum of the numbers from 1 to N?]


Hm, it seems like you're expecting magic, aren't you?

Of course the first equation is easier to read, but it's no surprise 
that this may actually be slower. Like the imperative bubblesort is 
easier to read than the imperative quicksort but far slower.


Put differently, making Haskell as fast as C is easy: just write a 
slower C program! Namely one that is as easy to read as the Haskell 
version. If you implement


  mean xs = sum xs / fromIntegral (length xs)

directly in C, I bet you'll be delighted to discover that they perform 
similarly (using linked lists in the C version).



Regards,
apfelmus

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread Andrew Coppin

Don Stewart wrote:

I don't understand what's ugly about:

go s l x | x  m  = s / fromIntegral l
 | otherwise  = go (s+x) (l+1) (x+1)

And the point is that it is *reliable*. If you make your money day in, day out
writing Haskell, and you don't want to rely on radical transformations for
correctness, this is a sensible idiom to follow.

Nothing beats understanding what you're writing at all levels of abstraction.
  


What sets Haskell apart from every other programming language ever used 
in mainstream programming? You might say conciseness, or the ability to 
use lazy evaluation to structure your code in usual ways, or something 
like that. I would say what sets Haskell apart is abstraction. There 
are other things, but this is the big one. Haskell allows you to 
abstract almost everything. The result is often highly succinct yet very 
readable programs. It would seem a terribly shame if you always have to 
throw away Haskell's key advantage to get decent runtime performance.


If you're trying to get a real program to work, right now, then yes, you 
may have no choice. But that doesn't mean we shouldn't strive for ways 
to keep code high-level yet performant.


[I'm curios about your other comment. Does anybody, anywhere in the 
world, actually make *money* using Haskell? This seems rather unlikely 
to me...]


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread Andrew Coppin

apfelmus wrote:

Andrew Coppin wrote:

Look closer: it's hardER to read.

 mean xs = sum xs / fromIntegral (length xs)

 mean = go 0 0 n
   where
 go s l x
   | x  m = s / fromIntegral l
   | otherwise = go (s+x) (l+1) (x+1

One version makes it instantly clear, at a glance, what is happening. 
The other requires you to mentally walk round a look, imperative 
style, to figure out what's happening. It's not a *big* deal, but 
it's unfortunate.


I'm more worried about what happens in less trivial examples. [Let's 
face it, who wants to compute the sum of the numbers from 1 to N?]


Hm, it seems like you're expecting magic, aren't you?



Well, obviously it would be nice, wouldn't it? ;-)

Of course the first equation is easier to read, but it's no surprise 
that this may actually be slower. Like the imperative bubblesort is 
easier to read than the imperative quicksort but far slower.


I'm just saying, I prefer it when somebody posts some tiny snippet of 
Haskell that does the same thing as a 40-line C program, and then show 
how using some novel technique they just invented, the Haskell version 
actually outperforms C even though it's more reasable and more 
maintainable. Hey, who *wouldn't* like to have their cake and eat it 
too? :-)


But yeah, I get the point. Everybody wants me to be quiet and go away. 
So I'll go be quiet now...


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread Achim Schneider
Andrew Coppin [EMAIL PROTECTED] wrote:

 But yeah, I get the point. Everybody wants me to be quiet and go
 away. So I'll go be quiet now...

Yes and no. Everybody wants you to be quiet and go to your study,
writing a compiler that's Smart Enough(tm). We will let you out as soon
as you're finished and supply you with pizza and crackers from time to
time.

-- 
(c) this sig last receiving data processing entity. Inspect headers for
past copyright information. All rights reserved. Unauthorised copying,
hiring, renting, public performance and/or broadcasting of this
signature prohibited. 

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread Andrew Coppin

Achim Schneider wrote:

Andrew Coppin [EMAIL PROTECTED] wrote:

  

But yeah, I get the point. Everybody wants me to be quiet and go
away. So I'll go be quiet now...



Yes and no. Everybody wants you to be quiet and go to your study,
writing a compiler that's Smart Enough(tm). We will let you out as soon
as you're finished and supply you with pizza and crackers from time to
time.
  


I... I think you just described my ideal place of employment! 0_0

It sure as hell beats the living daylights out of the nonesense I just 
spent 9-5 today dealing with. :-S


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread Philippa Cowderoy
On Fri, 16 May 2008, Don Stewart wrote:

 I don't understand what's ugly about:
 
 go s l x | x  m  = s / fromIntegral l
  | otherwise  = go (s+x) (l+1) (x+1)
 

I suspect you've been looking at low-level code too long. How about the 
total lack of domain concepts?

Try:

mean n m = let (sum, length, _) = go (0,0,n)
 in sum / fromIntegral length
where
go :: (Double, Int, Double) - (Double, Int, Double)
go t@(s,l,x) | x  m  = t
 | otherwise  = go (s+x) (l+1) (x+1)

as a starting point. I might consider generalising to a while HOF while 
I'm at it, because ultimately this is a while loop. Admittedly that would 
be relying on the implementation doing a little inlining, which you're not 
reliant on.

Given the audience it'd be good to see some of the working to pull it off 
via fusion in a comment too:

-- [1 .. d ] = unfoldr (let f n | n  d = Nothing 
--  f n = Just (n,n+1) in f) 1
-- sum = foldr ...
-- length = foldr ...
-- sumAndLength = foldr ... (as calculated from the above)
-- mean [1 .. d] = s / l where
--   (sum, length) = sumAndLength [1 .. d]
-- = sumAndLength . unfoldr ... 
-- = foldr ... . unfoldr ...
-- = ...

Some things it might be nice to have and/or know about:

* We really ought to be able to build the sumAndLength fold by building 
the appropriate tuple and tuple function from its components - with there 
being a standard idiom for naming them, and a library of such things to 
hand. Same thing for go, too - this means we retain the domain concepts in 
its implementation by default. The interesting thing about go is that we 
ourselves running the guts of an unfold at the same time, which means it 
supplies our termination criteria - I suspect I ought to post a 'most 
general' form of go on that basis soon?
* Does nesting unboxed tuples work appropriately?

I was about to suggest a standard way of doing the wiring for the tupling 
as well, but so long as nesting unboxed tuples works and the optimiser 
'gets it' then there's an arrow instance that ought to do nicely!

While not quite as low-level, the resulting code should hopefully be easy 
bait for GHC's optimiser (if not, someone please yell!), while both 
retaining much of the domain info of the original code and conveying much 
about how it's made to go fast.

 Nothing beats understanding what you're writing at all levels of abstraction.
 

How about ensuring that a casual reader can do the same quickly?

-- 
[EMAIL PROTECTED]

Performance anxiety leads to premature optimisation
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C. [Was: Re: GHC predictability]

2008-05-16 Thread Philippa Cowderoy
On Fri, 16 May 2008, Andrew Coppin wrote:

 Obviously most people would prefer to write declarative code and feel secure
 that the compiler is going to produce something efficient.
 

Ultimately the only way to do this is to stick to Einstein's advice - make 
things as simple as possible but no simpler. This means that if you care 
about speed then somewhere, the structure that enables a fast 
implementation needs to be declared so that the compiler can work with it. 
For example, you might not want to hand-fuse (I know I get bored of it 
pretty quickly) but the possibility of fusion will have to be clear.

If you don't want to have to do it yourself (or don't know how!) and you 
want to be confident that something's going to run fast, that means a 
library covering a range of known cases that'll all go quickly. Don has 
been a major contributor here! But it's hard work, and if you don't 
understand how fast code is structured then ultimately you won't be able 
to predict - there'll never be a guarantee that lets you be completely 
ignorant.

-- 
[EMAIL PROTECTED]

'In Ankh-Morpork even the shit have a street to itself...
 Truly this is a land of opportunity.' - Detritus, Men at Arms
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C.

2008-05-16 Thread Ketil Malde
Andrew Coppin [EMAIL PROTECTED] writes:

 I'm more worried about what happens in less trivial examples. [Let's
 face it, who wants to compute the sum of the numbers from 1 to N?]

Inspired by Don's blog post, and coincidentally working on a program
where profiling points to one particular, short function as
responsible for 60% of the work, I thought this would be a good time
to look into core and reveal the deep secrets of my code.  This is the
function: 

 mkAnn :: ByteString - Annotation
 mkAnn = pick . B.words
 where pick (_db:up:rest) = pick' up $ getGo rest
   pick' up' (go:_:ev:_) = Ann (B.copy up') (read $ B.unpack go) (read 
 $ B.unpack ev)
   getGo = dropWhile (not . B.isPrefixOf (pack GO:))

A bit clunky, but simple enough: given a line of input, break into
words, pick word number two, the word starting with GO: and the
second-to-next word.  Here are the data types involved:

 data Annotation = Ann !UniProtAcc !GoTerm !EvidenceCode deriving (Show)
 newtype GoTerm = GO Int deriving (Eq,Ord)
 type UniProtAcc = ByteString
 data EvidenceCode = ... -- many nullary constructors

Unfortunately, this results in no less than four pages of core, with
tons of less intelligible identfiers and nested cases and
whatnot... any idea why this would be so slow?

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C.

2008-05-16 Thread Don Stewart
ketil:
 Andrew Coppin [EMAIL PROTECTED] writes:
 
  I'm more worried about what happens in less trivial examples. [Let's
  face it, who wants to compute the sum of the numbers from 1 to N?]
 
 Inspired by Don's blog post, and coincidentally working on a program
 where profiling points to one particular, short function as
 responsible for 60% of the work, I thought this would be a good time
 to look into core and reveal the deep secrets of my code.  This is the
 function: 
 
  mkAnn :: ByteString - Annotation
  mkAnn = pick . B.words
  where pick (_db:up:rest) = pick' up $ getGo rest
pick' up' (go:_:ev:_) = Ann (B.copy up') (read $ B.unpack go) 
  (read $ B.unpack ev)
getGo = dropWhile (not . B.isPrefixOf (pack GO:))
 

read $ B.unpack go

Looks suspicious. You're unpacking to lists.

ByteString performance rule 1: don't unpack to lists.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C.

2008-05-16 Thread Dan Weston

Ketil Malde wrote:

mkAnn :: ByteString - Annotation
mkAnn = pick . B.words
where pick (_db:up:rest) = pick' up $ getGo rest
  pick' up' (go:_:ev:_) = Ann (B.copy up') (read $ B.unpack go) (read $ 
B.unpack ev)
  getGo = dropWhile (not . B.isPrefixOf (pack GO:))


It seems at first face miraculously coincidental that the dropWhile in 
the getGo definition knows to stop dropping when there are exactly 4 
elements, in order to match the pattern in the second parameter of the 
pick' definition, whose argument is provided by (getGo Rest).


What magic makes this true? Just curious...

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C.

2008-05-16 Thread Dan Weston

Dan Weston wrote:

Ketil Malde wrote:

mkAnn :: ByteString - Annotation
mkAnn = pick . B.words
where pick (_db:up:rest) = pick' up $ getGo rest
  pick' up' (go:_:ev:_) = Ann (B.copy up') (read $ B.unpack 
go) (read $ B.unpack ev)

  getGo = dropWhile (not . B.isPrefixOf (pack GO:))


It seems at first face miraculously coincidental that the dropWhile in 
the getGo definition knows to stop dropping when there are exactly 4 
elements, in order to match the pattern in the second parameter of the 
pick' definition, whose argument is provided by (getGo Rest).


What magic makes this true? Just curious...


I didn't mean exactly 4, but at least 3. Otherwise, I'm still 
curious! :)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Write Haskell as fast as C.

2008-05-16 Thread Ketil Malde
Dan Weston [EMAIL PROTECTED] writes:

 mkAnn :: ByteString - Annotation
 mkAnn = pick . B.words
 where pick (_db:up:rest) = pick' up $ getGo rest
   pick' up' (go:_:ev:_) = Ann (B.copy up') (read $ B.unpack go) 
 (read $ B.unpack ev)
   getGo = dropWhile (not . B.isPrefixOf (pack GO:))

 It seems at first face miraculously coincidental that the dropWhile in
 the getGo definition knows to stop dropping when there are exactly 4
 elements, in order to match the pattern in the second parameter of the
 pick' definition, whose argument is provided by (getGo Rest).

 What magic makes this true? Just curious...

You want the long story? :-)

This is for parsing the GOA file format, which contains links between
proteins from the UniProt database to Gene Onthology (GO) terms.  The
format is not quite as regular as one would wish, but the second word
is always the protein id, and whenever the GO term turns up, it is
followed by something I forget (an InterPro reference perhaps) and
then the evidence code - which I want.

You feel happier now, I can tell.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe