need help optimizing a function

2002-10-08 Thread David Roundy

Hello.  If you followed my previous post (which was about a bug in my lcs
function), this is still the same code (only now bug-free, it seems).

It turns out that my lcs diff function is MUCH slower than GNU diff on some
test files (much worse than 1000 time slower!).The test case where its
speed was close was IO bound on the output to an xterm (silly me).

I've profiled my lcs code, and it is spending about 95% of its time in the
hunt_one_char function.  It also claims to spend 0.0% of its time in
my_bs, which does a binary search on the array, and is the only other
function that gets called by hunt_one_char (not counting itself).

I've told the compiler (ghc) to inline hunt_one_char and my_bs to no avail.
I guess maybe it can't or won't inline a recursive function, or maybe that
just doesn't gain me anything.

My only guess is that for some reason it is making an entirely new copy of
the array each time it recurses, but that doesn't make any sense, since the
array is getting threaded through, so it should be able to just modify the
array in place.  But then, I'm not entirely clear as to what the compiler
can and cannot do as it optimizes.

data Threshold a = Thresh Int! [a]
 deriving (Show)

hunt_one_char :: String - [Int] - Array Int (Threshold String) -
 Array Int (Threshold String)
hunt_one_char c [] th = th
hunt_one_char c (j:js) th =
case my_bs (Thresh j [c]) th of
Nothing - hunt_one_char c js th
Just k -
case th!(k-1) of
Thresh _ rest -
hunt_one_char c js th//[(k,Thresh j (c:rest))]

For what it's worth, the calling function is:

hunt_internal :: [String] - [[Int]] - 
 Array Int (Threshold String) -
 Array Int (Threshold String)
hunt_internal [] _ th = th
hunt_internal _ [] th = th
hunt_internal (c:cs) (m:ms) th =
hunt_internal cs ms $ hunt_one_char c m th

Each string in this list of strings is a line from one of the files (and
the [[Int]] is a list of line numbers in the other file that that line
matches).  The array 'th' has size (1,n) where n is the number of lines in
the file.

My simplest test case consists of a 20k line file, each line of which is
unique and a second file which is simply a permutation of the first (moving
the first line to the last), so hunt_one_char should be called exactly once
for each line, and it is always called with its second argument having
exactly one element.  In this test case, according to the ghc profiler,
89.7% of the time is spent in hunt_one_char:
  individual inherited
COST CENTRE  MODULE entries  %time %alloc   %time %alloc
  hunt_internal  Main 200010.2   0.0 91.4  95.3
   hunt_one_char Main 4   89.7  95.0 91.1  95.3
my_bsMain 20.2   0.0  1.4   0.3
 my_helper_bsMain3072311.2   0.3  1.2   0.3

This has me at a loss.  It seems like such a simple function... of course,
it is called 20k times, which could certainly add up, but while each call
of my_bs should take O(log 20k) time, each call of hunt_one_char (apart
from its one call to my_bs) should only take O(1), so most of the time
should be spent in my_bs unless something is terribly wrong (which must be
the case here).

If it would help, I'd be happy to send a listing of the complete code, but
it's 6.5k so I figured it'd be better not to send it, since it seems
unlikely that anyone would want to run it anyways.
-- 
David Roundy
http://civet.berkeley.edu/droundy/
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: Graphics output

2002-10-08 Thread Ferenc Wagner

Gerhard Navratil [EMAIL PROTECTED] writes:

 I would like to write line graphics into a file
 (e.g. contour lines calculated by a Haskell function) and
 access the data from standard programs.  For the output I
 need a Library.

Why don't you output some ASCII numbers, and use another
program (gnuplot, plotutils, whatever) to produce graphics?

Feri.
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: need help optimizing a function

2002-10-08 Thread Hal Daume III

I already saw one reply mentioning using state threaded arrays.  This is
probably your best (in terms of performance) option.

Keep in mind that Haskell arrays are *not* your standard imperative
arrays.  Like imperative arrays, they give you O(1) lookup, but only
O(n) insert.  If you want to keep with a functional style, I'd suggest
using a different data structure for the data.  A FiniteMap should work
fine and should give you pretty good (at least much
better) performance.  And you won't have to give up the FP style (for
whatever that's worth).

 - Hal

p.s., I sense your next question is going to be something like why can't
the compiler detect that the array can be updated in place instead of
copied and the answer, from what i can tell, is simply that it doesn't
try.

--
Hal Daume III

 Computer science is no more about computers| [EMAIL PROTECTED]
  than astronomy is about telescopes. -Dijkstra | www.isi.edu/~hdaume

On Tue, 8 Oct 2002, David Roundy wrote:

 Hello.  If you followed my previous post (which was about a bug in my lcs
 function), this is still the same code (only now bug-free, it seems).
 
 It turns out that my lcs diff function is MUCH slower than GNU diff on some
 test files (much worse than 1000 time slower!).The test case where its
 speed was close was IO bound on the output to an xterm (silly me).
 
 I've profiled my lcs code, and it is spending about 95% of its time in the
 hunt_one_char function.  It also claims to spend 0.0% of its time in
 my_bs, which does a binary search on the array, and is the only other
 function that gets called by hunt_one_char (not counting itself).
 
 I've told the compiler (ghc) to inline hunt_one_char and my_bs to no avail.
 I guess maybe it can't or won't inline a recursive function, or maybe that
 just doesn't gain me anything.
 
 My only guess is that for some reason it is making an entirely new copy of
 the array each time it recurses, but that doesn't make any sense, since the
 array is getting threaded through, so it should be able to just modify the
 array in place.  But then, I'm not entirely clear as to what the compiler
 can and cannot do as it optimizes.
 
 data Threshold a = Thresh Int! [a]
  deriving (Show)
 
 hunt_one_char :: String - [Int] - Array Int (Threshold String) -
  Array Int (Threshold String)
 hunt_one_char c [] th = th
 hunt_one_char c (j:js) th =
 case my_bs (Thresh j [c]) th of
 Nothing - hunt_one_char c js th
 Just k -
 case th!(k-1) of
 Thresh _ rest -
 hunt_one_char c js th//[(k,Thresh j (c:rest))]
 
 For what it's worth, the calling function is:
 
 hunt_internal :: [String] - [[Int]] - 
  Array Int (Threshold String) -
  Array Int (Threshold String)
 hunt_internal [] _ th = th
 hunt_internal _ [] th = th
 hunt_internal (c:cs) (m:ms) th =
 hunt_internal cs ms $ hunt_one_char c m th
 
 Each string in this list of strings is a line from one of the files (and
 the [[Int]] is a list of line numbers in the other file that that line
 matches).  The array 'th' has size (1,n) where n is the number of lines in
 the file.
 
 My simplest test case consists of a 20k line file, each line of which is
 unique and a second file which is simply a permutation of the first (moving
 the first line to the last), so hunt_one_char should be called exactly once
 for each line, and it is always called with its second argument having
 exactly one element.  In this test case, according to the ghc profiler,
 89.7% of the time is spent in hunt_one_char:
   individual inherited
 COST CENTRE  MODULE entries  %time %alloc   %time %alloc
   hunt_internal  Main 200010.2   0.0 91.4  95.3
hunt_one_char Main 4   89.7  95.0 91.1  95.3
 my_bsMain 20.2   0.0  1.4   0.3
  my_helper_bsMain3072311.2   0.3  1.2   0.3
 
 This has me at a loss.  It seems like such a simple function... of course,
 it is called 20k times, which could certainly add up, but while each call
 of my_bs should take O(log 20k) time, each call of hunt_one_char (apart
 from its one call to my_bs) should only take O(1), so most of the time
 should be spent in my_bs unless something is terribly wrong (which must be
 the case here).
 
 If it would help, I'd be happy to send a listing of the complete code, but
 it's 6.5k so I figured it'd be better not to send it, since it seems
 unlikely that anyone would want to run it anyways.
 -- 
 David Roundy
 http://civet.berkeley.edu/droundy/
 ___
 Haskell-Cafe mailing list
 [EMAIL PROTECTED]
 http://www.haskell.org/mailman/listinfo/haskell-cafe
 

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: need help optimizing a function

2002-10-08 Thread Alastair Reid


Hal Daume [EMAIL PROTECTED] writes:

 p.s., I sense your next question is going to be something like why
 can't the compiler detect that the array can be updated in place
 instead of copied and the answer, from what i can tell, is simply
 that it doesn't try.

And one might argue that it should not try since the result would be
incredibly fragile with small, semantics preserving changes to the
program confusing the analysis and causing changes in the big-O
complexity of your program and, since the analysis would undoubtedly
be pretty darn cunning, it'd also be virtually impossible to
understand why it did or did not manage to detect the
single-threadedness of your code.

I think one of the reasons that approaches based on types (monads,
MADTs, linear types, etc.) are so popular is that they make the
programmers assertion (X is used in a single-threaded manner)
sufficiently explicit that the compiler can complain when it disagrees
and, since parts of the analysis show through in the typesystem, they
make the analysis less of a black box.

--
Alastair Reid [EMAIL PROTECTED]  
Reid Consulting (UK) Limited  http://www.reid-consulting-uk.ltd.uk/alastair/


ps The opening paragraphs of Paul Hudak's MADT paper
(http://www.haskell.org/papers/madt.ps) have a more coherent
explanation of why smarter compilers are not the answer.

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: need help optimizing a function

2002-10-08 Thread Zdenek Dvorak

Hello,

  p.s., I sense your next question is going to be something like why
  can't the compiler detect that the array can be updated in place
  instead of copied and the answer, from what i can tell, is simply
  that it doesn't try.

And one might argue that it should not try
[snip]

and the other answer is that the result would not be worth the effort
usually. The idea seems nice, but the laziness is where I see the problem;
see the following code:

do_something::Array Int Int-(Array Int Int, Result)
do_something a = (a'',res)
where
  a' = a // [(summon_index, summon_int)]
  w = a' ! 4
  res = do_some_work w
  a'' = a' // [(summon_other_index, summon_other_int)]

Seems like nice candidate for update-in-place? But the problem is, that
in general you cannot say when w will be evaluated -- so you simply cannot
destroy a' before then.

This is also reason why DiffArray (from hslibs) does not have to be very 
useful,
unless you have a good control on when things are evaluated -- it may (and 
it
also happened to me) have quadratic behavior in cases like this.

To solve this, '!' would have to produce a new array too; but then you must
pass it everywhere and effectively you are doing the work monads would do
for you.

So in general you must choose -- either your programs will be nice and 
functional,
but you will have to use structures with (some kind of) logarithmic slowdown
(FiniteMap, Trie), or some of their parts will be embedded in monads.

Zdenek Dvorak

_
MSN Photos is the easiest way to share and print your photos: 
http://photos.msn.com/support/worldwide.aspx

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe



Expander2

2002-10-08 Thread Prof. Dr. Peter Padawitz

Expander2: A Formal Methods Presenter and Animator
==

Expander2 has been designed as a multi-purpose workbench for interactive
logical inference, constraint solving, data flow analysis and other
procedures building up proofs or computation sequences. Moreover,
several interpreters translate expressions into tailor-made
two-dimensional representations that range from trees and term graphs to
tables, fractals or other turtle-system-generated pictures.

Expander2 has been implemented in O'Haskell}, an extension of
Haskell with object-oriented features for reactive programming and a
typed interface to Tcl/Tk for developing GUIs. Besides a comfortable GUI
the design goals of Expander2 were to integrate testing, proving and
visualizing deductive methods, to admit several degrees of interaction
and to keep the system open for extensions or adaptations of individual
components to changing demands.

Proofs and computations performed with Expander2 follow the rules and
the semantics of swinging types
(ls5-www.cs.uni-dortmund.de/~peter/Swinging.html). 
Swinging types combine constructor-based visible types with state-based
hidden types and have unique (Herbrand) models, which interpret
relations as the least or greatest solutions of their axioms.

All features of the system and their use are described in the manual

ls5-www.cs.uni-dortmund.de/~peter/Expander2/Expander2.html 

(sorry, this is still a big file, it will be splitted soon; PostScript
version: ../Expander2/Expander2.ps.gz). The paper
../Expander2/Chiemsee.ps.gz concentrates on the prover features.
Download everything with ../Expander2.tar.gz.

Please email comments, bugs, etc. to [EMAIL PROTECTED]

Peter
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe



Data structure definitions

2002-10-08 Thread Mark T.B. Carroll

I have a program that basically has,

  data Expression =
Value Value
  | EVariable Variable | other stuff ...

  data Value = VNumber Number | other stuff ...

  data Variable = Variable { variable_name :: String, variable_time :: Expression }
  data Number = Number { value :: Double, dimension :: Dimension }

  newtype VariableCount = VariableCount (Variable, Number)

The VNumber and EVariable constructors are ugly, though, so I was
wondering if I should be using typeclasses - e.g.,

  class Expression a
  class Expression a = Value a
  instance Value Number
  instance Expression Variable

... but I don't see how to define Variable in such a scheme. Maybe I
shouldn't be using typeclasses?

(Obviously, I actually have lots more type constructors in Expression and
Value - dyadic expressions, booleans, etc. - the above with just numbers
and variables is somewhat truncated, but should suffice.)

-- Mark

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: Data structure definitions

2002-10-08 Thread Hal Daume III

I think this is more or less the standard way of doing it.  I don't think
type classes are the right thing to do.  Usually constructors are prefixed
with the first character of their type in situations like this (or so I've
seen), so you get:

data Expression = EValue Value | EVariable Variable | ...
data Value = VNumber Number | V... | ...
etc...

I'm not sure if this answers your question or not, tho...

--
Hal Daume III

 Computer science is no more about computers| [EMAIL PROTECTED]
  than astronomy is about telescopes. -Dijkstra | www.isi.edu/~hdaume

On Tue, 8 Oct 2002, Mark T.B. Carroll wrote:

 I have a program that basically has,
 
   data Expression =
 Value Value
   | EVariable Variable | other stuff ...
 
   data Value = VNumber Number | other stuff ...
 
   data Variable = Variable { variable_name :: String, variable_time :: Expression }
   data Number = Number { value :: Double, dimension :: Dimension }
 
   newtype VariableCount = VariableCount (Variable, Number)
 
 The VNumber and EVariable constructors are ugly, though, so I was
 wondering if I should be using typeclasses - e.g.,
 
   class Expression a
   class Expression a = Value a
   instance Value Number
   instance Expression Variable
 
 ... but I don't see how to define Variable in such a scheme. Maybe I
 shouldn't be using typeclasses?
 
 (Obviously, I actually have lots more type constructors in Expression and
 Value - dyadic expressions, booleans, etc. - the above with just numbers
 and variables is somewhat truncated, but should suffice.)
 
 -- Mark
 
 ___
 Haskell-Cafe mailing list
 [EMAIL PROTECTED]
 http://www.haskell.org/mailman/listinfo/haskell-cafe
 

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe