need help optimizing a function
Hello. If you followed my previous post (which was about a bug in my lcs function), this is still the same code (only now bug-free, it seems). It turns out that my lcs diff function is MUCH slower than GNU diff on some test files (much worse than 1000 time slower!).The test case where its speed was close was IO bound on the output to an xterm (silly me). I've profiled my lcs code, and it is spending about 95% of its time in the hunt_one_char function. It also claims to spend 0.0% of its time in my_bs, which does a binary search on the array, and is the only other function that gets called by hunt_one_char (not counting itself). I've told the compiler (ghc) to inline hunt_one_char and my_bs to no avail. I guess maybe it can't or won't inline a recursive function, or maybe that just doesn't gain me anything. My only guess is that for some reason it is making an entirely new copy of the array each time it recurses, but that doesn't make any sense, since the array is getting threaded through, so it should be able to just modify the array in place. But then, I'm not entirely clear as to what the compiler can and cannot do as it optimizes. data Threshold a = Thresh Int! [a] deriving (Show) hunt_one_char :: String - [Int] - Array Int (Threshold String) - Array Int (Threshold String) hunt_one_char c [] th = th hunt_one_char c (j:js) th = case my_bs (Thresh j [c]) th of Nothing - hunt_one_char c js th Just k - case th!(k-1) of Thresh _ rest - hunt_one_char c js th//[(k,Thresh j (c:rest))] For what it's worth, the calling function is: hunt_internal :: [String] - [[Int]] - Array Int (Threshold String) - Array Int (Threshold String) hunt_internal [] _ th = th hunt_internal _ [] th = th hunt_internal (c:cs) (m:ms) th = hunt_internal cs ms $ hunt_one_char c m th Each string in this list of strings is a line from one of the files (and the [[Int]] is a list of line numbers in the other file that that line matches). The array 'th' has size (1,n) where n is the number of lines in the file. My simplest test case consists of a 20k line file, each line of which is unique and a second file which is simply a permutation of the first (moving the first line to the last), so hunt_one_char should be called exactly once for each line, and it is always called with its second argument having exactly one element. In this test case, according to the ghc profiler, 89.7% of the time is spent in hunt_one_char: individual inherited COST CENTRE MODULE entries %time %alloc %time %alloc hunt_internal Main 200010.2 0.0 91.4 95.3 hunt_one_char Main 4 89.7 95.0 91.1 95.3 my_bsMain 20.2 0.0 1.4 0.3 my_helper_bsMain3072311.2 0.3 1.2 0.3 This has me at a loss. It seems like such a simple function... of course, it is called 20k times, which could certainly add up, but while each call of my_bs should take O(log 20k) time, each call of hunt_one_char (apart from its one call to my_bs) should only take O(1), so most of the time should be spent in my_bs unless something is terribly wrong (which must be the case here). If it would help, I'd be happy to send a listing of the complete code, but it's 6.5k so I figured it'd be better not to send it, since it seems unlikely that anyone would want to run it anyways. -- David Roundy http://civet.berkeley.edu/droundy/ ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Graphics output
Gerhard Navratil [EMAIL PROTECTED] writes: I would like to write line graphics into a file (e.g. contour lines calculated by a Haskell function) and access the data from standard programs. For the output I need a Library. Why don't you output some ASCII numbers, and use another program (gnuplot, plotutils, whatever) to produce graphics? Feri. ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: need help optimizing a function
I already saw one reply mentioning using state threaded arrays. This is probably your best (in terms of performance) option. Keep in mind that Haskell arrays are *not* your standard imperative arrays. Like imperative arrays, they give you O(1) lookup, but only O(n) insert. If you want to keep with a functional style, I'd suggest using a different data structure for the data. A FiniteMap should work fine and should give you pretty good (at least much better) performance. And you won't have to give up the FP style (for whatever that's worth). - Hal p.s., I sense your next question is going to be something like why can't the compiler detect that the array can be updated in place instead of copied and the answer, from what i can tell, is simply that it doesn't try. -- Hal Daume III Computer science is no more about computers| [EMAIL PROTECTED] than astronomy is about telescopes. -Dijkstra | www.isi.edu/~hdaume On Tue, 8 Oct 2002, David Roundy wrote: Hello. If you followed my previous post (which was about a bug in my lcs function), this is still the same code (only now bug-free, it seems). It turns out that my lcs diff function is MUCH slower than GNU diff on some test files (much worse than 1000 time slower!).The test case where its speed was close was IO bound on the output to an xterm (silly me). I've profiled my lcs code, and it is spending about 95% of its time in the hunt_one_char function. It also claims to spend 0.0% of its time in my_bs, which does a binary search on the array, and is the only other function that gets called by hunt_one_char (not counting itself). I've told the compiler (ghc) to inline hunt_one_char and my_bs to no avail. I guess maybe it can't or won't inline a recursive function, or maybe that just doesn't gain me anything. My only guess is that for some reason it is making an entirely new copy of the array each time it recurses, but that doesn't make any sense, since the array is getting threaded through, so it should be able to just modify the array in place. But then, I'm not entirely clear as to what the compiler can and cannot do as it optimizes. data Threshold a = Thresh Int! [a] deriving (Show) hunt_one_char :: String - [Int] - Array Int (Threshold String) - Array Int (Threshold String) hunt_one_char c [] th = th hunt_one_char c (j:js) th = case my_bs (Thresh j [c]) th of Nothing - hunt_one_char c js th Just k - case th!(k-1) of Thresh _ rest - hunt_one_char c js th//[(k,Thresh j (c:rest))] For what it's worth, the calling function is: hunt_internal :: [String] - [[Int]] - Array Int (Threshold String) - Array Int (Threshold String) hunt_internal [] _ th = th hunt_internal _ [] th = th hunt_internal (c:cs) (m:ms) th = hunt_internal cs ms $ hunt_one_char c m th Each string in this list of strings is a line from one of the files (and the [[Int]] is a list of line numbers in the other file that that line matches). The array 'th' has size (1,n) where n is the number of lines in the file. My simplest test case consists of a 20k line file, each line of which is unique and a second file which is simply a permutation of the first (moving the first line to the last), so hunt_one_char should be called exactly once for each line, and it is always called with its second argument having exactly one element. In this test case, according to the ghc profiler, 89.7% of the time is spent in hunt_one_char: individual inherited COST CENTRE MODULE entries %time %alloc %time %alloc hunt_internal Main 200010.2 0.0 91.4 95.3 hunt_one_char Main 4 89.7 95.0 91.1 95.3 my_bsMain 20.2 0.0 1.4 0.3 my_helper_bsMain3072311.2 0.3 1.2 0.3 This has me at a loss. It seems like such a simple function... of course, it is called 20k times, which could certainly add up, but while each call of my_bs should take O(log 20k) time, each call of hunt_one_char (apart from its one call to my_bs) should only take O(1), so most of the time should be spent in my_bs unless something is terribly wrong (which must be the case here). If it would help, I'd be happy to send a listing of the complete code, but it's 6.5k so I figured it'd be better not to send it, since it seems unlikely that anyone would want to run it anyways. -- David Roundy http://civet.berkeley.edu/droundy/ ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: need help optimizing a function
Hal Daume [EMAIL PROTECTED] writes: p.s., I sense your next question is going to be something like why can't the compiler detect that the array can be updated in place instead of copied and the answer, from what i can tell, is simply that it doesn't try. And one might argue that it should not try since the result would be incredibly fragile with small, semantics preserving changes to the program confusing the analysis and causing changes in the big-O complexity of your program and, since the analysis would undoubtedly be pretty darn cunning, it'd also be virtually impossible to understand why it did or did not manage to detect the single-threadedness of your code. I think one of the reasons that approaches based on types (monads, MADTs, linear types, etc.) are so popular is that they make the programmers assertion (X is used in a single-threaded manner) sufficiently explicit that the compiler can complain when it disagrees and, since parts of the analysis show through in the typesystem, they make the analysis less of a black box. -- Alastair Reid [EMAIL PROTECTED] Reid Consulting (UK) Limited http://www.reid-consulting-uk.ltd.uk/alastair/ ps The opening paragraphs of Paul Hudak's MADT paper (http://www.haskell.org/papers/madt.ps) have a more coherent explanation of why smarter compilers are not the answer. ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: need help optimizing a function
Hello, p.s., I sense your next question is going to be something like why can't the compiler detect that the array can be updated in place instead of copied and the answer, from what i can tell, is simply that it doesn't try. And one might argue that it should not try [snip] and the other answer is that the result would not be worth the effort usually. The idea seems nice, but the laziness is where I see the problem; see the following code: do_something::Array Int Int-(Array Int Int, Result) do_something a = (a'',res) where a' = a // [(summon_index, summon_int)] w = a' ! 4 res = do_some_work w a'' = a' // [(summon_other_index, summon_other_int)] Seems like nice candidate for update-in-place? But the problem is, that in general you cannot say when w will be evaluated -- so you simply cannot destroy a' before then. This is also reason why DiffArray (from hslibs) does not have to be very useful, unless you have a good control on when things are evaluated -- it may (and it also happened to me) have quadratic behavior in cases like this. To solve this, '!' would have to produce a new array too; but then you must pass it everywhere and effectively you are doing the work monads would do for you. So in general you must choose -- either your programs will be nice and functional, but you will have to use structures with (some kind of) logarithmic slowdown (FiniteMap, Trie), or some of their parts will be embedded in monads. Zdenek Dvorak _ MSN Photos is the easiest way to share and print your photos: http://photos.msn.com/support/worldwide.aspx ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Expander2
Expander2: A Formal Methods Presenter and Animator == Expander2 has been designed as a multi-purpose workbench for interactive logical inference, constraint solving, data flow analysis and other procedures building up proofs or computation sequences. Moreover, several interpreters translate expressions into tailor-made two-dimensional representations that range from trees and term graphs to tables, fractals or other turtle-system-generated pictures. Expander2 has been implemented in O'Haskell}, an extension of Haskell with object-oriented features for reactive programming and a typed interface to Tcl/Tk for developing GUIs. Besides a comfortable GUI the design goals of Expander2 were to integrate testing, proving and visualizing deductive methods, to admit several degrees of interaction and to keep the system open for extensions or adaptations of individual components to changing demands. Proofs and computations performed with Expander2 follow the rules and the semantics of swinging types (ls5-www.cs.uni-dortmund.de/~peter/Swinging.html). Swinging types combine constructor-based visible types with state-based hidden types and have unique (Herbrand) models, which interpret relations as the least or greatest solutions of their axioms. All features of the system and their use are described in the manual ls5-www.cs.uni-dortmund.de/~peter/Expander2/Expander2.html (sorry, this is still a big file, it will be splitted soon; PostScript version: ../Expander2/Expander2.ps.gz). The paper ../Expander2/Chiemsee.ps.gz concentrates on the prover features. Download everything with ../Expander2.tar.gz. Please email comments, bugs, etc. to [EMAIL PROTECTED] Peter ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Data structure definitions
I have a program that basically has, data Expression = Value Value | EVariable Variable | other stuff ... data Value = VNumber Number | other stuff ... data Variable = Variable { variable_name :: String, variable_time :: Expression } data Number = Number { value :: Double, dimension :: Dimension } newtype VariableCount = VariableCount (Variable, Number) The VNumber and EVariable constructors are ugly, though, so I was wondering if I should be using typeclasses - e.g., class Expression a class Expression a = Value a instance Value Number instance Expression Variable ... but I don't see how to define Variable in such a scheme. Maybe I shouldn't be using typeclasses? (Obviously, I actually have lots more type constructors in Expression and Value - dyadic expressions, booleans, etc. - the above with just numbers and variables is somewhat truncated, but should suffice.) -- Mark ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Data structure definitions
I think this is more or less the standard way of doing it. I don't think type classes are the right thing to do. Usually constructors are prefixed with the first character of their type in situations like this (or so I've seen), so you get: data Expression = EValue Value | EVariable Variable | ... data Value = VNumber Number | V... | ... etc... I'm not sure if this answers your question or not, tho... -- Hal Daume III Computer science is no more about computers| [EMAIL PROTECTED] than astronomy is about telescopes. -Dijkstra | www.isi.edu/~hdaume On Tue, 8 Oct 2002, Mark T.B. Carroll wrote: I have a program that basically has, data Expression = Value Value | EVariable Variable | other stuff ... data Value = VNumber Number | other stuff ... data Variable = Variable { variable_name :: String, variable_time :: Expression } data Number = Number { value :: Double, dimension :: Dimension } newtype VariableCount = VariableCount (Variable, Number) The VNumber and EVariable constructors are ugly, though, so I was wondering if I should be using typeclasses - e.g., class Expression a class Expression a = Value a instance Value Number instance Expression Variable ... but I don't see how to define Variable in such a scheme. Maybe I shouldn't be using typeclasses? (Obviously, I actually have lots more type constructors in Expression and Value - dyadic expressions, booleans, etc. - the above with just numbers and variables is somewhat truncated, but should suffice.) -- Mark ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe