Re: [Haskell-cafe] performance question
I have to agree that reading and maintaining regular expressions can be challenging :) On Wed, Feb 13, 2013 at 9:50 PM, Erik de Castro Lopo mle...@mega-nerd.comwrote: wren ng thornton wrote: Regexes are powerful and concise for recognizing regular languages. They are not, however, very good for *parsing* regular languages; nor can they handle non-regular languages (unless you're relying on the badness of pcre). In other languages people press regexes into service for parsing because the alternative is using an external DSL like lex/yacc, javaCC, etc. Whereas, in Haskell, we have powerful and concise tools for parsing context-free languages and beyond (e.g., parsec, attoparsec). This cannot be emphasized heavily enough. Once you have learnt how to use one or more of these parsec libraries they will become your main tool for parsing everything from complex input languages like haskell itself, all the way down to relatively simple config files. Parsec style parsers are built up out of small composable (and more importantly reusable) combinators, that are easier to read and easier to maintain than anything other than the most trivial regex. Erik -- -- Erik de Castro Lopo http://www.mega-nerd.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
Just to play devil's advocate: 100% agreed that there are better things to do in Haskell _source code_ than regexps. The thing about regexps is that they can be accepted at run time as _data_. This means, for example, that they can be put in whatever you use for localisation. See for example YESEXPR/NOEXPR in langinfo.h ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On 2/13/13 11:18 PM, wren ng thornton wrote: On 2/13/13 11:32 AM, Nicolas Bock wrote: Since I have very little experience with Haskell and am not used to Haskell-think yet, I don't quite understand your statement that regexes are seen as foreign to Haskell-think. Could you elaborate? What would a more native solution look like? From what I have learned so far, it seems to me that Haskell is a lot about clear, concise, and well structured code. I find regexes extremely compact and powerful, allowing for very concise code, which should fit the bill perfectly, or shouldn't it? Regexes are powerful and concise for recognizing regular languages. They are not, however, very good for *parsing* regular languages; nor can they handle non-regular languages (unless you're relying on the badness of pcre). In other languages people press regexes into service for parsing because the alternative is using an external DSL like lex/yacc, javaCC, etc. Whereas, in Haskell, we have powerful and concise tools for parsing context-free languages and beyond (e.g., parsec, attoparsec). Just to be clear, the problem isn't that proper regexes are only good for regular languages (many files have regular syntax afterall). The problem is that regexes are only good for recognition. They're an excellent tool for deciding whether a given string is good or bad; but they're completely unsuitable for the task of parsing/interpreting a string into some structure or semantic response. If you've ever used tools like yacc or javaCC, one of the crucial things they offer is the ability to add these semantic responses. Parser combinator libraries in Haskell are similar, since the string processing is integrated into a programming language so we can say things like: myParser = do x - blah guard (p x) y - blargh return (f x y) where p and f can be an arbitrary Haskell functions. Perl extends on regular expressions to try and do things like this, but it's extremely baroque, hard to get right, and impossible to maintain. (N.B., I was raised on Perl and still love it.) And at some point we have to call into question the idea of regexes as an embedded DSL when we then turn around and try to have Perl be a DSL embedded into the regex language. One of the big things that makes regexes so nice is that they identify crucial combinators like choice and repetition. However, once those combinators have been identified, we can just offer them directly as functions in the host language. No need for a special DSL or special syntax. The big trick is doing this efficiently. Parser combinators were an academic curiosity for a long time until Parsec came around and made them efficient. And we've come a long way since then: with things like attoparsec, PEG parsing, and non-monadic applicative parsers (which can perform more optimizations because they can identify the structure of the grammar). The theory of regular expressions is indeed beautiful and elegant. However, it's a theory of recognition, not a theory of parsing; and that's a crucial distinction. Haskell is about clear, concise, and well-structured code; but to be clear, concise, and well-structured we have to choose the right tool for the job. -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
(I'll be brief because my head is hurting, but please don't interpret that as an intent to offend) A few points: 1) Capture groups are all you need to do some meaningful interpretation of data; these were around long before perl. 2) Yacc is typically used in conjunction with lex, partly for (a) efficiency and partly for (b) ease of use (compared to writing out [a-z] as production rules). 3) I've actually used lex without yacc (well, flex without bison) when faced with dealing with a language that's regular (and easy enough to express that way - cf. an enormous finite subset of a context-free language). 2b is mostly irrelevant in Haskell, as Parsec already provides functions that can easily match the same things a regexp would. 2a, if it stands up to testing, is the best argument for ripping things apart in Haskell using a DFA. Parsec and cousins are efficient, but it's hard to beat a single table lookup per character. The questions are 1) is the difference enough to matter in many cases, and 2) is there a way to get this out of parsec without touching regexps? (It's not impossible that parsec already recognizes when a language is regular, although I'd be weakly surprised). On Thu, Feb 14, 2013 at 3:07 PM, wren ng thornton w...@freegeek.org wrote: On 2/13/13 11:18 PM, wren ng thornton wrote: On 2/13/13 11:32 AM, Nicolas Bock wrote: Since I have very little experience with Haskell and am not used to Haskell-think yet, I don't quite understand your statement that regexes are seen as foreign to Haskell-think. Could you elaborate? What would a more native solution look like? From what I have learned so far, it seems to me that Haskell is a lot about clear, concise, and well structured code. I find regexes extremely compact and powerful, allowing for very concise code, which should fit the bill perfectly, or shouldn't it? Regexes are powerful and concise for recognizing regular languages. They are not, however, very good for *parsing* regular languages; nor can they handle non-regular languages (unless you're relying on the badness of pcre). In other languages people press regexes into service for parsing because the alternative is using an external DSL like lex/yacc, javaCC, etc. Whereas, in Haskell, we have powerful and concise tools for parsing context-free languages and beyond (e.g., parsec, attoparsec). Just to be clear, the problem isn't that proper regexes are only good for regular languages (many files have regular syntax afterall). The problem is that regexes are only good for recognition. They're an excellent tool for deciding whether a given string is good or bad; but they're completely unsuitable for the task of parsing/interpreting a string into some structure or semantic response. If you've ever used tools like yacc or javaCC, one of the crucial things they offer is the ability to add these semantic responses. Parser combinator libraries in Haskell are similar, since the string processing is integrated into a programming language so we can say things like: myParser = do x - blah guard (p x) y - blargh return (f x y) where p and f can be an arbitrary Haskell functions. Perl extends on regular expressions to try and do things like this, but it's extremely baroque, hard to get right, and impossible to maintain. (N.B., I was raised on Perl and still love it.) And at some point we have to call into question the idea of regexes as an embedded DSL when we then turn around and try to have Perl be a DSL embedded into the regex language. One of the big things that makes regexes so nice is that they identify crucial combinators like choice and repetition. However, once those combinators have been identified, we can just offer them directly as functions in the host language. No need for a special DSL or special syntax. The big trick is doing this efficiently. Parser combinators were an academic curiosity for a long time until Parsec came around and made them efficient. And we've come a long way since then: with things like attoparsec, PEG parsing, and non-monadic applicative parsers (which can perform more optimizations because they can identify the structure of the grammar). The theory of regular expressions is indeed beautiful and elegant. However, it's a theory of recognition, not a theory of parsing; and that's a crucial distinction. Haskell is about clear, concise, and well-structured code; but to be clear, concise, and well-structured we have to choose the right tool for the job. -- Live well, ~wren __**_ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafehttp://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
It's worth remembering that the main gain from lex/yacc had originally to do with making the generated programs fit into 64K address space on a PDP11 more than with any direct performance efficiency. -- brandon s allbery kf8nh Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Thursday, February 14, 2013 at 6:27 PM, David Thomas wrote: (I'll be brief because my head is hurting, but please don't interpret that as an intent to offend) A few points: 1) Capture groups are all you need to do some meaningful interpretation of data; these were around long before perl. 2) Yacc is typically used in conjunction with lex, partly for (a) efficiency and partly for (b) ease of use (compared to writing out [a-z] as production rules). 3) I've actually used lex without yacc (well, flex without bison) when faced with dealing with a language that's regular (and easy enough to express that way - cf. an enormous finite subset of a context-free language). 2b is mostly irrelevant in Haskell, as Parsec already provides functions that can easily match the same things a regexp would. 2a, if it stands up to testing, is the best argument for ripping things apart in Haskell using a DFA. Parsec and cousins are efficient, but it's hard to beat a single table lookup per character. The questions are 1) is the difference enough to matter in many cases, and 2) is there a way to get this out of parsec without touching regexps? (It's not impossible that parsec already recognizes when a language is regular, although I'd be weakly surprised). On Thu, Feb 14, 2013 at 3:07 PM, wren ng thornton w...@freegeek.org (mailto:w...@freegeek.org) wrote: On 2/13/13 11:18 PM, wren ng thornton wrote: On 2/13/13 11:32 AM, Nicolas Bock wrote: Since I have very little experience with Haskell and am not used to Haskell-think yet, I don't quite understand your statement that regexes are seen as foreign to Haskell-think. Could you elaborate? What would a more native solution look like? From what I have learned so far, it seems to me that Haskell is a lot about clear, concise, and well structured code. I find regexes extremely compact and powerful, allowing for very concise code, which should fit the bill perfectly, or shouldn't it? Regexes are powerful and concise for recognizing regular languages. They are not, however, very good for *parsing* regular languages; nor can they handle non-regular languages (unless you're relying on the badness of pcre). In other languages people press regexes into service for parsing because the alternative is using an external DSL like lex/yacc, javaCC, etc. Whereas, in Haskell, we have powerful and concise tools for parsing context-free languages and beyond (e.g., parsec, attoparsec). Just to be clear, the problem isn't that proper regexes are only good for regular languages (many files have regular syntax afterall). The problem is that regexes are only good for recognition. They're an excellent tool for deciding whether a given string is good or bad; but they're completely unsuitable for the task of parsing/interpreting a string into some structure or semantic response. If you've ever used tools like yacc or javaCC, one of the crucial things they offer is the ability to add these semantic responses. Parser combinator libraries in Haskell are similar, since the string processing is integrated into a programming language so we can say things like: myParser = do x - blah guard (p x) y - blargh return (f x y) where p and f can be an arbitrary Haskell functions. Perl extends on regular expressions to try and do things like this, but it's extremely baroque, hard to get right, and impossible to maintain. (N.B., I was raised on Perl and still love it.) And at some point we have to call into question the idea of regexes as an embedded DSL when we then turn around and try to have Perl be a DSL embedded into the regex language. One of the big things that makes regexes so nice is that they identify crucial combinators like choice and repetition. However, once those combinators have been identified, we can just offer them directly as functions in the host language. No need for a special DSL or special syntax. The big trick is doing this efficiently. Parser combinators were an academic curiosity for a long time until Parsec came around and made them efficient. And we've come a long way since then: with things like attoparsec, PEG parsing, and non-monadic applicative parsers (which can perform more optimizations because they can identify the structure of the grammar). The theory of regular expressions is indeed beautiful and elegant. However, it's a theory of recognition, not a theory of parsing; and that's a crucial distinction. Haskell is about clear,
Re: [Haskell-cafe] performance question
ByteString gains most improvements as String must be converted o CStringfirst, internaly, in regex (this is warpper for libpcre), while ByteString not.libpcre is much faster than posix (I guess posix is also wrapper).Interface for libpcre is same as for Posix, there is no real effortin replacing it. Date: Tue, 12 Feb 2013 20:32:01 -0800 From: bri...@aracnet.com To: nicolasb...@gmail.com CC: bm...@hotmail.com; b...@redivi.com; haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] performance question On Tue, 12 Feb 2013 15:57:37 -0700 Nicolas Bock nicolasb...@gmail.com wrote: Here is haskell version that is faster than python, almost as fast as c++. You need to install bytestring-lexing package for readDouble. I was hoping Branimir could comment on how the improvements were allocated. how much is due to text.regex.pcre (which looks to be a wrapper to libpcre) ? how much can be attributed to using data.bytestring ? you have to admit, it's amazing how well a byte-compiled, _dynamically typed_ interpreter can do against an actualy native code compiler. Can't regex be done effectively in haskell ? Is it something that can't be done, or is it just such minimal effort to link to pcre that it's not worth the trouble ? Brian ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On Tue, Feb 12, 2013 at 11:32 PM, bri...@aracnet.com wrote: actualy native code compiler. Can't regex be done effectively in haskell ? Is it something that can't be done, or is it just such minimal effort to link to pcre that it's not worth the trouble ? PCRE is pretty heavily optimized. POSIX regex engines generally rely on vendor regex libraries which my not be well optimized; there is a native Haskell implementation as well, but that one runs into a different issue, namely a lack of interest (regexes are often seen as foreign to Haskell-think, so there's little interest in making them work well; people who *do* need them for some reason usually punt to pcre). -- brandon s allbery kf8nh sine nomine associates allber...@gmail.com ballb...@sinenomine.net unix, openafs, kerberos, infrastructure, xmonadhttp://sinenomine.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
Since I have very little experience with Haskell and am not used to Haskell-think yet, I don't quite understand your statement that regexes are seen as foreign to Haskell-think. Could you elaborate? What would a more native solution look like? From what I have learned so far, it seems to me that Haskell is a lot about clear, concise, and well structured code. I find regexes extremely compact and powerful, allowing for very concise code, which should fit the bill perfectly, or shouldn't it? Thanks, nick On Wed, Feb 13, 2013 at 8:12 AM, Brandon Allbery allber...@gmail.comwrote: On Tue, Feb 12, 2013 at 11:32 PM, bri...@aracnet.com wrote: actualy native code compiler. Can't regex be done effectively in haskell ? Is it something that can't be done, or is it just such minimal effort to link to pcre that it's not worth the trouble ? PCRE is pretty heavily optimized. POSIX regex engines generally rely on vendor regex libraries which my not be well optimized; there is a native Haskell implementation as well, but that one runs into a different issue, namely a lack of interest (regexes are often seen as foreign to Haskell-think, so there's little interest in making them work well; people who *do* need them for some reason usually punt to pcre). -- brandon s allbery kf8nh sine nomine associates allber...@gmail.com ballb...@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
One way in which regexps are foreign to Haskell-think is that, if they break, they generally break at run-time. This could be ameliorated with template haskell, but a substantial portion of Haskell coders find that a smell itself. On Wed, Feb 13, 2013 at 8:32 AM, Nicolas Bock nicolasb...@gmail.com wrote: Since I have very little experience with Haskell and am not used to Haskell-think yet, I don't quite understand your statement that regexes are seen as foreign to Haskell-think. Could you elaborate? What would a more native solution look like? From what I have learned so far, it seems to me that Haskell is a lot about clear, concise, and well structured code. I find regexes extremely compact and powerful, allowing for very concise code, which should fit the bill perfectly, or shouldn't it? Thanks, nick On Wed, Feb 13, 2013 at 8:12 AM, Brandon Allbery allber...@gmail.comwrote: On Tue, Feb 12, 2013 at 11:32 PM, bri...@aracnet.com wrote: actualy native code compiler. Can't regex be done effectively in haskell ? Is it something that can't be done, or is it just such minimal effort to link to pcre that it's not worth the trouble ? PCRE is pretty heavily optimized. POSIX regex engines generally rely on vendor regex libraries which my not be well optimized; there is a native Haskell implementation as well, but that one runs into a different issue, namely a lack of interest (regexes are often seen as foreign to Haskell-think, so there's little interest in making them work well; people who *do* need them for some reason usually punt to pcre). -- brandon s allbery kf8nh sine nomine associates allber...@gmail.com ballb...@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On Wed, 2013-02-13 at 08:39 -0800, David Thomas wrote: One way in which regexps are foreign to Haskell-think is that, if they break, they generally break at run-time. This could be ameliorated with template haskell Care to elaborate on the ameliorate using TH part? I figure regexes would be mostly used to parse some runtime-provided string, so how could compile-TH provide any help? Nicolas ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
Well, this runtime errors are actually type errors. Regexps are actually a DSL, which is not embedded in Haskell. But it could be. Strings won't work for that, but something like that would: filter (match $ a many anyChar .txt) filenames and this certainly can be produced by TH like that: filter (match $(regexp a.*\\.txt)) filenames On Feb 13, 2013, at 8:43 PM, Nicolas Trangez nico...@incubaid.com wrote: On Wed, 2013-02-13 at 08:39 -0800, David Thomas wrote: One way in which regexps are foreign to Haskell-think is that, if they break, they generally break at run-time. This could be ameliorated with template haskell Care to elaborate on the ameliorate using TH part? I figure regexes would be mostly used to parse some runtime-provided string, so how could compile-TH provide any help? Nicolas ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On Wed, Feb 13, 2013 at 11:32 AM, Nicolas Bock nicolasb...@gmail.comwrote: Since I have very little experience with Haskell and am not used to Haskell-think yet, I don't quite understand your statement that regexes are seen as foreign to Haskell-think. Could you elaborate? What would a more native solution look like? From what I have learned so far, it seems to me that Haskell is a lot about clear, The native solution is a parser like parsec/attoparsec. The problem with regexes is that you can't at compile time verify that, for example, you have as many matching groups in the regex as the code using it expects, nor does an optional matching group behave as a Maybe like it should; nor are there nice ways to recover. A parser gives you full control and better compile time checking, and is generally recommended. -- brandon s allbery kf8nh sine nomine associates allber...@gmail.com ballb...@sinenomine.net unix, openafs, kerberos, infrastructure, xmonadhttp://sinenomine.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
I don't think you can do much about fails to match the input string - indeed, that's often desired behavior... and matches the wrong thing you can only catch with testing. The simplest place template haskell could help with is when the expression isn't a valid expression in the first place, and will fail to compile. If you're just validating, I don't think you can do better; in order to improve your confidence of correctness, your only option is testing against a set of positives and negatives. If you're capturing, you might be able to do a little better, if you are able to get some of that info into the types (number of capture groups expected, for instance) - then, if your code expects to deal with a different number of captured pieces than your pattern represents, it can be caught at compile time. If you're capturing strings that you intend to convert to other types, and can decorate regexp components with the type they're going to capture (which can then be quickchecked - certainly a pattern should never match and then fail to read, c), and if you are able to propagate this info during composition, you might actually be able to catch a good chunk of errors. Note that much of this works quite a bit different than most existing regexp library APIs, where you pass a bare string and captures wind up in some kind of list, which I expect is much of the reason no one's done it yet (so far as I'm aware). On Wed, Feb 13, 2013 at 8:43 AM, Nicolas Trangez nico...@incubaid.comwrote: On Wed, 2013-02-13 at 08:39 -0800, David Thomas wrote: One way in which regexps are foreign to Haskell-think is that, if they break, they generally break at run-time. This could be ameliorated with template haskell Care to elaborate on the ameliorate using TH part? I figure regexes would be mostly used to parse some runtime-provided string, so how could compile-TH provide any help? Nicolas ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
The fact that parsec and attoparsec exist and can be pressed into service with reasonable performance (I think?) on tasks for which regexps are suitable is probably another big part of the reason no one's done it yet. I expect much of the plumbing would wind up looking a lot like those, actually. On Wed, Feb 13, 2013 at 9:43 AM, David Thomas davidleotho...@gmail.comwrote: I don't think you can do much about fails to match the input string - indeed, that's often desired behavior... and matches the wrong thing you can only catch with testing. The simplest place template haskell could help with is when the expression isn't a valid expression in the first place, and will fail to compile. If you're just validating, I don't think you can do better; in order to improve your confidence of correctness, your only option is testing against a set of positives and negatives. If you're capturing, you might be able to do a little better, if you are able to get some of that info into the types (number of capture groups expected, for instance) - then, if your code expects to deal with a different number of captured pieces than your pattern represents, it can be caught at compile time. If you're capturing strings that you intend to convert to other types, and can decorate regexp components with the type they're going to capture (which can then be quickchecked - certainly a pattern should never match and then fail to read, c), and if you are able to propagate this info during composition, you might actually be able to catch a good chunk of errors. Note that much of this works quite a bit different than most existing regexp library APIs, where you pass a bare string and captures wind up in some kind of list, which I expect is much of the reason no one's done it yet (so far as I'm aware). On Wed, Feb 13, 2013 at 8:43 AM, Nicolas Trangez nico...@incubaid.comwrote: On Wed, 2013-02-13 at 08:39 -0800, David Thomas wrote: One way in which regexps are foreign to Haskell-think is that, if they break, they generally break at run-time. This could be ameliorated with template haskell Care to elaborate on the ameliorate using TH part? I figure regexes would be mostly used to parse some runtime-provided string, so how could compile-TH provide any help? Nicolas ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On Wed, Feb 13, 2013 at 12:46 PM, David Thomas davidleotho...@gmail.comwrote: The fact that parsec and attoparsec exist and can be pressed into service with reasonable performance (I think?) on tasks for which regexps are suitable is probably another big part of the reason no one's done it yet. I expect much of the plumbing would wind up looking a lot like those, actually. When I started out with Haskell, one of my early thoughts was about designing a DSL for Icon-style pattern matching; I dropped it when I realized I was reinventing (almost identically, at least for its lower level combinators) Parsec. Nothing really to be gained except from a tutelary standpoint. And the mapping from Icon patterns to regex patterns is pretty much mechanical if you phrase it so you aren't executing code in the middle. -- brandon s allbery kf8nh sine nomine associates allber...@gmail.com ballb...@sinenomine.net unix, openafs, kerberos, infrastructure, xmonadhttp://sinenomine.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On 10.02.2013 02:30, Nicolas Bock wrote: Hi Aleksey, could you show me how I would use ByteString? I can't get the script to compile. It's complaining that: No instance for (RegexContext Regex Data.ByteString.ByteString (AllTextSubmatches [] a0)) which is too cryptic for me. Is it not able to form a regular expression with a ByteString argument? From the documentation of Text.Regex.Posix it seems that it should be. Maybe it's because I am trying to read (r!!1) :: Double which I am having issues with also. Is (r!!1) a ByteString? And if so, how would I convert that to a Double? It's error message from regex library you use. I can't say what exactly it means, I never used it. But most likely it cannot work with bytestrings. Most other languages rely on regexp as go to tool for parsing. In haskell main parsing tools are parser combinators such as parsec[1] or attoparsec[2]. Parsec is more generic and attoparsec is much faster. In attachment program which uses attoparsec for parsing it's about 2times slower than C++ example posted in the thread. [1] http://hackage.haskell.org/package/parsec [2] http://hackage.haskell.org/package/attoparsec {-# LANGUAGE OverloadedStrings #-} import Control.Applicative import Data.Histogram.Fill import Data.Histogram (Histogram) import Data.Attoparsec.Text.Lazy(parse,Result(..)) import Data.Attoparsec.Text hiding (parse,Done,Fail) import qualified Data.Text.Lazyas T import qualified Data.Text.Lazy.IO as T import Prelude hiding (takeWhile) hb :: HBuilder Double (Histogram LogBinD Int) hb = forceInt - mkSimple (logBinDN 1e-8 10 10) streamBS :: T.Text - [Double] streamBS bs | T.null bs = [] | otherwise = case parse go bs of Done rest x - x : streamBS rest Fail _ cxt e - error $ e ++ ++ show cxt where num = decimal :: Parser Int go = string matrix( * num * char ',' * num * char ')' * takeWhile (==' ') * char '=' * takeWhile (== ' ') * double * endOfLine main :: IO () main = do print . fillBuilder hb . streamBS = T.getContents ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On 13.02.2013 21:41, Brandon Allbery wrote: On Wed, Feb 13, 2013 at 11:32 AM, Nicolas Bock nicolasb...@gmail.com mailto:nicolasb...@gmail.com wrote: Since I have very little experience with Haskell and am not used to Haskell-think yet, I don't quite understand your statement that regexes are seen as foreign to Haskell-think. Could you elaborate? What would a more native solution look like? From what I have learned so far, it seems to me that Haskell is a lot about clear, The native solution is a parser like parsec/attoparsec. The problem with regexes is that you can't at compile time verify that, for example, you have as many matching groups in the regex as the code using it expects, nor does an optional matching group behave as a Maybe like it should; nor are there nice ways to recover. A parser gives you full control and better compile time checking, and is generally recommended. Regexps only have this problem if they are compiled from string. Nothing prevents from building them using combinators. regex-applicaitve[1] uses this approach and quite nice to use. [1] http://hackage.haskell.org/package/regex-applicative ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On 13.02.2013 21:41, Brandon Allbery wrote: The native solution is a parser like parsec/attoparsec. Aleksey Khudyakov alexey.sklad...@gmail.com replied Regexps only have this problem if they are compiled from string. Nothing prevents from building them using combinators. regex-applicative[1] uses this approach and quite nice to use. [1] http://hackage.haskell.org/package/regex-applicative That _is_ a nice package, but it _is_ 'a parser like parsec/attoparsec'. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On Wed, Feb 13, 2013 at 5:45 PM, o...@cs.otago.ac.nz wrote: On 13.02.2013 21:41, Brandon Allbery wrote: The native solution is a parser like parsec/attoparsec. Aleksey Khudyakov alexey.sklad...@gmail.com replied Regexps only have this problem if they are compiled from string. Nothing prevents from building them using combinators. regex-applicative[1] uses this approach and quite nice to use. [1] http://hackage.haskell.org/package/regex-applicative That _is_ a nice package, but it _is_ 'a parser like parsec/attoparsec'. Well, yes; it's a case in point. -- brandon s allbery kf8nh sine nomine associates allber...@gmail.com ballb...@sinenomine.net unix, openafs, kerberos, infrastructure, xmonadhttp://sinenomine.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On 2/13/13 11:32 AM, Nicolas Bock wrote: Since I have very little experience with Haskell and am not used to Haskell-think yet, I don't quite understand your statement that regexes are seen as foreign to Haskell-think. Could you elaborate? What would a more native solution look like? From what I have learned so far, it seems to me that Haskell is a lot about clear, concise, and well structured code. I find regexes extremely compact and powerful, allowing for very concise code, which should fit the bill perfectly, or shouldn't it? Regexes are powerful and concise for recognizing regular languages. They are not, however, very good for *parsing* regular languages; nor can they handle non-regular languages (unless you're relying on the badness of pcre). In other languages people press regexes into service for parsing because the alternative is using an external DSL like lex/yacc, javaCC, etc. Whereas, in Haskell, we have powerful and concise tools for parsing context-free languages and beyond (e.g., parsec, attoparsec). -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
wren ng thornton wrote: Regexes are powerful and concise for recognizing regular languages. They are not, however, very good for *parsing* regular languages; nor can they handle non-regular languages (unless you're relying on the badness of pcre). In other languages people press regexes into service for parsing because the alternative is using an external DSL like lex/yacc, javaCC, etc. Whereas, in Haskell, we have powerful and concise tools for parsing context-free languages and beyond (e.g., parsec, attoparsec). This cannot be emphasized heavily enough. Once you have learnt how to use one or more of these parsec libraries they will become your main tool for parsing everything from complex input languages like haskell itself, all the way down to relatively simple config files. Parsec style parsers are built up out of small composable (and more importantly reusable) combinators, that are easier to read and easier to maintain than anything other than the most trivial regex. Erik -- -- Erik de Castro Lopo http://www.mega-nerd.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
Thanks so much for your efforts, this really helped! Thanks again, nick On Sat, Feb 9, 2013 at 11:54 PM, Branimir Maksimovic bm...@hotmail.comwrote: Here is haskell version that is faster than python, almost as fast as c++. You need to install bytestring-lexing package for readDouble. bmaxa@maxa:~/haskell$ time ./printMatrixDecay - output.txt read 16384 matrix elements (128x128 = 16384) [0.00e0, 1.00e-8) = 0 (0.00%) 0 [1.00e-8, 1.00e-7) = 0 (0.00%) 0 [1.00e-7, 1.00e-6) = 0 (0.00%) 0 [1.00e-6, 1.00e-5) = 0 (0.00%) 0 [1.00e-5, 1.00e-4) = 1 (0.01%) 1 [1.00e-4, 1.00e-3) = 17 (0.10%) 18 [1.00e-3, 1.00e-2) = 155 (0.95%) 173 [1.00e-2, 1.00e-1) = 1434 (8.75%) 1607 [1.00e-1, 1.00e0) = 14777 (90.19%) 16384 [1.00e0, 2.00e0) = 0 (0.00%) 16384 real0m0.031s user0m0.028s sys 0m0.000s bmaxa@maxa:~/haskell$ time ./printMatrixDecay.py - output.txt (-) read 16384 matrix elements (128x128 = 16384) [0.00e+00, 1.00e-08) = 0 (0.00%) 0 [1.00e-08, 1.00e-07) = 0 (0.00%) 0 [1.00e-07, 1.00e-06) = 0 (0.00%) 0 [1.00e-06, 1.00e-05) = 0 (0.00%) 0 [1.00e-05, 1.00e-04) = 1 (0.00%) 1 [1.00e-04, 1.00e-03) = 17 (0.00%) 18 [1.00e-03, 1.00e-02) = 155 (0.00%) 173 [1.00e-02, 1.00e-01) = 1434 (0.00%) 1607 [1.00e-01, 1.00e+00) = 14777 (0.00%) 16384 [1.00e+00, 2.00e+00) = 0 (0.00%) 16384 real0m0.081s user0m0.080s sys 0m0.000s Program follows... import System.Environment import Text.Printf import Text.Regex.PCRE import Data.Maybe import Data.Array.IO import Data.Array.Unboxed import qualified Data.ByteString.Char8 as B import Data.ByteString.Lex.Double (readDouble) strataBounds :: UArray Int Double strataBounds = listArray (0,10) [ 0.0, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0, 2.0 ] newStrataCounts :: IO(IOUArray Int Int) newStrataCounts = newArray (bounds strataBounds) 0 main = do l - B.getContents let a = B.lines l strataCounts - newStrataCounts n - calculate strataCounts a 0 let printStrataCounts :: IO () printStrataCounts = do let s = round $ sqrt (fromIntegral n::Double) :: Int printf read %d matrix elements (%dx%d = %d)\n n s s n printStrataCounts' 0 0 printStrataCounts' :: Int - Int - IO () printStrataCounts' i total | i (snd $ bounds strataBounds) = do count - readArray strataCounts i let p :: Double p = (100.0*(fromIntegral count) :: Double)/(fromIntegral n :: Double) printf [%1.2e, %1.2e) = %i (%1.2f%%) %i\n (strataBounds ! i) (strataBounds ! (i+1)) count p (total + count) printStrataCounts' (i+1) (total+count) | otherwise = return () printStrataCounts calculate :: IOUArray Int Int - [B.ByteString] - Int - IO Int calculate _ [] n = return n calculate counts (l:ls) n = do let a = case getAllTextSubmatches $ l =~ B.pack matrix.*= ([0-9eE.+-]+)$ :: [B.ByteString] of [_,v] - Just (readDouble v) :: Maybe (Maybe (Double,B.ByteString)) _ - Nothing b = (fst.fromJust.fromJust) a loop :: Int - IO() loop i | i (snd $ bounds strataBounds) = if (b = (strataBounds ! i)) (b (strataBounds ! (i+1))) then do c - readArray counts i writeArray counts i (c+1) else loop (i+1) | otherwise = return () if isNothing a then calculate counts ls n else do loop 0 calculate counts ls (n+1) -- From: nicolasb...@gmail.com Date: Fri, 8 Feb 2013 12:26:09 -0700 To: haskell-cafe@haskell.org Subject: [Haskell-cafe] performance question Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. To be concrete: $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay real0m2.655s user0m2.677s sys 0m0.095s $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py - real0m0.445s user0m0.615s sys 0m0.032s The Haskell script was compiled with ghc --make printMatrixDecay.hs. Could you have a look at the script and give me some pointers as to where I could improve it, both in terms of performance and also generally, as I am very new to Haskell. Thanks already, nick ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On Tue, 12 Feb 2013 15:57:37 -0700 Nicolas Bock nicolasb...@gmail.com wrote: Here is haskell version that is faster than python, almost as fast as c++. You need to install bytestring-lexing package for readDouble. I was hoping Branimir could comment on how the improvements were allocated. how much is due to text.regex.pcre (which looks to be a wrapper to libpcre) ? how much can be attributed to using data.bytestring ? you have to admit, it's amazing how well a byte-compiled, _dynamically typed_ interpreter can do against an actualy native code compiler. Can't regex be done effectively in haskell ? Is it something that can't be done, or is it just such minimal effort to link to pcre that it's not worth the trouble ? Brian import Text.Regex.PCRE import Data.Maybe import Data.Array.IO import Data.Array.Unboxed import qualified Data.ByteString.Char8 as B import Data.ByteString.Lex.Double (readDouble) strataBounds :: UArray Int Double strataBounds = listArray (0,10) [ 0.0, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0, 2.0 ] newStrataCounts :: IO(IOUArray Int Int) newStrataCounts = newArray (bounds strataBounds) 0 main = do l - B.getContents let a = B.lines l strataCounts - newStrataCounts n - calculate strataCounts a 0 let printStrataCounts :: IO () printStrataCounts = do let s = round $ sqrt (fromIntegral n::Double) :: Int printf read %d matrix elements (%dx%d = %d)\n n s s n printStrataCounts' 0 0 printStrataCounts' :: Int - Int - IO () printStrataCounts' i total | i (snd $ bounds strataBounds) = do count - readArray strataCounts i let p :: Double p = (100.0*(fromIntegral count) :: Double)/(fromIntegral n :: Double) printf [%1.2e, %1.2e) = %i (%1.2f%%) %i\n (strataBounds ! i) (strataBounds ! (i+1)) count p (total + count) printStrataCounts' (i+1) (total+count) | otherwise = return () printStrataCounts calculate :: IOUArray Int Int - [B.ByteString] - Int - IO Int calculate _ [] n = return n calculate counts (l:ls) n = do let a = case getAllTextSubmatches $ l =~ B.pack matrix.*= ([0-9eE.+-]+)$ :: [B.ByteString] of [_,v] - Just (readDouble v) :: Maybe (Maybe (Double,B.ByteString)) _ - Nothing b = (fst.fromJust.fromJust) a loop :: Int - IO() loop i | i (snd $ bounds strataBounds) = if (b = (strataBounds ! i)) (b (strataBounds ! (i+1))) then do c - readArray counts i writeArray counts i (c+1) else loop (i+1) | otherwise = return () if isNothing a then calculate counts ls n else do loop 0 calculate counts ls (n+1) -- From: nicolasb...@gmail.com Date: Fri, 8 Feb 2013 12:26:09 -0700 To: haskell-cafe@haskell.org Subject: [Haskell-cafe] performance question Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. To be concrete: $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay real0m2.655s user0m2.677s sys 0m0.095s $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py - real0m0.445s user0m0.615s sys 0m0.032s The Haskell script was compiled with ghc --make printMatrixDecay.hs. Could you have a look at the script and give me some pointers as to where I could improve it, both in terms of performance and also generally, as I am very new to Haskell. Thanks already, nick ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On Tuesday, February 12, 2013, wrote: On Tue, 12 Feb 2013 15:57:37 -0700 Nicolas Bock nicolasb...@gmail.com javascript:; wrote: Here is haskell version that is faster than python, almost as fast as c++. You need to install bytestring-lexing package for readDouble. I was hoping Branimir could comment on how the improvements were allocated. how much is due to text.regex.pcre (which looks to be a wrapper to libpcre) ? how much can be attributed to using data.bytestring ? you have to admit, it's amazing how well a byte-compiled, _dynamically typed_ interpreter can do against an actualy native code compiler. Can't regex be done effectively in haskell ? Is it something that can't be done, or is it just such minimal effort to link to pcre that it's not worth the trouble ? I think that there are two bottlenecks: the regex engine, and converting a bytestring to a double. There doesn't appear to be a fast and accurate strtod implementation for Haskell, and the faster regex implementations that I could find appear to be unmaintained. Brian import Text.Regex.PCRE import Data.Maybe import Data.Array.IO import Data.Array.Unboxed import qualified Data.ByteString.Char8 as B import Data.ByteString.Lex.Double (readDouble) strataBounds :: UArray Int Double strataBounds = listArray (0,10) [ 0.0, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0, 2.0 ] newStrataCounts :: IO(IOUArray Int Int) newStrataCounts = newArray (bounds strataBounds) 0 main = do l - B.getContents let a = B.lines l strataCounts - newStrataCounts n - calculate strataCounts a 0 let printStrataCounts :: IO () printStrataCounts = do let s = round $ sqrt (fromIntegral n::Double) :: Int printf read %d matrix elements (%dx%d = %d)\n n s s n printStrataCounts' 0 0 printStrataCounts' :: Int - Int - IO () printStrataCounts' i total | i (snd $ bounds strataBounds) = do count - readArray strataCounts i let p :: Double p = (100.0*(fromIntegral count) :: Double)/(fromIntegral n :: Double) printf [%1.2e, %1.2e) = %i (%1.2f%%) %i\n (strataBounds ! i) (strataBounds ! (i+1)) count p (total + count) printStrataCounts' (i+1) (total+count) | otherwise = return () printStrataCounts calculate :: IOUArray Int Int - [B.ByteString] - Int - IO Int calculate _ [] n = return n calculate counts (l:ls) n = do let a = case getAllTextSubmatches $ l =~ B.pack matrix.*= ([0-9eE.+-]+)$ :: [B.ByteString] of [_,v] - Just (readDouble v) :: Maybe (Maybe (Double,B.ByteString)) _ - Nothing b = (fst.fromJust.fromJust) a loop :: Int - IO() loop i | i (snd $ bounds strataBounds) = if (b = (strataBounds ! i)) (b (strataBounds ! (i+1))) then do c - readArray counts i writeArray counts i (c+1) else loop (i+1) | otherwise = return () if isNothing a then calculate counts ls n else do loop 0 calculate counts ls (n+1) -- From: nicolasb...@gmail.com javascript:; Date: Fri, 8 Feb 2013 12:26:09 -0700 To: haskell-cafe@haskell.org javascript:; Subject: [Haskell-cafe] performance question Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. To be concrete: $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay real0m2.655s user0m2.677s sys 0m0.095s $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py - real0m0.445s user0m0.615s sys 0m0.032s The Haskell script was compiled with ghc --make printMatrixDecay.hs. Could you have a look at the script and give me some pointers as to where I could improve it, both in terms of performance and also generally, as I am very new to Haskell. Thanks already, nick ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org javascript:; http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org
Re: [Haskell-cafe] performance question
On Fri, Feb 8, 2013 at 1:23 PM, Aleksey Khudyakov alexey.sklad...@gmail.com wrote: On 08.02.2013 23:26, Nicolas Bock wrote: Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. General performance hints 1) Strings are slow. Fast alternatives are text[1] for textual data and bytestrings[2] for binary data. I can't say anything about performance of Text.Regex.Posix. Hi Aleksey, could you show me how I would use ByteString? I can't get the script to compile. It's complaining that: No instance for (RegexContext Regex Data.ByteString.ByteString (AllTextSubmatches [] a0)) which is too cryptic for me. Is it not able to form a regular expression with a ByteString argument? From the documentation of Text.Regex.Posix it seems that it should be. Maybe it's because I am trying to read (r!!1) :: Double which I am having issues with also. Is (r!!1) a ByteString? And if so, how would I convert that to a Double? Thanks, nick 2) Appending list wrong operation to do in performance sensitive code. (++) traverses its first argument so it's O(n) in its length. What exactly are you tryeing to do? Create a histogram? The Haskell script was compiled with ghc --make printMatrixDecay.hs. If you want performance you absolutely should use -O2. [1] http://hackage.haskell.org/**package/texthttp://hackage.haskell.org/package/text [2] http://hackage.haskell.org/**package/bytestringhttp://hackage.haskell.org/package/bytestring __**_ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafehttp://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On Fri, Feb 8, 2013 at 1:23 PM, Aleksey Khudyakov alexey.sklad...@gmail.com wrote: On 08.02.2013 23:26, Nicolas Bock wrote: Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. General performance hints 1) Strings are slow. Fast alternatives are text[1] for textual data and bytestrings[2] for binary data. I can't say anything about performance of Text.Regex.Posix. 2) Appending list wrong operation to do in performance sensitive code. (++) traverses its first argument so it's O(n) in its length. What exactly are you tryeing to do? Create a histogram? The Haskell script was compiled with ghc --make printMatrixDecay.hs. If you want performance you absolutely should use -O2. Another question: When I compile the code with --make and -O2, and then run it on a larger matrix, I get this error message: $ ./createMatrixDump.py -N 512 | ./printMatrixDecay Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it. When I use runghc instead, I don't get an error. What does this error mean, and how do I fix it? Thanks, nick [1] http://hackage.haskell.org/**package/texthttp://hackage.haskell.org/package/text [2] http://hackage.haskell.org/**package/bytestringhttp://hackage.haskell.org/package/bytestring __**_ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafehttp://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
I've been playing with your example to optimize it a bit, I have to run but here's what I have so far. It's about as fast as the Python code, I'll make it faster when I have more time over the next few days. See https://gist.github.com/etrepum/4747507 and https://gist.github.com/etrepum/4747507/revisions On Sat, Feb 9, 2013 at 2:35 PM, Nicolas Bock nicolasb...@gmail.com wrote: On Fri, Feb 8, 2013 at 1:23 PM, Aleksey Khudyakov alexey.sklad...@gmail.com wrote: On 08.02.2013 23:26, Nicolas Bock wrote: Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. General performance hints 1) Strings are slow. Fast alternatives are text[1] for textual data and bytestrings[2] for binary data. I can't say anything about performance of Text.Regex.Posix. 2) Appending list wrong operation to do in performance sensitive code. (++) traverses its first argument so it's O(n) in its length. What exactly are you tryeing to do? Create a histogram? The Haskell script was compiled with ghc --make printMatrixDecay.hs. If you want performance you absolutely should use -O2. Another question: When I compile the code with --make and -O2, and then run it on a larger matrix, I get this error message: $ ./createMatrixDump.py -N 512 | ./printMatrixDecay Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it. When I use runghc instead, I don't get an error. What does this error mean, and how do I fix it? Thanks, nick [1] http://hackage.haskell.org/**package/texthttp://hackage.haskell.org/package/text [2] http://hackage.haskell.org/**package/bytestringhttp://hackage.haskell.org/package/bytestring __**_ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafehttp://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
Here is haskell version that is faster than python, almost as fast as c++.You need to install bytestring-lexing package for readDouble. bmaxa@maxa:~/haskell$ time ./printMatrixDecay - output.txtread 16384 matrix elements (128x128 = 16384)[0.00e0, 1.00e-8) = 0 (0.00%) 0[1.00e-8, 1.00e-7) = 0 (0.00%) 0[1.00e-7, 1.00e-6) = 0 (0.00%) 0[1.00e-6, 1.00e-5) = 0 (0.00%) 0[1.00e-5, 1.00e-4) = 1 (0.01%) 1[1.00e-4, 1.00e-3) = 17 (0.10%) 18[1.00e-3, 1.00e-2) = 155 (0.95%) 173[1.00e-2, 1.00e-1) = 1434 (8.75%) 1607[1.00e-1, 1.00e0) = 14777 (90.19%) 16384[1.00e0, 2.00e0) = 0 (0.00%) 16384 real0m0.031suser0m0.028ssys 0m0.000sbmaxa@maxa:~/haskell$ time ./printMatrixDecay.py - output.txt(-) read 16384 matrix elements (128x128 = 16384)[0.00e+00, 1.00e-08) = 0 (0.00%) 0[1.00e-08, 1.00e-07) = 0 (0.00%) 0[1.00e-07, 1.00e-06) = 0 (0.00%) 0[1.00e-06, 1.00e-05) = 0 (0.00%) 0[1.00e-05, 1.00e-04) = 1 (0.00%) 1[1.00e-04, 1.00e-03) = 17 (0.00%) 18[1.00e-03, 1.00e-02) = 155 (0.00%) 173[1.00e-02, 1.00e-01) = 1434 (0.00%) 1607[1.00e-01, 1.00e+00) = 14777 (0.00%) 16384[1.00e+00, 2.00e+00) = 0 (0.00%) 16384 real0m0.081suser0m0.080ssys 0m0.000s Program follows... import System.Environmentimport Text.Printfimport Text.Regex.PCREimport Data.Maybeimport Data.Array.IOimport Data.Array.Unboxedimport qualified Data.ByteString.Char8 as Bimport Data.ByteString.Lex.Double (readDouble) strataBounds :: UArray Int DoublestrataBounds = listArray (0,10) [ 0.0, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0, 2.0 ] newStrataCounts :: IO(IOUArray Int Int)newStrataCounts = newArray (bounds strataBounds) 0 main = dol - B.getContentslet a = B.lines lstrataCounts - newStrataCountsn - calculate strataCounts a 0let printStrataCounts :: IO ()printStrataCounts = dolet s = round $ sqrt (fromIntegral n::Double) :: Intprintf read %d matrix elements (%dx%d = %d)\n n s s nprintStrataCounts' 0 0 printStrataCounts' :: Int - Int - IO ()printStrataCounts' i total | i (snd $ bounds strataBounds) = docount - readArray strataCounts ilet p :: Double p = (100.0*(fromIntegral count) :: Double)/(fromIntegral n :: Double)printf [%1.2e, %1.2e) = %i (%1.2f%%) %i\n (strataBounds ! i) (strataBounds ! (i+1)) count p (total + count) printStrataCounts' (i+1) (total+count)| otherwise = return () printStrataCounts calculate :: IOUArray Int Int - [B.ByteString] - Int - IO Intcalculate _ [] n = return ncalculate counts (l:ls) n = dolet a = case getAllTextSubmatches $ l =~ B.pack matrix.*= ([0-9eE.+-]+)$ :: [B.ByteString] of[_,v] - Just (readDouble v) :: Maybe (Maybe (Double,B.ByteString))_ - Nothingb = (fst.fromJust.fromJust) aloop :: Int - IO()loop i| i (snd $ bounds strataBounds) = if (b = (strataBounds ! i)) (b (strataBounds ! (i+1)))then doc - readArray counts iwriteArray counts i (c+1) else loop (i+1)| otherwise = return ()if isNothing athen calculate counts ls nelse do loop 0calculate counts ls (n+1) From: nicolasb...@gmail.com Date: Fri, 8 Feb 2013 12:26:09 -0700 To: haskell-cafe@haskell.org Subject: [Haskell-cafe] performance question Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. To be concrete: $ time ./createMatrixDump.py -N 128 | ./printMatrixDecayreal0m2.655s user0m2.677ssys 0m0.095s $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py - real0m0.445suser0m0.615ssys 0m0.032s The Haskell script was compiled with ghc --make printMatrixDecay.hs. Could you have a look at the script and give me some pointers as to where I could improve it, both in terms of performance and also generally, as I am very new to Haskell. Thanks already, nick ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On 08.02.2013 23:26, Nicolas Bock wrote: Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. General performance hints 1) Strings are slow. Fast alternatives are text[1] for textual data and bytestrings[2] for binary data. I can't say anything about performance of Text.Regex.Posix. 2) Appending list wrong operation to do in performance sensitive code. (++) traverses its first argument so it's O(n) in its length. What exactly are you tryeing to do? Create a histogram? The Haskell script was compiled with ghc --make printMatrixDecay.hs. If you want performance you absolutely should use -O2. [1] http://hackage.haskell.org/package/text [2] http://hackage.haskell.org/package/bytestring ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
Do you mind posting createMatrixDump.py and printMatrixDecay.py? That would certainly make it easier to help you. On Fri, Feb 8, 2013 at 11:26 AM, Nicolas Bock nicolasb...@gmail.com wrote: Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. To be concrete: $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay real0m2.655s user0m2.677s sys 0m0.095s $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py - real0m0.445s user0m0.615s sys 0m0.032s The Haskell script was compiled with ghc --make printMatrixDecay.hs. Could you have a look at the script and give me some pointers as to where I could improve it, both in terms of performance and also generally, as I am very new to Haskell. Thanks already, nick ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
Sorry, should have done this right away. Here are the other two scripts. On Fri, Feb 8, 2013 at 1:45 PM, Bob Ippolito b...@redivi.com wrote: Do you mind posting createMatrixDump.py and printMatrixDecay.py? That would certainly make it easier to help you. On Fri, Feb 8, 2013 at 11:26 AM, Nicolas Bock nicolasb...@gmail.comwrote: Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. To be concrete: $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay real0m2.655s user0m2.677s sys 0m0.095s $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py - real0m0.445s user0m0.615s sys 0m0.032s The Haskell script was compiled with ghc --make printMatrixDecay.hs. Could you have a look at the script and give me some pointers as to where I could improve it, both in terms of performance and also generally, as I am very new to Haskell. Thanks already, nick ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe createMatrixDump.py Description: Binary data printMatrixDecay.py Description: Binary data ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On Fri, Feb 8, 2013 at 1:23 PM, Aleksey Khudyakov alexey.sklad...@gmail.com wrote: On 08.02.2013 23:26, Nicolas Bock wrote: Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. General performance hints 1) Strings are slow. Fast alternatives are text[1] for textual data and bytestrings[2] for binary data. I can't say anything about performance of Text.Regex.Posix. Thanks for the suggestion, I will try that. 2) Appending list wrong operation to do in performance sensitive code. (++) traverses its first argument so it's O(n) in its length. What exactly are you tryeing to do? Create a histogram? Yes, a histogram. The binning code is really a little awkward. I haven't gotten used to thinking in terms of inmutable objects yet and this list appending is really a pretty bad hack to kind of allow me to increment the bin counts. How would one do this more haskellishish? The Haskell script was compiled with ghc --make printMatrixDecay.hs. If you want performance you absolutely should use -O2. I'll try that. [1] http://hackage.haskell.org/**package/texthttp://hackage.haskell.org/package/text [2] http://hackage.haskell.org/**package/bytestringhttp://hackage.haskell.org/package/bytestring __**_ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafehttp://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On 09.02.2013 01:02, Nicolas Bock wrote: Yes, a histogram. The binning code is really a little awkward. I haven't gotten used to thinking in terms of inmutable objects yet and this list appending is really a pretty bad hack to kind of allow me to increment the bin counts. How would one do this more haskellishish? Histogramming is a bit awkward in haskell. If we want to stick to immutable data types best choice is to have map bin → number of entries. For every new entry select bin and add 1 to its content. But if we want to store data in array we have to deal with mutable state. It's not realistic to copy array on every update. For that case I wrote I library histogram-fill[1]. Below is program which does approximately same thing. import Data.Histogram.Fill import Data.Histogram (Histogram) hb :: HBuilder String (Histogram LogBinD Int) hb = forceInt - mkSimple (logBinDN 1e-8 10 10) - read main :: IO () main = do l - getContents print $ fillBuilder hb $ lines l I cheated and used sed to strip unused data. It uses String so it's still slower than python. $ time (python gen.py -N 300 | sed 's/.*=//' | ./printMatrixDecay ) real0m0.958s user0m2.096s sys 0m0.052s $ time (python gen.py -N 300 | python printMatrixDecay.py -) real0m0.590s user0m0.952s sys 0m0.016s [1] http://hackage.haskell.org/package/histogram-fill ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
Heh, I have wrote c++ version and that is much faster than python ;) bmaxa@maxa:~/haskell$ time ./createMatrixDump.py -N 128 output.txt real0m0.041suser0m0.040ssys 0m0.000sbmaxa@maxa:~/haskell$ time ./printMatrixDecay.py - output.txt(-) read 16384 matrix elements (128x128 = 16384)[0.00e+00, 1.00e-08) = 0 (0.00%) 0[1.00e-08, 1.00e-07) = 0 (0.00%) 0[1.00e-07, 1.00e-06) = 0 (0.00%) 0[1.00e-06, 1.00e-05) = 0 (0.00%) 0[1.00e-05, 1.00e-04) = 1 (0.00%) 1[1.00e-04, 1.00e-03) = 15 (0.00%) 16[1.00e-03, 1.00e-02) = 149 (0.00%) 165[1.00e-02, 1.00e-01) = 1425 (0.00%) 1590[1.00e-01, 1.00e+00) = 14794 (0.00%) 16384[1.00e+00, 2.00e+00) = 0 (0.00%) 16384 real0m0.081suser0m0.072ssys 0m0.008sbmaxa@maxa:~/haskell$ time ./printMatrixDecay output.txtread 16384 matrix elements (128x128 = 16384)[0.00e+00, 1.00e-08) = 0 (0.00%) 0[1.00e-08, 1.00e-07) = 0 (0.00%) 0[1.00e-07, 1.00e-06) = 0 (0.00%) 0[1.00e-06, 1.00e-05) = 0 (0.00%) 0[1.00e-05, 1.00e-04) = 1 (0.01%) 1[1.00e-04, 1.00e-03) = 15 (0.09%) 16[1.00e-03, 1.00e-02) = 149 (0.91%) 165[1.00e-02, 1.00e-01) = 1425 (8.70%) 1590[1.00e-01, 1.00e+00) = 14794 (90.30%) 16384[1.00e+00, 2.00e+00) = 0 (0.00%) 16384 real0m0.018suser0m0.012ssys 0m0.004s unfortunately g++ does not have regex implemented yet so I used libpcre ... #include pcre.h#include sstream#include cstdio#include cmath#include iostream#include stdexcept#include vector template class Fvoid regex(const std::string in, const std::string pattern,int n,F f){int ovec[3*n],position;const char* error; int errorpos; pcre* pe = pcre_compile(pattern.c_str(),0,error,errorpos,0); if(!pe)throw std::runtime_error(error); pcre_extra* extra=pcre_study(pe,0,error); for(position = 0; pcre_exec(pe,extra,in.c_str(),in.size(),position,0,ovec,3*n)=0; position = ovec[1])f(position,ovec);f(position,ovec);pcre_free(extra); pcre_free(pe); } int main(){ std::ios::sync_with_stdio(false); std::ostringstream oss; oss std::cin.rdbuf(); const std::string in = oss.str(); std::vectordouble strataBounds = { 0.0, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0, 2.0 }; std::vectorint strataCounts(strataBounds.size()); unsigned N = 0; auto f = [](int position,int* ovec) {if(int(position) ovec[0])return;++N;double aij = 0.0;std::istringstream iss(in.substr(ovec[2],ovec[3]-ovec[2]));iss aij;aij=fabs(aij); for(unsigned i = 0; i strataBounds.size() - 1; ++i){ if(aij = strataBounds[i] aij strataBounds[i+1]) {++strataCounts[i]; break; }} }; regex(in,matrix.*= ([0-9.eE+-]+)\n,2,f); printf(read %d matrix elements (%dx%d = %d)\n,N,int(sqrt(N)),int(sqrt(N)),N); int total = 0; for(unsigned i = 0; i strataBounds.size()-1;++i) {total += strataCounts[i];printf([%1.2e, %1.2e) = %d (%1.2f%%) %d\n, strataBounds[i], strataBounds[i+1],strataCounts[i], 100*(double(strataCounts[i])/N), total); }} From: nicolasb...@gmail.com Date: Fri, 8 Feb 2013 12:26:09 -0700 To: haskell-cafe@haskell.org Subject: [Haskell-cafe] performance question Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. To be concrete: $ time ./createMatrixDump.py -N 128 | ./printMatrixDecayreal0m2.655s user0m2.677ssys 0m0.095s $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py - real0m0.445suser0m0.615ssys 0m0.032s The Haskell script was compiled with ghc --make printMatrixDecay.hs. Could you have a look at the script and give me some pointers as to where I could improve it, both in terms of performance and also generally, as I am very new to Haskell. Thanks already, nick ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
A lte reply, but if you still need to have circular module depency: 4.6.9. How to compile mutually recursive modules in http://www.haskell.org/ghc/docs/latest/html/users_guide/separate-compilation.html On 21 March 2010 01:31, Arnoldo Muller arnoldomul...@gmail.com wrote: Hello Daniel, Regarding your solution, can I apply {-# SPECIALISE ... #-} statements to datatypes I define? And if so, I am not able to import the datatypes to the module where binarySearch is. The problem is that if I import them a circular dependency is detected and the compiler gives an error. Is there a way of importing a datatype from another module do avoid this circular dependency? Thank you, Arnoldo On Thu, Mar 18, 2010 at 10:48 PM, Daniel Fischer daniel.is.fisc...@web.de wrote: Am Donnerstag 18 März 2010 21:57:34 schrieb Daniel Fischer: Contrary to my expectations, however, using unboxed arrays is slower than straight arrays (in my tests). However, a few {-# SPECIALISE #-} pragmas set the record straight. Specialising speeds up both, boxed and unboxed arrays, significantly, but now, for the specialised types, unboxed arrays are faster (note, however, that when the code for the binary search is in the same module as it is used, with optimisations, GHC will probably specialise it itself. If binarySearch is not exported, AFAIK, you can delete probably.). {-# LANGUAGE BangPatterns #-} module SATBinSearch (binarySearch) where import Data.Array.IArray import Data.Array.Base (unsafeAt) import Data.Bits {-# SPECIALISE binarySearch :: Double - Array Int Double - Int #-} {-# SPECIALISE binarySearch :: Int - Array Int Int - Int #-} {-# SPECIALISE binarySearch :: Bool - Array Int Bool - Int #-} {-# SPECIALISE binarySearch :: Char - Array Int Char - Int #-} {-# SPECIALISE binarySearch :: Float - Array Int Float - Int #-} binarySearch :: Ord a = a - Array Int a - Int binarySearch q a = go l h where (l,h) = bounds a go !lo !hi | hi lo = -(lo+1) | otherwise = case compare mv q of LT - go (m+1) hi EQ - m GT - go lo (m-1) where -- m = lo + (hi-lo) `quot` 2 m = (lo .. hi) + (lo `xor` hi) `shiftR` 1 mv = a `unsafeAt` m Use Data.Array.Unboxed and UArray if possible. Now the bit-fiddling instead of arithmetics makes a serious difference, about 20% for unboxed arrays, 17% for boxed arrays (Double), so I'd recommend that. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Ozgur Akgun ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Hello Daniel, Regarding your solution, can I apply {-# SPECIALISE ... #-} statements to datatypes I define? And if so, I am not able to import the datatypes to the module where binarySearch is. The problem is that if I import them a circular dependency is detected and the compiler gives an error. Is there a way of importing a datatype from another module do avoid this circular dependency? Thank you, Arnoldo On Thu, Mar 18, 2010 at 10:48 PM, Daniel Fischer daniel.is.fisc...@web.dewrote: Am Donnerstag 18 März 2010 21:57:34 schrieb Daniel Fischer: Contrary to my expectations, however, using unboxed arrays is slower than straight arrays (in my tests). However, a few {-# SPECIALISE #-} pragmas set the record straight. Specialising speeds up both, boxed and unboxed arrays, significantly, but now, for the specialised types, unboxed arrays are faster (note, however, that when the code for the binary search is in the same module as it is used, with optimisations, GHC will probably specialise it itself. If binarySearch is not exported, AFAIK, you can delete probably.). {-# LANGUAGE BangPatterns #-} module SATBinSearch (binarySearch) where import Data.Array.IArray import Data.Array.Base (unsafeAt) import Data.Bits {-# SPECIALISE binarySearch :: Double - Array Int Double - Int #-} {-# SPECIALISE binarySearch :: Int - Array Int Int - Int #-} {-# SPECIALISE binarySearch :: Bool - Array Int Bool - Int #-} {-# SPECIALISE binarySearch :: Char - Array Int Char - Int #-} {-# SPECIALISE binarySearch :: Float - Array Int Float - Int #-} binarySearch :: Ord a = a - Array Int a - Int binarySearch q a = go l h where (l,h) = bounds a go !lo !hi | hi lo = -(lo+1) | otherwise = case compare mv q of LT - go (m+1) hi EQ - m GT - go lo (m-1) where -- m = lo + (hi-lo) `quot` 2 m = (lo .. hi) + (lo `xor` hi) `shiftR` 1 mv = a `unsafeAt` m Use Data.Array.Unboxed and UArray if possible. Now the bit-fiddling instead of arithmetics makes a serious difference, about 20% for unboxed arrays, 17% for boxed arrays (Double), so I'd recommend that. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Am Donnerstag 18 März 2010 19:59:33 schrieb Arnoldo Muller: Hello! I am trying to implement a binary search function that returns the index of an exact or the (index + 1) where the item should be inserted in an array if the item to be searched is not found (I am not trying to insert data in the array) . Right now, the bottleneck of my program is in binarySearch', the function must be called a few billion times. If it's called often, and the arrays are 0-based and Int-indexed, import Data.Array.Base (unsafeAt) and replacing ! with `unsafeAt` should give a speed-up, though probably not terribly much. If you don't need the polymorphism and your array elements are unboxable, using UArray from Data.Array.Unboxed should be significantly faster. Do you have any ideas on how to improve the performance of this function? import Data.Array.IArray type IntArray a = Array Int a -- The array must be 0 indexed. binarySearch :: Ord a = a - IntArray a - Int binarySearch query array = let (low, high) = bounds array in binarySearch' query array low high binarySearch' :: Ord a = a - IntArray a - Int - Int - Int binarySearch' query array !low !high | low = high = let ! mid = low + ((high - low) `div` 2) ! midVal = array ! mid in next mid midVal | otherwise = -(low + 1) where next mid midVal | midVal query = binarySearch' query array (mid + 1) | high midVal query = binarySearch' query array low | (mid - 1) otherwise = mid No obvious performance killers, maybe the 'next' function costs a little and let ... in case compare midVal query of LT - binarySearch' query array (mid+1) high EQ - mid GT - binarySearch' query array low (mid-1) would be faster. Or moving binarySearch' from the top-level into binarySearch and eliminating the two static arguments may improve performance (I seem to remember that a static argument-transform for less than three or four non-function arguments can speed the code up or slow it down, so you'd have to test; for many arguments or function arguments it's pretty certain to give a speed-up, IIRC). binarySearch query array = go low high where (low,high) = bounds array go !l !h | h l= -(l+1) | mv query = go l (m-1) | mv == query = m | otherwise = go (m+1) h where m = l + (h-l) `quot` 2 mv = array `unsafeAt` m Thank you! Arnoldo Muller ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Am Donnerstag 18 März 2010 20:49:30 schrieb Daniel Fischer: Am Donnerstag 18 März 2010 19:59:33 schrieb Arnoldo Muller: Hello! I am trying to implement a binary search function that returns the index of an exact or the (index + 1) where the item should be inserted in an array if the item to be searched is not found (I am not trying to insert data in the array) . Right now, the bottleneck of my program is in binarySearch', the function must be called a few billion times. If it's called often, and the arrays are 0-based and Int-indexed, import Data.Array.Base (unsafeAt) and replacing ! with `unsafeAt` should give a speed-up, though probably not terribly much. If you don't need the polymorphism and your array elements are unboxable, using UArray from Data.Array.Unboxed should be significantly faster. Do you have any ideas on how to improve the performance of this function? would be faster. Or moving binarySearch' from the top-level into binarySearch and eliminating the two static arguments may improve performance (I seem to remember that a static argument-transform for less than three or four non-function arguments can speed the code up or slow it down, so you'd have to test; for many arguments or function arguments it's pretty certain to give a speed-up, IIRC). Yep, for me {-# LANGUAGE BangPatterns #-} module SATBinSearch (binarySearch) where import Data.Array.IArray import Data.Array.Base (unsafeAt) import Data.Bits binarySearch :: Ord a = a - Array Int a - Int binarySearch q a = go l h where (l,h) = bounds a go !lo !hi | hi lo = -(lo+1) | otherwise = case compare mv q of LT - go (m+1) hi EQ - m GT - go lo (m-1) where m = (lo .. hi) + (lo `xor` hi) `shiftR` 1 mv = a `unsafeAt` m chops ~40% off the time. 'unsafeAt' alone reduces time by ~10%, the local loop gives the biggest speedup, and the bit-fiddling instead of m = lo + (hi-lo) `quot` 2 something like 4%. If you don't like bit-fiddling or want your code to be portable to machines that don't use two's complement, the last few percent can be left alone. Contrary to my expectations, however, using unboxed arrays is slower than straight arrays (in my tests). Thank you! Arnoldo Muller ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Daniel Fischer wrote: If it's called often, and the arrays are 0-based and Int-indexed, import Data.Array.Base (unsafeAt) and replacing ! with `unsafeAt` should give a speed-up, though probably not terribly much. If you don't need the polymorphism and your array elements are unboxable, using UArray from Data.Array.Unboxed should be significantly faster. Beware that unboxed arrays are strict, and changing the strictness properties of your code can have non-obvious consequences... ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Am Donnerstag 18 März 2010 21:57:34 schrieb Daniel Fischer: Contrary to my expectations, however, using unboxed arrays is slower than straight arrays (in my tests). However, a few {-# SPECIALISE #-} pragmas set the record straight. Specialising speeds up both, boxed and unboxed arrays, significantly, but now, for the specialised types, unboxed arrays are faster (note, however, that when the code for the binary search is in the same module as it is used, with optimisations, GHC will probably specialise it itself. If binarySearch is not exported, AFAIK, you can delete probably.). {-# LANGUAGE BangPatterns #-} module SATBinSearch (binarySearch) where import Data.Array.IArray import Data.Array.Base (unsafeAt) import Data.Bits {-# SPECIALISE binarySearch :: Double - Array Int Double - Int #-} {-# SPECIALISE binarySearch :: Int - Array Int Int - Int #-} {-# SPECIALISE binarySearch :: Bool - Array Int Bool - Int #-} {-# SPECIALISE binarySearch :: Char - Array Int Char - Int #-} {-# SPECIALISE binarySearch :: Float - Array Int Float - Int #-} binarySearch :: Ord a = a - Array Int a - Int binarySearch q a = go l h where (l,h) = bounds a go !lo !hi | hi lo = -(lo+1) | otherwise = case compare mv q of LT - go (m+1) hi EQ - m GT - go lo (m-1) where -- m = lo + (hi-lo) `quot` 2 m = (lo .. hi) + (lo `xor` hi) `shiftR` 1 mv = a `unsafeAt` m Use Data.Array.Unboxed and UArray if possible. Now the bit-fiddling instead of arithmetics makes a serious difference, about 20% for unboxed arrays, 17% for boxed arrays (Double), so I'd recommend that. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
On 19/03/2010, at 08:48, Daniel Fischer wrote: Am Donnerstag 18 März 2010 21:57:34 schrieb Daniel Fischer: Contrary to my expectations, however, using unboxed arrays is slower than straight arrays (in my tests). However, a few {-# SPECIALISE #-} pragmas set the record straight. This is because without specialising, unsafeAt is a straight (inlineable) function call for boxed arrays but is overloaded and hence much slower for unboxed ones. In general, unboxed arrays tend to be slower in generic code. The only real solution is making functions such as binarySearch INLINE. Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Right now, the bottleneck of my program is in binarySearch', the function must be called a few billion times. Do you have any ideas on how to improve the performance of this function? Bast solution for speeding up is to write it in assembler! Ragards, Andrey -- View this message in context: http://old.nabble.com/Performance-question-tp27949969p27950864.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
On 2009 Feb 27, at 1:50, Ketil Malde wrote: Lennart Augustsson lenn...@augustsson.net writes: C's rand() function is very bad and should never be used really. On Linux (really GNU libc, I suppose) it is the same as random(). But for portability, one is encouraged to spend the two extra letters. I don't understand the details, but I think the Mersenne twister is much better in any case. Yes, much better than any LC PRNG including rand(), random(), and the System V/POSIX lrand48() family. That said, Linux making rand() == random() breaks portability in the case where you want a repeatable stream for testing purposes; as usual, Linux chucks portability out the window at any opportunity. (Sadly, POSIX permits this because of POSIX implementations for non-Unix hosts, although it does include a fixed-behavior PRNG as documentation for this case.) -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu electrical and computer engineering, carnegie mellon universityKF8NH PGP.sig Description: This is a digitally signed message part ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
The random() function is only marginally better than rand(). On Fri, Feb 27, 2009 at 6:50 AM, Ketil Malde ke...@malde.org wrote: Lennart Augustsson lenn...@augustsson.net writes: C's rand() function is very bad and should never be used really. On Linux (really GNU libc, I suppose) it is the same as random(). But for portability, one is encouraged to spend the two extra letters. I don't understand the details, but I think the Mersenne twister is much better in any case. -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
First thing I noticed, how about removing the sqrt in isInCircle: isInCircle :: (Floating a, Ord a) = (a,a) - Bool isInCircle (x,y) = x*x + y*y = 1.0 But you can remove sqrt from the C++ implementation as well, so it only improves the relative performance if the C++ implementation of sqrt is worse than its Haskell counterpart. Regards, Thomas hask...@kudling.de wrote: Hi, i have compared a C++ implementation with a Haskell implementation of the Monte Carlo Pi approximation: http://lennart.kudling.de/haskellPi/ The Haskell version is 100 times slower and i wonder whether i do something obvious wrong. Profiling says that the majority of the time is spend in main. But i have no idea where. Can someone give me a hint? Thanks, Lenny individual inherited COST CENTRE MODULE no.entries %time %alloc %time %alloc MAIN MAIN 1 0 0.00.0 100.0 100.0 mainMain 254 1 88.1 90.8 100.0 100.0 monteCarloPi Main 255 1 0.61.111.99.2 pairs Main 2571000 0.71.4 0.71.4 countHits Main 2561001 4.22.910.66.7 accumulateHitMain 25827852236 3.02.3 6.43.8 isInCircle Main 2593000 3.31.5 3.31.5 CAF:lit_r1A7Main 248 1 0.00.0 0.00.0 isInCircle Main 260 0 0.00.0 0.00.0 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
But you can remove sqrt from the C++ implementation as well, so it only improves the relative performance if the C++ implementation of sqrt is worse than its Haskell counterpart. Oops, of course I mean, you only improve if Haskell's implementation is worse than C++'s implementation :) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
But you can remove sqrt from the C++ implementation as well, so it only improves the relative performance if the C++ implementation of sqrt is worse than its Haskell counterpart. Oops, of course I mean, you only improve if Haskell's implementation is worse than C++'s implementation :) Actually i intend to compare real life performance of comparable algorithms. If sqrt is a problem in GHC/Haskell then this would be a valuable result. I want to know about these problems and not work around them at this point. Thanks, lenny ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Yep, this program will crawl. You can get reasonable numeric performance out of GHC, but you need to work at it. There is some advice in the GHC manual at http://www.haskell.org/ghc/docs/latest/html/users_guide/faster.html . The first thing I would do is replace your isInCircle :: (Floating a, Ord a) = (a,a) - Bool with isInCircle :: (Double, Double) - Bool Ben. On 26/02/2009, at 8:53 PM, hask...@kudling.de wrote: Hi, i have compared a C++ implementation with a Haskell implementation of the Monte Carlo Pi approximation: http://lennart.kudling.de/haskellPi/ The Haskell version is 100 times slower and i wonder whether i do something obvious wrong. Profiling says that the majority of the time is spend in main. But i have no idea where. Can someone give me a hint? Thanks, Lenny ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
hask...@kudling.de writes: Profiling says that the majority of the time is spend in main. But i have no idea where. Can someone give me a hint? Yes. Lots of them, but somehow, I suspect nobody tried your code. COST CENTRE MODULE no.entries %time %alloc %time %alloc MAIN MAIN 1 0 0.00.0 100.0 100.0 mainMain 254 1 88.1 90.8 100.0 100.0 monteCarloPi Main 255 1 0.61.111.99.2 pairs Main 2571000 0.71.4 0.71.4 countHits Main 2561001 4.22.910.66.7 accumulateHitMain 25827852236 3.02.3 6.43.8 isInCircle Main 2593000 3.31.5 3.31.5 CAF:lit_r1A7Main 248 1 0.00.0 0.00.0 isInCircle Main 260 0 0.00.0 0.00.0 Thomas van Noort: First thing I noticed, how about removing the sqrt in isInCircle: I did this. The result was the same - the sqrt doesn't matter, and perhaps it even gets optimized away? The first thing I would do is replace your isInCircle :: (Floating a, Ord a) = (a,a) - Bool with isInCircle :: (Double, Double) - Bool Then I did that. The result was still the same - I bet GHC is smart enought to specialize this on its own. The intresting thing about your profile is that all the time is spent in 'main'. Now, why would that be? I refactored a bit, and in partiuclar wrote this function: mkRandoms :: IO [Double] mkRandoms = do randomNumberGenerator - getStdGen return $ randoms randomNumberGenerator Here's the new profile (10^6 iterations): COST CENTREMODULE %time %alloc mkRandoms Main 96.1 96.0 accumulateHit Main 1.61.7 pairs Main 1.21.3 monteCarloPi Main 0.21.0 So it seems we're just tremendously lousy at generating random Doubles. Eliminating this bottleneck, the remaining 3.9% would put it at about 2x the C++ performance. -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Hi, thanks for your input. You can get reasonable numeric performance out of GHC, but you need to work at it. There is some advice in the GHC manual at http://www.haskell.org/ghc/docs/latest/html/users_guide/faster.html I am using -O2 and strictness already. Currently i can only imagine to define a data type in order to use unboxed Ints instead of the accumulator tuple. The thing is that i don't see in the profile output yet what to improve. There are some allocations going on in main, but i don't know what causes it. The first thing I would do is replace your isInCircle :: (Floating a, Ord a) = (a,a) - Bool with isInCircle :: (Double, Double) - Bool Can you point me to why that matters? Ben. On 26/02/2009, at 8:53 PM, hask...@kudling.de wrote: Hi, i have compared a C++ implementation with a Haskell implementation of the Monte Carlo Pi approximation: http://lennart.kudling.de/haskellPi/ The Haskell version is 100 times slower and i wonder whether i do something obvious wrong. Profiling says that the majority of the time is spend in main. But i have no idea where. Can someone give me a hint? Thanks, Lenny ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
On 26/02/2009, at 9:27 PM, hask...@kudling.de wrote: Currently i can only imagine to define a data type in order to use unboxed Ints instead of the accumulator tuple. That would probably help a lot. It would also help to use two separate Double# parameters instead of the tuple. The thing is that i don't see in the profile output yet what to improve. There are some allocations going on in main, but i don't know what causes it. The first thing I would do is replace your isInCircle :: (Floating a, Ord a) = (a,a) - Bool with isInCircle :: (Double, Double) - Bool Can you point me to why that matters? At the machine level, GHC treats the (Floating a, Ord a) as an extra argument to the function. This argument holds function pointers that tell it how to perform multiplication and = for the unknown type 'a'. If you use Double instead of 'a', then it's more likely to use the actual machine op. Ben. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Performance question
Ben Lippmeier wrote: The first thing I would do is replace your isInCircle :: (Floating a, Ord a) = (a,a) - Bool with isInCircle :: (Double, Double) - Bool Can you point me to why that matters? At the machine level, GHC treats the (Floating a, Ord a) as an extra argument to the function. This argument holds function pointers that tell it how to perform multiplication and = for the unknown type 'a'. If you use Double instead of 'a', then it's more likely to use the actual machine op. I'd recommend use of a SPECIALIZE pragma instead of rewriting the code itself: http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html (section 8.13.8) Ganesh === Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html === ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Here is a variant that uses mersenne-random-pure64 and works less than 2x slower than C++: - You don't need to compute samples count an extra time - You don't need to assemble double pairs from a list - Notice the strictness in randomDoublePairs: it doubled performance {-# LANGUAGE BangPatterns #-} import System.Random.Mersenne.Pure64 import System( getArgs ) import Data.List( foldl' ) isInCircle :: (Double,Double) - Bool isInCircle (!x,!y) = sqrt (x*x + y*y) = 1.0 accumulateHit :: Int - (Double,Double) - Int accumulateHit !hits pair = if isInCircle pair then hits + 1 else hits monteCarloPi :: Int - [(Double,Double)] - Double monteCarloPi n xs = 4.0 * fromIntegral hits / fromIntegral n where hits = foldl' accumulateHit 0 . take n $ xs randomDoublePairs g = let !(!x,!g') = randomDouble g !(!y,!g'') = randomDouble g' in (x,y):randomDoublePairs g'' main = do samples - (read . head) `fmap` getArgs randomNumbers - randomDoublePairs `fmap` newPureMT putStrLn . show $ monteCarloPi samples randomNumbers j...@*:~/montecarlo$ time ./mc-hs 1000 3.1417088 real0m1.141s user0m1.140s sys 0m0.000s j...@*:~/montecarlo$ time ./mc 1000 1000 3.14113 real0m0.693s user0m0.690s sys 0m0.000s 2009/2/26 Ben Lippmeier ben.lippme...@anu.edu.au: On 26/02/2009, at 9:27 PM, hask...@kudling.de wrote: Currently i can only imagine to define a data type in order to use unboxed Ints instead of the accumulator tuple. That would probably help a lot. It would also help to use two separate Double# parameters instead of the tuple. The thing is that i don't see in the profile output yet what to improve. There are some allocations going on in main, but i don't know what causes it. The first thing I would do is replace your isInCircle :: (Floating a, Ord a) = (a,a) - Bool with isInCircle :: (Double, Double) - Bool Can you point me to why that matters? At the machine level, GHC treats the (Floating a, Ord a) as an extra argument to the function. This argument holds function pointers that tell it how to perform multiplication and = for the unknown type 'a'. If you use Double instead of 'a', then it's more likely to use the actual machine op. Ben. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Eugene Kirpichov Web IR developer, market.yandex.ru ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
I replaced the standard random number generated with the one from mersenne-random. On my system this makes the resulting program about 14 times faster than the original. I also made a change to accumulateHit because it doesn't need to count to total. That is already known. {-# LANGUAGE BangPatterns #-} import System( getArgs ) import Data.List( foldl' ) import System.Random.Mersenne pairs :: [a] - [(a,a)] pairs [] = [] pairs (x:[]) = [] pairs (x:y:rest) = (x, y) : pairs rest isInCircle :: (Double, Double) - Bool isInCircle (x,y) = sqrt (x*x + y*y) = 1.0 accumulateHit :: Int - (Double, Double) - Int accumulateHit (!hits) pair | isInCircle pair = hits + 1 | otherwise = hits countHits :: [(Double, Double)] - Int countHits ps = foldl' accumulateHit 0 ps monteCarloPi :: Int - [(Double, Double)] - Double monteCarloPi n xs = 4.0 * fromIntegral hits / fromIntegral n where hits = countHits $ take n xs main = do args - getArgs let samples = read $ head args randomNumberGenerator - getStdGen randomNumbers - randoms randomNumberGenerator let res = monteCarloPi samples $ pairs randomNumbers putStrLn $ show $ res ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
accumulateHit because it doesn't need to count to total. That is already known. Well in theory i agree. But somone could feed a non-infite number of doubles. Then the argument n would not necessarily be the same as the number of random number pairs really used. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Do you think it would be feasable to replace the GHC implementation of System.Random with something like System.Random.Mersenne? Here is a variant that uses mersenne-random-pure64 and works less than 2x slower than C++: - You don't need to compute samples count an extra time - You don't need to assemble double pairs from a list - Notice the strictness in randomDoublePairs: it doubled performance {-# LANGUAGE BangPatterns #-} import System.Random.Mersenne.Pure64 import System( getArgs ) import Data.List( foldl' ) isInCircle :: (Double,Double) - Bool isInCircle (!x,!y) = sqrt (x*x + y*y) = 1.0 accumulateHit :: Int - (Double,Double) - Int accumulateHit !hits pair = if isInCircle pair then hits + 1 else hits monteCarloPi :: Int - [(Double,Double)] - Double monteCarloPi n xs = 4.0 * fromIntegral hits / fromIntegral n where hits = foldl' accumulateHit 0 . take n $ xs randomDoublePairs g = let !(!x,!g') = randomDouble g !(!y,!g'') = randomDouble g' in (x,y):randomDoublePairs g'' main = do samples - (read . head) `fmap` getArgs randomNumbers - randomDoublePairs `fmap` newPureMT putStrLn . show $ monteCarloPi samples randomNumbers j...@*:~/montecarlo$ time ./mc-hs 1000 3.1417088 real0m1.141s user0m1.140s sys 0m0.000s j...@*:~/montecarlo$ time ./mc 1000 1000 3.14113 real0m0.693s user0m0.690s sys 0m0.000s 2009/2/26 Ben Lippmeier ben.lippme...@anu.edu.au: On 26/02/2009, at 9:27 PM, hask...@kudling.de wrote: Currently i can only imagine to define a data type in order to use unboxed Ints instead of the accumulator tuple. That would probably help a lot. It would also help to use two separate Double# parameters instead of the tuple. The thing is that i don't see in the profile output yet what to improve. There are some allocations going on in main, but i don't know what causes it. The first thing I would do is replace your isInCircle :: (Floating a, Ord a) = (a,a) - Bool with isInCircle :: (Double, Double) - Bool Can you point me to why that matters? At the machine level, GHC treats the (Floating a, Ord a) as an extra argument to the function. This argument holds function pointers that tell it how to perform multiplication and = for the unknown type 'a'. If you use Double instead of 'a', then it's more likely to use the actual machine op. Ben. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Eugene Kirpichov Web IR developer, market.yandex.ru ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
I, personally, do, but I think that's more of a question to the GHC people :) 2009/2/26 hask...@kudling.de: Do you think it would be feasable to replace the GHC implementation of System.Random with something like System.Random.Mersenne? Here is a variant that uses mersenne-random-pure64 and works less than 2x slower than C++: - You don't need to compute samples count an extra time - You don't need to assemble double pairs from a list - Notice the strictness in randomDoublePairs: it doubled performance {-# LANGUAGE BangPatterns #-} import System.Random.Mersenne.Pure64 import System( getArgs ) import Data.List( foldl' ) isInCircle :: (Double,Double) - Bool isInCircle (!x,!y) = sqrt (x*x + y*y) = 1.0 accumulateHit :: Int - (Double,Double) - Int accumulateHit !hits pair = if isInCircle pair then hits + 1 else hits monteCarloPi :: Int - [(Double,Double)] - Double monteCarloPi n xs = 4.0 * fromIntegral hits / fromIntegral n where hits = foldl' accumulateHit 0 . take n $ xs randomDoublePairs g = let !(!x,!g') = randomDouble g !(!y,!g'') = randomDouble g' in (x,y):randomDoublePairs g'' main = do samples - (read . head) `fmap` getArgs randomNumbers - randomDoublePairs `fmap` newPureMT putStrLn . show $ monteCarloPi samples randomNumbers j...@*:~/montecarlo$ time ./mc-hs 1000 3.1417088 real 0m1.141s user 0m1.140s sys 0m0.000s j...@*:~/montecarlo$ time ./mc 1000 1000 3.14113 real 0m0.693s user 0m0.690s sys 0m0.000s 2009/2/26 Ben Lippmeier ben.lippme...@anu.edu.au: On 26/02/2009, at 9:27 PM, hask...@kudling.de wrote: Currently i can only imagine to define a data type in order to use unboxed Ints instead of the accumulator tuple. That would probably help a lot. It would also help to use two separate Double# parameters instead of the tuple. The thing is that i don't see in the profile output yet what to improve. There are some allocations going on in main, but i don't know what causes it. The first thing I would do is replace your isInCircle :: (Floating a, Ord a) = (a,a) - Bool with isInCircle :: (Double, Double) - Bool Can you point me to why that matters? At the machine level, GHC treats the (Floating a, Ord a) as an extra argument to the function. This argument holds function pointers that tell it how to perform multiplication and = for the unknown type 'a'. If you use Double instead of 'a', then it's more likely to use the actual machine op. Ben. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Eugene Kirpichov Web IR developer, market.yandex.ru ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Eugene Kirpichov Web IR developer, market.yandex.ru ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
On Thu, Feb 26, 2009 at 7:56 AM, Eugene Kirpichov ekirpic...@gmail.com wrote: Here is a variant that uses mersenne-random-pure64 and works less than 2x slower than C++: And I would like to notice that rand() is incredibly dumber than the Mersenne twister, so probably if we took rand()'s code from glibc and rewrote it in Haskell, there would be another performance increase. -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
C's rand() function is very bad and should never be used really. On Thu, Feb 26, 2009 at 11:37 AM, Felipe Lessa felipe.le...@gmail.com wrote: On Thu, Feb 26, 2009 at 7:56 AM, Eugene Kirpichov ekirpic...@gmail.com wrote: Here is a variant that uses mersenne-random-pure64 and works less than 2x slower than C++: And I would like to notice that rand() is incredibly dumber than the Mersenne twister, so probably if we took rand()'s code from glibc and rewrote it in Haskell, there would be another performance increase. -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
hask...@kudling.de wrote: Do you think it would be feasable to replace the GHC implementation of System.Random with something like System.Random.Mersenne? There's a problem with using the Mersenne Twister: System.Random's interface has a split method: class RandomGen g where split:: g - (g, g) The Mersenne Twister is good at producing a single stream of random numbers - in fact it works by generating a whole block of random numbers in one go, then consuming the block, and only then generating the next block. I have no idea how to implement a split method that produces independent streams. Even if I did, using split a lot would likely spoil the performance benefit of the generator. (System.Random.Mersenne.Pure64 provides a RandomGen instance for PureMT, but it cheats:) split = error System.Random.Mersenne.Pure: unable to split the mersenne twister Bertram ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
You can implement a reasonable split if you can fast-forward the generator. There's no known method to fast-forward the MT, but other generators like MRG32k3a can handle it. -- Lennart On Thu, Feb 26, 2009 at 12:08 PM, Bertram Felgenhauer bertram.felgenha...@googlemail.com wrote: hask...@kudling.de wrote: Do you think it would be feasable to replace the GHC implementation of System.Random with something like System.Random.Mersenne? There's a problem with using the Mersenne Twister: System.Random's interface has a split method: class RandomGen g where split :: g - (g, g) The Mersenne Twister is good at producing a single stream of random numbers - in fact it works by generating a whole block of random numbers in one go, then consuming the block, and only then generating the next block. I have no idea how to implement a split method that produces independent streams. Even if I did, using split a lot would likely spoil the performance benefit of the generator. (System.Random.Mersenne.Pure64 provides a RandomGen instance for PureMT, but it cheats:) split = error System.Random.Mersenne.Pure: unable to split the mersenne twister Bertram ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
On Thu, Feb 26, 2009 at 7:17 AM, Lennart Augustsson lenn...@augustsson.netwrote: You can implement a reasonable split if you can fast-forward the generator. There's no known method to fast-forward the MT, but other generators like MRG32k3a can handle it. Are you aware of any existing (C/C++/Haskell) library implementing this algorithm? A cursory google search didn't turn anything up for me, aside from something implemented in Java, and another in Lisp. --Tracy ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
2009/2/26 Tracy Wadleigh tracy.wadle...@gmail.com: On Thu, Feb 26, 2009 at 7:17 AM, Lennart Augustsson lenn...@augustsson.net wrote: You can implement a reasonable split if you can fast-forward the generator. There's no known method to fast-forward the MT, but other generators like MRG32k3a can handle it. Are you aware of any existing (C/C++/Haskell) library implementing this algorithm? A cursory google search didn't turn anything up for me, aside from something implemented in Java, and another in Lisp. Maybe http://www.iro.umontreal.ca/~lecuyer/myftp/streams00/ ? -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Awesome, Felipe. Thanks. --Tracy On Thu, Feb 26, 2009 at 11:07 AM, Felipe Lessa felipe.le...@gmail.comwrote: 2009/2/26 Tracy Wadleigh tracy.wadle...@gmail.com: On Thu, Feb 26, 2009 at 7:17 AM, Lennart Augustsson lenn...@augustsson.net wrote: You can implement a reasonable split if you can fast-forward the generator. There's no known method to fast-forward the MT, but other generators like MRG32k3a can handle it. Are you aware of any existing (C/C++/Haskell) library implementing this algorithm? A cursory google search didn't turn anything up for me, aside from something implemented in Java, and another in Lisp. Maybe http://www.iro.umontreal.ca/~lecuyer/myftp/streams00/http://www.iro.umontreal.ca/%7Elecuyer/myftp/streams00/? -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Ben.Lippmeier: On 26/02/2009, at 9:27 PM, hask...@kudling.de wrote: Currently i can only imagine to define a data type in order to use unboxed Ints instead of the accumulator tuple. That would probably help a lot. It would also help to use two separate Double# parameters instead of the tuple. data T = T !Double !Double should be enough. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
I looked at the core and the tuples were already unboxed IIRC. 2009/2/26 Don Stewart d...@galois.com: Ben.Lippmeier: On 26/02/2009, at 9:27 PM, hask...@kudling.de wrote: Currently i can only imagine to define a data type in order to use unboxed Ints instead of the accumulator tuple. That would probably help a lot. It would also help to use two separate Double# parameters instead of the tuple. data T = T !Double !Double should be enough. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Eugene Kirpichov Web IR developer, market.yandex.ru ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
vandijk.roel: I replaced the standard random number generated with the one from mersenne-random. On my system this makes the resulting program about 14 times faster than the original. I also made a change to accumulateHit because it doesn't need to count to total. That is already known. {-# LANGUAGE BangPatterns #-} import System( getArgs ) import Data.List( foldl' ) import System.Random.Mersenne pairs :: [a] - [(a,a)] pairs [] = [] pairs (x:[]) = [] pairs (x:y:rest) = (x, y) : pairs rest isInCircle :: (Double, Double) - Bool isInCircle (x,y) = sqrt (x*x + y*y) = 1.0 accumulateHit :: Int - (Double, Double) - Int accumulateHit (!hits) pair | isInCircle pair = hits + 1 | otherwise = hits countHits :: [(Double, Double)] - Int countHits ps = foldl' accumulateHit 0 ps monteCarloPi :: Int - [(Double, Double)] - Double monteCarloPi n xs = 4.0 * fromIntegral hits / fromIntegral n where hits = countHits $ take n xs main = do args - getArgs let samples = read $ head args randomNumberGenerator - getStdGen randomNumbers - randoms randomNumberGenerator let res = monteCarloPi samples $ pairs randomNumbers putStrLn $ show $ res But note the lazy list of Double pairs, so the inner loop still looks like this though: $wlgo :: Int# - [(Double, Double)] - Int $wlgo = \ (ww_s1pv :: Int#) (w_s1px :: [(Double, Double)]) - case w_s1px of wild_aTl { [] - I# ww_s1pv; : x_aTp xs_aTq - case x_aTp of wild1_B1 { (x1_ak3, y_ak5) - case x1_ak3 of wild2_aX8 { D# x2_aXa - case y_ak5 of wild3_XYs { D# x3_XYx - case =## (sqrtDouble# (+## (*## x2_aXa x2_aXa) (*## x3_XYx x3_XYx))) 1.0 of wild4_X1D { False - $wlgo ww_s1pv xs_aTq; True - $wlgo (+# ww_s1pv 1) xs_aTq } while we want to keep everything in registers with something like: Int# - Double# - Double# - Int# So we'll be paying a penalty to force the next elem of the list (instead of just calling the Double generator). This definitely has an impact on performance. $ ghc-core B.hs -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=core2 -funbox-strict-fields $ time ./B 1000 3.1407688 ./B 1000 2.41s user 0.01s system 99% cpu 2.415 total Now, what if we just rewrote that inner loop directly to avoid intermediate stuff? That'd give us a decent lower bound. {-# LANGUAGE BangPatterns #-} import System.Environment import System.Random.Mersenne isInCircle :: Double - Double - Bool isInCircle x y = sqrt (x*x + y*y) = 1.0 countHits :: Int - IO Int countHits lim = do g - newMTGen Nothing let go :: Int - Int - IO Int go !throws !hits | throws = lim = return hits | otherwise = do x - random g -- use mersenne-random-pure64 to stay pure! y - random g if isInCircle x y then go (throws+1) (hits+1) else go (throws+1) hits go 0 0 monteCarloPi :: Int - IO Double monteCarloPi n = do hits - countHits n return $ 4.0 * fromIntegral hits / fromIntegral n main = do [n] - getArgs res - monteCarloPi (read n) print res And now the inner loop looks like: $wa_s1yW :: Int# - Int# - State# RealWorld - (# State# RealWorld, Int #) Pretty good. Can't avoid the Int boxed return (and resulting heap check) due to use of IO monad. But at least does away with heap allocs in the inner loop! How does it go: $ ghc-core A.hs -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=core2 -funbox-strict-fields $ time ./A 1000 3.1412564 ./A 1000 0.81s user 0.00s system 99% cpu 0.818 total Ok. So 3 times faster. Now the goal is to recover the high level version. We have many tools to employ: switching to mersenne-random-pure64 might help here. And seeing if you can fuse filling a uvector with randoms, and folding over it... t -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
How about an FFI call to rand() and then measure the performance On Thu, Feb 26, 2009 at 3:37 AM, Felipe Lessa felipe.le...@gmail.comwrote: On Thu, Feb 26, 2009 at 7:56 AM, Eugene Kirpichov ekirpic...@gmail.com wrote: Here is a variant that uses mersenne-random-pure64 and works less than 2x slower than C++: And I would like to notice that rand() is incredibly dumber than the Mersenne twister, so probably if we took rand()'s code from glibc and rewrote it in Haskell, there would be another performance increase. -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
On Thu, Feb 26, 2009 at 6:23 PM, Don Stewart d...@galois.com wrote: But note the lazy list of Double pairs, so the inner loop still looks like this though: $wlgo :: Int# - [(Double, Double)] - Int $wlgo = \ (ww_s1pv :: Int#) (w_s1px :: [(Double, Double)]) - case w_s1px of wild_aTl { [] - I# ww_s1pv; : x_aTp xs_aTq - case x_aTp of wild1_B1 { (x1_ak3, y_ak5) - case x1_ak3 of wild2_aX8 { D# x2_aXa - case y_ak5 of wild3_XYs { D# x3_XYx - case =## (sqrtDouble# (+## (*## x2_aXa x2_aXa) (*## x3_XYx x3_XYx))) 1.0 of wild4_X1D { False - $wlgo ww_s1pv xs_aTq; True - $wlgo (+# ww_s1pv 1) xs_aTq } while we want to keep everything in registers with something like: Int# - Double# - Double# - Int# So we'll be paying a penalty to force the next elem of the list (instead of just calling the Double generator). This definitely has an impact on performance. $ ghc-core B.hs -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=core2 -funbox-strict-fields $ time ./B 1000 3.1407688 ./B 1000 2.41s user 0.01s system 99% cpu 2.415 total Now, what if we just rewrote that inner loop directly to avoid intermediate stuff? That'd give us a decent lower bound. {-# LANGUAGE BangPatterns #-} import System.Environment import System.Random.Mersenne isInCircle :: Double - Double - Bool isInCircle x y = sqrt (x*x + y*y) = 1.0 countHits :: Int - IO Int countHits lim = do g - newMTGen Nothing let go :: Int - Int - IO Int go !throws !hits | throws = lim = return hits | otherwise = do x - random g -- use mersenne-random-pure64 to stay pure! y - random g if isInCircle x y then go (throws+1) (hits+1) else go (throws+1) hits go 0 0 monteCarloPi :: Int - IO Double monteCarloPi n = do hits - countHits n return $ 4.0 * fromIntegral hits / fromIntegral n main = do [n] - getArgs res - monteCarloPi (read n) print res And now the inner loop looks like: $wa_s1yW :: Int# - Int# - State# RealWorld - (# State# RealWorld, Int #) Pretty good. Can't avoid the Int boxed return (and resulting heap check) due to use of IO monad. But at least does away with heap allocs in the inner loop! How does it go: $ ghc-core A.hs -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=core2 -funbox-strict-fields $ time ./A 1000 3.1412564 ./A 1000 0.81s user 0.00s system 99% cpu 0.818 total Ok. So 3 times faster. Now the goal is to recover the high level version. We have many tools to employ: switching to mersenne-random-pure64 might help here. And seeing if you can fuse filling a uvector with randoms, and folding over it... t -- Don Very nice! I also wrote a naive version which used uvector but it was about twice as slow as the original Haskell version. I wanted to write lengthU . filterU isInCircle because that clearly expresses the algorithm. Sadly I was at work and didn't have time for profile the program to see what was wrong. Still, I couldn't resist having a go at the problem :-) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance question
Lennart Augustsson lenn...@augustsson.net writes: C's rand() function is very bad and should never be used really. On Linux (really GNU libc, I suppose) it is the same as random(). But for portability, one is encouraged to spend the two extra letters. I don't understand the details, but I think the Mersenne twister is much better in any case. -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
On Mon, Jan 17, 2005 at 08:54:38PM -0800, Ben Rudiak-Gould wrote: If performance is the main concern, I would flatten the data structure: data Interval = IlII Double Double | IlIE Double Double | IlEI Double Double | IlEE Double Double | NilII Double Double | NilIE Double Double | NilEI Double Double | NilEE Double Double I would go even further data IntervalType = IlII | IlIE | IlEI | IlEE | NilII | NilIE | NilEI | NilEE data Interval = Interval IntervalType {-# UNPACK #-} !Double {-# UNPACK #-} !Double now, the doubles can be stored in their native form and are not under a union data type (which always must be represented by a pointer) so accessing them can be very fast. John -- John Meacham - repetae.netjohn ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
Stijn De Saeger wrote: data Bound = I Double | E Double deriving (Eq, Show, Ord) data Interval = Il Bound Bound | Nil Bound Bound deriving (Eq,Ord) isIn :: Double - Interval - Bool isIn r (Nil x y) = not (isIn r (Il x y)) isIn r (Il (I x) (I y)) = r = x r = y isIn r (Il (I x) (E y)) = r = x r y isIn r (Il (E x) (I y)) = r x r = y isIn r (Il (E x) (E y)) = r x r y If performance is the main concern, I would flatten the data structure: data Interval = IlII Double Double | IlIE Double Double | IlEI Double Double | IlEE Double Double | NilII Double Double | NilIE Double Double | NilEI Double Double | NilEE Double Double isIn :: Double - Interval - Bool isIn r (IlII x y) = r = x r = y isIn r (IlIE x y) = r = x r y isIn r (IlEI x y) = r x r = y isIn r (IlEE x y) = r x r y isIn r (NilII x y) = r x || r y isIn r (NilIE x y) = r x || r = y isIn r (NilEI x y) = r = x || r y isIn r (NilEE x y) = r = x || r = y Depending on your application you might not need all of those cases. Another neat trick you can pull is to take advantage of the fact that Double is actually a discrete type, like Int, and you can therefore get away with closed intervals only: data Interval = Il Double Double | Nil Double Double isIn :: Double - Interval - Bool isIn r (Il x y) = r = x r = y isIn r (Nil x y) = r x || r y But this requires nextLargestDouble and nextSmallestDouble functions. I don't know if Haskell provides them. Also, you could run into trouble with wider-than-Double intermediate values. Finally, if you never do anything with intervals except pass them to isIn, you can do this: type Interval = Double - Bool isIn r i = i r -- Ben ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe